
Paper-Cut Image Classification Based on CLIP Text Feature Enhancement

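Keywords: vision-language large model; paper-cut classification; few-shot classification; modality fusion; prompt learning. CLC number: TP391. Document code: A. Article ID: 1001-3695(2025)07-010-1994-09. doi:10.19734/j.issn.1001-3695.2024.11.0485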

Abstract: To address the large modality gap between text and image features and the insufficient class-prototype representation in paper-cut image classification, this paper proposes a CLIP-based text feature enhancement method, the CLIP visual-text enhancer (C-VTE). The method extracts text features through manual prompt templates, designs a visual-text enhancement module, and employs cross-attention and proportional residual connections to fuse image and text features, thereby reducing the modality discrepancy and enhancing the expressiveness of category features. Experiments on a paper-cut dataset and four public datasets, including Caltech101, validate its effectiveness. For base-class classification on the paper-cut dataset, C-VTE achieves an average accuracy of 72.51%, outperforming existing methods by 3.14 percentage points. In few-shot classification tasks on the public datasets, it attains an average accuracy of 84.78%, an improvement of 2.45 percentage points. Ablation experiments demonstrate that both the modality fusion module and the proportional residual component contribute significantly to the performance improvement. The method offers novel insights for the efficient adaptation of vision-language models to downstream classification tasks and is particularly suited to few-shot learning and base-class-dominated scenarios.

Key words: visual language large model; paper-cut classification; few-shot classification; multimodal fusion; prompt learning
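
The fusion step named in the abstract (cross-attention followed by a proportional residual connection) can be illustrated with a minimal sketch. It is not the C-VTE implementation: the abstract does not give the architecture, so several choices here are assumptions made only for illustration. Text features act as queries and image features as keys and values, there are no learned projection matrices, and the mixing weight `ratio` and all function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Assumption: no learned W_q / W_k / W_v projections, to keep the sketch short.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


def enhance_text_features(text_feats, image_feats, ratio=0.2):
    """Fuse image information into class text features.

    Proportional residual (assumed form):
        fused = (1 - ratio) * text + ratio * attended
    `ratio` is a hypothetical hyperparameter, not a value from the paper.
    """
    attended = cross_attention(text_feats, image_feats, image_feats)
    fused = [
        [(1.0 - ratio) * t + ratio * a for t, a in zip(t_vec, a_vec)]
        for t_vec, a_vec in zip(text_feats, attended)
    ]
    return [l2_normalize(v) for v in fused]


def classify(image_feat, class_feats):
    """Return the index of the class feature most similar to the image."""
    img = l2_normalize(image_feat)
    sims = [dot(img, c) for c in class_feats]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    # Toy 3-dimensional features standing in for CLIP embeddings.
    text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    image_feats = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]
    enhanced = enhance_text_features(text_feats, image_feats, ratio=0.2)
    print(classify([0.9, 0.1, 0.0], enhanced))
```

Setting `ratio` to 0 returns the normalized original text features, so the residual weight controls how far each class feature is pulled toward the image modality.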

0 Introduction

In the field of intangible cultural heritage, paper-cuts exist mainly as images, and they span many complex categories in large quantities.
