Paper-Cut Image Classification Based on CLIP Text Feature Enhancement

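Keywords: vision-language large model; paper-cut classification; few-shot classification; modality fusion; prompt learning
CLC number: TP391  Document code: A  Article ID: 1001-3695(2025)07-010-1994-09  doi: 10.19734/j.issn.1001-3695.2024.11.0485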

Abstract: To address the challenges of large modality gaps between text and image features and insufficient class prototype representation in paper-cut image classification, this paper proposed a CLIP-based text feature enhancement method (CLIP visual text enhancer, C-VTE). The method extracted text features through manual prompt templates, designed a visual-text enhancement module, and employed cross-attention and proportional residual connections to fuse image and text features, thereby reducing modality discrepancy and enhancing the expressive ability of category features. Experiments on a paper-cut dataset and four public datasets including Caltech101 validated its effectiveness. For base-class classification on the paper-cut dataset, C-VTE achieved 72.51% average accuracy, outperforming existing methods by 3.14 percentage points. In few-shot classification tasks on public datasets, it attained 84.78% average accuracy with a 2.45 percentage-point improvement. Ablation experiments demonstrate that both the modality fusion module and proportional residual components contribute significantly to performance improvement. The method offers novel insights for efficient adaptation of vision-language models in downstream classification tasks, particularly suited for few-shot learning and base-class dominated scenarios.

Key words: visual language large model; paper-cut classification; few-shot classification; multimodal fusion; prompt learning
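
The fusion step named in the abstract (cross-attention followed by a proportional residual connection) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the paper's implementation: class text features act as queries over image features, there are no learned projection matrices, the mixing ratio `alpha` and the function names are hypothetical, and random vectors stand in for CLIP encoder outputs.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def enhance_text_features(text_feats, image_feats, alpha=0.2):
    """Hypothetical visual-text enhancement step.

    text_feats:  (C, d) one text feature per class (queries)
    image_feats: (N, d) image features (keys and values)
    alpha:       assumed mixing ratio of the proportional residual
    """
    d = text_feats.shape[-1]
    # Scaled dot-product cross-attention: each class attends over the images.
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (C, N)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    attended = weights @ image_feats                   # (C, d)
    # Proportional residual: blend attended features with the original text features.
    fused = alpha * attended + (1.0 - alpha) * text_feats
    return l2_normalize(fused)

def classify(image_feats, class_feats):
    """Assign each image to the class with the highest cosine similarity."""
    logits = l2_normalize(image_feats) @ class_feats.T  # (N, C)
    return logits.argmax(axis=-1)

# Random stand-ins for CLIP outputs: 5 classes, 8 images, 64 dimensions.
rng = np.random.default_rng(0)
text_feats = l2_normalize(rng.normal(size=(5, 64)))
image_feats = l2_normalize(rng.normal(size=(8, 64)))

enhanced = enhance_text_features(text_feats, image_feats, alpha=0.2)
preds = classify(image_feats, enhanced)
```

With `alpha=0` the sketch reduces to plain text-feature matching, so the ratio controls how far the class features are pulled toward the visual modality.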

0 Introduction

In the field of intangible cultural heritage, paper-cuts exist mainly as images, and they are complex in type and large in number.

Application Research of Computers, 2025, Issue 7
