
Progressive Co-Prompting Learning for Vision-Language Models


doi: 10.19734/j.issn.1001-3695.2024.10.0446

ProgCoPL: progressive co-prompting learning for vision-language models

Tao Junjie1, Zhang Weifeng1,2+, Wang Yuxia3, Miao Yi1, Xu Ling1 (1.Schoolofuece&o(lflellgee),ZgSUesit,g;2. Schoolfee&niUitinZ;i Institute, Jiaxing Zhejiang 314000, China)

Abstract: The large-scale pre-trained vision-language model CLIP aligns images and texts in a shared semantic space, demonstrating robust generalization capabilities across diverse downstream tasks. However, existing prompt learning methods often independently insert learnable prompt vectors into each layer of CLIP's visual and text encoders. This approach results in limited cross-modal interaction, with independent prompts across layers failing to effectively guide the encoders in capturing task-relevant information. To address these issues, this paper proposes ProgCoPL. The method introduces text-guided prompt vectors into the visual encoder layers and vision-guided prompt vectors into the text encoder layers, thereby enhancing cross-modal interaction and alignment. Furthermore, ProgCoPL incorporates information transmission channels between prompt vectors across layers, enabling hierarchical and progressive integration of task-specific information. Experiments on 11 datasets show that ProgCoPL efficiently adapts CLIP to downstream tasks, significantly improving its cross-dataset generalization ability. ProgCoPL outperforms existing methods in multiple generalization tests, particularly achieving notable advancements in cross-dataset scenarios.

Key words: multimodal; prompt learning; vision-language model; Transformer encoder
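
The two mechanisms named in the abstract, cross-modal guidance of prompts and a channel that passes prompt information from one layer to the next, can be illustrated with a small sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the linear projections `text_to_vis` and `vis_to_text`, the additive mixing, the weight `alpha`, and the function name `build_co_prompts` are hypothetical and are not taken from the paper. Plain Python lists stand in for tensors so the example runs without any dependencies.

```python
import random

Matrix = list[list[float]]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.02) -> Matrix:
    """Small random matrix standing in for a learnable parameter."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix, weight: float = 1.0) -> Matrix:
    """Element-wise a + weight * b for two matrices of the same shape."""
    return [[x + weight * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_co_prompts(num_layers=3, num_prompts=2, d_text=4, d_vis=6, alpha=0.5, seed=0):
    """Return per-layer (text_prompts, vis_prompts) under the assumptions above."""
    rng = random.Random(seed)
    text_prompts, vis_prompts = [], []
    # Before the first layer there is no earlier prompt, so the channel carries zeros.
    prev_text = [[0.0] * d_text for _ in range(num_prompts)]
    prev_vis = [[0.0] * d_vis for _ in range(num_prompts)]
    for _ in range(num_layers):
        # Hypothetical per-layer parameters: base prompts and cross-modal projections.
        text_base = rand_matrix(rng, num_prompts, d_text)
        vis_base = rand_matrix(rng, num_prompts, d_vis)
        text_to_vis = rand_matrix(rng, d_text, d_vis)
        vis_to_text = rand_matrix(rng, d_vis, d_text)
        # Progressive channel (assumed additive): mix in the previous layer's prompts.
        text_state = add(text_base, prev_text, alpha)
        vis_state = add(vis_base, prev_vis, alpha)
        # Co-prompting (assumed linear): each modality's prompt is guided by the other.
        vis_prompt = add(vis_state, matmul(text_state, text_to_vis))
        text_prompt = add(text_state, matmul(vis_state, vis_to_text))
        text_prompts.append(text_prompt)
        vis_prompts.append(vis_prompt)
        prev_text, prev_vis = text_prompt, vis_prompt
    return text_prompts, vis_prompts
```

Setting `alpha=0.0` removes the cross-layer channel and leaves only the cross-modal guidance within each layer. Removing the two `matmul` terms as well would give the independent per-layer prompts that the abstract criticizes. In a real model the resulting prompt rows would be concatenated with the token sequence entering the corresponding encoder layer.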

0 Introduction

Large-scale vision-language models (V-L models) have become one of the core technologies in the field of cross-modal intelligence in computing.
