
Audio-visual segmentation network with multi-dimensional cross-attention fusion

doi:10.19734/j.issn.1001-3695.2024.08.0369

Li Fanfan, Zhang Yuanyuan, Zhang Yonglong, Zhu Junwu (School of Information Engineering, Yangzhou University, Yangzhou Jiangsu 225100, China)

Abstract: Audio-visual segmentation (AVS) aims to locate and accurately segment the sounding objects in images based on both visual and auditory information. While most existing research focuses primarily on methods for audio-visual information fusion, there is insufficient in-depth exploration of fine-grained audio-visual analysis, particularly in aligning continuous audio features with spatial pixel-level information. Therefore, this paper proposes an audio-visual segmentation attention fusion (AVSAF) method based on contrastive learning. Firstly, the method uses a multi-head cross-attention mechanism and a memory token to construct an audio-visual token fusion module that reduces the loss of multi-modal information. Secondly, it introduces contrastive learning to minimize the discrepancy between audio and visual features, enhancing their alignment. A dual-layer decoder is then employed to accurately predict and segment the target's position. Finally, extensive experiments are carried out on the S4 and MS3 sub-datasets of the AVSBench-Object dataset. The J value increases by 3.04 and 4.71 percentage points respectively, and the F value increases by 2.4 and 3.5 percentage points respectively, which demonstrates the effectiveness of the proposed method in audio-visual segmentation tasks.

Key words: audio-visual segmentation; multi-modal; contrastive learning; attention mechanism
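
The abstract states that contrastive learning is used to minimize the discrepancy between audio and visual features, but it does not give the loss. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style objective over paired clip-level embeddings. The function names, the temperature of 0.07, and the toy 16-dimensional features are all hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(audio, visual, temperature=0.07):
    """Assumed contrastive loss: row i of `audio` and row i of `visual` form a
    positive pair; every other row in the batch serves as a negative."""
    a = l2_normalize(audio)
    v = l2_normalize(visual)
    logits = a @ v.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(len(a))
    # Audio-to-visual direction: softmax over each row.
    loss_a2v = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Visual-to-audio direction: softmax over each column.
    loss_v2a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a2v + loss_v2a)


# Toy data (hypothetical): 4 paired clips with 16-dimensional features.
rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 16))
aligned_audio = visual + 0.01 * rng.normal(size=(4, 16))  # nearly matching pairs
random_audio = rng.normal(size=(4, 16))                    # unrelated pairs

loss_aligned = symmetric_info_nce(aligned_audio, visual)
loss_random = symmetric_info_nce(random_audio, visual)
print(f"aligned pairs: {loss_aligned:.4f}, random pairs: {loss_random:.4f}")
```

Under this assumed formulation, well-aligned audio and visual embeddings yield a lower loss than unrelated ones, which is the behaviour the alignment step described in the abstract relies on.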

0 Introduction

Human perception is multi-dimensional, encompassing vision, hearing, touch, taste, and smell.
