Human emotion recognition in images based on text-image contrastive fusion

Keywords: emotion recognition; vision-language model; context awareness; multimodal fusion
CLC number: TP391.41  Document code: A  Article ID: 1001-3695(2025)07-007-1972-06
doi:10.19734/j.issn.1001-3695.2024.12.0497
Abstract: Context-based recognition of human emotions in images has become an increasingly popular task in recent years, with application value in many fields. Most existing methods encode the human subject and the background separately, extract isolated features for simple interaction, and lack an effective feature fusion mechanism between the subject and the contextual background. To address the interaction between complex backgrounds and the human subject, this paper proposes a new network for human emotion recognition in images based on text-image contrastive fusion. Firstly, it designs prompt words that draw on the extensive social context information and reasoning capabilities of large vision-language models to extract textual descriptions of the emotional state linking the contextual background and the target human subject. Secondly, it proposes a text-image contrastive fusion module, which fuses the image features of the cropped target human subject with the text description features obtained from the prompt words. Finally, the fusion algorithm introduces a contrastive loss function to unify the representations of the image encoding and the text encoding, allowing effective emotional expressions to be captured more accurately during fusion. Experimental results show that the network learns more effective emotional feature representations and achieves superior results on the EMOTIC dataset, with an mAP of 37.30%. The proposed method better integrates the features of the human subject and the background in the image, thereby improving the accuracy of human emotion recognition in images.
Key words: emotion recognition; vision-language model; context awareness; multimodal fusion
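The abstract mentions a contrastive loss that unifies the image and text encodings, but this excerpt does not give its exact form. As an illustration only, the sketch below shows one common choice, a CLIP-style symmetric InfoNCE loss over a batch of paired image and text embeddings. The function names, the temperature value of 0.07, and the use of cosine similarity are assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss (hypothetical helper, not the paper's code).

    image_feats[i] and text_feats[i] are assumed to describe the same
    person; every other pairing in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    n = len(imgs)

    # logits[i][j]: temperature-scaled cosine similarity of image i and text j
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: the correct text for image i sits in column i.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: same computation on the transposed matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (image_to_text + text_to_image)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print("matched pairs:", contrastive_loss(images, matched_texts))
    print("swapped pairs:", contrastive_loss(images, swapped_texts))
```

Under these assumptions the loss is small when each image embedding lies closest to its own text embedding and large when the pairs are mismatched, which is the sense in which such a loss pulls the two encodings into a shared representation before fusion.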
0 Introduction
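Human emotion recognition systems have been applied in fields such as healthcare, smart education, and human-computer interaction, and they subtly influence people's lives. In real-world scenes, emotion recognition faces complex and changeable conditions, so recognizing a person's emotion from contextual cues is of great significance.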