Scroll painting image caption model combining multi-scale and multi-level aggregation

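DOI: 10.16652/j.issn.1004-373x.2025.17.007. Citation: YUE Chaoyang, HU Wenjin, ZHANG Fujun. Scroll painting image caption model combining multi-scale and multi-level aggregation[J]. Modern Electronics Technique, 2025, 48(17): 41-47.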
CLC number: TN919-34; TP391. Document code: A. Article ID: 1004-373X(2025)17-0041-07
YUE Chaoyang1, HU Wenjin1,2, ZHANG Fujun1 (1. Key Laboratory of Linguistic and Cultural Computing, Ministry of Education, Northwest Minzu University, Lanzhou; 2. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, China)
Abstract: Scroll painting images vary in size and have distinctive spatial distribution characteristics, and Transformer-based encoding layers tend to lose key image information. To address this, the paper proposes MMA, a scroll painting image caption model that combines multi-scale and multi-level aggregation. In the encoding stage, it introduces asymmetric convolution and multi-scale feature modules. These modules improve the ability of the convolution layers to capture spatial information and integrate the global and local multi-scale contextual information of the scroll painting image, producing a feature representation rich in semantic information. In the decoding stage, it uses a multi-level aggregation network. By aggregating the features of different encoding layers, the network makes effective use of both the semantic information of the high-level encoding layers and the content information of the low-level encoding layers, which reduces information loss. Experimental results show that the model performs well on the scroll painting dataset. Compared with the NIC (neural image caption) model, it improves BLEU-4 by 26.7% and METEOR by 0.9% and generates more accurate description sentences.
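This excerpt names two mechanisms but does not describe how they work inside. The sketch below shows one common interpretation of each, under stated assumptions. (1) An ACNet-style asymmetric convolution block adds a k×k branch to a 1×k branch and a k×1 branch. Because convolution is linear, the three branches can be merged into one k×k kernel. (2) Multi-level aggregation is modeled as a softmax-weighted sum of the outputs of all encoder layers. The function names (`conv2d_same`, `asymmetric_conv_block`, `fuse_kernels`, `aggregate_layers`) are hypothetical. The code uses single-channel pure Python for readability and leaves out batch normalization, training, and the multi-scale module. It is an illustration only, not the MMA model's actual implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2-D cross-correlation with zero padding ("same" output size).

    Assumes odd kernel height and width so the kernel has a well-defined centre.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        acc += image[y][x] * kernel[u][v]
            out[i][j] = acc
    return out


def asymmetric_conv_block(image: Matrix, square: Matrix,
                          horizontal: Matrix, vertical: Matrix) -> Matrix:
    """Hypothetical training-time form: sum of k x k, 1 x k and k x 1 branches (no BN)."""
    branches = [conv2d_same(image, k) for k in (square, horizontal, vertical)]
    h, w = len(image), len(image[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


def fuse_kernels(square: Matrix, horizontal: Matrix, vertical: Matrix) -> Matrix:
    """Fold the 1 x k and k x 1 kernels into the centre row/column of the k x k kernel."""
    k = len(square)
    c = k // 2
    fused = [row[:] for row in square]
    for t in range(k):
        fused[c][t] += horizontal[0][t]  # horizontal kernel -> centre row
        fused[t][c] += vertical[t][0]    # vertical kernel -> centre column
    return fused


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_outputs: List[List[float]],
                     logits: List[float]) -> List[float]:
    """Hypothetical aggregation: softmax-weighted sum of one feature vector per encoder layer."""
    weights = softmax(logits)
    dim = len(layer_outputs[0])
    return [sum(wt * layer[d] for wt, layer in zip(weights, layer_outputs))
            for d in range(dim)]


if __name__ == "__main__":
    img = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
    sq = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
    hor = [[1.0, 2.0, 1.0]]
    ver = [[1.0], [-1.0], [2.0]]
    # Three-branch output should equal one convolution with the fused kernel.
    print(asymmetric_conv_block(img, sq, hor, ver)
          == conv2d_same(img, fuse_kernels(sq, hor, ver)))
    # Equal logits reduce the aggregation to a plain average of the layers.
    print(aggregate_layers([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```

Running the script checks, on a toy 3×3 input, that the three-branch block gives the same result as a single convolution with the fused kernel. This equivalence is why ACNet-style blocks add no inference cost. With equal logits, the aggregation is a plain average of the layer outputs. In a trained model, the logits would presumably be learned, so the decoder could balance high-level semantic features against low-level content features.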
Keywords: image caption; scroll painting image; multi-scale feature; asymmetric convolution; multi-level aggregation decoding; Transformer
0 Introduction
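Scroll painting is a distinctive form of Chinese painting. Generating image captions for scroll paintings can help people appreciate and understand them, promote cultural exchange, and provide technical support for the study and digital preservation of scroll paintings.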