Code Clone Detection Method Based on an Enhanced Control Flow Graph and a Siamese Network Architecture

Keywords: control flow graph; Siamese network architecture; code representation; semantic similarity; clone detection
CLC number: TP311  Document code: A  Article ID: 1001-3695(2025)07-028-2132-09
doi: 10.19734/j.issn.1001-3695.2024.11.0441
Abstract: To address the issues of missing contextual information and weak semantic learning capabilities in existing code clone detection methods, this paper proposed a method based on an enhanced control flow graph (ECFG) and a twin (Siamese) network architecture. Firstly, it designed ECFG, which embedded cross-node correlation edges to strengthen contextual awareness. Then, it introduced CGSMN, a semantic matching model based on Siamese networks. This model integrated a multi-head attention mechanism to extract key information from the nodes, then improved the relational graph attention network to capture inter-node associations and generate graph feature vectors. Finally, it explored the semantic relationships between these feature vectors and computed the semantic similarity. Empirical evaluation was conducted on two representative datasets. The results show that, compared to methods such as ASTNN, FA-AST, and DHAST, the F1-score improves by 0.5 to 15.5 percentage points on the BigCloneBench dataset and by 1.5 to 16.5 percentage points on the Google Code Jam dataset, demonstrating the effectiveness of the proposed method for semantic clone detection.
Key words: control flow graph; siamese neural network; code representation; code semantic similarity; code clone detection
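
As a rough illustration of the pipeline the abstract describes (build a graph with both control-flow edges and cross-node correlation edges, encode two graphs with one shared encoder, then compare the resulting vectors), the following Python sketch may help. It is not the paper's CGSMN model: relation-weighted neighbor averaging stands in for the multi-head attention and relational graph attention layers, and every name in it (`encode_graph`, `clone_score`, `REL_WEIGHTS`, the toy graphs and their feature values) is a hypothetical choice made for this example.

```python
import math

# Hypothetical weights per edge type: ordinary control-flow edges and the
# extra cross-node correlation edges that an ECFG-style graph would add.
REL_WEIGHTS = {"flow": 1.0, "assoc": 0.5}


def encode_graph(node_feats, edges, rel_weights):
    """Shared encoder: one round of relation-weighted neighbor averaging,
    followed by mean pooling into a single graph vector.
    This is a simplified stand-in, not a relational graph attention network."""
    dim = len(node_feats[0])
    updated = []
    for i, feat in enumerate(node_feats):
        acc = list(feat)  # the node's own features, weight 1
        total = 1.0
        for src, dst, rel in edges:
            if dst == i:  # aggregate messages from incoming edges only
                w = rel_weights[rel]
                acc = [a + w * x for a, x in zip(acc, node_feats[src])]
                total += w
        updated.append([a / total for a in acc])
    # Mean-pool the updated node vectors into one graph feature vector.
    return [sum(v[d] for v in updated) / len(updated) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def clone_score(graph_a, graph_b, rel_weights=REL_WEIGHTS):
    """Siamese-style comparison: both graphs pass through the same encoder
    with the same weights, and the two vectors are compared by cosine."""
    vec_a = encode_graph(graph_a[0], graph_a[1], rel_weights)
    vec_b = encode_graph(graph_b[0], graph_b[1], rel_weights)
    return cosine(vec_a, vec_b)


# Toy graphs: (node feature vectors, list of (src, dst, edge_type)).
graph_a = (
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [(0, 1, "flow"), (1, 2, "flow"), (0, 2, "assoc")],
)
graph_b = (
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [(0, 1, "flow"), (1, 2, "flow")],
)

if __name__ == "__main__":
    print(clone_score(graph_a, graph_b))
```

The point of the shared encoder is that identical inputs always map to identical vectors, so a graph compared with itself scores 1, and the score is symmetric in its two arguments. A trained model would replace the fixed weights with learned parameters and would typically threshold the score to decide whether a pair is a clone.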
0 Introduction
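In recent years, the flourishing of open-source culture has given rise to a range of developer communities built on collaboration and sharing, fundamentally changing how software is developed [1]. Cloning practices such as copying, pasting, and modifying code have become common and efficient.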