Audio Forgery Detection Method Based on Feature Fusion

CLC number: TN912.3 Document code: A Article ID: 1001-3695(2025)07-025-2109-07
doi:10.19734/j.issn.1001-3695.2024.11.0460
Abstract: Advancements in artificial intelligence have made distinguishing synthesized speech from genuine speech increasingly challenging, complicating audio deepfake detection. Existing methods often exhibit low accuracy, poor generalization, and weak robustness. This study proposed MFF-STViT, a method integrating three audio features with vocoder artifact features through a novel feature fusion module to enhance representation. The fused features were processed using an improved Transformer model, STViT, to reduce redundancy and improve detection performance. On the ASVspoof2019 LA test set, the method reduced the equal error rate (EER) by 71.38% on average. On the ASVspoof2021 LA dataset, it achieved average reductions of 44.41% in EER and 18.11% in the minimum tandem detection cost function (min-tDCF). For the ASVspoof2021 DF dataset, the average EER decreased by 57.81%, with reductions exceeding 80% in specific partitions. These findings demonstrate the effectiveness of MFF-STViT in improving accuracy, generalization, and robustness.
Keywords: audio deepfake detection; deep learning; feature fusion; vocoder artifacts
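
The abstract reports its results as reductions in the equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a figure can be computed, not the authors' evaluation code. It assumes that a higher score means "more likely bona fide" and that the reported percentages are relative reductions against a baseline EER; the function names `compute_eer` and `relative_reduction` and all score values are hypothetical.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance rate: spoofed trials accepted as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials rejected as spoofed.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Relative reduction in percent, e.g. of an EER against a baseline EER."""
    return 100.0 * (baseline - improved) / baseline


# Toy scores (illustrative only, not data from the paper).
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
toy_eer = compute_eer(bonafide, spoof)
```

With these toy scores one bona fide trial and one spoofed trial overlap, so the two error rates meet at one in four. Evaluation toolkits typically interpolate between thresholds rather than picking the closest one, so values from this sketch can differ slightly from official scores.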
0 Introduction
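In recent years, automatic speaker verification (ASV) systems have been widely adopted in voicemail, telephone banking, call centers, biometric authentication, forensic applications, and other fields, owing to their easy data collection, high specificity, and low cost [1].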