High-performance Singing Voice Conversion Model Based on VITS

CLC number: TP391.4  Document code: A  Article ID: 2096-4706(2025)12-0129-06
ZHOU Keru, JIN Wei (School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou 31oo3, China)
Abstract: Singing voice conversion is the process of transforming the voice of a source singer into that of a target singer while retaining the original content and melody. With the development of technology, various network architectures and models have been put forward one after another, and the algorithms for singing voice conversion have also become diversified. However, problems such as poor quality of the converted audio, high distortion rates, and lack of vocal range are bound to occur. This paper proposes the UVC (Ultra Singing Voice Conversion) model with multi-decoupled feature constraints based on high-fidelity flow. The model is built on the basis of the VITS model. By combining the ContentVec encoder and the NSF-HIFI-GAN vocoder, it improves the input and output of the model, greatly enhancing the quality and fluency of the converted audio and possessing strong robustness.
Keywords: singing voice conversion; VITS; ContentVec encoder; NSF-HIFI-GAN vocoder
0 Introduction
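Music has always been an indispensable part of human life. Singing voice conversion is a technique that converts the voice in a source song into the voice of another singer. It aims to convert the various aspects of the source speaker's voice, such as fundamental frequency, spectral envelope, and prosodic features, so that they match the characteristics of the target speaker.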