Multi-prototype driven graph neural network for speaker diarization
Abstract: Recently, the utilization of graph neural networks for session-level modeling has demonstrated its efficacy for speaker diarization. However, most existing variants rely solely on local structure information and ignore the importance of global speaker information, so they cannot fully compensate for the lack of speaker information in the speaker diarization task. This paper proposes a multi-prototype driven graph neural network (MPGNN) for representation learning, which effectively combines local and global speaker information within each session and simultaneously remaps x-vectors to a new embedding space that is more suitable for clustering. Specifically, the design of prototype learning with a dynamic and adaptive approach is a critical component, through which more accurate global speaker information can be captured. Experimental results show that the proposed MPGNN approach significantly outperforms the baseline systems, achieving diarization error rates (DER) of 3.33%, 3.52%, 5.66%, and 6.52% on the AMI_SDM and CALLHOME datasets, respectively.
Keywords: speaker diarization; graph neural network; local structure information; global speaker information; multi-prototype learning
0 Introduction
The goal of speaker diarization (SD) is to answer the question of "who spoke when": given a long audio signal in which multiple speakers converse, it simultaneously performs speaker identification and speaker localization.