Multi-prototype driven graph neural network method for speaker diarization


Multi-prototype driven graph neural network for speaker diarization

Abstract: Recently, the utilization of graph neural networks for session-level modeling has demonstrated its efficacy for speaker diarization. However, most existing variants rely solely on local structure information, ignoring the importance of global speaker information, and therefore cannot fully compensate for the lack of speaker information in the speaker diarization task. This paper proposes a multi-prototype driven graph neural network (MPGNN) for representation learning, which effectively combines local and global speaker information within each session and simultaneously remaps x-vectors to a new embedding space that is more suitable for clustering. Specifically, the design of prototype learning with a dynamic and adaptive approach is a critical component, through which more accurate global speaker information can be captured. Experimental results show that the proposed MPGNN approach significantly outperforms the baseline systems, achieving diarization error rates (DER) of 3.33%, 3.52%, 5.66%, and 6.52% on the AMI_SDM and CALLHOME datasets, respectively.

Keywords: speaker diarization; graph neural network; local structure information; global speaker information; multi-prototype learning

0 Introduction

The goal of speaker diarization (SD) is to answer the question of "who spoke when": given a long audio signal containing a conversation among multiple speakers, it performs speaker identification and speaker localization simultaneously.


Application Research of Computers, 2025, Issue 6
