Multi-actor Deterministic Policy Gradient Algorithm Based on Progressive k-Means Clustering

CLC number: TP18    Document code: A    Article ID: 1671-5489(2025)03-0885-10
LIU Quan 1,2, LIU Xiaosong 2, WU Guangjun 2, LIU Yuhan 3
(1. School of Computer Science and Technology, Kashi University, Kashi 844000, Xinjiang Uygur Autonomous Region, China; 2. School of Computer Science and Technology, Soochow University, Suzhou 215008, Jiangsu Province, China; 3. Academy of Future Education, Xi'an Jiaotong-Liverpool University, Suzhou 215000, Jiangsu Province, China)
Abstract: Aiming at the poor learning performance and high fluctuation of the deep deterministic policy gradient (DDPG) algorithm on tasks with large state spaces, we propose a multi-actor deep deterministic policy gradient algorithm based on progressive k-means clustering (MDDPG-PK-Means). During training, when an action is selected for the state at each time step, the decision of the actor network is assisted by the discrimination result of the k-means clustering algorithm; at the same time, the number of k-means cluster centers gradually increases as the number of training steps grows. The MDDPG-PK-Means algorithm was evaluated on the MuJoCo simulation platform, and the experimental results show that, compared with DDPG and other algorithms, it achieves better performance on most continuous tasks.
Keywords: deep reinforcement learning; deterministic policy gradient algorithm; k-means clustering; multi-actor
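To make the mechanism outlined in the abstract concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: several actor networks each propose an action, a k-means model fitted on recently visited states decides which actor to follow in the current state, and the number of cluster centers grows as training proceeds. The class name, the cluster-to-actor assignment rule (mean return per cluster), and all parameters are illustrative assumptions.

```python
# Hypothetical sketch of k-means-assisted actor selection with a progressively
# growing number of cluster centers; illustrative only, not the paper's code.
import numpy as np
from sklearn.cluster import KMeans

class ProgressiveKMeansSelector:
    def __init__(self, n_actors, k_init=2, k_max=16):
        self.n_actors = n_actors
        self.k = k_init            # current number of cluster centers
        self.k_max = k_max         # upper bound on cluster centers
        self.kmeans = None
        self.cluster_to_actor = {}

    def refit(self, states, actor_returns):
        # states: (n, state_dim) recently visited states, with n >= self.k;
        # actor_returns: (n, n_actors) per-actor return estimates on those states.
        self.kmeans = KMeans(n_clusters=self.k, n_init=10).fit(states)
        labels = self.kmeans.labels_
        for c in range(self.k):
            mask = labels == c
            if mask.any():
                # One plausible rule: map each cluster to the actor with the
                # best average return on states in that cluster.
                self.cluster_to_actor[c] = int(
                    np.argmax(actor_returns[mask].mean(axis=0)))
        # Progressive step: let k grow for the next refit.
        self.k = min(self.k + 1, self.k_max)

    def select_actor(self, state):
        # Before the first fit, fall back to actor 0.
        if self.kmeans is None:
            return 0
        c = int(self.kmeans.predict(state.reshape(1, -1))[0])
        return self.cluster_to_actor.get(c, 0)
```

At each time step the agent would query `select_actor(state)` and execute the action proposed by the chosen actor network; periodic calls to `refit` implement the progressive growth of cluster centers described in the abstract.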
Reinforcement learning (RL) is a method in which an agent learns autonomously through continuous interaction with its environment, discovering regularities that maximize the expected cumulative future reward and thereby finding an optimal policy for reaching its goal [1]. Because it chooses executable actions according to the Agent's current state, reinforcement learning is well suited to sequential decision-making problems [2-3].
In traditional reinforcement learning, the value-function-based SARSA (state-action-reward-state-action) and Q-Learning [4-5] algorithms perform well on classical tasks with low-dimensional state spaces, such as Cart-Pole and Mountain-Car, but perform poorly in high-dimensional action spaces. With the development of deep learning, deep neural networks have proved highly effective at recognizing high-dimensional data, so deep reinforcement learning (DRL) [6], which combines deep learning (DL) with reinforcement learning, can handle high-dimensional problems; DRL has since become one of the most active research directions in artificial intelligence [7-8].
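For context, the tabular Q-Learning update mentioned above takes the standard textbook form below (a generic sketch, not code from this paper); it also makes the scaling problem visible, since the Q table is indexed directly by a discrete state and cannot be enumerated for high-dimensional spaces. The state/action counts and hyperparameters are assumed values for illustration.

```python
# Standard tabular Q-Learning (textbook form): the Q table is indexed directly
# by a discrete state, which is exactly what prevents these methods from
# scaling to high-dimensional state/action spaces.
import numpy as np

n_states, n_actions = 48, 4              # e.g. a small grid-world (assumed sizes)
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # step size, discount, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def epsilon_greedy(s):
    # Explore with probability epsilon, otherwise act greedily w.r.t. Q.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(Q[s].argmax())

def q_update(s, a, r, s_next, done):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```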