Multi-actor Deterministic Policy Gradient Algorithm Based on Progressive k-Means Clustering

CLC number: TP18    Document code: A    Article ID: 1671-5489(2025)03-0885-10
LIU Quan 1,2, LIU Xiaosong 2, WU Guangjun 2, LIU Yuhan 3
(1. School of Computer Science and Technology, Kashi University, Kashi 844000, Xinjiang Uygur Autonomous Region, China; 2. School of Computer Science and Technology, Soochow University, Suzhou 215008, Jiangsu Province, China; 3. Academy of Future Education, Xi'an Jiaotong-Liverpool University, Suzhou 215000, Jiangsu Province, China)
Abstract: Aiming at the poor learning performance and high fluctuation of the deep deterministic policy gradient (DDPG) algorithm on tasks with large state spaces, we propose a multi-actor deep deterministic policy gradient algorithm based on progressive k-means clustering (MDDPG-PK-Means). During training, when an action is selected for the state at each time step, the decision of the actor network is assisted by the discrimination result of the k-means clustering algorithm; at the same time, the number of k-means cluster centers gradually increases as the number of training steps grows. The MDDPG-PK-Means algorithm was evaluated on the MuJoCo simulation platform, and the experimental results show that, compared with DDPG and other algorithms, it achieves better performance on most continuous tasks.
Keywords: deep reinforcement learning; deterministic policy gradient algorithm; k-means clustering; multi-actor
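To make the mechanism outlined in the abstract concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: several actor networks each propose an action, a k-means model fitted on recently visited states decides which actor to follow in the current state, and the number of cluster centers grows as training proceeds. The class name, the cluster-to-actor assignment rule (mean return per cluster), and all parameters are illustrative assumptions.

```python
# Hypothetical sketch of k-means-assisted actor selection with a progressively
# growing number of cluster centers; illustrative only, not the paper's code.
import numpy as np
from sklearn.cluster import KMeans

class ProgressiveKMeansSelector:
    def __init__(self, n_actors, k_init=2, k_max=16):
        self.n_actors = n_actors
        self.k = k_init            # current number of cluster centers
        self.k_max = k_max         # upper bound on cluster centers
        self.kmeans = None
        self.cluster_to_actor = {}

    def refit(self, states, actor_returns):
        # states: (n, state_dim) recently visited states, with n >= self.k;
        # actor_returns: (n, n_actors) per-actor return estimates on those states.
        self.kmeans = KMeans(n_clusters=self.k, n_init=10).fit(states)
        labels = self.kmeans.labels_
        for c in range(self.k):
            mask = labels == c
            if mask.any():
                # One plausible rule: map each cluster to the actor with the
                # best average return on states in that cluster.
                self.cluster_to_actor[c] = int(
                    np.argmax(actor_returns[mask].mean(axis=0)))
        # Progressive step: let k grow for the next refit.
        self.k = min(self.k + 1, self.k_max)

    def select_actor(self, state):
        # Before the first fit, fall back to actor 0.
        if self.kmeans is None:
            return 0
        c = int(self.kmeans.predict(state.reshape(1, -1))[0])
        return self.cluster_to_actor.get(c, 0)
```

At each time step the agent would query `select_actor(state)` and execute the action proposed by the chosen actor network; periodic calls to `refit` implement the progressive growth of cluster centers described in the abstract.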
Reinforcement learning (RL) is a method in which an agent learns autonomously through continuous interaction with its environment, discovering regularities that maximize the expected cumulative future reward and thereby finding an optimal policy for reaching its goal [1]. Because it chooses executable actions according to the Agent's current state, reinforcement learning is well suited to sequential decision-making problems [2-3].
In traditional reinforcement learning, the value-function-based SARSA (state-action-reward-state-action) and Q-Learning [4-5] algorithms perform well on classical tasks with low-dimensional state spaces, such as Cart-Pole and Mountain-Car, but perform poorly in high-dimensional action spaces. With the development of deep learning, deep neural networks have proved highly effective at recognizing high-dimensional data, so deep reinforcement learning (DRL) [6], which combines deep learning (DL) with reinforcement learning, can handle high-dimensional problems; DRL has since become one of the most active research directions in artificial intelligence [7-8].
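For context, the tabular Q-Learning update mentioned above takes the standard textbook form below (a generic sketch, not code from this paper); it also makes the scaling problem visible, since the Q table is indexed directly by a discrete state and cannot be enumerated for high-dimensional spaces. The state/action counts and hyperparameters are assumed values for illustration.

```python
# Standard tabular Q-Learning (textbook form): the Q table is indexed directly
# by a discrete state, which is exactly what prevents these methods from
# scaling to high-dimensional state/action spaces.
import numpy as np

n_states, n_actions = 48, 4              # e.g. a small grid-world (assumed sizes)
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # step size, discount, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def epsilon_greedy(s):
    # Explore with probability epsilon, otherwise act greedily w.r.t. Q.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(Q[s].argmax())

def q_update(s, a, r, s_next, done):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```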