A Q-learning method for creating a Hex opening library

Citation format: [J]. Journal of Nantong University (Natural Science Edition), 2025, 24(2): 22-28.

CLC number: TP312; Document code: A; Article ID: 1673-2340(2025)02-0022-07

A Q-learning method for creating a Hex opening library

XU Zhifan¹, LI Yuan¹*, WANG Jingwen¹, LI Zhuoxuan², CAO Yiding³ (1. School of Science, Shenyang University of Technology, Shenyang 110870, China; 2. School of Mathematics, Southeast University, Nanjing 211189, China; 3. Baiyang Era (Beijing) Technology Co., Ltd., Beijing 100089, China)

Abstract: Hex is a perfect-information board game, and its opening library, an essential component of a Hex-playing system, has traditionally been generated from human expertise and Monte Carlo tree search (MCTS) algorithms. However, this approach is computationally expensive and does not always guarantee accuracy. This study proposes a self-play method based on Q-learning for efficiently constructing Hex opening libraries. The method uses multi-threaded simulations and an improved upper confidence bound applied to trees (UCT) algorithm to identify promising opening moves. An enhanced ε-greedy strategy is incorporated to speed up the convergence of the Q-learning algorithm. To further improve performance, the learned Q-values are integrated into the upper confidence bound (UCB) formula as prior knowledge, with the aim of improving decision-making accuracy during play. Experimental results show that after 3000 training iterations the Q-values across the board converge, suggesting that the method can learn a stable policy. In comparative evaluations, the generated opening library achieved a 62.9% average win rate against the improved UCT algorithm; when Q-values were used as prior input to the UCB formula, the average win rate rose to 75.9%. The method was also applied in the Chinese Computer Game Competition, where the implementation won first place, supporting its practical applicability.

Key words: computer game; Hex; reinforcement learning; Q-learning; opening library
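The abstract names three ingredients (ε-greedy exploration, the Q-learning update, and a UCB score that takes Q-values as a prior) without giving their exact forms. The following is a minimal Python sketch of one conventional way to write each, not the authors' implementation. Several choices here are assumptions rather than details from the paper: the linear ε schedule, the negamax-style update used for alternating two-player self-play, the progressive-bias style prior term `w * prior_q / (1 + visits)`, and all function names, parameter names, and defaults (`epsilon_at`, `choose_action`, `q_update`, `ucb_with_prior`, `alpha`, `gamma`, `c`, `w`, `decay_episodes`). The paper's multi-threaded simulation is not shown.

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch: every default below is an illustrative assumption,
# not a value taken from the paper.


def epsilon_at(episode, eps_start=1.0, eps_end=0.05, decay_episodes=1000):
    """Linearly decay epsilon from eps_start to eps_end (assumed schedule)."""
    frac = min(episode / decay_episodes, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def choose_action(q, state, legal, episode, rng=random):
    """Epsilon-greedy move selection from a table q[(state, move)].

    q is assumed to be a defaultdict(float), so unseen pairs read as 0.0.
    """
    if rng.random() < epsilon_at(episode):
        return rng.choice(legal)                    # explore: random legal move
    return max(legal, key=lambda a: q[(state, a)])  # exploit: best known move


def q_update(q, state, action, reward, next_state, next_legal,
             alpha=0.1, gamma=1.0):
    """One tabular Q-learning step for alternating two-player self-play.

    Q(s, a) is stored from the viewpoint of the player to move in s, so the
    opponent's best value in next_state is negated (negamax convention).
    `reward` is from the mover's viewpoint, e.g. +1 for a winning move.
    """
    if next_legal:  # non-terminal: the opponent moves next
        target = reward - gamma * max(q[(next_state, a)] for a in next_legal)
    else:           # terminal: only the immediate reward counts
        target = reward
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def ucb_with_prior(total_value, visits, parent_visits, prior_q, c=1.4, w=1.0):
    """UCB1 score plus a Q-value prior that fades as 1 / (1 + visits)."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    bias = w * prior_q / (1 + visits)  # hypothetical form of the Q prior
    return exploit + explore + bias


if __name__ == "__main__":
    q = defaultdict(float)
    # A winning terminal move pulls its Q-value toward +1.
    print(q_update(q, "s0", (2, 3), 1.0, None, []))
    # Score of a child visited 4 times out of 16 parent visits.
    print(ucb_with_prior(3.0, 4, 16, prior_q=0.5))
```

In a self-play loop, `choose_action` would pick each move, `q_update` would be applied to the stored transitions once a game ends, and during search the children of a node would be ranked by `ucb_with_prior`, with `prior_q` read from the learned table. Because the prior term shrinks as `1 / (1 + visits)`, it mainly steers early exploration and is gradually overridden by simulation results as visit counts grow.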

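As the origin and foundation of game theory across many fields, computer games can be used to study the patterns and regularities of human thinking, to imitate human play, and to raise the level of human intelligence [1-3].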
