Research on the Optimization of Humanitarian Emergency Material Allocation Based on Reinforcement Learning
ZHANG Jianjun, YANG Yundan, ZHOU Yizhuo
(School of Economics and Management, Tongji University, Shanghai 200092, China)
Abstract: Efficiently allocating limited humanitarian aid supplies after a major emergency is a critical research topic; the aim is to meet the material needs of affected areas while reducing the suffering of disaster victims. This paper formulates the problem as a Mixed Integer Nonlinear Programming (MINLP) model whose solution is a multi-period dynamic allocation strategy. Reinforcement Learning (RL), one of the two mainstream methods for strategy exploration, is particularly suitable for dynamic resource allocation because it scales well and adapts to external dynamics through interaction with the environment and feedback signals. We employ the Dueling DQN algorithm to solve for the optimal policy, overcoming the overestimation of Q-values that has been a drawback of previous RL applications to humanitarian aid distribution, and thereby estimating the action-value function for affected regions more accurately. The paper also introduces a novel stochastic demand assumption, which makes the model more realistic by better reflecting actual disaster conditions. The effectiveness of the proposed method is demonstrated with a numerical example based on the Ya'an earthquake, making this the first study to substantiate RL-based optimization of emergency resource allocation using real data sources. Comparative analysis shows that the Dueling DQN algorithm reduces the total cost by approximately 5% compared with the traditional DQN method, indicating a more effective reduction in the suffering of affected populations. This aligns with China's "people-oriented" rescue principle and has significant theoretical and practical implications for humanitarian emergency response.
Key words: deep reinforcement learning; humanitarian; emergency supplies distribution
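The abstract does not describe the network itself, so the following is only a minimal sketch of the aggregation step commonly used in dueling architectures, where the action value is assembled from a state value V(s) and per-action advantages A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A). The function name, the numbers, and the reading of each action as "send the next unit of supplies to affected area i" are assumptions made for illustration and are not taken from the paper.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-subtracted dueling rule.

    state_value: scalar estimate V(s) for the current state.
    advantages:  list of advantage estimates A(s, a), one per action.
    """
    mean_advantage = sum(advantages) / len(advantages)
    # Subtracting the mean keeps V and A identifiable: the advantages
    # only express how much better or worse each action is than average.
    return [state_value + a - mean_advantage for a in advantages]


# Hypothetical example: three affected areas, rewards are negative costs.
q_values = dueling_q_values(-10.0, [1.0, 0.0, -1.0])

# Greedy policy: pick the action (area) with the largest Q-value.
best_action = max(range(len(q_values)), key=q_values.__getitem__)
```

In a full agent the two inputs would come from separate output heads of one neural network; here they are plain numbers so that the aggregation rule can be read in isolation.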
0 Introduction
After a major emergency, saving lives and alleviating the suffering of the affected population are the primary goals of disaster relief.