Multi-Agent Reinforcement Learning for Supply Chain Management

Student: Orlov Aleksandr

Educational Programme: Financial Technology and Data Analysis (Master)

Final Grade: 8

Year of Graduation: 2019

This work presents a study of multi-agent reinforcement learning applied to supply chain management. Worldwide, producing goods and delivering them from a manufacturer to final customers requires long supply chains. Demand fluctuations caused by non-optimal decisions of the people in the middle of the chain (distributors and wholesalers) have in some cases led to large financial losses, for example at JDS Uniphase and Cisco. It is therefore of interest to build computerized agents that can manage supply chains optimally and avoid human error. The Beer Game, a linear supply chain with four players running from a manufacturer to consumers, was used as a model system. In this game each player observes only the information for their own position, and players cannot exchange information. Each turn, players are charged a storage cost proportional to the stock in their warehouses. The task is to get all players to act together to reduce the total storage cost over the entire game without sharing information with one another. The agents were built with reinforcement learning, specifically the Deep Q-learning algorithm (DQN). Multi-agent reinforcement learning is known to pose specific difficulties compared with training a single agent, such as the “curse of dimensionality” and the credit assignment problem (deciding how a shared reward should be attributed to individual agents). The QMIX approach, introduced in 2018 by researchers from the University of Oxford, makes it possible to train multi-agent systems composed of DQN agents. An important feature of QMIX is that training can use additional information that is not accessible to an individual agent, such as the complete state of the environment. Another feature is that during the testing phase the trained agents use only the information available to them.
QMIX has not previously been applied to supply chain tasks, so this application is of scientific interest. This work investigates whether the QMIX approach can be used for multi-agent learning in the Beer Game. The resulting agents were compared with baseline agents known from the literature. In a computer simulation of the Beer Game, they played better and behaved more stably than agents that imitate human behavior.
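
The idea of training with the full environment state while each agent still acts on its own information can be illustrated with a minimal sketch of a QMIX-style mixing step. This is not the implementation used in the thesis. It follows the general published description of QMIX, in which per-agent Q-values are combined into a joint value by a mixing network whose weights are generated from the global state and forced to be non-negative. The dimensions, the random single-layer hypernetworks and the names `mix`, `HW1`, `HB1`, `HW2` and `HB2` are all assumptions made for illustration.

```python
import numpy as np

N_AGENTS = 4    # the four Beer Game positions
STATE_DIM = 8   # hypothetical size of the global state vector
EMBED_DIM = 16  # hypothetical hidden width of the mixing network

rng = np.random.default_rng(0)

# Hypothetical hypernetwork parameters: untrained single linear layers that
# map the global state to the weights and biases of the mixing network.
HW1 = rng.normal(size=(STATE_DIM, N_AGENTS * EMBED_DIM))
HB1 = rng.normal(size=(STATE_DIM, EMBED_DIM))
HW2 = rng.normal(size=(STATE_DIM, EMBED_DIM))
HB2 = rng.normal(size=STATE_DIM)


def elu(x):
    """ELU activation, a strictly increasing function."""
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def mix(agent_qs, state):
    """Combine per-agent Q-values into one joint value Q_tot.

    Taking the absolute value of the state-generated weights keeps them
    non-negative, so Q_tot never decreases when any agent's Q increases.
    """
    w1 = np.abs(state @ HW1).reshape(N_AGENTS, EMBED_DIM)
    b1 = state @ HB1
    w2 = np.abs(state @ HW2)
    b2 = state @ HB2
    hidden = elu(agent_qs @ w1 + b1)
    return float(hidden @ w2 + b2)


# Usage: the global state enters only through the mixer, which is needed
# during training; each agent's own Q-value depends on local information.
state = rng.normal(size=STATE_DIM)
agent_qs = np.array([-3.0, -1.0, -2.0, -0.5])  # made-up per-agent Q-values
q_tot = mix(agent_qs, state)
```

Because the joint value is monotonic in every agent's Q-value, each agent choosing its own best action also gives the best joint action. Under these assumptions, that is why the mixer and the global state can be dropped at test time.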

Full text (added June 9, 2019)

