
Reinforcement Learning for Task and Motion Planning in a Traffic Flow

Student: Shikunov Maxim

Supervisor: Aleksandr I. Panov

Faculty: Faculty of Computer Science

Educational Programme: Data Science (Master)

Year of Graduation: 2019

This work discusses the possibility of integrating principles from the "smart city" concept to improve the control system of an unmanned automated vehicle (UAV). To this end, methods for exchanging information between different devices are used. In particular, traffic data is expected to be collected by quadcopters that monitor intersections and then transmitted to self-driving cars. To simulate this process, a virtual environment was developed that models traffic flow at a four-way intersection. The environment follows the standard interaction pattern of the OpenAI Gym API. Several reinforcement learning algorithms, such as Deep Q-Networks (DQN) and Proximal Policy Optimisation (PPO), were used to test the environment. Furthermore, a hierarchical reinforcement learning model was applied to the task. The problem of traffic flow at the crossroads can be split into sub-tasks, each of which can be solved separately, which makes it possible to use the concept of options from hierarchical reinforcement learning. Specifically, the Option-Critic model was applied to the problem. It can learn options without their being specified explicitly, but it still requires the number of options to be set in advance, and finding the optimal number is difficult. To overcome this obstacle, a slight modification of the method was introduced: the fixed number of options was replaced by a range of values, among which the optimal number is determined automatically by reducing the choice to a multi-armed bandit (MAB) problem. As a result of this work, a new, highly functional virtual environment supporting various modifications was proposed. Several baselines using the above-mentioned models were tested on the simplest of these modifications. In addition, the Option-Critic model without a fixed number of options was considered, although its results require further investigation and more experiments.
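The abstract says the intersection environment follows the OpenAI Gym interaction pattern but does not describe its state, actions, or rewards. The sketch below is a hypothetical, dependency-free toy (not the thesis environment) that only illustrates that pattern in its classic pre-0.26 form: `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. The class `ToyIntersectionEnv`, the helper `rule_based_policy`, the observation layout, and the reward values are all assumptions made for illustration.

```python
import random


class ToyIntersectionEnv:
    """Hypothetical toy environment (not the one from the thesis).

    One self-driving car approaches a four-way intersection that cross traffic
    may occupy. Mirrors the classic OpenAI Gym interaction pattern:
    reset() -> observation, step(action) -> (observation, reward, done, info).
    """

    WAIT, GO = 0, 1  # discrete action space (assumed)

    def __init__(self, distance=5, cross_traffic_prob=0.3, max_steps=50, seed=None):
        self.start_distance = distance              # cells to the intersection
        self.cross_traffic_prob = cross_traffic_prob
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def _observe(self):
        # Observation: (distance to the intersection, 1 if it is occupied else 0).
        # Assumption: the occupancy flag stands in for data an overhead
        # quadcopter could transmit to the car.
        return (self.distance, int(self.occupied))

    def reset(self):
        self.distance = self.start_distance
        self.steps = 0
        self.occupied = self.rng.random() < self.cross_traffic_prob
        return self._observe()

    def step(self, action):
        if action not in (self.WAIT, self.GO):
            raise ValueError(f"invalid action: {action!r}")
        self.steps += 1
        reward, done, info = -0.1, False, {}        # small per-step time penalty
        if action == self.GO:
            self.distance -= 1
            if self.distance == 0:                  # the car enters the intersection
                done = True
                if self.occupied:
                    reward, info["outcome"] = -10.0, "collision"
                else:
                    reward, info["outcome"] = 10.0, "crossed"
        if not done and self.steps >= self.max_steps:
            done, info["outcome"] = True, "timeout"
        # Cross traffic changes before the next observation.
        self.occupied = self.rng.random() < self.cross_traffic_prob
        return self._observe(), reward, done, info


def rule_based_policy(obs):
    """Wait only when the next move would enter an occupied intersection."""
    distance, occupied = obs
    return ToyIntersectionEnv.WAIT if distance == 1 and occupied else ToyIntersectionEnv.GO


if __name__ == "__main__":
    env = ToyIntersectionEnv(seed=0)
    obs, done, total = env.reset(), False, 0.0
    while not done:                                 # standard Gym-style episode loop
        obs, reward, done, info = env.step(rule_based_policy(obs))
        total += reward
    print(info["outcome"], round(total, 2))
```

Any agent written against this reset/step loop, whether a DQN, PPO, or Option-Critic learner, can replace `rule_based_policy` without touching the environment, which is the practical benefit of following the Gym convention. Newer Gym and Gymnasium releases split `done` into `terminated` and `truncated`, so a real environment would need to match the version it targets.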
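The abstract states that the number of Option-Critic options is chosen from a range by reducing the choice to a multi-armed bandit, but it does not name the bandit algorithm or the reward signal. As one possible reading, the sketch below treats each candidate number of options as an arm and applies the standard UCB1 rule. The function `ucb1_choose_num_options` and the `pull` callback are hypothetical names, and the synthetic rewards in the demo are invented for illustration; they are not results from the thesis.

```python
import math
import random


def ucb1_choose_num_options(candidates, pull, budget, c=2.0):
    """Pick a number of options from `candidates` with the UCB1 bandit rule.

    Each candidate value k is one arm. `pull(k)` is a hypothetical callback,
    e.g. "train/evaluate Option-Critic with k options for a few episodes and
    return a reward scaled to [0, 1]". Returns (best_k, counts, means).
    """
    arms = list(candidates)
    if budget < len(arms):
        raise ValueError("budget must allow pulling every arm at least once")
    counts = {k: 0 for k in arms}
    means = {k: 0.0 for k in arms}
    for t in range(1, budget + 1):
        if t <= len(arms):
            k = arms[t - 1]                         # initialisation: try each arm once
        else:
            # UCB1 index: empirical mean + exploration bonus sqrt(c * ln t / n_k)
            k = max(arms, key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]))
        reward = pull(k)
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]  # incremental mean update
    best = max(arms, key=lambda a: means[a])
    return best, counts, means


if __name__ == "__main__":
    # Synthetic stand-in for Option-Critic performance (invented for illustration):
    # Bernoulli rewards whose success probability peaks at 4 options.
    rng = random.Random(0)
    true_mean = {k: 0.9 - 0.15 * abs(k - 4) for k in range(2, 9)}
    best, counts, means = ucb1_choose_num_options(
        range(2, 9), lambda k: float(rng.random() < true_mean[k]), budget=2000)
    print("chosen number of options:", best)
    print("pulls per arm:", counts)
```

UCB1 is used here only because it is simple and needs no tuning beyond the exploration constant `c`; Thompson sampling or an epsilon-greedy rule would fit the same reduction. In a real setup each `pull` would be expensive (a partial training run), so the budget would be far smaller than in this toy demo.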

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
