
Distributional and Entropy-Regularized Reinforcement Learning

Student: Konobeev Mikhail

Supervisor: Pavel Shvechikov

Faculty: Faculty of Computer Science

Educational Programme: Applied Mathematics and Information Science (Bachelor)

Year of Graduation: 2018

Distributional reinforcement learning has been shown to provide a significant improvement over the Q-learning algorithm, yet it is not entirely clear why this improvement occurs. We analyze distributional algorithms within the entropy-regularized reinforcement learning framework, which leads to non-deterministic policies and has been shown to yield several useful connections between value-based and policy-based methods. We also propose a method that combines the advantages of off-policy learning of the distribution with the more stable and easier-to-employ on-policy learning. This method achieves better sample efficiency and higher rewards for the learned agents.
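The entropy-regularized framework referred to in the abstract replaces the hard max of the standard Bellman backup with a log-sum-exp, which induces a non-deterministic (Boltzmann) policy. A minimal sketch of these two quantities, using a temperature parameter `tau` (this is an illustration of the general framework, not the thesis's specific algorithm):

```python
import numpy as np

def soft_value(q, tau):
    # Entropy-regularized ("soft") state value:
    #   V(s) = tau * log sum_a exp(Q(s, a) / tau)
    # computed stably via the log-sum-exp trick.
    m = np.max(q)
    return m + tau * np.log(np.sum(np.exp((q - m) / tau)))

def soft_policy(q, tau):
    # Non-deterministic policy induced by soft optimality:
    #   pi(a | s) = exp((Q(s, a) - V(s)) / tau)
    v = soft_value(q, tau)
    return np.exp((q - v) / tau)
```

As `tau` shrinks toward zero, `soft_value` approaches the hard max and `soft_policy` approaches the greedy (deterministic) policy, recovering standard Q-learning as a limiting case.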
