
Meta-Reinforcement Learning

Student: Konobeev Mikhail

Supervisor: Denis Belomestny

Faculty: Faculty of Computer Science

Educational Programme: Statistical Learning Theory (Master)

Year of Graduation: 2020

We consider meta-learning from two different perspectives. First, we study the fixed-design linear regression model and derive lower and upper bounds on the mean squared error risk. Second, we consider meta-learning in the form of automated neural architecture selection. Both studies can be applied to the reinforcement learning setting: the first provides a theoretical understanding of the process of value function approximation, and the second helps select architectures that are better suited to the specifics of this domain.

In linear regression meta-learning, we study the model of Baxter [2000] and establish a problem-dependent lower bound on the transfer risk (the risk on a newly observed task) valid for all estimators. Our bound suggests that there is no meta-learning algorithm that converges to the regression function as the number of tasks $n \to \infty$ while the sample size of the new task stays fixed. In contrast, in the non-asymptotic regime, for a sufficiently large number of tasks, meta-learning can be considerably better than single-task learning. To this end, we design a maximum likelihood-type estimator whose risk identity matches the lower bound up to a constant. We demonstrate that this optimal estimator is equivalent to a weighted form of biased regularization, a popular technique in transfer and meta-learning. Finally, we propose a practical adaptation of the estimator through an EM procedure and show its effectiveness in a series of experiments.

From the perspective of neural architecture selection, we note that considerable progress has been made in recent years by finding different and more complex neural architectures in domains such as computer vision and natural language processing. However, studies in reinforcement learning have primarily used simple neural models. To this end, we use automated neural architecture search (NAS) methods to discover novel architectures in Atari 2600 games. We find that using more complex architectures can lead to better performance of reinforcement learning agents.
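To illustrate the biased-regularization idea mentioned in the abstract, the following is a minimal NumPy sketch, not the thesis's actual estimator: it solves the ridge problem biased toward a reference vector, $\min_w \|Xw - y\|^2 + \lambda \|w - w_0\|^2$, where $w_0$ is estimated by averaging solutions from previously seen tasks. All names, dimensions, and noise levels here are illustrative assumptions; the thesis's weighted variant and EM adaptation are not reproduced.

```python
import numpy as np

def biased_ridge(X, y, w0, lam):
    """Biased regularization: minimize ||Xw - y||^2 + lam * ||w - w0||^2.

    Closed form: w = (X^T X + lam I)^{-1} (X^T y + lam w0).
    With w0 = 0 this reduces to ordinary ridge regression.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w0)

# Toy meta-learning setup (illustrative assumptions): each task's weight
# vector is a small perturbation of a shared mean, and each task provides
# only a few samples, so single-task estimation is hard.
rng = np.random.default_rng(0)
d, n_tasks, n_samples = 5, 50, 4
w_mean = rng.normal(size=d)

# "Previous tasks": estimate the shared mean by averaging per-task solutions.
estimates = []
for _ in range(n_tasks):
    w_t = w_mean + 0.1 * rng.normal(size=d)
    X = rng.normal(size=(n_samples, d))
    y = X @ w_t + 0.1 * rng.normal(size=n_samples)
    estimates.append(biased_ridge(X, y, np.zeros(d), 1.0))
w0_hat = np.mean(estimates, axis=0)

# New task: regularize toward the meta-learned w0_hat instead of toward zero.
w_new = w_mean + 0.1 * rng.normal(size=d)
X = rng.normal(size=(n_samples, d))
y = X @ w_new + 0.1 * rng.normal(size=n_samples)
w_meta = biased_ridge(X, y, w0_hat, 1.0)
w_single = biased_ridge(X, y, np.zeros(d), 1.0)
```

On such under-determined new tasks, biasing toward the meta-learned center typically yields a lower estimation error than plain ridge, which is the non-asymptotic advantage of meta-learning the abstract refers to.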

