Student: Никишин Евгений Сергеевич
Thesis title: Stability Improvement and Knowledge Transfer in Deep Reinforcement Learning
Year of defence: 2019
Deep reinforcement learning (RL) methods have demonstrated numerous successes across a variety of applications in recent years. Nevertheless, deep RL methods still require substantial engineering effort, lack robustness to hyperparameter selection, and struggle to generalize between similar environments. In this thesis, we focus on two important issues that keep deep RL from wide practical use: training instability and poor transferability of learned policies.

We observe that the average cumulative rewards are unstable throughout the learning process and do not increase monotonically with more training steps. Furthermore, a highly rewarded policy, once learned, is often forgotten by the agent, leading to performance deterioration. These problems are partly caused by the inherent noise of gradient estimators in RL.
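As an illustration (this formula is standard background, not taken from the thesis), the score-function (REINFORCE) policy-gradient estimator replaces an expectation over trajectories with a Monte Carlo average over a small number N of sampled trajectories, so individual updates are inherently noisy:

\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\Big]
  \approx \frac{1}{N}\sum_{i=1}^{N} \sum_t \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big)\, R\big(\tau^{(i)}\big)

For small N, the variance of this estimate can dominate the signal, which is one source of the instability described above.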

In order to reduce the effect of noise on training, we propose to apply stochastic weight averaging (SWA), a recent method that averages weights along the optimization trajectory. We show that SWA stabilizes the model solutions, alleviates the problem of forgetting the highly rewarded policy during training, and improves the average rewards on several Atari and MuJoCo environments.
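The following is a minimal sketch of how such weight averaging can plug into an RL training loop; the names and schedule here are our own illustrative assumptions, not the thesis's actual code, and rl_update stands in for one ordinary policy-gradient step:

import copy
import torch

def update_swa(swa_policy, policy, n_averaged):
    """Incremental mean of weights: w_swa += (w - w_swa) / n."""
    with torch.no_grad():
        for p_swa, p in zip(swa_policy.parameters(), policy.parameters()):
            p_swa.add_((p - p_swa) / n_averaged)

policy = torch.nn.Linear(4, 2)       # toy stand-in for a policy network
swa_policy = copy.deepcopy(policy)   # running average, seeded with current weights
n_averaged, swa_start, swa_freq = 1, 1000, 10

for step in range(10_000):
    # rl_update(policy)              # one ordinary RL gradient step (omitted here)
    if step >= swa_start and step % swa_freq == 0:
        n_averaged += 1
        update_swa(swa_policy, policy, n_averaged)

# swa_policy is then evaluated instead of the final iterate; averaging damps
# the step-to-step noise of the individual updates.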

We further note that the learned representations of observations often overspecialize to a particular environment and become useless in other environments, even when the environments share the same underlying dynamics. In order to leverage a trained policy network in another environment, we propose to train a modification of the variational autoencoder (VAE) for both environments. Each VAE produces latent representations of an observation and the corresponding next observation by encoding the observation and modeling the dynamics in a latent space. For environments with the same underlying dynamics, the weights of the dynamics model can be shared, which yields the same latent space for the two environments. Given the shared latent space, we propose to imitate a policy trained in the source environment, resulting in a policy that can be applied in both the source and target environments. In our preliminary experiments, we demonstrate that the proposed model is capable of imitating a trained policy as well as reconstructing observations and next observations for both the source and target environments using the same latent dynamics network.
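A minimal sketch of such a pair of VAEs with a shared latent dynamics network follows; all layer sizes, the action conditioning, and the names are illustrative assumptions, not the thesis's implementation:

import torch
import torch.nn as nn

class EnvVAE(nn.Module):
    """Per-environment encoder/decoder around a shared latent transition model."""
    def __init__(self, obs_dim, act_dim, latent_dim, shared_dynamics):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.log_var = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, obs_dim))
        self.dynamics = shared_dynamics  # same module instance in both environments

    def forward(self, obs, act):
        h = self.encoder(obs)
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        z_next = self.dynamics(torch.cat([z, act], dim=-1))    # latent transition
        return self.decoder(z), self.decoder(z_next), mu, log_var

latent_dim, act_dim = 32, 4
shared = nn.Sequential(nn.Linear(latent_dim + act_dim, 256), nn.ReLU(),
                       nn.Linear(256, latent_dim))
vae_source = EnvVAE(obs_dim=64, act_dim=act_dim, latent_dim=latent_dim,
                    shared_dynamics=shared)
vae_target = EnvVAE(obs_dim=128, act_dim=act_dim, latent_dim=latent_dim,
                    shared_dynamics=shared)

obs, act = torch.randn(8, 64), torch.randn(8, act_dim)
recon, recon_next, mu, log_var = vae_source(obs, act)

Because both models decode latents produced by the same transition network, observations from the two environments are pushed into a common latent space, which is what makes imitating the source policy in the target environment possible.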

Final qualification theses (ВКР) at HSE University are completed by all students in accordance with the University Regulations and the Rules defined by each degree programme.

Abstracts of all theses are published in open access on the HSE University corporate portal.

The full text of a thesis is posted in open access on the HSE portal only with the consent of the student who authored it (the rights holder) or, if the thesis was written by a team of students, with the consent of all co-authors (rights holders). Once posted on the HSE portal, a thesis acquires the status of an electronic publication.

Theses are objects of copyright, and their use is subject to the restrictions stipulated by the intellectual property legislation of the Russian Federation.

When using a thesis, including by quotation, attribution of the author's name and the source is mandatory.
