
Cost-sensitive Training for Autoregressive Models

Student: Saparina Irina

Supervisor: Anton Osokin

Faculty: Faculty of Computer Science

Educational Programme: Statistical Learning Theory (Master)

Final Grade: 10

Year of Graduation: 2020

This work investigates the training procedure for autoregressive models, which are widely used in important applications such as machine translation and code generation. An autoregressive model outputs predictions one by one, with each output depending on the past (the output sequences can be, for example, text, audio waveforms, or time series). Training autoregressive models to predict well under the test metric, instead of maximizing the likelihood, has been reported to be beneficial in several use cases. However, this approach brings additional complications that prevent wider adoption. In this study, we follow the learning-to-search line of work (Daumé et al., 2009; Leblond et al., 2018) and investigate several components of this approach: the reference policy, the costs, and the loss function. We experiment on three challenging tasks: word ordering, neural machine translation, and code generation. First, we propose a way to construct a reference policy based on an alignment between the model output and the ground truth. We prove that our reference policy is optimal for the Kendall-tau distance between permutations (which arises in the word-ordering task). This reference policy also helps to approximate the METEOR score and allows computing the costs based on this score at each training step; we exploit this property on the neural machine translation and code generation tasks. Second, we observe that the learning-to-search approach benefits from choosing costs related to the test metric. Finally, we study the effect of different learning objectives and find that the standard KL loss learns only a few high-probability tokens. We propose ordering-based loss functions that target high-probability tokens explicitly and can replace the standard KL loss.
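For readers unfamiliar with the permutation metric mentioned above, the following is a minimal sketch (not taken from the thesis) of the Kendall-tau distance: the number of item pairs ordered differently by two permutations, which is the quantity the word-ordering reference policy is proved optimal for.

```python
def kendall_tau_distance(perm_a, perm_b):
    """Number of discordant pairs between two permutations of the same items."""
    # Map each item to its position in perm_b.
    pos_in_b = {item: i for i, item in enumerate(perm_b)}
    # Re-express perm_a in perm_b's coordinate system.
    ranks = [pos_in_b[item] for item in perm_a]
    # Count inversions: pairs whose relative order disagrees between the two.
    return sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )

# Identical orderings are at distance 0; a full reversal maximizes the distance.
print(kendall_tau_distance(["a", "b", "c"], ["a", "b", "c"]))  # 0
print(kendall_tau_distance(["a", "b", "c"], ["c", "b", "a"]))  # 3
```

The quadratic pair count is fine for sentence-length permutations; a merge-sort-based inversion count would bring it down to O(n log n) if needed.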

Full text (added May 24, 2020)

