How to Obtain a Smaller Neural Network without Quality Loss

Staff members of the HSE Faculty of Computer Science recently presented their papers at the biggest international conference on machine learning, Neural Information Processing Systems (NIPS).

The two biggest international conferences in machine learning today are Neural Information Processing Systems (NIPS) and the International Conference on Machine Learning (ICML). Most of the cutting-edge academic papers in this field are first presented at one of these two conferences.

NIPS has been held annually since 1986. Traditionally, the key elements of the programme are lectures by invited speakers, plenary presentations (15 minutes each), and poster sessions. Usually, no more than 1% of submitted papers are granted plenary status. This year, 678 of 3,240 submitted papers were accepted, 40 of them as plenary presentations. In addition, a record 7,850 participants took part in the event, up from 5,600 in 2016.

Unlike most papers presented at the conference, which were practice-oriented, the plenary presentation by Anton Osokin, Associate Professor at the Big Data and Information Retrieval School, ‘On Structured Prediction Theory with Calibrated Convex Surrogate Losses’, addressed a theoretical problem in machine learning: structured prediction. His paper was the first to unite consistency, optimization, and qualitative characteristics of structure complexity within a single formalism. ‘Our paper provides a theoretical basis for practical research in structured prediction,’ said Anton. ‘In fact, we determine the characteristics of tasks that can be used for creating effective solutions.’

Practical methods for compressing neural networks are also highly important for effective task solving. The use of neural networks has brought about a revolution in fields such as image analysis and natural language processing, but neural networks have shortcomings as well, such as comparatively slow and memory-intensive learning algorithms. The poster presentation ‘Structured Bayesian Pruning via Log-Normal Multiplicative Noise’ by researchers from the HSE Centre of Deep Learning and Bayesian Methods unveiled a new method that makes it possible to obtain a considerably smaller neural network without loss of quality, while also speeding up the model. Remarkably, Bayesian methods have long been used for learning sparse models in machine learning, but only recently have these results been applied to modern neural network architectures.
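The core idea can be illustrated with a minimal NumPy sketch: each unit of a layer is multiplied by log-normal noise whose parameters would, in the actual method, be learned variationally; units whose noise drowns out their signal can then be removed, shrinking the layer. The noise parameters, the signal-to-noise threshold, and the toy layer below are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: 4 output units, 8 inputs. mu and sigma stand in for the
# (hypothetical, already-learned) variational parameters of per-unit
# multiplicative noise theta_i ~ LogNormal(mu_i, sigma_i^2).
W = rng.normal(size=(4, 8))
mu = np.array([0.0, 0.0, -1.0, 0.0])
sigma = np.array([0.1, 2.0, 1.5, 0.2])  # large sigma => very noisy unit

def noisy_forward(x):
    # Training-time pass: multiply each unit's output by log-normal noise.
    theta = rng.lognormal(mean=mu, sigma=sigma)
    return (W @ x) * theta

# For log-normal noise, E[theta]^2 / Var[theta] = 1 / (exp(sigma^2) - 1).
# Prune units where the noise dominates the signal (threshold is illustrative).
snr = 1.0 / np.expm1(sigma ** 2)
keep = snr > 1.0
W_pruned = W[keep]  # a smaller, faster layer for test time

print(keep)            # units with sigma = 2.0 and 1.5 are dropped
print(W_pruned.shape)  # (2, 8)
```

Because whole units (rather than individual weights) are removed, the pruned layer is genuinely smaller and faster, not merely sparse.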

‘We have carried out this study together with my doctoral students, Kirill Neklyudov, Dmitry Molchanov, and Arsenii Ashukha, who are working as researchers at the HSE Centre of Deep Learning and Bayesian Methods, which was created at the Faculty of Computer Science in January 2017,’ said Research Professor Dmitry Vetrov, head of the Centre of Deep Learning and Bayesian Methods. ‘For Kirill, a newcomer to the team and the first author of the paper, this was his first experience of preparing an academic paper at such a high level, and I’m very happy and proud that he succeeded on his first attempt. Unfortunately, Kirill and Dmitry couldn’t make it to the conference due to problems obtaining a U.S. visa.’

‘Having two papers by researchers from the HSE Faculty of Computer Science accepted at one of the most important conferences on neural networks and machine learning is a serious achievement. We are proud of our colleagues, whose academic expertise is so highly valued by the international professional community,’ said Ivan Arzhantsev, Dean of the Faculty of Computer Science.

Also at the conference, Novi Quadrianto, Academic Supervisor of the HSE Centre of Deep Learning and Bayesian Methods, gave a poster presentation on ‘Recycling Privileged Learning and Distribution Matching for Fairness’.