Group and Shuffle: Researchers at HSE University and AIRI Accelerate Neural Network Fine-Tuning

Researchers at HSE University and the AIRI Institute have proposed a method for quickly fine-tuning neural networks. Their approach involves processing data in groups and then optimally shuffling these groups to improve their interactions. The method outperforms alternatives in image generation and analysis, as well as in fine-tuning text models, all while requiring less memory and training time. The results have been presented at the NeurIPS 2024 Conference.

The larger the neural network, the more challenging it becomes to quickly adapt it to a new task. Retraining a model from scratch is a time-consuming and costly process. Therefore, developers seek cost-effective ways to adapt a model to a specific task while preserving the overall quality of the original.

One such approach is fine-tuning using orthogonal matrices, which, unlike other methods, preserve the essential features of the original model. Existing ways of constructing such matrices, such as block-diagonal or butterfly structures, have drawbacks: they are either limited in scope or require extensive computations.
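To illustrate why an orthogonal update can preserve a model's features, here is a minimal NumPy sketch. It assumes the adapted weights are obtained by multiplying a pretrained weight matrix by an orthogonal matrix; the names `W`, `Q` and `W_adapted` and the matrix sizes are hypothetical, and a random orthogonal matrix stands in for a learned one. This is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained weight matrix (sizes chosen only for illustration).
W = rng.standard_normal((4, 6))

# A random orthogonal matrix from a QR decomposition stands in for a learned one.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Assumed form of the orthogonal update: multiply the weights by Q.
W_adapted = Q @ W

# Inner products between the columns of W are unchanged, because
# (Q W)^T (Q W) = W^T Q^T Q W = W^T W when Q is orthogonal.
gram_before = W.T @ W
gram_after = W_adapted.T @ W_adapted
gram_preserved = np.allclose(gram_before, gram_after)
print("Column inner products preserved:", gram_preserved)
```

Because lengths and angles between the weight columns stay the same, the update rotates the learned representation rather than distorting it.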

Researchers at the HSE Faculty of Computer Science and the AIRI Institute have proposed a new way of constructing matrices, which they call Group-and-Shuffle (GS). Instead of working with all the parameters at once, they divide them into small groups, process each group separately, and then shuffle the groups together. This structure is both flexible and efficient: it enables the model to adapt more precisely to the task while requiring fewer computations and less memory.
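The group-then-shuffle idea can be sketched in NumPy as a product of two block-diagonal matrices with a permutation between them. This is an illustrative assumption about the structure, not the authors' implementation: the helper names, the particular shuffle, the sizes and the random orthogonal blocks are all hypothetical.

```python
import numpy as np

def random_orthogonal_block(size, rng):
    """Return a random orthogonal matrix of the given size (via QR)."""
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    return q

def block_diagonal(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    start = 0
    for b in blocks:
        k = b.shape[0]
        out[start:start + k, start:start + k] = b
        start += k
    return out

def shuffle_permutation(n, num_groups):
    """Permutation matrix that interleaves the groups (one possible shuffle)."""
    idx = np.arange(n).reshape(num_groups, n // num_groups).T.reshape(-1)
    return np.eye(n)[idx]

rng = np.random.default_rng(0)
n, num_groups = 8, 4
block_size = n // num_groups

# "Group": each block-diagonal factor mixes entries only within its own group.
L = block_diagonal([random_orthogonal_block(block_size, rng) for _ in range(num_groups)])
R = block_diagonal([random_orthogonal_block(block_size, rng) for _ in range(num_groups)])

# "Shuffle": the permutation lets different groups interact.
P = shuffle_permutation(n, num_groups)
Q = L @ P @ R

is_orthogonal = np.allclose(Q.T @ Q, np.eye(n))
nonzeros_single_factor = np.count_nonzero(np.abs(L) > 1e-12)
nonzeros_product = np.count_nonzero(np.abs(Q) > 1e-12)
print(is_orthogonal, nonzeros_single_factor, nonzeros_product)
```

In this sketch the product stays orthogonal because every factor is orthogonal, while the shuffle makes the result denser than a single block-diagonal factor, so entries from different groups can influence one another.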

Building on GS matrices, the researchers developed GSOFT, a new method for orthogonal fine-tuning of neural networks. Compared with previous approaches, GSOFT uses fewer parameters while maintaining training stability and quality, even with limited data. The team also introduced a two-sided version of the method, Double GSOFT, which adjusts parameters from both sides simultaneously, enhancing the model’s flexibility and accuracy.
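As a rough illustration of the two-sided idea, the sketch below assumes that "both sides" means multiplying the weight matrix by one orthogonal matrix on the left and another on the right. The names `Q_left` and `Q_right` are hypothetical, and random orthogonal matrices stand in for learned ones.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Return a random n x n orthogonal matrix (via QR)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

rng = np.random.default_rng(1)

# Hypothetical pretrained weight matrix.
W = rng.standard_normal((4, 6))

# Assumed two-sided update: one orthogonal factor per side of W.
Q_left = random_orthogonal(4, rng)
Q_right = random_orthogonal(6, rng)
W_two_sided = Q_left @ W @ Q_right

# Orthogonal factors on either side leave the singular values of W unchanged.
sv_original = np.linalg.svd(W, compute_uv=False)
sv_two_sided = np.linalg.svd(W_two_sided, compute_uv=False)
singular_values_preserved = np.allclose(sv_original, sv_two_sided)
print("Singular values preserved:", singular_values_preserved)
```

Under this assumption, the two-sided form can change both the input and output directions of the layer while keeping the spectrum of the pretrained weights intact.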

'We discovered how to construct orthogonal matrices using only two special types of matrices, instead of five or six as required by previous methods. This saves computational resources and training time,' explains Nikolay Yudin, Research Assistant at the HSE Laboratory for Matrix and Tensor Methods in Machine Learning.

The researchers tested the approach on three types of tasks. When fine-tuning the RoBERTa language model, the method outperformed others while using a comparable number of parameters. In image generation, where the model needed to preserve the original features while adapting to the user’s request, GSOFT and Double GSOFT outperformed popular methods like LoRA and BOFT, all while using less memory and training time.

Visual results of subject-driven generation after 3,000 training iterations
© Gorbunov, M., Yudin, N., Soboleva, V., Alanov, A., Naumov, A., Rakhuba, M. (2024). Group and shuffle: Efficient structured orthogonal parametrization. arXiv preprint arXiv:2406.10019.

The authors also tested their approach on convolutional neural networks, which are commonly used for image and video analysis, such as in face recognition. The team adapted the GS matrices even for cases where the model required strong resistance to interference and distortion.

'We tested the method across various scenarios—from language and generative models to robust convolutional networks. In every case, it performed reliably while using fewer resources. This confirms that the method can be applied effectively to a variety of purposes,' comments Aibek Alanov, Senior Research Fellow at the Centre of Deep Learning and Bayesian Methods, AI and Digital Science Institute, HSE FCS, and leader of the Controllable Generative AI team at FusionBrain, AIRI.
