HSE Researchers Train Neural Network to Predict Protein–Protein Interactions More Accurately

Scientists at the AI and Digital Science Institute of the HSE Faculty of Computer Science have developed a model that predicts protein–protein interactions with over 95% accuracy. GSMFormer-PPI integrates three types of protein data, including information about the properties of the protein surface, and analyses the relationships between these data types rather than simply combining them, as many previous models do. The solution could accelerate the discovery of the molecular mechanisms of disease, biomarkers, and potential therapeutic targets. The paper has been published in Scientific Reports.

Almost all cellular processes depend on interactions between proteins. Cells use these interactions to transmit signals, initiate and regulate chemical reactions, and form molecular complexes essential for proper functioning. When such interactions are disrupted, cellular processes can malfunction, potentially leading to disease.

Therefore, to study disease mechanisms and identify therapeutic targets, it is important for scientists to understand which proteins can interact and which cannot. Determining this experimentally is difficult: when dozens or hundreds of proteins are considered, the number of possible pairs becomes too large to test individually. As a result, biologists use machine learning methods to predict these interactions based on the structure and properties of molecules.
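To make the scale concrete: among n proteins there are n(n − 1)/2 distinct unordered pairs, so the number of candidates grows quadratically. The short Python sketch below counts and enumerates candidate pairs. The function names are hypothetical, and the proteome-scale figure of about 20,000 proteins in the example is a rough approximation added for illustration, not a number from the study.

```python
from itertools import combinations
from math import comb


def candidate_pairs(n, include_self=False):
    """Number of protein pairs to test among n proteins.

    Distinct unordered pairs: n * (n - 1) / 2.
    With include_self=True, self-pairs (a protein binding a copy of itself)
    are added, giving n * (n + 1) / 2.
    """
    return comb(n, 2) + (n if include_self else 0)


def enumerate_pairs(proteins):
    """Return an iterator over every distinct unordered pair (no self-pairs)."""
    return combinations(proteins, 2)


for n in (10, 100, 1_000, 20_000):
    print(f"{n:>6} proteins -> {candidate_pairs(n):>11,} pairs")
```

For 100 proteins this gives 4,950 pairs; for about 20,000 proteins it gives 199,990,000, which is why exhaustive experimental testing is impractical.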

HSE researchers have developed the GSMFormer-PPI system, which takes into account three types of data for each protein in a candidate pair: the amino acid sequence, the three-dimensional structure, and the properties of the molecular surface. To process this information, the authors used existing models that convert this data into numerical representations. A protein language model analyses the amino acid sequence—the order of amino acids that make up the protein. The three-dimensional structure of the protein is represented as a graph, in which amino acids are treated as nodes and their spatial contacts as edges; this representation is processed by a graph neural network. In addition, a separate algorithm captures protein surface properties—the shape and physicochemical characteristics of the regions through which proteins recognise one another.
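The graph representation of the three-dimensional structure can be illustrated with a minimal sketch. It assumes one 3D coordinate per amino acid (for example, its Cα atom) and links two residues when they lie within a distance threshold. The 8 Å cutoff, the function names, and the simple mean-aggregation layer are illustrative assumptions, not details taken from the paper, which relies on an existing graph neural network model for this step.

```python
import math


def contact_graph(coords, threshold=8.0):
    """Build a residue contact graph.

    coords: one (x, y, z) position per amino acid, in ångströms (assumed to be
    Cα atoms). Two residues are linked when their distance is <= threshold.
    Returns an adjacency list {residue index: [neighbour indices]}.
    """
    adj = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def graph_layer(adj, features, weights):
    """One simplified message-passing layer: mean aggregation, linear map, ReLU.

    features: one feature vector per residue (length in_dim).
    weights:  in_dim x out_dim matrix (list of rows); learned in a real model.
    """
    in_dim, out_dim = len(weights), len(weights[0])
    out = []
    for i in range(len(features)):
        hood = [i] + adj[i]  # the residue itself plus its spatial contacts
        mean = [sum(features[k][d] for k in hood) / len(hood) for d in range(in_dim)]
        h = [sum(mean[d] * weights[d][o] for d in range(in_dim)) for o in range(out_dim)]
        out.append([max(0.0, v) for v in h])  # ReLU
    return out


# Four residues on a line; the last one is far from the others.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
adj = contact_graph(coords)
print(adj)
print(graph_layer(adj, [[1.0], [2.0], [3.0], [-4.0]], weights=[[1.0]]))
```

Here the first three residues form a connected cluster and the fourth is isolated. Each layer lets a residue mix in information from its spatial neighbours; stacking layers spreads information further, so residues that are far apart in the sequence but close in space can influence each other.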

These numerical representations of proteins were then fed into a transformer module developed by the authors—a neural network that jointly analyses different types of protein data. In contrast to many previous approaches, where features were often simply concatenated into a single vector, this model does not combine them mechanically but instead captures the relationships between them.
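The difference between concatenation and joint analysis can be sketched in a few lines. Following the model overview (representations are projected into a common space and then processed by a transformer), the example maps three modality embeddings of different sizes into one shared dimension and applies a single scaled dot-product self-attention step, so each output is a weighted mix of all three modalities. The toy values, the dimension d = 2, the identity query/key/value projections, and all names are simplifying assumptions; the real model's layer sizes, number of attention heads, and the way the two proteins of a pair are compared are not described in this article.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(embeddings, projections):
    """Map each modality embedding (each of a different size) to a shared dimension d."""
    return [matvec(projections[name], vec) for name, vec in embeddings.items()]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification: queries, keys and values are the tokens themselves
    (identity projections); a trained transformer learns these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)  # how strongly this token attends to each modality
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def concat_baseline(embeddings):
    """The 'mechanical' alternative: place the vectors side by side."""
    return [x for vec in embeddings.values() for x in vec]


# Toy embeddings of different sizes for one protein (values are made up).
embeddings = {
    "sequence": [1.0, 0.0, 0.0, 0.0],
    "structure": [0.0, 1.0, 0.0],
    "surface": [0.0, 1.0],
}
# Hypothetical projection matrices into a shared dimension d = 2 (learned in practice).
projections = {
    "sequence": [[1, 0, 0, 0], [0, 0, 0, 0]],
    "structure": [[0, 0, 0], [0, 1, 0]],
    "surface": [[0, 1], [0, 1]],
}
tokens = project(embeddings, projections)
fused = self_attention(tokens)
print(concat_baseline(embeddings))  # 9 numbers, no interaction between modalities
print(fused)  # 3 tokens, each a weighted mix of all three modalities
```

In the concatenation baseline the vectors are simply placed side by side. In the attention step, the weight each token assigns to the others depends on their content, which is one way a model can capture relationships between data types.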

'When proteins interact, their surface is particularly important: it is through the surface that molecules recognise one another, and it is where the physicochemical properties that determine binding are concentrated. In our model, we sought to incorporate this information alongside the protein’s sequence and three-dimensional structure and not merely concatenate these features but enable the algorithm to analyse the relationships between them. This is what allowed us to predict protein–protein interactions more accurately,' comments one of the authors, Maria Poptsova, Director of the Centre for Biomedical Research and Technology at the HSE FCS AI and Digital Science Institute.

General scheme of the proposed GSMFormer-PPI model. Panel A illustrates the types of protein representations used by the model: structural, sequence-based, and surface-based. Panel B shows how these representations are projected into a shared space of common dimension, processed by a transformer, and then used to generate the final interaction prediction.
© Arteaga, D., Chervov, N. & Poptsova, M. Multimodal graph, surface, and language-based model for protein protein interaction prediction. Sci Rep 16, 4772 (2026).

The researchers tested the new model’s performance on the PINDER dataset, a large database of known protein interactions. In these experiments, GSMFormer-PPI achieved an accuracy of 95.7%, outperforming popular graph neural network models such as GCN (graph convolutional network) and GAT (graph attention network). The researchers also tested a simpler version of GSMFormer-PPI without the module that analyses relationships between different types of data. This version performed worse, demonstrating that the model’s accuracy depends not only on the protein data itself but also on how the model integrates and compares it.

Additional tests showed that all three types of data (sequence, spatial structure, and surface properties) are essential for accurate predictions: when the researchers removed any one of them, prediction accuracy declined. In other words, the model performs better precisely because it considers the protein at multiple levels simultaneously. In the future, such systems could help researchers identify interacting protein pairs more efficiently when studying disease mechanisms and searching for drug targets.

The work was supported by a grant for research centres in AI provided by the Ministry of Economic Development of the Russian Federation and implemented at HSE University.