Scientists Present New Solution to Imbalanced Learning Problem

Specialists at the HSE Faculty of Computer Science and Sber AI Lab have developed a geometric oversampling technique known as Simplicial SMOTE. Tests on various datasets have shown that it significantly improves classification performance. The technique is particularly valuable in scenarios where rare cases are crucial, such as fraud detection or the diagnosis of rare diseases. The study's results are available on arXiv.org, an open-access archive, and will be presented at the International Conference on Knowledge Discovery and Data Mining (KDD) in summer 2025 in Toronto, Canada.
The problem of imbalanced learning is becoming increasingly relevant across various fields, including banking and medicine. Conventional methods, such as random oversampling, often generate low-quality samples or fail to accurately model rare class data.
Simplicial SMOTE (Synthetic Minority Oversampling Technique), a novel solution proposed by scientists from HSE University and Sber AI Lab, addresses these issues by enabling more accurate modelling of complex topological data structures and improving classifier performance on imbalanced datasets.
It generates new examples of a rare class by leveraging information from multiple close instances (a 'simplex'), rather than from just two close points, as in the original SMOTE and its well-known modifications. This captures the shape of the rare-class data more faithfully and improves performance. The technique improves training on imbalanced data, where one class (e.g., normal transactions) has many examples, while another class (e.g., fraud) has few.
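The difference between the two sampling schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy points, and the choice of the nearest neighbours as simplex vertices are assumptions made here for illustration only. Classic SMOTE draws a point on the segment between a minority example and one neighbour, while the simplicial variant draws a random convex combination of several nearby minority examples.

```python
import math
import random

def smote_sample(x, neighbour, rng):
    """Classic SMOTE step: a random point on the segment between two minority points."""
    t = rng.random()
    return [a + t * (b - a) for a, b in zip(x, neighbour)]

def simplex_sample(vertices, rng):
    """Simplicial-style step: a random convex combination of several minority points."""
    # Normalised Exp(1) draws give Dirichlet(1, ..., 1) weights,
    # i.e. uniformly distributed barycentric coordinates.
    raw = [rng.expovariate(1.0) for _ in vertices]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(vertices[0])
    return [sum(w * v[d] for w, v in zip(weights, vertices)) for d in range(dim)]

def nearest_neighbours(points, index, k):
    """Return the k points closest to points[index] (hypothetical helper)."""
    others = [p for i, p in enumerate(points) if i != index]
    return sorted(others, key=lambda p: math.dist(p, points[index]))[:k]

rng = random.Random(0)
# Toy minority class (invented values): three nearby points and one distant point.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
anchor = 0
neighbours = nearest_neighbours(minority, anchor, k=2)

edge_point = smote_sample(minority[anchor], neighbours[0], rng)       # lies on an edge
triangle_point = simplex_sample([minority[anchor]] + neighbours, rng)  # lies in a triangle
print(edge_point, triangle_point)
```

In this sketch the synthetic point from the simplex can fall anywhere inside the triangle spanned by the three close points, whereas the classic step is confined to one of its edges.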
Researchers have experimentally shown on a large number of test datasets that the proposed approach achieves significantly better performance metrics, such as the F1 score and the Matthews correlation coefficient, than both the basic SMOTE and its modifications. In particular, an improvement was observed for gradient boosting, a classifier commonly used in practice.
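For readers unfamiliar with the two metrics, the sketch below computes them from the counts of a binary confusion matrix using their standard definitions. The function names and the example counts are invented for illustration and are not taken from the study.

```python
import math

def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall for the rare (positive) class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; ranges from -1 to 1, with 0 meaning no better than chance."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Invented example: 1,000 transactions, 10 of them fraudulent.
# A classifier that labels everything "normal" is 99% accurate,
# yet both metrics expose that it never finds the rare class.
always_normal = dict(tp=0, tn=990, fp=0, fn=10)
print(f1_score(always_normal["tp"], always_normal["fp"], always_normal["fn"]))
print(mcc(**always_normal))
```

This is why such metrics, rather than plain accuracy, are typically reported for imbalanced problems.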
'Our technique is particularly effective for tasks involving imbalanced data, where the rare class holds greater significance. Banks can use Simplicial SMOTE to detect fraud more effectively, and medical centres can apply it to diagnose rare diseases,' says Andrey Savchenko, co-author of the article and Leading Research Fellow at the Laboratories for Theoretical Modelling in AI of the HSE AI and Digital Science Institute.
The new technique can be integrated into existing oversampling algorithms (such as Borderline-SMOTE, Safe-level-SMOTE, and ADASYN), enabling better accuracy without significantly increasing computational complexity. According to the researchers, the developed approach could contribute to the creation of more accurate and reliable machine learning models, thereby improving the quality of analytics.
The study was conducted with support from the HSE Basic Research Programme.


