• A
  • A
  • A
  • ABC
  • ABC
  • ABC
  • А
  • А
  • А
  • А
  • А
Regular version of the site

Scientists Present New Solution to Imbalanced Learning Problem

Scientists Present New Solution to Imbalanced Learning Problem

© iStock

Specialists at the HSE Faculty of Computer Science and Sber AI Lab have developed a geometric oversampling technique known as Simplicial SMOTE. Tests on various datasets have shown that it significantly improves classification performance. This technique is particularly valuable in scenarios where rare cases are crucial, such as fraud detection or the diagnosis of rare diseases. The study's results are available on ArXiv.org, an open-access archive, and will be presented at the International Conference on Knowledge Discovery and Data Mining (KDD) in summer 2025 in Toronto, Canada.

The problem of imbalanced learning is becoming increasingly relevant across various fields, including banking and medicine. Conventional methods, such as random oversampling, often generate low-quality samples or fail to accurately model rare class data.

Simplicial SMOTE (Synthetic Minority Oversampling Technique), a novel solution proposed by scientists from HSE University and Sber AI Lab, addresses these issues by enabling more accurate modelling of complex topological data structures and improving classifier performance on imbalanced datasets.

It generates new examples of a rare class by leveraging information from multiple closed instances ('simplex'), rather than just two close points, as in the original SMOTE and its well-known modifications. This facilitates a better understanding of the data and advances performance. The technique improves training on imbalanced data, where one class (eg, normal transactions) has many examples, while another class (eg, fraud) has few.

Researchers have experimentally shown on a large number of test datasets that the proposed approach achieves significantly better performance metrics, such as the F1 Score and Matthews Correlation Coefficient, for both the basic SMOTE and its modifications. In particular, an improvement was observed in gradient boosting, a classifier commonly used in practice.

'Our technique is particularly effective for tasks involving imbalanced data, where the rare class holds greater significance. Banks can use Simplicial SMOTE to detect fraud more effectively, and medical centres can apply it to diagnose rare diseases,' says Andrey Savchenko, co-author of the article and Leading Research Fellow at the Laboratories for Theoretical Modelling in AI of the HSE AI and Digital Science Institute.

The new technique can be integrated into existing oversampling algorithms (such as Borderline-SMOTE, Safe-level-SMOTE, and ADASYN), enabling better accuracy without significantly increasing computational complexity. According to the researchers, the developed approach could contribute to the creation of more accurate and reliable machine learning models, thereby improving the quality of analytics.

The study was conducted with support from the HSE Basic Research Programme.

See also:

Scientists Find That Only Technological Innovations Consistently Advance Environmental Sustainability

Renewable energy and labour productivity do not always contribute to environmental sustainability. Technological innovation is the only factor that consistently has a positive effect. This is the conclusion reached by an international team of researchers, including Natalia Veselitskaya, Leading Research Fellow at the HSE ISSEK Foresight Centre. The study has been published in Sustainable Development.

HSE Researchers Train Neural Network to Predict Protein–Protein Interactions More Accurately

Scientists at the AI and Digital Science Institute of the HSE Faculty of Computer Science have developed a model capable of predicting protein–protein interactions with 95% accuracy. GSMFormer-PPI integrates three types of protein data (including information about protein surface properties) to analyse relationships between proteins, rather than simply combining datasets as in previous models. The solution could accelerate the discovery of disease molecular mechanisms, biomarkers, and potential therapeutic targets. The paper has been published in Scientific Reports.

HSE Scientists Uncover Mechanism Behind Placental Lipid Metabolism Disorders in Preeclampsia

Scientists at HSE University have discovered that in preeclampsia—one of the most severe complications of pregnancy—the placenta remodels its lipid metabolism, reducing its own cholesterol synthesis while increasing cholesterol transfer to the foetus. This compensatory mechanism helps sustain foetal nutrition but accelerates placental deterioration and may lead to preterm birth. The study findings have been published in Frontiers in Molecular Biosciences.

HSE Experts Reveal Low Accuracy of Technology Forecasts in Transportation

HSE researchers evaluated the accuracy of technology forecasts in the transportation sector over the past 50 years and found that the average accuracy rate does not exceed 25%, with the lowest accuracy observed in aviation and rail transport. According to the scientists, this is due to limitations of the forecasting method and the inherent complexities of the sector. The study findings have been published in Technological Forecasting and Social Change.

Wearable Device Data and Saliva Biomarkers Help Assess Stress Resilience

A team of scientists, including researchers from HSE University, has proposed a method for assessing stress resilience using physiological markers derived from wearable devices and saliva samples. The participants who adapted better to stress showed higher heart rate variability, higher zinc concentrations in saliva, and lower potassium levels.  The findings were published in the Journal of Molecular Neuroscience.

When Circumstances Are Stronger Than Habits: How Financial Stress Affects Smoking Cessation

HSE researchers have found that the likelihood of quitting smoking rises with increasing financial struggles. While low levels of financial difficulties do not affect smoking behaviour, moderate financial stress can increase the probability of quitting by 13% to 21%. Responses to high financial stress differ by gender: men are almost 1.5 times more likely to give up cigarettes than under normal conditions, whereas no significant effect is observed on women’s decisions to quit smoking. These conclusions are based on data from the Russia Longitudinal Monitoring Survey (RLMS-HSE) for 2000–2023 and have been published in Monitoring of Public Opinion: Economic and Social Changes

HSE Researchers Propose New Method of Verbal Fluency Analysis for Early Detection of Cognitive Impairment

Researchers from the HSE Center for Language and Brain and the Mental Health Research Centre have proposed a new method of linguistic analysis that enables the distinction between normal and pathological ageing. Using this approach, they showed that patterns in patients’ word choices during verbal fluency tests allow clinicians to more accurately differentiate clinically significant impairments from subjective memory complaints. Incorporating this type of analysis into clinical practice could improve the accuracy of early dementia diagnosis. The results have been published in Applied Neuropsychology: Adult.

How the Brain Processes a Word: HSE Researchers Compare Reading Routes in Adults and Children

Researchers from the HSE Center for Language and Brain used magnetoencephalography to study how the brains of adults and children respond to words during reading. They showed that in children the brain takes longer to process words that are frequently used in everyday speech, while rare words and pseudowords are processed in the same way—slowly and in parts. With age, the system is reorganised: high-frequency words shift to a fast route, whereas new letter combinations are still analysed slowly. The study was published in the journal Psychophysiology.

How Neural Networks Detect and Interpret Wordplay: New Insights from HSE Researchers

An international team including researchers from the HSE Faculty of Computer Science has presented KoWit-24, an annotated dataset of 2,700 Russian-language Kommersant news headlines containing wordplay. The dataset enables an assessment of how artificial intelligence detects and interprets wordplay. Experiments with five large language models show that even advanced systems still make mistakes, and that interpreting wordplay is more challenging for them than detecting it. The results were presented at the RANLP conference; the paper is available on Arxiv.org, and the dataset and the code for reproducing the experiments are available on GitHub.

HSE Economists Find That Auction Prices Depend on Artist’s Life Story

Researchers from the Centre for Big Data in Economics and Finance at the HSE Faculty of Economic Sciences have found that facts from an artist’s life are statistically significant in pricing a painting, alongside such traditional characteristics as the material, the size of the canvas, or the presence of the artist’s signature. This conclusion is based on an analysis of prices for 15,000 works by 158 artists sold since 1999 by the major auction houses Sotheby’s and Christie’s. The article has been published in the journal Empirical Studies of the Arts.