‘Data Mining Can Help Forecast the Pandemic Situation with an Accuracy Within 2.5%’
A mathematical model of the spread of Covid-19 in the Nizhny Novgorod Region, created by the Big Data Laboratory of the Nizhny Novgorod Development Strategy Project Office, has been widely discussed in the media and on social networks. The research was led by Anastasia Popova, a master’s student at HSE University in Nizhny Novgorod, a repeat winner of machine learning competitions, and a winner of Yandex’s Ilya Segalovich Award. In this interview, given on April 15, Anastasia talks about how the model was developed, the data it uses, and its potential long-term applications.
— What data is the coronavirus spread model based on? And how well equipped is the Big Data Laboratory to undertake such studies?
— The Big Data Laboratory, which I head, works on various development programmes for the region, from transport models to research and education centres.
The Nizhny Novgorod Development Strategy Project Office tasked the Laboratory with forecasting how the epidemic would develop in the region. We were to use mathematical modelling to predict how the spread of Covid-19 in the Nizhny Novgorod Region would be affected by residents behaving more responsibly or, conversely, by weaker isolation.
We recruited not only programmers and analysts from our team to work on the project, but also other experts, such as epidemiologists from the Volga Region Medical Research University.
— Is the situation developing in line with your forecasts?
— Our calculations were based on data as of April 6 and 7, when there were about 80 detected Covid-19 cases in Nizhny Novgorod, with a daily increase of about 20. For today, we predicted 204 detected cases in the city, while the actual number is 224 (data as of April 13, 2020). By Friday, we forecast about 500 cases, and 1,600 by April 24.
Yandex’s self-isolation index has now fallen (it used to be above 4, but today it is 2). Moreover, the index does not count people who travel without using any Yandex services. We will see the effect of these violations of self-isolation as a jump in cases about ten days from now.
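One way to put the April 13 forecast in perspective is the relative error between the predicted and the reported count. The sketch below assumes the error is measured relative to the observed value; the interview does not say which convention the laboratory uses.

```python
def relative_error(predicted: float, observed: float) -> float:
    """Absolute percentage error, measured relative to the observed value."""
    if observed == 0:
        raise ValueError("observed value must be non-zero")
    return abs(predicted - observed) / observed


# Forecast of 204 detected cases vs. 224 actually reported (April 13, 2020).
err = relative_error(204, 224)
print(f"{err:.1%}")  # 8.9%
```

On these figures the city-level forecast was off by about 9%; the 2.5% in the headline refers to the 7-day forecast for Russia as a whole, not to the city.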
— What is the main difficulty in building such a model?
— The main difficulty is that the course of the epidemic is influenced not only by frequently changing policies, but also by how responsibly people behave. That’s why all forecasts are conditional: they answer the question ‘What will happen if such and such measures are implemented?’ There are other factors too, such as the share of asymptomatic carriers and the level of immunity, which are very hard to estimate. As of today, we are following the incomplete-isolation scenario, which leads to almost 12,000 cases in the near future. In addition, we have a model that forecasts the number of cases 5-7 days ahead with an error of 1-2% for Russia and of up to 10% for Nizhny Novgorod.
— What data have you used? Have you analysed the cities that have already passed the peak phase (such as Wuhan)?
— In terms of data analysis, the Covid-19 pandemic is a unique opportunity to work not in a laboratory, but ‘in real life.’ It would be unprofessional to ignore global experience. We have used several sources for our model.
First, we used data from most of the countries and regions that have published Covid-19 statistics, including 297 regions worldwide and 21 provinces in Italy. Second, we constantly monitor Russian and international research on Covid-19. And, as I mentioned above, we are in continuous contact with epidemiologists from the Nizhny Novgorod Region.
This means that our model draws on data from all over the world, both aggregated by country and broken down by region and smaller territorial units, covering the whole period of the outbreak. The analysis included several dozen cities and regions, in order to identify those whose epidemiological parameters (policies, population size and density) are closest to ours.
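The interview does not describe how the closest regions were identified. One simple way to do it, shown here only as a sketch, is a nearest-neighbour search over standardized features. The feature set (a policy-stringency score, population in millions, density per km²), the Euclidean distance, and all numbers below are hypothetical placeholders, not the laboratory’s data or method.

```python
import math
from statistics import mean, pstdev


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column so features measured in different units are comparable."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]  # guard against zero spread
    return [[(x - m) / sd for x, (m, sd) in zip(row, stats)] for row in rows]


def closest_regions(target: list[float], candidates: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the names of the k candidates nearest to `target` (Euclidean, standardized)."""
    names = list(candidates)
    z = standardize([target] + [candidates[name] for name in names])
    z_target, z_candidates = z[0], z[1:]
    distance = {name: math.dist(z_target, vec) for name, vec in zip(names, z_candidates)}
    return sorted(names, key=distance.__getitem__)[:k]


# Hypothetical features: [policy stringency 0-100, population (millions), density (people/km^2)].
# All values are made-up placeholders, not real regional statistics.
target = [70, 3.2, 42]
candidates = {
    "A": [68, 3.0, 45],
    "B": [20, 10.0, 300],
    "C": [75, 2.5, 30],
    "D": [40, 1.0, 100],
}
print(closest_regions(target, candidates, k=2))  # ['A', 'C']
```

Standardizing over the target and the candidates together is a simple choice; weighting the features differently, or log-transforming population, would change which regions come out as the closest analogues.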
— Is your current mathematical forecast short-term or long-term in nature?
— The time frame is crucial to our research, because it determines the choice of methods. For a short-term model, we use time-series extrapolation with an exponential function. This approach gives high accuracy for forecasts up to 7 days ahead, as long as the epidemic has not yet reached its plateau. The error of the 7-day short-term forecast for Russia as a whole is less than 2.5%.
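Exponential extrapolation amounts to fitting a straight line to the logarithm of the cumulative case count and extending it forward. Below is a minimal sketch, assuming an ordinary least-squares fit on daily cumulative counts; the interview does not specify the fitting procedure, and the input series in the usage lines is made up for illustration.

```python
import math


def fit_exponential(counts: list[float]) -> tuple[float, float]:
    """Fit counts[t] ~ A * exp(r * t) by least squares on log(counts); return (A, r)."""
    n = len(counts)
    if n < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two positive observations")
    ts = range(n)
    ys = [math.log(c) for c in counts]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    r = sxy / sxx                       # daily exponential growth rate
    a = math.exp(y_mean - r * t_mean)   # fitted value at day 0
    return a, r


def extrapolate(counts: list[float], horizon: int) -> list[float]:
    """Forecast the next `horizon` days by extending the fitted exponential curve."""
    a, r = fit_exponential(counts)
    start = len(counts)
    return [a * math.exp(r * (start + h)) for h in range(horizon)]


history = [80, 97, 118, 142, 171]  # illustrative cumulative counts, not real data
print([round(x) for x in extrapolate(history, 7)])
```

Because the fitted curve grows without bound, it is only meaningful while growth is still roughly exponential, which matches the caveat above that accuracy holds until the epidemic approaches its plateau.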
When we model the whole course of the epidemic, we use a more complex SEIR model, which includes 11 differential equations with 14 variables describing the virus’s epidemiological characteristics, the policies introduced, the specific features of the location, and the preparedness of the local health care system.
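The laboratory’s 11-equation model is not published in the interview. For orientation, the sketch below integrates the textbook four-compartment SEIR model (susceptible, exposed, infectious, recovered) and adds a hypothetical switch that scales the transmission rate once isolation starts. The parameter values and the `isolation_day` and `contact_factor` fields are illustrative assumptions, not the laboratory’s coefficients.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SEIRParams:
    beta: float                          # transmission rate per day
    sigma: float                         # 1 / mean incubation period (per day)
    gamma: float                         # 1 / mean infectious period (per day)
    isolation_day: Optional[int] = None  # hypothetical: day isolation starts
    contact_factor: float = 1.0          # hypothetical: multiplier on beta during isolation


def simulate_seir(p: SEIRParams, n: float, e0: float, i0: float, days: int, dt: float = 0.1):
    """Integrate the basic SEIR model with forward Euler; return one (S, E, I, R) tuple per day.

    dS/dt = -beta*S*I/N
    dE/dt =  beta*S*I/N - sigma*E
    dI/dt =  sigma*E - gamma*I
    dR/dt =  gamma*I
    """
    steps = max(1, round(1 / dt))
    h = 1.0 / steps                      # step size chosen so each day is exactly `steps` steps
    s, e, i, r = n - e0 - i0, e0, i0, 0.0
    out = [(s, e, i, r)]
    for day in range(days):
        beta = p.beta
        if p.isolation_day is not None and day >= p.isolation_day:
            beta *= p.contact_factor
        for _ in range(steps):
            flow_se = beta * s * i / n   # new exposures
            flow_ei = p.sigma * e        # exposed becoming infectious
            flow_ir = p.gamma * i        # recoveries/removals
            s, e, i, r = (s - flow_se * h, e + (flow_se - flow_ei) * h,
                          i + (flow_ei - flow_ir) * h, r + flow_ir * h)
        out.append((s, e, i, r))
    return out


base = SEIRParams(beta=0.5, sigma=1 / 5, gamma=1 / 10)
isolated = SEIRParams(beta=0.5, sigma=1 / 5, gamma=1 / 10, isolation_day=10, contact_factor=0.3)
for label, params in [("no isolation", base), ("isolation from day 10", isolated)]:
    trajectory = simulate_seir(params, n=1_000_000, e0=100, i0=80, days=180)
    peak = max(state[2] for state in trajectory)
    print(f"{label}: peak infectious ~ {peak:,.0f}")
```

Forward Euler keeps S + E + I + R equal to the population because the four flows cancel out; a more accurate integrator such as Runge-Kutta could be substituted without changing the model.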
Data on Nizhny Novgorod are still being accumulated, and the situation is changing hourly. Even so, we still lack the data needed to build precise models.
That’s why we are focusing on modelling the whole course of the epidemic in the Nizhny Novgorod Region. The coefficients for this model were chosen using data from China (excluding Hubei), since China has already defeated the epidemic and all its stages can be observed. Some of the parameters were chosen statistically and on the basis of epidemiologists’ opinions, while the rest are based on the time series of the cumulative number of cases in Nizhny Novgorod (when the model was built, there were 80 confirmed cases in Nizhny Novgorod, with a daily increase of 24). The model error per 11,500 people is 9% for a 7-day forecast.
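The answer above says some coefficients came from external data and expert opinion while the rest were fitted to Nizhny Novgorod’s cumulative case series. A minimal sketch of that second step, under loud assumptions: only the transmission rate `beta` is fitted, by grid search minimizing squared error on log cumulative cases; `sigma`, `gamma` and the initial exposed count `e0` are treated as fixed inputs; and every new symptomatic case is assumed to be detected. None of this is confirmed as the laboratory’s procedure, and the observed series below is invented.

```python
import math


def cumulative_cases(beta, sigma, gamma, n, e0, i0, days, steps_per_day=10):
    """Basic SEIR (forward Euler); return cumulative E->I transitions per day as a proxy for confirmed cases."""
    h = 1.0 / steps_per_day
    s, e, i = n - e0 - i0, float(e0), float(i0)
    cum = float(i0)
    out = [cum]
    for _ in range(days):
        for _ in range(steps_per_day):
            infect = beta * s * i / n    # S -> E flow
            onset = sigma * e            # E -> I flow, counted as newly detected cases (assumption)
            s, e, i = s - infect * h, e + (infect - onset) * h, i + (onset - gamma * i) * h
            cum += onset * h
        out.append(cum)
    return out


def fit_beta(observed, sigma, gamma, n, e0, grid):
    """Pick the beta from `grid` that minimizes squared error of log cumulative cases."""
    def loss(beta):
        sim = cumulative_cases(beta, sigma, gamma, n, e0, observed[0], len(observed) - 1)
        return sum((math.log(o) - math.log(m)) ** 2 for o, m in zip(observed, sim))
    return min(grid, key=loss)


grid = [round(0.05 + 0.01 * k, 2) for k in range(96)]  # candidate beta values from 0.05 to 1.00
observed = [80, 100, 125, 150, 180, 215, 255]          # illustrative numbers only
best = fit_beta(observed, sigma=1 / 5, gamma=1 / 10, n=1_000_000, e0=150, grid=grid)
print(f"best-fitting beta: {best}")
```

A grid search keeps the sketch dependency-free; with more free parameters, a numerical optimizer would be the usual replacement.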
— Will you be improving your calculations? If so, how often?
— We are working on the model and updating it as new data come in. This is very important, because Nizhny Novgorod is only starting to see a sharp growth in cases. We update our coefficients and forecasts daily. We are now making the model more complex, so that it takes into account more epidemic prevention measures and the degree of compliance with them, as well as factors related to healthcare system preparedness, such as the number of equipped hospital beds and available ventilators.
— Do you believe it’s necessary to self-isolate with the Covid-19 pandemic spreading?
— For me, it is completely obvious that the main factors of an optimistic scenario would have been the timely introduction of almost complete home isolation on March 28, a Yandex self-isolation index of 4.5, and maintaining home isolation until the end of the epidemic. The factors of a realistic scenario would be partial lifting of home isolation on April 6, a Yandex self-isolation index of 3.8, and maintaining home isolation until the end of the epidemic.
I believe that self-isolation should be as strict as possible; otherwise, the epidemic will become uncontrollable, and many more people will suffer.
During the Covid-19 epidemic, it is essential to act preventively, since the lag between introducing a policy and seeing its effect is about two weeks, and thousands of people may become ill during those two weeks. That’s why I believe the Nizhny Novgorod authorities were wise to take preventive measures when there were only 11 confirmed cases. This will help us avoid a huge number of victims, but only if all city residents act consciously and responsibly.
Unfortunately, the self-isolation index is gradually falling. But I very much hope that Nizhny Novgorod residents will prove to be responsible. Each of us should understand that when we violate the self-isolation regime, we put at risk not only our own health, but also the health and lives of other people.
— You are finishing your studies in the Master’s programme in Data Mining, aren’t you?
— Yes, this year I’m graduating from HSE University. The tasks our teachers set us have been very interesting and, importantly, practical. My research project was initially dedicated to recognizing human emotions in speech, which could help improve the quality of security systems. My graduation thesis is on image recognition: increasing the information capacity of feature vectors that high-precision neural networks extract from images, using a person re-identification approach. I love taking part in projects that can optimize certain processes or prevent negative scenarios from occurring. This is my way of changing the world for the better.
Interview by Yulia Guseva