‘Data Mining Can Help Forecast the Pandemic Situation with an Accuracy Within 2.5%’
A mathematical model of the spread of Covid-19 in the Nizhny Novgorod Region, created by the Big Data Laboratory at the Nizhny Novgorod Development Strategy Project Office, has been widely discussed in the media and on social networks. The research was led by Anastasia Popova, a master’s student at HSE University in Nizhny Novgorod, a repeat winner of machine learning competitions, and a winner of the Ilya Segalovich Award from Yandex. In the following interview, given on April 15, Anastasia speaks about how the model was developed, the data it uses, and its long-term potential applications.
— What data is the coronavirus spread model based on? And how well equipped is the Big Data Laboratory to undertake such studies?
— The Big Data Laboratory, which I head, works on various development programmes for the region, from transport models to research and education centres.
The Nizhny Novgorod Development Strategy Project Office tasked the Laboratory with forecasting how the epidemic situation would develop in the region. We were to use mathematical calculations to predict how the spread of Covid-19 in the Nizhny Novgorod Region would be impacted by people demonstrating increased responsibility, or, on the contrary, by weaker isolation.
We recruited not only programmers and analysts from our team to work on the project, but also other experts, such as epidemiologists from the Volga Region Medical Research University.
— Is the situation developing in line with your forecasts?
— Our calculations were based on the data as of April 6 and 7, when there were about 80 detected Covid-19 cases in Nizhny Novgorod, with a daily increase of about 20. For today, we predicted 204 detected cases in the city, while the real number is 224 (data as of April 13, 2020). By Friday, we forecast about 500 cases, with 1,600 by April 24.
Today, Yandex’s self-isolation index has fallen (it used to be over 4, but is 2 today). In addition, the index does not count people who travel without active Yandex services. We will see the effect of people violating self-isolation as a jump in cases about ten days from now.
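The comparison between the forecast and the reported figure quoted above can be expressed as a simple relative error. The sketch below is only an illustration: the function name `relative_error_pct` is hypothetical, and measuring the error relative to the reported value is an assumption, since the interview does not say how the Laboratory computes its error figures.

```python
def relative_error_pct(predicted, actual):
    """Absolute forecast error as a percentage of the actual value."""
    if actual == 0:
        raise ValueError("actual must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0


# Figures quoted in the interview: 204 predicted, 224 reported.
error_pct = relative_error_pct(204, 224)
print(f"Relative error: {error_pct:.1f}%")
```

Under this definition, the gap of 20 cases corresponds to an error of roughly 8.9% of the reported figure.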
— What is the main difficulty in building such a model?
— The main difficulty is that the course of the epidemic is influenced not only by policies that are frequently changing, but also by people’s level of responsibility. That’s why all forecasts are conditional and seek to answer the question: ‘What will happen if such and such measures are implemented?’ There are other factors as well, such as the share of asymptomatic carriers and immunity, and these are really hard to estimate. As of today, we are following the scenario with incomplete isolation, which leads to almost 12,000 cases in the near future. In addition, we have a model that forecasts the number of cases 5-7 days ahead, with an error of 1-2% for Russia as a whole and up to 10% for Nizhny Novgorod.
— What data have you used? Have you analysed the cities that have already passed the peak phase (such as Wuhan)?
— In terms of data analysis, the Covid-19 pandemic is a unique opportunity to work not in a laboratory, but ‘in real life.’ It would be unprofessional to ignore global experience. We have used several sources for our model.
First, we used the data on most countries and regions that have published Covid-19 statistics, including 297 regions in the world, and 21 provinces in Italy. Second, we are constantly monitoring Russian and international research on Covid-19. And, as I mentioned above, we are in continuous contact with epidemiologists from the Nizhny Novgorod Region.
This means that we have collected the data for our model from all over the world, both aggregated by country, and distributed by regions and smaller territorial areas over the whole period of the outbreak. The analysis included several dozen cities and regions, in order to detect the ones that have epidemiologic parameters that are closest to ours (policies, population size and density).
— Is your current mathematical forecast short-term or long-term in nature?
— The time frame is crucial to our research, because it determines the choice of methods. When we build a short-term model, we extrapolate the time series with an exponential function. In this case, the model is highly accurate for up to 7 days ahead, as long as the epidemic has not yet reached its plateau. The error of the 7-day short-term forecast for the whole of Russia is less than 2.5%.
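A short-term exponential extrapolation of this kind can be sketched as follows. This is a minimal illustration, not the Laboratory's actual code: it assumes the cumulative case counts are fitted as a * exp(b * t) by ordinary least squares on the logarithm of the counts, the function names are hypothetical, and the input series is synthetic rather than real data.

```python
import math


def fit_exponential(cases):
    """Fit cases[t] ~ a * exp(b * t) by least squares on log(cases).

    Requires at least two observations, all strictly positive.
    """
    n = len(cases)
    ts = list(range(n))
    ys = [math.log(c) for c in cases]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), slope


def extrapolate(cases, days_ahead):
    """Forecast the next `days_ahead` values from the fitted curve."""
    a, b = fit_exponential(cases)
    start = len(cases)
    return [a * math.exp(b * t) for t in range(start, start + days_ahead)]


# Synthetic series (NOT real data): 10 cases growing by 20% per day.
synthetic = [10 * 1.2 ** day for day in range(10)]
forecast = extrapolate(synthetic, 7)
```

Such a fit is only meaningful while growth is still close to exponential, which is consistent with the 7-day horizon mentioned above.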
When we model the whole period of the epidemic, we use a more complex SEIR model, which includes 11 differential equations with 14 variables that describe the virus’s epidemiologic characteristics, the policies introduced, specific characteristics of the location, and the preparedness of the local health care system.
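For readers unfamiliar with SEIR models, the textbook four-compartment version (Susceptible, Exposed, Infectious, Removed) can be sketched as below. This is far simpler than the 11-equation model described in the interview and is not the Laboratory's implementation. The function name, the population size, the parameter values, and the idea of representing stricter isolation as a lower contact rate `beta` are all illustrative assumptions. The equations are integrated with a plain forward Euler step.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma, days, dt=0.1):
    """Integrate the basic SEIR equations with forward Euler steps.

    beta  - contact (transmission) rate per day
    sigma - rate at which exposed people become infectious (1 / incubation period)
    gamma - rate at which infectious people are removed (1 / infectious period)
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical scenarios (illustrative parameters, not fitted to any data):
# weaker isolation is modelled as a higher contact rate beta.
N = 1_000_000
weak = simulate_seir(N, 10, beta=0.5, sigma=0.2, gamma=0.1, days=300)
strict = simulate_seir(N, 10, beta=0.25, sigma=0.2, gamma=0.1, days=300)
peak_weak = max(i for _, _, i, _ in weak)
peak_strict = max(i for _, _, i, _ in strict)
```

Comparing `peak_weak` and `peak_strict` shows the kind of conditional, scenario-based question such a model is meant to answer: with everything else held fixed, a lower contact rate gives a lower peak number of simultaneously infectious people.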
Data on Nizhny Novgorod are actively being accumulated, and the situation is changing hourly. Still, we lack the data needed to build precise models.
That’s why we are focusing on modelling the whole period of the epidemic in the Nizhny Novgorod Region. Coefficients for this model have been fitted to data from China (excluding Hubei), since China has already defeated the epidemic, and we can observe all its stages. Some of the parameters have been chosen statistically and based on epidemiologists’ opinions, while the rest are based on the time series of the cumulative number of cases in Nizhny Novgorod (at the time the model was built, there were 80 confirmed cases in Nizhny Novgorod, with a daily increase of 24). The model error per 11,500 people is 9% for a 7-day forecast.
— Will you be improving your calculations? If so, how often?
— We are working on the model and are trying to update it as we get new data. This is very important, because Nizhny Novgorod is only starting to experience a sharp growth in cases. We are updating our coefficients and forecasts daily. We are now making the model more complicated, so that it considers more measures on epidemic prevention and compliance with them, as well as factors related to healthcare system preparedness such as the number of equipped hospital beds and ventilators available.
— Do you believe it’s necessary to self-isolate with the Covid-19 pandemic spreading?
— For me, it is completely obvious that the main factors of an optimistic model would have been the timely introduction of almost complete home isolation on March 28, the Yandex self-isolation index at 4.5, and maintaining home isolation until the end of the epidemic. The factors for a realistic scenario would be partial abolition of home isolation on April 6, a Yandex self-isolation index at 3.8, and maintaining home isolation until the end of the epidemic.
I believe that self-isolation should be as strict as possible; otherwise, the epidemic will become uncontrollable, and many more people will suffer.
During the Covid-19 epidemic, it is essential to act preventively, since the lag between the introduction of a policy and its effect is about two weeks, and thousands of people may become ill during those two weeks. That’s why I believe that the Nizhny Novgorod authorities were smart to take preventive measures when there were only 11 confirmed cases. This will help us avoid a huge number of victims, but only if all city residents act consciously and responsibly.
Unfortunately, the self-isolation index is gradually falling. But I hope very much that Nizhny Novgorod residents prove to be responsible. Each of us should understand that when we violate the self-isolation regime, we compromise not only our own health, but also the health and lives of other people.
— You are finishing your studies in the Master’s programme in Data Mining.
— Yes, this year I’m graduating from HSE University. The tasks set for us by our teachers have been very interesting and, importantly, applied. Initially, my research project was dedicated to recognizing human emotions in speech, which could help improve the quality of security systems. My graduation thesis is about image recognition: increasing the informativeness of feature vectors extracted from images by high-accuracy neural networks, using a person re-identification approach. I love participating in projects that can optimize certain processes or prevent negative scenarios from occurring. This is my way of changing the world for the better.
Interview by Yulia Guseva