First International Data Analysis Olympiad Held in Moscow
On April 4, the winners of the First International Data Analysis Olympiad (IDAO) were announced. The event was organized by the HSE Faculty of Computer Science, Yandex, and Harbour.Space University (Barcelona) with the support of Sberbank. The Magic City team from St. Petersburg took first prize, a team from Ukraine came second, and the Apex team from Belarus came third.
Thirty-six teams from Russia, Ukraine, Belarus, Azerbaijan, Israel, India and Peru took part in the IDAO finals. The competition was held in two stages. In the online qualification round, which took place from January 15 to February 11, 2018, the contestants had to solve a problem set by Yandex. The second, on-site round was held on April 2 and 3 at the Yandex headquarters in Moscow. The finalists had 36 hours to solve a problem set by Sberbank.
‘Data scientist is one of the most in-demand professions on the job market today, and it is essential that this field gains more and more qualified specialists. Data analysis is what defines the future of business and of the economy in general, and that’s why forward-looking companies are eager to hire such experts’, said Stanislav Fedotov, curator at the Yandex School of Data Analysis and Associate Professor at the HSE Faculty of Computer Science. ‘There are several well-known global events in programming, such as ACM/ICPC (International Collegiate Programming Contest), which popularize it and help identify the best of the best. The field of Data Science, however, is only just beginning to grow, both in Russia and internationally. We want IDAO to become a similarly big event in Data Science, and we will use it to promote this exciting field among young professionals’.
According to Stanislav Fedotov, one of the important features of this Olympiad is that participants work on tasks drawn from real life. At the online stage, for example, the contestants solved a task for Yandex.Market. When a user comes to this service looking for something specific, the system selects a set of options that match their query: someone looking for a kettle is offered many kettles at different prices and with different features. It would be much more interesting, however, to teach the system to predict queries, so that it offers not what the person is looking for at that particular moment, but something they are likely to want in the future. ‘The participants were given the search history of notional users, and they had to predict the categories of items these individuals hadn’t looked at over the last three weeks but would be likely to search for in a week’s time. They had to choose five users, suggest five categories of goods for each user and ‘guess’ at least one of them’, Stanislav Fedotov explained.
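The success criterion Fedotov describes (suggest five categories for a user and count a success if at least one of them is among the categories the user later searches for) can be sketched as a simple hit rate. This is a hypothetical illustration: the article does not give the official scoring formula, and the function name, data layout and averaging over users are our assumptions.

```python
from typing import Dict, List, Set


def hit_rate_at_5(
    predictions: Dict[str, List[str]],
    actual: Dict[str, Set[str]],
) -> float:
    """Share of users for whom at least one of the first five suggested
    categories appears among the categories they actually searched for.

    Hypothetical sketch: not the official IDAO 2018 metric.
    """
    if not predictions:
        return 0.0
    hits = 0
    for user, suggested in predictions.items():
        top5 = suggested[:5]  # only the first five suggestions count
        if any(cat in actual.get(user, set()) for cat in top5):
            hits += 1
    return hits / len(predictions)


# Made-up example: "u1" gets a hit ("kettles"), "u2" does not.
preds = {
    "u1": ["kettles", "toasters", "irons", "mixers", "blenders"],
    "u2": ["phones", "cases", "chargers", "headphones", "tablets"],
}
truth = {"u1": {"kettles", "mugs"}, "u2": {"laptops"}}
print(hit_rate_at_5(preds, truth))  # 0.5
```

Because a single correct guess is enough, the rule rewards putting the most likely category first among a diverse set of five rather than five near-duplicates.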
In the finals, 36 teams (41 teams passed the selection round, but not all of them managed to come to Moscow) had 36 hours to solve a problem put forward by Sberbank’s data scientists.
According to Andrey Chertok, Managing Director for Research and Development at Sberbank, the participants had to solve a real problem that the Sberbank team had recently worked on and that all banks face. The task is highly practical: it concerns optimizing the cash supply for Sberbank ATMs, which number in the tens of thousands across the country. The problem is that cash is not always delivered efficiently, so money lies idle in some ATMs while others run out of cash too quickly.
‘The bank’s losses from excess money just ‘lying around’ in ATMs amount to billions of roubles annually’, Andrey Chertok emphasized. ‘Our team uses data analysis more and more often to solve such problems. For example, the problems of optimizing cash delivery and forecasting how much money will be withdrawn from a specific ATM were successfully solved with machine learning methods. We offered the Olympiad participants a mini version of what we’ve done at Sberbank.’ The finalists worked with real data on the locations and cash loading of Sberbank ATMs. In the process, the teams faced the same problems that bank data analysis teams face in real life, such as deciding whether the data should be cleaned, and handling so-called ‘outliers’ caused by unusually intensive cash activity on days when salaries or pensions are paid. ‘In a short period of time, all the participants were quite successful in building usable models and gained hands-on experience in solving a real banking task’, said Andrey Chertok. ‘I believe that at this Olympiad we managed to combine competitive spirit and applicability’.
Applicability and efficiency were important requirements for the prototypes the finalists worked on. According to Tamara Voznesenskaya, Deputy Dean of the HSE Faculty of Computer Science, the main objective of any modeling in this area is prediction accuracy. However, researchers doing academic analysis don’t always pay attention to parameters such as the time or resources spent, and as a result, their models are not always applicable in real life. ‘The algorithms may build high-quality models, but they require either a lot of time or a lot of memory, and they can’t be integrated into, for example, mobile apps’, Tamara Voznesenskaya explained. Contestants were therefore encouraged to follow the principle ‘Efficiency is as important as quality’.
The organizers hope that as the Olympiad grows, data analysis specialists (fans of Kaggle competitions) and competitive programmers will join forces in teams.
According to the Olympiad winners, members of the Magic City team from St. Petersburg State University, their solution was based on data cleaning, an approach they had heard about from Sberbank analysts. ‘Our initial solutions didn’t give stable results in tests; they were ‘shaky’. We decided to remove all the ‘garbage’, track down the anomalies, and keep only the most essential information, since we didn’t have much data and quality played a crucial role’, explained Artem Plotkin, Roman Pyankov, and Sergey Arefyev. ‘We then moved on to XGBoost, a ready-made algorithm, and decided what had to be changed or added to it’.
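The team does not say how they decided what counted as ‘garbage’. One common way to flag anomalies in a series such as daily ATM withdrawals is Tukey’s fences on the interquartile range; the sketch below uses that rule purely as an assumed example, with a made-up series and a hypothetical function name.

```python
import statistics
from typing import List


def remove_outliers_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Drop points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    An assumed cleaning rule for illustration only; the Magic City
    team's actual filtering procedure is not described in the article.
    """
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles sensibly
    q1, _, q3 = statistics.quantiles(values, n=4)  # Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the surviving points.
    return [v for v in values if low <= v <= high]


# Made-up daily withdrawals with one spike (e.g. a payday).
daily = [100.0, 105.0, 98.0, 102.0, 99.0, 101.0, 480.0, 103.0]
print(remove_outliers_iqr(daily))
```

Whether such spikes should be dropped at all or modelled as a separate payday effect is exactly the kind of judgment call about outliers that the Sberbank data forced the teams to make.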
Alexander Makeev from Ukraine, a regular Kaggle participant, took silver: ‘A Kaggle competition can take three years, half a year, or a year, and teams are not limited in the number of members. Participants can build crazy models that take many weeks to compute on super powerful hardware. It’s impossible to come first if you don’t have all these resources. In addition, the resource requirements make it impossible to apply these solutions in real life’.
According to the third-place Apex team from Yanka Kupala State University of Grodno (Belarus), consisting of Evgeny Demidovich, Konstantin Mlynarchik, and Sergey Petrov, the Olympiad will be remembered for the fact that the final round lasted two days offline rather than several months online. The tasks themselves were also very memorable. ‘These tasks were not only about machine learning. We also had to invent things ourselves and act as data engineers’, Evgeny Demidovich noted. ‘We didn’t have much data in the task, so we expanded it and tried to help the model find a solution by inventing a data representation that would keep the model from failing. We used the random forest algorithm. We took 20 models, trained them on the data we had cut, and then averaged these 20 models to get a more stable solution’.
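The averaging step the Apex team describes (train 20 models on different cuts of the data, then average their predictions) is a bagging-style ensemble, here with subsampling without replacement. To keep the sketch dependency-free, it swaps in a deliberately simple base learner, a one-variable least-squares line, instead of a random forest; the subset fraction, seed and function names are our assumptions, not the team’s code.

```python
import random
from typing import Callable, List, Tuple

Model = Callable[[float], float]


def fit_line(points: List[Tuple[float, float]]) -> Model:
    """Least-squares fit of y = a*x + b (stand-in for a random forest)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx else 0.0  # flat line if all x are equal
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def averaged_ensemble(
    data: List[Tuple[float, float]],
    n_models: int = 20,
    subset_frac: float = 0.8,
    seed: int = 0,
) -> Model:
    """Train n_models on random subsets of data and average their outputs."""
    rng = random.Random(seed)
    size = min(len(data), max(2, int(len(data) * subset_frac)))
    # Each model sees a different random cut of the data.
    models = [fit_line(rng.sample(data, size)) for _ in range(n_models)]
    return lambda x: sum(m(x) for m in models) / len(models)


# Hypothetical noisy data, roughly y = 2x + 1.
noise = random.Random(42)
toy = [(float(x), 2.0 * x + 1.0 + noise.gauss(0, 1)) for x in range(50)]
predict = averaged_ensemble(toy, n_models=20)
print(round(predict(10.0), 1))
```

Averaging models fit on different subsets mainly reduces the variance of the prediction, which matches the team’s stated goal of a ‘more stable solution’; a random forest already applies the same idea internally to its individual trees.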
IDAO winners received valuable prizes, and the champions were awarded laptops. In addition, the HSE Faculty of Computer Science will take the winners’ achievements into account when they apply to master’s programmes, and Harbour.Space University is offering scholarships that cover full tuition.
The organizers plan to make the International Data Analysis Olympiad a regular event. According to Rostislav Yavorskiy, Associate Professor at the HSE Faculty of Computer Science, the competition is highly relevant given the strong global interest in Data Science. At the same time, there will be a shortage of well-qualified professionals in the field for a long time to come: ‘Our Olympiad has several purposes: to attract as many young specialists to this area as possible, to motivate them to pursue self-development, and to contribute to the development of the professional community. I believe we’ve taken the first step, and we hope to continue’.