First International Data Analysis Olympiad Held in Moscow
On April 4, the winners of the First International Data Analysis Olympiad (IDAO) were announced. The event was organized by the HSE Faculty of Computer Science, Yandex, and Harbour.Space University (Barcelona) with the support of Sberbank. The Magic City team from St. Petersburg took first prize, a team from Ukraine came second, and the Apex team from Belarus came third.
36 teams from Russia, Ukraine, Belarus, Azerbaijan, Israel, India and Peru took part in the IDAO finals. The competition was held in two stages. In the online qualification round, which took place from January 15 to February 11, 2018, the contestants had to solve a problem put forward by Yandex. The second, on-site, round was held on April 2 and 3 in Moscow at the Yandex headquarters. The finalists had 36 hours to solve a problem put forward by Sberbank.
‘Data Scientist is one of the most in-demand professions on the job market today, and it is essential that this area has more and more qualified specialists. Data analysis is what defines the future of business and the economy in general, and that’s why forward-looking companies want to hire such experts’, said Stanislav Fedotov, curator at the Yandex School of Data Analysis and Associate Professor at the HSE Faculty of Computer Science. ‘There are several well-known global events in programming, which popularize it and help detect the best of the best, such as ACM/ICPC (International Collegiate Programming Contest). The field of Data Science, however, is only just beginning to grow, both in Russia and internationally. We want IDAO to be a similarly big event in Data Science, and we will use it to promote this thrilling area among young professionals’.
According to Stanislav Fedotov, one of the important features of this Olympiad is that the participants get tasks that are related to real life. For example, at the online stage, the contestants solved a task for Yandex.Market. When a user comes to this service with a specific purpose, the system chooses a set of options which match their query. For example, when someone looks for a kettle, Yandex.Market offers them many kettles with various prices and features. But teaching the system to predict queries would be much more interesting, as this would mean that it would offer not what the individual is looking for at that particular moment, but something they would be likely to want in future. ‘The participants were given a search history of notional users, and they had to predict the categories of items these individuals hadn’t looked at over the last three weeks, but would be likely to search for in a week’s time. They had to choose five users, suggest five categories of goods for each user and ‘guess’ at least one of them’, Stanislav Fedotov explained.
In the finals, 36 teams (41 teams passed the selection round, but not all of them managed to come to Moscow) had 36 hours to solve a problem put forward by Sberbank’s data scientists.
According to Andrey Chertok, Managing Director for Research and Development at Sberbank, the participants had to solve a real problem on which the Sberbank team worked recently, and which is faced by all banks. The task is very applicable: it is about optimizing the cash supply for Sberbank ATMs, numbering tens of thousands across the country. The problem is that cash delivery isn’t always performed effectively, and as a result, cash lies useless in some ATMs, while others run out of cash too quickly.
‘The bank’s losses due to excess money just ‘lying around’ in ATMs amount to billions of roubles annually’, Andrey Chertok emphasized. ‘Our team uses data analysis more and more frequently to solve such problems. For example, the problem of optimizing cash delivery and forecasting the amount of money to be withdrawn from a specific ATM was successfully solved with machine learning methods. We proposed a mini version of what we’ve done at Sberbank to the Olympiad participants.’ The finalists worked with real data on the locations and loading of Sberbank ATMs. During the process, the teams faced the same problems that bank data analysis teams face in real life. These include deciding whether or not the data should be cleaned, and handling so-called ‘outliers’, which relate to more intensive cash delivery on days when salaries or pensions are paid. ‘In a short period of time, all the participants were quite successful in building usable models and got some hands-on experience in solving a real banking task’, said Andrey Chertok. ‘I believe, at this Olympiad, we managed to bring together competitive spirit and applicability’.
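The article does not say how the teams or Sberbank identified such outliers. As a rough illustration only, the sketch below flags unusually heavy days with a modified z-score based on the median absolute deviation. The function name `flag_outliers`, the threshold of 3.5, and the sample figures are all assumptions made for this example, not details from the competition.

```python
from statistics import median

def flag_outliers(withdrawals, threshold=3.5):
    """Return indices of days whose volume lies far from the median.

    Uses the modified z-score 0.6745 * |x - median| / MAD.
    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    med = median(withdrawals)
    # Median absolute deviation: a spread measure that spikes barely affect.
    mad = median(abs(x - med) for x in withdrawals)
    if mad == 0:
        return []  # all days nearly identical, nothing to flag
    return [i for i, x in enumerate(withdrawals)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical daily withdrawal volumes; day 5 stands in for a payday spike.
daily = [100, 98, 103, 101, 99, 250, 102, 97, 100, 104]
print(flag_outliers(daily))
```

Whether flagged days should be dropped, capped, or modelled separately (for example with a payday feature) is exactly the kind of data-cleaning decision the finalists had to make.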
Applicability and effectiveness were important requirements of the prototypes that the finalists worked on. According to Tamara Voznesenskaya, Deputy Dean of the HSE Faculty of Computer Science, the main objective of any modeling in this area is prediction precision. However, the experts who carry out academic analysis don’t always care about parameters such as the time or resources spent, and as a result, their models are not always applicable in real life. ‘The algorithms may build high-quality models, but they require either a lot of time or a lot of memory, and they can’t be integrated into, for example, mobile apps’, Tamara Voznesenskaya explained. Contestants were therefore encouraged to follow the principle ‘Efficiency is as important as quality’.
The organizers hope that as the Olympiad continues to grow, specialists in data analysis (fans of Kaggle competitions) and competitive programming will unite in teams.
According to the Olympiad winners, members of the Magic City team from St. Petersburg State University, their solution was based on data cleaning, which they heard about from Sberbank analysts. ‘Our initial solutions didn’t provide stable results in tests, they were ‘shaky’. We decided to remove all the ‘garbage’, track the abnormalities, and keep only the most necessary information, since we didn’t have much data and quality played a crucial role’, explained Artem Plotkin, Roman Pyankov, and Sergey Arefyev. ‘We subsequently proceeded to work with XGBoost, a ready-made algorithm, and decided what had to be changed or added to it’.
Alexander Makeev from Ukraine, a regular participant in Kaggle competitions, took silver: ‘A Kaggle competition can take three years, half a year, or a year, and the teams are not limited in terms of the number of members. Participants can create crazy models that are calculated over many weeks using super powerful resources. It’s impossible to come first if you don’t have all these resources. In addition, the resource requirements make it impossible to apply these solutions in real life’.
According to the third-place Apex team, representing Yanka Kupala State University of Grodno (Belarus) and consisting of Evgeny Demidovich, Konstantin Mlynarchik, and Sergey Petrov, the Olympiad will be remembered because the final round lasted for two days offline, not several months online. The tasks themselves were also very memorable. ‘These tasks were not only related to machine learning. We also had to invent something ourselves, to act as data engineers’, Evgeny Demidovich noted. ‘We didn’t have much data in the task, so we expanded the data, and tried to help the model find a solution, to invent a data representation that would ensure the model wouldn’t fail. We used the random forest algorithm. We took 20 models, trained them on the data we cut, and then averaged these 20 models in order to come up with a more stable solution’.
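The averaging idea the Apex team describes can be sketched in a few lines. This is not the team's code: to keep the example self-contained, a simple least-squares line stands in for each random forest, and the names `fit_line`, `train_ensemble`, and `predict`, the 80% subsample size, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean

def fit_line(points):
    """Least-squares fit y = a*x + b; a stand-in for one base model."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in points) / var if var else 0.0
    return a, my - a * mx

def train_ensemble(points, n_models=20, sample_frac=0.8, seed=0):
    """Train n_models base models, each on a random subsample of the data."""
    rng = random.Random(seed)
    k = max(2, int(len(points) * sample_frac))
    return [fit_line(rng.sample(points, k)) for _ in range(n_models)]

def predict(models, x):
    """Average the predictions of all models in the ensemble."""
    return mean(a * x + b for a, b in models)

# Toy data on the line y = 2x + 1 (hypothetical, not competition data).
points = [(x, 2 * x + 1) for x in range(10)]
models = train_ensemble(points)
print(len(models), predict(models, 10))
```

Because each model sees a slightly different slice of a small dataset, averaging their outputs tends to smooth out the quirks of any single fit, which matches the team's stated goal of a more stable solution.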
IDAO winners received valuable prizes, and the champions were awarded laptops. In addition, the HSE Faculty of Computer Science will take the winners’ achievements into account when they apply to master’s programmes, and Harbour.Space University offers scholarships that cover full tuition.
The organizers are going to make the International Data Analysis Olympiad a regular event. According to Rostislav Yavorskiy, Associate Professor at the HSE Faculty of Computer Science, this competition is highly relevant since there is great global interest in Data Science. However, there will be a dearth of well-qualified professionals in the field for a long time: ‘Our Olympiad has several purposes: to attract as many young specialists to this area as possible, to motivate them to develop themselves, and to contribute to the development of the professional community. I believe we’ve made the first step, and we hope to continue’.