First International Data Analysis Olympiad Held in Moscow
On April 4, the winners of the First International Data Analysis Olympiad (IDAO) were announced. The event was organized by the HSE Faculty of Computer Science, Yandex, and Harbour.Space University (Barcelona) with the support of Sberbank. The Magic City team from St. Petersburg took first prize, a team from Ukraine came second, and the Apex team from Belarus came third.
36 teams from Russia, Ukraine, Belarus, Azerbaijan, Israel, India and Peru took part in the IDAO finals. The competition was held in two stages. In the online qualification round, which took place from January 15 to February 11, 2018, the contestants had to solve a problem put forward by Yandex. The second, on-site, round was held on April 2 and 3 in Moscow at the Yandex headquarters. The finalists had 36 hours to solve a problem put forward by Sberbank.
‘Data Scientist is one of the most in-demand professions on the job market today, and it is essential that this area has more and more qualified specialists. Data analysis is what defines the future of business and the economy in general, and that’s why forward-looking companies want to hire such experts’, said Stanislav Fedotov, curator at the Yandex School of Data Analysis and Associate Professor at the HSE Faculty of Computer Science. ‘There are several well-known global events in programming that popularize it and help identify the best of the best, such as ACM/ICPC (International Collegiate Programming Contest). The field of Data Science, however, is only just beginning to grow, both in Russia and internationally. We want IDAO to be a similarly big event in Data Science, and we will use it to promote this thrilling area among young professionals’.
According to Stanislav Fedotov, one of the important features of this Olympiad is that the participants get tasks drawn from real life. For example, at the online stage, the contestants solved a task for Yandex.Market. When a user comes to this service with a specific purpose, the system chooses a set of options that match their query. For example, when someone looks for a kettle, Yandex.Market offers them many kettles at various prices and with various features. But teaching the system to predict queries would be much more interesting, as it would then offer not what the individual is looking for at that particular moment, but something they would be likely to want in future. ‘The participants were given a search history of notional users, and they had to predict the categories of items these individuals hadn’t looked at over the last three weeks, but would be likely to search for in a week’s time. They had to choose five users, suggest five categories of goods for each user and ‘guess’ at least one of them’, Stanislav Fedotov explained.
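The article does not give the exact scoring rule used in the qualification round. As a rough sketch of the ‘guess at least one of five categories’ idea, the snippet below counts a chosen user as a hit when any of their suggested categories appears among the categories they later searched for. The function name `hit_rate`, the data layout, and the toy users and categories are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def hit_rate(
    suggestions: Dict[str, List[str]],
    future: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Share of chosen users with at least one of the first k suggestions searched later."""
    if not suggestions:
        return 0.0
    hits = 0
    for user, categories in suggestions.items():
        top_k = categories[:k]  # only the first k suggestions count
        if any(c in future.get(user, set()) for c in top_k):
            hits += 1
    return hits / len(suggestions)


# Toy data, invented for illustration only (not from the competition).
suggestions = {
    "user_a": ["kettles", "toasters", "blenders", "mugs", "teapots"],
    "user_b": ["laptops", "mice", "keyboards", "monitors", "webcams"],
}
future = {
    "user_a": {"teapots", "phones"},  # "teapots" was suggested: a hit
    "user_b": {"bicycles"},           # nothing suggested matches: a miss
}

score = hit_rate(suggestions, future)
print(score)
```

A score of this kind rewards choosing users whose future interests are predictable, which is presumably why the contestants picked the users themselves.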
In the finals, 36 teams (41 teams passed the selection round, but not all of them managed to come to Moscow) had 36 hours to solve a problem put forward by Sberbank’s data scientists.
According to Andrey Chertok, Managing Director for Research and Development at Sberbank, the participants had to solve a real problem on which the Sberbank team worked recently, and which is faced by all banks. The task is very applicable: it is about optimizing the cash supply for Sberbank ATMs, numbering tens of thousands across the country. The problem is that cash delivery isn’t always performed effectively, and as a result, cash lies useless in some ATMs, while others run out of cash too quickly.
‘The bank’s losses due to excess money just ‘lying around’ in ATMs amount to billions of roubles annually’, Andrey Chertok emphasized. ‘Our team uses data analysis more and more frequently to solve such problems. For example, the problem of optimizing cash delivery and forecasting the amount of money to be withdrawn from a specific ATM was successfully solved with machine learning methods. We proposed a mini version of what we’ve done at Sberbank to the Olympiad participants.’ The finalists worked with real data on the locations and loading of Sberbank ATMs. During the process, the teams faced the same problems that bank data analysis teams face in real life. These include deciding whether or not the data should be cleaned, and handling so-called ‘outliers’, which relate to more intensive cash delivery on days when salaries or pensions are paid. ‘In a short period of time, all the participants were quite successful in building usable models and got some hands-on experience in solving a real banking task’, said Andrey Chertok. ‘I believe, at this Olympiad, we managed to bring together competitive spirit and applicability’.
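To make the ‘outlier’ question concrete, here is a minimal sketch of one common way to flag unusual days in a series of daily cash volumes: a modified z-score based on the median and the median absolute deviation. This is not Sberbank’s method, which the article does not describe; the function name `flag_outliers`, the threshold of 3.5 and the toy numbers are assumptions.

```python
from statistics import median
from typing import List


def flag_outliers(values: List[float], threshold: float = 3.5) -> List[bool]:
    """Mark values whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that spikes barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Toy daily volumes for one hypothetical ATM; the fifth day mimics a payday spike.
daily_volume = [100, 110, 95, 105, 400, 98, 102]
flags = flag_outliers(daily_volume)
print(flags)
```

Flagging a day does not settle whether to drop it. A payday spike is real demand, so a team might keep such days and add a ‘payday’ feature instead of deleting them, which is exactly the cleaning decision the finalists had to make.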
Applicability and efficiency were important requirements for the prototypes that the finalists worked on. According to Tamara Voznesenskaya, Deputy Dean of the HSE Faculty of Computer Science, the main objective of any modeling in this area is prediction accuracy. However, experts who carry out academic analysis don’t always care about parameters such as the time or resources spent, and as a result, their models are not always applicable in real life. ‘The algorithms may build high-quality models, but they require either a lot of time or a large amount of memory, and they can’t be integrated into, for example, mobile apps’, Tamara Voznesenskaya explained. Contestants were therefore encouraged to follow the principle ‘Efficiency is as important as quality’.
The organizers hope that as the Olympiad continues to grow, specialists in data analysis (fans of Kaggle competitions) and competitive programming will unite in teams.
According to the Olympiad winners, members of the Magic City team from St. Petersburg State University, their solution was based on data cleaning, which they heard about from Sberbank analysts. ‘Our initial solutions didn’t provide stable results in tests, they were ‘shaky’. We decided to remove all the ‘garbage’, track the abnormalities, and keep only the most essential information, since we didn’t have much data and quality played a crucial role’, explained Artem Plotkin, Roman Pyankov, and Sergey Arefyev. ‘We subsequently proceeded to work with XGBoost, a ready-made algorithm, and decide what had to be changed or added to it’.
Alexander Makeev from Ukraine, a regular participant in Kaggle, took silver: ‘Kaggle can take three years, half a year, or a year, and the teams are not limited in terms of the number of members. Participants can create crazy models that are calculated over many weeks using super powerful resources. It’s impossible to come first if you don’t have all these resources. In addition, the resource requirements make it impossible to apply these solutions in real life’.
According to the third-place Apex team, representing Yanka Kupala State University of Grodno (Belarus) and consisting of Evgeny Demidovich, Konstantin Mlynarchik, and Sergey Petrov, the Olympiad will be remembered because the final round lasted for two days offline, not several months online. The tasks themselves were also very memorable. ‘These tasks were not only related to machine learning. We also had to invent something ourselves, to act as data engineers’, Evgeny Demidovich noted. ‘We didn’t have much data in the task, so we expanded the data, and tried to help the model find a solution, to invent a data representation that would ensure the model wouldn’t fail. We used the random forest algorithm. We took 20 models, trained them on the data we cut, and then averaged these 20 models in order to come up with a more stable solution’.
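The averaging step the Apex team describes can be sketched as follows. This is an illustration, not the team’s code: to stay dependency-free, each of the 20 base models here is a simple least-squares line in place of a random forest, and ‘the data we cut’ is assumed to mean random resamples of the training set (bootstrap samples). The names `fit_line`, `train_ensemble` and `predict` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Least-squares line y = a + b * x (stand-in for a random forest)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Degenerate sample with a single x value: fall back to the mean.
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def train_ensemble(
    xs: Sequence[float], ys: Sequence[float], n_models: int = 20, seed: int = 0
) -> List[Model]:
    """Fit each model on its own resample (with replacement) of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models: List[Model], x: float) -> float:
    """Average the individual predictions to get a more stable estimate."""
    return sum(m(x) for m in models) / len(models)


# Toy, noise-free data invented for illustration: y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
models = train_ensemble(xs, ys)
print(predict(models, 10.0))
```

Because each model sees a slightly different slice of a small dataset, its individual errors differ, and averaging tends to cancel part of them out. That is the stability effect the team refers to.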
IDAO winners received valuable prizes, and the champions were awarded laptops. In addition, the HSE Faculty of Computer Science will take the winners’ achievements into account when they apply to its master’s programmes, and Harbour.Space University offers scholarships that cover full tuition.
The organizers are going to make the International Data Analysis Olympiad a regular event. According to Rostislav Yavorskiy, Associate Professor at the HSE Faculty of Computer Science, this competition is highly relevant since there is strong global interest in Data Science. However, there will be a dearth of well-qualified professionals in the field for a long time: ‘Our Olympiad has several purposes: to attract as many young specialists in this area as possible, to motivate them to develop their skills, and to contribute to the development of the professional community. I believe we’ve made the first step, and we hope to continue’.