HSE Students Take Second Place at International Hackathon

Two students from the ‘Financial Technologies and Data Analysis’ master's programme, Nikita Churkin and Dmitry Simakov, took second place and won 100,000 rupees (approximately 100,000 rubles) at WNS Analytics Wizard 2018, an international hackathon held on September 14-16. The competition was organised by the Indian platform Analytics Vidhya (similar to the American Kaggle).

Analytics Vidhya is the largest data science competition platform in India. Participants were asked to solve a binary classification problem: predicting whether an employee would be promoted based on their characteristics, such as work experience and the department they work in.

The hackathon drew 3,846 participants, including Kaggle masters, several competitors from Russia, and analysts and data scientists from India. Of these, 1,300 submitted solutions for review.

The hardest part for the HSE students was figuring out how to extract as much information as possible from the 13 original factors. They ended up with more than 4,500 features, which created an even harder problem: how to select the most useful ones correctly and within a reasonable time.
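To illustrate how quickly a small set of factors can multiply, here is a minimal sketch that builds pairwise arithmetic features (sum, difference, product, ratio) from numeric base columns. The function name `expand_features` and the choice of operations are hypothetical; the article does not describe the students' actual feature set.

```python
from itertools import combinations


def expand_features(row, names):
    """Return base features plus pairwise sum/difference/product/ratio features.

    Hypothetical illustration only: with k base features this adds
    4 * C(k, 2) derived columns.
    """
    features = dict(zip(names, row))
    for a, b in combinations(names, 2):
        x, y = features[a], features[b]
        features[f"{a}_plus_{b}"] = x + y
        features[f"{a}_minus_{b}"] = x - y
        features[f"{a}_times_{b}"] = x * y
        # Guard against division by zero with a fixed fallback value.
        features[f"{a}_div_{b}"] = x / y if y != 0 else 0.0
    return features


if __name__ == "__main__":
    base = [f"f{i}" for i in range(13)]      # 13 base factors
    row = [float(i + 1) for i in range(13)]  # one example record
    print(len(expand_features(row, base)))   # 325 = 13 + 4 * 78
```

Pairwise operations alone turn 13 columns into 325; richer transforms such as group aggregates, encodings, or higher-order interactions can push the count into the thousands, which is what makes the subsequent selection step expensive.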

The main difference between WNS Analytics Wizard 2018 and Kaggle competitions, which typically run for two to three months, is its short duration. Nikita and Dmitry already had some ready-made functions for feature selection and rapid validation, written during their work at Sberbank and in earlier data analysis competitions. Without them, it would have been even harder to complete all the work that went into the final solution in such a short time.
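One common way to make feature selection fast is a filter method that scores each feature independently and keeps the top k. The sketch below ranks features by the absolute Pearson correlation with a binary target. It is an assumed, swapped-in technique for illustration, not the students' own functions, and the names `pearson` and `select_top_k` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def select_top_k(columns, target, k):
    """Keep the k features with the largest |correlation| with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    target = [0, 1, 0, 1, 1, 0]
    columns = {
        "exact": [0, 1, 0, 1, 1, 0],     # identical to the target
        "noisy": [0, 1, 1, 1, 0, 0],     # weakly related
        "constant": [5, 5, 5, 5, 5, 5],  # uninformative
    }
    print(select_top_k(columns, target, 2))  # ['exact', 'noisy']
```

Filter scores cost one pass per feature, so they scale to thousands of columns; the trade-off is that they ignore interactions, which is why such a shortlist is usually followed by a slower model-based check on a validation split.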

The Analytics Vidhya platform offers participants unconventional tasks. At WNS Analytics Wizard 2018, the evaluation metric was the F1 score, which meant the classification cutoff had to be chosen separately. This is particularly difficult given the class imbalance and the large difference between the training and test data.
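Because F1 depends on hard 0/1 predictions, a model's probability scores must be converted with a cutoff, and the default of 0.5 is often not the best choice on imbalanced data. A minimal sketch of the usual approach, assuming held-out labels and scores are available, tries each distinct score as a threshold and keeps the one with the highest F1. The function names are illustrative.

```python
def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(y_true, scores):
    """Try every distinct score as a cutoff; return (threshold, best F1)."""
    best = (0.5, -1.0)
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best[1]:
            best = (t, f1)
    return best


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 0, 1, 1]  # imbalanced: 3 positives out of 8
    scores = [0.1, 0.2, 0.15, 0.3, 0.35, 0.4, 0.8, 0.9]
    print(best_threshold(y_true, scores))  # cutoff 0.35, F1 = 6/7
```

Here the default cutoff of 0.5 would give F1 = 0.8, while 0.35 gives about 0.857. The threshold is tuned on held-out data, so a shift between training and test distributions can make it suboptimal on the test set.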