Our Students Became Prizewinners of Kaggle's International Data Science and Machine Learning Competition
Ekaterina Melianova and Artem Volgin, second-year students of the Master's Program "Applied Statistics with Network Analysis", took second place in an international data analysis competition run by Kaggle. Using survey data from 19,717 respondents in 171 countries, they analyzed the community of PhD holders in the field of Data Science.
Kaggle is a Data Science platform owned by Google. The Kaggle community brings together about three million Data Scientists from around the world. The platform publishes training materials and organizes surveys and online competitions. It has held more than a hundred public machine learning contests, with prize funds reaching tens of thousands of dollars.
Participants in the annual Kaggle ML & DS Survey competition analyzed data from an online survey of Kaggle website users. They had to select any group represented in the survey and tell an interesting story about it based on the data. Projects were judged on presentation and originality, as well as on the clarity of the code and the reproducibility of the results.
We chose to analyze PhD holders. This topic interests us because we study the effectiveness of human capital, and of education in particular. Most of the survey data consisted of responses about the specific data skills a respondent has (e.g., Python programming or knowledge of a particular machine learning method).
Using the answers to these questions, we calculated a similarity metric between respondents and constructed a graph from which we drew interesting conclusions about the characteristics of the academic data science community. With this method we were able to identify several clusters within the PhD community, look at the differences in skills between groups of countries, and distinguish basic skills from more specialized ones.
Network analysis methods also allowed us to visualize the results. In addition, we showed how beneficial or disadvantageous it is, in terms of salary, to hold a PhD in different countries, and how existing gender discrimination in Data Science professions manifests itself in relation to women with a PhD.
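The similarity-graph approach described above can be illustrated with a minimal sketch. It is not the students' actual code: the article does not say which similarity metric or clustering method they used, so Jaccard similarity, the 0.5 threshold, connected components as clusters, and the toy respondent data are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each respondent maps to the set of skills they reported.
respondents = {
    "r1": {"Python", "SQL", "XGBoost"},
    "r2": {"Python", "SQL", "Random Forest"},
    "r3": {"R", "Bayesian methods"},
    "r4": {"R", "Bayesian methods", "SQL"},
}


def jaccard(a, b):
    """Share of skills two respondents have in common (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(skills, threshold):
    """Link two respondents when their similarity reaches the threshold."""
    graph = {name: set() for name in skills}
    for u, v in combinations(skills, 2):
        if jaccard(skills[u], skills[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Treat each connected component as one cluster (a simple stand-in
    for a real community detection algorithm)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


graph = build_graph(respondents, threshold=0.5)  # threshold is an assumption
clusters = connected_components(graph)
```

On this toy data, respondents with overlapping Python and SQL skills end up in one group and those with R and Bayesian methods in another. With real survey data, a dedicated graph library and a proper community detection method would be the more likely choice.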
The researchers say they chose the Master's Program "Applied Statistics with Network Analysis" because they are interested in applied data analysis. The program allows students to master a wide range of statistical methods, including network analysis, which is extremely popular in many scientific fields.