Student: Evgenii Tsymbalov
Title: Machine Learning for Analytics of Free-to-Play Games on Social and Mobile Platforms
Educational Programme: Data Science (Master’s programme)
Year of Graduation: 2016
To solve the churn prediction problem, a cohort-based ensemble meta-classifier is proposed. The data processing pipeline includes feature engineering and feature selection along with optimization. A parametric meta-metric, which penalizes undesirable outcomes of the cohort test procedure, was designed to reflect real-life experience of using prediction models. Every step of the model training pipeline includes classifier optimization; the final step optimizes the whole ensemble on the meta-metric values. Numerical experiments show the importance of the steps used in the model training pipeline.
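The summary does not give the formula of the meta-metric, so the following is only a minimal sketch of what a parametric, cohort-based metric that penalizes undesirable outcomes could look like. The function name `cohort_meta_metric`, the penalty parameters `fp_penalty` and `fn_penalty`, the per-cohort scoring rule and the sample data are all assumptions made for illustration, not the thesis's actual definition.

```python
from collections import defaultdict


def cohort_meta_metric(records, fp_penalty=1.0, fn_penalty=1.0):
    """Score predictions per cohort, penalizing false alarms and missed churners.

    records: iterable of (cohort, y_true, y_pred), where label 1 means "churned".
    The scoring rule below is a hypothetical choice, not the thesis's formula.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "n": 0})
    for cohort, y_true, y_pred in records:
        c = counts[cohort]
        c["n"] += 1
        if y_pred == 1 and y_true == 1:
            c["tp"] += 1
        elif y_pred == 1 and y_true == 0:
            c["fp"] += 1
        elif y_pred == 0 and y_true == 1:
            c["fn"] += 1

    # Reward correctly flagged churners, subtract weighted penalties,
    # and normalize by cohort size so cohorts are comparable.
    per_cohort = {
        cohort: (c["tp"] - fp_penalty * c["fp"] - fn_penalty * c["fn"]) / c["n"]
        for cohort, c in counts.items()
    }
    # Unweighted mean over cohorts: every cohort counts equally.
    overall = sum(per_cohort.values()) / len(per_cohort)
    return overall, per_cohort


# Made-up example: two weekly cohorts of four players each.
records = [
    ("week_1", 1, 1), ("week_1", 0, 1), ("week_1", 0, 0), ("week_1", 1, 0),
    ("week_2", 1, 1), ("week_2", 1, 1), ("week_2", 0, 0), ("week_2", 0, 0),
]
score, per_cohort = cohort_meta_metric(records, fp_penalty=0.5, fn_penalty=2.0)
print(score, per_cohort)
```

In this sketch the penalties are the tunable parameters: raising `fn_penalty` makes missed churners more costly than false alarms, which is one way a metric could be aligned with business priorities.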

All research stages used data from real-life online social game projects. Issues of practical implementation, such as resource constraints and the reinterpretation of results, were considered during model construction. The classifiers considered for the ensemble include state-of-the-art algorithms based on ensemble methods, namely random forest and gradient boosting, with decision trees as their base classification algorithm. Model parameters were selected using cross-validation; feature selection is based on statistical tests and classifier performance. Various pipeline strategies were examined. Optimization of the cohort-based ensemble meta-classifier is based on adjusting the classifiers’ thresholds.
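Threshold adjustment can be illustrated with a small, self-contained sketch: given churn scores from a single classifier on validation data, scan a grid of cut-offs and keep the one that maximizes a chosen metric. The F1 score stands in for the thesis's meta-metric here, and the function names, grid resolution and sample data are assumptions for illustration only.

```python
def f1_score(y_true, y_pred):
    """Plain F1 for binary labels (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(y_true, scores, metric=f1_score, steps=99):
    """Grid-search the decision threshold that maximizes `metric`.

    Thresholds are evenly spaced strictly between 0 and 1; on ties the
    lowest threshold wins because only strict improvements are accepted.
    """
    best_t, best_value = 0.5, float("-inf")
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        y_pred = [1 if s >= t else 0 for s in scores]
        value = metric(y_true, y_pred)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value


# Made-up validation labels and classifier scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.62, 0.80, 0.30, 0.55]
threshold, value = best_threshold(y_true, scores)
print(threshold, value)
```

For an ensemble, the same search could be repeated per classifier or run jointly over a tuple of thresholds, with any scoring function passed in through the `metric` argument.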

During the research, an algorithm for weekly data preprocessing and model retraining was developed and implemented at Webgames LLC. The algorithm is mostly automated and requires human assistance only at the step of choosing the meta-metric parameter. It can be used for churn prediction in various business fields, such as the banking, telecommunication and entertainment industries. The results obtained lay the groundwork for model improvement and generalization.

Keywords: machine learning, churn prediction, ensemble models, random forests, cohort-based metric, threshold optimization.

