Student: Danil Gizdatullin
Title: Predicting the response of customers for a large telecommunication company based on machine learning methods
Faculty: School of Applied Mathematics and Information Science
Educational Programme: Bachelor’s programme
Year of Graduation: 2014
This project examines the problem of predicting customer behavior, specifically whether a customer disconnects the “Unlimited Opera” service. The main aim is to determine which combination of feature selection method and classification method is most appropriate for predictive tasks on these data. The research has two tasks: first, to build predictive models based on different feature selection and classification methods; second, to compare the quality of these models.

The problem is solved using the Knowledge Discovery in Databases (KDD) process. The first step is data selection. Data on customer behavior over three months were taken from the company’s data warehouse, and only users of the service were considered. The target attribute was then defined: 0 if the user continued to use the service, 1 otherwise. The second step is preprocessing, including data cleaning, which consisted of removing anomalous values and balancing the data. The third step is data transformation. This was one of the most laborious stages, because the untransformed data gave poor classification accuracy; through various data manipulations, the original 316 attributes were expanded to 3010. The fourth step is data mining, which consists of applying different feature selection algorithms and different classifiers based on machine learning techniques. The last step is comparing the combinations of methods and identifying the best one.

The work produced fairly accurate predictive models. In particular, multilayer neural networks and logistic regression performed well in combination with attribute selection based on the information gain measure and with the Relief method. The best models achieve a prediction accuracy of about 84.75%.
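The abstract does not say how balancing, the information gain ranking, the Relief variant or the classifiers were configured, so the following is only a minimal, dependency-free sketch of one balance, rank, train and score pass. It assumes random undersampling of the majority class, discrete (binary) features, a basic Relief variant using Manhattan distance over every row, and logistic regression fitted by plain gradient descent. All function names, the toy data and the parameters (`seed`, `lr`, `epochs`) are hypothetical and are not taken from the thesis.

```python
import math
import random
from collections import Counter

# Hypothetical toy pipeline (not the thesis code): balance -> rank -> train -> score.
# Rows hold discrete (here binary) feature values; labels follow the abstract:
# 0 = the user kept the service, 1 = the user disconnected it.

def undersample(rows, labels, seed=0):
    # Balance classes by randomly dropping majority-class rows (assumed strategy).
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for y in sorted(by_class):
        for row in rng.sample(by_class[y], n_min):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    # IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v).
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(x, []).append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def relief(rows, labels):
    # Basic Relief over every row: a feature gains weight when it differs on the
    # nearest miss (other class) and loses weight when it differs on the nearest hit.
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    for i, row in enumerate(rows):
        hits = [r for j, r in enumerate(rows) if j != i and labels[j] == labels[i]]
        misses = [r for j, r in enumerate(rows) if labels[j] != labels[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda r: manhattan(row, r))
        miss = min(misses, key=lambda r: manhattan(row, r))
        for f in range(d):
            weights[f] += (abs(row[f] - miss[f]) - abs(row[f] - hit[f])) / n
    return weights

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(rows, labels, lr=0.5, epochs=500):
    # Logistic regression fitted by full-batch gradient descent on the log-loss.
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

# Toy data: feature 0 carries the signal, feature 1 is noise, feature 2 is
# constant; classes are imbalanced 9:3.
rows = [[0, i % 2, 1] for i in range(9)] + [[1, i % 2, 1] for i in range(3)]
labels = [0] * 9 + [1] * 3

bal_rows, bal_labels = undersample(rows, labels)
ig = [information_gain([r[f] for r in bal_rows], bal_labels) for f in range(3)]
rf = relief(bal_rows, bal_labels)
top = sorted(range(3), key=lambda f: ig[f], reverse=True)[:1]  # keep the best feature
model = train_logreg([[r[f] for f in top] for r in bal_rows], bal_labels)
correct = sum(predict(model, [r[f] for f in top]) == y for r, y in zip(rows, labels))
acc = correct / len(rows)
print("IG:", ig, "Relief:", rf, "selected:", top, "accuracy:", acc)
```

Both rankings put the informative feature first and give the constant feature zero weight. To compare combinations as the abstract describes, one would repeat this pass for each pairing of ranking (`ig` or `rf`) and classifier (for example, a multilayer network in place of logistic regression). A real comparison would measure accuracy on held-out data, whereas this toy partly scores the rows it was trained on.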

Student theses at HSE must be completed in accordance with the University Rules and the regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

If a thesis is quoted or otherwise used, a reference to the author’s name and the source of the quotation is required.
