Year of Graduation
Predicting the response of customers for a large telecommunication company based on machine learning methods
School of Applied Mathematics and Information Science
This project addresses the problem of predicting customer behavior, specifically the disconnection of the «Unlimited Opera» service. The main aim is to determine which combination of feature selection method and classification method is most appropriate for predictive tasks on this data. The research has two tasks: first, to build predictive models based on different feature selection and classification methods; second, to compare the quality of these models.

The problem is solved using the Knowledge Discovery in Databases (KDD) process. The first step is data selection. The data on customer behavior were taken from the company's data warehouse and cover a period of three months. Only data about users of the service were considered. The target attribute was then set: 0 if the user continued to use the service, and 1 otherwise. The second step is preprocessing, which consists of data cleaning, including the removal of anomalous values, and data balancing. The third step is data transformation. It was one of the most laborious steps, because the untransformed data gave poor classification accuracy; for this reason, the original 316 attributes were expanded to 3010 through various data manipulations. The fourth step is data mining, which consists of applying different feature selection algorithms and different types of classifiers based on machine learning techniques. The last step is comparing the different combinations of methods and identifying the best one.

As a result, the work produced fairly accurate predictive models. In particular, Multilayer Neural Networks and Logistic Regression performed well in combination with attribute selection based on the information gain measure and with the Relief method. The prediction accuracy of the best models is about 84.75%.
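To illustrate the attribute selection step based on the information gain measure, the following is a minimal sketch rather than the implementation used in this work. It assumes discrete-valued attributes and the binary target described above (1 = disconnected, 0 = continued). The function names, the attribute names (`low_traffic`, `roaming`, `prepaid`) and the toy values are hypothetical and do not come from the company's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in target entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the target within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by decreasing information gain."""
    gains = {name: information_gain(values, labels)
             for name, values in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data: 1 = disconnected the service, 0 = kept it.
target = [1, 1, 1, 1, 0, 0, 0, 0]
toy_columns = {
    "low_traffic": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes exactly
    "roaming":     [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the target
    "prepaid":     [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
}
ranking = rank_features(toy_columns, target)
```

In a pipeline of the kind described here, only the top-ranked attributes would be passed on to a classifier such as logistic regression; continuous attributes would first need to be discretized, which this sketch does not cover.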