Year of Graduation
Automating Analysis of User Review Texts
Mathematical Methods of Modelling and Computer Technologies
This paper is devoted to automating the analysis of passenger car user comments on the Internet (from the site auto.ru). To this end, I developed software that downloads 35797 comments (in Russian) and presents them as a user-friendly database. I also selected 111 keyword combinations that reflect advantages or drawbacks of car usage. By applying the Annotated Suffix Tree method, I obtained a 35797 x 111 matrix of relevance indices between the comment texts and the keywords. In this matrix, the entries related to car drawbacks are made negative. Applying k-means clustering with the experimentally determined K=9 leads to a very stable cluster structure that admits a natural interpretation. In particular, there are 4 clusters related to various aspects of dynamics, comfort, etc. The distributions of car brands over the clusters are analyzed as well.
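The sign convention and the clustering step can be illustrated with a minimal sketch. It is not the software developed in this work: the toy relevance values, the function names, and the deterministic farthest-first initialization are assumptions made for illustration, and the Annotated Suffix Tree scoring itself is not reproduced here. The real input would be the 35797 x 111 relevance matrix with K=9.

```python
def sign_relevance(matrix, drawback_columns):
    """Negate the columns whose keywords describe drawbacks."""
    drawback = set(drawback_columns)
    return [[-v if j in drawback else v for j, v in enumerate(row)]
            for row in matrix]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, max_iter=100):
    """Plain k-means with farthest-first initialization (an assumption)."""
    # Start from the first row, then repeatedly add the row that is
    # farthest from its nearest already chosen centroid.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(
            rows,
            key=lambda r: min(squared_distance(r, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: 4 comments x 2 keywords.
# Column 0 stands for an advantage keyword, column 1 for a drawback keyword.
relevance = [
    [0.9, 0.1],
    [0.8, 0.0],
    [0.1, 0.9],
    [0.0, 0.8],
]
signed = sign_relevance(relevance, drawback_columns=[1])
labels, centroids = k_means(signed, k=2)
```

In this toy case the first two comments, dominated by the advantage keyword, fall into one cluster, and the last two, dominated by the drawback keyword, fall into the other. Making drawback entries negative places praise and criticism on opposite sides of zero, so cluster centroids can be read directly as profiles of positive and negative aspects.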