Applying recursive tensor neural network model to sentiment analysis of internet shop reviews in Russian language
School of Applied Mathematics and Information Science
As the number of users who post their opinions on the internet grows, manual processing of these opinions has become impossible, and the field of automatic sentiment analysis is developing rapidly. One recent method of sentiment analysis, the Recursive Neural Tensor Model, has shown much better results on English texts than earlier approaches. The main goals of this study are to develop a program that implements this technique for texts in Russian, to reveal its advantages and disadvantages, and to assess whether it can be applied in practice to sentiment analysis of internet shop reviews.
To test the model, a sample of 247 reviews of a particular internet shop was extracted from the Yandex.Market website. Each review contains a rating of the shop’s quality on a five-point scale. This rating is taken as the estimate of the sentiment of every sentence in the review. For each sentence in the sample, a binary dependency tree was constructed using the “АОТ” morphology analyzer and a software module for interactive syntax analysis developed by the authors of this study. The reviewers’ original spelling was retained in the sample.
The studied model was implemented in C++. It was compared with classic sentiment analysis methods based on the bag-of-words model in terms of the accuracy of binary classification of reviews into two classes (ratings from “1” to “2” and ratings from “3” to “5”) using only one sentence from each review. The experiments showed that, although the studied model takes syntactic information into account, its accuracy is lower than that of one of the classic methods, which does not require this information.
Moreover, unlike the classic techniques, whose accuracy tends to rise as the number of words in a sentence increases, the Recursive Neural Tensor Model shows good accuracy for short and medium-sized sentences (up to 15 words), while its accuracy for long sentences is significantly lower. This behavior may be caused by the accumulation of a large number of grammatical errors in the sample and by the fact that, unlike in the original study, the sentiment of each phrase in each sentence is unknown for the sample used.
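The evaluation protocol described above (binary classes derived from the five-point rating, accuracy compared for sentences of up to 15 words and for longer ones) can be illustrated with a minimal Python sketch. It is not the study's C++ implementation: the function names, the whitespace tokenization, and the `predict` callable are assumptions made only for illustration.

```python
from collections import defaultdict


def rating_to_class(rating):
    """Map a five-point rating to the two classes: 0 for 1-2, 1 for 3-5."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return 0 if rating <= 2 else 1


def length_bucket(sentence, short_max=15):
    """Bucket a sentence by word count (whitespace split is an assumption)."""
    n_words = len(sentence.split())
    return "short_or_medium" if n_words <= short_max else "long"


def accuracy_by_length(samples, predict):
    """Compute accuracy per length bucket.

    samples: iterable of (sentence, rating) pairs.
    predict: hypothetical classifier, a callable mapping a sentence to 0 or 1.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sentence, rating in samples:
        bucket = length_bucket(sentence)
        total[bucket] += 1
        if predict(sentence) == rating_to_class(rating):
            correct[bucket] += 1
    return {bucket: correct[bucket] / total[bucket] for bucket in total}
```

Any classifier, whether a bag-of-words baseline or the recursive model, can be passed as `predict`, so the same function yields comparable per-bucket accuracies for each method.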