Search and Justification of New Features for Enhancing Twitter Bot Detection's Accuracy Based on Genetic Algorithm and Naïve Bayes Approach

Student: Kiss Iris

Supervisor: Alexander A. Gorbunov

Faculty: Graduate School of Business

Educational Programme: Big Data Systems (Master)

Year of Graduation: 2021

Twitter’s popularity has attracted both individuals and organizations. To make tweeting more efficient, bots have been adopted that generate and share tweets automatically. Millions of accounts are now known to be bots, which has led researchers to look for ways of identifying bots on social media. These identification methods vary in their implementation, algorithms, and chosen features. The aim of this thesis is therefore to find new characteristics that algorithms can use to improve social media bot detection on Twitter. The first part of the thesis is a literature review that collects features already known to help detect bots. Using these features, a model based on the Naïve Bayes algorithm is developed to classify Twitter users as human or bot accounts, and its accuracy is tested. To find the best-performing combination of features, a genetic algorithm is implemented on the same basis. Lastly, new features are proposed and studied by examining how they affect the accuracy previously achieved. The literature review yielded 35 features commonly used for bot detection. Of these, a combination of 15 achieved the highest accuracy with the classification algorithm. Introducing a new feature improved the accuracy further, using a different combination of 14 selected features. To check whether the result is comparable with other tools, the classifier’s score on a different dataset was compared with that of “Botometer”, which attained slightly lower accuracy than the classifier presented here. In conclusion, a potentially new feature that improves bot detection has been identified: the linguistic complexity of a Twitter account. This feature can be analysed further and subsequently implemented in existing bot classification tools.
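The general idea of pairing a Naïve Bayes classifier with a genetic algorithm for feature selection can be illustrated with a small sketch. This is not the thesis's implementation: the Gaussian variant of Naïve Bayes, the synthetic six-feature data, the truncation selection, the one-point crossover, and all function names and parameter values below are assumptions chosen only to keep the example short and self-contained.

```python
import math
import random

def fit_gaussian_nb(rows, labels, mask):
    """Estimate log-prior, means and variances per class for the selected features."""
    model = {}
    for cls in set(labels):
        sub = [[v for v, keep in zip(r, mask) if keep]
               for r, y in zip(rows, labels) if y == cls]
        n = len(sub)
        means = [sum(col) / n for col in zip(*sub)]
        varis = [sum((v - m) ** 2 for v in col) / n + 1e-9
                 for col, m in zip(zip(*sub), means)]
        model[cls] = (math.log(n / len(rows)), means, varis)
    return model

def predict(model, row, mask):
    """Return the class with the highest log-posterior under the independence assumption."""
    x = [v for v, keep in zip(row, mask) if keep]
    def log_post(cls):
        prior, means, varis = model[cls]
        return prior + sum(-0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                           for v, m, s in zip(x, means, varis))
    return max(model, key=log_post)

def accuracy(mask, train, valid):
    """Fitness of a feature mask: share of correctly classified validation accounts."""
    (xtr, ytr), (xva, yva) = train, valid
    model = fit_gaussian_nb(xtr, ytr, mask)
    return sum(predict(model, r, mask) == y for r, y in zip(xva, yva)) / len(yva)

def genetic_search(n_features, fitness, rng, pop_size=20, generations=15, p_mut=0.1):
    """Evolve boolean feature masks; the full feature set is seeded as a baseline."""
    pop = [[True] * n_features] + [
        [rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]          # truncation selection
        children = [scored[0]]                     # elitism: keep the current best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = children
        best = max(pop + [best], key=fitness)
    return best

def make_data(n, rng):
    """Synthetic accounts (hypothetical): 2 informative features, 4 pure-noise features."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randrange(2)                       # 0 = human, 1 = bot
        rows.append([rng.gauss(2.0 * y, 1.0) for _ in range(2)]
                    + [rng.gauss(0.0, 1.0) for _ in range(4)])
        labels.append(y)
    return rows, labels

rng = random.Random(42)
train, valid = make_data(200, rng), make_data(100, rng)
fitness = lambda mask: accuracy(mask, train, valid)
best_mask = genetic_search(6, fitness, rng)
best_acc, all_features_acc = fitness(best_mask), fitness([True] * 6)
print("selected features:", [i for i, keep in enumerate(best_mask) if keep])
print("accuracy with selected features:", best_acc, "| with all features:", all_features_acc)
```

Because the full feature set is part of the initial population and the best mask is always carried over, the selected subset can never score below the all-features baseline on the validation data. A careful study would also hold out a separate test set, since the mask is tuned on the same data that measures its fitness.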

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
