
Automating the Process of Managing a Data Collection System from Open Sources

Student: Burnashov Evgeniy

Supervisor: Kirill Lychagin

Faculty: HSE Tikhonov Moscow Institute of Electronics and Mathematics (MIEM HSE)

Educational Programme: Computer Systems and Networks (Master)

Final Grade: 7

Year of Graduation: 2019

The aim of this final qualifying paper is to automate the management of a data collection system that gathers information from open sources. The system consists of several heterogeneous web robots, and the work seeks both to simplify their management and to create a common infrastructure for controlling them. Web robots that collect information on the Internet are called “web crawlers” or simply “crawlers”; they are applications usually written in Python or Java. Crawlers have different properties and perform similar but not identical functions. One way to build a common management system for heterogeneous crawlers is containerization. This technology places each application in a container: a lightweight, isolated environment that packages the application together with the libraries and files it needs while sharing the kernel of the host operating system. The infrastructure is based on Docker Swarm, and the crawlers used in the system were written in Python and Java. The work consists of an introduction, three chapters, and a conclusion, and includes 21 illustrations and 25 sources.
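Containerization gives heterogeneous crawlers a common control point because the manager only has to know each crawler's container image, not whether Python or Java runs inside it. The Python sketch below illustrates that idea; it is not the author's implementation. The service names, image names, and the `START_URL` environment variable are hypothetical, and the helpers only build standard `docker service create`, `docker service scale`, and `docker service rm` command lines instead of executing them.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Crawler:
    """One crawler as seen by the manager (all example values are hypothetical)."""
    name: str                 # Docker Swarm service name
    image: str                # container image; its language (Python/Java) is irrelevant here
    replicas: int = 1         # number of copies Swarm should keep running
    env: dict[str, str] = field(default_factory=dict)  # settings passed as env variables


def create_command(c: Crawler) -> list[str]:
    """Build a `docker service create` argument list for one crawler."""
    cmd = ["docker", "service", "create", "--name", c.name,
           "--replicas", str(c.replicas)]
    # Sort keys so the generated command is deterministic.
    for key, value in sorted(c.env.items()):
        cmd += ["--env", f"{key}={value}"]
    cmd.append(c.image)  # options must precede the image name
    return cmd


def scale_command(name: str, replicas: int) -> list[str]:
    """Build a command that changes how many replicas of a crawler run."""
    return ["docker", "service", "scale", f"{name}={replicas}"]


def remove_command(name: str) -> list[str]:
    """Build a command that stops and removes a crawler service."""
    return ["docker", "service", "rm", name]


if __name__ == "__main__":
    # Hypothetical crawlers: one packaged from Python code, one from Java code.
    crawlers = [
        Crawler("news-crawler", "example/news-crawler-py:1.0", replicas=3,
                env={"START_URL": "https://example.com"}),
        Crawler("forum-crawler", "example/forum-crawler-java:1.0"),
    ]
    for c in crawlers:
        print(shlex.join(create_command(c)))
    print(shlex.join(scale_command("news-crawler", 5)))
    print(shlex.join(remove_command("forum-crawler")))
```

On a Swarm manager node, each list could be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution lets the logic be checked without a running Docker daemon.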

Full text (added May 26, 2019)

Student theses at HSE must be completed in accordance with the University Rules and the regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of the quotation is required.
