
Scheduling of Data-Intensive Workflows

Student: Sopov Vitalii

Supervisor: Oleg V. Sukhoroslov

Faculty: Faculty of Computer Science

Educational Programme: Applied Mathematics and Information Science (Bachelor)

Final Grade: 9

Year of Graduation: 2019

Today, computations performed in large scientific experiments or in the business processes of commercial organizations run on tens or hundreds of machines and consume and generate terabytes or even petabytes of data. To automate such distributed computations, they are divided into many interdependent tasks, each executed on an individual machine. Together, these tasks form an application workflow. Using computing resources efficiently and reducing workflow execution time requires scheduling algorithms that account for the distributed nature of the application. Such algorithms must assign individual tasks to specific machines based not only on the machines' computing power but also on the time needed to transfer data between nodes, given the network topology and the state of the network links. Building on a widely known formal model of application workflows, this work proposes new techniques designed specifically for applications that operate on large amounts of data and applies them to existing scheduling algorithms. It reviews existing scheduling algorithms, examines the problems that arise as the volume of data transfers grows and that existing algorithms do not solve, and then proposes techniques that address these data transfer problems. The proposed techniques are implemented within existing scheduling algorithms. Tests on several real-life scientific workflows and on a set of synthetic configurations show that the modified algorithms are highly effective compared to the base algorithms and are ready for use in real-life systems.

Keywords: distributed computing, scheduling algorithms, application workflow, discrete-event simulation, big data, data-intensive.
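The trade-off described in the abstract, between a machine's computing power and the cost of moving data to it, can be illustrated with a small sketch. The Python code below is not the method proposed in this thesis. It is a simplified greedy earliest-finish-time list scheduler in the spirit of the well-known HEFT heuristic, and it rests on several assumptions. Task costs and data sizes are known in advance. All pairs of distinct machines share one uniform bandwidth, so network topology and link contention are ignored. Transfers between tasks on the same machine are free. Tasks are taken in plain topological order rather than by HEFT's upward rank, and no insertion into idle gaps is attempted. The names `schedule`, `makespan` and their parameters are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule(tasks, edges, speeds, bandwidth):
    """Greedy earliest-finish-time placement of workflow tasks (hypothetical sketch).

    tasks:     task -> amount of computation (work units)
    edges:     (src, dst) -> size of data sent from src to dst (data units)
    speeds:    machine -> computing speed (work units per second)
    bandwidth: uniform bandwidth between any two distinct machines (data units per second)
    Returns task -> (machine, start_time, finish_time).
    """
    preds = {t: [] for t in tasks}
    for src, dst in edges:
        preds[dst].append(src)

    # TopologicalSorter expects node -> predecessors; static_order() yields
    # every task after all of its predecessors.
    order = TopologicalSorter(preds).static_order()

    machine_free = {m: 0.0 for m in speeds}  # time at which each machine becomes idle
    placement = {}
    for task in order:
        best = None
        for machine, speed in speeds.items():
            # Inputs are ready once every predecessor has finished and its output
            # has been transferred to this machine (free if it is already here).
            data_ready = 0.0
            for p in preds[task]:
                p_machine, _, p_finish = placement[p]
                transfer = 0.0 if p_machine == machine else edges[(p, task)] / bandwidth
                data_ready = max(data_ready, p_finish + transfer)
            start = max(machine_free[machine], data_ready)
            finish = start + tasks[task] / speed
            # Keep the machine with the earliest finish time (the first one wins ties).
            if best is None or finish < best[2]:
                best = (machine, start, finish)
        placement[task] = best
        machine_free[best[0]] = best[2]
    return placement


def makespan(placement):
    """Completion time of the last task in the schedule."""
    return max(finish for _, _, finish in placement.values())


if __name__ == "__main__":
    # A small fork-join workflow a -> (b, c) -> d on two identical machines.
    tasks = {"a": 10, "b": 20, "c": 20, "d": 10}
    edges = {("a", "b"): 100, ("a", "c"): 100, ("b", "d"): 100, ("c", "d"): 100}
    speeds = {"m1": 10, "m2": 10}
    print(round(makespan(schedule(tasks, edges, speeds, bandwidth=10)), 3))    # 6.0
    print(round(makespan(schedule(tasks, edges, speeds, bandwidth=1000)), 3))  # 4.1
```

With slow links (bandwidth 10), the sketch keeps the whole workflow on one machine, because moving 100 data units would cost more time than running `b` and `c` in parallel saves. With fast links (bandwidth 1000), it spreads `b` and `c` across both machines and finishes earlier. As data volumes grow, this kind of placement decision becomes dominated by transfer costs.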

Full text (added May 19, 2019)

Student theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
