Research of a Data Lake Concept and an Architecture for Big Data Storage and Analytics in Healthcare

Student: Cherepov Dmitry

Supervisor: Olga A. Tsukanova

Faculty: Graduate School of Business

Educational Programme: Big Data Systems (Master)

Year of Graduation: 2020

The healthcare field generates huge volumes of data, and processing them is a non-trivial task that requires modern technological tools. Choosing the most suitable technologies for data storage and processing simplifies data analysis and improves the quality of the results. This paper examines the characteristics of medical data and derives requirements for a storage system; on that basis, the main types of storage systems are compared and a data lake is chosen. The features of the data lake architecture are investigated, and a comparative analysis of existing cloud and local solutions is carried out. A technology stack is selected to implement the necessary data storage and analysis capabilities, and a process for moving data through the data lake infrastructure is designed. The data lake is configured and launched in accordance with the developed structure. For testing, a medical dataset is selected (information on the coronavirus infection COVID-19 in each country of the world), for which a number of problems requiring further research are identified. During data analysis, visualizations and statistical analysis are performed, and time series forecasting models and machine learning models are tested. The experiment on the test dataset successfully demonstrates that the designed data lake infrastructure can solve the set tasks. The results might be useful both for researchers developing a methodological base for the analysis of medical big data and for practitioners and representatives of the healthcare sector.

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
