
Improving Medical Data Enclosure Using Differential Privacy

Student: Dornostup Olga

Supervisor: Sergey Lisitsyn

Faculty: Graduate School of Business

Educational Programme: Big Data Systems (Master)

Year of Graduation: 2020

The amount of data that is digitally recorded, collected, and stored grows rapidly every day. Some of these data contain personal information about individuals and are considered extremely sensitive or confidential. However, data mining, sharing, and publishing give rise to many privacy breaches that can lead to data disclosure. A vivid example of such data is the collections produced in the medical and healthcare sectors. On the one hand, sharing these data expands general knowledge about people's health, drives the development of digital healthcare services and platforms, and makes it possible to build predictive models that help prevent diseases. On the other hand, people hesitate to share health-related information because they worry about third parties gaining access to their personal data and how it might be used. Unfortunately, classic approaches to data protection based on anonymization cannot withstand an adversary who has background knowledge about the individuals in the database. In this paper, I consider differential privacy, one of the most popular and powerful definitions of privacy, which is attracting growing attention and withstands this kind of attack. Differential privacy is also proposed as an appropriate approach for protecting health data. I aimed to investigate the possible compromise between keeping sensitive data protected and retaining adequate data utility. In particular, I focused on differentially private implementations of two common supervised learning algorithms for classification, naïve Bayes and logistic regression, as well as on unsupervised learning techniques such as dimensionality reduction and data clustering.
It was shown that, in particular situations, the algorithms' performance does not fall dramatically while the privacy of individuals in the dataset is not violated.
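
The compromise between privacy and utility described in the abstract can be illustrated with the Laplace mechanism, a standard way to obtain ε-differential privacy for numeric queries. The following is a minimal sketch and not the implementation used in the thesis: the toy records, the `has_condition` field, and the helper names `laplace_noise`, `private_count`, and `mean_abs_error` are all hypothetical, and only the Python standard library is used.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return an epsilon-differentially private count of matching records.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(records, predicate, epsilon, trials, seed):
    """Average absolute error of the private count over repeated releases."""
    rng = random.Random(seed)
    true_count = sum(1 for record in records if predicate(record))
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(records, predicate, epsilon, rng) - true_count)
    return total / trials


# Hypothetical toy dataset: 1000 patients, every tenth one has the condition.
records = [{"has_condition": i % 10 == 0} for i in range(1000)]


def has_condition(record):
    return record["has_condition"]


# Stronger privacy (smaller epsilon) means more noise and lower utility.
error_strong_privacy = mean_abs_error(records, has_condition, 0.1, 2000, seed=0)
error_weak_privacy = mean_abs_error(records, has_condition, 1.0, 2000, seed=0)
```

In this sketch the expected absolute error equals the noise scale 1/ε, so lowering ε from 1.0 to 0.1 makes the released count roughly ten times less accurate on average. Differentially private versions of naïve Bayes or logistic regression face the same kind of trade-off, although the mechanisms involved are more elaborate than a single noisy count.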

