Learning String Kernels Utilizing Adaptive Variable Length Compression

Student: Filatov Gleb

Supervisor: Attila Kertesz-Farkas

Faculty: Faculty of Computer Science

Educational Programme: Data Science (Master)

Final Grade: 7

Year of Graduation: 2017

Many approaches currently exist for classifying protein strings in bioinformatics, and one family of them is based on string-specific distance or similarity measures. This thesis presents a new convolutional kernel over strings, the LZW-kernel, which is based on the Lempel-Ziv-Welch compression algorithm. The motivation for a new method is the tradeoff between the computational complexity of the algorithm that computes the distance or similarity between strings and the quality of the resulting measure when it is used in machine learning algorithms. Our goal is to narrow this gap by using an algorithm that is less computationally intensive than the current leader, the Smith-Waterman alignment score. The LZW-kernel was tested with two classification algorithms: SVM and k-NN with various numbers of neighbors. It achieved better classification quality than any of the quadratic-time string comparison methods, but did not surpass the classification quality of the leader, which is cubic in time.
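
The summary does not give the kernel's definition, so the following is only a minimal sketch of the general idea of comparing strings through the phrases that an LZW-style dictionary pass produces. The function names `lzw_phrases` and `shared_phrase_similarity`, and the choice of a normalized shared-phrase count as the similarity, are assumptions made for illustration and are not taken from the thesis.

```python
def lzw_phrases(s):
    """Return the multi-character phrases an LZW-style pass adds to its dictionary.

    Illustrative helper (hypothetical name): the dictionary starts with the
    single characters of s and grows by one phrase each time the current
    match cannot be extended.
    """
    dictionary = set(s)      # initial dictionary: single characters
    phrases = set()          # phrases added during the scan
    current = ""
    for ch in s:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate          # keep extending the match
        else:
            dictionary.add(candidate)    # new phrase enters the dictionary
            phrases.add(candidate)
            current = ch                 # restart matching from this character
    return phrases


def shared_phrase_similarity(a, b):
    """Toy similarity (an assumption, not the thesis's kernel):
    number of shared phrases, normalized by the geometric mean of the
    two phrase-set sizes. Returns 0.0 if either set is empty."""
    pa, pb = lzw_phrases(a), lzw_phrases(b)
    if not pa or not pb:
        return 0.0
    return len(pa & pb) / (len(pa) * len(pb)) ** 0.5
```

Each call scans its input once, which is the kind of cost profile that makes compression-based measures attractive next to alignment scores. A matrix of such pairwise values could then be passed to an SVM or k-NN classifier as a precomputed similarity.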

Full text (added May 30, 2017)

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.