Year of Graduation
CRISPR-cassettes – clustering, sequence analysis, creating a web-service with bioinformatics applications
School of Applied Mathematics and Information Science
This study examines CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) cassettes, which are specialized regions of the genome. CRISPR systems perform a defensive function and share similar structural features. The work was carried out as part of a joint project with D. Zubankov, supported by RTCB IITP RAS, and analyzes existing information about CRISPR systems using mathematical approaches. The study describes a repeat-based clustering algorithm that computes the distance between every pair of CRISPR cassettes using dynamic programming methods derived from the Needleman-Wunsch algorithm. In addition, the work presents an efficient algorithm for building the prefix function, which is used to find string overlaps in the Shortest Common Supersequence problem. Finally, the last stage of the project is described: the development of a web application that lets scientists upload and examine CRISPR cassettes, including its technical and architectural features.
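The two algorithmic ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the function names, the unit mismatch and gap costs, and the separator character are assumptions chosen for illustration. The first part computes a Needleman-Wunsch-style global alignment cost between two repeat sequences and fills a pairwise distance matrix; the second builds the classical prefix function and uses it to find the longest overlap between the end of one string and the start of another.

```python
def nw_distance(a, b, mismatch=1, gap=1):
    """Global alignment cost between a and b (Needleman-Wunsch-style DP).

    Assumed scoring: match costs 0, mismatch and gap cost 1 each.
    Only two rows of the DP table are kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            curr.append(min(substitute, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_distances(seqs):
    """Symmetric matrix of alignment costs between every pair of sequences."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = nw_distance(seqs[i], seqs[j])
    return dist


def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def overlap(left, right, sep="\x00"):
    """Length of the longest suffix of left that is also a prefix of right.

    Assumes sep occurs in neither string, so a match cannot span it.
    """
    pi = prefix_function(right + sep + left)
    return pi[-1] if pi else 0
```

For example, `overlap("GATTACA", "ACAGT")` finds the shared fragment `ACA`, and `pairwise_distances` produces the kind of matrix that a clustering step could take as input. Both the prefix function and the overlap query run in time linear in the total length of the strings, while each alignment takes time proportional to the product of the two sequence lengths.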