
'At the Intersection of Mathematics, Biology, and Machine Learning, I Found My Place'

Aleksei Shmelev conducts research in genomics and uses machine learning to explore the history of human populations. In this interview with the HSE Young Scientists project, he discusses adaptive introgression from Denisovans into Tibetans and the use of IBD graphs to predict which population a person belongs to.

How I Started in Science

In high school, I focused most on three subjects: physics, mathematics, and biology. I found them all equally fascinating, took part in olympiads in each discipline, and won several prizes. However, when it came to choosing a field of study for university, I decided to focus on mathematics. The idea of dissecting frogs in biology labs didn’t appeal to me at all, whereas mathematics, as I saw it, was everywhere and offered broader opportunities for growth. Still, I believed that my knowledge of biology would prove useful someday, just in a different context.

In the first year of my bachelor’s programme, all students were required to choose an academic supervisor for a small research project. I remember that in one of our classes, Prof. Vladimir Shchur, who taught our mathematical analysis seminars, announced that he was opening a new International Laboratory for Statistical and Computational Genomics, right at the intersection of mathematics and biology. I was intrigued and decided to give it a try. 

My first assignment was to develop a maximum-likelihood method for dating events within a fixed topology of ancestral recombination graphs. The task proved challenging, and by the end of my first year I faced a choice: continue refining algorithms for this method or explore a new field, machine learning. By then, I had been wondering how a machine could be trained to 'think' and what kind of mathematics underpinned that process. It struck me as a promising field, one that could also be applied in the lab to an interdisciplinary area like genomics, which closely matched my interests. So I decided that, while I still had time to study, it was worth trying to solve problems specifically in machine learning. I had no idea at the time that I would still be working in this field today.

About Denisovans and Tibetans

My first major project was to study adaptive introgression, a process in which a gene variant enters a population from an outside source through interbreeding and eventually becomes crucial for the population’s survival. In Tibetans, this turned out to be a variant of the EPAS1 gene, which helps them live in low-oxygen conditions at high altitude. This variant is known to have been inherited from the Denisovans. We were interested in how many generations passed between the time Denisovans interbred with the ancestors of modern Tibetans and the point when natural selection began favouring this variant. Although methods exist to estimate the length of such a period, they remain quite imprecise.

We had only a single Denisovan genome and a few dozen modern Tibetan genomes. That dataset was far too small to train a model, so we simulated various scenarios for the length of the neutrality period (the time between interbreeding and the onset of selection) and computed various statistics for the genomic region surrounding EPAS1. To analyse them, we used contrastive learning: the model learns to map the data from each simulation to a compact vector representation (an embedding), so that similar scenarios cluster together and different ones are pushed apart. When we applied the trained model to real data, we found that the real data fell into clusters corresponding to the simulated scenarios. This allowed us to estimate the period of neutrality accurately and confirm that the real data were consistent with our simulations.
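To make the contrastive step more concrete, here is a minimal sketch in plain Python. It assumes one common objective, the supervised contrastive loss, in which simulations that share a scenario label (for example, a binned neutrality-period length) count as positive pairs; the interview does not say which contrastive objective was actually used. The encoder that would produce the embeddings is omitted, and the function names, the temperature of 0.5, and the nearest-centroid rule for placing a real-data embedding into a scenario cluster are illustrative assumptions.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    """Cosine similarity of two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Supervised contrastive loss over one batch of simulation embeddings.

    labels[i] is the (hypothetical) scenario id of simulation i. Pairs with the
    same label are positives; every other sample in the batch is a negative.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a positive pair contributes nothing
        # Cosine similarities lie in [-1, 1], so for a moderate temperature
        # the scaled logits stay small and exp() does not overflow.
        logits = {a: cosine(embeddings[i], embeddings[a]) / temperature
                  for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average -log softmax probability of this anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("batch needs at least two samples with the same label")
    return total / anchors


def nearest_scenario(query, embeddings, labels):
    """Assign a new embedding (e.g. from real data) to the closest scenario centroid."""
    groups = defaultdict(list)
    for e, lab in zip(embeddings, labels):
        groups[lab].append(normalize(e))
    best, best_sim = None, -math.inf
    for lab, vecs in groups.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        sim = cosine(query, centroid)
        if sim > best_sim:
            best, best_sim = lab, sim
    return best


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Grouping the two similar pairs together gives a lower loss than mixing them.
    print(supervised_contrastive_loss(emb, [0, 0, 1, 1])
          < supervised_contrastive_loss(emb, [0, 1, 0, 1]))  # True
    print(nearest_scenario([0.95, 0.05], emb, ["short", "short", "long", "long"]))  # short
```

In a real pipeline the loss would be minimised by gradient descent over the encoder's parameters in a deep learning framework; the sketch only evaluates it, to show which pairs it rewards.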

Photo: HSE University

Breakthrough in the Study of Closely Related Human Populations

Another project arose through collaboration with Genotek, a genetic research company that aimed to improve the accuracy of predicting population membership from microarray genotyping data of modern humans. We proposed a graph-based model built on IBD (identical-by-descent) segments, stretches of DNA inherited from a common ancestor. The idea is that if two individuals share a long segment of their genome, they most likely had a relatively recent common ancestor.

We constructed an IBD graph in which each vertex represented an individual, and the weight of an edge between two vertices reflected their genetic similarity, calculated as the sum of the lengths of shared IBD segments. The graph was not necessarily complete: an edge was drawn only if the similarity between two individuals exceeded a certain threshold. We then used this graph to train graph neural networks (GNNs), which learned to predict the population membership of each vertex. For each new client, we first calculated the shared DNA segments between them and all individuals in the database. The client was then added to the graph as a new vertex, and the model, using this updated graph, generated a probability distribution over populations for that individual. This approach proved more accurate than existing methods and was better at classifying closely related populations. Based on these results, Genotek purchased a license to use our method.
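The graph construction can be sketched in plain Python as follows. Each vertex is an individual, the edge weight is the summed length of shared IBD segments, and an edge is kept only above a threshold; a new client is attached in the same way. The segment units (centimorgans), the threshold value and all function names are hypothetical. For the prediction step, the sketch deliberately swaps in a much simpler, non-learned technique, an edge-weighted vote over labelled neighbours, in place of the trained graph neural network, so it only illustrates how a new vertex's neighbourhood yields a probability distribution over populations.

```python
from collections import defaultdict


def build_ibd_graph(segments, threshold):
    """Build a weighted, undirected IBD graph as an adjacency dict.

    segments: (person_a, person_b, length) tuples, one per shared IBD segment;
              lengths are assumed to be in centimorgans (hypothetical choice).
    threshold: minimum summed length for an edge to be kept (hypothetical value).
    """
    totals = defaultdict(float)
    for a, b, length in segments:
        if a == b:
            continue  # ignore malformed self-pairs
        totals[tuple(sorted((a, b)))] += length
    graph = defaultdict(dict)
    for (u, v), weight in totals.items():
        if weight > threshold:  # sparse graph: weak similarities get no edge
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


def add_client(graph, client, client_segments, threshold):
    """Attach a new individual using their IBD segments with database members."""
    totals = defaultdict(float)
    for other, length in client_segments:
        totals[other] += length
    for other, weight in totals.items():
        if weight > threshold:
            graph[client][other] = weight
            graph[other][client] = weight


def predict_population(graph, labels, vertex):
    """Edge-weighted neighbour vote, a non-learned stand-in for the GNN.

    Returns a probability distribution over the populations of the vertex's
    labelled neighbours, or an empty dict if it has none.
    """
    scores = defaultdict(float)
    for neighbour, weight in graph.get(vertex, {}).items():
        if neighbour in labels:
            scores[labels[neighbour]] += weight
    total = sum(scores.values())
    return {pop: s / total for pop, s in scores.items()} if total else {}


if __name__ == "__main__":
    segments = [("A", "B", 30.0), ("A", "B", 15.0), ("A", "C", 5.0), ("C", "D", 40.0)]
    graph = build_ibd_graph(segments, threshold=20.0)
    labels = {"A": "pop1", "B": "pop1", "C": "pop2", "D": "pop2"}
    add_client(graph, "client", [("A", 25.0), ("B", 35.0), ("D", 30.0)], threshold=20.0)
    print(predict_population(graph, labels, "client"))  # pop1 ≈ 0.67, pop2 ≈ 0.33
```

A GNN replaces this fixed vote with a learned aggregation of neighbour features, possibly over several hops, trained on the labelled vertices.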

What I Take Pride In

I wouldn’t say there is a single result I am most proud of. In the problems we tackle, we often need to draw ideas from various areas of machine learning, from computer vision (CV) and natural language processing (NLP) to time series forecasting, and adapt them to our specific genetic data. However, applying methods from other fields carelessly, without considering the specifics of biological data, usually leads to low accuracy or to results that are difficult to apply in practice. We therefore have to examine carefully which aspects of a method can be transferred to our data and where we need to develop our own approaches. Although genomics is not yet widely popular among machine learning specialists, I expect significant progress in the study of organismal evolution as more data accumulate and methods improve. I hope that our current work will prove useful and contribute to this progress. In this regard, I am grateful to be part of a team working together toward this goal.

My Dream

I am grateful to have the opportunity to work on things that interest me, alongside people who share the same passion. I hope that over time, there will be even more such people.

I believe that any activity can eventually become a science if it accumulates experience and develops a community, along with established rules and methods. But to create something new in a field, one must first master the existing rules and understand which problems are truly relevant. This is what distinguishes an undergraduate from a professor. Tennis is a good example. Many people can simply hit a ball with a racket, but few can play at the level of Roger Federer. Behind this mastery are refined technique, innovative shots, and constant training. In a similar way, I believe the film industry, like many other fields, has long since evolved into a science. For me, engaging in science is not only an opportunity to study and advance a chosen field but also a chance to discuss it with people who have a deeper understanding and can point out what truly deserves attention.

If I Hadn't Become a Scientist

I would have chosen one of three careers in the film industry—director of photography, computer graphics specialist, or editor. I became interested in filmmaking back in school. My friends and I filmed short reports about the sights of our country and the world. We travelled to locations ourselves and worked in the style of Heads and Tails (‘Орёл и решка’), a popular show at the time. However, our films did not gain much traction online, so we eventually turned to cycling, which one of my friends was passionate about. This is how our French Rider cycling channel gradually came to life, featuring reviews of bicycles and accessories, as well as clips and reports from exhibitions. There’s even a small behind-the-scenes segment at the end of one of our videos where you can see me at work.

I’ve always been fascinated by how high-quality special effects are created in Hollywood films. Sometimes, when you watch a breakdown of a scene, you realise that a large portion was CGI, even though it appeared completely natural during the first viewing. I was especially interested in the software used to create such realistic effects and simulations. In the first year of my bachelor’s programme, I decided to try my hand at 3D modelling in Blender, and I was instantly fascinated. At one point, as part of a research project, we even created an animation to illustrate the concept behind the method our team was developing. To this day, when scientific projects require visualisation, I sometimes draw on my modest 3D modelling skills.

I prefer to stay behind the scenes—planning the composition, controlling the lights and equipment, and then assembling everything during the editing stage. Over the years, I’ve accumulated several lenses, and more than one generation of cameras has come and gone. I’m interested in both photography and videography, but video appeals to me more because it combines image, motion, and sound into a single whole and offers greater freedom for experimentation. I haven’t started my own blog yet—I don’t have enough time—but I would like to.

Photo: HSE University

Who I Would Like to Meet

When I was working on my bachelor’s thesis, one chapter was devoted to adaptive introgression. Our contrastive learning-based method performed well in simulations, but the lack of validation on real data limited the value of my work. I came across the article 'The History and Evolution of the Denisovan-EPAS1 Haplotype in Tibetans' (PNAS, 2021), in which the authors addressed an almost identical problem using classical methods and had access to the Denisovan and Tibetan genomes I needed, which were not publicly available. The lead author of the article was Xinjun Zhang (Department of Ecology and Evolutionary Biology, University of California, Los Angeles, USA). I contacted her several times by email, and eventually Prof. Shchur and I had an online discussion with her about our work. As a result, we obtained the data, which allowed me to test my method successfully under real-world conditions.

It was a very important experience for me. Someone I had never met before looked into our problem and helped us obtain the data, without which it would have been difficult to give my work real scientific value. I believe this is how the scientific community should function—when researchers from around the world are willing to support one another. This work is not yet complete, but I hope that in the near future we will be able to continue it together, combining the efforts of our laboratory with those of colleagues from other countries.

A Typical Day for Me

Most machine learning problems involve conducting a large number of experiments in one way or another. While theoretical justification is important, convincing the reviewers of your future paper that a method truly works usually requires extensive comparisons with similar approaches and validation on established benchmarks. Therefore, everything begins with a plan: you need to decide which experiments to conduct, in what order, and what questions they should answer. Once we have discussed such a plan in the laboratory, I begin implementing it. 

Typically, my day begins by turning on the computer and checking what the model has computed overnight—what metrics have appeared and whether everything is progressing in the right direction. While having breakfast, I think about what new experiments can be conducted, what adjustments are needed, and how to fix any issues that arose. During the day, I work on coding and analyse intermediate results. By evening, I try to set up new training tasks so that the GPU can keep working overnight.

Photo: HSE University

Whether I Have Experienced Burnout

I wouldn’t say that I have experienced burnout in the usual sense. I don’t lose interest in the task at hand; on the contrary, I strive to complete it whenever I see a promising solution. Rather, there are times when deadlines require launching new experiments urgently, leaving me with almost no energy to write code. In machine learning, you also have to write many auxiliary scripts to test hypotheses: plotting distributions, calculating metrics, and visualising the model’s predictions. This used to take a lot of time, and the resulting code was often a one-off solution, suitable only for a specific experimental setup. Such tasks have now become much easier: I can ‘vibe code’, that is, explain to a large language model, by voice or text, exactly what needs to be done and receive ready-to-use working code, while also practising a foreign language. Of course, the generated code still needs to be checked, but for quickly testing simple hypotheses it works very well and significantly speeds up the process.

Also, if I feel tired, I can simply go for a walk to shift my focus, and when I return, I can tackle more complex tasks that require considering the specifics of my data and carefully thinking through the details myself.

My Interests besides Science

I often play table tennis and basketball with friends during the warmer months, and I practised lawn tennis for many years while at school. I also completed a piano course at music school and still keep in touch with some friends from that time; in particular, we run our cycling channel together. Unfortunately, I don’t have much time to practise music these days, so I rarely play the piano.

Photo: HSE University

What I Have Been Reading Lately

A while ago, a friend recommended that I read Elon Musk’s biography. Recently, I finally found the time to start it. I chose the version written by the American author and journalist Walter Isaacson.

From my perspective, Elon Musk’s business is not only multifaceted, spanning space, the automotive industry, biology, and artificial intelligence, but also highly knowledge-intensive, as genuinely new technologies are being developed under his leadership. I am interested in how he keeps moving forward when faced with challenges and doubts about his ideas, and which key decisions enable him to do so. I don’t know if the book will answer all my questions, but I believe one can certainly learn strategic thinking from him, as well as the ability to persevere through obstacles and see plans through to completion.

What I Have Been Watching Lately

I enjoy films directed by Luc Besson and Guy Ritchie. Their styles would probably align most closely with mine if I were a director myself. I sometimes like to rewatch some of their works, particularly Sherlock Holmes.

Advice for Aspiring Scientists

Find a research field that genuinely interests you. Don’t be afraid to try new things, and don’t give up if something doesn’t work out the first time. Take advantage of every opportunity for growth—participate in internships, summer schools, workshops, and conferences. As long as you have the time and energy, invest it in your own development.

My Favourite Place in Moscow

The GES-2 House of Culture. I appreciate it when historical buildings are not left to decay but are restored and repurposed. This approach preserves the diversity of urban architecture while giving these buildings a new lease on life.