‘HSE Academic Environment Helped Me to Make a Soft Transition to the New Era of Computer Vision’
This academic year, HSE University launched the first online master's programme, ‘Master of Computer Vision’, supervised by Professor Andrey Savchenko. Alexander Rassadin, a graduate of the Faculty of Informatics, Mathematics, and Computer Science (HSE Nizhny Novgorod) and an active participant in many CV projects, is delivering the course ‘Deep Learning for Computer Vision’ as part of the curriculum for this new programme. Alexander told us how he once wrote an algorithm for robot movement, about the moment he realized what his dream job was, and why analyzing sports games is more interesting than predicting a tsunami.
When did you first get interested in computer vision?
This happened in 2014, when I graduated from my bachelor's programme at another university and enrolled on the master's programme at HSE University. I could say it happened by inertia. As part of my research, I was working on mathematical models, such as tsunami forecasting.
The programme ‘Master of Computer Vision’ was developed at HSE University in Nizhny Novgorod by researchers at the Faculty of Informatics, Mathematics, and Computer Science and leaders from the computer vision industry, including experts from Huawei, itSeez3D, Intel, Harman and Xperience.ai. Since Intel developed the OpenCV library in Nizhny Novgorod in the early 2000s, the city has become a significant global centre for computer vision. The creators of this library work in leading IT companies and regularly invite graduates of HSE University in Nizhny Novgorod to work in the computer vision industry.
At the same time, I was working as a part-time programmer. Since school I've been interested in algorithms, but I've always wanted to do something tangible that I can try in real life and explain or show to anyone, even someone without professional skills. Unfortunately, neither research activity related to solving equations, nor bug fixing (the process of changing a system or product to handle a programming bug or glitch. - Ed.) allowed me to fully express myself.
Once, while studying a robotics course, I chose the task of writing an algorithm for robot movement based on pointers on the surface. I immediately felt that this was what I wanted. I had no experience in computer vision at that time and didn't really know what it was. Nevertheless, I successfully defended the project: I segmented the arrows on the floor using the OpenCV library. After that I understood that I wanted to do something along these lines.
Less than three months later, my work project ended, but I learnt that a new computer vision startup was recruiting specialists. I still can't believe that I successfully passed the selection process and got there. A year later, they let me head a small team. This would not have happened if I had been engaged in another field. I completely dedicated myself to this new field, something I had simply never done before.
Did studying at HSE University contribute to your development in the industry?
It was my interest in CV that prompted me to change university and attend HSE. During my master’s studies at my first university, I felt disappointed with the quality of education and research. I saw neither prospects for professional development nor the opportunity to build my career: my classmates worked as regular programmers or in the department.
Computer vision systems and applications allow you to extract information from the array of images accumulated by machines and classify it to identify patterns, make predictions and get rid of routine tasks. CV technologies are becoming more precise every year. Five to ten years ago, machines recognized only 65-70% of objects. Today, computers can recognize up to 98% of objects.
As I developed as a CV specialist, these thoughts began to take root. As a result, I made a radical decision: to change my master's programme in the middle of my studies. It turned out to be the right decision. The main HSE building (in Nizhny Novgorod. - Ed.) was on the next street, and I had already talked to HSE students and teachers, so in general I didn’t feel like a stranger at the university. After enrolling, I realized that the programme I chose was as close as possible to what I was looking for, and sometimes it matched perfectly. For example, at work we were developing models for face recognition while I was studying the same topic at HSE.
From my first days of studying, I started to fill in the gaps in my knowledge. My colleagues were older and noticeably more experienced, and communicating with them helped me enormously to develop in the field. At that time, classical computer vision was in decline: it was a time when there were no neural networks, or only a few of them. Thus, I found myself in a period of transition, and it was HSE’s academic environment that helped me to make a relatively easy transition to the new era. Without that knowledge and the people around me, it would have been much more difficult to switch. After the first year of my studies, Andrey Savchenko invited me to join a research group, not just to apply neural network methods, but to engage in their research and development.
What projects and developments in the field of CV have you participated in?
I have devoted most of my professional activity to working on solutions in the field of video surveillance: selecting objects in the frame and analyzing them, which includes detecting static and dynamic characteristics and visible and invisible attributes, analyzing actions, identifying people and pets, etc. Over the last year my research interest has been focused on sports analytics: the analysis of sporting matches, game statistics, personal trainers and assistants. I have also devoted much of my career to the analysis of the three-dimensional world (scene and human analysis) and medical images.
How does the Russian computer vision industry sit on the world market?
The massive transition from classic CV to neural networks in our country started, as it seems to me, around 2016. A year before, the first conversations about neural networks and pilot projects started taking place. In 2017 there was a real boom, with the launch of the Prisma, MSQRD and FindFace projects, and many others.
Computer vision technologies are actively used in many digital industries: in the "Smart City" system, intelligent transport systems, high-tech agriculture, etc. The range of CV application areas and scenarios is constantly growing.
CV technologies are widely used in medicine, helping to make diagnostics extremely accurate and subsequent treatment as effective as possible. Microsoft has developed the InnerEye CV system, which displays on the doctor's monitor possible tumors and other abnormal formations revealed during computed tomography.
Computer vision is also helping to achieve the UN goal of doubling agricultural production to feed all the people on the planet. Using CV, systems of precision farming have been created to increase crop yields. In addition, computers help to estimate the weight of pigs from video images and determine the ripeness of crops.
It seems to me that today our country is following global trends. To a large extent, this is thanks to the Open Data Science community, which does a huge job of attracting people to the field, supporting specialists and helping them to develop. I don't like to make forecasts because it is difficult to predict changes in such a rapidly developing area. Currently researchers are focusing on fair, unbiased, interpretable AI and, in general, solutions aimed at social benefits, although I cannot say that most projects are being implemented in this area now. A lot is being done to automate production. We should also mention the snowballing growth in the quality of NLP models and solutions based on them. From a technical point of view, methods from these two areas, CV and NLP, tend to be unified. The number of medical image analysis projects is growing exponentially, and we can already see real implementations despite the fact that certification and clinical application of such developments is an incredibly lengthy and complex process.
As part of the programme, you are going to deliver the course ‘Deep Learning for Computer Vision’. How will this interest students?
The course is dedicated to the basic techniques of two-dimensional computer vision based on neural networks. We work with images and videos obtained from conventional monocular cameras. Students learn to classify images, detect and segment objects in them, track objects, and recognize people by face and body. These sections are relevant for most tasks, and especially for tasks related to video surveillance. We will study the current situation in the field (some of these methods appeared in 2020) and trace their development. We’ll learn about the main data sets, which are a key component of modern CV, as well as methods for evaluating the quality of neural network algorithms and available solutions. All the material is supported by practical examples, as well as individual and group tasks. In fact, a student who has successfully completed this course will be able to create their first project in the field of video surveillance.
In 1960, a visual image reader, the Mark I Perceptron, was created, but because of the limited hardware of the time, it could not cope with solving machine vision problems. In the 1960s, the first image processing programs started to appear.
In 1963, MIT doctoral student Lawrence Roberts was the first to propose a working concept for constructing three-dimensional images of objects based on the analysis of their two-dimensional images. In the 1970s, various approaches to the recognition of objects in the image (by texture, structure, feature) were developing.
In the 1980s, the American company Automatix pioneered the use of computer vision systems in business: they developed several machines with cameras for soldering chips, which sent pictures to the processor.
In the 1980s and 1990s, two-dimensional digital image sensors appeared, which made it possible to obtain stable images for analysis. The mid-1990s saw the launch of the first commercial automatic car navigation systems. At the end of the 1990s, effective tools for computer analysis of motion entered the market. In 2012, a revolution took place in the computer vision industry: deep convolutional neural networks were used for the first time at the ImageNet image recognition competition. Since then, the range of CV algorithms has expanded significantly, and there has been a boom in mobile application development.
By Ekaterina Zinkovskaya, eLearning Office