From Chaotic Dynamics to Process Mining
Sergey Shershakov is a 2012 graduate of the HSE master’s programme in System and Software Engineering, a lecturer on the course in Data Algorithms and Structures, a researcher at the Laboratory of Process-Aware Information Systems (PAIS Lab), and a participant in the Young Faculty Support Programme in the Category ‘New Researchers’. Sergey told us what Process Mining is, how to keep your knowledge up-to-date without working in the industry, and why HSE graduates don’t have to ‘forget everything they’ve been taught’.
Way to HSE
When I applied to the HSE master’s programme, it had already been two years since I’d graduated from the Moscow Technical University of Communications and Informatics (MTUCI), and four years since I had started working in a research laboratory at the same university. At some point, I started thinking about developing my academic potential and decided that getting a postgraduate degree would be a good idea. I was looking at the programmes available, and chose software engineering as a natural continuation of my major degree.
My first in-person encounter with HSE took place at the building on Kirpichnaya Ulitsa, which hosted a one-day school in software engineering. There I also met Sergey Avdoshin. The master’s programme in software engineering was holding its first enrolment at the School of Software Engineering, which had previously been the Faculty of Business Informatics.
HSE and all that it means
In 2010, I started my master’s studies. Then, I met Irina Lomazova, who taught us a course on ‘Formal Methods in Software Engineering’. Later, Prof. Lomazova became my academic supervisor.
I was impressed by the building on Kirpichnaya. It had large lecture rooms, well-equipped computer classes, and recreation areas. Wi-Fi was available everywhere in the building, and many classrooms had projectors. Expendables were always in supply. So, everything seemed ideal from the material point of view.
The opportunities offered by HSE to its students were also quite attractive. They included academic mobility support, which means international conferences, schools, and internships. Academic work is the key to academic mobility; in 2012, I submitted my first independent academic paper to the SYRCoSE conference, which takes place annually in various Russian cities, and it was accepted. That year, the conference took place at the HSE campus in Perm. It was then that I began to see HSE’s influence beyond Moscow.
Towards the end of my master’s studies, I prepared one more paper, and it was accepted at an international conference in Canada. I applied for a grant from the HSE Academic Fund, won the competition, and went on my first overseas academic trip, funded by HSE.
While studying on the HSE master’s programme between 2010 and 2012, I managed to get what I’d been lacking in my education, and this was not only knowledge, but also connections and prospects for the future.
Choosing the area for research
After graduation, I spent some time working as a researcher at MTUCI, where I developed software for hardware and software systems in the laboratory of precision signal generators headed by V. Kochemasov. In 2012, when I graduated from the master’s course at HSE, I decided to leave the lab. I tried participating in freelance projects, but after some time I realized that an academic institution is the type of employment that suits me best, and research is probably the most creative option of all.
At different moments, I looked for interesting fields in other areas of research. I was particularly interested in chaotic dynamics as applied to steganography tasks. It was a fascinating area, but I preferred software engineering, which seemed less chaotic.
In 2011, Prof. Wil van der Aalst from Technische Universiteit Eindhoven (TU/e) visited HSE. He is a renowned scholar in process-aware information systems and the founder of Process Mining, a new research area. This was also when the idea of creating a laboratory in this field at HSE emerged. By the end of my master’s studies, the decision to create the laboratory had been made, and when it was founded, its head, Irina Lomazova, invited me to join the research team.
What Process Mining is and why it is important
Process Mining is the main area of research at the Laboratory of Process-Aware Information Systems (PAIS Lab). It lies at the intersection of business process research and modeling, data mining, and machine learning, and it involves many approaches for working with big data.
Process Mining is based on the fundamental idea that most processes (technological, business, social, medical, etc.) are supported by information systems, which maintain event logs as they work and so record how the system operates in real life: for example, bills paid, invoices issued, or check-ups prescribed. They store a great many such events, and the real behavior of the system can be observed through them. As a result, we can detect inconsistencies between real business processes and the way they were initially planned.
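The idea of comparing recorded behavior with the planned process can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the case identifiers, the activity names (borrowed from the examples above), the planned order of activities, and the helper functions `directly_follows` and `deviations`. It is a toy check on a handful of traces, not the method used at PAIS Lab or in any particular tool.

```python
from collections import Counter

# Hypothetical event log: for each case, the activities in the order they were recorded.
event_log = {
    "case-1": ["invoice issued", "bill paid", "goods shipped"],
    "case-2": ["invoice issued", "bill paid", "goods shipped"],
    "case-3": ["invoice issued", "goods shipped", "bill paid"],
}

# Hypothetical planned process, given as the allowed "directly follows" pairs.
planned = {
    ("invoice issued", "bill paid"),
    ("bill paid", "goods shipped"),
}


def directly_follows(log):
    """Count how often one activity is immediately followed by another."""
    counts = Counter()
    for trace in log.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def deviations(log, allowed):
    """Return, per case, the observed steps that the planned process does not allow."""
    result = {}
    for case_id, trace in log.items():
        bad = [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]
        if bad:
            result[case_id] = bad
    return result


observed = directly_follows(event_log)
unplanned = deviations(event_log, planned)
print(observed)
print(unplanned)
```

In this toy log only the third case deviates from the plan: the goods were shipped before the bill was paid.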
Process Mining methods can be used to develop various models based on event logs, both with the use of formal mathematical tools, such as Petri nets and transition systems, and more business-oriented methods, such as business process description languages, including BPMN. Process Mining ideas have demonstrated good applicability and have a huge amount of theoretical and practical potential.
The first and main topic I’m looking at is ‘reduction of transition systems while preserving the model’s precision’. This task investigates how processes recorded as logs can be described by means of transition systems, and how various transformations of these systems affect their qualitative features. We study methods of model reduction that give the models a simpler structure and a smaller size.
If we are speaking about real logs in general, it turns out that they reach huge sizes rather fast. Modern rates of information creation allow the ‘average’ system to create several gigabytes of operation information a second. This is enough for a model built directly on logs of such size to be unbelievably big.
There are various approaches to reducing the model size, but each of them alters certain qualitative characteristics of the model, which may make the model unusable.
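The trade-off between model size and precision can be shown on a toy example. The Python sketch below builds a transition system from a log in two ways: with states equal to the full prefix of events seen so far, and with states equal to only the last `k` events. This prefix-based construction is one well-known way of obtaining transition systems from logs; it is used here as an assumption and is not necessarily the reduction technique studied in the research described above. The two traces and the function names `abstract`, `build_transition_system` and `accepts` are hypothetical.

```python
def abstract(prefix, k):
    """State abstraction: the whole prefix (k is None) or only its last k events."""
    if k is None:
        return tuple(prefix)
    return tuple(prefix[max(0, len(prefix) - k):])


def build_transition_system(log, k=None):
    """Build (states, transitions, accepting states) from a list of traces."""
    states, transitions, accepting = {()}, set(), set()
    for trace in log:
        for i, event in enumerate(trace):
            src = abstract(trace[:i], k)
            dst = abstract(trace[:i + 1], k)
            states.update((src, dst))
            transitions.add((src, event, dst))
        accepting.add(abstract(trace, k))
    return states, transitions, accepting


def accepts(ts, trace):
    """Replay a trace from the initial state () and check that it ends in an accepting state."""
    _, transitions, accepting = ts
    step = {(src, event): dst for src, event, dst in transitions}
    current = ()
    for event in trace:
        if (current, event) not in step:
            return False
        current = step[(current, event)]
    return current in accepting


# Hypothetical log with two recorded traces.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]

full = build_transition_system(log)          # one state per distinct prefix
reduced = build_transition_system(log, k=1)  # state = last event only

print(len(full[0]), len(reduced[0]))
print(accepts(full, ["a", "b", "d"]), accepts(reduced, ["a", "b", "d"]))
```

The reduced model has fewer states, but it also accepts the trace `a, b, d`, which never occurred in the log. This is the kind of precision loss that a reduction has to keep under control.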
The second topic is Process Mining application in software engineering. Today, we are working on mining hierarchical UML sequence diagrams or hybrid diagrams.
I have to admit I wasn’t initially planning to become a teacher, as I was more attracted to research. But in early 2013, Sergey Avdoshin invited me to deliver some lectures in the course ‘Data Algorithms and Structures’ in the spring semester, and conduct programming workshops for students of Applied Mathematics and Information Science. I already had some teaching experience and agreed. Then I realized that, done properly, even part-time teaching means a big workload and is very time-consuming. After one semester, it went without saying that I would continue, and I got term papers and graduation theses as a ‘bonus’ to lectures and workshops.
There is an active trend towards project work at HSE today. In the Software Engineering programme, project work plays an ever-increasing role, since we teach our students practical things, and they can fully comprehend software production only by learning the whole cycle of development for independent products or system components. With such approaches, we don’t only give each student part of a task – to invent and implement an algorithm – but also create the conditions in which they can apply best practices, industrial standards (such as SWEBOK) and tools that support development. This way they acquire the skill of seeing the whole process from a project manager’s perspective.
Usually, when graduates start at companies in lower positions, they adopt the technologies that are already used in the company’s business process. That’s why I believe it’s right to teach the students to use the technologies they’ll face in practice, such as software project management tools, source code versioning tools, bug trackers etc. As a result, employers hardly ever tell our graduates to ‘forget everything that they’ve been taught at university’.
How to keep your knowledge up-to-date without working in the industry
First, I remain a practicing programmer: I write a lot of source code and I enjoy it. I work not only on stand-alone projects but also often develop modules for existing systems, such as ProM, an academic system for Process Mining that consists of many plugins written by different professionals, and this forces me to work with the approaches those professionals use in software development. Working with someone else’s code, preferably good code, is a great experience, which helps me to learn from best practices and avoid poor solutions, the so-called ‘anti-patterns’, in future.
The ‘program feeling’ I had before my master’s studies and work at the laboratory is quite different from what I have now. I’ve started to understand the nature of information technology, mainly thanks to in-depth studies of formal methods, which form the mathematical basis for software engineers. This is part of a ‘systemic approach’, which helps develop a professional year after year.
I spend most of my free time developing my own software related to the area of my research. I ‘polish’ my developments thoroughly and try to teach my students to do the same. This includes modeling, thorough API development, distinguishing interfaces from implementation, developing unit tests and other tests, and so on. Product life cycle support is provided by a project management system with obligatory version logging and control. This makes it possible to bring a new developer into the project at any stage: they get suitable tools in their hands and can start working in the active business process as soon as possible.
The second method is term papers, graduation theses, and work with interns. I’m always interested in how students would solve the task, so I usually keep the detailed solution scheme under control, to a certain extent, because I understand that students work on a project for a limited period, and then the project passes to other students. I have to understand how certain things are implemented, so that I can help others continue working on them. The students are willing to use all the newest and most interesting things in their projects, so I’m always encountering new technologies.
And finally, I keep educating myself. When I choose a certain set of technologies to work on the code, I read the most recent reviews, technical blogs, and the writings of experts who contribute their inventions to the field. Such experts include, for example, Bertrand Meyer and Bjarne Stroustrup; the latter created C++ and still continues his theoretical work on its development.
‘Missionary work’ at HSE
Since 2015, I’ve represented HSE in regional cities. It all started with an offer to go to Voronezh and speak to schoolchildren at an HSE Open Day. It turned out to be an exciting experience.
There is a chance that I will never meet many of the people working at HSE, but there I met some remarkable colleagues from various departments, and we spent two and a half days together traveling from school to school and speaking about our university. In addition to that, it was the first time in many years that I had come to a high school rather than a higher school. I teach first-year students, and they differ from school students in that they’ve already decided where to study. Because of this, I quickly found a common language with school students. It was interesting to talk to those who hadn’t decided yet and to understand what they want to hear and what they are interested in.
Sometime later, I went to Tambov, then to Stavropol, and also to Barnaul as one of the organizers of the HSE School Olympiad. I’m rather fond of such ‘missionary work’. HSE is strong in terms of its reputation and the ability to convert it into something practical. This approach is important to every researcher, since each of us faces this task – getting a good academic reputation, then distributing our research, and converting it into something else.
Finally, on future plans
I’m currently planning to focus on my research and am dedicating much more time to it than to teaching. My primary task is to defend my candidate’s thesis next year and thus write one more page of my professional biography.
Interview by Olga Podolskaya