Data and Service Engineering for Automating Business Processes

Master 2021/2022

Type: Elective course (Data Science)

Area of studies: Applied Mathematics and Informatics

Delivered by: School of Data Analysis and Artificial Intelligence

Where: Faculty of Computer Science

When: 2nd year, modules 1 and 2

Mode of studies: offline

Open to: students of one campus

Instructors: Alexey Neznanov

Master’s programme: Data Science

Language: English

ECTS credits: 8

Contact hours: 54

Abstract

Machine learning is changing the world rapidly and dramatically, and every modern enterprise now sees it as one of the top instruments for improving business KPIs. Yet behind any successful application of machine learning is a large amount of engineering work, including data engineering functions such as data cleaning, wrangling, and integration. The models must also be deployed in production as reliable services. Finally, advanced analytics is needed to understand how the service is operating. In this course you will learn the basics of these engineering and analytic disciplines. We won’t focus on machine learning algorithms in this course; they are a prerequisite.

Learning Objectives

To gain basic proficiency in data engineering and understand the key concepts, technologies, and challenges of this subject area.

Expected Learning Outcomes

Basic understanding of problems in data integration and data cleaning, familiarity with ETL processes and data warehouses.
Understanding of advanced anomaly detection and collective learning techniques and their applications in building machine learning services.
Understanding of basic reliability and durability mechanisms used in database and streaming systems.
Understanding of Big Data technologies, including the Hadoop and Spark stacks and massively parallel DBMSs.
Understanding of course content.
Understanding of different enterprise architectures for real-time online businesses and the trade-offs of using each type of architecture.
Understanding of the key aspects of reliability of ML services and the key technologies for building a reliable machine learning service.
Understanding of non-relational databases: when they should be used and what their strengths and weaknesses are.
Understanding of query processing and optimisation in relational systems, ability to reason about and optimise query plans.
Understanding of the relational model and SQL, its power and its limitations.

Course Contents

Introduction
Relational Data Model and Databases
Non-relational Databases
Event-based data models. Kappa and Lambda architectures. Process mining.
Durability and Reliability of Databases and Streaming Systems
Query Processing in Relational Systems
Big Data
Data Integration and Cleaning
Building a reliable ML service
Anomaly detection and collective learning

Assessment Elements

Programming task 1
Programming task 2
Exam
You can automatically receive full credit for the final exam if you do well on all the assignments.

Interim Assessment

2021/2022 2nd module
0.2 * Exam + 0.4 * Programming task 2 + 0.4 * Programming task 1
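
The weighting above can be illustrated with a short sketch. The function name, the argument names, and the sample scores are hypothetical, and the sketch assumes that all three elements are graded on the same scale, which the syllabus does not state.

```python
def interim_grade(task1: float, task2: float, exam: float) -> float:
    """Weighted sum taken from the interim assessment formula."""
    return 0.4 * task1 + 0.4 * task2 + 0.2 * exam


# Hypothetical scores on a common scale (an assumption, not from the syllabus).
print(round(interim_grade(task1=8, task2=9, exam=6), 2))
```

Under this formula the two programming tasks together carry 80% of the interim grade, and the exam carries the remaining 20%.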

Bibliography

Recommended Core Bibliography

Harrington, J. L. Relational database design and implementation. – Morgan Kaufmann, 2016. – 441 pp.

Recommended Additional Bibliography

Xu Z. et al. (ed.). Big Data: 6th CCF Conference, Big Data 2018, Xi'an, China, October 11-13, 2018, Proceedings. – Springer, 2018. – Vol. 945.
