Master 2021/2022

Data and Service Engineering for Automating Business Processes

Type: Elective course (Data Science)
Area of studies: Applied Mathematics and Informatics
When: 2nd year, modules 1 and 2
Mode of studies: offline
Open to: students of one campus
Instructors: Alexey Neznanov
Master’s programme: Data Science
Language: English
ECTS credits: 8
Contact hours: 54

Course Syllabus

Abstract

Machine learning is changing the world rapidly and dramatically, and every modern enterprise now regards it as one of the top instruments for improving business KPIs. Yet behind any successful application of machine learning lies a large amount of engineering work, including data engineering tasks such as data cleaning, wrangling, and integration. The models must also be deployed in production as reliable services, and advanced analytics is then needed to understand how the service is operating. In this course you will learn the basics of these engineering and analytic disciplines. We will not focus on machine learning algorithms in this course; they are a prerequisite.
Learning Objectives

  • To gain basic proficiency in data engineering and to understand the key concepts, technologies, and challenges of this subject area.
Expected Learning Outcomes

  • Basic understanding of problems in data integration and data cleaning, familiarity with ETL processes and data warehouses.
  • Understanding of advanced anomaly detection and collective learning techniques and their applications in building machine learning services.
  • Understanding of basic reliability and durability mechanisms used in database and streaming systems.
  • Understanding of Big Data technologies, including the Hadoop and Spark stacks and massively parallel DBMSs.
  • Understanding of course content.
  • Understanding of different Enterprise Architectures for real-time online businesses and the trade-offs of using each type of architecture.
  • Understanding of the key aspects of reliability of ML services and the key technologies for building a reliable machine learning service.
  • Understanding of non-relational databases: when they should be used and what their strengths and weaknesses are.
  • Understanding of query processing and optimisation in relational systems, ability to reason about and optimise query plans.
  • Understanding of relational model, SQL, its power and its limitations.
Course Contents

  • Introduction
  • Relational Data Model and Databases
  • Non-relational Databases
  • Event-based data models. Kappa and Lambda architectures. Process mining.
  • Durability and Reliability of Databases and Streaming Systems
  • Query Processing in Relational Systems
  • Big Data
  • Data Integration and cleaning
  • Building a reliable ML service
  • Anomaly detection and collective learning
Assessment Elements

  • non-blocking Programming task 1
  • non-blocking Programming task 2
  • non-blocking Exam
    You can receive full credit for the final exam automatically if you do well on all the assignments.
Interim Assessment

  • 2021/2022 2nd module
    0.2 * Exam + 0.4 * Programming task 2 + 0.4 * Programming task 1
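
As an illustration of the weighting above, the interim grade can be computed as in the following minimal sketch. The function name `interim_grade` and the sample scores are hypothetical; the syllabus states only the weights, so no grading scale or rounding rule is assumed here.

```python
def interim_grade(exam: float, task1: float, task2: float) -> float:
    """Weighted sum using the weights stated in the syllabus:
    0.2 for the exam and 0.4 for each programming task."""
    return 0.2 * exam + 0.4 * task1 + 0.4 * task2


if __name__ == "__main__":
    # Hypothetical scores, chosen only to show the arithmetic.
    grade = interim_grade(exam=8, task1=10, task2=6)
    print(f"Interim grade: {grade:.2f}")
```

Because the two programming tasks together carry 80% of the weight, a weak exam result changes the interim grade far less than a weak assignment does.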
Bibliography

Recommended Core Bibliography

  • Harrington, J. L. Relational database design and implementation. – Morgan Kaufmann, 2016. – 441 pp.

Recommended Additional Bibliography

  • Xu Z. et al. (ed.). Big Data: 6th CCF Conference, Big Data 2018, Xi'an, China, October 11-13, 2018, Proceedings. – Springer, 2018. – Vol. 945.