Master's program
2020/2021
Big Data: Advanced Level
Status:
Elective course (International Management / Master in International Management)
Field of study:
38.04.02. Management
Taught by:
Department of Business Informatics
Offered at:
Graduate School of Business
When taught:
Year 1, Module 4
Study format:
With an online course
Instructors:
Сахнюк Павел Анатольевич
Degree program:
International Management
Language:
English
Credits:
3
Contact hours:
2
Course Syllabus
Abstract
Program: International Management
Link: https://www.coursera.org/learn/big-data-integration-processing?specialization=big-data
Semester: 2
Level: Graduate
Year: 1
Study mode: MOOC
Type of course: Elective
ECTS: 3
Prerequisites: The course "Big Data Advanced Analytics" is an elective course. Students are recommended to have prior knowledge of the following discipline before attending this course: Introduction to Data Science.
Learning outcomes:
• to be able to retrieve data from example databases and big data management systems
• to be able to describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
• to be able to identify when a big data problem needs data integration
• to be able to execute simple big data integration and processing on Hadoop and Spark platforms
Contents: The course covers the basic concepts in big data integration and processing, the various aspects of data retrieval for NoSQL data, as well as data aggregation and working with data frames. It also introduces big data pipelines and workflows, as well as the processing and analysis of big data using Apache Spark. The course also gives students practical hands-on experience in analyzing Twitter data.
This course covers the following topics:
• Retrieving Big Data
• Big Data Integration
• Processing Big Data
• Big Data Analytics using Spark
• Learn By Doing: Putting MongoDB and Spark to Work
Learning Objectives
Students will be able to:
- retrieve data from example databases and big data management systems
- describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
- identify when a big data problem needs data integration
- execute simple big data integration and processing on Hadoop and Spark platforms
Expected Learning Outcomes
- you will know the basic concepts of big data integration and processing
- you will be introduced to the Postgres database
- you will be introduced to data integration tools including Splunk and Datameer, and you will gain some practical insight into how information integration processes are carried out.
- you will learn the inner workings of the Spark Core. You will be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.
- you will get some practical hands-on experience applying what you learned about Spark and MongoDB to analyze Twitter data
Course Contents
- Big Data Integration. In this module you will be introduced to data integration tools including Splunk and Datameer, and you will gain some practical insight into how information integration processes are carried out.
- Retrieving Big Data. This module covers the various aspects of data retrieval and relational querying. You will also be introduced to the Postgres database.
- Learn By Doing: Putting MongoDB and Spark to Work. In this module you will get some practical hands-on experience applying what you learned about Spark and MongoDB to analyze Twitter data.
- Big Data Analytics using Spark. In this module, you will go deeper into big data processing by learning the inner workings of Spark Core. You will be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.
- Processing Big Data. This module introduces learners to big data pipelines and workflows, as well as the processing and analysis of big data using Apache Spark.
Interim Assessment
- Interim assessment (Module 4): the result is evaluated upon submission of the certificate.
Bibliography
Recommended Core Bibliography
- Goyal, A. (2020). A Self-Assessing Compilation Based Search Approach for Analytical Research and Data Retrieval.
- Omar, H. K., & Jumaa, A. K. (2019). Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java. https://doi.org/10.24017/science.2019.1.2
- Ganelin, I., Orhian, E., Sasaki, K., & York, B. (2016). Spark: Big Data Cluster Computing in Production. Wiley.
Recommended Additional Bibliography
- Edward, S. G., & Sabharwal, N. (2015). Practical MongoDB: Architecting, Developing, and Administering MongoDB. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1124206
- Fung, I. C.-H., Yin, J., Pressley, K. D., Duke, C. H., Mo, C., Liang, H., Fu, K.-W., Tse, Z. T. H., & Hou, S.-I. (2019). Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014. https://doi.org/10.3390/data4020084
- Langewisch, R. P. (2016). A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark's GraphX.