Big Data: Advanced Level

Master's Programme 2021/2022

Status: Elective course (Master in International Management)

Field of study: 38.04.02. Management

Delivered by: Department of Business Informatics

Delivered at: Graduate School of Business

When: Year 1, Module 4

Mode of study: with an online course

Open to: students of the home campus

Instructor: Leonid Sergeevich Smelov (Смелов Леонид Сергеевич)

Programme: International Management

Language: English

Credits: 3

Contact hours: 2

Full Syllabus

Abstract

Program: International Management

Link: https://www.coursera.org/learn/big-data-integration-processing?specialization=big-data

Semester: 2

Level: Graduate

Year: 1

Study mode: MOOC

Type of course: Elective

ECTS: 3

Prerequisites: "Big Data Advanced Analytics" is an elective course. Prior knowledge of the following discipline is recommended before attending this course: Introduction to Data Science.

Learning outcomes:
• to be able to retrieve data from example database and big data management systems
• to be able to describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
• to be able to identify when a big data problem needs data integration
• to be able to execute simple big data integration and processing on Hadoop and Spark platforms

Contents: The course covers the basic concepts of big data integration and processing, the various aspects of data retrieval for NoSQL data, as well as data aggregation and working with data frames. It also introduces big data pipelines and workflows, as well as the processing and analysis of big data using Apache Spark. The course also gives students hands-on experience analyzing Twitter data.

This course covers the following topics:
• Retrieving Big Data
• Big Data Integration
• Processing Big Data
• Big Data Analytics using Spark
• Learn By Doing: Putting MongoDB and Spark to Work

Learning Objectives

Students will be able to:
• retrieve data from example database and big data management systems
• describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
• identify when a big data problem needs data integration
• execute simple big data integration and processing on Hadoop and Spark platforms

Expected Learning Outcomes

Know the basic concepts of big data integration and processing.
Be introduced to data integration tools, including Splunk and Datameer, and gain practical insight into how information integration processes are carried out.
Be introduced to the Postgres database.
Gain hands-on experience applying Spark and MongoDB to analyze Twitter data.
Learn the inner workings of Spark Core and be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.

Course Contents

Big Data Integration
Retrieving Big Data
Learn By Doing: Putting MongoDB and Spark to Work
Big Data Analytics using Spark
Processing Big Data

Assessment Elements

Test upon completion of the online course
Online test upon completion of the course

Interim Assessment

2021/2022 4th module
The result will be evaluated upon submission of the certificate.

Bibliography

Recommended Core Bibliography

Goyal, A. (2020). A Self-Assessing Compilation Based Search Approach for Analytical Research and Data Retrieval.
Hoger Khayrolla Omar, & Alaa Khalil Jumaa. (2019). Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java. https://doi.org/10.24017/science.2019.1.2
Ilya Ganelin, Ema Orhian, Kai Sasaki, & Brennon York. (2016). Spark : Big Data Cluster Computing in Production. Wiley.

Recommended Additional Bibliography

Edward, S. G., & Sabharwal, N. (2015). Practical MongoDB : Architecting, Developing, and Administering MongoDB. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1124206
Isaac Chun-Hai Fung, Jingjing Yin, Keisha D. Pressley, Carmen H. Duke, Chen Mo, Hai Liang, King-Wa Fu, Zion Tsz Ho Tse, & Su-I Hou. (2019). Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014. https://doi.org/10.3390/data4020084
Langewisch, R. P. (2016). A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark's GraphX.
