Master's programme 2020/2021

Big Data: Advanced Level

Field of study: 38.04.02. Management
When taught: Year 1, Module 4
Mode of study: with an online course
Instructors: Сахнюк Павел Анатольевич
Programme: International Management
Language: English
Credits: 3
Contact hours: 2

Course Syllabus

Abstract

Program: International Management
Link: https://www.coursera.org/learn/big-data-integration-processing?specialization=big-data
Semester: 2
Level: Graduate
Year: 1
Study mode: MOOC
Type of course: Elective
ECTS: 3

Prerequisites: The course “Big Data Advanced Analytics” is an elective course. Preliminary knowledge of the following discipline is recommended before attending this course: Introduction to Data Science.

Learning outcomes:
  • to be able to retrieve data from example database and big data management systems
  • to be able to describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
  • to be able to identify when a big data problem needs data integration
  • to be able to execute simple big data integration and processing on Hadoop and Spark platforms

Contents: The course covers the basic concepts in big data integration and processing, the various aspects of data retrieval for NoSQL data, as well as data aggregation and working with data frames. It also introduces big data pipelines and workflows, as well as the processing and analysis of big data using Apache Spark. The course also gives students practical hands-on experience in analyzing Twitter data.

This course covers the following topics:
  • Retrieving Big Data
  • Big Data Integration
  • Processing Big Data
  • Big Data Analytics using Spark
  • Learn By Doing: Putting MongoDB and Spark to Work
Learning Objectives

  • Students will be able to:
    • retrieve data from example database and big data management systems
    • describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
    • identify when a big data problem needs data integration
    • execute simple big data integration and processing on Hadoop and Spark platforms
Expected Learning Outcomes

  • Know the basic concepts of big data integration and processing.
  • Be introduced to the Postgres database.
  • Be introduced to data integration tools, including Splunk and Datameer, and gain practical insight into how information integration processes are carried out.
  • Learn the inner workings of the Spark Core and be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.
  • Gain practical hands-on experience applying Spark and MongoDB to analyze Twitter data.
Course Contents

  • Big Data Integration
    In this module you will be introduced to data integration tools including Splunk and Datameer, and you will gain some practical insight into how information integration processes are carried out.
  • Retrieving Big Data
    This module covers the various aspects of data retrieval and relational querying. You will also be introduced to the Postgres database.
  • Learn By Doing: Putting MongoDB and Spark to Work
    In this module you will get some practical hands-on experience applying what you learned about Spark and MongoDB to analyze Twitter data.
  • Big Data Analytics using Spark
    In this module, you will go deeper into big data processing by learning the inner workings of the Spark Core. You will be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.
  • Processing Big Data
    This module introduces learners to big data pipelines and workflows, as well as the processing and analysis of big data using Apache Spark.
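
As a rough illustration of the processing pattern behind the modules above (split each record into pieces, then aggregate by key), here is a minimal sketch in plain Python. It is not Spark or MongoDB code and is not taken from the course materials: the sample records and the helper names `flat_map` and `reduce_by_key` are hypothetical stand-ins chosen only to mirror the general shape of such a pipeline.

```python
from collections import defaultdict

# Hypothetical sample records standing in for tweet documents.
tweets = [
    {"user": "a", "text": "big data with spark"},
    {"user": "b", "text": "spark and mongodb"},
    {"user": "a", "text": "big data integration"},
]

def flat_map(records):
    """Split step: emit one (word, 1) pair for every word in every text."""
    for record in records:
        for word in record["text"].lower().split():
            yield (word, 1)

def reduce_by_key(pairs):
    """Aggregate step: sum the counts for each distinct word."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Pipeline: split the records, then aggregate by key.
word_counts = reduce_by_key(flat_map(tweets))

# Three most frequent words; ties are broken alphabetically.
top_words = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top_words)
```

On a real cluster the same two steps would be distributed across machines; this single-process version only shows the data flow.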
Assessment Elements

  • non-blocking test upon finishing this online course
  • non-blocking online test upon finishing this course
Interim Assessment

  • Interim assessment (module 4)
    The result will be evaluated upon submission of the certificate.
Bibliography

Recommended Core Bibliography

  • Goyal, A. (2020). A Self-Assessing Compilation Based Search Approach for Analytical Research and Data Retrieval.
  • Hoger Khayrolla Omar, & Alaa Khalil Jumaa. (2019). Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java. https://doi.org/10.24017/science.2019.1.2
  • Ilya Ganelin, Ema Orhian, Kai Sasaki, & Brennon York. (2016). Spark : Big Data Cluster Computing in Production. Wiley.

Recommended Additional Bibliography

  • Edward, S. G., & Sabharwal, N. (2015). Practical MongoDB : Architecting, Developing, and Administering MongoDB. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1124206
  • Isaac Chun-Hai Fung, Jingjing Yin, Keisha D. Pressley, Carmen H. Duke, Chen Mo, Hai Liang, King-Wa Fu, Zion Tsz Ho Tse, & Su-I Hou. (2019). Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014. https://doi.org/10.3390/data4020084
  • Langewisch, R. P. (2016). A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark’s GraphX.