Master's Programme 2021/2022

Big Data: Advanced Level

Field of study: 38.04.02. Management
When taught: 1st year, module 4
Study format: with an online course
Audience: own campus only
Programme: International Management
Language of instruction: English
Credits: 3
Contact hours: 2

Course Syllabus

Abstract

Program: International Management
Link: https://www.coursera.org/learn/big-data-integration-processing?specialization=big-data
Semester: 2
Level: Graduate
Year: 1
Study mode: MOOC
Type of course: Elective
ECTS: 3

Prerequisites: The course “Big Data Advanced Analytics” is an elective. Students are recommended to have prior knowledge of the following discipline before taking this course: Introduction to Data Science.

Learning outcomes:
  • to be able to retrieve data from example database and big data management systems
  • to be able to describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
  • to be able to identify when a big data problem needs data integration
  • to be able to execute simple big data integration and processing on Hadoop and Spark platforms

Contents: The course covers the basic concepts of big data integration and processing, the various aspects of data retrieval for NoSQL data, as well as data aggregation and working with data frames. It also introduces big data pipelines and workflows, as well as the processing and analysis of big data using Apache Spark. The course also gives students practical hands-on experience analyzing Twitter data. This course covers the following topics:
  • Retrieving Big Data
  • Big Data Integration
  • Processing Big Data
  • Big Data Analytics using Spark
  • Learn By Doing: Putting MongoDB and Spark to Work
Learning Objectives

Students will be able to:
  • retrieve data from example database and big data management systems
  • describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
  • identify when a big data problem needs data integration
  • execute simple big data integration and processing on Hadoop and Spark platforms
Expected Learning Outcomes

  • know the basic concepts of big data integration and processing
  • become familiar with data integration tools, including Splunk and Datameer, and gain practical insight into how information integration processes are carried out
  • become familiar with the Postgres database
  • gain practical hands-on experience applying Spark and MongoDB to analyze Twitter data
  • understand the inner workings of Spark Core and become familiar with two key tools in the Spark toolkit: Spark MLlib and GraphX
Course Contents

  • Big Data Integration
  • Retrieving Big Data
  • Learn By Doing: Putting MongoDB and Spark to Work
  • Big Data Analytics using Spark
  • Processing Big Data
Assessment Elements

  • non-blocking test upon finishing this online course
  • non-blocking online test upon finishing this course
Interim Assessment

  • 2021/2022 4th module
    the result will be evaluated upon submission of the certificate
Bibliography

Recommended Core Bibliography

  • Goyal, A. (2020). A Self-Assessing Compilation Based Search Approach for Analytical Research and Data Retrieval.
  • Hoger Khayrolla Omar, & Alaa Khalil Jumaa. (2019). Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java. https://doi.org/10.24017/science.2019.1.2
  • Ilya Ganelin, Ema Orhian, Kai Sasaki, & Brennon York. (2016). Spark : Big Data Cluster Computing in Production. Wiley.

Recommended Additional Bibliography

  • Edward, S. G., & Sabharwal, N. (2015). Practical MongoDB : Architecting, Developing, and Administering MongoDB. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1124206
  • Isaac Chun-Hai Fung, Jingjing Yin, Keisha D. Pressley, Carmen H. Duke, Chen Mo, Hai Liang, King-Wa Fu, Zion Tsz Ho Tse, & Su-I Hou. (2019). Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014. https://doi.org/10.3390/data4020084
  • Langewisch, R. P. (2016). A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark’s GraphX.