Bachelor's Programme 2019/2020

Research Seminar

Field of study: 38.03.02. Management
Delivered by: Department of Information Systems and Technologies in Logistics
When: Year 3, Modules 1-4
Mode of study: without an online course
Instructors: Gleb Viktorovich Zakhodyakin, Yulia Aleksandrovna Kuznetsova, Maksim Igorevich Rozhkov
Language: English
Credits: 3
Contact hours: 56

Course Syllabus

Abstract

The course focuses on the practical application of data analysis methods and tools. All classes are conducted in a computer lab and include a brief review of the necessary theoretical principles, their software implementations, and examples of their application. The class uses the R statistical programming language together with a modern package ecosystem for efficient data analysis. The class is taught in blended learning mode: the MOOC “Data Analysis with R” on the Udacity platform, developed by Facebook, is incorporated into the syllabus.
Learning Objectives

  • The research seminar aims to help students gain the data skills required to complete their educational program successfully and to solve day-to-day business tasks in logistics and supply chain management.
Expected Learning Outcomes

  • Knows the most commonly used data types in R
  • Shares the analysis results in the form of R Markdown reports
  • Knows the grammar of data visualization and common methods for exploring patterns in continuous, categorical and multi-dimensional data
  • Formulates the data analysis problem based on the business problem description
  • Knows tools for data transformation available in R
  • Determines data requirements to address the analysis tasks
  • Writes functions in R and applies them to lists and tibbles
  • Knows methods of time series forecasting
  • Applies the R statistical programming language for analysis, visualization and forecasting of economic data
  • Knows the concept of statistical inference and basic tests for comparing groups
  • Builds predictive models for a continuous output variable (regression task)
  • Uses methods for model evaluation and model selection
  • Builds predictive models for a categorical output variable (classification task)
  • Chooses a suitable method for solving a data analysis problem
Course Contents

  • Introduction to R programming language and software ecosystem
    Overview of the software for data analysis and the role of open-source tools. R statistical programming language. CRAN repository and CRAN Task Views. RStudio IDE. R scripts and R Markdown documents. The concepts of reproducibility and literate analysis. Using R Markdown for reporting. Basic data structures and data manipulation in R. Variables, functions and control flow.
  • The grammar of graphics and the ggplot2 package for exploratory data analysis
    The elements of the grammar of graphics: the data, the layers, the geoms, the scales, the transformations, the facets. Using the ggplot2 package for analysis of univariate and multivariate data. The concept and the purpose of exploratory data analysis.
  • Data importing and data transformation using tidy tools
    Importing data from text and Excel files. Tibbles. The grammar of data transformation. Tidy data. Transforming and reshaping data using tidy tools. Cleaning data and handling missing values.
  • Introduction to functional programming in R
    The Don’t Repeat Yourself principle. Writing functions in R. Applying functions to lists and tibbles. Using purrr::map_* functions to process lists and data frames. The split-apply-combine principle.
  • Time series analysis and forecasting
    Components of time series: trend, seasonality, cycles. A stationary time series. Selecting the method of forecasting. Assessing the adequacy of the selected forecasting method. Tools for exploring data sets. Time series decomposition in R. Getting time series data from the Web. Forecast accuracy evaluation. "Naive" forecasting models as a baseline for model evaluation. Exponential smoothing. State-space models. Adjustments and Box-Cox transformations for time series data. The tidy approaches to time series forecasting in R.
  • Statistical Inference. Methods for comparing groups
    The sources of data. Defining the population under study and the sample. The concept of statistical inference. Interval estimation of a population’s mean. Null hypothesis statistical testing. Tests for comparing groups. One-way Analysis of Variance. Distribution fitting.
  • The regression task. Multiple linear regression
    Cross-sectional data. Finding patterns in multivariate data. Correlation analysis. Simple linear regression. Building and interpreting linear regression models. Checking assumptions. Statistical inference using regression output. Non-linear and variance-stabilizing transformations. Multiple regression. Using categorical predictors. Modeling and interpreting interactions. Multicollinearity. Methods for variable selection for multiple regression models. The concept of regularization. Ridge, LASSO and Elastic Net regression. Regression analysis of time series data. Exploring data using autocorrelation analysis. The model of "white noise". A stationary time series. Building regression models with autocorrelations. Identification and elimination of autocorrelation. Time series and the problem of heteroscedasticity. Cointegration of time series. The Box-Jenkins methodology of modeling time series (ARIMA). ARIMA models and their parameters. Bringing the time series to stationarity. The procedure for identification of ARIMA models.
  • The classification task. Decision trees and logistic regression
    Finding patterns in categorical data. The classification task. Logistic regression. Decision trees. Evaluating and comparing the classifiers. Accuracy metrics for classification. Using resampling and cross-validation to evaluate the models. ROC analysis.
  • The MLR framework for building, evaluating and deploying predictive models
    Automating model building, validation and scoring using the mlr package.
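
As an illustration of the functional programming topic above, the following minimal sketch in base R writes a small function and applies it to the pieces of a data frame, following the split-apply-combine principle. The `shipments` data, its column names and the `mean_weight` helper are hypothetical and are not taken from the course materials; in the tidy toolchain, `purrr::map_dbl()` could be used in place of `vapply()`.

```r
# Hypothetical shipment records (illustrative values only).
shipments <- data.frame(
  warehouse = c("A", "A", "B", "B", "B"),
  weight_kg = c(120, 80, 200, 150, 100)
)

# A reusable function instead of repeating the same calculation per group.
mean_weight <- function(df) mean(df$weight_kg)

# Split: one data frame per warehouse.
by_warehouse <- split(shipments, shipments$warehouse)

# Apply and combine: one numeric result per group, returned as a named vector.
result <- vapply(by_warehouse, mean_weight, numeric(1))
print(result)
```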
Assessment Elements

  • non-blocking Participation
    Solving in-class and homework assignments. Participating in discussions.
  • non-blocking Assessment
    Individual assignments on exploratory data analysis, time series forecasting and regression.
  • non-blocking Presentation
    A 7-10 minute presentation on business cases for using data analysis and predictive modeling in business and research.
  • blocking Final Examination
    Examination format: an oral examination consisting of a project presentation and defense (a 15-slide presentation). The computer application and the written report must be submitted beforehand. Your teacher will provide instructions on how to submit them.
    Platform: the exam is taken on Zoom or MS Teams. Your lecturer will choose a suitable platform and provide the connection link for the examination. Students are required to join the session 15 minutes before it begins.
    Technical requirements: the computers must meet the following requirements.
      • For MS Teams: https://docs.microsoft.com/ru-ru/microsoftteams/hardware-requirements-for-the-teams-app
      • For Zoom: https://support.zoom.us/hc/ru/articles/201362023-System-Requirements-for-PC-Mac-and-Linux
    Students are required to:
      • Check their computer for compliance with the technical requirements no later than 7 days before the exam.
      • Sign in with their corporate account (@edu.hse.ru) and use their real name when connecting.
      • Check their microphone, speakers or headphones, webcam and Internet connection (a wired Internet connection is recommended if possible).
      • Close all applications other than the MS Teams/Zoom client software and the applications required to present the project.
    If one of the requirements for participation in the exam cannot be met, the student is obliged to inform the professor and the program manager 2 weeks before the exam date so that a decision on the student's participation in the exam can be made.
    Students are *not* allowed to:
      • Turn off the video camera during the presentation of their team’s project. All team participants must enable video from the webcam while presenting. The camera may be turned off when the team is not delivering its presentation.
      • Read a pre-made text during the presentation.
      • Leave the place where the exam is taken (go beyond the camera's visible area) while their team is presenting.
    Students are allowed to:
      • Ask the presenting team questions.
      • Interact with other team members while presenting their project.
    Connection failures: a short-term communication failure during the exam is the loss of a student's network connection with the Zoom or MS Teams platform for no longer than 1 minute. A long-term communication failure is the loss of the connection for longer than 1 minute. A student cannot continue to participate in the exam if a long-term communication failure occurs. The retake procedure is similar to the exam procedure. In case of a long-term communication failure during the examination, the student must notify the teacher and record the fact of the loss of connection with the platform (a screenshot or a response from the Internet provider), then contact the program manager with an explanatory note about the incident so that a decision on retaking the exam can be made.
Interim Assessment

  • Interim assessment (Module 4)
    0.3 * Assessment + 0.3 * Final Examination + 0.2 * Participation + 0.2 * Presentation
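
As a worked example of the formula above, the following R sketch computes the weighted interim grade. The component scores and the 10-point scale are assumptions made for illustration only; the syllabus does not state the grading scale or a rounding rule, so none is applied.

```r
# Weights taken from the interim assessment formula.
weights <- c(assessment = 0.3, final_exam = 0.3,
             participation = 0.2, presentation = 0.2)

# Hypothetical component scores, assuming a 10-point scale.
scores <- c(assessment = 8, final_exam = 7,
            participation = 9, presentation = 6)

# Weighted sum of the components; names are matched explicitly
# so that the order of the score vector does not matter.
interim_grade <- function(scores, weights) {
  stopifnot(setequal(names(scores), names(weights)))
  sum(scores[names(weights)] * weights)
}

grade <- interim_grade(scores, weights)
print(grade)  # 0.3*8 + 0.3*7 + 0.2*9 + 0.2*6 = 7.5
```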
Bibliography

Recommended Core Bibliography

  • Lantz, B. (2019). Machine Learning with R: Expert Techniques for Predictive Modeling (3rd ed.). Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2106304
  • Larose, D. T., & Larose, C. D. (2015). Data Mining and Predictive Analytics. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=958471
  • Montgomery, D. C., Jennings, C. L., & Kulahci, M. (2015). Introduction to Time Series Analysis and Forecasting (2nd ed.). Hoboken, New Jersey: Wiley-Interscience. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=985114
  • Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131
  • Babich, T. N., Kozieva, I. A., Vertakova, Yu. V., & Kuzbozhev, E. N. (2018). Прогнозирование и планирование в условиях рынка [Forecasting and Planning in Market Conditions]: textbook. Moscow: INFRA-M. 336 pp. (Higher Education: Bachelor's Degree). www.dx.doi.org/10.12737/2517. Available at: http://znanium.com/catalog/product/944382

Recommended Additional Bibliography

  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1175341
  • Larose, D. T., & Larose, C. D. (2014). Discovering Knowledge in Data: An Introduction to Data Mining. John Wiley & Sons. 336 pp.
  • Makridakis, S., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and Applications. Cyprus, Europe: John Wiley & Sons, Inc. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.F848CE7