Research Seminar

Academic year: 2019/2020
Language of instruction: English (ENG)
Credits: 3
Status: Compulsory course
When taught: 3rd year, modules 1-4


Abstract

The course focuses on the practical application of data analysis methods and tools. All classes are held in a computer lab and include a brief review of the relevant theory, its software implementations, and examples of application. The course uses the R statistical programming language together with a modern package ecosystem for efficient data analysis. It is taught in blended learning mode: the MOOC “Data Analysis with R” on the Udacity platform, developed by Facebook, is incorporated into the syllabus.
Course objective

  • The research seminar aims to help students gain the data skills required to complete their degree program successfully and to solve day-to-day business tasks in logistics and supply chain management.
Learning outcomes

  • Knows the most commonly used data types in R
  • Shares the analysis results in the form of R Markdown reports
  • Knows the grammar of data visualization and common methods for exploring patterns in continuous, categorical and multi-dimensional data
  • Formulates a data analysis problem based on a business problem description
  • Knows the data transformation tools available in R
  • Determines the data requirements for the analysis tasks
  • Writes functions in R and applies them to lists and tibbles
  • Knows time series forecasting methods
  • Applies R statistical programming language for analysis, visualization and forecasting of economic data
  • Knows the concept of statistical inference and basic tests for comparing groups
  • Builds predictive models for a continuous output variable (regression task)
  • Uses methods for model evaluation and model selection
  • Builds predictive models for a categorical output variable (classification task)
  • Chooses a suitable method for solving a data analysis problem
Course content

  • Introduction to R programming language and software ecosystem
    Overview of software for data analysis and the role of open-source tools. The R statistical programming language. The CRAN repository and CRAN Task Views. The RStudio IDE. R scripts and R Markdown documents. The concepts of reproducibility and literate analysis. Using R Markdown for reporting. Basic data structures and data manipulation in R. Variables, functions and control flow.
  • The grammar of graphics and the ggplot2 package for exploratory data analysis
    The elements of the grammar of graphics: the data, the layers, the geoms, the scales, the transformations, the facets. Using the ggplot2 package for analysis of univariate and multivariate data. The concept and the purpose of exploratory data analysis.
  • Data importing and data transformation using tidy tools
    Importing data from text and Excel files. Tibbles. The grammar of data transformation. Tidy data. Transforming and reshaping data using tidy tools. Cleaning data and handling missing values.
  • Introduction to functional programming in R
    The Don’t Repeat Yourself (DRY) principle. Writing functions in R. Applying functions to lists and tibbles. Using purrr::map_* functions to process lists and data frames. The split-apply-combine principle.
  • Time series analysis and forecasting
    Components of time series: trend, seasonality, cycles. Stationary time series. Selecting a forecasting method. Assessing the adequacy of the selected forecasting method. Tools for exploring data sets. Time series decomposition in R. Getting time series data from the Web. Forecast accuracy evaluation. "Naive" forecasting models as a baseline for model evaluation. Exponential smoothing. State-space models. Adjustments and Box-Cox transformations for time series data. Tidy approaches to time series forecasting in R.
  • Statistical Inference. Methods for comparing groups
    Sources of data. Defining the target population and the sample. The concept of statistical inference. Interval estimation of a population mean. Null hypothesis significance testing. Tests for comparing groups. One-way analysis of variance (ANOVA). Distribution fitting.
  • The regression task. Multiple linear regression
    Cross-sectional data. Finding patterns in multivariate data. Correlation analysis. Simple linear regression. Building and interpreting linear regression models. Checking model assumptions. Statistical inference using regression output. Non-linear and variance-stabilizing transformations. Multiple regression. Using categorical predictors. Modeling and interpreting interactions. Multicollinearity. Variable selection methods for multiple regression models. The concept of regularization. Ridge, LASSO and Elastic Net regression. Regression analysis of time series data. Exploring data using autocorrelation analysis. The white noise model. Stationary time series. Building regression models with autocorrelated errors. Identifying and eliminating autocorrelation. Time series and the problem of heteroscedasticity. Cointegration of time series. The Box-Jenkins methodology for modeling time series (ARIMA). ARIMA models and their parameters. Transforming a time series to stationarity. The procedure for identifying ARIMA models.
  • The classification task. Decision trees and logistic regression
    Finding patterns in categorical data. The classification task. Logistic regression. Decision trees. Evaluating and comparing classifiers. Accuracy metrics for classification. Using resampling and cross-validation to evaluate models. ROC analysis.
  • The mlr framework for building, evaluating and deploying predictive models
    Automating model building, validation and scoring using the mlr package.
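The split-apply-combine principle listed under functional programming can be illustrated without any R-specific tooling. The sketch below is written in plain Python (standard library only) rather than R, purely so that it runs standalone; the function `split_apply_combine` and the shipment records are hypothetical. In the course itself, the same pattern would be expressed with R tools such as `purrr::map_*` functions or grouped tibbles.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, value, fn):
    """Group `rows` by `key`, apply `fn` to each group's list of `value`s,
    and combine the per-group results into one dict."""
    groups = defaultdict(list)
    for row in rows:  # split: collect the values belonging to each group
        groups[row[key]].append(row[value])
    # apply + combine: one summary result per group
    return {group: fn(values) for group, values in groups.items()}


# Hypothetical shipment records, made up for illustration only
shipments = [
    {"warehouse": "A", "days": 2},
    {"warehouse": "B", "days": 5},
    {"warehouse": "A", "days": 4},
    {"warehouse": "B", "days": 3},
]

# Mean delivery time per warehouse
mean_days = split_apply_combine(shipments, "warehouse", "days", mean)
print(mean_days)
```

Passing the summary function as an argument (`mean` here, but `len`, `max`, etc. work equally well) is what keeps the grouping logic written once, in line with the DRY principle.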
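The time series topic pairs forecast accuracy evaluation with "naive" models as a baseline. The idea is that a forecasting method is worth keeping only if it beats a trivial benchmark on held-out data. The naive method simply repeats the last observed value. The sketch below, in plain Python as a stand-in for the R tools used in class, compares it with the equally simple mean method using mean absolute error (MAE). The function names and the demand series are hypothetical. In R, the forecast package provides `naive()` and `accuracy()` for this kind of comparison.

```python
def naive_forecast(history, horizon):
    """Naive method: every future value equals the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Mean method: every future value equals the historical average."""
    avg = sum(history) / len(history)
    return [avg] * horizon


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly demand, split into a training part and a test part
demand = [100, 104, 110, 115, 118, 125, 130, 134]
train, test = demand[:6], demand[6:]

for name, method in [("naive", naive_forecast), ("mean", mean_forecast)]:
    forecast = method(train, len(test))
    print(name, round(mae(test, forecast), 2))
```

On this trending series, the naive method has the lower test MAE (7.0 versus 20.0). This illustrates why the historical mean is a poor benchmark for trending data and why a more elaborate model should be judged against such baselines.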
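Simple linear regression, the starting point of the regression topic, fits y = b0 + b1 * x by least squares. The closed-form estimates are b1 = Sxy / Sxx (the sum of cross-deviations divided by the sum of squared x-deviations) and b0 = mean(y) - b1 * mean(x). The sketch below computes these estimates and R² in plain Python. The helper names and the distance/cost data are hypothetical. In R, the equivalent fit would be obtained with the base function `lm(cost ~ distance)`.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(x, y, b0, b1):
    """Share of the variance of y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: shipping distance (km) and delivery cost
distance = [10, 20, 30, 40, 50]
cost = [14, 27, 33, 46, 55]

b0, b1 = fit_simple_ols(distance, cost)
r2 = r_squared(distance, cost, b0, b1)
print(round(b0, 2), round(b1, 2), round(r2, 4))
```

For these made-up numbers, the fitted line is cost = 4.7 + 1.01 * distance, with R² just above 0.99. Interpreting such coefficients and checking the model's assumptions are what the course topic then builds on.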
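The classification topic mentions accuracy metrics for evaluating and comparing classifiers. Most of these metrics are derived from the four confusion-matrix counts: true positives, false positives, false negatives, and true negatives. The sketch below computes accuracy, precision and recall in plain Python for a binary task. The function names and the late-delivery labels are hypothetical, and the course may use additional metrics, such as those from ROC analysis, that are not shown here.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision and recall computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 1 = late delivery, 0 = on time
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when one class is rare: a classifier that always predicts "on time" can score high accuracy while never catching a late delivery.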
Assessment elements

  • Participation (non-blocking)
    Solving in-class and homework assignments. Participating in discussions.
  • Assessment (non-blocking)
    Individual assignments on exploratory data analysis, time series forecasting and regression.
  • Presentation (non-blocking)
    A 7-10-minute presentation on business cases for using data analysis and predictive modeling in business and research.
  • Final Examination (blocking)
    Project presentation and defense (a 15-slide presentation, a computer application, and a report).
Interim assessment

  • Interim assessment (module 4)
    0.3 * Assessment + 0.3 * Final Examination + 0.2 * Participation + 0.2 * Presentation
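The module-4 formula is a weighted sum whose weights add up to 1 (0.3 + 0.3 + 0.2 + 0.2). The sketch below applies it in Python, assuming that all four components are scored on the same scale; a 10-point scale is used here purely for illustration. The names `WEIGHTS` and `interim_grade` and the example scores are hypothetical. The sketch deliberately does not model the blocking status of the Final Examination or any rounding rule, since the formula itself does not specify them.

```python
# Weights from the module-4 interim assessment formula
WEIGHTS = {
    "assessment": 0.3,
    "final_examination": 0.3,
    "participation": 0.2,
    "presentation": 0.2,
}


def interim_grade(scores):
    """Weighted sum of component scores (all assumed to share one scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical scores on an assumed 10-point scale
example = {"assessment": 8, "final_examination": 7, "participation": 9, "presentation": 6}
print(round(interim_grade(example), 2))
```

For these example scores, the weighted sum is 0.3 * 8 + 0.3 * 7 + 0.2 * 9 + 0.2 * 6 = 7.5.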
Reading list

Recommended core reading

  • R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, Wickham H., Grolemund G., 2017

Recommended supplementary reading

  • ggplot2: Elegant Graphics for Data Analysis, Wickham H., 2009
  • R в действии: анализ и визуализация данных в программе R (R in Action: Data Analysis and Visualization in R), Кабаков Р. И., Волковой П. А., 2014
  • Бизнес-прогнозирование (Business Forecasting), Ханк Дж. Э., Райтс А. Дж., 2003