- The research seminar aims to help students gain the data skills required to complete their educational program successfully and to solve day-to-day business tasks in logistics and supply chain management.
- Knows the most commonly used data types in R
- Shares the analysis results in the form of R Markdown reports
- Knows the grammar of data visualization and common methods for exploring patterns in continuous, categorical and multi-dimensional data
- Formulates the data analysis problem based on the business problem description
- Knows the data transformation tools available in R
- Determines data requirements to address the analysis tasks
- Writes functions in R and applies them to lists and tibbles
- Knows time series forecasting methods
- Applies the R statistical programming language to the analysis, visualization and forecasting of economic data
- Knows the concept of statistical inference and basic tests for comparing groups
- Builds predictive models for a continuous output variable (regression task)
- Uses methods for model evaluation and model selection
- Builds predictive models for a categorical output variable (classification task)
- Chooses a suitable method for solving a data analysis problem
- Introduction to the R programming language and software ecosystem: Overview of software for data analysis and the role of open-source tools. The R statistical programming language. The CRAN repository and CRAN Task Views. The RStudio IDE. R scripts and R Markdown documents. The concepts of reproducibility and literate analysis. Using R Markdown for reporting. Basic data structures and data manipulation in R. Variables, functions and control flow.
- The grammar of graphics and the ggplot2 package for exploratory data analysis: The elements of the grammar of graphics (the data, the layers, the geoms, the scales, the transformations, the facets). Using the ggplot2 package to analyze univariate and multivariate data. The concept and purpose of exploratory data analysis.
- Data importing and data transformation using tidy tools: Importing data from text and Excel files. Tibbles. The grammar of data transformation. Tidy data. Transforming and reshaping data using tidy tools. Cleaning data and handling missing values.
- Introduction to functional programming in R: The Don’t Repeat Yourself (DRY) principle. Writing functions in R. Applying functions to lists and tibbles. Using the purrr::map_* functions to process lists and data frames. The split-apply-combine principle.
- Time series analysis and forecasting: Components of a time series (trend, seasonality, cycles). Stationary time series. Selecting a forecasting method. Assessing the adequacy of the selected forecasting method. Tools for exploring data sets. Time series decomposition in R. Getting time series data from the Web. Forecast accuracy evaluation. "Naive" forecasting models as a baseline for model evaluation. Exponential smoothing. State-space models. Adjustments and Box-Cox transformations for time series data. Tidy approaches to time series forecasting in R.
- Statistical inference. Methods for comparing groups: The sources of data. Defining the population under study and the sample. The concept of statistical inference. Interval estimation of a population mean. Null hypothesis significance testing. Tests for comparing groups. One-way analysis of variance (ANOVA). Distribution fitting.
- The regression task. Multiple linear regression: Cross-sectional data. Finding patterns in multivariate data. Correlation analysis. Simple linear regression. Building and interpreting linear regression models. Checking assumptions. Statistical inference using regression output. Non-linear and variance-stabilizing transformations. Multiple regression. Using categorical predictors. Modeling and interpreting interactions. Multicollinearity. Variable selection methods for multiple regression models. The concept of regularization. Ridge, LASSO and Elastic Net regression. Regression analysis of time series data. Exploring data using autocorrelation analysis. The "white noise" model. Stationary time series. Building regression models in the presence of autocorrelation. Identifying and eliminating autocorrelation. Time series and the problem of heteroscedasticity. Cointegration of time series. The Box-Jenkins methodology for modeling time series (ARIMA). ARIMA models and their parameters. Transforming a time series to stationarity. The procedure for identifying ARIMA models.
- The classification task. Decision trees and logistic regression: Finding patterns in categorical data. The classification task. Logistic regression. Decision trees. Evaluating and comparing the classifiers. Accuracy metrics for classification. Using resampling and cross-validation to evaluate the models. ROC analysis.
- The MLR framework for building, evaluating and deploying predictive models: Automating model building, validation and scoring using the mlr package.
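
The time series topic uses "naive" forecasts as a baseline for model evaluation. The sketch below illustrates that idea in base R only, under illustrative assumptions that are not part of the syllabus: the built-in `AirPassengers` monthly series, a 12-month hold-out period, and mean absolute error (MAE) as the accuracy measure. Dedicated forecasting packages would normally handle this. The sketch avoids them so it stays self-contained.

```r
# Minimal sketch (illustrative assumptions: AirPassengers data, 12-month
# hold-out, MAE as the accuracy measure). Uses only base R.
y <- AirPassengers                       # monthly totals, 1949-1960
train <- window(y, end = c(1959, 12))    # estimation period
test  <- window(y, start = c(1960, 1))   # hold-out period

y_train <- as.numeric(train)
y_test  <- as.numeric(test)
h <- length(y_test)                      # forecast horizon (12 months)

# Naive forecast: repeat the last observed value h times
naive_fc <- rep(y_train[length(y_train)], h)

# Seasonal naive forecast: reuse the same month of the last observed year
# (this simple indexing assumes h <= 12 for monthly data)
snaive_fc <- y_train[length(y_train) - 12 + seq_len(h)]

# Mean absolute error over the hold-out period
mae <- function(actual, forecast) mean(abs(actual - forecast))

errors <- c(naive = mae(y_test, naive_fc),
            seasonal_naive = mae(y_test, snaive_fc))
print(errors)
```

A candidate model, such as exponential smoothing, would typically be considered worthwhile only if its error on the same hold-out period is lower than these baseline errors.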
- Participation (non-blocking): Solving in-class and homework assignments. Participating in discussions.
- Assessment (non-blocking): Individual assignments on exploratory data analysis, time series forecasting and regression.
- Presentation (non-blocking): A 7-10 minute presentation on business cases for using data analysis and predictive modeling in business and research.
- Final Examination (blocking): Project presentation and defense (a 15-slide presentation, a computer application and a report).
- Interim assessment (module 4): 0.3 * Assessment + 0.3 * Final Examination + 0.2 * Participation + 0.2 * Presentation
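
As a worked illustration of the interim assessment formula (module 4), the sketch below computes the weighted grade in R. The function name `interim_grade` and the example scores are hypothetical. The sketch assumes that all four element scores are on the same scale (for example, 0 to 10) and does not model any rounding rules applied to the final grade.

```r
# Hypothetical helper, not part of the course materials: computes the
# weighted interim grade, assuming all scores share one scale (e.g. 0-10).
interim_grade <- function(assessment, final_exam, participation, presentation) {
  weights <- c(assessment = 0.3, final_exam = 0.3,
               participation = 0.2, presentation = 0.2)
  scores <- c(assessment = assessment, final_exam = final_exam,
              participation = participation, presentation = presentation)
  # Sanity check: the weights of the formula should sum to 1
  stopifnot(isTRUE(all.equal(sum(weights), 1)))
  sum(weights * scores[names(weights)])
}

# Hypothetical scores: 0.3*8 + 0.3*7 + 0.2*9 + 0.2*6 = 7.5
interim_grade(assessment = 8, final_exam = 7, participation = 9, presentation = 6)
```

Because the weights sum to 1, a student who scores the same value on every element receives exactly that value as the weighted grade.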
- R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, Wickham, H., Grolemund, G., 2017
- ggplot2: Elegant Graphics for Data Analysis, Wickham, H., 2009
- R в действии: анализ и визуализация данных в программе R [R in Action: Data Analysis and Visualization in R], Кабаков Р. И., Волковой П. А., 2014
- Бизнес-прогнозирование [Business Forecasting], Ханк Дж. Э., Райтс А. Дж., 2003