Master's Programme 2020/2021

Data Analysis

Best course by the criterion "Usefulness of the course for broadening horizons and all-round development"
Status: Compulsory course (System and Software Engineering)
Field of study: 09.04.04. Software Engineering
When taught: Year 1, modules 3 and 4
Mode of study: without an online course
Degree programme: System and Software Engineering
Language: English
Credits: 4
Contact hours: 64

Course Syllabus

Abstract

The course is taught to master's students of the Faculty of Computer Science at NRU HSE in the third and fourth modules of the first year. It carries 4 credits. Classroom work takes 64 hours: 24 hours of lectures and 40 hours of seminars. Assessment consists of in-class tasks, homework, control work, and an examination. The main purpose of the course is to teach students how to apply different data analysis methods to real data.
Learning Objectives

  • give students an introduction to the most widely used data analysis methods
  • explain the data analysis methods using real data and concentrating on complications that may occur during the analysis in real-life research
  • teach students how to organize their own research project using the knowledge obtained during the course
  • explain how to use data analysis tools in the most effective way to perform the research tasks
Expected Learning Outcomes

  • select appropriate methods of data analysis depending on the research question and types of empirical data
  • prepare empirical data for their further analysis
  • formulate research hypotheses and construct models
  • create a regression model and describe it
  • create a factor model and describe it
  • create a cluster model and describe it
Course Contents

  • Introduction to data analysis
    Statistical packages and programming languages for data analysis. Data sources. Working with data (exploring data, entering new data, coding variables, preparing data for analysis, export/import of the data, modifying data).
  • Descriptive data analysis
    Frequency analysis. Graphical analysis. Statistical characteristics: measures of central tendency, dispersion, standard deviation, standard error of the mean, confidence interval, percentile values, measures of symmetry and peakedness of a distribution (skewness and kurtosis). Normal distribution, Z-standardization, Kolmogorov-Smirnov test of normality. Working with multiple response questions.
  • Investigating relationships between variables
    Cross-tabulation analysis. Formulating and testing hypotheses. Level of significance and Type I error. Chi-square test. Correlation coefficients: bivariate, part and partial. T-tests. ANOVA. Non-parametric tests.
  • Regression analysis
    Objectives of regression analysis. Graphical representation of regression line. Simple and multiple linear regression. Logistic regression. Interpreting results of regression analysis. Multicollinearity. Heteroscedasticity. Dummy variables. Regression model limitations and diagnostics.
  • Factor analysis
    Factor analysis steps. Assessing the suitability of data for factor analysis. Methods of factor analysis. Factor loadings, rotation. Saving factors as new variables. Interpreting factors.
  • Cluster analysis
    Cluster analysis steps. Assessing the suitability of data for cluster analysis. Methods of cluster analysis: hierarchical and k-means. Saving cluster membership information as a new variable. Characterizing clusters.
  • Panel data analysis
    Advantages and problems of using panel data. Classification of panel data models. Panel data regression estimation methods. Models with fixed and random effects. Criteria for choosing the optimal model.
  • Time series analysis
    Stationary and non-stationary time series. Forecasting values for future periods. Autoregressive, integrated and moving average models (ARIMA).
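As an illustration of the statistical characteristics listed under "Descriptive data analysis" above, the following is a minimal sketch that uses only the Python standard library. The function name `describe`, the sample values, and the normal-approximation 95% confidence interval are assumptions made for this example and are not prescribed by the course; for small samples a t-based interval would usually be more appropriate.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics and Z-scores for a numeric sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1 in the denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    # Assumption: normal approximation for the 95% confidence interval.
    q = statistics.NormalDist().inv_cdf(0.975)
    ci95 = (mean - q * se, mean + q * se)
    # Z-standardization: shift to mean 0 and scale to standard deviation 1.
    z_scores = [(x - mean) / sd for x in values]
    return {"n": n, "mean": mean, "sd": sd, "se": se, "ci95": ci95, "z": z_scores}


# Hypothetical sample, used only to show the call.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
print(summary["mean"], round(summary["sd"], 3))
```

After standardization the Z-scores have mean 0 and sample standard deviation 1, which is what makes variables measured on different scales comparable.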
Assessment Elements

  • non-blocking Tasks in class (TC)
    tasks which are performed in class and are aimed at developing students’ skills in data analysis
  • non-blocking Homework (HW)
  • non-blocking Control Work (CW)
    two written tests completed in class
  • non-blocking Examination Work (EW)
    The exam is held in written form on the MS Teams platform (https://www.microsoft.com/ru-ru/microsoft-365/microsoft-teams/group-chat-software). Students must connect 5 minutes before the exam starts. The student's computer must meet the following requirements: a working camera and microphone, and the MS Teams application installed. To take part, the student must join the exam exactly according to the schedule and be ready to answer the instructor's questions with the microphone and camera switched on. During the exam, students are prohibited from receiving prompts from other people. Students are allowed to ask the instructor clarifying questions if a task is unclear. A connection loss of less than 10 minutes during the exam is considered short-term. A connection loss of more than 10 minutes is considered long-term. After a long-term connection loss, the student cannot continue the exam. The retake procedure is the same as the exam procedure.
Interim Assessment

  • Interim assessment (module 4)
    0.3 * Control Work (CW) + 0.3 * Examination Work (EW) + 0.2 * Homework (HW) + 0.2 * Tasks in class (TC)
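The interim assessment formula above can be sketched as a short Python function. The weights come from the syllabus; the function name `interim_grade`, the 10-point scale, and the sample scores are assumptions for illustration only. The syllabus does not state a rounding rule, so the result is returned unrounded.

```python
# Weights taken from the interim assessment formula in the syllabus.
WEIGHTS = {"CW": 0.3, "EW": 0.3, "HW": 0.2, "TC": 0.2}


def interim_grade(scores):
    """Return the weighted interim grade from the four assessment elements."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical scores, assuming a 10-point scale.
example = {"CW": 8, "EW": 7, "HW": 9, "TC": 10}
print(round(interim_grade(example), 2))  # 8.3
```

Because the weights sum to 1, the interim grade stays on the same scale as the individual element scores.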
Bibliography

Recommended Core Bibliography

  • Mirkin, B. (2011). Core Concepts in Data Analysis: Summarization, Correlation and Visualization.
  • Dougherty, C. (2016). Introduction to Econometrics.

Recommended Additional Bibliography

  • Idris, I. (2016). Python Data Analysis Cookbook. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1290098
  • McKinney, W. (2018). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2nd ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1605925