Master's Programme 2019/2020

Categorical Data Analysis: Basic and Advanced Levels

Status: Elective course (Population and Development)
Field of study: 38.04.04. Public Administration
When taught: Year 1, Module 3
Mode of study: Full time
Instructors: Чмель Кирилл Шамилевич
Programme: Population and Development
Language: English
Credits: 5

Course Syllabus

Abstract

In recent years the use of specialized methods for categorical data analysis has increased significantly in the social sciences. Discrete variables have always been used as standard measures in public opinion surveys and experimental studies, but they require methods that account for the properties of probability distributions for categorical data. We will start the course with basics such as contingency tables, and then move on to modelling binary outcomes and ordered and unordered multinomial outcomes. Advanced topics such as maximum likelihood estimation and optimization methods will be covered upon request. By the end of the course, students will be able to conduct statistical analysis of categorical data using the free software package R and give a substantive interpretation of the results.
Learning Objectives

  • The aim of this course is to demonstrate to students both the theoretical rationale and the important applications of categorical data analysis methods, and to give them the skills either to conduct their own research using categorical data analysis or to replicate existing research that uses these methods.
Learning Outcomes

  • A student who successfully completes this course should have a reasonable grasp of the theoretical foundations of categorical data analysis and sufficient skills to apply its methods. For example, students will understand and be able to apply basic asymptotic techniques (e.g. the multivariate central limit theorem and the delta method).
  • The student will be able to derive and work with sampling distributions of binary or categorical measures. Students will be familiar with a variety of methods for analyzing categorical or count data (e.g. logit, probit, Poisson regression, zero-inflated models) and will understand the settings in which each is applicable. The successful student will have a working knowledge of R and its packages for categorical data analysis.
Course Contents

  • Probability Distributions and Statistical Inference for Categorical Data
    Categorical response data. Distributions for categorical data (Bernoulli distribution, multinoulli distribution). Statistical inference for categorical data. Statistical inference for a proportion. Contingency Tables. Table structure. Comparing proportions. Odds ratio. Chi-squared tests. Exact tests for small samples. Correlation for categorical data.
  • GLMs for Binary Data: Binary Logistic Regression – I
    Components of generalized linear model. GLMs for binary data. Fitting generalized linear models. Logistic Regression. Probit. Odds and Odds ratios. Logistic regression for classification. Multiple logistic regression.
  • Binary Logistic Regression – II
    Interpreting logistic regression. Inference for logistic regression. Categorical predictors. Summarizing effects. Strategies in model selection. Model checking. Wald Test, Chi-squared Test.
  • GLMs for Ordered Data: Ordered Logit
    A latent variable model for ordinal variables. Identification. Estimation. Maximum Likelihood Estimation. The parallel regression assumption.
  • GLMs for Nominal Responses: Multinomial Logit
    Logit models for nominal responses. The multinomial logit model. Maximum Likelihood Estimation. The Independence of Irrelevant Alternatives. The conditional logit model. Interpretation. Related models.
  • GLMs for Count Data: Poisson Model
    The Poisson distribution. The Poisson regression model. The Negative Binomial Regression model. Beta Regression. Gamma Regression. Models for truncated counts. Zero-inflated models.
  • Loglinear Models and Three-Way Contingency Tables
    Association in three-way tables. Loglinear models for 2-way and 3-way tables. Inference for loglinear models.
  • In-Class Lab Session
    Topics chosen by students, for example estimators, limited outcomes, or clustered models.
Assessment Elements

  • Replication Project (non-blocking)
  • Take-home Assignments (non-blocking)
  • Final Exam (non-blocking)
Interim Assessment

  • Interim assessment (Module 3)
    The final score (S) is computed as S = 0.4 RP + 0.15 A_I + 0.15 A_II + 0.3 F, where RP is the score on the replication project, A_I and A_II are the scores on the two take-home assignments, and F is the score on the final exam. Each component of the formula is scored from 0 to 100 according to the percentage of correct work. The final score (S) is converted to the regular HSE scale by the following rule: 0-9 (0; F), 10-19 (1; F), 20-29 (2; F), 30-39 (3; F), 40-44 (4; C-), 45-54 (5; C+), 55-64 (6; B-), 65-74 (7; B+), 75-84 (8; A-), 85-94 (9; A), 95-100 (10; A+). Students who score above 75 on each of RP, A_I and A_II may use a final exam waiver. For students who use the waiver, the final score is computed as S = 0.6 RP + 0.2 A_I + 0.2 A_II.
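
The grading rule above can be sketched in a few lines of Python. This is only an illustration, not an official calculator: the function names `final_score` and `hse_grade` are hypothetical, and the handling of fractional scores that fall between the listed bands (for example 44.5) is an assumption, since the rule does not say how they are resolved. Here the lower bound of each band is treated as a threshold.

```python
import bisect

# Lower bounds of the score bands in the conversion rule; position i is HSE grade i.
LOWER_BOUNDS = [0, 10, 20, 30, 40, 45, 55, 65, 75, 85, 95]
LETTERS = ["F", "F", "F", "F", "C-", "C+", "B-", "B+", "A-", "A", "A+"]


def final_score(rp, a1, a2, exam=None):
    """Return the weighted final score S on the 0-100 scale.

    rp, a1, a2 are the replication project and the two assignments;
    exam=None means the student uses the final exam waiver.
    Weights are written as integer percentages to avoid rounding noise.
    """
    if exam is None:
        # The waiver is only open to students scoring above 75 on every component.
        if not (rp > 75 and a1 > 75 and a2 > 75):
            raise ValueError("waiver requires scores above 75 on RP, A_I and A_II")
        return (60 * rp + 20 * a1 + 20 * a2) / 100
    return (40 * rp + 15 * a1 + 15 * a2 + 30 * exam) / 100


def hse_grade(s):
    """Map a final score S to (HSE grade, letter).

    Assumption: a fractional score belongs to the band whose lower bound
    it has reached, so 44.5 maps to grade 4.
    """
    if not 0 <= s <= 100:
        raise ValueError("score must lie between 0 and 100")
    grade = bisect.bisect_right(LOWER_BOUNDS, s) - 1
    return grade, LETTERS[grade]


if __name__ == "__main__":
    s = final_score(80, 90, 70, exam=60)   # 32 + 13.5 + 10.5 + 18 = 74.0
    print(s, hse_grade(s))                 # 74.0 (7, 'B+')
```

With the waiver, `final_score(90, 80, 85)` gives 87.0, which falls in the 85-94 band.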
Reading List

Recommended Core Reading

  • Bilder, C. R., & Loughin, T. M. (2014). Analysis of Categorical Data with R. Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1763590
  • Categorical data analysis, Agresti A., 2002
  • Regression models for categorical and limited dependent variables, Long J. S., 1997

Recommended Additional Reading

  • An introduction to categorical data analysis, Agresti A., 2007
  • Analysis of ordinal categorical data, Agresti A., 2010
  • Applied regression analysis and generalized linear models, Fox J., 2008
  • Brzezińska, J. (2017). Visual Techniques for Categorical Data in R / Metody wizualizacji danych w programie R. Ekonometria / Uniwersytet Ekonomiczny We Wrocławiu / Econometrics / Uniwersytet Ekonomiczny We Wrocławiu, (3), 26. https://doi.org/10.15611/ekt.2017.3.02
  • Friendly, M., & Meyer, D. (2016). Discrete Data Analysis with R : Visualization and Modeling Techniques for Categorical and Count Data. Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1910511
  • Generalized, linear, and mixed models, McCulloch C. E., Searle S. R., 2001
  • Log-linear models, Knoke D., Burke P. J., 1980
  • Paul D. Allison. (1999). Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research, (2), 186. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsrep&AN=edsrep.a.sae.somere.v28y1999i2p186.208