Master 2020/2021

Categorical Data Analysis: Introductory and Advanced Topics

Category 'Best Course for New Knowledge and Skills'
Type: Elective course (Comparative Social Research)
Area of studies: Sociology
When: 1 year, 4 module
Mode of studies: offline
Instructors: Kirill Chmel
Master’s programme: Comparative Social Research
Language: English
ECTS credits: 5
Contact hours: 32

Course Syllabus

Abstract

In recent years the use of specialized methods for categorical data analysis has increased significantly in the social sciences. Discrete variables have always been used as standard measures in public opinion surveys and experimental studies. However, they require methods that account for the properties of probability distributions for categorical data. We will start the course with basics such as contingency tables, and then move on to modelling binary outcomes and ordered and unordered multinomial outcomes. Advanced topics such as maximum likelihood estimation and optimization methods will be covered upon request. By the end of the course, students will be able to conduct statistical analysis of categorical data using the free software package R and give a substantive interpretation of the results.
Learning Objectives

  • The aim of this course is to demonstrate to students both the theoretical rationale and important applications of categorical data analysis methods, and to give students the skills either to conduct their own research using categorical data analysis or to replicate existing research that uses these methods.
Expected Learning Outcomes

  • The student who successfully completes this course should have a reasonable grasp of the theoretical foundations of categorical data analysis and sufficient skills to apply categorical data analysis methods. For example, students will understand and be able to apply basic asymptotic techniques (e.g. the multivariate central limit theorem and the delta method).
  • The student will be able to derive and work with sampling distributions of binary or categorical measures. Students will be familiar with a variety of methods for analyzing categorical or count data (e.g. logit, probit, Poisson regression, zero-inflated models) and understand the settings in which they are applicable. The successful student will have a working knowledge of R and its packages for categorical data analysis.
Course Contents

  • Probability Distributions and Statistical Inference for Categorical Data
    Categorical response data. Distributions for categorical data (Bernoulli distribution, multinoulli distribution). Statistical inference for categorical data. Statistical inference for a proportion. Contingency Tables. Table structure. Comparing proportions. Odds ratio. Chi-squared tests. Exact tests for small samples. Correlation for categorical data.
  • GLMs for Binary Data: Binary Logistic Regression – I
    Components of generalized linear model. GLMs for binary data. Fitting generalized linear models. Logistic Regression. Probit. Odds and Odds ratios. Logistic regression for classification. Multiple logistic regression.
  • Binary Logistic Regression – II
    Interpreting logistic regression. Inference for logistic regression. Categorical predictors. Summarizing effects. Strategies in model selection. Model checking. Wald Test, Chi-squared Test.
  • GLMs for Ordered Data: Ordered Logit
    A latent variable model for ordinal variables. Identification. Estimation. Maximum Likelihood Estimation. The parallel regression assumption.
  • GLMs for Nominal Responses: Multinomial Logit
    Logit models for nominal responses. The multinomial logit model. Maximum Likelihood Estimation. The Independence of Irrelevant Alternatives. The conditional logit model. Interpretation. Related models.
  • GLMs for Count Data: Poisson Model
    The Poisson distribution. The Poisson regression model. The Negative Binomial Regression model. Beta Regression. Gamma Regression. Models for truncated counts. Zero-inflated models.
  • Loglinear Models and Three-Way Contingency Tables
    Association in three-way tables. Loglinear models for 2-way and 3-way tables. Inference for loglinear models.
  • In-Class Lab Session
    Whatever students want to discuss: Estimators, Limited Outcomes or Clustered Models.
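
To make the first topic in the list above more concrete (comparing proportions, the odds ratio, and the chi-squared test for a two-way table), here is a minimal sketch. It is written in Python with the standard library only, whereas the course itself works in R; the function names and the counts are hypothetical and chosen purely for illustration.

```python
import math


def odds_ratio(table):
    """Sample odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_pvalue_1df(stat):
    """Upper-tail p-value for a chi-squared variable with 1 degree of freedom.

    With 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows are two groups, columns are yes / no answers.
table = [[30, 10], [20, 40]]
print(odds_ratio(table))  # 6.0
print(pearson_chi2(table))
print(chi2_pvalue_1df(pearson_chi2(table)))
```

The p-value helper covers only the 2x2 case (1 degree of freedom); larger tables need the general chi-squared distribution, which a statistics package would normally provide.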
Assessment Elements

  • non-blocking Replication Project
  • non-blocking Take-home Assignments
  • non-blocking Final Exam
Interim Assessment

  • Interim assessment (module 4)
    The final score (S) is computed as S = 0.4 RP + 0.15 A_I + 0.15 A_II + 0.3 F, where RP is the score on the replication project, A_I and A_II are the scores on the two take-home assignments, and F is the score on the final exam. Each component is scored from 0 to 100 according to the percentage of correct work. The final score (S) is converted to the regular HSE scale according to the following rule: 0-9 (0; F), 10-19 (1; F), 20-29 (2; F), 30-39 (3; F), 40-44 (4; C-), 45-54 (5; C+), 55-64 (6; B-), 65-74 (7; B+), 75-84 (8; A-), 85-94 (9; A), 95-100 (10; A+). Students who score above 75 on RP, A_I and A_II may use a final exam waiver. For students who use the waiver, the final score is computed as S = 0.6 RP + 0.2 A_I + 0.2 A_II.
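
The grading rule above can be sketched as a short Python script. This is an illustration, not an official calculator: the function names are hypothetical, the waiver condition is read as requiring a score above 75 on each of RP, A_I and A_II, and a fractional S is assumed to fall into the band with the largest lower bound that does not exceed it (the syllabus lists integer bands only).

```python
from bisect import bisect_right

# Lower bounds of the score bands and the matching (HSE grade, letter) pairs,
# copied from the conversion rule in the syllabus.
BAND_STARTS = [0, 10, 20, 30, 40, 45, 55, 65, 75, 85, 95]
BAND_GRADES = [(0, "F"), (1, "F"), (2, "F"), (3, "F"), (4, "C-"), (5, "C+"),
               (6, "B-"), (7, "B+"), (8, "A-"), (9, "A"), (10, "A+")]


def final_score(rp, a1, a2, exam=None):
    """Weighted final score S on the 0-100 scale.

    exam=None means the student uses the final exam waiver. Assumption:
    the waiver requires RP, A_I and A_II to each be above 75.
    """
    if exam is None:
        if not (rp > 75 and a1 > 75 and a2 > 75):
            raise ValueError("waiver requires RP, A_I and A_II above 75")
        return 0.6 * rp + 0.2 * a1 + 0.2 * a2
    return 0.4 * rp + 0.15 * a1 + 0.15 * a2 + 0.3 * exam


def hse_grade(score):
    """Map S to (grade, letter).

    Assumption: a fractional S belongs to the band with the largest
    lower bound that does not exceed it.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return BAND_GRADES[bisect_right(BAND_STARTS, score) - 1]


# A student who sits the exam, and one who uses the waiver.
print(hse_grade(final_score(80, 70, 90, exam=60)))
print(hse_grade(final_score(90, 80, 85)))
```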
Bibliography

Recommended Core Bibliography

  • Bilder, C. R., & Loughin, T. M. (2014). Analysis of Categorical Data with R. Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1763590

Recommended Additional Bibliography

  • Brzezińska, J. (2017). Visual Techniques for Categorical Data in R / Metody wizualizacji danych w programie R. Ekonometria / Econometrics (Uniwersytet Ekonomiczny we Wrocławiu), (3), 26. https://doi.org/10.15611/ekt.2017.3.02
  • Friendly, M., & Meyer, D. (2016). Discrete Data Analysis with R : Visualization and Modeling Techniques for Categorical and Count Data. Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1910511
  • Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research, 28(2), 186-208. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsrep&AN=edsrep.a.sae.somere.v28y1999i2p186.208