Categorical Data Analysis: Introductory and Advanced Topics
- The aim of this course is to demonstrate to students both the theoretical rationale and the important applications of categorical data analysis methods, and to give students the skills either to conduct their own research using categorical data analysis or to replicate existing research that uses these methods.
- The student who successfully completes this course should have a reasonable grasp of the theoretical foundations of categorical data analysis and sufficient skills to apply categorical data analysis methods. For example, students will understand and be able to apply basic asymptotic techniques (e.g., the multivariate central limit theorem and the delta method).
- The student will be able to derive and work with sampling distributions of binary or categorical measures. Students will be familiar with a variety of methods for analyzing categorical or count data (e.g., logit, probit, Poisson regression, zero-inflated models) and will understand the settings in which each is applicable. The successful student will have a working knowledge of R and its packages for categorical data analysis.
- Probability Distributions and Statistical Inference for Categorical Data. Categorical response data. Distributions for categorical data (Bernoulli distribution, multinoulli distribution). Statistical inference for categorical data. Statistical inference for a proportion. Contingency tables. Table structure. Comparing proportions. Odds ratio. Chi-squared tests. Exact tests for small samples. Correlation for categorical data.
- GLMs for Binary Data: Binary Logistic Regression – I. Components of a generalized linear model. GLMs for binary data. Fitting generalized linear models. Logistic regression. Probit. Odds and odds ratios. Logistic regression for classification. Multiple logistic regression.
- Binary Logistic Regression – II. Interpreting logistic regression. Inference for logistic regression. Categorical predictors. Summarizing effects. Strategies in model selection. Model checking. Wald test, chi-squared test.
- GLMs for Ordered Data: Ordered Logit. A latent variable model for ordinal variables. Identification. Estimation. Maximum likelihood estimation. The parallel regression assumption.
- GLMs for Nominal Responses: Multinomial Logit. Logit models for nominal responses. The multinomial logit model. Maximum likelihood estimation. The independence of irrelevant alternatives. The conditional logit model. Interpretation. Related models.
- GLMs for Count Data: Poisson Model. The Poisson distribution. The Poisson regression model. The negative binomial regression model. Beta regression. Gamma regression. Models for truncated counts. Zero-inflated models.
- Loglinear Models and Three-Way Contingency Tables. Association in three-way tables. Loglinear models for 2-way and 3-way tables. Inference for loglinear models.
- In-Class Lab Session. Open discussion of topics chosen by the students: estimators, limited outcomes, or clustered models.
- Interim assessment (module 4). The final score (S) is computed as S = 0.4 RP + 0.15 A_I + 0.15 A_II + 0.3 F, where RP is the score on the replication project, A_I and A_II are the scores on the two take-home assignments, and F is the score on the final exam. Each component is scored from 0 to 100 according to the percentage of the work completed correctly. The final score (S) is converted to the HSE regular scale as follows: 0-9 (0; F), 10-19 (1; F), 20-29 (2; F), 30-39 (3; F), 40-44 (4; C-), 45-54 (5; C+), 55-64 (6; B-), 65-74 (7; B+), 75-84 (8; A-), 85-94 (9; A), 95-100 (10; A+). Students who score above 75 on each of RP, A_I, and A_II may use a final exam waiver. For students who use the waiver, the final score is computed as S = 0.6 RP + 0.2 A_I + 0.2 A_II.
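The grading rule above can be checked with a short calculation. The following Python sketch is illustrative only: the function names `final_score` and `to_hse_scale` are hypothetical, and it assumes that a fractional score falls into the band whose lower bound it has reached (the syllabus lists whole-number bands only and does not say how fractions are rounded).

```python
# Conversion bands (lower bound, HSE grade, letter), highest band first.
# The bounds are taken from the syllabus; treating them as thresholds
# for fractional scores is an assumption.
BANDS = [
    (95, 10, "A+"), (85, 9, "A"), (75, 8, "A-"),
    (65, 7, "B+"), (55, 6, "B-"),
    (45, 5, "C+"), (40, 4, "C-"),
    (30, 3, "F"), (20, 2, "F"), (10, 1, "F"), (0, 0, "F"),
]


def final_score(rp, a1, a2, exam=None):
    """Return S on the 0-100 scale.

    rp   -- replication project score (RP)
    a1   -- first take-home assignment score (A_I)
    a2   -- second take-home assignment score (A_II)
    exam -- final exam score (F), or None to use the exam waiver
    """
    given = [rp, a1, a2] if exam is None else [rp, a1, a2, exam]
    if any(not 0 <= x <= 100 for x in given):
        raise ValueError("each component must lie between 0 and 100")
    if exam is None:
        # The waiver requires more than 75 on RP, A_I and A_II.
        if min(rp, a1, a2) <= 75:
            raise ValueError("waiver requires scores above 75 on RP, A_I and A_II")
        return 0.6 * rp + 0.2 * a1 + 0.2 * a2
    return 0.4 * rp + 0.15 * a1 + 0.15 * a2 + 0.3 * exam


def to_hse_scale(s):
    """Map a 0-100 score to an (HSE grade, letter) pair."""
    if not 0 <= s <= 100:
        raise ValueError("score must lie between 0 and 100")
    for lower, grade, letter in BANDS:
        if s >= lower:
            return grade, letter


if __name__ == "__main__":
    s = final_score(rp=80, a1=70, a2=90, exam=60)
    print(s, to_hse_scale(s))
    s_waiver = final_score(rp=80, a1=90, a2=100)
    print(s_waiver, to_hse_scale(s_waiver))
```

In both formulas the weights sum to 1, so S stays on the same 0-100 scale as its components.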
- Bilder, C. R., & Loughin, T. M. (2014). Analysis of Categorical Data with R. Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1763590
- Brzezińska, J. (2017). Visual Techniques for Categorical Data in R / Metody wizualizacji danych w programie R. Ekonometria / Econometrics (Uniwersytet Ekonomiczny we Wrocławiu), (3), 26. https://doi.org/10.15611/ekt.2017.3.02
- Friendly, M., & Meyer, D. (2016). Discrete Data Analysis with R : Visualization and Modeling Techniques for Categorical and Count Data. Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1910511
- Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research, (2), 186. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsrep&AN=edsrep.a.sae.somere.v28y1999i2p186.208