
March 19 – Regular Seminar

Event ended

Topic: Predictors of Respondent Survival in Online Panel Surveys: Evidence from Four Waves of the Values in Crisis project in Russia
Speakers: Boris Sokolov (CSWR, HSE), Viyaleta Korsunava (CSWR, HSE), Yuri Rykov (Okko).

The Centre for Comparative Research on Social Well-Being announces its next regular seminar, which will be held as a Zoom session on March 19 at 2:30 p.m. CET (4:30 p.m. Moscow time). Boris Sokolov (CSWR, HSE), Viyaleta Korsunava (CSWR, HSE), and Yuri Rykov (Okko) will present the report “Predictors of Respondent Survival in Online Panel Surveys: Evidence from Four Waves of the Values in Crisis project in Russia”.

Abstract. This project leverages modern machine learning methods to identify the factors that best predict respondent "survival" in longitudinal surveys—that is, participation in all waves of a study without dropping out or skipping any waves. Specifically, we used four supervised ML algorithms—Light Gradient Boosting Machine (LGBM), Categorical Boosting (CatBoost), Random Forest (RF), and logistic regression with Lasso regularization (LogLasso)—to predict survival across four waves of the Russian part of the Values in Crisis survey (covering the period from June 2020 to May 2022) using a large number of covariates (about 100 variables) measured at Wave 1. Our analysis yielded two main findings. First, "survival" (or, alternatively, dropout) is difficult to predict. Even state-of-the-art ML methods trained on a large number of covariates performed only moderately better than random guessing: no method achieved an accuracy higher than 65.5%. Second, the factors that meaningfully contributed to model predictions included not only standard demographic and survey behavior indicators but also some attitudinal variables. The top 10 most influential predictors (according to SHAP values averaged across the four ML models) included only two demographic variables (age and gender) and one survey behavior indicator (survey duration), while the rest were first-order values from Schwartz's Conservation and Self-Transcendence domains and measures of pandemic-related anxiety.
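
To make the abstract's definition of "survival" concrete, here is a minimal Python sketch of how the outcome could be coded. It is not the authors' code. The respondent IDs and participation records are hypothetical toy data, and the majority-class baseline is only one assumed reference point for judging accuracy; the abstract does not specify which baseline the authors used.

```python
from typing import Dict, List

N_WAVES = 4  # the abstract covers four waves


def is_survivor(waves_completed: List[bool]) -> bool:
    """A respondent 'survives' only if they took part in every wave."""
    return len(waves_completed) == N_WAVES and all(waves_completed)


def majority_baseline_accuracy(labels: List[bool]) -> float:
    """Accuracy of always predicting the more common class (assumed baseline)."""
    survivors = sum(labels)
    return max(survivors, len(labels) - survivors) / len(labels)


# Hypothetical toy panel: respondent id -> participation in waves 1..4.
toy_panel: Dict[str, List[bool]] = {
    "r1": [True, True, True, True],     # took part in all waves
    "r2": [True, False, True, True],    # skipped wave 2
    "r3": [True, True, False, False],   # dropped out after wave 2
    "r4": [True, True, True, True],     # took part in all waves
    "r5": [True, False, False, False],  # dropped out after wave 1
}

labels = [is_survivor(waves) for waves in toy_panel.values()]
baseline = majority_baseline_accuracy(labels)
print(labels, baseline)
```

Under this coding, a respondent who skips a single wave but later returns counts as a non-survivor, just like one who drops out permanently. Any classifier's accuracy would then be read against a reference such as `baseline`.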

Everyone interested is invited!

The working language is English.