Dealing with missing data using the Heckman selection model: methods primer for epidemiologists

Johanna Muñoz*, Heather Hufstedler, Paul Gustafson, Till Bärnighausen, Valentijn M.T. De Jong, Thomas P.A. Debray

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Missing data is a common problem in epidemiologic studies and is often addressed by omitting incomplete records or adopting multiple imputation. Although these methods can produce unbiased estimates of study associations, their validity becomes problematic when data are missing not at random (MNAR), and the missing data mechanism is nonignorable. This situation typically arises when the presence of missing values depends on characteristics of the measurement or recording process, which is common in surveys and databases with electronic healthcare records. In this article, we discuss the relevance and implementation of Heckman selection models to impute variables that are missing not at random.
Original language: English
Pages (from-to): 5-13
Number of pages: 9
Journal: International Journal of Epidemiology
Volume: 52
Issue number: 1
Publication status: Published - 1 Feb 2023

Keywords

  • Heckman selection model
  • exclusion restriction variables
  • selection bias
  • missing data
  • causal inference
  • real world data

