TY - JOUR
T1 - Reconciliation of inconsistent data sources using hidden Markov models
AU - Pankowska, Paulina
AU - Pavlopoulos, Dimitris
AU - Bakker, Bart
AU - Oberski, Daniel L.
N1 - Publisher Copyright: © 2020 - IOS Press and the authors. All rights reserved.
PY - 2020
Y1 - 2020
AB - This paper discusses how National Statistical Institutes (NSIs) can use hidden Markov models (HMMs) to produce consistent official statistics for categorical, longitudinal variables from inconsistent sources. Two main challenges are addressed. First, reconciling inconsistent sources with multi-indicator HMMs requires linking the sources at the micro level, and such linkage might introduce bias due to linkage error. Second, applying and estimating HMMs regularly is a complicated and expensive procedure. It is therefore preferable to re-use the error parameter estimates as a correction factor for a number of years. However, this might lead to biased structural estimates if measurement error changes over time or if the data collection process changes. Our results on these issues are highly encouraging and imply that the suggested method is appropriate for NSIs. Specifically, linkage error leads to substantial bias only in very extreme scenarios. Moreover, measurement error parameters are largely stable over time if no major changes in the data collection process occur. However, when a substantial change in the data collection process occurs, such as a switch from dependent interviewing (DI) to independent interviewing (INDI), re-using measurement error estimates is not advisable.
KW - Data reconciliation
KW - dependent interviewing
KW - hidden Markov model
KW - inconsistent data sources
KW - latent class model
KW - linkage error
KW - measurement error
UR - http://www.scopus.com/inward/record.url?scp=85097257989&partnerID=8YFLogxK
U2 - 10.3233/SJI-190594
DO - 10.3233/SJI-190594
M3 - Article
AN - SCOPUS:85097257989
SN - 1874-7655
VL - 36
SP - 1261
EP - 1279
JO - Statistical Journal of the IAOS
JF - Statistical Journal of the IAOS
IS - 4
ER -