Abstract:
As part of a continuing effort to maintain
and improve the quality of data in DHS
surveys, this report examines whether
variation in 25 indicators of data quality,
across 15 recent DHS surveys, can be
attributed to interviewers and their
characteristics.
The analysis is based on interviewer ID codes
that appear at several points in DHS data
files, and information about the interviewers
obtained in a Fieldworker Survey that is now
a standard component of all DHS surveys. All
of the data files are publicly available.
The 25 indicators are in three broad
categories: nonresponse and refusals;
reported age at death of young children; and
ages and dates. The third group includes five
subgroups or domains: incompleteness of age,
which usually takes the form of a missing
month of birth; inconsistency between age in
the household survey and age in the
individual surveys of women or men; heaping
on ages that end in 0 or 5; displacement of
age across boundaries for eligibility; and a
new indirect indicator of over-dispersion of
children’s ages, derived from the flagging of
height-for-age and weight-for-age scores. All
indicators are defined at the level of the
individual, with outcome “1” for a
problematic or potentially problematic
response, and otherwise either “0” or “Not
Applicable”. Because the outcomes are binary,
they can readily be analyzed with logit
regression and related generalized linear
models. Combinations of
indicators and surveys are judged to be
problematic if the level or prevalence of the
outcome “1” is relatively far from an
acceptable level and there is highly
significant variation in the outcome across
interviewers. Many such combinations are
identified, with systematic in-depth
investigation of several examples. It is
found that when a data quality indicator
varies substantially across interviewers, the
bulk of that variation can often be traced to
a handful of interviewers, on the same team
or on different teams.
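
As a minimal sketch of this screening logic
(not taken from the report; the file name and
the columns "flagged" and "interviewer_id"
are hypothetical), a logit model with
interviewer fixed effects can be compared
against an intercept-only model with a
likelihood-ratio test:

    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    # Hypothetical input: one row per eligible respondent, with the binary
    # indicator coded 1 (problematic), 0 (not), or missing (Not Applicable).
    df = pd.read_csv("indicator_by_respondent.csv")
    df = df.dropna(subset=["flagged"])

    # Intercept-only model versus a model with one dummy per interviewer.
    null = smf.logit("flagged ~ 1", data=df).fit(disp=False)
    full = smf.logit("flagged ~ C(interviewer_id)", data=df).fit(disp=False)

    # Likelihood-ratio test of variation across interviewers.
    lr = 2 * (full.llf - null.llf)
    df_diff = full.df_model - null.df_model
    p_value = stats.chi2.sf(lr, df_diff)
    print(f"prevalence of '1': {df['flagged'].mean():.3f}")
    print(f"LR chi2 = {lr:.1f} on {df_diff:.0f} df, p = {p_value:.4g}")

A combination of indicator and survey would
then be flagged when the prevalence is far
from an acceptable level and this test is
highly significant.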
To investigate the potential effects of the
covariates in the Fieldworker Survey, similar
indicators are pooled, as are all the
surveys. With some exceptions, it is
generally found that interviewers who are
older and better educated have lower levels
of problematic outcomes. Prior experience
with a DHS survey or with other surveys is
often statistically significant, and often,
but not always, in the direction of
better-quality data. The cases in which
previous experience appears to lead to worse,
rather than better, data are a source of
concern.
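
A comparable sketch of the pooled analysis,
again with hypothetical variable names (age,
educ_years, prior_dhs, prior_other,
survey_id), regresses the pooled indicator on
interviewer characteristics with survey fixed
effects:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical pooled file: respondent-level outcomes merged with the
    # Fieldworker Survey characteristics of the assigned interviewer.
    pooled = pd.read_csv("pooled_indicator.csv").dropna(subset=["flagged"])

    model = smf.logit(
        "flagged ~ age + educ_years + prior_dhs + prior_other + C(survey_id)",
        data=pooled,
    ).fit(disp=False)
    print(model.summary())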
The most important limitation is that
interviewers are almost always assigned to
just one or two geographic regions within a
country, so the quality of the data they
collect is confounded with potentially
relevant characteristics of those regions and
of their respondents. For example,
respondents’ level of education is associated
with the accuracy of their stated age, and
interviewers assigned to a region with a low
level of education cannot be expected to
obtain the same quality of responses as
interviewers assigned to other regions.
Further analysis is planned that will include
characteristics of the respondents along with
those of the interviewers, as well as
possible statistical interactions that
reflect the social distance between
interviewers and
respondents. The methods and findings of this
study are relevant to ongoing efforts to
improve the training of interviewers and the
monitoring of fieldwork.