Step 4: Recoding of Datasets
The DHS Program makes the resulting survey datasets freely available to researchers, policymakers, and decision makers. To keep the datasets clean and as comparable as possible across surveys, The DHS Program generates "standard recode" datasets, which contain the same data as the raw datasets but in a standardized format. In the standard recode datasets, variable names and definitions are consistent across surveys wherever possible. However, every survey is different and includes questions that diverge from the standard. These questions are included in the standard recode datasets, either as computed standard variables or as variables specific to that survey. Recoding can take several months and involves consistency checking and comparisons between the standard recode and raw datasets.
Recoding is currently done for the DHS and AIS surveys. Work is under way to recode the SPA datasets.
Recoding of DHS and AIS datasets
There are three core questionnaires in DHS surveys: the Household Questionnaire, the Women's Questionnaire, and the Men's Questionnaire. There are also several standardized modules for countries interested in other topics, such as malaria, domestic violence, or maternal mortality. All additional modules are incorporated into the Household, Women's, or Men's Questionnaires. There are two core questionnaires in AIS surveys: the Household Questionnaire and the Individual Questionnaire. The latter is administered to both women and men.
Since the survey methodology, sampling and eligibility of the DHS and AIS surveys are consistent, the DHS recode variables have been expanded to include the AIS variables as well.
From the very beginning of DHS, a recode file was designed for the sake of consistency and comparability across surveys. In the first phase of DHS (DHS-I), the recode was defined only for the Women's Questionnaire. The recode file proved very useful, and as a result recode files have also been produced for the Household and Men's Questionnaires since DHS-II.
Recode files are initially created using a hierarchical model and later exported to flat files. There are two physical hierarchical recode data files. The first covers the Household and Women's Questionnaires, and the second covers the Men's Questionnaire. Each hierarchical data file is broken down into a number of records. The records were originally designed to map to different sections of the model questionnaires, but because of changes between phases this is no longer the case. Some records are repeating (multiple-occurrence) records, while others are single-occurrence records. Single-occurrence records contain simple, single-answer variables. Multiple-occurrence records represent sets of questions that are repeated for a number of events.
Special records hold variables that are not part of the model questionnaires but were included in a particular country's survey. These are known as country-specific records. They too can be single-occurrence or multiple-occurrence, depending on whether the question was added to a single-occurrence or a repeating section of the questionnaire.
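The export from a hierarchical file to a flat file can be pictured with a small Python sketch. It is only an illustration of the general idea, not The DHS Program's actual export procedure: the record names (RESPONDENT, COUNTRY_SPECIFIC, BIRTHS), the variable names, the numbered-column naming scheme, and the fixed maximum of three occurrences are all hypothetical. Single-occurrence variables are copied into the flat row as they are, while each variable in a multiple-occurrence record is expanded into one column per occurrence and padded with empty values.

```python
# Hypothetical layout of multiple-occurrence records:
# record name -> (variable names, maximum number of occurrences).
MULTIPLE_LAYOUT = {
    "BIRTHS": (["birth_order", "child_sex"], 3),
}


def flatten_case(case, layout=MULTIPLE_LAYOUT):
    """Turn one hierarchical case into one flat row (a dict of columns)."""
    row = {"case_id": case["case_id"]}

    # Single-occurrence records: copy each variable straight into the row.
    for variables in case["single"].values():
        row.update(variables)

    # Multiple-occurrence records: one numbered column per occurrence,
    # padded with None so every case produces the same set of columns.
    for record_name, (var_names, max_occ) in layout.items():
        occurrences = case["multiple"].get(record_name, [])
        if len(occurrences) > max_occ:
            raise ValueError(f"{record_name}: more than {max_occ} occurrences")
        for i in range(max_occ):
            occ = occurrences[i] if i < len(occurrences) else {}
            for name in var_names:
                row[f"{name}_{i + 1}"] = occ.get(name)
    return row


# Hypothetical case: two single-occurrence records (one of them
# country-specific) and one repeating record with two occurrences.
case = {
    "case_id": "0001",
    "single": {
        "RESPONDENT": {"age": 27, "region": 2},
        "COUNTRY_SPECIFIC": {"s_extra_question": 1},
    },
    "multiple": {
        "BIRTHS": [
            {"birth_order": 1, "child_sex": "F"},
            {"birth_order": 2, "child_sex": "M"},
        ],
    },
}

row = flatten_case(case)
for column, value in row.items():
    print(column, value)
```

Fixing the number of occurrences in advance is a design choice of this sketch: it gives every case the same columns, at the cost of empty cells for cases with fewer events.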
See Also:
Data Tools and Manuals
Using Datasets for Analysis