Whether you've analyzed DHS data before or are a first-time user, below are some resources to help you analyze DHS data efficiently.
Step-by-step introduction to analyzing DHS data
Step 1: Select surveys for analysis
Step 2: Review questionnaires
Step 3: Register for dataset access
Step 4: Download datasets
Step 5: Open your dataset
Step 6: Get to know your variables
Step 7: Use sample weights
Step 8: Consider special values
Step 2: Review questionnaires. Familiarize yourself with the questionnaires used to collect the data that you want to analyze. Model questionnaires are used for each survey phase, but each country modifies the core questionnaire slightly to meet its needs. The questionnaires used to collect data for a specific survey are always included at the back of that survey's final report. All final reports are free to download and, in some cases, can also be ordered in hard copy for free.
Use the questionnaires to determine whether the information you want to analyze was collected in your survey of interest, and who you want to analyze (your unit of analysis).
If the data you want to analyze was collected for everyone listed in the household questionnaire, your unit of analysis is probably household members. If, on the other hand, you want to analyze data about women's contraceptive use, you will find that the relevant questions were asked in the women's questionnaire, and your unit of analysis is women. The unit of analysis will help you determine which dataset to download in step 4.
Step 3: Register for dataset access. All DHS datasets are free to download and use. To download datasets, you must complete a short registration form. Remember your username and password; you can use them later to log in quickly and register for access to additional datasets.
Requests to access datasets are usually approved within 24 hours. Once your request has been approved, you will receive an email from firstname.lastname@example.org with instructions for download.
Step 4: Download datasets. Follow the instructions in the email you received. Once you log in to dhsprogram.com, you will see the country, survey, and list of datasets that you are approved to download. The Zip files containing datasets are labeled with brief but meaningful names, such as KEIR41DT. The full description of file naming conventions is here, but briefly:
Step 5: Open your dataset in the software you are using for analysis.
A note for Stata users: if your memory and maximum number of variables (maxvar) settings have not been changed from the factory defaults, you may get an error message when you try to open DHS datasets, which are very large.
Change the memory and maxvar settings. Try
set memory 450m
set maxvar 10000
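to start. You may be able to set these values higher, depending on your computer. These settings should allow you to open a DHS dataset. (Stata 12 and later manage memory automatically and ignore set memory, so only maxvar needs adjusting there.)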
Step 6: Get to know your variables. When your dataset is open, you will see thousands of variables with confusing names and very short variable labels that briefly describe the contents of each variable. To understand each variable and its contents, get to know the DHS recode manual. Some analysts refer to the recode manual as the "DHS Analysis Bible." Why is the recode manual so important? Here's an example:
In your dataset (assuming you are using an IR, BR, KR, or MR file), check the label of v107 (mv107 in the MR file). The label says "highest year of education." If you analyze this variable assuming it is the respondent's total years of education, you will get highly misleading results. Why? Because variable labels need to be short, they cannot give complete information about every variable in the dataset. Download the DHS recode manual and look through it to find v107. You will see that v107 is the highest year of education completed at the level recorded in v106, not the total number of years of schooling. Had you analyzed v107 as total years of education, you would have seriously underestimated the level of education in the country you are studying. This is just one example of why it is important to use the DHS recode manual.
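The v106/v107 pitfall can be illustrated with a small sketch. It is not taken from the recode manual: the v106 codes used here (0 = none, 1 = primary, 2 = secondary, 3 = higher) and the 6-year length of each lower level are assumptions made for illustration, and the helper `total_years` and the two respondents are hypothetical. Check the recode manual and the survey's final report for the actual coding and school system.

```python
# Hypothetical years completed BEFORE entering each level, keyed by an
# assumed v106 code (0 = none, 1 = primary, 2 = secondary, 3 = higher).
# Real values differ by country; this mapping is for illustration only.
YEARS_BEFORE_LEVEL = {0: 0, 1: 0, 2: 6, 3: 12}

def total_years(v106, v107):
    """Total schooling implied by level (v106) plus years at that level (v107)."""
    return YEARS_BEFORE_LEVEL[v106] + v107

# Two made-up respondents who share v107 == 2 but differ in level.
primary_total = total_years(v106=1, v107=2)    # 2 years of primary
secondary_total = total_years(v106=2, v107=2)  # 6 primary + 2 secondary

print(primary_total, secondary_total)
```

Both respondents have the same v107, yet under these assumptions their total schooling differs by six years, which is the information lost when v107 is read on its own.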
Step 7: Use sample weights. DHS sample weights are used in almost every tabulation in DHS final reports. The few unweighted tables are clearly labeled. Sample weights are described fully in the Guide to DHS Statistics, but briefly, weights are used in all analyses to make sample data representative of the entire population. There are different weights for different sample selections/units of analysis:
|Sample weights in DHS datasets|
|Unit of analysis|Variable|
|Women or children|v005|
|HIV test results|hiv05|
Like other variables in DHS datasets, the weight variable does not include decimal points, so analysts need to divide the sampling weight they are using by 1,000,000. Examples:
In Stata:
generate wgt = v005/1000000
tab var [iweight=wgt]
In SPSS:
COMPUTE WGT = V005/1000000.
WEIGHT by WGT.
These are just examples; other types of weights are available in different software packages.
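To show what the rescaling and weighting do to an estimate, here is a minimal sketch in plain Python. The four records and the indicator name `uses_method` are made up for illustration and do not come from any DHS file; only v005 and the division by 1,000,000 come from the text above.

```python
# Toy records standing in for rows of a women's file; all values are made up.
# v005 is stored without a decimal point, so 1250000 means a weight of 1.25.
rows = [
    {"v005": 1250000, "uses_method": 1},
    {"v005": 800000, "uses_method": 0},
    {"v005": 950000, "uses_method": 1},
    {"v005": 1000000, "uses_method": 0},
]

def weighted_proportion(records, flag, weight="v005"):
    """Return the weighted share of records where `flag` equals 1."""
    total = 0.0
    hits = 0.0
    for r in records:
        wgt = r[weight] / 1_000_000  # rescale, as in the Stata/SPSS examples
        total += wgt
        if r[flag] == 1:
            hits += wgt
    return hits / total

unweighted = sum(r["uses_method"] for r in rows) / len(rows)
weighted = weighted_proportion(rows, "uses_method")

print(round(unweighted, 2), round(weighted, 2))
```

In this toy data the unweighted share is 0.50 while the weighted share is 0.55, because the respondents with the indicator carry larger weights. This sketch only produces a point estimate; standard errors additionally require the survey design, which it does not attempt to handle.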
If you're having a problem using DHS data, and you've done all of the following:
and you still have a problem related to DHS data (rather than a problem using statistical software), post a question to the DHS Program User Forum where we will answer your question.