Using Datasets for Analysis
Whether you've analyzed DHS data before or are a first-time user, below are some resources to help you analyze DHS data efficiently.
Step-by-step introduction to analyzing DHS data
Step 1: Select surveys for analysis
Step 2: Review questionnaires
Step 3: Register for dataset access
Step 4: Download datasets
Step 5: Open your dataset
Step 6: Get to know your variables
Step 7: Use sample weights
Step 8: Consider special values
Step 1: Select surveys for analysis. Which surveys are you interested in using? See a list of surveys by country, type of survey, or year; search by survey characteristics (for example, surveys that included HIV testing or the Domestic Violence module); or use the full survey search.
Step 2: Review questionnaires. Familiarize yourself with the questionnaires used to collect the data that you want to analyze. Model questionnaires are used for each survey phase, but each country modifies the core questionnaire slightly to meet its needs. The questionnaires used to collect data for a specific survey are always included at the back of that survey's final report. All final reports are free to download and, in some cases, can be ordered in hard copy, also for free.
Use the questionnaires to determine whether the information you want to analyze was collected in your survey of interest, and who you want to analyze (your unit of analysis).
If the data you want to analyze was collected for everyone listed in the household questionnaire, your unit of analysis is probably household members. On the other hand, if for example you want to analyze data about women's contraceptive use, you will find that the relevant questions were asked in the women's questionnaire, and your unit of analysis is women. The unit of analysis will help you determine which dataset you want to download in step 4.
Step 3: Register for dataset access. All DHS datasets are free to download and use. To download datasets, you must complete a short registration form. Remember your username and password; you can use them later to log in quickly and register for access to additional datasets. Learn more about why we require registration.
Requests to access datasets are usually approved within 24 hours. You will receive an email from firstname.lastname@example.org once your request has been approved with instructions for download.
Step 4: Download datasets. Follow instructions from the email you received after registering. Once you log in to dhsprogram.com, you will see the country, survey, and list of datasets that you are approved to download. View the full tutorial on how to download DHS datasets in the video below:
If you requested a DHS dataset but need to modify your request or give additional information to gain dataset approval, you can find the instructions in the video below:
The Zip files containing datasets are labeled with brief but meaningful names, such as KEIR41DT. The full description of file naming conventions is here, but briefly:
- The first two letters ("KE") refer to the country – in this case, Kenya. The country code list is here.
- The second two letters ("IR") refer to the data file type. IR is the individual (women's) recode file, MR is the men's recode, HR is the household recode, etc. The complete list of data file types is here. Based on your review of the questionnaires, select the file type you need for your unit of analysis.
- The next two characters ("41") refer to the phase and number of the survey. A complete explanation of this numbering is here. If you are only analyzing one survey, all datasets from that survey will have the same numbering.
- The last two letters ("DT") refer to the software program you want to use. The DT file contains the Stata (.DTA) data file and associated documentation; the SV file contains the SPSS (.SAV) file; the SD file contains the SAS (.SAS7BDAT) file; and the FL file contains an ASCII file and dictionaries.
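The naming convention above can be sketched in code. The Python snippet below is a minimal illustration, not an official DHS tool: the function name `parse_dhs_filename` and the two lookup tables are assumptions made for this example, and the tables contain only the codes mentioned in this guide.

```python
# Hypothetical helper: splits a DHS file name into the four parts described above.
# The lookup tables hold only the codes mentioned in this guide; they are not complete.
FILE_TYPES = {
    "IR": "individual (women's) recode",
    "MR": "men's recode",
    "HR": "household recode",
}
FORMATS = {
    "DT": "Stata (.DTA)",
    "SV": "SPSS (.SAV)",
    "SD": "SAS (.SAS7BDAT)",
    "FL": "ASCII file and dictionaries",
}


def parse_dhs_filename(name):
    """Return the components of an 8-character DHS dataset name such as KEIR41DT."""
    stem = name.strip().upper()
    if stem.endswith(".ZIP"):
        stem = stem[:-4]  # tolerate a trailing .zip extension
    if len(stem) != 8:
        raise ValueError(f"expected 8 characters, got {stem!r}")
    file_type, fmt = stem[2:4], stem[6:8]
    return {
        "country_code": stem[0:2],
        "file_type": file_type,
        "file_type_label": FILE_TYPES.get(file_type, "unknown (see the list of file types)"),
        "phase_and_number": stem[4:6],
        "format": fmt,
        "format_label": FORMATS.get(fmt, "unknown"),
    }


parts = parse_dhs_filename("KEIR41DT")
print(parts["country_code"], parts["file_type"], parts["phase_and_number"], parts["format"])
# prints: KE IR 41 DT
```

Codes that are not in the small lookup tables are still split out correctly; only their descriptive labels fall back to "unknown".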
If you would like to download more than one dataset, please see the tutorial below to download multiple DHS datasets.
Step 5: Open your dataset in the software you are using for analysis.
A note for Stata users: if your memory and maximum number of variables (maxvar) have not been adjusted from the factory settings, you may get an error message when trying to open DHS datasets, which are very large. If that happens, change the memory and maxvar settings. Try
set memory 450m
set maxvar 10000
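to start. You may be able to set these values higher depending on your computer. These settings should allow you to open a DHS dataset. Note that Stata 12 and later manage memory automatically, so set memory is not needed there; maxvar may still need to be raised.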
Step 6: Get to know your variables. When your dataset is open, you will see thousands of variables with confusing names and very short variable labels that briefly describe the contents of each variable. To understand each variable and its contents, get to know the DHS recode manual. Some analysts refer to the recode manual as the "DHS Analysis Bible." Why is the recode manual so important? Here's an example:
In your dataset (assuming you are using an IR, BR, KR, or MR file) check the label of v107 (mv107). The label says "highest year of education." If you analyze this variable assuming it is the respondent's highest year of education, you will have highly misleading results. Why? Because the variable label needs to be short, and so cannot give complete information about every variable included in the dataset. Download the DHS recode manual and look through it to find v107. See that v107 is the highest year of education at the level recorded in v106. Had you analyzed v107 as the highest years of education, you would have seriously underestimated the level of education in the country you are studying. This is just one example of why it is important to use the DHS recode manual.
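To make the relationship between v106 and v107 concrete, here is a small Python sketch. The records are made up, and the number of years in each lower level (`YEARS_BELOW_LEVEL`) is a hypothetical school system chosen for illustration; real durations differ by country, so check the recode manual and the survey's final report before using real values. The level codes assumed here (0 none, 1 primary, 2 secondary, 3 higher) should also be confirmed in the recode manual. The means are unweighted, since the point is only the size of the underestimate.

```python
# Illustrative only: made-up records and a hypothetical school system
# (6 years of primary, 6 years of secondary). Real durations vary by country.
YEARS_BELOW_LEVEL = {0: 0, 1: 0, 2: 6, 3: 12}  # years completed before entering each v106 level


def total_years_of_education(v106, v107):
    """Combine the level (v106) with the years completed at that level (v107)."""
    if v106 == 0:
        return 0  # no education
    return YEARS_BELOW_LEVEL[v106] + v107


# (v106, v107) pairs for four hypothetical respondents
respondents = [(0, 0), (1, 4), (2, 3), (3, 2)]

naive_mean = sum(v107 for _, v107 in respondents) / len(respondents)
combined_mean = sum(
    total_years_of_education(level, years) for level, years in respondents
) / len(respondents)

print(naive_mean)     # 2.25: v107 misread as total years of education
print(combined_mean)  # 6.75: v106 and v107 combined
```

In this made-up example, reading v107 on its own gives an average of 2.25 years, while combining it with v106 gives 6.75 years.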
Step 7: Use sample weights. DHS sample weights are used in almost every tabulation in DHS final reports. The few unweighted tables are clearly labeled. Sample weights are described fully in the Guide to DHS Statistics, but briefly, weights are used to make sample data representative of the entire population. There are different weights for different sample selections/units of analysis:
Sample weights in DHS datasets
|Unit of analysis|Variable|
|Women or children|v005|
|HIV test results|hiv05|
Like other variables in DHS datasets, the weight variables are stored without decimal points. Analysts need to divide the sampling weight they are using by 1,000,000. Examples:
In Stata:
generate wgt = v005/1000000
tab var [iweight=wgt]
In SPSS:
COMPUTE WGT = V005/1000000.
WEIGHT by WGT.
These are just examples; other types of weights are available in different software packages.
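For readers working outside Stata or SPSS, the same scaling can be sketched in plain Python. This is a minimal illustration under stated assumptions: the records are made up, `method` is a hypothetical variable, and `weighted_percentages` is a helper written for this example. It produces point estimates only.

```python
from collections import defaultdict

# Made-up records: v005 is the stored weight (no decimal point) and "method"
# is a hypothetical variable being tabulated.
records = [
    {"v005": 1500000, "method": "modern"},
    {"v005": 500000, "method": "none"},
    {"v005": 1000000, "method": "none"},
    {"v005": 1000000, "method": "modern"},
]


def weighted_percentages(rows, variable, weight_var="v005"):
    """Weighted percent distribution of `variable`, scaling the weight by 1,000,000."""
    totals = defaultdict(float)
    for row in rows:
        wgt = row[weight_var] / 1000000  # same scaling as the Stata and SPSS examples
        totals[row[variable]] += wgt
    grand_total = sum(totals.values())
    return {value: 100 * total / grand_total for value, total in totals.items()}


result = weighted_percentages(records, "method")
print(result)  # {'modern': 62.5, 'none': 37.5}
```

Without weights, these four records would split 50/50, which shows how much the weights can change a tabulation.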
Step 8: Consider special values. As you analyze, make sure to account for missing values and other exceptions. If you're trying to replicate tables in the DHS final report, check the notes here or the FAQs. To produce some of the more complex DHS indicators, the Guide to DHS Statistics (or the online Guide to DHS Statistics) is an invaluable resource, and like all DHS publications, is free to download and use.
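As a sketch of what accounting for special values can look like, the Python snippet below drops special codes before computing a mean. The data are made up, and the codes used here (8 for "don't know", 9 for missing) are assumptions for illustration; the actual special codes differ by variable, so confirm them in the recode manual.

```python
# Hypothetical special codes for one single-digit variable; confirm the real
# codes for each variable in the recode manual before using them.
SPECIAL_CODES = {8, 9}  # assumed here: 8 = "don't know", 9 = missing


def mean_excluding_special(values, special_codes=SPECIAL_CODES):
    """Mean of the values that are neither None nor special codes; None if nothing is left."""
    valid = [v for v in values if v is not None and v not in special_codes]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Made-up responses for a hypothetical single-digit variable
responses = [2, 3, 9, 4, 8, None, 3]

non_null = [v for v in responses if v is not None]
naive_mean = sum(non_null) / len(non_null)    # treats 8 and 9 as real values
clean_mean = mean_excluding_special(responses)

print(round(naive_mean, 2))  # 4.83
print(clean_mean)            # 3.0
```

Whether special values are excluded or kept in the denominator depends on the indicator, so check the Guide to DHS Statistics for the indicator you are reproducing.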
If you're having a problem using DHS data, first make sure you've done all of the following:
- Made sure you're using the correct data file
- Made sure you're using the correct weights
- Checked the questionnaire to make sure the question was asked in the way you think it was in your survey
- Checked the DHS Recode Manual
- Checked the DHS Guide to Statistics
- Reviewed the Using DHS Datasets for Analysis YouTube playlist
- Reviewed the Matching DHS Tables YouTube playlist
Finally, if you still have a problem related to DHS data (rather than a problem using statistical software), post a question to the DHS Program User Forum, where we will answer your question.
The DHS Program is authorized to distribute, at no cost, unrestricted survey data files for legitimate academic research. Registration is required for access to data.