Don’t spin the bottle, please: Challenges in the implementation of probability sampling designs in the field, Part II

Mario Chen

In part I, I reviewed how the principles of probability samples apply to household sampling for valid (inferential) statistical analysis and focused on the challenges faced in selecting enumeration areas (EAs). When we want a probability sample of households, multi-stage sampling approaches are usually used, where EAs are selected in the first stage of sampling and households then selected from the sampled EAs in the second stage. (Additional stages may be added if deemed necessary.) In this post, I move on to the selection of households within the sampled EAs. I’ll focus on the sampling principles, challenges, approaches and recommendations.

Once EAs are selected, a random sample of households must be selected from all eligible households in the EA. Even if the number of households in the EAs is provided by the census bureau, it’s always important to update it before selection. Remember, you need to know the correct number of households from which the sample will be selected, so you can track selection probabilities.
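
To see why the updated count matters, here is a minimal sketch of how a household's overall selection probability could be tracked in a two-stage design. It assumes households are selected with equal probability within the EA, and the function name, the EA selection probability of 0.05 and the household counts are all hypothetical values chosen for illustration, not figures from our study.

```python
def household_selection_probability(ea_prob, n_selected, n_listed):
    """Overall selection probability for a household in a two-stage design.

    ea_prob    -- probability that the EA was selected in the first stage
                  (assumed known from the first-stage design)
    n_selected -- number of households sampled in the EA
    n_listed   -- number of eligible households listed in the EA
    Assumes equal-probability selection of households within the EA.
    """
    if not 0 < ea_prob <= 1:
        raise ValueError("ea_prob must be in (0, 1]")
    if not 0 < n_selected <= n_listed:
        raise ValueError("need 0 < n_selected <= n_listed")
    return ea_prob * n_selected / n_listed


# Hypothetical EA: the census reported 120 households,
# but the updated field listing found 150.
p_census = household_selection_probability(0.05, 20, 120)
p_updated = household_selection_probability(0.05, 20, 150)

# The design weight is the inverse of the selection probability.
weight_census = 1 / p_census
weight_updated = 1 / p_updated
print(f"weight from census count:   {weight_census:.1f}")
print(f"weight from updated listing: {weight_updated:.1f}")
```

In this made-up case, relying on the outdated census count would understate the weight of each sampled household, because each one actually represents more households than the old count suggests.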

Ideally, you would go through the selected EAs, and using a quick screening process, create a list of households meeting the eligibility criteria – for example, households with children under 5 years old. This complete household listing operation should always be done. However, when resources are tight, researchers sometimes look for creative ways to skip this step. Some of these approaches are definitely clever, but they should be closely scrutinized.

Let me tell you about the (in)famous “spin the bottle” approach. It goes something like this: the field team lands on the selected enumeration area; they find the center of the area; then they spin a bottle, start walking in the pointed direction and screen houses as they go. They select eligible households using a systematic process (e.g., select one in every five eligible households). If the first household to interview is selected randomly, this is still considered a random sample (with some limitations). So, what’s wrong with spinning the bottle, you may ask?

First, the process usually stops when the targeted sample size is achieved. If you don’t go through the entire area screening every household for eligibility, this process will miss portions of the population within the selected area, usually those on the outskirts of the neighborhood. Second, this process is hard to monitor and document. How do you know the field team didn’t spin the bottle several times until the direction that is easiest to access was “selected”? How do you know that the first household in the selected direction was randomly selected? Will you know how many households were eligible for selection? Does spinning the bottle avert the validity threat? Maybe or maybe not 🙁 .

A general recommendation for rigorous sampling implementation is to have the selection decisions made by a group as far removed from the field as feasible. For example, if household listing is done, this can be sent to the sampling statistician in a central office, who will draw the sample providing a list of randomly selected households to be visited. If this is not possible, the selection may be done by the local project leader or field supervisors. To the extent possible, the interviewers themselves should not be making the decisions regarding whom to interview. If interviewers themselves do end up making these decisions on the spot, they should be provided with clear documented procedures, including predetermined steps.

In our East Africa study discussed in part I, my colleagues and I did the following to reduce threats to validity in the household sampling. We printed household listing forms for each of the selected EAs with all EA identifying information pre-filled; we provided instructions for listing households; we gave a random number for each EA; and we wrote out instructions for identifying the random start and the systematic process for selecting households. These forms were reviewed in the field by the field supervisors and were also entered into an electronic database for further quality checks and record keeping.
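
The random start and systematic selection described above can be sketched roughly as follows. This is an illustrative outline rather than the exact procedure on our forms: the household IDs, the interval of 5 and the seed are hypothetical, and it assumes the random start is drawn centrally, recorded, and handed to the field team rather than chosen on the spot.

```python
import random


def systematic_sample(listing, interval, random_start):
    """Select every `interval`-th household from a complete listing.

    listing      -- eligible households in the order they were listed
    interval     -- sampling interval k (e.g., 5 means one in every five)
    random_start -- pre-assigned random number between 1 and k (1-based)
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    if not 1 <= random_start <= interval:
        raise ValueError("random_start must be between 1 and interval")
    # Convert the 1-based start to a 0-based index, then step by the interval.
    return listing[random_start - 1::interval]


# Hypothetical listing of 23 eligible households in one EA.
listing = [f"HH-{i:03d}" for i in range(1, 24)]
interval = 5

# The random start is drawn away from the field with a recorded seed,
# so the draw can be reproduced and audited later.
rng = random.Random(20240)
start = rng.randint(1, interval)

selected = systematic_sample(listing, interval, start)
print("random start:", start)
print("selected households:", selected)
```

Because the start and the interval are fixed before anyone visits a household, a supervisor can recompute the selection from the listing form and check it against the households actually interviewed.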

Let’s say you listed and then randomly selected the appropriate number of households. You’re not home free yet. What do you do if people were not available for the data collection? Refused to participate? Threw tomatoes at you? For the integrity of the sample, every household selected must be included in the survey. This is simply not possible all the time. What do you do? As we discussed before regarding the practice of replacing EAs, replacing households similarly doesn’t solve the validity problem. Call backs and enlisting the support of community leadership to prepare the communities and increase community support may help minimize non-response. However, regardless of how many call backs you can feasibly do and how supportive the community is, achieving a 100% response rate is usually impossible; thus, the validity threat is not averted 🙁 .

Preserving the integrity of the selection of households and data collection, as discussed so far, will give you the probability sample your heart desired. However, let’s not lose sight of the many challenges that can arise unexpectedly in any given survey. For example, in our East Africa study, the local project lead and overall field work manager quit in the middle of data collection; some interviewers resigned and additional staff had to be quickly trained to replace them; and we had issues securing the necessary equipment for anthropometric measurements. Although probably not immediately obvious, these problems can also lead to deviations from the sampling principles, and can threaten the validity of the data collected. Always prepare for the unexpected. Preparation will save you a lot of perspiration.

This leads to my final recommendation: always involve local expertise. Select an experienced research group that knows the expected challenges in the specific context. Also, local sampling expert oversight is extremely useful. In our East Africa study, we enlisted the help of the statistician at our FHI 360 country office. His help was invaluable in dealing quickly with unexpected issues. He knew the situation on the ground and could more effectively communicate with the local agencies involved. His participation in the field staff training and monitoring processes was extremely helpful as well.

How rigorous a probability sample is will be judged on how well you stick to the sampling principles. Nothing is 100% foolproof though; all you can do is minimize the chances for deviating from the sampling principles, document the processes, and account for the implementation problems in the analysis and interpretation of the findings. Happy sampling!

I’d like to acknowledge Eskindir Tenaw and Patrick Olsen for their contributions to the sampling processes in our East Africa study highlighted in this blog post.
