Riddle me this: How many interviews (or focus groups) are enough?

This blog post is the final in a series of three sampling-focused posts.

The first two posts in this series describe commonly used research sampling strategies and provide some guidance on how to choose from this range of sampling methods. Here we delve further into the sampling world and address sample sizes for qualitative research and evaluation projects. Specifically, we address the often-asked question: How many in-depth interviews/focus groups do I need to conduct for my study?

Within the qualitative literature (and community of practice), the concept of “saturation” – the point when incoming data produce little or no new information – is the well-accepted standard by which sample sizes for qualitative inquiry are determined (Guest et al., 2006; Guest and MacQueen, 2008). There’s just one small problem with this: saturation, by definition, can be determined only during or after data analysis. And most of us need to justify our sample sizes (to funders, ethics committees, etc.) before collecting data!

Until relatively recently, researchers and evaluators had to rely on rules of thumb or personal experience to estimate how many qualitative data collection events a study needed; empirical data to support these sample sizes were virtually non-existent. This began to change a little over a decade ago, when Morgan and colleagues (2002) plotted (and published!) the number of new concepts identified in successive interviews across four datasets. They found that almost no new concepts emerged after 20 interviews. Extrapolating from their data, we see that the first five to six in-depth interviews produced the majority of new data, and approximately 80% to 92% of concepts were identified within the first 10 interviews.

Building on this work, Guest et al. (2006) conducted a systematic inductive thematic analysis of 60 in-depth interviews among female sex workers in West Africa. Of the 114 themes identified in the entire dataset, 80 (70%) turned up in the first six interviews, and 100 themes (92%) were identified within the first 12 interviews (Figure 1). Additionally, those 100 themes comprised 97% of the most common (highest prevalence) themes, indicating that the “big ones” were evident early on.

Figure 1. Number of new codes identified in batches of six individual interviews (Guest et al., 2006)
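To make the batch counting behind Figure 1 concrete, here is a minimal Python sketch. It assumes each interview has already been coded and is represented as a set of code labels, analysed in a fixed order. The function name `new_codes_per_batch` and the toy data are hypothetical illustrations, not Guest et al.'s actual procedure or data.

```python
from typing import List, Set

def new_codes_per_batch(interviews: List[Set[str]], batch_size: int = 6) -> List[int]:
    """Count the codes that appear for the first time in each batch of interviews.

    `interviews` is a list of coded interviews in analysis order; each one is
    the set of code labels applied to that transcript (hypothetical format).
    """
    seen: Set[str] = set()   # codes identified in earlier batches
    counts: List[int] = []
    for start in range(0, len(interviews), batch_size):
        batch = interviews[start:start + batch_size]
        batch_codes = set().union(*batch)   # all codes used in this batch
        new = batch_codes - seen            # codes not seen in earlier batches
        counts.append(len(new))
        seen |= new
    return counts

# Hypothetical example: 12 coded interviews analysed in two batches of six.
interviews = [
    {"a", "b"}, {"b", "c"}, {"a", "d"}, {"e"}, {"a"}, {"b"},
    {"c", "f"}, {"a"}, {"b"}, {"a"}, {"c"}, {"d"},
]
print(new_codes_per_batch(interviews))  # [5, 1]
```

The steep drop from the first batch to the second is the pattern Figure 1 illustrates: most codes surface early, and later batches add few.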


Since Guest et al.’s publication in 2006, other researchers have reported broadly similar findings, suggesting that 6-12 interviews is a sweet spot for reaching saturation. The following table summarizes these studies.

Summary of empirical studies on saturation (study authors, saturation definition, findings)

Individual interviews

Morgan and colleagues (2002)
Saturation definition: Not defined
Findings:
  • 5-6 interviews for most concepts
  • In all four sets of interviews, approximately 80-92% of concepts identified within 10 interviews (extrapolated from reported data)

Guest et al. (2006)
Saturation definition: The proportion of identified themes at a given point in analysis divided by the total number of themes identified in that analysis
Findings:
  • 6 interviews to reach 70% saturation
  • 12 interviews to reach 92% saturation

Francis et al. (2010) (gated)
Saturation definition: The point, after conducting 10 interviews, when three additional interviews yield no new themes
Findings:
  • Most themes in both studies identified within 5-6 interviews
  • Saturation reached within 17 interviews in one study, and not reached in 14 interviews in a second study

Coenen et al. (2012) (gated)
Saturation definition: The point at which linking concepts from two consecutive focus groups or individual interviews reveals no additional second-level categories
Findings:
  • Inductive approach: 13 interviews to reach saturation
  • Deductive approach: 8 interviews to reach saturation

Hagaman and Wutich (2016) (gated)
Saturation definition: The number of interviews required to identify the most common themes in a total of three interviews
Findings:
  • Fewer than 16 interviews at site level
  • 20-40 interviews to identify cross-cultural meta-themes

Namey et al. (2016)
Saturation definition: The proportion of identified themes at a given point in analysis divided by the total number of themes identified in that analysis
Findings (at the median):
  • 8 interviews to reach 80% saturation (range 5-11)
  • 16 interviews to reach 90% saturation (range 11-26)
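Guest et al. (2006) and Namey et al. (2016) both define saturation as the proportion of themes identified at a given point in analysis divided by the total number identified in that analysis. The Python sketch below computes that proportion after each interview and the number of interviews needed to reach a threshold such as 80% or 90%. The function names and toy data are hypothetical, and the sketch assumes a single fixed analysis order.

```python
from typing import List, Optional, Set

def saturation_curve(interviews: List[Set[str]]) -> List[float]:
    """Share of all themes in the dataset identified after each interview."""
    total = set().union(*interviews)        # every theme found in the full analysis
    if not total:
        raise ValueError("no themes coded")
    seen: Set[str] = set()
    curve: List[float] = []
    for themes in interviews:
        seen |= themes
        curve.append(len(seen) / len(total))
    return curve

def interviews_to_reach(interviews: List[Set[str]], threshold: float) -> Optional[int]:
    """Smallest number of interviews whose themes cover `threshold` of the total."""
    for n, share in enumerate(saturation_curve(interviews), start=1):
        if share >= threshold:
            return n
    return None  # only happens for thresholds above 1.0

# Hypothetical example: five coded interviews, five distinct themes overall.
interviews = [{"a", "b", "c"}, {"c", "d"}, {"a"}, {"e"}, {"b"}]
print(saturation_curve(interviews))          # [0.6, 0.8, 0.8, 1.0, 1.0]
print(interviews_to_reach(interviews, 0.8))  # 2
print(interviews_to_reach(interviews, 0.9))  # 4
```

Because the denominator is the total number of themes in the completed analysis, this measure can only be computed after the fact. That is the catch noted at the start of this post: saturation is observed during or after analysis, not before data collection.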

“But what about focus groups?” you ask. An empirical study by Coenen et al. (2012) (gated) found that five focus groups were enough to reach saturation for their inductive thematic analysis. In a recent methodological study (gated), we followed an approach similar to that of Guest et al. (2006) and monitored thematic discovery and code creation after each of 40 focus groups conducted among African-American men in North Carolina on the topic of health-seeking behavior (more on this study and its methodological findings here). We found that the majority of themes were identified within the first focus group, and nearly all of the important (read: most frequently expressed) themes were discovered within the first three focus groups (Figure 2).

Figure 2. Average number of new codes identified per focus group (focus groups randomly ordered) (Guest et al., 2016)
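Figure 2 averages new codes per focus group across random orderings of the groups, so the result does not hinge on which group happened to be analysed first. A minimal Python sketch of that idea follows, assuming each focus group is represented as a set of code labels. The number of orderings and the random seed are arbitrary, hypothetical choices, not parameters reported by the study.

```python
import random
from typing import List, Set

def new_codes_by_position(groups: List[Set[str]]) -> List[int]:
    """Number of codes first seen at each position in one analysis order."""
    seen: Set[str] = set()
    counts: List[int] = []
    for codes in groups:
        new = codes - seen
        counts.append(len(new))
        seen |= new
    return counts

def average_new_codes(groups: List[Set[str]], n_orderings: int = 1000,
                      seed: int = 0) -> List[float]:
    """Mean new codes at each position over `n_orderings` random shuffles."""
    rng = random.Random(seed)          # seeded so results are reproducible
    totals = [0] * len(groups)
    for _ in range(n_orderings):
        order = list(groups)
        rng.shuffle(order)             # one random analysis order
        for pos, n_new in enumerate(new_codes_by_position(order)):
            totals[pos] += n_new
    return [t / n_orderings for t in totals]

# Hypothetical example: four coded focus groups.
groups = [{"a", "b", "c"}, {"a", "d"}, {"b", "c"}, {"a", "e"}]
print(average_new_codes(groups))
```

Each ordering's counts sum to the number of distinct codes, so the averages always sum to that total; what the averaging reveals is how early codes tend to appear.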


These data from our study suggest that two to three focus groups will likely capture about 80% of themes on a topic, including those most broadly shared, in a study with a relatively homogeneous population using a semi-structured guide. Three to six focus groups are likely enough to identify 90% of the important themes.

Note that these sample sizes, for both interviews and focus groups, apply per sub-population of interest. Note too that thematic saturation will vary with a number of factors (watch for a future blog post), and sample sizes should be adjusted accordingly.
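Because the recommended ranges apply per sub-population, the total planned sample scales with the number of sub-populations. Here is a minimal arithmetic sketch in Python using the ranges from the poem (6-12 interviews, 3-6 focus groups per sub-population). The function name `planned_range` is hypothetical, and the output is a starting point to adjust for the factors that affect saturation, not a fixed rule.

```python
from typing import Tuple

# Per-sub-population ranges from the guidance in this post.
INTERVIEWS_PER_SUBPOP: Tuple[int, int] = (6, 12)
FOCUS_GROUPS_PER_SUBPOP: Tuple[int, int] = (3, 6)

def planned_range(n_subpops: int, per_subpop: Tuple[int, int]) -> Tuple[int, int]:
    """Total (low, high) sample size when each sub-population gets the same range."""
    low, high = per_subpop
    return n_subpops * low, n_subpops * high

# Example: women and men treated as two sub-populations.
print(planned_range(2, INTERVIEWS_PER_SUBPOP))    # (12, 24)
print(planned_range(2, FOCUS_GROUPS_PER_SUBPOP))  # (6, 12)
```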

Use this catchy poem to remember how many in-depth interviews or focus groups you need.

Sampling to reach saturation?
Here’s the magical equation:
For interviews, to do them well,
choose a sample from 6-12*;
If focus groups are in the mix,
aim to conduct 3-6*.

(Okay, equation it is not
But empirical guidance helps a lot!)
*per sub-population of interest


16 thoughts on “Riddle me this: How many interviews (or focus groups) are enough?”

  1. It’s great you are taking the time to research this on behalf of the rest of us in the eval world. I assume the term ‘homogeneous’ used to define the target population also assumes a single gender, so one might double these suggested numbers to reach saturation separately for women and for men.

  2. That’s a great question. In my view, “sample homogeneity” is a relative term, one that is related to how a study population is defined, one’s research objectives, and how participants’ experiences (and therefore responses) might be expected to vary across a particular dimension, such as gender. For many research topics we would expect experiences and perceptions to differ between men and women. In those cases, a homogeneous sample would typically be single-gender. In contexts where gender might play a very minor role in response variability (say, food preferences), one could include both men and women, but cultural/ethnic background would be the variable to keep homogeneous. My 2 cents…

    • I agree with Greg. We refer to the suggested sample sizes as “per sub-population of interest”, and men and women may be different sub-populations within your wider sample, dependent on the context. Keep an eye out for a blog post discussing how the level of sample homogeneity – and other factors – might affect thematic saturation, and therefore sample size!

  3. Very interesting. Very useful. If I understood the conclusions correctly, it does go back to traditional rule-of-thumb approaches, i.e. 1) at least four focus groups are adequate to answer one’s questions for every given sub-population, and 2) for in-depth interviews one should target an a priori sample of 12 individuals per sub-group, but stop including participants once no new themes/ideas are being generated. The catch, of course, is that one should be carefully reviewing scripts as data are collected. I like the idea that it confirms one may make advance decisions on set targets.

  4. Thanks, Jane! You’ve got it – we now have empirical data to back-up/justify those rules-of-thumb when we are writing proposals, budgeting, and seeking ethical approvals!

    • Yes, absolutely, Daniel! Thanks for your comment and link. Stay tuned for the next post in this series that will address some of the factors that affect saturation – to help identify whether a small or large pinch of salt should go into saturation-based qualitative sample size calculations!

  5. Thanks for this post — really useful. One question I have is, did any of these studies consider the interviewing skills of the interviewer? Or mention what training the interviewers had in advance of doing the semi-structured interviews? One challenge I’m facing in my work is that many members of my team have not been trained in interview techniques before, so I imagine the information we collect may be more limited compared to someone with more advanced skills and therefore might require a greater number of interviews in order to reach saturation.

  6. Hi Mia, great questions! I can’t speak to all of the studies reviewed on the question of interviewer skill, but I’d agree with your observations that the data collectors are a factor (among many) that can influence how many interviews you’ll need to reach saturation. We’re working on a post that will highlight some of the primary factors that can influence saturation, and I’ll make sure this is included in the discussion if it’s not there already.

    As an aside from the sampling discussion, interviewer training really is key to generating good qualitative data – and not just training in terms of interviewing skill, but also in making sure that everyone has a common understanding of the research objectives. Some people have a more natural affinity for interviewing than others, but if you can provide your interviewers with pretty immediate feedback (e.g., after the first interview) on the questions you still have after reading a transcript or the types of follow-ups that would be helpful, you can probably somewhat mitigate the lack of training and close the gap on the number of interviews needed to get the information to address your objectives (and reach saturation).

  7. May I ask if you were conducting surveys via email – what would be an acceptable number of completed surveys to aim for?

    • Hi Diana,
      I’m interpreting your question as asking about quantitative, fixed-response surveys, which would require a different sampling strategy and sample size than what’s discussed here. One of the earlier blogs in this series (http://res4evidence.wpengine.com/pathway-sampling-success) could help with the strategy piece; the sample size would be dependent on that and the specifics of your research question.

  8. Thanks for this illuminating post. In my research proposal, I planned about 6-12 interviews based on Guest et al. 2006 and a few other recommendations, and I think I had 1-3 FGDs. I have now collected data from 2 different states, where I conducted 13 interviews and 3 FGDs in the first and 8 interviews and 1 FGD in the second. When I started collecting data in the second state, I reached data saturation much more quickly, and the FGDs were not giving me much data that differed from the first state. I am now wondering if I have taken the right approach. Were the 6-12 interviews and 1-3 FGDs recommended per round of data collection?

    • Hi Abisola,
      Yes, it sounds like you interpreted the recommendations correctly – that those sample sizes are per sub-population of interest. In your case, I would have considered the two states as two sub-populations, as you did. Your finding that there wasn’t as much difference as you might have thought between what you heard in State 1 and State 2 isn’t surprising – Guest et al. found the same in the data on which the 2006 article was based, and their sub-populations were in two different countries! How quickly you reach saturation will be dependent on the homogeneity of the sample(s) and the level of “sharedness” of the information you’re trying to learn from them. In your case, it sounds like the types of information or experiences you’re collecting are shared across the two sites. It’s hard to know this ahead of time though, so I would have planned sample sizes as you did, with equal numbers in both.

  9. Amazing article. From my experience with M&E surveys, I can confirm that no new information emerges after 2 to 3 (female/male) FGDs in the same community, although findings can change slightly in other communities with different characteristics. I therefore advise using a cluster sample and then, based on those results, planning 2 or more FGDs in each site of the cluster sample, taking gender into account.

  10. Emily and Greg, this is brilliant! Just what I needed today, and described in such a simple and fun way.
