What’s wrong with the evidence for mHealth?

Photo: adult hands using a maternal health app on a smartphone

There is good news and bad news about the evidence for mHealth interventions in low- and middle-income countries. The good news is that there is a lot of evidence from high-quality (counterfactual-based) studies about the effectiveness of mHealth interventions – that is, “a lot” compared to the overall evidence base for low- and middle-income countries. The bad news is that this body of evidence is not very useful for understanding the potential effectiveness of mHealth interventions at scale and over time. In this post, I present some analysis of the evidence base for mHealth interventions, and I argue that, going forward, we need to rigorously evaluate the net impact of mHealth interventions in real program settings, at scale, over time, and combined with cost-effectiveness analysis.

There is a lot of evidence
As of August 2017, there were at least 147 impact evaluations of mHealth interventions conducted in low- and middle-income countries. I realize that may seem like an oddly specific number, so let me explain the source. A couple of years ago, my colleague Hannah Skelly and I produced an evidence map for ICT interventions in low- and middle-income countries (ICT4D). To do this, we used a list of ICT intervention categories to conduct a systematic search and screening of dozens of bibliographic indexes and websites, identifying as many impact evaluations of ICT4D interventions as possible. By impact evaluation, we mean studies that use experimental or quasi-experimental designs to measure the net effect of an intervention. We published the map and our analysis of it here; you can read a blog post summary here and learn more about how to read the evidence map here.

One of the 11 categories in the ICT4D evidence map is mHealth, which we define as interventions that use mobile and wireless devices to provide medical care. Of the 253 impact evaluations captured in the map, 147 are studies of mHealth interventions – by far the most evaluated intervention category in the map. So that’s the good news! Even compared to the evidence for other topics and sectors catalogued in other evidence maps for low- and middle-income countries, the quantity of studies for mHealth is large.

Most evidence comes from pilots
After we identified all the ICT4D impact evaluations from our systematic search, we coded these “included studies” for several features. One feature is whether the studied intervention was a pilot, meaning it was implemented for the purpose of the study, or a program, meaning it was implemented by the intended implementers at a normal scale. The results are striking – 92.5% of the mHealth development impact evaluations test pilot interventions. This finding is not too surprising. Researchers who work in health come from the tradition of clinical trials, which test treatments and approaches in a more controlled research setting first. And many mHealth interventions (for example, SMS appointment reminders) are easy to implement on a pilot basis. This finding is concerning, however, because pilot interventions often have limited ecological validity. That is, they don’t look like what the program would be in the real world. We cannot, therefore, conclude that the net effect measured in the pilot study is the net effect we would see in the real world. I discuss this pilot-to-scale problem of ecological validity further here.

The evidence comes from small samples

For the subset of mHealth studies, my fabulous then-intern Katherine Whitton and I coded each study for the sample size used to estimate the net effect of the intervention. All but three of the included studies measure outcomes at the individual level. The average sample size across the studies is 761, but this average is driven by a huge outlier; the median sample size is only 328. The scatterplot in Figure 1 below shows the distribution of sample sizes, with a single outlier above 20,000. Many of these samples are also convenience or purposive samples: for example, the researchers recruit participants from patients attending a clinic over a certain period of time. Typically, the researchers then randomly assign the recruited patients to treatment and control, so the internal validity of the findings is strong, but the sampling method and size weaken the external validity of the findings.

Figure 1. Scatterplot of study sample sizes
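
To see why the median is the more telling summary here, consider a quick illustration. The numbers below are made up (they are not the underlying coded data), but they mimic the pattern in Figure 1: one very large study pulls the mean well above the typical sample size, while the median stays close to it.

```python
# Toy illustration (made-up numbers, not the actual coded data) of how a single
# outlier drags the mean sample size upward while barely moving the median.
import statistics

# Hypothetical sample sizes for a handful of pilot studies, plus one large outlier
sample_sizes = [150, 210, 280, 328, 360, 450, 540, 21000]

print("mean:  ", round(statistics.mean(sample_sizes)))  # pulled up by the 21,000 study
print("median:", statistics.median(sample_sizes))        # stays near the typical study

# The full set of 147 mHealth studies shows the same pattern: a mean of 761
# versus a median of 328.
```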

The duration of implementation is short

Katherine the Great and I also coded the mHealth studies for the length of time the intervention was implemented before the end-line outcome data were collected. From the ICT4D evidence map data, we also know whether there were any outcomes measured after end-line. The average duration of implementation of the mHealth interventions studied is 27 weeks, or roughly six months. As with the sample size variable, however, this average is driven by outliers; the median duration is only 17 weeks. And only 12 of the 147 studies measure the outcome at any time after the initial end-line.

It may well be the case that these interventions are designed to have effects after a short period of time, so the short durations may be appropriate for testing the mechanism of the intervention. It is difficult to believe, however, that the first 17 weeks of any program are reflective of the program over time. In other words, these short durations also do not have ecological validity. One of the issues I raise in my pilot-to-scale post is the novelty effect. Cool new tools and approaches may have large effects in the beginning but then lose their effect over time, as anyone who has learned to turn notifications off can attest. The novelty effect is one reason why it is so important to evaluate interventions over the long run.

Few studies measure cost
Last, but certainly not least, we find that only 16% of the mHealth development impact evaluations provide any information about cost. When we coded this variable for the ICT4D evidence map, we were very generous: we coded a study as including cost information if it includes any information about the cost of the intervention that could be used to think about cost-effectiveness. We did not require the study to address cost-effectiveness directly. Even so, for the full ICT4D sample, the share of studies including cost information is only 18.6% (Brown and Skelly, 2019), and for the mHealth subset, it drops to 16%. Ironically, many of the studies mention cost-effectiveness as a motivation for exploring mHealth approaches. I even found a few that list cost-effectiveness as a keyword for the article and yet provide no cost information!
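
To make concrete what even minimal cost information buys an evidence user, here is a sketch with entirely hypothetical numbers (none of them come from any study in the map): a rough per-participant cost is enough to convert an effect size into a cost per additional outcome achieved.

```python
# Minimal cost-effectiveness sketch using entirely hypothetical numbers.
# Even a rough per-participant cost figure, if a study reports one, makes this possible.

participants = 1000            # people reached by a hypothetical mHealth program
cost_per_participant = 2.50    # USD; e.g., SMS fees, platform, and staff time (assumed)
net_effect = 0.10              # assumed net increase in the share achieving the outcome

total_cost = participants * cost_per_participant
additional_outcomes = participants * net_effect
cost_per_additional_outcome = total_cost / additional_outcomes

print(f"total cost:                  ${total_cost:,.0f}")
print(f"additional outcomes:         {additional_outcomes:.0f}")
print(f"cost per additional outcome: ${cost_per_additional_outcome:,.2f}")
# Without any reported cost data, this calculation is impossible for evidence users.
```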

Discussion

From the standpoint of measuring attributable effect sizes, we have a lot of high-quality evidence about the effectiveness of mHealth interventions. Unfortunately, this evidence is not very useful for programming. It comes from evaluations of pilot interventions, implemented on small samples, over short periods of time, without measuring outcomes at any point after the initial end-line. And these evaluations do not provide cost data to help evidence users assess cost-effectiveness. Some may argue that these impact evaluations test mechanisms and that the programs at scale can then be evaluated with monitoring data or non-counterfactual implementation science. I caution against that, especially where we have novel interventions in contexts that are complex and rapidly changing.

Let’s look at an example. Nsagha et al. (2016) test an SMS reminder intervention for improving adherence to treatment and care among people living with HIV and AIDS in Cameroon. Forty-five treatment patients received four messages a week for four weeks, while the 45 control patients received standard of care. After the four weeks, 64.4% of the treatment group adhered to antiretroviral medications, while only 44.2% of the control group did (p = 0.05). This result is promising, but it is measured on a sample of 90 patients after only four weeks. And the simple intervention is addressing a complex problem.
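
To get a feel for how fragile a result on 90 patients can be, here is a back-of-the-envelope check. The counts below (29 of 45 and 20 of 45) are approximations inferred from the reported percentages, not figures taken from the paper, and the simple z-test shown is not necessarily the test the authors used.

```python
# Back-of-the-envelope two-proportion z-test using approximate counts inferred
# from the reported percentages (64.4% of 45 treatment, ~44% of 45 control).
from math import sqrt
from scipy.stats import norm

x_treat, n_treat = 29, 45   # assumed count of adherent treatment patients (~64.4%)
x_ctrl, n_ctrl = 20, 45     # assumed count of adherent control patients (~44.4%)

p_treat, p_ctrl = x_treat / n_treat, x_ctrl / n_ctrl
diff = p_treat - p_ctrl

# z-test with a pooled standard error
p_pool = (x_treat + x_ctrl) / (n_treat + n_ctrl)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_treat + 1 / n_ctrl))
z = diff / se_pooled
p_value = 2 * norm.sf(abs(z))

# 95% confidence interval for the difference (unpooled standard error)
se_unpooled = sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
ci_low, ci_high = diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled

print(f"difference = {diff:.2f}, z = {z:.2f}, p ≈ {p_value:.2f}")
print(f"95% CI for the difference ≈ ({ci_low:.2f}, {ci_high:.2f})")
# The interval spans roughly 40 percentage points and nearly includes zero,
# which is the statistical face of the 'small sample, short duration' concern.
```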

The authors document the reasons for missing treatment and for non-adherence in their sample. In descending order of prevalence, these are late homecoming, involvement in outdoor business, antiretroviral stockout, forgetfulness, traveling out of station without medication, and not belonging to a support group. The SMS reminders can help with forgetfulness and perhaps with traveling out of station without medication, but not with the others. Put differently, many factors influence adherence, and SMS reminders do not address them all.


Suppose this pilot intervention is scaled up and monitored over the course of a year. Would you feel comfortable attributing a before-and-after change in adherence of 10 percentage points entirely to the SMS reminders? It does matter how much of that change you attribute to the intervention, because that is the only way to know whether the money you are spending on the SMS system is worthwhile. What if adherence decreased by 10 percentage points? Would you conclude that the SMS reminders had a negative effect on adherence?
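
To make the attribution problem concrete, here is a toy calculation with made-up numbers (they do not come from any study): if part of the observed change would have happened anyway, a before-and-after comparison overstates the net effect of the reminders, and with it any cost-effectiveness estimate built on top of that number.

```python
# Toy calculation (hypothetical numbers) of why a before-and-after change cannot
# simply be attributed to the intervention once it runs at scale.

baseline_adherence = 0.45   # adherence before scale-up (assumed)
endline_adherence = 0.55    # adherence a year later (assumed): +10 percentage points

secular_trend = 0.08        # assumed change that would have happened anyway
                            # (new clinic staff, a stockout ending, seasonal travel)

naive_estimate = endline_adherence - baseline_adherence   # before-and-after change
net_effect = naive_estimate - secular_trend                # what a counterfactual isolates

print(f"before-and-after change:              {naive_estimate:+.2f}")
print(f"net effect once the trend is removed: {net_effect:+.2f}")
# Judging the SMS system by the naive +10 points would overstate its value
# fivefold relative to the +2-point net effect.
```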

My point is simply that small pilots can be very useful for establishing feasibility and identifying promising mechanisms, but to inform programmatic decisions about mHealth, we need more evidence from rigorous evaluations conducted on programs at scale and over time.


This post is based on a lightning talk I delivered at the 2018 Global Digital Health Forum in Washington, DC. The supplementary materials for this research are available here.

Photo credit: Ericsson/CC BY-NC-ND 2.0 license
