Several weeks ago I was on a review team (a Red Team, for those in the know) for a proposal my colleagues were developing for the Bill & Melinda Gates Foundation. One comment I kept coming back to was that they needed to be more specific so that the reviewers would have a clear mental picture of what they were proposing. Shortly thereafter, a new working paper [gated*] that seemingly challenges this advice hit the streets, or more specifically, hit the tweets. In this study, Julian Kolev, Yuly Fuentes-Medel, and Fiona Murray analyze 6,794 proposals submitted to the Gates Foundation and find that “narrow” words are associated with lower proposal scores and “broad” words with higher ones. More to the point, women are more likely to use the narrow words and men the broad ones.
I was hit by a pang of guilt! Had I given the wrong advice? I don’t think so, but let me tell you more about this very interesting study so that I can explain why. (Hint: it has to do with how narrow and broad are measured and what that might mean for innovation.)
The study setup
Their dataset includes all proposals submitted to the Gates Foundation Grand Challenges Explorations (GCE) program for infectious disease research from 2008 through 2017 by U.S.-based applicants with academic or non-profit affiliations, plus the scores for all those proposals. For this grant program, review is blinded: reviewers have no information about the applicants beyond the proposal itself. You might think reviewers could easily guess who the applicants are, as often happens with blinded journal referees, but GCE reviewers come from many fields and backgrounds, including the private sector and government, so de-anonymization doesn’t seem to be a serious concern here.
Kolev et al. not only coded information from each of these proposals, each reviewer, and each score; they also collected career length and publication history for all the applicants, and subsequent career outcomes for a subset of them. This is an impressive dataset. The data also allow them to do some useful things methodologically. First, because each proposal is scored by multiple reviewers, Kolev et al. can control for applicant quality and the proposal idea by comparing scores from different reviewers for the same proposal. Second, they use a regression discontinuity approach to look for a differential effect of receiving funding on male and female applicants, comparing later outcomes for applicants just above and just below the funding cut-off, that is, applicants who received similar scores.
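The logic of the within-proposal comparison can be sketched in a few lines. This is a toy illustration, not the paper’s actual econometric specification: the proposal IDs, reviewer groups, and scores below are all invented. The point is that any score gap computed within the same proposal cannot come from applicant or idea quality, because those are held fixed.

```python
from collections import defaultdict

# Hypothetical reviews: several reviewers score the SAME proposal,
# so differences within a proposal net out applicant/idea quality.
reviews = [
    # (proposal_id, reviewer_group, score)  -- all values invented
    (1, "A", 4.0), (1, "B", 3.5),
    (2, "A", 2.0), (2, "B", 2.5),
    (3, "A", 5.0), (3, "B", 4.0),
]

by_proposal = defaultdict(dict)
for pid, group, score in reviews:
    by_proposal[pid][group] = score

# Average group-A-minus-group-B gap, computed within each proposal:
gaps = [d["A"] - d["B"] for d in by_proposal.values()]
avg_within_gap = sum(gaps) / len(gaps)
print(round(avg_within_gap, 3))  # 0.333
```

The paper does this with reviewer-level regressions rather than raw averages, but the identifying comparison is the same: scores of different reviewers looking at one and the same proposal.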
Unable to explain the score disparity with more conventional factors, Kolev et al. analyze the text of the submitted proposals. In particular, they look at word choice. They code words as “narrow” or “broad” and then look for associations between word type and gender, and between word type and reviewers’ scores. For me the most interesting figure in the working paper is figure 6. In a chart with four quadrants, the figure classifies words by gender of use (which gender uses them more frequently), score association (whether they appear more frequently in high- or low-scoring proposals), and type (narrow or broad). A quick look at the figure shows that men are more likely to use broad words, which appear more in high-scoring proposals, while women are more likely to use narrow words, which appear more in low-scoring proposals. Kolev et al. support these findings with econometrics.
Which words matter?
There are three broad words in the quadrant of words used more by men and appearing more frequently in high-scoring proposals: bacteria, detection, control. There are five narrow words in the quadrant of words used more by women and appearing less frequently in high-scoring proposals: contraceptive**, brain, oral, health, community. Looking closely at this figure, my first response was, “wait, what?! How is it that bacteria is a broad word and health is a narrow word?”
Not surprisingly (to me, anyway), the crux of the matter is measurement. Kolev et al. measure the narrowness or broadness of a word by looking at the distribution of word choice in the sample proposals across the 10 topics within infectious disease research (examples of the topics are HIV, malaria, and diarrhea). If a word appears at about the same rate in proposals across all the topics, it is considered “broad”. If a word appears significantly more often in proposals under some topics than others, it is considered “narrow”. So the word health must appear frequently in proposals under only some of the 10 topics, while the word bacteria appears at roughly the same rate across all 10.
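To make the measure concrete, here is a rough sketch of how one could compute it. This is my own reconstruction, not the paper’s exact formula: I use the coefficient of variation of a word’s frequency across topics as the concentration measure, and the topics and proposal snippets below are invented.

```python
# Toy corpus: proposal text grouped by topic (all text invented; the
# paper uses 10 infectious-disease topics and real proposal text).
proposals_by_topic = {
    "malaria":  ["bacteria detection in blood", "community health workers"],
    "HIV":      ["bacteria control methods", "oral contraceptive interactions"],
    "diarrhea": ["bacteria in water detection", "community sanitation"],
}

def topic_rates(word):
    """Fraction of tokens equal to `word` within each topic's proposals."""
    rates = []
    for texts in proposals_by_topic.values():
        tokens = " ".join(texts).split()
        rates.append(tokens.count(word) / len(tokens))
    return rates

def concentration(word):
    """Coefficient of variation of the word's rate across topics.
    Low  -> used evenly across topics ("broad");
    high -> clustered in a few topics ("narrow").
    This metric is my stand-in; the paper's exact measure may differ."""
    rates = topic_rates(word)
    mean = sum(rates) / len(rates)
    if mean == 0:
        return float("inf")  # word never appears
    var = sum((r - mean) ** 2 for r in rates) / len(rates)
    return var ** 0.5 / mean

# "bacteria" appears at a similar rate in every topic (broad), while
# "community" clusters in only some topics (narrow):
print(concentration("bacteria") < concentration("community"))  # True
```

In this toy data, bacteria comes out broad and community narrow for exactly the reason in the paragraph above: one is spread evenly across topics, the other is concentrated in a few.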
What does this mean for innovation?
I hypothesize that truly innovative proposals use more specific words. How can you describe something that is new and different within a topic if you are using words common across topics? According to this study, many people doing infectious disease research take detection into account, but not many take community into account. That suggests to me that proposals that consider community are more likely to be innovative.
What I would love to see as follow-on research is an in-depth assessment of the true innovativeness of a subset of proposals and then an analysis of proposal word choice by innovativeness.
What does this mean for proposal writing and review?
It is important to recognize that the Gates GCE review process is different from most. Remember from above that for GCE, Gates enlists reviewers from broad backgrounds, including from outside science. This contrasts with a funder like NIH, which selects reviewers with expertise in the specific topic under review. I suspect the Gates Foundation expects a more diverse group of reviewers to be better able to identify innovation. A priori, that makes some sense. But the opposite seems to be true, at least for male reviewers: the evidence suggests that the men among these reviewers are “overly credulous to the broad claims” of proposals.
Did I give my colleagues the wrong advice? No, because the proposal they were submitting was in response to a request for concepts in a specific area of work, and thus, I expect, scored by people working in that area. These reviewers, like NIH reviewers, should be able to spot innovation.
Do I write like a girl? I hope so!
*I’m surprised that the Gates Foundation is allowing researchers using foundation data to publish in a gated working paper series.
**My own hypothesis for the word contraceptive is simply that women are more likely to propose innovations related to contraception, and men are less likely to care about it. This hypothesis might apply to the word oral as well, as it often appears along with contraceptive.