Addressing bias in our systematic review of STEM research

Research is a conversation. Researchers attempt to answer a study question, and then other groups of researchers support, contest or expand on those findings. Over the years, this process produces a body of evidence representing the scientific community’s conversation on a given topic. But what did those research teams have to say? What did they determine is the answer to the question? How did they arrive at that answer?

That is where a systematic review enters the conversation. We know, for example, that a significant amount of research exists exploring gender differences in mathematics achievement, but it is unclear how girls’ math identity contributes to or ameliorates this disparity. In response, we are conducting a systematic review to understand how improving girls’ math identity supports their participation, engagement and achievement in math. (For details on why this is important, read more from Merle Froschl, FHI 360’s Director of Educational Equity.) This review will assist us in moving from a more subjective understanding of the issue to a rigorous and unbiased assessment of the current evidence to date.

Systematic reviews apply a rigorous methodology for identifying and synthesizing evidence on a research topic, which may include a variety of social and health-related subjects. (Check out these useful resources and libraries from the Campbell Collaboration and Cochrane Collaboration.) The goal of a systematic review is to examine the literature in a way that is: 1) exhaustive, 2) reproducible and 3) objective in the interpretation of findings. Developing a systematic review protocol requires thoughtful decision-making about how to reduce various forms of bias at each stage of the process. Below we discuss some of the decisions made to reduce bias in our systematic review exploring girls’ math identity, in the hope that they will inform others undertaking similar efforts. In particular, we designed our protocol to address source selection bias, publication bias, threats to construct validity, reviewer bias and threats to conclusion validity.

Planning our systematic review was a collaborative process requiring insights from both content and methodological experts. As a team, we identified potential areas of bias, created strategies for improving the validity of the review itself, and developed a detailed protocol (available here) to structure our efforts. This process will ultimately strengthen our ability to successfully summarize the evidence and provide insights on how to effectively foster girls’ interest and engagement in math and related fields.

One of the first sources of bias addressed was source selection bias, which can arise if researchers select an inappropriate assortment of sources. Articles should be selected from multiple databases representing literature across all relevant disciplines. For example, our review draws from literature in the education, sociology, psychology and gender fields (see Box 1). Incorporating literature from various disciplines improves the likelihood of capturing differing, equally important perspectives on a topic, thereby reducing the risk of groupthink, or the echo chamber effect. For example, education experts may examine the review question from a pedagogical standpoint, whereas gender experts may assess societal expectations of girls versus boys when it comes to excelling in math.

Related to source selection bias is publication bias, which describes a situation where research is not representative of all studies on a topic due to standards for publishing in peer-reviewed journals. For example, it is challenging to publish null findings from a study, which may mean the peer-reviewed literature generated from a database search has a positive bias, or tends to show more favorable study results. One solution we applied was to include high-quality grey literature – or non-peer-reviewed articles – to complement the database search. As a primary source for high-quality grey literature, we elected to use Academic Search Premier as a database, which hosts graduate dissertations and educational reports in addition to peer-reviewed articles. We also plan to conduct hand-searches, or reviews of reference lists of relevant systematic reviews or meta-analyses identified through our search process. We will incorporate any peer-reviewed articles and grey literature of relevance that were missed through the database search.

Before executing the database search, however, we had to define key concepts clearly. To reduce threats to construct validity in the systematic review, the included studies must measure key concepts the way the review team defined them. For example, our team built a theory-grounded consensus on a definition of girls’ math identity, i.e. girls’ beliefs, attitudes, emotions, and dispositions about math and their resulting motivation to engage and persist in related activities. We then mapped out the various ways this concept may be captured and operationalized across disciplines. Our team eventually agreed on the following list of terms in their various forms: identity, perception, attitude, disposition, belief and self-concept.

Our process was strengthened through the support of a reference librarian, who provided guidance on which terms may or may not be appropriate to include in different database searches. For instance, we hoped to use the term “STEM” as the acronym for the fields of science, technology, engineering and math. STEM is widely recognized and often not spelled out in research. However, in certain databases, this term created significant noise, pulling in irrelevant articles regarding stem cells and brain stems. The process of creating a list of search terms that adequately represents key concepts while also eliminating unwanted articles was iterative and relied heavily on the combined knowledge of content and methodological experts.
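
To make the idea of an iterative search string concrete, here is a minimal sketch in Python of how concept groups might be combined into a Boolean query. The identity terms come from the list above. Everything else is an assumption for illustration only: the subject and population terms, the excluded phrases, the helper names `or_group` and `build_query`, and the generic AND/OR/NOT syntax, which varies by database. Excluding phrases with NOT is just one possible way to cut "stem" noise and can also drop relevant articles, so it is not presented here as the approach our protocol uses.

```python
def or_group(terms):
    """Join synonyms with OR, quoting multi-word phrases, and wrap in parentheses."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concept_groups, exclude=()):
    """AND together one OR-group per concept, then optionally NOT out noisy phrases."""
    query = " AND ".join(or_group(group) for group in concept_groups)
    if exclude:
        query += " NOT " + or_group(exclude)
    return query


# Terms the review team agreed on for the "math identity" concept (from the text).
identity_terms = ["identity", "perception", "attitude", "disposition", "belief", "self-concept"]

# Hypothetical terms, chosen only to illustrate the structure of a query.
subject_terms = ["math", "mathematics"]
population_terms = ["girls", "female students"]
noise_phrases = ["stem cell", "brain stem"]

query = build_query([identity_terms, subject_terms, population_terms], exclude=noise_phrases)
print(query)
```

Keeping each concept in its own list makes it easy to add or drop a term, rerun the search and compare how many irrelevant articles come back.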

The next potential source of bias we attempted to address was individual reviewer bias in the selection and screening of articles. Reviewer bias is a type of systematic error introduced when a reviewer consistently misinterprets eligibility criteria or otherwise imposes their personal bias on the decision-making process. A list of eligibility criteria, which clearly distinguishes studies that should be included and excluded, helps reinforce construct validity while reducing reviewer bias. Moreover, each article is compared against inclusion criteria through a two-stage screening process: 1) a title and abstract screening and 2) a full text review. The two-stage screening process and subsequent data extraction will be conducted by two researchers to reduce the risk of reviewer bias.

By using at least two reviewers, the team can assess inter-rater reliability, or the extent to which reviewers consistently agree on which articles should be included versus excluded. We conducted a pilot training, where all reviewers assessed the eligibility of 20 articles. After each person screened the articles by title and abstract, the team discussed the group’s decisions to move articles to the next stage. This pilot was an opportunity for lead investigators to train reviewers on key concepts and criteria, while also clarifying written procedures to ensure reproducibility.
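
One common way to quantify inter-rater reliability between two reviewers is Cohen's kappa, which compares observed agreement with the agreement expected by chance. The sketch below is illustrative only: the text above does not name a specific statistic, and the include/exclude decisions are made-up data for 20 articles, not results from our pilot. The function name `cohens_kappa` is also our own.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each labeled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(ratings_a)

    # Observed agreement: share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: probability both raters pick the same label independently,
    # based on how often each rater used each label.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up screening decisions for 20 articles (illustration only).
reviewer_a = ["include"] * 6 + ["include"] * 2 + ["exclude"] * 2 + ["exclude"] * 10
reviewer_b = ["include"] * 6 + ["exclude"] * 2 + ["include"] * 2 + ["exclude"] * 10

kappa = cohens_kappa(reviewer_a, reviewer_b)
print(round(kappa, 2))  # 0.58
```

In this invented example the reviewers agree on 16 of 20 articles (80 percent), but because half of that agreement would be expected by chance, kappa is lower, at about 0.58. A team might use a result like this to decide whether eligibility criteria need further clarification before full screening begins.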

It may be tempting to think of bias in systematic reviews only in terms of how the review itself is conducted. However, systematic reviewers must also consider the internal validity of individual studies included in the review. We will include a risk of bias or quality assessment, through which we evaluate threats to internal validity for included studies. The risk of bias assessment helps improve conclusion validity by ensuring that conclusions drawn on girls’ math identity reflect the quality of the evidence provided. Our review will take a broad approach to understanding how girls’ math identity supports achievement in math and will likely include a number of quasi-experimental and non-experimental studies. Therefore, each study’s findings will be considered within the context of a bias assessment specific to its design. Taken together, these assessments will provide insight into the extent to which we can rely on each study’s findings.

As described, many steps can be taken to improve the validity of the review itself. Yet, this is not an exhaustive list of types of bias and the approaches for addressing each. The role of the reviewer is to be cognizant of these challenges and provide a transparent approach for addressing them. In this way, systematic reviews can foster even greater dialogue on a particular topic, including evidence-based approaches for nurturing girls’ interest and engagement in STEM-related fields. If you have or know of a study on this topic that is in the grey literature or in process, please alert us to it in the comments or by sending us an email.

Photo credit: Jessica Scranton/FHI 360
