Spell check for research? Confronting the reproducibility crisis with explainable AI

January 19, 2021

You might have heard of the famous “marshmallow test.” The study, and follow-ups that extended over decades, found that children who could delay gratification—and consequently get two marshmallows instead of one—tended to have better life outcomes.

But a 2018 study challenged those results. And many other widely accepted findings from high-profile studies have faced new scrutiny over the last decade. Because science advances by building one study upon another, what has come to be known as the reproducibility crisis (or replication crisis) has cast a shadow on some important discoveries of the past, particularly in the fields of psychology, economics and medicine.

Now two NIU computer scientists, working with a colleague at Northwestern University, are hoping to make headway in addressing the reproducibility crisis by taking the first steps toward developing software akin to a spell checker for research—a program that would help scholars gauge the likelihood of their study results being successfully reproduced or replicated.

The National Science Foundation (NSF) has awarded $300,000 in grant funding for the project to NIU Computer Science Professors Hamed Alhoori and David Koop, along with colleague Brian Uzzi of Northwestern’s Kellogg School of Management. Their project will make use of “explainable AI,” or artificial intelligence technology that arrives at solutions people can understand.

Scholars across the globe publish millions of research papers annually.

“Each year, more than $2 trillion is spent on research and development worldwide, so findings that don’t hold up represent an enormous waste of time and money,” said Alhoori, the lead researcher on the project. He also pointed to a 2016 survey by the prestigious journal Nature, which found that more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.

“If we can’t validate published findings, it’s clearly a problem,” Alhoori said. “When scientists see research published in a reputable journal, we think we can build on the work.”

There’s also public trust to consider, Koop added.

“People hear about retractions (of scientific studies) and assume they can’t trust science,” he said. “It’s important to minimize those events as much as possible.”

Over the next two years, the researchers plan to begin developing metrics and tools to help make reproduction of existing work more efficient. They also want to help scientists, scholars and technologists self-evaluate their work before publishing it.

“One of our major goals is to quantify the level of confidence that a work is likely to be reproducible by leveraging both human and machine intelligence,” Alhoori said. “Ideally, researchers would have something like spell check.”

The terms “reproducibility” and “replicability” are often used interchangeably. However, the former more specifically refers to computational reproducibility, or repeating an experiment using the same data and methods to obtain consistent results. Replicability aims to answer the same scientific question with new data. The thrust of the new research could aid either process but is focused on reproducibility—analyzing the data originally captured and the computer code used to produce models and/or results.

The scientists plan to assemble new datasets covering the success and failure of hundreds of scientific papers and to identify the common threads among research that holds up to scrutiny versus studies that unravel when reproduced. They will analyze a range of metrics drawn from the scholarly text, images, tables, methods, computer code, computational notebooks and results.
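As a rough illustration of what such per-paper metrics could look like, here is a minimal sketch in Python. Every field name below is a hypothetical stand-in invented for this example, not the project’s actual schema.

```python
from dataclasses import dataclass

# Hypothetical per-paper record; the fields are illustrative stand-ins
# for the kinds of signals described above, not the team's real metrics.
@dataclass
class PaperFeatures:
    shares_code: bool          # is the analysis code publicly available?
    shares_data: bool          # is the underlying dataset publicly available?
    has_notebook: bool         # does the paper include a computational notebook?
    num_tables: int            # number of tables reporting results
    num_figures: int           # number of figures reporting results
    methods_word_count: int    # length of the methods section
    reproduced: bool           # label: did the study hold up when reproduced?

example = PaperFeatures(
    shares_code=True,
    shares_data=False,
    has_notebook=True,
    num_tables=4,
    num_figures=6,
    methods_word_count=2300,
    reproduced=True,
)
```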

“There are signals that could provide clues as to whether a work is reproducible,” Koop said.

The research team is aiming to develop interpretable machine learning and deep learning models that will estimate a confidence level in a work’s reproducibility. However, Koop noted that tools that work in one discipline might not be appropriate for another. “The tools we develop will probably need to be different depending on the type of publication,” he said.
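To make the idea of an interpretable confidence estimate concrete, here is a minimal sketch using a simple logistic-regression classifier on fabricated data. It assumes scikit-learn and NumPy, reuses the hypothetical feature names from the sketch above, and illustrates interpretable modeling in general rather than the team’s actual method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fabricated stand-in data: rows are papers, columns are three of the
# hypothetical features above (shares_code, shares_data, has_notebook).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3)).astype(float)

# Invented labels, purely so the example runs: papers that share code
# and data are marked reproducible more often.
y = ((X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200)) > 1).astype(int)

model = LogisticRegression().fit(X, y)

# A linear model is interpretable in the sense that each coefficient
# shows how a feature shifts the predicted confidence.
for name, coef in zip(["shares_code", "shares_data", "has_notebook"],
                      model.coef_[0]):
    print(f"{name}: {coef:+.2f}")

# Toy confidence that a new paper (shares code and a notebook, but not
# its data) would be reproducible, according to this model:
print(model.predict_proba([[1.0, 0.0, 1.0]])[0, 1])
```

Whatever models the team ultimately builds, the principle is the same: the confidence estimate comes with reasons a researcher can inspect, rather than an opaque score.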

So, is a spell checker for research imminent? Not yet, Alhoori said, but he and his colleagues hope to make strides in that direction.

Media Contact: Tom Parisi

About NIU

Northern Illinois University is a student-centered, nationally recognized public research university, with expertise that benefits its region and spans the globe in a wide variety of fields, including the sciences, humanities, arts, business, engineering, education, health and law. Through its main campus in DeKalb, Illinois, and education centers for students and working professionals in Chicago, Hoffman Estates, Naperville, Oregon and Rockford, NIU offers more than 100 areas of study while serving a diverse and international student body.