Could you repeat that? Fixing the 'replication crisis' in biomedical research has become a top priority

Worldwide, retractions of published papers are growing. A new effort at Johns Hopkins aims to improve standards and protocols to make science reproducible.

Image caption: Illustration of a pyramid of scientists dropping fluid into a beaker; the scientist at the top is dropping in something red

Image credit: Dave Plunkert

The following article originally appeared in Hopkins Medicine magazine.

Sarven Sabunciyan was intrigued.

He had been reading about xenotropic murine leukemia virus-related virus, or XMRV, a virus never before seen in humans. But according to attention-grabbing studies in PLoS Pathogens and Science, it was now showing up in people with prostate cancer and chronic fatigue syndrome.

Sabunciyan, a pediatric neurovirologist in the Johns Hopkins University School of Medicine, studies the role of viruses in psychiatric diseases and behaviors. He decided to see if people with schizophrenia also had XMRV, a finding that would yield important clues about the mental disorder and how to treat it.

His first step was to find people who had been exposed to the virus. Sabunciyan's team developed a test that could detect the antibodies an XMRV infection would have left behind. "Even if you recover from an infection, you would retain some antibodies," he explains. But no sign of such antibodies could be found in blood samples of people with or without schizophrenia.

"We figured maybe we weren't looking at the right disease," he says.

So Sabunciyan's research team decided to check the original claim. A colleague at the National Institutes of Health provided samples of prostate cancer. "We tested several hundred samples," he says, "and we didn't see the antibody in any of them."

It appeared the earlier studies had been wrong. Sabunciyan says he had no choice but to conclude that the virus did not exist in humans.

He didn't want other researchers to invest time and resources following the same erroneous path. But publishing a report that questioned previous work proved difficult. One cancer journal turned down his paper, saying his research lacked an appropriate control group of people infected with the virus.

By the time his research was published, in the March 2011 issue of Molecular and Cellular Probes, three other articles were also pointing to the conclusion that the virus didn't exist. Science retracted its report that December, and PLoS Pathogens followed in 2012.

As it turns out, XMRV was nothing more than a lab mix-up. The virus had formed when prostate cancer cells were mistakenly combined with cells of a genetically modified mouse.

"I don't think the original authors were trying to mislead or deceive anyone," Sabunciyan says. "They should have done more replication studies."

When research can't be reproduced

Biomedical research illuminates the world and saves lives. It's also messy, tedious, and often frustrating. Repeating studies and building on them, as Sabunciyan attempted to do, is the key to ensuring that our understanding of the natural world grows more precise over time.

But recent years have seen a nationwide rise in highly publicized studies that proved problematic. The issue became clear in 2011, when the pharmaceutical company Bayer conducted its own reproducibility tests on 67 published experiments and failed to get similar results on 53.

Bayer's inability to reproduce potentially lifesaving research shook the scientific world.

Research that can't be reproduced has no value. In many cases, this failure occurs after pharmaceutical companies have invested heavily in discoveries and patients have signed on for clinical trials. Resources are squandered, the march toward therapies takes a detour, patients' hopes are shattered, and the research community is embarrassed.

Worldwide, retractions of published papers are growing, says Stuart Ray, vice chair of medicine for data integrity and analytics. In 2015, the PubMed database of biomedical literature logged 720 retractions, he says, more than ten times the number in 2004, while the number of papers published per year merely doubled over the same period. In other words, the rate of retractions per published paper grew roughly fivefold.

What many would describe as the "replication crisis" in science has gained the attention of researchers and policymakers at the highest levels. At the National Institutes of Health, which holds the purse strings to some $17 billion in biomedical grant funding annually, Director Francis Collins authored an editorial in Nature acknowledging "... that the complex system for ensuring the reproducibility of biomedical research is failing and is in need of restructuring."

The Johns Hopkins University School of Medicine, where researchers publish more than 5,000 research papers a year, has a particular obligation to make sure its own research is as reliable and transparent as possible, says Paul Rothman, dean of the medical faculty and CEO of Johns Hopkins Medicine.

"Clearly now is the time for the U.S. research enterprise, and for us at Johns Hopkins, to re-evaluate our processes and incentive systems," he says.

Under Rothman's direction, the school of medicine in 2016 convened a six-member Task Force on Research Reproducibility, which went on to develop a set of specific proposals.

The proposals fall into two broad categories. The first is identifying and teaching best practices in study design, such as using appropriate statistical measures to ensure sufficient sample sizes, and testing biological agents for contamination or mislabeling. This summer, graduate program directors were surveyed about whether their students receive adequate training in study design and statistics, and how any barriers to such training could be removed.
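A sample-size check of the kind the task force has in mind can be run in a few lines before any data are collected. The sketch below is a minimal, hypothetical example using Python's statsmodels library; the effect size and thresholds are placeholders, not values drawn from any Johns Hopkins study.

```python
# Minimal power-analysis sketch: how many subjects per group does a
# two-sample t-test need to reliably detect an assumed effect?
# All numbers below are placeholders chosen for illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # assumed standardized difference between groups
    alpha=0.05,       # significance threshold
    power=0.8,        # desired probability of detecting a real effect
)
print(f"Roughly {n_per_group:.0f} subjects are needed in each group.")
```

Running such a calculation before an experiment, rather than justifying the sample size afterward, is the habit this kind of training is meant to build.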

The second is to archive data associated with published papers—a move that will help investigators like Sabunciyan look back at the original data to see how the earlier conclusions were reached.

"We need to archive, store and secure these data so that the steps taken in each experiment are traceable," says neuroscientist Alex Kolodkin, who chairs the task force. "This is not in any way meant to be a punitive sort of endeavor," he adds. "This is meant to help investigators. It's a universitywide goal that everything be reproducible."

To be clear, most experts, nationally and at Johns Hopkins, agree that outright malfeasance is rare. Far more often, research can't be reproduced because of poor design, data sets that are too small, math mistakes, published papers that leave out the details needed to repeat an experiment exactly as it was first performed, or an inability to access the original data and protocols.

"It's not always related to misconduct," Sabunciyan says. "It's just that the problems we are working with are so complex."

Pitfalls around every corner

Research involves studying the existing literature, forming hypotheses, gathering and analyzing data, and publishing the findings. But pitfalls can trip up scientists every step of the way. Researchers may fail to take into consideration factors such as gender differences or room temperature, for example. Or they may misinterpret what their data show.

Image caption: Illustration of an assembly line of scientists adding liquid to containers; only one liquid is red

Image credit: Dave Plunkert

The School of Medicine's Responsible Conduct of Research program, established in 2009 and refreshed in 2016, requires that faculty members, postdoctoral trainees, and staff members learn research rules and best practices through research integrity colloquia, an online course, and department meetings devoted to the responsible conduct of research.

Graduate students also receive extensive training in responsible research through mandatory graduate-level courses in the same program.

In addition, and also in line with the task force's recommendations, a grant from the National Institutes of Health is being used to develop a 10-part online learning module that will soon be required for all beginning researchers.

The course is an expansion of one created by graduate student Alyssa Ward in the Department of Molecular Biology and Genetics. Ward's seven-part course, designed with faculty sponsor Randall Reed, assistant dean for research, explores issues like avoiding inadvertent bias, using sufficient and representative data, creating figures that accurately reflect the data, and writing papers that don't overstate the results.

The experience of creating the course underscored for Ward that even the most well-meaning researchers can make mistakes—herself included.

For information about statistics, Ward stopped by the office of Marie Diener-West, a biostatistician in the Johns Hopkins Bloomberg School of Public Health. In the course of their conversation, Diener-West asked Ward to show the statistical analysis she had done on her own work. Ward studies the regulation of an enzyme that assembles pieces of DNA to make antibodies that fight specific infections.

Diener-West found that Ward's statistical method was overly sensitive, potentially finding a significant result when one didn't exist. "In my case, it didn't happen to change the interpretation of results, but it might have, if my numbers had been closer," says Ward.

Those kinds of mistakes, says Ward, "come from ignorance. We're not teaching statistics in a consistent way."
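One common way an analysis becomes overly sensitive is pseudo-replication: treating repeated measurements of the same biological sample as if they were independent observations. The simulation below is a hypothetical illustration of that pitfall, not a reconstruction of Ward's actual analysis.

```python
# Hypothetical illustration: pseudo-replication inflates the false positive
# rate even when there is no real difference between groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_experiment():
    """Two groups with NO real difference: 5 biological samples per group,
    each measured 4 times, with sample-to-sample (biological) variation."""
    def group():
        sample_means = rng.normal(0.0, 1.0, 5)  # biological variation
        return np.concatenate([rng.normal(m, 0.3, 4) for m in sample_means])
    return group(), group()

n_sims = 2000
naive_hits = 0
pooled_hits = 0
for _ in range(n_sims):
    a, b = one_experiment()
    # Overly sensitive: treat all 20 measurements per group as independent.
    if stats.ttest_ind(a, b).pvalue < 0.05:
        naive_hits += 1
    # More appropriate: average technical replicates, test 5 values per group.
    if stats.ttest_ind(a.reshape(5, 4).mean(axis=1),
                       b.reshape(5, 4).mean(axis=1)).pvalue < 0.05:
        pooled_hits += 1

print(f"False positive rate, pseudo-replicated test: {naive_hits / n_sims:.1%}")
print(f"False positive rate, per-sample test:        {pooled_hits / n_sims:.1%}")
```

Because the groups truly do not differ, both rates should sit near 5 percent; the pseudo-replicated test lands far above it, which is what finding a significant result when one doesn't exist looks like in practice.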

Further complicating the picture: Rapid advances in technology now mean that many researchers are mining gargantuan data sets.

"The data are more complex, and the principal investigator is farther from the primary data than a decade or two ago," points out Reed, a molecular biologist who graduated from Johns Hopkins University in 1977 and returned as assistant professor in 1984. Instead of working with billions of pieces of information, he says, researchers rely on graphic interpretations of that data, created by computer programs. It's virtually impossible to find or recognize errors in data that are already concentrated and interpreted.

"If we enhance the way we store and manage data, and if we provide resources and education on statistics, we will enhance the overall reproducibility in the research community simply because people will be closer to their data," says Reed, who serves on the Johns Hopkins task force with Kolodkin.

Image caption: Alex Kolodkin

Image credit: Mike Ciesielski

Reed notes that until now at Johns Hopkins, and at most institutions around the country, there have been no standards for storing and sharing the data that go into experiments. Investigators jot down observations in physical notebooks, stash data on flash drives that can get destroyed in coffee spills, or compute charts on personal laptops that crash or get replaced. When students graduate, they may leave behind primary data that are poorly organized or labeled.

That is changing. Currently, all principal investigators in the neuroscience and molecular biology departments are creating unified data storage protocols for their labs, and other departments are expected to follow. An institutionwide server is also being designed, says Reed, with the goal of launching a pilot in 2018.

It will provide a clear record, he says, of how data presented in every publication authored by Johns Hopkins scientists were gathered and analyzed.

An extra review before publication

Burgeoning data sets also increase the complexity of statistical analysis. Husband-and-wife biostatisticians Jeff Leek and Leah Jager, both on the faculty at the Bloomberg School, studied the problem and concluded that roughly 14 percent of published results are false because the data analysis was inaccurate or misinterpreted, meaning that "the result presented in the study produces the wrong answer to the question of interest."
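The 14 percent figure is an estimate of what is sometimes called the science-wise false discovery rate: among published "significant" findings, what fraction are false positives? Leek and Jager's method works backward from the p-values reported in published papers; the toy simulation below, built on entirely invented parameters, illustrates only the quantity being estimated, not their estimator.

```python
# Toy simulation of a science-wise false discovery rate. All parameters
# (share of true nulls, effect size, sample size) are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_studies = 10_000
frac_null = 0.5      # assumed share of studies testing a true null
effect = 0.5         # assumed standardized effect when one is real
n_per_group = 30

is_null = rng.random(n_studies) < frac_null
p_values = np.empty(n_studies)
for i in range(n_studies):
    delta = 0.0 if is_null[i] else effect
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(delta, 1.0, n_per_group)
    p_values[i] = stats.ttest_ind(a, b).pvalue

published = p_values < 0.05      # pretend only "significant" results see print
fdr = is_null[published].mean()  # share of "published" findings that are false
print(f"{published.sum()} 'published' findings, {fdr:.1%} of them false positives")
```

The rate this toy run produces depends entirely on the invented parameters; its only purpose is to make concrete what it means, across a whole literature, for a published result to give "the wrong answer to the question of interest."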

Yet Leek and Jager have spent enough time analyzing research results to know better than to trust their own findings. "If you asked me to stake my career on [our finding of] 14 percent, I wouldn't do that," says Leek.

So the pair took a step that's gaining buy-in among researchers: They posted their paper on arXiv, a server where scientific papers can be critiqued before publication. The journal Biostatistics also sent the paper to several experts for comment as part of its normal reviewing process before it went to print.

The commenters on arXiv picked apart the results, with some arguing that the false positive rate is much higher.

For their part, Leek and Jager say the back-and-forth, while rough on their egos, shows how research can grow more accurate over time. The commenters, they say, "made valuable contributions that suggest ways to build on our original idea." They didn't change their overall conclusion, they say, but the discussion added nuance.

The practice is becoming routine in some computation-based specialties like bioinformatics, says Kolodkin, but he does not expect it to become an across-the-board requirement. "It should be encouraged in cases where the investigator believes the field will provide constructive criticism," he says.

Changing the culture

As anyone in academic medicine can attest, biomedical researchers today work in pressure-cooker environments, where funding is scarce and positions often tenuous. Prominent studies published in well-known journals can bring in much-needed grant money and provide essential career advancement to beginning scientists, including graduate students, postdoctoral fellows, and newly minted assistant professors.

Studies that make a big splash also bring glory and promotions to the mentors of junior investigators: the principal investigators who administer these research projects and the grants that fund them.

Acknowledging this reality, Kolodkin and the task force are advocating nothing less than a change in culture at Johns Hopkins. They want to change the promotion system to emphasize a researcher's entire body of published work, not just focus on papers in prestigious journals.

After all, an experiment that cannot be reproduced might as well never have been performed, he notes; the resources expended on such work are wasted.

"Research reproducibility," says Kolodkin, "must be our highest priority."