Reference genome lacks genetic material from diverse populations

The reference genome, used as a representative sample of the human genome, lacks genetic material found in individuals of African descent

For the past 17 years, most scientists around the globe have used a single nucleic acid sequence (an assembly of DNA information, or genome) drawn primarily from one individual as a baseline reference and representation of the human species when comparing genetic variation among groups of people.

Known as the GRCh38 reference genome, it is periodically updated with DNA sequences from other individuals. But in a new analysis, Johns Hopkins researchers say that the collective genomes of 910 people of African descent contain a large chunk of genetic material, about 300 million base pairs, that is missing from the reference genome.

"There's so much more human DNA than we originally thought," says Steven Salzberg, Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics.

Knowing how genomes vary across populations is essential for designing research that can reveal why certain people or groups of people may be more or less susceptible to common health conditions, such as heart disease, cancer, and diabetes. Salzberg says that scientists need to build more reference genomes that more closely reflect different populations.

"The whole world is relying on what is essentially a single reference genome, and when a particular DNA analysis doesn't match the reference and you throw away those nonmatching sequences, those discarded bits may in fact hold the answers and clues you are seeking," says Salzberg.

Rachel Sherman, a doctoral candidate in computer science at the Whiting School and first author of the report, which was published online in Nature Genetics, says, "If you are a scientist looking for genome variations linked to a condition that is more prevalent in a certain population, you'd want to compare the genomes to a reference genome more representative of that population."

The world's reference genome was assembled from the nucleic acid sequences of a handful of anonymous volunteers. Other researchers later determined that 70% of the reference genome derives from a single individual who was half European and half African, and that the rest derives from multiple individuals of European and Chinese descent, according to Salzberg.

"These results underscore the importance of research on populations from diverse backgrounds and ancestries to create a comprehensive and inclusive picture of the human genome," says James Kiley, director of the Division of Lung Diseases at the National Heart, Lung, and Blood Institute, which supported the study.

He adds, "A more complete picture of the human genome may lead to a better understanding of variations in disease risk across different populations."

This article originally appeared in JHU Engineering magazine.