Study probes the relationship between genetics, proteins, and disease risk

A groundbreaking collaborative study led by Johns Hopkins researchers has uncovered novel insights into genetic determinants of health and generated data that could lead to a better understanding of racial health disparities.

A nearly 40-year-old study is the basis for new groundbreaking collaborative research identifying the relationship between genetics, proteins, and disease risk, while shedding light on racial health disparities in the process.

The new study, published in Nature Genetics, provides a wealth of information that will allow the research community to test how proteins affect health outcomes, such as the risk of developing various types of cancer or heart disease or of contracting COVID-19. The work could also lead to the development or repurposing of therapeutic drugs to treat human disease. The researchers hope the study will improve understanding of the genetic basis of disease, in particular because the diversity of its participants will unlock new information about the links between proteins and disease.

The makings of this comprehensive study date back to the mid-1980s, when the Atherosclerosis Risk in Communities study was launched with Josef Coresh from the Department of Epidemiology in the Bloomberg School of Public Health as a principal investigator. ARIC, for which Johns Hopkins is a key field center, investigated causes of atherosclerosis—a disease characterized by the build-up of fats, cholesterol, and other substances in the walls of arteries—and measured how cardiovascular risk factors, medical care, and outcomes vary by race, sex, place, and time.

The study was notable in two critical ways: it followed individuals for decades, collecting biological samples at regular intervals; and it included Americans of European ancestry as well as Americans of African ancestry. Beginning in 1987, more than 10,000 participants regularly received physical examinations and follow-up phone calls to maintain contact and to assess the health status of the cohort. Data collected include participants' medical history, demographics, health behaviors, and genetic information. The ARIC study has become a valuable resource, resulting in over 2,500 publications to date. Many independent research projects have used ARIC data for a range of topics including the study of heart disease, kidney disease, diabetes, and cognitive decline.

When Nilanjan Chatterjee, Bloomberg Distinguished Professor of biostatistics and genetic epidemiology, learned through graduate students he was co-advising with Coresh that ARIC also collected participants' proteomic data—information about the proteins present in organisms—he realized the immense untapped potential this resource held.

Proteins have a central role in many biological functions, supporting the structure, function, regulation, and repair of organs, tissues, and cells. Proteins support muscle contraction and movement, for example. They transmit signals to coordinate processes between different organs and move essential molecules around the body. Antibodies that support immune function, hormones that help coordinate bodily function, and enzymes that carry out chemical reactions such as digestion are all proteins. Because proteins control many of the mechanisms critical to an organism's health, many diseases can be traced to changes in proteins, often caused by mutations in the genes that encode them.

Proteomics, the systematic analysis of proteins, gathers information about the proteome, the complete set of proteins produced by a given cell, organ, or organism. It falls under a class of disciplines collectively referred to as omics, which aim to characterize the groups of biological molecules that translate into the structure, function, and dynamics of an organism. Other examples of omics studies include genomics, the study of an organism's full genetic information; epigenomics, the study of the supporting structure of the genome; and transcriptomics, the study of the set of all RNA molecules.

"ARIC is an incredibly unique data source, both because of the amount of genetic, proteomic, and other omic data they have on such a large number of study individuals, and because of its inclusion of individuals from European and African ancestries," says Chatterjee. "Diverse ancestry data is completely lacking in many omics studies. ARIC had a wealth of proteomic data that had not been analyzed, so we were very happy to take advantage of this incredible resource available to us right here at Johns Hopkins."

For their study, the researchers first analyzed genetic variants that correlate with protein levels in individuals to identify protein quantitative trait loci, or pQTLs, the portions of DNA associated with variation in protein levels. They then developed machine learning-based models that can predict information about an individual's proteins, which is not always collected, from genetic information, which is often more accessible in large-scale studies.
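
The two steps can be illustrated with a toy sketch. It is not the researchers' method: the variant names and numbers are hypothetical, and a one-variant least-squares line stands in for the machine learning models described in the paper. The sketch ranks variants by how strongly their allele counts correlate with a protein level, then uses the strongest one to predict that level from genotype alone.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def fit_line(x, y):
    """Least-squares slope and intercept for y as a linear function of x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

# Hypothetical toy data: allele counts (0, 1, or 2) at two variants
# for eight people, plus one measured protein level per person.
genotypes = {
    "variant_A": [0, 0, 1, 1, 1, 2, 2, 2],
    "variant_B": [1, 0, 2, 0, 1, 2, 0, 1],
}
protein_level = [1.0, 1.2, 2.1, 1.9, 2.0, 3.1, 2.9, 3.0]

# Step 1: a crude pQTL scan, ranking variants by absolute correlation.
correlations = {name: pearson_r(g, protein_level) for name, g in genotypes.items()}
top_variant = max(correlations, key=lambda name: abs(correlations[name]))

# Step 2: a one-variant predictor of protein level from genotype.
slope, intercept = fit_line(genotypes[top_variant], protein_level)

def predict_protein(allele_count):
    """Predict the protein level for a person with the given allele count."""
    return intercept + slope * allele_count

print(top_variant, round(correlations[top_variant], 3))
print(round(predict_protein(2), 2))
```

A real analysis would use many variants per protein, adjust for covariates such as age, sex, and ancestry, and validate predictions on held-out participants.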

These models in turn will allow scientists to identify links between the levels of certain proteins in an individual and that individual's disease risk. Knowing which proteins to target in order to prevent development of a disease is crucial for developing new drug therapies or repurposing existing ones, as many drugs work by targeting the body's proteins.

To demonstrate how the models work, the team applied them in proteome-wide association studies for two related traits: gout, a common form of arthritis, and its closely related biomarker, uric acid. The results showed that an existing drug could be repurposed to combat gout.
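
The core idea of a proteome-wide association test can be sketched in a few lines. The numbers below are hypothetical, and the simple correlation test is an illustrative stand-in, not the statistical procedure used in the study: it asks whether genetically predicted levels of one protein track a trait such as uric acid.

```python
from math import sqrt

def association(predicted, trait):
    """Return the Pearson correlation and its t-statistic (n - 2 degrees of freedom)."""
    n = len(predicted)
    mean_p, mean_t = sum(predicted) / n, sum(trait) / n
    sxy = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, trait))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((t - mean_t) ** 2 for t in trait)
    r = sxy / sqrt(sxx * syy)
    t_stat = r * sqrt((n - 2) / (1 - r * r))
    return r, t_stat

# Hypothetical toy data: predicted protein levels for six people
# and a measured trait value (for example, uric acid) for each.
predicted_protein = [1.1, 1.1, 2.0, 2.0, 3.0, 3.0]
trait_value = [4.0, 4.4, 5.1, 5.5, 6.0, 6.6]

r, t_stat = association(predicted_protein, trait_value)
print(round(r, 2), round(t_stat, 1))
```

In a full study this test would be repeated for every protein with a usable prediction model, with correction for the number of proteins tested.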

"'Omics' innovations have made multi-disciplinary collaborations necessary, exciting, and productive," says Coresh. "The lived experience of over 10,000 participants in the ARIC cohort, combined with data on nearly 5,000 protein levels in their blood, allowed for the development of tools that are broadly applicable to human health and disease. We have already seen more than a half a dozen new investigations using the tools and the methods will be even more broadly applicable."

For Chatterjee, the study's powerful models and insightful findings underlined the importance of using diverse populations in genetic and omics studies.

"African populations in particular have a lot more genetic variation because the population is older," Chatterjee says. "Excluding people of African ancestry means we miss out on a large fraction of genetic variations and how it impacts health outcomes. Taking results from a genome-wide association study done with only individuals of European ancestry and trying to apply the results to other populations does not work as well for understanding disease risk, which is not surprising. To best serve all patients, diversity in omics studies is imperative."

In addition, the team found that information garnered from populations of African ancestry added incredible value for interpreting results from study participants overall.

"Because European populations are newer, their genes are more confounded—many variants always come together, and it is difficult to determine which genetic variant is causally related to a trait," Chatterjee explains. "African populations are older, and over more generations, the tight linkage among variants have broken down and it becomes possible to identify which variants are most likely to be the causal variant for a trait."

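Chatterjee's point about linkage can be made concrete with a small sketch. The allele counts below are hypothetical, and the squared correlation between genotypes is used as a simple proxy for linkage. When two variants always travel together, their squared correlation is 1 and no statistic can tell them apart; when linkage has broken down, the value drops and the variants can be tested separately.

```python
from math import sqrt

def r_squared(x, y):
    """Squared Pearson correlation between allele counts at two variants."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return (cov / sqrt(var_x * var_y)) ** 2

# Hypothetical allele counts at one variant for eight people.
variant_1 = [0, 1, 2, 0, 1, 2, 1, 0]

# Tight linkage: the second variant is always inherited with the first.
variant_2_tight = [0, 1, 2, 0, 1, 2, 1, 0]

# Linkage broken down over many generations: the two often differ.
variant_2_loose = [0, 1, 1, 1, 0, 2, 2, 0]

ld_tight = r_squared(variant_1, variant_2_tight)
ld_loose = r_squared(variant_1, variant_2_loose)
print(round(ld_tight, 2), round(ld_loose, 2))
```
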
Looking forward, Chatterjee is excited by the potential impact of these models. He hopes that a multi-omics approach in a multi-ancestry study will unlock a more comprehensive understanding of the genetic basis of complex disease and how that genetic basis arises. Next steps may include developing and improving statistical and machine learning models that combine data from populations of multiple ancestries, incorporating data from other types of omics studies, and extending the analysis to rare variants.

The authors emphasize that the study would not have been possible without the strong partnerships and collaborations across Johns Hopkins and beyond, including the sophisticated data analysis led by Department of Biostatistics PhD student Jingning Zhang and postdoctoral fellow Diptavo Dutta.

Given the collaborative nature of the undertaking, the team considered it important to share the resources and models they developed, and they have made the models available online.

"Anyone can download these models for use in their own study to test for the effect of proteins on whichever traits they are investigating," Chatterjee explains. "Our work has already generated ideas for many follow-up studies using proteomic data, and it has been exciting to see that, in fact, people have already started using the models in their own protein association studies."