What are your chances of getting coronary heart disease? What cancer treatment will you respond best to? The answers likely lie in your DNA. But it’s not your DNA scientists base their studies on. Instead, they look at a “reference” genome—one assembled from bits and pieces of genetic material from a few people of mostly European and African ancestry.
Now, researchers have released the first “pangenome,” representing individuals with ancestry from across the globe. The work could improve genetic testing for various diseases and even provide new insights into human evolution and biology.
“It’s an exceptional advance,” says Mashaal Sohail, an evolutionary geneticist at the National Autonomous University of Mexico who was not involved in the project. “It’s making the picture of human genetic variation more accurate and more complete.”
When the first human genome was published in 2001, it wasn’t quite finished. It was missing about 8% of its genetic alphabet, which was hard to read with the sequencing technology of the time. Scientists have been adding to this “draft genome” ever since, with the last update, known as GRCh38, released in 2017. Last year, researchers published the most complete human genome to date, one that represents virtually 100% of the total human sequence.
But this complete reference genome, known as T2T-CHM13, still doesn’t reflect the genetic diversity of our species. It doesn’t include the many versions of the same gene, or alleles, that might be present in some populations but not others, for example. It’s also missing so-called structural variants, large chunks of DNA that could explain why each one of us is different.
What’s more, because both GRCh38 and T2T-CHM13 are built from individuals of mostly European ancestry, medical tools that use them as a reference might not work for patients of non-European descent. Biological markers that help predict certain kinds of cancer might be more accurate in people from particular parts of the globe, for example, and a genetic marker that helps gauge a person’s risk of coronary heart disease may vastly underestimate the risk in Black people.
“We’re missing quite a bit of information that can contribute to our knowledge of health disparities and health inequities,” says Krystal Tsosie, a Diné genetic epidemiologist at Arizona State University.
To fill those gaps, Benedict Paten, a computational genomicist at the University of California, Santa Cruz (UCSC), and his colleagues at the Human Pangenome Reference Consortium (HPRC) incorporated genomes collected from 47 individuals and their parents, with the whole group representing every continent except Antarctica. They analyzed the genome of each individual in detail, parsing out which portions came from each parent. To resolve each genome at high quality, the researchers sequenced long reads of DNA, allowing them to capture more variation than previous research efforts. That added 119 million more base pairs, the building blocks of DNA, to the previously known 3.2 billion in GRCh38. The team also found 1115 new gene duplications, which play a role in evolution, the scientists report today in a series of papers published in Nature.
The new pangenome adds structural variants that were previously hard to sequence and analyze, says Sarah Tishkoff, a human geneticist at the University of Pennsylvania who was not involved in the new study. “It’s a very important resource.”
For Heidi Rehm, a geneticist at Massachusetts General Hospital, the new pangenome could also be a major advance for rare genetic diseases. These conditions are hard to study because mutations that cause them might not show up in GRCh38. The pangenome, she says, might be a better tool to identify such genetic mutations and diagnose patients. “That’s significant.”
The new pangenome is not only medically relevant; it will also open the door to more accurate evolutionary genetic studies, Sohail says. With more people now represented, researchers could fill gaps in our evolutionary history, especially in historically understudied parts of the globe.
The HPRC team wants to add more genomes in the future. It has already added 123 more, Paten says, and it hopes to reach its goal of 350 by next year.
So far, most of the institutions that take part in HPRC are based in the United States and Europe. Karen Miga, a geneticist at UCSC who is part of HPRC, says the next phase of the project is to make it a truly international effort and collaborate with institutions abroad.
The team now aims to sequence new genomes from other regions that have been historically underrepresented, such as the Middle East, which researchers have struggled to sample adequately.
Tsosie says it’s important that the team not just use a wide diversity of genomes, but also actively reach out to the communities it’s studying and understand their health care needs. This way, these communities can directly benefit medically from the project’s results. Thus far, Tsosie says, she hasn’t seen this approach with HPRC. Miga says her team is already in talks with researchers who represent such communities to figure out the best way to collaborate.
Tishkoff says that even adding hundreds of additional genomes to the new pangenome isn’t enough. There’s a lot more diversity out there, she says. “But this is a great start.”