
Pioneering research finds missing pieces in the genomic puzzle

Today, we can get our genome sequenced for less than DKK 4000 and find out how small changes in it might affect our risk of various diseases. Computer programs that compare genomes have focused primarily on these small changes, while major changes in the genome have often been overlooked. Now Danish researchers have developed a new algorithm that finds these often-overlooked pieces of the enormous genomic puzzle. The new method is expected to have important applications in the personalized medicine of the future.

Determining the sequence of a person’s genome is similar to solving a jigsaw puzzle. The current technology cannot actually decode the entire genome in one piece. Instead, it produces a gigantic puzzle comprising billions of small pieces, and advanced algorithms must assemble them before the genetic profile can be decoded.

“Analysing genome sequencing data requires laying each individual piece on top of a set of known pieces, called the reference genome. New pieces, such as genomic insertions, are therefore easily overlooked because placing them correctly on the reference genome is difficult. We have developed a new computer algorithm that creates this genomic reference in 3D. This offers greater opportunities for discovering the complex and often overlooked genomic changes and thus provides a clearer image of the genomic landscape,” explains one of the main authors, Jonas Andreas Sibbesen of the Section for Computational and RNA Biology, Department of Biology, University of Copenhagen.

Hard to process the extra pieces

Genome sequencing has become affordable for almost anyone. For a few thousand Danish kroner, people can have their entire genome sequenced and thus obtain information on variants in their genome and how these might affect their risk of developing various diseases such as cancer and metabolic diseases.

“Providing these answers requires advanced computer algorithms that can assemble the genomes and compare them with a standard genome. Paradoxically, the algorithms used so far have primarily discovered the smaller genetic variants in the genome, but the major variants such as genomic insertions have remained a blind spot for researchers.”

One approach to assembling the genomic puzzle involves placing the pieces from the start without knowing the picture portrayed in the puzzle. With billions of pieces, this task is incredibly time-consuming and laborious. This is why the assembly method is seldom used. Mapping is therefore often the preferred method; here the tiny pieces are instead embedded onto a reference genome – a known puzzle. This makes the analysis much easier. However, in areas in which the individual sequenced and reference genomes differ greatly, this technique can result in variants being overlooked.
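The blind spot described above can be illustrated with a toy example. The sketch below is not how production read mappers work and is not the researchers' method: it assumes a tiny made-up reference, a hypothetical `map_read` function, and a mapper that only tolerates a fixed number of mismatching letters. Under those assumptions, a read with a single-letter change still finds its place, while a read spanning an insertion that is absent from the reference finds none.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches=1):
    """Return the best position of `read` on `reference`, or None.

    Hypothetical toy mapper: slides the read along the reference and
    accepts the placement with the fewest mismatches, provided it stays
    within `max_mismatches`. Gaps are not modelled at all.
    """
    best_pos, best_dist = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        dist = hamming(read, reference[pos:pos + len(read)])
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos


# Made-up reference sequence (an assumption for illustration only).
reference = "ACGTTGCAAGGCTTACCGATAG"

# A read with one small change: reference[4:14] with a single substitution.
snv_read = "TGCAAGCCTT"

# A sequenced individual carrying a six-letter insertion after position 10.
sample = reference[:10] + "TTTTTT" + reference[10:]
insertion_read = sample[7:19]  # read that spans the inserted letters

print(map_read(snv_read, reference))        # small change: read is placed
print(map_read(insertion_read, reference))  # insertion: no placement found
```

Real mappers do allow gaps, but the same effect appears once the difference from the reference becomes large relative to the length of the read.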

“For example, we know that there are many variants in the HLA region, which encodes for genes that play key roles in our immune system. The pieces there can differ so greatly from the reference genome that embedding them is almost impossible, resulting in many variants in this region not being visible.”

The researchers’ new algorithm uses a new approach: instead of working with a randomly selected reference genome, genetic variants from many individuals can be used simultaneously.

“This trick provides much greater opportunities to use genetic variants known from previous studies in analysing new individuals, which increases the sensitivity for more complex forms of genetic variation. You could say that, instead of embedding the pieces in a single individual, we embed them in thousands of individuals simultaneously.”
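The idea of using known variants alongside the reference can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the published algorithm: the reference, the known insertion, the function names (`allele_kmers`, `genotype`) and the choice of short fixed-length words (k-mers) as evidence are all hypothetical. Each known variant gives two paths through a small graph, one with the reference allele and one with the alternative allele, and the reads are checked against both.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def allele_kmers(reference, pos, ref_allele, alt_allele, k):
    """K-mers that distinguish the two paths through a known variant site.

    `pos` is where the variant starts on the reference. The flanks are
    k-1 letters long so that every returned k-mer overlaps the site.
    """
    left = reference[max(0, pos - (k - 1)):pos]
    right_start = pos + len(ref_allele)
    right = reference[right_start:right_start + (k - 1)]
    ref_path = kmers(left + ref_allele + right, k)
    alt_path = kmers(left + alt_allele + right, k)
    return ref_path - alt_path, alt_path - ref_path


def genotype(reads, ref_only, alt_only, k):
    """Call a genotype from which allele-specific k-mers occur in the reads.

    Toy rule (an assumption): any shared k-mer counts as support. Real
    tools model coverage and sequencing errors statistically.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    has_ref = bool(read_kmers & ref_only)
    has_alt = bool(read_kmers & alt_only)
    if has_ref and has_alt:
        return "0/1"  # both alleles seen
    if has_alt:
        return "1/1"  # only the known insertion seen
    if has_ref:
        return "0/0"  # only the reference allele seen
    return "./."      # no evidence either way


# Made-up reference and one known insertion from "previous studies".
reference = "ACGTTGCAAGGCTTACCGATAG"
k = 5
ref_only, alt_only = allele_kmers(reference, 10, "", "TTTTTT", k)

insertion_read = "AAGTTTTTTGCT"  # read spanning the insertion
reference_read = "TGCAAGGCTT"    # read matching the reference

print(genotype([insertion_read, reference_read], ref_only, alt_only, k))
print(genotype([insertion_read], ref_only, alt_only, k))
print(genotype([reference_read], ref_only, alt_only, k))
```

Because the insertion is already part of the graph, the read that carries it counts as evidence for a known allele instead of being discarded as unplaceable.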

Revealing the dark patches

Genome sequencing data have already revolutionized the opportunities for researchers and doctors to investigate the human genome, and this trend will continue in the future. In Denmark, the GenomeDenmark project has mapped the Danish reference genome, and this work was the basis on which a research group from the Section for Computational and RNA Biology at the Department of Biology, University of Copenhagen, developed the new and pioneering algorithm.

“In the GenomeDenmark project, we used our algorithm to significantly enlarge the spectrum of genetic variants that can be identified from such data. This especially applied to the more complex variations such as large deletions and insertions in the genome, where we discovered many new and previously unseen variations.”

The ability to better visualize the previously dark patches on the genetic map is expected to have important applications in personalized medicine, in which charting an individual’s genetic profile will play a role in choosing treatment.

“As more and more countries launch these large-scale national genome projects, having algorithms that can give doctors a more complete genetic picture is increasingly essential. The goal is therefore to continually become better at discovering new variations in our genomes because this will probably help in providing more answers as to why we become ill and how we need to be treated.”

“Accurate genotyping across variant classes and lengths using variant graphs” has been published in Nature Genetics. Lasse Maretty and Anders Krogh from the Bioinformatics Centre, Department of Biology, University of Copenhagen are co-authors. The Novo Nordisk Foundation and Innovation Fund Denmark funded the project.

Jonas Andreas Sibbesen
Postdoc
Current methods for genotyping structural variation from high-throughput sequencing data are generally based on comparing the reads to a linear reference genome. However, this approach is biased towards the reference, since regions that differ markedly between the sequenced individual and the reference are harder to infer than regions that are more similar. Hence, predicting structural variants is generally much harder than predicting simpler single-nucleotide variants (SNVs). This problem can be mitigated by comparing the reads to a genome graph that contains not only the linear reference but also the millions of variants already known. The aim of our research is to develop a method that improves the discovery and genotyping of structural variation by reducing reference bias using genome graphs.