Researchers have created a formal representation of a genome, making it much easier to search for genetic information about the genome or perhaps even design organisms from scratch.
Imagine designing an organism from scratch.
This requires knowing how each building block of the genome affects the final functioning of the organism, which in turn requires far more insight into the functions of all these building blocks than we have today.
However, this may soon change, now that researchers have created a mathematical representation of a genome for the first time. It allows the information in the genome to be read in the same way as the information on a computer’s hard drive.
According to one of the researchers behind it, the new mathematical tool, which the researchers call the Bitome, could be the first step towards a new research field: genome engineering.
“Within a decade, we will be able to design genomes from scratch by deciding what information all the building blocks of the genome must possess. But similar to chemistry and physics, this scientific field must have its own type of mathematics, and the Bitome can be part of this fundamental mathematics,” explains Bernhard Palsson, CEO, Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby.
The Bitome has been published in Nucleic Acids Research.
Each DNA building block can encode several functions
The researchers have created a conceptually simple but enormously demanding overview of all the properties of the 4.5 million DNA building blocks (nucleotides) in the genome of an Escherichia coli bacterium.
Each of these nucleotides may have different functional properties.
They may encode part of a protein structure, such as an alpha helix, and may be part of codons, each of which encodes an amino acid in a protein. They can also lie in parts of the genome that help translate genetic information into proteins, or they may be involved in opening up the genome when it is replicated.
Most nucleotides can even have more than one function.
The researchers identified 1,500 potential properties for each nucleotide.
Creating a spreadsheet with all the information in the genome
The researchers then created a massive spreadsheet, in which each row represented one of the 4.5 million building blocks in Escherichia coli, and the columns represented each of the 1,500 properties.
Then they noted for each property whether the specific location on the genome possessed this property or not: 1 for yes and 0 for no.
Does this nucleotide encode an alpha helix? Yes or no? 1 or 0?
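The spreadsheet is essentially a binary matrix. As a minimal sketch (not the researchers’ actual pipeline), the Python below builds a toy version from a hypothetical set of annotated features, each given as half-open (start, end) intervals on a 16-nucleotide “genome”. The feature names, coordinates and the `build_bitome` helper are illustrative assumptions, not real E. coli annotations.

```python
# Minimal sketch of a Bitome-style binary matrix.
# Hypothetical toy data: feature names and coordinates are illustrative,
# not real E. coli annotations.

def build_bitome(genome_length, features):
    """Return (columns, matrix) where matrix[pos][col] is 1 if the
    nucleotide at 0-based position `pos` has property columns[col], else 0.

    `features` maps each property name to a list of half-open
    (start, end) intervals on the genome.
    """
    columns = sorted(features)  # fixed, reproducible column order
    matrix = [[0] * len(columns) for _ in range(genome_length)]
    for col, name in enumerate(columns):
        for start, end in features[name]:
            for pos in range(start, end):
                matrix[pos][col] = 1  # "yes": this position has the property
    return columns, matrix


toy_features = {
    "promoter": [(0, 2)],
    "gene": [(2, 14)],
    "alpha_helix": [(5, 11)],
    "origin_of_replication": [(15, 16)],
}

columns, bitome = build_bitome(16, toy_features)
print(columns)    # ['alpha_helix', 'gene', 'origin_of_replication', 'promoter']
print(bitome[6])  # [1, 1, 0, 0]: inside both the gene and the alpha helix
```

A real Bitome has millions of rows and about 1,500 columns, so a compact representation such as a bit array or a sparse matrix would be more practical than nested lists.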
“This is like merging information technology and genomics, so that we can access the information in the DNA in the same way that we would access the information on a hard drive, simply by determining whether a given location on the genome encodes a given property. The Bitome contains all the bits of information present in every single location on the genome,” says Bernhard Palsson.
Making searching for information in the genome easy
Bernhard Palsson explains that the mathematical representation of a genome makes searching for information much easier.
For example, you can ask: “How much information is stored in an individual nucleotide at a given location in the genome?”
Then you simply find the row for that location in the spreadsheet and count the number of ones, each of which represents a function.
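Counting the ones in a row is simply a row sum. The sketch below assumes a small hypothetical Bitome stored as a list of rows of 0/1 values; `functions_at` is an illustrative helper name, not part of any published tool.

```python
# Hypothetical toy Bitome: 6 nucleotide positions (rows) x 4 properties (columns).
bitome = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]


def functions_at(bitome, position):
    """Number of properties (ones) recorded for the nucleotide at `position`."""
    return sum(bitome[position])


# Information per nucleotide for every position, and the genome-wide average.
counts = [functions_at(bitome, pos) for pos in range(len(bitome))]
mean_count = sum(counts) / len(counts)

print(counts)                # [2, 3, 3, 0, 1, 4]
print(round(mean_count, 2))  # 2.17
```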
When the researchers do this, they find that the average nucleotide has 15–20 functions, and some even have more than 30 functions.
Elsewhere in the genome, large regions of nucleotides encode only about five functions.
“Some places have large information deserts, whereas other places are very information-dense. The complete matrix represents the total information that is in the genome, and you can search the information in various ways by asking different questions,” explains Bernhard Palsson.
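One way to locate information deserts and information-dense regions is to average the per-nucleotide counts over a sliding window. This is a sketch under assumptions: the counts, the window size and the desert/dense thresholds are hypothetical values chosen for illustration, not figures from the study.

```python
def windowed_density(counts, window):
    """Mean number of functions per nucleotide for every run of `window`
    consecutive positions (one value per window start)."""
    if not 1 <= window <= len(counts):
        raise ValueError("window must be between 1 and len(counts)")
    running = sum(counts[:window])
    densities = [running / window]
    for start in range(1, len(counts) - window + 1):
        # Slide one position right: add the new right edge, drop the old left edge.
        running += counts[start + window - 1] - counts[start - 1]
        densities.append(running / window)
    return densities


# Hypothetical per-nucleotide function counts (row sums of a Bitome).
counts = [18, 20, 16, 5, 4, 6, 19, 30]
densities = windowed_density(counts, window=2)

DESERT_BELOW, DENSE_ABOVE = 8, 20  # illustrative thresholds only
deserts = [i for i, d in enumerate(densities) if d < DESERT_BELOW]
dense = [i for i, d in enumerate(densities) if d > DENSE_ABOVE]

print(densities)  # [19.0, 18.0, 10.5, 4.5, 5.0, 12.5, 24.5]
print(deserts)    # [3, 4]  (window start positions)
print(dense)      # [6]
```

Keeping a running sum makes each step constant-time, which matters when scanning millions of positions.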
Much of the genome encodes genes or alpha helices
Another way to search the information is to examine how many locations in the genome encode a given function.
For example, one question could be: “How many locations on the genome encode a function that is relevant to the formation of an alpha helix?”
To answer this question, the researchers can search the matrix and count how many rows have a 1 in the alpha helix column.
“You can add up all the ones in a specific column and find that 25% of the genome encodes an alpha helix. The same approach shows that 87–88% of the genome lies within a gene. Conversely, we find that only one site on the genome encodes the origin of DNA replication. It is the only site on the genome at which we find this property, and the only location where there is a 1,” says Bernhard Palsson.
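Asking how often a property occurs amounts to summing a column. Using a small hypothetical Bitome with made-up columns, the sketch below computes each property’s share of the genome and lists properties found at exactly one position, as with the origin of replication in the quote.

```python
# Hypothetical toy Bitome: 8 positions x 3 named properties.
columns = ["gene", "alpha_helix", "origin_of_replication"]
bitome = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]

# zip(*bitome) turns rows into columns, so each sum counts positions with that property.
totals = {name: sum(col) for name, col in zip(columns, zip(*bitome))}
fractions = {name: total / len(bitome) for name, total in totals.items()}

# Properties present at exactly one position, mapped to that position.
single_sites = {
    name: [pos for pos, row in enumerate(bitome) if row[col] == 1][0]
    for col, name in enumerate(columns)
    if totals[name] == 1
}

print(fractions)     # {'gene': 0.75, 'alpha_helix': 0.375, 'origin_of_replication': 0.125}
print(single_sites)  # {'origin_of_replication': 4}
```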
Identifying functional differences between bacteria used in biotechnology
The new mathematical tool for extracting information from the genome can be used in various contexts in which researchers want to learn more about organisms or their differences.
For example, the bacteria used in biotechnology to produce medicines or to study infections come in many variants, with different properties that are hidden in their genomes.
Comparing the Bitomes of these bacteria gives researchers a better understanding of how they differ and how these differences relate to their metabolism or pathogenicity.
“Differences in the Bitome between organisms can be related to differences in the properties of the organisms. In biotechnology, differences in metabolism between bacteria are often the defining property, and comparing the bacteria’s Bitomes shows that the information across the vast majority of the 1,000 metabolism genes is identical to the information located in the genome, despite large differences in individual metabolism genes. This is the source of design variation,” explains Bernhard Palsson.
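A very simplified comparison is sketched below. It assumes, unrealistically, that the two strains’ Bitomes already share the same coordinates (same positions, same columns), which in practice would require a genome alignment; the strain data, column names and the `bitome_differences` helper are hypothetical. XOR-ing corresponding bits marks every position where one strain has a property and the other does not.

```python
def bitome_differences(a, b):
    """Compare two equally shaped Bitomes (same positions, same columns).

    Returns (per_column, positions): the number of differing bits in each
    column, and the positions whose rows differ anywhere.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("Bitomes must have the same shape")
    n_cols = len(a[0]) if a else 0
    per_column = [0] * n_cols
    positions = []
    for pos, (ra, rb) in enumerate(zip(a, b)):
        xor_row = [x ^ y for x, y in zip(ra, rb)]  # 1 where the strains disagree
        if any(xor_row):
            positions.append(pos)
        for col, bit in enumerate(xor_row):
            per_column[col] += bit
    return per_column, positions


# Hypothetical aligned Bitomes for two strains; columns: gene, alpha_helix, promoter.
strain_a = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1]]
strain_b = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0]]

per_column, positions = bitome_differences(strain_a, strain_b)
print(per_column)  # [0, 1, 1]
print(positions)   # [1, 3]
```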
Computers could design bacteria from scratch
Insight into the differences between variants of the same bacterium could also allow researchers to calculate how to design a variant with exactly the properties they want.
Bernhard Palsson envisions that in the future you will be able to tell a computer that you want an Escherichia coli strain that produces insulin as efficiently as possible, and the computer will then provide the complete 4.5-million-nucleotide DNA sequence from which this Escherichia coli is to be constructed.
Then the nucleotides would simply need to be assembled to build the final bacterium.
“This is a whole new research field, and the mathematics opened up by the Bitome makes this field possible,” says Bernhard Palsson.
Bernhard Palsson believes that, within 10 years, all major universities will have a department of genome engineering.
“The Bitome: digitized genomic features reveal fundamental genome organization” has been published in Nucleic Acids Research. Co-author Bernhard Palsson is the CEO of the Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby.