AI can make gene editing safer for treatment

Tech Science, 8 Jan 2026, 10 min read. Professor Yonglun Luo; Professor and bioinformatician Jan Gorodkin. Written by Sybille Hildebrandt.

Researchers have developed an artificial intelligence (AI) model that can predict how safely and precisely gene defects can be corrected by using a gentler form of CRISPR gene editing before researchers even enter the laboratory. The model helps scientists rule out risky designs and select the changes that have the greatest chance of being suitable for treating patients.

Emil gets out of breath after just a few steps on a staircase, and already on the first landing his legs burn as if he had been running a long distance. On the surface, he looks like a perfectly ordinary 25-year-old. But he has an inherited blood disorder in which the body breaks down red blood cells far too early, long before they have done their job. Doctors can pinpoint the cause precisely: deep within his genetic code sits a single incorrect letter in his DNA – a small typo that makes the rest of the system falter.

On paper, the countermeasure seems obvious. CRISPR gene editing technology can in principle correct such errors and replace the faulty letter so that the blood cells become robust again. Yet doctors hold back. Every time they consider a correction, they must assume that the intervention may not only fix the error but also leave unwanted changes right next to the repair site, and that these changes could cancel out the intended effect.

This is exactly the challenge that a new study published in Nature Communications sets out to tackle. A team of researchers from the University of Copenhagen and Aarhus University has developed an AI model that can test a CRISPR strategy on a computer, enabling doctors to pinpoint where in the DNA the safest and most effective editing can be achieved for patients. The model evaluates both how well a correction works and how high the risk is of introducing additional changes in the DNA surrounding the target.

The team is led by Jan Gorodkin, a professor and bioinformatician at the Centre for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences at the University of Copenhagen, and Yonglun Luo, a professor and researcher in genome editing at the Department of Biomedicine, Aarhus University, and Steno Diabetes Center Aarhus, Aarhus University Hospital.

Cell experiments and computation strengthen each other

According to the researchers, the new method provides the most precise overview to date of where in the DNA a planned correction is both effective and carries a low risk of additional, unwanted changes. In other words, the model shows where the gene defect can most likely be hit cleanly – and where the warning lights should start flashing.

“We can see that our model performs significantly better than the tools that existed previously when we test it on independent datasets. It predicts both the effectiveness and the pattern of DNA changes more accurately than the other methods,” says Jan Gorodkin.

Yonglun Luo emphasises that reaching this point required two very different sets of expertise. He describes his own group as the one that, year after year, builds large, controlled datasets in cells, while Jan Gorodkin’s group develops the algorithms capable of extracting patterns from them.

“We have worked for many years to obtain high-quality data from real cell experiments, and Jan’s group has developed the AI to read them. Our study shows how far you can get when you let the laboratory and the computational analyses work closely together. Neither could have solved the task alone,” says Yonglun Luo.

The genetic scissors that find the right error

Understanding why it is so difficult to correct a single small error without creating new ones requires a look at how CRISPR operates inside the cell. In its most basic form, CRISPR can be understood as two small tools working together.

One component is the genetic scissors themselves – a protein such as Cas9 – which can cut DNA or make more careful modifications. The other is a short RNA molecule that acts as a guide. It is written with a specific sequence of letters that mirrors the sequence in the piece of DNA one wants to target.

Inside the cell, the scissors and the guide drift around as a pair, constantly scanning the DNA. The moment the pair encounters a site where the letters appear in the same order as in the guide, followed by a few additional letters known as the protospacer adjacent motif (PAM), which anchors the scissors, it latches on, and the scissors are allowed to work precisely there. In technical terms, researchers call the guide a guide RNA – the cell’s equivalent of a search address.
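
For readers who want to see the search logic spelled out, the following is a minimal Python sketch of it. It rests on several assumptions that are not from the study: the PAM is taken to be "NGG" (any letter followed by two Gs), which is the motif recognised by the widely used Cas9 from Streptococcus pyogenes; only one DNA strand is scanned; the guide is written in DNA letters; and the function name and example sequences are invented for illustration.

```python
import re


def find_target_sites(dna: str, guide: str, pam_pattern: str = "[ACGT]GG") -> list[int]:
    """Return 0-based start positions where `guide` is directly followed by a PAM.

    Assumption: an NGG PAM and a search on the given strand only.
    """
    dna, guide = dna.upper(), guide.upper()
    # A lookahead is used so that overlapping candidate sites are not skipped.
    pattern = re.compile(f"(?={re.escape(guide)}{pam_pattern})")
    return [match.start() for match in pattern.finditer(dna)]


# Hypothetical 20-letter guide and a toy stretch of DNA containing it twice:
# once followed by a valid PAM ("AGG") and once followed by "TTA".
guide = "GACCTGAAGCTAGCTAGTCA"
dna = "TT" + guide + "AGG" + "TCC" + guide + "TTA"

sites = find_target_sites(dna, guide)
print(sites)  # [2]: only the copy followed by a PAM counts as a target
```

The second copy of the sequence is ignored even though its letters match the guide perfectly, which illustrates why the PAM acts as an anchor rather than as an optional extra.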

A more gentle form of gene editing builds on precisely this combination. Instead of the rough cuts made by classic CRISPR tools, which break both DNA strands and rely on the cell’s own repair mechanisms, this approach corrects individual DNA bases by acting primarily on a single DNA strand.

The researchers did not invent base editing. Their contribution is that they have used the existing molecular machinery for base editing to build large, uniform datasets from cell experiments – and used them to train an AI model that can predict both how effective a planned correction is and how high the relative risk is of unwanted changes in the DNA around the target.

The classic CRISPR cutter is useful when the goal is to knock out a gene, but it makes precise letter changes uncertain, because the cell must carry out a complex repair process that does not always produce the intended result.

Base editing was developed as a gentler alternative. Whereas classic CRISPR creates breaks in both DNA strands, base editing works by modifying a single strand and changing one DNA letter directly. This reduces the risk of unintended effects that can arise when the cell has to repair a double-strand break and makes it possible to correct a single genetic “typo” in a more controlled way.

Correcting one genetic typo at a time

Rather than cutting both DNA strands, researchers can now change a single letter in the genome – a far gentler method than previous approaches. To do this, they use a CRISPR tool with a special enzyme attached to the scissors.

First, the enzyme targets a specific letter on one DNA strand and chemically converts it into another, while the opposite strand remains intact. The tool then makes a small cut in the opposite strand. When the cell repairs that cut, it uses the edited strand as a template and copies the complementary partner of the new letter onto the other side. In the end, both DNA strands carry the desired corrected change.

You can compare this to a proofreader who first corrects a single character on one side of a folded piece of paper and then makes the copy follow suit. Scientists call this gentle method base editing, and the tools themselves are known as base editors. Some can, for example, change A to G, and others change C to T.

Even very precise changes in DNA can have unintended consequences if CRISPR also affects letters close to the site being corrected.

And although the method is gentler, it is not without risk. When correcting a single letter, CRISPR base editors can also affect neighbouring letters and create small side changes in the area – so-called bystander changes – which in some cases can lead to unwanted mutations with potential biological significance.

For that reason, the AI model is designed to systematically predict all possible changes in the surrounding DNA and assign a score to each outcome, allowing researchers to judge whether a planned edit is worth pursuing before entering the laboratory.
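
The idea of "all possible changes" can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the researchers' code: the window sequence, the function name and the choice of an A-to-G editor are hypothetical, and each editable letter is simply treated as either changed or unchanged.

```python
from itertools import product


def enumerate_outcomes(window: str, target_index: int,
                       editable: str = "A", edited: str = "G"):
    """List every combination of edited/unedited editable letters in `window`.

    Returns (sequence, target_edited, n_bystander_edits) tuples.
    """
    positions = [i for i, base in enumerate(window) if base == editable]
    if target_index not in positions:
        raise ValueError("target_index must point at an editable letter")
    outcomes = []
    # Each editable position is either left alone (False) or converted (True).
    for choice in product([False, True], repeat=len(positions)):
        seq = list(window)
        for pos, is_edited in zip(positions, choice):
            if is_edited:
                seq[pos] = edited
        target_edited = choice[positions.index(target_index)]
        n_bystanders = sum(choice) - int(target_edited)
        outcomes.append(("".join(seq), target_edited, n_bystanders))
    return outcomes


# Hypothetical window with three A's; the A at index 4 is the intended target.
window = "CATGACTA"
outcomes = enumerate_outcomes(window, target_index=4)

# The "clean" outcome: target corrected, no bystander changes.
clean = [o for o in outcomes if o[1] and o[2] == 0]
print(len(outcomes), clean)
```

With three editable letters there are already eight possible outcomes, and only one of them is the clean correction. A predictive model, as described in the article, would attach a score to each of these outcomes.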

“We have ensured that the model predicts all combinations of possible changes near the intended edit and assigns a score to each of them. This allows researchers to assess whether the subsequent laboratory work is meaningful,” says Jan Gorodkin.

“We are precisely concerned about these small extra changes around the target in treating patients. That is why we need tools that can both correct the gene defect and help us steer clear of the riskiest designs,” says Yonglun Luo.

Without this kind of sorting, precise gene editing is still too uncertain to be used widely.

This is especially true when the technology is intended for treating patients. That selection is exactly the task of the AI model: to separate the safe corrections from the risky ones before they are tested in cells – and long before they ever reach patients.

Inconsistent data make reliable predictions difficult

Developing gentle genetic scissors and an AI model to control them is only half the job. The other half concerns the foundation on which the model stands: the data it learns from.

This is where Jan Gorodkin and Yonglun Luo identified a gap in the toolbox. They wanted to build a large, uniform dataset for gentle gene editing, in which thousands of combinations are measured in the same way – while developing an AI model that can learn from the many existing datasets around the world without simply blending them into an average that blurs the differences.

“If you want a model that also works outside your own small experimental world, you need to give it a much broader and more solid foundation. That requires data that have been measured systematically and can interact with the datasets others have created,” says Jan Gorodkin.

The background is that the many base-editing experiments carried out worldwide recently have been performed using different cell types, tool variants and experimental designs. This makes it easy for a model to learn patterns that fit one particular experimental set-up but do not necessarily hold true in another.

In addition, this means that the model’s predictions are strongest within the types of gene-editing tools and cell models on which it has been trained. For that reason, it had to be fed with a dataset that was collected and measured systematically from scratch – and that became the starting-point for the series of experiments undertaken by Yonglun Luo’s group.

A shared library of thousands of gene experiments

The work began with Yonglun Luo’s research group building an entire library of small DNA fragments. Each fragment corresponds to a specific location in the human genome and has a guide attached to it.

The library was packaged into harmless viral shells and introduced into a well-known human cell line. Further, the cells were given one of two base editors: ABE7.10, which can change the letter A to G, or BE4, which can change C to T.

After a period of time, the researchers harvested the DNA from the cells and read it letter by letter. For each guide, they could observe two things: how large a fraction of the DNA copies had been altered, and which letters in the window around the guide had been swapped.
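
The two measurements can be expressed as a short calculation. The Python sketch below is a simplified illustration, assuming each sequencing read has already been trimmed to the window around the guide; the function name and the toy reads are invented, and real analysis pipelines also have to handle alignment and sequencing errors, which are ignored here.

```python
from collections import Counter


def summarize_reads(reference: str, reads: list[str]):
    """Return (edited_fraction, outcome_frequencies) for one guide.

    `reference` is the unedited window; every read that differs from it
    is counted as edited (a simplification that ignores sequencing errors).
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads for this guide")
    edited_fraction = (total - counts.get(reference, 0)) / total
    outcome_frequencies = {seq: n / total for seq, n in counts.most_common()}
    return edited_fraction, outcome_frequencies


# Toy data: 10 reads from a hypothetical window.
reference = "CATGACTA"
reads = ["CATGACTA"] * 6 + ["CATGGCTA"] * 3 + ["CGTGGCTA"]

edited_fraction, outcome_frequencies = summarize_reads(reference, reads)
print(edited_fraction)      # 0.4
print(outcome_frequencies)
```

Here four of the ten DNA copies were altered. Three carry only the intended change, and one also carries an extra change to a neighbouring letter.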

“This approach is special because we measure thousands of different combinations in the same cell model and under the same conditions. That gives us a dataset in which we can compare the results directly and really see the patterns in how base editors behave,” says Yonglun Luo.

In the scientific article, the researchers applied their previously established lentiviral gRNA-target pair library technology (SURRO-seq) – a systematic catalogue of how thousands of gene corrections actually behave in cells.

In practice, it functions as a large reference work. For each combination of guide and base editor, it shows how powerful the tool is and how clean the correction will be. In this project, the group ended up with reliable measurements for more than 11,000 guides across the two base editors.

The AI learns from every little change

Jan Gorodkin’s group used this dataset as its starting-point.

The new measurements were combined with several other published datasets on base editing from international research groups. In the end, the models were trained on data from more than 40,000 different guides.

Based on information about the DNA sequence around the target, the design of the guide and the experimental conditions, the AI had to learn to predict two things: how large a fraction of the DNA copies would be corrected and how the various possible combinations of letter changes would be distributed within the window around the guide.

The integrated deep-learning models are called CRISPRon-ABE and CRISPRon-CBE, after the two types of base editors that convert A to G and C to T, respectively. The models take a sequence and a plan as input and produce an estimate of both the strength and the purity of the correction.

“We give the model the DNA sequence itself, a measure of the activity of the base changes and a fingerprint showing which dataset the experiment comes from. This enables it to better distinguish between general biological patterns and effects that are due to technical differences between experiments,” says Jan Gorodkin.

The dataset label turned out to be crucial. It helps the model to distinguish between differences that are biological in nature – such as specific DNA patterns – and differences that arise from the experimental set-up itself. Without these labels, predictive accuracy drops markedly.
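
How such inputs might be combined into a single row of numbers is sketched below in Python. This is a generic illustration and not the actual input encoding of CRISPRon-ABE or CRISPRon-CBE, which the article does not describe in detail. The dataset names, the activity value and the function names are hypothetical.

```python
BASES = "ACGT"


def one_hot_sequence(seq: str) -> list[float]:
    """Encode each DNA letter as four numbers, one of which is 1.0.

    Letters outside A, C, G, T become four zeros.
    """
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def encode_example(seq: str, activity_score: float,
                   dataset: str, known_datasets: list[str]) -> list[float]:
    """Concatenate sequence, activity score and a dataset "fingerprint"."""
    if dataset not in known_datasets:
        raise ValueError(f"unknown dataset label: {dataset}")
    # The dataset label is itself one-hot encoded, so the model can learn
    # a separate offset for each experimental set-up.
    label = [1.0 if dataset == d else 0.0 for d in known_datasets]
    return one_hot_sequence(seq) + [float(activity_score)] + label


# Hypothetical example: a 4-letter sequence measured in "lab_b".
features = encode_example("ACGT", 0.7, "lab_b", ["lab_a", "lab_b"])
print(len(features))  # 19 = 4 letters x 4 + 1 score + 2 dataset flags
```

Because the dataset flags are kept separate from the sequence features, patterns tied to a particular experimental set-up do not have to be explained by the DNA letters themselves.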

New models are more accurate than the old ones

To test how well the models perform, the researchers evaluated them on datasets the AI had not seen during training and let them compete against existing specialised tools such as DeepABE, DeepCBE, BE-HIVE and BE-DICT.

CRISPRon-ABE and CRISPRon-CBE outperformed the existing tools on both parameters that matter in practice: they predicted both how strong a correction would be and how the possible DNA outcomes would be distributed within the window around the guide. Even when the two targets were combined into a single assessment, the new AI models scored higher than their competitors.

A follow-up analysis showed that the Cas9 activity score plays a significant role in the model’s calculations. This indicates that how efficiently the classic CRISPR scissors act at a site in the genome is closely linked with how effectively base editors work at the same location.

These are all technical details. But for the researchers, the key point is that they can now translate these patterns into something that can be used directly when planning new experiments.

Fewer blind shots in the laboratory

The results can already be put to use. The models are freely available to researchers, both via a website and as downloadable software.

A researcher can take a specific gene defect, enter the DNA sequence, select the relevant type of base editor and receive a list of possible guides. In addition, the model estimates which guides are most effective according to its calculations and which carry a high risk of causing many bystander changes.
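
The kind of sorting described here can be sketched in a few lines of Python. The sketch assumes the predictions are already in hand. The field names, the numbers, the threshold and the ranking rule are all hypothetical choices made for illustration; they are not the output format or the scoring scheme of the CRISPRon tools.

```python
from dataclasses import dataclass


@dataclass
class GuidePrediction:
    guide: str
    efficiency: float      # predicted fraction of DNA copies that get edited
    clean_fraction: float  # predicted share of edited copies without bystander changes


def rank_guides(predictions: list[GuidePrediction],
                min_clean: float = 0.5) -> list[GuidePrediction]:
    """Drop guides with too many predicted bystander changes, then rank the rest.

    The ranking score (efficiency x clean_fraction) is an assumed heuristic:
    it estimates the fraction of all copies that end up cleanly corrected.
    """
    safe = [p for p in predictions if p.clean_fraction >= min_clean]
    return sorted(safe, key=lambda p: p.efficiency * p.clean_fraction, reverse=True)


# Hypothetical predictions for three candidate guides.
candidates = [
    GuidePrediction("guide_1", efficiency=0.8, clean_fraction=0.3),
    GuidePrediction("guide_2", efficiency=0.6, clean_fraction=0.9),
    GuidePrediction("guide_3", efficiency=0.7, clean_fraction=0.6),
]

ranked = rank_guides(candidates)
print([p.guide for p in ranked])  # ['guide_2', 'guide_3']
```

In this toy example the most efficient guide is discarded because most of its predicted edits come with bystander changes, while a slightly weaker but cleaner guide ends up at the top of the list.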

“From the outset, we wanted this to be a practical tool that others could download and use directly in their own projects. The idea is that they can more quickly find the safest and most promising solutions instead of wasting energy on poor choices,” says Yonglun Luo.

In the laboratory, the tool can mean fewer blind shots. Instead of testing 10 guides, researchers can start with the few that the model identifies as most promising, saving time, money and experimental effort.

In the longer term, the tool may also influence how clinical trials are planned. Several research groups and companies are already working with base editing to treat blood diseases and other diseases. A clearer overview of the riskiest and the most promising guides could spare both patients and researchers many detours.

A strong foundation but not the final word

The study covers specific, well-established types of base editors and is largely based on systematic experiments in a single human cell line.

The next step will be to build similar datasets for more variants of the tools and for cell types that are closer to those used by doctors in treatment – for example, blood stem cells. Further, this approach to combining datasets can be transferred to other forms of gene editing, such as prime editing, in which the goal is to edit several bases at once.

For Emil, the study will not change his everyday life tomorrow. He will still get out of breath on the stairs. But it does change the fundamental conditions for future gene therapies: it will increasingly be possible to calculate safety before intervening in human genetic material. When dangerous and imprecise corrections are weeded out on a screen instead of in a cell or a laboratory animal, the chances increase that the corrections eventually used in treatment will both hit the genetic defect and leave the rest of the DNA untouched.

Yonglun Luo is a professor of biomedical research at Aarhus University, specialising in regenerative medicine, gene therapy, and RNA-based treatments....

Jan Gorodkin is Professor of Bioinformatics at the University of Copenhagen, where he conducts research in computational biology with a particular foc...

© All rights reserved, Sciencenews 2020