Researchers have developed a health-focused artificial intelligence (AI) system called Delphi-2M, trained it on the medical histories of 400,000 people in the United Kingdom and tested it on nearly 2 million people in Denmark. The system can estimate a person’s risk of more than 1000 diseases and chart likely health paths up to 20 years ahead – a potential new tool for prevention, screening and smarter healthcare planning.
What if a computer could trace the likely course of your health – not by guessing, but by calculating?
Today, doctors can predict the risk of heart disease or diabetes, but only one illness at a time. The human body, however, does not work that way. One condition can raise or lower the risk of another condition, but medicine still treats each in isolation. This makes early diagnosis and screening a blunt tool – efficient for some but wasteful or even harmful for others.
Researchers in Germany and Denmark have now taken a step toward changing this. In a new study published in Nature, they built Delphi-2M, a large-scale health-focused AI system trained and tested on the medical records of more than 2 million people that can analyse how diseases emerge, interact and evolve – and estimate an individual’s risk across more than 1000 diseases up to 20 years into the future.
“Decision-making in healthcare relies on understanding people’s past and current health states to predict – and ultimately change – their future trajectory,” says Moritz Gerstung, Professor at the German Cancer Research Center in Heidelberg, who led the work.
From single diseases to health as a story
Across the world, people are living longer – but often have several illnesses at once. Heart disease, diabetes, cancer, depression and arthritis rarely appear in isolation. Doctors know that one diagnosis changes the risk of the next, but healthcare systems still tend to treat each disease separately.
“Understanding each individual’s multimorbidity risks is essential if we want to tailor healthcare decisions, motivate lifestyle changes or direct people into screening programmes,” says Gerstung.
The challenge, he explains, is that medical science still studies diseases one by one. “We had already developed a range of algorithms for predicting cancer – and then we thought, why stop there? Why not predict all the other diseases as well?”
But doing this with conventional tools would have meant building more than a thousand separate models.
“It would have meant training 1,000 different models, one for each diagnosis – and that just seemed unmanageable.”
From crude screening to a language of disease
For Gerstung, the goal was both simple and ambitious: to make sense of how diseases unfold across a lifetime – and use this insight to make screening far more precise.
“For cancer, there’s always this dual question: where does it come from, and how can we prevent it?” he says. “We know that this is probably the build-up of many small things over time. But for screening, the challenge is that you’re looking for a needle in a haystack. If the group you test has too few real cases, screening can actually do more harm than good.”
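The needle-in-a-haystack problem can be made concrete with a little arithmetic. The sketch below applies Bayes' rule to a hypothetical screening test; the 90% sensitivity, 95% specificity and the three prevalence values are illustrative assumptions, not figures from the study.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Share of positive test results that are real cases (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.90
SPECIFICITY = 0.95

for prevalence in (0.001, 0.01, 0.10):
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.1%}: {ppv:.1%} of positives are real cases")
```

With these made-up numbers, fewer than 2 in 100 positive results are real cases when 1 in 1000 people tested has the disease, compared with about two in three when 1 in 10 does. Narrowing screening to a higher-risk group is what shifts that balance.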
Today, most screening programmes still rely on broad criteria such as age or family history – “a very crude decision boundary,” as he puts it.
This limitation led Gerstung and collaborator Søren Brunak at the Technical University of Denmark to think differently: what if the entire record of a person’s medical history could be used to read their health more intelligently?
“We wanted to explore how much more the information in someone’s disease record could change this,” says Gerstung.
This idea became the seed for Delphi-2M, an AI system that can read medical histories much like a language model reads text – finding the hidden patterns that link one diagnosis to the next.
“Each disease is like a word,” he explains. “When you string them together over time, they tell the story of a person’s health.”
Teaching AI the grammar of disease
The real breakthrough came with the development of large language models such as ChatGPT – the same kind of AI that now powers tools such as Delphi-2M. These systems are built to learn patterns across long sequences – exactly the kind of structure that health data also contain.
“When ChatGPT stormed onto the scene, we realised that these models are, at their core, very simple,” says Gerstung. “They just learn the statistical associations between sequences of words – and somehow, that’s enough to capture grammar and logic.”
This was the moment that his team saw the parallel to human health. “In disease progression, we also have a distinct set of ‘words’ – the disease codes. The principle is really the same,” he explains. “If a model can learn the grammar of a language, perhaps it can also learn the grammar of disease – how illnesses follow each other and interact over time.”
The only major difference, he says, is that “in a sentence, words come right after each other, whereas diseases occur over intervals – sometimes months, sometimes years. So we had to teach the model to recognise and predict time.”
By training the system on hundreds of thousands of anonymised health records, the team tested whether the algorithm could not only recognise past disease patterns but also sketch realistic futures.
“Ultimately, what we’re trying to capture,” says Gerstung, “is the natural history of disease – how the story of one person’s health unfolds over a lifetime.”
Building Delphi-2M: a language model for medicine
To put the idea into practice, the team built Delphi-2M – a health-focused version of the same AI technology behind modern chatbots. But instead of learning from sentences and words, Delphi-2M learns from sequences of medical events: diagnoses, lifestyle factors and even no-event periods, treated as tokens in its vocabulary.
“A person’s health trajectory can be represented by a sequence of diagnoses, each recorded at the age of first occurrence,” explains Moritz Gerstung. “Delphi reads these in order and learns how past events influence the future.”
In other words, it treats a lifetime of medical records the way a language model treats a paragraph – as a story unfolding line by line.
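As a rough illustration of what such a sequence might look like in code, the sketch below turns a handful of records into an ordered list of tokens. Everything in it is an assumption made for the example: the `Event` and `build_trajectory` names, the ICD-10-style codes, the `SMOKER` and `NO_EVENT` tokens, and the fixed five-year padding rule. The study's own vocabulary and padding scheme may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    age_years: float  # age at which the event was recorded
    token: str        # diagnosis code, lifestyle factor or "NO_EVENT"


def build_trajectory(first_diagnoses: dict[str, float],
                     lifestyle: dict[str, float],
                     max_gap_years: float = 5.0) -> list[Event]:
    """Order events by age and fill long silent stretches with NO_EVENT tokens."""
    events = sorted(
        (Event(age, token) for token, age in {**lifestyle, **first_diagnoses}.items()),
        key=lambda event: event.age_years,
    )
    padded: list[Event] = []
    previous_age = 0.0
    for event in events:
        # Assumed rule: one filler token per max_gap_years without any record.
        filler_age = previous_age + max_gap_years
        while filler_age < event.age_years:
            padded.append(Event(filler_age, "NO_EVENT"))
            filler_age += max_gap_years
        padded.append(event)
        previous_age = event.age_years
    return padded


# Hypothetical person: smoking status noted at 40, two first diagnoses later on.
trajectory = build_trajectory(
    first_diagnoses={"I10": 52.3, "E11": 58.0},
    lifestyle={"SMOKER": 40.0},
)
for event in trajectory:
    print(f"{event.age_years:5.1f}  {event.token}")
```

Each diagnosis appears once, at the age of first occurrence, and the filler tokens give a model something to read during the years in which nothing was recorded.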
The model was trained on data from 402,799 participants in the UK Biobank – decades of anonymised hospital records, death registries and self-reported conditions covering more than 1000 distinct diseases. It was then validated on another 100,000 people in the United Kingdom and tested – without any retraining – on 1.93 million people drawn from the national health registries in Denmark.
“Every model depends on the quality of its data,” says Gerstung. “Each healthcare system has its own quirks – biological and bureaucratic. The real test was whether something trained in one country could work in another – and it did, with only a small drop in accuracy. This success shows that the model is learning genuine features of human biology and healthcare and not just the quirks of a dataset.”
“Denmark is unique in its registry resources,” Gerstung adds. “The National Patient Register has existed since the 1970s and has become one of the main sources for epidemiological research worldwide. This made it the perfect real-world test case for our model.”
Giving AI a sense of time
Having proved that the approach could generalise across countries, the team then refined how Delphi-2M handles one of medicine’s hardest dimensions – time itself.
“In a sentence, words follow one after another but not diseases. They can be years apart,” Gerstung explains.
Text models use positional encoding to keep track of word order; Delphi-2M replaces this with age encoding, enabling it to operate in continuous time.
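One way to picture the swap is to take the sine-and-cosine encoding commonly used for word positions and feed it an age instead of an integer index. The sketch below does exactly that; the function name, the eight dimensions and the base of 10000 are assumptions borrowed from the standard text-model recipe, not the study's published settings.

```python
import math


def sinusoidal_encoding(value: float, dim: int = 8, base: float = 10000.0) -> list[float]:
    """Sine/cosine encoding of a scalar; `value` may be fractional. `dim` must be even."""
    encoding = []
    for i in range(0, dim, 2):
        frequency = 1.0 / (base ** (i / dim))
        encoding.append(math.sin(value * frequency))
        encoding.append(math.cos(value * frequency))
    return encoding


# Text model: the third word always sits at integer position 2.
third_word = sinusoidal_encoding(2)

# Age-based variant (assumed form): an event is encoded by when it happened.
diagnosis_at_41 = sinusoidal_encoding(41.5)    # age 41.5 years
diagnosis_at_63 = sinusoidal_encoding(63.25)   # age 63.25 years

print([round(x, 3) for x in diagnosis_at_41])
print([round(x, 3) for x in diagnosis_at_63])
```

Because the input is a real number rather than a slot in a list, two diagnoses a few months apart receive nearby vectors and two diagnoses decades apart receive distant ones, regardless of how many events lie between them.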
“Saying that someone has a one-in-ten risk tomorrow, next year or in 10 years are very different things,” he says. “The time horizon defines how doctors can act – it really matters.”
In practice, the model relied on almost the same technology as ChatGPT – with just a few changes to make it understand time and health instead of words.
“We just had to add time as a dimension,” says Gerstung. “The rest – the idea of looking back through a person’s past and working out statistical associations – remains exactly as in a language model.”
Simulating the future of health
Just as a chatbot writes one word after another, Delphi-2M can extend a health record step by step – predicting each new diagnosis in the sequence, up to and including the final event, death. Each simulation represents a plausible future based on the statistical structure of real populations.
“By iteratively sampling these sequences,” Gerstung explains, “we can estimate the disease burden in a population or explore how factors such as smoking or obesity shape someone’s future health.”
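The loop behind this kind of simulation can be sketched in a few lines. In the toy version below, a hand-written `toy_rates` function stands in for the trained model, and the next event is drawn by letting the candidate events compete with exponentially distributed waiting times. Both the sampling scheme and every rate and effect size are assumptions made for illustration, not the study's method or estimates.

```python
import random


def toy_rates(history: list[tuple[float, str]]) -> dict[str, float]:
    """Hypothetical stand-in for a trained model: events per year for each token."""
    seen = {token for _, token in history}
    rates = {"HYPERTENSION": 0.03, "DIABETES": 0.01, "DEATH": 0.005}
    if "SMOKER" in seen:
        rates["DEATH"] *= 2.0       # made-up effect size
    if "HYPERTENSION" in seen:
        rates["DIABETES"] *= 1.5    # made-up effect size
    # Only first occurrences are recorded, so drop diagnoses already present.
    return {token: rate for token, rate in rates.items() if token not in seen}


def simulate(history: list[tuple[float, str]],
             horizon_years: float,
             rng: random.Random) -> list[tuple[float, str]]:
    """Extend a record one event at a time until death or the time horizon."""
    trajectory = list(history)
    age = trajectory[-1][0]
    end_age = age + horizon_years
    while True:
        rates = toy_rates(trajectory)
        age += rng.expovariate(sum(rates.values()))  # waiting time to the next event
        if age > end_age:
            return trajectory
        token = rng.choices(list(rates), weights=list(rates.values()))[0]
        trajectory.append((age, token))
        if token == "DEATH":
            return trajectory


rng = random.Random(0)
start = [(60.0, "SMOKER")]  # hypothetical 60-year-old smoker
runs = [simulate(start, 20.0, rng) for _ in range(10_000)]
share_dead = sum(run[-1][1] == "DEATH" for run in runs) / len(runs)
print(f"simulated 20-year mortality: {share_dead:.1%}")
```

Repeating the simulation many times turns individual storylines into population-level estimates, and rerunning it with a different starting record, for instance without the `SMOKER` token, shows how a single factor shifts those estimates.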
Although Delphi-2M has only about 2.2 million parameters – tiny compared with commercial AI models – it still performed impressively.
“Even a model of this modest size can learn remarkably rich temporal relationships,” says Gerstung. “This shows that health records have their own internal logic – and that AI can learn to speak it.”
Once trained, Delphi-2M proved surprisingly accurate. When predicting the next diagnosis across more than a thousand diseases, the model achieved an average area under the receiver operating characteristic curve – a standard measure of how well predictions separate people who go on to develop a disease from those who do not – of about 0.76, equal to or better than specialised models for single conditions such as heart disease or dementia.
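For readers unfamiliar with the metric, the area under the receiver operating characteristic curve can be read as a simple probability: pick one person who developed the disease and one who did not, and ask how often the model gave the first person the higher risk score. A value of 0.5 is a coin toss and 1.0 is perfect separation. The sketch below computes it directly from that definition, using made-up scores.

```python
def auc(case_scores: list[float], control_scores: list[float]) -> float:
    """Chance that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up risk scores for people who did and did not develop a disease.
cases = [0.9, 0.7, 0.6, 0.4]
controls = [0.8, 0.5, 0.3, 0.2, 0.1]

print(auc(cases, controls))  # 16 of the 20 case-control pairs are ranked correctly: 0.8
```

On this reading, a score of about 0.76 means the model ranks the person who later develops the disease above the person who does not in roughly three out of four such pairs.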
“We thought that maybe some diseases would be predictable and others would remain random,” says Moritz Gerstung, “but almost all of them turned out to follow patterns.”
Watching diseases unfold like a movie
The model captured both broad and subtle trends: childhood infections peaking early, chronic illnesses rising in midlife and the sharp increase in disease burden in old age.
“Death was, of course, the easiest to predict,” Gerstung adds dryly. “The area under the receiver operating characteristic curve for mortality was 0.97 – showing that the model recognises when people are approaching the end of life. And when tested on nearly 2 million Danes without retraining, the results held up remarkably well.”
With this validation complete, the researchers began to explore what the model could actually do beyond pure prediction.
Perhaps the most striking result was Delphi’s ability to simulate the future. Starting from a person’s health record at age 60 years, the model could generate realistic 20-year health trajectories – entire chains of likely diagnoses that, when compared with real outcomes, matched population statistics almost perfectly.
“Delphi doesn’t just predict a single disease,” Gerstung explains. “It can recreate the whole distribution of outcomes for a population – replaying, in a sense, the movie of someone’s health forward in time to show how different choices might affect the plot.”
A new tool for prevention – and for privacy
The model could even mimic how lifestyle factors shape risk. When the researchers ran simulations for smokers or people with high body-mass index, the results mirrored known epidemiological patterns.
Finally, the team asked whether Delphi-2M could also help protect privacy – by generating data that act like real data but contain no personal details. A version of the model trained entirely on these artificial data performed only about three percentage points worse than the original.
“This is encouraging from a privacy perspective,” says Gerstung. “We can now create realistic yet anonymous datasets to train or test other AI models – offering a safe path to innovation without exposing personal information.”
Delphi-2M did not just confirm what was already known; it also uncovered new links between diseases: clusters that tend to occur together, time-dependent risks and the hidden grammar connecting one diagnosis to the next.
In practical terms, the focus is turning prediction into prevention – helping doctors and healthcare systems act before diseases take hold. The same principle, Gerstung adds, could guide emerging cancer detection technologies. “There’s a lot of research into blood-based multi-cancer tests,” he explains, “but these tests aren’t perfect – they miss some cases and sometimes raise false alarms. Models like ours could help decide who should take such tests and when.”
Seeing medicine’s future – one trajectory at a time
Beyond individual care, Delphi-2M could also change how entire health systems prepare for the future. By simulating millions of health trajectories, it can forecast disease patterns years ahead – helping to plan hospitals, wards and resources. Nevertheless, Gerstung cautions, the approach must be used responsibly: “Any AI model is only as fair as the data on which it is trained. If the underlying data reflect a healthier or more affluent population – as in the UK Biobank – the model will inherit these biases.”
One immediate application could be smarter risk assessment and screening. Instead of rigid age limits or single-disease calculators, Delphi-like systems could spot people whose combination of diagnoses puts them at unusually high risk years before symptoms appear.
“You could imagine a doctor opening a screen and seeing your cardiovascular risk eight-fold higher than average – along with the lifestyle factors that could bring it down,” says Gerstung. “This could make existing screening programmes more effective by enabling high-risk individuals to start earlier and low-risk individuals later or less frequently.”
“Used carefully, such models could help to shift medicine from reacting to preventing,” Gerstung concludes. “Delphi-2M can look far ahead – giving us a new way to see health and disease as one continuous story, much like a language model that doesn’t just predict the next word but writes the whole paragraph.”
