Algorithm predicts young-onset type 2 diabetes years before diagnosis

Tech Science | 30 April 2026 | 4 min | Research director, Professor Søren Brunak | Written by Kristian Sjøgren

A new algorithm can predict type 2 diabetes years before it is normally detected by analysing younger patients’ health data. If the tool is put into use, doctors could use already collected health data to identify disease risk several years earlier in patients under the age of 40. The study illustrates a growing field of research in which artificial intelligence (AI) analyses health records for early statistical patterns that precede disease.


AI is expected to play a central role in the healthcare systems of the future. By analysing large quantities of health data, it can detect early signals in health records that point to disease long before doctors would normally make a diagnosis.

Previous studies have shown that diagnostic codes from hospital health records can be used to predict the risk of diseases such as certain types of cancer and cardiovascular disease – sometimes several years before they are detected clinically. The new research builds on this idea but expands the analysis to include patients’ interactions with general practice.

The new algorithm uses data from general practitioners and specialists to identify people who should be examined for young-onset type 2 diabetes – long before the disease would normally be discovered.

“When we now, for example, include prescription data from general practitioners and specialists, it opens the possibility of detecting disease risk even earlier than when we look only at hospital records. We also include many data that do not necessarily consist of diagnosis codes but still tell us something about how patients interact with the healthcare system,” says Søren Brunak, one of the researchers behind the study and a professor at the Department of Public Health at the University of Copenhagen, Denmark.

The article is currently pending publication in The Lancet Digital Health.

20 years of Danish health data

The study combines data from general practitioners, specialists and hospitals in Denmark to train a machine-learning model that predicts type 2 diabetes among people younger than 40 years, a disease that is otherwise often detected much later.

The model was trained on historical health data and then tested on separate datasets to determine whether it could recognise the same patterns among new patients from geographic areas that do not overlap with the training population.
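A geographically separated evaluation of this kind can be sketched in a few lines. This is only an illustration of the idea, not the study's pipeline: the record layout, the `region` field and the region names are hypothetical.

```python
def split_by_region(records, held_out_regions):
    """Split records so that no region appears in both training and test sets.

    `records` is a list of dicts with a hypothetical "region" key;
    `held_out_regions` is the set of regions reserved for testing.
    """
    train = [r for r in records if r["region"] not in held_out_regions]
    test = [r for r in records if r["region"] in held_out_regions]
    return train, test


# Toy records with made-up region labels (assumption for illustration only).
records = [
    {"person_id": 1, "region": "region_A"},
    {"person_id": 2, "region": "region_B"},
    {"person_id": 3, "region": "region_C"},
    {"person_id": 4, "region": "region_A"},
]

train, test = split_by_region(records, held_out_regions={"region_C"})
print(len(train), "training records;", len(test), "test records")
```

Because the test regions never contribute to training, good performance on them suggests the model has learned patterns that carry over to new populations rather than quirks of one area.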

The dataset covers Denmark’s entire population from 2004 onwards and includes codes for around 9,000 different services carried out by general practitioners and specialists. These records were used as a long-term history of people’s contacts with the healthcare system, which the algorithm could analyse for patterns.

This can include anything from prescriptions and blood tests to email consultations and horse-riding therapy.

The algorithm works by analysing long time-series of patients’ interactions with the healthcare system and identifying statistical patterns that recur among people who are later diagnosed with type 2 diabetes at a young age.
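To make the idea of a time-series of healthcare contacts concrete, here is a minimal sketch of how one person's history might be represented and cut off at a prediction date. The service codes and the data layout are invented for illustration; the study's actual data format and deep-learning model are not described in this article.

```python
from datetime import date


def history_before(events, cutoff):
    """Return the service codes recorded strictly before `cutoff`, oldest first.

    `events` is a list of (date, service_code) pairs for one person.
    Only contacts before the cutoff are kept, so a model trained on the
    result never sees information from after the prediction date.
    """
    earlier = [(day, code) for day, code in events if day < cutoff]
    earlier.sort(key=lambda event: event[0])
    return [code for _, code in earlier]


# Hypothetical contact history; the codes are placeholders, not real service codes.
events = [
    (date(2015, 3, 2), "BLOOD_TEST"),
    (date(2012, 6, 11), "CONSULTATION"),
    (date(2018, 9, 30), "PRESCRIPTION"),
    (date(2014, 1, 20), "EMAIL_CONSULTATION"),
]

print(history_before(events, cutoff=date(2016, 1, 1)))
```

A sequence model would then take such ordered code lists as input and learn which combinations tend to precede a later diagnosis.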

“The aim is to be able to identify younger people with type 2 diabetes earlier. When a doctor is sitting across from a 32-year-old patient, type 2 diabetes is not the first thing that comes to mind, because it most often affects older people,” explains Søren Brunak.

Spotting people at high risk

The study shows that the algorithm can identify a small group of people whose risk of later being diagnosed with type 2 diabetes is far higher than that of the rest of the population. In the study, the 0.1% of the population that the model classified as being at highest risk were more than 100 times more likely to develop the disease than the average person. This means the model can detect early signals in patients’ health data long before diagnosis.
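The "times more likely than average" figure is a ratio: the rate of later diagnoses in the highest-scoring group divided by the rate in the whole population. The sketch below shows that arithmetic on made-up numbers; the function name and the toy data are assumptions, and the result does not reproduce the study's figures.

```python
def top_fraction_lift(scores, outcomes, fraction):
    """Outcome rate in the highest-scoring `fraction`, divided by the overall rate.

    `scores` are model risk scores, `outcomes` are 1 (later diagnosed) or 0.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    k = max(1, round(n * fraction))  # size of the top group, at least one person
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    overall_rate = sum(outcomes) / n
    return top_rate / overall_rate


# Toy population of 1,000 people in which the 10 highest scores are the 10 cases.
scores = [i / 1000 for i in range(1000)]
outcomes = [1 if i >= 990 else 0 for i in range(1000)]

print(top_fraction_lift(scores, outcomes, fraction=0.001))
print(top_fraction_lift(scores, outcomes, fraction=0.1))
```

In this toy population the overall rate is 1%, so a top group made up entirely of cases gives a ratio of about 100, and the ratio shrinks as the flagged group is widened.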

The algorithm can also be calibrated either to give priority to identifying as many people at risk of type 2 diabetes as possible or to minimise the number of false positives.

In other words, it can either identify most young people who will later develop type 2 diabetes, at the cost of a higher error rate, or operate with fewer false positives while identifying slightly fewer cases.
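This trade-off comes down to where the cut-off on the risk score is placed. A minimal sketch, using invented scores and outcomes rather than anything from the study, shows how moving the threshold changes both the share of true cases found and the number of false alarms.

```python
def evaluate_threshold(scores, outcomes, threshold):
    """Return (sensitivity, false_positives) when flagging scores >= threshold.

    Sensitivity is the share of people who later develop the disease
    (outcome 1) that the threshold flags; false positives are flagged
    people who do not develop it (outcome 0).
    """
    true_positives = 0
    false_positives = 0
    for score, outcome in zip(scores, outcomes):
        if score >= threshold:
            if outcome:
                true_positives += 1
            else:
                false_positives += 1
    cases = sum(outcomes)
    sensitivity = true_positives / cases if cases else 0.0
    return sensitivity, false_positives


# Hypothetical risk scores for ten people and whether each later developed the disease.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.85, 0.50, 0.35):
    sensitivity, false_positives = evaluate_threshold(scores, outcomes, threshold)
    print(threshold, sensitivity, false_positives)
```

With these toy numbers the strictest threshold flags no healthy people but finds only half of the cases, while the loosest one finds every case and wrongly flags two people.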

Regardless of how the algorithm is calibrated, doctors could in principle alert patients long before the disease would otherwise be detected and offer them further examination and possible follow-up treatment.

“In recent years, we have seen a large global increase in the number of younger people diagnosed with type 2 diabetes, and here we show how existing Danish health data from general practitioners, specialists and hospitals can be used to carry out broad screening for this often overlooked patient group,” explains lead author Christian Holm Johansen, a PhD student.

How hidden patterns appear in health records

To understand how the algorithm works, it helps to think about what typically precedes various events in a person’s interactions with the healthcare system.

Data from general practitioners, for example, provides a clear signal linked to the childhood vaccination programme.

Once a child has received one vaccine from the programme, they will typically receive the next as well. In that sense, they follow a predictable path in what ends up being recorded in their health records.

Other connections are far more subtle. People who later develop early-onset type 2 diabetes have often had various contacts with their doctor in the years leading up to the diagnosis – small signals in the data that may be weak on their own but together form a pattern. Machine-learning models are particularly good at detecting this type of pattern in large datasets.

Taken together, these contacts form patterns in the data – a kind of statistical pathway through the healthcare system – that the algorithm can recognise and use to assess whether a person may be moving towards young-onset type 2 diabetes.

Whether the tool should be used is a political decision

Does this mean that Danish doctors will be able to open their patients’ health records tomorrow and see whether they should check a patient’s blood glucose one more time?

Not necessarily. Whether tools like this should be used as part of digital screening programmes is ultimately a political decision.

Although it may be possible to identify thousands of Danes who appear to be moving towards early type 2 diabetes, examining all of them more closely in the healthcare system comes at a cost.

Each follow-up requires a medical evaluation, which takes time in general practice and involves a range of analyses, including blood tests.

“This is a question of resources and of carrying out cost–benefit analysis to determine whether identifying people with type 2 diabetes earlier than we normally do is worth spending time and money,” Søren Brunak explains.

He adds that earlier diagnosis can be crucial for the individual patient identified as being at high risk, because it enables treatment to begin before the disease causes serious damage to the body.

“If another 10 years passes with undiagnosed and untreated type 2 diabetes, the disease will cause damage to the body. That ultimately means lost years of life and a greater need for healthcare services,” says Søren Brunak.

Predicting hundreds of diseases years in advance

According to Søren Brunak, algorithms based on health records can be designed to analyse whether patients appear to be following patterns that lead towards a wide range of different diseases.

This can be done by designing algorithms that specifically look for people who seem to be developing diseases such as diabetes, cardiovascular disease or dementia.

But algorithms can also be developed that estimate the risk of developing up to 1,000 different diseases at the same time – long before the first symptoms appear.

“With these algorithms, we can project the health of the entire population 10–15 years into the future and provide early warning of which diseases may be on the rise. This enables disease to be detected earlier and healthcare to be planned better. The advantage is that the models can be built on health data that already exist. Denmark is therefore an obvious place to develop this type of algorithm, because we have been collecting these data for many years,” concludes Søren Brunak.

“Detection of young-onset type 2 diabetes using deep learning across primary and secondary care: a nationwide retrospective cohort study” is currently pending publication in The Lancet Digital Health. The Novo Nordisk Foundation supported the research.
