Computer model uses health data to predict who might develop pancreatic cancer

Therapy Breakthroughs | 9 Jul 2023 | 3 min | Research director, professor Søren Brunak | Written by Kristian Sjøgren

A new deep-learning algorithm identifies people with an increased risk of developing pancreatic cancer based on data on their health history. A researcher says that this could make screening people for this serious disease easier and improve survival.


Research director, professor

Søren Brunak

Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research


Pancreatic cancer is a very serious disease, with only 11% of the people diagnosed surviving more than 5 years.

One reason for the poor survival is that the symptoms are nonspecific, which makes the disease difficult to identify early.

However, a machine-learning model can plough through the data on people’s health history and identify people with the highest risk of developing pancreatic cancer.

A researcher behind the development of the deep-learning algorithm says that this can help to promote early screening, diagnosis and treatment and hopefully also improve survival for the people with pancreatic cancer.

“The model can potentially identify the people with the greatest risk of developing pancreatic cancer, and then they can be invited for medical work-up. This means that not everyone has to be examined, since by identifying high-risk individuals, we can hopefully intervene selectively before the disease spreads so much that treatment options are highly limited,” explains Søren Brunak, Professor and Research Director, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen.

The research has been published in Nature Medicine.

Trained a machine-learning model on data from millions of people

The researchers developed and trained a deep-learning algorithm to identify patterns in health data.

The data used in developing the algorithm is from the Danish National Patient Registry, which contains data on millions of people in Denmark since 1977. The Registry holds data on all hospital contacts, including broken arms, head injuries, serious stomach pain and diabetes.

Using the data, the researchers found that 24,000 people had been diagnosed with pancreatic cancer from 1977 to 2018, and they then asked the algorithm to find patterns in the diagnostic codes that led to the diagnosis. This involved the sequence of the health history and the timing of the various diagnoses in relation to each other.
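To make the idea of sequence and timing concrete, here is a minimal Python sketch of how one person's health history could be turned into an ordered trajectory. It is an illustration only, not the researchers' code: the `Diagnosis` class, the `to_trajectory` function and the example codes and dates are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    code: str   # diagnostic code (illustrative ICD-10 categories below)
    when: date  # date of the hospital contact


def to_trajectory(events):
    """Order diagnoses by date and pair each code with the days since the previous one."""
    ordered = sorted(events, key=lambda e: e.when)
    trajectory = []
    previous = None
    for event in ordered:
        gap = 0 if previous is None else (event.when - previous).days
        trajectory.append((event.code, gap))
        previous = event.when
    return trajectory


# Invented example history, deliberately listed out of order.
history = [
    Diagnosis("K80", date(2015, 3, 1)),   # gallstones
    Diagnosis("S52", date(2010, 6, 15)),  # forearm fracture
    Diagnosis("K21", date(2015, 3, 31)),  # reflux disease
]
trajectory = to_trajectory(history)
print(trajectory)
```

Each entry records both what was diagnosed and how long after the previous diagnosis it came, which is the kind of ordered, timed input a sequence model can learn from.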

The researchers trained the algorithm on most of the data but withheld data from 600,000 people to test whether it could subsequently identify the people who actually developed pancreatic cancer.
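Withholding data in this way can be sketched in a few lines of Python. The article does not say how the withheld group was chosen, so the random selection below is an assumption, and `split_holdout` and its parameters are hypothetical names used only for illustration.

```python
import random


def split_holdout(person_ids, n_holdout, seed=0):
    """Randomly reserve n_holdout people for testing; everyone else is used for training."""
    ids = list(person_ids)
    if n_holdout > len(ids):
        raise ValueError("holdout group is larger than the population")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return ids[n_holdout:], ids[:n_holdout]


# Toy population of 1,000 people, with 100 withheld for testing.
train_ids, test_ids = split_holdout(range(1000), n_holdout=100)
print(len(train_ids), len(test_ids))
```

The key point is that the split is made per person, so nobody in the test group contributes any data to training.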

“People with a broken arm are rarely in doubt about the diagnosis, but the symptoms of pancreatic cancer are much more unspecific and can include stomach pain and other symptoms resulting from other diseases and disorders. A computer model can find weaker patterns in data because it analyses data from millions of people and can thus identify some people at high risk that a doctor might not pick up in the same way,” says Søren Brunak.

Model identifies individuals at high risk

When the researchers had trained the model, they tested it on the withheld data from the 600,000 people and found that it very accurately identified the people who developed pancreatic cancer.

Of the 1,000 people the algorithm assessed as having the greatest risk of developing pancreatic cancer, 320 (32%) developed it.

The researchers also validated the algorithm by using data from 3 million military veterans in the United States, which suggests that the approach could be useful not only in Denmark but also elsewhere.

“One great strength of the algorithm and of the study is that we showed that the model can be used on data from two very different countries,” explains Søren Brunak.

He also notes that how the model can or will be used is a political question.

“Solely focusing on the 1,000 people that the algorithm predicts as having the greatest risk of developing pancreatic cancer will detect many cases and have relatively few false-positive results. But the numbers could also be expanded to test the 10,000 or 100,000 people identified at greatest risk. This would lead to more screening but would also detect more people with pancreatic cancer,” says Søren Brunak.
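The trade-off described in the quote above can be illustrated with a small Python sketch. The function `top_k_summary` is a hypothetical helper, and the risk scores and outcomes are invented toy data, not results from the study.

```python
def top_k_summary(scores, developed_cancer, k):
    """Rank people by predicted risk and summarise the k people at highest risk."""
    ranked = sorted(zip(scores, developed_cancer), key=lambda pair: pair[0], reverse=True)
    flagged = ranked[:k]
    cases_found = sum(1 for _, outcome in flagged if outcome)
    total_cases = sum(1 for outcome in developed_cancer if outcome)
    return {
        "flagged": len(flagged),
        "cases_found": cases_found,
        "false_positives": len(flagged) - cases_found,
        "precision": cases_found / len(flagged) if flagged else 0.0,
        "share_of_all_cases": cases_found / total_cases if total_cases else 0.0,
    }


# Invented toy data: eight people, four of whom later develop the disease.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
outcomes = [True, True, False, True, False, False, True, False]

for k in (2, 4, 8):
    print(k, top_k_summary(scores, outcomes, k))
```

In this toy example, examining more people finds more of the cases, but a growing share of those examined turn out not to have the disease. Where to set the cut-off is the kind of decision the quote describes.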

Helping doctors to become more aware of pancreatic cancer

Although the algorithm is already performing well in identifying people at high risk of pancreatic cancer, the prognostic value of the model can be improved even further.

Søren Brunak calls the current model a prototype whose prognostic value is at the lower end of what could be achieved if it were trained on more data from other sources.

Further developing the algorithm with data from additional sources, such as general practitioners, laboratory results, socioeconomic data, genetic data and data from computed tomography and X-rays, could improve its predictive value.

The model can thus be used not only to identify people at high risk of developing pancreatic cancer but also to make doctors more aware of other features associated with pancreatic cancer as a disease.

For example, the algorithm identified some diagnostic codes that appear to be associated with the risk of developing pancreatic cancer but whose timing in relation to the cancer had not been well characterised, including gallstones, acid reflux and gastritis.

The algorithm also identified quite similar features in the data from both Denmark and the United States, despite all the differences in diagnostic coding.

There were also differences. For example, the algorithm examined the use of opioids in its risk assessment for the United States but not for Denmark, although whether using opioids is associated with developing pancreatic cancer is unknown.

“We do not assume that one computer model can be used in all countries. Instead, we imagine one that needs to be trained and validated on data from each country to be able to identify people at high risk of developing pancreatic cancer in that country,” concludes Søren Brunak.


“A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories” has been published in Nature Medicine. Several authors are affiliated with the Novo Nordisk Foundation Center for Protein Research, University of Copenhagen.



