A new deep-learning algorithm identifies people with an increased risk of developing pancreatic cancer based on data from their health history. A researcher says that this will make screening people for this serious disease easier and improve survival.
Pancreatic cancer is a very serious disease, with only 11% of the people diagnosed surviving more than 5 years.
One reason for the poor survival is that the symptoms are nonspecific, which makes the disease difficult to identify early.
However, a machine-learning model can plough through the data on people’s health history and identify people with the highest risk of developing pancreatic cancer.
A researcher behind the development of the deep-learning algorithm says that this can help to promote early screening, diagnosis and treatment and hopefully also improve survival for the people with pancreatic cancer.
“The model can potentially identify the people with the greatest risk of developing pancreatic cancer, and then they can be invited for medical work-up. This means that everyone does not have to be examined, since by identifying high-risk individuals, we can hopefully intervene selectively before the disease spreads so much that treatment options are highly limited,” explains Søren Brunak, Professor and Research Director, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen.
The research has been published in Nature Medicine.
Trained a machine-learning model on data from millions of people
The researchers developed and trained a deep-learning algorithm to identify patterns in health data.
The data used in developing the algorithm is from the Danish National Patient Registry, which contains data on millions of people in Denmark since 1977. The Registry holds data on all hospital contacts, including broken arms, head injuries, serious stomach pain and diabetes.
In the data, the researchers found 24,000 people who had been diagnosed with pancreatic cancer from 1977 to 2018, and they then asked the algorithm to find patterns in the diagnostic codes that preceded the diagnosis. This took into account both the order of the diagnoses in each person's health history and the timing of the various diagnoses in relation to each other.
The researchers trained the algorithm on most of the data but withheld data from 600,000 people to test whether it could subsequently identify the people who actually developed pancreatic cancer.
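The idea of withholding a group of people for testing can be sketched in a few lines of Python. This is only an illustration: the article does not say how the withheld group was selected, so the random selection, the function name `split_by_person` and the toy person IDs are all assumptions.

```python
import random


def split_by_person(person_ids, n_holdout, seed=0):
    """Randomly withhold n_holdout people; everyone else is used for training.

    Splitting by person (rather than by individual hospital contact) keeps
    each person's whole health history on one side of the split.
    Random selection is an assumption made for this sketch.
    """
    ids = sorted(set(person_ids))
    rng = random.Random(seed)
    holdout = set(rng.sample(ids, n_holdout))
    train = [pid for pid in ids if pid not in holdout]
    return train, sorted(holdout)


# Toy example with 10 hypothetical person IDs, 3 of them withheld.
train_ids, test_ids = split_by_person(range(10), n_holdout=3)
```

Because the model never sees the withheld people during training, its performance on them indicates how well it might work for people it has not encountered before.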
“People with a broken arm are rarely in doubt about the diagnosis, but the symptoms of pancreatic cancer are much more unspecific and can include stomach pain and other symptoms resulting from other diseases and disorders. A computer model can find weaker patterns in data because it analyses data from millions of people and can thus identify some people at high risk that a doctor might not pick up in the same way,” says Søren Brunak.
Model identifies individuals at high risk
When the researchers had trained the model, they tested it on the withheld data from the 600,000 people and found that it very accurately identified the people who developed pancreatic cancer.
Of the 1,000 people the algorithm assessed as having the greatest risk of developing pancreatic cancer, 320 developed it.
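Expressed as a proportion, this is the share of flagged people who actually developed the disease, often called precision or positive predictive value. A minimal calculation using only the two figures reported above (the function name is an illustrative choice):

```python
def precision_at_k(true_positives, k):
    """Share of the k highest-risk people who actually developed the disease."""
    return true_positives / k


top_k = 1000      # people the algorithm ranked as highest risk
developed = 320   # of those, the number who developed pancreatic cancer

ppv = precision_at_k(developed, top_k)
print(f"{ppv:.0%}")  # 32%
```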
The researchers also validated the algorithm by using data from 3 million military veterans in the United States, which supports its potential usefulness not only in Denmark but also elsewhere.
“One great strength of the algorithm and of the study is that we showed that the model can be used on data from two very different countries,” explains Søren Brunak.
He also notes that how the model can or will be used is a political question.
“Solely focusing on the 1,000 people that the algorithm predicts as having the greatest risk of developing pancreatic cancer will detect many cases and have relatively few false-positive results. But the numbers could also be expanded to test the 10,000 or 100,000 people identified at greatest risk. This would lead to more screening but would also detect more people with pancreatic cancer,” says Søren Brunak.
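The trade-off described in the quote can be illustrated with a small sketch. The risk scores and outcomes below are synthetic toy values invented for illustration, not results from the study, and `precision_recall_at_k` is a hypothetical helper name.

```python
def precision_recall_at_k(scores, labels, k):
    """Precision and recall when only the k highest-scoring people are examined."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    total_cases = sum(labels)
    precision = hits / k
    recall = hits / total_cases if total_cases else 0.0
    return precision, recall


# Synthetic toy data: 8 people, 4 of whom develop the disease (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 0, 1]

results = {k: precision_recall_at_k(scores, labels, k) for k in (2, 4, 8)}
for k, (precision, recall) in results.items():
    print(f"top {k}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy example, examining more people finds more of the cases (recall rises) but a smaller share of those examined turn out to have the disease (precision falls), which is the balance the quote describes.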
Helping doctors to become more aware of pancreatic cancer
Although the algorithm is already performing well in identifying people at high risk of pancreatic cancer, the prognostic value of the model can be improved even further.
Søren Brunak calls the current model a prototype, whose prognostic value is at the lower limit of what could be achieved if it were trained on even more data from other sources.
Further developing the algorithm with even more data from, for example, general practitioners, laboratory results, socioeconomic data, genetic data and data from computed tomography and X-rays can improve the predictive value even more.
The model can thus be used not only to identify people at high risk of developing pancreatic cancer but also to make doctors more aware of other features associated with pancreatic cancer as a disease.
For example, the algorithm identified some diagnostic codes that appear to be associated with the risk of developing pancreatic cancer but whose timing in relation to the disease had not been well characterised, including gallstones, acid reflux and stomach catarrh.
The algorithm also identified quite similar features in the data from both Denmark and the United States, despite all the differences in diagnostic coding.
There were also differences. For example, the algorithm examined the use of opioids in its risk assessment for the United States but not for Denmark, although whether using opioids is associated with developing pancreatic cancer is unknown.
“We do not assume that one computer model can be used in all countries. Instead, we imagine one that needs to be trained and validated on data from each country to be able to identify people at high risk of developing pancreatic cancer in that country,” concludes Søren Brunak.