According to the World Health Organization, 7 of 10 people die from noncommunicable diseases, such as cardiovascular diseases and cancer. Nevertheless, we know surprisingly little about the relationship between people’s personal environment throughout their lives and health. A major research project is now linking the health data of the Danish population with measured data on the environment, exercise and social media behaviour of a few subpopulations. This will identify new associations that may lead to longer and healthier lives.
Increased awareness of the impact of such harmful conditions as air pollution, exposure to ultraviolet radiation and chemicals on our health means that many people are concerned that they will die prematurely if they are not careful. Many of the invisible dangers that concern people cannot yet be determined with any certainty. Danish researchers from Aarhus University now plan to remedy this.
“We pretty much have the data required to provide the answers. We have health data from the many health registries in Denmark. We measure pollution in great detail. We want to link this huge quantity of information with data we can collect through personal trackers based on the premise that we will be able to identify some clear associations between the environment, lifestyles and disease patterns. We will therefore be able to clearly determine the most serious influences,” explains Clive Sabel, Professor, Department of Environmental Science, Aarhus University.
From big to rich
Over the years, there have been attempts to measure the environment in many different ways. Researchers are now going to combine and analyse data from the central population register, the health registries and the buildings register with environmental exposure assessments.
“We are going to measure the link between the environment and health in a number of different ways. First of all we have access to the central population registers and the building registers so that we can track people throughout their lifetimes. Where they lived and worked. This will allow us to recreate people’s lifespans from conception to death. Then we can supplement this with lots of environmental data on air and water pollution from the Department of Environmental Science at Aarhus University.”
These official data sources will then be combined with personalised sensor data and finally the researchers will use social media such as Twitter to look at how people tweet about their social and physical environments.
“Adding all of that together we want to build a picture of an individual and how they lead their lives and what environmental influences they have in their lives. However, the unique thing that we are doing is really the integration. It is the joining together, so this is a big data project, but we like to call it rich data because it is not just about the individual large datasets, but it is about the power of joining all these datasets together that is so exciting.”
Obtaining value from the many data sets
The project will be supplemented with three cohorts. The first is the Garmin cohort: the researchers, in collaboration with Garmin, a tracking technology company, will measure the effects of the amount of physical training among up to 3000 runners. However, the researchers are not just monitoring physically active and fit people. Another group comprises people who are seriously ill.
“The second cohort will monitor a group of people with serious heart disease in collaboration with Aarhus University Hospital. The hearts of these individuals stop beating almost daily and they have therefore had a defibrillator implanted that kick-starts their heart when it stops. Using our tracking system, doctors will be able to monitor these people and the circumstances under which they experience cardiac arrest, thereby enabling them to understand better how to avoid these dangerous and painful situations.”
The third investigation in the project will examine a much larger data set comprising blood samples from 500,000 people who have donated blood about every 6 months for the past 20 years. By linking the blood-related data with data on the health of these people, the researchers can link physiological changes in blood samples with health data from doctors and hospitals and other factors, such as changes in social conditions.
“We emphasize big data today, but merely having large data sets is not sufficient. Having valuable data is the important thing. We will try to create value by linking the large data sets already available with the new data we collect – from the personal trackers that measure both the level of activity and the environment but also by analysing the activity of trial participants on social media.”
New generation of researchers needs to be discovered
The researchers will try to analyse the social media data by using advanced text-mining systems to discover whether the trial participants are happy or sad. Managing the various large data sets will be the main challenge in the project, because combining data from the registries, personal trackers and social media has never previously been attempted on such a large scale.
“The large data sets will help us to understand very complex interactions between environmental exposure and human health. However, we need to develop a platform and completely new computer algorithms to reveal patterns and pollutants. If we succeed, we will achieve a far greater and more important milestone than the specific results from the three cohort studies, because researchers worldwide will be able to use the platform and the algorithms.”
The project to create this advanced platform brings together experts on the environment, public health, statistics and data analysis. In addition to solving this complex research task, these researchers have also committed to educating a new generation of researchers in this very interdisciplinary field.
“Creating this new platform is not enough. We also need to ensure that a research tradition can be built up in Denmark. We can best achieve this by educating young people and by ensuring close collaboration with top researchers from abroad, who will visit Denmark during the 5 years of the project.”
In addition to committing to establishing the basis for a future research community through education and international collaboration, the project group also aims to contribute to the national dialogue on data security and personal data.
“Trust and security related to private health data produced in connection with the analysis are key to the future of this field. We can carry out fantastic research, but we will undermine the basis of the field if we fail in data security, and then everything will be wasted.”
However, if they do succeed in this, Clive Sabel is certain about the major impact the project will have.
“The health sciences are really struggling to understand and explain what causes some people to develop cancer, diabetes and cardiovascular disease, whereas others do not. Genes and the environment both have a role. Imagine if we can provide a complete picture of the major environmental influences throughout our life from cradle to grave in the next 20 years. The ambition is very great, but the rewards are even greater if we succeed.”
Clive Sabel, Professor of Environmental Geography, Department of Environmental Science, Aarhus University, received a Novo Nordisk Foundation Challenge Programme grant in 2018 for the project Big Data Centre for Environment and Health. The project is a collaboration between three research groups at Aarhus University led by Ole Hertel, Department of Environmental Science – Atmospheric chemistry and physics, Aarhus University; Torben Sigsgaard, Department of Public Health – Institute of Environmental and Occupational Medicine, Aarhus University; and Carsten Bøcker Pedersen, Department of Economics and Business Economics – National Centre for Register-based Research and Centre for Integrated Register-based Research, Aarhus University.