Researchers develop a method to identify signs of manipulation in large data sets

21 May 2024. Written by Kristian Sjøgren

Many kinds of data are at risk of manipulation, including social media content and cryptocurrency transactions. Researchers have developed a fully automated method to find signs of manipulation in large data sets in real time. The method can also explain its decisions to users.

The world is becoming increasingly digitalised and thus more complex.

One major challenge is the huge data sets that have become an integral part of daily life, such as the data that make up social media and the Internet as a whole.

Such data can be difficult to interpret and understand, and they can be manipulated. Examples include manipulated social media data and manipulated records of cryptocurrency transactions on a blockchain.

Manipulating data on social media can create a false reality, and manipulating cryptocurrency data can move money from your pocket to other pockets without anyone ever detecting it.

The risk of data manipulation is high and the chances of detecting it are slight, but researchers have developed a method to identify signs of manipulation in large data sets in real time.

The method is designed both to detect signs of manipulation in large data sets and to find their source.

“The challenge is that many data sets are very large, anomalies cannot be identified manually, and understanding the data sets in general is very difficult for most people. Our method can identify the anomalies and enable the people working with data to better understand their data and the source of the anomalies,” explains a researcher behind the development of the method, Arijit Khan, Associate Professor, Department of Computer Science, Aalborg University, Denmark.

Nodes and edges

Arijit Khan and colleagues are working to improve insight into graph data, a way of representing how the entities in large data sets are connected and interact.

For example, social media data can be stored as graph data. All people, images, comments, videos, links, groups, pages and events are nodes in the graph, and these nodes are connected by edges, which represent interactions between them.

For example, posting a photo on your profile creates an edge between the two nodes: the person and the photo.

Facebook stores its data as graph data, so the whole of Facebook can be seen as a collection of nodes and edges.

Blockchain technology works similarly for cryptocurrency transactions. The users are the nodes, and the transactions are the edges.
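To make the idea concrete, here is a minimal Python sketch, assuming a toy list of transactions: each user becomes a node, and each transaction becomes a directed edge from sender to receiver, stored in an adjacency list. The account names, amounts and the helper functions `build_graph` and `out_degree` are hypothetical and purely illustrative; this is not the researchers' published software.

```python
from collections import defaultdict

# Hypothetical toy data: (sender, receiver, amount) tuples.
transactions = [
    ("alice", "bob", 5.0),
    ("bob", "carol", 2.5),
    ("alice", "carol", 1.0),
    ("carol", "alice", 0.5),
]

def build_graph(txs):
    """Build a directed adjacency list: users are nodes, transactions are edges."""
    graph = defaultdict(list)
    for sender, receiver, amount in txs:
        # Each transaction adds an edge from sender to receiver, labelled with the amount.
        graph[sender].append((receiver, amount))
        # Make sure users who only receive funds still appear as nodes.
        graph.setdefault(receiver, [])
    return dict(graph)

def out_degree(graph):
    """Count how many transactions each user has sent (outgoing edges per node)."""
    return {node: len(edges) for node, edges in graph.items()}

if __name__ == "__main__":
    g = build_graph(transactions)
    print(out_degree(g))
```

Counting outgoing edges per node is one of the simplest structural properties that can be read off such a graph. The researchers' anomaly-detection methods are far more sophisticated and are not reproduced here.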

This also applies to data on the effects of drugs, with the drugs and the molecular structure being the nodes and the effects the edges.

Detecting signs of manipulation

Such data are at risk of manipulation. Social media data can be altered, and cryptocurrency can be stolen by forging transactions between accounts. Both are very difficult to detect.

The methods that Arijit Khan and colleagues have developed can precisely identify anomalies that may signal manipulation in large graph data sets.

The software uses scalable algorithms and artificial intelligence not only to identify anomalies or signs of manipulation in large graph data sets but also to explain them to the people working with the data.

“Artificial intelligence can find patterns and thereby anomalies in data sets, but the explanation is missing. You can therefore be told that something is problematic but not why. Our method not only identifies the anomalies but also explains the reason for the anomaly,” says Arijit Khan.

Identifying signs of tampering in real time

The researchers developed a method that can identify signs of manipulation in blockchain data.

In a study recently published in Frontiers in Blockchain, the researchers showed that the method could automatically and correctly analyse three verified cases of manipulation in blockchain-based currency trading.

The method could identify both suspicious transactions and the actors behind them.

“Blockchain technology provides a constant flow of data that is impossible to monitor manually. You need automated methods to evaluate data in real time and identify anomalies. When the anomalies are identified, an explanation also needs to follow, and our method does this,” explains Arijit Khan.

In another study, published in Proceedings of the ACM on Management of Data, the researchers showed that the same approach can identify drug properties associated with an increased risk of mutagenicity.

“We can show not only that the data indicate that a drug is associated with an increased risk of mutagenicity but also what in the molecular structure gives rise to this property,” says Arijit Khan.

He says that the researchers' methods can be adapted to data sets with different characteristics.

In addition, the researchers have made the methods freely available so that others can use them and develop them further.

“Data depth and core-based trend detection on blockchain transaction networks” has been published in Frontiers in Blockchain. The research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Novo Nordisk Foundation. “View-based explanations for graph neural networks” has been published in Proceedings of the ACM on Management of Data. The research was supported by grants from the National Natural Science Foundation of China, the Ningbo Yongjiang Talent Introduction Programme, the United States National Science Foundation and the Novo Nordisk Foundation.


© All rights reserved, Sciencenews 2020