A labelled multi-relational network (or labelled multigraph, for short) is one in which nodes have labels and a pair of nodes may be connected by an edge with one or more labels. For example, in an airline route database, 'large European city' may be the label on the Paris node and 'large Asian city' may be the label on the New Delhi node and the edge between the two cities may be labelled by several carriers. This article presents an analytical method to compute the p-values of labelled subgraph (sub-network) motifs in such labelled multi-relational networks (multigraphs). The method (and a fast approximation to the method) works for both directed and undirected graphs and extends to large subgraphs. We have validated these methods on a dataset of medium size real networks (up to tens of thousands of nodes and hundreds of thousands of edges) of different types (biological, infrastructural and collaboration networks). The pure analytical model is faster than a randomized simulation model by a factor of approximately 1000 in most of our experiments. This improvement in performance is greater for larger graphs. The approximate analytical model avoids the calculations of statistical variance and achieves nearly the same precision and recall as the pure analytical model while being several times faster. To test the scalability of our methods, we run our algorithms on synthetic and real datasets from protein-protein interaction networks, airline flight paths, the internet infrastructural network and the IMDB movie network. We also illustrate a use case of this form of analysis on a large relationship network of people involved in the Panama papers scandal, retrieving frequently used money laundering patterns. labelled multigraphs motif enumeration; motif statistical significance; random network models; multi-relational networks; multigraphs.

Fast methods for finding significant motifs on labelled multi-relational networks

Giugno Rosalba
;
2019-01-01

Abstract

A labelled multi-relational network (or labelled multigraph, for short) is one in which nodes have labels and a pair of nodes may be connected by an edge with one or more labels. For example, in an airline route database, 'large European city' may be the label on the Paris node and 'large Asian city' may be the label on the New Delhi node and the edge between the two cities may be labelled by several carriers. This article presents an analytical method to compute the p-values of labelled subgraph (sub-network) motifs in such labelled multi-relational networks (multigraphs). The method (and a fast approximation to the method) works for both directed and undirected graphs and extends to large subgraphs. We have validated these methods on a dataset of medium size real networks (up to tens of thousands of nodes and hundreds of thousands of edges) of different types (biological, infrastructural and collaboration networks). The pure analytical model is faster than a randomized simulation model by a factor of approximately 1000 in most of our experiments. This improvement in performance is greater for larger graphs. The approximate analytical model avoids the calculations of statistical variance and achieves nearly the same precision and recall as the pure analytical model while being several times faster. To test the scalability of our methods, we run our algorithms on synthetic and real datasets from protein-protein interaction networks, airline flight paths, the internet infrastructural network and the IMDB movie network. We also illustrate a use case of this form of analysis on a large relationship network of people involved in the Panama papers scandal, retrieving frequently used money laundering patterns. labelled multigraphs motif enumeration; motif statistical significance; random network models; multi-relational networks; multigraphs.
2019
AI algorithms, Network mining, Graph mining
File in questo prodotto:
File Dimensione Formato  
cnz008.pdf

solo utenti autorizzati

Tipologia: Versione dell'editore
Licenza: Creative commons
Dimensione 2.7 MB
Formato Adobe PDF
2.7 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1031083
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 0
social impact