Graphs are a widely used structure for knowledge representation. Their uses range from biochemical to biomedical applications and are recently involved in multi-omics analyses. A key computational task regarding graphs is the search of specific topologies contained in them. The task is known to be NP-complete, thus indexing techniques are applied for dealing with its complexity. In particular, techniques exploiting paths extracted from graphs have shown good performances in terms of time requirements, but they still suffer because of the relatively large size of the produced index. We applied decision diagrams (DDs) as index data structure showing a good reduction in the indexing size with respect to other approaches. Nevertheless, the size of a DD is dependent on its variable order. Because the search of an optimal order is an NP-complete task, variable order heuristics on DDs are applied by exploiting domain-specific information. Here, we propose a heuristic based on the information content of the labeled paths. Tests on well-studied biological benchmarks, which are an essential part of multi-omics graphs, show that the resultant size correlates with the information measure related to the paths and that the chosen order allows to effectively reduce the index size.
An entropy heuristic to optimize decision diagrams for index-driven search in biological graph databases
Giugno R.;
2021-01-01
Abstract
Graphs are a widely used structure for knowledge representation. Their uses range from biochemical to biomedical applications and are recently involved in multi-omics analyses. A key computational task regarding graphs is the search of specific topologies contained in them. The task is known to be NP-complete, thus indexing techniques are applied for dealing with its complexity. In particular, techniques exploiting paths extracted from graphs have shown good performances in terms of time requirements, but they still suffer because of the relatively large size of the produced index. We applied decision diagrams (DDs) as index data structure showing a good reduction in the indexing size with respect to other approaches. Nevertheless, the size of a DD is dependent on its variable order. Because the search of an optimal order is an NP-complete task, variable order heuristics on DDs are applied by exploiting domain-specific information. Here, we propose a heuristic based on the information content of the labeled paths. Tests on well-studied biological benchmarks, which are an essential part of multi-omics graphs, show that the resultant size correlates with the information measure related to the paths and that the chosen order allows to effectively reduce the index size.File | Dimensione | Formato | |
---|---|---|---|
2021-Licheri-CIKM.pdf
accesso aperto
Tipologia:
Documento in Post-print
Licenza:
Creative commons
Dimensione
539.14 kB
Formato
Adobe PDF
|
539.14 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.