Mining NMR Spectroscopy Using Topic Models

M. Bicego;P. Lovato;M. De Bona;F. Guzzo;M. Assfalg
2018-01-01

Abstract

Pattern recognition techniques have been successfully exploited for the biomedical analysis of NMR spectra. In this context, it is crucial to derive a suitable representation for the data: among others, a successful line of research exploits the Bag of Words representation (called here 'Bag of Peaks'). Despite its success, however, the Bag of Peaks paradigm has not been fully explored: for example, appropriate probabilistic models (such as topic models) can further distill the information contained in the Bag of Words, allowing for more interpretable and accurate solutions for the task at hand. This paper aims to fill this gap by investigating the usefulness of topic models in the analysis of NMR spectra. First, we introduce an unsupervised approach, based on topic models, that performs soft biclustering of NMR spectra; this kind of unsupervised analysis is new in the NMR literature. Second, we show that descriptors extracted from topic models can be successfully employed for the classification of NMR samples: compared to the original Bag of Words, our descriptors provide higher accuracies. Finally, we perform an empirical evaluation involving a complex dataset of spectra derived from fruits and two datasets of medical NMR spectra: our analysis confirms the suitability of such models for NMR spectra analysis.
pattern recognition, bioinformatics

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/992373