A Multimodal Approach to exploit similarity in documents

CRISTANI, Matteo; TOMAZZOLI, Claudio
2014-01-01

Abstract

Automated document classification extracts information through a systematic analysis of document content. It is an active research field of growing importance because of the large number of electronic documents produced on the World Wide Web and made available by widespread technologies, including mobile ones. Several application areas benefit from automated document classification, including document archiving, invoice processing in business environments, press releases, and search engines. Current tools classify or "tag" either text or images, but handle the two separately. In this paper we show how linking image and text-based content together improves fundamental document management tasks such as retrieving information from a database or automated documents. We investigate a model of conceptual spaces that uses joint information sources from the text and the images forming complex documents. We present a formal model, the computable algorithms, and the dataset from which we took a subset for our experiments, together with the related tests and results.
ISBN: 978-3-319-07454-2
Keywords: document classification, taxonomy, ontology, clustering, statistical natural language processing

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/906384