Automated document classification process extracts information with a systematical analysis of the content of documents. This is an active research field of growing importance due to the large amount of electronic documents produced in the world wide web and made readily available thanks to diffused technologies including mobile ones. Several application areas benefit from automated document classification, including document archiving, invoice processing in business environments, press releases and search engines. Current tools classify or “tag” either text or images separately. In this paper we show how, by linking image and text-based contents together, a technology improves fundamental document management tasks like retrieving information from a database or automatically routing documents. We present a formal definition of pertinence and relevance concepts, that apply to those documents types we name “multimodal”. These are based on a model of conceptual spaces we believe compulsory for document investigation while using joint information sources coming from text and images forming complex documents.

A Multimodal Approach to Relevance and Pertinence of Documents

CRISTANI, Matteo;TOMAZZOLI, Claudio
2016-01-01

Abstract

Automated document classification process extracts information with a systematical analysis of the content of documents. This is an active research field of growing importance due to the large amount of electronic documents produced in the world wide web and made readily available thanks to diffused technologies including mobile ones. Several application areas benefit from automated document classification, including document archiving, invoice processing in business environments, press releases and search engines. Current tools classify or “tag” either text or images separately. In this paper we show how, by linking image and text-based contents together, a technology improves fundamental document management tasks like retrieving information from a database or automatically routing documents. We present a formal definition of pertinence and relevance concepts, that apply to those documents types we name “multimodal”. These are based on a model of conceptual spaces we believe compulsory for document investigation while using joint information sources coming from text and images forming complex documents.
2016
978-3-319-42006-6
document classification, bag of words
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/948160
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 15
  • ???jsp.display-item.citation.isi??? ND
social impact