VBSRL: A Semantic Frame-Based Approach for Data Extraction from Unstructured Business Documents

Scannapieco, Simone; Tomazzoli, Claudio
2021

Abstract

The definition of alternative processing techniques for business documents inevitably runs up against long-standing issues arising from the unstructured nature of most business-related information. In particular, increasingly refined methods for automated data extraction have been investigated over the years. The latest frontier in this sense is Semantic Role Labeling (SRL), which extracts relevant information based purely on the overall meaning of sentences. It does so by mapping the specific situations described in a text onto more general scenarios (semantic frames). FrameNet originated as a semantic frame repository by applying SRL techniques to large textual corpora, but adapting it to languages other than English has proven difficult. In this paper, we introduce a new implementation of SRL for information extraction, called Verb-Based SRL (VBSRL). VBSRL relies on a different conceptual theory from the field of natural language understanding, one that is language-independent and places verbs at the center of the abstraction from real-life situations.
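The abstract does not describe VBSRL's internals, so the following is only a rough illustration of what verb-driven frame mapping can look like, assuming a scheme in the spirit of Schank's Conceptual Dependency primitives (ATRANS for transfer of possession, PTRANS for physical transfer, MTRANS for transfer of information), to which the "Schank analysis" keyword presumably refers. The names `VERB_LEXICON`, `Frame`, and `label_clause` are hypothetical, and the clause is assumed to be already parsed into a verb lemma plus ordered arguments. This is a sketch, not the authors' method. It shows how the frame stays language-independent: only the verb lexicon is language-specific.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical lexicon: each verb lemma maps to a Schank-style primitive act
# and the ordered role slots its frame expects. Verbs from different
# languages (here English and Italian) share the same primitive and slots.
VERB_LEXICON = {
    "pay":     ("ATRANS", ("actor", "object", "recipient")),
    "pagare":  ("ATRANS", ("actor", "object", "recipient")),
    "sell":    ("ATRANS", ("actor", "object", "recipient")),
    "ship":    ("PTRANS", ("actor", "object", "destination")),
    "spedire": ("PTRANS", ("actor", "object", "destination")),
    "notify":  ("MTRANS", ("actor", "object", "recipient")),
}


@dataclass
class Frame:
    primitive: str                      # language-independent act, e.g. "ATRANS"
    verb: str                           # the surface verb lemma that triggered it
    roles: dict = field(default_factory=dict)


def label_clause(verb: str, arguments: list[str]) -> Frame | None:
    """Map a pre-parsed clause onto a frame by looking up its verb.

    `arguments` are assumed to be in slot order (actor, object, third role).
    Returns None when the verb is not covered by the lexicon.
    """
    entry = VERB_LEXICON.get(verb.lower())
    if entry is None:
        return None
    primitive, slots = entry
    # zip stops at the shorter sequence, so missing arguments leave slots unfilled
    roles = dict(zip(slots, arguments))
    return Frame(primitive, verb.lower(), roles)


if __name__ == "__main__":
    en = label_clause("pay", ["Acme Ltd", "EUR 1200", "Beta SpA"])
    it = label_clause("pagare", ["Acme Ltd", "EUR 1200", "Beta SpA"])
    print(en)
    print(en.primitive == it.primitive and en.roles == it.roles)
```

Keying the lexicon on verbs alone keeps the design deliberately minimal; a realistic system would also need parsing, lemmatization, and disambiguation of verbs whose meaning depends on their arguments.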
ISBN: 978-3-030-80118-2
Keywords: Frame semantics; Natural language processing; Schank analysis; Semantic role labeling
Files in this item:
2021CC-VBSRL-ASemanticFrameBasedApproachforDataExtractionfromUnstructuredBusinessDocuments.pdf (Adobe PDF, 450.32 kB)
Access: authorized users only
License: Restricted access

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1049420