CATALOGO DEI PRODOTTI DELLA RICERCA

The body of biomedical literature is growing at an unprecedented rate, exceeding the ability of researchers to make effective use of this knowledge-rich amount of information. This growth has created interest in biomedical relation extraction approaches to extract domain-specific knowledge for diverse applications. Despite the great progress in the techniques, the retrieved evidence still needs to undergo a time-consuming manual curation process to be truly useful. Most relation extraction systems have been conceived in the context of Shared Tasks, with the goal of maximizing the F1 score on restricted, domain-specific test sets. However, in industrial applications relations typically serve as input to a pipeline of biologically driven analyses; as a result, highly precise extractions are central for cutting down the manual curation effort, thus to translate the research evidence into practice smoothly and reliably. In this paper, we present a highly precise relation extraction system designed to reduce human curation efforts. The engine is made up of sophisticated rules that leverage linguistic aspects of the texts rather than sticking on application-specific training data. As a result, the system could be applied to diverse needs. Experiments on gold-standard corpora show that the system achieves the highest precision compared with previous rule-based, kernel-based, and neural approaches, while maintaining a F1 score comparable or superior to other methods. To show the usefulness of our approach in industrial scenarios, we finally present a case study on the mTOR pathway, showing how it could be applied on a large-scale.

High-Precision Biomedical Relation Extraction for Reducing Human Curation Efforts in Industrial Applications

Alan Ramponi;Stefano Giampiccolo;Danilo Tomasoni;Corrado Priami;Rosario Lombardo

2020-01-01

Abstract

The body of biomedical literature is growing at an unprecedented rate, exceeding the ability of researchers to make effective use of this knowledge-rich amount of information. This growth has created interest in biomedical relation extraction approaches to extract domain-specific knowledge for diverse applications. Despite the great progress in the techniques, the retrieved evidence still needs to undergo a time-consuming manual curation process to be truly useful. Most relation extraction systems have been conceived in the context of Shared Tasks, with the goal of maximizing the F1 score on restricted, domain-specific test sets. However, in industrial applications relations typically serve as input to a pipeline of biologically driven analyses; as a result, highly precise extractions are central for cutting down the manual curation effort, thus to translate the research evidence into practice smoothly and reliably. In this paper, we present a highly precise relation extraction system designed to reduce human curation efforts. The engine is made up of sophisticated rules that leverage linguistic aspects of the texts rather than sticking on application-specific training data. As a result, the system could be applied to diverse needs. Experiments on gold-standard corpora show that the system achieves the highest precision compared with previous rule-based, kernel-based, and neural approaches, while maintaining a F1 score comparable or superior to other methods. To show the usefulness of our approach in industrial scenarios, we finally present a case study on the mTOR pathway, showing how it could be applied on a large-scale.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2020
			
	Parole chiave
	
				Biomedical text mining
information extraction
natural language processing
relation extraction
syntactic dependencies
			
	Appare nelle tipologie:
	
				01.01 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
2019_High-Precision_Relation_Extraction.pdf accesso aperto Tipologia: Versione dell'editore Licenza: Creative commons Dimensione 2.03 MB Formato Adobe PDF Visualizza/Apri	2.03 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/1144789

Citazioni

ND

5

5

social impact