VTKEL: A Resource for Visual-Textual-Knowledge Entity Linking

Rospocher, Marco
2020-01-01

Abstract

To understand the content of a document containing both text and pictures, an artificial agent needs to jointly recognize the entities shown in the pictures and mentioned in the text, and to link them to its background knowledge. This is a complex task, which we call Visual-Textual-Knowledge Entity Linking (VTKEL): linking visual and textual entity mentions to the corresponding entity (or a newly created one) in the agent's knowledge base. Solving the VTKEL task opens a wide range of opportunities for improving semantic visual interpretation. For instance, given the effectiveness and robustness of state-of-the-art NLP technologies for entity linking, automatically linking visual and textual mentions of the same entities to the ontology yields a huge amount of automatically annotated images with detailed categories. In this paper, we propose the VTKEL dataset, consisting of images and corresponding captions, in which both the visual and textual mentions are annotated with the corresponding entities, typed according to the YAGO ontology. The VTKEL dataset can be used for training and evaluating algorithms for visual-textual-knowledge entity linking.
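To make the annotation structure described above concrete, the following is a minimal illustrative sketch in Python of what a single cross-modal annotation could look like: one knowledge-base entity, typed with a YAGO class, linked to both a textual mention in a caption and a visual mention (bounding box) in the image. All class names, field names, and IRIs here are hypothetical assumptions for exposition, not the dataset's actual serialization format, which the paper defines.

# Illustrative sketch of a VTKEL-style annotation record.
# Field names, structure, and IRIs are assumptions, not the dataset's format.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TextualMention:
    caption_id: str                 # caption in which the mention occurs
    span: Tuple[int, int]           # character offsets of the mention
    surface: str                    # mention text as it appears in the caption

@dataclass
class VisualMention:
    image_id: str                           # image in which the mention occurs
    bbox: Tuple[int, int, int, int]         # bounding box (x, y, width, height)

@dataclass
class EntityAnnotation:
    entity_iri: str                 # knowledge-base entity (possibly newly created)
    yago_type: str                  # YAGO ontology type of the entity
    textual: List[TextualMention] = field(default_factory=list)
    visual: List[VisualMention] = field(default_factory=list)

# Example: the same dog mentioned in a caption and shown in the image,
# both linked to one entity typed with a YAGO-style class.
dog = EntityAnnotation(
    entity_iri="http://example.org/vtkel/img001#dog_1",      # hypothetical IRI
    yago_type="http://dbpedia.org/class/yago/Dog102084071",  # illustrative class IRI
    textual=[TextualMention("img001_cap0", (2, 5), "dog")],
    visual=[VisualMention("img001", (34, 58, 120, 96))],
)
print(dog.yago_type)

The key design point the sketch illustrates is that mentions from the two modalities are not linked to each other directly, but are both anchored to the same knowledge-base entity, whose YAGO type supplies the detailed category.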
ISBN: 9781450368667
Keywords: entity recognition and linking, AI, multimedia semantic annotation, knowledge representation, NLP and computer vision
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1014828
Citations
  • PMC: not available
  • Scopus: 9
  • Web of Science (ISI): 7