A Software Pipeline for the Reception of Italian Literature in Nineteenth-Century England. Preliminary Testing

Rebora, S.
2017-01-01

Abstract

This paper presents and discusses a project design aimed at producing synthetic and intuitive visualizations of the reception of Italian literature in nineteenth-century England. The first part describes a processing pipeline that combines software for optical character recognition (OCR), named entity recognition (NER), topic segmentation, and sentiment analysis. The second part offers a preliminary test of the project's feasibility: (1) by evaluating the quality of the candidate corpora (on a sample of 23 texts) and discussing methods for further improving the OCR; (2) by comparing the results obtained with free software (e.g. OpenNLP, ANNIE, NLTK, Stanford CoreNLP) on the sample corpus. The outcomes suggest that, while the project is feasible with already available resources, further training of the software on an annotated corpus may substantially improve the quality of the results.
2017
9781450352659
Literary historiography, comparative literature, text mining, software pipeline, project design, project testing

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/973445