Revolutionizing access to book heritage: HTR systems between Digital Humanities and information science

Stefano Bazzaco
2024-01-01

Abstract

The present work offers a state-of-the-art review of recent developments in the automatic transcription of historical printed documents and manuscripts with HTR (Handwritten Text Recognition) systems, focusing primarily on the recent creation of general HTR models. First, it explains the main characteristics of the most widespread tools and the workflow for generating text recognition models. Second, it presents a significant sample of the models currently available, with particular attention to the production process, the criteria adopted and the evaluation of the results, drawing on the experience gained by the Progetto Mambrino research group of the University of Verona. Finally, it outlines some future research directions for the creation and dissemination of these resources, emphasizing the need for greater synergy between academia, computing experts and memory institutions.
2024
Keywords: Handwritten Text Recognition (HTR), general models, Progetto Mambrino, information science, digital scholarly edition
Files in this item:
25392-Texto del artículo-133868-1-10-20241128.pdf (open access; type: publisher's version; license: Creative Commons; 1.48 MB, Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1147687