Automated text restoration in ancient manuscripts: a deep learning approach for document image binarisation based on multispectral data

Daffara, Claudia
2025-01-01

Abstract

The development of automatic techniques for restoring handwritten text in ancient, degraded manuscripts is attracting growing interest in the field of book heritage. Typically, the task of interpreting the written text is left to expert philologists, who rely on “visible” features (seen with the naked eye or in digitised images) and on contextual information, which introduces subjectivity into the restoration process. Recent advances in machine learning, especially deep learning, have paved the way for new document analysis methods that can be integrated into fully automated systems capable of providing objective results. Of particular interest for text restoration are algorithms for document image binarisation, which separates the text from the background in an image. This paper presents a fully automated processing pipeline for document image binarisation that applies deep-learning-based segmentation to digitised images of ancient, degraded manuscripts. To generalise the approach, the method has been tested on image data from different sources, including open datasets and data acquired in the laboratory. The experimental datasets were acquired with the PhaseOne multispectral imaging console in the UV-VIS-NIR range. The spectral images are first combined to enhance the detection of manuscript features by improving the contrast of the written text against the background. A robust framework for handwritten text segmentation is then applied, enabling accurate segmentation without manual intervention. The integration of the multispectral data module into the pipeline is investigated on degraded manuscripts that are challenging because of faded ink and deterioration of the support. Once validated on different types of datasets, the proposed technique may become a valuable tool for philologists and conservators, saving time and enabling scalable analysis of large document collections.
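To make the binarisation step concrete, the following is a minimal sketch of the general idea only, not the pipeline described in the abstract. It makes several assumptions: the paper's deep-learning segmentation is replaced by classical Otsu global thresholding, the combination of spectral images is replaced by simply selecting the single band with the best text/background separation, the ink is assumed darker than the support, and all names (`otsu_threshold`, `binarise_best_band`, the band labels) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]  # rows of 8-bit grey levels (0-255)


def otsu_threshold(pixels: Sequence[int]) -> Tuple[int, float]:
    """Return (threshold, between-class variance) that maximises Otsu's criterion."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, 0.0
    w0 = 0    # number of pixels in the dark class (levels <= t)
    sum0 = 0  # sum of grey levels in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no split at this level
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t, best_var


def binarise_best_band(bands: Dict[str, Image]) -> Tuple[str, Image]:
    """Pick the band with the highest between-class variance and binarise it.

    Assumption: dark ink on a lighter support, so ink pixels map to 1.
    """
    best_name, best_t, best_var = "", 0, -1.0
    for name, img in bands.items():
        flat = [p for row in img for p in row]
        t, var = otsu_threshold(flat)
        if var > best_var:
            best_name, best_t, best_var = name, t, var
    mask = [[1 if p <= best_t else 0 for p in row] for row in bands[best_name]]
    return best_name, mask


if __name__ == "__main__":
    # Toy 2x4 "manuscript": faded ink in one band, clearer ink in another.
    toy_bands = {
        "band_a": [[120, 100, 120, 120], [120, 100, 100, 120]],
        "band_b": [[220, 30, 220, 220], [220, 30, 30, 220]],
    }
    name, mask = binarise_best_band(toy_bands)
    print(name)
    for row in mask:
        print(row)
```

A global threshold of this kind tends to fail on exactly the cases the abstract highlights (faded ink, uneven deterioration of the support), which is the motivation for learned segmentation and for combining several spectral bands rather than selecting one.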
2025
9781510690479
Historical Manuscripts, Handwriting, Restoration, Binarisation, Multispectral Imaging, Image Segmentation, Deep Learning


Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1173750