MADAM: Manuscript Annotated Dataset Based on Multispectral Imaging for Handwritten Text Enhancement and Restoration

Gazzani, Laura; Daffara, Claudia
2026-01-01

Abstract

Book heritage poses specific challenges for scientists: the object must be investigated in both its “textual” and “material” features, and the two aspects are intertwined, especially in degraded manuscripts. The written text is often not directly readable and can be recovered only by expert philologists, who rely on visible traits and contextual information. Recent advances in machine learning make it possible to build computer-aided systems that support philologists in this task. Such systems need annotated data from which AI algorithms can learn to recover the degraded information, but no dataset of this kind is available yet. To fill this gap, this paper introduces a dataset of high-resolution images of historical Italian manuscripts in different and severely degraded conditions, acquired with an optimized multispectral imaging setup in the UV-VIS-NIR range. For each image patch of the multispectral stack, both the transcription and the segmentation masks of the handwritten text are provided, making the dataset a valuable resource for developing and testing AI algorithms for text enhancement, detection, segmentation, and restoration.
2026
9783031983788
Manuscripts, multispectral imaging, handwriting, restoration, recognition

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1173747