MADAM: Manuscript Annotated Dataset Based on Multispectral Imaging for Handwritten Text Enhancement and Restoration

Carcagnì, Pierluigi; Del Coco, Marco; Leo, Marco; Gazzani, Laura; Malagodi, Marco; Paturzo, Melania; Daffara, Claudia

doi:10.1007/978-3-031-98379-5_43

Book heritage poses specific challenges for scientists: the object is investigated in both its “textual” and “material” features, and the two aspects are interlaced, especially in degraded manuscripts. The written text is often not directly readable and can be recovered only through the experience of expert philologists, who rely on visible traits and contextual information. Recent advances in machine learning make it possible to build a computer-aided system that helps philologists with this task. Such a system requires annotated data from which AI algorithms can learn to recover the degraded information; unfortunately, no such dataset is available yet. To fill this gap, this paper introduces a dataset of high-resolution images of historical Italian manuscripts in different, severely degraded conditions, acquired with an optimized multispectral imaging setup in the UV-VIS-NIR range. For each image patch of the multispectral stack, both the transcription and the segmentation mask of the handwritten text are provided, making the dataset a valuable resource for developing and testing AI algorithms for text enhancement, detection, segmentation, or restoration.
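
As a rough illustration of the per-patch annotation described above, the following sketch shows one way a sample could be held in memory: a set of spectral bands, a segmentation mask, and a transcription. The class name `PatchSample`, its fields, the band labels, and the toy values are all assumptions made for illustration; the abstract does not specify the dataset's actual file layout or schema.

```python
from dataclasses import dataclass
from typing import Dict, List

# Rows of pixel values. A real loader would more likely use arrays;
# plain lists keep this sketch dependency-free.
Image2D = List[List[float]]


@dataclass
class PatchSample:
    """One annotated patch (hypothetical layout, not the dataset's schema)."""
    bands: Dict[str, Image2D]   # assumed band label -> image of that band
    mask: List[List[int]]       # 1 where a pixel is handwritten text, else 0
    transcription: str          # text content of the patch

    def validate(self) -> None:
        """Check that every band has the same height and width as the mask."""
        h, w = len(self.mask), len(self.mask[0])
        for label, img in self.bands.items():
            if len(img) != h or any(len(row) != w for row in img):
                raise ValueError(f"band {label!r} does not match mask size {h}x{w}")

    def text_fraction(self) -> float:
        """Fraction of mask pixels labelled as text."""
        total = sum(len(row) for row in self.mask)
        return sum(sum(row) for row in self.mask) / total


# Toy 2x2 patch with two assumed bands; values are placeholders, not real data.
sample = PatchSample(
    bands={"UV": [[0.1, 0.9], [0.2, 0.8]], "NIR": [[0.3, 0.7], [0.4, 0.6]]},
    mask=[[0, 1], [0, 1]],
    transcription="et",
)
sample.validate()
print(sample.text_fraction())
```

Keeping the mask and transcription alongside the whole stack, rather than alongside a single band, mirrors the statement that annotations are given per patch of the multispectral stack; a segmentation or restoration model could then be trained on any subset of bands against the same targets.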