Artificial intelligence vs human handwriting: annotating damaged manuscripts

Dumitru Scutelnic; Laura Gazzani; Paolo Pellegrini; Claudia Daffara
2025-01-01

Abstract

Deciphering the text of damaged manuscripts with state-of-the-art AI methods such as deep learning requires a training phase on an annotated dataset. We have developed an optimized dataset to assess the potential of AI in different book heritage applications, ranging from text recovery to diagnostic analysis, using high-resolution multispectral imaging. This dataset is continuously expanded with image stacks acquired in ongoing case studies; here we present two exemplary cases, a damaged notebook by a 20th-century Italian writer and a manuscript by a 19th-century Italian intellectual. The annotation process must yield precise and detailed labels for the text and for each individual handwritten character, thereby providing AI models with suitable input for training. Unlike standard annotation approaches, which rely primarily on transcribing the text, we propose a method based on instance segmentation of each character in the manuscript. Masks are traced to follow the exact shape of each character, with each character assigned to its own distinct class. Specific annotation criteria were defined in cross-disciplinary collaboration with philologists, physicists, and AI experts to maximize the system’s potential for accurate handwriting recognition in degraded materials.
2025
978-88-942535-9-7
multispectral imaging, damaged manuscripts, characters annotation, semantic segmentation, deep learning
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1173767