RESEARCH OUTPUT CATALOGUE

To decipher the text of damaged manuscripts using state-of-the-art AI methods such as Deep Learning, a training phase with an annotated dataset is required. We have developed an optimized dataset to assess the potential of AI in different book heritage applications, ranging from text recovery to diagnostic analysis, using high-resolution multispectral imaging. This dataset is continuously expanded with image stacks acquired on ongoing case studies; here, as exemplary cases, a damaged notebook by a 20th- century Italian writer and a manuscript by a 19th-century Italian intellectual. The annotation process must yield precise and detailed labels for the text and for each individual handwritten character, thereby providing AI models with suitable input for training. Unlike standard annotation approaches, which rely primarily on transcribing the text, we propose a different method based on instance segmentation of each character in the manuscript. Masks are traced to follow the exact shape of each character, each assigned to its own distinct class. Specific annotation criteria were defined in cross-disciplinary collaboration with philologists, physicists, and AI experts to maximize the system’s potential for accurate handwriting recognition in degraded materials.

Artificial intelligence vs human handwriting: annotating damaged manuscripts

Dumitru Scutelnic; Laura Gazzani; Paolo Pellegrini; Claudia Daffara

2025-01-01

Abstract

Deciphering the text of damaged manuscripts with state-of-the-art AI methods such as deep learning requires a training phase on an annotated dataset. We have developed an optimized dataset, based on high-resolution multispectral imaging, to assess the potential of AI in a range of book heritage applications, from text recovery to diagnostic analysis. The dataset is continuously expanded with image stacks acquired in ongoing case studies; here we present two exemplary cases: a damaged notebook by a 20th-century Italian writer and a manuscript by a 19th-century Italian intellectual. The annotation process must yield precise, detailed labels for the text and for each individual handwritten character, so that AI models receive suitable input for training. Unlike standard annotation approaches, which rely primarily on transcribing the text, we propose a method based on instance segmentation of each character in the manuscript. Each mask is traced to follow the exact shape of a character and is assigned to its own distinct class. Specific annotation criteria were defined in a cross-disciplinary collaboration between philologists, physicists, and AI experts to maximize the system's potential for accurate handwriting recognition in degraded materials.
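The per-character annotation idea can be illustrated with a minimal sketch. It assumes that each handwritten character is stored as a traced polygon outline plus a character label; the `CharAnnotation` record, the `classes` mapping and the `rasterize` helper are hypothetical illustrations, not the authors' annotation tool or file format. The outlines are rasterized into an instance map (one id per traced character) and a class map (one id per character class), i.e. the instance-level and semantic-level views of the same masks.

```python
from dataclasses import dataclass


@dataclass
class CharAnnotation:
    """Hypothetical record for one traced character (not the authors' schema)."""
    label: str                           # character class, e.g. "a"
    polygon: list[tuple[float, float]]   # traced outline vertices (x, y) in pixels


def point_in_polygon(x: float, y: float, poly: list[tuple[float, float]]) -> bool:
    """Even-odd (ray casting) test: is point (x, y) inside the closed polygon?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(annotations, width, height, classes):
    """Build instance and class maps (0 = background) from traced outlines.

    Each pixel is sampled at its centre. Where outlines overlap, the later
    annotation overwrites the earlier one (a simplifying assumption).
    """
    instance_map = [[0] * width for _ in range(height)]
    class_map = [[0] * width for _ in range(height)]
    for inst_id, ann in enumerate(annotations, start=1):
        cls_id = classes[ann.label]
        for row in range(height):
            for col in range(width):
                if point_in_polygon(col + 0.5, row + 0.5, ann.polygon):
                    instance_map[row][col] = inst_id
                    class_map[row][col] = cls_id
    return instance_map, class_map


# Usage: two hypothetical characters on a tiny 6 x 4 pixel crop.
classes = {"a": 1, "b": 2}  # hypothetical character-to-class mapping
annotations = [
    CharAnnotation("a", [(1, 1), (3, 1), (3, 3), (1, 3)]),
    CharAnnotation("b", [(4, 0), (6, 0), (6, 2), (4, 2)]),
]
instance_map, class_map = rasterize(annotations, width=6, height=4, classes=classes)
for row in class_map:
    print(row)
```

Sampling at pixel centres with an even-odd test keeps the sketch dependency-free; a production pipeline working on high-resolution multispectral stacks would more likely use an optimized rasterizer and store masks in a standard annotation format.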


Year: 2025

Conference proceedings ISBN: 978-88-942535-9-7

Keywords: multispectral imaging, damaged manuscripts, characters annotation, semantic segmentation, deep learning

Appears in type: 04.01 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1173767
