Explainable Emotion Recognition Using Xception-Based Feature Extraction and Supervised Machine Learning on the RAVDESS Dataset
Buccoliero, Andrea;
2025-01-01
Abstract
Facial emotion recognition is a valuable tool in healthcare, providing insights into emotional well-being, developmental progress, and health-related behaviors. This study presents a novel framework integrating deep learning with explainable artificial intelligence (XAI) to enhance emotion recognition from video data. Using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the framework begins with preprocessing, where 3D face meshes with 478 landmarks are generated using MediaPipe and regions of interest (ROI) are extracted. Data augmentation techniques, including rotation, scaling, and translation, improve dataset variability. Feature extraction is performed using a fine-tuned Xception deep convolutional neural network, followed by classification using supervised machine learning algorithms such as SVM, KNN, ensemble methods, and ANN. Among these, the Fine Gaussian SVM (FGSVM) achieved the highest performance, with 93.87% accuracy on both the validation and test sets. The validation precision, recall, and F1-score were 94.06%, 93.79%, and 93.93%, respectively, while the test set recorded 94.01%, 93.74%, and 93.88%. To ensure interpretability, XAI techniques such as Grad-CAM, LIME, occlusion sensitivity, and SHAP highlight the facial landmarks and temporal frames that most influence predictions. This study underscores the potential of combining deep learning with XAI to enhance reliability in healthcare applications, improving clinical decision-making, mental health monitoring, and human-computer interaction. A Python-based implementation of the proposed framework is available at DOI: 10.5281/zenodo.14809940.
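To make the pipeline in the abstract concrete, the following is a minimal Python sketch of its core steps: face-mesh ROI extraction, Xception feature extraction, and Gaussian-SVM classification. It is an illustration under stated assumptions, not the authors' code: it uses MediaPipe's FaceMesh solution API and a frozen ImageNet-pretrained Keras Xception (the paper fine-tunes the network), a plain bounding-box crop stands in for the paper's ROI definition, and the helper names `extract_face_roi` and `xception_features` are hypothetical. The released implementation is available at the Zenodo DOI above.

```python
# Minimal sketch, not the authors' implementation. Assumptions: MediaPipe's
# FaceMesh solution for the 478-landmark mesh, a frozen ImageNet Xception as
# the feature extractor (the paper fine-tunes it), a bounding-box crop as a
# stand-in for the paper's ROI extraction, and an RBF-kernel SVC approximating
# the "Fine Gaussian SVM".
import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.applications import Xception
from tensorflow.keras.applications.xception import preprocess_input
from sklearn.svm import SVC

def extract_face_roi(frame_bgr):
    """Run the 478-landmark face mesh and crop a bounding box around it."""
    with mp.solutions.face_mesh.FaceMesh(
        static_image_mode=True, max_num_faces=1, refine_landmarks=True
    ) as mesh:  # refine_landmarks=True yields 478 landmarks (468 + 10 iris)
        result = mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return None  # no face detected in this frame
    h, w = frame_bgr.shape[:2]
    pts = np.array([(lm.x * w, lm.y * h)
                    for lm in result.multi_face_landmarks[0].landmark])
    x0, y0 = np.maximum(pts.min(axis=0).astype(int), 0)
    x1, y1 = pts.max(axis=0).astype(int)
    return frame_bgr[y0:y1, x0:x1]

# Xception backbone with global average pooling: one 2048-d vector per frame.
backbone = Xception(weights="imagenet", include_top=False, pooling="avg")

def xception_features(roi_bgr):
    """Resize the ROI to Xception's 299x299 input and extract features."""
    x = cv2.resize(roi_bgr, (299, 299)).astype("float32")
    x = preprocess_input(np.expand_dims(x, axis=0))  # scales pixels to [-1, 1]
    return backbone.predict(x, verbose=0)[0]

# An RBF ("Gaussian") SVM; a small kernel scale (large gamma) is what makes a
# Gaussian SVM "fine" in MATLAB Classification Learner terminology.
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
# clf.fit(train_features, train_labels)  # stacked per-frame feature vectors
# pred = clf.predict(xception_features(extract_face_roi(frame))[None, :])
```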
| File | Description | Type | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| Explainable Emotion Recognition Using Xception-Based Feature Extraction and Supervised Machine Learning on the RAVDESS Dataset.pdf | Full-text paper | Publisher's version | Publisher copyright | 1.69 MB | Adobe PDF | Authorized users only (request a copy) |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.