Explainable Emotion Recognition Using Xception-Based Feature Extraction and Supervised Machine Learning on the RAVDESS Dataset

Buccoliero, Andrea;
2025-01-01

Abstract

Facial emotion recognition is a valuable tool in healthcare, providing insights into emotional well-being, developmental progress, and health-related behaviors. This study presents a novel framework integrating deep learning with explainable artificial intelligence (XAI) to enhance emotion recognition from video data. Using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the framework begins with preprocessing, where 3D face meshes with 478 landmarks are generated using MediaPipe, and regions of interest (ROI) are extracted. Data augmentation techniques, including rotation, scaling, and translation, improve dataset variability. Feature extraction is performed using a fine-tuned Xception deep convolutional neural network, followed by classification using supervised machine learning algorithms such as SVM, KNN, ensemble methods, and ANN. Among these, the Fine Gaussian SVM (FGSVM) achieved the highest performance, with 93.87% accuracy on both validation and test sets. The validation precision, recall, and F1-score were 94.06%, 93.79%, and 93.93%, respectively, while the test set recorded 94.01%, 93.74%, and 93.88%. To ensure interpretability, XAI techniques such as Grad-CAM, LIME, occlusion sensitivity, and SHAP highlight crucial facial landmarks and temporal frames influencing predictions. This study underscores the potential of combining deep learning with XAI to enhance reliability in healthcare applications, improving clinical decision-making, mental health monitoring, and human-computer interaction. A Python-based implementation of the proposed framework is available at DOI: 10.5281/zenodo.14809940.
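Of the XAI techniques listed, occlusion sensitivity is the simplest to illustrate: a patch is slid over the input image, and the drop in the model's confidence at each position reveals which regions drive the prediction. The sketch below is a minimal, framework-agnostic version in NumPy; the function names (`occlusion_sensitivity`, `toy_predict`) and the toy model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def occlusion_sensitivity(image, predict_fn, target_class, patch=8, stride=8, fill=0.0):
    """Slide an occluding patch over `image` and record how much the model's
    confidence in `target_class` drops at each patch position.

    Larger values in the returned heat map mark regions the model relies on.
    """
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    base = predict_fn(image)[target_class]  # unoccluded confidence
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = base - predict_fn(occluded)[target_class]
    return heat

# Toy stand-in for a classifier: confidence in class 0 is the mean
# brightness of the top-left 16x16 quadrant.
def toy_predict(img):
    score = img[:16, :16].mean()
    return np.array([score, 1.0 - score])

img = np.zeros((32, 32))
img[:16, :16] = 1.0  # the "informative" region the toy model looks at
heat = occlusion_sensitivity(img, toy_predict, target_class=0)
# Occluding the bright top-left quadrant causes the only confidence drops,
# so the heat map is nonzero exactly over that region.
```

In the paper's setting, `predict_fn` would wrap the Xception-plus-FGSVM pipeline and `image` would be a face ROI frame; the resulting heat map can then be overlaid on the frame to highlight the facial regions that influenced the predicted emotion.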
Year: 2025
ISBN: 9798331523480
Keywords: Support vector machines, Deep learning, Emotion recognition, Three-dimensional displays, Sensitivity, Explainable AI, Face recognition, Medical services, Feature extraction, Data augmentation
Files in this record:
File: Explainable Emotion Recognition Using Xception-Based Feature Extraction and Supervised Machine Learning on the RAVDESS Dataset.pdf (authorized users only)
Description: full-text paper
Type: Publisher's version
License: Publisher's copyright
Size: 1.69 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1168330