Toward in-silico data assessment for passive BCIs: generating EEG rhythms with GANs

Cinquetti, Ettore;Menegaz, Gloria;Storti, Silvia F.
2025-01-01

Abstract

Objective. Passive brain–computer interfaces (BCIs) based on electroencephalography (EEG) have gained traction as a reliable method for monitoring human vigilance in attention-demanding critical contexts. Unfortunately, the lack of extensive public datasets hampers artificial intelligence (AI) research. To address this issue, we augmented two EEG datasets using generative adversarial networks (GANs). Furthermore, we defined a quality-assessment pipeline to overcome the absence of a single agreed-upon method for testing synthetic data.
Approach. Using GANs, we augmented a publicly available resting-state EEG dataset and a custom sustained attention to response task dataset simulating activity during repetitive tasks. After extracting relevant time-varying rhythms via the continuous wavelet transform, we quantitatively compared synthetic and real data using the L2 distance and the cross-correlation function. To evaluate the impact of data augmentation, we trained six forecasting models, three on the original and three on the augmented datasets, using the whole, half and a quarter of the available data, and compared improvements in mean absolute error (MAE) and symmetric mean absolute percentage error (SMAPE). To study the forecaster's embeddings, we computed a metric inspired by the Fréchet inception distance (FID) between latent values of real and synthetic data. Finally, to offer a baseline comparison, we extended the performance and embedding analyses to data generated by a simple linear interpolation method.
Main results. The integration of GAN-produced synthetic data improved signal prediction, as evidenced by MAE reductions of 29.0%, 46.4% and 37.4% for the splits of the resting-state dataset, and by average MAE reductions of 15.4% and 21.2% for the 100% and 50% splits, with a ~2.5% increase for the 25% split. Conversely, training on interpolated data yielded worse performance and extremely small FID distances with respect to real signals, a sign of overspecialization.
Significance. This study contributes a reproducible and complete framework for EEG signal generation and evaluation, addressing one of the main barriers to scalable AI application in BCI.
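
The abstract names several comparison measures (L2 distance, cross-correlation, MAE, SMAPE) without giving formulas. The following is a minimal sketch of how such measures are commonly computed, not the authors' implementation. All function names are hypothetical, the SMAPE variant (mean of 2|y - ŷ| / (|y| + |ŷ|), in percent) and the mean-removed, energy-normalized cross-correlation are assumptions, and the numbers in the demo are toy values unrelated to the reported results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant; terms with a zero
    denominator contribute zero)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        if denom > 0:
            total += 2.0 * abs(t - p) / denom
    return 100.0 * total / len(y_true)


def relative_reduction(baseline, augmented):
    """Percent reduction of an error metric relative to the baseline."""
    return 100.0 * (baseline - augmented) / baseline


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_xcorr(a, b, lag=0):
    """Cross-correlation at one lag, after mean removal, normalized by
    the energies of both signals (assumed normalization)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a0 = [x - mean_a for x in a]
    b0 = [y - mean_b for y in b]
    norm = math.sqrt(sum(x * x for x in a0) * sum(y * y for y in b0))
    if norm == 0:
        return 0.0
    acc = sum(a0[i] * b0[i + lag]
              for i in range(len(a0)) if 0 <= i + lag < len(b0))
    return acc / norm


if __name__ == "__main__":
    # Toy signals for illustration only; not data from the study.
    real = [1.0, 2.0, 3.0, 4.0]
    pred_original = [1.5, 2.5, 2.0, 4.5]    # forecaster trained on real data
    pred_augmented = [1.25, 2.25, 2.5, 4.25]  # trained on augmented data
    base = mae(real, pred_original)
    aug = mae(real, pred_augmented)
    print("MAE reduction (%):", relative_reduction(base, aug))
    print("SMAPE (%):", smape(real, pred_augmented))
    print("L2:", l2_distance(real, pred_augmented))
    print("xcorr at lag 0:", normalized_xcorr(real, pred_augmented))
```

A positive value from `relative_reduction` corresponds to a reported "reduction"; a negative value corresponds to an increase in error, as in the 25% split mentioned above.
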
Keywords: passive BCI, EEG, AI, deep learning, GAN, data augmentation

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1194427