Toward in-silico data assessment for passive BCIs: generating EEG rhythms with GANs
Cinquetti, Ettore;Menegaz, Gloria;Storti, Silvia F.
2025-01-01
Abstract
Objective. Passive brain–computer interfaces (BCIs) based on electroencephalography (EEG) have gained traction as a reliable method for monitoring human vigilance in attention-demanding critical contexts. Unfortunately, the lack of extensive public datasets hampers artificial intelligence (AI) research. To address this issue, we augmented two EEG datasets using generative adversarial networks (GANs). Furthermore, we defined a quality-assessment pipeline to overcome the absence of a single agreed-upon method for testing synthetic data.
Approach. Using GANs, we augmented a publicly available resting-state EEG dataset and a custom sustained attention to response task dataset simulating activity during repetitive tasks. After extracting relevant time-variant rhythms via the continuous wavelet transform, we quantitatively compared the synthetic data with the real data using the L2 distance and the cross-correlation function. To evaluate the impact of data augmentation, we trained six forecasting models, three on the original and three on the augmented datasets, over the whole, half, and a quarter of the total available data, and compared improvements in mean absolute error (MAE) and symmetric mean absolute percentage error (SMAPE). To study the forecaster's embeddings, we computed a metric inspired by the Fréchet inception distance (FID) between latent values of real and synthetic data. Finally, to offer a baseline comparison, we extended the performance and embedding analysis to data generated by a simple linear interpolation method.
Main results. The integration of GAN-produced synthetic data improved signal prediction, as evidenced by MAE reductions of 29.0%, 46.4%, and 37.4% for the splits of the resting-state dataset, and by average MAE reductions of 15.4% and 21.2% for the 100% and 50% splits, with an increase of approximately 2.5% for the 25% split. Conversely, training on interpolated data yielded worse performance and extremely small FID-inspired distances with respect to real signals, a sign of overspecialization.
Significance. This study contributes a reproducible and complete framework for EEG signal generation and evaluation, addressing one of the main barriers to scalable AI application in BCI.
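The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the SMAPE variant (0 to 200% scale, zero terms set to 0), and the use of Gaussian fits to embedding sets for the Fréchet-style distance are assumptions made here for illustration only.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between a target and a forecast."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (one common variant, range 0 to 200)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    num = 2.0 * np.abs(y_true - y_pred)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Assumption: a term where both values are zero contributes 0.
    ratio = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return float(100.0 * np.mean(ratio))


def relative_reduction(baseline, augmented):
    """Percent reduction of an error metric relative to the baseline."""
    return 100.0 * (baseline - augmented) / baseline


def frechet_distance(emb_a, emb_b):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (n_samples, n_features).
    """
    emb_a, emb_b = np.asarray(emb_a, float), np.asarray(emb_b, float)
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(emb_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(emb_b, rowvar=False))
    # tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    dist = (np.sum((mu_a - mu_b) ** 2)
            + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
    return float(max(dist, 0.0))  # clamp tiny negative round-off
```

Under these assumptions, a distance near zero means the two embedding sets share almost the same mean and covariance. That is the situation the abstract flags for interpolated data, where very small distances to real signals coincide with worse forecasting performance.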



