A Key Performance Indicator to Analyze Swarm Learning Performances with EHR
Mantovani, Matteo;Scheda, Riccardo;Combi, Carlo;
2024-01-01
Abstract
Swarm Learning (SL) has recently been proposed for distributed learning, where a group of individual centers performs synchronized training. Unlike traditional machine learning approaches that rely on a central server, swarm learning distributes the learning process across multiple nodes. Each node independently processes its own data and contributes to the overall learning task, so the swarm benefits from the different data held by the individual nodes. Unlike federated learning, model parameters are not handled by a central server but by the individual nodes themselves, chosen at random. The intrinsic attention of swarm learning to data privacy makes it suitable for distributed health care analysis, where a clinical center wants to benefit from all the other centers in the swarm network. However, the benefit for a single center or for the whole network may vary depending on how the data are distributed. In this paper, we analyze the performance of swarm learning in a network with multiple nodes under different data distribution scenarios. The analysis shows the gain of the whole swarm network and of a specific (reference) node, focusing on scenarios where this node has a different amount of data than the other nodes. To make this analysis more rigorous, we introduce a new Key Performance Indicator (KPI) to measure this gain. We then apply the method to ICU data extracted from the MIMIC EHR database and discuss the results obtained with 5 nodes under different data distribution scenarios.