Advancing federated learning in biomedical applications: tools, best practices, and implementations

Riviera, Walter

The impact of Deep Learning (DL) and Artificial Intelligence (AI) in biomedicine has been transformative and far-reaching. These technologies have advanced a variety of biomedical fields by improving diagnostic accuracy, predicting patient outcomes, and personalizing treatments. As the need for large-scale data to train robust machine learning models has grown, data privacy and security concerns have become paramount, as demonstrated by the introduction of dedicated data protection laws around the world. Federated learning (FL) can help mitigate these concerns by allowing multiple institutions to collaboratively train models without sharing sensitive data. This approach keeps data local, thus preserving privacy, while still drawing on larger and more diverse datasets from different sources. Although the potential impact of FL is easy to appreciate, it remains a relatively new concept with growing research interest. In addition, the many degrees of heterogeneity that can occur in an FL pipeline (in data, models, and systems) have produced fragmented and parallel lines of investigation in the research community. As a result, most problems remain open, and best practices for implementing FL pipelines in real-life biomedical applications are still poorly established. The aim of this thesis is to accelerate the adoption of FL in real-life biomedical scenarios by advancing on two levels: (i) equipping the research and developer community with suitable tools and (ii) addressing the convergence and scalability issues of FL settings by validating them in different scenarios, including multimodal data.
The first goal was achieved through three contributions: (i.1) actively developing modules of an open-source software tool named OpenFL [187]; (i.2) providing a ranking of all open-source tools based on a proposed taxonomy of the key features required to implement FL pipelines; and (i.3) evaluating complex settings, such as a Vertical FL setup, on multimodal data and releasing the code for further use. The second goal was achieved through extensive validation of different FL settings (Horizontal and Vertical) using a multimodal real-life dataset, with different deep learning models distributed on a cluster of multiple machines to draw more realistic conclusions. With this thesis, we contributed to advancing FL research in the biomedical field by increasing awareness of the concept and its open challenges, stimulating adoption by simplifying entry points, and contributing to implementation in real-life environments to foster advancements in healthcare.
