Building a liver single-cell transcriptomic reference panel to empower bulk RNA-seq deconvolution

Gallinaro Martina
2026-01-01

Abstract

Complex tissues are composed of diverse cell types whose abundance and interactions critically influence physiology and disease. Ignoring this heterogeneity can bias molecular analyses and obscure disease mechanisms. Experimental approaches that profile samples at single-cell resolution remain costly and limited, making computational deconvolution an essential strategy for inferring cell type composition and expression profiles from bulk RNA-seq data. However, deconvolution performance is hindered by inconsistent annotations, the lack of standardized reference resources, and the limited generalizability of existing reference panels, which are often built for specific pathological conditions and are therefore unsuitable for broader applications. To address these challenges, we developed a workflow to construct an organ-specific, atlas-level reference panel from single-cell and single-nucleus RNA-seq datasets, with consistent cell type annotation and robust gene signatures. This reference can serve as a ground-truth resource to benchmark and guide deconvolution algorithms. We then used it to test deconvolution approaches that infer tissue-specific, cell-type-level expression profiles from bulk data, with the aim of enhancing resolution and accuracy in downstream analyses. To construct the atlas reference panel, we focused on liver tissue, selecting four liver single-cell and single-nucleus RNA-seq datasets (45 samples in total) covering healthy and pathological conditions such as hepatocellular carcinoma, cirrhosis, and steatosis. First, we used state-of-the-art methods (Seurat/Harmony) to perform dataset integration and batch correction. The integrated reference panel was then annotated with a marker-based approach, using ScType in combination with a curated list of signatures retrieved from databases such as CellMarker 2.0 and Azimuth.
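The marker-based annotation step described above can be illustrated with a minimal sketch. It is not the ScType implementation or the thesis code: the expression values, cluster assignments, marker lists, and function names (`zscore`, `score_cells`, `annotate_clusters`) are all hypothetical, and the scoring rule (sum of z-scaled marker expression divided by the square root of the number of markers) is a simplifying assumption made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical toy data: gene -> expression value in each of six cells.
expression = {
    "ALB":    [9.0, 8.5, 0.2, 0.1, 0.3, 0.2],
    "APOA1":  [8.0, 7.5, 0.1, 0.3, 0.2, 0.1],
    "CD3E":   [0.1, 0.2, 6.0, 5.5, 0.2, 0.1],
    "PTPRC":  [0.2, 0.1, 5.0, 6.5, 0.3, 0.4],
    "PECAM1": [0.3, 0.2, 0.4, 0.3, 7.0, 6.5],
    "VWF":    [0.1, 0.1, 0.2, 0.1, 6.0, 7.5],
}
# Hypothetical cluster label for each cell (e.g. from a clustering step).
clusters = [0, 0, 1, 1, 2, 2]
# Illustrative marker lists; a real run would use curated signatures.
markers = {
    "Hepatocyte": ["ALB", "APOA1"],
    "T cell": ["CD3E", "PTPRC"],
    "Endothelial cell": ["PECAM1", "VWF"],
}


def zscore(values):
    """Scale one gene across cells to zero mean and unit variance."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def score_cells(expression, markers):
    """Per-cell score for each cell type from its z-scaled marker genes."""
    scaled = {gene: zscore(vals) for gene, vals in expression.items()}
    n_cells = len(next(iter(expression.values())))
    scores = {}
    for cell_type, genes in markers.items():
        present = [g for g in genes if g in scaled]
        if not present:
            continue  # no marker of this type was measured
        norm = len(present) ** 0.5
        scores[cell_type] = [
            sum(scaled[g][i] for g in present) / norm for i in range(n_cells)
        ]
    return scores


def annotate_clusters(scores, clusters):
    """Assign each cluster the cell type with the highest summed score."""
    labels = {}
    for cluster in sorted(set(clusters)):
        members = [i for i, c in enumerate(clusters) if c == cluster]
        totals = {ct: sum(s[i] for i in members) for ct, s in scores.items()}
        labels[cluster] = max(totals, key=totals.get)
    return labels


labels = annotate_clusters(score_cells(expression, markers), clusters)
print(labels)
```

Scoring at the cluster level rather than per cell makes the assignment less sensitive to dropout in individual cells, which is one reason marker-based tools are typically applied after clustering.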
The resulting reference panel comprises 188,727 cells across 45 samples, with a consistent annotation structured in three levels of increasing granularity. To benchmark the performance of our resource, we used both simulated bulk RNA-seq data and real single-cell RNA-seq data derived from precision-cut liver slices (PCLS). We compared the results produced by state-of-the-art deconvolution software, including MuSiC2, CDSeq, and CIBERSORTx. Additionally, we investigated the deconvolution performance of our resource using Bulk2Space, a deep learning deconvolution method based on a variational autoencoder (VAE) that allows the inference of single-cell-level expression profiles. Finally, to evaluate the usability of the resource, we applied it to a cohort of 57 hepatocellular carcinoma (HCC) samples related to metabolic dysfunction-associated steatohepatitis (MASH), with paired tumor and non-tumor bulk RNA-seq data. Through benchmarking with multiple deconvolution methods, we demonstrate that the atlas reference panel is a robust resource for inferring cell type composition across both simulated pseudo-bulk and experimental transcriptomics datasets, highlighting the feasibility of leveraging atlas-level single-cell characteristics to accurately reconstruct cellular composition and heterogeneity from bulk RNA-seq data. Furthermore, we established a standardized workflow for dataset integration and annotation. Future studies will focus on refining the model employed for the deep-learning-based deconvolution method. In particular, advanced architectures such as the Variational Autoencoder coupled with a Generative Adversarial Network (VAE-GAN), the Adversarial Autoencoder (AAE), and the Domain Invariant Variational Autoencoder (DIVA) will be investigated to improve model performance and accuracy and to make better use of the inferred single-cell-level expression profiles in downstream analyses.
In addition, the integration of new datasets, representing new experimental conditions, will be considered to improve the generalizability of the resource and ensure its long-term applicability and robustness.
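The benchmarking idea summarized in the abstract (simulate pseudo-bulk samples with known composition, deconvolve them against a reference, and compare estimates with the truth) can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis pipeline: the profiles, noise model, and function names are hypothetical, and a plain non-negative least squares fit by projected gradient descent stands in for the dedicated tools named above (MuSiC2, CDSeq, CIBERSORTx, Bulk2Space).

```python
import random

# Hypothetical mean expression per cell type over six illustrative genes.
PROFILES = {
    "Hepatocyte":       [9.0, 8.0, 0.1, 0.2, 0.3, 0.1],
    "T cell":           [0.2, 0.1, 6.0, 5.5, 0.3, 0.2],
    "Endothelial cell": [0.3, 0.2, 0.2, 0.4, 7.0, 6.5],
}


def simulate_cells(profiles, n_per_type, rng, noise=0.2):
    """Draw noisy reference cells around each cell type profile."""
    return [
        (ct, [max(0.0, x + rng.gauss(0, noise)) for x in profile])
        for ct, profile in profiles.items()
        for _ in range(n_per_type)
    ]


def signature(cells):
    """Average the reference cells of each type into a signature profile."""
    grouped = {}
    for ct, expr in cells:
        grouped.setdefault(ct, []).append(expr)
    return {ct: [sum(col) / len(col) for col in zip(*rows)]
            for ct, rows in grouped.items()}


def pseudo_bulk(cells, composition, n_cells, rng):
    """Sum cells sampled with known fractions; return bulk and true fractions."""
    grouped = {}
    for ct, expr in cells:
        grouped.setdefault(ct, []).append(expr)
    counts = {ct: round(frac * n_cells) for ct, frac in composition.items()}
    sampled = [expr for ct, k in counts.items()
               for expr in rng.choices(grouped[ct], k=k)]
    bulk = [sum(col) for col in zip(*sampled)]
    total = sum(counts.values())
    return bulk, {ct: k / total for ct, k in counts.items()}


def deconvolve(bulk, sig, n_iter=2000):
    """Non-negative least squares by projected gradient descent."""
    types = list(sig)
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its inverse is a safe step size.
    step = 1.0 / sum(x * x for ct in types for x in sig[ct])
    weights = [0.0] * len(types)
    for _ in range(n_iter):
        fitted = [sum(w * sig[ct][g] for w, ct in zip(weights, types))
                  for g in range(len(bulk))]
        resid = [f - b for f, b in zip(fitted, bulk)]
        grad = [sum(r * s for r, s in zip(resid, sig[ct])) for ct in types]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, grad)]
    total = sum(weights)
    return {ct: w / total for ct, w in zip(types, weights)}


rng = random.Random(0)  # fixed seed so the toy run is reproducible
cells = simulate_cells(PROFILES, n_per_type=200, rng=rng)
bulk, truth = pseudo_bulk(
    cells, {"Hepatocyte": 0.6, "T cell": 0.3, "Endothelial cell": 0.1},
    n_cells=1000, rng=rng)
estimated = deconvolve(bulk, signature(cells))
rmse = (sum((estimated[ct] - truth[ct]) ** 2 for ct in truth) / len(truth)) ** 0.5
print({ct: round(v, 3) for ct, v in estimated.items()}, "RMSE:", round(rmse, 4))
```

Because the true fractions are known by construction, any deconvolution method can be swapped in for `deconvolve` and compared on the same error metric, which is the general logic behind pseudo-bulk benchmarking.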
Single-cell RNA-seq, integration, bulk RNA-seq, deconvolution
Files in this record:
File: MartinaGallinaro_Thesis_XXXVIIICycle.pdf
Embargo: until 12/05/2028
Description: Doctoral thesis
Type: Doctoral thesis
License: Creative Commons
Size: 4.08 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1189031