ANALYSIS AND INTERPRETATION OF WHOLE EXOME SEQUENCING DATA OF LEUKEMIA PATIENTS

Garonzi, Marianna
2017-01-01

Abstract

Leukemias are cancers that affect the progenitor cells of leukocytes. These malignancies are highly heterogeneous in terms of the molecular mechanisms involved in their onset and progression. Heterogeneity is also observed between individuals within the same disease subgroup, and is reflected in the different clinical outcomes and treatment responses of different patients. Unfortunately, the exact aetiology of leukemia is still poorly understood and, consequently, the related prevention, diagnostic, prognostic and follow-up methods remain largely unidentified. Therefore, early diagnosis, together with specifically tailored approaches to treatment, remains a key factor in determining patients’ health, quality of life and life expectancy. Several efforts are under way to improve the diagnosis, treatment and monitoring of leukemia. In this regard, the work presented in my PhD thesis is part of an international project, named “NGS-PTL: Next Generation Sequencing platform for targeted Personalized Therapy of Leukemia”, whose objective is the development of technologies for the diagnosis and prognosis of haematological cancers. In line with the project’s objective, my thesis work aims to identify sequence variants from Whole Exome Sequencing data for the acute types of leukemia, to be used as potential biomarkers to improve therapeutic interventions and to personalize treatments. The work describes the setup and application of a bioinformatic pipeline that identifies the somatic mutations in leukemia patients and the driver genes, together with the results obtained by applying it to all the samples of the project. Setting up the pipeline required identifying a set of tools suited to cancer sequencing data.
In particular, selecting dedicated software for the initial pre-processing guarantees that the sequencing data used are of high quality and that the subsequent analysis is performed on well-generated data. Moreover, selecting MuTect as the variant caller allowed us to overcome specific problems related to the heterogeneity of cancer samples. Applying these tools led to the identification of a large and reliable set of somatic variants to be evaluated for the identification of new biomarkers and driver genes. Interpreting the somatic variants then required specific databases and resources, both to interpret them correctly and, ultimately, to relate the mutations to the driving or development of the leukemia. Using the available biological knowledge, we were able to select variants likely to be highly damaging, some of which are already connected with leukemia in cancer-related sources (COSMIC, ICGC and CIViC). Finally, the genes that drive the development of the disease were identified by applying three statistical tools to the set of annotated mutations for each leukemia type, leading to the identification of a total of 32 biomarkers. In conclusion, the discovery of potential novel biomarkers, together with the additional biological information provided by the specific resources applied, demonstrates the importance of applying NGS to the study of leukemia patients.
2017
NGS, exome, Leukemia
Files in this item:
File Size Format
PhD_Thesis_Marianna_Garonzi.pdf

open access

Type: Doctoral thesis
License: Restricted access
Size 2.55 MB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/960651