Mining genetic, transcriptomic, and imaging data in Parkinson’s disease

Cerri, Guglielmo; Tognon, Manuel; Altmann, Andre; Giugno, Rosalba

doi:10.1109/ICHI52183.2021.00105

Parkinson’s disease (PD) is a brain disorder that leads to shaking, stiffness, and difficulties with walking, balance, and coordination. Affected people may also experience mental and behavioral changes, sleep problems, depression, memory difficulties, and fatigue. PD is an age-related disease whose prevalence increases in people over the age of 60. About 5 to 10% of PD patients have an "early-onset" variant, which is often, but not always, inherited. PD is characterized by the loss of groups of neurons involved in the control of voluntary movements. Here we present a novel imaging-genetics workflow for PD that aims to discover new candidate biomarkers for disease onset by integrating genotyping, transcriptomic, functional imaging (Dopamine Transporter Scan, DaTSCAN), and morphological imaging (Magnetic Resonance Imaging, MRI) data. The proposed tutorial aims to encourage attendees to pursue biomedical research that takes advantage of the integration of heterogeneous data. In the last decade, the combined use of imaging and genetic data has become widespread among bioinformatics researchers. This has made it possible to investigate specific diseases in detail and to better understand their origins and causes. While many imaging-genetics analyses have been developed in recent years and successfully applied to characterize brain function and neurodegenerative diseases such as Alzheimer’s disease, to our knowledge no standard imaging-genetics workflow has been proposed for PD. The novelty of our workflow can be summarized as follows:
• We propose a domain-free and easy-to-use workflow that integrates heterogeneous data, such as genotyping, transcriptomic, and imaging data.
• The workflow addresses the complexity of integrating real multi-source data when only a limited amount of data is available by proposing a three-step method: the first step integrates genotyping and imaging features, considering each feature individually; the second step summarizes the imaging features in a single measure; and the last step focuses on linking potential functional effects caused by the biomarkers found during the two previous steps.
• We validate the method on genetic and imaging data related to PD and present our new results.
The data used for this tutorial were obtained from the Parkinson’s Progression Markers Initiative (PPMI) data portal. Currently, PPMI is the most complete and comprehensive collection of PD-related data. The dataset used in the tutorial consists of a set of polymorphisms, specifically Single Nucleotide Polymorphisms (SNPs) and insertions and deletions (indels), together with transcriptomic data obtained by RNA sequencing. In addition, DaTSCAN and MRI data are used, which have been shown to be effective in providing potential biomarkers for PD onset and progression. The attendees will gain experience in conducting a complete imaging-genetics workflow in a specific case study of Parkinsonian subjects. After the tutorial session, the attendees will be able to run an imaging-genetics pipeline on their own, which could also be applied to study other neurological diseases. The tutorial will introduce the participants to the biological background, in particular the notions of DNA, RNA, SNP, and Genome-Wide Association Study (GWAS). The participants will have the opportunity to become familiar with PLINK, a free, open-source whole-genome association analysis toolset designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
PLINK provides a wide range of functionalities for genotyping data analysis, including data management, summary statistics, quality control, population stratification detection, and association analysis. The audience will also learn how to run code in the widely used R environment for statistical computing and graphics. They will also learn some basics of Python, especially how to handle genotyping data efficiently using the pandas library, which was designed for data manipulation and analysis. The tutorial code is organized into several Jupyter notebooks (formerly IPython Notebooks), a web-based and system-independent interactive computational environment that facilitates reproducible analysis.
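As an illustration of the first step of the workflow (relating each genotyping feature individually to an imaging feature), the following is a minimal pandas sketch, not the tutorial's actual code. It assumes a subjects-by-SNPs table of additive dosages (0, 1, or 2 copies of the minor allele) and a single imaging feature per subject. The function name `snp_imaging_association`, the SNP labels, and all values are hypothetical toy data, and a simple per-SNP linear fit stands in for whatever association test the tutorial applies.

```python
import numpy as np
import pandas as pd


def snp_imaging_association(genotypes: pd.DataFrame, imaging: pd.Series) -> pd.DataFrame:
    """Fit imaging ~ dosage separately for each SNP (hypothetical helper).

    genotypes: rows are subjects, columns are SNPs, values are dosages 0/1/2.
    imaging:   one imaging-derived value per subject, indexed like genotypes.
    """
    rows = []
    for snp in genotypes.columns:
        # Align on subject ID and drop subjects with a missing value.
        pair = pd.DataFrame({"g": genotypes[snp], "y": imaging}).dropna()
        if pair["g"].nunique() < 2:
            continue  # monomorphic in this sample: no slope can be estimated
        g = pair["g"].to_numpy(dtype=float)
        y = pair["y"].to_numpy(dtype=float)
        slope, _intercept = np.polyfit(g, y, 1)  # least-squares line
        rows.append({"snp": snp, "n": len(pair), "beta": slope,
                     "r": pair["g"].corr(pair["y"])})
    result = pd.DataFrame(rows, columns=["snp", "n", "beta", "r"]).set_index("snp")
    # Strongest absolute correlation first.
    return result.sort_values("r", key=lambda s: s.abs(), ascending=False)


# Toy data, made up purely for illustration (not PPMI data).
subjects = [f"S{i}" for i in range(6)]
genotypes = pd.DataFrame(
    {"snp_a": [0, 1, 2, 0, 1, 2],
     "snp_b": [0, 0, 1, 1, 2, 2],
     "snp_c": [1, 1, 1, 1, 1, 1]},  # monomorphic, will be skipped
    index=subjects,
)
imaging = pd.Series([1.0, 2.0, 3.0, 1.0, 2.0, 3.0], index=subjects, name="toy_feature")

result = snp_imaging_association(genotypes, imaging)
print(result)
```

In a real analysis this kind of per-feature test would normally be preceded by quality control and would adjust for covariates and multiple testing; those steps are omitted here to keep the sketch short.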