A SVM-based method to classify RBM20 affected and not affected exons

Dal Molin, Anna
2017-01-01

Abstract

Mutations of RNA binding motif protein 20 (RBM20) have recently been reported to cause human dilated cardiomyopathy (DCM) (Brauch et al., 2009, Li et al., 2010). DCM is a major cause of heart failure and mortality around the world (Jefferies and Towbin, 2010). Overall, 25–50% of DCM cases are familial, and causative mutations have been described in more than 50 genes, most of which encode structural components of cardiomyocytes. RBM20 belongs to the family of SR and SR-related RNA binding proteins, which assemble in the spliceosome and take part in the splicing of pre-mRNA. RBM20 is mainly expressed in striated muscle, with the highest levels in the heart (Guo et al., 2012). Because of its involvement in DCM, RBM20 has been studied extensively to unveil its mechanism of action and its RNA targets (Guo et al., 2012, Li et al., 2013). Guo and colleagues reported a set of 31 genes showing RBM20-dependent splicing from a whole-transcriptome analysis in rats and humans (Guo et al., 2012). More recently, Maatz and colleagues reported an additional set of 18 rat genes and observed that the RNA sequences recognized by RBM20 are likely to be located in the 400 nucleotides flanking the exons whose alternative splicing is regulated by RBM20 (Maatz et al., 2014). However, both the suggested RNA sequence recognized by RBM20 and its over-representation in the flanking regions of affected exons remain poor predictors of the target genes whose splicing events are regulated by RBM20. The aim of this work was therefore to characterize, through a bioinformatic approach, the sequence motifs of the exons whose alternative splicing is affected by RBM20, in order to improve the prediction of the genes (exons) affected by RBM20. A differential expression analysis was performed to select the dataset of RBM20-affected exons; a further dataset was retrieved from the literature (Maatz et al., 2014).
A Support Vector Machine (SVM) approach was used that evaluates several kinds of genetic elements binding in the flanking regions of the target exons. An SVM method was chosen to classify RBM20-affected and unaffected exons; other machine learning algorithms could have been used as well, but SVM is among the most commonly used. In our analyses, the model discriminated well between RBM20-affected and unaffected exons. From a biological and functional point of view, this approach helps to identify novel candidate genes associated with diseases that depend on a dysregulation of RBM20. This study provides additional information about RBM20 regulation of target exons, based not only on the RNA binding site but also on other genetic elements associated with the binding site. Furthermore, we propose the first model based on an SVM algorithm for the classification of RBM20-affected and unaffected exons.
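As an illustration of the kind of input such a classifier might take, the following is a minimal sketch, not the thesis's actual pipeline, of turning the flanking regions of an exon into a numeric feature vector by counting motif occurrences. The 400-nucleotide window comes from the abstract; the function names, the coordinate convention and the motifs `ACG` and `TT` are hypothetical placeholders chosen only for the example.

```python
from typing import List, Sequence, Tuple

FLANK = 400  # flanking window size mentioned in the abstract (Maatz et al., 2014)


def flanking_windows(seq: str, exon_start: int, exon_end: int,
                     flank: int = FLANK) -> Tuple[str, str]:
    """Return the (upstream, downstream) windows around an exon.

    Assumption: the exon occupies seq[exon_start:exon_end] (0-based, end-exclusive).
    Windows are truncated at the sequence boundaries.
    """
    upstream = seq[max(0, exon_start - flank):exon_start]
    downstream = seq[exon_end:exon_end + flank]
    return upstream, downstream


def count_motif(window: str, motif: str) -> int:
    """Count occurrences of motif in window, allowing overlaps."""
    count = 0
    pos = window.find(motif)
    while pos != -1:
        count += 1
        pos = window.find(motif, pos + 1)
    return count


def feature_vector(seq: str, exon_start: int, exon_end: int,
                   motifs: Sequence[str], flank: int = FLANK) -> List[int]:
    """Build one feature vector per exon.

    For each motif, two features are emitted: the count in the upstream
    window followed by the count in the downstream window.
    """
    up, down = flanking_windows(seq.upper(), exon_start, exon_end, flank)
    return [count_motif(w, m) for m in motifs for w in (up, down)]


# Placeholder motifs, NOT the elements evaluated in the thesis.
EXAMPLE_MOTIFS = ["ACG", "TT"]
# Toy sequence: 5 nt upstream, a 4 nt "exon", 6 nt downstream.
EXAMPLE_SEQ = "TTACG" + "GGGG" + "ACGACG"
EXAMPLE_VECTOR = feature_vector(EXAMPLE_SEQ, 5, 9, EXAMPLE_MOTIFS)
```

Vectors built this way, one per exon and labelled as affected or unaffected, could then be passed to any SVM implementation; which features and which kernel the thesis actually used is not stated in the abstract.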
2017
alternative splicing, differential gene expression, motif finding, support vector machine
Files in this item:
Tesi di dottorato - Anna Dal Molin.pdf

Open Access since 17/12/2018

Description: Doctoral thesis Anna Dal Molin
Type: Doctoral thesis
License: Creative Commons
Size: 1.65 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/965609