Structural annotation of eukaryotic genomes in 2nd generation sequencing era

Dal Molin, Alessandra
2016-01-01

Abstract

In the last decade, the increasing efficiency and decreasing cost of new sequencing techniques have led to a growing number of genomic sequences in public databases. With this huge volume of sequences being generated by high-throughput sequencing projects, the need for accurate and detailed genome annotations has never been greater. Structural genome annotation is the process of identifying structural features in a DNA sequence and classifying them based on their biological role. Computer programs are increasingly used to perform structural annotation because they meet the high-throughput demands of genome sequencing projects, even though they are less accurate than manual gene annotation, which remains the ‘gold standard’ for evaluating annotation confidence and quality. The aim of this project is to meet the need for fast and accurate genome annotation by applying available computational methods to different experimental cases, depending on the biological knowledge available so far and the quality of the starting data. The contribution of the different methods used to produce the final annotation was analyzed, and the results were evaluated for the completeness of the study. The results showed that the complexity of eukaryotic genomes greatly affects the annotation process; a large fraction of the genes in a genome sequence can be found mainly by homology to other known genes or proteins and by the use of ab initio predictors and species-specific evidence. Integrating multiple sources of annotation greatly improved the accuracy of the final genome annotations, although they were still not error free. Quality assessment of the results and filtering of low-confidence sequences, together with manual revision, are always required to achieve higher accuracy.
2016
9788869250071
structural annotation, NGS, eukaryotic, genome annotation, gene annotation, repeat annotation, plant, fungi, RNA-seq, ab initio prediction
Files in this item:

TESI_DOTTORATO_DALMOLIN_ISBN.pdf (open access)

Description: PhD Thesis
Type: Doctoral thesis
License: Creative Commons
Size: 3.42 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/954305