Structural annotation of eukaryotic genomes in 2nd generation sequencing era
Dal Molin, Alessandra
2016-01-01
Abstract
In the last decade, the increase in efficiency and decrease in cost of new sequencing techniques has led to a growing number of genomic sequences in public databases. With this huge volume of sequences being generated by high-throughput sequencing projects, the need for accurate and detailed genome annotations has never been greater. Structural genome annotation is the process of identifying structural features in a DNA sequence and classifying them based on their biological role. Computer programs are increasingly used to perform structural annotation because they meet the high-throughput demands of genome sequencing projects, even though they are less accurate than manual gene annotation, which remains the ‘gold standard’ for evaluating annotation confidence and quality. The aim of this project is to meet the need for fast and accurate genome annotation by applying available computational methods to different experimental cases, depending on the biological knowledge available and the quality of the starting data. The contribution of the different methods used to produce the final annotation was analyzed, and the results were evaluated for completeness of the study. The results showed that the complexity of eukaryotic genomes greatly affects the annotation process; a large fraction of the genes in a genome sequence can be found mainly by homology to other known genes or proteins, and by the use of ab initio predictors and species-specific evidence. The integration of multiple sources of annotation greatly improved the accuracy of the final genome annotations, although they were still not error-free. Quality assessment of the results and filtering of low-confidence sequences, together with manual revision, are always required to achieve higher accuracy.

File | Size | Format
---|---|---
TESI_DOTTORATO_DALMOLIN_ISBN.pdf (open access; description: PhD Thesis; type: doctoral thesis; license: Creative Commons) | 3.42 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.