Towards a graph-theoretic approach to hybrid performance prediction from large-scale phenotypic data

Castellini, Alberto
2015-01-01

Abstract

High-throughput biological data analysis has attracted considerable interest over the last decade, driven by pioneering technologies that automatically generate large-scale datasets by performing millions of analytical tests per day. Here we present a new network-based approach for analyzing a high-throughput phenomic dataset collected on maize inbreds and hybrids by an automated phenotyping facility. The dataset consists of 1600 biological samples from 600 different genotypes (200 inbred and 400 hybrid lines). For each sample, 141 phenotypic traits were observed over 33 days. We apply a graph-theoretic approach to two important problems: (i) discovering meaningful patterns in the dataset and (ii) predicting hybrid performance, in terms of biomass, from automatically collected phenotypic traits. We propose a modelling framework in which the prediction problem is recast as finding the shortest path in a correlation-based network. Preliminary results show small but encouraging correlations between predicted and observed biomass. Extensions of the algorithm and applications of the modelling framework to other types of biological data are discussed.
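
The abstract does not say how the correlation-based network is built or how a path is turned into a biomass prediction, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that nodes are measurement profiles, that two nodes are linked when the absolute Pearson correlation of their profiles reaches a hypothetical threshold, and that the edge weight is 1 - |r|, so strongly correlated nodes are "close". Dijkstra's algorithm then finds the shortest path between two nodes. All function names, the threshold value, and the toy data are assumptions made for this example.

```python
import heapq
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def build_correlation_network(profiles, threshold=0.5):
    """Build an undirected weighted graph from a dict {node: profile}.

    Assumption: an edge exists when |r| >= threshold, with weight 1 - |r|.
    """
    names = list(profiles)
    graph = {name: {} for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            # Clamp to 1.0 so rounding error never yields a negative weight.
            r = min(1.0, abs(pearson(profiles[u], profiles[v])))
            if r >= threshold:
                graph[u][v] = graph[v][u] = 1.0 - r
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (distance, path) or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            candidate = d + w
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


if __name__ == "__main__":
    # Toy profiles (hypothetical values, not taken from the maize dataset).
    toy_profiles = {
        "plant_height": [1.0, 2.0, 3.0, 4.0, 5.0],
        "leaf_area": [1.1, 1.9, 3.2, 3.8, 5.1],
        "biomass": [2.0, 2.5, 4.1, 4.0, 6.0],
    }
    network = build_correlation_network(toy_profiles, threshold=0.5)
    print(shortest_path(network, "plant_height", "biomass"))
```

Using 1 - |r| as the weight is one common way to turn a similarity into a distance; other transformations would change which paths are shortest, and the threshold controls how sparse the network is.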
Year: 2015
ISBN: 978-3-319-23107-5
Keywords: Graph-theoretic approach; Performance Prediction; High-throughput; large-scale datasets; maize

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/967040