Knowledge Based Membrane Protein Structure Prediction: From X-Ray Crystallography to Bioinformatics and Back to Molecular Biology

Giorgetti, Alejandro; Piccoli, Stefano
2011

Abstract

Integral membrane proteins play a key role in detecting outside signals and conveying them into cells, allowing cells to interact with and respond to their environment in a specific manner. They form principal nodes in several signaling pathways and attract great interest as targets for therapeutic intervention, since the majority of drug targets are associated with the cell membrane. The original human genome sequencing project estimated that 20% of the total count of 31,778 genes code for membrane proteins [1]. Membrane proteins thus constitute a very large set of yet-to-be-characterized proteins mediating essential life-related functions in both prokaryotes and eukaryotes. Estimates suggest that the membrane protein content of whole genomes varies from 10% to 40% of the proteome, depending on the organism. At the time of writing (these figures change over time), the Non-Redundant database [http://www.ncbi.nlm.nih.gov/] holds roughly 6,000,000 protein sequences, whereas only 45,281 sequences are annotated as “membrane protein” in Swiss-Prot (http://expasy.org/sprot/, where the annotation is manually curated), and only about 350 atomic structures of membrane proteins have been solved and deposited in the Protein Data Bank [http://www.rcsb.org/pdb/]. These are very small numbers: assuming a rough average of 30% membrane proteins per genome (as derived from sequence similarity searches), the databases should contain approximately 2,000,000 membrane proteins. We estimate that less than ~0.6% of membrane proteins are annotated and that ~0.001% of all membrane protein sequences are known at atomic resolution, which gives an idea of the enormous gap that must be filled in order to fully characterize how membrane proteins function.
The main reason behind these small numbers is that membrane proteins are very difficult to study: they are embedded in the lipid bilayers surrounding the cell and its subcompartments, and they expose portions of different sizes to the polar environments on either side of the membrane. Once isolated from membranes, they are generally less stable than globular proteins. It is therefore difficult to purify them in their native, functional form, and even more difficult to crystallize them, because they expose two chemically and physically distinct surfaces to the environment: one water-like and one lipid-like. Still, following great improvements in the techniques underlying X-ray crystallography, several new membrane protein structures have been solved in the last few years, some in different activation states, offering the scientific community a fundamental contribution to the characterization of signal transduction mechanisms. Despite these advances, the gap between the known membrane proteins and those with solved structures is still enormous. A close combination of X-ray crystallography, computational biology and validating molecular biology experiments is therefore the key to bridging the gap between membrane proteins with known structures and those without. This and other issues may be resolved in the post-genomic era by taking advantage of theoretical and experimental efforts to develop knowledge-based tools capable of extracting selected structural and functional features from known sequences and structures and of computing the likelihood of their presence in previously unseen sequences and structures.
Indeed, some state-of-the-art tools are based on the seminal idea that proteins are products of evolution and that their sequences contain millions of years of evolutionary information waiting to be extracted.
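
The coverage estimate in the abstract can be reproduced as a short calculation. The sketch below is only illustrative: it uses the counts quoted in the abstract, and the function and variable names are hypothetical, not taken from the chapter. Dividing the quoted counts directly gives roughly 2.5% annotated and roughly 0.02% with solved structures, which is larger than the percentages stated in the abstract; the stated figures may rest on counts or rounding not given there. Either way, the gap remains very large.

```python
# Counts quoted in the abstract (a 2011 snapshot; they change over time).
TOTAL_SEQUENCES = 6_000_000    # approximate size of the Non-Redundant database
ANNOTATED_MEMBRANE = 45_281    # Swiss-Prot entries annotated "membrane protein"
SOLVED_STRUCTURES = 350        # approximate membrane protein structures in the PDB
MEMBRANE_FRACTION = 0.30       # rough average share of membrane proteins per genome


def coverage(total, fraction, annotated, solved):
    """Return the expected number of membrane proteins and the percentage
    of them that are annotated or structurally solved (hypothetical helper)."""
    expected = total * fraction
    return {
        "expected_membrane": expected,
        "annotated_pct": 100 * annotated / expected,
        "solved_pct": 100 * solved / expected,
    }


result = coverage(TOTAL_SEQUENCES, MEMBRANE_FRACTION,
                  ANNOTATED_MEMBRANE, SOLVED_STRUCTURES)

print(f"expected membrane proteins: {result['expected_membrane']:,.0f}")
print(f"annotated: {result['annotated_pct']:.2f}%")
print(f"solved structures: {result['solved_pct']:.4f}%")
```

Changing `MEMBRANE_FRACTION` to 0.10 or 0.40 shows how the estimate shifts across the range of membrane protein content reported for different organisms.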
ISBN: 9789533077543
Keywords: comparative modeling; membrane proteins; X-ray crystal structures

Use this identifier to cite or link to this document: http://hdl.handle.net/11562/388697