- Protein structure prediction
Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. Its aim is the prediction of the three-dimensional structure of proteins from their amino acid sequences, sometimes including additional relevant information such as the structures of related proteins. In other words, it deals with the prediction of a protein's tertiary structure from its primary structure. Protein structure prediction is of high importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes). Every two years, the performance of current methods is assessed in the CASP experiment.
The practical role of protein structure prediction is now more important than ever. Massive amounts of protein sequence data are produced by modern large-scale DNA sequencing efforts such as the Human Genome Project. Despite community-wide efforts in structural genomics, the output of experimentally determined protein structures, typically obtained by time-consuming and relatively expensive X-ray crystallography or NMR spectroscopy, lags far behind the output of protein sequences.
Several factors make protein structure prediction a very difficult task. The two main problems are that the number of possible protein structures is extremely large, and that the physical basis of protein structural stability is not fully understood. As a result, any protein structure prediction method needs a way to explore the space of possible structures efficiently (a search strategy) and a way to identify the most plausible structure (an energy function).
In comparative structure prediction, the search space is pruned by the assumption that the protein in question adopts a structure that is reasonably close to the structure of at least one known protein. In de novo or ab initio structure prediction, no such assumption is made, which results in a much harder search problem. In both cases, an energy function is needed to recognize the native structure and to guide the search for it. Unfortunately, constructing such an energy function remains, to a great extent, an open problem.
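The two ingredients named above, a search strategy and an energy function, can be illustrated on a deliberately simplified toy. The Python sketch below assumes the 2D hydrophobic-polar (HP) lattice model, which is not discussed in this article and is far cruder than any real energy function: each residue is either H (hydrophobic) or P (polar), a conformation is a self-avoiding walk on a square lattice, and the energy is -1 for every pair of H residues that are lattice neighbours without being consecutive in the chain. The search strategy is plain exhaustive enumeration. All function names are hypothetical.

```python
from itertools import product

# Unit steps on a 2D square lattice (toy model assumption).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def build_path(moves):
    """Convert a sequence of lattice steps into residue coordinates.

    Returns None if the walk revisits a lattice site (not self-avoiding).
    """
    coords = [(0, 0)]
    occupied = {(0, 0)}
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in occupied:
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(sequence, coords):
    """Toy energy function: -1 for each non-bonded H-H lattice contact."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def exhaustive_search(sequence):
    """Search strategy: enumerate every self-avoiding conformation.

    Returns (lowest energy, coordinates of one lowest-energy conformation,
    number of conformations examined).
    """
    best_energy, best_coords, count = 0, None, 0
    for moves in product("UDLR", repeat=len(sequence) - 1):
        coords = build_path(moves)
        if coords is None:
            continue
        count += 1
        energy = hp_energy(sequence, coords)
        if best_coords is None or energy < best_energy:
            best_energy, best_coords = energy, coords
    return best_energy, best_coords, count


if __name__ == "__main__":
    energy, coords, n = exhaustive_search("HPHPPHHPH")
    print(f"{n} conformations searched, lowest energy {energy}")
```

Even in this toy, the number of conformations grows exponentially with chain length (a 9-residue chain already has 5,916 self-avoiding conformations with the first residue fixed at the origin), so exhaustive search quickly becomes infeasible and practical methods turn to heuristic or stochastic search.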
Direct simulation of protein folding in atomic detail, via methods such as molecular dynamics with a suitable energy function, is typically not tractable because of the high computational cost, despite the efforts of distributed computing projects such as Folding@home. Therefore, most de novo structure prediction methods rely on simplified representations of the atomic structure of proteins.
The issues above apply to all proteins, including well-behaved, small, monomeric proteins. In addition, for specific proteins (such as multimeric proteins and disordered proteins), the following issues also arise:
* Some proteins require stabilisation by additional domains or binding partners to adopt their native structure. This requirement is typically unknown in advance and difficult for a prediction method to handle.
* The tertiary structure of a native protein may not be readily formed without the aid of additional agents. For example, proteins known as chaperones are required for some proteins to fold properly. Other proteins cannot fold properly without modifications such as glycosylation.
* A particular protein may be able to assume multiple conformations depending on its chemical environment.
* The biologically active conformation may not be the most thermodynamically favorable one.
Owing to increases in computing power and, especially, to new algorithms, much progress is being made in overcoming these problems. However, routine de novo prediction of protein structures, even for small proteins, has still not been achieved.
"Ab initio" protein modelling
"Ab initio" or "de novo" protein modelling methods seek to build three-dimensional protein models "from scratch", i.e., based on physical principles rather than (directly) on previously solved structures. There are many possible procedures that either attempt to mimic protein folding or apply some stochastic method to search possible solutions (i.e., global optimization of a suitable energy function). These procedures tend to require vast computational resources and have thus only been carried out for tiny proteins. Predicting protein structure de novo for larger proteins will require better algorithms and larger computational resources, such as those afforded by powerful supercomputers (for example Blue Gene or MDGRAPE-3) or by distributed computing (for example Folding@home, the Human Proteome Folding Project and Rosetta@home). Although these computational barriers are vast, the potential benefits of structural genomics (by predicted or experimental methods) make ab initio structure prediction an active research field.
As an intermediate step towards predicted protein structures, contact map prediction has been proposed; a contact map records which pairs of residues lie close to each other in the folded structure.
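One widely used family of stochastic global optimization methods is simulated annealing with the Metropolis acceptance rule. The article does not single out this method, so treat the following Python sketch as one illustrative choice. It assumes a toy 2D hydrophobic-polar (HP) lattice model: residues are H or P, conformations are self-avoiding walks on a square lattice, and each non-bonded H-H contact contributes -1 to the energy. A trial move changes one lattice step at random; clashing conformations are rejected, downhill moves are always accepted, and uphill moves are accepted with probability exp(-ΔE/T) while the temperature T is lowered geometrically. The move set, cooling schedule, parameter defaults and function names are illustrative assumptions.

```python
import math
import random

# Unit steps on a 2D square lattice (toy model assumption).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def build_path(moves):
    """Residue coordinates for a list of lattice steps, or None on a clash."""
    coords = [(0, 0)]
    occupied = {(0, 0)}
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in occupied:
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(sequence, coords):
    """Toy energy function: -1 for each non-bonded H-H lattice contact."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def simulated_annealing(sequence, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Stochastic search; returns (lowest energy seen, its move string)."""
    rng = random.Random(seed)
    moves = ["R"] * (len(sequence) - 1)  # start from a fully extended chain
    energy = hp_energy(sequence, build_path(moves))
    best_moves, best_energy = moves[:], energy
    for step in range(steps):
        # Geometric cooling from t_start (first step) to t_end (last step).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        trial = moves[:]
        trial[rng.randrange(len(trial))] = rng.choice("UDLR")
        coords = build_path(trial)
        if coords is None:
            continue  # reject conformations that are not self-avoiding
        trial_energy = hp_energy(sequence, coords)
        delta = trial_energy - energy
        # Metropolis criterion: accept downhill always, uphill sometimes.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            moves, energy = trial, trial_energy
            if energy < best_energy:
                best_moves, best_energy = moves[:], energy
    return best_energy, "".join(best_moves)


if __name__ == "__main__":
    print(simulated_annealing("HPHPPHHPH"))
```

Unlike exhaustive enumeration, this search carries no guarantee of finding the global minimum, which mirrors the situation of real ab initio methods on far larger search spaces.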
Comparative protein modelling
Comparative protein modelling uses previously solved structures as starting points, or templates. This is effective because, although the number of actual proteins is vast, there appears to be a limited set of tertiary structural motifs to which most proteins belong. It has been suggested that there are only around 2000 distinct protein folds in nature, though there are many millions of different proteins.
These methods may also be split into two groups:
* Homology modelling is based on the reasonable assumption that two homologous proteins will share very similar structures. Because a protein's fold is more evolutionarily conserved than its amino acid sequence, a target sequence can be modelled with reasonable accuracy on a very distantly related template, provided that the relationship between target and template can be discerned through sequence alignment. It has been suggested that the primary bottleneck in comparative modelling arises from difficulties in alignment rather than from errors in structure prediction given a known-good alignment. [Zhang Y, Skolnick J (2005). "The protein structure prediction problem could be solved using the current PDB library." Proc Natl Acad Sci USA 102(4): 1029–1034. doi:10.1073/pnas.0407152101. PMID 15653774.] Unsurprisingly, homology modelling is most accurate when the target and template have similar sequences.
* Protein threading [Bowie JU, Luthy R, Eisenberg D (1991). "A method to identify protein sequences that fold into a known three-dimensional structure." Science 253(5016): 164–170. doi:10.1126/science.1853201. PMID 1853201.] scans the amino acid sequence of an unknown structure against a database of solved structures. For each structure, a scoring function is used to assess the compatibility of the sequence with the structure, thus yielding possible three-dimensional models. This type of method is also known as 3D-1D fold recognition because it analyses the compatibility between three-dimensional structures and linear protein sequences. It has also given rise to inverse folding methods, which evaluate the compatibility of a given structure with a large database of sequences and thus predict which sequences have the potential to produce a given fold.
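Because homology modelling depends on a correct target-template alignment, a minimal global alignment by Needleman-Wunsch dynamic programming is sketched below in Python. The flat match, mismatch and gap scores are simplifying assumptions; practical pipelines use substitution matrices such as BLOSUM62, affine gap penalties and, for distant homologues, profile-based alignment. The function name and defaults are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("HEAGAWGHEE", "PAWHEAE"))
```

Residues aligned in this way are what a homology modelling program would map onto the corresponding template coordinates.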
Side chain geometry prediction
Even structure prediction methods that are reasonably accurate for the peptide backbone often get the orientation and packing of the amino acid side chains wrong. Methods that specifically address the problem of predicting side chain geometry include dead-end elimination and the self-consistent mean field method. Both discretize the continuously varying dihedral angles that determine a side chain's orientation relative to the backbone into a set of rotamers, i.e., low-energy side chain conformations with fixed dihedral angles. The methods then attempt to identify the combination of rotamers that minimizes the model's overall energy. Such methods are most useful for analyzing the protein's hydrophobic core, where side chains are more closely packed; they have more difficulty addressing the looser constraints and higher flexibility of surface residues. [Voigt CA, Gordon DB, Mayo SL (2000). "Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design." J Mol Biol 299(3): 789–803. doi:10.1006/jmbi.2000.3758. PMID 10835284.]
Software
MODELLER is a popular software tool for producing homology models using methodology derived from NMR spectroscopy data processing. [http://swissmodel.expasy.org//SWISS-MODEL.html SwissModel] provides an automated web server for basic homology modelling. Common software tools for protein threading are [http://toolkit.tuebingen.mpg.de/hhpred HHpred], [http://meta.bioinfo.pl/submit_wizard.pl bioinfo.pl], [http://robetta.bakerlab.org/ Robetta] and [http://www.sbg.bio.ic.ac.uk/~3dpssm/ 3D-PSSM]. The basic algorithm for threading is described in and is fairly straightforward to implement. [http://www.eidogen-sertanty.com/products_tip_content.html TIP] is a knowledge base of STRUCTFAST [Debe DA, Danzer JF, Goddard WA, Poleksic A (2006). "STRUCTFAST: Protein sequence remote homology detection and alignment using novel dynamic programming and profile-profile scoring." Proteins 64: 960–967. doi:10.1002/prot.21049. PMID 16786595.] models and precomputed similarity relationships between sequences, structures and binding sites.
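As a rough illustration of how simple basic threading can be, the following Python sketch performs gapless threading. Each template in a small library is reduced to a string of per-residue structural environments, here just 'B' (buried) and 'E' (exposed), and the query sequence is slid along every template and scored with a 3D-1D compatibility function. The environment alphabet, the hydrophobic residue set, the +1/-1 scores, the template names and all function names are hypothetical simplifications; published threading methods use richer environment classes, statistically derived scores and alignments that allow gaps.

```python
# Hypothetical set of residues treated as hydrophobic in this toy scoring.
HYDROPHOBIC = frozenset("AVILMFWC")


def pair_score(residue, environment):
    """Toy 3D-1D score: +1 if the residue suits the environment, else -1."""
    buried = environment == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def thread(sequence, environments):
    """Best gapless placement of a sequence on one template.

    Returns (score, offset), where offset is the template position of
    residue 0, or None if the sequence is longer than the template.
    """
    if len(sequence) > len(environments):
        return None
    best = None
    for offset in range(len(environments) - len(sequence) + 1):
        s = sum(pair_score(r, e) for r, e in zip(sequence, environments[offset:]))
        if best is None or s > best[0]:
            best = (s, offset)
    return best


def rank_templates(sequence, library):
    """Thread the sequence onto every template; best-scoring hits first."""
    hits = []
    for name, environments in library.items():
        result = thread(sequence, environments)
        if result is not None:
            hits.append((result[0], name, result[1]))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    library = {"tmpl_a": "EEBBBBEE", "tmpl_b": "BEBEBEBE"}  # hypothetical
    print(rank_templates("VILA", library))
```

The top-ranked template and offset indicate which known fold the query is most compatible with under this toy scoring.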
A comparative review of popular software for high-accuracy homology modelling, from sequence alignment to structural model, was published in 2006. [Nayeem A, Sitkoff D, Krystek S Jr (2006). "A comparative study of available software for high-accuracy homology modeling: From sequence alignments to structural models." Protein Sci 15: 808–824. doi:10.1110/ps.051892906. PMID 16600967.] A partial list of web servers and available tools is maintained [http://ncisgi.ncifcrf.gov/~ravichas/HomMod/ here].
Several distributed computing projects concerning protein structure prediction have also been implemented, such as Folding@home, Rosetta@home, the Human Proteome Folding Project, Predictor@home and TANPAKU.
The Foldit program seeks to investigate the pattern-recognition and puzzle-solving abilities inherent to the human mind in order to create more successful computer protein structure prediction software.
Protein-protein complexes
In the case of complexes of two or more proteins, where the structures of the proteins are known or can be predicted with high accuracy, protein-protein docking methods can be used to predict the structure of the complex. Information about the effect of mutations at specific sites on the affinity of the complex helps in understanding the complex structure and in guiding docking methods.
See also
* Protein structure prediction software
* Protein-protein interaction prediction
* Molecular modeling software
References
External links
* [http://predictioncenter.org/ CASP experiments home page]
* [http://speedy.embl-heidelberg.de/gtsp/flowchart2.html Structure Prediction Flowchart (a clickable map)]
Wikimedia Foundation. 2010.