Threading (protein sequence)

Threading is a method for the computational prediction of protein structure from amino acid sequence.

Protein threading, or fold recognition, refers to a class of computational methods for predicting the structure of a protein from its amino acid sequence. The basic idea is that the target sequence (the protein sequence whose structure is being predicted) is threaded through the backbone structures of a collection of template proteins (known as the fold library), and a "goodness of fit" score is calculated for each sequence-structure alignment. This score is often expressed as an empirical energy function based on statistics derived from known protein structures, but many other scoring functions have been proposed and tried over the years. The most useful scoring functions include both pairwise terms (interactions between pairs of amino acids) and solvation terms. Threading methods share characteristics of both comparative modelling methods (the sequence alignment aspect) and "ab initio" prediction methods (predicting structure by identifying low-energy conformations of the target protein).
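A scoring function with pairwise and solvation terms can be sketched as follows. This is a toy illustration only: the energy values, the hydrophobic residue set, the 8 Å contact cutoff and the names `pair_energy`, `solvation_energy` and `threading_score` are all assumptions made for this example and do not come from any published potential.

```python
import math
from itertools import combinations

# Assumed toy classification; real potentials are derived from PDB statistics.
HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(a, b):
    """Toy pairwise term: hydrophobic contacts are favourable (negative)."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0
    if (a in HYDROPHOBIC) != (b in HYDROPHOBIC):
        return 0.5
    return 0.0


def solvation_energy(aa, buried):
    """Toy solvation term: reward hydrophobic-buried and polar-exposed."""
    return -0.5 if (aa in HYDROPHOBIC) == buried else 0.5


def threading_score(sequence, alignment, coords, buried, cutoff=8.0):
    """Energy of one sequence-structure alignment (lower is better).

    alignment[i] is the template position assigned to target residue i,
    or None if that residue is unaligned (a gap).
    coords[p] is an (x, y, z) tuple and buried[p] a bool for template position p.
    """
    placed = [(aa, pos) for aa, pos in zip(sequence, alignment) if pos is not None]
    energy = sum(solvation_energy(aa, buried[pos]) for aa, pos in placed)
    for (a, p), (b, q) in combinations(placed, 2):
        # Only residue pairs that are in contact in the template contribute.
        if math.dist(coords[p], coords[q]) <= cutoff:
            energy += pair_energy(a, b)
    return energy
```

In use, the same template would be scored against many candidate alignments, and the lowest-energy alignment kept as the predicted threading.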

Fold recognition methods can be broadly divided into two types:

1. methods that derive a 1-D profile for each structure in the fold library and align the target sequence to these profiles;
2. methods that consider the full 3-D structure of the protein template.

A simple example of a profile representation would be to take each amino acid in the structure and label it according to whether it is buried in the core of the protein or exposed on the surface. More elaborate profiles might take into account the local secondary structure (e.g. whether the amino acid is part of an alpha helix) or even evolutionary information (how conserved the amino acid is). In the 3-D representation, the structure is modelled as a set of inter-atomic distances, i.e. the distances between some or all of the atom pairs in the structure. This is a much richer and far more flexible description of the structure, but it is much harder to use in calculating an alignment.

The profile-based fold recognition approach was first described by Bowie, Lüthy and Eisenberg in 1991. The term "threading" was coined by Jones, Taylor and Thornton in 1992, and originally referred specifically to the use of a full 3-D atomic representation of the protein template structure in fold recognition. Today, the terms threading and fold recognition are frequently (though somewhat incorrectly) used interchangeably.
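The two template representations can be illustrated with a short sketch. The function names, the 10 Å radius and the neighbour-count threshold are assumptions made for this example; counting neighbours is only a crude stand-in for the solvent-accessibility calculations that real methods would use to decide whether a residue is buried.

```python
import math


def distance_matrix(coords):
    """3-D representation: all pairwise distances between template positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def burial_profile(coords, radius=10.0, min_neighbours=3):
    """1-D representation: label each position 'B' (buried) or 'E' (exposed).

    A position is called buried when at least `min_neighbours` other
    positions lie within `radius` of it (an assumed, simplified criterion).
    """
    dm = distance_matrix(coords)
    labels = []
    for i, row in enumerate(dm):
        neighbours = sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        labels.append("B" if neighbours >= min_neighbours else "E")
    return "".join(labels)
```

The profile discards most of the information held in the distance matrix, which is why it is easier to align against but a less complete description of the fold.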

Fold recognition methods are widely used and effective because there is believed to be a strictly limited number of different protein folds in nature, mostly as a result of evolution but also due to constraints imposed by the basic physics and chemistry of polypeptide chains. There is, therefore, a good chance (currently 70-80%) that a protein with a fold similar to that of the target protein has already been studied by X-ray crystallography or NMR spectroscopy and can be found in the PDB (Protein Data Bank). Currently there are just over 1100 different protein folds known (see the CATH database statistics for the latest view), but new folds are still being discovered every year, thanks in part to the ongoing structural genomics projects.

Many different algorithms have been proposed for finding the correct threading of a sequence onto a structure, and many of them make use of dynamic programming in some form. For full 3-D threading, identifying the best alignment is very difficult (it is an NP-hard problem; see Lathrop, 1994), and researchers have turned to combinatorial optimization methods such as simulated annealing or branch and bound searching to arrive at heuristic solutions.
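For the simpler 1-D profile case, dynamic programming finds the optimal alignment exactly. The following is a minimal sketch of a global (Needleman-Wunsch style) alignment of a target sequence to a buried/exposed profile. The scoring values, the gap penalty and the names `match_score` and `align_to_profile` are assumptions chosen for illustration, and the approach does not extend directly to full 3-D threading with pairwise terms.

```python
# Assumed toy classification of hydrophobic residues.
HYDROPHOBIC = set("AVLIMFWC")


def match_score(aa, env):
    """+1 if the residue suits the environment ('B' buried, 'E' exposed), else -1."""
    return 1 if (aa in HYDROPHOBIC) == (env == "B") else -1


def align_to_profile(sequence, profile, gap=-2):
    """Global alignment of a sequence to a 1-D profile with a linear gap penalty.

    Returns (best_score, pairs), where pairs lists the aligned
    (sequence_index, profile_index) positions in order.
    """
    n, m = len(sequence), len(profile)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + match_score(sequence[i - 1], profile[j - 1]),
                score[i - 1][j] + gap,  # sequence residue left unaligned
                score[i][j - 1] + gap,  # profile position left unaligned
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        diagonal = score[i - 1][j - 1] + match_score(sequence[i - 1], profile[j - 1])
        if score[i][j] == diagonal:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

Running this alignment against every profile in a fold library and ranking the resulting scores gives a simple profile-based fold recognition procedure.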

It is interesting to compare threading methods with methods that attempt to align two protein structures (protein structural alignment); indeed, many of the same algorithms have been applied to both problems.

See also

* Homology modeling
* Protein structure prediction software


References

Bowie JU, Lüthy R, Eisenberg D (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Science 253:164-170.

Jones DT, Taylor WR, Thornton JM (1992). A new approach to protein fold recognition. Nature 358:86-89.

Lathrop RH (1994). The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Eng. 7:1059-1068.

Jones DT, Hadley C (2000). Threading methods for protein structure prediction. In: Higgins D, Taylor WR (eds.), Bioinformatics: Sequence, structure and databanks, pp. 1-13. Springer-Verlag, Heidelberg.

Wikimedia Foundation. 2010.
