- Secondary structure
In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids (DNA/RNA). It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure.

Secondary structure is formally defined by the hydrogen bonds of the biopolymer, as observed in an atomic-resolution structure. In proteins, the secondary structure is defined by patterns of hydrogen bonds between backbone amide (N-H) and carbonyl (C=O) groups (sidechain-mainchain and sidechain-sidechain hydrogen bonds are irrelevant), where the DSSP definition of a hydrogen bond is used. In nucleic acids, the secondary structure is defined by the hydrogen bonding between the nitrogenous bases.

The hydrogen bonding is correlated with other structural features, however, which has given rise to less formal definitions of secondary structure. For example, residues in protein helices generally adopt backbone dihedral angles in a particular region of the Ramachandran plot; thus, a segment of residues with such dihedral angles is often called a "helix", regardless of whether it has the correct hydrogen bonds. Many other less formal definitions have been proposed, often applying concepts from the differential geometry of curves, such as curvature and torsion. Least formally, structural biologists solving a new atomic-resolution structure will sometimes assign its secondary structure "by eye" and record their assignments in the corresponding PDB file.

The rough secondary-structure content of a biopolymer (e.g., "this protein is 40% α-helix and 20% β-sheet") can often be estimated spectroscopically. For proteins, a common method is far-ultraviolet (far-UV, 170-250 nm) circular dichroism. A pronounced double minimum at 208 and 222 nm indicates α-helical structure, whereas a single minimum at 204 nm or 217 nm reflects random-coil or β-sheet structure, respectively. A less common method is infrared spectroscopy, which detects differences in the bond oscillations of amide groups due to hydrogen bonding. Finally, secondary-structure contents may be estimated accurately using the chemical shifts of an unassigned NMR spectrum.

The concept of secondary structure was introduced by Kaj Ulrik Linderstrøm-Lang in the 1952 Lane medical lectures at Stanford.

Proteins
Secondary structure in proteins consists of local inter-residue interactions mediated by hydrogen bonds. The most common secondary structures are alpha helices and beta sheets. Other helices, such as the 3₁₀ helix and π helix, are calculated to have energetically favorable hydrogen-bonding patterns but are rarely if ever observed in natural proteins except at the ends of α helices, owing to unfavorable backbone packing in the center of the helix. Other extended structures such as the polyproline helix and alpha sheet are rare in native state proteins but are often hypothesized to be important protein folding intermediates. Tight turns and loose, flexible loops link the more "regular" secondary structure elements. The random coil is not a true secondary structure, but is the class of conformations that indicate an absence of regular secondary structure.

Amino acids vary in their ability to form the various secondary structure elements. Proline and glycine are sometimes known as "helix breakers" because they disrupt the regularity of the α helical backbone conformation; however, both have unusual conformational abilities and are commonly found in turns. Amino acids that prefer to adopt helical conformations in proteins include methionine, alanine, leucine, glutamate and lysine ("MALEK" in amino-acid 1-letter codes); by contrast, the large aromatic residues (tryptophan, tyrosine and phenylalanine) and β-branched amino acids (isoleucine, valine and threonine) prefer to adopt β-strand conformations. However, these preferences are not strong enough to produce a reliable method of predicting secondary structure from sequence alone.

The DSSP code
The DSSP code is frequently used to describe protein secondary structures with a single-letter code. DSSP is an acronym for "Dictionary of Protein Secondary Structure", which was the title of the original article listing the secondary structure of the proteins with known 3D structure (Kabsch and Sander 1983). The secondary structure is assigned based on hydrogen bonding patterns such as those initially proposed by Pauling et al. in 1951 (before any protein structure had ever been experimentally determined).

* G = 3-turn helix (3₁₀ helix). Min length 3 residues.
* H = 4-turn helix (alpha helix). Min length 4 residues.
* I = 5-turn helix (pi helix). Min length 5 residues.
* T = hydrogen bonded turn (3, 4 or 5 turn)
* E = extended strand in parallel and/or anti-parallel beta sheet conformation. Min length 2 residues.
* B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)
* S = bend (the only non-hydrogen-bond based assignment)

In DSSP, residues that are not in any of the above conformations are designated as ' ' (space), which is sometimes written as C (coil) or L (loop). The helices (G, H and I) and sheet conformations are all required to have a reasonable length. This means that 2 adjacent residues in the primary structure must form the same hydrogen bonding pattern. If the helix or sheet hydrogen bonding pattern is too short, the residues are designated as T or B, respectively. Other protein secondary structure assignment categories exist (sharp turns, Omega loops etc.), but they are less frequently used.
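The single-letter codes above are often collapsed to three states (helix, strand, coil) before computing summary statistics. The sketch below shows one such reduction in Python. The particular mapping (G/H/I to helix, E/B to strand, everything else to coil) is an assumption, since other groupings are also in use, and the input string and function names are hypothetical illustrations, not part of DSSP itself.

```python
from collections import Counter

# Assumed reduction of DSSP codes to three states; other conventions exist.
# ' ' (blank), 'C' and 'L' are all treated as coil.
DSSP_TO_3STATE = {
    "G": "H", "H": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and isolated bridges
    "T": "C", "S": "C", " ": "C", "C": "C", "L": "C",
}


def reduce_dssp(dssp: str) -> str:
    """Map a per-residue DSSP string to helix (H), strand (E) or coil (C)."""
    return "".join(DSSP_TO_3STATE[code] for code in dssp)


def composition(states: str) -> dict:
    """Return the fraction of residues in each of the three states."""
    if not states:
        raise ValueError("empty secondary-structure string")
    counts = Counter(states)
    return {state: counts.get(state, 0) / len(states) for state in "HEC"}


if __name__ == "__main__":
    example = " SHHHHHHTT EEEE B GGG"  # hypothetical DSSP assignment
    three_state = reduce_dssp(example)
    print(three_state)
    print(composition(three_state))
```

A code outside the table raises a `KeyError`, which is deliberate here: silently treating an unknown letter as coil would hide malformed input.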
DSSP H-bond definition
Secondary structure is defined by hydrogen bonding, so the exact definition of a hydrogen bond is critical. The standard H-bond definition for secondary structure is that of DSSP, which is a purely electrostatic model. It assigns charges of $+q_1$ and $-q_1$ (with $q_1 = 0.42e$) to the carbonyl carbon and oxygen, respectively, and charges of $-q_2$ and $+q_2$ (with $q_2 = 0.20e$) to the amide nitrogen and hydrogen, respectively. The electrostatic energy is:

$$E = q_1 q_2 \left( \frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}} \right) \cdot 332\ \text{kcal/mol}$$

where $r_{AB}$ is the distance between atoms A and B in ångströms and the charges are expressed in units of $e$. According to DSSP, an H-bond exists if and only if $E$ is less than -0.5 kcal/mol. Although the DSSP formula is a relatively crude approximation of the "physical" H-bond energy, it is generally accepted as a tool for defining secondary structure.

Protein secondary-structure prediction
Early methods of secondary-structure prediction were based on the helix- or sheet-forming propensities of individual amino acids, sometimes coupled with rules for estimating the free energy of forming secondary structure elements. Such methods are typically ~60% accurate in predicting which of the three states (helix/sheet/coil) a residue adopts. A significant increase in accuracy (to nearly 80%) was made by exploiting multiple sequence alignment; knowing the full distribution of amino acids that occur at a position (and in its vicinity, typically ~7 residues on either side) throughout evolution provides a much better picture of the structural tendencies near that position. For illustration, a given protein might have a glycine at a given position, which by itself might suggest a random coil there. However, multiple sequence alignment might reveal that helix-favoring amino acids occur at that position (and nearby positions) in 95% of homologous proteins spanning nearly a billion years of evolution. Moreover, by examining the average hydrophobicity at that and nearby positions, the same alignment might also suggest a pattern of residue solvent accessibility consistent with an α-helix. Taken together, these factors would suggest that the glycine of the original protein adopts α-helical structure, rather than random coil. Several types of methods are used to combine all the available data to form a 3-state prediction, including neural networks, hidden Markov models and support vector machines. Modern prediction methods also provide a confidence score for their predictions at every position.

Secondary-structure prediction methods are continuously benchmarked, e.g., in the [http://cubic.bioc.columbia.edu/eva/sec/res_sec.html EVA] experiment. Based on ~270 weeks of testing, the most accurate methods at present are [http://bioinf.cs.ucl.ac.uk/psipred/psiform.html PsiPRED], [http://www.soe.ucsc.edu/research/compbio/HMM-apps/T02-query.html SAM], [http://distill.ucd.ie/porter/ PORTER], [http://www.predictprotein.org PROF] and [http://sable.cchmc.org/ SABLE]. Interestingly, it does not seem to be possible to improve upon these methods by taking a consensus of them [citation needed].
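The accuracy figures quoted for three-state prediction are per-residue agreement rates, a measure commonly referred to as Q3. As a minimal sketch of how such a score could be computed, the Python below compares a predicted and an observed three-state string; the function names and the two example strings are hypothetical, and real benchmarks such as EVA apply additional measures not shown here.

```python
from typing import Optional


def three_state_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H/E/C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def per_state_recall(predicted: str, observed: str, state: str) -> Optional[float]:
    """Of the residues observed in `state`, the fraction also predicted as `state`.

    Returns None when `state` never occurs in the observed string.
    """
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        return None
    hits = sum(predicted[i] == state for i in positions)
    return hits / len(positions)


if __name__ == "__main__":
    observed = "CCHHHHHHCCEEEECC"   # hypothetical reference assignment
    predicted = "CCHHHHHCCCCEEECC"  # hypothetical prediction
    print(three_state_accuracy(predicted, observed))   # 0.875
    print(per_state_recall(predicted, observed, "E"))  # 0.75
```

Reporting recall per state alongside the overall score makes it visible when one class, for example strand, is missed more often than the others.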
The chief area for improvement appears to be the prediction of β-strands; residues confidently predicted as β-strand are likely to be so, but the methods are apt to overlook some β-strand segments (false negatives). There is likely an upper limit of ~90% prediction accuracy overall, due to the idiosyncrasies of the standard method (DSSP) for assigning secondary-structure classes (helix/strand/coil) to PDB structures, against which the predictions are benchmarked [citation needed].

Accurate secondary-structure prediction is a key element in the prediction of tertiary structure, in all but the simplest (homology modeling) cases. For example, a confidently predicted pattern of six secondary structure elements βαββαβ is the signature of a ferredoxin fold.

Nucleic acids
Nucleic acids also have secondary structure, most notably single-stranded RNA molecules. RNA secondary structure is generally divided into helices (contiguous base pairs) and various kinds of loops (unpaired nucleotides surrounded by helices). The stem-loop structure, in which a base-paired helix ends in a short unpaired loop, is extremely common and is a building block for larger structural motifs such as cloverleaf structures, which are four-helix junctions such as those found in transfer RNA. Internal loops (a short series of unpaired bases in a longer paired helix) and bulges (regions in which one strand of a helix has "extra" inserted bases with no counterparts in the opposite strand) are also frequent. Finally, both pseudoknots and base triples are present in RNA (though not DNA).

Since it is almost entirely base pair-mediated, RNA secondary structure can be said to define which bases are paired in a molecule or complex. However, the traditional Watson-Crick base pair is not the only type of pairing that is permissible in RNA; Hoogsteen base pairing is also common.

RNA secondary structure prediction
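Dynamic programming, one of the prediction approaches covered in this section, can be illustrated with a deliberately simplified recursion that maximizes the number of nested base pairs (in the style of the Nussinov algorithm). This is a sketch under stated assumptions, not the free-energy minimization that servers such as Mfold perform: every allowed pair scores 1, G-U wobble pairs are permitted, and hairpin loops must contain at least `min_loop` unpaired bases. The function name `fold` and its parameters are hypothetical.

```python
# Assumed set of allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold(seq: str, min_loop: int = 3) -> str:
    """Return a dot-bracket structure with a maximal number of nested pairs."""
    n = len(seq)
    # best[i][j] = maximum number of nested pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal solution.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


if __name__ == "__main__":
    print(fold("GGGAAACCC"))  # (((...)))
```

Because the recursion only combines intervals that sit side by side or one inside the other, every pair it reports is nested, which is exactly why this family of methods cannot represent pseudoknots.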
See also: RNA structure

One application of bioinformatics uses predicted RNA secondary structures in searching a genome for noncoding but functional forms of RNA. For example, microRNAs have canonical long stem-loop structures interrupted by small internal loops. A general method of calculating probable RNA secondary structure is dynamic programming, although this has the disadvantage that it cannot detect pseudoknots or other cases in which base pairs are not fully nested. More general methods are based on stochastic context-free grammars. A web server that implements a type of dynamic programming is [http://bioweb.pasteur.fr/seqanal/interfaces/mfold-simple.html Mfold].

For many RNA molecules, the secondary structure is highly important to the correct function of the RNA, often more so than the actual sequence. This fact aids in the analysis of non-coding RNA, sometimes termed "RNA genes". RNA secondary structure can be predicted with some accuracy by computer, and many bioinformatics applications use some notion of secondary structure in the analysis of RNA.

Alignment
Both protein and RNA secondary structures can be used to aid in multiple sequence alignment. These alignments can be made more accurate by the inclusion of secondary structure information in addition to simple sequence information. This is sometimes less useful in RNA because base pairing is much more highly conserved than sequence. Distant relationships between proteins whose primary structures are unalignable can sometimes be found by secondary structure.

See also
* Folding (chemistry)
* primary structure
* tertiary structure
* quaternary structure
* translation
* structural motif

References
* C Branden and J Tooze (1999). "Introduction to Protein Structure" 2nd ed. Garland Publishing: New York, NY.
* M. Zuker "Computer prediction of RNA structure", "Methods in Enzymology", 180:262-88 (1989). (The classic paper on dynamic programming algorithms to predict RNA secondary structure.)
* L. Pauling and R.B. Corey. "Configurations of polypeptide chains with favored orientations of the polypeptide around single bonds: Two pleated sheets." Proc. Natl. Acad. Sci. Wash., 37:729-740 (1951). (The original beta-sheet conformation article.)
* L. Pauling, R.B. Corey and H.R. Branson. "Two hydrogen-bonded helical configurations of the polypeptide chain." Proc. Natl. Acad. Sci. Wash., 37:205-211 (1951). (alpha- and pi-helix conformations, since they predicted that helices would not be possible.)
* [http://dx.doi.org/10.1002/bip.360221211]
External links
* [http://www.predictprotein.org PROF]
* [http://www.compbio.dundee.ac.uk/~www-jpred/ Jpred]
* [http://swift.cmbi.ru.nl/gv/dssp/ DSSP]
* [http://swift.cmbi.kun.nl/whatif/ WhatIf]
* Rasmol (protein visualization program, implements DSSP)
* [http://bioweb.pasteur.fr/seqanal/interfaces/mfold-simple.html Mfold]
Wikimedia Foundation. 2010.