- Protein structure
Proteins are an important class of biological macromolecules present in all biological organisms, made up of such elements as carbon, hydrogen, nitrogen, oxygen, and sulfur. All proteins are polymers of amino acids. The polymers, also known as polypeptides, consist of a sequence drawn from 20 different L-α-amino acids, also referred to as residues. For chains under 40 residues the term peptide is frequently used instead of protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations, driven by a number of noncovalent interactions such as hydrogen bonding, ionic interactions, Van der Waals forces and hydrophobic packing. In order to understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography or NMR spectroscopy to determine the structure of proteins.
A number of residues are necessary to perform a particular biochemical function, and around 40-50 residues appears to be the lower limit for a functional domain size. Protein sizes range from this lower limit to several thousand residues in multi-functional or structural proteins. However, the current estimate for the average protein length is around 300 residues. Very large aggregates can be formed from protein subunits; for example, many thousand actin molecules assemble into a microfilament.
Levels of protein structure
Biochemistry refers to four distinct aspects of a protein's structure:
* Primary structure - the amino acid sequence of the peptide chains.
* Secondary structure - highly regular sub-structures ("alpha helix" and "strands of beta sheet") which are locally defined, meaning that there can be many different secondary motifs present in one single protein molecule.
* Tertiary structure - three-dimensional structure of a single protein molecule; a spatial arrangement of the secondary structures. It also describes the completely folded and compacted polypeptide chain.
* Quaternary structure - complex of several protein molecules or polypeptide chains, usually called protein subunits in this context, which function as part of the larger assembly or protein complex.
In addition to these levels of structure, a protein may shift between several similar structures in performing its biological function. In the context of these functional rearrangements, these tertiary or quaternary structures are usually referred to as chemical conformations, and transitions between them are called conformational changes.
The primary structure is held together by covalent peptide bonds, which are made during the process of protein biosynthesis or translation. These peptide bonds provide rigidity to the protein. The two ends of the amino acid chain are referred to as the C-terminal end or carboxyl terminus (C-terminus) and the N-terminal end or amino terminus (N-terminus), based on the nature of the free group on each extremity.
The various types of secondary structure are defined by their patterns of hydrogen bonds between the main-chain peptide groups. However, these hydrogen bonds are generally not stable by themselves, since the water-amide hydrogen bond is generally more favorable than the amide-amide hydrogen bond. Thus, secondary structure is stable only when the local concentration of water is sufficiently low, e.g., in the molten globule or fully folded states.
Similarly, the formation of molten globules and tertiary structure is driven mainly by structurally "non-specific" interactions, such as the rough propensities of the amino acids and hydrophobic interactions. However, the tertiary structure is "fixed" only when the parts of a protein domain are locked into place by structurally "specific" interactions, such as ionic interactions (salt bridges), hydrogen bonds and the tight packing of side chains. The tertiary structure of extracellular proteins can also be stabilized by disulfide bonds, which reduce the entropy of the unfolded state; disulfide bonds are extremely rare in cytosolic proteins, since the cytosol is generally a reducing environment.
Structure of the amino acids
An α-amino acid consists of a part that is present in all the amino acid types, and a side chain that is unique to each type of residue. The Cα atom is bound to four different groups: an amino group, a carboxyl group, a hydrogen atom and a side chain specific to that type of amino acid. An exception to this rule is proline, where the side chain is also bonded to the backbone nitrogen, replacing one of the hydrogen atoms of the amino group. Because the Cα atom is bound to four different groups it is chiral; however, only one of the isomers occurs in biological proteins. Glycine, however, is not chiral, since its side chain is a hydrogen atom. A simple mnemonic for the correct L-form is "CORN": when the Cα atom is viewed with the H in front, the groups read "CO-R-N" in a clockwise direction.
The side chain determines the chemical properties of the α-amino acid and may be any one of the 20 different side chains.
Turns, loops and a few other secondary structure elements such as the 3-10 helix complete the picture. We now have enough pieces to assemble a complete protein, displaying its typical tertiary structure.
Tertiary structure
The elements of secondary structure are usually folded into a compact shape using a variety of loops and turns. The formation of tertiary structure is usually driven by the burial of hydrophobic residues, but other interactions such as hydrogen bonding, ionic interactions and disulfide bonds can also stabilize the tertiary structure. The tertiary structure encompasses all the noncovalent interactions that are not considered secondary structure; it defines the overall fold of the protein and is usually indispensable for the protein's function.
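As a rough illustration of the role of hydrophobic residues, the following Python sketch averages a hydropathy scale over a sliding window along a sequence. The Kyte-Doolittle scale used here is an assumption chosen for illustration and is not named in the text above; the function name `hydropathy_profile` and the default window size are likewise hypothetical. A high window average only marks a hydrophobic stretch that might be buried. It does not predict the fold.

```python
# Kyte-Doolittle hydropathy values (assumed scale, chosen for illustration).
# Positive values are hydrophobic, negative values are hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    `sequence` is a string of one-letter amino acid codes; a residue that
    is not one of the 20 standard codes raises KeyError.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: a hydrophobic run followed by a charged run.
    for position, value in enumerate(hydropathy_profile("IIIKKK", window=3)):
        print(position, round(value, 2))
```

In the toy sequence the window average falls steadily from the isoleucine run to the lysine run, which is the kind of transition one would look for between a candidate buried segment and a solvent-exposed one.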
Quaternary structure
The quaternary structure is the arrangement of, and interaction between, several polypeptide chains. The individual chains are called subunits. The subunits are not necessarily covalently connected, but might be linked by a disulfide bond. Not all proteins have quaternary structure, since they might be functional as monomers. The quaternary structure is stabilized by the same range of interactions as the tertiary structure. Complexes of two or more polypeptides (i.e. multiple subunits) are called multimers. Specifically, a complex is called a dimer if it contains two subunits, a trimer if it contains three subunits, and a tetramer if it contains four subunits. Multimers made up of identical subunits may be referred to with a prefix of "homo-" (e.g. a homotetramer) and those made up of different subunits may be referred to with a prefix of "hetero-" (e.g. a heterodimer). Tertiary structures vary greatly from one protein to another. They are held together mainly by noncovalent interactions and, in some proteins, by covalent disulfide bonds.
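The naming rule for multimers can be written down as a small Python sketch. The function name `describe_assembly` is hypothetical. The sketch assumes that each subunit is given as its amino acid sequence and that two subunits count as identical exactly when their sequences are equal. The fallback name "multimer" for complexes with more than four subunits is a simplification.

```python
# Names for the subunit counts mentioned in the text.
SIZE_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def describe_assembly(subunit_sequences):
    """Name a complex from the list of its subunit sequences.

    Assumption: subunits are identical when their sequences are equal.
    """
    count = len(subunit_sequences)
    if count == 0:
        raise ValueError("at least one subunit is required")
    if count == 1:
        return "monomer"
    # Complexes larger than a tetramer fall back to the generic term.
    size = SIZE_NAMES.get(count, "multimer")
    prefix = "homo" if len(set(subunit_sequences)) == 1 else "hetero"
    return prefix + size


if __name__ == "__main__":
    # Hypothetical placeholder sequences, not real proteins.
    print(describe_assembly(["MKT", "MKT", "MKT", "MKT"]))  # four identical chains
    print(describe_assembly(["MKT", "MAS"]))                # two different chains
```

Comparing whole sequences is the simplest possible identity test. A real structure file would normally supply chain or entity identifiers that could be used instead.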
Side chain conformation
The atoms along the side chain are named with Greek letters in Greek alphabetical order: α, β, γ, δ, ε and so on. Cα refers to the carbon atom closest to the carbonyl group of that amino acid, Cβ the second closest and so on. The Cα is usually considered a part of the backbone. The dihedral angles around the bonds between these atoms are named χ1, χ2, χ3 etc. For example, the first and second carbon atoms of lysine are named α and β, and the dihedral angle around the α-β bond is named χ1. Side chains can be in different conformations called gauche(-), trans and gauche(+). Side chains generally tend to adopt a staggered conformation around χ2, driven by the minimization of the overlap between the electron orbitals of the hydrogen atoms.
Domains, motifs, and folds in protein structure
Many proteins are organized into several units. A structural domain is an element of the protein's overall structure that is self-stabilizing and often folds independently of the rest of the protein chain. Many domains are not unique to the protein products of one gene or one gene family but instead appear in a variety of proteins. Domains often are named and singled out because they figure prominently in the biological function of the protein they belong to; for example, the "calcium-binding domain of calmodulin". Because they are self-stabilizing, domains can be "swapped" by genetic engineering between one protein and another to make chimeras. A motif in this sense refers to a small specific combination of secondary structural elements (such as helix-turn-helix). These elements are often called supersecondary structures. Fold refers to a global type of arrangement, like helix bundle or beta-barrel. Structure motifs usually consist of just a few elements, e.g. the 'helix-turn-helix' has just three. Note that while the "spatial sequence" of elements is the same in all instances of a motif, they may be encoded in any order within the underlying gene. Protein structural motifs often include loops of variable length and unspecified structure, which in effect create the "slack" necessary to bring together in space two elements that are not encoded by immediately adjacent DNA sequences in a gene. Note also that even when two genes encode secondary structural elements of a motif in the same order, they may nevertheless specify somewhat different sequences of amino acids. This is true not only because of the complicated relationship between tertiary and primary structure, but also because the size of the elements varies from one protein to the next. Despite the fact that there are about 100,000 different proteins expressed in eukaryotic systems, there are far fewer different domains, structural motifs and folds.
This is partly a consequence of evolution, since genes or parts of genes can be doubled or moved around within the genome. This means that, for example, a protein domain might be moved from one protein to another, thus giving the protein a new function. Because of these mechanisms, pathways and mechanisms tend to be reused in several different proteins.
Protein folding
The process by which the higher structures form is called protein folding and is a consequence of the primary structure. A unique polypeptide may have more than one stable folded conformation, each of which could have a different biological activity, but usually only one conformation is considered to be the active, or native, conformation.
Structure classification
Several methods have been developed for the structural classification of proteins. These seek to classify the data in the Protein Data Bank in a structured order. Several databases exist which classify proteins using different methods. SCOP, CATH and FSSP are the largest ones. The methods used range from purely manual, through combined manual and automated, to purely automated. Work is being done to better integrate the current data. The classification is consistent between SCOP, CATH and FSSP for the majority of proteins which have been classified, but there are still some differences and inconsistencies.
Protein structure determination
Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. This method allows one to measure the 3D density distribution of electrons in the protein (in the crystallized state) and thereby infer the 3D coordinates of all the atoms to a certain resolution. Roughly 9% of the known protein structures have been obtained by Nuclear Magnetic Resonance techniques, which can also be used to determine secondary structure. Note that aspects of the secondary structure as a whole can be determined via other biochemical techniques such as circular dichroism. Secondary structure can also be predicted with a high degree of accuracy (see next section).
Cryo-electron microscopy has recently become a means of determining protein structures to high resolution (less than 5 angstroms or 0.5 nanometer) and is anticipated to increase in power as a tool for high resolution work in the next decade. This technique is also a valuable resource for researchers working with very large protein complexes such as virus coat proteins and amyloid fibers.
Computational prediction of protein structure
Determining a protein sequence is much simpler than determining a protein structure. However, the structure of a protein gives much more insight into the function of the protein than its sequence. Therefore, a number of methods for the computational prediction of protein structure from its sequence have been proposed. "Ab initio" prediction methods use just the sequence of the protein. Threading uses existing protein structures. Homology modeling aims to build a reliable 3D model for a protein of unknown structure from one or more related proteins of known structure.
Rosetta@home is a distributed computing project which tries to predict the structures of proteins with massive sampling on thousands of home computers. Foldit is a video game designed to use human pattern recognition and puzzle solving abilities to improve existing software.
Software
There are many available software packages, such as the free web-based STING, used to visualize and analyze protein structures. Another example is the [http://www.cbs.dtu.dk/services/FeatureMap3D/ FeatureMap3D] web-server, which can visualize the quality of a protein-protein alignment in 3D and be used to map "sequence feature annotation", such as the underlying intron/exon structure, onto a protein structure.
Several packages, such as [http://www.q-pharm.com Quantum Pharmaceuticals software], can be used to predict conformational changes of proteins and their influence on protein function.
Several methods have been developed to compare structures of different proteins. Please see structural alignment.
Computational tools are also frequently employed to check experimental and theoretical models of protein structures for errors (examples: [http://www.came.sbg.ac.at/typo3/index.php?id=prosa ProSA], [https://flipper.services.came.sbg.ac.at/ NQ-Flipper], [http://www.doe-mbi.ucla.edu/Services/Verify_3D/ Verify3D], [http://www.swissmodel.unibas.ch/anolea/ ANOLEA], [http://swift.cmbi.ru.nl/gv/whatcheck/ WHAT_CHECK]).
Software for molecular mechanics modeling is useful for building and simulating protein models.
External links
* [https://prosa.services.came.sbg.ac.at/prosa.php ProSA-web] Web service for the recognition of errors in experimentally or theoretically determined protein structures
* [https://flipper.services.came.sbg.ac.at/ NQ-Flipper] Check for unfavorable rotamers of Asn and Gln residues in protein structures
* [http://swift.cmbi.ru.nl/ servers] that check nearly 200 aspects of protein structure, such as packing, geometry, unfavourable rotamers in general and for Asn, Gln and His especially, strange water molecules, backbone conformations, atom nomenclature, symmetry parameters, etc.
* [http://swift.cmbi.ru.nl/teach/B1/ Bioinformatics course]. An interactive, fully free course explaining many of the aspects discussed in this wiki entry.
Wikimedia Foundation. 2010.