Quantitative structure-activity relationship
Quantitative structure-activity relationship (QSAR) is the process by which chemical structure is quantitatively correlated with a well-defined process, such as biological activity or chemical reactivity.
For example, biological activity can be expressed quantitatively as the concentration of a substance required to give a certain biological response. When physicochemical properties or structures are also expressed as numbers, one can form a mathematical relationship between the two, the quantitative structure-activity relationship. This mathematical expression can then be used to predict the biological response of other chemical structures.
QSAR's most general mathematical form is:
Activity = f(physicochemical properties and/or structural properties)
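As a minimal sketch of this general form, the Python snippet below does three things. It converts IC50 values to pIC50, the negative base-10 logarithm of the molar concentration, a common convention in which larger numbers mean more active compounds. It then fits a one-descriptor linear model, pIC50 = a·x + b, by ordinary least squares. Finally, it uses that model to predict the activity of an unseen structure. The training data and descriptor values are invented for illustration. The choice of a single linear descriptor term is also an assumption; practical QSAR models usually combine several descriptors and are validated before use.

```python
import math


def pic50(ic50_molar):
    """Express potency as -log10(IC50 in mol/L); higher values mean more active."""
    return -math.log10(ic50_molar)


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical training set: (descriptor value, IC50 in mol/L).
# The descriptor could be, e.g., logP; all numbers here are invented.
training = [
    (1.0, 1e-5),
    (2.0, 1e-6),
    (3.0, 1e-7),
    (4.0, 1e-8),
]

xs = [x for x, _ in training]
ys = [pic50(ic50) for _, ic50 in training]
a, b = fit_linear(xs, ys)

# Activity = f(descriptor): predict pIC50 for a structure not in the training set.
new_descriptor = 2.5
predicted = a * new_descriptor + b
print(f"pIC50 = {a:.2f} * x + {b:.2f}; prediction for x = {new_descriptor}: {predicted:.2f}")
```

The least-squares fit is kept hand-written so that the structure of the relationship stays visible. The same approach extends to several descriptors by solving the corresponding normal equations.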
SAR and SAR paradox
The basic assumption of all molecule-based hypotheses is that similar molecules have similar activities. This principle is also called structure-activity relationship (SAR). The underlying problem is therefore how to define a "small" difference on a molecular level. Each kind of activity (e.g. reaction ability, biotransformation ability, solubility, target activity) might depend on a different kind of difference. A good example was given in the bioisosterism review by Patani and LaVoie. [G. A. Patani, E. J. LaVoie, "Bioisosterism: A Rational Approach in Drug Design", Chem. Rev., 1996, 96, 3147-3176. doi:10.1021/cr950066q]
In general, one is more interested in finding strong trends. Hypotheses usually rely on a finite amount of chemical data. The induction principle should therefore be respected, to avoid overfitted hypotheses and the overfitted, useless interpretations of structural/molecular data that follow from them.
The SAR paradox refers to the fact that not all similar molecules have similar activities.
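The similarity principle can be sketched with binary structural fingerprints. In the hypothetical Python example below, each molecule is represented by the set of "on" bits of its fingerprint. Similarity between two molecules is measured with the Tanimoto coefficient: the size of the intersection of their bit sets divided by the size of the union. The activity of a query compound is then estimated as the similarity-weighted mean activity of its k most similar training compounds. The fingerprints, activities and the choice of k are invented for illustration; real workflows compute fingerprints with a cheminformatics toolkit. A SAR paradox is exactly the situation in which such a prediction fails, because a small structural change produces a large change in activity.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def knn_predict(query, training, k=2):
    """Similarity-weighted mean activity of the k training molecules most similar to query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    neighbours = ranked[:k]
    weights = [tanimoto(query, fp) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0:
        raise ValueError("query shares no bits with any training molecule")
    return sum(w * activity for w, (_, activity) in zip(weights, neighbours)) / total


# Hypothetical fingerprints (sets of on-bit indices) with invented activities (e.g. pIC50).
training = [
    ({1, 2, 3, 4}, 7.0),
    ({1, 2, 3, 5}, 6.8),
    ({10, 11, 12}, 4.5),
]
query = {1, 2, 3, 6}
print(round(knn_predict(query, training), 2))
```

Here the query shares three of five combined bits with each of the first two molecules and none with the third. Its prediction is therefore driven entirely by its two close analogues.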
Applications
Chemical
One of the first historical QSAR applications was to predict boiling points. [D. Bonchev, D. H. Rouvray, "Chemical Graph Theory: Introduction and Fundamentals", Gordon and Breach Science Publishers, 1990. ISBN 0-85626-454-7.] It is well known that within a particular family of chemical compounds, especially in organic chemistry, there are strong correlations between structure and observed properties. A simple example is the relationship between the number of carbons in alkanes and their boiling points. The boiling point clearly increases with the number of carbons, and this trend serves as a means of predicting the boiling points of higher alkanes.
Other applications of continuing interest are the Hammett equation, the Taft equation and pKa prediction methods. [R. Fraczkiewicz, "In Silico Prediction of Ionization", in B. Testa, H. van de Waterbeemd (eds.), Comprehensive Medicinal Chemistry II, vol. 5, Elsevier, Amsterdam, The Netherlands, 2007.]
Biological
The biological activity of molecules is usually measured in assays to establish the level of inhibition of particular signal transduction or metabolic pathways. Chemicals can also be biologically active by being toxic. Drug discovery often uses QSAR to identify chemical structures that could have good inhibitory effects on specific targets and low toxicity (non-specific activity). Of special interest is the prediction of the partition coefficient log P, an important measure used in identifying "druglikeness" according to Lipinski's Rule of Five.
Many QSAR analyses involve the interactions of a family of molecules with an enzyme or receptor binding site. QSAR can also be used to study the interactions between the structural domains of proteins. Protein-protein interactions can be quantitatively analyzed for structural variations resulting from site-directed mutagenesis. [E. K. Freyhult, K. Andersson, M. G. Gustafsson, "Structural modeling extends QSAR analysis of antibody-lysozyme interactions to 3D-QSAR", Biophys. J., 2003, 84, 2264-2272. PMID 12668435]
Reducing the risk of a SAR paradox is part of the machine learning method's task, especially given that only a finite amount of data is available (see also minimum-variance unbiased estimator, MVUE). In general, all QSAR problems can be divided into a coding part [R. Todeschini, V. Consonni, "Handbook of Molecular Descriptors", Wiley-VCH, 2000. ISBN 3527299130] and a learning part [R. O. Duda, P. E. Hart, D. G. Stork, "Pattern Classification", John Wiley & Sons, 2001. ISBN 0-471-05669-3].
Data mining
For the coding, a relatively large number of features or molecular descriptors is usually calculated, and these can lack structural interpretability. This raises a feature selection problem, handled either together with the subsequent learning method or as a preprocessing step.
A typical data-mining-based prediction uses, e.g., support vector machines, decision trees or neural networks to induce a predictive learning model.
3D-QSAR
3D-QSAR refers to the application of force field calculations that require three-dimensional structures, e.g. from protein crystallography or molecule superposition. It uses computed potentials, e.g. the Lennard-Jones potential, rather than experimental constants, and is concerned with the overall molecule rather than a single substituent. It examines the steric fields (shape of the molecule) and the electrostatic fields based on the applied energy function. [A. Leach, "Molecular Modelling: Principles and Applications", Prentice Hall, 2001. ISBN 0-582-38210-6]
The resulting data space is then usually reduced by a subsequent feature extraction step (see also dimensionality reduction). The learning method that follows can be any of the machine learning methods already mentioned, e.g. support vector machines. [B. Schölkopf, K. Tsuda, J. P. Vert, "Kernel Methods in Computational Biology", MIT Press, Cambridge, MA, 2004.]
The literature shows that chemists often prefer partial least squares (PLS) methods, since PLS performs feature extraction and induction in one step.
Molecule mining
Molecule mining approaches are a special case of structured data mining approaches. They apply either a similarity-matrix-based prediction or an automatic fragmentation scheme into molecular substructures. Other approaches use maximum common subgraph searches or graph kernels. [D. Gusfield, "Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology", Cambridge University Press, 1997. ISBN 0-521-58519-8] [C. Helma (ed.), "Predictive Toxicology", CRC, 2005. ISBN 0-8247-2397-X]
Fragment based (group contribution)
It has been shown that the logP of a compound can be determined as the sum of contributions from its fragments. The fragment logP values have been determined statistically. The method gives mixed results and is generally not trusted to be more accurate than ±0.1 units. [S. A. Wildman, G. M. Crippen, "Prediction of Physicochemical Parameters by Atomic Contributions", J. Chem. Inf. Comput. Sci., 1999, 39, 868-873. doi:10.1021/ci990307l]
Applicability Domain
(Q)SAR models are increasingly used for chemical risk management and also for regulatory purposes (in the EU: Registration, Evaluation, Authorisation and Restriction of Chemicals, REACH). It is therefore crucial to be able to assess the reliability of predictions. The chemical descriptor space spanned by a particular training set of chemicals is called the applicability domain. It makes it possible to assess whether a compound can be reliably predicted.
See also
*Structure-activity relationship
*Cheminformatics
*ADME
*Differential solubility
*Intermolecular force
*Pharmacokinetics
*Pharmacophore
*CLogP
*Computer-assisted drug design (CADD)
*Protein structure prediction
*QSAR & Combinatorial Science - scientific journal
*Software for molecular mechanics modeling
References
External links
* [http://www.qsarworld.com QSAR World - A comprehensive web resource for QSAR modelers]
* [http://media.wiley.com/product_data/excerpt/03/04712709/0471270903.pdf History of QSAR] - (PDF)
* [http://www.natureprotocols.com/2007/03/05/development_of_qsar_models_usi_1.php Development of QSAR models using C-QSAR program: a regression program that has dual databases of over 21,000 QSAR models (a protocol)]
* [http://www.syrres.com/qsar2008 The 13th International Workshop on Quantitative Structure-Activity Relationships (QSARs) in the Environmental Sciences]
* [http://www.qsar.org/ The Cheminformatics and QSAR Society]
Wikimedia Foundation. 2010.