International Chemical Identifier

The IUPAC International Chemical Identifier (InChI, pronounced "INchee") is a textual identifier for chemical substances, designed to provide a standard and human-readable way to encode molecular information and to facilitate the search for such information in databases and on the web. The format and algorithms, developed by IUPAC and NIST between 2000 and 2005, are non-proprietary, and the software is freely available under the open-source LGPL license (although the term "InChI" is a trademark of IUPAC). [McNaught, Alan (2006). "The IUPAC International Chemical Identifier: InChI". Chemistry International 28 (6). IUPAC. Retrieved 2007-09-18.]


The identifiers describe chemical substances in terms of "layers" of information: the atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry, and electronic charge information. Not all layers have to be provided; for instance, the tautomer layer can be omitted if that type of information is not relevant to the particular application.

InChIs differ from the widely used CAS registry numbers in three respects:
* they are freely usable and non-proprietary;
* they can be computed from structural information and do not have to be assigned by some organization;
* most of the information in an InChI is human readable (with practice).

InChIs can thus be seen as akin to a general and extremely formalized version of IUPAC names. They can express more information than the simpler SMILES notation and differ in that every structure has a unique InChI string, which is important in database applications. Information about the 3-dimensional coordinates of atoms is not represented in InChI; for this purpose a format such as PDB can be used.

The InChI algorithm converts input structural information into a unique InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom), and serialization (to give a string of characters).
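
The three steps can be illustrated on a toy scale. The sketch below is not the InChI algorithm: it is a minimal brute-force illustration, with a hypothetical function name (`toy_identifier`) and an invented output format, of how normalizing, canonically renumbering, and then serializing a small molecular graph yields the same string regardless of the order in which the atoms were entered. It tries every renumbering, so it is only practical for a handful of atoms.

```python
from itertools import permutations


def toy_identifier(elements, bonds):
    """Return an order-independent string for a small molecular graph.

    elements: element symbols, indexed by the input atom number (0-based).
    bonds: pairs (i, j) of input atom numbers that are bonded.
    Illustrative only; this is not how InChI is actually computed.
    """
    # Step 1, normalization (toy version): ignore bond direction and duplicates.
    edges = {tuple(sorted(bond)) for bond in bonds}

    # Step 2, canonicalization (brute force): try every renumbering and keep
    # the one whose (atom list, bond list) description sorts first.
    best = None
    for perm in permutations(range(len(elements))):
        # perm[old] is the new number assigned to input atom `old`.
        new_elements = [None] * len(elements)
        for old, new in enumerate(perm):
            new_elements[new] = elements[old]
        new_edges = sorted(tuple(sorted((perm[i], perm[j]))) for i, j in edges)
        candidate = (tuple(new_elements), tuple(new_edges))
        if best is None or candidate < best:
            best = candidate

    # Step 3, serialization: write atoms, then bonds using 1-based numbers.
    atoms, canonical_edges = best
    bond_text = ",".join(f"{i + 1}-{j + 1}" for i, j in canonical_edges)
    return ",".join(atoms) + "/" + bond_text


# Ethanol's heavy atoms entered in two different orders, and dimethyl ether.
ethanol_a = toy_identifier(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = toy_identifier(["O", "C", "C"], [(1, 0), (2, 1)])
ether = toy_identifier(["C", "O", "C"], [(0, 1), (1, 2)])
print(ethanol_a == ethanol_b, ethanol_a == ether)
```

The two ethanol inputs produce the same string, while dimethyl ether, which has the same atoms but different connectivity, produces a different one.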

The InChIKey, sometimes referred to as a hashed InChI, is a fixed-length (25-character) condensed digital representation of the InChI that is not human-readable. The InChIKey specification was released in September 2007 to facilitate web searches for chemical compounds, since these were problematic with the full-length InChI. ["The IUPAC International Chemical Identifier (InChI)". IUPAC. 5 September 2007. Retrieved 2007-09-18.]


Format and layers

Every InChI starts with the string "InChI=" followed by the version number, currently 1. The remaining information is structured as a sequence of layers and sub-layers, with each layer providing one specific type of information. The layers and sub-layers are separated by the delimiter "/" and start with a characteristic prefix letter (except for the chemical formula sub-layer of the main layer). The six layers, with their important sub-layers, are:

#Main layer
#* Chemical formula (no prefix). This is the only sublayer that must occur in every InChI.
#* Atom connections (prefix: "c"). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer describes which atoms are connected by bonds to which other ones.
#* Hydrogen atoms (prefix: "h"). Describes how many hydrogen atoms are connected to each of the other atoms.
#Charge layer
#* charge sublayer (prefix: "q")
#* proton sublayer (prefix: "p"), giving the number of protons added or removed
# Stereochemical layer
# Isotopic layer
# Fixed-H layer
# Reconnected Layer
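
As a minimal sketch of the delimiter-prefix structure described above, the following function splits an InChI string on "/" and keys each segment by its prefix letter. The function name `split_inchi` and the dictionary layout are illustrative choices, not part of any InChI software, and the sketch makes the simplifying assumption that each prefix letter is wanted only once (the first occurrence is kept).

```python
def split_inchi(inchi):
    """Split an InChI string into its version, formula and prefixed segments.

    Returns a dict keyed by "version", "formula", or the one-letter prefix.
    Simplification: if a prefix letter occurs more than once, only the first
    occurrence is kept.
    """
    marker = "InChI="
    if not inchi.startswith(marker):
        raise ValueError("not an InChI string: missing 'InChI=' prefix")
    parts = inchi[len(marker):].split("/")
    if len(parts) < 2 or not parts[1]:
        raise ValueError("an InChI needs a version and a chemical formula")
    layers = {"version": parts[0], "formula": parts[1]}
    for segment in parts[2:]:
        if segment:  # skip empty segments such as a trailing "/"
            layers.setdefault(segment[0], segment[1:])
    return layers


# The morphine InChI quoted in this article.
morphine = ("InChI=1/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)"
            "4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3"
            "/t10-,11-,13-,16-,17-/m0/s1")
for name, value in split_inchi(morphine).items():
    print(name, value)
```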

The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find identifiers that match only in certain layers.
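
A wildcard search of this kind might look like the sketch below, which uses shell-style patterns from Python's standard `fnmatch` module. The helper name `wildcard_search` and the sample list are assumptions for illustration; the second entry is the morphine InChI with its stereochemical segments removed, constructed here only to show layer-wise matching.

```python
from fnmatch import fnmatchcase

MORPHINE_MAIN = ("InChI=1/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)"
                 "4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3")

identifiers = [
    MORPHINE_MAIN + "/t10-,11-,13-,16-,17-/m0/s1",  # morphine, with stereo layer
    MORPHINE_MAIN,                                   # same main layer, no stereo
    "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3",              # ethanol
]


def wildcard_search(pattern, candidates):
    """Return the candidates matching a shell-style pattern (case-sensitive).

    Note: "?" and "[" are special characters in these patterns, so literal
    occurrences in an identifier would need escaping in the pattern.
    """
    return [c for c in candidates if fnmatchcase(c, pattern)]


# Everything with the formula C17H19NO3, whatever the other layers contain.
same_formula = wildcard_search("InChI=1/C17H19NO3/*", identifiers)
# Everything that carries a segment starting with the prefix letter "t".
with_stereo = wildcard_search("*/t*", identifiers)
print(len(same_formula), len(with_stereo))
```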


The condensed, 25-character InChIKey is a hashed version of the full InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds. ["The IUPAC International Chemical Identifier (InChI)".] Most chemical structures on the Web up to 2007 were represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy searching, and therefore the InChIKey was developed. There is a very small but nonzero chance of two different molecules having the same InChIKey, but the probability of duplication of just the first 14 characters has been estimated at only one duplication in 75 databases, each containing one billion unique structures. With all databases currently holding fewer than 50 million structures, such duplication appears unlikely at present.
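
The order of magnitude of this estimate can be checked with a standard birthday-problem approximation. The sketch below assumes that the first 14 characters encode 65 bits of the hash; that bit count is an assumption not stated in this article, and the function name is illustrative.

```python
import math


def collision_probability(n_structures, hash_bits):
    """Approximate chance that at least two of n items share a hash value.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2N)),
    where N = 2**hash_bits is the number of possible hash values.
    """
    space = 2 ** hash_bits
    expected_pairs = n_structures * (n_structures - 1) / (2 * space)
    return -math.expm1(-expected_pairs)


# One database of one billion structures, 65-bit hash (assumed).
p = collision_probability(10**9, 65)
print(f"about 1 in {1 / p:.0f}")
```

Under this assumption the result comes out at roughly one in 74, in line with the estimate quoted above.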

InChIKeys consist of 14 characters resulting from a hash of the connectivity information of the InChI, followed by a hyphen, followed by 8 characters resulting from a hash of the remaining layers of the InChI, followed by a single character indicating the version of InChI used, followed by a single checksum character.
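
The layout just described can be checked mechanically. The following is a minimal sketch that only validates the shape of a 25-character key and splits it into its parts; it assumes every character is an uppercase letter (as in the morphine example in this article), does not verify the checksum, and does not cover the longer key layout used by later releases of the InChI software. The names `split_inchikey` and `INCHIKEY_PATTERN` are illustrative.

```python
import re

# 14 letters, hyphen, 8 letters, 1 version letter, 1 checksum letter.
INCHIKEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])")


def split_inchikey(key):
    """Split a 25-character InChIKey into its four parts.

    Only the layout is checked; the checksum character is not recomputed.
    """
    match = INCHIKEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a 25-character InChIKey: {key!r}")
    connectivity, remainder, version, checksum = match.groups()
    return {
        "connectivity_hash": connectivity,
        "remaining_layers_hash": remainder,
        "version": version,
        "checksum": checksum,
    }


parts = split_inchikey("BQJCRHHNABKAKU-XKUOQXLYBY")
print(parts)
```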

Example: The InChI for morphine is InChI=1/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11-,13-,16-,17-/m0/s1 and the InChIKey for morphine is BQJCRHHNABKAKU-XKUOQXLYBY. ["InChI=1/C17H19NO3/c1-18...". Chemspider. Retrieved 2007-09-18.]


The format was originally called IChI (IUPAC Chemical Identifier), then renamed in July 2004 to INChI (IUPAC-NIST Chemical Identifier), and renamed again in November 2004 to InChI (IUPAC International Chemical Identifier), a trademark of IUPAC.

See also

* Molecular Query Language
* Molecule editor


External links

Documentation and presentations

* IUPAC InChI site
* Unofficial InChI FAQ
* Description of the canonicalization algorithm
* Googling for InChIs, a presentation to the W3C
* The Semantic Chemical Web: GoogleInChI and other Mashups, Google Tech Talk by Peter Murray-Rust, 13 September 2006
* IUPAC InChI, Google Tech Talk by Steve Heller and Steve Stein, 2 November 2006

Software and services

* Generate InChI (service at the University of Cambridge, available interactively or via WSDL)
* Search Google for molecules (generates an InChI from an interactively entered chemical structure and searches Google for any pages with embedded InChIs); requires JavaScript to be enabled in the browser
* ChemSketch, a free chemical structure drawing package that includes input and output in InChI format
* PubChem online molecule editor, which supports SMILES/SMARTS and InChI
* ChemSpider Services, which allow generation of InChI and conversion of InChI to structure (also SMILES and generation of other properties)
* MarvinSketch, an implementation to draw structures (or open other file formats) and output to InChI file format
* InChIMatic: draw your molecule and Google will search for it
* BKchem, which implements its own InChI parser and uses the IUPAC implementation to generate InChI strings

Wikimedia Foundation. 2010.
