The identifiers describe chemical substances in terms of "layers" of information: the atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry, and electronic charge information. Not all layers have to be provided; for instance, the tautomer layer can be omitted if that type of information is not relevant to the particular application.

InChIs differ from the widely used CAS registry numbers in three respects:
* they are freely usable and non-proprietary;
* they can be computed from structural information and do not have to be assigned by some organization;
* most of the information in an InChI is human-readable (with practice).

InChIs can thus be seen as akin to a general and extremely formalized version of IUPAC names. They can express more information than the simpler SMILES notation, and they differ from it in that every structure has a unique InChI string, which is important in database applications. Information about the three-dimensional coordinates of atoms is not represented in InChI; for this purpose a format such as PDB can be used.

The InChI algorithm converts input structural information into a unique InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom), and serialization (to give a string of characters).
Format and layers
Every InChI starts with the string "InChI=" followed by the version number, currently 1. The remaining information is structured as a sequence of layers and sublayers, with each layer providing one specific type of information. The layers and sublayers are separated by the delimiter "/" and start with a characteristic prefix letter (except for the chemical formula sublayer of the main layer). The six layers with important sublayers are:
# Main layer
#* Chemical formula (no prefix). This is the only sublayer that must occur in every InChI.
#* Atom connections (prefix: "c"). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer describes which atoms are connected by bonds to which other ones.
#* Hydrogen atoms (prefix: "h"). Describes how many hydrogen atoms are connected to each of the other atoms.
# Charge layer
#* Positive charge sublayer (prefix: "p")
#* Negative charge sublayer (prefix: "q")
# Stereochemical layer
# Isotopic layer
# Fixed-H layer
# Reconnected layer

The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find identifiers that match only in certain layers.

InChIKey
Example:
Morphine has the structure shown on the right. The InChI for morphine is InChI=1/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11-,13-,16-,17-/m0/s1 and its InChIKey is BQJCRHHNABKAKU-XKUOQXLYBY.
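
The delimiter-prefix format can be illustrated with a short Python sketch that splits the morphine InChI at each "/" and reads the prefix letter of each sublayer. This is only an illustration of the string layout, not part of the InChI software: the function names `split_layers`, `main_layer` and `same_connectivity` are hypothetical, and the sketch assumes that the main layer consists of the formula plus the "c" and "h" sublayers that directly follow it.

```python
from itertools import takewhile

# The morphine InChI quoted in the example above.
MORPHINE = (
    "InChI=1/C17H19NO3/"
    "c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/"
    "h2-5,10-11,13,16,19-20H,6-8H2,1H3/"
    "t10-,11-,13-,16-,17-/m0/s1"
)


def split_layers(inchi):
    """Return (version, formula, [(prefix, body), ...]) for an InChI string.

    Sublayers are kept as an ordered list because a prefix letter may
    occur more than once in a full InChI.
    """
    head = "InChI="
    if not inchi.startswith(head):
        raise ValueError("not an InChI string: %r" % inchi)
    version, *segments = inchi[len(head):].split("/")
    # The chemical formula is the only sublayer without a prefix letter.
    formula = segments[0] if segments else ""
    sublayers = [(seg[0], seg[1:]) for seg in segments[1:] if seg]
    return version, formula, sublayers


def main_layer(inchi):
    """Return the formula plus the leading 'c' and 'h' sublayers (assumption:
    these make up the main layer and come directly after the formula)."""
    _, formula, sublayers = split_layers(inchi)
    leading = takewhile(lambda item: item[0] in ("c", "h"), sublayers)
    return "/".join([formula] + [prefix + body for prefix, body in leading])


def same_connectivity(a, b):
    """True if two InChIs agree in the main layer, ignoring all other layers."""
    return main_layer(a) == main_layer(b)


version, formula, sublayers = split_layers(MORPHINE)
print(version, formula)
for prefix, body in sublayers:
    print(prefix, body)
```

Comparing only the main layer is one way to approximate the layer-restricted search mentioned above. For instance, `same_connectivity(MORPHINE, "/".join(MORPHINE.split("/")[:4]))` compares the full string with a copy that has the stereochemical sublayers cut off, and the two agree in formula, connections and hydrogens.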
The format was originally called IChI (IUPAC Chemical Identifier), then renamed in July 2004 to INChI (IUPAC-NIST Chemical Identifier), and renamed again in November 2004 to InChI (IUPAC International Chemical Identifier), a trademark of IUPAC.
