Similarity matrix

A similarity matrix is a matrix of scores that express the similarity between pairs of data points: the entry in row i and column j is the score for points i and j. Similarity matrices are closely related to their counterparts, distance matrices and substitution matrices. The similarity of matrix representations in linear algebra is a separate notion, related to diagonal matrix representation.
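As a minimal sketch of the definition above, the following builds a similarity matrix for a few made-up points. Cosine similarity is used here only as one common choice of score; the article does not prescribe it, and the function names and data are illustrative assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(points, score=cosine_similarity):
    """Return the n x n matrix whose entry [i][j] is score(points[i], points[j])."""
    return [[score(p, q) for q in points] for p in points]

# Made-up data points, for illustration only.
points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
S = similarity_matrix(points)

for row in S:
    print(["%.3f" % value for value in row])
```

Any other symmetric score can be passed as the `score` argument; the matrix is then symmetric as well.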

Applications

Case-based reasoning, content-based image retrieval, intelligent information retrieval

Use in sequence alignment

Similarity matrices are used in sequence alignment. Higher scores are given to more-similar characters, and lower or negative scores to dissimilar characters.

Nucleotide similarity matrices are used to align nucleic acid sequences. Because there are only four nucleotides commonly found in DNA (adenine (A), cytosine (C), guanine (G) and thymine (T)), nucleotide similarity matrices are much simpler than protein similarity matrices. For example, a simple matrix will assign identical bases a score of +1 and non-identical bases a score of −1. A more complicated matrix would give a higher score to transitions (changes from a pyrimidine such as C or T to another pyrimidine, or from a purine such as A or G to another purine) than to transversions (from a pyrimidine to a purine or vice versa).

The match/mismatch ratio of the matrix sets the target evolutionary distance (States et al. 1991, METHODS: A Companion to Methods in Enzymology 3:66-70). The +1/−3 DNA matrix used by BLASTN is best suited for finding matches between sequences that are 99% identical; a +1/−1 (or +4/−4) matrix is much more sensitive to diverged sequences, as it is optimal for matches between sequences that are about 70% identical.
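The two ideas above can be sketched in a few lines. The first part builds a nucleotide matrix that scores transitions higher than transversions and uses it to score an ungapped alignment; the default scores (+1, −1, −2) are illustrative assumptions, not values from any published matrix. The second part estimates which mismatch/match ratio a log-odds matrix implies for a given target identity, under the simplifying assumptions of uniform base frequencies and equally likely mismatches, so its numbers only roughly track the figures quoted above.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"

def nucleotide_matrix(match=1, transition=-1, transversion=-2):
    """Build a 4 x 4 similarity matrix as a dict keyed by (base, base)."""
    matrix = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                matrix[(x, y)] = match
            elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                matrix[(x, y)] = transition    # purine/purine or pyrimidine/pyrimidine
            else:
                matrix[(x, y)] = transversion  # purine/pyrimidine
    return matrix

def score_ungapped(seq1, seq2, matrix):
    """Sum the matrix scores over the columns of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))

def implied_mismatch_per_match(identity):
    """Mismatch score per unit of match score in a log-odds matrix.

    Assumes uniform base frequencies (1/4 each) and equally likely
    mismatches; identity must lie strictly between 0.25 and 1.
    """
    match_odds = identity / 0.25             # 4 identical pairs, background 1/16 each
    mismatch_odds = (1.0 - identity) / 0.75  # 12 mismatched pairs, background 1/16 each
    return math.log(mismatch_odds) / math.log(match_odds)

simple = nucleotide_matrix(match=1, transition=-1, transversion=-1)  # the +1/-1 matrix
weighted = nucleotide_matrix()                                       # transitions favoured

print(score_ungapped("ACGT", "GCGT", weighted))  # A/G is a transition
print(score_ungapped("ACGT", "CCGT", weighted))  # A/C is a transversion
print(implied_mismatch_per_match(0.99))
```

Under these assumptions a target identity of 99% implies a mismatch penalty close to three times the match score, which is consistent with the +1/−3 matrix described above.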

Amino acid similarity matrices are more complicated, because the genetic code specifies 20 amino acids. The similarity matrix for amino acids therefore contains 20 × 20 = 400 entries (although it is usually symmetric, which leaves 210 distinct values). The earliest approach scored all amino acid changes equally. A later refinement was to determine amino acid similarities based on how many base changes were required to change a codon to code for that amino acid. This model is better, but it does not take into account the selective pressure on amino acid changes. Better models took into account the chemical properties of amino acids.
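The codon-based refinement can be illustrated by counting, for every pair of amino acids, the minimum number of base changes needed to turn a codon for one into a codon for the other. This is a sketch that assumes the standard genetic code; the names `codons_for`, `min_base_changes` and the final conversion `3 - changes` are illustrative choices, not a published scoring scheme.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of codons that encode it.
codons_for = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    if aa != "*":
        codons_for.setdefault(aa, []).append("".join(bases))

def min_base_changes(aa1, aa2):
    """Smallest number of differing positions between any codon pair."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for[aa1]
        for c2 in codons_for[aa2]
    )

# Full 20 x 20 table: 400 entries, symmetric by construction.
changes = {(a, b): min_base_changes(a, b) for a in codons_for for b in codons_for}

# One possible way to turn the counts into similarity scores (an assumption).
similarity = {pair: 3 - n for pair, n in changes.items()}

print(len(changes))
print(changes[("F", "L")], changes[("M", "W")], changes[("F", "K")])
```

Because only codon structure is used, the table says nothing about how often a change is tolerated by selection, which is the limitation noted above.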

One approach has been to generate the similarity matrices empirically. The Dayhoff method used phylogenetic trees and sequences taken from species on the tree. This approach has given rise to the PAM (point accepted mutation) series of matrices. PAM matrices are labelled by the number of accepted point mutations, that is, amino acid replacements, per 100 amino acids. While the PAM matrices benefit from having a well-understood evolutionary model, they are most useful at short evolutionary distances (PAM10 to PAM120). At long evolutionary distances, for example PAM250 or 20% identity, it has been shown that the BLOSUM matrices are much more effective.
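PAM matrices for larger distances are obtained by extrapolation: the one-step mutation probability matrix (PAM1) is multiplied by itself n times to give PAMn. The sketch below shows only that mechanism. It assumes a toy PAM1 in which every amino acid changes with probability 0.01 and all replacements are equally likely, which is not Dayhoff's data, so the resulting identities are illustrative and do not reproduce the 20% figure quoted above.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def toy_pam1(n_letters=20, change_rate=0.01):
    """Toy one-step matrix: each letter changes with probability
    change_rate, spread evenly over the other letters (an assumption)."""
    off_diagonal = change_rate / (n_letters - 1)
    return [[1.0 - change_rate if i == j else off_diagonal
             for j in range(n_letters)]
            for i in range(n_letters)]

def extrapolate(pam1, n):
    """Return pam1 raised to the n-th power (n >= 1)."""
    result = pam1
    for _ in range(n - 1):
        result = matmul(result, pam1)
    return result

def expected_identity(matrix):
    """Probability that a position is unchanged, assuming uniform composition."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

pam1 = toy_pam1()
pam250 = extrapolate(pam1, 250)

print(expected_identity(pam1))
print(expected_identity(pam250))
```

The point of the exercise is that 250 accepted mutations per 100 positions does not mean every position differs: repeated and reverse changes at the same position leave a noticeable fraction of positions identical.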

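The BLOSUM series was generated by comparing conserved blocks from a number of divergent sequences. Each BLOSUM matrix is labeled by the percent-identity threshold used to cluster sequences before substitutions are counted (BLOSUM62, for example, clusters sequences that are at least 62% identical), so a lower BLOSUM number corresponds to a higher PAM number.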
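The entries of an empirical matrix of this kind are log-odds scores: the observed frequency of an aligned pair is compared with the frequency expected by chance. The following is a minimal sketch of that calculation in half-bit units, the scaling commonly used for BLOSUM. The two-letter alphabet and the counts are made up, and the clustering step described above is omitted.

```python
import math

def log_odds_scores(pair_counts):
    """Compute rounded half-bit log-odds scores.

    pair_counts maps unordered pairs (a, b), written with a <= b,
    to the number of times that pair was observed in aligned columns.
    All counts must be positive.
    """
    total = sum(pair_counts.values())
    observed = {pair: count / total for pair, count in pair_counts.items()}

    # Frequency of each letter: a mixed pair contributes half to each member.
    letter_freq = {}
    for (a, b), freq in observed.items():
        if a == b:
            letter_freq[a] = letter_freq.get(a, 0.0) + freq
        else:
            letter_freq[a] = letter_freq.get(a, 0.0) + freq / 2
            letter_freq[b] = letter_freq.get(b, 0.0) + freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        # Chance expectation: p_a^2 for identical pairs, 2 * p_a * p_b otherwise.
        if a == b:
            expected = letter_freq[a] ** 2
        else:
            expected = 2 * letter_freq[a] * letter_freq[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores

# Made-up counts for a two-letter alphabet, for illustration only.
toy_counts = {("A", "A"): 40, ("A", "B"): 20, ("B", "B"): 40}
scores = log_odds_scores(toy_counts)
print(scores)
```

Pairs seen more often than chance predicts receive positive scores and pairs seen less often receive negative ones, which matches the convention described in the sequence alignment section.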

See also

* Recurrence plot, a visualisation tool for recurrences in dynamical (and other) systems.


Wikimedia Foundation. 2010.
