Neighbor-joining
In bioinformatics, neighbor-joining is a bottom-up clustering method used to construct phylogenetic trees. It is usually applied to trees based on DNA or protein sequence data, and it requires knowledge of the distance between each pair of taxa (e.g. species or sequences) in the tree.
The algorithm
Neighbor-joining is an iterative algorithm. Each iteration consists of the following steps:
# Based on the current distance matrix, calculate the matrix Q (explained below).
# Find the pair of distinct taxa for which Q has the lowest value. Create a node on the tree that joins these two taxa (i.e. join the closest neighbors, as the algorithm's name implies).
# Calculate the distance from each taxon in the pair to this new node.
# Calculate the distance from each taxon outside this pair to the new node.
# Start the algorithm again, treating the pair of joined neighbors as a single taxon and using the distances calculated in the previous step.
The Q-matrix
Based on a distance matrix relating r taxa, calculate Q as follows:
: $Q(i,j) = (r-2)\,d(i,j) - \sum_{k=1}^{r} d(i,k) - \sum_{k=1}^{r} d(j,k)$
where d(i,j) is the distance between taxa i and j.
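A minimal Python sketch of the Q computation, and of the pair selection in step 2, follows. It assumes the distance matrix is a symmetric list of lists with zeros on the diagonal; the function names `q_matrix` and `closest_pair` are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple


def q_matrix(d: List[List[float]]) -> List[List[float]]:
    """Compute Q(i,j) = (r-2) d(i,j) - sum_k d(i,k) - sum_k d(j,k).

    Assumes d is a symmetric r x r matrix with zeros on the diagonal.
    Diagonal entries of Q are left at 0 and are never used for selection.
    """
    r = len(d)
    row_sums = [sum(row) for row in d]  # sum_k d(i,k) for each taxon i
    q = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            if i != j:
                q[i][j] = (r - 2) * d[i][j] - row_sums[i] - row_sums[j]
    return q


def closest_pair(q: List[List[float]]) -> Tuple[int, int]:
    """Return the pair (i, j), i < j, with the lowest Q(i, j).

    Ties are broken by taking the first such pair in row-major order.
    """
    r = len(q)
    return min(((i, j) for i in range(r) for j in range(i + 1, r)),
               key=lambda p: q[p[0]][p[1]])
```

For a hypothetical five-taxon matrix `[[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]`, the row sums of taxa 0 and 1 are 31 and 34, so Q(0,1) = 3 * 5 - 31 - 34 = -50. This is the lowest entry, so taxa 0 and 1 are joined first.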
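For example, with four taxa (A, B, C, D), the first iteration joins the pair with the lowest Q value into a new node and yields a reduced distance matrix relating three taxa: the new node and the two remaining taxa. We can then start the procedure anew, taking this reduced matrix as the original distance matrix. In this example, one more step of the recursion suffices to obtain the complete tree.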
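Putting the five steps together, the following Python sketch builds an unrooted tree as a list of edges. The text does not spell out the formulas for steps 3 and 4, so the sketch assumes the standard ones. Let f and g be the joined pair, u the new node, δ a branch length and r the current number of taxa. Then δ(f,u) = d(f,g)/2 + [Σ_k d(f,k) - Σ_k d(g,k)] / (2(r - 2)) and δ(g,u) = d(f,g) - δ(f,u). Each remaining taxon k gets d(u,k) = [d(f,k) + d(g,k) - d(f,g)] / 2. The function name `neighbor_joining` and the internal-node labels `u1`, `u2`, ... are hypothetical. Ties in Q are broken arbitrarily.

```python
from typing import List, Tuple

Edge = Tuple[str, str, float]  # (node, node, branch length)


def neighbor_joining(names: List[str], d: List[List[float]]) -> List[Edge]:
    """Build an unrooted tree from a distance matrix; return its edges.

    Assumes a symmetric matrix with zero diagonal and at least two taxa.
    Internal nodes receive hypothetical labels "u1", "u2", ...
    """
    names = list(names)
    d = [[float(x) for x in row] for row in d]
    edges: List[Edge] = []
    counter = 0
    while len(names) > 2:
        r = len(names)
        row_sums = [sum(row) for row in d]
        # Steps 1-2: pick the pair (f, g) minimizing Q (first pair wins ties).
        f, g = min(
            ((i, j) for i in range(r) for j in range(i + 1, r)),
            key=lambda p: (r - 2) * d[p[0]][p[1]] - row_sums[p[0]] - row_sums[p[1]],
        )
        # Step 3: branch lengths from f and g to the new node u.
        dfu = 0.5 * d[f][g] + (row_sums[f] - row_sums[g]) / (2 * (r - 2))
        dgu = d[f][g] - dfu
        counter += 1
        u = f"u{counter}"
        edges.append((names[f], u, dfu))
        edges.append((names[g], u, dgu))
        # Step 4: distance from every other taxon k to u.
        keep = [k for k in range(r) if k not in (f, g)]
        new_row = [0.5 * (d[f][k] + d[g][k] - d[f][g]) for k in keep]
        # Step 5: replace f and g by u and repeat on the smaller matrix.
        d = [[d[a][b] for b in keep] + [new_row[ia]] for ia, a in enumerate(keep)]
        d.append(new_row + [0.0])
        names = [names[k] for k in keep] + [u]
    # Two nodes remain: connect them with a single edge.
    edges.append((names[0], names[1], d[0][1]))
    return edges
```

For example, take the hypothetical matrix `[[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]` with taxa named `a` to `e`. This sketch joins `a` and `b` first, with branch lengths 2 and 3 to `u1`. It returns seven edges whose lengths sum to 17. Because Q is recomputed from scratch on each of the roughly n iterations, each costing O(r^2) work, the sketch takes O(n^3) time overall for n taxa.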
Pros and cons of the NJ method
Neighbor-joining is based on the minimum-evolution criterion for phylogenetic trees, i.e. the topology that gives the least total branch length is preferred at each step of the algorithm. However, neighbor-joining may not find the topology with the least total branch length, because it is a greedy algorithm that constructs the tree in a stepwise fashion. Even though it is sub-optimal in this sense, it has been extensively tested and usually finds a tree that is quite close to the optimal tree. Nevertheless, it has been largely superseded in phylogenetics by methods that do not rely on distance measures and offer superior accuracy under most conditions.
The main virtue of neighbor-joining relative to these other methods is its computational efficiency: neighbor-joining is a polynomial-time algorithm (the standard implementation takes O(n^3) time for n taxa). It can therefore be used on very large data sets for which other methods of phylogenetic analysis (e.g. minimum evolution, maximum parsimony, maximum likelihood) are computationally prohibitive. Unlike the UPGMA algorithm for phylogenetic tree reconstruction, neighbor-joining does not assume that all lineages evolve at the same rate (the molecular clock hypothesis), and it produces an unrooted tree. A rooted tree can be obtained by using an outgroup: the root is then placed at the point in the tree where the edge from the outgroup connects.
Furthermore, neighbor-joining is statistically consistent under many models of evolution. Hence, given data of sufficient length, neighbor-joining will reconstruct the true tree with high probability.
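The outgroup rooting described above can be sketched as a breadth-first traversal. It orients every edge of the unrooted tree away from the node where the outgroup's edge attaches. The edge-list input format `(node, node, length)` and the function name `root_with_outgroup` are assumptions for illustration. Placing the root elsewhere on the outgroup's branch, for example at its midpoint, is an alternative convention not shown here.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (node, node, branch length)


def root_with_outgroup(edges: List[Edge], outgroup: str) -> Dict[str, Optional[str]]:
    """Return a parent map for the tree rooted where the outgroup attaches.

    Assumes `edges` describes an unrooted tree and `outgroup` is a leaf.
    The root (parent None) is the outgroup's single neighbor.
    """
    adj: Dict[str, List[str]] = defaultdict(list)
    for a, b, _length in edges:
        adj[a].append(b)
        adj[b].append(a)
    if len(adj[outgroup]) != 1:
        raise ValueError("outgroup must be a leaf of the tree")
    root = adj[outgroup][0]
    parent: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    # Breadth-first search: each newly reached node's parent is the node we came from.
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent
```

In this sketch, calling the function with outgroup `"a"` on a tree where leaf `a` hangs off internal node `u1` returns a map in which `u1` is the root (parent `None`) and `a` is one of its children.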
See also
* UPGMA
External links
* [http://www.icp.be/~opperd/private/neighbor.html The Neighbor-Joining Method], a tutorial
Wikimedia Foundation. 2010.