- Hirschberg's algorithm
Hirschberg's algorithm, named after its inventor,
Dan Hirschberg , is adynamic programming algorithm that finds the least cost sequence alignment between two
strings, where cost is measured as Levenshtein edit distance, defined to be the sumof the costs of insertions, replacements, deletions, and null actions needed to changeone string to the other. Hirschberg's algorithm is simply described as a divide and conquer version of theNeedleman-Wunsch algorithm [ [http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/ Hirschberg's algorithm ] ] . Hirschberg's algorithm is commonly used incomputational biology to find maximal global alignments ofDNA andprotein sequences.Algorithm information
Hirschberg's algorithm is a generally applicablealgorithm for finding an optimal sequence alignment.
BLAST andFASTA are suboptimal heuristics. If "x" and "y" are strings, where |"x"| = "n" and |"y"| = "m", theNeedleman-Wunsch algorithm finds an optimal alignment in O("nm") time, using O("nm") space. Hirschberg's algorithm is a clever modification of the Needleman-Wunsch Algorithm which takes O("nm") time, but needs only O(min{"m","n"}) space.One application of the algorithm is finding sequence alignments of DNA or protein sequences.
=Computation of the Levenshtein edit distance in linear space=The edit distance Edit("x","y") is the least cost of changing "x" into "y" by using the operations Insert, Substitute, and Delete, where the costof each kind of operation is given. We write Ins("a") for the cost ofinserting a symbol "a" into a string, Sub("a","b") for the cost of substituting a symbol "b" for "a" in a string, and Del("a") for the cost of deleting a symbol "a" from a string. The "standard" choice of those costs is Ins("a") = Del("a") = 1 for each symbol "a", Sub("a","a") = 0, and Sub("a","b") = 1 if "a" is not equal to "b".The Needleman-Wunsch algorithm computes Edit("x","y") by computingEdit(Pref ["x","i"] ,Pref ["y","j"] ) for all "i" and "j" dynamically, where Pref ["x","i"] denotes the prefix of "x" of length "i". That algorithm requires O("nm") time and O("nm") space, where "n" = |"x"| and "m" = |"y"|.
Algorithm organization
To understand Hirschberg's algorithm, it is important to first understandthat edit distances can be computed using linear space.
What we call the "forward subprogram" computes the values ofEdit(Pref ["x","i"] ,Pref ["y","j"] ) for all "i" and "j", just as the Needleman-Wunschand returns the array {Edit("x",Pref ["y","j"] )}0 ≤ j ≤ m. The"backward subprogram" is similar, except that the dynamic programis done in the opposite direction, i.e., starting from the right endsof the strings. It returns the array {Edit("x",Suff ["y","j"] )}0 ≤ j ≤ m,where Suff ["y","j"] is the suffix of "y" of length "j".
The forward and backward subprograms compute values in the matrix by usingpreviously computed values, but save space by erasing values that will nolonger be needed during that execution of the subprogram. Unfortunately,the erased values will need to be recomputed later; thus, Hirschberg'salgorithm takes roughly twice as much time as Needleman-WunschFact|date=April 2007.
Pseudocode
01 High-level description of the forwards subprogram 02 03 Forwards [x,y] is 04 05 1. n = |x|; m = |y
06 2. For all i from 1 to n: 07 Edit [Pref [x,i] ,Pref [y,0] = 0 08 3. For all j from 1 to m: 09 A. Edit [Pref [x,0] ,Pref [y,j] = Edit [Pref [x,0] ,Pref [y,j-1] + Ins(y_j) 10 B. For all i from 1 to n, execute the following steps: 11 i. Edit [Pref [x,i] ,Pref [y,j] = 12 min{Edit [Pref [x,i-1] ,Pref [y,j] + Del(x_i), 13 Edit [Pref [x,i-1] ,Pref [y,j-1] + Sub(x_i,y_j), 14 Edit [Pref [x,i] ,Pref [y,j-1] + Ins(y_j)} 15 ii. Erase Edit [Pref [x,i-1] ,Pref [y,j-1] 16 C. Erase Edit [Pref [x,i-1] ,Pref [y,j] 17 4. Return {Edit [Pref [x,n] ,Pref [y,j] } %% an array of length m+101 High-level description of the backwards subprogram 02 03 Backwards [x,y] is 04 05 1. n = |x|; m = |y
06 2. For all i from 1 to n: 07 Edit [Suff [x,i] ,Suff [y,0] = 0 08 3. For all j from 1 to m: 09 A. Edit [Suff [x,0] ,Suff [y,j] = Edit [Suff [x,n] ,Suff [y,j-1] + Ins(y_{m-j+1}) 10 B. For all i from 1 to n: 11 i. Edit [Suff [x,i] ,Suff [y,j] = 12 min{Edit [Suff [x,i-1] ,Suff [y,j] + Del(x_{n-i-1}), 13 Edit [Suff [x,i-1] ,Suff [y,j-1] + Sub(x_{n-i-1},y_{m-j+1}), 14 Edit [Suff [x,i] ,Suff [y,j-1] + Ins(y_{m-j+1})} 15 ii. Erase Edit [Suff [x,i-1] ,Suff [y,j-1] 16 C. Erase Edit [Suff [x,i-1] ,Suff [y,j] 17 4. RETURN {Edit [Pref [x,n] ,Pref [y,j] } %% an array of length m+101 High level description of Hirschberg's algorithm: 02 03 Hirschberg(x,y) is 04 05 1. n = |x|; m = |y
06 2. If n <= 1 or m <= 1: 07 OUTPUT Alignment(x,y) using Needleman-Wunsch. 08 Else: 09 A. mid = floor(n/2) 10 B. x^left = Pref [x,mid] 11 C. x^right = Suff [x,n-mid] 12 D. {Edit [x^left,Pref [y,j] } = Forwards(x^left,y) %% an array of length m+1 13 E. {Edit [x^right,Suff [y,j] } = Backwards(x^right,y) %% an array of length m+1 14 F. cut = ArgMin{Edit [x^left,Pref [y,cut] + Edit [x^right,Suff [y,n-cut] } 15 G. Hirschberg(x^left,Pref [y,cut] ) 16 H. Hirschberg(x^right,Suff [y,n-cut] )ee also
*
Needleman-Wunsch algorithm
*Smith Waterman algorithm
*Levenshtein distance References
Wikimedia Foundation. 2010.