# Binary GCD algorithm

Binary GCD algorithm

The binary GCD algorithm is an algorithm which computes the greatest common divisor of two nonnegative integers. It gains a measure of efficiency over the ancient Euclidean algorithm by replacing divisions and multiplications with shifts, which are cheaper when operating on the binary representation used by modern computers. This is particularly critical on embedded platforms that have no direct processor support for division. While the algorithm was first published by the German Josef Stein in 1961, it may have been known in first-century China.Donald Knuth, "The Art of Computer Programming, Volume 2: Seminumerical Algorithms" (3rd Edition). Addison-Wesley.]

Algorithm

This algorithm is also known as Stein's Algorithm.

The algorithm reduces the problem of finding the GCD by repeatedly applying these identities:

# gcd(0, "v") = "v", because everything divides zero, and "v" is the largest number that divides "v". Similarly, gcd("u", 0) = "u". gcd(0, 0) is not defined.
# If "u" and "v" are both even, then gcd("u", "v") = 2·gcd("u"/2, "v"/2), because 2 is a common divisor.
# If "u" is even and "v" is odd, then gcd("u", "v") = gcd("u"/2, "v"), because 2 is not a common divisor. Similarly, if "u" is odd and "v" is even, then gcd("u", "v") = gcd("u", "v"/2).
# If "u" and "v" are both odd, and "u" ≥ "v", then gcd("u", "v") = gcd(("u"−"v")/2, "v"). If both are odd and "u" < "v", then gcd("u", "v") = gcd(("v"-"u")/2, "u"). These are combinations of one step of the simple Euclidean algorithm, which uses subtraction at each step, and an application of step 3 above. The division by 2 results in an integer because the difference of two odd numbers is even. [In fact, the algorithm might be improved by the observation that if both "u" and "v" are odd, then exactly one of "u"+"v" or "u"−"v" must be divisible by four. Specifically, assuming "u" ≥ "v", if (("u" xor "v") and 2) = 2, then gcd("u", "v") = gcd(("u"+"v")/4, "v"), and otherwise gcd("u", "v") = gcd(("u"−"v")/4, "v").]
# Repeat steps 3&ndash;4 until "u" = "v", or (one more step) until "u" = 0. In either case, the result is 2"k""v", where "k" is the number of common factors of 2 found in step 2.

Since this definition is tail-recursive, a loop can be used to replace the recursion.

The algorithm requires O((log2 "uv")2) worst-case time, or in other words time proportional to the square of the number of bits in "u" and "v" together. Although each step reduces at least one of the operands by at least a factor of 2, the subtract and shift operations do not take constant time for very large integers (although they're still quite fast in practice, requiring about one operation per word of the representation).

An extended version of binary GCD, analogous to the extended Euclidean algorithm, is given in "The Art of Computer Programming", [Knuth (1998), answer to exercise 39 of section 4.5.2, p. 646] along with pointers to other versions.

Implementation

Implementation in C

Following is an implementation of the algorithm in C, taking two non-negative arguments "u" and "v". It first removes all common factors of 2 using identity 2, then computes the GCD of the remaining numbers using identities 3 and 4, and combines these to form the final answer.

` unsigned int gcd(unsigned int u, unsigned int v) { int shift; /* GCD(0,x) := x */ if (u = 0 || v = 0) return u | v; /* Let shift := lg K, where K is the greatest power of 2 dividing both u and v. */ for (shift = 0; ((u | v) & 1) = 0; ++shift) { u >>= 1; v >>= 1; } while ((u & 1) = 0) u >>= 1; /* From here on, u is always odd. */ do { while ((v & 1) = 0) /* Loop X */ v >>= 1; /* Now u and v are both odd, so diff(u, v) is even. Let u = min(u, v), v = diff(u, v)/2. */ if (u < v) { v -= u; } else { unsigned int diff = u - v; u = v; v = diff; } v >>= 1; } while (v != 0); return u << shift; }`

Implementation in assembly

This version of binary GCD in ARM assembly (using GNU Assembler syntax) highlights the benefit of branch predication, showing that the advantage of binary GCD over the Euclidean algorithm lies in its optimizability for real-world machines. The loop to implement step 2 consists of three instructions, all predicated. Step 3 consists of two loops, each 2 instructions long (one of the instructions being predicated); however, after the first iteration r0 is kept odd and need not be tested, and only one of the loops is executed. (On cores that implement the `clz` instruction, steps 2 and 3 can be completed without looping.) Finally, step 4 takes four instructions of which 2 are predicated.

Since u and v are guaranteed even or odd at certain points, their least significant bits (LSBs) need not be stored in the registers but considered part of the program state. The evenness tests then become a side effect of the bit shifts, since the LSB can be placed in the carry flag. Thus the code works with u/2 and v/2, which are known at the start of each loop to be even or odd.

@ Arguments arrive in registers r0 and r1 gcd: subs r3, r0, r0 @ Power-of-two counter = 0, carry
remove_twos_loop: movnes r2, r2, lsr #1 @ Shift r2 right if >0, carry
check_two_r0: movs r0, r0, lsr #1 @ divide u by 2 by shifting r0 right bcc check_two_r0 @ repeat until u is odd (loop terminates) check_two_r1: @ Loop X: movs r1, r1, lsr #1 @ divide v by 2 by shifting r2 right bcc check_two_r1 @ repeat until v is odd (loop terminates) subs r1, r1, r0 @ v := v0 - u0 (even, possibly zero) rsbcc r1, r1, #0 @ v := |v0 - u0| (if v0 < u0 then v := 0 - v) subcc r0, r0, r1 @ u := min(u0, v0) (if v0 < u0, u := u - v) bne check_two_r1 @ u remains odd. if v > 0 then loop. @ This loop can be shown to terminate. @ if v = 0, the carry flag is set (from subs) adc r0, r0, r0 @ Restore u; u = 2(r0) + 1 finish: orr r0, r1, r0, lsl r3 @ multiply u by 2^r3 by shifting left bx lr @ return to caller with result in r0.

=Efficiency=
Brigitte Vallée proved that binary GCD can be about 60% more efficient (in terms of the number of bit operations) on average than the Euclidean algorithm. [http://users.info.unicaen.fr/~brigitte/Publications/icalp8-2000.ps] [http://web.comlab.ox.ac.uk/oucl/work/richard.brent/ftp/rpb183pr.ps.gz] [http://www.stepanovpapers.com/notes.pdf Notes on Programming] by Alexander Stepanov] . However, although this algorithm outperforms the traditional Euclidean algorithm, its asymptotic performance is the same, and it is considerably more complex thanks to the availability of division instruction in all modern microprocessors.

In addition, real computers, however, operate on more than one bit at a time, and even assembly language binary GCD implementations have to compete against carefully designed hardware circuits for integer division. Overall, Knuth (1998) reports a 15% gain over Euclidean GCD, and according to one comparison, the greatest gain was about 60%, while on some popular architectures even good implementations of binary GCD were marginally slower than the Euclidean algorithm. [http://boost-consulting.com/vault/index.php?filename=GCD.zip&directory=Math%20-%20Numerics& Faster implementations of binary GCD algorithm] ( [http://boost-consulting.com/vault/index.php?action=downloadfile&filename=GCD.zip&directory=Math%20-%20Numerics& download GCD.zip] )]

In general, with implementations of binary GCD similar to the above C code, the gain in speed over the Euclidean algorithm is always less in practice than in theory. The reason is that the code features a plethora of data-dependent branches. Most may be removed either using conditional instructions along the model of the ARM code, or by computing "min(a,b)" and "|a-b|" using mixtures of Boolean algebra and arithmetic.

The only one that these techniques do not remove is the loop condition marked "Loop X", which can be unrolled with the aid of a lookup table. With a 256-byte lookup table (8 bits), the implementation turned to be between 82.5% and 163% faster than the Euclidean algorithm, depending on CPU and compiler. Even with a small 16-byte lookup table (4 bits), the gains are in the range of 54% to 116%. The lookup-table approach finds its logical conclusion in the use of the special "CTZ" instruction, that count leading or trailing binary zeros in a number, allowing all trailing zero bits to be removed in a single step instead of one at a time. [ [http://wall.riscom.net/books/proc/ppc/cwg/examples.html#334387 PowerPC Compiler Writer's Guide] , section 5.10: Count Trailing Zeros. Gives an instruction sequence for counting the trailing zeros in a word, on a platform providing only the complementary "Count Leading Zeros" instruction.] Of course, this optimization is possible only on platforms where such instructions are available.

ee also

*Euclidean algorithm
*Extended Euclidean algorithm
*Least common multiple

Notes

References

*Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. "Introduction to Algorithms", Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Problem 31-1, pg.902.

* [http://www.nist.gov/dads/HTML/binaryGCD.html NIST Dictionary of Algorithms and Data Structures: binary GCD algorithm]
* [http://www.cut-the-knot.org/blue/binary.shtml Cut-the-Knot: Binary Euclid's Algorithm] at cut-the-knot
* [http://web.comlab.ox.ac.uk/oucl/work/richard.brent/pd/rpb037i.pdf "Analysis of the Binary Euclidean Algorithm"] (1976), a paper by Richard P. Brent, including a variant using left shifts
* [http://gcd.awardspace.com Online gcd calculator(4 methods)]

Wikimedia Foundation. 2010.

### Look at other dictionaries:

• GCD — may refer to:* Gongchandang, or the Communist Party of China * General content descriptor, a file format to describe content to wireless devices *In mathematics **Greatest common divisor **Binary GCD algorithm * Great circle distance mdash; in… …   Wikipedia

• Euclidean algorithm — In number theory, the Euclidean algorithm (also called Euclid s algorithm) is an algorithm to determine the greatest common divisor (GCD) of two elements of any Euclidean domain (for example, the integers). Its major significance is that it does… …   Wikipedia

• Multiplication algorithm — A multiplication algorithm is an algorithm (or method) to multiply two numbers. Depending on the size of the numbers, different algorithms are in use. Efficient multiplication algorithms have existed since the advent of the decimal system.… …   Wikipedia

• Williams' p + 1 algorithm — In computational number theory, Williams p + 1 algorithm is an integer factorization algorithm, one of the family of algebraic group factorisation algorithms. It was invented by Hugh C. Williams in 1982. It works well if the number N to be… …   Wikipedia

• Cipolla's algorithm — In computational number theory, Cipolla s algorithm is a technique for solving a congruence of the form x2 = n, where , so n is the square of x, and where p is an odd prime. Here denotes the finite field with p elements; . Th …   Wikipedia

• Cornacchia's algorithm — In computational number theory, Cornacchia s algorithm is an algorithm for solving the Diophantine equation x2 + dy2 = m, where and d and m are coprime. The algorithm was described in 1908 by Giuseppe Cornacchia[1]. Contents 1 The algorithm …   Wikipedia

• Extended Euclidean algorithm — The extended Euclidean algorithm is an extension to the Euclidean algorithm for finding the greatest common divisor (GCD) of integers a and b : it also finds the integers x and y in Bézout s identity: ax + by = gcd(a, b). ,(Typically either x or… …   Wikipedia

• Computational complexity of mathematical operations — The following tables list the running time of various algorithms for common mathematical operations. Here, complexity refers to the time complexity of performing computations on a multitape Turing machine.[1] See big O notation for an explanation …   Wikipedia

• Greatest common divisor — In mathematics, the greatest common divisor (gcd), also known as the greatest common factor (gcf), or highest common factor (hcf), of two or more non zero integers, is the largest positive integer that divides the numbers without a remainder. For …   Wikipedia

• List of terms relating to algorithms and data structures — The [http://www.nist.gov/dads/ NIST Dictionary of Algorithms and Data Structures] is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data… …   Wikipedia