- Loop unwinding
Loop unwinding, also known as loop unrolling, is a
loop transformation technique that attempts optimize a program's execution speed at the expense of its size.The goal of loop unwinding is to increase the programs speed by reducing (or eliminating) the "end of loop" test on each iteration. Loops can be re-written as a sequence of independent statements which eliminates the loop controller overhead. Significant gains can be realized if the speed gained (by eliminating the "end of loop" test) offsets the performance reduction caused by the increased size of the program. Furthermore, if the statements in the loop are independent of each other, where statements that occur earlier in the loop do not effect statements that follow them, then the statements can be executed in parallel.
ide effects
The major side effects of loop unrolling are:
* a) the increased register usage in a single iteration to store temporary variables (though much will depend on possible optimizations), which may hurt performance; and
* b) the code size expansion after the unrolling, which is undesirable for embedded applications and for purposes of codereadability .A simple example
A procedure in a computer program is to delete 100 items from a collection. This is accomplished by means of a "for"-loop which calls the function "delete(item_number)":
If this part of the program is to be optimized, and the overhead of the loop requires significant resources compared to that for "delete(x)", loop unwinding can be used to speed it up. This will result in an optimized code fragment like:
As a result of this modification, the new program has to make only twenty loops, instead of a hundred. There are now a fifth as many jumps and conditional branches that need to be taken, which over many iterations would be a great improvement in the loop administration time.
On the other hand, the loop unrolling expands the source code size from three lines to seven lines that have to be produced checked and debugged, and the compiler has to allocate more registers to store variables in the expanded loop iteration. In addition, the loop control variables and number of operations inside the unrolled loop structure have to be chosen carefully so that the result is indeed the same as in the original code.
Consider the implications if the iteration count were not divisible by five. It also becomes somewhat more complicated if the test conditions are variables. See also
Duff's device .Early complexity
In the simple case the loop control is merely an administrative overhead, that arranges the productive statements. The loop itself contributes nothing to the results desired, merely saving the programmer the tedium of replicating the code a hundred times which could have been done by a pre-processor generating the replications, or a text editor. Similarly, if-statements and other flow control statements could be replaced by code replication, except that bloated messes soon result. Computer programs easily track the combinations, but humans find this repugnant and make mistakes. Consider for i:=1:8 do if i mod 2 = 0 then do_evenstuff(i) else do_oddstuff(i); next i;This could be fully unrolled, whereupon the if-test involves constants (unlike the previous example) and so the code becomes do_oddstuff(1); do_evenstuff(2); do_oddstuff(3); do_evenstuff(4); do_oddstuff(5); do_evenstuff(6); do_oddstuff(7); do_evenstuff(8);
But of course the stuff done need not be an invocation of a procedure, and this example involves the index variable in computation: x(1) = 1; For i:=2:9 do x(i):=x(i - 1)*i; "print" i,x(i); Next i;Which would unroll to x(1):=1; x(2):=x(1)*2; "print" 2,x(2); x(3):=x(2)*3; "print" 3,x(3); ...etc.which if compiled would produce a lot of code ("print" statements being notorious) but further optimisation is possible. This example make reference only to "x(i)" and "x(i - 1)" in the loop (the latter only to develop the new value "x(i)") so, given that there is no later reference to the array "x" developed here, its usages could be replaced by a simple variable. However it would be quite a surprise if the compiler were to recognise "x(n) = Factorial(n)".
In general, the content of a loop might be large, involving intricate array indexing: such cases should be left alone. Replicating innermost loops might raise many possible optimisations yet yield only a small gain.
Further reading
ee also
*
Loop splitting
*Loop fusion
*Duff's device
Wikimedia Foundation. 2010.