Loop optimization

Loop optimization

In compiler theory, loop optimization plays an important role in improving cache performance, making effective use of parallel processing capabilities, and reducing overheads associated with executing loops. Most execution time of a scientific program is spent on loops. Thus a lot of compiler analysis and optimization techniques have been developed to make the execution of loops faster.

Representation of computation and transformations

Since instructions inside loops can be executed repeatedly, it is frequently not possible to give a bound on the number of instruction executions that will be impacted by a loop optimization. This presents challenges when reasoning about the correctness and benefits of a loop optimization, specifically the representations of the computation being optimized and the optimization(s) being performedJean-Francois Collard, "Reasoning About Program Transformations,", 2003 Springer-Verlag. Discusses in depth the general question of representing executions of programs rather than program text in the context of static optimization.] .

Optimization via a Sequence of Loop Transformations

Loop optimization can be viewed as the application of a sequence of specific loop transformations (listed below or in David F. Bacon, Susan L. Graham, and Oliver J. Sharp. "Compiler transformations for high-performance computing." Report No. UCB/CSD 93/781, Computer Science Division-EECS, University of California, Berkeley, Berkeley, California 94720, November 1993 (available at CiteSeer [http://citeseer.ist.psu.edu/bacon93compiler.html] ). Introduces compiler analysis such as data dependence analysis and interprocedural analysis, as well as a very complete list of loop transformations] ) to the source code or Intermediate representation, with each transformation having an associated test for legality. A transformation (or sequence of transformations) generally must preserve the temporal sequence of all dependencies if it is to preserve the result of the program (i.e., be a legal transformation). Evaluating the benefit of a transformation or sequence of transformations can be quite difficult within this approach, as the application of one beneficial transformation may require the prior use of one or more other transformations that, by themselves, would result in reduced performance.

Common loop transformations include:

* loop interchange : These optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout. This is also known as loop permutation.
* loop splitting/loop peeling : Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. A useful special case is "loop peeling", which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.
* loop fusion or "loop combining" : Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times (whether or not that number is known at compile time), their bodies can be combined as long as they make no reference to each other's data.
* loop fission or "loop distribution" : Loop fission attempts to break a loop into multiple loops over the same index range but each taking only a part of the loop's body. This can improve locality of reference, both of the data being accessed in the loop and the code in the loop's body.
* loop unrolling: Duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which hurt performance by impairing the instruction pipeline. Completely unrolling a loop eliminates all overhead, but requires that the number of iterations be known at compile time.
* loop unswitching : Unswitching moves a conditional inside a loop outside of it by duplicating the loop's body, and placing a version of it inside each of the if and else clauses of the conditional.
* loop inversion : This technique changes a standard "while" loop into a "do/while" (a.k.a. "repeat/until") loop wrapped in an "if" conditional, reducing the number of jumps by two, for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile-time and is known to be side-effect-free, the "if" guard can be skipped.
* loop-invariant code motion : If a quantity is computed inside a loop during every iteration, and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins. This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with loop inversion, because not all code is safe to be hoisted outside the loop.
* loop reversal : Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization which can help eliminate dependencies and thus enable other optimizations.
* loop tiling/loop blocking
* loop skewing : Loop skewing takes a nested loop iterating over a multidimensional array, where each iteration of the inner loop depends on previous iterations, and rearranges its array accesses so that the only dependencies are between iterations of the outer loop.
* vectorization
* loop parallelization : a special case for parallelization focusing on loops, restructuring them to run efficiently on multiprocessor systems. It can be done automatically by compilers (named automatic parallelization) or manually (inserting parallel directives like OpenMP).
* Software pipelining : a type of out-of-order execution of loop iterations to hide the latencies of processor function units.
* loop scheduling

The Unimodular Transformation Framework

The unimodular transformation approach Steven S. Muchnick, "Advanced Compiler Design and Implementation," 1997 Morgan-Kauffman. Section 20.4.2 discusses loop optimization.] uses a single unimodular matrix to describe the combined result of a sequence of many of the above transformations. Central to this approach is the view of the set of all executions of a statement within "n" loops as a set of integer points in an "n"-dimensional space, with the points being executed in lexicographical order. For example, the executions of a statement nested inside an outer loop with index "i" and an inner loop with index "j" can be associated with the pairs of integers "(i, j)". The application of a unimodular transformation corresponds to the multiplication of the points within this space by the matrix. For example, the interchange of two loops corresponds to the matrix ( (0, 1) (1, 0) ).

A unimodular transformation is legal if it preserves the temporal sequence of all dependencies; measuring the performance impact of a unimodular transformation is more difficult. Imperfectly nested loops and some transformations (such as tiling) do not fit easily into this framework.

The Polyhedral or Constraint-Based Framework

The polyhedral model handles a wider class of programs and transformations than the unimodular framework. The set of executions of a set of statements within a possibly-imperfectly nested set of loops is seen as the union of a set of polytopes representing the executions of the statements. Affine transformations are applied to these polytopes, producing a description of a new execution order. The boundaries of the polytopes, the data dependencies, and the transformations are often described using systems of constraints, and this approach is often referred to as a constraint-based approach to loop optimization. For example, a single statement within an outer loop 'for i := 0 to n' and an inner loop 'for j := 0 to i+2' is executed once for each "(i, j)" pair such that "0 <= i <= n and 0 <= j <= i+2".

Once again, a transformation is legal if it preserves the temporal sequence of all dependencies. Estimating the benefits of a transformation, or finding the best transformation for a given code on a given computer, remain the subject of ongoing research as of the time of this writing

See also Loop nest optimization, Polytope model.

References


Wikimedia Foundation. 2010.

Игры ⚽ Поможем решить контрольную работу

Look at other dictionaries:

  • Loop tiling — Loop tiling, also known as loop blocking, strip mine and interchange, unroll and jam, or supernode partitioning, is a loop optimization used by compilers to make the execution of certain types of loops more efficient.Loop tiling partitions a loop …   Wikipedia

  • Optimization (computer science) — In computing, optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources. For instance, a computer program may be optimized so that it executes more rapidly, or is capable of operating …   Wikipedia

  • Loop nest optimization — (LNO) is a special case of loop transformation, dealing with nested loops, that allows large reductions in the cache bandwidth necessary for some common algorithms.Example: Matrix multiplyMany large mathematical operations on computers end up… …   Wikipedia

  • Loop fission — (or loop distribution) is a compiler optimization technique attempting to break a loop into multiple loops over the same index range but each taking only a part of the loop s body. The goal is to break down large loop body into smaller ones to… …   Wikipedia

  • Loop-invariant code motion — Loop invariant code in an imperative programming language consists of statements which could be moved to before the loop (if the loop always terminates), or after the loop, without affecting the semantics of the program. As a result it is… …   Wikipedia

  • Loop splitting — (or loop peeling) is a compiler optimization technique. It attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. A… …   Wikipedia

  • Loop unswitching — is a compiler optimization. It moves a conditional inside a loop outside of it by duplicating the loop s body, and placing a version of it inside each of the if and else clauses of the conditional. This can improve the parallelization of the loop …   Wikipedia

  • Loop fusion — Loop fusion, also called loop jamming, is a compiler optimization, a loop transformation, which replaces multiple loops with a single one. Example in C int i, a [100] , b [100] ; for (i = 0; i < 100; i++) { a [i] = 1; } for (i = 0; i < 100; i++)… …   Wikipedia

  • Loop inversion — is a compiler optimization, a loop transformation, which replaces a while loop by an if block containing a do..while loop. Example in C int i, a [100] ; i = 0; while (i < 100) { a [i] = 0; i++; }is equivalent to: int i, a [100] ; i = 0; if (i …   Wikipedia

  • Loop interchange — In compiler theory, loop interchange is the process of exchanging the order of two iteration variables. For example, in the code fragment: for i from 0 to 10 for j from 0 to 20 a [i,j] = i + jloop interchange would result in: for j from 0 to 20… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”