Performance analysis

Performance analysis

In software engineering, performance analysis, more commonly today known as profiling, is the investigation of a program's behavior using information gathered as the program executes (i.e. it is a form of dynamic program analysis, as opposed to static code analysis). The usual goal of performance analysis is to determine which sections of a program to optimize - usually either to increase its speed or decrease its memory requirement (or sometimes both).

Use of profilers

A profiler is a performance analysis tool that measures the behavior of a program as it executes, particularly the frequency and duration of function calls. (An instruction set simulator which is also - by necessity - a profiler, can measure the totality of a programs behaviour from invokation to termination.) The output may be a stream of recorded events (a trace) or a statistical summary of the events observed (a profile) or an ongoing interaction with the hypervisor. Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, and performance counters. The usage of profilers is 'called out' in the performance engineering process.

As the summation in a profile often is done related to the source code positions where the events happen, the size of measurement data is linear to the code size of the program. In contrast, the size of a (full) trace is linear to the program's execution time, making it somewhat impractical. For sequential programs, a profile is usually enough, but performance problems in parallel programs (waiting for messages or synchronization issues) often depend on the time relationship of events, thus requiring the full trace to get an understanding of the problem.

:"Program analysis tools are extremely important for understanding program behavior. Computer architects need such tools to evaluate how well programs will perform on new architectures. Software writers need tools to analyze their programs and identify critical sections of code. Compiler writers often use such tools to find out how well their instruction scheduling or branch prediction algorithm is performing..." (ATOM, PLDI, '94)


Performance analysis tools existed on IBM/360 and IBM/370 platforms from the early 1970s, usually based on timer interrupts which recorded the Program status word (PSW) at set timer intervals to detect "hot spots" in executing code. This was an early example of sampling (see below). In early 1974, Instruction Set Simulators permitted full trace and other performance monitoring features.

Profiler-driven program analysis on Unix dates back to at least 1979, when Unix systems included a basic tool "prof" that listed each function and how much of program execution time it used. In 1982, gprof extended the concept to a complete call graph analysis ("Gprof: a Call Graph Execution Profiler" [] )

In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation published a paper describing ATOM [] . ATOM is a platform for converting a program into its own profiler. That is, at compile time, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique, modifying a program to analyze itself, is known as "instrumentation".

In 2004, both the Gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers of all time. []

Profiler Types based on Output

Flat profiler

Flat profilers compute the average call times, from the calls, and do not breakdown the call times based on the callee or the context.

Call-Graph profiler

Call Graph profilers show the call times, and frequencies of the functions,and also the call-chains involved based on the callee. However context isnot preserved.

Methods of data gathering

Event based profilers

The programming languages listed here have event-based profilers:
# .NET: Can attach a profiling agent as a COM server to the CLR. Like Java, the runtime then provides various callbacks into the agent, for trapping events like method JIT / enter / leave, object creation, etc. Particularly powerful in that the profiling agent can rewrite the target application's bytecode in arbitrary ways.
# Java: JVM-Tools Interface (formerly JVM Profiling Interface) JVM API provides hooks to profiler, for trapping events like calls, class-load, unload, thread enter leave.
# Python: Python profiling includes the profile module, hotshot (which is call-graph based), and using the 'sys.setprofile()' module to trap events like c_{call,return,exception}, python_{call,return,exception}.
# Ruby: Ruby also uses a similar interface like Python for profiling. Flat-profiler in profile.rb, module, and ruby-prof a C-extension are present.

tatistical profilers

Some profilers operate by sampling. A sampling profiler probes the target program's program counter at regular intervals using operating system interrupts. Sampling profiles are typically less accurate and specific, but allow the target program to run at near full speed.

Some profilers instrument the target program with additional instructions to collect the required information. Instrumenting the program can cause changes in the performance of the program, causing inaccurate results and heisenbugs. Instrumenting can potentially be very specific but slows down the target program as more specific information is collected.

The resulting data are not exact, but a statistical approximation. "The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods." []

Some of the most commonly used statistical profilers are GNU's gprof, Oprofile, AMD's CodeAnalyst and SGI's Pixie.


* Manual: Done by the programmer, e.g. by adding instructions to explicitly calculate runtimes.
* Compiler assisted: Example: "gcc -pg ..." for gprof, "quantify g++ ..." for Quantify
* Binary translation: The tool adds instrumentation to a compiled binary. Example: ATOM
* Runtime instrumentation: Directly before execution the code is instrumented. The program run is fully supervised and controlled by the tool. Examples: PIN, Valgrind
* Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions. Example: DynInst


* Hypervisor: Data are collected by running the (usually) unmodified program under a hypervisor. Example: SIMMON
* Simulator and Hypervisor: Data collected interactively and selectively by running under an Instruction Set Simulator. Examples: SIMON and OLIVER.

imple manual technique

A variation on the sampling technique, based on deep sampling, places less importance on accurate measurement, and more importance on achieving optimization. It requires no instrumentation and takes only a small number of random time samples (e.g. 20). This can be done by running the program under a debugger and interrupting it manually at unpredictable times using a Break key or Esc key. (These samples are taken during the time when the program is being subjectively slow. If needed, a temporary outer loop can be added, to make it run long enough to take manual samples.) At each sample, the entire call stack consisting of the program counter plus the stack of return addresses is recorded. Then the call stack samples are examined for addresses that appear on multiple samples. "Any address that is on the call stack X% of the time identifies an instruction which, if removed, will save X% of running time."

For example, if the program is spending X = 60% of it's time doing something wasteful, such as an unnecessary function call (whether distributed over one invocation or many), it will be seen doing that on 12 out of 20 samples, more or less. The more time is being wasted, the fewer samples are needed to see it. Since the exact instruction is indicated by the samples, there is no guesswork and no need to look elsewhere for the problem (no matter how large the software is). When it is repaired, a performance factor of 1/(1-X) is gained. In this case 1/(1-0.6) = 2.5x.

The entire process can be repeated multiple times, until no more wasteful activities are captured, often resulting in substantial cumulative speedups.

ee also

* Algorithmic efficiency
* Benchmark
* List of performance analysis tools
* Performance engineering
* Performance prediction
* PAPI is a portable interface (in the form of a library) to hardware performance counters on modern microprocessors.
* Software slug
* Worst case execution time
* Java performance


* Dunlavey, “Performance tuning with instruction-level cost derived from call-stack sampling”, ACM SIGPLAN Notices 42, 8 (August, 2007), pp. 4-8.
* Dunlavey, “Performance Tuning: Slugging It Out!”, Dr. Dobb's Journal, Vol 18, #12, November 1993, pp 18-26.

External links

* Sample Demo " [ Using VSTS Performance Tools to Speed Up Your App] " by Ian Huff, a developer at Microsoft who is demonstrating the profiler in Visual Studio Team System 2005
* Article " [ Need for speed -- Eliminating performance bottlenecks] " on doing execution time analysis of Java applications using IBM Rational Application Developer.

Wikimedia Foundation. 2010.

Игры ⚽ Поможем решить контрольную работу

Look at other dictionaries:

  • List of performance analysis tools — This is a list of performance analysis tools for use in software development.Multiple languagesThe following tools work for multiple languages or binaries. * Valgrind is a GPL d system for debugging and profiling x86 Linux programs. You can… …   Wikipedia

  • Seismic performance analysis — or, simply, seismic analysis is an intellectual tool of earthquake engineering which breaks the complex topic into smaller parts to gain a better understanding of seismic performance of building and non building structures. The technique as a… …   Wikipedia

  • Performance tuning — is the improvement of system performance. This is typically a computer application, but the same methods can be applied to economic markets, bureaucracies or other complex systems. The motivation for such activity is called a performance problem …   Wikipedia

  • BARRA's performance analysis — ( Perfan) A method developed by BARRA, a consulting firm in Berkeley, Calif. It is commonly used by institutional investors applying performance attribution analysis to evaluate their money managers performances. The New York Times Financial… …   Financial and business terms

  • Analysis of algorithms — To analyze an algorithm is to determine the amount of resources (such as time and storage) necessary to execute it. Most algorithms are designed to work with inputs of arbitrary length. Usually the efficiency or running time of an algorithm is… …   Wikipedia

  • Performance (Informatik) — Die Artikel Leistungsbewertung (Computer) und Leistung (Informatik) überschneiden sich thematisch. Hilf mit, die Artikel besser voneinander abzugrenzen oder zu vereinigen. Beteilige dich dazu an der Diskussion über diese Überschneidungen. Bitte… …   Deutsch Wikipedia

  • Performance (disambiguation) — A performance art is an event in which one group of people (the performer or performers) behave in a particular way for another group of people (the audience).Performance may also refer to: * a ritual in a religious or occult setting * the… …   Wikipedia

  • Performance Application Programming Interface — In computer science, Performance Application Programming Interface (PAPI) is a portable interface (in the form of a library) to hardware performance counters on modern microprocessors. It is being widely used to collect low level performance… …   Wikipedia

  • Performance measurement — with a process is the complement to process execution. Based on measured performance, the feedback control loop may be closed. The metrics to assess performance is set according to a determined econometric model. The expected best result is… …   Wikipedia

  • Performance attribution — or Investment Performance Attribution is a set of techniques that performance analysts use to explain why a portfolio s performance differed from the benchmark. This difference between the portfolio return and the benchmark return is known as the …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”