Profiling (computer programming)

Profiling (computer programming)

In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the usage of memory, the usage of particular instructions, or frequency and duration of function calls. The most common use of profiling information is to aid program optimization.

Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler).

The methodology of the profiler itself classify the profiler as event-based, as statistical, as instrumentation, or as simulation.

Contents

Gathering program events

Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, and performance counters. The usage of profilers is 'called out' in the performance engineering process.

Use of profilers

Program analysis tools are extremely important for understanding program behavior. Computer architects need such tools to evaluate how well programs will perform on new architectures. Software writers need tools to analyze their programs and identify critical sections of code. Compiler writers often use such tools to find out how well their instruction scheduling or branch prediction algorithm is performing... (ATOM, PLDI, '94)

The output of a profiler may be:-

  • A statistical summary of the events observed (a profile)
Summary profile information is often shown annotated against the source code statements where the events occur, so the size of measurement data is linear to the code size of the program.
/* ------------ source------------------------- count */             
0001             IF X = "A"                     0055
0002                THEN DO                       
0003                  ADD 1 to XCOUNT           0032
0004                ELSE
0005             IF X = "B"                     0055

  • A stream of recorded events (a trace)
For sequential programs, a summary profile is usually sufficient, but performance problems in parallel programs (waiting for messages or synchronization issues) often depend on the time relationship of events, thus requiring a full trace to get an understanding of what is happening.
The size of a (full) trace is linear to the program's instruction path length, making it somewhat impractical. A trace may therefore be initiated at one point in a program and terminated at another point to limit the output.
  • An ongoing interaction with the hypervisor (continuous or periodic monitoring via on-screen display for instance)
This provides the opportunity to switch a trace on or off at any desired point during execution in addition to viewing on-going metrics about the (still executing) program. It also provides the opportunity to suspend asynchronous processes at critical points to examine interactions with other parallel processes in more detail.

History

Performance analysis tools existed on IBM/360 and IBM/370 platforms from the early 1970s, usually based on timer interrupts which recorded the Program status word (PSW) at set timer intervals to detect "hot spots" in executing code. This was an early example of sampling (see below). In early 1974, Instruction Set Simulators permitted full trace and other performance monitoring features.

Profiler-driven program analysis on Unix dates back to at least 1979, when Unix systems included a basic tool "prof" that listed each function and how much of program execution time it used. In 1982, gprof extended the concept to a complete call graph analysis.[1]

In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation published a paper describing ATOM.[2] ATOM is a platform for converting a program into its own profiler. That is, at compile time, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique - modifying a program to analyze itself - is known as "instrumentation".

In 2004, both the gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers of all time.[3]

Profiler types based on output

Flat profiler

Flat profilers compute the average call times, from the calls, and do not break down the call times based on the callee or the context.

Call-graph profiler

Call graph profilers show the call times, and frequencies of the functions, and also the call-chains involved based on the callee. However context is not preserved.

Methods of data gathering

Event-based profilers

The programming languages listed here have event-based profilers:

  • Java: the JVMTI (JVM Tools Interface) API, formerly JVMPI (JVM Profiling Interface), provides hooks to profilers, for trapping events like calls, class-load, unload, thread enter leave.
  • .NET: Can attach a profiling agent as a COM server to the CLR using Profiling API. Like Java, the runtime then provides various callbacks into the agent, for trapping events like method JIT / enter / leave, object creation, etc. Particularly powerful in that the profiling agent can rewrite the target application's bytecode in arbitrary ways.
  • Python: Python profiling includes the profile module, hotshot (which is call-graph based), and using the 'sys.setprofile' function to trap events like c_{call,return,exception}, python_{call,return,exception}.
  • Ruby: Ruby also uses a similar interface like Python for profiling. Flat-profiler in profile.rb, module, and ruby-prof a C-extension are present.

Statistical profilers

Some profilers operate by sampling. A sampling profiler probes the target program's program counter at regular intervals using operating system interrupts. Sampling profiles are typically less numerically accurate and specific, but allow the target program to run at near full speed.

The resulting data are not exact, but a statistical approximation. The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods. [4]

In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program, and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode such as system call processing.

Still, kernel code to handle the interrupts entails a minor loss of CPU cycles, diverted cache usage, and is unable to distinguish the various tasks occurring in uninterruptible kernel code (microsecond-range activity).

Dedicated hardware can go beyond this: some recent MIPS processors JTAG interface have a PCSAMPLE register, which samples the program counter in a truly undetectable manner.

Some of the most commonly used statistical profilers are AMD CodeAnalyst, Apple Inc. Shark, gprof, Intel VTune and Parallel Amplifier (part of Intel Parallel Studio).

Instrumenting profilers

Some profilers instrument the target program with additional instructions to collect the required information.

Instrumenting the program can cause changes in the performance of the program, potentially causing inaccurate results and heisenbugs. Instrumenting will always have some impact on the program execution, typically always slowing it. However, instrumentation can be very specific and be carefully controlled to have a minimal impact. The impact on a particular program depends on the placement of instrumentation points and the mechanism used to capture the trace. Hardware support for trace capture means that on some targets, instrumentation can be on just one machine instruction. The impact of instrumentation can often be deducted (i.e. eliminated by subtraction) from the results.

gprof is an example of a profiler that uses both instrumentation and sampling. Instrumentation is used to gather caller information and the actual timing values are obtained by statistical sampling.

Instrumentation

  • Manual: Performed by the programmer, e.g. by adding instructions to explicitly calculate runtimes, simply count events or calls to measurement APIs such as the Application Response Measurement standard.
  • Automatic source level: instrumentation added to the source code by an automatic tool according to an instrumentation policy.
  • Compiler assisted: Example: "gcc -pg ..." for gprof, "quantify g++ ..." for Quantify
  • Binary translation: The tool adds instrumentation to a compiled binary. Example: ATOM
  • Runtime instrumentation: Directly before execution the code is instrumented. The program run is fully supervised and controlled by the tool. Examples: Pin, Valgrind
  • Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions. Example: DynInst

Interpreter instrumentation

  • Interpreter debug options can enable the collection of performance metrics as the interpreter encounters each target statement. A bytecode, control table or JIT interpreters are three examples that usually have complete control over execution of the target code, thus enabling extremely comprehensive data collection opportunities.

Hypervisor/Simulator

  • Hypervisor: Data are collected by running the (usually) unmodified program under a hypervisor. Example: SIMMON
  • Simulator and Hypervisor: Data collected interactively and selectively by running the unmodified program under an Instruction Set Simulator. Examples: SIMON and OLIVER.

See also

References

  • Dunlavey, “Performance tuning with instruction-level cost derived from call-stack sampling”, ACM SIGPLAN Notices 42, 8 (August, 2007), pp. 4–8.
  • Dunlavey, “Performance Tuning: Slugging It Out!”, Dr. Dobb's Journal, Vol 18, #12, November 1993, pp 18–26.

External links


Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

  • Programming tool — A programming tool or software development tool is a program or application that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs that… …   Wikipedia

  • Comparison of programming paradigms — Programming paradigms Agent oriented Automata based Component based Flow based Pipelined Concatenative Concurrent computin …   Wikipedia

  • Optimization (computer science) — In computing, optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources. For instance, a computer program may be optimized so that it executes more rapidly, or is capable of operating …   Wikipedia

  • Pascal (programming language) — Pascal Paradigm(s) imperative, structured Appeared in 1970 Designed by Niklaus Wirth Typing discipline static, strong, safe …   Wikipedia

  • Service Oriented Programming — (SOP) is a programming paradigm that uses services as the unit of computer work, to design and implement integrated business applications and mission critical software programs. Services can represent steps of business processes and thus one of… …   Wikipedia

  • Monad (functional programming) — In functional programming, a monad is a programming structure that represents computations. Monads are a kind of abstract data type constructor that encapsulate program logic instead of data in the domain model. A defined monad allows the… …   Wikipedia

  • Channel 4 programming — Channel 4, in common with the other main British stations broadcast on analogue, airs a highly comprehensive range of programming. It was established in 1982 with a specific intention of providing programming to groups of minority interests, not… …   Wikipedia

  • Algorithm — Flow chart of an algorithm (Euclid s algorithm) for calculating the greatest common divisor (g.c.d.) of two numbers a and b in locations named A and B. The algorithm proceeds by successive subtractions in two loops: IF the test B ≤ A yields yes… …   Wikipedia

  • Analysis of algorithms — To analyze an algorithm is to determine the amount of resources (such as time and storage) necessary to execute it. Most algorithms are designed to work with inputs of arbitrary length. Usually the efficiency or running time of an algorithm is… …   Wikipedia

  • Multicore Association — The Multicore Association, founded in 2005, is a member funded, non profit, industry consortium focused on the creation of open standard APIs, specifications, and guidelines that will allow system developers and programmers to adopt multicore… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”