- Java performance
Programs written in Java have had a reputation for being slower and requiring more memory than those written in natively compiled languages such as C or
C++ (see e.g. [cite web
url=http://www.jelovic.com/articles/why_java_is_slow.htm
title=Why Java Will Always Be Slower than C++
last=Jelovic|first=Dejan
accessdate=2008-02-15]). However, the execution speed of Java programs has improved significantly due to the introduction of just-in-time (JIT) compilation (in 1997/1998 for Java 1.1) [cite web
url=http://www.symantec.com/about/news/release/article.jsp?prid=19970407_03
title=Symantec's Just-In-Time Java Compiler To Be Integrated Into Sun JDK 1.1] [cite web
url=http://findarticles.com/p/articles/mi_hb6676/is_/ai_n26150624
title=Apple Licenses Symantec's Just In Time (JIT) Compiler To Accelerate Mac OS Runtime For Java] [cite web
url=http://www.infoworld.com/cgi-bin/displayStory.pl?980416.ehjdk.htm
title=Java gets four times faster with new Symantec just-in-time compiler], the addition of language features supporting better code analysis, and optimizations in the Java Virtual Machine itself (such as HotSpot becoming the default for the Sun JVM in 2000).
Virtual machine optimization techniques
Many optimizations have improved the performance of the Java Virtual Machine over time. Although the Java Virtual Machine was often the first virtual machine to implement them successfully, they have often been used in other similar platforms as well.
Just-in-time compilation
Early Java Virtual Machines always interpreted bytecode. This carried a large performance penalty (a factor of between 10 and 20 for Java versus C in average applications). [http://www.shudo.net/jit/perf/]
Java 1.1 saw the introduction of a JIT compiler.
Java 1.2 saw the introduction of an optional system called HotSpot: the virtual machine continually analyzes the program's performance for "hot spots", which are frequently or repeatedly executed. These are then targeted for optimization, leading to high-performance execution with a minimum of overhead for less performance-critical code.
With the introduction of Java 1.3, HotSpot became the default system.
With the HotSpot technique, code is first interpreted, then "hot spots" are compiled on the fly. This is why benchmarks need to execute a program a few times before measuring its performance.
The HotSpot compiler uses many optimization techniques, such as inline expansion, loop unwinding, bounds-checking elimination, and architecture-dependent register allocation. [cite web
url=http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html
title=Deep dive into assembly code from Java
last=Kawaguchi|first=Kohsuke
date=2008-03-30
accessdate=2008-04-02] [cite web
url=http://ei.cs.vt.edu/~cs5314/presentations/Group2PLDI.pdf
title=Fast, Effective Code Generation in a Just-In-Time Java Compiler
publisher=Intel Corporation
accessdate=2007-06-22]
Some benchmarks show a 10-fold speed gain from this technique. [This [http://www.shudo.net/jit/perf/ article] shows that the performance gain between interpreted mode and HotSpot is more than a factor of 10.]
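The need for warm-up runs can be illustrated with a minimal sketch. The class name, workload, and iteration counts below are illustrative assumptions, not taken from any cited benchmark, and actual timings depend on the JVM and hardware; dedicated benchmarking harnesses handle warm-up more rigorously.

```java
// Hypothetical micro-benchmark sketch; all names here are illustrative.
final class WarmupBenchmark {

    // Workload to be measured: sum of the squares of 0..n-1.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Runs the workload once and returns the elapsed time in nanoseconds.
    static long timeOnce(int n) {
        long start = System.nanoTime();
        long result = sumOfSquares(n);
        long elapsed = System.nanoTime() - start;
        // Use the result so the computation cannot be discarded as dead code.
        if (result < 0) {
            throw new IllegalStateException("unexpected overflow");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1000000;
        long firstRun = timeOnce(n);      // probably interpreted, at least in part
        for (int i = 0; i < 200; i++) {   // warm-up: give the JIT time to compile the hot spot
            timeOnce(n);
        }
        long warmRun = timeOnce(n);       // measured after warm-up
        System.out.println("first run: " + firstRun + " ns, warm run: " + warmRun + " ns");
    }
}
```

On a JVM with a JIT compiler the warm run is usually the faster of the two, which is why a single cold measurement tends to understate steady-state performance.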
Adaptive optimization
Adaptive optimization is a technique in computer science that performs dynamic recompilation of portions of a program based on the current execution profile. In a simple implementation, an adaptive optimizer may merely make a trade-off between just-in-time compilation and interpreting instructions. At another level, adaptive optimization may take advantage of local data conditions to optimize away branches and use inline expansion to reduce the overhead of method calls.
A virtual machine like HotSpot is also able to deoptimize previously JIT-compiled code. This allows it to perform aggressive (and potentially unsafe) optimizations, while still being able to deoptimize the code and fall back on a safe path later on. [cite web
url=http://java.sun.com/products/hotspot/docs/whitepaper/Java_Hotspot_v1.4.1/Java_HSpot_WP_v1.4.1_1002_4.html#hotspot
title=The Java HotSpot Virtual Machine, v1.4.1
publisher=Sun Microsystems
accessdate=2008-04-20] [cite web
url=http://headius.blogspot.com/2008/01/langnet-2008-day-1-thoughts.html
title=Lang.NET 2008: Day 1 Thoughts
quote="Deoptimization is very exciting when dealing with performance concerns, since it means you can make much more aggressive optimizations...knowing you'll be able to fall back on a tried and true safe path later on"
last=Nutter|first=Charles
date=2008-01-28
accessdate=2008-04-20]
Garbage collection
The 1.0 and 1.1 virtual machines used a mark-sweep collector, which could fragment the heap after a garbage collection.
Starting with Java 1.2, the virtual machines switched to a generational collector, which has much better defragmentation behaviour. [http://www-128.ibm.com/developerworks/library/j-jtp01274.html] Modern virtual machines use a variety of techniques that have further improved garbage collection performance. [For example, the duration of pauses is less noticeable now. See for example this clone of Quake 2 written in Java: [http://bytonic.de/html/jake2.html Jake2].]
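As a rough illustration, the collectors that a running JVM uses can be listed through the standard java.lang.management API. The class name below is hypothetical, and the collector names reported vary by JVM vendor, version, and command-line settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports the garbage collectors of the current JVM.
final class CollectorReport {

    // Returns one line per collector: name, number of collections, accumulated time.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

A generational configuration typically reports separate collectors for the young and old generations.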
Other optimization techniques
Split bytecode verification
Prior to executing a class, the Sun JVM verifies its bytecode (see Bytecode verifier). This verification is performed lazily: a class's bytecode is loaded and verified only when that class is loaded and prepared for use, not at the beginning of the program. (Note that other verifiers, such as the Java/400 verifier for IBM System i, can perform most verification in advance and cache verification information from one use of a class to the next.) However, because the Java class libraries are also regular Java classes, they too must be loaded when they are used, which means that the start-up time of a Java program is often longer than that of a C++ program, for example.
A technique named split-time verification, first introduced in the J2ME edition of the Java platform, has been used in the Java Virtual Machine since Java version 6. It splits the verification of bytecode into two phases: [https://jdk.dev.java.net/verifier.html]
* Design-time - during the compilation of the class from source to bytecode
* Runtime - when loading the class.
In practice this technique works by capturing knowledge that the Java compiler has of class flow and annotating the compiled method bytecodes with a synopsis of the class flow information. This does not make runtime verification appreciably less complex, but does allow some shortcuts.
Escape analysis and lock coarsening
Java is able to manage multithreading at the language level. Multithreading is a technique that allows one to
* improve a user's perception of program speed, by allowing user actions while the program performs tasks, and
* take advantage of multi-core architectures, enabling two unrelated tasks to be performed at the same time by two different cores.
However, programs that use multithreading need to take extra care with objects shared between threads, locking access to shared methods or blocks of code when they are used by one of the threads. Locking a block or an object is a time-consuming operation due to the nature of the underlying operating-system-level operation involved (see concurrency control and lock granularity).
As the Java library does not know which methods will be used by more than one thread, the standard library always locks blocks of code when necessary in a multithreaded environment.
Prior to Java 6, the virtual machine always locked objects and blocks when asked to by the program (see Lock Implementation), even if there was no risk of an object being modified by two different threads at the same time. For example, a java.util.Vector that is local to a method was locked before each of its "add" operations to ensure that it would not be modified by other threads (Vector is synchronized), although this is unnecessary because the object is strictly local to the method.
Starting with Java 6, code blocks and objects are locked only when necessary [http://www-128.ibm.com/developerworks/java/library/j-jtp10185/] [http://blogs.sun.com/dagastine/entry/java_synchronization_optimizations_in_mustang], so in such a case the virtual machine would not lock the Vector object at all.
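A minimal sketch of the kind of code concerned is shown below; the class and method names are illustrative assumptions. The Vector is created, filled, and discarded inside a single method, so no other thread can ever observe it, and a virtual machine that performs escape analysis may omit the locks taken by the synchronized Vector methods. Whether it actually does so depends on the JVM version and settings.

```java
import java.util.Vector;

// Hypothetical example of a synchronized collection that never escapes its method.
final class LocalVectorExample {

    static String getNames() {
        // The Vector is reachable only through this local variable.
        Vector<String> names = new Vector<String>();
        names.add("Me");    // Vector.add is synchronized, so each call takes a lock
        names.add("You");
        names.add("Her");
        // Only the resulting String leaves the method, not the Vector itself.
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(getNames()); // prints "[Me, You, Her]"
    }
}
```

The result is the same with or without lock elision; the optimization only removes synchronization work that no other thread could ever have contended for.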
Register allocation improvements
Prior to Java 6, allocation of registers was very primitive in the "client" virtual machine (register assignments did not live across blocks), which was a problem on architectures with few registers available, such as x86. If no more registers are available for an operation, the compiler must copy from register to memory (or memory to register), which takes time (registers are typically much faster to access). The "server" virtual machine, however, used a graph-coloring allocator and did not suffer from this problem.
An optimization of register allocation was [http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6320351 introduced in Java 6]; it then became possible to use the same registers across blocks (when applicable), reducing memory accesses. This led to a reported performance gain of approximately 60% in some benchmarks. [http://weblogs.java.net/blog/opinali/archive/2005/11/mustangs_hotspo_1.html]
For example, when a local variable "result" is assigned the return value of a "doSomethingElse" method in one block and used again in a later block, the same register could be used for both.
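A hypothetical sketch of code with this shape follows; the method bodies and the class name are assumptions made for illustration. The local variable "result" receives the return value of "doSomethingElse" in one block and is read again in a later block, so an allocator that keeps values in registers across blocks can avoid storing it to memory in between. The actual register assignment is decided by the JIT compiler and cannot be observed from Java source.

```java
// Hypothetical code shape in which a value lives across basic blocks.
final class RegisterReuseExample {

    // Stand-in for the "doSomethingElse" method; its body is an assumption.
    static int doSomethingElse(int x) {
        return x * 2;
    }

    static int doSomething(boolean flag, int x) {
        int result = 0;                  // candidate for a register
        if (flag) {                      // first block: result receives the return value
            result = doSomethingElse(x);
        }
        if (result > 10) {               // later block: the same value is read again
            result -= 10;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(doSomething(true, 8));  // prints 6
        System.out.println(doSomething(false, 8)); // prints 0
    }
}
```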
Class data sharing
Class data sharing (called CDS by Sun) is a mechanism which reduces the startup time for Java applications, and also reduces memory footprint. When the JRE is installed, the installer loads a set of classes from the system jar file (the jar file containing the entire Java class library, called rt.jar) into a private internal representation, and dumps that representation to a file, called a "shared archive". During subsequent JVM invocations, this shared archive is memory-mapped in, saving the cost of loading those classes and allowing much of the JVM's metadata for these classes to be shared among multiple JVM processes. [http://java.sun.com/j2se/1.5.0/docs/guide/vm/class-data-sharing.html]
The corresponding improvement for start-up time is more noticeable for small programs. [http://www.artima.com/forums/flat.jsp?forum=121&thread=56613]
Sun Java versions performance improvements
Apart from the improvements listed here, each of Sun's Java versions introduced many performance improvements in the Java API.
JDK 1.1.6
Introduced at the virtual machine level:
* First just-in-time compilation (by Symantec's JIT compiler) [cite web
url=http://www.symantec.com/about/news/release/article.jsp?prid=19970407_03
title=Symantec's Just-In-Time Java Compiler To Be Integrated Into Sun JDK 1.1] [cite web
url=http://www.infoworld.com/cgi-bin/displayStory.pl?980416.ehjdk.htm
title=Java gets four times faster with new Symantec just-in-time compiler]
J2SE 1.2
Introduced at the virtual machine level:
* Use of a generational collector.
J2SE 1.3
Introduced at the virtual machine level:
* Just-in-time compilation by HotSpot.
J2SE 1.4
See [http://java.sun.com/j2se/1.4.2/performance.guide.html here] for a Sun overview of performance improvements between versions 1.3 and 1.4.
Java SE 5.0
Introduced at the virtual machine level:
* Class data sharing
See [http://java.sun.com/performance/reference/whitepapers/5.0_performance.html here] for a Sun overview of performance improvements between versions 1.4 and 5.0.
Java SE 6
Introduced at the virtual machine level:
* Split bytecode verification
* Escape analysis and lock coarsening
* Register allocation improvements
Other improvements:
* Java OpenGL Java 2D pipeline speed improvements [http://weblogs.java.net/blog/campbell/archive/2005/07/strcrazier_perf.html]
* Java 2D performance has also improved significantly in Java 6 [See [http://jroller.com/page/dgilbert?entry=is_java_se_1_6 here] for a benchmark showing an approximately 60% performance boost from Java 5.0 to 6 for the application [http://www.jfree.org JFreeChart]]
See [http://java.sun.com/performance/reference/whitepapers/6_performance.html here] for a Sun overview of performance improvements between Java 5 and Java 6.
Future improvements
Future performance improvements are planned for an update of Java 6 or Java 7: [cite web
url=http://java.sun.com/developer/technicalArticles/javase/consumerjre
title=Consumer JRE: Leaner, Meaner Java Technology
publisher=Sun Microsystems
last=Haase|first=Chet
date=May 2007
accessdate=2007-07-27]
* Reduce start-up time by preloading part of the JRE data into the disk cache at OS startup. [cite web
url=http://java.sun.com/developer/technicalArticles/javase/consumerjre#Quickstarter
title=Consumer JRE: Leaner, Meaner Java Technology
publisher=Sun Microsystems
last=Haase|first=Chet
quote="At the OS level, all of these megabytes have to be read from disk, which is a very slow operation. Actually, it's the seek time of the disk that's the killer; reading large files sequentially is relatively fast, but seeking the bits that we actually need is not. So even though we only need a small fraction of the data in these large files for any particular application, the fact that we're seeking all over within the files means that there is plenty of disk activity. "
date=May 2007
accessdate=2007-07-27]
* Subset the platform to allow users to download only the parts that are necessary to execute their application when accessing it from the web when the JRE is not installed. The entire JRE is 12 MB; a typical Swing application would only need to download 4 MB. [cite web
url=http://java.sun.com/developer/technicalArticles/javase/consumerjre#JavaKernel
title=Consumer JRE: Leaner, Meaner Java Technology
publisher=Sun Microsystems
last=Haase|first=Chet
date=May 2007
accessdate=2007-07-27]
* Provide JVM support for dynamic languages, following the prototyping work currently done on the Multi Language Virtual Machine. [cite web
url=http://www.jcp.org/en/jsr/detail?id=292
title=JSR 292: Supporting Dynamically Typed Languages on the Java Platform
publisher=jcp.org
accessdate=2008-05-28]
* Enhance the existing concurrency library by managing parallel computing on multi-core processors. [cite web
url=http://www.ibm.com/developerworks/java/library/j-jtp03048.html?ca
title=Java theory and practice: Stick a fork in it, Part 2
last=Goetz|first=Brian
date=2008-03-04
accessdate=2008-03-09] [cite web
url=http://www.infoq.com/news/2008/03/fork_join
title=Parallelism with Fork/Join in Java 7
last=Lorimer|first=R.J.
publisher=infoq.com
date=2008-03-21
accessdate=2008-05-28]
* Allow the virtual machine to use both the "Client" and "Server" compilers in the same session with a technique called "tiered compilation": [cite web
url=http://developers.sun.com/learning/javaoneonline/2006/coreplatform/TS-3412.pdf
title=New Compiler Optimizations in the Java HotSpot Virtual Machine
publisher=Sun Microsystems
date=May 2006
accessdate=2008-05-30]
** The "Client" compiler would be used at startup (because it is good at startup and for small applications),
** The "Server" compiler would be used for long-term running of the application (because it outperforms the "Client" compiler for this).
* Replace the existing concurrent low-pause garbage collector (also called CMS or Concurrent Mark-Sweep collector) with a new collector called G1 (or Garbage First) to ensure consistent pauses over time [cite web
url=http://www.infoq.com/news/2008/05/g1
title=JavaOne: Garbage First
publisher=infoq.com
last=Humble|first=Charles
date=2008-05-13
accessdate=2008-09-07].
* Improve graphics performance on Windows by extensively using Direct3D by default, [cite web
url=http://java.sun.com/developer/technicalArticles/javase/consumerjre#Performance
title=Consumer JRE: Leaner, Meaner Java Technology
publisher=Sun Microsystems
last=Haase|first=Chet
date=May 2007
accessdate=2007-07-27] and use shaders on the GPU to accelerate complex Java 2D operations. [cite web
url=http://weblogs.java.net/blog/campbell/archive/2007/04/faster_java_2d.html
title=Faster Java 2D Via Shaders
last=Campbell|first=Chris
date=2007-04-07
accessdate=2008-04-26]
Comparison to other languages
Java is often just-in-time compiled at runtime by the Java Virtual Machine, but may also be compiled ahead-of-time, just like C or C++. When just-in-time compiled, its performance is generally: [http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=all]
* lower than the performance of compiled languages such as C or C++, but not significantly for most tasks,
* close to other just-in-time compiled languages such as C#,
* much better than languages without an effective native-code compiler (JIT or AOT), such as Perl, Ruby, PHP and Python. [Python has Psyco, but the code it can handle is limited, and even with Psyco, its performance is much lower than Java (see [http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=all&lang=all the Shootout here])]
Program speed
The average performance of Java programs has increased a lot over time, and Java's speed is now comparable with that of C or C++. In some cases Java is significantly slower, in others significantly faster. [cite web|url=http://scribblethink.org/Computer/javaCbenchmark.html |title=Performance of Java versus C++ |publisher=Computer Graphics and Immersive Technology Lab, University of Southern California| author=Lewis, J.P. |coauthors=Neumann, Ulrich]
Benchmarks often measure performance for small, numerically intensive programs, which arguably favours C. In some real-life programs Java outperforms C, and often there is no performance difference at all. One example is the benchmark of Jake2 (a clone of Quake 2 written in Java by translating the original GPL C code). The Java 5.0 version performs better in some hardware configurations than its C counterpart: 260/250 fps versus 245 fps. [http://www.bytonic.de/html/benchmarks.html] While it is not specified how the data was measured (for example, whether the original Quake 2 executable compiled in 1997 was used, which may be considered unfair because current C compilers could achieve better optimizations), the benchmark shows how the same Java source code can gain a large speed boost just by updating the VM, something impossible to achieve with a 100% static approach.
Also, some optimizations that are possible in Java and similar languages are not possible in C or C++:
* C-style pointers make optimization hard in languages that support them.
* Adaptive optimization is impossible in fully compiled code, as the code is compiled once before any program execution, and thus cannot take advantage of the architecture and the code path. Some benchmarks show that the performance of compiled C or C++ programs depends heavily on how well the compilation options match the processor architecture (such as SSE2, for example), whereas Java programs are JIT-compiled and adapt on the fly to any given architecture. [cite web
url=http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all#about
title=mandelbrot benchmark
publisher=Computer Language Benchmarks Game
accessdate=2008-02-16]
* Escape analysis techniques cannot be used in C++, for example, because the compiler cannot know where an object will be used (also because of pointers).
However, results for microbenchmarks between Java and C or C++ depend heavily on which operations are compared. For example, when comparing with Java 5.0:
* 32-bit and 64-bit arithmetic operations, [cite web
url=http://www.ddj.com/java/184401976?pgno=2
title=Microbenchmarking C++, C#, and Java: 32-bit integer arithmetic
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17] [cite web
url=http://www.ddj.com/java/184401976?pgno=12
title=Microbenchmarking C++, C#, and Java: 64-bit double arithmetic
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17] File I/O [cite web
url=http://www.ddj.com/java/184401976?pgno=15
title=Microbenchmarking C++, C#, and Java: File I/O
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17] and exception handling [cite web
url=http://www.ddj.com/java/184401976?pgno=17
title=Microbenchmarking C++, C#, and Java: Exception
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17] have performance similar to that of comparable C programs
* Collections, [cite web
url=http://www.ddj.com/java/184401976?pgno=18
title=Microbenchmarking C++, C#, and Java: Single Hash Map
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17] [cite web
url=http://www.ddj.com/java/184401976?pgno=19
title=Microbenchmarking C++, C#, and Java: Multiple Hash Map
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17] object creation and destruction performance, as well as method calls, [cite web
url=http://www.ddj.com/java/184401976?pgno=9
title=Microbenchmarking C++, C#, and Java: Object creation/ destruction and method call
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17] are much better in Java than in C++. However, these particular tests may be biased against C++, since they create "dynamic objects" on the heap even though C++, unlike Java, can create objects on the stack. Heap allocation is slow in C++ because it is a general mechanism that should be used only if really needed [citation needed].
* Array [cite web
url=http://www.ddj.com/java/184401976?pgno=19
title=Microbenchmarking C++, C#, and Java: Array
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17] operations perform better in C.
* Trigonometric functions perform much better in C. [cite web
url=http://www.ddj.com/java/184401976?pgno=19
title=Microbenchmarking C++, C#, and Java: Trigonometric functions
publisher=Dr. Dobb's Journal
date=2005-07-01
accessdate=2007-11-17]
Startup time
Java startup time is often much longer than that of C or C++, because a lot of classes (first of all, classes from the platform class libraries) must be loaded before being used.
It seems that much of the startup time is due to I/O-bound operations rather than JVM initialization or class loading (the "rt.jar" class data file alone is 40 MB and the JVM must seek a lot of data in this huge file). Some tests showed that although the new split bytecode verification technique improved class loading by roughly 40%, this translated to only about a 5% startup improvement for large programs. [cite web
url=http://forums.java.net/jive/thread.jspa?messageID=94530
title=How fast is the new verifier?
date=2006-02-07
accessdate=2007-05-09]
Although this is a small improvement, it is more visible in small programs that perform a simple operation and then exit, because loading the Java platform data can represent many times the load of the actual program's operation.
Future improvements are planned to preload class data at OS startup, so that the data is read from the disk cache rather than from the disk.
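To get a rough feel for this cost, the time elapsed between JVM start and the first line of application code can be read through the standard RuntimeMXBean. This is only a sketch: the class name is hypothetical, and the start time reported by the JVM is approximate.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper that estimates how long the JVM took to reach application code.
final class StartupTimer {

    // Milliseconds between the JVM's recorded start time and now.
    static long millisSinceJvmStart() {
        long jvmStart = ManagementFactory.getRuntimeMXBean().getStartTime();
        return System.currentTimeMillis() - jvmStart;
    }

    public static void main(String[] args) {
        // Called first thing in main, this approximates JVM start-up plus class loading.
        System.out.println("time to reach main: " + millisSinceJvmStart() + " ms");
    }
}
```

Running a trivial program like this several times in a row usually shows the first launch to be the slowest, since later launches find the class data already in the operating system's disk cache.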
Memory usage
Java memory usage is heavier than for C or C++, because:
* parts of the Java Library must be loaded prior to the program execution (at least the classes that are used "under the hood" by the program) [http://www.tommti-systems.de/go.html?http://www.tommti-systems.de/main-Dateien/reviews/languages/benchmarks.html]
* both the Java binary and native recompilations will typically be in memory at once, and
* the virtual machine itself consumes memory.
Trigonometric functions
Performance of trigonometric functions can be bad compared to C, because Java has strict specifications for the results of mathematical operations, which may not correspond to the underlying hardware implementation. [cite web|url = http://java.sun.com/javase/6/docs/api/java/lang/Math.html |title = Math (Java Platform SE 6) |publisher = Sun Microsystems |accessdate=2008-06-08] On the x87, sine and cosine instructions for arguments with absolute value greater than π/4 are not accurate, because they are computed by reducing them to this range using an approximation of π. [cite web|url=http://blogs.sun.com/jag/entry/transcendental_meditation |title = Transcendental Meditation |accessdate=2008-06-08| date=2005-07-27 |first=James |last=Gosling |authorlink=James Gosling] A JVM implementation must perform an accurate reduction in software instead, causing a big performance hit for values outside the range. [cite web|url=http://www.osnews.com/story/5602&page=3 |title=Nine Language Performance Round-up: Benchmarking Math & File I/O |last=W. Cowell-Shah |first=Christopher |date=2004-01-08 |accessdate=2008-06-08]
Java Native Interface
The Java Native Interface has a high overhead associated with it, making it costly to cross the boundary between code running on the JVM and native code. [cite web
url=http://java.sun.com/docs/books/performance/1st_edition/html/JPNativeCode.fm.html
title=JavaTM Platform Performance: Using Native Code
last=Wilson|first=Steve
coauthors=Jeff Kesselman
publisher=Sun Microsystems
date=2001
accessdate=2008-02-15] [cite web
url=http://janet-project.sourceforge.net/papers/jnibench.pdf
title=Efficient Cooperation between Java and Native Codes - JNI Performance Benchmark
last=Kurzyniec |first=Dawid
coauthors=Vaidy Sunderam
accessdate=2008-02-15]
User interface
Swing has been perceived as slower than native widget toolkits, because it delegates the rendering of widgets to the pure-Java Java 2D API. However, benchmarks comparing the performance of Swing with that of the Standard Widget Toolkit, which delegates the rendering to the native GUI libraries of the operating system, show no clear winner, and the results depend greatly on the context and the environment. [cite web
url=http://cosylib.cosylab.com/pub/CSS/DOC-SWT_Vs._Swing_Performance_Comparison.pdf
title=SWT Vs. Swing Performance Comparison
quote="Initial expectation before performing this benchmark was to find SWT outperform Swing. This expectation stemmed from greater responsiveness of SWT-based Java applications (e.g., Eclipse IDE) compared to Swing-based applications. However, this expectation could not be quantitatively confirmed."
last=Križnar|first=Igor
publisher=cosylab.com
date=2005-05-10
accessdate=2008-05-24]
Use for High Performance Computing
Recent independent studies seem to show that Java performance for High Performance Computing (HPC) is similar to that of Fortran on computation-intensive benchmarks, but that JVMs still have scalability issues when performing intensive communication on a grid network [cite web
url=http://hal.inria.fr/inria-00312039/en
title=Current State of Java for HPC
quote="We first perform some micro benchmarks for various JVMs, showing the overall good performance for basic arithmetic operations(...). Comparing this implementation with a Fortran/MPI one, we show that they have similar performance on computation intensive benchmarks, but still have scalability issues when performing intensive communications."
author=Brian Amedro, Vladimir Bodnartchouk, Denis Caromel, Christian Delbe, Fabrice Huet, Guillermo L. Taboada
publisher=INRIA
date=August 2008
accessdate=2008-09-04].
Notes
See also
* Java Platform
* Java Virtual Machine
* HotSpot
* Java Runtime Environment
* Java version history
* Virtual Machine
* Common Language Runtime
* Compiler optimization
* Performance analysis
External links
* [http://java.sun.com/docs/performance/ Sun's Java performance portal]
* [http://shootout.alioth.debian.org/ The Computer Language Benchmarks Game]
Wikimedia Foundation. 2010.