Java class file

Java class file: This article is about the data format. For classes in Java, see Class (computer programming).

Class
Filename extension .class

Developed by Sun Microsystems

Type of format Bytecode

In the Java programming language, source files (.java files) are compiled into class files, which have a .class extension and are read by the Java Virtual Machine (JVM) rather than by the hardware directly. Because Java is designed to be platform-independent, the compiler emits bytecode instead of native machine code and stores it in a .class file. If a source file declares more than one class, each class is compiled into a separate .class file. These .class files can be loaded by any JVM.

JVMs are available for many platforms, and a .class file compiled on one platform will execute on a JVM on another platform. This is what makes Java platform-independent.

Contents

1 History

2 File layout and structure

2.1 Sections

2.2 Magic Number

2.3 General layout

2.4 C programming language representation

2.5 The constant pool

3 See also

4 References

5 Further reading

History

As of 2006, modification of the class file format was being considered under Java Specification Request (JSR) 202.^[1]

File layout and structure

Sections

There are 10 basic sections to the Java Class File structure:

Magic Number: 0xCAFEBABE

Version of Class File Format: the minor and major versions of the class file

Constant Pool: Pool of constants for the class

Access Flags: for example whether the class is public, final, abstract, an interface, etc.

This Class: The name of the current class

Super Class: The name of the super class

Interfaces: Any interfaces implemented by the class

Fields: Any fields in the class

Methods: Any methods in the class

Attributes: Any attributes of the class (for example the name of the sourcefile, etc.)

There is a handy mnemonic for remembering these 10: My Very Cute Animal Turns Savage In Full Moon Areas.

Magic, Version, Constant, Access, This, Super, Interfaces, Fields, Methods, Attributes (MVCATSIFMA)

Magic Number

Class files are identified by the following 4-byte header (in hexadecimal): CA FE BA BE (the first four bytes in the layout table below). The history of this magic number was explained by James Gosling:^[2]

"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after "CAFE" (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI."

General layout

Because the class file contains variable-sized items and does not contain embedded file offsets (or pointers), it is typically parsed sequentially, from the first byte toward the end. At the lowest level, the file format is described in terms of a few fundamental data types:

u1: an unsigned 8-bit integer

u2: an unsigned 16-bit integer in big-endian byte order

u4: an unsigned 32-bit integer in big-endian byte order

table: an array of variable-length items of some type. The number of items in the table is identified by a preceding count number, but the size in bytes of the table can only be determined by examining each of its items.
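As a minimal sketch of how these fundamental types can be decoded from raw bytes, the following Java helper combines bytes in big-endian order. The class name BigEndian and its method names are hypothetical, chosen only to mirror the u1, u2 and u4 names used above.

```java
final class BigEndian {
    private BigEndian() {}

    /** Reads a u1 (unsigned 8-bit) value at the given offset. */
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF; // mask removes Java's sign extension
    }

    /** Reads a u2 (unsigned 16-bit, big-endian) value at the given offset. */
    static int u2(byte[] data, int offset) {
        return (u1(data, offset) << 8) | u1(data, offset + 1);
    }

    /** Reads a u4 (unsigned 32-bit, big-endian) value; a long keeps it unsigned. */
    static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }

    public static void main(String[] args) {
        // Hypothetical first eight bytes of a class file: magic, minor, major.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x33};
        System.out.printf("magic=0x%08X minor=%d major=%d%n",
                u4(header, 0), u2(header, 4), u2(header, 6));
    }
}
```

Returning u4 as a long is a design choice of this sketch: Java's int is signed, so a value such as 0xCAFEBABE would otherwise appear negative.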

Some of these fundamental types are then re-interpreted as higher-level values (such as strings or floating-point numbers), depending on context. There is no enforcement of word alignment, and so no padding bytes are ever used. The overall layout of the class file is as shown in the following table.

byte offset	size	type or value	description
0	4 bytes	u1 = 0xCA, 0xFE, 0xBA, 0xBE	magic number (CAFEBABE) used to identify the file as conforming to the class file format
4	2 bytes	u2	minor version number of the class file format being used
6	2 bytes	u2	major version number of the class file format being used. Java SE 7 = 51 (0x33), Java SE 6 = 50 (0x32), J2SE 5.0 = 49 (0x31), JDK 1.4 = 48 (0x30), JDK 1.3 = 47 (0x2F), JDK 1.2 = 46 (0x2E), JDK 1.1 = 45 (0x2D). For details of earlier version numbers see footnote 1 in The Java Virtual Machine Specification, 2nd edition
8	2 bytes	u2	constant pool count, which sizes the following constant pool table. This count is at least one greater than the actual number of entries; see the discussion under "The constant pool".
10	cpsize (variable)	table	constant pool table, an array of variable-sized constant pool entries, containing items such as literal numbers, strings, and references to classes or methods. Indexed starting at 1, with indexes running up to (constant pool count - 1).
10+cpsize	2 bytes	u2	access flags, a bitmask
12+cpsize	2 bytes	u2	identifies this class, index into the constant pool to a "Class"-type entry
14+cpsize	2 bytes	u2	identifies super class, index into the constant pool to a "Class"-type entry
16+cpsize	2 bytes	u2	interface count, number of entries in the following interface table
18+cpsize	isize (variable)	table	interface table, a variable-length array of interfaces, each given as a 2-byte index into the constant pool
18+cpsize+isize	2 bytes	u2	field count, number of entries in the following field table
20+cpsize+isize	fsize (variable)	table	field table, variable-length array of fields
20+cpsize+isize+fsize	2 bytes	u2	method count, number of entries in the following method table
22+cpsize+isize+fsize	msize (variable)	table	method table, variable-length array of methods
22+cpsize+isize+fsize+msize	2 bytes	u2	attribute count, number of entries in the following attribute table
24+cpsize+isize+fsize+msize	asize (variable)	table	attribute table, variable-length array of attributes
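The fixed-size fields at the start of the layout (offsets 0 through 9) can be read with a few sequential reads. The sketch below assumes the whole file is already in memory as a byte array; the class name ClassFileHeader and its field names are hypothetical. It relies on java.nio.ByteBuffer, which uses big-endian byte order by default.

```java
import java.nio.ByteBuffer;

final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Parses the first 10 bytes: magic, minor version, major version, constant pool count. */
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.remaining() < 10) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;   // u2 at offset 4
        int major = buf.getShort() & 0xFFFF;   // u2 at offset 6
        int cpCount = buf.getShort() & 0xFFFF; // u2 at offset 8
        return new ClassFileHeader(minor, major, cpCount);
    }

    public static void main(String[] args) {
        // Hypothetical header bytes: major version 50, constant pool count 16.
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x32, 0, 0x10};
        ClassFileHeader header = parse(sample);
        System.out.println("version " + header.majorVersion + "." + header.minorVersion
                + ", constant pool count " + header.constantPoolCount);
    }
}
```

Everything after offset 10 depends on the size of the constant pool, so a parser cannot jump ahead to the access flags without first walking every constant pool entry.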

C programming language representation

struct Class_File_Format {
   u4 magic_number;
   u2 minor_version;
   u2 major_version;
   u2 constant_pool_count;
   cp_info constant_pool[constant_pool_count - 1];
   u2 access_flags;
   u2 this_class;
   u2 super_class;
   u2 interfaces_count;
   u2 interfaces[interfaces_count];
   u2 fields_count;
   field_info fields[fields_count];
   u2 methods_count;
   method_info methods[methods_count];
   u2 attributes_count;
   attribute_info attributes[attributes_count];
}

The constant pool

The constant pool table is where most of the literal constant values are stored. This includes values such as numbers of all sorts, strings, identifier names, references to classes and methods, and type descriptors. All indexes, or references, to specific constants in the constant pool table are given by 16-bit (type u2) numbers, where index value 1 refers to the first constant in the table (index value 0 is invalid).

Due to historic choices made during the file format development, the number of constants in the constant pool table is not actually the same as the constant pool count which precedes the table. First, the table is indexed starting at 1 (rather than 0), so the count is one greater than the maximum valid index. Additionally, two types of constants (longs and doubles) take up two consecutive slots in the table, although the second such slot is a phantom index that is never directly used.

The type of each item (constant) in the constant pool is identified by an initial byte tag. The number of bytes following this tag and their interpretation are then dependent upon the tag value. The valid constant types and their tag values are:

Tag byte	Additional bytes	Description of constant
1	2+x bytes (variable)	UTF-8 (Unicode) string: a character string prefixed by a 16-bit number (type u2) indicating the number of bytes in the encoded string which immediately follows (which may be different than the number of characters). Note that the encoding used is not actually UTF-8, but involves a slight modification of the Unicode standard encoding form.
3	4 bytes	Integer: a signed 32-bit two's complement number in big-endian format
4	4 bytes	Float: a 32-bit single-precision IEEE 754 floating-point number
5	8 bytes	Long: a signed 64-bit two's complement number in big-endian format (takes two slots in the constant pool table)
6	8 bytes	Double: a 64-bit double-precision IEEE 754 floating-point number (takes two slots in the constant pool table)
7	2 bytes	Class reference: an index within the constant pool to a UTF-8 string containing the fully qualified class name (in internal format)
8	2 bytes	String reference: an index within the constant pool to a UTF-8 string
9	4 bytes	Field reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor.
10	4 bytes	Method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor.
11	4 bytes	Interface method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor.
12	4 bytes	Name and type descriptor: two indexes to UTF-8 strings within the constant pool, the first representing a name (identifier) and the second a specially encoded type descriptor.
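The tag values and sizes above are enough to walk the constant pool without interpreting its contents. The following is a sketch only: the class name ConstantPoolScanner and the method scanTags are hypothetical, the buffer is assumed to be positioned at the first constant pool entry (byte offset 10), and only the tags listed in the table are handled. Later class file versions define additional tags, which this sketch rejects.

```java
import java.nio.ByteBuffer;

final class ConstantPoolScanner {
    private ConstantPoolScanner() {}

    /**
     * Returns the tag stored at each constant pool index. Index 0 and the
     * phantom slot after each long or double are left as 0.
     */
    static int[] scanTags(ByteBuffer buf, int constantPoolCount) {
        int[] tags = new int[constantPoolCount];
        for (int index = 1; index < constantPoolCount; index++) {
            int tag = buf.get() & 0xFF;
            tags[index] = tag;
            switch (tag) {
                case 1: { // UTF-8 string: u2 byte length, then that many bytes
                    int length = buf.getShort() & 0xFFFF;
                    skip(buf, length);
                    break;
                }
                case 3: case 4: // Integer, Float
                    skip(buf, 4);
                    break;
                case 5: case 6: // Long, Double: 8 bytes and two pool slots
                    skip(buf, 8);
                    index++;
                    break;
                case 7: case 8: // Class reference, String reference
                    skip(buf, 2);
                    break;
                case 9: case 10: case 11: case 12: // Field, Method, Interface method, Name and type
                    skip(buf, 4);
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag + " at index " + index);
            }
        }
        return tags;
    }

    private static void skip(ByteBuffer buf, int byteCount) {
        buf.position(buf.position() + byteCount);
    }
}
```

When scanTags returns, the buffer is positioned at the access flags, which is how a sequential parser finds the rest of the file. The extra index++ for longs and doubles is what makes the constant pool count larger than the number of entries physically stored.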

There are only two integral constant types, integer and long. Other integral types appearing in the high-level language, such as boolean, byte, and short, must be represented as an integer constant.

Class names in Java, when fully qualified, are traditionally dot-separated, such as "java.lang.Object". However, within the low-level Class reference constants, an internal form appears which uses slashes instead, such as "java/lang/Object".
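Converting between the two forms is a plain character substitution, as in this small sketch (the class and method names are hypothetical):

```java
final class ClassNames {
    private ClassNames() {}

    /** Converts a dot-separated name such as "java.lang.Object" to the internal form. */
    static String toInternalForm(String dottedName) {
        return dottedName.replace('.', '/');
    }

    /** Converts an internal name such as "java/lang/Object" to the dot-separated form. */
    static String toDottedForm(String internalName) {
        return internalName.replace('/', '.');
    }
}
```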

The Unicode strings, despite the moniker "UTF-8 string", are not actually encoded according to the Unicode standard, although the encoding is similar. There are two differences (see UTF-8 for a complete discussion). The first is that the code point U+0000 is encoded as the two-byte sequence C0 80 (in hex) instead of the standard single-byte encoding 00. The second is that supplementary characters (those outside the BMP, at U+10000 and above) are encoded using a surrogate-pair construction similar to UTF-16 rather than being directly encoded using UTF-8. In this case each of the two surrogates is encoded separately in UTF-8. For example, U+1D11E is encoded as the 6-byte sequence ED A0 B4 ED B4 9E, rather than the correct 4-byte UTF-8 encoding of F0 9D 84 9E.
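Both differences can be observed from Java itself, because java.io.DataOutputStream.writeUTF writes strings in this same modified encoding, preceded by a two-byte length. The sketch below strips that length prefix and prints the remaining bytes in hex; the class name ModifiedUtf8Demo and its helper methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class ModifiedUtf8Demo {
    private ModifiedUtf8Demo() {}

    /** Encodes text as modified UTF-8, dropping the 2-byte length prefix that writeUTF emits. */
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(String.valueOf((char) 0))));              // C0 80
        System.out.println(hex(encode(new String(Character.toChars(0x1D11E))))); // ED A0 B4 ED B4 9E
    }
}
```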

See also

Java portal

References

^[1] JSR 202: Java Class File Specification Update

^[2] James Gosling, private communication to Bill Bumgarner

Further reading

Tim Lindholm, Frank Yellin (1999). The Java Virtual Machine Specification (2nd ed.). Prentice Hall. ISBN 0-201-43294-3. http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html. Retrieved 2008-10-13. The official defining document of the Java Virtual Machine, which includes the class file format. Both the first and second editions of the book are freely available online for viewing and/or download.




