Class (file format)

In the Java programming language, source files (.java files) are compiled into class files, which have a .class extension. Because Java is designed to be platform-independent, the compiler emits bytecode rather than native machine code and stores it in a .class file. If a source file declares more than one class, each class is compiled into a separate .class file. These .class files can be loaded by any Java Virtual Machine (JVM).

Since JVMs are available for many platforms, a .class file compiled on one platform will execute on a JVM running on another platform. This is what makes Java platform-independent.

As of 2006, the modification of the class file format is being considered under Java Specification Request (JSR) 202.

File layout and structure

The ten sections

There are 10 basic sections to the Java Class File structure:
* Magic Number: this is currently 0xCAFEBABE
* Version of Class File Format: the minor and major versions of the class file
* Constant Pool: Pool of constants for the class
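* Access Flags: for example whether the class is public, final, abstract, or an interface
* This Class: The name of the current class (stored as an index into the constant pool)
* Super Class: The name of the super class (stored as an index into the constant pool)
* Interfaces: Any interfaces implemented by the class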
* Fields: Any fields in the class
* Methods: Any methods in the class
* Attributes: Any attributes of the class (for example the name of the sourcefile, etc)

There is a handy mnemonic for remembering these 10: My Very Cute Animal Turns Savage In Full Moon Areas.

Magic, Version, Constant, Access, This, Super, Interfaces, Fields, Methods, Attributes (MVCATSIFMA)
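To make the access flags entry in the list above more concrete, the following minimal sketch decodes a class-level flags word. The mask values are the class access flags given in the Java Virtual Machine Specification; the class name ClassAccessFlags and the method describe are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names are illustrative, not part of any JDK API.
final class ClassAccessFlags {
    // Class-level access flag masks from the JVM specification.
    static final int ACC_PUBLIC    = 0x0001;
    static final int ACC_FINAL     = 0x0010;
    static final int ACC_SUPER     = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT  = 0x0400;

    // Returns the names of the flags that are set, in ascending mask order.
    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0)    names.add("public");
        if ((flags & ACC_FINAL) != 0)     names.add("final");
        if ((flags & ACC_SUPER) != 0)     names.add("super");
        if ((flags & ACC_INTERFACE) != 0) names.add("interface");
        if ((flags & ACC_ABSTRACT) != 0)  names.add("abstract");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0021)); // [public, super]
        System.out.println(describe(0x0601)); // [public, interface, abstract]
    }
}
```

Note that there is no "static" mask in this set: at the top level of a class file only the flags shown here apply to the class itself.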

General layout

Because the class file contains variable-sized items and does not also contain embedded file offsets (or pointers), it is typically parsed sequentially, from the first byte toward the end. At the lowest level the file format is described in terms of a few fundamental data types:

* u1: an unsigned 8-bit integer
* u2: an unsigned 16-bit integer in big-endian byte order
* u4: an unsigned 32-bit integer in big-endian byte order
* table: an array of variable-length items of some type. The number of items in the table is identified by a preceding count number, but the size in bytes of the table can only be determined by examining each of its items.
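The fixed-size types above can be read with a few shifts. The following is a minimal sketch, not a complete parser: the class BigEndianReader and its method names are assumptions made for this example, and the sample bytes are hand-written rather than taken from a real class file.

```java
// Hypothetical helper: reads u1, u2 and u4 values sequentially from a byte array.
final class BigEndianReader {
    private final byte[] data;
    private int pos = 0;

    BigEndianReader(byte[] data) {
        this.data = data;
    }

    // u1: one unsigned byte (0..255).
    int u1() {
        return data[pos++] & 0xFF;
    }

    // u2: two bytes, most significant byte first (0..65535).
    int u2() {
        return (u1() << 8) | u1();
    }

    // u4: four bytes, most significant byte first. Returned as a long so
    // that values above Integer.MAX_VALUE (such as 0xCAFEBABE) stay positive.
    long u4() {
        return ((long) u2() << 16) | u2();
    }

    // Number of bytes consumed so far.
    int position() {
        return pos;
    }

    public static void main(String[] args) {
        byte[] bytes = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // u4
            0x00, 0x03,                                         // u2
            0x00, 0x2D                                          // u2
        };
        BigEndianReader r = new BigEndianReader(bytes);
        long first = r.u4();
        int second = r.u2();
        int third = r.u2();
        System.out.printf("0x%08X %d %d%n", first, second, third);
        // prints "0xCAFEBABE 3 45"
    }
}
```

Because there are no embedded offsets, a reader of this kind only ever moves forward; skipping a table still requires decoding the length of each item in it.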

Some of these fundamental types are then re-interpreted as higher-level values (such as strings or floating-point numbers), depending on context. There is no enforcement of word alignment, and so no padding bytes are ever used. The overall layout of the class file is as shown in the following table.

C programming language representation

The structure of the class file format can be described using a C-like syntax as follows. This is not valid C, however, because some of the tables defined by the class file format do not have fixed-length entries.

struct Class_File_Format {
    u4 magic_number;         // unsigned, 4-byte (32-bit) number that
                             // indicates the start of a class file;
                             // the actual value is defined in the Java
                             // Virtual Machine Specification as
                             // 0xCAFEBABE in hexadecimal, which equals
                             // 1100 1010 1111 1110 1011 1010 1011 1110
                             // in binary, and 3,405,691,582 in decimal

    u2 minor_version;        // unsigned, 2-byte (16-bit) minor version number
    u2 major_version;        // unsigned, 2-byte (16-bit) major version number

    u2 constant_pool_count;  // unsigned, 2-byte (16-bit) number
                             // indicating the number of entries
                             // in the constant pool table, plus one

    // the constant pool table
    cp_info constant_pool[constant_pool_count - 1];

    u2 access_flags;

    u2 this_class;
    u2 super_class;

    u2 interfaces_count;     // unsigned, 2-byte (16-bit) number
                             // indicating the number of entries
                             // in the table of superinterfaces
                             // of this class

    // the table of superinterfaces of this class
    u2 interfaces[interfaces_count];

    u2 fields_count;         // unsigned, 2-byte (16-bit) number
                             // indicating the number of entries in
                             // the table of fields of this class

    // the table of fields of this class
    field_info fields[fields_count];

    u2 methods_count;        // unsigned, 2-byte (16-bit) number
                             // indicating the number of entries in
                             // the table of methods of this class

    // the table of methods of this class
    method_info methods[methods_count];

    u2 attributes_count;     // unsigned, 2-byte (16-bit) number
                             // indicating the number of
                             // attributes in the attributes table

    // the attributes table
    attribute_info attributes[attributes_count];
};
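The fixed-size prefix of this structure can be read with the standard java.io.DataInputStream, which reads big-endian values. The sketch below stops after constant_pool_count and does not parse the tables that follow. The class name ClassHeader and its members are hypothetical, and the input bytes are hand-built for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical holder for the first four fields of a class file.
final class ClassHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads magic_number, minor_version, major_version and constant_pool_count.
    static ClassHeader read(DataInputStream in) throws IOException {
        int magic = in.readInt();              // u4, compared as a signed int
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file: bad magic number");
        }
        int minor = in.readUnsignedShort();    // u2
        int major = in.readUnsignedShort();    // u2
        int cpCount = in.readUnsignedShort();  // u2
        return new ClassHeader(minor, major, cpCount);
    }

    // The table itself holds constant_pool_count - 1 slots (indexes 1..count-1).
    int constantPoolSlots() {
        return constantPoolCount - 1;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic_number
            0x00, 0x00,                                         // minor_version
            0x00, 0x31,                                         // major_version
            0x00, 0x10                                          // constant_pool_count
        };
        ClassHeader h = read(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println(h.majorVersion + "." + h.minorVersion
                + ", slots=" + h.constantPoolSlots());
        // prints "49.0, slots=15"
    }
}
```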

The constant pool

The constant pool table is where most of the literal constant values are stored. This includes values such as numbers of all sorts, strings, identifier names, references to classes and methods, and type descriptors. All indexes, or references, to specific constants in the constant pool table are given by 16-bit (type u2) numbers, where index value 1 refers to the first constant in the table (index value 0 is invalid).

Due to historic choices made during the development of the file format, the number of constants in the constant pool table is not the same as the constant pool count which precedes the table. First, the table is indexed starting at 1 (rather than 0), and the count is one greater than the number of slots, so the count should be interpreted as the maximum index plus one. Additionally, two types of constants, namely longs and doubles, take up two consecutive slots in the table, although the second such slot is a phantom index that is never directly used.

The type of each item (constant) in the constant pool is identified by an initial byte "tag". The number of bytes following this tag and their interpretation are then dependent upon the tag value. The legal constant types and their tag values are:
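As a rough sketch of how the tag drives parsing, the code below walks a constant pool and records the tag found at each index. It covers only the eleven tags of the original class file format (1 Utf8, 3 Integer, 4 Float, 5 Long, 6 Double, 7 Class, 8 String, 9 Fieldref, 10 Methodref, 11 InterfaceMethodref, 12 NameAndType); later editions of the specification define further tags, which this sketch simply rejects. The class ConstantPoolWalker and the method readTags are hypothetical names, and the sample pool is hand-built.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper: records the tag stored at each constant pool index.
final class ConstantPoolWalker {

    // Returns an array where result[i] is the tag at index i. Index 0 and
    // the phantom slot after a long or double are left as 0.
    static int[] readTags(DataInputStream in, int constantPoolCount) throws IOException {
        int[] tags = new int[constantPoolCount];
        for (int i = 1; i < constantPoolCount; i++) {
            int tag = in.readUnsignedByte();
            tags[i] = tag;
            switch (tag) {
                case 1: { // Utf8: u2 length followed by that many bytes
                    int length = in.readUnsignedShort();
                    in.readFully(new byte[length]);
                    break;
                }
                case 3: case 4: // Integer, Float: 4 bytes
                    in.readFully(new byte[4]);
                    break;
                case 5: case 6: // Long, Double: 8 bytes, occupying two slots
                    in.readFully(new byte[8]);
                    i++; // skip the phantom index
                    break;
                case 7: case 8: // Class, String: one u2 index
                    in.readFully(new byte[2]);
                    break;
                case 9: case 10: case 11: case 12:
                    // Fieldref, Methodref, InterfaceMethodref, NameAndType: two u2 indexes
                    in.readFully(new byte[4]);
                    break;
                default:
                    throw new IOException("unsupported tag " + tag + " at index " + i);
            }
        }
        return tags;
    }

    public static void main(String[] args) throws IOException {
        byte[] pool = {
            1, 0, 2, 'H', 'i',            // index 1: Utf8 "Hi"
            5, 0, 0, 0, 0, 0, 0, 0, 42,   // index 2: Long (index 3 is the phantom slot)
            7, 0, 1                       // index 4: Class referring to index 1
        };
        int[] tags = readTags(new DataInputStream(new ByteArrayInputStream(pool)), 5);
        System.out.println(Arrays.toString(tags)); // [0, 1, 5, 0, 7]
    }
}
```

In this example the pool holds three constants, yet constant_pool_count is 5: one extra for the unused index 0 and one for the phantom slot after the long.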

There are only two integral constant types, integer and long. Other integral types appearing in the high-level language, such as boolean, byte, and short, must be represented as an integer constant.

Class names in Java, when fully qualified, are traditionally dot-separated, such as "java.lang.Object". However, within the low-level Class reference constants, an internal form appears which uses slashes instead, such as "java/lang/Object".
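Converting between the two spellings is a plain character substitution, as in this small sketch. The class InternalNames and its methods are hypothetical helpers, not JDK API.

```java
// Hypothetical helper for switching between dotted and internal class names.
final class InternalNames {

    // "java.lang.Object" -> "java/lang/Object"
    static String toInternal(String dottedName) {
        return dottedName.replace('.', '/');
    }

    // "java/lang/Object" -> "java.lang.Object"
    static String toDotted(String internalName) {
        return internalName.replace('/', '.');
    }

    public static void main(String[] args) {
        System.out.println(toInternal("java.lang.Object")); // java/lang/Object
        System.out.println(toDotted("java/lang/Object"));   // java.lang.Object
    }
}
```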

The Unicode strings, despite the moniker "UTF-8 string", are not encoded in standard UTF-8 but in a closely related variant, often called modified UTF-8. There are two differences (see UTF-8 for a complete discussion). The first is that the code point U+0000 is encoded as the two-byte sequence C0 80 (in hex) instead of the standard single-byte encoding 00. The second difference is that supplementary characters (those outside the BMP, at U+10000 and above) are encoded using a surrogate-pair construction similar to UTF-16 rather than being directly encoded using UTF-8. In this case each of the two surrogates is encoded separately in UTF-8. For example, U+1D11E is encoded as the 6-byte sequence ED A0 B4 ED B4 9E, rather than the standard 4-byte UTF-8 encoding F0 9D 84 9E.
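Both differences can be observed from Java itself, because java.io.DataOutputStream.writeUTF writes a two-byte length followed by the string in this same modified encoding. The sketch below strips that length prefix and compares the result with standard UTF-8; the class ModifiedUtf8Demo and its helper methods are hypothetical names used only here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class comparing modified UTF-8 with standard UTF-8.
final class ModifiedUtf8Demo {

    // Encodes s with DataOutputStream.writeUTF and drops the 2-byte length prefix.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new DataOutputStream(buffer).writeUTF(s);
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String nul = String.valueOf((char) 0);                // U+0000
        String clef = new String(Character.toChars(0x1D11E)); // U+1D11E

        System.out.println(hex(modifiedUtf8(nul)));                      // C0 80
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));   // 00
        System.out.println(hex(modifiedUtf8(clef)));                     // ED A0 B4 ED B4 9E
        System.out.println(hex(clef.getBytes(StandardCharsets.UTF_8)));  // F0 9D 84 9E
    }
}
```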

History

Class files are identified by the following 4-byte header (in hexadecimal): CA FE BA BE (the magic_number field at the start of the structure above). The history of this magic number was explained by James Gosling:

"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after "CAFE" (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI."

References

* "The Java Virtual Machine Specification, Second Edition" is the official defining document of the Java Virtual Machine (which includes the class file format) as specified by Sun Microsystems. It is available online on Sun's website at http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html and in printed book form as ISBN 0-201-43294-3. Both the first and second editions of the book are freely available online for viewing and/or download at http://java.sun.com/docs/books/vmspec/
* JSR 202, Java Class File Specification Update: http://www.jcp.org/en/jsr/detail?id=202
* James Gosling, private communication to Bill Bumgarner: http://radio.weblogs.com/0100490/2003/01/28.html


Wikimedia Foundation. 2010.
