Compound File Binary Format

Compound File Binary Format (CFBF), also called Compound File or Compound Document,[1] is a file format for storing numerous files and streams within a single file on disk. CFBF was developed by Microsoft and is an implementation of Microsoft COM Structured Storage.[2][3][4]

Microsoft has opened the format for use by others, and it is now used in a variety of programs, from Microsoft Word and Microsoft Access to Business Objects.[citation needed] It also forms the basis of the Advanced Authoring Format.[5]

Overview

At its simplest, the Compound File Binary Format is a container, with little restriction on what can be stored within it.

A CFBF file structure loosely resembles a FAT filesystem. The file is partitioned into Sectors, which are chained together by a File Allocation Table (not to be confused with the file system of the same name) that records the chain of sectors belonging to each file. A Directory holds information about the contained files, including the Sector ID (SID) of the starting sector of each file's chain.

Structure

The CFBF file consists of a 512-byte header record followed by a number of sectors whose size is defined in the header. The literature defines Sectors to be either 512 or 4096 bytes in length, although the format is potentially capable of supporting sector sizes from 128 bytes upwards in powers of 2 (128, 256, 512, 1024, etc.). The lower limit of 128 bytes is the minimum required to fit a single directory entry in a Directory Sector.
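
The relationship between the sector shift stored in the header, the sector size, and the position of a sector in the file can be sketched as below. This is an illustrative sketch, not part of any reference implementation: the function names are hypothetical, and the offset formula assumes that sector 0 begins one full sector into the file (so that in files with 4096-byte sectors the 512-byte header is padded out to 4096 bytes). The 128-byte directory entry size is inferred from the lower limit described above.

```c
#include <assert.h>
#include <stdint.h>

/* Sector size in bytes for a header sector shift (9 -> 512, 12 -> 4096). */
uint32_t cfb_sector_size(uint16_t sector_shift)
{
    /* 7 (128 bytes) is the lower limit named in the text; the upper bound
     * of 20 is an arbitrary sanity limit chosen for this sketch. */
    assert(sector_shift >= 7 && sector_shift <= 20);
    return (uint32_t)1 << sector_shift;
}

/* File offset of sector number `sect`.
 * Assumption: sector 0 starts one full sector into the file, i.e. the
 * header occupies (or is padded to) exactly one sector. */
uint64_t cfb_sector_offset(uint32_t sect, uint16_t sector_shift)
{
    return ((uint64_t)sect + 1) * cfb_sector_size(sector_shift);
}

/* Number of 128-byte directory entries that fit in one Directory Sector. */
uint32_t cfb_dir_entries_per_sector(uint16_t sector_shift)
{
    return cfb_sector_size(sector_shift) / 128u;
}
```

With the typical shift of 9, sector 0 starts at byte 512, immediately after the header, and each Directory Sector holds four directory entries.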

There are several types of sector that may be present in a CFBF:

  • File Allocation Table (FAT) Sector - contains chains of sector indices much as a FAT does in the FAT/FAT32 filesystems
  • MiniFAT Sector - similar to the FAT Sector, but stores chains of mini-sectors within the Mini-Stream
  • Double-Indirect FAT (DIFAT) Sector - contains chains of FAT sector indices
  • Directory Sector - contains directory entries
  • Stream Sector - contains arbitrary file data
  • Range Lock Sector - contains the byte-range locking area of a large file

More detail is given below for the header and each sector type.

CFBF Header format

The CFBF Header occupies the first 512 bytes of the file and contains the information required to interpret the rest of the file. The C-style structure declaration below (extracted from the AAFA's Low-Level Container Specification) shows the members of the CFBF header and their purpose:

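#include <stdint.h>             // fixed-width types, so the sizes below hold
                                // on every platform (unsigned long can be 8 bytes)
typedef uint32_t ULONG;         // 4 Bytes
typedef uint16_t USHORT;        // 2 Bytes
typedef int16_t OFFSET;         // 2 Bytes
typedef ULONG SECT;             // 4 Bytes
typedef ULONG FSINDEX;          // 4 Bytes
typedef USHORT FSOFFSET;        // 2 Bytes
typedef USHORT WCHAR;           // 2 Bytes
typedef ULONG DFSIGNATURE;      // 4 Bytes
typedef uint8_t BYTE;           // 1 Byte
typedef uint16_t WORD;          // 2 Bytes
typedef uint32_t DWORD;         // 4 Bytes
typedef ULONG SID;              // 4 Bytes
typedef struct {                // 16 Bytes; declared here so the listing
    ULONG Data1;                // compiles without the Windows headers
    USHORT Data2;
    USHORT Data3;
    BYTE Data4[8];
} GUID;
typedef GUID CLSID;             // 16 Bytes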
struct StructuredStorageHeader { // [offset from start (bytes), length (bytes)]
    BYTE _abSig[8];             // [00H,08] {0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1,
                                // 0x1a, 0xe1} for current version
    CLSID _clsid;               // [08H,16] reserved must be zero (WriteClassStg/
                                // GetClassFile uses root directory class id)
    USHORT _uMinorVersion;      // [18H,02] minor version of the format: 33 is
                                // written by reference implementation
    USHORT _uDllVersion;        // [1AH,02] major version of the dll/format: 3 for
                                // 512-byte sectors, 4 for 4 KB sectors
    USHORT _uByteOrder;         // [1CH,02] 0xFFFE: indicates Intel byte-ordering
    USHORT _uSectorShift;       // [1EH,02] size of sectors in power-of-two;
                                // typically 9 indicating 512-byte sectors
    USHORT _uMiniSectorShift;   // [20H,02] size of mini-sectors in power-of-two;
                                // typically 6 indicating 64-byte mini-sectors
    USHORT _usReserved;         // [22H,02] reserved, must be zero
    ULONG _ulReserved1;         // [24H,04] reserved, must be zero
    FSINDEX _csectDir;          // [28H,04] must be zero for 512-byte sectors,
                                // number of SECTs in directory chain for 4 KB
                                // sectors
    FSINDEX _csectFat;          // [2CH,04] number of SECTs in the FAT chain
    SECT _sectDirStart;         // [30H,04] first SECT in the directory chain
    DFSIGNATURE _signature;     // [34H,04] signature used for transactions; must
                                // be zero. The reference implementation
                                // does not support transactions
    ULONG _ulMiniSectorCutoff;  // [38H,04] maximum size for a mini stream;
                                // typically 4096 bytes
    SECT _sectMiniFatStart;     // [3CH,04] first SECT in the MiniFAT chain
    FSINDEX _csectMiniFat;      // [40H,04] number of SECTs in the MiniFAT chain
    SECT _sectDifStart;         // [44H,04] first SECT in the DIFAT chain
    FSINDEX _csectDif;          // [48H,04] number of SECTs in the DIFAT chain
    SECT _sectFat[109];         // [4CH,436] the SECTs of first 109 FAT sectors
};
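
As a minimal sketch of how the header layout above might be consumed, the following function validates the signature and byte-order mark of a 512-byte buffer and decodes a few fields at the offsets given in the structure comments. The names `cfb_header_info`, `cfb_parse_header`, `rd16` and `rd32` are hypothetical and chosen for this example only. Fields are read byte by byte as little-endian values, which avoids relying on the host's byte order or on structure packing.

```c
#include <stdint.h>
#include <string.h>

/* A subset of the header fields, decoded into host byte order. */
struct cfb_header_info {
    uint16_t minor_version;      /* offset 0x18 */
    uint16_t major_version;      /* offset 0x1A */
    uint16_t sector_shift;       /* offset 0x1E */
    uint16_t mini_sector_shift;  /* offset 0x20 */
    uint32_t fat_sector_count;   /* offset 0x2C */
    uint32_t dir_start;          /* offset 0x30 */
    uint32_t mini_cutoff;        /* offset 0x38 */
};

/* Little-endian readers for 2-byte and 4-byte fields. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* `buf` must point to at least 512 bytes.
 * Returns 0 on success, -1 if the buffer does not look like a CFBF header. */
int cfb_parse_header(const uint8_t *buf, struct cfb_header_info *out)
{
    static const uint8_t sig[8] = {0xd0, 0xcf, 0x11, 0xe0,
                                   0xa1, 0xb1, 0x1a, 0xe1};
    if (memcmp(buf, sig, sizeof sig) != 0)
        return -1;
    /* The byte-order mark reads as 0xFFFE when decoded little-endian. */
    if (rd16(buf + 0x1C) != 0xFFFE)
        return -1;

    out->minor_version     = rd16(buf + 0x18);
    out->major_version     = rd16(buf + 0x1A);
    out->sector_shift      = rd16(buf + 0x1E);
    out->mini_sector_shift = rd16(buf + 0x20);
    out->fat_sector_count  = rd32(buf + 0x2C);
    out->dir_start         = rd32(buf + 0x30);
    out->mini_cutoff       = rd32(buf + 0x38);
    return 0;
}
```

A caller would typically read the first 512 bytes of the file into a buffer, pass it to `cfb_parse_header`, and reject the file on a non-zero return before touching any sector.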

File Allocation Table (FAT) Sectors

Taken together as a single stream, the FAT sectors define the status and linkage of every sector in the file. Each entry in the FAT is 4 bytes in length and contains either the sector number of the next sector in a FAT chain or one of the following special values:

  • FREESECT (0xFFFFFFFF) - denotes an unused sector
  • ENDOFCHAIN (0xFFFFFFFE) - marks the last sector in a FAT chain
  • FATSECT (0xFFFFFFFD) - marks a sector used to store part of the FAT
  • DIFSECT (0xFFFFFFFC) - marks a sector used to store part of the DIFAT
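
Following a FAT chain with the special values above can be sketched as follows. This is an illustrative example that assumes the whole FAT has already been loaded into memory as an array of 32-bit entries. The function name `cfb_walk_chain`, the `CFB_` prefix and the error convention are hypothetical, not taken from any specification.

```c
#include <stddef.h>
#include <stdint.h>

#define CFB_FREESECT   0xFFFFFFFFu
#define CFB_ENDOFCHAIN 0xFFFFFFFEu
#define CFB_FATSECT    0xFFFFFFFDu
#define CFB_DIFSECT    0xFFFFFFFCu

/* Follows the chain that starts at sector `start` through `fat`, an
 * in-memory array of `fat_len` entries. Up to `max_out` sector numbers
 * are written to `out`. Returns the chain length, or -1 if the chain is
 * corrupt (index out of range, a special marker inside the chain, or a
 * loop). */
long cfb_walk_chain(const uint32_t *fat, size_t fat_len, uint32_t start,
                    uint32_t *out, size_t max_out)
{
    size_t n = 0;
    uint32_t sect = start;

    while (sect != CFB_ENDOFCHAIN) {
        /* FREESECT, FATSECT and DIFSECT never belong inside a chain. */
        if (sect >= CFB_DIFSECT || sect >= fat_len)
            return -1;
        /* More hops than FAT entries can only mean the chain loops. */
        if (n >= fat_len)
            return -1;
        if (n < max_out)
            out[n] = sect;
        n++;
        sect = fat[sect];
    }
    return (long)n;
}
```

The loop guard matters in practice because a damaged or malicious file can contain a chain that points back into itself. Without the guard, the walk would never terminate.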

Range Lock Sector

The Range Lock Sector must exist in files greater than 2 GB in size and must not exist in files smaller than 2 GB. It must contain the byte range 0x7FFFFF00 to 0x7FFFFFFF of the file. This area is reserved by Microsoft's COM implementation for storing byte-range locking information for concurrent access.
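
The rule above can be expressed as a small check. This is a sketch under stated assumptions: the function names are hypothetical, 2 GB is taken to mean 0x80000000 bytes, and the sector number is computed on the assumption that sector 0 begins one full sector into the file.

```c
#include <stdint.h>

#define CFB_RANGE_LOCK_FIRST UINT32_C(0x7FFFFF00)
#define CFB_RANGE_LOCK_LAST  UINT32_C(0x7FFFFFFF)

/* Returns 1 when a file of `file_size` bytes is larger than 2 GB and so
 * must contain a Range Lock Sector, otherwise 0. The text leaves a file
 * of exactly 2 GB unspecified; this sketch treats it as not needing one. */
int cfb_needs_range_lock(uint64_t file_size)
{
    return file_size > UINT64_C(0x80000000);
}

/* Number of the sector that contains the reserved byte range.
 * Assumption: sector n starts at file offset (n + 1) << sector_shift. */
uint32_t cfb_range_lock_sector(uint16_t sector_shift)
{
    return (CFB_RANGE_LOCK_FIRST >> sector_shift) - 1u;
}

/* Returns 1 when the first and last reserved bytes fall in the same sector. */
int cfb_range_lock_fits_one_sector(uint16_t sector_shift)
{
    return (CFB_RANGE_LOCK_FIRST >> sector_shift) ==
           (CFB_RANGE_LOCK_LAST >> sector_shift);
}
```

Because the reserved range is only 256 bytes long and starts on a 256-byte boundary, it fits inside a single sector for both of the sector sizes defined in the literature.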

Glossary

  • FAT - File Allocation Table, also known as: SAT - Sector Allocation Table
  • DIFAT - Double-Indirect File Allocation Table
  • FAT Chain - a group of FAT entries which indicate the sectors allocated to a Stream in the file
  • Stream - a virtual file which occupies a number of sectors within the CFBF
  • Sector - the unit of allocation within the CFBF, usually 512 or 4096 Bytes in length

References

  1. "Apache POI - POIFS". POI Project. http://poi.apache.org/poifs/index.html. Retrieved 10 May 2011.
  2. "Compound Files (Windows)". Microsoft Developer Network (MSDN) library – COM SDK. Microsoft Corporation. 20 November 2008. http://msdn.microsoft.com/en-us/library/aa378938%28VS.85%29.aspx. Retrieved 23 September 2009.
  3. "Containers: Compound Files". Microsoft Developer Network (MSDN) library – Visual Studio 2008 documentation. Microsoft Corporation. http://msdn.microsoft.com/en-us/library/ydd3k45e.aspx. Retrieved 23 September 2009.
  4. "Understand Compound Files". Microsoft Developer Network (MSDN) library – ActiveDirectory Rights Management. 25 June 2009. http://msdn.microsoft.com/en-us/library/cc542545%28VS.85%29.aspx. Retrieved 23 September 2009.
  5. AMW Association (formerly AAF Association)

Wikimedia Foundation. 2010.
