- Bzip2
Infobox file format
name = bzip2
extension = .bz2
mime = application/x-bzip
owner = Julian Seward
type code = Bzp2
magic = BZh
genre = Data compression

Infobox software
name = bzip2
developer = Julian Seward
latest_release_version = 1.0.5
latest_release_date = March 17, 2008
operating_system = Cross-platform
genre = Data compression
license = BSD-style license [Julian Seward, "bzip2 : Home", http://www.bzip.org/, accessed 2008-09-27. Quote: "Why would I want to use it? [..] Because it's open-source (BSD-style license), and, as far as I know, patent-free."]
website = http://bzip.org/

bzip2 is a free and open source
lossless data compression algorithm and program developed by Julian Seward. Seward made the first public release of bzip2, version 0.15, in July 1996. The compressor's stability and popularity grew over the next several years, and Seward released version 1.0 in late 2000.

Compression efficiency

bzip2 compresses most files more effectively than the older gzip or ZIP formats, but it is slower [citation needed]. In this respect it is fairly similar to other recent-generation compression algorithms. Unlike formats such as RAR or ZIP (but like gzip), bzip2 is only a data compressor, not an archiver. The program itself has no facilities for multiple files, encryption or archive-splitting; in the UNIX tradition, it relies instead on separate external utilities such as tar and GnuPG for these tasks.

In most cases bzip2 is surpassed by PPM algorithms in terms of absolute compression efficiency. bzip2 gets within ten to fifteen percent of PPM, while being roughly twice as fast at compression and six times faster at decompression. [bzip2 Web site, http://www.bzip.org/]
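
The division of labour described above, where an archiver such as tar bundles the files and bzip2 only compresses the resulting stream, can be illustrated with Python's standard `tarfile` module. This is a minimal sketch rather than a description of how the bzip2 program itself is used: the helper names `pack_tar_bz2` and `unpack_tar_bz2` are hypothetical, and the archive is kept in memory purely for convenience.

```python
import io
import tarfile


def pack_tar_bz2(files: dict[str, bytes]) -> bytes:
    """Bundle several named payloads with tar, then compress the stream with bzip2."""
    buf = io.BytesIO()
    # Mode "w:bz2" asks tarfile to pipe the tar stream through a bzip2 compressor.
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def unpack_tar_bz2(blob: bytes) -> dict[str, bytes]:
    """Decompress a bzip2-compressed tar stream and return its regular files."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:bz2") as tar:
        for member in tar.getmembers():
            extracted = tar.extractfile(member)
            if extracted is not None:  # skips directories and special entries
                out[member.name] = extracted.read()
    return out
```

The compressed result is a single bzip2 stream, so it begins with the `BZh` signature listed in the infobox; the notion of "multiple files" exists only inside the tar layer.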

bzip2 uses the Burrows-Wheeler transform to convert frequently recurring character sequences into strings of identical letters, then applies a move-to-front transform and finally Huffman coding. In bzip2 the blocks are generally all the same size in plaintext; the size can be selected by a command-line argument between 100 kB and 900 kB. Compression blocks are delimited by a 48-bit sequence (magic number) derived from the binary-coded decimal representation of π, 0x314159265359, with the end-of-stream similarly delimited by a value representing sqrt(π), 0x177245385090.

Originally, bzip2's ancestor bzip used
arithmetic coding after the blocksort; this was discontinued because of patent restrictions and replaced by the Huffman coding currently used in bzip2 [citation needed].

bzip2 is known to be quite slow at compressing, leading users to opt for alternatives such as gzip when time is an issue. The cost is asymmetric, as decompression is relatively fast. Motivated by the large CPU time required for compression, a modified version that supported multi-threading was created in 2003, giving significant speed improvements on multi-CPU and multi-core computers [citation needed]. As of January 2008 this functionality has not been incorporated into the main project.

Compression stack

bzip2 uses several layers of compression techniques stacked on top of each other, which occur in the following order during compression and the reverse order during decompression:
# Run-length encoding (RLE): any sequence of 4 to 255 consecutive duplicate symbols is replaced by the first four symbols and a repeat length between 0 and 251. Thus the sequence "AAAAAAABBBBCCCD" is replaced with "AAAA3BBBB
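
The run-length rule above, together with the Burrows-Wheeler and move-to-front transforms named earlier in the article, can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the code used by bzip2: the function names are hypothetical, the repeat length is assumed to be stored as one raw byte, the Burrows-Wheeler transform is done by naively sorting all rotations (far too slow for real block sizes), and the move-to-front table is assumed to start as the full 0 to 255 byte alphabet.

```python
def rle1_encode(data: bytes) -> bytes:
    """Replace each run of 4..255 equal bytes by 4 copies plus a count byte (0..251)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        sym = data[i]
        run = 1
        while i + run < n and data[i + run] == sym and run < 255:
            run += 1
        if run >= 4:
            out += bytes([sym]) * 4
            out.append(run - 4)  # how many extra copies follow the first four
        else:
            out += bytes([sym]) * run
        i += run
    return bytes(out)


def rle1_decode(data: bytes) -> bytes:
    """Inverse of rle1_encode: after four equal bytes, the next byte is a count."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        sym = data[i]
        run = 1
        while run < 4 and i + run < n and data[i + run] == sym:
            run += 1
        out += bytes([sym]) * run
        i += run
        if run == 4:
            out += bytes([sym]) * data[i]
            i += 1
    return bytes(out)


def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations,
    plus the row index at which the original block appears."""
    n = len(block)
    if n == 0:
        return b"", 0
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = bytes(block[(i - 1) % n] for i in order)
    return last, order.index(0)


def mtf(data: bytes) -> list[int]:
    """Move-to-front: emit each byte's current table position, then move it to the front."""
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out
```

For the sequence in the example, `rle1_encode(b"AAAAAAABBBBCCCD")` yields `b"AAAA\x03BBBB\x00CCCD"`: the run of seven A's becomes four A's plus a count of 3, and the run of exactly four B's still needs a count byte of 0 so that the decoder knows where the run ends. Applying `bwt` to `b"banana"` gives `(b"nnbaaa", 3)`, which shows how the transform groups identical letters, and `mtf` then turns those repeats into zeros for the entropy coder.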
Wikimedia Foundation. 2010.