Uuencoding
Uuencoding is a form of binary-to-text encoding that originated in the Unix program uuencode, for encoding binary data for transmission over the uucp mail system. The name "uuencoding" is derived from "Unix-to-Unix encoding". Since uucp converted characters between various computers' character sets, uuencode was used to convert the data to fairly common characters that were unlikely to be "translated" and thereby destroy the file. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode and uudecode became popular for sending binary files by e-mail and posting to Usenet newsgroups. Uuencoding has now been largely replaced by MIME and yEnc. With MIME, files that might have been uuencoded are transferred with base64 encoding.
Encoded format
A file in uuencoded format starts with a header line of the form:
begin <mode> <file>
where <mode> is the file's Unix read/write/execute permissions as three octal digits, and <file> is the name to be used when recreating the binary data. The file ends with two trailer lines:
`
end
(The grave accent indicates a line that encodes zero bytes; see below.) Lines between the header and trailer encode data.
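As a rough illustration of the header format, the following Python sketch splits a header line into its permission mode and file name. The function name parse_header is hypothetical and not part of any uuencode tool; the sketch assumes the fields are separated by single spaces.

```python
def parse_header(line):
    """Split a uuencode header line into (mode, filename).

    Hypothetical helper for illustration; assumes single-space separators.
    """
    # Split into at most three fields so a file name containing spaces survives.
    keyword, mode, name = line.rstrip("\n").split(" ", 2)
    if keyword != "begin":
        raise ValueError("not a uuencode header line")
    # The mode is written as octal digits, e.g. "644".
    return int(mode, 8), name


mode, name = parse_header("begin 644 cat.txt\n")
print(format(mode, "o"), name)
```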
Each data line starts with a character indicating the number of data bytes encoded on that line and ends with a newline character. All data lines, except perhaps the last, encode 45 bytes of data. The corresponding encoded length value is 'M' (see below), so most lines begin with 'M'.
A data line then contains groups of four characters, each of which encodes three bytes of data. If the number of data bytes for a line is not divisible by three, one or two additional zero bytes are appended to the input data before encoding, so the encoding always consists of whole groups of four characters. Those padding bytes are not included in the count at the beginning of the last line.
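The padding rule can be checked with a little arithmetic. The sketch below (line_layout is a hypothetical name) computes, for a given number of data bytes on a line, how many zero bytes are appended and how many encoded characters follow the length character.

```python
def line_layout(n_bytes):
    """Return (padding_bytes, encoded_chars) for a line carrying n_bytes of data.

    Hypothetical helper; encoded_chars excludes the leading length character.
    """
    if not 0 <= n_bytes <= 45:
        raise ValueError("a data line carries between 0 and 45 bytes")
    padding = (-n_bytes) % 3          # zero bytes needed to reach a multiple of 3
    groups = (n_bytes + padding) // 3
    return padding, 4 * groups        # each 3-byte group becomes 4 characters


print(line_layout(45))  # a full line
print(line_layout(4))   # a short last line
```

Under these rules a full 45-byte line needs no padding and yields 60 encoded characters, while a 4-byte line is padded with two zero bytes and yields 8.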
A data line's byte count is encoded by adding 32 and using the corresponding ASCII character, except that a byte count of zero is encoded as a grave accent ("`", code 96). (In ASCII, the first thirty-two characters are unprintable and were used to control data transmission, so they could be modified or deleted in transit. The next ninety-five characters, at code 32 and above, are all printable. Since the byte count is in the range 0-45, adding 32 converts it into a printable character. The ASCII code for 'M' is 77, or exactly 45 + 32. For a zero-length line, adding 32 to 0 gives 32, corresponding to a space character. This character was also problematic for data transmission, so the grave accent ("`", code 96) is used instead. Subtracting 32 from 96 gives 64, a value whose lower six bits are 0.)
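The byte-count rule can be sketched in Python as follows. The names encode_count and decode_count are hypothetical; the decoder masks to six bits so that both the space and the grave accent decode to zero.

```python
def encode_count(n):
    """Encode a line's byte count (0-45) as its leading character."""
    if not 0 <= n <= 45:
        raise ValueError("byte count must be in the range 0-45")
    # Zero would map to a space, so the grave accent is used instead.
    return "`" if n == 0 else chr(n + 32)


def decode_count(ch):
    """Recover the byte count; only the low six bits of (code - 32) matter."""
    return (ord(ch) - 32) & 0x3F


print(encode_count(45), encode_count(3), encode_count(0))
print(decode_count("M"), decode_count("`"))
```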
Each group of three bytes is encoded into four characters. The bytes are concatenated into a 24-bit value in big-endian order. (The first byte becomes the most significant 8 bits of the value.) The 24-bit value is then split into four groups of six bits each, also in big-endian order. (The most significant six bits become the first group.) Each group of six bits is then encoded into a character using the same calculation as for byte counts. (Since the range of values is from 0 to 63, when 32 is added the ASCII characters will lie in the range 32 (space) to 32 + 63 = 95 (underscore).) ASCII characters greater than 95 may also be used; however, after 32 is subtracted, only the six right-most bits are relevant.
Sometimes each data line has extra dummy characters (often the grave accent, ASCII 96) added to avoid problems with mailers that strip trailing spaces. These characters are ignored by uudecode. The grave accent can also be used in place of a space character.
As a complete file, the uuencoded output for "Cat" (the ASCII bytes representing the string) would be
begin 644 cat.txt
#0V%T
`
end
The begin line is a standard uuencode header; the '#' indicates that its line encodes three characters; the last two lines appear at the end of all uuencoded files.
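Reading the rules in reverse gives a decoder. The following is a minimal sketch, not the actual uudecode implementation: the function names are hypothetical, the header is only checked for the word "begin", and re-padding a line with spaces when a mailer has stripped them is an assumption of this sketch.

```python
def decode_char(ch):
    """Map one encoded character to its 6-bit value (grave accent and space give 0)."""
    return (ord(ch) - 32) & 0x3F


def decode_line(line):
    """Decode one data line; characters beyond the needed groups are ignored."""
    line = line.rstrip("\n")
    if not line:
        return b""
    count = decode_char(line[0])
    n_chars = 4 * ((count + 2) // 3)            # whole groups of four characters
    body = line[1:1 + n_chars].ljust(n_chars)   # assumption: restore stripped spaces
    out = bytearray()
    for i in range(0, n_chars, 4):
        a, b, c, d = (decode_char(ch) for ch in body[i:i + 4])
        value = (a << 18) | (b << 12) | (c << 6) | d   # rebuild the 24-bit value
        out += value.to_bytes(3, "big")
    return bytes(out[:count])                   # drop the zero padding bytes


def decode_file(text):
    """Decode the data lines between the header and the 'end' trailer."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("begin "):
        raise ValueError("missing uuencode header")
    data = bytearray()
    for line in lines[1:]:
        if line == "end":
            break
        data += decode_line(line)
    return bytes(data)


print(decode_file("begin 644 cat.txt\n#0V%T\n`\nend\n"))
```

Because only the first count-derived groups are read, a line such as "#0V%T``" with trailing dummy characters decodes to the same three bytes.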
Sample uuencoding
The encoding process is demonstrated by this table, which shows the derivation of the above encoding for "Cat".
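The derivation for "Cat" can be reproduced step by step. The sketch below uses a hypothetical function named derive that returns each intermediate stage for one three-byte group; it applies the plain add-32 rule and does not substitute the grave accent for a space.

```python
def derive(group):
    """Return the intermediate stages of encoding one 3-byte group."""
    if len(group) != 3:
        raise ValueError("a group is exactly three bytes")
    value = int.from_bytes(group, "big")                 # 24-bit big-endian value
    bits = format(value, "024b")
    sextets = [int(bits[i:i + 6], 2) for i in range(0, 24, 6)]
    codes = [s + 32 for s in sextets]                    # shift into printable ASCII
    chars = "".join(chr(c) for c in codes)
    return bits, sextets, codes, chars


bits, sextets, codes, chars = derive(b"Cat")
print("byte values :", list(b"Cat"))
print("24-bit value:", bits)
print("6-bit groups:", sextets)
print("plus 32     :", codes)
print("characters  :", chars)
```

For "Cat" the byte values 67, 97 and 116 yield the six-bit groups 16, 54, 5 and 52, which become the character codes 48, 86, 37 and 84, that is "0V%T".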
Uuencode table
The following table represents the subset of ASCII characters used by uuencode and the 6-bit binary strings they represent (in octal).
POSIX Base64 coding
Despite its limited range of characters, uuencoded data is sometimes mangled on passage through certain old computers. The worst offenders are computers using non-ASCII character sets such as EBCDIC. One attempt to fix the problem was the Xxencode format, which used only alphanumeric characters and the plus and minus symbols. More common today is the Base64 format; it can also be generated by the uuencode program. The header is changed to
begin-base64 <mode> <file>
(with the same permission and file-name fields as the ordinary begin line), the trailer becomes
====
and lines between are encoded with characters chosen from
ABCDEFGHIJKLMNOP
QRSTUVWXYZabcdef
ghijklmnopqrstuv
wxyz0123456789+/
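For comparison, Python's standard library can produce both encodings of the same bytes. The sketch below uses binascii.b2a_uu and base64.encodebytes, which are real standard-library functions; the wrapper frame_base64 is a hypothetical helper, the "====" trailer follows the POSIX uuencode specification, and the line length chosen by base64.encodebytes may differ from what a given uuencode program emits.

```python
import base64
import binascii


def frame_base64(data, mode, name):
    """Wrap data in a begin-base64 header and a '====' trailer (illustrative only)."""
    body = base64.encodebytes(data).decode("ascii")   # Base64 lines, newline-terminated
    return f"begin-base64 {mode:o} {name}\n{body}====\n"


data = b"Cat"
print(binascii.b2a_uu(data))      # one traditional uuencoded data line
print(base64.b64encode(data))     # the same bytes in Base64
print(frame_base64(data, 0o644, "cat.txt"), end="")
```

The same six-bit groups (16, 54, 5, 52) are used in both cases; only the alphabet differs, giving "0V%T" for uuencoding and "Q2F0" for Base64.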
Trivia
Microsoft's e-mail program Outlook Express once erroneously accepted "begin" followed by two spaces as the start of a uuencoded attachment (i.e., without requiring the octal Unix-style permissions). Especially on Usenet, where MIME is seldom used and plain text is preferred, some people would embed "begin", space, space in their messages in order to maliciously hide the rest of the message from Outlook Express users (e.g., they configured their news client to introduce quotations with the line "begin quote from xxx") [http://support.microsoft.com/default.aspx/kb/898124].
See also
* Base64
* BinHex
* MIME
* XXEncode
* yEnc
References
* [http://www.opengroup.org/onlinepubs/009695399/utilities/uuencode.html IEEE Std 1003.1 uuencode man page]
External links
* [http://www.gnu.org/software/sharutils/ GNU sharutils] - The Free Software Foundation's sharutils bundle includes uuencode, uudecode, and others.
* [http://www.fpx.de/fp/Software/UUDeview/ UUDeview] - open-source program to encode/decode Base64, BinHex, uuencode, xxencode, etc. for Unix/Windows/DOS
* [http://www.bastet.com/ UUENCODE-UUDECODE] - open-source program to encode/decode created by Clem "Grandad" Dye
* [http://www.stuartcheshire.org/StUU.html StUU] - open-source fast UUDecoder for Macintosh by Stuart Cheshire
Wikimedia Foundation. 2010.