In cryptography, plaintext is information a sender wishes to transmit to a receiver. Cleartext is often used as a synonym, although the two terms differ in emphasis: plaintext refers to the input on which a cryptographic algorithm, usually an encryption algorithm, operates, whereas cleartext refers to data that is transmitted or stored unencrypted (that is, 'in the clear'). Before the computer era, plaintext most commonly meant message text in the language of the communicating parties.
Since computers became commonly available, the definition has encompassed not only electronic representations of traditional text, for instance messages (e.g., email) and document content (e.g., word processor files), but also computer representations of sound (e.g., speech or music), images (e.g., photos or videos), ATM and credit card transaction information, sensor data, and so forth. Few of these are directly meaningful to humans, having already been transformed into computer-manipulable forms. Any information that the communicating parties wish to conceal from others can now be treated, and referred to, as plaintext. Thus, in a significant sense, plaintext is the 'normal' representation of data before any action has been taken to conceal, compress, or 'digest' it. It need not represent text, and even if it does, the text may not be "plain".
Plaintext is used as input to an encryption algorithm; the output is usually termed ciphertext, particularly when the algorithm is a cipher. The term codetext is used less often, and almost always only when the algorithm involved is actually a code. In some systems, however, multiple layers of encryption are used, in which case the output of one encryption algorithm becomes the plaintext input for the next.
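The layering described above can be illustrated with a minimal Python sketch. The cipher here is a toy invented for this example (an XOR with a stream derived from SHA-256); it is not a vetted construction and the names `keystream` and `toy_encrypt` are hypothetical. The point is only that the ciphertext of the inner layer serves as the plaintext of the outer layer.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` bytes from `key` (toy construction, NOT secure)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a key-derived stream; the same call also decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

message = b"attack at dawn"                    # original plaintext
inner = toy_encrypt(b"inner key", message)     # ciphertext of layer 1
outer = toy_encrypt(b"outer key", inner)       # layer 2 treats `inner` as plaintext

# Decryption peels the layers off in reverse order.
recovered = toy_encrypt(b"inner key", toy_encrypt(b"outer key", outer))
```

From the point of view of the outer layer, `inner` is simply its input, so it is "plaintext" for that layer even though it is unreadable.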
Secure handling of plaintext
In a cryptosystem, weaknesses can be introduced through insecure handling of plaintext, allowing an attacker to bypass the cryptography altogether. Plaintext is vulnerable in use and in storage, whether in electronic or paper format. Physical security deals with methods of securing information and its storage media from local, physical attacks. For instance, an attacker might enter a poorly secured building and attempt to open locked desk drawers or safes. An attacker can also engage in dumpster diving, and may be able to reconstruct shredded information if it is sufficiently valuable to be worth the effort. One countermeasure is to burn or thoroughly crosscut shred discarded printed plaintexts or storage media; the NSA is known for its stringent disposal security precautions.
If plaintext is stored in a computer file, the storage media, along with the entire computer and its components, must be secure. This includes backup files generated automatically during program execution, even if they are invisible to the user. Sensitive data is sometimes processed on computers whose mass storage is removable, in which case the physical security of the removed disk is separately vital. To be useful (as opposed to handwaving), the security of a computer must be physical (e.g., against burglary, brazen removal under cover of supposed repair, or installation of covert monitoring devices) as well as virtual (e.g., against operating system modification, illicit network access, or Trojan programs). The wide availability of keydrives (USB flash drives), which can plug into most modern computers and store large quantities of data, poses another severe security headache. A spy (perhaps posing as a cleaning person) could easily conceal one, and even swallow it if necessary.
Discarded computers, disk drives and media are also a potential source of plaintexts. Most operating systems do not actually erase anything; they simply mark the disk space occupied by a deleted file as 'available for use' and remove its entry from the file system directory. The information in a file deleted in this way remains fully present until it is overwritten at some later time, when the operating system reuses the disk space. With even low-end computers commonly sold with many gigabytes of disk space, and capacities still rising, this 'later time' may be months later, or never. Even overwriting the portion of a disk surface occupied by a deleted file is insufficient in many cases. Peter Gutmann of the University of Auckland wrote a celebrated 1996 paper on the recovery of overwritten information from magnetic disks; areal storage densities have become much higher since then, so this sort of recovery is likely to be more difficult than it was when Gutmann wrote.
Independently of this, modern hard drives automatically remap sectors that are starting to fail; the sectors taken out of use still contain information that is entirely invisible to the file system (and to all software that relies on it for access to disk data), but is nonetheless present on the physical drive platter. It may, of course, be sensitive plaintext. Some government agencies (e.g., the NSA) require that all disk drives be physically pulverized when they are discarded and, in some cases, chemically treated with corrosives before or after. This practice is not widespread outside government, however. For example, Garfinkel and Shelat (2003) analyzed 158 second-hand hard drives acquired at garage sales and the like, and found that fewer than 10% had been sufficiently sanitized. A wide variety of personal and confidential information was readable from the others. See data remanence.
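As an illustration of the difference between deleting a file and overwriting it, the following Python sketch overwrites a file's contents in place before removing its directory entry. The function name `overwrite_and_delete` and the `passes` parameter are hypothetical, chosen for this example. It is a best-effort measure only: as noted above, it cannot reach remapped sectors, and it makes no guarantee on file systems or devices that write new data to a different physical location (for example journaling or copy-on-write file systems, or flash storage with wear leveling), nor does it touch backups.

```python
import os

def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Best effort: overwrite a file's bytes in place, then unlink it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:              # open for in-place update
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1 << 20)   # write at most 1 MiB at a time
                f.write(os.urandom(chunk))        # replace contents with random bytes
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())              # ask the OS to push the data to the device
    os.remove(path)                           # only now drop the directory entry
```

A plain `os.remove(path)` on its own corresponds to the ordinary deletion described above: the directory entry disappears, but the old bytes stay on the medium until the space happens to be reused.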
Laptop computers are a special problem. The US State Department, the British Secret Service, and the US Department of Defense have all had laptops containing secret information, some perhaps in plaintext form, 'vanish' in recent years. Announcements of similar losses are becoming a common item in news reports. Disk encryption techniques can provide protection against such loss or theft, if properly chosen and used.
On occasion, even when the data on the host systems is itself encrypted, the media used to transfer data between those systems are nevertheless left in plaintext because of poorly designed data policy. A case in point is an incident in October 2007 in which HM Revenue and Customs lost CDs containing the records of 25 million child benefit recipients in the United Kingdom, the data apparently being entirely unencrypted.
Modern cryptographic systems are designed to resist known-plaintext and even chosen-plaintext attacks, and so may not be entirely compromised when plaintext is lost or stolen. Older systems used techniques such as padding and Russian copulation to obscure information in plaintext that could be easily guessed, and to limit the effect that a loss of plaintext would have on the security of the cryptosystem.
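Russian copulation can be sketched in a few lines of Python. The idea, as generally described, is to cut the message at an arbitrary point and transmit the second part first, so that a predictable opening or closing phrase ends up somewhere in the middle. The marker character and the function names below are assumptions made for this example, not a historical convention.

```python
import secrets

MARKER = "|"  # hypothetical join marker; must not occur in the message itself

def copulate(message: str) -> str:
    """Cut the message at a random interior point and swap the two parts."""
    if MARKER in message or len(message) < 2:
        raise ValueError("message too short or contains the marker")
    cut = 1 + secrets.randbelow(len(message) - 1)   # cut position in 1 .. len-1
    return message[cut:] + MARKER + message[:cut]

def uncopulate(text: str) -> str:
    """Undo copulate(): the true start of the message follows the marker."""
    tail, head = text.split(MARKER, 1)
    return head + tail

original = "DEAR SIR STOP WEATHER REPORT FOLLOWS"
scrambled = copulate(original)    # the stereotyped "DEAR SIR" no longer leads
restored = uncopulate(scrambled)
```

The rearranged text would then be encrypted as usual; the receiver decrypts it and uses the marker to restore the original order.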
- S. Garfinkel and A. Shelat, "Remembrance of Data Passed: A Study of Disk Sanitization Practices", IEEE Security and Privacy, January/February 2003 (PDF).
- UK HM Revenue and Customs loses 25m records of child benefit recipients, BBC.
- R. Kissel (editor), NIST IR 7298 Revision 1, Glossary of Key Information Security Terms, National Institute of Standards and Technology, February 2011 (PDF).
Wikimedia Foundation. 2010.