Data corruption

Photo data corruption; in this case, a result of a failed data recovery from a hard disk drive

Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Computer storage and transmission systems use a number of measures to provide data integrity, or lack of errors.

In general, when data corruption occurs, the file containing that data may become inaccessible, and the system or the related application will report an error. For example, if a Microsoft Word file is corrupted, attempting to open it in MS Word produces an error message and the file does not open. Some programs can offer to repair the file automatically after the error, while others cannot; whether repair is possible depends on the extent of the corruption and on the application's built-in error handling. Corruption has various causes.

Transmission

Data corruption during transmission has a variety of causes. Interruption of data transmission causes information loss. Environmental conditions can interfere with data transmission, especially when dealing with wireless transmission methods. Heavy clouds can block satellite transmissions. Wireless networks are susceptible to interference from devices such as microwave ovens.

Storage

Data loss during storage has two broad causes: hardware and software failure. Background radiation, head crashes, and aging or wear of the storage device fall into the former category, while software failure typically occurs due to bugs in the code.

Error detection and correction may occur in the hardware, in the disk subsystem or adapter, or in software that implements error checking and correction (e.g., RAID software such as mdadm for Linux).

There are two types of data loss:

  • Undetected: also known as "silent corruption". These problems have been attributed to errors during the write process to disk. They are the most dangerous errors, as there is no indication that the data is incorrect.
  • Detected: these errors are most often caused by disk drive problems. Errors may be either permanent or temporary; temporary errors can be overcome when the hardware repeats the operation. Errors are normally detected by the hardware: either by the disk drive, which checks the data read from the disk against the ECC/CRC error correcting code stored alongside the data on disk, or, in the case of a RAID array, by comparing the contents of the RAID strips with the ECC checksum or parity of the RAID stripe.

Countermeasures

When data corruption behaves as a Poisson process, where each bit of data has an independently low probability of being changed, data corruption can generally be detected by the use of checksums, and can often be corrected by the use of error correcting codes.
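The difference between detecting and correcting an error can be illustrated with a minimal Python sketch. It is not the scheme used by any particular disk or network: the function names are hypothetical, CRC-32 (from the standard `zlib` module) stands in for "a checksum", and the Hamming(7,4) code stands in for "an error correcting code". The checksum only reveals that something changed, while the Hamming code can also locate and repair a single flipped bit in a 7-bit codeword.

```python
import zlib


def checksum_matches(data: bytes, stored_crc: int) -> bool:
    """Detection only: recompute CRC-32 and compare with the stored value."""
    return zlib.crc32(data) == stored_crc


def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    original = b"account balance: 100"
    crc = zlib.crc32(original)
    damaged = bytes([original[0] ^ 0x01]) + original[1:]  # flip one bit
    print(checksum_matches(original, crc), checksum_matches(damaged, crc))

    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single-bit error
    print(hamming74_decode(word))
```

Hamming(7,4) corrects only one error per codeword; two flipped bits in the same codeword are miscorrected, which is why this approach fits the case where each bit has an independently low probability of being changed.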

If an uncorrectable data corruption is detected, procedures such as automatic retransmission or restoration from backups can be applied. Certain levels of RAID disk arrays have the ability to store and evaluate parity bits for data across a set of hard disks and can reconstruct corrupted data upon the failure of a single or multiple disks, depending on the level of RAID implemented.
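As an illustration of parity-based reconstruction, the following Python sketch models a single stripe with XOR parity, in the style of single-parity RAID levels. It is a simplification under stated assumptions: blocks are equal-length byte strings held in memory, exactly one block is known to be lost, and the function names are hypothetical rather than taken from any RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"ABCD", b"EFGH", b"IJKL"]  # one block per disk
    parity = xor_blocks(data)           # stored on the parity disk

    # Simulate losing the second disk and rebuilding its block.
    recovered = rebuild_missing([data[0], data[2]], parity)
    print(recovered == data[1])
```

XOR parity can rebuild only one missing block per stripe, and only when it is known which block is missing; tolerating more failures requires additional redundancy, as in RAID levels that store more than one parity block.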

Today, many errors are detected and corrected by the disk drive using the ECC/CRC codes[1] which are stored on disk for each sector. If the disk drive detects multiple read errors on a sector, it may copy the failing sector to another part of the disk, remapping the failed sector to a spare sector without the involvement of the operating system (though this may be delayed until the next write to the sector).

This "silent correction" can lead to other problems if disk storage is not managed well, as the disk drive will continue to remap sectors until it runs out of spares, at which point the temporary, correctable errors can turn into permanent ones as the disk drive deteriorates. S.M.A.R.T. provides a standardized way of monitoring the health of a disk drive, and there are tools available for most operating systems to automatically check the disk drive for impending failures by watching for deteriorating S.M.A.R.T. parameters.

"Data scrubbing" is another method to reduce the likelihood of data corruption, as disk errors are caught and recovered from, before multiple errors accumulate and overwhelm the number of parity bits. Instead of parity being checked on each read, the parity is checked during a regular scan of the disk, often done as a low priority background process. Note that the "data scrubbing" operation activates a parity check. If a user simply runs a normal program that reads data from the disk, then the parity would not be checked unless parity-check-on-read was both supported and enabled on the disk subsystem.
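A scrubbing pass can be sketched in Python as follows. This is a toy model, not how any real disk subsystem is implemented: it assumes each stripe is a dictionary holding its data blocks, a CRC-32 recorded per block at write time, and one XOR parity block, and it assumes the parity block itself is intact. The per-block checksum is an added assumption that lets the scrubber tell which block is bad; parity alone would only show that the stripe is inconsistent. All names are hypothetical.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(stripes):
    """Scan every stripe once and repair a single bad block per stripe in place.

    Each stripe is a dict with keys "blocks" (list of bytes), "crcs"
    (CRC-32 of each block as written) and "parity" (XOR of the blocks).
    Returns a list of (stripe_index, block_index) pairs that were repaired.
    """
    repaired = []
    for s_idx, stripe in enumerate(stripes):
        bad = [
            i
            for i, (block, crc) in enumerate(zip(stripe["blocks"], stripe["crcs"]))
            if zlib.crc32(block) != crc
        ]
        if len(bad) == 1:
            i = bad[0]
            others = [b for j, b in enumerate(stripe["blocks"]) if j != i]
            stripe["blocks"][i] = xor_blocks(others + [stripe["parity"]])
            repaired.append((s_idx, i))
        elif len(bad) > 1:
            # More errors than a single parity block can recover.
            raise ValueError(f"stripe {s_idx}: {len(bad)} bad blocks, cannot repair")
    return repaired


if __name__ == "__main__":
    blocks = [b"ABCD", b"EFGH", b"IJKL"]
    stripe = {
        "blocks": list(blocks),
        "crcs": [zlib.crc32(b) for b in blocks],
        "parity": xor_blocks(blocks),
    }
    stripe["blocks"][1] = b"EFGX"  # simulate silent corruption
    print(scrub([stripe]), stripe["blocks"][1])
```

The `ValueError` branch corresponds to the situation scrubbing is meant to prevent: if errors are left to accumulate until two blocks in one stripe are bad, a single parity block can no longer recover them.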

If appropriate mechanisms are employed to detect and remedy data corruption, data integrity can be maintained. This is particularly important in commercial applications (e.g. banking), where an undetected error could either corrupt a database index or change data to drastically affect an account balance, and in the use of encrypted or compressed data, where a small error can make an extensive dataset unusable.[2] While the study by CERN has often been cited as showing high levels of data corruption, the disk subsystem that was the subject of the paper was set up with RAID5 and a single parity bit (and hence could not recover from a single "silent" error), did not use parity-check-on-read (and hence could not detect "silent" errors through parity checking of the RAID stripe), and did not use data scrubbing. The disk storage was also subject to a microcode software bug that caused higher levels of errors than normal.[3]

References

  1. ^ "Read Error Severities and Error Management Logic". http://www.storagereview.com/guide/errorRead.html. Retrieved 24 July 2011. 
  2. ^ "Data Integrity". CERN. April 2007. Cern.ch.
  3. ^ Bernd Panzer-Steindel. "Data integrity". http://indico.cern.ch/getFile.py/access?contribId=3&sessionId=0&resId=1&materialId=paper&confId=13797. There are some correlations with known problems, like the problem where disks drop out of the RAID5 system on the 3ware controllers. After some long discussions with 3Ware and our hardware vendors this was identified as a problem in the WD disk firmware.

Wikimedia Foundation. 2010.
