Dirty data
For cache operation, see Dirty cache.
Dirty data is a term used by information technology (IT) professionals to refer to inaccurate information (data) collected from data capture forms. It is also used to refer to data that has not yet been committed to the database and is currently held in memory.
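The second sense can be illustrated with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as a convenient stand-in for "the database"; the table name, the function name and the sample row are hypothetical. A row written inside an open transaction is visible to the connection that wrote it, but not to another connection until it is committed.

```python
import os
import sqlite3
import tempfile

def uncommitted_visibility():
    """Return the row counts seen by (another connection before commit,
    the writing connection, the other connection after commit)."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")  # throwaway database file
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE customers (name TEXT)")
            writer.commit()

            # The INSERT implicitly opens a transaction; the row is pending.
            writer.execute("INSERT INTO customers VALUES ('Ada')")
            count = "SELECT COUNT(*) FROM customers"
            before = reader.execute(count).fetchall()[0][0]
            own = writer.execute(count).fetchall()[0][0]

            # After the commit the row becomes visible to everyone.
            writer.commit()
            after = reader.execute(count).fetchall()[0][0]
        finally:
            writer.close()
            reader.close()
    return before, own, after
```

Other database systems expose uncommitted data differently, so this is an illustration of the idea rather than a description of any particular product.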
Dirty data can be misleading, incorrect, inconsistently formatted, incorrectly spelled or punctuated, entered into the wrong field, or duplicated. It can be prevented using input masks or validation rules, but completely removing such data from a source can be impossible or impractical.
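A validation rule of the kind mentioned above might look like the following minimal sketch. The field names, the patterns and the validate function are assumptions chosen for illustration, not rules taken from any particular system.

```python
import re

# Hypothetical rules for a sign-up form; fields and patterns are illustrative.
RULES = {
    "name": re.compile(r"[A-Za-z][A-Za-z' -]*"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "age": re.compile(r"\d{1,3}"),
}

def validate(record):
    """Return a list of (field, problem) pairs; an empty list means no rule failed."""
    problems = []
    for field, pattern in RULES.items():
        value = record.get(field, "").strip()
        if not value:
            problems.append((field, "missing"))
        elif not pattern.fullmatch(value):
            problems.append((field, "bad format"))
    return problems
```

For example, a record whose name and age were entered into each other's fields, such as {"name": "36", "email": "ada@example.org", "age": "Ada"}, fails the name and age rules. A well-formed but fictional record passes, which is one reason such rules cannot remove dirty data completely.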
There are several causes of dirty data. In some cases, the information is deliberately distorted: a person may enter misleading or fictional personal information that appears real. Such dirty data may not be picked up by an administrator or a validation routine because it appears legitimate. Duplicate data can be caused by repeat submissions, user error or incorrect data joining. There can also be formatting issues or typographical errors. A common formatting issue arises from variation in how users prefer to enter phone numbers.
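The phone number and duplicate problems can be sketched together. The example below assumes records are dictionaries with hypothetical "name" and "phone" keys, and it uses a deliberately simple normalization (keep digits only) that would not reconcile numbers written with and without a country code.

```python
import re

def normalize_phone(raw):
    """Keep digits only, so '(555) 010-9999' and '555.010.9999' compare equal."""
    return re.sub(r"\D", "", raw)

def find_duplicates(records):
    """Group records by (lowercased name, normalized phone) and
    return only the groups that contain more than one record."""
    groups = {}
    for record in records:
        key = (record["name"].strip().lower(), normalize_phone(record["phone"]))
        groups.setdefault(key, []).append(record)
    return [group for group in groups.values() if len(group) > 1]
```

Normalizing before comparing lets a repeat submission be detected even when the second entry uses a different phone format or different capitalization.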
Wikimedia Foundation. 2010.