Noisy text

Noisy text

Noise in text can be defined as any kind of difference between the surface form of a coded representation of the text[disambiguation needed ] and the intended, correct, or original text.

Language usage over computer mediated discourses, like chats, emails and SMS texts, significantly differs from the standard form of the language. An urge towards shorter message length facilitating faster typing and the need for semantic clarity, shape the structure of this text used in such discourses.

Gartner estimates that unstructured data constitutes 80% of the whole enterprise data. A huge proportion of this unstructured data comprises chat transcripts, emails and other informal and semi-formal internal and external communications.

Usually such text is meant for human consumption. However, now with huge amounts of such text being present, both online and within the enterprise, it is important to mine such text using computers.

Techniques for correction

There are many spell checkers and grammar checkers available today. Many word processors like MS Word include this in the editing tool. Online, Google in its search interface tries to include a correction engine to guide users when they make mistakes with their queries.


Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать реферат

Look at other dictionaries:

  • Noisy text analytics — is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While Text analytics is a growing and mature field that has great value because of the huge… …   Wikipedia

  • Text analytics — The term text analytics describes a set of linguistic, lexical, pattern recognition,extraction, tagging/structuring, visualization, and predictive techniques. The termalso describes processes that apply these techniques, whether independently or… …   Wikipedia

  • Text over IP — (or ToIP) is a means of providing a real time text service that operates over IP based networks. It complements Voice over IP (VoIP) and Video over IP. Real time text is defined in ITU T Multimedia Recommendation F.700 2.1.2.1 . Real time text is …   Wikipedia

  • Noisy-channel coding theorem — In information theory, the noisy channel coding theorem (sometimes Shannon s theorem), establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data (digital information)… …   Wikipedia

  • Unstructured data — (or unstructured information) refers to (usually) computerized information that either does not have a data structure or has one that is not easily usable by a a computer program. The term distinguishes such information from data stored in… …   Wikipedia

  • Question answering — (QA) is a type of information retrieval. Given a collection of documents (such as the World Wide Web or a local collection) the system should be able to retrieve answers to questions posed in natural language. QA is regarded as requiring more… …   Wikipedia

  • Sentiment analysis — or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials. Generally speaking, sentiment analysis aims to determine …   Wikipedia

  • Concept Search — A concept search (or conceptual search) is an automated information retrieval method that is used to search electronically stored unstructured text (for example, digital archives, email, scientific literature, etc.) for information that is… …   Wikipedia

  • Natural language user interface — Natural Language User Interfaces (LUI) are a type of computer human interface where linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications. In interface… …   Wikipedia

  • List of mathematics articles (N) — NOTOC N N body problem N category N category number N connected space N dimensional sequential move puzzles N dimensional space N huge cardinal N jet N Mahlo cardinal N monoid N player game N set N skeleton N sphere N! conjecture Nabla symbol… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”