Noisy channel model

The noisy channel model is a framework used in spell checkers, question answering, speech recognition, and machine translation. In this model, the goal is to recover the intended word from a received word whose letters have been scrambled in some manner.


Definition

Given an alphabet $\Sigma$, let $\Sigma^*$ be the set of all finite strings over $\Sigma$. Let the dictionary $D$ of valid words be some subset of $\Sigma^*$, i.e., $D\subseteq\Sigma^*$.

The noisy channel is the matrix

$\Gamma_{ws} = \Pr(s \mid w)$

where $w\in D$ is the intended word and $s\in\Sigma^*$ is the scrambled word that was actually received.
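As a rough illustration of the definition, the matrix can be stored as a nested mapping in which each row, indexed by an intended word $w$, is a probability distribution over received strings $s$. The words and probabilities below are invented for the example, and the name `is_row_stochastic` is hypothetical.

```python
# Toy noisy channel stored as a nested mapping: channel[w][s] = Pr(s | w).
# The words and probabilities are made up purely for illustration.
channel = {
    "letter": {"letter": 0.90, "leter": 0.06, "lettr": 0.04},
    "later": {"later": 0.92, "leter": 0.05, "latter": 0.03},
}


def is_row_stochastic(gamma, tol=1e-9):
    """Check that every row is a probability distribution over received strings."""
    return all(
        all(p >= 0.0 for p in row.values())
        and abs(sum(row.values()) - 1.0) <= tol
        for row in gamma.values()
    )


print(is_row_stochastic(channel))
print(channel["letter"]["leter"])  # Pr(s = "leter" | w = "letter")
```

Only the rows need to sum to one; a column, which collects the probabilities of one received string under different intended words, generally does not.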

Example

Consider the English alphabet $\Sigma = \{a, b, c, \ldots, y, z, A, B, \ldots, Z, \ldots\}$. Some subset $D\subseteq\Sigma^*$ makes up the dictionary of valid English words.

There are several mistakes that may occur while typing, including:

Missing letters, e.g., leter instead of letter
Accidental letter additions, e.g., misstake instead of mistake
Swapping letters, e.g., recieved instead of received
Replacing letters, e.g., fimite instead of finite

To construct the noisy channel matrix $\Gamma$, we must consider the probability of each mistake given the intended word, that is, $\Pr(s \mid w)$ for all $w\in D$ and $s\in\Sigma^*$. These probabilities may be estimated, for example, by considering the Levenshtein distance between $s$ and $w$ or by comparing the draft of an essay with a version that has been manually edited for spelling.
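One way the Levenshtein-based approach might look is sketched below. The distance function is the standard dynamic-programming algorithm. The mapping from distance to probability, an exponential decay normalized over a finite candidate list, is an assumption made for illustration and not a model taken from the article; `channel_row` and `decay` are hypothetical names.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def channel_row(w, candidates, decay=2.0):
    """Assumed model: Pr(s | w) proportional to exp(-decay * distance).

    Normalizing over a finite candidate list is a simplification,
    since the set of all strings over the alphabet is infinite.
    """
    weights = {s: math.exp(-decay * levenshtein(s, w)) for s in candidates}
    total = sum(weights.values())
    return {s: wt / total for s, wt in weights.items()}


print(levenshtein("leter", "letter"))       # 1
print(levenshtein("recieved", "received"))  # 2
print(channel_row("letter", ["letter", "leter", "later"]))
```

Plain Levenshtein distance counts a swap of two adjacent letters as two edits, which is why "recieved" is at distance 2 from "received" while the other example mistakes are at distance 1.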

Error-correction

The goal of the noisy channel model is to find the intended word given the scrambled word that was received. The decision function $\sigma : \Sigma^* \to D$ is a function that, given a scrambled word $s$, returns an estimate of the intended word $w$.

Methods of constructing a decision function include the maximum likelihood rule, the maximum a posteriori rule, and the minimum distance rule.
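The three rules can be sketched as follows. The maximum likelihood rule picks the $w$ that maximizes $\Pr(s \mid w)$, the maximum a posteriori rule picks the $w$ that maximizes $\Pr(w)\Pr(s \mid w)$, and the minimum distance rule picks the $w$ closest to $s$ under some distance. The prior, the channel values, and the crude `mismatch_distance` are all hypothetical choices for this example.

```python
# Hypothetical toy data: prior[w] = Pr(w), channel[w][s] = Pr(s | w).
prior = {"letter": 0.2, "later": 0.8}
channel = {
    "letter": {"letter": 0.90, "leter": 0.08, "later": 0.02},
    "later": {"later": 0.95, "leter": 0.03, "latter": 0.02},
}


def likelihood(s, w):
    """Pr(s | w), taken as zero for strings missing from the toy table."""
    return channel[w].get(s, 0.0)


def maximum_likelihood(s):
    """Choose the dictionary word w that maximizes Pr(s | w)."""
    return max(channel, key=lambda w: likelihood(s, w))


def maximum_a_posteriori(s):
    """Choose w maximizing Pr(w) * Pr(s | w), which is proportional to Pr(w | s)."""
    return max(channel, key=lambda w: prior[w] * likelihood(s, w))


def minimum_distance(s, dist):
    """Choose the dictionary word w that minimizes dist(s, w)."""
    return min(channel, key=lambda w: dist(s, w))


def mismatch_distance(a, b):
    """Crude stand-in distance: positional mismatches plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


print(maximum_likelihood("leter"))    # letter
print(maximum_a_posteriori("leter"))  # later
print(minimum_distance("leter", mismatch_distance))
```

With these made-up numbers the two probabilistic rules disagree on "leter": the channel alone favors "letter", while the larger prior on "later" tips the maximum a posteriori rule the other way.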

In some cases, it may be better to accept the scrambled word as the intended word rather than attempt to find an intended word in the dictionary. For example, the word schönfinkeling may not be in the dictionary, but might in fact be the intended word.
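A minimal sketch of such a fallback is shown below. It swaps in a similarity cutoff from Python's standard `difflib` module in place of a probabilistic model, so it illustrates only the accept-or-correct decision and not the noisy channel itself. The function name `correct_or_keep`, the small dictionary, and the cutoff of 0.8 are assumptions.

```python
import difflib


def correct_or_keep(s, dictionary, cutoff=0.8):
    """Return a close dictionary word, or s itself when nothing is close enough."""
    if s in dictionary:
        return s
    # get_close_matches returns at most n words whose similarity ratio >= cutoff.
    matches = difflib.get_close_matches(s, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else s


words = ["letter", "mistake", "received", "finite"]
print(correct_or_keep("leter", words))
print(correct_or_keep("schönfinkeling", words))
```

Raising the cutoff makes the function more conservative, so more unknown words pass through unchanged; lowering it makes correction more aggressive.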

References

Brill, Eric; Moore, Robert C. (Jan 2000). "An Improved Error Model for Noisy Channel Spelling Correction". Proceedings of ACL 2000. https://research.microsoft.com/apps/pubs/default.aspx?id=66833.


Wikimedia Foundation. 2010.
