Pumping lemma for regular languages

Pumping lemma for regular languages

In the theory of formal languages, the pumping lemma for regular languages describes an essential property of all regular languages. Informally, it says that all sufficiently long words in a regular language may be "pumped" - that is, have a middle section of the word repeated an arbitrary number of times - to produce a new word which also lies within the same language.

The pumping lemma was first articulated by Y. Bar-Hillel, M. Perles, E. Shamir in 1961. [Y. Bar-Hillel, M. Perles, E. Shamir, "On formal properties of simple phrase structure grammars", "Zeitschrift für Phonetik, Sprachweissenshaft und Kommunikationsforschung" 14 (1961) pp. 143-172. ] It is useful for disproving the regularity of a specific language in question.

Formal statement

Let "L" be a regular language. Then there exists an integer "p" ≥ 1 depending only on "L" such that every string "w" in "L" oflength at least "p" ("p" is called the "pumping length") can be written as "w" = "xyz" (i.e., "w" can be divided into three substrings), satisfying the following conditions:

# |"y"| > 0
# |"xy"| ≤ "p"
# for all "i" ≥ 0, "xy"i"z" ∈ "L"

"y" is the substring that can be pumped (removed or repeated any number of times, and the resulting string is always in "L"). (1) means the loop "y" to be pumped must be of length at least one; (2) means the loop must occur within the first "p" characters. There is no restriction on "x" and "z".

Informal statement and explanation

The pumping lemma describes an essential property of regular languages. It says that a word "s" of the language "L" with length of at least "p", (where "p" is a constant, called the pumping length, depending only on the language "L") may be split into three substrings, s = xyz, such that the middle portion, "y" (which must not be empty), can be repeated an arbitrary number of times (including zero times, which removes it) yielding a string that is still in "L". This process of repetition is known as "pumping". Moreover, the pumping lemma guarantees that the length of "xy" will be at most "p", imposing a limit on the ways in which "s" may be split. Note that finite languages satisfy the pumping lemma trivially by having "p" equal to the maximum string length in "L" plus one.

Use of lemma

The pumping lemma is often used to prove that a particular language is non-regular: a proof by contradiction (of the language's regularity) may consist of exhibiting a word (of the required length) in the language which lacks the property outlined in the pumping lemma.

For example the language "L" = {"anbn" : "n" ≥ 0} over the alphabet Σ = {"a", "b"} can be shown to be non-regular as follows. Let "w", "x", "y", "z", "p", and "i" be as stated in the pumping lemma above. Let "w" in "L" be given by "w" = "apbp". By the pumping lemma, there must be some decomposition "w" = "xyz" with |"xy"| ≤ "p", |"y"| ≥ 1 such that"xyiz" in L for every "i" ≥ 0. If we let |"xy"|="p" and |"z"|="p", then "xy" is the first half of "w", or all "p" of the "a"s. Because |"y"| ≥ 1, it consists of a non-zero number of "a"s, and "xy2z" has more "a"s than "b"s and is therefore not in "L" (note that any value of "i" ≠ 1 will give us a contradiction). We have reached a contradiction because, in this case, the pumped word does not belong to the language "L". Therefore, the assumption that "L" is regular must be incorrect. Hence "L" is not regular.

The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows the same idea. Given "p", there is a string of balanced parentheses that begins with more than "p" left parentheses, so that "y" will consist entirely of left parentheses. By repeating "y", we can produce a string that does not contain the same number of left and right parentheses, and so they cannot be balanced.

Proof of the pumping lemma

For every regular language there is a finite state automaton (FSA) which accepts the language. If the regular language is finite, then the result is immediate since we may choose the pumping length "p" to be greater than the length of the longest word in the language: in this case there are no strings in the language greater than the pumping length "p", and so the assertions made against such words are vacuously true.

If the regular language is infinite, then there must be some minimal FSA that accepts the language. The number of states in this FSA are counted, and then, that count is used as the pumping length "p". If the string's length is longer than "p", then there must be at least one state that is repeated (which we will call state "S"). The transitions that take the machine from state "S" back to state "S" match some string. This string is called "y" in the lemma, and since the machine will match a string without the "y" portion, or the string "y" can be repeated, the conditions of the lemma are satisfied.

This makes more sense with an example. The following image shows an FSA.

The FSA accepts the string: abcbcd. Since this is longer than the number of states, there are repeated states by the pigeonhole principle. In this example, the repeated states are states q1 and q2. Since the substring bcbc of string abcbcd takes the machine through transitions that start at state q1 and end at state q1, the string portion could be repeated and the FSA would still accept giving the string abcbcbcbcd, and the string portion could be removed and the FSA would still accept giving the string ad. In terms of the pumping lemma, the string abcbcd is broken into a "x" portion a, a "y" portion bc and a "z" portion bcd. (Note, it could be broken in different ways such as a "x" portion a , a "y" portion bcbc and a "z" portion d and all the conditions hold except that |"xy"| ≤ "p").

Lemma not sufficient

Note that the pumping lemma does not give a "sufficient" condition for a language to be regular. That is, the lemma holds for some non-regular languages. If the lemma does not hold for a language "L", then "L" is not regular. If the lemma does hold for a language "L", "L" might be regular.

For example, the language "{uuRv : u,v in {0,1}+}" (strings over the alphabet "{0,1}" consisting of a nonempty even palindrome followed by another nonempty string) is not regular but can still be "pumped" with "p" = 4. Suppose "w=uuRv" has length at least 4.
* If "u" has length 1, then |"v"| ≥ 2 and we can take "y" to be the first character in "v", thus not affecting the palindromic requirement.
* Otherwise, take "y" to be the first character of "u" and note that "yk" for "k" ≥ 2 starts with the nonempty palindrome "yy", so it's still in the language.

For a practical test that exactly characterizes regular languages, see the Myhill-Nerode theorem. The typical method for proving that a language is regular is to construct either a finite state machine or a regular expression for the language.

General version of pumping lemma for regular languages

If a language "L" is regular, then there exists a number "p" > 0 (the pumping length) such that every string "uwv" in "L" with |"w"| ≥ "p" can be written in the form :"uwv" = "uxyzv"with strings "x", "y" and "z" such that |"xy"| ≤ "p", |"y"| > 0 and:"uxyizv" is in "L" for every integer "i" ≥ 0.

This version can be used to prove many more languages are non-regular, since it imposes stricter requirements on the language.

See also

* Pumping lemma for context-free languages
* Formal languages
* Ogden's lemma

References

* Section 1.4: Nonregular Languages, pp.77–83.
* John E. Hopcroft and Jeffrey D. Ullman, "Introduction to Automata Theory, Languages and Computation", Addison-Wesley Publishing, Reading Massachusetts, 1979. ISBN 0-201-029880-X. "(See chapter 3.)"


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Pumping lemma for context-free languages — The pumping lemma for context free languages, also known as the Bar Hillel lemma, is a lemma that gives a property that all context free languages have. Its primary use is to prove a language is not context free.The pumping lemma for context free …   Wikipedia

  • Pumping-Lemma — Das Pumping Lemma bzw. Pumplemma (auch Schleifensatz genannt) beschreibt in der theoretischen Informatik eine Eigenschaft bestimmter Klassen formaler Sprachen. In vielen Fällen lässt sich anhand des Lemmas nachweisen, dass eine formale Sprache… …   Deutsch Wikipedia

  • Pumping Lemma — Das Pumping Lemma bzw. Pumplemma beschreibt in der theoretischen Informatik eine Eigenschaft bestimmter Klassen formaler Sprachen. In vielen Fällen lässt sich anhand des Lemmas nachweisen, dass eine formale Sprache nicht regulär bzw. nicht… …   Deutsch Wikipedia

  • Pumping lemma — In the theory of formal languages in computability theory, a pumping lemma states that any infinite language of a given class can be pumped and still belong to that class. A language can be pumped if any sufficiently long string in the language… …   Wikipedia

  • Regular language — In theoretical computer science, a regular language is a formal language (i.e., a possibly infinite set of finite sequences of symbols from a finite alphabet) that satisfies the following equivalent properties: * it can be accepted by a… …   Wikipedia

  • Ogden's lemma — In the theory of formal languages, Ogden s lemma (named after William F. Ogden) provides an extension of flexibility over the pumping lemma for context free languages. Ogden s lemma states that if a language L is context free, then there exists… …   Wikipedia

  • PumpingLemma — Das Pumping Lemma bzw. Pumplemma beschreibt in der theoretischen Informatik eine Eigenschaft bestimmter Klassen formaler Sprachen. In vielen Fällen lässt sich anhand des Lemmas nachweisen, dass eine formale Sprache nicht regulär bzw. nicht… …   Deutsch Wikipedia

  • Pumpinglemma — Das Pumping Lemma bzw. Pumplemma beschreibt in der theoretischen Informatik eine Eigenschaft bestimmter Klassen formaler Sprachen. In vielen Fällen lässt sich anhand des Lemmas nachweisen, dass eine formale Sprache nicht regulär bzw. nicht… …   Deutsch Wikipedia

  • Pumplemma — Das Pumping Lemma bzw. Pumplemma beschreibt in der theoretischen Informatik eine Eigenschaft bestimmter Klassen formaler Sprachen. In vielen Fällen lässt sich anhand des Lemmas nachweisen, dass eine formale Sprache nicht regulär bzw. nicht… …   Deutsch Wikipedia

  • Schleifensatz — Das Pumping Lemma bzw. Pumplemma beschreibt in der theoretischen Informatik eine Eigenschaft bestimmter Klassen formaler Sprachen. In vielen Fällen lässt sich anhand des Lemmas nachweisen, dass eine formale Sprache nicht regulär bzw. nicht… …   Deutsch Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”