- Mildly context-sensitive language
-
In formal grammar theory, mildly context-sensitive languages are a class of formal languages which can be efficiently parsed, but still possess enough context sensitivity to allow the parsing of natural languages. The concept was first introduced by Aravind Joshi in 1985.
Contents
Definition
Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if
- it contains all context-free languages.
- it admits limited cross-serial dependencies.
- all the languages are parsable in polynomial time.
- all the languages have constant growth; this means that the distribution of string lengths should be linear rather than supralinear. This is often guaranteed by proving a pumping lemma for some class of mildly context-sensitive languages.
Formalisms
Some attempts at creating mildly context-sensitive language formalisms include:
- Linear context-free rewriting systems developed by D. J. Weir
- Minimalist grammars of Edward P. Stabler, Alain Lecomte, Christian Retoré, etc.
- Head grammars of Carl Pollard
- Combinatory categorial grammars developed by Mark Steedman
- Linear indexed grammars defined by Gerald Gazdar
- Tree-adjoining grammars developed by Aravind Joshi
The first two of these grammar classes define the same set of languages, while the last four define another single, strictly smaller class.[citation needed] The larger of the two classes may be parsed by thread automatons, while the other smaller one may be parsed by embedded pushdown automatons.
While all languages in both classes are mildly context-sensitive and both classes support some cross-serial dependency, neither class exhausts the full set of mildly context-sensitive languages.
Control Language Hierarchy
A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir. Based on the work of Nabil A. Khabbaz, Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.
Following are some of the properties of Level-k languages in the hierarchy:
- Level-k languages are properly contained by Level-(k + 1) language class
- Level-k languages can be parsed in time
- Level-k contains the language , but not
- Level-k contains the language , but not
Those properties correspond well (at least for small k > 1) to the conditions of mildly context-sensitive languages imposed by Joshi, and as k gets bigger, the language class become, in a sense, less mildly context-sensitive.
See also
Further reading
- Joshi, Aravind; Vijay-Shanker, K.; Weir, David (1991), "The Convergence of Mildly Context-Sensitive Grammar Formalisms", in Sells, Peter; Shieber, Stuart; Wasow, Thomas, Foundational Issues in Natural Language Processing, Cambridge, MA: MIT Press, pp. 31–81, ISBN 0-262-19303-5, http://repository.upenn.edu/cgi/viewcontent.cgi?article=1571&context=cis_reports.
- Vijay-Shanker, K.; Weir, David (1994), "The Equivalence of Four Extensions of Context-Free Grammars", Mathematical Systems Theory 27 (6): 511–546, doi:10.1007/BF01191624, ISSN 0025-5661, http://citeseer.ist.psu.edu/vijay-shanker94equivalence.html.
- Villemonte de La Clergerie, Eric (2002), "Parsing Mildly Context-Sensitive Languages with Thread Automata", Proceedings of the 19th international conference on Computational linguistics, 1, pp. 1–7, http://acl.ldc.upenn.edu/C/C02/C02-1028.pdf.
- Weir, D. J. (1992), "A geometric hierarchy beyond context-free languages", Theoretical computer science 104 (2): 235–261, doi:10.1016/0304-3975(92)90124-X..
External links
- Parsing Beyond Context-Free Grammar, by Laura Kallmeyer
- Seminar on tree-adjoining grammars and mildly context-sensitive languages and formalisms, by Laura Kallmeyer
- Mildly Context-Sensitive Grammars, by Aravind Joshi
Automata theory: formal languages and formal grammars Chomsky hierarchy Type-0—Type-1———Type-2——Type-3—Grammars (no common name)Linear context-free rewriting systems etc.Tree-adjoining etc.—Languages Mildly context-sensitiveMinimal automaton Thread automataEach category of languages is a proper subset of the category directly above it. - Any automaton and any grammar in each category has an equivalent automaton or grammar in the category directly above it. Categories:- Formal languages
Wikimedia Foundation. 2010.