Constraint Grammar
Constraint Grammar (CG) is a methodological paradigm for natural language processing (NLP). Linguist-written, context-dependent rules are compiled into a grammar that assigns grammatical tags ("readings") to words or other tokens in running text. Typical tags address lemmatisation (lexeme or base form), inflexion, derivation, syntactic function, dependency, valency, case roles, semantic type, etc. Each rule adds, removes, selects or replaces a tag or a set of grammatical tags in a given sentence context. Context conditions can be linked to any tag or tag set of any word anywhere in the sentence, either locally (at defined distances) or globally (at undefined distances). Context conditions in the same rule may be linked, i.e. conditioned upon each other, negated, or blocked by interfering words or tags. Typical CGs consist of thousands of rules that are applied set-wise in progressive steps, covering ever more advanced levels of analysis. Within each level, safe rules are applied before heuristic rules, and no rule is allowed to remove the last reading of a given kind, which provides a high degree of robustness.
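The rule mechanism described above can be illustrated with a minimal Python sketch. It is not the syntax or the implementation of any actual CG compiler; the class names (Cohort, Context, Rule), the function disambiguate and the tag names are all hypothetical and chosen only for illustration. The sketch covers just REMOVE and SELECT rules with local context conditions at fixed offsets, and it shows the safeguard that a rule is never allowed to delete the last remaining reading of a word.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    """One token plus its candidate readings (each reading is a set of tags)."""
    wordform: str
    readings: list  # list of frozenset[str]


@dataclass
class Context:
    """A condition on a neighbouring token at a fixed relative offset."""
    offset: int            # -1 = previous token, 1 = next token
    tag: str               # tag looked for in the neighbour's readings
    careful: bool = False  # True: every reading of the neighbour must carry the tag
    negated: bool = False  # True: the condition must NOT hold

    def holds(self, sentence, i):
        j = i + self.offset
        if 0 <= j < len(sentence):
            test = all if self.careful else any
            found = test(self.tag in r for r in sentence[j].readings)
        else:
            found = False  # positions outside the sentence never match
        return found != self.negated


@dataclass
class Rule:
    action: str  # "REMOVE" or "SELECT" (illustrative subset of rule types)
    target: str  # tag identifying the readings the rule acts on
    contexts: list = field(default_factory=list)

    def apply(self, sentence, i):
        """Try the rule on token i; return True if a reading was discarded."""
        cohort = sentence[i]
        if not all(c.holds(sentence, i) for c in self.contexts):
            return False
        if self.action == "REMOVE":
            kept = [r for r in cohort.readings if self.target not in r]
        else:  # SELECT keeps only the readings that carry the target tag
            kept = [r for r in cohort.readings if self.target in r]
        # Robustness: never discard the last reading; also skip no-op matches.
        if not kept or len(kept) == len(cohort.readings):
            return False
        cohort.readings = kept
        return True


def disambiguate(sentence, rules):
    """Apply the rules in order, repeatedly, until no rule changes anything."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for i in range(len(sentence)):
                if rule.apply(sentence, i):
                    changed = True
    return sentence


# "walk" is ambiguous between noun and verb; "the" is an unambiguous determiner.
sentence = [
    Cohort("the", [frozenset({"DET"})]),
    Cohort("walk", [frozenset({"N", "SG"}), frozenset({"V", "INF"})]),
]
rules = [
    # Safe rule: drop verb readings directly after an unambiguous determiner.
    Rule("REMOVE", "V", [Context(-1, "DET", careful=True)]),
    # This rule would leave "walk" with no reading at all, so it is refused.
    Rule("REMOVE", "N", [Context(-1, "DET")]),
]
disambiguate(sentence, rules)
```

After the call, "walk" retains only its noun reading: the first rule removes the verb reading, and the second rule is blocked because it would remove the last remaining reading. A real grammar would additionally group rules into ordered sections (safe before heuristic) and support global, linked and barrier-restricted contexts, which this sketch omits.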
The Constraint Grammar concept was launched by Fred Karlsson in 1990 (Karlsson 1990; Karlsson et al., eds., 1995), and CG taggers and parsers have since been written for a large variety of languages, routinely achieving F-scores of over 99% for part-of-speech (word class) tagging[1]. A number of syntactic CG systems have reported F-scores of around 95% for syntactic function labels. CG systems can be used to create full syntactic trees in other formalisms by adding small, non-terminal based phrase structure grammars or dependency grammars, and a number of corpus/treebank projects have used Constraint Grammar for automatic annotation. CG methodology has also been used in a number of language technology applications, such as spell checkers and machine translation systems.
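Because a CG tagger may leave a word with more than one reading, precision and recall can differ, and the F-score combines the two as their (weighted) harmonic mean. The following short Python sketch shows the standard computation; the function name f_score and the counts in the example are invented purely for illustration and are not results from any of the systems mentioned here.

```python
def f_score(correct, proposed, gold, beta=1.0):
    """F-score from raw counts.

    correct  -- readings output by the tagger that match the gold standard
    proposed -- all readings output by the tagger (may exceed the token
                count when some ambiguity is left unresolved)
    gold     -- readings in the gold standard
    """
    if correct == 0 or proposed == 0 or gold == 0:
        return 0.0
    precision = correct / proposed
    recall = correct / gold
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 1000 gold readings, 1010 readings left by the tagger
# (ten words remain ambiguous), 995 of which are correct.
example = f_score(correct=995, proposed=1010, gold=1000)
```

With beta set to 1, the value reduces to 2 * correct / (proposed + gold), so unresolved ambiguity lowers the score through precision even when recall stays high.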
List of Constraint Grammar systems sorted by language
- Free software
- VISL CG-3 Constraint Grammar compiler/parser
- North and Lule Sami, Faroese, Komi and Greenlandic from the University of Tromsø (more information, Northern Sami documentation)
- Estonian
- Norwegian Nynorsk and Bokmål (online), Oslo-Bergen tagger (source code)
- Breton, Welsh, Irish Gaelic and Norwegian (converted from the above) in Apertium (see CG in Apertium)
- Non-free software
- Basque
- Catalan CATCG
- Danish DanGram
- English ENGCG, ENGCG-2, VISL-ENGCG
- Esperanto EspGram
- French FrAG
- German GerGram
- Irish online
- Italian ItaGram
- Spanish HISPAL
- Swedish SWECG
- Swahili
- Portuguese PALAVRAS
External links
- CG Tutorial by Kevin Donnelly
- VISL CG-3, the grammar compiler/parser
Footnotes
- [1] For English, see for example Tapanainen and Voutilainen 1994.
References
- Bick, Eckhard. 2000. The Parsing System "Palavras": Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus: Aarhus University Press. ISBN 87-7288-910-1.
- Karlsson, Fred. 1990. Constraint Grammar as a Framework for Parsing Unrestricted Text. H. Karlgren, ed., Proceedings of the 13th International Conference of Computational Linguistics, Vol. 3. Helsinki 1990, 168-173.
- Karlsson, Fred, Atro Voutilainen, Juha Heikkilä, and Arto Anttila, editors. 1995. Constraint Grammar: A Language-Independent System for Parsing Running Text. Natural Language Processing, No 4. Mouton de Gruyter, Berlin and New York. ISBN 3-11-014179-5.
- Tapanainen, Pasi and Atro Voutilainen. 1994. Tagging accurately: don't guess if you know. ANLC '94: Proceedings of the Fourth Conference on Applied Natural Language Processing.
Wikimedia Foundation. 2010.