- Tree-adjoining grammar
Tree-adjoining grammar (TAG) is a grammar formalism defined by
Aravind Joshi . Tree-adjoining grammars are somewhat similar tocontext-free grammar s, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (seetree (graph theory) andtree data structure ).History
TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG),cite paper
last = Joshi
first = Aravind
coauthors = S. R. Kosaraju, H. Yamada
title = String Adjunct Grammars
year = 1969
publisher = Proceedings Tenth Annual Symposium on Automata Theory, Waterloo, Canada ] the "string grammar" ofZellig Harris . AGs handleendocentric properties of language in a natural and effective way, but do not have a good characterization ofexocentric constructions; the converse is true of rewrite grammars, orphrase-structure grammar (PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from theChomsky hierarchy but intersects it in interesting and linguistically relevant ways.cite paper
last = Joshi
first = Aravind
title = Properties of Formal Grammars with Mixed Types of Rules and Their Linguistic Relevance
year = 1969
publisher = Proceedings Third International Symposium on Computational Linguistics, Stockholm, Sweden ]Description
The rules in a TAG are trees with a special leaf node known as the "foot node", which is anchored to a word. There are two types of basic trees in TAG: "initial" trees (often represented as '') and "auxiliary" trees (''). Initial trees represent basic valency relations, while auxiliary tree allow for recursion.cite book
last = Jurafsky
first = Daniel
coauthors = James H. Martin
title = Speech and Language Processing
year = 2000
pages = 354
publisher = Prentice Hall
location = Upper Saddle River, NJ ] Auxiliary trees have the root (top) node and foot node labeled with the same symbol.A derivation starts with an initial tree, combining via either "substitution" or "adjunction". Substitution replaces a frontier node with another tree whose top node has the same label. Adjunction inserts an auxiliary tree into the center of another tree.cite conference
last = Joshi
first = Aravind
coauthors = Owen Rambow
title = A Formalism for Dependency Grammar Based on Tree Adjoining Grammar
year = 2003
booktitle = Proceedings of the Conference on Meaning-Text Theory
url = http://www1.cs.columbia.edu/~rambow/papers/joshi-rambow-2003.pdf] .The root/foot label of the auxiliary tree must match the label of the node at which it adjoins.Other variants of TAG allow multi-component trees, trees with multiple foot nodes, and other extensions.
Complexity and application
Tree-adjoining grammars are often described as mildly context-sensitive, meaning that they possess certain properties that make them more powerful (in terms of
weak generative capacity ) than context-free grammars, but less powerful than indexed orcontext-sensitive grammar s. Mildly context-sensitive grammars are conjectured to be powerful enough to modelnatural language s while remaining efficiently parseable in the general case.cite book
last = Joshi
first = Aravind
chapter = How much context-sensitivity is necessary for characterizing structural descriptions
year = 1985
publisher = Cambridge University Press
pages = 206–250
title = Natural Language Processing: Theoretical, Computational, and Psychological Perspectives
editor = D. Dowty, L. Karttunen, and A. Zwicky, (eds.)
location = New York, NY ]Due to their formal weakness, TAG is often used in
computational linguistics andnatural language processing .A TAG can describe the language of squares (in which some arbitrary string is repeated), and the language . This type of processing can be represented by an
embedded pushdown automaton .Languages with cubes (ie triplicated strings) or with more than four distinct character strings of equal length cannot be generated by tree-adjoining grammars.
For these reasons, languages generated by tree-adjoining grammars are referred to as "
mildly context-sensitive language "s.References
External links
* [http://www.cis.upenn.edu/~xtag/ The XTAG project] , which uses a TAG for natural language processing.
* [http://www.let.rug.nl/~vannoord/papers/diss/diss/node59.html A tutorial on TAG]
* [http://www.computing.dcu.ie/~yguo/doc/talk/TAG.pdf Another tutorial] with focus on comparison withLexical Functional Grammar and grammars extraction fromTreebank
* [http://wiki.loria.fr/wiki/SemConst/Documentation#Background SemConst Documentation] A quick survey on Syntax and Semantic Interface problematic within the TAG framework.
* [http://sourcesup.cru.fr/tulipa/ The TuLiPa project] The Tübingen Linguistic Parsing Architecture (TuLiPA) is a multi-formalism syntactic (and semantic) parsing environment, designed mainly forMulti-Component Tree Adjoining Grammars withTree Tuples
* [http://mgkit.gforge.inria.fr/ The Metagrammar Toolkit] which provides several tools to edit and compileMetaGrammars into TAGs. It also include a wide coverage French Metagrammars.
* [http://www.loria.fr/~azim/LLP2/help/fr/index.html LLP2] ALexicalized Tree Adjoining Grammar parser which provides an easy to use graphical environment (page in french)
Wikimedia Foundation. 2010.