Morphological dictionary

In computational linguistics, a morphological dictionary is a file that contains correspondences between the surface forms and the lexical forms of words. Surface forms are the forms of words as they appear in running text. The lexical form corresponding to a surface form is the lemma followed by grammatical information (for example, the part of speech, gender and number). In English, "houses" is a surface form of the noun "house"; its lexical form is "house", noun, plural. There are two kinds of morphological dictionaries: aligned and non-aligned.

Aligned morphological dictionaries

In an aligned morphological dictionary, the correspondence between the surface form and the lexical form of a word is aligned at the character level. Continuing with the previous example, we have:

(h,h) (o,o) (u,u) (s,s) (e,e) (s,<n>) (θ,<pl>)

Here θ is the empty symbol, <n> signifies "noun" and <pl> signifies "plural".

In each pair of the example, the left-hand side is a symbol of the surface form (input) and the right-hand side is a symbol of the lexical form (output). This order is used in morphological analysis, where a lexical form is produced from a surface form. In morphological generation the order is reversed.
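
The aligned entry above can be sketched in Python as follows. This is only an illustration under stated assumptions: the entry is stored as a list of (input symbol, output symbol) pairs, the empty symbol θ is modelled as the empty string, and the names THETA, ALIGNED_HOUSES, project and invert are hypothetical rather than taken from any existing toolkit.

```python
THETA = ""  # assumption: the empty symbol θ is modelled as the empty string

# Aligned entry for "houses": each pair is (surface symbol, lexical symbol).
ALIGNED_HOUSES = [
    ("h", "h"), ("o", "o"), ("u", "u"), ("s", "s"), ("e", "e"),
    ("s", "<n>"), (THETA, "<pl>"),
]

def project(alignment, side):
    """Concatenate one side of an alignment (0 = input, 1 = output)."""
    return "".join(pair[side] for pair in alignment)

def invert(alignment):
    """Swap input and output: an analysis entry becomes a generation entry."""
    return [(out, inp) for inp, out in alignment]

surface = project(ALIGNED_HOUSES, 0)          # surface form read off the left side
lexical = project(ALIGNED_HOUSES, 1)          # lexical form read off the right side
generation_entry = invert(ALIGNED_HOUSES)     # lexical form is now the input
```

Reading off the left side gives the surface form and reading off the right side gives the lexical form; inverting the pairs gives the entry as it would be used for generation.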

Formally, if Σ is the alphabet of the input symbols and Γ is the alphabet of the output symbols, an aligned morphological dictionary is a subset  A \subseteq L^* , where

 L = ((\Sigma \cup \{\theta\}) \times \Gamma) \cup (\Sigma \times (\Gamma \cup \{\theta\}))

is the alphabet of all possible alignment pairs, including those with the empty symbol θ on one side (the pair (θ,θ) is not in L). That is, an aligned morphological dictionary is a set of strings over L.
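
As a rough check of the definition, the following Python sketch tests whether a sequence of pairs is a string over L. The alphabets SIGMA and GAMMA, the representation of θ as the empty string, and the function names are assumptions made for this example only.

```python
THETA = ""  # assumption: the empty symbol θ is modelled as the empty string

def in_alignment_alphabet(pair, sigma, gamma):
    """True if pair lies in ((Σ ∪ {θ}) × Γ) ∪ (Σ × (Γ ∪ {θ}))."""
    a, b = pair
    left_part = (a in sigma or a == THETA) and b in gamma
    right_part = a in sigma and (b in gamma or b == THETA)
    return left_part or right_part

def is_aligned_entry(entry, sigma, gamma):
    """True if every pair of the entry is a symbol of the alphabet L."""
    return all(in_alignment_alphabet(pair, sigma, gamma) for pair in entry)

# Toy alphabets (illustrative): lowercase letters, plus two tags on the output side.
SIGMA = set("abcdefghijklmnopqrstuvwxyz")
GAMMA = SIGMA | {"<n>", "<pl>"}

houses = [("h", "h"), ("o", "o"), ("u", "u"), ("s", "s"), ("e", "e"),
          ("s", "<n>"), (THETA, "<pl>")]

valid = is_aligned_entry(houses, SIGMA, GAMMA)
invalid = is_aligned_entry([(THETA, THETA)], SIGMA, GAMMA)  # (θ,θ) is not in L
```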

Non-aligned morphological dictionaries

A non-aligned morphological dictionary is simply a set  U \subseteq \Sigma^* \times \Gamma^*  of pairs of input and output strings. A non-aligned morphological dictionary would represent the previous example as:

(houses, house<n><pl>)

It is possible to convert a non-aligned dictionary into an aligned dictionary. Besides trivial alignments to the left or to the right, linguistically motivated alignments which align characters to their corresponding morphemes are possible.
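
A trivial alignment to the left could look like the Python sketch below: the i-th input symbol is paired with the i-th output symbol, and the shorter side is padded at the end with the empty symbol. The tokenization rule (a tag is anything of the form <...>, every other character is one symbol) and the function names are assumptions for illustration; a linguistically motivated alignment would need additional knowledge about morphemes.

```python
import re
from itertools import zip_longest

THETA = ""  # assumption: the empty symbol θ is modelled as the empty string

def tokenize(form):
    """Split a form into symbols: tags such as <n> count as single symbols."""
    return re.findall(r"<[^>]+>|.", form)

def align_left(surface, lexical):
    """Pair symbols position by position, padding the shorter side with θ."""
    return list(zip_longest(tokenize(surface), tokenize(lexical),
                            fillvalue=THETA))

aligned = align_left("houses", "house<n><pl>")
```

For the pair (houses, house<n><pl>) this left alignment happens to reproduce the aligned entry shown earlier in the article.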

Lexical ambiguities

A surface form frequently has more than one associated lexical form. For example, "house" may be a noun in the singular, /haʊs/, or a verb in the present tense, /haʊz/. A function is therefore needed that relates each input string to all of its corresponding output strings.

If we define the set  E \subseteq \Sigma^*  of input words as  E = \{ w : (w,w') \in U \} , the correspondence function is  \tau : E \rightarrow 2^{\Gamma^*} , defined as  \tau(w) = \{ w' : (w,w') \in U \} .
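
The correspondence function can be sketched in Python by grouping the pairs of U by their input string. The toy dictionary U below and the tags <sg>, <vblex> and <pres> are hypothetical and serve only to show an ambiguous surface form.

```python
from collections import defaultdict

# Toy non-aligned dictionary U: a set of (surface form, lexical form) pairs.
# The tags <sg>, <vblex> and <pres> are illustrative assumptions.
U = {
    ("houses", "house<n><pl>"),
    ("house", "house<n><sg>"),
    ("house", "house<vblex><pres>"),
}

def build_tau(pairs):
    """Map each input word w to the set {w' : (w, w') in pairs}."""
    tau = defaultdict(set)
    for w, w_prime in pairs:
        tau[w].add(w_prime)
    return dict(tau)

tau = build_tau(U)
E = set(tau)  # the set of input words for which tau is defined
```

Here tau["house"] contains both the noun and the verb reading, so a later processing step (for example, a part-of-speech tagger) would have to choose between them.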

Wikimedia Foundation. 2010.
