Lemma (morphology)

In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of word forms. In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma. A lexeme, in this context, is the set of all forms that share the same meaning, and the lemma is the particular form chosen by convention to represent the lexeme. In lexicography, the lemma usually also serves as the headword under which the lexeme is indexed. Lemmas have special significance in highly inflected languages such as Czech. The process of determining the lemma for a given word is called lemmatisation.

Morphology

In English, the citation form of a noun is the singular: e.g., mouse rather than mice. For multi-word lexemes which contain possessive adjectives or reflexive pronouns, the citation form uses a form of the indefinite pronoun one: e.g., do one's best, perjure oneself. In languages with grammatical gender, the citation form of regular adjectives and nouns is usually the masculine singular. If the language additionally has cases, the citation form is often the masculine singular nominative.

In many languages, the citation form of a verb is the infinitive: French aller, German gehen. In English it is usually the full infinitive (to go); for some defective verbs the present tense is used (shall, can), and must has only one form. In Latin, Ancient Greek, and Modern Greek (the last of which has no infinitive), however, the first person singular present tense is normally used, though the infinitive is occasionally seen as well. For contracted verbs in Greek, an uncontracted first person singular present tense is used to reveal the contract vowel, e.g. φιλέω philéō for φιλῶ philō "I love" [implying affection]; ἀγαπάω agapáō for ἀγαπῶ agapō "I love" [implying regard]. In Japanese, the non-past (present and future) tense is used.

In Arabic, which has no infinitives, the third person singular masculine of the past tense is the least-marked form, and is used for entries in modern dictionaries. In older dictionaries, which are still commonly used today, the triliteral root of the word, whether a verb or a noun, is used. Hebrew often uses the third person masculine qal perfect, e.g., ברא bara' "create", כפר kaphar "deny". Georgian uses the verbal noun. For Korean, -da is attached to the stem.

In the Irish language, words are highly inflected for case (nominative, genitive, dative, and vocative); their form also changes with their position in a sentence because of initial mutations. The noun cainteoir, the lemma for the noun meaning "speaker", has a variety of forms: chainteoir, gcainteoir, cainteora, chainteora, cainteoirí, chainteoirí and gcainteoirí.

Some phrases are cited in a sort of lemma, e.g., Carthago delenda est (literally, "Carthage must be destroyed") is a common way of citing Cato, although what he said was more like, Ceterum censeo Carthaginem esse delendam ("As to the rest, I hold that Carthage must be destroyed").

Lexicography

In a dictionary, the lemma "go" represents the inflected forms "go", "goes", "going", "went", and "gone". The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g., "went" < "go". The disadvantage of listing only the lemma is that a declined or conjugated form of the word cannot be looked up directly, although some dictionaries, like Webster's, will list "went". Multilingual dictionaries vary in how they deal with this issue: the Langenscheidt dictionary of German does not list ging (< gehen); the Cassell does.
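The lookup problem described above can be sketched in a few lines of Python. This is only an illustration: the ENTRIES table and the names build_form_index and lookup are hypothetical, and a real dictionary would hold far more forms. The sketch inverts a headword-to-forms table so that an inflected form leads back to its lemma, and reports the result in the angle-bracket notation.

```python
# Hypothetical dictionary entries: headword (lemma) -> inflected forms it represents.
ENTRIES = {
    "go": ["go", "goes", "going", "went", "gone"],
    "run": ["run", "runs", "ran", "running"],
}


def build_form_index(entries):
    """Invert lemma -> forms into form -> lemma so inflected forms can be looked up."""
    index = {}
    for lemma, forms in entries.items():
        for form in forms:
            index[form] = lemma
    return index


FORM_INDEX = build_form_index(ENTRIES)


def lookup(form):
    """Return the entry in angle-bracket notation, or None if the form is not listed."""
    lemma = FORM_INDEX.get(form.lower())
    if lemma is None:
        return None
    return f'"{form}" < "{lemma}"'


print(lookup("went"))
```

A dictionary that lists "went" as a cross-reference is in effect shipping part of this inverted index in print.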

The form that is chosen to be the lemma is usually the least marked form, though there are occasional exceptions; e.g., Finnish dictionaries list verbs not under the verb root, but under the first infinitive marked with -(t)a, -(t)ä.

Lemmas or word stems are often used in corpus linguistics for determining word frequency. In such usage, the specific definition of "lemma" is flexible and depends on the task at hand.
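As a rough illustration of lemma-based frequency counting, the sketch below groups the tokens of a sentence by lemma before counting them. The LEMMA_OF table, the function name lemma_frequencies, and the sample sentence are all assumptions made for this example; tokens missing from the table are simply counted under their lower-cased form.

```python
from collections import Counter

# Hypothetical form -> lemma table; a real corpus tool would use a full lexicon.
LEMMA_OF = {
    "run": "run", "runs": "run", "ran": "run", "running": "run",
    "go": "go", "goes": "go", "going": "go", "went": "go", "gone": "go",
}


def lemma_frequencies(tokens, lemma_of=LEMMA_OF):
    """Count tokens by lemma; unknown tokens fall back to their lower-cased form."""
    return Counter(lemma_of.get(token.lower(), token.lower()) for token in tokens)


tokens = "She ran while he runs and they went running".split()
freq = lemma_frequencies(tokens)
print(freq["run"], freq["go"])
```

Counting by surface form would instead treat ran, runs and running as three unrelated words, which is why the choice of what counts as a lemma depends on the task.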

Difference between stem and lemma

In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-", because related words such as "production" share only that part.[1] In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed. When phonology is taken into account, defining the stem as the unchangeable part of the word is not useful, as can be seen in the phonological forms of the words in the preceding example: "produced" (IPA: /proʊˈdjuːst/) vs. "production" (IPA: /proʊˈdʌkʃən/).

Some lexemes have several stems but one lemma. For instance "to go" (the lemma) has the stems "go" and "went". (The past tense is based on a different verb, "to wend". The "-t" suffix may be considered as equivalent to "-ed".)
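The orthographic notion of a stem as "the part that never changes" can be approximated, very crudely, by the longest common prefix of a set of written forms. The sketch below makes that assumption explicit; the function name common_prefix is hypothetical, and real stemmers use suffix-stripping rules rather than this shortcut.

```python
def common_prefix(forms):
    """Return the longest prefix shared by every string in forms."""
    if not forms:
        return ""
    shortest = min(forms, key=len)
    for i, ch in enumerate(shortest):
        # Stop at the first position where any form disagrees.
        if any(form[i] != ch for form in forms):
            return shortest[:i]
    return shortest


produce_forms = ["produce", "produces", "produced", "producing", "production"]
go_forms = ["go", "goes", "going", "went", "gone"]

print(common_prefix(produce_forms))  # the shared written part of the "produce" family
print(repr(common_prefix(go_forms)))  # empty: "went" shares no prefix with "go"
```

For the "produce" family the shared prefix is "produc", which matches the stem given above, while the lemma remains "produce". For "go" the prefix is empty, which reflects the point that a lexeme with several stems cannot be reduced to one unchanging part.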

Wikimedia Foundation. 2010.
