National Corpus of Polish

The National Corpus of Polish (Polish: Narodowy Korpus Języka Polskiego, NKJP) is the largest and most important corpus of the Polish language. A linguistic corpus is a collection of texts in which one can look up the typical uses of a word or phrase, as well as its meaning and grammatical function.

Description

The National Corpus of Polish is a joint initiative of four institutions: the Institute of Computer Science at the Polish Academy of Sciences (coordinator), the Institute of Polish Language at the Polish Academy of Sciences, Polish Scientific Publishers PWN, and the Department of Computational and Corpus Linguistics at the University of Łódź. It has been registered as a research and development project of the Ministry of Science and Higher Education.

The intended size of the whole National Corpus of Polish is 1 billion words, of which a subcorpus of at least 300 million words will be carefully balanced. The demo version contains over 1200 million words drawn from three component corpora of Polish: IPIPAN, PELCRA and PWN (as of September 2009).[1]

The corpus contains classic literature, daily newspapers, specialist periodicals and journals, transcripts of conversations, and a variety of short-lived and internet texts.[2]

Search Engines

  • PELCRA – searches 1200 million words from three corpora: IPIPAN, PELCRA and PWN. It is easy to use, and the results can be downloaded as spreadsheets. A special query syntax also supports morphological and spelling expansion, several search options within a single query, and flexible matching of lexical and phraseological combinations. PELCRA also offers a visualization of how results are distributed across registers and can generate time series for words, phrases and idioms.
  • Poliqarp – searches for specific words or phrases. It can also find sequences specified with regular expressions, for example all phrases in the corpus that consist of a noun and an adjective, or all grammatical forms of a selected word (especially useful for studies of the Polish language). Both the online and the offline version run quickly: a simple query takes no more than a few seconds. The program also offers many configuration options.
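
The two kinds of query mentioned for Poliqarp (a noun followed by an adjective, and all grammatical forms of one word) can be illustrated with a minimal Python sketch. It does not use Poliqarp or any NKJP interface: the tiny hand-tagged sample, the tag names (subst, adj, fin, conj, prep) and the helper functions find_sequence and forms_of are all assumptions made up for illustration.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    orth: str  # surface form as it appears in the text
    base: str  # lemma (dictionary form)
    pos: str   # part-of-speech tag (illustrative tag names, assumed)


# Hypothetical hand-tagged sample, not taken from the real corpus.
TOKENS = [
    Token("Korpus", "korpus", "subst"),
    Token("zawiera", "zawierać", "fin"),
    Token("język", "język", "subst"),
    Token("polski", "polski", "adj"),
    Token("i", "i", "conj"),
    Token("teksty", "tekst", "subst"),
    Token("prasowe", "prasowy", "adj"),
    Token("z", "z", "prep"),
    Token("innych", "inny", "adj"),
    Token("korpusów", "korpus", "subst"),
]


def find_sequence(tokens: List[Token], pos_patterns: List[str]) -> List[List[str]]:
    """Return every run of adjacent tokens whose tags match the given
    regular expressions, one expression per position."""
    hits = []
    width = len(pos_patterns)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(re.fullmatch(p, t.pos) for p, t in zip(pos_patterns, window)):
            hits.append([t.orth for t in window])
    return hits


def forms_of(tokens: List[Token], base: str) -> List[str]:
    """Return the surface forms of every token with the given lemma."""
    return [t.orth for t in tokens if t.base == base]


noun_adjective = find_sequence(TOKENS, ["subst", "adj"])
korpus_forms = forms_of(TOKENS, "korpus")
print(noun_adjective)
print(korpus_forms)
```

In this sample, "innych korpusów" is not reported by the first query because the adjective precedes the noun; a second pattern list such as ["adj", "subst"] would be needed to catch that order.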


History

The first corpus to emerge was developed by the Institute of the Polish Language, Polish Academy of Sciences (not publicly available), followed by the corpus of PWN publishers, then the corpus of the PELCRA group at the University of Łódź, and finally the corpus of the Institute of Computer Science, Polish Academy of Sciences. All four teams decided to join forces in 2006, forming the Consortium for the National Corpus of Polish.[3]


References

External links

Narodowy Korpus Języka Polskiego

Instytut Podstaw Informatyki Polskiej Akademii Nauk

Instytut Języka Polskiego Polskiej Akademii Nauk


Wikimedia Foundation. 2010.
