Croatian Language Corpus

The Croatian Language Corpus (CLC; Croatian: Hrvatski jezični korpus, HJK) is a corpus of Croatian compiled at the Institute of Croatian Language and Linguistics (IHJJ).

Contents

1 Background

2 Goals

3 Format and Availability

4 Content

5 Cooperation

6 References

7 External Links

Background

The CLC was initially funded from May 2005 as a sub-project of the research program Riznica (Croatian Language Repository) by the Ministry of Science, Education, and Sports of the Republic of Croatia (MZOŠ; project no. 0212010). In a second development phase, beginning in 2007, the extension and development of the CLC was embedded in the research program The Croatian Language Repository (CLR), which was granted by the MZOŠ (cf. Ćavar and Brozović Rončević, 2012^[1]). The CLR is a research program (PI Dunja Brozović Rončević) comprising numerous independent research projects that make use of the CLC, so the corpus is developed mainly as a by-product of those projects. Currently, Dunja Brozović Rončević and Damir Ćavar are in charge of corpus development.

Goals

One of the main goals of the CLC project is to create a publicly available Croatian corpus that is annotated on multiple levels: lemmatized, morphologically segmented, morpho-syntactically annotated, phonemically transcribed, syllabified, and syntactically parsed. While the current version of the corpus provides resources from standard Croatian, several corpora from different stages in the development of Croatian are being created as well, including digitized manuscripts and Croatian dictionaries.

Format and Availability

From the outset, the collected and digitized texts in the CLC were annotated using the Text Encoding Initiative (TEI) P5 XML standard. Currently, approximately 90 million tokens are available in the TEI P5 XML format. The corpus can be accessed online via the Philologic^[2] interface (see The ARTFL Project^[3], Department of Romance Languages and Literatures, The University of Chicago). The corpus is partitioned into various virtual sub-corpora, and custom sub-corpus definitions can be provided on request.
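As a rough illustration of what working with TEI P5 XML can look like, the following sketch reads word tokens and their lemmas from a small TEI fragment using Python's standard library. The sample document, the sentence in it, and the use of the `lemma` attribute on `<w>` elements are assumptions made for this example; the actual markup conventions of the CLC files are not described here and may differ.

```python
import xml.etree.ElementTree as ET

# TEI P5 documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample, invented for illustration; not taken from the CLC.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical sample</title></titleStmt>
      <publicationStmt><p>Illustration only</p></publicationStmt>
      <sourceDesc><p>Invented sentence</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>
    <s><w lemma="pas">Psi</w> <w lemma="lajati">laju</w><pc>.</pc></s>
  </p></body></text>
</TEI>"""


def read_tokens(xml_text):
    """Return (word form, lemma) pairs for every <w> element in the document."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("lemma")) for w in root.iterfind(".//tei:w", TEI_NS)]


tokens = read_tokens(SAMPLE)
for form, lemma in tokens:
    print(form, lemma)
```

Punctuation is encoded separately as `<pc>` in this sketch, so it is not counted as a word token; whether a given token count includes punctuation depends on the corpus's own conventions.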

Content

The CLC is assembled from selected Croatian texts covering various functional domains and genres. It includes literature and other written sources from the period in which the standardization of the Croatian language began to take its final shape, i.e. from the second half of the 19th century onward.

The CLC consists of:

fundamental Croatian literature (e.g. novels, short stories, drama, poetry)

non-fiction

scientific publications from various domains and university textbooks

school books

literature translated by outstanding Croatian translators

online journals and newspapers

books from the pre-standardization period of Croatian that have been adapted to present-day standard Croatian

Cooperation

The realization of the CLC was made possible in cooperation with:

Školska knjiga d.d.

Croatian Academy of Sciences and Arts (HAZU)

Stoljeća hrvatske književnosti, Matica hrvatska

References

^ Ćavar and Brozović Rončević, 2012

^ Philologic

^ The ARTFL Project

External Links

Croatian Language Corpus (CLC) website and Philologic interface

(Croatian) Croatian National Corpus, another Croatian corpus by the Institute of Linguistics of the Faculty of Humanities and Social Sciences, University of Zagreb

