Bijankhan Corpus

The Bijankhan corpus is a tagged corpus suitable for natural language processing research on the Persian language. It was gathered from daily news and common texts. All documents in the collection are categorized by subject (political, cultural, and so on) into about 4300 subject categories. The corpus contains about 2.6 million manually tagged words, annotated with a tag set of 550 Persian part-of-speech tags.

The Bijankhan corpus was created by the Data Base Research Group at the University of Tehran. It is non-free, in that it is not free for commercial use.

See also

* Hamshahri Corpus
* Persian Today Corpus

External links

* [http://ece.ut.ac.ir/dbrg/Bijankhan Bijankhan corpus]


Wikimedia Foundation. 2010.
