American National Corpus

The American National Corpus (ANC) is a paid, membership-based collaboratory that aims to create an electronic text corpus of American English. The collection will include written texts and transcripts of spoken data produced from 1990 onward, with a target size of 100 million words.

ANC Consortium members include publishers, software companies, and academic members. Consortium members have exclusive access to the corpus throughout the development period and for five years after the release of its first installment. The First Release of the ANC was made available in mid-fall 2003. It contains approximately 11 million words of written and spoken American English across a variety of text types, annotated for part of speech and lemma. The corpus is provided in XML format conforming to the XML Corpus Encoding Standard (XCES).

See also

* Corpus of Contemporary American English (COCA): 360 million words, 1990-2007; freely searchable online
* British National Corpus
* Oxford English Corpus

External links

* [http://www.cs.vassar.edu/~ide/papers/anc-lrec04.pdf The American National Corpus First Release]
* [http://americannationalcorpus.org ANC Website]


Wikimedia Foundation. 2010.
