Oxford English Corpus

The Oxford English Corpus is a text corpus of the English language used by the makers of the Oxford English Dictionary and by Oxford University Press's language research programme. It is the largest corpus of its kind, containing over two billion words.[1] The sources for these words are writings of all sorts, from "literary novels and specialist journals to everyday newspapers and magazines and from Hansard to the language of chatrooms, emails, and weblogs".[2] This breadth contrasts with similar databases that sample only one specific kind of writing.

The digital version of the Oxford English Corpus is formatted in XML and usually analysed with Sketch Engine software.[3]

Each document in the Oxford English Corpus is accompanied by metadata recording:

  • title
  • author (if known; many websites make this difficult to determine reliably)
  • author gender (if known)
  • language type (e.g. British English, American English)
  • source website
  • year (+ date, if known)
  • date of collection
  • domain + subdomain
  • document statistics (number of tokens, sentences, etc.)[3]
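
As a rough illustration of how XML-encoded metadata of this kind might be read, here is a minimal Python sketch. The element names (`document`, `metadata`, `title`, `language_type` and so on) and the sample record are hypothetical, invented for this example; the actual schema of the Oxford English Corpus is not described here. Token counting by whitespace is likewise a simplification, not the corpus's own tokenisation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample record: the element names are illustrative
# assumptions, not the real Oxford English Corpus schema.
SAMPLE_XML = """
<document>
  <metadata>
    <title>An example weblog post</title>
    <language_type>British English</language_type>
    <source_website>example.org</source_website>
    <year>2005</year>
    <domain>weblog</domain>
    <subdomain>personal</subdomain>
  </metadata>
  <text>Corpora record how people really write. They are large.</text>
</document>
"""


@dataclass
class DocumentMetadata:
    title: Optional[str]
    author: Optional[str]          # None when the author is not known
    language_type: Optional[str]
    source_website: Optional[str]
    year: Optional[int]
    domain: Optional[str]
    subdomain: Optional[str]
    token_count: int               # naive whitespace token count


def parse_document(xml_text: str) -> DocumentMetadata:
    """Read one document record and return its metadata fields."""
    root = ET.fromstring(xml_text.strip())
    meta = root.find("metadata")

    def field(name: str) -> Optional[str]:
        # Missing or empty elements are reported as None ("if known").
        value = meta.findtext(name) if meta is not None else None
        return value.strip() if value and value.strip() else None

    year = field("year")
    body = root.findtext("text") or ""
    return DocumentMetadata(
        title=field("title"),
        author=field("author"),
        language_type=field("language_type"),
        source_website=field("source_website"),
        year=int(year) if year else None,
        domain=field("domain"),
        subdomain=field("subdomain"),
        token_count=len(body.split()),
    )


record = parse_document(SAMPLE_XML)
print(record)
```

Fields that are absent from a record, such as the author in the sample above, come back as `None`, mirroring the "if known" qualifiers in the list.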

References

  1. AskOxford.com: How the OED got shorter. Retrieved 2 December 2007.
  2. AskOxford.com: The Oxford English Corpus. Retrieved 2 December 2007.
  3. Technical information. Retrieved 22 June 2006.

Wikimedia Foundation. 2010.