International Corpus of English

The International Corpus of English (ICE) is a set of corpora representing varieties of English from around the world. Over twenty countries or groups of countries where English is the first language or an official second language are included.

The project began in 1990 with the primary aim of collecting material for comparative studies of English worldwide. Eighteen research teams around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus consists of one million words of spoken and written English produced after 1989. For most participating countries, the ICE project is stimulating the first systematic investigation of the national variety. To ensure compatibility among the component corpora, each team is following a common corpus design, as well as a common scheme for grammatical annotation.

The current participating countries are listed below (a trailing * marks a corpus that is available):

* Australia
* Cameroon
* Canada
* East Africa (Kenya, Malawi, Tanzania)*
* Fiji
* Great Britain* (parsed)
* Hong Kong*
* India*
* Ireland*
* Jamaica
* Kenya
* Malta
* Malaysia
* New Zealand*
* Nigeria
* Pakistan
* Philippines*
* Sierra Leone
* Singapore*
* South Africa
* Sri Lanka
* Trinidad and Tobago
* USA

Each corpus contains one million words in 500 texts of 2,000 words each, following the sampling methodology used for the Brown Corpus. Unlike Brown, the Lancaster-Oslo-Bergen (LOB) Corpus, or indeed mega-corpora such as the British National Corpus, however, the majority of ICE texts are derived from spoken data.

Sixty per cent of each ICE corpus (600,000 words) consists of orthographically transcribed spoken English. The father of the project, Sidney Greenbaum, insisted on the primacy of the spoken word, following Randolph Quirk and Jan Svartvik's collaboration on the original London-Lund Corpus (LLC). This emphasis on word-for-word transcription sets ICE apart from many other corpora, including those that contain paraphrases of, for example, parliamentary or legal proceedings.
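The design figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming every text is exactly 2,000 words and that the 60% spoken share applies to whole texts; the function and constant names are hypothetical and not part of any ICE tooling.

```python
# Illustrative arithmetic only: names below are hypothetical, not ICE software.
TEXTS_PER_CORPUS = 500   # texts per ICE corpus (from the design described above)
WORDS_PER_TEXT = 2000    # assumed exact; real texts are only approximately this long
SPOKEN_PERCENT = 60      # share of each corpus that is transcribed speech


def corpus_breakdown(texts=TEXTS_PER_CORPUS,
                     words_per_text=WORDS_PER_TEXT,
                     spoken_percent=SPOKEN_PERCENT):
    """Return word and text counts implied by the common corpus design."""
    total_words = texts * words_per_text
    # Integer arithmetic avoids floating-point rounding in the split.
    spoken_texts = texts * spoken_percent // 100
    written_texts = texts - spoken_texts
    return {
        "total_words": total_words,
        "spoken_texts": spoken_texts,
        "written_texts": written_texts,
        "spoken_words": spoken_texts * words_per_text,
        "written_words": written_texts * words_per_text,
    }


if __name__ == "__main__":
    for key, value in corpus_breakdown().items():
        print(f"{key}: {value:,}")
```

Under these assumptions the split comes to 300 spoken and 200 written texts, which matches the 600,000-word spoken figure given above.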

The British component of ICE, ICE-GB, is fully parsed with a detailed Quirk et al. phrase structure grammar [Quirk, Randolph, Greenbaum, Sidney, Leech, Geoffrey and Svartvik, Jan (1985). "A Comprehensive Grammar of the English Language". London: Longman], and the analyses have been thoroughly checked and completed. The analysis covers part-of-speech tagging and parsing of the entire corpus. The treebank can be searched and explored with the "ICE Corpus Utility Program" ("ICECUP") software. More information is available in the handbook [Nelson, Gerald, Wallis, Sean, and Aarts, Bas (2002). "Exploring Natural Language: Working with the British Component of the International Corpus of English". Amsterdam: John Benjamins].

To ensure compatibility between the individual corpora in ICE, each team is following a common corpus design, as well as a common scheme for grammatical annotation. [http://www.ucl.ac.uk/english-usage/ice/ The International Corpus of English website]

See also

* Corpus linguistics
* British National Corpus
* BYU Corpus of American English

External links

* [http://www.ucl.ac.uk/english-usage/ice/ The International Corpus of English website]
* [http://www.ucl.ac.uk/english-usage/projects/ice-gb The British Component of the International Corpus of English]
* [http://www.ucl.ac.uk/english-usage/resources/icecup ICECUP]


Wikimedia Foundation. 2010.