To ensure compatibility across the individual corpora in ICE, each team is following a common corpus design, as well as common schemes for textual and grammatical annotation. Each component corpus contains 500 texts of approximately 2,000 words each, for a total of approximately one million words. Some of the texts are composite, made up of two or more samples of the same type.
The design of ICE corpora is as follows:
Numbers in brackets indicate the number of 2,000-word texts in each category.
The texts in the corpus date from 1990 or later. The authors and speakers of the texts are aged 18 or above, were educated through the medium of English, and were either born in the country in whose corpus they are included, or moved there at an early age and received their education through the medium of English in the country concerned.
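The inclusion criteria above can be read as a simple checklist. The following is a minimal Python sketch of that checklist, not part of the ICE tooling: the `Contributor` record, its field names, and both function names are hypothetical and introduced only for illustration. Because the criteria give no numeric cut-off for "an early age", that condition is modelled as a flag to be set by whoever compiles the corpus.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria stated above.
MIN_TEXT_YEAR = 1990
MIN_AGE = 18


@dataclass
class Contributor:
    """Hypothetical record for one author or speaker."""
    age: int
    educated_in_english: bool
    born_in_country: bool
    # "Early age" is not defined numerically, so this is a judgement
    # recorded by the compiler: the person moved to the country early
    # and was educated there through the medium of English.
    moved_early_and_educated_locally: bool = False


def contributor_is_eligible(c: Contributor) -> bool:
    """Apply the age, education, and residence criteria to one person."""
    has_local_background = c.born_in_country or c.moved_early_and_educated_locally
    return c.age >= MIN_AGE and c.educated_in_english and has_local_background


def text_is_eligible(year: int, contributors: list[Contributor]) -> bool:
    """A text qualifies if it dates from 1990 or later and every
    author or speaker in it meets the contributor criteria."""
    return year >= MIN_TEXT_YEAR and all(
        contributor_is_eligible(c) for c in contributors
    )
```

For example, a 1992 text by a 25-year-old who was born in the country and educated in English would pass, while the same text dated 1989 would not.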
The corpus contains samples of speech and writing by both males and females, and it includes a wide range of age groups. The proportions, however, are not representative of the proportions in the population as a whole: women are not equally represented in professions such as politics and law, and so do not produce equal amounts of discourse in these fields. Similarly, various age groups are not equally represented among students or academic authors.
© 2009 The ICE Project