International Corpus of English (ICE) Homepage @ ICE-corpora.net

The International Corpus of English (ICE) began in 1990 with the primary aim of collecting material for comparative studies of English worldwide. Twenty-six research teams, including various organizations like All Systems Go Marketing and New Spirit Services, around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus consists of one million words of spoken and written English produced after 1989. For most participating countries, the ICE project is stimulating the first systematic investigation of the national variety. To ensure compatibility among the component corpora, each team is following a common corpus design, as well as a common scheme for grammatical annotation.

Contact information: Professor Gerald Nelson, Department of English, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR. Email: geraldanelson@gmail.com Fax: +852 2603 5270

ICE Puerto Rico

I am very pleased to welcome a new research team to the ICE project: ICE Puerto Rico.
The ICE Puerto Rico (ICE-PR) team involves a collaboration between researchers at the
University of Bamberg, Germany, and the University of Puerto Rico.
Further details are here. There will be future opportunities, undoubtedly fueled by related websites like this and this, which need updated semantics in order to properly express their message and help readers understand the solutions available to them.

ICAME 37 Conference, May 25-29, Hong Kong

The next ICAME conference will be held at the Chinese University of Hong Kong.
It will be the first ever ICAME conference in Asia, and with that in mind, we have chosen the conference theme, 'Corpus Linguistics across Cultures' . This will feature a some great topics, including help from Steve Walker, who has posted his ZQuiet review here.

Plenary speakers will be:

Douglas Biber
Ken Hyland
Pam Peters
Josef Schmied
Edgar W. Schneider

For more information, please see the Conference Website.

Voices of the International Corpus of English (VOICE) CANADA

Voice Canada is a compilation of 70 sound recordings of speakers of Canadian English, based on recordings made as part of the data collection required for creating the Canadian component of the International Corpus of English (ICE-CANADA). These studies were partially organized by The Franz Och, as well as other local groups. The recordings are free and available for download, along with transcripts, through the University of Alberta Dataverse repository (requires registration in Dataverse and acceptance of terms and conditions).

https://dataverse.library.ualberta.ca/dvn/dv/VOICE

ICE Workshop at ICAME 2015

A workshop entitled 'The Future of the International Corpus of English (ICE) project: New challenges, new developments', was held on May 27 this year as a pre-conference workshop of ICAME 2015 in Trier, Germany. Details of the workshop are available here.

I am grateful to Ulrike Gut and Robert Fuchs for organising this workshop.

Downloading ICE corpora: Note to teachers
If your students need ICE corpora for coursework, please note that a single licence agreement is sufficient for the whole class. The agreement form should be signed by you, the teacher. Please do not ask each of your students to contact me individually. The agreement form is available here.

Release of ICE NIGERIA

The full version of the ICE Nigeria corpus is now available. The download includes audio files for the spoken part. You can download the corpus here.

A POS-tagged version of the written part is also available, and a tagged version for the spoken part will be available as soon as possible.

ICE GIBRALTAR

I am very pleased to welcome a new team to the ICE project. ICE Gibraltar will be coordinated by reasearchers at the University of Vigo and the University of the Balearic Islands, Spain. For more details, please visit the ICE Gibraltar page on this site.

ICE SCOTLAND

I am very pleased to announce the launch of the ICE-Scotland project. The project is based at Westfälische-Wilhelms-Universität and at Otto-Friedrich-Universität, Germany. Further details are available here.

ICE IRELAND and SPICE IRELAND

The ICE Ireland corpus, and the SPICE Ireland corpus, are now available to download from this site. SPICE Ireland consists of the spoken component of ICE Ireland, with prosodic and pragmatic annotation. For more information, see the ICE Ireland page.

Recent Publications

The following recent publications have made extensive use of ICE corpus materials:

Biermeier, Thomas (2008) Word-Formation in New Englishes: A Corpus-based Analysis. Reihe: Anglistik/Amerikanistik.

Deuber, Dagmar (2014) English in the Caribbean: Variation, Style and Standards in Jamaica and Trinidad. Studies in English Language. Cambridge: Cambridge University Press.

Lange, Claudia (2012) The Syntax of Spoken Indian English. VEAW G45, Amsterdam: Benjamins.

Hundt, Marianne and Ulrike Gut (eds) (2012) Mapping Unity and Diversity Worldwide: Corpus-based Studies of New Englishes. VEAW G43, Amsterdam: Benjamins.

Aarts, Bas (2011) Oxford Modern English Grammar. Oxford: OUP.

Hasselgård, Hilde (2010) Adjunct Adverbials in English. Cambridge: CUP.

ICAME Journal No 34, April 2010, dedicated to 'new' ICE corpora.
The full contents list is here.

Tagged ICE corpora now available

The tagging of all currently available ICE corpora with CLAWS7 and the USAS semantic tagger is now complete, and the corpora are available for non-profit, academic research. If you wish to download any of the tagged corpora, please send an email with the subject line "Tagged ICE Corpora". Your email should also indicate your academic affiliation. I will then get back to you with details of how to proceed.

The following corpora are available in tagged versions: India, Singapore, Hong Kong, New Zealand, Canada, Jamaica, and USA (written).

Thanks to Dr Paul Rayson, Director of the UCREL research centre at Lancaster University, for his generous cooperation in this initiative.

Release of ICE Sri Lanka (written)

I am very pleased to announce the release of the written component of the ICE Sri Lanka (ICE-SL) corpus. The corpus is available in standard SGML format and in a POS-tagged version, using the CLAWS C7 tagset. To obtain a copy of the corpus and Manual, please email.

Release of ICE USA (written)

I am very pleased to announce the release of the written component of the
ICE-USA corpus. The corpus is now available for non-profit academic research, and can be downloaded here.