The corpora in ICE are being annotated at various levels to enhance their value in linguistic research.
These levels are textual markup, wordclass tagging, and syntactic parsing.
Spoken texts are transcribed orthographically, and are marked for pauses, overlapping strings, discourse phenomena such as false starts and hesitations, and speaker turns.
The markup manual is available here.
The tagging manual is available here.
For more details about the grammatical annotation, see the Quick Guide to the ICE-GB Grammar (on the UCL server).
The parse trees are edited and corrected, if necessary, using a version of ICECUP, a dedicated syntactic tree editor and retrieval system which has been developed specifically for ICE.
In addition to the annotation levels above, some ICE teams will digitize their sound recordings, aligning them with the orthographic transcriptions. The British team has completed the digitization and alignment stage. The American team is adding detailed prosodic annotations to their transcriptions of spoken texts.
© 2009 The ICE Project