Folks,

There is now an updated version of the TDT3 Arabic Text corpus, which includes the output of machine translation into English. This release, designated version 1.1, replaces the original version (1.0) that I announced a few weeks ago.

The overall quality and vocabulary coverage of this MT output are likely to fall short of the standards already established by the Chinese-to-English MT provided in TDT2 and TDT3. For example, about 18% of the word tokens in the MT output are untranslated Arabic.

Participants who have already requested this corpus (LDC2002E32, TDT3 Arabic Text) have already been notified of the update.

If you intend to participate in the TDT-2002 evaluation and have not yet obtained this corpus, please:

- send email to ldc@ldc.upenn.edu
- mention that you are a participant in TDT-2002
- request corpus LDC2002E32 (TDT3 Arabic Text)

Those who have an LDC membership for 2001 or 2002, or who have already obtained the previously published "Arabic Newswire Text" corpus (LDC2001T55), will receive the TDT3 Arabic Text corpus without further ado.

Participants who do not have an LDC membership and have not previously purchased the larger newswire text corpus will need to submit a signed user agreement form (Ilya Ahtaridis, the LDC Membership Coordinator, will provide the appropriate form). They will also need to agree to the following additional condition:

By December 31, 2002 (i.e. at the conclusion of the TDT-2002 evaluation cycle and workshop), you must either pay for an LDC membership, OR pay a non-member purchase price for this corpus (to be determined, but typically less than the cost of a membership), OR DELETE the data from all storage media at your institution.

(This is a standard condition that permits cost-free use of copyrighted data for research participation in a sponsored evaluation program.)

The data will be delivered via ftp; instructions for ftp retrieval will be provided in response to your email request.
Dave Graff