Arabic Information Retrieval and Computational Linguistics Resources

Downloadable Software and Resources

Morphological Analyzer
A morphological analyzer and a light stemmer for Arabic, both created by Kareem Darwish at the University of Maryland.
Bilingual Dictionary
David Smith at Tufts University has provided a rekeyed Arabic/English bilingual dictionary that is out of copyright.
Parallel UN Documents
The Linguistic Data Consortium is presently seeking distribution rights for a large collection of Arabic and English translation-equivalent documents, but the collection is not yet available. In the meantime, Alex Fraser and Jinxi Xu at BBN have provided a set of translation probabilities for plausible translations discovered in that corpus.
Parallel Web Pages
An automatically assembled corpus of 2,190 translation-equivalent Web page pairs from the Internet Archive.
New Zealand Digital Library
A demonstration of monolingual Arabic IR. The software is available under the GNU General Public License.
Bilingual Term List
A very small and eclectic bilingual term list.

Standard Corpora

Agence France Presse
A collection of 380,000 newswire stories from 1994-2000 that is available from the Linguistic Data Consortium.
Al-Hayat
A collection of over 42,000 newspaper stories from 1994 that is available from the European Language Resources Distribution Agency.

Online MT Systems

Ajeeb English-Arabic Bidirectional Translation
An online service for translating between Arabic and English using the Sakhr MT system. Includes transliteration capabilities.
Al-Misbar English to Arabic Translation
An online service for translating from English into Arabic.

Online Bilingual Dictionaries

Ajeeb Arabic-English Dictionary
A Web interface that allows the Sakhr bidirectional bilingual dictionary to be queried one word at a time.
Al-Misbar Dictionary
A Web interface that allows the Al-Misbar bidirectional dictionary to be queried one word at a time.
Ectaco Bilingual Dictionary
A Web interface to a bidirectional English/Arabic bilingual dictionary.

Other Online Resources

Xerox Arabic Morphology
A Web interface developed by Ken Beesley that provides a morphological analysis for Arabic text.

Information Retrieval Evaluations

TDT
The Topic Detection and Tracking evaluation, which in 2002 will include Arabic documents from a subset of the LDC AFP corpus. Relevance judgments are available for a set of topics that are defined using example documents (rather than topic descriptions).
TREC
The TREC-2002 CLIR track is developing a large Arabic IR test collection based on the LDC AFP Arabic corpus, with topic descriptions in English and Arabic and relevance judgments. A Web page and papers from the TREC-2001 CLIR track (which included English, French and Arabic topic descriptions for the same collection) are also available.

Computational Linguistics Workshops

There is a continuing sequence of Arabic computational linguistics workshops that meet occasionally in Europe, North Africa, or the Middle East (sometimes in conjunction with a major conference). There is also a recurring workshop series on Computational Approaches to Semitic Languages that meets in some years in conjunction with the Association for Computational Linguistics conferences and typically includes extensive treatment of Arabic.
Computational Approaches to Semitic Languages
A workshop held in Montreal in August, 1998 in conjunction with the joint COLING/ACL conference.
ATLAS
The Arabic Translation and Localization Symposium, held in Tunis in May, 1999.
Arabic Language Resources and Evaluation
A workshop held in conjunction with the Language Resources and Evaluation Conference (LREC) in the Canary Islands in May, 2002. A list of papers is also available.
Computational Approaches to Semitic Languages
A workshop held in Philadelphia in July, 2002 in conjunction with the Association for Computational Linguistics conference.

Research Groups

BBN
A presentation by Verizon BBN Technologies that describes their plans to develop an Arabic information retrieval system (accessible through the agenda).
Cairo University
A mention of an Arabic morphological analyzer developed by Khaled Shaalan.
Dalhousie University
The home page of Haidar Moukdad.
De Montfort University
A description of a research project on Arabic information retrieval being conducted by a student working with Kamal Bechkoum.
Georgetown University
Catherine Ball's Information Alchemy project for Arabic-English Translingual Information Retrieval. The Arab Information Project at Georgetown is also a source of insight into potential corpora for Arabic IR research.
Illinois Institute of Technology
One of the participating groups in the TREC Arabic CLIR track. Martha Evens' Computational Lexicography group has also published many papers on Arabic information retrieval and computational linguistics.
IRMC
A brief mention (at the end) of a project by Fathi Debili of the Tunisian Institut de Recherche sur le Maghreb Contemporain and the French CNRS Centre d'Etudes des Langues et Littératures du Monde Arabe (CELLMA) to automatically construct a French-Arabic bilingual dictionary.
KACST
Work on Arabic information retrieval and Arabic computational linguistics at the King Abdulaziz City for Science and Technology. A paper by Ibrahim Al-Kharashi is also available.
Lancaster University
Work on Arabic stemming and part-of-speech tagging by Shereen Khoja.
Nara Institute of Science and Technology
A description of interest in Arabic/English/French cross-language information retrieval by Fatiha Sadat.
New Mexico State University
A project that is developing Arabic-English cross-language information retrieval techniques. Information about the Temple Project is also available, including additional details about Arabic morphology and Arabic-English machine-readable dictionaries.
RDI
An Egyptian company that develops Arabic IR systems and works on Arabic computational morphology.
SRA International
A gzipped PostScript paper describing the TAGARAB named entity tagger developed by John Maloney and Michael Niv using the SRA NetOwl TurboTag system.
University of Bergen
A project that is exploring user needs for Arabic information retrieval.
Open University
An Arabic information retrieval project by Anne DeRoeck and others.
University of Greenwich
Work on Arabic question answering systems by Ahmed Yamani.
University of Maryland
A project that is developing Arabic-English cross-language information retrieval techniques.
USC-ISI
The GAZELLE project led by Kevin Knight of the University of Southern California Information Sciences Institute that developed Arabic to English translation technology.
Directory of Informatics Experts
Contact information for informatics experts in Arab States, provided by UNESCO. A similar list for the Ecole des sciences de l'information in Rabat, Morocco is also available.

Other Arabic Resources

Arabic Language Computing
Some links collected by Hachim Haddouti.
Arabic Text Corpora
Advice on where to find Arabic text corpora from the University of Edinburgh.
Linguistic Data Consortium
A description of an unprocessed corpus of Arabic newswire text.
Arabic Lexicography
A useful set of resources from Tim Buckwalter.

Companies

AppTek
A company that sells an Arabic to English MT system and is building an English to Arabic system.
Aramedia
A comprehensive source for software that is designed for the Arabic market. Several machine translation systems and online dictionaries are described.
Sakhr
The leading maker of Arabic software, including the Bidi bidirectional English/Arabic MT system and the Arab Dox information retrieval system (which is designed to work with scanned document images).

Doug Oard
Last modified: Fri Jul 19 19:33:43 2002