Arabic Information Retrieval and Computational Linguistics Resources
Downloadable Software and Resources
- Morphological
Analyzer
- A morphological analyzer and a light stemmer for Arabic, both
created by Kareem Darwish at the University of Maryland.
- Bilingual
Dictionary
- David Smith at Tufts University has provided a rekeyed
Arabic/English bilingual dictionary that is out of copyright.
- Parallel UN Documents
- The Linguistic Data Consortium is presently seeking
distribution rights for a large collection of Arabic and
English translation-equivalent documents, but the collection is not yet
available. In the meantime, Alex Fraser and Jinxi Xu at BBN
have provided a set of translation
probabilities for plausible translations discovered
in that corpus.
- Parallel Web
Pages
- An automatically assembled corpus of 2,190 translation-equivalent
Web page pairs from the Internet Archive.
- New
Zealand Digital Library
- A demonstration of monolingual Arabic IR. The software is
available under the GNU General Public License.
- Bilingual
Term List
- A very small and eclectic bilingual term list.
Standard Corpora
- Agence France
Presse
- A collection of 380,000 newswire stories from 1994-2000 that is
available from the Linguistic Data Consortium.
- Al-Hayat
- A collection of over 42,000 newspaper stories from 1994 that is
available
from the European Language Resources
Distribution Agency.
Online MT Systems
- Ajeeb
English-Arabic Bidirectional Translation
- An online service for translating between Arabic and English using the
Sakhr MT system. Includes transliteration capabilities.
- Al-Misbar
English to Arabic Translation
- An online service for translating from English into Arabic.
Online Bilingual Dictionaries
- Ajeeb Arabic-English
Dictionary
- A Web interface that allows the Sakhr bidirectional bilingual
dictionary to be queried one word at a time.
- Al-Misbar Dictionary
- A Web interface that allows the Al-Misbar bidirectional
dictionary to be queried one word at a time.
- Ectaco
Bilingual Dictionary
- A Web interface to a bidirectional English/Arabic bilingual
dictionary.
Other Online Resources
- Xerox
Arabic Morphology
- A Web interface developed by Ken
Beesley that provides a morphological analysis for
Arabic text.
Information Retrieval Evaluations
- TDT
- The Topic Detection and Tracking evaluation, which in 2002 will
include Arabic documents from a subset of the LDC AFP corpus.
Relevance judgments are available for a set of topics that are defined
using example documents (rather than topic descriptions).
- TREC
- The TREC-2002 CLIR track is developing a large Arabic IR
test collection based on the LDC AFP Arabic corpus, with topic
descriptions in English and Arabic and relevance judgments. A Web page and papers from the TREC-2001 CLIR
track (which included
English, French and Arabic topic descriptions for the same
collection) are also available.
Computational Linguistics Workshops
There is a continuing sequence of Arabic computational linguistics
workshops that meet occasionally in Europe, North Africa or the Middle
East (sometimes in conjunction with a major conference). There is also
a recurring workshop series on Computational Approaches to Semitic
Languages, which meets in some years in conjunction with the Association
for Computational Linguistics conference and typically includes
extensive treatment of Arabic.
- Computational
Approaches to Semitic Languages
- A workshop held in Montreal in August, 1998 in conjunction with
the joint COLING/ACL conference.
- ATLAS
- The Arabic Translation and Localization Symposium, held in
Tunis in May, 1999.
- Arabic
Language Resources and Evaluation
- A workshop held in conjunction with the Language Resources
and Evaluation Conference (LREC) in the Canary Islands in May, 2002. A list
of papers is also available.
- Computational Approaches
to Semitic Languages
- A workshop held in Philadelphia in July, 2002 in conjunction
with the Association for Computational Linguistics conference.
Research Groups
- BBN
- A presentation by Verizon BBN Technologies that describes their
plans to develop an Arabic information retrieval system
(accessible through the agenda).
- Cairo
University
- A mention of an Arabic morphological analyzer developed by
Khaled Shaalan.
- Dalhousie University
- The home page of Haidar Moukdad.
- De Montfort University
- A description of a research project on Arabic information retrieval
being conducted by a student working with Kamal Bechkoum.
- Georgetown
University
- Catherine Ball's Information Alchemy project for Arabic-English
Translingual Information Retrieval. The Arab
Information Project at Georgetown
is also a source of insight into potential corpora for Arabic
IR research.
- Illinois Institute of Technology
- One of the participating groups in the TREC Arabic
CLIR track. Martha
Evens' Computational Lexicography group has also published many papers
on Arabic information retrieval and computational linguistics.
- IRMC
- Brief mention (at the end) of a project by Fathi
Debili of the Tunisian Institut de Recherche sur le
Maghreb Contemporain and the French CNRS Centre d'Etudes des
Langues et Littératures
du Monde Arabe (CELLMA) to automatically construct a
French-Arabic bilingual dictionary.
- KACST
- Work on Arabic information retrieval and Arabic computational
linguistics at
the King Abdulaziz City for Science and Technology. A paper by
Ibrahim Al-Kharashi is also available.
- Lancaster
University
- Work on Arabic stemming and part-of-speech tagging by Shereen
Khoja.
- Nara Institute of Science and Technology
- A description of interest in Arabic/English/French
cross-language information retrieval by Fatiha
Sadat.
- New
Mexico State University
- A project that is developing Arabic-English cross-language
information retrieval techniques. Information about the Temple Project
is also available, including additional
details about Arabic morphology and Arabic-English
machine-readable dictionaries.
- RDI
- An Egyptian company that develops Arabic IR systems and works
on Arabic computational morphology.
- SRA
International
- A gzipped PostScript paper describing the TAGARAB named entity
tagger developed by
John Maloney and Michael Niv using the SRA
NetOwl TurboTag system.
- University
of Bergen
- A project that is exploring user needs for Arabic information
retrieval.
- Open
University
- An Arabic information retrieval project by Anne De Roeck and
others.
- University
of Greenwich
- Work on Arabic question answering systems by Ahmed Yamani.
- University of
Maryland
- A project that is developing Arabic-English cross-language
information retrieval techniques.
- USC-ISI
- The GAZELLE project led by Kevin
Knight of the
University of Southern California Information Sciences Institute that
developed Arabic to English translation technology.
- Directory
of Informatics Experts
- Contact information for informatics experts in Arab States that is
provided by UNESCO. A similar list for Ecole
des sciences de l'information in Rabat, Morocco is also available.
Other Arabic Resources
- Arabic
Language Computing
- Some links collected by Hachim
Haddouti.
- Arabic
Text Corpora
- Advice on where to find Arabic text corpora from the University
of Edinburgh.
- Linguistic
Data Consortium
- A description of an unprocessed corpus of Arabic newswire
text.
- Arabic
Lexicography
- A useful set of resources from Tim
Buckwalter.
Companies
- AppTek
- A company that sells an Arabic to
English MT system and is building an English to Arabic system.
- Aramedia
- A comprehensive source for software that is designed for the
Arabic market. Several machine translation systems and online
dictionaries are described.
- Sakhr
- The leading maker of Arabic software, including the Bidi
bidirectional English/Arabic MT system and the Arab Dox
information retrieval system (which is designed to work
with scanned document images).
Doug Oard
Last modified: Fri Jul 19 19:33:43 2002