Doug Oard's Research Page


Current Research Projects


Research Directory Pages

Community-wide resources on subjects that interest me. Most of these pages are not actively maintained, so they are best thought of as a snapshot of what a field looked like near the time I first built the page.

Workshop Home Pages


Journal Articles

I have signed over the copyright for these articles to the publishers, so they are not available here unless the copyright agreement specifically allows that. These are usually the best sources on the indicated topics, since they have generally been through rigorous peer review.

Book Chapters

I have signed over the copyright for these book chapters to the publishers, so they are not available here unless the copyright agreement specifically allows that.

Selected Papers

This is a mix of peer-reviewed and unrefereed conference and workshop papers, organized by subject. Most of these papers are available as postscript; some are in other formats. Coauthors are indicated on the papers themselves. Some older papers that might have historical value are also available.

Cross-Language Retrieval Overviews

Towards Analysis Tools for a Multilingual Blogosphere
Presented at the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs at Stanford in 2006.
Language Technologies for Scalable Digital Libraries
Presented as an invited paper at the 2004 International Conference on Digital Libraries in New Delhi, India in February 2004. Also available as Microsoft Word. The Powerpoint slides contain extensive notes from the presentation.
When You Come to a Fork in the Road, Take It!
Presented as the keynote address at the SIGIR 2003 Workshop on the future of Cross-Language Information Retrieval research in Tampere, Finland, August, 2003.
Global Access to Multilingual Information
Presented as the keynote address at the Fourth International Workshop on Information Retrieval with Asian Languages, Taipei, Taiwan, in November 1999. The Powerpoint slides from my talk are also available.
Multilingual Information Discovery and AccesS (MIDAS): A Joint ACM DL '99 / ACM SIGIR '99 Workshop
A workshop report that appeared in the October, 1999 issue of D-Lib Magazine.
Extending Cross-Language Information Retrieval to Global Scale
Paper presented at the Workshop on Multilingual Information Management, Granada, Spain, May, 1998.
Serving Users In Many Languages: Cross-Language Information Retrieval for Digital Libraries
A description of the need for and approaches to Cross-Language IR that appeared in the December 1997 issue of D-Lib Magazine.
A Survey of Multilingual Text Retrieval
A technical report that presents my first survey of present practice in retrieval of texts in one language based on queries in another. My later surveys are much better presented, but this one has a tremendous amount of detail that is not present in the others.

Interactive Cross-Language Retrieval

Task-Based Interaction with an Integrated Multilingual, Multimedia Information System: A Formative Evaluation
Presented at JCDL 2007.
iCLEF 2004 at Maryland: Summarization Design for Interactive Cross-Language Question Answering
Presented at the CLEF-2004 Workshop in Bath, UK in September, 2004.
iCLEF-2003 at Maryland: Headline Generation and Interactive Query Formulation
Presented at the CLEF-2003 Workshop in Trondheim, Norway in August, 2003. This is the final version, updated after the workshop. A postscript version and the Powerpoint slides from the presentation are also available.
Comparing User-Assisted and Automatic Query Translation
Presented at the CLEF-2002 Workshop in Rome, Italy in September, 2002. This is the final version, updated after the workshop. The Powerpoint slides from the presentation are also available.
iCLEF 2001 at Maryland: Comparing Word-for-Word Gloss and MT
Presented at the CLEF-2001 Workshop in Darmstadt, Germany in September, 2001. This is the final version, updated after the workshop. A postscript version is also available.
Interactive Cross-Language Information Retrieval
An article that appeared in the Spring 2001 issue of the SIGIR Forum.
Rapidly Retargetable Interactive Translingual Retrieval
A paper presented at the Human Language Technologies (HLT) conference in San Diego in March, 2001.
Evaluating Interactive Cross-Language Information Retrieval: Document Selection
A paper that appeared in the Proceedings of the first Cross Language Evaluation Forum (Lisbon, September, 2000) as part of Lecture Notes in Computer Science in 2001. This paper combines ideas that I first presented at a panel at RIAO 2000 and in an invited presentation at the CLEF workshop, and it is intended to serve as a basis for developing evaluation strategies for interactive CLIR systems.
TREC-9 Experiments at Maryland: Interactive CLIR
The final paper for the TREC-9 conference that was held in Gaithersburg, MD on November 15, 2000. This version supersedes the working notes paper that was distributed at the conference.

Cross-Language Text Retrieval Algorithms

Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval (PDF)
Presented at SIGIR 2006 in Seattle.
Cross-Language Text Classification
Presented as a poster at SIGIR 2005 in Salvador, Brazil.
Answering Spanish Questions from English Documents
Presented at the Cross-Language Evaluation Forum (CLEF) in Trondheim, Norway in August, 2003. The paper is also available as postscript.
Rapid-Response Machine Translation for Unexpected Languages
Presented at MT Summit in New Orleans in September, 2003.
Probabilistic Structured Query Methods
Presented at SIGIR in Toronto in July, 2003.
Desperately Seeking Cebuano
A short paper presented as a poster and a panel presentation at HLT/NAACL 2003 in Edmonton, Canada in May, 2003. This was written on the third day of the TIDES Surprise Language dry run.
The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval
Presented at the Hawaii International Conference on System Sciences in Hawaii in January 2003.
CLIR Experiments at Maryland for TREC-2002: Evidence Combination for Arabic-English Retrieval
Presented at the Text Retrieval Conference in Gaithersburg, MD in November, 2002.
Translation-Based Indexing for Cross-Language Retrieval
A paper presented at the European Colloquium on Information Retrieval in Glasgow in March, 2002. The slides from the talk are also available, as is an audio recording of the talk itself. The recording starts with a question to the previous speaker, so fast forward it a bit.
Translation Lexicon Acquisition from Bilingual Dictionaries
A paper on CLIR based on scanning printed bilingual dictionaries that was presented at the SPIE Workshop on Document Recognition and Retrieval in San Jose, CA in January 2002. This is a zipped Microsoft Word document.
TREC-10 Experiments at Maryland: CLIR and Video
Presented at the Text Retrieval Conference in Gaithersburg, MD in November, 2001.
Improved Cross-Language Retrieval Using Backoff Translation
A paper on French/English CLIR presented at the First International Conference on Human Language Technologies in San Diego in March, 2001.
NTCIR-2 ECIR Experiments at Maryland: Comparing Structured Queries and Balanced Translation
A paper on English/Chinese CLIR presented at the Second National Institute of Informatics (NII) Test Collection Information Retrieval (NTCIR) workshop. Also available as PDF.
CLEF Experiments at Maryland: Statistical stemming and backoff translation
Prepared for the first Cross Language Evaluation Forum in Lisbon in September, 2000. This is the final version of the paper, which will appear in Lecture Notes in Computer Science in 2001.
Comparison of Word-Based and Syllable-Based Retrieval for Tibetan
Presented as a poster at the Information Retrieval for Asian Languages Workshop in Hong Kong in September, 2000. The paper is also available in Microsoft Word format.
Structured Translation for Cross-Language Information Retrieval
Presented at the SIGIR Conference in Athens in July, 2000. A gzip compressed version is also available.
Evaluating Lexicon Coverage for Cross-Language Information Retrieval
Presented as a poster at the Workshop on Multilingual Information Processing and Asian Language Processing, Beijing, China, in November 1999. One table formatting error is corrected in this version. The poster is also available as PDF, postscript or Powerpoint.
Resources for Chinese/English Cross-Language IR
A draft version of a technical report identifying useful resources. A better formatted Microsoft Word 97 version is also available.
A Comparative Study of Query and Document Translation for Cross-Language Information Retrieval
Paper presented at the Third Conference of the Association for Machine Translation in the Americas (AMTA), Philadelphia, PA, October, 1998.
Adaptive Filtering of Multilingual Document Streams
Presented at the Fifth RIAO Conference on Computer Assisted Information Searching on the Internet, Montreal, Canada, June 1997. A postscript version is also available. There is also an errata sheet that contains the results for a fourth topic pair that I referred to at the conference.
Adaptive Vector Space Text Filtering for Monolingual and Cross-Language Applications
A compact version of my August 1996 Ph.D. dissertation. The official version (which has different page numbering) is also available as uncompressed postscript.

iCLEF Track Overviews

iCLEF 2004 Track Overview: Interactive Cross-Language Question Answering
The Cross-Language Evaluation Forum interactive track overview, which was presented at the CLEF-2004 Workshop in Bath, UK in September, 2004.
The CLEF-2003 Interactive Track
The Cross-Language Evaluation Forum interactive track overview, which was presented at the CLEF-2003 Workshop in Trondheim, Norway in August, 2003. Also available as postscript. This is the updated version, revised after the workshop. The Powerpoint slides presented at the workshop are also available.
The CLEF-2002 Interactive Track
The Cross-Language Evaluation Forum interactive track overview, which was presented at the CLEF-2002 Workshop in Rome, Italy in September, 2002. This is the version prepared before the workshop. The Powerpoint slides presented at the workshop are also available.
The CLEF 2001 Interactive Track
The Cross-Language Evaluation Forum interactive track overview, which was presented at the CLEF-2001 Workshop in Darmstadt, Germany in September, 2001. Also available as postscript. This is the updated version, revised after the workshop. The Powerpoint slides presented at the workshop are also available.

TREC CLIR Track Overviews

The TREC-2002 Arabic-English CLIR Track
The Text Retrieval Conference Cross-Language Information Retrieval track overview paper, presented in Gaithersburg, MD in November, 2002. This is the updated version, revised after the conference. The Powerpoint slides presented at the conference are also available.
Evaluating Arabic Retrieval from English or French Queries
Presented at the Language Resource Evaluation Conference (LREC) 2002 Workshop on Arabic Language Resources and Evaluation in Las Palmas, Spain, in May 2002. Powerpoint slides from the talk are also available.
The TREC-2001 Cross-Language Information Retrieval Track: Searching Arabic Using English, French, or Arabic Queries
The Text Retrieval Conference Cross-Language Information Retrieval track overview paper, presented in Gaithersburg, MD in November, 2001. This is the updated version, revised after the conference.
The TREC-2001 Arabic Information Retrieval Evaluation
Presented at the Association for Computational Linguistics (ACL) Workshop on Arabic Language Processing in Toulouse, France in July 2001. Also available as postscript.

Interactive Speech Retrieval

An Interface to Search Human Movements Based on Geographic and Chronological Metadata
Presented as a poster at SIGIR 2005 in Salvador, Brazil.
First Steps Towards Linking Dialogues: Mediating Between Free-Text Questions and Pre-Recorded Video Answers
Presented at the Army Science Conference in Orlando, FL in 2004.
Searching Large Collections of Recorded Speech: A Preliminary Study
Presented at the American Society for Information Science and Technology in Long Beach, CA in November, 2003. A Microsoft Word version is also available.
Supporting Access to Large Digital Oral History Archives
Presented at the Joint Conference on Digital Libraries (JCDL) in Portland, OR in June 2002.
The Use of Speech Retrieval Systems: A Study Design
A paper presented at the SIGIR 2001 Workshop on IR Techniques for Speech Applications in New Orleans in September. A Microsoft Word version is also available.
User Interface Design for Speech-Based Retrieval
A paper that appeared in the Bulletin of the American Society for Information Science, 26(5):20-22, June/July 2000, based on a presentation at the November, 1999 Annual Conference of the American Society for Information Science.
A Graphical Interface for Speech-Based Retrieval
Paper presented at the Third ACM Conference on Digital Libraries, Pittsburgh, PA, June 1998. The paper is also available in rtf format.

Speech Retrieval Algorithms

CLEF-2006 CL-SR at Maryland: English and Czech (PDF)
Presented at the Cross-Language Evaluation Forum in Alicante, Spain in 2006. This is the pre-conference working notes version; the post-conference version in Springer's Lecture Notes in Computer Science contains important additional details.
One-Sided Measures for Evaluating Ranked Retrieval Effectiveness with Spontaneous Conversational Speech (PDF)
Presented as a poster at SIGIR 2006 in Seattle.
Investigating Cross-Language Speech Retrieval for a Spontaneous Conversational Speech Collection (PDF)
Presented as a poster at HLT-NAACL 2006 in New York City. This version contains corrections to the published version, in which some incorrect scores were reported. The scores in this version were computed with the latest version of trec_eval, which detects duplicates in ranked lists.
CLEF-2005 CL-SR at Maryland: Document Expansion and Query Expansion Using Side Collections and Thesauri (PDF)
Presented at the Cross-Language Evaluation Forum in Vienna, Austria in 2005.
Building an Information Retrieval Test Collection for Spontaneous Conversational Speech
Presented at SIGIR 2004 in Sheffield, UK in June 2004.
Searching Recorded Speech Based on the Temporal Extent of Topic Labels
Presented at the AAAI Spring Symposium on Intelligent Multimedia Knowledge Management at Stanford University in March, 2003. This is the initial submitted version, not the final published version.
TDT-2002 Topic Tracking at Maryland: First Experiments with the Lemur Toolkit
Presented at the TDT-2002 workshop in Gaithersburg, MD in November, 2002.
Cross-Language Access to Recorded Speech in the MALACH project
Presented at the Text, Speech and Dialog (TSD) conference in Brno, Czech Republic in September 2002. Powerpoint slides from the talk are also available.
Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval
Paper presented at the Human Language Technologies (HLT) conference in San Diego in March, 2001. This reports on a 6-week summer workshop at the Johns Hopkins University in July-August, 2000. Additional details are available from the workshop report that was prepared in Fall 2000, the Powerpoint slides from the August 2000 workshop final presentation, and a talk I gave at Queens College in October, 2000. Pre-workshop planning papers are also available from the NAACL Workshop on Embedded Machine Translation in Seattle in May, 2000 (a Microsoft Word version of the paper is also available) and from the TDT-3 Workshop in Tysons Corner, VA in February, 2000 (also available as gzip postscript, Microsoft Word, and gzip Microsoft Word). The slides from the TDT-3 presentation are also available.
Translingual Topic Tracking: Applying Lessons from the MEI Project
The working notes paper that was presented at the TDT workshop in Gaithersburg, MD on November 17, 2000.
Translingual Topic Tracking With PRISE
Presented at the TDT-3 Workshop in Tysons Corner, VA in February, 2000. The paper is also available as gzip postscript. The slides are also available.

CLEF Cross-Language Speech Retrieval Track Overviews

Overview of the CLEF-2006 Cross-Language Speech Retrieval Track
Presented at the Cross-Language Evaluation Forum in Alicante, Spain in September, 2006. This is the pre-conference working notes version; the post-conference version in Springer's Lecture Notes in Computer Science contains important additional details.
Overview of the CLEF-2005 Cross-Language Speech Retrieval Track
Presented at the Cross-Language Evaluation Forum in Vienna, Austria in September, 2005.

Document Image Retrieval

Balanced Query Methods for OCR-Based Retrieval
A paper presented at the 2003 Symposium on Document Image Understanding Technology in Columbia, MD.
Term Selection for Searching Printed Arabic
A paper to be presented at SIGIR 2002 in Tampere, Finland.
Document Image Retrieval Techniques for Chinese
A paper presented at the 2001 Symposium on Document Image Understanding Technology in Columbia, MD.
Issues in Cross-Language Retrieval from Document Image Collections
Presented at the 1999 Symposium on Document Image Understanding Technology (SDIUT), Annapolis, MD, April 1999. Two errors in the references have been corrected in this version. The Powerpoint slides from my talk are also available.

Archival Access to Email Collections

Modeling Identity in Archival Collections of Email: A Preliminary Study (PDF)
Presented at the Conference on Email and Anti-Spam, Mountain View, CA, 2006.
An Exploratory Study of the W3C Mailing List Test Collection for Retrieval of Emails with Pro/Con Arguments (PDF)
Presented at the Conference on Email and Anti-Spam, Mountain View, CA, 2006.
A Menagerie of Tracks at Maryland: HARD, Enterprise, QA, and Genomics, Oh My!
Presented at the Text Retrieval Conference in Gaithersburg, MD in 2005.
Indexing Emails and Email Threads for Retrieval
Presented as a poster at SIGIR 2005 in Salvador, Brazil.
eArchivarius: Accessing Collections of Electronic Mail
Presented as a demonstration at SIGIR 2003 in Toronto.

Recommender Systems

Measuring the Utility of Gaze Detection for Task Modeling: A Preliminary Study
Presented at the Intelligent User Interfaces Conference Workshop on Intelligent User Interfaces for Intelligence Analysis in Sydney, Australia in January, 2006.
On Evaluation of Adaptive Topic Tracking Systems
Presented as a poster at SIGIR 2005 in Salvador, Brazil.
TDT-2004: Adaptive Topic Tracking at Maryland
Presented at the Topic Detection and Tracking Workshop in Gaithersburg, MD in 2004.
Exploring Interactive Relevance Feedback with a Two-Pass Study Design
A technical report analyzing the design of the TREC HARD track's clarification forms from the perspective of user study design.
Protecting the Privacy of Observable Behavior in Distributed Recommender Systems
Presented at the SIGIR Workshop on Implicit Methods in Toronto in July 2003.
Modeling Information Content Using Observable Behavior
Presented at the November 2001 conference of the American Society for Information Science and Technology in Washington DC. Also available as Microsoft Word. Powerpoint slides are also available.
User Modeling for Information Access Based on Implicit Feedback
Presented at the ISKO France Workshop on Information Filtering in Paris in July 2001.
Implicit Feedback for Recommender Systems
A paper that appeared in the Proceedings of the AAAI Workshop on Recommender Systems, Madison, WI, July, 1998. The paper is also available in Microsoft Word format.

Other Papers

Combining Feature Selectors for Text Classification
Presented as a poster at the Conference on Information and Knowledge Management in Arlington, VA in 2006.
Integration of Natural Language with Structured Data: Three Test Collections
A position paper presented at the Information Integration Workshop, Philadelphia, PA, 2006.
Improving Passage Retrieval Using Interactive Elicitation and Statistical Modeling
Presented as a poster at the Text Retrieval Conference in Gaithersburg, MD in 2004.
Extrinsic Evaluation of Automatic Metrics for Summarization
A technical report on summarization evaluation.
Genomic Entity Recognition at TREC
A proposal presented at the JCDL TREC Genomics Pre-Track Workshop in Portland, OR in June 2002. A postscript version is also available.
A Survey of Information Retrieval and Filtering Methods
A technical report I coauthored containing a broad survey of recent research on techniques for information filtering and retrieval. This survey was never published, but for some reason it draws far more inquiries than any of my other papers. I guess that goes to show that it's worth it to choose a compelling title!

Selected Posters

Interactive Translingual Searching Using Document Expansion
Presented at the First Conference of the North American Chapter of the Association for Computational Linguistics in Seattle in May 2000. Also available as PDF and postscript.
LCS-Based English-Chinese Translingual Retrieval
Presented at the First Conference of the North American Chapter of the Association for Computational Linguistics in Seattle in May 2000. Also available as PDF and postscript.

Selected Presentations

Speaking to the Future
Slides from a presentation on the MALACH project on November 6, 2003. A press release describing the talk is also available.
The TIDES Surprise Language Exercises
Slides from a talk presented at MITRE on October 23, 2003.
Searching Spoken Word Collections
Slides from a talk at Columbia University on October 16, 2003.
IR Systems as Integration Platforms for Language Technologies
Powerpoint slides from a tutorial at the HLT 2003 conference on May 27, 2003.
The Cross-Language Evaluation Forum (CLEF) 2001 Interactive Track
A presentation for the Human-Computer Interaction Laboratory Workshop on Evaluation of Interactive Cross-Language Information Retrieval on May 31, 2001.
IRAL/ACL 00 Tutorial
Powerpoint slides for a joint tutorial that I presented for the Workshop on Information Retrieval in Asian Languages and the Conference of the Association for Computational Linguistics in Hong Kong on October 2, 2000.
Throwing the Book at Digital Libraries: What's Around the Corner?
Presented at the Annual Conference of the Maryland Library Association, Baltimore, MD, May 2000.
User Interface Design for Content-Based Audio Retrieval
Powerpoint slides presented at the Johns Hopkins University Center for Speech and Language Processing on March 23, 1999.
SIGIR 97 Tutorial on Cross-Language Information Retrieval
Slides from a 4-hour tutorial that include cross references into the associated bibliography. This is a fairly easy way to pick out one or two useful references if you know what you are looking for, and a good source for my present thinking on the organization of the field. The slides are also available in Powerpoint version 4.0.
Evaluating Corpus-based Cross-language IR Effectiveness
The handout from my presentation at the SIGIR 97 Workshop on Crosslingual Information Retrieval. The full size slides are also available. The reference to known item searching in TREC 6 on these slides is incorrect.

Edited Works

ACM TALIP Special Issue on the TIDES Surprise Language
A pair of special issues (June and September 2003) of the ACM Transactions on Asian Language Information Processing that I edited. Membership in the ACM Digital Library is needed to access the articles.
Team TIDES Newsletter
The newsletter for the DARPA Translingual Information Detection, Extraction and Summarization (TIDES) program. I edited the first two issues (December 2002 and April 2003) and helped out with the third (October 2003). The April 2003 and October 2003 issues contain articles that I wrote about the surprise language exercises.

Research Software

Some software that I have developed for my research projects can be downloaded from a page that describes the available files.
Last modified: Thu May 31 01:15:34 2007
Doug Oard oard@glue.umd.edu