
Ideal Experiment Design

Evaluation of the three adaptive multilingual text filtering systems has been our greatest challenge. Experiments of the type we are conducting require a document collection for which relevance judgments are available, so it would be ideal if a large collection existed in which every document has versions in two languages and relevance judgments with respect to a number of standardized topics. The United Nations collection that we have used for language training satisfies the first part of this requirement, but there are no standard topics (and hence no relevance judgments) defined for that collection.

While we would ultimately like to provide users with systems that exploit new training data incrementally, when evaluating the filtering effectiveness of the algorithms themselves it suffices to introduce an artificial division between the construction of a profile and the use of that profile to rank order documents. This amounts to setting the "initial profile" in Figures 1, 2 and 4 to an empty vector, passing the known relevant documents in the profile training collection through the system to develop a profile, and then freezing this profile and using it to rank order the documents in the evaluation collection. If we had access to the sort of ideal test collection described above, we could easily create language training, profile training, and evaluation collections by simply partitioning the collection three ways. Three partitions are needed to perform a fair experiment because neither the profile training collection nor the evaluation collection can fairly be used for language training: in practical applications, cross-language techniques would simply not be needed if the documents in either of these partitions were already available in both languages. Of course, relevance judgments are not needed for the documents in the language training partition.
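The protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the systems evaluated here: the function names (`partition_three_ways`, `train_profile`, `rank`), the use of raw term counts as the profile, and cosine similarity as the ranking score are all hypothetical choices made to show the three-way split and the train, freeze, and rank sequence.

```python
import math
import random
from collections import Counter


def partition_three_ways(doc_ids, seed=0):
    """Split document ids into three disjoint partitions:
    language training, profile training, and evaluation."""
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is repeatable
    third = len(ids) // 3
    return ids[:third], ids[third:2 * third], ids[2 * third:]


def train_profile(docs, relevant_ids):
    """Start from an empty profile vector and accumulate term counts
    from each known relevant document (an assumed profile model)."""
    profile = Counter()  # the empty "initial profile"
    for doc_id in relevant_ids:
        profile.update(docs[doc_id].lower().split())
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(docs, profile, eval_ids):
    """Rank evaluation documents against the frozen profile.
    The profile is read but never updated here."""
    scores = {d: cosine(profile, Counter(docs[d].lower().split()))
              for d in eval_ids}
    return sorted(eval_ids, key=lambda d: scores[d], reverse=True)
```

In this sketch the language training partition is returned but never passed to `train_profile` or `rank`, which mirrors the requirement that it stay separate from the other two partitions and that it needs no relevance judgments.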



Douglas W. Oard
Tue May 13 20:29:24 EDT 1997