Next: Future Work Up: Adaptive Filtering of Previous: Use of Existing

Results

Since our primary objective is to determine the relative performance of three techniques, we have chosen an effectiveness measure focussed on the portion of the ranked lists produced by the three methods that is likely to show the greatest differences. We expect to find the largest absolute differences in the quality of the ranking near the top of the list. Since this is the portion of the list that is most likely to be examined by users in interactive applications, such an effectiveness measure also yields some insight into the absolute performance that users might observe in such circumstances. The effectiveness measure we have chosen to report is the precision (the fraction of the documents examined so far that are relevant) at a fixed value of recall (0.1, the point at which 10% of the known relevant documents have been seen). In our experiments, this recall of 0.1 is achieved after 35, 36 or 8 relevant documents (for topics SP22, SP25 and SP47 respectively) have been found.
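As an illustration of the measure described above, the following minimal sketch computes precision at a fixed recall level from a ranked list of relevance judgments. It is not the evaluation code used in our experiments: the function name, the input representation (a list of booleans in rank order) and the rounding of the recall threshold up to a whole number of relevant documents are all assumptions made for this example.

```python
import math


def precision_at_recall(ranked_relevance, total_relevant, recall_level=0.1):
    """Return the precision at the rank where `recall_level` is first reached.

    ranked_relevance: booleans in rank order (True = known relevant document).
    total_relevant:   number of known relevant documents for the topic.
    Returns None if the ranked list never reaches the requested recall.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    # Assumption: the recall point is the smallest whole number of relevant
    # documents that reaches the requested recall. The small epsilon guards
    # against floating-point noise in the product.
    needed = max(1, math.ceil(recall_level * total_relevant - 1e-9))
    found = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            found += 1
            if found == needed:
                # Precision: relevant documents seen / documents examined.
                return found / rank
    return None


# Hypothetical judgments, not data from our runs: with 20 known relevant
# documents, recall 0.1 is reached at the second relevant document.
example = [True, False, True, False, False]
print(precision_at_recall(example, total_relevant=20))
```

In this hypothetical list the second relevant document appears at rank 3, so two of the three documents examined at that point are relevant.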

We used the SMART text retrieval system, modified locally to include the LSI-mean technique, for our experiments. We substituted the morphological roots provided by the Rank Xerox morphological tagger for SMART stemming because in future Vector Translation experiments we plan to exploit the resulting compatibility with the structure of existing bilingual dictionaries. For the runs reported in this section we built the document vectors using individual words and did not take advantage of the phrases or part-of-speech tags added by Rank Xerox. We have not yet completed the runs for topic 290 because of a delay in obtaining translated versions of the relevant documents that were processed under conditions identical to those used in the remainder of the runs.

Table 3 shows the precision at 0.1 recall for the three cross-language text filtering techniques, Cross-Language Latent Semantic Indexing (CL-LSI), Vector Translation (VT) and Text Translation (TT). The remaining four columns provide performance bounds that are useful for comparison. The ``chance'' column provides a theoretical lower bound on performance, showing the precision that would be expected at any level of recall if documents were selected randomly. The ``lower'' column shows an observed lower bound on the precision at 0.1 recall that results from the effect of proper names and other lexical items that are the same in both languages. This observed lower bound is computed by repeating the TT experiment, but omitting the machine translation step shown in Figure 2. The ``TT-max'' column shows the observed upper bound on the performance of Text Translation. In order to obtain comparable results, in our ``TT'' (and ``VT'') experiments we performed the SVD on the Spanish portion of the same bilingual document collection that was used to produce bilingual documents for the CL-LSI SVD. The ``TT'' (and ``VT'') results therefore include the same domain shift that the ``CL-LSI'' experiment unavoidably incurs. The ``TT-max'' values show the precision at 0.1 recall that is obtained when the TT SVD is constructed using documents from the evaluation domain (El Norte articles). Finally, the ``upper'' column shows the observed upper bound when El Norte documents are used for the SVD, profile training and evaluation. This represents the monolingual performance of the LSI-mean filtering technique on each Spanish topic.
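The ``chance'' bound can be illustrated with a short sketch. If documents are selected at random, the expected fraction of selected documents that are relevant equals the fraction of relevant documents in the collection, regardless of how many are selected. The function names and the collection sizes below are hypothetical and are not taken from Table 3; for simplicity the simulation measures precision at a fixed cutoff rather than at a fixed recall level.

```python
import random


def chance_precision(num_relevant, collection_size):
    """Expected precision when documents are selected uniformly at random."""
    if collection_size <= 0 or not 0 <= num_relevant <= collection_size:
        raise ValueError("need 0 <= num_relevant <= collection_size, size > 0")
    return num_relevant / collection_size


def simulated_chance_precision(num_relevant, collection_size, cutoff,
                               trials=2000, seed=0):
    """Average precision of `cutoff` randomly selected documents.

    Documents are numbered 0..collection_size-1, and by convention in this
    sketch the first `num_relevant` identifiers are the relevant ones.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        picked = rng.sample(range(collection_size), cutoff)
        hits = sum(1 for doc_id in picked if doc_id < num_relevant)
        total += hits / cutoff
    return total / trials


# Hypothetical collection: 100 relevant documents out of 1000.
print(chance_precision(100, 1000))
print(simulated_chance_precision(100, 1000, cutoff=100))
```

The simulated average should lie close to the analytic value, which is why a single figure per topic suffices for this bound.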

  
Table 3: Adaptive multilingual text filtering experiment results.

The most significant observation that we can draw from these results is that adaptive multilingual text filtering appears to be practical and that the corpora we used are adequate to demonstrate this. Both corpus-based techniques and dictionary-based techniques have demonstrated much better performance than the lower bound runs on these topic pairs, despite the limitations in our ability to accurately measure absolute performance that result from the topic and domain shifts.

Another interesting observation is that the results without cross-language mapping exhibit a surprising amount of variation. We attribute this effect to words that are common to Spanish and English and that are useful for recognizing documents relevant to some topics. This observation has led us to conclude that when the available corpora limit a cross-language text filtering or retrieval experiment to a small number of topics, a baseline run with no cross-language component is a simple way to gain some useful insight into the significance of the results.

The ``TT-max'' figures in Table 3 show the effect of the domain shift. In two cases out of three, the domain shift between the UN collection and the El Norte collection appears to be substantial but not overwhelming. The lack of a clear domain shift effect in the third case is at least partially explained by the low upper bound on the effectiveness of the LSI-mean technique itself on topic SP25. This poor performance could result from a number of factors, but one possible explanation is that the relevant documents may have a multimodal distribution in the reduced rank LSI vector space.

Table 4 shows preliminary results which provide bounds on the magnitude of the topic shift effect. Results for a fourth topic pair which we tried, SP10/022, are shown as well in order to illustrate the topic shift effect clearly. Although it appeared from a manual inspection of the topic descriptions that topics SP10 and 022 were as similar as any of the other pairs we had chosen, these results clearly reveal that SP10/022 is not a useful topic pair. Again, the SP25/128 topic pair yields unusual and as yet unexplained results, actually increasing precision when translation errors are introduced. The remaining two topic pairs show relatively large topic shift effects (although these are only upper bounds) after considering the relatively small translation error effects.

  
Table 4: Preliminary topic shift results.






Douglas W. Oard
Tue May 13 20:29:24 EDT 1997