IRIT - UMR 5505



Tie-breaking Bias in IR

Effect of an Uncontrolled Parameter on Information Retrieval Evaluation

  • The Issue: Chance Influences IR System Evaluation

    The trec_eval program reorders runs before evaluating them.

    Tied documents (i.e., retrieved with the same sim value) are untied according to docno.

    As a result, relevant documents get ranked higher than initially just because of their name!

    measure_value = f (model_quality, chance)

    So, the problem is: measure values depend on both model quality and luck. This implies that an observed performance improvement may be due to chance alone!
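As a sketch of the issue, the reordering can be simulated in a few lines of Python. The tie-breaking rule (decreasing sim, ties broken by decreasing lexicographic docno) mirrors the behaviour described above, and the non-relevant docnos are hypothetical placeholders:

```python
# Sketch of the tie-breaking issue: two runs with identical scores and
# one relevant document each; only the relevant document's name differs.

def reorder(run):
    """Sort (docno, sim) pairs by decreasing sim, then by
    decreasing lexicographic docno on ties (assumed rule)."""
    return sorted(run, key=lambda doc: (doc[1], doc[0]), reverse=True)

def precision_at_1(run, relevant):
    """P@1 after reordering: 1.0 if the top document is relevant."""
    top_docno, _ = reorder(run)[0]
    return 1.0 if top_docno in relevant else 0.0

alice = [("FT1", 0.8), ("WSJ5", 0.8), ("FT2", 0.5)]
bob   = [("FT1", 0.8), ("AP8", 0.8), ("FT2", 0.5)]

print(precision_at_1(alice, {"WSJ5"}))  # WSJ5 > FT1, so WSJ5 is promoted -> 1.0
print(precision_at_1(bob, {"AP8"}))     # AP8 < FT1, so AP8 stays second -> 0.0
```

The two runs are indistinguishable in quality, yet the measured value differs solely because of the docnames involved in the tie.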

  • Demonstration of the Tie-breaking Bias Using trec_eval

    For skeptical readers, let us illustrate the tie-breaking bias.

    Bob and Alice retrieved the same sequence of documents [NonRel(0.8), Rel(0.8), NonRel(0.5)].

    They obtain, however, different results: Alice's are twice as good as Bob's! This is because Alice's WSJ5 is reranked from position #2 to position #1.

    Try it by yourself: download input.Alice, input.Bob, and qrels.

    Then, Bob conducts failure analysis all night long, modifies his system, produces the input.Bob2 run, and ... gets better results in the end.

    However, this improvement is due to luck alone: his new system input.Bob2 retrieved one relevant document (WSJ4) instead of another relevant document (AP8) retrieved in input.Bob. The only difference is the document name. We consider this gain unfair.

    Try it by yourself: download input.Bob2.
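The "twice as good" gap above can be reproduced with a small average-precision computation. This is a sketch: the non-relevant docnos are hypothetical, and the rankings are those obtained after the tie-break described earlier.

```python
# Average precision over the post-reordering rankings.
# Non-relevant docnos (FT1, FT2) are hypothetical placeholders.

def average_precision(ranking, relevant):
    """Mean of the precision measured at each relevant document's rank."""
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

# After tie-breaking, Alice's relevant WSJ5 sits at rank 1,
# while Bob's relevant AP8 stays at rank 2.
alice_ranking = ["WSJ5", "FT1", "FT2"]
bob_ranking   = ["FT1", "AP8", "FT2"]

print(average_precision(alice_ranking, {"WSJ5"}))  # 1.0
print(average_precision(bob_ranking, {"AP8"}))     # 0.5
```

Same retrieved scores, same relevance pattern, yet Alice's average precision is exactly twice Bob's.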

  • Our Proposal: Realistic Reordering Strategy for Conducting Fairer Evaluations

    For the details, please check our papers below ...

  • Scientific References

    Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. Tie-breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation. CLEF'10: Conference on Multilingual and Multimodal Information Access Evaluation, Maristella Agosti, Nicola Ferro, Carol Peters, Maarten de Rijke (Eds.), Springer, LNCS 6360, pp. 112-123, September 2010.
    download paper

    Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. Impact du « biais des ex aequo » dans les évaluations de Recherche d'Information. CORIA'10: Conférence francophone en Recherche d'Information et Applications, pp. 83-98, March 2010.
    download paper, download talk

  • Contact Information

    Guillaume Cabanac, PhD

    IRIT - Institute of Computer Science, Toulouse University, France
    Generalized Information Systems team