Information Retrieval Systems (IRS) aim to retrieve information that meets a user’s need as expressed in a query. Retrieving information relevant to a query is a two-step process: offline, the system indexes documents, generally using a bag-of-words representation; online, the system computes the similarity between the user’s query and the document representations (indexing terms) to retrieve the most similar documents. Current IRS, such as web search engines, are general-purpose search tools that apply the same mechanisms and the same methods of data processing and matching whatever the context of the search, the user, the type of information need, or the intended use of the information.
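The two-step process described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, raw term frequencies as the bag-of-words representation, and cosine similarity as the matching function; these choices, and the function names `index_documents` and `search`, are illustrative assumptions and not the method of any particular system or of the CAAS project.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real text analysis.
    return text.lower().split()


def index_documents(docs):
    """Offline step: build one bag-of-words (term frequency) vector per document."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, index, top_k=3):
    """Online step: score every indexed document against the query and rank."""
    query_vector = Counter(tokenize(query))
    scored = [(doc_id, cosine(query_vector, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep only documents that share at least one term with the query.
    return [(doc_id, score) for doc_id, score in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Hypothetical toy collection, for illustration only.
    docs = {
        "d1": "context aware information retrieval",
        "d2": "cooking pasta at home",
        "d3": "information retrieval systems index documents",
    }
    index = index_documents(docs)
    for doc_id, score in search("information retrieval", index):
        print(doc_id, round(score, 3))
```

Note that the same `search` function is applied to every query regardless of who issues it or why, which is precisely the context-independent behaviour discussed in the preceding paragraph.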

The assumption of the CAAS project is that context can improve the performance of IRS by making explicit certain elements of the information retrieval process. The contextual aspect refers to tacit or explicit knowledge about the intentions of users, the environment of users, and the system itself.

The fundamental scientific issues that we identify are:

To tackle these challenges, CAAS will consider the various aspects that may affect the IR process, first as independently as possible, then taking their cross-effects into account. We will focus on the following contextual elements:

For each of them, we will consider various collections and qualify them (defining features and extracting these features), then analyse them in depth with the aim of extracting models and behaviours. Once each contextual element has been analysed, we will consider the cross-effects. For example, one of the results could be that query reformulation using relevance feedback is useful when the query contains proper nouns.

We will consider both benchmark collections from international programmes and more realistic collections from companies.

The analysis and extraction of models is the core of the project. However, we also aim to develop modules based on our findings. These modules will be integrated into IR platforms so that they can be reused as components of complete IR systems. Because analysis and modelling are the core of the project, the partners are all academic. This does not mean that companies are left out: they will be involved in several ways. First, we have contacted one major web search engine, which will provide us with query logs; we have also contacted smaller companies, which will likewise provide query logs that we will use during the project. Companies will also be involved in the dissemination activities: we will contact different companies to present our findings and will suggest either customizing the developed modules for them or transferring the technologies. For example, one application is to suggest ads to be associated with users’ queries on a web site.

To tackle these challenges, the consortium is composed of two computer science institutes, both specialists in IR but with complementary skills. LIA (Laboratoire Informatique Avignon) works on Question Answering problems, while IRIT (Institut de Recherche en Informatique de Toulouse) specializes in ad hoc retrieval and novelty detection. IRIT works in close collaboration with IMT (Institut de Mathématique de Toulouse) and, for this project, with its Statistique et Probabilité group. Although IMT does not appear as a partner, it will work on this project: IRIT and IMT are partners in the Plan Pluri-Formation FREMIT, which aims at developing collaborative work. CLLE (Cognition, Langues, Langage, Ergonomie) is a partner in this project for its linguistic skills and its past work in IR (Mothe and Tanguy, 2005) and natural language processing.