|
The "logicist Corpuses" project has three complementary and interdependent aims,
all serving the accumulation of knowledge in the human sciences. The first is to
develop methods and tools for constituting so-called "logicist"
corpuses, i.e. corpuses consisting of documents structured as data and interpretation
rules, the latter being understood as inference operations performed to
generate conclusions or interpretative hypotheses. These corpuses will have a
double function: a decision-support function, helping researchers with their
scientific interpretation, and a documentary function, managing and sharing all
the data used to support the proposed interpretations. They will thus contribute
directly to the process of knowledge accumulation as well as to a research
dynamic. The second is to build, on this model, corpuses for the archaeology of
techniques, a field of excellence in Europe. The third aim
is to develop an ontology-based automatic annotation tool that allows
logicist corpuses to be queried on both their interpretation rules
and their data.
To meet these aims, the plan is
first of all to a) transform a significant number of scientific texts
on the archaeology of techniques into logicist documents, b) collect and index
the data connected to these scientific constructs, c) translate these documents into
English or French (depending on the original language), and d) assess the rules
produced in terms of transferability, in order to produce corpuses consisting of
rules with a "universal" or "local" character. In parallel,
as far as publishing tools are concerned, the plan is to develop 1)
annotation tools and 2) consultation interfaces. Rules and data will be
annotated semi-automatically using natural language
processing and ontology-based semantic annotation tools. The interface for
reading rules and data will comprise a) a natural-language query
tool and b) a tool for presenting the answers that offers a reading by levels:
it will be possible to read the different rules relating to the
question asked, and then their premises, and so to check the soundness of the
available rules and data by consulting the original publications.
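
The "reading by levels" described above can be illustrated with a minimal sketch. It is not the project's actual data model: the class names (`Datum`, `Rule`, `LogicistDocument`), the method names and the sample statements are all assumptions invented for illustration, and a simple keyword match stands in for the natural-language query tool. The sketch only shows how a document split into data and interpretation rules lets a reader go from a question to the relevant rules, then to their premises, then to the original publications.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datum:
    """An observation, together with the publication it comes from."""
    ident: str
    statement: str
    source: str


@dataclass(frozen=True)
class Rule:
    """An inference: IF all premises hold THEN the conclusion is proposed."""
    ident: str
    premises: tuple[str, ...]  # identifiers of data or of other rules
    conclusion: str


@dataclass
class LogicistDocument:
    """Hypothetical container: data and rules, indexed by identifier."""
    data: dict[str, Datum] = field(default_factory=dict)
    rules: dict[str, Rule] = field(default_factory=dict)

    def rules_about(self, keyword: str) -> list[Rule]:
        """Level 1: rules whose conclusion mentions the keyword."""
        kw = keyword.lower()
        return [r for r in self.rules.values() if kw in r.conclusion.lower()]

    def premises_of(self, rule_id: str) -> list[Datum | Rule]:
        """Level 2: the data or intermediate rules a rule rests on."""
        rule = self.rules[rule_id]
        return [self.data[p] if p in self.data else self.rules[p]
                for p in rule.premises]

    def sources_of(self, rule_id: str) -> set[str]:
        """Level 3: publications behind a rule (assumes no circular rules)."""
        found: set[str] = set()
        for premise in self.premises_of(rule_id):
            if isinstance(premise, Datum):
                found.add(premise.source)
            else:
                found |= self.sources_of(premise.ident)
        return found


# Invented sample content, for illustration only.
doc = LogicistDocument()
doc.data["D1"] = Datum("D1", "Sherds show regular parallel striations",
                       "Publication A (hypothetical)")
doc.data["D2"] = Datum("D2", "Experimental wheel-thrown vessels show the same striations",
                       "Publication B (hypothetical)")
doc.rules["R1"] = Rule("R1", ("D1", "D2"), "The vessels were shaped on a wheel")
doc.rules["R2"] = Rule("R2", ("R1",), "Wheel use implies specialised potters")

for rule in doc.rules_about("wheel"):
    print(rule.ident, "->", rule.conclusion)
    print("  sources:", sorted(doc.sources_of(rule.ident)))
```

Keeping the premises as identifiers rather than nested objects is one possible design choice: it lets the same datum support several rules, which matches the idea of sharing the data behind each interpretation.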
The scientific community
expects rapid reading of the rules researchers use to obtain or
support a result, easily understandable scientific reasoning and, in return,
better sharing of knowledge within the discipline, exhaustive access to the
databases on which the scientific constructions of a field are based,
self-archiving of research data, and a lasting solution for
indexing the data. The technologist community is especially sensitive to
these expectations, insofar as the present publishing process does not
allow experimental data to be widely shared, even though these data are
indispensable to the dynamic of its research.
Furthermore, the constitution
of logicist corpuses in the field of technology should serve as a model for the type
of corpus that could be developed across the human sciences, and could in
this way contribute directly to an effective accumulation of knowledge.
|