RoD - FRAGG

About the workshop days

These workshop days take place on the mornings of 6 and 7 July, during the MADICS symposium, which is held by videoconference from 6 to 9 July 2020.

The aim is to bring together interested researchers to exchange and discuss recent advances in the field of reasoning on data. The days will be an opportunity to present new approaches, and also to share with the French-speaking community work recently published at international conferences (IJCAI, ECAI, ISWC, ESWC, K-CAP, KR, ...).

The days cover knowledge representation and reasoning techniques that make the best possible use of data. Topics of interest include, but are not limited to, the development of efficient algorithms for querying, integrating, analysing, and linking heterogeneous data of variable quality.

Organizing committee

Nathalie HERNANDEZ, IRIT, UT2J
Marie-Laure MUGNIER, LIRMM, INS2i
Marie-Christine ROUSSET, LIG, INS2i
Catherine ROUSSEY, INRAE
Fatiha SAIS, LRI, INS2I, Université Paris Saclay

List of presentations

Camille Bourgaux and Meghyn Bienvenu. Querying and Repairing Inconsistent Prioritized Knowledge Bases: Complexity Analysis and Links with Abstract Argumentation, published at KR 2020

In this presentation, we explore the issue of inconsistency handling over prioritized knowledge bases (KBs), which consist of an ontology, a set of facts, and a priority relation between conflicting facts. In the database setting, a closely related scenario has been studied and led to the definition of three different notions of optimal repairs (global, Pareto, and completion) of a prioritized inconsistent database. After transferring the notions of globally-, Pareto- and completion-optimal repairs to our setting, we study the data complexity of the core reasoning tasks: query entailment under inconsistency-tolerant semantics based upon optimal repairs, existence of a unique optimal repair, and enumeration of all optimal repairs. Our results provide a nearly complete picture of the data complexity of these tasks for ontologies formulated in common DL-Lite dialects. The second contribution of our work is to clarify the relationship between optimal repairs and different notions of extensions for (set-based) argumentation frameworks. Among our results, we show that Pareto-optimal repairs correspond precisely to stable extensions (and often also to preferred extensions), and we propose a novel semantics for prioritized KBs which is inspired by grounded extensions and enjoys favourable computational properties.

Julien Romero, Nicoleta Preda, Antoine Amarilli and Fabian M. Suchanek. Réécriture de requêtes atomiques par des vues chemins, published at ESWC 2020

Views make it possible to run parameterized queries over databases. They are used in particular to model web services. To answer a specific query, the views must be orchestrated into execution plans. In this presentation, we will see how so-called path views can be combined efficiently to obtain execution plans equivalent to an atomic query. We will give a sound and complete algorithm, based on context-sensitive grammars, for generating equivalent execution plans, and we will compare it with other existing systems.

Arnaud Grall, Thomas Minier, Hala Skaf-Molli and Pascal Molli. Processing SPARQL Aggregate Queries with Web Preemption, published at ESWC 2020

Executing aggregate queries on the web of data makes it possible to compute useful statistics, ranging from the number of properties per class in a dataset to the average lifespan of famous scientists per country. However, processing aggregate queries on public SPARQL endpoints is challenging, mainly because quota enforcement prevents queries from delivering complete results. Existing distributed query engines can go beyond quota limitations, but their data transfer and execution times are clearly prohibitive when processing aggregate queries. Following the web preemption model, we define a new preemptable aggregation operator that allows aggregate queries to be suspended and resumed. Web preemption allows query execution to continue beyond quota limits, and server-side aggregation drastically reduces the data transfer and execution time of aggregate queries. Experimental results demonstrate that our approach outperforms existing approaches by orders of magnitude in terms of execution time and the amount of transferred data.
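To illustrate the idea of suspending and resuming an aggregation, here is a minimal in-memory sketch in Python. It is not the authors' operator: the names `AvgState` and `run_quantum`, the row-count quota, and the sample data are all assumptions made for illustration. The point it shows is that a per-group average can be paused as long as the partial sums, counts, and scan position are saved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (group key, value); hypothetical input shape


@dataclass
class AvgState:
    """Saved state of a suspended per-group AVG: scan position and partial results."""
    offset: int = 0
    sums: Dict[str, float] = field(default_factory=dict)
    counts: Dict[str, int] = field(default_factory=dict)


def run_quantum(rows: List[Row], state: AvgState, quantum: int) -> Tuple[AvgState, bool]:
    """Process at most `quantum` rows, then return the state and whether the scan is done."""
    end = min(state.offset + quantum, len(rows))
    for group, value in rows[state.offset:end]:
        state.sums[group] = state.sums.get(group, 0.0) + value
        state.counts[group] = state.counts.get(group, 0) + 1
    state.offset = end
    return state, end == len(rows)


def averages(state: AvgState) -> Dict[str, float]:
    """Finalize the aggregate from the partial sums and counts."""
    return {g: state.sums[g] / state.counts[g] for g in state.sums}


# Made-up data: (country, value) pairs.
rows = [("FR", 70.0), ("DE", 80.0), ("FR", 90.0), ("DE", 60.0), ("FR", 80.0)]

state, done, rounds = AvgState(), False, 0
while not done:
    # Each call stands for one quota-limited request; the client sends the state back to resume.
    state, done = run_quantum(rows, state, quantum=2)
    rounds += 1

result = averages(state)
```

In this sketch only the small state object travels between requests, not the rows themselves, which is the intuition behind aggregating on the server side.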

Victor Charpenay and Sebastian Käbisch. L'annotation « sémantique » de documents - L'exemple du web des objets, published at ESWC 2020

In April 2020, after four years of work, the W3C published a standard for the Web of Things that aims to unify how connected devices are represented on the web. This standard, the "Thing Description" (TD) model, had to address knowledge representation questions, notably because not only humans but also the connected devices themselves may interpret the name given to a "Thing" (an object) and act accordingly. In a spirit of compromise, the W3C Web of Things working group adopted the approach of a simple "semantic" annotation of the TD model: this approach will be the main subject of this presentation.

Thomas Pellissier Tanon, Gerhard Weikum and Fabian M. Suchanek. YAGO 4: A Reasonable Knowledge Base, published at ESWC 2020

YAGO is one of the large knowledge bases in the Linked Open Data cloud. In this resource paper, we present its latest version, YAGO 4, which reconciles the rigorous typing and constraints of schema.org with the rich instance data of Wikidata. The resulting resource contains 2 billion type-consistent triples for 64 million entities, and has a consistent ontology that allows semantic reasoning with OWL 2 description logics.

Thu Huong Nguyen and Andrea G.B. Tettamanzi. Using Grammar-based Genetic Programming for Mining Disjointness Axioms Involving Complex Class Expressions, published at ICCS 2020

In the context of the Semantic Web, learning implicit knowledge in terms of axioms from Linked Open Data has been the object of much current research. We propose a method using grammar-based genetic programming to automatically discover disjointness axioms between concepts from the Web of Data. A training-testing model is implemented to overcome the lack of benchmarks and comparable research. The acquisition of axioms is performed on a small sample of DBpedia with the help of a Grammatical Evolution algorithm. The accuracy evaluation of mined axioms is carried out on the whole DBpedia. Experimental results show that the proposed method gives high accuracy in mining class disjointness axioms involving complex expressions.

Luis Palacios, Yue Ma, Chantal Reynaud and Gaëlle Lortal. Knowledge Based Situation Discovery for Avionics Maintenance, published at K-CAP 2019

For knowledge-intensive domains, such as Avionics Maintenance, applying automated analysis comes with a major challenge: formalizing complex domain knowledge and conceiving suitable automated algorithms for real-world requirements. In this paper, we propose a study on knowledge discovery to assist avionics maintenance by identifying meaningful Description Logic based complex concepts that correspond to crucial scenarios during device repair, a task we call situation discovery. We propose an approach to the automatic, unsupervised learning of relevant situations hidden in an ontology. Unlike ontology-based concept learning, where a set of instances is given as positive examples of a target concept, the challenge of learning hidden situations consists in discovering significant situations among exponentially many unknown situations. In this paper, we formalize the problem and study some related complexity results as well as algorithms to solve the problem, together with its application to Avionics Maintenance. The approach has been integrated into an enterprise system and achieves state-of-the-art results in this application.

Manuel Atencia, Jérôme David and Jérôme Euzenat. Several link keys are better than one, or Extracting disjunctions of link key candidates, published at K-CAP 2019

Link keys express conditions under which instances of two classes of different RDF data sets may be considered as equal. As such, they can be used for data interlinking. There exist algorithms to extract link key candidates from RDF data sets and different measures have been defined to evaluate the quality of link key candidates individually. For certain data sets, however, it may be necessary to use more than one link key on a pair of classes to retrieve a more complete set of links. To this end, in this paper, we define disjunction of link keys, propose strategies to extract disjunctions of link key candidates from RDF data, and apply existing quality measures to evaluate them. We also report on experiments with these strategies.
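As a rough illustration of how a link key and a disjunction of link keys generate links, here is a small Python sketch. It makes several assumptions that are not in the abstract: data sets are plain dictionaries, a link key is a list of property pairs, and two instances are linked when they share at least one value on every pair. The function names and the sample data are hypothetical, and candidate extraction and quality measures are not covered.

```python
from typing import Dict, List, Set, Tuple

Instance = Dict[str, Set[str]]   # property name -> set of values
Dataset = Dict[str, Instance]    # instance identifier -> description
LinkKey = List[Tuple[str, str]]  # (property in data set 1, property in data set 2)


def links(key: LinkKey, d1: Dataset, d2: Dataset) -> Set[Tuple[str, str]]:
    """Link two instances when they share a value on every property pair of the key.

    The key is assumed to be non-empty.
    """
    found = set()
    for id1, inst1 in d1.items():
        for id2, inst2 in d2.items():
            if all(inst1.get(p, set()) & inst2.get(q, set()) for p, q in key):
                found.add((id1, id2))
    return found


def links_disjunction(keys: List[LinkKey], d1: Dataset, d2: Dataset) -> Set[Tuple[str, str]]:
    """A disjunction of link keys yields the union of the links of each key."""
    result: Set[Tuple[str, str]] = set()
    for key in keys:
        result |= links(key, d1, d2)
    return result


# Made-up data: two descriptions of the same two books, with different properties filled in.
books1 = {"b1": {"title": {"Dune"}, "author": {"Herbert"}}, "b2": {"isbn": {"42"}}}
books2 = {"c1": {"label": {"Dune"}, "creator": {"Herbert"}}, "c2": {"code": {"42"}}}

k1 = [("title", "label"), ("author", "creator")]
k2 = [("isbn", "code")]

only_k1 = links(k1, books1, books2)
both = links_disjunction([k1, k2], books1, books2)
```

With this data, neither key alone recovers both links, while their disjunction does, which mirrors the motivation stated in the abstract.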

Joe Raad, Erman Acar and Stefan Schlobach. On the Impact of sameAs on Schema Matching, published at K-CAP 2019

In a large and decentralised knowledge representation system such as the Web of Data, it is common for data sets to overlap. In the absence of a central naming authority, semantic heterogeneity is inevitable as such overlapping contents are described using different schemas. To overcome this problem, a number of solutions have automated the integration of these data sets by matching their schemas. In this work, we focus on a specific category of these solutions that relies on the concepts' extension for matching the schemas (i.e., instance-based methods). Rather than introducing a new approach for the task of schema matching, this work studies the impact of exploiting the semantics of owl:sameAs in such instance-based methods. For this empirical analysis, we investigate more than 900K concepts extracted from the Web, and make use of over 35B implicit identity assertions to study their impact. The experiments show that despite the growing doubts over their quality, exploiting owl:sameAs assertions extracted from the Web can improve instance-based schema matching techniques.
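The effect of identity assertions on instance-based matching can be sketched as follows. This is only an illustration under assumptions: the Jaccard overlap of concept extensions stands in for whatever similarity measure the paper uses, identity is closed transitively with a union-find structure, and all identifiers are made up.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def find(parent: Dict[str, str], x: str) -> str:
    """Return the representative of x's identity class (union-find with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def identity_classes(same_as: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build identity classes from sameAs pairs, treated as an equivalence relation."""
    parent: Dict[str, str] = {}
    for a, b in same_as:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    return parent


def jaccard(ext1: Set[str], ext2: Set[str], parent: Optional[Dict[str, str]] = None) -> float:
    """Jaccard overlap of two concept extensions, optionally modulo identity classes."""
    if parent is not None:
        ext1 = {find(parent, x) for x in ext1}
        ext2 = {find(parent, x) for x in ext2}
    union = ext1 | ext2
    return len(ext1 & ext2) / len(union) if union else 0.0


# Made-up extensions of two "city" concepts from two data sets with different identifiers.
city_a = {"a:Paris", "a:Lyon"}
city_b = {"b:Q1", "b:Q2"}
same_as = [("a:Paris", "b:Q1")]  # one hypothetical identity assertion

without_same_as = jaccard(city_a, city_b)
with_same_as = jaccard(city_a, city_b, identity_classes(same_as))
```

Without the identity assertion the two extensions look disjoint; with it, the overlap becomes visible to the matcher.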

Arnaud Giacometti, Beatrice Markhoff and Arnaud Soulet. Découverte de cardinalités maximales significatives dans des bases de connaissances, published at ISWC 2019

The Semantic Web includes large knowledge bases generated from collaborative platforms or by integrating several heterogeneous resources. These knowledge bases may contain erroneous data, and some information is missing from them. It is important to equip them automatically with reliable indications that make it possible to assess the reliability of the results obtained when querying them. In this paper we present a method for discovering significant maximum cardinality constraints of a knowledge base, when they can be computed from its content. We also propose an algorithm called C3M that performs an exhaustive search for such constraints with powerful pruning criteria. For this we use Hoeffding's inequality. The resulting program queries online knowledge bases. It has been tested on the whole of DBpedia with a precision above 95%, and on several other resources. It is freely usable.
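For reference, Hoeffding's inequality in its standard form is recalled below, for independent random variables $X_1, \dots, X_n$ taking values in $[0, 1]$. The abstract does not say how C3M instantiates it, so the notation here is generic and the link to cardinality constraints is left open.

```latex
\[
  \Pr\left( \bar{X} - \mathbb{E}[\bar{X}] \ge t \right) \le \exp\left( -2 n t^{2} \right),
  \qquad \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad t > 0 .
\]
% Equivalently, setting \delta = \exp(-2 n t^2): with probability at least 1 - \delta,
\[
  \mathbb{E}[\bar{X}] \ge \bar{X} - \sqrt{\frac{\ln(1/\delta)}{2n}} .
\]
```

The second form gives a lower confidence bound on a true proportion from an observed one, which is the usual way such a bound serves as a significance test.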

Jérémy Lhez, Chan Le Duc, Thinh Dong and Myriam Lamolle. Decentralized Reasoning on a Network of Aligned Ontologies with Link Keys, published at ISWC 2019

Link keys were recently introduced to formalize data interlinking between data sources. They are considered a new kind of correspondence included in ontology alignments. We propose a procedure for reasoning in a decentralized manner on a network of ontologies with alignments containing link keys. In this paper, the ontologies involved in such a network are expressed in the logic ALC, while the alignments can contain concept, individual and link key correspondences equipped with a loose semantics. The decentralized aspect of our procedure is based on a process of knowledge propagation through the network via correspondences. This process allows global reasoning to be polynomially reduced to local reasoning.

Sébastien Ferré. Link Prediction in Knowledge Graphs with Concepts of Nearest Neighbours, published at ESWC 2019

The open nature of Knowledge Graphs (KG) often implies that they are incomplete. Link prediction consists in inferring new links between the entities of a KG based on existing links. Most existing approaches rely on the learning of latent feature vectors for the encoding of entities and relations. In general however, latent features cannot be easily interpreted. Rule-based approaches offer interpretability but a distinct ruleset must be learned for each relation, and computation time is difficult to control. We propose a new approach that does not need a training phase, and that can provide interpretable explanations for each inference. It relies on the computation of Concepts of Nearest Neighbours (CNN) to identify similar entities based on common graph patterns. Dempster-Shafer theory is then used to draw inferences from CNNs. We evaluate our approach on FB15k-237, a challenging benchmark for link prediction, where it gets competitive performance compared to existing approaches.

Programme

The speaker is indicated in bold.

Monday 6 July, 9:00 to 12:30

9:00: Welcome
9:15-10:45
1. Jérémy Lhez, Chan Le Duc, Thinh Dong and Myriam Lamolle. Decentralized Reasoning on a Network of Aligned Ontologies with Link Keys (published at ISWC 2019) Presentation
2. Joe Raad, Erman Acar and Stefan Schlobach. On the Impact of sameAs on Schema Matching (published at K-CAP 2019) Presentation
3. Manuel Atencia, Jérôme David and Jérôme Euzenat. Several link keys are better than one, or Extracting disjunctions of link key candidates (published at K-CAP 2019) Presentation
10:45-11:00: Break
11:00-12:30
1. Thomas Pellissier Tanon, Gerhard Weikum and Fabian M. Suchanek. YAGO 4: A Reasonable Knowledge Base (published at ESWC 2020) Presentation
2. Arnaud Giacometti, Beatrice Markhoff and Arnaud Soulet. Découverte de cardinalités maximales significatives dans des bases de connaissances (published at ISWC 2019) Presentation
3. Thu Huong Nguyen and Andrea G.B. Tettamanzi. Using Grammar-based Genetic Programming for Mining Disjointness Axioms Involving Complex Class Expressions (published at ICCS 2020) Presentation

Tuesday 7 July, 9:00 to 12:30

9:00-9:15: Welcome
9:15-10:45
1. Sébastien Ferré. Link Prediction in Knowledge Graphs with Concepts of Nearest Neighbours (published at ESWC 2019) Presentation
2. Camille Bourgaux and Meghyn Bienvenu. Querying and Repairing Inconsistent Prioritized Knowledge Bases: Complexity Analysis and Links with Abstract Argumentation (published at KR 2020) Presentation
3. Luis Palacios, Yue Ma, Chantal Reynaud and Gaëlle Lortal. Knowledge Based Situation Discovery for Avionics Maintenance (published at K-CAP 2019) Presentation
10:45-11:00: Break
11:00-12:30
1. Arnaud Grall, Thomas Minier, Hala Skaf-Molli and Pascal Molli. Processing SPARQL Aggregate Queries with Web Preemption (published at ESWC 2020) Presentation
2. Julien Romero, Nicoleta Preda, Antoine Amarilli and Fabian M. Suchanek. Réécriture de requêtes atomiques par des vues chemins (published at ESWC 2020) Presentation
3. Victor Charpenay and Sebastian Käbisch. L'annotation « sémantique » de documents - L'exemple du web des objets (published at ESWC 2020)
12:30-12:45: Wrap-up and outlook

RoD Days (Raisonner sur les Données, "Reasoning on Data"), an action supported by the GDR MADICS and GDR-IA

"Recent approaches to reasoning on data"

6 and 7 July, 9:00 to 12:30, by videoconference, as part of the MADICS symposium held from 6 to 9 July