AKSW Colloquium, 07.07.2017, Two paper presentations concerning Link Discovery and Knowledge Base Reasoning

At the AKSW Colloquium on Friday, 7th of July, at 10:40 AM, there will be two paper presentations: one on genetic programming for learning linkage rules and one on differentiable learning of logical rules for knowledge base reasoning.

Tommaso Soru will present the paper Differentiable Learning of Logical Rules for Knowledge Base Reasoning, currently a pre-print, by Fan Yang, Zhilin Yang, and William W. Cohen.

Abstract

“We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method obtains state-of-the-art results on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.”

Daniel Obraczka will present the paper Learning Expressive Linkage Rules using Genetic Programming by Isele and Bizer, accepted at VLDB 2012. This work presents an algorithm that learns record linkage rules using genetic programming.

Abstract

“A central problem in data integration and data cleansing is to find entities in different data sources that describe the same real-world object. Many existing methods for identifying such entities rely on explicit linkage rules which specify the conditions that entities must fulfill in order to be considered to describe the same real-world object. In this paper, we present the GenLink algorithm for learning expressive linkage rules from a set of existing reference links using genetic programming. The algorithm is capable of generating linkage rules which select discriminative properties for comparison, apply chains of data transformations to normalize property values, choose appropriate distance measures and thresholds and combine the results of multiple comparisons using non-linear aggregation functions. Our experiments show that the GenLink algorithm outperforms the state-of-the-art genetic programming approach to learning linkage rules recently presented by Carvalho et al. and is capable of learning linkage rules which achieve a similar accuracy as human written rules for the same problem.”
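To make the building blocks named in the abstract more concrete, here is a minimal Python sketch of what a single linkage rule might look like: a property to compare, a chain of transformations, a distance measure with a threshold, and an aggregation function. It only illustrates the rule structure, not the genetic programming that GenLink uses to learn such rules. All names (`Comparison`, `is_link`) and the scoring convention are assumptions made for this illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


@dataclass
class Comparison:
    """One comparison node of a (hypothetical) linkage rule."""
    prop: str                                # property selected for comparison
    transforms: List[Callable[[str], str]]   # chain of normalizing transformations
    distance: Callable[[str, str], float]    # distance measure
    threshold: float                         # distance at which the score is 0.5

    def score(self, a: Dict[str, str], b: Dict[str, str]) -> float:
        va, vb = a[self.prop], b[self.prop]
        for transform in self.transforms:
            va, vb = transform(va), transform(vb)
        d = self.distance(va, vb)
        # Assumed convention: score 1.0 at distance 0, 0.5 at the threshold,
        # clipped at 0.0 for distances of twice the threshold or more.
        return max(0.0, 1.0 - d / (2 * self.threshold))


def is_link(a, b, comparisons, aggregate=min) -> bool:
    """Aggregate all comparison scores; accept the pair at 0.5 or above."""
    return aggregate(c.score(a, b) for c in comparisons) >= 0.5


rule = [
    Comparison("name", [str.strip, str.lower], levenshtein, threshold=2.0),
    Comparison("year", [], levenshtein, threshold=1.0),
]
```

With `aggregate=min` every comparison has to match, while `aggregate=max` accepts a pair as soon as one comparison matches. A learning algorithm would search over the choice of properties, transformations, measures, thresholds and aggregations.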

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.


SANSA 0.2 (Semantic Analytics Stack) Released

The AKSW and Smart Data Analytics groups are happy to announce SANSA 0.2 – the second release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing for semantic technologies in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples format
  • Reading OWL files in various standard formats
  • Querying and partitioning based on Sparqlify
  • RDFS/RDFS Simple/OWL-Horst forward chaining inference
  • RDF graph clustering with different algorithms
  • Rule mining from RDF graphs
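As an illustration of what forward chaining inference means, the following self-contained Python sketch applies two standard RDFS entailment rules (rdfs9 and rdfs11) to a small in-memory set of triples until no new triples appear. It does not use SANSA or its API; the function name `forward_chain` and the example data are assumptions made for this sketch, and a distributed implementation would look quite different.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def forward_chain(triples):
    """Return the closure of `triples` under RDFS rules rdfs9 and rdfs11."""
    closure = set(triples)
    while True:
        inferred = set()
        for s, p, o in closure:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in closure:
                # rdfs11: subClassOf is transitive.
                if p2 == SUBCLASS_OF and s2 == o:
                    inferred.add((s, SUBCLASS_OF, o2))
                # rdfs9: an instance of a subclass is an instance of the superclass.
                if p2 == RDF_TYPE and o2 == s:
                    inferred.add((s2, RDF_TYPE, o))
        if inferred <= closure:
            # Fixpoint reached: nothing new can be derived.
            return closure
        closure |= inferred


facts = {
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
closure = forward_chain(facts)
```

The nested loop is quadratic in the number of triples per round, which is acceptable for a toy example but is exactly the kind of cost that motivates distributed inference on large knowledge graphs.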

Deployment and getting started:

  • Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
  • The SANSA jar files are in Maven Central, i.e., in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code for various tasks is available.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE and Big Data Ocean.

SANSA Development Team


AKSW at ESWC 2017

Hello Community! ESWC 2017 has just ended, and here is a short report on the conference, with a focus on the contributions of the AKSW group.

Our members Dr. Muhammad Saleem, Dr. Mohamed Ahmed Sherif, Claus Stadler, Michael Röder, Prof. Dr. Jens Lehmann and Edgard Marx took part in the conference. They gave a number of presentations, workshops and tutorials:

Michael Röder

Mohamed Ahmed Sherif

Muhammad Saleem

Edgard Marx

  • Presented a workshop paper, “Exploring the Evolution and Provenance of Git Versioned RDF Data”, by Natanael Arndt, Patrick Naumann and Edgard Marx
  • Presented a demo paper, “KBox – Distributing Ready-to-query RDF Knowledge Graphs”, by Edgard Marx, Tommaso Soru, Ciro Baron Neto and Sandro Coelho

Claus Stadler

  • Presented a workshop paper at QuWeDa, “JPA Criteria Queries over RDF Data”, by Claus Stadler and Jens Lehmann

The final versions of the papers from Edgard and Claus will be made available soon.

As every year, ESWC also gave awards for the best papers and studies in several categories. The award for Best Challenge Paper went to “End-to-end Representation Learning for Question Answering with Weak Supervision” by Daniil Sorokin and Iryna Gurevych. The paper is part of the HOBBIT project by AKSW. Congrats to the winners!

Have a look at all the winners at ESWC 2017: http://2017.eswc-conferences.org/awards.


Four papers accepted at WI 2017

Hello Community! We proudly announce that the International Conference on Web Intelligence (WI) accepted four papers by our group. WI takes place in Leipzig from the 23rd to the 26th of August. The accepted papers are:

“An Evaluation of Models for Runtime Approximation in Link Discovery” by Kleanthi Georgala, Michael Hoffmann, and Axel-Cyrille Ngonga Ngomo.

Abstract: Time-efficient link discovery is of central importance to implement the vision of the Semantic Web. Some of the most rapid Link Discovery approaches rely internally on planning to execute link specifications. In newer works, linear models have been used to estimate the runtime of the fastest planners. However, no other category of models has been studied for this purpose so far. In this paper, we study non-linear functions for runtime estimation. In particular, we study exponential and mixed models for the estimation of the runtimes of planners. To this end, we evaluate three different models for runtime on six datasets using 400 link specifications. We show that exponential and mixed models achieve better fits when trained but are only to be preferred in some cases. Our evaluation also shows that the use of better runtime approximation models has a positive impact on the overall execution of link specifications.

“CEDAL: Time-Efficient Detection of Erroneous Links in Large-Scale Link Repositories” by Andre Valdestilhas, Tommaso Soru and Axel-Cyrille Ngonga Ngomo.

Abstract: More than 500 million facts on the Linked Data Web are statements across knowledge bases. These links are of crucial importance for the Linked Data Web as they make a large number of tasks possible, including cross-ontology, question answering and federated queries. However, a large number of these links are erroneous and can thus lead to these applications producing absurd results. We present a time-efficient and complete approach for the detection of erroneous links for properties that are transitive. To this end, we make use of the semantics of URIs on the Data Web and combine it with an efficient graph partitioning algorithm. We then apply our algorithm to the LinkLion repository and show that we can analyze 19,200,114 links in 4.6 minutes. Our results show that at least 13% of the owl:sameAs links we considered are erroneous. In addition, our analysis of the provenance of links allows discovering agents and knowledge bases that commonly display poor linking. Our algorithm can be easily executed in parallel and on a GPU. We show that these implementations are up to two orders of magnitude faster than classical reasoners and a non-parallel implementation.

“LOG4MEX: A Library to Export Machine Learning Experiments” by Diego Esteves, Diego Moussallem, Tommaso Soru, Ciro Baron Neto, Jens Lehmann, Axel-Cyrille Ngonga Ngomo and Julio Cesar Duarte.

Abstract: The choice of the best computational solution for a particular task is increasingly reliant on experimentation. Even though experiments are often described through text, tables, and figures, their descriptions are often incomplete or confusing. Thus, researchers often have to perform lengthy web searches for reproducing and understanding the results. In order to minimize this gap, vocabularies and ontologies have been proposed for representing data mining and machine learning (ML) experiments. However, we still lack proper tools to export these metadata properly. To this end, we present an open-source library dubbed LOG4MEX which aims at supporting the scientific community in filling this gap.

“GENESIS – A Generic RDF Data Access Interface” by Tim Ermilov, Diego Moussallem, Ricardo Usbeck and Axel-Cyrille Ngonga Ngomo.

Abstract: The availability of billions of facts represented in RDF on the Web provides novel opportunities for data discovery and access. In particular, keyword search and question answering approaches enable even lay people to access this data. However, the interpretation of the results of these systems, as well as the navigation through these results, remains challenging. In this paper, we present GENESIS, a generic RDF data access interface. GENESIS can be deployed on top of any knowledge base and search engine with minimal effort and allows for the representation of RDF data in a layperson-friendly way. This is facilitated by the modular architecture for reusable components underlying our framework. Currently, these include a generic search back-end, together with corresponding interactive user interface components based on a service for similar and related entities as well as verbalization services to bridge between RDF and natural language.

The final versions of the papers will be made available soon.

Come over to WI 2017 and enjoy the talks. More information on the program can be found here.


AKSW Colloquium, 29.05.2017, Addressing open Machine Translation problems with Linked Data.

At the AKSW Colloquium, on Monday 29th of May 2017, 3 PM, Diego Moussallem will present two papers related to his topic. The first paper, “Using BabelNet to Improve OOV Coverage in SMT” by Du et al., was presented at LREC 2016. The second paper, “How to Configure Statistical Machine Translation with Linked Open Data Resources” by Srivastava et al., was presented at AsLing 2016.


SML-Bench 0.2 Released

Dear all,

we are happy to announce the 0.2 release of SML-Bench, our Structured Machine Learning benchmark framework. SML-Bench provides full benchmarking scenarios for inductive supervised machine learning covering different knowledge representation languages like OWL and Prolog. It already comes with adapters for prominent inductive learning systems like the DL-Learner, the General Inductive Logic Programming System (GILPS), and Aleph, as well as Inductive Logic Programming ‘classics’ like Golem and Progol. The framework is easily extensible, be it in terms of new benchmarking scenarios or support for new learning systems. SML-Bench allows users to define, run and report on benchmarks that combine different scenarios and learning systems, giving insight into the performance characteristics of the respective inductive learning algorithms on a wide range of learning problems.

Website: http://sml-bench.aksw.org/
GitHub page: https://github.com/AKSW/SML-Bench/
Change log: https://github.com/AKSW/SML-Bench/releases/tag/0.2

In the current release we extended the options to configure learning systems in the overall benchmarking configuration, and added support for running multiple instances of a learning system, as well as the nesting of instance-specific settings and settings that apply to all instances of a learning system. Besides internal refactoring to increase the overall software quality, we also extended the reporting capabilities of the benchmark results. We added a new benchmark scenario and experimental support for the Statistical Relational Learning system TreeLiker.

We want to thank everyone who helped to create this release and appreciate any feedback.

Best regards,

Patrick Westphal, Simon Bin, Lorenz Bühmann and Jens Lehmann


AKSW Colloquium, 08.05.2017, Scalable RDF Graph Pattern Matching

At the AKSW Colloquium, on Monday 8th of May 2017, 3 PM, Lorenz Bühmann will discuss a paper titled “Type-based Semantic Optimization for Scalable RDF Graph Pattern Matching” by Kim et al. Presented at WWW 2017, this work proposes a scalable query processing approach on RDF data that relies on early and aggressive determination and pruning of query-irrelevant data. The paper describes ongoing work as part of the RAPID+ platform project.

Abstract

Scalable query processing relies on early and aggressive determination and pruning of query-irrelevant data. Besides the traditional space-pruning techniques such as indexing, type-based optimizations that exploit integrity constraints defined on the types can be used to rewrite queries into more efficient ones. However, such optimizations are only applicable in strongly-typed data and query models which make it a challenge for semi-structured models such as RDF. Consequently, developing techniques for enabling type-based query optimizations will contribute new insight to improving the scalability of RDF processing systems.

In this paper, we address the challenge of type-based query optimization for RDF graph pattern queries. The approach comprises (i) a novel type system for RDF data induced from data and ontologies and (ii) a query optimization and evaluation framework for evaluating graph pattern queries using type-based optimizations. An implementation of this approach integrated into Apache Pig is presented and evaluated. Comprehensive experiments conducted on real-world and synthetic benchmark datasets show that our approach is up to 500X faster than existing approaches.



ESWC 2017 accepted two Demo Papers by AKSW members

Hello Community! The 14th ESWC, which takes place from May 28th to June 1st 2017 in Portoroz, Slovenia, accepted two demos to be presented at the conference. Read more about them below:

1. “KBox Distributing Ready-to-query RDF Knowledge Graphs” by Edgard Marx, Ciro Baron, Tommaso Soru and Sandro Athaide Coelho

Abstract: The Semantic Web community has successfully contributed to a remarkable number of RDF datasets published on the Web. However, to use and build applications on top of Linked Data is still a cumbersome and time-demanding task. We present KBox, an open-source platform that facilitates the distribution and consumption of RDF data. We show the different APIs implemented by KBox, as well as the processing steps from a SPARQL query to its corresponding result. Additionally, we demonstrate how KBox can be used to share RDF knowledge graphs and to instantiate SPARQL endpoints.

Please see: https://www.researchgate.net/publication/315838619_KBox_Distributing_Ready-to-query_RDF_Knowledge_Graphs

and

https://www.researchgate.net/publication/305410480_KBox_–_Transparently_Shifting_Query_Execution_on_Knowledge_Graphs_to_the_Edge

2. “EAGLET – All That Glitters is not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking” by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo

Abstract: The desideratum to bridge the unstructured and structured data on the web has led to the advancement of a considerable number of annotation tools, and the evaluation of these Named Entity Recognition and Entity Linking systems is incontrovertibly one of the primary tasks. However, these evaluations are mostly based on manually created gold standards. As much as these gold standards have the upper hand of being created by a human, they also leave room for a major proportion of oversights. We will demonstrate EAGLET, a tool that supports the semi-automatic checking of a gold standard based on a set of uniform annotation rules.

Please also see: https://svn.aksw.org/papers/2017/ESWC_EAGLET_2017/public.pdf


AKSW Colloquium, 10.04.2017, GeoSPARQL on geospatial databases

At the AKSW Colloquium, on Monday 10th of April 2017, 3 PM, Matthias Wauer will discuss a paper titled “Ontop of Geospatial Databases”. Presented at ISWC 2016, this work extends an ontology-based data access (OBDA) system with support for GeoSPARQL for querying geospatial relational databases. In the evaluation section, the authors compare their approach to Strabon. The work is partially supported by the Optique and Melodies EU projects.



AKSW Colloquium, 03.04.2017, RDF Rule Mining

At the AKSW Colloquium, on Monday 3rd of April 2017, 3 PM, Tommaso Soru will present the state of his ongoing research titled “Efficient Rule Mining on RDF Data”, where he will introduce Horn Concerto, a novel scalable SPARQL-based approach for the discovery of Horn clauses in large RDF datasets. The presentation slides will be available at this URL.
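For readers unfamiliar with Horn clause mining, the following Python sketch scores one candidate rule of the form p(x, y) ⇒ q(x, y) over a small set of triples. It uses support and standard confidence, which are common measures in rule mining; whether Horn Concerto uses exactly these measures is an assumption here, and the approach itself works via SPARQL queries and not via in-memory Python sets. The function name `rule_stats` and the example data are hypothetical.

```python
def rule_stats(triples, body_pred, head_pred):
    """Score the rule body_pred(x, y) => head_pred(x, y).

    support    = number of (x, y) pairs for which body and head both hold
    confidence = support / number of (x, y) pairs for which the body holds
    """
    body = {(s, o) for s, p, o in triples if p == body_pred}
    head = {(s, o) for s, p, o in triples if p == head_pred}
    support = len(body & head)
    confidence = support / len(body) if body else 0.0
    return support, confidence


triples = {
    ("ex:a", "ex:bornIn", "ex:x"),
    ("ex:a", "ex:livesIn", "ex:x"),
    ("ex:b", "ex:bornIn", "ex:y"),
    ("ex:b", "ex:livesIn", "ex:z"),
    ("ex:c", "ex:bornIn", "ex:w"),
}
support, confidence = rule_stats(triples, "ex:bornIn", "ex:livesIn")
```

In this toy data the body holds for three pairs and the head also holds for one of them, so the rule has support 1 and confidence 1/3.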

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.
