Four papers accepted at WI 2017

Hello Community! We are proud to announce that the International Conference on Web Intelligence (WI) has accepted four papers by our group. WI takes place in Leipzig from the 23rd to the 26th of August. The accepted papers are:

“An Evaluation of Models for Runtime Approximation in Link Discovery” by Kleanthi Georgala, Michael Hoffmann, and Axel-Cyrille Ngonga Ngomo.

Abstract: Time-efficient link discovery is of central importance to implement the vision of the Semantic Web. Some of the most rapid Link Discovery approaches rely internally on planning to execute link specifications. In newer works, linear models have been used to estimate the runtime of the fastest planners. However, no other category of models has been studied for this purpose so far. In this paper, we study non-linear functions for runtime estimation. In particular, we study exponential and mixed models for the estimation of the runtimes of planners. To this end, we evaluate three different models for runtime on six datasets using 400 link specifications. We show that exponential and mixed models achieve better fits when trained but are only to be preferred in some cases. Our evaluation also shows that the use of better runtime approximation models has a positive impact on the overall execution of link specifications.
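To give a feel for what comparing a linear and an exponential runtime model can look like, here is a minimal sketch in plain Python. It is not the code or the feature set from the paper: the single input feature (`sizes`), the sample measurements, and all function names are assumptions made for illustration, and the mixed model is left out.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def fit_linear_model(sizes, runtimes):
    """Linear model: runtime = a + b * size."""
    a, b = fit_line(sizes, runtimes)
    return lambda x: a + b * x

def fit_exponential_model(sizes, runtimes):
    """Exponential model: runtime = exp(a + b * size), fitted in log space."""
    a, b = fit_line(sizes, [math.log(r) for r in runtimes])
    return lambda x: math.exp(a + b * x)

def rmse(model, sizes, runtimes):
    """Root mean squared error of a fitted model on the given data."""
    squared = sum((model(x) - r) ** 2 for x, r in zip(sizes, runtimes))
    return math.sqrt(squared / len(sizes))

# Hypothetical measurements: input size (arbitrary units) vs. runtime (ms).
sizes = [1, 2, 3, 4, 5, 6]
runtimes = [12.0, 15.1, 19.2, 24.0, 30.5, 38.1]

linear = fit_linear_model(sizes, runtimes)
exponential = fit_exponential_model(sizes, runtimes)
print("linear RMSE:     ", round(rmse(linear, sizes, runtimes), 3))
print("exponential RMSE:", round(rmse(exponential, sizes, runtimes), 3))
```

A better fit on the training data does not automatically mean better plans, which matches the abstract's caveat that the non-linear models are only to be preferred in some cases.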

“CEDAL: Time-Efficient Detection of Erroneous Links in Large-Scale Link Repositories” by Andre Valdestilhas, Tommaso Soru and Axel-Cyrille Ngonga Ngomo.

Abstract: More than 500 million facts on the Linked Data Web are statements across knowledge bases. These links are of crucial importance for the Linked Data Web as they make a large number of tasks possible, including cross-ontology question answering and federated queries. However, a large number of these links are erroneous and can thus lead to these applications producing absurd results. We present a time-efficient and complete approach for the detection of erroneous links for properties that are transitive. To this end, we make use of the semantics of URIs on the Data Web and combine it with an efficient graph partitioning algorithm. We then apply our algorithm to the LinkLion repository and show that we can analyze 19,200,114 links in 4.6 minutes. Our results show that at least 13% of the owl:sameAs links we considered are erroneous. In addition, our analysis of the provenance of links allows discovering agents and knowledge bases that commonly display poor linking. Our algorithm can be easily executed in parallel and on a GPU. We show that these implementations are up to two orders of magnitude faster than classical reasoners and a non-parallel implementation.
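The general idea of partitioning a link graph and then inspecting each partition can be sketched as follows. This is an illustration only, not the CEDAL implementation: it swaps in a plain union-find structure for the partitioning step, and it assumes (a) that the host part of a URI identifies its dataset and (b) that two different URIs from the same dataset ending up in one equivalence class hint at an erroneous link. The function names and sample links are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def find(parent, x):
    """Return the representative of x, compressing the path on the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def candidate_errors(links):
    """Partition URIs by a transitive link (e.g. owl:sameAs) and report
    partitions that contain several URIs from the same host."""
    parent = {}
    for source, target in links:
        parent.setdefault(source, source)
        parent.setdefault(target, target)
        root_s, root_t = find(parent, source), find(parent, target)
        if root_s != root_t:
            parent[root_s] = root_t  # merge the two partitions

    clusters = defaultdict(set)
    for uri in parent:
        clusters[find(parent, uri)].add(uri)

    suspicious = []
    for members in clusters.values():
        by_dataset = defaultdict(list)
        for uri in members:
            # Assumption: the host name stands in for the dataset.
            by_dataset[urlparse(uri).netloc].append(uri)
        clashes = {host: sorted(uris)
                   for host, uris in by_dataset.items() if len(uris) > 1}
        if clashes:
            suspicious.append(clashes)
    return suspicious

# Hypothetical links: two distinct example.org resources become
# transitively equivalent through a resource in another dataset.
links = [
    ("http://example.org/resource/Paris", "http://other.example.net/place/1"),
    ("http://other.example.net/place/1", "http://example.org/resource/Paris_Texas"),
]
for clash in candidate_errors(links):
    print(clash)
```

Each partition can be checked independently of the others, which is one reason why this style of analysis lends itself to parallel execution.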

“LOG4MEX: A Library to Export Machine Learning Experiments” by Diego Esteves, Diego Moussallem, Tommaso Soru, Ciro Baron Neto, Jens Lehmann, Axel-Cyrille Ngonga Ngomo and Julio Cesar Duarte.

Abstract: The choice of the best computational solution for a particular task is increasingly reliant on experimentation. Even though experiments are often described through text, tables, and figures, their descriptions are often incomplete or confusing. Thus, researchers often have to perform lengthy web searches for reproducing and understanding the results. In order to minimize this gap, vocabularies and ontologies have been proposed for representing data mining and machine learning (ML) experiments. However, we still lack proper tools to export these metadata properly. To this end, we present an open-source library dubbed LOG4MEX which aims at supporting the scientific community in filling this gap.

“GENESIS – A Generic RDF Data Access Interface” by Tim Ermilov, Diego Moussallem, Ricardo Usbeck and Axel-Cyrille Ngonga Ngomo.

Abstract: The availability of billions of facts represented in RDF on the Web provides novel opportunities for data discovery and access. In particular, keyword search and question answering approaches enable even lay people to access this data. However, the interpretation of the results of these systems, as well as the navigation through these results, remains challenging. In this paper, we present GENESIS, a generic RDF data access interface. GENESIS can be deployed on top of any knowledge base and search engine with minimal effort and allows for the representation of RDF data in a layperson-friendly way. This is facilitated by the modular architecture for reusable components underlying our framework. Currently, these include a generic search back-end, together with corresponding interactive user interface components based on a service for similar and related entities as well as verbalization services to bridge between RDF and natural language.

The final versions of the papers will be made available soon.

Come over to WI 2017 and enjoy the talks. More information on the program can be found here.

Posted in Announcements, paper presentation, Papers

AKSW Colloquium, 29.05.2017, Addressing open Machine Translation problems with Linked Data.

At the AKSW Colloquium, on Monday 29th of May 2017, 3 PM, Diego Moussallem will present two papers related to his topic. The first paper, “Using BabelNet to Improve OOV Coverage in SMT” by Du et al., was presented at LREC 2016. The second paper, “How to Configure Statistical Machine Translation with Linked Open Data Resources” by Srivastava et al., was presented at AsLing 2016.

Posted in Uncategorized

SML-Bench 0.2 Released

Dear all,

we are happy to announce the 0.2 release of SML-Bench, our Structured Machine Learning benchmark framework. SML-Bench provides full benchmarking scenarios for inductive supervised machine learning covering different knowledge representation languages like OWL and Prolog. It already comes with adapters for prominent inductive learning systems like the DL-Learner, the General Inductive Logic Programming System (GILPS), and Aleph, as well as Inductive Logic Programming ‘classics’ like Golem and Progol. The framework is easily extensible, be it in terms of new benchmarking scenarios or support for new learning systems. SML-Bench allows users to define, run and report on benchmarks combining different scenarios and learning systems, giving insight into the performance characteristics of the respective inductive learning algorithms on a wide range of learning problems.

Website: http://sml-bench.aksw.org/
GitHub page: https://github.com/AKSW/SML-Bench/
Change log: https://github.com/AKSW/SML-Bench/releases/tag/0.2

In the current release we extended the options to configure learning systems in the overall benchmarking configuration, and added support for running multiple instances of a learning system, as well as the nesting of instance-specific settings and settings that apply to all instances of a learning system. Besides internal refactoring to increase the overall software quality, we also extended the reporting capabilities of the benchmark results. We added a new benchmark scenario and experimental support for the Statistical Relational Learning system TreeLiker.

We want to thank everyone who helped to create this release and appreciate any feedback.

Best regards,

Patrick Westphal, Simon Bin, Lorenz Bühmann and Jens Lehmann

Posted in SMLBench, Software Releases

AKSW Colloquium, 08.05.2017, Scalable RDF Graph Pattern Matching

At the AKSW Colloquium, on Monday 8th of May 2017, 3 PM, Lorenz Bühmann will discuss a paper titled “Type-based Semantic Optimization for Scalable RDF Graph Pattern Matching” by Kim et al. Presented at WWW 2017, this work proposes a scalable query processing approach on RDF data that relies on early and aggressive determination and pruning of query-irrelevant data. The paper describes ongoing work as part of the RAPID+ platform project.

Abstract

Scalable query processing relies on early and aggressive determination and pruning of query-irrelevant data. Besides the traditional space-pruning techniques such as indexing, type-based optimizations that exploit integrity constraints defined on the types can be used to rewrite queries into more efficient ones. However, such optimizations are only applicable in strongly-typed data and query models which make it a challenge for semi-structured models such as RDF. Consequently, developing techniques for enabling type-based query optimizations will contribute new insight to improving the scalability of RDF processing systems.

In this paper, we address the challenge of type-based query optimization for RDF graph pattern queries. The approach comprises (i) a novel type system for RDF data induced from data and ontologies and (ii) a query optimization and evaluation framework for evaluating graph pattern queries using type-based optimizations. An implementation of this approach integrated into Apache Pig is presented and evaluated. Comprehensive experiments conducted on real-world and synthetic benchmark datasets show that our approach is up to 500X faster than existing approaches.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation

ESWC 2017 accepted two Demo Papers by AKSW members

Hello Community! The 14th ESWC, which takes place from May 28th to June 1st 2017 in Portoroz, Slovenia, accepted two demos to be presented at the conference. Read more about them below:

1. “KBox: Distributing Ready-to-query RDF Knowledge Graphs” by Edgard Marx, Ciro Baron, Tommaso Soru and Sandro Athaide Coelho

Abstract: The Semantic Web community has successfully contributed to a remarkable number of RDF datasets published on the Web. However, to use and build applications on top of Linked Data is still a cumbersome and time-demanding task. We present KBox, an open-source platform that facilitates the distribution and consumption of RDF data. We show the different APIs implemented by KBox, as well as the processing steps from a SPARQL query to its corresponding result. Additionally, we demonstrate how KBox can be used to share RDF knowledge graphs and to instantiate SPARQL endpoints.

Please see: https://www.researchgate.net/publication/315838619_KBox_Distributing_Ready-to-query_RDF_Knowledge_Graphs

and

https://www.researchgate.net/publication/305410480_KBox_–_Transparently_Shifting_Query_Execution_on_Knowledge_Graphs_to_the_Edge

2. “EAGLET – All That Glitters is not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking” by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo

Abstract: The desideratum to bridge the unstructured and structured data on the web has led to the advancement of a considerable number of annotation tools, and the evaluation of these Named Entity Recognition and Entity Linking systems is incontrovertibly one of the primary tasks. However, these evaluations are mostly based on manually created gold standards. As much as these gold standards have the upper hand of being created by a human, they also leave room for a major proportion of oversights. We will demonstrate EAGLET, a tool that supports the semi-automatic checking of a gold standard based on a set of uniform annotation rules.

Please also see: https://svn.aksw.org/papers/2017/ESWC_EAGLET_2017/public.pdf

Posted in Announcements, Events

AKSW Colloquium, 10.04.2017, GeoSPARQL on geospatial databases

At the AKSW Colloquium, on Monday 10th of April 2017, 3 PM, Matthias Wauer will discuss a paper titled “Ontop of Geospatial Databases”. Presented at ISWC 2016, this work extends an ontology-based data access (OBDA) system with support for GeoSPARQL for querying geospatial relational databases. In the evaluation section, the authors compare their approach to Strabon. The work is partially supported by the Optique and Melodies EU projects.

Posted in Colloquium, paper presentation

AKSW Colloquium, 03.04.2017, RDF Rule Mining

At the AKSW Colloquium, on Monday 3rd of April 2017, 3 PM, Tommaso Soru will present the state of his ongoing research titled “Efficient Rule Mining on RDF Data”, where he will introduce Horn Concerto, a novel scalable SPARQL-based approach for the discovery of Horn clauses in large RDF datasets. The presentation slides will be available at this URL.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, LinkingLOD, paper presentation

AKSW Colloquium, 27.03.2017, PPO & PPM 2.0: Extending the privacy preference framework to provide finer-grained access control for the Web of Data

In the upcoming Colloquium, on March 27th at 3 PM, Marvin Frommhold will discuss the paper “PPO & PPM 2.0: Extending the Privacy Preference Framework to provide finer-grained access control for the Web of Data” by Owen Sacco and John G. Breslin, published in the I-SEMANTICS ’12 Proceedings of the 8th International Conference on Semantic Systems.

Abstract: Web of Data applications provide users with the means to easily publish their personal information on the Web. However, this information is publicly accessible and users cannot control how to disclose their personal information. Protecting personal information is deemed important in use cases such as controlling access to sensitive personal information on the Social Semantic Web or even in Linked Open Government Data. The Privacy Preference Ontology (PPO) can be used to define fine-grained privacy preferences to control access to personal information and the Privacy Preference Manager (PPM) can be used to enforce such preferences to determine which specific parts of information can be granted access. However, PPO and PPM require further extensions to create more control when granting access to sensitive data, such as more flexible granularity for defining privacy preferences. In this paper, we (1) extend PPO with new classes and properties to define further fine-grained privacy preferences; (2) provide a new light-weight vocabulary, called the Privacy Preference Manager Ontology (PPMO), to define characteristics about privacy preference managers; and (3) present an extension to PPM to enable further control when publishing and sharing personal information based on the extended PPO and the new vocabulary PPMO. Moreover, the PPM is extended to provide filtering data over SPARQL endpoints.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation

DBpedia @ Google Summer of Code – GSoC 2017

DBpedia, one of InfAI’s community projects, will be part of the 5th Google Summer of Code program.

GSoC aims to bring students from all over the globe into open source software development. We are therefore calling for students to take part in the Summer of Code. During three funded months, you will be able to work on a specific task, the results of which are presented at the end of the summer.

Have we sparked your interest in participating? Great, then check out the DBpedia website for further information.

Posted in Announcements, Call for Students, Events

New GERBIL release v1.2.5 – Benchmarking entity annotation systems

Dear all,
the Smart Data Management competence center at AKSW is happy to announce GERBIL 1.2.5.

GERBIL is a general entity annotation benchmarking system and offers an easy-to-use web-based platform for the agile comparison of annotators using multiple datasets and uniform measuring approaches. To add a tool to GERBIL, all end users have to do is provide the URL of a REST interface to their tool that abides by a given specification. The integration and benchmarking of the tool against user-specified datasets is then carried out automatically by the GERBIL platform. Currently, our platform provides results for **20 annotators** and **46 datasets** with more coming.
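For readers unfamiliar with how annotators are typically scored against a gold standard, the following sketch computes micro-averaged precision, recall and F1. It is not taken from the GERBIL code base: the annotation format, a `(start, length, uri)` tuple, the exact-match criterion, and the function name `micro_scores` are assumptions chosen for illustration.

```python
def micro_scores(gold_docs, system_docs):
    """Micro-averaged precision, recall and F1 over a list of documents.

    Each document is a collection of (start, length, uri) tuples; an
    annotation counts as correct only if all three fields match exactly
    (an assumption, other matching schemes are possible).
    """
    tp = fp = fn = 0
    for gold, system in zip(gold_docs, system_docs):
        gold, system = set(gold), set(system)
        tp += len(gold & system)   # annotations found in both
        fp += len(system - gold)   # produced by the system but not in gold
        fn += len(gold - system)   # in gold but missed by the system
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical gold standard and system output for a single document.
gold = [{(0, 5, "http://example.org/A"), (10, 3, "http://example.org/B")}]
system = [{(0, 5, "http://example.org/A"), (20, 2, "http://example.org/C")}]
print(micro_scores(gold, system))
```

Micro-averaging pools the counts over all documents before computing the ratios, so long documents with many annotations weigh more than short ones.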

Website: http://aksw.org/Projects/GERBIL.html
Demo: http://gerbil.aksw.org/gerbil/
GitHub page: https://github.com/AKSW/gerbil
Download: https://github.com/AKSW/gerbil/releases/tag/v1.2.5

New features include:

  • Added annotators (DoSeR, NERFGUN, PBOH, xLisa)
  • Added datasets (Derczynski, ERD14 and GERDAQ, Microposts 2015 and 2016, Ritter, Senseval 2 and 3, UMBC, WSDM 2012)
  • Introduced the RT2KB experiment type that comprises recognition and typing of entities
  • Introduced index-based sameAs relation retrieval and entity checking for KBs that do not change very often (e.g., DBpedia). Downloading the indexes is optional and GERBIL can run without them (but then has the same performance drawbacks as previous versions).
  • A warning is shown in the GUI if the server is currently busy.
  • Implemented checks for certain datasets and annotators. If dataset files (e.g., because of licenses) or annotator API keys are missing, the affected datasets and annotators are not available in the front end.

We want to thank everyone who helped to create this release, in particular Felix Conrads and Jonathan Huthmann. We also acknowledge support by the DIESEL, QAMEL and HOBBIT projects.

We really appreciate feedback and are open to collaborations.
If you happen to have use cases utilizing this dataset, please contact us.

Michael and Ricardo on behalf of the GERBIL team

Posted in Uncategorized