AKSW Colloquium, 08.05.2017, Scalable RDF Graph Pattern Matching

At the AKSW Colloquium, on Monday 8th of May 2017, 3 PM, Lorenz Bühmann will discuss a paper titled “Type-based Semantic Optimization for Scalable RDF Graph Pattern Matching” by Kim et al. Presented at WWW 2017, this work proposes a scalable query processing approach on RDF data that relies on early and aggressive determination and pruning of query-irrelevant data. The paper describes ongoing work as part of the RAPID+ platform project.

Abstract

Scalable query processing relies on early and aggressive determination and pruning of query-irrelevant data. Besides the traditional space-pruning techniques such as indexing, type-based optimizations that exploit integrity constraints defined on the types can be used to rewrite queries into more efficient ones. However, such optimizations are only applicable in strongly-typed data and query models which make it a challenge for semi-structured models such as RDF. Consequently, developing techniques for enabling type-based query optimizations will contribute new insight to improving the scalability of RDF processing systems.

In this paper, we address the challenge of type-based query optimization for RDF graph pattern queries. The approach comprises (i) a novel type system for RDF data induced from data and ontologies and (ii) a query optimization and evaluation framework for evaluating graph pattern queries using type-based optimizations. An implementation of this approach integrated into Apache Pig is presented and evaluated. Comprehensive experiments conducted on real-world and synthetic benchmark datasets show that our approach is up to 500X faster than existing approaches.
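The core idea of pruning with types can be illustrated with a small sketch (our own illustration, not the paper's actual algorithm or type system): before joining triple patterns, candidate bindings for a query variable are discarded whenever the entity does not carry every type inferred for that variable.

```python
def prune_by_type(candidates, required_types, entity_types):
    """Keep only candidate entities that carry every type required
    for the query variable (types induced from data/ontology)."""
    return [e for e in candidates
            if required_types <= entity_types.get(e, set())]

# Toy data: types already materialised for each entity.
entity_types = {
    "ex:Obama":  {"ex:Person", "ex:Politician"},
    "ex:Berlin": {"ex:City"},
}
# A variable typed ex:Person keeps ex:Obama and drops ex:Berlin
# before any join work is done.
survivors = prune_by_type(["ex:Obama", "ex:Berlin"],
                          {"ex:Person"}, entity_types)
```

Shrinking the candidate sets this early is what makes the join phase cheap; the paper's contribution lies in deriving such type constraints automatically for semi-structured RDF.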

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation | Comments Off on AKSW Colloquium, 08.05.2017, Scalable RDF Graph Pattern Matching

ESWC 2017 accepted two Demo Papers by AKSW members

Hello Community! The 14th ESWC, which takes place from May 28th to June 1st 2017 in Portoroz, Slovenia, accepted two demos to be presented at the conference. Read more about them in the following:

1. “KBox – Distributing Ready-to-query RDF Knowledge Graphs” by Edgard Marx, Ciro Baron, Tommaso Soru and Sandro Athaide Coelho

Abstract: The Semantic Web community has successfully contributed to a remarkable number of RDF datasets published on the Web. However, using and building applications on top of Linked Data is still a cumbersome and time-demanding task. We present KBox, an open-source platform that facilitates the distribution and consumption of RDF data. We show the different APIs implemented by KBox, as well as the processing steps from a SPARQL query to its corresponding result. Additionally, we demonstrate how KBox can be used to share RDF knowledge graphs and to instantiate SPARQL endpoints.

Please see: https://www.researchgate.net/publication/315838619_KBox_Distributing_Ready-to-query_RDF_Knowledge_Graphs

and

https://www.researchgate.net/publication/305410480_KBox_–_Transparently_Shifting_Query_Execution_on_Knowledge_Graphs_to_the_Edge

2. “EAGLET – All That Glitters is not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking” by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo

Abstract: The desideratum to bridge the unstructured and structured data on the web has led to the advancement of a considerable number of annotation tools, and the evaluation of these Named Entity Recognition and Entity Linking systems is incontrovertibly one of the primary tasks. However, these evaluations are mostly based on manually created gold standards. While these gold standards have the advantage of being created by humans, they also leave room for a major proportion of oversights. We will demonstrate EAGLET, a tool that supports the semi-automatic checking of a gold standard based on a set of uniform annotation rules.

Please also see: https://svn.aksw.org/papers/2017/ESWC_EAGLET_2017/public.pdf

Posted in Announcements, Events | Comments Off on ESWC 2017 accepted two Demo Papers by AKSW members

AKSW Colloquium, 10.04.2017, GeoSPARQL on geospatial databases

At the AKSW Colloquium, on Monday 10th of April 2017, 3 PM, Matthias Wauer will discuss a paper titled “Ontop of Geospatial Databases“. Presented at ISWC 2016, this work extends an ontology-based data access (OBDA) system with support for GeoSPARQL for querying geospatial relational databases. In the evaluation section, the authors compare their approach to Strabon. The work is partially supported by the Optique and Melodies EU projects.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation | Tagged , , , , | Comments Off on AKSW Colloquium, 10.04.2017, GeoSPARQL on geospatial databases

AKSW Colloquium, 03.04.2017, RDF Rule Mining

At the AKSW Colloquium, on Monday 3rd of April 2017, 3 PM, Tommaso Soru will present the state of his ongoing research titled “Efficient Rule Mining on RDF Data”, where he will introduce Horn Concerto, a novel scalable SPARQL-based approach for the discovery of Horn clauses in large RDF datasets. The presentation slides will be available at this URL.
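Horn Concerto's exact scoring is not described in this announcement; as background, rule-mining approaches over knowledge graphs commonly rank a candidate Horn rule body ⇒ head by its confidence, i.e. the fraction of variable bindings that satisfy the body and also satisfy the head. A minimal sketch (function and data names are ours):

```python
def rule_confidence(body_bindings, head_facts):
    """Standard confidence of a Horn rule body => head: the share of
    (x, y) bindings produced by the rule body that also appear as
    facts of the head predicate in the dataset."""
    body = set(body_bindings)
    if not body:
        return 0.0
    return len(body & set(head_facts)) / len(body)

# Rule sketch: marriedTo(x, y) => livesWith(x, y)
body = [("anna", "ben"), ("carl", "dana"), ("eve", "finn")]
head = {("anna", "ben"), ("carl", "dana")}
conf = rule_confidence(body, head)  # 2 of 3 bindings satisfy the head
```

In a SPARQL-based miner, both counts would typically be obtained with COUNT queries over the endpoint rather than by materialising the binding sets in memory.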

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, LinkingLOD, paper presentation | Tagged , , , , , | Comments Off on AKSW Colloquium, 03.04.2017, RDF Rule Mining

AKSW Colloquium, 27.03.2017, PPO & PPM 2.0: Extending the privacy preference framework to provide finer-grained access control for the Web of Data

In the upcoming Colloquium, on March 27th at 3 PM, Marvin Frommhold will discuss the paper “PPO & PPM 2.0: Extending the Privacy Preference Framework to provide finer-grained access control for the Web of Data” by Owen Sacco and John G. Breslin, published in the I-SEMANTICS ’12 Proceedings of the 8th International Conference on Semantic Systems.

Abstract:  Web of Data applications provide users with the means to easily publish their personal information on the Web. However, this information is publicly accessible and users cannot control how to disclose their personal information. Protecting personal information is deemed important in use cases such as controlling access to sensitive personal information on the Social Semantic Web or even in Linked Open Government Data. The Privacy Preference Ontology (PPO) can be used to define fine-grained privacy preferences to control access to personal information and the Privacy Preference Manager (PPM) can be used to enforce such preferences to determine which specific parts of information can be granted access. However, PPO and PPM require further extensions to create more control when granting access to sensitive data; such as more flexible granularity for defining privacy preferences. In this paper, we (1) extend PPO with new classes and properties to define further fine-grained privacy preferences; (2) provide a new light-weight vocabulary, called the Privacy Preference Manager Ontology (PPMO), to define characteristics about privacy preference managers; and (3) present an extension to PPM to enable further control when publishing and sharing personal information based on the extended PPO and the new vocabulary PPMO. Moreover, the PPM is extended to provide filtering data over SPARQL endpoints.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation | Tagged , | Comments Off on AKSW Colloquium, 27.03.2017, PPO & PPM 2.0: Extending the privacy preference framework to provide finer-grained access control for the Web of Data

DBpedia @ Google Summer of Code – GSoC 2017

DBpedia, one of InfAI’s community projects, will take part in the Google Summer of Code program for the fifth time.

GSoC aims to bring students from all over the globe into open-source software development. In this regard, we are calling for students to take part in the Summer of Code. During three funded months, you will work on a specific task, the results of which will be presented at the end of the summer.

Have we sparked your interest in participating? Great, then check out the DBpedia website for further information.

Posted in Announcements, Call for Students, Events | Comments Off on DBpedia @ Google Summer of Code – GSoC 2017

New GERBIL release v1.2.5 – Benchmarking entity annotation systems

Dear all,
the Smart Data Management competence center at AKSW is happy to announce GERBIL 1.2.5.

GERBIL is a general entity annotation benchmarking system that offers an easy-to-use web-based platform for the agile comparison of annotators using multiple datasets and uniform measuring approaches. To add a tool to GERBIL, all the end user has to do is provide a URL to a REST interface for the tool which abides by a given specification. The integration and benchmarking of the tool against user-specified datasets is then carried out automatically by the GERBIL platform. Currently, our platform provides results for 20 annotators and 46 datasets, with more coming.

Website: http://aksw.org/Projects/GERBIL.html
Demo: http://gerbil.aksw.org/gerbil/
GitHub page: https://github.com/AKSW/gerbil
Download: https://github.com/AKSW/gerbil/releases/tag/v1.2.5

New features include:

  • Added annotators (DoSeR, NERFGUN, PBOH, xLisa)
  • Added datasets (Derczynski, ERD14 and GERDAQ, Microposts 2015 and 2016, Ritter, Senseval 2 and 3, UMBC, WSDM 2012)
  • Introduced the RT2KB experiment type that comprises recognition and typing of entities
  • Introduced index-based sameAs relation retrieval and entity checking for KBs that do not change very often (e.g., DBpedia). Downloading the indexes is optional and GERBIL can run without them (but then has the same performance drawbacks as previous versions).
  • A warning is now shown in the GUI if the server is currently busy.
  • Implemented checks for certain datasets and annotators: if dataset files (e.g., due to licensing) or annotator API keys are missing, the affected datasets and annotators are not offered in the front end.

We want to thank everyone who helped to create this release; in particular, we want to thank Felix Conrads and Jonathan Huthmann. We also acknowledge support by the DIESEL, QAMEL and HOBBIT projects.

We really appreciate feedback and are open to collaborations.
If you happen to have use cases utilizing GERBIL, please contact us.

Michael and Ricardo on behalf of the GERBIL team

Posted in Uncategorized | Comments Off on New GERBIL release v1.2.5 – Benchmarking entity annotation systems

DBpedia Open Text Extraction Challenge – TextExt

DBpedia, a community project affiliated with the Institute for Applied Informatics (InfAI) e.V., extracts structured information from Wikipedia & Wikidata. Now DBpedia has started the DBpedia Open Text Extraction Challenge – TextExt. The aim is to increase the amount of structured DBpedia/Wikipedia data and to provide a platform for benchmarking various extraction tools. DBpedia wants to polish the knowledge of Wikipedia and then spread it on the web, free and open for all IT users and businesses.

Procedure

Compared to other challenges, which are often just one-time calls, TextExt is a continuous challenge focusing on lasting progress and exceeding limits in a systematic way. DBpedia provides the extracted and cleaned full text of all Wikipedia articles in 9 different languages at regular intervals, for download and as a Docker image, in the machine-readable NIF-RDF format (see the example for Barack Obama in English). Challenge participants are asked to wrap their NLP and extraction engines in Docker images and submit them to the DBpedia team, which will run the participants’ tools at regular intervals in order to extract:

  1. Facts, relations, events, terminology, ontologies as RDF triples (Triple track)
  2. Useful NLP annotations such as pos-tags, dependencies, co-reference (Annotation track)
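The NIF format mentioned above identifies text spans by appending RFC 5147-style character-offset fragments to a document URI. A small helper (our illustration; the names and example URI are ours) shows the shape of such span identifiers:

```python
def nif_span_uri(context_uri, start, end):
    """Build a NIF-style span URI: the fragment #char=start,end
    addresses the substring [start, end) of the context's text."""
    return f"{context_uri}#char={start},{end}"

# The first 12 characters of a document, e.g. the surface form
# "Barack Obama" at the start of the article text:
uri = nif_span_uri("http://example.org/wiki/Barack_Obama", 0, 12)
```

Because every annotation is anchored to such a URI, triples produced by different participants' tools over the same corpus remain directly comparable.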

DBpedia allows submissions up to 2 months prior to selected conferences (currently http://ldk2017.org/ and http://2017.semantics.cc/ ). Participants who fulfil the technical requirements and provide a sufficient description will be able to present at the conference and will be included in the yearly proceedings. For each conference, the challenge committee will select a winner among the participants, who will receive 1,000 €.

Results

Starting in December 2017, DBpedia will publish a summary article and proceedings of participants’ submissions at http://ceur-ws.org/ every year.

For further news and next events please have a look at http://wiki.dbpedia.org/textext or contact DBpedia via email dbpedia-textext-challenge@infai.org.

The project was created with the support of the H2020 EU projects HOBBIT (GA 688227) and ALIGNED (GA 644055), as well as the BMWi project Smart Data Web (GA 01MD15010B).

Challenge Committee

  • Sebastian Hellmann, AKSW, DBpedia Association, KILT Competence Center, InfAI e.V., Leipzig
  • Sören Auer, Fraunhofer IAIS, University of Bonn
  • Ricardo Usbeck, AKSW, Simba Competence Center, Leipzig University
  • Dimitris Kontokostas, AKSW, DBpedia Association, KILT Competence Center, InfAI e.V., Leipzig
  • Sandro Coelho, AKSW, DBpedia Association, KILT Competence Center, InfAI e.V., Leipzig


Posted in Announcements, dbpedia | Comments Off on DBpedia Open Text Extraction Challenge – TextExt

The USPTO Linked Patent Dataset release

Dear all,

We are happy to announce the USPTO Linked Patent Dataset release.

Patents are widely used to protect intellectual property and as a measure of innovation output. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. In fact, there were more than 200,000 patent grants issued in the US in 2013. However, accessing, searching and analyzing those patents is often still cumbersome and inefficient.

Our dataset is the output of converting USPTO XML patent data into RDF for the years 2002–2016. This supports integration with other data sources in order to further simplify use cases such as trend analysis, structured patent search & exploration, and societal progress measurements.

The USPTO Linked Patent Dataset contains 13,014,651 entities, of which 2,355,579 are patents. Other entities represent applicants, inventors, agents, examiners (primary and secondary), and assignees. All these entities amount to ca. 168 million triples describing the patent information.

The complete description of the dataset and SPARQL endpoint are available on the DataHub: https://datahub.io/dataset/linked-uspto-patent-data.
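As a starting point for exploring the endpoint, here is a sketch that assembles a SPARQL query string. The prefix, class, and property names below are placeholders of ours, not the dataset's actual vocabulary; consult the DataHub description for the real schema before querying.

```python
def patents_in_year_query(year, limit=10):
    """Assemble a SPARQL SELECT over a *hypothetical* patent schema
    (pat:Patent, pat:title, pat:grantYear are illustrative names)."""
    return (
        "PREFIX pat: <http://example.org/patent/>\n"
        "SELECT ?patent ?title WHERE {\n"
        "  ?patent a pat:Patent ;\n"
        "          pat:title ?title ;\n"
        f"          pat:grantYear {int(year)} .\n"
        f"}} LIMIT {int(limit)}"
    )

query = patents_in_year_query(2013, limit=5)
```

The resulting string can be sent to the SPARQL endpoint with any HTTP client or SPARQL library; casting `year` and `limit` to int keeps the interpolation safe for these numeric slots.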

We really appreciate feedback and are open to collaborations.
If you happen to have use cases utilizing this dataset, please contact us.


Posted in Dataset Release | Tagged , , , | Comments Off on The USPTO Linked Patent Dataset release

Two accepted papers in ESWC 2017

Hello Community! We are very pleased to announce the acceptance of two papers in the ESWC 2017 research track. ESWC 2017 will be held in Portoroz, Slovenia, from May 28th to June 1st. In more detail, we will present the following papers:

  1. “WOMBAT – A Generalization Approach for Automatic Link Discovery” by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann

    Abstract. A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when aiming to update these links automatically under the open-world assumption is the fact that usually only positive examples for the links exist. We address this challenge by presenting and evaluating WOMBAT, a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. WOMBAT is based on generalisation via an upward refinement operator to traverse the space of link specifications. We study the theoretical characteristics of WOMBAT and evaluate it on 8 different benchmark datasets. Our evaluation suggests that WOMBAT outperforms state-of-the-art supervised approaches while relying on less information. Moreover, our evaluation suggests that WOMBAT’s pruning algorithm allows it to scale well even on large datasets.

  2. “All That Glitters is not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking” by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo

    Abstract. The evaluation of Named Entity Recognition as well as Entity Linking systems is mostly based on manually created gold standards. However, the current gold standards have three main drawbacks. First, they do not share a common set of rules pertaining to what is to be marked and linked as an entity. Moreover, most of the gold standards have not been checked by other researchers after they have been published and hence commonly contain mistakes. Finally, they lack actuality as in most cases the reference knowledge base used to link the entities has been refined over time while the gold standards are typically not updated to the newest version of the reference knowledge base. In this work, we analyze existing gold standards and derive a set of rules for annotating documents for named entity recognition and entity linking. We derive Eaglet, a tool that supports the semi-automatic checking of a gold standard based on these rules. A manual evaluation of Eaglet’s results shows that it achieves an accuracy of up to 88% when detecting errors. We apply Eaglet to 13 gold standards and detect 38,453 errors. An evaluation of 10 tools on a subset of these datasets shows a performance difference of up to 10% micro F-measure on average.
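One concrete example of the kind of rule such a checker can enforce (our own illustration, not necessarily part of Eaglet's actual rule set) is that entity annotations within one document must not overlap:

```python
def overlapping_annotations(spans):
    """Return pairs of (start, end) entity spans that overlap.
    Spans are half-open character intervals over one document."""
    ordered = sorted(spans)
    errors = []
    for i, (s1, e1) in enumerate(ordered):
        for s2, e2 in ordered[i + 1:]:
            if s2 >= e1:
                break  # later spans start even further to the right
            errors.append(((s1, e1), (s2, e2)))
    return errors

# (0, 5) and (3, 8) overlap; (10, 12) is disjoint from both.
errors = overlapping_annotations([(0, 5), (3, 8), (10, 12)])
```

A semi-automatic curator flags such pairs for a human to resolve rather than silently dropping one of the annotations, which is what distinguishes this workflow from fully automatic cleaning.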


Acknowledgments
This work has been supported by the European Union’s H2020 research and innovation action HOBBIT (GA no. 688227), the European Union’s H2020 research and innovation action SLIPO (GA no. 731581), the BMWi project SAKE (project no. 01MD15006E), the BMBF project DIESEL (project no. 01QE1512C) and the BMWi project GEISER (project no. 01MD16014E).

Posted in DIESEL, GEISER, HOBBIT, LIMES, paper presentation, Papers, SAKE, SLIPO, Uncategorized | Comments Off on Two accepted papers in ESWC 2017