The USPTO Linked Patent Dataset release

Dear all,

We are happy to announce the release of the USPTO Linked Patent Dataset.

Patents are widely used to protect intellectual property and serve as a measure of innovation output. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world; in 2013 alone, more than 200,000 patents were granted in the US. However, accessing, searching and analyzing those patents is often still cumbersome and inefficient.

Our dataset is the output of converting USPTO XML patent data from the years 2002–2016 into RDF. This supports integration with other data sources in order to further simplify use cases such as trend analysis, structured patent search and exploration, and societal progress measurements.

The USPTO Linked Patent Dataset contains 13,014,651 entities, of which 2,355,579 are patents. The remaining entities represent applicants, inventors, agents, examiners (primary and secondary) and assignees. Altogether, these entities are described by ca. 168 million triples.
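For readers new to RDF, the flavour of such data can be illustrated with a few hand-written N-Triples statements. Everything below (the URIs, class names and property choices) is a hypothetical sketch for illustration, not the dataset's actual vocabulary; a tiny parser then counts entities per type, mirroring the entity statistics above.

```python
# Minimal sketch: counting typed entities in N-Triples data, assuming
# invented class URIs (the dataset's real vocabulary may differ).
from collections import Counter

SAMPLE_NTRIPLES = """\
<http://example.org/patent/US7654321> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/ontology/Patent> .
<http://example.org/patent/US7654321> <http://purl.org/dc/terms/title> "Example patent title" .
<http://example.org/person/inventor1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/ontology/Inventor> .
<http://example.org/person/agent1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/ontology/Agent> .
"""

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def count_types(ntriples: str) -> Counter:
    """Count rdf:type occurrences per class in simple N-Triples input."""
    counts = Counter()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Naive split: subject, predicate, rest-of-line (object)
        subj, pred, obj = line.rstrip(" .").split(None, 2)
        if pred == RDF_TYPE:
            counts[obj] += 1
    return counts

print(count_types(SAMPLE_NTRIPLES))
```

In the real dataset, such counts would of course be obtained via the SPARQL endpoint rather than by parsing files by hand.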

The complete description of the dataset and SPARQL endpoint are available on the DataHub: https://datahub.io/dataset/linked-uspto-patent-data.

We really appreciate feedback and are open to collaborations.
If you happen to have use cases utilizing this dataset, please contact us.

 

Posted in Dataset Release | Comments Off on The USPTO Linked Patent Dataset release

Two accepted papers in ESWC 2017

Hello Community! We are very pleased to announce the acceptance of two papers in the ESWC 2017 research track. ESWC 2017 will be held in Portoroz, Slovenia, from the 28th of May to the 1st of June. In more detail, we will present the following papers:

  1. “WOMBAT – A Generalization Approach for Automatic Link Discovery” by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann

    Abstract. A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when aiming to update these links automatically under the open-world assumption is the fact that usually only positive examples for the links exist. We address this challenge by presenting and evaluating WOMBAT, a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. WOMBAT is based on generalisation via an upward refinement operator to traverse the space of link specifications. We study the theoretical characteristics of WOMBAT and evaluate it on 8 different benchmark datasets. Our evaluation suggests that WOMBAT outperforms state-of-the-art supervised approaches while relying on less information. Moreover, our evaluation suggests that WOMBAT’s pruning algorithm allows it to scale well even on large datasets.

  2. “All That Glitters is not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking” by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo

    Abstract. The evaluation of Named Entity Recognition as well as Entity Linking systems is mostly based on manually created gold standards. However, the current gold standards have three main drawbacks. First, they do not share a common set of rules pertaining to what is to be marked and linked as an entity. Moreover, most of the gold standards have not been checked by other researchers after they have been published and hence commonly contain mistakes. Finally, they lack actuality as in most cases the reference knowledge base used to link the entities has been refined over time while the gold standards are typically not updated to the newest version of the reference knowledge base. In this work, we analyze existing gold standards and derive a set of rules for annotating documents for named entity recognition and entity linking. We derive Eaglet, a tool that supports the semi-automatic checking of a gold standard based on these rules. A manual evaluation of Eaglet’s results shows that it achieves an accuracy of up to 88% when detecting errors. We apply Eaglet to 13 gold standards and detect 38,453 errors. An evaluation of 10 tools on a subset of these datasets shows a performance difference of up to 10% micro F-measure on average.
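To make the positive-only setting of the first paper concrete, here is a toy sketch. It is not WOMBAT's actual refinement operator: the data, names and the simple pseudo-F scoring are invented for illustration. Candidate link specifications are reduced to plain string-similarity thresholds, scored only against the known positive links.

```python
# Toy sketch of positive-only link discovery (inspired by, but much simpler
# than, WOMBAT): choose a similarity threshold that covers the known
# positive example links while keeping the overall mapping small.
from difflib import SequenceMatcher

SOURCE = {"s1": "Barack Obama", "s2": "Angela Merkel", "s3": "Paris"}
TARGET = {"t1": "Barack H. Obama", "t2": "A. Merkel", "t3": "Paris, France"}
POSITIVES = {("s1", "t1"), ("s3", "t3")}  # only positive examples exist

def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def mapping(threshold: float):
    """All cross-product pairs whose similarity reaches the threshold."""
    return {(s, t) for s, vs in SOURCE.items()
                   for t, vt in TARGET.items() if sim(vs, vt) >= threshold}

def pseudo_f(threshold: float) -> float:
    """F-measure proxy using positives as the only ground truth."""
    links = mapping(threshold)
    if not links:
        return 0.0
    recall = len(links & POSITIVES) / len(POSITIVES)
    precision = len(links & POSITIVES) / len(links)
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

best = max((t / 10 for t in range(1, 10)), key=pseudo_f)
print("best threshold:", best, "pseudo-F:", round(pseudo_f(best), 3))
```

WOMBAT itself searches a much richer space of link specifications (boolean combinations of several similarity measures) via an upward refinement operator, with pruning to keep the search tractable.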
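As a concrete illustration of the second paper's rule-based idea, the sketch below implements two generic consistency rules one might check in a gold standard. The rules and data are invented for illustration; this is not Eaglet's actual rule set or API.

```python
# Minimal sketch of rule-based gold-standard checking (illustrative only):
# flag annotations whose surface form does not match the document text,
# and annotations that overlap each other.

def check_annotations(text, annotations):
    """annotations: list of (start, end, surface_form) tuples."""
    errors = []
    for i, (start, end, surface) in enumerate(annotations):
        if text[start:end] != surface:
            errors.append((i, "surface form does not match text"))
        for j, (s2, e2, _) in enumerate(annotations):
            if j > i and start < e2 and s2 < end:
                errors.append((i, f"overlaps annotation {j}"))
    return errors

doc = "Barack Obama visited Paris."
good = [(0, 12, "Barack Obama"), (21, 26, "Paris")]
bad  = [(0, 12, "Barack Obama"), (7, 20, "Obama visited")]  # overlapping spans

print(check_annotations(doc, good))  # []
print(check_annotations(doc, bad))
```

Eaglet applies a larger, linguistically motivated rule set and supports semi-automatic correction, but the basic pattern of scanning a benchmark for rule violations is the same.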

 

Acknowledgments
This work has been supported by the European Union’s H2020 research and innovation action HOBBIT (GA no. 688227), the European Union’s H2020 research and innovation action SLIPO (GA no. 731581), the BMWi project SAKE (project no. 01MD15006E), the BMBF project DIESEL (project no. 01QE1512C) and the BMWi project GEISER (project no. 01MD16014).

Posted in DIESEL, GEISER, HOBBIT, LIMES, paper presentation, Papers, SAKE, SLIPO, Uncategorized | Comments Off on Two accepted papers in ESWC 2017

AKSW Colloquium, 13th February, 3pm, Evaluating Entity Linking

On the 13th of February at 3 PM, Michael Röder will present the two papers “Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job” by van Erp et al. and “Moving away from semantic overfitting in disambiguation datasets” by Postma et al. in P702.

Abstract 1

Entity linking has become a popular task in both natural language processing and semantic web communities. However, we find that the benchmark datasets for entity linking tasks do not accurately evaluate entity linking systems. In this paper, we aim to chart the strengths and weaknesses of current benchmark datasets and sketch a roadmap for the community to devise better benchmark datasets.

Abstract 2

Entities and events in the world have no frequency, but our communication about them and the expressions we use to refer to them do have a strong frequency profile. Language expressions and their meanings follow a Zipfian distribution, featuring a small amount of very frequent observations and a very long tail of low frequent observations. Since our NLP datasets sample texts but do not sample the world, they are no exception to Zipf’s law. This causes a lack of representativeness in our NLP tasks, leading to models that can capture the head phenomena in language, but fail when dealing with the long tail. We therefore propose a referential challenge for semantic NLP that reflects a higher degree of ambiguity and variance and captures a large range of small real-world phenomena. To perform well, systems would have to show deep understanding on the linguistic tail.
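The head/tail frequency profile described in the second abstract is easy to reproduce on any text sample; the toy corpus below is invented purely for illustration.

```python
# Quick illustration of the Zipfian head/tail effect: even a tiny corpus
# is dominated by a few frequent types plus a long tail of rare ones.
from collections import Counter

corpus = (
    "the cat sat on the mat and the dog sat near the cat "
    "while a bird a very rare bird watched quietly"
).split()

freqs = Counter(corpus)
ranked = freqs.most_common()
head = [w for w, c in ranked if c > 1]   # frequent types
tail = [w for w, c in ranked if c == 1]  # hapax legomena

print("head types:", head)
print("tail (hapax) types:", tail)
```

Even in this handful of tokens the tail of once-only words outnumbers the head, which is exactly the property the authors argue NLP benchmarks fail to sample adequately.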

The papers are available at lrec-conf.org and aclweb.org.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Uncategorized | Comments Off on AKSW Colloquium, 13th February, 3pm, Evaluating Entity Linking

SLIPO project kick-off meeting

SLIPO, a new InfAI project, kicked off between the 18th and 20th of January in Athens, Greece. Funded by the EU programme “Horizon 2020”, the project will run until the 31st of December 2019.

Scalable Linking and Integration of Big POI Data (SLIPO) aims to transfer the results of the GeoKnow project to the specific challenges of POI data, which is becoming more and more indispensable for tasks in the fields of tracking, logistics and tourism. Furthermore, we plan to improve the scalability of our key research frameworks, such as LIMES, DEER and LinkedGeoData.

For a closer look, please visit: http://aksw.org/Projects/SLIPO.html


This project has received funding from the European Union’s H2020 research and innovation action program under grant agreement number 731581.

Posted in Announcements, Kickoff, SLIPO | Comments Off on SLIPO project kick-off meeting

AKSW Colloquium 30.Jan.2017

In the upcoming colloquium, Simon Bin will discuss the paper “Towards Analytics Aware Ontology Based Access to Static and Streaming Data” by Evgeny Kharlamov et al., which was presented at ISWC 2016.

Abstract

Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios, such as diagnostics of turbines in Siemens. The OBDA approach has great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended and become analytics, source, and cost aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA, where aggregate and other analytical functions are first-class citizens. Moreover, we develop query optimisation techniques that allow analytical tasks over static and streaming data to be processed efficiently. We implement our approach in a system and evaluate it with Siemens turbine data.
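As a highly simplified illustration of making aggregates "first-class citizens" in OBDA, the sketch below rewrites an ontology-level aggregate over a virtual property into SQL via a mapping table. The mapping entries, column names and query shape are all invented for this sketch; the paper's actual ontology, mapping and query languages are far richer.

```python
# Toy OBDA-style rewriting: an aggregate over an ontology property is
# translated, via a (hypothetical) mapping, into SQL over the source table.

# Mapping: ontology property -> (table, column) in the relational source
MAPPINGS = {
    "temperature": ("sensor_readings", "temp_celsius"),
    "rotorSpeed":  ("sensor_readings", "rpm"),
}

def rewrite_aggregate(agg: str, prop: str, group_by_col: str = "turbine_id") -> str:
    """Rewrite an ontology-level aggregate query into SQL using the mapping."""
    table, column = MAPPINGS[prop]
    return (f"SELECT {group_by_col}, {agg}({column}) AS value "
            f"FROM {table} GROUP BY {group_by_col}")

sql = rewrite_aggregate("AVG", "temperature")
print(sql)
```

The point of the paper is that pushing aggregation into the rewriting like this, instead of materialising raw triples first, is what makes analytical workloads over large static and streaming sources feasible.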

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium | Comments Off on AKSW Colloquium 30.Jan.2017

AKSW Colloquium, 23.01.2017, Automatic Mappings of Tables to Knowledge Graphs and Open Table Extraction

Automatic Mappings of Tables to Knowledge Graphs and Open Table Extraction

At the upcoming colloquium on 23.01.2017, Ivan Ermilov will present his work on automatic mappings of tables to knowledge graphs, which was published as “TAIPAN: Automatic Property Mapping for Tabular Data” at the EKAW 2016 conference, as well as extensions of this work, including:

  • The Open Table Extraction (OTE) approach, i.e., how to generate meaningful information from a big corpus of tables.
  • How to benchmark OTE and which benchmarks are available.
  • OTE use cases and applications.
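To give a rough feel for the property-mapping task TAIPAN addresses, here is a toy label-matching sketch. The property labels and the naive similarity heuristic are invented for illustration; this is not the paper's actual algorithm.

```python
# Toy property mapping: match table column headers to knowledge-graph
# properties by string similarity against (hypothetical) property labels.
from difflib import SequenceMatcher

KG_PROPERTIES = {
    "dbo:birthDate": "birth date",
    "dbo:populationTotal": "population",
    "rdfs:label": "name",
}

def map_columns(headers):
    """Return header -> best-matching property, or None below a cutoff."""
    result = {}
    for header in headers:
        prop, score = max(
            ((p, SequenceMatcher(None, header.lower(), label).ratio())
             for p, label in KG_PROPERTIES.items()),
            key=lambda x: x[1])
        result[header] = prop if score > 0.5 else None
    return result

print(map_columns(["Name", "Population", "Founded"]))
```

Real table-to-KG mapping must additionally handle identifying columns, value types and ambiguous headers, which is where approaches like TAIPAN come in.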

 

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, PHD progress report, PhD topic | Comments Off on AKSW Colloquium, 23.01.2017, Automatic Mappings of Tables to Knowledge Graphs and Open Table Extraction

PRESS RELEASE: “HOBBIT so far.” is now available

The latest release reports on the conferences our team attended in 2016 as well as on the published blog posts. Furthermore, it gives a short analysis of the survey through which we are able to verify the requirements for our benchmarks and the new HOBBIT platform. Last but not least, the release gives a short outlook on our plans for 2017, including the founding of the HOBBIT association.

Have a look at the whole press release on the HOBBIT website.

Posted in Announcements, HOBBIT, Press Release, Projects | Comments Off on PRESS RELEASE: “HOBBIT so far.” is now available

4th Big Data Europe Plenary at Leipzig University


The meeting, hosted by our partner InfAI e. V., took place from the 14th to the 15th of December at the University of Leipzig.
The 29 attendees, including 15 partners, discussed and reviewed the progress of all work packages in 2016 and planned the activities and workshops taking place in the next six months.

On the second day we talked about several societal challenge pilots in fields such as agriculture (AgroKnow), transport and security. It was the last plenary of the year and we thank everybody for their work in 2016. Big Data Europe and our partners are looking forward to 2017.

The next Plenary Meeting will be hosted by VU Amsterdam and will take place in Amsterdam, in June 2017.

Posted in Announcements, BigDataEurope, Projects | Comments Off on 4th Big Data Europe Plenary at Leipzig University

SANSA 0.1 (Semantic Analytics Stack) Released

Dear all,

The Smart Data Analytics group / AKSW are very happy to announce SANSA 0.1 – the initial release of the Scalable Semantic Analytics Stack. SANSA combines distributed computing and semantic technologies in order to provide powerful machine learning, inference and querying capabilities for large knowledge graphs.

Website: http://sansa-stack.net
GitHub: https://github.com/SANSA-Stack
Download: http://sansa-stack.net/downloads-usage/
ChangeLog: https://github.com/SANSA-Stack/SANSA-Stack/releases

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Support for reading and writing RDF files in N-Triples format
  • Support for reading OWL files in various standard formats
  • Querying and partitioning based on Sparqlify
  • Support for RDFS/RDFS Simple/OWL-Horst forward chaining inference
  • Initial RDF graph clustering support
  • Initial support for rule mining from RDF graphs

We want to thank everyone who helped to create this release, in particular, the projects Big Data Europe, HOBBIT and SAKE.

Kind regards,

The SANSA Development Team

Posted in SANSA | Comments Off on SANSA 0.1 (Semantic Analytics Stack) Released

AKSW wins award for Best Resources Paper at ISWC 2016 in Japan

Our paper “LODStats: The Data Web Census Dataset” won the award for Best Resources Paper at the recent ISWC 2016 conference in Kobe, Japan, the premier international forum for the Semantic Web and Linked Data community. The paper presents the LODStats dataset, which provides a comprehensive picture of the current state of a significant part of the Data Web.

Congratulations to Ivan Ermilov, Jens Lehmann, Michael Martin and Sören Auer.

Please find the complete list of winners here.

 

Posted in Announcements, Papers | Comments Off on AKSW wins award for Best Resources Paper at ISWC 2016 in Japan