SANSA 0.4 (Semantic Analytics Stack) Released

We are happy to announce SANSA 0.4 – the fourth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML and N-Quads formats
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify
  • Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst, EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
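To make the forward chaining item in the list above more concrete, here is a minimal single-machine sketch of the idea: rules are applied repeatedly until no new triples appear. It is not the SANSA API and does not reflect how SANSA distributes the work; the function name `forward_chain` and the `ex:` resources are assumptions made purely for illustration, and only two RDFS rules (rdfs9 and rdfs11) are covered.

```python
# Illustrative forward-chaining sketch over in-memory triples.
# Assumption: triples are plain (subject, predicate, object) tuples of strings.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def forward_chain(triples):
    """Apply two RDFS rules until no new triples can be derived (a fixpoint).

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        type_pairs = [(s, o) for s, p, o in inferred if p == RDF_TYPE]
        new = set()
        for a, b in subclass_pairs:
            # rdfs11: chain subclass statements transitively.
            for b2, c in subclass_pairs:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
            # rdfs9: propagate instance types to the superclass.
            for x, cls in type_pairs:
                if cls == a:
                    new.add((x, RDF_TYPE, b))
        new -= inferred
        if not new:
            return inferred
        inferred |= new


# Hypothetical example data.
graph = {
    ("ex:Leipzig", RDF_TYPE, "ex:City"),
    ("ex:City", SUBCLASS_OF, "ex:Place"),
    ("ex:Place", SUBCLASS_OF, "ex:Thing"),
}
closure = forward_chain(graph)
```

Here `closure` contains the three input triples plus the derived ones, for example that `ex:Leipzig` is also an `ex:Thing`.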

Noteworthy changes or updates since the previous release are:

  • Parser performance has been improved significantly, e.g. DBpedia 2016-10 can be loaded in <100 seconds on a 7-node cluster
  • Support for a wider range of data partitioning strategies
  • A better unified API across data representations (RDD, DataFrame, Dataset, Graph) for triple operations
  • Improved unit test coverage
  • Improved distributed statistics calculation (see ISWC paper)
  • Initial scalability tests on 6 billion triples of Ethereum blockchain data on a 100-node cluster
  • New SPARQL-to-GraphX rewriter aiming at providing better performance for queries exploiting graph locality
  • Numeric outlier detection tested on DBpedia (en)
  • Improved clustering tested on 20 GB RDF data sets

Deployment and getting started:

  • Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
  • The SANSA jar files are in Maven Central, i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST and SPECIAL.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

Posted in SANSA, Software Releases

AKSW is organizing the 6th Leipzig Semantic Web Day (LSWT2018)

On June 18th, 2018 we will host the 6th Leipzig Semantic Web Day (LSWT2018), a platform for regional actors to get in touch with each other regarding Semantic Web topics. This year we want to focus on e-government, e-commerce and digital humanities. It will be great.

There is still room for more talks in the program. If you want to contribute a presentation, please contact Natanael Arndt with your title and an abstract by April 27th, 2018.

If you want to participate, please register by May 25th, 2018.

For more information please have a look at https://leds-projekt.de/lswt2018.html.



Posted in Call for Paper, Events, LEDS, Projects

SANSA 0.3 (Semantic Analytics Stack) Released

Dear all,

We are happy to announce SANSA 0.3 – the third release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML and N-Quads formats
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify (with some known limitations until the next Spark 2.3.* release)
  • SPARQL querying via conversion to Gremlin path traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst (all in beta status), EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Rule mining from RDF graphs based on AMIE+
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Distributed knowledge graph embedding approaches: TransE (beta), DistMult (beta), several further algorithms planned
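As a brief illustration of the TransE approach named in the list above: TransE treats a relation as a translation in the embedding space, so that for a true triple the head vector plus the relation vector should land close to the tail vector. The sketch below only computes that distance for hand-picked vectors. It is not SANSA's distributed implementation; the function name `transe_score` and all embedding values are assumptions for illustration.

```python
import math


def transe_score(head, relation, tail):
    """Return the TransE distance ||h + r - t||_2; lower means more plausible."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hypothetical 2-dimensional embeddings, chosen by hand for illustration.
leipzig = [0.1, 0.8]
located_in = [0.5, -0.3]
germany = [0.6, 0.5]
france = [-0.4, 0.9]

# A triple that fits the translation has a distance near zero.
plausible = transe_score(leipzig, located_in, germany)
# A triple that does not fit ends up noticeably farther away.
implausible = transe_score(leipzig, located_in, france)
```

In an actual training setup the vectors are learned so that observed triples score lower than corrupted ones; here they are fixed by hand to show the scoring step only.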

Deployment and getting started:

  • Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
  • The SANSA jar files are in Maven Central, i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD and BETTER.

Greetings from the SANSA Development Team

Posted in SANSA

DBpedia @ SEMANTiCS 2017

We are happy to invite you to the 10th DBpedia Community Meeting, which will be held in Amsterdam. During SEMANTiCS 2017 (Sep 11-14), the DBpedia Community will get together on the 14th of September for the DBpedia Day.

What cool things do you do with DBpedia? Present your tools and datasets at the DBpedia Community Meeting. Please submit your proposal in our form.

Highlights/Sessions

  • Keynote by Chris Welty (Google Research)
  • Keynote by Victor de Boer (VU University)
  • DBpedia Association Hour & Dutch DBpedia Hour
  • Session on the DBpedia ontology by members of the DBpedia ontology committee
  • DBpedia Tutorial Session (for people who want to learn about DBpedia)
  • We will talk with Mike Tung, CEO and founder of Diffbot, about the DBpedia NLP department via video stream.

Tickets

Attending the DBpedia Community Meeting costs €40 (excl. registration fee and VAT). DBpedia members get free admission; please contact your nearest DBpedia chapter or the DBpedia Association for a promotion code.

Please check all details here.

Workshop

If you can’t wait until the end of SEMANTiCS, you can already participate in the workshop “Two worlds, one goal: A Reliable Linked Data ecosystem for media”, held by DBpedia and Wolters Kluwer on the 11th of September. This half-day workshop explores major topics for publishers and libraries from DBpedia’s and Wolters Kluwer’s perspectives. Both communities will dive into core areas like Interlinking, Metadata and Data Quality and address challenges such as fundamental requirements when publishing data on the web. Did we spark your interest? Check our detailed program here and get your ticket today.

We are looking forward to meeting you in Amsterdam!

Posted in Uncategorized

PRESS RELEASE: Amsterdam - this year’s hotspot on Linked Data Strategies & Practices

September 11-14, 2017: international experts from science and industry demonstrate the business value of smart data services at SEMANTiCS 2017

Experts from science and industry meet at Europe’s biggest Linked Data and Semantic Web event to present and discuss the latest achievements, challenges and future perspectives of new data management practices. The conference for Semantic Systems is now in its 13th edition and is run by a mixed industry and research consortium formed by Semantic Web Company (Austria), Institute for Applied Informatics (Germany), University of Applied Science St. Pölten (Austria) and the Dutch partners VU, TNO and Kadaster, together with Wolters Kluwer as major industry sponsor.

Most companies and public administrations nowadays are struggling to catch up with new data management practices, either by initializing a data strategy from scratch or by adjusting their old strategy to the affordances of new technological environments, legal frameworks or business models. The SEMANTiCS conference gives insights into data management strategies, discusses cases of data-driven business models and gives advice on how to catch up with the latest developments at the dawn of smart, networked data.

The exchange between industry and research is facilitated by a rich program consisting of six keynotes from companies like EA Games, Wolters Kluwer, and OTTO, followed by a total of 36 industry and 25 scientific presentations, 17 workshops, a poster and demo area and numerous social side events.

Programme Overview

September 11, 2017: Pre-Conference Workshops
September 12, 2017: Main Conference Day 1: Keynotes by Wolters Kluwer, EA Games and Toulouse Institute of Computer Science Research
September 13, 2017: Main Conference Day 2: Keynotes by OTTO & Ghent University
September 14, 2017: Post-Conference Workshops & DBpedia Day: Keynote by Chris Welty

This year’s conference focuses on the business value of Linked Data technologies and services as an enabling technology for a cost-efficient, flexible and sustainable enterprise data strategy.

This is addressed in the opening keynote by Sandeep Sacheti, Executive Vice President, Customer Information Management & Operational Excellence at Wolters Kluwer, and in the management panel on Wednesday with Frank Tierolff (board member of Kadaster), Henk Jan Vink (director Networked Innovation of TNO), Kor Brandts (director DUO) and Michiel Borgers (Dutch Ministry of Finance).

The full and rich programme, with talks and presentations by leading researchers in the field and leading industry adopters, can be found at the conference page at http://2017.semantics.cc

SEMANTiCS 2017 Key Data

Date: 11-14 September 2017
Venue: Meervaart Theatre, Amsterdam, The Netherlands
Website: http://2017.semantics.cc
Programme: http://2017.semantics.cc/programme
Twitter: @semanticsconf
Contact: Dissemination Chair Arjen Santema
arjen.santema@kadaster.nl | +31 (0)652481774

View the full release here: PDF

We look forward to seeing you at SEMANTiCS 2017.

Posted in Announcements, Events

AKSW Colloquium, 01.09.2017, IDOL: Comprehensive & Complete LOD Insights

At the AKSW Colloquium on Friday, 1st of September, at 10:40 AM, there will be a paper presentation by Gustavo Publio. He will present the paper IDOL: Comprehensive & Complete LOD Insights by Ciro Baron Neto, Dimitris Kontokostas, Amit Kirschenbaum, Gustavo Publio, Diego Esteves and Sebastian Hellmann, which will also be presented at the upcoming SEMANTiCS’17 conference in Amsterdam, Netherlands.

Abstract

“Over the last decade, we observed a steadily increasing amount of
RDF datasets made available on the web of data. The decentralized
nature of the web, however, makes it hard to identify all these
datasets. Even more so, when downloadable data distributions are
discovered, only insufficient metadata is available to describe the
datasets properly, thus posing barriers on its usefulness and reuse.
In this paper, we describe an attempt to exhaustively identify the
whole linked open data cloud by harvesting metadata from multiple
sources, providing insights about duplicated data and the general
quality of the available metadata. This was only possible by using a
probabilistic data structure called Bloom Filter. Finally, we published
a dump file containing metadata which can further be used to enrich
existent datasets.”
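
For readers unfamiliar with the Bloom filter mentioned in the abstract, the following is a minimal sketch of the data structure in general. It is not the implementation used in the IDOL paper; the class name, the sizes (`num_bits`, `num_hashes`), the salted SHA-256 hashing scheme and the example URL are all assumptions for illustration. A Bloom filter answers "possibly seen" or "definitely not seen", which is what makes it suitable for cheap duplicate checks over very large sets.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, but false positives can occur."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting a SHA-256 digest with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly added"; False means "definitely never added".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical usage: remember a resource that has already been seen.
seen = BloomFilter()
seen.add("http://example.org/dataset/a")
```

After this, `"http://example.org/dataset/a" in seen` is true, while a freshly created, empty filter reports every item as absent.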

 

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation, Papers

AKSW at ISWC2017

We are very pleased to announce that AKSW will be presenting 2 papers at ISWC 2017, which will be held on 21-24 October in Vienna, Austria. The demo and workshop papers are yet to be announced.
The International Semantic Web Conference (ISWC) is the premier international forum where Semantic Web / Linked Data researchers, practitioners, and industry specialists come together to discuss, advance, and shape the future of semantic technologies on the web, within enterprises and in the context of public institutions.

Here is the list of accepted papers with their abstracts:

“Distributed Semantic Analytics using the SANSA Stack” by Jens Lehmann, Gezim Sejdiu, Lorenz Bühmann, Patrick Westphal, Claus Stadler, Ivan Ermilov, Simon Bin, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo and Hajira Jabeen.

Abstract: A major research challenge is to perform scalable analysis of large-scale knowledge graphs to facilitate applications like link prediction, knowledge base completion and reasoning. Analytics methods which exploit expressive structures usually do not scale well to very large knowledge bases, and most analytics approaches which do scale horizontally (i.e., can be executed in a distributed environment) work on simple feature-vector-based input. This software framework paper describes the ongoing Semantic Analytics Stack (SANSA) project, which supports expressive and scalable semantic analytics by providing functionality for distributed computing on RDF data.

“Iguana: A Generic Framework for Benchmarking the Read-Write Performance of Triple Stores” by Felix Conrads, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Muhammad Saleem, and Mohamed Morsey.

Abstract: The performance of triple stores is crucial for applications which rely on RDF data. Several benchmarks have been proposed that assess the performance of triple stores. However, no integrated benchmark-independent execution framework for these benchmarks has been provided so far. We propose a novel SPARQL benchmark execution framework called IGUANA. Our framework complements benchmarks by providing an execution environment which can measure the performance of triple stores during data loading, data updates as well as under different loads. Moreover, it allows a uniform comparison of results on different benchmarks. We execute the FEASIBLE and DBPSB benchmarks using the IGUANA framework and measure the performance of popular triple stores under updates and parallel user requests. We compare our results with state-of-the-art benchmarking results and show that our benchmark execution framework can unveil new insights pertaining to the performance of triple stores.

Thank you, and we look forward to seeing you at ISWC 2017.

Acknowledgments
This work was supported by the European Union’s H2020 research and innovation action HOBBIT (GA no. 688227), the European Union’s H2020 research and innovation program BigDataEurope (GA no. 644564), the German Ministry BMWI under the SAKE project (Grant No. 01MD15006E), WDAqua: Marie Skłodowska-Curie Innovative Training Network, and Industrial Data Space.

Posted in Uncategorized

AKSW Colloquium, 07.07.2017, Two paper presentations concerning Link Discovery and Knowledge Base Reasoning

At the AKSW Colloquium on Friday 7th of July, at 10:40 AM there will be two paper presentations concerning genetic algorithms to learn linkage rules, and differentiable learning of logical rules for knowledge base reasoning.

Tommaso Soru will present the paper Differentiable Learning of Logical Rules for Knowledge Base Reasoning, currently a pre-print, by Fan Yang, Zhilin Yang, and William W. Cohen.

Abstract

“We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method obtains state-of-the-art results on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.”

Daniel Obraczka will present the paper Learning Expressive Linkage Rules using Genetic Programming by Isele and Bizer, accepted at VLDB 2012. This work presents an algorithm to learn record linkage rules utilizing genetic programming.

Abstract

“A central problem in data integration and data cleansing is to find entities in different data sources that describe the same real-world object. Many existing methods for identifying such entities rely on explicit linkage rules which specify the conditions that entities must fulfill in order to be considered to describe the same real-world object. In this paper, we present the GenLink algorithm for learning expressive linkage rules from a set of existing reference links using genetic programming. The algorithm is capable of generating linkage rules which select discriminative properties for comparison, apply chains of data transformations to normalize property values, choose appropriate distance measures and thresholds and combine the results of multiple comparisons using non-linear aggregation functions. Our experiments show that the GenLink algorithm outperforms the state-of-the-art genetic programming approach to learning linkage rules recently presented by Carvalho et. al. and is capable of learning linkage rules which achieve a similar accuracy as human written rules for the same problem.”

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation

SANSA 0.2 (Semantic Analytics Stack) Released

The AKSW and Smart Data Analytics groups are happy to announce SANSA 0.2 – the second release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing for semantic technologies in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples format
  • Reading OWL files in various standard formats
  • Querying and partitioning based on Sparqlify
  • RDFS/RDFS Simple/OWL-Horst forward chaining inference
  • RDF graph clustering with different algorithms
  • Rule mining from RDF graphs

Deployment and getting started:

  • Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
  • The SANSA jar files are in Maven Central, i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE and Big Data Ocean.

SANSA Development Team

Posted in Uncategorized

AKSW at ESWC 2017

Hello Community! ESWC 2017 has just ended, and here is a short report on the conference, especially regarding the AKSW group.

Our members Dr. Muhammad Saleem, Dr. Mohamed Ahmed Sherif, Claus Stadler, Michael Röder, Prof. Dr. Jens Lehmann and Edgard Marx participated in the conference. They gave a number of presentations, workshops and tutorials:

Michael Röder

Mohamed Ahmed Sherif

Muhammad Saleem

Edgard Marx

  • Presented a workshop paper “Exploring the Evolution and Provenance of Git Versioned RDF Data” by Natanael Arndt, Patrick Naumann and Edgard Marx
  • Presented a demo paper “Kbox – Distributing ready-to-query RDF Knowledge Graphs” by Edgard Marx, Tommaso Soru, Ciro Baron Neto and Sandro Coelho

Claus Stadler

  • Presented a workshop paper in QuWeDa “JPA Criteria Queries over RDF Data” by Claus Stadler and Jens Lehmann

The final versions of the papers from Edgard and Claus will be made available soon.

As every year, ESWC also awarded the best papers and studies in several categories. The award for Best Challenge Paper went to: “End-to-end Representation Learning for Question Answering with Weak Supervision” by Daniil Sorokin and Iryna Gurevych. The paper is part of the HOBBIT project by AKSW. Congrats to the winners!

Have a look at all the winners at ESWC 2017: http://2017.eswc-conferences.org/awards.

Posted in Announcements, Call for Paper, Events, HOBBIT, paper presentation, Papers