AKSW Colloquium, 15th August, 3pm, RDF query relaxation

On the 15th of August at 3 PM, Michael Röder will present the paper “RDF Query Relaxation Strategies Based on Failure Causes” by Fokou et al. in P702.

Abstract

Recent advances in Web-information extraction have led to the creation of several large Knowledge Bases (KBs). Querying these KBs often results in empty answers that do not serve the users’ needs. Relaxation of the failing queries is one of the cooperative techniques used to retrieve alternative results. Most of the previous work on RDF query relaxation compute a set of relaxed queries and execute them in a similarity-based ranking order. Thus, these approaches relax an RDF query without knowing its failure causes (FCs). In this paper, we study the idea of identifying these FCs to speed up the query relaxation process. We propose three relaxation strategies based on various information levels about the FCs of the user query and of its relaxed queries as well. A set of experiments conducted on the LUBM benchmark show the impact of our proposal in comparison with a state-of-the-art algorithm.
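
To make the idea of relaxing a query based on its failure causes more concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: the toy knowledge base, the helper names and the reading of a "failure cause" as a minimal subset of triple patterns that already returns no answer are all assumptions made for illustration. The relaxation step simply replaces the object constant of a culprit pattern with a fresh variable.

```python
from itertools import combinations

# Toy knowledge base (hypothetical data): (subject, predicate, object) triples.
KB = {
    ("alice", "type", "Professor"),
    ("alice", "worksFor", "dept1"),
    ("bob", "type", "Lecturer"),
    ("bob", "worksFor", "dept2"),
}

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def evaluate(query, kb=KB):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in query:
        bindings = [b2 for b in bindings for t in kb
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

def failure_causes(query, kb=KB):
    """Smallest subsets of patterns that return no answer on their own."""
    found = []
    for size in range(1, len(query) + 1):
        for subset in combinations(query, size):
            if any(set(f) <= set(subset) for f in found):
                continue  # a smaller failing subset is already known
            if not evaluate(subset, kb):
                found.append(subset)
    return found

def relax(query, causes):
    """Replace the object constant of every culprit pattern by a fresh variable."""
    culprits = {p for cause in causes for p in cause}
    relaxed = []
    for i, (s, p, o) in enumerate(query):
        if (s, p, o) in culprits and not is_var(o):
            relaxed.append((s, p, f"?relaxed{i}"))
        else:
            relaxed.append((s, p, o))
    return relaxed

# A failing query: professors working for a department that does not exist.
query = [("?x", "type", "Professor"), ("?x", "worksFor", "dept3")]
causes = failure_causes(query)
answers = evaluate(relax(query, causes))
```

Only the second pattern is relaxed here, because it is the only one that fails on its own. A blind approach would have to try relaxations of the first pattern as well.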

The paper is available at ResearchGate.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Uncategorized

Article accepted in Journal of Web Semantics

We are happy to announce that the article “DL-Learner – A Framework for Inductive Learning on the Semantic Web” by Lorenz Bühmann, Jens Lehmann and Patrick Westphal was accepted for publication in the Journal of Web Semantics: Science, Services and Agents on the World Wide Web.

Abstract:

In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation. It can be beneficial in various data and schema analysis tasks with applications in different standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning and includes several algorithm implementations, usage examples and has applications building on top of the framework. The article gives an overview of the framework with a focus on algorithms and use cases.

Posted in DL-Learner, Papers, Uncategorized

AKSW Colloquium, 18.07.2016, AEGLE and node2vec

On Monday, 18.07.2016, Kleanthi Georgala will give her Colloquium presentation on her paper “An Efficient Approach for the Generation of Allen Relations”, which was accepted at the European Conference on Artificial Intelligence (ECAI) 2016.

Abstract

Event data is increasingly being represented according to the Linked Data principles. The need for large-scale machine learning on data represented in this format has thus led to the need for efficient approaches to compute RDF links between resources based on their temporal properties. Time-efficient approaches for computing links between RDF resources have been developed over the last years. However, dedicated approaches for linking resources based on temporal relations have been paid little attention to. In this paper, we address this research gap by presenting AEGLE, a novel approach for the efficient computation of links between events according to Allen’s interval algebra. We study Allen’s relations and show that we can reduce all thirteen relations to eight simpler relations. We then present an efficient algorithm with a complexity of O(n log n) for computing these eight relations. Our evaluation of the runtime of our algorithms shows that we outperform the state of the art by up to 4 orders of magnitude while maintaining a precision and a recall of 1.
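
For readers who have not met Allen’s interval algebra before, the following sketch classifies a pair of intervals into one of the thirteen relations. It is the plain pairwise definition, not the AEGLE algorithm and not its reduction to eight simpler relations. The function name and the representation of an interval as a (start, end) tuple are assumptions of this example.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds between interval a and interval b.

    Each interval is a (start, end) pair with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if s1 >= e1 or s2 >= e2:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint or touching intervals.
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 > e2:
        return "after"
    if s1 == e2:
        return "met-by"

    # Intervals that share at least one endpoint.
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"

    # Proper containment or partial overlap.
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    return "overlaps" if s1 < s2 else "overlapped-by"
```

Comparing every pair of events this way takes quadratic time, which is the cost that a dedicated linking approach tries to avoid.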

Afterwards, Tommaso Soru will present a paper considered the latest chapter of the Everything-2-Vec saga, which encompasses outstanding works such as Word2Vec and Doc2Vec. The paper is titled “node2vec: Scalable Feature Learning for Networks” [PDF] by Aditya Grover and Jure Leskovec, and was accepted for publication at the International Conference on Knowledge Discovery and Data Mining (KDD), 2016 edition.

Posted in Uncategorized

AKSW Colloquium, 04.07.2016. Big Data, Code Quality.

On the upcoming Monday (04.07.2016), the AKSW group will discuss topics related to Semantic Web and Big Data as well as programming languages and code quality. In particular, the following papers will be presented:

S2RDF: RDF Querying with SPARQL on Spark

by Alexander Schätzle et al.
Presented by: Ivan Ermilov

RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. Yet, the ever-increasing size of RDF data collections makes it more and more infeasible to store and process them on a single machine, raising the need for distributed approaches. Instead of building a standalone but closed distributed RDF store, we endorse the usage of existing infrastructures for Big Data processing, e.g. Hadoop. However, SPARQL query performance is a major challenge as these platforms are not designed for RDF processing from ground. Thus, existing Hadoop-based approaches often favor certain query pattern shape while performance drops significantly for other shapes. In this paper, we describe a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. Our prototype system S2RDF is built on top of Spark and uses its relational interface to execute SPARQL queries over ExtVP. We demonstrate its superior performance in comparison to state of the art SPARQL-on-Hadoop approaches using the recent WatDiv test suite. S2RDF achieves sub-second runtimes for majority of queries on a billion triples RDF graph.
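
The semi-join preprocessing mentioned in the abstract can be illustrated with a small sketch. It uses plain Python sets rather than Spark, and the toy triples, function names and column constants are assumptions of this example, not part of S2RDF. The idea shown is: first split the triples into one two-column table per predicate (vertical partitioning), then precompute, for a pair of predicates, only those rows of one table that can actually join with the other.

```python
from collections import defaultdict

# Hypothetical toy data set.
triples = [
    ("A", "follows", "B"),
    ("B", "follows", "C"),
    ("C", "follows", "D"),
    ("A", "likes", "I1"),
    ("C", "likes", "I2"),
]

S, O = 0, 1  # column positions in a (subject, object) row

def vertical_partition(triples):
    """Build one (subject, object) table per predicate."""
    vp = defaultdict(set)
    for s, p, o in triples:
        vp[p].add((s, o))
    return vp

def semi_join(left, right, left_col, right_col):
    """Keep the rows of left that have at least one join partner in right."""
    keys = {row[right_col] for row in right}
    return {row for row in left if row[left_col] in keys}

vp = vertical_partition(triples)

# Rows of "follows" whose object also occurs as a subject of "likes".
follows_os_likes = semi_join(vp["follows"], vp["likes"], O, S)

# Rows of "follows" whose subject also occurs as a subject of "likes".
follows_ss_likes = semi_join(vp["follows"], vp["likes"], S, S)
```

A query such as "?x follows ?y . ?y likes ?z" then only needs to read the single row in follows_os_likes instead of the full "follows" table, which is how the input size shrinks before the actual join runs.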

A Large Scale Study of Programming Languages and Code Quality in Github

by Baishakhi Ray et al.
Presented by: Tim Ermilov

What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (729 projects, 80 Million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static vs. dynamic typing, strong vs. weak typing on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages.

Each paper will be presented in 20 minutes, followed by 10 minutes of discussion. After the talks, there is more time for discussion in smaller groups as well as coffee and cake. The colloquium starts at 3 p.m. and is located on the 7th floor (Leipzig, Augustusplatz 10, Paulinum).

Posted in BigDataEurope, Colloquium, paper presentation

Accepted Papers of AKSW Members @ Semantics 2016

This year’s SEMANTiCS conference, which takes place September 12 – 15, 2016 in Leipzig, recently invited submissions of research papers on semantic technologies. Several AKSW members seized the opportunity and had their papers accepted for presentation at the conference.

These are listed below:

  • Executing SPARQL queries over Mapped Document Stores with SparqlMap-M (Jörg Unbehauen and Michael Martin)
  • Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store (Natanael Arndt, Norman Radtke and Michael Martin)
  • Towards Versioning of Arbitrary RDF Data (Marvin Frommhold, Ruben Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen and Michael Martin)
  • DBtrends: Exploring query logs for ranking RDF data (Edgard Marx, Amrapali Zaveri, Diego Moussallem and Sandro Rautenberg)
  • MEX Framework: Automating Machine Learning Metadata Generation (Diego Esteves, Pablo N. Mendes, Diego Moussallem, Julio Cesar Duarte, Maria Claudia Cavalcanti, Jens Lehmann, Ciro Baron Neto and Igor Costa)

Another AKSW-driven event at SEMANTiCS 2016 will be the Linked Enterprise Data Services (LEDS) Track, taking place September 13-14, 2016. This track is organized by the BMBF-funded LEDS project, which is part of the Entrepreneurial Regions program, a BMBF Innovation Initiative for the New German Länder. The focus is on discussing, with academic and industrial partners, new approaches to discover and integrate background knowledge into business and governmental environments.

SEMANTiCS 2016 will also host the 7th edition of the DBpedia Community Meeting on the last day of the conference (September 15, ‘DBpedia Day’). DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and link the different data sets on the Web to Wikipedia data.

So come and join SEMANTiCS 2016, talk and discuss with us!

More information on the program can be found here.

Posted in Announcements, Call for Paper, dbpedia, Events, LEDS, Papers, Uncategorized

AKSW Colloquium, 27.06.2016, When owl:sameAs isn’t the Same + Towards Versioning for Arbitrary RDF Data

In the next Colloquium, June the 27th at 3 PM, two papers will be presented:

When owl:sameAs isn’t the Same: An Analysis of Identity in Linked Data

André Valdestilhas will present the paper “When owl:sameAs isn’t the Same: An Analysis of Identity in Linked Data” by Halpin et al. [PDF]:

Abstract:  In Linked Data, the use of owl:sameAs is ubiquitous in interlinking data-sets. There is however, ongoing discussion about its use, and potential misuse, particularly with regards to interactions with inference. In fact, owl:sameAs can be viewed as encoding only one point on a scale of similarity, one that is often too strong for many of its current uses. We describe how referentially opaque contexts that do not allow inference exist, and then outline some varieties of referentially-opaque alternatives to owl:sameAs. Finally, we report on an empirical experiment over randomly selected owl:sameAs statements from the Web of data. This theoretical apparatus and experiment shed light upon how owl:sameAs is being used (and misused) on the Web of data.

Towards Versioning for Arbitrary RDF Data

Afterwards, Marvin Frommhold will practice the presentation of his paper “Towards Versioning for Arbitrary RDF Data” (Marvin Frommhold, Rubén Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen, and Michael Martin) [PDF], which has been accepted at the main conference of SEMANTiCS 2016 in Leipzig.

Abstract: Coherent and consistent tracking of provenance data and in particular update history information is a crucial building block for any serious information system architecture. Version Control Systems can be a part of such an architecture enabling users to query and manipulate versioning information as well as content revisions. In this paper, we introduce an RDF versioning approach as a foundation for a full featured RDF Version Control System. We argue that such a system needs support for all concepts of the RDF specification including support for RDF datasets and blank nodes. Furthermore, we placed special emphasis on the protection against unperceived history manipulation by hashing the resulting patches. In addition to the conceptual analysis and an RDF vocabulary for representing versioning information, we present a mature implementation which captures versioning information for changes to arbitrary RDF datasets.
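
The protection against unnoticed history manipulation by hashing patches can be pictured with a simple hash chain. The sketch below is only an illustration of that general technique and makes no claim about the paper's vocabulary or implementation: the patch layout, the function names and the use of SHA-256 over a JSON serialization are assumptions. Triples are kept as plain N-Triples-like strings, and blank nodes and RDF datasets, which the paper explicitly addresses, are ignored here.

```python
import hashlib
import json

def patch_hash(parent_hash, added, removed):
    """Hash a patch together with the hash of its parent patch."""
    payload = json.dumps(
        {"parent": parent_hash, "added": sorted(added), "removed": sorted(removed)},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def commit(history, added=(), removed=()):
    """Append a patch to the history, chained to the previous patch."""
    parent = history[-1]["hash"] if history else ""
    entry = {"parent": parent, "added": sorted(added), "removed": sorted(removed)}
    entry["hash"] = patch_hash(parent, added, removed)
    history.append(entry)
    return history

def verify(history):
    """Recompute every hash; any edit to an earlier patch breaks the chain."""
    parent = ""
    for entry in history:
        if entry["parent"] != parent:
            return False
        if entry["hash"] != patch_hash(parent, entry["added"], entry["removed"]):
            return False
        parent = entry["hash"]
    return True

# Hypothetical example history with two patches.
history = []
commit(history, added=['<http://example.org/a> <http://example.org/p> "1" .'])
commit(
    history,
    added=['<http://example.org/a> <http://example.org/p> "2" .'],
    removed=['<http://example.org/a> <http://example.org/p> "1" .'],
)
```

Because each hash covers the parent hash, changing the first patch after the fact invalidates it and every patch that follows.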

Posted in Colloquium, LEDS, LUCID, paper presentation, Papers

Should I publish my dataset under an open license?

Undecided? Stand back, we know flowcharts:

Did you ever try to apply the halting problem to a malformed flowchart?

Taken from my slides for my keynote at TKE:

Posted in Announcements, best practices in the Web of Data

TKE 2016 has announced their invited speakers

The 12th International Conference on Terminology and Knowledge Engineering (TKE 2016) has announced its invited speakers, including Dr. Sebastian Hellmann, Head of the AKSW/KILT research group at Leipzig University and Executive Director of the DBpedia Association at the Institute for Applied Informatics (InfAI) e.V. Sebastian Hellmann will give a talk on Challenges, Approaches and Future Work for Linguistic Linked Open Data (LLOD).

The theme of the 12th International Conference on Terminology and Knowledge Engineering will be ‘Term Bases and Linguistic Linked Open Data’. So the main aims of TKE 2016 will be to bring together researchers from these related fields, provide an overview of the state-of-the-art, discuss problems and opportunities, and exchange information. TKE 2016 will also cover applications, ongoing and planned activities, industrial uses and needs, as well as requirements coming from the new e-society.

The TKE 2016 conference will take place in Copenhagen, Denmark, on 22-24 June 2016. Further information about the program and speakers confirmed so far can be found at the conference website.

Posted in Announcements, Events, invited talk

Two Papers accepted at ECAI 2016

Hello Community! We are very pleased to announce that two of our papers were accepted for presentation at the biennial European Conference on Artificial Intelligence (ECAI). ECAI is Europe’s premier venue for presenting scientific results in AI and will be held from August 29th to September 2nd in The Hague, Netherlands.

In more detail, we will present the following papers:

An Efficient Approach for the Generation of Allen Relations (Kleanthi Georgala, Mohamed Sherif, Axel-Cyrille Ngonga Ngomo)

Abstract: Event data is increasingly being represented according to the Linked Data principles. The need for large-scale machine learning on data represented in this format has thus led to the need for efficient approaches to compute RDF links between resources based on their temporal properties. Time-efficient approaches for computing links between RDF resources have been developed over the last years. However, dedicated approaches for linking resources based on temporal relations have been paid little attention to. In this paper, we address this research gap by presenting AEGLE, a novel approach for the efficient computation of links between events according to Allen’s interval algebra. We study Allen’s relations and show that we can reduce all thirteen relations to eight simpler relations. We then present an efficient algorithm with a complexity of O(n log n) for computing these eight relations. Our evaluation of the runtime of our algorithms shows that we outperform the state of the art by up to 4 orders of magnitude while maintaining a precision and a recall of 100%.

Towards SPARQL-Based Induction for Large-Scale RDF Data sets (Simon Bin, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo)

Abstract: We show how to convert OWL Class Expressions to SPARQL queries where the instances of that concept are (with restrictions sensible in the considered concept induction scenario) equal to the SPARQL query result. Furthermore, we implement and integrate our converter into the CELOE algorithm (Class Expression Learning for Ontology Engineering). Therein, it replaces the position of a traditional OWL reasoner, which most structured machine learning approaches assume knowledge to be loaded into. This will foster the application of structured machine learning to the Semantic Web, since most data is readily available in triple stores. We provide experimental evidence for the usefulness of the bridge. In particular, we show that we can improve the runtime of machine learning approaches by several orders of magnitude. With these results, we show that machine learning algorithms can now be executed on data on which in-memory reasoners could not previously be used.
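
To give a feeling for what converting a class expression into a SPARQL query can look like, here is a deliberately small sketch. It is not the converter from the paper: it covers only named classes, intersections and existential restrictions, and the nested-tuple encoding of class expressions, the function names and the example.org IRIs are all assumptions made for this example.

```python
from itertools import count

def to_patterns(expr, var, fresh):
    """Translate a class expression into triple patterns about variable var."""
    kind = expr[0]
    if kind == "class":  # ("class", iri)
        return [f"{var} a <{expr[1]}> ."]
    if kind == "and":  # ("and", expr1, expr2, ...)
        return [p for sub in expr[1:] for p in to_patterns(sub, var, fresh)]
    if kind == "some":  # ("some", property_iri, filler_expr)
        _, prop, filler = expr
        target = f"?v{next(fresh)}"
        return [f"{var} <{prop}> {target} ."] + to_patterns(filler, target, fresh)
    raise ValueError(f"unsupported constructor: {kind}")

def to_sparql(expr):
    """Build a SELECT query whose results are the instances of expr."""
    body = "\n  ".join(to_patterns(expr, "?x", count(1)))
    return f"SELECT DISTINCT ?x WHERE {{\n  {body}\n}}"

# Person and (hasChild some Student), with hypothetical IRIs.
expression = (
    "and",
    ("class", "http://example.org/Person"),
    ("some", "http://example.org/hasChild", ("class", "http://example.org/Student")),
)
query = to_sparql(expression)
```

Note that such a query only sees what is stated in (or materialized into) the triple store. Without a reasoner, a resource typed only with a subclass of Person would not be returned, which is one reason the abstract speaks of restrictions.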

Come over to ECAI and enjoy the talks. For more information on the conference program and other papers please see here.

Sandra on behalf of AKSW

Posted in Announcements, Call for Paper

AKSW Colloquium, 13.06.2016, SPARQL query processing with Apache Spark

In the upcoming Colloquium, Simon Bin will discuss the paper “SPARQL query processing with Apache Spark” by H. Naacke et al., which has been submitted to ISWC 2016.

Abstract

The number of linked data sources and the size of the linked open data graph keep growing every day.  As a consequence, semantic RDF services are more and more confronted to various big data problems.  Query processing is one of them and needs to be efficiently addressed with executions over scalable, highly available and fault tolerant frameworks.  Data management systems requiring these properties are rarely built from scratch but are rather designed on top of an existing cluster computing engine.  In this work, we consider the processing of SPARQL queries with Apache Spark.
We propose and compare five different query processing approaches based on different join execution models and Spark components. A detailed experimentation, on real-world and synthetic data sets, emphasizes that two approaches tailored for the RDF data model outperform the other ones on all major query shapes, i.e. star, snowflake, chain and hybrid.

Posted in Colloquium