AKSW Colloquium, 01.02.2016, Co-evolution of RDF Datasets

At today’s colloquium, Natanael Arndt will discuss the paper “Co-evolution of RDF Datasets” by Sidra Faisal, Kemele M. Endris, Saeedeh Shekarpour and Sören Auer (2016, available on arXiv).

Link: http://arxiv.org/abs/1601.05270v1

Abstract: For many use cases it is not feasible to access RDF data in a truly federated fashion. For consistency, latency and performance reasons data needs to be replicated in order to be used locally. However, both a replica and its origin dataset undergo changes over time. The concept of co-evolution refers to mutual propagation of the changes between a replica and its origin dataset. The co-evolution process addresses synchronization and conflict resolution issues. In this article, we initially provide formal definitions of all the concepts required for realizing co-evolution of RDF datasets. Then, we propose a methodology to address the co-evolution of RDF datasets. We rely on a property-oriented approach for employing the most suitable strategy or functionality. This methodology was implemented and tested for a number of different scenarios. The result of our experimental study shows the performance and robustness aspect of this methodology.
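To make the abstract's notions of change propagation and conflict resolution more concrete, here is a minimal sketch, not the authors' method. It assumes that datasets are plain sets of triples, that changes are diffs against a common base version, and that a conflict arises when origin and replica both changed the same subject/property pair in different ways. The per-property strategy table loosely echoes the property-oriented approach. The function names, the `ex:` terms and the two strategies are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Strategy = Callable[[Set[str], Set[str]], Set[str]]


def diff(base: Set[Triple], current: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Changeset of `current` relative to `base` as (added, removed)."""
    return current - base, base - current


def touched(added: Set[Triple], removed: Set[Triple]) -> Set[Tuple[str, str]]:
    """(subject, predicate) pairs affected by a changeset."""
    return {(s, p) for s, p, _ in added | removed}


# Hypothetical conflict resolution strategies, selected per property.
def prefer_origin(origin_vals: Set[str], replica_vals: Set[str]) -> Set[str]:
    return origin_vals


def prefer_replica(origin_vals: Set[str], replica_vals: Set[str]) -> Set[str]:
    return replica_vals


def co_evolve(base: Set[Triple], origin: Set[Triple], replica: Set[Triple],
              strategies: Dict[str, Strategy],
              default: Strategy = prefer_origin) -> Set[Triple]:
    """Three-way merge of origin and replica against their common base."""
    o_add, o_del = diff(base, origin)
    r_add, r_del = diff(base, replica)
    # Propagate the changes of both sides onto the base.
    merged = (base - o_del - r_del) | o_add | r_add
    # Pairs changed on both sides are potential conflicts.
    for s, p in touched(o_add, o_del) & touched(r_add, r_del):
        o_vals = {o for s2, p2, o in origin if (s2, p2) == (s, p)}
        r_vals = {o for s2, p2, o in replica if (s2, p2) == (s, p)}
        if o_vals == r_vals:
            continue  # both sides made the same change: no conflict
        resolve = strategies.get(p, default)
        merged = {t for t in merged if (t[0], t[1]) != (s, p)}
        merged |= {(s, p, o) for o in resolve(o_vals, r_vals)}
    return merged
```

With this design, non-conflicting edits from either side (such as a label added only in the replica) are always propagated, and only genuinely diverging edits are handed to the property's strategy.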

Posted in Colloquium, LEDS, paper presentation

Holographic Embeddings of Knowledge Graphs

During the upcoming colloquium, Nilesh Chakraborty will give a short introduction to factorising RDF tensors and present the paper “Holographic Embeddings of Knowledge Graphs”:

Holographic Embeddings of Knowledge Graphs

Authors: Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio
Abstract: Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator HolE can capture rich interactions but simultaneously remains efficient to compute, easy to train, and scalable to very large datasets. In extensive experiments we show that holographic embeddings are able to outperform state-of-the-art methods for link prediction in knowledge graphs and relational learning benchmark datasets.
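The compositional operator named in the abstract is circular correlation, [a ⋆ b]_k = Σ_i a_i · b_{(i+k) mod d}, and HolE scores a triple (s, p, o) as σ(r_pᵀ (e_s ⋆ e_o)). Below is a minimal, untrained sketch of that scoring function in plain Python. The vectors are hypothetical toy values, and the naive O(d²) loop stands in for the FFT-based O(d log d) computation used in the paper.

```python
import math
from typing import List, Sequence


def circular_correlation(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """[a ⋆ b]_k = sum_i a[i] * b[(i + k) mod d], computed naively in O(d^2)."""
    d = len(a)
    if len(b) != d:
        raise ValueError("vectors must have the same dimension")
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]


def hole_score(e_s: Sequence[float], r_p: Sequence[float], e_o: Sequence[float]) -> float:
    """sigma(r_p . (e_s ⋆ e_o)): a probability-like score for the triple (s, p, o)."""
    composed = circular_correlation(e_s, e_o)
    logit = sum(r * c for r, c in zip(r_p, composed))
    return 1.0 / (1.0 + math.exp(-logit))


# Circular correlation is not commutative, so (s, p, o) and (o, p, s) can score differently.
print(circular_correlation([1, 0, 0], [0, 1, 0]))  # [0, 1, 0]
print(circular_correlation([0, 1, 0], [1, 0, 0]))  # [0, 0, 1]
```

The non-commutativity shown in the last two lines is what allows a single relation vector to model asymmetric relations.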

Posted in Colloquium, paper presentation

AKSW Colloquium, 25.01.2016, LargeRDFBench and Introduction To The Docker Ecosystem

At the upcoming colloquium, Muhammad Saleem will present his paper “LargeRDFBench: A Billion Triples Benchmark for SPARQL Endpoint Federation” on benchmarking federated SPARQL endpoints. The second talk will be an introduction to the Docker ecosystem by Tim Ermilov.

LargeRDFBench: A Billion Triples Benchmark for SPARQL Endpoint Federation

Authors: Muhammad Saleem, Ali Hasnain, Axel Ngonga
Abstract. Gathering information from the Web of Data is commonly carried out by using SPARQL query federation approaches. However, the fitness of current SPARQL query federation approaches for real applications is difficult to evaluate with current benchmarks as they are either synthetic, too small in size and complexity or do not provide means for a fine-grained evaluation. We propose LargeRDFBench, a billion-triple benchmark for SPARQL query federation which encompasses real data as well as real queries pertaining to real bio-medical use cases. We evaluate state-of-the-art SPARQL endpoint federation approaches on this benchmark with respect to their query runtime, triple pattern-wise source selection, result completeness and correctness. Our evaluation results suggest that the performance of current SPARQL query federation systems on simple queries (in terms of total triple patterns, query result set sizes, execution time, use of SPARQL features etc.) does not reflect the systems’ performance on more complex queries. Moreover, current federation systems seem unable to deal with real queries that involve processing large intermediate result sets or lead to large result sets.
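Two of the evaluation criteria in the abstract can be illustrated on toy data. The sketch below assumes a simple capability index that lists which predicates each endpoint contains (endpoint names and predicates are hypothetical). It selects sources for every triple pattern and sums the selections. It also reads result completeness and correctness as recall and precision against a reference result set; that reading is our assumption, not the benchmark's exact definition.

```python
from typing import Dict, List, Set, Tuple

TriplePattern = Tuple[str, str, str]  # terms starting with '?' are variables


def select_sources(patterns: List[TriplePattern],
                   capabilities: Dict[str, Set[str]]) -> List[Set[str]]:
    """For each triple pattern, the endpoints that may contain matching triples."""
    selection = []
    for _, predicate, _ in patterns:
        if predicate.startswith("?"):
            # Unbound predicate: the index cannot rule out any endpoint.
            selection.append(set(capabilities))
        else:
            selection.append({ep for ep, preds in capabilities.items() if predicate in preds})
    return selection


def total_sources_selected(selection: List[Set[str]]) -> int:
    """Triple pattern-wise sources selected, summed over all patterns."""
    return sum(len(eps) for eps in selection)


def completeness_and_correctness(returned: Set[tuple], reference: Set[tuple]) -> Tuple[float, float]:
    """Recall and precision of the returned results with respect to the reference results."""
    hits = len(returned & reference)
    completeness = hits / len(reference) if reference else 1.0
    correctness = hits / len(returned) if returned else 1.0
    return completeness, correctness
```

Fewer selected sources means fewer remote requests, as long as no endpoint that actually holds matching triples is missed.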

Introduction To The Docker Ecosystem

Presented by: Tim Ermilov
Slides are available online

Each talk will last 20 minutes, followed by 10 minutes for questions from the audience. There will be cookies and a coffee break after the talks for further discussion.

Posted in Colloquium, paper presentation, tutorial, workshop or tutorial

HOBBIT project kick-off

HOBBIT, a new InfAI project within the EU’s “Horizon 2020” framework program, kicked off in Luxembourg on 18 and 19 January 2016.

The main goal of the HOBBIT project (@hobbit_project on Twitter) is to benchmark linked and big data systems and assess their performance using industry-relevant key performance indicators. To achieve this goal, the project develops 1) a holistic open-source platform and 2) eight industry-grade benchmarks for systems covering different parts of the linked data lifecycle. These benchmarks will contain datasets based on industry-related, real-world data and can be scaled up to evaluate even big data solutions.

Our partners in this project are iMinds, AGT Group R&D GmbH, Fraunhofer IAIS, USU Software AG, Foundation for Research & Technology – Hellas (FORTH), National Center for Scientific Research “Demokritos” (NCSR), OpenLink Software, TomTom and Ontos AG.

Please continue reading the press release “New EU project develops a platform for benchmarking large linked datasets” by University of Leipzig Press. The text is also available in English.

Find out more at http://project-hobbit.eu/ and by following us (@hobbit_project) on Twitter.

This project has received funding from the European Union’s H2020 research and innovation action program under grant agreement number 688227.

Posted in Announcements, HOBBIT, project kick-off

AKSW Colloquium, 18.01.2016, Natural Language Processing and Question Answering

At the upcoming colloquium, Ivan Ermilov and Konrad Höffner, members of AKSW, will present two papers from the research areas of natural language processing (NLP) and question answering (QA).

ClausIE: Clause-Based Open Information Extraction

Authors: Luciano Del Corro, Rainer Gemulla
Abstract. We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of “useful” pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text as well as on noisy text as found in the web.
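A key step in ClausIE is assigning each detected clause one of seven clause types (SV, SVA, SVC, SVO, SVOO, SVOA, SVOC) and deriving extractions from that type. The sketch below picks up after parsing and assumes the clause constituents are already known. It uses a hypothetical `adverbial_obligatory` flag where ClausIE would consult its verb lexica, and it emits one core proposition plus one extra proposition per optional adverbial. That output format is our simplification of ClausIE's customizable representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SEVEN_TYPES = {"SV", "SVA", "SVC", "SVO", "SVOO", "SVOA", "SVOC"}


@dataclass
class Clause:
    subject: str
    verb: str
    dobj: Optional[str] = None          # direct object
    iobj: Optional[str] = None          # indirect object
    complement: Optional[str] = None    # subject or object complement
    adverbials: List[str] = field(default_factory=list)
    adverbial_obligatory: bool = False  # hypothetical flag; ClausIE decides this with verb lexica


def clause_type(c: Clause) -> str:
    """Derive the clause type from which constituents are present."""
    t = "SV"
    if c.dobj and c.iobj:
        t += "OO"
    elif c.dobj or c.iobj:
        t += "O"
    if c.complement:
        t += "C"
    elif c.adverbials and c.adverbial_obligatory:
        t += "A"
    if t not in SEVEN_TYPES:
        raise ValueError(f"{t} is not one of the seven clause types")
    return t


def propositions(c: Clause) -> List[Tuple[str, ...]]:
    """Generate n-ary propositions for one clause."""
    core = [c.subject, c.verb] + [x for x in (c.iobj, c.dobj, c.complement) if x]
    if clause_type(c).endswith("A"):
        # Obligatory adverbials belong to the core proposition.
        return [tuple(core + c.adverbials)]
    # Optional adverbials: one core proposition plus one extended proposition each.
    return [tuple(core)] + [tuple(core + [adv]) for adv in c.adverbials]
```

For example, `Clause("AE", "died", adverbials=["in Princeton", "in 1955"])` is typed SV and yields `("AE", "died")`, `("AE", "died", "in Princeton")` and `("AE", "died", "in 1955")`.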

A Joint Model for Question Answering over Multiple Knowledge Bases

Authors: Yuanzhe Zhang et al.
Abstract. As the amount of knowledge bases (KBs) grows rapidly, the problem of question answering (QA) over multiple KBs has drawn more attention. The most significant distinction between multiple KB-QA and single KB-QA is that the former must consider the alignments between KBs. The pipeline strategy first constructs the alignments independently, and then uses the obtained alignments to construct queries. However, alignment construction is not a trivial task, and the introduced noises would be passed on to query construction. By contrast, we notice that alignment construction and query construction are interactive steps, and jointly considering them would be beneficial. To this end, we present a novel joint model based on integer linear programming (ILP), uniting these two procedures into a uniform framework. The experimental results demonstrate that the proposed approach outperforms state-of-the-art systems, and is able to improve the performance of both alignment construction and query construction.
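The difference between the pipeline strategy and joint inference described above can be shown on a toy problem. In the sketch below, all entity names, query names and scores are hypothetical. The objective is simply the sum of an alignment score and a query score, and exhaustive search replaces the paper's integer linear program. The example only illustrates how a slightly wrong first-stage alignment propagates into query construction.

```python
from typing import Dict, Tuple

Alignments = Dict[str, float]           # candidate aligned KB2 entity -> alignment score
Queries = Dict[str, Tuple[str, float]]  # query id -> (alignment it requires, query score)


def pipeline(alignments: Alignments, queries: Queries) -> Tuple[str, str]:
    """Fix the best alignment first, then pick the best query that uses it."""
    best_alignment = max(alignments, key=lambda a: alignments[a])
    usable = [q for q, (a, _) in queries.items() if a == best_alignment]
    best_query = max(usable, key=lambda q: queries[q][1])
    return best_alignment, best_query


def joint(alignments: Alignments, queries: Queries) -> Tuple[str, str]:
    """Choose alignment and query together by maximizing their combined score."""
    best_query = max(queries, key=lambda q: alignments[queries[q][0]] + queries[q][1])
    return queries[best_query][0], best_query


alignments = {"kb2:Paris_France": 0.55, "kb2:Paris_Texas": 0.60}
queries = {"q_capital_of": ("kb2:Paris_France", 0.9),
           "q_county_seat_of": ("kb2:Paris_Texas", 0.2)}
print(pipeline(alignments, queries))  # ('kb2:Paris_Texas', 'q_county_seat_of')
print(joint(alignments, queries))     # ('kb2:Paris_France', 'q_capital_of')
```

Exhaustive search is only feasible for tiny candidate sets. An ILP formulation expresses the same kind of joint objective with consistency constraints, which makes it tractable for realistic numbers of candidates.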

Posted in Colloquium, paper presentation

LinkedGeoData: New RDF versions of OpenStreetMap datasets available

The AKSW research group is happy to announce that a new LinkedGeoData maintenance release with more than 1.2 billion triples based on the OpenStreetMap planet file from 2015-11-02 is now online. Enjoy!

Posted in Announcements, Linked Geo Data, Software Releases

AKSW Colloquium, 14-12-2015, SERIMI

At the upcoming AKSW Colloquium, scheduled for the 14th of December at 3 PM, Mofeed Hassan will present the paper “SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets” by Samur Araujo et al.

Abstract

State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. Aiming at resolving this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show our approach greatly improves the quality of state-of-the-art systems; especially on difficult matching tasks.
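The class-based refinement step can be sketched as follows. Under our simplifying assumptions, each candidate is described only by a set of target-side features (here `predicate=value` strings). A candidate is kept if, averaged over the other source instances, it closely resembles at least one of their candidates, since true matches should share the class of interest. The Jaccard similarity, the averaging and the threshold are our choices for illustration, not necessarily SERIMI's exact scoring. All identifiers are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def class_based_filter(candidates: Dict[str, Dict[str, Set[str]]],
                       threshold: float) -> Dict[str, List[str]]:
    """Keep, per source instance, the candidates that resemble other instances' candidates.

    `candidates` maps source instance -> {target candidate -> target-side features}.
    Only target data is compared; source and target are never compared directly.
    """
    kept: Dict[str, List[str]] = {}
    for src, cands in candidates.items():
        others = [oc for other, oc in candidates.items() if other != src]
        selected = []
        for cand, feats in cands.items():
            if not others:
                selected.append(cand)  # a single source instance gives no class evidence
                continue
            # Best match among each other instance's candidates, averaged over instances.
            score = sum(max((jaccard(feats, f) for f in oc.values()), default=0.0)
                        for oc in others) / len(others)
            if score >= threshold:
                selected.append(cand)
        kept[src] = sorted(selected)
    return kept
```

For two source cities whose candidates are `{"type=City", "country=Germany"}` and a band or a ship, only the two city candidates survive a threshold of 0.5, because they are the only ones that resemble each other.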

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation

5th Workshop on Linked Data in Linguistics (LDL-2016)

At next year’s LREC conference, AKSW/KILT members Bettina Klimek and Sebastian Hellmann will co-organize the next Linked Data in Linguistics Workshop. Please find the call for papers below. We look forward to your submissions.

Call for Papers: 5th Workshop on Linked Data in Linguistics (LDL-2016): Managing, Building and Using Linked Language Resources

Portorož, Slovenia, 24th May 2016. Co-located with LREC 2016

Website: http://ldl2016.linguistic-lod.org/

Submission Deadline: February 8th 2016

Publishing language resources under open licenses and linking them together has been an area of increasing interest in academic circles, including applied linguistics, lexicography, natural language processing and information technology. It facilitates the exchange of knowledge and information across disciplines as well as between academia and the IT business. By co-locating the 5th edition of the workshop series with LREC 2016, we encourage this interdisciplinary community to present and discuss use cases, experiences, best practices, recommendations and technologies among each other and with the language resource community. We particularly invite contributions discussing the application of the Linked Open Data paradigm to linguistic data, as it might provide an important step towards making linguistic data: i) easily and uniformly queryable, ii) interoperable and iii) shareable over the Web using open standards such as the HTTP protocol and the RDF data model.

While it has been shown that Linked Data has significant value for the management of language resources on the Web, the practice is still far from being an accepted standard in the community. Thus, it is important that we continue to push the development and adoption of Linked Data technologies among creators of language resources. In particular, Linked Data’s ability to increase the quality, interoperability and availability of data on the Web has led us to make managing, improving and using language resources on the Web the key focus of this year’s workshop.

We invite presentations of algorithms, methodologies, experiments, use cases, project proposals and position papers regarding the creation, publication or application of linguistic data collections and their linking with other resources, as well as descriptions of such data. This includes, but is not limited to, the following:

  • Building linked language resources
    • Novel vocabularies for describing linguistic objects using RDF.
    • Metrics and methodologies to develop linked language resources on the Web.
    • Natural language processing methods to enhance Linked Open Data.
  • Managing linked language resources 
    • Creating, maintaining and accessing language resource infrastructures based on Linked Data.
    • Metadata linking and curation for language resources on the Web.
    • Best practices for publication and linking of multilingual knowledge resources.
  • Using linked language resources
    • Application of Linked Open Data for linguistics, digital humanities and natural language processing.
    • Addressing challenges of scalability, multilinguality and interoperability in the Web.
    • Legal, social and scientific aspects of Linguistic Linked Open Data.

We invite both long (8 pages plus 2 pages of references, formatted according to the LREC guidelines) and short papers (4 pages plus 2 pages of references) representing original research, innovative approaches and resource types, use cases or in-depth discussions. Short papers may also represent project proposals, work in progress or data set descriptions. Papers will be published as part of the LREC workshop proceedings and presented as oral or poster presentations, as appropriate.

Datasets

We encourage the submission of datasets and ask that these resources be included in the LLOD cloud (instructions can be found here). We therefore require that datasets are described in Datahub with sufficient metadata to be added to the cloud. In addition, as part of the LREC conference, your resource will be described in the LRE Map and assigned an International Standard Language Resource Number (ISLRN, www.islrn.org), a persistent unique identifier for each language resource.

Important Dates

Submission Deadline: February 8th 2016

Notification of Acceptance: March 10th 2016

Camera-Ready: March 24th 2016

Workshop: May 24th 2016

Organizers

  • Christian Chiarcos (Goethe-Universität Frankfurt am Main, Germany)
  • John Philip McCrae (National University of Ireland, Galway, Ireland)
  • Thierry Declerck (University of Saarland, Germany)
  • Elena Montiel (Universidad Politécnica de Madrid, Spain)
  • Petya Osenova (Sofia University and IICT-BAS, Bulgaria)
  • Sebastian Hellmann (AKSW/KILT, Universität Leipzig, Germany)
  • Julia Bosque Gil (Universidad Politécnica de Madrid, Spain)
  • Bettina Klimek (AKSW/KILT, Universität Leipzig, Germany)

Program Committee

  • Guadalupe Aguado (Universidad Politécnica de Madrid, Spain)
  • Núria Bel (Universitat Pompeu Fabra, Spain)
  • Claire Bonial (University of Colorado at Boulder, USA)
  • Paul Buitelaar (National University of Ireland, Galway, Ireland)
  • Steve Cassidy (Macquarie University, Australia)
  • Nicoletta Calzolari (ILC-CNR, Italy)
  • Damir Cavar (Eastern Michigan University, USA)
  • Philipp Cimiano (Bielefeld University, Germany)
  • Gerard de Melo (Tsinghua University, China)
  • Alexis Dimitriadis (Universiteit Utrecht, The Netherlands)
  • Judith Eckle-Kohler (Technische Universität Darmstadt, Germany)
  • Francesca Frontini (ILC-CNR, Italy)
  • Jeff Good (University at Buffalo, USA)
  • Asunción Gómez Pérez (Universidad Politécnica de Madrid, Spain)
  • Jorge Gracia (Universidad Politécnica de Madrid, Spain)
  • Yoshihiko Hayashi (Waseda University, Japan)
  • Nancy Ide (Vassar College, USA)
  • Fahad Khan (ILC-CNR, Italy)
  • Vanessa Lopez (IBM Europe, Ireland)
  • Steven Moran (Universität Zürich, Switzerland/Ludwig Maximilian University, Germany)
  • Roberto Navigli (University of Rome, “La Sapienza”, Italy)
  • Sebastian Nordhoff (Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany)
  • Antonio Pareja-Lora (Universidad Complutense Madrid, Spain)
  • Maciej Piasecki (Wroclaw University of Technology, Poland)
  • Francesca Quattri (Hong Kong Polytechnic University, Hong Kong)
  • Laurent Romary (INRIA, France)
  • Felix Sasaki (Deutsches Forschungszentrum für Künstliche Intelligenz, Germany)
  • Andrea Schalley (Griffith University, Australia)
  • Gilles Sérasset (Joseph Fourier University, France)
  • Kiril Simov (Bulgarian Academy of Sciences, Sofia, Bulgaria)
  • Milena Slavcheva (JRC-Brussels, Belgium)
  • Aitor Soroa (University of the Basque Country, Spain)
  • Armando Stellato (University of Rome, Tor Vergata, Italy)
  • Piek Vossen (Vrije Universiteit Amsterdam, The Netherlands)

Posted in Announcements, Call for Paper, Events, Papers, workshop

AKSW Colloquium, 07-12-2015, LODVader

On the 7th of December at 3 PM, Ciro Baron will present LODVader, a new hybrid system which combines LOD Visualisation, Analytics and DiscovEry in Real-time.

Abstract

The Linked Open Data (LOD) cloud is in danger of becoming a black box. Simple questions such as “What kind of datasets are in the LOD cloud?”, “In what way(s) are these datasets connected?” – albeit frequently asked – are at the moment still difficult to answer due to the lack of proper tooling support. The infrequent update of the static LOD cloud diagram adds to the current dilemma, since there is neither reliable nor timely-updated information to perform an interactive search, analysis or in particular visualization in order to gain insight into the current state of Linked Open Data.

In this Colloquium, Ciro will present a new hybrid system which combines LOD Visualisation, Analytics and DiscovEry in Real-time (LODVader) to help answer the above questions. LODVader is equipped with (1) a multi-layer LOD cloud visualization component comprising a light and a dark side, (2) dataset analysis components that extend the state of the art with new similarity measures and efficient link extraction techniques and (3) a fast search index that serves as an entry point for dataset discovery. At its core, LODVader employs a timely-updated index using a complex cluster of Bloom Filters (BF) as a fast search index with a low memory footprint. This BF cluster can efficiently analyse link and dataset similarities based on stored predicate and object information, which, once inverted, can be employed to discover broken links by displaying the Dark LOD Cloud. By combining all these features with a versioning system, we allow for an up-to-date, multi-dimensional LOD cloud analysis, which, to the best of our knowledge, was not possible before.
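Bloom filters make this kind of link analysis cheap. A filter built over the subject URIs of one dataset answers "could this object URI be described there?" with no false negatives and a tunable false-positive rate. The sketch below is a generic Bloom filter with the standard sizing formulas and double hashing, not LODVader's implementation. The URIs and helper names are hypothetical. Counting the objects of one dataset that hit another dataset's subject filter gives an upper bound on the links between them. Objects that miss every filter are definitely not described in any indexed dataset, which is one way to read the "inverted" use for finding broken links.

```python
import hashlib
import math
from typing import Iterable, List


class BloomFilter:
    def __init__(self, capacity: int, fp_rate: float = 0.01):
        capacity = max(1, capacity)
        # Standard sizing: m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
        self.m = max(1, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def candidate_links(objects: Iterable[str], target: BloomFilter) -> List[str]:
    """Objects that may be subjects in the target dataset (false positives possible)."""
    return [o for o in objects if o in target]


def dangling_objects(objects: Iterable[str], filters: Iterable[BloomFilter]) -> List[str]:
    """Objects missing from every filter: definitely not described in any indexed dataset."""
    filters = list(filters)
    return [o for o in objects if not any(o in f for f in filters)]
```

Memory use depends only on the expected number of URIs and the chosen false-positive rate, not on URI length. This is why such filters suit a low-footprint index over many datasets.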

Posted in Colloquium, paper presentation

Session on Management of Ontology Evolution during the ALIGNED project meeting on Dec 3rd, 15:45

The fourth ALIGNED project meeting will be held in Leipzig from December 2 to 4, 2015. During the meeting, we welcome our guest presenters Ralph Schäfermeier and Dr. Anika Groß, who will give presentations on the topic of Management of Ontology Evolution (see below). ALIGNED (Quality-centric, Software and Data Engineering) is a research project funded by Horizon 2020, Project No. 644055. ALIGNED will develop new ways to build and maintain IT systems that use big data on the web (see the end of this post for more information).
The presentations will take place on Dec 3rd, 15:45 in Room P-502 (Paulinum, Leipzig).
If you wish to join the whole meeting (Dec 2nd-4th), please contact Dimitris Kontokostas.

Presenters

Ralph Schäfermeier

OntoMaven: Maven-based Ontology Development and Management of Distributed Ontology Repositories

Among our guest speakers is Ralph Schäfermeier (Freie Universität Berlin), who will present a talk (Thu, Dec 3rd 15:45) on “OntoMaven” (an Apache Maven tool for distributed ontology engineering and ontology-based software engineering).

Abstract: In collaborative agile ontology development projects support for modular reuse of ontologies from large existing remote repositories, ontology project life cycle management, and transitive dependency management are important needs. The Apache Maven approach has proven its success in distributed collaborative Software Engineering by its widespread adoption. The contribution of this paper is a new design artifact called OntoMaven. OntoMaven adopts the Maven-based development methodology and adapts its concepts to knowledge engineering for Maven-based ontology development and management of ontology artifacts in distributed ontology repositories.
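Transitive dependency management is the part of the Maven approach that carries over most directly to ontology imports. If OntoMaven inherits Maven's usual "nearest definition wins" mediation (an assumption here), resolving an ontology's transitive imports could look like the sketch below. The artifact coordinates and the in-memory dependency table are hypothetical, and this is not OntoMaven's code.

```python
from collections import deque
from typing import Dict, List


def resolve(root: str, deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Map groupId:artifactId -> chosen version, using nearest-wins mediation.

    Coordinates are "groupId:artifactId:version"; `deps` lists each artifact's
    direct dependencies in declaration order.
    """
    root_ga = root.rsplit(":", 1)[0]
    chosen: Dict[str, str] = {}
    queue = deque(deps.get(root, []))  # breadth-first, so nearer dependencies are seen first
    while queue:
        coord = queue.popleft()
        ga, version = coord.rsplit(":", 1)
        if ga == root_ga or ga in chosen:
            continue  # a nearer (or earlier-declared) version already won; also stops cycles
        chosen[ga] = version
        queue.extend(deps.get(coord, []))
    return chosen


deps = {
    "org.example:my-ontology:1.0": ["org.example:upper:2.0", "org.example:domain:1.0"],
    "org.example:domain:1.0": ["org.example:upper:1.0", "org.example:units:3.1"],
}
print(resolve("org.example:my-ontology:1.0", deps))
# {'org.example:upper': '2.0', 'org.example:domain': '1.0', 'org.example:units': '3.1'}
```

Here the root's direct import of `upper:2.0` wins over the `upper:1.0` pulled in transitively by `domain`. This is the kind of version conflict that distributed ontology repositories make likely.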

Dr. Anika Groß

Evolution of Ontology-based Mappings

Additionally, Dr. Anika Groß (Universität Leipzig, Institut für Informatik, Abteilung Datenbanken) will introduce the ELISA project (Thu, Dec 3rd 16:15) which addresses the development and evaluation of new methods for creating and maintaining semantic annotations. All of the partners will also give short technical briefings on aspects of the current ALIGNED project work that have proved especially interesting.

Abstract: Ontologies are used in numerous research disciplines and commercial applications to uniformly and semantically annotate real-world objects. Due to the rapid development of application domains, the corresponding ontologies are changed frequently to include up-to-date knowledge. These changes dramatically influence dependent data as well as applications/systems, for instance, ontology mappings that semantically interrelate ontologies. The talk will give an overview of the evolution of ontologies and ontology-based mappings.

About ALIGNED

http://aligned-project.eu/

ALIGNED quality-centric, software and data engineering is a research project funded by Horizon 2020. ALIGNED will develop new ways to build and maintain IT systems that use big data on the web. ALIGNED brings together world class computer science researchers (Trinity College Dublin, University of Oxford, University of Leipzig), software companies specialised in data-intensive systems (Semantic Web Company), information companies (Wolters Kluwer) and academic curators of the Seshat: Global History Databank, large datasets describing world history and archaeology (University of Oxford, Adam Mickiewicz University in Poznań). Together they will create more efficient methods of building IT systems that extract, process, publish and share web data.

ALIGNED will lay the foundations for the next generation of big data systems that lower costs and deal with the web data challenges of dynamism, complexity, scale and inconsistency. ALIGNED is coordinated by the School of Computer Science and Statistics at Trinity College Dublin and is an associated project of the Science Foundation Ireland ADAPT research centre.

Posted in Announcements, Events, invited talk