Two AKSW Papers at ESWC 2015

We are very pleased to announce that two of our papers were accepted for presentation as full research papers at ESWC 2015.

Automating RDF Dataset Transformation and Enrichment (Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens Lehmann)

With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim to support this endeavour usually focus on supporting the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against eight manually defined enrichment pipelines and show that it can learn accurate pipelines even when provided with a small number of training examples.
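As a rough illustration of the idea (a hypothetical toy, not the authors' actual operator or data model), a refinement-based learner can greedily append the enrichment step that brings a dataset closest to the exemplary enriched resources:

```python
# Hypothetical toy: triples are (subject, predicate, value) tuples, and an
# enrichment pipeline is a list of set-to-set transformation functions.

def strip_datatype(triples):
    """Drop datatype suffixes such as ^^xsd:string from literal values."""
    return {(s, p, v.split("^^")[0]) for s, p, v in triples}

def add_label_from_name(triples):
    """Derive an rdfs:label triple from every foaf:name triple."""
    extra = {(s, "rdfs:label", v) for s, p, v in triples if p == "foaf:name"}
    return triples | extra

OPERATORS = [strip_datatype, add_label_from_name]

def score(output, target):
    """F1 overlap between produced and desired triples."""
    tp = len(output & target)
    prec = tp / len(output) if output else 0.0
    rec = tp / len(target) if target else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def learn_pipeline(source, target, max_len=3):
    """Greedily append the operator that most improves the F1 score."""
    pipeline, current = [], source
    best = score(current, target)
    for _ in range(max_len):
        s, op = max(((score(op(current), target), op) for op in OPERATORS),
                    key=lambda x: x[0])
        if s <= best:
            break  # no refinement improves the pipeline further
        pipeline.append(op)
        current, best = op(current), s
    return pipeline, best

source = {("ex:alice", "foaf:name", '"Alice"^^xsd:string')}
target = {("ex:alice", "foaf:name", '"Alice"'),
          ("ex:alice", "rdfs:label", '"Alice"')}
pipeline, f1 = learn_pipeline(source, target)
```

On this toy example the learner recovers a two-step pipeline (strip the datatype, then derive labels) from a single training resource; the paper's refinement operator works over far richer enrichment functions, but the greedy search-by-example principle is the same.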

HAWK – Hybrid Question Answering using Linked Data (Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, and Christina Unger)

The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Hence, answering complex questions often requires combining information from structured and unstructured data sources. We present HAWK, a novel entity search approach for Hybrid Question Answering based on combining Linked Data and textual data. The approach uses predicate-argument representations of questions to derive equivalent combinations of SPARQL query fragments and text queries. These are executed so as to integrate the results of the text queries into SPARQL and thus generate a formal interpretation of the query. We present a thorough evaluation of the framework, including an analysis of the influence of entity annotation tools on the generation of the hybrid queries and a study of the overall accuracy of the system. Our results show that HAWK achieves an F-measure of 0.68 in the training phase and 0.61 in the test phase on the Question Answering over Linked Data (QALD-4) hybrid query benchmark.
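The hybrid-query idea can be sketched roughly as follows (a hypothetical toy, not HAWK's actual pipeline; the index, URIs and `dbo:` class are made up): entities returned by the text query are injected into a SPARQL fragment, so the structured part of the question is evaluated over the text hits:

```python
# Hypothetical toy: a text index maps entity URIs to abstracts; the SPARQL
# fragment carries the structured part of the question.

def text_search(index, keywords):
    """Toy full-text step: entities whose abstract mentions every keyword."""
    return [uri for uri, abstract in index.items()
            if all(k.lower() in abstract.lower() for k in keywords)]

def hybrid_query(index, keywords, sparql_fragment):
    """Feed the text hits into the SPARQL fragment via a VALUES clause."""
    values = " ".join(f"<{uri}>" for uri in text_search(index, keywords))
    return ("SELECT ?s WHERE { "
            f"VALUES ?s {{ {values} }} "
            f"{sparql_fragment} }}")

index = {
    "http://example.org/Arnhem_Bridge": "a bridge fought over in the Battle of Arnhem",
    "http://example.org/Tower_Bridge": "a bridge in London",
}
q = hybrid_query(index, ["Battle", "Arnhem"], "?s a dbo:Bridge .")
```

In a real query the `dbo:` prefix would of course need to be declared; the point here is only how text results and a SPARQL fragment can be combined into one formal interpretation.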

Come over to ESWC and enjoy the talks.

Best regards,

Sherif on behalf of AKSW

Posted in Announcements, Events, Papers

AKSW Colloquium, 03-23-2015, Git Triple Store and From CPU Bring-up to IBM Watson

From CPU Bring-up to IBM Watson by Kay Müller, visiting researcher, IBM Ireland


Working in a corporate environment like IBM offers many opportunities to work on the bleeding edge of research and development. In this presentation, Kay Müller, currently a Software Engineer in the IBM Watson Group, will give a brief overview of some of the projects he has been working on at IBM. These projects range from a CPU bring-up using VHDL to the design and development of a semantic search framework for the IBM Watson system.

Git Triple Store by Natanael Arndt


In a setup of distributed clients or applications with different actors writing to the same knowledge base (KB), three things are needed: synchronization of the distributed copies of the KB, an edit history with provenance information, and management of different versions of the KB in parallel. The aim is to design and construct a triple store backend which records every change on the triple level and enables distributed curation of RDF graphs. This should be achieved by using a distributed revision control system to hold a serialization of the RDF graph. Natanael Arndt will present the paper “R&Wbase: Git for triples” by Miel Vander Sande et al., published at LDOW 2013, as related work. Additionally, he will present his ideas towards a collaboration infrastructure using distributed version control systems for triples.
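A minimal sketch of the underlying idea (hypothetical, not the presented system): if each graph version is serialized as a canonically sorted set of N-Triples lines, then a diff between two commits reduces to set differences on triples, which is exactly what line-based tools like Git can track:

```python
# Sketch: each version of the graph is stored as sorted N-Triples lines, so
# a line-based revision control system can diff it on the triple level.

def canonical(ntriples):
    """Deduplicate and sort so the serialization is stable across commits."""
    return sorted(set(ntriples))

def triple_diff(old, new):
    """Added and removed triples between two versions of the graph."""
    old_set, new_set = set(canonical(old)), set(canonical(new))
    return {"added": sorted(new_set - old_set),
            "removed": sorted(old_set - new_set)}

v1 = ["<ex:a> <ex:knows> <ex:b> ."]
v2 = ["<ex:a> <ex:knows> <ex:b> .", "<ex:a> <ex:knows> <ex:c> ."]
changes = triple_diff(v1, v2)
```

The canonical ordering is the crucial design choice: without it, two semantically identical graphs could serialize differently and produce spurious diffs.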

 

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

 

Posted in Colloquium, Uncategorized

ALIGNED project kick-off

ALIGNED, AKSW’s new H2020-funded project, kicked off in Dublin. The project brings together computer science researchers, companies building data-intensive systems and information technology, and academic curators of large datasets in an effort to build IT systems for aligned, co-evolving software and data lifecycles. These lifecycles will support automated testing, runtime data quality analytics, model-generated extraction and human curation interfaces.

AKSW will lead the data quality engineering part of ALIGNED, controlling the data lifecycle and providing integrity and verification techniques, using state-of-the-art tools such as RDFUnit and upcoming standards like W3C Data Shapes. In this project, we will work with our partners Trinity College Dublin and Oxford Software Engineering as technical partners, Oxford Anthropology and Adam Mickiewicz University Poznan as data curators and publishers, as well as the Semantic Web Company and Wolters Kluwer Germany, who provide enterprise solutions and use cases.

Find out more at aligned-project.eu and by following @AlignedProject on Twitter.

Martin Brümmer on behalf of the NLP2RDF group

Aligned project kick-off team picture

Posted in Announcements, project kick-off

AKSW Colloquium: Tommaso Soru and Martin Brümmer on Monday, March 2 at 3.00 p.m.

On Monday, 2nd of March 2015, Tommaso Soru will present ROCKER, a refinement operator approach for key discovery. Martin Brümmer will then present NIF annotation and provenance – A comparison of approaches.

Tommaso Soru – ROCKER – Abstract

As in the typical entity-relationship model, unique and composite keys are of central importance when the concept is applied to the Linked Data paradigm. They can help in manifold areas, such as entity search, question answering, data integration and link discovery. However, the current state of the art lacks approaches able to scale while relying on a correct definition of a key. We thus present a refinement-operator-based approach dubbed ROCKER, which has been shown to scale to big datasets with respect to run time and memory consumption. ROCKER will be officially introduced at the 24th International Conference on World Wide Web.

Tommaso Soru, Edgard Marx, and Axel-Cyrille Ngonga Ngomo, “ROCKER – A Refinement Operator for Key Discovery”. [PDF]
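To illustrate what key discovery means here (a toy sketch, not ROCKER's actual refinement operator or scoring), a property set is a key if no two resources share the same combination of values, and candidate sets can be explored from single properties upward:

```python
# Toy key discovery: entities map property names to values; a candidate set
# of properties is a key if the value combinations are unique per resource.
from itertools import combinations

def is_key(entities, props):
    """True if no two resources share the same combination of values."""
    seen = set()
    for values in entities.values():
        combo = tuple(values.get(p) for p in props)
        if combo in seen:
            return False
        seen.add(combo)
    return True

def discover_keys(entities, properties, max_size=2):
    """Minimal keys up to max_size properties, smallest candidates first."""
    keys = []
    for size in range(1, max_size + 1):
        for props in combinations(properties, size):
            if any(set(k) <= set(props) for k in keys):
                continue  # superset of a known key: not minimal
            if is_key(entities, props):
                keys.append(props)
    return keys

people = {
    "ex:p1": {"name": "Ann", "born": 1980},
    "ex:p2": {"name": "Ann", "born": 1985},
    "ex:p3": {"name": "Bob", "born": 1980},
}
```

Here neither name nor birth year alone is a key, but the composite of the two is. The naive enumeration above is exponential in the number of properties; pruning that candidate lattice efficiently is precisely where a refinement operator like ROCKER's comes in.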

Martin Brümmer – NIF annotation and provenance: A comparison of approaches – Abstract

The increasing uptake of the NLP Interchange Format (NIF) reveals its shortcomings on a number of levels. One of these is tracking the metadata of annotations represented in NIF – which NLP tool added which annotation, with what confidence, at which point in time, etc.

A number of solutions to this task of annotating annotations expressed as RDF statements have been proposed over the years. The talk will weigh these solutions, namely annotation resources, reification, Open Annotation, quads and singleton properties, with regard to their granularity, ease of implementation and query complexity.

The goal of the talk is presenting and comparing viable alternatives of solving the problem at hand and collecting feedback on how to proceed.

Posted in Announcements, Colloquium

AKSW Colloquium: Edgard Marx and Tommaso Soru on Monday, February 23, 3.00 p.m.

On Monday, 23rd of February 2015, Edgard Marx will introduce Smart, a search engine designed over the Semantic Search paradigm; subsequently, Tommaso Soru will present ROCKER, a refinement operator approach for key discovery.

EDIT: Tommaso Soru’s presentation was moved to March 2nd.

Abstract – Smart

Since the conception of the Web, search engines have played a key role in making content available. However, retrieving the desired information is still significantly challenging. Semantic Search systems are a natural evolution of traditional search engines: they promise more accurate interpretation by understanding the contextual meaning of the user query. In this talk, we will introduce our audience to Smart, a search engine designed around the Semantic Search paradigm. Smart incorporates two of our current approaches to dealing with the problem of information retrieval, as well as a novel interface paradigm. Moreover, we will present some former, as well as more recent, state-of-the-art approaches used by industry – for instance by Yahoo!, Google and Facebook.

Abstract – ROCKER

As in the typical entity-relationship model, unique and composite keys are of central importance when the concept is applied to the Linked Data paradigm. They can help in manifold areas, such as entity search, question answering, data integration and link discovery. However, the current state of the art lacks approaches able to scale while relying on a correct definition of a key. We thus present a refinement-operator-based approach dubbed ROCKER, which has been shown to scale to big datasets with respect to run time and memory consumption. ROCKER will be officially introduced at the 24th International Conference on World Wide Web.

Tommaso Soru, Edgard Marx, and Axel-Cyrille Ngonga Ngomo, “ROCKER – A Refinement Operator for Key Discovery”. [PDF]

Posted in Announcements, Colloquium, LIMES, paper presentation, PhD progress report

Call for Feedback on LIDER Roadmap

The LIDER project is gathering feedback on a roadmap for the use of Linguistic Linked Data for content analytics. We invite you to give feedback in the following ways:

Excerpt from the roadmap

Full document: available here
Summary slides: available here

Content is growing at an impressive, exponential rate. Exabytes of new data are created every single day. In fact, data has recently been referred to as the “oil” of the new economy, where the new economy is understood as “a new way of organizing and managing economic activity based on the new opportunities that the Internet provided for businesses”.

Content analytics, i.e. the ability to process and generate insights from existing content, plays and will continue to play a crucial role for enterprises and organizations that seek to generate value from data, e.g. in order to inform decision and policy making.

As corroborated by many analysts, substantial investments in technology, partnerships and research are required to reach an ecosystem of many players and technological solutions. Such an ecosystem must provide the infrastructure, expertise and human resources that organizations need to deploy content analytics solutions effectively at large scale, in order to generate relevant insights that support policy and decision making, or even to define completely new business models in a data-driven economy.

Assuming that such investments need to be and will be made, this roadmap explores the role that linked data and semantic technologies can and will play in the field of content analytics. It generates a set of recommendations for organizations, funders and researchers on which technologies to invest in, as a basis for prioritizing their R&D investments and for optimizing their mid- and long-term strategies and roadmaps.

Conference Call on 19th of February, 3 p.m. CET

Connection details: https://www.w3.org/community/ld4lt/wiki/Main_Page#LD4LT_calls
Summary slides: available here

Agenda

  1. Introduction to the LIDER Roadmap (Philipp Cimiano, 10 minutes)
  2. Discussion of Global Customer Engagement Use Cases (All, 10 minutes)
  3. Discussion of Public Sector and Civil Society Use Cases (All, 10 minutes)
  4. Discussion of Linked Data Life Cycle and Linguistic Linked Data Value Chain (All, 10 minutes)
  5. General Discussion on further use cases, items in the roadmap etc. (20 minutes)

In addition, the call will briefly discuss progress on the META-SHARE linked data metadata model.

The call is open to the public; no LD4LT group participation is required. Dial-in information is available. Please spread this information widely. No knowledge about linguistic linked data is required. We are especially interested in feedback from potential users of linguistic linked data.

About the LIDER Project

Website: http://lider-project.eu

The project’s mission is to provide the basis for the creation of a Linguistic Linked Data cloud that can support content analytics tasks of unstructured multilingual cross-media content. By achieving this goal, LIDER will impact on the ease and efficiency with which Linguistic Linked Data will be exploited in content analytics processes.

We aim to provide the basis for a new Linked Open Data (LOD) based ecosystem of free, interlinked, and semantically interoperable language resources (corpora, dictionaries, lexical and syntactic metadata, etc.) and media resources (image and video metadata, etc.) that will allow for free and open exploitation of such resources in multilingual, cross-media content analytics across the EU and beyond, with specific use cases in industries related to social media, financial services, localization, and other multimedia content providers and consumers.

Give a personal interview to include your voice in the roadmap

Contact: http://lider-project.eu/?q=content/contact-us

The EU project LIDER has been tasked by the European Commission to put together a roadmap for future R&D funding in multilingual industries such as content and knowledge localization, multilingual terminology and taxonomy management, cross-border business intelligence, etc. If you are a leading supplier of solutions in one or more of these industries, we would appreciate your input for this roadmap. We would like to conduct a short interview with you to establish your views on current and developing R&D efforts in multilingual and semantic technologies that will likely play an increasing role in these industries, such as Linked Data and related standards for web-based, multilingual data processing. The interview will cover the five questions below and will not take more than 30 minutes. Please let us know a suitable time and date.

Posted in Announcements

AKSW Colloquium: Konrad Höffner and Michael Röder on Monday, February 16, 3.00 p.m.

CubeQA—Question Answering on Statistical Linked Data by Konrad Höffner

Abstract

Question answering systems provide intuitive access to data by translating natural language queries into SPARQL, which is the native query language of RDF knowledge bases. Statistical data, however, is structurally very different from other data and cannot be queried using existing approaches. Building upon a question corpus established in previous work, we created a benchmark for evaluating questions on statistical Linked Data in order to evaluate statistical question answering algorithms and to stimulate further research. Furthermore, we designed a question answering algorithm for statistical data, which covers a wide range of question types. To our knowledge, this is the first question answering approach for statistical RDF data and could open up a new research area.
See also the paper (preprint, under review) and the slides.

News from the WSDM 2015 by Michael Röder

Abstract

The WSDM conference is one of the major conferences for web search and data mining. Michael Röder attended this year's WSDM conference in Shanghai and will present a short overview of the conference topics. After that, he will take a closer look at FEL, an entity linking approach for search queries presented at the conference.


Posted in Announcements, Colloquium, PhD topic

Kick-off of the FREME project

Hi all!

A new InfAI project, FREME, kicked off in Berlin. FREME – Open Framework of E-Services for Multilingual and Semantic Enrichment of Digital Content – is an H2020-funded project with the objective of building an open, innovative, commercial-grade framework of e-services for multilingual and semantic enrichment of digital content.

InfAI will play an important role in FREME by driving two of the six central FREME services, e-Link and e-Entity. NIF will be used as a mediator between language services and data sources, serving as the foundation for e-Link, while DBpedia Spotlight will be a prototype for e-Entity services, linking named entities in natural language texts to Linked Open Data sets like DBpedia.

InfAI will also help to identify and publish new Linked Data sets that can contribute to data value chains. Our partners in this open content enrichment effort are DFKI, Tilde, iMinds, Agro-Know, Wripl, VistaTEC and ISBM.

Stay tuned for more info! In the meantime, join the conversation on Twitter: #FREMEH2020.

– Amrapali Zaveri on behalf of the NLP2RDF group

Posted in Announcements, project kick-off

DL-Learner 1.0 (Supervised Structured Machine Learning Framework) Released

Dear all,

we are happy to announce DL-Learner 1.0.

DL-Learner is a framework containing algorithms for supervised machine learning in RDF and OWL. DL-Learner can use various RDF and OWL serialization formats as well as SPARQL endpoints as input, can connect to most popular OWL reasoners and is easily and flexibly configurable. It extends concepts of Inductive Logic Programming and Relational Learning to the Semantic Web in order to allow powerful data analysis.

Website: http://dl-learner.org
GitHub page: https://github.com/AKSW/DL-Learner
Download: https://github.com/AKSW/DL-Learner/releases
ChangeLog: http://dl-learner.org/development/changelog/

DL-Learner is used for data analysis in other tools such as ORE and RDFUnit. Technically, it uses refinement-operator-based, pattern-based and evolutionary techniques for learning on structured data. For a practical example, see http://dl-learner.org/community/carcinogenesis/. It also offers a plugin for Protégé, which can give suggestions for axioms to add. DL-Learner is part of the Linked Data Stack – a repository for Linked Data management tools.
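To give a flavour of the refinement-operator technique mentioned above (a deliberately tiny toy, not DL-Learner's actual algorithm or API; the class names and instances are made up), class expression learning starts from the most general concept and specializes it step by step, keeping the expression that best separates positive from negative examples:

```python
# Toy downward refinement: "Thing" specializes into atomic classes; each
# candidate expression is scored by how well it separates pos from neg.

def covers(instances, expr, classes):
    """Instances classified under the expression."""
    return {i for i in instances if expr == "Thing" or expr in classes[i]}

def refine(expr, all_classes):
    """Downward refinement: Thing specializes into each atomic class."""
    return list(all_classes) if expr == "Thing" else []

def learn(pos, neg, classes, all_classes):
    """Search the refinement tree for the most accurate class expression."""
    instances = pos | neg
    best_expr, best_acc = "Thing", 0.0
    frontier = ["Thing"]
    while frontier:
        expr = frontier.pop()
        cov = covers(instances, expr, classes)
        acc = (len(cov & pos) + len((instances - cov) & neg)) / len(instances)
        if acc > best_acc:
            best_expr, best_acc = expr, acc
        frontier.extend(refine(expr, all_classes))
    return best_expr, best_acc

classes = {"stefan": {"Male", "Father"}, "markus": {"Male", "Father"},
           "anna": {"Female"}, "heinz": {"Male"}}
expr, acc = learn({"stefan", "markus"}, {"anna", "heinz"},
                  classes, ["Male", "Female", "Father"])
```

DL-Learner's real operators refine into full OWL class expressions (conjunctions, existential restrictions, etc.) and use an OWL reasoner for the coverage check, but the search skeleton is the same.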

We want to thank everyone who helped to create this release, in particular (alphabetically) An Tran, Chris Shellenbarger, Christoph Haase, Daniel Fleischhacker, Didier Cherix, Johanna Völker, Konrad Höffner, Robert Höhndorf, Sebastian Hellmann and Simon Bin. We also acknowledge support by the recently started SAKE project, in which DL-Learner will be applied to event analysis in manufacturing use cases, as well as the GeoKnow and Big Data Europe projects where it is part of the respective platforms.

Kind regards,

Lorenz Bühmann, Jens Lehmann and Patrick Westphal

Posted in Announcements, DL-Learner, major tool release, Software Releases

Writing a Survey – Steps, Advantages, Limitations and Examples

What is a Survey?

A survey, or systematic literature review, is a scholarly text that presents the current knowledge on a particular topic, including substantive findings as well as theoretical and methodological contributions. Literature reviews use secondary sources and do not report new or original experimental work [1].

A systematic review is a literature review focused on a research question, which tries to identify, appraise, select and synthesize all high-quality research evidence and arguments relevant to that question. Moreover, a systematic review is comprehensive, exhaustive and repeatable; that is, readers can replicate or verify the review.

Steps to perform a survey

  • Select two independent reviewers

  • Look for related/existing surveys

    • If one exists, check how long ago it was conducted. If it was, for instance, 10 years ago, you can go ahead and update it.

  • Formulate research questions

  • Devise eligibility criteria

  • Define search strategy – keywords, journals, conferences, workshops to search in

  • Retrieve further potential articles using the search strategy and also by directly contacting top researchers in the field

  • Compare the chosen articles among reviewers and decide on a core set of papers to be included in the survey

  • Perform qualitative and quantitative analyses on the selected set of papers

  • Report on the results
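The reviewer-agreement step above can be sketched as follows (a minimal illustration with made-up paper identifiers): each reviewer screens the candidates independently, the core set is the intersection of their selections, and disagreements go to discussion:

```python
# Toy model of the two-reviewer screening step of a systematic review.

def core_set(reviewer_a, reviewer_b):
    """Articles both reviewers selected, plus disagreements for discussion."""
    a, b = set(reviewer_a), set(reviewer_b)
    return sorted(a & b), sorted(a ^ b)

included, to_discuss = core_set(
    ["paper1", "paper2", "paper3"],
    ["paper2", "paper3", "paper4"],
)
```

Requiring two independent selections before forming the core set is exactly what mitigates the selection bias discussed in the limitations below.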

Advantages of writing a survey

There are several benefits/advantages of conducting a survey, such as:

  • A survey is the best way to get an idea of the state-of-the-art technologies, algorithms, tools etc. in a particular field

  • One can get a clear bird's-eye overview of the current state of that field

  • It can serve as a great starting point for a student or any researcher thinking of venturing into that particular field/area of research

  • One can easily acquire up-to-date information on a subject by referring to a review

  • It gives researchers the opportunity to formalize different concepts of a particular field

  • It allows one to identify challenges and gaps that are unanswered and crucial for that subject

Limitations of a survey

However, there are a few limitations that must be considered before undertaking a survey such as:

  • Surveys can be biased; it is thus necessary to have two researchers who perform the systematic search for articles independently

  • It is quite challenging to unify concepts, especially when there are different ideas referring to the same concepts developed over several years

  • Indeed, conducting a survey and getting the article published is a long process

Surveys conducted by members of the AKSW group

In our group, three students conducted comprehensive literature reviews on three different topics:

  • Linked Data Quality: The survey covers 30 core papers, which focus on providing quality assessment methodologies for Linked Data specifically. A total of 18 data quality dimensions along with their definitions and 69 metrics are provided. Additionally, the survey contributes a comparison of 12 tools, which perform quality assessment of Linked Data [2].

  • Ubiquitous Semantic Applications: The survey presents a thorough analysis of 48 primary studies out of 172 initially retrieved papers. The results consist of a comprehensive set of quality attributes for Ubiquitous Semantic Applications together with corresponding application features suggested for their realization. The quality attributes include aspects such as mobility, usability, heterogeneity, collaboration, customizability and evolvability. The proposed quality attributes facilitate the evaluation of existing approaches and the development of novel, more effective and intuitive Ubiquitous Semantic Applications [3].

  • User interfaces for semantic authoring of textual content: The survey covers a thorough analysis of 31 primary studies out of 175 initially retrieved papers. The results consist of a comprehensive set of quality attributes for SCA systems together with corresponding user interface features suggested for their realization. The quality attributes include aspects such as usability, automation, generalizability, collaboration, customizability and evolvability. The proposed quality attributes and UI features facilitate the evaluation of existing approaches and the development of novel, more effective and intuitive semantic authoring interfaces [4].

Also, here is a presentation on “Systematic Literature Reviews”: http://slidewiki.org/deck/57_systematic-literature-review.

References

[1] Lisa A. Baglione (2012) Writing a Research Paper in Political Science. Thousand Oaks: CQ Press.

[2] Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer (2015), ‘Quality Assessment for Linked Data: A Survey’, Semantic Web Journal. http://www.semantic-web-journal.net/content/quality-assessment-linked-data-survey

[3] Timofey Ermilov, Ali Khalili, and Sören Auer (2014). ‘Ubiquitous Semantic Applications: A Systematic Literature Review’. Int. J. Semant. Web Inf. Syst. 10, 1 (January 2014), 66-99. DOI=10.4018/ijswis.2014010103 http://dx.doi.org/10.4018/ijswis.2014010103

[4] Ali Khalili and Sören Auer (2013). ‘User interfaces for semantic authoring of textual content: A systematic literature review’, Web Semantics: Science, Services and Agents on the World Wide Web, Volume 22, October 2013, Pages 1-18 http://www.sciencedirect.com/science/article/pii/S1570826813000498

Posted in Announcements, Data Quality