AKSW Colloquium, 27-04-2015, Ontotext’s RDF database-as-a-service (DBaaS) via Self-Service Semantic Suite (S4) platform & Knowledge-Based Trust

This colloquium features two talks. First, Marin Dimitrov (Ontotext) presents the Self-Service Semantic Suite (S4) platform, followed by Jörg Unbehauen's report on Google's effort to use factual correctness as a ranking factor.

RDF database-as-a-service (DBaaS) via Self-Service Semantic Suite (S4) platform

In this talk, Marin Dimitrov (Ontotext) will introduce the RDF database-as-a-service (DBaaS) options for managing RDF data in the Cloud via the Self-Service Semantic Suite (S4) platform. With S4, developers and researchers can instantly get access to a fully managed RDF DBaaS, without the need for hardware provisioning, maintenance and operations. Additionally, the S4 platform provides on-demand access to text analytics services for news, social media and life sciences, as well as access to knowledge graphs (DBpedia, Freebase and GeoNames).

The goal of the S4 platform is to make it easy for developers and researchers to develop smart/semantic applications without spending time and effort on infrastructure provisioning and maintenance. Marin will also provide examples of EC-funded research projects (DaPaaS, ProDataMarket and KConnect) that plan to utilise the S4 platform for semantic data management.
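Since the DBaaS is exposed through standard interfaces, a client needs little more than the SPARQL 1.1 protocol. The sketch below builds such a protocol request in Python; the endpoint URL, header names and authentication scheme are illustrative assumptions, not S4's documented API:

```python
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query, api_key=None):
    """Build a SPARQL 1.1 Protocol request: a POST with the query
    form-encoded in the body, asking for JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    if api_key is not None:
        # Hosted services typically require a credential; the header
        # value here is a placeholder, not S4's actual scheme.
        headers["Authorization"] = "ApiKey " + api_key
    return urllib.request.Request(endpoint, data=data, headers=headers)

# Hypothetical usage against a managed endpoint:
# req = build_sparql_request("https://rdf.s4.example.org/sparql",
#                            "SELECT ?s WHERE { ?s a ?type } LIMIT 10")
# with urllib.request.urlopen(req) as resp:
#     results = resp.read()
```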

More information on S4 will be available in [1], [2] and [3].

[1] Marin Dimitrov, Alex Simov and Yavor Petkov. On-demand Text Analytics and Metadata Management with S4. In: Proceedings of the Workshop on Emerging Software as a Service and Analytics (ESaaSA 2015) at the 5th International Conference on Cloud Computing and Services Science (CLOSER 2015), Lisbon, Portugal.

[2] Marin Dimitrov, Alex Simov and Yavor Petkov. Text Analytics and Linked Data Management As-a-Service with S4. In: Proceedings of the 3rd International Workshop on Semantic Web Enterprise Adoption and Best Practice (WaSABi 2015), part of the Extended Semantic Web Conference (ESWC 2015), May 31st 2015, Portoroz, Slovenia.

[3] Marin Dimitrov, Alex Simov and Yavor Petkov. Low-cost Open Data As-a-Service in the Cloud. In: Proceedings of the 2nd Semantic Web Enterprise Developers Workshop (SemDev 2015), part of the Extended Semantic Web Conference (ESWC 2015), May 31st 2015, Portoroz, Slovenia.

Report on: “Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources”

by Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, Wei Zhang

Link to the paper

Presentation by Jörg Unbehauen


“The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy. The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model. We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.”
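The core intuition, scoring a source by the share of correct facts it asserts while inferring fact correctness from source trust, can be illustrated with a toy fixed-point computation. This is a simplification in the spirit of classic fact-finding algorithms, not the paper's multi-layer probabilistic model:

```python
def estimate_trust(claims, iterations=10):
    """claims: dict mapping source -> {fact: asserted_value}.
    Alternately estimate value confidence from source trust and
    source trust from value confidence until they stabilise."""
    trust = {s: 0.8 for s in claims}  # uniform prior trust per source
    for _ in range(iterations):
        # belief in each (fact, value): total trust of sources asserting it
        belief = {}
        for s, facts in claims.items():
            for f, v in facts.items():
                belief[(f, v)] = belief.get((f, v), 0.0) + trust[s]
        # normalise per fact so competing values for a fact sum to 1
        totals = {}
        for (f, v), b in belief.items():
            totals[f] = totals.get(f, 0.0) + b
        conf = {(f, v): b / totals[f] for (f, v), b in belief.items()}
        # a source's trust is the mean confidence of the values it asserts
        trust = {s: sum(conf[(f, v)] for f, v in facts.items()) / len(facts)
                 for s, facts in claims.items()}
    return trust
```

Sources that agree with the consensus end up with higher trust than a source asserting mostly idiosyncratic values.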


Talk by Kleanthi Georgala

Last week on Friday, 17th April, Kleanthi Georgala visited AKSW and gave a talk entitled “Traces Through Time: Probabilistic Record Linkage – Medieval and Early Modern”. More information below.

This innovative, multi-disciplinary project will deliver practical analytical tools to support large-scale exploration of big historical datasets. The project aims to bring together international research experience in the digital humanities, natural language processing, information science, data mining and linked data, with large, complex and diverse ‘big data’ spanning over 500 years of British history.

The project’s technical outputs will be a methodology and supporting toolkit that identify individuals within and across historical datasets, allowing people to be traced through the records and enabling their stories to emerge from the data. The tools will handle the ‘fuzzy’ nature of historical data, including aliases, incomplete information, spelling variations and the errors that are inevitably encountered in official records. The toolkit will be open and configurable, offering the flexibility to formulate and ask interesting questions of the data, exploring it in ways that were not imagined when the records were created. The open approach will create opportunities for further enhancement or re-use and offers the further potential to deliver the outputs as a service, extensible to new datasets as these become available. This brings the vision of finding and linking individuals in new combinations of datasets, from the widest range of historical sources.

The Traces Through Time project was a collaboration between The National Archives in the United Kingdom, the Institute of Historical Research, the University of Brighton and the University of Leiden. This presentation describes the probabilistic record linkage system that was developed for this task by the University of Leiden team and introduces some insightful examples of matches the system was able to find in the Medieval and Early Modern data, as well as a number of experiments on artificial data to test the workings of the system.
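The 'fuzzy' matching at the heart of such a system can be sketched as weighted string similarity over shared record fields. This is a toy stand-in for probabilistic record linkage; the field names and weights are invented for illustration:

```python
from difflib import SequenceMatcher

def field_similarity(a, b):
    """String similarity in [0, 1], tolerant of spelling variation."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a, rec_b, weights):
    """Weighted average of per-field similarities between two records.
    Missing fields count as empty strings, so incomplete records are
    penalised rather than rejected outright."""
    total = sum(weights.values())
    score = sum(w * field_similarity(rec_a.get(f, ""), rec_b.get(f, ""))
                for f, w in weights.items())
    return score / total
```

Pairs scoring above a chosen threshold would then be proposed as candidate identity links for human review.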

Also, you may view her talk here:


SAKE Projekt website goes live

Hi all!

The project website for the BMWi-funded Smart Data Web project “SAKE” is now online at www.sake-projekt.de. It already mentions the first SAKE-related publication by Saleem@AKSW and introduces our partners as well as the industry use cases we are going to tackle. Check it out and spread the word, and don’t forget to follow @SAKE_Projekt on Twitter.

Simon on behalf of the SAKE team


AKSW Colloquium, 20-04-2015, OWL/DL approaches to improve POS tagging

In this colloquium, Markus Ackermann will touch on the ‘linguistic gap’ of recent POS tagging endeavours (as perceived by C. Manning [1]). Building on observations in that paper, potential paths towards more linguistically informed POS tagging are explored:

An alternative to the most widely employed ground truth for the development and evaluation of POS tagging systems for English will be presented ([2]), and the benefits of a DL-based representation of POS tags for a multi-tool tagging approach will be shown ([3]).

Finally, the presenter will give an overview of work in progress that aims to combine an OWL/DL representation of POS tags with a suitable symbolic machine learning tool (DL-Learner, [4]) to improve the performance of a state-of-the-art statistical POS tagger with human-interpretable post-correction rules formulated as OWL/DL expressions.
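One practical benefit of an ontological representation of POS tags is that disagreeing taggers can be reconciled by generalising to the most specific tag class subsuming both outputs (a least common subsumer). A minimal sketch with a hypothetical tag hierarchy follows; the actual work operates on OWL/DL tag ontologies rather than a Python dict:

```python
# Hypothetical fragment of a POS tag hierarchy (subclass -> superclass).
POS_HIERARCHY = {
    "NNP": "NN", "NNS": "NN", "NN": "Noun",
    "VBD": "Verb", "VBZ": "Verb",
    "Noun": "Word", "Verb": "Word",
}

def ancestors(tag):
    """The tag itself followed by its superclasses up to the root."""
    chain = [tag]
    while tag in POS_HIERARCHY:
        tag = POS_HIERARCHY[tag]
        chain.append(tag)
    return chain

def merge_tags(tag_a, tag_b):
    """Most specific class subsuming both tags; the DL analogue is the
    least common subsumer of the two tag classes."""
    b_ancestors = set(ancestors(tag_b))
    for t in ancestors(tag_a):
        if t in b_ancestors:
            return t
    return None
```

Two tools voting "NNP" and "NNS" thus agree on "NN", while "NNP" versus "VBD" yields only the uninformative root.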

[1] Christopher D. Manning. 2011. Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? In Alexander Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, 12th International Conference, CICLing 2011, Proceedings, Part I. Lecture Notes in Computer Science 6608, pp. 171–189.

[2] G.R. Sampson. 1995. English for the Computer: The SUSANNE Corpus and Analytic Scheme. Clarendon Press (Oxford University Press).

[3] Christian Chiarcos. 2010. Towards Robust Multi-Tool Tagging: An OWL/DL-Based Approach. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL2010.

[4] Jens Lehmann. 2009. DL-Learner: Learning Concepts in Description Logics. In The Journal of Machine Learning Research, Volume 10, pp. 2639-2642.


AKSW Colloquium, 13-04-2015, Effective Caching Techniques for Accelerating Pattern Matching Queries

In this colloquium, Claus Stadler will present the paper Effective Caching Techniques for Accelerating Pattern Matching Queries by Arash Fard, Satya Manda, Lakshmish Ramaswamy, and John A. Miller.

Abstract: Using caching techniques to improve response time of queries is a proven approach in many contexts. However, it is not well explored for subgraph pattern matching queries, mainly because of subtleties enforced by traditional pattern matching models. Indeed, efficient caching can greatly impact the query answering performance for massive graphs in any query engine, whether it is centralized or distributed. This paper investigates the capabilities of the newly introduced pattern matching models in the graph simulation family for this purpose. We propose a novel caching technique, and show how the results of a query can be used to answer new similar queries according to the similarity measure that is introduced. Using large real-world graphs, we experimentally verify the efficiency of the proposed technique in answering subgraph pattern matching queries.
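The flavour of the approach can be conveyed with a much simpler sketch: cache results under a canonical form of the query pattern, so that queries identical up to variable renaming reuse each other's results. The paper's similarity measure over graph-simulation models is far more general; this shows only the minimal idea:

```python
import re
from collections import OrderedDict

def canonical(pattern):
    """Normalise variable names so patterns equal up to renaming share
    a cache key, e.g. '?x knows ?y' and '?a knows ?b'."""
    mapping = {}
    def repl(m):
        var = m.group(0)
        mapping.setdefault(var, "?v%d" % len(mapping))
        return mapping[var]
    return re.sub(r"\?\w+", repl, pattern)

class PatternCache:
    """Tiny LRU cache keyed by canonicalised pattern."""
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.store = OrderedDict()
    def get(self, pattern):
        key = canonical(pattern)
        if key in self.store:
            self.store.move_to_end(key)  # mark as recently used
            return self.store[key]
        return None
    def put(self, pattern, results):
        key = canonical(pattern)
        self.store[key] = results
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
```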

Link to PDF


Special talk: Linked Data Quality Assessment and its Application to Societal Progress Measurement

Linked Data Quality Assessment and its Application to Societal Progress Measurement

Abstract: In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration where both documents and data are linked. Many different communities on the Internet, such as geographic, media, life sciences and government, have already adopted these LD principles. In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, thus making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is particularly a challenge in LD as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing the quality crucial to measure the accuracy of representing the real-world data. In this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. Next, three different methodologies for Linked Data quality assessment are evaluated, namely (i) user-driven, (ii) crowdsourcing and (iii) semi-automated use-case-driven. Finally, we take into account a domain-specific use case that consumes LD and depends on its quality. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory use case aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
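A quality metric in this sense is a function from a dataset to a score in [0, 1]. As an illustrative sketch (not one of the thesis's 69 metrics verbatim), a completeness metric can measure which fraction of the instances of a class carry a required property:

```python
def property_completeness(triples, cls, prop):
    """Completeness sketch: fraction of instances of `cls` that have
    at least one value for `prop`.
    triples: iterable of (subject, predicate, object) strings."""
    instances = {s for s, p, o in triples
                 if p == "rdf:type" and o == cls}
    if not instances:
        return 1.0  # no instances: vacuously complete
    covered = {s for s, p, o in triples
               if p == prop and s in instances}
    return len(covered) / len(instances)
```

Publishing such scores alongside a dataset is one way of making quality information explicit for consumers.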

Join us!

  • Thursday, 9 April at 2pm, Room P702


Two AKSW Papers at ESWC 2015

We are very pleased to announce that two of our papers were accepted for presentation as full research papers at ESWC 2015.

Automating RDF Dataset Transformation and Enrichment (Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens Lehmann)

With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim at supporting this endeavour usually focus on supporting the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against eight manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples.

HAWK – Hybrid Question Answering using Linked Data (Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, and Christina Unger)

The decentral architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Hence, answering complex questions often requires combining information from structured and unstructured data sources. We present HAWK, a novel entity search approach for Hybrid Question Answering based on combining Linked Data and textual data. The approach uses predicate-argument representations of questions to derive equivalent combinations of SPARQL query fragments and text queries. These are executed so as to integrate the results of the text queries into SPARQL and thus generate a formal interpretation of the query. We present a thorough evaluation of the framework, including an analysis of the influence of entity annotation tools on the generation process of the hybrid queries and a study of the overall accuracy of the system. Our results show that HAWK achieves an F-measure of 0.68 on the training and 0.61 on the test portion of the Question Answering over Linked Data (QALD-4) hybrid query benchmark.
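The hybrid idea of intersecting candidates from structured data with hits from a text query can be sketched in a few lines. This is a toy stand-in, not HAWK's actual pipeline of predicate-argument analysis and SPARQL generation; entity and document identifiers are invented:

```python
def hybrid_answer(structured_candidates, documents, text_term):
    """Keep structured candidates whose associated document matches the
    text query: a minimal intersection of a 'SPARQL fragment' result
    set with a text-query result set.
    structured_candidates: list of (entity, doc_id) pairs.
    documents: dict mapping doc_id -> text."""
    text_hits = {doc_id for doc_id, text in documents.items()
                 if text_term.lower() in text.lower()}
    return [entity for entity, doc_id in structured_candidates
            if doc_id in text_hits]
```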

Come over to ESWC and enjoy the talks.

Best regards,

Sherif on behalf of AKSW


AKSW Colloquium, 23-03-2015, Git Triple Store and From CPU Bring-up to IBM Watson

From CPU bring-up to IBM Watson by Kay Müller, visiting researcher, IBM Ireland


Working in a corporate environment like IBM offers many different opportunities to work on the bleeding edge of research and development. In this presentation, Kay Müller, who is currently a Software Engineer in the IBM Watson Group, is going to give a brief overview of some of the projects he has been working on in IBM. These projects range from a CPU bring-up using VHDL to the design and development of a semantic search framework for the IBM Watson system.

Git Triple Store by Natanael Arndt


In a setup of distributed clients or applications with different actors writing to the same knowledge base (KB), three things are needed: synchronization of distributed copies of the KB, an edit history with provenance information, and management of different versions of the KB in parallel. The aim is to design and build a triple store back end which records every change on triple level and enables distributed curation of RDF graphs. This should be achieved by using a distributed revision control system to hold a serialization of the RDF graph. Natanael Arndt will present the paper “R&Wbase: Git for triples” by Miel Vander Sande et al., published at LDOW 2013, as related work. Additionally, he will present his ideas towards a collaboration infrastructure using distributed version control systems for triples.
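The central trick of backing a triple store with a version control system is canonical serialisation: one triple per line in sorted order, so that a line-based diff shows exactly the added and removed triples. A minimal sketch follows (the file name and layout are assumptions; R&Wbase itself stores deltas inside a quad store rather than in plain files):

```python
import subprocess
from pathlib import Path

def canonical_ntriples(triples):
    """Serialise triples one per line in sorted order, so a line-based
    version control system sees exactly the changed triples."""
    lines = sorted("%s %s %s ." % (s, p, o) for s, p, o in triples)
    return "\n".join(lines) + "\n"

def commit_graph(repo_dir, triples, message):
    """Write the graph into a git working copy and record a revision.
    Assumes `repo_dir` is an already-initialised git repository."""
    path = Path(repo_dir) / "graph.nt"
    path.write_text(canonical_ntriples(triples), encoding="utf-8")
    subprocess.run(["git", "-C", repo_dir, "add", "graph.nt"], check=True)
    subprocess.run(["git", "-C", repo_dir, "commit", "-m", message],
                   check=True)
```

With this layout, `git log -p graph.nt` yields a triple-level edit history, and git's existing merge machinery handles concurrent curation.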


About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.



ALIGNED project kick-off

ALIGNED, AKSW’s new H2020-funded project, kicked off in Dublin. The project brings together computer science researchers, companies building data-intensive systems and information technology, and academic curators of large datasets in an effort to build IT systems for aligned, co-evolving software and data lifecycles. These lifecycles will support automated testing, runtime data quality analytics, model-generated extraction and human curation interfaces.

AKSW will lead the data quality engineering part of ALIGNED, controlling the data lifecycle and providing integrity and verification techniques, using state-of-the-art tools such as RDFUnit and upcoming standards like W3C Data Shapes. In this project, we will work with Trinity College Dublin and Oxford Software Engineering as technical partners, Oxford Anthropology and Adam Mickiewicz University Poznan as data curators and publishers, as well as the Semantic Web Company and Wolters Kluwer Germany, who provide enterprise solutions and use cases.

Find out more at aligned-project.eu and by following @AlignedProject on Twitter.

Martin Brümmer on behalf of the NLP2RDF group



AKSW Colloquium: Tommaso Soru and Martin Brümmer on Monday, March 2 at 3.00 p.m.

On Monday, 2nd of March 2015, Tommaso Soru will present ROCKER, a refinement operator approach for key discovery. Martin Brümmer will then present NIF annotation and provenance – A comparison of approaches.

Tommaso Soru – ROCKER – Abstract

As within the typical entity-relationship model, unique and composite keys are of central importance also when the concept is applied to the Linked Data paradigm. They can help in manifold areas, such as entity search, question answering, data integration and link discovery. However, the current state of the art lacks approaches that scale while relying on a correct definition of a key. We thus present a refinement-operator-based approach dubbed ROCKER, which has been shown to scale to big datasets with respect to run time and memory consumption. ROCKER will be officially introduced at the 24th International Conference on World Wide Web.
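The notion of a key being refined here is easy to state: a set of properties is a key if no two resources share the same value combination, and one searches for minimal such sets. The naive bottom-up search below illustrates only this notion; ROCKER's refinement operator prunes the candidate lattice far more aggressively:

```python
from itertools import combinations

def is_key(records, props):
    """True iff the value combination of `props` is unique per entity.
    records: dict mapping entity -> dict of property values."""
    seen = set()
    for values in records.values():
        fingerprint = tuple(values.get(p) for p in props)
        if fingerprint in seen:
            return False
        seen.add(fingerprint)
    return True

def minimal_keys(records, properties, max_size=2):
    """Naive bottom-up enumeration of minimal keys up to `max_size`."""
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(properties, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # supersets of a known key are not minimal
            if is_key(records, combo):
                keys.append(combo)
    return keys
```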

Tommaso Soru, Edgard Marx, and Axel-Cyrille Ngonga Ngomo, “ROCKER – A Refinement Operator for Key Discovery”. [PDF]

Martin Brümmer – Abstract – NIF annotation and provenance – A comparison of approaches

The increasing uptake of the NLP Interchange Format (NIF) reveals its shortcomings on a number of levels. One of these is tracking the metadata of annotations represented in NIF: which NLP tool added which annotation, with what confidence, at which point in time, and so on.

A number of solutions to this task of annotating annotations expressed as RDF statements have been proposed over the years. The talk will weigh these solutions, namely annotation resources, reification, Open Annotation, quads and singleton properties, with regard to their granularity, ease of implementation and query complexity.

The goal of the talk is presenting and comparing viable alternatives of solving the problem at hand and collecting feedback on how to proceed.
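Two of the candidate representations can be contrasted by the number of triples they cost per annotated statement. The sketch below counts triples for classic reification versus singleton properties; the prefixed names are illustrative strings, not resolvable IRIs:

```python
def reify(s, p, o, stmt_id, meta):
    """Classic RDF reification: four triples describe the statement,
    plus one per metadata item. Verbose but universally supported."""
    triples = [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]
    triples += [(stmt_id, k, v) for k, v in meta.items()]
    return triples

def singleton_property(s, p, o, singleton_id, meta):
    """Singleton-property style: the data triple uses a unique predicate
    instance, and metadata is attached to that predicate. Fewer triples,
    but queries must follow rdf:singletonPropertyOf."""
    triples = [
        (s, singleton_id, o),
        (singleton_id, "rdf:singletonPropertyOf", p),
    ]
    triples += [(singleton_id, k, v) for k, v in meta.items()]
    return triples
```

For one metadata item, reification costs five triples where the singleton property costs three; the trade-off against query complexity is exactly what the talk compares.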
