DL-Learner 1.1 (Supervised Structured Machine Learning Framework) Released

Dear all,

We are happy to announce DL-Learner 1.1.

DL-Learner is a framework containing algorithms for supervised machine learning in RDF and OWL. DL-Learner can use various RDF and OWL serialization formats as well as SPARQL endpoints as input, can connect to most popular OWL reasoners and is easily and flexibly configurable. It extends concepts of Inductive Logic Programming and Relational Learning to the Semantic Web in order to allow powerful data analysis.

Website: http://dl-learner.org
GitHub page: https://github.com/AKSW/DL-Learner
Download: https://github.com/AKSW/DL-Learner/releases
ChangeLog: http://dl-learner.org/development/changelog/

DL-Learner is used for data analysis in other tools such as ORE and RDFUnit. Technically, it uses refinement-operator-based, pattern-based and evolutionary techniques for learning on structured data. For a practical example, see http://dl-learner.org/community/carcinogenesis/. It also offers a plugin for Protégé, which can suggest axioms to add. DL-Learner is part of the Linked Data Stack – a repository for Linked Data management tools.

In the current release, we improved the support for SPARQL endpoints as knowledge sources. You can now directly use a SPARQL endpoint for learning without an OWL reasoner on top of it. Moreover, we extended DL-Learner to also consider dates and inverse properties for learning. Further efforts were made to improve our Query Tree Learning algorithms (those are used to learn SPARQL queries rather than OWL class expressions).

We want to thank everyone who helped to create this release, in particular Robert Höhndorf and Giuseppe Rizzo. We also acknowledge support by the recently started SAKE project, in which DL-Learner will be applied to event analysis in manufacturing use cases, as well as the GeoKnow and Big Data Europe projects where it is part of the respective platforms.

Kind regards,

Lorenz Bühmann, Jens Lehmann, Patrick Westphal and Simon Bin

Posted in Announcements, DL-Learner, Software Releases

AKSW Colloquium, 20-07-2015, Enterprise Linked Data Networks

Enterprise Linked Data Networks (PhD progress report) by Marvin Frommhold

The topic of the thesis is the scientific utilization of the LUCID research project, in particular the LUCID Endpoint Prototype. In LUCID we research and develop Linked Data technologies in order to allow partners in supply chains to describe their work, their companies and their products for other participants. This allows for building distributed networks of supply chain partners on the Web without a centralized infrastructure.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, LUCID, PHD progress report

AKSW Colloquium, 13-07-2015

Philipp Frischmuth will give a brief presentation regarding the current state of his PhD thesis and Lukas Eipert will present the topic of his upcoming internship:

As part of an internship at eccenca, a configurable graphical RDF editor will be developed. Graphical components such as shapes and arrows will be translated to triples depending on the configuration of the editor. This talk outlines the idea and motivation of this task.
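
The translation from graphical components to triples could look roughly like the following sketch. Everything in it is a hypothetical illustration, not the planned editor: the function name, the dictionary layout for shapes and arrows, and the configuration keys `shape_classes` and `arrow_properties` are all assumptions made for the example.

```python
# Hypothetical sketch: translate shapes and arrows into RDF-style triples
# according to an editor configuration. All names here are assumptions.

def diagram_to_triples(shapes, arrows, config):
    """Return a list of (subject, predicate, object) tuples."""
    triples = []
    # Each shape becomes a typed resource; the class depends on the shape kind.
    for shape in shapes.values():
        rdf_class = config["shape_classes"][shape["kind"]]
        triples.append((shape["uri"], "rdf:type", rdf_class))
    # Each arrow becomes one triple; the property depends on the arrow style.
    for arrow in arrows:
        predicate = config["arrow_properties"][arrow["style"]]
        triples.append((shapes[arrow["from"]]["uri"],
                        predicate,
                        shapes[arrow["to"]]["uri"]))
    return triples


config = {
    "shape_classes": {"ellipse": "foaf:Person", "box": "foaf:Organization"},
    "arrow_properties": {"solid": "foaf:member"},
}
shapes = {
    "s1": {"kind": "box", "uri": "ex:acme"},
    "s2": {"kind": "ellipse", "uri": "ex:alice"},
}
arrows = [{"from": "s1", "to": "s2", "style": "solid"}]

triples = diagram_to_triples(shapes, arrows, config)
```

Changing the configuration changes the vocabulary of the output without touching the drawing itself, which is the sense in which such an editor would be configurable.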

Posted in Colloquium, invited talk, PHD progress report

AKSW Colloquium, 22-06-2015, Concept Expansion Using Web Tables, Mining entities from the Web, Linked Data Stack

Concept Expansion Using Web Tables by Chi Wang, Kaushik Chakrabarti, Yeye He, Kris Ganjam, Zhimin Chen, Philip A. Bernstein (WWW 2015), presented by Ivan Ermilov:

Abstract. We study the following problem: given the name of an ad-hoc concept as well as a few seed entities belonging to the concept, output all entities belonging to it. Since producing the exact set of entities is hard, we focus on returning a ranked list of entities. Previous approaches either use seed entities as the only input, or inherently require negative examples. They suffer from input ambiguity and semantic drift, or are not viable options for ad-hoc tail concepts. In this paper, we propose to leverage the millions of tables on the web for this problem. The core technical challenge is to identify the “exclusive” tables for a concept to prevent semantic drift; existing holistic ranking techniques like personalized PageRank are inadequate for this purpose. We develop novel probabilistic ranking methods that can model a new type of table-entity relationship. Experiments with real-life concepts show that our proposed solution is significantly more effective than applying state-of-the-art set expansion or holistic ranking techniques.

Mining entities from the Web by Anna Lisa Gentile

This talk explores the task of mining entities and their describing attributes from the Web. The focus is on entity-centric websites, i.e. domain-specific websites containing a description page for each entity. The task of extracting information from this kind of website is usually referred to as Wrapper Induction. We propose a simple knowledge-based method which (i) is highly flexible with respect to different domains and (ii) does not require any training material, but exploits Linked Data as a background knowledge source to build essential learning resources. Linked Data – an imprecise, redundant and large-scale knowledge resource – proved useful to support this Information Extraction task: for domains that are covered, Linked Data serves as a powerful knowledge resource for gathering learning seeds. Experiments on a publicly available dataset demonstrate that, under certain conditions, this simple approach based on distant supervision can achieve competitive results against some complex state-of-the-art approaches that always depend on training data.

Linked Data Stack by Martin Röbert

Martin will present the packaging infrastructure developed for the Linked Data Stack project, which will be followed by a discussion about the future of the project.

Posted in Colloquium, invited talk

AKSW Colloquium, 15-06-2015, Caching for Link Discovery

Using Caching for Local Link Discovery on Large Data Sets [PDF]
by Mofeed Hassan

Engineering the Data Web in the Big Data era demands the development of time- and space-efficient solutions for covering the lifecycle of Linked Data. As shown in previous works, using pure in-memory solutions is doomed to failure as the size of datasets grows continuously with time. In this work, presented by Mofeed Hassan, a study is performed on caching solutions for one of the central tasks on the Data Web, i.e., the discovery of links between resources. To this end, 6 different caching approaches were evaluated on real data using different settings. Our results show that while existing caching approaches already allow performing Link Discovery on large datasets from local resources, the achieved cache hits are still poor. Hence, we suggest the need for dedicated solutions to this problem for tackling the upcoming challenges pertaining to the edification of a semantic Web.
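
To make the notion of cache hits concrete, here is a minimal sketch of one common caching policy, least recently used (LRU), with hit and miss counters. It is not taken from the paper: the abstract does not name the six approaches that were evaluated, and `load_resource` is a hypothetical stand-in for reading a resource from local storage.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache that tracks hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value for key, calling load(key) on a miss."""
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = load(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict least recently used
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def load_resource(uri):
    # Hypothetical placeholder for an expensive read from disk.
    return {"uri": uri}


cache = LRUCache(capacity=2)
for uri in ["a", "b", "a", "c", "b", "a"]:
    cache.get(uri, load_resource)
```

With a capacity of two and this access pattern, only the second request for "a" is served from the cache. Access orders of this kind, where a resource is evicted shortly before it is needed again, are one way a general-purpose policy can end up with few hits.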

Posted in Colloquium, LIMES, Papers

AKSW Colloquium, 08-06-2015, DBpediaSameAs, Dynamic-LOD

DBpediaSameAs: An approach to tackling heterogeneity in DBpedia identifiers by Andre Valdestilhas

This work addresses heterogeneity in DBpedia identifiers. While searching for co-references between different data sets, we found several transient, redundant owl:sameAs occurrences among DBpedia identifiers.

To solve this problem, the work makes 3 contributions: (1) a DBpedia Unique Identifier, which normalizes owl:sameAs occurrences by providing a single DBpedia identifier instead of several transient, redundant owl:sameAs occurrences; (2) rating and suggesting links, in order to improve quality and to make statistical data about the links available; and (3) a performance gain: the physical size of the triples decreased from 16.2 GB to 6 GB, and it becomes possible to perform normalization and create an index.
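
One way to picture the normalization step is a union-find structure that groups identifiers connected by owl:sameAs and maps each group to a single representative. This is only a sketch under stated assumptions: the abstract does not describe how the unique identifier is actually chosen, so picking the lexicographically smallest URI is an arbitrary rule used for illustration, and the function name is hypothetical.

```python
def canonical_ids(same_as_pairs):
    """Map every identifier to one representative of its owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the smallest URI becomes the group representative.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {x: find(x) for x in parent}


pairs = [
    ("dbr:B", "dbr:A"),
    ("dbr:C", "dbr:B"),
    ("dbr:X", "dbr:Y"),
]
mapping = canonical_ids(pairs)
```

Storing one representative per group instead of every pairwise owl:sameAs statement is the kind of normalization that can shrink a link set considerably.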

The usability of the interface was evaluated using a standard usability questionnaire. The positive results from all of our interviewed participants showed that the DBpediaSameAs property is easy to use and can thus lead to novel insights.

As a proof of concept, an implementation is provided as a web system, including a web service and a graphical user interface.

Dynamic-LOD: An approach to count links using Bloom filters by Ciro Baron

The Web of Linked Data is growing and it becomes increasingly necessary to discover the relationship between different datasets.

Ciro Baron will present an approach to link counting that uses Bloom filters (BF) to compare datasets and approximately count the links between them, addressing the lack of up-to-date metadata about linksets. The paper, which compares performance against classical approaches such as binary search trees (BST) and hash tables (HT), is available at http://svn.aksw.org/papers/2015/ISWC_DynLOD/public.pdf. The results show that the Bloom filter is 12x more memory-efficient while offering adequate query speed.
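
The idea of counting links with a Bloom filter can be sketched as follows. This is an illustrative toy, not the Dynamic-LOD implementation: the class, the double-hashing scheme based on SHA-256, the filter parameters and the example URIs are all assumptions. The subjects of a target dataset are added to the filter, and every object URI of a source dataset that tests positive is counted as a link.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (no false negatives)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def count_links(source_objects, target_filter):
    """Approximate number of source objects that occur in the target."""
    return sum(1 for uri in source_objects if uri in target_filter)


# Hypothetical example data.
target_subjects = [
    "http://example.org/target/1",
    "http://example.org/target/2",
    "http://example.org/target/3",
]
target_filter = BloomFilter(num_bits=10_000, num_hashes=4)
for uri in target_subjects:
    target_filter.add(uri)

source_objects = [
    "http://example.org/target/1",
    "http://example.org/target/3",
    "http://example.org/elsewhere/9",
]
approx_links = count_links(source_objects, target_filter)
```

Because a Bloom filter never reports a stored item as absent, the count can only overestimate the true number of links, and the filter stores bits rather than the URIs themselves, which is where the memory saving comes from.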

In addition, Ciro will show a small cloud generated for all English DBpedia datasets and vocabularies available in Linked Open Vocabularies (LOV).

We evaluated Dynamic-LOD in three respects: first, by analyzing data structure performance, comparing BF with HS and BST; second, by a quantitative evaluation of false positives and link-counting speed in a dense scenario such as DBpedia; and third, on a large scale based on lod-cloud distributions. All three evaluations indicate that BF is a good choice for what our work proposes.

Posted in Colloquium, Data Quality, dbpedia, Uncategorized

Smart Data Web project kick-off

Smart Data Web, a new BMWi-funded project, kicked off in Berlin. The central goal of Smart Data Web is to leverage state-of-the-art data extraction and enrichment technologies as well as Linked Data to create value-added systems for German industry. Knowledge relevant to decision-making processes will be extracted from government and industry data, official web pages and social media, analyzed using NLP and integrated into knowledge graphs. These graphs will be accessible to focus industries via dashboards and APIs, as well as to the public via Linked Data. Special attention will be given to legal questions, such as data licensing as well as data security and privacy.

AKSW, representing the University of Leipzig in this project, will develop the German Knowledge Graph, the central aggregation and integration interface of Smart Data Web. Unlike most current Linked Data knowledge bases, the German Knowledge Graph will focus on industry-relevant data. The graph will be developed in an iterative extraction, integration and interlinking process, building on proven technologies of the Linked Data Stack. Data quality and persistence are a special priority of the German Knowledge Graph since consistency has to be guaranteed at all times. RDFUnit is our tool of choice to accomplish this task.

Smart Data Web will contribute significantly to overcoming the barriers that hinder the integration of Semantic Web technologies, Web 2.0 data and data analysis for commercial application. Our partners in this project will be Beuth University of Applied Sciences, DFKI, Siemens, uberMetrics and VICO Research.

Find out more at smartdataweb.de.

Martin Brümmer on behalf of the NLP2RDF group

Posted in Announcements, project kick-off

AKSW Colloquium, 01-06-2015, MEX – Publishing ML Experiment Results, Scaling DL-Learner – Status and Plans

MEX – Publishing ML Experiment Results by Diego Esteves

Over the decades, many machine learning experiments have been published, contributing to the progress of the scientific community. One key factor in comparing machine learning experiment results with each other and collaborating effectively is to perform them thoroughly in the same computing environment, using the same sample data sets and algorithm configurations. Moreover, practical experience shows that scientists and engineers tend to produce large amounts of output data from their experiments, which, in the end, is difficult both to analyze without provenance metadata and to archive properly. Despite efforts to publish and manage these variables accordingly, a knowledge gap remains: there is no public ontology for machine learning experiments that would make published results interoperable. To narrow this gap, we introduce the novel MEX Ontology, built on the W3C PROV Ontology (PROV-O) and following the nanopublication principles.

Scaling DL-Learner – Status and Plans by Simon Bin

With the advent of Big Data and Large Scale Data Processing, new challenges are also approaching DL-Learner. The framework for supervised Machine Learning on OWL and RDF may benefit from various approaches to make it work better with huge data. In this talk, first experimental results of using SPARQL instead of traditional OWL reasoner approaches will be shown and possible future directions for scaling DL-Learner will be sketched.

Posted in Colloquium

AKSW Colloquium, 18-05-2015, Multilingual Morpheme Ontology, Personalised Access and Enrichment of Linked Data Resources

MMoOn – A Multilingual Morpheme Ontology by Bettina Klimek

In recent years, lexical resources have emerged rapidly in the Semantic Web. Whereas most of the linguistic information is already machine-readable, we found that morphological information is either absent or only contained in semi-structured strings. While a plethora of linguistic resources for the lexical domain already exist and are highly reused, there is still a great gap for equivalent morphological datasets and ontologies. In order to enable the capturing of the semantics of expressions beneath the word level, I will present a Multilingual Morpheme Ontology called MMoOn. It is designed for the creation of machine-processable and interoperable morpheme inventories of a given natural language. As such, any MMoOn dataset will contain not only semantic information about whole words and word forms but also information on the meaningful parts of which they consist, including inflectional and derivational affixes, stems and bases as well as a wide range of their underlying meanings.

Personalised Access and Enrichment of Linked Data Resources by Milan Dojchinovski

Recent efforts in the Semantic Web community have been primarily focused on developing technical infrastructure and methods for efficient Linked Data acquisition, interlinking and publishing. Nevertheless, actually accessing a piece of information in the LOD cloud still demands a significant amount of effort. In recent years, we have conducted two lines of research to address this problem. The first line of research aims at developing graph-based methods for “personalised access to Linked Data”. A key contribution of this research is the “Linked Web APIs” dataset, the largest Web services dataset with over 11K service descriptions, which has been used as a validation dataset. The second line of research has aimed at the enrichment of Linked Data text resources and the development of “entity recognition and linking” methods. In the talk, I will present the developed methods, the results from the evaluation on different datasets and evaluation challenges, and the lessons learned in these activities. I will discuss the adaptability and performance of the developed methods and present future directions.

Posted in Colloquium

AKSW Colloquium, 11-05-2015, DBpedia distributed extraction framework

Scaling up the DBpedia extraction framework by Nilesh Chakraborty

The DBpedia extraction framework extracts different kinds of structured information from Wikipedia to generate various datasets. Performing a full extraction of Wikipedia dumps of all languages (or even just the mapping-based languages) takes a significant amount of time. The distributed extraction framework runs the extraction on top of Apache Spark so that users can leverage multi-core machines or a distributed cluster of commodity machines to perform faster extraction. For example, performing extraction of the 30-40 mapping-based languages on a machine with a quad-core CPU and 16 GB of RAM takes about 36 hours. Running the distributed framework in the same setting using three such worker nodes takes around 10 hours. It’s easy to achieve faster running times by adding more cores or more machines. Apart from the Spark-based extraction framework, we have also implemented a distributed wiki-dump downloader to download Wikipedia dumps for multiple languages, from multiple mirrors, on a cluster in parallel. This is still a work in progress, and in this talk I will discuss the methods and challenges involved in this project, and our immediate goals and timeline.

Posted in Uncategorized