AKSW Colloquium, 17.10.2016, Version Control for RDF Triple Stores + NEED4Tweet

In the upcoming Colloquium, October the 17th at 3 PM, two papers will be presented:

Version Control for RDF Triple Stores

Marvin Frommhold will discuss the paper “Version Control for RDF Triple Stores” by Steve Cassidy and James Ballantine which forms the foundation of his own work regarding versioning for RDF.

Abstract:  RDF, the core data format for the Semantic Web, is increasingly being deployed both from automated sources and via human authoring either directly or through tools that generate RDF output. As individuals build up large amounts of RDF data and as groups begin to collaborate on authoring knowledge stores in RDF, the need for some kind of version management becomes apparent. While there are many version control systems available for program source code and even for XML data, the use of version control for RDF data is not a widely explored area. This paper examines an existing version control system for program source code, Darcs, which is grounded in a semi-formal theory of patches, and proposes an adaptation to directly manage versions of an RDF triple store.
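The patch-based view the paper borrows from Darcs can be illustrated with a small, hedged sketch (our own illustration, not the authors' implementation): a patch on an RDF graph is a pair of triple sets to add and to delete, and inverting a patch simply swaps the two.

```python
# Illustrative sketch of a Darcs-style patch for an RDF graph modeled as a
# set of (subject, predicate, object) triples. Names are ours, not the paper's.

class Patch:
    def __init__(self, additions, deletions):
        self.additions = frozenset(additions)
        self.deletions = frozenset(deletions)

    def apply(self, graph):
        """Apply the patch to a graph (a set of triples)."""
        return (graph - self.deletions) | self.additions

    def invert(self):
        """The inverse patch undoes this one."""
        return Patch(self.deletions, self.additions)

g0 = {("ex:a", "rdf:type", "ex:Person")}
p = Patch(additions={("ex:a", "ex:name", '"Alice"')}, deletions=set())
g1 = p.apply(g0)
assert p.invert().apply(g1) == g0   # applying a patch and its inverse is a no-op
```

This set-based formulation is what makes patches composable and invertible, the property the Darcs patch theory builds on.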

NEED4Tweet: A Twitterbot for Tweets Named Entity Extraction and Disambiguation

Afterwards, Diego Esteves will present the paper “NEED4Tweet: A Twitterbot for Tweets Named Entity Extraction and Disambiguation” by Mena B. Habib and Maurice van Keulen, which was accepted at ACL 2015.

Abstract: In this demo paper, we present NEED4Tweet, a Twitterbot for named entity extraction (NEE) and disambiguation (NED) for Tweets. The straightforward application of state-of-the-art extraction and disambiguation approaches on informal text widely used in Tweets, typically results in significantly degraded performance due to the lack of formal structure; the lack of sufficient context required; and the seldom entities involved. In this paper, we introduce a novel framework that copes with the introduced challenges. We rely on contextual and semantic features more than syntactic features which are less informative. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation | Comments Off on AKSW Colloquium, 17.10.2016, Version Control for RDF Triple Stores + NEED4Tweet

LIMES 1.0.0 Released

Dear all,

the LIMES Dev team is happy to announce LIMES 1.0.0.

LIMES, the Link Discovery Framework for Metric Spaces, is a link discovery framework for the Web of Data. It implements time-efficient approaches for large-scale link discovery based on the characteristics of metric spaces. Our approaches use different approximation techniques to compute estimates of the similarity between instances. These estimates are then used to filter out a large number of instance pairs that do not satisfy the mapping conditions. By these means, LIMES can reduce the number of comparisons needed during the mapping process by several orders of magnitude. The approaches implemented in LIMES include the original LIMES algorithm for edit distances, HR3, HYPPO and ORCHID.
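As a rough illustration of how metric-space properties enable such filtering (a simplified sketch of the general idea, not the actual LIMES algorithm), the triangle inequality yields a cheap lower bound on the distance between two instances via a shared reference point, so many pairs can be discarded without ever computing their true distance:

```python
# Hedged sketch: for a metric d and a reference point r, the triangle
# inequality gives |d(x, r) - d(y, r)| <= d(x, y). Any pair whose reference
# distances differ by more than the threshold theta can therefore be skipped.

def filtered_pairs(xs, ys, d, r, theta):
    dx = {x: d(x, r) for x in xs}   # one distance computation per source instance
    dy = {y: d(y, r) for y in ys}   # one per target instance
    for x in xs:
        for y in ys:
            if abs(dx[x] - dy[y]) > theta:
                continue            # lower bound already exceeds threshold: skip d(x, y)
            if d(x, y) <= theta:
                yield (x, y)

# Toy example with the absolute-difference metric on numbers:
pairs = list(filtered_pairs([1.0, 5.0], [1.2, 9.0], lambda a, b: abs(a - b), 0.0, 0.5))
assert pairs == [(1.0, 1.2)]
```

The real approaches (e.g. HR3) use far tighter space tilings and come with approximation guarantees; the sketch only shows why metric axioms matter for pruning.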

Additionally, LIMES supports the first planning technique for link discovery, HELIOS, which minimizes the overall execution time of a link specification without any loss of completeness. Moreover, LIMES implements supervised and unsupervised machine-learning algorithms for finding accurate link specifications. The algorithms implemented here include the supervised, active and unsupervised versions of EAGLE and WOMBAT.

 

Website: http://aksw.org/Projects/LIMES.html

Download: https://github.com/AKSW/LIMES-dev/releases/tag/1.0.0

GitHub: https://github.com/AKSW/LIMES-dev

User manual: http://aksw.github.io/LIMES-dev/user_manual/

Developer manual: http://aksw.github.io/LIMES-dev/developer_manual/

 

What is new in LIMES 1.0.0:

  • New LIMES GUI
  • New Controller that supports manual and graphical configuration
  • New machine learning pipeline: supports supervised, unsupervised and active learning algorithms
  • New dynamic planning for efficient link discovery
  • Updated execution engine to handle dynamic planning
  • Added support for qualitative (Precision, Recall, F-measure etc.) and quantitative (runtime duration etc.) evaluation metrics for mapping evaluation, in the presence of a gold standard
  • Added support for configuration files in XML and RDF formats
  • Added support for pointsets metrics such as Mean, Hausdorff and Surjection
  • Added support for MongeElkan, RatcliffObershelp string measures
  • Added support for Allen’s algebra temporal relations for event data
  • Added support for all topological relations derived from the DE-9IM model
  • Migrated the codebase to Java 8 and Jena 3.0.1

We would like to thank everyone who helped to create this release. We also acknowledge the support of the SAKE and HOBBIT projects.

Kind regards,

The LIMES Dev team

 

Posted in Uncategorized | Comments Off on LIMES 1.0.0 Released

DL-Learner 1.3 (Supervised Structured Machine Learning Framework) Released

Dear all,

the Smart Data Analytics group at AKSW is happy to announce DL-Learner 1.3.

DL-Learner is a framework containing algorithms for supervised machine learning in RDF and OWL. DL-Learner can use various RDF and OWL serialization formats as well as SPARQL endpoints as input, can connect to most popular OWL reasoners and is easily and flexibly configurable. It extends concepts of Inductive Logic Programming and Relational Learning to the Semantic Web in order to allow powerful data analysis.

Website: http://dl-learner.org
GitHub page: https://github.com/AKSW/DL-Learner
Download: https://github.com/AKSW/DL-Learner/releases
ChangeLog: http://dl-learner.org/development/changelog/

DL-Learner is used for data analysis tasks within other tools such as ORE and RDFUnit. Technically, it uses refinement operator based, pattern-based and evolutionary techniques for learning on structured data. For a practical example, see http://dl-learner.org/community/carcinogenesis/. It also offers a plugin for Protégé, which can give suggestions for axioms to add.
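To give a flavor of refinement-operator-based learning (a toy sketch of the general idea only; DL-Learner's operators work on full OWL class expressions and are considerably more sophisticated), one can greedily specialize a conjunction of named classes until it covers all positive examples and no negative ones:

```python
# Toy sketch of downward refinement: a concept is a conjunction of named
# classes (a frozenset); refining adds one class. All names are illustrative.

def covers(concept, individual_types):
    return concept <= individual_types   # every conjunct holds for the individual

def learn(classes, positives, negatives):
    """positives/negatives map individuals to their sets of class names."""
    concept = frozenset()                # start from Top (the empty conjunction)
    while any(covers(concept, t) for t in negatives.values()):
        # refinements that still cover every positive example
        candidates = [
            concept | {c} for c in classes
            if all(covers(concept | {c}, t) for t in positives.values())
        ]
        if not candidates:
            return None                  # no consistent refinement exists
        # greedily pick the refinement covering the fewest negatives
        best = min(candidates,
                   key=lambda cand: sum(covers(cand, t) for t in negatives.values()))
        if best == concept:
            return None                  # stuck: cannot separate the examples
        concept = best
    return concept

pos = {"anna": {"Person", "Parent"}, "ben": {"Person", "Parent", "Male"}}
neg = {"carl": {"Person", "Male"}}
assert learn({"Person", "Parent", "Male"}, pos, neg) == {"Parent"}
```

DL-Learner additionally handles existential restrictions, negation, noisy examples and open-world reasoning via OWL reasoners, which this sketch deliberately omits.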

In the current release, we added a large number of new algorithms and features. For instance, DL-Learner now supports terminological decision tree learning and integrates the LEAP and EDGE systems as well as the BUNDLE probabilistic OWL reasoner. We migrated the system to Java 8, Jena 3, OWL API 4.2 and Spring 4.3.

We want to thank everyone who helped to create this release; in particular, we want to thank Giuseppe Cota, who visited the core developer team and significantly improved DL-Learner. We also acknowledge support by the SAKE project, in which DL-Learner will be applied to event analysis in manufacturing use cases, as well as the Big Data Europe and HOBBIT projects.

Kind regards,

Lorenz Bühmann, Jens Lehmann, Patrick Westphal and Simon Bin

 

Posted in Uncategorized | Comments Off on DL-Learner 1.3 (Supervised Structured Machine Learning Framework) Released

OntoWiki 1.0.0 released

Dear Semantic Web and Linked Data Community,
we are proud to finally announce the releases of OntoWiki 1.0.0 and the underlying Erfurt Framework in version 1.8.0.
After 10 years of development we’ve decided to release the teenager OntoWiki from the cozy home of 0.x versions.
Since the last release, 0.9.11 in January 2014, we did a lot of testing to stabilize OntoWiki's behavior and accordingly fixed a lot of bugs. We now use PHP Composer for dependency management, have improved the testing workflow, have given the documentation a new structure and home, and have created a neat project landing page.

The development of OntoWiki is completely open source and we are happy about any contribution, especially to the code and the documentation, which is also kept in a Git repository with easy-to-edit Markdown pages. If you have questions about the usage of OntoWiki that go beyond the documentation, you can also use our mailing list or the Stack Overflow tag “ontowiki”.

Please see https://ontowiki.net/ for further information.

We also presented a poster advertising the OntoWiki release at the SEMANTiCS Conference:

OntoWiki 1.0

Philipp Frischmuth, Natanael Arndt, Michael Martin: OntoWiki 1.0: 10 Years of Development – What’s New in OntoWiki

We are happy for your feedback, in the name of the OntoWiki team,
Philipp, Michael and Natanael


Posted in Announcements, LEDS, major tool release, OntoWiki, Software Releases | Comments Off on OntoWiki 1.0.0 released

AKSW Colloquium, 05.09.2016. LOD Cloud Statistics, OpenAccess at Leipzig University.

On the upcoming Monday (05.09.2016), the AKSW group will discuss topics related to the Semantic Web and LOD Cloud statistics. We will also have an invited speaker from Leipzig University Library (UBL), Dr. Astrid Vieler, talking about Open Access at Leipzig University.

LODStats: The Data Web Census Dataset

by Ivan Ermilov et al.
Presented by: Ivan Ermilov

Abstract: Over the past years, the size of the Data Web has increased significantly, which makes obtaining general insights into its growth and structure both more challenging and more desirable. The lack of such insights hinders important data management tasks such as quality, privacy and coverage analysis. In this paper, we present the LODStats dataset, which provides a comprehensive picture of the current state of a significant part of the Data Web. LODStats is based on RDF datasets from the data.gov, publicdata.eu and datahub.io data catalogs and at the time of writing lists over 9 000 RDF datasets. For each RDF dataset, LODStats collects comprehensive statistics and makes these available adhering to the LDSO vocabulary. This analysis has been regularly published and enhanced over the past five years at the public platform lodstats.aksw.org. We give a comprehensive overview over the resulting dataset.

OpenAccess at Leipzig University

Invited talk by Dr. Astrid Vieler from Leipzig University Library (UBL). The talk will be about Open Access in general and the Open Access Policy of our university in particular. She will tell us more about the rights we have toward publishers, and she will give us advice and hints on how we can increase the visibility of our publications.

After the talks, there is more time for discussion in smaller groups as well as coffee and cake. The colloquium starts at 3 p.m. and is located on 7th floor (Leipzig, Augustusplatz 10, Paulinum).

Posted in Announcements, Colloquium, Events, invited talk, paper presentation | Comments Off on AKSW Colloquium, 05.09.2016. LOD Cloud Statistics, OpenAccess at Leipzig University.

AKSW Colloquium, 15th August, 3pm, RDF query relaxation

On the 15th of August at 3 PM, Michael Röder will present the paper “RDF Query Relaxation Strategies Based on Failure Causes” by Fokou et al. in P702.

Abstract

Recent advances in Web-information extraction have led to the creation of several large Knowledge Bases (KBs). Querying these KBs often results in empty answers that do not serve the users’ needs. Relaxation of the failing queries is one of the cooperative techniques used to retrieve alternative results. Most of the previous work on RDF query relaxation compute a set of relaxed queries and execute them in a similarity-based ranking order. Thus, these approaches relax an RDF query without knowing its failure causes (FCs). In this paper, we study the idea of identifying these FCs to speed up the query relaxation process. We propose three relaxation strategies based on various information levels about the FCs of the user query and of its relaxed queries as well. A set of experiments conducted on the LUBM benchmark show the impact of our proposal in comparison with a state-of-the-art algorithm.
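The general idea of query relaxation can be illustrated with a small sketch (our own simplification, not the failure-cause-driven strategies proposed by Fokou et al.): a failing conjunctive query is relaxed by dropping triple patterns, and the relaxed variants are tried in order of similarity to the original query.

```python
# Hedged sketch of similarity-ranked query relaxation: relaxed queries keeping
# more of the original triple patterns are generated (and would be executed)
# first. Pattern tuples below are illustrative, not a real SPARQL engine.

from itertools import combinations

def relaxations(patterns):
    """Yield relaxed queries, most similar (fewest patterns dropped) first."""
    for keep in range(len(patterns) - 1, 0, -1):
        for subset in combinations(patterns, keep):
            yield list(subset)

query = [
    ("?x", "rdf:type", "ex:Professor"),
    ("?x", "ex:teaches", "?c"),
    ("?x", "ex:age", "29"),
]
first = next(relaxations(query))
assert len(first) == 2   # the least-relaxed variant drops a single pattern
```

The paper's point is precisely that this enumerate-and-execute scheme is wasteful: knowing *which* patterns caused the failure lets one skip relaxed queries that are doomed to fail as well.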

The paper is available at ResearchGate.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Uncategorized | Comments Off on AKSW Colloquium, 15th August, 3pm, RDF query relaxation

Article accepted in Journal of Web Semantics

We are happy to announce that the article “DL-Learner – A Framework for Inductive Learning on the Semantic Web” by Lorenz Bühmann, Jens Lehmann and Patrick Westphal was accepted for publication in the Journal of Web Semantics: Science, Services and Agents on the World Wide Web.

Abstract:

In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation. It can be beneficial in various data and schema analysis tasks with applications in different standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning and includes several algorithm implementations, usage examples and has applications building on top of the framework. The article gives an overview of the framework with a focus on algorithms and use cases.

Posted in DL-Learner, Papers, Uncategorized | Comments Off on Article accepted in Journal of Web Semantics

AKSW Colloquium, 18.07.2016, AEGLE and node2vec

On Monday, 18.07.2016, Kleanthi Georgala will give her Colloquium presentation on her paper “An Efficient Approach for the Generation of Allen Relations”, which was accepted at the European Conference on Artificial Intelligence (ECAI) 2016.

Abstract

Event data is increasingly being represented according to the Linked Data principles. The need for large-scale machine learning on data represented in this format has thus led to the need for efficient approaches to compute RDF links between resources based on their temporal properties. Time-efficient approaches for computing links between RDF resources have been developed over the last years. However, dedicated approaches for linking resources based on temporal relations have been paid little attention to. In this paper, we address this research gap by presenting AEGLE, a novel approach for the efficient computation of links between events according to Allen’s interval algebra. We study Allen’s relations and show that we can reduce all thirteen relations to eight simpler relations. We then present an efficient algorithm with a complexity of O(n log n) for computing these eight relations. Our evaluation of the runtime of our algorithms shows that we outperform the state of the art by up to 4 orders of magnitude while maintaining a precision and a recall of 1.
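For readers unfamiliar with Allen's interval algebra, the relation between two intervals can be classified by comparing endpoints, as in the following naive sketch (a pairwise illustration only; AEGLE's contribution is computing such links in O(n log n) rather than comparing all pairs):

```python
# Naive classification of Allen's interval relations for two closed
# intervals (start, end) with start < end. Illustrative only.

def allen(a, b):
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:  return "before"
    if e2 < s1:  return "after"
    if e1 == s2: return "meets"
    if e2 == s1: return "met-by"
    if (s1, e1) == (s2, e2): return "equal"
    if s1 == s2: return "starts" if e1 < e2 else "started-by"
    if e1 == e2: return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2: return "during"
    if s1 < s2 and e2 < e1: return "contains"
    return "overlaps" if s1 < s2 else "overlapped-by"

assert allen((1, 3), (3, 5)) == "meets"
assert allen((2, 4), (1, 6)) == "during"
assert allen((1, 4), (2, 6)) == "overlaps"
```

Each relation is the inverse of another (before/after, meets/met-by, …), which is one reason the thirteen relations can be reduced to a smaller set of primitive comparisons, as the paper exploits.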

Afterwards, Tommaso Soru will present a paper considered to be the latest chapter of the Everything-2-Vec saga, which encompasses outstanding works such as Word2Vec and Doc2Vec. The paper is “node2vec: Scalable Feature Learning for Networks” [PDF] by Aditya Grover and Jure Leskovec, accepted for publication at the International Conference on Knowledge Discovery and Data Mining (KDD), 2016 edition.

Posted in Uncategorized | Comments Off on AKSW Colloquium, 18.07.2016, AEGLE and node2vec

AKSW Colloquium, 04.07.2016. Big Data, Code Quality.

On the upcoming Monday (04.07.2016), the AKSW group will discuss topics related to the Semantic Web and Big Data as well as programming languages and code quality. In particular, the following papers will be presented:

S2RDF: RDF Querying with SPARQL on Spark

by Alexander Schätzle et al.
Presented by: Ivan Ermilov

RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. Yet, the ever-increasing size of RDF data collections makes it more and more infeasible to store and process them on a single machine, raising the need for distributed approaches. Instead of building a standalone but closed distributed RDF store, we endorse the usage of existing infrastructures for Big Data processing, e.g. Hadoop. However, SPARQL query performance is a major challenge as these platforms are not designed for RDF processing from the ground up. Thus, existing Hadoop-based approaches often favor certain query pattern shapes while performance drops significantly for other shapes. In this paper, we describe a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. Our prototype system S2RDF is built on top of Spark and uses its relational interface to execute SPARQL queries over ExtVP. We demonstrate its superior performance in comparison to state-of-the-art SPARQL-on-Hadoop approaches using the recent WatDiv test suite. S2RDF achieves sub-second runtimes for the majority of queries on a billion triples RDF graph.
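The semi-join preprocessing behind ExtVP can be illustrated with a small, hedged sketch (table and function names are our own, not S2RDF's): triples are vertically partitioned into one table per predicate, and a precomputed extended partition keeps only the rows of one predicate's table whose subject also appears in another predicate's table, shrinking the input of later joins.

```python
# Illustrative sketch of vertical partitioning plus a subject-subject (SS)
# semi-join reduction, the core idea behind ExtVP. Not S2RDF itself.

def vertical_partition(triples):
    """One (subject, object) table per predicate."""
    tables = {}
    for s, p, o in triples:
        tables.setdefault(p, []).append((s, o))
    return tables

def extvp_ss(tables, p1, p2):
    """Rows of table p1 whose subject also occurs as a subject in table p2."""
    subjects = {s for s, _ in tables[p2]}
    return [(s, o) for s, o in tables[p1] if s in subjects]

triples = [
    ("ex:a", "ex:name", '"Alice"'),
    ("ex:b", "ex:name", '"Bob"'),
    ("ex:a", "ex:knows", "ex:b"),
]
tables = vertical_partition(triples)
# Only ex:a both has a name and knows someone, so the reduced table is smaller:
assert extvp_ss(tables, "ex:name", "ex:knows") == [("ex:a", '"Alice"')]
```

In S2RDF these reduced tables are materialized as Spark relational tables ahead of query time, so a SPARQL join only ever reads pre-shrunk inputs.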

A Large Scale Study of Programming Languages and Code Quality in Github

by Baishakhi Ray et al.
Presented by: Tim Ermilov

What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (729 projects, 80 Million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static v.s. dynamic typing, strong v.s. weak typing on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages.

Each paper will be presented in 20 minutes, which will be followed by 10 minutes discussion. After the talks, there is more time for discussion in smaller groups as well as coffee and cake. The colloquium starts at 3 p.m. and is located on 7th floor (Leipzig, Augustusplatz 10, Paulinum).

Posted in BigDataEurope, Colloquium, paper presentation | Comments Off on AKSW Colloquium, 04.07.2016. Big Data, Code Quality.

Accepted Papers of AKSW Members @ Semantics 2016

This year’s SEMANTiCS conference, which is taking place between September 12 – 15, 2016 in Leipzig, recently invited the submission of research papers on semantic technologies. Several AKSW members seized the opportunity and got their submitted papers accepted for presentation at the conference.

These are listed below:

  • Executing SPARQL queries over Mapped Document Stores with SparqlMap-M (Jörg Unbehauen, Michael Martin )
  • Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store (Natanael Arndt, Norman Radtke and Michael Martin)
  • Towards Versioning of Arbitrary RDF Data (Marvin Frommhold, Ruben Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen and Michael Martin)
  • DBtrends: Exploring query logs for ranking RDF data (Edgard Marx, Amrapali Zaveri, Diego Moussallem and Sandro Rautenberg)
  • MEX Framework: Automating Machine Learning Metadata Generation (Diego Esteves, Pablo N. Mendes, Diego Moussallem, Julio Cesar Duarte, Maria Claudia Cavalcanti, Jens Lehmann, Ciro Baron Neto and Igor Costa)

Another AKSW-driven event of the SEMANTiCS 2016 will be the Linked Enterprise Data Services (LEDS) track taking place between September 13-14, 2016. This track is organized by the BMBF-funded LEDS project, which is part of the Entrepreneurial Regions program – a BMBF Innovation Initiative for the New German Länder. The focus is on discussing new approaches with academic and industrial partners to discover and integrate background knowledge into business and governmental environments.

SEMANTiCS 2016 will also host the 7th edition of the DBpedia Community Meeting on the last day of the conference (September 15 – ‘DBpedia Day‘). DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link the different data sets on the Web to Wikipedia data.

So come and join SEMANTiCS 2016, talk and discuss with us!

More information on the program can be found here.

LEDS is funded by the BMBF and is part of the Wachstumskern Region initiative.

Posted in Announcements, Call for Paper, dbpedia, Events, LEDS, Papers, Uncategorized | Comments Off on Accepted Papers of AKSW Members @ Semantics 2016