AKSW Colloquium, 28.11.2016, NED using PBOH + Large-Scale Learning of Relation-Extraction Rules.

In the upcoming Colloquium, November the 28th at 3 PM, two papers will be presented:

Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

Diego Moussallem will discuss the paper “Probabilistic Bag-Of-Hyperlinks Model for Entity Linking” by Octavian-Eugen Ganea et. al. which was accepted at WWW 2016.

Abstract:  Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referenced as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search or machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to the complexity of this problem. We here propose a probabilistic approach that makes use of an effective graphical model to perform collective entity disambiguation. Input mentions (i.e., linkable token spans) are disambiguated jointly across an entire document by combining a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context. The model is based on simple sufficient statistics extracted from data, thus relying on few parameters to be learned. Our method does not require extensive feature engineering, nor an expensive training procedure. We use loopy belief propagation to perform approximate inference. The low complexity of our model makes this step sufficiently fast for real-time usage. We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods

Large-Scale Learning of Relation-Extraction Rules with Distant Supervision from the Web

Afterward, René Speck will present the paper “Large-Scale Learning of Relation-Extraction Rules with
Distant Supervision from the Web”
by Sebastian Krause et. al. which was accepted at ISWC 2012.

Abstract: We present a large-scale relation extraction (RE) system which learns grammar-based RE rules from the Web by utilizing large numbers of relation instances as seed. Our goal is to obtain rule sets large enough to cover the actual range of linguistic variation, thus tackling the long-tail problem of real-world applications. A variant of distant supervision learns several relations in parallel, enabling a new method of rule filtering. The system detects both binary and n-ary relations. We target 39 relations from Freebase, for which 3M sentences extracted from 20M web pages serve as the basis for learning an average of 40K distinctive rules per relation. Employing an efficient dependency parser, the average run time for each relation is only 19 hours. We compare these rules with ones learned from local corpora of different sizes and demonstrate that the Web is indeed needed for a good coverage of linguistic variation

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Uncategorized | Comments Off on AKSW Colloquium, 28.11.2016, NED using PBOH + Large-Scale Learning of Relation-Extraction Rules.

Accepted paper in AAAI 2017

aaai-bannerHello Community! We are very pleased to announce that our paper “Radon– Rapid Discovery of Topological Relations” was accepted for presentation at the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), which will be held in February 4–9 at the Hilton San Francisco, San Francisco, California, USA.

In more detail, we will present the following paper: “Radon– Rapid Discovery of Topological Relations” Mohamed Ahmed Sherif, Kevin Dreßler, Panayiotis Smeros, and Axel-Cyrille Ngonga Ngomo

Abstract. Datasets containing geo-spatial resources are increasingly being represented according to the Linked Data principles. Several time-efficient approaches for discovering links between RDF resources have been developed over the last years. However, the time-efficient discovery of topological relations between geospatial resources has been paid little attention to. We address this research gap by presenting Radon, a novel approach for the rapid computation of topological relations between geo-spatial resources. Our approach uses a sparse tiling index in combination with minimum bounding boxes to reduce the computation time of topological relations. Our evaluation of Radon’s runtime on 45 datasets and in more than 800 experiments shows that it outperforms the state of the art by up to 3 orders of magnitude while maintaining an F-measure of 100%. Moreover, our experiments suggest that Radon scales up well when implemented in parallel.

Acknowledgments
This work is implemented in the link discovery framework LIMES and has been supported by the European Union’s H2020 research and innovation action HOBBIT (GA no. 688227) as well as the BMWI Project GEISER (project no. 01MD16014).

Posted in Uncategorized | Comments Off on Accepted paper in AAAI 2017

AKSW Colloquium, 17.10.2016, Version Control for RDF Triple Stores + NEED4Tweet

In the upcoming Colloquium, October the 17th at 3 PM, two papers will be presented:

Version Control for RDF Triple Stores

Marvin Frommhold will discuss the paper “Version Control for RDF Triple Stores” by Steve Cassidy and James Ballantine which forms the foundation of his own work regarding versioning for RDF.

Abstract:  RDF, the core data format for the Semantic Web, is increasingly being deployed both from automated sources and via human authoring either directly or through tools that generate RDF output. As individuals build up large amounts of RDF data and as groups begin to collaborate on authoring knowledge stores in RDF, the need for some kind of version management becomes apparent. While there are many version control systems available for program source code and even for XML data, the use of version control for RDF data is not a widely explored area. This paper examines an existing version control system for program source code, Darcs, which is grounded in a semi-formal theory of patches, and proposes an adaptation to directly manage versions of an RDF triple store.

NEED4Tweet: A Twitterbot for Tweets Named Entity Extraction and
Disambiguation

Afterwards, Diego Esteves will present the paper “NEED4Tweet: A Twitterbot for Tweets Named Entity Extraction and
Disambiguation” by Mena B. Habib and Maurice van Keulen which was accepted at ACL 2015.

Abstract: In this demo paper, we present NEED4Tweet, a Twitterbot for named entity extraction (NEE) and disambiguation (NED) for Tweets. The straightforward application of state-of-the-art extraction and disambiguation approaches on informal text widely used in Tweets, typically results in significantly degraded performance due to the lack of formal structure; the lack of sufficient context required;
and the seldom entities involved. In this paper, we introduce a novel framework
that copes with the introduced challenges. We rely on contextual and semantic features more than syntactic features which are less informative. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation | Tagged , , , , | Comments Off on AKSW Colloquium, 17.10.2016, Version Control for RDF Triple Stores + NEED4Tweet

LIMES 1.0.0 Released

Dear all,

the LIMES Dev team is happy to announce LIMES 1.0.0.

LIMES, the Link Discovery Framework for Metric Spaces, is a link discovery framework for the Web of Data. It implements time-efficient approaches for large-scale link discovery based on the characteristics of metric spaces. Our approaches facilitate different approximation techniques to compute estimates of the similarity between instances. These estimates are then used to filter out a large amount of those instance pairs that do not suffice the mapping conditions. By these means, LIMES can reduce the number of comparisons needed during the mapping process by several orders of magnitude. The approaches implemented in LIMES include the original LIMES  algorithm for edit distances, HR3, HYPPO and ORCHID.

Additionally, LIMES supports the first planning technique for link discovery HELIOS, that minimizes the overall execution of a link specification, without any loss of completeness. Moreover, LIMES implements supervised and unsupervised machine-learning algorithms for finding accurate link specifications. The algorithms implemented here include the supervised, active and unsupervised versions of EAGLE and WOMBAT.

 

Website: http://aksw.org/Projects/LIMES.html

Download: https://github.com/AKSW/LIMES-dev/releases/tag/1.0.0

GitHub: https://github.com/AKSW/LIMES-dev

User manual: http://aksw.github.io/LIMES-dev/user_manual/

Developer manual: http://aksw.github.io/LIMES-dev/developer_manual/

 

What is new in LIMES 1.0.0:

  • New LIMES GUI
  • New Controller that supports manual and graphical configuration
  • New machine learning pipeline: supports supervised, unsupervised and active learning algorithms
  • New dynamic planning for efficient link discovery
  • Updated execution engine to handle dynamic planning
  • Added support for qualitative (Precision, Recall, F-measure etc.) and quantitative (runtime duration etc.) evaluation metrics for mapping evaluation, in the presence of a gold standard
  • Added support for configuration files in XML and RDF formats
  • Added support for pointsets metrics such as Mean, Hausdorff and Surjection
  • Added support for MongeElkan, RatcliffObershelp string measures
  • Added support for Allen’s algebra temporal relations for event data
  • Added support for all topological relations derived from the DE-9IM model
  • Migrated the codebase to Java 8 and Jena 3.0.1

We would like to thank everyone who helped to create this release. We also acknowledge the support of the SAKE  and HOBBIT projects.

Kind regards,

The LIMES Dev team

 

Posted in Uncategorized | Comments Off on LIMES 1.0.0 Released

DL-Learner 1.3 (Supervised Structured Machine Learning Framework) Released

Dear all,

the Smart Data Analytics group at AKSW is happy to announce DL-Learner 1.3.

DL-Learner is a framework containing algorithms for supervised machine learning in RDF and OWL. DL-Learner can use various RDF and OWL serialization formats as well as SPARQL endpoints as input, can connect to most popular OWL reasoners and is easily and flexibly configurable. It extends concepts of Inductive Logic Programming and Relational Learning to the Semantic Web in order to allow powerful data analysis.

Website: http://dl-learner.org
GitHub page: https://github.com/AKSW/DL-Learner
Download: https://github.com/AKSW/DL-Learner/releases
ChangeLog: http://dl-learner.org/development/changelog/

DL-Learner is used for data analysis tasks within other tools such as ORE and RDFUnit. Technically, it uses refinement operator based, pattern-based and evolutionary techniques for learning on structured data. For a practical example, see http://dl-learner.org/community/carcinogenesis/. It also offers a plugin for Protégé, which can give suggestions for axioms to add.

In the current release, we added a large number of new algorithms and features. For instance, DL-Learner supports terminological decision tree learning, it integrates the LEAP and EDGE systems as well as the BUNDLE probabilistic OWL reasoner. We migrated the system to Java 8, Jena 3, OWL API 4.2 and Spring 4.3. We want to point to some related efforts here:

We want to thank everyone who helped to create this release, in particular we want to thank Giuseppe Cota who visited the core developer team and significantly improved DL-Learner. We also acknowledge support by the recently SAKE project, in which DL-Learner will be applied to event analysis in manufacturing use cases, as well as Big Data Europe and HOBBIT projects.

Kind regards,

Lorenz Bühmann, Jens Lehmann, Patrick Westphal and Simon Bin

 

Posted in Uncategorized | Comments Off on DL-Learner 1.3 (Supervised Structured Machine Learning Framework) Released

OntoWiki 1.0.0 released

Dear Semantic Web and Linked Data Community,
we are proud to finally announce the releases of OntoWiki 1.0.0 and the underlying Erfurt Framework in version 1.8.0.
After 10 years of development we’ve decided to release the teenager OntoWiki from the cozy home of 0.x versions.
Since the last release of 0.9.11 in January 2014 we did a lot of testing to stabilize OntoWikis behavior and accordingly made a lot of bug fixes, also we are now using PHP Composer for dependency management, improved the testing work flow, gave a new structure and home to the documentation and we have created a neat project landing page.

The development of OntoWiki is completely open source and we are happy for any contribution, especially to the code and the documentation, which is also kept in a Git repository with easy to edit Markdown pages. If you have questions about the usage of OntoWiki besides the documentation you can also use or mailinglist or the stackoverflow tag “ontowiki”.

Please see https://ontowiki.net/ for further information.

We also had a Poster for advertising the OntoWiki release at SEMANTiCS Conference:

OntoWiki 1.0

Philipp Frischmuth, Natanael Arndt, Michael Martin: OntoWiki 1.0: 10 Years of Development – What’s New in OntoWiki

We are happy for your feedback, in the name of the OntoWiki team,
Philipp, Michael and Natanael

Our Fingers on the Mouse

Posted in Announcements, LEDS, major tool release, OntoWiki, Software Releases | Tagged , , , | Comments Off on OntoWiki 1.0.0 released

AKSW Colloquium, 05.09.2016. LOD Cloud Statistics, OpenAccess at Leipzig University.

On the upcoming Monday (05.09.2016), AKSW group will discuss topics related to Semantic Web and LOD Cloud Statistics. Also, we will have invited speaker from University of Leipzig Library (UBL) Dr. Astrid Vieler talking about OpenAccess at Leipzig University.

LODStats: The Data Web Census Dataset

by Ivan Ermilov et al.
Presented by: Ivan Ermilov

Abstract: Over the past years, the size of the Data Web has increased significantly, which makes obtaining general insights into its growth and structure both more challenging and more desirable. The lack of such insights hinders important data management tasks such as quality, privacy and coverage analysis. In this paper, we present the LODStats dataset, which provides a comprehensive picture of the current state of a significant part of the Data Web. LODStats is based on RDF datasets from data.gov, publicdata.eu and datahub.io data catalogs and at the time of writing lists over 9 000 RDF datasets. For each RDF dataset, LODStats collects comprehensive statistics and makes these available in adhering to the LDSO vocabulary. This analysis has been regularly published and enhanced over the past five years at the public platform lodstats.aksw.org. We give a comprehensive overview over the resulting dataset.

OpenAccess at Leipzig University

Invited talk by Dr. Astrid Vieler from Leipzig University Library (UBL). The talk will be about Open Access in general and the Open Access Policy of our University in special. She will tell us more about our right, which we have toward the publishers, and she gives us advice and hints on how we can increase the visibility of our publications.

After the talks, there is more time for discussion in smaller groups as well as coffee and cake. The colloquium starts at 3 p.m. and is located on 7th floor (Leipzig, Augustusplatz 10, Paulinum).

Posted in Announcements, Colloquium, Events, invited talk, paper presentation | Comments Off on AKSW Colloquium, 05.09.2016. LOD Cloud Statistics, OpenAccess at Leipzig University.

AKSW Colloquium, 15th August, 3pm, RDF query relaxation

Michael Roeder On the 15th of August at 3 PM, Michael Röder will present the paper “RDF Query Relaxation Strategies Based on Failure Causes” of Fokou et al. in P702.

Abstract

Recent advances in Web-information extraction have led to the creation of several large Knowledge Bases (KBs). Querying these KBs often results in empty answers that do not serve the users’ needs. Relaxation of the failing queries is one of the cooperative techniques used to retrieve alternative results. Most of the previous work on RDF query relaxation compute a set of relaxed queries and execute them in a similarity-based ranking order. Thus, these approaches relax an RDF query without knowing its failure causes (FCs). In this paper, we study the idea of identifying these FCs to speed up the query relaxation process. We propose three relaxation strategies based on various information levels about the FCs of the user query and of its relaxed queries as well. A set of experiments conducted on the LUBM benchmark show the impact of our proposal in comparison with a state-of-the-art algorithm.

The paper is available at researchgate.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Uncategorized | Comments Off on AKSW Colloquium, 15th August, 3pm, RDF query relaxation

Article accepted in Journal of Web Semantics

We are happy to announce that the article “DL-Learner – A Framework for Inductive Learning on the Semantic Web” by Lorenz Bühmann, Jens Lehmann and Patrick Westphal was accepted for publication in the Journal of Web Semantics: Science, Services and Agents on the World Wide Web.

Abstract:

In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation. It can be beneficial in various data and schema analysis tasks with applications in different standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning and includes several algorithm implementations, usage examples and has applications building on top of the framework. The article gives an overview of the framework with a focus on algorithms and use cases.

Posted in DL-Learner, Papers, Uncategorized | Comments Off on Article accepted in Journal of Web Semantics

AKSW Colloquium, 18.07.2016, AEGLE and node2vec

On Monday 18.07.2016, Kleanthi Georgala will give her Colloquium presentation for her paper “An Efficient Approach for the Generation of Allen Relations”, that was accepted at the European Conference on Artificial Intelligence (ECAI) 2016.

Abstract

Event data is increasingly being represented according to the Linked Data principles. The need for large-scale machine learning on data represented in this format has thus led to the need for efficient approaches to compute RDF links between resources based on their temporal properties. Time-efficient approaches for computing links between RDF resources have been developed over the last years. However, dedicated approaches for linking resources based on temporal relations have been paid little attention to. In this paper, we address this research gap by presenting AEGLE, a novel approach for the efficient computation of links between events according to Allen’s interval algebra. We study Allen’s relations and show that we can reduce all thirteen relations to eight simpler relations. We then present an efficient algorithm with a complexity of O(n log n) for computing these eight relations. Our evaluation of the runtime of our algorithms shows that we outperform the state of the art by up to 4 orders of magnitude while maintaining a precision and a recall of 1.

Tommaso SoruAfterwards, Tommaso Soru will present a paper considered the latest chapter of the Everything-2-Vec saga, which encompasses outstanding works such as Word2Vec and Doc2Vec. The paper title is node2vec: Scalable Feature Learning for Networks” [PDF] by Aditya Grover and Jure Leskovec, accepted for publication at the International Conference on Knowledge Discovery and Data Mining (KDD), 2016 edition.

Posted in Uncategorized | Comments Off on AKSW Colloquium, 18.07.2016, AEGLE and node2vec