AKSW Colloquium, 18.01.2016, Natural Language Processing and Question Answering

At the upcoming colloquium, Ivan Ermilov and Konrad Höffner, members of AKSW, will present two papers from the natural language processing (NLP) and question answering (QA) research areas.

ClausIE: Clause-Based Open Information Extraction

Authors. Del Corro, Luciano, and Rainer Gemulla.
Abstract. We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of “useful” pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text as well as on noisy text as found in the web.

A Joint Model for Question Answering over Multiple Knowledge Bases

Authors. Zhang, Yuanzhe, et al.
Abstract. As the amount of knowledge bases (KBs) grows rapidly, the problem of question answering (QA) over multiple KBs has drawn more attention. The most significant distinction between multiple KB-QA and single KB-QA is that the former must consider the alignments between KBs. The pipeline strategy first constructs the alignments independently, and then uses the obtained alignments to construct queries. However, alignment construction is not a trivial task, and the introduced noises would be passed on to query construction. By contrast, we notice that alignment construction and query construction are interactive steps, and jointly considering them would be beneficial. To this end, we present a novel joint model based on integer linear programming (ILP), uniting these two procedures into a uniform framework. The experimental results demonstrate that the proposed approach outperforms state-of-the-art systems, and is able to improve the performance of both alignment construction and query construction.

Each talk will last 20 minutes, followed by 10 minutes for questions from the audience. After the talks, there will be a coffee break with cookies for further discussion.

Posted in Colloquium, paper presentation

LinkedGeoData: New RDF versions of OpenStreetMap datasets available

The AKSW research group is happy to announce that a new LinkedGeoData maintenance release with more than 1.2 billion triples based on the OpenStreetMap planet file from 2015-11-02 is now online. Enjoy!

Posted in Announcements, Linked Geo Data, Software Releases

AKSW Colloquium, 14-12-2015, SERIMI

In the upcoming AKSW Colloquium, scheduled for the 14th of December at 3 PM, Mofeed Hassan will present the paper “SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets”, by Samur Araujo et al.

Abstract

State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. Aiming at resolving this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show our approach greatly improves the quality of state-of-the-art systems; especially on difficult matching tasks.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted in Colloquium, paper presentation

5th Workshop on Linked Data in Linguistics (LDL-2016)

At next year’s LREC conference, AKSW/KILT members Bettina Klimek and Sebastian Hellmann will co-organize the next Linked Data in Linguistics Workshop. Please find the call for papers below. We look forward to your submissions.

Call for Papers: 5th Workshop on Linked Data in Linguistics (LDL-2016): Managing, Building and Using Linked Language Resources

Portorož, Slovenia, 24th May 2016. Co-located with LREC 2016

Website: http://ldl2016.linguistic-lod.org/

Submission Deadline: February 8th 2016

Publishing language resources under open licenses and linking them together has been an area of increasing interest in academic circles, including applied linguistics, lexicography, natural language processing and information technology. It facilitates the exchange of knowledge and information across disciplines as well as between academia and the IT business. By co-locating the 5th edition of the workshop series with LREC 2016, we encourage this interdisciplinary community to present and to discuss use cases, experiences, best practices, recommendations and technologies with each other and in interaction with the language resource community. We particularly invite contributions discussing the application of the Linked Open Data paradigm to linguistic data, as it might provide an important step towards making linguistic data: i) easily and uniformly queryable, ii) interoperable and iii) sharable over the Web using open standards such as the HTTP protocol and the RDF data model.

While it has been shown that Linked Data has significant value for the management of language resources on the Web, the practice is still far from being an accepted standard in the community. Thus, it is important that we continue to push the development and adoption of Linked Data technologies among creators of language resources. In particular, Linked Data’s ability to increase the quality, interoperability and availability of data on the Web has led us to choose managing, improving and using language resources on the Web as the key focus of this year’s workshop.

We invite presentations of algorithms, methodologies, experiments, use cases, project proposals and position papers regarding the creation, publication or application of linguistic data collections and their linking with other resources, as well as descriptions of such data. This includes, but is not limited to, the following:

  • Building linked language resources
    • Novel vocabularies for describing linguistic objects using RDF.
    • Metrics and methodologies to develop linked language resources on the Web.
    • Natural language processing methods to enhance Linked Open Data.
  • Managing linked language resources 
    • Creating, maintaining and accessing language resource infrastructures based on Linked Data.
    • Metadata linking and curation for language resources on the Web.
    • Best practices for publication and linking of multilingual knowledge resources.
  • Using linked language resources
    • Application of Linked Open Data for linguistics, digital humanities and natural language processing.
    • Addressing challenges of scalability, multilinguality and interoperability in the Web.
    • Legal, social and scientific aspects of Linguistic Linked Open Data.

We invite both long (8 pages plus 2 pages of references, formatted according to the LREC guidelines) and short papers (4 pages plus 2 pages of references) representing original research, innovative approaches and resource types, use cases or in-depth discussions. Short papers may also represent project proposals, work in progress or data set descriptions. Papers will be published as part of the LREC workshop proceedings and presented as oral or poster presentations, as appropriate.

Datasets

We encourage submission of datasets and ask that these resources are included in the LLOD cloud (instructions can be found here). As such, we require that datasets are described in Datahub with sufficient metadata to be added to the cloud. In addition, as part of the LREC conference, your resource will be described in the LRE Map and assigned an International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource.

Important Dates

Submission Deadline: February 8th 2016

Notification of Acceptance: March 10th 2016

Camera-Ready: March 24th 2016

Workshop: May 24th 2016

Organizers

  • Christian Chiarcos (Goethe-Universität Frankfurt am Main, Germany)
  • John Philip McCrae (National University of Ireland, Galway, Ireland)
  • Thierry Declerck (University of Saarland, Germany)
  • Elena Montiel (Universidad Politécnica de Madrid, Spain)
  • Petya Osenova (Sofia University and IICT-BAS, Bulgaria)
  • Sebastian Hellmann (AKSW/KILT, Universität Leipzig, Germany)
  • Julia Bosque Gil (Universidad Politécnica de Madrid, Spain)
  • Bettina Klimek (AKSW/KILT, Universität Leipzig, Germany)

Program Committee

  • Guadalupe Aguado (Universidad Politécnica de Madrid, Spain)
  • Núria Bel (Universitat Pompeu Fabra, Spain)
  • Claire Bonial (University of Colorado at Boulder, USA)
  • Paul Buitelaar (National University of Ireland, Galway, Ireland)
  • Steve Cassidy (Macquarie University, Australia)
  • Nicoletta Calzolari (ILC-CNR, Italy)
  • Damir Cavar (Eastern Michigan University, USA)
  • Philipp Cimiano (Bielefeld University, Germany)
  • Gerard de Melo (Tsinghua University, China)
  • Alexis Dimitriadis (Universiteit Utrecht, The Netherlands)
  • Judith Eckle-Kohler (Technische Universität Darmstadt, Germany)
  • Francesca Frontini (ILC-CNR, Italy)
  • Jeff Good (University at Buffalo, USA)
  • Asunción Gómez Pérez (Universidad Politécnica de Madrid, Spain)
  • Jorge Gracia (Universidad Politécnica de Madrid, Spain)
  • Yoshihiko Hayashi (Waseda University, Japan)
  • Nancy Ide (Vassar College, USA)
  • Fahad Khan (ILC-CNR, Italy)
  • Vanessa Lopez (IBM Europe, Ireland)
  • Steven Moran (Universität Zürich, Switzerland/Ludwig Maximilian University, Germany)
  • Roberto Navigli (University of Rome, “La Sapienza”, Italy)
  • Sebastian Nordhoff (Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany)
  • Antonio Pareja-Lora (Universidad Complutense Madrid, Spain)
  • Maciej Piasecki (Wroclaw University of Technology, Poland)
  • Francesca Quattri (Hong Kong Polytechnic University, Hong Kong)
  • Laurent Romary (INRIA, France)
  • Felix Sasaki (Deutsches Forschungszentrum für Künstliche Intelligenz, Germany)
  • Andrea Schalley (Griffith University, Australia)
  • Gilles Sérasset (Joseph Fourier University, France)
  • Kiril Simov (Bulgarian Academy of Sciences, Sofia, Bulgaria)
  • Milena Slavcheva (JRC-Brussels, Belgium)
  • Aitor Soroa (University of the Basque Country, Spain)
  • Armando Stellato (University of Rome, Tor Vergata, Italy)
  • Piek Vossen (Vrije Universiteit Amsterdam, The Netherlands)
Posted in Announcements, Call for Paper, Events, Papers, workshop

AKSW Colloquium, 07-12-2015, LODVader

On the 7th of December at 3 PM, Ciro Baron will present LODVader, a new hybrid system which combines LOD Visualisation, Analytics and DiscovEry in Real-time.

Abstract

The Linked Open Data (LOD) cloud is in danger of becoming a black box. Simple questions such as “What kind of datasets are in the LOD cloud?”, “In what way(s) are these datasets connected?” – albeit frequently asked – are at the moment still difficult to answer due to the lack of proper tooling support. The infrequent update of the static LOD cloud diagram adds to the current dilemma, since there is neither reliable nor timely-updated information to perform an interactive search, analysis or in particular visualization in order to gain insight into the current state of Linked Open Data.

In this Colloquium, Ciro will present a new hybrid system which combines LOD Visualisation, Analytics and DiscovEry in Real-time (LODVader) to aid in answering the above questions. LODVader is equipped with (1) a multi-layer LOD cloud visualization component comprising a light and a dark side, (2) dataset analysis components that extend the state of the art with new similarity measures and efficient link extraction techniques and (3) a fast search index that is an entry point for dataset discovery. At its core, LODVader employs a timely-updated index using a complex cluster of Bloom Filters (BF) as a fast search index with a low memory footprint. This BF cluster is able to efficiently perform analysis on link and dataset similarities based on stored predicate and object information, which – once inverted – can be employed to discover broken links by displaying the Dark LOD Cloud. By combining all these features with a versioning system, we allow for an up-to-date, multi-dimensional LOD cloud analysis, which – to the best of our knowledge – was not possible before.
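
To make the Bloom filter idea above concrete, here is a minimal sketch of the data structure in plain Python. It is not LODVader's implementation: the class name, the bit-array size, the number of hash functions, the use of salted SHA-256 and the example URIs are all assumptions chosen for illustration. It only shows why such a filter can answer "might this object URI occur in that dataset?" with a small, fixed memory footprint.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: never a false negative, sometimes a false positive."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # one bit per slot

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Hypothetical usage: index the object URIs of one dataset ...
objects_of_dataset_a = BloomFilter()
for uri in ("http://example.org/b/Leipzig", "http://example.org/b/Berlin"):
    objects_of_dataset_a.add(uri)

# ... then ask whether a URI used as a subject elsewhere may be linked from it.
print("http://example.org/b/Leipzig" in objects_of_dataset_a)  # True
```

A membership test that returns False is definitive, while a True may be a false positive, so a real system would presumably confirm candidate links against the actual data.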

Posted in Colloquium, paper presentation

Session on Management of Ontology Evolution during the ALIGNED project meeting on Dec 3rd, 15:45

The fourth ALIGNED project meeting will be held in Leipzig from December 2 to 4, 2015. During the meeting, we welcome our guest presenters Ralph Schäfermeier and Dr. Anika Groß, who will give presentations on the Management of Ontology Evolution (see below). ALIGNED (Quality-centric Software and Data Engineering) is a research project funded by Horizon 2020, Project No. 644055. ALIGNED will develop new ways to build and maintain IT systems that use big data on the web (see bottom for more information).
The presentations will take place on Dec 3rd, 15:45 in Room P-502 (Paulinum, Leipzig).
If you wish to join the whole meeting (Dec 2nd-4th), please contact Dimitris Kontokostas.

Presenters

Ralph Schäfermeier

OntoMaven: Maven-based Ontology Development and Management of Distributed Ontology Repositories

Among our guest speakers is Ralph Schäfermeier (Freie Universität Berlin), who will give a talk (Thu, Dec 3rd 15:45) on “OntoMaven” (an Apache Maven tool for distributed ontology engineering and ontology-based software engineering).

Abstract: In collaborative agile ontology development projects, support for modular reuse of ontologies from large existing remote repositories, ontology project life cycle management, and transitive dependency management are important needs. The Apache Maven approach has proven its success in distributed collaborative Software Engineering by its widespread adoption. The contribution of this paper is a new design artifact called OntoMaven. OntoMaven adopts the Maven-based development methodology and adapts its concepts to knowledge engineering for Maven-based ontology development and management of ontology artifacts in distributed ontology repositories.

Dr. Anika Groß

Evolution of Ontology-based Mappings

Additionally, Dr. Anika Groß (Universität Leipzig, Institut für Informatik, Abteilung Datenbanken) will introduce the ELISA project (Thu, Dec 3rd 16:15) which addresses the development and evaluation of new methods for creating and maintaining semantic annotations. All of the partners will also give short technical briefings on aspects of the current ALIGNED project work that have proved especially interesting.

Abstract: Ontologies are used in numerous research disciplines and commercial applications to uniformly and semantically annotate real-world objects. Due to the rapid development of application domains, the corresponding ontologies are changed frequently to include up-to-date knowledge. These changes dramatically influence dependent data as well as applications and systems, for instance ontology mappings that semantically interrelate ontologies. The talk will give an overview of the evolution of ontologies and ontology-based mappings.

 

About ALIGNED

http://aligned-project.eu/

ALIGNED (Quality-centric Software and Data Engineering) is a research project funded by Horizon 2020. ALIGNED will develop new ways to build and maintain IT systems that use big data on the web. ALIGNED brings together world-class computer science researchers (Trinity College Dublin, University of Oxford, University of Leipzig), software companies specialised in data-intensive systems (Semantic Web Company), information companies (Wolters Kluwer) and academic curators of the Seshat: Global History Databank, large datasets describing world history and archaeology (University of Oxford, Adam Mickiewicz University in Poznań). Together they will create more efficient methods of building IT systems that extract, process, publish and share web data.

ALIGNED will lay the foundations for the next generation of big data systems that lower costs and deal with the web data challenges of dynamism, complexity, scale and inconsistency. ALIGNED is coordinated by the School of Computer Science and Statistics at Trinity College Dublin and is an associated project of the Science Foundation Ireland ADAPT research centre.

Posted in Announcements, Events, invited talk

AKSW Colloquium, 30-11-2015, SCARO + Large-scale multilingual knowledge extraction & quality assessment

On November 30 at 3 PM, Klaus Lyko will present SCARO. Afterwards, Dimitris Kontokostas will present the progress of his PhD thesis “Large-scale multilingual knowledge extraction & quality assessment”.

Posted in Colloquium, PHD progress report

AKSW Colloquium, 23-11-2015, CVtec and Patty

CVtec and model-driven semantification by Andreas Nareike

In this presentation, I will give a short introduction to our project CVtec (http://www.cv-tec.de/). CVtec is concerned with knowledge management for technical facilities and uses methods of model-driven software development. Although CVtec uses relational databases to persist data right now, we are researching possibilities to move towards a Semantic data model. I will give an overview of our different approaches and hope to get some valuable feedback.

René Speck will present “Patty: A Taxonomy of Relational Patterns with Semantic Types” by Ndapandula Nakashole, Gerhard Weikum and Fabian Suchanek (Max Planck Institute for Informatics, Saarbrücken, Germany).

Abstract
This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY taxonomy comprises 350,569 pattern synsets. Random-sampling-based evaluation shows a pattern accuracy of 84.7%. PATTY has 8,162 subsumptions, with a random-sampling-based precision of 75%. The PATTY resource is freely available for interactive access and download.

Posted in Uncategorized

AKSW Colloquium, 09-11-2015, Versioning of Arbitrary RDF Data (PhD progress report) and GraphLab Platform

GraphLab Platform – Overview and History  by Simon Bin

GraphLab is a graph-based distributed computation framework. Its development began in 2009 at Carnegie Mellon University. At that time, it was competing with Hadoop on graph processing. The typical example algorithm demonstrated with it is the PageRank calculation, which still appears today in the Spark GraphX documentation as a filler for the computation step. We will look at the architecture, sample code and what has become of GraphLab today.
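
For readers unfamiliar with the PageRank calculation mentioned above, the following is a minimal single-machine sketch in plain Python. It does not use the GraphLab or GraphX APIs; the function name, the damping factor of 0.85, the fixed iteration count and the tiny example graph are assumptions for illustration. The graph is assumed to be a dict in which every link target also appears as a key.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # Distribute this node's rank evenly over its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Graph frameworks express the inner loop as a per-vertex update that can be distributed across machines; the sketch keeps everything in one process to show only the arithmetic.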

Versioning of Arbitrary RDF Data (PhD progress report) by Marvin Frommhold

A major challenge of B2B data networks is the efficient synchronization of data between the participants; this is especially true for Linked Data based networks. Exchanging only the differences has proved to be very bandwidth- and memory-friendly. Unfortunately, there is a lack of robust and highly efficient versioning and synchronization protocols for Linked Data, which hinders a wide adoption of Linked Data in B2B communication. For this reason, we are developing a versioning system for arbitrary RDF data as part of the LUCID and LEDS research projects. The system will be a feature of the eccenca Linked Data Suite. A big challenge in versioning RDF data is blank node support. Our approach creates patches that can address blank nodes without requiring changes to the original dataset. This forms the foundation for comprehensive versioning of any RDF data, which enables efficient data exchange in a distributed network.
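
The idea of exchanging only differences can be sketched as follows. This is a simplified illustration, not the versioning system described in the talk: the `Patch` type, the `diff` and `apply_patch` helpers and the example triples are hypothetical, and the sketch assumes every term is an IRI or literal written as a plain string. It deliberately ignores blank nodes, which is exactly the hard part the abstract points to.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


class Patch(NamedTuple):
    """The difference between two versions of a graph."""
    removed: FrozenSet[Triple]
    added: FrozenSet[Triple]


def diff(old: Set[Triple], new: Set[Triple]) -> Patch:
    # Only the triples that differ need to be sent over the network.
    return Patch(removed=frozenset(old - new), added=frozenset(new - old))


def apply_patch(graph: Set[Triple], patch: Patch) -> Set[Triple]:
    # Returns a new graph; the input graph is left untouched.
    return (set(graph) - patch.removed) | patch.added


# Hypothetical example data.
v1 = {
    ("ex:acme", "ex:locatedIn", "ex:Leipzig"),
    ("ex:acme", "ex:employees", "120"),
}
v2 = {
    ("ex:acme", "ex:locatedIn", "ex:Leipzig"),
    ("ex:acme", "ex:employees", "135"),
}

patch = diff(v1, v2)
print(len(patch.removed), len(patch.added))  # 1 1
print(apply_patch(v1, patch) == v2)          # True
```

With blank nodes, two graphs can be equivalent without sharing identical triples, so a plain set difference like this one would report spurious changes.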

Posted in Colloquium, LEDS, LUCID, PHD progress report

AKSW Colloquium, 2 November, 3pm, Automating Geo-spatial RDF Dataset Integration and Enrichment

On November 2nd at 3 PM, Mohamed Sherif will present the progress of his PhD titled “Automating Geo-spatial RDF Dataset Integration and Enrichment”.

Abstract:

Within this thesis, we will spur the transition from islands of isolated Geographic Information Systems (GIS) to enriched geo-spatial Linked Data sets with which geographic information can easily be integrated and processed. To achieve this goal, we will provide concepts, approaches and use cases that facilitate the combination and manipulation of geographic information with other data types that are already present on the Linked Data Web. Moreover, we will provide means to automate the proposed approaches by applying unsupervised machine learning algorithms or weakly supervised algorithms.

 

Posted in Uncategorized