NLP Interchange Format (NIF) 1.0 Spec, Demo and Reference Implementation

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The core of NIF consists of a vocabulary that can represent strings as RDF resources. A special URI design is used to pinpoint annotations to a part of a document. These URIs can then be used to attach arbitrary annotations to the respective character sequence. Employing these URIs, annotations can be published on the Web as Linked Data and interchanged between different NLP tools and applications.

In order to simplify the combination of tools, improve their interoperability and facilitate the use of Linked Data, we developed the NLP Interchange Format (NIF). NIF addresses the interoperability problem on three layers: the structural, conceptual and access layer. NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts (structural layer) and a comprehensive ontology for describing common NLP terms and concepts (conceptual layer). NIF-aware applications will produce output (and possibly also consume input) adhering to the NIF ontology as REST services (access layer). Unlike more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. Another benefit is that a NIF wrapper has to be created only once for a particular tool, but then enables that tool to interoperate with a potentially large number of other tools without additional adaptations. Ultimately, we envision an ecosystem of NLP tools and services to emerge that uses NIF for exchanging and integrating rich annotations.

We designed NIF to be very light-weight and to reduce the number of triples in order to achieve better scalability. The following triples in N3 syntax express that the string “W3C” on http://www.w3.org/DesignIssues/LinkedData.html (character indices 22849 to 22852) is linked to the DBpedia resource of “World_Wide_Web_Consortium”:

@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix scms: <http://ns.aksw.org/scms/> .
@prefix nerd: <http://nerd.eurecom.fr/ontology#> .
ld:offset_22849_22852_W3C str:anchorOf "W3C" .
ld:offset_22849_22852_W3C scms:means dbpedia:World_Wide_Web_Consortium .
ld:offset_22849_22852_W3C a dbo:Organisation , nerd:Organization .
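
The following sketch shows how such statements can be produced programmatically with Python and rdflib; the offset_uri helper is our own illustration of the offset-based URI design and is not part of the NIF reference implementation.

# Minimal sketch (not part of the NIF reference implementation): build an
# offset-based NIF string URI and emit the example triples with rdflib.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

DOC = "http://www.w3.org/DesignIssues/LinkedData.html"
STR = Namespace("http://nlp2rdf.lod2.eu/schema/string/")
SCMS = Namespace("http://ns.aksw.org/scms/")
DBO = Namespace("http://dbpedia.org/ontology/")
NERD = Namespace("http://nerd.eurecom.fr/ontology#")
DBPEDIA = Namespace("http://dbpedia.org/resource/")

def offset_uri(doc, begin, end, anchor):
    # URI design: document URI plus a fragment encoding the character offsets
    # and the anchored substring
    return URIRef("%s#offset_%d_%d_%s" % (doc, begin, end, anchor))

g = Graph()
w3c = offset_uri(DOC, 22849, 22852, "W3C")
g.add((w3c, STR.anchorOf, Literal("W3C")))
g.add((w3c, SCMS.means, DBPEDIA["World_Wide_Web_Consortium"]))
g.add((w3c, RDF.type, DBO.Organisation))
g.add((w3c, RDF.type, NERD.Organization))
print(g.serialize(format="n3"))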

NIF already incorporates the Ontologies of Linguistic Annotation (OLiA) and the Named Entity Recognition and Disambiguation (NERD) ontology. Please get in contact if you know of further NLP ontologies that we can reuse and integrate into NIF.

This release consists of the following items:

We would like to thank our colleagues from AKSW research group and the LOD2 project for their helpful comments and inspiring discussions during the development of NIF. Especially, we would like to thank Christian Chiarcos for his support while using OLiA, the members of the Working Group on Open Data in Linguistics and the students that participated in the NIF field study: Markus Ackermann, Martin Brümmer, Didier Cherix, Marcus Nitzschke, Robert Schulze.


DBpedia SPARQL Benchmark paper wins ISWC2011 best-paper award

The closing ceremony of ISWC2011 in Bonn is just over and we are excited to have won the best research paper award with our paper:

Mohamed Morsey, Jens Lehmann, Sören Auer, Axel-Cyrille Ngonga Ngomo: DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data. To appear in the Proceedings of the 10th International Semantic Web Conference (ISWC2011), Oct 23-27, 2011, Bonn, Germany. [BIB]

In the paper, we describe, using DBpedia as an example, how domain-specific SPARQL benchmarks can be generated and used to assess the performance of triple stores.

It's a great success for Mohamed Morsey, who did most of the implementation work and is only in the second year of his PhD studies at AKSW.


ISSLOD 2011, a great success!

20 students from about 10 different countries were in Leipzig last week, from 12 to 18 September, to attend the Indian-Summer School on Linked Data. Starting from the basics of Linked Data, introduced by Chris Bizer and Sören Auer, they learnt about its more intricate aspects such as Interlinking, Conversion, NLP, Reasoning, SPARQL, Linked Semantic Multimedia and more. The AKSW experts shared the details of OntoWiki, RDFaCE, Mobile Semantic Applications and LinkedGeoData.

A poster session was organized in the middle of the week, where each student got a chance to present their work and to see the work of the other attendees. This aided networking, finding students with similar interests for further collaborations, and getting insightful comments from experts in the field. The best poster award went to Martin Svoboda from Charles University in Prague for his poster “On Linked Data Indexing and Querying”.

Besides just passively listening to lectures, the students were asked to carry out a student project within the week to get hands-on experience with Linked Data. They could choose, according to their interests, from several topics presented by AKSW members acting as mentors, and were given the next two days to work on them. On Saturday, all groups presented their work, and the projects showed that exciting results can be achieved with Linked Data even in a short period of time: each group managed to implement their ideas and show a demo. Based on a poll among all attendees, we awarded a prize to the best group, consisting of Gareev Rinat, Michael Meder, Ivo Lasek and Robert Yao, who worked on the “Entity Disambiguation” project supervised by Axel Ngonga.

With all the hard work, there was also time for play, including a welcome reception at an authentic German restaurant, Thüringer Hof, a city tour around Leipzig’s cultural center and an excursion to Fockeberg, a small hill in Leipzig, accompanied by a barbecue and drinks. Check out the photographs here!



Finally … Assisted Link Discovery

[Screenshot: COLANUT configuration windows]

Hello world,

We are happy to announce that LIMES has been extended with an interface that makes linking easier than ever before. The COLANUT (Complex Linking in a NUTshell) interface implements time-efficient schema matching algorithms that allow LIMES to discover and suggest initial class and property matchings for linking. All of this is embedded in an easy-to-use GUI that lets you create link specifications, download them as XML files or simply run them online. Check COLANUT out at http://limes.aksw.org/colanut. A technical description can be found here.
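
To give a rough idea of what such a suggestion step computes, the following sketch ranks candidate property matchings between two schemas by a simple name similarity; it is a generic illustration under our own assumptions, not COLANUT's actual matching algorithm, and the schema fragments are hypothetical.

# Generic illustration of suggesting initial property matchings by name
# similarity; not COLANUT's actual (more sophisticated) matching algorithms.
from difflib import SequenceMatcher
from itertools import product

def suggest_matchings(source_props, target_props, threshold=0.5):
    # return (source, target, score) suggestions above the threshold,
    # ranked by a simple string similarity of the property names
    candidates = []
    for s, t in product(source_props, target_props):
        score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
        if score >= threshold:
            candidates.append((s, t, round(score, 2)))
    return sorted(candidates, key=lambda c: c[2], reverse=True)

# hypothetical schema fragments of a source and a target knowledge base
source = ["rdfs:label", "dbo:populationTotal", "dbo:country"]
target = ["skos:prefLabel", "gn:population", "gn:parentCountry"]

for s, t, score in suggest_matchings(source, target):
    print(s, "->", t, score)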

And before I forget, two papers centered around linking with LIMES were accepted at OM2011, the ontology matching workshop at ISWC. Come around and get all the details on the exciting developments around LIMES and link discovery in general.

Link on,
Axel


Assisted Linked Data Consumption

Yet another project for improving access to Linked Data! It is our pleasure to announce the first version of the Assisted Linked Data Consumption Engine (ALOE). The aim of the ALOE project is to assist users in consuming data from and fusing data across Linked Data sources. ALOE achieves this goal by discovering class and property mappings across endpoints even when no schema information is available. Moreover, ALOE provides several functions for transforming data from the source knowledge base into a format that corresponds to that of the target knowledge base. Thereby, ALOE enables both lay and experienced users to consume Linked Data with great ease. More information on ALOE can be found at

http://aksw.org/projects/aloe
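
As a rough illustration of the transformation step described above (our own sketch with hypothetical property names, not ALOE's API), the snippet below maps a source property to a target property and rewrites its values into the format expected by the target knowledge base:

# Illustration only (hypothetical namespaces, not ALOE's API): map a source
# property to a target property and transform its values on the way.
from rdflib import Graph, Literal, Namespace

SRC = Namespace("http://example.org/source/")  # hypothetical source schema
TGT = Namespace("http://example.org/target/")  # hypothetical target schema

def transform_name(value):
    # rewrite "Lastname, Firstname" into "Firstname Lastname"
    if "," in value:
        last, first = [part.strip() for part in value.split(",", 1)]
        return "%s %s" % (first, last)
    return value

source = Graph()
source.add((SRC.person1, SRC.name, Literal("Doe, Jane")))

target = Graph()
for s, _, o in source.triples((None, SRC.name, None)):
    # property mapping SRC.name -> TGT.fullName plus value transformation
    target.add((s, TGT.fullName, Literal(transform_name(str(o)))))

print(target.serialize(format="n3"))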

Cheers,
Axel


Semantics and Media

We partook in the organization of the Semantics and Media Workshop at the University of Mainz, where Axel is a Research Fellow. The aim of the workshop was to bring together practitioners and researchers working on media and semantics and to discuss the application of semantic technologies to media representation, retrieval and convergence. The workshop took place from the 14th to the 15th and attracted four dozen practitioners and researchers from areas such as publishing, the Semantic Web, information retrieval and classification as well as the digital humanities. Results from our research projects LOD2 and SCMS were received with great interest by the participants. The presentations (incl. Sören’s) will be available here soon. Furthermore, the talks will be published as soon as possible.

Stay tuned,
Axel


RDFaCE: Put a Smile on the Face of Semantic Content Authoring

We are happy to announce the beta release of RDFaCE (RDFa Content Editor). RDFaCE is an online text editor based on TinyMCE. It supports authoring of RDFa content.

In addition to the two classical views for text authoring (WYSIWYG and an HTML source code view), RDFaCE supports two novel views for semantic content authoring, namely WYSIWYM (What You See Is What You Mean) and a triple view (a.k.a. fact view).

The WYSIWYM view displays semantic annotations on top of the classical WYSIWYG view, which is widely used for Web content creation. It uses dynamic CSS stylesheets to distinguish semantic content from normal content.

The triple view is another semantic view, which shows only the facts (i.e. triples) stated in the text. RDFaCE keeps these four views synchronized, so that changes in one view cause the respective changes in the other views.

Another important RDFaCE feature is the combination of results from multiple NLP APIs to facilitate the semantic authoring process with automatic annotations. This feature provides an initial set of annotations that users can modify and extend later on.
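
As a sketch of how such a combination can work in principle (our own illustration with hypothetical API responses, not RDFaCE's actual implementation), annotations returned by several NLP APIs can be merged by simple agreement counting:

# Illustration only (hypothetical API responses, not RDFaCE's implementation):
# merge entity annotations from several NLP APIs by counting agreement.
from collections import Counter

# each annotation is (start offset, end offset, linked resource URI)
api_results = {
    "api_a": [(0, 3, "http://dbpedia.org/resource/World_Wide_Web_Consortium")],
    "api_b": [(0, 3, "http://dbpedia.org/resource/World_Wide_Web_Consortium"),
              (10, 26, "http://dbpedia.org/resource/Tim_Berners-Lee")],
    "api_c": [(0, 3, "http://dbpedia.org/resource/W3C")],
}

def combine(results, min_votes=2):
    # keep only annotations proposed by at least min_votes APIs
    votes = Counter(ann for anns in results.values() for ann in anns)
    return [ann for ann, count in votes.items() if count >= min_votes]

for start, end, uri in combine(api_results):
    print("accepted annotation:", start, end, uri)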

A demo version of RDFaCE is available at http://rdface.aksw.org. To see a short screencast of RDFaCE features, visit here. For more information visit the RDFaCE Project Page.


LIMES 0.5RC1

We could not resist the pleasure of making the demo of the new LIMES release candidate (0.5RC1) available to all. LIMES 0.5 comes fitted with a new grammar for complex metric specifications and completely new algorithms. The new version of our framework scales even better than the previous ones and is several orders of magnitude faster than other link discovery frameworks. We are currently cleaning up the code and adding some more features here and there. Stay tuned for the upcoming release. More information on the project and a demo can be accessed at http://limes.sf.net.
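
To illustrate what a complex metric specification combines (a generic sketch under our own assumptions; the actual LIMES metric grammar and its time-efficient algorithms are different), the snippet below links two records only if both a string similarity on the labels and a numeric proximity on the years pass their thresholds:

# Generic sketch of a complex (boolean) combination of atomic similarity
# measures; not the actual LIMES metric grammar or algorithms.
from difflib import SequenceMatcher

def label_sim(a, b):
    # cheap stand-in for a string measure such as trigram similarity
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def year_sim(y1, y2, scale=10.0):
    # map the absolute year difference into [0, 1]
    return max(0.0, 1.0 - abs(y1 - y2) / scale)

def complex_metric(rec1, rec2):
    # AND-combination: both atomic similarities must pass their thresholds
    return (label_sim(rec1["label"], rec2["label"]) >= 0.8
            and year_sim(rec1["year"], rec2["year"]) >= 0.7)

source = {"label": "Gottfried Wilhelm Leibniz", "year": 1646}
target = {"label": "Gottfried W. Leibniz", "year": 1646}
print("link" if complex_metric(source, target) else "no link")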

Link on!
Axel


FOX Version 0.1

We are thrilled to announce the first version of the Federated knOwledge eXtraction (FOX) framework. FOX integrates and merges the results of frameworks for Named Entity Recognition, Keyword/Keyphrase Extraction and Relation Extraction by using machine learning techniques. By these means, FOX can generate RDF out of natural language with improved accuracy. FOX has been shown to be up to 15% more accurate than other frameworks, including commercial software. More information at http://fox.aksw.org.
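
As a rough sketch of the merging idea (our own illustration with hypothetical tools and weights, not FOX's actual machine-learning approach), the output of several NER tools can be combined with per-tool weights, for example estimated on a labelled validation set:

# Illustration only (hypothetical tools and weights, not FOX's approach):
# merge NER results with per-tool weights, e.g. learned from held-out data.
from collections import defaultdict

weights = {"tool_a": 0.9, "tool_b": 0.6, "tool_c": 0.75}

# each annotation is (begin offset, end offset, entity type)
outputs = {
    "tool_a": [(0, 7, "Place")],
    "tool_b": [(0, 7, "Place"), (12, 16, "Organisation")],
    "tool_c": [(12, 16, "Person")],
}

def merge(outputs, weights, threshold=1.0):
    # accept an annotation if the summed weight of the tools proposing it
    # reaches the threshold
    scores = defaultdict(float)
    for tool, annotations in outputs.items():
        for annotation in annotations:
            scores[annotation] += weights[tool]
    return [ann for ann, score in scores.items() if score >= threshold]

print(merge(outputs, weights))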

Stay tuned for more releases,
Axel


Deutsche Biographie becomes part of the LOD cloud

During a workshop on June 27th at the Historical College in Munich, the AKSW group and the New German Biography group at the Bavarian Academy of Sciences and Humanities presented the results of the LOD2-supported PUBLINK project. Within this project, metadata about 46,000 biographies, 42,000 people and 12,000 locations were made available as Linked Data and RDF. We used the GND vocabulary of the German National Library and enriched it with a few additional classes and relations. In addition to the representation in RDF, we offer an OntoWiki instance for browsing and querying the dataset. With the help of the RelationshipFinder tool, relations between two or more persons can be visualized using this SPARQL endpoint.
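
For readers who want to explore the data, the sketch below queries the endpoint with Python's SPARQLWrapper. The /sparql path and the GND class and property names are assumptions on our part and may differ from the actual setup; adjust them after browsing the dataset in OntoWiki.

# Sketch for exploring the endpoint with SPARQLWrapper; the /sparql path and
# the GND class/property names below are assumptions and may need adjusting.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://ndb.publink.lod2.eu/sparql")  # assumed path
endpoint.setQuery("""
    PREFIX gnd: <http://d-nb.info/standards/elementset/gnd#>
    SELECT ?person ?name WHERE {
        ?person a gnd:DifferentiatedPerson ;
                gnd:preferredNameForThePerson ?name .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], "-", row["name"]["value"])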

The 22 workshop participants from AKSW, BAdW, BSB, BMLO, Deutsches Museum, FAU Erlangen, GNM, ISGV and LMU discussed in particular the question of how to increase the interlinking of historical information on the Semantic Web. The most crucial aspects in this respect related to the representation of historical locations and to a more generalized vocabulary for historical information.

More information is available at the following websites:

Linked data endpoint / project page: http://data.deutsche-biographie.de

German Biography:  http://www.deutsche-biographie.de

SPARQL endpoint and OntoWiki instance: http://ndb.publink.lod2.eu

LOD2 project page: http://lod2.eu
