
Archive for the category 'Announcements'

AKSW Colloquium with NIF Release Preparation on Monday, February 10

February 7, 2014 - 2:23 pm by KonradHoeffner

NIF Release Preparation

On Monday, February 10, at 1.30 pm in room P702 (Paulinum of the University of Leipzig main building at the Augustusplatz), Sebastian Hellmann will present the Natural Language Processing (NLP) Interchange Format (NIF) which is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. During the meeting we will jointly look at the existing tools and infrastructure, collect issues and discuss potential fixes. Bringing a laptop is recommended.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Abstract

We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time-consuming task. Also, once a particular set of tools is integrated, this integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools performing similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In this session, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. In contrast to more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We present several use cases of the second version of the NIF specification (NIF 2.0) and the result of a developer study.
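
To make the offset-based URI scheme and the ontology more concrete, here is a minimal sketch in Python (rdflib) that annotates one substring of a document in the spirit of NIF 2.0. The document URI is hypothetical, and the exact class and property names of the NIF Core ontology may differ slightly from the current specification.

```python
# Minimal NIF 2.0-style annotation sketch with rdflib: the string "Leipzig"
# inside a context document is identified by an offset-based URI and linked
# to DBpedia. Namespace and property names follow the NIF Core ontology;
# exact URIs may differ between NIF versions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

text = "AKSW is located in Leipzig."
base = "http://example.org/doc1#char="   # hypothetical document URI

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)

# The whole document is the nif:Context carrying the reference text.
context = URIRef(base + f"0,{len(text)}")
g.add((context, RDF.type, NIF.Context))
g.add((context, NIF.isString, Literal(text)))

# One annotated substring: "Leipzig" at character offsets 19-26.
begin = text.index("Leipzig")
end = begin + len("Leipzig")
mention = URIRef(base + f"{begin},{end}")
g.add((mention, RDF.type, NIF.String))
g.add((mention, NIF.referenceContext, context))
g.add((mention, NIF.anchorOf, Literal("Leipzig")))
g.add((mention, NIF.beginIndex, Literal(begin, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.endIndex, Literal(end, datatype=XSD.nonNegativeInteger)))
g.add((mention, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Leipzig")))

print(g.serialize(format="turtle"))
```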


Webinar recording on crowdsourced OpenCourseWare authoring with SlideWiki available

January 20, 2014 - 12:53 am by Sören Auer

On Jan 21, 2014 at 15:00 CET we hosted a webinar on crowdsourced, multilingual OpenCourseWare authoring with http://SlideWiki.org:

https://plus.google.com/events/c2sc6p89v67g91uhrn7ofeu6j1c

SlideWiki.org is a platform for OpenCourseWare authoring and publishing. Just as Wikipedia allows the collaborative authoring of encyclopedic texts, GitHub of source code, and OpenStreetMap of maps, SlideWiki enables communities to create comprehensive open educational resources. SlideWiki is open-source software and all content in SlideWiki is Open Knowledge. In this hangout we introduce SlideWiki’s rich feature set and explain how SlideWiki can be used for educational projects and teaching.

Preview release of conTEXT for Linked-Data based text analytics

January 17, 2014 - 3:04 pm by AliKhalili

We are happy to announce the preview release of conTEXT — a platform for lightweight text analytics using Linked Data.
conTEXT enables social Web users to semantically analyze text corpora (such as blogs, RSS/Atom feeds, Facebook, G+, Twitter or SlideWiki.org decks) and provides novel ways for browsing and visualizing the results.

conTEXT workflow

The process of text analytics in conTEXT starts by collecting information from the web. conTEXT utilizes standard information access methods and protocols such as RSS/ATOM feeds, SPARQL endpoints and REST APIs as well as customized crawlers for WordPress and Blogger to build a corpus of information relevant for a certain user.

The assembled text corpus is then processed by Natural Language Processing (NLP) services (currently FOX and DBpedia-Spotlight) which link unstructured information sources to the Linked Open Data cloud through DBpedia. The processed corpus is then further enriched by de-referencing the DBpedia URIs as well as matching with pre-defined natural-language patterns for DBpedia predicates (BOA patterns). The processed data can also be joined with other existing corpora in a text analytics mashup. The creation of analytics mashups requires dealing with the heterogeneity of different corpora as well as the heterogeneity of different NLP services utilized for annotation. conTEXT employs NIF (NLP Interchange Format) to deal with this heterogeneity.

The processed, enriched and possibly mixed results are presented to users using different views for exploration and visualization of the data. Additionally, conTEXT provides an annotation refinement user interface based on the RDFa Content Editor (RDFaCE) to enable users to revise the annotated results. User-refined annotations are sent back to the NLP services as feedback for the purpose of learning in the system.
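
As an illustration of the first two steps (corpus collection and NLP annotation), the following Python sketch sends a snippet of text to the public DBpedia Spotlight REST service and extracts the DBpedia URIs it returns. The endpoint URL, parameters and response fields are assumptions about the Spotlight API and may differ between Spotlight versions; FOX and the subsequent enrichment steps are omitted for brevity.

```python
# Sketch of the conTEXT annotation step: link mentions in a piece of text to
# DBpedia URIs via the DBpedia Spotlight REST API (endpoint and response
# format assumed, may vary between Spotlight deployments).
import requests

SPOTLIGHT_ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"  # assumed public instance

def annotate(text, confidence=0.4):
    """Return a list of (surface form, offset, DBpedia URI) tuples for `text`."""
    response = requests.get(
        SPOTLIGHT_ENDPOINT,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    resources = response.json().get("Resources", [])
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"]) for r in resources]

if __name__ == "__main__":
    post = "AKSW is a research group at the University of Leipzig working on DBpedia."
    for surface_form, offset, uri in annotate(post):
        print(f"{offset:4d}  {surface_form:25s}  {uri}")
```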

For more information on conTEXT visit:

AKSW Colloquium (Mon, 16.12.2013) about the SINA question answering system (as presented at IBM Watson)

December 13, 2013 - 4:09 pm by KonradHoeffner

Last week Saeedeh Shekarpour was invited to present her work at the IBM research center (Watson project, DeepQA) in New York. On Monday, December 16 at 1.30 pm in Room P-702 (Paulinum), Saeedeh Shekarpour will present SINA, a question answering system, which transforms user-supplied queries in natural language into conjunctive SPARQL queries over a set of interlinked data sources.

As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

For further reading, please refer to the slides and the publication Question Answering on Interlinked Data (BibTeX).

The SINA Question Answering System

The architectural choices underlying Linked Data have led to a compendium of data sources which contain both duplicated and fragmented information on a large number of domains. One way to enable non-expert users to access this data compendium is to provide keyword search frameworks that can capitalize on the inherent characteristics of Linked Data. The contributions of this work are as follows:

  1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation). It employs a hidden Markov model whose parameters were bootstrapped with different distribution functions (a toy sketch of this idea follows after this list).
  2. A novel method for constructing federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach essentially relies on a combination of domain and range inference as well as a link traversal method for constructing a connected graph, which ultimately renders a corresponding SPARQL query.
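
The following toy sketch in Python illustrates the disambiguation idea from item 1: each keyword of the query has several candidate resources, and a Viterbi search over a small hidden Markov model selects the most coherent sequence. All candidate URIs, emission and transition scores are invented for illustration and are not SINA's bootstrapped parameters.

```python
# Toy HMM-based resource disambiguation for the keyword query "films spielberg".
# Candidates, emission and transition scores are illustrative placeholders.
candidates = {
    "films":     ["dbo:Film", "dbo:filmRuntime"],
    "spielberg": ["dbr:Steven_Spielberg", "dbr:Spielberg_(film)"],
}
keywords = ["films", "spielberg"]

# Emission: how well a candidate matches its keyword (e.g. string similarity).
emission = {
    ("films", "dbo:Film"): 0.9, ("films", "dbo:filmRuntime"): 0.4,
    ("spielberg", "dbr:Steven_Spielberg"): 0.8,
    ("spielberg", "dbr:Spielberg_(film)"): 0.6,
}
# Transition: how strongly two candidates are connected in the knowledge graph.
transition = {
    ("dbo:Film", "dbr:Steven_Spielberg"): 0.7,
    ("dbo:Film", "dbr:Spielberg_(film)"): 0.2,
    ("dbo:filmRuntime", "dbr:Steven_Spielberg"): 0.1,
    ("dbo:filmRuntime", "dbr:Spielberg_(film)"): 0.1,
}

def viterbi():
    """Return the most probable candidate sequence for the keyword list."""
    first = keywords[0]
    paths = {c: (emission[(first, c)], [c]) for c in candidates[first]}
    for kw in keywords[1:]:
        new_paths = {}
        for cand in candidates[kw]:
            prob, path = max(
                (p * transition[(prev, cand)] * emission[(kw, cand)], path)
                for prev, (p, path) in paths.items()
            )
            new_paths[cand] = (prob, path + [cand])
        paths = new_paths
    return max(paths.values())

print(viterbi())  # -> (0.504, ['dbo:Film', 'dbr:Steven_Spielberg'])
```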

AKSW Colloquium with Mohamed Morsey's PhD defense practice talk on Monday, December 2

November 28, 2013 - 1:16 pm by KonradHoeffner

On Monday, December 2 at 1.30 pm in Room P-702 (Paulinum), Mohamed Morsey will give a final rehearsal for his PhD defense “Efficient Extraction and Query Benchmarking of Wikipedia Data”. Guests are encouraged to both provide feedback about improvements to the talk and ask preparatory questions.

As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Efficient Extraction and Query Benchmarking of Wikipedia Data

The thesis consists of two major parts:

  1. Semantic Data Extraction: the objective of this part is to extract data from a semi-structured source, i.e. Wikipedia, and transform it into a networked knowledge base, i.e. DBpedia. Furthermore, this knowledge base has to be kept up to date so that it always stays in synchronization with Wikipedia.
  2. Triplestore Performance Evaluation: semantic data is normally stored in a triplestore, e.g. Virtuoso, in order to enable efficient querying of that data. In this part we have developed a new benchmark for evaluating and contrasting the performance of various triplestores (a minimal sketch of the idea follows after this list).
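
To illustrate the second part, here is a minimal Python sketch of the benchmarking idea: a set of SPARQL queries is executed repeatedly against a triplestore endpoint and their execution times are recorded. The endpoint and the queries are placeholders, not those of the benchmark developed in the thesis.

```python
# Minimal sketch of query benchmarking: run a mix of SPARQL queries against a
# triplestore endpoint and record execution times. Endpoint and queries are
# placeholders for illustration only.
import time
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://dbpedia.org/sparql"   # e.g. a Virtuoso instance hosting DBpedia

QUERIES = {
    "cities": "SELECT ?c WHERE { ?c a <http://dbpedia.org/ontology/City> } LIMIT 100",
    "mayors": "SELECT ?c ?m WHERE { ?c <http://dbpedia.org/ontology/mayor> ?m } LIMIT 100",
}

def run_benchmark(runs=3):
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    for name, query in QUERIES.items():
        times = []
        for _ in range(runs):
            sparql.setQuery(query)
            start = time.perf_counter()
            sparql.query().convert()            # execute the query and parse the result
            times.append(time.perf_counter() - start)
        print(f"{name}: mean {sum(times) / len(times):.3f}s over {runs} runs")

if __name__ == "__main__":
    run_benchmark()
```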