AKSW Colloquium, 18-05-2015, Multilingual Morpheme Ontology, Personalised Access and Enrichment of Linked Data Resources

MMoOn – A Multilingual Morpheme Ontology by Bettina Klimek

In recent years, lexical resources have emerged rapidly in the Semantic Web. While most of this linguistic information is already machine-readable, we found that morphological information is either absent or only contained in semi-structured strings. A plethora of linguistic resources for the lexical domain already exist and are highly reused, but there is still a great gap for equivalent morphological datasets and ontologies. In order to enable capturing the semantics of expressions beneath the word level, I will present a Multilingual Morpheme Ontology called MMoOn. It is designed for the creation of machine-processable and interoperable morpheme inventories of a given natural language. As such, any MMoOn dataset will contain not only semantic information on whole words and word-forms but also information on the meaningful parts of which they consist, including inflectional and derivational affixes, stems and bases, as well as a wide range of their underlying meanings.
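
To give a feel for what such a machine-processable morpheme inventory could look like as Linked Data, here is a minimal rdflib sketch that decomposes a single word into its meaningful parts. The class and property names (mm:Word, mm:Stem, mm:consistsOf, …) are illustrative placeholders, not the actual MMoOn vocabulary:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

MM = Namespace("http://example.org/mmoon#")          # placeholder vocabulary
EX = Namespace("http://example.org/inventory/eng#")  # example inventory

g = Graph()
g.bind("mm", MM)
g.bind("ex", EX)

# The word "unhappiness" and the meaningful parts it consists of.
g.add((EX.unhappiness, RDF.type, MM.Word))
g.add((EX.unhappiness, RDFS.label, Literal("unhappiness", lang="en")))

for segment, cls, meaning in [
    (EX.un,    MM.DerivationalAffix, "negation"),
    (EX.happy, MM.Stem,              "feeling pleasure"),
    (EX.ness,  MM.DerivationalAffix, "state or quality"),
]:
    g.add((segment, RDF.type, cls))
    g.add((EX.unhappiness, MM.consistsOf, segment))
    g.add((segment, MM.hasMeaning, Literal(meaning, lang="en")))

print(g.serialize(format="turtle"))
```

Because each affix carries its own meaning as data, queries can reach beneath the word level, e.g. for all words that contain a negation affix.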

Personalised Access and Enrichment of Linked Data Resources by Milan Dojchinovski

Recent efforts in the Semantic Web community have primarily focused on developing technical infrastructure and methods for efficient Linked Data acquisition, interlinking and publishing. Nevertheless, actually accessing a piece of information in the LOD cloud still demands a significant amount of effort. In recent years, we have conducted two lines of research to address this problem. The first line of research aims at developing graph-based methods for “personalised access to Linked Data”. A key contribution of this research is the “Linked Web APIs” dataset, the largest Web services dataset with over 11K service descriptions, which has been used as a validation dataset. The second line of research aims at the enrichment of Linked Data text resources and the development of “entity recognition and linking” methods. In the talk, I will present the developed methods, the results from their evaluation on different datasets and evaluation challenges, and the lessons learned in these activities. I will discuss the adaptability and performance of the developed methods and present future directions.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.


AKSW Colloquium, 11-05-2015, DBpedia distributed extraction framework

Scaling up the DBpedia extraction framework by Nilesh Chakraborty

The DBpedia extraction framework extracts different kinds of structured information from Wikipedia to generate various datasets. Performing a full extraction of the Wikipedia dumps of all languages (or even just the mapping-based languages) takes a significant amount of time. The distributed extraction framework runs the extraction on top of Apache Spark, so that users can leverage multi-core machines or a distributed cluster of commodity machines to perform faster extractions. For example, extracting the 30-40 mapping-based languages on a machine with a quad-core CPU and 16 GB of RAM takes about 36 hours. Running the distributed framework in the same setting with three such worker nodes takes around 10 hours, and it is easy to achieve faster running times by adding more cores or more machines. Apart from the Spark-based extraction framework, we have also implemented a distributed wiki-dump downloader to download Wikipedia dumps for multiple languages, from multiple mirrors, on a cluster in parallel. This is still a work in progress, and in this talk I will discuss the methods and challenges involved in this project, as well as our immediate goals and timeline.
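
The announcement contains no code, but the division of labour is easy to sketch in PySpark: the dump files are partitioned across the cluster, and an extractor function is applied to the partitions in parallel. Everything below (the paths and the extract_page stub) is a hypothetical stand-in for the framework's actual extractors:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dbpedia-extraction-sketch").getOrCreate()
sc = spark.sparkContext

def extract_page(line):
    """Hypothetical stand-in for the framework's extractors, which turn
    wiki markup into RDF triples. A real implementation needs a page-aware
    input format, since textFile() splits the dump by line, not by page."""
    return []  # e.g. parse an infobox and yield (subject, predicate, object)

# One dump per language; Spark partitions each file across the cluster,
# so adding worker nodes shortens the wall-clock extraction time.
dumps = ["/data/enwiki-pages-articles.xml", "/data/dewiki-pages-articles.xml"]
triples = sc.union([sc.textFile(path) for path in dumps]).flatMap(extract_page)
triples.saveAsTextFile("/data/extracted-triples")

spark.stop()
```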


Invited talk @AIMS webinar series

On the 5th of May, Ivan Ermilov presented the CKAN data catalog on behalf of AKSW as part of the AIMS (Agricultural Information Management Standards) webinar series. The recording and the slides of the webinar “CKAN as an open-source data management solution for open data” are available on the AIMS web portal: http://aims.fao.org/capacity-development/webinars/ckan-open-source-data-management-solution-open-data
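
For those who have not worked with CKAN before: every CKAN catalog exposes an HTTP Action API, so harvesting dataset metadata takes only a few lines. The sketch below queries CKAN's public demo instance; the endpoint and query are examples only:

```python
import requests

# Search a CKAN catalog via its Action API; demo.ckan.org is used here
# only as an example endpoint.
resp = requests.get(
    "https://demo.ckan.org/api/3/action/package_search",
    params={"q": "agriculture", "rows": 5},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()["result"]

print("matching datasets:", result["count"])
for dataset in result["results"]:
    print("-", dataset["name"], "|", dataset.get("title", ""))
```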

AIMS organizes free webinars, open to everyone, on various topics. You can find more recordings and material on the AIMS webpage, YouTube channel and Slideshare:

Main page of Webinars@AIMS: http://aims.fao.org/capacity-development/webinars

YouTube: http://www.youtube.com/user/FAOAIMSVideos

Slideshare: http://www.slideshare.net/faoaims/ckan-as-an-opensource-data-management-solution-for-open-data


AKSW Colloquium, 04-05-2015, Automating RDF Dataset Transformation and Enrichment, Structured Machine Learning in Life Science

Automating RDF Dataset Transformation and Enrichment by Mohamed Sherif

With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim to support this endeavour usually focus on supporting the manual definition of enrichment pipelines. In this talk, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how exemplary descriptions of enriched resources can be used to generate accurate enrichment pipelines. We evaluate our approach against eight manually defined enrichment pipelines and show that it can learn accurate pipelines even when provided with a small number of training examples.
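
The abstract leaves the operator itself to the talk, but the general shape of learning a pipeline from examples can be conveyed with a toy: starting from the empty pipeline, greedily append whichever operation best reproduces the training pairs. This is a deliberately simplified sketch of the idea, not the authors' refinement operator:

```python
def apply_pipeline(pipeline, resource):
    for op in pipeline:
        resource = op(resource)
    return resource

def score(pipeline, examples):
    """Fraction of (source, target) training pairs the pipeline reproduces."""
    return sum(apply_pipeline(pipeline, src) == tgt
               for src, tgt in examples) / len(examples)

def learn_pipeline(operators, examples, max_len=5):
    """Greedy search: keep appending the operator that most improves the
    score on the training examples; stop when nothing improves."""
    pipeline, best = [], score([], examples)
    for _ in range(max_len):
        cand_score, cand_op = max(
            ((score(pipeline + [op], examples), op) for op in operators),
            key=lambda c: c[0])
        if cand_score <= best:
            break
        pipeline.append(cand_op)
        best = cand_score
    return pipeline, best

# Learn to normalise label strings from two training pairs.
pipeline, accuracy = learn_pipeline(
    [str.strip, str.lower],
    [("Berlin ", "berlin"), ("LEIPZIG", "leipzig")])
print([op.__name__ for op in pipeline], accuracy)  # ['lower', 'strip'] 1.0
```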

Structured Machine Learning in Life Science (PhD progress report) by Patrick Westphal

The utilization of machine learning techniques to solve life science tasks has become widespread in recent years. Since these techniques mainly work on unstructured data, one question is whether they could benefit from the provision of structured background knowledge. One prevalent way to express background knowledge in the life sciences is the Web Ontology Language (OWL). Accordingly, there is a great variety of domain ontologies covering anatomy, genetics, biological processes or chemistry that can be used to form structured machine learning approaches in the life science domain. The talk will give a brief overview of the tasks and problems of structured machine learning in life science. Besides the special characteristics observed when applying state-of-the-art concept learning approaches to life science tasks, a short description of the actual differences to concept learning setups in other domains is given. Further, some directions for machine-learning-based techniques that could support concept learning in life science tasks are shown.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.


AKSW Colloquium, 27-04-2015, Ontotext’s RDF database-as-a-service (DBaaS) via Self-Service Semantic Suite (S4) platform & Knowledge-Based Trust

This colloquium features two talks. First, the Self-Service Semantic Suite (S4) platform is presented by Marin Dimitrov (Ontotext), followed by Jörg Unbehauen’s report on Google’s effort to use factual correctness as a ranking factor.

RDF database-as-a-service (DBaaS) via Self-Service Semantic Suite (S4) platform

In this talk Marin Dimitrov (Ontotext) will introduce the RDF database-as-a-service (DBaaS) options for managing RDF data in the Cloud via the Self-Service Semantic Suite (S4) platform. With S4, developers and researchers can instantly get access to a fully managed RDF DBaaS, without the need for hardware provisioning, maintenance and operations. Additionally, the S4 platform provides on-demand access to text analytics services for news, social media and life sciences, as well as access to knowledge graphs (DBpedia, Freebase and GeoNames).

The goal of the S4 platform is to make it easy for developers and researchers to develop smart/semantic applications without the need to spend time and effort on infrastructure provisioning and maintenance. Marin will also provide examples of EC-funded research projects (DaPaaS, ProDataMarket and KConnect) that plan to utilise the S4 platform for semantic data management.
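
As a usage sketch, talking to such a hosted RDF database looks roughly like the snippet below with the SPARQLWrapper library. The endpoint URL and API-key header are hypothetical placeholders, not the actual S4 API:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint and credentials; the actual S4 API differs.
endpoint = SPARQLWrapper("https://rdf.example.org/s4/repositories/my-repo")
endpoint.addCustomHttpHeader("X-API-Key", "YOUR-API-KEY")
endpoint.setReturnFormat(JSON)

endpoint.setQuery("""
    SELECT ?city ?population WHERE {
        ?city a <http://dbpedia.org/ontology/City> ;
              <http://dbpedia.org/ontology/populationTotal> ?population .
    } LIMIT 5
""")

# Run the query and iterate over the JSON result bindings.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])
```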

More information on S4 will be available in [1], [2] and [3].

[1] Marin Dimitrov, Alex Simov and Yavor Petkov. On-demand Text Analytics and Metadata Management with S4. In: Proceedings of the Workshop on Emerging Software as a Service and Analytics (ESaaSA 2015) at the 5th International Conference on Cloud Computing and Services Science (CLOSER 2015), Lisbon, Portugal.

[2] Marin Dimitrov, Alex Simov and Yavor Petkov. Text Analytics and Linked Data Management As-a-Service with S4. In: Proceedings of the 3rd International Workshop on Semantic Web Enterprise Adoption and Best Practice (WaSABi 2015) at the Extended Semantic Web Conference (ESWC 2015), May 31st 2015, Portoroz, Slovenia.

[3] Marin Dimitrov, Alex Simov and Yavor Petkov. Low-cost Open Data As-a-Service in the Cloud. In: Proceedings of the 2nd Semantic Web Enterprise Developers Workshop (SemDev 2015) at the Extended Semantic Web Conference (ESWC 2015), May 31st 2015, Portoroz, Slovenia.

Report on: “Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources”

by Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, Wei Zhang

Link to the paper

Presentation by Jörg Unbehauen

Abstract:

“The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy. The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model. We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.”
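
The paper performs joint inference in a multi-layer probabilistic model; as a far simpler illustration of the underlying fixed point (sources are trustworthy if their facts are believable, and facts are believable if trustworthy sources assert them), consider the toy iteration below. It is a sketch of that intuition only, not the paper's KBT algorithm:

```python
from collections import defaultdict

# Which (subject, predicate, object) facts each source asserts; siteC's
# claim about Paris is wrong, and the iteration should reflect that.
claims = {
    "siteA": {("Paris", "capitalOf", "France"), ("Rome", "capitalOf", "Italy")},
    "siteB": {("Paris", "capitalOf", "France"), ("Rome", "capitalOf", "Italy")},
    "siteC": {("Paris", "capitalOf", "Germany"), ("Rome", "capitalOf", "Italy")},
}

trust = {source: 0.5 for source in claims}  # uniform prior trustworthiness

for _ in range(10):
    # Belief in a value: trust-weighted vote, normalised against competing
    # values for the same (subject, predicate) slot.
    votes, totals = defaultdict(float), defaultdict(float)
    for source, facts in claims.items():
        for fact in facts:
            votes[fact] += trust[source]
            totals[fact[:2]] += trust[source]
    belief = {fact: v / totals[fact[:2]] for fact, v in votes.items()}

    # Trust of a source: average belief of the facts it asserts.
    trust = {source: sum(belief[f] for f in facts) / len(facts)
             for source, facts in claims.items()}

for source, t in sorted(trust.items()):
    print(f"{source}: trust {t:.2f}")  # siteC ends up least trustworthy
```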


Talk by Kleanthi Georgala

Last Friday, 17th April, Kleanthi Georgala visited AKSW and gave a talk entitled “Traces Through Time: Probabilistic Record Linkage – Medieval and Early Modern”. More information below.

This innovative, multi-disciplinary project will deliver practical analytical tools to support large-scale exploration of big historical datasets. The project aims to bring together international research experience in the digital humanities, natural language processing, information science, data mining and linked data with large, complex and diverse ‘big data’ spanning over 500 years of British history.

The project’s technical outputs will be a methodology and supporting toolkit that identify individuals within and across historical datasets, allowing people to be traced through the records and enabling their stories to emerge from the data. The tools will handle the ‘fuzzy’ nature of historical data, including aliases, incomplete information, spelling variations and the errors that are inevitably encountered in official records. The toolkit will be open and configurable, offering the flexibility to formulate and ask interesting questions of the data, exploring it in ways that were not imagined when the records were created. The open approach will create opportunities for further enhancement or re-use and offers the further potential to deliver the outputs as a service, extensible to new datasets as these become available. This brings the vision of finding and linking individuals in new combinations of datasets, from the widest range of historical sources.

The Traces Through Time project was a collaboration between The National Archives in the United Kingdom, the Institute of Historical Research, the University of Brighton and the University of Leiden. This presentation describes the probabilistic record linkage system that was developed for this task by the University of Leiden team and introduces some insightful examples of matches that our system was able to find in the Medieval and Early Modern data, as well as a number of experiments on artificial data to test the workings of the system.
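
To make “probabilistic record linkage” concrete, here is a minimal Fellegi-Sunter-style scorer: each field comparison contributes a log-likelihood ratio, and the summed score separates probable matches from non-matches. The fields, probabilities and thresholds are invented for illustration and are unrelated to the actual Traces Through Time system:

```python
import math
from difflib import SequenceMatcher

# Per-field (m, u) probabilities in the spirit of the Fellegi-Sunter model:
# m = P(field agrees | same person), u = P(field agrees | different people).
WEIGHTS = {
    "name":  (0.9, 0.01),
    "place": (0.8, 0.10),
    "year":  (0.7, 0.05),
}

def field_agrees(a, b, field):
    if field == "year":
        return abs(int(a) - int(b)) <= 2  # tolerate imprecise historical dates
    # fuzzy string comparison absorbs spelling variation and aliases
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > 0.85

def match_score(rec1, rec2):
    """Sum of log-likelihood ratios: agreement adds evidence for a match,
    disagreement subtracts it."""
    score = 0.0
    for field, (m, u) in WEIGHTS.items():
        if field_agrees(rec1[field], rec2[field], field):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score

a = {"name": "Thomas Cromwell",  "place": "Putney", "year": "1485"}
b = {"name": "Thomas Cromwelle", "place": "Putney", "year": "1486"}
print(f"{match_score(a, b):.1f} bits")  # strongly positive: probable match
```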



SAKE Projekt website goes live

Hi all!

The project website for the BMWi-funded Smart Data Web project “SAKE” is now online at www.sake-projekt.de. It already mentions the first SAKE-related publication by Saleem@AKSW and introduces our partners as well as the industry use cases we are going to tackle. Check it out and spread the word. And don’t forget to follow @SAKE_Projekt on Twitter.

Simon on behalf of the SAKE team


AKSW Colloquium, 20-04-2015, OWL/DL approaches to improve POS tagging

In this colloquium Markus Ackermann will touch on the ‘linguistic gap’ of recent POS tagging endeavours (as perceived by C. Manning, [1]). Building on observations in that paper, potential paths towards more linguistically informed POS tagging are explored:

An alternative to the most widely employed ground truth for the development and evaluation of POS tagging systems for English will be presented ([2]), and the benefits of a DL-based representation of POS tags for a multi-tool tagging approach will be demonstrated ([3]).

Finally, the presenter will give an overview of work in progress that aims to combine the OWL/DL representation of POS tags with a suitable symbolic machine learning tool (DL-Learner, [4]) to improve the performance of a state-of-the-art statistical POS tagger using human-interpretable post-correction rules formulated as OWL/DL expressions.
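
To illustrate what a DL-based representation of POS tags buys in a multi-tool setting (in the spirit of [3]): if tags are OWL classes, taggers of different granularity can annotate the same token, and the most specific tag falls out of the class hierarchy. Below is a small owlready2 sketch with an invented tagset; real tagset ontologies are far richer:

```python
from owlready2 import get_ontology, Thing

onto = get_ontology("http://example.org/pos-sketch.owl")

with onto:
    # A tiny tag hierarchy, modelled as OWL classes.
    class Tag(Thing): pass
    class Noun(Tag): pass
    class CommonNoun(Noun): pass
    class ProperNoun(Noun): pass
    class Verb(Tag): pass

# Two taggers annotate the same token at different granularities:
# tagger A only says "Noun", tagger B says "ProperNoun".
token = onto.Noun("token_17")
token.is_a.append(onto.ProperNoun)

# The DL view lets us ask for the most specific classes the token belongs to.
specific = [c for c in token.is_a
            if not any(other is not c and issubclass(other, c)
                       for other in token.is_a)]
print([c.name for c in specific])  # -> ['ProperNoun']
```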

[1] Christopher D. Manning. 2011. Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? In Alexander Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, 12th International Conference, CICLing 2011, Proceedings, Part I. Lecture Notes in Computer Science 6608, pp. 171–189.

[2] G.R. Sampson. 1995. English for the Computer: The SUSANNE Corpus and Analytic Scheme. Clarendon Press (Oxford University Press).

[3] Christian Chiarcos. 2010. Towards Robust Multi-Tool Tagging: An OWL/DL-Based Approach. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL2010.

[4] Jens Lehmann. 2009. DL-Learner: Learning Concepts in Description Logics. In The Journal of Machine Learning Research, Volume 10, pp. 2639-2642.


AKSW Colloquium, 13-04-2015, Effective Caching Techniques for Accelerating Pattern Matching Queries

In this colloquium, Claus Stadler will present the paper Effective Caching Techniques for Accelerating Pattern Matching Queries by Arash Fard, Satya Manda, Lakshmish Ramaswamy, and John A. Miller.

Abstract: Using caching techniques to improve response time of queries is a proven approach in many contexts. However, it is not well explored for subgraph pattern matching queries, mainly because of subtleties enforced by traditional pattern matching models. Indeed, efficient caching can greatly impact the query answering performance for massive graphs in any query engine, whether it is centralized or distributed. This paper investigates the capabilities of the newly introduced pattern matching models in the graph simulation family for this purpose. We propose a novel caching technique, and show how the results of a query can be used to answer new similar queries according to the similarity measure that is introduced. Using large real-world graphs, we experimentally verify the efficiency of the proposed technique in answering subgraph pattern matching queries.
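
The interesting part of the paper is reusing cached results across *similar* queries under graph-simulation-style models. As a much weaker baseline for intuition, here is plain exact-match memoization of subgraph pattern queries with networkx:

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

cache = {}  # canonical pattern key -> list of matches

def pattern_key(pattern):
    # Crude canonical key: sorted, normalised edge list. Sufficient for
    # exact-repeat caching; the paper's similarity-based reuse goes further.
    return tuple(sorted(tuple(sorted(e)) for e in pattern.edges()))

def match_pattern(data, pattern):
    """Answer a subgraph pattern query, reusing cached results for
    patterns we have already answered."""
    key = pattern_key(pattern)
    if key not in cache:
        matcher = GraphMatcher(data, pattern)
        cache[key] = list(matcher.subgraph_isomorphisms_iter())
    return cache[key]

data = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
triangle = nx.Graph([(1, 2), (2, 3), (3, 1)])
print(len(match_pattern(data, triangle)))  # computed
print(len(match_pattern(data, triangle)))  # served from the cache
```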

Link to PDF


Special talk: Linked Data Quality Assessment and its Application to Societal Progress Measurement

Linked Data Quality Assessment and its Application to Societal Progress Measurement by Amrapali Zaveri

Abstract: In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration where both documents and data are linked. Many different communities on the Internet, such as geography, media, life sciences and government, have already adopted these LD principles. In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data gravely affects the end results, making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is a particular challenge in LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing quality crucial for measuring how accurately real-world data is represented. In this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. Next, three different methodologies for Linked Data quality assessment are evaluated, namely (i) user-driven, (ii) crowdsourcing-based and (iii) semi-automated use-case-driven assessment. Finally, we consider a domain-specific use case that consumes LD and depends on data quality, and we show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
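
As a taste of what such a metric can look like in practice, the snippet below computes a simple completeness-style metric (the share of subjects carrying an rdfs:label) over an rdflib graph. The metric choice is illustrative and not taken from the thesis:

```python
from rdflib import Graph, RDFS

g = Graph()
# Fetches RDF via content negotiation; any Linked Data URI (or a local
# RDF file) works here, network access permitting.
g.parse("http://dbpedia.org/resource/Leipzig")

subjects = set(g.subjects())
labelled = set(g.subjects(RDFS.label, None))

completeness = len(labelled) / len(subjects) if subjects else 0.0
print(f"{len(labelled)} of {len(subjects)} subjects "
      f"have an rdfs:label ({completeness:.1%})")
```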

Join us!

  • Thursday, 9 April at 2pm, Room P702
