MMoOn – A Multilingual Morpheme Ontology by Bettina Klimek
In recent years, a wealth of lexical resources has emerged in the Semantic Web. While most of this linguistic information is already machine-readable, morphological information is either absent or contained only in semi-structured strings. A plethora of linguistic resources for the lexical domain already exists and is highly reused, but equivalent morphological datasets and ontologies are still largely missing. In order to enable capturing the semantics of expressions below the word level, I will present a Multilingual Morpheme Ontology called MMoOn. It is designed for the creation of machine-processable and interoperable morpheme inventories of a given natural language. As such, any MMoOn dataset will contain not only semantic information about whole words and word-forms but also information on the meaningful parts of which they consist, including inflectional and derivational affixes, stems and bases, as well as a wide range of their underlying meanings.
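To make the idea of a machine-processable morpheme inventory more concrete, here is a minimal sketch in Python with rdflib that decomposes one word-form into illustrative morphs. The namespace URIs and the class and property names (Word, Stem, consistsOf, etc.) are placeholders and not the actual MMoOn vocabulary; the real ontology defines its own terms.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical namespaces -- the real MMoOn vocabulary URIs differ.
MMOON = Namespace("http://example.org/mmoon/")
INV = Namespace("http://example.org/mmoon/inventory/eng/")

g = Graph()
g.bind("mmoon", MMOON)

# Decompose the English word-form "unreadable" into illustrative morphs.
word = INV.unreadable
g.add((word, RDF.type, MMOON.Word))
g.add((word, RDFS.label, Literal("unreadable", lang="en")))

for morph, role in [("un", MMOON.DerivationalPrefix),
                    ("read", MMOON.Stem),
                    ("able", MMOON.DerivationalSuffix)]:
    m = INV[f"morph_{morph}"]
    g.add((m, RDF.type, role))
    g.add((m, RDFS.label, Literal(morph, lang="en")))
    g.add((word, MMOON.consistsOf, m))

print(g.serialize(format="turtle"))
```

The point of such a dataset is that the sub-word structure becomes queryable just like any other Linked Data, rather than being buried in opaque strings.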
Personalised Access and Enrichment of Linked Data Resources by Milan Dojchinovski
Recent efforts in the Semantic Web community have primarily focused on developing technical infrastructure and methods for efficient Linked Data acquisition, interlinking and publishing. Nevertheless, actually accessing a piece of information in the LOD cloud still demands a significant amount of effort. In recent years, we have conducted two lines of research to address this problem. The first line of research aims at developing graph-based methods for “personalised access to Linked Data”. A key contribution of this research is the “Linked Web APIs” dataset, the largest Web services dataset with over 11K service descriptions, which has been used as a validation dataset. The second line of research has aimed at enriching Linked Data text resources and developing “entity recognition and linking” methods. In the talk, I will present the developed methods, the results of their evaluation on different datasets and evaluation challenges, and the lessons learned in these activities. I will discuss the adaptability and performance of the developed methods and present future directions.
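As a rough illustration of the linking step (not the methods presented in the talk), the sketch below performs a naive candidate lookup for a surface form against DBpedia labels via SPARQL. Real entity linking additionally scores and disambiguates candidates; endpoint, query and function name here are purely illustrative.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def link_surface_form(surface_form: str, lang: str = "en"):
    """Naive candidate lookup: match a text mention against DBpedia rdfs:labels."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?entity WHERE {{
            ?entity rdfs:label "{surface_form}"@{lang} .
        }} LIMIT 5
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b["entity"]["value"] for b in results["results"]["bindings"]]

# Example: candidate DBpedia resources for the mention "Leipzig".
print(link_surface_form("Leipzig"))
```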
About the AKSW Colloquium
This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.
The DBpedia extraction framework extracts different kinds of structured information from Wikipedia to generate various datasets. Performing a full extraction of the Wikipedia dumps of all languages (or even just the mapping-based languages) takes a significant amount of time. The distributed extraction framework runs the extraction on top of Apache Spark so that users can leverage multi-core machines or a distributed cluster of commodity machines to perform faster extraction. For example, extracting the 30–40 mapping-based languages on a machine with a quad-core CPU and 16 GB of RAM takes about 36 hours. Running the distributed framework in the same setting using three such worker nodes takes around 10 hours, and faster running times can be achieved simply by adding more cores or more machines. Apart from the Spark-based extraction framework, we have also implemented a distributed wiki-dump downloader to download Wikipedia dumps for multiple languages, from multiple mirrors, on a cluster in parallel. This is still a work in progress, and in this talk I will discuss the methods and challenges involved in this project, and our immediate goals and timeline.
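The general pattern is embarrassingly parallel: each Wikipedia page can be processed independently, so Spark can spread the per-page extraction across cores or worker nodes. The toy Python sketch below illustrates that pattern only; the actual distributed extraction framework is written in Scala and uses the real DBpedia extractors, whereas extract_infobox_facts here is a hypothetical stand-in.

```python
from pyspark.sql import SparkSession

def extract_infobox_facts(page_text: str):
    """Toy extractor: return (key, value) pairs from infobox-style lines."""
    facts = []
    for line in page_text.splitlines():
        line = line.strip().lstrip("|")
        if "=" in line:
            key, _, value = line.partition("=")
            facts.append((key.strip(), value.strip()))
    return facts

spark = (SparkSession.builder
         .appName("toy-distributed-extraction")
         .getOrCreate())

# Stand-in for pages read from a Wikipedia dump.
pages = ["| name = Leipzig\n| country = Germany",
         "| name = Dresden\n| country = Germany"]

# Distribute pages across the cluster and run the extractor on each one.
facts = spark.sparkContext.parallelize(pages).flatMap(extract_infobox_facts)
print(facts.collect())
spark.stop()
```

Because the work is partitioned per page, adding worker nodes shortens the wall-clock time roughly in proportion, which matches the 36-hour versus 10-hour figures reported above.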
This innovative, multi-disciplinary project will deliver practical analytical tools to support large-scale exploration of big historical datasets. The project aims to bring together international research experience in the digital humanities, natural language processing, information science, data mining and linked data, with large, complex and diverse ‘big data’ spanning over 500 years of British history.
