Concept Expansion Using Web Tables by Chi Wang, Kaushik Chakrabarti, Yeye He, Kris Ganjam, Zhimin Chen, Philip A. Bernstein (WWW’2015), presented by Ivan Ermilov:
Abstract. We study the following problem: given the name of an ad-hoc concept as well as a few seed entities belonging to the concept, output all entities belonging to it. Since producing the exact set of entities is hard, we focus on returning a ranked list of entities. Previous approaches either use seed entities as the only input, or inherently require negative examples. They suffer from input ambiguity and semantic drift, or are not viable options for ad-hoc tail concepts. In this paper, we propose to leverage the millions of tables on the web for this problem. The core technical challenge is to identify the “exclusive” tables for a concept to prevent semantic drift; existing holistic ranking techniques like personalized PageRank are inadequate for this purpose. We develop novel probabilistic ranking methods that can model a new type of table-entity relationship. Experiments with real-life concepts show that our proposed solution is significantly more effective than applying state-of-the-art set expansion or holistic ranking techniques.
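To make the problem setting concrete, here is a toy sketch in Python. It is not the probabilistic ranking method from the paper: it is a naive baseline that scores each table column by the share of its entities that are seeds (a crude stand-in for "exclusivity") and ranks candidate entities by the summed scores of the columns containing them. The function name `rank_entities` and the sample data are made up for illustration.

```python
from collections import defaultdict


def rank_entities(tables, seeds):
    """Rank non-seed entities by the summed score of the columns containing them."""
    seeds = set(seeds)
    scores = defaultdict(float)
    for column in tables:
        entities = set(column)
        if not entities:
            continue
        # Crude exclusivity proxy (an assumption, not the paper's model):
        # the fraction of the column's entities that are seed entities.
        table_score = len(entities & seeds) / len(entities)
        if table_score == 0:
            continue
        for entity in entities - seeds:
            scores[entity] += table_score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical web-table columns; the concept is "programming languages".
tables = [
    ["python", "java", "ruby", "go"],
    ["python", "cobra", "viper"],   # a table about snakes
    ["java", "ruby", "haskell"],
]
ranked = rank_entities(tables, seeds=["python", "java"])
for entity, score in ranked:
    print(f"{entity}: {score:.3f}")
```

In this toy data the ambiguous seed "python" pulls "cobra" and "viper" into the ranking through the snake table. That is the kind of semantic drift the abstract refers to, and it is why identifying tables that are exclusive to the concept matters.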
Mining entities from the Web by Anna Lisa Gentile
This talk explores the task of mining entities and the attributes that describe them from the Web. The focus is on entity-centric websites, i.e. domain-specific websites containing a description page for each entity. The task of extracting information from this kind of website is usually referred to as Wrapper Induction. We propose a simple knowledge-based method which (i) is highly flexible with respect to different domains and (ii) does not require any training material, but exploits Linked Data as a background knowledge source to build essential learning resources. Linked Data – an imprecise, redundant and large-scale knowledge resource – proved useful to support this Information Extraction task: for domains that are covered, Linked Data serves as a powerful knowledge resource for gathering learning seeds. Experiments on a publicly available dataset demonstrate that, under certain conditions, this simple approach based on distant supervision can achieve results competitive with more complex state-of-the-art approaches that always depend on training data.
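The idea of using background knowledge as learning seeds can be illustrated with a minimal Python sketch. It assumes that each page has already been reduced to a mapping from XPath to text and that a list of known attribute values (a gazetteer, for example book titles gathered from Linked Data) is available. The function `induce_xpath`, the XPaths and the sample pages are hypothetical and are not taken from the talk.

```python
from collections import Counter


def induce_xpath(pages, gazetteer):
    """Return the XPath whose text most often matches a known value, or None."""
    known = {value.lower() for value in gazetteer}
    hits = Counter()
    for page in pages:
        for xpath, text in page.items():
            # Distant supervision: a gazetteer match acts as a noisy label.
            if text.strip().lower() in known:
                hits[xpath] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# Hypothetical pages from one book website, as {xpath: text} mappings.
pages = [
    {"/html/body/h1": "Dune", "/html/body/div[2]/span": "Frank Herbert"},
    {"/html/body/h1": "Emma", "/html/body/div[2]/span": "Jane Austen"},
    {"/html/body/h1": "Obscure Novel", "/html/body/div[2]/span": "Dune"},
]
gazetteer = ["Dune", "Emma"]  # titles assumed to come from Linked Data

best = induce_xpath(pages, gazetteer)
extracted = [page[best] for page in pages]
print(best, extracted)
```

The selected XPath also extracts "Obscure Novel", a title that is absent from the gazetteer, while the accidental match on the third page is outvoted. This shows in miniature how imprecise and incomplete seeds can still yield a usable wrapper without hand-labelled training data.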
Linked Data Stack by Martin Röbert
Martin will present the packaging infrastructure developed for the Linked Data Stack project, which will be followed by a discussion about the future of the project.
About the AKSW Colloquium
This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.