AKSW Blog

DBpedia Day @ SEMANTiCS 2021

Posted on July 30, 2021 by Julia Holze

We are happy to announce that we are partnering again with the SEMANTiCS Conference, which will host this year’s DBpedia Day on September 9, 2021. SEMANTiCS is an established knowledge hub that brings together technology professionals, industry experts, and researchers to exchange knowledge about new technologies, innovations, and enterprise implementations in the fields of Linked Data and Semantic AI.

Highlights/Sessions

Keynote presentation by Maria-Esther Vidal (TIB & L3S Research Center)
DBpedia Databus presentation by Sebastian Hellmann (InfAI, DBpedia Association)
DBpedia member presentation
DBpedia Ontology session
NLP & DBpedia session
Presentation of the Dutch National Knowledge Graph

Quick Facts

Web URL: https://www.dbpedia.org/events/dbpedia-day-semantics-2021/
When: September 9, 2021
Where: Meervaart Theatre, Meer en vaart 300, 1068 LE, Amsterdam AND online
Registration: To attend the conference on September 9, 2021, you need to book a ticket on the SEMANTiCS website.

Registration/Tickets

Attending the DBpedia Day costs 138 € (onsite). You need to buy your ticket on the SEMANTiCS website. Because of the current situation, you can switch between online (46 €) and onsite (138 €) tickets at any time.

Sponsors and Acknowledgements

Organisation

Enno Meijers, National Library of the Netherlands & Dutch DBpedia
Julia Holze, InfAI, DBpedia Association
Milan Dojchinovski, InfAI, DBpedia Association, CTU
Sebastian Hellmann, InfAI, DBpedia Association

We are looking forward to meeting you in Amsterdam or online!

Julia

on behalf of the DBpedia Association

Posted in dbpedia, Events | Tagged dbpedia databus, linked data, Semantic Web, SEMANTiCs | Comments Off

LDK Conference meets DBpedia in Zaragoza, Spain

Posted on July 9, 2021 by Julia Holze

We are happy to announce that we will organize a DBpedia Tutorial on September 1, 2021 in Zaragoza, Spain. This DBpedia tutorial will be part of the Language, Data and Knowledge conference 2021. Building upon the success of the previous events held in Galway, Ireland in 2017, and in Leipzig, Germany in 2019, this conference will bring together researchers from across disciplines concerned with the acquisition, curation and use of language data in the context of data science and knowledge-based applications.

Quick facts

Web URL: https://www.dbpedia.org/events/tutorial-at-ldk-2021
Hashtag: #DBpediaTutorial
When: September 1, 2021
Where: Zaragoza, Spain
Registration: via the LDK website until July 22: http://2021.ldk-conf.org/registration/

Highlights/Sessions

Using Databus collections (Download)
Creating customized Databus collections
Uploading data to the Databus
Using collections in Databus-ready Docker applications
Creating dockerized applications for the DBpedia Stack

Organisation

Milan Dojchinovski, InfAI, DBpedia Association
Jan Forberg, InfAI, DBpedia Association
Johannes Frey, InfAI, DBpedia Association
Julia Holze, InfAI, DBpedia Association
Sebastian Hellmann, InfAI, DBpedia Association

We are looking forward to meeting you at this year’s Language, Data and Knowledge Conference!

Cheers, Julia

on behalf of the DBpedia Association

Posted in Announcements, dbpedia, Events, workshop or tutorial | Tagged DBpedia, dbpedia databus, ldk, tutorial | Comments Off

Assessing Language Identification Over DBpedia

Posted on May 4, 2021 by EdgardMarx

Large-scale multilingual knowledge bases (KBs) are key to cross-lingual and multilingual applications such as Question Answering, Machine Translation, and Search. They can encode the same information in different languages and be used to deliver content to the user in the most appropriate format. For example, the figure below shows two different language versions of Google search for the query “rdf”.

Another interesting use of multilingual KBs is as a training corpus for better multilingual machine learning models. However, finding high-quality multilingual content can be challenging. An analysis of over 100 thousand KBs in LOD Laundromat shows that only ∼14% of them have language tags on their rdfs:labels, while ∼20% of all rdfs:labels have no language tag. One solution to this challenge is to use language identification methods to tag RDF content automatically.
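To make the tag statistics concrete, here is a minimal Python sketch of how one might count tagged and untagged rdfs:label literals in an N-Triples dump. It is not the tooling behind the LOD Laundromat analysis; the regular expression is a simplified, hypothetical matcher that assumes one triple per line with single spaces between terms.

```python
import re

# Full IRI of rdfs:label as it appears in N-Triples.
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
# Simplified literal matcher (assumption): a quoted string, optionally followed
# by a language tag such as @en or @pt-BR. Typed literals match without a tag.
LITERAL_RE = re.compile(r'"(?:[^"\\]|\\.)*"(@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?')


def label_tag_stats(ntriples_lines):
    """Return (tagged, untagged) counts of literal rdfs:label objects."""
    tagged = untagged = 0
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumes single spaces between subject, predicate and object.
        parts = line.split(" ", 2)
        if len(parts) < 3 or parts[1] != RDFS_LABEL:
            continue
        match = LITERAL_RE.match(parts[2])
        if match is None:
            continue  # object is an IRI or blank node, not a literal
        if match.group(1):
            tagged += 1
        else:
            untagged += 1
    return tagged, untagged
```

Applied to an open file handle (e.g. a hypothetical `dump.nt`), the share of untagged labels is then `untagged / (tagged + untagged)`.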

In this work, we exploit DBpedia’s multilingual content for training and evaluating different language identification methods and frameworks. We show that these approaches perform poorly on rdfs:labels. In our experiments, we evaluate the performance of six language identification methods which consist of two baselines (LangTagger) as well as Apache Tika, langdetect, and Apache openNLP language detector in two configurations.

LangTagger and langdetect use Naive Bayes classifiers for language detection. To create features, langdetect uses a character-based n-gram method, while Apache Tika uses a word-based one. Both LangTagger models were trained on questions from the QALD training dataset, because that dataset is multilingual and based on DBpedia resources. Table II gives an overview of the different language identification methods evaluated.
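The following Python sketch illustrates the general idea of a Naive Bayes language identifier over character n-gram features. It is a toy re-implementation for illustration only, not LangTagger's or langdetect's code; the class name, the trigram size and the Laplace smoothing parameter are our own assumptions.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Character n-grams of a lower-cased label, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangId:
    """Multinomial Naive Bayes over character n-grams, Laplace-smoothed."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.ngram_counts = defaultdict(Counter)  # language -> n-gram counts
        self.doc_counts = Counter()               # language -> training texts
        self.vocab = set()

    def fit(self, texts, langs):
        for text, lang in zip(texts, langs):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[lang].update(grams)
            self.doc_counts[lang] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Log prior plus summed log likelihoods of the text's n-grams."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for lang, counts in self.ngram_counts.items():
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(self.doc_counts[lang] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + self.alpha) / denom)
            scores[lang] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

Training once with `fit` and then calling `predict` on each rdfs:label mirrors how such a classifier would be applied to KB content; real models need far larger training corpora than a handful of phrases.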

Evaluation: since openNLP and langdetect originally supported language identification for 103 and 55 languages, respectively, we used another configuration that limits the languages to 12. These 12 languages are English, German, Spanish, French, Brazilian Portuguese, Portuguese, Italian, Dutch, Hindi, Romanian, Persian, and Russian, i.e., the languages used to train the baseline models. In this work, we evaluate the performance of different language identification methods and frameworks over DBpedia rdfs:labels.
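Limiting the inferred languages amounts to ignoring every language profile outside the chosen set before taking the best score. A minimal sketch, assuming a detector that returns a score per language code; the codes chosen for the 12 languages (e.g. "pt-BR" for Brazilian Portuguese) and the function names are illustrative assumptions, not the frameworks' own configuration options.

```python
# The 12 languages listed above as language codes (the code choice is ours).
TWELVE_LANGUAGES = {
    "en", "de", "es", "fr", "pt-BR", "pt", "it", "nl", "hi", "ro", "fa", "ru",
}


def restricted_predict(scores, allowed=TWELVE_LANGUAGES):
    """Return the best-scoring language among the allowed profiles only.

    `scores` maps a language code to a detector score (higher is better);
    returns None if the detector scored none of the allowed languages.
    """
    candidates = {lang: s for lang, s in scores.items() if lang in allowed}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def accuracy(predictions, gold):
    """Fraction of labels whose predicted language matches the gold tag."""
    pairs = list(zip(predictions, gold))
    if not pairs:
        return 0.0
    return sum(p == g for p, g in pairs) / len(pairs)
```

For instance, with hypothetical scores `{"af": 0.6, "nl": 0.35, "en": 0.05}` for a Dutch label, an unrestricted argmax would pick Afrikaans, whereas `restricted_predict` returns `"nl"`.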

Conclusion: The results shown in Table I indicate that openNLP outperforms the other frameworks w.r.t. accuracy once the number of inferred languages is limited, while langdetect outperforms the others w.r.t. runtime. We show that it is possible to reach state-of-the-art performance with a small training corpus (see the baseline models for German). Overall, the methods perform poorly on rdfs:labels. Further, we show that accuracy can be improved by reducing the number of language profiles and by using context-based training corpora.

Acknowledgement: This work was partially supported by DBpedia under Google Summer of Code (GSoC) 2020.

More information can be found in the following links:

Paper: https://ieeexplore.ieee.org/abstract/document/9364504

GitHub: https://github.com/AKSW/LangTagger

Authors: Lahiru Hinguruduwa, Edgard Marx (@eccenca GmbH), Tommaso Soru, Thomas Riechert

Posted in Language Identification, Papers, Research, Uncategorized | Tagged DBpedia, GSOC2020, ICSC2021, Language Identification, Machine Learning, Multilingual Knowledge Bases | Comments Off

DBpedia Tutorial @ Knowledge Graph Conference 2021

Posted on April 9, 2021 by Julia Holze

On May 4, 2021, we will organize a tutorial at the Knowledge Graph Conference (KGC) 2021. The tutorial targets existing and potential new users of DBpedia, developers who wish to learn how to replicate the DBpedia infrastructure, service providers interested in exploiting the DBpedia Knowledge Graph (KG), data providers interested in integrating their data assets with the DBpedia KG, and data scientists (e.g. linguists) focused on extracting relevant (e.g. linguistic) information from the DBpedia KG.

During the tutorial, participants will learn how to find information in, access, query and work with the DBpedia KG, how to replicate it, and how to contribute to and improve it.
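As a taste of the querying part, the sketch below sends a SPARQL query to the public DBpedia endpoint (https://dbpedia.org/sparql) using only Python's standard library and flattens the standard SPARQL JSON results. The example query and helper names are our own; `run_query` needs network access, and the endpoint's availability and limits are outside this sketch.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"

# Example query (our own): the English abstract of Leipzig.
EXAMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {
  dbr:Leipzig dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Encode the query as the `query` parameter of an HTTP GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


def parse_bindings(json_text):
    """Flatten SPARQL 1.1 JSON results into a list of {variable: value} dicts."""
    data = json.loads(json_text)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query (needs network access) and return the flattened bindings."""
    request = Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

A call such as `run_query(EXAMPLE_QUERY)` returns a list of dicts keyed by the SELECT variables, here `"abstract"`. Requesting JSON via the `Accept` header follows the SPARQL protocol rather than any endpoint-specific parameter.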

Quick Facts

Web URL: https://www.dbpedia.org/events/dbpedia-tutorial-kg-conference/
When: May 4, 2021, 3 pm to 6 pm CEST
Where: The tutorial will be organized online.

Registration

Please register on the Knowledge Graph Conference website to take part. You need to buy a conference ticket to join the tutorial.

Organisation

Milan Dojchinovski, AKSW, DBpedia Association / CTU in Prague
Sebastian Hellmann, AKSW, DBpedia Association
Jan Forberg, AKSW, DBpedia Association
Johannes Frey, AKSW, DBpedia Association
Julia Holze, AKSW, DBpedia Association
Marvin Hofer, AKSW
Denis Streitmatter, AKSW

We are looking forward to meeting you in May!

Emma & Julia

on behalf of the DBpedia Association

Posted in Announcements, dbpedia, Events | Tagged DBpedia, knowledge graph, open-source, Semantic Web | Comments Off

DBpedia @ Google Summer of Code program 2021

Posted on March 15, 2021 by Julia Holze

DBpedia, one of InfAI’s community projects, will participate in the Google Summer of Code (GSoC) program for the 10th time.

The GSoC program aims to bring students from all over the globe into open source software development. In this regard, we are calling for students to take part in the Summer of Code. During two funded months, you will work on a specific task, the results of which are presented in the summer. Have a look at some of the project ideas.

Have we aroused your interest in participating? Great! Then check out the DBpedia website for further information.

We are looking forward to your contribution!

Kind regards,

Julia

on behalf of the DBpedia Association

Posted in dbpedia | Tagged community, gsoc21, linked data, open-source | Comments Off

DBpedia’s New Website

Posted on January 28, 2021 by Julia Holze

We are proud to announce the completion of the new DBpedia website. We used the New Year’s break as an opportunity to revise the layout, design and content of the DBpedia website according to the requirements of the DBpedia community and DBpedia members. We’ve created a new site to better present the many facets of the DBpedia movement.

New Website and Blog

The DBpedia team has diligently cleaned up the website, removed outdated content and created a platform for new tools, applications, services and datasets. We also integrated the DBpedia blog into the website, a long overdue step. Now you can access everything in one place.

Feedback Button

Feedback from the community and members is very important to us, so we offer a tool to make your voice heard: just click the feedback button on the new DBpedia website. If you find the content helpful, please click on Yep. If you think the content is insufficient, please let us know either directly on the website or via dbpedia@infai.org.

Acknowledgment

The DBpedia Association would like to thank Bettina Klimek, Henri Selbmann (Seefeuer GbR) and the KILT Competence Center at InfAI for their constant support in creating the new DBpedia website.

Have fun browsing the new DBpedia website.

Kind regards,

Julia

on behalf of the DBpedia Association

Posted in Announcements, dbpedia | Tagged community, linked data, website | Comments Off

SANSA 0.7.1 (Semantic Analytics Stack) Released

Posted on January 17, 2020 by Jens Lehmann

We are happy to announce SANSA 0.7.1 – the seventh release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

Website: http://sansa-stack.net
GitHub: https://github.com/SANSA-Stack
Download: http://sansa-stack.net/downloads-usage/
ChangeLog: https://github.com/SANSA-Stack/SANSA-Stack/releases

You can find usage guidelines and examples at http://sansa-stack.net/user-guide.

The following features are currently supported by SANSA:

Reading and writing RDF files in N-Triples, Turtle, RDF/XML, N-Quads, TriX format
Reading OWL files in various standard formats
Query heterogeneous sources (Data Lake) using SPARQL – CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.) are supported
Support for multiple data partitioning techniques
SPARQL querying via Sparqlify and Ontop and Tensors
Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
RDFS, RDFS Simple and OWL-Horst forward chaining inference
RDF graph clustering with different algorithms
Terminological decision trees (experimental)
Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
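To illustrate what forward-chaining RDFS inference does, here is a single-machine Python sketch of two RDFS entailment rules (rdfs9 and rdfs11) applied until no new triples appear. SANSA evaluates such rules in a distributed way on Spark and Flink; this sketch does not use SANSA's API, covers only a small subset of the RDFS rules, and uses prefixed-name strings in place of full IRIs as a simplification.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply two RDFS entailment rules until a fixpoint is reached:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    Triples are (subject, predicate, object) tuples of strings."""
    inferred = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        new = set()
        for a, b in subclass:  # rdfs11: transitivity of rdfs:subClassOf
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        for s, p, o in inferred:  # rdfs9: propagate rdf:type to superclasses
            if p == RDF_TYPE:
                for a, b in subclass:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        new -= inferred
        if not new:
            return inferred  # fixpoint: no rule produced anything new
        inferred |= new
```

Each round derives only from triples known at the start of the round, which is the naive evaluation strategy; distributed engines typically use semi-naive evaluation to avoid re-deriving old facts.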

Noteworthy changes or updates since the previous release are:

TriX support
A new query engine over compressed RDF data
OWL/XML Support

Deployment and getting started:

Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to get you started.
The SANSA jar files are in Maven Central, i.e., in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
Example code is available for various tasks.
We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Ocean, SLIPO, QROWD, BETTER, BOOST, MLwin, PLATOON and Simple-ML. Also check out our recent articles, in which we describe how to use SANSA for tensor-based querying, scalable RDB2RDF query execution, quality assessment and semantic partitioning.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

Posted in Uncategorized | Comments Off

More Complete Resultset Retrieval from Large Heterogeneous RDF Sources

Posted on December 5, 2019 by Andre Valdestilhas

Over recent years, the Web of Data has grown significantly. Various interfaces such as LOD Stats, LOD Laundromat and SPARQL endpoints provide access to hundreds of thousands of RDF datasets, representing billions of facts. These datasets are available in different formats such as raw data dumps and HDT files, or directly accessible via SPARQL endpoints. Querying such a large amount of distributed data is particularly challenging and many of these datasets cannot be directly queried using the SPARQL query language.

To tackle these problems, we present WimuQ, an integrated query engine that executes SPARQL queries and retrieves results from a large number of heterogeneous RDF data sources. Presently, WimuQ can execute both federated and non-federated SPARQL queries over a total of 668,166 datasets from LOD Stats and LOD Laundromat, as well as 559 active SPARQL endpoints. These data sources represent a total of 221.7 billion triples from more than 5 terabytes of information, drawn from datasets retrieved using the “Where is My URI” (WIMU) service. Our evaluation on state-of-the-art real-data benchmarks shows that WimuQ retrieves more complete results for the benchmark queries.
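One simple way to see why querying more sources can yield more complete result sets is to run the same query against each source and take the union of the distinct answers. The Python sketch below does exactly that; it is a hypothetical illustration, not WimuQ's query planner, and the `Source` type (any function that runs a query and returns bindings) is our own assumption. A plain union only captures answers that lie entirely within one source; queries whose joins span several sources need federated processing.

```python
from typing import Callable, Dict, Iterable, List

Binding = Dict[str, str]
# Hypothetical source interface: runs a SPARQL query, returns its bindings.
Source = Callable[[str], Iterable[Binding]]


def union_results(query: str, sources: List[Source]) -> List[Binding]:
    """Run the same query on every source and keep each distinct binding once,
    in first-seen order. Sources that raise a network error (OSError) are
    skipped, so one unreachable endpoint does not empty the result set."""
    seen = set()
    merged: List[Binding] = []
    for source in sources:
        try:
            rows = list(source(query))
        except OSError:
            continue
        for row in rows:
            key = tuple(sorted(row.items()))  # hashable form of a binding
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged
```

In practice, each `Source` could wrap a SPARQL endpoint client or an HDT file reader; the deduplication step is what keeps overlapping datasets from inflating the result set.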

The contributions of this work are:

A hybrid SPARQL query-processing engine to execute SPARQL queries over a large amount of heterogeneous RDF data.
Evaluation over real-world datasets using state-of-the-art federated and non-federated query benchmarks (FedBench, LargeRDFBench and FEASIBLE).
We present the first federated SPARQL query-processing engine that executes SPARQL queries over a total of 221.7 billion triples.

This is ongoing work; the next step is a large-scale study of the relations and similarity among the datasets. This work was supported by the Semantic Web group of HTWK Leipzig (https://www.htwk-leipzig.de/) under the advisement of Prof. Dr. rer. nat. Thomas Riechert.

Github repository: https://github.com/firmao/wimuT

Prototype/proof of concept: https://w3id.org/wimuq/

Slides: https://tinyurl.com/slidesKcap2019

Paper: https://dl.acm.org/citation.cfm?id=3364436

Conference: http://www.k-cap.org/2019/

Authors/Contact: valdestilhas@informatik.uni-leipzig.de, tsoru@informatik.uni-leipzig.de, saleem@informatik.uni-leipzig.de

Posted in paper presentation, Papers | Tagged k-cap2019, large datasets, wimuq | Comments Off

DL-Learner 1.4 (Supervised Structured Machine Learning Framework) Released

Posted on September 24, 2019 by Simon Bin

Dear all,

The Smart Data Analytics group [1] and the E.T.-db-MOLE sub-group located at InfAI Leipzig [2] are happy to announce

DL-Learner 1.4.

DL-Learner is a framework containing algorithms for supervised machine learning in RDF and OWL. DL-Learner can use various RDF and OWL serialization formats as well as SPARQL endpoints as input, can connect to most popular OWL reasoners and is easily and flexibly configurable. It extends concepts of Inductive Logic Programming and Relational Learning to the Semantic Web in order to allow powerful data analysis.
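To give an intuition for supervised learning of class expressions from positive and negative examples, here is a toy Python sketch that scores conjunctions of atomic classes by accuracy and keeps the best one. It is not DL-Learner's API or its CELOE algorithm, which uses refinement operators, OWL reasoners and far more expressive class expressions; here the individuals' types are given directly as sets, i.e., under a closed-world assumption.

```python
from itertools import combinations


def coverage(concept, individuals):
    """Individuals whose asserted types include every class of the conjunction
    (closed-world: types not listed are treated as absent)."""
    return {ind for ind, types in individuals.items() if concept <= types}


def concept_accuracy(concept, individuals, pos, neg):
    """(true positives + true negatives) / number of examples."""
    covered = coverage(concept, individuals)
    true_pos = len(covered & pos)
    true_neg = len(neg - covered)
    return (true_pos + true_neg) / (len(pos) + len(neg))


def learn_conjunction(individuals, pos, neg, max_len=2):
    """Score every conjunction of up to max_len atomic classes and return
    (accuracy, classes) of the best; ties keep the first found, i.e. the
    shorter conjunction, then the alphabetically earlier one."""
    classes = sorted(set().union(*individuals.values()))
    best = None
    for size in range(1, max_len + 1):
        for combo in combinations(classes, size):
            concept = frozenset(combo)
            score = concept_accuracy(concept, individuals, pos, neg)
            if best is None or score > best[0]:
                best = (score, concept)
    return best
```

Exhaustive enumeration grows combinatorially with the number of classes, which is why real learners such as CELOE explore the search space with heuristics instead.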

Website: http://dl-learner.org
GitHub page: https://github.com/SmartDataAnalytics/DL-Learner
Download: https://github.com/SmartDataAnalytics/DL-Learner/releases/tag/1.4.0

In the current release, we continued to improve the code and worked on our query tree and class expression learning algorithms. The config file can now optionally be written in JSON syntax. We updated the packaging to be ready for Java 11 and also tested DL-Learner on Windows. Some logical fixes to the Horizontal Expansion in CELOE were reported and analysed by Yingbing Hua; thanks!

The DL-Learner system was also presented at The Web Conference 2018 in Lyon [3]. We want to thank everyone who helped to create this release. We also acknowledge support by the following projects: LIMBO [4], QROWD [5], SAKE [6], Big Data Europe [7], HOBBIT [8], GeoKnow [9], GOLD [10], and SLIPO [11].

Kind regards,

Jens Lehmann, Lorenz Bühmann, Patrick Westphal and Simon Bin

[1] http://sda.tech
[2] https://infai.org/efficient-technology-integration/
[3] http://jens-lehmann.org/files/2018/www_dllearner.pdf
[4] https://www.limbo-project.org/
[5] http://qrowd-project.eu/
[6] https://www.sake-projekt.de/
[7] https://www.big-data-europe.eu/
[8] http://project-hobbit.eu/
[9] http://geoknow.eu/
[10] http://aksw.org/Projects/GOLD.html
[11] http://www.slipo.eu/

Posted in Announcements, DL-Learner, Software Releases, Uncategorized | Comments Off

DBpedia Day @ SEMANTiCS 2019

Posted on August 1, 2019 by Sandra Bartsch

We are happy to announce that SEMANTiCS 2019 will host the 14th DBpedia Community Meeting on the last day of the conference, September 12, 2019.

Highlights/Sessions

Keynote #1: Katja Hose, Aalborg University, Denmark
Keynote #2: Dan Weitzner from WPSemantix
DBpedia Databus presentation and training session
DBpedia Association hour
DBpedia Showcase session
DBpedia Chapter session

Call for Contribution

Tell us what cool things you do with DBpedia: Present your tools and datasets at the DBpedia Community Meeting! Please submit your presentations, posters, demos or other forms of contributions through our web form.

Quick Facts

Web URL: https://wiki.dbpedia.org/events/14th-dbpedia-community-meeting-karlsruhe
When: September 12th, 2019
Where: Leibniz-Institut für Informationsinfrastruktur – FIZ Karlsruhe, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
Call for Contribution: Submit your proposal in our form
Registration: Attending the DBpedia Community Meeting costs 90 €. You can buy your ticket on the SEMANTiCS website. DBpedia members get free admission; please contact your nearest DBpedia chapter or the DBpedia Association for a promotion code.

Sponsors and Acknowledgments

In case you want to sponsor the 14th DBpedia Community Meeting, please contact the DBpedia Association via dbpedia@infai.org.

Organisation

Tina Schmeissner, DBpedia Association
Sandra Prätor, AKSW/KILT, DBpedia Association
Sebastian Hellmann, AKSW/KILT, DBpedia Association

We are looking forward to meeting you in Karlsruhe!

Your DBpedia Association

Posted in Call for Paper, Call for Students, dbpedia, Events | Tagged Community Meeting, DBpedia, SEMANTiCs | Comments Off
