DBpedia Masterclass @ Connected Data World

Dear all,

We are proud to announce that we will organize the masterclass “DBpedia Knowledge Graph Tutorial for Beginners” at the Connected Data World event. The masterclass will take place online on December 2, 2021 at 4:15pm CET. It targets existing and potential new users of DBpedia, developers who wish to learn the basics of the DBpedia Knowledge Graph and how to replicate and deploy DBpedia on local infrastructure, as well as Linked Data and Knowledge Graph adopters.

Highlights

In this masterclass, participants will learn how to consume the DBpedia Knowledge Graph (KG) with the least amount of effort. The masterclass will introduce the DBpedia KG and explain its dataset partitions, with a particular focus on usability. Using a selected use case, we will explain how to work with the DBpedia KG and the DBpedia technology stack (DBpedia Databus, DBpedia Spotlight, DBpedia Docker) and illustrate the potential and benefits of using DBpedia.

Quick Facts

– Web URL: https://2021.connected-data.world/talks/dbpedia-knowledge-graph-tutorial-for-beginners/  

– When: December 2, 2021 at 4:15pm CET

– Where: The tutorial will be organized online.

Tickets 

Please register at the Connected Data World website to be part of the masterclass. You need to buy a full access pass to join the DBpedia masterclass. Please get your ticket here.

Organisation 

– Jan Forberg, DBpedia 

– Johannes Frey, DBpedia 

– Milan Dojchinovski, DBpedia / Czech Technical University in Prague

– Julia Holze, InfAI, DBpedia 

– Sebastian Hellmann, DBpedia 

We are looking forward to meeting you online!

Kind regards,

Julia

on behalf of the DBpedia Association

Posted in dbpedia, Events

DBpedia Day @ SEMANTiCS 2021

We are happy to announce that we are partnering again with the SEMANTiCS Conference, which will host this year’s DBpedia Day on September 9, 2021. SEMANTiCS is an established knowledge hub that brings together technology professionals, industry experts, and researchers to exchange knowledge about new technologies, innovations, and enterprise implementations in the fields of Linked Data and Semantic AI.

Highlights/Sessions

  • Keynote presentation by Maria-Esther Vidal (TIB & L3S Research Center) (abstract)
  • DBpedia Databus presentation by Sebastian Hellmann (InfAI, DBpedia Association)
  • DBpedia member presentation
  • DBpedia Ontology session
  • NLP & DBpedia session 
  • Presentation of the Dutch National Knowledge Graph

Registration/Tickets

Attending the DBpedia Day onsite costs 138 €. You need to buy your ticket on the SEMANTiCS website. Because of the current situation, you can switch between online (46 €) and onsite (138 €) tickets at any time.

Organisation

  • Enno Meijers, National Library of the Netherlands & Dutch DBpedia
  • Julia Holze, InfAI, DBpedia Association
  • Milan Dojchinovski, InfAI, DBpedia Association, CTU
  • Sebastian Hellmann, InfAI, DBpedia Association

We are looking forward to meeting you in Amsterdam or online!

Julia

on behalf of the DBpedia Association

Posted in dbpedia, Events

LDK Conference meets DBpedia in Zaragoza, Spain

We are happy to announce that we will organize a DBpedia Tutorial on September 1, 2021 in Zaragoza, Spain. This DBpedia tutorial will be part of the Language, Data and Knowledge conference 2021. Building upon the success of the previous events held in Galway, Ireland in 2017, and in Leipzig, Germany in 2019, this conference will bring together researchers from across disciplines concerned with the acquisition, curation and use of language data in the context of data science and knowledge-based applications.

Highlights/Sessions

  • Using Databus collections (Download)
  • Creating customized Databus collections
  • Uploading data to the Databus
  • Using collections in Databus-ready Docker applications
  • Creating dockerized applications for the DBpedia Stack

Organisation

  • Milan Dojchinovski, InfAI, DBpedia Association
  • Jan Forberg, InfAI, DBpedia Association
  • Johannes Frey, InfAI, DBpedia Association
  • Julia Holze, InfAI, DBpedia Association
  • Sebastian Hellmann, InfAI, DBpedia Association

We are looking forward to meeting you at this year’s Language, Data and Knowledge Conference!

Cheers, Julia

on behalf of the DBpedia Association

Posted in Announcements, dbpedia, Events, workshop or tutorial

Assessing Language Identification Over DBpedia

Large-scale multilingual knowledge bases (KBs) are key to cross-lingual and multilingual applications such as Question Answering, Machine Translation, and Search. They can encode the same information in different languages and be used to deliver content in the most appropriate format to the user. For example, the figure below shows two different language versions of Google search for the query “rdf”.

Another interesting use of multilingual KBs is as a training corpus for building better multilingual machine learning models. However, finding high-quality multilingual content can be challenging. An analysis of over 100 thousand KBs in LOD Laundromat shows that only ∼14% of them have language tags on their rdfs:labels, while ∼20% of all rdfs:labels have no language tag. One way to overcome this challenge is to use language identification methods to tag RDF content automatically.
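The share of language-tagged labels can be measured on any N-Triples dump with a few lines of code. The following is a minimal sketch and not the tooling used for the LOD Laundromat analysis: the function name `label_language_stats` and the sample triples are hypothetical, and the regular expression only covers plain N-Triples literals.

```python
import re

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# An N-Triples literal, optionally followed by a language tag such as @en or @pt-BR.
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*))?')


def label_language_stats(ntriples_lines):
    """Return (tagged, untagged) counts for rdfs:label literals."""
    tagged = untagged = 0
    for line in ntriples_lines:
        parts = line.strip().split(None, 2)  # subject, predicate, rest of line
        if len(parts) < 3 or parts[1] != RDFS_LABEL:
            continue
        match = LITERAL.match(parts[2])
        if match is None:
            continue  # the object is an IRI or blank node, not a literal
        if match.group(1):
            tagged += 1
        else:
            untagged += 1  # plain or datatyped literal without a language tag
    return tagged, untagged


# Hypothetical sample data for illustration only.
sample = [
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@de .',
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin" .',
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/City> .',
]
print(label_language_stats(sample))
```

Labels counted as untagged are the ones a language identification method would have to tag automatically.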

In this work, we exploit DBpedia’s multilingual content for training and evaluating different language identification methods and frameworks. We show that these approaches perform poorly on rdfs:labels. In our experiments, we evaluate the performance of six language identification methods which consist of two baselines (LangTagger) as well as Apache Tika, langdetect, and Apache openNLP language detector in two configurations.

LangTagger and langdetect use Naive Bayes classifiers for language detection. langdetect creates its features with a character-based n-gram method, while Apache Tika uses a word-based one. Both LangTagger models were trained on questions from the QALD training dataset, because it is multilingual and based on DBpedia resources. Table II gives an overview of the language identification methods evaluated.
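To make the idea of a Naive Bayes classifier over character n-grams concrete, here is a small self-contained sketch. It is not the implementation of LangTagger or langdetect: the class `NgramNaiveBayes`, the add-one smoothing, the trigram size, and the toy training sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split a lower-cased, space-padded string into character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.docs = Counter()               # language -> number of training texts
        self.vocab = set()

    def train(self, samples):
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)

    def predict(self, text, allowed=None):
        # `allowed` restricts the candidate languages (language profiles).
        langs = [l for l in self.counts if allowed is None or l in allowed]
        total_docs = sum(self.docs[l] for l in langs)
        best, best_score = None, -math.inf
        for lang in langs:
            total = sum(self.counts[lang].values())
            score = math.log(self.docs[lang] / total_docs)  # log prior
            for gram in char_ngrams(text, self.n):
                # Add-one smoothing so unseen n-grams do not zero out the score.
                score += math.log((self.counts[lang][gram] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = lang, score
        return best


# Toy training data (hypothetical, in the spirit of short multilingual questions).
model = NgramNaiveBayes()
model.train([
    ("what is the capital of germany", "en"),
    ("who wrote the book", "en"),
    ("was ist die hauptstadt von deutschland", "de"),
    ("wer schrieb das buch", "de"),
])
print(model.predict("the capital"))
print(model.predict("die hauptstadt"))
```

The optional `allowed` argument mimics restricting a detector to a fixed set of languages, which is the kind of configuration change examined in the evaluation.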

Evaluation: since openNLP and langdetect originally supported language identification for 103 and 55 languages respectively, we used a second configuration that limits the languages to 12. Those 12 languages are English, German, Spanish, French, Brazilian Portuguese, Portuguese, Italian, Dutch, Hindi, Romanian, Persian, and Russian, which are also the languages used to train the baseline models. In this work, we evaluate the performance of different language identification methods and frameworks over DBpedia rdfs:labels.

Conclusion: The results in Table I show that openNLP outperforms the other frameworks w.r.t. accuracy after limiting the number of inferred languages, while langdetect outperforms the other frameworks w.r.t. runtime. We show that it is possible to reach state-of-the-art (SOTA) results with a small training corpus (see the baseline models for German). Overall, the methods perform poorly on rdfs:labels. Further, we show that accuracy can be improved by reducing the number of language profiles and using context-based training corpora.

Acknowledgement: This work was partially supported by DBpedia under Google Summer of Code (GSoC) 2020.

More information can be found in the following links:

Paper: https://ieeexplore.ieee.org/abstract/document/9364504

GitHub: https://github.com/AKSW/LangTagger

Authors: Lahiru Hinguruduwa, Edgard Marx (@eccenca GmbH), Tommaso Soru, Thomas Riechert

Posted in Language Identification, Papers, Research, Uncategorized

DBpedia Tutorial @ Knowledge Graph Conference 2021

On May 4, 2021 we will organize a tutorial at the Knowledge Graph Conference (KGC) 2021. The tutorial targets existing and potential new users of DBpedia, developers who wish to learn how to replicate the DBpedia infrastructure, service providers interested in exploiting the DBpedia Knowledge Graph (KG), and data providers interested in integrating data assets with the DBpedia KG. It also targets data scientists (e.g. linguists) focused on extracting relevant (e.g. linguistic) information from or based on the DBpedia KG.

During the tutorial, participants will learn how to find information in, access, query and work with the DBpedia KG, how to replicate it, and how to contribute to and improve the Knowledge Graph.

Registration

Please register at the Knowledge Graph Conference website to be part of the meeting. You NEED to buy a conference ticket to join the tutorial.

Organisation

  • Milan Dojchinovski, AKSW, DBpedia Association / CTU in Prague
  • Sebastian Hellmann, AKSW, DBpedia Association
  • Jan Forberg, AKSW, DBpedia Association
  • Johannes Frey, AKSW, DBpedia Association
  • Julia Holze, AKSW, DBpedia Association
  • Marvin Hofer, AKSW
  • Denis Streitmatter, AKSW

We are looking forward to meeting you in May!

Emma & Julia

on behalf of the DBpedia Association

Posted in Announcements, dbpedia, Events

DBpedia @ Google Summer of Code program 2021

DBpedia, one of InfAI’s community projects, will participate in the Google Summer of Code (GSoC) program for the 10th time.

The GSoC program aims to bring students from all over the globe into open source software development. In this regard, we are calling for students to be part of the Summer of Code. During two funded months, you will be able to work on a specific task, the results of which are presented in the summer. Have a look at some of the project ideas here.

Have we sparked your interest in participating? Great, then check out the DBpedia website for further information.

We are looking forward to your contribution!

Kind regards,

Julia

on behalf of the DBpedia Association

Posted in dbpedia

DBpedia’s New Website

We are proud to announce the completion of the new DBpedia website. We used the New Year’s break as an opportunity to alter layout, design and content of the DBpedia website, according to the requirements of the DBpedia community and DBpedia members. We’ve created a new site to better present the DBpedia movement in its many facets.

New Website and Blog

The DBpedia team has diligently cleaned up the website, removed outdated content and created a platform for new tools, applications, services and data sets. We also integrated the DBpedia blog into the website, a long overdue step. So now you have access to everything in one spot.

Feedback Button

Feedback from the community and members is very important to us, so we offer a tool for you to make your voice heard. Just click the feedback button on the new DBpedia website. If you find the content helpful, please click on Yep. If you think the content is not sufficient, please let us know either directly on the website or via dbpedia@infai.org.

Acknowledgment

The DBpedia Association would like to thank Bettina Klimek, Henri Selbmann (Seefeuer GbR) and the KILT Competence Center at InfAI for their constant support in creating the new DBpedia website.

Have fun browsing the new DBpedia website.

Kind regards,

Julia

on behalf of the DBpedia Association

Posted in Announcements, dbpedia

SANSA 0.7.1 (Semantic Analytics Stack) Released

We are happy to announce SANSA 0.7.1 – the seventh release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find usage guidelines and examples at http://sansa-stack.net/user-guide.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML, N-Quads and TRIX formats
  • Reading OWL files in various standard formats
  • Query heterogeneous sources (Data Lake) using SPARQL – CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.) are supported
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify, Ontop and tensors
  • Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
  • RDFS, RDFS Simple and OWL-Horst forward chaining inference
  • RDF graph clustering with different algorithms
  • Terminological decision trees (experimental)
  • Knowledge graph embedding approaches: TransE (beta), DistMult (beta)

Noteworthy changes or updates since the previous release are:

  • TRIX support
  • A new query engine over compressed RDF data
  • OWL/XML Support

Deployment and getting started:

  • Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
  • The SANSA jar files are in Maven Central, i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Ocean, SLIPO, QROWD, BETTER, BOOST, MLwin, PLATOON and Simple-ML. Also check out our recent articles in which we describe how to use SANSA for tensor based querying, scalable RDB2RDF query execution, quality assessment and semantic partitioning.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

 

Posted in Uncategorized

More Complete Resultset Retrieval from Large Heterogeneous RDF Sources

Over recent years, the Web of Data has grown significantly. Various interfaces such as LOD Stats, LOD Laundromat and SPARQL endpoints provide access to hundreds of thousands of RDF datasets, representing billions of facts. These datasets are available in different formats such as raw data dumps and HDT files, or directly accessible via SPARQL endpoints. Querying such a large amount of distributed data is particularly challenging and many of these datasets cannot be directly queried using the SPARQL query language.

To tackle these problems, we present WimuQ, an integrated query engine that executes SPARQL queries and retrieves results from a large number of heterogeneous RDF data sources. Presently, WimuQ is able to execute both federated and non-federated SPARQL queries over a total of 668,166 datasets from LOD Stats and LOD Laundromat, as well as 559 active SPARQL endpoints. These data sources represent a total of 221.7 billion triples from more than 5 terabytes of information from datasets retrieved using the service “Where is My URI” (WIMU). Our evaluation on state-of-the-art real-data benchmarks shows that WimuQ retrieves more complete results for the benchmark queries.

The contributions of this work are:

  • A hybrid SPARQL query-processing engine that executes SPARQL queries over a large amount of heterogeneous RDF data.
  • An evaluation on real-world datasets using state-of-the-art federated and non-federated query benchmarks (FedBench, LargeRDFBench and FEASIBLE).
  • The first federated SPARQL query-processing engine that executes SPARQL queries over a total of 221.7 billion triples.
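The intuition behind “more complete” results can be illustrated by combining the solutions that different kinds of sources return for the same query. The sketch below is not WimuQ’s algorithm: the function `merge_results` and the sample bindings are hypothetical, and the duplicate removal corresponds to DISTINCT semantics, which is a simplifying assumption.

```python
def merge_results(result_sets):
    """Union the solution rows of several sources, dropping exact duplicates.

    Each result set is a list of rows; each row maps a variable name to a value.
    """
    seen = set()
    merged = []
    for rows in result_sets:
        for row in rows:
            key = frozenset(row.items())  # order-independent identity of a row
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Hypothetical bindings for the same query from an endpoint and from a data dump.
endpoint_rows = [{"s": "http://example.org/a"}, {"s": "http://example.org/b"}]
dump_rows = [{"s": "http://example.org/b"}, {"s": "http://example.org/c"}]

merged = merge_results([endpoint_rows, dump_rows])
print(len(merged))
```

Here neither source alone yields all three solutions, which is the situation a hybrid engine over endpoints, dumps and HDT files is meant to address.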

This is ongoing work; the next step is a large-scale approach to studying the relations and similarities among the datasets. This work was supported by the Semantic Web group of HTWK Leipzig (https://www.htwk-leipzig.de/) under the supervision of Prof. Dr. rer. nat. Thomas Riechert.

Github repository: https://github.com/firmao/wimuT

Prototype/proof of concept: https://w3id.org/wimuq/

Slides: https://tinyurl.com/slidesKcap2019

Paper: https://dl.acm.org/citation.cfm?id=3364436

Conference: http://www.k-cap.org/2019/

Authors/Contact: valdestilhas@informatik.uni-leipzig.de, tsoru@informatik.uni-leipzig.de, saleem@informatik.uni-leipzig.de

Posted in paper presentation, Papers

DL-Learner 1.4 (Supervised Structured Machine Learning Framework) Released

Dear all,

The Smart Data Analytics group [1] and the E.T.-db-MOLE sub-group located at InfAI Leipzig [2] are happy to announce DL-Learner 1.4.

DL-Learner is a framework containing algorithms for supervised machine learning in RDF and OWL. DL-Learner can use various RDF and OWL serialization formats as well as SPARQL endpoints as input, can connect to most popular OWL reasoners and is easily and flexibly configurable. It extends concepts of Inductive Logic Programming and Relational Learning to the Semantic Web in order to allow powerful data analysis.

Website: http://dl-learner.org
GitHub page: https://github.com/SmartDataAnalytics/DL-Learner
Download: https://github.com/SmartDataAnalytics/DL-Learner/releases/tag/1.4.0

In the current release, we continued to improve the code and work on our query tree and class expression learning algorithms. The config file can now optionally be written in JSON syntax. We updated the packaging to be ready for Java 11 and also tested DL-Learner on Windows. Some logical fixes to the Horizontal Expansion in CELOE were reported and analysed by Yingbing Hua, thanks!

The DL-Learner system has also been presented at The Web Conference in Lyon 2018 [3]. We want to thank everyone who helped to create this release. We also acknowledge support by the following projects: LIMBO [4], QROWD [5], SAKE [6], Big Data Europe [7], HOBBIT [8], GeoKnow [9], GOLD [10], and SLIPO [11].

Kind regards,

Jens Lehmann, Lorenz Bühmann, Patrick Westphal and Simon Bin

[1] http://sda.tech
[2] https://infai.org/efficient-technology-integration/
[3] http://jens-lehmann.org/files/2018/www_dllearner.pdf
[4] https://www.limbo-project.org/
[5] http://qrowd-project.eu/
[6] https://www.sake-projekt.de/
[7] https://www.big-data-europe.eu/
[8] http://project-hobbit.eu/
[9] http://geoknow.eu/
[10] http://aksw.org/Projects/GOLD.html
[11] http://www.slipo.eu/

Posted in Announcements, DL-Learner, Software Releases, Uncategorized