University of Leipzig AKSW Homepage | Blog |

Archive for the category 'Data Quality'

Writing a Survey – Steps, Advantages, Limitations and Examples

February 13, 2015 - 10:10 am by AmrapaliZaveri

What is a Survey?

A survey, or systematic literature review, is a scholarly text that summarizes the current knowledge on a particular topic, including substantive findings as well as theoretical and methodological contributions. Literature reviews use secondary sources and do not report new or original experimental work [1].

A systematic review is a literature review focused on a research question. It tries to identify, appraise, select and synthesize all high-quality research evidence and arguments relevant to that question. Moreover, a systematic review is comprehensive, exhaustive and repeatable, that is, readers can replicate or verify the review.

Steps to perform a survey

  • Select two independent reviewers

  • Look for related/existing surveys

    • If one exists, check how long ago it was done. If it was 10 years ago, you can go ahead and update it.

  • Formulate research questions

  • Devise eligibility criteria

  • Define search strategy – keywords, journals, conferences, workshops to search in

  • Retrieve further potential articles using the search strategy and by directly contacting top researchers in the field

  • Compare chosen articles among reviewers and decide a core set of papers to be included in the survey

  • Perform qualitative and quantitative analysis on the selected set of papers

  • Report on the results
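The reviewer-comparison step above can be illustrated with a small sketch. It is only an illustration under assumptions not stated in the post: each reviewer's result is modelled as a set of paper identifiers, the core set is taken to be the papers both reviewers included, and Cohen's kappa is used as one common measure of agreement (the post itself does not name a measure). The names `Screening` and `compare_selections` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Screening:
    """Outcome of comparing two reviewers' independent selections."""
    agreed: frozenset    # papers both reviewers included (candidate core set)
    disputed: frozenset  # papers only one reviewer included (to be discussed)
    kappa: float         # Cohen's kappa over the whole candidate pool


def compare_selections(candidates, reviewer_a, reviewer_b):
    """Compare two include/exclude screenings of the same candidate pool."""
    candidates = frozenset(candidates)
    a = frozenset(reviewer_a) & candidates
    b = frozenset(reviewer_b) & candidates
    n = len(candidates)
    if n == 0:
        raise ValueError("candidate pool is empty")

    # Observed agreement: both included or both excluded the same paper.
    both_in = len(a & b)
    both_out = n - len(a | b)
    observed = (both_in + both_out) / n

    # Agreement expected by chance, from each reviewer's inclusion rate.
    pa, pb = len(a) / n, len(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)

    # If chance agreement is already total (both included everything or
    # nothing), kappa is undefined; report 1.0 by convention (assumption).
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return Screening(a & b, a ^ b, kappa)


if __name__ == "__main__":
    pool = [f"p{i}" for i in range(1, 11)]  # ten retrieved papers (made up)
    result = compare_selections(
        pool,
        reviewer_a={"p1", "p2", "p3", "p4"},
        reviewer_b={"p2", "p3", "p4", "p5"},
    )
    print(sorted(result.agreed))    # ['p2', 'p3', 'p4']
    print(sorted(result.disputed))  # ['p1', 'p5']
    print(round(result.kappa, 2))   # 0.58
```

In this sketch the disputed papers are not dropped automatically. They are the ones the two reviewers would discuss before fixing the core set.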

Advantages of writing a survey

There are several benefits/advantages of conducting a survey, such as:

  • A survey is the best way to get an idea of the state-of-the-art technologies, algorithms, tools etc. in a particular field

  • One can get a clear bird's-eye overview of the current state of that field

  • It can serve as a great starting point for a student or any researcher thinking of venturing into that particular field/area of research

  • One can easily acquire up-to-date information on a subject by referring to a review

  • It gives researchers the opportunity to formalize different concepts of a particular field

  • It allows one to identify challenges and gaps that are unanswered and crucial for that subject

Limitations of a survey

However, there are a few limitations that must be considered before undertaking a survey, such as:

  • Surveys tend to be biased, so it is necessary to have two researchers who perform the systematic search for articles independently

  • It is quite challenging to unify concepts, especially when there are different ideas referring to the same concepts developed over several years

  • Indeed, conducting a survey and getting the article published is a long process

Surveys conducted by members of the AKSW group

In our group, three students conducted comprehensive literature reviews on three different topics:

  • Linked Data Quality: The survey covers 30 core papers, which focus on providing quality assessment methodologies for Linked Data specifically. A total of 18 data quality dimensions along with their definitions and 69 metrics are provided. Additionally, the survey contributes a comparison of 12 tools, which perform quality assessment of Linked Data [2].

  • Ubiquitous Semantic Applications: The survey presents a thorough analysis of 48 primary studies out of 172 initially retrieved papers.  The results consist of a comprehensive set of quality attributes for Ubiquitous Semantic Applications together with corresponding application features suggested for their realization. The quality attributes include aspects such as mobility, usability, heterogeneity, collaboration, customizability and evolvability. The proposed quality attributes facilitate the evaluation of existing approaches and the development of novel, more effective and intuitive Ubiquitous Semantic Applications [3].

  • User interfaces for semantic authoring of textual content: The survey covers a thorough analysis of 31 primary studies out of 175 initially retrieved papers. The results consist of a comprehensive set of quality attributes for semantic content authoring (SCA) systems together with corresponding user interface features suggested for their realization. The quality attributes include aspects such as usability, automation, generalizability, collaboration, customizability and evolvability. The proposed quality attributes and UI features facilitate the evaluation of existing approaches and the development of novel, more effective and intuitive semantic authoring interfaces [4].

Also, here is a presentation on “Systematic Literature Reviews”: http://slidewiki.org/deck/57_systematic-literature-review.

References

[1] Lisa A. Baglione (2012) Writing a Research Paper in Political Science. Thousand Oaks: CQ Press.

[2] Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer (2015), ‘Quality Assessment for Linked Data: A Survey’, Semantic Web Journal. http://www.semantic-web-journal.net/content/quality-assessment-linked-data-survey

[3] Timofey Ermilov, Ali Khalili, and Sören Auer (2014). ‘Ubiquitous Semantic Applications: A Systematic Literature Review’. Int. J. Semant. Web Inf. Syst. 10, 1 (January 2014), 66-99. DOI=10.4018/ijswis.2014010103 http://dx.doi.org/10.4018/ijswis.2014010103

[4] Ali Khalili and Sören Auer (2013). ‘User interfaces for semantic authoring of textual content: A systematic literature review’, Web Semantics: Science, Services and Agents on the World Wide Web, Volume 22, October 2013, Pages 1-18 http://www.sciencedirect.com/science/article/pii/S1570826813000498

BIG at LSWT 2013 – From BIG Data to Smart Data

September 30, 2013 - 4:36 pm by AmrapaliZaveri

The 5th Leipziger Semantic Web Tag (LSWT2013) was organized as a meeting point for German as well as international Linked Data experts. Under the motto “From Big Data to Smart Data”, sophisticated methods for handling large amounts of data were presented on September 23rd in Leipzig. The keynote was given by Hans Uszkoreit, scientific director at the German Research Center for Artificial Intelligence (DFKI). After an introduction to Text Analytics and Big Data issues, the participants of the LSWT 2013 discussed the intelligent use of huge amounts of data on the web.

Presentations on industrial and scientific solutions showed working approaches to big data concerns. Companies like Empolis, Brox and Ontos presented Linked Data and Semantic Web solutions capable of handling terabytes of data. Traditional approaches, such as Datameer’s Data Analytics Solution based on Hadoop, also showed that big data can nowadays be handled without major problems.

Furthermore, the problems of detecting topics in massive data streams (Topic/S), document collections (WisARD) or corpora at information service providers (Wolters Kluwer) were tackled. Even the ethical issue of robots replacing journalists with the help of semantic data was examined by Alexander Siebert from Retresco.

In conclusion, the analysis of textual information in large amounts of data is an interesting area of work that is not yet fully solved. Further information is available from the website.

Further information on topics related to data analysis, data curation, data storage, data acquisition and data usage can be found in our technical whitepaper available from our project website.

AKSW @ ISWC

July 29, 2013 - 3:34 pm by AmrapaliZaveri

We are happy to announce that this year seven papers from AKSW were accepted at the ISWC in different tracks and covering a wide range of topics as listed below:

ISWC Research track:

TITLE: DAW: Duplicate-AWare Federated Query Processing over the Web of Data
AUTHORS: Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena Deus and Manfred Hauswirth

TITLE: ORCHID – Reduction-Ratio-Optimal Computation of Geo-Spatial Distances for Link Discovery
AUTHORS: Axel-Cyrille Ngonga Ngomo

TITLE: Pattern Based Knowledge Base Enrichment
AUTHORS: Lorenz Bühmann, Jens Lehmann

TITLE: Real-time RDF extraction from unstructured data streams
AUTHORS: Daniel Gerber, Sebastian Hellmann, Lorenz Bühmann, Ricardo Usbeck, Tommaso Soru and Axel-Cyrille Ngonga Ngomo

ISWC In-Use track:

TITLE: Integrating NLP using Linked Data
AUTHORS: Sebastian Hellmann, Jens Lehmann, Sören Auer and Martin Brümmer

TITLE: Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model
AUTHORS: Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio, Ricardo Pietrobon

ISWC 2013 Evaluation track:

TITLE: Crowdsourcing Linked Data Quality Assessment
AUTHORS: Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann

See you there, mates!

Special Issue on Web Data Quality in IJSWIS

June 14, 2013 - 2:55 pm by AmrapaliZaveri

Call for papers
Special Issue on Web Data Quality
International Journal on Semantic Web and Information Systems

Scope:

The standardization and adoption of Semantic Web technologies have resulted in an unprecedented volume of data being published as Linked Open Data (LOD). The integration across this Web of Data, however, is hampered by the ‘publish first, refine later’ philosophy. This leads to various quality problems in the underlying data, such as incompleteness, inconsistency and incomprehensibility. These problems affect every application domain, be it scientific (e.g., life science, environment), governmental or industrial applications.

This Special Issue is addressed to those members of the community interested in providing novel methodologies or frameworks for assessing, monitoring, maintaining and improving the quality of the Web of Data, and in introducing tools and user interfaces that can effectively assist in the assessment. Such methodologies will not only help in detecting the inherent data quality problems currently plaguing the Web of Data, but also provide the means to fix these problems and maintain quality in the long run. Additionally, we seek articles that help identify the current impediments to building real-world LOD applications.

Topics:

  • Web data and LOD quality concepts
  • Data quality dimensions and metrics for Web data and LOD quality
  • Web data and LOD quality methodologies
  • Data quality assessment frameworks
  • Evaluation of quality and trustworthiness in the web of data
  • (Semi-)automatic assessment in the web of data
  • Large-scale quality assessment of structured datasets
  • Validation of currently existing data quality assessment methodologies
  • Use-case driven quality assessment
  • Quality assessment leveraging background knowledge
  • Co-reference detection and dataset reconciliation
  • Data quality methodologies for linked open data
  • Evaluating quality of ontologies
  • Web data and LOD quality tools
  • Design and implementation of data quality monitoring, assessment and improvement tools
  • Quality exploration and analysis interfaces
  • Scalability and performance of tools
  • Monitoring tools
  • Case studies on Web data and LOD quality assessment and improvement
  • Web data and LOD quality benchmarks
  • Issues in LOD
  • Methods to acquire most relevant LOD datasets
  • Generating meaningful associations across LOD datasets