In this colloquium, Markus Ackermann will touch on the ‘linguistic gap’ in recent POS tagging endeavours (as perceived by C. Manning, [1]). Building on observations in that paper, potential paths towards more linguistically informed POS tagging will be explored:
An alternative to the most widely used ground truth for developing and evaluating POS tagging systems for English will be presented ([2]), and the benefits of a DL-based representation of POS tags for a multi-tool tagging approach will be shown ([3]).
Finally, the presenter will give an overview of work in progress that aims to combine an OWL/DL representation of POS tags with a suitable symbolic machine learning tool (DL-Learner, [4]). The goal is to improve the performance of a state-of-the-art statistical POS tagger with human-interpretable post-correction rules formulated as OWL/DL expressions.
[1] Christopher D. Manning. 2011. Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? In Alexander Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, 12th International Conference, CICLing 2011, Proceedings, Part I. Lecture Notes in Computer Science 6608, pp. 171–189.
[2] Geoffrey Sampson. 1995. English for the Computer: The SUSANNE Corpus and Analytic Scheme. Clarendon Press (Oxford University Press).
[3] Christian Chiarcos. 2010. Towards Robust Multi-Tool Tagging: An OWL/DL-Based Approach. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010.
[4] Jens Lehmann. 2009. DL-Learner: Learning Concepts in Description Logics. Journal of Machine Learning Research, Volume 10, pp. 2639–2642.