Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/28051
Appears in Collections: Computing Science and Mathematics Conference Papers and Proceedings
Author(s): Böschen, Falk
Scherp, Ansgar
Title: Formalization and preliminary evaluation of a pipeline for text extraction from infographics
Editor(s): Görg, S
Bergmann, R
Müller, G
Citation: Böschen F & Scherp A (2015) Formalization and preliminary evaluation of a pipeline for text extraction from infographics. In: Görg S, Bergmann R & Müller G (eds.) Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, volume 1458. CEUR Workshop Proceedings, 1458. LWA 2015 Workshops: KDML, FGWM, IR, FGDB, Trier, Germany, 07.10.2015-09.10.2015. Aachen, Germany: CEUR Workshop Proceedings, pp. 20-31. http://ceur-ws.org/Vol-1458/D03_CRC13_Boeschen.pdf
Issue Date: 31-Dec-2015
Series/Report no.: CEUR Workshop Proceedings, 1458
Conference Name: LWA 2015 Workshops: KDML, FGWM, IR, FGDB
Conference Dates: 2015-10-07 - 2015-10-09
Conference Location: Trier, Germany
Abstract: We propose a pipeline for text extraction from infographics that makes use of a novel combination of data mining and computer vision techniques. The pipeline defines a sequence of steps to identify characters, cluster them into text lines, determine their rotation angle, and apply state-of-the-art OCR to recognise the text. In this paper, we formally define the pipeline and present its current implementation. In addition, we have conducted preliminary evaluations over a data corpus of 121 manually annotated infographics from a broad range of illustration types, such as bar charts, pie charts, line charts, maps, and others. We assess the results of our text extraction pipeline by comparing them with two baselines. Finally, we sketch an outline for future work and possibilities for improving the pipeline.
Status: VoR - Version of Record
Rights: The copyright is owned by default by the authors. Copying is permitted only for private and academic purposes. The permission for academic use implies an attribution obligation, i.e., you must properly cite the items that you use in your own published work. Modification is not permitted unless a suitable license is granted by its copyright owners. Copying or use for commercial purposes is forbidden unless an explicit permission is acquired from the copyright owners.
URL: http://ceur-ws.org/Vol-1458/D03_CRC13_Boeschen.pdf

Files in This Item:
File: Böschen-Scherp-2015.pdf
Description: Fulltext - Published Version
Size: 433.5 kB
Format: Adobe PDF


