Appears in Collections: Computing Science and Mathematics Conference Papers and Proceedings
Author(s): Böschen, Falk
Scherp, Ansgar
Title: Formalization and preliminary evaluation of a pipeline for text extraction from infographics
Editor(s): Görg, S
Bergmann, R
Müller, G
Citation: Böschen F & Scherp A (2015) Formalization and preliminary evaluation of a pipeline for text extraction from infographics. In: Görg S, Bergmann R & Müller G (eds.) Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, volume 1458. CEUR Workshop Proceedings, 1458. LWA 2015 Workshops: KDML, FGWM, IR, FGDB, Trier, Germany, 07.10.2015-09.10.2015. Aachen, Germany: CEUR Workshop Proceedings, pp. 20-31.
Issue Date: 31-Dec-2015
Date Deposited: 22-Oct-2018
Series/Report no.: CEUR Workshop Proceedings, 1458
Conference Name: LWA 2015 Workshops: KDML, FGWM, IR, FGDB
Conference Dates: 2015-10-07 - 2015-10-09
Conference Location: Trier, Germany
Abstract: We propose a pipeline for text extraction from infographics that makes use of a novel combination of data mining and computer vision techniques. The pipeline defines a sequence of steps to identify characters, cluster them into text lines, determine their rotation angle, and apply state-of-the-art OCR to recognise the text. In this paper, we formally define the pipeline and present its current implementation. In addition, we have conducted preliminary evaluations over a corpus of 121 manually annotated infographics from a broad range of illustration types, such as bar charts, pie charts, line charts, maps, and others. We assess the results of our text extraction pipeline by comparing them with two baselines. Finally, we sketch an outline for future work and possibilities for improving the pipeline.
Status: VoR - Version of Record
Rights: The copyright is owned by default by the authors. Copying is permitted only for private and academic purposes. The permission for academic use implies an attribution obligation, i.e., you must properly cite the items that you use in your own published work. Modification is not permitted unless a suitable license is granted by its copyright owners. Copying or use for commercial purposes is forbidden unless an explicit permission is acquired from the copyright owners.

Files in This Item:
File: Böschen-Scherp-2015.pdf
Description: Fulltext - Published Version
Size: 433.5 kB
Format: Adobe PDF

This item is protected by original copyright

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved

If you believe that any material held in STORRE infringes copyright, please contact the repository staff, providing details, and we will remove the Work from public display in STORRE and investigate your claim.