Appears in Collections: Computing Science and Mathematics Journal Articles
Peer Review Status: Refereed
Title: RIPL: A Parallel Image Processing Language for FPGAs
Author(s): Stewart, Robert
Duncan, Kirsty
Michaelson, Greg
Garcia, Paulo
Bhowmik, Deepayan
Wallace, Andrew
Keywords: Computing methodologies
Image processing
Computer systems organization
Data flow architectures
Reconfigurable logic and FPGAs
Software and its engineering
Domain specific languages
Issue Date: 31-Mar-2018
Citation: Stewart R, Duncan K, Michaelson G, Garcia P, Bhowmik D & Wallace A (2018) RIPL: A Parallel Image Processing Language for FPGAs. ACM Transactions on Reconfigurable Technology and Systems, 11 (1), Art. No.: 7.
Abstract: Specialized FPGA implementations can deliver higher performance and greater power efficiency than embedded CPU or GPU implementations for real-time image processing. Programming challenges limit their wider use, because the implementation of FPGA architectures at the register transfer level is time consuming and error prone. Existing software languages supported by high-level synthesis (HLS), although providing a productivity improvement, are too general purpose to generate efficient hardware without the use of hardware-specific code optimizations. Such optimizations leak hardware details into the abstractions that software languages are there to provide, and they require knowledge of FPGAs to generate efficient hardware, such as by using language pragmas to partition data structures across memory blocks. This article presents a thorough account of the Rathlin image processing language (RIPL), a high-level image processing domain-specific language for FPGAs. We motivate its design, based on higher-order algorithmic skeletons, with requirements from the image processing domain. RIPL’s skeletons suffice to elegantly describe image processing stencils, as well as recursive algorithms with nonlocal random access patterns. At its core, RIPL employs a dataflow intermediate representation. We give a formal account of the compilation scheme from RIPL skeletons to static and cyclostatic dataflow models to describe their data rates and static scheduling on FPGAs. RIPL compares favorably to the Vivado HLS OpenCV library and C++ compiled with Vivado HLS. RIPL achieves between 54 and 191 frames per second (FPS) at 100MHz for four synthetic benchmarks, faster than HLS OpenCV in three cases. Two real-world algorithms are implemented in RIPL: visual saliency and mean shift segmentation. For the visual saliency algorithm, RIPL achieves 71 FPS compared to optimized C++ at 28 FPS. 
RIPL is also concise, being 5x shorter than C++ and 111x shorter than an equivalent direct dataflow implementation. For mean shift segmentation, RIPL achieves 7 FPS compared to 1.1 FPS for optimized C++ on 64 CPU cores, and RIPL is 10x shorter than the direct dataflow FPGA implementation.
DOI Link: 10.1145/3180481
Rights: This work is licensed under a Creative Commons Attribution International 4.0 License.

Files in This Item:
File: a7-stewart.pdf
Description: Fulltext - Published Version
Size: 3.5 MB
Format: Adobe PDF

This item is protected by original copyright

A file in this item is licensed under a Creative Commons License.

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

If you believe that any material held in STORRE infringes copyright, please contact STORRE, providing details, and we will remove the Work from public display in STORRE and investigate your claim.