Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/36026
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Nicol, Robert L | -
dc.contributor.advisor | Bhowmik, Deepayan | -
dc.contributor.author | Ali, Teymoor R | -
dc.date.accessioned | 2024-05-28T13:23:23Z | -
dc.date.issued | 2023-12-29 | -
dc.identifier.citation | Teymoor Ali, Deepayan Bhowmik, and Robert Nicol. 2023. Domain-Specific Optimisations for Image Processing on FPGAs. J. Signal Process. Syst. 95, 10 (Oct 2023), 1167–1179. https://doi.org/10.1007/s11265-023-01888-2 | en_GB
dc.identifier.uri | http://hdl.handle.net/1893/36026 | -
dc.description.abstract | As real-time embedded vision systems become more ubiquitous, energy efficiency, runtime, and accuracy have become vital metrics in evaluating overall performance. These requirements have led to innovative computing architectures that leverage heterogeneity by combining various accelerators into a single processing fabric. These new architectures lead to new challenges in understanding the most efficient way to partition and optimise algorithms on the most suitable accelerator. In this thesis, domain-specific optimisation techniques are applied to enhance performance and resource efficiency for image processing algorithms on heterogeneous hardware. Domain-specific optimisations are preferred for being hardware agnostic and for their ability to cater to a wider range of image processing pipelines within the domain. First, a literature analysis is conducted on image processing implementations on heterogeneous hardware, high-level synthesis tools, optimisation strategies, and frameworks. The first objective is to develop macro-micro benchmarks for image processing algorithms to determine the suitability of these algorithms on hardware accelerators. The profiling led to the development of a comprehensive benchmarking framework, Heterogeneous Architecture Benchmarking on Unified Resources (HArBoUR). The framework decomposes each algorithm into the fundamental properties that affect overall performance. A collection of representative image processing algorithms from various operation domains (e.g. filters, morphological, geometric, arithmetic, CNNs, feature extraction) and full pipelines (e.g. edge detection, feature extraction, convolutional neural network) are used as examples to understand their compute efficiency on three hardware platforms (CPU, GPU, FPGA). The results show that parallelism and memory access patterns influence hardware performance.
GPUs excel at algorithms with highly parallel operations on large data sizes and regular memory access patterns. FPGAs better suit operations with lower parallelism factors and smaller data sizes. In addition, optimising for irregular memory access patterns and complex computations remains challenging on both FPGA and GPU architectures. FPGAs nevertheless offer high performance relative to their resource usage and clock speed, although their specialised architecture requires careful implementation for optimal results. For feature extraction algorithms, GPU acceleration is preferable for stages dominated by matrix operations because of faster execution times, while FPGAs are more suitable for stages with lower arithmetic intensity because of their comparable performance and energy consumption profiles. Edge detection and CNN pipelines run faster on GPUs but at significantly higher energy consumption than on FPGAs. FPGAs exhibit lower latency than GPUs once initialisation and memory transfer times are considered. CPUs perform comparably to both accelerators in low-complexity and data-dependent algorithms. In CNN pipelines, FPGAs compute particular layers faster but generally have slower total inference times than GPUs. Nonetheless, FPGAs offer flexibility with bit-widths and operation-fused custom kernels. Domain-specific optimisations are applied to algorithms such as SIFT feature extraction, filter operations, and CNN pipelines to understand their effect on runtime, energy, and accuracy. Techniques such as downsampling, datatype conversion, and convolution kernel size reduction are investigated to enhance performance. These optimisations notably improve computation time across different processing architectures, with the SIFT algorithm implementation surpassing state-of-the-art FPGA implementations and achieving comparable runtime to GPUs at low power. However, these optimisations led to a 5-20% loss in image accuracy across all algorithms.
Finally, the research outcomes described above are applied to two constructed heterogeneous architectures aimed at two domains: low-power (LP) and high-power (HP) systems. Partitioning strategies are explored for mapping CNN layers and operation stages of feature extraction algorithms onto heterogeneous architectures. The results demonstrate that layer-based partitioning methods outperform their fastest homogeneous accelerator counterparts in energy efficiency and execution time, suggesting a promising approach for efficient deployment on heterogeneous architectures. | en_GB
dc.language.iso | en | en_GB
dc.publisher | University of Stirling | en_GB
dc.subject | Image Processing | en_GB
dc.subject | FPGA | en_GB
dc.subject | Heterogeneous Computing | en_GB
dc.subject | Hardware Benchmarking | en_GB
dc.subject | Computer Vision | en_GB
dc.subject | Domain-Specific Optimisations | en_GB
dc.subject.lcsh | Computer architecture | en_GB
dc.subject.lcsh | Computer networks | en_GB
dc.subject.lcsh | Computer vision | en_GB
dc.subject.lcsh | Computer algorithms | en_GB
dc.subject.lcsh | Image processing | en_GB
dc.subject.lcsh | Field programmable gate arrays | en_GB
dc.title | Domain-Specific Optimisations for Image Processing Algorithms on Heterogeneous Architectures | en_GB
dc.type | Thesis or Dissertation | en_GB
dc.type.qualificationlevel | Doctoral | en_GB
dc.type.qualificationname | Doctor of Philosophy | en_GB
dc.rights.embargodate | 2025-12-31 | -
dc.rights.embargoreason | Delay in order for the industrial sponsored company to process the data and confirm no IP is compromised/patentable within the thesis. In addition, time to write articles for publication. | en_GB
dc.contributor.funder | STMicroelectronics | en_GB
dc.author.email | Teymoor.A@outlook.com | en_GB
dc.rights.embargoterms | 2026-01-01 | en_GB
dc.rights.embargoliftdate | 2026-01-01 | -
Appears in Collections: Computing Science and Mathematics eTheses

Files in This Item:
File | Description | Size | Format
Teymoor_Thesis_2023.pdf | Under Embargo until 2026-01-01 | 17.73 MB | Adobe PDF


This item is protected by original copyright
Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.