Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/26444
Appears in Collections: Computing Science and Mathematics Conference Papers and Proceedings
Author(s): Attila, Kocsis Zoltan
Drake, John H
Carson, Douglas
Swan, Jerry
Title: Automatic improvement of Apache Spark queries using semantics-preserving program reduction
Citation: Attila Kocsis Z, Drake JH, Carson D & Swan J (2016) Automatic improvement of Apache Spark queries using semantics-preserving program reduction. In: GECCO 2016 Companion - Proceedings of the 2016 Genetic and Evolutionary Computation Conference, New York: ACM. 2nd International Genetic Improvement Workshop 2016: An International Workshop on the Repair and Optimisation of Software using Computational Search, 20.7.2016 - 20.7.2016, Denver, CO, USA, pp. 1141-1146.
Issue Date: 2016
Conference Name: 2nd International Genetic Improvement Workshop 2016: An International Workshop on the Repair and Optimisation of Software using Computational Search
Conference Dates: 2016-07-20
Conference Location: Denver, CO, USA
Abstract: Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application of semantics-preserving transformations. The transformation method is inspired by functional programming techniques of "deforestation", which eliminate intermediate data structures from a computation. This contrasts with approaches defined entirely within structured query formats such as Spark SQL. Hylas can identify certain computationally expensive operations and ensure that performing them creates no superfluous data structures. This optimisation leads to significant improvements in execution time, with over 10,000 times improvement observed in some cases.
Status: Book Chapter: author post-print (pre-copy editing)
Rights: Publisher policy allows this work to be made available in this repository. Published in GECCO 2016 Companion - Proceedings of the 2016 Genetic and Evolutionary Computation Conference, New York: ACM. 2nd International Genetic Improvement Workshop 2016: An International Workshop on the Repair and Optimisation of Software using Computational Search, 20.7.2016 - 20.7.2016, Denver, CO, USA, pp. 1141-1146. The original publication is available at: https://doi.org/10.1145/2908961.2931692
URL: http://dl.acm.org/citation.cfm?doid=2908961.2931692

Files in This Item:
File: main.pdf
Size: 263.09 kB
Format: Adobe PDF



This item is protected by original copyright



Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.