Local optima networks for continuous fitness landscapes

Local Optima Networks (LONs) have been proposed as a coarsegrained model of discrete (combinatorial) fitness landscapes, where nodes are local optima and edges are search transitions based on an exploration search operator. This paper presents one of the first complex network analysis of continuous fitness landscapes. We use benchmark functions with well-known global structure, and an existing implementation of a Basin-Hopping algorithm to extract the networks. We also explore the impact of varying the Basin-Hopping perturbation step-size. Our results suggest that the landscape's connectivity pattern (global structure) strongly varies with the perturbation step-size, with extreme values of this parameter being detrimental to search and fragmenting the global structure. Our LON visualisations strikingly illustrate the landscape's global (funnel) structure, indicating that LONs serve as a tool for visualising high-dimensional functions.


INTRODUCTION
Complex optimisation problems are generally non-convex and multi-modal (i.e have multiple local optima and potentially multiple global optima). Moreover, in real-world scenarios the multiple local optima are likely to conform to some global structure, instead of being distributed uniformly at random in the search space.
Local Optima Networks (LONs) [21] are a coarse-grained model of itness landscapes inspired by work on energy surfaces in theoretical chemistry [5]. The idea is to compress the search space Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proit or commercial advantage and that copies bear this notice and the full citation on the irst page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior speciic permission and/or a fee. Request permissions from permissions@acm.org. GECCO  into a smaller mathematical object: a graph, where vertices are the local optima in the underlying landscape and edges are the possible search transitions among optima. A conspicuous limitation of heuristic optimisation is the danger of getting trapped at a local optimum. LONs capture the number, distribution and connectivity pattern of local optima in the underlying landscape. They are, therefore, a useful tool to study the global structure of itness landscapes. Once a LON has been constructed, a variety of metrics and visualisation tools can be applied to enlighten our understanding of its structure [19]. These features can then be used for performance prediction and enhancing the selection and coniguration of suitable optimisation methods to solve the problem at hand.
Despite their inspiration from the study of energy landscapes, the LON model was deined and developed to deal with discrete search spaces and combinatorial optimisation [21,26], and has only been applied, to the authors' knowledge, to continuous spaces once before [2]. In the discrete case, local minima can be located precisely given suicient computation time. In the continuous domain this is more diicult; concepts from diferential topology, the presence of saddle points, and issues of precision and convergence come into play.
The main contributions of this paper are to: (1) Adapt the LON model to represent continuous itness landscapes. (2) Propose a sampling methodology to extract and construct the networks. (3) Analyse the funnel structure of well-known benchmark functions. (4) Explore the efect of increasing the perturbation strength on the global structure of the studied landscapes.
We start with an overview of the notion of a funnel. Section 3 presents relevant deinitions and algorithms to construct the LON models. Section 4 describes the benchmark instances, sampling procedure, and metrics computed, while 5 presents our results. Finally, Sections 6 and 7 summarises our indings and suggests directions of future work.

THE NOTION OF FUNNEL
Funnels can be loosely deined as groups of local optima which are close in coniguration space within a group, but well-separated between groups. Funnels also constitutes a coarse-grained gradient towards a low cost optimum. The intuition is captured by Figure 1 where two funnels are depicted. A more formal deinition is given in Section 3.3 X f(X) Figure 1: Depiction of two funnels.

Funnels in Energy Landscapes
In theoretical/computational chemistry, an energy landscape is a mapping of all possible conformations of a molecular entity (clusters, glasses and proteins) to their corresponding energy levels [27]. The structure and dynamics of these molecules are known to be related to the topographical features of their energy landscapes [6,27].
The term 'funnel' was introduced in the protein folding community to describe ła region of coniguration space that can be described in terms of a set of downhill pathways that converge on a single low-energy structure or a set of closely-related low-energy structures. ž [6]. The energy landscape of proteins is believed to be characterised by a single deep funnel, a feature that underpins their ability to fold to their native state. In contrast, some shorter polymer chains (polypeptides) that misfold, are expected to have other funnels that can act as traps. Funnels have been widely studied on the so called atomic clusters (spatial arrangements of atoms), as they represent a convenient benchmark of controllable complexity for which excellent putative global minima have been tabulated [6].

Funnels in Continuous Fitness Landscapes
Energy landscapes in computational chemistry and itness landscapes in optimisation are analogous. This is particularly true for continuous optimisation. Locatelli [12] found that search diiculty in continuous optimisation does not relate strictly to the number of local optima, but to how chaotic their positions are. He suggests that problems are structured in multiple levels; at each level, diferent 'objects' are observed, but all levels display a similar structure. This hints towards a fractal structure of itness landscapes [25].
Lunacek and Whitley [14] propose a dispersion metric to quantify the proximity of the best regions in the search space. A high dispersion metric indicates the presence of multiple funnels. In a follow up work [15], the authors studied abstract landscapes with two funnels and ind that evolutionary algorithms mostly fail when the global optimum is located in a proportionally smaller funnel.
Work on exploratory landscape analysis of continuous landscapes suggests that multimodality and global structure are key high-level features help to diferentiate between problem classes [3,9]. The presence of funnels, captured with the dispersion metric, was also found to correlate with and explain the performance of Particle Swarm Optimisation algorithms [16].

Funnels and Local Optima Networks
Recent studies have used LONs to characterise the global structure of combinatorial itness landscapes revealing in many cases a multifunnel structure [7,22,23]. The presence of sub-optimal funnels hinders the optimisation success, as the algorithms can become trapped, and thus fail to locate the funnel containing the global optimum.

ALGORITHMS AND DEFINITIONS
A full enumeration of the local optima for continuous spaces and benchmark instances of non-trivial size is unmanageable. Therefore, LONs are constructed from a sample of high-quality local minima in the search space. To construct the networks we need to deine their nodes and edges. These deinitions are closely related to the methodology for extracting the network data, which is based on a number of runs of a Basin-Hopping algorithm [11,28]. The idea is to record and aggregate the local optima visited by several trajectories of the algorithm. Our local minima sample is, therefore, biased towards regions of high-itness and is not algorithm-agnostic. We argue that this approach is relevant to analyse the global structure of the benchmark itness landscapes studied. Moreover, we contrasted several perturbation step sizes to deine the local optima connectivity. This section describes the Basin-Hopping algorithm and deines the LON models used.

Basin-Hopping
In computational chemistry, global optimisation algorithms are required to minimise the energy function of molecular conformation problems. A successful algorithm for these problem was termed Basin-Hopping by Wales and Doye [28], which is essentially the same as the previous Monte Carlo-minimisation algorithm of Li and Scheraga [11]. Interestingly, Basin-Hopping is analogous to the class of algorithms known as Iterated Local Search (ILS) in combinatorial optimisation [13].
Both Basin-Hopping and ILS are iterative methods with each cycle composed of the following steps: (i) random perturbation of the incumbent solution (ii) local minimisation, and (iii) acceptance criterion, which accepts or rejects the new solution based on the objective function value. The original Basin-Hopping algorithm depends on the Metropolis acceptance criterion of occasionally accepting uphill moves. A simple yet powerful variant is obtained if the Metropolis acceptance criterion is abandoned in favour of only accepting downhill (or non-deteriorating) steps [10]. Note that this corresponds to setting the Metropolis temperature parameter T = 0. This algorithm will follow a descending sequence of local minima until a funnel bottom is reached. Funnel bottoms can be empirically recognised (estimated) by the lack of improvement after a large number of move attempts. We used this Monotonic Sequence Basin-Hopping variant [10] (Algorithm 1) in our study in order to extract the LON models and study the landscapes funnel structure.

Model Description
A itness landscape is deined as a triplet (X , N , f ), where X is the set of conigurations or candidate solutions; N is a notion of neighbourhood, distance or accessibility on X ; and f : X → R is the itness function.
In the context of continuous optimisation, X is the set R n of all possible real-valued solutions to the problem of n dimensions, f : R n → R, and we denote a candidate solution as the vector x = (x 1 , x 2 , . . . , x n ). A common way of deining neighbourhood

Algorithm 1 Monotonic Sequence Basin-Hopping
Require: Search space X , Fitness function f (X ), Perturbation step size p, Stopping threshold R 1: Choose initial random solution x 0 ∈ X 2: l ← LocalMinimisation(x 0 ) 3: r ← 0 4: repeat 5: r ← 0 10: else 11: r ← r + 1 12: end if 13: until r ≥ R 14: return l of a solution x in continuous spaces is the set of points within the hypersphere with some small radius and centre x [24]. However, this deinition of neighbourhood requires Euclidean distance calculations to test for neighbourhood. In this paper we deine neighbourhood based on orthotopes (or hyperrectangles).
Formally, the neighbourhood set N (x k ) of an n-dimensional point x k is an n-orthotope deined as follows: where s = (s 1 , s 2 , . . . , s n ) is a vector deining the size of the neighbourhood in all dimensions.
In order to study the multi-funnel structure of continuous itness landscapes, we considered and adapted the Monotonic (MLON) and Compressed Monotonic (CMLON) models introduced in [23].
Monotonic LON. Is the directed graph MLON = (L, E), where nodes are the local optima L, and edges E are the monotonic perturbation edges.
Local optima. We assume a search space R n with a itness function f (x) and a neighbourhood function N (x). A local optimum, which in the studied benchmark functions is a minimum, is a solution l such that ∀x ∈ N (l), f (l) ≤ f (x). Notice that the inequality is not strict, in order to allow the treatment of neutrality (local optima of equal itness), which is generally present in complex real-world problems. The set of local optima, which corresponds to the set of nodes in the network model, is denoted by L.
Monotonic perturbation edges. Edges are directed and based on the perturbation operator (adding a stepsize). There is an edge from local optimum l 1 to local optimum l 2 , if l 2 can be obtained after applying a random perturbation to l 1 followed by local minimisation, and f (l 2 ) ≤ f (l 1 ). These edges are called monotonic as they record only non-deteriorating transitions between local optima. Edges are weighted with estimated frequencies of transition. The edge weights are estimated after the sampling process. The weight is the number of times a transition between two local optima basins occurred with a given perturbation. The set of edges is denoted by E.

Compressed Monotonic LON Model
This is a coarser LON model compressing connected local optima at the same itness level (according to a given accuracy) into single nodes. The instances studied in this paper only show a small amount of neutrality. We consider this model, however, for the sake of generality, and because it allows a crisper characterisation of the funnel bottoms or sinks.
Compressed Monotonic LON. Is the directed graph CMLON = (CL, CE), where nodes are compressed local optima CL, as deined below, and the edges CE are aggregated from the monotonic edge set E by summing up the edge weights.
Compressed local optima. A compressed local optimum is a set of connected nodes in the MLON with the same itness value. Two nodes in the MLON are connected if there is a perturbation edge between them. The set of connected MLON optima with the same itness, denoted by CL, corresponds to the set of nodes in the Compressed Monotonic LON model.
There is a natural end to every monotonic sequence, cl s , when no improving transitions can be found. This corresponds to the funnel 'bottom'. In the directed CMLON network, cl s will be a node without outgoing edges (or 'sink') 1 .
Funnel. We characterise funnels in the CMLON as the aggregation of all monotonic sequences ending at the same point (funnel bottom or sink). Funnels can be seen as basins of attraction at the level of local optima.

EMPIRICAL METHODOLOGY 4.1 Benchmark Functions
Our experiments used three well-known benchmark functions: Ackley's, Rastrigin, and Birastrigin (also known as the Double Rastrigin or Lunacek's function). These are detailed below, and visualised for two variables in Figure 3.
Ackley's Function. This function was initially deined in two dimensions, but the extended form [1] has been used for our experimentation in order to investigate multiple dimensions. It is expressed as Ackley's function centres around a single funnel with a high number of local minima. While it appears to be relatively simple, it is an important benchmark as it is reminiscent of the free energy landscape of proteins, and therefore has potential real-world application [4].
Rastrigin Function. Like Ackley's function, Rastrigin has a single funnel landscape, but with considerably fewer local minima. The diiculty of this search space is in the increased distance between those local minima, and lattened search space. These properties mean that local searches tend to become trapped, whereas global searches must possess an adequate step-size. It is expressed as This function has one global minimum at [0, 0, . . . , 0] with a itness of 0.0, and is typically evaluated within the range [−5.12, 5.12] for all x i .
Birastrigin Function. The Birastrigin (or Lunacek's function [15]) is the resulting hybrid of the double-sphere and Rastrigin functions; efectively creating a double-funnel problem with local optima. This function is intended to create a more diicult benchmark for global optimisation methods that closely resembles those found in real world problems ś speciically those in computational chemistry, such as Lennard-Jones clusters [5]. This double-funnel landscape is particularly challenging for population based methods. It is expressed as

Sampling Method
The sampling procedure consists of aggregating the local minima and transition edges obtained by 100 runs of basin-hopping (Algorithm 1). The stopping condition was set to 1000 iterations without an improvement. We used the basin-hopping implementation provided in Scipy [8], with the following coniguration and parameter values: Initialisation. The initial solution is a random n-dimensional coordinate vector x 0 with independent identically distributed components generated from a uniform distribution within the given problem bounds.
Local minimisation. The local minimisation step used the limited-memory variant of the Broyden-Fletcher-Goldfarb-Shanno algorithm [20] (L-BFGS-B). This version of BFGS uses less memory by storing only a few vectors instead of the estimation of the entire inverse Hessian matrix, ensuring the scalability of this sampling technique to much larger search-spaces. As this variant is also At the end of each local optimisation step, the solution found is stored as a local optimum to form the nodes of the LON. In continuous spaces, there is the issue of deciding when two solutions found through diferent local search runs are close enough in solution space to be regarded as the same local optimum. In this study, we set the local optimum position threshold to 10 −5 in all dimensions.
Perturbation. The perturbation strength is an essential parameter in deining the search-landscape as it has been shown to have dramatic efects on the shape of said landscape [18]. If it is inadequate, the search will not be able to escape the basin in which the solution originates, but if it is too strong, it will move through the search-space, randomly sampling diferent regions without learning from the landscape. This is especially problematic when the search has progressed to a region with relatively high itness, where large perturbations are likely to be deteriorating.
We determined the base perturbation step (β) for each function in 3D and 5D spaces by randomly sampling the space, performing a local search, and perturbating that solution based on a β value. This β was then varied until approximately 50% of the steps resulted in escaping the original local optimum. This method for selecting perturbation strength via a sampling technique was adopted from that used by Leary [10]. The resulting step sizes are detailed in Table 1. For each function and dimension combination, four perturbation step-sizes were used: p ∈ {0.5β, β, 2β, nβ } (where n is the dimension). The perturbation was performed by applying a displacement to each coordinate of the vector sampled from a uniform distribution within the perturbation step-size, x ′ = x + U ni f orm(−p, p).

Metrics
For each instance, problem dimension, and perturbation step size, we extracted the LON models and computed the measurements described in Table 2. Metrics are reported as aggregations over 100 runs.   Table 2) for four perturbation step sizes, which are multiples of the base (β as deined in Table 1).

Structural and Performance Metrics
As seen in Figure 2, the perturbation strength (pstep) has substantive efects on the structural metrics derived from LONs. The total number of optima found over the course of the basin-hopping experiments generally increased as more suitable pstep values were used, before decreasing as it became excessive. In regards to Ackley's function, 0.5β best explored the search space, increasing to β in 5 dimensions, with a similar efect seen with the Rastrigin peaking at β in 3 dimensions and 2β in 5. This increase in perturbation strength was not required for the Birastrigin function, as 2β was favoured in both investigated dimensions, there was, however, a much more stark increase in discovered optima in the higher dimensional space.
The number of discovered optima correlates well with the number of funnels detected, and the strength of their basins. With 0.5β, we can see that a low number of optima were discovered, but a large number of funnels, indicating that the basin-hopping algorithm rarely escaped the nearest local optima to its origin point. This is most pronounced in the Rastrigin and Birastrigin 5D functions showing a stark decrease in the number of funnels between pstep = β and pstep = 2β, where we were able to uncover the underlying single and double funnel landscapes. This suggests that, the right perturbation step size better exploits the underlying global structure.
This is further reinforced in the success rate, that is, the percentage of runs inding the global optimum. With 0.5β, only the simpler Ackley's function showed any signs of success, with higher dimensionality requiring β, much in line with the number of observed optima and funnels. Most notably, the success rates on the 5D Rastrigin and Birastrigin functions share a very a similar behaviour to the strength, number of optima, and number of funnels. We can also see a similar detrimental behaviour to increased βs, suggesting that higher dimensional functions may be more sensitive to excessive perturbation sizes than those with lower dimensions.
The mean distance of the inal solutions found from the global optimum (deviation) clearly displays a decrease in accordance with the strength of the perturbation parameter. When considering this with the success rates, we can suggest that larger step sizes in basin hopping may decrease the likelihood of inding the global optimum, but increases the chances of achieving, on average, a good solution. Figure 3 provides a 3D representation of the three benchmark instances selected. For practical purposes, we can only visualise functions with two variables in this way as there is no clear way of visually conveying itness landscapes for functions with many variables.

Visualisation
One of the advantages of modelling itness landscapes with LONs is the possibility of visualising these higher dimensional functions. To illustrate this point, Figure 4 visualises the compressed monotonic LONs for the three benchmark functions with ive variables, n = 5. In these plots, the nodes are local minima, and edges are perturbation transitions with step size p = 2β (the values of β, for each function and dimension can be found in Table 1). The size of nodes is proportional to their incoming weighted degree (also called strength), which indicates how much a node 'attracts' the search process. Node colours indicate funnel membership, with pink highlighting the funnel containing the global optimum, and light blue demonstrating the sub-optimal funnels. The red node identiies the global optimum, while the dark blue nodes represent the bottom of sub-optimal funnels (which we also call sinks).
By contrasting the images in Figures 3 and 4, we can appreciate the overall resemblance of the respective functions. In both cases, Ackley and Rastrigin have a single funnel structure, while Birastrigin has a double funnel structure. The smoother shape of the Rastrigin funnels can also appreciated.
In order to illustrate the efect of the perturbation step-size in the LONs connectivity pattern, Figure 5 shows 2D LON visualisations for the three benchmark functions with 5 variables n = 5 and three increasing perturbation step-sizes, pstep, as indicated in the sub-igures captions. It is worth noting that the sampling process aggregates local optima and transition edges from 100 Basin-Hopping independent runs, so the maximum possible number of estimated funnels is 100. The success rate achieved by these 100 independent runs is also shown in the sub-igures caption.
In the images, the size of nodes is proportional to their incoming strength, and the colours indicate funnel membership. The pink nodes belong to the funnel containing the global optimum, while light blue nodes belong to sub-optimal funnels. The red node (when visible) is the global optimum, while dark blue nodes indicate the sub-optima funnel bottoms or sinks.  With the lowest perturbation step in the Ackley function (plot (a)) only a portion of the runs (success = 0.26) reached the global optimum. The successful trajectories are indicated by the pink nodes ending at the global optimum (red node). The remaining runs end up trapped in sub-optimal sinks, with trajectory lengths of one or two hops. For the largest perturbation step (plots (b) and (c)) a single funnel can now be appreciated, where all the search trajectories converge to the global optimum.
For the Rastrigin function and a low perturbation step (plot (d)), the global optimum cannot be reached; the 100 runs all end up trapped in a sub-optimal funnel after a few search transitions. For a perturbation step of 2β (plot (e)) a single funnel is observed, with most trajectories converging to the global optimum (which is hidden behind the pink nodes in this 2D projection). Increasing the perturbation step to 5β (plot (f)) causes the landscape to fragment again into multiple funnels, with several trajectories trapped in sub-optimal sinks. The Birastrigin function shows a similar trend to the Rastrigin function, with the diferences being observed at the intermediate perturbation step-size pstep = 2β (plot (h)), where two funnels appear, with the sub-optimal funnel in blue attracting a larger proportion (65%) of the search trajectories.

DISCUSSION
This proof-of-concept paper shows that it is possible to use LONs to model and visualise the global structure of continuous search spaces. There are, however, a number of limitations and challenges regarding the sampling methodology that need to be addressed before the approach can be applied more widely.
One challenge relates to the sampling of initial solutions to be used as the starting points for local minimisation. The approach used in this paper is to sample each component from a uniform random distribution. A problem with this approach is that, as the number of dimensions increase, the position of the full solution vector is biased towards the centre of the search space. In addition, each initial solution is sampled independently and so there is no guarantee that the diferent starting positions provide a good coverage of the search space. Latin hypercube sampling [17] is a random approach that provides a better distribution in multidimensional spaces than uniform random sampling. Future work could investigate the efect of the sampling approach of initial solutions on the LON models. The choice of method for performing local optimisation is an important consideration in the sampling for LON construction. In this study we used a variant of the BFGS algorithm, which is a quasi-Newton method that uses an approximation of the gradient of the objective function to direct the search. An alternative local search approach may need to be used for black-box optimisation scenarios with expensive function evaluations. There are, however, scenarios where the gradient information is available.
The approach used in this study is computationally expensive. It may not be necessary to perform as many runs of the Basin-Hopping algorithm or to perform as many iterations in the local optimisation step. Further work could investigate cheaper ways to sample continuous search spaces in order to construct LONs that are less detailed, but still informative.

CONCLUSION
This paper detailed a methodology of extracting local optima networks from continuous functions. We demonstrated that the perturbation strength of the Basin-Hopping algorithm is an essential parameter to tune in order to extract the underlying nature of the function ś landscapes appear fractured at values without appropriate strengths, coming together to form the single and double-funnel structures known to exist in the problem domain when the perturbation strength is better chosen. We show that LONs can be used as a tool for visualising itness landscapes, displaying that we can preserve the fundamental topology of the high-dimensional functions. Future work will address the limitations discussed, aiming at analysing larger dimensions and real-world problems.