SPARSE MULTIDIMENSIONAL EXPONENTIAL ANALYSIS WITH AN APPLICATION TO RADAR IMAGING\ast

We present a d-dimensional exponential analysis algorithm that offers a range of advantages compared to other methods. The technique does not suffer the curse of dimensionality and only needs O((d + 1)n) samples for the analysis of an n-sparse expression. It does not require a prior estimate of the sparsity n of the d-variate exponential sum. The method can work with sub-Nyquist sampled data and offers a validation step, which is very useful in low SNR conditions. A favorable computation cost results from the fact that d independent smaller systems are solved instead of one large system incorporating all measurements simultaneously. So the method easily lends itself to a parallel execution. Our motivation to develop the technique comes from 2-D and 3-D radar imaging and is therefore illustrated on such examples.

from noisy data, optimally using a cost effective algorithmic solution rather than an expensive advanced radar system. This application will serve as a guiding example throughout the paper.
ISAR imaging is a system that consists of a real-aperture radar emitting a sequence of high frequency bursts, and a moving target in the far field of the radar, causing backscattering. When the target is hit by an electromagnetic wave, a limited number of locations on the object, such as edges and surface discontinuities, scatter the energy back toward the observation point. The locations of these concentrated sources of scattered energy are called scattering centers, each of which can be described by a multivariate complex exponential. ISAR is widely used and plays an important role in target identification, commercial aircraft classification, military surveillance, and the like.
So the scattering center model in ISAR consists of a finite linear combination of complex exponentials that describe the different scattering centers of the radar target, where the number of these scattering centers is considerably lower than the number of image pixels. Although the model is both simple and sparse, the inverse problem of reliably extracting the location of the scattering centers is rather sensitive to noise [37]. Therefore, the problem has generated a lot of research, which we roughly summarize below.
Fourier-based methods require a large densely sampled 2-dimensional (2D) or 3-dimensional (3-D) data set, which may require a relatively long time to collect. Also, these techniques are limited by the dilemma of time versus frequency resolution and cannot distinguish closely spaced scatterers, as mentioned in [20]. So several researchers have turned their attention to Prony-like spectral estimation or exponential analysis algorithms. In [27] the authors also conclude that the latter are much more accurate than Fourier based methods. But the performance of exponential analysis methods can be seriously affected by a low signal-to-noise ratio (SNR), leading to misclassifying noise as signals.
Here we present another Prony-like technique which allows one to overcome this drawback. Here, the number of scatterers must not be estimated a priori, as pointed out in [1] for other parametric methods. In addition, the new technique does not suffer the well-known curse of dimensionality. A d-dimensional exponential analysis of an nterm model can now be carried out from a mere O((d+1)n) regularly collected samples, substantially fewer than in other Prony-based methods [28,37,30,16,24,26], where the sample usage explodes exponentially. In [37] the entailed complexity of these numerical algorithms is improved by the use of a slicing technique. The computation cost of the new method here compares much more favorably, as we solve several smaller systems instead of one large system dealing with all measurements at the same time.
The theory of compressive sensing also works with sparsely located data, which are, however, collected randomly instead of regularly. Moreover, in radar imaging the results may be severely affected if the scattering centers on the target do not match the prediscretized scene grid, which makes up the dictionary [5]. We emphasize that methods of the Prony family do not work with a discretized grid and hence do not suffer from this drawback.
Other optimization based ISAR techniques include genetic and evolutionary algorithms [19,6]. While they are quite robust and can work completely automatically, without estimation of the model order, they require a lot of computation time, a disadvantage shared by most optimization based methods. Several 2-D compressive sensing or other optimization approaches [34,36,1] may not be feasible in higher dimensions.
The paper is organized as follows. The proposed d-dimensional exponential analysis is presented in section 2. Further validation of the computed results, which proves to be very useful when working with low SNR, is introduced in section 3. The details of the exponential model governing ISAR imaging are given in section 4, together with a first application and comparison of the new method to [16]. A way to recondition and subsequently regularize the d-dimensional exponential analysis is explained in section 5. The full-blown method, including validation and reconditioning, is illustrated in section 6, where it is further compared to [30]. Among the existing d-dimensional exponential analysis generalizations, we chose to compare our 2-D and 3-D numerical illustrations to [16] and [30] for the following reasons. In section 4.2 we use the 2-D Prony-like algorithm MEMP from [16] to illustrate the need for an automatic pairing of the separately computed 1-dimensional (1-D) results, as made available by the new method. In section 6 the multidimensional ESPRIT algorithm from [30] illustrates the importance of obtaining an automatic estimation of the sparsity n, considered to be a difficult problem but which we solve here.
2. Multidimensional exponential analysis. The problem of d-dimensional exponential analysis consists of retrieving the linear parameters \alpha j \in \BbbC and the nonlinear parameters \phi j \in \BbbC d in the exponential model from as few function samples as possible. Until recently, algorithms to solve the problem required a number of samples on the order of O(n d ) [16,18,24,26] or O(2 d n) [30] or at most (d + 1)n 2 log 2d - 2 n [31], all growing exponentially with the dimension of the problem statement. In this section we present a reliable implementation which is based on [10] and requires only O((d + 1)n) regularly gathered samples.
The expressions exp(\langle \phi j , \Delta 1 \rangle ), j = 1, . . . , n are retrieved as the generalized eigenvalues \lambda j of the problem where the v j denote the right eigenvectors. For the sake of completeness and for use further on, we point out that the upper left element in the left-and right-hand side matrices need not carry the indices 1 and 0, respectively. We can start with a higher index number instead of 0, as long as we have 2n consecutive samples lined up in (2.4) [7]. So the sampling of f (x) in the direction of \Delta 1 need not start at the origin. In applications, the generalized eigenvalue problem (2.4) is often solved as part of a classic 1-D exponential analysis algorithm. In our numerical illustrations we use the matrix pencil method studied in [17,35] combined with the rank reduction step described in [29]. In the literature this combination is often referred to as the ESPRIT method, although the rank reduction is performed on the Hankel matrices directly instead of the covariance matrices. For the practical details concerning this aspect, the reader is referred to sections 4 and 6. In sections 2, 3, and 5 the mathematical backbone of the new method is developed.

(2.6)
Note that (2.6) reduces to a square Vandermonde system in the noisefree case, because then n of the linear equations are linearly dependent as a consequence of the fact that the values exp(\Phi j ) already satisfy (2.4).
We call vectors \Delta i , i = 2, . . . , d identification shifts because they will allow one to identify the individual \phi ji in the computed \Phi j from samples taken at shifted locations. For this last step we make use of the fact that the \phi ji appear linearly in the \Phi j .
For i fixed, the additional samples F si can be written as \alpha j exp (\langle \phi j , \Delta i \rangle ) exp (\langle \phi j , s\Delta 1 \rangle ) , s = 0, . . . , n -1, So for i fixed, the A ji , j = 1, . . . , n, are obtained from the Vandermonde system for which the coefficient matrix is part of the Vandermonde structured coefficient matrix in (2.6). From the A ji and the \alpha j we obtain for i fixed where in what follows we denote \Phi ji := \langle \phi j , \Delta i \rangle , j = 1, . . . , n.
Note that we have no problem pairing the \Phi ji to the \Phi j , j = 1, . . . , n since for each i the A ji are paired to the \alpha j , j = 1, . . . , n, through the Vandermonde systems (2.6) and (2.9). These A ji and exp(\Phi ji ) can be computed for each i = 2, . . . , d. The fact that the vectors \Delta 1 and \Delta i , i = 2, . . . , d are linearly independent then leads for each separate j = 1, . . . , n to the d \times d regular linear system \left( \Delta 11 \cdot \cdot \cdot \Delta 1d \Delta 21 \cdot \cdot \cdot \Delta 2d . . . . . .  .7), or a mere total of (d + 1)n samples. In practice, when dealing with noisy data, the value of n is overestimated by \eta > n, as discussed in the next section. The minimal number of samples in an \eta -term exponential model of the form (2.1), in the directions \Delta 1 and \Delta i , i = 2, . . . , d, which are, respectively, 2\eta and \eta , is often again overestimated by N \geq 2\eta and \frakn \geq \eta . The square n \times n generalized eigenvalue problem (2.4), the 2n \times n Vandermonde system (2.6), and the n \times n Vandermonde system (2.9) then, respectively, take the sizes (N -\eta ) \times \eta , N \times \eta , and \frakn \times \eta and are all solved in the least squares sense. Sometimes some of the samples are used in a preprocessing step, such as the computation of an intermediate (N -\eta ) \times \nu structured lower rank approximation to the Hankel matrices, where \nu < \eta .
In the next sections we describe how this technique is combined with convergence theorems from approximation theory on the one hand and sparse interpolation from computer algebra on the other hand, in order to do the following: \bullet filter unstructured noise in the data out of the structured exponential model (2.1) via a connection to Pad\' e approximation theory; \bullet automatically deduce and validate the sparsity n of expression (2.1), which is usually regarded to be a hard problem; \bullet separate exponential components that are contained in a cluster of similar components, using a connection with sparse interpolation; \bullet and as a result of all of the above, tighten the numerical estimates for the parameters \phi j and \alpha j in case of a low SNR.

Connection with
Pad\' e approximation: Validation. From the theoretical mathematical presentation in section 2, we now switch to the practical situation where the samples F s and F si are contaminated by noise. For the reliable computation of the parameters \phi j and \alpha j we need to add some steps to the algorithm. The first change is that we are going to interpret the samples as if they are coming from an \eta -term exponential model of the form (2.1), where \eta > n is a safe overestimate of n. A connection with Pad\' e approximation theory will then allow us to separate the noise from the actual signal content.
Consider the function With F s given by (2.3) we can write [33,2] The partial fraction decomposition (3.1) is related to the 1-D Laplace transform and the Z-transform of (2.1), where the inner product \langle \phi j , x\rangle is regarded as the unknown. It is a rational function of degree n -1 in the numerator and degree n in the denominator with poles 1/ exp(\Phi j ). Now let us perturb \frakf (z) with Gaussian noise to obtain The theorem of Nuttall--Pommerenke states that if \frakf (z) + \epsilon (z) is analytic throughout the complex plane, except for a countable number of poles [21] and essential singularities [25], then its sequence of Pad\' e approximants \{ r \eta - 1,\eta (z)\} \eta \in \BbbN of degree \eta -1 over \eta converges to \frakf (z) + \epsilon (z) in measure on compact sets. This means that for sufficiently large \eta , the measure of the set where the convergence is disrupted, so that | \frakf (z) + \epsilon (z) -r \eta - 1,\eta (z)| \geq \tau for some given threshold \tau , tends to zero as \eta tends to infinity. Pointwise convergence is disrupted by \eta - n unwanted pole-zero combinations of the Pad\' e approximants that are added to the n true poles and n -1 true zeros of \frakf (z) [13,15], the pole and zero in the pair almost cancelling each other locally. These pole-zero combinations are referred to as Froissart doublets. In practice, these Froissart doublets offer a way to separate the noise \epsilon (z) from the underlying \frakf (z) [14,15]. Because of the Pad\' e convergence theorem, the n true (physical) poles are identified as stable poles in successive r \eta - 1,\eta (z), while the \eta -n spurious (noisy) poles are distinguished by their instability. For different \eta [3,23]: \bullet the noisy poles lie scattered in the area around the complex unit circle, as welll as for every different realization of the noise \epsilon (z); \bullet and the true poles exp( - \Phi j ), j = 1, . . . , n are forming clusters so that around each exp( - \Phi j ) cluster there is an almost Froissart doublet-free zone. This characteristic of the true poles is precisely the key point on which our method is based. After the computation of \eta > n generalized eigenvalues \lambda j = exp(\Phi j ), we discard the unstable ones and focus on the stable ones. Note the following: \bullet In order to safely rely on this convergence result, it is clear that \eta should be sufficiently large, as the result is more numerically accurate for \eta large. We usually take \eta to be a multiple of (the so far unknown) n.
\bullet To decide which generalized eigenvalues are the unstable ones, the computational scheme needs to be repeated a number of times with different sets of N + (d -1)\frakn data, which can be achieved as follows. We discuss the sampling along the \Delta 1 direction first. Instead of collecting F s , s = 0, . . . , N -1, in the direction of \Delta 1 , we collect some additional F s , s = 0, . . . , N + (\kappa -1)\lfloor \frakp N \rfloor -1. Here 0 \leq \frakp \leq 1 and 1 \leq \kappa \in \BbbN . From these samples we construct \kappa snapshots of N samples each, snapshot number k = 0, . . . , \kappa -1, starting at k\lfloor \frakp N \rfloor , with an overlap of roughly (1 -\frakp )N points with the previous and the next snapshot. The case when \frakp = 0 and \kappa = 1 delivers the single snapshot situation of the previous section.
When putting all \kappa \eta generalized eigenvalues of the \kappa different eigenvalue problems (2.4) together, then theoretically \kappa n of them cluster together in n clusters of each \kappa elements, and the other \kappa (\eta -n) generalized eigenvalues lie scattered around as they do not reflect true terms in the exponential model (2.1). Of course, the noise may be such that the method does not work perfectly and that in an apparent cluster of somewhat less than \kappa elements are found. We therefore accept a cluster as soon as a sufficiently large fraction of the \kappa expected elements is found.
In the numerical examples we found it most useful to use a density-based cluster analysis such as DBSCAN [12]. The DBSCAN implementation requires two parameters: the density \delta of the clusters and the minimum number m \delta of required cluster elements. These parameters are chosen in terms of the noise in the signal: \bullet Larger values of \delta allow the detection of wider clusters, for instance in cases of a higher noise level. Smaller values of \delta lead to denser clusters with very stable estimates for the generalized eigenvalues, for instance in cases of lower levels of noise. \bullet A value for m \delta smaller than \kappa allows one to discard bogus estimates appearing as a consequence of outliers in the data or too high noise levels. It makes perfect sense, depending on the application, to relax m \delta to, for instance, \kappa -1, \kappa -2 or \lfloor 0.9\kappa \rfloor , \lceil 0.8\kappa \rceil . A very desirable side result of the technique described in this section is the fact that the method automatically reveals the true number n of terms in the expression (2.1) underlying all the samples: n equals the number of clusters detected by the cluster analysis.
It remains to discuss the sampling along the linearly independent shifts of \Delta 1 . Here also, the data set needs to be enlarged in order to support the processing of \kappa snapshots. So at most we collect for each i = 2, . . . , d the samples F si , s = 0, . . . , \frakn + (\kappa -1)\lfloor \frakp N \rfloor -1 (for some choices of the parameters N, \kappa , \frakn , \frakp not all consecutive samples are used). Remember that each of the computed \Phi ji , j = 1, . . . , \eta , i = 2, . . . , d, is connected to its \Phi j , j = 1, . . . , \eta , from the solution of the generalized eigenvalue problem (2.4). For i fixed, we therefore know which \Phi ji are linked to a cluster element \Phi j and which belong to a scattered \Phi j . When taking the m \delta values \Phi ji together that are linked to a cluster element \Phi j , then we can improve the estimate for \Phi ji , j = 1, . . . , n, i = 2, . . . , d, by considering the center of gravity of the m \delta values \Phi ji that go together. As the \Phi ji are obtained from the solution of two Vandermonde structured linear systems through (2.10), their estimates are usually found to be somewhat less accurate than the estimates computed for the clustered \Phi j .
Analysis of the \Phi ji values when taking \kappa snapshots can also serve an additional purpose. Sometimes it is useful to run DBSCAN a consecutive number of times with increasing values for \delta . In this way, very condensed clusters are detected right from the start, and more relaxed clusters are picked up in some later run. In case \delta is relaxed too much, an inspection of the (at least) m \delta values \Phi ji associated with the (at least) m \delta estimates for a particular \Phi j in the candidate cluster helps to accept or refute the relaxed cluster. The latter can be done by looking at the spread (standard deviation) of the associated \Phi ji . If this exceeds an acceptable threshold, the candidate cluster is rejected. So while a cluster of m \delta estimates for some \Phi j is``identified,"" it is``confirmed"" by the analysis of the m \delta associated values \Phi ji as well as for each i = 2 . . . , d.
Let us illustrate the procedure described in sections 2 and 3 on some small-scale numerical examples. In section 5 we further explain how to deal with the situation where some of the clusters around the true \Phi j , j = 1, . . . , n partially overlap, for instance because of very similar \phi j , j = 1, . . . , n in the exponential model (2.1).
4. Application to ISAR imaging. High frequency scattering toward an observation point is often modeled by means of a finite number of concentrated sources of scattering energy, also called scattering centers. A radar signal backscattered from a far-field target with n scattering centers at locations (x j , y j , z j ), j = 1, . . . , n, in a cartesian coordinate system, is then decomposed into n contributions, each with a different phase and magnitude. Assume the radar system emits a signal at frequency \omega h in the direction or line of sight with azimuth angle \theta g and elevation angle \phi m . The backscattered signal f (h, g, m) with (h, g, m) \in \BbbR 3 + is approximated by the following sum of complex exponentials: where \beta j is the scattering amplitude of the jth scattering center, c is the speed of light, \omega c is the central frequency \omega c = (\omega 0 + \omega (N - 1)h )/2, and the parameters \omega h , \theta g , and \phi m are defined by \omega h = \omega 0 + h\delta \omega , \theta g = \theta 0 + g\delta \theta , \phi m = \phi 0 + m\delta \phi .
The remaining values \omega 0 , \theta 0 , \phi 0 and \delta \omega , \delta \theta , \delta \phi are set by the user and are system dependent. We rewrite the exponential model (4.1) as By means of the Prony-like method presented in section 2, the computation of the unknown scattering locations (x j , y j , z j ), j = 1, . . . , n and the unkown scattering amplitudes \beta j , j = 1, . . . , n is then neatly separated, with the scattering locations delivered first after applying (2.9) and (2.10).

3-dimensional illustration of the new algorithm.
To illustrate the method on a synthetic small-scale 3-D example, we consider the 29-term exponential expression (4.1), with (x j , y j , z j ) and \beta j given in Table 1. We further set the following radar parameters: \omega 0 = 7.9GHz, \delta \omega = 0.0015GHz, \theta 0 = \phi 0 = - 0.024, \delta \theta = \delta \phi = 3.75 \times 10 - 4 . We choose (for no specific reason, except that (2.2)     In addition, a d-dimensional algorithm departing from a grid structured data set [16] does not offer any of the advantages we have discussed so far, among which are as follows: \bullet the natural pairing of \Phi ji , i = 2, . . . , d to \Phi j , j = 1, . . . , n, \bullet the automatic detection of the sparsity n, and \bullet the validation of the computed locations (x j , y j , z j ). In Figure 1 we show the DBSCAN result for SNR = 10 dB, with m \delta = \kappa - 2 and \delta varying over 5 \ell \times 10 - 4 , \ell = 0, . . . , 4; among the 1000 computed generalized eigenvalues (\eta = 100, \kappa = 10) 29 clusters are indicated in color. They identify the stable generalized eigenvalues which were detected and confirmed by the algorithm outlined in section 2. None of the groups of m \delta associated values exp(\Phi ji ), j = 1, . . . , 29, i = 2, 3 exhibits a standard deviation larger than 0.25.
We also run the above example for varying noise levels, from 40 dB SNR to 5 dB SNR, now with \kappa = 20, and each experiment repeated 100 times as the noise is randomly generated. In Figure 2 we show the average true cluster radius over the 29 scattering locations for the generalized eigenvalues exp(\Phi j ) with m \delta = \kappa -2. This true radius is computed a posteriori with the exact exp(\Phi j ) in the center. In Figure 3 we show, respectively, at the left and the right for i = 2, 3 the average cluster radius over the 29 scattering locations for the \kappa -5 estimates closest to the true associated exp(\Phi ji ). With \kappa = 20, a ratio of \kappa -5 over the maximum number m \delta of associated elements still represents 83.3\% of the associated values. In Figures 2 and 3 we also show the smallest and largest cluster radius (dashed lines)---they differ by a factor of about 2. It is quite clear that the computation of the exp(\Phi j ) is more accurate than that of the exp(\Phi ji ). The estimates of the latter can be tightened, but this is not really important at this point.

2-dimensional illustration of the validation aspect.
In another experiment we consider the 2-D example with 12 scattering centers (x j , y j ) from Table  2. The dimension is reduced from three to two for the sole reason that in our figures we want to use the third dimension to graph the impact of the SNR. The radar parameters \omega 0 , \theta 0 and \delta \omega , \delta \theta are as in section 4.1. We further take N = 150, \nu = \eta = 50, \frakn = 100, \kappa = 11, \frakp = 0.054, and \Delta 1 = (1.38, 4.14), \Delta 2 = ( - 7.56, 5.67).
In Figure 4 (right) we show the result of the computations after applying DBSCAN with m \delta = \kappa -1 and \delta = 0.00001, 0.002505, 0.005 to each SNR result for the \Phi j , and discarding cluster results when the standard deviation of the \Phi j2 exceeds 0.2. We let the SNR vary from 40 dB to 5 dB, top to bottom. The SNR = 10 dB slice is presented in Figure 5 (right), and a separate coordinate view is found in Figure 6 (right), where the SNR varies from right to left. For each SNR the experiment is repeated 250 times.
We compare these results with the output in Figures 4 (left), 5 (left), and 6 (left) of the 2-D Prony-like algorithm MEMP [16] using the same number of samples but now laid out in a (\Delta 1 , \Delta 2 )-grid of size 40 \times 42. We remark the main differences with the new algorithm: \bullet the matching in the MEMP algorithm between results computed in separate dimensions is definitely not flawless, and as the noise increases erroneous combinations give rise to nonexistent locations; \bullet the matching through the indexing of the variables in (2.6) and (2.9) leaves no room for error, and so does not introduce matching errors; \bullet for increasing noise, meaning decreasing SNR, the unvalidated MEMP algorithm may return a few erroneous (x j , y j ) despite the fact that the sparsity n = 12 was passed to the algorithm as well; \bullet the correct sparsity n = 12 need not be passed to the new algorithm, which detects it automatically as the number of identified and confirmed clusters; \bullet in the new algorithm the results for very small SNR are either somewhat less accurate or absent because of the high validation requirement, which can of course be relaxed by the user.   When dealing with the exp(\Phi ji ), i = 2, 3, we discard cluster results with a standard deviation above 0.5. We remark that as the density \delta increases, the probability increases that a candidate cluster, detected among the \Delta 1 -projections, is not confirmed in each and every one of the \Delta i -projections, i = 2, . . . , d. Rejection dominates acceptance from \ell = 7 on. In the end, the above algorithm detects and validates 516 scatterers out of 1000 (see Figure 8), but misses the scatterers that are located too closely together or for which the inner products in (4.1) are too much alike. Although the overall shape of the fighter is correctly recognized (nose, wing tips, tail, etc.), which may be more than satisfactory for many applications, the accuracy of the algorithm can be improved in the region where several scattering centers (x j , y j , z j ) are located near one another, such as the windshield. To this end the algorithm needs to be combined with a sub-Nyquist technique, particularly suitable for the exponential analysis of such signals [9]. This final addition to the algorithm is explained in the next section. We also point out that, thanks to the validation step, there are no false results. 5. Connection with sparse interpolation: Superresolution. We return to the notation of section 2 to continue our presentation. When replacing the primary sampling direction \Delta 1 by a multiple  \lambda j (m) = exp(m\Phi j ) = \lambda m j , j = 1, . . . , n.
From \lambda m j the imaginary part of \Phi j = \langle \phi j , \Delta 1 (m)\rangle may no longer be retrieved uniquely because we can only guarantee that | \Im (\langle \phi j , \Delta 1 (m)\rangle )| < m\pi . (5.1) So aliasing may have kicked in. Because of the periodicity of exp(\Im (\langle \phi j , m\Delta 1 \rangle )), a total of m values in the 2m\pi wide interval (5.1) can be identified as plausible values for \langle \phi j , \Delta 1 \rangle . Note that when the original \lambda j are clustered, the powered \lambda m j may be distributed quite differently and unclustered. Such a relocation of the generalized eigenvalues, here referred to as superresolution, can seriously improve the conditioning of the Hankel matrices involved. In Figure 9 we show the effect of this powering on a particular example where 20 generalized eigenvalues are clustered in five clusters of different size.
What we need to resolve now is the aliasing problem that is possibly introduced by powering the generalized eigenvalues. This aliasing can be fixed at the expense of a small number of additional samples. Remember that in what follows, n can be replaced everywhere by \eta \geq n when using \eta -n additional terms to model the noise.
An easy choice for \mu is a (small) number mutually prime with m (for the most general choice allowed, we refer to [8]). With the additional samples we proceed as follows: \bullet From the samples F 0 , F m , . . . , F (2n - 1)m we first compute the generalized eigenvalues \lambda m j and the coefficients \alpha j going with \lambda m j in the model We know which coefficient \alpha j goes with which generalized eigenvalue \lambda m j , but we cannot identify the correct \Im (\langle \phi j , \Delta 1 \rangle ) from \lambda m j . \bullet Next we deal with the samples at the additional locations sm\Delta 1 +\mu \Delta 1 , which satisfy \bullet Then, by dividing the \alpha j \lambda \mu j computed from (5.4) by the \alpha j computed from (5.2), for j = 1, . . . , n, we obtain from each quotient \lambda \mu j a second set of \mu plausible values for \langle \phi j , \Delta 1 \rangle in the 2\mu \pi wide interval | \Im (\langle \phi j , \mu \Delta 1 \rangle )| < \mu \pi . \bullet Because of the fact that we choose \mu and m relatively prime, the two sets of plausible values for \langle \phi j , \Delta 1 \rangle have only one value in their intersection [9]. Thus the aliasing problem is solved: Each \langle \phi j , \Delta 1 \rangle is retrieved uniquely from the computation of both \lambda m j and \lambda \mu j for j = 1, . . . , n. This multidimensional sub-Nyquist sampling strategy may help us determine the clustered scattering centers occurring in section 4.3. As suggested in Figure 9, the technique spreads out the generalized eigenvalues, which may recondition the inverse problem. In addition, a variation of scale factors m may be used, and the idea can be translated into sampling at the shifted locations involving the identification shifts \Delta 2 , . . . , \Delta d which satisfy (2.8).
To illustrate how the combined algorithm, laid out in sections 2, 3, and 5, works, we take up the challenging example of section 4.3 again to return highly accurate results from about 95\% of the scattering locations. The result is also compared to another d-dimensional generalization, called ND-ESPRIT, which arranges the samples in multilevel Hankel matrices [30].
6. Full scale ISAR illustration. Returning to the example in section 4.3, we take the radar parameters, the SNR, and the vectors \Delta i , i = 1, 2, 3 as specified there. We collect 30000 samples F s = f (s\Delta 1 ) and 30000 samples F si = f (s\Delta 1 +\Delta i ), i = 2, 3, along each of the shifts, for a total of 90000 samples in total. These samples are now reorganized as follows for use with the technique described in section 5.
For all of the above analyses, the sampling in the direction \Delta 1 starts with F 0 and continues with F m , F 2m , . . . . The shifted samples, which serve the purpose of repairing the possible sub-Nyquist aliasing effect, start with F 1 and continue with F m+1 , F 2m+1 , . . . . In order to make good use of the samples inbetween, the procedure can be repeated m -1 times with the same m and \mu but now starting the sampling, instead of at F 0 , at F 1 , then at F 2 , and so on until F m - 1 . In this way a choice of m produces m\kappa estimates for the exp(\Phi i ), i = 1, . . . , \eta , instead of \kappa , and thus provides a sound basis for validation since m\kappa is usually sufficiently large.
For the choices above, we have m\kappa = 12 for m = 2, 3, 4, and so we can take, for instance, m \delta = (5/6)m\kappa = 10. In Figure 10 (left and right), we show how accurate the scattering centers are reconstructed under SNR = 20 dB noise: With every scattering center in the original data we associate the log 10 of the Euclidean distance to the nearest reconstructed scattering center (in meters on the x-axis) and then accumulate these (tallied on the y-axis). The distinction between the two figures is that Figure  10  The improvement from m = 1 to m = 4 may not seem very impressive at first. But note that the accurately reconstructed scattering centers (say log 10 (\cdot ) \leq - 1) from m = 1 need not be the same as the accurately reconstructed ones from the use of m = 4. Therefore, the combination of both results, merely joining the 516 reconstructions from m = 1 with the 696 reconstructions from m = 4, immediately leads to the improved distance graph shown in Figure 11. Eventually, all runs executed with m = 1, 2, 3, 4 can be combined merely by joining all the computed scatterer reconstructions: 516 from m = 1, 667 from m = 2, 673 from m = 3, and 696 from m = 4, adding up to 2552 in total, with many of them (almost) duplicates. This then leads to highly acurate results for most of the scatterers. In Figure 12 we see that in this combined output 81\% of the scatterers is reconstructed within an error of at most 10 cm, and 95\% is found within a distance of 30 cm! Only three scatterers are not reconstructed within a distance of 1 m. The most inaccurately reconstructed scatterer in Figure 13 is near the engine outlet, where our reconstruction is slightly off. In Figure 13, the 2552 reconstructions are displayed together. Note that, thanks to the validation technique, there are no false results, as also pointed out for Figure 8, where the sub-Nyquist subdivision of the data samples is not yet put to work.
It remains to compare the result to that of a d-dimensional Prony-type algorithm,  such as [30] from data laid out in a grid. For instance, a 45 \times 45 \times 45 grid consists of 91125 samples, which compares nicely to the 90000 samples used in our method. The d-dimensional version considered in [30] starts with the construction of a multilevel Hankel matrix, for which we take 26 \times 20 Hankel blocks on all d = 3 levels, thus adding up to a 26 3 \times 20 3 or 17576 \times 8000 matrix. A log-plot of its singular values is shown in Figure 14 (left), from which one can deduce that n \approx 467 (point of maximal curvature of the plot). With 20 dB noise added to the data, the Euclidean distance log-plot for the 467 reconstructed scatterers is as in Figure 14 (right). This graph compares somewhat to the graphs in Figure 10 but is far from the result displayed in Figure 12, which can be attained with the same sample usage. In Figure 15 we show the actual 467 reconstructed scatterers superimposed on the fighter jet.
One may wonder what role is played by the total number of 90000 collected samples for our method in Figures 10--13. When reducing the sampling from 30000 along each of the three directions to 24000, then 71\% of the scatterers is found within  a distance of 10 cm, and 93\% within 30 cm. When increasing the sampling from three times 30000 to three times 60000, then, as expected, the reconstruction improves: 94\% is found within 10 cm, and 98\% within 30 cm.