Structural information in the shape of the optimum of registration objective function

Registration finds meaningful correspondences between points in one image and points in another image or in a group of images. It attempts to align images such that common structures match. In conventional pairwise intensity-based registration, we usually attempt to find the optimum of the registration objective function. We investigated whether there is structural information present in the shape of the optimum. Such structures might be used to improve the performance of registration. Using simple structures (i.e., an edge or a corner) and the Mutual Information (MI) objective function, we perturbed one image locally with a diffeomorphism and found interesting structure in the shape of the quality-of-fit function.


Introduction
Registration can be defined as the process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors (see the survey papers by Sotiras et al., 2013; Zitová and Flusser, 2003). Image registration is required for automatic model building (Cootes et al., 2004), to find correspondences in groups of images (Cootes et al., 2010), and to combine and compare information from multiple multimodal images (fusion) (Haber and Modersitzki, 2006). Some attempts have been made to add structural information into registration schemes to improve their performance (Konukoglu et al., 2011; Pluim et al., 2000; Purwani and Twining, 2014).
In conventional pairwise intensity-based registration, we usually attempt to find the optimum of the registration objective function. We investigated whether there is explicit structural information present in the shape of the registration objective function about the optimum. Such structures might be used to improve the performance of registration.
To investigate the shape of the optimum, we used a simple registered pair of images with a single well-defined structure (i.e., a step; the images were generated using Gaussian distributions), and a way of perturbing this registration. Using the mutual information registration objective function (Viola, 1995), we found the structure to be two peaks and two troughs. Furthermore, for a corner structure (Fig. 6), we found four main peaks and four main troughs. We begin with the calculation of the mutual information objective function.

Calculation of mutual information
We start from the Shannon entropy (Shannon, 1948), defined as

$$E = -\sum_i P_i \log P_i \qquad (1)$$

where Pᵢ is the probability of bin i of the histogram. We consider the registration of two images, I₁ and the warped image Ĩ₂. The mutual information is defined as

$$MI(I_1, \tilde{I}_2) = E(I_1) + E(\tilde{I}_2) - E(I_1, \tilde{I}_2) \qquad (2)$$

where E is the Shannon entropy (1), computed from the probability distributions of the individual images I₁ and Ĩ₂, and E(I₁, Ĩ₂) is the joint entropy computed from the joint probability distribution (2D histogram).
Mutual information is an information-theoretic measure used in multimodal registration, which considers image values and differences in the context of the image as a whole. Mutual information was used in these experiments as a more challenging case than a simpler objective function such as the sum of squared differences. Although some early papers (Viola, 1995; Viola and Wells III, 1997) used the 'Parzen window' method of density estimation to compute MI, the review by Pluim et al. (2003) noted that the majority of papers used histograms to compute MI (Studholme et al., 1995; Collignon et al., 1995; Studholme et al., 1999; Sabuncu and Ramadge, 2008; Twining and Taylor, 2011). We therefore calculated the entropy, and hence the MI, using histograms. The next section describes how we set up the perturbation for a pair of registered images.
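The histogram-based calculation described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the number of bins (32) and the use of the natural logarithm are assumptions made here.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy of a histogram given as an array of bin counts."""
    p = counts.astype(float).ravel()
    p = p / p.sum()
    p = p[p > 0]                      # empty bins contribute nothing
    return -np.sum(p * np.log(p))     # natural log (assumed base)

def mutual_information(img1, img2, bins=32):
    """MI = E(I1) + E(I2) - E(I1, I2), all taken from one joint 2D histogram."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    e1 = shannon_entropy(joint.sum(axis=1))   # marginal histogram of img1
    e2 = shannon_entropy(joint.sum(axis=0))   # marginal histogram of img2
    e12 = shannon_entropy(joint)              # joint entropy
    return e1 + e2 - e12
```

Taking both marginals from the joint histogram keeps the three entropies on the same binning, so the MI of an image with itself reduces to its own entropy.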

Perturbation set-up
In order to probe the shape of the optimum, we set up a displacement from the optimum for a pair of registered images†. We applied a simple warp-within-a-circle‡ to one of the images. This continuous, smooth and invertible warp (a diffeomorphism) ensures that every point in one image maps to exactly one point in the other image, and vice versa, so that there are no tears or folds. This warp was used to perturb the results of the registration locally (i.e., locally non-rigid). We performed experiments on the simplest case, with a fixed circle size and the center of the circle lying on the step edge (see the left image in Fig. 1). The experiments were:
• Fix the direction of the maximum displacement, but vary the size.
• Fix the size of the maximum displacement, but vary the direction.
• For each case, what is the effect of varying the noise in the images?
We generated a pair of unperturbed, registered images once (see the top images in Figs 3 and 4), and on each run applied a perturbation of a different, randomly chosen size to one of the images. First, we plotted the MI against the mean displacement across all pixels within the circle. Then, varying the direction, we plotted it against the angle. The plots are for the same pair of noisy images but different sizes of perturbation. These plots are given after the analysis and predictions section.
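The perturbation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact form of the warp-within-a-circle is not given in this section, so a displacement that falls off linearly from the circle center to its boundary is assumed here (a choice that reproduces the one-third ratio between mean and maximum displacement quoted in the analysis section). This linear profile is only piecewise smooth, so it stands in for the smooth warp used in the experiments. All function names, parameters and sizes are hypothetical.

```python
import numpy as np

def circle_displacement(shape, center, radius, max_disp, angle):
    """Displacement field (dy, dx): max_disp at the circle center, falling
    linearly to zero at the boundary (assumed profile), zero outside."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    r = np.hypot(yy - center[0], xx - center[1])
    weight = np.clip(1.0 - r / radius, 0.0, None)
    return (max_disp * np.sin(angle) * weight,
            max_disp * np.cos(angle) * weight)

def pullback_warp(image, dy, dx):
    """Pullback warping: each output pixel takes the bilinearly
    interpolated input value at its displaced position."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    ys = np.clip(yy + dy, 0, h - 1)
    xs = np.clip(xx + dx, 0, w - 1)
    y0 = np.minimum(np.floor(ys).astype(int), h - 2)
    x0 = np.minimum(np.floor(xs).astype(int), w - 2)
    fy, fx = ys - y0, xs - x0
    top = image[y0, x0] * (1 - fx) + image[y0, x0 + 1] * fx
    bottom = image[y0 + 1, x0] * (1 - fx) + image[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

# Usage: a noise-free step image, perturbed across the edge.
step = np.zeros((41, 41))
step[:20, :] = 1.0
dy, dx = circle_displacement(step.shape, center=(20, 20), radius=10.0,
                             max_disp=1.5, angle=np.pi / 2)
warped = pullback_warp(step, dy, dx)
```

Pixels outside the circle have zero displacement and are therefore left unchanged, which is what makes the perturbation local.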

Analysis and predictions
For movement perpendicular to the step edge, we show the effect of a small displacement using pullback warping (see the left image in Fig. 1).
For a small displacement of less than one pixel, the only pixels that can change value are those like B, in the first row next to the step. For a slightly larger displacement, of less than two pixels, pixels like A from the second row start to alter their colour as they move onto the ramp part of the interpolation function (see the right graph in Fig. 1). Based on these facts, we can then predict the effect on the scatter plot and the mutual information (Fig. 2).
† A pair of step images or a pair of corner-structure images (see the top images in Figs 3, 4 and 6).

Fig. 2: Mutual information for noise-free case
The pixels from the first row (shown in green on the left in Fig. 2) now change their values in image 2. The brightest pixel is the one closest to the center of the circle, which has a larger displacement than any other pixel in the first row.
When there is at most one non-empty bin in each row (see the right image in Fig. 2), the entropy of image 2 cancels with the joint entropy, leaving just the entropy of image 1 (image 1 not being warped). This gives the following predictions for MI in the noise-free case:
• For displacements of less than one pixel, the MI will be flat.
• The graph of MI will have abrupt changes at displacement values of 1, 2, 3, etc.
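The cancellation argument and the first prediction can be checked on a one-dimensional analogue of the step experiment. The sketch below is illustrative only: the signal length, the 16 bins over [0, 1], the linear fall-off of the displacement and the function names are assumptions made here, not values from the experiments.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (natural log) of an array of bin counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def mi(a, b, bins=16):
    """Histogram-based MI with a fixed value range, so bins do not shift."""
    joint, _, _ = np.histogram2d(a, b, bins=bins, range=[[0, 1], [0, 1]])
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

n, radius = 64, 10.0
x = np.arange(n, dtype=float)
image1 = (x >= n // 2).astype(float)   # noise-free step between pixels 31 and 32
# Displacement weight: 1 at the pixel next to the step, 0 beyond `radius`.
weight = np.clip(1.0 - np.abs(x - (n // 2 - 1)) / radius, 0.0, None)

def mi_after_shift(max_disp):
    # Pullback: each pixel reads the linearly interpolated value at x + d(x).
    image2 = np.interp(x + max_disp * weight, x, image1)
    return mi(image1, image2)

e1 = entropy(np.bincount(image1.astype(int)))   # entropy of the unwarped image
for max_disp in (0.0, 0.5, 1.5):
    print(max_disp, mi_after_shift(max_disp), e1)
```

For maximum displacements of 0 and 0.5 pixels, every occupied bin of image 2 pairs with a single value of image 1, so the MI equals the entropy of image 1. At 1.5 pixels the pixel next to the step has fully crossed the edge and shares a bin with pixels from the other side, and the MI drops below that value.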
It can be shown that the mean displacement is a third of the maximum displacement. Hence, in terms of the mean displacement, the MI plot will be flat for values below 0.333 and will change abruptly at 0.333, 0.667, 1, etc.
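One displacement profile that gives this ratio is a magnitude that falls linearly from its maximum at the center of the circle to zero at the boundary. This profile is an assumption made here for illustration, since the exact form of the warp is not stated in this section; the symbols d_max (maximum displacement), R (circle radius) and r (distance from the center) are introduced for the derivation only. Averaging over the disc gives:

```latex
\bar{d}
  = \frac{1}{\pi R^{2}} \int_{0}^{2\pi}\!\!\int_{0}^{R}
      d_{\max}\left(1 - \frac{r}{R}\right) r \, dr \, d\theta
  = \frac{2\, d_{\max}}{R^{2}} \left[ \frac{r^{2}}{2} - \frac{r^{3}}{3R} \right]_{0}^{R}
  = \frac{2\, d_{\max}}{R^{2}} \cdot \frac{R^{2}}{6}
  = \frac{d_{\max}}{3}
```

Under this assumption, maximum displacements of 1, 2 and 3 pixels correspond to mean displacements of 1/3, 2/3 and 1.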
As noise increases, the smoothing effect of interpolation (which increases MI; Ashburner and Friston, 2007; Tsao, 2003) will occur at small displacements. Since zero displacement involves no interpolation and hence no smoothing, we expect a central trough rather than a flat plateau. This will depend on the amount of noise, and will affect all pixels in the circle. At larger displacements, we will see the step misregistration signal, and hence MI will decrease. The results in the following section agree with these predictions.

Results and discussions
The plot of MI against the mean displacement, for displacement perpendicular to the step edge in the low-noise case, is shown at the bottom of Fig. 3. According to the previous predictions, the MI should be flat for values below 0.333 and change abruptly at 0.333, 0.667, 1, etc., and this is what the plot shows. The red lines mark the predicted displacements at which successive rows of pixels begin to cross the step edge, leading to an abrupt drop in the MI.
As noise increases (see the top images in Fig. 4), the smoothing effect, which increases MI, occurs at small displacements, giving a central trough rather than a flat plateau (see the bottom graph in Fig. 4). The smoothing effect of the interpolation makes the distribution of image values narrower and more peaked, and this tends to increase the mutual information (Ashburner and Friston, 2007; Tsao, 2003).
The same graph also shows the step misregistration signal at larger displacements, where the misplaced edge starts to decrease the MI (see the bottom graph in Fig. 4).

Fig. 4: The images and the plot of MI in the noisy case
We then fixed the size of the maximum displacement rad but varied the direction (see the MI plots in Fig. 5). The MI plotted against the angle (red) has one abrupt change, whereas that with the same image noise but a different size of maximum displacement, on the bottom left (blue), has four abrupt changes. The relationship between the two becomes clear when we replot both graphs against the perpendicular maximum displacement, where the two graphs fit together (see the right graph in Fig. 5).
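One way to read this replotting is that, for a straight edge, only the component of the displacement along the edge normal moves pixels across the step. The sketch below illustrates that reading under stated assumptions: the angle is measured from the edge normal, abrupt changes occur at whole-pixel perpendicular displacements (as predicted for the noise-free case), and the two rad values are hypothetical, since the values used in Fig. 5 are not given in the text.

```python
import numpy as np

def perpendicular_max_displacement(rad, angle):
    """Component of the maximum displacement along the edge normal.
    `angle` is measured from the edge normal (an assumed convention)."""
    return rad * np.cos(angle)

def thresholds_crossed(rad, angles):
    """Number of whole-pixel thresholds (1, 2, 3, ...) lying at or below
    the size of the perpendicular displacement at each angle."""
    perp = np.abs(perpendicular_max_displacement(rad, angles))
    return np.floor(perp).astype(int)

# Sweep a quarter turn, from along the normal to parallel to the edge.
quarter_turn = np.linspace(0.0, np.pi / 2, 181)
for rad in (1.5, 4.5):   # hypothetical maximum displacements
    crossed = thresholds_crossed(rad, quarter_turn)
    print(rad, crossed.max() - crossed.min())
```

With these assumed values, the smaller displacement passes one whole-pixel threshold over the sweep and the larger passes four, which is the kind of difference that disappears once both curves are plotted against the perpendicular displacement.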

Fig. 5: Left: Two MI graphs with the same noise value w = 0.01 but different maximum displacement rad; Right: MI plotted against the perpendicular maximum displacement for the same data (this shows that the first graph fits on top of the second graph)
Looking at these plots as a whole, we find that the structural information signal in the shape of the mutual information about the global optimum is the pattern of two troughs and two peaks shown in the left graphs in Fig. 5, which are plotted against the angle. We then applied the same procedure to the corner structure (Fig. 6).
We fixed the size of the maximum displacement but varied its direction, with the perturbation applied right on the corner. The MI plotted against the angle has many smaller peaks and troughs (see the bottom graph in Fig. 6). After smoothing, we can see four main peaks and four main troughs at the predicted values. The predicted maxima and minima, along with the MI plot, are shown in Fig. 7. Taking into account the sense of the arrows (that is, whether they point from black to white or vice versa), we see the equivalences as shown. The bottom right shows the predicted troughs. The approximate predictions are marked on the plot by arrows. We found a similar result when we applied this procedure to the noisy case. The next section presents our conclusions.
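The smoothing method is not specified in the text. As one possible approach, assumed here for illustration only, a moving average that wraps around the full turn can suppress the minor peaks and troughs before the main extrema are located. The function names and the window parameter are hypothetical.

```python
import numpy as np

def circular_smooth(values, window):
    """Moving average over `window` samples (odd, at least 3), wrapping
    around so that the first and last angles are treated as neighbours."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    padded = np.concatenate([values[-half:], values, values[:half]])
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def circular_extrema(values):
    """Indices of strict local maxima and minima on a periodic curve."""
    prev, nxt = np.roll(values, 1), np.roll(values, -1)
    maxima = np.flatnonzero((values > prev) & (values > nxt))
    minima = np.flatnonzero((values < prev) & (values < nxt))
    return maxima, minima
```

Given MI values sampled at evenly spaced angles, `circular_extrema(circular_smooth(mi_values, window))` returns the angular indices of the remaining peaks and troughs; the window width controls how much minor structure is removed and would need to be chosen for the data at hand.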

Conclusion
Several issues arose in the experimental results, such as a flat plateau rather than a smooth, sharp peak near the optimum (Figs. 3 and 5), or the many minor local maxima and minima that occurred along with the four main local maxima and minima (due to real image structures, Figs. 6 and 7). One possible cause is the use of histograms, and their binning process, when estimating entropy and hence mutual information.
For the corner, we find the structural information as four main peaks and four main troughs in the plot of MI (see the bottom graph in Fig. 6), instead of the two peaks and two troughs found for a straight edge (see the left graphs in Fig. 5). However, it is surprising, and hence interesting, that many smaller peaks and troughs appear even in the noise-free case. This suggests that, except in the very simplest case, it would not be possible to use this information to go the other way and infer the structure from the shape of the optimum. Moreover, in these examples we knew from the start where the structure was located. Instead, we will consider other methods of linking registration and segmentation. So far we have considered pairwise registration, but in practice we usually have a large number of images. Groupwise registration that incorporates structural information or segmentation in its scheme will therefore be our future work.