Fusion of SAR images for flood extent mapping in northern peninsula Malaysia

Article history: Received 1 September 2016; received in revised form 25 November 2016; accepted 27 November 2016.

This study aimed at mapping the flood extents in northern peninsula Malaysia in order to contribute to flood disaster eradication by extracting more and better information through the fusion of RadarSat-1 and TerraSAR-X images. Principal Component Analysis (PCA) and Brovey Transform (BT) techniques were used. The best principal component of the PCA, PC2, was classified and compared with the classified BT image using Maximum Likelihood (ML) and Support Vector Machine (SVM) classifiers. The results indicated that the classification of the BT image using SVM has the higher accuracy, with an overall accuracy of 70.9615% and a kappa coefficient of 0.3418. This method showed a relative improvement in the classification of the flooded and non-flooded areas, which were used to produce the flood extent map that was further verified with the DEM of the area. The final results showed more information on the areas affected by the floods, especially the extents, which became more visible after the classification of the fused images.


Introduction
Floods are among the most destructive natural hazards that affect humans, property and settlements (Khan et al., 2011; Dano Umar et al., 2011; Dano Umar et al., 2014; Ayobami and Rabi'u, 2012). Flooding is a devastating natural phenomenon that disrupts the wellbeing of societies, especially poor people, who are vulnerable to disaster because of their limited resources. In most parts of the world, most economic losses result from damage caused by floods.
In Asia, most natural disasters are related to floods, and they cause more damage to life and property than any other type of disaster (Pradhan, 2010). Among all the disasters in Malaysia, floods have recently been the most frequent, as they occur annually and cause increasingly great damage. Floods are therefore regarded as the most severe type of disaster encountered in Malaysia (Chan, 2015; Varikoden et al., 2011; Toriman et al., 2009a). The yearly recurrence of floods varies in severity and location, but the most susceptible people are lowland residents, i.e. those living near river banks and in flood prone locations, especially in Malaysia where flash floods are most common (Toriman et al., 2009b; Dano Umar et al., 2014). These residents are farmers and fishermen who are more concerned about their source of income than about the consequences of residing in such low lying, flood prone areas.
In December 2006, Perlis experienced its most severe floods in 30 years, covering two thirds of the state, as a result of three days of nonstop rainfall that destroyed an estimated 26,000 ha of paddy fields in Kedah and Perlis, with estimated losses of about RM81 million. On 26 October 2003, floods affected most of the northwestern peninsula, including Penang, Kedah and northern Perak (Ghani et al., 2012). Further floods in November 2010 and in April and September 2011 led to the loss of property worth millions of dollars and injured many people (Chan, 2015). Ninety percent (90%) of the impacts of natural disasters on the country result from floods, with an annual average loss of USD100 million and additional damage to infrastructure, highways, agricultural areas, residential areas and, most importantly, the livelihood of people (Pradhan, 2010).
For a long time, the response to flood catastrophes was limited to implementing mechanisms to regulate flooding, such as building flood control works, dams, levees and seawalls, and delivering disaster relief to flood victims. However, these strategies neither lessened the losses caused by flooding nor dissuaded improper development in flood prone zones. More recent strategies are generally designed to lessen the susceptibility of humans to floods, in lieu of relying solely on physical confrontation with flood incidents.
The recurrence of floods requires extensive production of flood maps, but the accuracy of a map depends on the quality of the satellite images. Growing public expectations for better flood management tools are matched by the efficiency and effectiveness of remote sensing: satellite data provide a solid foundation for mapping flooded areas and play an important role in the four phases of any disaster management cycle, i.e. mitigation, preparedness, response and recovery. This study integrated three different SAR images in order to obtain more information for mapping the flooded areas, which is an important strategy in flood disaster management.
Remote sensing has recently become one of the most important tools available to disaster management professionals, making project planning more feasible and more accurate than in earlier times. The application of remote sensing and Geographic Information Systems (GIS) to flood mapping makes planning easier and provides more effective nonstructural means of reducing the destructive effects and impacts of floods, compared to previous models, which were validated by ground truth surveys and were not completely reliable (Dano Umar et al., 2011). Remote sensing data are accurate and suitable for mapping natural hazards such as floods because of their large area coverage, timeliness, availability and temporal frequency (Walker et al., 2010). Remote sensing images can be used to map the flooded areas and their extent, provided they are acquired at the time of the flood or immediately after it. Such timely data acquisition depends largely on the satellite pass and the climatic conditions over the flood affected areas. Optical remote sensing is only possible under clear weather conditions, whereas radar remote sensing is more advantageous because of its all-weather capability. For the extraction of useful information and more accurate processing of flood events, different remote sensing data can be used either separately or in combination (Huang et al., 2010).
Flood mapping in general is a means of addressing the effects presented in hazard and risk maps, which serve as one of the flood risk management approaches, i.e. prevention of the buildup of new risks, reduction of existing risks and adaptation to changes in risk factors (Anh and Nguyen, 2009). Flood mapping with radar images is preferred because prolonged rain and cloud cover during flooding make the acquisition of optical images difficult (Anh and Nguyen, 2009).
Principal Component Analysis (PCA) is a multivariate method of analyzing data in which the observations are described quantitatively by many interrelated variables. The aim of PCA is to extract the important information and present it as a new set of variables called principal components (Abdi and Williams, 2010). It can be used to extract and maximize the information common to the spectral bands and to place that information in the first component. The principal component bands are uncorrelated linear combinations of the original bands, produced so that all inter-band variation is contained in the full set of PCs (Rokni et al., 2014).
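The PCA step described above can be sketched as follows. This is a minimal illustration, assuming the three co-registered SAR images are stacked into an array of shape (bands, rows, cols); the function name `pca_bands` and the synthetic data are hypothetical and are not taken from the study, which used ENVI.

```python
import numpy as np

def pca_bands(stack):
    """Return principal-component images and explained-variance ratios.

    stack: array of shape (bands, rows, cols).
    """
    bands, rows, cols = stack.shape
    # Flatten each band to a row vector and remove its mean.
    X = stack.reshape(bands, -1).astype(float)
    X -= X.mean(axis=1, keepdims=True)
    # Band-to-band covariance matrix (bands x bands).
    cov = np.atleast_2d(np.cov(X))
    # eigh returns eigenvalues in ascending order; reorder so PC1 comes first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the centred pixels onto the eigenvectors.
    pcs = (eigvecs.T @ X).reshape(bands, rows, cols)
    return pcs, eigvals / eigvals.sum()

# Synthetic three-band stack (illustrative data only).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 50))
stack = np.stack([base + 0.1 * rng.normal(size=(50, 50)) for _ in range(3)])
pcs, ratio = pca_bands(stack)
```

With only three input bands there are exactly three components, so the entries of `ratio` sum to one and the component images are mutually uncorrelated.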
The BT is specifically used for visually increasing the contrast between the low and high ends of the image histogram, and it thus changes the original scene radiometry. It was developed to produce images with RGB bands, which is why only three bands can be combined at a time (Mandhare et al., 2013). Additionally, in this method, the higher resolution image is multiplied by each lower resolution image using mathematical combinations, and this is recognized as producing high quality outputs (Rokni et al., 2014).

This paper is a follow-up to a recent study by Dutsenwai et al. (2015), in which the classification of different fusion methods was tested for quality information extraction from the integration of three single band SAR images, and in which the BT method had the highest percentage accuracy (Table 1). The objective of this study is to compare the BT and PCA of three SAR images of different spatial resolution (RadarSat-1 preflood image, RadarSat-1 post flood image and TerraSAR-X during flood image) in order to increase the confidence level of using the BT for flood extent mapping, and to determine which composite image contains more interpretable information as well as improved boundaries. The reason why PCA was selected for comparison with BT is given in the PCA subsection (3.2.1). Some recent articles integrated multiband radar images; this study, however, integrates single band SAR images to investigate their efficacy in providing more interpretable information that can help to address the challenges of flood disasters in the study area.

Study area
Perlis lies on the southern side of the border between Malaysia and Thailand. It is one of the important states situated on the north coast. It has a total population of about 217,480 and a land area of 795 sq km, which makes it the smallest state in Malaysia. Annual rainfall in Perlis ranges between 2000 mm and 2500 mm, and the annual temperature ranges between 21 °C and 32 °C.
Kedah lies in the northwestern part of peninsula Malaysia, south of Perlis and covers an area of 9425sqkm. Kedah is locally known as the rice bowl of Malaysia, because it is the top rice producing state in the country. Fig. 1 shows the map of peninsula Malaysia indicating the two case studies.

Materials and methods
The satellite data used to carry out this study include RadarSat-1 images acquired before and after the flood and a TerraSAR-X image acquired during the flood. Both RadarSat-1 images (pre and post flood) have a single band and a spatial resolution of 25 meters, but the preflood image has an ascending orbit, while the post flood image has a descending orbit. According to Dutsenwai et al. (2015), an advantage of RadarSat is its ability to capture images in descending and ascending modes during the day and night respectively. The TerraSAR-X (during flood) image also has one band, a spatial resolution of 18 meters, and a descending orbit. It has dual polarization options, which are advantageous. The stages of the methods used to carry out this study are given in Fig. 3 (Flow chart of methodology).

Preprocessing
Because of the antenna gain pattern common in radar, variations occur in the images in the direction opposite to the range; these gain variations were removed using antenna pattern correction in Environment for Image Visualization (ENVI) 4.8. The images were then filtered using the Lee filter in order to smooth the noise, with a 3x3 filter size and a noise variance of 0.2500. The advantages of the Lee filter are its good edge preservation and its effectiveness in smoothing speckle, whose intensity is related to the data on a statistical basis; it reduces the noise while preserving the image.
Because of the differences in the spatial resolution of the images, they were co-registered, with a root mean square (RMS) value of 0.464514. The three images were captured at different times: before (March 2010), during (November 2010) and after (January 2011) the flood event. Since they came from different sensors, the images were georeferenced to the same geographical reference, UTM zone 47N with datum WGS84. The three images were stacked into a multiband layer, which consequently covered the full extents of the three images. Finally, the study area was extracted at exactly the extent where the three bands overlapped.
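The speckle filtering step can be illustrated with a simplified Lee filter. This is only a sketch of one common multiplicative-noise formulation, using the 3x3 window and noise variance of 0.25 reported above; the exact ENVI implementation may differ, and the function name `lee_filter` and the synthetic speckled image are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lee_filter(img, size=3, noise_var=0.25):
    """Simplified Lee speckle filter for multiplicative noise.

    noise_var is the variance of the unit-mean speckle term.
    """
    img = np.asarray(img, dtype=float)
    pad = size // 2
    # Reflect-pad so that every pixel has a full size x size neighbourhood.
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    # Estimated variance of the noise-free signal, clipped at zero.
    signal_var = np.maximum(
        (local_var - noise_var * local_mean**2) / (1.0 + noise_var), 0.0)
    # Weight is near 0 in homogeneous areas (smooth) and near 1 at edges (keep).
    denom = signal_var + noise_var * local_mean**2
    weight = np.divide(signal_var, denom,
                       out=np.zeros_like(img), where=denom > 0)
    return local_mean + weight * (img - local_mean)

# Synthetic homogeneous scene with unit-mean gamma speckle (variance 0.25).
rng = np.random.default_rng(0)
speckled = 10.0 * rng.gamma(shape=4.0, scale=0.25, size=(64, 64))
filtered = lee_filter(speckled, size=3, noise_var=0.25)
```

In homogeneous regions the weight tends to zero and the output approaches the local mean, while at strong edges the weight approaches one and the original pixel value is retained, which is the edge-preserving behaviour the text attributes to the filter.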

Fusion
The images were fused together using the BT in order to obtain more reliable and interpretable information. Fusion was done at pixel level using the method in the following subsection, with the aim of achieving a more enhanced and sharpened image by improving the lower resolutions. In a previous study by Dutsenwai et al. (2015), data fusion methods including Gram Schmidt (GS), Principal Component Spectral Sharpening (PCSS), the Brovey Transform (BT) and Hue Saturation Value (HSV) were tested on all three images, and a Laplacian filter was then applied to enhance the edges of the image for better classification. All the outputs (GS, PCSS, BT and HSV) were classified using Maximum Likelihood (ML) and Support Vector Machine (SVM). The BT (SVM) method had the best result, with the highest accuracy, as shown in Table 1. Hence, BT was compared with PCA in this study to test its accuracy further before it was used for flood extent mapping of the study area.
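For orientation, the generic form of the Brovey transform can be sketched as below: each input band is divided by the sum of the bands and multiplied by the higher resolution image. Which of the three SAR images plays the role of the higher resolution input in this study is not stated in the text, so the `high_res` argument, the function name `brovey` and the toy values are assumptions made for illustration only.

```python
import numpy as np

def brovey(bands, high_res, eps=1e-12):
    """Generic Brovey transform.

    bands:    array (n_bands, rows, cols), lower-resolution bands already
              resampled to the grid of high_res.
    high_res: array (rows, cols), the higher-resolution image.
    eps:      small constant that avoids division by zero.
    """
    bands = np.asarray(bands, dtype=float)
    total = bands.sum(axis=0)
    # Each band is normalised by the band sum, then scaled by high_res.
    return bands / (total + eps) * high_res

# Toy example with three bands on a 1 x 2 grid (hypothetical values).
bands = np.array([[[1.0, 2.0]], [[1.0, 2.0]], [[2.0, 4.0]]])
high_res = np.array([[8.0, 4.0]])
fused = brovey(bands, high_res)
```

Because every band is normalised by the band sum, the fused bands add up to the higher resolution image at each pixel, which is one way to see why the method alters the radiometry of the inputs.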

Classification
The supervised classification involved the collection of training sites known as regions of interest (ROIs). The training sites were carefully selected from flooded and non-flooded regions, without considering the land use type, in order to investigate the flood extent in the area. The best PCA output (i.e. PC2) and the BT output were classified. The SVM and ML classifiers were both used and compared.
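The maximum likelihood step can be sketched as a Gaussian classifier trained on ROI pixels: each class is described by its mean and covariance, and every pixel is assigned to the class with the highest log-likelihood. This is a minimal sketch assuming equal class priors; the function names and the synthetic backscatter values are hypothetical, and the SVM classifier used in the study is not reproduced here.

```python
import numpy as np

def fit_gaussian_ml(X, y):
    """Estimate per-class mean and covariance from training pixels.

    X: array (n_pixels, n_features); y: array (n_pixels,) of class labels.
    """
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        params[label] = (Xc.mean(axis=0), cov)
    return params

def predict_gaussian_ml(X, params):
    """Assign each pixel to the class with the highest log-likelihood."""
    labels = list(params)
    scores = []
    for label in labels:
        mean, cov = params[label]
        diff = X - mean
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to the class mean.
        maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
        # Log-likelihood up to a constant shared by all classes.
        scores.append(-0.5 * (logdet + maha))
    return np.array(labels)[np.argmax(scores, axis=0)]

# Synthetic ROI samples: 1 = flooded, 0 = non-flooded (illustrative only).
rng = np.random.default_rng(1)
flooded = rng.normal(-15.0, 1.5, size=(200, 2))
dry = rng.normal(-7.0, 1.5, size=(200, 2))
X = np.vstack([flooded, dry])
y = np.array([1] * 200 + [0] * 200)
params = fit_gaussian_ml(X, y)
pred = predict_gaussian_ml(X, params)
```

The same two functions apply whether the input has one feature (such as a single PC2 band) or several (such as a fused multiband image), since the covariance is always handled as a matrix.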

Principal component analysis (PCA) outputs
Four outputs were obtained from the PCA: the RGB composite of PC1, PC2 and PC3, and the separate PC1, PC2 and PC3 outputs. Fig. 4 shows the RGB composite of the PCA. Comparing Fig. 4 with Fig. 2a, 2b and 2c, it can be said that the PCA output in Fig. 4 contains more information than the other figures. An example is the area colored purple in the north eastern part, which shows that the information in that area is different from the nearby wet areas shown in Fig. 2a, and also different from the flooded area in Fig. 2b.
Ideally, the first three principal components hold the largest percentage of the data variance compared to the remaining components; in this case, where there are only three principal components, all of the variance is contained within the three components.
Also, the higher the correlation between the principal components, the better; hence the principal components are better interpreted individually to find the degree of correlation between the three principal components with respect to the outputs in Fig. 5. The first three components of a principal component analysis are considered to have large data variance and much less noise, and even among the first three principal components one is selected as the best, with better information characteristics (Rokni et al., 2014). In this case PC2 (Fig. 5b) was selected as the best, because it was considered to have higher correlation, dominating over PC1 (Fig. 5a) and PC3 (Fig. 5c) with more information on the extent of the areas affected by the floods.
The white color on the PC2 image (Fig. 5b) that spreads inland indicates the areas affected by the floods, but the farther inland the flood extends, the lower its level; in the north eastern part, this could be a result of higher land compared to the areas where the flood is more prominent. PC1 (Fig. 5a) shows the least information on the flood extents, because it shows little or no wetness and does not show the flooded areas, except for a part of the southwest of the study area, which could be deep water or an aquaculture area. This is because even in Fig. 4, that particular area has a separate spectral signature within the flooded parts.

Brovey transform outputs
The BT output in Fig. 6 is visually different from the PCA outputs (Fig. 4) because its overall brightness is lower; it nevertheless gave an appreciable view of the flooded area. This could be a result of the BT changing the radiometry of the scene (Mandhare et al., 2013), thereby showing more contrast between the flooded and non-flooded areas, and even additional composites that could result from differences between the extents and effects of the floods. It was also observed that the change in radiometry could be a reason for the lower backscatter from the various objects in the image.

Classification
The results of classification with both ML and SVM are shown in Fig. 7a and Fig. 7b for PC2 and BT respectively. The two were compared, and the more accurate classification was used for the flood extent map. The major constituents of the land use map in Fig. 8a were studied in order to make the classification outputs easier to interpret, by providing knowledge of the types of land use and topography as well as their distribution in the study area. Paddy covers most of the study area, followed by rubber and a small proportion of sugarcane plantation, mostly in the north and south of the study area. Other land uses found are railways and roads linking residential areas.

Land use identification and classification
All land uses were identified using the land use map of the area (Fig. 8a). These land uses were then reclassified into two major classes in Fig. 8b. Those that are destructible by floods were classified as low lying areas, consisting of the main paddy lands, the residential areas, the roads, the railways and the sugarcane plantations. The other class in the reclassified map consists mainly of forest and rubber (named highlands), which are less likely to be flooded or affected by floods than the low lying areas. The reclassified land use map was used as a reference to calculate the accuracy of all the image classification outputs.

Analysis of BT and PCA based on ML and SVM classifications
Based on the classification outputs, Fig. 7a shows that both PC2-ML and PC2-SVM are over-classified: they classify almost the entire paddy area as flooded, and PC2-SVM extends even farther to include rubber and forest areas. This differs from the BT classified outputs (ML and SVM), which are closer to each other in terms of the area covered by each class.
In Fig. 7b, BT-ML classified most of the west coast as flooded, which is contrary to the original TerraSAR-X during flood image, which shows the same area as non-flooded; in BT-SVM, however, the classified areas are closer in extent along the coast. In short, PC2-ML, PC2-SVM and BT-ML all over-classified the paddy in terms of flood extent. Therefore BT-SVM was preferred to the other three classification outputs.
The reclassified land use map was used as a reference to test the accuracies of the classified outputs for both PC2 and BT. Table 2 shows the overall accuracies and kappa coefficients of the BT and PCA methods. Based on the accuracy analysis using the confusion matrix (Table 1), which shows the overall accuracies and kappa coefficients for all the techniques used by Dutsenwai et al. (2015), the BT has the highest accuracy, with an overall accuracy of 70.9606% and a kappa coefficient of 0.3280 for SVM, and an overall accuracy of 70.1478% and a kappa coefficient of 0.3179 for ML. The same applies in Table 2, where the accuracy of PC2 is also lower than that of BT.
Initially, PC2 showed clearer flood extents (Fig. 5b) before classification, but it performed worse both visually and in the post classification accuracy assessment, with an accuracy of approximately 68% for both ML and SVM (Table 2). PC1 and PC3 did not show the extents as well as PC2, which is why PC2 was selected for further processing. Moreover, none of the three PCs differentiated between the flooded and the non-flooded paddy. On the other hand, the fused BT output in Fig. 6 did not show the extents as well as PC2 (Fig. 5b), but it showed clearer delineation between the flooded and non-flooded paddy.
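The overall accuracy and kappa coefficient reported from a confusion matrix can be computed as in the following sketch. The 2x2 matrix used here is hypothetical and is not the study's confusion matrix; it only illustrates how an overall accuracy of about 70% can coexist with a modest kappa when one class dominates.

```python
import numpy as np

def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    cm[i, j] counts pixels of reference class i assigned to class j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    # Observed agreement: fraction of pixels on the diagonal.
    po = np.trace(cm) / n
    # Agreement expected by chance from the row and column totals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return po, (po - pe) / (1.0 - pe)

# Hypothetical matrix: rows = reference (flooded, non-flooded),
# columns = classified (flooded, non-flooded).
cm = [[50, 10],
      [20, 20]]
overall, kappa = accuracy_and_kappa(cm)
```

For this hypothetical matrix the overall accuracy is 0.70 and the chance agreement is 0.54, so kappa is 0.16 / 0.46, or roughly 0.35.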

Flood extent map
In producing the flood extent map, the land use map (Fig. 8a) showed that the white part in the northern area lies across the national border, in southern Thailand; it was therefore masked off in order to make the map more relevant to the study area. The BT-SVM output, which already delineates the flooded and non-flooded areas, was then assigned more distinctive colors: white for the flooded areas, which represent the flood extents, and purple for the non-flooded areas.
The flood extent map (Fig. 9) shows the areas affected by the floods in white (flood extents); the other areas are not flooded. The degree of effect ranges from the very white areas, the most affected, to the purple areas, considered unaffected. Even within the flooded areas, the degree of effect is denoted by the intensity of the color, i.e. the whiter the area, the more it is flooded and the greater the effect. The whiter areas are the lowest lying areas, as confirmed by the DEM (Fig. 10). The areas that are not flooded are the higher elevation areas, consisting mostly of rubber plantations and forest; these areas are unlikely to feel the effects of floods the way the lower lying areas do. Because the study area is a very important paddy-producing region, the effect of the floods will be felt by almost the whole country through food shortages resulting from the loss of paddy lands in its major rice producing parts.
The areas along the coast and within the paddy fields that are not flooded, even though they are situated in very flood prone low lying areas, tend to be protected by the roads and railways, which are normally constructed at reasonable heights. Embankments could be another reason why the floods do not reach the extreme coasts. In some cases, some of the paddy fields are higher than others, so those in the lowest areas tend to be flooded or to suffer the effects of floods more than those in the higher areas.
The DEM was used to show that the extent of the flood is greater in the lowest lying areas, thereby supporting the accuracy of the new flood extent map. The DEM data were used to validate the output map because they associate the value of each pixel with a topographic height.
The aim of the DEM in this section is to help justify the flood extent map by showing that the low lying areas are the areas covered by paddy and that the high elevations are the non-flooded parts. Fig. 10 shows the DEM, with the highest elevation in white, darkening as elevation decreases. The darkest parts occur where the flood extent map shows the most flooded areas (Fig. 9). In a nutshell, the above discussion supports and agrees with most of the literature reviewed, in that most flood events occur in the low lying areas of northern peninsula Malaysia because of the nature of the terrain. This study also agrees strongly with Toriman et al. (2009b) and Dano Umar et al. (2014) that floods mostly affect farmers and fishermen who reside in low lying areas and on river banks, close to their source of income. The study supports the view of Tuan and Duong (2009) that flood mapping with radar images is valuable because it is unaffected by cloud cover and has the advantage of mapping flooded areas during floods. The study is also in agreement with Rokni et al. (2014) that the BT method produces high quality outputs. In addition to the past literature, this study classifies northern peninsula Malaysia as a high risk zone for economic loss, since it is the major paddy-producing region of the country and any loss encountered by the region tends to affect the country as a whole.
Considering methods used by past researchers, this study was carried out based on a new methodology which involves the use of three single band SAR images through fusion and classification. Additionally, the classification was compared between fused BT and PCA where the former was observed to outweigh the latter after the accuracy analysis. The methods of classification i.e. SVM and ML were also compared for both PCA and BT.

Conclusion and recommendation
The successful fusion of RadarSat and TerraSAR-X images improved classification (Dutsenwai et al., 2015) for the purpose of delineating the flooded and non-flooded areas. Based on the comparison of the PC2 method and the BT method, it can be deduced that classification of the fused image (BT) is better than that of PCA in terms of showing the extent of the flood. Classification of the fused images made the information in the original SAR images more visible after processing. Thus it can be concluded that the fusion of RadarSat and TerraSAR-X imagery can improve classification. The BT and PC2 methods had different levels of accuracy in both the ML and the SVM classifications, with the BT having the highest overall accuracy of 70.9606% and a kappa coefficient of 0.3280. Thus the BT is a satisfactory technique for mapping the flood extents in the study area.
Generally, it can be concluded that the flood extent map contains relevant information about the floods in northern peninsula Malaysia that can help the government and flood management agencies in preparedness, forecasting, planning and warning. The information on the most affected areas will help in the distribution of aid and relief to the areas where immediate help is needed. Knowledge of the high risk areas can also help the agencies to reduce the risk by enacting laws that discourage people from settling in flood prone areas.
It is recommended that a smaller study area be used in future work, so that a smaller part of the satellite image can be processed and viewed at a larger scale. TerraSAR-X or RadarSat should each be combined with other types of data, in order to find the differences in their combinations and evaluate their efficiency. The use of feature level image fusion for the same data sets should also be tested; this will determine the flexibility of the data across different types of image fusion techniques.