Enhanced feature extraction technique for brain MRI classification based on Haar wavelet and statistical moments

Many methods have been proposed to classify MR brain images automatically. We propose a method based on a neural network (NN) to classify a given MR brain image as normal or abnormal. The method first applies a median filter to reduce noise in the image and converts the image to RGB. It then applies the Discrete Wavelet Transform (DWT) to extract the important features from the image, and color moments are employed in the feature reduction stage to reduce the dimension of the features. The reduced features are sent to a feed-forward artificial neural network (FF-ANN) to discriminate between normal and abnormal MR brain images. We applied the proposed method to 70 images (25 normal, 45 abnormal). The overall accuracy of the proposed method across the training, testing, and validation images is 95.48%, while the computation time for feature extraction, feature reduction, and the neural network classifier is 4.3216 s, 4.5056 s, and 1.4797 s, respectively.


Introduction
Magnetic resonance imaging (MRI) is a fast, low-risk, noninvasive imaging technique, and it has become the most powerful imaging tool for studying the human brain. MRI uses radio waves and a magnetic field to produce images of the anatomical structures of the brain without radioactive tracers or ionizing radiation (X-rays). MRI provides rich information about brain soft tissues, which is helpful for biomedical research and clinical diagnosis (Westbrook, 2014), and it offers higher contrast for many brain tissues than other imaging modalities. Because of these properties, MRI is the most widely used tool for the diagnosis and treatment of brain pathology. The large volume of imaging data makes it difficult for an observer to analyze and interpret brain images manually. Detecting brain disease manually is tedious, costly, and time consuming. Therefore, accurate and automated computer-aided diagnosis (CAD) systems are in high demand to draw easier and faster inferences from MR images. These systems are extremely helpful to physicians in prognosis, diagnosis, and pre-surgical and post-surgical procedures. The most identifiable feature of the human brain is its symmetry, which is clearly visible in coronal and axial brain MR images, whereas asymmetry in the axial images indicates disease or abnormality. Using image processing and signal processing techniques, these essential features can be modeled to classify benign and malignant images (Fletcher-Heath et al., 2001).
The accurate and automated classification of MR images is essential for distinguishing normal from abnormal MR images. Over the last couple of decades, researchers have proposed many approaches for this aim, which fall into two groups. The first group is supervised classification, such as k-nearest neighbors (k-NN) and support vector machine (SVM) (Chaplot et al., 2006). The second group is unsupervised classification, such as fuzzy c-means (Maitra and Chatterjee, 2008) and the self-organizing feature map. All these classifiers have obtained good results; however, supervised classifiers generally perform better than unsupervised ones. The brain is the most complex organ in the human body and contains over 100 billion nerve cells. An unusual growth of tissue in the human brain is known as a brain tumor. Brain tumors are of two types, malignant and benign. A malignant tumor grows rapidly and contains cancer cells, while a benign tumor grows slowly and is less harmful. The brain of a normal person consists of three different parts: gray matter, white matter, and cerebrospinal fluid (Zarandi et al., 2011; Demirhan and Güler, 2011). Gray matter is responsible for nervous signal processing and consists of neuron nuclei and dendrites. Gray matter contributes 40 percent of the total brain volume. White matter is composed of fibers called axons and contributes 60 percent of the total brain volume. The third part, cerebrospinal fluid, is colorless and secretes different hormones that enable communication among the gray matter, white matter, and spinal cord of the nervous system. The cerebrospinal fluid, white matter, and gray matter are the parts most affected by brain abnormalities, so our work mainly considers the intensities of the pixels that represent these affected parts.
Brain abnormalities are of many types, some of which are shown in Fig. 1. Automatic detection of different diseases is extremely important for saving human lives. For this purpose, different imaging techniques are available in magnetic resonance imaging, and MRI is the most suitable imaging modality for this objective. Our methodology contains the following modules: image preprocessing, feature extraction, feature reduction, and classification.
Image pre-processing is very important for further processing. Medical images are corrupted by different types of noise, such as salt-and-pepper noise and Rician noise, and this noise is removed in this step to enhance the image quality. A good quality image is required for efficient observation. The median filter removes the noise efficiently while preserving the edges and brightness.
In image processing, feature extraction is a special form of dimensionality reduction. Quantitative measurement of medical images is usually done in the feature extraction stage and is used to make a decision about the pathology of a tissue or structure. The transformation of input data into a set of features is known as feature extraction. Feature extraction is very useful when the input data to an algorithm are too large to be processed and processing the whole data is known to be time consuming. To avoid this redundancy, the input data are transformed into a compact representation as a set of features. If the extracted features are carefully selected, the feature set is likely to capture the desired information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input. Generally, the features of an image can be extracted from its content, for instance color, texture, shape, position, dominant edges of image items, and regions.
Feature selection is a challenging task. Researchers have used different techniques for feature selection, such as Gabor features, Discrete Wavelet Transform, Spectral Mixture Analysis, texture features, Principal Component Analysis, and Minimum Noise Fraction Transform (Gonzalez and Woods, 2009). A large number of features needs to be reduced through a dimensionality reduction process in order to focus only on the important features. For feature reduction, the techniques most used by researchers are the genetic algorithm (GA), independent component analysis (ICA), principal component analysis (PCA), and linear discriminant analysis (LDA).
The classifier takes these reduced features for training and testing, which decreases the computational complexity and time. Classifiers such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Artificial Neural Network (ANN), Hidden Markov Model (HMM), and Probabilistic Neural Network (PNN) are used for different applications, for instance object identification, handwritten digit identification, text classification, face identification, speaker identification, and medical applications. Every classifier has unique properties and has some advantages and disadvantages.
The major disadvantage of K-NN is that it uses all the features in the distance computation, which is computationally expensive, especially as the training set grows. Most importantly, the presence of irrelevant features and noise severely affects the accuracy of K-NN. The disadvantage of PNN is that it needs considerable memory to store the model, and when classifying new cases PNN is much slower than multilayer perceptron networks. The Bayesian classifier is renowned for its low computational cost and its simplicity in both the training and testing stages. Unfortunately, the accuracy of the Bayesian approach is lower than that of SVM.
SVM is reported to be more accurate than other classifiers (Shantanu, 2006). The artificial neural network achieves a higher accuracy rate than other classification techniques on contradictory data and high-dimensional features.
Training and testing are the two parts of the classification process. First, in the training part, known data are given to the classifier. Second, in the testing part, unknown data are given to the trained classifier and classification is performed. The precision and error rate of the classifier depend on the training efficiency. Analyzing human brain MR images manually is a slow and expensive process, and there is always a likelihood of errors. Therefore, it is extremely important that human brain images be examined, classified, and processed automatically. For this purpose, we suggest an automatic MR image classification method based on color features and a feed-forward artificial neural network.
The remainder of the paper is organized as follows. Section 2 presents a literature review, section 3 describes the proposed methodology, and section 4 presents the experimental results and discussion. The conclusion is given in section 5.

Literature review
Over the last couple of decades, the brain tumor has been a very active area of research. Nowadays, the two main technologies employed are MR imaging and computed tomography (CT). Even in this modern era, MR images are still examined manually by experts, which can give wrong results and, most importantly, is time-consuming. The classification of images can be automated using digital image processing techniques and algorithms, which reduces both human error and computation time. For this purpose, we propose an automated methodology to discriminate between normal and abnormal brain MR images using a median filter, discrete wavelet transform, color moments, and an artificial neural network. We reduce the number of features for fast processing and a better accuracy rate.
Numerous techniques have been presented by researchers for brain MRI classification. Zanaty and Aljahdali (2011) presented an automated method to segment an image using modified fuzzy algorithms, arguing that improving the computational speed and accuracy of segmentation is essential in medical image segmentation. Somasundaram and Kalaiselvi (2010) proposed an automated brain extraction algorithm for axial T2-weighted images; some other brain MRI modalities are also discussed. Vasuda and Satheesh (2010) introduced an improved FCM algorithm for brain MRI segmentation. Convergence rate reduction remains a major problem for that algorithm. Logeswari and Karnan (2010) reported brain tumor detection using a SOM-based segmentation process.
Joseph et al. (2014) introduced a method for MR brain image segmentation that uses k-means clustering and morphological filtering to detect abnormal images. Alfonse and Salem (2016) used SVM to classify brain tumors in MR images; the fast Fourier transform is used for feature extraction to improve the accuracy of the classifier. Chaplot et al. (2006) used a two-dimensional discrete wavelet transform for feature extraction, with Daubechies filters for the decomposition. The accuracies of the self-organizing map (SOM) and support vector machine (SVM) classifiers were 94% and 98%, respectively. Maitra and Chatterjee (2006) used the Slantlet transform for feature extraction from brain MRI. After feature extraction, a backpropagation neural network was used to classify the images and achieved 100% accuracy.
El-Dahshan et al. (2010) used DWT coefficients as the feature vector for MRI brain images and applied PCA to the feature vector to reduce the number of coefficients. For classification, FFBP-ANN and k-NN classifiers were used to differentiate between benign and malignant images, achieving accuracies of 97% and 98%, respectively.
Zhang et al. (2011) classified brain images and achieved high classification accuracies. Fayaz et al. (2016) suggested a three-stage approach for brain MRI classification and achieved high accuracy. Wahid et al. (2016) suggested a method for identifying normal and abnormal MR images using three stages, namely preprocessing, feature extraction, and a probabilistic classifier, and obtained high accuracy. The classification, diagnosis, and prediction of different diseases can be improved with the appropriate use of data mining techniques (Ullah et al., 2016). Ullah et al. (2018) suggested a method for MRI classification using K-NN and obtained high accuracy. Suhaimi and Htike (2018) used a convolutional neural network (CNN) to classify functional magnetic resonance imaging (fMRI) data, selecting the feature map sizes while designing the CNN. Maniar and Shah (2017) proposed a novel automatic approach for fundus images, using data pre-processing techniques on the images to improve the classifier; different classifiers, such as CART, NN, NB, decision tree, and K-NN, were used and compared. Saleh and Al-Bakry (2017) proposed a new method for brain tumor detection and classification, using multi-level threshold segmentation, morphological closing and opening for multi-level threshold enhancement, and the watershed algorithm for tumor segmentation. Keerthana and Xavier (2018) proposed an approach for brain tumor detection and classification consisting of pre-processing, segmentation, feature extraction, and classification, using a genetic algorithm and SVM to optimize the features. Korolev et al. (2017) classified brain MR images for Alzheimer's disease using convolutional neural networks; they skipped the feature extraction step in their methodology and achieved promising results.
Lavanyadevi et al. (2017) used PNN to classify brain tumors automatically, with the K-means algorithm used for segmentation and detection of the tumor region. Mathew and Anto (2017) presented four stages to classify brain MRI, in which an anisotropic diffusion filter pre-processes the MR image, DWT extracts the features from the image, and these features are given to SVM for classification. Saha and Hossain (2017) determined normality and abnormality using K-means, NSCT, and SVM. A median filter was used for noise removal and the K-means clustering algorithm for image segmentation. Afterwards, the nonsubsampled contourlet transform (NSCT) was applied to the segmented images. Seven features were extracted from the sub-band coefficients of the NSCT and given to a support vector machine, which obtained a reasonable accuracy.

Proposed model
The proposed methodology classifies human brain MR images as normal or abnormal. It uses four steps: preprocessing, feature extraction, feature reduction, and classification. The techniques used in these steps are the median filter, Discrete Wavelet Transform (DWT), Color Moments (CM), and FF-ANN, as shown in Fig. 2. The details of these techniques are provided in the subsequent subsections.

Pre-processing
Researchers have used different filters to remove noise from MRI images. Different kinds of noise corrupt medical images during acquisition. In this paper, the median filter is used to suppress noisy pixels in the MR images. The median filter suppresses impulsive noise while preserving the edges of the image effectively. A 3×3 window is used in the proposed method to remove the noise from the image, because a larger window degrades the edges of the image and requires a higher computation time. The grayscale image is then converted to red, green, and blue components, as grayscale images are less informative than RGB images. The filtered image and the RGB image are shown in Fig. 3.
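The 3×3 median filtering step can be illustrated with a minimal pure-Python sketch. It is not the authors' MATLAB implementation; the function name `median_filter_3x3` and the edge-replication border rule are assumptions made for illustration, since the paper does not state how borders are handled.

```python
from statistics import median

def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D grayscale image (list of rows).

    Border pixels are handled by replicating the nearest edge pixel.
    This border rule is an assumption; the paper does not specify one.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp row index
                    cc = min(max(c + dc, 0), cols - 1)  # clamp column index
                    window.append(image[rr][cc])
            out[r][c] = median(window)  # middle value of the 9 samples
    return out

# Toy image with two impulsive (salt-and-pepper style) outliers.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
filtered = median_filter_3x3(noisy)
```

In this toy example both outliers are replaced by the surrounding value, which is the behavior that makes the median filter suitable for impulsive noise.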

Features extraction
For image feature extraction, the Discrete Wavelet Transform is a powerful mathematical tool. The Discrete Wavelet Transform preserves both the frequency and the time information of the signal, and it is also useful for image analysis and classification. The major advantage of DWT is that it transforms the image from the spatial domain into the frequency domain. The development of signal analysis is shown in Fig. 4. Suppose that x(t) represents a square-integrable function; then the continuous WT of x(t) relative to a given wavelet ψ(t) can be defined as:
$$W_{\psi}(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}(t)\,dt \tag{1}$$
where
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) \tag{2}$$
In the above equations, the wavelet ψ_{a,b}(t) is calculated from the mother wavelet ψ(t) by translation and dilation, where 'a' and 'b' represent the dilation and translation parameters, respectively. Both 'a' and 'b' are taken as positive real numbers. There are many types of wavelets; however, the most popular and important is the Haar wavelet. In many applications Haar is the preferred wavelet because of its simplicity. We have decomposed an image into sub-bands with the corresponding DWT coefficients.
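For reference, the Haar mother wavelet has the following standard closed form. This expression is not given in the paper and is included here only as the textbook definition:

```latex
\psi(t) =
\begin{cases}
 1,  & 0 \le t < \tfrac{1}{2},\\
 -1, & \tfrac{1}{2} \le t < 1,\\
 0,  & \text{otherwise.}
\end{cases}
```

Its two-level piecewise-constant shape is what reduces the Haar filter pair to simple sums and differences of neighboring samples.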
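The Discrete Wavelet Transform is implemented using cascading filter banks in which the low-pass filter and the high-pass filter meet the required constraints, as shown in Eq. 3 and Eq. 4:
$$ca_{j,k} = \sum_{n} x(n)\, g_{j}^{*}\!\left(n - 2^{j}k\right) \tag{3}$$
$$cd_{j,k} = \sum_{n} x(n)\, h_{j}^{*}\!\left(n - 2^{j}k\right) \tag{4}$$
In Eq. 3, ca_{j,k} denotes the approximation component coefficients, and in Eq. 4, cd_{j,k} denotes the detail component coefficients. Here x(n) is the discrete signal, g(n) and h(n) are the low-pass and high-pass filters, and j and k are the wavelet scale and translation factors.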
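The cascading filter bank can be sketched in one dimension with the Haar filter pair. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the orthonormal scaling 1/√2 is one common convention, and the signal length is assumed to be divisible by 2 at every level.

```python
import math

def haar_dwt_1d(signal):
    """One Haar DWT level: low-pass/high-pass filtering, then downsampling by 2."""
    s = 1.0 / math.sqrt(2.0)
    # Low-pass branch: scaled pairwise sums give approximation coefficients.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    # High-pass branch: scaled pairwise differences give detail coefficients.
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def haar_cascade(signal, levels):
    """Cascade the filter pair: each level re-decomposes the approximation."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt_1d(approx)
        details.append(d)
    return approx, details

x = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
ca3, cds = haar_cascade(x, 3)  # ca3: level-3 approximation, cds: details per level
```

Each level halves the number of approximation coefficients, so an 8-sample signal yields detail vectors of length 4, 2, and 1 and a single level-3 approximation coefficient.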

Fig. 4: The development of signals analysis
In the proposed method, the DWT is applied separately to each dimension. Fig. 5 shows the 2D DWT diagram. As a result, at each scale there are four sub-band images (LL, LH, HH, HL). The LL sub-band is used for the next 2D DWT. Here the approximation component is represented by the LL sub-band, while the three sub-bands LH, HL, and HH are the detail components of the image. The decomposition of the image can be performed over many levels. In our algorithm, we have used a level-three decomposition with the Haar wavelet to extract the features. The overall sub-band layout is shown in Fig. 6.
In Fig. 7, the color image is obtained by transforming a grayscale image. The size of the RGB image is 256 × 256 × 3, which is extremely large to process and is also time consuming. Therefore, we compress the image while preserving its approximation (low-frequency) content. In the proposed method, the image is decomposed up to 3 levels, and this 3-level wavelet decomposition reduces the size of the input image as shown in Fig. 7. Our region of interest (ROI) in Fig. 7 is the approximation coefficient sub-band (LL3), and the total size of this reduced image is 32 × 32 × 3. However, this size is still large for a classifier; therefore, we reduce these features in the feature reduction stage using Color Moments. The three-level wavelet decomposition tree is shown in Fig. 8.
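The size reduction from 256 × 256 to 32 × 32 per channel can be checked with a small pure-Python sketch of the 2D Haar decomposition. The function names, the division by 2 (orthonormal scaling), and the LH/HL naming are assumptions; conventions differ between toolboxes, and the synthetic channel below merely stands in for a real MR image.

```python
def haar_dwt_2d(channel):
    """One level of the 2D Haar DWT on a 2D list with even side lengths.

    Returns (LL, LH, HL, HH), each half the input size per dimension.
    Which detail band is called LH or HL varies between conventions.
    """
    rows, cols = len(channel), len(channel[0])
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = channel[r][c], channel[r][c + 1]          # top pixel pair
            d, e = channel[r + 1][c], channel[r + 1][c + 1]  # bottom pixel pair
            ll_row.append((a + b + d + e) / 2.0)  # approximation
            lh_row.append((a + b - d - e) / 2.0)  # difference between rows
            hl_row.append((a - b + d - e) / 2.0)  # difference between columns
            hh_row.append((a - b - d + e) / 2.0)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh

def approximation_ll(channel, levels=3):
    """Keep only the LL sub-band, re-decomposing it at every level."""
    ll = channel
    for _ in range(levels):
        ll, _lh, _hl, _hh = haar_dwt_2d(ll)
    return ll

# Synthetic 256 x 256 channel used only to demonstrate the sizes.
channel = [[float((r + c) % 5) for c in range(256)] for r in range(256)]
ll3 = approximation_ll(channel, levels=3)
n_features = 3 * len(ll3) * len(ll3[0])  # three color channels
```

With three channels this gives 3 × 32 × 32 = 3072 values, which matches the number of extracted features quoted in the feature reduction stage.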

Features reduction
An excessive number of features increases both the computation time for classification and the memory required for storage, and in most cases it also makes the classification more complicated. Therefore, the number of features needs to be reduced. The 3072 extracted features are still too many for classification and are time-consuming to process, so we use Color Moments in the feature reduction stage to reduce them. In the given methodology, we extract the RGB channels from the approximation coefficients up to 3 levels. For every RGB channel, the three color moments, i.e., mean, standard deviation (the square root of the variance), and skewness, are computed separately. These features are very informative for classification. The mathematical representations of the color moments for the red, green, and blue channels are given in Eqs. 5-13.
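As a hedged reference, the commonly used definitions of the first three color moments for a single channel c with N pixel values p_{c,i} are sketched below. The symbols are chosen here for illustration, and the paper's own Eqs. 5-13 may use a different normalization; applying the three definitions to the R, G, and B channels yields nine quantities in total.

```latex
\mu_c = \frac{1}{N}\sum_{i=1}^{N} p_{c,i}, \qquad
\sigma_c = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(p_{c,i}-\mu_c\right)^{2}}, \qquad
s_c = \sqrt[3]{\frac{1}{N}\sum_{i=1}^{N}\left(p_{c,i}-\mu_c\right)^{3}}
```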
In summary, we obtain the RGB channels from the approximation coefficients of the RGB image at level 3, represented by LL3 as shown in Fig. 7. Subsequently, we calculate the mean, standard deviation, and skewness of each RGB channel separately. As a result, we obtain a total of nine features, three for each channel. Finally, the features are stored in a one-dimensional array and passed to the Artificial Neural Network classifier.
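The reduction to a nine-element feature vector can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, population (1/N) statistics are assumed, and skewness is taken as the signed cube root of the third central moment, which is one common color-moment convention.

```python
import math

def color_moments(channel):
    """Return (mean, standard deviation, skewness) of one 2D channel."""
    values = [v for row in channel for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    std = math.sqrt(variance)
    # Signed cube root keeps the sign of the third central moment.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, std, skew

def feature_vector(red, green, blue):
    """Concatenate the three moments of each channel into a flat list of 9."""
    features = []
    for channel in (red, green, blue):
        features.extend(color_moments(channel))
    return features

# Tiny 2 x 2 channels standing in for the 32 x 32 LL3 sub-bands.
red = [[1.0, 2.0], [3.0, 6.0]]
green = [[5.0, 5.0], [5.0, 5.0]]
blue = [[0.0, 0.0], [0.0, 4.0]]
features = feature_vector(red, green, blue)
```

The resulting list has the layout [mean_R, std_R, skew_R, mean_G, std_G, skew_G, mean_B, std_B, skew_B]; this ordering is an arbitrary choice for the sketch.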

Classification
In the last stage of the proposed methodology, a Feed-Forward Artificial Neural Network is used to identify normal and abnormal MR images. The neural network is a renowned classification technique and is extensively used for pattern classification. ANN does not require any information about the prior probabilities or the probability distributions of the various classes. ANN is inspired by the functioning of the human brain, for instance reasoning, inference, and information storage. ANN requires a high computation time during the training phase. However, the training phase plays a vital role: once the classifier has been trained for a specific task, it identifies unknown entities quickly. ANN is a mathematical model that contains many non-linear artificial neurons running in parallel. An NN consists of a single-layer perceptron or a multi-layer perceptron. However, most ANN architectures consist of an input layer, a hidden layer, and an output layer of neurons. The hidden layer mediates between the input layer and the output layer. Among all ANNs, the FF-ANN is well known for its simplicity of use. In the suggested method, the 9 reduced features are sent directly to the Feed-Forward Artificial Neural Network (FF-ANN). Therefore, we have 9 input neurons (NI), while the number of hidden neurons (NH) is set to 10 according to the entropy information method (23), as shown in Fig. 9. The structure of the NN is thus 9-10-1. The output layer classifies MR images as normal or abnormal; in our case, 1 and 0 denote normal and abnormal MR images, respectively, as can be seen in Fig. 9. Both the hidden layer and the output layer apply a linear transformation followed by the sigmoid function. The Levenberg-Marquardt back-propagation algorithm (24) is used to train the network because this algorithm adjusts the input weights automatically to reach the target.
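The forward pass of a 9-10-1 network can be sketched as below. This is only an illustration of the architecture: the weights are random and untrained, Levenberg-Marquardt training is not shown, and the 0.5 decision threshold, the weight range, and all function names are assumptions that do not come from the paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a trained network would replace these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def layer_forward(inputs, weights, biases):
    """Linear transformation followed by the sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def predict(features, hidden, output):
    """Forward pass through the 9-10-1 structure."""
    h = layer_forward(features, *hidden)
    score = layer_forward(h, *output)[0]
    # Label coding from the paper: 1 = normal, 0 = abnormal.
    # The 0.5 threshold is an assumption made for this sketch.
    label = 1 if score >= 0.5 else 0
    return score, label

rng = random.Random(0)
hidden = init_layer(9, 10, rng)   # 9 inputs -> 10 hidden neurons
output = init_layer(10, 1, rng)   # 10 hidden neurons -> 1 output neuron
score, label = predict([0.1] * 9, hidden, output)
```

Because the weights are untrained, the predicted label is meaningless here; the sketch only shows how nine color-moment features flow through the layers.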

Results and discussion
MRI has played an important role in research over the last two decades or so. We used an Intel® Core™ i5-4590 CPU at 3.3 GHz with 8 GB of memory, running the Windows 7 operating system. Algorithm development and the experiments were performed with the NN toolbox of MATLAB 2015 (R2015a).
The proposed model is easy to implement. We extracted the red, green, and blue channels from the color brain MR images, and for each of these three color channels the color moments, i.e., mean, skewness, and standard deviation, were computed separately. In total, nine features were extracted from each brain MR image. We used only seventy images (25 normal and 45 abnormal), classified them with FF-ANN, and obtained a good accuracy rate. The contribution of this paper is the feature reduction: we reduced the features to only 9, which take a short time to compute, whereas other researchers have used a large number of features for classification, which requires a large amount of computation time.

Datasets
In the experiments we used T2-weighted images taken from the website of Harvard Medical School (http://med.harvard.edu/AANLIB/). We took 70 images for the experiment, of which 25 were normal brain MR images and 45 were abnormal brain MR images. The abnormal images comprised 15 Alzheimer's disease images, 15 acute stroke images, and 15 glioblastoma multiforme overlay images. A single image of each disease is shown in Fig. 10.

Algorithm accuracy
The whole dataset, with 9 extracted features per image, was divided randomly into three parts: a training dataset (65%), a testing dataset (30%), and a validation dataset (5%). The proposed algorithm gave 97.18% accuracy on the training dataset, while 91.9% and 99.7% accuracy was recorded on the testing and validation datasets, respectively. The overall accuracy of the method is therefore 95.48%. This accuracy was reached after only 14 epochs. Fig. 11, Fig. 12, and Fig. 13 show the system performance of our suggested methodology. The accuracy of our suggested method is compared with some well-known methods in Table 1. The suggested method provides promising results, especially in terms of time and accuracy, because it uses a smaller number of features, as discussed above. In Table 1, one technique provides 100% accuracy; however, the computation time of that method is very high.
Table 2 presents a feature-based comparison. Other researchers used a very large number of features, while our proposed method uses only 9 features, as shown in Table 2, and obtains a high accuracy rate.
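A random 65/30/5 split of 70 samples can be sketched as follows. The percentages of 70 are not all whole numbers (45.5, 21, and 3.5), so the rounding used here (integer division, with the remainder assigned to validation) is an assumption; the paper does not state the exact counts, and the function names are hypothetical.

```python
import random

def split_dataset(n_samples, train_pct=65, test_pct=30, seed=0):
    """Randomly split sample indices into train, test, and validation parts.

    Integer division decides the train and test sizes; whatever remains
    goes to validation. This rounding rule is an assumption.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * train_pct // 100
    n_test = n_samples * test_pct // 100
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]
    return train, test, validation

def accuracy(predicted, actual):
    """Percentage of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)

train, test, validation = split_dataset(70)
```

With this rounding, the 70 images are divided into 45 training, 21 testing, and 4 validation samples.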

Time analysis
Classifier evaluation time is another important factor. In this methodology, the network training time is not considered, because the NN weights/biases remain unchanged unless the properties of the images change substantially. We sent the whole dataset of 70 images to the classifier and recorded the computation time. The time taken by the different stages is shown in Fig. 14 and is quite reasonable. The computation time per image for feature extraction, feature reduction, and the neural network classifier is 4.3216 s, 4.5056 s, and 1.4797 s, respectively. The most time-consuming stage was feature reduction, which needs to be improved in future work.
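Per-stage timing of this kind can be collected with a small helper such as the one below. It is a generic sketch rather than the authors' MATLAB measurement procedure; `time_stage`, `run_pipeline`, and the two placeholder stages are hypothetical names used only for illustration.

```python
import time

def time_stage(stage, data):
    """Run one pipeline stage and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = stage(data)
    elapsed = time.perf_counter() - start
    return result, elapsed

def run_pipeline(data, stages):
    """Run the named stages in order, recording the time spent in each."""
    timings = {}
    for name, stage in stages:
        data, timings[name] = time_stage(stage, data)
    return data, timings

# Placeholder stages standing in for feature extraction and reduction.
stages = [
    ("extract", lambda x: [v * 2 for v in x]),
    ("reduce", lambda x: sum(x)),
]
result, timings = run_pipeline([1, 2, 3], stages)
```

Measuring each stage separately, as here, makes it straightforward to identify which stage dominates the total time.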

Conclusion
In this paper, we have proposed a modified method to discriminate between normal and abnormal brain MR images using a median filter, DWT, color moments, and ANN. The normality or abnormality of human brain MR images is determined through the stages discussed above. DWT can easily extract the information from the image. However, the number of extracted features is very high; therefore, we use color moments to reduce the number of features in the feature reduction stage, which is the main contribution. In the last stage of the proposed methodology, these reduced features are given to FF-ANN to classify the MR brain image as normal or abnormal. The overall accuracy of our proposed method is 95.48%, and the computation time per image is reasonable. The main advantage of feature reduction is that it minimizes the computation time. Identifying different diseases in the human brain remains an open issue, and multiclass classification will be helpful for it. In future work, we will focus on how to reduce the computation time of feature reduction and how to recognize various diseases in the human brain.
Korolev S, Safiullin A, Belyaev M, and Dodonova Y (2017). Residual and plain convolutional neural networks for 3D brain MRI classification.