Detection of acute lymphoblastic leukemia using microscopic images of blood

Article history: Received 19 March 2017; received in revised form 1 July 2017; accepted 5 July 2017.

Leukemia is a group of cancers that usually begin in the bone marrow and result in a high number of abnormal white blood cells. Detecting leukemia in its early stages is necessary because it can reduce the mortality rate, whereas late detection may lead to death. In the manual method of leukemia detection, hematologists analyze microscopic images and decide the severity. This is a lengthy, costly and time-consuming process that depends on the person's expertise and may not yield a standard level of accuracy. To date, a number of methods have been proposed for leukemia detection using image processing. Unlike previous methods, which depend solely on the entire cell, in this paper we propose a new method that separates the cell nucleus from the cytoplasm to obtain more features. The proposed method achieves better accuracy than the other existing methods.


Introduction
Cancer can be defined as a disease in which a group of abnormal cells grows uncontrollably, disregarding the normal rules of cell division. There are several types of cancer, including skin cancer, lung cancer, prostate cancer and blood cancer. Leukemia is a blood cancer: it affects white blood cells (WBCs), whereby abnormal and immature WBCs are produced by the bone marrow and enter the blood (Turgeon, 2005). Generally, cells grow and multiply to form new cells as the body needs them for effective functioning. When cells grow old, they die and new cells take their place. Sometimes this cycle does not work properly (Majno and Joris, 2004). In cancer, new cells are formed when the body does not need them, and old cells do not die when they should (Patel and Mishra, 2015). Thus the abnormal white blood cells become numerous and interfere with the ability of normal white blood cells to carry out their functions. This also causes an imbalance of the blood system in the human body (Vaghela et al., 2015). The two main characteristics of cancer are the uncontrolled growth of abnormal cells in the human body and the ability of these cells to migrate from the original site and spread to distant sites. If the spread is not controlled, cancer can result in death. There are two types of acute leukemia, namely acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) (Turgeon, 2005). ALL is diagnosed in 3000 to 4000 persons in the United States each year; two thirds of them are children. The current cure rate of nearly 80 percent in children attests to remarkable progress in the development of effective treatments for resistant subtypes of the disease (Pui and Evans, 1998). Acute leukemia is usually diagnosed by a morphological analysis of blood slides by hematologists, which is a complex, time-consuming and costly process (Mohapatra et al., 2012a).
It also requires considerable training and expertise.
Furthermore, the results often lack standardized performance owing to a variety of factors, including insufficient expertise or imperfection of the samples (Piuri and Scotti, 2004; Mohapatra and Patra, 2010). Unfortunately, the accurate assignment of patients to specific risk groups is a difficult and expensive process, requiring intensive laboratory studies including immunophenotyping, cytogenetics and molecular diagnostics. Moreover, these diagnostic methods require the collective expertise of a number of professionals, and although this expertise is available at most major medical centers, it is generally unavailable in developing and underdeveloped countries (Yeoh et al., 2002). Microscopic examination of bone marrow blood smears and aspirates is the most economical technique, but manual examination of microscopic images (Srisukkham et al., 2013) often leads to bias, for the same reasons of insufficient expertise and imperfect samples (Vaghela et al., 2015). Thus, there is a need to automate this procedure (Shirvoikar and Virani, 2016). Various digital diagnosis systems have been developed to analyze microscopic blood images for leukemia detection. However, they suffer from a number of problems and limitations. In particular, accurate diagnosis of leukemia requires discrimination of one cell type from another, and of the cell nucleus from the cell cytoplasm.
Indeed, separating leukemia cell nuclei, which have diverse, complex and irregular morphology, from the cytoplasm is a challenging task. Research shows that only a few existing clustering algorithms achieve good adaptivity for reliable separation of nucleus and cytoplasm (Mohapatra et al., 2012a; Mohapatra et al., 2012b; Mohapatra et al., 2014). Therefore, existing methods such as k-means (Fatma and Sharma, 2014), FCS1 (Li et al., 2011), FCS2 (Wu et al., 2005), FCM (Bezdek et al., 1984) and LDA (Li et al., 2011) are not reliable because of the limitations of the clustering algorithms (Kuo and Landgrebe, 2004).
This paper aims to overcome the above-mentioned challenges. We propose a new segmentation algorithm to efficiently separate the cell nucleus from its cytoplasm. The proposed algorithm uses discriminant measures to separate the nucleus and the cell, and it uses shape and color features extracted from the cell, cytoplasm and nucleus. The employed classifier, a support vector machine (SVM), takes the statistical values obtained from the features to classify healthy and unhealthy cells (Tong and Chang, 2001; Tsochantaridis et al., 2004; Osuna et al., 1997).

Proposed algorithm
In this paper, we consider the problem of segmenting a leukocyte into nucleus and cytoplasm. We classify the leukocyte as healthy or unhealthy based on various features extracted from the cell as well as from the nucleus. Segmentation therefore plays an important role in the performance of the system. To improve the accuracy of the cell and nucleus segmentation process, we devised a new algorithm, which is explained below.

Segmentation
We need to segment each leukocyte into nucleus and cytoplasm. Segmenting the nucleus from the leukocyte is a challenging task, and all aspects should be considered carefully. In this research we propose a new method for the segmentation of the nucleus. The algorithm for the segmentation is as follows. Initially, the color blood slide image 'a' is given as input to the system. The color image is then converted into the grayscale image 'l'. To adjust the image intensity level, linear contrast stretching is applied to the grayscale image 'l'. The contrast of the grayscale image is then enhanced using histogram equalization to get image 'h'.
Obtain the image 'r1' = l + h to brighten all image components except the cell nucleus. Obtain the image 'r2' = r1 - h to highlight all image objects along with the cell nucleus. Obtain the image 'r3' = r1 + r2 to remove all other components of blood with minimum distortion of the nucleus. To reduce noise, preserve edges and increase the darkness of the nuclei, apply a 3-by-3 minimum filter to the image 'r3' to get image 'minf'. Compute a global threshold for the image 'minf' using Otsu's method.
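The steps described so far can be sketched on a small grayscale image as follows. This is a minimal illustration in plain Python, not the authors' implementation. It assumes 8-bit images stored as lists of rows, saturating addition and subtraction (results clamped to 0..255), and that pixels at or below the Otsu threshold are the dark nucleus candidates. The contrast stretching step is omitted for brevity, and all function names are hypothetical.

```python
def _sat(v):
    """Clamp a value to the 8-bit range (assumed saturating arithmetic)."""
    return max(0, min(255, v))

def add(a, b):
    """Pixel-wise saturating sum of two equally sized images."""
    return [[_sat(x + y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def subtract(a, b):
    """Pixel-wise saturating difference a - b."""
    return [[_sat(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def histogram(img):
    """256-bin histogram of an 8-bit image."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    return hist

def equalize(img):
    """Histogram equalization via the cumulative distribution function."""
    hist = histogram(img)
    n = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in img]
    lut = [_sat(round((c - cdf_min) * 255 / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in img]

def min_filter_3x3(img):
    """3-by-3 minimum filter; the window is clipped at the image border."""
    h, w = len(img), len(img[0])
    return [[min(img[y][x]
                 for y in range(max(0, i - 1), min(h, i + 2))
                 for x in range(max(0, j - 1), min(w, j + 2)))
             for j in range(w)] for i in range(h)]

def otsu_threshold(img):
    """Threshold t that maximizes the between-class variance (Otsu)."""
    hist = histogram(img)
    total = sum(hist)
    sum_all = sum(v * hist[v] for v in range(256))
    w_b, sum_b, best_t, best_var = 0, 0, 0, -1.0
    for t in range(256):
        w_b += hist[t]          # pixels with value <= t
        if w_b == 0:
            continue
        w_f = total - w_b       # pixels with value > t
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        var = w_b * w_f * (m_b - m_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def nucleus_candidate_mask(l):
    """Apply r1 = l + h, r2 = r1 - h, r3 = r1 + r2, min filter and Otsu."""
    h = equalize(l)
    r1 = add(l, h)
    r2 = subtract(r1, h)
    r3 = add(r1, r2)
    minf = min_filter_3x3(r3)
    t = otsu_threshold(minf)
    return [[1 if p <= t else 0 for p in row] for row in minf]
```

Note that the minimum filter enlarges dark regions by one pixel in every direction, so the candidate mask is slightly larger than the dark area in the input image.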
Using the threshold value from the above step, convert 'minf' to the binary image 'thresh'. To remove small pixel groups, apply morphological opening to get image 'morph'. Find the contours associated with the image. All contours except the one with the maximum area are cleared. The maximum-area contour is filled to get the image 'nucleus'.

The following are the intermediate images obtained from the above algorithm. Fig. 1(a) represents the initial color image. Fig. 1(a) is converted to the binary image Fig. 1(b), and Fig. 1(b) is converted to the grayscale image Fig. 1(c). Fig. 1(c) is enhanced by histogram equalization to form Fig. 2(a). Fig. 2(b) is obtained by adding Fig. 1(c) and Fig. 2(a), and Fig. 2(c) is obtained by subtracting Fig. 2(a) from Fig. 2(b). Fig. 3(a) is obtained by adding Fig. 2(b) and Fig. 2(c). Fig. 3(b) is the Otsu threshold of Fig. 3(a), and Fig. 3(c) is the binary image of Fig. 3(b). Fig. 4(a) is obtained by morphological opening of Fig. 3(c). Fig. 4(b) is the maximum contour obtained from Fig. 4(a), and it is completely filled to obtain Fig. 4(c), which is the final nucleus.

Cell segmentation algorithm (only the final steps of the listing are available):
... (cell, contours, k, color, 8, hierarchy, 0, Point());
threshold(cell, cell, 0, 255, CV_THRESH_BINARY);
display cell

First, the color image X is converted to the HSV image 'hsv'. The white blood cell part, image 'tot', is extracted using the HSV values. The obtained image is converted to the binary image 'x'. To remove small pixel groups, morphological opening is applied. The contours associated with the image are found. All contours except the one with the maximum area are cleared. The maximum-area contour is filled to get the image 'cell'.

The following are the intermediate images obtained from the above algorithm. Fig. 5(a) is the initial color image, which is converted to Fig. 5(b), the HSV image of Fig. 5(a). Then Fig. 5(b) is converted to its binary image Fig. 5(c). Fig. 6(a)

Feature extraction
Healthy and unhealthy leukocytes can be classified based on certain features. The shape of the nucleus plays an important role in differentiating unhealthy cells. Once the segmentation process is completed, we extract the following features: area, perimeter, circularity, convex area, solidity, major axis length, minor axis length, eccentricity, extent, filled area, aspect ratio, equivalent diameter, mean, standard deviation, entropy, nucleus-to-cell area ratio and nucleus-to-cell perimeter ratio. Definitions of some important features are given below:
a) Area: the number of non-zero pixels within the image region.
b) Perimeter: the sum of the distances between successive boundary pixels.
c) Circularity: the ratio of the area to the square of the perimeter, multiplied by 4π. It is dimensionless: $\text{Circularity} = 4\pi \cdot \text{Area} / \text{Perimeter}^2$.
d) Eccentricity: a parameter associated with every conic section. It can be thought of as a measure of how much the conic section deviates from being circular.
e) Form factor: a mathematical factor that compensates for irregularity in the shape of an object, usually the ratio between its volume and that of a regular object of the same breadth and height.
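As a brief illustration of the definitions above, the following sketch computes the area of a binary mask, the circularity from a given area and perimeter, and the nucleus-to-cell area ratio. The function names are hypothetical, and the perimeter is assumed to be measured separately (for example, from the contour).

```python
import math

def area(mask):
    """Area: number of non-zero pixels in a binary mask (list of rows)."""
    return sum(1 for row in mask for p in row if p)

def circularity(region_area, perimeter):
    """Circularity = 4 * pi * Area / Perimeter^2 (dimensionless)."""
    return 4 * math.pi * region_area / perimeter ** 2

def area_ratio(nucleus_mask, cell_mask):
    """Nucleus-to-cell area ratio from the two segmentation masks."""
    return area(nucleus_mask) / area(cell_mask)
```

With this definition an ideal circle has a circularity of 1, while a square has a circularity of π/4 (about 0.785), so more irregular nuclei yield lower values.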

Classification
After extracting all the required features, we need to classify the leukocytes as healthy or unhealthy. Classification plays an important role in the detection of leukemia and is the core of the entire system; the accuracy of the system depends strongly on the classification algorithm used. Several machine learning algorithms have been proposed to date for the classification process. Among them, the support vector machine (SVM) is found to be simple and efficient, with an accuracy of 92.7%. The SVM is a supervised machine learning algorithm that can be used for both classification and regression, although it is mostly used for classification problems. In this algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that best separates the two classes. After applying the SVM to the images, the healthy cells fall on one side of the hyperplane and the unhealthy cells fall on the other side. The whole algorithm is illustrated in Fig. 7.
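The hyperplane idea can be illustrated with a minimal linear SVM trained by sub-gradient descent on the regularized hinge loss. This is only a sketch under stated assumptions: the source does not specify the kernel, solver or hyperparameters, so the linear model, the learning rate `lr`, the regularization weight `lam` and the label convention (+1 unhealthy, -1 healthy) are all illustrative choices, and the function names are hypothetical.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Full-batch sub-gradient descent on the regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Returns the hyperplane parameters (w, b).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:            # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 on which x falls (+1 or -1)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

In practice the feature vectors would hold the extracted shape and color features (ideally standardized), and a library implementation would normally be preferred over this hand-written trainer.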

Experiment and result
The results of leukocyte segmentation for a healthy cell and an unhealthy cell are shown below. In our experiment, 40 images of healthy cells and 40 images of unhealthy cells were taken from ALL-IDB2. In our work, the color images are converted to binary images of both the nucleus and the cell (Fig. 8 and 9). These images are taken from the experimental results. The nucleus and the cell are obtained from the nucleus segmentation algorithm and the cell segmentation algorithm, respectively.
Training data are the input feature values supplied to the system, and the experiment is evaluated using the testing data. When training data is given to a classifier, it establishes a relation based on the data and then applies that relation to the test data. We compare the results of our work with the method of Khashman and Abbas (2013), because it is regarded as the best available method in the field of leukemia detection. Tables 1 and 2 present the comparison results and the overall efficiency of our work. With a training share of 50%, 20 healthy images and 20 unhealthy images are given as training data, and the system is then evaluated on the other 20 healthy and 20 unhealthy images. In this setting we achieved an accuracy of 80%, which is equal to the accuracy obtained by the Khashman and Abbas (2013) method.
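The evaluation protocol described above can be sketched as a class-balanced split followed by an accuracy computation. This is an assumption-laden illustration: the source does not state how images are assigned to the training and test sets, so the split below simply takes the first images of each class, and the function names and the labels (0 healthy, 1 unhealthy) are hypothetical.

```python
def stratified_split(healthy, unhealthy, train_fraction):
    """Split each class separately so both sets stay class-balanced.

    Returns (train, test) as lists of (item, label) pairs with
    label 0 for healthy and 1 for unhealthy.
    """
    n_h = round(len(healthy) * train_fraction)
    n_u = round(len(unhealthy) * train_fraction)
    train = [(x, 0) for x in healthy[:n_h]] + [(x, 1) for x in unhealthy[:n_u]]
    test = [(x, 0) for x in healthy[n_h:]] + [(x, 1) for x in unhealthy[n_u:]]
    return train, test

def accuracy_percent(true_labels, predicted_labels):
    """Percentage of test items whose predicted label matches the truth."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100 * correct / len(true_labels)
```

For example, with 40 test images an accuracy of 80% corresponds to 32 correctly classified images.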
In Table 2, we compare the overall efficiency of our proposed work with the Khashman and Abbas (2013) method. We achieved an accuracy of 86.66%, exceeding their method by 9.11%.
Khashman and Abbas (2013) used a multi-layer perceptron (MLP) for classification, whereas in our proposed work we use an SVM for classification and propose a new segmentation process for nucleus separation. This reveals the strength of the proposed segmentation method, which provides more efficient nucleus separation and achieves better classification accuracy than the existing algorithm.

Conclusion
In this paper, we proposed a step-by-step approach for the detection of leukemia using image processing. The proposed method provides a best accuracy of 93.33% when 80 images from the ALL-IDB2 database are provided as input. The segmentation method used to separate the nucleus from the leukocyte is simple and effective compared with the existing methods. As future work, in addition to extracting features from the nucleus, extracting features from the cytoplasm should also help to improve the performance of the system. Classification plays an important role in the detection process, so we intend to implement more machine learning techniques to improve the accuracy of the cell classification algorithm.