Region-based image retrieval using region of interest (ROI) according to incremental frame and clustering color image

Content-based image retrieval (CBIR) typically relies on extracting global image features to retrieve images from large databases. Extracting global features, however, causes a semantic gap between the high-level meaning and the low-level visual features of images. In this study, region-based image retrieval (RBIR) using a region of interest (ROI) and an incremental frame of the color image is proposed. It combines several methods, including a filtering process, image partitioning using clustering and incremental frame formation, and the complementation law of set theory to label each region as a region of interest (ROI), non-region of interest (NROI), or equal region (ER). Weighting and significant queries are also incorporated as a query strategy. Extensive experiments were conducted on the Wang database using the CIE Lab color model. Experimental results show that the proposed method is efficient in image retrieval. Its average IPR value is 3.51% better than that of the RGB color model and 22.92% better than that of the HSV color model. It also performs better by 36%, 5%, and 24% compared to the methods CH (8,2,2), CH (8,3,3), and CH (16,4,4), respectively.


Introduction
Today, the rapid growth of the Internet, the increasing use of multimedia, and the availability of advanced electronic devices at affordable prices enable millions of images to be uploaded to and downloaded from social media, which has led to exponential growth of digital image databases. This has prompted researchers to develop efficient image retrieval techniques for finding images of interest. Content-based image retrieval (CBIR) has emerged as an approach to efficiently retrieve relevant images: it retrieves images similar to a given query using visual features.
Although many publications report significant advances in image retrieval using visual features in CBIR, the technique still has limitations. Extracting visual features from the whole image causes a semantic gap between the high-level meaning and the low-level visual features of images. These features insufficiently describe the relevant objects or the particular region that the user is interested in (Bchir et al., 2018). Another limitation is sensitivity to the type of visual features extracted from the images. The relevance of a visual feature depends on the image content; therefore, the more distinctive the visual features are, the more accurate the retrieval result. Efficient image retrieval can be achieved by allowing the user to select the important or desired region, which minimizes the semantic gap.
In this paper, we propose region of interest (ROI) based image retrieval. To obtain the ROI of an image, the proposed method combines clustering, division of the image into non-overlapping parts, and the complementation law of basic set theory. The silhouette index is also used to validate the resulting clusters. The rest of the paper is organized as follows: Section 2 presents related work. Section 3 describes the proposed technique, which builds on the ideas in Section 2, and also covers how the similarity and performance of the ROI-based image retrieval are measured. Results and discussion are presented in Section 4, and Section 5 concludes the paper.

Related works
Region of Interest (ROI) or object-based image retrieval (RBIR/OBIR) emerged as an alternative approach to effectively retrieve relevant regions or objects in images that the user may be interested in. This approach has attracted many researchers to expand their research and build commercial products (Vu et al., 2003;Chan et al., 2008;Moghaddam et al., 2001;Tian et al., 2000;Zhou et al., 2005;Lee et al., 2012;Vimina and Jacob, 2013;Eze et al., 2019;Baji and Mocanu, 2017;Raja et al., 2020). It offers a way to reduce the semantic gap between the visual properties of the query and the high-level understanding of the user. Kam et al. (2000) noted that the performance of an effective CBIR system lies in its ability to access the image at the level of objects, because users typically want to search for images that contain important or particular objects of interest. Therefore, moving from CBIR to RBIR, in which representation, indexing, and querying operate at the object level, is critical (Carson et al., 1999). The authors proposed RBIR based on object extraction through image segmentation. A multiscale segmentation algorithm that automates the segmentation process was applied, and its output is assigned novel color and texture descriptors that are both efficient and effective. To improve the result, the authors also applied query strategies.
Next, Tian et al. (2000) proposed a novel CBIR method that combines a user-defined Region-of-Interest (ROI) with spatial layout. The image is partitioned into n x n non-overlapping blocks, and features such as color and texture are extracted from each block. Layouts of 2 x 2, 3 x 3, 4 x 4, and 5 x 5 are available depending on the complexity of the internal structure of the query image. The spatial layout is then compared to the conventional approach using global features in the relevance feedback process. Next, the user-defined ROI is applied as a refined spatial layout approach. Since automatic segmentation to obtain the ROI is not always reliable, the authors suggested that the image object (ROI) be captured by the user. More accurate relevance feedback is therefore achieved, which leads to a better result.
Prasad et al. (2004) proposed combining image features based on color, shape, and location to perform region matching for image retrieval. Dominant regions within each image are indexed using integrated color, shape, and location features. Various combinations of regions are also indexed. To locate individual regions, images are divided into 3×3 blocks, and the block containing the largest area of the region is designated as its location. A hash structure is used to store the resulting index and related metadata. The retrieval process is non-cascading: images can be retrieved based on color, shape, or location, and also based on a combined color-shape-location index. The results show that retrieval effectiveness increases in non-cascaded region-based querying with the combined index. Kim et al. (2004) proposed a new automatic method to classify an image as an object or non-object image. A set of regions located near the center of the image is defined as an object, because the color distribution near the center of the image (object) is more significant or noticeable than that of the surrounding area or background. Three measures for the classification are based on the characteristics of an object. First, the center significance is calculated from the difference in color distribution between the center area and the surrounding region. Second, the variance of significantly correlated colors in the image plane is computed. Significantly correlated colors are defined as the colors of two adjacent pixels that appear more frequently around the center of an image than in its background. Finally, the object is estimated based on the edge strength at the boundary of the region. A combination of these measures is used to classify the object of the image by training a neural network.
Meanwhile, Zhou et al. (2005) presented a new method for CBIR that combines ROI detection with relevance feedback. The ROI approach describes the image content more accurately than global features, while relevance feedback makes the system adaptable to subjective human perception. Color, texture, and shape features are extracted from the ROI. To illustrate the overall approach, the authors apply color saliency and wavelet feature saliency to determine the ROI. The results show that the proposed method performs better than global feature-based approaches and region-based techniques without feedback. Next, Chan et al. (2008) presented a novel image feature called color variances among adjacent objects (CVAAO). CVAAO is not sensitive to distortion and scale variations of images. For contiguous objects in an image, it can effectively describe the principal colors and texture distribution of the image. Based on CVAAO, the authors proposed two methods: a CVAAO-based image retrieval method for the full image and a CVAAO-based ROI method for a clip or region of the image. The results show that the CVAAO-based ROI image retrieval method performs better in finding the database images that meet user requirements. Wang et al. (2013) proposed an ROI-based image retrieval system. This research considers a user-defined ROI query more effective than an automatically detected ROI query in terms of capturing the user's intention. Two characteristics appear when the user defines the ROI: first, the target region is located at the center of the ROI query; second, the ROI query contains hardly any noisy descriptors that do not belong to the target region. The proposed system integrates these two characteristics with a general bag-of-words image retrieval method and an auxiliary Gaussian weighting (AGW) scheme.
A 2-D Gaussian window function is used to weight each descriptor according to its distance from the center of the ROI query. The score of each database image is computed using the AGW scheme, and an efficient re-ranking algorithm is proposed based on the distribution consistency of the Gaussian weights between the matched descriptors of the ROI query and the candidate image. The results demonstrate that the proposed system obtains satisfactory retrieval results. Yang et al. (2011) proposed a contextual object retrieval method to improve performance. The system exploits information about the visual context of the query object and employs it to compensate for possible uncertainty in the feature-based query object representation. The visual elements surrounding the query object in the query image are known as contextual information. The authors also consider the region of interest (ROI) in two ways: first, as an uncertain observation of the latent search intent, and second, through the saliency map detected for the query image. The search intent scores are then integrated into the contextual object retrieval (COR) model to meet users' true information needs more effectively. Experiments conducted on several datasets demonstrate improvements in object retrieval performance.
Shrivastava and Tyagi (2015) presented a discussion of state-of-the-art techniques for ROI image retrieval. The authors described the problems faced, summarized the solutions provided by various researchers for ROI image retrieval, and then presented a generalized overall framework for ROI image retrieval. The study found that many problems related to ROI image retrieval have not yet been answered satisfactorily. Among them are: first, accurately reflecting the user's intent in query formulation; second, an effective technique for selecting blocks that overlap the ROI; third, a technique for considering the relative locations of multiple ROIs; and finally, reducing the overall computation time for region matching without affecting the accuracy of the system.
An RBIR system based on color, texture, and shape features was proposed by Vimina and Jacob (2013). Local color, texture, and shape features are extracted from selected sub-blocks of images, and the global color is extracted from the whole image. All images are divided into blocks of different sizes (3x3 grid blocks, horizontal and vertical grid blocks, a central block, and the entire image) for feature extraction. Histograms of the quantized HSV color space represent the color features; contrast, energy, correlation, and homogeneity from the Gray Level Co-Occurrence Matrix (GLCM) represent the texture features; and Edge Histogram Descriptor (EHD) features represent the shape features. To find the minimum distance between the sub-blocks of the query and target image, the authors used a modified Integrated Region Matching (IRM) algorithm, and the proposed method showed better performance compared to some previous methods.
A combination of mean shift tracking (MST) and an improved expectation-maximization (EM)-like (IEML) method was used by Chen et al. (2015) to produce a novel ROI image retrieval method. MST seeks the initial location of the target candidate model, while IEML adaptively changes the location and scale of the target candidate model. The aim is to include the relevant region and exclude the irrelevant region as far as possible. As part of this method, images are divided into 4 blocks/regions, and color histogram and spatial features are used. Compared with previous methods, the proposed method can directly find the target candidate model in the candidate image without pre-segmentation, which yields some performance improvement. Eze et al. (2019) emphasized that content authentication and verification of medical images are required when obtaining the region of interest (ROI) in RBIR. Neglecting this can lead to adversarial modification of the stored image, which could have a lethal effect on research, diagnostic outcomes, and the outcome of some forensic investigations. The proposed method combines robust watermarking and fragile steganography with image search features to design a medical RBIR system that incorporates ROI integrity verification. Original ROI features are pre-computed, embedded into archival images, and utilized during retrieval for image integrity checks.
Next, Baji and Mocanu (2017) proposed a new method for RBIR using the region of interest (ROI) of images. The ROI technique is based on segmenting the image into fixed partitions. The method also uses connected components and interesting objects to generate the histogram and statistical texture feature vectors. Color and texture features of the connected components are computed from the histograms of the quantized HSV color space and the Gray Level Co-occurrence Matrix (GLCM), respectively. Histogram intersection is used for the matching process. In the experiments, without any prior knowledge, the proposed method was able to separate interesting objects from uninteresting objects and a complex background. Raja et al. (2020) proposed a new method for RBIR using color features. In this method, the authors used the Sobel and Canny techniques to find the region of interest (ROI), with the HSV color space. For classification, a neural network is used after the data have been categorized with class labels. To compute the similarity difference, techniques such as Manhattan distance, Euclidean distance, Chebyshev distance, Hamming distance, and Jaccard distance are used. The proposed method improves accuracy and precision.
Our review found that using an ROI can reduce the semantic gap between visual properties and the high-level human understanding of images. However, there is still room to use ROI in a more efficient image retrieval system. This is a challenge that researchers need to face in improving the performance of image retrieval systems using ROI techniques. Fig. 1 represents the block diagram for the proposed method.

Proposed method
Fig. 1: The block diagram for the proposed method (Step 1: Pre-processing; Step 2: Forming the region (region of interest, non-region of interest, equal region) using incremental frame and clustering techniques; Step 3: Color feature extraction; Step 4: Similarity measure; Performance evaluation)

Step 1: Pre-processing
This phase involves two processes, namely data acquisition and image filtering. The images are acquired from the Wang database (Yue et al., 2011;Mamat et al., 2016a). This database contains 1000 images divided into 10 categories: African People, Beach, Building, Buses, Dinosaurs, Elephants, Flowers, Horses, Mountains, and Food.
Meanwhile, the filtering process is performed to reduce noise and increase the quality of the image (Singh and Hemachandran, 2012;Hossain and Islam, 2017;Mamat et al., 2015). To this end, we apply the median filter to the images. The median filter is used because it performs better than average filtering at removing impulse noise (Manoharan and Sathappan, 2013;Kannan et al., 2010;Szeliski, 2010;Malviya et al., 2017). The median filter outputs at each voxel (i, j, k) the median of the density values of the input image in the neighborhood of (i, j, k). A voxel is a unit of graphic information that defines a point in three-dimensional space (x, y, and z coordinates). The definition of the median filter is as follows (Toriwaki and Yoshida, 2009):

$$ g(i,j,k) = \operatorname{median}\{\, f(p,q,r) : (p,q,r) \in N((i,j,k)) \,\} $$

where f is the input image, g is the filtered output, and N((i,j,k)) is the neighbourhood of (i, j, k).
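As an illustration of the filtering step, the following is a minimal pure-Python sketch of a median filter on a 2-D grayscale image. It is not the authors' implementation: the function name `median_filter`, the square window of radius 1, and the clipping of the window at image borders are assumptions made for this example. A color image would be filtered one channel at a time.

```python
from statistics import median

def median_filter(image, radius=1):
    """Return a median-filtered copy of a 2-D grayscale image (list of rows)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Neighbourhood N((i, j)): a (2*radius+1)^2 window, clipped at the
            # image borders (an assumption; other border rules are possible).
            window = [
                image[p][q]
                for p in range(max(0, i - radius), min(rows, i + radius + 1))
                for q in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            out[i][j] = median(window)
    return out

# A flat patch with one impulse ("salt") pixel in the middle.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
filtered = median_filter(noisy)
```

In this toy patch the isolated impulse value is replaced by the value of its neighbours, which is the behaviour that motivates choosing the median over the average filter.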

Step 2: Forming the region
Forming the region is an important part of this paper. The output of this step identifies each region as a region of interest (ROI), non-region of interest (NROI), or equal region (ER). Several processes are integrated, namely incremental frames, k-means clustering, cluster validation, and the complementation law of basic set theory.
i) Incremental Frame: The images are partitioned using incremental frames, starting from frame 1 and incrementing to frame 5. Using these frames allows the entire image area to be used. An important object/area that is concentrated in a smaller frame receives a greater weight value. Similarly, if an important object/area occupies a lot of space in a larger frame, it gains more weight. This suits the database used, which generally contains objects at various positions in the image, from simple images (a small number of objects) to more complex images. An example of an incremental frame is shown in Fig. 2. The steps to get this frame are as follows:

1. Get the midpoint pixel of the image.
2. Move from this center point left, right, up, and down to form a rectangular frame (F).
3. If the frame centered on this midpoint covers:
• 20% of the total image area, the frame is marked as 1,
• 40% of the total image area, the frame is marked as 2,
• 60% of the total image area, the frame is marked as 3,
• 80% of the total image area, the frame is marked as 4,
• 100% of the total image area, the frame is marked as 5.
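The frame construction can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the rectangle grows, so the sketch assumes each frame keeps the aspect ratio of the image (side lengths scale with the square root of the area fraction). The function name `incremental_frames` and the half-open bounds convention are hypothetical.

```python
import math

def incremental_frames(height, width, fractions=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Return {frame_number: (top, left, bottom, right)} for centred frames.

    Bounds are half-open: rows top..bottom-1 and columns left..right-1.
    """
    frames = {}
    cy, cx = height / 2, width / 2          # midpoint of the image
    for number, fraction in enumerate(fractions, start=1):
        # Assumption: same aspect ratio as the image, so the area scales
        # with scale**2 and scale = sqrt(area fraction).
        scale = math.sqrt(fraction)
        half_h, half_w = height * scale / 2, width * scale / 2
        top, bottom = round(cy - half_h), round(cy + half_h)
        left, right = round(cx - half_w), round(cx + half_w)
        frames[number] = (top, left, bottom, right)
    return frames

# Example with a 256 x 384 image (an illustrative size).
frames = incremental_frames(256, 384)
areas = {
    n: (bottom - top) * (right - left) / (256 * 384)
    for n, (top, left, bottom, right) in frames.items()
}
```

Each frame contains the previous one, so frame 5 covers the whole image and frame 1 covers only the central fifth of its area.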

Fig. 2: An example of the incremental frame for image partitioning

ii) K-Means Clustering: To obtain clusters of regions, k-means clustering is used, because it is one of the most widely used clustering methods owing to its ease of implementation, simplicity, efficiency, and empirical success (Mamat et al., 2015;Zambre and Patil, 2013;Rejito et al., 2017). Fig. 3 shows an example of clusters produced by k-means. In this example k=3.
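A minimal pure-Python sketch of k-means on pixel color vectors is given below for illustration. The deterministic initialisation (evenly spaced points from the input list), the fixed iteration count, and the function names are assumptions made for this example and are not taken from the source.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Cluster points (tuples) into k groups; returns (labels, centroids)."""
    # Assumption: deterministic initialisation from evenly spaced input points.
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centroids

# Six toy pixels: three reddish, three bluish (illustrative values only).
pixels = [(250, 10, 10), (240, 20, 15), (245, 5, 20),
          (10, 10, 240), (20, 15, 250), (5, 25, 245)]
labels, centroids = kmeans(pixels, k=2)
```

The same routine works for any k; the toy data here uses k=2 only to keep the example small.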

Fig. 3: An example of k-means clustering
iii) Cluster Validation-Silhouette Index: The aim of cluster validation is to evaluate the goodness of the clusters produced (Maulik and Bandyopadhyay, 2002), and it is one of the major concerns, essential to the success of clustering applications (Jain and Dubes, 1988). The Silhouette Index (SI) is a well-known technique for cluster validation (Chaimontree et al., 2010). Moreover, according to Arbelaitz et al. (2013), this index is one of the best-performing measures. It is capable of pointing out which objects are placed well within their cluster and which are merely somewhere in between clusters. In this research, only k=2 is used, to take advantage of the goodness of the clusters (Mamat et al., 2018).

iv) Complementation Law-Set Theory: The complementation law of set theory is defined as follows. If U is a universal set and A is any subset of U, then the complement of A is the set of all members of U that are not elements of A (Talib, 2006):

$$ A' = \{\, x \in U : x \notin A \,\} $$

An adaptation of this law is used to form the ROI or its counterpart. In this case, U=Frame (F); because k=2, only two clusters are in F, and x denotes the pixels that form either cluster 1 or cluster 2.
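Items iii) and iv) can be illustrated together with a short sketch. The silhouette function below follows the standard definition s = (b - a) / max(a, b), averaged over all points, and the complement is a plain set difference. The function names, the toy data, and the convention that a singleton cluster scores 0 are assumptions; the source does not give its own formulas or the rule that decides which cluster becomes the ROI, so none is shown.

```python
def silhouette_index(points, labels):
    """Mean silhouette width over all points (requires at least 2 clusters)."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    clusters = sorted(set(labels))
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:                      # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)         # mean distance within own cluster
        b = min(                          # mean distance to nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def complement(universe, subset):
    """A' = all members of the universal set that are not in the subset."""
    return universe - subset

# Two well-separated 1-D clusters give a silhouette index close to 1.
si = silhouette_index([(0,), (1,), (10,), (11,)], [0, 0, 1, 1])

# U = pixel coordinates of a 2 x 3 frame; with k=2, cluster 2 is the complement.
frame_pixels = {(r, c) for r in range(2) for c in range(3)}
cluster_1 = {(0, 0), (0, 1), (1, 0)}
cluster_2 = complement(frame_pixels, cluster_1)
```

With only two clusters in a frame, knowing one cluster determines the other, which is why the complementation law is sufficient to separate a frame into two regions.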

Step 3: Colour features extraction
Color is one of the most extensively used types of visual content for image retrieval, because it is easier to extract than shape and texture. In addition, the color feature is relatively robust to background complication and independent of image size and orientation, which has attracted many researchers to use it (Mamat et al., 2016a;Singh and Hemachandran, 2012;Hossain and Islam, 2017;Mamat et al., 2015). Taking advantage of these properties, color features are used in this study. They include the color moment (CM) and manipulations of the color histogram, consisting of the color histogram (H), normalized color histogram (NH), and bin of color histogram (BH). The equations below are used to calculate the color features:

i) Color Moment: The first three moments, namely the mean, variance, and standard deviation, have been shown to be efficient and effective in representing the color distributions of images (Mamat et al., 2015;Chaudhuri, 2007). The equations below are used to compute the mean, variance, and standard deviation of an image of size N x M:

$$ \mu = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} p_{ij} $$

$$ \sigma^{2} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( p_{ij} - \mu \right)^{2} $$

$$ \sigma = \sqrt{\sigma^{2}} $$

where $p_{ij}$ is the value of the pixel in row i and column j.

ii) Color Histogram: h(i) is the number of pixels in I (separated into individual color channels) with the intensity value i, for all 0≤i<K. More formally:

$$ h(i) = \operatorname{card}\{\, (u,v) : I(u,v) = i \,\} $$

For the bin of color histogram:

$$ h(j) = \operatorname{card}\{\, (u,v) : a_{j} \le I(u,v) < a_{j+1} \,\} $$

for 0≤j<B. The range of K possible values is divided into B bins of equal size (bin width) $k_{B} = K/B$, such that the starting value of bin j is:

$$ a_{j} = j \cdot \frac{K}{B} = j \cdot k_{B} $$
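The color features described in this step can be sketched in pure Python as follows. This is an illustrative sketch only: it assumes the population form of the variance (division by N x M), integer intensities in the range 0..K-1 with K=256, and normalization of the histogram by the total pixel count. The function names are hypothetical.

```python
def color_moments(channel):
    """Mean, variance and standard deviation of one colour channel (list of rows)."""
    values = [v for row in channel for v in row]
    n = len(values)                                   # n = N * M pixels
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n   # population variance (assumption)
    return mean, variance, variance ** 0.5

def histogram(channel, levels=256):
    """h[i] = number of pixels with intensity i, for 0 <= i < levels."""
    h = [0] * levels
    for row in channel:
        for v in row:
            h[v] += 1
    return h

def normalized_histogram(h):
    """Divide each count by the total number of pixels (assumed normalization)."""
    total = sum(h)
    return [count / total for count in h]

def binned_histogram(h, bins):
    """Merge K levels into B equal-width bins; bin j starts at level j * K / B."""
    levels = len(h)
    binned = [0] * bins
    for i, count in enumerate(h):
        binned[i * bins // levels] += count
    return binned

# A tiny 2 x 3 single-channel image with illustrative values.
channel = [[0, 0, 64], [128, 255, 255]]
mean, variance, std = color_moments(channel)
h = histogram(channel)
nh = normalized_histogram(h)
bh = binned_histogram(h, bins=4)
```

For a color image in a model such as CIE Lab, these functions would be applied to each channel separately and the results concatenated into one feature vector.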

Step 4: Similarity measure
The choice of method for determining the similarity or distance between images depends on the CBIR system. Similarity captures the strength of the relationship between features when images in a database are compared. Many techniques exist to compute distance, such as city block (Manhattan) distance, Euclidean distance (ED), and many others. Because it is widely used and easy to apply, ED is used for the RBIR system in this study (Zhang and Lu, 2002).
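As a small illustration of this step, the sketch below computes the Euclidean distance between feature vectors and ranks database images by their distance to a query. The function names and the toy feature vectors are hypothetical and serve only to show the ranking idea.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(query, database):
    """Return database keys sorted from most similar (smallest distance) to least."""
    return sorted(database, key=lambda name: euclidean_distance(query, database[name]))

# Toy feature vectors; real vectors would hold colour moments and histogram bins.
database = {"a": (3, 4), "b": (1, 1), "c": (6, 8)}
ranking = rank_by_distance((0, 0), database)
```

A smaller distance means a more similar image, so the first entries of the ranking are the ones returned to the user.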

Step 5: Performance evaluation
ROI-based image retrieval aims to search image databases for specific images that are similar to a given query image. Many RBIR systems are available, but it is difficult to determine which is best because RBIR systems are hard to compare quantitatively and objectively. One reason is the absence of a standard database on which to determine a set of quantitative performance measures. Two commonly used performance measures are precision and recall.
i) Precision and Recall: The performance of the ROI system is evaluated and analyzed through Precision and Recall (Liu et al., 2007;Mamat et al., 2016b). Precision (P) is defined as the ratio of the number of relevant images retrieved ($N_{r}$) to the total number of images retrieved, K, whilst Recall (R) is defined as the number of relevant images retrieved, $N_{r}$, over the total number of relevant images available in the database, $N_{t}$. Precision and Recall are calculated using Eqs. 11 and 12:

$$ P = \frac{N_{r}}{K} \qquad (11) $$

$$ R = \frac{N_{r}}{N_{t}} \qquad (12) $$
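A worked example may help here. The sketch below computes precision and recall for one query from a list of retrieved image identifiers and the set of relevant identifiers; the function name and the identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: list of the K image identifiers returned by the system.
    relevant:  set of all relevant identifiers in the database.
    """
    hits = sum(1 for item in retrieved if item in relevant)  # relevant images retrieved
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return precision, recall

# 4 images returned, 3 of them relevant, out of 5 relevant images in total.
p, r = precision_recall(["a", "b", "x", "c"], {"a", "b", "c", "d", "e"})
```

Here three of the four returned images are relevant, so precision is 3/4, and three of the five relevant images were found, so recall is 3/5.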
ii) Interpolated Precision and Recall (IPR): Interpolated Precision and Recall is related to precision and recall. One of the goals of IPR is to produce a smoother precision and recall graph. The precision and recall graph has a distinctive sawtooth shape: for example, if the (K+1)th image retrieved is a non-relevant image, the value of recall is the same as for the top K images, but the value of precision drops.
To overcome this problem, interpolated precision is used (Manning et al., 2009;Keilwagen et al., 2014). The IPR can be expressed mathematically as Eq. 13 below, and calculating the value of R gives the same value as IR. Fig. 4 shows the precision and recall graph before and after IPR.

$$ IP(R) = \max_{R' \ge R} P(R') \qquad (13) $$

iii) Significant Query (SQ) using Average precision: One strategy to improve the performance of ROI-based systems is the use of significant queries (SQ). 10% of the images (10 queries) per category are used as queries. The steps to get an SQ are as follows:
a) Obtain the threshold, $\vartheta_{c,t}$, for the interpolated precision (IP) of a particular image query (e.g., 10 return images for the first 10 queries) for each category. In this study, $\vartheta_{c,t}$ is the average IP, where c is the category of image and t is a variable depending on the IP number of the query. $\vartheta_{c,t}$ is calculated using Eq. 14.
b) Compare the IP of the query (Q) with $\vartheta_{c,t}$: if the IP of the query is greater than or equal to $\vartheta_{c,t}$, then Q=SQ, and vice versa.

iv) Experimental setup: The database, developed by Wang and his colleagues at the Pennsylvania State University (Li and Wang, 2003) and available at http://wang.ist.psu/edu, is used for this research.
• Experiment 1: The aim of this experiment is to compute Precision and Recall (PR), followed by IPR. 10% of the images are used as queries, and each query consists of 5 frames (F), namely F1 (20%), F2 (40%), F3 (60%), F4 (80%), and F5 (100%). Therefore, each image produces 5 queries (1 image x 5 frames), and each category, which consists of 10 query images, produces 50 queries (10 images x 5 frames). Next, the Average Interpolated Precision values are obtained for all queries per frame for each category.
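The interpolation and significant-query ideas from items ii) and iii) can be sketched as follows. The interpolation uses the common definition in which the interpolated precision at a recall level is the highest precision found at that or any higher recall level. The threshold is assumed here to be the plain mean of the interpolated precision values of the queries in one category; this is an assumption for illustration and is not a reconstruction of Eq. 14. All names and numbers are hypothetical.

```python
def interpolated_precision(precisions):
    """Smooth a precision list ordered by increasing recall.

    Each value becomes the maximum precision at this or any higher recall level.
    """
    interpolated = []
    best = 0.0
    for p in reversed(precisions):      # sweep from the highest recall downwards
        best = max(best, p)
        interpolated.append(best)
    return interpolated[::-1]

def significant_queries(query_ips):
    """Flag queries whose IP is at or above the category threshold.

    Assumption: the threshold is the mean IP over the queries of the category.
    """
    threshold = sum(query_ips) / len(query_ips)
    return threshold, [ip >= threshold for ip in query_ips]

# Sawtooth precision values at five increasing recall levels.
smoothed = interpolated_precision([1.0, 0.5, 0.67, 0.5, 0.6])

# Interpolated precision of three queries from one category.
threshold, flags = significant_queries([0.75, 0.5, 0.25])
```

After interpolation the precision values never increase as recall grows, which removes the sawtooth shape described above.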

Results and discussion
This section discusses the experimental results in two parts, as described below.

Comparison between frames for each category
This comparison aims to find out which frame performs best for each category, and the result is shown in Table 1. The result is based on the Average IPR of 20 returned images (any number of returned images could be used; 20 is selected because it is used for comparison with other researchers). In general, one reason for this result is that the homogeneous objects/regions are scattered unevenly within the frame. For example, for category 5 (Dinosaur), the best performing frame is F2, meaning that the region of interest lies in the middle and the surrounding area; the opposite applies to categories 9 and 10.

Comparison with other research
This section compares the findings with those of other researchers (Ayan et al., 2016). The comparisons are shown in Table 2 and Table 3. In Table 2, comparisons are made with the best performance in the RGB and HSV color models. Against the RGB color model, the proposed method performs better for Categories 1, 3, 4, 7, 9, and 10, while against the HSV color model, it performs better for Categories 1, 2, 3, 4, 7, 9, and 10. The average IPR value of the proposed method is 3.51% better than that of the RGB color model and 22.92% better than that of the HSV color model.

Conclusion
In this paper, an ROI-based incremental frame method for image retrieval is presented to enhance retrieval performance. Several techniques, including a filtering process, image partitioning using clustering and incremental frame formation, and the complementation law of set theory, are combined to construct the proposed method. The results show that the best-performing frame is not fixed for this database. In general, the ROI position in the image is scattered. Some ROIs are located in the center of the image (e.g., Dinosaur-F5) and some are dispersed throughout the image (e.g., Mountain-F9).
The use of ROI in image retrieval is effective for ROIs dispersed over the whole image (examples: F2-Beach, F7-Flower, F9-Mountain, F10-Food) and vice versa (example: F5-Dinosaurs). However, other factors also influence the result, for example, the nature of the objects in the images, the complexity of the images (such as single or multiple objects and the variety of color), and limitations of the proposed method. Compared to other researchers, the proposed method performs better by 36%, 5%, and 24% than the methods CH (8, 2, 2), CH (8, 3, 3), and CH (16, 4, 4), respectively. In addition, we plan to use other databases that contain fewer object and shape features to evaluate the proposed method.