An optimized DCT compressor based on Cordic-Loeffler approach for wireless endoscopic capsule

Advanced endoscopic imaging needs to store large quantities of digitized clinical data. To save wireless transmission power and bandwidth, an efficient image compression algorithm must be implemented inside the endoscopic capsule. In this paper, an optimized DCT compressor based on the coordinate rotation digital computer (Cordic) Loeffler approach is presented, specially designed for wireless capsule endoscopy. It is obtained by optimizing the Cordic-based Loeffler DCT. As a result, the computational complexity is reduced from 38 additions and 16 shift operations to 30 additions and 16 shift operations. Moreover, to further improve the results, we use a modified carry look-ahead adder and a carry save adder, which are characterized by low power and high speed compared to the classical carry look-ahead adder. The proposed design is implemented on a field programmable gate array and simulated using Matlab. The simulation is performed on several endoscopic images. The obtained results show that image quality is preserved, with an average peak signal-to-noise ratio (PSNR) of 38.99 dB. Compared with contemporary VLSI architectures, our approach offers lower power consumption, high image quality and a lower number of arithmetic operations. The suggested DCT architecture is therefore well suited to low-power, high-quality codecs.


Introduction
Thanks to the enormous progress in microelectronics, the wireless endoscopic capsule (WEC) has recently been invented. This capsule allows the whole gastrointestinal (GI) tract, including the small intestine, to be examined. The first such capsule was invented by Given Imaging Ltd. (Iddan et al., 2000) at the end of the 20th century. It is equipped with a CMOS sensor, lighting, a data processing module and a transmission unit, as shown in Fig. 1. After being swallowed by a patient, the capsule passes through the GI tract owing to peristaltic intestinal movements and takes images that are wirelessly transmitted to a recorder carried by the patient. The capsule uses a tiny wireless camera to take images of the digestive tract and produces about 50,000-60,000 digital images for the doctor's review (Xie et al., 2006). The endoscopic capsule needs to be small enough to be swallowed easily and to pass through the human GI tract. Generally, it takes 24 hours to move from mouth to evacuation. The images are transmitted by a wireless radiofrequency transmitter to the workstation, where they are stored (Xie et al., 2006). The transmission of the image data consumes about 90% of the total power in the battery of the endoscopic capsule (Xie et al., 2006). The data should therefore be compressed first, to reduce both the power spent on image data transmission and the communication bandwidth.
In this paper, we propose an efficient hardware architecture of a 1D Discrete Cosine Transform (DCT) based on the Cordic-Loeffler compression algorithm for the WEC. It is optimized by taking advantage of certain properties of the novel Cordic-based unified architecture for the DCT and Inverse DCT (IDCT) (Xiao and Huang, 2012). Unlike the architectures existing in the literature, which need 38 additions and 16 shift operations, ours requires only 30 additions and 16 shift operations. Our contribution is that the resulting Cordic-Loeffler DCT architecture not only reduces the computational complexity and power consumption significantly, but also retains the good transformation quality of the previous Cordic-Loeffler DCT. Therefore, the presented Cordic-based Loeffler DCT implementation is especially suited for low-power and high-quality codecs.
Fig. 1: Block diagram of a typical endoscopic system (Wahid et al., 2008)
This paper is structured as follows. Section 2 presents the related works. Section 3 introduces the Cordic-Loeffler-based DCT algorithm. Section 4 explains the proposed Cordic-Loeffler DCT architecture. The experimental results are shown in Section 5. Section 6 concludes the paper.

Related works
Recently, considerable research has been devoted to designing an efficient image compression algorithm for the WEC. In this field, the most widely used compressor is based on the DCT. Wahid et al. (2008) proposed an efficient hardware implementation of an image compressor using a direct mapping to compute the 2D-DCT for the WEC. In addition, in Dung (2011) and Cheng et al. (2010), a DCT was adopted to implement the compression algorithm. Furthermore, diverse techniques have been used to implement an efficient DCT compression algorithm suitable for the WEC. In Loeffler et al. (1989) and Rao and Yip (2014), the authors put forward a lower-complexity fast DCT algorithm based on a flow graph, which needs only 11 multiplications and 29 additions. However, a common drawback of these DCT algorithms is that they use floating-point multiplications. Such operations need a large area and much power in hardware and are slow in software. To overcome this problem, the authors in Zelinski et al. (2004) suggested replacing each multiplication by a constant with addition and shift operations, which have a lower cost in terms of area and power consumption.
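The idea of replacing a constant multiplication with shifts and additions can be illustrated with a minimal sketch. It is not the scheme of Zelinski et al. (2004): the fixed-point format (`FRAC_BITS`), the chosen constant cos(π/4) and its five-term power-of-two expansion, and all function names are assumptions made only for illustration.

```python
FRAC_BITS = 8  # assumed number of fractional bits of the fixed-point format

# Illustrative constant: cos(pi/4) ~= 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 0.70703125
# Each entry is (sign, right-shift amount).
COS_PI_4_TERMS = [(+1, 1), (+1, 3), (+1, 4), (+1, 6), (+1, 8)]


def to_fixed(value):
    """Convert a real value to an integer with FRAC_BITS fractional bits."""
    return int(round(value * (1 << FRAC_BITS)))


def to_float(fixed):
    """Convert a fixed-point integer back to a real value."""
    return fixed / (1 << FRAC_BITS)


def shift_add_multiply(x_fixed, terms):
    """Multiply x_fixed by sum(sign * 2**-shift) using only shifts and additions."""
    acc = 0
    for sign, shift in terms:
        acc += sign * (x_fixed >> shift)  # a hard-wired shift in hardware
    return acc


product = to_float(shift_add_multiply(to_fixed(100), COS_PI_4_TERMS))
```

With these assumed terms, five shifted copies of the input are summed with four additions, and the result approximates 100·cos(π/4) to within the accuracy of the truncated expansion.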
On the other hand, in Parfieniuk (2008), Yu and Swartzlander (2002) and Heyne et al. (2006), the authors proposed to use the coordinate rotation digital computer (Cordic) in order to avoid multipliers. Other studies, such as Sun et al. (2007), have concluded that a large number of additions strongly affects the performance of a DCT implementation. Thus, in Sun et al. (2006), the authors suggested a low-power, high-quality Cordic-based Loeffler DCT architecture. Combining the Cordic and Loeffler algorithms reduces the computational complexity: the 11 multiplications and 29 additions of the Loeffler DCT are replaced by 38 additions and 16 shift operations. Besides, the authors in Thoné et al. (2010) put forward an efficient optimized compression algorithm based on the Haar wavelet for the WEC. Finally, a near-lossless image compression algorithm based on the Bayer image format and suitable for hardware design was presented in Xie et al. (2005).

Cordic-Loeffler DCT algorithm
A few years ago, an optimized Cordic-Loeffler DCT implementation was proposed that requires 38 additions and 16 shift operations, as illustrated in Fig. 2. To perform the DCT without any multiplier, the Cordic and Loeffler algorithms are combined, which makes the DCT very simple to implement. The original Loeffler DCT is taken as a starting point, and its rotations are replaced by the circular rotation of the Cordic algorithm. To rotate the vector (x, y) by an angle θ, the circular rotation angle is decomposed as follows (Sun et al., 2007):
$$\theta = \sum_{i} \sigma_i \arctan\left(2^{-i}\right), \qquad \sigma_i \in \{-1, +1\} \qquad (1)$$
where i is the rotation iteration and σ_i is the vector rotation direction.
Then, the rotation of the vector (x, y) can be achieved using the iterative equations given below (Eq. 2):
$$x_{i+1} = x_i - \sigma_i \, 2^{-i} \, y_i, \qquad y_{i+1} = y_i + \sigma_i \, 2^{-i} \, x_i \qquad (2)$$
Moreover, the results of the rotation iterations need to be scaled by the compensation factor s (Eq. 3), where the product runs over the iterations actually performed:
$$s = \prod_{i} \frac{1}{\sqrt{1 + 2^{-2i}}} \qquad (3)$$
Because the rotation angles θ of the 8-point DCT are fixed, some unnecessary Cordic iterations can be skipped without losing accuracy when the Cordic algorithm replaces the multiplications. Table 1 summarizes the Cordic iterations of the Cordic-Loeffler DCT.
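The circular Cordic rotation described in this section can be sketched in a few lines. This is a generic textbook version, not the Cordic-Loeffler design itself: it runs every iteration and picks the rotation direction greedily, whereas the DCT uses fixed angles with the reduced iteration sets of Table 1 (not reproduced here). The function name, the iteration count and the use of floating point to model the shifts are assumptions for illustration.

```python
import math


def cordic_rotate(x, y, theta, n_iter=16):
    """Rotate (x, y) by theta radians using shift-and-add micro-rotations.

    Valid for |theta| below about 1.74 rad, the sum of all arctan(2**-i).
    Multiplying by 2.0**-i models a right shift by i bits in hardware.
    """
    residual = theta  # angle that still has to be rotated
    gain = 1.0        # accumulated growth of the vector length
    for i in range(n_iter):
        sigma = 1 if residual >= 0 else -1       # rotation direction
        t = sigma * 2.0 ** -i
        x, y = x - t * y, y + t * x              # one micro-rotation
        residual -= sigma * math.atan(2.0 ** -i)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Compensation: undo the length growth of the micro-rotations.
    return x / gain, y / gain


# Example: rotating the unit vector by 3*pi/16, one of the DCT angles.
xr, yr = cordic_rotate(1.0, 0.0, 3 * math.pi / 16)
```

After the compensation step, `xr` and `yr` approximate cos(3π/16) and sin(3π/16). In a fixed-angle design, the directions and the compensation factor would be precomputed constants rather than evaluated at run time.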

Optimized Cordic-Loeffler DCT algorithm
Building on the previous work on the Cordic-Loeffler DCT (Sun et al., 2007), we propose an optimized Cordic-Loeffler DCT algorithm. The conventional unfolded Cordic for the 3π/16 angle (Sun et al., 2007), shown in Fig. 3, needs 8 shifts and 8 additions to evaluate its rotation. To evaluate the rotation more efficiently, the iteration indices i and j should be large enough. As can be seen, when Eq. 4 and Eq. 5 are used, two iterations are replaced by one (Xiao and Huang, 2012; Huang and Xiao, 2013).
Thus, based on these two equations, the conventional unfolded Cordic is modified so that it requires fewer shift and addition operations. The modified unfolded Cordic flow graph of the 3π/16 angle is shown in Fig. 4. We use this principle for the two other angles, π/16 and 3π/8 (Fig. 5 and Fig. 6). Compared to the conventional unfolded Cordic for π/16 and 3π/8, the number of additions is reduced from 4 to 2 and from 6 to 4, respectively. Consequently, the total number of additions in the Cordic-Loeffler DCT algorithm is reduced from 38 to 30.
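The replacement of two iterations by one can be illustrated with the generic algebra of composing two Cordic micro-rotations. Whether this matches the exact form of Eq. 4 and Eq. 5 in Xiao and Huang (2012) is an assumption; the function names and the optional `drop_cross_term` approximation are hypothetical, and the sketch does not attempt to reproduce the operation counts quoted above.

```python
def micro_rotation(x, y, sigma, i):
    """One conventional Cordic micro-rotation with direction sigma and index i."""
    t = sigma * 2.0 ** -i
    return x - t * y, y + t * x


def merged_rotation(x, y, sigma_i, i, sigma_j, j, drop_cross_term=False):
    """Apply iterations i and j in a single step.

    With a = sigma_i * 2**-i and b = sigma_j * 2**-j, the product of the two
    micro-rotation matrices is [[1 - a*b, -(a + b)], [a + b, 1 - a*b]].
    drop_cross_term=True neglects a*b, whose magnitude is 2**-(i + j) and
    therefore small when i and j are large (assumed approximation).
    """
    a = sigma_i * 2.0 ** -i
    b = sigma_j * 2.0 ** -j
    diag = 1.0 if drop_cross_term else 1.0 - a * b
    return diag * x - (a + b) * y, diag * y + (a + b) * x


# Two sequential micro-rotations versus the single merged step.
step1 = micro_rotation(3.0, 5.0, +1, 3)
sequential = micro_rotation(step1[0], step1[1], -1, 4)
merged = merged_rotation(3.0, 5.0, +1, 3, -1, 4)
approx = merged_rotation(3.0, 5.0, +1, 3, -1, 4, drop_cross_term=True)
```

Here `sequential` and `merged` coincide, while `approx` differs from them by at most 2^-(i+j) times the input magnitude, which suggests why larger indices make the merged form more accurate.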

DCT Cordic-Loeffler architecture
Based on the proposed modified algorithm, an 8-point Cordic-Loeffler DCT architecture is presented. It is adopted to improve performance and reduce hardware complexity. The suggested architecture is represented in Fig. 7. It consists of adders, subtractors and the modified Cordic blocks. To further improve the efficiency of the architecture, it is important to speed up the adders. Hence, a Modified Carry Look-Ahead (MCLA) adder is used, thanks to its high speed and low cost (Pai and Chen, 2004). The MCLA adder is similar to the Carry Look-Ahead (CLA) adder in basic construction: it contains an arithmetic adder circuit and a carry look-ahead circuit. To make it faster, Pai and Chen (2004) proposed to replace the AND and NOT gates of the CLA adder with NAND gates, in order to decrease the cost and increase the speed of the adder.
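The carry look-ahead principle can be sketched at bit level as follows. This is a behavioural model of a classical CLA with the carry expression written using NAND only, as a loose illustration of the gate substitution mentioned above; it is not the MCLA circuit of Pai and Chen (2004). The function names and the 8-bit default width are assumptions.

```python
def nand(u, v):
    """Two-input NAND on single bits."""
    return 1 - (u & v)


def carry_lookahead_add(a, b, width=8, carry_in=0):
    """Add two unsigned integers from generate/propagate signals.

    Returns (sum modulo 2**width, carry_out). In hardware every carry is
    flattened into two-level logic of g, p and carry_in; the loop below
    evaluates the same recurrence serially for readability.
    """
    g = [(a >> k) & (b >> k) & 1 for k in range(width)]    # generate bits
    p = [((a >> k) ^ (b >> k)) & 1 for k in range(width)]  # propagate bits
    carries = [carry_in]
    for k in range(width):
        # c[k+1] = g[k] OR (p[k] AND c[k]), expressed with NAND gates only.
        carries.append(nand(nand(g[k], g[k]), nand(p[k], carries[k])))
    total = 0
    for k in range(width):
        total |= (p[k] ^ carries[k]) << k                  # sum bit k
    return total, carries[width]


total, carry_out = carry_lookahead_add(200, 100)
```

For the 8-bit example, `total + (carry_out << 8)` equals 300, so the model agrees with ordinary integer addition.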
Adder speed is equally important in the Cordic blocks. Following the flow graph of Fig. 5, the modified unfolded Cordic is implemented with a Carry Save Adder (CSA) and hard-wired shifters, as presented in Fig. 8.
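A carry save adder reduces three operands to two without propagating carries, which is why it suits the sums of shifted terms that arise in a Cordic stage. The sketch below models this on non-negative Python integers; it is not the circuit of Fig. 8, and the function names and example operands are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compression: return (partial_sum, carry) with a + b + c == partial_sum + carry."""
    partial_sum = a ^ b ^ c                      # bitwise sum without carries
    carry = ((a & b) | (b & c) | (a & c)) << 1   # majority bits, moved one position up
    return partial_sum, carry


def add_three(a, b, c):
    """Sum three operands with one carry-save stage and one carry-propagating addition."""
    partial_sum, carry = carry_save_add(a, b, c)
    return partial_sum + carry


# Example: a value plus two hard-wired shifted copies, as in a shift-add stage.
value = 52
result = add_three(value, value >> 1, value >> 3)
```

Only the final addition in `add_three` needs a carry-propagating adder, so a chain of several operands pays the carry delay once rather than at every stage.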

Experimental results and comparisons
The proposed architecture is described in VHDL (VHSIC Hardware Description Language) and synthesized with Xilinx ISE 13.1 using a VIRTEX5 FPGA as the target device. The synthesis results show that the architecture occupies 613 slices out of 93,120 and 394 LUTs out of 46,560, and operates at about 226.2 MHz. In addition, it consumes about 0.037 W of power.
The suggested modified Cordic-Loeffler DCT algorithm has a low computational complexity compared to other algorithms, as demonstrated in Fig. 9. It is clear from Fig. 9 that the proposed design requires fewer operations than the other DCT algorithms. It needs only 30 additions and 16 shifts, whereas the previous Cordic-based Loeffler DCT requires 38 additions and 16 shifts and the bin-DCT algorithm requires 36 additions and 17 shifts. Therefore, the suggested design is more efficient and has a lower hardware complexity than the original Cordic-Loeffler DCT and the bin-DCT. It is suitable for low-power and high-quality codecs, especially in battery-powered systems.
Furthermore, to verify the performance of the proposed Cordic-Loeffler DCT algorithm, we simulate it in Matlab. We use grayscale endoscopic images of different parts of the gastrointestinal tract as test images, as shown in Fig. 11, which presents the original and reconstructed images. To measure the performance of the algorithm, we use the following parameters: the mean squared error (MSE), the structural similarity index (SSIM), the peak signal-to-noise ratio (PSNR) and the compression ratio (CR). The simulation results presented in Table 2 show that the proposed algorithm performs well. Table 2 reports the SSIM and MSE values for 8 endoscopic images. All SSIM values are close to 1 and the MSE values are low, which indicates good visual quality of the reconstructed images.
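For reference, the MSE and PSNR used in such evaluations can be computed as in the following sketch, which uses the standard definitions for 8-bit grayscale images. The simulations reported here were run in Matlab; the Python function names, the list-of-rows image representation and the tiny example images are assumptions for illustration, and the SSIM is omitted because it needs windowed statistics.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    squared_errors = [
        (o - r) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for o, r in zip(row_o, row_r)
    ]
    return sum(squared_errors) / len(squared_errors)


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; peak is 255 for 8-bit pixels."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)


# Hypothetical 2x2 example, not taken from the test images of this paper.
original = [[10, 20], [30, 40]]
reconstructed = [[12, 20], [30, 36]]
example_mse = mse(original, reconstructed)
example_psnr = psnr(original, reconstructed)
```

A lower MSE gives a higher PSNR, which is why the two measures move in opposite directions in the reported results.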
Moreover, the qualitative results are given in Fig. 10. These results show that the proposed algorithm performs well, with maximum PSNR and CR values of 42.01 dB and 72.60%, respectively. Furthermore, the average PSNR is 38.99 dB, which is about 6 dB higher than the values obtained in Wahid et al. (2008) and Lin et al. (2006), as provided in Table 3. In all cases, the PSNR of the reconstructed images is well above 30 dB, which is highly acceptable for medical diagnosis. This implies that the suggested DCT architecture not only reduces the computational complexity, area and power consumption, but also improves the quality of the results in terms of PSNR.

Conclusion
Given the small size and limited power budget of the wireless endoscopic capsule, the image compressor must compress the captured images enough to save transmission power while occupying a small physical area. In this paper, an optimized Cordic-Loeffler DCT architecture for image compression is presented, and we demonstrate that the number of arithmetic operations can be reduced without losing image quality. The proposed architecture needs only 30 additions and 16 shifts to carry out the DCT, which is a lower complexity than that of previous works. In addition, to improve the efficiency of the proposed architecture, the MCLA adder and the CSA are used in its implementation. The obtained results show that the proposed work achieves the highest PSNR among the compared works. It therefore not only reduces the computational complexity, but also reduces the area and power consumption compared to the conventional Cordic-Loeffler DCT algorithm, while keeping a high-quality output image.