An empirical study of extracting embedded text from digital images

doi:10.21833/ijaas.2023.06.006

IJAAS: International Journal of Advanced and Applied Sciences
EISSN: 2313-3724, Print ISSN: 2313-626X, Frequency: 12

Volume 10, Issue 6 (June 2023), Pages: 48-53

Original Research Paper

Author(s): Emad Shafie *

Affiliation(s): Department of Computer and Applied Science, Applied College, Umm Al-Qura University, Mecca, Saudi Arabia

* Corresponding author. ORCID profile: https://orcid.org/0000-0003-2041-6380

Digital Object Identifier: https://doi.org/10.21833/ijaas.2023.06.006

Abstract: Images are widely used to transfer information in a way that circumvents simple detection functions, which focus primarily on analyzing textual content rather than examining files thoroughly. This study investigates the efficacy of deep learning models in detecting information embedded within digital images. The data used for analysis were acquired from a secondary source and underwent comprehensive preprocessing. Feature extraction, sequence labeling, and predictive model training were performed using CRNN, CNN, and RNN models. Two models were trained and tested for text detection: 1) CNN and RNN-LSTM with the Adam optimizer, and 2) CNN and RNN-GRU with the RAdam optimizer. Model #1 achieved the highest F1-score during testing, with 98.37% for text detection and 96.73% for word detection. Model #2 obtained F1-scores of 94.84% and 93.05% for text and word detection, respectively. Model #1 exhibited a word detection accuracy of 98.38% and a text detection accuracy of 96.47%. These findings indicate that the first model outperformed the second, suggesting that combining RNN-LSTM with the Adam optimizer had a positive impact. Therefore, utilizing deep learning tools and emerging technologies is crucial for extracting textual information and analyzing visual data.
In summary, this study concludes that deep learning models can be relied upon to effectively detect textual information embedded within digital images.

© 2023 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Convolutional neural networks, Deep learning, Long short-term memory, Digital images, Text detection, Embedded information

Article History: Received 13 December 2022, Received in revised form 2 April 2023, Accepted 6 April 2023

Acknowledgment: No Acknowledgment.

Compliance with ethical standards

Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Citation: Shafie E (2023). An empirical study of extracting embedded text from digital images. International Journal of Advanced and Applied Sciences, 10(6): 48-53

Figures: No Figure

Tables: Table 1, Table 2, Table 3, Table 4, Table 5
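
The abstract compares the two models by F1-score. As a reminder of what that metric measures, here is a minimal sketch of the standard computation from detection counts. The function name `f1_score` and the counts are hypothetical and chosen only for illustration. They are not taken from the study, which does not report its precision, recall, or raw counts on this page.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # With no correct detections, precision and recall are zero or undefined;
        # by convention the F1-score is reported as 0.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the study).
example = f1_score(true_positives=90, false_positives=10, false_negatives=10)
print(f"F1 = {example:.2%}")
```

Because F1 combines precision and recall, it can differ from accuracy on the same test set, which is consistent with the abstract reporting the two metrics separately.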