Cybersecurity in social networks: An ensemble model for Twitter bot detection

Abdulbasit A. Darem; Asma A. Alhashmi; Meshari H. Alanazi; Abdullah F. Alanezi; Yahia Said; Laith A. Darem; Maher M. Hussain

IJAAS: International Journal of Advanced and Applied Sciences
eISSN: 2313-3724, Print ISSN: 2313-626X, Frequency: 12 issues per year

Volume 11, Issue 11 (November 2024), Pages: 130-141
Original Research Paper

Author(s): Abdulbasit A. Darem¹*, Asma A. Alhashmi¹, Meshari H. Alanazi¹, Abdullah F. Alanezi¹, Yahia Said², Laith A. Darem², Maher M. Hussain³

Affiliation(s):
¹Department of Computer Science, College of Science, Northern Border University, Arar, Saudi Arabia
²Department of Electrical Engineering, College of Engineering, Northern Border University, Arar, Saudi Arabia
³Department of Civil Engineering, College of Engineering, Northern Border University, Arar, Saudi Arabia

*Corresponding author. ORCID: https://orcid.org/0000-0002-5650-1838

Digital Object Identifier (DOI): https://doi.org/10.21833/ijaas.2024.11.014

Abstract
The growing number of bot accounts on social media platforms poses major challenges to truthful and reliable online communication. This study examines how well ensemble learning techniques identify bot accounts on Twitter. We used a Kaggle dataset that provides detailed account-level information and labels each account as bot or human. On this dataset we applied and evaluated several machine learning methods: logistic regression, decision trees, random forests, XGBoost, support vector machines, and multi-layer perceptrons. The ensemble model, which combines the predictions of the individual classifiers, achieved the best performance, with 90.22% accuracy and 92.39% precision. The high precision indicates that relatively few accounts flagged as bots were actually human. Our results highlight the potential of ensemble learning to improve bot detection by combining the strengths of different classifiers. The study also underscores the need for reliable, interpretable detection systems that preserve the authenticity of social media and keep pace with the changing tactics of bot developers.
Future research should explore additional types of data and ways to make models more interpretable, with the aim of further improving detection results.

© 2024 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Bot detection accuracy; Ensemble learning methods; Social media integrity; Machine learning classifiers; Model interpretability

Article history: Received 18 July 2024; received in revised form 1 September 2024; accepted 30 October 2024

Acknowledgment: The authors gratefully acknowledge the approval and support of this research study by grant no. SCIA-2023-12-2341 from the Deanship of Scientific Research at Northern Border University, Arar, K.S.A.

Compliance with ethical standards
Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Citation: Darem AA, Alhashmi AA, Alanazi MH, Alanezi AF, Said Y, Darem LA, and Hussain MM (2024). Cybersecurity in social networks: An ensemble model for Twitter bot detection. International Journal of Advanced and Applied Sciences, 11(11): 130-141.

Figures: Fig. 1 to Fig. 5. Tables: Table 1 to Table 4.
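The abstract says the ensemble merges the predictions of several base classifiers but does not state how they are combined. The sketch below is minimal and dependency-free. It assumes hard (majority) voting over binary labels (1 = bot, 0 = human) and computes the two metrics the abstract reports, accuracy and precision. The function names `majority_vote`, `accuracy`, and `precision`, the three label vectors, and the tie rule are all hypothetical illustrations, not the authors' code or data. The tie rule labels ties as human. It matters because an even number of members, such as the six classifiers listed, can split evenly.

```python
from typing import Dict, List, Sequence

# Assumed label convention: 1 = bot, 0 = human.
BOT, HUMAN = 1, 0


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier predictions by hard (majority) voting.

    predictions[m][i] is classifier m's label for account i. An account is
    labeled a bot only if a strict majority of classifiers say so; ties are
    labeled human (an assumed rule that favors precision).
    """
    if not predictions:
        raise ValueError("need at least one classifier's predictions")
    n_models = len(predictions)
    n_accounts = len(predictions[0])
    if any(len(p) != n_accounts for p in predictions):
        raise ValueError("all classifiers must label the same accounts")
    combined = []
    for i in range(n_accounts):
        bot_votes = sum(1 for p in predictions if p[i] == BOT)
        combined.append(BOT if 2 * bot_votes > n_models else HUMAN)
    return combined


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of accounts whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP) for the bot class; 0.0 if nothing was flagged as a bot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == BOT and t == BOT)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == BOT and t == HUMAN)
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    # Hypothetical toy labels for six accounts (not the paper's data).
    y_true = [1, 1, 0, 0, 1, 0]
    members: Dict[str, List[int]] = {
        "model_a": [1, 0, 0, 1, 1, 0],
        "model_b": [1, 1, 0, 0, 0, 1],
        "model_c": [1, 1, 1, 0, 1, 0],
    }
    ensemble = majority_vote(list(members.values()))
    for name, preds in {**members, "ensemble": ensemble}.items():
        print(f"{name}: accuracy={accuracy(y_true, preds):.3f}, "
              f"precision={precision(y_true, preds):.3f}")
```

In this toy data, every member misclassifies at least one account, but the errors fall on different accounts, so the majority vote recovers all six labels. This is the usual intuition behind ensembling: it helps most when members make partly uncorrelated mistakes. For fitted scikit-learn models, `sklearn.ensemble.VotingClassifier` offers both hard and soft voting. Whether the authors used it is not stated here.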
