Automatic detection of cyberbullying and threatening in Saudi tweets using machine learning

Social media has become a major factor in people's lives, affecting their communication and psychological state. The widespread use of social media has given rise to new types of violence, such as cyberbullying. Manually detecting and reporting violent texts in social media applications is challenging because of the growing number of users and the huge amount of data they generate. Automatic detection of violent texts is language-dependent: it requires an efficient approach that considers the unique features and structures of a specific language or dialect. Only a few studies have focused on the automatic detection and classification of violent texts in Arabic. This paper aims to build a two-level classifier model for Arabic violent texts. The first level classifies text as violent or non-violent. The second level classifies violent text as either cyberbullying or threatening. The dataset used to build the classifier models was collected from Twitter using specific keywords and trending hashtags in Saudi Arabia. Supervised machine learning is used to build two classifier models with two different algorithms: Support Vector Machine (SVM) and Naive Bayes (NB). Both models are trained under different experimental settings that vary the feature extraction method and whether stop-word removal is applied. The performance of the proposed SVM-based and NB-based models is compared. The SVM-based model outperforms the NB-based model, with F1 scores of 76.06% and 89.18% and accuracy scores of 73.35% and 87.79% for the first and second levels of classification, respectively.


Introduction
In recent years, social media has been widely used around the world. Social media allows people to communicate, exchange messages, share knowledge, and interact with each other, and it has become an increasingly popular part of everyday activities. As a result, a huge amount of data is added every day to social media websites and microblogs such as Twitter and Facebook (Altaher, 2017). A study has shown that Saudi Arabia has the highest annual growth rate of social media users in the world (Alruily, 2020). Twitter users post about 500 million tweets per day, and over 30% of these tweets come from Saudi Arabia (Alruily, 2020).
With the growth of social media websites and the increasing number of users, forms of abuse and violence have moved from the real world to the virtual world. Although social media has helped to connect people around the world, some people abuse this technology by using violent texts to verbally attack other users in many ways, such as bullying, insulting, swearing, and extortion. Violent text is abuse that takes place over digital devices such as cell phones, computers, and tablets (Haidar et al., 2016). Social media violence, such as cyberbullying, can harm people's psychological and mental health, possibly even more than physical violence, especially among teenagers and young people. Cyberbullying can spread more widely than real-world bullying. In addition, violent text posted on social media remains there indefinitely and can have a long-term effect on people unless the posts are reported and removed. Many individuals affected by social media violence do not report such incidents for several reasons, including fear that things will get worse or threats from the bully to prevent them from reporting. Moreover, because of the large number of users and the huge amount of social media data generated daily, it is difficult to detect, track, and stop such attacks manually.
Thus, the aim of this paper is to automatically detect violent tweets on Twitter. Compared with existing work on violent Arabic text detection and classification (Duwairi and Qarqaz, 2014; El-Naggar et al., 2017; Biltawi et al., 2017; Mouheb et al., 2018; Haidar et al., 2017), this paper focuses on the Saudi dialect. The main contribution of this paper is a two-level classification model for violent text:
• The first level of classification classifies tweets as either violent or non-violent.
• The second level of classification classifies violent tweets as either cyberbullying or threatening.
The rest of this paper is organized as follows: Section 2 presents related work on text classification, Section 3 describes the proposed methodology, Section 4 presents the experiments, results, and discussion, and Section 5 concludes the paper.

Related works
Several works on detecting hate speech and offensive language have been conducted in many languages, such as English (Gambäck and Sikdar, 2017; Mahmud et al., 2008; Spertus, 1997), German (Wiegand et al., 2018; Schneider et al., 2018; Ross et al., 2017), Hindi (Kumar et al., 2018; Modha et al., 2018), Mexican Spanish (Díaz-Torres et al., 2020), Dutch (Van Hee et al., 2015), and Arabic (Duwairi and Qarqaz, 2014; El-Naggar et al., 2017; Biltawi et al., 2017; Mouheb et al., 2018; Haidar et al., 2017). A review of techniques used in Arabic cyberbullying detection, including natural language processing and machine learning, is presented in Haidar et al. (2016). Using a set of 175 million Arabic tweets collected during March 2014, a list of obscene words was extracted to identify offensive text content (Mubarak et al., 2017). These words were used to build a corpus of 660 thousand Arabic offensive tweets collected between April 15, 2019, and May 6, 2019. About 10 thousand of these tweets were annotated manually by experienced annotators, who were given a set of guidelines for labeling each tweet as either offensive or clean, where offensive tweets include vulgar language and hate speech. Likewise, a dataset of 15,050 Arabic comments on celebrities in the Arab world was collected from YouTube videos in July 2017 (Alakrot et al., 2018). The comments were annotated as either offensive or inoffensive by three annotators from different Arab countries (Alakrot et al., 2018). Further, a Support Vector Machine (SVM) classifier was applied to the preprocessed dataset with and without stemming and with and without normalization, and the results showed that preprocessing with stemming can improve the detection of offensive comments (Alakrot et al., 2018).
Real-time detection of cyberbullying in Arabic tweets, together with classification of the detected tweets by strength, was proposed in Mouheb et al. (2019). The authors created a list of offensive words in three classes: mild, medium, and strong. If a comment contains any word from the offensive-word list, it is classified as cyberbullying. Detected cyberbullying comments are then classified by strength using a weight function assigned to each comment, which considers the number of bullying words in the comment and the weight of each word. In another study, a dataset of 100,327 tweets and comments was collected via Microsoft Flow and from YouTube, and each item was classified as cyberbullying or not using a lexicon built with Pointwise Mutual Information (PMI), Chi-square, and Entropy approaches (AlHarbi et al., 2019).
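The lexicon-based idea of Mouheb et al. (2019) can be illustrated with a short Python sketch. It is not their implementation: the class weights 1/2/3 and the rule of summing the weights of matched words are assumptions chosen only to show how a score can grow with both the number of bullying words and the strength of each word.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical weights for the three word classes; the original weight
# function is only described in the text, not specified.
CLASS_WEIGHTS: Dict[str, int] = {"mild": 1, "medium": 2, "strong": 3}


def score_comment(tokens: Iterable[str], lexicon: Dict[str, str]) -> Tuple[bool, int]:
    """Flag a comment as cyberbullying and compute a strength score.

    `lexicon` maps each offensive word to its class ("mild", "medium", "strong").
    The comment counts as cyberbullying if any token is in the lexicon; the
    score sums the class weight of every matched token.
    """
    score = sum(CLASS_WEIGHTS[lexicon[tok]] for tok in tokens if tok in lexicon)
    return score > 0, score


# Usage with placeholder tokens (not real lexicon entries):
lexicon = {"w1": "mild", "w2": "strong"}
print(score_comment(["w1", "x", "w2", "w1"], lexicon))  # (True, 5)
```

Mapping the score to mild, medium, or strong would then need thresholds, which are not given here.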
Multiple classifiers have been applied to Arabic text to detect offensive language. Husain (2020) applied single-learner machine learning classifiers (SVM, logistic regression, and decision tree) and ensemble classifiers (bagging, AdaBoost, and random forest) to the Arabic offensive tweets collected in Al-Khalifa et al. (2020). The results showed that the ensemble classifiers achieved better results than the single learners, and the bagging ensemble was the best at detecting offensive language. Mohaouchane et al. (2019) compared four neural network classifiers: a Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM), attention-based Bi-LSTM, and a combined CNN-LSTM. Using the dataset created in Alakrot et al. (2018), the combined CNN-LSTM achieved the best recall, while the CNN achieved the best accuracy and precision among the classifiers for detecting offensive language on Arabic social media (Mohaouchane et al., 2019).

Methodology
The proposed methodology consists of five main steps, as illustrated in Fig. 1. The first step is collecting the data required to build the classifier model. Then, the data is preprocessed and annotated to train the model. After that, classification is done in two levels: the first classifier classifies tweets as either violent or non-violent, and the second classifier classifies violent tweets as either cyberbullying or threatening.

Data collection
Twitter is a social website where people write their opinions and thoughts about different topics, which makes it rich in text data. Using the Twitter API and Tweepy, a Python library for accessing it, a total of 3700 tweets were collected. These tweets were collected using 50 keywords and common hashtags in Saudi Arabia, with different filter settings to ensure that the collected dataset contains a sufficient number of violent texts from both categories, cyberbullying and threatening. Table 1 and Table 2 show samples of the cyberbullying and threatening keywords, respectively, used to collect the dataset from Twitter. Some keywords do not indicate cyberbullying by themselves, such as "mryḍ," which means "sick." When this word is used alone, it retrieves normal tweets such as "āllhm āšfy kl mryḍ," which is a prayer for sick people to recover from illness. However, adding the prefix "yā" to "mryḍ" gives "yāmryḍ," an insult meaning "you are sick," and searching for "yāmryḍ" retrieves cyberbullying tweets. In addition, an Arabic word may be written in different forms, such as "dbh'" and "dbh," which means "you are fat like a bear"; these different forms of the same word were added to the list. Common violent phrases in the Saudi dialect were also added to both lists. For example, "ābn āmk wryny wǧhk," which means "show me your face if you dare," was added to the threatening list. Table 3 shows some examples of these phrases. The retrieved tweets were saved in two Excel files. Tweets containing non-Saudi dialect, advertisements, or non-text content were manually removed from the dataset. After this cleaning process, the resulting dataset contains 2000 tweets, including both violent and non-violent texts.
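The keyword strategy above (prefixed insults, spelling variants, and whole phrases) ends up as a list of search terms that must be turned into search queries. The following Python sketch packs such terms into OR-joined queries; the quoting, the `lang:ar -is:retweet` filters, and the 512-character default limit are assumptions, not the authors' settings. Each query could then be passed to Tweepy, for example to `Client.search_recent_tweets` in Tweepy 4.x.

```python
from typing import Iterable, List


def build_queries(terms: Iterable[str], max_len: int = 512,
                  filters: str = "lang:ar -is:retweet") -> List[str]:
    """Pack search terms into OR-joined queries no longer than max_len characters.

    Each term is quoted so multi-word phrases are matched exactly. A single term
    that is longer than max_len on its own still gets its own query.
    """
    def render(batch: List[str]) -> str:
        return "(" + " OR ".join(f'"{t}"' for t in batch) + ") " + filters

    queries: List[str] = []
    batch: List[str] = []
    for term in terms:
        if batch and len(render(batch + [term])) > max_len:
            queries.append(render(batch))  # current batch is full; start a new one
            batch = [term]
        else:
            batch.append(term)
    if batch:
        queries.append(render(batch))
    return queries


# Hypothetical term list: a prefixed insult ("yā" + "mryḍ") and a threatening phrase.
terms = ["يا" + "مريض", "ابن امك وريني وجهك"]
for query in build_queries(terms):
    print(query)
```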

Data annotation
Using guidelines from two psychology experts and the handbook "Say no to cyberbullying" from the "Be Free" program of the Bahrain Women's Association, the tweets were annotated as normal, cyberbullying, or threatening. A copy of the tweets, together with the guidelines, was sent to two annotators. A third annotator was involved only when the two annotators disagreed, and decided the final label for that tweet. After annotation, the agreement between the annotators was calculated with Cohen's kappa to ensure the reliability and quality of the annotation process; kappa accounts for the possibility that annotators agree or disagree by chance (Vieira et al., 2010; Al-Kabi et al., 2016). A substantial agreement of 80% was found between the annotators. Table 4 shows examples of annotated tweets based on the guidelines. Table 5 summarizes the number of annotated tweets in each class: non-violent, cyberbullying, and threatening.
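Cohen's kappa compares the observed agreement $p_o$ with the agreement $p_e$ expected by chance from each annotator's label frequencies, $\kappa = (p_o - p_e)/(1 - p_e)$. A minimal Python sketch follows; the label sequences in the usage example are invented for illustration and are not drawn from the dataset.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability that both pick the same label independently.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: N = normal, C = cyberbullying, T = threatening.
annotator_1 = ["N", "N", "C", "T", "N", "C"]
annotator_2 = ["N", "C", "C", "T", "N", "C"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.739
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are described as substantial agreement.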

Data pre-processing
The pre-processing phase involves four main tasks: tokenization, noise removal, normalization, and stop-word removal. Fig. 2 shows the workflow of the pre-processing step.

Table 4: Examples of annotated tweets (English translations)
Tweet | Class
You are a fat girl who should not use the elevator. Get out! Use the stairs and burn some calories; leave some space for us in the elevator. | Cyberbullying
I swear I will kill you, niger don't be a skunk, or else I will block you. | Threatening
I have a question for people who love to read. I am thinking that I should educate myself and use my time efficiently in reading. Would you give some advice on the first book I should read? | Normal (Non-Violent text)

Fig. 2: Summary of data pre-processing (Dataset → Tokenization → Noise Removal → Normalization → Stop-word Removal → Cleaned Dataset)

Tokenization: The first step of pre-processing is tokenization, the process of breaking tweets into separate words at spaces, commas, semicolons, colons, and dots. Its main benefit is that each word can be handled separately, which makes the subsequent feature extraction easier. For example, tokenizing the sentence "trāk qzm mṯlhā lā tswy fyhā hhhhhhhhhhhhhhhhhhhhhhhhhhhhhh," which means "you are a dwarf. Do not overestimate yourself; LOL!," generates the following tokens: "trāk," "qzm," "mṯlhā," "lā," "tswy," "fyhā," and "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhh." Tokenization was performed using the Natural Language Toolkit (NLTK) library in Python.
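The delimiter rule described above can be sketched with Python's standard `re` module. This is not the NLTK call the paper used (NLTK tokenizers treat punctuation somewhat differently); it only reproduces the stated rule. Adding the Arabic comma "،" and semicolon "؛" to the delimiters is an assumption for Arabic-script tweets.

```python
import re
from typing import List

# Delimiters from the text (whitespace, comma, semicolon, colon, dot), plus the
# Arabic comma and semicolon, which are an assumption added here.
_DELIMITERS = re.compile(r"[\s,;:.،؛]+")


def tokenize(tweet: str) -> List[str]:
    """Split a tweet into tokens, dropping empty strings left by the split."""
    return [token for token in _DELIMITERS.split(tweet) if token]


print(tokenize("trāk qzm mṯlhā lā tswy fyhā hhhh"))
# ['trāk', 'qzm', 'mṯlhā', 'lā', 'tswy', 'fyhā', 'hhhh']
```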
Noise Removal: Noise removal involves removing characters or parts of the text that may interfere with text analysis. Non-Arabic terms, stop words, numbers, punctuation, emojis, hashtags, and URLs were removed using the NLTK and regular expression (RE) libraries in Python. Table 6 shows examples of different noise types found in the text and how they are removed. In Arabic, some words contain two consecutive identical letters that are not redundant but are part of the word, such as the letter "b" in "bbġāʾ," which means "parrot," and the letter "m" in "mmtāz," which means "excellent." These letters should not be removed. Therefore, a list of 35 words whose repeated letters are part of the word itself was created. This list is used to check whether a repeated letter belongs to the word: if the word is in the list, the repeated letter is not removed. Table 7 shows some examples from the redundant-letter list.
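The noise-removal rules and the repeated-letter exception list can be sketched in Python as follows. Several details are assumptions: the two-word exception set stands in for the 35-word list, "Arabic" is approximated by the Unicode range U+0621 to U+0652 (letters plus diacritics, which are left for normalization), and the sketch works on the raw tweet string because a delimiter-based tokenizer would split URLs at ":" and ".". Stop words are handled separately in the stop-word removal step below.

```python
import re
from typing import List, Set

# Hypothetical subset of the 35-word list of words whose doubled letters are genuine.
KEEP_REPEATS: Set[str] = {"ببغاء", "ممتاز"}

_URL = re.compile(r"https?://\S+")
_HASHTAG = re.compile(r"#\S+")
# Anything other than Arabic letters/diacritics (U+0621-U+0652) or whitespace:
# Latin text, digits, punctuation, and emojis.
_NON_ARABIC = re.compile(r"[^\u0621-\u0652\s]")
_REPEATS = re.compile(r"(.)\1+")


def squeeze_repeats(token: str, keep: Set[str] = KEEP_REPEATS) -> str:
    """Collapse runs of a repeated letter to one letter unless the word is listed."""
    return token if token in keep else _REPEATS.sub(r"\1", token)


def remove_noise(tweet: str) -> List[str]:
    text = _URL.sub(" ", tweet)       # URLs first, before their punctuation is stripped
    text = _HASHTAG.sub(" ", text)    # whole hashtags, including their Arabic text
    text = _NON_ARABIC.sub(" ", text)
    return [squeeze_repeats(tok) for tok in text.split()]


print(remove_noise("شوف https://t.co/abc #هاشتاق 123 هههههه ممتاز 😂"))
# ['شوف', 'ه', 'ممتاز']
```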
Normalization: Text normalization is the process of converting text into a single form by unifying similar letters that are used interchangeably in Arabic. Table 8 shows examples of these interchangeable letters. For example, the words (aaḍrbk) and (-aḍrbk), which mean "hit you," would be treated as different words by the classifier even though they have the same meaning. In addition, diacritics (ḥrkāt) were normalized across all the text, and elongated letters were removed. The Araby library in Python was used for normalization. Table 9 shows examples of normalization applied to sentences. Normalization ensures that only one form of a word with a given meaning is used.
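The normalization step can be approximated in plain Python. The paper used the Araby library (the `araby` module of PyArabic provides, for example, `strip_tashkeel` and `strip_tatweel`); the sketch below reimplements common rules with `re` instead. The specific letter mappings (alef variants to bare alef, alef maqsura to ya, ta marbuta to ha) are assumptions; Table 8 lists the letters actually unified.

```python
import re

# Assumed mappings; Table 8 lists the interchangeable letters actually unified.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # آ أ إ -> ا
_DIACRITICS = re.compile("[\u064B-\u0652]")          # tanwin, harakat, shadda, sukun
_TATWEEL = "\u0640"                                  # elongation character (kashida)


def normalize(text: str) -> str:
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = _ALEF_VARIANTS.sub("\u0627", text)
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return text


# Two spellings of "hit you" collapse to one form.
print(normalize("أضربك") == normalize("اضربك"))  # True
```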
Stop-word Removal: Stop-word removal aims to remove insignificant words. Stop words are words that are commonly used in a language and carry little useful information, such as prepositions, conjunctions, and pronouns. Although removing stop words does not change the meaning of a sentence, it can affect classification performance and may improve accuracy.
To show the effect of stop-word removal, the experiments compare the performance of each model with and without stop-word removal. Table 10 shows some examples from the stop-word list. Table 11 shows an example of applying all the pre-processing steps.
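Because stop-word removal is an experimental switch rather than a fixed step, a sketch can expose it as a flag. The five-word list below is hypothetical and stored in normalized form so that it matches normalized tokens; Table 10 gives examples of the list actually used.

```python
from typing import Iterable, List, Set

# Hypothetical stop words in normalized form (Table 10 shows the real examples).
STOP_WORDS: Set[str] = {"في", "من", "عن", "هذا", "انت"}


def apply_stop_words(tokens: Iterable[str], remove_stop_words: bool,
                     stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Return the tokens unchanged, or without stop words, depending on the setting."""
    tokens = list(tokens)
    if not remove_stop_words:
        return tokens
    return [tok for tok in tokens if tok not in stop_words]


tweet_tokens = ["انت", "في", "البيت"]
print(apply_stop_words(tweet_tokens, remove_stop_words=False))  # all three tokens
print(apply_stop_words(tweet_tokens, remove_stop_words=True))   # ['البيت']
```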

Data classification
The classification is done in two levels, as shown in Fig. 3: the first level classifies a tweet as violent or non-violent, and the second level classifies violent tweets as cyberbullying or threatening.
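The two-level scheme amounts to routing: only tweets the first classifier marks as violent reach the second. The Python sketch below treats both classifiers as plain callables so that any trained model can be plugged in; the label strings and the keyword-based stub classifiers in the usage example are hypothetical placeholders, not trained models.

```python
from typing import Callable, Iterable, List

Classifier = Callable[[str], str]


def classify_two_level(tweets: Iterable[str], level1: Classifier,
                       level2: Classifier) -> List[str]:
    """Label each tweet as non-violent, cyberbullying, or threatening.

    level1 returns "violent" or "non-violent"; only tweets it marks as violent
    are passed to level2, which returns "cyberbullying" or "threatening".
    """
    labels = []
    for tweet in tweets:
        if level1(tweet) == "violent":
            labels.append(level2(tweet))
        else:
            labels.append("non-violent")
    return labels


# Stub classifiers standing in for the trained SVM/NB models.
level1 = lambda t: "violent" if ("kill" in t or "fat" in t) else "non-violent"
level2 = lambda t: "threatening" if "kill" in t else "cyberbullying"
print(classify_two_level(["I will kill you", "you are fat", "nice book"], level1, level2))
# ['threatening', 'cyberbullying', 'non-violent']
```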
The data was classified using two supervised machine learning algorithms, SVM and NB. Two feature extraction methods were also applied. The first is AraVec, a pre-trained distributed word representation model (Soliman et al., 2017) with a vocabulary of 1,476,715 words gathered from Twitter. The second is the term frequency (TF) method, which measures how frequently a term occurs in a document (Utomo and Sibaroni, 2019).
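The two feature extraction methods can be sketched as functions that turn a tokenized tweet into a fixed-length vector. Two assumptions are made here: TF is taken as raw counts over a fixed vocabulary (a length-normalized variant is also common), and a tweet's AraVec representation is taken as the average of its word vectors, since the text does not state how word vectors were combined. The toy vectors stand in for real AraVec embeddings, which are distributed as gensim Word2Vec models.

```python
from collections import Counter
from typing import Dict, List, Sequence


def tf_vector(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Raw term-frequency features: how often each vocabulary term occurs in the tweet."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def mean_embedding(tokens: Sequence[str], vectors: Dict[str, Sequence[float]],
                   dim: int) -> List[float]:
    """Average the word vectors of in-vocabulary tokens (zeros if none are known)."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy inputs standing in for real tweets and for AraVec vectors.
vocabulary = ["a", "b", "c"]
print(tf_vector(["a", "b", "a"], vocabulary))               # [2, 1, 0]
toy_vectors = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
print(mean_embedding(["a", "b", "z"], toy_vectors, dim=2))  # [2.0, 3.0]
```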

Model evaluation
For model evaluation, cross-validation is used. This common classifier evaluation strategy randomly divides the dataset into k subsets, or "folds" (F1, F2, ..., Fk), of the same size. In the first iteration, F1 is the test data and the remaining folds, F2 to Fk, are the training data. In the second iteration, F2 is the test data and F1, F3, ..., Fk are the training data, and so on (Han et al., 2011). In this paper, the cross-validation strategy has been applied with 10 folds.
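The fold rotation described above can be written in a few lines of standard-library Python. This is a sketch of plain (non-stratified) k-fold splitting; the text does not say whether the folds were stratified by class, and the seed is arbitrary. scikit-learn's `KFold(n_splits=10, shuffle=True)` provides equivalent behavior.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index appears in exactly one test fold."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


# With the 2000-tweet dataset and 10 folds, each test fold holds 200 tweets.
folds = list(k_fold_indices(2000, k=10))
print([len(test) for _, test in folds])
```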
Different evaluation measures have been used to evaluate the classification models. These measures are calculated for each test fold and then averaged over all folds. Let $N$ be the total number of instances in the test data, $x_i$ the $i$-th tweet, and $\hat{y}_i$ and $y_i$ the predicted and actual labels of $x_i$, respectively. For a class $c$ (e.g., cyberbullying), accuracy, precision, recall, and F-measure are defined as follows.

Accuracy: the proportion of tweets whose predicted label matches the actual label. Eq. 1 is used to calculate accuracy.

$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right] \tag{1}$$

Precision: represents the probability that tweets classified by the classifier as class $c$ (e.g., cyberbullying) actually belong to class $c$ (El-Makky et al., 2014). Eq. 2 is used to calculate precision.

$$\mathrm{Precision}_c = \frac{\sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = c \wedge y_i = c\right]}{\sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = c\right]} \tag{2}$$

Recall: represents the probability that tweets of class $c$ (e.g., cyberbullying) are classified as class $c$ by the classifier (El-Makky et al., 2014). Eq. 3 is used to calculate recall.

$$\mathrm{Recall}_c = \frac{\sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = c \wedge y_i = c\right]}{\sum_{i=1}^{N} \mathbb{1}\left[y_i = c\right]} \tag{3}$$

F-measure: combines precision and recall as their harmonic mean (El-Makky et al., 2014). Eq. 4 is used to calculate F1.

$$F_1 = \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c} \tag{4}$$
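The four measures and the per-fold averaging can be computed directly from predicted and actual labels, as in the Python sketch below. Returning 0.0 when a denominator is zero is a convention chosen here, and the label lists in the tests are invented for illustration.

```python
from typing import Dict, List, Sequence


def evaluate(y_true: Sequence[str], y_pred: Sequence[str],
             positive: str) -> Dict[str, float]:
    """Accuracy, plus precision, recall and F1 for the class `positive` (Eqs. 1-4)."""
    n = len(y_true)
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted = sum(p == positive for p in y_pred)  # denominator of precision
    actual = sum(t == positive for t in y_true)     # denominator of recall
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def average_over_folds(per_fold: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each measure over the cross-validation folds."""
    return {key: sum(m[key] for m in per_fold) / len(per_fold) for key in per_fold[0]}
```

Because each measure is averaged over folds, the averaged F1 is generally not identical to the F1 computed from the averaged precision and recall.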

Experiments and results
The following sub-sections describe the experimental setting used in this study along with the results of the conducted experiments.

The first level of classification
In the first level of classification, the tweets are classified as either violent or non-violent. Four experiments were carried out at this level. In all of them, tokenization, noise removal, and normalization are applied to the dataset, and cross-validation is used. The experiments differ in the feature extraction method and in whether stop words are removed. Table 12 shows the details of the experiments. Each experiment is run with both algorithms, SVM and NB. Table 13 shows the results obtained from the first level of classification. As shown in Table 13, the NB algorithm achieved its best result in experiment four, with the TF method and stop-word removal, whereas the SVM algorithm achieved its best result in experiment three, with the TF method and without stop-word removal. The highest result among all the experiments was obtained in experiment three using SVM, with an accuracy of 73.35% and an F1 of 76.06%. Therefore, SVM with the TF method and without stop-word removal is selected as the best model at this level. The tweets classified as violent by this model are used as input for the second level of classification.
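The experimental design at this level is a small grid: two feature extraction methods, stop-word removal on or off, and two algorithms. The Python sketch below enumerates that grid and picks the best setting by a score such as mean 10-fold F1. Numbering experiments in the order of the product is an assumption: it matches the text for experiments three (TF without stop-word removal) and four (TF with stop-word removal), but the order of experiments one and two is not stated here.

```python
from itertools import product
from typing import Callable, Dict, List

FEATURE_METHODS = ("AraVec", "TF")
STOP_WORD_REMOVAL = (False, True)
ALGORITHMS = ("SVM", "NB")

Setting = Dict[str, object]


def experiment_settings() -> List[Setting]:
    """One entry per (experiment, algorithm) pair: 4 experiments x 2 algorithms."""
    settings: List[Setting] = []
    combos = product(FEATURE_METHODS, STOP_WORD_REMOVAL)
    for number, (features, remove_stop_words) in enumerate(combos, start=1):
        for algorithm in ALGORITHMS:
            settings.append({
                "experiment": number,
                "features": features,
                "remove_stop_words": remove_stop_words,
                "algorithm": algorithm,
            })
    return settings


def best_setting(settings: List[Setting],
                 score: Callable[[Setting], float]) -> Setting:
    """Return the setting with the highest score, e.g. its mean 10-fold F1."""
    return max(settings, key=score)


for setting in experiment_settings():
    print(setting)
```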

The second level of classification
Since the highest results in the first level of classification were obtained without stop word removal, stop words are not removed in the second level. Therefore, two experiments are done using the two different feature extraction methods, which are AraVec pre-trained model and TF. Each experiment is applied using the two algorithms SVM and NB.
The experimental results are shown in Table 14. The highest result was obtained in the second experiment setting using the SVM algorithm, with an accuracy of 87.79% and an F1 of 89.18%.

Conclusion and future work
This paper presented a model for the automatic detection and classification of violent text. The model detects cyberbullying and threats in social media using a two-level classifier: the first level classifies text as violent or non-violent, and the second level classifies violent text as cyberbullying or threatening. The dataset consisted of 2000 tweets collected using the Twitter API and labeled manually. The tweets were pre-processed to remove noise before being passed to the classifiers. Supervised machine learning was used, with the SVM and NB algorithms trained under different settings. For the first level of classification, four experiments were carried out, and SVM achieved the highest results using the TF method without stop-word removal: 73.35%, 75.86%, 76.62%, and 76.06% for accuracy, precision, recall, and F1, respectively. For the second level of classification, two experiments were carried out, and SVM outperformed NB using TF, with an accuracy of 87.79%, a precision of 86.51%, a recall of 92.21%, and an F1 of 89.18%. Future work will include other types of violent text and add more features, such as emojis and tashkil (diacritics), that help determine whether a text is violent, as well as extending the model to other Arabic dialects.

Conflict of interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.