Detecting block ciphers generic attacks: An instance-based machine learning method

Cryptography facilitates selective communication through the encryption of messages and/or data. Block cipher processing is one of the prominent methods in modern symmetric encryption schemes. The rise in attacks on block ciphers led to the development of more complex encryption schemes. However, attackers can still decrypt block ciphers through generic attacks given sufficient time and computing resources. Recent research has applied machine learning classification algorithms to develop intrusion detection systems that detect multiple types of attacks. These intrusion detection systems tend to misclassify generic attacks and suffer reduced effectiveness when evaluated on generic attacks only. Hence, this study proposes k-nearest neighbors, an instance-based machine learning classification algorithm, for the detection of generic attacks on block ciphers. The value of k was varied (i.e., 1, 3, 5, 7, and 9), and multiple nearest neighbors classification models were developed and evaluated using two distance functions (i.e., Manhattan and Euclidean) for classifying between generic attacks and normal network packets. All nearest neighbors models using the Manhattan distance function performed better than their Euclidean counterparts. The 1-nearest neighbor (Manhattan distance function) model had the highest overall accuracy of 99.6%, a generic attack detection rate of 99.5%, which tallies with the 5, 7, and 9 nearest neighbors models, and a false alarm rate of 0.003, which is the same for all Manhattan nearest neighbors classification models. These instance-based methods performed better than some existing methods, including one that implemented an ensemble of deep-learning algorithms. Therefore, an instance-based method is recommended for detecting generic attacks on block ciphers.


Introduction
Cryptography is the art of coding messages or information to facilitate selective communication (Bhattacharyya and Chakrabarti, 2022). In other words, it is the art and science of introducing secrecy into information security (Samoriski, 2020). Traditionally, cryptography involves manipulating letters or digits and is based on providing security through obscurity (Aswath et al., 2022). A traditional block cipher encrypts data via the manipulation of letters and digits (Nahar and Chakraborty, 2020). Traditional ciphers usually rely on the symmetric key encryption method, which uses the same key for encryption and decryption (Kshirsagar and Shah, 2021). However, modern cryptography is built on the mathematical concepts of number theory, probability theory, and computational complexity theory for the encryption of data (Saračević et al., 2020). It deals with the security of digital data, which is represented as strings of binary digits (Easttom, 2021). Modern cryptography utilizes one of those mathematical concepts to convert strings of plain binary digits into coded binary strings. Modern symmetric encryption schemes are categorized into block ciphers and stream ciphers based on how plain binary strings are processed (Sevin and Mohammed, 2021). This study is interested in block ciphers.
A block cipher encryption scheme processes a plain string of binary text in blocks (i.e., groups) of bits at a time. For example, the modern Advanced Encryption Standard (AES) scheme processes 128 bits of a plain binary string at a time (Awan et al., 2020). A block cipher operates by selecting a block of plain bits, performing an encryption function on the selected bits, generating a block of ciphertext, and repeating these steps until all plain bits are transformed into their corresponding ciphertext bits (Bahadori et al., 2021).
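The block-by-block processing described above can be sketched in a few lines of Python. This is only an illustration: the block function is a hypothetical XOR stand-in rather than AES, and the padding scheme and all function names are assumptions made for this example.

```python
BLOCK_SIZE = 16  # bytes, i.e., the 128-bit block size mentioned for AES


def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """PKCS#7-style padding so the length is a multiple of block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    """Remove the padding added by pad()."""
    return data[: -data[-1]]


def toy_block_function(block: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for a real block function (NOT secure)."""
    return bytes(b ^ k for b, k in zip(block, key))


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    padded = pad(plaintext)
    # Select one block at a time, transform it, and collect the results.
    blocks = [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]
    return b"".join(toy_block_function(b, key) for b in blocks)


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the toy block function also decrypts.
    blocks = [ciphertext[i:i + BLOCK_SIZE] for i in range(0, len(ciphertext), BLOCK_SIZE)]
    return unpad(b"".join(toy_block_function(b, key) for b in blocks))


key = bytes(range(BLOCK_SIZE))
message = b"block ciphers process data in fixed-size groups"
ciphertext = encrypt(message, key)
print(len(ciphertext) // BLOCK_SIZE, "blocks;", decrypt(ciphertext, key) == message)
```

A real scheme would replace `toy_block_function` with a keyed permutation such as AES and would normally chain the blocks through a mode of operation.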
To protect data integrity, cryptographic experts developed various complicated block cipher schemes to deter attackers from decrypting block ciphers. However, with enough computational resources, most block ciphers can be decrypted (Shetty et al., 2020). Attackers are usually aware that data or information is being communicated or transmitted, even though the messages are encrypted or scrambled. Thus, they usually intrude on a digital network to attack encrypted data or information. One of these dangerous attacks is referred to as a 'generic attack on block ciphers'. A generic attack on block ciphers (Moustafa and Slay, 2015; Kumar et al., 2020) usually involves running a brute-force attack on the block ciphers regardless of their encryption structure.
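To make the "regardless of the encryption structure" point concrete, the sketch below runs an exhaustive key search against a toy cipher. The cipher (repeating-key XOR), the 2-byte key length, and the function names are all assumptions chosen so that the search finishes instantly; the search loop itself only calls the cipher as a black box.

```python
from itertools import product


def toy_encrypt(block: bytes, key: bytes) -> bytes:
    """Hypothetical toy cipher (repeating-key XOR); NOT a real block cipher."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(block))


def brute_force(ciphertext: bytes, known_plaintext: bytes, key_len: int = 2):
    """Try every key of key_len bytes until one maps the plaintext to the ciphertext."""
    for tries, candidate in enumerate(product(range(256), repeat=key_len), start=1):
        key = bytes(candidate)
        if toy_encrypt(known_plaintext, key) == ciphertext:
            return key, tries
    return None, 256 ** key_len


secret_key = bytes([0x3A, 0xC5])
plaintext = b"sixteen byte msg"
ciphertext = toy_encrypt(plaintext, secret_key)

found_key, tries = brute_force(ciphertext, plaintext)
print(found_key == secret_key, "after", tries, "of", 256 ** 2, "candidate keys")
```

The cost of this search grows as 256 to the power of the key length in bytes, which is why such attacks depend on time and computing resources rather than on any weakness in the cipher's design.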
Cybersecurity aims to ensure the integrity, confidentiality, and availability of data, information, and resources (Gauthama Raman et al., 2020) across a cyberspace consisting of billions of connected users and devices (Faker and Dogdu, 2019). Hence, various countermeasures against generic attacks are being developed (Dutta et al., 2019). One of the prominent countermeasures is the use of machine learning classification algorithms for detecting intrusions (attacks) (Wei et al., 2020). Intrusion detection systems classify network packets as normal or as one of the various forms of attack (Idhammad et al., 2018; Alsariera et al., 2020a; 2021a; 2021b). Most of the existing multiclassification intrusion detection systems misclassify attacks that share the same characteristics (Salman et al., 2017). Salman et al. (2017) demonstrated how a multiclassification intrusion detection system tends to misclassify generic attacks as exploit attacks, with an error of about 51%. Therefore, it becomes essential to develop intrusion detection models specifically for the detection of generic attacks on block ciphers to appropriately defend against this form of attack.
To develop such an intrusion detection model, data becomes pivotal. Most of the publicly available network datasets, such as the NSL-KDD intrusion dataset (i.e., an improved KDDCup'99), do not capture generic attacks (Xin et al., 2018). The UNSW-NB15 dataset (Moustafa and Slay, 2015) captures generic attacks and other forms of attacks. However, the UNSW-NB15 dataset is usually used in research for developing multi-classification (Nawir et al., 2018) or anomaly (Feng et al., 2019) intrusion detection models. Hence, this study aims to develop a generic attack detector through the implementation of an instance-based machine learning classification algorithm. The contributions of this research are twofold. First, it introduces an instance-based machine learning classification algorithm to detect generic attacks at a higher detection rate and a lower false alarm rate. Second, it conducts a robust empirical analysis of various instance-based classification models to identify the best-performing model(s) for classifying between generic attacks and normal packets.
The remaining sections of this study include the review of related works, methodology, results, discussion, conclusion, and future works.

Review of related works
There is more research on multi-classification models for detecting generic attacks than on standalone generic attack detectors. However, most of the research on multi-classification methods does not report the performance of the model for each type of attack. In this study, we reviewed published works on multi-classification attack detection that reported their model's performance on generic attacks. Thaseen et al. (2020a) reported the performance of integrating a majority voting ensemble of long short-term memory deep learning methods with an embedded feature extraction module to detect generic attacks and other forms of attacks in a multi-classification setting. This method achieved a 99.9% overall accuracy for multiclassification, which reduced to 95.23% accuracy for detecting generic attacks. Another study by Gharaee and Hosseinvand (2017) used a genetic algorithm to select the best variables for detecting attacks and a support vector machine learning algorithm to fit a multiclassification model that detects generic and other forms of attacks contained in the UNSW-NB15 dataset. The study provided the specific performance of its method for detecting generic attacks: it detected generic attacks at a 96.69% true positive rate, misclassified normal packets and other attacks as generic attacks at a 0.01% false alarm rate, and achieved an overall accuracy of 97.51%.
Olasehinde (2020) implemented k-Nearest Neighbor, Naïve Bayes, and Decision Tree classification algorithms as base learners for three different stacked ensemble methods: Multiple Model Trees (MMT), Meta Decision Trees (MDT), and Multi-Response Linear Regression (MLR). That study reported the performance of the stacked ensemble models, integrated with a feature selection method, for multiclassification as a whole rather than for each attack. It is reviewed here because an instance-based method (i.e., k-nearest neighbors) was included as a base learner. The MMT ensemble method produced 96.89% overall accuracy, the MLR method had 97.8% overall accuracy, and the MDT ensemble method had 98.08% overall accuracy. Kumar et al. (2020) published a novel rule-based multi-classification method for the Generic, DoS, Exploit, and Probe attacks contained in the UNSW-NB15 dataset. The rule-based method was reported to have an overall average accuracy of 65.21% for all classes of attacks and a false alarm rate of 2.01%.
The reviewed literature shows that multiclassification models have reduced effectiveness in detecting generic attacks. Moreover, the performance of the multi-classification models needs improvement considering the dangerous effect of a successfully executed generic attack.

Dataset
This study considers the UNSW-NB15 dataset as it contains contemporary normal network packets and generic attacks, among others (Moustafa and Slay, 2015). The KDDCup'99 and NSL-KDD datasets do not contain the attack considered in this study (Mabayoje et al., 2016; Saleh et al., 2019).
The data used in this study is a balanced extraction of all generic attack instances in the UNSW-NB15 dataset and a comparable number of normal packet instances. This developed dataset contains all features of the original dataset except the features 'id' and 'attack_cat', which are not relevant to this study. Therefore, the developed data contain forty-two (42) independent features and one dependent feature with two values (i.e., generic and normal).
Hence, the developed data for this study contained 18,871 generic attack instances and 18,954 normal packets.
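The extraction described in this section could be sketched as follows. This is a minimal illustration under stated assumptions: the rows are made-up records, `dur` and `sbytes` stand in for the 42 independent features, the category values `Generic` and `Normal` and the output column name `class` are assumptions, and `build_balanced_dataset` is a hypothetical helper, not the study's actual tooling.

```python
import random


def build_balanced_dataset(rows, n_normal, seed=0):
    """Keep every 'Generic' row plus n_normal randomly sampled 'Normal' rows.

    The 'id' and 'attack_cat' columns are dropped, and a two-valued
    'class' column (generic / normal) becomes the dependent feature.
    """
    rng = random.Random(seed)
    generic = [r for r in rows if r["attack_cat"] == "Generic"]
    normal = [r for r in rows if r["attack_cat"] == "Normal"]
    normal = rng.sample(normal, min(n_normal, len(normal)))

    dataset = []
    for r in generic + normal:
        record = {k: v for k, v in r.items() if k not in ("id", "attack_cat")}
        record["class"] = r["attack_cat"].lower()
        dataset.append(record)
    rng.shuffle(dataset)  # mingle the two classes
    return dataset


# Made-up records for illustration only.
rows = [
    {"id": 1, "dur": 0.12, "sbytes": 114, "attack_cat": "Generic"},
    {"id": 2, "dur": 0.09, "sbytes": 114, "attack_cat": "Generic"},
    {"id": 3, "dur": 1.50, "sbytes": 2480, "attack_cat": "Normal"},
    {"id": 4, "dur": 0.80, "sbytes": 1320, "attack_cat": "Normal"},
    {"id": 5, "dur": 2.10, "sbytes": 3900, "attack_cat": "Normal"},
    {"id": 6, "dur": 0.30, "sbytes": 520, "attack_cat": "Exploits"},
]
balanced = build_balanced_dataset(rows, n_normal=2)
print(len(balanced), sorted(r["class"] for r in balanced))
```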

Implemented models
The instance-based machine learning classification algorithm is also referred to as memory-based reasoning, lazy learning, example-based reasoning, or case-based reasoning (Verma and Shakya, 2021). It belongs to the nonparametric category of machine learning algorithms. It does not assume an inherent data distribution to develop a classification model; rather, it waits until the testing phase to compute the class an instance belongs to (Mabayoje et al., 2019; Sharma et al., 2019).
Although the Nearest Neighbor algorithm can be used for both regression and classification, this study uses it as a classification algorithm for detecting generic attacks on a network. For classification, the Nearest Neighbor algorithm simply classifies an instance based on its distance to a specified number of nearest instances, as illustrated in Algorithm 1.
Algorithm 1: k-Nearest Neighbor algorithm
Let (Xi, Ci), where i = 1, 2, …, n, be data points. Xi denotes the feature values and Ci denotes the label of Xi for each i. Assuming the number of classes is c, Ci ∈ {1, 2, 3, …, c} for all values of i. Let x be a point with an unknown label; k-NN finds its label as follows.
1: Calculate d(x, Xi), i = 1, 2, …, n, where d represents the distance between the two points.
2: Arrange the n calculated distances in non-decreasing order.
3: Let k be a positive integer; select the first k distances from step 2.
4: Find the k points corresponding to the selected k distances.
5: Assign to x the majority label among the selected k points.
Theoretically, Nearest Neighbor assumes that the data lie in a feature space and that its instances (or data points) are at measurable distances from one another. Each data instance is made up of independent variables and a class label. It also assumes that a single positive integer k is given, which determines the number of neighbors used for classification.
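The five steps of the nearest neighbor procedure can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in the study; the function names and the tiny two-feature training set are assumptions made for the example.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the sum of squared feature differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_X, train_y, x, k=1, distance=manhattan):
    """Classify x by majority vote among its k nearest training points."""
    # Steps 1-2: compute all n distances and sort them in non-decreasing order.
    ranked = sorted(
        zip((distance(x, xi) for xi in train_X), train_y),
        key=lambda pair: pair[0],
    )
    # Steps 3-4: keep the labels of the k closest points.
    nearest_labels = [label for _, label in ranked[:k]]
    # Step 5: return the majority label among them.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up two-feature training data for illustration.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["normal", "normal", "normal", "generic", "generic", "generic"]

print(knn_predict(train_X, train_y, (0.95, 1.0), k=3))                      # generic
print(knn_predict(train_X, train_y, (0.05, 0.1), k=1, distance=euclidean))  # normal
```

Because nothing is fitted ahead of time, every prediction scans the whole training set, which is the "lazy learning" behavior described earlier.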
Because the value of k is pivotal in implementing a Nearest Neighbor classification algorithm and there are two class labels in the 'Generic' attack dataset, this study evaluated the odd values within the range of 1 to 10 (i.e., 1, 3, 5, 7, and 9); with two classes, an odd k cannot produce a tied vote.
A typical 1-Nearest Neighbor classification model assigns the class label of the closest instance to the instance being predicted. The other k-Nearest Neighbor classification models assign the majority class label among the k nearest instances. Therefore, this study implemented 1-Nearest Neighbor and four other k-Nearest Neighbor models to classify network traffic as either normal traffic or a generic attack.
The experimental framework of this study is graphically depicted in Fig. 1. All generic network attack instances were extracted from the UNSW-NB15 training dataset to develop a dataset. An adequate number of normal traffic instances were also extracted and appended to the data to form a balanced dataset. The balanced dataset was randomly shuffled to mingle the generic attack instances and normal traffic instances within the dataset. Two distance functions (i.e., Euclidean and Manhattan) were implemented for each instance-based method before model development. Each nearest neighbors method was fitted on the randomly shuffled dataset via 10-fold cross-validation. The 10-fold cross-validation technique splits the dataset into 10 partitions, trains the model on 9 of them, and tests it on the held-out partition. This is repeated 10 times so that every partition is used once for testing. The 10 models are then aggregated to produce a robust model.
The total number of generic attacks and normal traffic instances that were correctly and falsely classified by the fitted models (i.e., 1, 3, 5, 7, 9-Nearest Neighbor classification models) were reported as a confusion matrix.
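A minimal sketch of this evaluation loop is given below, assuming that the confusion counts are pooled over the 10 held-out partitions. The synthetic two-feature data, the helper names, and the interleaved fold assignment are assumptions made for illustration; they are not taken from the study's tooling.

```python
import random
from collections import Counter


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, x, k, distance):
    """Majority label among the k training rows closest to x."""
    nearest = sorted(train, key=lambda row: distance(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(data, k=1, distance=manhattan, folds=10, seed=0):
    """Return pooled confusion counts with 'generic' as the positive class."""
    data = data[:]
    random.Random(seed).shuffle(data)  # mingle the two classes
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for f in range(folds):
        test = data[f::folds]  # held-out partition
        train = [row for i, row in enumerate(data) if i % folds != f]
        for x, actual in test:
            predicted = knn_predict(train, x, k, distance)
            if actual == "generic":
                counts["TP" if predicted == "generic" else "FN"] += 1
            else:
                counts["FP" if predicted == "generic" else "TN"] += 1
    return counts


# Synthetic, well-separated data (made up for illustration).
rng = random.Random(1)
data = [((rng.gauss(0, 0.3), rng.gauss(0, 0.3)), "normal") for _ in range(50)]
data += [((rng.gauss(3, 0.3), rng.gauss(3, 0.3)), "generic") for _ in range(50)]

counts = cross_validate(data, k=3)
print(counts)
```

Every instance lands in exactly one held-out partition, so the four counts always sum to the size of the dataset.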

Performance evaluation metrics
This study aims to develop instance-based machine learning models for classifying between generic attacks and normal traffic. This is a binary classification task (i.e., two class values). Using the values populated in the confusion matrix (i.e., True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN)), the performances of the proposed instance-based methods were evaluated by deriving the following evaluation scores: Matthews Correlation Coefficient (MCC), True Positive (TP) Rate (i.e., Detection Rate), False Positive (FP) Rate (i.e., False Alarm Rate), F-Measure, and Area Under Curve (AUC) (Elijah et al., 2019; Alsariera et al., 2020a; 2021a; 2021b; Kasongo and Sun, 2020; Mebawondu et al., 2020; Sarumi et al., 2020; Thaseen et al., 2020a). Additionally, the kappa value and overall accuracy (i.e., the percentage of correctly classified 'Generic' attack and normal network traffic) were calculated for each instance-based model.
According to the reviewed literature, the MCC score is well suited to evaluating binary classification models because it uses all the counts in the confusion matrix (Li et al., 2020; Thaseen et al., 2020a). The MCC metric is calculated as shown in Eq. 1 (Li et al., 2020).
MCC = (TP × TN − FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))    (1)
The MCC measures the correlation between the observed and predicted classes. Its value ranges from −1 to +1, where +1 indicates perfect prediction, 0 indicates performance no better than chance, and −1 indicates total disagreement (Mebawondu et al., 2020). In this study, the MCC value of each instance-based model is used to compare the performance of each model alongside other performance evaluation measures.
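As a worked illustration of how these scores follow from the four confusion-matrix counts, the sketch below uses the counts implied by the text for the 1-Nearest Neighbor (Manhattan distance) model: 18,871 generic attacks with 88 missed and 18,954 normal instances with 59 false alarms. The function name and dictionary keys are assumptions made for this example; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
from math import sqrt


def evaluate(tp, fn, fp, tn):
    """Derive the count-based evaluation scores from a binary confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    detection_rate = tp / (tp + fn)    # TP rate (recall)
    false_alarm_rate = fp / (fp + tn)  # FP rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * detection_rate / (precision + detection_rate)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
        "f_measure": f_measure,
        "mcc": mcc,
        "kappa": kappa,
    }


# Generic attack is the positive class.
scores = evaluate(tp=18871 - 88, fn=88, fp=59, tn=18954 - 59)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the accuracy comes out at about 99.61%, the detection rate at about 99.5%, and the false alarm rate at about 0.003.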

Results
The 1-Nearest Neighbor model for detecting generic attacks using the Euclidean distance function correctly classified 18,773 of 18,871 generic attacks and 18,869 of 18,954 normal traffic instances (Table 1). From Table 2, the 1-Nearest Neighbor model with Manhattan distance falsely classified 88 generic attacks as normal traffic and just 59 normal traffic instances as generic attacks. The values of the derived performance measures are shown in Table 3. The comparison of the performance scores reveals that the 1-Nearest Neighbor model developed via Manhattan distance is better for detecting generic attacks in the presence of normal network traffic: it has a higher overall accuracy, a lower false alarm rate, and a higher MCC value, among others, than the Euclidean distance 1-Nearest Neighbor model.
The 3-Nearest Neighbor model for detecting generic attacks using the Euclidean distance function correctly classified 18,763 of 18,871 generic attacks and 18,881 of 18,954 normal traffic instances (Table 4). From Table 4, the 3-Nearest Neighbor model with Euclidean distance falsely classified 106 generic attacks as normal traffic, while 73 normal traffic instances were falsely classified as generic attacks. The 3-Nearest Neighbor model using the Manhattan distance function correctly classified 18,765 of 18,871 generic attacks and 18,906 of 18,954 normal traffic instances. Table 5 shows the confusion matrix for 3-Nearest Neighbor (Manhattan distance). From Table 5, the 3-Nearest Neighbor model with Manhattan distance falsely classified 106 generic attacks as normal traffic and just 48 normal traffic instances as generic attacks. The values of the performance measures derived from the confusion matrix are shown in Table 6. The comparison of the performance scores of the 3-Nearest Neighbor models mirrors the results of the 1-Nearest Neighbor classification models, where the Manhattan distance model had the better performance.
The 5-Nearest Neighbor model for detecting generic attacks using the Euclidean distance function correctly classified 18,762 of 18,871 generic attacks and 18,877 of 18,954 normal traffic instances (Table 7). From Table 7, the 5-Nearest Neighbor model with Euclidean distance falsely classified 109 generic attacks as normal traffic, and 77 normal traffic instances were falsely classified as generic attacks. The 5-Nearest Neighbor model using the Manhattan distance function correctly classified 18,769 of 18,871 generic attacks and 18,899 of 18,954 normal traffic instances. Table 8 shows the confusion matrix for 5-Nearest Neighbor (Manhattan distance). From Table 8, the 5-Nearest Neighbor model with Manhattan distance falsely classified 102 generic attacks as normal traffic and 55 normal traffic instances as generic attacks. The values of the performance measures derived from the confusion matrix are shown in Table 9. The comparison of the performance scores of the 5-Nearest Neighbor models mirrors that of the 3-Nearest Neighbor classification models, where the Manhattan distance model had the better performance.
The 7-Nearest Neighbor model for detecting generic attacks using the Euclidean distance function correctly classified 18,761 of 18,871 generic attacks and 18,878 of 18,954 normal traffic instances (Table 10). From Table 10, the 7-Nearest Neighbor model with Euclidean distance falsely classified 110 generic attacks as normal traffic, and 76 normal traffic instances were falsely classified as generic attacks. The 7-Nearest Neighbor model using the Manhattan distance function correctly classified 18,770 of 18,871 generic attacks and 18,898 of 18,954 normal traffic instances. Table 11 shows the confusion matrix for 7-Nearest Neighbor (Manhattan distance). From Table 11, the 7-Nearest Neighbor model with Manhattan distance falsely classified 101 generic attacks as normal traffic and 56 normal traffic instances as generic attacks. The values of the performance measures derived from the confusion matrix are contained in Table 12. The 7-Nearest Neighbor (Manhattan distance) classification model performed better than its Euclidean distance counterpart.
The 9-Nearest Neighbor model for detecting generic attacks using the Euclidean distance function correctly classified 18,760 of 18,871 generic attacks and 18,873 of 18,954 normal traffic instances (Table 13). The 9-Nearest Neighbor model with Euclidean distance falsely classified 111 generic attacks as normal traffic, and 81 normal traffic instances were falsely classified as generic attacks. On the other hand, the 9-Nearest Neighbor model using the Manhattan distance function correctly classified 18,774 of 18,871 generic attacks and 18,898 of 18,954 normal traffic instances. Table 14 shows the confusion matrix for 9-Nearest Neighbor (Manhattan distance). The comparison of the performance scores of the 9-Nearest Neighbor models reveals that the model developed via Manhattan distance is better for detecting generic attacks.

Discussion
This study aims to develop an optimal instance-based machine learning model capable of detecting generic attacks on block ciphers. The implementation of the proposed experimental framework led to the development of ten (10) instance-based classification models. Five (5) values of k were set (i.e., 1, 3, 5, 7, and 9) and two distance functions were implemented (i.e., Euclidean and Manhattan distance functions). The models were developed using 10-fold cross-validation and their classification performances were reported via a confusion matrix. Other performance metric values were calculated from the confusion matrix values.
The better instance-based classification model (based on accuracy, detection rate, and false alarm rate) for each k value was selected and tabulated (Table 16). From the comparative results, all nearest neighbors classification models using the Manhattan distance function are better than their Euclidean distance counterparts across all k values for detecting generic attacks and correctly identifying normal network packets. This difference can plausibly be attributed to how the two distance functions calculate the distance between two data points: Manhattan distance sums the absolute differences across features, whereas Euclidean distance squares the differences, so a large difference in a single feature weighs more heavily.
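The effect of the distance function on which neighbor counts as "nearest" can be seen in a small made-up example. The points below are assumptions chosen only to show that the two functions can rank the same neighbors differently; they are not drawn from the UNSW-NB15 data.

```python
def manhattan(a, b):
    """Sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the sum of squared feature differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


query = (0.0, 0.0, 0.0, 0.0)
a = (3.0, 0.0, 0.0, 0.0)  # differs a lot in a single feature
b = (1.0, 1.0, 1.0, 1.0)  # differs a little in every feature

print("Manhattan:", manhattan(query, a), manhattan(query, b))  # 3.0 vs 4.0, a is nearer
print("Euclidean:", euclidean(query, a), euclidean(query, b))  # 3.0 vs 2.0, b is nearer
```

Because the ranking of neighbors changes, the majority vote among the k nearest instances, and therefore the predicted class, can change as well.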
All nearest neighbors classification models using the Manhattan distance function shared the same MCC value of 0.992. Similarly, they shared the same AUC value of 0.99, except for the 9-nearest neighbors (Manhattan distance) classification model, which had an AUC value of 1.0.
The nearest neighbors (Manhattan distance) classification models were all able to classify between normal packets and generic attacks at the lowest accuracy of 99.5849% and the highest accuracy of 99.6114%. These models detected generic attacks at 99.5% except for the 3-nearest neighbors (Manhattan distance) classification model with a detection rate of 99.4%.
Considering the false alarm rate, the nearest neighbors classification models using the Manhattan distance function shared the same value (i.e., 0.003). However, the 3-nearest neighbors (Manhattan distance) model had the lowest number of false positives. This model misclassified only 48 normal traffic instances as generic attacks (Table 5), at the expense of more false negatives (i.e., 106 generic attacks misclassified as normal packets).
Across all Manhattan distance nearest neighbors models for classifying between normal network packets and generic attacks, the 1-nearest neighbor classification model scored the highest accuracy and the highest number of detected generic attacks. However, the 3-nearest neighbors classification model correctly identified more normal packets and raised the fewest false alarms. In a real-time application, this can be interpreted to mean that for every host on a network under a generic attack, the next host is more likely to be under the same attack. Also, for every 3 closely distanced hosts transmitting normal packets, the fourth host is more likely to transmit normal packets.
In comparison to existing methods, the instance-based classification method of this study performed better than the reviewed methods. This study's instance-based models detected generic attacks better than the 95.25% detection rate of the majority vote ensemble deep learning method published by Thaseen et al. (2020a). They also detected generic attacks on block ciphers better than all three stacked ensemble methods of Olasehinde (2020), which had 96.89%, 97.8%, and 98.08% accuracies. The novel rule-based method (Kumar et al., 2020), with 65.21% overall accuracy and a 2.01% false alarm rate, was outperformed by all implemented instance-based classification models for detecting generic attacks. Given that the instance-based method of this study is specifically trained to distinguish between normal packets and generic attacks, its strong performance is plausible.

Conclusion and future works
In conclusion, the aim of this study, which was to introduce, implement, and evaluate an instance-based method for detecting generic attacks on block ciphers, was fulfilled.
The various nearest neighbors classification methods that were developed and evaluated showed strong performance in detecting generic attacks on block ciphers. The overall accuracies of the methods implemented in this study were over 99%, with generic attacks detected at a rate of at least 99.4%. The Manhattan distance nearest neighbors models for detecting generic attacks on block ciphers maintained a low false alarm rate of 0.003.
In comparison with existing methods, the proposed instance-based methods of this study performed better than the reviewed multiclassification methods, as the study's method is customized to detect generic attacks.
This study did not consider feature selection to investigate whether fewer variables can also lead to such high detection performance for generic attacks. Conducting such empirical research to ascertain whether feature selection will make or mar the performance of instance-based generic attack detection is the most prominent future work of this study.

Conflict of interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.