PERFORMANCE EVALUATION OF DEEP LEARNING TECHNIQUES IN THE DETECTION OF IOT MALWARE

Abstract: Internet of Things (IoT) equipment is rapidly being adopted across a variety of industries and for a variety of purposes (for example, sensing and collecting data from the environment in both public and military settings). Because of their expanding involvement in a wide range of applications and their rising computational and processing capabilities, IoT devices are a viable attack target for malware tailored to infect them. This study investigates the potential of detecting IoT malware using different deep learning techniques: the classic feedforward neural network (FNN), convolutional neural networks (CNN), long short-term memory (LSTM), and recurrent neural networks (RNN). The proposed method analyses the execution operation-code (OpCode) sequences of IoT applications using modern natural language processing (NLP) methods. The current work utilized an IoT application dataset with 500 malware samples (collected from the IoTPoT dataset) and 500 goodware samples to train the proposed algorithms. The trained models were tested against 2971 fresh IoT malware and goodware samples. The samples were input into the deep learning models, and performance metrics were obtained. The results demonstrate that the RNN model had the best accuracy (99.19%) in detecting fresh malware samples. When the results were compared by the time required for training, the CNN model showed that it could achieve high accuracy (98.05%) with less training time. A comparison with various deep learning classifiers demonstrates that the RNN and CNN techniques produce the best results.


I. INTRODUCTION
The Internet of Things (IoT) is, in essence, a networked collection of devices that can sense, process, and transmit data [1]-[4]. There are several uses for the Internet of Things, including medicine, mobility, intelligent buildings, urban governance, and agriculture [5]-[9]. By 2025, it is anticipated that more than 63 million Internet of Things devices will be on the market [10]. According to Ericsson's projections, roughly 3.5 billion cellular Internet of Things connections will be established by 2023 [11]. Because of the extensive usage and vital role of Internet of Things networks, cybercriminals have devised harmful and complex cyberattacks targeting IoT end nodes to exploit IoT nodes and infrastructure [12]-[14]. Mirai was among the first malware strains to target the Internet of Things at scale. Mirai orchestrated a botnet of compromised Internet of Things devices to launch a Distributed Denial of Service (DDoS) cyberattack [15]. The release of Mirai's source code demonstrated how easy it is to develop harmful IoT-based malware cyberattacks.
For example, the "BrickerBot" malware builds on the Mirai source code: it joins the infected system to a botnet, erases the firmware, and completely reboots the device after infection. Different intelligent techniques have been proposed to detect cyberattacks; one of them is deep learning. Deep Learning (DL) is a technique that has been widely used to improve the accuracy and robustness of cybersecurity detection and defense systems [16, 17]. DL's capacity to learn complex patterns inside complex intrusions and its resistance against unanticipated hostile attempts make it a potential solution for identifying and safeguarding IoT networks from cyberattacks. DL approaches are frequently used in the fields of security, privacy, and forensics [14], [18]-[22]. The current work aims to investigate and report the results of applying various deep learning models to a gathered dataset of IoT software products, using Operational Codes (OpCodes) extracted from malicious and benign samples with the proposed technique. The rest of this paper is arranged as follows: Section II evaluates pertinent literature, Section III details the algorithms adopted, Section IV discusses the methodology used, Section V analyzes the experimental data, Section VI examines the findings, and Section VII concludes this article.

II. RELATED WORKS
This section describes the methods currently in use that identify malware using OpCode sequences.
Azmoodeh et al. and Naveen et al. [23, 24] suggested converting OpCodes into vector spaces and using a deep Eigenspace technique to distinguish between malicious and benign files. Jeon J. et al. [26] proposed a hybrid malware detection model called "HyMalD." This model conducts dynamic and static analyses concurrently to discover disguised malware, which static analysis alone cannot detect. It first extracts static features of the opcode sequence using a predefined dataset and then dynamically collects the API call sequence. The retrieved features are trained using the Bi-LSTM and SPP-Net models. This model reached an accuracy of 92.5%.
Jahromi et al. [27] constructed a revised "Two-hidden-layered Extreme Learning Machine (TELM)." It relies on malware sequence components, such as OpCode sequences, for malware detection. This model avoids backpropagation when training the neural network by using a partly connected network between the input and the first hidden layer; in the second layer, these are aggregated into a fully connected network. Finally, an ensemble is employed to raise the system's reliability and accuracy in detecting malware threats. Compared to stacked LSTM and CNN, the suggested technique expedites the learning and detection phases of malware detection and achieves an accuracy of 99.65%.
Radhakrishnan et al. [28] proposed an RNN-based model; the proposed model achieves an accuracy of 99.98%.
All of the proposed approaches still need improvement to detect malware with very high detection accuracy while keeping model complexity to a minimum. According to the relevant research shown above, a lightweight solution is highly desirable in an IoT environment, which can be achieved by fine-tuning the hyperparameters of the deep learning algorithms. This study aimed to create a deep learning model that efficiently detects IoT malware with as little complexity as possible.

III. DEEP LEARNING ALGORITHMS
Deep learning algorithms have gained popularity in malware detection in recent years. In this section, some of the widely used algorithms are introduced.

A. Convolutional Neural Networks (CNN)
CNN is a deep learning method well known for image analytics, and it has lately been widely employed for textual and sentiment analysis as well. It is also recognized as a regularized variant of the multilayer perceptron, which resists overfitting and mimics the organization of biological neurons in the human brain.
Like other deep learning approaches, a CNN comprises an input layer, an output layer, and numerous hidden layers. There are mainly two types of hidden layers in a convolutional neural network: a) the convolution layer, which is responsible for applying filters to the data and learning features from the filter output; and b) the max-pooling layer, where the pooling operation takes place, selecting the largest elements from the region of the feature map covered by the filter. The output after the max-pooling layer is therefore a feature map containing the most prominent aspects of the prior feature map [30].
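To illustrate the pooling operation described above, the following is a minimal sketch of 1-D max pooling in plain Python; the feature-map values and pool size are hypothetical, chosen only for illustration.

```python
# Minimal sketch of 1-D max pooling over a feature map, as applied after a
# convolution layer. The feature-map values below are hypothetical.

def max_pool_1d(feature_map, pool_size):
    """Select the largest value from each region the pool window covers."""
    return [
        max(feature_map[i:i + pool_size])
        for i in range(0, len(feature_map) - pool_size + 1, pool_size)
    ]

pooled = max_pool_1d([1, 3, 2, 9, 4, 7], pool_size=2)
print(pooled)  # -> [3, 9, 7]
```

Note how only the most prominent value of each covered region survives, which is what shrinks the feature map while keeping its strongest activations.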

B. Simple Recurrent Neural Network (RNN)
As the name suggests, an RNN is a neural network that repeatedly uses the prior output as an input. This nature makes RNNs well suited for sequence prediction problems.
The most notable characteristic of an RNN is the hidden state, which stores details about each sequence. The parameters used to forecast the series are shared: all hidden layers performing the same task use the same settings [31].
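The recurrence that maintains the hidden state can be sketched with scalar toy weights (all values here are hypothetical, for illustration only; a real RNN uses trained weight matrices):

```python
import math

# Sketch of the recurrent update behind the hidden state: the same weights
# (w_x, w_h) are reused at every step of the sequence, and the prior output
# h is fed back in as an input. Scalar weights and inputs are hypothetical.

def rnn_steps(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Return the hidden state after feeding the whole sequence."""
    h = 0.0  # initial hidden state
    for x in inputs:
        # prior output h re-enters as an input at each step
        h = math.tanh(w_x * x + w_h * h + b)
    return h

h = rnn_steps([1.0, 0.5, -0.2])
print(round(h, 4))
```

Because the same `w_x`, `w_h`, and `b` are applied at every time step, the parameter count is independent of the sequence length, which is the weight-sharing property described above.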
C. Long Short-Term Memory (LSTM)
LSTM is a special form of RNN designed to overcome the difficulty of learning long-term dependencies. It has been applied mainly to text processing, where it achieves better results [31].

D. Feedforward Neural Network (FNN)
Feedforward neural networks are artificial neural networks without cyclical connections between the units. They were the first type of artificial neural network to be created and are simpler than RNNs. They are known as "feedforward" networks because information only moves forward, with no loops, from the input nodes through the hidden nodes to the output nodes, in that order [32].
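A minimal forward pass through such a network can be sketched as follows; the layer sizes, weights, and inputs are hypothetical toy values, not trained parameters.

```python
import math

# Sketch of a feedforward pass: information moves only forward, from the
# input layer through one hidden layer to a single output node.

def dense(inputs, weights, bias):
    """One fully connected layer: a weighted sum per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = [0.2, 0.7]                                             # input nodes
hidden = relu(dense(x, [[0.5, -0.3], [0.8, 0.1]], [0.0, 0.1]))
output = sigmoid(dense(hidden, [[1.0, -1.0]], [0.0])[0])   # output node
print(round(output, 4))
```

There is no feedback path anywhere: each layer's output depends only on the layer before it, which is exactly the acyclic structure the definition above describes.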
IV. METHODOLOGY
This section discusses the methodology used to evaluate the suggested deep learning models. As shown in Fig. 1, the process consists of two stages (training and testing), each consisting of several sub-phases, mainly data gathering, OpCode extraction, pre-processing, and training and testing the deep learning models. The only difference between the two stages is the feature selection phase in the training stage, which requires selecting the features that contribute the most to classifying the files as benign or malware. The following is a brief description of the operations done in each phase:
• Data gathering and OpCode extraction: this phase involves collecting benign and malware IoT executables; using reverse engineering tools, the operational codes are extracted and saved in text files for the training and testing datasets.
• Feature selection: in the training stage, the features that contribute the most to accurately classifying the executable files as benign and malicious were selected using the information gain technique.
• Pre-processing: text filtering and tokenizing techniques are used to convert the training and testing datasets into a format understood by the deep learning models.
• Training and testing: these phases involve feeding the data to the deep learning models and evaluating their performance in classifying the files as benign or malware using evaluation metrics.

A. Dataset
The latest DL approaches for identifying IoT malware were evaluated using an IoT dataset consisting of 3971 samples in text file format. This dataset has 724 samples of goodware and 3247 samples of malware. The malware files were obtained from the IoTPoT dataset, which contains malware executable files used to extract the malicious OpCode; it was provided by a cybersecurity research team led by Prof. Katsunari Yoshioka at Yokohama National University. The benign OpCode was extracted from system files of the Ubuntu Linux system residing in the paths /usr/bin, /sbin, and /bin. The most recent version of the IoTPoT dataset [33, 34] can be found at https://sec.ynu.codes/iot.

B. Dataset Preparation
A text corpus was constructed using simple Python code to showcase various feature engineering and representation approaches, resulting in a data frame with two columns: one holds a succession of operational codes such as ADD, MOV, SUB, and PUSH; the other specifies the sample category, "goodware" or "malware." The second step is to select the OpCodes that contribute the most to correctly classifying a file as malicious or benign. The information gain technique was used as the feature selection method for this step: Eq. (1) gives the information gain of each OpCode, and the OpCodes whose score exceeds a threshold (α > 0.30) were selected. The third step involved text filtering and the tokenizer technique, which filters the text of any unnecessary special characters and assigns each OpCode a numeric value. These numeric values are aggregated into a list representing the OpCode sequence; finally, the padding technique makes these lists the same size by post-padding with zeroes. Fig. 2 shows the order of the pre-processing steps.
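The filtering, tokenizing, and post-padding steps described above can be sketched in plain Python. The sample OpCode strings, the vocabulary mapping, and the sequence length are hypothetical; the actual work may have used library utilities for the same purpose.

```python
import re

# Sketch of the described pre-processing: filter special characters, map
# each OpCode to a numeric value, then post-pad with zeroes so every
# sequence has the same length. Sample sequences are hypothetical.

def tokenize(samples):
    """Assign each distinct OpCode an integer id (0 is reserved for padding)."""
    vocab = {}
    sequences = []
    for text in samples:
        opcodes = re.findall(r"[A-Za-z]+", text)  # drop special characters
        seq = []
        for op in opcodes:
            if op not in vocab:
                vocab[op] = len(vocab) + 1
            seq.append(vocab[op])
        sequences.append(seq)
    return sequences, vocab

def pad_sequences(sequences, length):
    """Post-pad every sequence with zeroes up to a fixed length."""
    return [seq[:length] + [0] * (length - len(seq)) for seq in sequences]

seqs, vocab = tokenize(["MOV ADD, SUB", "PUSH MOV"])
padded = pad_sequences(seqs, length=4)
print(padded)  # -> [[1, 2, 3, 0], [4, 1, 0, 0]]
```

Reserving 0 for padding is important: the models can then learn to treat trailing zeroes as "no OpCode" rather than as a real instruction.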
Figure 2: Pre-processing steps

After shaping the data into a format understood by the deep learning models, the data was fed to the four deep learning models to train and evaluate them individually using the training and testing datasets.

V. EXPERIMENT RESULTS
The proposed models are evaluated with different settings to achieve the best possible accuracy. The models were first tested on how the number of OpCodes in each text file affects accuracy and training time. After selecting a suitable number of OpCodes, the models were evaluated against the batch size, a hyperparameter specifying how many samples must be processed before the internal model parameters are updated. Fig. 3 (c), Fig. 3 (d), and Table II show the accuracy and training time provided by the models. Some models performed excellently at one batch size and poorly at another. Based on the results above, batch sizes of 32 and 64 are so close in performance that a batch size of 32 was chosen, as it provided the best possible accuracy with acceptable training times. Typically, the following metrics are used to assess deep learning's efficiency in detecting attacks:
• True Positive (TP): signifies that a malicious program has been successfully detected as harmful.
• True Negative (TN): signifies that a benign program was accurately identified as non-malicious.
• False Positive (FP): implies that a benign program was incorrectly identified as harmful.
• False Negative (FN): indicates that malware was not detected and was classified as benign.
Additionally, the following measures are used to determine the performance of a deep learning model:
1) Accuracy: the proportion of samples that are correctly classified. Accuracy = (TP + TN) / (TP + TN + FP + FN)
2) Precision: the proportion of samples predicted as malware that are actually malware. Precision = TP / (TP + FP)
3) Recall: the proportion of malware samples that were correctly classified. Recall = TP / (TP + FN)
4) F1 score: the harmonic mean of precision and recall. F1 = 2 × (Precision × Recall) / (Precision + Recall)
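The four measures above can be computed directly from the confusion-matrix counts; the counts in the example below are hypothetical, not results from the paper.

```python
# Sketch of the evaluation metrics defined above, computed from the four
# confusion-matrix counts (TP, TN, FP, FN). Example counts are hypothetical.

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=90, tn=85, fp=10, fn=15)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Note that accuracy alone can be misleading when the classes are imbalanced (as in the 724 goodware vs. 3247 malware dataset used here), which is why precision, recall, and F1 are reported as well.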
A. Processing Environment
All testing was conducted on a PC with 8 GB of memory and a Core i7 (1.8 GHz) processor. The source code was written in Python 3.9.5, and Keras version 2.3 was used as the DL library.

B. Deep learning models final settings
Table III shows the final settings of the deep learning models, which involve the defined layers, the number of epochs used for training the models, the activation functions, the optimizer, and the batch size.

C. Accuracy
The accuracy measure was computed using Eq. (2) to evaluate the accuracy of the DL models in distinguishing malware from benign samples, and the recall of all models was evaluated using Eq. (4), as shown in Fig. 6. Initially, each model was evaluated with a batch size of 64, 128, or 512 records, which provided appropriate timing but poor performance for all models, as opposed to a smaller batch size, such as 32 records, which produces better results but takes longer to process. As shown in Fig. 4 and Table IV, the RNN model reached 99% accuracy in five epochs, and the training took only 356 seconds (about 5.93 minutes). Compared to the proposed RNN model, the proposed CNN was slightly less accurate, with an accuracy rate of 98%, but it took less training time, about 13 seconds for five epochs. For both models (RNN and CNN), the loss for training and validation is shown in Table IV, which indicates that the models can handle this type of data without overfitting.
On the other hand, the LSTM and FNN models show much lower accuracy than the previous models, at 97% and 88%, respectively. Additionally, Table IV shows an enormous difference between testing and training loss for the FNN model, which indicates overfitting; however, this model was the fastest to train, taking only 2 seconds.
Lastly, the LSTM model gave an accuracy of 97%, but it took the longest training time, about 6.13 minutes, and did not face any overfitting. The paper also evaluates the models using other metrics. Figures 5-7 show the models' performance based on precision, recall, and the F1 measurement. All models achieve the same precision of 92%, indicating that the models misclassified different samples yet reached the same precision. In terms of recall, the LSTM and CNN models achieve the same recall of 90%, while the RNN and FNN models show 91% and 82%, respectively. Lastly, the F1 measurement was calculated: the RNN model showed the highest result at 92%, followed by the CNN and LSTM models at 91%, and finally the FNN model at 87%.
This is an open access article under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

VII. CONCLUSION
With the advent of the IoT and the widespread usage of IoT networks to deliver a wide range of high-quality electronic services, attackers are attempting to exploit IoT networks and devices to damage their performance. Furthermore, they endanger the privacy of information sensed, processed, and transmitted through these networks, and there is an ongoing battle between malware writers and security groups. This article applied four alternative deep learning models to a collection of malicious and benign IoT samples to analyze and evaluate state-of-the-art deep learning algorithms. Several tests were conducted to examine the dataset, the sample distribution, and the score of every attribute for classification tasks. Overall, the RNN and CNN models provided the highest accuracy, at 99.19% and 98.05%, respectively. This result could be improved by using an ensemble of the two models while maintaining a low training time through efficient tuning of the models' hyperparameters.
www.ijict.edu.iq
Iraqi Journal of Information and Communications Technology (IJICT), Vol. 6, Issue 3, December 2023, ISSN: 2222-758X, e-ISSN: 2789-7362

Azmoodeh et al. suggest feeding the vector space values directly to the Eigenspace model without change; Naveen et al., on the other hand, created N-gram opcode sequences and fed them to the model. Azmoodeh et al. and Naveen et al. use accuracy metrics to evaluate their models and achieve an accuracy of 99.68% and 98.37%, respectively. Hamad S. et al. [25] present a simple cross-architecture antimalware solution for IoT devices. The proposed approach examines sequence representations of the executable file's operation codes (OpCodes) by utilizing Bidirectional Encoder Representations from Transformers (BERT), an advanced natural language processing (NLP) method, for embedding. The retrieved sentence embeddings from BERT are input into a customized hybrid model that incorporates convolutional neural networks (CNN), bidirectional long short-term memory (Bi-LSTM), and a local attention mechanism (LocAtt) in one model to capture contextual information and long-term relationships between OpCode sequences and detect malware accurately. The accuracy achieved by BERTDeep-Ware reached 99.93%.
Radhakrishnan et al. [28] proposed a technique that uses a recurrent neural network (RNN) model to analyze ARM-based OpCode sequences. The authors used a small dataset composed of 271 files of benign IoT OpCodes and 282 files of malicious IoT OpCodes to train their model; they then used 104 unknown files of OpCodes to test it, achieving an accuracy of 99.08%. Vasan et al. [29] discuss a robust cross-architecture IoT malware threat detection system utilizing advanced ensemble learning and propose a malware threat-hunting methodology based on advanced ensemble learning (MTHAEL). Their MTHAEL model, which improves on existing methods for detecting IoT malware, uses a stacking ensemble of heterogeneous feature selection techniques. The model involves an ensemble of RNN and CNN deep learning algorithms trained on a dataset composed of 15,482 files of malware OpCodes and 5,655 files of benign OpCodes.

Figure 1: (a) shows the steps conducted when training the models; (b) shows the steps conducted when testing the models.

Eq. (1) shows the information gain formula, where f is an OpCode feature, c is the number of available classes (malicious and non-malicious), Dv is the OpCode flow in which the feature f may be found, and Wi is the percentage of Dv in the training dataset that belongs to class i. Sorting the IG values into decreasing order allowed us to determine the characteristics most important for classification.

Fig. 4 shows the models' performance. As shown, the RNN model outperforms the other models with an accuracy of 99%. Next in line was the CNN model, with an accuracy of 98%, while the LSTM and FNN models gave the lowest accuracy, at 97% and 88%, respectively.

Figure 4: Accuracy measured by the DL models.
Figure 5: The figure shows that the LSTM and CNN models gave the same recall of 90%, while the RNN model gave 91% and the FNN model 82%.

TABLE I: A COMPARISON BETWEEN THE OPCODE NUMBER AND THE PROVIDED ACCURACY/TRAINING TIME

Based on the obtained results, an OpCode number equal to 302 was chosen, as it gives acceptable accuracy and training times.

TABLE II: A COMPARISON BETWEEN THE BATCH SIZE AND THE PROVIDED ACCURACY/TRAINING TIME

TABLE III: FINAL SETTINGS FOR THE DL MODELS

TABLE IV: TRAINING AND TESTING RESULTS FOR THE IMPLEMENTED MODELS