Abstract
In recent years, the widespread use of Internet of Things (IoT) applications has contributed to the development of smart cities, but as smart city networks grow, so does the risk of cybersecurity threats and attacks. Despite the wide deployment of security mechanisms such as encryption and firewalls, attacks on IoT networks cannot be entirely prevented. To address this problem, machine learning has been used as an effective tool for detecting attacks, typically by applying supervised classification algorithms to a dataset. This study reviews some of the common datasets for detecting intrusions into networks in general and IoT networks in particular, the most important of which are KDD Cup '99, Kyoto 2006+, NSL-KDD, UNSW-NB15, CIC-IDS 2017, and CSE-CIC-IDS 2018. It compares them by number of features, presence of recent attacks, total number of records, and number of attack categories. Finally, the paper reviews the most important previous studies that applied machine learning algorithms to the dataset under study and summarizes their performance indicators, including accuracy and training time.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright (c) 2024 Sebha University Conference Proceedings