Optimizing cloud-based intrusion detection systems through hybrid data sampling and feature selection for enhanced anomaly detection

Authors

  • Sadargari Viharika
  • N. Alangudi Balaji

DOI:

https://doi.org/10.6977/IJoSI.202504_9(2).0007

Keywords:

Anomaly Detection, Data Optimization, Intrusion Detection System, Machine Learning

Abstract

To enhance detection accuracy in network intrusion scenarios, this study proposes an optimized intrusion detection system (IDS) framework that integrates advanced data sampling, feature selection, and anomaly detection techniques. Leveraging random forest (RF) and genetic algorithm, the framework optimizes sampling ratios and identifies critical features. In contrast, the isolation forest algorithm detects and excludes outliers, refining dataset quality and classification performance. Evaluated on the UNSW-NB15 dataset, comprising over 2.5 million records and 42 diverse features, the proposed framework demonstrates significant improvements in anomaly detection, including reduced false alarm rates and enhanced identification of rare threats, such as shellcode, worms, and backdoors. Experimental results reveal that the RF-based model achieves an F1 score of 91.8% and an area under the curve (AUC) of 96%, outperforming traditional machine learning models and standalone RF classifiers. The integration of extreme gradient boosting (XGB) and its optimized variant, XGBGA, further enhances the framework, with XGBGA achieving the highest performance metrics, including an F1 score of 92.8% and an AUC of 97%. These findings underscore the importance of data optimization strategies in improving the accuracy and reliability of IDSs, particularly in handling imbalanced datasets and diverse network traffic. Future work will focus on real-time processing capabilities to handle streaming data and expanding the framework’s applicability to domains such as fraud detection and cybersecurity, where precise anomaly detection is essential.

Downloads

Published

2025-04-08