This project performs basic anomaly-based intrusion detection using binary classification. A Naive Bayes classifier, implemented in Python, labels each network record as either Attack or Normal.
The dataset used is UNSW-NB15; both the training and testing sets are used. This dataset has 45 features and more than 175,000 rows.
The dependencies required for the project are Pandas, NumPy, Matplotlib, and scikit-learn.
The workflow of the project consists of the following stages -
1. Loading and exploring the dataset - The dataset is explored before any manipulation is performed on it. This makes later transformations more precise and makes it easier to understand how each action changes the data.
The dataset is then split for further processing and for training the model.
2. Preprocessing - This is done to clean the data and put it in a form the classifier can use.
Here, the following steps are carried out -
1. Removal of NaN values.
2. Removal of the attack_cat column, since the project is concerned only with binary classification.
3. Normalization - Min-Max normalization is carried out to bring the values between 0 and 1.
4. Encoding - One-hot encoding is carried out to convert categorical values to numeric values.
3. Classification - A Naive Bayes model is used as the classifier. The model detects and classifies input data into the Attack or Normal category.
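The preprocessing and classification stages above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in se_ids.py: the function names `preprocess` and `train_and_predict` are hypothetical, the categorical columns are assumed to be `proto`, `service`, and `state`, the binary target is assumed to be a `label` column (0 for normal, 1 for attack), and `GaussianNB` is assumed as the Naive Bayes variant since the project does not name one.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler

# Assumed categorical columns of UNSW-NB15; adjust if your copy differs.
CATEGORICAL_COLS = ["proto", "service", "state"]


def preprocess(train, test, categorical_cols=CATEGORICAL_COLS):
    """Apply the four preprocessing steps; return X_train, y_train, X_test, y_test."""
    # Steps 1 and 2: drop rows with NaN values and the multi-class attack_cat column.
    train = train.dropna().drop(columns=["attack_cat"], errors="ignore")
    test = test.dropna().drop(columns=["attack_cat"], errors="ignore")
    # Assumption: the binary target is stored in a column named "label".
    y_train, y_test = train.pop("label"), test.pop("label")

    # Step 3: min-max scale the numeric columns, fitting on the training data only.
    numeric_cols = [c for c in train.columns if c not in categorical_cols]
    scaler = MinMaxScaler()
    num_train = pd.DataFrame(scaler.fit_transform(train[numeric_cols]),
                             columns=numeric_cols, index=train.index)
    num_test = pd.DataFrame(scaler.transform(test[numeric_cols]),
                            columns=numeric_cols, index=test.index)

    # Step 4: one-hot encode, then align test columns with those seen in training.
    cat_train = pd.get_dummies(train[categorical_cols], dtype=float)
    cat_test = pd.get_dummies(test[categorical_cols], dtype=float)
    cat_test = cat_test.reindex(columns=cat_train.columns, fill_value=0.0)

    X_train = pd.concat([num_train, cat_train], axis=1)
    X_test = pd.concat([num_test, cat_test], axis=1)
    return X_train, y_train, X_test, y_test


def train_and_predict(X_train, y_train, X_test):
    """Fit a Gaussian Naive Bayes model and return one predicted label per test row."""
    model = GaussianNB()
    model.fit(X_train, y_train)
    return model.predict(X_test)
```

Because the scaler is fitted on the training set only, scaled test values can fall slightly outside the 0 to 1 range. Categories that appear only in the test set are dropped by the column alignment, so such rows get all zeros for that feature.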
The Accuracy and Classification Report are as follows -
A classification report for the above method was also generated to provide more insight into the performance of the model.
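An accuracy score and classification report of this kind can be produced with scikit-learn along the following lines. This is an illustrative sketch: the helper name `evaluate` is hypothetical, and the mapping of 0 to Normal and 1 to Attack is an assumption about how the labels are encoded.

```python
from sklearn.metrics import accuracy_score, classification_report


def evaluate(y_true, y_pred):
    """Return overall accuracy and a per-class precision/recall/F1 report."""
    accuracy = accuracy_score(y_true, y_pred)
    report = classification_report(
        y_true,
        y_pred,
        labels=[0, 1],                      # assumed encoding: 0 = Normal, 1 = Attack
        target_names=["Normal", "Attack"],
        zero_division=0,                    # avoid warnings if a class is never predicted
    )
    return accuracy, report
```

Calling `evaluate(y_test, predictions)` and printing the two return values gives the accuracy as a fraction and the report as a formatted text table.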
Important files in the project -
1. se_ids.py - Contains the source code for the project.
2. UNSW_NB15_training-set.csv - Contains the training dataset used in the project.
3. UNSW_NB15_testing-set.csv - Contains the testing dataset used in the project.
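For the loading and exploring stage, the two CSV files listed above could be read and summarised with pandas roughly as shown here. The helper names `load_datasets` and `summarize` are hypothetical, the files are assumed to sit in the working directory, and the `label` column name is an assumption.

```python
import pandas as pd


def load_datasets(train_path="UNSW_NB15_training-set.csv",
                  test_path="UNSW_NB15_testing-set.csv"):
    """Read the training and testing CSV files into two DataFrames."""
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)
    return train, test


def summarize(df):
    """Collect a few basic facts that are useful before preprocessing."""
    return {
        "shape": df.shape,                                   # (rows, columns)
        "missing": int(df.isna().sum().sum()),               # total NaN cells
        "label_counts": df["label"].value_counts().to_dict(),  # class balance
    }
```

Checking the shape, the number of missing values, and the class balance first makes it easier to confirm that later preprocessing steps behave as intended.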
Output of the above model -