Improving Network Security with Artificial Intelligence

November 30, 2024

Network Traffic Type Prediction Project

This project focuses on enhancing network security through AI-powered solutions. By leveraging advanced machine learning techniques, the system detects anomalies, predicts threats, and recognizes attack patterns with greater accuracy and speed compared to traditional security systems. Unlike rule-based methods, which are reactive and resource-intensive, AI-driven security is proactive, adaptable, and scalable, offering precise detection with minimal false positives. The project aims to improve the overall efficiency of network monitoring, providing a more effective approach to safeguarding digital infrastructures.

Check out the website

https://minor-project-rupam.streamlit.app/

Project Repository

https://github.com/Trident09/net-sec-ai-MP

Colab Notebook

https://colab.research.google.com/drive/1ka2RMIRDwdXqTJlEDX71IxsBBulrA5Fu?usp=sharing

Dataset

https://www.unb.ca/cic/datasets/ids-2017.html

This project aims to predict traffic patterns based on network traffic data using machine learning. The project is organized into three main directories:

Dataset: Contains the CICIDS dataset used for model training and testing.
Frontend: Includes the Streamlit app for predicting traffic patterns, the trained model (trained_model-neuralnetwork.keras).
Training: Contains the scripts and notebooks used to train the model, including data preprocessing, feature engineering, model training, and evaluation (mainly in Google Colab and Jupyter notebooks).

Project Overview

The goal of this project is to predict traffic patterns (either “True” or “False”) based on the features of network traffic. The application provides the following:

Training Pipeline: A training pipeline that processes the dataset, trains a model, and saves the trained model for later predictions.
Prediction App: A user-friendly web application built with Streamlit that allows users to upload CSV files containing network traffic data and get predictions using the pre-trained model.

Folder Structure

traffic-prediction-project/
├── Dataset/
│   ├── CICIDS_2017_traffic_data.csv      # Network traffic dataset
│   ├── ...                               # Other data files
│
├── Frontend/
│   ├── streamlit-app.py                  # Streamlit app for predictions
│   ├── trained_model-neuralnetwork.keras # Pre-trained model file
│   ├── other model files                 # Random forest and standardisation model files
│   ├── ...                               # Other frontend assets
│
├── Training/
│   ├── Minor_minor_neuralnetwork.ipynb        # Google Colab/Jupyter notebook for training the model
│   ├── ...                               # Other training scripts

How the Project Works

1. Dataset Folder:

This folder contains the CICIDS dataset, which includes network traffic data used to train and test the model. The dataset contains various features such as packet length, flow duration, and flags, which are used to predict traffic anomalies or classify traffic.

2. Training Folder:

This folder contains the scripts and notebooks used for training the model. The data preprocessing steps, feature engineering, model training, and evaluation are implemented here. The trained model and scaler are saved to files (trained_model-neuralnetwork.keras), which is then used by the Streamlit frontend for predictions.
- You can run the Jupyter notebook (Network_minor_neuralnetwork.ipynb) to reproduce the training pipeline to train the model from scratch.

3. Frontend Folder:

This folder contains the Streamlit app (app.py), which is the interface where users can upload CSV files containing traffic data. The app uses the trained model (trained_model-neuralnetwork.keras) to predict labels and display prediction probabilities.
The app shows the uploaded data and the predictions in real-time, including the predicted labels and the probability for each class.

Technologies Used

Python: The primary programming language used for this project.
Pandas: For data manipulation and analysis.
NumPy: For numerical computations and array operations.
Scikit-learn: For machine learning models, preprocessing, and model evaluation.
Joblib: For saving and loading the trained model (if applicable).
Streamlit: For building the web application for predictions.
Google Colab/Jupyter Notebooks: For data preprocessing, feature engineering, and model training.
Matplotlib/Seaborn: For data visualization.
TensorFlow: For building and training the neural network model.
Keras: For defining and managing the neural network layers and training process.
Requests: For making HTTP requests if needed (e.g., for API calls).
Scikit-learn (train_test_split, LabelEncoder, StandardScaler): For splitting the data, label encoding, and standardizing features.
Confusion Matrix (from sklearn.metrics): For evaluating model performance.

Presentation

Improve-Network-Security-with-Artificial-Intelligence by Rupam Barui

How to Set Up the Project

1. Clone the Repository

Start by cloning the repository to your local machine:

git clone https://github.com/Trident09/net-sec-ai-MP.git

2. Set Up the Environment

It is recommended to use a virtual environment to install the required dependencies.

Navigate to the project directory:
```
cd net-sec-ai-MP
```
Create a virtual environment:
```
python -m venv venv
```
Activate the virtual environment:
- On Windows:
```
venv\Scripts\activate
```
- On macOS/Linux:
```
source venv/bin/activate
```

3. Install Dependencies

Install the required dependencies for the project in each folder according to need:

pip install -r requirements.txt

Alternatively, you can install the necessary libraries manually if requirements.txt is not provided:

pip install streamlit pandas numpy scikit-learn joblib

4. Train the Model (if not already trained)

If you haven’t yet trained the model, you can do so by following these steps:

Navigate to the Training folder.
Open the Jupyter notebook Network_minor_neuralnetwork.ipynb
The model will be trained on the CICIDS dataset, and the trained model (trained_model-neuralnetwork.keras) will be saved.

The trained files are saved in the Frontend folder for use in the Streamlit app.

5. Run the Streamlit App

Once everything is set up, you can start the Streamlit app for predictions:

streamlit run Frontend/streamlit-app.py

This will open the app in your web browser.

How to Use the Traffic Prediction App

Upload a CSV File:
- Click on the “Choose a CSV file” button in the Streamlit app and upload a CSV file containing traffic data. The CSV file should contain network traffic features (e.g., flow duration, packet lengths, flags, etc.).
- The app will display the contents of the uploaded file to ensure the data is correctly loaded.
Get Predictions:
- Once the CSV file is uploaded, the app will use the trained model to predict the traffic labels (True/False).
- The app will display:
  - The original uploaded data.
  - Predicted labels for each row.
  - Prediction probabilities (True/False) for each class.
View Results:
- After the predictions are made, the app will show:
  - Predicted Labels: Whether the traffic is classified as True or False based on the model.
  - Prediction Probabilities (True/False): Probabilities for each class indicating the confidence of the prediction.

Example CSV Format

The CSV file should have the following columns (without the label column):

Destination Port	Flow Duration	Total Fwd Packets	Total Backward Packets	…
80	1500	25	30	…
443	2000	40	50	…
21	1000	15	20	…

Sample Output:

False Probability	True Probability	Predicted Label
0.18	0.82	True
0.09	0.91	True
0.08	0.92	True

Troubleshooting

CSV Format Errors: Ensure the uploaded CSV contains the necessary features and follows the correct format. The app will drop the label column automatically if present.
Model Errors: Ensure the model and scaler are correctly saved in the Frontend folder. The app expects these files to be present for making predictions.

License

This project is open-source and licensed under the MIT License.