Intrusion Detection Systems

The purpose of an intrusion detection system (IDS) is to protect the confidentiality, integrity, and availability of a system. IDSs are designed to detect specific issues and are categorized as signature-based (SIDS) or anomaly-based (AIDS). An IDS can be implemented in software or hardware. How do SIDS and AIDS detect malicious activity? What is the difference between the two? What are the four IDS evasion techniques discussed, and how do they evade an IDS?

Introduction

Comparison of public IDS datasets

Because AIDS relies on machine learning techniques, the datasets used to train and test those techniques are critical to a realistic evaluation. Table 12 summarises popular public datasets, along with some analysis techniques and results reported for each dataset in prior research. Table 13 summarises the characteristics of the datasets.
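The results in Table 12 are reported variously as detection rates, true positive rates, accuracies, and shares of alerts that are false alarms. As a minimal sketch, assuming the usual confusion-matrix definitions (the source does not spell them out), these quantities can be computed from raw counts as shown here. The helper names `_ratio` and `ids_metrics` and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def ids_metrics(tp, fp, tn, fn):
    """Return common IDS evaluation metrics from confusion-matrix counts.

    tp: attacks flagged as attacks     fn: attacks that were missed
    fp: normal events flagged          tn: normal events passed
    """
    return {
        # Share of real attacks that were detected (true positive rate).
        "detection_rate": _ratio(tp, tp + fn),
        # Share of normal events that raised an alert (false positive rate).
        "false_alarm_rate": _ratio(fp, fp + tn),
        # Share of all events that were classified correctly.
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        # Share of generated alerts that were false alarms.
        "false_alert_share": _ratio(fp, tp + fp),
    }


# Illustrative counts only; they do not come from any dataset in the tables.
metrics = ids_metrics(tp=90, fp=20, tn=880, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that a figure such as "69% of all generated alerts were false alarms" corresponds to `false_alert_share` in this sketch rather than to `false_alarm_rate`, so the two should not be compared directly.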

 

Table 12 Comparison of results achieved by various methods on publicly available IDS datasets

Dataset

Result

Observations

Reference

DARPA 98

Snort detection: 69% of all generated alerts were considered false alarms.

SIDS was applied without AIDS.

Hu, et al. (2009)

ANN analysis of system calls: 96% detection rate.

A classifier based on an artificial neural network (ANN) was used to train and test the framework.

McHugh (2000)

SVM on a subset of DARPA 98: 99.6% detection rate.

SVM separates data into classes using one or more hyperplanes and can handle multidimensional data. SVMs usually perform well on binary classification problems.

Chen, et al. (2005)

KDDCUP 99

Multivariate statistical analysis of audit data: 90% detection rate.

Multivariate analysis is used to reduce false alarm rates.

Ye, et al. (2002), Hotta, et al. (2008)

The best results were achieved by the C4.5 algorithm, which attains a 95% true positive rate.

The decision trees created by C4.5 can be used for classification.

Ferrari and Cribari-Neto (2004); Shafi and Abbass (2013); Laskov, et al. (2005)

SMO classifier: 97% detection rate.

This SVM-based classifier with an SMO implementation produces good detection accuracy. However, the reported accuracy is lower than that in Chen et al. (2005), because the KDDCUP 99 dataset is more complex and comprehensive than the DARPA 98 dataset.

Shafi and Abbass (2013)

The best model is an HNB model; a 95% confidence level is used to compare the models.

Hidden Naïve Bayes (HNB) techniques can be applied to IDS problems that suffer from high dimensionality, highly correlated attributes, and high network speeds. HNB outperforms the traditional NB method in detection accuracy for IDS.

Koc, et al. (2012)

NSL-KDD

k-Nearest Neighbour (k-NN) algorithm: 94% detection rate.

The k-NN algorithm uses all labelled training instances as a model of the target function. During the classification phase, k-NN uses a similarity-based search strategy to determine a locally optimal hypothesis function.

Adebowale, et al. (2013)

Naïve Bayes: 89% detection rate.

Bayesian classifiers provide moderate accuracy because they focus on assigning each instance to a class rather than on estimating the exact probabilities.

Adebowale, et al. (2013)

C4.5 gave the best detection rate of 99%.

C4.5 selects the feature that most effectively splits the set of samples into subsets, which contributes to improved accuracy.

Thaseen and Kumar (2013)

SMO classifier: 97% detection rate.

This work also uses an SVM-based classifier and achieves a detection rate similar to that of Chen et al. (2005).

Adebowale, et al. (2013)

Expectation Maximization (EM) clustering: 78% accuracy.

EM makes a "soft" assignment of each row to the clusters, in proportion to the probability of each cluster. The accuracy of this method is low because EM does not give a parameter covariance matrix for standard errors.

Ahmed, et al. (2016)

ADFA-WD

Creech et al. used a Hidden Markov Model (HMM), an Extreme Learning Machine (ELM), and an SVM. They reported 74.3% accuracy for HMM, 98.57% accuracy for ELM, and 99.64% accuracy for SVM.

ADFA-WD is a much newer dataset and contains new attacks, which is why the accuracy reported for each machine learning technique is lower than the accuracy on the legacy KDD98 data. SVM was reported to produce the highest accuracy.

Creech and Hu (2014b)

ADFA-LD

100% accuracy using ELM with the original semantic features.

New semantic features are applied. ELM can use the new semantic features easily and quickly by including a number of semantic phrases.

Creech and Hu (2014b)

CICIDS2017

94.5% accuracy using MLP alone; 95.2% accuracy using MLP and the Payload Classifier together.

Feature selection is done using the Fisher Score algorithm.

Ustebay, et al. (2018)

Bot-IoT

The SVM model achieved the highest accuracy, with a 98% detection rate.

This SVM-based method produced good detection accuracy (Mitchell & Chen, 2015; Chen et al., 2005; Ferrari & Cribari-Neto, 2004).

Koroniotis, et al. (2018)
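Table 12 describes k-NN as using all labelled training instances as its model and classifying by a similarity-based search. The following is a minimal sketch of that idea, assuming Euclidean distance and a simple majority vote; the cited studies may have used different distance measures, features, and values of k. The function name `knn_predict`, the two features, and the toy records are hypothetical and do not come from NSL-KDD.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs; the full list is the "model".
    """
    # Similarity search: rank every labelled instance by distance to the query.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Local hypothesis: the most common label among the k closest instances.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy, made-up records: (connection_duration, failed_logins) -> label.
train = [
    ((1.0, 0.0), "normal"),
    ((2.0, 0.0), "normal"),
    ((1.5, 1.0), "normal"),
    ((9.0, 6.0), "attack"),
    ((8.0, 7.0), "attack"),
    ((10.0, 5.0), "attack"),
]

print(knn_predict(train, (1.2, 0.5)))  # close to the "normal" cluster
print(knn_predict(train, (9.5, 6.5)))  # close to the "attack" cluster
```

Because no model is fitted in advance, every prediction scans the whole training set, which is one reason k-NN can be costly on large traffic datasets.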

 

 

Table 13 Comparison of datasets (✔ = True, ✖ = False)

Dataset

Realistic Traffic

Labelled data

IoT traces

Zero-day attacks

Full packet capture

Year

DARPA 98

1998

KDDCUP 99

1999

CAIDA

2007

NSL-KDD

2009

ISCX 2012

2012

ADFA-WD

2014

ADFA-LD

2014

CICIDS2017

2017

Bot-IoT

2018