Intrusion Detection Systems

The purpose of an intrusion detection system (IDS) is to protect the confidentiality, integrity, and availability of a system. IDSs are designed to detect specific issues and are categorized as signature-based (SIDS) or anomaly-based (AIDS). An IDS can be implemented in software or hardware. How do SIDS and AIDS detect malicious activity? What is the difference between the two? What are the four IDS evasion techniques discussed, and how do they evade an IDS?

Introduction

Unsupervised learning in intrusion detection systems

Unsupervised learning is a machine learning technique used to extract interesting information from input datasets that have no class labels. The input data points are normally treated as a set of random variables, and a joint density model is then built for the dataset. In supervised learning, the output labels are given and used to train the model to produce the required result for an unseen data point. In unsupervised learning, no labels are given; instead, the data is grouped automatically into classes through the learning process. In the context of developing an IDS, unsupervised learning means training a model on unlabelled data so that it can identify intrusions.

As shown in Fig. 6, once records are clustered, all of the cases that appear in small clusters are labelled as intrusions, because normal occurrences should produce sizable clusters compared to the anomalies. In addition, malicious intrusions and normal instances are dissimilar, so they do not fall into the same cluster.

Fig. 6 Using Clustering for Intrusion Detection
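
The small-cluster rule described above can be sketched as follows. This is a minimal illustration, not a method taken from the cited works: the function name `flag_small_clusters` and the `min_fraction` threshold are hypothetical, and the cluster assignments are assumed to come from any clustering step run beforehand.

```python
from collections import Counter


def flag_small_clusters(cluster_ids, min_fraction=0.1):
    """Return one boolean per record: True if its cluster is 'small'.

    cluster_ids  -- cluster label of each record (output of any clustering step)
    min_fraction -- hypothetical threshold: clusters holding less than this
                    share of all records are treated as suspected intrusions
    """
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    small = {c for c, n in counts.items() if n / total < min_fraction}
    return [c in small for c in cluster_ids]


# Toy assignments: two sizable clusters (0 and 1) and one tiny cluster (2).
cluster_ids = [0] * 12 + [1] * 7 + [2] * 1
flags = flag_small_clusters(cluster_ids, min_fraction=0.1)
suspected = [i for i, is_small in enumerate(flags) if is_small]
print(suspected)
```

The threshold is a design choice that would have to be tuned for each dataset. If attacks are frequent enough to form a sizable cluster of their own, this rule will not flag them.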

K-means: K-means is one of the most prevalent clustering techniques. It aims to partition n data objects into k clusters such that each data object is assigned to the cluster with the nearest mean. It is a distance-based technique that does not need to compute the distances between all pairs of records; only the distances from each record to the cluster means are required. It uses the Euclidean metric as a similarity measure. The number of clusters is chosen by the user in advance, and typically several solutions are tested before the most appropriate one is accepted. Annachhatre et al. used the K-means clustering algorithm to identify different host behaviour profiles (Annachhatre et al., 2015). They proposed new distance metrics that can be used in the K-means algorithm to closely relate the clusters. They clustered the data into several clusters and associated them with known behaviour for evaluation. Their results indicated that K-means clustering is a better approach to classifying data with unsupervised methods for intrusion detection when several kinds of datasets are available. In an IDS, clustering can be used to reduce the number of intrusion signatures, generate high-quality signatures, or group similar intrusions.
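
To make the assignment and update steps concrete, the following is a minimal sketch of the standard K-means iteration (Lloyd's algorithm) with the Euclidean metric. It is not the variant with custom distance metrics proposed by Annachhatre et al. The function name `kmeans`, the random initialisation, the fixed iteration count, and the toy two-feature records are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means on a list of equal-length numeric tuples.

    Returns (centroids, assignment), where assignment[i] is the index of the
    cluster whose mean is nearest to points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random records

    def nearest(p):
        # Only k distances per record are needed, not all pairwise distances.
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iterations):
        assignment = [nearest(p) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    assignment = [nearest(p) for p in points]  # final assignment for the returned centroids
    return centroids, assignment


# Toy records with two hypothetical features; two well-separated groups.
records = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, assignment = kmeans(records, k=2)
print(assignment)
```

Because the result depends on the initial centroids and on the chosen k, it is common to run the procedure several times and compare the solutions, as noted above.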


Hierarchical Clustering: This is a clustering technique that aims to create a hierarchy of clusters. Approaches for hierarchical clustering are normally classified into two categories:

  1. Agglomerative - bottom-up clustering techniques in which each record starts in its own cluster and pairs of clusters are combined as one moves up the hierarchy, so that clusters contain sub-clusters, which in turn contain sub-clusters.

  2. Divisive - top-down hierarchical clustering algorithms in which, iteratively, the cluster with the largest diameter in feature space is selected and split into two sub-clusters with a smaller range.
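
The agglomerative (bottom-up) category can be sketched as below. This is a minimal illustration under stated assumptions: the text does not specify how the distance between two clusters is measured, so single linkage (the smallest distance between any two members) is assumed here, and the function name `agglomerative` and the toy points are hypothetical.

```python
import math


def agglomerative(points, num_clusters):
    """Bottom-up clustering: start with one cluster per record and repeatedly
    merge the two closest clusters until num_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Assumed single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]  # safe because j > i
    return clusters


# Toy records: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
clusters = agglomerative(points, num_clusters=3)
print(clusters)
```

This naive version recomputes every linkage at each merge, so it is only suitable for small examples. The isolated point ends up in a cluster of its own, which is the kind of small cluster that the clustering-based detection described earlier would treat as a suspected intrusion.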

A lot of work has been done on attack detection and reactive attack mitigation in cyber-physical control systems (CPCS) using unsupervised learning. For example, a redundancy-based resilience approach was proposed by Alcaraz (Alcaraz, 2018). The approach uses a dedicated network sublayer that can handle the context by regularly collecting consensual information from the driver nodes in the control network itself and by discriminating view differences through data mining techniques such as k-means and k-nearest neighbour. Shen et al. proposed hybrid-augmented device fingerprinting for intrusion detection in industrial control system (ICS) networks. They used different machine learning techniques to analyse network packets and filter anomalous traffic in order to detect intrusions in ICS networks (Shen et al., 2018).