Cyber Crime and Confusion Matrix

Shivani Srivastava
5 min read · Jun 6, 2021
Source: Cyber-crime-2.jpg (1688×1125) (henshalls.com)

Cyber Crime

Cyber Crime is criminal activity that either targets a computer or a networked device, or uses one as an instrument to pursue illegal ends such as fraud, trafficking in child pornography and intellectual property, identity theft, or violating privacy.

Types Of Cyber Crimes

The worldwide information security market is forecast to reach $170.4 billion in 2022, according to Gartner. This is due in large part to organizations evolving their defenses against cyber threats, and to a rise in such threats, including inside their own companies. According to Cybint, 95% of cybersecurity breaches are caused by human error. It’s a telling takeaway about the cybersecurity landscape, and the statistics below give an idea of the field as a whole, along with the overall impact of cyber attacks.

95% of cybersecurity breaches are caused by human error. (Cybint)

The worldwide information security market is forecast to reach $170.4 billion in 2022. (Gartner)

88% of organizations worldwide experienced spear phishing attempts in 2019. (Proofpoint)

68% of business leaders feel their cybersecurity risks are increasing. (Accenture)

On average, only 5% of companies’ folders are properly protected. (Varonis)

Data breaches exposed 36 billion records in the first half of 2020. (RiskBased)

86% of breaches were financially motivated and 10% were motivated by espionage. (Verizon)

45% of breaches featured hacking, 17% involved malware and 22% involved phishing. (Verizon)

Between January 1, 2005, and May 31, 2020, there have been 11,762 recorded breaches. (ID Theft Resource Center)

The top malicious email attachment types are .doc and .dot, which make up 37%; the next highest is .exe at 19.5%. (Symantec)

An estimated 300 billion passwords are used by humans and machines worldwide. (Cybersecurity Media)

Source: 134 Cybersecurity Statistics and Trends for 2021 | Varonis

It is therefore essential to detect the different kinds of cyber attacks on a network. Machine learning can be used to build an effective intrusion detection system (IDS): a binary classification model identifies what is going on within the network, that is, whether there is an attack or not. The two categories are normal and anomalous.

Understanding the raw security data is the first step in constructing an intelligent security model that can make predictions about future incidents.

After selecting the security features and performing all preprocessing steps, train a model that can identify whether a given test case is normal or an anomaly. One of the metrics used to evaluate the model is the confusion matrix.

Confusion Matrix:

In machine learning, and specifically in statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix).

Confusion Matrix

Let’s see the terms we used in the above diagram:

  • In a two-class problem, such as attack detection, we assign the event normal as “positive” and anomaly as “negative”.
  • “True Positive” for correctly predicted event values.
  • “False Positive” for incorrectly predicted event values.
  • “True Negative” for correctly predicted no-event values.
  • “False Negative” for incorrectly predicted no-event values.
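The four terms can be tallied directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch in plain Python, assuming this article's convention that "normal" is the positive class and "anomaly" is the negative class. The function name `confusion_counts` and the six sample labels are hypothetical and used only for illustration.

```python
from collections import Counter

POSITIVE = "normal"    # this article's convention: no attack
NEGATIVE = "anomaly"   # this article's convention: attack


def confusion_counts(actual, predicted):
    """Tally TP, FP, TN and FN for two equal-length label sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == POSITIVE:
            # Predicted "normal": correct if the packet really was normal.
            counts["TP" if truth == POSITIVE else "FP"] += 1
        else:
            # Predicted "anomaly": correct if the packet really was an attack.
            counts["TN" if truth == NEGATIVE else "FN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}


# Hypothetical labels for six packets (illustrative only).
actual = ["normal", "normal", "anomaly", "anomaly", "normal", "anomaly"]
predicted = ["normal", "anomaly", "anomaly", "normal", "normal", "anomaly"]

example_counts = confusion_counts(actual, predicted)
print(example_counts)
```

Note that many IDS write-ups treat the attack as the positive class instead; swapping the two constants would switch to that convention.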

Confusion matrices have two types of errors: False Positive (FP) (Type I) and False Negative (FN) (Type II).

Now let’s look at these terms and their importance through a cyber attack prediction example, for a better understanding of the topic:

An Intrusion Detection System (IDS) is a system that monitors network traffic to detect suspicious activities and raises alerts when such activities are found. It is a software application that scans a network or system for harmful activity or policy violations. Any malicious activity or violation is normally reported to an administrator or collected centrally via a Security Information and Event Management (SIEM) system. A SIEM system integrates outputs from multiple sources and uses alarm filtering techniques to distinguish malicious activity from false alarms.

Let’s assume that our model has created the Confusion Matrix for 120 packets it examined:

A total of 120 packets were analyzed by our model in the IDS and classified as shown in the above confusion matrix.

  • “Positive” -> Model predicted no attack.
  • “Negative” -> Model predicted an attack.
  • True Negative: Of the 60 times the model predicted that an attack would take place, 55 predictions were correct, which means an attack actually took place 55 times. Because of the prediction, the Security Operations Centre (SOC) receives a notification and can prevent the attack.
  • False Negative: Of the 60 times the model predicted that an attack would take place, 5 times no attack happened. This can be considered a “False Alarm” and is a Type II error.
  • True Positive: The model predicted 60 times that no attack would take place, and 40 of those times no attack actually happened. These are correct predictions.
  • False Positive: 20 times an attack actually took place when the model had predicted that no attack would happen. This is also called a Type I error.
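As a quick sanity check, the four counts listed above can be laid out as a table and summed. This is only a sketch in plain Python using the numbers from this example; the variable names are hypothetical.

```python
# Counts from the 120-packet example (positive = model predicted no attack).
TP, FP = 40, 20   # predicted "no attack": 40 correct, 20 were real attacks
TN, FN = 55, 5    # predicted "attack": 55 correct, 5 were false alarms

predicted_no_attack = TP + FP   # row total for "positive" predictions
predicted_attack = TN + FN      # row total for "negative" predictions
actual_no_attack = TP + FN      # packets that really were normal
actual_attack = TN + FP         # packets that really were attacks
total = TP + FP + TN + FN

print(f"{'':<22}{'actual no attack':>18}{'actual attack':>16}")
print(f"{'predicted no attack':<22}{TP:>18}{FP:>16}")
print(f"{'predicted attack':<22}{FN:>18}{TN:>16}")
print("total packets:", total)
```

The row totals come out to 60 and 60, matching the text, and the column totals show that 75 of the 120 packets were real attacks while 45 were normal traffic.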

False Positive (FP) Type I Error:

This type of error can be extremely dangerous. The model predicted no attack, but a real attack took place; in that case no notification is passed on to the security team and nothing can be done to prevent the attack. Therefore, one of the aims of the model is to minimize this value.

False Negative (FN) Type II error:

This type of error is less dangerous: the model predicts an attack, but the system is actually safe. The security team is notified and checks for malicious activity. Such cases are referred to as false alarms.
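To put the two error types side by side, each can be expressed as a rate over the packets where it could occur. The sketch below uses the counts from the 120-packet example; the names `missed_attack_rate` and `false_alarm_rate` are hypothetical labels chosen for this illustration, not standard terms from the article.

```python
# Counts from the 120-packet example (positive = "no attack" in this article).
TP, FP, TN, FN = 40, 20, 55, 5

actual_attacks = FP + TN   # missed attacks plus detected attacks
actual_normal = TP + FN    # correctly cleared packets plus false alarms

# Type I error here: a real attack that the model labelled "no attack".
missed_attack_rate = FP / actual_attacks

# Type II error here: normal traffic that the model flagged as an attack.
false_alarm_rate = FN / actual_normal

print(f"missed attacks: {missed_attack_rate:.1%} of real attacks")
print(f"false alarms:   {false_alarm_rate:.1%} of normal traffic")
```

With these numbers, roughly 27% of real attacks go unnoticed while about 11% of normal traffic triggers a false alarm, which illustrates why the first value is the one to drive down.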

Uses of Confusion Matrix in the Calculation of Metrics

  1. Accuracy: The confusion matrix is used to compute the accuracy of the model. It is the ratio of all correct predictions to all predictions (total values).

Accuracy = (TP + TN) / (TP + TN + FP + FN)

  2. Precision: (true positives / predicted positives) = TP / (TP + FP)

  3. Recall: (true positives / all actual positives) = TP / (TP + FN)

  4. Specificity: (true negatives / all actual negatives) = TN / (TN + FP)

  5. Misclassification: (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
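The five formulas can be checked against the 120-packet example from earlier (TP = 40, TN = 55, FP = 20, FN = 5). The following is a minimal sketch in plain Python; the function name `classification_metrics` is hypothetical, and it assumes every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the five metrics above from the four confusion-matrix counts.

    Assumes every denominator is non-zero (no empty class or prediction group).
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification": (fp + fn) / total,
    }


# Counts from the 120-packet example (positive = no attack).
metrics = classification_metrics(tp=40, tn=55, fp=20, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For this example the accuracy is 95/120 (about 0.79), and accuracy and misclassification always sum to 1. The specificity of 55/75 (about 0.73) is the share of real attacks the model caught under this article's labelling.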

That is how cyber crimes can be monitored with the help of a confusion matrix.

Thank you for reading!
