Confusion Matrix and Cyber Security
A confusion matrix summarizes the prediction results of a classification problem. Correct and incorrect predictions are counted and laid out in a table, broken down by class.
Definition of the Terms:
True Positive: You predicted positive and it’s true.
True Negative: You predicted negative and it’s true.
False Positive: You predicted positive and it’s false.
False Negative: You predicted negative and it’s false.
For example, consider the following scenario.
We have a total of 10 vehicles, each either a car or a bus, and our model predicts whether each one is a car or not.
Actual values = [‘bus’, ‘car’, ‘bus’, ‘car’, ‘bus’, ‘bus’, ‘car’, ‘bus’, ‘car’, ‘bus’]
Predicted values = [‘bus’, ‘bus’, ‘bus’, ‘car’, ‘bus’, ‘bus’, ‘car’, ‘car’, ‘car’, ‘car’]
Applied to this example, with ‘car’ as the positive class:
True Positive: You predicted positive and it’s true. You predicted a car and it actually is a car.
True Negative: You predicted negative and it’s true. You predicted that it is not a car and it actually is not.
False Positive (Type 1 Error): You predicted positive and it’s false. You predicted a car but it actually is not a car.
False Negative (Type 2 Error): You predicted negative and it’s false. You predicted that it is not a car but it actually is.
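The four counts can be tallied directly from the two lists in the example. The following is a minimal sketch in plain Python; the helper name `confusion_counts` is made up for illustration, and ‘car’ is assumed to be the positive class.

```python
# Labels copied from the car/bus example; "car" is treated as the positive class.
actual = ["bus", "car", "bus", "car", "bus", "bus", "car", "bus", "car", "bus"]
predicted = ["bus", "bus", "bus", "car", "bus", "bus", "car", "car", "car", "car"]


def confusion_counts(actual, predicted, positive):
    """Return (TP, TN, FP, FN) for a binary problem (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1  # predicted positive, actually negative (Type 1 error)
        else:
            fn += 1  # predicted negative, actually positive (Type 2 error)
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(actual, predicted, positive="car")
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=4 FP=2 FN=1
```

For these ten vehicles the tally is 3 true positives, 4 true negatives, 2 false positives and 1 false negative.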
Accuracy is the number of correctly predicted results (true positives plus true negatives) out of the total number of predictions.
Classification accuracy is given by the relation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy is a suitable metric when TP and TN matter most and the dataset is balanced, because in that case the score is not biased by the class distribution. In real-life classification problems, however, class distributions are often imbalanced.
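A small sketch may help show why accuracy can mislead on imbalanced data. The first call uses the counts obtained by tallying the car/bus lists above (TP=3, TN=4, FP=2, FN=1). The second uses a hypothetical dataset, not from the original text, with 95 negatives and 5 positives and a model that always predicts negative.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts tallied from the car/bus example lists.
car_bus_accuracy = accuracy(tp=3, tn=4, fp=2, fn=1)
print(car_bus_accuracy)  # 0.7

# Hypothetical imbalanced case: the model predicts "negative" for every sample,
# so it never finds a single positive, yet accuracy still looks high.
always_negative_accuracy = accuracy(tp=0, tn=95, fp=0, fn=5)
print(always_negative_accuracy)  # 0.95
```

The second model misses every positive sample, which is why precision and recall are usually reported alongside accuracy.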
Precision is the number of correctly classified positive samples divided by the total number of samples predicted as positive. In other words, out of everything we predicted as positive, how many were actually positive? Precision should be high.
Precision = TP / (TP + FP) = 3/5 = 0.6 (in the car/bus example, TP = 3 and FP = 2)
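Recall is the number of correctly classified positive samples divided by the total number of actual positive samples. In other words, out of all the actual positive values, how many were correctly predicted as positive?
Recall = TP / (TP + FN) = 3/4 = 0.75 (in the car/bus example, TP = 3 and FN = 1)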
Which metric to prioritize depends on the problem statement: when false positives are more costly, optimize for precision; when false negatives are more costly, optimize for recall.
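As a rough illustration, precision and recall can be computed from the counts obtained by tallying the car/bus lists directly (TP=3, FP=2, FN=1). The function names are illustrative, and returning 0.0 when the denominator is zero is an assumption of this sketch rather than a universal convention.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 if there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Counts tallied from the car/bus example with "car" as the positive class.
p = precision(tp=3, fp=2)
r = recall(tp=3, fn=1)
print(p)  # 0.6
print(r)  # 0.75
```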
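F-beta Score
In some use cases, both precision and recall are important. Even when one of them matters more than the other, it is often useful to combine both into a single score. The F-beta score does this as a weighted harmonic mean of precision and recall, where beta controls how much more weight recall receives than precision:
F-beta = (1 + beta^2) * Precision * Recall / (beta^2 * Precision + Recall)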
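The standard F-beta definition, the weighted harmonic mean (1 + beta^2) * P * R / (beta^2 * P + R), can be sketched as follows. The input values 0.6 and 0.75 are the precision and recall obtained by tallying the car/bus lists directly; the function name and the zero-division handling are choices made for this sketch.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision more
    heavily, and beta = 1 gives the familiar F1 score.
    """
    if precision == 0 and recall == 0:
        return 0.0  # assumption of this sketch: define the score as 0 here
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f1 = f_beta(0.6, 0.75, beta=1.0)    # balanced
f2 = f_beta(0.6, 0.75, beta=2.0)    # favors recall
f05 = f_beta(0.6, 0.75, beta=0.5)   # favors precision
print(round(f1, 4), round(f2, 4), round(f05, 4))
```

Because recall (0.75) is higher than precision (0.6) in this example, the recall-weighted score with beta = 2 comes out higher than the precision-weighted score with beta = 0.5.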
Cybersecurity is the practice of defending computers, servers, mobile devices, electronic systems, networks, and data from malicious attacks. It’s also known as information technology security or electronic information security. The term applies in a variety of contexts, from business to mobile computing, and can be divided into a few common categories.
· Network security
· Application security
· Information security
· Operational security
· Disaster recovery and business continuity
· End-user education
In attack detection, the confusion matrix terms are interpreted as follows:
True Positive (TP): The number of samples detected as attacks that are actually attacks.
True Negative (TN): The number of samples detected as normal that are actually normal.
False Positive (FP): The number of samples detected as attacks that are actually normal (false alarms).
False Negative (FN): The number of samples detected as normal that are actually attacks.
Detection Rate (DR) is the proportion of actual attacks that are correctly detected as attacks:
DR = TP / (TP + FN)
False Alarm Rate (FAR) is the proportion of normal data that is falsely detected as attack behavior:
FAR = FP / (FP + TN)
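Using the commonly used definitions DR = TP / (TP + FN) and FAR = FP / (FP + TN), both rates can be computed as in the sketch below. The counts are hypothetical, chosen only to illustrate the arithmetic, and do not come from any real detector.

```python
def detection_rate(tp, fn):
    """DR = TP / (TP + FN): share of actual attacks that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_alarm_rate(fp, tn):
    """FAR = FP / (FP + TN): share of normal traffic flagged as an attack."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical counts: 100 real attacks and 900 normal samples.
dr = detection_rate(tp=90, fn=10)
far = false_alarm_rate(fp=5, tn=895)
print(f"DR={dr:.3f} FAR={far:.4f}")  # DR=0.900 FAR=0.0056
```

A good detector aims for a high DR together with a low FAR, since each false alarm costs analyst time.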
Confusion matrix and accuracy
The confusion matrix obtained from the classifier is depicted in the figure below. It is shown in normalized form because the classes are imbalanced. The darker the blue, the better the classifier is at predicting files for that class. The matrix makes it clear where the classifier gets ‘confused’. The ‘identity theft’ class does not do well, and there is a good reason for this. Reading court cases revealed that ‘platform fraud’ is linked to ‘identity theft’, since stolen identities are often used to commit platform fraud. Accordingly, the confusion matrix shows that ‘identity theft’ is often predicted as ‘platform fraud’.
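One common way to normalize a confusion matrix for imbalanced classes is to divide each row (true class) by its total, so every row sums to 1. The sketch below assumes that row-wise scheme; the text does not state which normalization was actually used. The counts and the ‘other’ class are hypothetical placeholders, chosen only to mimic the pattern described above.

```python
# Rows are true classes, columns are predicted classes (hypothetical counts).
labels = ["identity theft", "platform fraud", "other"]
counts = [
    [4, 5, 1],    # true "identity theft": often predicted as "platform fraud"
    [2, 36, 2],   # true "platform fraud"
    [1, 3, 46],   # true "other" (placeholder class)
]


def normalize_rows(matrix):
    """Divide each row by its sum so that cells become per-class proportions."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


normalized = normalize_rows(counts)
for label, row in zip(labels, normalized):
    print(f"{label:>15}: " + "  ".join(f"{v:.2f}" for v in row))
```

With row normalization the diagonal cell of each row is that class's recall, so a small class such as ‘identity theft’ can be compared with larger classes on the same scale.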