Performance measures of models

Deepak Jain
Towards Data Science
6 min read · Sep 3, 2020


Schools and colleges regularly conduct tests. The basic idea is to measure the performance of the students: to understand which subjects are their strong ones and where they need to work harder. In a similar way, we also test our machine learning models to measure their performance, and based on this performance we try to understand what the model is doing right and where it needs to work harder (basically, where we need to work harder).

In machine learning, building a model is only half the job; it is equally important to measure the model's performance. Basically, we check how good the predictions made by our model are.

In this series of articles, we will try to understand the various performance measures of a model.

Accuracy

This is probably the simplest performance metric. It is defined as:

Accuracy = (number of correctly classified points) / (total number of points)

Accuracy lies between 0 and 1. A value closer to 0 indicates poor performance, whereas a value closer to 1 indicates good performance. It is one of the simplest and easiest metrics to understand.

Let’s understand this metric using an example:

Assume we have already trained our model on the training data. Now, we want to use the test data to check how accurate the predictions are. Let's say we have a classification problem with 100 data points in our test data set, where the objective is to classify each point as either positive or negative. Assume that out of the 100 points, 60 are positive and 40 are negative (note that these are the original/actual class labels). Suppose that when we feed this test data to our model, we get the following output:

Of the 60 actual positive points, the model predicted 53 as positive and 7 as negative; of the 40 actual negative points, it predicted 35 as negative and 5 as positive.

So, based on the above example, our model has misclassified 7 positive points as negative and 5 negative points as positive. The total number of misclassified points = 7 + 5 = 12.

So the accuracy of the model can be calculated as:

Accuracy = (100 − 12) / 100 = 88 / 100 = 0.88, i.e. 88%.
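For reference, here is a minimal sketch of this calculation in Python, assuming scikit-learn is available; the label arrays below are simply made up to mirror the example (60 actual positives, 40 actual negatives, 12 misclassified points):

    from sklearn.metrics import accuracy_score

    # Made-up labels mirroring the example above:
    # 60 actual positives (7 of them misclassified as negative)
    # 40 actual negatives (5 of them misclassified as positive)
    y_true = [1] * 60 + [0] * 40
    y_pred = [1] * 53 + [0] * 7 + [0] * 35 + [1] * 5

    print(accuracy_score(y_true, y_pred))  # 0.88 -> 88 correct out of 100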

Now that we have understood how to calculate accuracy, let's understand some of the problems associated with it.

Imbalanced data set

Let's say we have a model that always returns the negative class as its output. Now, suppose our test data set is imbalanced, with 90% of the points belonging to the negative class. When we feed this test data to our model, 90% of the classifications will be correct, simply because the model always returns the negative class label and 90% of the test data is negative. In such a scenario, even this dumb model gives an accuracy of 90%. So stay away from accuracy when you have an imbalanced data set.
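Here is a minimal sketch of this "dumb model" trap, again assuming scikit-learn and a made-up test set where 90 out of 100 points are negative:

    from sklearn.metrics import accuracy_score

    # Made-up imbalanced test set: 90 negatives, 10 positives
    y_true = [0] * 90 + [1] * 10

    # A "dumb" model that always predicts the negative class
    y_pred = [0] * 100

    print(accuracy_score(y_true, y_pred))  # 0.9 -> 90% accuracy without learning anything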

Accuracy doesn’t consider probability scores

Consider the below example to understand this:

[Table: data points x1–x4 with their actual class labels y, probability scores from models M1 and M2, and predicted class labels Y1 and Y2]

x ➜ Datapoints

y ➜ Actual class label

M1➜ Probability score of model M1

M2 ➜ Probability score of model M2

Y1 ➜ Predicted class label of model M1

Y2 ➜ Predicted class label of model M2

Let's assume we ran our data through two models, M1 and M2, and both models returned probability scores. So, for a given data point, we get the probability P(y=1).

The M1 column can be read as: the probability that x1 has y1 = 1 is 0.9 when run through model M1. Similarly, the probability that x3 has y3 = 1 is 0.1, which means the probability of y3 = 1 is very low under model M1 (i.e., P(y3 = 0) = 0.9).

Consider x1, whose actual class label is 1. Model M1 gives a probability score of P(y=1) = 0.9, so x1 is predicted to belong to class label 1. Model M2, on the other hand, gives a probability score of P(y=1) = 0.6, so x1 is also classified as class label 1. But if we compare the probability scores, it is clear that model M1 is performing better than model M2. Similarly, for x2, x3, and x4 the probability scores of model M1 are much better than those of model M2, yet the predicted class labels remain the same for both models.

Accuracy cannot distinguish which model is better because it does not use probability scores; it only uses the predicted class labels. Since both models produce the same predicted labels, accuracy will say that M1 and M2 perform equally well, even though the probability scores make it clear that M1 is better than M2.
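The same idea can be sketched in a few lines, with made-up probability scores for four points and a 0.5 threshold to turn scores into predicted labels:

    import numpy as np
    from sklearn.metrics import accuracy_score

    y_true = np.array([1, 1, 0, 0])            # made-up actual labels
    p_m1 = np.array([0.9, 0.8, 0.1, 0.2])      # M1: confident scores on the correct side
    p_m2 = np.array([0.6, 0.55, 0.4, 0.45])    # M2: barely on the correct side of 0.5

    y1 = (p_m1 >= 0.5).astype(int)             # predicted labels of M1
    y2 = (p_m2 >= 0.5).astype(int)             # predicted labels of M2

    print(accuracy_score(y_true, y1))          # 1.0
    print(accuracy_score(y_true, y2))          # 1.0 -> accuracy cannot tell M1 and M2 apart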

Confusion Matrix

To understand the confusion matrix, let's take a binary classification task where the objective is to classify each point as either 0 (negative) or 1 (positive). Let's construct the confusion matrix for this task.

TN ➜ True Negative
FN ➜ False Negative
FP ➜ False Positive
TP ➜ True Positive
N ➜ Total no. of negative points
P ➜ Total no. of positive points

Let's understand each of the above terms:

  • True Negative ➔ when the actual value is 0 and the model predicted value is also 0
  • False Negative ➔ when the actual value is 1 and the model predicted value is 0
  • True Positive ➔ when the actual value is 1 and the model predicted value is also 1
  • False Positive ➔ when the actual value is 0 and the model predicted value is 1

Now that we have understood how to construct a confusion matrix and also its basic terminologies, let's understand some of the key metrics associated with it.

True Positive Rate (TPR) = # of TP / Total # of P

True Negative Rate (TNR) = # of TN / Total # of N

False Positive Rate (FPR) = # of FP / Total # of N

False Negative Rate (FNR) = # of FN / Total # of P
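As a quick sketch, the confusion matrix and these four rates can be computed with scikit-learn; the labels below are made up:

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 0, 1, 1, 1, 1, 0]   # made-up actual labels
    y_pred = [0, 1, 0, 1, 1, 0, 1, 0]   # made-up predicted labels

    # For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    tpr = tp / (tp + fn)   # True Positive Rate
    tnr = tn / (tn + fp)   # True Negative Rate
    fpr = fp / (tn + fp)   # False Positive Rate
    fnr = fn / (tp + fn)   # False Negative Rate

    print(tn, fp, fn, tp)        # 3 1 1 3
    print(tpr, tnr, fpr, fnr)    # 0.75 0.75 0.25 0.25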

Precision = TP / (TP + FP)
It means: of all the points the model predicted to be positive, what percentage of them are actually positive. In precision, we are not concerned with the negative class; our only focus is on the positive class label.

Recall = TPR = TP / Total # of P = TP / (TP + FN)
It means: of all the points that actually belong to class label 1, how many did the model predict as class label 1.

For a good model, we would always want the values of precision and recall to be high.

F1-score:
It combines both precision and recall into a single metric (their harmonic mean) and is given as follows:

F1-score = 2 * (Precision * Recall) / (Precision + Recall)
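Continuing with the same made-up labels as in the confusion-matrix sketch above, precision, recall, and F1-score can be computed directly with scikit-learn:

    from sklearn.metrics import precision_score, recall_score, f1_score

    y_true = [0, 0, 0, 1, 1, 1, 1, 0]   # made-up actual labels
    y_pred = [0, 1, 0, 1, 1, 0, 1, 0]   # made-up predicted labels

    precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 3 / 4
    recall = recall_score(y_true, y_pred)         # TP / (TP + FN) = 3 / 4
    f1 = f1_score(y_true, y_pred)                 # 2 * P * R / (P + R)

    print(precision, recall, f1)   # 0.75 0.75 0.75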

Conclusion

So far we have covered accuracy and the confusion matrix, along with the various terms associated with the confusion matrix. In the second part of this series, we'll cover ROC and AUC, log loss, the coefficient of determination, and the median absolute deviation of errors.

Until then, Happy Learning!
