title
Making sense of the confusion matrix
description
How do you interpret a confusion matrix? How can it help you to evaluate your machine learning model? What rates can you calculate from a confusion matrix, and what do they actually mean?
In this video, I'll start by explaining how to interpret a confusion matrix for a binary classifier:
0:49 What is a confusion matrix?
2:14 An example confusion matrix
5:13 Basic terminology
Then, I'll walk through the calculations for some common rates (reproduced in the code sketch below):
11:20 Accuracy
11:56 Misclassification Rate / Error Rate
13:20 True Positive Rate / Sensitivity / Recall
14:19 False Positive Rate
14:54 True Negative Rate / Specificity
15:58 Precision
Finally, I'll conclude with more advanced topics:
19:10 How to calculate precision and recall for multi-class problems
24:17 How to analyze a 10-class confusion matrix
28:26 How to choose the right evaluation metric for your problem
31:31 Why accuracy is often a misleading metric
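If you want to follow along in code, here is a minimal sketch of those rate calculations, using the same example numbers as the video (105 actual "yes", 60 actual "no") and assuming scikit-learn's layout, where rows are actual values and columns are predicted values in sorted label order:

```python
import numpy as np

# The example confusion matrix from the video, in scikit-learn's layout
# (rows = actual, columns = predicted, labels in sorted order: no, yes).
cm = np.array([[50, 10],    # actual no:  50 true negatives, 10 false positives
               [5, 100]])   # actual yes:  5 false negatives, 100 true positives

tn, fp, fn, tp = cm.ravel()
total = cm.sum()                 # 165 predictions

accuracy = (tp + tn) / total     # 150/165 ≈ 0.91
error_rate = 1 - accuracy        # misclassification rate ≈ 0.09
sensitivity = tp / (tp + fn)     # true positive rate / recall: 100/105 ≈ 0.95
fpr = fp / (fp + tn)             # false positive rate: 10/60 ≈ 0.17
specificity = tn / (tn + fp)     # true negative rate: 50/60 ≈ 0.83
precision = tp / (tp + fp)       # 100/110 ≈ 0.91
```

In practice you would get cm from sklearn.metrics.confusion_matrix(y_true, y_pred) rather than typing it in by hand.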
== RELATED RESOURCES ==
My confusion matrix blog post:
https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
Evaluating a classifier with scikit-learn (video):
https://www.youtube.com/watch?v=85dtiMz9tSo&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=9
ROC curves and AUC explained (video):
https://www.youtube.com/watch?v=OAl6eAyP-yo
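As the video explains, the ROC curve is generated by plotting the true positive rate against the false positive rate as you vary the classification threshold, so it cannot be computed from a single confusion matrix (which reflects only one threshold). A minimal sketch, using hypothetical labels and predicted probabilities:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical true labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

# roc_curve sweeps the threshold and returns one (FPR, TPR) pair per
# threshold value -- exactly the points that are plotted as the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```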
== DATA SCHOOL INSIDERS ==
Join "Data School Insiders" on Patreon for bonus content:
https://www.patreon.com/dataschool
== WANT TO GET BETTER AT MACHINE LEARNING? ==
1) WATCH my scikit-learn video series:
https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
2) SUBSCRIBE for more videos:
https://www.youtube.com/dataschool?sub_confirmation=1
3) ENROLL in my Machine Learning course:
https://www.dataschool.io/learn/
4) LET'S CONNECT!
- Newsletter: https://www.dataschool.io/subscribe/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- LinkedIn: https://www.linkedin.com/in/justmarkham/
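P.S. For the multi-class discussion at 19:10, here is the per-class arithmetic in code. The 3x3 matrix below is hypothetical (the video's own matrix isn't reproduced here), but it is chosen so that class 0 matches the video's numbers: 100% recall and 50% precision:

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted),
# constructed so class 0 has 100% recall and 50% precision, as in the video.
cm = np.array([[2, 0, 0],
               [1, 3, 1],
               [1, 0, 4]])

# Recall per class: the diagonal divided by each row total, i.e.
# "when the true value is this class, how often did it predict this class?"
recall = np.diag(cm) / cm.sum(axis=1)     # class 0: 2/2 = 1.00

# Precision per class: the diagonal divided by each column total, i.e.
# "when it predicted this class, how often was it correct?"
precision = np.diag(cm) / cm.sum(axis=0)  # class 0: 2/4 = 0.50
```

scikit-learn's classification_report(y_true, y_pred) prints these same per-class numbers; that's the report used to analyze the 10-class digits model at 24:17.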
detail
{'title': 'Making sense of the confusion matrix', 'heatmap': [{'end': 893.179, 'start': 801.823, 'weight': 1}, {'end': 1047.574, 'start': 953.322, 'weight': 0.839}, {'end': 1405.13, 'start': 1379.624, 'weight': 0.753}], 'summary': 'Delves into understanding confusion matrix, model evaluation metrics, precision, recall, and the significance of choosing appropriate evaluation metrics in machine learning. it covers topics such as true positives, false positives, true negatives, false negatives, accuracy, misclassification rate, precision rate, roc curve, and limitations in computing these metrics from the confusion matrix, with a specific focus on minimizing specific errors in machine learning.', 'chapters': [{'end': 324.538, 'segs': [{'end': 133.395, 'src': 'embed', 'start': 0.981, 'weight': 0, 'content': [{'end': 3.262, 'text': 'Hi everyone, this is Kevin from Data School.', 'start': 0.981, 'duration': 2.281}, {'end': 8.923, 'text': 'Four years ago, I published a blog post about the confusion matrix,', 'start': 4.182, 'duration': 4.741}, {'end': 15.164, 'text': 'which is a critical tool for helping you to evaluate the performance of your classification model.', 'start': 8.923, 'duration': 6.241}, {'end': 24.026, 'text': 'Now that post has been viewed about half a million times, which is probably because when you Google for confusion matrix,', 'start': 15.724, 'duration': 8.302}, {'end': 26.267, 'text': 'my post shows up right at the top.', 'start': 24.026, 'duration': 2.241}, {'end': 33.909, 'text': "Now in this video I'm going to explain the confusion matrix and related terminology,", 'start': 27.904, 'duration': 6.005}, {'end': 37.672, 'text': "but I'm going to go much more in depth than I did during the blog post.", 'start': 33.909, 'duration': 3.763}, {'end': 45.179, 'text': "I've also picked out four excellent questions from the comments section, and I will answer those at the end of the video.", 'start': 38.413, 'duration': 6.766}, {'end': 47.661, 'text': "So let's go ahead and get started.", 'start': 45.939, 'duration': 1.722}, {'end': 54.456, 'text': 'I want to first give you my definition of a confusion matrix,', 'start': 50.191, 'duration': 4.265}, {'end': 68.792, 'text': 'which is a table that is often used to describe the performance of a classification model or classifier on a set of test data for which the true values are known.', 'start': 54.456, 'duration': 14.336}, {'end': 71.713, 'text': 'So two points I want to highlight here.', 'start': 69.472, 'duration': 2.241}, {'end': 82.636, 'text': 'Number one is that a confusion matrix is only used for classification models, meaning models that are predicting class labels,', 'start': 72.313, 'duration': 10.323}, {'end': 88.998, 'text': 'and they are not used for regression models, meaning models which are used to predict numeric values.', 'start': 82.636, 'duration': 6.362}, {'end': 100.084, 'text': 'The second point I want to make is that the confusion matrix can only be computed when the true values are known.', 'start': 89.879, 'duration': 10.205}, {'end': 110.836, 'text': "Meaning, if you've used an evaluation procedure like train-test-split, then you do have a set of test data for which the true values are known.", 'start': 101.065, 'duration': 9.771}, {'end': 117.183, 'text': 'And when you make predictions on that test set, you can use a confusion matrix to evaluate them.', 'start': 111.477, 'duration': 5.706}, {'end': 130.193, 'text': 'Alternatively, if you are making predictions on out-of-sample data for which you are able 
to eventually measure what the true value was,', 'start': 118.064, 'duration': 12.129}, {'end': 133.395, 'text': 'you can also use the confusion matrix.', 'start': 130.193, 'duration': 3.202}], 'summary': 'Kevin explains the confusion matrix for classification models, viewed 500k times.', 'duration': 132.414, 'max_score': 0.981, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY981.jpg'}, {'end': 275.554, 'src': 'embed', 'start': 244.085, 'weight': 5, 'content': [{'end': 248.008, 'text': 'And I got those totals just by adding the numbers in those rows.', 'start': 244.085, 'duration': 3.923}, {'end': 254.459, 'text': 'Now let me pause and say two things about the layout of the confusion matrix.', 'start': 249.215, 'duration': 5.244}, {'end': 262.103, 'text': "Usually the confusion matrix will not have these labels when it's output by software.", 'start': 255.88, 'duration': 6.223}, {'end': 264.546, 'text': 'You will usually just see this matrix.', 'start': 262.545, 'duration': 2.001}, {'end': 267.188, 'text': 'It will not have these labels over here.', 'start': 264.646, 'duration': 2.542}, {'end': 269.71, 'text': 'It will not have these labels over here.', 'start': 267.628, 'duration': 2.082}, {'end': 275.554, 'text': "So you'll just have to look up in the software what the layout means.", 'start': 270.13, 'duration': 5.424}], 'summary': 'The confusion matrix usually lacks labels and requires interpretation from software.', 'duration': 31.469, 'max_score': 244.085, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY244085.jpg'}], 'start': 0.981, 'title': 'Understanding confusion matrix', 'summary': 'Delves into the significance of confusion matrix in evaluating classification model performance, with a blog post on the topic having garnered half a million views. it promises a comprehensive explanation and addresses four questions from the comments section. 
additionally, it discusses the purpose of a confusion matrix in evaluating classification models, highlighting its usage for known true values in test data and the inclusion of four key terms: true positives, false positives, true negatives, and false negatives.', 'chapters': [{'end': 54.456, 'start': 0.981, 'title': 'Understanding confusion matrix', 'summary': 'Discusses the importance of confusion matrix in evaluating classification model performance, with a blog post about it being viewed half a million times, and the promise to provide a more in-depth explanation and address four questions from the comments section.', 'duration': 53.475, 'highlights': ["Kevin's blog post about the confusion matrix has been viewed about half a million times.", 'The confusion matrix is a critical tool for evaluating the performance of a classification model.', 'The video promises a more in-depth explanation of the confusion matrix and related terminology.', 'Kevin will address four questions from the comments section at the end of the video.']}, {'end': 324.538, 'start': 54.456, 'title': 'Understanding confusion matrix', 'summary': 'Discusses the purpose of a confusion matrix in evaluating classification models, emphasizing that it is used for known true values in test data and consists of four key terms: true positives, false positives, true negatives, and false negatives.', 'duration': 270.082, 'highlights': ['The confusion matrix is used only for classification models, not for regression models, and can be computed when the true values are known. It is emphasized that the confusion matrix is exclusive to classification models and cannot be used for regression models. Additionally, it can only be computed when the true values are known, such as when using evaluation procedures like train-test-split.', 'The confusion matrix layout typically does not include labels and follows a specific order in different software, as seen in scikit-learn for Python. The layout of the confusion matrix typically does not include labels in software output and follows a specific order, such as in scikit-learn for Python, where the actual labels are on the left and the predicted labels are on the top, in alphabetical order.', 'The confusion matrix consists of four terms: true positives, false positives, true negatives, and false negatives, which are derived from the predictions and actual values. 
The chapter explains that the confusion matrix contains four key terms: true positives, false positives, true negatives, and false negatives, which are derived from the predictions and actual values in the matrix.']}], 'duration': 323.557, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY981.jpg', 'highlights': ["Kevin's blog post about the confusion matrix has been viewed about half a million times.", 'The confusion matrix is a critical tool for evaluating the performance of a classification model.', 'The video promises a more in-depth explanation of the confusion matrix and related terminology.', 'Kevin will address four questions from the comments section at the end of the video.', 'The confusion matrix is used only for classification models, not for regression models, and can be computed when the true values are known.', 'The confusion matrix layout typically does not include labels and follows a specific order in different software, as seen in scikit-learn for Python.', 'The confusion matrix consists of four terms: true positives, false positives, true negatives, and false negatives, which are derived from the predictions and actual values.']}, {'end': 800.133, 'segs': [{'end': 356.574, 'src': 'embed', 'start': 324.558, 'weight': 0, 'content': [{'end': 329.641, 'text': 'To start, these 100 are called the true positives.', 'start': 324.558, 'duration': 5.083}, {'end': 339.027, 'text': 'So these are cases in which the actual value was yes and the predicted value was yes.', 'start': 330.402, 'duration': 8.625}, {'end': 340.788, 'text': 'These are called true positives.', 'start': 339.067, 'duration': 1.721}, {'end': 342.129, 'text': 'There were 100 true positives.', 'start': 341.349, 'duration': 0.78}, {'end': 345.454, 'text': 'These are called true negatives.', 'start': 343.711, 'duration': 1.743}, {'end': 352.246, 'text': 'These are cases in which the actual value is no and the predicted value was no.', 'start': 346.355, 'duration': 5.891}, {'end': 356.574, 'text': "So both of these are numbers that you'll always want to maximize.", 'start': 352.907, 'duration': 3.667}], 'summary': '100 true positives and true negatives, maximize both.', 'duration': 32.016, 'max_score': 324.558, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY324558.jpg'}, {'end': 452.373, 'src': 'embed', 'start': 422.351, 'weight': 3, 'content': [{'end': 424.433, 'text': 'These are called false negatives.', 'start': 422.351, 'duration': 2.082}, {'end': 431.821, 'text': 'These are cases in which the actual value was yes, but the predicted value was no.', 'start': 424.934, 'duration': 6.887}, {'end': 437.548, 'text': 'So again, we insert the word predicted between false and negative to remember it.', 'start': 432.402, 'duration': 5.146}, {'end': 444.27, 'text': 'And false negatives are when you falsely predicted the negative class.', 'start': 438.027, 'duration': 6.243}, {'end': 452.373, 'text': 'You predicted the negative class and you were incorrect, okay? 
These are known as type two errors.', 'start': 444.87, 'duration': 7.503}], 'summary': 'False negatives occur when predicting negative class incorrectly.', 'duration': 30.022, 'max_score': 422.351, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY422351.jpg'}, {'end': 739.768, 'src': 'embed', 'start': 674.15, 'weight': 1, 'content': [{'end': 679.812, 'text': "They're the concrete, hard numbers that we will use to choose between models.", 'start': 674.15, 'duration': 5.662}, {'end': 690.957, 'text': "The first rate we'll talk about is accuracy, and it answers the question overall how often is the classifier correct?", 'start': 681.393, 'duration': 9.564}, {'end': 705.549, 'text': 'So what we do to compute that is, we add the true negatives and the true positives 50 plus 100, and then we divide it by the total, which was 165..', 'start': 691.798, 'duration': 13.751}, {'end': 709.632, 'text': "So our accuracy, the classifier's accuracy, is .91 or 91%.", 'start': 705.549, 'duration': 4.083}, {'end': 710.493, 'text': "So that's the accuracy.", 'start': 709.632, 'duration': 0.861}, {'end': 723.219, 'text': 'Now the opposite of accuracy is misclassification rate, also known as error rate.', 'start': 716.656, 'duration': 6.563}, {'end': 725.781, 'text': 'You can do it one of two ways.', 'start': 724.24, 'duration': 1.541}, {'end': 739.768, 'text': 'Either you can compute it by saying 1 minus accuracy, and we would say our misclassification rate or our error rate is 9% or 0.09.', 'start': 726.401, 'duration': 13.367}], 'summary': "Classifier's accuracy is 91%, with a misclassification rate of 9%.", 'duration': 65.618, 'max_score': 674.15, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY674150.jpg'}, {'end': 800.133, 'src': 'embed', 'start': 773.192, 'weight': 2, 'content': [{'end': 785.28, 'text': 'but the accuracy is basically just computed by adding up all the numbers along the diagonal and then again dividing it by the total number of predictions.', 'start': 773.192, 'duration': 12.088}, {'end': 795.909, 'text': 'So accuracy and error rate are model evaluation metrics that extend naturally to multi-class cases.', 'start': 785.92, 'duration': 9.989}, {'end': 800.133, 'text': "That's not the case for all the rates we're going to talk about.", 'start': 796.35, 'duration': 3.783}], 'summary': 'Accuracy and error rate are model evaluation metrics that extend naturally to multi-class cases.', 'duration': 26.941, 'max_score': 773.192, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY773192.jpg'}], 'start': 324.558, 'title': 'Model evaluation metrics', 'summary': 'Explains the concepts of true positives, true negatives, false positives, and false negatives, emphasizing their importance in maximizing correct predictions in a classification model. 
it also covers the significance of model evaluation metrics such as accuracy and misclassification rate, providing examples of how these metrics are computed and their relevance to multi-class problems.', 'chapters': [{'end': 512.433, 'start': 324.558, 'title': 'Understanding true positives and false negatives', 'summary': 'Explains the concepts of true positives, true negatives, false positives, and false negatives, emphasizing their importance in maximizing correct predictions in a classification model, and highlighting the distinction between type one and type two errors, along with the clarification that these are whole number counts, not rates or percentages.', 'duration': 187.875, 'highlights': ['The chapter emphasizes the importance of maximizing true positives and true negatives in a classification model, with 100 true positives and true negatives being the ideal scenario.', 'It explains the concept of false positives as cases where the actual value is no, but the predicted value is yes, serving as type one errors.', 'The distinction between false negatives as cases where the actual value is yes, but the predicted value is no, serving as type two errors, is highlighted, with the reminder that these are whole number counts, not rates or percentages.']}, {'end': 800.133, 'start': 513.45, 'title': 'Understanding model evaluation metrics', 'summary': 'Explains the significance of true negative, true positive, and model evaluation metrics such as accuracy and misclassification rate, providing an example of how these metrics are computed and their relevance to multi-class problems.', 'duration': 286.683, 'highlights': ['The chapter explains the significance of true negative, true positive, and model evaluation metrics such as accuracy and misclassification rate. It discusses the importance of true negative, true positive, and introduces model evaluation metrics like accuracy and misclassification rate.', 'The accuracy of the classifier is 91% and the misclassification rate is 9%. The accuracy of the classifier is computed as 91% by adding the true negatives and true positives and dividing by the total, while the misclassification rate is 9%.', 'The accuracy and error rate are model evaluation metrics that extend naturally to multi-class cases. The accuracy and error rate are discussed as model evaluation metrics that naturally extend to multi-class cases, computed by adding up the numbers along the diagonal and dividing by the total number of predictions.']}], 'duration': 475.575, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY324558.jpg', 'highlights': ['The chapter emphasizes the importance of maximizing true positives and true negatives in a classification model, with 100 true positives and true negatives being the ideal scenario.', 'The accuracy of the classifier is 91% and the misclassification rate is 9%. The accuracy of the classifier is computed as 91% by adding the true negatives and true positives and dividing by the total, while the misclassification rate is 9%.', 'The accuracy and error rate are model evaluation metrics that extend naturally to multi-class cases. 
The accuracy and error rate are discussed as model evaluation metrics that naturally extend to multi-class cases, computed by adding up the numbers along the diagonal and dividing by the total number of predictions.', 'It explains the concept of false positives as cases where the actual value is no, but the predicted value is yes, serving as type one errors.', 'The distinction between false negatives as cases where the actual value is yes, but the predicted value is no, serving as type two errors, is highlighted, with the reminder that these are whole number counts, not rates or percentages.', 'The chapter explains the significance of true negative, true positive, and model evaluation metrics such as accuracy and misclassification rate. It discusses the importance of true negative, true positive, and introduces model evaluation metrics like accuracy and misclassification rate.']}, {'end': 1150.541, 'segs': [{'end': 830.512, 'src': 'embed', 'start': 801.823, 'weight': 0, 'content': [{'end': 811.568, 'text': "The next rate I want to talk about is called the true positive rate, and it answers the question when it's actually yes,", 'start': 801.823, 'duration': 9.745}, {'end': 814.51, 'text': 'how often does it predict yes?', 'start': 811.568, 'duration': 2.942}, {'end': 818.952, 'text': "So when it's actually yes, so this row 105 cases.", 'start': 816.011, 'duration': 2.941}, {'end': 824.228, 'text': 'how often does it predict yes? and 100..', 'start': 818.952, 'duration': 5.276}, {'end': 826.75, 'text': 'So the true positive rate is 100 divided by 105, which is about 95%.', 'start': 824.228, 'duration': 2.522}, {'end': 830.512, 'text': 'Now true positive rate is also known as sensitivity.', 'start': 826.75, 'duration': 3.762}], 'summary': 'The true positive rate, also known as sensitivity, is about 95%.', 'duration': 28.689, 'max_score': 801.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY801823.jpg'}, {'end': 938.554, 'src': 'heatmap', 'start': 801.823, 'weight': 2, 'content': [{'end': 811.568, 'text': "The next rate I want to talk about is called the true positive rate, and it answers the question when it's actually yes,", 'start': 801.823, 'duration': 9.745}, {'end': 814.51, 'text': 'how often does it predict yes?', 'start': 811.568, 'duration': 2.942}, {'end': 818.952, 'text': "So when it's actually yes, so this row 105 cases.", 'start': 816.011, 'duration': 2.941}, {'end': 824.228, 'text': 'how often does it predict yes? 
and 100..', 'start': 818.952, 'duration': 5.276}, {'end': 826.75, 'text': 'So the true positive rate is 100 divided by 105, which is about 95%.', 'start': 824.228, 'duration': 2.522}, {'end': 830.512, 'text': 'Now true positive rate is also known as sensitivity.', 'start': 826.75, 'duration': 3.762}, {'end': 831.732, 'text': "That's often a scientific term.", 'start': 830.532, 'duration': 1.2}, {'end': 851.221, 'text': "True positive rate is also known as recall, and that's a term that's often used in the machine learning field.", 'start': 844.179, 'duration': 7.042}, {'end': 858.282, 'text': 'So true positive rate, sensitivity, recall, all of those are equivalent.', 'start': 851.701, 'duration': 6.581}, {'end': 864.464, 'text': 'The next term to talk about is called false positive rate.', 'start': 860.923, 'duration': 3.541}, {'end': 872.992, 'text': "And it answers the question when it's actually no, how often does it predict yes??", 'start': 865.569, 'duration': 7.423}, {'end': 879.774, 'text': "So when it's actually no, 60 cases, how often does it predict yes??", 'start': 873.832, 'duration': 5.942}, {'end': 881.455, 'text': '10 cases?', 'start': 880.915, 'duration': 0.54}, {'end': 893.179, 'text': 'So the false positive rate is 10 divided by 60 which is about 0.17, okay? So that is false positive rate.', 'start': 881.935, 'duration': 11.244}, {'end': 900.053, 'text': 'The next term to talk about is called true negative rate.', 'start': 895.29, 'duration': 4.763}, {'end': 908.377, 'text': "So it answers the question when it's actually no, how often does it predict no??", 'start': 900.933, 'duration': 7.444}, {'end': 916.322, 'text': "So if we scroll up here when it's actually no, 60 cases, how often does it predict no??", 'start': 909.098, 'duration': 7.224}, {'end': 917.162, 'text': '50 cases?', 'start': 916.342, 'duration': 0.82}, {'end': 923.706, 'text': 'So 50 over 60 is about .83.', 'start': 917.182, 'duration': 6.524}, {'end': 935.933, 'text': "You may have noticed, this is equivalent to 1 minus the false positive rate, which was 0.17, and it's often known, perhaps more commonly known,", 'start': 923.706, 'duration': 12.227}, {'end': 938.554, 'text': 'as specificity, okay?', 'start': 935.933, 'duration': 2.621}], 'summary': 'True positive rate is 95%, false positive rate is 0.17, and true negative rate is about 0.83.', 'duration': 64.722, 'max_score': 801.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY801823.jpg'}, {'end': 1065.221, 'src': 'heatmap', 'start': 953.322, 'weight': 1, 'content': [{'end': 957.504, 'text': "and that's probably a more common term than true negative rate.", 'start': 953.322, 'duration': 4.182}, {'end': 971.909, 'text': 'The final term I want to talk about before we get into questions is the term precision, which answers the question when it predicts yes,', 'start': 959.344, 'duration': 12.565}, {'end': 973.87, 'text': 'how often is it correct??', 'start': 971.909, 'duration': 1.961}, {'end': 976.231, 'text': 'So scroll back up here.', 'start': 974.75, 'duration': 1.481}, {'end': 985.047, 'text': 'When it predicts yes, 110 cases, how often is it correct? 
100 of those cases.', 'start': 978.101, 'duration': 6.946}, {'end': 991.914, 'text': "So it's 100 divided by 110, which is about .91.", 'start': 985.568, 'duration': 6.346}, {'end': 999.541, 'text': 'Now this is the first rate that we have talked about in which the denominator is one of these numbers.', 'start': 991.914, 'duration': 7.627}, {'end': 1001.764, 'text': 'instead of these numbers.', 'start': 1000.021, 'duration': 1.743}, {'end': 1005.67, 'text': 'So it can be a little bit confusing to keep track of, I know.', 'start': 1002.104, 'duration': 3.566}, {'end': 1018.601, 'text': 'But again, precision answers the question, when it predicts yes, how often is it correct? So 100 out of 110.', 'start': 1006.17, 'duration': 12.431}, {'end': 1022.723, 'text': "Now I'm gonna go ahead and move down to the questions.", 'start': 1018.601, 'duration': 4.122}, {'end': 1027.625, 'text': "I'm gonna skip prevalence and then some other terms that I've listed down here.", 'start': 1023.163, 'duration': 4.462}, {'end': 1030.806, 'text': "You can read about those if you're interested.", 'start': 1028.105, 'duration': 2.701}, {'end': 1032.227, 'text': "They're not quite as common.", 'start': 1030.866, 'duration': 1.361}, {'end': 1037.089, 'text': 'I do wanna say one quick thing about the ROC curve.', 'start': 1032.247, 'duration': 4.842}, {'end': 1047.574, 'text': 'So the ROC curve is generated by plotting the true positive rate against the false positive rate,', 'start': 1038.109, 'duration': 9.465}, {'end': 1054.897, 'text': 'as you vary the threshold for assigning observations to a given class.', 'start': 1048.434, 'duration': 6.463}, {'end': 1065.221, 'text': "So I mentioned the ROC curve, number one, because it's a concept I really enjoy, and number two, because it's so common.", 'start': 1055.657, 'duration': 9.564}], 'summary': 'Precision rate is about 0.91 for predicting yes; roc curve is a common concept.', 'duration': 87.12, 'max_score': 953.322, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY953322.jpg'}, {'end': 1117.594, 'src': 'embed', 'start': 1093.449, 'weight': 5, 'content': [{'end': 1103.992, 'text': 'There is another similar curve called the precision recall curve, which is, as you might guess, a plot of precision versus recall.', 'start': 1093.449, 'duration': 10.543}, {'end': 1109.349, 'text': 'And again, that cannot be computed from the confusion matrix.', 'start': 1104.727, 'duration': 4.622}, {'end': 1117.594, 'text': 'So the ROC curve and the precision recall curve, those are not model evaluation metrics.', 'start': 1109.79, 'duration': 7.804}], 'summary': 'Precision-recall and roc curves are not model evaluation metrics.', 'duration': 24.145, 'max_score': 1093.449, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1093449.jpg'}], 'start': 801.823, 'title': 'Understanding model evaluation metrics', 'summary': 'Discusses the precision rate, roc curve, and precision recall curve, clarifies their role as model evaluation metrics, and outlines limitations in computing these metrics from the confusion matrix. 
it also highlights the importance of true positive, false positive, and true negative rates, with the true positive rate at 95% and the false positive rate at 0.17.', 'chapters': [{'end': 976.231, 'start': 801.823, 'title': 'Understanding true positive and false positive rates', 'summary': 'Explains the concepts of true positive, false positive, and true negative rates, with the true positive rate being 95% and the false positive rate being 0.17, and also introduces the term precision.', 'duration': 174.408, 'highlights': ["The true positive rate is 100 divided by 105, which is about 95%. This indicates that when it's actually yes, the prediction of yes occurs about 95% of the time.", "The false positive rate is 10 divided by 60 which is about 0.17. This reveals that when it's actually no, the prediction of yes occurs about 17% of the time.", "The true negative rate is 50 over 60, which is about 0.83. This illustrates that when it's actually no, the prediction of no occurs about 83% of the time, or 1 minus the false positive rate.", 'The term precision answers the question when it predicts yes, how often is it correct? This introduces the concept of precision, which evaluates the accuracy of the positive predictions.']}, {'end': 1150.541, 'start': 978.101, 'title': 'Understanding model evaluation metrics', 'summary': 'Explains the precision rate and its calculation, touches on the roc curve and precision recall curve, and clarifies that they are not model evaluation metrics. it also mentions the limitations in computing the roc curve and the precision recall curve from the confusion matrix.', 'duration': 172.44, 'highlights': ['The precision rate answers the question, when it predicts yes, how often is it correct, with a calculation of 100 out of 110, which is about .91. The precision rate is calculated as 100 out of 110 cases, resulting in a precision rate of about .91.', 'The chapter explains the ROC curve and its purpose in plotting the true positive rate against the false positive rate, highlighting its commonality in model evaluation. The ROC curve is a common concept and is used to plot the true positive rate against the false positive rate to evaluate models.', 'The chapter mentions the precision recall curve, a similar curve to the ROC curve, and clarifies that both cannot be computed from the confusion matrix, and are not model evaluation metrics. The precision recall curve, like the ROC curve, cannot be computed from the confusion matrix and is not a model evaluation metric.']}], 'duration': 348.718, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY801823.jpg', 'highlights': ["The true positive rate is 100 divided by 105, which is about 95%. This indicates that when it's actually yes, the prediction of yes occurs about 95% of the time.", 'The precision rate answers the question, when it predicts yes, how often is it correct, with a calculation of 100 out of 110, which is about .91.', "The true negative rate is 50 over 60, which is about 0.83. This illustrates that when it's actually no, the prediction of no occurs about 83% of the time, or 1 minus the false positive rate.", "The false positive rate is 10 divided by 60 which is about 0.17. 
This reveals that when it's actually no, the prediction of yes occurs about 17% of the time.", 'The chapter explains the ROC curve and its purpose in plotting the true positive rate against the false positive rate, highlighting its commonality in model evaluation.', 'The precision recall curve, like the ROC curve, cannot be computed from the confusion matrix and is not a model evaluation metric.', 'The term precision answers the question when it predicts yes, how often is it correct? This introduces the concept of precision, which evaluates the accuracy of the positive predictions.']}, {'end': 1758.46, 'segs': [{'end': 1292.614, 'src': 'embed', 'start': 1262.213, 'weight': 1, 'content': [{'end': 1268.417, 'text': "So I'm going to show you in a second how to do this for three classes.", 'start': 1262.213, 'duration': 6.204}, {'end': 1279.31, 'text': 'but just to be clear, if you have a binary problem, meaning one with two classes, and someone asks you for the recall or the precision,', 'start': 1268.417, 'duration': 10.893}, {'end': 1285.032, 'text': "they are asking you about the recall and the precision for the positive class, and that's the convention.", 'start': 1279.31, 'duration': 5.722}, {'end': 1292.614, 'text': 'But you can compute the recall and precision for any class, okay?', 'start': 1285.472, 'duration': 7.142}], 'summary': 'Explaining how to calculate recall and precision for multiple classes', 'duration': 30.401, 'max_score': 1262.213, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1262213.jpg'}, {'end': 1437.366, 'src': 'heatmap', 'start': 1379.624, 'weight': 0, 'content': [{'end': 1387.569, 'text': 'so this is the recall for class 0, when the true value is 0, how often did it predict 0?', 'start': 1379.624, 'duration': 7.945}, {'end': 1397.987, 'text': '100% of the time, which is why the class 0 recall is 1 or 100%.', 'start': 1387.569, 'duration': 10.418}, {'end': 1405.13, 'text': 'The precision is when it predicted zero, how often was it correct?', 'start': 1397.987, 'duration': 7.143}, {'end': 1416.274, 'text': 'And the answer is 50% of the time, which is why the precision for class zero is 50%.', 'start': 1406.09, 'duration': 10.184}, {'end': 1426.639, 'text': 'And you can see that you can use that same exact process to compute the precision and recall for each of the three classes.', 'start': 1416.274, 'duration': 10.365}, {'end': 1437.366, 'text': 'This is one reason that precision and recall are really popular for multi-class problems in machine learning,', 'start': 1427.779, 'duration': 9.587}], 'summary': 'The recall for class 0 is 100%, precision is 50%, applicable to all three classes in multi-class problems.', 'duration': 39.379, 'max_score': 1379.624, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1379624.jpg'}, {'end': 1563.022, 'src': 'embed', 'start': 1537.733, 'weight': 3, 'content': [{'end': 1542.815, 'text': "and i'll just say true value zero one, two, three, four, five, six, seven, eight, nine.", 'start': 1537.733, 'duration': 5.082}, {'end': 1548.178, 'text': 'and then the predicted values zero one, two, three, four, five, six, seven, eight, nine.', 'start': 1542.815, 'duration': 5.363}, {'end': 1556.84, 'text': 'okay, And with scikit-learn they always just put those in alphabetical order or numerical order, starting in the top left.', 'start': 1548.178, 'duration': 8.662}, {'end': 1563.022, 'text': 'So you can see that the diagonals are 
all the correct predictions.', 'start': 1557.841, 'duration': 5.181}], 'summary': 'In scikit-learn, labels are ordered numerically and the diagonal holds the correct predictions.', 'duration': 25.289, 'max_score': 1537.733, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1537733.jpg'}, {'end': 1628.51, 'src': 'embed', 'start': 1599.77, 'weight': 4, 'content': [{'end': 1609.457, 'text': 'represents five cases in which the actual, the true value was three and the predicted value was eight.', 'start': 1599.77, 'duration': 9.687}, {'end': 1618.744, 'text': 'Now that was the most common error, and it makes intuitive sense, because a three looks a lot like an eight, okay?', 'start': 1610.098, 'duration': 8.646}, {'end': 1628.51, 'text': "Now, conversely, let's look at the case of 8s, which is this row being predicted as 3s.", 'start': 1619.805, 'duration': 8.705}], 'summary': 'Five cases with actual value 3 were predicted as 8; conversely, no 8s were misread as 3s.', 'duration': 28.74, 'max_score': 1599.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1599770.jpg'}, {'end': 1699.283, 'src': 'embed', 'start': 1659.774, 'weight': 5, 'content': [{'end': 1672.984, 'text': "So if you look at the classification report for this particular model, you'll see that threes have the worst recall.", 'start': 1659.774, 'duration': 13.21}, {'end': 1675.226, 'text': 'okay, which makes sense.', 'start': 1672.984, 'duration': 2.242}, {'end': 1681.113, 'text': 'whereas eights have perfect recall.', 'start': 1677.571, 'duration': 3.542}, {'end': 1694.12, 'text': 'And we knew that without even looking at the classification report, because threes have all these errors, whereas eights have no errors.', 'start': 1681.973, 'duration': 12.147}, {'end': 1699.283, 'text': 'okay?. All right, hope that was helpful.', 'start': 1694.12, 'duration': 5.163}], 'summary': "Threes have the worst recall, eights have perfect recall in the model's classification report.", 'duration': 39.509, 'max_score': 1659.774, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1659774.jpg'}], 'start': 1151.202, 'title': 'Calculating precision and recall for multiple classes', 'summary': "Explains how to calculate precision and recall for multiple classes using a confusion matrix, demonstrating the process for a three-class problem and emphasizing the ability to compute precision and recall for every single class. it also illustrates a 10-class confusion matrix for recognizing handwritten digits, showing the model's performance with an example of the most common error and the impact of recall on different digits.", 'chapters': [{'end': 1466.272, 'start': 1151.202, 'title': 'Calculating precision and recall for multiple classes', 'summary': 'Explains how to calculate precision and recall for multiple classes using a confusion matrix, demonstrating the process for a three-class problem and emphasizing the ability to compute precision and recall for every single class, which is why precision and recall are popular for multi-class problems in machine learning.', 'duration': 315.07, 'highlights': ['Demonstrates the process of calculating precision and recall for a three-class problem using a confusion matrix and emphasizes the ability to compute precision and recall for every single class. 
The explanation provides a practical demonstration of calculating precision and recall for a three-class problem, emphasizing the capability to compute these metrics for each class.', 'Explains the convention of calculating precision and recall for the positive class in a binary problem, but highlights that precision and recall can be computed for any class in the problem. Emphasizes the convention of calculating precision and recall for the positive class in a binary problem and highlights the flexibility to compute these metrics for any class.', 'Clarifies the process of computing recall and precision for each class in a three-class scenario and mentions the popularity of precision and recall for multi-class problems in machine learning. Provides clarity on the process of computing recall and precision for each class in a three-class scenario and highlights the popularity of these metrics for multi-class problems in machine learning.']}, {'end': 1758.46, 'start': 1467.633, 'title': 'Understanding confusion matrix in multi-class classification', 'summary': "Illustrates a 10-class confusion matrix for recognizing handwritten digits, showing the model's performance with an example of the most common error and the impact of recall on different digits.", 'duration': 290.827, 'highlights': ["The model's correct predictions are indicated by the diagonals in the confusion matrix, demonstrating its overall good performance. The confusion matrix's diagonals represent the correct predictions, indicating the model's strong performance overall.", "The example of the most common error, where true value was 3 and predicted value was 8, illustrates the model's difficulty in distinguishing between these digits. The model's most common error occurs when predicting the value 8 for a true value of 3, suggesting the model struggles to differentiate between these digits.", 'The impact of recall on different digits is discussed, highlighting that threes have the worst recall while eights have perfect recall, indicating the varying accuracy in predicting different digits. 
The discussion emphasizes that threes have the worst recall while eights have perfect recall, showcasing the varying accuracy in predicting different digits.']}], 'duration': 607.258, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1151202.jpg', 'highlights': ['Emphasizes the capability to compute precision and recall for each class.', 'Highlights the flexibility to compute precision and recall for any class.', 'Provides clarity on computing recall and precision for each class in a multi-class scenario.', "The confusion matrix's diagonals represent the correct predictions, indicating the model's strong performance overall.", "The model's most common error occurs when predicting the value 8 for a true value of 3, suggesting the model struggles to differentiate between these digits.", 'Emphasizes the varying accuracy in predicting different digits, with threes having the worst recall and eights having perfect recall.']}, {'end': 2123.714, 'segs': [{'end': 1880.059, 'src': 'embed', 'start': 1820.385, 'weight': 0, 'content': [{'end': 1832.309, 'text': "So ultimately we're making a value judgment about which errors to minimize and choosing an appropriate metric to match that judgment.", 'start': 1820.385, 'duration': 11.924}, {'end': 1843.173, 'text': 'Now, just to give you a different example if you were building a fraudulent transaction detector in which the positive class is fraud,', 'start': 1833.389, 'duration': 9.784}, {'end': 1846.774, 'text': 'you might optimize instead for sensitivity.', 'start': 1843.173, 'duration': 3.601}, {'end': 1856.422, 'text': 'because false positives, meaning normal transactions flagged as possible fraud, are more acceptable well,', 'start': 1847.654, 'duration': 8.768}, {'end': 1865.449, 'text': 'at least depending on your viewpoint than false negatives, fraudulent transactions that are not detected.', 'start': 1856.422, 'duration': 9.027}, {'end': 1867.211, 'text': "so again, you're,", 'start': 1865.449, 'duration': 1.762}, {'end': 1880.059, 'text': "you're looking at your business objective and you're choosing the appropriate evaluation metric based upon what type of errors you are trying to minimize.", 'start': 1867.211, 'duration': 12.848}], 'summary': 'Choosing appropriate evaluation metric based on business objective and type of errors to minimize.', 'duration': 59.674, 'max_score': 1820.385, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1820385.jpg'}, {'end': 1979.202, 'src': 'embed', 'start': 1947.452, 'weight': 3, 'content': [{'end': 1958.004, 'text': 'If you built something for them and it had 90% accuracy, they would probably fire you because 90% accuracy is not good.', 'start': 1947.452, 'duration': 10.552}, {'end': 1964.931, 'text': "Why is it not good? 
Because let's say that 1% of transactions are fraudulent.", 'start': 1958.544, 'duration': 6.387}, {'end': 1974.198, 'text': "you can get a 99% accuracy without any work at all by always predicting that it's not fraud.", 'start': 1965.772, 'duration': 8.426}, {'end': 1979.202, 'text': 'So again, this is known as class imbalance.', 'start': 1974.759, 'duration': 4.443}], 'summary': 'A 90% accuracy may lead to dismissal due to class imbalance in fraud detection.', 'duration': 31.75, 'max_score': 1947.452, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1947452.jpg'}, {'end': 2069.583, 'src': 'embed', 'start': 2038.494, 'weight': 5, 'content': [{'end': 2049.126, 'text': "But 60% would be an amazing accuracy for a model that predicts whether a stock will go up the next day or whether it'll go up or down.", 'start': 2038.494, 'duration': 10.632}, {'end': 2059.235, 'text': 'So there is no objective way in advance to say what a good accuracy is for all problems.', 'start': 2049.987, 'duration': 9.248}, {'end': 2069.583, 'text': "Instead, you're looking at the specifics of the problem and how hard it is and how much class imbalance there is.", 'start': 2059.735, 'duration': 9.848}], 'summary': 'Model accuracy of 60% is considered amazing for predicting stock movement, depending on problem specifics and class imbalance.', 'duration': 31.089, 'max_score': 2038.494, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY2038494.jpg'}], 'start': 1759.705, 'title': 'Evaluation metrics in machine learning', 'summary': 'Discusses the significance of choosing appropriate evaluation metrics in machine learning to minimize specific errors, emphasizing the impact of false negatives and false positives on spam filters and fraudulent transaction detectors, as well as the limitations of generic accuracy thresholds in model evaluation, with examples showcasing the impact of class imbalance and specific problem contexts.', 'chapters': [{'end': 1880.059, 'start': 1759.705, 'title': 'Choosing evaluation metrics in machine learning', 'summary': 'Discusses the importance of choosing appropriate evaluation metrics based on the type of errors to minimize in machine learning, emphasizing the value judgment in optimizing for precision or sensitivity, with examples showing the impact of false negatives and false positives on spam filters and fraudulent transaction detectors.', 'duration': 120.354, 'highlights': ['Choosing appropriate evaluation metrics is crucial in machine learning, as it involves making value judgments about which errors to minimize and selecting the corresponding metric to match that judgment. Emphasizes the significance of making value judgments in selecting evaluation metrics, showcasing the importance of minimizing specific errors in machine learning.', 'The impact of false negatives and false positives is illustrated through examples of spam filters and fraudulent transaction detectors, demonstrating the consequences of each type of error on different business objectives. Provides examples of spam filters and fraudulent transaction detectors to illustrate the consequences of false negatives and false positives, emphasizing the impact on various business objectives.', 'Optimizing for precision or sensitivity depends on the specific business objective, as highlighted in the examples of spam filters and fraudulent transaction detectors. 
Emphasizes the importance of aligning the choice of evaluation metric with the specific business objective, using examples to demonstrate the relevance of optimizing for precision or sensitivity.']}, {'end': 2123.714, 'start': 1880.079, 'title': 'Accuracy in model evaluation', 'summary': 'Discusses the limitations of using generic accuracy thresholds for model evaluation, highlighting the impact of class imbalance and specific problem contexts, with examples of a fraudulent transaction detector and stock prediction model.', 'duration': 243.635, 'highlights': ['The relevance of generic accuracy thresholds is limited due to the impact of class imbalance and specific problem contexts, as seen in examples of a fraudulent transaction detector and stock prediction model. ', 'The example of a fraudulent transaction detector illustrates that a 90% accuracy may be insufficient if only 1% of transactions are fraudulent, highlighting the impact of class imbalance on model evaluation. 90%', "In the context of a stock prediction model, an accuracy of 60% could be considered exceptional, demonstrating the variability of what constitutes 'good' accuracy based on specific problem contexts. 60%"]}], 'duration': 364.009, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8Oog7TXHvFY/pics/8Oog7TXHvFY1759705.jpg', 'highlights': ['Choosing appropriate evaluation metrics is crucial in machine learning, emphasizing the significance of making value judgments in selecting evaluation metrics.', 'The impact of false negatives and false positives is illustrated through examples of spam filters and fraudulent transaction detectors, emphasizing the consequences of each type of error on different business objectives.', 'Optimizing for precision or sensitivity depends on the specific business objective, as highlighted in the examples of spam filters and fraudulent transaction detectors.', 'The relevance of generic accuracy thresholds is limited due to the impact of class imbalance and specific problem contexts, as seen in examples of a fraudulent transaction detector and stock prediction model.', 'The example of a fraudulent transaction detector illustrates that a 90% accuracy may be insufficient if only 1% of transactions are fraudulent, highlighting the impact of class imbalance on model evaluation.', "In the context of a stock prediction model, an accuracy of 60% could be considered exceptional, demonstrating the variability of what constitutes 'good' accuracy based on specific problem contexts."]}], 'highlights': ['The confusion matrix is a critical tool for evaluating the performance of a classification model.', 'The video promises a more in-depth explanation of the confusion matrix and related terminology.', 'The confusion matrix consists of four terms: true positives, false positives, true negatives, and false negatives, which are derived from the predictions and actual values.', 'The chapter emphasizes the importance of maximizing true positives and true negatives in a classification model, with 100 true positives and true negatives being the ideal scenario.', 'The accuracy of the classifier is 91% and the misclassification rate is 9%.', 'The distinction between false negatives as cases where the actual value is yes, but the predicted value is no, serving as type two errors, is highlighted, with the reminder that these are whole number counts, not rates or percentages.', 'The true positive rate is 100 divided by 105, which is about 95%.', 'The precision rate answers the question, when it 
predicts yes, how often is it correct, with a calculation of 100 out of 110, which is about .91.', 'The true negative rate is 50 over 60, which is about 0.83.', 'The false positive rate is 10 divided by 60 which is about 0.17.', "The confusion matrix's diagonals represent the correct predictions, indicating the model's strong performance overall.", "The model's most common error occurs when predicting the value 8 for a true value of 3, suggesting the model struggles to differentiate between these digits.", 'Choosing appropriate evaluation metrics is crucial in machine learning, emphasizing the significance of making value judgments in selecting evaluation metrics.', 'The impact of false negatives and false positives is illustrated through examples of spam filters and fraudulent transaction detectors, emphasizing the consequences of each type of error on different business objectives.', 'Optimizing for precision or sensitivity depends on the specific business objective, as highlighted in the examples of spam filters and fraudulent transaction detectors.', 'The relevance of generic accuracy thresholds is limited due to the impact of class imbalance and specific problem contexts, as seen in examples of a fraudulent transaction detector and stock prediction model.']}