title
XGBoost Part 2 (of 4): Classification
description
In this video we pick up where we left off in part 1 and cover how XGBoost trees are built for Classification.
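The tree-building steps the video walks through (similarity score, gain, cover and pruning with gamma) can be sketched in plain Python. This is a minimal sketch that follows the formulas as presented in the video; it is not XGBoost's implementation. It assumes the video's setup: an initial prediction of 0.5, so every residual is +0.5 or -0.5, and lambda = 0 unless stated. The dosages 2, 8, 12 and 18 are hypothetical. They were chosen only so that the midpoints between neighbouring dosages give the thresholds 5, 10 and 15 used in the video. The labels (not effective, effective, effective, not effective) are the arrangement that reproduces the gains the video reports. The helper names are illustrative.

```python
# Minimal sketch of one XGBoost classification split, following the video's formulas.
# Hypothetical data: the dosages are illustrative, chosen so that the midpoints
# between neighbouring dosages are the thresholds 5, 10 and 15 used in the video.
DOSAGES = [2, 8, 12, 18]
LABELS = [0, 1, 1, 0]  # 1 = drug effective, 0 = not effective
PREV_PROB = 0.5        # initial predicted probability for every observation


def leaf(idx):
    """(residual, previous probability) pairs for the observations in idx."""
    return [(LABELS[i] - PREV_PROB, PREV_PROB) for i in idx]


def cover(pairs):
    """Sum of p * (1 - p): the similarity-score denominator minus lambda."""
    return sum(p * (1 - p) for _, p in pairs)


def similarity(pairs, lam=0.0):
    """(Sum of residuals) squared, divided by (cover + lambda)."""
    return sum(r for r, _ in pairs) ** 2 / (cover(pairs) + lam)


def split_gain(idx, threshold, lam=0.0):
    """Left similarity + right similarity - parent similarity for dosage < threshold."""
    left = [i for i in idx if DOSAGES[i] < threshold]
    right = [i for i in idx if DOSAGES[i] >= threshold]
    return (similarity(leaf(left), lam) + similarity(leaf(right), lam)
            - similarity(leaf(idx), lam))


def allowed(idx, min_child_weight=1.0):
    """A leaf is allowed only if its cover reaches the minimum cover."""
    return cover(leaf(idx)) >= min_child_weight


def keep_branch(gain, gamma):
    """Keep a branch when gain - gamma is positive; otherwise prune it."""
    return gain - gamma > 0


root = [0, 1, 2, 3]
left_of_15 = [0, 1, 2]
print(round(split_gain(root, 15), 2))        # 1.33
print(round(split_gain(left_of_15, 5), 2))   # 2.67 (reported as 2.66 in the video)
print(round(split_gain(left_of_15, 10), 2))  # 0.67 (reported as 0.66 in the video)

# lambda = 1 lowers the gain of the first split.
print(round(split_gain(root, 15, lam=1.0), 2))  # 0.34

# With the default minimum cover of 1, none of these leaves is allowed.
print(allowed([0]), allowed([1, 2]), allowed(left_of_15))  # False False False

# Pruning the lowest branch (gain of about 2.67) with gamma = 2 and gamma = 3.
print(keep_branch(split_gain(left_of_15, 5), 2))  # True
print(keep_branch(split_gain(left_of_15, 5), 3))  # False
```

The residuals cancel in the root (similarity 0), so the gain of the first split is just the sum of the two child similarity scores. This is why the root split's gain, 1.33, is smaller than the gain of the lower split.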
NOTE: This StatQuest assumes that you are already familiar with...
XGBoost Part 1: XGBoost Trees for Regression: https://youtu.be/OtD8wVaFm6E
...the main ideas behind Gradient Boost for Classification: https://youtu.be/jxuNLH5dXCs
...Odds and Log(odds): https://youtu.be/ARfXDSkQf1Y
...and how the Logistic Function works: https://youtu.be/BfKanl1aSG0
Also note, this StatQuest is based on the following sources:
The original XGBoost manuscript: https://arxiv.org/pdf/1603.02754.pdf
The original XGBoost presentation: https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
And the XGBoost Documentation: https://xgboost.readthedocs.io/en/latest/index.html
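The settings used in the worked example can be written as XGBoost training parameters. The sketch below assumes the parameter names `objective`, `eta`, `max_depth`, `min_child_weight`, `gamma`, `lambda`, `base_score` and `tree_method` as they appear in the XGBoost parameter documentation. The values only mirror the video's example (a two-level tree, minimum cover 0, lambda 0, learning rate 0.3) and are not recommendations. The documented defaults differ: `min_child_weight` and `lambda` both default to 1, and `gamma` defaults to 0. Whether the library grows exactly the video's tree may also depend on version-specific details such as how `base_score` is handled, so treat this as an assumption to check. The `X` and `y` in the usage comment are hypothetical training data.

```python
# Settings from the worked example, expressed as XGBoost parameters (a sketch).
params = {
    "objective": "binary:logistic",  # classification; trees are fit in log(odds) space
    "eta": 0.3,              # learning rate (written as epsilon in the original papers)
    "max_depth": 2,          # the example tree is limited to two levels
    "min_child_weight": 0,   # minimum cover; the example lowers it from the default of 1
    "gamma": 0,              # value subtracted from each branch's gain when pruning
    "lambda": 0,             # regularization term in the similarity denominator (default 1)
    "base_score": 0.5,       # initial predicted probability
    "tree_method": "exact",  # enumerate every candidate threshold, as in the video
}

# Hypothetical usage (requires the xgboost package; X and y are your own data):
# import xgboost
# dtrain = xgboost.DMatrix(X, label=y)
# booster = xgboost.train(params, dtrain, num_boost_round=1)
```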
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - https://statquest.gumroad.com/l/wvtmc
Paperback - https://www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - https://www.amazon.com/dp/B09ZG79HXC
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...a cool StatQuest t-shirt or sweatshirt:
https://shop.spreadshirt.com/statquest-with-josh-starmer/
...buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
https://twitter.com/joshuastarmer
Corrections:
14:24 I meant to say "larger" instead of "lower."
18:48 In the original XGBoost documents they use the epsilon symbol to refer to the learning rate, but in the actual implementation, this is controlled via the "eta" parameter. So, I guess to be consistent with the original documentation, I made the same mistake! :)
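The learning rate (eta) scales a leaf's output value before it is added to the previous prediction in log(odds) space. The following is a minimal sketch, assuming the numbers from the video: a leaf holding one residual of -0.5 whose previous predicted probability was 0.5, lambda = 0 (with lambda = 1 shown for comparison), and a learning rate of 0.3. The function names are illustrative and are not part of the XGBoost API.

```python
import math


def leaf_output(residuals, prev_probs, lam=0.0):
    """Sum of residuals / (sum of p * (1 - p) + lambda)."""
    return sum(residuals) / (sum(p * (1 - p) for p in prev_probs) + lam)


def log_odds(p):
    """Convert a probability to log(odds)."""
    return math.log(p / (1 - p))


def logistic(x):
    """Convert log(odds) back to a probability."""
    return 1 / (1 + math.exp(-x))


def updated_probability(prev_prob, output_value, eta=0.3):
    """Previous log(odds) plus eta times the leaf output, converted back to a probability."""
    return logistic(log_odds(prev_prob) + eta * output_value)


out0 = leaf_output([-0.5], [0.5], lam=0.0)  # -2.0
out1 = leaf_output([-0.5], [0.5], lam=1.0)  # -0.4, closer to 0 than -2
print(out0, round(out1, 2))                 # -2.0 -0.4
print(round(updated_probability(0.5, out0), 2))  # 0.35
```

Because log(odds) of 0.5 is 0, the new log(odds) is simply 0.3 times -2, or -0.6, and the logistic function maps that to about 0.35.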
#statquest #xgboost
detail
{'title': 'XGBoost Part 2 (of 4): Classification', 'heatmap': [{'end': 660.431, 'start': 589.346, 'weight': 0.929}, {'end': 732.333, 'start': 661.914, 'weight': 0.739}, {'end': 791.347, 'start': 767.304, 'weight': 0.841}, {'end': 882.701, 'start': 862.589, 'weight': 0.739}, {'end': 1083.011, 'start': 1060.845, 'weight': 0.707}, {'end': 1169.15, 'start': 1149.133, 'weight': 0.876}], 'summary': 'Delves into XGBoost for classification, explaining the initial prediction, building XGBoost trees for classification, XGBoost cover and leaf residuals, tree pruning, and converting output values into new predicted probabilities, featuring details like a gain of 1.33 for the threshold dosage less than 15 and a new predicted probability of 0.35.', 'chapters': [{'end': 263.925, 'segs': [{'end': 89.548, 'src': 'embed', 'start': 54.652, 'weight': 0, 'content': [{'end': 61.454, 'text': "The good news is that each part is pretty simple and easy to understand, and we'll go through them one step at a time.", 'start': 54.652, 'duration': 6.802}, {'end': 69.277, 'text': 'In part one of this series, we provided an overview of how XGBoost trees are built for regression.', 'start': 62.835, 'duration': 6.442}, {'end': 77.279, 'text': "In this video, part two, we'll give an overview of how XGBoost trees are built for classification.", 'start': 70.477, 'duration': 6.802}, {'end': 89.548, 'text': "And in part three, we'll dive into the mathematical details to show you how regression and classification are related and why creating unique trees makes so much sense.", 'start': 78.482, 'duration': 11.066}], 'summary': 'The XGBoost series covers trees for regression, trees for classification, and the math that relates them.', 'duration': 34.896, 'max_score': 54.652, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU54652.jpg'}, {'end': 180.578, 'src': 'embed', 'start': 135.305, 'weight': 1, 'content': [{'end': 140.088, 'text': 'regardless of whether you are using XGBoost for regression or 
classification.', 'start': 135.305, 'duration': 4.783}, {'end': 149.515, 'text': 'In other words, regardless of the dosage, the default prediction is that there is a 50% chance the drug is effective.', 'start': 141.769, 'duration': 7.746}, {'end': 167.729, 'text': 'We can illustrate the initial prediction by adding a y-axis to our graph to represent the probability that the drug is effective and drawing a thick black line at 0.5 to represent a 50% chance that the drug is effective.', 'start': 151.343, 'duration': 16.386}, {'end': 175.712, 'text': 'Since these two green dots represent effective dosages, we will move them to the top of the graph,', 'start': 169.59, 'duration': 6.122}, {'end': 180.578, 'text': 'where the probability that the drug is effective is 1.', 'start': 175.712, 'duration': 4.866}], 'summary': 'XGBoost starts with a default prediction of a 50% chance that the drug is effective for any dosage; effective dosages are plotted at a probability of 1.', 'duration': 45.273, 'max_score': 135.305, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU135305.jpg'}, {'end': 239.077, 'src': 'embed', 'start': 208.988, 'weight': 5, 'content': [{'end': 216.033, 'text': 'However, since we are using XGBoost for classification, we have a new formula for the similarity scores.', 'start': 208.988, 'duration': 7.045}, {'end': 223.959, 'text': "Note, even though the numerator looks fancy, it's just the sum of the residuals, squared.", 'start': 217.574, 'duration': 6.385}, {'end': 231.485, 'text': 'In other words, the numerator for classification is the same as the numerator for regression.', 'start': 225.56, 'duration': 5.925}, {'end': 239.077, 'text': 'And just like for regression, the denominator contains lambda, the regularization parameter.', 'start': 232.734, 'duration': 6.343}], 'summary': 'XGBoost uses a new formula for similarity scores in classification, with the numerator being the square of the sum of the residuals, and the denominator 
containing the regularization parameter lambda.', 'duration': 30.089, 'max_score': 208.988, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU208988.jpg'}], 'start': 0.583, 'title': 'XGBoost for classification', 'summary': 'Provides an overview of XGBoost trees for classification, covering the initial prediction, fitting an XGBoost tree to the residuals, and the mathematical relation to regression.', 'chapters': [{'end': 106.316, 'start': 0.583, 'title': 'XGBoost part 2: trees for classification', 'summary': 'Discusses XGBoost part 2, focusing on XGBoost trees for classification, aiming to provide a simple overview of building trees for classification and their mathematical relation to regression.', 'duration': 105.733, 'highlights': ['XGBoost is an extensive machine learning algorithm with various parts, each of which is relatively simple and easy to understand.', 'The first three parts of the four-part series are outlined, with part one providing an overview of how XGBoost trees are built for regression, part two covering XGBoost trees for classification, and part three delving into the mathematical details to illustrate the relation between regression and classification.', 'The training data used consists of four different drug dosages, offering a simple example to understand the concepts.']}, {'end': 263.925, 'start': 107.967, 'title': 'XGBoost initial prediction and fitting process', 'summary': 'Explains the process of making an initial prediction in XGBoost and fitting an XGBoost tree to the residuals for classification, with a default initial prediction of 0.5 and the process of adjusting the prediction based on observed and predicted values.', 'duration': 155.958, 'highlights': ['The default initial prediction in XGBoost is 0.5, indicating a 50% chance that the drug is effective, regardless of whether it is used for regression or classification. Default prediction probability is 0.5. 
Default applies to both regression and classification.', 'The process involves moving effective dosages to the top of the graph with a probability of 1 and leaving ineffective dosages at the bottom with a probability of 0. Effective dosages moved to top with probability 1. Ineffective dosages left at the bottom with probability 0.', 'The chapter explains the new formula for similarity scores in XGBoost for classification and the process of adjusting the prediction based on the observed and predicted values. New formula for similarity scores in XGBoost for classification. Process of adjusting prediction based on observed and predicted values.']}], 'duration': 263.342, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU583.jpg', 'highlights': ['The first three parts of the four-part series are outlined, with part one providing an overview of how XGBoost trees are built for regression, part two covering XGBoost trees for classification, and part three delving into the mathematical details to illustrate the relation between regression and classification.', 'The default initial prediction in XGBoost is 0.5, indicating a 50% chance that the drug is effective, regardless of whether it is used for regression or classification. Default prediction probability is 0.5. Default applies to both regression and classification.', 'The training data used consists of four different drug dosages, offering a simple example to understand the concepts.', 'XGBoost is an extensive machine learning algorithm with various parts, each of which is relatively simple and easy to understand.', 'The process involves moving effective dosages to the top of the graph with a probability of 1 and leaving ineffective dosages at the bottom with a probability of 0. Effective dosages moved to top with probability 1. 
Ineffective dosages left at the bottom with probability 0.', 'The chapter explains the new formula for similarity scores in XGBoost for classification and the process of adjusting the prediction based on the observed and predicted values. New formula for similarity scores in XGBoost for classification. Process of adjusting prediction based on observed and predicted values.']}, {'end': 588.005, 'segs': [{'end': 343.056, 'src': 'embed', 'start': 265.765, 'weight': 2, 'content': [{'end': 275.787, 'text': 'Note, although this formula is different from what XGBoost uses for regression, it is very closely related and we will show you why in part three,', 'start': 265.765, 'duration': 10.022}, {'end': 277.948, 'text': 'when we get into the nitty-gritty details.', 'start': 275.787, 'duration': 2.161}, {'end': 280.868, 'text': "Now let's build a tree.", 'start': 279.568, 'duration': 1.3}, {'end': 289.697, 'text': 'Just like for regression, each tree starts out as a single leaf, and all of the residuals go to the leaf.', 'start': 282.128, 'duration': 7.569}, {'end': 299.68, 'text': 'Now we need to calculate a similarity score for the leaf, and that means we plug all four residuals into the numerator.', 'start': 291.557, 'duration': 8.123}, {'end': 306.162, 'text': 'Note, because we do not square the residuals before we add them together,', 'start': 301.261, 'duration': 4.901}, {'end': 314.945, 'text': 'they will cancel each other out and we will end up with zero in the numerator, and that makes the similarity score equal to zero.', 'start': 306.162, 'duration': 8.783}, {'end': 322.985, 'text': "And that's a little bit of a bummer since it doesn't give us a chance to talk about the denominator, which is the interesting part.", 'start': 316.502, 'duration': 6.483}, {'end': 326.367, 'text': "However, don't freak out.", 'start': 324.626, 'duration': 1.741}, {'end': 327.948, 'text': "We'll get to it soon.", 'start': 326.827, 'duration': 1.121}, {'end': 335.052, 'text': "For now, 
let's just put similarity equals zero up here so we can keep track of it.", 'start': 329.429, 'duration': 5.623}, {'end': 343.056, 'text': 'Now we need to decide if we can do a better job clustering similar residuals if we split them into two groups.', 'start': 336.552, 'duration': 6.504}], 'summary': 'Building an XGBoost tree for classification starts with a similarity score of zero, since the residuals cancel out before squaring, and the relation to regression is covered in part three.', 'duration': 77.291, 'max_score': 265.765, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU265765.jpg'}, {'end': 553.352, 'src': 'embed', 'start': 484.437, 'weight': 0, 'content': [{'end': 490.863, 'text': "Since I'm such a nice guy, I'm going to tell you that no other threshold gives us a larger gain value.", 'start': 484.437, 'duration': 6.426}, {'end': 497.168, 'text': 'And that means dosage less than 15 will be the first branch in our tree.', 'start': 492.184, 'duration': 4.984}, {'end': 502.693, 'text': 'Now we can focus on splitting these residuals into two leaves.', 'start': 499.05, 'duration': 3.643}, {'end': 517.147, 'text': 'Note, we can tell just by looking at the data that this threshold, dosage less than 5, has a higher gain than this threshold, dosage less than 10.', 'start': 504.395, 'duration': 12.752}, {'end': 523.45, 'text': 'This is because when the threshold is dosage less than 10, these two residuals will cancel each other out.', 'start': 517.147, 'duration': 6.303}, {'end': 528.072, 'text': 'And the similarity score for this leaf will be zero.', 'start': 524.85, 'duration': 3.222}, {'end': 539.921, 'text': 'So when we calculate the gain, we get 0.66.', 'start': 529.592, 'duration': 10.329}, {'end': 546.787, 'text': "Now let's compare that to the gain we get when the threshold is dosage less than 5.", 'start': 539.921, 'duration': 6.866}, {'end': 548.628, 'text': 'These are the similarity scores.', 'start': 546.787, 'duration': 1.841}, {'end': 553.352, 'text': 'And 
when we plug them into the equation for the gain, we get 2.66.', 'start': 549.869, 'duration': 3.483}], 'summary': 'The threshold dosage less than 5 yields a higher gain of 2.66, compared to 0.66 for dosage less than 10.', 'duration': 68.915, 'max_score': 484.437, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU484437.jpg'}], 'start': 265.765, 'title': 'Building trees for classification and the decision tree process', 'summary': 'Covers the initial steps of building an XGBoost tree for classification, including starting with a single leaf, calculating a similarity score, and considering the possibility of splitting residuals. It also explains the process of building a decision tree using threshold dosages, achieving a gain of 1.33 when using the threshold dosage less than 15, and comparing gains between thresholds of less than 10 and less than 5, resulting in 0.66 and 2.66, respectively.', 'chapters': [{'end': 343.056, 'start': 265.765, 'title': 'Building a tree for classification', 'summary': 'Explains the initial steps of building a tree for classification, including starting with a single leaf, calculating a similarity score, and considering the possibility of splitting residuals into two groups.', 'duration': 77.291, 'highlights': ['Each tree starts out as a single leaf, and all of the residuals go to the leaf, setting the initial stage for building the tree.', 'The similarity score is calculated for the leaf by plugging all residuals into the numerator, resulting in a score of zero due to the cancellation of unsquared residuals.', 'The discussion of the denominator, an interesting aspect, is deferred for later exploration in the chapter, creating anticipation for upcoming details.']}, {'end': 588.005, 'start': 344.633, 'title': 'Decision tree building process', 'summary': 'Explains the process of building a decision tree using threshold dosages, achieving a gain of 1.33 when using the threshold dosage less than 15, and comparing gains 
between thresholds of less than 10 and less than 5, resulting in 0.66 and 2.66, respectively.', 'duration': 243.372, 'highlights': ['The gain achieved using the threshold dosage less than 15 is 1.33, which is larger than the gain from any other threshold.', 'When comparing the gain values, the threshold dosage less than 5 results in a higher gain of 2.66, surpassing the gain of 0.66 obtained with the threshold dosage less than 10.', 'The decision to limit the tree to two levels is mentioned, signifying the completion of the tree building process.']}], 'duration': 322.24, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU265765.jpg', 'highlights': ['The gain achieved using the threshold dosage less than 15 is 1.33, the largest gain value obtained.', 'The threshold dosage less than 5 results in a higher gain of 2.66, surpassing the gain of 0.66 obtained with the threshold dosage less than 10.', 'Each tree starts out as a single leaf, setting the initial stage for building the tree.', 'The similarity score is calculated for the leaf by plugging all residuals into the numerator, resulting in a score of zero.', 'The discussion of the denominator, an interesting aspect, is deferred for later exploration in the chapter.']}, {'end': 766.003, 'segs': [{'end': 732.333, 'src': 'heatmap', 'start': 589.346, 'weight': 0, 'content': [{'end': 595.23, 'text': 'However, XGBoost also has a threshold for the minimum number of residuals in each leaf.', 'start': 589.346, 'duration': 5.884}, {'end': 600.935, 'text': "Warning, it's time for some tedious detail and terminology.", 'start': 596.712, 'duration': 4.223}, {'end': 608.12, 'text': 'The minimum number of residuals in each leaf is determined by calculating something called cover.', 'start': 602.537, 'duration': 5.583}, {'end': 614.543, 'text': 'Cover is defined as the denominator of the similarity score minus lambda.', 'start': 609.441, 'duration': 5.102}, {'end': 
626.39, 'text': 'In other words, when we are using XGBoost for classification, cover is equal to the sum of the previous probability times', 'start': 615.844, 'duration': 10.546}, {'end': 630.812, 'text': "one minus the previous probability for each residual that's in the leaf.", 'start': 626.39, 'duration': 4.422}, {'end': 641.064, 'text': 'In contrast, when XGBoost is used for regression and we are using this formula for the similarity score,', 'start': 632.941, 'duration': 8.123}, {'end': 646.106, 'text': 'then cover is equal to the number of residuals in a leaf.', 'start': 641.064, 'duration': 5.042}, {'end': 651.608, 'text': 'By default, the minimum value for cover is one.', 'start': 647.826, 'duration': 3.782}, {'end': 660.431, 'text': 'Thus, by default, when we use XGBoost for regression, we can have as few as one residual per leaf.', 'start': 653.008, 'duration': 7.423}, {'end': 671.501, 'text': 'In other words, when we use XGBoost for regression and use the default minimum value for cover, cover has no effect on how we grow the tree.', 'start': 661.914, 'duration': 9.587}, {'end': 679.747, 'text': 'In contrast, things are way more complicated when we use XGBoost for classification,', 'start': 673.443, 'duration': 6.304}, {'end': 685.311, 'text': 'because cover depends on the previously predicted probability of each residual in a leaf.', 'start': 679.747, 'duration': 5.564}, {'end': 690.115, 'text': 'For example, the cover for this leaf is...', 'start': 687.273, 'duration': 2.842}, {'end': 705.983, 'text': 'the previously predicted probability for this observation, which was 0.5, times 1 minus the previously predicted probability, and that gives us 0.25.', 'start': 691.277, 'duration': 14.706}, {'end': 712.346, 'text': 'And since the default value for the minimum cover is 1, XGBoost would not allow this leaf.', 'start': 705.983, 'duration': 6.363}, {'end': 720.831, 'text': 'Likewise, the cover for this leaf is equal to 0.5.', 'start': 714.027, 'duration': 6.804}, 
{'end': 725.572, 'text': 'So, by default, XGBoost would not allow this leaf either.', 'start': 720.831, 'duration': 4.741}, {'end': 732.333, 'text': "Since these leaves are not allowed, let's remove them and go back to this leaf.", 'start': 727.312, 'duration': 5.021}], 'summary': "XGBoost's default minimum cover of one has no effect on regression trees but can rule out leaves in classification trees.", 'duration': 123, 'max_score': 589.346, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU589346.jpg'}, {'end': 766.003, 'src': 'embed', 'start': 740.114, 'weight': 2, 'content': [{'end': 752.857, 'text': 'cover is just three times the cover for one of the residuals, and that means cover equals 0.75, so XGBoost would not allow this leaf either.', 'start': 740.114, 'duration': 12.743}, {'end': 762.241, 'text': 'Ultimately, if we use the default minimum value for cover, 1, then we would be left with the root,', 'start': 754.578, 'duration': 7.663}, {'end': 766.003, 'text': 'and XGBoost requires trees to be larger than just the root.', 'start': 762.241, 'duration': 3.762}], 'summary': 'With the default minimum cover of 1, a leaf with a cover of 0.75 is not allowed, which would leave only the root, and XGBoost requires trees to be larger than just the root.', 'duration': 25.889, 'max_score': 740.114, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU740114.jpg'}], 'start': 589.346, 'title': 'XGBoost cover and leaf residuals', 'summary': 'Explains the concept of cover in XGBoost: the default minimum cover is one, which allows as few as one residual per leaf for regression, whereas for classification cover depends on the previously predicted probabilities, and XGBoost requires trees to be larger than just the root.', 'chapters': [{'end': 766.003, 'start': 589.346, 'title': 'XGBoost minimum cover and leaf residuals', 'summary': 'Explains the concept of cover in XGBoost, where the default minimum value for cover is one, which for regression allows as few as one residual per leaf, while for classification, cover depends on the 
previously predicted probability of each residual in a leaf, and XGBoost requires trees to be larger than just the root.', 'duration': 176.657, 'highlights': ['The default minimum value for cover in XGBoost is one, which for regression allows as few as one residual per leaf.', 'For classification in XGBoost, cover depends on the previously predicted probability of each residual in a leaf, influencing the tree growth.', 'With the default minimum value for cover, only the root would remain in this example, and XGBoost requires trees to be larger than just the root.']}], 'duration': 176.657, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU589346.jpg', 'highlights': ['For classification in XGBoost, cover depends on the previously predicted probability of each residual in a leaf, influencing the tree growth.', 'The default minimum value for cover in XGBoost is one, which for regression allows as few as one residual per leaf.', 'With the default minimum value for cover, only the root would remain in this example, and XGBoost requires trees to be larger than just the root.']}, {'end': 1059.344, 'segs': [{'end': 836.681, 'src': 'heatmap', 'start': 767.304, 'weight': 0, 'content': [{'end': 776.621, 'text': "So, in order to prevent this from being the worst example ever, let's set the minimum value for cover equal to 0,", 'start': 767.304, 'duration': 9.317}, {'end': 781.483, 'text': 'and that means setting the min_child_weight parameter equal to zero.', 'start': 776.621, 'duration': 4.862}, {'end': 783.904, 'text': 'Small bam.', 'start': 782.904, 'duration': 1}, {'end': 787.666, 'text': 'Now we can talk about how to prune the tree.', 'start': 785.225, 'duration': 2.441}, {'end': 791.347, 'text': 'Just like we did in part one,', 'start': 789.346, 'duration': 2.001}, {'end': 798.61, 'text': 'we prune by calculating the difference between the gain associated with the lowest branch and a number we pick for gamma.', 'start': 791.347, 'duration': 7.263}, {'end': 
809.927, 'text': 'For example, if we plugged in the gain and set gamma equal to 2, then we would not prune because the difference is a positive number.', 'start': 799.811, 'duration': 10.116}, {'end': 819.593, 'text': 'In contrast, if we set gamma equal to 3, then we would prune because the difference is a negative number.', 'start': 811.808, 'duration': 7.785}, {'end': 826.317, 'text': 'And we would also prune this branch because 1.33 minus 3 equals a negative number.', 'start': 821.374, 'duration': 4.943}, {'end': 834.8, 'text': 'and all we would be left with is the original prediction.', 'start': 831.437, 'duration': 3.363}, {'end': 836.681, 'text': 'Small bam.', 'start': 835.84, 'duration': 0.841}], 'summary': 'Setting the minimum cover (the min_child_weight parameter) to 0 keeps the example tree, which is then pruned by comparing the gain of each branch with gamma.', 'duration': 69.377, 'max_score': 767.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU767304.jpg'}, {'end': 992.292, 'src': 'heatmap', 'start': 862.589, 'weight': 2, 'content': [{'end': 869.413, 'text': 'And that means a lower value for gamma will result in a negative difference and cause us to prune branches.', 'start': 862.589, 'duration': 6.824}, {'end': 872.475, 'text': 'In other words,', 'start': 871.214, 'duration': 1.261}, {'end': 882.701, 'text': 'values for lambda greater than zero reduce the sensitivity of the tree to individual observations by pruning and combining them with other observations.', 'start': 872.475, 'duration': 10.226}, {'end': 897.009, 'text': "Bam! 
For now, regardless of lambda and gamma, let's assume that this is the tree we are working with and determine the output values for the leaves.", 'start': 884.158, 'duration': 12.851}, {'end': 900.011, 'text': 'For classification,', 'start': 898.57, 'duration': 1.441}, {'end': 915.561, 'text': 'the output value for a leaf is the sum of the residuals divided by the sum of the previous probability times 1 minus the previous probability for each residual in the leaf, plus lambda.', 'start': 900.011, 'duration': 15.55}, {'end': 926.245, 'text': 'Note, with the exception of lambda, the regularization parameter, this is the same formula we used for Unextreme Gradient Boost.', 'start': 917.402, 'duration': 8.843}, {'end': 934.308, 'text': 'So for this leaf we plug in the residual negative 0.5,', 'start': 927.645, 'duration': 6.663}, {'end': 941.25, 'text': 'and the previously predicted probability and the value for the regularization parameter lambda.', 'start': 934.308, 'duration': 6.942}, {'end': 950.356, 'text': 'If lambda equals 0, then there is no regularization and the output value equals negative 2.', 'start': 942.594, 'duration': 7.762}, {'end': 963.879, 'text': 'On the other hand, if lambda equals 1, then the output value equals negative 0.4, which is closer to 0 than negative 2, the value when lambda equals 0.', 'start': 950.356, 'duration': 13.523}, {'end': 971.981, 'text': 'In other words, when lambda is greater than 0, then it reduces the amount that this single observation adds to the new prediction.', 'start': 963.879, 'duration': 8.102}, {'end': 981.682, 'text': 'Thus, lambda, the regularization parameter, reduces the prediction sensitivity to isolated observations.', 'start': 973.715, 'duration': 7.967}, {'end': 992.292, 'text': "For now, we'll let lambda equal zero because this is the default value and put negative two under the leaf so we will remember it.", 'start': 983.384, 'duration': 8.908}], 'summary': 'Lambda lowers gain values, so gamma prunes branches more readily; lambda 
reduces sensitivity and affects the output values for leaves.', 'duration': 121.078, 'max_score': 862.589, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU862589.jpg'}], 'start': 767.304, 'title': 'Tree pruning and XGBoost output value', 'summary': "Discusses setting parameters for tree pruning, including cover value, gain and gamma difference for pruning, and lambda's impact on sensitivity. It also explains XGBoost output value calculation, involving residuals, previous probabilities, and lambda's effect on prediction sensitivity.", 'chapters': [{'end': 900.011, 'start': 767.304, 'title': 'Tree pruning and regularization', 'summary': 'Discusses setting parameters for tree pruning and regularization, including setting the minimum value for cover, pruning by calculating the difference between gain and gamma, and the impact of lambda on reducing sensitivity to individual observations.', 'duration': 132.707, 'highlights': ['Setting the minimum value for cover, the min_child_weight parameter, equal to 0 keeps the example tree from being reduced to just the root.', 'Pruning by calculating the difference between the gain associated with the lowest branch and a number picked for gamma is essential to determine whether to prune branches based on the positivity or negativity of the difference.', 'Lambda, the regularization parameter, reduces the sensitivity of the tree to individual observations: values greater than 0 lower the gain, so a given value for gamma prunes branches more readily and combines those observations with others.']}, {'end': 1059.344, 'start': 900.011, 'title': 'XGBoost output value calculation', 'summary': 'Explains the calculation of output values for a leaf in XGBoost, where the output value is the sum of the residuals divided by the sum of each previous probability times 1 minus that probability, plus the regularization parameter lambda, which reduces the prediction sensitivity to isolated observations.', 'duration': 159.333, 'highlights': ['The output value for a leaf is determined by the sum of residuals divided by the sum of previous probability times 1 minus the previous probability, plus the regularization parameter lambda, with specific examples provided for lambda values of 0 and 1.', 'When lambda equals 0, the output value is -2, and when lambda equals 1, the output value is -0.4, illustrating the impact of lambda on the output value and its role in reducing prediction sensitivity to isolated observations.', 'The explanation highlights the impact of lambda on the prediction sensitivity to isolated observations, where lambda reduces the amount that a single observation adds to the new prediction, and lambda is set to 0 for the rest of the example.', 'The completion of the first tree is celebrated, and the process of making new 
predictions in XGBoost for classification is briefly mentioned.']}], 'duration': 292.04, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU767304.jpg', 'highlights': ['Setting the minimum value for cover, the min_child_weight parameter, equal to 0 keeps the example tree from being reduced to just the root.', 'Pruning by calculating the difference between the gain associated with the lowest branch and a number picked for gamma is essential to determine whether to prune branches based on the positivity or negativity of the difference.', 'Lambda, the regularization parameter, reduces the sensitivity of the tree to individual observations: values greater than 0 lower the gain, so a given value for gamma prunes branches more readily and combines those observations with others.', 'The output value for a leaf is determined by the sum of residuals divided by the sum of previous probability times 1 minus the previous probability, plus the regularization parameter lambda, with specific examples provided for lambda values of 0 and 1.', 'When lambda equals 0, the output value is -2, and when lambda equals 1, the output value is -0.4, illustrating the impact of lambda on the output value and its role in reducing prediction sensitivity to isolated observations.', 'The explanation highlights the impact of lambda on the prediction sensitivity to isolated observations, where lambda reduces the amount that a single observation adds to the new prediction, and lambda is set to 0 for the rest of the example.']}, {'end': 1517.161, 'segs': [{'end': 1083.011, 'src': 'heatmap', 'start': 1060.845, 'weight': 0.707, 'content': [{'end': 1069.312, 'text': 'However, just like with Unextreme Gradient Boost for classification, we need to convert this probability to a log odds value.', 'start': 1060.845, 'duration': 8.467}, {'end': 1075.278, 'text': 'So, since this is the formula that 
converts probabilities to odds,', 'start': 1070.874, 'duration': 4.404}, {'end': 1083.011, 'text': 'we can get a formula that converts probabilities to the log odds by taking the log of both sides.', 'start': 1076.462, 'duration': 6.549}], 'summary': 'Convert probabilities to log odds by taking the log of both sides of the formula that converts probabilities to odds.', 'duration': 22.166, 'max_score': 1060.845, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU1060845.jpg'}, {'end': 1196.608, 'src': 'heatmap', 'start': 1149.133, 'weight': 0, 'content': [{'end': 1163.328, 'text': 'plus the learning rate 0.3, times the output value negative 2, and that gives us a log of the odds value equal to negative 0.6.', 'start': 1149.133, 'duration': 14.195}, {'end': 1169.15, 'text': 'To convert a log of the odds value into a probability, we plug it into the logistic function.', 'start': 1163.328, 'duration': 5.822}, {'end': 1178.273, 'text': 'Note: if the logistic function makes you feel a little uncomfortable, check out the StatQuest Logistic Regression Details,', 'start': 1170.49, 'duration': 7.783}, {'end': 1181.594, 'text': 'Part 2: Fitting a Line with Maximum Likelihood.', 'start': 1178.273, 'duration': 3.321}, {'end': 1187.621, 'text': "Assuming we're cool with this equation, let's plug in the log of the odds.", 'start': 1183.317, 'duration': 4.304}, {'end': 1196.608, 'text': 'Do the math, and the new predicted probability is 0.35.', 'start': 1189.202, 'duration': 7.406}], 'summary': 'Starting from an initial log odds of 0, and using a learning rate of 0.3 and an output value of -2, the new log odds value is -0.6.
Applying the logistic function gives a probability of 0.35.', 'duration': 67.784, 'max_score': 1149.133, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU1149133.jpg'}, {'end': 1398.452, 'src': 'embed', 'start': 1372.648, 'weight': 3, 'content': [{'end': 1382.785, 'text': 'Now that we have a new tree, we add it to all of the previous predictions and make new predictions that give us even smaller residuals.', 'start': 1372.648, 'duration': 10.137}, {'end': 1394.79, 'text': 'Then we build another tree based on the new residuals, and we keep building trees until the residuals are super small or we have reached the maximum number of trees.', 'start': 1384.325, 'duration': 10.465}, {'end': 1398.452, 'text': 'Triple bam!', 'start': 1396.391, 'duration': 2.061}], 'summary': 'Each new tree is added to the previous predictions to make the residuals smaller, and trees are built until the residuals are very small or the maximum number of trees is reached.', 'duration': 25.804, 'max_score': 1372.648, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU1372648.jpg'}, {'end': 1489.829, 'src': 'embed', 'start': 1461.723, 'weight': 4, 'content': [{'end': 1471.128, 'text': 'When using XGBoost for classification, we have to be aware that the minimum number of residuals in a leaf is related to a metric called cover,', 'start': 1461.723, 'duration': 9.405}, {'end': 1474.99, 'text': 'which is the denominator of the similarity score minus lambda.', 'start': 1471.128, 'duration': 3.862}, {'end': 1480.178, 'text': 'Tune in next time for XGBoost Part 3,', 'start': 1476.874, 'duration': 3.304}, {'end': 1489.829, 'text': 'when we dive deep into the nitty-gritty details of the math that ties XGBoost trees for regression and classification into one elegant equation.', 'start': 1480.178, 'duration': 9.651}], 'summary': 'In XGBoost classification, the minimum number of residuals in a leaf is governed by cover, the denominator of the similarity score minus lambda; Part 3 dives into the math details.', 'duration': 28.106, 'max_score': 1461.723, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU1461723.jpg'}], 'start': 1060.845, 'title': 'XGBoost for probability and classification', 'summary': 'Covers converting probabilities to log odds, adding the output of the tree, scaled by the learning rate, to the log odds of the initial prediction, and building XGBoost trees for classification, with an example showing a new predicted probability of 0.35, and previews XGBoost Part 3.', 'chapters': [{'end': 1196.608, 'start': 1060.845, 'title': 'XGBoost for probability: log odds and learning rate', 'summary': 'Explains how to convert probabilities to log odds and how the output of the tree, scaled by a learning rate, is added to the log odds of the initial prediction, with an example showing a new predicted probability of 0.35.', 'duration': 135.763, 'highlights': ['XGBoost adds the output of the tree, scaled by a learning rate (eta, default 0.3), to the log odds of the initial prediction; in the example, this gives a new predicted probability of 0.35.', 'A log of the odds value is converted into a probability with the logistic function; in the example, the new predicted probability is 0.35.', 'Taking the log of both sides of the formula that converts probabilities to odds gives the formula for log odds; when the probability equals 0.5, the log of the odds is 0.']}, {'end': 1517.161, 'start': 1196.608, 'title': 'XGBoost trees for classification', 'summary': 'Explains how XGBoost trees are built for classification, covering the initial prediction, residual calculation, similarity scores, gain, pruning, and regularization parameters, and concludes by previewing XGBoost Part 3.', 'duration': 320.553, 'highlights': ['XGBoost tree building starts with the initial prediction and the residuals, and examples show that different initial predictions can yield more interesting results.
', 'The chapter covers building new trees, using them to make new predictions, and continuing to build trees until the residuals are very small or the maximum number of trees is reached.', 'The explanation of XGBoost tree building covers calculating similarity scores and gain, pruning, and the impact of the regularization parameter lambda, and notes that the minimum number of residuals in a leaf is related to a metric called cover.']}], 'duration': 456.316, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8b1JEDvenQU/pics/8b1JEDvenQU1060845.jpg', 'highlights': ['XGBoost adds the output of the tree, scaled by a learning rate (eta, default 0.3), to the log odds of the initial prediction; in the example, this gives a new predicted probability of 0.35.', 'A log of the odds value is converted into a probability with the logistic function; in the example, the new predicted probability is 0.35.', 'Taking the log of both sides of the formula that converts probabilities to odds gives the formula for log odds; when the probability equals 0.5, the log of the odds is 0.', 'The chapter covers building new trees, using them to make new predictions, and continuing to build trees until the residuals are very small or the maximum number of trees is reached.
', 'The explanation of XGBoost tree building covers calculating similarity scores and gain, pruning, and the impact of the regularization parameter lambda, and notes that the minimum number of residuals in a leaf is related to a metric called cover.', 'XGBoost tree building starts with the initial prediction and the residuals, and examples show that different initial predictions can yield more interesting results.']}], 'highlights': ['When splitting the observations with Dosage < 15, the threshold Dosage < 5 results in a gain of 2.66, higher than the gain of 0.66 obtained with the threshold Dosage < 10.', 'At the root, the threshold Dosage < 15 gives a gain of 1.33, the largest gain value obtained for the first split.', 'XGBoost adds the output of the tree, scaled by a learning rate (eta, default 0.3), to the log odds of the initial prediction; in the example, this gives a new predicted probability of 0.35.', 'The XGBoost series is split into parts (this video is Part 2 of 4): part one gives an overview of how XGBoost trees are built for regression, part two covers XGBoost trees for classification, and part three dives into the mathematical details that relate regression and classification.', 'A log of the odds value is converted into a probability with the logistic function; in the example, the new predicted probability is 0.35.', 'By default, the initial prediction in XGBoost is 0.5 for both regression and classification; for classification, this means a 50% chance that the drug is effective.
', 'The training data consists of four different drug dosages, a simple example for understanding the concepts.', 'XGBoost is a big machine learning algorithm with many parts, but each part is relatively simple and easy to understand.', 'In the graph, effective dosages are plotted at the top with a probability of 1, and ineffective dosages are plotted at the bottom with a probability of 0.', 'The chapter introduces the new formula for similarity scores in XGBoost for classification (the sum of residuals, squared, divided by the sum of previous probability times 1 minus the previous probability, plus lambda) and explains how the prediction is adjusted based on the differences between the observed and predicted values.']}
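The leaf arithmetic summarized above (similarity scores, gain, cover, output values, and the update from log odds to a probability) can be checked with a small Python sketch. It is not XGBoost's implementation. The dosage values and observed outcomes are hypothetical, chosen so that the reported gains (1.33, 2.66, 0.66), output values (-2 and -0.4), and new probability (0.35) come out; the helper names and the gamma value used for the pruning check are also made up for illustration.

```python
import math

# HYPOTHETICAL data (assumed, not given in the summary): four dosages in
# increasing order and their outcomes (0 = not effective, 1 = effective),
# chosen so the thresholds 5, 10 and 15 separate them and the reported
# gains come out.
dosages = [2, 8, 12, 18]
observed = [0, 1, 1, 0]

INITIAL_PROB = 0.5  # default initial prediction
ETA = 0.3           # learning rate (eta)


def cover(probs):
    """Sum of p * (1 - p): the similarity-score denominator minus lambda."""
    return sum(p * (1 - p) for p in probs)


def similarity(residuals, probs, lam):
    """(sum of residuals)^2 / (cover + lambda).

    Raises ZeroDivisionError for an empty node when lam == 0.
    """
    return sum(residuals) ** 2 / (cover(probs) + lam)


def output_value(residuals, probs, lam):
    """Leaf output value: sum of residuals / (cover + lambda)."""
    return sum(residuals) / (cover(probs) + lam)


def split_gain(residuals, probs, xs, threshold, lam):
    """Left similarity + right similarity - parent similarity for x < threshold."""
    left = [i for i, x in enumerate(xs) if x < threshold]
    right = [i for i, x in enumerate(xs) if x >= threshold]

    def sim(idx):
        return similarity([residuals[i] for i in idx], [probs[i] for i in idx], lam)

    return sim(left) + sim(right) - similarity(residuals, probs, lam)


def keep_branch(gain, gamma):
    """Pruning rule: keep a branch only when gain - gamma is positive."""
    return gain - gamma > 0


def logistic(log_odds):
    """Convert a log of the odds value into a probability."""
    return 1 / (1 + math.exp(-log_odds))


probs = [INITIAL_PROB] * len(observed)
residuals = [y - p for y, p in zip(observed, probs)]  # [-0.5, 0.5, 0.5, -0.5]

# Gains with lambda = 0, as in the example.
root_gain = split_gain(residuals, probs, dosages, 15, lam=0.0)

# Candidate splits for the observations that went left (Dosage < 15).
left_idx = [i for i, x in enumerate(dosages) if x < 15]
lr = [residuals[i] for i in left_idx]
lp = [probs[i] for i in left_idx]
lx = [dosages[i] for i in left_idx]
gain_lt5 = split_gain(lr, lp, lx, 5, lam=0.0)
gain_lt10 = split_gain(lr, lp, lx, 10, lam=0.0)

# A leaf holding the single residual -0.5 (Dosage < 5 under the assumed data).
leaf_r, leaf_p = [residuals[0]], [probs[0]]
leaf_cover = cover(leaf_p)                     # 0.25
out_lam0 = output_value(leaf_r, leaf_p, 0.0)   # -2.0
out_lam1 = output_value(leaf_r, leaf_p, 1.0)   # -0.4

# New prediction: initial log odds + eta * output value, then logistic.
initial_log_odds = math.log(INITIAL_PROB / (1 - INITIAL_PROB))  # 0.0
new_log_odds = initial_log_odds + ETA * out_lam0
new_prob = logistic(new_log_odds)

HYPOTHETICAL_GAMMA = 2.0  # made-up gamma, for the pruning check only
print(f"gains: <15={root_gain:.2f}, <5={gain_lt5:.2f}, <10={gain_lt10:.2f}")
print(f"cover={leaf_cover}, output lambda=0: {out_lam0}, lambda=1: {out_lam1:.1f}")
print(f"log odds={new_log_odds:.1f}, probability={new_prob:.2f}")
print("keep Dosage < 5 branch:", keep_branch(gain_lt5, HYPOTHETICAL_GAMMA))
```

With the assumed data, the gains print as 1.33, 2.67 and 0.67; the last two are 8/3 and 2/3, which the summary reports truncated as 2.66 and 0.66. The single-residual leaf has a cover of 0.25, below XGBoost's default min_child_weight of 1, which is why the example sets the minimum cover to 0.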