title

XGBoost Part 1 (of 4): Regression

description

XGBoost is an extreme machine learning algorithm, and that means it's got lots of parts. In this video, we focus on the unique regression trees that XGBoost uses when applied to regression problems.
NOTE: This StatQuest assumes that you are already familiar with...
The main ideas behind Gradient Boost for Regression: https://youtu.be/3CC4N4z3GJc
...and the main ideas behind Regularization: https://youtu.be/Q81RR3yKn30
Also note, this StatQuest is based on the following sources:
The original XGBoost manuscript: https://arxiv.org/pdf/1603.02754.pdf
And the XGBoost Documentation: https://xgboost.readthedocs.io/en/latest/index.html
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - https://statquest.gumroad.com/l/wvtmc
Paperback - https://www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - https://www.amazon.com/dp/B09ZG79HXC
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...a cool StatQuest t-shirt or sweatshirt:
https://shop.spreadshirt.com/statquest-with-josh-starmer/
...buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
2:35 The initial prediction
3:11 Building an XGBoost Tree for regression
4:07 Calculating Similarity Scores
8:23 Calculating Gain to evaluate different thresholds
13:02 Pruning an XGBoost Tree
15:15 Building an XGBoost Tree with regularization
19:29 Calculating output values for an XGBoost Tree
21:39 Making predictions with XGBoost
23:54 Summary of concepts and main ideas
Corrections:
16:50 I say "66", but I meant to say "62.48". However, either way, the conclusion is the same.
22:03 In the original XGBoost documents they use the epsilon symbol to refer to the learning rate, but in the actual implementation, this is controlled via the "eta" parameter. So, I guess to be consistent with the original documentation, I made the same mistake! :)
#statquest #xgboost

detail

Series summary: This series introduces XGBoost, a machine learning algorithm designed for large, complicated datasets, with a focus on the regression trees XGBoost builds. It covers residual similarity scores, gain-based threshold selection, tree pruning, the effect of the regularization parameter lambda, and the iterative process of building XGBoost trees, working through specific values for clarity.

0:09 - 2:33 XGBoost and its regression trees
XGBoost is a big machine learning algorithm with lots of parts, but each part is pretty simple and easy to understand, taken one step at a time. This series assumes familiarity with gradient boost and regularization. Part 1 builds intuition about how XGBoost does regression with its unique trees, Part 2 covers classification, and Part 3 dives into the mathematical details, showing how regression and classification are related and why creating unique trees makes so much sense. Although XGBoost was designed for large, complicated datasets, the examples use a super simple training set with different drug dosages on the x-axis.
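The initial-prediction step described in the next chapter can be sketched in Python. The dosage and effectiveness values below are assumptions reverse-engineered from the residuals quoted in the video (-10.5, 6.5, 7.5, and -7.5 against the default prediction of 0.5); they are not given explicitly in this description.

```python
# Hypothetical training data, consistent with the residuals quoted in the video.
dosages = [10, 20, 25, 35]        # assumed drug dosages (x-axis)
effectiveness = [-10, 7, 8, -7]   # assumed observed values (0.5 + each residual)

initial_prediction = 0.5  # XGBoost's default initial prediction

# Residuals = observed - predicted; they show how good the prediction is.
residuals = [y - initial_prediction for y in effectiveness]
print(residuals)  # [-10.5, 6.5, 7.5, -7.5]
```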
2:36 - 8:28 The initial prediction and similarity scores
The very first step in fitting XGBoost to the training data is to make an initial prediction. By default it is 0.5, regardless of whether XGBoost is used for regression or classification. The residuals, the differences between the observed and predicted values, show how good the initial prediction is. Just like unextreme gradient boost, XGBoost fits a regression tree to the residuals; however, instead of regular off-the-shelf regression trees, it uses a unique tree, here called an XGBoost tree, and this video focuses on the most common way to build them for regression. Each tree starts out as a single leaf, all of the residuals go to that leaf, and a quality score, or similarity score, is calculated for them; the root in this example has similarity 4. To test whether splitting the residuals into two groups clusters similar residuals better, the first candidate threshold is the average of the two lowest dosages, 15. The one observation with dosage < 15 sends its residual to the leaf on the left, and all of the other residuals go to the leaf on the right. The left leaf's similarity score is 110.25, while in the right leaf the residuals 7.5 and -7.5 cancel each other out, leaving only 6.5 in the numerator, for a similarity score of 14.08. When the residuals in a node are very different, they cancel out and the score is small; clustering similar residuals yields relatively larger similarity scores.
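The similarity-score arithmetic above can be sketched as follows. The helper name `similarity_score` is a hypothetical illustration, and lambda defaults to 0 as in this part of the video; the leaf assignments reproduce the dosage < 15 split.

```python
def similarity_score(residuals, lam=0.0):
    # Similarity score = (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

root = [-10.5, 6.5, 7.5, -7.5]   # all residuals start in a single leaf
left = [-10.5]                   # the one observation with dosage < 15
right = [6.5, 7.5, -7.5]         # everything else; 7.5 and -7.5 cancel out

print(similarity_score(root))             # 4.0
print(similarity_score(left))             # 110.25
print(round(similarity_score(right), 2))  # 14.08
```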
8:29 - 12:34 Calculating gain to compare thresholds
To compare thresholds, the gain is calculated: the left leaf's similarity plus the right leaf's similarity minus the root's similarity. For the first branch, the gain for dosage < 15 is 120.33. Shifting the threshold over to the average of the next two observations gives dosage < 22.5, with a gain of 4, and shifting again gives dosage < 30, with a gain of 56.33. Since the threshold cannot be shifted any further to the right, and dosage < 15 gave the largest gain, it is used for the first branch in the tree. The same process is then repeated inside the new right node: dosage < 22.5 gives a gain of 28.17 and dosage < 30 gives a gain of 140.17, which is much larger, so dosage < 30 is used as the threshold for that branch.
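The threshold search above can be sketched as follows. The gain formula (left + right - root similarity) and the candidate thresholds (averages of adjacent dosages) follow the video; the dosage values themselves are assumptions consistent with the quoted thresholds 15, 22.5, and 30 and the quoted gains 120.33, 4, and 56.33.

```python
def similarity_score(residuals, lam=0.0):
    # (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam=0.0):
    # Gain = Left similarity + Right similarity - Root similarity
    return (similarity_score(left, lam) + similarity_score(right, lam)
            - similarity_score(left + right, lam))

dosages = [10, 20, 25, 35]            # assumed, sorted by dosage
residuals = [-10.5, 6.5, 7.5, -7.5]   # residuals in the same order

# Candidate thresholds sit halfway between adjacent dosages: 15, 22.5, 30.
for i in range(1, len(dosages)):
    threshold = (dosages[i - 1] + dosages[i]) / 2
    print(threshold, round(gain(residuals[:i], residuals[i:]), 2))
# 15.0 120.33 / 22.5 4.0 / 30.0 56.33 -> dosage < 15 wins the first branch
```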
12:36 - 14:50 Pruning an XGBoost tree
The example tree stops here, but by default XGBoost allows trees up to six levels deep. An XGBoost tree is pruned based on its gain values and a user-picked number, for example 130, that XGBoost calls gamma. Starting with the lowest branch in the tree, gamma is subtracted from the branch's gain: if the difference is negative, the branch is removed; if it is positive, the branch is kept and pruning stops. With gamma = 130, the lowest branch's gain of 140.17 minus 130 is positive, so the branch is not removed and pruning is done. Note that the gain for the root, 120.33, is less than 130, so its difference would be negative; however, because the first branch was not removed, the root is not removed either. In contrast, with gamma = 150, 140.17 minus 150 is negative, so that branch is removed, and then 120.33 minus 150 is also negative, so the root is removed as well.
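The pruning rule above can be sketched with the gain values quoted in the video (140.17 for the lowest branch, then 120.33 for the root); the function name is a hypothetical illustration.

```python
def prune(gains_lowest_first, gamma):
    """Work up from the lowest branch: remove a branch while gain - gamma is
    negative, and stop (keeping everything above) at the first positive one."""
    kept = list(gains_lowest_first)
    for g in gains_lowest_first:
        if g - gamma < 0:
            kept.pop(0)   # negative difference: remove this branch, keep going up
        else:
            break         # positive difference: stop pruning entirely
    return kept

branch_gains = [140.17, 120.33]  # lowest branch first, then the root

print(prune(branch_gains, gamma=130))  # [140.17, 120.33] -> nothing pruned
print(prune(branch_gains, gamma=150))  # [] -> branch, then root, removed
```

Note that with gamma = 130 the root survives even though its gain (120.33) is below gamma, because pruning stops as soon as a lower branch is kept.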
14:52 - 22:26 Regularization with lambda and output values
Lambda is a regularization parameter intended to reduce the prediction's sensitivity to individual observations. Setting lambda = 1 shrinks the similarity scores: the root's score becomes 3.2, eight tenths of its value when lambda = 0; the left leaf's becomes 55.12, half of its previous value; and the right leaf's becomes 10.56, three quarters of its previous value. When lambda is greater than zero, the similarity scores are smaller, and the amount of decrease is inversely proportional to the number of residuals in the node. Smaller similarity scores mean smaller gains, which makes branches easier to prune: with lambda = 1, one of the gains comes out to -16.06, both gain values are less than gamma = 130, and the whole tree would be pruned away. Regardless of lambda and gamma, the output value for a leaf equals the sum of the residuals divided by the number of residuals plus lambda; the equation is like the similarity score, except the sum of the residuals is not squared. When lambda = 0, the output value is just the average of the residuals in the leaf, giving an output value of 7 for one of the leaves. With lambda = 1, the leaf containing the single residual -10.5 has output -5.25. Finally, XGBoost makes new predictions by adding the output of the tree scaled by a learning rate; the default learning rate, eta, is 0.3, giving a new predicted value of -2.65 for the observation with dosage 10.
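The output-value formula above can be sketched directly. The numbers reproduce the video's examples: an output of 7 for a leaf averaging to 7 with lambda = 0 (the residuals 6.5 and 7.5 are an assumption consistent with that average), and -5.25 for the single-residual leaf with lambda = 1.

```python
def output_value(residuals, lam=0.0):
    # Output value = sum of residuals / (number of residuals + lambda);
    # like the similarity score, but without squaring the sum.
    return sum(residuals) / (len(residuals) + lam)

print(output_value([6.5, 7.5]))        # 7.0  (lambda = 0: just the average)
print(output_value([-10.5]))           # -10.5
print(output_value([-10.5], lam=1.0))  # -5.25 (lambda shrinks the output)
```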
the whole tree away due to smaller gain values.', 'The output value for a leaf is determined by the sum of the residuals divided by the number of residuals plus lambda, where lambda is the regularization parameter.']}, {'end': 1545.424, 'segs': [{'end': 1503.283, 'src': 'embed', 'start': 1419.175, 'weight': 0, 'content': [{'end': 1422.678, 'text': 'and then build another tree based on the newest residuals.', 'start': 1419.175, 'duration': 3.503}, {'end': 1430.143, 'text': 'And we keep building trees until the residuals are super small or we have reached the maximum number.', 'start': 1424.059, 'duration': 6.084}, {'end': 1447.414, 'text': 'Triple bam! In summary, when building XGBoost trees for regression, we calculate similarity scores, and gain to determine how to split the data.', 'start': 1431.904, 'duration': 15.51}, {'end': 1458.163, 'text': 'And we prune the tree by calculating the differences between gain values and a user-defined tree complexity parameter, gamma.', 'start': 1449.656, 'duration': 8.507}, {'end': 1462.967, 'text': 'If the difference is positive, then we do not prune.', 'start': 1459.825, 'duration': 3.142}, {'end': 1465.79, 'text': "If it's negative, then we prune.", 'start': 1463.708, 'duration': 2.082}, {'end': 1476.15, 'text': "For example, if we subtract gamma from this gain and get a negative value, we will prune, otherwise we're done.", 'start': 1467.682, 'duration': 8.468}, {'end': 1483.477, 'text': 'If we prune, then we will subtract gamma from the next gain value and work our way up the tree.', 'start': 1477.631, 'duration': 5.846}, {'end': 1488.369, 'text': 'Then we calculate the output values for the remaining leaves.', 'start': 1484.926, 'duration': 3.443}, {'end': 1496.016, 'text': 'And lastly, lambda is a regularization parameter and when lambda is greater than zero,', 'start': 1489.77, 'duration': 6.246}, {'end': 1503.283, 'text': 'it results in more pruning by shrinking the similarity scores and it results in smaller output 
values for the leaves.', 'start': 1496.016, 'duration': 7.267}], 'summary': 'Xgboost trees are built for regression using similarity scores, gain, and a user-defined tree complexity parameter, gamma, with lambda also impacting pruning and output values.', 'duration': 84.108, 'max_score': 1419.175, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OtD8wVaFm6E/pics/OtD8wVaFm6E1419175.jpg'}], 'start': 1346.392, 'title': 'Xgboost trees', 'summary': 'Discusses building xgboost trees for regression, highlighting the iterative nature of making new predictions, and the process of pruning trees based on gain values and a user-defined tree complexity parameter, gamma.', 'chapters': [{'end': 1447.414, 'start': 1346.392, 'title': 'Building xgboost trees for regression', 'summary': 'Discusses the process of building xgboost trees for regression, highlighting the iterative nature of making new predictions and building trees based on residuals, with a focus on achieving smaller residuals and using similarity scores and gain to determine data splits.', 'duration': 101.022, 'highlights': ['New predictions for observations have smaller residuals than before, suggesting each small step was in the right direction.', 'We make new predictions that give us even smaller residuals, and then build another tree based on the newest residuals.', 'Building trees based on residuals is an iterative process until the residuals are super small or the maximum number of trees is reached.', 'The process involves calculating similarity scores and gain to determine how to split the data.']}, {'end': 1545.424, 'start': 1449.656, 'title': 'Xgboost tree pruning', 'summary': 'Explains how xgboost prunes trees by calculating gain values and a user-defined tree complexity parameter, gamma, resulting in more pruning with a greater lambda and a negative difference, with an overview of xgboost trees for classification in part 2.', 'duration': 95.768, 'highlights': ['XGBoost prunes 
trees by calculating gain values and a user-defined tree complexity parameter, gamma, resulting in pruning for negative differences and no pruning for positive differences.', 'Lambda, a regularization parameter, leads to more pruning and smaller output values for the leaves when it is greater than zero.', 'Overview of XGBoost trees for classification will be provided in Part 2.']}], 'duration': 199.032, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OtD8wVaFm6E/pics/OtD8wVaFm6E1346392.jpg', 'highlights': ['Building trees based on residuals is an iterative process until the residuals are super small or the maximum number of trees is reached.', 'The process involves calculating similarity scores and gain to determine how to split the data.', 'XGBoost prunes trees by calculating gain values and a user-defined tree complexity parameter, gamma, resulting in pruning for negative differences and no pruning for positive differences.', 'Lambda, a regularization parameter, leads to more pruning and smaller output values for the leaves when it is greater than zero.']}], 'highlights': ['XGBoost was designed for use with large, complicated datasets, and although it will be demonstrated with simple training data, it is intended for more complex datasets.', "XGBoost starts with an initial prediction of 0.5, regardless of regression or classification, and assesses residuals' impact.", 'The gain for the threshold of dosage less than 15 is 120.33, the largest gain among all thresholds.', 'We prune an XGBoost tree based on its gain values, such as by determining whether to remove branches based on the difference between gain and a specified value for gamma, such as 130 or 150.', 'Setting lambda equal to 1 in XGBoost tree pruning leads to smaller similarity scores, with the leaf on the left experiencing the largest decrease of 50% and the root the smallest decrease of 20%.', 'Building trees based on residuals is an iterative process until the residuals 
are super small or the maximum number of trees is reached.']}
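The similarity-score, gain, and gamma-pruning arithmetic quoted in the chapter summaries above can be checked numerically. This is a minimal sketch, not the `xgboost` library API: the helper names `similarity` and `gain` are illustrative, and the residuals `[-10.5, 6.5, 7.5, -7.5]` are the values from the video's drug-dosage example that reproduce the quoted numbers (root similarity 3.2 when lambda = 1, gain 120.33 for the dosage < 15 threshold when lambda = 0).

```python
def similarity(residuals, lam):
    """Similarity score = (sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam):
    """Gain for a split = left similarity + right similarity - root similarity."""
    return similarity(left, lam) + similarity(right, lam) - similarity(left + right, lam)

# Residuals from the video's drug-dosage example (initial prediction 0.5).
# The threshold dosage < 15 puts -10.5 in the left leaf and the rest on the right.
left, right = [-10.5], [6.5, 7.5, -7.5]

print(round(gain(left, right, lam=0), 2))        # 120.33, the largest gain among thresholds
print(similarity(left + right, lam=1))           # 3.2 -> root similarity when lambda = 1
print(round(similarity(left, 1), 2))             # 55.12 -> half of the lambda = 0 score (110.25)

# Pruning rule: remove a branch when gain - gamma is negative.
# Considered in isolation, this split would not survive gamma = 130:
gamma = 130
print(gain(left, right, lam=0) - gamma < 0)      # True -> prune
```

Note how the lambda = 1 decrease is inversely proportional to the number of residuals in the node, exactly as the summary states: the one-residual leaf drops by 50% (divisor 1 → 2), while the four-residual root drops by only 20% (divisor 4 → 5).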
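The leaf output-value formula and the eta-scaled prediction update from the last chapter can likewise be verified. This sketch assumes the leaf contents quoted in the highlights (a single residual of -10.5 for the observation with dosage 10, and a leaf holding residuals 6.5 and 7.5) and the stated default learning rate eta = 0.3; `leaf_output` is an illustrative helper, not library code.

```python
def leaf_output(residuals, lam):
    """Output value = sum of residuals / (number of residuals + lambda)."""
    return sum(residuals) / (len(residuals) + lam)

# Leaf containing the single residual -10.5 (observation with dosage = 10):
print(leaf_output([-10.5], lam=0))   # -10.5 (with lambda = 0, just the average residual)
print(leaf_output([-10.5], lam=1))   # -5.25, shrunk toward zero by regularization

# Leaf holding residuals 6.5 and 7.5: with lambda = 0 the output is their average.
print(leaf_output([6.5, 7.5], lam=0))   # 7.0

# New prediction = initial prediction + eta * leaf output (default eta = 0.3).
initial, eta = 0.5, 0.3
print(round(initial + eta * leaf_output([-10.5], lam=0), 2))   # -2.65
```

This makes the role of lambda concrete: dividing by (count + lambda) rather than count is what reduces the prediction's sensitivity to any individual observation, and each eta-scaled update is the "small step in the right direction" that shrinks the residuals before the next tree is built.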