title
Gradient Boost Part 3 (of 4): Classification
description
This is Part 3 in our series on Gradient Boost. At long last, we are showing how it can be used for classification. This video focuses on the main ideas behind this technique. The next video in this series will focus more on the math and how it works with the underlying algorithm.
This StatQuest assumes that you have already watched Part 1:
https://youtu.be/3CC4N4z3GJc
...and it also assumes that you understand Logistic Regression pretty well. Here are the links for...
A general overview of Logistic Regression: https://youtu.be/yIYKR4sgzI8
How to interpret the coefficients: https://youtu.be/vN5cNN2-HWE
...and how to estimate the coefficients: https://youtu.be/BfKanl1aSG0
Lastly, if you want to learn more about using different probability thresholds for classification, check out the StatQuest on ROC and AUC: https://youtu.be/xugjARegisk
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
This StatQuest is based on the following sources:
A 1999 manuscript by Jerome Friedman that introduced Stochastic Gradient Boosting: https://statweb.stanford.edu/~jhf/ftp/stobst.pdf
The Wikipedia article on Gradient Boosting: https://en.wikipedia.org/wiki/Gradient_boosting
The scikit-learn implementation of Gradient Boosting: https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - https://statquest.gumroad.com/l/wvtmc
Paperback - https://www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - https://www.amazon.com/dp/B09ZG79HXC
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...a cool StatQuest t-shirt or sweatshirt:
https://shop.spreadshirt.com/statquest-with-josh-starmer/
...buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
https://twitter.com/joshuastarmer
#statquest #gradientboost
detail
{'title': 'Gradient Boost Part 3 (of 4): Classification', 'heatmap': [{'end': 123.58, 'start': 81.537, 'weight': 0.761}, {'value': 0.7245093961954974, 'end_time': 123.58, 'start_time': 108.604}, {'end': 335.754, 'start': 311.582, 'weight': 0.741}, {'end': 422.912, 'start': 359.69, 'weight': 0.748}, {'end': 596.291, 'start': 565.993, 'weight': 0.934}], 'summary': "Explains the concepts of using gradient boost for classification, involving a training dataset of six people's attributes and their love for the movie troll 2. it discusses calculating initial predictions, converting to probability, making classification decisions, measuring prediction accuracy, building trees, and updating predictions, leading to improved predicted probabilities.", 'chapters': [{'end': 72.39, 'segs': [{'end': 72.39, 'src': 'embed', 'start': 30.754, 'weight': 0, 'content': [{'end': 36.836, 'text': 'Note, this stat quest assumes you have already watched Gradient Boost Part 1, Regression Main Ideas.', 'start': 30.754, 'duration': 6.082}, {'end': 39.117, 'text': 'If not, check out the quest.', 'start': 37.536, 'duration': 1.581}, {'end': 46.199, 'text': 'In addition, when Gradient Boost is used for classification, it has a lot in common with logistic regression.', 'start': 40.317, 'duration': 5.882}, {'end': 50.461, 'text': "So if you're not already familiar with logistic regression, check out the quests.", 'start': 46.78, 'duration': 3.681}, {'end': 60.981, 'text': 'In this stat quest, we will use this training data, where we have collected popcorn preference from six people, their age,', 'start': 51.841, 'duration': 9.14}, {'end': 66.566, 'text': 'their favorite color and whether or not they love the movie Troll 2,', 'start': 60.981, 'duration': 5.585}, {'end': 72.39, 'text': 'and walk through step by step the most common way that Gradient Boost fits a model to this training data.', 'start': 66.566, 'duration': 5.824}], 'summary': 'Stat quest on gradient boost for classification and 
logistic regression with training data of six people.', 'duration': 41.636, 'max_score': 30.754, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs30754.jpg'}], 'start': 0.333, 'title': 'Gradient boost for classification', 'summary': "Explains the main ideas of using gradient boost for classification, using a training data of six people's popcorn preferences, age, favorite color, and love for the movie troll 2.", 'chapters': [{'end': 72.39, 'start': 0.333, 'title': 'Gradient boost for classification', 'summary': "Explains the main ideas of using gradient boost for classification, using a training data of six people's popcorn preferences, age, favorite color, and love for the movie troll 2.", 'duration': 72.057, 'highlights': ['Gradient Boost can be used for classification, and it shares similarities with logistic regression, making it important to understand logistic regression as well.', "The chapter uses a training data set of six people's popcorn preferences, age, favorite color, and love for the movie Troll 2 to demonstrate the common way Gradient Boost fits a model to the data.", 'This is part 3 in a series on Gradient Boost, and it assumes that the audience has already watched part 1 on Regression Main Ideas.']}], 'duration': 72.057, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs333.jpg', 'highlights': ["The chapter uses a training data set of six people's popcorn preferences, age, favorite color, and love for the movie Troll 2 to demonstrate the common way Gradient Boost fits a model to the data.", 'Gradient Boost can be used for classification, and it shares similarities with logistic regression, making it important to understand logistic regression as well.', 'This is part 3 in a series on Gradient Boost, and it assumes that the audience has already watched part 1 on Regression Main Ideas.']}, {'end': 281.796, 'segs': [{'end': 126.441, 'src': 
'heatmap', 'start': 73.831, 'weight': 0, 'content': [{'end': 80.397, 'text': 'Just like in part one of this series, we start with a leaf that represents an initial prediction for every individual.', 'start': 73.831, 'duration': 6.566}, {'end': 88.343, 'text': 'When we use Gradient Boost for classification, the initial prediction for every individual is the log of the odds.', 'start': 81.537, 'duration': 6.806}, {'end': 94.419, 'text': 'I like to think of the log of the odds as the logistic regression equivalent of the average.', 'start': 89.577, 'duration': 4.842}, {'end': 101.921, 'text': "So let's calculate the overall log of the odds that someone loves Troll 2.", 'start': 95.739, 'duration': 6.182}, {'end': 108.604, 'text': 'Since four people in the training dataset love Troll 2, and two people do not,', 'start': 101.921, 'duration': 6.683}, {'end': 118.257, 'text': 'then the log of the odds that someone loves Troll 2 is the log of 4 divided by 2, which equals 0.7..', 'start': 108.604, 'duration': 9.653}, {'end': 120.458, 'text': 'which we will put into our initial leaf.', 'start': 118.257, 'duration': 2.201}, {'end': 123.58, 'text': 'So this is the initial prediction.', 'start': 121.859, 'duration': 1.721}, {'end': 126.441, 'text': 'How do we use it for classification?', 'start': 124.74, 'duration': 1.701}], 'summary': 'Using log of the odds as initial prediction for classification in gradient boost, with a calculated value of 0.7.', 'duration': 52.61, 'max_score': 73.831, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs73831.jpg'}, {'end': 172.869, 'src': 'embed', 'start': 150.001, 'weight': 1, 'content': [{'end': 162.084, 'text': 'So we plug in the log of the odds into the logistic function, do the math, and we get 0.7 as the probability of loving troll 2.', 'start': 150.001, 'duration': 12.083}, {'end': 163.705, 'text': "And let's save that up here for now.", 'start': 162.084, 'duration': 1.621}, 
{'end': 172.869, 'text': "Note, these two numbers, the log of 4 divided by 2 and the probability, are the same only because I'm rounding.", 'start': 165.103, 'duration': 7.766}], 'summary': 'Using logistic function, log(odds) gives 0.7 probability for loving troll 2.', 'duration': 22.868, 'max_score': 150.001, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs150001.jpg'}, {'end': 281.796, 'src': 'embed', 'start': 199.911, 'weight': 2, 'content': [{'end': 208.998, 'text': 'While 0.5 is a very common threshold for making classification decisions based on probability, we could have just as easily used a different value.', 'start': 199.911, 'duration': 9.087}, {'end': 216.103, 'text': 'For more details, check out the StatQuest ROC and AUC Clearly Explained.', 'start': 210.119, 'duration': 5.984}, {'end': 226.95, 'text': 'Now, classifying everyone in the training dataset as someone who loves Troll 2 is pretty lame, because two of the people do not love the movie.', 'start': 217.867, 'duration': 9.083}, {'end': 236.614, 'text': 'We can measure how bad the initial prediction is by calculating pseudo-residuals, the difference between the observed and the predicted values.', 'start': 228.211, 'duration': 8.403}, {'end': 244.157, 'text': "Although the math is easy, I think it's easier to grasp what's going on if we draw the residuals on a graph.", 'start': 238.315, 'duration': 5.842}, {'end': 250.401, 'text': 'The y-axis is the probability of loving Troll 2.', 'start': 245.558, 'duration': 4.843}, {'end': 256.084, 'text': 'The predicted probability of loving Troll 2 is 0.7.', 'start': 250.401, 'duration': 5.683}, {'end': 264.928, 'text': 'The red dots, with the probability of loving Troll 2 equal to 0, represent the two people that do not love Troll 2.', 'start': 256.084, 'duration': 8.844}, {'end': 273.652, 'text': 'And the blue dots, with the probability of loving Troll 2 equal to 1, represent the four people that 
love Troll 2.', 'start': 264.928, 'duration': 8.724}, {'end': 281.796, 'text': 'In other words, the red and blue dots are the observed values, and the dotted line is the predicted value.', 'start': 273.652, 'duration': 8.144}], 'summary': 'Using 0.5 as a threshold for classification decisions, assessing initial predictions through pseudo-residuals and visualization.', 'duration': 81.885, 'max_score': 199.911, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs199911.jpg'}], 'start': 73.831, 'title': 'Gradient boost for classification', 'summary': 'Discusses calculating initial prediction using odds, converting to probability, and making classification decisions with a specific example of classifying individuals based on their love for troll 2. it also explains measuring prediction accuracy by calculating pseudo-residuals and graphing them, with a predicted probability of 0.7 and observations of 2 people not loving troll 2 and 4 people loving it.', 'chapters': [{'end': 226.95, 'start': 73.831, 'title': 'Gradient boost for classification', 'summary': 'Discusses how to calculate the initial prediction using the log of the odds, convert it into a probability using the logistic function, and make classification decisions based on the probability, with a specific example of classifying individuals based on their love for troll 2.', 'duration': 153.119, 'highlights': ['The initial prediction for every individual in Gradient Boost classification is the log of the odds, calculated using the proportion of positive and negative cases in the training dataset, with a specific example of the log of the odds for loving Troll 2 being 0.7 based on the ratio of individuals who love and do not love the movie.', 'The log of the odds is converted into a probability using the logistic function, and in this case, the probability of loving Troll 2 is calculated to be 0.7, which is then used for classification.', 'The chapter highlights 
the importance of choosing a threshold for classification decisions based on probability, with a reference to the StatQuest ROC and AUC Clearly Explained for further details.']}, {'end': 281.796, 'start': 228.211, 'title': 'Measuring prediction accuracy', 'summary': 'Explains how to measure prediction accuracy by calculating pseudo-residuals and graphing them, with a predicted probability of 0.7 and observations of 2 people not loving troll 2 and 4 people loving it.', 'duration': 53.585, 'highlights': ['By calculating pseudo-residuals, the difference between observed and predicted values can be measured, providing insight into the accuracy of the initial prediction.', 'Graphing the residuals on a graph helps to visualize the accuracy of the prediction, with the y-axis representing the probability of loving Troll 2 and the predicted probability being 0.7.', 'Observations of 2 people not loving Troll 2 and 4 people loving it are represented by red and blue dots respectively, with the dotted line indicating the predicted value.']}], 'duration': 207.965, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs73831.jpg', 'highlights': ['The initial prediction for every individual in Gradient Boost classification is the log of the odds, calculated using the proportion of positive and negative cases in the training dataset.', 'The log of the odds is converted into a probability using the logistic function, and in this case, the probability of loving Troll 2 is calculated to be 0.7, which is then used for classification.', 'By calculating pseudo-residuals, the difference between observed and predicted values can be measured, providing insight into the accuracy of the initial prediction.', 'The chapter highlights the importance of choosing a threshold for classification decisions based on probability, with a reference to the StatQuest ROC and AUC Clearly Explained for further details.', 'Graphing the residuals on a graph 
helps to visualize the accuracy of the prediction, with the y-axis representing the probability of loving Troll 2 and the predicted probability being 0.7.', 'Observations of 2 people not loving Troll 2 and 4 people loving it are represented by red and blue dots respectively, with the dotted line indicating the predicted value.']}, {'end': 1008.716, 'segs': [{'end': 357.928, 'src': 'heatmap', 'start': 311.582, 'weight': 4, 'content': [{'end': 316.023, 'text': "Hurray! We've calculated the residuals for the leaf's initial prediction.", 'start': 311.582, 'duration': 4.441}, {'end': 324.584, 'text': 'Now we build a tree using likes popcorn, age, and favorite color to predict the residuals.', 'start': 317.463, 'duration': 7.121}, {'end': 326.865, 'text': "And here's the tree.", 'start': 326.005, 'duration': 0.86}, {'end': 335.754, 'text': 'Note, just like when we used gradient boost for regression, we are limiting the number of leaves that we will allow in the tree.', 'start': 328.57, 'duration': 7.184}, {'end': 341.098, 'text': 'In this simple example, we are limiting the number of leaves to three.', 'start': 337.035, 'duration': 4.063}, {'end': 349.503, 'text': 'In practice, people often set the maximum number of leaves to be between eight and 32.', 'start': 342.559, 'duration': 6.944}, {'end': 352.304, 'text': "Now let's calculate the output values for the leaves.", 'start': 349.503, 'duration': 2.801}, {'end': 357.928, 'text': 'Note, these three rows of data go to the same leaf.', 'start': 353.645, 'duration': 4.283}], 'summary': 'Calculated residuals, built a tree with 3 leaves, set limit to 3-32 leaves.', 'duration': 29.358, 'max_score': 311.582, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs311582.jpg'}, {'end': 422.912, 'src': 'heatmap', 'start': 359.69, 'weight': 0.748, 'content': [{'end': 363.111, 'text': 'These two rows of data go to the same leaf.', 'start': 359.69, 'duration': 3.421}, {'end': 
368.613, 'text': 'Lastly, this row of data goes to its own leaf.', 'start': 364.512, 'duration': 4.101}, {'end': 377.836, 'text': 'When we used gradient boost for regression, a leaf with a single residual had an output value equal to that residual.', 'start': 370.253, 'duration': 7.583}, {'end': 385.018, 'text': 'In contrast, when we use gradient boost for classification, the situation is a little more complex.', 'start': 379.056, 'duration': 5.962}, {'end': 393.638, 'text': 'This is because the predictions are in terms of the log of the odds, and this leaf is derived from a probability.', 'start': 386.274, 'duration': 7.364}, {'end': 401.402, 'text': "So we can't just add them together and get a new log of the odds prediction without some sort of transformation.", 'start': 394.919, 'duration': 6.483}, {'end': 409.026, 'text': 'When we use gradient boost for classification, the most common transformation is the following formula.', 'start': 403.023, 'duration': 6.003}, {'end': 422.912, 'text': 'The numerator is the sum of all the residuals in the leaf and the denominator is the sum of the previously predicted probabilities for each residual times 1 minus the same predicted probability.', 'start': 410.087, 'duration': 12.825}], 'summary': 'Gradient boost uses different methods for regression and classification, with a specific transformation for the latter.', 'duration': 63.222, 'max_score': 359.69, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs359690.jpg'}, {'end': 470.039, 'src': 'embed', 'start': 435.661, 'weight': 1, 'content': [{'end': 440.744, 'text': "So for now, let's just use the formula to calculate the output value for this leaf.", 'start': 435.661, 'duration': 5.083}, {'end': 447.24, 'text': 'Since there is only one residual in this leaf, we can ignore these summation signs for now.', 'start': 442.236, 'duration': 5.004}, {'end': 455.307, 'text': 'So we plug in the residual from the leaf and 
since we are building the first tree,', 'start': 448.801, 'duration': 6.506}, {'end': 459.33, 'text': 'the previous probability refers to the probability from the initial leaf.', 'start': 455.307, 'duration': 4.023}, {'end': 470.039, 'text': 'So we plug that in, do the math, and we end up with negative 3.3 as the new output value for this leaf.', 'start': 460.771, 'duration': 9.268}], 'summary': 'Using formula to calculate output value for leaf, resulting in -3.3.', 'duration': 34.378, 'max_score': 435.661, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs435661.jpg'}, {'end': 596.291, 'src': 'heatmap', 'start': 565.993, 'weight': 0.934, 'content': [{'end': 579.966, 'text': 'The log of the odds prediction is the previous prediction, 0.7, plus the output value from the tree scaled by the learning rate, 0.8 times 1.4.', 'start': 565.993, 'duration': 13.973}, {'end': 585.708, 'text': 'And the new log of the odds prediction equals 1.8.', 'start': 579.966, 'duration': 5.742}, {'end': 589.689, 'text': 'Now we can convert the new log odds prediction into a probability.', 'start': 585.708, 'duration': 3.981}, {'end': 596.291, 'text': 'And the new predicted probability equals 0.9.', 'start': 591.089, 'duration': 5.202}], 'summary': 'New predicted probability is 0.9 based on log odds prediction of 1.8', 'duration': 30.298, 'max_score': 565.993, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs565993.jpg'}, {'end': 863.612, 'src': 'embed', 'start': 816.654, 'weight': 0, 'content': [{'end': 821.055, 'text': 'We started with just a leaf, which made one prediction for every individual.', 'start': 816.654, 'duration': 4.401}, {'end': 829.376, 'text': 'Then we built a tree based on the residuals, the difference between the observed values and the single value predicted by the leaf.', 'start': 822.375, 'duration': 7.001}, {'end': 836.518, 'text': 'Then we calculated the 
output values for each leaf, and we scaled it with a learning rate.', 'start': 830.397, 'duration': 6.121}, {'end': 841.034, 'text': 'Then we built another tree based on the new residuals,', 'start': 837.912, 'duration': 3.122}, {'end': 846.559, 'text': 'the difference between the observed values and the values predicted by the leaf and the first tree.', 'start': 841.034, 'duration': 5.525}, {'end': 850.722, 'text': 'Then we calculated the output values for each leaf.', 'start': 847.78, 'duration': 2.942}, {'end': 855.345, 'text': 'And we scaled this new tree with the learning rate as well.', 'start': 852.283, 'duration': 3.062}, {'end': 863.612, 'text': 'This process repeats until we have made the maximum number of trees specified or the residuals get super small.', 'start': 856.987, 'duration': 6.625}], 'summary': 'Gradient boosting builds trees iteratively to minimize residuals.', 'duration': 46.958, 'max_score': 816.654, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs816654.jpg'}, {'end': 936.166, 'src': 'embed', 'start': 902.829, 'weight': 3, 'content': [{'end': 905.13, 'text': 'And then we add the scaled output value.', 'start': 902.829, 'duration': 2.301}, {'end': 908.151, 'text': 'Now we just do the math.', 'start': 906.651, 'duration': 1.5}, {'end': 916.836, 'text': 'And get 2.3 as the log of the odds prediction that this person loves Troll 2.', 'start': 909.452, 'duration': 7.384}, {'end': 920.598, 'text': 'Now we need to convert this log of the odds into a probability.', 'start': 916.836, 'duration': 3.762}, {'end': 925.261, 'text': 'So we plug the log of the odds into the logistic function.', 'start': 922.099, 'duration': 3.162}, {'end': 927.362, 'text': 'Do the math.', 'start': 926.581, 'duration': 0.781}, {'end': 936.166, 'text': 'and the predicted probability that this individual will love Troll 2 is 0.9.', 'start': 928.82, 'duration': 7.346}], 'summary': 'Using logistic function, the 
predicted probability of loving troll 2 is 0.9.', 'duration': 33.337, 'max_score': 902.829, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs902829.jpg'}], 'start': 283.117, 'title': 'Gradient boosting for prediction', 'summary': 'Delves into the process of calculating residuals, building trees, and updating predictions in gradient boosting, resulting in improved predicted probabilities. it also emphasizes the use of small trees and promises a detailed explanation of the math behind gradient boost in the next section.', 'chapters': [{'end': 655.69, 'start': 283.117, 'title': 'Gradient boosting for prediction', 'summary': 'Explains the process of calculating residuals, building a tree to predict the residuals, and updating predictions using the output values from the tree, ultimately improving the predicted probabilities.', 'duration': 372.573, 'highlights': ['The output value for a leaf is calculated by summing the residuals and the previous probabilities, aiding in predicting the log of the odds and the new probabilities.', 'The process involves limiting the number of leaves in the tree and scaling the new tree by a learning rate, with the new log odds prediction and probabilities being calculated based on the output values.', 'The chapter emphasizes the iterative nature of building multiple trees to obtain improved predicted probabilities for all individuals involved.']}, {'end': 1008.716, 'start': 657.531, 'title': 'Gradient boost: building trees and making predictions', 'summary': 'Explains the process of building trees and making predictions using residuals in gradient boost, culminating in the classification of a new individual with a predicted probability of 0.9, and highlights the use of small trees, and the promise of a deep dive into the math of gradient boost in the next part.', 'duration': 351.185, 'highlights': ['The prediction starts with the leaf. Then we run the data down the first tree. 
And we add the scaled output value. Then we run the data down the second tree. And then we add the scaled output value. Now we just do the math. And get 2.3 as the log of the odds prediction that this person loves Troll 2.', 'The predicted probability that this individual will love Troll 2 is 0.9.', 'Gradient Boost usually uses trees with between 8 and 32 leaves.']}], 'duration': 725.599, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jxuNLH5dXCs/pics/jxuNLH5dXCs283117.jpg', 'highlights': ['The process involves limiting the number of leaves in the tree and scaling the new tree by a learning rate, with the new log odds prediction and probabilities being calculated based on the output values.', 'The output value for a leaf is calculated by summing the residuals and the previous probabilities, aiding in predicting the log of the odds and the new probabilities.', 'The chapter emphasizes the iterative nature of building multiple trees to obtain improved predicted probabilities for all individuals involved.', 'The predicted probability that this individual will love Troll 2 is 0.9.', 'Gradient Boost usually uses trees with between 8 and 32 leaves.', 'The prediction starts with the leaf. Then we run the data down the first tree. And we add the scaled output value. Then we run the data down the second tree. And then we add the scaled output value. Now we just do the math. 
And get 2.3 as the log of the odds prediction that this person loves Troll 2.']}], 'highlights': ['The process involves limiting the number of leaves in the tree and scaling the new tree by a learning rate, with the new log odds prediction and probabilities being calculated based on the output values.', 'The initial prediction for every individual in Gradient Boost classification is the log of the odds, calculated using the proportion of positive and negative cases in the training dataset.', "The chapter uses a training data set of six people's popcorn preferences, age, favorite color, and love for the movie Troll 2 to demonstrate the common way Gradient Boost fits a model to the data.", 'The chapter emphasizes the iterative nature of building multiple trees to obtain improved predicted probabilities for all individuals involved.', 'By calculating pseudo-residuals, the difference between observed and predicted values can be measured, providing insight into the accuracy of the initial prediction.', 'Gradient Boost can be used for classification, and it shares similarities with logistic regression, making it important to understand logistic regression as well.']}
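The chapter walkthrough above (initial log-of-the-odds leaf, logistic conversion, pseudo-residuals, leaf output values, learning-rate update) can be sketched in Python. This is a minimal illustration of the arithmetic the video describes, not the scikit-learn implementation; the row ordering of `y` is a guess, and the single-residual leaf and the 1.4 output value are taken from the video's rounded examples.

```python
import math

def logistic(log_odds):
    """Convert a log(odds) prediction into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Six people; 1 = loves Troll 2, 0 = does not (4 yes, 2 no, as in the video).
y = [1, 1, 0, 1, 1, 0]

# Step 1: the initial leaf predicts the overall log(odds) for everyone.
loves, not_loves = sum(y), len(y) - sum(y)
f0 = math.log(loves / not_loves)   # log(4/2) ~ 0.69; the video rounds to 0.7
p0 = logistic(f0)                  # ~ 0.67; also rounded to 0.7 in the video

# Step 2: pseudo-residuals = observed value - predicted probability.
residuals = [yi - p0 for yi in y]

# Step 3: leaf output value for classification:
#   gamma = sum(residuals in leaf) / sum(p * (1 - p)) over the same rows,
#   where p is each row's previously predicted probability.
def leaf_output(leaf_residuals, prev_probs):
    return sum(leaf_residuals) / sum(p * (1.0 - p) for p in prev_probs)

# Using the video's rounded numbers: a leaf holding the single residual for a
# person who does not love Troll 2 (observed 0, previous probability 0.7):
gamma = leaf_output([0 - 0.7], [0.7])   # -0.7 / (0.7 * 0.3) ~ -3.3

# Step 4: update log(odds) = previous prediction + learning rate * output value.
# e.g. the video's leaf with output value 1.4 for the Troll 2 lovers:
learning_rate = 0.8
f1 = 0.7 + learning_rate * 1.4   # = 1.82; the video rounds to 1.8
p1 = logistic(f1)                # ~ 0.86; the video rounds to 0.9
```

In practice the residuals are fit with a regression tree limited to roughly 8 to 32 leaves (hard-coded here for brevity), and steps 2 through 4 repeat until the maximum number of trees is reached or the residuals get very small, exactly as the transcript describes.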