title

How to Prune Regression Trees, Clearly Explained!!!

description

Pruning Regression Trees is one of the most important ways we can prevent them from overfitting the Training Data. This video walks you through Cost Complexity Pruning, aka Weakest Link Pruning, step-by-step so that you can learn how it works and see it in action.
NOTE: This StatQuest assumes you already know about...
Regression Trees: https://youtu.be/g9c66TUylZ4
ALSO NOTE: This StatQuest is based on the Cost Complexity Pruning algorithm found on pages 307 to 309 of An Introduction to Statistical Learning with Applications in R: http://faculty.marshall.usc.edu/gareth-james/ISL/
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - https://statquest.gumroad.com/l/wvtmc
Paperback - https://www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - https://www.amazon.com/dp/B09ZG79HXC
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...a cool StatQuest t-shirt or sweatshirt:
https://shop.spreadshirt.com/statquest-with-josh-starmer/
...buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
0:59 Motivation for pruning a tree
3:58 Calculating the sum of squared residuals for pruned trees
7:50 Comparing pruned trees with alpha
11:17 Step 1: Use all of the data to build trees with different alphas
13:05 Step 2: Use cross validation to compare alphas
15:02 Step 3: Select the alpha that, on average, gives the best results
15:27 Step 4: Select the original tree that corresponds to that alpha
#statquest #regression #tree
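The Tree Score used throughout the video is just SSR plus a complexity penalty, Tree Score = SSR + alpha x T, where T is the number of leaves. Here is a minimal Python sketch of that calculation using the SSR values quoted in the video (the one-leaf score is not stated in the video and is computed here from the same formula):

```python
# Cost complexity (weakest link) tree score: Tree Score = SSR + alpha * T,
# where T is the number of leaves (terminal nodes) and alpha is the
# complexity penalty. SSR values below are the ones quoted in the video.

def tree_score(ssr, n_leaves, alpha):
    """Cost complexity score for one tree or subtree."""
    return ssr + alpha * n_leaves

# (SSR, number of leaves) for the full tree and its three pruned subtrees
subtrees = [(543.8, 4), (5494.8, 3), (19243.7, 2), (28897.2, 1)]

alpha = 10_000
scores = [tree_score(ssr, t, alpha) for ssr, t in subtrees]

# Full tree: 543.8 + 10,000 * 4 = 40,543.8
# 3 leaves:  5,494.8 + 10,000 * 3 = 35,494.8  <- lowest score, so it wins
# 2 leaves:  19,243.7 + 10,000 * 2 = 39,243.7
best = min(range(len(scores)), key=scores.__getitem__)
print(scores)
print(subtrees[best])
```

At alpha = 10,000 the three-leaf subtree has the lowest Tree Score, which matches the subtree selected in the video.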

detail

Summary: Covers Cost Complexity Pruning (aka Weakest Link Pruning) for regression trees, using an example of predicting drug effectiveness from dosage, and finds that the optimal pruned tree corresponds to an alpha value of 10,000.

Chapter 1: Pruning Regression Trees (0:00 - 0:55)
- Gives a general overview of how Cost Complexity Pruning works, then describes how it is used to build regression trees.
- Assumes prior familiarity with regression trees and cross-validation (links in the description above).

Chapter 2: Regression Trees and Drug Effectiveness (0:56 - 7:37)
- Given different drug dosages on the x-axis, drug effectiveness was measured on the y-axis: dosages that were too low or too high were not effective, medium dosages were very effective, and moderately high dosages were moderately effective.
- A regression tree was fit to this data, with each leaf corresponding to the average drug effectiveness of a different cluster of observations. The tree reflects the training data well because each leaf represents a value close to the data.
- For testing data, however, the residuals (the differences between the observed and predicted values) are small for some observations but much larger for others. The main idea behind pruning a regression tree is to prevent overfitting the training data so that the tree does a better job with the testing data.
- To prune, two leaves are removed and their split is replaced with a single leaf that is the average of a larger number of observations; repeating this gives a sequence of ever smaller subtrees. Cost Complexity Pruning answers the question of which of these trees to use.
- The first step in Cost Complexity Pruning is to calculate the sum of squared residuals (SSR) for each tree. The SSR is relatively small for the original full-sized tree, but each time leaves are removed it gets larger: for example, SSR = 19,243.7 for the subtree with two leaves and SSR = 28,897.2 for the subtree with only one leaf.

Chapter 3: Tree Scores, Weakest Link Pruning, and Picking Alpha (7:38 - 16:14)
- Weakest Link Pruning works by calculating a Tree Score for each tree or subtree: Tree Score = SSR + alpha x T, where T is the number of leaves (terminal nodes) and alpha x T is a tree complexity penalty that compensates for the difference in the number of leaves. Alpha is a tuning parameter found using cross-validation.
- With alpha = 10,000: the original full-sized tree (SSR = 543.8, 4 leaves) has Tree Score = 40,543.8; the subtree with 3 leaves (SSR = 5,494.8) has Tree Score = 35,494.8; and the subtree with 2 leaves has Tree Score = 39,243.7. The tree with the lowest Tree Score is selected.
- Step 1: Using all of the data, build a full-sized regression tree. When alpha = 0, the tree complexity penalty is 0, so this full tree has the lowest Tree Score. Increase alpha until pruning leaves gives a lower Tree Score (here, at alpha = 10,000 a subtree wins; at alpha = 22,000 two more leaves are pruned). In the end, different values for alpha give a sequence of trees, from full-sized down to just a leaf.
- Step 2: Divide the full dataset into training and testing data. Using just the training data and the alpha values found before, build a full tree and the sequence of subtrees that minimize the Tree Score. Then calculate the SSR for each of these trees using only the testing data (in the first fold, the tree for alpha = 10,000 had the smallest testing SSR). Repeat with new training and testing splits (in the second fold, the tree for alpha = 0 had the lowest testing SSR) until tenfold cross-validation is complete.
- Step 3: The final value for alpha is the one that, on average, gave the lowest SSR with the testing data; here, alpha = 10,000.
- Step 4: Go back to the original trees and subtrees made from the full data and pick the tree that corresponds to the selected value for alpha. That subtree is the final pruned tree.
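The video increases alpha until pruning suddenly gives a lower Tree Score. A way to see why that happens at a particular alpha (this derivation is mine, not from the video) is to set the two Tree Scores equal: SSR_big + alpha x T_big = SSR_small + alpha x T_small, so the crossover is alpha* = (SSR_small - SSR_big) / (T_big - T_small). A small sketch with the video's numbers:

```python
# Crossover alpha at which a smaller subtree's Tree Score catches up with
# a bigger tree's score. Derived from setting the two scores equal;
# the SSR/leaf-count pairs are the values quoted in the video.

def crossover_alpha(big, small):
    """big and small are (SSR, n_leaves) tuples, big having more leaves."""
    (ssr_big, t_big), (ssr_small, t_small) = big, small
    return (ssr_small - ssr_big) / (t_big - t_small)

full_tree = (543.8, 4)    # SSR and leaf count of the full-sized tree
three_leaf = (5494.8, 3)  # SSR and leaf count after pruning one pair of leaves

alpha_star = crossover_alpha(full_tree, three_leaf)
print(alpha_star)  # approximately 4,951
```

For any alpha above roughly 4,951, the three-leaf subtree beats the full tree, which is consistent with the video's result that the three-leaf subtree wins at alpha = 10,000.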