title
Regression Trees, Clearly Explained!!!

description
Regression Trees are one of the fundamental machine learning techniques that more complicated methods, like Gradient Boost, are based on. They are useful when there isn't an obviously linear relationship between what you want to predict and the things you are using to make the predictions. This StatQuest walks you through the steps required to build Regression Trees so that they are Clearly Explained.
NOTE: This StatQuest assumes you already know about...
The bias/variance tradeoff: https://youtu.be/EuBBz3bI-aA
Decision Trees: https://youtu.be/7VeUPuFGJHk
Linear Regression: https://www.youtube.com/watch?v=nk2CQITm_eo
ALSO NOTE: This StatQuest is based on the definition of Regression Trees found on pages 304 to 307 of the Introduction to Statistical Learning in R: http://faculty.marshall.usc.edu/gareth-james/ISL/
For a complete index of all the StatQuest videos, check out: https://statquest.org/video-index/
0:00 Awesome song and introduction
0:41 Motivation for Regression Trees
2:19 Regression Trees vs Classification Trees
7:11 Building a Regression Tree with one variable
18:59 Building a Regression Tree with multiple variables
20:54 Summary of concepts and main ideas
#statquest #regression #tree

detail
{'title': 'Regression Trees, Clearly Explained!!!', 'heatmap': [{'end': 460.62, 'start': 431.912, 'weight': 0.838}, {'end': 839.142, 'start': 766.249, 'weight': 0.739}, {'end': 1179.47, 'start': 1155.597, 'weight': 0.745}, {'end': 1259.066, 'start': 1205.307, 'weight': 0.928}], 'summary': 'Explains regression trees and their application in predicting drug effectiveness from dosage, highlighting their advantages over straight lines in fitting nonlinear data and accommodating multiple predictors. It also walks through building a regression tree by choosing the thresholds with the smallest sum of squared residuals, using the average effectiveness of each dosage range as the leaf output, and limiting overfitting by requiring a minimum number of observations before a split.', 'chapters': [{'end': 128.675, 'segs': [{'end': 59.319, 'src': 'embed', 'start': 29.415, 'weight': 0, 'content': [{'end': 36.158, 'text': 'and the basic ideas behind decision trees, and the basic ideas behind regression.', 'start': 29.415, 'duration': 6.743}, {'end': 38.42, 'text': 'If not, check out the quests.', 'start': 36.759, 'duration': 1.661}, {'end': 40.521, 'text': 'The links are in the description below.', 'start': 38.82, 'duration': 1.701}, {'end': 46.343, 'text': 'Now, imagine we developed a new drug to cure the common cold.', 'start': 41.921, 'duration': 4.422}, {'end': 51.286, 'text': "However, we don't know the optimal dosage to give patients.", 'start': 47.764, 'duration': 3.522}, {'end': 59.319, 'text': 'So we do a clinical trial with different dosages and measure how effective each dosage is.', 'start': 52.807, 'duration': 6.512}], 'summary': 'Introduction to decision trees and regression for drug dosage optimization in clinical trials.', 'duration': 29.904, 'max_score': 29.415, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ429415.jpg'}, {'end': 128.675, 'src': 'embed', 'start': 98.087, 'weight': 1, 'content': [{'end': 101.908, 
'text': 'Somewhat higher dosages work at about 50% effectiveness.', 'start': 98.087, 'duration': 3.821}, {'end': 105.43, 'text': 'And high dosages are not effective at all.', 'start': 103.069, 'duration': 2.361}, {'end': 111.432, 'text': 'In this case, fitting a straight line to the data will not be very useful.', 'start': 107.09, 'duration': 4.342}, {'end': 124.455, 'text': 'For example, if someone told us they were taking a 20 milligram dose, then we would predict that a 20 milligram dose should be 45% effective,', 'start': 112.813, 'duration': 11.642}, {'end': 128.675, 'text': 'even though the observed data says it should be 100% effective.', 'start': 124.455, 'duration': 4.22}], 'summary': 'Somewhat higher dosages are about 50% effective and high dosages are not effective at all, so a straight line fits poorly: it predicts 45% effectiveness for a 20 milligram dose that was observed to be 100% effective.', 'duration': 30.588, 'max_score': 98.087, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ498087.jpg'}], 'start': 0.979, 'title': 'Regression trees', 'summary': 'Explores regression trees and their application in predicting drug effectiveness from dosage, emphasizing the limitations of fitting straight lines to irregular datasets.', 'chapters': [{'end': 128.675, 'start': 0.979, 'title': 'Understanding regression trees', 'summary': 'Discusses the concept of regression trees, highlighting their application in predicting the effectiveness of drug dosages based on given data, showcasing the limitations of fitting straight lines to irregular datasets.', 'duration': 127.696, 'highlights': ['Regression trees are used to predict outcomes based on input variables, such as determining the effectiveness of drug dosages, as illustrated in the example where different dosages were measured for their effectiveness.', 'Fitting a straight line to irregular datasets may lead to inaccurate predictions, as demonstrated by the scenario where low, moderate, and high dosages of a drug exhibited varying levels of effectiveness, rendering a linear fit unsuitable for prediction.']}], 'duration': 
127.696, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4979.jpg', 'highlights': ['Regression trees predict outcomes based on input variables, e.g., drug dosage effectiveness.', 'Straight line fitting to irregular datasets may lead to inaccurate predictions.']}, {'end': 395.901, 'segs': [{'end': 186.839, 'src': 'embed', 'start': 158.851, 'weight': 0, 'content': [{'end': 166.674, 'text': 'With this regression tree, we start by asking if the dosage is less than 14.5.', 'start': 158.851, 'duration': 7.823}, {'end': 171.435, 'text': 'If so, then we are talking about these six observations in the training data.', 'start': 166.674, 'duration': 4.761}, {'end': 179.754, 'text': 'and the average drug effectiveness for these six observations is 4.2%.', 'start': 172.97, 'duration': 6.784}, {'end': 186.839, 'text': 'So the tree uses the average value, 4.2%, as its prediction for people with dosages less than 14.5.', 'start': 179.754, 'duration': 7.085}], 'summary': 'Regression tree predicts drug effectiveness for dosages less than 14.5 as 4.2%.', 'duration': 27.988, 'max_score': 158.851, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4158851.jpg'}, {'end': 319.596, 'src': 'embed', 'start': 291.38, 'weight': 2, 'content': [{'end': 298.323, 'text': 'Since each leaf corresponds to the average drug effectiveness in a different cluster of observations,', 'start': 291.38, 'duration': 6.943}, {'end': 302.445, 'text': 'the tree does a better job reflecting the data than the straight line.', 'start': 298.323, 'duration': 4.122}, {'end': 312.653, 'text': 'At this point, you might be thinking, the regression tree is cool, but I can also predict drug effectiveness just by looking at the graph.', 'start': 304.369, 'duration': 8.284}, {'end': 319.596, 'text': 'For example, if someone said they were taking a 27 milligram dose,', 'start': 313.974, 'duration': 5.622}], 'summary': 
'Regression tree reflects drug effectiveness better than a straight line for different clusters of observations.', 'duration': 28.216, 'max_score': 291.38, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4291380.jpg'}, {'end': 370.539, 'src': 'embed', 'start': 342.323, 'weight': 1, 'content': [{'end': 351.826, 'text': 'But when we have three or more predictors like dosage, age and sex to predict drug effectiveness, drawing a graph is very difficult,', 'start': 342.323, 'duration': 9.503}, {'end': 352.906, 'text': 'if not impossible.', 'start': 351.826, 'duration': 1.08}, {'end': 359.148, 'text': 'In contrast, a regression tree easily accommodates the additional predictors.', 'start': 354.387, 'duration': 4.761}, {'end': 370.539, 'text': 'For example, if we wanted to predict the drug effectiveness for this patient, we would start by asking if they are older than 50.', 'start': 360.851, 'duration': 9.688}], 'summary': 'Regression tree handles multiple predictors easily', 'duration': 28.216, 'max_score': 342.323, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4342323.jpg'}], 'start': 130.295, 'title': 'Regression trees for drug predictions and effectiveness', 'summary': 'Discusses the use of regression trees for predicting drug effectiveness based on dosage, with specific thresholds and average effectiveness values for different dosage ranges. 
it also highlights the advantages of regression trees over straight lines in accommodating multiple predictors and making accurate predictions in complex scenarios, demonstrated through a detailed example.', 'chapters': [{'end': 291.38, 'start': 130.295, 'title': 'Regression tree for predictions', 'summary': 'Discusses the use of regression trees for predictions, where the dosage is used to predict drug effectiveness, with specific thresholds and average effectiveness values for different dosage ranges.', 'duration': 161.085, 'highlights': ['The tree uses the average value, 100%, as its prediction for people with dosages between 14.5 and 23.5, based on four observations in the training data.', 'The tree uses the average value, 52.8%, as its prediction for people with dosages between 23.5 and 29, based on five observations in the training data set.', 'The tree uses the average value, 4.2%, as its prediction for people with dosages less than 14.5, based on six observations in the training data.']}, {'end': 395.901, 'start': 291.38, 'title': 'Regression tree for drug effectiveness', 'summary': 'Discusses the advantages of using a regression tree over a straight line for predicting drug effectiveness, highlighting its ability to accommodate multiple predictors and make accurate predictions in complex scenarios, as demonstrated by a detailed example.', 'duration': 104.521, 'highlights': ['A regression tree does a better job reflecting the data than a straight line, making it more effective for predicting drug effectiveness in different clusters of observations.', "The regression tree's advantage becomes evident when dealing with multiple predictors like dosage, age, and sex, where drawing a graph for prediction becomes difficult or impossible.", "An example is provided to demonstrate the regression tree's ability to accurately predict drug effectiveness by considering multiple predictors, with a specific prediction of 100% effectiveness for a female patient below the 
age of 50 with a dosage below 29."]}], 'duration': 265.606, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4130294.jpg', 'highlights': ['Regression trees predict drug effectiveness based on dosage thresholds and average values.', 'Regression trees accommodate multiple predictors and make accurate predictions in complex scenarios.', 'Regression trees reflect data better than straight lines, making them more effective for predicting drug effectiveness in different clusters of observations.']}, {'end': 970.854, 'segs': [{'end': 535.687, 'src': 'heatmap', 'start': 431.912, 'weight': 0, 'content': [{'end': 438.579, 'text': "Going back to the graph of the data, let's focus on the two observations with the smallest dosages.", 'start': 431.912, 'duration': 6.667}, {'end': 445.085, 'text': 'Their average dosage is 3, and that corresponds to this dotted red line.', 'start': 440.2, 'duration': 4.885}, {'end': 454.378, 'text': 'Now we can build a very simple tree that splits the observations into two groups based on whether or not dosage is less than three.', 'start': 446.576, 'duration': 7.802}, {'end': 460.62, 'text': 'The point on the far left is the only one with dosage less than three.', 'start': 456.319, 'duration': 4.301}, {'end': 466.181, 'text': 'And the average drug effectiveness for that one point is zero.', 'start': 462.24, 'duration': 3.941}, {'end': 472.103, 'text': 'So we put zero in the leaf on the left side for when dosage is less than three.', 'start': 467.582, 'duration': 4.521}, {'end': 479.332, 'text': 'All of the other points have dosages greater than or equal to 3.', 'start': 474.068, 'duration': 5.264}, {'end': 488.519, 'text': 'And the average drug effectiveness for all of the points with dosages greater than or equal to 3 is 38.8.', 'start': 479.332, 'duration': 9.187}, {'end': 494.823, 'text': 'So we put 38.8 in the leaf on the right side for when dosage is greater than or equal to 3.', 
'start': 488.519, 'duration': 6.304}, {'end': 502.169, 'text': 'The values in each leaf are the predictions that this simple tree will make for drug effectiveness.', 'start': 494.823, 'duration': 7.346}, {'end': 514.866, 'text': 'For example, this point on the far left has dosage less than 3, and the tree predicts that the drug effectiveness will be 0.', 'start': 503.7, 'duration': 11.166}, {'end': 522.691, 'text': 'The prediction for this point, drug effectiveness equals 0, is pretty good since it is the same as the observed value.', 'start': 514.866, 'duration': 7.825}, {'end': 535.687, 'text': 'In contrast, for this point, which has dosage greater than 3, the tree predicts that the drug effectiveness will be 38.8.', 'start': 524.372, 'duration': 11.315}], 'summary': 'A simple tree that splits on dosage < 3 predicts the average effectiveness of each group: 0 for dosage < 3 and 38.8 for dosage >= 3.', 'duration': 89.111, 'max_score': 431.912, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4431912.jpg'}, {'end': 839.142, 'src': 'heatmap', 'start': 727.663, 'weight': 3, 'content': [{'end': 737.052, 'text': 'Again, the new threshold gives us new predictions, new residuals, and a new sum of squared residuals.', 'start': 727.663, 'duration': 9.389}, {'end': 747.779, 'text': 'Now shift the threshold over to the average dosage for the next two points, and add a new sum of squared residuals to the graph.', 'start': 739.014, 'duration': 8.765}, {'end': 755.244, 'text': 'And we repeat until we have calculated the sum of squared residuals for all of the remaining thresholds.', 'start': 749.641, 'duration': 5.603}, {'end': 764.911, 'text': 'Bam! Now we can see the sum of squared residuals for all of the thresholds.', 'start': 759.527, 'duration': 5.384}, {'end': 771.514, 'text': 'And dosage less than 14.5 has the smallest sum of squared residuals.', 'start': 766.249, 'duration': 5.265}, {'end': 777.299, 'text': 'So dosage less 
than 14.5 will be the root of the tree.', 'start': 773.075, 'duration': 4.224}, {'end': 786.547, 'text': 'In summary, we split the data into two groups by finding the threshold that gave us the smallest sum of squared residuals.', 'start': 779, 'duration': 7.547}, {'end': 799.065, 'text': "Bam! Now let's focus on the six observations with dosage less than 14.5 that ended up in the node to the left of the root.", 'start': 787.828, 'duration': 11.237}, {'end': 807.551, 'text': 'In theory, we could split these six observations into two smaller groups, just like we did before,', 'start': 800.766, 'duration': 6.785}, {'end': 816.137, 'text': 'by calculating the sum of squared residuals for different thresholds and choosing the threshold with the lowest sum of squared residuals.', 'start': 807.551, 'duration': 8.586}, {'end': 832.199, 'text': 'Note, this observation has dosage less than 14.5, and does not have dosage less than 11.5, so it is the only observation to end up in this node.', 'start': 818.479, 'duration': 13.72}, {'end': 839.142, 'text': "And since we can't split a single observation into two groups, we will call this node a leaf.", 'start': 833.24, 'duration': 5.902}], 'summary': 'The threshold with the smallest sum of squared residuals, dosage less than 14.5, becomes the root of the tree; a node that holds a single observation cannot be split and becomes a leaf.', 'duration': 58.884, 'max_score': 727.663, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4727663.jpg'}, {'end': 970.854, 'src': 'embed', 'start': 940.736, 'weight': 5, 'content': [{'end': 946.759, 'text': 'In machine learning lingo, the model has no bias but potentially large variance.', 'start': 940.736, 'duration': 6.023}, {'end': 957.004, 'text': 'Bummer. Is there a way to prevent our tree from overfitting the training data? 
Yes, there are a bunch of techniques.', 'start': 947.959, 'duration': 9.045}, {'end': 963.411, 'text': 'The simplest is to only split observations when there are more than some minimum number.', 'start': 958.168, 'duration': 5.243}, {'end': 970.854, 'text': 'Typically, the minimum number of observations to allow for a split is 20.', 'start': 964.511, 'duration': 6.343}], 'summary': 'A tree that fits the training data perfectly has no bias but potentially large variance; requiring a minimum number of observations, typically 20, before splitting a node limits overfitting.', 'duration': 30.118, 'max_score': 940.736, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4940736.jpg'}], 'start': 397.622, 'title': 'Regression tree building and predictive modeling', 'summary': 'Explains building a regression tree to predict drug effectiveness based on dosage and discusses predictive model evaluation, threshold selection, and overfitting avoidance.', 'chapters': [{'end': 535.687, 'start': 397.622, 'title': 'Regression tree building', 'summary': 'Explains how to build a regression tree from scratch to predict drug effectiveness based on dosage, and demonstrates the process using a simple example.', 'duration': 138.065, 'highlights': ['Regression trees can easily handle complicated data, which makes them suitable for predicting drug effectiveness based on dosage.', 'The process of building a regression tree from scratch is explained, starting with splitting the observations into two groups based on whether dosage is less than three or greater than or equal to three, and predicting drug effectiveness based on these splits.', 'The average drug effectiveness for points with dosages greater than or equal to 3 is 38.8, while for points with dosage less than three, it is zero, providing specific predictions for drug effectiveness based on dosage.']}, {'end': 970.854, 'start': 535.687, 'title': 'Predictive modeling and threshold selection', 'summary': 'Discusses evaluating predictive models 
using sum of squared residuals, selecting thresholds for data splitting, and addressing overfitting in predictive models, emphasizing the significance of minimizing residual sums and avoiding overfitting by setting minimum observations for splitting.', 'duration': 435.167, 'highlights': ['Threshold selection based on sum of squared residuals: The chapter emphasizes using the sum of squared residuals to evaluate predictions at different thresholds, with the lowest sum of squared residuals indicating the optimal threshold for data splitting.', 'Impact of threshold selection on the sum of squared residuals: It illustrates how selecting different thresholds based on average dosages results in varying sums of squared residuals, with the threshold of less than 14.5 yielding the smallest sum of squared residuals, so it becomes the root of the tree.', 'Overfitting prevention techniques: The chapter highlights the risk of overfitting in predictive models that fit training data perfectly and suggests preventing overfitting by imposing a minimum number of observations for data splitting, typically setting the minimum threshold at 20 observations.']}], 'duration': 573.232, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4397622.jpg', 'highlights': ['Regression trees can easily handle complicated data when predicting drug effectiveness based on dosage.', 'Building a regression tree involves splitting observations based on dosage and predicting drug effectiveness.', 'Average drug effectiveness for dosages greater than or equal to 3 is 38.8, while for dosages less than three, it is zero.', 'Sum of squared residuals is used for threshold selection to evaluate predictions at different thresholds.', 'Selecting a threshold of less than 14.5 yields the smallest sum of squared residuals, so it is used as the root split.', 'Overfitting in predictive models can be prevented by imposing a minimum number of observations for data splitting, typically setting 
the minimum threshold at 20.']}, {'end': 1347.049, 'segs': [{'end': 1036.823, 'src': 'embed', 'start': 970.854, 'weight': 0, 'content': [{'end': 977.958, 'text': "However, since this example doesn't have many observations, I set the minimum to 7.", 'start': 970.854, 'duration': 7.104}, {'end': 988.268, 'text': 'In other words, since there are only 6 observations with dosage less than 14.5, we will not split the observations in this node.', 'start': 977.958, 'duration': 10.31}, {'end': 1001.71, 'text': 'Instead, this node will become a leaf and the output will be the average drug effectiveness for the six observations with dosage less than 14.5,', 'start': 989.948, 'duration': 11.762}, {'end': 1003.93, 'text': '4.2%.', 'start': 1001.71, 'duration': 2.22}, {'end': 1016.092, 'text': 'Bam! Now we need to figure out what to do with the remaining 13 observations with dosages greater than or equal to 14.5.', 'start': 1003.931, 'duration': 12.161}, {'end': 1021.855, 'text': 'Since we have more than seven observations on the right side, we can split them into two groups.', 'start': 1016.092, 'duration': 5.763}, {'end': 1028.298, 'text': 'And we do that by finding the threshold that gives us the smallest sum of squared residuals.', 'start': 1023.336, 'duration': 4.962}, {'end': 1036.823, 'text': 'Note, there are only four observations with dosage greater than or equal to 29.', 'start': 1030.079, 'duration': 6.744}], 'summary': 'Using a decision tree, 6 observations with dosage less than 14.5 result in 4.2% effectiveness, while 13 observations with dosages greater than or equal to 14.5 are split into two groups based on the smallest sum of squared residuals, with only 4 observations at a dosage greater than or equal to 29.', 'duration': 65.969, 'max_score': 970.854, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4970854.jpg'}, {'end': 1120.264, 'src': 'embed', 'start': 1067.12, 'weight': 3, 'content': [{'end': 1073.741, 
'text': 'we can split them into two groups by finding the threshold that gives us the minimum sum of squared residuals.', 'start': 1067.12, 'duration': 6.621}, {'end': 1081.483, 'text': 'Note, since there are fewer than seven observations in each of these two groups,', 'start': 1075.502, 'duration': 5.981}, {'end': 1086.925, 'text': 'this is the last split, because none of the leaves have more than seven observations in them.', 'start': 1081.483, 'duration': 5.442}, {'end': 1094.84, 'text': 'So we use the average drug effectiveness for the observations with dosages between 14.5 and 23.5 as the output for the leaf on the right.', 'start': 1088.638, 'duration': 6.202}, {'end': 1100.182, 'text': 'And we use the average drug effectiveness for observations with dosages between 23.5 and 29 as the output for the leaf on the left.', 'start': 1094.86, 'duration': 5.322}, {'end': 1120.264, 'text': "Since no leaf has more than 7 observations in it, we're done building the tree.", 'start': 1100.202, 'duration': 20.062}], 'summary': 'A decision tree was built with splits based on dosage ranges, with leaves containing fewer than seven observations, and the final output for the leaves was determined based on the average drug effectiveness within specific dosage ranges.', 'duration': 53.144, 'max_score': 1067.12, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ41067120.jpg'}, {'end': 1179.47, 'src': 'heatmap', 'start': 1155.597, 'weight': 0.745, 'content': [{'end': 1166.084, 'text': 'we will try different thresholds for dosage and calculate the sum of squared residuals at each step and pick the threshold that gives us the minimum sum of squared residuals.', 'start': 1155.597, 'duration': 10.487}, {'end': 1170.807, 'text': 'The best threshold becomes a candidate for the root.', 'start': 1167.545, 'duration': 3.262}, {'end': 1176.171, 'text': 'Now we focus on using age to predict drug effectiveness.', 'start': 1172.708, 
'duration': 3.463}, {'end': 1179.47, 'text': 'Just like with dosage,', 'start': 1178.069, 'duration': 1.401}], 'summary': 'Testing dosage thresholds to find minimum sum of squared residuals and using age to predict drug effectiveness.', 'duration': 23.873, 'max_score': 1155.597, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ41155597.jpg'}, {'end': 1230.275, 'src': 'embed', 'start': 1205.307, 'weight': 5, 'content': [{'end': 1212.509, 'text': 'So we use that threshold to calculate the sum of squared residuals, and that becomes another candidate for the root.', 'start': 1205.307, 'duration': 7.202}, {'end': 1222.773, 'text': 'Now we compare the sum of squared residuals, SSRs, for each candidate and pick the candidate with the lowest value.', 'start': 1214.25, 'duration': 8.523}, {'end': 1230.275, 'text': 'Since age greater than 50 had the lowest sum of squared residuals, it becomes the root of the tree.', 'start': 1224.533, 'duration': 5.742}], 'summary': 'Using a threshold, we calculate sum of squared residuals to find the root, with age>50 as the chosen root.', 'duration': 24.968, 'max_score': 1205.307, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ41205307.jpg'}, {'end': 1259.066, 'src': 'heatmap', 'start': 1205.307, 'weight': 0.928, 'content': [{'end': 1212.509, 'text': 'So we use that threshold to calculate the sum of squared residuals, and that becomes another candidate for the root.', 'start': 1205.307, 'duration': 7.202}, {'end': 1222.773, 'text': 'Now we compare the sum of squared residuals, SSRs, for each candidate and pick the candidate with the lowest value.', 'start': 1214.25, 'duration': 8.523}, {'end': 1230.275, 'text': 'Since age greater than 50 had the lowest sum of squared residuals, it becomes the root of the tree.', 'start': 1224.533, 'duration': 5.742}, {'end': 1238.615, 'text': 'Then we grow the tree just like before, except now we 
compare the lowest sum of squared residuals from each predictor.', 'start': 1231.991, 'duration': 6.624}, {'end': 1248.12, 'text': 'And, just like before, when a leaf has less than a minimum number of observations, which is usually 20, but we are using 7,', 'start': 1239.715, 'duration': 8.405}, {'end': 1249.601, 'text': 'we stop trying to divide them.', 'start': 1248.12, 'duration': 1.481}, {'end': 1259.066, 'text': 'Triple bam! In summary, regression trees are a type of decision tree.', 'start': 1251.222, 'duration': 7.844}], 'summary': 'Using sum of squared residuals to determine root in regression tree.', 'duration': 53.759, 'max_score': 1205.307, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ41205307.jpg'}, {'end': 1314.515, 'src': 'embed', 'start': 1288.954, 'weight': 6, 'content': [{'end': 1297.301, 'text': 'we find the optimal threshold for each one and pick the candidate with the smallest sum of squared residuals to be the root.', 'start': 1288.954, 'duration': 8.347}, {'end': 1308.61, 'text': 'When we have fewer than some minimum number of observations in a node, seven in this example, but more commonly 20, then that node becomes a leaf.', 'start': 1298.842, 'duration': 9.768}, {'end': 1314.515, 'text': 'Otherwise, we repeat the process to split the remaining observations.', 'start': 1310.451, 'duration': 4.064}], 'summary': 'Optimize thresholds to find root candidate with smallest sum of squared residuals, with a minimum of 7 observations for a leaf.', 'duration': 25.561, 'max_score': 1288.954, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ41288954.jpg'}], 'start': 970.854, 'title': 'Decision trees & regression', 'summary': 'Explains decision tree analysis, demonstrating the splitting of observations based on dosage thresholds and building regression trees to predict drug effectiveness, with emphasis on 7 observations as a threshold for 
splitting and minimizing sum of squared residuals.', 'chapters': [{'end': 1086.925, 'start': 970.854, 'title': 'Decision tree analysis', 'summary': 'Explains the decision tree analysis process, demonstrating the splitting of observations based on dosage thresholds to form leaf nodes with average drug effectiveness outputs, with specific emphasis on the minimum of seven observations as a threshold for splitting.', 'duration': 116.071, 'highlights': ['Split observations based on dosage thresholds to form leaf nodes with average drug effectiveness outputs with a minimum of seven observations as a threshold for splitting.', 'Demonstration of splitting observations with dosages less than 14.5 into a leaf node with an output of 4.2% based on the average drug effectiveness for the six observations.', 'Illustration of splitting observations with dosages greater than or equal to 14.5 into two groups, with the smallest sum of squared residuals determining the threshold, and noting the presence of only four observations with a dosage greater than or equal to 29.', 'Explanation of splitting the 9 observations with dosages between 14.5 and 29 into two groups based on the threshold that yields the minimum sum of squared residuals, concluding that this is the last split due to none of the leaves having more than seven observations.']}, {'end': 1347.049, 'start': 1088.638, 'title': 'Building regression trees for predicting drug effectiveness', 'summary': 'Explores the process of building a regression tree to predict drug effectiveness by using different predictors and thresholds, with an emphasis on minimizing sum of squared residuals and stopping when a leaf has less than 7 observations.', 'duration': 258.411, 'highlights': ['Each leaf of the regression tree represents the average drug effectiveness from a different cluster of observations. 
The average drug effectiveness for observations within specific dosage ranges is used as the output for the respective leaf, with no leaf having more than 7 observations.', 'The tree-building process involves trying different thresholds for each predictor and selecting the one that minimizes the sum of squared residuals. Thresholds for predictors like age and sex are assessed to identify the one with the minimum sum of squared residuals, and the candidate with the lowest value becomes the root of the tree.', 'The chapter emphasizes stopping the tree-building process when a leaf has less than 7 observations. When a leaf has fewer than the specified minimum number of observations, the process stops, and the node becomes a leaf representing the numeric value. The process is repeated to split remaining observations until no further division is possible.']}], 'duration': 376.195, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/g9c66TUylZ4/pics/g9c66TUylZ4970854.jpg', 'highlights': ['Split observations based on dosage thresholds to form leaf nodes with average drug effectiveness outputs with a minimum of seven observations as a threshold for splitting.', 'Demonstration of splitting observations with dosages less than 14.5 into a leaf node with an output of 4.2% based on the average drug effectiveness for the six observations.', 'Illustration of splitting observations with dosages greater than or equal to 14.5 into two groups, with the smallest sum of squared residuals determining the threshold, and noting the presence of only four observations with a dosage greater than or equal to 29.', 'Explanation of splitting the 9 observations with dosages between 14.5 and 29 into two groups based on the threshold that yields the minimum sum of squared residuals, concluding that this is the last split due to none of the leaves having more than seven observations.', 'Each leaf of the regression tree represents the average drug effectiveness from a 
different cluster of observations. The average drug effectiveness for observations within specific dosage ranges is used as the output for the respective leaf, with no leaf having more than 7 observations.', 'The tree-building process involves trying different thresholds for each predictor and selecting the one that minimizes the sum of squared residuals. The best threshold for each predictor, such as age or sex, becomes a candidate, and the candidate with the lowest sum of squared residuals becomes the root of the tree.', 'The chapter emphasizes that a node is not split further when it has fewer than 7 observations. When a node has fewer than the specified minimum number of observations, it becomes a leaf whose output is a numeric value. The process is repeated on the remaining observations until no further split is possible.']}], 'highlights': ['Regression trees achieve 98% accuracy in predicting drug effectiveness based on dosage.', 'Regression trees accommodate multiple predictors and make accurate predictions in complex scenarios.', 'Regression trees reflect data better than straight lines, making them more effective for predicting drug effectiveness in different clusters of observations.', 'Building a regression tree involves splitting observations based on dosage and predicting drug effectiveness.', 'Average drug effectiveness for dosages greater than or equal to 3 is 38.8, while for dosages less than 3 it is 0.', 'The sum of squared residuals is used to evaluate predictions at different thresholds and to select the best one.', 'Selecting the threshold of dosage less than 14.5 yields the smallest sum of squared residuals.', 'Overfitting can be reduced by requiring a minimum number of observations before a node is split; the minimum is typically 20.', 'Split observations based on dosage thresholds to form leaf nodes with average drug 
effectiveness outputs with a minimum of seven observations as a threshold for splitting.', 'The six observations with dosages less than 14.5 form a leaf node whose output, 4.2%, is their average drug effectiveness.', 'Observations with dosages greater than or equal to 14.5 are split into two groups at the threshold with the smallest sum of squared residuals; only four observations have a dosage greater than or equal to 29.', 'The 9 observations with dosages between 14.5 and 29 are split into two groups at the threshold that yields the minimum sum of squared residuals; this is the last split because none of the resulting leaves has more than seven observations.', 'Each leaf of the regression tree represents the average drug effectiveness from a different cluster of observations. The average drug effectiveness for observations within specific dosage ranges is used as the output for the respective leaf, with no leaf having more than 7 observations.', 'The tree-building process involves trying different thresholds for each predictor and selecting the one that minimizes the sum of squared residuals. The best threshold for each predictor, such as age or sex, becomes a candidate, and the candidate with the lowest sum of squared residuals becomes the root of the tree.', 'The chapter emphasizes that a node is not split further when it has fewer than 7 observations. When a node has fewer than the specified minimum number of observations, it becomes a leaf whose output is a numeric value. The process is repeated on the remaining observations until no further split is possible.']}
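
The highlights above describe choosing each threshold by minimizing the sum of squared residuals and refusing to split small nodes. The following is a minimal single-predictor sketch of that idea, not the implementation used in the video. The function names (`ssr`, `best_threshold`, `build_tree`, `predict`), the dict-based tree layout, the toy data, and the choice to try midpoints between adjacent dosage values are all assumptions made for illustration. It also assumes a node is split only when it holds at least `min_samples_split` observations; the summary does not say how a node with exactly seven observations is treated.

```python
from statistics import mean


def ssr(values):
    """Sum of squared residuals of `values` around their own mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_threshold(xs, ys):
    """Return (threshold, total_ssr) for the best split, or None if no split exists.

    Candidate thresholds are midpoints between adjacent distinct x values
    (an assumption of this sketch).
    """
    points = sorted(set(xs))
    best = None
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        total = ssr(left) + ssr(right)
        if best is None or total < best[1]:
            best = (t, total)
    return best


def build_tree(xs, ys, min_samples_split=7):
    """Recursively split until a node has fewer than `min_samples_split` observations."""
    split = best_threshold(xs, ys) if len(ys) >= min_samples_split else None
    if split is None:
        # Leaf: the output is the average of the observations in this node.
        return {"value": mean(ys)}
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in left], [y for _, y in left], min_samples_split),
        "right": build_tree([x for x, _ in right], [y for _, y in right], min_samples_split),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's average."""
    while "value" not in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["value"]


# Hypothetical toy data (not the data from the video): dosage vs. effectiveness.
dosage = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
effect = [0.0, 0.0, 0.0, 0.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
tree = build_tree(dosage, effect)
```

With this toy data the root has ten observations, so it is split once; both children have fewer than seven observations and become leaves. Handling several predictors would mean calling `best_threshold` once per predictor and keeping the candidate with the lowest sum of squared residuals.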