title

Linear Regression, Clearly Explained!!!

description

The concepts behind linear regression, fitting a line to data with least squares and R-squared, are pretty darn simple, so let's get down to it! NOTE: This StatQuest comes with a companion video for how to do linear regression in R: https://youtu.be/u1cc1r_Y7M0
You can also find example code at the StatQuest github: https://github.com/StatQuest/linear_regression_demo
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
0:37 The Main Ideas!!!
1:12 Review of fitting a line to data
4:00 Review of R-squared
12:13 R-squared for a multivariable model
14:16 Why adding variables will never reduce R-squared
16:08 Calculating a p-value for R-squared
25:26 The F-distribution
Correction:
25:39 I should have (Pfit - Pmean) instead of the other way around.
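For reference, the corrected formula can be written out as below. This is a sketch built from the quantities named in the video (SS(mean), SS(fit), Pfit, Pmean); the symbol n for the number of data points is an assumption added here for illustration.

```latex
R^2 = \frac{SS(\text{mean}) - SS(\text{fit})}{SS(\text{mean})}
\qquad
F = \frac{\bigl(SS(\text{mean}) - SS(\text{fit})\bigr) / (p_{\text{fit}} - p_{\text{mean}})}
         {SS(\text{fit}) / (n - p_{\text{fit}})}
```

With a straight line, p_fit = 2 (intercept and slope) and p_mean = 1, so the numerator's degrees of freedom are 2 - 1 = 1.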
#statquest #regression
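The three main ideas (fit a line with least squares, calculate R-squared, calculate a p-value for R-squared) can be sketched in a few lines of Python. This is a minimal illustration, not the code from the companion R video or the StatQuest GitHub repository. The mouse weights and sizes are made-up numbers, the function names are hypothetical, and "random data" is assumed to mean sizes drawn from a standard normal distribution independently of weight.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared_and_f(x, y):
    """Return (R-squared, F) for a straight-line fit of y on x."""
    n = len(x)
    mean_y = sum(y) / n
    intercept, slope = fit_line(x, y)
    # Sum of squares around the mean and around the least-squares fit.
    ss_mean = sum((yi - mean_y) ** 2 for yi in y)
    ss_fit = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    p_fit, p_mean = 2, 1  # line: intercept + slope; mean: one parameter
    r2 = (ss_mean - ss_fit) / ss_mean
    # Note: a perfect fit (ss_fit == 0) would divide by zero here.
    f = ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))
    return r2, f


def random_data_p_value(x, y, trials=2000, seed=0):
    """Fraction of random datasets whose F is at least as large as ours."""
    rng = random.Random(seed)
    _, f_observed = r_squared_and_f(x, y)
    extreme = 0
    for _ in range(trials):
        # Assumption: random sizes that have nothing to do with weight.
        random_y = [rng.gauss(0.0, 1.0) for _ in x]
        _, f_random = r_squared_and_f(x, random_y)
        if f_random >= f_observed:
            extreme += 1
    return extreme / trials


# Hypothetical mouse data, invented for this example.
weight = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
size = [1.4, 1.9, 3.5, 3.1, 5.2, 4.8, 6.9, 7.1]

r2, f_value = r_squared_and_f(weight, size)
p_value = random_data_p_value(weight, size)
print(f"R-squared = {r2:.3f}, F = {f_value:.2f}, p-value ~ {p_value:.4f}")
```

In practice, as the video notes, the p-value is read from an F-distribution rather than from thousands of random datasets; the simulation above is only meant to show where that distribution comes from.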

detail

{'title': 'Linear Regression, Clearly Explained!!!', 'heatmap': [{'end': 604.127, 'start': 570.943, 'weight': 1}], 'summary': 'Explains linear regression, covering least squares fitting, r-squared, p-value calculation and examples using mouse size and weight, demonstrating the predictive power and significance testing with an emphasis on understanding r-squared and p-value in data analysis.', 'chapters': [{'end': 105.414, 'segs': [{'end': 63.047, 'src': 'embed', 'start': 38.319, 'weight': 0, 'content': [{'end': 44.663, 'text': 'I promise you, I have lots and lots of slides that talk about all the nitty-gritty details behind linear regression.', 'start': 38.319, 'duration': 6.344}, {'end': 48.386, 'text': "But first, let's talk about the main ideas behind it.", 'start': 45.224, 'duration': 3.162}, {'end': 54.59, 'text': 'The first thing you do in linear regression is use least squares to fit a line to the data.', 'start': 49.386, 'duration': 5.204}, {'end': 58.726, 'text': 'The second thing you do is calculate R squared.', 'start': 55.865, 'duration': 2.861}, {'end': 63.047, 'text': 'Lastly, calculate a p-value for R squared.', 'start': 59.886, 'duration': 3.161}], 'summary': 'Linear regression: least squares fit, r squared, p-value calculation.', 'duration': 24.728, 'max_score': 38.319, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo38319.jpg'}, {'end': 112.018, 'src': 'embed', 'start': 84.042, 'weight': 2, 'content': [{'end': 91.286, 'text': "I'm going to introduce some new terminology in this part of the video, so it's worth watching even if you've already seen the earlier StatQuest.", 'start': 84.042, 'duration': 7.244}, {'end': 96.049, 'text': 'That said, if you need more details, check that StatQuest out.', 'start': 92.087, 'duration': 3.962}, {'end': 105.414, 'text': "For this review, we're going to be talking about a dataset where we took a bunch of mice and we measured their size and we measured 
their weight.", 'start': 97.289, 'duration': 8.125}, {'end': 112.018, 'text': 'Our goal is to use mouse weight as a way to predict mouse size.', 'start': 106.695, 'duration': 5.323}], 'summary': 'Introducing new terminology for predicting mouse size using weight.', 'duration': 27.976, 'max_score': 84.042, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo84042.jpg'}], 'start': 13.645, 'title': 'Linear regression', 'summary': "Covers main concepts of linear regression, including least squares fitting, r squared calculation, and p-value calculation, with a mention of the dataset involving measurements of mice's size and weight.", 'chapters': [{'end': 105.414, 'start': 13.645, 'title': 'Understanding linear regression', 'summary': "Discusses the main concepts of linear regression, including least squares fitting, r squared calculation, and p-value calculation, with a mention of the dataset involving measurements of mice's size and weight.", 'duration': 91.769, 'highlights': ['The first step in linear regression involves using least squares to fit a line to the data, followed by the calculation of R squared and a p-value for R squared.', 'The chapter emphasizes the importance of understanding the main ideas behind linear regression, including least squares fitting, R squared calculation, and p-value calculation.', "It introduces new terminology and discusses a dataset involving the measurements of mice's size and weight."]}], 'duration': 91.769, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo13645.jpg', 'highlights': ['The first step in linear regression involves using least squares to fit a line to the data, followed by the calculation of R squared and a p-value for R squared.', 'The chapter emphasizes the importance of understanding the main ideas behind linear regression, including least squares fitting, R squared calculation, and p-value calculation.', 
"It introduces new terminology and discusses a dataset involving the measurements of mice's size and weight."]}, {'end': 634.03, 'segs': [{'end': 221.471, 'src': 'embed', 'start': 166.457, 'weight': 1, 'content': [{'end': 174.178, 'text': 'So in this graph, we have the sum of squared residuals on the y-axis, and the different rotations on the x-axis.', 'start': 166.457, 'duration': 7.721}, {'end': 179.682, 'text': 'Lastly, you find the rotation that has the least sum of squares.', 'start': 175.299, 'duration': 4.383}, {'end': 187.888, 'text': 'More details about how this is actually done in practice are provided in the StatQuest on fitting a line to data.', 'start': 180.863, 'duration': 7.025}, {'end': 196.774, 'text': 'So, we see that this rotation is the one with the least squares, so it will be the one to fit to the data.', 'start': 189.149, 'duration': 7.625}, {'end': 203.277, 'text': 'This is our least squares rotation superimposed on the original data.', 'start': 198.395, 'duration': 4.882}, {'end': 210.022, 'text': 'Bam! Now we know why the method for fitting a line is called least squares.', 'start': 204.398, 'duration': 5.624}, {'end': 213.285, 'text': 'Now we have fit a line to the data.', 'start': 211.684, 'duration': 1.601}, {'end': 217.728, 'text': "This is awesome! 
Here's the equation for the line.", 'start': 213.905, 'duration': 3.823}, {'end': 221.471, 'text': 'Least squares estimated two parameters.', 'start': 218.889, 'duration': 2.582}], 'summary': 'Fitting a line using least squares minimizes sum of squared residuals, leading to a line equation and two parameters estimated.', 'duration': 55.014, 'max_score': 166.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo166457.jpg'}, {'end': 373.375, 'src': 'embed', 'start': 349.875, 'weight': 0, 'content': [{'end': 356.417, 'text': 'Now go back to the original plot and sum up the squared residuals around our least squares fit.', 'start': 349.875, 'duration': 6.542}, {'end': 363.6, 'text': "We'll call this ssfit for the sum of squares around the least squares fit.", 'start': 357.818, 'duration': 5.782}, {'end': 373.375, 'text': 'The sum of squares around the least squares fit is the sum of the distances between the data and the line squared.', 'start': 364.801, 'duration': 8.574}], 'summary': 'Summarize squared residuals around least squares fit as ssfit.', 'duration': 23.5, 'max_score': 349.875, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo349875.jpg'}, {'end': 523.048, 'src': 'embed', 'start': 457.912, 'weight': 3, 'content': [{'end': 466.098, 'text': 'R squared tells us how much of the variation in mouse size can be explained by taking mouse weight into account.', 'start': 457.912, 'duration': 8.186}, {'end': 469.24, 'text': 'This is the formula for R squared.', 'start': 467.239, 'duration': 2.001}, {'end': 476.946, 'text': "It's the variation around the mean minus the variation around the fit divided by the variation around the mean.", 'start': 469.901, 'duration': 7.045}, {'end': 479.948, 'text': "Let's look at an example.", 'start': 478.367, 'duration': 1.581}, {'end': 490.713, 'text': 'In this example, The variation around the mean equals 11.1, and the 
variation around the fit equals 4.4.', 'start': 480.928, 'duration': 9.785}, {'end': 492.654, 'text': 'So we plug those numbers into the equation.', 'start': 490.713, 'duration': 1.941}, {'end': 500.938, 'text': 'The result is that r squared equals 0.6, which is the same thing as saying 60%.', 'start': 493.594, 'duration': 7.344}, {'end': 509.322, 'text': 'This means there is a 60% reduction in the variance when we take the mouse weight into account.', 'start': 500.938, 'duration': 8.384}, {'end': 518.126, 'text': 'Alternatively, we can say that mouse weight explains 60% of the variation in mouse size.', 'start': 510.703, 'duration': 7.423}, {'end': 523.048, 'text': 'We can also use the sum of squares to make the same calculation.', 'start': 519.366, 'duration': 3.682}], 'summary': 'R squared measures 60% reduction in variance by considering mouse weight, explaining 60% of the variation in mouse size.', 'duration': 65.136, 'max_score': 457.912, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo457912.jpg'}, {'end': 604.127, 'src': 'heatmap', 'start': 570.943, 'weight': 5, 'content': [{'end': 577.125, 'text': 'In this case, knowing mouse weight means you can make a perfect prediction of mouse size.', 'start': 570.943, 'duration': 6.182}, {'end': 584.287, 'text': 'The variation around the mean is the same as it was before, 11.1.', 'start': 578.645, 'duration': 5.642}, {'end': 589.309, 'text': 'But now the variation around the fitted line equals zero, because there are no residuals.', 'start': 584.287, 'duration': 5.022}, {'end': 595.564, 'text': 'Plugging the numbers in, gives us an r-squared equal to 1, which equals 100%.', 'start': 590.35, 'duration': 5.214}, {'end': 604.127, 'text': 'In this case, mouse weight explains 100% of the variation in mouse size.', 'start': 595.564, 'duration': 8.563}], 'summary': 'Mouse weight explains 100% of mouse size variation.', 'duration': 33.184, 'max_score': 570.943, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo570943.jpg'}], 'start': 106.695, 'title': 'Predicting mouse size with least squares and understanding r-squared in regression', 'summary': 'Discusses using mouse weight to predict mouse size through the least squares method and explains the concept of r-squared in regression, demonstrating its quantification of predictive power with examples.', 'chapters': [{'end': 221.471, 'start': 106.695, 'title': 'Predicting mouse size with least squares', 'summary': 'Discusses using mouse weight to predict mouse size through the least squares method, involving rotating a line to minimize the sum of squared residuals, ultimately fitting a line to the data and obtaining the equation with two estimated parameters.', 'duration': 114.776, 'highlights': ['The least squares method involves rotating a line to minimize the sum of squared residuals, ultimately fitting a line to the data.', 'The equation for the line is obtained through the least squares method, resulting in two estimated parameters.', 'The sum of squared residuals is plotted against different rotations to find the rotation with the least sum of squares.']}, {'end': 634.03, 'start': 223.173, 'title': 'Understanding r-squared in regression', 'summary': 'Explains the concept of r-squared in regression, demonstrating how it quantifies the predictive power of a model by showing how much variation in the dependent variable can be explained by the independent variable, with examples illustrating the calculation and interpretation of r-squared values, including a case where the independent variable has no predictive power.', 'duration': 410.857, 'highlights': ['The formula for R squared is the variation around the mean minus the variation around the fit divided by the variation around the mean, providing a quantifiable measure of the predictive power of the model.', 'An example demonstrates that 60% of the variation in mouse 
size can be explained by mouse weight, with the r-squared value calculated as 0.6.', 'In another example, when knowing mouse weight means a perfect prediction of mouse size, the r-squared value is 1, indicating that 100% of the variation in mouse size can be explained by mouse weight.', 'A scenario is presented where knowing mouse weight does not help in predicting mouse size, resulting in an r-squared value of 0, indicating that the independent variable has no predictive power.']}], 'duration': 527.335, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo106695.jpg', 'highlights': ['The least squares method fits a line to data by minimizing the sum of squared residuals.', 'The equation for the line is obtained through the least squares method, resulting in two estimated parameters.', 'The sum of squared residuals is plotted against rotation to find the rotation with the least sum of squares.', 'R squared quantifies the predictive power of the model by comparing the variation around the mean with the variation around the fit.', 'An example demonstrates 60% of the variation in mouse size explained by weight, with r-squared value 0.6.', 'When mouse weight perfectly predicts size, r-squared value is 1, indicating 100% explained variation.', 'When knowing mouse weight does not help predict size, the r-squared value is 0.']}, {'end': 972.526, 'segs': [{'end': 723.386, 'src': 'embed', 'start': 696.764, 'weight': 1, 'content': [{'end': 705.587, 'text': 'This gave us an r-squared of 60%, meaning 60% of the variation in mouse size could be explained by mouse weight.', 'start': 696.764, 'duration': 8.823}, {'end': 710.769, 'text': 'But the concept applies to any equation, no matter how complicated.', 'start': 706.667, 'duration': 4.102}, {'end': 716.664, 'text': 'First, you measure, square, and sum the distance from the data to the mean.', 'start': 711.943, 'duration': 4.721}, {'end': 723.386, 'text': 'Then measure, square, and sum the distance from the data to the complicated equation.', 
'start': 717.604, 'duration': 5.782}], 'summary': 'R-squared of 60% explains mouse size by weight.', 'duration': 26.622, 'max_score': 696.764, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo696764.jpg'}, {'end': 816.287, 'src': 'embed', 'start': 751.386, 'weight': 0, 'content': [{'end': 756.389, 'text': 'We want to know how well weight and tail length predict body length.', 'start': 751.386, 'duration': 5.003}, {'end': 767.129, 'text': 'The first mouse we measured had weight equals 2.1, Tail length equals 1.3 and body length equals 2.5.', 'start': 757.45, 'duration': 9.679}, {'end': 772.572, 'text': "So that's how we plot this data on this 3D graph.", 'start': 767.129, 'duration': 5.443}, {'end': 775.674, 'text': "Here's all the data in the graph.", 'start': 773.893, 'duration': 1.781}, {'end': 782.278, 'text': 'The larger circles are points that are closer to us and represent mice that have shorter tails.', 'start': 776.374, 'duration': 5.904}, {'end': 789.182, 'text': 'The smaller circles are points that are further from us and represent mice with longer tails.', 'start': 783.278, 'duration': 5.904}, {'end': 792.652, 'text': 'Now we do a least squares fit.', 'start': 790.751, 'duration': 1.901}, {'end': 801.218, 'text': 'Since we have the extra term in the equation representing an extra dimension, we fit a plane instead of a line.', 'start': 793.493, 'duration': 7.725}, {'end': 804.34, 'text': "Here's the equation for the plane.", 'start': 802.399, 'duration': 1.941}, {'end': 808.382, 'text': 'The y value represents body length.', 'start': 805.721, 'duration': 2.661}, {'end': 813.045, 'text': 'Least squares estimates three different parameters.', 'start': 809.743, 'duration': 3.302}, {'end': 816.287, 'text': 'The first is the y-intercept.', 'start': 814.246, 'duration': 2.041}], 'summary': "Examining weight and tail length's prediction of body length via 3d graph and plane equation.", 'duration': 64.901, 
'max_score': 751.386, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo751386.jpg'}, {'end': 972.526, 'src': 'embed', 'start': 943.397, 'weight': 2, 'content': [{'end': 945.938, 'text': "Here's the frowny face of sad times.", 'start': 943.397, 'duration': 2.541}, {'end': 950.96, 'text': 'The more silly parameters we add to the equation,', 'start': 947.698, 'duration': 3.262}, {'end': 958.223, 'text': 'the more opportunities we have for random events to reduce sum of squares fit and result in a better r-squared.', 'start': 950.96, 'duration': 7.263}, {'end': 967.583, 'text': 'Thus, people report an adjusted R-squared value that, in essence, scales R-squared by the number of parameters.', 'start': 959.198, 'duration': 8.385}, {'end': 972.526, 'text': "R-squared is awesome, but it's missing something.", 'start': 969.184, 'duration': 3.342}], 'summary': 'Adding parameters increases random events, affecting r-squared fit.', 'duration': 29.129, 'max_score': 943.397, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo943397.jpg'}], 'start': 635.171, 'title': 'R-squared in data analysis', 'summary': 'Explains the concept of r-squared, its calculation, and its role in indicating the explanatory power of a model, using the example of predicting mouse body length from weight and tail length, with an adjusted r-squared to account for parameters.', 'chapters': [{'end': 749.865, 'start': 635.171, 'title': 'Understanding r-squared in data analysis', 'summary': "Explains the concept of r-squared in data analysis, using examples to illustrate how the variation around the mean and fit can be calculated, and how r-squared can indicate the explanatory power of a model, such as in the case of mouse weight and tail length predicting the length of a mouse's body.", 'duration': 114.694, 'highlights': ["The concept of r-squared is illustrated using examples to show variation around 
the mean and fit, with a 0% r-squared indicating that mouse weight doesn't explain any of the variation around the mean, and a 60% r-squared indicating that 60% of the variation in mouse size could be explained by mouse weight.", 'Explanation of how to calculate the sum of squares around the mean and a complicated equation, highlighting the process of measuring, squaring, and summing the distance from the data to the mean or equation.', "Illustration of a more complicated example where mouse weight and tail length are used to predict the length of a mouse's body, requiring a three-dimensional graph to plot the data."]}, {'end': 972.526, 'start': 751.386, 'title': 'Predicting body length from weight and tail length', 'summary': 'Explains how weight and tail length are used to predict body length in mice using a 3d graph, a least squares fit plane equation, and the concept of r-squared, which is adjusted to account for the number of parameters.', 'duration': 221.14, 'highlights': ['The chapter explains how weight and tail length are used to predict body length in mice using a 3D graph', 'Least squares fit plane equation is used to predict body length', 'The concept of adjusted R-squared to account for the number of parameters']}], 'duration': 337.355, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo635171.jpg', 'highlights': ['Illustration of a more complicated example using mouse weight and tail length to predict body length, requiring a three-dimensional graph.', 'Explanation of how to calculate the sum of squares around the mean and a complicated equation, highlighting the process of measuring, squaring, and summing the distance from the data to the mean or equation.', 'The concept of adjusted R-squared to account for the number of parameters.', "The concept of r-squared is illustrated using examples to show variation around the mean and fit, with a 0% r-squared indicating that mouse weight doesn't explain 
any of the variation around the mean, and a 60% r-squared indicating that 60% of the variation in mouse size could be explained by mouse weight.", 'The chapter explains how weight and tail length are used to predict body length in mice using a 3D graph.', 'Least squares fit plane equation is used to predict body length.']}, {'end': 1387.783, 'segs': [{'end': 1078.932, 'src': 'embed', 'start': 997.969, 'weight': 0, 'content': [{'end': 1006.937, 'text': "What this means is when we calculate R squared by plugging the numbers in, we're going to get 100%.", 'start': 997.969, 'duration': 8.968}, {'end': 1007.918, 'text': '100% is a great number.', 'start': 1006.937, 'duration': 0.981}, {'end': 1014.103, 'text': "We've explained all the variation, but any two random points will give us the exact same thing.", 'start': 1008.298, 'duration': 5.805}, {'end': 1016.245, 'text': "It doesn't actually mean anything.", 'start': 1014.683, 'duration': 1.562}, {'end': 1022.919, 'text': 'We need a way to determine if the r-squared value is statistically significant.', 'start': 1017.677, 'duration': 5.242}, {'end': 1025.24, 'text': 'We need a p-value.', 'start': 1024.16, 'duration': 1.08}, {'end': 1033.144, 'text': "Before we calculate the p-value, let's review the main concepts behind r-squared one last time.", 'start': 1026.34, 'duration': 6.804}, {'end': 1043.328, 'text': 'The general equation for r-squared is the variance around the mean minus the variance around the fit divided by the variance around the mean.', 'start': 1034.624, 'duration': 8.704}, {'end': 1052.721, 'text': 'In our example, this means the variation in the mouse size, minus the variation after taking weight into account,', 'start': 1044.473, 'duration': 8.248}, {'end': 1055.123, 'text': 'divided by the variation in mouse size.', 'start': 1052.721, 'duration': 2.402}, {'end': 1065.332, 'text': 'In other words, r squared equals the variation in mouse size explained by weight, divided by the variation in mouse 
size,', 'start': 1056.444, 'duration': 8.888}, {'end': 1067.274, 'text': 'without taking weight into account.', 'start': 1065.332, 'duration': 1.942}, {'end': 1078.932, 'text': 'In this particular example, r-squared equals 0.6, meaning we saw a 60% reduction in variation once we took mouse weight into account.', 'start': 1068.686, 'duration': 10.246}], 'summary': 'R-squared value of 100% explained all variation, with a 60% reduction in variation after accounting for mouse weight.', 'duration': 80.963, 'max_score': 997.969, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo997969.jpg'}, {'end': 1329.643, 'src': 'embed', 'start': 1298.694, 'weight': 1, 'content': [{'end': 1303.576, 'text': "In our example, that's the variance in mouse size explained by mouse weight.", 'start': 1298.694, 'duration': 4.882}, {'end': 1314.621, 'text': 'If we had used mouse weight and tail length to explain variation in size, then we would end up with an equation that had three parameters,', 'start': 1304.977, 'duration': 9.644}, {'end': 1317.714, 'text': 'and pfit would equal 3..', 'start': 1314.621, 'duration': 3.093}, {'end': 1324.92, 'text': 'Thus, pfit minus pmean would equal 3 minus 1, which equals 2.', 'start': 1317.714, 'duration': 7.206}, {'end': 1329.643, 'text': 'Now the fit has two extra parameters, mouse weight and tail length.', 'start': 1324.92, 'duration': 4.723}], 'summary': 'Variance in mouse size explained by mouse weight, with 2 extra parameters.', 'duration': 30.949, 'max_score': 1298.694, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo1298694.jpg'}], 'start': 973.807, 'title': 'Understanding r-squared and p-value', 'summary': 'Delves into explaining the concepts of r-squared and p-value, with a notable example demonstrating a 60% reduction in variation explained by weight. 
it emphasizes the importance of determining statistical significance and provides insights into the impact of the number of parameters on variance explanation.', 'chapters': [{'end': 1111.825, 'start': 973.807, 'title': 'R-squared and p-value calculation', 'summary': 'Explains the concepts of r-squared and p-value, with an example demonstrating a 60% reduction in variation once weight is taken into account, and emphasizes the importance of determining the statistical significance.', 'duration': 138.018, 'highlights': ['R-squared equals the variation in mouse size explained by weight, divided by the variation in mouse size, resulting in a 60% reduction in variation once weight is taken into account.', 'Emphasizes the need for a p-value to determine the statistical significance of the r-squared value.', "Explains that with only two data points the sum of squares around the fit equals zero, leading to an R-squared value of 100% which, while great, doesn't actually mean anything without a way to determine statistical significance.", 'Introduces the concept of f, which is equal to the variation in mouse size explained by weight divided by the variation in mouse size not explained by weight, and highlights that the numerators for R-squared and for F are the same.']}, {'end': 1387.783, 'start': 1113.207, 'title': 'R-squared: understanding variance explanation', 'summary': 'Explains the concept of r-squared, including the calculation of the numerator and denominator, the significance of r-squared, and the impact of the number of parameters on variance explanation, with a focus on mouse size and weight example.', 'duration': 274.576, 'highlights': ['The numerator of the F equation represents the variation in mouse size explained by weight, while the denominator is the variation in mouse size not explained by weight.', 'The number of parameters in the fit line impacts the variance explanation, with more parameters requiring more data to estimate them, as illustrated by the example of using mouse weight 
and tail length to explain variation in size.', 'The equation for F helps determine if the variance explained by the fit is significant; it uses the same sums of squares as R-squared and relies on degrees of freedom to convert those sums of squares into variances.']}], 'duration': 413.976, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo973807.jpg', 'highlights': ['R-squared equals the variation in mouse size explained by weight, divided by the variation in mouse size, resulting in a 60% reduction in variation once weight is taken into account.', 'The number of parameters in the fit line impacts the variance explanation, with more parameters requiring more data to estimate them, as illustrated by the example of using mouse weight and tail length to explain variation in size.', 'Emphasizes the need for a p-value to determine the statistical significance of the r-squared value.', 'The equation for F helps determine if the variance explained by the fit is significant; it uses the same sums of squares as R-squared and relies on degrees of freedom to convert those sums of squares into variances.']}, {'end': 1646.183, 'segs': [{'end': 1442.533, 'src': 'embed', 'start': 1387.783, 'weight': 2, 'content': [{'end': 1398.068, 'text': 'then the variation explained by the extra parameters in the fit will be a large number and the variation not explained by the extra parameters in the fit will be a small number.', 'start': 1387.783, 'duration': 10.285}, {'end': 1401.449, 'text': 'That makes F a really large number.', 'start': 1398.888, 'duration': 2.561}, {'end': 1413.759, 'text': "That question we've all been dying to know the answer to, how do we turn this number into a p-value? 
Conceptually, generate a set of random data.", 'start': 1403.171, 'duration': 10.588}, {'end': 1418.142, 'text': 'Calculate the mean and the sum of squares around the mean.', 'start': 1414.839, 'duration': 3.303}, {'end': 1422.745, 'text': 'Calculate the fit and the sum of squares around the fit.', 'start': 1419.203, 'duration': 3.542}, {'end': 1429.65, 'text': 'Now plug all those values into our equation for f, and that will give us a number.', 'start': 1424.146, 'duration': 5.504}, {'end': 1432.987, 'text': 'In this case, that number is 2.', 'start': 1430.13, 'duration': 2.857}, {'end': 1435.108, 'text': 'Now plot that number in a histogram.', 'start': 1432.987, 'duration': 2.121}, {'end': 1438.711, 'text': 'Now generate another set of random data.', 'start': 1436.169, 'duration': 2.542}, {'end': 1442.533, 'text': 'Calculate the mean and the sum of squares around the mean.', 'start': 1439.571, 'duration': 2.962}], 'summary': 'Using f statistic to generate p-value from random data.', 'duration': 54.75, 'max_score': 1387.783, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo1387783.jpg'}, {'end': 1523.978, 'src': 'embed', 'start': 1468.869, 'weight': 0, 'content': [{'end': 1474.872, 'text': 'And we just keep generating more and more random data sets, calculating the sums of squares,', 'start': 1468.869, 'duration': 6.003}, {'end': 1479.274, 'text': 'plugging them into our equation for f and plotting the results on our histogram.', 'start': 1474.872, 'duration': 4.402}, {'end': 1484.917, 'text': 'Now, imagine we did that hundreds, if not millions of times.', 'start': 1480.395, 'duration': 4.522}, {'end': 1490.741, 'text': "When we're all done with our random datasets, we return to our original dataset.", 'start': 1486.018, 'duration': 4.723}, {'end': 1495.804, 'text': 'We then plug the numbers into our equation for f.', 'start': 1492.122, 'duration': 3.682}, {'end': 1499.746, 'text': 'In this case, we got f 
equals 6.', 'start': 1495.804, 'duration': 3.942}, {'end': 1505.71, 'text': 'The p-value is the number of more extreme values divided by all of the values.', 'start': 1499.746, 'duration': 5.964}, {'end': 1516.475, 'text': 'So in this case, we have the value at f equals 6 and the value at f equals 7 divided by all the other randomizations that we created originally.', 'start': 1506.69, 'duration': 9.785}, {'end': 1523.978, 'text': 'If this concept is confusing to you, I have a StatQuest that explains p-values, so check that one out.', 'start': 1517.315, 'duration': 6.663}], 'summary': 'By calculating f for many random datasets, the f value of the original data (here, f equals 6) can be turned into a p-value: the number of more extreme values divided by all of the values.', 'duration': 55.109, 'max_score': 1468.869, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo1468869.jpg'}, {'end': 1646.183, 'src': 'embed', 'start': 1587.69, 'weight': 1, 'content': [{'end': 1589.571, 'text': "Now let's review the main ideas.", 'start': 1587.69, 'duration': 1.881}, {'end': 1598.858, 'text': 'Given some data that you think are related, linear regression quantifies the relationship in the data.', 'start': 1591.112, 'duration': 7.746}, {'end': 1600.86, 'text': 'This is r-squared.', 'start': 1599.739, 'duration': 1.121}, {'end': 1602.942, 'text': 'This needs to be large.', 'start': 1601.581, 'duration': 1.361}, {'end': 1607.245, 'text': 'It also determines how reliable that relationship is.', 'start': 1604.083, 'duration': 3.162}, {'end': 1611.649, 'text': 'This is the p-value that we calculated with f.', 'start': 1608.186, 'duration': 3.463}, {'end': 1612.87, 'text': 'This needs to be small.', 'start': 1611.649, 'duration': 1.221}, {'end': 1616.256, 'text': 'You need both to have an interesting result.', 'start': 1613.735, 'duration': 2.521}, {'end': 1622.378, 'text': "Hooray! 
We've made it to the end of another exciting stat quest.", 'start': 1617.716, 'duration': 4.662}, {'end': 1624.059, 'text': 'Wow, this was a long one.', 'start': 1622.698, 'duration': 1.361}, {'end': 1625.799, 'text': 'I hope you had a good time.', 'start': 1624.739, 'duration': 1.06}, {'end': 1632.642, 'text': "If you like this and want to see more stat quests like it, why don't you subscribe to my channel? It's real easy.", 'start': 1626.8, 'duration': 5.842}, {'end': 1633.962, 'text': 'Just click the red button.', 'start': 1632.782, 'duration': 1.18}, {'end': 1640.344, 'text': "And if you have any ideas of stat quests that you'd like me to create, just put them in the comments below.", 'start': 1634.962, 'duration': 5.382}, {'end': 1641.745, 'text': "That's all there is to it.", 'start': 1640.804, 'duration': 0.941}, {'end': 1642.465, 'text': 'All right.', 'start': 1642.185, 'duration': 0.28}, {'end': 1646.183, 'text': 'Tune in next time for another really exciting stat quest.', 'start': 1643.038, 'duration': 3.145}], 'summary': 'Linear regression quantifies data relationship with r-squared and p-value. subscribe for more stat quests.', 'duration': 58.493, 'max_score': 1587.69, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo1587690.jpg'}], 'start': 1387.783, 'title': 'Calculating f-statistic, p-value, and p-value using f-distribution', 'summary': 'Covers the process of calculating f-statistic and p-value by generating random data, as well as the calculation of p-values using f-distribution, emphasizing the relationship between sample size, fit equation, r-squared, and p-value. 
It also ends with an invitation to subscribe for more stat quests and to suggest ideas for future content.', 'chapters': [{'end': 1495.804, 'start': 1387.783, 'title': 'Calculating f-statistic and p-value', 'summary': 'Explains the process of calculating f-statistic and p-value by generating random data, calculating sums of squares and plugging them into the equation for f, ultimately enabling the conversion of the number into a p-value.', 'duration': 108.021, 'highlights': ['The process involves generating random data, calculating the mean and sum of squares around the mean, and then plugging these values into the equation for f, resulting in the generation of a number, such as 2 or 3, which is then plotted on a histogram.', 'After generating multiple random datasets, calculating the sums of squares, and plotting the results on a histogram, the original dataset is used to calculate the numbers for f, facilitating the conversion of the number into a p-value.', 'The variation explained by the extra parameters in the fit produces a large F-value, which poses the question of how to convert this number into a p-value.']}, {'end': 1616.256, 'start': 1495.804, 'title': 'Calculating p-value using f-distribution', 'summary': 'Explains the process of calculating p-values using f-distribution, with emphasis on the relationship between sample size, fit equation, r-squared, and p-value.', 'duration': 120.452, 'highlights': ['The p-value is the number of more extreme values divided by all of the values, calculated using the f-distribution, with examples of standard f-distributions and their relationship to sample size.', 'People use the line to calculate the p-value rather than generating tons of random data sets, and the red line represents a standard f-distribution with a smaller sample size relative to the blue line, resulting in a smaller p-value when there are more samples relative to the number of parameters in the fit equation.', 'Linear regression quantifies the 
relationship in the data using r-squared, which needs to be large, and determines the reliability of that relationship using the p-value calculated with f, which needs to be small, both of which are essential for an interesting result.']}, {'end': 1646.183, 'start': 1617.716, 'title': 'Stat quest summary', 'summary': 'Ends with an invitation to subscribe to the channel for more stat quests and to suggest ideas for future content, promising another exciting stat quest in the next episode.', 'duration': 28.467, 'highlights': ['An invitation to subscribe to the channel for more stat quests is extended, with a promise of future exciting content.', 'Viewers are encouraged to suggest ideas for stat quests in the comments section for future episodes.', 'The chapter concludes with a promise of another exciting stat quest in the next episode.']}], 'duration': 258.4, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nk2CQITm_eo/pics/nk2CQITm_eo1387783.jpg', 'highlights': ['The p-value is the number of more extreme values divided by all of the values, calculated using the f-distribution, with examples of standard f-distributions and their relationship to sample size.', 'Linear regression quantifies the relationship in the data using r-squared, which needs to be large, and determines the reliability of that relationship using the p-value calculated with f, which needs to be small, both of which are essential for an interesting result.', 'The process involves generating random data, calculating the mean and sum of squares around the mean, and then plugging these values into the equation for f, resulting in the generation of a number, such as 2 or 3, which is then plotted on a histogram.', 'After generating multiple random datasets, calculating the sums of squares, and plotting the results on a histogram, the original dataset is used to calculate the numbers for f, facilitating the conversion of the number into a p-value.', 'The variation explained by 
the extra parameters in the fit produces a large F-value, which poses the question of how to convert this number into a p-value.', 'An invitation to subscribe to the channel for more stat quests is extended, with a promise of future exciting content.', 'Viewers are encouraged to suggest ideas for stat quests in the comments section for future episodes.', 'The chapter concludes with a promise of another exciting stat quest in the next episode.']}], 'highlights': ['Linear regression involves least squares fitting, R squared, and p-value calculation', 'The least squares method minimizes sum of squared residuals to fit a line to data', 'R squared quantifies the predictive power of the model by comparing variations', 'Illustration of a more complicated example using mouse weight and tail length to predict body length', 'R-squared equals the variation in mouse size explained by weight, divided by the variation in mouse size', 'The p-value is calculated using the f-distribution and determines the reliability of the relationship', 'The concept of adjusted R-squared accounts for the number of parameters in the model', 'Emphasizes the need for a p-value to determine the statistical significance of the r-squared value', 'An invitation to subscribe to the channel for more stat quests is extended']}
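
The randomization procedure described in the transcript (generate random data, calculate the sums of squares around the mean and around the fit, plug them into the equation for F, repeat many times, then compare with the F from the original data) can be sketched roughly as follows. This is a minimal illustration, not the code from the video or its companion repository. The function names `f_statistic` and `simulated_p_value`, the choice of normally distributed random data, and the defaults for `n_sims` and `seed` are all assumptions made for this example. It uses the standard form F = ((SS(mean) - SS(fit)) / (p_fit - p_mean)) / (SS(fit) / (n - p_fit)), with p_mean = 1 and p_fit = 2 for a straight line.

```python
import numpy as np


def f_statistic(x, y):
    """F for a least-squares line versus using only the mean of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    p_mean, p_fit = 1, 2  # parameters: mean only vs. intercept + slope

    # Sum of squares around the mean
    ss_mean = np.sum((y - y.mean()) ** 2)

    # Least-squares line and the sum of squares around it
    slope, intercept = np.polyfit(x, y, 1)
    ss_fit = np.sum((y - (slope * x + intercept)) ** 2)

    return ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))


def simulated_p_value(x, y, n_sims=10_000, seed=0):
    """Fraction of random data sets whose F is at least as extreme as ours."""
    rng = np.random.default_rng(seed)
    f_observed = f_statistic(x, y)

    # Assumption: "random data" means y values drawn independently of x
    f_random = np.array(
        [f_statistic(x, rng.normal(size=len(x))) for _ in range(n_sims)]
    )

    # p-value = number of more extreme values divided by all of the values
    p_value = np.mean(f_random >= f_observed)
    return f_observed, p_value
```

Calling `simulated_p_value(weights, sizes)` on two equal-length arrays (hypothetical names here) returns the observed F and the simulated p-value. In practice, as the transcript notes, people read the p-value off the F-distribution instead of generating random data sets; the simulation is only meant to show where that distribution comes from.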