title
Testing Assumptions - Practical Machine Learning Tutorial with Python p.12
description
We've been learning about regression, and even coded our own very simple linear regression algorithm. Along with that, we've also built a coefficient of determination algorithm to check the accuracy and reliability of our best-fit line. We've discussed and shown how a best-fit line may not be a great fit, but also explained why our example was correct directionally, even if it was not exact. Now, however, we are at the point where we're using two top-level algorithms, each of which is composed of a handful of smaller algorithms. As we continue building this hierarchy of algorithms, we might find ourselves in trouble if just one of them has a tiny error, so we want to test our assumptions.
https://pythonprogramming.net/sample-data-testing-machine-learning-tutorial/
https://twitter.com/sentdex
https://www.facebook.com/pythonprogramming.net/
https://plus.google.com/+sentdex
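The kind of check described above can be sketched roughly as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the tutorial's actual code: the names `create_dataset`, `best_fit_slope_and_intercept`, `coefficient_of_determination`, and `r_squared_for`, as well as the `seed` parameter, are hypothetical and chosen only for illustration. It generates sample data with a chosen variance, fits a least-squares line, and computes R squared, so we can confirm that tighter data scores higher than noisier data.

```python
import random
from statistics import mean


def create_dataset(hm, variance, step=2, correlation=None, seed=None):
    """Build hm points whose y values drift by `step` per point, plus noise.

    correlation: 'pos' steps y upward, 'neg' steps it downward,
    None leaves the underlying value flat.
    seed is an illustrative addition so repeated runs give the same data.
    """
    rng = random.Random(seed)
    val = 1
    ys = []
    for _ in range(hm):
        # Noise is drawn uniformly from [-variance, variance).
        ys.append(float(val + rng.randrange(-variance, variance)))
        if correlation == 'pos':
            val += step
        elif correlation == 'neg':
            val -= step
    xs = [float(i) for i in range(hm)]
    return xs, ys


def best_fit_slope_and_intercept(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    xy_bar = mean(x * y for x, y in zip(xs, ys))
    xx_bar = mean(x * x for x in xs)
    m = (x_bar * y_bar - xy_bar) / (x_bar ** 2 - xx_bar)
    b = y_bar - m * x_bar
    return m, b


def coefficient_of_determination(ys_orig, ys_line):
    """R squared: 1 - (squared error of the line / squared error of the mean)."""
    y_mean = mean(ys_orig)
    se_line = sum((line - orig) ** 2 for orig, line in zip(ys_orig, ys_line))
    se_mean = sum((orig - y_mean) ** 2 for orig in ys_orig)
    return 1 - se_line / se_mean


def r_squared_for(variance, correlation='pos', seed=0):
    """Generate 40 points with the given variance and return R squared of the fit."""
    xs, ys = create_dataset(40, variance, 2, correlation, seed)
    m, b = best_fit_slope_and_intercept(xs, ys)
    line = [m * x + b for x in xs]
    return coefficient_of_determination(ys, line)


if __name__ == '__main__':
    for variance in (10, 40, 80):
        print(variance, round(r_squared_for(variance), 3))
    print('no correlation', round(r_squared_for(40, correlation=None), 3))
```

Comparing the printed values across variances is the automated version of eyeballing the plot: if R squared did not fall as variance grew, that would suggest a bug in one of the underlying functions.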
detail
{'title': 'Testing Assumptions - Practical Machine Learning Tutorial with Python p.12', 'heatmap': [{'end': 534.583, 'start': 443.227, 'weight': 0.924}], 'summary': 'The tutorial emphasizes the importance of testing assumptions in machine learning, while covering topics such as creating linear data sets, python numpy array creation, regression analysis insights, linear model training, and selecting features for price prediction and stock investing.', 'chapters': [{'end': 81.911, 'segs': [{'end': 81.911, 'src': 'embed', 'start': 27.596, 'weight': 0, 'content': [{'end': 30.898, 'text': 'and we have done linear regression and R squared and all this.', 'start': 27.596, 'duration': 3.302}, {'end': 36.982, 'text': 'And so the question is, we need to actually kind of test all of these assumptions.', 'start': 31.598, 'duration': 5.384}, {'end': 40.624, 'text': "So we've got actually two major algorithms.", 'start': 37.042, 'duration': 3.582}, {'end': 44.967, 'text': 'One is the equation for the best fit line and the other one is the R squared or coefficient of determination.', 'start': 40.724, 'duration': 4.243}, {'end': 52.712, 'text': "So we've got these two major algorithms that are also comprised of many other algorithms and, as we even saw just a few videos ago,", 'start': 45.567, 'duration': 7.145}, {'end': 59.276, 'text': 'the misplacement of a single parentheses changes everything and completely ruins the entire thing.', 'start': 52.712, 'duration': 6.564}, {'end': 63.819, 'text': 'So we need to be able to test to make sure things are working as intended.', 'start': 59.857, 'duration': 3.962}, {'end': 73.746, 'text': "So in the world of programming there's a similar kind of field and structure called unit testing, where we test each little small Unit,", 'start': 64.12, 'duration': 9.626}, {'end': 78.409, 'text': 'basically that we can in a program, and this kind of helps us from getting into trouble.', 'start': 73.746, 'duration': 4.663}, {'end': 81.911, 'text': 
'now This is not going to be unit testing, but the idea is fairly similar.', 'start': 78.409, 'duration': 3.502}], 'summary': 'Testing algorithms and assumptions is crucial for ensuring accuracy and reliability in linear regression and r squared calculations.', 'duration': 54.315, 'max_score': 27.596, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk27596.jpg'}], 'start': 1.881, 'title': 'Machine learning testing basics', 'summary': 'Discusses the importance of testing assumptions in machine learning, emphasizing the verification of algorithms such as the best fit line equation and the r squared or coefficient of determination, to ensure accuracy and functionality within the program.', 'chapters': [{'end': 81.911, 'start': 1.881, 'title': 'Machine learning testing basics', 'summary': 'Discusses the importance of testing assumptions in machine learning, focusing on the need to verify algorithms like the best fit line equation and the r squared or coefficient of determination, to ensure their accuracy and functionality within the program.', 'duration': 80.03, 'highlights': ['The tutorial emphasizes the need to test assumptions in machine learning to ensure accuracy and functionality of algorithms, such as the best fit line equation and the R squared or coefficient of determination.', 'It mentions the potential risks in programming, comparing the concept of testing assumptions in machine learning to unit testing in programming to avoid troubles from arising.', 'The chapter highlights the importance of testing to prevent unintended outcomes, using the example of a misplaced parenthesis that can disrupt an entire process.']}], 'duration': 80.03, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk1881.jpg', 'highlights': ['The tutorial emphasizes the need to test assumptions in machine learning to ensure accuracy and functionality of algorithms, such as the best 
fit line equation and the R squared or coefficient of determination.', 'The chapter highlights the importance of testing to prevent unintended outcomes, using the example of a misplaced parenthesis that can disrupt an entire process.', 'It mentions the potential risks in programming, comparing the concept of testing assumptions in machine learning to unit testing in programming to avoid troubles from arising.']}, {'end': 249.037, 'segs': [{'end': 147.673, 'src': 'embed', 'start': 103.847, 'weight': 1, 'content': [{'end': 107.75, 'text': 'And then we can test to make sure is r-squared better, higher, right?', 'start': 103.847, 'duration': 3.903}, {'end': 111.974, 'text': 'And and then also just test our best fit line.', 'start': 109.011, 'duration': 2.963}, {'end': 120.08, 'text': "but for the most part we're actually going to be testing r-squared and if, if, if the data is not more linear,", 'start': 111.974, 'duration': 8.106}, {'end': 123.783, 'text': 'we can make it more spread apart or R squared should be lower, and so on.', 'start': 120.08, 'duration': 3.703}, {'end': 129.225, 'text': "So anyways, let's go ahead and do that, and we can also confirm visually that the best fit line is indeed working,", 'start': 123.823, 'duration': 5.402}, {'end': 134.627, 'text': 'just by looking at it and seeing whether or not it is indeed a best fit line or what looks to be a best fit line.', 'start': 129.225, 'duration': 5.402}, {'end': 144.451, 'text': "So, first what we're going to go ahead and do is we're going to import random because we're going to be using random numbers.", 'start': 135.888, 'duration': 8.563}, {'end': 147.673, 'text': 'Everybody, the obligatory pseudo-random.', 'start': 144.511, 'duration': 3.162}], 'summary': 'Testing r-squared for best fit line with random numbers', 'duration': 43.826, 'max_score': 103.847, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk103847.jpg'}, {'end': 204.875, 'src': 
'embed', 'start': 173.215, 'weight': 0, 'content': [{'end': 182.262, 'text': "First is how much, like how many data points do we actually want to create here? And then we're going to say we'll pass variance.", 'start': 173.215, 'duration': 9.047}, {'end': 186.765, 'text': 'And this will be how variable do we want this data set to be.', 'start': 182.602, 'duration': 4.163}, {'end': 191.151, 'text': "Then we're going to pass step.", 'start': 188.47, 'duration': 2.681}, {'end': 199.153, 'text': 'And step will just be how far on average to step up the y value per point.', 'start': 191.591, 'duration': 7.562}, {'end': 201.314, 'text': "And we'll assign a default value there.", 'start': 199.654, 'duration': 1.66}, {'end': 204.875, 'text': "And then finally, we're going to do correlation.", 'start': 201.734, 'duration': 3.141}], 'summary': 'Create a dataset with specified data points, variance, step, and correlation.', 'duration': 31.66, 'max_score': 173.215, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk173215.jpg'}], 'start': 81.971, 'title': 'Testing linear data set with sample data', 'summary': 'Discusses creating a linear data set for testing, considering parameters like r-squared value and best fit line, and introduces a function to create a data set with specified parameters such as data points, variance, step, and correlation.', 'chapters': [{'end': 249.037, 'start': 81.971, 'title': 'Testing linear data set with sample data', 'summary': 'Discusses the process of creating a linear data set for testing by considering parameters like r-squared value and best fit line, and introduces a function to create a data set with specified parameters such as data points, variance, step, and correlation.', 'duration': 167.066, 'highlights': ["The process involves testing the data's r-squared value to ensure it is more linear, with an aim of achieving a higher r-squared value.", 'Confirming visually that the best fit 
line is indeed working by observing its linearity and appropriateness.', "Introducing a function 'create_data_set' which allows specifying parameters such as data points, variance, step, and correlation for creating a data set."]}], 'duration': 167.066, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk81971.jpg', 'highlights': ["Introducing a function 'create_data_set' for specifying parameters like data points, variance, step, and correlation.", "Testing the data's r-squared value to ensure it is more linear, aiming for a higher r-squared value.", 'Confirming the effectiveness of the best fit line by visually observing its linearity and appropriateness.']}, {'end': 577.447, 'segs': [{'end': 301.607, 'src': 'embed', 'start': 272.27, 'weight': 0, 'content': [{'end': 288.359, 'text': "so we'll say float 64, so that returns the X's, and then we also need to return Y values, so wise, and then D type equals NP float 64..", 'start': 272.27, 'duration': 16.089}, {'end': 291.961, 'text': "OK So that's the objective that we want to do.", 'start': 288.359, 'duration': 3.602}, {'end': 298.265, 'text': 'And then now what we want to do is start creating at least some random values.', 'start': 292.281, 'duration': 5.984}, {'end': 301.607, 'text': "So the first thing we're going to say is we're going to start with val equals 1.", 'start': 298.665, 'duration': 2.942}], 'summary': 'Objective: return x and y values as np float 64, starting with val equals 1.', 'duration': 29.337, 'max_score': 272.27, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk272270.jpg'}, {'end': 505.482, 'src': 'embed', 'start': 475.73, 'weight': 1, 'content': [{'end': 481.795, 'text': "So let's say we said we want 40 data points with a variance of 40.", 'start': 475.73, 'duration': 6.065}, {'end': 487.94, 'text': 'The step will be two and correlation will make that positive.', 'start': 481.795, 
'duration': 6.145}, {'end': 495.936, 'text': "now we have x's, y's, uh, we can print r squared and all that fun stuff.", 'start': 490.092, 'duration': 5.844}, {'end': 499.038, 'text': "and um, let's go ahead and run that real quick.", 'start': 495.936, 'duration': 3.102}, {'end': 501.46, 'text': 'and in fact, are we still pro?', 'start': 499.038, 'duration': 2.422}, {'end': 505.482, 'text': "yeah, we're still graphing that prediction, so let's we'll get rid of the prediction.", 'start': 501.46, 'duration': 4.022}], 'summary': 'Request for 40 data points with variance of 40, step of 2, positive correlation, and computation of r squared.', 'duration': 29.752, 'max_score': 475.73, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk475730.jpg'}, {'end': 553.435, 'src': 'heatmap', 'start': 443.227, 'weight': 2, 'content': [{'end': 450.532, 'text': 'So to create a sample data set we could do something like and, for example, we could leave this here for now,', 'start': 443.227, 'duration': 7.305}, {'end': 454.354, 'text': "but I'm going to comment it out just so we know that we're working with our new data instead.", 'start': 450.532, 'duration': 3.822}, {'end': 461.498, 'text': "So underneath this, you could create a new data set, but I guess we'll create it down here underneath all these other functions.", 'start': 454.434, 'duration': 7.064}, {'end': 467.823, 'text': "So you could say something like x's, y's equals create data set.", 'start': 463.079, 'duration': 4.744}, {'end': 475.189, 'text': "And then let's say we did a recall that it's how much variance the step and the correlation.", 'start': 468.384, 'duration': 6.805}, {'end': 481.795, 'text': "So let's say we said we want 40 data points with a variance of 40.", 'start': 475.73, 'duration': 6.065}, {'end': 487.94, 'text': 'The step will be two and correlation will make that positive.', 'start': 481.795, 'duration': 6.145}, {'end': 495.936, 'text': "now we 
have x's, y's, uh, we can print r squared and all that fun stuff.", 'start': 490.092, 'duration': 5.844}, {'end': 499.038, 'text': "and um, let's go ahead and run that real quick.", 'start': 495.936, 'duration': 3.102}, {'end': 501.46, 'text': 'and in fact, are we still pro?', 'start': 499.038, 'duration': 2.422}, {'end': 505.482, 'text': "yeah, we're still graphing that prediction, so let's we'll get rid of the prediction.", 'start': 501.46, 'duration': 4.022}, {'end': 506.603, 'text': 'we could actually leave the prediction.', 'start': 505.482, 'duration': 1.121}, {'end': 509.085, 'text': 'that might be kind of interesting.', 'start': 506.603, 'duration': 2.482}, {'end': 511.926, 'text': 'um, for now that we might run trouble.', 'start': 509.085, 'duration': 2.841}, {'end': 518.431, 'text': "i'm not really sure if we're gonna get in trouble for that or not, but we'll just do that and let's run it and see.", 'start': 511.926, 'duration': 6.505}, {'end': 523.715, 'text': 'We might have to change something else, but I think that would be everything we would change Awesome.', 'start': 519.051, 'duration': 4.664}, {'end': 526.177, 'text': "So here's our data set and sure enough.", 'start': 524.255, 'duration': 1.922}, {'end': 534.583, 'text': "There's a nice best fit line for us and we see that We would kind of agree with that visually.", 'start': 526.217, 'duration': 8.366}, {'end': 543.55, 'text': "Let's go and Graph that other plot though that one and this will be a G prediction I don't even see it.", 'start': 535.123, 'duration': 8.427}, {'end': 545.931, 'text': 'It was for x equals 8.', 'start': 543.73, 'duration': 2.201}, {'end': 549.493, 'text': "I guess it would be right on the line, and then we're plotting the regression line.", 'start': 545.931, 'duration': 3.562}, {'end': 553.435, 'text': "So I'm guessing the line is just going right over it, probably.", 'start': 549.773, 'duration': 3.662}], 'summary': 'Creating a new data set with 40 data points, variance 
of 40, step of 2, and positive correlation. visualizing the best fit line and regression plot.', 'duration': 27.218, 'max_score': 443.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk443227.jpg'}], 'start': 250.258, 'title': 'Python numpy array creation and generating sample data', 'summary': 'Covers creating a python numpy array with float 64 data type, generating random y values, creating a data set with specified variance, step, and correlation, and visualizing best fit and regression lines.', 'chapters': [{'end': 321.624, 'start': 250.258, 'title': 'Python numpy array creation', 'summary': 'Discusses creating a python numpy array and returning x and y values of float 64 data type, while also explaining the process of generating random y values using python.', 'duration': 71.366, 'highlights': ["The objective is to return the NumPy array X's and Y's of float 64 data type.", "Explaining the process of creating random Y values using Python and starting with the value '1'.", "Emphasizing the importance of specifying the data type (float 64) for X's and Y's to avoid forgetting it later."]}, {'end': 577.447, 'start': 322.925, 'title': 'Generating sample data and creating a data set', 'summary': "Discusses the process of generating sample data with specified variance, step, and correlation, creating a data set of x's and y's, and visualizing the best fit line and regression line for the data set.", 'duration': 254.522, 'highlights': ["The process of generating sample data with specified variance, step, and correlation is explained. The chapter discusses the process of generating sample data with specified variance, step, and correlation, creating a data set of x's and y's, and visualizing the best fit line and regression line for the data set.", "Creating a data set of x's and y's for the sample data is outlined. 
The chapter explains the process of creating a data set of x's and y's for the sample data, utilizing the specified variance, step, and correlation.", 'Visualizing the best fit line and regression line for the created data set is demonstrated. The chapter demonstrates the visualization of the best fit line and regression line for the created data set, providing insights into the correlation and visual interpretation of the data.']}], 'duration': 327.189, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk250258.jpg', 'highlights': ["The chapter emphasizes the importance of specifying the data type (float 64) for X's and Y's to avoid forgetting it later.", "The process of generating sample data with specified variance, step, and correlation is explained, including creating a data set of x's and y's and visualizing the best fit line and regression line.", 'The chapter demonstrates the visualization of the best fit line and regression line for the created data set, providing insights into the correlation and visual interpretation of the data.']}, {'end': 837.784, 'segs': [{'end': 666.456, 'src': 'embed', 'start': 622.639, 'weight': 0, 'content': [{'end': 625.222, 'text': "So let's do 10.", 'start': 622.639, 'duration': 2.583}, {'end': 626.123, 'text': 'We can save and run that.', 'start': 625.222, 'duration': 0.901}, {'end': 628.964, 'text': "as you can see, it's much tighter.", 'start': 627.184, 'duration': 1.78}, {'end': 634.486, 'text': "everything's there, and sure enough the coefficient of determination is very, very strong.", 'start': 628.964, 'duration': 5.522}, {'end': 637.866, 'text': "it's 0.92, much better than before.", 'start': 634.486, 'duration': 3.38}, {'end': 640.027, 'text': 'and what if we changed this to an 80?', 'start': 637.866, 'duration': 2.161}, {'end': 644.888, 'text': 'now it should be less than 0.6, and sure enough it is less than 0.6.', 'start': 640.027, 'duration': 4.861}, {'end': 
653.25, 'text': 'and so what you can begin to do is automatically write a program that simply calculates the coefficient of determination for,', 'start': 644.888, 'duration': 8.362}, {'end': 660.073, 'text': "for just a sample data set, and you would just make sure, for example, that you'd start with 40,", 'start': 654.93, 'duration': 5.143}, {'end': 666.456, 'text': 'save that number and then you would change that to 10, and hopefully the coefficient of determination was less than this initial number, and then,', 'start': 660.073, 'duration': 6.383}], 'summary': 'The coefficient of determination improved from 0.6 to 0.92 after reducing the variance from 40 to 10.', 'duration': 43.817, 'max_score': 622.639, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk622639.jpg'}, {'end': 719.032, 'src': 'embed', 'start': 686.405, 'weight': 2, 'content': [{'end': 689.187, 'text': 'we should get quite an ugly data set.', 'start': 686.405, 'duration': 2.782}, {'end': 690.107, 'text': 'Sure enough, we do.', 'start': 689.327, 'duration': 0.78}, {'end': 700.592, 'text': 'And the coefficient of determination is almost zero, which is absolutely not surprising because that almost looks like a completely flat line.', 'start': 690.827, 'duration': 9.765}, {'end': 703.053, 'text': 'And sure enough, this data is completely nonlinear.', 'start': 701.112, 'duration': 1.941}, {'end': 711.521, 'text': 'So if you did have a data set and you were trying to run linear regression on this data set and you came back with an R squared, that was this number.', 'start': 703.453, 'duration': 8.068}, {'end': 713.524, 'text': "that's like.", 'start': 711.521, 'duration': 2.003}, {'end': 719.032, 'text': '0007, you would probably be smart enough to decide, hey, my data is actually not linear.', 'start': 713.524, 'duration': 5.508}], 'summary': 'The data set is completely nonlinear with an r squared value of 0.0007, indicating it is not 
suitable for linear regression.', 'duration': 32.627, 'max_score': 686.405, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk686405.jpg'}, {'end': 749.725, 'src': 'embed', 'start': 722.857, 'weight': 3, 'content': [{'end': 731.73, 'text': 'That said, you can do other forms of classification with the data or not just classification, but you can do other forms of machine learning.', 'start': 722.857, 'duration': 8.873}, {'end': 734.835, 'text': "I'm thinking classification with your data.", 'start': 731.73, 'duration': 3.105}, {'end': 736.397, 'text': "It doesn't necessarily have to be linear.", 'start': 734.855, 'duration': 1.542}, {'end': 743.442, 'text': "And in fact, a lot of classification is Should be linear in some way, but we'll get there anyway That's enough for now.", 'start': 736.417, 'duration': 7.025}, {'end': 749.725, 'text': 'I think, but just kind of keep in mind that when you create Big, big scripts like we have here,', 'start': 743.482, 'duration': 6.243}], 'summary': 'Explore alternative forms of machine learning, particularly non-linear classification, for your data.', 'duration': 26.868, 'max_score': 722.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk722857.jpg'}, {'end': 787.654, 'src': 'embed', 'start': 761.973, 'weight': 4, 'content': [{'end': 767.4, 'text': 'but you could definitely program something that would go through and, like I was saying,', 'start': 761.973, 'duration': 5.427}, {'end': 772.968, 'text': 'check to make sure R squared was acting according to our assumption and our knowledge of how it ought to act.', 'start': 767.4, 'duration': 5.568}, {'end': 778.665, 'text': "So we're basically done with regression, but I want to make a quick edit to this video to cover two pretty important things.", 'start': 773.479, 'duration': 5.186}, {'end': 787.654, 'text': "One is a fundamental aspect to machine learning that might 
be getting overlooked using the really simple example that we've used here.", 'start': 779.005, 'duration': 8.649}], 'summary': 'Programming to check r squared adherence, covering important machine learning aspects.', 'duration': 25.681, 'max_score': 761.973, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk761973.jpg'}], 'start': 578.688, 'title': 'Regression analysis insights', 'summary': 'Delves into testing the impact of variance on the coefficient of determination, revealing a significant increase from 0.6 to 0.92 when variance is reduced from 40 to 10, and discusses identifying nonlinear datasets, unsuitable for linear regression, emphasizing the importance of validating regression assumptions and understanding prediction line behavior.', 'chapters': [{'end': 686.405, 'start': 578.688, 'title': 'Testing variance impact on coefficient of determination', 'summary': 'Discusses testing the impact of changing variance on the coefficient of determination, showing a significant increase from 0.6 to 0.92 when variance is reduced from 40 to 10, demonstrating the potential for automating coefficient of determination calculation and unit testing.', 'duration': 107.717, 'highlights': ['The coefficient of determination significantly increases from 0.6 to 0.92 when the variance is reduced from 40 to 10, demonstrating the impact of variance on the predictive accuracy.', "Automating the calculation of the coefficient of determination for different variance values can potentially enable the creation of unit tests to validate the predictive model's performance."]}, {'end': 743.442, 'start': 686.405, 'title': 'Nonlinear data and linear regression', 'summary': 'Discusses the identification of a nonlinear dataset with an almost zero coefficient of determination, indicating unsuitability for linear regression, and suggests exploring other forms of classification for machine learning.', 'duration': 57.037, 'highlights': ['The 
coefficient of determination is almost zero, indicating the unsuitability of the data for linear regression.', "Identification of a completely nonlinear dataset with a coefficient of determination of 'like .0007.'", 'Suggestion to explore other forms of classification for machine learning with the unsuitable data.']}, {'end': 837.784, 'start': 743.482, 'title': 'Regression analysis summary', 'summary': 'Covers the importance of validating regression assumptions, addressing fundamental aspects of machine learning, and rectifying errors in the code, with a focus on adjusting data and understanding prediction line behavior.', 'duration': 94.302, 'highlights': ['The importance of validating regression assumptions and understanding how R squared should act', 'Addressing fundamental aspects of machine learning and rectifying errors in the code', 'Adjusting data and understanding prediction line behavior']}], 'duration': 259.096, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk578688.jpg', 'highlights': ['Significant increase in coefficient of determination from 0.6 to 0.92 when variance reduced from 40 to 10.', 'Automating calculation of coefficient of determination for different variance values can enable creation of unit tests.', 'Identification of completely nonlinear dataset with coefficient of determination almost zero.', 'Suggestion to explore other forms of classification for machine learning with unsuitable data.', 'Importance of validating regression assumptions and understanding prediction line behavior.']}, {'end': 982.234, 'segs': [{'end': 863.613, 'src': 'embed', 'start': 837.904, 'weight': 2, 'content': [{'end': 844.588, 'text': "One, we've created a linear model that is going to attempt to do this, but then also we've made a mistake.", 'start': 837.904, 'duration': 6.684}, {'end': 846.789, 'text': "So we'll address kind of both.", 'start': 845.168, 'duration': 1.621}, {'end': 852.933, 'text': 'But 
anyway, the first thing is, and the biggest mistake, actually there was two mistakes.', 'start': 846.909, 'duration': 6.024}, {'end': 856.415, 'text': "One I noticed in the video, just going back over it, I'm pretty sure it was here.", 'start': 853.073, 'duration': 3.342}, {'end': 859.877, 'text': 'There was also a colon at the end of the X.', 'start': 857.075, 'duration': 2.802}, {'end': 863.613, 'text': "I'm not sure why that was there.", 'start': 861.792, 'duration': 1.821}], 'summary': 'A linear model was created to address mistakes, including a colon error in the video.', 'duration': 25.709, 'max_score': 837.904, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk837904.jpg'}, {'end': 924.476, 'src': 'embed', 'start': 899.018, 'weight': 1, 'content': [{'end': 906.025, 'text': "But instead what we've done is we've sliced X and redefined X here and then sliced x after it's already been redefined.", 'start': 899.018, 'duration': 7.007}, {'end': 909.827, 'text': 'so this is actually minus forecast out of the ninety percent.', 'start': 906.025, 'duration': 3.802}, {'end': 913.249, 'text': 'so obviously simplifying things a little bit, uh.', 'start': 909.827, 'duration': 3.422}, {'end': 920.633, 'text': "this is the basically up to ninety percent and this is the last 10% of that 90%, it's a little bit more, but anyway.", 'start': 913.249, 'duration': 7.384}, {'end': 924.476, 'text': "So that was just, that's a failure in logic.", 'start': 921.814, 'duration': 2.662}], 'summary': 'Slicing x led to a 10% forecast reduction, a logic failure.', 'duration': 25.458, 'max_score': 899.018, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk899018.jpg'}, {'end': 986.036, 'src': 'embed', 'start': 961.926, 'weight': 0, 'content': [{'end': 969.05, 'text': "So what was the objective here? 
First of all, let me just say the reason why we did it this way is just for simplicity's sake.", 'start': 961.926, 'duration': 7.124}, {'end': 971.592, 'text': 'We were just trying to do a really simple regression example.', 'start': 969.25, 'duration': 2.342}, {'end': 977.732, 'text': "But let's say, regardless of whether or not you're interested in stock investing,", 'start': 972.489, 'duration': 5.243}, {'end': 982.234, 'text': 'this problem is every machine learning problem is gonna likely be a somewhat complex problem.', 'start': 977.732, 'duration': 4.502}, {'end': 986.036, 'text': 'So you have to think pretty logically about the features that you choose to use.', 'start': 982.254, 'duration': 3.782}], 'summary': 'Demonstrating a simple regression example for machine learning problems.', 'duration': 24.11, 'max_score': 961.926, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk961926.jpg'}], 'start': 837.904, 'title': 'Linear model training', 'summary': 'Discusses mistakes in linear model creation, logic failure in data slicing, and the importance of training features in a simple regression example, emphasizing the need for complexity in machine learning problems.', 'chapters': [{'end': 982.234, 'start': 837.904, 'title': 'Linear model training and fundamentals', 'summary': 'Discusses the mistakes in the linear model creation, the logic failure in data slicing, and the significance of training features in a simple regression example, emphasizing the need for complexity in machine learning problems.', 'duration': 144.33, 'highlights': ['Mistake in logic failure of data slicing The logic failure in data slicing occurred when X was redefined and then sliced, resulting in a failure in logic as X lately was not the last 10% as intended.', 'Mistake in linear model creation An error was found in the linear model creation due to a typo with a colon at the end of X, which was identified as a failure and rectified 
by cutting and pasting the correct logic.', 'Significance of training features in regression example The discussion emphasized the need for complexity in machine learning problems, irrespective of the application, and highlighted the importance of training against relevant features.']}], 'duration': 144.33, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk837904.jpg', 'highlights': ['The discussion emphasized the need for complexity in machine learning problems, irrespective of the application, and highlighted the importance of training against relevant features.', 'Mistake in logic failure of data slicing occurred when X was redefined and then sliced, resulting in a failure in logic as X lately was not the last 10% as intended.', 'An error was found in the linear model creation due to a typo with a colon at the end of X, which was identified as a failure and rectified by cutting and pasting the correct logic.']}, {'end': 1231.666, 'segs': [{'end': 1046.175, 'src': 'embed', 'start': 1002.729, 'weight': 0, 'content': [{'end': 1003.63, 'text': 'How about percent change?', 'start': 1002.729, 'duration': 0.901}, {'end': 1005.232, 'text': 'No, right?', 'start': 1004.531, 'duration': 0.701}, {'end': 1007.215, 'text': 'These may be volatility right?', 'start': 1005.273, 'duration': 1.942}, {'end': 1012.583, 'text': 'Maybe magnitude same thing with high, minus, low percent volatility and like direction, maybe, but not price.', 'start': 1007.255, 'duration': 5.328}, {'end': 1019.35, 'text': 'What about volume? No, not price, right? 
This is just magnitude kind of fluctuation maybe, stuff like that, volatility.", 'start': 1012.783, 'duration': 6.567}, {'end': 1023.094, 'text': 'So the only thing that really hinges on price is adjusted close.', 'start': 1020.131, 'duration': 2.963}, {'end': 1035.787, 'text': 'To illustrate that, despite training on a future value that is indeed price, what we can do is we can actually drop adjusted close from the features.', 'start': 1023.395, 'duration': 12.392}, {'end': 1039.45, 'text': 'And what do you think? When we drop this, what do you think is going to happen before we graph it?', 'start': 1036.207, 'duration': 3.243}, {'end': 1042.893, 'text': 'Is that going to create a similar line that follows price?', 'start': 1039.911, 'duration': 2.982}, {'end': 1046.175, 'text': 'Is it going to be a falling price, upward price, flat line?', 'start': 1042.933, 'duration': 3.242}], 'summary': 'Discussing the impact of adjusted close on price prediction and volatility in stock trading.', 'duration': 43.446, 'max_score': 1002.729, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk1002729.jpg'}, {'end': 1099.167, 'src': 'embed', 'start': 1068.732, 'weight': 3, 'content': [{'end': 1071.053, 'text': 'Not big differences.', 'start': 1068.732, 'duration': 2.321}, {'end': 1075.095, 'text': 'The only thing that might be sort of impactful is the adjusted volume,', 'start': 1071.413, 'duration': 3.682}, {'end': 1080.617, 'text': 'since probably less people are quickly flipping an $800 stock as opposed to a $50 stock or something like that.', 'start': 1075.095, 'duration': 5.522}, {'end': 1084.778, 'text': "But regardless, these just aren't the greatest features.", 'start': 1081.137, 'duration': 3.641}, {'end': 1088.64, 'text': 'So, thinking about your problem, in this case it was stock investing.', 'start': 1084.878, 'duration': 3.762}, {'end': 1092.883, 'text': 'What is a stock price indicative of?', 'start': 1089.26, 
'duration': 3.623}, {'end': 1095.765, 'text': "It's indicative of the entire company's value.", 'start': 1093.003, 'duration': 2.762}, {'end': 1099.167, 'text': "Let's think of Google, for example, like $500 billion, I think.", 'start': 1095.785, 'duration': 3.382}], 'summary': 'Adjusted volume may impact stock trading, but stock price indicates company value, e.g. Google at $500 billion.', 'duration': 30.435, 'max_score': 1068.732, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk1068732.jpg'}, {'end': 1156.777, 'src': 'embed', 'start': 1130.437, 'weight': 1, 'content': [{'end': 1137.562, 'text': "Fundamentally, Google's worth $500 billion because of things like its quarterly earnings, its price to earnings, its price to earnings to growth,", 'start': 1130.437, 'duration': 7.125}, {'end': 1139.163, 'text': 'its book value and so on.', 'start': 1137.562, 'duration': 1.601}, {'end': 1141.265, 'text': 'These are the things that value the company.', 'start': 1139.584, 'duration': 1.681}, {'end': 1147.65, 'text': "So if you wanted to predict stock price, you would use features that attempt to predict the company's overall value.", 'start': 1141.325, 'duration': 6.325}, {'end': 1153.154, 'text': 'And then from there, you can divide that by outstanding shares and get a specific share price for the company.', 'start': 1148.13, 'duration': 5.024}, {'end': 1156.777, 'text': 'But anyway, this was just meant to be a very simple example.', 'start': 1153.274, 'duration': 3.503}], 'summary': "Google's value of $500 billion is determined by factors like quarterly earnings, price to earnings, price to earnings to growth, and book value.", 'duration': 26.34, 'max_score': 1130.437, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk1130437.jpg'}], 'start': 982.254, 'title': 'Selecting features for price prediction and understanding stock investing', 'summary': 'Discusses the 
impact of adjusted close on price prediction, emphasizing the need for fundamental features like quarterly earnings and book value to predict stock price, rather than technical indicators like adjusted volume or pattern recognition.', 'chapters': [{'end': 1068.672, 'start': 982.254, 'title': 'Choosing features for price prediction', 'summary': 'Discusses the logical selection of features for price prediction, highlighting the impact of adjusted close on price and the consequence of excluding it, resulting in a flat prediction line due to the lack of direct correlation with price.', 'duration': 86.418, 'highlights': ['The only feature that directly hinges on price is adjusted close, while other features such as high minus low percent, percent change, and volume are more related to volatility and magnitude rather than price.', 'Excluding adjusted close from the features results in a flat prediction line, indicating the significant impact of this feature on price prediction.', 'The consequence of excluding adjusted close from the features is a flat prediction line, demonstrating the crucial role of this feature in price prediction.']}, {'end': 1231.666, 'start': 1068.732, 'title': 'Understanding stock investing', 'summary': "Discusses the importance of using fundamental features such as quarterly earnings, price to earnings, and book value to predict stock price, emphasizing that stock price is indicative of a company's overall value rather than technical indicators like adjusted volume or pattern recognition.", 'duration': 162.934, 'highlights': ["Stock price is indicative of a company's overall value, determined by fundamental features like quarterly earnings and price to earnings. Emphasizes the importance of fundamental features in determining stock price.", "Technical indicators like adjusted volume and pattern recognition do not accurately determine a company's value or stock price. 
Highlights the limitations of technical indicators in predicting stock price.", 'The speaker offers a tutorial series covering more complex examples of investing with fundamental features of companies. Mentions the availability of a tutorial series for those interested in more complex investing examples.']}], 'duration': 249.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Kpxwl2u-Wgk/pics/Kpxwl2u-Wgk982254.jpg', 'highlights': ['Excluding adjusted close from the features results in a flat prediction line, indicating the significant impact of this feature on price prediction.', "Stock price is indicative of a company's overall value, determined by fundamental features like quarterly earnings and price to earnings.", 'The only feature that directly hinges on price is adjusted close, while other features such as high minus low percent, percent change, and volume are more related to volatility and magnitude rather than price.', "Technical indicators like adjusted volume and pattern recognition do not accurately determine a company's value or stock price."]}], 'highlights': ['The tutorial emphasizes the need to test assumptions in machine learning to ensure accuracy and functionality of algorithms, such as the best fit line equation and the R squared or coefficient of determination.', 'Significant increase in coefficient of determination from 0.6 to 0.92 when variance was reduced from 40 to 10.', 'The discussion emphasized the need for complexity in machine learning problems, irrespective of the application, and highlighted the importance of training against relevant features.', "Introducing a function 'create_data_set' for specifying parameters like data points, variance, step, and correlation.", "The chapter emphasizes the importance of specifying the data type (float64) for X's and Y's to avoid forgetting it later."]}