title
3.4: Linear Regression with Gradient Descent - Intelligence and Learning
description
In this video I continue my Machine Learning series and attempt to explain Linear Regression with Gradient Descent.
My video explaining the Mathematics of Gradient Descent: https://youtu.be/jc2IthslyzM
This video is part of session 3 of my Spring 2017 ITP "Intelligence and Learning" course (https://github.com/shiffman/NOC-S17-2-Intelligence-Learning/tree/master/week3-classification-regression)
Support this channel on Patreon: https://patreon.com/codingtrain
To buy Coding Train merchandise: https://www.designbyhumans.com/shop/codingtrain/
Send me your questions and coding challenges!: https://github.com/CodingTrain/Rainbow-Topics
Contact:
Twitter: https://twitter.com/shiffman
The Coding Train website: http://thecodingtrain.com/
Links discussed in this video:
Session 3 of Intelligence and Learning: https://github.com/shiffman/NOC-S17-2-Intelligence-Learning/tree/master/week3-classification-regression
Nature of Code: http://natureofcode.com/
kwichmann's Linear Regression Diagnostics: https://kwichmann.github.io/ml_sandbox/linear_regression_diagnostics/
Linear Regression on Wikipedia: https://en.wikipedia.org/wiki/Linear_regression
Source Code for all Video Lessons: https://github.com/CodingTrain/Rainbow-Code
p5.js: https://p5js.org/
Processing: https://processing.org
For More Coding Challenges: https://www.youtube.com/playlist?list=PLRqwX-V7Uu6ZiZxtDDRCi6uhfTH4FilpH
For More Intelligence and Learning: https://www.youtube.com/playlist?list=PLRqwX-V7Uu6YJ3XfHhT2Mm4Y5I99nrIKX
Help us caption & translate this video!
http://amara.org/v/7Yh8/
📄 Code of Conduct: https://github.com/CodingTrain/Code-of-Conduct
detail
{'title': '3.4: Linear Regression with Gradient Descent - Intelligence and Learning', 'heatmap': [{'end': 642.62, 'start': 601.961, 'weight': 0.854}, {'end': 699.336, 'start': 683.263, 'weight': 0.725}, {'end': 902.483, 'start': 862.357, 'weight': 0.702}, {'end': 996.791, 'start': 979.147, 'weight': 0.926}, {'end': 1067.268, 'start': 1037.028, 'weight': 1}], 'summary': 'Covers linear regression and neural networks, two-dimensional data analysis for predicting ice cream sales and cricket chirping frequency, the concept and application of gradient descent in machine learning, and the use of stochastic gradient descent to improve results in linear regression, emphasizing the importance of learning rate in minimizing error and the impact of small adjustments in gradient descent.', 'chapters': [{'end': 165.444, 'segs': [{'end': 49.975, 'src': 'embed', 'start': 20.765, 'weight': 1, 'content': [{'end': 24.566, 'text': 'So at the top of this video, why am I making another video about linear regression?', 'start': 20.765, 'duration': 3.801}, {'end': 29.927, 'text': 'So what I did in the previous two videos is I created a p5.js sketch.', 'start': 24.926, 'duration': 5.001}, {'end': 34.729, 'text': 'which implements linear regression using the ordinary least squares method, so a statistical approach.', 'start': 30.347, 'duration': 4.382}, {'end': 45.733, 'text': 'There are a whole bunch of data points in 2D space and I try to fit a line that fits to that data as best as possible so that I could predict new data points in this space.', 'start': 35.309, 'duration': 10.424}, {'end': 49.975, 'text': 'And you can see, as I start to click around, how the line is fit sort of changes.', 'start': 46.033, 'duration': 3.942}], 'summary': 'Demonstrating linear regression using p5.js sketch in 2d space.', 'duration': 29.21, 'max_score': 20.765, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab7420765.jpg'}, {'end': 129.729, 'src': 'embed', 'start': 101.507, 'weight': 0, 'content': [{'end': 112.254, 'text': 'Okay, so this is the problem that machine learning, neural network based, deep learning based systems are here to solve.', 'start': 101.507, 'duration': 10.747}, {'end': 117.178, 'text': 'to figure out a way to create a model to fit a given data set.', 'start': 113.375, 'duration': 3.803}, {'end': 125.205, 'text': 'And one technique for doing that, which is different than, say, ordinary least squares, is to use a technique called gradient descent.', 'start': 117.899, 'duration': 7.306}, {'end': 129.729, 'text': 'And what gradient descent essentially does, it says, let me make a guess.', 'start': 125.566, 'duration': 4.163}], 'summary': 'Machine learning aims to solve problems by creating models using techniques like gradient descent.', 'duration': 28.222, 'max_score': 101.507, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74101507.jpg'}], 'start': 1, 'title': 'Linear regression and neural networks', 'summary': 'Covers the concept of linear regression as a foundation for building a neural network-based machine learning system, including the implementation of linear regression using the ordinary least squares method and the introduction of gradient descent for creating a model to fit a given data set.', 'chapters': [{'end': 165.444, 'start': 1, 'title': 'Linear regression and neural networks', 'summary': 'Discusses the concept of linear regression as a foundation for building a neural network-based 
machine learning system, covering the implementation of linear regression using the ordinary least squares method and the introduction of gradient descent for creating a model to fit a given data set.', 'duration': 164.444, 'highlights': ['The chapter emphasizes the foundation for building a neural network-based machine learning system using linear regression and introduces the concept of gradient descent for creating a model to fit a given data set. This lays the foundation for future videos on building a neural network-based machine learning system and introduces the concept of gradient descent as a technique for creating a model to fit a given data set.', 'The implementation of linear regression using the ordinary least squares method and the discussion on whether linear regression makes sense based on the data are covered, providing insights into important questions in data science and machine learning. The chapter covers the implementation of linear regression using the ordinary least squares method and discusses the relevance of linear regression based on the data, addressing important questions in data science and machine learning.', 'The explanation of gradient descent as a technique that involves making iterative adjustments to the line to fit the data better is provided, highlighting its approach in creating a model to fit a given data set. The chapter explains gradient descent as a technique that involves making iterative adjustments to the line to fit the data better, demonstrating its role in creating a model to fit a given data set.']}], 'duration': 164.444, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab741000.jpg', 'highlights': ['The chapter emphasizes the foundation for building a neural network-based machine learning system using linear regression and introduces the concept of gradient descent for creating a model to fit a given data set.', 'The implementation of linear regression using the ordinary least squares method and the discussion on whether linear regression makes sense based on the data are covered, providing insights into important questions in data science and machine learning.', 'The explanation of gradient descent as a technique that involves making iterative adjustments to the line to fit the data better is provided, highlighting its approach in creating a model to fit a given data set.']}, {'end': 351.394, 'segs': [{'end': 274.505, 'src': 'embed', 'start': 165.724, 'weight': 0, 'content': [{'end': 172.386, 'text': "Okay, so, I'm going to make this same kind of diagram that I've made a few times now.", 'start': 165.724, 'duration': 6.662}, {'end': 175.727, 'text': "And we're going to really simplify it.", 'start': 173.266, 'duration': 2.461}, {'end': 180.948, 'text': 'So we have this idea of this two-dimensional space.', 'start': 176.247, 'duration': 4.701}, {'end': 189.963, 'text': "There is some, piece of data, we'll call it x, for example, the temperature that it is outside today.", 'start': 182.409, 'duration': 7.554}, {'end': 194.286, 'text': "And we're trying to predict some outcome maybe based on that temperature.", 'start': 190.704, 'duration': 3.582}, {'end': 200.529, 'text': "Yesterday we talked, we'll call that y, we talked about the sales of ice cream.", 'start': 195.046, 'duration': 5.483}, {'end': 206.133, 'text': 'I saw actually a data set that was interesting about like the frequency at which crickets chirp according to the temperature outside.', 'start': 200.629, 'duration': 5.504}, 
{'end': 208.934, 'text': "That's a data set you can find online somewhere and use.", 'start': 206.833, 'duration': 2.101}, {'end': 216.798, 'text': "So we have this idea and maybe there's some existing data points based on an ice cream store that we have studied.", 'start': 209.795, 'duration': 7.003}, {'end': 219.999, 'text': 'And I can graph that data.', 'start': 216.818, 'duration': 3.181}, {'end': 225.961, 'text': 'So the idea here is that we have our machine learning recipe.', 'start': 220.759, 'duration': 5.202}, {'end': 236.611, 'text': "We are going to take I know I'm out of the frame here we are going to take One of our inputs called x,", 'start': 228.222, 'duration': 8.389}, {'end': 243.919, 'text': 'feed it into the machine learning recipe and the machine learning recipe is going to give us a prediction y.', 'start': 236.611, 'duration': 7.308}, {'end': 251.196, 'text': 'So we have known data, and if we had new input data, we could make a guess.', 'start': 245.595, 'duration': 5.601}, {'end': 257.458, 'text': 'So yesterday my machine learning recipe was the ordinary least squares method,', 'start': 251.836, 'duration': 5.622}, {'end': 263.979, 'text': 'meaning I was able to do a statistical analysis of all of this data and create the line of best fit.', 'start': 257.458, 'duration': 6.521}, {'end': 273.384, 'text': 'And then if I had a new input, x value of such and such, I could look up its corresponding spot on the line, and that would be the y output.', 'start': 264.559, 'duration': 8.825}, {'end': 274.505, 'text': 'This is a function.', 'start': 273.564, 'duration': 0.941}], 'summary': 'Introduction to using machine learning to predict outcomes based on data points, using the example of temperature and ice cream sales.', 'duration': 108.781, 'max_score': 165.724, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74165724.jpg'}, {'end': 326.925, 'src': 'embed', 'start': 303.25, 'weight': 5, 'content': [{'end': 310.695, 'text': "So one thing I'll mention is that the maths required for gradient descent typically involve calculus,", 'start': 303.25, 'duration': 7.445}, {'end': 319.661, 'text': "And they involve two concepts from calculus one called a partial derivative, which, if you don't know calculus or what a derivative is, well,", 'start': 311.596, 'duration': 8.065}, {'end': 321.722, 'text': 'how can you be expected to know what a partial derivative is?', 'start': 319.661, 'duration': 2.061}, {'end': 326.925, 'text': 'As well as something called the chain rule.', 'start': 322.342, 'duration': 4.583}], 'summary': 'Maths for gradient descent involves calculus, including partial derivatives and the chain rule.', 'duration': 23.675, 'max_score': 303.25, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74303250.jpg'}], 'start': 165.724, 'title': 'Data analysis and machine learning', 'summary': 'Covers two-dimensional data analysis using temperature to predict ice cream sales and cricket chirping frequency, as well as the concept of gradient descent in machine learning, involving calculus and solving for m and b in the equation of a line.', 'chapters': [{'end': 216.798, 'start': 165.724, 'title': 'Two-dimensional data analysis', 'summary': 'Explores the concept of two-dimensional data analysis, using temperature as the independent variable to predict outcomes such as ice cream sales and cricket chirping frequency.', 'duration': 51.074, 'highlights': ['The chapter introduces 
the concept of using temperature as an independent variable to predict outcomes, such as ice cream sales and cricket chirping frequency.', 'The speaker mentions the availability of a data set related to the frequency at which crickets chirp according to the temperature outside, which can be used for analysis.', 'The chapter emphasizes the simplification of a two-dimensional space for data analysis, highlighting the process of predicting outcomes based on a single variable, such as temperature.']}, {'end': 351.394, 'start': 216.818, 'title': 'Machine learning: gradient descent', 'summary': 'Discusses the concept of gradient descent as a machine learning technique, involving calculus and the idea of solving for m and b in the equation of a line, as well as further explanation to be provided in a follow-up video.', 'duration': 134.576, 'highlights': ['The machine learning recipe involves taking an input x, feeding it into the recipe, and obtaining a prediction y, enabling making guesses for new input data.', 'The technique of gradient descent involves calculus, specifically partial derivatives and the chain rule, and will be further explained in a follow-up video.', 'The machine learning recipe utilizes the ordinary least squares method for statistical analysis and creating the line of best fit for data.']}], 'duration': 185.67, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74165724.jpg', 'highlights': ['The machine learning recipe involves taking an input x, feeding it into the recipe, and obtaining a prediction y, enabling making guesses for new input data.', 'The chapter introduces the concept of using temperature as an independent variable to predict outcomes, such as ice cream sales and cricket chirping frequency.', 'The speaker mentions the availability of a data set related to the frequency at which crickets chirp according to the temperature outside, which can be used for analysis.', 'The chapter emphasizes the simplification of a two-dimensional space for data analysis, highlighting the process of predicting outcomes based on a single variable, such as temperature.', 'The machine learning recipe utilizes the ordinary least squares method for statistical analysis and creating the line of best fit for data.', 'The technique of gradient descent involves calculus, specifically partial derivatives and the chain rule, and will be further explained in a follow-up video.']}, {'end': 708.478, 'segs': [{'end': 462.052, 'src': 'embed', 'start': 430.458, 'weight': 2, 'content': [{'end': 432.579, 'text': 'And this is what gradient descent does.', 'start': 430.458, 'duration': 2.121}, {'end': 437.142, 'text': 'You can think of desired as the known output.', 'start': 433.019, 'duration': 4.123}, {'end': 450.227, 'text': 'the correct output, what if I feed in one of these data points, right? And I say, look at this particular x, y pair.', 'start': 437.682, 'duration': 12.545}, {'end': 452.088, 'text': 'Let me feed it in.', 'start': 450.947, 'duration': 1.141}, {'end': 458.31, 'text': 'Let me try to get a guess, which is sometimes, I think, written as y tick, I think.', 'start': 452.488, 'duration': 5.822}, {'end': 462.052, 'text': "But I'm going to say, what if I get a y? 
I'm going to say y guess.", 'start': 458.35, 'duration': 3.702}], 'summary': 'Gradient descent computes y guess for given x, y pair', 'duration': 31.594, 'max_score': 430.458, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74430458.jpg'}, {'end': 566.535, 'src': 'embed', 'start': 537.387, 'weight': 0, 'content': [{'end': 541.991, 'text': "And this is, we've been talking about this, supervised learning.", 'start': 537.387, 'duration': 4.604}, {'end': 545.217, 'text': 'I can take the known data.', 'start': 543.015, 'duration': 2.202}, {'end': 547.239, 'text': 'send it in, get a guess.', 'start': 545.217, 'duration': 2.022}, {'end': 548.279, 'text': 'look at the error.', 'start': 547.239, 'duration': 1.04}, {'end': 549.42, 'text': 'tweak the knobs.', 'start': 548.279, 'duration': 1.141}, {'end': 551.582, 'text': 'send the next data point in, get a guess.', 'start': 549.42, 'duration': 2.162}, {'end': 552.243, 'text': 'look at the error.', 'start': 551.582, 'duration': 0.661}, {'end': 552.803, 'text': 'tweak the knobs.', 'start': 552.243, 'duration': 0.56}, {'end': 555.185, 'text': 'I can do this over and over and over again.', 'start': 552.863, 'duration': 2.322}, {'end': 559.329, 'text': 'And I can just start with random values for m and b.', 'start': 555.766, 'duration': 3.563}, {'end': 560.65, 'text': "So I don't know what it'd be.", 'start': 559.329, 'duration': 1.321}, {'end': 566.535, 'text': "I'm going to just put a line here, and then I can start moving the line around according to the error as I go through all the data.", 'start': 560.69, 'duration': 5.845}], 'summary': 'Supervised learning iterates to find optimal parameters for a line to fit data.', 'duration': 29.148, 'max_score': 537.387, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74537387.jpg'}, {'end': 642.62, 'src': 'heatmap', 'start': 601.961, 'weight': 0.854, 'content': [{'end': 609.187, 'text': "So what I'm going to do now is I am going to This.", 'start': 601.961, 'duration': 7.226}, {'end': 611.149, 'text': 'so I had this function, linear regression,', 'start': 609.187, 'duration': 1.962}, {'end': 621.46, 'text': 'and this linear regression function calculates the slope of the line and the y-intercept m and b according to ordinary least squares.', 'start': 611.149, 'duration': 10.311}, {'end': 627.327, 'text': "so what I'm going to do is I'm just going to completely get rid of this.", 'start': 621.46, 'duration': 5.867}, {'end': 630.01, 'text': 'so now nothing happens there.', 'start': 627.327, 'duration': 2.683}, {'end': 637.159, 'text': 'So I can click and the first guess of the line I just plugged in some values.', 'start': 632.017, 'duration': 5.142}, {'end': 642.62, 'text': "Now, typically speaking, I think what's probably typically done is these values are initialized at zero.", 'start': 637.179, 'duration': 5.441}], 'summary': 'Linear regression function calculates slope and y-intercept using least squares.', 'duration': 40.659, 'max_score': 601.961, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74601961.jpg'}, {'end': 708.478, 'src': 'heatmap', 'start': 683.263, 'weight': 0.725, 'content': [{'end': 689.328, 'text': 'And so in the draw function, I think I want to call this now gradient descent.', 'start': 683.263, 'duration': 6.065}, {'end': 693.734, 'text': "Oh, I'm back.", 'start': 693.114, 'duration': 0.62}, {'end': 695.895, 'text': 'I had a 
little digression there that I had to edit out.', 'start': 693.754, 'duration': 2.141}, {'end': 699.336, 'text': 'Thanks for, ah, thanks for tuning in.', 'start': 696.655, 'duration': 2.681}, {'end': 703.297, 'text': "Okay, so where I am is that I'm changing the name of the function to gradient descent.", 'start': 699.636, 'duration': 3.661}, {'end': 708.478, 'text': "And what I want to do is I'm going to just look through all of the data.", 'start': 703.757, 'duration': 4.721}], 'summary': 'Renaming function to gradient descent and analyzing all data.', 'duration': 25.215, 'max_score': 683.263, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74683263.jpg'}], 'start': 351.394, 'title': 'Gradient descent in machine learning', 'summary': 'Discusses the concept of gradient descent, explaining its use in minimizing errors in machine learning models by adjusting parameters, emphasizing its application in improving supervised learning, demonstrated through the adjustment of m and b values for a linear regression model.', 'chapters': [{'end': 708.478, 'start': 351.394, 'title': 'Gradient descent in machine learning', 'summary': 'Discusses the concept of gradient descent, explaining how it is used to minimize errors in machine learning models by adjusting parameters, with a focus on how it can be applied to improve supervised learning, illustrated through the example of adjusting m and b values for a linear regression model.', 'duration': 357.084, 'highlights': ['Gradient descent is used to minimize errors in machine learning models by adjusting parameters. It explains the concept of gradient descent and its role in minimizing errors in machine learning models by adjusting parameters.', 'The example of adjusting m and b values for a linear regression model illustrates the application of gradient descent in supervised learning. The example demonstrates how gradient descent can be used to improve supervised learning by adjusting m and b values for a linear regression model.', 'Starting with random values for m and b, the process involves sending in data points, getting a guess, evaluating the error, and tweaking the parameters, iterating over the process to improve the model. 
The iterative process of sending in data points, evaluating the error, and tweaking parameters, starting with random m and b values, is highlighted as a key aspect of applying gradient descent to improve the model.']}], 'duration': 357.084, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74351394.jpg', 'highlights': ['The iterative process of sending in data points, evaluating the error, and tweaking parameters, starting with random m and b values, is highlighted as a key aspect of applying gradient descent to improve the model.', 'The example demonstrates how gradient descent can be used to improve supervised learning by adjusting m and b values for a linear regression model.', 'It explains the concept of gradient descent and its role in minimizing errors in machine learning models by adjusting parameters.']}, {'end': 933.885, 'segs': [{'end': 807.522, 'src': 'embed', 'start': 710.038, 'weight': 0, 'content': [{'end': 715.379, 'text': "So let's just first look through all the data.", 'start': 710.038, 'duration': 5.341}, {'end': 724.153, 'text': 'And, okay, so For each data set, I have the y is data index i dot y.', 'start': 715.66, 'duration': 8.493}, {'end': 728.294, 'text': 'So we can get the x and the y.', 'start': 724.153, 'duration': 4.141}, {'end': 733.375, 'text': 'Then, I can actually calculate a guess.', 'start': 728.294, 'duration': 5.081}, {'end': 738.617, 'text': 'So my guess is m times x plus b.', 'start': 734.016, 'duration': 4.601}, {'end': 740.397, 'text': 'This is my machine learning recipe.', 'start': 738.617, 'duration': 1.78}, {'end': 746.579, 'text': 'I am taking the input data x, I am multiplying it by m, I am adding b, and that is my guess.', 'start': 740.957, 'duration': 5.622}, {'end': 756.542, 'text': 'So now, My error equals y minus the guess.', 'start': 747.339, 'duration': 9.203}, {'end': 759.946, 'text': 'And I think, technically speaking, I think I should be saying guess minus y.', 'start': 756.942, 'duration': 3.004}, {'end': 766.348, 'text': 'You may recall that in the ordinary least squares method,', 'start': 762.685, 'duration': 3.663}, {'end': 771.671, 'text': 'I would always square the error because I want to get rid of the positive or negative aspect of it.', 'start': 766.348, 'duration': 5.323}, {'end': 779.256, 'text': "In this case and again I'm going to go a little further into this in the next video I actually want the positive or negative direction of the error,", 'start': 771.991, 'duration': 7.265}, {'end': 785.86, 'text': 'because I want to know which way, in essence, to tune the m and b values to get a better result.', 'start': 779.256, 'duration': 6.604}, {'end': 792.73, 'text': "So the issue here is now, and this is what's known as stochastic gradient descent.", 'start': 787.521, 'duration': 5.209}, {'end': 800.282, 'text': "So I want to make, for every single data point that's available, I want to make a change to m and b.", 'start': 792.95, 'duration': 7.332}, {'end': 807.522, 'text': 'So I need to calculate how should I change m and how should I change b.', 'start': 802.697, 'duration': 4.825}], 'summary': 'Using machine learning, the speaker discusses calculating a guess and using stochastic gradient descent to tune m and b values for better results.', 'duration': 97.484, 'max_score': 710.038, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74710038.jpg'}, {'end': 854.532, 'src': 'embed', 'start': 829.319, 'weight': 4, 
'content': [{'end': 836.281, 'text': 'So in essence, we could say, if I adjust those values according to the error, maybe if I tried it again, I would get a better result.', 'start': 829.319, 'duration': 6.962}, {'end': 841.923, 'text': "And in this case, b can be adjusted directly by the error, because it's just the y-intercept.", 'start': 836.542, 'duration': 5.381}, {'end': 851.37, 'text': 'Should I move it up or down? And m, which is the slope, can be adjusted by the error, but according to also the input value itself.', 'start': 841.963, 'duration': 9.407}, {'end': 854.532, 'text': 'So this is how you can kind of intuitively understand it.', 'start': 852.21, 'duration': 2.322}], 'summary': 'Adjusting values based on error improves results in linear regression.', 'duration': 25.213, 'max_score': 829.319, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74829319.jpg'}, {'end': 917.641, 'src': 'heatmap', 'start': 862.357, 'weight': 5, 'content': [{'end': 868.802, 'text': "Now, so I'm missing a whole bunch of steps and a few pieces of explanation here, but let's just run this and see what happens.", 'start': 862.357, 'duration': 6.445}, {'end': 875.032, 'text': 'So first I always have to click, okay, first of all, I got an error.', 'start': 872.23, 'duration': 2.802}, {'end': 878.775, 'text': 'Uncaught reference error, n is not defined in gradient descent.', 'start': 875.733, 'duration': 3.042}, {'end': 882.038, 'text': 'Where did I have n? Oh, b equals b plus error.', 'start': 879.075, 'duration': 2.963}, {'end': 883.619, 'text': "Yeah, I don't know what n is.", 'start': 882.138, 'duration': 1.481}, {'end': 887.362, 'text': "So you can see like, okay, well, I don't know where that line went.", 'start': 884.92, 'duration': 2.442}, {'end': 890.625, 'text': 'It was there for a second and then it just went far away.', 'start': 887.442, 'duration': 3.183}, {'end': 891.425, 'text': "So here's the thing.", 'start': 890.665, 'duration': 0.76}, {'end': 902.483, 'text': 'If I come back to my analogy from the steering, One of the things in the steering behavior examples from Nature, of Code,', 'start': 892.286, 'duration': 10.197}, {'end': 906.307, 'text': "Craig Reynolds' examples is that there was a variable called maximum force.", 'start': 902.483, 'duration': 3.824}, {'end': 910.499, 'text': "I don't know if you can see that, maximum force.", 'start': 908.959, 'duration': 1.54}, {'end': 917.641, 'text': "Because one thing you might think about it here is, well, how powerful? 
I know what the error is between the way I'm going and where I want to go.", 'start': 911, 'duration': 6.641}], 'summary': "Debugging a coding error in gradient descent with undefined variable 'n'.", 'duration': 25.355, 'max_score': 862.357, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74862357.jpg'}], 'start': 710.038, 'title': 'Linear regression and stochastic gradient descent', 'summary': 'Discusses the process of calculating linear regression and the importance of considering error direction, as well as introduces stochastic gradient descent in linear regression to improve results in machine learning.', 'chapters': [{'end': 779.256, 'start': 710.038, 'title': 'Linear regression and error calculation', 'summary': 'Discusses the process of calculating a linear regression guess using the y = mx + b formula and emphasizes the importance of considering the positive or negative direction of the error in the context of machine learning.', 'duration': 69.218, 'highlights': ['The process of calculating a linear regression guess involves using the formula y = mx + b, where m is the slope and b is the y-intercept.', 'Emphasizes the importance of considering the positive or negative direction of the error in the context of machine learning.', 'In the ordinary least squares method, the error is squared to eliminate the positive or negative aspect of it.']}, {'end': 933.885, 'start': 779.256, 'title': 'Stochastic gradient descent in linear regression', 'summary': 'Introduces the concept of stochastic gradient descent in linear regression, exploring the adjustments of slope (m) and y-intercept (b) values according to errors and input values to improve results in machine learning.', 'duration': 154.629, 'highlights': ['The concept of stochastic gradient descent is introduced, where adjustments of m and b values are made for every data point to minimize error and improve results. The algorithm aims to make changes to m and b for every single data point, calculating adjustments based on the error to potentially achieve better results.', 'The adjustments of b (y-intercept) and m (slope) are discussed, with b being adjusted directly by the error and m being adjusted according to the input value and error. The y-intercept (b) can be adjusted directly by the error, while the slope (m) is adjusted based on both the error and the input value.', "The analogy of 'maximum force' from steering behavior is used to illustrate the concept of adjusting the power of turning in relation to the error, emphasizing the potential consequences of excessive turning power. 
The analogy of 'maximum force' is employed to demonstrate the importance of considering the power of turning in relation to the error, highlighting the potential issues of excessive turning power."]}], 'duration': 223.847, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74710038.jpg', 'highlights': ['The process of calculating a linear regression guess involves using the formula y = mx + b, where m is the slope and b is the y-intercept.', 'In the ordinary least squares method, the error is squared to eliminate the positive or negative aspect of it.', 'Emphasizes the importance of considering the positive or negative direction of the error in the context of machine learning.', 'The concept of stochastic gradient descent is introduced, where adjustments of m and b values are made for every data point to minimize error and improve results.', 'The adjustments of b (y-intercept) and m (slope) are discussed, with b being adjusted directly by the error and m being adjusted according to the input value and error.', "The analogy of 'maximum force' from steering behavior is used to illustrate the concept of adjusting the power of turning in relation to the error, emphasizing the potential consequences of excessive turning power."]}, {'end': 1279.257, 'segs': [{'end': 962.754, 'src': 'embed', 'start': 934.205, 'weight': 0, 'content': [{'end': 938.872, 'text': "Maybe I just only want to be able to make little adjustments because it's the wrong way.", 'start': 934.205, 'duration': 4.667}, {'end': 940.614, 'text': 'I want to just make a slight adjustment.', 'start': 938.932, 'duration': 1.682}, {'end': 942.437, 'text': "I don't want to overshoot the target.", 'start': 940.634, 'duration': 1.803}, {'end': 945.901, 'text': 'This target being I want to find the parameters.', 'start': 942.697, 'duration': 3.204}, {'end': 950.267, 'text': 'I want to find the weights, the m and b values to minimize the error.', 'start': 945.941, 'duration': 4.326}, {'end': 956.851, 'text': "So I don't want to overshoot what that optimal value is.", 'start': 951.028, 'duration': 5.823}, {'end': 962.754, 'text': 'And so that is where a variable, sometimes called alpha, but most commonly called learning rate comes in.', 'start': 957.111, 'duration': 5.643}], 'summary': 'Desire to make precise adjustments in finding optimal parameters with a focus on minimizing error using a learning rate.', 'duration': 28.549, 'max_score': 934.205, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74934205.jpg'}, {'end': 1008.59, 'src': 'heatmap', 'start': 979.147, 'weight': 0.926, 'content': [{'end': 983.908, 'text': 'And let me change, take this for b and multiply it by the learning rate.', 'start': 979.147, 'duration': 4.761}, {'end': 987.929, 'text': "So now I'm going to try this again with a learning rate of 0.001.", 'start': 985.348, 'duration': 2.581}, {'end': 994.831, 'text': "Hey, that doesn't look right.", 'start': 987.929, 'duration': 6.902}, {'end': 996.791, 'text': 'Come back to me.', 'start': 995.911, 'duration': 0.88}, {'end': 999.632, 'text': "So let's think about what might be wrong.", 'start': 996.811, 'duration': 2.821}, {'end': 1004.253, 'text': 'Over here, I wrote guess minus y.', 'start': 1000.712, 'duration': 3.541}, {'end': 1008.59, 'text': "And that's really what I, oh, that's what I wrote here.", 'start': 1006.909, 'duration': 1.681}], 'summary': 'Adjusting learning rate to 0.001 resulted in unexpected outcome, 
prompting re-evaluation of calculation.', 'duration': 29.443, 'max_score': 979.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74979147.jpg'}, {'end': 1067.268, 'src': 'heatmap', 'start': 1037.028, 'weight': 1, 'content': [{'end': 1039.631, 'text': 'Let me change that to, oh, I changed it already.', 'start': 1037.028, 'duration': 2.603}, {'end': 1042.835, 'text': 'Wait, how did I do that? Okay, I must have done that before that I went to explain it.', 'start': 1039.711, 'duration': 3.124}, {'end': 1044.076, 'text': "So let's try this.", 'start': 1043.315, 'duration': 0.761}, {'end': 1052.185, 'text': "Looks pretty good, right? Now, here's the thing.", 'start': 1049.883, 'duration': 2.302}, {'end': 1054.048, 'text': 'Let me put m and b back to zero.', 'start': 1052.606, 'duration': 1.442}, {'end': 1057.443, 'text': 'Hit refresh here.', 'start': 1056.522, 'duration': 0.921}, {'end': 1059.384, 'text': "And so let's see.", 'start': 1058.523, 'duration': 0.861}, {'end': 1067.268, 'text': "So we can see, interestingly enough, this isn't the correct line because the line should really go through those two points.", 'start': 1060.764, 'duration': 6.504}], 'summary': 'The speaker demonstrates a process of changing and adjusting a line, aiming for it to go through two specific points.', 'duration': 30.24, 'max_score': 1037.028, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab741037028.jpg'}, {'end': 1193.793, 'src': 'embed', 'start': 1166.709, 'weight': 1, 'content': [{'end': 1173.457, 'text': "And part of what I'm doing again is not to demonstrate the optimal way to do linear regression, but to demonstrate the technique,", 'start': 1166.709, 'duration': 6.748}, {'end': 1182.888, 'text': 'known as gradient descent, of making small adjustments to weights, to parameters, to the slope and y-intercept, based on an error,', 'start': 1173.457, 'duration': 9.431}, {'end': 1184.47, 'text': 'based on the supervised learning process.', 'start': 1182.888, 'duration': 1.582}, {'end': 1186.35, 'text': 'So this is a start to that.', 'start': 1184.79, 'duration': 1.56}, {'end': 1188.111, 'text': 'You could stop here.', 'start': 1187.331, 'duration': 0.78}, {'end': 1193.793, 'text': "And I highly recommend that you do, because what I'm going to do in the next video, I don't really know how that's going to go, to be honest.", 'start': 1188.131, 'duration': 5.662}], 'summary': 'Demonstrating gradient descent technique for making small adjustments to weights and parameters in supervised learning.', 'duration': 27.084, 'max_score': 1166.709, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab741166709.jpg'}], 'start': 934.205, 'title': 'Adjusting learning rate and gradient descent', 'summary': 'Discusses the importance of learning rate in minimizing error while finding parameters and weights, explores gradient descent and learning rate for line fitting, and explains gradient descent in linear regression, emphasizing the impact of small adjustments and potential use of batch gradient descent.', 'chapters': [{'end': 1039.631, 'start': 934.205, 'title': 'Adjusting learning rate for minimizing error', 'summary': 'Discusses the importance of learning rate in minimizing error while finding parameters and weights, and making small adjustments to avoid overshooting the optimal value.', 'duration': 105.426, 'highlights': ['The learning rate, typically a small 
number, is used to reduce the size of the error, ensuring that the optimal value is not overshot.', 'The process involves finding parameters and weights to minimize the error, emphasizing the significance of making slight adjustments without overshooting the target.', "The importance of correcting the order of subtraction to ensure accurate calculations and adjustments, such as changing 'guess minus y' to 'y minus guess'."]}, {'end': 1131.491, 'start': 1039.711, 'title': 'Gradient descent and learning rate', 'summary': 'Explores gradient descent and learning rate for line fitting using two points and demonstrates the impact of adjusting the learning rate, ultimately achieving the line of best fit.', 'duration': 91.78, 'highlights': ['The chapter demonstrates the impact of adjusting the learning rate, eventually achieving the line of best fit with a learning rate of 0.05.', 'The speaker notes the challenge of making very small changes with only two points and suggests the need for a higher learning rate for the demonstration.', 'The speaker discusses the strategy of annealing in machine learning systems, emphasizing the technique of starting with a high learning rate and then slowly adjusting it over time.']}, {'end': 1279.257, 'start': 1131.491, 'title': 'Gradient descent in linear regression', 'summary': 'Explains the concept of gradient descent in linear regression, emphasizing the importance of making small adjustments to weights based on error and demonstrating the technique through adjusting m and b values to minimize error, while also mentioning the potential use of batch gradient descent.', 'duration': 147.766, 'highlights': ['The importance of making small adjustments to weights in linear regression is emphasized, with a focus on minimizing error. Emphasizes the significance of making small adjustments to weights to minimize error in linear regression.', 'The concept of gradient descent is demonstrated through adjusting m and b values to minimize error in the supervised learning process. Demonstrates the technique of gradient descent through adjusting m and b values to minimize error in the supervised learning process.', 'Mentions the potential use of batch gradient descent, adjusting the weights all at once at the end of one cycle through all of the data. 
Introduces the concept of batch gradient descent, adjusting the weights at the end of one cycle through all of the data.']}], 'duration': 345.052, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/L-Lsfu4ab74/pics/L-Lsfu4ab74934205.jpg', 'highlights': ['The process involves finding parameters and weights to minimize the error, emphasizing the significance of making slight adjustments without overshooting the target.', 'The importance of learning rate in minimizing error while finding parameters and weights, explores gradient descent and learning rate for line fitting.', 'The importance of making small adjustments to weights in linear regression is emphasized, with a focus on minimizing error.']}], 'highlights': ['The iterative process of sending in data points, evaluating the error, and tweaking parameters, starting with random m and b values, is highlighted as a key aspect of applying gradient descent to improve the model.', 'The example demonstrates how gradient descent can be used to improve supervised learning by adjusting m and b values for a linear regression model.', 'The process involves finding parameters and weights to minimize the error, emphasizing the significance of making slight adjustments without overshooting the target.', 'The importance of learning rate in minimizing error while finding parameters and weights, explores gradient descent and learning rate for line fitting.', 'The importance of making small adjustments to weights in linear regression is emphasized, with a focus on minimizing error.', 'The concept of stochastic gradient descent is introduced, where adjustments of m and b values are made for every data point to minimize error and improve results.', 'The adjustments of b (y-intercept) and m (slope) are discussed, with b being adjusted directly by the error and m being adjusted according to the input value and error.', "The analogy of 'maximum force' from steering behavior is used to illustrate the concept of adjusting the power of turning in relation to the error, emphasizing the potential consequences of excessive turning power.", 'The process of calculating a linear regression guess involves using the formula y = mx + b, where m is the slope and b is the y-intercept.', 'In the ordinary least squares method, the error is squared to eliminate the positive or negative aspect of it.', 'Emphasizes the importance of considering the positive or negative direction of the error in the context of machine learning.', 'The chapter emphasizes the foundation for building a neural network-based machine learning system using linear regression and introduces the concept of gradient descent for creating a model to fit a given data set.', 'The implementation of linear regression using the ordinary least squares method and the discussion on whether linear regression makes sense based on the data are covered, providing insights into important questions in data science and machine learning.', 'The explanation of gradient descent as a technique that involves making iterative adjustments to the line to fit the data better is provided, highlighting its approach in creating a model to fit a given data set.', 'The machine learning recipe involves taking an input x, feeding it into the recipe, and obtaining a prediction y, enabling making guesses for new input data.', 'The chapter introduces the concept of using temperature as an independent variable to predict outcomes, such as ice cream sales and cricket chirping frequency.', 'The speaker mentions the 
availability of a data set related to the frequency at which crickets chirp according to the temperature outside, which can be used for analysis.', 'The chapter emphasizes the simplification of a two-dimensional space for data analysis, highlighting the process of predicting outcomes based on a single variable, such as temperature.', 'The machine learning recipe utilizes the ordinary least squares method for statistical analysis and creating the line of best fit for data.', 'The technique of gradient descent involves calculus, specifically partial derivatives and the chain rule, and will be further explained in a follow-up video.']}
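For readers who want the technique in code form: below is a minimal p5.js sketch of the stochastic gradient descent loop described in the transcript above. The update rules (guess = m * x + b, a signed error of y minus guess, b nudged directly by the error, m nudged by the error times the input) and the 0.05 learning rate come from the video; normalizing clicked points into the 0..1 range is an assumption about the sketch's setup, and the variable names are illustrative.

// A minimal sketch, assuming clicked points are normalized to 0..1 so m and b stay small.
let data = [];
let m = 0; // slope, initialized at zero as discussed in the video
let b = 0; // y-intercept
const learningRate = 0.05; // the value that ends up working in the demo

function setup() {
  createCanvas(400, 400);
}

function mousePressed() {
  // Store each clicked point normalized to 0..1, flipping y so "up" is larger.
  const x = mouseX / width;
  const y = map(mouseY, 0, height, 1, 0);
  data.push(createVector(x, y));
}

function gradientDescent() {
  // Stochastic: tweak m and b once for every single data point.
  for (let i = 0; i < data.length; i++) {
    const x = data[i].x;
    const y = data[i].y;
    const guess = m * x + b; // the machine learning recipe: y = mx + b
    const error = y - guess; // signed, so it says which way to tune the knobs
    m = m + error * x * learningRate; // slope changes with both error and input
    b = b + error * learningRate;     // intercept is adjusted directly by the error
  }
}

function drawLine() {
  // Map the normalized endpoints of y = mx + b back to pixel space and draw.
  const y1 = m * 0 + b;
  const y2 = m * 1 + b;
  stroke(255);
  line(0, map(y1, 0, 1, height, 0), width, map(y2, 0, 1, height, 0));
}

function draw() {
  background(51);
  noStroke();
  fill(255);
  for (const pt of data) {
    ellipse(pt.x * width, map(pt.y, 0, 1, height, 0), 8, 8);
  }
  if (data.length > 0) {
    gradientDescent();
  }
  drawLine();
}

Note the sign convention: with error computed as y minus guess, the additive updates move the line toward the points; computing guess minus y instead, as happens at one point in the video, flips the direction and the line runs away.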
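For contrast, the approach from the previous videos that the transcript keeps referring back to, fitting the line with the ordinary least squares method, is a closed-form calculation rather than an iterative one. A sketch of that statistical version, reusing the same data, m, and b; the function name linearRegression matches what the video calls it, but the body here is just the standard OLS formula, not code taken from the video.

function linearRegression() {
  // Means of x and y over all points (needs at least two distinct x values).
  let xSum = 0;
  let ySum = 0;
  for (const pt of data) {
    xSum += pt.x;
    ySum += pt.y;
  }
  const xMean = xSum / data.length;
  const yMean = ySum / data.length;

  // Slope = sum of (x - xMean)(y - yMean) divided by sum of (x - xMean)^2.
  let numerator = 0;
  let denominator = 0;
  for (const pt of data) {
    numerator += (pt.x - xMean) * (pt.y - yMean);
    denominator += (pt.x - xMean) * (pt.x - xMean);
  }
  m = numerator / denominator;
  b = yMean - m * xMean; // the best-fit line passes through the point of means
}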
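The learning-rate discussion also mentions annealing: start with a comparatively large rate so early steps are big, then shrink it over time so later steps are fine adjustments that don't overshoot the optimal m and b. The video doesn't give a formula, so the 1/t-style decay below is purely an illustrative assumption.

// Hypothetical decay schedule: big steps early, fine adjustments later.
const initialRate = 0.1;
const decay = 0.01;
let step = 0;

function annealedLearningRate() {
  step += 1;
  return initialRate / (1 + decay * step);
}

// Inside gradientDescent(), compute it once per update and use it in place of
// the fixed learningRate constant:
//   const lr = annealedLearningRate();
//   m = m + error * x * lr;
//   b = b + error * lr;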
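Finally, the closing chapter mentions batch gradient descent, adjusting the weights all at once at the end of one cycle through the data rather than per point. A sketch of that variant, again reusing data, m, b, and learningRate from the first sketch; averaging the accumulated gradient is a common convention, not something the video specifies.

function batchGradientDescent() {
  let mGradient = 0;
  let bGradient = 0;
  for (const pt of data) {
    const error = pt.y - (m * pt.x + b);
    mGradient += error * pt.x; // accumulate instead of applying immediately
    bGradient += error;
  }
  // One combined adjustment at the end of the full pass (call only when data is non-empty).
  m = m + (mGradient / data.length) * learningRate;
  b = b + (bGradient / data.length) * learningRate;
}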