title
3.5: Mathematics of Gradient Descent - Intelligence and Learning
description
In this video, I explain the mathematics behind Linear Regression with Gradient Descent, which was the topic of my previous machine learning video (https://youtu.be/L-Lsfu4ab74)
This video is part of session 3 of my Spring 2017 ITP "Intelligence and Learning" course (https://github.com/shiffman/NOC-S17-2-Intelligence-Learning/tree/master/week3-classification-regression)
3Blue1Brown's Essence of Calculus: https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr
My videos on calculus:
Power Rule: https://youtu.be/IKb_3FJtA1U
Chain Rule: https://youtu.be/cE6wr0_ad8Y
Partial Derivative: https://youtu.be/-WVBXXV81R4
Support this channel on Patreon: https://patreon.com/codingtrain
To buy Coding Train merchandise: https://www.designbyhumans.com/shop/codingtrain/
Donate to the Processing Foundation: https://processingfoundation.org/
Send me your questions and coding challenges!: https://github.com/CodingTrain/Rainbow-Topics
Contact:
Twitter: https://twitter.com/shiffman
The Coding Train website: http://thecodingtrain.com/
Links discussed in this video:
Session 3 of Intelligence and Learning: https://github.com/shiffman/NOC-S17-2-Intelligence-Learning/tree/master/week3-classification-regression
3Blue1Brown's Essence of Calculus: https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr
Source Code for all Video Lessons: https://github.com/CodingTrain/Rainbow-Code
p5.js: https://p5js.org/
Processing: https://processing.org
For More Coding Challenges: https://www.youtube.com/playlist?list=PLRqwX-V7Uu6ZiZxtDDRCi6uhfTH4FilpH
For More Intelligence and Learning: https://www.youtube.com/playlist?list=PLRqwX-V7Uu6YJ3XfHhT2Mm4Y5I99nrIKX
📄 Code of Conduct: https://github.com/CodingTrain/Code-of-Conduct
detail
{'title': '3.5: Mathematics of Gradient Descent - Intelligence and Learning', 'heatmap': [{'end': 614.471, 'start': 551.331, 'weight': 0.829}, {'end': 1098.42, 'start': 1000.418, 'weight': 0.877}, {'end': 1262.737, 'start': 1232.096, 'weight': 0.795}], 'summary': 'Series covers foundational calculus concepts for machine learning, including linear regression, optimizing cost functions, calculus, error computation, and the application of chain rule in a neural network model.', 'chapters': [{'end': 105.241, 'segs': [{'end': 42.16, 'src': 'embed', 'start': 16.777, 'weight': 0, 'content': [{'end': 22.119, 'text': "And so this video, if you're watching it, this is a follow up to my linear regression with gradient descent video.", 'start': 16.777, 'duration': 5.342}, {'end': 24.501, 'text': 'That video stands alone.', 'start': 22.66, 'duration': 1.841}, {'end': 32.47, 'text': "It's a programming video where all I do is walk through the code for how to build an example that demonstrates linear regression with gradient descent.", 'start': 24.561, 'duration': 7.909}, {'end': 38.817, 'text': 'And this is a puzzle piece in my machine learning series that will hopefully act as a foundation and a building block to your understanding of,', 'start': 32.509, 'duration': 6.308}, {'end': 42.16, 'text': 'Hopefully, some more creative or practical examples that will come later.', 'start': 39.318, 'duration': 2.842}], 'summary': 'Follow-up video on linear regression with gradient descent, part of a machine learning series.', 'duration': 25.383, 'max_score': 16.777, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM16777.jpg'}, {'end': 85.425, 'src': 'embed', 'start': 48.504, 'weight': 1, 'content': [{'end': 51.386, 'text': 'But what I tried to do in this video is give some background.', 'start': 48.504, 'duration': 2.882}, {'end': 52.867, 'text': 'And I kind of worked it all out here.', 'start': 51.426, 'duration': 1.441}, {'end': 53.508, 'text': 'This is the end.', 'start': 52.927, 'duration': 0.581}, {'end': 54.648, 'text': "This is what's on the whiteboard.", 'start': 53.528, 'duration': 1.12}, {'end': 58.071, 'text': 'I thought somehow if I used multiple colored markers, it would somehow make a better video.', 'start': 55.049, 'duration': 3.022}, {'end': 59.372, 'text': "I don't think I really succeeded at that.", 'start': 58.091, 'duration': 1.281}, {'end': 62.833, 'text': 'So I kind of walked through and tried to describe the math.', 'start': 60.712, 'duration': 2.121}, {'end': 68.075, 'text': 'I should say that this involves topics from calculus,', 'start': 62.913, 'duration': 5.162}, {'end': 74.898, 'text': "and there's a great video series by 3Blue1Brown on YouTube that gives you great background and more depth in calculus.", 'start': 68.075, 'duration': 6.823}, {'end': 78.42, 'text': "So I'll put links to those videos in this video's description.", 'start': 74.918, 'duration': 3.502}, {'end': 82.363, 'text': "Honestly, if you're really interested in kind of soaking up as much as this as you can,", 'start': 78.76, 'duration': 3.603}, {'end': 85.425, 'text': 'I would go and watch those videos first and then come back here.', 'start': 82.363, 'duration': 3.062}], 'summary': 'Video provides background on calculus, recommends 3blue1brown video series on youtube for in-depth understanding.', 'duration': 36.921, 'max_score': 48.504, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM48504.jpg'}], 
'start': 0.329, 'title': 'Linear regression with gradient descent', 'summary': 'Provides a foundational understanding of calculus concepts and prepares for future practical examples in the machine learning series.', 'chapters': [{'end': 105.241, 'start': 0.329, 'title': 'Linear regression with gradient descent', 'summary': 'Provides a follow-up to the linear regression with gradient descent video, offering a foundational understanding of calculus concepts and preparation for future practical examples in the machine learning series.', 'duration': 104.912, 'highlights': ['The video serves as a follow-up to the linear regression with gradient descent video, providing foundational understanding for future practical examples in the machine learning series.', "The presenter attempts to explain the math behind the topic, albeit with limited success in the video's presentation.", 'Viewers are encouraged to watch a 3Blue1Brown video series on calculus for a deeper understanding of the discussed concepts.']}], 'duration': 104.912, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM329.jpg', 'highlights': ['The video serves as a follow-up to the linear regression with gradient descent video, providing foundational understanding for future practical examples in the machine learning series.', 'Viewers are encouraged to watch a 3Blue1Brown video series on calculus for a deeper understanding of the discussed concepts.', "The presenter attempts to explain the math behind the topic, albeit with limited success in the video's presentation."]}, {'end': 447.306, 'segs': [{'end': 260.014, 'src': 'embed', 'start': 151.018, 'weight': 0, 'content': [{'end': 153.64, 'text': 'My machine learning system makes a guess.', 'start': 151.018, 'duration': 2.622}, {'end': 159.443, 'text': 'The error is the difference between those two things.', 'start': 154.88, 'duration': 4.563}, {'end': 167.348, 'text': 'The error is y, the correct answer, minus the guess.', 'start': 160.003, 'duration': 7.345}, {'end': 177.08, 'text': 'So this relates to the idea of a cost function, a loss function.', 'start': 168.717, 'duration': 8.363}, {'end': 185.063, 'text': 'So if we want to evaluate how is our machine learning algorithm performing, we have this large data set.', 'start': 177.12, 'duration': 7.943}, {'end': 186.643, 'text': 'Maybe it has n elements.', 'start': 185.103, 'duration': 1.54}, {'end': 199.343, 'text': 'So what we want to do is from 1 to n for all n elements, we want to minimize that error.', 'start': 187.623, 'duration': 11.72}, {'end': 216.095, 'text': 'So the cost function, cost equals the sum of y sub i, every known answer, minus the guess sub i squared.', 'start': 200.304, 'duration': 15.791}, {'end': 218.697, 'text': 'So this is the formula.', 'start': 216.875, 'duration': 1.822}, {'end': 221.178, 'text': 'This is known as a cost function.', 'start': 218.757, 'duration': 2.421}, {'end': 223.26, 'text': 'This is the total error.', 'start': 221.619, 'duration': 1.641}, {'end': 234.386, 'text': 'for the particular model being the current m and b values that describe this particular line, this is the error.', 'start': 224.421, 'duration': 9.965}, {'end': 246.256, 'text': 'So perhaps we can agree that our goal is to minimize this cost also known as maybe a loss.', 'start': 235.047, 'duration': 11.209}, {'end': 248.7, 'text': 'We want to minimize that loss.', 'start': 246.997, 'duration': 1.703}, {'end': 250.842, 'text': 'We want to have the lowest error.', 'start': 
249.02, 'duration': 1.822}, {'end': 254.287, 'text': 'We want the m and b values for the lowest error.', 'start': 250.902, 'duration': 3.385}, {'end': 257.25, 'text': 'So we want to minimize this function.', 'start': 254.908, 'duration': 2.342}, {'end': 260.014, 'text': 'Now, what does it mean to minimize a function?', 'start': 257.331, 'duration': 2.683}], 'summary': 'The machine learning system aims to minimize error using a cost function and optimize m and b values for the lowest error.', 'duration': 108.996, 'max_score': 151.018, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM151018.jpg'}, {'end': 366.274, 'src': 'embed', 'start': 315.502, 'weight': 3, 'content': [{'end': 323.133, 'text': 'Well, what it means to minimize a function is to actually find the x value that produces the lowest y.', 'start': 315.502, 'duration': 7.631}, {'end': 328.579, 'text': 'This is like the easiest thing in the world that we could ever possibly do right now.', 'start': 324.755, 'duration': 3.824}, {'end': 334.145, 'text': "You don't need any calculus, fancy math, or anything to minimize this function.", 'start': 329.34, 'duration': 4.805}, {'end': 334.785, 'text': 'There it is.', 'start': 334.405, 'duration': 0.38}, {'end': 335.346, 'text': "It's at the bottom.", 'start': 334.805, 'duration': 0.541}, {'end': 336.407, 'text': "It's the lowest point.", 'start': 335.746, 'duration': 0.661}, {'end': 338.729, 'text': '0 There it is.', 'start': 336.427, 'duration': 2.302}, {'end': 339.931, 'text': 'I can see it.', 'start': 338.97, 'duration': 0.961}, {'end': 341.512, 'text': "It's quite obvious.", 'start': 340.651, 'duration': 0.861}, {'end': 354.67, 'text': "So this is the thing eventually we're going to in the machine learning systems that I'm going to get further into neural network based systems with many dimensions of data.", 'start': 343.386, 'duration': 11.284}, {'end': 361.092, 'text': "you know there might be some much more hard to describe crazy function that we're trying to approximate.", 'start': 354.67, 'duration': 6.422}, {'end': 362.352, 'text': "that it's much Harder.", 'start': 361.092, 'duration': 1.26}, {'end': 366.274, 'text': "I mean of course we could eyeball this as well, but it's much harder to sort of mathematically.", 'start': 362.352, 'duration': 3.922}], 'summary': 'Minimizing a function means finding the x value that produces the lowest y, which is the easiest thing to do without calculus or fancy math.', 'duration': 50.772, 'max_score': 315.502, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM315502.jpg'}, {'end': 424.951, 'src': 'embed', 'start': 395.833, 'weight': 4, 'content': [{'end': 401.836, 'text': 'And that is what I keep talking about, gradient descent.', 'start': 395.833, 'duration': 6.003}, {'end': 406.498, 'text': "So let's think about what gradient descent means.", 'start': 403.456, 'duration': 3.042}, {'end': 409.339, 'text': "Let's say we're looking at this point here.", 'start': 406.978, 'duration': 2.361}, {'end': 421.307, 'text': "and I'm going to walk along this function, and I'm right here, and I'm like, hello, I'm looking for the minimum.", 'start': 412.179, 'duration': 9.128}, {'end': 422.148, 'text': 'Is it over there??', 'start': 421.367, 'duration': 0.781}, {'end': 422.749, 'text': 'Is it over there??', 'start': 422.188, 'duration': 0.561}, {'end': 424.05, 'text': 'Could you help me, please?', 'start': 423.089, 'duration': 0.961}, {'end': 
424.951, 'text': 'Could you please provide me?', 'start': 424.09, 'duration': 0.861}], 'summary': 'Discussion on gradient descent for finding the minimum point.', 'duration': 29.118, 'max_score': 395.833, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM395833.jpg'}], 'start': 105.322, 'title': 'Optimizing cost function and minimizing functions', 'summary': "Explains optimizing cost function in machine learning, detailing the cost function formula and the goal of minimizing it to improve model's performance. it also covers minimizing functions, finding lowest points, and introduces gradient descent for complex functions in machine learning systems.", 'chapters': [{'end': 260.014, 'start': 105.322, 'title': 'Optimizing cost function in machine learning', 'summary': "Explains the concept of a cost function in machine learning, detailing how it measures the error in predictions using the formula cost = σ(yi - guessi)^2, and the goal of minimizing this cost to improve the model's performance.", 'duration': 154.692, 'highlights': ['The error is y, the correct answer, minus the guess. Explains the calculation of error in predictions as the difference between the correct answer and the guess.', "Cost equals the sum of y sub i, every known answer, minus the guess sub i squared. Defines the cost function formula as the sum of squared differences between known answers and guesses, representing the total error in the model's predictions.", "Our goal is to minimize this cost also known as maybe a loss. Emphasizes the objective of minimizing the cost function, also referred to as minimizing loss, to achieve the lowest error and improve the model's predictive accuracy."]}, {'end': 447.306, 'start': 261.127, 'title': 'Minimizing functions and gradient descent', 'summary': 'Covers the concept of minimizing functions, illustrating how to find the lowest point of a simple function and introducing the concept of gradient descent for finding minima in more complex functions in machine learning systems.', 'duration': 186.179, 'highlights': ['Minimizing a function involves finding the x value that produces the lowest y, demonstrated with the function y equals x squared.', "The concept of gradient descent is introduced as a method for finding the minimum of a function by iteratively adjusting the input based on the function's gradient and step size.", 'Illustrates the increasing complexity of finding minima in machine learning systems with higher dimensions of data and more complex functions.']}], 'duration': 341.984, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM105322.jpg', 'highlights': ["Emphasizes the objective of minimizing the cost function, also referred to as minimizing loss, to achieve the lowest error and improve the model's predictive accuracy.", "Defines the cost function formula as the sum of squared differences between known answers and guesses, representing the total error in the model's predictions.", 'Explains the calculation of error in predictions as the difference between the correct answer and the guess.', 'Illustrates the increasing complexity of finding minima in machine learning systems with higher dimensions of data and more complex functions.', "The concept of gradient descent is introduced as a method for finding the minimum of a function by iteratively adjusting the input based on the function's gradient and step size.", 'Minimizing a function involves finding the x 
value that produces the lowest y, demonstrated with the function y equals x squared.']}, {'end': 748.899, 'segs': [{'end': 503.144, 'src': 'embed', 'start': 473.876, 'weight': 0, 'content': [{'end': 476.737, 'text': 'and how you can sort of think about these concepts from calculus.', 'start': 473.876, 'duration': 2.861}, {'end': 485.019, 'text': "But for us right now, what we can think of is it's just the slope of the graph at this particular point.", 'start': 477.278, 'duration': 7.741}, {'end': 490.24, 'text': 'And a way to describe that is like a tangent line to that graph.', 'start': 485.459, 'duration': 4.781}, {'end': 500.063, 'text': "So if I'm able to compute this line, then I could say ah, well, this direction, if I go this direction,", 'start': 491.5, 'duration': 8.563}, {'end': 503.144, 'text': "it's going up and I'm going away from the minimum.", 'start': 500.063, 'duration': 3.081}], 'summary': 'The slope of the graph represents the tangent line and direction away from the minimum.', 'duration': 29.268, 'max_score': 473.876, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM473876.jpg'}, {'end': 627.763, 'src': 'heatmap', 'start': 551.331, 'weight': 1, 'content': [{'end': 562.859, 'text': 'So this is the gradient descent algorithm that I programmed in the previous video, where what I did is we looked at every data point, we made a guess,', 'start': 551.331, 'duration': 11.528}, {'end': 570.104, 'text': 'we got the error, the difference between the known output and the guess, and then we adjusted the m and b values.', 'start': 562.859, 'duration': 7.245}, {'end': 572.065, 'text': 'm equals.', 'start': 571.004, 'duration': 1.061}, {'end': 575.627, 'text': 'so the idea here is that we want to say every.', 'start': 572.065, 'duration': 3.562}, {'end': 576.728, 'text': "as we're training the.", 'start': 575.627, 'duration': 1.101}, {'end': 588.173, 'text': "I don't know which color I'm using right now, as we're training the system, I want to say m equals, m plus delta m.", 'start': 576.728, 'duration': 11.445}, {'end': 590.573, 'text': 'Some change in m.', 'start': 588.173, 'duration': 2.4}, {'end': 594.854, 'text': 'b equals b plus delta b.', 'start': 590.573, 'duration': 4.281}, {'end': 605.217, 'text': 'So I want to know what is a way that I can change the value of m in y equals mx plus b in order to make the error less.', 'start': 594.854, 'duration': 10.363}, {'end': 610.718, 'text': 'The next step that I want to do is find the minimum cost.', 'start': 606.657, 'duration': 4.061}, {'end': 614.471, 'text': 'I want to minimize this function for a particular.', 'start': 612.509, 'duration': 1.962}, {'end': 617.854, 'text': 'I want to find the m and b values with the lowest error.', 'start': 614.471, 'duration': 3.383}, {'end': 625.161, 'text': "So to do that, we've established that gradient descent says if I could find the derivative of a function, I know which way to move to minimize it.", 'start': 618.415, 'duration': 6.746}, {'end': 627.763, 'text': 'So somehow I need to find the derivative of this function.', 'start': 625.401, 'duration': 2.362}], 'summary': 'Programmed gradient descent algorithm to minimize error in m and b values.', 'duration': 21.106, 'max_score': 551.331, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM551331.jpg'}], 'start': 447.546, 'title': 'Calculus and gradient descent', 'summary': 'Explains the derivative in calculus for determining slope 
and the gradient descent algorithm for minimizing error by adjusting values.', 'chapters': [{'end': 523.275, 'start': 447.546, 'title': 'Understanding the derivative in calculus', 'summary': 'Explains the concept of the derivative as a way to determine the slope of a graph at a specific point, enabling decision-making based on the direction of the slope and the proximity to the minimum.', 'duration': 75.729, 'highlights': ['The derivative in calculus helps to determine the slope of the graph at a specific point, allowing decision-making based on the direction of the slope and proximity to the minimum.', 'Understanding the derivative involves visualizing it as the slope of the graph at a particular point, providing insights into the direction to move and the proximity to the minimum.', "The concept of the derivative is crucial for decision-making, as it enables the identification of the slope's direction and the proximity to the minimum point, guiding the determination of the step size and direction."]}, {'end': 748.899, 'start': 523.595, 'title': 'Understanding gradient descent algorithm', 'summary': 'Explains the concept of gradient descent algorithm and its application in finding the minimum cost by computing the derivative of the error function, emphasizing the process of adjusting the m and b values to minimize the error.', 'duration': 225.304, 'highlights': ['The gradient descent algorithm involves adjusting the m and b values by computing the derivative of the error function, with the goal of minimizing the error.', 'The process involves finding the derivative of the cost function relative to m to understand how the error changes when m changes.', 'The function for the error, guess minus y squared, is simplified to the cost function j, which is then used to find the derivative relative to m for minimizing the error.']}], 'duration': 301.353, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM447546.jpg', 'highlights': ['The derivative in calculus helps to determine the slope of the graph at a specific point, allowing decision-making based on the direction of the slope and proximity to the minimum.', 'The gradient descent algorithm involves adjusting the m and b values by computing the derivative of the error function, with the goal of minimizing the error.', 'Understanding the derivative involves visualizing it as the slope of the graph at a particular point, providing insights into the direction to move and the proximity to the minimum.']}, {'end': 1115.804, 'segs': [{'end': 821.934, 'src': 'embed', 'start': 786.831, 'weight': 2, 'content': [{'end': 787.891, 'text': "So that's the power rule.", 'start': 786.831, 'duration': 1.06}, {'end': 797.152, 'text': "So I'm going to now apply that here, and I'm going to say, I don't know why I'm in purple now, but I am.", 'start': 790.671, 'duration': 6.481}, {'end': 797.933, 'text': '2 times error.', 'start': 797.172, 'duration': 0.761}, {'end': 802.749, 'text': 'to the first power.', 'start': 801.929, 'duration': 0.82}, {'end': 806.93, 'text': 'So the power rule says now 2 times error.', 'start': 803.729, 'duration': 3.201}, {'end': 809.971, 'text': 'OK But I also need the chain rule.', 'start': 807.63, 'duration': 2.341}, {'end': 810.591, 'text': "I'm not done.", 'start': 809.991, 'duration': 0.6}, {'end': 813.792, 'text': 'Why do I need the chain rule? 
Well, the chain rule is a rule.', 'start': 810.671, 'duration': 3.121}, {'end': 821.934, 'text': "I'm going to erase this over here and use another marker because somehow if I just use multiple colored markers, all of this will make sense.", 'start': 813.812, 'duration': 8.122}], 'summary': 'Applying power rule and chain rule in calculus with multiple colored markers.', 'duration': 35.103, 'max_score': 786.831, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM786831.jpg'}, {'end': 917.752, 'src': 'embed', 'start': 849.719, 'weight': 0, 'content': [{'end': 856.524, 'text': 'Well, what the chain rule says is if I want to get the derivative of y relative to z,', 'start': 849.719, 'duration': 6.805}, {'end': 875.946, 'text': 'What I can do is I can get the derivative of y relative to x to x and then multiply that by the derivative of x relative to z, which is then times 2z.', 'start': 858.035, 'duration': 17.911}, {'end': 878.77, 'text': 'I can chain derivatives.', 'start': 877.388, 'duration': 1.382}, {'end': 885.439, 'text': 'I can get the derivative of 1 relative to something times the derivative of that something relative to something else.', 'start': 879.09, 'duration': 6.349}, {'end': 888.102, 'text': "And that's actually weirdly what's going on here.", 'start': 885.779, 'duration': 2.323}, {'end': 900.428, 'text': 'It may not be immediately apparent to you, j is a function of error and error is a function of m and b,', 'start': 889.444, 'duration': 10.984}, {'end': 905.609, 'text': "because I'm computing the error as the guess mx plus b, minus a, known y.", 'start': 900.428, 'duration': 5.181}, {'end': 917.752, 'text': 'So here, I could then say, get this derivative, 2 times error, and multiply that by the derivative of that error function itself.', 'start': 905.609, 'duration': 12.143}], 'summary': 'The chain rule allows chaining derivatives, computing the derivative of y relative to z by multiplying the derivative of y relative to x by the derivative of x relative to z, which is then times 2z.', 'duration': 68.033, 'max_score': 849.719, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM849719.jpg'}, {'end': 1098.42, 'src': 'heatmap', 'start': 1000.418, 'weight': 0.877, 'content': [{'end': 1013.838, 'text': 'So this, the derivative of this, right, the power rule says, 1 times x times m to the 0 power, which means x.', 'start': 1000.418, 'duration': 13.42}, {'end': 1017.702, 'text': "And the derivative of a constant is 0, because the constant doesn't change right?", 'start': 1013.838, 'duration': 3.864}, {'end': 1020.744, 'text': 'Derivative describing how something changes.', 'start': 1017.902, 'duration': 2.842}, {'end': 1022.145, 'text': 'the derivative of this is 0..', 'start': 1020.744, 'duration': 1.401}, {'end': 1033.973, 'text': "So guess what? 
It's just x, meaning this whole thing turns out to just be x equals 2 times the error times x.", 'start': 1022.145, 'duration': 11.828}, {'end': 1034.553, 'text': 'And guess what?', 'start': 1033.973, 'duration': 0.58}, {'end': 1043.044, 'text': "The whole point is, if you watched the previous video, is we're going to take this and multiply it by something called a learning rate.", 'start': 1035.214, 'duration': 7.83}, {'end': 1046.647, 'text': 'Because we know the direction to go.', 'start': 1043.483, 'duration': 3.164}, {'end': 1051.094, 'text': 'This is giving us the direction to go to minimize that error, minimize that cost.', 'start': 1046.909, 'duration': 4.185}, {'end': 1053.968, 'text': 'But do I want to take a big step or a little step??', 'start': 1052.247, 'duration': 1.721}, {'end': 1058.791, 'text': "Well, if I'm going to multiply it by a learning rate anyway, it's sort of like this 2 has no point,", 'start': 1054.368, 'duration': 4.423}, {'end': 1061.912, 'text': "because I could have a learning rate that's twice as big or half as big.", 'start': 1058.791, 'duration': 3.121}, {'end': 1066.489, 'text': 'This is all it is, error times x.', 'start': 1064.008, 'duration': 2.481}, {'end': 1071.83, 'text': 'All of this math and craziness with power rule and chain rule and partial derivative.', 'start': 1066.489, 'duration': 5.341}, {'end': 1073.99, 'text': 'this it all boils down to.', 'start': 1071.83, 'duration': 2.16}, {'end': 1077.691, 'text': 'just finally, we get this error times x.', 'start': 1073.99, 'duration': 3.701}, {'end': 1080.912, 'text': "That's what should go here in delta m.", 'start': 1077.691, 'duration': 3.221}, {'end': 1082.933, 'text': "Guess what? Let's go back over to our code.", 'start': 1080.912, 'duration': 2.021}, {'end': 1084.313, 'text': 'And we can see there it is.', 'start': 1083.433, 'duration': 0.88}, {'end': 1089.678, 'text': 'Error times x.', 'start': 1087.717, 'duration': 1.961}, {'end': 1091.298, 'text': 'Error times x.', 'start': 1089.678, 'duration': 1.62}, {'end': 1091.758, 'text': 'There we go.', 'start': 1091.298, 'duration': 0.46}, {'end': 1092.598, 'text': "That's it.", 'start': 1092.218, 'duration': 0.38}, {'end': 1096.139, 'text': "That's why that says error times x.", 'start': 1092.958, 'duration': 3.181}, {'end': 1098.42, 'text': "That was a lot, but that's why it says it.", 'start': 1096.139, 'duration': 2.281}], 'summary': 'Derivative simplifies to error times x, guiding cost minimization.', 'duration': 98.002, 'max_score': 1000.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM1000418.jpg'}, {'end': 1066.489, 'src': 'embed', 'start': 1017.902, 'weight': 3, 'content': [{'end': 1020.744, 'text': 'Derivative describing how something changes.', 'start': 1017.902, 'duration': 2.842}, {'end': 1022.145, 'text': 'the derivative of this is 0..', 'start': 1020.744, 'duration': 1.401}, {'end': 1033.973, 'text': "So guess what? 
It's just x, meaning this whole thing turns out to just be x equals 2 times the error times x.", 'start': 1022.145, 'duration': 11.828}, {'end': 1034.553, 'text': 'And guess what?', 'start': 1033.973, 'duration': 0.58}, {'end': 1043.044, 'text': "The whole point is, if you watched the previous video, is we're going to take this and multiply it by something called a learning rate.", 'start': 1035.214, 'duration': 7.83}, {'end': 1046.647, 'text': 'Because we know the direction to go.', 'start': 1043.483, 'duration': 3.164}, {'end': 1051.094, 'text': 'This is giving us the direction to go to minimize that error, minimize that cost.', 'start': 1046.909, 'duration': 4.185}, {'end': 1053.968, 'text': 'But do I want to take a big step or a little step??', 'start': 1052.247, 'duration': 1.721}, {'end': 1058.791, 'text': "Well, if I'm going to multiply it by a learning rate anyway, it's sort of like this 2 has no point,", 'start': 1054.368, 'duration': 4.423}, {'end': 1061.912, 'text': "because I could have a learning rate that's twice as big or half as big.", 'start': 1058.791, 'duration': 3.121}, {'end': 1066.489, 'text': 'This is all it is, error times x.', 'start': 1064.008, 'duration': 2.481}], 'summary': 'Derivative defines change, used with learning rate to minimize error.', 'duration': 48.587, 'max_score': 1017.902, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM1017902.jpg'}], 'start': 748.899, 'title': 'Calculus and error computation', 'summary': "Delves into the application of power and chain rules in calculus, highlighting derivatives and function inter-relationships. it also discusses partial derivatives' role in error computation, emphasizing minimizing error in a learning algorithm through derivative calculations.", 'chapters': [{'end': 885.439, 'start': 748.899, 'title': 'Calculus rules: power and chain rule', 'summary': 'Explains the application of the power rule and chain rule in calculus, with an emphasis on their derivatives and the inter-relationship between functions, using multiple colored markers as a visual aid.', 'duration': 136.54, 'highlights': ['The chain rule explains the method to find the derivative of a function with respect to another variable, involving the derivatives of the intermediate variables, demonstrated through the example of finding the derivative of y with respect to z, resulting in the expression 2z. 
This emphasizes the interconnectedness of functions and variables.', 'The power rule is demonstrated through the example of finding the derivative of 2 times the error to the first power, resulting in the application of the rule as 2 times error, highlighting the simplicity and efficiency of the rule in finding derivatives.']}, {'end': 1115.804, 'start': 885.779, 'title': 'Partial derivatives in error computation', 'summary': "Explains the concept of partial derivatives in error computation, demonstrating how to calculate the derivative of error with respect to a variable and its significance in minimizing error in a learning algorithm, all boiling down to the equation 'error times x'.", 'duration': 230.025, 'highlights': ["The derivative of error with respect to the variable 'm' is calculated as x times 2 times the error, which is crucial in determining the direction to minimize error in the learning algorithm.", 'The significance of the derivative in minimizing error is emphasized through the concept of a learning rate, which determines the step size for error minimization.', "The detailed explanation and demonstration of the concept provides a clear understanding of how the complex mathematical concepts boil down to the simple equation 'error times x' for minimizing error."]}], 'duration': 366.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM748899.jpg', 'highlights': ["The derivative of error with respect to the variable 'm' is calculated as x times 2 times the error, crucial in determining the direction to minimize error in the learning algorithm.", 'The chain rule explains the method to find the derivative of a function with respect to another variable, involving the derivatives of the intermediate variables, demonstrated through the example of finding the derivative of y with respect to z, resulting in the expression 2z, emphasizing the interconnectedness of functions and variables.', 'The power rule is demonstrated through the example of finding the derivative of 2 times the error to the first power, resulting in the application of the rule as 2 times error, highlighting the simplicity and efficiency of the rule in finding derivatives.', 'The significance of the derivative in minimizing error is emphasized through the concept of a learning rate, which determines the step size for error minimization.', "The detailed explanation and demonstration of the concept provides a clear understanding of how the complex mathematical concepts boil down to the simple equation 'error times x' for minimizing error."]}, {'end': 1342.171, 'segs': [{'end': 1165.143, 'src': 'embed', 'start': 1140.191, 'weight': 1, 'content': [{'end': 1151.475, 'text': 'And the chain rule says that if I look at the derivative of that function relative to the error, I can multiply that by the derivative of the error.', 'start': 1140.191, 'duration': 11.284}, {'end': 1155.817, 'text': 'relative to m.', 'start': 1153.476, 'duration': 2.341}, {'end': 1157.619, 'text': 'So this is actually the chain rule.', 'start': 1155.817, 'duration': 1.802}, {'end': 1163.962, 'text': 'So I can get this by doing the derivative of relative to error, the derivative of error relative to m.', 'start': 1157.639, 'duration': 6.323}, {'end': 1165.143, 'text': "And that's what's going on here.", 'start': 1163.962, 'duration': 1.181}], 'summary': 'The chain rule states that the derivative of a function relative to an error can be obtained by multiplying it with the derivative of the 
error relative to m.', 'duration': 24.952, 'max_score': 1140.191, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM1140191.jpg'}, {'end': 1262.737, 'src': 'heatmap', 'start': 1232.096, 'weight': 0.795, 'content': [{'end': 1237.32, 'text': "And again, we could get rid of the two, so it's really just error times x, or error times one.", 'start': 1232.096, 'duration': 5.224}, {'end': 1241.544, 'text': 'And then, if I come back over here again, there you go.', 'start': 1237.701, 'duration': 3.843}, {'end': 1246.187, 'text': 'Error times x, m changes by error times x, b changes by just error.', 'start': 1241.564, 'duration': 4.623}, {'end': 1262.737, 'text': 'So.. That hopefully gives you some more background as to why these formulas exist this way.', 'start': 1252.652, 'duration': 10.085}], 'summary': 'Explanation of error times x and error times one for formula existence.', 'duration': 30.641, 'max_score': 1232.096, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM1232096.jpg'}, {'end': 1316.238, 'src': 'embed', 'start': 1279.792, 'weight': 0, 'content': [{'end': 1280.753, 'text': 'Change the weight.', 'start': 1279.792, 'duration': 0.961}, {'end': 1282.933, 'text': "Instead of saying m and b, I'm going to say the weight.", 'start': 1280.813, 'duration': 2.12}, {'end': 1288.074, 'text': 'Well, the weight changes based on the input multiplied by the error.', 'start': 1283.233, 'duration': 4.841}, {'end': 1290.234, 'text': "And then there's going to be some other pieces.", 'start': 1288.374, 'duration': 1.86}, {'end': 1292.195, 'text': 'But this formula is going to be everywhere.', 'start': 1290.274, 'duration': 1.921}, {'end': 1293.675, 'text': 'So I hope.', 'start': 1292.715, 'duration': 0.96}, {'end': 1296.016, 'text': 'This was another attempt.', 'start': 1294.975, 'duration': 1.041}, {'end': 1305.266, 'text': "Again, there's a lot of things I've glossed over here, in terms of a lot of the background, in terms of what really is a derivative?", 'start': 1297.097, 'duration': 8.169}, {'end': 1306.648, 'text': 'Why does calculus exist??', 'start': 1305.346, 'duration': 1.302}, {'end': 1309.251, 'text': 'Why does the chain rule work the way it works??', 'start': 1307.389, 'duration': 1.862}, {'end': 1313.575, 'text': 'Why does the power rule work the way it works? 
That partial derivative, huh?', 'start': 1309.291, 'duration': 4.284}, {'end': 1314.917, 'text': 'Did you say something about partial derivative?', 'start': 1313.615, 'duration': 1.302}, {'end': 1316.238, 'text': 'And so again,', 'start': 1315.197, 'duration': 1.041}], 'summary': 'Discussing weight changes based on input and error in a formula.', 'duration': 36.446, 'max_score': 1279.792, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM1279792.jpg'}], 'start': 1117.204, 'title': 'Chain rule and neural network model', 'summary': 'Discusses the application of the chain rule in calculating the derivative of the cost function relative to a variable and explains the derivation and application of a formula for weight adjustment in a neural network model.', 'chapters': [{'end': 1189.277, 'start': 1117.204, 'title': 'Understanding chain rule in derivatives', 'summary': 'Discusses the application of the chain rule in calculating the derivative of the cost function relative to a variable, emphasizing the impact of changing the variable on the cost function and explaining the concept through a practical example.', 'duration': 72.073, 'highlights': ['The chain rule is applied to calculate the derivative of the cost function relative to a variable, illustrating the impact of variable change on the cost function.', 'Explanation of the chain rule involves multiplying the derivative of the function relative to the error by the derivative of the error relative to the variable.', 'Practical example demonstrates the application of the chain rule and the cancellation effect when calculating derivatives relative to different variables.']}, {'end': 1342.171, 'start': 1189.277, 'title': 'Neural network model summary', 'summary': 'Explains the derivation and application of a formula for weight adjustment in a neural network model, emphasizing the significance and recurring nature of the formula in subsequent sessions.', 'duration': 152.894, 'highlights': ['The weight changes based on the input multiplied by the error, which forms a crucial formula for weight adjustment in the neural network model.', 'The session provides insight into the background of calculus concepts like the power rule, chain rule, and partial derivatives, offering a foundational understanding for further exploration.', 'The presenter emphasizes the recurring nature of the weight adjustment formula, indicating its pervasive use in subsequent sessions of building the neural network model.']}], 'duration': 224.967, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/jc2IthslyzM/pics/jc2IthslyzM1117204.jpg', 'highlights': ['The weight changes based on the input multiplied by the error, forming a crucial formula for weight adjustment in the neural network model.', 'The chain rule is applied to calculate the derivative of the cost function relative to a variable, illustrating the impact of variable change on the cost function.', 'Explanation of the chain rule involves multiplying the derivative of the function relative to the error by the derivative of the error relative to the variable.', 'Practical example demonstrates the application of the chain rule and the cancellation effect when calculating derivatives relative to different variables.', 'The session provides insight into the background of calculus concepts like the power rule, chain rule, and partial derivatives, offering a foundational understanding for further exploration.', 'The presenter emphasizes the 
recurring nature of the weight adjustment formula, indicating its pervasive use in subsequent sessions of building the neural network model.']}], 'highlights': ['The video serves as a follow-up to the linear regression with gradient descent video, providing foundational understanding for future practical examples in the machine learning series.', "Emphasizes the objective of minimizing the cost function, also referred to as minimizing loss, to achieve the lowest error and improve the model's predictive accuracy.", 'The weight changes based on the input multiplied by the error, forming a crucial formula for weight adjustment in the neural network model.', "The derivative of error with respect to the variable 'm' is calculated as x times 2 times the error, crucial in determining the direction to minimize error in the learning algorithm.", 'The chain rule explains the method to find the derivative of a function with respect to another variable, involving the derivatives of the intermediate variables, demonstrated through the example of finding the derivative of y with respect to z, resulting in the expression 2z, emphasizing the interconnectedness of functions and variables.', 'The presenter emphasizes the recurring nature of the weight adjustment formula, indicating its pervasive use in subsequent sessions of building the neural network model.', 'Understanding the derivative involves visualizing it as the slope of the graph at a particular point, providing insights into the direction to move and the proximity to the minimum.']}
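To make the "minimizing a function" chapter concrete, here is a minimal sketch of gradient descent on the simple curve y = x squared discussed above (an illustrative TypeScript sketch, not code from the video; the starting point and learning rate are arbitrary choices for the example). The derivative 2x is the slope at the current x, and repeatedly stepping against that slope walks x toward the minimum at 0.

// Gradient descent on f(x) = x^2 — illustrative sketch only.
const learningRate = 0.1; // arbitrary step size for this example
let x = 5;                // arbitrary starting point

for (let i = 0; i < 50; i++) {
  const slope = 2 * x;        // derivative of x^2 (power rule)
  x -= learningRate * slope;  // step downhill, against the slope
}

console.log(x); // ends up very close to 0, the x that minimizes f(x) = x^2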
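The power rule and chain rule chapters boil down to the short derivation below. It restates what the transcript summarizes for a single data point, using the video's earlier convention error = y minus the guess (the transcript also uses guess minus y at one point, so the explicit sign bookkeeping here is filled in for consistency):

J(m, b) = \big(y - (mx + b)\big)^2 = \mathrm{error}^2

\frac{\partial J}{\partial m} = 2\,\mathrm{error}\cdot\frac{\partial\,\mathrm{error}}{\partial m} = -2\,\mathrm{error}\cdot x
\qquad
\frac{\partial J}{\partial b} = 2\,\mathrm{error}\cdot(-1) = -2\,\mathrm{error}

\Delta m = -\eta\,\frac{\partial J}{\partial m} = 2\eta\,\mathrm{error}\cdot x
\qquad
\Delta b = -\eta\,\frac{\partial J}{\partial b} = 2\eta\,\mathrm{error}

Gradient descent moves against the gradient, scaled by a learning rate \eta, and the constant 2 simply folds into \eta. That leaves \Delta m \propto \mathrm{error}\cdot x and \Delta b \propto \mathrm{error}, which is exactly the "error times x" for m and plain "error" for b that the code uses.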
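Putting it all together, the whole algorithm the video walks through amounts to the training loop below. This is an illustrative TypeScript sketch rather than the p5.js source in the session 3 repository linked above, and the sample data and learning rate are invented for the example; the two update lines are the delta m = error times x and delta b = error expressions derived above.

// Linear regression with gradient descent — illustrative sketch, not the video's p5.js code.
const data = [
  { x: 0.1, y: 0.2 },
  { x: 0.4, y: 0.45 },
  { x: 0.8, y: 0.79 },
]; // made-up sample points for the example

const lr = 0.05; // learning rate, an arbitrary choice for this sketch
let m = 0;       // slope of the guessed line y = mx + b
let b = 0;       // intercept of the guessed line

// One training pass: for each point, make a guess, measure the error, nudge m and b.
function trainOnce(): void {
  for (const point of data) {
    const guess = m * point.x + b;
    const error = point.y - guess; // known answer minus the guess
    m += error * point.x * lr;     // delta m: error times x, scaled by the learning rate
    b += error * lr;               // delta b: just the error, scaled by the learning rate
  }
}

for (let epoch = 0; epoch < 1000; epoch++) {
  trainOnce();
}
console.log(m, b); // m and b settle near the best-fit line for the data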