title
Machine Learning Tutorial Python - 4: Gradient Descent and Cost Function

description
In this tutorial, we are covering a few important concepts in machine learning: cost function, gradient descent, learning rate and mean squared error. We will use a home price prediction use case to understand gradient descent. After going over the math behind these concepts, we will write Python code to implement gradient descent for linear regression. At the end, I have an exercise for you to practice gradient descent.
#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCource #CostFunction #GradientDescent
Code: https://github.com/codebasics/py/blob/master/ML/3_gradient_descent/gradient_descent.py
Exercise csv file: https://github.com/codebasics/py/blob/master/ML/3_gradient_descent/Exercise/test_scores.csv
Topics that are covered in this Video:
0:00 - Overview
1:23 - What is a prediction function? How can we calculate it?
4:00 - Mean squared error (ending time)
4:57 - Gradient descent algorithm and how it works
11:00 - What is a derivative?
12:30 - What is a partial derivative?
16:07 - Use of Python code to implement gradient descent
27:05 - Exercise: come up with a linear function for given test results using gradient descent
Topic Highlights:
1) Theory (we will talk about MSE, cost function, global minimum)
2) Coding (plain Python code that finds a linear equation for given sample data points using gradient descent)
3) Exercise (come up with a linear function for given test results using gradient descent)
Do you want to learn technology from me? Check https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description for my affordable video courses.
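The description promises the math behind the cost function, gradient descent and the learning rate. Written out for the line y = mx + b over n data points, the cost, its two partial derivatives and the learning-rate step look like this. The derivative expressions are the standard result of applying the power and chain rules to the mean squared error; they are offered as a reference sketch, not as a transcription of the video's slides.

```latex
\begin{aligned}
\text{mse} &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - y_{\text{predicted},i}\bigr)^2
            = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial\,\text{mse}}{\partial m} &= -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr) \\
\frac{\partial\,\text{mse}}{\partial b} &= -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr) \\
m &\leftarrow m - \text{learning rate}\times\frac{\partial\,\text{mse}}{\partial m} \\
b &\leftarrow b - \text{learning rate}\times\frac{\partial\,\text{mse}}{\partial b}
\end{aligned}
```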
Next Video: Machine Learning Tutorial Python - 5: Save Model Using Joblib And Pickle: https://www.youtube.com/watch?v=KfnhNlD8WZI&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=5
Very Simple Explanation Of Neural Network: https://www.youtube.com/watch?v=ER2It2mIagI
Popular Playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV
Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg
Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy
matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl
Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1
Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8
To download the csv files and code for all tutorials, go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then go to the relevant folder to find that specific file.
🌎 My Website For Video Courses: https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description
Need help building software or data analytics and AI solutions? My company https://www.atliq.com/ can help. Click on the Contact button on that website.
#️⃣ Social Media #️⃣
🔗 Discord: https://discord.gg/r42Kbuk
📸 Dhaval's Personal Instagram: https://www.instagram.com/dhavalsays/
📸 Codebasics Instagram: https://www.instagram.com/codebasicshub/
🔊 Facebook: https://www.facebook.com/codebasicshub
📱 Twitter: https://twitter.com/codebasicshub
📝 Linkedin (Personal): https://www.linkedin.com/in/dhavalsays/
📝 Linkedin (Codebasics): https://www.linkedin.com/company/codebasics/
🔗 Patreon: https://www.patreon.com/codebasics?fan_landing=true
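The derivative segments (11:00 and 12:30 in the timestamps) describe the slope at a point as a small change in y divided by a small change in x, and a partial derivative as the slope with respect to one variable while the other is held fixed. A minimal sketch of that idea with central differences follows. The helper names, the step size `h`, the sample points, and the two-variable function f(x, y) = x³ + y² (a hypothetical choice whose partials, 3x² and 2y, echo the values mentioned in the transcript) are all illustrative assumptions. The last lines apply the same estimate to the mean squared error viewed as a function of m and b.

```python
def derivative(f, x, h=1e-6):
    """Estimate df/dx at x: a small change in y divided by a small change in x.

    Uses a central difference, (f(x + h) - f(x - h)) / (2h), which is more
    accurate than a one-sided difference for the same h.
    """
    return (f(x + h) - f(x - h)) / (2 * h)


def partial(f, args, i, h=1e-6):
    """Estimate the partial derivative of f with respect to its i-th argument,
    holding every other argument fixed."""
    up = list(args)
    down = list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def mse(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b on the sample points."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # d/dx of x**2 is 2x, so the estimated slope at x = 2 is close to 4
    print(derivative(lambda x: x ** 2, 2.0))

    # hypothetical two-variable function f(x, y) = x**3 + y**2
    def f(x, y):
        return x ** 3 + y ** 2

    print(partial(f, (2.0, 3.0), 0))  # close to 3 * 2**2 = 12
    print(partial(f, (2.0, 3.0), 1))  # close to 2 * 3 = 6

    # the cost as a function of (m, b), on assumed sample points
    xs, ys = [1, 2, 3, 4, 5], [5, 7, 9, 11, 13]

    def cost(m, b):
        return mse(m, b, xs, ys)

    print(partial(cost, (0.0, 0.0), 0))  # close to -62: slope in m at m = b = 0
    print(partial(cost, (0.0, 0.0), 1))  # close to -18: slope in b at m = b = 0
```

Both cost slopes are negative at m = b = 0 on this sample, so the update "subtract learning rate times slope" moves m and b upward from zero here.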

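The coding segment (16:07 onward) starts m and b at zero, uses numpy arrays, runs 1000 iterations and prints the cost at each iteration to confirm it keeps falling. A minimal sketch along those lines follows. It is not the linked gradient_descent.py: the learning rate of 0.08, the `verbose` flag and the sample points (which lie on y = 2x + 3) are assumptions chosen for illustration.

```python
import numpy as np


def gradient_descent(x, y, learning_rate=0.08, iterations=1000, verbose=False):
    """Fit y = m*x + b by batch gradient descent on the mean squared error.

    Returns the final m, b and the cost (MSE) measured at the last iteration,
    before the last update is applied.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    m_curr = b_curr = 0.0  # start from m = 0, b = 0
    cost = float("nan")
    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        error = y - y_predicted
        cost = float(np.mean(error ** 2))  # mean squared error, i.e. the cost
        if verbose:
            # the cost should shrink from one iteration to the next
            print(f"iteration {i}: m={m_curr:.4f} b={b_curr:.4f} cost={cost:.6f}")
        # partial derivatives of the cost with respect to m and b
        md = -(2 / n) * np.sum(x * error)
        bd = -(2 / n) * np.sum(error)
        # step against the slope; the learning rate sets the step size
        m_curr -= learning_rate * md
        b_curr -= learning_rate * bd
    return m_curr, b_curr, cost


if __name__ == "__main__":
    # assumed sample data: points on the line y = 2x + 3
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([5, 7, 9, 11, 13])
    m, b, cost = gradient_descent(x, y)
    print(f"m={m:.4f}, b={b:.4f}, cost={cost:.2e}")
```

With these settings the sample converges to roughly m = 2 and b = 3. On the same data, raising `learning_rate` to 0.1 makes the cost grow at every step instead of shrinking, which is the failure the transcript warns about when it says the cost must go down at each iteration; passing `verbose=True` makes that visible.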
detail
{'title': 'Machine Learning Tutorial Python - 4: Gradient Descent and Cost Function', 'heatmap': [{'end': 273.862, 'start': 236.76, 'weight': 1}, {'end': 335.07, 'start': 290.17, 'weight': 0.843}, {'end': 977.949, 'start': 918.631, 'weight': 0.942}, {'end': 1686.614, 'start': 1671.621, 'weight': 0.778}], 'summary': 'This tutorial series on machine learning and python covers topics such as mean square error, cost function, and gradient descent, with a practical implementation of gradient descent in python and the point that understanding these internals helps when using the sklearn library. it explains the iterative process of gradient descent, emphasizing the importance of adjusting step sizes and slopes to reach the global minimum, and introduces slope calculation, derivatives, and partial derivatives, while also discussing cost reduction tracking and learning rate manipulation in the context of the gradient descent algorithm.', 'chapters': [{'end': 142.389, 'segs': [{'end': 38.863, 'src': 'embed', 'start': 14.662, 'weight': 1, 'content': [{'end': 21.267, 'text': 'Now, when you start going through machine learning tutorials, the thing that you inevitably come across is mathematical equations.', 'start': 14.662, 'duration': 6.605}, {'end': 27.473, 'text': 'And by looking at them, the first thought that jumps into your mind is, oh, my God, I suck at math.', 'start': 22.148, 'duration': 5.325}, {'end': 29.975, 'text': 'I used to get four out of 50 in my math test.', 'start': 27.593, 'duration': 2.382}, {'end': 34.079, 'text': "How am I going to deal with this? 
Let's not worry too much about it.", 'start': 30.376, 'duration': 3.703}, {'end': 38.863, 'text': "We can take one step at a time and the things won't seem that much hard.", 'start': 34.639, 'duration': 4.224}], 'summary': 'Machine learning tutorials often involve intimidating mathematical equations, but taking it step by step can make it less daunting.', 'duration': 24.201, 'max_score': 14.662, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw14662.jpg'}, {'end': 91.525, 'src': 'embed', 'start': 60.581, 'weight': 0, 'content': [{'end': 62.202, 'text': "With that, let's get started.", 'start': 60.581, 'duration': 1.621}, {'end': 70.328, 'text': 'during our linear algebra class in our school days, what we used to have was this equation and x as an input,', 'start': 62.202, 'duration': 8.126}, {'end': 80.596, 'text': 'and we used to compute the value of y, where the way you derive 9 is by multiplying this 3 with 2, which will be 6 plus 3,', 'start': 70.328, 'duration': 10.268}, {'end': 83.498, 'text': "and that's how you come up with 9.", 'start': 80.596, 'duration': 2.902}, {'end': 91.525, 'text': 'In case of machine learning, however, you have observations or training data set, which is your input and output.', 'start': 83.498, 'duration': 8.027}], 'summary': 'In machine learning, you start from observations (a training data set of inputs and outputs) instead of a known equation.', 'duration': 30.944, 'max_score': 60.581, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw60581.jpg'}], 'start': 0.149, 'title': 'Gradient descent in machine learning', 'summary': 'Covers mean square error, cost function, gradient descent, and learning rate in machine learning, emphasizing implementation in a python program and practical application with the sklearn library.', 'chapters': [{'end': 142.389, 'start': 0.149, 'title': 'Gradient descent in machine learning', 'summary': 'Covers the important concepts of mean square error, 
cost function, gradient descent, and learning rate in machine learning, emphasizing the implementation of gradient descent in a python program, and the practical application of understanding these concepts when using the sklearn library for better utilization.', 'duration': 142.24, 'highlights': ['The chapter emphasizes the implementation of gradient descent in a Python program.', 'Understanding the concepts for better utilization of the sklearn library.', 'Importance of understanding the internals for practical use.', 'Explanation of deriving an equation for prediction function in machine learning.', 'Practical example of deriving an equation for predicting home prices.']}], 'duration': 142.24, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw149.jpg', 'highlights': ['Practical example of deriving an equation for predicting home prices.', 'Understanding the concepts for better utilization of the sklearn library.', 'The chapter emphasizes the implementation of gradient descent in a Python program.', 'Importance of understanding the internals for practical use.', 'Explanation of deriving an equation for prediction function in machine learning.']}, {'end': 330.806, 'segs': [{'end': 191.751, 'src': 'embed', 'start': 165.622, 'weight': 1, 'content': [{'end': 169.969, 'text': 'but you try to draw a line which, which is kind of a best fit.', 'start': 165.622, 'duration': 4.347}, {'end': 178.802, 'text': 'all right, but the problem here is you might have so many lines right that can potentially go through these data points.', 'start': 169.969, 'duration': 8.833}, {'end': 182.22, 'text': 'My data set is very simple here.', 'start': 180.057, 'duration': 2.163}, {'end': 191.751, 'text': "If you have like very heavy data set and if it's like scattered all over the place, then drawing these lines becomes even more difficult.", 'start': 182.32, 'duration': 9.431}], 'summary': 'Drawing the best-fit line for scattered data 
can be challenging.', 'duration': 26.129, 'max_score': 165.622, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw165622.jpg'}, {'end': 276.743, 'src': 'heatmap', 'start': 221.391, 'weight': 2, 'content': [{'end': 227.915, 'text': "the reason you want to square them is these deltas could be negative also, and if you don't square them and just add them,", 'start': 221.391, 'duration': 6.524}, {'end': 231.357, 'text': 'then the results might be skewed.', 'start': 227.915, 'duration': 3.442}, {'end': 235.219, 'text': 'after that you sum them up and divide it by n.', 'start': 231.357, 'duration': 3.862}, {'end': 236.76, 'text': 'so n here is 5.', 'start': 235.219, 'duration': 1.541}, {'end': 239.362, 'text': 'it is number of data points that you have available.', 'start': 236.76, 'duration': 2.602}, {'end': 243.657, 'text': 'The result is called mean square error.', 'start': 240.816, 'duration': 2.841}, {'end': 250.478, 'text': 'And mean square error is nothing but your actual data point minus the predicted data point.', 'start': 244.677, 'duration': 5.801}, {'end': 255.64, 'text': 'You square it, sum them up, and then divide by n.', 'start': 251.178, 'duration': 4.462}, {'end': 259.62, 'text': 'This mean square error is also called a cost function.', 'start': 255.64, 'duration': 3.98}, {'end': 265.502, 'text': 'There are different type of cost function as well, but mean square error is the most popular one.', 'start': 260.401, 'duration': 5.101}, {'end': 273.862, 'text': 'And here y predicted is replaced by mx plus b because you know that y is equal to mx plus b.', 'start': 266.638, 'duration': 7.224}, {'end': 276.743, 'text': "So that's the equation for mean square error.", 'start': 273.862, 'duration': 2.881}], 'summary': 'Mean square error is calculated by squaring differences, summing them, and dividing by the number of data points, which is 5 in this case.', 'duration': 55.352, 'max_score': 221.391, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw221391.jpg'}, {'end': 309.186, 'src': 'embed', 'start': 273.862, 'weight': 0, 'content': [{'end': 276.743, 'text': "So that's the equation for mean square error.", 'start': 273.862, 'duration': 2.881}, {'end': 281.726, 'text': 'Now we initially saw that there are so many different lines that you can draw.', 'start': 277.183, 'duration': 4.543}, {'end': 289.67, 'text': "You're not going to try every permutation and combination of m and b because that is very inefficient.", 'start': 282.526, 'duration': 7.144}, {'end': 297.615, 'text': 'You want to take some efficient approach where in very less iteration, you can reach your answer okay,', 'start': 290.17, 'duration': 7.445}, {'end': 309.186, 'text': 'and gradient descent is that algorithm that helps you find the best fit line in very less number of iteration or in a very efficient way?', 'start': 297.615, 'duration': 11.571}], 'summary': 'Gradient descent finds best fit line efficiently.', 'duration': 35.324, 'max_score': 273.862, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw273862.jpg'}], 'start': 142.789, 'title': 'Linear regression and gradient descent', 'summary': 'Explores deriving the best fit line equation, mean square error, and the usage of gradient descent algorithm to find the best fit line efficiently, highlighting the concept of mean square error, cost function, and the iterative process of gradient descent.', 'chapters': [{'end': 220.771, 'start': 142.789, 'title': 'Deriving best fit line equation', 'summary': 'Explores deriving the best fit line equation from given input and output data points, addressing the challenge of identifying the best fit line among many potential lines through error calculation and squaring.', 'duration': 77.982, 'highlights': ['Identifying the best fit line among potential lines through error calculation and 
squaring', 'Addressing the challenge of drawing a best fit line through scattered data points', 'Exploring the process of deriving the best fit line equation from input and output data points']}, {'end': 273.862, 'start': 221.391, 'title': 'Mean square error and cost function', 'summary': 'Explains the concept of mean square error, which is the result of summing the squared differences between actual and predicted data points and dividing by the number of data points. it also highlights the usage of mean square error as a cost function in linear regression with the formula y = mx + b.', 'duration': 52.471, 'highlights': ['The mean square error is the result of summing the squared differences between actual and predicted data points and dividing by the number of data points, which in this case is 5.', 'Mean square error is also referred to as a cost function, with different types existing, but it is the most popular one.', 'In the context of linear regression, the formula y = mx + b is used to replace y predicted in the mean square error calculation.']}, {'end': 330.806, 'start': 273.862, 'title': 'Gradient descent for best fit line', 'summary': 'Explains how gradient descent algorithm efficiently finds the best fit line by minimizing mean square error through fewer iterations, using a plotted graph of m and b against the cost function.', 'duration': 56.944, 'highlights': ['Gradient descent is an algorithm that efficiently finds the best fit line in very few iterations or in a very efficient way.', 'The algorithm helps in minimizing mean square error through fewer iterations.', 'A plotted graph of m and b against the cost function demonstrates the process of finding the best fit line.']}], 'duration': 188.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw142789.jpg', 'highlights': ['Gradient descent efficiently finds best fit line in few iterations', 'Deriving best fit line equation from input and output 
data points', 'Mean square error is the result of summing squared differences between actual and predicted data points']}, {'end': 665.487, 'segs': [{'end': 386.687, 'src': 'embed', 'start': 330.806, 'weight': 1, 'content': [{'end': 335.07, 'text': 'and for every value of m and b you will find some cost.', 'start': 330.806, 'duration': 4.264}, {'end': 343.057, 'text': 'so if you keep on plotting those points here and if you create a surface out of it, it will look like this: it will be like a bowl.', 'start': 335.07, 'duration': 7.987}, {'end': 350.993, 'text': 'And what you want to do is you want to start with some value of M and B.', 'start': 345.289, 'duration': 5.704}, {'end': 352.553, 'text': 'People usually start with zero.', 'start': 350.993, 'duration': 1.56}, {'end': 358.097, 'text': 'So here you can see this point has M as zero and B as zero.', 'start': 352.614, 'duration': 5.483}, {'end': 361.519, 'text': 'And from that point, you calculate the cost.', 'start': 358.837, 'duration': 2.682}, {'end': 363.28, 'text': "So let's say the cost is a thousand.", 'start': 361.579, 'duration': 1.701}, {'end': 371.72, 'text': "then you adjust the value of m and b by some amount, and we'll see what that amount is later on.", 'start': 364.696, 'duration': 7.024}, {'end': 374.321, 'text': 'so you take kind of like a mini step.', 'start': 371.72, 'duration': 2.601}, {'end': 382.245, 'text': 'you come here and you will see that the error is now reduced to somewhere around 900, something like that,', 'start': 374.321, 'duration': 7.924}, {'end': 386.687, 'text': 'and then again you reduce it by taking one more step.', 'start': 382.245, 'duration': 4.442}], 'summary': 'Gradient descent method minimizes cost by adjusting m and b values.', 'duration': 55.881, 'max_score': 330.806, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw330806.jpg'}, {'end': 449.993, 'src': 'embed', 'start': 417.064, 'weight': 0, 'content': [{'end': 
419.145, 'text': 'So that M1, B1 will be here somewhere.', 'start': 417.064, 'duration': 2.081}, {'end': 421.266, 'text': 'Blue line will have M2, B2.', 'start': 419.645, 'duration': 1.621}, {'end': 422.987, 'text': 'M2, B2 will be this red dot.', 'start': 421.386, 'duration': 1.601}, {'end': 425.846, 'text': 'then red line will have M3, B3.', 'start': 423.905, 'duration': 1.941}, {'end': 428.486, 'text': 'so M3, B3 will be somewhere here on this plot.', 'start': 425.846, 'duration': 2.64}, {'end': 434.168, 'text': 'so you can have, like so many, numerous lines which can create this plot.', 'start': 428.486, 'duration': 5.682}, {'end': 438.989, 'text': 'now we just said that you will take this baby step.', 'start': 434.168, 'duration': 4.821}, {'end': 440.69, 'text': 'but how exactly do you do it?', 'start': 438.989, 'duration': 1.701}, {'end': 449.993, 'text': 'because visually it sounds easy, but mathematically, when you give this task to your computer, you have to come up with some concrete approach.', 'start': 440.69, 'duration': 9.303}], 'summary': 'Each candidate line (m, b) is one point on the cost plot, and taking a step toward the minimum needs a concrete mathematical approach.', 'duration': 32.929, 'max_score': 417.064, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw417064.jpg'}, {'end': 505.146, 'src': 'embed', 'start': 472.119, 'weight': 3, 'content': [{'end': 479.885, 'text': 'If you look at this 3D plot from this direction, what you will see is a chart of B against cost.', 'start': 472.119, 'duration': 7.766}, {'end': 483.715, 'text': 'that will be this curvature.', 'start': 480.974, 'duration': 2.741}, {'end': 492.16, 'text': 'similarly, if you look at this chart from this direction, the chart will look something like this and in both the cases,', 'start': 483.715, 'duration': 8.445}, {'end': 505.146, 'text': 'you are starting at this point, which is this star, and then taking these many steps and trying to reach this minimum point, which is this red dot.', 'start': 
492.16, 'duration': 12.986}], 'summary': '3D plot shows b against cost, aiming to reach the minimum (red dot).', 'duration': 33.027, 'max_score': 472.119, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw472119.jpg'}, {'end': 642.446, 'src': 'embed', 'start': 613.957, 'weight': 4, 'content': [{'end': 628.372, 'text': 'now we have to get into calculus a little bit, because calculus allows you to figure out these baby steps and when we are talking about these slopes.', 'start': 613.957, 'duration': 14.415}, {'end': 634.939, 'text': 'really, this slope is nothing but a derivative of this cost function with respect to b.', 'start': 628.372, 'duration': 6.567}, {'end': 642.446, 'text': 'okay, if you want to go in details, I recommend this channel, 3Blue1Brown.', 'start': 634.939, 'duration': 7.507}], 'summary': 'Calculus helps in understanding slopes and derivatives in cost functions.', 'duration': 28.489, 'max_score': 613.957, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw613957.jpg'}], 'start': 330.806, 'title': 'Gradient descent in machine learning', 'summary': 'Explains gradient descent for minimizing the cost function in machine learning by adjusting parameters m and b, emphasizing the importance of adjusting step sizes and slopes to reach the global minimum, and cautioning against the use of fixed step sizes. 
it also refers to the relevance of calculus and derivatives in the process.', 'chapters': [{'end': 471.198, 'start': 330.806, 'title': 'Gradient descent for cost minimization', 'summary': 'Explains the concept of gradient descent for minimizing the cost function in machine learning, using step-wise adjustment of parameters m and b to reach the minimum error point, as illustrated with visualizations of different lines with varying m and b values.', 'duration': 140.392, 'highlights': ['The process involves gradually adjusting the values of M and B until the minimum error point is reached, thereby finding the optimal parameters for the prediction function.', 'Visualizations are used to demonstrate the concept, showing different lines with varying M and B values and the step-wise reduction process to reach the best fit line.', 'The initial cost is reduced by taking mini steps, gradually decreasing the error until the minimum point, which signifies the optimal M and B values for the prediction function.']}, {'end': 665.487, 'start': 472.119, 'title': 'Optimizing gradient descent algorithm', 'summary': 'Discusses the gradient descent algorithm, emphasizing the importance of adjusting step sizes and slopes to reach the global minimum, while cautioning against the use of fixed step sizes, and referring to the relevance of calculus and derivatives in the process.', 'duration': 193.368, 'highlights': ['The importance of adjusting step sizes and slopes to reach the global minimum.', 'Caution against the use of fixed step sizes, as it may lead to missing the global minimum.', 'Emphasizing the relevance of calculus and derivatives in the process of gradient descent.']}], 'duration': 334.681, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw330806.jpg', 'highlights': ['Visualizations demonstrate lines with varying M and B values and the step-wise reduction process.', 'The process involves gradually adjusting the values of M and B 
until the minimum error point is reached.', 'The initial cost is reduced by taking mini steps, gradually decreasing the error until the minimum point.', 'The importance of adjusting step sizes and slopes to reach the global minimum.', 'Emphasizing the relevance of calculus and derivatives in the process of gradient descent.', 'Caution against the use of fixed step sizes, as it may lead to missing the global minimum.']}, {'end': 914.967, 'segs': [{'end': 727.595, 'src': 'embed', 'start': 666.187, 'weight': 0, 'content': [{'end': 671.951, 'text': "I'm on this website called Math is Fun and these guys have explained it really well.", 'start': 666.187, 'duration': 5.764}, {'end': 676.774, 'text': 'So slope is nothing but a change in y divided by change in x.', 'start': 672.812, 'duration': 3.962}, {'end': 685.817, 'text': 'Okay, so if you have a line like this and if you want to calculate slope between the two points here, it is 24 divided by 15.', 'start': 676.774, 'duration': 9.043}, {'end': 690.319, 'text': 'But what if you want to calculate the slope at a particular point? Right.', 'start': 685.817, 'duration': 4.502}, {'end': 699.421, 'text': 'Like in our case, if you remember here, we want to calculate a slope at a particular point.', 'start': 690.939, 'duration': 8.482}, {'end': 700.401, 'text': 'Same thing here.', 'start': 699.581, 'duration': 0.82}, {'end': 708.143, 'text': 'Right. So that slope will be nothing but a small change in y divided by a small change in x.', 'start': 700.521, 'duration': 7.622}, {'end': 708.463, 'text': 'All right.', 'start': 708.143, 'duration': 0.32}, {'end': 721.133, 'text': "we'll say as the change in x shrinks to 0, and with it the change in y, that's when you get a more accurate slope.", 'start': 710.089, 'duration': 11.044}, {'end': 727.595, 'text': 'okay. 
so for an equation like x squared, that slope will be 2x.', 'start': 721.133, 'duration': 6.462}], 'summary': 'Explanation of slope with example and formula, demonstrating how slope is calculated at a specific point and for a specific equation.', 'duration': 61.408, 'max_score': 666.187, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw666187.jpg'}, {'end': 861.696, 'src': 'embed', 'start': 816.951, 'weight': 4, 'content': [{'end': 823.829, 'text': 'again, if you want to go in detail, just follow the 3Blue1Brown youtube channel,', 'start': 816.951, 'duration': 6.878}, {'end': 827.711, 'text': 'and that guy is like really good at explaining these concepts in detail.', 'start': 823.829, 'duration': 3.882}, {'end': 829.291, 'text': 'okay, so just to revise the concept.', 'start': 827.711, 'duration': 1.58}, {'end': 840.73, 'text': 'the derivative of this function, uh, will be 3x squared, and this is the notation for your derivative. The derivative of functions,', 'start': 829.291, 'duration': 11.439}, {'end': 851.953, 'text': 'which have dependency on two variables, will be a partial derivative, and the partial derivative of this function with respect to x will be this and with respect to y will be 2y.', 'start': 840.73, 'duration': 11.223}, {'end': 855.534, 'text': 'And this is how you write your derivative.', 'start': 852.313, 'duration': 3.221}, {'end': 860.315, 'text': "This is the notation that you use, right? 
It looks like a d, but it's not.", 'start': 855.574, 'duration': 4.741}, {'end': 861.696, 'text': "It's like a curly d.", 'start': 860.355, 'duration': 1.341}], 'summary': 'The transcript explains the concept of derivatives, including the notation and specific examples.', 'duration': 44.745, 'max_score': 816.951, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw816951.jpg'}], 'start': 666.187, 'title': 'Calculating slope and derivatives', 'summary': 'Covers understanding slope calculation between points and at a specific point, with an example involving the equation x squared and a slope of 2x. it also introduces derivatives using notation d by dx, explains the derivative of x squared as 2x and the slope at x = 2 as 4, covers partial derivatives for functions with two variables, x and y, and provides examples and notations for partial derivatives. additionally, the chapter mentions finding partial derivatives for a mean square error function.', 'chapters': [{'end': 727.595, 'start': 666.187, 'title': 'Understanding slope calculation', 'summary': 'Discusses the concept of slope, illustrating its calculation between two points and at a specific point, and provides an example with the equation x squared, where the slope is 2x.', 'duration': 61.408, 'highlights': ['The concept of slope is explained as a change in y divided by change in x, and the slope between two points is calculated as 24 divided by 15.', 'The calculation of slope at a particular point is detailed, emphasizing that it is a small change in y divided by a small change in x, with greater accuracy achieved as the changes in x and y shrink to 0.', 'The specific example of the equation x squared is used to demonstrate that the slope will be 2x.']}, {'end': 914.967, 'start': 727.595, 'title': 'Derivative and partial derivatives', 'summary': 'Introduces the concept of derivative using the notation d by dx, explaining the derivative of x squared as 2x and the slope at x = 2 as 4. 
it also covers partial derivatives for functions with two variables, x and y, and provides examples and notations for partial derivatives. the chapter concludes with a mention of finding partial derivatives for a mean square error function.', 'duration': 187.372, 'highlights': ['The derivative of x squared is 2x, so the slope at x = 2 is 4.', 'Explanation of partial derivatives for functions with two variables, x and y, and the notations used for partial derivatives.', 'Mention of finding partial derivatives for a mean square error function.']}], 'duration': 248.78, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw666187.jpg', 'highlights': ['The derivative of x squared is 2x, so the slope at x = 2 is 4.', 'The concept of slope is explained as a change in y divided by change in x, and the slope between two points is calculated as 24 divided by 15.', 'The calculation of slope at a particular point is detailed, emphasizing that it is a small change in y divided by a small change in x, with greater accuracy achieved as the changes in x and y shrink to 0.', 'The specific example of the equation x squared is used to demonstrate that the slope will be 2x.', 'Explanation of partial derivatives for functions with two variables, x and y, and the notations used for partial derivatives.', 'Mention of finding partial derivatives for a mean square error function.']}, {'end': 1325.07, 'segs': [{'end': 984.093, 'src': 'heatmap', 'start': 918.631, 'weight': 1, 'content': [{'end': 921.594, 'text': 'So generally for a derivative, you put two here.', 'start': 918.631, 'duration': 2.963}, {'end': 922.655, 'text': 'So two came here.', 'start': 921.774, 'duration': 0.881}, {'end': 929.361, 'text': "And this becomes 2 minus 1, which is 1, which we don't show here.", 'start': 923.596, 'duration': 5.765}, {'end': 934.546, 'text': 'And then once you have the partial derivative, what you have is a direction.', 'start': 929.822, 
'duration': 4.724}, {'end': 936.948, 'text': 'So partial derivatives give you a slope.', 'start': 934.606, 'duration': 2.342}, {'end': 941.773, 'text': 'And then once you have direction, now you need to take a step.', 'start': 937.649, 'duration': 4.124}, {'end': 944.555, 'text': 'So for the step, you use something called a learning rate.', 'start': 941.853, 'duration': 2.702}, {'end': 951.519, 'text': 'So you have the initial value of m and then you subtract this much: your learning rate multiplied by the slope.', 'start': 945.396, 'duration': 6.123}, {'end': 954.32, 'text': 'So for example, you are here on this chart.', 'start': 951.959, 'duration': 2.361}, {'end': 955.921, 'text': 'This is your b1 value.', 'start': 954.82, 'duration': 1.101}, {'end': 967.066, 'text': 'To come up with this b2 value, you will subtract the learning rate multiplied by the partial derivative, which is nothing but a slope here.', 'start': 956.861, 'duration': 10.205}, {'end': 971.404, 'text': "Now let's write Python code to implement gradient descent.", 'start': 968.522, 'duration': 2.882}, {'end': 977.949, 'text': "I'm going to use PyCharm today instead of Jupyter Notebook because I'm planning to use some of the debugging features.", 'start': 972.185, 'duration': 5.764}, {'end': 984.093, 'text': "PyCharm's community edition is freely available to download from the JetBrains website.", 'start': 978.409, 'duration': 5.684}], 'summary': 'Partial derivatives give the slope and the learning rate sets the step size; gradient descent is then implemented in python, using pycharm for its debugging features.', 'duration': 69.046, 'max_score': 918.631, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw918631.jpg'}, {'end': 1047.29, 'src': 'embed', 'start': 1013.288, 'weight': 0, 'content': [{'end': 1025.694, 'text': 'because matrix multiplication is very convenient with this, and numpy arrays also tend to be faster than simple Python lists.', 'start': 1013.288, 'duration': 12.406}, {'end': 1033.423, 'text': 'So the first thing we are 
going to do is start with some values of m current and b current.', 'duration': 7.729}, {'end': 1047.29, 'text': 'So, to revise the theory: you start with some value of m and b and then take baby steps to reach the global minimum.', 'start': 1033.423, 'duration': 13.867}], 'summary': 'NumPy arrays make matrix operations convenient and faster than Python lists. The approach iterates from initial values of m and b toward the global minimum.', 'duration': 34.002, 'max_score': 1013.288, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw1013288.jpg'}], 'start': 915.047, 'title': 'Implementing gradient descent in Python', 'summary': 'Discusses how partial derivatives give the slope and how a learning rate is used to update values, and covers the implementation of gradient descent in Python using PyCharm to adjust the values of m and b over 1000 iterations to reach the global minimum.', 'chapters': [{'end': 967.066, 'start': 915.047, 'title': 'Partial derivatives and learning rate', 'summary': 'Discusses partial derivatives, slopes, and learning rates, emphasizing the use of partial derivatives to determine the slope and of a learning rate to update the values of m and b.', 'duration': 52.019, 'highlights': ['Partial derivatives give you a slope, and then you use a learning rate to update the values of m and b.', 'Using a learning rate, you update the initial value by subtracting the learning rate multiplied by the partial derivative.']}, {'end': 1325.07, 'start': 968.522, 'title': 'Implementing gradient descent in Python', 'summary': 'Covers the implementation of gradient descent in Python using PyCharm instead of Jupyter Notebook, solving for the best-fit line using NumPy arrays, and iterating through 1000 steps to adjust the values of m and b to reach the global minimum.', 'duration': 356.548, 'highlights': ['The chapter covers the implementation of gradient descent in 
Python using PyCharm instead of Jupyter Notebook.', 'Solving for the best-fit line using NumPy arrays to derive the values of m and b.', 'Iterating through 1000 steps to adjust the values of m and b to reach the global minimum.']}], 'duration': 410.023, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw915047.jpg', 'highlights': ['Iterating through 1000 steps to adjust the values of m and b to reach the global minimum.', 'Partial derivatives give you a slope, and then you use a learning rate to update the values of m and b.', 'Using a learning rate, you update the initial value by subtracting the learning rate multiplied by the partial derivative.', 'Solving for the best-fit line using NumPy arrays to derive the values of m and b.', 'The chapter covers the implementation of gradient descent in Python using PyCharm instead of Jupyter Notebook.']}, {'end': 1695.346, 'segs': [{'end': 1352.89, 'src': 'embed', 'start': 1325.07, 'weight': 4, 'content': [{'end': 1334.096, 'text': 'OK. Now, if you want to know how well you are doing, you need to print the cost at each iteration.', 'start': 1325.07, 'duration': 9.026}, {'end': 1336.938, 'text': 'You should be reducing your cost.', 'start': 1334.436, 'duration': 2.502}, {'end': 1342.782, 'text': 'Right. If you remember that 3D diagram, at each step you should be reducing your cost.', 'start': 1337.398, 'duration': 5.384}, {'end': 1349.127, 'text': "If you don't write your program well and the cost starts increasing, you are never going to find the answer.", 'start': 1343.062, 'duration': 6.065}, {'end': 1351.369, 'text': "So let's print the cost.", 'start': 1349.767, 'duration': 1.602}, {'end': 1352.89, 'text': 'So what is our cost?', 'start': 1351.369, 'duration': 1.521}], 'summary': 'Track the cost at each iteration to make sure it keeps decreasing.', 'duration': 27.82, 'max_score': 1325.07, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw1325070.jpg'}, {'end': 1425.206, 'src': 'embed', 'start': 1388.02, 'weight': 1, 'content': [{'end': 1399.724, 'text': "And I'm using a list comprehension here: for each of these values you take the square, which deals with the negative values.", 'start': 1388.02, 'duration': 11.704}, {'end': 1411.976, 'text': 'After that, I print the cost at each iteration, and when I run this, I can now track the cost.', 'start': 1399.724, 'duration': 12.252}, {'end': 1416.379, 'text': 'You can see the cost is reducing in each of these steps.', 'start': 1412.016, 'duration': 4.363}, {'end': 1425.206, 'text': 'Now, how do I know when to stop? I can keep increasing my iterations.', 'start': 1418.121, 'duration': 7.085}], 'summary': 'Uses a list comprehension to square the errors and tracks the cost reduction at each iteration.', 'duration': 37.186, 'max_score': 1388.02, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw1388020.jpg'}, {'end': 1543.823, 'src': 'embed', 'start': 1502.84, 'weight': 3, 'content': [{'end': 1505.361, 'text': 'So it has to be between 0.01 and 0.1.', 'start': 1502.84, 'duration': 2.521}, {'end': 1506.281, 'text': 'So how about 0.09?', 'start': 1505.361, 'duration': 0.92}, {'end': 1521.451, 'text': 'There the cost is also increasing, so maybe 0.08.', 'start': 1506.281, 'duration': 15.17}, {'end': 1522.452, 'text': 'Okay, that looks good.', 'start': 1521.451, 'duration': 1.001}, {'end': 1523.732, 'text': 'Here the cost is reducing.', 'start': 1522.572, 'duration': 1.16}, {'end': 1535.959, 'text': 'All right, so I will stick with this learning rate and increase my iterations to, say, 10,000.', 'start': 1523.752, 'duration': 12.207}, {'end': 1539.801, 'text': 'You can see that I have now reached the optimum value.', 'start': 1535.959, 'duration': 3.842}, {'end': 1543.823, 'text': 'The expected 
value of M was two and B was three.', 'start': 1540.902, 'duration': 2.921}], 'summary': 'Adjusted the learning rate to 0.08 and reached the optimum at 10,000 iterations with m = 2 and b = 3.', 'duration': 40.983, 'max_score': 1502.84, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw1502840.jpg'}, {'end': 1662.615, 'src': 'embed', 'start': 1570.579, 'weight': 0, 'content': [{'end': 1576.721, 'text': 'your cost will more or less stay the same if you are using the correct learning rate.', 'start': 1570.579, 'duration': 6.142}, {'end': 1582.003, 'text': 'So here you see that across all these iterations, your cost remains almost constant.', 'start': 1577.001, 'duration': 5.002}, {'end': 1593.946, 'text': 'So you can use a floating-point comparison between two successive costs and stop whenever the cost is no longer reducing much.', 'start': 1582.363, 'duration': 11.583}, {'end': 1604.279, 'text': 'I also have a visual representation of how my m and b are moving towards the best-fit line.', 'start': 1595.851, 'duration': 8.428}, {'end': 1610.486, 'text': 'So we started here, and then gradually we were moving closer to those points.', 'start': 1604.66, 'duration': 5.826}, {'end': 1614.109, 'text': 'Those red points are not quite visible, but they are here and here.', 'start': 1610.706, 'duration': 3.403}, {'end': 1620.477, 'text': 'And you can see that I am gradually getting closer and closer to those points.', 'start': 1615.07, 'duration': 5.407}, {'end': 1625.282, 'text': 'So you can use this Jupyter Notebook for visualization purposes.', 'start': 1620.977, 'duration': 4.305}, {'end': 1628.947, 'text': "And now we'll move into our exercise section.", 'start': 1625.883, 'duration': 3.064}, {'end': 1639.736, 'text': 'So the problem you have to solve today: you are given the Mathematics and Computer Science test scores for all these students.', 'start': 1629.688, 'duration': 10.048}, {'end': 1644.457, 'text': 'And you have to find the 
correlation between the Math score and Computer Science score.', 'start': 1640.216, 'duration': 4.241}, {'end': 1653.821, 'text': 'So in summary, the Math score, which is column B, is your X, and the Computer Science score, which is column C, is your Y.', 'start': 1644.838, 'duration': 8.983}, {'end': 1662.615, 'text': 'Using these, you will find the values of m and b by applying the gradient descent algorithm,', 'start': 1654.93, 'duration': 7.685}], 'summary': 'With the correct learning rate, the cost levels off as the algorithm converges. Visualizes m and b moving towards the best-fit line. Sets up an exercise that applies gradient descent to Math and Computer Science test scores.', 'duration': 92.036, 'max_score': 1570.579, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw1570579.jpg'}, {'end': 1695.346, 'src': 'heatmap', 'start': 1671.621, 'weight': 0.778, 'content': [{'end': 1681.787, 'text': 'and to compare against the threshold we are going to use the math.isclose function with a tolerance of 1e-20.', 'start': 1671.621, 'duration': 10.166}, {'end': 1686.614, 'text': 'Okay. So if your two costs are within this range,', 'start': 1681.787, 'duration': 4.827}, {'end': 1695.346, 'text': 'then you have to stop your for loop and tell me how many iterations you needed to figure out the values of m and b.', 'start': 1686.614, 'duration': 8.732}], 'summary': 'Uses the math.isclose function with a tolerance of 1e-20 to compare successive costs and determine how many iterations are needed to find m and b.', 'duration': 23.725, 'max_score': 1671.621, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw1671621.jpg'}], 'start': 1325.07, 'title': 'Cost monitoring and gradient descent algorithm', 'summary': 'Emphasizes reducing the cost at each iteration and explains how the cost is calculated. 
It also covers the implementation and optimization of the gradient descent algorithm, showcasing cost-reduction tracking, learning rate tuning, and its application to finding a correlation.', 'chapters': [{'end': 1388.02, 'start': 1325.07, 'title': 'Monitoring cost in an iterative process', 'summary': 'Emphasizes the importance of reducing the cost at each iteration of the program and explains that the cost is 1/n times the sum of squared differences between actual and predicted values.', 'duration': 62.95, 'highlights': ['The cost should be reduced at each iteration, as depicted in a 3D diagram.', 'The calculation of cost involves the formula: 1/n * sum((y - y_predicted)^2).', 'A poorly written program that increases the cost instead of reducing it will never find the answer.']}, {'end': 1695.346, 'start': 1388.02, 'title': 'Approaching gradient descent algorithm', 'summary': 'Covers the implementation and optimization of the gradient descent algorithm to find the optimal values for m and b, showcasing the process of tracking cost reduction, tuning the learning rate, and visualizing the movement of m and b towards the best-fit line, culminating in the application of the algorithm to find the correlation between Math and Computer Science test scores.', 'duration': 307.326, 'highlights': ['The process of tracking cost reduction at each iteration is demonstrated, showing a consistent decrease in cost, ultimately reaching the expected values of M (two) and B (three).', 'The manipulation of learning rates is illustrated with iterative adjustments, showcasing the impact on cost reduction, emphasizing the need to select an appropriate learning rate to avoid overshooting the global minimum.', 'Visual representation of the movement of M and B towards the best-fit line is provided, demonstrating the gradual convergence towards the optimal values through visualization using Jupyter Notebook.', 'The application of the gradient descent algorithm to find the correlation between Math and 
Computer Science test scores is outlined, emphasizing the comparison of costs between iterations to determine convergence and the use of the math.isclose function with a tolerance of 1e-20 to set the stopping condition for the algorithm.']}], 'duration': 370.276, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vsWrXfO3wWw/pics/vsWrXfO3wWw1325070.jpg', 'highlights': ['The application of the gradient descent algorithm to find the correlation between Math and Computer Science test scores is outlined, emphasizing the comparison of costs between iterations to determine convergence and the use of the math.isclose function with a tolerance of 1e-20 to set the stopping condition for the algorithm.', 'The manipulation of learning rates is illustrated with iterative adjustments, showcasing the impact on cost reduction, emphasizing the need to select an appropriate learning rate to avoid overshooting the global minimum.', 'The process of tracking cost reduction at each iteration is demonstrated, showing a consistent decrease in cost, ultimately reaching the expected values of M (two) and B (three).', 'The cost should be reduced at each iteration, as depicted in a 3D diagram.', 'The calculation of cost involves the formula: 1/n * sum((y - y_predicted)^2).', 'A poorly written program that increases the cost instead of reducing it will never find the answer.', 'Visual representation of the movement of M and B towards the best-fit line is provided, demonstrating the gradual convergence towards the optimal values through visualization using Jupyter Notebook.']}], 'highlights': ['Practical example of deriving an equation for predicting home prices.', 'Visualizations demonstrate lines with varying M and B values and the step-wise reduction process.', 'The application of the gradient descent algorithm to find the correlation between Math and Computer Science test scores is outlined, emphasizing the comparison of costs between iterations to 
determine convergence and the use of the math.isclose function with a tolerance of 1e-20 to set the stopping condition for the algorithm.', 'Iterating through 1000 steps to adjust the values of m and b to reach the global minimum.', 'Deriving the best-fit line equation from input and output data points.', 'The process involves gradually adjusting the values of M and B until the minimum error point is reached.', 'The chapter emphasizes the implementation of gradient descent in a Python program.', 'The process of tracking cost reduction at each iteration is demonstrated, showing a consistent decrease in cost, ultimately reaching the expected values of M (two) and B (three).', 'Using a learning rate, you update the initial value by subtracting the learning rate multiplied by the partial derivative.', 'The importance of adjusting step sizes and slopes to reach the global minimum.', 'The initial cost is reduced by taking small steps, gradually decreasing the error until the minimum is reached.', 'Explanation of partial derivatives for functions with two variables, x and y, and the notations used for partial derivatives.', 'The calculation of cost involves the formula: 1/n * sum((y - y_predicted)^2).', 'The concept of slope is explained as a change in y divided by a change in x, and the slope between two points is calculated as 24 divided by 15.', 'The chapter covers the implementation of gradient descent in Python using PyCharm instead of Jupyter Notebook.', 'The derivative of x squared is 2x, so the slope of the function at x = 2 is 4.']}
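The cost, partial derivatives, and update rule summarized above can be written out as follows. This is a sketch built from the summaries: the prediction is y = mx + b and the cost is 1/n * sum((y - y_predicted)^2). Using the symbol \(\alpha\) for the learning rate is a notation choice and does not come from the video. The factor 2 in front, with the exponent dropping to 2 - 1 = 1, comes from the power rule, just as the derivative of x squared is 2x.

$$
\begin{aligned}
\text{MSE}(m, b) &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial\,\text{MSE}}{\partial m} &= -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr) \\
\frac{\partial\,\text{MSE}}{\partial b} &= -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr) \\
m &\leftarrow m - \alpha\,\frac{\partial\,\text{MSE}}{\partial m}, \qquad
b \leftarrow b - \alpha\,\frac{\partial\,\text{MSE}}{\partial b}
\end{aligned}
$$

Each step moves m and b against their slopes, and \(\alpha\) sets the step size. This is why a learning rate that is too large makes the cost grow instead of shrink.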
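The loop described in these segments can be sketched in a few lines of NumPy. The sketch makes three assumptions: it uses batch gradient descent, it starts from m = b = 0, and it stops early once two successive costs pass `math.isclose`, as the exercise asks. The function name `gradient_descent`, its parameters, and the sample points are hypothetical and not the video's exact code. The points lie exactly on y = 2x + 3 so that they match the expected m = 2 and b = 3. A `rel_tol` of 1e-20 is far below the roughly 1e-16 relative precision of a Python float, so it effectively requires the two costs to be identical. `max_iterations` acts as a safety cap.

```python
import math

import numpy as np


def gradient_descent(x, y, learning_rate=0.08, max_iterations=10_000,
                     rel_tol=1e-20, verbose=False):
    """Fit y ~ m*x + b by batch gradient descent on the mean squared error.

    Hypothetical helper for illustration. Returns (m, b, cost, iterations),
    where cost is the last MSE that was computed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    m_curr = b_curr = 0.0  # arbitrary starting point
    previous_cost = None
    cost = None
    for i in range(1, max_iterations + 1):
        y_predicted = m_curr * x + b_curr
        # MSE: (1/n) * sum((y - y_predicted)^2); squaring removes the sign
        cost = float(np.mean((y - y_predicted) ** 2))
        if verbose:
            print(f"m {m_curr}, b {b_curr}, cost {cost}, iteration {i}")
        # Stop once two successive costs are (almost) the same
        if previous_cost is not None and math.isclose(cost, previous_cost, rel_tol=rel_tol):
            return m_curr, b_curr, cost, i
        previous_cost = cost
        # Partial derivatives of the MSE with respect to m and b
        md = -(2 / n) * np.sum(x * (y - y_predicted))
        bd = -(2 / n) * np.sum(y - y_predicted)
        # Step against the slope, scaled by the learning rate
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
    return m_curr, b_curr, cost, max_iterations


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [5, 7, 9, 11, 13]  # hypothetical points on y = 2x + 3
    m, b, cost, iterations = gradient_descent(xs, ys)
    print(f"m={m:.4f}, b={b:.4f}, cost={cost:.3e}, iterations={iterations}")
```

On these sample points, `learning_rate=0.08` converges to m close to 2 and b close to 3, while `0.09` makes the cost grow, which matches the tuning described in the transcript. For the exercise, x would be the Math scores (column B) and y the Computer Science scores (column C) from test_scores.csv. The learning rate would need to be tuned again there by checking that the cost decreases.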