Title: Linear Regression

Description:

Detail:
{'title': 'Linear Regression', 'heatmap': [{'end': 1368.065, 'start': 1343.776, 'weight': 0.819}, {'end': 1666.726, 'start': 1645.208, 'weight': 1}, {'end': 1771.635, 'start': 1679.431, 'weight': 0.712}, {'end': 2095.815, 'start': 2047.293, 'weight': 0.722}], 'summary': 'On linear regression covers supervised learning, fitting linear and polynomial functions, house price prediction, least squares regression, lms and gradient descent methods, and update rules for beta j in the context of machine learning.', 'chapters': [{'end': 504.663, 'segs': [{'end': 80.007, 'src': 'embed', 'start': 45.374, 'weight': 0, 'content': [{'end': 48.337, 'text': 'We have already explained what is regression.', 'start': 45.374, 'duration': 2.963}, {'end': 54.276, 'text': 'So, regression is a supervised learning problem.', 'start': 49.198, 'duration': 5.078}, {'end': 66.533, 'text': 'where you are given examples of instances whose x and y value are given and you have to learn a function.', 'start': 56.983, 'duration': 9.55}, {'end': 71.978, 'text': 'So, that given an unknown x you have to predict y.', 'start': 67.073, 'duration': 4.905}, {'end': 80.007, 'text': 'So, you want a function which predicts given x predicts y and for regression y is continuous.', 'start': 71.978, 'duration': 8.029}], 'summary': 'Regression is a supervised learning problem where a function predicts y given x, and for regression, y is continuous.', 'duration': 34.633, 'max_score': 45.374, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy845374.jpg'}, {'end': 206.106, 'src': 'embed', 'start': 170.482, 'weight': 1, 'content': [{'end': 183.886, 'text': 'for example, the function could be like this, or the function could be like this, or the function could be like this.', 'start': 170.482, 'duration': 13.404}, {'end': 189.571, 'text': 'So, there are different types of functions which are possible.', 'start': 186.468, 'duration': 3.103}, {'end': 201.422, 'text': 'In linear regression, we assume that the function is linear like this blue line here, and out of the different linear functions possible,', 'start': 190.171, 'duration': 11.251}, {'end': 206.106, 'text': 'we want to find one that optimizes certain criteria.', 'start': 201.422, 'duration': 4.684}], 'summary': 'Linear regression aims to optimize a linear function among different options.', 'duration': 35.624, 'max_score': 170.482, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8170482.jpg'}, {'end': 319.039, 'src': 'embed', 'start': 289.621, 'weight': 2, 'content': [{'end': 297.668, 'text': 'And we can take the squared distance from each point to the line and take the sum of squared errors.', 'start': 289.621, 'duration': 8.047}, {'end': 304.694, 'text': 'The sum of squared errors is one measure of error and this is one of the popular measures of error.', 'start': 298.388, 'duration': 6.306}, {'end': 313.737, 'text': 'and we could try to find that function for which this sum of squared errors is minimized,', 'start': 305.354, 'duration': 8.383}, {'end': 319.039, 'text': 'assuming that after we have assumed that this function comes from a particular class.', 'start': 313.737, 'duration': 5.302}], 'summary': 'Measuring error using sum of squared errors, popular for finding minimized function within a specific class.', 'duration': 29.418, 'max_score': 289.621, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8289621.jpg'}, 
{'end': 504.663, 'src': 'embed', 'start': 417.434, 'weight': 3, 'content': [{'end': 426.616, 'text': 'In the second figure, the fit is slightly better with respect to, if you look at the sum of squared errors, this is highest in the first figure,', 'start': 417.434, 'duration': 9.182}, {'end': 433.157, 'text': 'lower in the second figure, lower in the third figure, 0 in the fourth figure.', 'start': 426.616, 'duration': 6.541}, {'end': 443.657, 'text': 'In the fourth figure where we fit a ninth degree polynomial, we are able have the function pass through all the training examples.', 'start': 433.497, 'duration': 10.16}, {'end': 453.479, 'text': 'So the sum of squared error of the training on the training example is 0, but remember what we talked in the last class.', 'start': 444.873, 'duration': 8.606}, {'end': 458.102, 'text': 'what we are interested in is finding the in a.', 'start': 453.479, 'duration': 4.623}, {'end': 467.908, 'text': 'what is interested in is minimizing the error on future examples, or minimizing the error on all examples according to the distribution.', 'start': 458.102, 'duration': 9.806}, {'end': 483.836, 'text': 'fitted the line the red line to all the points, this function does not really correspond to the green line.', 'start': 475.073, 'duration': 8.763}, {'end': 487.117, 'text': 'So, for other points the error may be higher.', 'start': 484.216, 'duration': 2.901}, {'end': 490.938, 'text': 'If you look at the third diagram,', 'start': 487.977, 'duration': 2.961}, {'end': 504.663, 'text': 'this function seems to have fit the points much better and we can expect that within this range the fit to the green line will be smaller.', 'start': 490.938, 'duration': 13.725}], 'summary': 'Fit improves with higher figure, lowest sum of squared error in 4th figure', 'duration': 87.229, 'max_score': 417.434, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8417434.jpg'}], 'start': 18.834, 'title': 'Linear regression and functions', 'summary': 'Introduces linear regression as a supervised learning problem to predict continuous values, discusses the fitting of linear and polynomial functions to data points, and emphasizes the importance of minimizing errors on future examples or according to the distribution.', 'chapters': [{'end': 319.039, 'start': 18.834, 'title': 'Introduction to linear regression', 'summary': 'Introduces linear regression as a supervised learning problem to predict continuous values, discusses the types of functions used, and highlights the optimization of criteria to find the best linear function.', 'duration': 300.205, 'highlights': ['Linear regression as a supervised learning problem to predict continuous values Regression is explained as a supervised learning problem where a function is learned to predict y given x, with the aim of finding a function that predicts y given x, and for regression, y is continuous.', 'Discussion on types of functions used for regression The chapter explains that for regression, there are different types of functions possible, such as linear, and discusses the assumption in linear regression that the function is linear and aims to find one that optimizes certain criteria.', 'Optimization of criteria to find the best linear function The chapter highlights the goal of finding a linear function that optimizes certain criteria, emphasizing the use of the sum of squared errors as a popular measure of error and the attempt to minimize it to find the best function.']}, 
{'end': 504.663, 'start': 319.079, 'title': 'Linear and polynomial functions', 'summary': 'Discusses the fitting of linear and polynomial functions to data points, comparing their sum of squared errors and emphasizing the importance of minimizing errors on future examples or according to the distribution.', 'duration': 185.584, 'highlights': ['The fitting of a ninth degree polynomial to the data points results in a sum of squared error of 0, indicating a perfect fit to the training examples.', 'The linear function in the second figure has a lower sum of squared error compared to the first figure, demonstrating a slightly better fit to the data points.', 'The cubic function in the third diagram seems to fit the points much better, suggesting a smaller error in the fit to the green line within a certain range.', 'The importance of minimizing errors on future examples or according to the distribution is emphasized to ensure a better correspondence between the fitted function and the actual relationship represented by the data points.']}], 'duration': 485.829, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy818834.jpg', 'highlights': ['Regression is explained as a supervised learning problem where a function is learned to predict y given x, with the aim of finding a function that predicts y given x, and for regression, y is continuous.', 'The chapter explains that for regression, there are different types of functions possible, such as linear, and discusses the assumption in linear regression that the function is linear and aims to find one that optimizes certain criteria.', 'The chapter highlights the goal of finding a linear function that optimizes certain criteria, emphasizing the use of the sum of squared errors as a popular measure of error and the attempt to minimize it to find the best function.', 'The fitting of a ninth degree polynomial to the data points results in a sum of squared error of 0, indicating a perfect fit to the training examples.', 'The linear function in the second figure has a lower sum of squared error compared to the first figure, demonstrating a slightly better fit to the data points.', 'The cubic function in the third diagram seems to fit the points much better, suggesting a smaller error in the fit to the green line within a certain range.', 'The importance of minimizing errors on future examples or according to the distribution is emphasized to ensure a better correspondence between the fitted function and the actual relationship represented by the data points.']}, {'end': 843.82, 'segs': [{'end': 574.331, 'src': 'embed', 'start': 507.926, 'weight': 0, 'content': [{'end': 513.289, 'text': 'So, we have to keep this in mind when we try to come up with a function.', 'start': 507.926, 'duration': 5.363}, {'end': 523.428, 'text': 'Now regression models as we said in regression models we can talk about a single variable.', 'start': 515.686, 'duration': 7.742}, {'end': 534.011, 'text': 'So, x can be a single variable then we call it simple regression or x can be multiple variables then we call it multiple regression.', 'start': 524.188, 'duration': 9.823}, {'end': 542.093, 'text': 'Now for each of this the function that we define may be a linear function or a non-linear function.', 'start': 535.011, 'duration': 7.082}, {'end': 553.602, 'text': 'Today we will talk about linear regression where we use a linear function in order to fit the training examples that we have got.', 'start': 542.657, 'duration': 10.945}, 
{'end': 566.008, 'text': 'So, in linear regression we are given an input x and we have to compute y and we have training examples which are given to us.', 'start': 554.843, 'duration': 11.165}, {'end': 574.331, 'text': 'So, we have to find a straight line function.', 'start': 568.569, 'duration': 5.762}], 'summary': 'Introduction to linear regression models and the use of a linear function to fit training examples.', 'duration': 66.405, 'max_score': 507.926, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8507926.jpg'}, {'end': 670.08, 'src': 'embed', 'start': 629.833, 'weight': 3, 'content': [{'end': 642.263, 'text': 'Now, in this function a linear function has certain parameters as you know that a line can be characterized by the slope.', 'start': 629.833, 'duration': 12.43}, {'end': 650.826, 'text': 'and if we extend this line plus intercept with the x axis.', 'start': 645.263, 'duration': 5.563}, {'end': 660.692, 'text': 'So, we can specify a line by these two parameters the slope and the intercept with the x or the y axis.', 'start': 651.587, 'duration': 9.105}, {'end': 670.08, 'text': 'So if we take the equation of the line as y equal to beta 0 plus beta 1 x,', 'start': 661.192, 'duration': 8.888}], 'summary': 'A linear function can be characterized by its slope and intercept with the x or y axis.', 'duration': 40.247, 'max_score': 629.833, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8629833.jpg'}, {'end': 728.586, 'src': 'embed', 'start': 703.364, 'weight': 4, 'content': [{'end': 710.33, 'text': 'Suppose the points are generated so we can assume that there is some noise in the data.', 'start': 703.364, 'duration': 6.966}, {'end': 714.794, 'text': 'because the data may not be, we may not be able to fit the data with a straight line.', 'start': 710.33, 'duration': 4.464}, {'end': 719.991, 'text': 'So, we can assume that there is an error.', 'start': 715.194, 'duration': 4.797}, {'end': 728.586, 'text': 'So, y equal to beta 0 plus beta 1 x plus epsilon this is the function from which the points are generated.', 'start': 720.472, 'duration': 8.114}], 'summary': 'Data points generated with error, y = β0 + β1x + ε', 'duration': 25.222, 'max_score': 703.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8703364.jpg'}, {'end': 843.82, 'src': 'embed', 'start': 798.635, 'weight': 5, 'content': [{'end': 814.925, 'text': 'So beta 0 is the y intercept of this actual line and we can say beta 1 is the slope of the line or we can call that beta 1 is the population slope and epsilon is a random error.', 'start': 798.635, 'duration': 16.29}, {'end': 828.587, 'text': 'Now, let us look at an example.', 'start': 825.945, 'duration': 2.642}, {'end': 842.679, 'text': 'So this is the linear regression model where we have this relation between the variables, which is a linear function y, equal to beta 0 plus beta 1,', 'start': 830.949, 'duration': 11.73}, {'end': 843.82, 'text': 'x plus epsilon.', 'start': 842.679, 'duration': 1.141}], 'summary': 'Linear regression model: y = beta0 + beta1*x + epsilon.', 'duration': 45.185, 'max_score': 798.635, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8798635.jpg'}], 'start': 507.926, 'title': 'Linear regression concepts', 'summary': 'Introduces the basics and model of linear regression, emphasizing its use in fitting training examples and 
explaining parameters, with examples like age from height and house price from area.', 'chapters': [{'end': 628.672, 'start': 507.926, 'title': 'Linear regression basics', 'summary': 'Introduces the concept of linear regression, highlighting its use in fitting training examples with a linear function to predict values like age from height, house price from area, and sensor distance from sensor value.', 'duration': 120.746, 'highlights': ['Linear regression involves fitting a linear function to training examples, such as predicting age from height, house price from area, and sensor distance from sensor value.', 'Regression models can involve a single variable (simple regression) or multiple variables (multiple regression).', 'The function defined in regression models may be linear or non-linear.']}, {'end': 843.82, 'start': 629.833, 'title': 'Linear regression model', 'summary': 'Explains the concept of linear regression, detailing the parameters of a linear function, the generation of data points with error, and the underlying function used for data generation.', 'duration': 213.987, 'highlights': ['The parameters of a line are characterized by the slope and intercept with the x or y axis, represented as y = beta0 + beta1x.', 'The function for generating points is y = beta0 + beta1x + epsilon, where epsilon represents the random error associated with the data.', 'The linear regression model establishes a relation between the variables through the equation y = beta0 + beta1x + epsilon.']}], 'duration': 335.894, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8507926.jpg', 'highlights': ['Linear regression involves fitting a linear function to training examples, such as predicting age from height, house price from area, and sensor distance from sensor value.', 'The function defined in regression models may be linear or non-linear.', 'Regression models can involve a single variable (simple regression) or multiple variables (multiple regression).', 'The parameters of a line are characterized by the slope and intercept with the x or y axis, represented as y = beta0 + beta1x.', 'The function for generating points is y = beta0 + beta1x + epsilon, where epsilon represents the random error associated with the data.', 'The linear regression model establishes a relation between the variables through the equation y = beta0 + beta1x + epsilon.']}, {'end': 1356.466, 'segs': [{'end': 1107.012, 'src': 'embed', 'start': 1058.826, 'weight': 0, 'content': [{'end': 1068.508, 'text': 'Given the data points, we are trying to find out the equation of the line that is an estimated value of each of these parameters.', 'start': 1058.826, 'duration': 9.682}, {'end': 1079.19, 'text': 'So, we are trying to come up with beta 0 hat, beta 1 hat, beta 2 hat, beta p hat.', 'start': 1068.888, 'duration': 10.302}, {'end': 1086.691, 'text': 'So, that the equation that we get is like this.', 'start': 1081.15, 'duration': 5.541}, {'end': 1107.012, 'text': 'So, this is the equation that we are trying to come up with as an estimator for the actual target function.', 'start': 1092.964, 'duration': 14.048}], 'summary': 'Finding equation of line using estimated parameters.', 'duration': 48.186, 'max_score': 1058.826, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81058826.jpg'}, {'end': 1201.254, 'src': 'embed', 'start': 1141.762, 'weight': 1, 'content': [{'end': 1155.187, 'text': 'So, if you want to minimize the sum of 
squared errors and based on that we come up with values of beta 0 hat, beta 1 hat, beta 2 hat, beta p hat.', 'start': 1141.762, 'duration': 13.425}, {'end': 1158.909, 'text': 'this particular equation is called the least square line.', 'start': 1155.187, 'duration': 3.722}, {'end': 1167.752, 'text': 'So, we will see that given the training points how we can come up with the least square line.', 'start': 1159.789, 'duration': 7.963}, {'end': 1178.079, 'text': 'So, let me just rub the board.', 'start': 1176.628, 'duration': 1.451}, {'end': 1201.254, 'text': 'Now the data that we have may not form a perfect line.', 'start': 1197.512, 'duration': 3.742}], 'summary': 'Minimize sum of squared errors to find least square line for training points.', 'duration': 59.492, 'max_score': 1141.762, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81141762.jpg'}, {'end': 1356.466, 'src': 'embed', 'start': 1315.244, 'weight': 2, 'content': [{'end': 1320.969, 'text': 'Further we will make the assumption that the errors are independent.', 'start': 1315.244, 'duration': 5.725}, {'end': 1336.89, 'text': 'that is epsilon 1, epsilon 2, epsilon n they are independent of each other and we can also assume that these errors are normally distributed.', 'start': 1324.56, 'duration': 12.33}, {'end': 1349.501, 'text': 'They are normally distributed with mean 0 and standard deviation sigma e.', 'start': 1343.776, 'duration': 5.725}, {'end': 1356.466, 'text': 'So, this sort of noise is called Gaussian noise or white noise.', 'start': 1351.353, 'duration': 5.113}], 'summary': 'Errors are assumed to be independent, normally distributed with mean 0 and standard deviation sigma e.', 'duration': 41.222, 'max_score': 1315.244, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81315244.jpg'}], 'start': 844.401, 'title': 'House price prediction with linear regression', 'summary': 'Explains the process of predicting house selling prices using linear regression by fitting a line to 15 examples of house size and selling price. 
it also introduces the concept of linear regression with multiple variables, emphasizing the process of finding the equation of the line and optimizing the values of beta 0, beta 1, and beta p.', 'chapters': [{'end': 932.169, 'start': 844.401, 'title': 'House price prediction with linear regression', 'summary': 'Explains the process of predicting house selling prices using linear regression by finding the equation of a line to fit 15 examples of house size and selling price.', 'duration': 87.768, 'highlights': ['By plotting the house size against the selling price, we can try to find a fit to this function to predict selling prices (15 examples provided).', 'The objective is to predict the selling price given the area of the house, aiming to find the equation of a line for simple linear regression.']}, {'end': 1356.466, 'start': 932.169, 'title': 'Linear regression and least square line', 'summary': 'Introduces the concept of linear regression with multiple variables and the least square line, emphasizing the process of finding the equation of the line and the assumptions made about the errors, with a focus on minimizing the sum of squared errors and optimizing the values of beta 0, beta 1, and beta p.', 'duration': 424.297, 'highlights': ['The process of finding the equation of the line involves estimating the values of beta 0, beta 1, beta 2, beta p, which form the estimator for the actual target function. The estimation involves finding the equation of the line by determining the values of beta 0 hat, beta 1 hat, beta 2 hat, beta p hat, which serves as an estimator for the actual target function.', 'The chapter emphasizes the optimization process to minimize the sum of squared errors and derive the least square line, which aims to find the values of beta 0 hat, beta 1 hat, beta 2 hat, beta p hat, thereby introducing the concept of the least square line. The optimization process focuses on minimizing the sum of squared errors to derive the least square line, which involves finding the values of beta 0 hat, beta 1 hat, beta 2 hat, beta p hat, representing the concept of the least square line.', 'The assumptions made about the errors include that the errors have a mean of 0, a standard deviation of sigma epsilon, are independent, and follow a normal distribution, specifically Gaussian noise or white noise. 
The assumptions about the errors involve them having a mean of 0, a standard deviation of sigma epsilon, independence from each other, and following a normal distribution, particularly Gaussian or white noise.']}], 'duration': 512.065, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy8844401.jpg', 'highlights': ['The process involves estimating the values of beta 0, beta 1, beta 2, beta p, which form the estimator for the actual target function.', 'The optimization process focuses on minimizing the sum of squared errors to derive the least square line.', 'The assumptions about the errors involve them having a mean of 0, a standard deviation of sigma epsilon, independence from each other, and following a normal distribution.']}, {'end': 1666.726, 'segs': [{'end': 1426.714, 'src': 'embed', 'start': 1362.481, 'weight': 0, 'content': [{'end': 1368.065, 'text': 'As we said, that now, given the training points the blue are the training points.', 'start': 1362.481, 'duration': 5.584}, {'end': 1379.092, 'text': 'in this picture we have come up with a line and for that line we can find out what is the sum of squared errors with respect to the blue training points.', 'start': 1368.065, 'duration': 11.027}, {'end': 1388.299, 'text': 'Out of all possible lines, so the different lines are parameterized by the values of beta 0 and beta 1.', 'start': 1380.073, 'duration': 8.226}, {'end': 1392.931, 'text': 'For each pair of values beta 0 beta 1 will have one line.', 'start': 1388.299, 'duration': 4.632}, {'end': 1398.779, 'text': 'We want to find that line for which the sum of squared errors is minimum.', 'start': 1393.212, 'duration': 5.567}, {'end': 1404.684, 'text': 'So, that is called the least squares regression line.', 'start': 1400.822, 'duration': 3.862}, {'end': 1419.17, 'text': 'The least squares regression line gives the unique line such that the sum of the squared vertical distances between the data points and the line is the smallest possible.', 'start': 1405.344, 'duration': 13.826}, {'end': 1426.714, 'text': 'So, we will find out how to choose this line and that is the algorithm that we will develop.', 'start': 1419.971, 'duration': 6.743}], 'summary': 'Finding the least squares regression line for minimum sum of squared errors.', 'duration': 64.233, 'max_score': 1362.481, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81362481.jpg'}, {'end': 1551.13, 'src': 'embed', 'start': 1518.567, 'weight': 2, 'content': [{'end': 1529.317, 'text': 'So this is the residue that we have when we make this assumption of the values of beta 0 and beta 1, and we want to find beta 0 and beta 1,', 'start': 1518.567, 'duration': 10.75}, {'end': 1532.559, 'text': 'so that the sum of the squares errors in minimum.', 'start': 1529.317, 'duration': 3.242}, {'end': 1537.544, 'text': 'Now, we have to find out how to learn the parameters.', 'start': 1533.981, 'duration': 3.563}, {'end': 1551.13, 'text': 'So, how to learn the parameters? 
Here the parameters are beta 0 and beta 1 and we want to find out the estimated value of these parameters.', 'start': 1541.741, 'duration': 9.389}], 'summary': 'Finding beta 0 and beta 1 to minimize sum of squares errors and learn parameters', 'duration': 32.563, 'max_score': 1518.567, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81518567.jpg'}, {'end': 1666.726, 'src': 'heatmap', 'start': 1602.152, 'weight': 3, 'content': [{'end': 1612.032, 'text': 'we can use a standard procedure of taking partial derivative of the objective function that the one that we have written on board here we want to.', 'start': 1602.152, 'duration': 9.88}, {'end': 1620.699, 'text': 'if you take the partial derivative of this function with respect to the coefficients beta 0 and beta 1 and set this to 0 by solving,', 'start': 1612.032, 'duration': 8.667}, {'end': 1625.263, 'text': 'we will get the values of beta 0 and beta 1 as given in this slide.', 'start': 1620.699, 'duration': 4.564}, {'end': 1634.831, 'text': 'So, beta 0 is sigma y minus beta 1 sigma x divided by n and beta 1 is n sigma x y minus sigma x sigma y etcetera.', 'start': 1625.783, 'duration': 9.048}, {'end': 1642.262, 'text': 'So, this derivation I am not doing in the class, but this is what you can do and this is the closed form solution that you can get.', 'start': 1635.271, 'duration': 6.991}, {'end': 1647.55, 'text': 'Now, let us come to multiple linear regression.', 'start': 1645.208, 'duration': 2.342}, {'end': 1656.938, 'text': 'In multiple linear regression you have y equal to beta 0 plus beta 1 x 1 plus beta n x n or you can write h x equal to sigma beta i x i.', 'start': 1647.63, 'duration': 9.308}, {'end': 1662.942, 'text': 'You can here also find a closed form solution and the closed form solution will involve matrix operations.', 'start': 1656.938, 'duration': 6.004}, {'end': 1666.726, 'text': 'So the matrix, inversion, etcetera, which are involved in this,', 'start': 1663.003, 'duration': 3.723}], 'summary': 'Using partial derivatives, we can find closed form solutions for beta values in linear and multiple linear regression models.', 'duration': 45.398, 'max_score': 1602.152, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81602152.jpg'}], 'start': 1362.481, 'title': 'Regression analysis', 'summary': 'Covers finding the least squares regression line and regression parameters beta 0 and beta 1, extending to multiple linear regression, with a focus on minimizing errors and employing iterative algorithms.', 'chapters': [{'end': 1516.95, 'start': 1362.481, 'title': 'Least squares regression line', 'summary': 'Discusses the concept of finding the least squares regression line, which minimizes the sum of squared errors between data points and the line, and the algorithm to choose this line given the training points.', 'duration': 154.469, 'highlights': ['The least squares regression line gives the unique line such that the sum of the squared vertical distances between the data points and the line is the smallest possible. It explains that the least squares regression line minimizes the sum of squared errors between data points and the line, ensuring the smallest possible vertical distances.', 'We want to minimize the sum of squared errors, given the data points x, i, y, i. 
The objective is to minimize the sum of squared errors between the data points, x, i, and y, i.', 'For each pair of values beta 0 beta 1 will have one line. It mentions that each pair of values beta 0 and beta 1 corresponds to one line, indicating the parameterization of different lines.']}, {'end': 1666.726, 'start': 1518.567, 'title': 'Finding regression parameters', 'summary': 'Discusses the process of finding regression parameters beta 0 and beta 1, including closed form solutions and iterative algorithms for minimization, and extends the concept to multiple linear regression involving matrix operations.', 'duration': 148.159, 'highlights': ['The chapter explains the process of finding beta 0 and beta 1 to minimize the sum of squares errors, offering both closed form solutions and iterative algorithms for learning the parameters.', 'It introduces the closed form solution for the two-dimensional problem where beta 0 is derived as sigma y minus beta 1 sigma x divided by n and beta 1 is n sigma x y minus sigma x sigma y, and further extends to multiple linear regression involving matrix operations for closed form solutions.', 'It emphasizes the use of partial derivatives to solve for the values of beta 0 and beta 1 in the objective function, providing the closed form solution for beta 0 and beta 1 as derived equations involving sigma, n, and x and y variables.']}], 'duration': 304.245, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81362481.jpg', 'highlights': ['The least squares regression line minimizes the sum of squared errors between data points and the line.', 'Each pair of values beta 0 and beta 1 corresponds to one line, indicating the parameterization of different lines.', 'The chapter explains the process of finding beta 0 and beta 1 to minimize the sum of squares errors, offering both closed form solutions and iterative algorithms for learning the parameters.', 'It introduces the closed form solution for the two-dimensional problem where beta 0 is derived as sigma y minus beta 1 sigma x divided by n and beta 1 is n sigma x y minus sigma x sigma y, and further extends to multiple linear regression involving matrix operations for closed form solutions.', 'It emphasizes the use of partial derivatives to solve for the values of beta 0 and beta 1 in the objective function, providing the closed form solution for beta 0 and beta 1 as derived equations involving sigma, n, and x and y variables.']}, {'end': 1951.627, 'segs': [{'end': 1771.635, 'src': 'heatmap', 'start': 1679.431, 'weight': 0.712, 'content': [{'end': 1687.433, 'text': 'One popular well known method is the using the delta rule which is also called the LMS method.', 'start': 1679.431, 'duration': 8.002}, {'end': 1697.054, 'text': 'It is the same method LMS which stands for least minimum slope.', 'start': 1687.773, 'duration': 9.281}, {'end': 1699.455, 'text': 'So, this LMS method can be used.', 'start': 1697.215, 'duration': 2.24}, {'end': 1711.314, 'text': 'This LMS method or the delta method will update the beta 0, beta 1 etcetera weight values to minimize the sum of squared errors.', 'start': 1701.912, 'duration': 9.402}, {'end': 1713.955, 'text': 'So, let us look at how it is done.', 'start': 1712.334, 'duration': 1.621}, {'end': 1719.236, 'text': 'So, we assume that this is the form of the function and we want to learn the parameters theta.', 'start': 1714.395, 'duration': 4.841}, {'end': 1722.417, 'text': 'Here the parameters are beta 0, beta 1, beta 2 
etcetera.', 'start': 1719.336, 'duration': 3.081}, {'end': 1724.937, 'text': 'So, we want to learn a function h x.', 'start': 1723.257, 'duration': 1.68}, {'end': 1729.099, 'text': 'So, we want to learn y i.', 'start': 1727.458, 'duration': 1.641}, {'end': 1739.863, 'text': 'So, we want to learn h x which is beta 0 plus sigma beta i x i and we define a cost function j theta based on minimizing the sum of squared errors.', 'start': 1729.099, 'duration': 10.764}, {'end': 1747.285, 'text': 'So, j theta is sigma h x minus y whole square over all the training examples.', 'start': 1739.943, 'duration': 7.342}, {'end': 1749.466, 'text': 'So, this is the cost function that we had written.', 'start': 1747.465, 'duration': 2.001}, {'end': 1757.349, 'text': 'So this is the cost function which we want to minimize, and this function is a parameter of beta 0 and beta 1,', 'start': 1749.946, 'duration': 7.403}, {'end': 1760.711, 'text': 'and we want to find beta 0 and beta 1 to minimize this function.', 'start': 1757.349, 'duration': 3.362}, {'end': 1764.973, 'text': 'Now in the LMS algorithm.', 'start': 1763.552, 'duration': 1.421}, {'end': 1771.635, 'text': 'what we do is that we start with the initial value of theta, that is, initial value of beta 0, beta 1,', 'start': 1764.973, 'duration': 6.662}], 'summary': 'The lms method updates weight values to minimize sum of squared errors in function learning.', 'duration': 92.204, 'max_score': 1679.431, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81679431.jpg'}, {'end': 1757.349, 'src': 'embed', 'start': 1701.912, 'weight': 0, 'content': [{'end': 1711.314, 'text': 'This LMS method or the delta method will update the beta 0, beta 1 etcetera weight values to minimize the sum of squared errors.', 'start': 1701.912, 'duration': 9.402}, {'end': 1713.955, 'text': 'So, let us look at how it is done.', 'start': 1712.334, 'duration': 1.621}, {'end': 1719.236, 'text': 'So, we assume that this is the form of the function and we want to learn the parameters theta.', 'start': 1714.395, 'duration': 4.841}, {'end': 1722.417, 'text': 'Here the parameters are beta 0, beta 1, beta 2 etcetera.', 'start': 1719.336, 'duration': 3.081}, {'end': 1724.937, 'text': 'So, we want to learn a function h x.', 'start': 1723.257, 'duration': 1.68}, {'end': 1729.099, 'text': 'So, we want to learn y i.', 'start': 1727.458, 'duration': 1.641}, {'end': 1739.863, 'text': 'So, we want to learn h x which is beta 0 plus sigma beta i x i and we define a cost function j theta based on minimizing the sum of squared errors.', 'start': 1729.099, 'duration': 10.764}, {'end': 1747.285, 'text': 'So, j theta is sigma h x minus y whole square over all the training examples.', 'start': 1739.943, 'duration': 7.342}, {'end': 1749.466, 'text': 'So, this is the cost function that we had written.', 'start': 1747.465, 'duration': 2.001}, {'end': 1757.349, 'text': 'So this is the cost function which we want to minimize, and this function is a parameter of beta 0 and beta 1,', 'start': 1749.946, 'duration': 7.403}], 'summary': 'Using lms or delta method to minimize sum of squared errors in parameter learning.', 'duration': 55.437, 'max_score': 1701.912, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81701912.jpg'}, {'end': 1844.855, 'src': 'embed', 'start': 1815.456, 'weight': 1, 'content': [{'end': 1822.118, 'text': 'and if we keep on reducing, this will ultimately reach the minima of the function.', 
'start': 1815.456, 'duration': 6.662}, {'end': 1827.159, 'text': 'So, the method that we follow is called a gradient descent method.', 'start': 1822.858, 'duration': 4.301}, {'end': 1828.86, 'text': 'Suppose. this is a function.', 'start': 1827.199, 'duration': 1.661}, {'end': 1836.001, 'text': 'we start with some place here and then what we do is that we find the gradient at this point.', 'start': 1828.86, 'duration': 7.141}, {'end': 1838.262, 'text': 'we find the gradient of the curve at this point.', 'start': 1836.001, 'duration': 2.261}, {'end': 1844.855, 'text': 'and then we take a small step in the negative direction of the gradient.', 'start': 1839.733, 'duration': 5.122}], 'summary': 'Using gradient descent, we reduce function to reach the minima.', 'duration': 29.399, 'max_score': 1815.456, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81815456.jpg'}, {'end': 1898.767, 'src': 'embed', 'start': 1870.221, 'weight': 3, 'content': [{'end': 1877.363, 'text': 'And we want to take a small step in the negative direction of the gradient and alpha here is the step size.', 'start': 1870.221, 'duration': 7.142}, {'end': 1885.304, 'text': 'Alpha is small and alpha determines if alpha is larger we take larger steps, if alpha is small we take smaller steps.', 'start': 1877.583, 'duration': 7.721}, {'end': 1898.767, 'text': 'So this is the initial value of beta and we make a small change to beta in the negative direction of the partial derivative of j theta with respect to beta.', 'start': 1886.124, 'duration': 12.643}], 'summary': 'Taking small steps in negative gradient direction with alpha as step size.', 'duration': 28.546, 'max_score': 1870.221, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81870221.jpg'}], 'start': 1666.726, 'title': 'Lms and gradient descent methods', 'summary': 'Discusses the lms method for updating weights and the gradient descent method for reaching minima of a function. it covers learning parameters, cost function, iterative algorithms, and step size determination for convergence.', 'chapters': [{'end': 1815.456, 'start': 1666.726, 'title': 'Lms method for weight update', 'summary': 'Discusses the lms method for updating weights iteratively to minimize the sum of squared errors in the context of learning parameters theta and the function h(x), using the cost function j(theta) and iterative algorithm.', 'duration': 148.73, 'highlights': ['The LMS method updates the weight values to minimize the sum of squared errors The LMS method or the delta method updates the beta 0, beta 1, etc. 
weight values to minimize the sum of squared errors in the cost function j(theta).', 'Cost function j(theta) is based on minimizing the sum of squared errors The cost function j(theta) is defined based on minimizing the sum of squared errors, represented as the sum of (h(x) - y)^2 over all the training examples.', 'The algorithm starts with initial values of beta 0 and beta 1, then updates them to reduce the sum of squared errors In the LMS algorithm, the initial values of beta 0 and beta 1 are updated using training examples to reduce the sum of squared errors in the cost function j(theta).']}, {'end': 1951.627, 'start': 1815.456, 'title': 'Gradient descent method', 'summary': 'Introduces the gradient descent method, which involves taking small steps in the negative direction of the gradient to reach the minima of a function, with alpha determining the step size and the method eventually converging to the global minima of a convex quadratic function.', 'duration': 136.171, 'highlights': ['The gradient descent method involves taking small steps in the negative direction of the gradient to reach the minima of a function, with alpha determining the step size and the method eventually converging to the global minima of a convex quadratic function.', 'Alpha determines the step size in the gradient descent method, with larger alpha resulting in larger steps and smaller alpha resulting in smaller steps.', 'The LMS update rule is used when there is a single training example to work out del del theta j theta.']}], 'duration': 284.901, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81666726.jpg', 'highlights': ['The LMS method updates the weight values to minimize the sum of squared errors', 'The gradient descent method involves taking small steps in the negative direction of the gradient to reach the minima of a function, with alpha determining the step size and the method eventually converging to the global minima of a convex quadratic function.', 'Cost function j(theta) is based on minimizing the sum of squared errors', 'Alpha determines the step size in the gradient descent method, with larger alpha resulting in larger steps and smaller alpha resulting in smaller steps.']}, {'end': 2128.137, 'segs': [{'end': 2035.051, 'src': 'embed', 'start': 2007.073, 'weight': 2, 'content': [{'end': 2018.939, 'text': 'we get the update rule as beta j equal to earlier value of beta j plus alpha times the y minus h x i times the value of x j.', 'start': 2007.073, 'duration': 11.866}, {'end': 2020.98, 'text': 'So this is the delta rule.', 'start': 2018.939, 'duration': 2.041}, {'end': 2027.845, 'text': 'and this delta rule you can easily interpret is that if y and h x are the same, you do not change beta.', 'start': 2020.98, 'duration': 6.865}, {'end': 2033.649, 'text': 'but if they are different suppose y is greater than h x then what you have to do?', 'start': 2027.845, 'duration': 5.804}, {'end': 2035.051, 'text': 'you have to increase beta.', 'start': 2033.649, 'duration': 1.402}], 'summary': 'Update rule for beta: beta_j = beta_j + alpha * (y - h_x_i) * x_j', 'duration': 27.978, 'max_score': 2007.073, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy82007073.jpg'}, {'end': 2116.209, 'src': 'heatmap', 'start': 2047.293, 'weight': 0, 'content': [{'end': 2053.635, 'text': 'So, you can do this for a single training example or you can update for all the training examples at a time.', 'start': 2047.293, 
'duration': 6.342}, {'end': 2065.257, 'text': 'So you find out summation of this and then you take all the training examples and update all of them together.', 'start': 2054.455, 'duration': 10.802}, {'end': 2073.739, 'text': 'this is called batch gradient descent, or you can update based on a single example, which is called incremental gradient descent.', 'start': 2065.257, 'duration': 8.482}, {'end': 2089.313, 'text': 'So batch gradient descent takes the right steps in the right direction, but it is very slow because you make one step after processing all the inputs.', 'start': 2075.179, 'duration': 14.134}, {'end': 2095.815, 'text': 'So what is usually done is that we use stochastic gradient descent,', 'start': 2091.092, 'duration': 4.723}, {'end': 2102.46, 'text': 'where we take examples one at a time and make local make changes based on that training example.', 'start': 2095.815, 'duration': 6.645}, {'end': 2110.545, 'text': 'If you do that the entire process is fast and it has also been shown that stochastic gradient descent has very nice properties.', 'start': 2102.54, 'duration': 8.005}, {'end': 2116.209, 'text': 'So, that it does converge and between batch gradient descent and stochastic gradient descent.', 'start': 2110.885, 'duration': 5.324}], 'summary': 'Batch gradient descent is slow, stochastic gradient descent is fast and has favorable properties.', 'duration': 50.952, 'max_score': 2047.293, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy82047293.jpg'}], 'start': 1952.508, 'title': 'Gradient descent and update rules', 'summary': 'Discusses the update rules for beta j, including the delta rule, and compares batch gradient descent, stochastic gradient descent, and mini batch gradient descent for parameter updates in the context of machine learning.', 'chapters': [{'end': 2128.137, 'start': 1952.508, 'title': 'Gradient descent and update rules', 'summary': 'Discusses the update rules for beta j, including the delta rule, and compares batch gradient descent, stochastic gradient descent, and mini batch gradient descent for parameter updates in the context of machine learning.', 'duration': 175.629, 'highlights': ['The delta rule for a single training example is beta j = earlier value of beta j + alpha * (y - h(xi)) * xj, where y and h(xi) dictate whether to increase or decrease beta.', 'Batch gradient descent processes all inputs in one step, making it slow, while stochastic gradient descent updates parameters based on one example at a time, proving to be fast and convergent with nice properties.', 'Mini batch gradient descent involves taking a few examples at a time to determine parameter updates, offering a middle ground between batch and stochastic gradient descent.']}], 'duration': 175.629, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8PJ24SrQqy8/pics/8PJ24SrQqy81952508.jpg', 'highlights': ['Stochastic gradient descent updates parameters based on one example at a time, proving to be fast and convergent with nice properties.', 'Mini batch gradient descent involves taking a few examples at a time to determine parameter updates, offering a middle ground between batch and stochastic gradient descent.', 'The delta rule for a single training example is beta j = earlier value of beta j + alpha * (y - h(xi)) * xj, where y and h(xi) dictate whether to increase or decrease beta.']}], 'highlights': ['The fitting of a ninth degree polynomial to the data points results in a sum of squared error of 0, 
indicating a perfect fit to the training examples.', 'The cubic function in the third diagram seems to fit the points much better, suggesting a smaller error in the fit to the green line within a certain range.', 'The linear function in the second figure has a lower sum of squared error compared to the first figure, demonstrating a slightly better fit to the data points.', 'The importance of minimizing errors on future examples or according to the distribution is emphasized to ensure a better correspondence between the fitted function and the actual relationship represented by the data points.', 'The function for generating points is y = beta0 + beta1x + epsilon, where epsilon represents the random error associated with the data.', 'The linear regression model establishes a relation between the variables through the equation y = beta0 + beta1x + epsilon.', 'The process involves estimating the values of beta 0, beta 1, beta 2, beta p, which form the estimator for the actual target function.', 'The optimization process focuses on minimizing the sum of squared errors to derive the least square line.', 'The assumptions about the errors involve them having a mean of 0, a standard deviation of sigma epsilon, independence from each other, and following a normal distribution.', 'The least squares regression line minimizes the sum of squared errors between data points and the line.', 'The chapter explains the process of finding beta 0 and beta 1 to minimize the sum of squares errors, offering both closed form solutions and iterative algorithms for learning the parameters.', 'It introduces the closed form solution for the two-dimensional problem where beta 0 is derived as sigma y minus beta 1 sigma x divided by n and beta 1 is n sigma x y minus sigma x sigma y, and further extends to multiple linear regression involving matrix operations for closed form solutions.', 'The LMS method updates the weight values to minimize the sum of squared errors', 'The gradient descent method involves taking small steps in the negative direction of the gradient to reach the minima of a function, with alpha determining the step size and the method eventually converging to the global minima of a convex quadratic function.', 'Cost function j(theta) is based on minimizing the sum of squared errors', 'Alpha determines the step size in the gradient descent method, with larger alpha resulting in larger steps and smaller alpha resulting in smaller steps.', 'Stochastic gradient descent updates parameters based on one example at a time, proving to be fast and convergent with nice properties.', 'Mini batch gradient descent involves taking a few examples at a time to determine parameter updates, offering a middle ground between batch and stochastic gradient descent.', 'The delta rule for a single training example is beta j = earlier value of beta j + alpha * (y - h(xi)) * xj, where y and h(xi) dictate whether to increase or decrease beta.']}
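
To make the closed-form least squares solution quoted in the 'Finding regression parameters' chapter concrete, here is a minimal NumPy sketch. It is an editor's illustration, not code from the lecture: the simple-regression formula beta1 = (n*Σxy − Σx*Σy) / (n*Σx² − (Σx)²) completes the transcript's "etcetera" with the standard denominator, the normal-equation form beta_hat = (XᵀX)⁻¹Xᵀy is the matrix solution mentioned for multiple linear regression, and the house-size/price numbers are made up purely for illustration.

```python
# Minimal sketch of the closed-form least squares fit discussed above.
# Assumptions: NumPy is available; the toy house-size/price data below are
# hypothetical and not taken from the lecture.
import numpy as np

def simple_least_squares(x, y):
    """Closed-form estimates (beta0_hat, beta1_hat) for y = beta0 + beta1*x + eps."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    beta1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    beta0 = (np.sum(y) - beta1 * np.sum(x)) / n
    return beta0, beta1

def multiple_least_squares(X, y):
    """Normal-equation solution beta_hat = (X^T X)^{-1} X^T y for multiple regression.

    A column of ones is prepended so beta_hat[0] plays the role of beta0 (the intercept).
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    # Solving the linear system is preferred over forming an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

if __name__ == "__main__":
    # Hypothetical (house size, selling price) pairs, for illustration only.
    size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
    price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
    b0, b1 = simple_least_squares(size, price)
    print(f"price ~ {b0:.2f} + {b1:.4f} * size")
    print("normal equations give:", multiple_least_squares(np.array(size).reshape(-1, 1), price))
```

For the single-variable case both routines should recover the same least squares line, which is why the lecture treats the matrix form as the natural generalization.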
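
The LMS / delta rule and the batch versus stochastic (incremental) gradient descent variants discussed in the last two chapters can be sketched the same way. Again this is a hedged illustration rather than the lecturer's code: the update beta_j ← beta_j + alpha*(y_i − h(x_i))*x_ij follows the transcript, the constant factor from differentiating the squared error cost is assumed to be absorbed into the step size alpha, and the synthetic data and step sizes are chosen only so the loops converge.

```python
# Sketch of the LMS / delta-rule update, in batch and stochastic forms.
# Assumptions: NumPy is available; the step sizes, epoch counts, and synthetic
# data below are illustrative choices, not values from the lecture.
import numpy as np

def h(beta, x):
    """h(x) = beta_0 + sum_j beta_j * x_j, with x already carrying a leading 1."""
    return x @ beta

def add_intercept(X):
    """Prepend a column of ones so beta[0] plays the role of beta_0."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def batch_gradient_descent(X, y, alpha=0.005, epochs=2000):
    """One update per pass over *all* examples (steps in the true gradient direction, but slow)."""
    X, y = add_intercept(X), np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(epochs):
        residuals = y - X @ beta           # y_i - h(x_i) for every training example
        beta += alpha * (X.T @ residuals)  # beta_j += alpha * sum_i (y_i - h(x_i)) * x_ij
    return beta

def stochastic_gradient_descent(X, y, alpha=0.005, epochs=100):
    """Update after every single example (the incremental / stochastic variant)."""
    X, y = add_intercept(X), np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            beta += alpha * (yi - h(beta, xi)) * xi  # the delta rule for one example
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=(100, 1))                  # inputs kept on a small scale
    y = 3.0 + 2.0 * x[:, 0] + rng.normal(0.0, 0.1, size=100)   # y = beta0 + beta1*x + Gaussian noise
    print("batch     :", batch_gradient_descent(x, y))         # both should approach [3.0, 2.0]
    print("stochastic:", stochastic_gradient_descent(x, y))
```

Batch gradient descent makes one step per pass over the whole training set, while the stochastic version updates after every example; that is the trade-off the lecture describes, with mini-batch updates sitting between the two.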