title

Lecture 13 - Validation

description

Validation - Taking a peek out of sample. Model selection and data contamination. Cross validation. Lecture 13 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
This lecture was recorded on May 15, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.
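The validation scheme this lecture covers (hold out K of the N examples, measure the error of the learned hypothesis on them to get an unbiased estimate of the out-of-sample error E_out, with variance shrinking roughly like sigma^2/K, and a rule of thumb of reserving 1/5 of the data for validation) can be sketched in a few lines of Python. The target, hypothesis, and dataset below are illustrative stand-ins, not the lecture's own example:

```python
import random

random.seed(0)

# Toy setup (illustrative, not from the lecture): noisy target y = x^2,
# and a hypothesis g that we pretend was already learned on the training set.
def g(x):
    return x * x

N = 100
data = [(x, x * x + random.gauss(0.0, 0.1))
        for x in (random.uniform(-1.0, 1.0) for _ in range(N))]

# Rule of thumb from the lecture: hold out about 1/5 of the data for validation.
K = N // 5
random.shuffle(data)
val_set, train_set = data[:K], data[K:]

# E_val: mean squared error on the K held-out points. Since these points were
# never used for training, E_val is an unbiased estimate of E_out, and its
# variance shrinks like sigma^2 / K as the validation set grows.
e_val = sum((g(x) - y) ** 2 for x, y in val_set) / K
print(len(train_set), K, round(e_val, 4))
```

Note the trade-off the lecture emphasizes: a larger K tightens this estimate, but leaves fewer of the N examples for training, so in practice the model is retrained on all N examples after validation has done its job.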

detail

{'title': 'Lecture 13 - Validation', 'heatmap': [{'end': 578.777, 'start': 511.889, 'weight': 1}, {'end': 726.767, 'start': 669.047, 'weight': 0.719}, {'end': 828.318, 'start': 774.332, 'weight': 0.72}, {'end': 1346.575, 'start': 1292.89, 'weight': 0.718}, {'end': 1810.89, 'start': 1755.221, 'weight': 0.899}, {'end': 2226.444, 'start': 2117.863, 'weight': 0.889}], 'summary': 'The lecture covers topics on regularization in machine learning, emphasizing transition from constrained to unconstrained regularization, significance of validation and model selection, trade-off in choosing a validation set size, impact of early stopping, model selection algorithm based on validation errors, and the role of cross-validation in preventing overfitting and model selection, showcasing a 40% improvement in out-of-sample error for a digit classification task.', 'chapters': [{'end': 123.848, 'segs': [{'end': 105.357, 'src': 'embed', 'start': 36.551, 'weight': 0, 'content': [{'end': 40.835, 'text': 'and thereby reducing the VC dimension and improving the generalization property.', 'start': 36.551, 'duration': 4.284}, {'end': 52.701, 'text': 'to an unconstrained version, which creates an augmented error, in which no particular vector of weights is prohibited per se.', 'start': 41.716, 'duration': 10.985}, {'end': 58.864, 'text': 'But basically, you have a preference of weights based on a penalty that has to do with the constraint.', 'start': 53.221, 'duration': 5.643}, {'end': 66.525, 'text': 'And that equivalence will make us focus on the augmented error form of regularization in every practice we have.', 'start': 59.901, 'duration': 6.624}, {'end': 78.413, 'text': 'And the argument for it was to take the constrained version and look at it either as Lagrangian, which would be the formal way of solving it, or,', 'start': 67.366, 'duration': 11.047}, {'end': 80.374, 'text': 'as we did it in a geometric way,', 'start': 78.413, 'duration': 1.961}, {'end': 89.28, 'text': 'to find a 
condition that corresponds to minimization under a constraint and find that that would be locally equivalent to minimizing this in an unconstrained way.', 'start': 80.374, 'duration': 8.906}, {'end': 93.629, 'text': 'Then we went to the general form of a regularizer.', 'start': 91.007, 'duration': 2.622}, {'end': 98.312, 'text': 'And we called it capital omega of H.', 'start': 94.569, 'duration': 3.743}, {'end': 105.357, 'text': 'And it depends on small h rather than capital H, which was the other capital omega that we used in the VC analysis was.', 'start': 98.312, 'duration': 7.045}], 'summary': 'Regularization reduces vc dimension, improves generalization, uses penalty for weight preference.', 'duration': 68.806, 'max_score': 36.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk36551.jpg'}], 'start': 0.703, 'title': 'Regularization in machine learning', 'summary': 'Discusses regularization in machine learning, emphasizing the transition from constrained to unconstrained regularization, the concept of augmented error, and the preference of weights based on a penalty. it also highlights the importance of minimizing the augmented error for better out-of-sample error prediction.', 'chapters': [{'end': 123.848, 'start': 0.703, 'title': 'Regularization in machine learning', 'summary': 'Discusses regularization in machine learning, emphasizing the transition from constrained to unconstrained regularization, the concept of augmented error, and the preference of weights based on a penalty. 
it also highlights the importance of minimizing the augmented error for better out-of-sample error prediction.', 'duration': 123.145, 'highlights': ['The transition from constrained to unconstrained regularization involves explicitly forbidding some hypotheses to reduce VC dimension and improve generalization property, leading to an augmented error preference based on a penalty.', 'The focus on the augmented error form of regularization is emphasized in practice, with the argument for it being the equivalence to the constrained version, either through Lagrangian or geometric approach.', 'The discussion delves into the general form of a regularizer, denoted as capital omega of H, which depends on small h, and the formation of the augmented error as the in-sample error plus a specific term for better out-of-sample error prediction.']}], 'duration': 123.145, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk703.jpg', 'highlights': ['The transition from constrained to unconstrained regularization involves explicitly forbidding some hypotheses to reduce VC dimension and improve generalization property, leading to an augmented error preference based on a penalty.', 'The discussion delves into the general form of a regularizer, denoted as capital omega of H, which depends on small h, and the formation of the augmented error as the in-sample error plus a specific term for better out-of-sample error prediction.', 'The focus on the augmented error form of regularization is emphasized in practice, with the argument for it being the equivalence to the constrained version, either through Lagrangian or geometric approach.']}, {'end': 959.57, 'segs': [{'end': 215.583, 'src': 'embed', 'start': 189.365, 'weight': 2, 'content': [{'end': 193.266, 'text': 'the validation will tell you take lambda equals 0, and therefore no harm done.', 'start': 189.365, 'duration': 3.901}, {'end': 200.909, 'text': 'And, as you see, the choice of 
lambda is indeed critical because when you take the correct amount of lambda, which happens to be very small,', 'start': 194.443, 'duration': 6.466}, {'end': 205.734, 'text': 'in this case, the fit, which is the red curve, is very close to the target, which is the blue.', 'start': 200.909, 'duration': 4.825}, {'end': 209.397, 'text': 'Whereas if you push your luck and have more of the regularization,', 'start': 206.134, 'duration': 3.263}, {'end': 215.583, 'text': 'you end up constraining the fit so much that the red really wants to move toward the blue,', 'start': 209.397, 'duration': 6.186}], 'summary': 'Choosing the correct lambda is critical for model fit. small lambda leads to close fit, while excessive regularization constrains the fit.', 'duration': 26.218, 'max_score': 189.365, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk189365.jpg'}, {'end': 288.471, 'src': 'embed', 'start': 252.964, 'weight': 0, 'content': [{'end': 256.007, 'text': 'So why do we call it validation? 
And the distinction will be pretty important.', 'start': 252.964, 'duration': 3.043}, {'end': 261.043, 'text': "And then we'll go for model selection, a very important subject in machine learning.", 'start': 257.096, 'duration': 3.947}, {'end': 263.467, 'text': 'And it is the main task of validation.', 'start': 261.344, 'duration': 2.123}, {'end': 264.93, 'text': "That's what you use validation for.", 'start': 263.507, 'duration': 1.423}, {'end': 269.979, 'text': "And we'll find that model selection covers more territory than what the name may suggest to you.", 'start': 265.451, 'duration': 4.528}, {'end': 278.927, 'text': "Finally, we'll go to cross-validation, which is a type of validation that is very interesting, that allows you, if I give you a budget of n examples,", 'start': 271.343, 'duration': 7.584}, {'end': 282.688, 'text': 'to basically use all of them for validation and all of them for training.', 'start': 278.927, 'duration': 3.761}, {'end': 288.471, 'text': 'Which looks like cheating, because validation will look like a distinct activity from training, as we will see.', 'start': 283.349, 'duration': 5.122}], 'summary': 'Validation, model selection, and cross-validation in machine learning.', 'duration': 35.507, 'max_score': 252.964, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk252964.jpg'}, {'end': 388.369, 'src': 'embed', 'start': 361.569, 'weight': 1, 'content': [{'end': 371.082, 'text': 'So basically, what we did is concoct a term that we think captures the overfit complexity, overfit penalty.', 'start': 361.569, 'duration': 9.513}, {'end': 377.725, 'text': 'And then, instead of minimizing the in-sample, we minimize the in-sample plus that, and we call that the augmented error.', 'start': 372.063, 'duration': 5.662}, {'end': 381.186, 'text': 'And hopefully, the augmented error will be a better proxy for Eout.', 'start': 378.205, 'duration': 2.981}, {'end': 382.647, 'text': 'That was the 
deal.', 'start': 381.566, 'duration': 1.081}, {'end': 388.369, 'text': 'And we noticed that we are very, very inaccurate in the choice here.', 'start': 383.227, 'duration': 5.142}], 'summary': 'Concocted a term to capture overfit complexity, aiming for better proxy for eout.', 'duration': 26.8, 'max_score': 361.569, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk361569.jpg'}, {'end': 578.777, 'src': 'heatmap', 'start': 511.889, 'weight': 1, 'content': [{'end': 520.118, 'text': 'Now, if you take this quantity, and we are now treating it as an estimate for E out, a poor estimate, but nonetheless an estimate.', 'start': 511.889, 'duration': 8.229}, {'end': 525.904, 'text': 'We call it an estimate, because if you take the expected value of that with respect to the choice of K,', 'start': 520.739, 'duration': 5.165}, {'end': 531.41, 'text': 'with the probability distribution over the input space that generates X, what will that value be?', 'start': 525.904, 'duration': 5.506}, {'end': 534.601, 'text': 'Well, that is simply E out.', 'start': 532.679, 'duration': 1.922}, {'end': 541.166, 'text': 'So indeed, this quantity, the random variable here, has the correct expected value.', 'start': 535.221, 'duration': 5.945}, {'end': 543.228, 'text': "It's an unbiased estimate of E out.", 'start': 541.226, 'duration': 2.002}, {'end': 548.913, 'text': "But unbiased means that it's as likely to be here or here in terms of expected value.", 'start': 543.929, 'duration': 4.984}, {'end': 551.896, 'text': 'But we could be this, and this would be a good estimate.', 'start': 549.474, 'duration': 2.422}, {'end': 555.519, 'text': "Or we could be this, and this would be a terrible estimate, because you're not getting all of them.", 'start': 552.376, 'duration': 3.143}, {'end': 556.42, 'text': "You're just getting one of them.", 'start': 555.539, 'duration': 0.881}, {'end': 564.307, 'text': 'So if this guy swings very large, and I 
tell you this is an estimate of E out, and you get it here, this is what you will think E out is.', 'start': 557.201, 'duration': 7.106}, {'end': 567.39, 'text': 'So there is an error, but the error is not biased.', 'start': 565.028, 'duration': 2.362}, {'end': 568.831, 'text': "That's what this equation says.", 'start': 567.65, 'duration': 1.181}, {'end': 575.435, 'text': 'But we have to evaluate that swing, and the swing is obviously evaluated by the usual quantity, the variance.', 'start': 570.393, 'duration': 5.042}, {'end': 578.777, 'text': "And let's just call the variance sigma squared.", 'start': 576.696, 'duration': 2.081}], 'summary': 'The quantity serves as an unbiased estimate for e out, but its variance needs evaluation.', 'duration': 66.888, 'max_score': 511.889, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk511889.jpg'}, {'end': 631.481, 'src': 'embed', 'start': 605.871, 'weight': 6, 'content': [{'end': 611.075, 'text': 'Now, the notation we are going to have is that the number of points in the validation set is K.', 'start': 605.871, 'duration': 5.204}, {'end': 615.779, 'text': 'Remember that the number of points in the training set was N.', 'start': 611.075, 'duration': 4.704}, {'end': 622.624, 'text': 'So this will be K points, also generated according to the same rules, independently according to the probability distribution over the input space.', 'start': 615.779, 'duration': 6.845}, {'end': 631.481, 'text': 'And the error on that set, we are going to call eval, as in validation error.', 'start': 624.265, 'duration': 7.216}], 'summary': 'Validation set has k points, independently generated according to the same rules.', 'duration': 25.61, 'max_score': 605.871, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk605871.jpg'}, {'end': 726.767, 'src': 'heatmap', 'start': 669.047, 'weight': 0.719, 'content': [{'end': 674.169, 'text': 'So 
the main component is the expected value of this fellow, which we have seen before, expected value in a single point.', 'start': 669.047, 'duration': 5.122}, {'end': 679.631, 'text': 'And you just average linearly, as you did.', 'start': 674.609, 'duration': 5.022}, {'end': 682.472, 'text': 'This quantity happens to be E out.', 'start': 680.511, 'duration': 1.961}, {'end': 684.933, 'text': 'The expected value on one point is E out.', 'start': 682.572, 'duration': 2.361}, {'end': 687.854, 'text': 'Therefore, when you do that, you just get E out again.', 'start': 685.553, 'duration': 2.301}, {'end': 695.936, 'text': 'So indeed, again, the validation error is an unbiased estimate of the out-of-sample error,', 'start': 689.274, 'duration': 6.662}, {'end': 700.577, 'text': 'provided that all you did with the validation set is just measure the out-of-sample error.', 'start': 695.936, 'duration': 4.641}, {'end': 701.737, 'text': "You didn't use it in any way.", 'start': 700.637, 'duration': 1.1}, {'end': 707.779, 'text': "Now let's look at the variance, because that was our problem with the single-point estimate.", 'start': 703.077, 'duration': 4.702}, {'end': 709.359, 'text': "And let's see if there is an improvement.", 'start': 708.079, 'duration': 1.28}, {'end': 712.477, 'text': 'When you get the variance, you are going to take this formula.', 'start': 710.336, 'duration': 2.141}, {'end': 719.502, 'text': 'And then you are going to have a double summation, and have all cross terms of E between different points.', 'start': 713.178, 'duration': 6.324}, {'end': 726.767, 'text': "So you'll have the covariance between the value for k equals 1 and k equals 2, k equals 1 and k equals 3, et cetera.", 'start': 719.562, 'duration': 7.205}], 'summary': 'Validation error is an unbiased estimate of out-of-sample error, with variance addressed by double summation.', 'duration': 57.72, 'max_score': 669.047, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk669047.jpg'}, {'end': 701.737, 'src': 'embed', 'start': 674.609, 'weight': 3, 'content': [{'end': 679.631, 'text': 'And you just average linearly, as you did.', 'start': 674.609, 'duration': 5.022}, {'end': 682.472, 'text': 'This quantity happens to be E out.', 'start': 680.511, 'duration': 1.961}, {'end': 684.933, 'text': 'The expected value on one point is E out.', 'start': 682.572, 'duration': 2.361}, {'end': 687.854, 'text': 'Therefore, when you do that, you just get E out again.', 'start': 685.553, 'duration': 2.301}, {'end': 695.936, 'text': 'So indeed, again, the validation error is an unbiased estimate of the out-of-sample error,', 'start': 689.274, 'duration': 6.662}, {'end': 700.577, 'text': 'provided that all you did with the validation set is just measure the out-of-sample error.', 'start': 695.936, 'duration': 4.641}, {'end': 701.737, 'text': "You didn't use it in any way.", 'start': 700.637, 'duration': 1.1}], 'summary': 'Validation error is an unbiased estimate of out-of-sample error if not used.', 'duration': 27.128, 'max_score': 674.609, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk674609.jpg'}, {'end': 828.318, 'src': 'heatmap', 'start': 774.332, 'weight': 0.72, 'content': [{'end': 778.773, 'text': 'Because I had k squared elements, the fact that many of them dropped out is just to my advantage.', 'start': 774.332, 'duration': 4.441}, {'end': 781.034, 'text': 'I still have the 1 over k squared.', 'start': 779.393, 'duration': 1.641}, {'end': 787.082, 'text': 'And that gives me the better variance for the estimate based on eval than on a single point.', 'start': 781.514, 'duration': 5.568}, {'end': 793.41, 'text': 'This is your typical analysis of adding a bunch of independent estimates.', 'start': 787.222, 'duration': 6.188}, {'end': 795.132, 'text': 'So you get the sigma squared.', 'start': 
793.811, 'duration': 1.321}, {'end': 797.015, 'text': 'That was the variance on a particular point.', 'start': 795.313, 'duration': 1.702}, {'end': 798.517, 'text': 'But now you divide it by k.', 'start': 797.315, 'duration': 1.202}, {'end': 803.884, 'text': 'Now we see a hope because even if the original estimate was this way,', 'start': 799.418, 'duration': 4.466}, {'end': 810.052, 'text': 'maybe we can have K big enough that we keep shrinking the error bar such that the E value itself as a random variable becomes.', 'start': 803.884, 'duration': 6.168}, {'end': 812.856, 'text': 'this, which is around E out, is what we want.', 'start': 810.052, 'duration': 2.804}, {'end': 815.078, 'text': 'And therefore, it becomes a reliable estimate.', 'start': 813.156, 'duration': 1.922}, {'end': 817.809, 'text': 'This looks promising.', 'start': 817.068, 'duration': 0.741}, {'end': 828.318, 'text': 'So now we can write the E val, which is a random variable, to be E out, which is the value we want plus or minus, something that averages to 0,', 'start': 818.169, 'duration': 10.149}], 'summary': 'Adding k independent estimates reduces variance, improving reliability.', 'duration': 53.986, 'max_score': 774.332, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk774332.jpg'}, {'end': 828.318, 'src': 'embed', 'start': 797.315, 'weight': 4, 'content': [{'end': 798.517, 'text': 'But now you divide it by k.', 'start': 797.315, 'duration': 1.202}, {'end': 803.884, 'text': 'Now we see a hope because even if the original estimate was this way,', 'start': 799.418, 'duration': 4.466}, {'end': 810.052, 'text': 'maybe we can have K big enough that we keep shrinking the error bar such that the E value itself as a random variable becomes.', 'start': 803.884, 'duration': 6.168}, {'end': 812.856, 'text': 'this, which is around E out, is what we want.', 'start': 810.052, 'duration': 2.804}, {'end': 815.078, 'text': 'And therefore, it becomes a 
reliable estimate.', 'start': 813.156, 'duration': 1.922}, {'end': 817.809, 'text': 'This looks promising.', 'start': 817.068, 'duration': 0.741}, {'end': 828.318, 'text': 'So now we can write the E val, which is a random variable, to be E out, which is the value we want plus or minus, something that averages to 0,', 'start': 818.169, 'duration': 10.149}], 'summary': 'Dividing by k can shrink error bar, making e value reliable estimate.', 'duration': 31.003, 'max_score': 797.315, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk797315.jpg'}], 'start': 123.948, 'title': 'Ml regularization and validation', 'summary': 'Covers the heuristic choice of regularization parameter omega, principled determination of lambda using validation, and emphasizes significance of validation and model selection. it also explores the process of estimating out-of-sample error using a validation set and the impact of sample size on error estimation.', 'chapters': [{'end': 411.406, 'start': 123.948, 'title': 'Regularization and validation in ml', 'summary': 'Discusses the heuristic choice of the regularization parameter omega, the principled determination of lambda using validation, and the critical choice of lambda in achieving a close fit to the target, emphasizing the significance of validation and model selection in machine learning.', 'duration': 287.458, 'highlights': ['The choice of omega in a practical situation is really a heuristic choice, guided by theory and certain goals. The selection of omega in practical situations is heuristic-driven, guided by theory and specific objectives.', 'The determination of lambda using validation is critical, as it allows for the selection of the correct amount of lambda, leading to a close fit to the target. 
Validation plays a crucial role in determining the correct amount of lambda, resulting in a close fit to the target.', 'Validation is a fundamental technique in machine learning, used for model selection and covers more territory than implied by its name. Validation is an essential technique in machine learning, primarily employed for model selection and encompasses a broader scope than its title suggests.', 'Cross-validation enables the use of all examples for both validation and training, providing a workaround to distinct activities, such as training and validation. Cross-validation allows for the utilization of all examples for both validation and training, offering a solution to the distinction between training and validation activities.', 'Regularization attempts to estimate the overfit penalty by minimizing the augmented error, which serves as a better proxy for the out-of-sample error. Regularization aims to estimate the overfit penalty by minimizing the augmented error, serving as a more suitable proxy for the out-of-sample error.']}, {'end': 959.57, 'start': 411.886, 'title': 'Validation and out-of-sample estimation', 'summary': 'Discusses the process of estimating out-of-sample error using a validation set, exploring the unbiased estimate of out-of-sample error, the impact of sample size on error estimation, and the trade-off between training and validation sets.', 'duration': 547.684, 'highlights': ['The validation error is an unbiased estimate of the out-of-sample error, provided the validation set is only used to measure the out-of-sample error and not for training. The validation error serves as an unbiased estimate of the out-of-sample error, assuming it is solely utilized for out-of-sample error measurement, enhancing the reliability of the estimation.', 'The variance of the validation error decreases as the sample size (k) increases, potentially leading to a more reliable estimate of the out-of-sample error. 
As the sample size (k) for the validation set increases, the variance of the validation error decreases, suggesting a potential improvement in the reliability of the out-of-sample error estimate.', 'The trade-off between the number of points used for training and validation is highlighted, as every point allocated to the validation set reduces the points available for training. Allocation of points to the validation set results in a trade-off, as it reduces the available points for training, emphasizing the need to carefully balance the allocation for effective model evaluation.']}], 'duration': 835.622, 'thumbnail': '', 'highlights': ['Validation is a fundamental technique in machine learning, primarily employed for model selection and encompasses a broader scope than its title suggests.', 'Regularization aims to estimate the overfit penalty by minimizing the augmented error, serving as a more suitable proxy for the out-of-sample error.', 'The determination of lambda using validation is critical, as it allows for the selection of the correct amount of lambda, leading to a close fit to the target.', 'The validation error serves as an unbiased estimate of the out-of-sample error, assuming it is solely utilized for out-of-sample error measurement, enhancing the reliability of the estimation.', 'As the sample size (k) for the validation set increases, the variance of the validation error decreases, suggesting a potential improvement in the reliability of the out-of-sample error estimate.', 'Cross-validation allows for the utilization of all examples for both validation and training, offering a solution to the distinction between training and validation activities.', 'Allocation of points to the validation set results in a trade-off, as it reduces the available points for training, emphasizing the need to carefully balance the allocation for effective model evaluation.']}, {'end': 1437.172, 'segs': [{'end': 1070.265, 'src': 'embed', 'start': 1037.603, 'weight': 1, 
'content': [{'end': 1041.484, 'text': 'Therefore, if you increase k, you are moving in this direction.', 'start': 1037.603, 'duration': 3.881}, {'end': 1047.309, 'text': 'So I used to be here, and I used to expect that level of E out.', 'start': 1043.587, 'duration': 3.722}, {'end': 1050.551, 'text': "Now I am here, and I'm expecting that level of E out.", 'start': 1048.309, 'duration': 2.242}, {'end': 1053.761, 'text': "That doesn't look very promising.", 'start': 1052.481, 'duration': 1.28}, {'end': 1061.603, 'text': "I may get a reliable estimate, because I'm using bigger K, but I'm getting a reliable estimate of a worse quantity.", 'start': 1054.582, 'duration': 7.021}, {'end': 1070.265, 'text': 'If you want to take an extreme case, you are going to take this estimate and go to your customer and tell them what you expect the performance to be.', 'start': 1061.623, 'duration': 8.642}], 'summary': 'Increasing k may result in a reliable estimate but worse quantity, impacting performance.', 'duration': 32.662, 'max_score': 1037.603, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1037603.jpg'}, {'end': 1112.692, 'src': 'embed', 'start': 1082.233, 'weight': 2, 'content': [{'end': 1086.076, 'text': 'Now you want the estimate to be very reliable, and you forget about the quality of the hypothesis.', 'start': 1082.233, 'duration': 3.843}, {'end': 1090.12, 'text': 'So you keep increasing k, keep increasing k, keep increasing k.', 'start': 1086.457, 'duration': 3.663}, {'end': 1093.022, 'text': 'So you end up with a very, very reliable estimate.', 'start': 1090.12, 'duration': 2.902}, {'end': 1101.09, 'text': "The problem is that it's an estimate of a very, very poor quantity, because you use two examples to train, and you are basically in the noise.", 'start': 1093.963, 'duration': 7.127}, {'end': 1108.156, 'text': 'So the statement you are going to make to your customer in this case is that here is a system.', 
'start': 1102.231, 'duration': 5.925}, {'end': 1112.692, 'text': "I am very sure that it's terrible.", 'start': 1109.708, 'duration': 2.984}], 'summary': 'Increasing k for reliable estimate leads to poor quality hypothesis, resulting in a terrible system.', 'duration': 30.459, 'max_score': 1082.233, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1082233.jpg'}, {'end': 1346.575, 'src': 'heatmap', 'start': 1292.89, 'weight': 0.718, 'content': [{'end': 1293.81, 'text': "So it's a funny situation.", 'start': 1292.89, 'duration': 0.92}, {'end': 1297.812, 'text': "I'm giving you g, and I'm giving you the validation estimate on g minus.", 'start': 1293.83, 'duration': 3.982}, {'end': 1299.893, 'text': "Why? Because that's the only estimate I have.", 'start': 1298.072, 'duration': 1.821}, {'end': 1304.355, 'text': "I cannot give you the estimate on g, because now if I get g, I don't have any guys to validate on.", 'start': 1299.973, 'duration': 4.382}, {'end': 1306.956, 'text': 'So you can see now the compromise.', 'start': 1305.275, 'duration': 1.681}, {'end': 1314.098, 'text': 'So now, under this scenario, I am not really using in performance by taking a bigger validation set,', 'start': 1308.016, 'duration': 6.082}, {'end': 1316.679, 'text': "because I'm going to put them back when I get the final hypothesis.", 'start': 1314.098, 'duration': 2.581}, {'end': 1324.723, 'text': "What I am losing here is that the validation error I'm reporting is a validation error on a different hypothesis than the one I am giving you.", 'start': 1316.98, 'duration': 7.743}, {'end': 1331.727, 'text': "And if the difference is big, then my estimate is bad, because I'm estimating on something other than what I'm giving you.", 'start': 1325.783, 'duration': 5.944}, {'end': 1335.609, 'text': "And that's what happens when you have large K.", 'start': 1332.447, 'duration': 3.162}, {'end': 1339.171, 'text': 'When you have large K, the 
discrepancy between G- and G is bigger.', 'start': 1335.609, 'duration': 3.562}, {'end': 1341.772, 'text': 'And I am giving you the estimate in G-.', 'start': 1339.911, 'duration': 1.861}, {'end': 1343.433, 'text': 'So that estimate is poor.', 'start': 1342.072, 'duration': 1.361}, {'end': 1346.575, 'text': 'And therefore, I get a bad estimate again.', 'start': 1344.173, 'duration': 2.402}], 'summary': 'Using a larger validation set leads to poor estimation with a large discrepancy between g- and g.', 'duration': 53.685, 'max_score': 1292.89, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1292890.jpg'}, {'end': 1416.421, 'src': 'embed', 'start': 1352.797, 'weight': 0, 'content': [{'end': 1360.379, 'text': 'After you do your thing, and you do your estimates, and as you will see further, you do your choices, you go and put all the examples to train on.', 'start': 1352.797, 'duration': 7.582}, {'end': 1363.06, 'text': 'Because this is your best bet of getting a good hypothesis.', 'start': 1360.479, 'duration': 2.581}, {'end': 1368.222, 'text': 'If your k is small, the validation error is not reliable.', 'start': 1364.02, 'duration': 4.202}, {'end': 1371.723, 'text': "It's a bad estimate, just because the variance of it is big.", 'start': 1368.302, 'duration': 3.421}, {'end': 1373.783, 'text': 'Small k, I have small k.', 'start': 1372.263, 'duration': 1.52}, {'end': 1375.444, 'text': "It's 1 over square root of k, so I'm doing this.", 'start': 1373.783, 'duration': 1.661}, {'end': 1379.955, 'text': 'If you get big K, the problem is not the reliability of the estimate.', 'start': 1376.314, 'duration': 3.641}, {'end': 1385.456, 'text': 'The problem is that the thing you are estimating is getting further and further away from the thing you are reporting.', 'start': 1379.995, 'duration': 5.461}, {'end': 1387.596, 'text': 'So now we have a compromise.', 'start': 1386.576, 'duration': 1.02}, {'end': 1390.877, 'text': 
"We don't want K to be too small in order not to have fluctuations.", 'start': 1387.856, 'duration': 3.021}, {'end': 1395.638, 'text': "We don't want K to be too big in order not to be too far from what we are reporting.", 'start': 1391.177, 'duration': 4.461}, {'end': 1399.719, 'text': 'And as usual in machine learning, there is a rule of thumb.', 'start': 1396.618, 'duration': 3.101}, {'end': 1402.64, 'text': 'And the rule of thumb is pretty simple.', 'start': 1401.359, 'duration': 1.281}, {'end': 1403.8, 'text': "That's why it's a rule of thumb.", 'start': 1402.96, 'duration': 0.84}, {'end': 1408.255, 'text': 'It says, take 1 fifth for validation.', 'start': 1404.733, 'duration': 3.522}, {'end': 1412.418, 'text': 'That usually gives you the best of both worlds.', 'start': 1409.676, 'duration': 2.742}, {'end': 1413.279, 'text': 'Nothing proved.', 'start': 1412.638, 'duration': 0.641}, {'end': 1415.32, 'text': 'You can find counterexamples.', 'start': 1413.959, 'duration': 1.361}, {'end': 1416.421, 'text': "I'm not going to argue with that.", 'start': 1415.34, 'duration': 1.081}], 'summary': 'Choose k value carefully for reliable validation error; rule of thumb: use 1/5 for validation.', 'duration': 63.624, 'max_score': 1352.797, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1352797.jpg'}], 'start': 961.946, 'title': 'Balancing validation set size', 'summary': 'Discusses the trade-off between reliability of estimate and expected error value in choosing a validation set size, emphasizing the importance of finding a balance. 
It suggests using 1/5th of the dataset for validation to achieve a balance.', 'chapters': [{'end': 1127.026, 'start': 961.946, 'title': 'Validation set and learning curves', 'summary': 'Discusses the trade-off between the reliability of the estimate of the validation set and the expected value of error, suggesting that a larger k for validation points may provide a reliable estimate but of a worse quantity, ultimately highlighting the importance of finding a balance.', 'duration': 165.08, 'highlights': ['The error bar of the validation estimate is inversely proportional to the square root of the number of validation points, with the conclusion that using a small k leads to a bad estimate.', 'Increasing the number of validation points (K) may result in a more reliable estimate, but it could also lead to a worse expected value of error, indicating a trade-off between reliability and quality of the estimate.', 'Continuously increasing K for validation points can lead to a very reliable estimate but of a very poor quantity, ultimately resulting in an estimate of a very poor quality hypothesis, which may not be well-received by the customer.']}, {'end': 1437.172, 'start': 1127.987, 'title': 'Optimizing validation set size', 'summary': 'Discusses the importance of choosing the right validation set size for training machine learning models, emphasizing the trade-off between reliability of estimates and proximity to reported results, and suggests a rule of thumb of using 1/5th of the dataset for validation to achieve a balance.', 'duration': 309.185, 'highlights': ['The compromise between reliability of estimates and proximity to reported results when choosing the validation set size is emphasized, with a rule of thumb of using 1/5th of the dataset for validation suggested. 
Rule of thumb suggests using 1/5th of the dataset for validation', 'The impact of choosing a small k for validation is discussed, emphasizing the unreliability of the validation error estimate due to high variance. Small k leads to unreliable validation error estimate due to high variance', 'The trade-off when using a large k for validation is highlighted, focusing on the increasing discrepancy between the estimated and reported hypotheses, leading to poor estimates. Large k leads to increasing discrepancy between estimated and reported hypotheses, resulting in poor estimates']}], 'duration': 475.226, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk961946.jpg', 'highlights': ['Using a small k leads to a bad estimate due to high variance', 'Increasing K may result in a more reliable estimate but worse expected error value', 'Continuously increasing K can lead to a very reliable estimate but of poor quality', 'The compromise between reliability of estimates and proximity to reported results is emphasized', 'Rule of thumb suggests using 1/5th of the dataset for validation']}, {'end': 1931.791, 'segs': [{'end': 1486.838, 'src': 'embed', 'start': 1456.755, 'weight': 2, 'content': [{'end': 1459.836, 'text': "And this is a very important point, so let's talk about it in detail.", 'start': 1456.755, 'duration': 3.081}, {'end': 1470.668, 'text': 'Once I make my estimate affect the learning process, the set I am using is going to change nature.', 'start': 1461.366, 'duration': 9.302}, {'end': 1472.749, 'text': "So let's look at the situation that we have seen before.", 'start': 1470.849, 'duration': 1.9}, {'end': 1478.631, 'text': 'Remember this fellow? This was early stopping in neural networks.', 'start': 1473.689, 'duration': 4.942}, {'end': 1483.252, 'text': 'And let me magnify it for you to see the green curve.', 'start': 1478.951, 'duration': 4.301}, {'end': 1486.838, 'text': 'You see the green curve now? 
So there is a green curve.', 'start': 1483.272, 'duration': 3.566}], 'summary': 'Estimates impact learning process, changing nature of data. Discusses early stopping in neural networks.', 'duration': 30.083, 'max_score': 1456.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1456755.jpg'}, {'end': 1635.183, 'src': 'embed', 'start': 1607.033, 'weight': 1, 'content': [{'end': 1612.457, 'text': "Let's say I have the test set which is unbiased, and I'm claiming that the validation set has an optimistic bias.", 'start': 1607.033, 'duration': 5.424}, {'end': 1615.758, 'text': 'Optimistic is not like, I mean, optimism is good.', 'start': 1613.257, 'duration': 2.501}, {'end': 1618.479, 'text': 'But here is optimism followed by disappointment.', 'start': 1616.138, 'duration': 2.341}, {'end': 1619.279, 'text': "It's deception.", 'start': 1618.539, 'duration': 0.74}, {'end': 1625.961, 'text': "We are just calling it optimistic to understand that it's always in the direction of thinking that the error will be smaller than it will actually turn out to be.", 'start': 1620.059, 'duration': 5.902}, {'end': 1629.822, 'text': "So let's say we have two hypotheses.", 'start': 1628.601, 'duration': 1.221}, {'end': 1635.183, 'text': "And for simplicity, let's have them have both the same E out.", 'start': 1631.362, 'duration': 3.821}], 'summary': 'Validation set may have optimistic bias, leading to deception in error estimation.', 'duration': 28.15, 'max_score': 1607.033, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1607033.jpg'}, {'end': 1813.652, 'src': 'heatmap', 'start': 1751.239, 'weight': 3, 'content': [{'end': 1754.461, 'text': 'because you are deliberately picking the minimum of the realization.', 'start': 1751.239, 'duration': 3.222}, {'end': 1759.622, 'text': "And it's very easy to see that the expected value of E is less than 0.5.", 'start': 1755.221, 
'duration': 4.401}, {'end': 1770.219, 'text': 'The easiest thing to say is that if I have two variables like that, the probability that the minimum will be less than 1 half is 75%.', 'start': 1759.622, 'duration': 10.597}, {'end': 1773.621, 'text': 'Because all you need to do is one of them being less than 1 half.', 'start': 1770.219, 'duration': 3.402}, {'end': 1778.864, 'text': 'So if the probability of being less than 1 half is 75%, you expect the expected value to be less than 1 half.', 'start': 1774.361, 'duration': 4.503}, {'end': 1779.784, 'text': "It's mostly there.", 'start': 1779.224, 'duration': 0.56}, {'end': 1781.085, 'text': 'The mass is mostly below.', 'start': 1779.944, 'duration': 1.141}, {'end': 1785.648, 'text': 'So now we realize this is what? This is an optimistic bias.', 'start': 1782.286, 'duration': 3.362}, {'end': 1789.458, 'text': 'And that is exactly the same as what happened with the early stopping.', 'start': 1786.536, 'duration': 2.922}, {'end': 1792.84, 'text': "We picked a point because it's minimum on the realization.", 'start': 1789.858, 'duration': 2.982}, {'end': 1794.861, 'text': 'And that is what we reported.', 'start': 1793.8, 'duration': 1.061}, {'end': 1796.922, 'text': 'Because of that, the thing used to be this.', 'start': 1794.901, 'duration': 2.021}, {'end': 1797.923, 'text': 'But we wait.', 'start': 1797.342, 'duration': 0.581}, {'end': 1799.163, 'text': "When it's there, we ignore it.", 'start': 1798.023, 'duration': 1.14}, {'end': 1800.504, 'text': "When it's here, we take it.", 'start': 1799.504, 'duration': 1}, {'end': 1803.986, 'text': 'So now that introduces a bias, and that bias is optimistic.', 'start': 1801.144, 'duration': 2.842}, {'end': 1806.327, 'text': 'And that will be true for the validation set.', 'start': 1804.586, 'duration': 1.741}, {'end': 1810.89, 'text': 'So our discussion so far is based on just looking at the out.', 'start': 1806.768, 'duration': 4.122}, {'end': 1813.652, 'text': 'Now we are going to 
use it, and we are going to introduce a bias.', 'start': 1811.25, 'duration': 2.402}], 'summary': 'Expected value e is less than 0.5 due to optimistic bias in picking minimum.', 'duration': 62.413, 'max_score': 1751.239, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1751239.jpg'}, {'end': 1836.6, 'src': 'embed', 'start': 1814.724, 'weight': 0, 'content': [{'end': 1822.93, 'text': 'Fortunately for us, the utility of validation in machine learning is so light that we are going to swallow the bias.', 'start': 1814.724, 'duration': 8.206}, {'end': 1824.731, 'text': 'Bias is minor.', 'start': 1823.811, 'duration': 0.92}, {'end': 1826.072, 'text': 'We are not going to push our luck.', 'start': 1824.771, 'duration': 1.301}, {'end': 1833.137, 'text': 'We are not going to estimate tons of stuff and keep adding bias until the validation error basically becomes training error in disguise.', 'start': 1826.112, 'duration': 7.025}, {'end': 1836.6, 'text': "We're just going to choose a parameter, choose between models and whatnot.", 'start': 1833.437, 'duration': 3.163}], 'summary': 'Validation utility in ml is light, minimizing bias, choosing parameters and models.', 'duration': 21.876, 'max_score': 1814.724, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1814724.jpg'}, {'end': 1887.852, 'src': 'embed', 'start': 1859.815, 'weight': 4, 'content': [{'end': 1864.839, 'text': 'And the choice of lambda, in the case we saw, happens to be a manifestation of this.', 'start': 1859.815, 'duration': 5.024}, {'end': 1866.34, 'text': "So let's talk about it.", 'start': 1865.379, 'duration': 0.961}, {'end': 1871.882, 'text': 'Basically, we are going to use the validation set more than once.', 'start': 1867.839, 'duration': 4.043}, {'end': 1873.163, 'text': "That's how we're going to make the choice.", 'start': 1871.962, 'duration': 1.201}, {'end': 1874.104, 'text': "So let's 
look.", 'start': 1873.383, 'duration': 0.721}, {'end': 1876.125, 'text': 'This is a diagram.', 'start': 1875.285, 'duration': 0.84}, {'end': 1877.486, 'text': "I'm going to build it up.", 'start': 1876.165, 'duration': 1.321}, {'end': 1882.47, 'text': "So let's build it up, and then I'll focus on it and look at how the diagram reflects the logic.", 'start': 1877.706, 'duration': 4.764}, {'end': 1887.852, 'text': 'We have M models that we are going to choose from.', 'start': 1884.251, 'duration': 3.601}], 'summary': 'Using validation set multiple times to choose from m models.', 'duration': 28.037, 'max_score': 1859.815, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1859815.jpg'}], 'start': 1437.192, 'title': 'Validation, bias, and model selection in ml', 'summary': 'Discusses validation in machine learning, emphasizing the shift from unbiased test error to biased validation error, the impact of early stopping, and the optimistic bias introduced in model selection by minimum validation error.', 'chapters': [{'end': 1625.961, 'start': 1437.192, 'title': 'Validation and bias in machine learning', 'summary': 'Discusses the concept of validation in machine learning, emphasizing the shift from unbiased test error to biased validation error and the impact of early stopping on learning process and model choices.', 'duration': 188.769, 'highlights': ['The shift from unbiased test error to biased validation error due to early stopping in the learning process is a key point in understanding the concept of validation in machine learning. Unbiased test error becomes biased validation error.', 'The impact of early stopping on the learning process and model choices is highlighted, illustrating how it changes the nature of the dataset being used. 
Early stopping affects the learning process and the nature of the dataset.', 'The difference between unbiased test set and optimistic biased validation set is emphasized, showing how the validation set tends to deceive by consistently underestimating the error. Validation set exhibits optimistic bias, consistently underestimating the error.']}, {'end': 1931.791, 'start': 1628.601, 'title': 'Biases in model selection', 'summary': 'Discusses how the minimum validation error introduces an optimistic bias in model selection, impacting the expected value of the out-of-sample error and the use of validation sets for model selection.', 'duration': 303.19, 'highlights': ['The expected value of E is less than 0.5 due to the rules of the game, with a 75% probability that the minimum will be less than 0.5. The expected value of E is impacted by the minimum validation error, with a 75% probability that the minimum will be less than 0.5, resulting in an optimistic bias.', 'The bias introduced by the minimum validation error is minor and does not significantly impact the reliability of the estimate for the out-of-sample error, especially with a respectable size validation set. The bias introduced by the minimum validation error is considered minor and not expected to significantly impact the reliability of the estimate for the out-of-sample error, particularly with a considerable size validation set.', 'The main use of validation sets is for model selection, and the choice of lambda is a manifestation of this. 
Validation sets are primarily used for model selection, and the choice of lambda serves as an example of this utilization.']}], 'duration': 494.599, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1437192.jpg', 'highlights': ['The shift from unbiased test error to biased validation error due to early stopping in the learning process is a key point in understanding the concept of validation in machine learning.', 'The difference between unbiased test set and optimistic biased validation set is emphasized, showing how the validation set tends to deceive by consistently underestimating the error.', 'The impact of early stopping on the learning process and model choices is highlighted, illustrating how it changes the nature of the dataset being used.', 'The expected value of E is less than 0.5 due to the rules of the game, with a 75% probability that the minimum will be less than 0.5, resulting in an optimistic bias.', 'The main use of validation sets is for model selection, and the choice of lambda is a manifestation of this.', 'The bias introduced by the minimum validation error is minor and does not significantly impact the reliability of the estimate for the out-of-sample error, especially with a respectable size validation set.']}, {'end': 2755.427, 'segs': [{'end': 2226.444, 'src': 'heatmap', 'start': 2117.863, 'weight': 0.889, 'content': [{'end': 2126.308, 'text': 'And from that training, which is training now on the model we chose, we are going to get the final hypothesis, which is g m star.', 'start': 2117.863, 'duration': 8.445}, {'end': 2136.015, 'text': 'Again, we are reporting the validation error on a reduced hypothesis, if you will, but reporting the hypothesis the best we can do,', 'start': 2127.388, 'duration': 8.627}, {'end': 2139.238, 'text': 'because we know that we get better out of sample when we add the examples.', 'start': 2136.015, 'duration': 3.223}, {'end': 2140.639, 'text': 'This is 
the regime.', 'start': 2139.258, 'duration': 1.381}, {'end': 2144.282, 'text': "Let's complete the slide.", 'start': 2141.86, 'duration': 2.422}, {'end': 2150.306, 'text': 'em that we introduced happens to be the value of the validation error on the reduced, as we discussed.', 'start': 2145.563, 'duration': 4.743}, {'end': 2152.969, 'text': 'And this is true for all of them.', 'start': 2151.728, 'duration': 1.241}, {'end': 2158.535, 'text': 'And then you pick the model m star that happens to have the smallest em.', 'start': 2153.49, 'duration': 5.045}, {'end': 2160.698, 'text': "And that is the one that you're going to report.", 'start': 2158.876, 'duration': 1.822}, {'end': 2164.542, 'text': "And you're going to restore your d, as we did before.", 'start': 2160.798, 'duration': 3.744}, {'end': 2166.184, 'text': 'And this is what you have.', 'start': 2164.662, 'duration': 1.522}, {'end': 2168.927, 'text': 'So this is the algorithm for model selection.', 'start': 2166.324, 'duration': 2.603}, {'end': 2172.86, 'text': "Now let's look at the bias.", 'start': 2170.939, 'duration': 1.921}, {'end': 2174.901, 'text': "I'm going to run an experiment to show you the bias.", 'start': 2172.88, 'duration': 2.021}, {'end': 2177.943, 'text': 'Let me put it here and just build towards it.', 'start': 2175.241, 'duration': 2.702}, {'end': 2181.745, 'text': 'What is the bias now? 
We know we selected a particular model.', 'start': 2178.463, 'duration': 3.282}, {'end': 2185.567, 'text': 'And we selected it based on d val.', 'start': 2182.666, 'duration': 2.901}, {'end': 2186.128, 'text': "That's the killer.", 'start': 2185.607, 'duration': 0.521}, {'end': 2194.011, 'text': 'So when you use the estimate to choose, the estimate is no longer reliable, because you particularly chose for it.', 'start': 2187.088, 'duration': 6.923}, {'end': 2198.113, 'text': 'So now it looks optimistic, because by choice, it has a good performance.', 'start': 2194.071, 'duration': 4.042}, {'end': 2202.075, 'text': 'Not because it has an inherently good performance, because you looked for the one with the good performance.', 'start': 2198.533, 'duration': 3.542}, {'end': 2207.397, 'text': 'So the expected value of this fellow.', 'start': 2204.316, 'duration': 3.081}, {'end': 2213.737, 'text': 'is now a biased estimate of the ultimate quantity we want, which is the out-of-sample error.', 'start': 2209.375, 'duration': 4.362}, {'end': 2217.639, 'text': 'So the eval, the sample thing, is biased of that.', 'start': 2213.938, 'duration': 3.701}, {'end': 2219.681, 'text': 'And we would like to evaluate that.', 'start': 2218.36, 'duration': 1.321}, {'end': 2222.322, 'text': 'So here is the illustration on the curve.', 'start': 2220.581, 'duration': 1.741}, {'end': 2226.444, 'text': "And I'm going to ask you a question about it, so you have to pay attention in order to be able to answer the question.", 'start': 2222.462, 'duration': 3.982}], 'summary': 'Training model, selecting smallest error, evaluating bias in model selection', 'duration': 108.581, 'max_score': 2117.863, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk2117863.jpg'}, {'end': 2432.824, 'src': 'embed', 'start': 2403.301, 'weight': 0, 'content': [{'end': 2405.403, 'text': 'And in every situation you will have, there will be a bias.', 'start': 2403.301, 
'duration': 2.102}, {'end': 2407.365, 'text': 'How much bias depends on a number of factors.', 'start': 2405.683, 'duration': 1.682}, {'end': 2408.566, 'text': 'But the bias is there.', 'start': 2407.585, 'duration': 0.981}, {'end': 2422.151, 'text': "Let's try to find analytically a guideline for the type of bias.", 'start': 2418.146, 'duration': 4.005}, {'end': 2422.852, 'text': 'Why is that??', 'start': 2422.311, 'duration': 0.541}, {'end': 2429.039, 'text': "Because I'm using the validation set to estimate the out-of-sample error and I'm really claiming that it's close to the out-of-sample error.", 'start': 2422.952, 'duration': 6.087}, {'end': 2432.824, 'text': "And we realize that if I don't use it too much, I'll be OK.", 'start': 2429.26, 'duration': 3.564}], 'summary': 'Bias exists in every situation, its extent depends on factors. using the validation set to estimate out-of-sample error is crucial.', 'duration': 29.523, 'max_score': 2403.301, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk2403301.jpg'}, {'end': 2500.247, 'src': 'embed', 'start': 2467.037, 'weight': 3, 'content': [{'end': 2473.98, 'text': 'Models in the general sense, this could be M values of the regularization parameter lambda in a fixed situation.', 'start': 2467.037, 'duration': 6.943}, {'end': 2480.362, 'text': "But we're still making one of M choices.", 'start': 2474.4, 'duration': 5.962}, {'end': 2492.086, 'text': 'The way to look at it is to think that the validation set is actually used for training, but training on a very special hypothesis set.', 'start': 2480.582, 'duration': 11.504}, {'end': 2496.282, 'text': 'the hypothesis set of the finalists.', 'start': 2494.059, 'duration': 2.223}, {'end': 2500.247, 'text': 'What does that mean? 
So I have H1 up to HM.', 'start': 2496.983, 'duration': 3.264}], 'summary': 'Exploring m values of lambda for training on a specialized hypothesis set.', 'duration': 33.21, 'max_score': 2467.037, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk2467037.jpg'}, {'end': 2616.979, 'src': 'embed', 'start': 2589.331, 'weight': 1, 'content': [{'end': 2592.072, 'text': 'And the final, final hypothesis is this guy.', 'start': 2589.331, 'duration': 2.741}, {'end': 2598.488, 'text': 'Okay? is less than or equal to the out-of-sample error, plus a penalty for the model complexity.', 'start': 2592.596, 'duration': 5.892}, {'end': 2602.35, 'text': 'And the penalty, if you use even the simple union bound, will have that form.', 'start': 2598.948, 'duration': 3.402}, {'end': 2605.832, 'text': 'So you still have the 1 over square root of k.', 'start': 2602.89, 'duration': 2.942}, {'end': 2608.133, 'text': 'So you can always make it better by having more examples.', 'start': 2605.832, 'duration': 2.301}, {'end': 2612.076, 'text': 'But then you have a contribution because of the number of guys you are choosing from.', 'start': 2608.734, 'duration': 3.342}, {'end': 2614.817, 'text': "So if you are choosing between 10 guys, that's one thing.", 'start': 2612.696, 'duration': 2.121}, {'end': 2616.979, 'text': "If you are choosing between 100 guys, that's another.", 'start': 2615.097, 'duration': 1.882}], 'summary': 'Final hypothesis <= out-of-sample error + penalty for model complexity.', 'duration': 27.648, 'max_score': 2589.331, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk2589331.jpg'}, {'end': 2669.822, 'src': 'embed', 'start': 2644.77, 'weight': 2, 'content': [{'end': 2650.392, 'text': "And indeed, if you are looking for a choice of one parameter, let's say I'm picking the regularization parameter.", 'start': 2644.77, 'duration': 5.622}, {'end': 2657.677, 'text': "When 
you are actually picking the regularization parameter and you haven't put a grid, you don't say I'm choosing between 1, 0.1, and 0.01,", 'start': 2651.312, 'duration': 6.365}, {'end': 2659.518, 'text': 'et cetera a finite number.', 'start': 2657.677, 'duration': 1.841}, {'end': 2662.56, 'text': "I'm actually choosing the numerical value of lambda, whatever it be.", 'start': 2659.878, 'duration': 2.682}, {'end': 2667.881, 'text': 'So I could end up with lambda equals 0.127543.', 'start': 2662.82, 'duration': 5.061}, {'end': 2669.822, 'text': 'You are making a choice between an infinite number of guys.', 'start': 2667.881, 'duration': 1.941}], 'summary': 'Selecting a numerical value for the regularization parameter, e.g., lambda equals 0.127543.', 'duration': 25.052, 'max_score': 2644.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk2644770.jpg'}], 'start': 1931.831, 'title': 'Model selection and bias analysis', 'summary': 'Explores a model selection algorithm based on validation errors, discusses bias in the selected model, showcases an experiment with two models and their out-of-sample errors, and analyzes the impact of validation set size on bias. 
It also discusses the challenges of choosing between models and parameters, emphasizing the impact of the number of choices on the training and out-of-sample errors, as well as the relationship between the validation set size and the number of parameters.', 'chapters': [{'end': 2429.039, 'start': 1931.831, 'title': 'Model selection algorithm and bias analysis', 'summary': 'Explores a model selection algorithm based on validation errors and discusses the presence of bias in the selected model, showcasing an experiment with two models and their out-of-sample errors, while also analyzing the impact of validation set size on bias.', 'duration': 497.208, 'highlights': ['The algorithm for model selection involves training multiple models on a reduced set, evaluating their performance using the validation set, and selecting the model with the smallest validation error as the final hypothesis.', 'The validation error of the selected model exhibits optimistic bias due to the selection process, leading to unreliable estimates of the out-of-sample error.', 'An experiment is conducted to showcase the systematic bias in the selection of models based on their validation errors, with the average validation error on the chosen model being compared to its actual out-of-sample error.', 'The impact of validation set size on bias is analyzed, demonstrating that larger validation set sizes lead to more reliable estimates and reduced bias in model selection.']}, {'end': 2755.427, 'start': 2429.26, 'title': 'Choosing between models and parameters', 'summary': 'Discusses the challenges of choosing between models and parameters, emphasizing the impact of the number of choices on the training and out-of-sample errors, as well as the relationship between the validation set size and the number of parameters.', 'duration': 326.167, 'highlights': ['The impact of the number of choices on training and out-of-sample errors is discussed, emphasizing the penalty for model complexity and the logarithmic worsening effect with an increasing number of choices. The penalty for model complexity and the logarithmic worsening effect with an increasing number of choices are emphasized, illustrating how the number of choices impacts both training and out-of-sample errors.', 'The relationship between the validation set size and the number of parameters is explored, highlighting the impact of the VC dimension and the rule of thumb regarding the number of parameters that can be effectively chosen using the validation set. 
The impact of the VC dimension and the rule of thumb regarding the number of parameters that can be effectively chosen using the validation set is explained, emphasizing the relationship between the validation set size and the number of parameters.', 'The concept of choosing parameters with one degree of freedom, such as the regularization parameter and early stopping, is introduced, highlighting the correspondence between the number of parameters and degrees of freedom.']}], 'duration': 823.596, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk1931831.jpg', 'highlights': ['The impact of validation set size on bias is analyzed, demonstrating that larger validation set sizes lead to more reliable estimates and reduced bias in model selection.', 'The penalty for model complexity and the logarithmic worsening effect with an increasing number of choices are emphasized, illustrating how the number of choices impacts both training and out-of-sample errors.', 'The concept of choosing parameters with one degree of freedom, such as the regularization parameter and early stopping, is introduced, highlighting the correspondence between the number of parameters and degrees of freedom.', 'The algorithm for model selection involves training multiple models on a reduced set, evaluating their performance using the validation set, and selecting the model with the smallest validation error as the final hypothesis.']}, {'end': 3733.777, 'segs': [{'end': 2915.765, 'src': 'embed', 'start': 2886.813, 'weight': 0, 'content': [{'end': 2893.618, 'text': 'When you give that as your estimate, your customer is as likely to be pleasantly surprised as unpleasantly surprised.', 'start': 2886.813, 'duration': 6.805}, 
{'end': 2899.803, 'text': 'And if your test set is big, they are likely not to be surprised at all, to be very close to your estimate.', 'start': 2894.459, 'duration': 5.344}, {'end': 2901.524, 'text': 'So there is no bias there.', 'start': 2900.603, 'duration': 0.921}, {'end': 2905.581, 'text': 'Now, the validation set is in between.', 'start': 2903.12, 'duration': 2.461}, {'end': 2909.543, 'text': "It's slightly contaminated, because it made few choices.", 'start': 2906.702, 'duration': 2.841}, {'end': 2915.765, 'text': 'And the wisdom here, please keep it slightly contaminated.', 'start': 2910.463, 'duration': 5.302}], 'summary': 'Estimate accurately to avoid customer surprises. Validation set may be slightly contaminated.', 'duration': 28.952, 'max_score': 2886.813, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk2886813.jpg'}, {'end': 3081.181, 'src': 'embed', 'start': 3050.434, 'weight': 2, 'content': [{'end': 3052.795, 'text': 'And therefore, I can claim that the out-of-sample error is close.', 'start': 3050.434, 'duration': 2.361}, {'end': 3058.56, 'text': 'Because the bigger K is, the bigger the discrepancy between the training set and the full set.', 'start': 3052.876, 'duration': 5.684}, {'end': 3062.684, 'text': 'And therefore, the bigger the discrepancy between the hypothesis I get here and the hypothesis I get here.', 'start': 3059.021, 'duration': 3.663}, {'end': 3064.305, 'text': "So I'd like K to be small.", 'start': 3063.084, 'duration': 1.221}, {'end': 3068.328, 'text': "But also, I'd like K to be large.", 'start': 3065.546, 'duration': 2.782}, {'end': 3072.612, 'text': 'Because the bigger K is, the more reliable this estimate is for that.', 'start': 3068.889, 'duration': 3.723}, {'end': 3076.134, 'text': 'So I want K to have two conditions.', 'start': 3073.952, 'duration': 2.182}, {'end': 3081.181, 'text': 'It has to be small, and it has to be large.', 'start': 3076.575, 'duration': 
4.606}], 'summary': 'K should be both small and large for reliable estimate.', 'duration': 30.747, 'max_score': 3050.434, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk3050434.jpg'}, {'end': 3127.28, 'src': 'embed', 'start': 3098.503, 'weight': 1, 'content': [{'end': 3102.307, 'text': 'And when you look at it, it will look first, and then you realize this is actually valid.', 'start': 3098.503, 'duration': 3.804}, {'end': 3109.633, 'text': "So what do we do? I'm going to describe one form of cross-validation, which is the simplest to describe, which is called leave-one-out.", 'start': 3102.687, 'duration': 6.946}, {'end': 3112.015, 'text': 'Other methods will be leave-more-out.', 'start': 3110.133, 'duration': 1.882}, {'end': 3112.555, 'text': "That's all.", 'start': 3112.215, 'duration': 0.34}, {'end': 3114.437, 'text': "But let's focus on leave-one-out.", 'start': 3113.256, 'duration': 1.181}, {'end': 3117.572, 'text': 'Here is the idea.', 'start': 3115.47, 'duration': 2.102}, {'end': 3120.454, 'text': 'You give me a data set of N.', 'start': 3118.092, 'duration': 2.362}, {'end': 3123.296, 'text': 'I am going to use N minus 1 of them for training.', 'start': 3120.454, 'duration': 2.842}, {'end': 3127.28, 'text': "That's good, because now I am very close to N.", 'start': 3124.257, 'duration': 3.023}], 'summary': 'Describes leave-one-out cross-validation method for training data, giving n-1 for training.', 'duration': 28.777, 'max_score': 3098.503, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk3098503.jpg'}, {'end': 3405.278, 'src': 'embed', 'start': 3375.596, 'weight': 6, 'content': [{'end': 3378.579, 'text': 'Never mind the fact that each of them is evaluated in a different hypothesis.', 'start': 3375.596, 'duration': 2.983}, {'end': 3387.306, 'text': 'So now I was able to use n minus 1 points to train, and that will give me something very close 
to what happens with n.', 'start': 3379.78, 'duration': 7.526}, {'end': 3389.188, 'text': "And I'm using n points to validate.", 'start': 3387.306, 'duration': 1.882}, {'end': 3394.834, 'text': 'The catch, obviously, these are not independent.', 'start': 3390.612, 'duration': 4.222}, {'end': 3401.617, 'text': 'These are not independent, because the examples were used to create the hypothesis, and some examples were used to evaluate them.', 'start': 3395.114, 'duration': 6.503}, {'end': 3405.278, 'text': 'And you will see that each of them is affected by the other,', 'start': 3402.297, 'duration': 2.981}], 'summary': 'Using n-1 points for training and n points for validation can give close results, but not independent due to shared examples in hypothesis creation and evaluation.', 'duration': 29.682, 'max_score': 3375.596, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk3375596.jpg'}, {'end': 3671.359, 'src': 'embed', 'start': 3641.421, 'weight': 3, 'content': [{'end': 3654.721, 'text': 'if your question is is the linear model better than the constant model in this case? 
The only thing you look at in all of this is the cross-validation error.', 'start': 3641.421, 'duration': 13.3}, {'end': 3664.071, 'text': "So this guy, this guy, this guy averaged is the negative grade, because it's error, for the linear model.", 'start': 3656.403, 'duration': 7.668}, {'end': 3668.476, 'text': 'This guy, this guy, this guy averaged is the grade for the constant model.', 'start': 3664.692, 'duration': 3.784}, {'end': 3671.359, 'text': 'And as you see, the constant model wins.', 'start': 3668.997, 'duration': 2.362}], 'summary': 'Constant model outperforms linear model based on cross-validation error.', 'duration': 29.938, 'max_score': 3641.421, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk3641421.jpg'}, {'end': 3719.166, 'src': 'embed', 'start': 3694.366, 'weight': 5, 'content': [{'end': 3701.052, 'text': "There is a very strong indication that there is a positive slope involved, and maybe it's a linear model with a positive slope.", 'start': 3694.366, 'duration': 6.686}, {'end': 3702.353, 'text': "Don't go there.", 'start': 3701.212, 'duration': 1.141}, {'end': 3705.075, 'text': 'You can fool yourself into any pattern you want.', 'start': 3702.633, 'duration': 2.442}, {'end': 3707.277, 'text': 'Go about it in a systematic way.', 'start': 3705.695, 'duration': 1.582}, {'end': 3709.839, 'text': 'This is the quantity we know, the cross-validation error.', 'start': 3707.617, 'duration': 2.222}, {'end': 3711.14, 'text': 'This is the way to compute it.', 'start': 3710.099, 'duration': 1.041}, {'end': 3719.166, 'text': "We are going to take it as the indication, notwithstanding that there is an error bar, because it's a small sample, in this case 3.", 'start': 3711.5, 'duration': 7.666}], 'summary': 'Indication of positive slope in linear model with small sample size of 3.', 'duration': 24.8, 'max_score': 3694.366, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk3694366.jpg'}], 'start': 2755.587, 'title': 'Data contamination and cross-validation in ml', 'summary': 'Discusses the concept of data contamination in training and its impact on error estimates, emphasizing the trade-off between reliability and contamination. it also delves into the role of different data sets like training, test, and validation sets, and the dilemma of k in cross-validation. additionally, it covers the leave-one-out cross-validation method and the use of cross-validation for model selection, showcasing the process of leave-one-out cross-validation and using the cross-validation error to choose between linear and constant models.', 'chapters': [{'end': 3098.083, 'start': 2755.587, 'title': 'Data contamination in training', 'summary': 'Discusses the concept of data contamination in training and its impact on error estimates, emphasizing the trade-off between reliability and contamination. it also delves into the role of different data sets like training, test, and validation sets, and the dilemma of k in cross-validation.', 'duration': 342.496, 'highlights': ['The training set is totally contaminated, making it unreliable as an estimate for the out-of-sample error, while the test set remains unbiased and provides a reliable estimate. The validation set, although slightly contaminated due to making some choices, requires careful handling to maintain reliability. ', 'The chapter explores the dilemma of choosing the value of K in cross-validation, highlighting the conflicting requirements of K being small for minimizing discrepancy between training and full sets, and K being large for reliable estimates. It teases the introduction of new mathematics to address this dilemma. 
']}, {'end': 3442.017, 'start': 3098.503, 'title': 'Cross-validation in machine learning', 'summary': 'Discusses the leave-one-out cross-validation method, where a model is trained on n-1 data points and validated on one point, leading to the estimation of the average cross-validation error.', 'duration': 343.514, 'highlights': ["The leave-one-out cross-validation method involves training a model on N-1 data points and validating on one point, leading to the estimation of the average cross-validation error. By repeatedly performing leave-one-out cross-validation for different indices n, a set of estimates for the validation error is obtained, providing an unbiased estimate of the model's performance.", 'The cross-validation error, ECV, is defined as the average of the validation error estimates from the leave-one-out cross-validation process. The cross-validation error is calculated by averaging the validation error estimates obtained from multiple training sessions, each followed by a single evaluation on a data point.', 'The examples used in cross-validation are not independent, leading to correlation between the validation errors, yet the effective number of points is close to N, as if they were independent. Despite the correlation between the validation errors due to using the same examples for training and evaluation, the effective number of points in cross-validation is found to be very close to N.']}, {'end': 3733.777, 'start': 3442.858, 'title': 'Cross-validation for model selection', 'summary': 'Illustrates the use of cross-validation for model selection, demonstrating the process of leave-one-out cross-validation and using the cross-validation error to choose between linear and constant models, where the constant model is found to be better.', 'duration': 290.919, 'highlights': ['The chapter illustrates the process of leave-one-out cross-validation for model selection, using a small sample size of 3 points. 
The process of leave-one-out cross-validation is demonstrated, emphasizing the small sample size of 3 points.', 'The cross-validation error is used to compare the performance of linear and constant models, with the constant model being found to have a lower error and thus chosen as the better model. The cross-validation error is utilized to compare the performance of linear and constant models, ultimately leading to the selection of the constant model due to its lower error.', 'The importance of avoiding heuristic approaches and relying on systematic computation of the cross-validation error is emphasized for accurate model selection. The emphasis is placed on the significance of avoiding heuristic approaches and instead relying on systematic computation of the cross-validation error for accurate model selection.']}], 'duration': 978.19, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk2755587.jpg', 'highlights': ['The training set is totally contaminated, making it unreliable as an estimate for the out-of-sample error, while the test set remains unbiased and provides a reliable estimate.', 'The leave-one-out cross-validation method involves training a model on N-1 data points and validating on one point, leading to the estimation of the average cross-validation error.', 'The chapter explores the dilemma of choosing the value of K in cross-validation, highlighting the conflicting requirements of K being small for minimizing discrepancy between training and full sets, and K being large for reliable estimates.', 'The cross-validation error is used to compare the performance of linear and constant models, with the constant model being found to have a lower error and thus chosen as the better model.', 'The validation set, although slightly contaminated due to making some choices, requires careful handling to maintain reliability.', 'The importance of avoiding heuristic approaches and relying on systematic 
computation of the cross-validation error is emphasized for accurate model selection.', 'The examples used in cross-validation are not independent, leading to correlation between the validation errors, yet the effective number of points is close to N, as if they were independent.']}, {'end': 5157.868, 'segs': [{'end': 3864.984, 'src': 'embed', 'start': 3837.902, 'weight': 3, 'content': [{'end': 3842.164, 'text': 'When you look at the training error, not surprisingly, the training error always goes down.', 'start': 3837.902, 'duration': 4.262}, {'end': 3844.045, 'text': 'What else is new? You have more.', 'start': 3842.625, 'duration': 1.42}, {'end': 3844.606, 'text': 'You fit better.', 'start': 3844.085, 'duration': 0.521}, {'end': 3851.878, 'text': 'The out-of-sample error, which I am evaluating on the points that were not involved at all in this process cross-validation or otherwise,', 'start': 3846.195, 'duration': 5.683}, {'end': 3855.059, 'text': 'just out-of-sample, totally I get this fellow.', 'start': 3851.878, 'duration': 3.181}, {'end': 3864.984, 'text': 'And the cross-validation error which I get from the 500 examples by excluding one point at a time and taking the average is remarkably similar to E out.', 'start': 3856.2, 'duration': 8.784}], 'summary': 'Training error decreases, out-of-sample error is similar to cross-validation error.', 'duration': 27.082, 'max_score': 3837.902, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk3837902.jpg'}, {'end': 3999.942, 'src': 'embed', 'start': 3904.713, 'weight': 0, 'content': [{'end': 3905.874, 'text': 'So this is a typical thing.', 'start': 3904.713, 'duration': 1.161}, {'end': 3907.055, 'text': "It's like unregularized.", 'start': 3905.914, 'duration': 1.141}, {'end': 3913.901, 'text': "Now, when you use the validation, and you stop at the sixth because the cross-validation error told you so, it's a nice, smooth surface.", 'start': 3907.936, 'duration': 
5.965}, {'end': 3919.206, 'text': "It's not a perfect error, but it didn't put an effort where it didn't belong.", 'start': 3914.302, 'duration': 4.904}, {'end': 3924.831, 'text': 'And when you look at the bottom line, what is the in-sample error here? 0%.', 'start': 3919.846, 'duration': 4.985}, {'end': 3925.511, 'text': 'You got it perfect.', 'start': 3924.831, 'duration': 0.68}, {'end': 3926.912, 'text': 'We know that.', 'start': 3926.432, 'duration': 0.48}, {'end': 3930.536, 'text': 'And the out-of-sample error? 2.5%.', 'start': 3928.314, 'duration': 2.222}, {'end': 3934.479, 'text': "For digits, that's OK, but not great.", 'start': 3930.536, 'duration': 3.943}, {'end': 3939.71, 'text': 'Here we went, and now the in-sample error is 0.8%.', 'start': 3936.687, 'duration': 3.023}, {'end': 3940.55, 'text': 'But we know better.', 'start': 3939.71, 'duration': 0.84}, {'end': 3942.632, 'text': "We don't care about the in-sample error going to 0.", 'start': 3940.651, 'duration': 1.981}, {'end': 3944.013, 'text': "That's actually harmful in some cases.", 'start': 3942.632, 'duration': 1.381}, {'end': 3947.256, 'text': 'The out-of-sample error is 1.5%.', 'start': 3944.634, 'duration': 2.622}, {'end': 3951.68, 'text': 'Now, if you are in the range, 2.5% means that you are performing 97.5%.', 'start': 3947.256, 'duration': 4.424}, {'end': 3952.461, 'text': 'Here, you are performing 98.5%.', 'start': 3951.68, 'duration': 0.781}, {'end': 3953.722, 'text': '40% improvement in that range is a lot.', 'start': 3952.461, 'duration': 1.261}, {'end': 3955.143, 'text': 'There is a limit here that you cannot exceed.', 'start': 3953.742, 'duration': 1.401}, {'end': 3967.871, 'text': 'So here, you are really doing great by just doing that simple thing.', 'start': 3963.528, 'duration': 4.343}, {'end': 3972.956, 'text': 'Now you can see why validation is considered, in this context, as similar to regularization.', 'start': 3967.911, 'duration': 5.045}, {'end': 3974.017, 'text': 'It does
the same thing.', 'start': 3973.056, 'duration': 0.961}, {'end': 3981.103, 'text': 'It prevented overfitting, but it prevented overfitting by estimating the out-of-sample error, rather than estimating something else.', 'start': 3974.337, 'duration': 6.766}, {'end': 3994.602, 'text': 'Now, let me go and very quickly, and I will close the lecture with it, give you the more general ones.', 'start': 3985.407, 'duration': 9.195}, {'end': 3995.704, 'text': 'So we talked about leave one out.', 'start': 3994.622, 'duration': 1.082}, {'end': 3998.942, 'text': 'Seldom do you use leave-one-out in real problems.', 'start': 3996.92, 'duration': 2.022}, {'end': 3999.942, 'text': 'And you can think of why.', 'start': 3999.022, 'duration': 0.92}], 'summary': 'Validation helps prevent overfitting, cutting the out-of-sample error from 2.5% to 1.5%, a 40% improvement in performance.', 'duration': 95.229, 'max_score': 3904.713, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk3904713.jpg'}, {'end': 4120.91, 'src': 'embed', 'start': 4092.416, 'weight': 5, 'content': [{'end': 4096.12, 'text': 'Now, the reason I introduce this is because this is what I actually recommend to you.', 'start': 4092.416, 'duration': 3.704}, {'end': 4102.064, 'text': 'Very specifically, tenfold cross-validation works very nicely in practice.', 'start': 4096.74, 'duration': 5.324}, {'end': 4109.046, 'text': 'So the rule is, you take the total number of examples, divide them by 10, and that is the size of your validation set.', 'start': 4104.184, 'duration': 4.862}, {'end': 4113.328, 'text': 'You repeat it 10 times, and you get an estimate, and you are ready to go.', 'start': 4109.086, 'duration': 4.242}, {'end': 4114.127, 'text': "That's it.", 'start': 4113.768, 'duration': 0.359}, {'end': 4120.91, 'text': "I will stop here, and we'll take questions after a short break.", 'start': 4114.688, 'duration': 6.222}], 'summary': 'Recommend tenfold cross-validation for practical
use.', 'duration': 28.494, 'max_score': 4092.416, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk4092416.jpg'}, {'end': 4464.714, 'src': 'embed', 'start': 4438.284, 'weight': 6, 'content': [{'end': 4443.385, 'text': 'So you can use it to choose the regularization parameter, and then you can also use it on the side to do something else.', 'start': 4438.284, 'duration': 5.101}, {'end': 4445.326, 'text': 'So both of them are active in the same problem.', 'start': 4443.465, 'duration': 1.861}, {'end': 4450.967, 'text': 'And in most of the practical cases you will encounter, you will actually be using both.', 'start': 4445.946, 'duration': 5.021}, {'end': 4459.026, 'text': 'Very seldom can you get away without regularization, and very seldom can you get away without validation.', 'start': 4451.917, 'duration': 7.109}, {'end': 4464.714, 'text': 'Someone is asking that this seems to be like a brute force method for model selection.', 'start': 4460.248, 'duration': 4.466}], 'summary': 'Regularization and validation are essential for model selection in practical cases.', 'duration': 26.43, 'max_score': 4438.284, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk4438284.jpg'}, {'end': 4509.955, 'src': 'embed', 'start': 4483.818, 'weight': 7, 'content': [{'end': 4489.882, 'text': "I can do model selection based on, I know my target function is symmetric, so I'm going to choose a symmetric model.", 'start': 4483.818, 'duration': 6.064}, {'end': 4491.884, 'text': 'That can be considered model selection.', 'start': 4490.303, 'duration': 1.581}, {'end': 4496.568, 'text': 'And there are a bunch of other logical methods to choose the model.', 'start': 4492.504, 'duration': 4.064}, {'end': 4500.69, 'text': 'The great thing about validation is that there are no assumptions whatsoever.', 'start': 4497.128, 'duration': 3.562}, {'end': 4502.451, 'text': 'You have capital M 
models.', 'start': 4501.45, 'duration': 1.001}, {'end': 4509.955, 'text': 'What are the models? What assumptions do they have? How close they are, or not close to the target function? Who cares? They have M models.', 'start': 4502.631, 'duration': 7.324}], 'summary': 'Model selection based on symmetry, validation with no assumptions, and m models.', 'duration': 26.137, 'max_score': 4483.818, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk4483818.jpg'}, {'end': 4573.595, 'src': 'embed', 'start': 4544.848, 'weight': 9, 'content': [{'end': 4546.609, 'text': 'Is it used for that or not??', 'start': 4544.848, 'duration': 1.761}, {'end': 4552.904, 'text': 'Validation makes a principled choice.', 'start': 4549.375, 'duration': 3.529}, {'end': 4556.907, 'text': 'Regardless of the nature of that choice.', 'start': 4553.925, 'duration': 2.982}, {'end': 4561.109, 'text': "Let's say that I have a time series.", 'start': 4557.867, 'duration': 3.242}, {'end': 4567.272, 'text': "And one of the things in time series, let's say for financial forecasting, is that you can train, and then you get a system.", 'start': 4561.789, 'duration': 5.483}, {'end': 4570.794, 'text': 'And then the world is not stationary.', 'start': 4567.772, 'duration': 3.022}, {'end': 4573.595, 'text': "So a system that used to work doesn't work anymore.", 'start': 4571.174, 'duration': 2.421}], 'summary': 'Validation helps in adapting to non-stationary time series data for financial forecasting.', 'duration': 28.747, 'max_score': 4544.848, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk4544848.jpg'}, {'end': 4696.842, 'src': 'embed', 'start': 4670.296, 'weight': 8, 'content': [{'end': 4675.177, 'text': "And if I have the other guy, which is completely swinging, it's very easy to pull it down, and I get worse effect of the bias.", 'start': 4670.296, 'duration': 4.881}, {'end': 4680.339, 'text': 
'So whenever you minimize the error bar, you minimize the vulnerability to bias as well.', 'start': 4675.717, 'duration': 4.622}, {'end': 4683.24, 'text': "That's the only thing that cross-validation does.", 'start': 4681.159, 'duration': 2.081}, {'end': 4688.362, 'text': 'It allows you to use a lot of examples to validate, while using a lot of examples to train.', 'start': 4683.32, 'duration': 5.042}, {'end': 4689.402, 'text': "That's the key.", 'start': 4688.742, 'duration': 0.66}, {'end': 4696.842, 'text': 'Going back to the previous lecture, a question on that.', 'start': 4693.259, 'duration': 3.583}], 'summary': 'Minimize error bar to reduce bias vulnerability. cross-validation uses many examples to validate and train.', 'duration': 26.546, 'max_score': 4670.296, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk4670296.jpg'}, {'end': 4871.888, 'src': 'embed', 'start': 4846.98, 'weight': 10, 'content': [{'end': 4853.363, 'text': "Say there's a scenario where you find your model through cross-validation, and then you test the out-of-sample error.", 'start': 4846.98, 'duration': 6.383}, {'end': 4857.825, 'text': 'But somehow you test a different model, and it gives you a smaller out-of-sample error.', 'start': 4853.723, 'duration': 4.102}, {'end': 4865.782, 'text': 'Should you still keep the one you found through cross-validation? 
I went through this learning and came up with a model.', 'start': 4857.845, 'duration': 7.937}, {'end': 4871.888, 'text': 'Someone else went through whatever exercise they have and came up with a final hypothesis in this case.', 'start': 4866.483, 'duration': 5.405}], 'summary': 'When testing a different model, it gave a smaller out-of-sample error than the one found through cross-validation.', 'duration': 24.908, 'max_score': 4846.98, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk4846980.jpg'}], 'start': 3735.799, 'title': 'Importance of cross-validation in model selection', 'summary': 'demonstrates the importance of cross-validation for model selection, showing a 40% improvement in out-of-sample error for a digit classification task. it discusses preventing overfitting, recommends tenfold cross-validation, emphasizes the use of regularization and cross-validation in model selection, and discusses model validation simplicity, time-dependent data, and cross-validation benefits in large datasets.', 'chapters': [{'end': 3974.017, 'start': 3735.799, 'title': 'Cross-validation for model selection', 'summary': 'demonstrates the use of cross-validation to select the optimal number of features for a nonlinear transformation, resulting in a 40% improvement in out-of-sample error in a digit classification task.', 'duration': 238.218, 'highlights': ['The use of cross-validation to select the optimal number of features for a nonlinear transformation resulted in a 40% improvement in out-of-sample error in a digit classification task.', 'The cross-validation error was remarkably similar to the out-of-sample error, tracking it very nicely and guiding the selection of the optimal number of features.', 'Validation through cross-validation led to a 40% improvement in performance, demonstrating its similarity to regularization in this context.', 'Without feature selection the in-sample error was 0% but the out-of-sample error was 2.5%; selecting features by cross-validation improved the out-of-sample error to 1.5%, indicating a significant enhancement in performance.']}, {'end': 4459.026, 'start': 3974.337, 'title': 'Cross-validation and model selection', 'summary': 'discusses the importance of cross-validation in preventing overfitting, recommends tenfold cross-validation as a practical approach, and emphasizes the use of both regularization and cross-validation in model selection.', 'duration': 484.689, 'highlights': ['The chapter discusses the importance of cross-validation in preventing overfitting. It prevented overfitting by estimating the out-of-sample error, rather than estimating something else.', 'Recommends tenfold cross-validation as a practical approach. The rule is to take the total number of examples, divide them by 10, and repeat it 10 times to get an estimate.', 'Emphasizes the use of both regularization and cross-validation in model selection. One of the biggest utilities for validation is to choose the regularization parameter, and both of them are active in the same problem.']}, {'end': 4846.86, 'start': 4460.248, 'title': 'Model selection and validation', 'summary': 'discusses the concept of model selection and validation, emphasizing the simplicity and lack of assumptions in the validation process, the use of validation for time-dependent data, the comparison between validation and cross-validation, and the potential benefits of cross-validation in large datasets, citing the example of the Netflix case.', 'duration': 386.612, 'highlights': ['The validation process is extremely simple and immune to assumptions, allowing for model selection without requiring specific assumptions about the data. Validation process is simple, immune to assumptions, and allows for model selection without specific assumptions.', 'Validation can be used for making principled choices in the case of time-dependent data, such as in tracking the evolution of systems.
Validation can be used for making principled choices in the case of time-dependent data, such as in tracking the evolution of systems.', 'Cross-validation allows for using a lot of examples to validate while using a lot of examples to train, minimizing the vulnerability to bias, especially in large datasets Cross-validation allows for using many examples to validate and train, minimizing bias, especially in large datasets.']}, {'end': 5157.868, 'start': 4846.98, 'title': 'Cross-validation and model evaluation', 'summary': 'Discusses the evaluation of models through cross-validation, addressing the issue of model selection based on out-of-sample error, the impact of sample size on cross-validation, and the behavior of bias with an increasing number of validation points.', 'duration': 310.888, 'highlights': ['The chapter addresses the issue of model selection based on out-of-sample error, questioning the validity of selecting a model found through cross-validation over a different model that yields a smaller out-of-sample error. out-of-sample error', 'It discusses the impact of sample size on cross-validation, considering the relative size of the error bar, correlations, and bias, and whether it is a good idea to resample to enlarge the current sample. sample size, error bar, correlations, bias', 'The chapter also explores the behavior of bias when increasing the number of points left out, and how the bias is a function of how it is used rather than something inherent in the estimate. 
number of points left out, bias']}], 'duration': 1422.069, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/o7zzaKd0Lkk/pics/o7zzaKd0Lkk3735799.jpg', 'highlights': ['The use of cross-validation resulted in a 40% improvement in out-of-sample error for a digit classification task', 'Validation through cross-validation led to a 40% improvement in performance, demonstrating its similarity to regularization', 'Feature selection based on cross-validation improved the out-of-sample error from 2.5% to 1.5%, a significant enhancement in performance', 'The cross-validation error tracked the out-of-sample error very nicely, guiding the selection of the optimal number of features', 'The chapter discusses the importance of cross-validation in preventing overfitting by estimating the out-of-sample error', 'Recommends tenfold cross-validation as a practical approach for model selection', 'Emphasizes the use of both regularization and cross-validation in model selection', 'The validation process is extremely simple and immune to assumptions, allowing for model selection without requiring specific assumptions about the data', 'Cross-validation allows for using many examples to validate and train, minimizing bias, especially in large datasets', 'Validation can be used for making principled choices in the case of time-dependent data, such as in tracking the evolution of systems', 'The chapter addresses the issue of model selection based on out-of-sample error and the impact of sample size on cross-validation']}], 'highlights': ['The use of cross-validation resulted in a 40% improvement in out-of-sample error for a digit classification task', 'Feature selection based on cross-validation improved the out-of-sample error from 2.5% to 1.5%, a significant enhancement in performance', 'The cross-validation error tracked the out-of-sample error very nicely, guiding the selection of the optimal number of features', 'The transition from constrained to unconstrained
regularization involves explicitly forbidding some hypotheses to reduce VC dimension and improve generalization property, leading to an augmented error preference based on a penalty', 'The discussion delves into the general form of a regularizer, denoted as capital omega of H, which depends on small h, and the formation of the augmented error as the in-sample error plus a specific term for better out-of-sample error prediction', 'The focus on the augmented error form of regularization is emphasized in practice, with the argument for it being the equivalence to the constrained version, either through Lagrangian or geometric approach', 'The impact of validation set size on bias is analyzed, demonstrating that larger validation set sizes lead to more reliable estimates and reduced bias in model selection', 'The penalty for model complexity and the logarithmic worsening effect with an increasing number of choices are emphasized, illustrating how the number of choices impacts both training and out-of-sample errors', 'The algorithm for model selection involves training multiple models on a reduced set, evaluating their performance using the validation set, and selecting the model with the smallest validation error as the final hypothesis', 'The training set is totally contaminated, making it unreliable as an estimate for the out-of-sample error, while the test set remains unbiased and provides a reliable estimate', 'The leave-one-out cross-validation method involves training a model on N-1 data points and validating on one point, leading to the estimation of the average cross-validation error', 'The chapter explores the dilemma of choosing the value of K in cross-validation, highlighting the conflicting requirements of K being small for minimizing discrepancy between training and full sets, and K being large for reliable estimates', 'The cross-validation error is used to compare the performance of linear and constant models, with the constant model being found to have a 
lower error and thus chosen as the better model', 'The validation set, although slightly contaminated due to making some choices, requires careful handling to maintain reliability', 'The importance of avoiding heuristic approaches and relying on systematic computation of the cross-validation error is emphasized for accurate model selection', 'The examples used in cross-validation are not independent, leading to correlation between the validation errors, yet the effective number of points is close to N, as if they were independent', 'The impact of early stopping on the learning process and model choices is highlighted, illustrating how it changes the nature of the dataset being used', 'The expected value of E is less than 0.5 due to the rules of the game, with a 75% probability that the minimum will be less than 0.5, resulting in an optimistic bias', 'The main use of validation sets is for model selection, and the choice of lambda is a manifestation of this', 'The bias introduced by the minimum validation error is minor and does not significantly impact the reliability of the estimate for the out-of-sample error, especially with a respectable size validation set', 'The determination of lambda using validation is critical, as it allows for the selection of the correct amount of lambda, leading to a close fit to the target', 'The validation error serves as an unbiased estimate of the out-of-sample error, assuming it is solely utilized for out-of-sample error measurement, enhancing the reliability of the estimation', 'As the sample size (k) for the validation set increases, the variance of the validation error decreases, suggesting a potential improvement in the reliability of the out-of-sample error estimate', 'Cross-validation allows for the utilization of all examples for both validation and training, offering a solution to the distinction between training and validation activities', 'Allocation of points to the validation set results in a trade-off, as it reduces 
the available points for training, emphasizing the need to carefully balance the allocation for effective model evaluation', 'Using a small k leads to a bad estimate due to high variance', 'Increasing K may result in a more reliable estimate but worse expected error value', 'Continuously increasing K can lead to a very reliable estimate but of poor quality', 'The compromise between reliability of estimates and proximity to reported results is emphasized', 'Rule of thumb suggests using 1/5th of the dataset for validation', 'The concept of choosing parameters with one degree of freedom, such as the regularization parameter and early stopping, is introduced, highlighting the correspondence between the number of parameters and degrees of freedom', 'The chapter discusses the importance of cross-validation in preventing overfitting by estimating the out-of-sample error', 'Recommends tenfold cross-validation as a practical approach for model selection', 'Emphasizes the use of both regularization and cross-validation in model selection', 'The validation process is extremely simple and immune to assumptions, allowing for model selection without requiring specific assumptions about the data', 'Cross-validation allows for using many examples to validate and train, minimizing bias, especially in large datasets', 'Validation can be used for making principled choices in the case of time-dependent data, such as in tracking the evolution of systems', 'The chapter addresses the issue of model selection based on out-of-sample error and the impact of sample size on cross-validation']}
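The leave-one-out procedure and the constant-versus-linear model selection walked through in the transcript can be sketched in code. This is a minimal illustration, not the lecture's own code: the three data points, the function names, and the squared-error measure are our choices for the sketch.

```python
# Minimal sketch of leave-one-out cross-validation for model selection,
# mirroring the transcript's toy example: pick between a constant model
# h(x) = b and a linear model h(x) = ax + b by comparing E_cv.
# The three data points below are illustrative, not the lecture's own.

def fit_constant(points):
    """Least-squares constant hypothesis: the mean of the targets."""
    b = sum(y for _, y in points) / len(points)
    return lambda x: b

def fit_linear(points):
    """Least-squares line through the training points (assumes >= 2 points)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x, _ in points)
    b = my - a * mx
    return lambda x: a * x + b

def e_cv(fit, data):
    """E_cv: average squared error, each point evaluated on the hypothesis
    trained with that point left out (g-minus, in the lecture's notation)."""
    total = 0.0
    for n in range(len(data)):
        held_x, held_y = data[n]
        g_minus = fit(data[:n] + data[n + 1:])       # train on N-1 points
        total += (g_minus(held_x) - held_y) ** 2     # e_n on the held-out point
    return total / len(data)

data = [(-1.0, 0.0), (0.3, 1.1), (1.2, 0.4)]         # N = 3, as in the lecture's example
errors = {'constant': e_cv(fit_constant, data), 'linear': e_cv(fit_linear, data)}
best = min(errors, key=errors.get)                   # the constant model wins here
```

With only N = 3 points, the line passes exactly through its two training points on every run and swings wildly on the held-out point, so its cross-validation error comes out larger; that is the mechanism behind "the constant model wins" in the transcript.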
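The tenfold rule quoted near the end of the lecture (validation set size K = N/10, repeated 10 times, average the estimates) can likewise be sketched. Contiguous unshuffled folds and an N divisible by 10 are simplifying assumptions of this sketch, and the helper names are our own.

```python
# Sketch of the lecture's tenfold cross-validation rule: split the N examples
# into 10 folds of size N/10, train on the other 9 folds, validate on the
# held-out fold, and average the 10 validation errors.

def ten_fold_splits(n_examples, n_folds=10):
    """Yield (train_indices, validation_indices) for each of the folds."""
    fold_size = n_examples // n_folds        # the rule: K = N / 10
    indices = list(range(n_examples))
    for f in range(n_folds):
        val = indices[f * fold_size:(f + 1) * fold_size]
        train = indices[:f * fold_size] + indices[(f + 1) * fold_size:]
        yield train, val

def cross_validation_error(data, train_fn, error_fn, n_folds=10):
    """E_cv: average of the per-fold validation errors.

    train_fn maps a list of training examples to a hypothesis;
    error_fn maps (hypothesis, validation examples) to an error value.
    """
    splits = list(ten_fold_splits(len(data), n_folds))
    total = 0.0
    for train_idx, val_idx in splits:
        g_minus = train_fn([data[i] for i in train_idx])
        total += error_fn(g_minus, [data[i] for i in val_idx])
    return total / len(splits)
```

For N = 100 this gives 10 validation sets of 10 points each; every example is validated exactly once, so all N points contribute to the estimate while 90% of them are used for training in each run, which is the compromise the lecture advertises.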