title

Lecture 08 - Bias-Variance Tradeoff

description

Bias-Variance Tradeoff - Breaking down the learning performance into competing quantities. The learning curves. Lecture 8 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
This lecture was recorded on April 26, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.

detail

Summary: Covers VC analysis, the generalization bound, and the bias-variance tradeoff in machine learning, discussing the VC dimension, bias, variance, and generalization error, and emphasizing their quantifiable relationships and practical implications for learning feasibility and model complexity.

Chapter 1: Review of VC analysis and the generalization bound (0:01 to 4:42)

The VC dimension was defined as the largest number of points that the hypothesis set can shatter. It is used to establish the feasibility of learning, and then to estimate the example resources needed in order to learn. One of the important aspects of the VC analysis is its scope: the VC inequality, and the generalization bound that corresponds to it, describe the generalization ability of the final hypothesis you are going to pick. They describe it in terms of the VC dimension of the hypothesis set, and they make a statement that is true for all but delta of the data sets you might get. The most important part is the generality of the statement: the VC bound is valid for any learning algorithm, for any input distribution that may take place, and for any target function that you may be trying to learn. That is the most theoretical part. We then turned to a practical question, the utility of the VC dimension when someone actually comes to you with a learning problem.

Theoretically, the bound tells us that the number of examples needed is proportional to the VC dimension, more or less. The constant of proportionality in the bound itself is horrifically pessimistic: you would end up requiring tens of thousands of examples for a task that really needs only maybe 50. The good news is that the actual quantity behaves the same way as the bound; as a practical observation, the number of examples needed is indeed proportional to the VC dimension. As a rule of thumb, to get to the interesting delta and epsilon, you need the number of examples to be about 10 times the VC dimension. More will be better, and less might work, but the ballpark is a factor of 10 before interesting generalization properties begin to appear. We ended by summarizing the entire theoretical analysis into a very simple bound, referred to as the generalization bound, which bounds the out-of-sample performance in terms of the in-sample performance.
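The pessimism of the bound, versus the practical rule of thumb, can be checked numerically. A minimal sketch, assuming the common form of the VC generalization bound, epsilon = sqrt((8/N) ln(4 m_H(2N) / delta)), together with the polynomial bound m_H(N) <= N^d_vc + 1 on the growth function; the function names are illustrative:

```python
import math

def growth_bound(n, dvc):
    """Polynomial bound on the growth function: m_H(N) <= N^d_vc + 1."""
    return float(n) ** dvc + 1.0

def vc_epsilon(n, dvc, delta):
    """Error bar of the VC generalization bound (assumed form):
    epsilon = sqrt((8/N) * ln(4 * m_H(2N) / delta))."""
    return math.sqrt(8.0 / n * math.log(4.0 * growth_bound(2 * n, dvc) / delta))

def examples_needed(dvc, delta=0.05, epsilon=0.1):
    """Smallest N for which the bound guarantees the requested epsilon,
    found by doubling and then bisecting."""
    hi = 2
    while vc_epsilon(hi, dvc, delta) > epsilon:
        hi *= 2
    lo = hi // 2
    while lo < hi:
        mid = (lo + hi) // 2
        if vc_epsilon(mid, dvc, delta) <= epsilon:
            hi = mid
        else:
            lo = mid + 1
    return lo

for dvc in (3, 5, 10):
    # The bound demands tens of thousands of examples; the practical
    # rule of thumb (10 * d_vc) is orders of magnitude smaller.
    print(dvc, examples_needed(dvc), 10 * dvc)
```

For d_vc = 3 and modest delta and epsilon, the bound asks for roughly thirty thousand examples, which is exactly the "horrifically pessimistic" constant of proportionality described in the lecture.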
Chapter 2: The bias-variance tradeoff (4:42 to 16:15)

We will take advantage of that when we get to a technique like regularization. So that is the end of the VC analysis, which is the biggest part of the theory here. Today we switch to another approach, the bias-variance tradeoff. It is a stand-alone theory that gives us a different angle on generalization, and it is covered beginning to end during this lecture. The outline is very simple: first the bias and variance themselves, then the learning curves, where the bias-variance analysis is contrasted with the VC analysis, and finally both are applied to the linear regression case we are familiar with.

In the big picture, we have been trying to characterize a tradeoff, and roughly speaking, the tradeoff is between approximation and generalization.
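The approximation-versus-generalization tension can be made concrete with a small stdlib-only sketch. The quadratic target, the noise level, and the two hypothesis sets (a best-constant model versus a memorizing nearest-neighbor lookup) are illustrative assumptions, not the lecture's own example:

```python
import random

random.seed(0)
NOISE = 0.5

def target(x):
    return x * x

def draw(n):
    """n noisy examples (x, y) with y = f(x) + Gaussian noise."""
    pts = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        pts.append((x, target(x) + random.gauss(0, NOISE)))
    return pts

def mse(model, pts):
    return sum((model(x) - y) ** 2 for x, y in pts) / len(pts)

train, test = draw(100), draw(10000)

# Small hypothesis set: constants. The least-squares constant is the mean y.
c = sum(y for _, y in train) / len(train)
h_small = lambda x: c

# Huge hypothesis set: memorize the training set (nearest-neighbor lookup).
h_big = lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

print("small:", mse(h_small, train), mse(h_small, test))  # E_in close to E_out
print("big:  ", mse(h_big, train), mse(h_big, test))      # E_in = 0, E_out larger
```

The small set approximates the target poorly but generalizes well (in-sample and out-of-sample errors agree); the huge set fits the sample perfectly yet does worse out of sample, which is the tradeoff the lecture is about to quantify.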
Having fewer hypotheses gives you a better chance of generalization. One way to look at the other side is this: I am trying to approximate the target function, and you give me a hypothesis set. Even if I have good news for you, that the target function is actually in the hypothesis set, the perfect approximation is under your hand, but not necessarily under your control, because you still have to find it from a finite data set.

The first component of the analysis works as if you had access to the target function: you are stuck with this hypothesis set, and you eagerly look for the hypothesis that best describes the target function. Quantifying how well that best hypothesis performs is your measure of the approximation ability. The other component is exactly what I alluded to: the best hypothesis has a certain approximation ability, but now I have to pick it, using the examples to zoom in on the hypothesis set. Can I zoom in on it, or do I get something that is a poor approximation of the approximation? That decomposition gives us the bias and the variance, and we will be able to put them side by side at the end of the lecture in order to compare.

The key observation is that the final hypothesis depends on a number of things, and among other things it depends on the data set that I give you: if I give you a different data set, you will find a different final hypothesis. That dependency is quite important in the bias-variance analysis, so I now make it explicit with a superscript, writing g^(D) for the hypothesis that comes from the particular data set D. You take g^(D), apply it to x, compare it to f, and this is your error; for it to be a genuinely out-of-sample error, you take the expected value of that error over the entire input space.

Now we would like a decomposition of this quantity into the two conceptual components we saw, approximation and generalization, and we would like to free it from the dependency on the specific data set. So I am going to play the following game. I give you a budget of N training examples to learn from. With that budget I could generate one data set D, and another, and another, each with N examples; each of them results in a different hypothesis g^(D), and each results in a different out-of-sample error. To get rid of the dependency on the particular sample, and just know the behavior when I give you N data points, I integrate D out: I take the expected value of that error with respect to D. This is not a quantity you will encounter in any given situation; in any given situation, you have a specific data set to work with. But if I want to analyze the general behavior, and someone comes to my door saying they have 100 examples that I have not seen yet, it stands to logic that I take an expected value with respect to all possible realizations of 100 examples. So the quantity to decompose is E_D[E_out(g^(D))].
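The "integrate D out" step described above can be approximated numerically. A sketch, where the noiseless quadratic target and the constant-model learner are illustrative assumptions: it averages the out-of-sample error over many independently generated data sets of size N.

```python
import random

random.seed(1)

def target(x):
    return x * x

def draw(n):
    """One data set D: n examples (x, f(x)) with x uniform on [-1, 1]."""
    return [(x, target(x)) for x in (random.uniform(-1, 1) for _ in range(n))]

def learn_constant(data):
    """Least-squares constant hypothesis for a data set D: the mean of the y's."""
    c = sum(y for _, y in data) / len(data)
    return lambda x: c

def e_out(g, n_test=1000):
    """Monte Carlo estimate of the out-of-sample error of one hypothesis g."""
    pts = draw(n_test)
    return sum((g(x) - y) ** 2 for x, y in pts) / n_test

# E_D[E_out]: average the out-of-sample error over many data sets of size N.
N, runs = 5, 1000
avg = sum(e_out(learn_constant(draw(N))) for _ in range(runs)) / runs
print(avg)
```

No single run of this loop corresponds to a situation you would actually face (you always have one specific data set); the average characterizes the general behavior of the learner given a budget of N examples, which is exactly the quantity the decomposition applies to.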
Chapter 3: The average hypothesis and the bias-variance decomposition (16:15 to 30:56)

The main notion needed in order to evaluate this quantity is the notion of an average hypothesis, which is a pretty interesting idea. You have a hypothesis set, and you are learning from a particular data set. I am going to define a particular hypothesis, the average hypothesis g-bar, as the expected value of the learned hypotheses. What does that formally mean? We have x fixed, so g^(D)(x) is really just a random variable, determined by the choice of your data; the data is the randomization source. Think of having one test point in the space that you are interested in: maybe you are playing the stock market, and you are only interested in what is going to happen tomorrow. Then

    g-bar(x) = E_D[ g^(D)(x) ]

Expanding the squared error around g-bar, the cross term involves g-bar(x), which is just a constant with respect to D, so its expected value is itself, and the expression splits into the two quantities we want:

    E_D[ (g^(D)(x) - f(x))^2 ] = E_D[ (g^(D)(x) - g-bar(x))^2 ] + (g-bar(x) - f(x))^2

Now let us look beyond the math and understand what is going on. The quantity on the left tells you how far the hypothesis you got by learning on a particular data set differs from the ultimate thing, the target, and we are decomposing this into two steps. The first step asks how far your hypothesis, obtained from that particular data set, is from the best possible you can get using your hypothesis set. There is a bit of a leap here, because I do not know whether g-bar is the best in the hypothesis set; I got it by averaging. But since it averages over several data sets, it looks like a pretty good hypothesis. I am not even sure it is actually in the hypothesis set: it is the average of hypotheses that came from the hypothesis set, and I can definitely construct a hypothesis set where the average of hypotheses does not necessarily belong there.

Taking the expected value with respect to x, I am just going to call the expectation of the second term the bias, and the expectation of the first term the expected variance, and that is the bias-variance decomposition. So now I have a single number that describes the expected out-of-sample error. Give me a full learning situation, a target function, an input distribution, a hypothesis set, and a learning algorithm; learn on every data set, and take the expected value of the out-of-sample error. If that expected out-of-sample error is 0.3, I can tell you that 0.05 of it is because of bias and 0.25 is because of variance: your hypothesis set is pretty good in approximation, but maybe it is too big, and therefore you have a lot of variance.

Now let us look at the trade-off of generalization versus approximation in terms of this decomposition, as you make your hypothesis set bigger or smaller. If I use a tiny hypothesis set that is far from the target function, the bias is big; but in terms of variance, there is nothing to lose: if I have a single hypothesis, I do not care what data set you give me, I will always give you that same function, so there is no variance. If I have a big hypothesis set, big enough that it actually includes f, then when I learn, on average I will be very close to f. Maybe I will not hit f exactly because of the nonlinearity of the regime: in the regime, I get N examples, learn, and keep it; another N examples, learn and keep it; and then take the average, so I might have lost some because of the nonlinearity. I might not get f, but I will get pretty close, so the bias here is very, very small, close to 0. Here, on the other hand, I have so many varieties that, depending on the examples you give me, I may pick this and another
example, because I'm fitting your data.", 'start': 1794.09, 'duration': 6.743}], 'summary': 'Big hypothesis set includes f, small bias, no variance, close to f with n examples.', 'duration': 48.362, 'max_score': 1752.471, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a81752471.jpg'}, {'end': 1841.422, 'src': 'embed', 'start': 1814.358, 'weight': 3, 'content': [{'end': 1820.162, 'text': 'Now you can see that if I go from a small hypothesis to a bigger hypothesis, the bias goes down, and the variance goes up.', 'start': 1814.358, 'duration': 5.804}, {'end': 1831.129, 'text': "So the idea here if I make the hypothesis set bigger, I am making the bias smaller, because I'm making this bigger,", 'start': 1821.142, 'duration': 9.987}, {'end': 1833.65, 'text': 'getting it closer to f and being able to approximate it better.', 'start': 1831.129, 'duration': 2.521}, {'end': 1834.831, 'text': 'So the bias is diminishing.', 'start': 1833.77, 'duration': 1.061}, {'end': 1841.422, 'text': 'But the bias goes down, and here the variance goes up.', 'start': 1836.4, 'duration': 5.022}], 'summary': 'Increasing hypothesis set reduces bias but increases variance.', 'duration': 27.064, 'max_score': 1814.358, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a81814358.jpg'}], 'start': 975.492, 'title': 'Hypothesis evaluation and bias-variance trade-off', 'summary': 'Covers the concept of average hypothesis evaluation and the bias-variance trade-off in machine learning, with emphasis on evaluating quantities based on expected values and the impact of hypothesis set size on bias and variance.', 'chapters': [{'end': 1082.784, 'start': 975.492, 'title': 'Average hypothesis evaluation', 'summary': 'Discusses the concept of an average hypothesis and evaluating a quantity based on expected values with respect to a fixed point x.', 'duration': 107.292, 'highlights': ['The concept of an 
average hypothesis is introduced, defined as the expected value of hypotheses learned from different data sets, providing insight into evaluating a quantity.', 'Explanation of evaluating the quantity using the notion of expected values and random variables with a fixed point x, illustrating its relevance in practical scenarios such as stock market predictions.']}, {'end': 1856.37, 'start': 1082.784, 'title': 'Bias-variance trade-off', 'summary': 'Discusses the bias-variance trade-off in machine learning, where the bias decreases as the hypothesis set becomes bigger, leading to better approximation, while the variance increases, causing more variety to choose from, impacting the generalization.', 'duration': 773.586, 'highlights': ['The bias decreases as the hypothesis set becomes bigger, getting closer to the target function and being able to approximate it better.', 'The variance increases as the hypothesis set becomes bigger, leading to more variety to choose from and a larger impact on generalization.', '
This summary encapsulates the core concept of the bias-variance trade-off, highlighting the inverse relationship between bias and variance in machine learning models.']}], 'duration': 880.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a8975492.jpg', 'highlights': ['Introducing the concept of an average hypothesis, defined as the expected value of hypotheses learned from different data sets, providing insight into evaluating a quantity.', 'Explanation of evaluating the quantity using the notion of expected values and random variables with a fixed point x, illustrating its relevance in practical scenarios such as stock market predictions.', 'The summary encapsulates the core concept of the bias-variance trade-off, highlighting the inverse relationship between bias and variance in machine learning models.', 'The bias decreases as the hypothesis set becomes bigger, getting closer to the target function and being able to approximate it better.', 'The variance increases as the hypothesis set becomes bigger, leading to more variety to choose from and a larger impact on generalization.']}, {'end': 2163.926, 'segs': [{'end': 1904.639, 'src': 'embed', 'start': 1856.73, 'weight': 4, 'content': [{'end': 1861.114, 'text': "So now let's take a very concrete example, and we will solve it beginning to end.", 'start': 1856.73, 'duration': 4.384}, {'end': 1866.46, 'text': 'And if you understand this example fully, you will have understood bias and variance perfectly.', 'start': 1861.755, 'duration': 4.705}, {'end': 1868.041, 'text': "So let's see.", 'start': 1867.601, 'duration': 0.44}, {'end': 1873.326, 'text': 'I took the simplest possible example that I can get a solution of fully.', 'start': 1868.742, 'duration': 4.584}, {'end': 1876.209, 'text': 'My target is a sinusoid.', 'start': 1874.748, 'duration': 1.461}, {'end': 1882.686, 'text': "That's an easy function, and I just wanted to restrict myself from minus 1, plus 
1.", 'start': 1878.362, 'duration': 4.324}, {'end': 1886.989, 'text': "So I'm going to get sine pi x, just to scale it so that it's from minus 1 to plus 1.", 'start': 1882.686, 'duration': 4.303}, {'end': 1887.83, 'text': 'Gets me the whole action.', 'start': 1886.989, 'duration': 0.841}, {'end': 1894.736, 'text': 'And therefore, the function formally defined, the target function, is from minus 1, plus 1 to the real numbers.', 'start': 1888.57, 'duration': 6.166}, {'end': 1899.379, 'text': 'The codomain is the real numbers, but obviously the function will be restricted from minus 1 to plus 1 as a range.', 'start': 1895.076, 'duration': 4.303}, {'end': 1904.639, 'text': 'Now, the target function is unknown.', 'start': 1901.317, 'duration': 3.322}], 'summary': 'Example of solving a sinusoid function with a restricted range.', 'duration': 47.909, 'max_score': 1856.73, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a81856730.jpg'}, {'end': 1949.783, 'src': 'embed', 'start': 1922.327, 'weight': 5, 'content': [{'end': 1926.989, 'text': 'We are going to get things in terms of it, and then you will understand why the tradeoff exists.', 'start': 1922.327, 'duration': 4.662}, {'end': 1929.61, 'text': 'So the function looks like this.', 'start': 1928.229, 'duration': 1.381}, {'end': 1931.772, 'text': "Surprise It's like a sinusoid.", 'start': 1930.131, 'duration': 1.641}, {'end': 1934.293, 'text': 'Fine Now the catch is the following.', 'start': 1931.932, 'duration': 2.361}, {'end': 1935.434, 'text': "You're going to learn this function.", 'start': 1934.313, 'duration': 1.121}, {'end': 1937.235, 'text': "I'm going to give you a data set.", 'start': 1936.034, 'duration': 1.201}, {'end': 1943.819, 'text': 'How big is the data set? 
I am not in a generous mood today.', 'start': 1937.515, 'duration': 6.304}, {'end': 1946.401, 'text': "So I'm just going to give you two examples.", 'start': 1944.8, 'duration': 1.601}, {'end': 1949.783, 'text': 'And from the two examples, you need to learn the whole target function.', 'start': 1947.261, 'duration': 2.522}], 'summary': 'Learning target function from two examples with small data set', 'duration': 27.456, 'max_score': 1922.327, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a81922327.jpg'}, {'end': 2028.548, 'src': 'embed', 'start': 1999.596, 'weight': 3, 'content': [{'end': 2004.3, 'text': 'Looks good now, having seen the constant already, right? These are your two hypotheses.', 'start': 1999.596, 'duration': 4.704}, {'end': 2008.244, 'text': 'And we would like to see which one is better.', 'start': 2004.421, 'duration': 3.823}, {'end': 2014.674, 'text': "Better for what? That's the key issue.", 'start': 2010.99, 'duration': 3.684}, {'end': 2020.96, 'text': "So let's start to answer the question of approximation first, and then go to the question of learning.", 'start': 2015.775, 'duration': 5.185}, {'end': 2023.243, 'text': "Here's the question of approximation.", 'start': 2022.001, 'duration': 1.242}, {'end': 2025.345, 'text': 'H0 versus H1.', 'start': 2024.063, 'duration': 1.282}, {'end': 2028.548, 'text': "When I talk about approximation, I'm not talking about learning.", 'start': 2026.086, 'duration': 2.462}], 'summary': 'Comparing two hypotheses for approximation and learning.', 'duration': 28.952, 'max_score': 1999.596, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a81999596.jpg'}, {'end': 2103.246, 'src': 'embed', 'start': 2074.13, 'weight': 1, 'content': [{'end': 2075.35, 'text': 'I can solve this.', 'start': 2074.13, 'duration': 1.22}, {'end': 2078.891, 'text': 'I get a line in general, calculate the mean square error.', 
'start': 2075.51, 'duration': 3.381}, {'end': 2082.953, 'text': 'It will be a function of a and b, differentiate with respect to a and b, and get the optimal.', 'start': 2078.931, 'duration': 4.022}, {'end': 2083.572, 'text': "It's not a big deal.", 'start': 2082.973, 'duration': 0.599}, {'end': 2086.656, 'text': 'So you end up with this.', 'start': 2085.514, 'duration': 1.142}, {'end': 2088.397, 'text': "That's your best approximation.", 'start': 2087.096, 'duration': 1.301}, {'end': 2092.719, 'text': 'This is not a learning situation, but this is the best you can do using the linear model.', 'start': 2088.677, 'duration': 4.042}, {'end': 2096.141, 'text': 'Under those conditions, you made errors.', 'start': 2093.699, 'duration': 2.442}, {'end': 2098.063, 'text': 'And these are your errors.', 'start': 2096.302, 'duration': 1.761}, {'end': 2103.246, 'text': "You didn't get it right, and these regions tell you how far you are from the target.", 'start': 2099.624, 'duration': 3.622}], 'summary': 'The linear model gives the best approximation by minimizing mean square error; this is approximation, not a learning situation.', 'duration': 29.116, 'max_score': 2074.13, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82074130.jpg'}, {'end': 2163.926, 'src': 'embed', 'start': 2119.893, 'weight': 0, 'content': [{'end': 2123.795, 'text': "So if I move the 0, the big error will contribute a lot, because it's squared.", 'start': 2119.893, 'duration': 3.902}, {'end': 2125.255, 'text': 'So I just put it in the middle.', 'start': 2123.835, 'duration': 1.42}, {'end': 2128.676, 'text': 'And this is your hypothesis.', 'start': 2125.595, 'duration': 3.081}, {'end': 2131.937, 'text': 'And how much is your error? 
Big.', 'start': 2129.657, 'duration': 2.28}, {'end': 2134.198, 'text': 'The whole thing is your error.', 'start': 2132.117, 'duration': 2.081}, {'end': 2135.519, 'text': "Let's quantify it.", 'start': 2134.758, 'duration': 0.761}, {'end': 2140.4, 'text': 'If you get the expected value of the mean square error, you get a number, which here will be 0.5.', 'start': 2135.579, 'duration': 4.821}, {'end': 2144.368, 'text': 'And here will be approximately 0.2.', 'start': 2140.4, 'duration': 3.968}, {'end': 2145.569, 'text': 'So the linear model wins.', 'start': 2144.368, 'duration': 1.201}, {'end': 2147.551, 'text': "Yeah, I'm approximating.", 'start': 2146.57, 'duration': 0.981}, {'end': 2148.151, 'text': 'I have more to do.', 'start': 2147.571, 'duration': 0.58}, {'end': 2150.594, 'text': 'Sure. If you give me third order, I will be able to do better.', 'start': 2148.191, 'duration': 2.403}, {'end': 2152.936, 'text': "If you give me 17th order, I'll be able to do better.", 'start': 2150.814, 'duration': 2.122}, {'end': 2155.979, 'text': "But that's the game, in terms of approximation.", 'start': 2153.356, 'duration': 2.623}, {'end': 2158.401, 'text': 'The more, the merrier, because you have all the information.', 'start': 2156.359, 'duration': 2.042}, {'end': 2159.462, 'text': 'There is no question of zooming in.', 'start': 2158.421, 'duration': 1.041}, {'end': 2163.926, 'text': "Now let's go for learning.", 'start': 2161.724, 'duration': 2.202}], 'summary': 'In mean square error, the constant model achieves 0.5 while the linear model achieves approximately 0.2, so the linear model wins in approximation.', 'duration': 44.033, 'max_score': 2119.893, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82119893.jpg'}], 'start': 1856.73, 'title': 'Bias-variance tradeoff in model approximation', 'summary': 'Explains bias and variance through a sinusoid prediction example, presents bias-variance tradeoff with a sinusoidal target function and two 
hypothesis sets, and discusses model approximation using constant and linear models, highlighting their quantifiable errors to determine the superior model.', 'chapters': [{'end': 1904.639, 'start': 1856.73, 'title': 'Bias and variance in target function', 'summary': 'Explains bias and variance using a concrete example of predicting a sinusoid function within a range of -1 to +1, highlighting the importance of understanding this example fully for grasping bias and variance concepts.', 'duration': 47.909, 'highlights': ['The target function is a sinusoid within the range of -1 to +1, represented as sine pi x.', 'Understanding the provided example fully is crucial for comprehending bias and variance concepts.']}, {'end': 2028.548, 'start': 1904.859, 'title': 'Bias-variance tradeoff illustration', 'summary': 'Illustrates the bias-variance tradeoff by presenting a target function as a sinusoid, providing a small dataset of two examples for learning, and introducing two hypothesis sets (h0 and h1) - a constant model and a linear model - to determine the better approximation.', 'duration': 123.689, 'highlights': ['The target function is presented as a sinusoid, and a small dataset of two examples is provided for learning.', 'Two hypothesis sets, H0 and H1, are introduced - a constant model and a linear model - to determine the better approximation.', 'The concept of approximation is explored in the context of the bias-variance tradeoff, emphasizing the comparison between H0 and H1 for better understanding.']}, {'end': 2163.926, 'start': 2029.008, 'title': 'Model approximation and learning', 'summary': 'Discusses the process of approximating a sinusoidal target function using constant and linear models, highlighting the mean square error and the quantifiable errors associated with each model, ultimately determining the superiority of the linear model over the constant model.', 'duration': 134.918, 'highlights': ['The linear model yields a mean square error of 
approximately 0.2, while the constant model results in a mean square error of 0.5, demonstrating the superiority of the linear model in approximation.', 'The process involves fitting the target function with a line through calculation of the mean square error and obtaining the optimal parameters, illustrating the practical application of the linear model in approximation.', 'The chapter emphasizes the limitations of using a constant or linear model for approximation, suggesting the potential for improvement with higher-order models, indicating the scope for enhanced approximation with additional information and higher-order models.', 'The transcript explores the concept of approximation and learning, delineating the distinction between approximation based on available information and the potential for further enhancement through learning and access to more data.']}], 'duration': 307.196, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a81856730.jpg', 'highlights': ['The linear model yields a mean square error of approximately 0.2, while the constant model results in a mean square error of 0.5, demonstrating the superiority of the linear model in approximation.', 'The process involves fitting the target function with a line through calculation of the mean square error and obtaining the optimal parameters, illustrating the practical application of the linear model in approximation.', 'The chapter emphasizes the limitations of using a constant or linear model for approximation, suggesting the potential for improvement with higher-order models, indicating the scope for enhanced approximation with additional information and higher-order models.', 'The concept of approximation is explored in the context of the bias-variance tradeoff, emphasizing the comparison between H0 and H1 for better understanding.', 'Understanding the provided example fully is crucial for comprehending bias and variance concepts.', 'The target 
function is presented as a sinusoid, and a small dataset of two examples is provided for learning.', 'The transcript explores the concept of approximation and learning, delineating the distinction between approximation based on available information and the potential for further enhancement through learning and access to more data.', 'Two hypothesis sets, H0 and H1, are introduced - a constant model and a linear model - to determine the better approximation.', 'The target function is a sinusoid within the range of -1 to +1, represented as sine pi x.']}, {'end': 2736.758, 'segs': [{'end': 2288.434, 'src': 'embed', 'start': 2254.045, 'weight': 0, 'content': [{'end': 2256.227, 'text': 'But this depends on which two points I gave you.', 'start': 2254.045, 'duration': 2.182}, {'end': 2259.891, 'text': "If I give you another two points, I'll give you another two points, et cetera.", 'start': 2256.748, 'duration': 3.143}, {'end': 2263.714, 'text': "So I'm not sure how to really compare them, because it does depend on your data set.", 'start': 2259.931, 'duration': 3.783}, {'end': 2266.717, 'text': "That's why we needed the bias-variance analysis.", 'start': 2264.415, 'duration': 2.302}, {'end': 2271.521, 'text': "That's why we got the expected value of the error with respect to the choice of the data set,", 'start': 2267.057, 'duration': 4.464}, {'end': 2280.227, 'text': "so that we actually are talking inherently about a linear model, learning a target using two points, regardless of which two points I'm talking about.", 'start': 2271.521, 'duration': 8.706}, {'end': 2288.434, 'text': "So let's do the bias and variance decomposition for the constant guy.", 'start': 2282.169, 'duration': 6.265}], 'summary': 'Bias-variance analysis is needed to compare linear model learning with two points, regardless of the specific points used.', 'duration': 34.389, 'max_score': 2254.045, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82254045.jpg'}, {'end': 2448.812, 'src': 'heatmap', 'start': 2395.396, 'weight': 0.73, 'content': [{'end': 2399.298, 'text': 'Because this game of getting one at a time and then getting the average does get me somewhere.', 'start': 2395.396, 'duration': 3.902}, {'end': 2402.72, 'text': 'But do remember, this is not the output of your learning process.', 'start': 2399.778, 'duration': 2.942}, {'end': 2403.42, 'text': 'I wish it were.', 'start': 2402.82, 'duration': 0.6}, {'end': 2404.28, 'text': "It isn't.", 'start': 2403.92, 'duration': 0.36}, {'end': 2407.762, 'text': "The output of your learning process is one of those guys, and you don't know which.", 'start': 2404.941, 'duration': 2.821}, {'end': 2410.664, 'text': 'This just happens that if you repeat it, this will be your average.', 'start': 2408.363, 'duration': 2.301}, {'end': 2415.607, 'text': 'And because you are getting different guys here, there will be a variance around this.', 'start': 2411.264, 'duration': 4.343}, {'end': 2419.649, 'text': "And the variance, I'm describing it basically by the standard deviation you're going to get.", 'start': 2415.767, 'duration': 3.882}, {'end': 2429.174, 'text': 'So the error between the green line and the target function will give you the bias, and the width of the gray region will give you the variance.', 'start': 2420.709, 'duration': 8.465}, {'end': 2435.568, 'text': 'Understood what the analysis is? 
So that takes care of H0.', 'start': 2430.427, 'duration': 5.141}, {'end': 2436.529, 'text': "Let's go to H1.", 'start': 2435.628, 'duration': 0.901}, {'end': 2440.35, 'text': 'So to remember, the learning situation for H0 was this.', 'start': 2437.029, 'duration': 3.321}, {'end': 2442.17, 'text': 'This is when I had the constant model.', 'start': 2440.47, 'duration': 1.7}, {'end': 2448.812, 'text': 'What will happen if you are actually fitting the two points not with a constant, which you do at the midpoint,', 'start': 2443.031, 'duration': 5.781}], 'summary': 'Learning process output: variance described by standard deviation, bias by error.', 'duration': 53.416, 'max_score': 2395.396, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82395396.jpg'}], 'start': 2164.147, 'title': 'Bias-variance analysis in machine learning and its trade-off', 'summary': 'Explains bias-variance analysis in machine learning, using an example of learning a target function with two points, and discusses the trade-off by analyzing bias and variance components for different models, impacting approximation and out-of-sample error.', 'chapters': [{'end': 2339.729, 'start': 2164.147, 'title': 'Bias-variance analysis in machine learning', 'summary': 'Explains the concept of bias-variance analysis in machine learning, using the example of learning a target function using two points and evaluating the expected out-of-sample error.', 'duration': 175.582, 'highlights': ['The chapter emphasizes the importance of bias-variance analysis in machine learning, particularly in learning a target function using two points, to evaluate the expected out-of-sample error.', 'It discusses the process of fitting a line to two examples and evaluating the out-of-sample error for different hypothesis sets, illustrating the impact of the choice of data set on the error.', 'The chapter presents a simulation exercise demonstrating the distribution of hypotheses 
obtained when fitting a line to two points, highlighting the variability in out-of-sample error and the significance of the expected out-of-sample error.', 'It explains the concept of bias-variance decomposition for the constant hypothesis, using a figure to illustrate the generation of data sets and the resulting distribution of hypotheses, emphasizing the impact on the expected out-of-sample error.']}, {'end': 2736.758, 'start': 2340.83, 'title': 'Bias-variance trade-off in learning', 'summary': 'Discusses the bias-variance trade-off in learning by analyzing the bias and variance components for different models, highlighting the impact on approximation and out-of-sample error.', 'duration': 395.928, 'highlights': ['The trade-off between bias and variance is demonstrated by comparing the bias and variance components for different models, showcasing the impact on approximation and out-of-sample error.', 'Comparison of the bias for different models shows that a model with smaller bias may have larger variance, leading to a trade-off between bias and variance.', '
Quantitative comparison of bias and variance for different models and its impact on out-of-sample error, providing insights into model selection based on the trade-off.']}], 'duration': 572.611, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82164147.jpg', 'highlights': ['The chapter emphasizes the importance of bias-variance analysis in machine learning, particularly in learning a target function using two points, to evaluate the expected out-of-sample error.', 'The trade-off between bias and variance is demonstrated by comparing the bias and variance components for different models, showcasing the impact on approximation and out-of-sample error.', 'Comparison of the bias for different models shows that a model with smaller bias may have larger variance, leading to a trade-off between bias and variance.', 'Quantitative comparison of bias and variance for different models reveals the impact on out-of-sample error, providing insights into the selection of models based on the bias-variance trade-off.']}, {'end': 3265.504, 'segs': [{'end': 2773.777, 'src': 'embed', 'start': 2736.778, 'weight': 0, 'content': [{'end': 2738.779, 'text': 'The letter of recommendation is there.', 'start': 2736.778, 'duration': 2.001}, {'end': 2740.36, 'text': "It's much easier when I find it.", 'start': 2739.139, 'duration': 1.221}, {'end': 2743.18, 'text': 'However, finding it is a big deal.', 'start': 2740.7, 'duration': 2.48}, {'end': 2746.181, 'text': 'So the question is not that the target function is there.', 'start': 2744.021, 'duration': 2.16}, {'end': 2754.661, 'text': 'The question is, can I find it? 
Therefore, when I give you 100 examples, you choose the hypothesis set to match the 100 examples.', 'start': 2746.221, 'duration': 8.44}, {'end': 2760.726, 'text': "If the 100 examples are terribly noisy, that's even worse, because their information to guide you is worse.", 'start': 2755.061, 'duration': 5.665}, {'end': 2763.828, 'text': "So that's what I mean by the data resources you have.", 'start': 2761.086, 'duration': 2.742}, {'end': 2772.235, 'text': "The data resources you have is, what do you have in order to navigate the hypothesis set? Let's pick a hypothesis set that we can afford to navigate.", 'start': 2764.209, 'duration': 8.026}, {'end': 2773.777, 'text': 'That is the game in learning.', 'start': 2772.716, 'duration': 1.061}], 'summary': 'In learning, data resources guide the choice of hypothesis set based on examples, with noisy data posing a challenge.', 'duration': 36.999, 'max_score': 2736.778, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82736778.jpg'}, {'end': 2879.894, 'src': 'heatmap', 'start': 2811.628, 'weight': 3, 'content': [{'end': 2813.99, 'text': 'We have seen that already in the bias-variance decomposition.', 'start': 2811.628, 'duration': 2.362}, {'end': 2815.072, 'text': 'And this is the quantity.', 'start': 2814.191, 'duration': 0.881}, {'end': 2818.735, 'text': 'I know this is the quantity that I will get in any learning situation.', 'start': 2815.512, 'duration': 3.223}, {'end': 2819.957, 'text': 'It depends on the data set.', 'start': 2818.835, 'duration': 1.122}, {'end': 2826.063, 'text': 'If I want a quantity that describes just the size of the set, I will integrate this out and get the expected value with respect to d.', 'start': 2820.357, 'duration': 5.706}, {'end': 2826.964, 'text': "That's the quantity I have.", 'start': 2826.063, 'duration': 0.901}, {'end': 2829.966, 'text': "And the other one is exactly the same, except it's in-sample.", 'start': 2827.684, 
'duration': 2.282}, {'end': 2831.807, 'text': "We didn't use it in the bias-variance analysis.", 'start': 2830.066, 'duration': 1.741}, {'end': 2834.409, 'text': "This one, I'm going to get the expected value of the in-sample.", 'start': 2832.187, 'duration': 2.222}, {'end': 2839.212, 'text': 'So I want to get, given this situation, if I give you any examples, how well are you going to fit them?', 'start': 2834.429, 'duration': 4.783}, {'end': 2843.195, 'text': 'Well, it depends on the examples, but on average, this is how well you are going to fit them.', 'start': 2839.252, 'duration': 3.943}, {'end': 2847.938, 'text': 'And you ask yourself, how do these vary with n? And here comes the learning curve.', 'start': 2843.755, 'duration': 4.183}, {'end': 2849.819, 'text': 'As you get more examples, you learn better.', 'start': 2847.978, 'duration': 1.841}, {'end': 2853.341, 'text': "So hopefully, the learning curve will tell, and we'll see what the learning curve looks like.", 'start': 2850.219, 'duration': 3.122}, {'end': 2857.104, 'text': "So let's take a simple model first.", 'start': 2855.263, 'duration': 1.841}, {'end': 2861.009, 'text': "So it's a simple model.", 'start': 2860.008, 'duration': 1.001}, {'end': 2865.476, 'text': "And because it's a simple model, it does not approximate your target function well.", 'start': 2861.57, 'duration': 3.906}, {'end': 2869.261, 'text': 'The best out-of-sample error you can do is pretty high.', 'start': 2865.976, 'duration': 3.285}, {'end': 2876.573, 'text': 'When you learn, the in-sample will be very close to the out-of-sample.', 'start': 2872.532, 'duration': 4.041}, {'end': 2879.894, 'text': "So let's look at first the behavior as you increase n.", 'start': 2877.033, 'duration': 2.861}], 'summary': 'The quantity in learning situations depends on the dataset and varies with n, as shown by the learning curve.', 'duration': 68.266, 'max_score': 2811.628, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82811628.jpg'}, {'end': 2959.274, 'src': 'embed', 'start': 2930.354, 'weight': 4, 'content': [{'end': 2933.716, 'text': "And as you can see, although I am getting worse in-sample, I'm getting better out-of-sample.", 'start': 2930.354, 'duration': 3.362}, {'end': 2939.779, 'text': 'And indeed, the discrepancy between them, which is a generalization error, is getting tighter and tighter as n increases.', 'start': 2934.076, 'duration': 5.703}, {'end': 2941.18, 'text': 'Completely logical.', 'start': 2940.359, 'duration': 0.821}, {'end': 2943.601, 'text': 'By the way, this is a real model.', 'start': 2941.76, 'duration': 1.841}, {'end': 2947.903, 'text': 'So when we talk about overfitting, I will tell you what that model is, the simple model and the complex model.', 'start': 2943.821, 'duration': 4.082}, {'end': 2951.907, 'text': "The complex model, exactly the same behavior, except it's shifted.", 'start': 2949.264, 'duration': 2.643}, {'end': 2956.731, 'text': "It's a complex model, so it has a better approximation for your target function.", 'start': 2951.967, 'duration': 4.764}, {'end': 2959.274, 'text': 'So it can achieve, in principle, a better out-of-sample error.', 'start': 2956.751, 'duration': 2.523}], 'summary': "Model's out-of-sample performance improves with increasing n, reducing generalization error", 'duration': 28.92, 'max_score': 2930.354, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82930354.jpg'}, {'end': 3048.344, 'src': 'embed', 'start': 3022.601, 'weight': 5, 'content': [{'end': 3030.828, 'text': 'And the reason I introduced it here is that I want to illustrate the bias and variance analysis versus the VC analysis using the learning curves.', 'start': 3022.601, 'duration': 8.227}, {'end': 3034.651, 'text': 'It will be very illustrative to understand how the two theories relate to each other.',
'start': 3030.928, 'duration': 3.723}, {'end': 3042.838, 'text': "So let's start with the VC analysis on learning curves.", 'start': 3037.073, 'duration': 5.765}, {'end': 3044.62, 'text': 'These are learning curves.', 'start': 3043.639, 'duration': 0.981}, {'end': 3046.902, 'text': 'The in-sample error goes up, as promised.', 'start': 3045.281, 'duration': 1.621}, {'end': 3048.344, 'text': 'The out-of-sample error goes down.', 'start': 3046.922, 'duration': 1.422}], 'summary': 'Illustrating bias and variance analysis vs VC analysis using learning curves, showing in-sample error increase and out-of-sample error decrease.', 'duration': 25.743, 'max_score': 3022.601, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83022601.jpg'}], 'start': 2736.778, 'title': 'Learning function and analysis', 'summary': 'Discusses the challenge of finding the target function in learning and emphasizes the importance of data resources in navigating the hypothesis set, and the impact of noisy examples on guiding the learning process.
It also introduces learning curves as a tool to analyze the expected in-sample and out-of-sample errors as a function of n, illustrating how the bias and variance analysis versus the VC analysis relate to each other, and highlighting the behavior of simple and complex models with increasing n.', 'chapters': [{'end': 2773.777, 'start': 2736.778, 'title': 'Finding target function in learning', 'summary': 'Discusses the challenge of finding the target function in learning, emphasizing the importance of data resources in navigating the hypothesis set, and the impact of noisy examples on guiding the learning process.', 'duration': 36.999, 'highlights': ['The importance of data resources in navigating the hypothesis set is emphasized, with a focus on the challenge of finding the target function in learning.', 'The impact of noisy examples on guiding the learning process is highlighted, indicating that terribly noisy examples can worsen the information available for guidance.', 'The difficulty of finding the target function is discussed, emphasizing the significance of this challenge in the learning process.']}, {'end': 3265.504, 'start': 2775.158, 'title': 'Learning curve and bias-variance analysis', 'summary': 'Introduces learning curves as a tool to analyze the expected in-sample and out-of-sample errors as a function of n, illustrating how the bias and variance analysis versus the VC analysis relate to each other, highlighting the behavior of simple and complex models with increasing n.', 'duration': 490.346, 'highlights': ['Learning curves illustrate the expected in-sample and out-of-sample errors as a function of n, showing the behavior of simple and complex models with increasing n.
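The learning-curve behavior summarized here can be simulated. The lecture does not disclose its simple and complex models at this point, so the sketch below assumes stand-ins: degree-1 versus degree-10 polynomial fits to a noisy sinusoid; the noise level, degrees, and sample sizes are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)
sigma = 0.2                        # noise level (illustrative)
x_test = np.linspace(-1, 1, 200)

def expected_errors(deg, n, trials=1000):
    """Average E_in and E_out of a degree-`deg` polynomial fit over many data sets."""
    e_in = e_out = 0.0
    for _ in range(trials):
        x = rng.uniform(-1, 1, n)
        y = f(x) + sigma * rng.standard_normal(n)
        w = np.polyfit(x, y, deg)
        e_in += np.mean((np.polyval(w, x) - y) ** 2)
        # the target is noisy, so sigma^2 is an unavoidable part of E_out
        e_out += np.mean((np.polyval(w, x_test) - f(x_test)) ** 2) + sigma ** 2
    return e_in / trials, e_out / trials

curves = {(deg, n): expected_errors(deg, n) for deg in (1, 10) for n in (20, 200)}
for (deg, n), (e_in, e_out) in sorted(curves.items()):
    print(f"deg={deg:2d} n={n:3d}  E_in={e_in:.3f}  E_out={e_out:.3f}")
```

The output shows the pattern the summary describes: for the simple model the generalization gap tightens as n grows, and the complex model's out-of-sample error starts out worse at small n but ends up better at large n.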
Learning curves plot the expected value of in-sample and out-of-sample errors as a function of n, demonstrating the behavior of simple and complex models as the sample size increases.', 'As the sample size increases, the out-of-sample error decreases, while the in-sample error may increase for a complex model, leading to a tighter generalization error as n increases.', 'The bias-variance analysis versus the VC analysis is illustrated using the learning curves to understand their relationship and behavior with increasing n.']}], 'duration': 528.726, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a82736778.jpg', 'highlights': ['The importance of data resources in navigating the hypothesis set is emphasized, with a focus on the challenge of finding the target function in learning.', 'The impact of noisy examples on guiding the learning process is highlighted, indicating that terribly noisy examples can worsen the information available for guidance.', 'The difficulty of finding the target function is discussed, emphasizing the significance of this challenge in the learning process.', 'Learning curves illustrate the expected in-sample and out-of-sample errors as a function of n, showing the behavior of simple and complex models with increasing n.', 'As the sample size increases, the out-of-sample error decreases, while the in-sample error may increase for a complex model, leading to a tighter generalization error as n increases.', 'The bias-variance analysis versus the VC analysis is illustrated using the learning curves to understand their relationship and behavior with
increasing n.']}, {'end': 3600.135, 'segs': [{'end': 3352.49, 'src': 'embed', 'start': 3285.873, 'weight': 2, 'content': [{'end': 3290.977, 'text': 'And if you read the exercise and you follow the steps, it will give you a very good insight into the linear regression.', 'start': 3285.873, 'duration': 5.104}, {'end': 3293.039, 'text': "I'll try to explain the highlights of it.", 'start': 3291.277, 'duration': 1.762}, {'end': 3296.722, 'text': "So let's start with a reminder, the linear regression.", 'start': 3293.679, 'duration': 3.043}, {'end': 3298.684, 'text': "So linear regression, I'm using a target.", 'start': 3297.122, 'duration': 1.562}, {'end': 3307.251, 'text': 'For the purpose of simplification, I am going to use a noisy target, which is linear plus noise.', 'start': 3299.284, 'duration': 7.967}, {'end': 3311.435, 'text': "So I'm using linear regression to learn something linear plus noise.", 'start': 3308.352, 'duration': 3.083}, {'end': 3313.517, 'text': "If it weren't for the noise, I would get it perfectly.", 'start': 3311.495, 'duration': 2.022}, {'end': 3314.458, 'text': "It's already linear.", 'start': 3313.717, 'duration': 0.741}, {'end': 3317.281, 'text': 'But because of the noise, I will be deviating a little bit.', 'start': 3314.898, 'duration': 2.383}, {'end': 3321.945, 'text': 'This is just to make the mathematics that results easier to handle.', 'start': 3318.001, 'duration': 3.944}, {'end': 3327.288, 'text': 'Now, you gave him the data set, and the data set is a noisy data set.', 'start': 3323.426, 'duration': 3.862}, {'end': 3329.569, 'text': 'So each of these is picked independently.', 'start': 3327.308, 'duration': 2.261}, {'end': 3334.751, 'text': 'And this y depends on x, and the only unknown here is the noise.', 'start': 3330.029, 'duration': 4.722}, {'end': 3338.373, 'text': 'So you get this value to give you the average, and then you add a noise to get the y.', 'start': 3334.971, 'duration': 3.402}, {'end': 3342.866, 'text': 'Now, 
do you remember the linear regression solution?', 'start': 3340.985, 'duration': 1.881}, {'end': 3347.948, 'text': 'Regardless of what the target function is, you look at the data and this is what you get for the solution.', 'start': 3343.486, 'duration': 4.462}, {'end': 3352.49, 'text': 'You take the input data set and the output data set.', 'start': 3348.869, 'duration': 3.621}], 'summary': 'Explanation of linear regression with noisy data sets and its impact on the solution.', 'duration': 66.617, 'max_score': 3285.873, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83285873.jpg'}, {'end': 3434.824, 'src': 'embed', 'start': 3393.03, 'weight': 1, 'content': [{'end': 3394.811, 'text': 'And that would be an error pattern.', 'start': 3393.03, 'duration': 1.781}, {'end': 3398.192, 'text': 'So it would be plus something, minus something, plus something, minus something.', 'start': 3395.031, 'duration': 3.161}, {'end': 3403.754, 'text': 'And if I add the squared values over here, get the average of those, I will get what we call the in-sample error.', 'start': 3398.292, 'duration': 5.462}, {'end': 3411.896, 'text': "For the out-of-sample error, I'm going to play a simplifying trick here, in order to get the learning curve in the finite case.", 'start': 3405.194, 'duration': 6.702}, {'end': 3418.036, 'text': "Here. 
I'm going to consider that in order to get the out-of-sample error what I'm going to do?", 'start': 3412.694, 'duration': 5.342}, {'end': 3424.019, 'text': "I'm going to just generate the same inputs, which is a complete no-no in out-of-sample.", 'start': 3418.036, 'duration': 5.983}, {'end': 3426.42, 'text': "Supposedly, out-of-sample, you get points that you haven't seen before.", 'start': 3424.039, 'duration': 2.381}, {'end': 3428.561, 'text': "But you have seen these x's before.", 'start': 3427.14, 'duration': 1.421}, {'end': 3433.203, 'text': "But the redeeming value is that I'm going to give you fresh noise.", 'start': 3429.321, 'duration': 3.882}, {'end': 3434.824, 'text': "So that's the unknown.", 'start': 3433.884, 'duration': 0.94}], 'summary': 'Analyzing error patterns and learning curves for in-sample and out-of-sample errors.', 'duration': 41.794, 'max_score': 3393.03, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83393030.jpg'}, {'end': 3567.208, 'src': 'embed', 'start': 3539.688, 'weight': 0, 'content': [{'end': 3542.971, 'text': 'And there is a very specific formula that you can get, which is interesting.', 'start': 3539.688, 'duration': 3.283}, {'end': 3544.772, 'text': 'So let me finish with this.', 'start': 3543.011, 'duration': 1.761}, {'end': 3547.234, 'text': 'The best approximation error is sigma squared.', 'start': 3544.792, 'duration': 2.442}, {'end': 3548.295, 'text': "That's the line.", 'start': 3547.334, 'duration': 0.961}, {'end': 3553.779, 'text': 'What is the expected in-sample error? 
It has a very simple formula.', 'start': 3550.476, 'duration': 3.303}, {'end': 3557.598, 'text': 'Everything is scaled by sigma squared.', 'start': 3556.217, 'duration': 1.381}, {'end': 3560.882, 'text': "So what you have here is, it's almost perfect.", 'start': 3558.179, 'duration': 2.703}, {'end': 3567.208, 'text': 'And you are doing better than perfect by this amount, the ratio of d plus 1.', 'start': 3561.782, 'duration': 5.426}], 'summary': 'The best approximation error is sigma squared, with a simple formula for expected in-sample error scaled by sigma squared, achieving almost perfect results and surpassing perfection by a ratio of d plus 1.', 'duration': 27.52, 'max_score': 3539.688, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83539688.jpg'}], 'start': 3266.164, 'title': 'Linear regression analysis', 'summary': 'Contrasts theoretical approaches in linear regression, emphasizing the analysis for the linear regression case and the importance of understanding noisy data sets. 
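The closed forms quoted here — expected in-sample error sigma^2 * (1 - (d+1)/N) and, with the lecture's same-inputs-fresh-noise trick, expected out-of-sample error sigma^2 * (1 + (d+1)/N) — can be checked by simulation. A sketch, assuming a Gaussian-input linear target; the particular d, N, and sigma are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, sigma = 50, 5, 0.5           # illustrative sizes
trials = 2000
w_true = rng.standard_normal(d + 1)

e_in = e_out = 0.0
for _ in range(trials):
    X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
    y = X @ w_true + sigma * rng.standard_normal(N)
    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e_in += np.mean((X @ w_hat - y) ** 2)
    # "same inputs, fresh noise" trick for the out-of-sample error
    y_fresh = X @ w_true + sigma * rng.standard_normal(N)
    e_out += np.mean((X @ w_hat - y_fresh) ** 2)
e_in /= trials
e_out /= trials

print(e_in, sigma ** 2 * (1 - (d + 1) / N))    # the two numbers agree
print(e_out, sigma ** 2 * (1 + (d + 1) / N))   # and so do these
```

Subtracting the two formulas gives the expected generalization error 2 * sigma^2 * (d+1)/N, the "degrees of freedom over number of examples" rule discussed later in the lecture.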
It also discusses in-sample and out-of-sample errors, illustrating how the expected in-sample error increases as the number of examples increases, while the out-of-sample error decreases, and the expected in-sample error has a simple formula scaled by sigma squared.', 'chapters': [{'end': 3352.49, 'start': 3266.164, 'title': 'Theoretical approaches in linear regression', 'summary': 'Discusses the contrast between theoretical approaches in linear regression, emphasizing the analysis for the linear regression case and the importance of understanding noisy data sets.', 'duration': 86.326, 'highlights': ['The chapter emphasizes the analysis for the linear regression case, providing insight into handling noisy data sets and understanding the impact of noise on linear regression.', 'Linear regression is used to learn something linear plus noise, and the impact of noise is highlighted as it causes deviation in the results.', 'The solution for linear regression is based on the input and output data sets, regardless of the target function, providing a universal approach to the problem.']}, {'end': 3600.135, 'start': 3353.03, 'title': 'Linear regression learning curves', 'summary': 'Discusses the concept of in-sample and out-of-sample errors in linear regression, illustrating how the expected in-sample error increases as the number of examples increases, while the out-of-sample error decreases, and the expected in-sample error has a simple formula scaled by sigma squared.', 'duration': 247.105, 'highlights': ['The in-sample error pattern is analyzed by comparing the predicted values from the final hypothesis with the actual targets, resulting in an error pattern that is quantified by averaging the squared values, giving the in-sample error.', 'The out-of-sample error is simplified by generating new points with fresh noise on the same inputs, leading to a very specific learning curve with values dependent on the variance of the noise and the number of
examples, where the expected in-sample error is almost perfect and even better than perfect due to fitting the finite sample rather than the whole function.', 'The linear regression learning curve shows that the expected in-sample error increases as the number of examples increases, while the out-of-sample error decreases, and the expected in-sample error has a simple formula scaled by sigma squared, demonstrating the impact of fitting the noise and the finite sample on the learning process.']}], 'duration': 333.971, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83266164.jpg', 'highlights': ['The linear regression learning curve shows that the expected in-sample error increases as the number of examples increases, while the out-of-sample error decreases, and the expected in-sample error has a simple formula scaled by sigma squared, demonstrating the impact of fitting the noise and the finite sample on the learning process.', 'The in-sample error pattern is analyzed by comparing the predicted values from the final hypothesis with the actual targets, resulting in an error pattern that is quantified by averaging the squared values, giving the in-sample error.', 'The chapter emphasizes the analysis for the linear regression case, providing insight into handling noisy data sets and understanding the impact of noise on linear regression.', 'The out-of-sample error is simplified by generating new points with fresh noise on the same inputs, leading to a very specific learning curve with values dependent on the variance of the noise and the number of examples, where the expected in-sample error is almost perfect and even better than perfect due to fitting the finite sample rather than the whole function.', 'Linear regression is used to learn something linear plus noise, and the impact of noise is highlighted as it causes deviation in the results.', 'The solution for linear regression is based on
the input and output data sets, regardless of the target function, providing a universal approach to the problem.']}, {'end': 3998.718, 'segs': [{'end': 3684.629, 'src': 'embed', 'start': 3600.475, 'weight': 0, 'content': [{'end': 3604.719, 'text': "And as a result of that, I'm deviating from the optimal guy, and I'm paying the price in out-of-sample error.", 'start': 3600.475, 'duration': 4.244}, {'end': 3610.401, 'text': "And what is the price I'm paying in out-of-sample error? It is the mirror image.", 'start': 3606.279, 'duration': 4.122}, {'end': 3614.002, 'text': 'I lose exactly in out-of-sample what I gained in sample.', 'start': 3611.561, 'duration': 2.441}, {'end': 3618.223, 'text': 'And the most interesting quantity is the summary quantity.', 'start': 3616.162, 'duration': 2.061}, {'end': 3622.124, 'text': 'What is the expected generalization error? The generalization error is the difference between this and that.', 'start': 3618.243, 'duration': 3.881}, {'end': 3623.124, 'text': 'I have the formula for them.', 'start': 3622.204, 'duration': 0.92}, {'end': 3625.265, 'text': 'So all I need to do is write this.', 'start': 3623.345, 'duration': 1.92}, {'end': 3627.866, 'text': 'So let me magnify this.', 'start': 3626.105, 'duration': 1.761}, {'end': 3630.656, 'text': 'This is the generalization error.', 'start': 3629.335, 'duration': 1.321}, {'end': 3637.141, 'text': 'It has the form of the VC dimension divided by the number of examples.', 'start': 3631.437, 'duration': 5.704}, {'end': 3639.482, 'text': "In this case, it's exact.", 'start': 3637.461, 'duration': 2.021}, {'end': 3641.824, 'text': 'And this is what I promised last time.', 'start': 3640.263, 'duration': 1.561}, {'end': 3651.591, 'text': 'I told you that this rule of proportionality between a VC dimension and a number of examples persists to the level where sometimes you just divide the VC dimension by the number of examples,', 'start': 3642.365, 'duration': 9.226}, {'end': 3653.132, 'text': 
'and that will give you a generalization error.', 'start': 3651.591, 'duration': 1.541}, {'end': 3658.496, 'text': 'This is the concrete version of it, in spite of the fact that here, this is not a VC dimension.', 'start': 3654.073, 'duration': 4.423}, {'end': 3659.396, 'text': 'This is real-valued.', 'start': 3658.536, 'duration': 0.86}, {'end': 3661.637, 'text': "But it's degrees of freedom, so it plays the role.", 'start': 3659.736, 'duration': 1.901}, {'end': 3671.423, 'text': 'We could actually solve for it and realize that this is indeed the compromise between the degrees of freedom I have in the case of linear regression and the number of examples I am using.', 'start': 3662.058, 'duration': 9.365}, {'end': 3677.226, 'text': "So we'll stop here, and we will go into questions and answers after a short break.", 'start': 3672.583, 'duration': 4.643}, {'end': 3684.629, 'text': "OK, so let's go into the questions.", 'start': 3681.886, 'duration': 2.743}], 'summary': 'Deviation from optimal guy leads to paying the price in out-of-sample error, with the generalization error being the difference between vc dimension and number of examples.', 'duration': 84.154, 'max_score': 3600.475, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83600475.jpg'}, {'end': 3785.176, 'src': 'embed', 'start': 3760.129, 'weight': 4, 'content': [{'end': 3765.293, 'text': 'And again, the approximation for the simple model is worse than the approximation for the complex model.', 'start': 3760.129, 'duration': 5.164}, {'end': 3770.997, 'text': "So if your game is approximation, and that's your purpose, then obviously the complex model is better.", 'start': 3765.993, 'duration': 5.004}, {'end': 3776.967, 'text': 'In this particular case, you can also ask yourself about the generalization ability.', 'start': 3772.302, 'duration': 4.665}, {'end': 3783.634, 'text': 'The generalization ability will be the discrepancy between either the blue 
and red curve.', 'start': 3777.067, 'duration': 6.567}, {'end': 3785.176, 'text': 'That would be the VC analysis.', 'start': 3783.814, 'duration': 1.362}], 'summary': 'Complex model has better approximation than simple model when approximation is the goal.', 'duration': 25.047, 'max_score': 3760.129, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83760129.jpg'}, {'end': 3931.012, 'src': 'embed', 'start': 3906.762, 'weight': 5, 'content': [{'end': 3914.049, 'text': 'But in choosing this model or another, what really dictates the performance is the number of examples versus the complexity of the model.', 'start': 3906.762, 'duration': 7.287}, {'end': 3923.766, 'text': 'OK. When you did the analysis for linear regression,', 'start': 3917.612, 'duration': 6.154}, {'end': 3928.95, 'text': 'if you did it using the perceptron model, would you get the same generalization error?', 'start': 3923.766, 'duration': 5.184}, {'end': 3931.012, 'text': "Let's go for that.", 'start': 3929.551, 'duration': 1.461}], 'summary': 'Performance depends on examples vs. model complexity. Linear regression vs.
perceptron model comparison.', 'duration': 24.25, 'max_score': 3906.762, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83906762.jpg'}], 'start': 3600.475, 'title': 'Generalization error and model complexity', 'summary': 'Discusses the relationship between VC dimension and number of examples, leading to a formula for generalization error, and the trade-off between simple and complex models in machine learning, emphasizing the impact on performance.', 'chapters': [{'end': 3659.396, 'start': 3600.475, 'title': 'Generalization error and VC dimension', 'summary': 'Discusses the concept of generalization error in machine learning, specifically the relationship between the VC dimension and the number of examples, resulting in a formula for generalization error as the VC dimension divided by the number of examples, and the observation that the rule of proportionality between VC dimension and number of examples persists even in cases involving real-valued dimensions.', 'duration': 58.921, 'highlights': ['The generalization error is the difference between in-sample and out-of-sample error, and it has the form of the VC dimension divided by the number of examples.', 'The relationship between VC dimension and number of examples persists, sometimes resulting in a generalization error obtained by dividing the VC dimension by the number of examples.', 'Deviation from the optimal model leads to a price in out-of-sample error, with the loss in out-of-sample error being exactly what was gained in sample.']}, {'end': 3998.718, 'start': 3659.736, 'title': 'Trade-off between complexity and performance', 'summary': 'Discusses the trade-off between using simple models and complex models, emphasizing the compromise between the degrees of freedom and the number of examples, and how the performance is dictated by the number of examples versus the complexity of the model.', 'duration': 338.982, 'highlights': ['The compromise between
degrees of freedom and number of examples: The discussion emphasizes the compromise between the degrees of freedom in linear regression and the number of examples, indicating that the performance of the model is dictated by this trade-off.', 'Trade-off between simple and complex models: The comparison between simple and complex models indicates a trade-off in terms of the ability to approximate and the generalization ability, with the performance of the system being
influenced by the number of examples versus the complexity of the model.', "Effect of number of examples on model performance: The examples demonstrate that matching the complexity of the model to the data resources, represented by the number of examples, significantly impacts the model's performance, with a larger number of points leading to better performance."]}, {'end': 4595.523, 'segs': [{'end': 4028.064, 'src': 'embed', 'start': 3998.718, 'weight': 0, 'content': [{'end': 4001.941, 'text': "When it's the same, they cancel out neatly, and you get the formula that I had.", 'start': 3998.718, 'duration': 3.223}, {'end': 4009.128, 'text': 'But asymptotically, if you make certain assumptions about how X is generated, and you take the asymptotic result, you will get the same thing.', 'start': 4002.341, 'duration': 6.787}, {'end': 4011.109, 'text': 'The short answer is the following.', 'start': 4010.028, 'duration': 1.081}, {'end': 4018.135, 'text': 'The analysis in the exact form that I gave, which gives me these very neat results, is very specific to linear regression,', 'start': 4011.41, 'duration': 6.725}, {'end': 4023.7, 'text': 'very specific to the choice of out-of-sample, as I did it, if you want to give the answer exactly in a finite case.', 'start': 4018.135, 'duration': 5.565}, {'end': 4028.064, 'text': 'If you use the perceptron, you would be able to find a parallel, but it may not be as neat.', 'start': 4024.121, 'duration': 3.943}], 'summary': 'The analysis is specific to linear regression, providing neat results, but may not be as neat for other cases.', 'duration': 29.346, 'max_score': 3998.718, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83998718.jpg'}, {'end': 4067.993, 'src': 'embed', 'start': 4039.831, 'weight': 2, 'content': [{'end': 4042.493, 'text': 'The lecture is called bias-variance, and now we have variance of the noise.', 'start': 4039.831, 'duration': 2.662}, {'end': 4045.556,
'text': "So obviously, I'm so used to these things that I didn't notice.", 'start': 4042.513, 'duration': 3.043}, {'end': 4051.441, 'text': 'When I say the variance here, this has absolutely nothing to do with the bias-variance analysis that I talked about.', 'start': 4046.216, 'duration': 5.225}, {'end': 4052.342, 'text': "It's a noise.", 'start': 4051.701, 'duration': 0.641}, {'end': 4054.443, 'text': "I'm trying to measure the energy of it.", 'start': 4052.762, 'duration': 1.681}, {'end': 4061.149, 'text': "So it's a zero-mean noise, so the energy of it is proportional to the variance.", 'start': 4054.904, 'duration': 6.245}, {'end': 4065.992, 'text': 'So I should have called it the energy of the noise, sigma squared, in order not to confuse people.', 'start': 4061.409, 'duration': 4.583}, {'end': 4067.993, 'text': 'But I hope that I did not confuse too many people.', 'start': 4066.032, 'duration': 1.961}], 'summary': 'Lecture addresses zero-mean noise variance and energy measurement.', 'duration': 28.162, 'max_score': 4039.831, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a84039831.jpg'}, {'end': 4144.327, 'src': 'embed', 'start': 4104.643, 'weight': 3, 'content': [{'end': 4107.406, 'text': 'Now, I know that there are two contributing factors, bias and variance.', 'start': 4104.643, 'duration': 2.763}, {'end': 4114.131, 'text': "Can I get the variance down without getting the bias up? That's a bunch of techniques.", 'start': 4108.287, 'duration': 5.844}, {'end': 4115.953, 'text': 'Regularization will belong to that category.', 'start': 4114.192, 'duration': 1.761}, {'end': 4119.956, 'text': 'Can I get both of them down? 
That would be learning from hints.', 'start': 4116.493, 'duration': 3.463}, {'end': 4122.756, 'text': 'There would be something that affects both them, and so on.', 'start': 4120.156, 'duration': 2.6}, {'end': 4128.279, 'text': 'So you can map different techniques to how they are affecting the bias and variance.', 'start': 4123.017, 'duration': 5.262}, {'end': 4131.761, 'text': 'I would say that, in terms of any application to learning situation,', 'start': 4128.74, 'duration': 3.021}, {'end': 4135.823, 'text': "it's a guideline rather than something that I'm going to plug in and tell you what the model is.", 'start': 4131.761, 'duration': 4.062}, {'end': 4144.327, 'text': "The answer for the model selection is mostly through validation, which we're going to talk about in a few lectures.", 'start': 4136.723, 'duration': 7.604}], 'summary': 'Balancing bias and variance in model selection through techniques like regularization and learning from hints is crucial for optimal performance.', 'duration': 39.684, 'max_score': 4104.643, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a84104643.jpg'}, {'end': 4220.055, 'src': 'embed', 'start': 4195.84, 'weight': 9, 'content': [{'end': 4207.344, 'text': 'Now, although this was just a theoretical way of getting the bias-variance decomposition and this is a conceptual way of understanding what it is there is an ensemble learning method that builds exactly on this,', 'start': 4195.84, 'duration': 11.504}, {'end': 4210.285, 'text': 'which is called bagging, bootstrap aggregation.', 'start': 4207.344, 'duration': 2.941}, {'end': 4213.968, 'text': 'And the idea is that what do I need in order to get g bar?', 'start': 4210.925, 'duration': 3.043}, {'end': 4220.055, 'text': 'We said g bar is great, if I can get it, but it requires an infinite number of data sets and I have only one data set.', 'start': 4213.988, 'duration': 6.067}], 'summary': 'Ensemble learning method bagging, 
bootstrap aggregation builds on theoretical bias-variance decomposition.', 'duration': 24.215, 'max_score': 4195.84, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a84195840.jpg'}, {'end': 4310.547, 'src': 'embed', 'start': 4259.635, 'weight': 1, 'content': [{'end': 4261.615, 'text': 'And believe it or not, that gives you actually a dividend.', 'start': 4259.635, 'duration': 1.98}, {'end': 4264.537, 'text': 'It gives you something about the ensemble learning.', 'start': 4261.635, 'duration': 2.902}, {'end': 4267.478, 'text': 'And there are other, obviously more sophisticated methods of ensemble learning.', 'start': 4264.817, 'duration': 2.661}, {'end': 4273.221, 'text': 'And one way or the other, they appeal to the fact that you are reducing the variance by averaging a bunch of stuff.', 'start': 4267.818, 'duration': 5.403}, {'end': 4279.064, 'text': 'So you can say that either taken outright, like bagging, or inspired in some sense,', 'start': 4274.441, 'duration': 4.623}, {'end': 4281.965, 'text': "that it's a good idea to average because you cancel out fluctuations.", 'start': 4279.064, 'duration': 2.901}, {'end': 4295.272, 'text': 'If you use the Bayesian approach, does this bias-variance dilemma still appear? Repeat the question, please.', 'start': 4285.127, 'duration': 10.145}, {'end': 4305.14, 'text': 'If you use a Bayesian approach, does this bias-variance still appear? 
The bias-variance is there to stay.', 'start': 4295.452, 'duration': 9.688}, {'end': 4306.682, 'text': "It's a fact.", 'start': 4305.221, 'duration': 1.461}, {'end': 4310.547, 'text': 'And we can take a particular approach.', 'start': 4307.243, 'duration': 3.304}], 'summary': 'Ensemble learning reduces variance by averaging, including bayesian approach.', 'duration': 50.912, 'max_score': 4259.635, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a84259635.jpg'}, {'end': 4401.893, 'src': 'embed', 'start': 4375.965, 'weight': 7, 'content': [{'end': 4383.042, 'text': 'When is there extrapolation in machine learning? Functional approximation is one of the fields that is very much related.', 'start': 4375.965, 'duration': 7.077}, {'end': 4387.264, 'text': "Because you are given a finite sample, and you're coming from a function, and you're trying to approximate it.", 'start': 4383.102, 'duration': 4.162}, {'end': 4389.145, 'text': 'And this is one of the applications.', 'start': 4387.384, 'duration': 1.761}, {'end': 4394.509, 'text': 'And in general, interpolation is easier than extrapolation, because you have a handle.', 'start': 4389.586, 'duration': 4.923}, {'end': 4398.531, 'text': 'And if you want to articulate that, in terms of the stuff we have,', 'start': 4394.669, 'duration': 3.862}, {'end': 4401.893, 'text': 'the variance in interpolation is smaller than the variance in extrapolation in general.', 'start': 4398.531, 'duration': 3.362}], 'summary': 'Extrapolation in machine learning is more challenging than interpolation due to higher variance.', 'duration': 25.928, 'max_score': 4375.965, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a84375965.jpg'}, {'end': 4536.726, 'src': 'embed', 'start': 4507.907, 'weight': 6, 'content': [{'end': 4512.971, 'text': "But then, in general, when you actually apply some of these algorithms, there is a correlation 
between one another, so there's a covariance.", 'start': 4507.907, 'duration': 5.064}, {'end': 4516.414, 'text': "So there's a question of the balance between the two.", 'start': 4512.991, 'duration': 3.423}, {'end': 4524.999, 'text': 'But it really is, in terms of application, related more to ensemble learning than to just the general bias-variance analysis as I did it.', 'start': 4516.934, 'duration': 8.065}, {'end': 4532.484, 'text': 'Because in the bias-variance analysis, I had the luxury of picking independently generated data sets,', 'start': 4525.339, 'duration': 7.145}, {'end': 4536.726, 'text': "generating independent guys and then averaging them, because it's a conceptual aspect.", 'start': 4532.484, 'duration': 4.242}], 'summary': 'Applying algorithms shows a correlation and covariance, related more to ensemble learning than general bias-variance analysis.', 'duration': 28.819, 'max_score': 4507.907, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a84507907.jpg'}, {'end': 4579.107, 'src': 'embed', 'start': 4552.221, 'weight': 8, 'content': [{'end': 4559.078, 'text': 'So, is linear regression actually learning or is it just fitting along the lines of function approximation??', 'start': 4552.221, 'duration': 6.857}, {'end': 4565.323, 'text': 'Linear regression is a learning technique and fitting is the first part of learning.', 'start': 4559.962, 'duration': 5.361}, {'end': 4567.964, 'text': 'So you always fit in order to learn.', 'start': 4566.044, 'duration': 1.92}, {'end': 4572.505, 'text': 'The only added thing is that you want to make sure that as you fit, you also perform well out of sample.', 'start': 4568.384, 'duration': 4.121}, {'end': 4574.325, 'text': "That's what the theory was about.", 'start': 4572.665, 'duration': 1.66}, {'end': 4579.107, 'text': "So I've been spending four lectures trying to make sure that when you do the intuitive thing, I give you data, you fit them.", 'start': 
4574.626, 'duration': 4.481}], 'summary': 'Linear regression is a learning technique, emphasizing fitting for learning, and the importance of performing well out of sample.', 'duration': 26.886, 'max_score': 4552.221, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a84552221.jpg'}], 'start': 3998.718, 'title': 'Bias-variance tradeoff in machine learning', 'summary': "Provides a detailed analysis of bias-variance tradeoff in linear regression, emphasizing specific assumptions and methods, including noise variance's impact. it also discusses benefits of ensemble learning in reducing variance, the persistence of the bias-variance dilemma in the bayesian approach, and the impact of covariance on ensemble learning.", 'chapters': [{'end': 4259.134, 'start': 3998.718, 'title': 'Bias-variance analysis in linear regression', 'summary': 'Provides a detailed analysis of the bias-variance tradeoff in linear regression, emphasizing the specific assumptions and methods used, including the concept of noise variance and its impact, as well as the limitations and guidance provided by the analysis for model selection and learning techniques.', 'duration': 260.416, 'highlights': ['The analysis in the exact form provides specific results for linear regression and out-of-sample choice. The analysis gives specific results for linear regression and out-of-sample choice, providing a clear understanding of the impact of specific choices on the outcomes.', 'The lecture clarifies the concept of noise variance and its distinction from bias-variance analysis, emphasizing its role in measuring the energy of the noise. 
The lecture emphasizes the distinction between noise variance and bias-variance analysis, highlighting the significance of noise variance in measuring the energy of the noise and its impact on the analysis.', 'The bias-variance analysis serves as a guide for understanding the impact of bias and variance on the outcomes, offering insights into affecting these quantities through techniques like regularization and learning from hints. The bias-variance analysis acts as a guide for understanding the impact of bias and variance, providing insights into techniques such as regularization and learning from hints to affect these quantities and optimize outcomes.', 'The chapter emphasizes the importance of validation as the gold standard for model selection and choices in learning situations. The chapter underscores the significance of validation as the gold standard for model selection and choices in learning situations, highlighting its pivotal role in making informed decisions.', 'The concept of ensemble learning methods, particularly bagging, is discussed as an extension of theoretical bias-variance decomposition, utilizing bootstrapping to create multiple datasets for analysis. The discussion delves into ensemble learning methods, specifically bagging, as an extension of theoretical bias-variance decomposition, utilizing bootstrapping to create multiple datasets for analysis and understanding its significance.']}, {'end': 4595.523, 'start': 4259.635, 'title': 'Ensemble learning and bias-variance dilemma', 'summary': 'Discusses the benefits of ensemble learning in reducing variance by averaging, the persistence of the bias-variance dilemma in the bayesian approach, the relation of functional approximation to machine learning, and the impact of covariance on ensemble learning. 
it also explains the concept of fitting in linear regression as a part of the learning process.', 'duration': 335.888, 'highlights': ['Ensemble learning reduces variance by averaging a bunch of methods, such as bagging, and is effective in canceling out fluctuations. Ensemble learning, like bagging, reduces variance by averaging multiple methods, effectively canceling out fluctuations.', 'The bias-variance dilemma persists in the Bayesian approach, where the bias and variance remain unchanged despite the specific approach. The bias-variance dilemma remains in the Bayesian approach, with the bias and variance remaining unchanged despite the specific approach.', 'Functional approximation in machine learning involves interpolation and extrapolation, where interpolation is generally easier than extrapolation due to smaller variance. Functional approximation in machine learning involves interpolation and extrapolation, with interpolation being generally easier than extrapolation due to smaller variance.', 'Covariance plays a role in ensemble learning, affecting the balance between reducing variance and independence of the averaged methods. Covariance affects the balance between reducing variance and independence of the averaged methods in ensemble learning.', 'Linear regression is a learning technique that involves fitting as the first part of learning, with the additional requirement of performing well out of sample. 
']}], 'duration': 596.805, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zrEyxfl2-a8/pics/zrEyxfl2-a83998718.jpg', 'highlights': ['The analysis in the exact form provides specific results for linear regression and out-of-sample choice, offering a clear understanding of the impact of specific choices on the outcomes.', 'Ensemble learning reduces variance by averaging multiple methods, effectively canceling out fluctuations.', 'The lecture emphasizes the distinction between noise variance and bias-variance analysis, highlighting the significance of noise variance in measuring the energy of the noise and its impact on the analysis.', 'The bias-variance analysis acts as a guide for understanding the impact of bias and variance, providing insights into techniques such as regularization and learning from hints to affect these quantities and optimize outcomes.', 'The chapter underscores the significance of validation as the gold standard for model selection and choices in learning situations, highlighting its pivotal role in making informed decisions.', 'The bias-variance dilemma remains in the Bayesian approach, with the bias and variance remaining unchanged despite the specific approach.', 'Covariance affects the balance between reducing variance and independence of the averaged methods in ensemble learning.', 'Functional approximation in machine learning involves interpolation and extrapolation, with interpolation being generally easier than extrapolation due to smaller variance.', 'Linear regression is a learning technique that involves fitting as the first part of learning, with the additional requirement of performing well out of sample.', 'The concept of ensemble learning methods, particularly bagging, is discussed as an extension of theoretical bias-variance decomposition, 
utilizing bootstrapping to create multiple datasets for analysis and understanding its significance.']}], 'highlights': ['The VC dimension is used to establish learning feasibility and estimate example resources.', 'The VC inequality and generalization bound describe the generalization ability of the final hypothesis based on the VC dimension.', 'The practical utility of the VC dimension in learning problems was discussed, addressing its relevance in real-world applications.', 'The number of examples needed is proportional to the VC dimension, with a practical rule of thumb suggesting 10 times the VC dimension for interesting generalization properties.', 'Theoretical analysis summarized into a simple bound referred to as the generalization bound, which provides a bound on the out-of-sample performance based on the in-sample performance.', 'The theoretical bound suggests tens of thousands of examples needed, while in reality, only maybe 50 examples are needed for the same task.', 'The tradeoff between approximation and generalization emphasizes the impact of hypothesis set size on approximating the target function and the challenge of generalization in machine learning.', 'The bias-variance analysis quantifies the tradeoff through decomposition into approximation and generalization components, focusing on real-valued functions and applying the analysis specifically to linear regression.', 'The final hypothesis depends on the specific data set and can vary with different data sets, crucial in bias-variance analysis.', 'The summary encapsulates the core concept of the bias-variance trade-off, highlighting the inverse relationship between bias and variance in machine learning models.', 'The linear model yields a mean square error of approximately 0.2, while the constant model results in a mean square error of 0.5, demonstrating the superiority of the linear model in approximation.', 'The process involves fitting the target function with a line through calculation of 
the mean square error and obtaining the optimal parameters, illustrating the practical application of the linear model in approximation.', 'The importance of data resources in navigating the hypothesis set is emphasized, with a focus on the challenge of finding the target function in learning.', 'Learning curves illustrate the expected in-sample and out-of-sample errors as a function of n, showing the behavior of simple and complex models with increasing n.', 'The in-sample error pattern is analyzed by comparing the predicted values from the final hypothesis with the actual targets, resulting in an error pattern that is quantified by averaging the squared values, giving the in-sample error.', 'The relationship between VC dimension and number of examples persists, sometimes resulting in a generalization error obtained by dividing the VC dimension by the number of examples.', 'The generalization error is the difference between in-sample and out-of-sample error, and it has the form of the VC dimension divided by the number of examples.', 'Deviation from the optimal model leads to a price in out-of-sample error, with the loss in out-of-sample error being exactly what was gained in sample.', 'The compromise between degrees of freedom and number of examples. The discussion emphasizes the compromise between the degrees of freedom in linear regression and the number of examples, indicating that the performance of the model is dictated by this trade-off.', 'Trade-off between simple and complex models. The comparison between simple and complex models indicates a trade-off in terms of the ability to approximate and the generalization ability, with the performance of the system being influenced by the number of examples versus the complexity of the model.', 'The analysis in the exact form provides specific results for linear regression and out-of-sample choice, offering a clear understanding of the impact of specific choices on the outcomes.', 'Ensemble learning reduces variance 
by averaging multiple methods, effectively canceling out fluctuations.', 'The lecture emphasizes the distinction between noise variance and bias-variance analysis, highlighting the significance of noise variance in measuring the energy of the noise and its impact on the analysis.', 'The bias-variance analysis acts as a guide for understanding the impact of bias and variance, providing insights into techniques such as regularization and learning from hints to affect these quantities and optimize outcomes.', 'The chapter underscores the significance of validation as the gold standard for model selection and choices in learning situations, highlighting its pivotal role in making informed decisions.', 'The bias-variance dilemma remains in the Bayesian approach, with the bias and variance remaining unchanged despite the specific approach.', 'Covariance affects the balance between reducing variance and independence of the averaged methods in ensemble learning.', 'Functional approximation in machine learning involves interpolation and extrapolation, with interpolation being generally easier than extrapolation due to smaller variance.', 'Linear regression is a learning technique that involves fitting as the first part of learning, with the additional requirement of performing well out of sample.', 'The concept of ensemble learning methods, particularly bagging, is discussed as an extension of theoretical bias-variance decomposition, utilizing bootstrapping to create multiple datasets for analysis and understanding its significance.']}
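The bias-variance decomposition discussed above can be reproduced with a short Monte Carlo sketch of the lecture's running example: the target f(x) = sin(pi x), data sets of two noiseless points, and two hypothesis sets (constants versus lines). The sample and grid sizes below are arbitrary choices, and the numbers are estimates rather than exact values:

```python
import numpy as np

# Monte Carlo sketch of the bias-variance decomposition.
# Target f(x) = sin(pi x); each data set D has N = 2 noiseless points.
rng = np.random.default_rng(0)
n_datasets = 10_000

def f(x):
    return np.sin(np.pi * x)

# Sample both training points for every data set at once.
x1, x2 = rng.uniform(-1, 1, (2, n_datasets))
y1, y2 = f(x1), f(x2)

# H0 (constants): the least-squares constant through two points is their mean.
const = (y1 + y2) / 2

# H1 (lines): the line through the two points.
slope = (y2 - y1) / (x2 - x1)
intercept = y1 - slope * x1

# Evaluate every fitted hypothesis g^D on a common test grid.
x = np.linspace(-1, 1, 201)
g_const = np.repeat(const[:, None], x.size, axis=1)        # (datasets, grid)
g_line = slope[:, None] * x[None, :] + intercept[:, None]  # (datasets, grid)

def bias_variance(g):
    g_bar = g.mean(axis=0)               # the average hypothesis g_bar(x)
    bias = np.mean((g_bar - f(x)) ** 2)  # E_x[(g_bar(x) - f(x))^2]
    var = np.mean((g - g_bar) ** 2)      # E_x E_D[(g^D(x) - g_bar(x))^2]
    return bias, var

bias0, var0 = bias_variance(g_const)
bias1, var1 = bias_variance(g_line)
print(f"constant: bias={bias0:.2f}, var={var0:.2f}, total={bias0 + var0:.2f}")
print(f"line:     bias={bias1:.2f}, var={var1:.2f}, total={bias1 + var1:.2f}")
```

With this setup the constant model should land near bias 0.50 / variance 0.25 and the line near bias 0.21 / variance 1.69, so the simpler hypothesis set wins in expected out-of-sample error even though a line approximates sin(pi x) better: exactly the approximation-generalization trade-off the lecture quantifies.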
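The Q&A's point about bagging (bootstrap aggregation) — approximating the many independent data sets needed for g-bar by resampling the one data set you have, then averaging the fitted hypotheses — can be sketched as follows. The target, noise level, and sample sizes here are illustrative stand-ins, not anything specified in the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# One noisy data set from an assumed target (illustrative stand-in).
n = 30
x = rng.uniform(-1, 1, n)
y = np.sin(np.pi * x) + rng.normal(0, 0.3, n)

def fit_line(xs, ys):
    # Least-squares line; np.polyfit returns (slope, intercept) for deg=1.
    a, b = np.polyfit(xs, ys, deg=1)
    return a, b

# Bagging: resample the single data set with replacement ("bootstrap"),
# fit one hypothesis per resample, then average them ("aggregation").
n_bootstrap = 200
fits = []
for _ in range(n_bootstrap):
    idx = rng.integers(0, n, n)          # bootstrap sample of the same size
    fits.append(fit_line(x[idx], y[idx]))
fits = np.array(fits)                    # shape (n_bootstrap, 2)

# The aggregated hypothesis plays the role of g_bar: averaging the
# individual fits cancels out some of the fit-to-fit fluctuation,
# which is the variance term of the decomposition.
a_bar, b_bar = fits.mean(axis=0)

def g_bag(t):
    return a_bar * t + b_bar
```

The resamples are correlated rather than independent (they all come from the same data set), which is the covariance caveat raised in the Q&A; bagging still pays a dividend because averaging reduces the variance contribution without touching the bias.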