title
Lecture 07 - The VC Dimension

description
The VC Dimension - A measure of what it takes a model to learn. Relationship to the number of parameters and degrees of freedom. Lecture 7 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, http://creativecommons.org/licenses/by-nc-nd/3.0/ This lecture was recorded on April 24, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.

detail
{'title': 'Lecture 07 - The VC Dimension', 'heatmap': [{'end': 449.007, 'start': 347.786, 'weight': 0.814}, {'end': 841.256, 'start': 791.728, 'weight': 0.743}, {'end': 972.659, 'start': 924.383, 'weight': 0.703}, {'end': 1326.086, 'start': 1277.286, 'weight': 0.765}, {'end': 1942.004, 'start': 1673.171, 'weight': 0.812}, {'end': 2030.499, 'start': 1977.115, 'weight': 0.717}, {'end': 2294.981, 'start': 2242.466, 'weight': 0.701}, {'end': 2382.795, 'start': 2338.459, 'weight': 0.832}, {'end': 2515.076, 'start': 2466.015, 'weight': 0.763}, {'end': 2872.449, 'start': 2778.47, 'weight': 0.843}, {'end': 3709.061, 'start': 3572.146, 'weight': 0.778}, {'end': 3882.518, 'start': 3834.758, 'weight': 0.748}], 'summary': 'Discusses learning theory main results, vc theory, vc dimension of perceptrons, its impact, tradeoff analysis, generalization bounds, and the relationship between vc dimension, breakpoint, epsilon, and delta in learning, providing insights on growth functions, generalization, and learning implications.', 'chapters': [{'end': 85.161, 'segs': [{'end': 44.043, 'src': 'embed', 'start': 14.027, 'weight': 1, 'content': [{'end': 14.607, 'text': 'Welcome back.', 'start': 14.027, 'duration': 0.58}, {'end': 22.55, 'text': 'Last time we introduced the main result in learning theory and there were two parts.', 'start': 16.708, 'duration': 5.842}, {'end': 31.896, 'text': 'The first part is to get a handle on the growth function m sub h of n, which characterizes the hypothesis set H.', 'start': 23.823, 'duration': 8.073}, {'end': 44.043, 'text': 'And the way we got a handle on it is by introducing the idea of a breakpoint and then bounding the growth function in terms of a formula that depends on that breakpoint.', 'start': 33.255, 'duration': 10.788}], 'summary': 'Introduced main result in learning theory with two parts: characterizing hypothesis set h and bounding growth function based on breakpoint.', 'duration': 30.016, 'max_score': 14.027, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI14027.jpg'}, {'end': 99.29, 'src': 'embed', 'start': 68.077, 'weight': 0, 'content': [{'end': 71.858, 'text': 'It is bounded above by a polynomial in n, since k is a constant.', 'start': 68.077, 'duration': 3.781}, {'end': 75.118, 'text': 'And if you look at this, this is indeed a polynomial.', 'start': 72.698, 'duration': 2.42}, {'end': 80.52, 'text': 'And the maximum power you have in this expression is n to the k minus 1.', 'start': 75.358, 'duration': 5.162}, {'end': 85.161, 'text': 'So not only is it polynomial, but also the order of the polynomial depends on the breakpoint.', 'start': 80.52, 'duration': 4.641}, {'end': 99.29, 'text': 'We were interested in the growth function because it was our way of characterizing the redundancy that we need to understand in order to be able to switch from the Hoeffding inequality to the VC inequality.', 'start': 86.735, 'duration': 12.555}], 'summary': 'The growth function is bounded above by a polynomial in n, with the maximum power being n to the k minus 1.', 'duration': 31.213, 'max_score': 68.077, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI68077.jpg'}], 'start': 14.027, 'title': 'Learning theory main result', 'summary': 'Introduces the main result in learning theory, discussing the growth function m sub h of n, introducing the idea of a breakpoint, and bounding the growth function using a combinatorial formula that is polynomial and depends on the
breakpoint.', 'chapters': [{'end': 85.161, 'start': 14.027, 'title': 'Learning theory main result', 'summary': 'Introduces the main result in learning theory, discussing the growth function m sub h of n which characterizes the hypothesis set h, introducing the idea of a breakpoint, and bounding the growth function using a combinatorial formula that is polynomial and depends on the breakpoint.', 'duration': 71.134, 'highlights': ['The growth function m sub h of n characterizes the hypothesis set H and is bounded above by a polynomial in n, since k is a constant, with the maximum power being n to the k minus 1.', 'Introducing the idea of a breakpoint to get a handle on the growth function m sub h of n, bounding the growth function in terms of a formula that depends on the breakpoint, and finding a combinatorial formula that upper bounds the growth function, given that it has a breakpoint.']}], 'duration': 71.134, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI14027.jpg', 'highlights': ['The growth function m sub h of n is bounded above by a polynomial in n, with the maximum power being n to the k minus 1.', 'Introducing the idea of a breakpoint to get a handle on the growth function m sub h of n, bounding the growth function in terms of a formula that depends on the breakpoint.']}, {'end': 998.159, 'segs': [{'end': 109.239, 'src': 'embed', 'start': 86.735, 'weight': 1, 'content': [{'end': 99.29, 'text': 'We were interested in the growth function because it was our way of characterizing the redundancy that we need to understand in order to be able to switch from the Hoeffding inequality to the VC inequality.', 'start': 86.735, 'duration': 12.555}, {'end': 102.894, 'text': 'And the VC inequality will be the case that handles the learning proper.', 'start': 99.75, 'duration': 3.144}, {'end': 109.239, 'text': 'And in order to do that, we looked at the bad events that are characterized by a small area, according to Hoeffding.', 'start': 103.755, 'duration': 5.484}], 'summary': 'Studying growth function to transition to vc inequality for learning.', 'duration': 22.504, 'max_score': 86.735, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI86735.jpg'}, {'end': 159.52, 'src': 'embed', 'start': 132.574, 'weight': 0, 'content': [{'end': 140.259, 'text': 'if you recall that one, we ended up switching completely from the Hoeffding inequality, which is the top one, into the VC inequality,', 'start': 132.574, 'duration': 7.685}, {'end': 144.822, 'text': 'which is the final theoretical result in machine learning, the characterization of generalization.', 'start': 140.259, 'duration': 4.563}, {'end': 153.325, 'text': 'And they are very similar, except for a fundamental difference, which is here, and technical differences, which are in the constants.', 'start': 145.463, 'duration': 7.862}, {'end': 159.52, 'text': 'As you go through the proof, you will have to change 2s into 4s, and epsilon into a smaller epsilon, and whatnot.', 'start': 154.318, 'duration': 5.202}], 'summary': 'Switched from hoeffding to vc inequality in machine learning.', 'duration': 26.946, 'max_score': 132.574, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI132574.jpg'}, {'end': 282.452, 'src': 'embed', 'start': 256.664, 'weight': 3, 'content': [{'end': 262.466, 'text': 'We will be able to compute the VC dimension exactly for perceptrons in any dimensional space.', 
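An illustrative aside on the first chapter's result: when a hypothesis set has breakpoint k, the growth function is bounded by m_H(N) <= sum_{i=0}^{k-1} C(N, i), a polynomial in N whose maximum power is N^(k-1). A minimal Python sketch of evaluating that bound (the helper name growth_function_bound is ours, not from the course):

```python
from math import comb

def growth_function_bound(n: int, k: int) -> int:
    """Upper bound on the growth function m_H(n) when k is a breakpoint:
    sum_{i=0}^{k-1} C(n, i), a polynomial in n of maximum power n^(k-1)."""
    return sum(comb(n, i) for i in range(k))

# Without a breakpoint the growth function can be the full 2^n; with one, it is
# polynomially bounded. E.g. k = 4, the breakpoint of 2D perceptrons:
for n in (5, 10, 20):
    print(n, growth_function_bound(n, 4), 2 ** n)
```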
'start': 256.664, 'duration': 5.802}, {'end': 265.407, 'text': "So it doesn't have to be two-dimensional space like the one we did before.", 'start': 262.786, 'duration': 2.621}, {'end': 268.988, 'text': "You take any dimension, and we'll get the value of the VC dimension exactly.", 'start': 265.727, 'duration': 3.261}, {'end': 276.35, 'text': 'This is a rare case, because usually when we get the VC dimension, we get a bound on the VC dimension, just out of the practicality of the situation.', 'start': 269.548, 'duration': 6.802}, {'end': 282.452, 'text': "But here, we'll be able to get it exactly, and that will help us going through the interpretation of the VC dimension.", 'start': 276.71, 'duration': 5.742}], 'summary': 'Perceptrons vc dimension can be computed exactly in any dimensional space, aiding interpretation.', 'duration': 25.788, 'max_score': 256.664, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI256664.jpg'}, {'end': 458.799, 'src': 'heatmap', 'start': 347.786, 'weight': 4, 'content': [{'end': 355.511, 'text': 'What is it? In words? It is the most points you can shatter.', 'start': 347.786, 'duration': 7.725}, {'end': 357.334, 'text': 'That is not a foreign notion for us.', 'start': 355.751, 'duration': 1.583}, {'end': 365.276, 'text': 'So if you can shatter 20 points, and that is the most you can do, then the VC dimension is 20.', 'start': 358.155, 'duration': 7.121}, {'end': 374.981, 'text': 'In terms of the technical quantities we defined, this would be the largest value of n, such that the growth function is 2 to the n.', 'start': 365.276, 'duration': 9.705}, {'end': 377.562, 'text': 'So if you go one above, the 2 to the n will be broken.', 'start': 374.981, 'duration': 2.581}, {'end': 380.343, 'text': 'So you can think, ah, so VC dimension is the maximum.', 'start': 377.742, 'duration': 2.601}, {'end': 381.704, 'text': 'Next one must be a breakpoint.', 'start': 380.483, 'duration': 1.221}, {'end': 382.984, 'text': 'And that is indeed the case.', 'start': 382.004, 'duration': 0.98}, {'end': 388.827, 'text': 'The most important thing to realize is that we are talking about the most points you can shatter.', 'start': 383.925, 'duration': 4.902}, {'end': 393.804, 'text': "It doesn't guarantee that every n points, let's say that the VC dimension is n.", 'start': 388.967, 'duration': 4.837}, {'end': 397.066, 'text': "It doesn't say that every n points can be shattered.", 'start': 393.804, 'duration': 3.262}, {'end': 404.029, 'text': 'All you need is one set of n points that can be shattered, in order to say that you can shatter n points.', 'start': 397.286, 'duration': 6.743}, {'end': 408.731, 'text': 'That has always been the case in our analysis.', 'start': 404.489, 'duration': 4.242}, {'end': 413.374, 'text': "Let's try to take this definition and interpret it.", 'start': 410.132, 'duration': 3.242}, {'end': 415.235, 'text': "Let's say that I computed the VC dimension.", 'start': 413.434, 'duration': 1.801}, {'end': 418.447, 'text': 'I told you the VC dimension in this case is 15.', 'start': 415.255, 'duration': 3.192}, {'end': 424.11, 'text': 'Now, what can you say about n, which is at most 15, in terms of the ability to shatter or not?', 'start': 418.447, 'duration': 5.663}, {'end': 433.756, 'text': 'You can say that if n is at most 15, the VC dimension, then H is guaranteed to be able to shatter n points.', 'start': 425.611, 'duration': 8.145}, {'end': 435.037, 'text': "Which n points? 
I haven't said.', 'start': 433.856, 'duration': 1.181}, {'end': 440.3, 'text': 'But there has to be n points for which the hypothesis set can shatter.', 'start': 435.197, 'duration': 5.103}, {'end': 449.007, 'text': "Why is that? It's simply because, since the VC dimension is this number, there will be that many points that can be shattered.", 'start': 440.62, 'duration': 8.387}, {'end': 453.292, 'text': 'Well, any subset of them will have to be shattered as well.', 'start': 449.728, 'duration': 3.564}, {'end': 458.799, 'text': 'Therefore, a smaller number will also be shattered, which means that if n is smaller, you can shatter it.', 'start': 453.633, 'duration': 5.166}], 'summary': 'Vc dimension is the maximum points shattered, ensuring the ability to shatter n points.', 'duration': 23.602, 'max_score': 347.786, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI347786.jpg'}, {'end': 574.646, 'src': 'embed', 'start': 547.578, 'weight': 6, 'content': [{'end': 555.863, 'text': 'So the VC dimension will also serve as the order of the polynomial that bounds the growth function of a hypothesis set that has that VC dimension.', 'start': 547.578, 'duration': 8.285}, {'end': 558.185, 'text': 'All of this is very simple.', 'start': 557.224, 'duration': 0.961}, {'end': 563.344, 'text': "Now let's take examples in order to get the VC dimension in some cases.", 'start': 560.103, 'duration': 3.241}, {'end': 565.224, 'text': 'And you have seen this before.', 'start': 563.504, 'duration': 1.72}, {'end': 574.646, 'text': "Remember positive rays? How many points can positive rays shatter? Oh, it's just one point.", 'start': 565.304, 'duration': 9.342}], 'summary': 'Vc dimension serves as the order of the polynomial, with positive rays shattering only one point.', 'duration': 27.068, 'max_score': 547.578, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI547578.jpg'}, {'end': 719.666, 'src': 'embed', 'start': 691.352, 'weight': 5, 'content': [{'end': 693.734, 'text': 'We hope that the final hypothesis approximates this guy.', 'start': 691.352, 'duration': 2.382}, {'end': 696.596, 'text': 'And we introduce this thing in order to get the probabilistic analysis.', 'start': 694.234, 'duration': 2.362}, {'end': 697.397, 'text': 'We have seen this before.', 'start': 696.636, 'duration': 0.761}, {'end': 703.421, 'text': "Now let's look at this diagram, and see what the VC dimension says.", 'start': 699.038, 'duration': 4.383}, {'end': 705.683, 'text': 'The main result.', 'start': 704.942, 'duration': 0.741}, {'end': 717.385, 'text': "is that if the VC dimension is finite, that's all you are asking, then now the green final hypothesis will generalize.", 'start': 706.917, 'duration': 10.468}, {'end': 719.666, 'text': 'That we have established by the theory.', 'start': 718.265, 'duration': 1.401}], 'summary': "Vc dimension's finiteness ensures generalization of the final hypothesis.", 'duration': 28.314, 'max_score': 691.352, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI691352.jpg'}, {'end': 788.947, 'src': 'embed', 'start': 765.683, 'weight': 7, 'content': [{'end': 774.685, 'text': 'We have gone through all of this trouble in order to guarantee that generalization will happen uniformly, regardless of which hypothesis you pick.', 'start': 765.683, 'duration': 9.002}, {'end': 780.326, 'text': 'Therefore, you can find the craziest learning algorithm and
it can pick anything you want,', 'start': 775.465, 'duration': 4.861}, {'end': 782.886, 'text': 'and you still can make the statement about the final hypothesis.', 'start': 780.326, 'duration': 2.56}, {'end': 786.627, 'text': "So now the learning algorithm doesn't matter as far as generalization is concerned.", 'start': 783.226, 'duration': 3.401}, {'end': 788.947, 'text': "So let's punish it by graying it out.", 'start': 786.927, 'duration': 2.02}], 'summary': 'Uniform generalization guarantees regardless of learning algorithm.', 'duration': 23.264, 'max_score': 765.683, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI765683.jpg'}, {'end': 841.256, 'src': 'heatmap', 'start': 791.728, 'weight': 0.743, 'content': [{'end': 796.51, 'text': "Now, it's also independent of the input distribution.", 'start': 791.728, 'duration': 4.782}, {'end': 798.731, 'text': 'So this is the box.', 'start': 797.911, 'duration': 0.82}, {'end': 802.332, 'text': 'Now, this was technically introduced in order to get Hoeffding.', 'start': 799.571, 'duration': 2.761}, {'end': 805.213, 'text': 'And obviously, it has to survive in order to get the VC inequality.', 'start': 802.412, 'duration': 2.801}, {'end': 810.334, 'text': 'The reason I am talking about the independence here is because of an interesting point.', 'start': 806.713, 'duration': 3.621}, {'end': 817.016, 'text': 'We mentioned that when we define the growth function or the VC dimension, I give you the budget N.', 'start': 810.814, 'duration': 6.202}, {'end': 822.077, 'text': 'And then you choose the points any way you want, with a view to maximizing the dichotomies.', 'start': 817.016, 'duration': 5.061}, {'end': 827.28, 'text': 'Right? So now, there is no probability distribution that can beat you.', 'start': 822.277, 'duration': 5.003}, {'end': 835.269, 'text': 'I can pick the weirdest probability distribution that has preferences for funny points and whatnot, and your choice of the points will be fine,', 'start': 827.78, 'duration': 7.489}, {'end': 837.271, 'text': 'because you chose the points that maximize.', 'start': 835.269, 'duration': 2.002}, {'end': 841.256, 'text': 'So whatever the probability distribution does, you will be doing more.', 'start': 837.692, 'duration': 3.564}], 'summary': 'Independence of input distribution in maximizing dichotomies.', 'duration': 49.528, 'max_score': 791.728, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI791728.jpg'}, {'end': 877.713, 'src': 'embed', 'start': 847.676, 'weight': 8, 'content': [{'end': 851.838, 'text': 'The learning statement that this guy will generalize will hold for any probability distribution.', 'start': 847.676, 'duration': 4.162}, {'end': 854.979, 'text': 'So another guy bites the dust.', 'start': 852.798, 'duration': 2.181}, {'end': 862.521, 'text': 'Now you look at this, and then there is a third guy, which is an obvious one, which is the target function.', 'start': 857.48, 'duration': 5.041}, {'end': 869.072, 'text': "All of this analysis? 
The target function didn't matter at all, as far as generalization is concerned.", 'start': 863.102, 'duration': 5.97}, {'end': 871.892, 'text': "We are generalizing to it, but we don't care what it is.", 'start': 869.092, 'duration': 2.8}, {'end': 877.713, 'text': "As long as it generates the examples we learn from, and then we test on it, that's all we care about.", 'start': 872.372, 'duration': 5.341}], 'summary': "Generalization holds for any probability distribution, target function doesn't matter for generalization.", 'duration': 30.037, 'max_score': 847.676, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI847676.jpg'}, {'end': 924.083, 'src': 'embed', 'start': 896.235, 'weight': 2, 'content': [{'end': 898.916, 'text': 'The hypothesis set is where we define the VC dimension.', 'start': 896.235, 'duration': 2.681}, {'end': 899.716, 'text': 'And if you remember,', 'start': 898.956, 'duration': 0.76}, {'end': 905.978, 'text': 'very early on I told you that the hypothesis set is a little bit of an artificial notion to introduce as part of the learning diagram.', 'start': 899.716, 'duration': 6.262}, {'end': 909.539, 'text': 'And I said that we are going to introduce it because there is no downside.', 'start': 906.638, 'duration': 2.901}, {'end': 911.359, 'text': 'There is no loss of generality, which is true.', 'start': 909.579, 'duration': 1.78}, {'end': 913.379, 'text': 'And there is an upside for the theory.', 'start': 911.879, 'duration': 1.5}, {'end': 915.12, 'text': 'Now you can see the upside.', 'start': 913.86, 'duration': 1.26}, {'end': 919.842, 'text': 'The entire VC theory deals with the hypothesis set by itself.', 'start': 915.62, 'duration': 4.222}, {'end': 924.083, 'text': "That's what has a VC dimension, and that's what will tell you you are able to generalize.", 'start': 920.242, 'duration': 3.841}], 'summary': 'Introduction of hypothesis set for vc dimension in learning theory.', 'duration': 27.848, 'max_score': 896.235, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI896235.jpg'}, {'end': 972.659, 'src': 'heatmap', 'start': 924.383, 'weight': 0.703, 'content': [{'end': 929.045, 'text': 'The rest of the guys that are more intuitive that disappeared here are not relevant to that theory.', 'start': 924.383, 'duration': 4.662}, {'end': 937.148, 'text': 'Now, the training examples are left, because the statement that involves the VC dimension is a probabilistic statement.', 'start': 930.546, 'duration': 6.602}, {'end': 940.849, 'text': 'It says that with high probability, you will generalize.', 'start': 937.468, 'duration': 3.381}, {'end': 947.593, 'text': "With high probability, with respect to what? 
It's with respect to generating the data.", 'start': 941.61, 'duration': 5.983}, {'end': 952.737, 'text': 'You may get a very unlucky data set, for which you are not going to generalize.', 'start': 948.153, 'duration': 4.584}, {'end': 955.999, 'text': 'The guarantee is that this happens with a very small probability.', 'start': 953.217, 'duration': 2.782}, {'end': 959.522, 'text': 'So this remains here, just because it is part of the statement.', 'start': 956.419, 'duration': 3.103}, {'end': 963.765, 'text': 'And this triangle is where the VC inequality lives.', 'start': 960.122, 'duration': 3.643}, {'end': 972.659, 'text': 'Now we go into a fun thing of computing the VC dimension for the perceptrons.', 'start': 967.157, 'duration': 5.502}], 'summary': 'Vc dimension ensures high probability of generalization in machine learning.', 'duration': 48.276, 'max_score': 924.383, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI924383.jpg'}, {'end': 955.999, 'src': 'embed', 'start': 930.546, 'weight': 9, 'content': [{'end': 937.148, 'text': 'Now, the training examples are left, because the statement that involves the VC dimension is a probabilistic statement.', 'start': 930.546, 'duration': 6.602}, {'end': 940.849, 'text': 'It says that with high probability, you will generalize.', 'start': 937.468, 'duration': 3.381}, {'end': 947.593, 'text': "With high probability, with respect to what? It's with respect to generating the data.", 'start': 941.61, 'duration': 5.983}, {'end': 952.737, 'text': 'You may get a very unlucky data set, for which you are not going to generalize.', 'start': 948.153, 'duration': 4.584}, {'end': 955.999, 'text': 'The guarantee is that this happens with a very small probability.', 'start': 953.217, 'duration': 2.782}], 'summary': 'Vc dimension statement ensures high probability of generalization with small probability of failure.', 'duration': 25.453, 'max_score': 930.546, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI930546.jpg'}], 'start': 86.735, 'title': 'Vc theory & generalization', 'summary': 'Covers the transition from hoeffding inequality to vc inequality, emphasizing redundancy characterization through growth function, significance of vc dimension in learning situations, and its relation to generalization in learning. 
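The lecture's definition of the VC dimension (the most points you can shatter, where a single shatterable configuration of n points suffices) can be checked by brute force for the simple one-dimensional models it keeps revisiting. A sketch under our own parameterizations of positive rays and positive intervals (not code from the course); points are assumed increasing, with spacing above 0.5 so the p + 0.5 cuts fall between them:

```python
def dichotomies_positive_rays(points):
    """Dichotomies realizable by h(x) = sign(x - a): +1 to the right of a."""
    thresholds = [min(points) - 1] + [p + 0.5 for p in points]
    return {tuple(+1 if x > a else -1 for x in points) for a in thresholds}

def dichotomies_positive_intervals(points):
    """Dichotomies realizable by h(x) = +1 iff x lies in the interval (a, b]."""
    cuts = [min(points) - 1] + [p + 0.5 for p in points]
    return {tuple(+1 if a < x <= b else -1 for x in points)
            for a in cuts for b in cuts if a <= b}

def shatters(dichotomies, n):
    # Shattering n points means realizing all 2^n dichotomies on them.
    return len(dichotomies) == 2 ** n

print(shatters(dichotomies_positive_rays([1.0]), 1))                 # True:  VC dim >= 1
print(shatters(dichotomies_positive_rays([1.0, 2.0]), 2))            # False: breakpoint at 2
print(shatters(dichotomies_positive_intervals([1.0, 2.0]), 2))       # True:  VC dim >= 2
print(shatters(dichotomies_positive_intervals([1.0, 2.0, 3.0]), 3))  # False: breakpoint at 3
```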
it discusses vc dimension, its computation for perceptrons, and its relationship to generalization, as well as the concepts of hypothesis sets, vc theory, and the role of probability distributions and target functions in generalization.', 'chapters': [{'end': 153.325, 'start': 86.735, 'title': 'Growth function and redundancy in learning', 'summary': 'Discusses the transition from hoeffding inequality to vc inequality in machine learning, emphasizing the characterization of redundancy through the growth function and its relation to learning proper.', 'duration': 66.59, 'highlights': ['The transition from Hoeffding inequality to VC inequality is the final theoretical result in machine learning, characterizing generalization with a fundamental difference and technical differences in the constants.', 'The growth function is used to characterize the redundancy resulting from overlapping bad regions of different hypotheses, leading to the switch to VC inequality.', 'The redundancy in learning is related to the growth function, which addresses the small area characterization of bad events according to Hoeffding.']}, {'end': 315.708, 'start': 154.318, 'title': 'Vc dimension and its applications', 'summary': 'Focuses on the transition from hypotheses m to growth function, the technical finesse involving 2n points, and the significance of vc dimension in learning situations, including the exact computation for perceptrons in any dimensional space.', 'duration': 161.39, 'highlights': ['The significance of VC dimension in learning situations, including the exact computation for perceptrons in any dimensional space The VC dimension is emphasized as a crucial concept in learning situations, as it provides insight into the complexity of hypothesis sets. The exact computation of VC dimension for perceptrons in any dimensional space is highlighted as a rare case, offering practical value for understanding and applying the concept.', 'The transition from hypotheses M to growth function The chapter discusses the replacement of the number of hypotheses M with the growth function, indicating a significant shift in the theoretical framework for understanding learning and classification. This transition enhances the understanding of the complexity of hypothesis sets.', 'The technical finesse involving 2n points The technical finesse involving 2n points is highlighted as a crucial aspect in the argument for the single sample, effectively addressing the challenges related to the role of Eout and the growth function. 
This refinement contributes to a clearer understanding of the theoretical concepts.']}, {'end': 786.627, 'start': 317.068, 'title': 'The vc dimension and generalization', 'summary': 'Discusses the concept of vc dimension, defining it as the maximum number of points a hypothesis set can shatter, and its relationship to generalization in learning, emphasizing that if the vc dimension is finite, the final hypothesis will generalize, independent of the learning algorithm.', 'duration': 469.559, 'highlights': ["The VC dimension is defined as the maximum number of points a hypothesis set can shatter, providing a measure of the set's expressive power and complexity.", 'If the VC dimension is finite, the final hypothesis will generalize, regardless of the learning algorithm, ensuring uniform generalization across different hypothesis selections.', 'Examples of VC dimensions are illustrated for various hypothesis sets, such as positive rays having a VC dimension of 1, 2D perceptrons having a VC dimension of 3, and convex sets in two dimensions having an infinite VC dimension.', 'The VC dimension serves as an upper bound for the growth function of a hypothesis set, with the order of the bounding polynomial being equal to the VC dimension.']}, {'end': 998.159, 'start': 786.927, 'title': 'Understanding vc theory and hypothesis sets', 'summary': 'Explains the concept of hypothesis sets, vc theory, and the role of probability distributions and target functions in generalization, emphasizing the independence of the hypothesis set from input distribution and the irrelevance of the target function to generalization. additionally, it discusses the significance of the training examples and the probabilistic nature of the vc inequality.', 'duration': 211.232, 'highlights': ["The hypothesis set's independence from the input distribution is emphasized, ensuring generalization regardless of the probability distribution. Independence of hypothesis set from input distribution, ensuring generalization, regardless of probability distribution.", 'The irrelevance of the target function to generalization is highlighted, as long as it generates the examples used for learning and testing. Irrelevance of the target function to generalization, as long as it generates the learning and testing examples.', 'The significance of training examples and their role in the probabilistic nature of the VC inequality is explained, ensuring generalization with high probability, relative to the generated data. 
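For reference, the probabilistic statement these highlights keep pointing at is the VC inequality from the previous lecture, with Hoeffding's 2's turned into 4's and the exponent weakened, as the transcript notes. In the form used in this course:

```latex
% VC inequality: for a hypothesis set H with growth function m_H and any
% tolerance epsilon > 0,
\mathbb{P}\left[\,\sup_{h \in \mathcal{H}} \left|E_{\mathrm{in}}(h) - E_{\mathrm{out}}(h)\right| > \epsilon\,\right]
\;\le\; 4\, m_{\mathcal{H}}(2N)\, e^{-\frac{1}{8}\epsilon^{2} N}
```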
Significance of training examples in the probabilistic nature of VC inequality, ensuring generalization with high probability relative to generated data.']}], 'duration': 911.424, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI86735.jpg', 'highlights': ['The transition from Hoeffding inequality to VC inequality characterizes generalization with fundamental and technical differences.', 'The growth function characterizes redundancy from overlapping bad regions of hypotheses, leading to the switch to VC inequality.', 'The significance of VC dimension in learning situations provides insight into the complexity of hypothesis sets.', 'The exact computation of VC dimension for perceptrons in any dimensional space offers practical value for understanding and applying the concept.', "The VC dimension is defined as the maximum number of points a hypothesis set can shatter, providing a measure of the set's expressive power and complexity.", 'If the VC dimension is finite, the final hypothesis will generalize, ensuring uniform generalization across different hypothesis selections.', 'The VC dimension serves as an upper bound for the growth function of a hypothesis set, with the order of the bounding polynomial being equal to the VC dimension.', "The hypothesis set's independence from the input distribution ensures generalization regardless of the probability distribution.", 'The irrelevance of the target function to generalization is highlighted, as long as it generates the examples used for learning and testing.', 'The significance of training examples and their role in the probabilistic nature of the VC inequality ensures generalization with high probability relative to generated data.']}, {'end': 2241.171, 'segs': [{'end': 1097.611, 'src': 'embed', 'start': 1064.534, 'weight': 0, 'content': [{'end': 1069.197, 'text': 'And we asked ourselves, how much did it go up? What turns out to be a very simple formula.', 'start': 1064.534, 'duration': 4.663}, {'end': 1076.381, 'text': 'The VC dimension of perceptrons is exactly d plus 1.', 'start': 1070.118, 'duration': 6.263}, {'end': 1077.222, 'text': 'Now we need to prove that.', 'start': 1076.381, 'duration': 0.841}, {'end': 1081.504, 'text': "And we're going to prove it in two stages, very simple stages.", 'start': 1078.202, 'duration': 3.302}, {'end': 1092.429, 'text': 'One of them is that we are going to show that the VC dimension is at most d plus 1.', 'start': 1082.205, 'duration': 10.224}, {'end': 1097.611, 'text': "And then we're going to show that the VC dimension is at least d plus 1.", 'start': 1092.429, 'duration': 5.182}], 'summary': 'The vc dimension of perceptrons is d plus 1, proved in two stages.', 'duration': 33.077, 'max_score': 1064.534, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI1064534.jpg'}, {'end': 1326.086, 'src': 'heatmap', 'start': 1277.286, 'weight': 0.765, 'content': [{'end': 1279.408, 'text': 'So I get a 1.', 'start': 1277.286, 'duration': 2.122}, {'end': 1280.788, 'text': 'So this is an invertible matrix.', 'start': 1279.408, 'duration': 1.38}, {'end': 1286.572, 'text': 'Why am I interested in an invertible matrix? 
Because we can shatter the data set.', 'start': 1281.729, 'duration': 4.843}, {'end': 1288.033, 'text': 'This is how we are going to do it.', 'start': 1286.972, 'duration': 1.061}, {'end': 1291.535, 'text': 'Look at any set of labels you want.', 'start': 1289.494, 'duration': 2.041}, {'end': 1293.116, 'text': 'So this is a dichotomy.', 'start': 1291.935, 'duration': 1.181}, {'end': 1301.122, 'text': 'This is the value at the first one, plus or minus 1, plus or minus 1, and plus or minus 1, on the X points that I just showed you.', 'start': 1294.658, 'duration': 6.464}, {'end': 1304.685, 'text': "So all of these could be any pattern of plus or minus 1's.", 'start': 1301.743, 'duration': 2.942}, {'end': 1312.25, 'text': 'So I would like to tell you that any dichotomy you pick from this plus 1, plus 1, minus 1, minus 1, plus 1,', 'start': 1305.866, 'duration': 6.384}, {'end': 1316.993, 'text': 'et cetera I can find a perceptron that realizes this dichotomy.', 'start': 1312.25, 'duration': 4.743}, {'end': 1320.536, 'text': 'If I do that, then I have showed you that I can shatter the set.', 'start': 1317.214, 'duration': 3.322}, {'end': 1326.086, 'text': 'So let us look for the w that satisfies.', 'start': 1322.378, 'duration': 3.708}], 'summary': 'Invertible matrix can shatter dataset by finding perceptron for any dichotomy.', 'duration': 48.8, 'max_score': 1277.286, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI1277286.jpg'}, {'end': 1942.004, 'src': 'heatmap', 'start': 1673.171, 'weight': 0.812, 'content': [{'end': 1676.756, 'text': 'So obviously, d plus 2 is bigger than d plus 1.', 'start': 1673.171, 'duration': 3.585}, {'end': 1685.266, 'text': 'What do you know when you have more vectors than dimensions? Oh, I know that they must be linearly dependent.', 'start': 1676.756, 'duration': 8.51}, {'end': 1696.51, 'text': 'Therefore, we must have that one of them will be a linear combination of the rest of the guys.', 'start': 1688.924, 'duration': 7.586}, {'end': 1703.174, 'text': 'So you take j, whichever it might be, and this will be equal to the sum, over the rest of the guys,', 'start': 1696.67, 'duration': 6.504}, {'end': 1706.017, 'text': "of some coefficients that I don't know times those guys.", 'start': 1703.174, 'duration': 2.843}, {'end': 1708.098, 'text': 'This will apply to any set you choose.', 'start': 1706.217, 'duration': 1.881}, {'end': 1712.481, 'text': "And this is the property that I'm actually going to use in order to establish what I want.", 'start': 1708.659, 'duration': 3.822}, {'end': 1717.155, 'text': "Furthermore, I can actually say something about the a i's.", 'start': 1714.754, 'duration': 2.401}, {'end': 1720.296, 'text': "The a i's could be anything for this statement to hold.", 'start': 1717.215, 'duration': 3.081}, {'end': 1724.178, 'text': "But I'm going to claim that not all of them are zeros in this case.", 'start': 1720.856, 'duration': 3.322}, {'end': 1728.119, 'text': "At least some of the a i's are non-zero.", 'start': 1725.258, 'duration': 2.861}, {'end': 1731.08, 'text': 'How do I know that? 
This is not part of the linear dependence.', 'start': 1728.539, 'duration': 2.541}, {'end': 1738.388, 'text': 'This is actually because of the particular form of these guys, where the first coordinate of all of these guys is always 1.', 'start': 1731.481, 'duration': 6.907}, {'end': 1741.671, 'text': 'When you look at this and apply it to the first coordinate, 1 equals.', 'start': 1738.388, 'duration': 3.283}, {'end': 1746.735, 'text': 'Well, these cannot all be 0s, because it has to add up to 1.', 'start': 1742.251, 'duration': 4.484}, {'end': 1751.56, 'text': "Therefore, some of the a i's will be non-zero.", 'start': 1746.735, 'duration': 4.825}, {'end': 1752.621, 'text': "That's all I need.", 'start': 1751.92, 'duration': 0.701}, {'end': 1753.481, 'text': 'I need this to hold.', 'start': 1752.701, 'duration': 0.78}, {'end': 1757.104, 'text': "I need some of the a i's not to be 0.", 'start': 1753.982, 'duration': 3.122}, {'end': 1758.446, 'text': 'Everybody buys that this is the case.', 'start': 1757.104, 'duration': 1.342}, {'end': 1759.346, 'text': "That's all I need.", 'start': 1758.706, 'duration': 0.64}, {'end': 1763.67, 'text': 'And then we go and show the dichotomy that you cannot implement.', 'start': 1759.447, 'duration': 4.223}, {'end': 1772.81, 'text': 'We have that, right? Consider the following dichotomy.', 'start': 1767.188, 'duration': 5.622}, {'end': 1780.693, 'text': "I'm going to take the xi's corresponding to non-zero ai.", 'start': 1775.811, 'duration': 4.882}, {'end': 1782.494, 'text': "So some of the ai's are non-zero, for sure.", 'start': 1780.713, 'duration': 1.781}, {'end': 1783.494, 'text': 'Maybe some of them are zeros.', 'start': 1782.534, 'duration': 0.96}, {'end': 1785.515, 'text': "I'm going to focus only on the non-zero guys.", 'start': 1783.595, 'duration': 1.92}, {'end': 1789.157, 'text': "I don't care what you do with the terms that have ai equals 0.", 'start': 1785.655, 'duration': 3.502}, {'end': 1789.817, 'text': 'Do whatever you want.', 'start': 1789.157, 'duration': 0.66}, {'end': 1791.138, 'text': 'Give them any plus or minus 1.', 'start': 1789.837, 'duration': 1.301}, {'end': 1791.818, 'text': "Let's not look at them.", 'start': 1791.138, 'duration': 0.68}, {'end': 1800.748, 'text': "I'm looking at those guys, and I am now constructing a dichotomy that I'm going to show you that you cannot implement using a perceptron.", 'start': 1793.286, 'duration': 7.462}, {'end': 1812.131, 'text': "For the xi's with a non-zero ai, I am going to give the label, which happens to be the sign of that coefficient.", 'start': 1803.028, 'duration': 9.103}, {'end': 1813.891, 'text': 'That is a non-zero number.', 'start': 1812.751, 'duration': 1.14}, {'end': 1818.072, 'text': "It's positive or negative, so I will give it plus 1 or minus 1, according to whether it's positive or negative.", 'start': 1813.931, 'duration': 4.141}, {'end': 1821.293, 'text': 'And I will do that for every non-zero term here.', 'start': 1818.772, 'duration': 2.521}, {'end': 1823.945, 'text': 'Everybody sees that.', 'start': 1823.244, 'duration': 0.701}, {'end': 1830.692, 'text': "And now I'm going to complete the dichotomy by telling you what will happen with xj.", 'start': 1825.386, 'duration': 5.306}, {'end': 1835.417, 'text': "I'm going to require that xj goes to minus 1.", 'start': 1831.413, 'duration': 4.004}, {'end': 1838.941, 'text': 'Now all you need to realize is that this is a dichotomy.', 'start': 1835.417, 'duration': 3.524}, {'end': 1843.762, 'text': 'These are values of plus or 
minus 1 on specific points.', 'start': 1841.04, 'duration': 2.722}, {'end': 1847.504, 'text': 'The other guys, which happen to be 0, give them any plus or minus 1.', 'start': 1844.502, 'duration': 3.002}, {'end': 1847.905, 'text': 'You choose.', 'start': 1847.504, 'duration': 0.401}, {'end': 1852.128, 'text': "And for the final guy, which is sitting here, I'm going to give it minus 1.", 'start': 1848.765, 'duration': 3.363}, {'end': 1853.489, 'text': 'This is a legitimate dichotomy.', 'start': 1852.128, 'duration': 1.361}, {'end': 1856.331, 'text': "And I'm going to show you that you cannot implement this particular one.", 'start': 1853.909, 'duration': 2.422}, {'end': 1863.937, 'text': "How is that? Because I really don't know your points.", 'start': 1861.095, 'duration': 2.842}, {'end': 1867.42, 'text': 'So I must be using just that algebraic property in order to find this.', 'start': 1864.057, 'duration': 3.363}, {'end': 1869.566, 'text': 'And the idea is very simple.', 'start': 1868.505, 'duration': 1.061}, {'end': 1873.969, 'text': 'This is the form we have.', 'start': 1871.828, 'duration': 2.141}, {'end': 1876.271, 'text': 'xj happens to be the linear sum of these guys.', 'start': 1874.23, 'duration': 2.041}, {'end': 1881.615, 'text': "I'm going to multiply by any w.", 'start': 1876.992, 'duration': 4.623}, {'end': 1884.297, 'text': 'So for any w, I mean the perceptrons, you multiply by w.', 'start': 1881.615, 'duration': 2.682}, {'end': 1885.598, 'text': 'That is what makes it a perceptron.', 'start': 1884.297, 'duration': 1.301}, {'end': 1886.739, 'text': "So I'm going to multiply by it.", 'start': 1885.638, 'duration': 1.101}, {'end': 1889.921, 'text': 'And then I realize that w transpose times xj.', 'start': 1887.239, 'duration': 2.682}, {'end': 1896.046, 'text': 'this will be actually the signal for the last guy, is actually the sum of the signals for the different guys with these coefficients.', 'start': 1889.921, 'duration': 6.125}, {'end': 1897.107, 'text': 'That has to happen.', 'start': 1896.526, 'duration': 0.581}, {'end': 1907.578, 'text': 'So what is the problem? The problem is that when you take this as your perceptron, then by definition, the label is the sign of this quantity.', 'start': 1898.795, 'duration': 8.783}, {'end': 1917.242, 'text': 'For the guys where ai is non-zero, we force this quantity, which is the value yi, to be the sign of ai.', 'start': 1909.459, 'duration': 7.783}, {'end': 1919.723, 'text': "That's what we constructed.", 'start': 1918.122, 'duration': 1.601}, {'end': 1928.001, 'text': 'What can you conclude, given that the sign of this fellow is the same as the sign of this fellow? 
It must be that these guys agree in sign.', 'start': 1920.939, 'duration': 7.062}, {'end': 1930.181, 'text': 'They are either both positive or both negative.', 'start': 1928.021, 'duration': 2.16}, {'end': 1936.923, 'text': 'Therefore, I can conclude that if you multiply them, you get something positive.', 'start': 1930.941, 'duration': 5.982}, {'end': 1939.623, 'text': 'That is for sure.', 'start': 1937.343, 'duration': 2.28}, {'end': 1942.004, 'text': 'So now I have a handle on this term.', 'start': 1939.964, 'duration': 2.04}], 'summary': 'When having more vectors than dimensions, they must be linearly dependent, leading to an impossible dichotomy for perceptrons.', 'duration': 268.833, 'max_score': 1673.171, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI1673171.jpg'}, {'end': 2047.839, 'src': 'heatmap', 'start': 1977.115, 'weight': 1, 'content': [{'end': 1988.191, 'text': 'What does this force the value of the perceptron, your perceptron, the one you had here, to be? It will have to be plus 1.', 'start': 1977.115, 'duration': 11.076}, {'end': 1993.613, 'text': "Therefore, it's impossible to get that to be minus 1 if you chose this.", 'start': 1988.191, 'duration': 5.422}, {'end': 1995.474, 'text': 'This is a choice that is legitimate.', 'start': 1994.013, 'duration': 1.461}, {'end': 1996.594, 'text': 'It is a dichotomy.', 'start': 1995.514, 'duration': 1.08}, {'end': 2005.097, 'text': 'And now, if you pick those guys, pick the rest of the zero-coefficient guys any way you want, you are forbidden from having this as minus 1.', 'start': 1997.054, 'duration': 8.043}, {'end': 2009.859, 'text': 'Therefore, you cannot shatter your set for any set you choose.', 'start': 2005.097, 'duration': 4.762}, {'end': 2014.141, 'text': 'Therefore, you cannot shatter any set of d plus 2 points.', 'start': 2010.319, 'duration': 3.822}, {'end': 2015.561, 'text': 'And I have the result.', 'start': 2014.801, 'duration': 0.76}, {'end': 2019.174, 'text': "So let's put it together.", 'start': 2018.294, 'duration': 0.88}, {'end': 2025.437, 'text': 'First, we showed that the VC dimension is at most d plus 1.', 'start': 2021.115, 'duration': 4.322}, {'end': 2030.499, 'text': "And then we showed that it's at least d plus 1.", 'start': 2025.437, 'duration': 5.062}, {'end': 2033.94, 'text': "Actually, did we do it this way or the other way around? That's another quiz.", 'start': 2030.499, 'duration': 3.441}, {'end': 2034.921, 'text': "No, no, it's not another quiz.", 'start': 2034, 'duration': 0.921}, {'end': 2040.729, 'text': 'The conclusion is that the VC dimension is d plus 1.', 'start': 2037.325, 'duration': 3.404}, {'end': 2045.896, 'text': 'And now, d-dimensional perceptron, the VC dimension is d plus 1.', 'start': 2040.729, 'duration': 5.167}, {'end': 2047.839, 'text': "Let's ask ourselves a simple question.", 'start': 2045.896, 'duration': 1.943}], 'summary': 'The vc dimension of a d-dimensional perceptron is d plus 1.', 'duration': 50.785, 'max_score': 1977.115, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI1977115.jpg'}], 'start': 999.028, 'title': 'Vc dimension and perceptrons', 'summary': 'Explains the vc dimension of perceptrons, showing it is exactly d+1, demonstrating the ability to shatter d+1 points and proving the vc dimension is at most d plus 1. 
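The "at least d plus 1" half of the proof summarized above is constructive and can be run directly: pick the d+1 points whose matrix X (with the first coordinate always 1) is invertible, and for any target dichotomy y solve Xw = y, so that sign(Xw) = y. A small NumPy sketch of that construction (variable names are ours):

```python
import numpy as np
from itertools import product

d = 3  # input dimension of the perceptron
# d+1 points: the origin plus the d unit vectors, each with the bias coordinate
# 1 prepended. The resulting (d+1)x(d+1) matrix X is invertible (determinant 1).
X = np.hstack([np.ones((d + 1, 1)), np.vstack([np.zeros(d), np.eye(d)])])

# Every dichotomy y in {-1,+1}^(d+1) is realized by the perceptron w = X^{-1} y,
# since then Xw = y and hence sign(Xw) = y. So these d+1 points are shattered.
for y in product([-1.0, 1.0], repeat=d + 1):
    w = np.linalg.solve(X, np.array(y))
    assert np.array_equal(np.sign(X @ w), np.array(y))
print(f"All {2 ** (d + 1)} dichotomies realized: VC dimension >= d + 1 = {d + 1}")
```

The "at most d plus 1" half is the linear-dependence argument in the transcript: any d+2 points are linearly dependent, and the dichotomy labeled by the signs of the dependence coefficients cannot be realized.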
it also interprets the relationship between vc dimension, parameters, and practical application.', 'chapters': [{'end': 1286.572, 'start': 999.028, 'title': 'Understanding vc dimension of perceptrons', 'summary': 'Explains the vc dimension of perceptrons, demonstrating that the vc dimension of a d-dimensional perceptron is exactly d+1, supporting it with a specific set of n points and a simple formula for vc dimension.', 'duration': 287.544, 'highlights': ['By demonstrating a specific set of n points, the chapter proves that the VC dimension of perceptrons is exactly d+1, providing insight into the significance of VC dimension.', 'The troublesome case of four points in three dimensions is easily shattered, leading to the conclusion that the VC dimension of perceptrons is d+1.']}, {'end': 1502.485, 'start': 1286.972, 'title': 'Perceptron and vc dimension', 'summary': 'Explains how a perceptron can shatter d+1 points and establishes the vc dimension is less than or equal to d+1, using algebraic solutions to realize any dichotomy.', 'duration': 215.513, 'highlights': ['A perceptron can realize any dichotomy, shattering D+1 points. The perceptron can find a solution that realizes any dichotomy, shattering the set of D+1 points.', 'The VC dimension is established to be less than or equal to D+1. The chapter concludes that the VC dimension is less than or equal to D+1, using algebraic solutions to realize any dichotomy and shatter D+1 points.']}, {'end': 1873.969, 'start': 1503.786, 'title': 'Vc dimension proof', 'summary': 'Explores the concept of vc dimension and proves that the vc dimension is at most d plus 1 by demonstrating the inability to implement a specific dichotomy using a perceptron.', 'duration': 370.183, 'highlights': ['The VC dimension is at most d plus 1 The chapter proves that the VC dimension is at most d plus 1 by demonstrating the inability to implement a specific dichotomy using a perceptron.', 'Establishing the premise of VC dimension The speaker discusses the different statements required to establish the premise that the VC dimension is at most d plus 1, encouraging audience participation and engagement.', 'Demonstration of linear dependence The speaker explains the concept of linear dependence using the example of having more vectors than dimensions and the resulting linear combination, providing a clear understanding of the property used to establish the proof.']}, {'end': 2241.171, 'start': 1874.23, 'title': 'Interpreting vc dimension', 'summary': 'Explains the interpretation of vc dimension in relation to the number of parameters and its practical application, highlighting the relationship between vc dimension and degrees of freedom in a model and its relevance in practice.', 'duration': 366.941, 'highlights': ['The VC dimension is d plus 1, which is the number of parameters in the perceptron model, providing degrees of freedom to create different hypotheses.', 'The VC dimension signifies the maximum number of points that can be shattered, and with more parameters, there is a higher VC dimension, highlighting the relationship between VC dimension and degrees of freedom in a model.', 'The practical application of the VC dimension lies in understanding its relevance for learning and its utility in real-world scenarios, indicating its significance beyond theoretical understanding.']}], 'duration': 1242.143, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI999028.jpg', 'highlights': ['The VC dimension of perceptrons 
is exactly d+1, demonstrating the ability to shatter d+1 points.', 'The VC dimension is established to be less than or equal to D+1, using algebraic solutions to realize any dichotomy and shatter D+1 points.', 'The VC dimension is at most d plus 1, demonstrated by the inability to implement a specific dichotomy using a perceptron.', 'The VC dimension is d plus 1, which is the number of parameters in the perceptron model, providing degrees of freedom to create different hypotheses.']}, {'end': 2827.296, 'segs': [{'end': 2294.981, 'src': 'heatmap', 'start': 2242.466, 'weight': 0.701, 'content': [{'end': 2249.771, 'text': "If you don't know how to use it, what? So now the problem that we are facing is actually here.", 'start': 2242.466, 'duration': 7.305}, {'end': 2257.135, 'text': 'Because now, the specification that is needed in order for you to get you to pick the right hypothesis is pretty elaborate.', 'start': 2249.931, 'duration': 7.204}, {'end': 2258.496, 'text': 'You need a lot of examples here.', 'start': 2257.216, 'duration': 1.28}, {'end': 2260.718, 'text': 'So that is the relation we are going to see.', 'start': 2259.177, 'duration': 1.541}, {'end': 2266.963, 'text': 'The parameters happen to be analog degrees of freedom.', 'start': 2263.06, 'duration': 3.903}, {'end': 2273.828, 'text': 'When I talk about w0, w0 can assume a continuous value from R, and it matters.', 'start': 2267.383, 'duration': 6.445}, {'end': 2278.332, 'text': 'If you pick a different threshold, you will get a different perceptron.', 'start': 2274.229, 'duration': 4.103}, {'end': 2281.134, 'text': 'It will return different values for parts of the space.', 'start': 2278.772, 'duration': 2.362}, {'end': 2284.697, 'text': 'So this is genuinely degrees of freedom that happen to be analog.', 'start': 2281.595, 'duration': 3.102}, {'end': 2294.981, 'text': 'The great thing about the VC dimension is that it translated those into binary, if you will, degrees of freedom.', 'start': 2285.538, 'duration': 9.443}], 'summary': 'Challenges in hypothesis selection due to elaborate specification and need for many examples in relation to analog degrees of freedom.', 'duration': 52.515, 'max_score': 2242.466, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2242466.jpg'}, {'end': 2382.795, 'src': 'heatmap', 'start': 2338.459, 'weight': 0.832, 'content': [{'end': 2343.381, 'text': 'So the VC dimension now abstracts whatever the mathematics that goes inside, et cetera.', 'start': 2338.459, 'duration': 4.922}, {'end': 2348.683, 'text': "You go outside, and if I can shatter 20 points, that's good.", 'start': 2344.241, 'duration': 4.442}, {'end': 2355.146, 'text': 'If someone else can shatter 30 points, they have more degrees of freedom to be able to do that, regardless of where this came from.', 'start': 2348.963, 'duration': 6.183}, {'end': 2366.61, 'text': "Now let's look at the usual suspects, and see if the correspondence between degrees of freedom and VC dimension holds.", 'start': 2359.487, 'duration': 7.123}, {'end': 2374.927, 'text': 'Who are the usual suspects? I think this is the last lecture where we are going to see the positive rays and the rest of the gang.', 'start': 2367.011, 'duration': 7.916}, {'end': 2375.968, 'text': "So don't despair.", 'start': 2375.187, 'duration': 0.781}, {'end': 2377.57, 'text': 'Positive rays.', 'start': 2376.969, 'duration': 0.601}, {'end': 2380.312, 'text': 'So what are positive rays? What is the VC dimension? 
That was 1.', 'start': 2377.77, 'duration': 2.542}, {'end': 2382.795, 'text': 'I can shatter at most one point.', 'start': 2380.312, 'duration': 2.483}], 'summary': 'Vc dimension abstracts mathematics, with 20 shattered points, more degrees of freedom.', 'duration': 44.336, 'max_score': 2338.459, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2338459.jpg'}, {'end': 2439.205, 'src': 'embed', 'start': 2409.244, 'weight': 1, 'content': [{'end': 2410.125, 'text': 'Positive intervals.', 'start': 2409.244, 'duration': 0.881}, {'end': 2413.487, 'text': 'The positive intervals, the VC dimension was 2.', 'start': 2410.605, 'duration': 2.882}, {'end': 2414.588, 'text': 'That is the most I can shatter.', 'start': 2413.487, 'duration': 1.101}, {'end': 2419.671, 'text': 'What do they look like? In this case, I have this guy, the small blue guy.', 'start': 2415.268, 'duration': 4.403}, {'end': 2422.073, 'text': 'And there is a beginning and an end.', 'start': 2420.772, 'duration': 1.301}, {'end': 2425.155, 'text': 'And between them, I return plus 1, and here I return minus 1.', 'start': 2422.613, 'duration': 2.542}, {'end': 2429.418, 'text': 'So depending on the choice of the beginning and the end, I will get one hypothesis versus another.', 'start': 2425.155, 'duration': 4.263}, {'end': 2434.242, 'text': 'How many parameters, or how many degrees of freedom? 2.', 'start': 2430.299, 'duration': 3.943}, {'end': 2437.744, 'text': 'And what is the VC dimension? 2.', 'start': 2434.242, 'duration': 3.502}, {'end': 2439.205, 'text': "Then by induction, it's true.", 'start': 2437.744, 'duration': 1.461}], 'summary': 'Positive intervals with vc dimension of 2, 2 parameters, and true by induction.', 'duration': 29.961, 'max_score': 2409.244, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2409244.jpg'}, {'end': 2515.076, 'src': 'heatmap', 'start': 2466.015, 'weight': 0.763, 'content': [{'end': 2471.219, 'text': 'In more complicated models, it may be difficult to argue what is a parameter that is contributing and what is not.', 'start': 2466.015, 'duration': 5.204}, {'end': 2476.524, 'text': 'But at least we are establishing the principle that a parameter may not necessarily contribute a degree of freedom.', 'start': 2471.48, 'duration': 5.044}, {'end': 2482.409, 'text': 'And since the VC dimension is a bottom line, it looks at what you were able to achieve.', 'start': 2477.025, 'duration': 5.384}, {'end': 2488.215, 'text': 'it would be a more reliable way of measuring what is the actual degrees of freedom you have, instead of going through the analysis of your model.', 'start': 2482.409, 'duration': 5.806}, {'end': 2492.317, 'text': "So let's take one-dimensional perceptron.", 'start': 2489.295, 'duration': 3.022}, {'end': 2493.578, 'text': 'Very simple model.', 'start': 2492.817, 'duration': 0.761}, {'end': 2496.899, 'text': "You have only one variable, and you're going to give it a weight.", 'start': 2494.058, 'duration': 2.841}, {'end': 2498.44, 'text': "That's one parameter.", 'start': 2497.439, 'duration': 1.001}, {'end': 2501.141, 'text': "And then you're going to compare it with a threshold, w0.", 'start': 2498.8, 'duration': 2.341}, {'end': 2502.102, 'text': "That's a second parameter.", 'start': 2501.201, 'duration': 0.901}, {'end': 2505.134, 'text': "And then you're going to give me plus or minus 1.", 'start': 2502.142, 'duration': 2.992}, {'end': 2509.715, 'text': 'So this fellow has two 
parameters, indeed two degrees of freedom.', 'start': 2505.134, 'duration': 4.581}, {'end': 2514.256, 'text': "And I get the VC dimension to be 2, because it's d plus 1.", 'start': 2510.435, 'duration': 3.821}, {'end': 2515.076, 'text': 'We proved it generally.', 'start': 2514.256, 'duration': 0.82}], 'summary': 'Establishing the principle that a parameter may not necessarily contribute a degree of freedom, by using vc dimension as a more reliable way of measuring actual degrees of freedom.', 'duration': 49.061, 'max_score': 2466.015, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2466015.jpg'}, {'end': 2502.102, 'src': 'embed', 'start': 2477.025, 'weight': 0, 'content': [{'end': 2482.409, 'text': 'And since the VC dimension is a bottom line, it looks at what you were able to achieve.', 'start': 2477.025, 'duration': 5.384}, {'end': 2488.215, 'text': 'it would be a more reliable way of measuring what is the actual degrees of freedom you have, instead of going through the analysis of your model.', 'start': 2482.409, 'duration': 5.806}, {'end': 2492.317, 'text': "So let's take one-dimensional perceptron.", 'start': 2489.295, 'duration': 3.022}, {'end': 2493.578, 'text': 'Very simple model.', 'start': 2492.817, 'duration': 0.761}, {'end': 2496.899, 'text': "You have only one variable, and you're going to give it a weight.", 'start': 2494.058, 'duration': 2.841}, {'end': 2498.44, 'text': "That's one parameter.", 'start': 2497.439, 'duration': 1.001}, {'end': 2501.141, 'text': "And then you're going to compare it with a threshold, w0.", 'start': 2498.8, 'duration': 2.341}, {'end': 2502.102, 'text': "That's a second parameter.", 'start': 2501.201, 'duration': 0.901}], 'summary': "Vc dimension provides reliable measurement of model's degrees of freedom.", 'duration': 25.077, 'max_score': 2477.025, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2477025.jpg'}, {'end': 2715.171, 'src': 'embed', 'start': 2669.83, 'weight': 3, 'content': [{'end': 2674.352, 'text': 'You look at the effective number of parameters, and effective for us as far as the result.', 'start': 2669.83, 'duration': 4.522}, {'end': 2676.413, 'text': 'And the result is captured by the VC dimension.', 'start': 2674.512, 'duration': 1.901}, {'end': 2679.174, 'text': 'So this is our quantity for measuring the degrees of freedom.', 'start': 2676.633, 'duration': 2.541}, {'end': 2691.639, 'text': "Now let's look at the number of data points needed which a practitioner would be interested in and doesn't care about the rest of the story that I told you.", 'start': 2683.194, 'duration': 8.445}, {'end': 2693.6, 'text': 'So you have a system.', 'start': 2692.039, 'duration': 1.561}, {'end': 2701.525, 'text': "Let's say that you manage a learning system, and you look at the hypothesis set, and you say a VC dimension is 7.", 'start': 2693.66, 'duration': 7.865}, {'end': 2702.746, 'text': 'I want a certain performance.', 'start': 2701.525, 'duration': 1.221}, {'end': 2704.746, 'text': 'Could you please tell me how many examples I need?', 'start': 2702.806, 'duration': 1.94}, {'end': 2711.949, 'text': 'First, we know that the most important theoretical thing is that the fact that there is a VC dimension finite one means that you can learn.', 'start': 2705.707, 'duration': 6.242}, {'end': 2715.171, 'text': 'That is the biggest achievement that we have done in theory.', 'start': 2712.209, 'duration': 2.962}], 'summary': 'Vc dimension 
quantifies degrees of freedom, helps determine data points needed for learning.', 'duration': 45.341, 'max_score': 2669.83, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2669830.jpg'}, {'end': 2834.721, 'src': 'embed', 'start': 2802.738, 'weight': 5, 'content': [{'end': 2807.34, 'text': 'And then after you said that, you ask yourself, how many examples do I need? Fair enough.', 'start': 2802.738, 'duration': 4.602}, {'end': 2816.53, 'text': 'When you say, I want to be 10% away from E out, what you are really saying is that epsilon is 0.1.', 'start': 2808.46, 'duration': 8.07}, {'end': 2824.795, 'text': 'When you want to say that I want to be 95% sure that the statement is correct, you are picking delta to be 5%, or 0.05.', 'start': 2816.53, 'duration': 8.265}, {'end': 2825.435, 'text': "So that's what you do.", 'start': 2824.795, 'duration': 0.64}, {'end': 2827.296, 'text': 'So you want certain epsilon and delta.', 'start': 2825.735, 'duration': 1.561}, {'end': 2834.721, 'text': 'And then you ask yourself, how does N depend on the VC dimension? You are competing with someone else.', 'start': 2828.197, 'duration': 6.524}], 'summary': 'Decide on epsilon and delta values, and consider the impact of vc dimension on n.', 'duration': 31.983, 'max_score': 2802.738, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2802738.jpg'}], 'start': 2242.466, 'title': 'Vc dimension and its impact', 'summary': 'Explores the relationship between degrees of freedom and vc dimension, illustrating how parameters influence binary outcomes. it also examines the impact of vc dimension on the number of data points needed for learning, emphasizing the effect of small quantities epsilon and delta on the vc inequality.', 'chapters': [{'end': 2609.546, 'start': 2242.466, 'title': 'Degrees of freedom and vc dimension', 'summary': 'Discusses the relationship between degrees of freedom and vc dimension, highlighting how parameters and degrees of freedom translate into binary and the illustration of positive rays and positive intervals.', 'duration': 367.08, 'highlights': ['The VC dimension abstracts the expressiveness of a model and translates parameters into binary degrees of freedom, providing a reliable way to measure actual degrees of freedom. VC dimension abstracts model expressiveness, translates parameters into binary degrees of freedom, reliable measurement of degrees of freedom', 'Illustration of positive rays and positive intervals demonstrates the relationship between degrees of freedom and VC dimension, where the choice of parameters corresponds to the VC dimension. Illustration of positive rays and intervals, demonstration of parameter-VC dimension relationship', 'Example of a one-dimensional perceptron model with two parameters but only two degrees of freedom, showcasing the concept of redundant parameters. 
Example of one-dimensional perceptron model, two parameters but only two degrees of freedom, demonstration of redundant parameters']}, {'end': 2827.296, 'start': 2610.386, 'title': 'Vc dimension and number of data points', 'summary': 'Explains the concept of vc dimension as measuring the effective number of parameters and its impact on the number of data points needed for learning, with emphasis on epsilon and delta as small quantities affecting the vc inequality.', 'duration': 216.91, 'highlights': ['The VC dimension measures the effective number of parameters, rather than the raw number of parameters. The VC dimension is a measure of the effective number of parameters, which can be smaller than the raw number of parameters, and it captures the degrees of freedom in learning systems.', 'The significance of VC dimension lies in its finite value, enabling learning. The fact that there is a finite VC dimension means that learning is possible, representing a significant theoretical achievement.', 'Epsilon and delta are crucial small quantities affecting the VC inequality and the number of examples needed for learning. Epsilon and delta are important in the VC inequality, as they represent the desired level of accuracy and confidence, influencing the number of examples required for learning.']}], 'duration': 584.83, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2242466.jpg', 'highlights': ['VC dimension abstracts model expressiveness, translates parameters into binary degrees of freedom, reliable measurement of degrees of freedom', 'Illustration of positive rays and intervals, demonstration of parameter-VC dimension relationship', 'Example of one-dimensional perceptron model, two parameters but only two degrees of freedom, demonstration of redundant parameters', 'The VC dimension is a measure of the effective number of parameters, which can be smaller than the raw number of parameters, and it captures the degrees of freedom in learning systems', 'The fact that there is a finite VC dimension means that learning is possible, representing a significant theoretical achievement', 'Epsilon and delta are important in the VC inequality, as they represent the desired level of accuracy and confidence, influencing the number of examples required for learning']}, {'end': 3165.693, 'segs': [{'end': 2876.392, 'src': 'embed', 'start': 2848.508, 'weight': 0, 'content': [{'end': 2851.77, 'text': 'Because a bigger VC dimension will give them more flexibility.', 'start': 2848.508, 'duration': 3.262}, {'end': 2853.49, 'text': 'They might be able to fit the data better.', 'start': 2851.81, 'duration': 1.68}, {'end': 2854.751, 'text': 'They will get a better E_in.', 'start': 2853.57, 'duration': 1.181}, {'end': 2858.313, 'text': 'So if they can get the same generalization bound, they are better off.', 'start': 2855.231, 'duration': 3.082}, {'end': 2860.013, 'text': "So I'm really interested in this question.", 'start': 2858.333, 'duration': 1.68}, {'end': 2861.174, 'text': 'I just want to know how they relate.', 'start': 2860.033, 'duration': 1.141}, {'end': 2871.028, 'text': 'In order to do this, we are going to look at this function, which is a polynomial, a very simple polynomial, just one term.', 'start': 2863.963, 'duration': 7.065}, {'end': 2872.449, 'text': "It's not polynomial.", 'start': 2871.328, 'duration': 1.121}, {'end': 2873.95, 'text': "It's monomial, I guess.", 'start': 2872.549, 'duration': 1.401}, {'end': 2876.392, 'text': 'n to the d.',
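The perceptron claims above are easy to check by brute force: a one-dimensional perceptron h(x) = sign(w*x + w0) has two parameters, shatters some pair of points, and fails on every triple of distinct points, so its VC dimension is 2 = d + 1. Below is a minimal Python sketch, not from the lecture; the helper names perceptron_1d_dichotomies and is_shattered are illustrative:

```python
def perceptron_1d_dichotomies(points):
    """All dichotomies h(x) = sign(w*x + w0) can generate on the given points:
    every hypothesis on the line is a threshold with one of two orientations."""
    pts = sorted(points)
    # One representative threshold per region: below all points,
    # between consecutive points, and above all points.
    thresholds = [pts[0] - 1.0]
    thresholds += [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    thresholds += [pts[-1] + 1.0]
    dichotomies = set()
    for t in thresholds:
        for orientation in (+1, -1):
            dichotomies.add(tuple(orientation * (1 if x > t else -1)
                                  for x in points))
    return dichotomies

def is_shattered(points):
    """A point set is shattered when all 2^N dichotomies are realized."""
    return len(perceptron_1d_dichotomies(points)) == 2 ** len(points)

print(is_shattered([0.0, 1.0]))       # True: two points can be shattered
print(is_shattered([0.0, 1.0, 2.0]))  # False: (+, -, +) needs two thresholds,
                                      # so 3 is a break point and d_VC = 2
```

The dichotomy count also tracks the growth function: this hypothesis set realizes 2N of the 2^N possible labelings (4 of 4 at N = 2, 6 of 8 at N = 3), exactly the polynomial behavior that a finite VC dimension guarantees.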
'start': 2874.51, 'duration': 1.882}], 'summary': 'Increasing vc dimension allows better data fitting and generalization if same bound is achieved.', 'duration': 27.884, 'max_score': 2848.508, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2848508.jpg'}, {'end': 3026.106, 'src': 'embed', 'start': 2997.398, 'weight': 2, 'content': [{'end': 2999.58, 'text': 'The probability now is what? Less than 20-something.', 'start': 2997.398, 'duration': 2.182}, {'end': 3000.881, 'text': "That's nice.", 'start': 2999.98, 'duration': 0.901}, {'end': 3004.244, 'text': 'But eventually, the exponential wins.', 'start': 3001.362, 'duration': 2.882}, {'end': 3005.646, 'text': "That's the good part.", 'start': 3004.625, 'duration': 1.021}, {'end': 3007.968, 'text': 'It goes down, and then goes up.', 'start': 3005.706, 'duration': 2.262}, {'end': 3010.871, 'text': 'So now you can see that this is the interesting region.', 'start': 3008.368, 'duration': 2.503}, {'end': 3015.897, 'text': "And I'm going to ask myself, in order to get to this region, which is the performance you want?", 'start': 3010.891, 'duration': 5.006}, {'end': 3022.544, 'text': 'how many examples, which is this coordinate, do I need, given the different VC dimensions, which supposedly is 4 here and 5 here?', 'start': 3015.897, 'duration': 6.647}, {'end': 3026.106, 'text': 'Again, this is a caricature, because this is not the function I have.', 'start': 3023.305, 'duration': 2.801}], 'summary': 'Exponential growth eventually wins, with an interesting region to explore, needing different examples for performance.', 'duration': 28.708, 'max_score': 2997.398, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2997398.jpg'}, {'end': 3176.097, 'src': 'embed', 'start': 3144.248, 'weight': 1, 'content': [{'end': 3146.269, 'text': 'And these levels are not going to be very much different.', 'start': 3144.248, 'duration': 2.021}, {'end': 3147.969, 'text': 'On this scale, this is 1.', 'start': 3146.329, 'duration': 1.64}, {'end': 3151.49, 'text': 'This is what? This is 10 to the minus 5.', 'start': 3147.969, 'duration': 3.521}, {'end': 3153.39, 'text': "That's a very, very, very small probability.", 'start': 3151.49, 'duration': 1.9}, {'end': 3157.131, 'text': 'So the play here is very small, as you vary delta significantly.', 'start': 3153.49, 'duration': 3.641}, {'end': 3162.312, 'text': 'And the play with epsilon, which will affect the exponent, is that these guys will be spread.', 'start': 3157.791, 'duration': 4.521}, {'end': 3165.693, 'text': 'Instead of being 20, 40, it will be 2, 000, 4, 000, and so on.', 'start': 3162.332, 'duration': 3.361}, {'end': 3167.213, 'text': 'But this will be the shape.', 'start': 3166.393, 'duration': 0.82}, {'end': 3170.134, 'text': "Now let's add the other curve.", 'start': 3168.593, 'duration': 1.541}, {'end': 3171.915, 'text': 'So this is n to the 5.', 'start': 3170.154, 'duration': 1.761}, {'end': 3176.097, 'text': "How about n to the 10? What does it look like? 
Now it didn't go upstairs.", 'start': 3171.915, 'duration': 4.182}], 'summary': 'Discussion on small probabilities and exponential growth, with examples of 10^-5 probability and n to the power of 10.', 'duration': 31.849, 'max_score': 3144.248, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3144248.jpg'}], 'start': 2828.197, 'title': 'Vc dimension and tradeoff analysis', 'summary': 'Discusses the impact of vc dimension on generalization bound and the tradeoff between vc dimension and number of examples, showing the exponential decrease in probability as sample size increases.', 'chapters': [{'end': 2861.174, 'start': 2828.197, 'title': 'Relationship between n and vc dimension', 'summary': 'Discusses the relationship between the vc dimension and the ability to achieve a certain generalization bound, emphasizing the impact of vc dimension on flexibility and the ability to fit data better.', 'duration': 32.977, 'highlights': ["The VC dimension's impact on the flexibility to fit data better is crucial, as a bigger VC dimension can potentially lead to achieving the same generalization bound as a smaller VC dimension but with greater flexibility.", 'Understanding the relationship between VC dimension and achieving a specific generalization bound is a key question for determining the competitive advantage in solving the same problem with the same data.']}, {'end': 3165.693, 'start': 2863.963, 'title': 'Tradeoff between d and n', 'summary': 'Explores the tradeoff between the vc dimension (d) and the number of examples (n) through the analysis of a polynomial function, demonstrating how the probability exponentially decreases as n increases, impacting the performance and sample size required.', 'duration': 301.73, 'highlights': ['The probability exponentially decreases as n increases, impacting the performance and sample size required. The exponential nature of the probability curve as n increases significantly impacts the sample size required for a small probability, with the curve peaking at different points for different values of n and eventually being dominated by the negative exponential.', 'The interesting region for determining the performance lies beyond the initial part of the curve, with small probability values being significant. The chapter emphasizes the significance of the region beyond the initial part of the probability curve, where small probability values indicate the level of performance and impact the determination of the number of examples needed.', 'The analysis demonstrates the impact of epsilon on the exponent and the spread of probability values. 
The chapter discusses how the variation in epsilon affects the exponent and leads to a significant spread in probability values, influencing the determination of sample sizes required for different levels of performance.']}], 'duration': 337.496, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI2828197.jpg', 'highlights': ['Understanding the relationship between VC dimension and achieving a specific generalization bound is crucial for determining competitive advantage.', 'The impact of epsilon on the exponent and the spread of probability values influences the determination of sample sizes required for different levels of performance.', 'The probability exponentially decreases as n increases, significantly impacting the sample size required for a small probability.']}, {'end': 3475.29, 'segs': [{'end': 3211.475, 'src': 'embed', 'start': 3185.581, 'weight': 2, 'content': [{'end': 3189.843, 'text': "Varying the VC dimension, I'm getting a different curve, and I'm getting the behavior in the interesting region.", 'start': 3185.581, 'duration': 4.262}, {'end': 3190.983, 'text': 'You see the point here.', 'start': 3189.863, 'duration': 1.12}, {'end': 3195.404, 'text': 'And then I keep adding, and I will add up to 30.', 'start': 3191.264, 'duration': 4.14}, {'end': 3201.588, 'text': 'So these are the curves that I get by varying the VC dimension, the alleged VC dimension here.', 'start': 3195.404, 'duration': 6.184}, {'end': 3205.028, 'text': '5, 10, 15, 20, 25, 30.', 'start': 3201.608, 'duration': 3.42}, {'end': 3206.952, 'text': 'Something very nice is observable here.', 'start': 3205.03, 'duration': 1.922}, {'end': 3211.475, 'text': 'These guys are extremely regular in their progression.', 'start': 3208.813, 'duration': 2.662}], 'summary': 'By varying vc dimension, observed interesting behavior in curves up to 30, showing regular progression.', 'duration': 25.894, 'max_score': 3185.581, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3185581.jpg'}, {'end': 3328.066, 'src': 'embed', 'start': 3300.778, 'weight': 5, 'content': [{'end': 3306.263, 'text': 'but a practical observation that is almost as good as a mathematical statement.', 'start': 3300.778, 'duration': 5.485}, {'end': 3315.731, 'text': 'The practical observation is that the actual quantity we are trying to bound follows the same monotonicity as the bound.', 'start': 3307.924, 'duration': 7.807}, {'end': 3317.901, 'text': 'That is the observation.', 'start': 3316.9, 'duration': 1.001}, {'end': 3326.345, 'text': 'You use bigger VC dimension, the quantities you will get are bigger, and actually very close to proportional.', 'start': 3318.921, 'duration': 7.424}, {'end': 3328.066, 'text': 'This is an observation.', 'start': 3327.005, 'duration': 1.061}], 'summary': 'Practical observation: larger vc dimension leads to proportionally bigger quantities.', 'duration': 27.288, 'max_score': 3300.778, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3300778.jpg'}, {'end': 3390.022, 'src': 'embed', 'start': 3366.254, 'weight': 0, 'content': [{'end': 3375.265, 'text': 'The first lesson is that there is a proportionality between the VC dimension and the number of examples needed in order to achieve a certain level of performance.', 'start': 3366.254, 'duration': 9.011}, {'end': 3382.815, 'text': 'That is a theoretical observation for the bound, and a practical observation for the
actual quantity you get.', 'start': 3376.287, 'duration': 6.528}, {'end': 3383.776, 'text': "That's number one.", 'start': 3383.095, 'duration': 0.681}, {'end': 3390.022, 'text': 'Number two is that, give us just a guide, proportional, not proportional.', 'start': 3385.12, 'duration': 4.902}], 'summary': 'Vc dimension relates to number of examples and performance level. proportional relationship between vc dimension and examples, theoretical and practical observations.', 'duration': 23.768, 'max_score': 3366.254, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3366254.jpg'}, {'end': 3463.646, 'src': 'embed', 'start': 3429.05, 'weight': 4, 'content': [{'end': 3431.572, 'text': 'the following rule of thumb holds', 'start': 3429.05, 'duration': 2.522}, {'end': 3438.115, 'text': 'You have a VC dimension, and you are asking for a number of examples in order to get reasonable generalization.', 'start': 3432.952, 'duration': 5.163}, {'end': 3444.619, 'text': 'The rule is, you need 10 times the VC dimension.', 'start': 3439.256, 'duration': 5.363}, {'end': 3446, 'text': 'No proof.', 'start': 3445.539, 'duration': 0.461}, {'end': 3447.4, 'text': "It's not a mathematical statement.", 'start': 3446.24, 'duration': 1.16}, {'end': 3451.323, 'text': "And I'm saying greater than or equal, obviously, because if you get more, you will get better performance.", 'start': 3448.041, 'duration': 3.282}, {'end': 3455.698, 'text': 'But that will get you in the middle of the interesting region.', 'start': 3452.435, 'duration': 3.263}, {'end': 3463.646, 'text': 'That will get you in the region where the probability statement is meaningful, rather than completely unknown what the generalization will be like.', 'start': 3455.998, 'duration': 7.648}], 'summary': 'For reasonable generalization, aim for 10 times the vc dimension.', 'duration': 34.596, 'max_score': 3429.05, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3429050.jpg'}], 'start': 3166.393, 'title': 'Vc dimension and its impact', 'summary': 'Explores the impact of varying the vc dimension on curve behavior, highlights the linear progression between vc dimension and number of examples, and discusses a practical proportional relationship with a suggested rule of thumb of 10 times the vc dimension for generalization.', 'chapters': [{'end': 3300.778, 'start': 3166.393, 'title': 'Vc dimension and progression', 'summary': 'Explores the impact of varying the vc dimension on the behavior of curves and the relationship between vc dimension and the number of examples, demonstrating a linear progression and highlighting the limitations of theoretical bounds.', 'duration': 134.385, 'highlights': ['The number of examples needed to achieve a certain level is proportional to the VC dimension, with a linear progression observed from 5 to 30, showcasing the impact of varying VC dimension.', 'Theoretical bounds in terms of the number of examples needed exhibit limitations, as different VC dimensions may result in varying performance, highlighting the discrepancy between theoretical bounds and actual quantity.', 'The behavior of curves is influenced by varying the VC dimension, leading to a linear progression and demonstrating the impact of VC dimension on the number of examples required.']}, {'end': 3475.29, 'start': 3300.778, 'title': 'Proportionality between vc dimension and examples', 'summary': 'Discusses the practical observation of a proportional 
relationship between vc dimension and number of examples needed for a certain level of performance, suggesting a rule of thumb of 10 times the vc dimension for reasonable generalization.', 'duration': 174.512, 'highlights': ['There is a proportionality between the VC dimension and the number of examples needed in order to achieve a certain level of performance.', 'A rule of thumb suggests needing 10 times the VC dimension for reasonable generalization.', 'The practical observation is that the actual quantity follows the same monotonicity as the bound, with bigger VC dimension leading to the need for more examples.']}], 'duration': 308.897, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3166393.jpg', 'highlights': ['The number of examples needed is proportional to the VC dimension, showcasing the impact of varying VC dimension.', 'Theoretical bounds in terms of the number of examples exhibit limitations due to varying performance with different VC dimensions.', 'The behavior of curves is influenced by varying the VC dimension, demonstrating the impact on the number of examples required.', 'There is a proportionality between the VC dimension and the number of examples needed for a certain level of performance.', 'A rule of thumb suggests needing 10 times the VC dimension for reasonable generalization.', 'The practical observation is that the actual quantity follows the same monotonicity as the bound, with bigger VC dimension leading to the need for more examples.']}, {'end': 3920.708, 'segs': [{'end': 3545.74, 'src': 'embed', 'start': 3514.735, 'weight': 0, 'content': [{'end': 3521.678, 'text': 'And the logic we will use is that you specify epsilon, and then I will compute what delta is, depending on the number of examples.', 'start': 3514.735, 'duration': 6.943}, {'end': 3525.039, 'text': "So you tell me what your tolerance is, and I'll tell you what the probability is.", 'start': 3522.298, 'duration': 2.741}, {'end': 3529.461, 'text': 'Another way of looking at it is the other way around.', 'start': 3526.3, 'duration': 3.161}, {'end': 3532.529, 'text': 'You tell me what delta is.', 'start': 3530.908, 'duration': 1.621}, {'end': 3535.332, 'text': 'You would like to make a statement with reliability 95%.', 'start': 3532.569, 'duration': 2.763}, {'end': 3545.74, 'text': 'Can you tell me what tolerance can you guarantee under the 95%? 
You start with the delta, and you go to the epsilon.', 'start': 3535.332, 'duration': 10.408}], 'summary': 'Specify epsilon to compute delta based on number of examples, ensuring 95% reliability.', 'duration': 31.005, 'max_score': 3514.735, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3514735.jpg'}, {'end': 3709.061, 'src': 'heatmap', 'start': 3572.146, 'weight': 0.778, 'content': [{'end': 3573.307, 'text': 'And finally, I take a square root.', 'start': 3572.146, 'duration': 1.161}, {'end': 3577.268, 'text': "So that's what you get.", 'start': 3575.086, 'duration': 2.182}, {'end': 3579.269, 'text': 'Very straightforward.', 'start': 3578.629, 'duration': 0.64}, {'end': 3583.313, 'text': 'So now I can start with this fellow, and get epsilon.', 'start': 3580.19, 'duration': 3.123}, {'end': 3588.237, 'text': "Now I'm going to call this formula capital Omega, which is a notation that will survive with us.", 'start': 3584.073, 'duration': 4.164}, {'end': 3591.159, 'text': "It's a formula that depends on several things.", 'start': 3589.077, 'duration': 2.082}, {'end': 3596.884, 'text': 'And as you can see, if the growth function is bigger, the VC dimension is bigger, Omega is worse.', 'start': 3591.519, 'duration': 5.365}, {'end': 3604.31, 'text': "That's understood, because the bigger VC dimension, the worse the guarantee on generalization, which is this approximation thing.", 'start': 3597.264, 'duration': 7.046}, {'end': 3610.014, 'text': 'And if I have more examples, I am in good shape, because now the growth function is polynomial.', 'start': 3605.09, 'duration': 4.924}, {'end': 3615.177, 'text': 'By the time you take the natural logarithm, this guy becomes logarithmic.', 'start': 3610.054, 'duration': 5.123}, {'end': 3620.441, 'text': 'And logarithmic gets killed by linear, as much as linear gets killed by an exponential.', 'start': 3615.638, 'duration': 4.803}, {'end': 3625.004, 'text': 'So this is just one step down in the exponential scale of the previous statement.', 'start': 3620.621, 'duration': 4.383}, {'end': 3628.246, 'text': 'So indeed, if I have more examples, I will get a smaller value of omega.', 'start': 3625.364, 'duration': 2.882}, {'end': 3636.352, 'text': 'And obviously, if you are more finicky, if you want the guarantee to be 99% instead of 95%, so now delta is 0.01 instead of 0.05.', 'start': 3628.607, 'duration': 7.745}, {'end': 3644.037, 'text': "Then epsilon will be looser, because you're making a statement that is true for more of the time, so you will have to accommodate bigger epsilon.", 'start': 3636.352, 'duration': 7.685}, {'end': 3646.699, 'text': "So that's what we have.", 'start': 3644.537, 'duration': 2.162}, {'end': 3648.74, 'text': 'The statement now is a positive one.', 'start': 3647.179, 'duration': 1.561}, {'end': 3651.422, 'text': 'It used to be that we are characterizing bad events.', 'start': 3648.78, 'duration': 2.642}, {'end': 3653.644, 'text': 'Now we are going to state the good event.', 'start': 3651.942, 'duration': 1.702}, {'end': 3656.405, 'text': 'The good event happens most of the time.', 'start': 3654.344, 'duration': 2.061}, {'end': 3658.987, 'text': 'So it happens with probability greater than or equal to 1 minus delta.', 'start': 3656.505, 'duration': 2.482}, {'end': 3663.049, 'text': 'And that statement is that E in tracks E out.', 'start': 3659.707, 'duration': 3.342}, {'end': 3664.89, 'text': 'They are within this omega.', 'start': 3663.649, 'duration': 1.241}, {'end': 
3670.073, 'text': 'And omega happens to be a function of the number of examples.', 'start': 3665.41, 'duration': 4.663}, {'end': 3671.114, 'text': 'It goes down with it.', 'start': 3670.373, 'duration': 0.741}, {'end': 3674.756, 'text': 'Of the hypothesis set, it goes up with its VC dimension.', 'start': 3671.794, 'duration': 2.962}, {'end': 3679.038, 'text': 'And with the delta, the probability you chose to make the statement about.', 'start': 3675.416, 'duration': 3.622}, {'end': 3681.52, 'text': 'And this guy will go up with smaller delta.', 'start': 3679.559, 'duration': 1.961}, {'end': 3685.803, 'text': "So I'm just keeping it in this form, because we don't worry about this anymore.", 'start': 3682.701, 'duration': 3.102}, {'end': 3687.805, 'text': 'We just want to understand this value.', 'start': 3686.444, 'duration': 1.361}, {'end': 3688.685, 'text': "So let's look at it.", 'start': 3687.945, 'duration': 0.74}, {'end': 3691.788, 'text': 'And that will be called the generalization bound.', 'start': 3690.006, 'duration': 1.782}, {'end': 3696.611, 'text': 'So with probability 1 minus delta, we have this value.', 'start': 3691.868, 'duration': 4.743}, {'end': 3698.573, 'text': "So now I'm going to simplify this.", 'start': 3697.112, 'duration': 1.461}, {'end': 3701.215, 'text': "So here's the first simplification.", 'start': 3699.874, 'duration': 1.341}, {'end': 3709.061, 'text': "Instead of the absolute value of E out minus E in, I'm going to have just E out minus E in.", 'start': 3702.216, 'duration': 6.845}], 'summary': 'Explanation of the relationship between growth function, vc dimension, and the probability of generalization in machine learning.', 'duration': 136.915, 'max_score': 3572.146, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3572146.jpg'}, {'end': 3610.014, 'src': 'embed', 'start': 3584.073, 'weight': 1, 'content': [{'end': 3588.237, 'text': "Now I'm going to call this formula capital Omega, which is a notation that will survive with us.", 'start': 3584.073, 'duration': 4.164}, {'end': 3591.159, 'text': "It's a formula that depends on several things.", 'start': 3589.077, 'duration': 2.082}, {'end': 3596.884, 'text': 'And as you can see, if the growth function is bigger, the VC dimension is bigger, Omega is worse.', 'start': 3591.519, 'duration': 5.365}, {'end': 3604.31, 'text': "That's understood, because the bigger VC dimension, the worse the guarantee on generalization, which is this approximation thing.", 'start': 3597.264, 'duration': 7.046}, {'end': 3610.014, 'text': 'And if I have more examples, I am in good shape, because now the growth function is polynomial.', 'start': 3605.09, 'duration': 4.924}], 'summary': 'Formula capital omega depends on growth function and vc dimension, with larger vc dimension leading to worse generalization guarantee.', 'duration': 25.941, 'max_score': 3584.073, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3584073.jpg'}, {'end': 3701.215, 'src': 'embed', 'start': 3670.373, 'weight': 3, 'content': [{'end': 3671.114, 'text': 'It goes down with it.', 'start': 3670.373, 'duration': 0.741}, {'end': 3674.756, 'text': 'Of the hypothesis set, it goes up with its VC dimension.', 'start': 3671.794, 'duration': 2.962}, {'end': 3679.038, 'text': 'And with the delta, the probability you chose to make the statement about.', 'start': 3675.416, 'duration': 3.622}, {'end': 3681.52, 'text': 'And this guy will go up with smaller delta.', 
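Written out, the quantity being described here is epsilon = Omega(N, H, delta) = sqrt((8/N) * ln(4 * m_H(2N) / delta)). A rough Python sketch of how one might evaluate it and invert it for N, assuming the polynomial bound m_H(N) <= N^d_VC + 1 stands in for the growth function; the helper names omega and examples_needed are illustrative, not from the lecture:

```python
import math

def omega(n, d_vc, delta):
    """Generalization-bound term: sqrt((8/N) * ln(4 * m_H(2N) / delta)),
    with the growth function m_H replaced by its polynomial bound."""
    m_h_2n = (2 * n) ** d_vc + 1  # m_H(2N) <= (2N)^d_VC + 1
    return math.sqrt(8.0 / n * math.log(4.0 * m_h_2n / delta))

def examples_needed(d_vc, epsilon, delta):
    """Smallest N with omega(N) <= epsilon: double to bracket, then bisect
    (omega decreases in N over the bracketed range)."""
    hi = 1
    while omega(hi, d_vc, delta) > epsilon:
        hi *= 2
    lo = max(hi // 2, 1)
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if omega(mid, d_vc, delta) <= epsilon:
            hi = mid
        else:
            lo = mid
    return hi

# 10% tolerance (epsilon = 0.1), 95% confidence (delta = 0.05):
print(examples_needed(d_vc=3, epsilon=0.1, delta=0.05))  # on the order of 30,000
```

For d_VC = 3 this asks for roughly 30,000 examples, against about 30 from the practical rule of thumb N >= 10 * d_VC, which is precisely the sense in which the bound is ridiculously high as an absolute number yet still useful as a relative indication.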
'start': 3679.559, 'duration': 1.961}, {'end': 3685.803, 'text': "So I'm just keeping it in this form, because we don't worry about this anymore.", 'start': 3682.701, 'duration': 3.102}, {'end': 3687.805, 'text': 'We just want to understand this value.', 'start': 3686.444, 'duration': 1.361}, {'end': 3688.685, 'text': "So let's look at it.", 'start': 3687.945, 'duration': 0.74}, {'end': 3691.788, 'text': 'And that will be called the generalization bound.', 'start': 3690.006, 'duration': 1.782}, {'end': 3696.611, 'text': 'So with probability 1 minus delta, we have this value.', 'start': 3691.868, 'duration': 4.743}, {'end': 3698.573, 'text': "So now I'm going to simplify this.", 'start': 3697.112, 'duration': 1.461}, {'end': 3701.215, 'text': "So here's the first simplification.", 'start': 3699.874, 'duration': 1.341}], 'summary': 'The transcript discusses the relationship between vc dimension, delta, and generalization bound.', 'duration': 30.842, 'max_score': 3670.373, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3670373.jpg'}, {'end': 3822.157, 'src': 'embed', 'start': 3794.657, 'weight': 2, 'content': [{'end': 3798.519, 'text': "Because it's the difference between what you did out of sample versus in sample.", 'start': 3794.657, 'duration': 3.862}, {'end': 3801.161, 'text': 'So this is a bound on the generalization error.', 'start': 3798.86, 'duration': 2.301}, {'end': 3805.965, 'text': 'And when you rearrange it, you can say with probability greater than or equal to 1 minus delta, and so on.', 'start': 3802.042, 'duration': 3.923}, {'end': 3808.206, 'text': 'And this is the form that will survive with us.', 'start': 3805.985, 'duration': 2.221}, {'end': 3810.208, 'text': 'You just take E in and put it on the other side.', 'start': 3808.567, 'duration': 1.641}, {'end': 3814.911, 'text': 'Now, this is a generalization bound.', 'start': 3812.229, 'duration': 2.682}, {'end': 3817.413, 'text': 'And it is very interesting to look at it.', 'start': 3815.832, 'duration': 1.581}, {'end': 3822.157, 'text': 'So it bounds E out on the left-hand side with E in plus omega.', 'start': 3817.814, 'duration': 4.343}], 'summary': 'Generalization bound compares e out with e in plus omega.', 'duration': 27.5, 'max_score': 3794.657, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3794657.jpg'}, {'end': 3866.885, 'src': 'embed', 'start': 3839.71, 'weight': 5, 'content': [{'end': 3844.152, 'text': 'Furthermore, it shows that remember when we talked about the trade-off?', 'start': 3839.71, 'duration': 4.442}, {'end': 3847.753, 'text': 'Remember when someone said bigger hypothesis is good or bad?', 'start': 3844.232, 'duration': 3.521}, {'end': 3851.154, 'text': "OK, it's good for E_in, but bad for generalization.", 'start': 3847.953, 'duration': 3.201}, {'end': 3852.635, 'text': 'Now you can see why.', 'start': 3851.834, 'duration': 0.801}, {'end': 3855.956, 'text': 'So this guy goes down with a bigger hypothesis set.', 'start': 3852.735, 'duration': 3.221}, {'end': 3858.938, 'text': 'This guy goes up with a bigger hypothesis set.', 'start': 3856.576, 'duration': 2.362}, {'end': 3860.459, 'text': 'Poorer generalization.', 'start': 3859.278, 'duration': 1.181}, {'end': 3866.885, 'text': "Therefore, it's not clear that it's a good idea to pick a bigger hypothesis set or a smaller hypothesis set.", 'start': 3860.96, 'duration': 5.925}], 'summary': 'Bigger hypothesis set leads to poorer generalization, 
indicating trade-off between set size and generalization quality.', 'duration': 27.175, 'max_score': 3839.71, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3839710.jpg'}, {'end': 3904.417, 'src': 'heatmap', 'start': 3834.758, 'weight': 6, 'content': [{'end': 3838.682, 'text': 'So it tells us something about E out, in terms of quantities that we control.', 'start': 3834.758, 'duration': 3.924}, {'end': 3844.152, 'text': 'Furthermore, it shows that remember when we talked about the trade-off?', 'start': 3839.71, 'duration': 4.442}, {'end': 3847.753, 'text': 'Remember when someone said bigger hypothesis is good or bad?', 'start': 3844.232, 'duration': 3.521}, {'end': 3851.154, 'text': "OK, it's good for E_in, but bad for generalization.", 'start': 3847.953, 'duration': 3.201}, {'end': 3852.635, 'text': 'Now you can see why.', 'start': 3851.834, 'duration': 0.801}, {'end': 3855.956, 'text': 'So this guy goes down with a bigger hypothesis set.', 'start': 3852.735, 'duration': 3.221}, {'end': 3858.938, 'text': 'This guy goes up with a bigger hypothesis set.', 'start': 3856.576, 'duration': 2.362}, {'end': 3860.459, 'text': 'Poorer generalization.', 'start': 3859.278, 'duration': 1.181}, {'end': 3866.885, 'text': "Therefore, it's not clear that it's a good idea to pick a bigger hypothesis set or a smaller hypothesis set.", 'start': 3860.96, 'duration': 5.925}, {'end': 3873.15, 'text': 'There may be a balance between them that will make the sum the smallest possible, and that affects the quantity I care about.', 'start': 3867.305, 'duration': 5.845}, {'end': 3875.332, 'text': 'So this will translate to that.', 'start': 3874.251, 'duration': 1.081}, {'end': 3878.615, 'text': 'The other thing is that now that I got rid of the absolute values,', 'start': 3875.872, 'duration': 2.743}, {'end': 3882.518, 'text': "we'll be able to take expected values in certain cases and compare it with other stuff.", 'start': 3878.615, 'duration': 3.903}, {'end': 3884.52, 'text': 'So this will be a very friendly quantity to do.', 'start': 3882.558, 'duration': 1.962}, {'end': 3892.106, 'text': "It's so friendly that we are going to derive a technique, one of the most important techniques in machine learning, based on this.", 'start': 3885.18, 'duration': 6.926}, {'end': 3893.567, 'text': "It's called regularization.", 'start': 3892.126, 'duration': 1.441}, {'end': 3898.672, 'text': 'And the idea here is that I use E in as a proxy for E out.', 'start': 3894.608, 'duration': 4.064}, {'end': 3904.417, 'text': "Now, after all of this analysis, I realize that it's not E in only that affects the game.", 'start': 3899.773, 'duration': 4.644}], 'summary': 'Bigger hypothesis set leads to poorer generalization, influencing the choice of hypothesis set size and introducing the concept of regularization in machine learning.', 'duration': 69.659, 'max_score': 3834.758, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3834758.jpg'}], 'start': 3475.81, 'title': 'Generalization bounds', 'summary': 'Discusses simplifying vc dimension, introducing a new formula for epsilon and delta, and explaining the impact of growth function and examples. 
it also delves into the concept of generalization bound and its implications on trade-off between hypothesis set size and generalization performance.', 'chapters': [{'end': 3625.004, 'start': 3475.81, 'title': 'Vc dimension simplification and generalization bounds', 'summary': 'Simplifies the vc dimension concept, introduces a new formula for determining epsilon and delta, and explains the impact of growth function and number of examples on generalization bounds.', 'duration': 149.194, 'highlights': ['The formula for determining epsilon and delta is simplified, allowing for the computation of probability based on tolerance and vice versa.', 'The new formula, denoted as Omega, depends on the growth function and VC dimension, where a larger VC dimension leads to a worse generalization guarantee.', 'The impact of the number of examples on generalization bounds is explained, where having more examples results in a better guarantee due to the logarithmic nature of the formula.']}, {'end': 3920.708, 'start': 3625.364, 'title': 'Generalization bound and its implications', 'summary': 'Discusses the concept of a generalization bound, which bounds the difference between out-of-sample and in-sample performance, and its implications on the trade-off between hypothesis set size and generalization performance.', 'duration': 295.344, 'highlights': ['The generalization bound, which is a function of the number of examples, decreases with the increase in the hypothesis set and increases with its VC dimension and the chosen probability delta, illustrating the trade-off between hypothesis set size and generalization performance.', 'The generalization error, represented by the difference between out-of-sample and in-sample performance, is bounded by the generalization bound, providing insights into the relationship between controlled quantities and out-of-sample performance.', 'The implications of the trade-off between hypothesis set size and generalization performance are highlighted, indicating that a balance between them may lead to the smallest possible generalization error, influencing the overall performance.', 'The concept of regularization, based on using a modified proxy for in-sample performance, is introduced as a technique derived from the analysis of the generalization bound, emphasizing its significance in the field of machine learning.']}], 'duration': 444.898, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3475810.jpg', 'highlights': ['The formula for determining epsilon and delta is simplified, allowing for the computation of probability based on tolerance and vice versa.', 'The new formula, denoted as Omega, depends on the growth function and VC dimension, where a larger VC dimension leads to a worse generalization guarantee.', 'The impact of the number of examples on generalization bounds is explained, where having more examples results in a better guarantee due to the logarithmic nature of the formula.', 'The generalization bound, which is a function of the number of examples, decreases with the increase in the hypothesis set and increases with its VC dimension and the chosen probability delta, illustrating the trade-off between hypothesis set size and generalization performance.', 'The generalization error, represented by the difference between out-of-sample and in-sample performance, is bounded by the generalization bound, providing insights into the relationship between controlled quantities and out-of-sample performance.', 'The 
implications of the trade-off between hypothesis set size and generalization performance are highlighted, indicating that a balance between them may lead to the smallest possible generalization error, influencing the overall performance.', 'The concept of regularization, based on using a modified proxy for in-sample performance, is introduced as a technique derived from the analysis of the generalization bound, emphasizing its significance in the field of machine learning.']}, {'end': 4396.851, 'segs': [{'end': 3990.554, 'src': 'embed', 'start': 3961.663, 'weight': 0, 'content': [{'end': 3966.647, 'text': 'That means really that if I have a breakpoint, then any bigger point is also a breakpoint.', 'start': 3961.663, 'duration': 4.984}, {'end': 3970.27, 'text': 'And most of the discussion deals with the smallest breakpoint.', 'start': 3967.127, 'duration': 3.143}, {'end': 3978.014, 'text': 'So the notion of a breakpoint covers a lot of values.', 'start': 3973.266, 'duration': 4.748}, {'end': 3983.462, 'text': 'The VC dimension is a unique one, which happens to be the biggest value just short of the first breakpoint.', 'start': 3978.374, 'duration': 5.088}, {'end': 3990.554, 'text': 'Does this cover it? Yeah.', 'start': 3986.812, 'duration': 3.742}], 'summary': 'Discussion on breakpoints, vc dimension, and covering a wide range of values.', 'duration': 28.891, 'max_score': 3961.663, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3961663.jpg'}, {'end': 4076.233, 'src': 'embed', 'start': 4031.455, 'weight': 1, 'content': [{'end': 4036.498, 'text': 'Breakpoint is a failure to shatter, and VC dimension is an ability to shatter.', 'start': 4031.455, 'duration': 5.043}, {'end': 4041.784, 'text': 'And obviously, if you take the maximum ability to shatter, which will give you the value of the VC dimension,', 'start': 4037.398, 'duration': 4.386}, {'end': 4045.308, 'text': 'that will be one short of the next guy which you failed to shatter.', 'start': 4041.784, 'duration': 3.524}, {'end': 4048.252, 'text': 'So that is your smallest breakpoint, and the other ones will be other breakpoints.', 'start': 4045.328, 'duration': 2.924}, {'end': 4056.578, 'text': 'Also, can you repeat the practical interpretation of epsilon and delta? 
Epsilon and delta.', 'start': 4050.114, 'duration': 6.464}, {'end': 4061.782, 'text': 'Epsilon and delta as two quantities, they are the performance parameters of learning.', 'start': 4056.798, 'duration': 4.984}, {'end': 4064.724, 'text': 'So there are two things that I want to make sure of.', 'start': 4062.502, 'duration': 2.222}, {'end': 4067.566, 'text': 'I want to make sure that E in tracks E out.', 'start': 4065.264, 'duration': 2.302}, {'end': 4070.568, 'text': 'The level of tracking is epsilon.', 'start': 4068.587, 'duration': 1.981}, {'end': 4072.85, 'text': "That's the approximation parameter.", 'start': 4070.628, 'duration': 2.222}, {'end': 4076.233, 'text': 'Now, I cannot guarantee that statement absolutely.', 'start': 4073.49, 'duration': 2.743}], 'summary': 'Vc dimension measures ability to shatter, epsilon and delta are performance parameters of learning.', 'duration': 44.778, 'max_score': 4031.455, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI4031455.jpg'}, {'end': 4252.408, 'src': 'embed', 'start': 4228.206, 'weight': 2, 'content': [{'end': 4234.37, 'text': "And therefore you can't really either keep track of this redundancy exactly or say it cannot be more than the number of parameters,", 'start': 4228.206, 'duration': 6.164}, {'end': 4236.271, 'text': 'because even taking into consideration,', 'start': 4234.37, 'duration': 1.901}, {'end': 4238.933, 'text': 'So in many of the cases, the VC dimension is estimated as a bound.', 'start': 4236.371, 'duration': 2.562}, {'end': 4241.475, 'text': 'But again, we are already in a bound.', 'start': 4239.753, 'duration': 1.722}, {'end': 4244.639, 'text': "Even if you know it exactly, it's not like we know what the generalization error will be like.", 'start': 4241.535, 'duration': 3.104}, {'end': 4246.441, 'text': 'We know a bound on the generalization error.', 'start': 4244.679, 'duration': 1.762}, {'end': 4252.408, 'text': 'So in this series of logical development, we get a bound on a bound on a bound on a bound.', 'start': 4246.962, 'duration': 5.446}], 'summary': 'Vc dimension estimated as a bound, leading to a series of nested bounds.', 'duration': 24.202, 'max_score': 4228.206, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI4228206.jpg'}, {'end': 4312.146, 'src': 'embed', 'start': 4281.611, 'weight': 4, 'content': [{'end': 4287.935, 'text': 'If you actually go and solve the VC inequality and try to get a bound, the bound will be ridiculously high.', 'start': 4281.611, 'duration': 6.324}, {'end': 4290.558, 'text': 'Much higher than you will actually need in practice.', 'start': 4288.775, 'duration': 1.783}, {'end': 4293.041, 'text': "But you don't use it as an absolute indication.", 'start': 4290.858, 'duration': 2.183}, {'end': 4294.664, 'text': 'You use it only as a relative indication.', 'start': 4293.061, 'duration': 1.603}, {'end': 4302.735, 'text': 'Do we come across any interesting examples where n has to be much bigger than 10 times the VC dimension?', 'start': 4296.907, 'duration': 5.828}, {'end': 4312.146, 'text': 'The interesting example is that when the customer is very finicky and wants a very small epsilon and delta, because the smaller the epsilon and delta,', 'start': 4303.499, 'duration': 8.647}], 'summary': 'Vc inequality bound is high, used as relative indication, n > 10*vc dimension in some cases.', 'duration': 30.535, 'max_score': 4281.611, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI4281611.jpg'}], 'start': 3921.569, 'title': 'Understanding vc dimension, breakpoint, epsilon, and delta in learning', 'summary': 'Explores the relationship between vc dimension and breakpoint, clarifies the interpretation of epsilon and delta as performance parameters, and discusses the practical implications of the vc inequality in determining the number of examples needed for learning.', 'chapters': [{'end': 4048.252, 'start': 3921.569, 'title': 'Understanding vc dimension and breakpoint', 'summary': 'Explains the relationship between vc dimension and breakpoint, with the vc dimension being the biggest value just short of the first breakpoint, and clarifies that the breakpoint is a failure to shatter while vc dimension is an ability to shatter.', 'duration': 126.683, 'highlights': ['The VC dimension is the biggest value just short of the first breakpoint, and it covers a lot of values. (Relevance: 5)', 'The breakpoint is a failure to shatter, while the VC dimension is an ability to shatter. (Relevance: 4)', 'When defining a breakpoint, any bigger point is also a breakpoint, and the notion of a breakpoint covers a lot of values. (Relevance: 3)', 'The distinction between a breakpoint and a VC dimension is that the breakpoint poses it negatively, and the VC dimension poses it positively. (Relevance: 2)', 'The maximum ability to shatter gives the value of the VC dimension, which is one short of the next breakpoint. (Relevance: 1)']}, {'end': 4396.851, 'start': 4050.114, 'title': 'Interpretation of epsilon and delta in learning', 'summary': 'Explains the interpretation of epsilon and delta as performance parameters in learning, the effect of error measure on the number of points to choose, the challenges in determining vc dimension for most learning hypotheses, and the practical implications of the vc inequality in determining the number of examples needed for learning.', 'duration': 346.737, 'highlights': ['The VC inequality and its practical implications in determining the number of examples needed for learning, indicating that the bound can be much higher than what is actually needed in practice.', 'The challenges in determining the VC dimension for most learning hypotheses, with the VC dimension estimated as a bound, leading to a loose bound on the generalization error.', 'The practical interpretation of epsilon and delta as performance parameters in learning, with epsilon as the approximation parameter and delta as the probability measure to ensure a high probability that the tracking statement holds.', 'The effect of error measure on the number of points to choose, with different error measures leading to modifications and variances in the analysis.']}], 'duration': 475.282, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Dc0sr0kdBVI/pics/Dc0sr0kdBVI3921569.jpg', 'highlights': ['The VC dimension is the biggest value just short of the first breakpoint, and it covers a lot of values. (Relevance: 5)', 'The breakpoint is a failure to shatter, while the VC dimension is an ability to shatter. 
(Relevance: 4)', 'The challenges in determining the VC dimension for most learning hypotheses, with the VC dimension estimated as a bound, leading to a loose bound on the generalization error.', 'The practical interpretation of epsilon and delta as performance parameters in learning, with epsilon as the approximation parameter and delta as the probability measure to ensure a high probability that the tracking statement holds.', 'The VC inequality and its practical implications in determining the number of examples needed for learning, indicating that the bound can be much higher than what is actually needed in practice.']}], 'highlights': ['The VC dimension is the biggest value just short of the first breakpoint, and it covers a lot of values.', 'The transition from Hoeffding inequality to VC inequality characterizes generalization with fundamental and technical differences.', 'The growth function characterizes redundancy from overlapping bad regions of hypotheses, leading to the switch to VC inequality.', 'The VC dimension serves as an upper bound for the growth function of a hypothesis set, with the order of the bounding polynomial being equal to the VC dimension.', 'The VC dimension is exactly d+1, demonstrating the ability to shatter d+1 points.', 'The VC dimension is at most d plus 1, demonstrated by the inability to implement a specific dichotomy using a perceptron.', 'VC dimension abstracts model expressiveness, translates parameters into binary degrees of freedom, reliable measurement of degrees of freedom', 'The VC dimension is a measure of the effective number of parameters, which can be smaller than the raw number of parameters, and it captures the degrees of freedom in learning systems', 'The fact that there is a finite VC dimension means that learning is possible, representing a significant theoretical achievement', 'The number of examples needed is proportional to the VC dimension, showcasing the impact of varying VC dimension.', 'Theoretical bounds in terms of the number of examples exhibit limitations due to varying performance with different VC dimensions.', 'The behavior of curves is influenced by varying the VC dimension, demonstrating the impact on the number of examples required.', 'The formula for determining epsilon and delta is simplified, allowing for the computation of probability based on tolerance and vice versa.', 'The new formula, denoted as Omega, depends on the growth function and VC dimension, where a larger VC dimension leads to a worse generalization guarantee.', 'The generalization bound, which is a function of the number of examples, decreases with the increase in the hypothesis set and increases with its VC dimension and the chosen probability delta, illustrating the trade-off between hypothesis set size and generalization performance.', 'The implications of the trade-off between hypothesis set size and generalization performance are highlighted, indicating that a balance between them may lead to the smallest possible generalization error, influencing the overall performance.']}
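One last numerical companion, to the tradeoff chapter above: the lecturer's "caricature" of the VC bound, a polynomial n^d_VC fighting a negative exponential, can be tabulated directly. A minimal Python sketch with illustrative helper names; the form n^d_VC * exp(-epsilon^2 * n) drops the constants of the actual VC inequality, and epsilon = 0.1 with the 10^-5 target level are arbitrary choices echoing the scales mentioned in the lecture:

```python
import math

def caricature(n, d_vc, epsilon):
    """Polynomial-times-exponential shape of the VC bound, constants dropped:
    it rises like n^d_vc, then the negative exponential wins."""
    return float(n) ** d_vc * math.exp(-epsilon ** 2 * n)

def n_to_reach(level, d_vc, epsilon):
    """First n past the curve's peak (at n = d_vc / epsilon^2) where the
    quantity drops below the desired probability level."""
    n = int(d_vc / epsilon ** 2) + 1
    while caricature(n, d_vc, epsilon) > level:
        n += 1
    return n

for d_vc in (5, 10, 15, 20):
    print(d_vc, n_to_reach(1e-5, d_vc, epsilon=0.1))
```

The printed n grows close to linearly in d_VC, which is the regular progression of the curves observed in the lecture and the empirical basis for the proportionality between the VC dimension and the number of examples needed.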