title

Lecture 06 - Theory of Generalization

description

Theory of Generalization - How an infinite model can learn from a finite sample. The most important theoretical result in machine learning. Lecture 6 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, http://creativecommons.org/licenses/by-nc-nd/3.0/
This lecture was recorded on April 19, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.

detail

{'title': 'Lecture 06 - Theory of Generalization', 'heatmap': [{'end': 713.612, 'start': 564.318, 'weight': 0.759}, {'end': 848.45, 'start': 790.715, 'weight': 0.702}, {'end': 1126.901, 'start': 1068.567, 'weight': 0.767}], 'summary': 'The lecture covers topics including dichotomies, generalization theory with polynomial constraints and breakpoints, patterns, recursion, and analysis techniques, analytic solutions for growth functions, neural network polynomial power, handling overlaps in growth function and canvas of data sets, vapnik-chervonenkis proof, vapnik-chervonenkis inequality, and bounding growth function and bias-variance tradeoff.', 'chapters': [{'end': 542.025, 'segs': [{'end': 41.701, 'src': 'embed', 'start': 2.941, 'weight': 1, 'content': [{'end': 5.822, 'text': 'The following program is brought to you by Caltech.', 'start': 2.941, 'duration': 2.881}, {'end': 18.767, 'text': 'Welcome back.', 'start': 18.247, 'duration': 0.52}, {'end': 26.19, 'text': 'Last time, we introduced some important concepts in our theoretical development.', 'start': 21.068, 'duration': 5.122}, {'end': 30.572, 'text': 'And the first concept was dichotomies.', 'start': 27.831, 'duration': 2.741}, {'end': 35.459, 'text': 'And the idea is that there is an input space behind this opaque sheet.', 'start': 31.758, 'duration': 3.701}, {'end': 39.68, 'text': "And there is a hypothesis that's separating red regions from blue regions.", 'start': 36.099, 'duration': 3.581}, {'end': 41.701, 'text': "But we don't get to see that.", 'start': 40.48, 'duration': 1.221}], 'summary': 'Caltech program introduces theoretical concepts, focusing on dichotomies and hypothesis separation of regions.', 'duration': 38.76, 'max_score': 2.941, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2941.jpg'}, {'end': 111.669, 'src': 'embed', 'start': 62.097, 'weight': 0, 'content': [{'end': 67.121, 'text': 'we had a problem with counting the number of
hypotheses because we end up with a very large number.', 'start': 62.097, 'duration': 5.024}, {'end': 74.786, 'text': 'But if you restrict your attention to dichotomies, which are the hypotheses restricted to a finite set of points,', 'start': 67.841, 'duration': 6.945}, {'end': 79.589, 'text': "the blue and red points here then you don't have to count everything that is happening outside.", 'start': 74.786, 'duration': 4.803}, {'end': 83.532, 'text': 'You only count it as different when something different happens only on those points.', 'start': 79.629, 'duration': 3.903}, {'end': 87.115, 'text': 'So a dichotomy is a mini-hypothesis, if you will.', 'start': 84.152, 'duration': 2.963}, {'end': 92.979, 'text': 'And it counts the hypotheses only on the finite set of points.', 'start': 87.855, 'duration': 5.124}, {'end': 101.162, 'text': 'This resulted in a definition that parallels the number of hypotheses, which is the number of dichotomies in this case.', 'start': 94.238, 'duration': 6.924}, {'end': 103.244, 'text': 'So we defined the growth function.', 'start': 101.783, 'duration': 1.461}, {'end': 107.746, 'text': 'The growth function is, you pick the points x1 up to xn.', 'start': 104.044, 'duration': 3.702}, {'end': 111.669, 'text': 'You pick them wisely with a view to maximizing the dichotomies,', 'start': 108.287, 'duration': 3.382}], 'summary': 'Defined growth function to count dichotomies, parallel to number of hypotheses.', 'duration': 49.572, 'max_score': 62.097, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE62097.jpg'}, {'end': 243.492, 'src': 'embed', 'start': 218.163, 'weight': 4, 'content': [{'end': 223.145, 'text': 'We then talked about the maximum number of dichotomies under the constraint that there is a breakpoint.', 'start': 218.163, 'duration': 4.982}, {'end': 231.747, 'text': 'And we had an illustrative example to tell you that when you tell me that you cannot get all patterns on any in 
this case two points,', 'start': 223.305, 'duration': 8.442}, {'end': 236.649, 'text': 'that is a very strong restriction on the number of dichotomies you can get on a larger number of points.', 'start': 231.747, 'duration': 4.902}, {'end': 238.15, 'text': 'So this is the simplest case.', 'start': 236.989, 'duration': 1.161}, {'end': 242.651, 'text': 'If you take any two columns, you can get all four patterns.', 'start': 239.45, 'duration': 3.201}, {'end': 243.492, 'text': "That's by degree.", 'start': 242.691, 'duration': 0.801}], 'summary': 'The constraint of a breakpoint limits the number of dichotomies, illustrated with the example of getting all patterns on two points.', 'duration': 25.329, 'max_score': 218.163, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE218163.jpg'}], 'start': 2.941, 'title': 'Dichotomies and generalization', 'summary': 'Introduces the concept of dichotomies in theoretical development and discusses the theory of generalization, focusing on the growth function bounded by a polynomial and characterizing generalization under the constraint of a breakpoint.', 'chapters': [{'end': 62.097, 'start': 2.941, 'title': 'Introduction to dichotomies in theoretical development', 'summary': 'Introduces the concept of dichotomies in theoretical development, explaining how an input space is divided into red and blue regions, with only the observed data points being visible, and the purpose for the dichotomies.', 'duration': 59.156, 'highlights': ['The chapter introduces the concept of dichotomies, where an input space is divided into red and blue regions, with only observed data points being visible.', 'It explains that the purpose for the dichotomies is to separate red regions from blue regions in the input space.']}, {'end': 542.025, 'start': 62.097, 'title': 'Theory of generalization', 'summary': 'Discusses the concept of dichotomies as mini-hypotheses and introduces the growth function, which 
is bounded by a polynomial, with a focus on the maximum number of dichotomies under the constraint of a breakpoint, aiming to characterize the generalization.', 'duration': 479.928, 'highlights': ["The growth function is defined as the maximum number of dichotomies on n points, such that they have a breakpoint K, and it is bounded by a polynomial. The growth function is a key concept, representing the maximum number of dichotomies with a breakpoint, and it's crucial for generalization.", 'The concept of dichotomies as mini-hypotheses is introduced, which counts the hypotheses only on a finite set of points. The introduction of dichotomies as mini-hypotheses emphasizes counting hypotheses on a finite set of points, simplifying the calculation.', 'The breakpoint is defined as a restriction on the maximum number of dichotomies, with a specific example of a breakpoint for the perceptrons in a two-dimensional space. The concept of a breakpoint as a restriction on the number of dichotomies is illustrated with a specific example in a two-dimensional space.']}], 'duration': 539.084, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2941.jpg', 'highlights': ["The growth function is a key concept, representing the maximum number of dichotomies with a breakpoint, and it's crucial for generalization.", 'The chapter introduces the concept of dichotomies, where an input space is divided into red and blue regions, with only observed data points being visible.', 'The concept of dichotomies as mini-hypotheses is introduced, which counts the hypotheses only on a finite set of points.', 'The purpose for the dichotomies is to separate red regions from blue regions in the input space.', 'The breakpoint is defined as a restriction on the maximum number of dichotomies, with a specific example of a breakpoint for the perceptrons in a two-dimensional space.']}, {'end': 1596.212, 'segs': [{'end': 715.793, 'src': 'heatmap', 'start': 
564.318, 'weight': 0, 'content': [{'end': 572.265, 'text': "Now I'm going to do a structural analysis of this, and this will happen through this division.", 'start': 564.318, 'duration': 7.947}, {'end': 573.586, 'text': 'So look at it.', 'start': 572.925, 'duration': 0.661}, {'end': 575.207, 'text': 'Still the same problem.', 'start': 574.387, 'duration': 0.82}, {'end': 581.793, 'text': "x1 and xn is my vector, and I'm trying to fill this with as many rows as possible under a constraint of a breakpoint.", 'start': 575.307, 'duration': 6.486}, {'end': 584.896, 'text': "But now I'm going to isolate the last point.", 'start': 582.494, 'duration': 2.402}, {'end': 587.998, 'text': 'Why am I isolating the last point? Because I want a recursion.', 'start': 585.496, 'duration': 2.502}, {'end': 590.44, 'text': 'I want to be able to relate this fellow.', 'start': 588.379, 'duration': 2.061}, {'end': 593.923, 'text': 'to the same fellow applied to smaller quantities.', 'start': 591.181, 'duration': 2.742}, {'end': 600.588, 'text': 'And you have seen enough of that to realize that if I manage to do that, I might be able to actually solve for b of n and k.', 'start': 594.584, 'duration': 6.004}, {'end': 601.989, 'text': "That's why I'm isolating the last point.", 'start': 600.588, 'duration': 1.401}, {'end': 612.497, 'text': 'So after I do the isolation, I am going to group the rows of the big matrix into some groups.', 'start': 603.591, 'duration': 8.906}, {'end': 615.8, 'text': 'So this is just my way of looking at things.', 'start': 613.598, 'duration': 2.202}, {'end': 616.821, 'text': "I haven't changed anything.", 'start': 615.96, 'duration': 0.861}, {'end': 622.992, 'text': "What I'm going to do, I'm going to shuffle the rows around after you have constructed them.", 'start': 617.708, 'duration': 5.284}, {'end': 628.216, 'text': "So you have a full matrix now, and I'm shuffling them, and putting some guys in the first group.", 'start': 623.012, 'duration': 5.204}, {'end': 
630.818, 'text': "And the first group I'm going to call S1.", 'start': 629.237, 'duration': 1.581}, {'end': 634.781, 'text': 'Here is the definition of the group S1.', 'start': 633.16, 'duration': 1.621}, {'end': 644.248, 'text': 'These are the rows that appear only once as far as x1 up to xn-1 are concerned.', 'start': 636.122, 'duration': 8.126}, {'end': 651.339, 'text': 'Well, every row in its entirety appears only once, because these are different rows.', 'start': 645.558, 'duration': 5.781}, {'end': 652.84, 'text': "That's how I'm constructing the matrix.", 'start': 651.399, 'duration': 1.441}, {'end': 661.981, 'text': 'But if you take out the last guy, it is conceivable that the first n minus 1 coordinates happen twice once with extension minus 1,', 'start': 653.46, 'duration': 8.521}, {'end': 664.002, 'text': 'once with extension plus 1..', 'start': 661.981, 'duration': 2.021}, {'end': 667.763, 'text': "So I'm taking the guys that go with only one extension, whatever it might be.", 'start': 664.002, 'duration': 3.761}, {'end': 671.583, 'text': 'It could be minus 1 or it could be plus 1, but not both, and putting them in this group.', 'start': 667.783, 'duration': 3.8}, {'end': 674.264, 'text': 'Fairly well defined.', 'start': 673.084, 'duration': 1.18}, {'end': 680.209, 'text': 'So you fill it up, and these are all the rows that have a single extension.', 'start': 676.735, 'duration': 3.474}, {'end': 689.655, 'text': 'Now you go under this, and you define the number of rows in this group to be alpha.', 'start': 682.529, 'duration': 7.126}, {'end': 690.396, 'text': 'It is a number.', 'start': 689.815, 'duration': 0.581}, {'end': 691.697, 'text': "I'm just going to call it alpha.", 'start': 690.456, 'duration': 1.241}, {'end': 700.624, 'text': "And you can see where this is going, because now I'm going to claim that the b of n and k, which is the total number of rows in the entire matrix,", 'start': 692.698, 'duration': 7.926}, {'end': 702.306, 'text': 'is 
alpha plus something.', 'start': 700.624, 'duration': 1.682}, {'end': 703.487, 'text': 'That is obvious.', 'start': 702.946, 'duration': 0.541}, {'end': 706.79, 'text': "I have already taken care of alpha, and I'm going to add up the other stuff later on.", 'start': 703.527, 'duration': 3.263}, {'end': 713.612, 'text': "So what is the other stuff? The other stuff I'm going to call S2.", 'start': 709.372, 'duration': 4.24}, {'end': 715.793, 'text': 'And you probably have a good guess what these are.', 'start': 714.172, 'duration': 1.621}], 'summary': 'Structural analysis of matrix with division and grouping based on extensions and alpha value.', 'duration': 23.095, 'max_score': 564.318, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE564318.jpg'}, {'end': 818.48, 'src': 'embed', 'start': 790.715, 'weight': 1, 'content': [{'end': 798.896, 'text': 'Now, I am going to try to find a handle on alphas and betas, so that I can find a recursion for the big function b of n and k.', 'start': 790.715, 'duration': 8.181}, {'end': 811.859, 'text': 'b of n and k are the maximum number of rows, patterns, I can get on n points, such that no k columns have all possible patterns.', 'start': 798.896, 'duration': 12.963}, {'end': 813.579, 'text': "That's the definition.", 'start': 812.799, 'duration': 0.78}, {'end': 818.48, 'text': "I'm going to relate that to the same quantity on smaller numbers, smaller n and smaller k.", 'start': 814.059, 'duration': 4.421}], 'summary': 'Finding recursion for max number of patterns on n points with no k columns having all possible patterns', 'duration': 27.765, 'max_score': 790.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE790715.jpg'}, {'end': 848.45, 'src': 'heatmap', 'start': 790.715, 'weight': 0.702, 'content': [{'end': 798.896, 'text': 'Now, I am going to try to find a handle on alphas and betas, so that I can find a 
recursion for the big function b of n and k.', 'start': 790.715, 'duration': 8.181}, {'end': 811.859, 'text': 'b of n and k are the maximum number of rows, patterns, I can get on n points, such that no k columns have all possible patterns.', 'start': 798.896, 'duration': 12.963}, {'end': 813.579, 'text': "That's the definition.", 'start': 812.799, 'duration': 0.78}, {'end': 818.48, 'text': "I'm going to relate that to the same quantity on smaller numbers, smaller n and smaller k.", 'start': 814.059, 'duration': 4.421}, {'end': 823.812, 'text': 'The first is to estimate alpha and beta.', 'start': 821.691, 'duration': 2.121}, {'end': 831.553, 'text': "I'd like to ask you to focus on the x1 up to xn minus 1 columns.", 'start': 827.313, 'duration': 4.24}, {'end': 836.875, 'text': "And I'm going to help you visually do that by graying out the rest.", 'start': 832.314, 'duration': 4.561}, {'end': 840.695, 'text': 'Now for a moment, look at this.', 'start': 839.375, 'duration': 1.32}, {'end': 846.829, 'text': 'Are these rows different? They used to be different when you have the extension.', 'start': 842.176, 'duration': 4.653}, {'end': 848.45, 'text': 'Well, let me see.', 'start': 847.79, 'duration': 0.66}], 'summary': 'Finding recursion for b of n and k, estimating alpha and beta, relating to smaller numbers', 'duration': 57.735, 'max_score': 790.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE790715.jpg'}, {'end': 997.965, 'src': 'embed', 'start': 941.628, 'weight': 2, 'content': [{'end': 951.173, 'text': 'So what is the condition that is occurring here? I can say that alpha plus beta, which is the total number of rows or patterns in this mini-matrix.', 'start': 941.628, 'duration': 9.545}, {'end': 958.734, 'text': 'Can I say something about a breakpoint for this small matrix? 
Yeah.', 'start': 952.914, 'duration': 5.82}, {'end': 963.856, 'text': 'The original matrix, I could not find all possible patterns on any k columns.', 'start': 959.614, 'duration': 4.242}, {'end': 971.178, 'text': 'Right? So I cannot possibly find all possible patterns on any k columns on this smaller set.', 'start': 964.316, 'duration': 6.862}, {'end': 978.5, 'text': 'Because if I find all possible patterns on k columns here, they will serve as all possible patterns in the big matrix.', 'start': 971.758, 'duration': 6.742}, {'end': 980.201, 'text': "And I know that doesn't exist.", 'start': 979.101, 'duration': 1.1}, {'end': 991.577, 'text': 'So I can now confidently say that alpha plus beta, which is the number of different guys here, is less than or equal to b of n, minus 1,', 'start': 981.626, 'duration': 9.951}, {'end': 997.965, 'text': 'because I have only x1 up to x, minus 1, and k, because that is the breakpoint for these guys as well.', 'start': 991.577, 'duration': 6.388}], 'summary': 'The number of patterns is less than or equal to b of n, minus 1.', 'duration': 56.337, 'max_score': 941.628, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE941628.jpg'}, {'end': 1126.901, 'src': 'heatmap', 'start': 1068.567, 'weight': 0.767, 'content': [{'end': 1080.582, 'text': "Now the interesting thing when I look at these guys is that I'm going to be able to argue that these guys have a breakpoint of k minus 1, not k.", 'start': 1068.567, 'duration': 12.015}, {'end': 1081.623, 'text': 'The argument is very cute.', 'start': 1080.582, 'duration': 1.041}, {'end': 1088.896, 'text': "Let's say that you have all possible patterns on guys in this small matrix.", 'start': 1083.331, 'duration': 5.565}, {'end': 1089.856, 'text': 'First, I have to kill.', 'start': 1088.936, 'duration': 0.92}, {'end': 1094.32, 'text': 'I mean, these are not different guys, because these are identical to these.', 'start': 1089.896, 
'duration': 4.424}, {'end': 1097.883, 'text': 'So let me reduce it to the guys that are patently different.', 'start': 1094.6, 'duration': 3.283}, {'end': 1099.144, 'text': "So I'm now looking at this matrix.", 'start': 1097.963, 'duration': 1.181}, {'end': 1101.746, 'text': "I'm claiming that is a breakpoint here.", 'start': 1099.464, 'duration': 2.282}, {'end': 1102.827, 'text': 'Why is that??', 'start': 1102.346, 'duration': 0.481}, {'end': 1113.934, 'text': 'Because if you had guys here where you get all possible patterns, Then by adding both copies plus 1 and minus 1, and adding xn,', 'start': 1103.267, 'duration': 10.667}, {'end': 1123.619, 'text': 'you will be getting k columns overall that have all possible patterns, which you know you cannot have because k is a breakpoint for the whole thing.', 'start': 1113.934, 'duration': 9.685}, {'end': 1126.901, 'text': "So now I'm taking advantage of the fact that these guys repeat.", 'start': 1124.36, 'duration': 2.541}], 'summary': 'The argument suggests a breakpoint of k minus 1, not k, based on patterns and repetition in the small matrix.', 'duration': 58.334, 'max_score': 1068.567, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE1068567.jpg'}, {'end': 1308.5, 'src': 'embed', 'start': 1279.949, 'weight': 4, 'content': [{'end': 1284.131, 'text': 'So b of n and k now, I know, has to be at most this fellow.', 'start': 1279.949, 'duration': 4.182}, {'end': 1286.893, 'text': 'So you can see where the recursion is going.', 'start': 1285.412, 'duration': 1.481}, {'end': 1292.237, 'text': 'So now I know that this property holds for the b of n and k.', 'start': 1288.496, 'duration': 3.741}, {'end': 1297.778, 'text': 'And now all I need to do is solve it in order to find an actual numerical value for b of n and k.', 'start': 1292.237, 'duration': 5.541}, {'end': 1308.5, 'text': 'And that numerical value will serve as an upper bound for any growth function of a
hypothesis set that has a breakpoint k.', 'start': 1297.778, 'duration': 10.722}], 'summary': 'Recursion property holds for b of n and k, serving as an upper bound for gross function with breakpoint k.', 'duration': 28.551, 'max_score': 1279.949, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE1279949.jpg'}, {'end': 1367.436, 'src': 'embed', 'start': 1336.282, 'weight': 5, 'content': [{'end': 1339.864, 'text': "So this would be, there's a breakpoint of 1, breakpoint 2, breakpoint 3, et cetera.", 'start': 1336.282, 'duration': 3.582}, {'end': 1346.288, 'text': "And what I'd like to do here, I'd like to fill this table with an upper bound on b of n and k.", 'start': 1340.485, 'duration': 5.803}, {'end': 1350.15, 'text': "I'd like to put numbers here that I know that b of n and k can be at most that number.", 'start': 1346.288, 'duration': 3.862}, {'end': 1354.692, 'text': 'And we can construct this matrix very, very easily, having this recursion.', 'start': 1351.35, 'duration': 3.342}, {'end': 1355.773, 'text': "So here's what we do.", 'start': 1355.113, 'duration': 0.66}, {'end': 1360.577, 'text': 'First, I fill the boundary conditions.', 'start': 1358.072, 'duration': 2.505}, {'end': 1362.441, 'text': "So let's look at this.", 'start': 1361.379, 'duration': 1.062}, {'end': 1367.436, 'text': 'Here, it says that there is a breakpoint of 1.', 'start': 1362.662, 'duration': 4.774}], 'summary': 'Filling a table with upper bounds for b of n and k, constructing a matrix using recursion to determine breakpoints.', 'duration': 31.154, 'max_score': 1336.282, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE1336282.jpg'}], 'start': 542.225, 'title': 'Patterns, recursion, and analysis', 'summary': 'Discusses structural analysis, recursion, and estimation techniques, highlighting grouping of rows, defining groups s1 and s2, quantifying total number of rows, 
estimating alpha and beta, and analyzing matrix patterns and breakpoints to establish upper bounds for values.', 'chapters': [{'end': 818.48, 'start': 542.225, 'title': 'Structural analysis and recursion in pattern formation', 'summary': 'Discusses the structural analysis and recursion in pattern formation, highlighting the grouping of rows into categories, the definition of groups s1 and s2, and the quantification of the total number of rows in the matrix based on these groups.', 'duration': 276.255, 'highlights': ['The total number of rows in the matrix is quantified as alpha plus 2 beta, where alpha represents the number of rows in group S1 and beta represents the number of rows in group S2. The analysis quantifies the total number of rows in the matrix based on the grouping into categories, with alpha representing the number of rows in group S1 and beta representing the number of rows in group S2.', 'The definition of group S1 involves selecting rows that appear only once as far as x1 up to xn-1 are concerned, and the number of rows in this group is denoted as alpha. Group S1 is defined as the set of rows that appear only once as far as x1 up to xn-1 are concerned, and the number of rows in this group is denoted as alpha.', 'Group S2 consists of rows that occur with both patterns (extension plus 1 and extension minus 1), and the total number of rows in this group is denoted as beta. Group S2 is comprised of rows that occur with both patterns (extension plus 1 and extension minus 1), and the total number of rows in this group is denoted as beta.', 'The chapter aims to find a recursion for the function b of n and k, representing the maximum number of rows and patterns on n points, such that no k columns have all possible patterns. 
The goal is to establish a recursion for the function b of n and k, which denotes the maximum number of rows and patterns on n points, ensuring that no k columns have all possible patterns.']}, {'end': 1043.109, 'start': 821.691, 'title': 'Estimating alpha and beta', 'summary': 'Discusses the process of estimating alpha and beta, where alpha plus beta represents the total number of rows or patterns in a mini-matrix, and it is proven that alpha plus beta is less than or equal to b of n, minus 1, multiplied by k, where b of n and k is the maximum number of rows.', 'duration': 221.418, 'highlights': ['Alpha plus beta, the total number of rows or patterns in the mini-matrix, is proven to be less than or equal to b of n, minus 1, multiplied by k, where b of n and k is the maximum number of rows.', 'It is explained that the maximum number of rows obtained in a special way is at most b of n and k, n and k minus 1, and therefore, alpha plus beta is proven to be less than or equal to this value.', 'The chapter discusses the process of visually identifying different rows based on extensions and conditions, leading to the estimation of alpha and beta.']}, {'end': 1596.212, 'start': 1044.482, 'title': 'Analysis of matrix patterns and breakpoints', 'summary': 'Focuses on analyzing the patterns and breakpoints in a matrix, demonstrating the relationship between the number of patterns and breakpoints, providing insights into the recursive nature of the analysis, and establishing upper bounds for the values through a numerical computation.', 'duration': 551.73, 'highlights': ['The analysis focuses on the patterns and breakpoints in a matrix, demonstrating the relationship between the number of patterns and breakpoints. 
The analysis delves into identifying patterns and breakpoints in a matrix, illustrating the relationship between the number of patterns and breakpoints, and providing insights into the constraints and implications of these patterns and breakpoints.', 'The chapter provides insights into the recursive nature of the analysis and establishes upper bounds for the values through a numerical computation. Insights are provided into the recursive nature of the analysis, showcasing the process of establishing upper bounds for the values through a numerical computation, demonstrating the progression of values for different breakpoints, and highlighting the vacuous nature of constraints for larger breakpoints.', 'The analysis demonstrates the process of filling a table with upper bounds for the values, showcasing the constraints and implications of the patterns and breakpoints in the matrix. The process of filling a table with upper bounds for the values is demonstrated, emphasizing the constraints and implications of the patterns and breakpoints in the matrix, and showcasing the progression of values for different breakpoints, with a focus on the vacuous nature of constraints for larger breakpoints.']}], 'duration': 1053.987, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE542225.jpg', 'highlights': ['The total number of rows in the matrix is quantified as alpha plus 2 beta, where alpha represents the number of rows in group S1 and beta represents the number of rows in group S2.', 'The chapter aims to find a recursion for the function b of n and k, representing the maximum number of rows and patterns on n points, such that no k columns have all possible patterns.', 'Alpha plus beta, the total number of rows or patterns in the mini-matrix, is proven to be less than or equal to b of n, minus 1, multiplied by k, where b of n and k is the maximum number of rows.', 'The analysis focuses on the patterns and breakpoints in a 
matrix, demonstrating the relationship between the number of patterns and breakpoints.', 'The chapter provides insights into the recursive nature of the analysis and establishes upper bounds for the values through a numerical computation.', 'The process of filling a table with upper bounds for the values is demonstrated, emphasizing the constraints and implications of the patterns and breakpoints in the matrix.']}, {'end': 2283.627, 'segs': [{'end': 1732.619, 'src': 'embed', 'start': 1705.478, 'weight': 0, 'content': [{'end': 1709.501, 'text': 'So this quantity will be an upper bound for B of N and K.', 'start': 1705.478, 'duration': 4.023}, {'end': 1712.883, 'text': 'So you can now, if you believe that, which we will argue in a moment,', 'start': 1709.501, 'duration': 3.382}, {'end': 1721.309, 'text': 'that you compute this number and that will be an upper bound for the growth function of any hypothesis set that has a breakpoint K,', 'start': 1712.883, 'duration': 8.426}, {'end': 1725.312, 'text': 'without asking any questions whatsoever about the hypothesis set or the input space.', 'start': 1721.309, 'duration': 4.003}, {'end': 1732.619, 'text': "Now, it shouldn't come as a surprise that this quantity is right, because if you look at this,", 'start': 1726.958, 'duration': 5.661}], 'summary': 'An upper bound for b of n and k can be computed without asking questions about the hypothesis set or the input space.', 'duration': 27.141, 'max_score': 1705.478, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE1705478.jpg'}], 'start': 1597.933, 'title': 'Analytic solution for b of n and k', 'summary': 'Presents an analytic solution for b of n and k, demonstrating the upper bound for the growth function of any hypothesis set with a breakpoint k, and proving its equivalence through a combinatorial argument.', 'chapters': [{'end': 2283.627, 'start': 1597.933, 'title': 'Analytic solution for b of n and k', 'summary': 
'Presents an analytic solution for b of n and k, providing an upper bound for the growth function of any hypothesis set with a breakpoint k, without needing to ask any questions. it demonstrates the combinatorial quantity n choose i as an upper bound for b of n and k, and proves its equivalence through a combinatorial argument.', 'duration': 685.694, 'highlights': ['It demonstrates the combinatorial quantity n choose i as an upper bound for B of N and K, and proves its equivalence through a combinatorial argument, providing a theoretical upper bound for the growth function of any hypothesis set with a breakpoint K.', 'The chapter presents an analytic solution for B of n and k, providing an upper bound for the growth function of any hypothesis set with a breakpoint K, without needing to ask any questions.', 'It shows the equivalence of the combinatorial quantity n choose i as an upper bound for B of N and K through a combinatorial argument, demonstrating the theoretical upper bound for the growth function of any hypothesis set with a breakpoint K.']}], 'duration': 685.694, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE1597933.jpg', 'highlights': ['It demonstrates the combinatorial quantity n choose i as an upper bound for B of N and K, and proves its equivalence through a combinatorial argument, providing a theoretical upper bound for the growth function of any hypothesis set with a breakpoint K.', 'The chapter presents an analytic solution for B of n and k, providing an upper bound for the growth function of any hypothesis set with a breakpoint K, without needing to ask any questions.', 'It shows the equivalence of the combinatorial quantity n choose i as an upper bound for B of N and K through a combinatorial argument, demonstrating the theoretical upper bound for the growth function of any hypothesis set with a breakpoint K.']}, {'end': 2734.277, 'segs': [{'end': 2410.242, 'src': 'embed', 'start': 
2330.062, 'weight': 0, 'content': [{'end': 2339.163, 'text': 'Therefore, the maximum power in this quantity is n to the k minus 1.', 'start': 2330.062, 'duration': 9.101}, {'end': 2344.565, 'text': 'This comes from n times n minus 1 times n minus 2 times k times.', 'start': 2339.163, 'duration': 5.402}, {'end': 2347.886, 'text': 'That corresponds to the case where i equals k minus 1.', 'start': 2344.845, 'duration': 3.041}, {'end': 2349.666, 'text': 'So when you get n, choose k minus 1.', 'start': 2347.886, 'duration': 1.78}, {'end': 2350.267, 'text': "That's what you get.", 'start': 2349.666, 'duration': 0.601}, {'end': 2353.968, 'text': "Anything else will give you a power of n, but it's smaller power.", 'start': 2350.747, 'duration': 3.221}, {'end': 2355.769, 'text': 'So this is the most you will have.', 'start': 2354.748, 'duration': 1.021}, {'end': 2360.71, 'text': "What do we know about this fellow K? We know it's just a number.", 'start': 2356.849, 'duration': 3.861}, {'end': 2361.59, 'text': "It's a constant.", 'start': 2360.93, 'duration': 0.66}, {'end': 2363.331, 'text': "It doesn't change with n.", 'start': 2361.61, 'duration': 1.721}, {'end': 2367.832, 'text': 'And therefore, this is indeed a polynomial in N.', 'start': 2363.331, 'duration': 4.501}, {'end': 2371.773, 'text': 'And we have achieved what we set out to do.', 'start': 2367.832, 'duration': 3.941}, {'end': 2373.734, 'text': 'That is pretty good.', 'start': 2372.754, 'duration': 0.98}, {'end': 2380.616, 'text': "So let's take three examples in order to make this relate to experiences we had before.", 'start': 2375.854, 'duration': 4.762}, {'end': 2386.072, 'text': 'This is the famous quantity by now.', 'start': 2384.571, 'duration': 1.501}, {'end': 2386.873, 'text': 'You know it by heart.', 'start': 2386.112, 'duration': 0.761}, {'end': 2387.953, 'text': 'I have the n.', 'start': 2387.333, 'duration': 0.62}, {'end': 2388.714, 'text': 'I remember k.', 'start': 2387.953, 'duration': 0.761}, 
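The recursion and the analytic solution discussed here can be checked numerically. A minimal sketch (function names are my own): it fills a table via the lecture's recursion B(N, k) <= B(N-1, k) + B(N-1, k-1), which the combinatorial argument shows is achieved with equality, and compares it against the closed form sum_{i=0}^{k-1} C(N, i), a polynomial in N of degree k-1.

```python
from math import comb

def B_table(N_max, k_max):
    # Fill B(N, k) via the recursion B(N, k) <= B(N-1, k) + B(N-1, k-1);
    # the combinatorial argument shows the recursion holds with equality.
    B = [[0] * (k_max + 1) for _ in range(N_max + 1)]
    for n in range(1, N_max + 1):
        B[n][1] = 1              # breakpoint 1: at most a single dichotomy
    for k in range(2, k_max + 1):
        B[1][k] = 2              # one point, breakpoint >= 2: both dichotomies
    for n in range(2, N_max + 1):
        for k in range(2, k_max + 1):
            B[n][k] = B[n - 1][k] + B[n - 1][k - 1]
    return B

def binomial_bound(n, k):
    # Analytic solution: sum_{i=0}^{k-1} C(n, i), polynomial of degree k - 1 in n.
    return sum(comb(n, i) for i in range(k))

B = B_table(10, 5)
for n in range(1, 11):
    for k in range(1, 6):
        assert B[n][k] == binomial_bound(n, k)

# Positive rays have breakpoint k = 2 and growth function N + 1,
# which meets the bound C(N, 0) + C(N, 1) = N + 1 with equality.
for n in range(1, 11):
    assert n + 1 == binomial_bound(n, 2)
```

The boundary conditions match the table-filling argument in the lecture; the Pascal identity C(n, i) = C(n-1, i) + C(n-1, i-1) is what makes the closed form satisfy the recursion term by term.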
{'end': 2389.835, 'text': 'I have to put minus 1.', 'start': 2388.714, 'duration': 1.121}, {'end': 2393.257, 'text': 'And that is the upper bound for anything that has a breakpoint k.', 'start': 2389.835, 'duration': 3.422}, {'end': 2400.582, 'text': "So now let's take hypothesis sets we looked at before, for which we computed the growth function explicitly,", 'start': 2393.257, 'duration': 7.325}, {'end': 2403.023, 'text': 'and see if they actually satisfy the bound.', 'start': 2400.582, 'duration': 2.441}, {'end': 2404.724, 'text': 'They had better, because this is math.', 'start': 2403.284, 'duration': 1.44}, {'end': 2405.285, 'text': 'We proved it.', 'start': 2404.785, 'duration': 0.5}, {'end': 2407.366, 'text': 'But just to see that this is indeed the case.', 'start': 2405.785, 'duration': 1.581}, {'end': 2410.242, 'text': 'Positive ray.', 'start': 2409.622, 'duration': 0.62}], 'summary': 'The maximum power in the given quantity is n to the k minus 1, forming a polynomial in n.', 'duration': 80.18, 'max_score': 2330.062, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2330062.jpg'}, {'end': 2589.332, 'src': 'embed', 'start': 2558.501, 'weight': 4, 'content': [{'end': 2562.105, 'text': "It will always happen that it's true, but there will be a slack in many cases.", 'start': 2558.501, 'duration': 3.604}, {'end': 2566.167, 'text': 'So now, we verified it.', 'start': 2564.246, 'duration': 1.921}, {'end': 2567.747, 'text': 'We are very comfortable now with the result.', 'start': 2566.187, 'duration': 1.56}, {'end': 2572.188, 'text': "Let's apply it to something where we could not get the growth function.", 'start': 2568.087, 'duration': 4.101}, {'end': 2582.165, 'text': 'Remember this old fellow? 
Well, in the two-dimensional perceptron, we went through a full argument just to prove that the breakpoint is 4.', 'start': 2573.129, 'duration': 9.036}, {'end': 2589.332, 'text': "But we didn't bother to go through a general number of points N and ask ourselves how many hypotheses can the perceptron generate on N points??", 'start': 2582.165, 'duration': 7.167}], 'summary': 'Verified result, 2d perceptron breakpoint at 4.', 'duration': 30.831, 'max_score': 2558.501, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2558501.jpg'}], 'start': 2283.627, 'title': 'Polynomial power and growth function in neural networks', 'summary': "Discusses the polynomial power of a neural network function in relation to hoeffding's theorem, highlighting the maximum power of n to the k minus 1 and its polynomial nature. it also examines the application of the growth function and breakpoints to hypothesis sets, demonstrating tight bounds through examples such as positive rays, positive intervals, and perceptrons in two dimensions, ultimately leading to the justification of substituting the growth function for the number of hypotheses in the context of the hoeffding inequality.", 'chapters': [{'end': 2387.953, 'start': 2283.627, 'title': 'Polynomial power of neural network function', 'summary': "Discusses the polynomial power of a neural network function in relation to hoeffding's theorem, highlighting the maximum power of n to the k minus 1 and its polynomial nature.", 'duration': 104.326, 'highlights': ['The maximum power in the discussed quantity is n to the k minus 1, where k is a constant, demonstrating its polynomial nature.', "The chapter emphasizes achieving a polynomial quantity in n, contrasting it with the negative exponential in n from Hoeffding's theorem.", 'The function of the neural network is described as n multiplied by itself a number of times, i times for the i-th term, leading to the maximum power of n to the k 
minus 1.', 'The polynomial nature of the neural network function is underscored by the constant nature of k, independent of n.']}, {'end': 2734.277, 'start': 2387.953, 'title': 'Growth function and breakpoints in hypothesis sets', 'summary': 'Discusses the application of the growth function and breakpoints to hypothesis sets, demonstrating their tight bounds through examples such as positive rays, positive intervals, and perceptrons in two dimensions, ultimately leading to the justification of substituting the growth function for the number of hypotheses in the context of the hoeffding inequality.', 'duration': 346.324, 'highlights': ['The growth function for positive rays is n+1, which satisfies the bound, demonstrating a tight relationship between the growth function and breakpoints. The growth function for positive rays is n+1, which is verified to satisfy the bound, showcasing the strong correlation between the growth function and breakpoints.', 'The analysis also confirms the tightness of the bounds for positive intervals, reinforcing the relationship between the growth function and breakpoints. The analysis confirms the tightness of the bounds for positive intervals, further emphasizing the interconnection between the growth function and breakpoints.', 'The application of breakpoints allows for bounding the growth function of perceptrons in two dimensions, providing a simple characterization of hypothesis sets and enabling its application in the context of the Hoeffding inequality. 
Breakpoints are utilized to bound the growth function of perceptrons in two dimensions, offering a straightforward characterization of hypothesis sets and facilitating its integration with the Hoeffding inequality.']}], 'duration': 450.65, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2283627.jpg', 'highlights': ['The maximum power in the discussed quantity is n to the k minus 1, demonstrating its polynomial nature.', 'The function of the neural network is described as n multiplied by itself a number of times, i times for the i-th term, leading to the maximum power of n to the k minus 1.', 'The polynomial nature of the neural network function is underscored by the constant nature of k, independent of n.', 'The growth function for positive rays is n+1, which satisfies the bound, demonstrating a tight relationship between the growth function and breakpoints.', 'The analysis also confirms the tightness of the bounds for positive intervals, reinforcing the relationship between the growth function and breakpoints.', 'The application of breakpoints allows for bounding the growth function of perceptrons in two dimensions, providing a simple characterization of hypothesis sets and enabling its application in the context of the Hoeffding inequality.']}, {'end': 3068.799, 'segs': [{'end': 2810.955, 'src': 'embed', 'start': 2781.351, 'weight': 2, 'content': [{'end': 2785.695, 'text': "So we'll get a perfect handle on the E in, the in-sample error part of the deal.", 'start': 2781.351, 'duration': 4.344}, {'end': 2788.735, 'text': 'But in the Hoeffding inequality, there is this E out.', 'start': 2786.633, 'duration': 2.102}, {'end': 2792.458, 'text': 'And E out relates to the performance over the entire space.', 'start': 2789.315, 'duration': 3.143}, {'end': 2794.52, 'text': 'So we are no longer talking about dichotomies.', 'start': 2792.839, 'duration': 1.681}, {'end': 2796.042, 'text': 'We are talking about full 
hypotheses.', 'start': 2794.54, 'duration': 1.502}, {'end': 2799.225, 'text': 'So we lose the benefit of the growth function.', 'start': 2796.582, 'duration': 2.643}, {'end': 2803.228, 'text': 'So what do we do about E out? That was a question that was asked last time.', 'start': 2799.865, 'duration': 3.363}, {'end': 2810.955, 'text': 'So what to do about E out in order to get the argument to conform while we are just using a finite sample is the second step.', 'start': 2803.709, 'duration': 7.246}], 'summary': 'Discussing the challenge of e out in the Hoeffding inequality and addressing it with finite samples.', 'duration': 29.604, 'max_score': 2781.351, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2781351.jpg'}, {'end': 2879.381, 'src': 'embed', 'start': 2854.83, 'weight': 0, 'content': [{'end': 2862.652, 'text': "Well, I'm doing this space because the event, being good or bad, whether E in goes to E out depends on the sample, depends on the data set.", 'start': 2854.83, 'duration': 7.822}, {'end': 2866.033, 'text': 'For some data sets, you will be close to the E out.', 'start': 2863.093, 'duration': 2.94}, {'end': 2868.494, 'text': 'For some data sets, you are not going to be close.', 'start': 2866.433, 'duration': 2.061}, {'end': 2876.536, 'text': 'So I want to draw it here, in order to look at the bad regions and the overlaps, and then argue why the growth function is useful for the overlaps.', 'start': 2869.414, 'duration': 7.122}, {'end': 2879.381, 'text': 'Now, we assume that there is a probability distribution.', 'start': 2877.316, 'duration': 2.065}], 'summary': 'The e in to e out transition depends on the data set, with some being close and some not, analyzed using the growth function.', 'duration': 24.551, 'max_score': 2854.83, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2854830.jpg'}, {'end': 2949.784, 'src': 'embed', 'start': 2920.918, 
'weight': 3, 'content': [{'end': 2926.245, 'text': "And we realize that I didn't paint a lot of area, and that is because of Hoeffding inequality.", 'start': 2920.918, 'duration': 5.327}, {'end': 2928.848, 'text': 'Hoeffding inequality tells me that that area is small.', 'start': 2926.265, 'duration': 2.583}, {'end': 2932.572, 'text': "So I'm entitled to put a small patch.", 'start': 2929.989, 'duration': 2.583}, {'end': 2936.117, 'text': 'Now we went from one hypothesis, which is this guy.', 'start': 2933.614, 'duration': 2.503}, {'end': 2942.122, 'text': 'to the case where we have multiple hypotheses using the union bound.', 'start': 2937.54, 'duration': 4.582}, {'end': 2945.183, 'text': 'So again, this is the space of data sets, exactly the same one.', 'start': 2942.482, 'duration': 2.701}, {'end': 2949.784, 'text': 'And now I am saying, for the first hypothesis, you get this bad region.', 'start': 2945.883, 'duration': 3.901}], 'summary': 'Hoeffding inequality constrains area, union bound handles multiple hypotheses.', 'duration': 28.866, 'max_score': 2920.918, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2920918.jpg'}, {'end': 3056.127, 'src': 'embed', 'start': 3026.117, 'weight': 1, 'content': [{'end': 3026.757, 'text': 'You get all of them.', 'start': 3026.117, 'duration': 0.64}, {'end': 3032.392, 'text': "It's not as good as the first one.", 'start': 3030.951, 'duration': 1.441}, {'end': 3033.672, 'text': 'I never expected it to be.', 'start': 3032.612, 'duration': 1.06}, {'end': 3037.555, 'text': "But definitely not as bad as the second one, because now they're overlapping.", 'start': 3034.293, 'duration': 3.262}, {'end': 3043.838, 'text': 'And indeed, the total area, which is the bad region, something bad happening, is a small fraction of the whole thing.', 'start': 3037.655, 'duration': 6.183}, {'end': 3045.099, 'text': 'And I can learn.', 'start': 3044.158, 'duration': 0.941}, {'end': 3050.182, 
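The canvas picture described here (one hypothesis gives a small Hoeffding patch of bad data sets; many hypotheses could fill the canvas unless their bad regions overlap) can be simulated. The following is an illustrative toy of my own construction, not from the lecture, using positive rays: infinitely many thresholds produce only about N+1 distinct dichotomies on a sample, so their bad regions overlap heavily and the total painted area stays far below the union bound.

```python
import random

def any_bad(N, eps, thresholds, rng):
    # One point on the canvas = one data set of N uniform points in [0, 1].
    # Hypothesis h_a(x) = +1 iff x > a; the target threshold is 0.5, so the
    # out-of-sample error of h_a is |a - 0.5| and the in-sample error is the
    # fraction of sample points falling between a and 0.5.
    xs = [rng.random() for _ in range(N)]
    for a in thresholds:
        e_out = abs(a - 0.5)
        e_in = sum(min(a, 0.5) < x <= max(a, 0.5) for x in xs) / N
        if abs(e_in - e_out) > eps:
            return True          # this data set lands in some bad region
    return False

rng = random.Random(1)
trials, N, eps = 500, 50, 0.15
thresholds = [i / 100 for i in range(101)]   # 101 hypotheses

one_bad = sum(any_bad(N, eps, [0.25], rng) for _ in range(trials)) / trials
all_bad = sum(any_bad(N, eps, thresholds, rng) for _ in range(trials)) / trials

# The union bound would only give all_bad <= 101 * one_bad (vacuous here);
# the overlaps among the bad regions keep the painted area far smaller.
print(one_bad, all_bad)
```

The fraction of "bad" data sets for all 101 thresholds together comes out well below 101 times the single-hypothesis fraction, which is the overlap effect the growth function is meant to capture.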
'text': 'So we are trying to characterize this overlap.', 'start': 3046.68, 'duration': 3.502}, {'end': 3051.823, 'text': "That's the whole deal with the growth function.", 'start': 3050.222, 'duration': 1.601}, {'end': 3056.127, 'text': 'Now, one way to do it is the one that I alluded to before.', 'start': 3052.663, 'duration': 3.464}], 'summary': 'Small fraction of total area impacted, learning to characterize overlap in growth function.', 'duration': 30.01, 'max_score': 3026.117, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3026117.jpg'}], 'start': 2734.577, 'title': 'Handling overlaps in growth function and canvas of data sets and hypotheses', 'summary': "discusses using growth function to address overlaps in in-sample and out-of-sample errors, challenges of incorporating e out in the argument, impact of hoeffding's inequality on bad regions, challenges posed by multiple hypotheses, and introduction of vc bound to address overlaps.", 'chapters': [{'end': 2820.08, 'start': 2734.577, 'title': 'Handling overlaps in growth function', 'summary': 'Discusses the use of growth function to address overlaps in the context of in-sample and out-of-sample errors, and the challenge of incorporating e out in the argument while using a finite sample.', 'duration': 85.503, 'highlights': ['The growth function is used to replace M, which assumed no overlaps, providing a handle on the in-sample error part of the deal.', 'The challenge lies in dealing with E out in the Hoeffding inequality, which relates to the performance over the entire space and requires addressing when using a finite sample.', 'Understanding how the growth function relates to overlaps is crucial in establishing a clean argument, offering insights into handling E in and E out for in-sample and out-of-sample errors, respectively.']}, {'end': 3068.799, 'start': 2821.361, 'title': 'Canvas of data sets and hypotheses', 'summary': "discusses the concept of a 
canvas of data sets, the impact of hoeffding's inequality on the bad regions, the challenges posed by multiple hypotheses and the introduction of the vc bound to address overlaps, leading to the characterization of the growth function.", 'duration': 247.438, 'highlights': ['The canvas of data sets represents the entire set of possible data sets, where the event being good or bad depends on the sample, and the growth function is discussed in relation to the bad regions and overlaps. Canvas covering the entire set of possible data sets; discussion of bad regions and overlaps.', "The impact of Hoeffding's inequality is illustrated, indicating that the area of bad regions is small, leading to the introduction of the union bound and the challenges posed by multiple hypotheses in filling up the canvas. Illustration of small area of bad regions due to Hoeffding's inequality; challenges posed by multiple hypotheses in filling up the canvas.", "Introduction of the VC bound as a solution to address overlaps between hypotheses, leading to the characterization of the growth function in relation to the bad region's total area. 
Introduction of VC bound to address overlaps between hypotheses; characterization of growth function in relation to the bad region's total area."]}], 'duration': 334.222, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE2734577.jpg', 'highlights': ['Understanding how the growth function relates to overlaps is crucial in establishing a clean argument, offering insights into handling E in and E out for in-sample and out-of-sample errors, respectively.', "Introduction of VC bound to address overlaps between hypotheses; characterization of growth function in relation to the bad region's total area.", 'The challenge lies in dealing with E out in the Hoeffding inequality, which relates to the performance over the entire space and requires addressing when using a finite sample.', "The impact of Hoeffding's inequality is illustrated, indicating that the area of bad regions is small, leading to the introduction of the union bound and the challenges posed by multiple hypotheses in filling up the canvas."]}, {'end': 3562.29, 'segs': [{'end': 3119.885, 'src': 'embed', 'start': 3069.84, 'weight': 0, 'content': [{'end': 3075.922, 'text': "The reason we introduced the growth function is because it's an abstract quantity that is simple, and it's going to characterize the overlaps.", 'start': 3069.84, 'duration': 6.082}, {'end': 3082.424, 'text': 'So the question is, how is the growth function going to characterize the overlaps? 
Here is what is going to happen.', 'start': 3076.743, 'duration': 5.681}, {'end': 3094.489, 'text': 'I want to tell you that if you look at this canvas, if any point gets painted at all, it will get painted over 100 times.', 'start': 3084.545, 'duration': 9.944}, {'end': 3097.213, 'text': "Let's say that I have that guarantee.", 'start': 3096.052, 'duration': 1.161}, {'end': 3099.974, 'text': "I don't know which hypothesis will paint it again.", 'start': 3098.113, 'duration': 1.861}, {'end': 3105.077, 'text': 'But any point that gets a red, it will get a blue and a green a hundred times.', 'start': 3100.374, 'duration': 4.703}, {'end': 3116.723, 'text': "If I tell you that statement, what do you know about the total area that is colored? Now it's at most one hundredth of what it used to be.", 'start': 3106.698, 'duration': 10.025}, {'end': 3119.885, 'text': 'Because when I had them overlapping, they filled it over.', 'start': 3117.543, 'duration': 2.342}], 'summary': 'Introduction of growth function to characterize overlaps, guaranteeing each point gets painted over 100 times, reducing total colored area to at most one hundredth.', 'duration': 50.045, 'max_score': 3069.84, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3069840.jpg'}, {'end': 3242.84, 'src': 'embed', 'start': 3213.755, 'weight': 1, 'content': [{'end': 3215.297, 'text': "And that's the second part of the proof.", 'start': 3213.755, 'duration': 1.542}, {'end': 3224.272, 'text': 'What to do about E out? 
The simplest argument possible.', 'start': 3219.13, 'duration': 5.142}, {'end': 3227.094, 'text': 'That is really the breakthrough that Vapnik and Chervonenkis did.', 'start': 3224.512, 'duration': 2.582}, {'end': 3232.016, 'text': "Back to the bin, just because it's an illustration of the binary case.", 'start': 3228.914, 'duration': 3.102}, {'end': 3234.277, 'text': 'So here, we have one hypothesis.', 'start': 3232.056, 'duration': 2.221}, {'end': 3241.52, 'text': 'And we have E out, which is the overall in the entire space, the error in the entire space.', 'start': 3235.297, 'duration': 6.223}, {'end': 3242.84, 'text': 'We pick a sample.', 'start': 3241.98, 'duration': 0.86}], 'summary': 'Vapnik and Chervonenkis made a breakthrough in the proof, addressing e out with a binary illustration.', 'duration': 29.085, 'max_score': 3213.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3213755.jpg'}, {'end': 3389.879, 'src': 'embed', 'start': 3365.382, 'weight': 4, 'content': [{'end': 3371.765, 'text': 'That is, when I had multiple bins, the tie between E out and E in became looser and looser.', 'start': 3365.382, 'duration': 6.383}, {'end': 3382.931, 'text': "Because I'm looking for worst case, and I might be unlucky enough that the tracking now lost the tightness that one bin with Hoeffding would dictate.", 'start': 3371.925, 'duration': 11.006}, {'end': 3389.879, 'text': 'If I am doing multiple bins and not looking at the bin at all, just looking at the two samples from each bin.', 'start': 3384.423, 'duration': 5.456}], 'summary': 'As the number of bins increased, the tie between e out and e in became looser, impacting the tracking accuracy.', 'duration': 24.497, 'max_score': 3365.382, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3365382.jpg'}, {'end': 3437.8, 'src': 'embed', 'start': 3411.995, 'weight': 2, 'content': [{'end': 3417.457, 'text': 'If you
are patient enough, it will happen, exactly for the same reason, because you keep looking for it until it happens.', 'start': 3411.995, 'duration': 5.462}, {'end': 3425.542, 'text': 'So the mathematical ramifications of multiple hypotheses happen here exactly the same way they happen here.', 'start': 3418.433, 'duration': 7.109}, {'end': 3434.772, 'text': 'The finesse now is that if I characterize it using the two samples, then I am completely in the realm of dichotomies.', 'start': 3426.783, 'duration': 7.989}, {'end': 3437.8, 'text': "Because now I'm not appealing to E out at all.", 'start': 3435.899, 'duration': 1.901}], 'summary': 'Persistence leads to multiple hypotheses with mathematical ramifications.', 'duration': 25.805, 'max_score': 3411.995, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3411995.jpg'}], 'start': 3069.84, 'title': 'Growth function and vapnik-chervonenkis proof', 'summary': 'Explains how the growth function characterizes overlaps, leading to the total colored area shrinking to at most one hundredth of its original size. 
it also discusses the vapnik-chervonenkis proof, highlighting the breakthrough in dealing with eout and the mathematical ramifications of multiple hypotheses.', 'chapters': [{'end': 3161.144, 'start': 3069.84, 'title': 'Growth function and overlaps', 'summary': 'Explains how the growth function characterizes overlaps, demonstrating that any point on the canvas will be painted over 100 times, leading to the total colored area shrinking to at most one hundredth of its original size, as a result of overusing and shrinking the hypotheses.', 'duration': 91.304, 'highlights': ['The growth function characterizes overlaps by showing that any point on the canvas will be painted over 100 times, leading to the total colored area shrinking to at most one hundredth of its original size.', 'For every point that is colored, it has to be done 100 times, resulting in overusing and shrinking of the hypotheses.']}, {'end': 3562.29, 'start': 3161.484, 'title': 'Vapnik-chervonenkis proof', 'summary': 'Explains the vapnik-chervonenkis proof, highlighting the breakthrough in dealing with eout, the use of multiple samples to track eout and ein, and the mathematical ramifications of multiple hypotheses.', 'duration': 400.806, 'highlights': ['The breakthrough in dealing with Eout by Vapnik and Chervonenkis was the use of multiple samples to track Eout and Ein, allowing the characterization of dichotomies without appealing to Eout at all.', 'When using multiple bins and looking at two samples from each bin, they track each other but also get loosely apart as more bins are used, reflecting the looser tie between Eout and Ein in multiple bin scenarios.', 'The mathematical ramifications of multiple hypotheses happen in a similar way when characterizing using two samples, enabling the full characterization and readiness to proceed in the proof.']}], 'duration': 492.45, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3069840.jpg', 
'highlights': ['The growth function characterizes overlaps by showing that any point on the canvas will be painted over 100 times, leading to the total colored area shrinking to at most one hundredth of its original size.', 'The breakthrough in dealing with Eout by Vapnik and Chervonenkis was the use of multiple samples to track Eout and Ein, allowing the characterization of dichotomies without appealing to Eout at all.', 'The mathematical ramifications of multiple hypotheses happen in a similar way when characterizing using two samples, enabling the full characterization and readiness to proceed in the proof.', 'For every point that is colored, it has to be done 100 times, resulting in overusing and shrinking of the hypotheses.', 'When using multiple bins and looking at two samples from each bin, they track each other but also get loosely apart as more bins are used, reflecting the looser tie between Eout and Ein in multiple bin scenarios.']}, {'end': 4278.612, 'segs': [{'end': 3620.631, 'src': 'embed', 'start': 3587.755, 'weight': 0, 'content': [{'end': 3595.382, 'text': 'Because if n is big enough, if I give you enough examples, using that hypothesis, you will be able to claim that E in tracks E out correctly.', 'start': 3587.755, 'duration': 7.627}, {'end': 3604.645, 'text': 'This result, which is called the Vapnik-Chervonenkis inequality, is the most important theoretical result in machine learning.', 'start': 3596.682, 'duration': 7.963}, {'end': 3610.568, 'text': "On that happy note, we'll stop here and take questions after a short break.", 'start': 3606.166, 'duration': 4.402}, {'end': 3620.631, 'text': "So let's start the Q&A.", 'start': 3619.631, 'duration': 1}], 'summary': 'Vapnik-Chervonenkis inequality is key in ml theory.', 'duration': 32.876, 'max_score': 3587.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3587755.jpg'}, {'end': 3709.425, 'src': 'embed', 'start': 3678.158, 'weight': 1, 
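Putting the pieces together gives the Vapnik-Chervonenkis inequality; in the form used in this course it reads P[|E_in - E_out| > eps] <= 4 m_H(2N) e^{-eps^2 N / 8} (the exact constants are an assumption of this sketch). A quick numeric check shows why a polynomial growth function makes the bound go to zero: the negative exponential in N eventually overwhelms any fixed-degree polynomial.

```python
from math import comb, exp

def m_H_bound(N, k):
    # Growth-function bound for a hypothesis set with breakpoint k:
    # m_H(N) <= sum_{i=0}^{k-1} C(N, i), polynomial of degree k - 1 in N.
    return sum(comb(N, i) for i in range(k))

def vc_bound(N, eps, k):
    # VC inequality in the form used in this course (constants assumed):
    # P[|E_in - E_out| > eps] <= 4 * m_H(2N) * exp(-eps^2 * N / 8).
    return 4 * m_H_bound(2 * N, k) * exp(-(eps ** 2) * N / 8)

# Polynomial (breakpoint k = 4) against the negative exponential in N:
# the bound is vacuous for small N but goes to zero as N grows.
for N in (100, 1_000, 10_000, 100_000):
    print(N, vc_bound(N, eps=0.1, k=4))
```

With no breakpoint, m_H(2N) would have to be replaced by 2^(2N), and the product no longer goes to zero, which is exactly why the breakpoint is the interesting case.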
'content': [{'end': 3684.16, 'text': "And I'm supposed to pick the sample in that space that maximizes the number of dichotomies, et cetera, as we define the growth function.", 'start': 3678.158, 'duration': 6.002}, {'end': 3690.205, 'text': "But it's a sample that I pick when I apply this to a particular input space and a hypothesis set.", 'start': 3685.441, 'duration': 4.764}, {'end': 3699.993, 'text': "Also, there are some people asking, they didn't understand exactly why alpha was different to beta.", 'start': 3693.327, 'duration': 6.666}, {'end': 3709.425, 'text': 'Alpha is different from beta? Yeah, why? The short answer is that I never made the statement that alpha is different from beta.', 'start': 3700.833, 'duration': 8.592}], 'summary': 'Select sample to maximize dichotomies for growth function. no statement made on alpha vs beta.', 'duration': 31.267, 'max_score': 3678.158, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3678158.jpg'}, {'end': 3813.083, 'src': 'embed', 'start': 3784.868, 'weight': 2, 'content': [{'end': 3788.332, 'text': "So if you want a bound for it, yes, it's bounded by 2 to the n, not a polynomial.", 'start': 3784.868, 'duration': 3.464}, {'end': 3790.493, 'text': 'All of these cases.', 'start': 3789.392, 'duration': 1.101}, {'end': 3792.914, 'text': "we're addressing the case where there is a breakpoint,", 'start': 3790.493, 'duration': 2.421}, {'end': 3797.636, 'text': 'because that is the case where I can guarantee a polynomial and therefore I can guarantee learning.', 'start': 3792.914, 'duration': 4.722}, {'end': 3799.097, 'text': 'That is the interesting case.', 'start': 3798.176, 'duration': 0.921}, {'end': 3804.719, 'text': 'If there is no breakpoint, this theoretical line of analysis will not guarantee learning.', 'start': 3799.357, 'duration': 5.362}, {'end': 3813.083, 'text': 'So if I have a hypothesis set that happens to be able to shatter every set of points, I 
cannot make a statement using this line of analysis.', 'start': 3805.259, 'duration': 7.824}], 'summary': 'Without a breakpoint, the growth function is bounded only by 2 to the n; a breakpoint guarantees a polynomial bound and hence learning.', 'duration': 28.215, 'max_score': 3784.868, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3784868.jpg'}, {'end': 3899.939, 'src': 'embed', 'start': 3872.392, 'weight': 3, 'content': [{'end': 3877.534, 'text': "And you would like to know some prediction of the generalization performance that you're going to get.", 'start': 3872.392, 'duration': 5.142}, {'end': 3881.795, 'text': 'So you go into the room and ask yourself for this hypothesis set over this input space.', 'start': 3878.214, 'duration': 3.581}, {'end': 3883.455, 'text': 'what is the breakpoint??', 'start': 3881.795, 'duration': 1.66}, {'end': 3891.097, 'text': "Now you have to actually go and study your hypothesis set and then find out that using this hypothesis set, I cannot separate, let's say,", 'start': 3883.495, 'duration': 7.602}, {'end': 3892.537, 'text': '10 points in every possible way.', 'start': 3891.097, 'duration': 1.44}, {'end': 3895.838, 'text': 'Very much along the argument we used for the perceptron in two dimensions.', 'start': 3892.917, 'duration': 2.921}, {'end': 3899.939, 'text': 'We found out that we cannot separate 4 points in every possible way.', 'start': 3896.078, 'duration': 3.861}], 'summary': 'The hypothesis set cannot separate 10 points in every possible way.', 'duration': 27.547, 'max_score': 3872.392, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3872392.jpg'}], 'start': 3562.29, 'title': 'Vapnik-chervonenkis inequality and analyzing growth function', 'summary': 'Discusses the vapnik-chervonenkis inequality, emphasizing the polynomial nature of the growth function in relation to the breakpoint, and delves into the analysis of growth function, breakpoints, and their role in 
learning, including practical examples and the significance of tracking e in to minimize e out, as well as the practical use of vc dimension.', 'chapters': [{'end': 3677.617, 'start': 3562.29, 'title': 'Vapnik-chervonenkis inequality', 'summary': 'Discusses the vapnik-chervonenkis inequality, the most important theoretical result in machine learning, which states that for any hypothesis set with a breakpoint, the growth function is polynomial in n, with the order decided by the breakpoint, allowing claim of correct e in tracks e out with enough examples.', 'duration': 115.327, 'highlights': ['The Vapnik-Chervonenkis inequality is the most important theoretical result in machine learning, stating that for any hypothesis set with a breakpoint, the growth function is polynomial in n, with the order decided by the breakpoint, allowing claim of correct E in tracks E out with enough examples.', 'The statement holds true for any hypothesis set that has a breakpoint and the fellow is polynomial in n, with the order of the polynomial decided by the breakpoint.', 'With enough examples, using a hypothesis, one will be able to claim that E in tracks E out correctly, as per the Vapnik-Chervonenkis inequality.', 'The abstract labels x1 up to xn-1 in the hypothesis set do not correspond to any particular input space in mind, but when they do, they will correspond to a sample.']}, {'end': 4278.612, 'start': 3678.158, 'title': 'Analyzing growth function and breakpoints', 'summary': 'Discusses the analysis of growth function, breakpoints, and their relevance in guaranteeing learning, with examples such as convex sets and positive and negative rays, and the importance of tracking e in to minimize e out, while also addressing the practical use of vc dimension.', 'duration': 600.454, 'highlights': ['The growth function is analyzed in relation to the number of dichotomies and its application to specific input spaces and hypothesis sets, without making assertions about the relative values of alpha and 
beta.', 'The significance of breakpoints is highlighted in ensuring a polynomial growth function and guaranteeing learning, particularly in cases with breakpoints, while acknowledging that the theoretical analysis may not guarantee learning in the absence of breakpoints.', 'The practical relevance of determining the breakpoint for a given hypothesis set over an input space is emphasized, as well as the availability of pre-determined breakpoints for well-known learning models such as perceptrons and neural networks.', 'The discrepancy between the exact estimate of the growth function and the bound, exemplified by positive and negative rays, is discussed, indicating cases where the bounds may not be tight, along with the implications of focusing on E in to track E out for practical learning purposes.', 'The upcoming lecture on the VC dimension is hinted at, addressing its relevance and assuring a comprehensive understanding, and a note is provided on the theoretical nature of the current lecture in comparison to the forthcoming more approachable mathematics.']}], 'duration': 716.322, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE3562290.jpg', 'highlights': ['The Vapnik-Chervonenkis inequality is the most important theoretical result in machine learning, stating that for any hypothesis set with a breakpoint, the growth function is polynomial in n, with the order decided by the breakpoint, allowing the claim that E in tracks E out with enough examples.', 'The growth function is analyzed in relation to the number of dichotomies and its application to specific input spaces and hypothesis sets, without making assertions about the relative values of alpha and beta.', 'The significance of breakpoints is highlighted in ensuring a polynomial growth function and guaranteeing learning, particularly in cases with breakpoints, while acknowledging that the theoretical analysis may not guarantee learning in the absence of breakpoints.', 'The practical relevance of determining the
breakpoint for a given hypothesis set over an input space is emphasized, as well as the availability of pre-determined breakpoints for well-known learning models such as perceptrons and neural networks.']}, {'end': 4672.183, 'segs': [{'end': 4350.197, 'src': 'embed', 'start': 4326.636, 'weight': 0, 'content': [{'end': 4333.469, 'text': 'You just know that in your setup, when you get n points, that is the n here, and the breakpoint is k, that is k here.', 'start': 4326.636, 'duration': 6.833}, {'end': 4338.388, 'text': 'Under those conditions, can you bound the growth function?', 'start': 4334.665, 'duration': 3.723}, {'end': 4341.77, 'text': 'Can you tell me that the growth function can never be bigger than something?', 'start': 4338.428, 'duration': 3.342}, {'end': 4345.893, 'text': "That something is what I'm calling b of n and k.", 'start': 4342.471, 'duration': 3.422}, {'end': 4348.595, 'text': "So what am I doing? I'm taking the minimal conditions you gave me.", 'start': 4345.893, 'duration': 2.702}, {'end': 4350.197, 'text': 'I have n points, and k is a breakpoint.', 'start': 4348.616, 'duration': 1.581}], 'summary': 'Given n points and a breakpoint k, can the growth function be bounded by b(n, k)?', 'duration': 23.561, 'max_score': 4326.636, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE4326636.jpg'}, {'end': 4416.181, 'src': 'embed', 'start': 4386.77, 'weight': 1, 'content': [{'end': 4391.693, 'text': 'And that formula now serves as an upper bound for the more hairy quantity, which is the growth function.', 'start': 4386.77, 'duration': 4.923}, {'end': 4395.696, 'text': 'that is very particular to a learning situation, an input space and a hypothesis set.', 'start': 4391.693, 'duration': 4.003}, {'end': 4399.657, 'text': 'Also, a particular question on the proof of b of n and k, the recursion.', 'start': 4396.196, 'duration': 3.461}, {'end': 4405.578, 'text': 'So slide 5.', 'start': 4399.997, 'duration':
5.581}, {'end': 4416.181, 'text': 'The question is why does k not change when going back from n to n minus 1?', 'start': 4405.578, 'duration': 10.603}], 'summary': 'The formula serves as an upper bound for the growth function, which is particular to a learning situation, and the recursion is explained.', 'duration': 29.411, 'max_score': 4386.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE4386770.jpg'}, {'end': 4540.476, 'src': 'embed', 'start': 4512.466, 'weight': 2, 'content': [{'end': 4519.15, 'text': "I decided that when I get to regression functions, I'm going to apply a completely different approach, which is the bias-variance tradeoff.", 'start': 4512.466, 'duration': 6.684}, {'end': 4525.172, 'text': "It will give us another insight into the situation, and we'll tackle the real-valued functions directly, the regression functions.", 'start': 4519.55, 'duration': 5.622}, {'end': 4530.275, 'text': "And therefore, I think we'll have both the benefit of having another way of looking at it, and covering both types of functions.", 'start': 4525.533, 'duration': 4.742}, {'end': 4535.31, 'text': "So there's this person that says, I feel silly asking this,", 'start': 4531.787, 'duration': 3.523}, {'end': 4540.476, 'text': 'but is the bottom line that we can prove learnability if the learning model cannot learn everything?', 'start': 4535.31, 'duration': 5.166}], 'summary': 'Plan to apply the bias-variance tradeoff to regression functions for additional insight and coverage of both types of functions.', 'duration': 28.01, 'max_score': 4512.466, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE4512466.jpg'}, {'end': 4672.183, 'src': 'embed', 'start': 4655.418, 'weight': 3, 'content': [{'end': 4657.339, 'text': 'What is a breakpoint?
Dichotomies.', 'start': 4655.418, 'duration': 1.921}, {'end': 4658.179, 'text': 'They are not really dichotomies.', 'start': 4657.379, 'duration': 0.8}, {'end': 4659.779, 'text': 'You have real values, you have not real values.', 'start': 4658.219, 'duration': 1.56}, {'end': 4663.36, 'text': 'So there are technicalities to be done in order to be able to reduce them to this case.', 'start': 4659.979, 'duration': 3.381}, {'end': 4667.422, 'text': 'But the same principle applies, regardless of the type of function you have.', 'start': 4663.7, 'duration': 3.722}, {'end': 4670.082, 'text': "I think that's it.", 'start': 4669.482, 'duration': 0.6}, {'end': 4670.462, 'text': 'Very good.', 'start': 4670.162, 'duration': 0.3}, {'end': 4672.183, 'text': "Thank you, and we'll see you next week.", 'start': 4670.983, 'duration': 1.2}], 'summary': 'Discussion of how breakpoints and dichotomies carry over to real-valued functions: technicalities are needed to reduce them to this case, but the same principle applies regardless of the type of function.', 'duration': 16.765, 'max_score': 4655.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE4655418.jpg'}], 'start': 4278.952, 'title': 'Bounding growth function and bias-variance tradeoff', 'summary': 'Discusses the derivation of an upper bound for the growth function in learning situations and the application of the bias-variance tradeoff to regression functions, highlighting the impact of the target function on learnability and the independence of generalization from the target function.', 'chapters': [{'end': 4495.05, 'start': 4278.952, 'title': 'Bounding growth function for learning situations', 'summary': 'Discusses the derivation of an upper bound, b of n and k, for the growth function in learning situations using minimal conditions of n points and a breakpoint k, and the maintenance of the breakpoint k during recursion.
It also explains the combinatorial nature of b of n and k and its significance as an upper bound for hypothesis sets.', 'duration': 216.098, 'highlights': ['The derivation of an upper bound, b of n and k, for the growth function in learning situations using minimal conditions of n points and a breakpoint k, which serves as an upper bound for any hypothesis set with a breakpoint k (relevance score: 5)', 'The explanation of the combinatorial nature of b of n and k and its significance as an upper bound for hypothesis sets, leading to a simple recursion and formula derivation (relevance score: 4)', 'The maintenance of the breakpoint k during recursion, where k remains fixed when reducing the set from n to n-1 due to constraints on the possible patterns in the columns (relevance score: 3)', 'The discussion of the combinatorial nature of b of n and k, making it analyzable without detailed knowledge of the input space or the correlations between events, and its role as an upper bound for the growth function (relevance score: 2)']}, {'end': 4672.183, 'start': 4495.05, 'title': 'Bias-variance tradeoff in regression functions', 'summary': 'The chapter discusses the application of the bias-variance tradeoff to regression functions and the impact of the target function on learnability, emphasizing the independence of generalization from the target function and the observation of learnability in-sample.', 'duration': 177.133, 'highlights': ['The chapter emphasizes the application of the bias-variance tradeoff to regression functions and its contribution to gaining insight into the situation, covering both types of functions.', 'It explains the impact of the target function on learnability, stating that if there is a breakpoint, with enough examples, E in will be close to E out for the hypothesis set.', 'The generalization question is independent of the target function, while the observation of learnability in-sample is dependent on the target function.', 'The discussion also touches on the generalization of multi-class
problems, which have no restriction on the inputs or outputs, and the concept of dichotomies and real values in the technical reduction.', 'The extension of VC analysis to real-valued functions is considered a technical extension that does not significantly add to the insight, leading to the decision to apply a different approach for regression functions.']}], 'duration': 393.231, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6FWRijsmLtE/pics/6FWRijsmLtE4278952.jpg', 'highlights': ['The Vapnik-Chervonenkis inequality is the most important theoretical result in machine learning, stating that for any hypothesis set with a breakpoint, the growth function is polynomial in n, with the order decided by the breakpoint, allowing the claim that E in tracks E out with enough examples.', 'The growth function characterizes overlaps by showing that any point on the canvas will be painted over 100 times, leading to the total colored area shrinking to at most one hundredth of its original size.', 'The chapter presents an analytic solution for B of n and k, providing an upper bound for the growth function of any hypothesis set with a breakpoint k, without needing to ask any questions.', "The
growth function is a key concept, representing the maximum number of dichotomies on n points, and it's crucial for generalization.", 'The maximum power in the discussed quantity is n to the k minus 1, demonstrating its polynomial nature.', 'Understanding how the growth function relates to overlaps is crucial in establishing a clean argument, offering insights into handling E in and E out, the in-sample and out-of-sample errors, respectively.', 'The purpose of the dichotomies is to capture how a hypothesis separates red regions from blue regions in the input space.', 'The chapter introduces the concept of dichotomies, where an input space is divided into red and blue regions, with only the observed data points being visible.', 'The concept of dichotomies as mini-hypotheses is introduced, which counts the hypotheses only on a finite set of points.', 'The chapter aims to find a recursion for the function b of n and k, representing the maximum number of rows (patterns) on n points such that no k columns contain all possible patterns.']}
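The B of n and k recursion and its closed-form bound, which the summaries above describe, can be sketched as a few lines of Python. This is a minimal illustration, not course material: the function names `B` and `closed_form` are mine, and the boundary conditions follow the lecture's table argument (breakpoint 1 allows only one row; a single point with breakpoint k >= 2 allows both patterns).

```python
from functools import lru_cache
from math import comb

# B(n, k): maximum number of dichotomies (rows) on n points such that
# no k columns contain all 2^k possible patterns (k is a breakpoint).
@lru_cache(maxsize=None)
def B(n: int, k: int) -> int:
    if k == 1:   # breakpoint 1: no single point may take both values
        return 1
    if n == 1:   # one point, k >= 2: both patterns are allowed
        return 2
    # Recursion from the lecture's decomposition of the table of rows,
    # with k unchanged in the first term and reduced in the second.
    return B(n - 1, k) + B(n - 1, k - 1)

def closed_form(n: int, k: int) -> int:
    # B(n, k) = sum_{i=0}^{k-1} C(n, i): a polynomial in n of degree k - 1,
    # so the maximum power is n to the k minus 1, as the highlights state.
    return sum(comb(n, i) for i in range(k))

if __name__ == "__main__":
    # The recursion and the closed form agree (Pascal's identity).
    for n in range(1, 12):
        for k in range(1, 6):
            assert B(n, k) == closed_form(n, k)
    print(B(10, 3))  # 1 + 10 + 45 = 56
```

Because the boundary values match `closed_form` and Pascal's identity makes the sum satisfy the same recursion, the two computations agree everywhere, which is the combinatorial content of the bound on the growth function.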