title

Lecture 05 - Training Versus Testing

description

Training versus Testing - The difference between training and testing in mathematical terms. What makes a learning model able to generalize? Lecture 5 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, http://creativecommons.org/licenses/by-nc-nd/3.0/
This lecture was recorded on April 17, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.

detail

{'title': 'Lecture 05 - Training Versus Testing', 'heatmap': [{'end': 3511.303, 'start': 3415.839, 'weight': 1}], 'summary': 'Lecture covers error measures, probabilistic learning, training vs testing, perceptron analysis, growth functions, breakpoints, combinatorial mathematics, and generalization theory, providing insights on polynomial growth functions and their limitations in feasible learning.', 'chapters': [{'end': 252.368, 'segs': [{'end': 60.835, 'src': 'embed', 'start': 0.683, 'weight': 0, 'content': [{'end': 3.564, 'text': 'The following program is brought to you by Caltech.', 'start': 0.683, 'duration': 2.881}, {'end': 16.047, 'text': 'Welcome back.', 'start': 15.468, 'duration': 0.579}, {'end': 28.572, 'text': 'Last time, we talked about error and noise, and these are two notions that relate the learning problem to practical situations.', 'start': 19.149, 'duration': 9.423}, {'end': 39.021, 'text': 'So, in the case of error measures, we realize that, in order to specify the error that is caused by your hypothesis,', 'start': 30.538, 'duration': 8.483}, {'end': 46.404, 'text': 'we should try to estimate the cost of using your h instead of f, which should have been used in the first place.', 'start': 39.021, 'duration': 7.383}, {'end': 52.847, 'text': 'And that is something the user can specify of the price to pay when they use h instead of f.', 'start': 47.124, 'duration': 5.723}, {'end': 56.068, 'text': 'And that is the principled way of defining an error measure.', 'start': 52.847, 'duration': 3.221}, {'end': 60.835, 'text': 'In the absence of that, which happens for quite a bit of time,', 'start': 57.529, 'duration': 3.306}], 'summary': 'Discussion on error measures in learning and estimating the cost of using a hypothesis instead of the ideal function.', 'duration': 60.152, 'max_score': 0.683, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU683.jpg'}], 'start': 0.683, 'title': 'Error measures and probabilistic learning', 'summary': 'Discusses defining error measures in learning problems, estimating cost, evaluating error on training and out-of-sample points, and emphasizes noisy targets affected by probability distribution, transitional and joint probability in training examples.', 'chapters': [{'end': 109.535, 'start': 0.683, 'title': 'Error measures and learning problem', 'summary': 'Discusses the principled way of defining error measures in relation to learning problems, including the estimation of cost and the evaluation of error on training and out-of-sample points.', 'duration': 108.852, 'highlights': ['The principled way of defining an error measure is by estimating the cost of using the hypothesis instead of the target function, allowing the user to specify the price to pay when using h instead of f.', 'The error between the performance of the hypothesis and the target function on a particular point can be plugged into different error quantities, such as in-sample error and out-of-sample error, and averaged to estimate the error on training points and out-of-sample points.', 'In the case of the training set, the error on the training points is estimated and averaged with respect to the N examples.', 'Analytic properties or practical properties of optimization are used to choose error measures in the absence of user-specified cost for using h instead of f.']}, {'end': 252.368, 'start': 109.535, 'title': 'Probabilistic learning and noisy targets', 'summary': 'Discusses the concept of noisy targets in machine learning, where the target variable is affected by a probability distribution rather than being a deterministic function, and emphasizes the consideration of transitional probability and joint probability distribution in generating training examples.', 'duration': 142.833, 'highlights': ['The target variable in machine learning may not be a deterministic function, but rather a probabilistic distribution affected by the input variable, as seen in examples like credit behavior and movie ratings.', 'The transitional probability from x to y and the joint probability distribution, which accommodates noise, play crucial roles in generating training examples and need to be considered when computing expected values for errors.', 'In the case of noisy targets, the joint probability distribution P(X) * P(Y|X) determines the independence between different pairs of input and output variables, requiring the consideration of probability with respect to both x and y when computing expected values for errors.']}], 'duration': 251.685, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU683.jpg', 'highlights': ['The error between the performance of the hypothesis and the target function on a particular point can be plugged into different error quantities, such as in-sample error and out-of-sample error, and averaged to estimate the error on training points and out-of-sample points.', 'The principled way of defining an error measure is by estimating the cost of using the hypothesis instead of the target function, allowing the user to specify the price to pay when using h instead of f.', 'The target variable in machine learning may not be a deterministic function, but rather a probabilistic distribution affected by the input variable, as seen in examples like credit behavior and movie ratings.', 'The transitional probability from x to y and the joint probability distribution, which accommodates noise, play crucial roles in generating training examples and need to be considered when computing expected values for errors.']}, {'end': 864.611, 'segs': [{'end': 606.193, 'src': 'embed', 'start': 578.25, 'weight': 2, 'content': [{'end': 580.911, 'text': 'And therefore, this guarantee is no guarantee at all.', 'start': 578.25, 'duration': 2.661}, {'end': 590.92, 'text': 'If we can replace M with another quantity and justify that, and that quantity is not infinite, even if the hypothesis set is infinite,', 'start': 582.092, 'duration': 8.828}, {'end': 601.99, 'text': 'then we are in business and we can start talking about the feasibility of learning in an actual model and be able to establish the notion in a way that we can apply to a real situation.', 'start': 590.92, 'duration': 11.07}, {'end': 603.311, 'text': "So that's the plan.", 'start': 602.731, 'duration': 0.58}, {'end': 606.193, 'text': 'So we are talking about M.', 'start': 604.972, 'duration': 1.221}], 'summary': 'Replacing m with another quantity can establish feasibility of learning in a model.', 'duration': 27.943, 'max_score': 578.25, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU578250.jpg'}, {'end': 704.702, 'src': 'embed', 'start': 677.523, 'weight': 1, 'content': [{'end': 684.847, 'text': 'Why is that? Because your learning algorithm is free to pick whichever hypothesis it wants based on the examples.', 'start': 677.523, 'duration': 7.324}, {'end': 695.775, 'text': 'So if you tell me that the probability of any of the bad events is small, then whichever hypothesis your algorithm picks, they will be OK.', 'start': 685.528, 'duration': 10.247}, {'end': 697.757, 'text': 'And I want that guarantee to be there.', 'start': 696.296, 'duration': 1.461}, {'end': 704.702, 'text': "So let's try to understand the probability of the B1 or B2 or BM.", 'start': 698.757, 'duration': 5.945}], 'summary': 'Learning algorithm picks hypothesis based on examples, probability of bad events is small', 'duration': 27.179, 'max_score': 677.523, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU677523.jpg'}, {'end': 764.184, 'src': 'embed', 'start': 728.012, 'weight': 0, 'content': [{'end': 729.373, 'text': 'There could be many situations.', 'start': 728.012, 'duration': 1.361}, {'end': 737.2, 'text': 'Now, the point of the bound is that we would like to make that statement regardless of the correlations between the events.', 'start': 730.373, 'duration': 6.827}, {'end': 747.37, 'text': 'And therefore we use the union bound, which actually bounds it by the total area of the first one plus the total area of the second one, et cetera,', 'start': 738.461, 'duration': 8.909}, {'end': 748.511, 'text': 'as if they were disjoint.', 'start': 747.37, 'duration': 1.141}, {'end': 752.634, 'text': 'That will always hold, regardless of the level of overlap.', 'start': 749.631, 'duration': 3.003}, {'end': 754.876, 'text': 'But you can see that this is a poor bound.', 'start': 753.014, 'duration': 1.862}, {'end': 764.184, 'text': "Because in this case we are estimating it to be about 3 times the area when it's actually closer to just the area because the overlap is so significant.", 'start': 755.556, 'duration': 8.628}], 'summary': 'The union bound is a poor estimation, overestimating by about 3 times due to significant overlap.', 'duration': 36.172, 'max_score': 728.012, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU728012.jpg'}], 'start': 252.368, 'title': 'Training vs testing', 'summary': 'Introduces the realistic relation between training and testing, emphasizing the mathematical framework, key quantities, and the notion of breakpoint leading to vc dimension. it also explains the importance of learning material and the need to replace m with a better quantity, aiming to handle the overlap situation in practical scenarios.', 'chapters': [{'end': 321.552, 'start': 252.368, 'title': 'Training vs testing: theory and realistic relation', 'summary': 'Introduces the concept of relating training to testing in-sample and out-of-sample in a realistic way, focusing on mathematical framework, key quantities, and the notion of breakpoint leading to vc dimension.', 'duration': 69.184, 'highlights': ['The chapter introduces the concept of relating training to testing in-sample and out-of-sample in a realistic way, focusing on mathematical framework, key quantities, and the notion of breakpoint leading to VC dimension.', 'The lecture will cover the theory track for three lectures, followed by another theory lecture on a related but different topic.', 'Introducing the key notion, the breakpoint, which will later result in the VC dimension, the main notion in the theory of learning.', 'The outline includes spending time talking about training versus testing, putting the mathematical framework that describes what is training versus testing, and introducing quantities that will be mathematically helpful in characterizing that relationship.']}, {'end': 864.611, 'start': 322.293, 'title': 'Training vs testing: key concepts', 'summary': 'Explains the concepts of training and testing in machine learning, emphasizing the importance of learning material and the need to replace m with a better quantity, while also discussing the origin of m and the goal of improving it to handle the overlap situation in practical scenarios.', 'duration': 542.318, 'highlights': ['The distinction between training and testing is explained with the analogy of practice problems being the training set and the final exam being the test, emphasizing the goal of learning the material and the importance of fixing hypotheses before taking the exam.', 'The mathematical descriptions of testing and training are presented, highlighting how they track understanding of the material and the impact of exploration on performance, leading to the goal of replacing M with a more suitable quantity to address the overlap situation in practical scenarios.', 'The concept and origin of M are discussed, along with the goal of improving it to handle the overlap situation in practical scenarios, emphasizing the need to abstract from the hypothesis set a quantity that characterizes the overlaps and achieves a good bound without intricate analysis.']}], 'duration': 612.243, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU252368.jpg', 'highlights': ['The chapter introduces the concept of relating training to testing in a realistic way, focusing on mathematical framework, key quantities, and the notion of breakpoint leading to VC dimension.', 'The distinction between training and testing is explained with the analogy of practice problems being the training set and the final exam being the test, emphasizing the goal of learning the material and the importance of fixing hypotheses before taking the exam.', 'The mathematical descriptions of testing and training are presented, highlighting how they track understanding of the material and the impact of exploration on performance, leading to the goal of replacing M with a more suitable quantity to address the overlap situation in practical scenarios.', 'Introducing the key notion, the breakpoint, which will later result in the VC dimension, the main notion in the theory of learning.']}, {'end': 2163.171, 'segs': [{'end': 1950.003, 'src': 'embed', 'start': 1901.06, 'weight': 0, 'content': [{'end': 1903.903, 'text': 'If I tell you what is M for the perceptron, infinity.', 'start': 1901.06, 'duration': 2.843}, {'end': 1904.744, 'text': 'And then you go home.', 'start': 1904.143, 'duration': 0.601}, {'end': 1907.186, 'text': 'What is the gross function of the perceptron?', 'start': 1905.625, 'duration': 1.561}, {'end': 1912.952, 'text': 'You have to tell me what is the gross function at n equals 1, what is at n equals 2, at n equals 3, at n equals 4..', 'start': 1907.206, 'duration': 5.746}, {'end': 1913.573, 'text': "It's a whole function.", 'start': 1912.952, 'duration': 0.621}, {'end': 1917.32, 'text': "So you say, one and two, it's easy.", 'start': 1914.758, 'duration': 2.562}, {'end': 1922.485, 'text': "Let's start with n equals 3.", 'start': 1917.381, 'duration': 5.104}, {'end': 1923.706, 'text': "So I'm choosing three points.", 'start': 1922.485, 'duration': 1.221}, {'end': 1928.091, 'text': 'And I chose them wisely, so that I can maximize the number of dichotomies.', 'start': 1924.347, 'duration': 3.744}, {'end': 1938.599, 'text': "And now I'm asking myself, what is the value of the gross function for the perceptron for the value N equals 3? Well, it's not that difficult.", 'start': 1929.112, 'duration': 9.487}, {'end': 1942.48, 'text': 'You can say, I can actually get everything there is to get.', 'start': 1938.619, 'duration': 3.861}, {'end': 1950.003, 'text': 'Why? Because I can have my line here, or I can have my line here, or I can have my line here.', 'start': 1942.84, 'duration': 7.163}], 'summary': 'The gross function of the perceptron for n=3 maximizes dichotomies.', 'duration': 48.943, 'max_score': 1901.06, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU1901060.jpg'}, {'end': 2036.587, 'src': 'embed', 'start': 2014.728, 'weight': 3, 'content': [{'end': 2023.3, 'text': 'And indeed, you can, with authority, answer the question that the growth function for this case, m at n equals 3, is 8.', 'start': 2014.728, 'duration': 8.572}, {'end': 2028.182, 'text': "Now let's see if we are still in luck when we go to n equals 4.", 'start': 2023.3, 'duration': 4.882}, {'end': 2032.825, 'text': "What is the gross function for 4 points? We'll choose the point in general position again.", 'start': 2028.182, 'duration': 4.643}, {'end': 2036.587, 'text': 'We are not going to have any collinearity in order to maximize our chances.', 'start': 2033.505, 'duration': 3.082}], 'summary': 'The growth function for m at n=3 is 8.', 'duration': 21.859, 'max_score': 2014.728, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2014728.jpg'}, {'end': 2132.588, 'src': 'embed', 'start': 2105.079, 'weight': 4, 'content': [{'end': 2108.603, 'text': "Because if it's the maximum possible, then we are declaring perceptrons are as strong as can be.", 'start': 2105.079, 'duration': 3.524}, {'end': 2109.764, 'text': 'So now they break.', 'start': 2109.003, 'duration': 0.761}, {'end': 2112.026, 'text': 'And they are limited.', 'start': 2111.005, 'duration': 1.021}, {'end': 2117.591, 'text': "And if I pick another model which, let's say, just for the extreme case, the set of all hypotheses,", 'start': 2112.686, 'duration': 4.905}, {'end': 2119.833, 'text': 'what would be the growth function for the set of all hypotheses?', 'start': 2117.591, 'duration': 2.242}, {'end': 2122.659, 'text': 'It would be 2 to the n, because I can generate anything.', 'start': 2120.717, 'duration': 1.942}, {'end': 2129.545, 'text': 'So now, according to this measure that I just introduced, the set of all hypotheses is stronger than the perceptrons.', 'start': 2123.56, 'duration': 5.985}, {'end': 2131.047, 'text': 'Satisfactory result.', 'start': 2130.066, 'duration': 0.981}, {'end': 2132.588, 'text': 'Simple, but satisfactory.', 'start': 2131.567, 'duration': 1.021}], 'summary': 'According to the measure introduced, the set of all hypotheses is stronger than perceptrons, with a growth function of 2 to the n.', 'duration': 27.509, 'max_score': 2105.079, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2105079.jpg'}], 'start': 865.672, 'title': 'Perceptron analysis and growth', 'summary': 'Delves into perceptron analysis, covering the target function, hypothesis, e in and e out, dichotomies, growth function, and limitations, with specific growth function values provided for n=3 and n=4.', 'chapters': [{'end': 908.43, 'start': 865.672, 'title': 'Perceptron example analysis', 'summary': 'Discusses the example of the perceptron, focusing on the target function and the hypothesis, using binary classification and emphasizing the concept of a badly performing hypothesis.', 'duration': 42.758, 'highlights': ['The target function for a perceptron returns plus 1 for some inputs and minus 1 for others, while the hypothesis, a perceptron, represents a badly performing hypothesis. (Relevance: 5)', 'The chapter emphasizes the use of binary classification, with plus 1 versus minus 1 for the target and hypothesis, and the concept of agreeing versus disagreeing in this context. (Relevance: 3)']}, {'end': 1120.679, 'start': 908.51, 'title': 'Perceptron e in and e out', 'summary': 'Explains the concepts of e in and e out in perceptrons, emphasizing the overlap and tracking of these quantities as hypotheses change.', 'duration': 212.169, 'highlights': ['The out-of-sample error for a hypothesis is determined by the areas of disagreement, where one area is labeled as +1 and the other as -1.', 'The in-sample error (En) is calculated by the fraction of misclassified points compared to all samples, illustrating the concept of E in.', 'The change in E out and E in when transitioning between hypotheses is small, as indicated by the small area and low probability of data points falling in this region.']}, {'end': 1444.481, 'start': 1120.699, 'title': 'Replacing m with a new quantity', 'summary': 'Introduces a new quantity to replace m, which counts the number of hypotheses on a restricted set of points, thus aiding in characterizing the complexity of any model used. this new quantity, called dichotomies, is motivated to replace m and is based on counting the number of patterns of red and blue for a fixed constellation of points.', 'duration': 323.782, 'highlights': ["The new quantity, called dichotomies, aims to replace M and is based on counting the number of patterns of red and blue for a fixed constellation of points. Introduces the new quantity 'dichotomies' to replace M, which counts the number of patterns of red and blue for a fixed constellation of points, motivating its use to characterize the complexity of any model.", 'The new quantity aims to count the number of hypotheses on a restricted set of points, aiding in characterizing the complexity of any model used. The new quantity aims to count the number of hypotheses on a restricted set of points, thus aiding in characterizing the complexity of any model used.', 'The chapter emphasizes the importance of understanding the new quantity well before proving its ability to replace M. Emphasizes the importance of understanding the new quantity well before proving its ability to replace M, setting the stage for a future lecture.']}, {'end': 1877.543, 'start': 1444.481, 'title': 'The gross function and dichotomies', 'summary': 'Explains the concept of dichotomies and the gross function, which counts the maximum dichotomies possible on any n points, providing an upper bound of 2 to the power of n for the number of dichotomies.', 'duration': 433.062, 'highlights': ['The gross function counts the most dichotomies using the hypothesis set on any n points, providing an upper bound of 2 to the n for the number of dichotomies. The gross function, denoted as m, is defined as the maximum number of dichotomies that can be obtained using the hypothesis set on any n points, with an upper bound of 2 to the power of n for the number of dichotomies.', 'M is the maximum number of dichotomies that can be obtained using the hypothesis set. The quantity M, which represents the maximum number of dichotomies that can be obtained using the hypothesis set, is being replaced by the gross function m.', 'The growth function is a function of n and is denoted as mh, representing the dependency on the hypothesis set. The growth function, denoted as mh, is a function of n and is dependent on the hypothesis set, providing a more complex representation compared to the previous quantity M.']}, {'end': 2163.171, 'start': 1878.104, 'title': 'Growth function of perceptrons', 'summary': 'Discusses the growth function of perceptrons, showing that for n=3, the growth function is 8, and for n=4, it is 14, indicating the limitations of perceptrons compared to the set of all hypotheses with a growth function of 2^n.', 'duration': 285.067, 'highlights': ['The growth function for perceptrons at n=4 is 14, indicating the limitations of perceptrons compared to the set of all hypotheses with a growth function of 2^n.', 'At n=3, the growth function for perceptrons is 8, demonstrating the limited capability of perceptrons compared to the set of all hypotheses with a growth function of 2^n.', 'The chapter explains the concept of the growth function using examples, highlighting the challenges of computing the growth function for perceptrons with more than 4 points.']}], 'duration': 1297.499, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU865672.jpg', 'highlights': ['The growth function for perceptrons at n=4 is 14, indicating the limitations of perceptrons compared to the set of all hypotheses with a growth function of 2^n.', 'At n=3, the growth function for perceptrons is 8, demonstrating the limited capability of perceptrons compared to the set of all hypotheses with a growth function of 2^n.', 'The gross function, denoted as m, is defined as the maximum number of dichotomies that can be obtained using the hypothesis set on any n points, with an upper bound of 2 to the power of n for the number of dichotomies.', 'The new quantity aims to count the number of hypotheses on a restricted set of points, thus aiding in characterizing the complexity of any model used.', 'The out-of-sample error for a hypothesis is determined by the areas of disagreement, where one area is labeled as +1 and the other as -1.']}, {'end': 2519.14, 'segs': [{'end': 2216.399, 'src': 'embed', 'start': 2191.857, 'weight': 0, 'content': [{'end': 2200.704, 'text': 'From a point on, which we are going to call A, this is the parameter that decides one hypothesis versus the other in this particular hypothesis set.', 'start': 2191.857, 'duration': 8.847}, {'end': 2204.366, 'text': 'All the points that are bigger go to plus 1.', 'start': 2201.844, 'duration': 2.522}, {'end': 2206.007, 'text': 'All the points that are smaller go to minus 1.', 'start': 2204.366, 'duration': 1.641}, {'end': 2208.789, 'text': 'I call it positive ray, because here is the ray.', 'start': 2206.007, 'duration': 2.782}, {'end': 2211.091, 'text': 'Very simple hypothesis set.', 'start': 2209.81, 'duration': 1.281}, {'end': 2216.399, 'text': 'Now, in order to define the growth function, I need a bunch of points.', 'start': 2212.898, 'duration': 3.501}], 'summary': 'In a simple hypothesis set, points bigger than a go to +1, smaller go to -1, defining the growth function.', 'duration': 24.542, 'max_score': 2191.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2191857.jpg'}, {'end': 2287.786, 'src': 'embed', 'start': 2259.3, 'weight': 4, 'content': [{'end': 2262.702, 'text': 'That is the parameter that gives me one hypothesis versus the other.', 'start': 2259.3, 'duration': 3.402}, {'end': 2273.178, 'text': 'So formally? The hypothesis set is a set from the reals to And I can actually find an analytic formula here.', 'start': 2263.162, 'duration': 10.016}, {'end': 2280.322, 'text': "If you want an analytic formula, you remember the sign, this thing? If you apply it, that's exactly what I described.", 'start': 2273.218, 'duration': 7.104}, {'end': 2287.786, 'text': 'Now we ask ourselves, what is the growth function? Here is a very simple argument.', 'start': 2280.982, 'duration': 6.804}], 'summary': 'Comparing hypotheses using analytic formulas and growth functions.', 'duration': 28.486, 'max_score': 2259.3, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2259300.jpg'}], 'start': 2163.171, 'title': 'Growth functions', 'summary': 'Discusses positive rays and growth function, including determining growth function by varying parameters and the concepts of dichotomies and growth functions in relation to the number of line segments and intervals. it also provides specific growth function values for different scenarios, such as the gross function for choosing two intervals out of n plus 1 segments.', 'chapters': [{'end': 2287.786, 'start': 2163.171, 'title': 'Positive rays and growth function', 'summary': 'Discusses the concept of positive rays as a simple hypothesis set defined on the real line and the process of determining the growth function by varying the parameter to obtain different patterns on a set of n points.', 'duration': 124.615, 'highlights': ['The chapter introduces the concept of positive rays as a simple hypothesis set defined on the real line, where points greater than a certain parameter go to plus 1 and points smaller than the parameter go to minus 1. This simple hypothesis set, known as positive rays, categorizes points on the real line into plus 1 or minus 1 based on a parameter, providing a fundamental model for understanding growth functions.', 'The process of determining the growth function involves generating n points and applying the hypothesis to observe different patterns on these points by varying the parameter. By generating and applying n points to the hypothesis set, the chapter explains the process of determining the growth function by observing the different patterns that result from varying the parameter, providing insights into the analytic formula for the growth function.', "An analytic formula for the growth function is presented, utilizing the sign function to describe the variation of patterns on the set of n points. The chapter delves into an analytic formula for the growth function, utilizing the sign function to illustrate the variation of patterns on a set of n points, offering a deeper understanding of the growth function's mathematical representation."]}, {'end': 2519.14, 'start': 2289.086, 'title': 'Dichotomies and growth functions', 'summary': 'Discusses the concept of dichotomies and growth functions in relation to the number of line segments and intervals, finding that the growth function for n points is n plus 1 and the gross function for choosing two intervals out of n plus 1 segments is n plus 1 choose 2.', 'duration': 230.054, 'highlights': ['The growth function for n points is N plus 1. Regardless of what n is, the number of dichotomies in n points is exactly N plus 1.', 'The gross function for choosing two intervals out of N plus 1 segments is N plus 1 choose 2. The number of ways to pick two segments out of N plus 1 segments is N plus 1 choose 2, with an additional one needed to account for the case where all points are red.', 'The concept of positive intervals is more powerful than the previous one. Positive intervals are more powerful than positive rays as they allow for a one-to-one mapping between dichotomies and the choice of two intervals, leading to a gross function of N plus 1 choose 2.']}], 'duration': 355.969, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2163171.jpg', 'highlights': ['The concept of positive rays categorizes points on the real line into plus 1 or minus 1 based on a parameter, providing a fundamental model for understanding growth functions.', 'The chapter explains the process of determining the growth function by observing the different patterns that result from varying the parameter, providing insights into the analytic formula for the growth function.', "An analytic formula for the growth function is presented, utilizing the sign function to illustrate the variation of patterns on a set of n points, offering a deeper understanding of the growth function's mathematical representation.", 'The growth function for n points is N plus 1, and the number of dichotomies in n points is exactly N plus 1.', 'The number of ways to pick two segments out of N plus 1 segments is N plus 1 choose 2, with an additional one needed to account for the case where all points are red.', 'Positive intervals are more powerful than positive rays as they allow for a one-to-one mapping between dichotomies and the choice of two intervals, leading to a gross function of N plus 1 choose 2.']}, {'end': 3007.824, 'segs': [{'end': 2578.049, 'src': 'embed', 'start': 2551.63, 'weight': 3, 'content': [{'end': 2559.935, 'text': 'So if you look at the values of x, at which the hypothesis is plus 1, this has to be a convex region, any convex region.', 'start': 2551.63, 'duration': 8.305}, {'end': 2566.878, 'text': 'A convex region is a region where, if you pick any two points within the region,', 'start': 2562.834, 'duration': 4.044}, {'end': 2571.002, 'text': 'the entirety of the line segment connecting them lies within the region.', 'start': 2566.878, 'duration': 4.124}, {'end': 2571.923, 'text': "That's the definition.", 'start': 2571.202, 'duration': 0.721}, {'end': 2576.107, 'text': 'So this is my artwork for a convex region.', 'start': 2572.824, 'duration': 3.283}, {'end': 2578.049, 'text': 'You take any two points.', 'start': 2576.527, 'duration': 1.522}], 'summary': 'A convex region is defined as a region where the line segment connecting any two points lies entirely within the region.', 'duration': 26.419, 'max_score': 2551.63, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2551630.jpg'}, {'end': 2716.481, 'src': 'embed', 'start': 2686.755, 'weight': 0, 'content': [{'end': 2689.816, 'text': "If you succeed in that, then you don't care about this cloud.", 'start': 2686.755, 'duration': 3.061}, {'end': 2693.156, 'text': 'The other one will count, because you are taking the maximum.', 'start': 2691.056, 'duration': 2.1}, {'end': 2695.177, 'text': "So here's the way to do it.", 'start': 2693.756, 'duration': 1.421}, {'end': 2701.498, 'text': 'Take a circle, and put your points on the perimeter of that circle.', 'start': 2696.417, 'duration': 5.081}, {'end': 2708.917, 'text': 'Now, I maintain that you can get any dichotomy you want on these points.', 'start': 2704.795, 'duration': 4.122}, {'end': 2713.079, 'text': 'What is the argument? Well, pick your favorite one.', 'start': 2710.538, 'duration': 2.541}, {'end': 2716.481, 'text': 'I have a bunch of blues and a bunch of red.', 'start': 2714.66, 'duration': 1.821}], 'summary': 'Points on a circle can form any dichotomy.', 'duration': 29.726, 'max_score': 2686.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2686755.jpg'}, {'end': 2884.162, 'src': 'embed', 'start': 2850.465, 'weight': 1, 'content': [{'end': 2853.166, 'text': 'So sometimes that thing will be too much.', 'start': 2850.465, 'duration': 2.701}, {'end': 2858.627, 'text': 'But in general, you can see the trend that is more sophisticated, you get a bigger growth function.', 'start': 2853.486, 'duration': 5.141}, {'end': 2863.008, 'text': "Now let's go back to the big picture to see where that growth function will fit.", 'start': 2859.867, 'duration': 3.141}, {'end': 2868.769, 'text': 'Remember this inequality? Oh yes, we have seen it.', 'start': 2865.169, 'duration': 3.6}, {'end': 2869.97, 'text': 'We have seen it often.', 'start': 2868.909, 'duration': 1.061}, {'end': 2871.45, 'text': 'We are tired of it.', 'start': 2870.07, 'duration': 1.38}, {'end': 2875.651, 'text': 'What we are trying to do is replace M.', 'start': 2873.37, 'duration': 2.281}, {'end': 2884.162, 'text': 'And we decided to replace it with the gross function, m.', 'start': 2878.298, 'duration': 5.864}], 'summary': 'Replacing m with gross function m leads to bigger growth.', 'duration': 33.697, 'max_score': 2850.465, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2850465.jpg'}, {'end': 2964.505, 'src': 'embed', 'start': 2939.749, 'weight': 7, 'content': [{'end': 2946.838, 'text': 'And if you are patient enough, or if your customer has enough data, which will be an enormous amount of data, you will eventually get this to win.', 'start': 2939.749, 'duration': 7.089}, {'end': 2951.364, 'text': 'And you will get the probability to be diminishingly small, which means that you can generalize.', 'start': 2947.319, 'duration': 4.045}, {'end': 2960.582, 'text': "That's a very attractive observation, because now all you need to do is just declare that this is polynomial, and you are in business.", 'start': 2952.975, 'duration': 7.607}, {'end': 2964.505, 'text': "We saw that it's not that easy to evaluate this explicitly.", 'start': 2961.502, 'duration': 3.003}], 'summary': 'With an enormous amount of data, the probability can diminish to generalize, making it an attractive observation for polynomial declaration.', 'duration': 24.756, 'max_score': 2939.749, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2939749.jpg'}, {'end': 3017.63, 'src': 'embed', 'start': 2983.379, 'weight': 2, 'content': [{'end': 2985.119, 'text': 'But at least we know we can do it.', 'start': 2983.379, 'duration': 1.74}, {'end': 2994.343, 'text': "If you're given enough examples, you will be able to generalize from a finite set, albeit big, to the general space, with a probability assurance.", 'start': 2985.26, 'duration': 9.083}, {'end': 2996.104, 'text': "So that's pretty good.", 'start': 2995.404, 'duration': 0.7}, {'end': 2999.438, 'text': "I'm happy that this is the case.", 'start': 2998.237, 'duration': 1.201}, {'end': 3005.462, 'text': 'So maybe we can, as I mentioned, just prove that mh is polynomial, the gross function is polynomial.', 'start': 3000.059, 'duration': 5.403}, {'end': 3007.263, 'text': 'Can we do that? Maybe we can.', 'start': 3005.702, 'duration': 1.561}, {'end': 3007.824, 'text': 'Maybe we cannot.', 'start': 3007.283, 'duration': 0.541}, {'end': 3011.546, 'text': "So now here's the key notion that will enable us to do that.", 'start': 3009.305, 'duration': 2.241}, {'end': 3017.63, 'text': 'We are going to define what is called a breakpoint.', 'start': 3014.688, 'duration': 2.942}], 'summary': 'Generalizing from examples to the general space with probability assurance, potentially proving polynomial functions using the notion of breakpoints.', 'duration': 34.251, 'max_score': 2983.379, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2983379.jpg'}], 'start': 2519.9, 'title': 'Growth functions in learning', 'summary': 'Explores growth functions in learning, including their limitations and trade-offs, and emphasizes the potential of polynomial growth functions for feasible learning.', 'chapters': [{'end': 2716.481, 'start': 2519.9, 'title': 'Convex sets and growth functions', 'summary': 'Explores the concept of convex sets as hypotheses, demonstrating their variety and limitations, and discusses how to maximize the growth function by strategically placing points on a circle to achieve all possible dichotomies.', 'duration': 196.581, 'highlights': ['Convex sets are more powerful than linear hypotheses as they form quadratic regions, allowing for a variety of valid hypotheses. The quadratic nature of convex sets makes them more powerful than linear hypotheses.', 'A convex region is defined as a region where any two points within it have a line segment connecting them that lies entirely within the region, demonstrating the criteria for a valid hypothesis. The definition of a convex region is that any two points within it have a line segment connecting them that lies entirely within the region.', 'Strategically placing points on the perimeter of a circle allows for the attainment of all possible dichotomies using convex regions, maximizing the growth function. Placing points on the perimeter of a circle enables the attainment of all possible dichotomies using convex regions, maximizing the growth function.']}, {'end': 3007.824, 'start': 2716.861, 'title': 'Growth functions in learning', 'summary': 'Discusses the growth functions in learning, including the limitations of defining the gross function as the maximum and the trade-off between tightness of bounds and generality, with a focus on the potential of polynomial growth functions for feasible learning.', 'duration': 290.963, 'highlights': ['The gross function is 2 to the n, demonstrating the limitations of defining it as the maximum and the potential for learning when points are not on the perimeter of a circle.', 'The trade-off between tightness of bounds and generality is emphasized, with the understanding that a general result applicable to all situations may not be tightly bound in any given situation.', 'The comparison of growth functions for positive rays, positive intervals, and convex sets highlights the trend of increasing sophistication leading to a bigger growth function.', 'Replacing the constant M with the gross function m, particularly when it is polynomial, is presented as a feasible approach for learning, with the potential for generalization and probability assurance.', 'The significance of declaring a hypothesis set with a polynomial growth function as feasible for learning, given enough examples, is emphasized as a positive outcome.']}], 'duration': 487.924, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU2519900.jpg', 'highlights': ['Placing points on the perimeter of a circle enables the attainment of all possible dichotomies using convex regions, maximizing the growth function.', 'A convex region is defined as a region where any two points within it have a line segment connecting them that lies entirely within the region, demonstrating the criteria for a valid hypothesis.', 'Convex sets are more powerful than linear hypotheses as they form quadratic regions, allowing for a variety of valid hypotheses.', 'The comparison of growth functions for positive rays, positive intervals, and convex sets highlights the trend of increasing sophistication leading to a bigger growth function.', 'The significance of declaring a hypothesis set with a polynomial growth function as feasible for learning, given enough examples, is emphasized as a positive outcome.', 'Replacing the constant M with the gross function m, particularly when it is polynomial, is presented as a feasible approach for learning, with the potential for generalization and probability assurance.', 'The trade-off between tightness of bounds and generality is emphasized, with the understanding that a general result applicable to all situations may not be tightly bound in any given situation.', 'The gross function is 2 to the n, demonstrating the limitations of defining it as the maximum and the potential for learning when points are not on the perimeter of a circle.']}, {'end': 3625.185, 'segs': [{'end': 3200.5, 'src': 'embed', 'start': 3174.977, 'weight': 0, 'content': [{'end': 3182.949, 'text': 'We can plug in for N and we ask ourselves when do I get to the point where I no longer get 2 to the n numerically for a particular value?', 'start': 3174.977, 'duration': 7.972}, {'end': 3186.346, 'text': 'What is the breakpoint here?', 'start': 3185.485, 'duration': 0.861}, {'end': 3188.008, 'text': 'n equals 1..', 'start': 3187.247, 'duration': 0.761}, {'end': 3188.748, 'text': 'I get 1 plus 1.', 'start': 3188.008, 'duration': 0.74}, {'end': 3189.409, 'text': "That's 2.", 'start': 3188.748, 'duration': 0.661}, {'end': 3190.83, 'text': 'That also happens to be 2 to the 1.', 'start': 3189.409, 'duration': 1.421}, {'end': 3194.954, 'text': '2 n plus 1 is 3.', 'start': 3190.83, 'duration': 4.124}, {'end': 3196.856, 'text': "Oh, that's less than 4.", 'start': 3194.954, 'duration': 1.902}, {'end': 3200.5, 'text': 'So 2 must be a breakpoint.', 'start': 3196.856, 'duration': 3.644}], 'summary': 'The breakpoint is 2, as 2^1 is equal to 2 and 2^2 is greater than 3.', 'duration': 25.523, 'max_score': 3174.977, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3174977.jpg'}, {'end': 3325.301, 'src': 'embed', 'start': 3292.172, 'weight': 2, 'content': [{'end': 3302.397, 'text': "So what is the main result? The main result is that, first part will be, if you don't have a breakpoint, I have news for you.", 'start': 3292.172, 'duration': 10.225}, {'end': 3306.776, 'text': 'The gross function is 2 to the n.', 'start': 3304.335, 'duration': 2.441}, {'end': 3308.176, 'text': "Yes, that's the definition.", 'start': 3306.776, 'duration': 1.4}, {'end': 3308.816, 'text': 'Thank you.', 'start': 3308.256, 'duration': 0.56}, {'end': 3311.637, 'text': 'So that cannot possibly be the main result.', 'start': 3309.897, 'duration': 1.74}, {'end': 3316.739, 'text': 'So what is the main result? The main result is that you have a breakpoint.', 'start': 3311.657, 'duration': 5.082}, {'end': 3325.301, 'text': 'Any breakpoint, 1, 5, 7,000, just tell me that there is a breakpoint.', 'start': 3318.399, 'duration': 6.902}], 'summary': 'Main result: presence of breakpoint, any value, 1, 5, 7,000', 'duration': 33.129, 'max_score': 3292.172, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3292172.jpg'}, {'end': 3511.303, 'src': 'heatmap', 'start': 3415.839, 'weight': 1, 'content': [{'end': 3417.462, 'text': 'On those 100 points.', 'start': 3415.839, 'duration': 1.623}, {'end': 3421.93, 'text': 'for any choice of three guys, you cannot possibly have all possible combinations.', 'start': 3417.462, 'duration': 4.468}, {'end': 3426.277, 'text': 'And any three points, all 100 choose three of them.', 'start': 3423.032, 'duration': 3.245}, {'end': 3430.083, 'text': 'The combinatorial restriction is enormous.', 'start': 3427.441, 'duration': 2.642}, {'end': 3435.387, 'text': 'And you will end up losing possible dichotomies in draws because of that restriction.', 'start': 3430.724, 'duration': 4.663}, {'end': 3440.371, 'text': "And therefore, the thing that used to be 2 to the n if it's unrestricted will collapse to polynomial.", 'start': 3435.827, 'duration': 4.544}, {'end': 3445.094, 'text': "Let's take a puzzle and try to compute this in a particular case.", 'start': 3441.551, 'duration': 3.543}, {'end': 3445.875, 'text': 'Here is the puzzle.', 'start': 3445.134, 'duration': 0.741}, {'end': 3448.937, 'text': 'We have only three points.', 'start': 3447.896, 'duration': 1.041}, {'end': 3456.215, 'text': 'And for this hypothesis set, I am telling you that the breakpoint is 2.', 'start': 3450.418, 'duration': 5.797}, {'end': 3462.278, 'text': 'So you cannot get all possible four dichotomies on any two points.', 'start': 3456.215, 'duration': 6.063}, {'end': 3470.001, 'text': 'If you put x1 and x2, you cannot get minus 1, minus 1, minus 1, plus 1, plus 1, minus 1, and plus 1, plus 1, all of them.', 'start': 3462.418, 'duration': 7.583}, {'end': 3470.581, 'text': 'You cannot get them.', 'start': 3470.081, 'duration': 0.5}, {'end': 3471.802, 'text': 'One of them has to be missing.', 'start': 3470.701, 'duration': 1.101}, {'end': 3479.085, 'text': "So I'm asking you, given that this is the constraint, how many dichotomies can you get on three points?", 'start': 3473.302, 'duration': 5.783}, {'end': 3484.55, 'text': "You can see, this is what I'm trying to do, because I'm telling you that the restriction on 2 will give you.", 'start': 3480.029, 'duration': 4.521}, {'end': 3486.931, 'text': "If I didn't have the restriction, I would be putting 8..", 'start': 3484.55, 'duration': 2.381}, {'end': 3488.111, 'text': "So I'm just telling you that it gives.", 'start': 3486.931, 'duration': 1.18}, {'end': 3489.131, 'text': 'So how many do I get?', 'start': 3488.131, 'duration': 1}, {'end': 3498.734, 'text': "For visual clarity, I'm going to express them as either black or white circles, just for you to be able to, instead of writing minus 1 or plus 1..", 'start': 3490.312, 'duration': 8.422}, {'end': 3499.774, 'text': 'So this dichotomy is fine.', 'start': 3498.734, 'duration': 1.04}, {'end': 3500.754, 'text': "It doesn't violate anything.", 'start': 3499.794, 'duration': 0.96}, {'end': 3502.915, 'text': 'I have only one possibility.', 'start': 3500.934, 'duration': 1.981}, {'end': 3504.675, 'text': 'So we keep adding.', 'start': 3503.935, 'duration': 0.74}, {'end': 3507.561, 'text': 'Everything is fine.', 'start': 3506.9, 'duration': 0.661}, {'end': 3511.303, 'text': 'As a matter of fact, everything will remain fine until we get to 4.', 'start': 3507.881, 'duration': 3.422}], 'summary': 'Combinatorial restrictions limit possible dichotomies, collapsing to polynomial from 2^n if unrestricted.', 'duration': 95.464, 'max_score': 3415.839, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3415839.jpg'}], 'start': 3009.305, 'title': 'Breakpoints in hypothesis sets and combinatorial mathematics', 'summary': 'Explains breakpoints in hypothesis sets, their correspondence with complexity, and discusses breakpoints in combinatorial mathematics, exploring their impact on the number of possible dichotomies, and concluding that having a breakpoint guarantees the gross function to be polynomial in n.', 'chapters': [{'end': 3155.163, 'start': 3009.305, 'title': 'Understanding breakpoints in hypothesis sets', 'summary': 'Explains the concept of breakpoints in hypothesis sets, which are defined as the point at which the set fails to generate all possible dichotomies, and their correspondence with the complexity of the hypothesis set.', 'duration': 145.858, 'highlights': ['Breakpoint is defined as the point at which a hypothesis set fails to generate all possible dichotomies, and it has a correspondence with the complexity of the hypothesis set, with a higher breakpoint indicating a more capable hypothesis set.', 'The breakpoint characterizes the capability of the perceptrons, with a lower breakpoint indicating a limited capability of the hypothesis set.', 'Knowing the breakpoint of a hypothesis set allows for understanding its learning behavior without needing to know the input space.']}, {'end': 3625.185, 'start': 3155.223, 'title': 'Breakpoints in combinatorial mathematics', 'summary': 'Discusses the concept of breakpoints in combinatorial mathematics, exploring examples to identify breakpoints and their impact on the number of possible dichotomies, ultimately leading to the conclusion that having a breakpoint guarantees the gross function to be polynomial in n.', 'duration': 469.962, 'highlights': ['The main result is that having a breakpoint guarantees the gross function to be polynomial in N. This key result of the chapter emphasizes the significance of breakpoints in ensuring the polynomial nature of the gross function, providing a fundamental insight into combinatorial mathematics.', 'Identifying breakpoints through examples, such as 2 and 3, showcases the impact on the number of possible dichotomies, demonstrating the constraints imposed by breakpoints. The examples of 2 and 3 serve to illustrate how breakpoints restrict the number of possible dichotomies, highlighting the combinatorial limitations arising from breakpoints in mathematical computations.', 'Exploring the impact of breakpoints on the number of possible dichotomies on a specific set of points, revealing the substantial combinatorial restrictions imposed by breakpoints. The analysis of the impact of breakpoints on the number of possible dichotomies provides a tangible understanding of the constraints and limitations introduced by breakpoints, offering practical insights into combinatorial mathematics.']}], 'duration': 615.88, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3009305.jpg', 'highlights': ['Having a breakpoint guarantees the gross function to be polynomial in N.', 'Breakpoint characterizes the capability of perceptrons, with a lower breakpoint indicating limited capability.', 'Identifying breakpoints through examples showcases the impact on the number of possible dichotomies.']}, {'end': 3926.669, 'segs': [{'end': 3654.106, 'src': 'embed', 'start': 3626.226, 'weight': 2, 'content': [{'end': 3631.267, 'text': 'The only row you are going to be able to add to this table is this one.', 'start': 3626.226, 'duration': 5.041}, {'end': 3638.519, 'text': 'This is indeed the solution, and you can verify it at home.', 'start': 3635.558, 'duration': 2.961}, {'end': 3642.701, 'text': 'So now we know that indeed, the breakpoint is a very good restriction.', 'start': 3639.24, 'duration': 3.461}, {'end': 3650.004, 'text': 'And we are going in the next lecture to prove that it actually leads to a polynomial growth, which is the main result we want.', 'start': 3643.301, 'duration': 6.703}, {'end': 3654.106, 'text': 'Let me stop here, and we will take the questions after a short break.', 'start': 3650.564, 'duration': 3.542}], 'summary': 'The breakpoint is a very good restriction leading to polynomial growth.', 'duration': 27.88, 'max_score': 3626.226, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3626226.jpg'}, {'end': 3730.463, 'src': 'embed', 'start': 3704.022, 'weight': 1, 'content': [{'end': 3710.647, 'text': "What I'm going to do is that I'm going to apply a different approach to real-valued functions, which is the bias-variance tradeoff.", 'start': 3704.022, 'duration': 6.625}, {'end': 3719.294, 'text': "And it's a completely different approach from this one that will give us another angle on generalization that is particularly suitable for real-valued functions.", 'start': 3711.328, 'duration': 7.966}, {'end': 3726.32, 'text': "But the short answer is that if the function is not binary, there is a counterpart to what I'm saying that will work.", 'start': 3719.774, 'duration': 6.546}, {'end': 3730.463, 'text': "But it is significantly more technical than the one I'm developing.", 'start': 3726.58, 'duration': 3.883}], 'summary': 'Applying bias-variance tradeoff to real-valued functions for a different angle on generalization.', 'duration': 26.441, 'max_score': 3704.022, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3704022.jpg'}, {'end': 3799.03, 'src': 'embed', 'start': 3772.685, 'weight': 0, 'content': [{'end': 3777.065, 'text': "Because basically, you can get anything, so it doesn't mean anything that you fit the data.", 'start': 3772.685, 'duration': 4.38}, {'end': 3781.566, 'text': 'And therefore, you have less hope of generalization, which will be formalized through the theoretical results.', 'start': 3777.306, 'duration': 4.26}, {'end': 3788.448, 'text': 'And the correct answer is what is the good balance between the two extremes?', 'start': 3782.267, 'duration': 6.181}, {'end': 3793.109, 'text': "And then we'll find a value for which we are not exactly shattering the points,", 'start': 3788.668, 'duration': 4.441}, {'end': 3797.91, 'text': 'but we are not very restricted in which we are getting some approximation and we are getting some generalization.', 'start': 3793.109, 'duration': 4.801}, {'end': 3799.03, 'text': 'And that will come up.', 'start': 3798.15, 'duration': 0.88}], 'summary': 'Finding a good balance between shattering points and generalization for better approximation is crucial.', 'duration': 26.345, 'max_score': 3772.685, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3772685.jpg'}], 'start': 3626.226, 'title': 'Theory on generalization', 'summary': 'Delves into the significance of breakpoint, bias-variance tradeoff, and polynomial growth in generalization, offering insights on technical development for real-valued functions and the trade-off between fitting data and generalization.', 'chapters': [{'end': 3926.669, 'start': 3626.226, 'title': 'Theory on generalization', 'summary': 'Discusses the importance of breakpoint, the application of bias-variance tradeoff, and the significance of polynomial growth in the number of points for generalization with insights on the technical development for real-valued functions and the trade-off between fitting data and generalization.', 'duration': 300.443, 'highlights': ['The significance of polynomial growth in the number of points for generalization The chapter emphasizes the importance of polynomial growth in the number of points n, where polynomial time is considered acceptable due to the Hoeffding inequality guaranteeing a small probability of something bad happening for large enough n.', 'The application of bias-variance tradeoff for real-valued functions The chapter discusses the application of a different approach, bias-variance tradeoff, for real-valued functions, providing another angle on generalization, which is particularly suitable for such functions.', 'The importance of breakpoint as a very good restriction The chapter reinforces the significance of the breakpoint as a very good restriction, leading to a polynomial growth, which is the main result desired for generalization.']}], 'duration': 300.443, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3626226.jpg', 'highlights': ['The importance of breakpoint as a very good restriction', 'The application of bias-variance tradeoff for real-valued functions', 'The significance of polynomial growth in the number of points for generalization']}, {'end': 4603.696, 'segs': [{'end': 4163.519, 'src': 'embed', 'start': 4134.055, 'weight': 2, 'content': [{'end': 4134.796, 'text': 'So you take it away.', 'start': 4134.055, 'duration': 0.741}, {'end': 4137.497, 'text': 'And then the last one that is remaining is this guy.', 'start': 4135.176, 'duration': 2.321}, {'end': 4141.26, 'text': "And that also doesn't work, because it violates it for those guys.", 'start': 4137.658, 'duration': 3.602}, {'end': 4142.361, 'text': 'You can look at it and verify.', 'start': 4141.279, 'duration': 1.082}, {'end': 4144.783, 'text': 'And the conclusion here is that I cannot add anything.', 'start': 4142.72, 'duration': 2.063}, {'end': 4146.002, 'text': "So that's what I'm stuck with.", 'start': 4144.943, 'duration': 1.059}, {'end': 4154.049, 'text': 'And therefore, the number of different rows I can get under the constraint that 2 is a breakpoint, in this case, is 4.', 'start': 4146.404, 'duration': 7.645}, {'end': 4161.336, 'text': 'Obviously. the remark I mentioned is that maybe you can start instead of gradually from 0, 0, 0, 0, 0, 1,.', 'start': 4154.049, 'duration': 7.287}, {'end': 4163.519, 'text': 'maybe you can start more cleverly or something.', 'start': 4161.336, 'duration': 2.183}], 'summary': 'Unable to add anything, 2 breakpoint constraint allows 4 different rows.', 'duration': 29.464, 'max_score': 4134.055, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU4134055.jpg'}, {'end': 4205.919, 'src': 'embed', 'start': 4181.125, 'weight': 1, 'content': [{'end': 4189.131, 'text': 'specifically, does the probability measure change when you change from a hypothesis to dichotomy? For this one? Yeah.', 'start': 4181.125, 'duration': 8.006}, {'end': 4195.254, 'text': 'So the idea here, capital M is the number of hypotheses, period.', 'start': 4189.451, 'duration': 5.803}, {'end': 4197.295, 'text': "So it's infinity for perceptions.", 'start': 4195.734, 'duration': 1.561}, {'end': 4198.115, 'text': 'We have to live with that.', 'start': 4197.315, 'duration': 0.8}, {'end': 4205.919, 'text': 'In our attempt to replace it with the growth function, we are going to replace it by something that is not infinite, bounded above by 2 to the n.', 'start': 4199.156, 'duration': 6.763}], 'summary': 'Probability measure changes when switching from infinite hypotheses to bounded 2^n growth function.', 'duration': 24.794, 'max_score': 4181.125, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU4181125.jpg'}, {'end': 4530.388, 'src': 'embed', 'start': 4494.37, 'weight': 0, 'content': [{'end': 4499.993, 'text': 'What is a real-life situation, similar to the one in the puzzle, where you realize that this breakpoint may be too small?', 'start': 4494.37, 'duration': 5.623}, {'end': 4505.6, 'text': 'So the first order of business is to get the breakpoint out of there.', 'start': 4502.078, 'duration': 3.522}, {'end': 4507.16, 'text': 'That there is a breakpoint, we are in business.', 'start': 4505.62, 'duration': 1.54}, {'end': 4512.602, 'text': 'Second one is how does the value of the breakpoint relate to the learning situation??', 'start': 4507.82, 'duration': 4.782}, {'end': 4514.663, 'text': 'Do I need more examples when I have a bigger breakpoint??', 'start': 4512.622, 'duration': 2.041}, {'end': 4515.363, 'text': 'The answer is yes.', 'start': 4514.683, 'duration': 0.68}, {'end': 4518.484, 'text': 'What is the estimate? And there is a theoretical estimate, a bound.', 'start': 4515.823, 'duration': 2.661}, {'end': 4525.506, 'text': "Maybe the bound is too loose, so we'll have to find practical rules of thumb that translate the breakpoint to a number of examples.", 'start': 4518.824, 'duration': 6.682}, {'end': 4526.707, 'text': 'All of this is coming up.', 'start': 4525.806, 'duration': 0.901}, {'end': 4530.388, 'text': 'So the existence of the breakpoint means learning is feasible.', 'start': 4527.047, 'duration': 3.341}], 'summary': 'Identifying and adjusting breakpoint size for feasible learning conditions.', 'duration': 36.018, 'max_score': 4494.37, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU4494370.jpg'}], 'start': 3928.203, 'title': 'Puzzle constraints and breakpoints', 'summary': 'Discusses placing bits in rows to maximize different patterns under the constraint of not having all possible patterns on any two points, resulting in a maximum of 4 different rows. it also explores the transition from the number of hypotheses to dichotomies and the importance of polynomial growth functions for real learning models.', 'chapters': [{'end': 4298.825, 'start': 3928.203, 'title': 'Puzzle constraints and breakpoints', 'summary': 'Discusses placing bits in rows to maximize different patterns under the constraint of not having all possible patterns on any two points, resulting in a maximum of 4 different rows. it also explores the transition from the number of hypotheses to dichotomies and the importance of polynomial growth functions for real learning models.', 'duration': 370.622, 'highlights': ['The maximum number of different rows under the constraint that 2 is a breakpoint is 4, as having all possible patterns on any two points is not allowed, resulting in an inability to add additional rows.', 'The transition from the number of hypotheses to dichotomies involves replacing the infinite number of hypotheses with a bounded gross function, which is polynomial for real learning models like perceptrons and neural networks, leading to a manageable right-hand side and a high probability of generalization as n grows.', 'Finding breakpoints is not a one-size-fits-all approach and can involve arguments based on neural networks, estimation of growth functions, and identifying a point where the growth function becomes less than 2 to the power of n, resulting in a maximum estimate for the breakpoint.']}, {'end': 4603.696, 'start': 4299.105, 'title': 'Training and testing points allocation', 'summary': 'Discusses the allocation of training and testing points, addressing the interpretation of sample usage, rules of thumb for allocation, and the impact on in-sample error. additionally, it covers the concept of growth function and its relevance to the learning situation.', 'duration': 304.591, 'highlights': ['The chapter discusses the allocation of training and testing points, addressing the interpretation of sample usage, rules of thumb for allocation, and the impact on in-sample error.', 'The concept of growth function and its relevance to the learning situation is explored, including the existence and value of the breakpoint, and its role in determining the resources needed for achieving certain performance.', 'The discussion also touches on the probabilistic statement for the Hevding inequality as an alternative to the case-by-case discussion on the growth rate of N.']}], 'duration': 675.493, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/SEYAnnLazMU/pics/SEYAnnLazMU3928203.jpg', 'highlights': ['The transition from the number of hypotheses to dichotomies involves replacing the infinite number of hypotheses with a bounded gross function, which is polynomial for real learning models like perceptrons and neural networks, leading to a manageable right-hand side and a high probability of generalization as n grows.', 'The concept of growth function and its relevance to the learning situation is explored, including the existence and value of the breakpoint, and its role in determining the resources needed for achieving certain performance.', 'The chapter discusses the allocation of training and testing points, addressing the interpretation of sample usage, rules of thumb for allocation, and the impact on in-sample error.', 'Finding breakpoints is not a one-size-fits-all approach and can involve arguments based on neural networks, estimation of growth functions, and identifying a point where the growth function becomes less than 2 to the power of n, resulting in a maximum estimate for the breakpoint.']}], 'highlights': ['The error between the performance of the hypothesis and the target function on a particular point can be plugged into different error quantities, such as in-sample error and out-of-sample error, and averaged to estimate the error on training points and out-of-sample points.', 'The target variable in machine learning may not be a deterministic function, but rather a probabilistic distribution affected by the input variable, as seen in examples like credit behavior and movie ratings.', 'The transitional probability from x to y and the joint probability distribution, which accommodates noise, play crucial roles in generating training examples and need to be considered when computing expected values for errors.', 'The chapter introduces the concept of relating training to testing in a realistic way, focusing on mathematical framework, key quantities, and the notion of breakpoint leading to VC dimension.', 'The distinction between training and testing is explained with the analogy of practice problems being the training set and the final exam being the test, emphasizing the goal of learning the material and the importance of fixing hypotheses before taking the exam.', 'The growth function for perceptrons at n=4 is 14, indicating the limitations of perceptrons compared to the set of all hypotheses with a growth function of 2^n.', 'The gross function, denoted as m, is defined as the maximum number of dichotomies that can be obtained using the hypothesis set on any n points, with an upper bound of 2 to the power of n for the number of dichotomies.', 'The concept of positive rays categorizes points on the real line into plus 1 or minus 1 based on a parameter, providing a fundamental model for understanding growth functions.', 'Placing points on the perimeter of a circle enables the attainment of all possible dichotomies using convex regions, maximizing the growth function.', 'The trade-off between tightness of bounds and generality is emphasized, with the understanding that a general result applicable to all situations may not be tightly bound in any given situation.', 'Having a breakpoint guarantees the gross function to be polynomial in N.', 'The transition from the number of hypotheses to dichotomies involves replacing the infinite number of hypotheses with a bounded gross function, which is polynomial for real learning models like perceptrons and neural networks, leading to a manageable right-hand side and a high probability of generalization as n grows.', 'The concept of growth function and its relevance to the learning situation is explored, including the existence and value of the breakpoint, and its role in determining the resources needed for achieving certain performance.', 'Finding breakpoints is not a one-size-fits-all approach and can involve arguments based on neural networks, estimation of growth functions, and identifying a point where the growth function becomes less than 2 to the power of n, resulting in a maximum estimate for the breakpoint.']}