title

Lecture 02 - Is Learning Feasible?

description

Is Learning Feasible? - Can we generalize from a limited sample to the entire space? Relationship between in-sample and out-of-sample. Lecture 2 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, http://creativecommons.org/licenses/by-nc-nd/3.0/
This lecture was recorded on April 5, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.
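The in-sample versus out-of-sample question in this lecture comes down to the bin-of-marbles experiment and Hoeffding's inequality, P[|nu - mu| > epsilon] <= 2e^(-2 epsilon^2 N). A minimal simulation sketch of that experiment follows; the function name and the particular values of mu, n, and epsilon are illustrative assumptions, not values taken from the lecture:

```python
import math
import random

def hoeffding_demo(mu=0.6, n=100, epsilon=0.1, runs=20000, seed=0):
    """Draw `runs` samples of `n` marbles from a bin whose true red
    fraction is `mu` (an illustrative choice), and estimate how often the
    sample frequency nu deviates from mu by more than `epsilon`.
    Hoeffding's inequality bounds that probability by
    2 * exp(-2 * epsilon**2 * n)."""
    rng = random.Random(seed)
    bad = 0  # samples where |nu - mu| > epsilon
    for _ in range(runs):
        reds = sum(rng.random() < mu for _ in range(n))
        nu = reds / n  # in-sample (observed) frequency of red marbles
        if abs(nu - mu) > epsilon:
            bad += 1
    empirical = bad / runs
    bound = 2 * math.exp(-2 * epsilon ** 2 * n)
    return empirical, bound

if __name__ == "__main__":
    emp, bound = hoeffding_demo()
    print(f"empirical P[|nu - mu| > eps] = {emp:.4f}, Hoeffding bound = {bound:.4f}")
```

With the defaults above, the observed deviation rate stays under the bound 2e^(-2) ≈ 0.27, illustrating the lecture's point that agreement between nu and mu is not guaranteed ("possible") but is overwhelmingly "probable" for a large enough sample.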

detail

{'title': 'Lecture 02 - Is Learning Feasible?', 'heatmap': [{'end': 1203.011, 'start': 1151.209, 'weight': 1}], 'summary': "Lecture 'is learning feasible?' covers essential criteria for machine learning, learning feasibility with marble experiment, sample frequency's impact on accuracy, trade-offs in learning, probability distribution, learning with multiple hypotheses, biased coins and probability, model generalization challenges, and the importance of data quantity for generalization, employing various examples and applications.", 'chapters': [{'end': 109.002, 'segs': [{'end': 109.002, 'src': 'embed', 'start': 42.556, 'weight': 0, 'content': [{'end': 51.824, 'text': "And we realize that this condition can be intuitively met in many applications, even if we don't know mathematically what the pattern is.", 'start': 42.556, 'duration': 9.268}, {'end': 54.846, 'text': 'The example we gave was the credit card approval.', 'start': 52.344, 'duration': 2.502}, {'end': 60.251, 'text': 'There is clearly a pattern if someone has a particular salary, has been in residence for so long,', 'start': 55.367, 'duration': 4.884}, {'end': 65.956, 'text': 'has that much debt and so on that this is somewhat correlated to their credit behavior.', 'start': 60.251, 'duration': 5.705}, {'end': 72.061, 'text': "And therefore, we know that the pattern exists, in spite of the fact that we don't know exactly what the pattern is.", 'start': 66.616, 'duration': 5.445}, {'end': 79.648, 'text': 'The second item is that we cannot pin down the pattern mathematically, like the example I just gave.', 'start': 73.342, 'duration': 6.306}, {'end': 82.251, 'text': 'And this is why we resort to machine learning.', 'start': 80.069, 'duration': 2.182}, {'end': 87.355, 'text': 'The third one is that we have data that represents that pattern.', 'start': 83.294, 'duration': 4.061}, {'end': 92.737, 'text': 'In the case of the credit application, for example, there are historical records of previous customers.', 
'start': 87.595, 'duration': 5.142}, {'end': 97.319, 'text': 'And we have the data they wrote in their application when they applied.', 'start': 93.457, 'duration': 3.862}, {'end': 101.48, 'text': "And we have some years' worth of record of their credit behavior.", 'start': 97.759, 'duration': 3.721}, {'end': 109.002, 'text': 'So we have data that are going to enable us to correlate what they wrote in the application to their eventual credit behavior.', 'start': 101.96, 'duration': 7.042}], 'summary': 'Machine learning used for credit card approval based on correlated data.', 'duration': 66.446, 'max_score': 42.556, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA42556.jpg'}], 'start': 0.703, 'title': 'Machine learning criteria', 'summary': 'Introduces three criteria for determining if machine learning is suitable for a domain, including the existence of a pattern, inability to mathematically define the pattern, and the presence of data to represent the pattern, exemplified by the credit card approval process.', 'chapters': [{'end': 109.002, 'start': 0.703, 'title': 'Criteria for machine learning', 'summary': 'Introduces three criteria for determining if machine learning is suitable for a domain, including the existence of a pattern, inability to mathematically define the pattern, and the presence of data to represent the pattern, exemplified by the credit card approval process.', 'duration': 108.299, 'highlights': ['The existence of a pattern In many applications, even without knowing the mathematical pattern, it can be intuitively understood that a pattern exists, such as in the credit card approval process.', 'Inability to mathematically define the pattern Machine learning is resorted to when the pattern cannot be precisely determined mathematically, as exemplified in the credit card approval process.', 'Presence of data to represent the pattern Historical records of previous customers and their credit 
behavior, along with the application data, provide the necessary information to correlate the application details to credit behavior.']}], 'duration': 108.299, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA703.jpg', 'highlights': ['Presence of data to represent the pattern Historical records of previous customers and their credit behavior, along with the application data, provide the necessary information to correlate the application details to credit behavior.', 'The existence of a pattern In many applications, even without knowing the mathematical pattern, it can be intuitively understood that a pattern exists, such as in the credit card approval process.', 'Inability to mathematically define the pattern Machine learning is resorted to when the pattern cannot be precisely determined mathematically, as exemplified in the credit card approval process.']}, {'end': 468.529, 'segs': [{'end': 164.563, 'src': 'embed', 'start': 136.138, 'weight': 1, 'content': [{'end': 140.139, 'text': 'But the idea here is that when we develop the theory of learning,', 'start': 136.138, 'duration': 4.001}, {'end': 146.24, 'text': 'we will realize that you can apply the technique regardless of whether there is a pattern or not,', 'start': 140.139, 'duration': 6.101}, {'end': 149.641, 'text': 'and you are going to determine whether there is a pattern or not.', 'start': 146.24, 'duration': 3.401}, {'end': 156.002, 'text': 'So you are not going to be fooled and think, I learned, and then give the system to your customer, and the customer will be disappointed.', 'start': 150.141, 'duration': 5.861}, {'end': 160.063, 'text': 'There is something you can actually measure that will tell you whether you learned or not.', 'start': 156.382, 'duration': 3.681}, {'end': 164.563, 'text': 'So if there is no pattern, there is no harm done in trying machine learning.', 'start': 160.703, 'duration': 3.86}], 'summary': 'Theory of learning can be 
applied regardless of pattern presence. machine learning worth trying if no pattern found.', 'duration': 28.425, 'max_score': 136.138, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA136138.jpg'}, {'end': 203.304, 'src': 'embed', 'start': 178.68, 'weight': 0, 'content': [{'end': 186.385, 'text': 'If you can outright program it and find the result perfectly, then why bother generating examples and try to learn and go through all of that?', 'start': 178.68, 'duration': 7.705}, {'end': 188.626, 'text': 'But machine learning is not going to refuse.', 'start': 186.725, 'duration': 1.901}, {'end': 191.528, 'text': "It is going to learn, and it's going to give you a system.", 'start': 189.086, 'duration': 2.442}, {'end': 194.39, 'text': "It may not be the best system in this case, but it's a system nonetheless.", 'start': 191.548, 'duration': 2.842}, {'end': 198.62, 'text': "The third one, I'm afraid you cannot do without.", 'start': 195.717, 'duration': 2.903}, {'end': 200.221, 'text': 'You have to have data.', 'start': 199.1, 'duration': 1.121}, {'end': 203.304, 'text': 'Machine learning is about learning from data.', 'start': 201.022, 'duration': 2.282}], 'summary': 'Machine learning learns from data to generate a system, even if not perfect.', 'duration': 24.624, 'max_score': 178.68, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA178680.jpg'}, {'end': 312.656, 'src': 'embed', 'start': 286.63, 'weight': 2, 'content': [{'end': 292.132, 'text': 'So in spite of the fact that the target function is generally unknown, it is known on the data that I give you.', 'start': 286.63, 'duration': 5.502}, {'end': 298.715, 'text': 'This is the data that you are going to use as training examples, and that you are going to use to figure out what the target function is.', 'start': 292.452, 'duration': 6.263}, {'end': 303.077, 'text': 'So in the case of supervised learning, 
you have the targets explicitly.', 'start': 299.575, 'duration': 3.502}, {'end': 309.934, 'text': "In the other cases you have less information than the target, and we talked about it, like unsupervised learning where you don't have anything,", 'start': 304.15, 'duration': 5.784}, {'end': 312.656, 'text': 'and reinforcement learning where you have partial information,', 'start': 309.934, 'duration': 2.722}], 'summary': 'Supervised learning has explicit targets, while unsupervised and reinforcement learning have less information.', 'duration': 26.026, 'max_score': 286.63, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA286630.jpg'}, {'end': 368.434, 'src': 'embed', 'start': 340.747, 'weight': 4, 'content': [{'end': 345.129, 'text': 'And hopefully, g approximates f, the actual target function, which remains unknown.', 'start': 340.747, 'duration': 4.382}, {'end': 347.95, 'text': 'And g is picked from a hypothesis set.', 'start': 345.809, 'duration': 2.141}, {'end': 352.939, 'text': 'And the general symbol for a member of the hypothesis set is small h.', 'start': 348.65, 'duration': 4.289}, {'end': 354.923, 'text': 'So small h is a generic hypothesis.', 'start': 352.939, 'duration': 1.984}, {'end': 357.127, 'text': 'The one you happen to pick, you are going to call G.', 'start': 355.244, 'duration': 1.883}, {'end': 362.13, 'text': 'Now, we looked at an example of a learning algorithm.', 'start': 359.369, 'duration': 2.761}, {'end': 368.434, 'text': 'So first, the learning model, the perceptron itself, which is a linear function, thresholded.', 'start': 362.331, 'duration': 6.103}], 'summary': 'The learning model, perceptron, is a linear function thresholded to approximate the unknown target function.', 'duration': 27.687, 'max_score': 340.747, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA340747.jpg'}, {'end': 453.543, 'src': 'embed', 'start': 421.92, 
'weight': 5, 'content': [{'end': 422.84, 'text': 'can we actually learn??', 'start': 421.92, 'duration': 0.92}, {'end': 426.922, 'text': "So we said it's unknown function.", 'start': 424.381, 'duration': 2.541}, {'end': 429.503, 'text': 'Unknown function is an attractive assumption, as I said.', 'start': 426.982, 'duration': 2.521}, {'end': 436.567, 'text': "But can we learn an unknown function, really? And then we realize that if you look at it, it's really impossible.", 'start': 430.183, 'duration': 6.384}, {'end': 444.274, 'text': "Why is it impossible? Because I am going to give you a finite data set, and I'm going to give you the value of the function on this set.", 'start': 436.748, 'duration': 7.526}, {'end': 449.099, 'text': "Good, Now I'm going to ask you what is the function outside that set?", 'start': 444.655, 'duration': 4.444}, {'end': 453.543, 'text': 'How in the world are you going to tell what the function is outside?', 'start': 450.2, 'duration': 3.343}], 'summary': 'Learning an unknown function from finite data is impossible.', 'duration': 31.623, 'max_score': 421.92, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA421920.jpg'}], 'start': 109.382, 'title': 'Essential criteria for machine learning', 'summary': 'Discusses the three essential criteria for machine learning, highlighting the need for data, the ability to learn without patterns, and the unknown nature of the target function in supervised learning. 
it also explains supervised learning and the challenges of learning an unknown function with finite data.', 'chapters': [{'end': 247.552, 'start': 109.382, 'title': 'Essential criteria for machine learning', 'summary': 'Discusses the three criteria for machine learning, emphasizing the essential need for data, the ability to learn regardless of the presence of a pattern, and the unknown nature of the target function in supervised learning.', 'duration': 138.17, 'highlights': ['Machine learning requires data, without which no learning can occur. The chapter emphasizes the essential need for data in machine learning, stating that without data, there is absolutely nothing one can do.', 'The ability to learn regardless of the presence of a pattern is a key aspect of machine learning. It is highlighted that machine learning can be applied regardless of whether there is a pattern or not, and the technique can be used to determine the presence of a pattern.', 'The unknown nature of the target function in supervised learning is a critical property emphasized in the chapter. The chapter insists on the unknown nature of the target function in supervised learning, using the example of a target function corresponding to a credit application, denoted as f, and its unknown nature.']}, {'end': 468.529, 'start': 248.152, 'title': 'Supervised learning and learning algorithms', 'summary': 'Explains supervised learning, where the target function is generally unknown but is known on the given data used as training examples, and discusses the challenges of learning an unknown function with finite data.', 'duration': 220.377, 'highlights': ['Supervised learning provides explicit target outputs in addition to input data for learning, offering more information compared to unsupervised and reinforcement learning. In supervised learning, the target function is generally unknown, but it is known on the data provided for training examples. 
This provides explicit target outputs along with input data, offering more information compared to unsupervised and reinforcement learning.', "The learning algorithm produces a hypothesis, denoted as 'g', which ideally approximates the actual target function 'f', chosen from a hypothesis set whose generic member is denoted 'h'. The learning algorithm produces a hypothesis denoted as 'g', which ideally approximates the actual target function 'f', chosen from a hypothesis set whose generic member is denoted 'h'. This process aims to find a hypothesis that approximates the unknown target function.", "Challenges of learning an unknown function with finite data are discussed, highlighting the impossibility of accurately predicting the function's behavior outside the given dataset. The lecture addresses the challenge of learning an unknown function with finite data, emphasizing the impossibility of accurately predicting the function's behavior outside the given dataset. This limitation arises from the function's unknown nature, which can assume any value outside the provided dataset."]}], 'duration': 359.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA109382.jpg', 'highlights': ['Machine learning requires data for learning to occur.', 'The ability to learn regardless of the presence of a pattern is a key aspect of machine learning.', 'The unknown nature of the target function in supervised learning is a critical property.', 'Supervised learning provides explicit target outputs in addition to input data for learning.', 'The learning algorithm produces a hypothesis that ideally approximates the actual target function.', 'Challenges of learning an unknown function with finite data are discussed.']}, {'end': 760.942, 'segs': [{'end': 496.622, 'src': 'embed', 'start': 469.349, 'weight': 0, 'content': [{'end': 474.25, 'text': "So it doesn't look like the statement we made is feasible in terms of learning.", 'start': 469.349, 'duration': 4.901}, {'end':
476.651, 'text': 'And therefore, we have to do something about it.', 'start': 474.87, 'duration': 1.781}, {'end': 481.392, 'text': 'And what we are going to do about it is the subject of this lecture.', 'start': 477.511, 'duration': 3.881}, {'end': 494.379, 'text': 'Now, the lecture is called Is Learning Feasible? And I am going to address this question in extreme detail from beginning to end.', 'start': 483.243, 'duration': 11.136}, {'end': 496.622, 'text': 'This is the only topic for this lecture.', 'start': 494.639, 'duration': 1.983}], 'summary': 'Learning feasibility will be addressed in detail in the lecture.', 'duration': 27.273, 'max_score': 469.349, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA469349.jpg'}, {'end': 592.484, 'src': 'embed', 'start': 519.472, 'weight': 1, 'content': [{'end': 524.077, 'text': 'So we are going to answer it in a way that is concrete and where the mathematics is very friendly.', 'start': 519.472, 'duration': 4.605}, {'end': 531.053, 'text': "And then after that, I'm going to be able to relate that probabilistic situation to learning as we stated.", 'start': 525.051, 'duration': 6.002}, {'end': 532.393, 'text': 'It will take two stages.', 'start': 531.073, 'duration': 1.32}, {'end': 536.955, 'text': 'First, I will just translate the expressions into something that relates to learning.', 'start': 532.694, 'duration': 4.261}, {'end': 542.757, 'text': 'And then we will move forward and make it correspond to real learning.', 'start': 537.435, 'duration': 5.322}, {'end': 543.917, 'text': "And that's the last one.", 'start': 543.217, 'duration': 0.7}, {'end': 549.08, 'text': 'And then after we do that, and we think we are done, we find that there is a serious dilemma that we have.', 'start': 544.358, 'duration': 4.722}, {'end': 556.724, 'text': 'And we will find a solution to that dilemma, and then declare victory that, indeed, learning is feasible in a very particular 
sense.', 'start': 549.721, 'duration': 7.003}, {'end': 561.287, 'text': "So let's start with the experiment that I talked about.", 'start': 558.706, 'duration': 2.581}, {'end': 563.668, 'text': 'Consider the following situation.', 'start': 562.348, 'duration': 1.32}, {'end': 565.509, 'text': 'You have a bin.', 'start': 564.949, 'duration': 0.56}, {'end': 567.811, 'text': 'And the bin has marbles.', 'start': 566.75, 'duration': 1.061}, {'end': 570.192, 'text': 'The marbles are either red or green.', 'start': 568.471, 'duration': 1.721}, {'end': 573.597, 'text': "That's what it looks like.", 'start': 572.777, 'duration': 0.82}, {'end': 578.379, 'text': 'And we are going to do an experiment with this bin.', 'start': 575.258, 'duration': 3.121}, {'end': 584.201, 'text': 'And the experiment is to pick a sample from the bin, some marbles.', 'start': 580.46, 'duration': 3.741}, {'end': 587.362, 'text': "So let's formalize what the probability distribution is.", 'start': 584.741, 'duration': 2.621}, {'end': 590.364, 'text': 'There is a probability of picking a red marble.', 'start': 588.063, 'duration': 2.301}, {'end': 592.484, 'text': "And let's call it mu.", 'start': 591.044, 'duration': 1.44}], 'summary': 'Demonstrating learning feasibility through a probabilistic experiment with red and green marbles.', 'duration': 73.012, 'max_score': 519.472, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA519472.jpg'}, {'end': 720.519, 'src': 'embed', 'start': 689.303, 'weight': 3, 'content': [{'end': 690.924, 'text': 'And it will have some red and some green.', 'start': 689.303, 'duration': 1.621}, {'end': 692.424, 'text': "It's a probabilistic situation.", 'start': 691.064, 'duration': 1.36}, {'end': 699.758, 'text': 'And we are going to look at the fraction of red marbles in the sample.', 'start': 693.753, 'duration': 6.005}, {'end': 702.86, 'text': 'So this now is a probabilistic quantity.', 'start': 700.998, 'duration': 1.862},
{'end': 704.861, 'text': 'Mu is an unknown constant sitting there.', 'start': 702.9, 'duration': 1.961}, {'end': 711.166, 'text': 'If you pick a sample, someone else picks a sample, you will have a different frequency in sample from the other person.', 'start': 705.742, 'duration': 5.424}, {'end': 714.932, 'text': 'And we are going to call it nu.', 'start': 711.646, 'duration': 3.286}, {'end': 720.519, 'text': 'Now, interestingly enough, nu also should appear in the figure.', 'start': 716.173, 'duration': 4.346}], 'summary': 'The situation involves red and green marbles, with a probabilistic quantity nu, representing different frequencies in samples.', 'duration': 31.216, 'max_score': 689.303, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA689303.jpg'}], 'start': 469.349, 'title': 'Learning feasibility and marble experiment', 'summary': 'Explores the feasibility of learning, providing a detailed analysis and concludes its feasibility in a particular sense. additionally, it delves into the probability distribution of red and green marbles in an experiment, represented by mu and 1 minus mu, and their relation to the fraction of red marbles in a sample, denoted as nu.', 'chapters': [{'end': 565.509, 'start': 469.349, 'title': 'Is learning feasible?', 'summary': "Addresses the question 'is learning feasible?' in extreme detail, starting with a simple probabilistic situation and then relating it to learning, ultimately declaring that learning is feasible in a very particular sense.", 'duration': 96.16, 'highlights': ["The lecture is solely focused on the question 'Is Learning Feasible?' 
and will delve into it extensively.", 'The lecture will begin by discussing a simple probabilistic situation and then relating it to learning.', 'The speaker aims to declare that learning is feasible in a very particular sense after addressing a serious dilemma and finding a solution.']}, {'end': 760.942, 'start': 566.75, 'title': 'Probability distribution in marbles experiment', 'summary': 'Discusses the experiment of picking marbles from a bin, where the probability distribution of red and green marbles, represented by mu and 1 minus mu respectively, is explored. the unknown constant mu is related to the fraction of red marbles in a sample, denoted as nu.', 'duration': 194.192, 'highlights': ['The experiment involves picking marbles from a bin with a probability distribution of red and green marbles, represented by mu and 1 minus mu respectively. The chapter emphasizes the experiment of picking marbles from a bin, highlighting the probability distribution of red and green marbles represented by mu and 1 minus mu, where mu is the probability of picking a red marble and 1 minus mu is the probability of picking a green marble.', 'The unknown constant mu is related to the fraction of red marbles in a sample, denoted as nu. The discussion points out the relationship between the unknown constant mu and the fraction of red marbles in a sample, denoted as nu, emphasizing the probabilistic nature of nu and its variability between different samples.', 'The chapter delves into the probabilistic nature of the experiment and its relevance to machine learning concepts. 
The chapter explores the probabilistic nature of the experiment, drawing parallels to machine learning concepts and the importance of understanding the unknown constant mu in relation to the fraction of red marbles in a sample.']}], 'duration': 291.593, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA469349.jpg', 'highlights': ["The lecture is solely focused on the question 'Is Learning Feasible?' and will delve into it extensively.", 'The lecture will begin by discussing a simple probabilistic situation and then relating it to learning.', 'The speaker aims to declare that learning is feasible in a very particular sense after addressing a serious dilemma and finding a solution.', 'The chapter explores the probabilistic nature of the experiment, drawing parallels to machine learning concepts and the importance of understanding the unknown constant mu in relation to the fraction of red marbles in a sample.', 'The unknown constant mu is related to the fraction of red marbles in a sample, denoted as nu. The discussion points out the relationship between the unknown constant mu and the fraction of red marbles in a sample, denoted as nu, emphasizing the probabilistic nature of nu and its variability between different samples.', 'The experiment involves picking marbles from a bin with a probability distribution of red and green marbles, represented by mu and 1 minus mu respectively. 
The chapter emphasizes the experiment of picking marbles from a bin, highlighting the probability distribution of red and green marbles represented by mu and 1 minus mu, where mu is the probability of picking a red marble and 1 minus mu is the probability of picking a green marble.']}, {'end': 1188.238, 'segs': [{'end': 809.705, 'src': 'embed', 'start': 761.763, 'weight': 1, 'content': [{'end': 767.868, 'text': 'Does nu, which is the sample frequency, tell us anything about mu?', 'start': 761.763, 'duration': 6.105}, {'end': 771.531, 'text': 'which is the actual frequency in the bin that we are interested in knowing??', 'start': 767.868, 'duration': 3.663}, {'end': 774.713, 'text': 'The short answer is no.', 'start': 773.853, 'duration': 0.86}, {'end': 792.972, 'text': 'Why? Because the sample can be mostly green, while the bin is mostly red.', 'start': 776.815, 'duration': 16.157}, {'end': 807.523, 'text': 'Anybody doubts that? The thing could have 90% red, and I pick 100 marbles, and all of them happen to be green.', 'start': 794.974, 'duration': 12.549}, {'end': 809.705, 'text': 'This is possible.', 'start': 808.684, 'duration': 1.021}], 'summary': 'Sample frequency (nu) does not necessarily reflect actual frequency in the bin of interest.', 'duration': 47.942, 'max_score': 761.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA761763.jpg'}, {'end': 894.677, 'src': 'embed', 'start': 835.579, 'weight': 0, 'content': [{'end': 847.377, 'text': 'yes?. 
Because if you know a little bit about probability, You realize that if the sample is big enough, the sample frequency,', 'start': 835.579, 'duration': 11.798}, {'end': 854.823, 'text': 'which is nu the mysterious disappearing quantity here, that is likely to be close to mu.', 'start': 847.377, 'duration': 7.446}, {'end': 858.186, 'text': 'Think of a presidential poll.', 'start': 856.865, 'duration': 1.321}, {'end': 862.95, 'text': 'There are maybe 100 million or more voters in the US.', 'start': 859.827, 'duration': 3.123}, {'end': 866.529, 'text': 'And you make a poll of 3, 000 people.', 'start': 864.168, 'duration': 2.361}, {'end': 868.889, 'text': 'You have 3, 000 marbles, so to speak.', 'start': 866.749, 'duration': 2.14}, {'end': 873.17, 'text': 'And you look at the result in the marbles, and you tell me how the 100 million will vote.', 'start': 869.71, 'duration': 3.46}, {'end': 877.752, 'text': 'How the heck did you know that? So now the statistics come in.', 'start': 873.671, 'duration': 4.081}, {'end': 879.612, 'text': "That's where the probability plays a role.", 'start': 877.952, 'duration': 1.66}, {'end': 885.654, 'text': 'And the main distinction between the two answers is possible versus probable.', 'start': 880.572, 'duration': 5.082}, {'end': 894.677, 'text': 'In science and in engineering, you go a huge distance by settling for not absolutely certain, but almost certain.', 'start': 886.733, 'duration': 7.944}], 'summary': 'Probability and statistics help predict outcomes; sample size impacts accuracy.', 'duration': 59.098, 'max_score': 835.579, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA835579.jpg'}, {'end': 1084.919, 'src': 'embed', 'start': 1024.295, 'weight': 2, 'content': [{'end': 1029.92, 'text': 'Well, how small can we guarantee it? 
Good news.', 'start': 1024.295, 'duration': 5.625}, {'end': 1034.082, 'text': "It's e to the minus n.", 'start': 1031.761, 'duration': 2.321}, {'end': 1035.443, 'text': "It's a negative exponential.", 'start': 1034.082, 'duration': 1.361}, {'end': 1039.846, 'text': 'That is great, because negative exponentials tend to die very fast.', 'start': 1036.164, 'duration': 3.682}, {'end': 1043.848, 'text': 'So if you get a bigger sample, this will be a diminishingly small probability.', 'start': 1040.246, 'duration': 3.602}, {'end': 1046.89, 'text': 'So the probability of something bad happening will be very small.', 'start': 1044.288, 'duration': 2.602}, {'end': 1052.553, 'text': 'And we can claim that, indeed, nu will be within epsilon from mu.', 'start': 1047.329, 'duration': 5.224}, {'end': 1055.715, 'text': 'And we will be wrong for a very minute amount of the time.', 'start': 1052.973, 'duration': 2.742}, {'end': 1058.176, 'text': "But that's the good news.", 'start': 1056.976, 'duration': 1.2}, {'end': 1060.627, 'text': 'Now the bad news.', 'start': 1059.707, 'duration': 0.92}, {'end': 1064.968, 'text': 'Ouch! 
Epsilon is our tolerance.', 'start': 1061.387, 'duration': 3.581}, {'end': 1072.13, 'text': "If you are a very tolerant person, you say, I just want nu and mu to be within, let's say, 0.1.", 'start': 1065.848, 'duration': 6.282}, {'end': 1073.61, 'text': "That's not very much to ask.", 'start': 1072.13, 'duration': 1.48}, {'end': 1081.512, 'text': 'Now, the price you pay for that is that you plug in the exponent, not epsilon, but epsilon squared.', 'start': 1074.69, 'duration': 6.822}, {'end': 1084.919, 'text': 'So that becomes 0.01.', 'start': 1081.612, 'duration': 3.307}], 'summary': 'Using a negative exponential, a bigger sample ensures a very small probability, with the trade-off of a squared tolerance for a small amount of error.', 'duration': 60.624, 'max_score': 1024.295, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA1024295.jpg'}, {'end': 1166.92, 'src': 'embed', 'start': 1135.002, 'weight': 5, 'content': [{'end': 1137.262, 'text': 'We just put 2 here and 2 there.', 'start': 1135.002, 'duration': 2.26}, {'end': 1142.183, 'text': "Now, between you and me, I prefer the original formula better, without the 2's.", 'start': 1137.882, 'duration': 4.301}, {'end': 1149.208, 'text': 'However, the formula with the twos has a distinct advantage of being true.', 'start': 1143.444, 'duration': 5.764}, {'end': 1152.29, 'text': 'So we have to settle for that.', 'start': 1151.209, 'duration': 1.081}, {'end': 1160.135, 'text': "Now, that inequality is called Hoeffding's inequality.", 'start': 1153.611, 'duration': 6.524}, {'end': 1164.078, 'text': 'It is the main inequality we are going to be using in the course.', 'start': 1160.796, 'duration': 3.282}, {'end': 1166.92, 'text': 'You can look for the proof.', 'start': 1164.819, 'duration': 2.101}], 'summary': "Hoeffding's inequality with the 2's has a distinct advantage over the original formula: it is true.", 'duration': 31.918, 'max_score': 1135.002, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA1135002.jpg'}], 'start': 761.763, 'title': 'Sample frequency and probabilistic approximation', 'summary': "Explores the relationship between sample frequency and actual frequency, emphasizing the impact of sample size on accuracy, demonstrated through a presidential poll. It also discusses Hoeffding's inequality and probabilistic approximation, highlighting the significance of epsilon as a tolerance and its impact on the probability of approximation.", 'chapters': [{'end': 894.677, 'start': 761.763, 'title': 'Sample frequency vs actual frequency', 'summary': 'Discusses the relationship between sample frequency and actual frequency, highlighting that while the sample frequency nu may not provide accurate information about the actual frequency mu, with a large enough sample size, nu is likely to be close to mu, as demonstrated through the example of a presidential poll.', 'duration': 132.914, 'highlights': ['The sample frequency nu may not accurately represent the actual frequency mu, as the sample can be mostly green while the bin is mostly red, demonstrating the possibility of picking 100 marbles with 90% red and all turning out to be green.', 'With a large enough sample size, the sample frequency nu is likely to be close to the actual frequency mu, as exemplified by a poll of 3,000 people accurately predicting the voting behavior of 100 million voters in the US.', 'In science and engineering, settling for almost certain probabilities is valuable, as it allows for significant progress despite not being absolutely certain about the outcomes.']}, {'end': 1188.238, 'start': 895.078, 'title': "Hoeffding's inequality and probabilistic approximation", 'summary': "Discusses Hoeffding's inequality and the concept of probabilistic approximation, emphasizing the relationship between sample frequency and bin frequency, the significance of epsilon as a tolerance, and the impact of
different epsilon values on the probability of approximation, with the bound 2e to the power of -2 epsilon squared N limiting the probability of a bad approximation.", 'duration': 293.16, 'highlights': ['The bound 2e to the power of -2 epsilon squared N caps the probability of nu deviating from mu by more than epsilon, with larger samples leading to a diminishingly small probability of nu not approximating mu well.', 'Epsilon serves as the tolerance, impacting the probability of approximation, where a more stringent tolerance like 10 to the power of -6 significantly reduces the benefit of the negative exponential.', "The concept of Hoeffding's inequality is
introduced as the main inequality used in the course, providing the foundation for proving concepts related to the VC dimension."]}, {'end': 1824.495, 'segs': [{'end': 1318.071, 'src': 'embed', 'start': 1270.621, 'weight': 0, 'content': [{'end': 1274.742, 'text': 'And nu is the disappearing quantity, which happens to be the frequency in the sample you have.', 'start': 1270.621, 'duration': 4.121}, {'end': 1279.044, 'text': 'So what about the Hoeffding inequality?', 'start': 1277.003, 'duration': 2.041}, {'end': 1291.654, 'text': 'Well, the one attraction of this inequality is that it is valid for every n positive integer and every epsilon which is greater than 0..', 'start': 1280.566, 'duration': 11.088}, {'end': 1296.617, 'text': 'Pick any tolerance you want, and for any number of examples you want, this is true.', 'start': 1291.654, 'duration': 4.963}, {'end': 1298.279, 'text': "It's not an asymptotic result.", 'start': 1296.798, 'duration': 1.481}, {'end': 1300.9, 'text': "It's the result that holds for every n and epsilon.", 'start': 1298.419, 'duration': 2.481}, {'end': 1304.823, 'text': "That's a very attractive proposition for something that has an exponential in it.", 'start': 1301.221, 'duration': 3.602}, {'end': 1313.128, 'text': "Now, Hoeffding's inequality belongs to a large class of mathematical laws, which are called the laws of large numbers.", 'start': 1306.104, 'duration': 7.024}, {'end': 1318.071, 'text': 'So this is one law of large numbers, one form of it, and there are tons of them.', 'start': 1314.249, 'duration': 3.822}], 'summary': 'Hoeffding inequality holds for every n and epsilon, part of laws of large numbers.', 'duration': 47.45, 'max_score': 1270.621, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA1270621.jpg'}, {'end': 1473.799, 'src': 'embed', 'start': 1445.623, 'weight': 1, 'content': [{'end': 1453.666, 'text': 'the smaller the epsilon is, the bigger the n you need in order to
compensate for it and come up with the same level of probability bound.', 'start': 1445.623, 'duration': 8.043}, {'end': 1456.447, 'text': 'And that makes a lot of sense.', 'start': 1454.666, 'duration': 1.781}, {'end': 1467.132, 'text': 'If you have more examples, you are more sure that nu and mu will be close together, even closer and closer and closer, as you get larger n.', 'start': 1457.608, 'duration': 9.524}, {'end': 1467.892, 'text': 'So this makes sense.', 'start': 1467.132, 'duration': 0.76}, {'end': 1473.799, 'text': "Finally, it's a subtle point, but it's worth saying.", 'start': 1470.918, 'duration': 2.881}], 'summary': 'As epsilon decreases, you need a larger n for the same probability bound. More examples lead to closer nu and mu as n increases.', 'duration': 28.176, 'max_score': 1445.623, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA1445623.jpg'}], 'start': 1192.048, 'title': 'Learning trade-offs', 'summary': 'Covers PAC statements, the Hoeffding inequality, and explores the trade-offs between the number of examples and tolerance in probability bounds, emphasizing their applicability for any positive integer n and epsilon greater than 0.', 'chapters': [{'end': 1393.737, 'start': 1192.048, 'title': 'Hoeffding inequality and PAC statements', 'summary': 'Explains the concept of PAC statements and the Hoeffding inequality, emphasizing its applicability for any positive integer n and epsilon greater than 0, and its relation to the laws of large numbers.', 'duration': 201.689, 'highlights': ['The Hoeffding inequality is valid for every n positive integer and every epsilon greater than 0, making it applicable for any tolerance and number of examples. 
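The inequality the lecture works with, P[|nu - mu| > epsilon] <= 2e^(-2 epsilon^2 N), holds for every N and epsilon, so it can be checked empirically. A minimal sketch of such a check, assuming an illustrative bin with mu = 0.9, samples of size N = 100, and tolerance epsilon = 0.1 (these particular numbers are not from the lecture):

```python
import math
import random

def hoeffding_bound(n, eps):
    """Right-hand side of Hoeffding's inequality: 2 * exp(-2 * eps^2 * n)."""
    return 2 * math.exp(-2 * eps**2 * n)

def bad_event_frequency(mu, n, eps, trials=20000, seed=0):
    """Estimate P[|nu - mu| > eps] by drawing `trials` samples of size n from a bin
    whose red-marble fraction is mu."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        nu = sum(rng.random() < mu for _ in range(n)) / n   # sample frequency
        if abs(nu - mu) > eps:
            bad += 1
    return bad / trials

mu, n, eps = 0.9, 100, 0.1
# The bound here is 2 * exp(-2) ~ 0.27; the observed bad-event frequency is far below it.
print(bad_event_frequency(mu, n, eps), "<=", hoeffding_bound(n, eps))
```

The bound is loose for these values, which matches the lecture's point: it is valid for every N and epsilon, not tight for any particular one.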
Emphasizes the wide applicability of the Hoeffding inequality for any n and epsilon, making it a versatile tool in statistical analysis.', 'The Hoeffding inequality belongs to a class of mathematical laws called the laws of large numbers, and it is considered friendly due to its non-asymptotic nature and the presence of an exponential term. Highlights the relationship of the Hoeffding inequality to the laws of large numbers and its favorable characteristics, contributing to a deeper understanding of its significance in statistical theory.', "The probability in the left-hand side of the inequality explicitly depends on the unknown quantity mu, while the right-hand side does not contain mu, providing a useful tool for bounding the probability uniformly. Illustrates the asymmetrical dependence of the probability on mu and the advantage of bounding it uniformly without reliance on the unknown quantity mu, contributing to a clearer comprehension of the inequality's practical implications."]}, {'end': 1824.495, 'start': 1393.737, 'title': 'Trade-offs in learning and probability', 'summary': 'Discusses the trade-off between the number of examples (n) and tolerance (epsilon) in probability bounds, the interdependence of nu and mu, and the mapping of a learning problem to a simplistic bin situation, emphasizing the addition of a probability component.', 'duration': 430.758, 'highlights': ['The trade-off between N and epsilon in probability bounds dictates that a smaller epsilon requires a larger N to compensate for it, resulting in a higher level of probability bound. The smaller the epsilon is, the bigger the n you need in order to compensate for it and come up with the same level of probability bound.', 'The interdependence of nu and mu in the context of probability is discussed, emphasizing that nu affects mu, and the use of symmetric probability forms to infer mu from nu. 
The statement that nu is approximately the same as mu implies that mu is approximately the same as nu, and although the cause and effect is that mu affects nu, the logic used is that nu tends to be close to mu.', 'The mapping of a learning problem to a simplistic bin situation is explained, highlighting the correspondence of the bin to the input space in the learning problem, the mapping of hypotheses to the target function, and the addition of a probability component to the learning problem. The bin is corresponded to the input space in the learning problem, with marbles representing points and colors representing the agreement or disagreement between the hypothesis and the target function, adding a probability component to the learning problem.']}], 'duration': 632.447, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA1192048.jpg', 'highlights': ['The Hoeffding inequality is valid for every n positive integer and every epsilon greater than 0, making it applicable for any tolerance and number of examples.', 'The trade-off between N and epsilon in probability bounds dictates that a smaller epsilon requires a larger N to compensate for it, resulting in a higher level of probability bound.', 'The Hoeffding inequality belongs to a class of mathematical laws called the laws of large numbers, and it is considered friendly due to its non-asymptotic nature and the presence of an exponential term.']}, {'end': 2114.854, 'segs': [{'end': 1895.701, 'src': 'embed', 'start': 1847.16, 'weight': 1, 'content': [{'end': 1854.403, 'text': 'It picks a hypothesis from the hypothesis set and produces it as the final hypothesis, which hopefully approximates f.', 'start': 1847.16, 'duration': 7.243}, {'end': 1854.904, 'text': "That's the game.", 'start': 1854.403, 'duration': 0.501}, {'end': 1862.994, 'text': 'So what is the addition we are going to do? 
In the bin analogy, This is the input space.', 'start': 1855.444, 'duration': 7.55}, {'end': 1864.994, 'text': 'Now the input space has a probability.', 'start': 1863.234, 'duration': 1.76}, {'end': 1870.197, 'text': 'So I need to apply this probability to the points from the input space that are being generated.', 'start': 1865.375, 'duration': 4.822}, {'end': 1877.239, 'text': 'So I am going to introduce a probability distribution over the input space.', 'start': 1872.157, 'duration': 5.082}, {'end': 1883.322, 'text': "So now the points in the input space, let's say the d-dimensional Euclidean space, are not just generic points now.", 'start': 1877.92, 'duration': 5.402}, {'end': 1886.423, 'text': 'There is a probability of picking one point versus the other.', 'start': 1883.722, 'duration': 2.701}, {'end': 1889.705, 'text': "And that is captured by the probability, which I'm going to call P.", 'start': 1886.823, 'duration': 2.882}, {'end': 1895.701, 'text': 'Now, the interesting thing is that I am making no assumptions about P.', 'start': 1891.499, 'duration': 4.202}], 'summary': 'Introducing a probability distribution over the input space in the hypothesis game.', 'duration': 48.541, 'max_score': 1847.16, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA1847160.jpg'}, {'end': 1990.038, 'src': 'embed', 'start': 1917.836, 'weight': 0, 'content': [{'end': 1925.318, 'text': 'Of course, the probability choice will affect the choice of the probability of getting a green marble or a red marble,', 'start': 1917.836, 'duration': 7.482}, {'end': 1927.759, 'text': 'because now the probability of different marbles changed.', 'start': 1925.318, 'duration': 2.441}, {'end': 1931.019, 'text': 'So it could change the value mu.', 'start': 1928.159, 'duration': 2.86}, {'end': 1936.341, 'text': 'But the good news with the Hoeffding is that I could bound the performance independently of mu.', 'start': 1931.52, 'duration': 
4.821}, {'end': 1942.002, 'text': "So I can get away with not only any p, but with a p that I don't know.", 'start': 1936.901, 'duration': 5.101}, {'end': 1944.523, 'text': 'And I will still be able to make the mathematical statement.', 'start': 1942.322, 'duration': 2.201}, {'end': 1948.444, 'text': 'So this is a very benign addition to the problem.', 'start': 1945.202, 'duration': 3.242}, {'end': 1952.446, 'text': 'And it will give us very high dividends, which is the feasibility of learning.', 'start': 1949.064, 'duration': 3.382}, {'end': 1962.271, 'text': 'So what do you do with the probability? You use the probability to generate the points x1 up to xn.', 'start': 1953.567, 'duration': 8.704}, {'end': 1968.275, 'text': 'So now, x1 up to xn are assumed to be generated by that probability independently.', 'start': 1962.692, 'duration': 5.583}, {'end': 1970.916, 'text': "That's the only assumption that is made.", 'start': 1969.095, 'duration': 1.821}, {'end': 1973.658, 'text': 'If you make that assumption, we are in business.', 'start': 1971.657, 'duration': 2.001}, {'end': 1979.576, 'text': 'But the good news, as I mentioned before, we did not compromise about the target function.', 'start': 1975.015, 'duration': 4.561}, {'end': 1984.277, 'text': "You don't need to make assumptions about the function you don't know and you want to learn, which is good news.", 'start': 1979.836, 'duration': 4.441}, {'end': 1987.218, 'text': 'And the addition is almost technical.', 'start': 1984.897, 'duration': 2.321}, {'end': 1990.038, 'text': 'There is a probability somewhere generating the points.', 'start': 1987.678, 'duration': 2.36}], 'summary': 'Hoeffding allows independent performance bound, regardless of mu or p, ensuring feasibility of learning.', 'duration': 72.202, 'max_score': 1917.836, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA1917836.jpg'}, {'end': 2102.49, 'src': 'embed', 'start': 2074.661, 'weight': 
5, 'content': [{'end': 2080.043, 'text': 'The situation as I describe it, you have a single bin, and you have red and green marbles, and this and that corresponds to the following.', 'start': 2074.661, 'duration': 5.382}, {'end': 2081.605, 'text': 'A bank comes to my office.', 'start': 2080.623, 'duration': 0.982}, {'end': 2084.726, 'text': 'We would like a formula for credit approval.', 'start': 2082.045, 'duration': 2.681}, {'end': 2086.478, 'text': 'And we have data.', 'start': 2085.797, 'duration': 0.681}, {'end': 2093.643, 'text': 'So, instead of actually taking the data and searching hypotheses and picking one like the perceptron learning algorithm,', 'start': 2087.458, 'duration': 6.185}, {'end': 2096.366, 'text': 'here is what I do that corresponds to what I just described.', 'start': 2093.643, 'duration': 2.723}, {'end': 2102.49, 'text': "You guys want a linear formula? I guess the salary should have a big weight, let's say 2.", 'start': 2097.126, 'duration': 5.364}], 'summary': 'Proposal for credit approval formula based on weighted salary data', 'duration': 27.829, 'max_score': 2074.661, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2074661.jpg'}], 'start': 1826.258, 'title': 'Probability in machine learning', 'summary': 'Discusses introducing probability distribution in machine learning, providing flexibility in choosing distributions and independent performance bounds.
It also highlights the significance of probability in learning and emphasizes not compromising the target function.', 'chapters': [{'end': 1942.002, 'start': 1826.258, 'title': 'Introduction to probability in machine learning', 'summary': 'Introduces the concept of adding a probability distribution over the input space in machine learning, allowing for flexibility in choosing the probability distribution without making assumptions, and the ability to bound the performance independently of the probability choice.', 'duration': 115.744, 'highlights': ['The addition of a probability distribution over the input space in machine learning allows for flexibility in choosing the probability distribution without making assumptions about it.', 'The learning algorithm picks a hypothesis from the hypothesis set and produces it as the final hypothesis, which hopefully approximates the target function that generated the training examples.', 'The Hoeffding inequality allows for bounding the performance independently of the probability distribution, providing flexibility in choosing the probability distribution without knowing its specifics.']}, {'end': 2114.854, 'start': 1942.322, 'title': 'Feasibility of learning in probability', 'summary': 'Discusses the significant role of probability in generating points independently and its impact on learning, emphasizing the importance of not compromising on the target function and the distinction between learning and verification.', 'duration': 172.532, 'highlights': ['The significance of using probability to generate points independently is stressed as it contributes to the feasibility of learning.', 'Emphasizing the importance of not compromising on the target function in the learning process is highlighted.', 'The distinction between learning and verification is explained, illustrating the process of formulating a formula for credit approval based on given data.']}], 'duration': 288.596, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA1826258.jpg', 'highlights': ['The Hoeffding inequality allows for bounding the performance independently of the probability distribution, providing flexibility in choosing the probability distribution without knowing its specifics.', 'The addition of a probability distribution over the input space in machine learning allows for flexibility in choosing the probability distribution without making assumptions about it.', 'The learning algorithm picks a hypothesis from the hypothesis set and produces it as the final hypothesis, which hopefully approximates the target function that generated the training examples.', 'The significance of using probability to generate points independently is stressed as it contributes to the feasibility of learning.', 'Emphasizing the importance of not compromising on the target function in the learning process is highlighted.', 'The distinction between learning and verification is explained, illustrating the process of formulating a formula for credit approval based on given data.']}, {'end': 2639.249, 'segs': [{'end': 2211.904, 'src': 'embed', 'start': 2183.483, 'weight': 0, 'content': [{'end': 2187.787, 'text': 'So how do we do that? 
No guarantee that nu will be small.', 'start': 2183.483, 'duration': 4.304}, {'end': 2196.433, 'text': "We need to choose the hypothesis from multiple h's.", 'start': 2190.649, 'duration': 5.784}, {'end': 2197.114, 'text': "That's the game.", 'start': 2196.633, 'duration': 0.481}, {'end': 2202.217, 'text': "And in that case, you're going to go for the sample, so to speak, generated by every hypothesis.", 'start': 2197.394, 'duration': 4.823}, {'end': 2205.98, 'text': 'And then you pick the hypothesis that is most favorable, that gives you the least error.', 'start': 2202.578, 'duration': 3.402}, {'end': 2209.763, 'text': "So now, that doesn't look like a difficult thing.", 'start': 2207.101, 'duration': 2.662}, {'end': 2211.904, 'text': 'It worked with one bin.', 'start': 2210.603, 'duration': 1.301}], 'summary': 'Select hypothesis with least error from multiple choices.', 'duration': 28.421, 'max_score': 2183.483, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2183483.jpg'}, {'end': 2358.376, 'src': 'embed', 'start': 2334.84, 'weight': 3, 'content': [{'end': 2341.784, 'text': 'And you say that it must be that the corresponding bin is good, and the corresponding bin happens to be the hypothesis you chose.', 'start': 2334.84, 'duration': 6.944}, {'end': 2343.925, 'text': 'So that is an abstraction of learning.', 'start': 2342.384, 'duration': 1.541}, {'end': 2346.067, 'text': 'That was easy enough.', 'start': 2345.226, 'duration': 0.841}, {'end': 2355.774, 'text': "Now, because this is going to stay with us, I'm now going to introduce the notation that will survive with us for the entire discussion of learning.", 'start': 2348.209, 'duration': 7.565}, {'end': 2358.376, 'text': 'So here is the notation.', 'start': 2357.535, 'duration': 0.841}], 'summary': 'Introduction of notation for learning abstraction', 'duration': 23.536, 'max_score': 2334.84, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2334840.jpg'}, {'end': 2474.276, 'src': 'embed', 'start': 2415.091, 'weight': 1, 'content': [{'end': 2419.474, 'text': 'Now, we go and get the other one, which happens to be mu.', 'start': 2415.091, 'duration': 4.383}, {'end': 2421.555, 'text': 'And that is called out-of-sample.', 'start': 2419.714, 'duration': 1.841}, {'end': 2427.499, 'text': 'So if you are in this field, what matters is the out-of-sample performance.', 'start': 2422.396, 'duration': 5.103}, {'end': 2428.119, 'text': "That's the lesson.", 'start': 2427.539, 'duration': 0.58}, {'end': 2431.582, 'text': "Out-of-sample means something that you haven't seen.", 'start': 2428.3, 'duration': 3.282}, {'end': 2436.305, 'text': "And if you perform out-of-sample on something that you haven't seen, then you must have really learned.", 'start': 2432.002, 'duration': 4.303}, {'end': 2437.746, 'text': "So that's the standard for it.", 'start': 2436.745, 'duration': 1.001}, {'end': 2440.147, 'text': 'And the name for it is E-out.', 'start': 2438.166, 'duration': 1.981}, {'end': 2450.213, 'text': "So with this in mind, we realize that we don't yet have the dependency on H, which we need.", 'start': 2444.105, 'duration': 6.108}, {'end': 2465.731, 'text': 'So we are going to make the notation a little bit more elaborate by calling E in and E out, calling them e in of h and e out of h.', 'start': 2451.114, 'duration': 14.617}, {'end': 2474.276, 'text': 'Why is that? 
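In this notation, E_in(h) is the fraction of the training sample where h disagrees with the target f, and E_out(h) is the probability of disagreement on a fresh point drawn from the same distribution P. A small sketch of the two quantities; the threshold target, hypothesis, and uniform distribution below are illustrative choices, not from the lecture:

```python
import random

def target(x):
    """Illustrative (normally unknown) target function f on [0, 1]."""
    return 1 if x > 0.6 else -1

def hypothesis(x):
    """Illustrative hypothesis h with a slightly different threshold."""
    return 1 if x > 0.5 else -1

def e_in(h, f, sample):
    """In-sample error: fraction of sample points where h disagrees with f."""
    return sum(h(x) != f(x) for x in sample) / len(sample)

def e_out(h, f, n_fresh=100000, seed=1):
    """Out-of-sample error, estimated on fresh points drawn from the same P (uniform here)."""
    rng = random.Random(seed)
    return e_in(h, f, [rng.random() for _ in range(n_fresh)])

rng = random.Random(0)
sample = [rng.random() for _ in range(20)]   # training points x1..xN drawn i.i.d. from P
print(e_in(hypothesis, target, sample), e_out(hypothesis, target))
```

With these choices, h and f disagree exactly when x falls in (0.5, 0.6], so E_out(h) is 0.1, while E_in(h) on a small sample can differ; the gap between the two is what the Hoeffding bound controls.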
Well, in sample performance, you are trying to see the error of approximating the target function by your hypothesis.', 'start': 2465.731, 'duration': 8.545}], 'summary': 'Out-of-sample performance is essential for learning, named e-out, to approximate the target function.', 'duration': 59.185, 'max_score': 2415.091, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2415091.jpg'}, {'end': 2639.249, 'src': 'embed', 'start': 2614.301, 'weight': 2, 'content': [{'end': 2623.51, 'text': "The Hoeffding inequality that we have happily studied and declared important and all of that doesn't apply to multiple bins.", 'start': 2614.301, 'duration': 9.209}, {'end': 2632.706, 'text': 'What? You told us mathematics, and you go read the proof and all of that.', 'start': 2627.543, 'duration': 5.163}, {'end': 2639.249, 'text': 'Are you just pulling tricks on us? What is the deal here? And you even can complain.', 'start': 2632.786, 'duration': 6.463}], 'summary': 'The Hoeffding inequality does not apply to multiple bins.', 'duration': 24.948, 'max_score': 2614.301, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2614301.jpg'}], 'start': 2114.914, 'title': 'Learning theory and multiple hypotheses', 'summary': 'Delves into learning with multiple hypotheses, stressing the significance of choosing the hypothesis with the lowest error from multiple bins to enhance learning and avoid dependence on a single hypothesis. 
it also covers learning theory notation, e in of h and e out of h, and explores the implications of hoeffding inequality in the context of multiple bins.', 'chapters': [{'end': 2290.983, 'start': 2114.914, 'title': 'Learning with multiple hypotheses', 'summary': 'Explains the concept of learning with multiple hypotheses, emphasizing the need to choose the hypothesis with the least error from multiple bins, in order to improve the learning process and avoid relying on a single hypothesis.', 'duration': 176.069, 'highlights': ["Emphasizes the need to choose the hypothesis with the least error from multiple bins In order to improve the learning process and avoid relying on a single hypothesis, it's essential to select the hypothesis that gives the least error from multiple bins.", 'Explains the concept of learning with multiple hypotheses The chapter delves into the idea of choosing from multiple hypotheses in order to deliberately find a hypothesis that works well on the data, thus emphasizing the need to move away from relying on a single hypothesis.', 'Discusses the challenges of relying on a single hypothesis The chapter highlights the lack of control over the quality of the data and the unpredictability of whether a single hypothesis will perform well, emphasizing the importance of exploring multiple hypotheses to improve learning outcomes.']}, {'end': 2639.249, 'start': 2291.764, 'title': 'Learning theory notation and multiple bins', 'summary': 'Discusses the notation for in-sample and out-of-sample performance, introducing e in of h and e out of h, and explains the implications of hoeffding inequality in the context of multiple bins.', 'duration': 347.485, 'highlights': ['The chapter introduces the notation for in-sample and out-of-sample performance as E in of h and E out of h, emphasizing the importance of out-of-sample performance in determining learning success. 
', 'The chapter explains the implications of Hoeffding inequality in the context of multiple bins, highlighting the limitation of the inequality in this scenario.', 'The chapter emphasizes the importance of out-of-sample performance in determining learning success and the necessity of considering multiple bins in the learning process.']}], 'duration': 524.335, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2114914.jpg', 'highlights': ['Emphasizes the need to choose the hypothesis with the least error from multiple bins', 'The chapter introduces the notation for in-sample and out-of-sample performance as E in of h and E out of h', 'Explains the implications of Hoeffding inequality in the context of multiple bins', 'Explains the concept of learning with multiple hypotheses', 'Discusses the challenges of relying on a single hypothesis', 'Emphasizes the importance of out-of-sample performance in determining learning success']}, {'end': 2962.806, 'segs': [{'end': 2765.442, 'src': 'embed', 'start': 2736.796, 'weight': 0, 'content': [{'end': 2740.679, 'text': 'And therefore, if you get five heads, it must be that this coin gives you heads.', 'start': 2736.796, 'duration': 3.883}, {'end': 2741.58, 'text': 'We know better.', 'start': 2740.999, 'duration': 0.581}, {'end': 2747.363, 'text': "So in the online audience, what happened? 
In the online audience, it's also five heads.", 'start': 2741.66, 'duration': 3.942}, {'end': 2747.363, 'text': 'There are lots of biased coins out there.', 'start': 2745.622, 'duration': 1.741}, {'end': 2752.607, 'text': 'Are there really biased coins? No.', 'start': 2748.965, 'duration': 3.642}, {'end': 2759.98, 'text': "What is the deal here? So let's look at this.", 'start': 2754.548, 'duration': 5.432}, {'end': 2765.442, 'text': "With the audience here, I didn't want to push my luck with 10 coins, because it's live broadcast.", 'start': 2760.12, 'duration': 5.322}], 'summary': 'The speaker discusses the occurrence of five heads in a coin toss and the presence of biased coins, emphasizing a skeptical view on the matter.', 'duration': 28.646, 'max_score': 2736.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2736796.jpg'}, {'end': 2815.828, 'src': 'embed', 'start': 2786.641, 'weight': 1, 'content': [{'end': 2788.462, 'text': 'And that will give you about 1 in 1, 000.', 'start': 2786.641, 'duration': 1.821}, {'end': 2791.423, 'text': 'No chance that you will get it.', 'start': 2788.462, 'duration': 2.961}, {'end': 2794.225, 'text': 'Not chance, but very little chance.', 'start': 2793.004, 'duration': 1.221}, {'end': 2798.607, 'text': 'Now the second question is the one we actually ran the experiment for.', 'start': 2795.585, 'duration': 3.022}, {'end': 2804.85, 'text': "If you toss 1, 000 fair coins, it wasn't 1, 000 here, it's how many there.", 'start': 2800.188, 'duration': 4.662}, {'end': 2807.471, 'text': 'Maybe out there is 1, 000.', 'start': 2804.93, 'duration': 2.541}, {'end': 2813.004, 'text': 'What is the probability that some coin? 
will give you all 10 heads.', 'start': 2807.471, 'duration': 5.533}, {'end': 2815.828, 'text': 'Not difficult at all to compute.', 'start': 2814.526, 'duration': 1.302}], 'summary': 'A fair coin tossed 10 times gives all heads with probability about 1 in 1,000, but among 1,000 coins it becomes likely that some coin shows all 10 heads.', 'duration': 29.187, 'max_score': 2786.641, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2786641.jpg'}, {'end': 2890.905, 'src': 'embed', 'start': 2864.554, 'weight': 2, 'content': [{'end': 2869.557, 'text': 'you will end up with extremely high probabilities that something bad will happen somewhere.', 'start': 2864.554, 'duration': 5.003}, {'end': 2870.758, 'text': "That's the key.", 'start': 2870.217, 'duration': 0.541}, {'end': 2875.861, 'text': "So let's translate this into the learning situation.", 'start': 2872.099, 'duration': 3.762}, {'end': 2877.522, 'text': 'So here are your coins.', 'start': 2876.601, 'duration': 0.921}, {'end': 2884.537, 'text': "And how do they correspond to the bins? 
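The two probabilities asked about here can be computed directly: one fair coin tossed 10 times gives all heads with probability (1/2)^10 = 1/1024, about 1 in 1,000, while among 1,000 such coins the chance that at least one comes up all heads is 1 - (1 - 2^-10)^1000, which is better than even. As a quick check:

```python
# One fair coin, 10 heads in a row: (1/2)^10 ~ 1 in 1,000.
p_one_coin = 0.5 ** 10

# At least one of 1,000 independent coins shows 10 heads: complement of "none does".
p_some_coin = 1 - (1 - p_one_coin) ** 1000

print(f"one coin:  {p_one_coin:.6f}")   # ~0.000977
print(f"some coin: {p_some_coin:.3f}")  # better than even, despite the tiny per-coin chance
```

This is the point of the experiment: the per-coin probability is tiny, but over many coins a "miracle" somewhere becomes likely.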
Well, it's a binary experiment.", 'start': 2880.192, 'duration': 4.345}, {'end': 2890.905, 'text': "Whether you are picking a red marble or a green marble, or you are flipping a coin getting heads or tails, it's a binary situation.", 'start': 2884.597, 'duration': 6.308}], 'summary': 'High probabilities of bad outcomes due to binary situations.', 'duration': 26.351, 'max_score': 2864.554, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2864554.jpg'}], 'start': 2639.529, 'title': 'Biased coins and probability', 'summary': "Delves into biased coins through a practical experiment challenging assumptions, discusses the probability of getting 10 heads with surprising findings, and explains hoeffding's inequality and its application to decision-making.", 'chapters': [{'end': 2759.98, 'start': 2639.529, 'title': 'Biased coin experiment', 'summary': 'Discusses the concept of biased coins through a practical experiment where the audience is asked to flip a coin five times, and the results challenge the assumption of biased coins.', 'duration': 120.451, 'highlights': ['The speaker conducts an experiment where the audience is asked to flip a coin five times and report the outcomes, challenging the assumption of biased coins. The speaker challenges the concept of biased coins through a practical experiment. The audience is asked to flip a coin five times and report the outcomes, leading to the revelation that there are no biased coins.', 'The audience is surprised to find that many of them have obtained five heads when flipping the coin, leading to the conclusion that the assumption of biased coins is incorrect. Many audience members obtain five heads when flipping the coin, challenging the assumption of biased coins and leading to the conclusion that there are no biased coins.', 'The speaker emphasizes that the concept of biased coins is not valid and prompts the audience to reconsider their understanding of the issue. 
The speaker emphasizes that the concept of biased coins is not valid and prompts the audience to reconsider their understanding of the issue, challenging their assumptions.']}, {'end': 2843.182, 'start': 2760.12, 'title': 'Probability of getting 10 heads', 'summary': 'Discusses the probability of getting 10 heads when tossing fair coins, finding it to be 1 in 1,000 for 10 coins, and surprisingly more likely than not for 1,000 coins.', 'duration': 83.062, 'highlights': ['The probability of getting all 10 heads when tossing 10 fair coins is 1 in 1,000, indicating very little chance of occurrence.', 'When tossing 1,000 fair coins, the probability of getting all 10 heads is surprisingly more likely than not, challenging the intuition about the real probability.']}, {'end': 2962.806, 'start': 2843.722, 'title': "Hoeffding's inequality and learning paradigm", 'summary': "Explains hoeffding's inequality and its application to the learning situation, highlighting the danger of misleading outcomes and the importance of understanding probability and hypothesis in decision-making.", 'duration': 119.084, 'highlights': ["Hoeffding's inequality applies to any situation with probabilities, with a 0.5% chance of error in each case, leading to high probabilities of a bad outcome if the errors are disjoint.", 'The learning situation is illustrated using the analogy of picking coins and corresponding bins, emphasizing the importance of understanding probability and the impact of fair coins resulting in a random hypothesis with no useful information.', "The chapter warns about the danger of misleading outcomes in the learning paradigm, where the pursuit of a 'perfect' hypothesis can lead to overconfidence and false conclusions, highlighting the need for a deeper understanding of data and probability."]}], 'duration': 323.277, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2639529.jpg', 'highlights': ['The audience is surprised 
to find that many of them have obtained five heads when flipping the coin, leading to the conclusion that the assumption of biased coins is incorrect.', 'The probability of getting all 10 heads when tossing 10 fair coins is 1 in 1,000, indicating very little chance of occurrence.', 'The learning situation is illustrated using the analogy of picking coins and corresponding bins, emphasizing the importance of understanding probability and the impact of fair coins resulting in a random hypothesis with no useful information.']}, {'end': 3463.921, 'segs': [{'end': 3142.058, 'src': 'embed', 'start': 3101.878, 'weight': 1, 'content': [{'end': 3103.86, 'text': "If it's bad, one of them is bad.", 'start': 3101.878, 'duration': 1.982}, {'end': 3105.661, 'text': 'So this is equal to that.', 'start': 3104.1, 'duration': 1.561}, {'end': 3108.624, 'text': 'This is called the union bound in probability.', 'start': 3106.342, 'duration': 2.282}, {'end': 3112.928, 'text': "It's a very loose bound, in general, because it doesn't consider the overlap.", 'start': 3109.025, 'duration': 3.903}, {'end': 3117.433, 'text': 'Remember when I told you that the half a percent here, half a percent here, half a percent here.', 'start': 3113.189, 'duration': 4.244}, {'end': 3121.377, 'text': 'if you are very unlucky and these are non-overlapping, they add up.', 'start': 3117.433, 'duration': 3.944}, {'end': 3124.71, 'text': 'The non-overlapping is the worst-case assumption.', 'start': 3122.369, 'duration': 2.341}, {'end': 3127.051, 'text': 'And it is the assumption used by the union bound.', 'start': 3124.97, 'duration': 2.081}, {'end': 3128.372, 'text': 'So you get this.', 'start': 3127.711, 'duration': 0.661}, {'end': 3132.614, 'text': 'And the good news about this is that I have a handle on each term of them.', 'start': 3128.932, 'duration': 3.682}, {'end': 3133.634, 'text': 'The union bound is coming up.', 'start': 3132.674, 'duration': 0.96}, {'end': 3142.058, 'text': 'So I put the ORs, and 
then I use the union bound to say that this is less than or equal to, and simply sum the individual probabilities.', 'start': 3133.694, 'duration': 8.364}], 'summary': 'The union bound in probability gives a loose worst-case bound, adding up the individual probabilities as if the events were non-overlapping.', 'duration': 40.18, 'max_score': 3101.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3101878.jpg'}, {'end': 3195.58, 'src': 'embed', 'start': 3161.068, 'weight': 3, 'content': [{'end': 3164.23, 'text': 'Therefore, I have the implication because of the OR.', 'start': 3161.068, 'duration': 3.162}, {'end': 3171.434, 'text': 'And this one, because of the union bound, where I have the pessimistic assumption that I just need to add the probabilities.', 'start': 3164.55, 'duration': 6.884}, {'end': 3180.985, 'text': 'Now, all of this, again, we make simplistic assumptions, which is really not simplistic as in trivially restricting, but rather the opposite.', 'start': 3172.216, 'duration': 8.769}, {'end': 3185.329, 'text': "We just don't want to make any assumptions that restrict the applicability of our result.", 'start': 3181.365, 'duration': 3.964}, {'end': 3186.75, 'text': 'So we took the worst case.', 'start': 3185.749, 'duration': 1.001}, {'end': 3195.58, 'text': 'It cannot get worse than that, right? 
So if you look at this, now I have good news to you, because each term here is a fixed hypothesis.', 'start': 3187.391, 'duration': 8.189}], 'summary': 'The implication is due to the OR, and the union bound makes pessimistic assumptions and considers fixed hypotheses.', 'duration': 34.512, 'max_score': 3161.068, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3161068.jpg'}, {'end': 3281.99, 'src': 'embed', 'start': 3202.089, 'weight': 0, 'content': [{'end': 3207.936, 'text': 'So if I look at a term by itself, Hoeffding applies to this exactly the same way it applied before.', 'start': 3202.089, 'duration': 5.847}, {'end': 3211.323, 'text': 'So this is a mathematical statement now.', 'start': 3209.602, 'duration': 1.721}, {'end': 3212.844, 'text': "I'm not looking at the bigger experiment.", 'start': 3211.343, 'duration': 1.501}, {'end': 3215.526, 'text': 'I reduced the bigger experiment to a bunch of quantities.', 'start': 3213.164, 'duration': 2.362}, {'end': 3219.468, 'text': 'Each of them corresponds to a simple experiment that we already solved.', 'start': 3215.626, 'duration': 3.842}, {'end': 3223.591, 'text': 'So I can substitute for each of these by the bound that the Hoeffding gives me.', 'start': 3219.929, 'duration': 3.662}, {'end': 3235.973, 'text': "So what is the bound that the Hoeffding gives me? Is? 
That's the one.", 'start': 3227.334, 'duration': 8.639}, {'end': 3242.78, 'text': 'For every one of them, each of these guys was less than or equal to this quantity.', 'start': 3236.593, 'duration': 6.187}, {'end': 3246.149, 'text': 'One by one, all of them are obviously the same.', 'start': 3244.108, 'duration': 2.041}, {'end': 3248.791, 'text': 'So each of them is smaller than this quantity.', 'start': 3247.23, 'duration': 1.561}, {'end': 3250.071, 'text': 'Each of them is smaller than this quantity.', 'start': 3248.871, 'duration': 1.2}, {'end': 3256.215, 'text': 'So now I can be confident that the probabilities that I am interested in,', 'start': 3250.511, 'duration': 5.704}, {'end': 3263.698, 'text': 'which is the probability that the in-sample error is being close to the out-of-sample error.', 'start': 3256.215, 'duration': 7.483}, {'end': 3269.582, 'text': 'the closeness of them is bigger than my tolerance, the bad event under the genuine learning scenario.', 'start': 3263.698, 'duration': 5.884}, {'end': 3281.99, 'text': 'You generate marbles from every bin, and you look deliberately for a sample that happens to be all green, or as green as possible.', 'start': 3270.582, 'duration': 11.408}], 'summary': "Hoeffding's bound applies to quantities in experiments, each smaller than a given quantity, ensuring confidence in probabilities.", 'duration': 79.901, 'max_score': 3202.089, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3202089.jpg'}, {'end': 3340.974, 'src': 'embed', 'start': 3308.132, 'weight': 6, 'content': [{'end': 3310.093, 'text': 'Now, this is the bad event.', 'start': 3308.132, 'duration': 1.961}, {'end': 3311.835, 'text': "I'd like the probability to be small.", 'start': 3310.334, 'duration': 1.501}, {'end': 3314.918, 'text': "I don't like to magnify the right-hand side.", 'start': 3312.395, 'duration': 2.523}, {'end': 3317.766, 'text': 'Because that is the probability of something bad 
happening.', 'start': 3315.725, 'duration': 2.041}, {'end': 3326.65, 'text': 'Now, with M, we realize, if you use 10 hypotheses, this probability is probably tight.', 'start': 3319.287, 'duration': 7.363}, {'end': 3332.373, 'text': 'If you use a million hypotheses, we probably are already in trouble.', 'start': 3327.631, 'duration': 4.742}, {'end': 3334.314, 'text': 'There is no guarantee.', 'start': 3333.333, 'duration': 0.981}, {'end': 3340.974, 'text': 'Because now the million gets multiplied by what used to be a respectable probability, which is 1 in 100,000.', 'start': 3335.632, 'duration': 5.342}], 'summary': 'The probability of a bad event grows as hypotheses increase; 1 in 100,000 is respectable with 10 hypotheses, but risky with a million.', 'duration': 32.842, 'max_score': 3308.132, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3308132.jpg'}, {'end': 3408.559, 'src': 'embed', 'start': 3380.708, 'weight': 7, 'content': [{'end': 3384.849, 'text': 'If you have a very sophisticated model, M is huge, let alone infinite.', 'start': 3380.708, 'duration': 4.141}, {'end': 3386.51, 'text': "That's later to come.", 'start': 3384.909, 'duration': 1.601}, {'end': 3388.27, 'text': "That's what the theory of generalization is about.", 'start': 3386.55, 'duration': 1.72}, {'end': 3398.253, 'text': 'But if you pick a very sophisticated example with a large M, you lose the link between the in-sample and out-of-sample.', 'start': 3389.091, 'duration': 9.162}, {'end': 3399.734, 'text': 'So you look at here.', 'start': 3398.794, 'duration': 0.94}, {'end': 3403.778, 'text': "I didn't mean it this way.", 'start': 3402.698, 'duration': 1.08}, {'end': 3406.779, 'text': 'But let me go back, just to show you what it is.', 'start': 3404.258, 'duration': 2.521}, {'end': 3408.559, 'text': "At least you know it's over.", 'start': 3407.659, 'duration': 0.9}], 'summary': 'Generalization theory is about the link between in-sample and out-of-sample, affected by 
model complexity.', 'duration': 27.851, 'max_score': 3380.708, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3380708.jpg'}], 'start': 2964.175, 'title': 'Probability and model generalization', 'summary': 'Covers the Hoeffding inequality, the union bound, and their application to multiple bins and the implication analysis, highlighting the probability of bad events and the challenge of generalization as model sophistication increases, potentially resulting in poor generalization.', 'chapters': [{'end': 3160.268, 'start': 2964.175, 'title': 'Hoeffding inequality and multiple bins', 'summary': 'Discusses the Hoeffding inequality and its application to a large number of bins, demonstrating the probability of bad events and the union bound in probability.', 'duration': 196.093, 'highlights': ['The chapter explains the application of the Hoeffding inequality to a large number of bins and the concept of the union bound in probability. It discusses the dilution of the guarantee as the number of experiments increases and the need to find a Hoeffding counterpart for a process involving a large number of bins.', 'The chapter illustrates the probability of a bad event in a process involving a large number of hypotheses, providing insights into the union bound in probability. It provides a mathematical approach to determining the probability that a hypothesis chosen from a set of M hypotheses is bad, using the union bound to derive an upper bound from the individual probabilities.', 'The chapter emphasizes the concept of the union bound in probability and its application to derive an upper bound from the individual probabilities of bad events. 
It explains the loose nature of the union bound and its reliance on worst-case assumptions, demonstrating the use of the bound to sum the individual probabilities and derive an upper bound on the overall probability.']}, {'end': 3256.215, 'start': 3161.068, 'title': 'Implication and union bound analysis', 'summary': "Discusses employing the union bound to make pessimistic assumptions and applying Hoeffding's inequality to fixed hypotheses, resulting in confidence in the probabilities of interest.", 'duration': 95.147, 'highlights': ["Applying the union bound to make pessimistic assumptions for the implication and declaring fixed hypotheses for each term to apply Hoeffding's inequality.", "Reducing the bigger experiment to a bunch of quantities, each corresponding to a simple experiment, allowing substitution for each term with the bound from Hoeffding's inequality.", "Confidence in the probabilities of interest due to every term being smaller than a certain quantity based on the bound from Hoeffding's inequality."]}, {'end': 3463.921, 'start': 3256.215, 'title': 'Generalization and model sophistication', 'summary': 'Discusses the challenge of generalization as model sophistication increases, leading to a larger deviation between in-sample and out-of-sample error probabilities, with M hypotheses exacerbating the issue, potentially resulting in poor generalization.', 'duration': 207.706, 'highlights': ['The probability of in-sample error being close to the out-of-sample error is discussed, with the closeness being dependent on a tolerance level, and the impact of model sophistication on the link between in-sample and out-of-sample error probabilities. Probability analysis, impact of model sophistication', 'The exponential factor and the added factor due to the presence of capital M hypotheses are highlighted, emphasizing the potential exacerbation of the bad event probability with an increasing number of hypotheses. 
Exponential factor, impact of capital M hypotheses', 'The discussion emphasizes how a larger number of hypotheses, such as a million, can lead to a significant increase in the probability of something bad happening, impacting the generalization ability of the model. Impact of a larger number of hypotheses on bad event probability', 'The challenge of memorization in-sample and poor generalization out-of-sample with a sophisticated model due to the large number of parameters and ways to interpret the model is discussed, highlighting the loss of the link between in-sample and out-of-sample as M increases. Impact of model sophistication on memorization and generalization']}], 'duration': 499.746, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA2964175.jpg', 'highlights': ['The chapter discusses the application of the Hoeffding inequality and the concept of the union bound in probability, emphasizing the dilution of the guarantee as the number of experiments increases.', 'It provides insights into the probability of a bad event in a process involving a large number of hypotheses, using the union bound to derive an upper bound from the individual probabilities.', 'The chapter emphasizes the loose nature of the union bound and its reliance on worst-case assumptions, demonstrating its application to derive an upper bound on the overall probability.', "Applying the union bound to make pessimistic assumptions for the implication and declaring fixed hypotheses for each term to apply Hoeffding's inequality.", "Confidence in the probabilities of interest due to every term being smaller than a certain quantity based on the bound from Hoeffding's inequality.", 'The probability of in-sample error being close to the out-of-sample error is discussed, with the closeness being dependent on a tolerance level, and the impact of model sophistication on the link between in-sample and out-of-sample error probabilities.', 'The discussion 
emphasizes how a larger number of hypotheses, such as a million, can lead to a significant increase in the probability of something bad happening, impacting the generalization ability of the model.', 'The challenge of memorization in-sample and poor generalization out-of-sample with a sophisticated model due to the large number of parameters and ways to interpret the model is discussed, highlighting the loss of the link between in-sample and out-of-sample as M increases.']}, {'end': 4592.517, 'segs': [{'end': 3490.903, 'src': 'embed', 'start': 3463.921, 'weight': 1, 'content': [{'end': 3474.486, 'text': 'It means that either the resources of the examples you have, the amount of data you have, is not sufficient to guarantee any generalization,', 'start': 3463.921, 'duration': 10.565}, {'end': 3479.328, 'text': 'which is somewhat equivalent, that your tolerance is too stringent.', 'start': 3474.486, 'duration': 4.842}, {'end': 3482.499, 'text': 'The situation is not really mysterious.', 'start': 3480.798, 'duration': 1.701}, {'end': 3487.982, 'text': "Let's say that you would like to take a poll for the president.", 'start': 3483.18, 'duration': 4.802}, {'end': 3490.903, 'text': "And let's say that you ask five people at random.", 'start': 3488.542, 'duration': 2.361}], 'summary': 'Insufficient data may result in inaccurate generalization. 
Example: polling 5 random people for president.', 'duration': 26.982, 'max_score': 3463.921, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3463921.jpg'}, {'end': 3643.201, 'src': 'embed', 'start': 3618.906, 'weight': 3, 'content': [{'end': 3625.469, 'text': 'So if I know the probability, I can tell you exactly what are the likelihood that you will get one sample, or another, or another.', 'start': 3618.906, 'duration': 6.563}, {'end': 3629.491, 'text': 'Now, what you do in statistics is the reverse of that.', 'start': 3626.649, 'duration': 2.842}, {'end': 3634.974, 'text': 'You already have the sample, and you are trying to infer which probability gave rise to it.', 'start': 3629.531, 'duration': 5.443}, {'end': 3640.158, 'text': 'So you are using the effect to decide the cause, rather than the other way around.', 'start': 3635.874, 'duration': 4.284}, {'end': 3643.201, 'text': 'The same situation here.', 'start': 3641.94, 'duration': 1.261}], 'summary': 'In statistics, you infer the probability from the sample, using effect to decide the cause.', 'duration': 24.295, 'max_score': 3618.906, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3618906.jpg'}, {'end': 3810.127, 'src': 'embed', 'start': 3779.409, 'weight': 4, 'content': [{'end': 3782.571, 'text': 'Not clear that anything that has to do with the size really is the complexity.', 'start': 3779.409, 'duration': 3.162}, {'end': 3786.574, 'text': 'Maybe the complexity has to do with the structure of individual hypotheses and whatnot.', 'start': 3782.611, 'duration': 3.963}, {'end': 3788.416, 'text': "And that's a very interesting point.", 'start': 3787.155, 'duration': 1.261}, {'end': 3795.541, 'text': 'And that will be discussed at some point, the complexity of individual hypotheses versus the complexity of the model that captures all the hypotheses.', 'start': 3788.736, 'duration': 6.805}, {'end': 
3798.483, 'text': 'This will be a topic that we will discuss much later in the course.', 'start': 3795.861, 'duration': 2.622}, {'end': 3801.442, 'text': 'And some people are getting ahead.', 'start': 3799.941, 'duration': 1.501}, {'end': 3810.127, 'text': 'So how do you pick G? We have one way of picking G that already was established last time, which is the perceptron learning algorithm.', 'start': 3801.782, 'duration': 8.345}], 'summary': 'The complexity of individual hypotheses will be discussed later in the course, and g can be picked using the perceptron learning algorithm.', 'duration': 30.718, 'max_score': 3779.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3779409.jpg'}, {'end': 4003.843, 'src': 'embed', 'start': 3971.14, 'weight': 2, 'content': [{'end': 3974.722, 'text': "There's also a common confusion.", 'start': 3971.14, 'duration': 3.582}, {'end': 3987.667, 'text': 'Why are there multiple bins? The bin was only our conceptual tool to argue that learning is feasible in a probabilistic sense.', 'start': 3975.122, 'duration': 12.545}, {'end': 3996.538, 'text': 'When we used a single bin, we had a correspondence with a hypothesis and it looked like we actually captured the essence of learning.', 'start': 3988.973, 'duration': 7.565}, {'end': 4003.843, 'text': 'until we looked closer and we realized that if you restrict yourself to one bin and apply the Hoeffding inequality directly to it,', 'start': 3996.538, 'duration': 7.305}], 'summary': 'Multiple bins used to show learning feasibility in probabilistic sense', 'duration': 32.703, 'max_score': 3971.14, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3971140.jpg'}, {'end': 4438.255, 'src': 'embed', 'start': 4412.549, 'weight': 0, 'content': [{'end': 4418.898, 'text': 'then there is a reason to believe that the answer in the general population or in the big bin will be close to the 
answer you got in sample.', 'start': 4412.549, 'duration': 6.349}, {'end': 4420.979, 'text': "So that's the verification.", 'start': 4419.938, 'duration': 1.041}, {'end': 4429.222, 'text': 'In order to move from verification to learning, you need to be able to make that statement simultaneously on a number of these guys.', 'start': 4421.319, 'duration': 7.903}, {'end': 4433.104, 'text': "And that's why we had the modified Hoeffding inequality at the end.", 'start': 4429.582, 'duration': 3.522}, {'end': 4438.255, 'text': 'which is this one, that has the red M in it.', 'start': 4435.034, 'duration': 3.221}], 'summary': 'Verification in population close to sample; modified Hoeffding inequality used.', 'duration': 25.706, 'max_score': 4412.549, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA4412549.jpg'}], 'start': 3463.921, 'title': 'Importance of data quantity', 'summary': 'Stresses the need for sufficient data for generalization, citing examples of polling and infinite hypothesis sets. 
It discusses nu equals mu in statistics and machine learning, model complexity, and using multiple bins to represent hypotheses, ensuring learning through probability distribution and the modified Hoeffding inequality.', 'chapters': [{'end': 3595.164, 'start': 3463.921, 'title': 'Importance of sufficient data', 'summary': 'Emphasizes the importance of having a sufficient amount of data for generalization, illustrated through the example of polling for a president with a small sample size and the implications of infinite hypothesis sets on learning models.', 'duration': 131.243, 'highlights': ['The importance of having a sufficient amount of data for generalization is highlighted through the example of polling for the president, where a small sample size yields insignificant results.', 'The concept of infinite hypothesis sets in learning models is discussed, emphasizing the challenges it poses and the need for abstract quantities to describe them with finite values.']}, {'end': 3971.12, 'start': 3595.404, 'title': 'Inference and learning models', 'summary': 'Discusses the implication of nu equals mu, and vice versa, in statistics and machine learning, where the cause and effect relationship is used to infer probability and sample distribution. It also covers the concept of model complexity and extends the equation to support a valid range of responses.', 'duration': 375.716, 'highlights': ['The cause and effect relationship is used to infer probability and sample distribution. In probability, the known distribution (cause) determines the likelihood of each sample (effect), while in statistics and in machine learning, the observed sample (effect) is used to infer the underlying probability (cause).', 'The concept of model complexity and the measure of complexity in terms of the structure of individual hypotheses are discussed. 
The discussion includes the consideration of the complexity of individual hypotheses versus the complexity of the model that captures all the hypotheses, which will be explored further in the course.', 'The extension of the equation to support a valid range of responses instead of a binary response is explained. The discussion involves modifying the equation to accommodate expected value and sample average, and considering the variance of the variable when choosing the bound.']}, {'end': 4592.517, 'start': 3971.14, 'title': 'Feasibility of learning in probabilistic sense', 'summary': 'Explains the conceptual tool of using multiple bins to represent multiple hypotheses for learning, the relationship between probability and learning, the invocation of probability distribution, and the modified Hoeffding inequality to guarantee the behavior of multiple hypotheses, allowing for learning.', 'duration': 621.377, 'highlights': ['The conceptual tool of using multiple bins to represent multiple hypotheses is crucial for learning: Using multiple bins to represent multiple hypotheses is essential for learning, as it allows for exploring hypotheses based on their performance in sample and picking the one that performs best, perhaps in sample, and hoping that it will generalize well out of sample.', 'The invocation of probability distribution is necessary to benefit from probabilistic analysis in learning: The only invocation of probability needed for the probabilistic analysis in learning is to put a probability distribution on x, whereas the hypothesis set h is left as a fixed set without a probability distribution.', 'The modified Hoeffding inequality guarantees the behavior of multiple hypotheses, enabling learning: The modified Hoeffding inequality deals with a situation where there are multiple hypotheses simultaneously and ensures that all of them behave well, allowing for learning. 
The probability of bad things happening when there are many possibilities is greater than when there is only one, as represented by the red M in the inequality.', 'The relationship between probability and learning is discussed, including the necessity of making statements simultaneously on multiple hypotheses for learning: In order to move from verification to learning, it is necessary to be able to make statements simultaneously on a number of hypotheses. This requirement is addressed by the modified Hoeffding inequality, which ensures the behavior of multiple hypotheses in the learning process.']}], 'duration': 1128.596, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MEG35RDD7RA/pics/MEG35RDD7RA3463921.jpg', 'highlights': ['The modified Hoeffding inequality guarantees the behavior of multiple hypotheses, enabling learning', 'The importance of having a sufficient amount of data for generalization is highlighted through the example of polling for the president, where a small sample size yields insignificant results', 'The conceptual tool of using multiple bins to represent multiple hypotheses is crucial for learning', 'The cause and effect relationship is used to infer probability and sample distribution', 'The concept of model complexity and the measure of complexity in terms of the structure of individual hypotheses are discussed']}], 'highlights': ["The lecture delves into the question 'Is Learning Feasible?' 
extensively.", 'The importance of having a sufficient amount of data for generalization is highlighted through the example of polling for the president, where a small sample size yields insignificant results.', 'The Hoeffding inequality allows for bounding the performance independently of the probability distribution, providing flexibility in choosing the probability distribution without knowing its specifics.', 'The learning algorithm picks a hypothesis from the hypothesis set and produces it as the final hypothesis, which approximates the target function, generating the training examples.', 'The existence of a pattern can be intuitively understood, such as in the credit card approval process.', 'The unknown nature of the target function in supervised learning is a critical property.', 'The trade-off between N and epsilon in probability bounds dictates that a smaller epsilon requires a larger N to compensate for it, resulting in a higher level of probability bound.', 'The concept of model complexity and the measure of complexity in terms of the structure of individual hypotheses are discussed.', 'The probability of in-sample error being close to the out-of-sample error is discussed, with the closeness being dependent on a tolerance level, and the impact of model sophistication on the link between in-sample and out-of-sample error probabilities.', 'The challenge of memorization in-sample and poor generalization out-of-sample with a sophisticated model due to the large number of parameters and ways to interpret the model is discussed, highlighting the loss of the link between in-sample and out-of-sample as M increases.']}