title
Lecture 01 - The Learning Problem
description
The Learning Problem - Introduction; supervised, unsupervised, and reinforcement learning. Components of the learning problem. Lecture 1 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, see http://creativecommons.org/licenses/by-nc-nd/3.0/
This lecture was recorded on April 3, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.
detail
{'title': 'Lecture 01 - The Learning Problem', 'heatmap': [{'end': 1914.773, 'start': 1856.114, 'weight': 1}], 'summary': 'Lecture covers various aspects of machine learning, including basics, applications, concepts, learning algorithms, language modeling, limitations, and relationship with statistics. it emphasizes the importance of data in machine learning applications and highlights the impact of a 10% improvement in machine learning performance, citing a $1 million payout by netflix. the content also explores supervised, unsupervised, and reinforcement learning, and discusses the limitations of the perceptron learning algorithm. additionally, it delves into the impact of data and hypothesis set size on learning, emphasizing the importance of data size for generalization, and the trade-off between hypothesis set size and generalization.', 'chapters': [{'end': 464.781, 'segs': [{'end': 151.31, 'src': 'embed', 'start': 100.452, 'weight': 0, 'content': [{'end': 103.096, 'text': 'There is a logical dependency that goes through the course.', 'start': 100.452, 'duration': 2.644}, {'end': 107.582, 'text': 'And there is one exception to that logical dependency.', 'start': 104.017, 'duration': 3.565}, {'end': 112.349, 'text': "One lecture, which is the third one, doesn't really belong here.", 'start': 108.604, 'duration': 3.745}, {'end': 120.593, 'text': "It's a practical topic and the reason I included it early on, because I needed to give you some tools to play around with,", 'start': 113.331, 'duration': 7.262}, {'end': 123.654, 'text': 'to test the theoretical and conceptual aspects.', 'start': 120.593, 'duration': 3.061}, {'end': 130.936, 'text': 'If I waited until it belongs normally, which is to the second aspect of the learning models, which is down there.', 'start': 124.034, 'duration': 6.902}, {'end': 137.8, 'text': "The beginning of the course will be just too theoretical for people's taste.", 'start': 132.156, 'duration': 5.644}, {'end': 144.745, 'text': 'And 
as you see, if you look at the colors, it is mostly red in the beginning, and mostly blue in the end.', 'start': 138.781, 'duration': 5.964}, {'end': 151.31, 'text': 'So it starts building the concepts and the theory, and then it goes on to the practical aspects.', 'start': 145.146, 'duration': 6.164}], 'summary': 'Course content is strategically structured, with practical topics introduced early to balance theoretical concepts. the course progresses from red (theory) to blue (practical).', 'duration': 50.858, 'max_score': 100.452, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI100452.jpg'}, {'end': 294.497, 'src': 'embed', 'start': 244.521, 'weight': 1, 'content': [{'end': 247.323, 'text': "And it's a puzzle in more ways than one, as you will see.", 'start': 244.521, 'duration': 2.802}, {'end': 251.066, 'text': "So let's start with an example.", 'start': 248.984, 'duration': 2.082}, {'end': 260.017, 'text': "The example of machine learning that I'm going to start with is how a viewer would rate a movie.", 'start': 254.116, 'duration': 5.901}, {'end': 266.579, 'text': "Now, that is an interesting problem, and it's interesting for us because we watch movies.", 'start': 262.318, 'duration': 4.261}, {'end': 269.46, 'text': "It's very interesting for a company that rents out movies.", 'start': 266.599, 'duration': 2.861}, {'end': 280.105, 'text': 'And indeed, a company, which is Netflix, wanted to improve the in-house system by a mere 10%.', 'start': 270.22, 'duration': 9.885}, {'end': 282.107, 'text': 'So they make recommendations when you log in.', 'start': 280.105, 'duration': 2.002}, {'end': 286.01, 'text': "They recommend movies that they think you will like, so they think that you'll rate them highly.", 'start': 282.387, 'duration': 3.623}, {'end': 289.493, 'text': 'And they had a system, and they wanted to improve the system.', 'start': 286.791, 'duration': 2.702}, {'end': 294.497, 'text': 'So how much is 
a 10% improvement in performance worth to the company?', 'start': 290.294, 'duration': 4.203}], 'summary': 'Netflix aimed to improve its recommendation system by 10% to enhance user satisfaction and viewing experience.', 'duration': 49.976, 'max_score': 244.521, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI244521.jpg'}, {'end': 351.412, 'src': 'embed', 'start': 323.778, 'weight': 5, 'content': [{'end': 329.463, 'text': 'You are likely to rent the movie that they recommend, and they will make lots of money, much more than the $1 million they promised.', 'start': 323.778, 'duration': 5.685}, {'end': 331.884, 'text': 'And this is very typical in machine learning.', 'start': 330.163, 'duration': 1.721}, {'end': 334.865, 'text': 'For example, machine learning has application in financial forecasting.', 'start': 331.924, 'duration': 2.941}, {'end': 339.927, 'text': 'You can imagine that the minutest improvement in financial forecasting can make a lot of money.', 'start': 335.325, 'duration': 4.602}, {'end': 351.412, 'text': 'So the fact that you can actually push the system to be better using machine learning is a very attractive aspect of the technique in a wide spectrum of applications.', 'start': 340.888, 'duration': 10.524}], 'summary': 'Machine learning can lead to significant financial gains, exceeding $1 million, particularly in financial forecasting.', 'duration': 27.634, 'max_score': 323.778, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI323778.jpg'}, {'end': 477.134, 'src': 'embed', 'start': 445.017, 'weight': 4, 'content': [{'end': 446.457, 'text': 'We have to have data.', 'start': 445.017, 'duration': 1.44}, {'end': 447.778, 'text': 'We are learning from data.', 'start': 446.537, 'duration': 1.241}, {'end': 456.819, 'text': 'So if someone knocks on my door with an interesting machine learning application and they tell me how exciting it is 
and how great the application would be and how much money they would make,', 'start': 448.378, 'duration': 8.441}, {'end': 460.16, 'text': 'the first question I ask what data do you have?', 'start': 456.819, 'duration': 3.341}, {'end': 462.76, 'text': 'If you have data, we are in business.', 'start': 461.18, 'duration': 1.58}, {'end': 464.781, 'text': "If you don't, you are out of luck.", 'start': 463.2, 'duration': 1.581}, {'end': 469.322, 'text': 'If you have these three components, you are ready to apply machine learning.', 'start': 466.201, 'duration': 3.121}, {'end': 477.134, 'text': 'Now, let me give you a solution to the movie rating in order to start getting a feel for it.', 'start': 471.812, 'duration': 5.322}], 'summary': 'Data is crucial for machine learning success; without it, the business is out of luck.', 'duration': 32.117, 'max_score': 445.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI445017.jpg'}], 'start': 0.703, 'title': 'Machine learning basics and its value in business', 'summary': 'Introduces the outline of the course, emphasizing the logical dependency of topics and the transition from theoretical to practical aspects, and discusses the learning problem, including an example about movie ratings, mathematical formalization, and the first algorithm for machine learning. 
it also highlights the significant impact of a 10% improvement in machine learning performance, citing a $1 million payout by netflix for achieving this improvement, and emphasizes the crucial role of data in machine learning applications, highlighting its potential for substantial financial gains.', 'chapters': [{'end': 269.46, 'start': 0.703, 'title': 'Machine learning basics', 'summary': 'Introduces the outline of the course, emphasizing the logical dependency of topics and the transition from theoretical to practical aspects, and discusses the learning problem, including an example about movie ratings, mathematical formalization, and the first algorithm for machine learning.', 'duration': 268.757, 'highlights': ['The course outline emphasizes the logical dependency of topics and the transition from theoretical to practical aspects. The course outline highlights the logical dependency of topics and the transition from theoretical to practical aspects.', 'The lecture introduces the learning problem, including an example about movie ratings and mathematical formalization, and discusses the first algorithm for machine learning. The lecture discusses the learning problem, including an example about movie ratings, mathematical formalization, and the first algorithm for machine learning.', "The topics of the course are color-coded to designate their main content, whether it's mathematical or practical. 
The topics of the course are color-coded to designate their main content, whether it's mathematical or practical."]}, {'end': 464.781, 'start': 270.22, 'title': 'Value of machine learning in business', 'summary': 'Discusses the significant impact of a 10% improvement in machine learning performance, citing a $1 million payout by netflix for achieving this improvement, and emphasizes the crucial role of data in machine learning applications, highlighting its potential for substantial financial gains.', 'duration': 194.561, 'highlights': ['The payout for achieving a 10% improvement in the in-house system by Netflix was $1 million, showcasing the substantial value attributed to performance enhancements in machine learning applications.', 'The crucial role of data in machine learning applications is emphasized, with the speaker highlighting the necessity of data for successful machine learning implementation and its direct correlation to the potential for financial gains.', 'The importance of machine learning in a wide spectrum of applications is underscored, with a specific example of its application in financial forecasting that can lead to significant financial gains even with small improvements in performance.']}], 'duration': 464.078, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI703.jpg', 'highlights': ['The course outline emphasizes the logical dependency of topics and the transition from theoretical to practical aspects.', 'The lecture introduces the learning problem, including an example about movie ratings and mathematical formalization, and discusses the first algorithm for machine learning.', "The topics of the course are color-coded to designate their main content, whether it's mathematical or practical.", 'The payout for achieving a 10% improvement in the in-house system by Netflix was $1 million, showcasing the substantial value attributed to performance enhancements in machine learning 
applications.', 'The crucial role of data in machine learning applications is emphasized, with the speaker highlighting the necessity of data for successful machine learning implementation and its direct correlation to the potential for financial gains.', 'The importance of machine learning in a wide spectrum of applications is underscored, with a specific example of its application in financial forecasting that can lead to significant financial gains even with small improvements in performance.']}, {'end': 1010.471, 'segs': [{'end': 495.38, 'src': 'embed', 'start': 466.201, 'weight': 2, 'content': [{'end': 469.322, 'text': 'If you have these three components, you are ready to apply machine learning.', 'start': 466.201, 'duration': 3.121}, {'end': 477.134, 'text': 'Now, let me give you a solution to the movie rating in order to start getting a feel for it.', 'start': 471.812, 'duration': 5.322}, {'end': 479.134, 'text': 'Here is a system.', 'start': 478.314, 'duration': 0.82}, {'end': 480.895, 'text': 'Let me start to focus on part of it.', 'start': 479.174, 'duration': 1.721}, {'end': 488.938, 'text': 'We are going to describe a viewer as a vector of factors, a profile, if you will.', 'start': 482.495, 'duration': 6.443}, {'end': 495.38, 'text': 'If you look here, for example, the first one would be comedy content.', 'start': 489.478, 'duration': 5.902}], 'summary': 'Ready to apply machine learning with three components. 
describing a viewer as a vector of factors.', 'duration': 29.179, 'max_score': 466.201, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI466201.jpg'}, {'end': 675.882, 'src': 'embed', 'start': 645.891, 'weight': 1, 'content': [{'end': 650.913, 'text': 'Now, what machine learning will do is reverse engineer that process.', 'start': 645.891, 'duration': 5.022}, {'end': 660.858, 'text': 'It starts from the rating, and then tries to find out what factors would be consistent with that rating.', 'start': 654.395, 'duration': 6.463}, {'end': 662.873, 'text': 'So think of it this way.', 'start': 661.872, 'duration': 1.001}, {'end': 666.315, 'text': "You start, let's say, with completely random factors.", 'start': 663.153, 'duration': 3.162}, {'end': 675.882, 'text': 'So you take these guys just random numbers from beginning to end and these guys random numbers from beginning to end for every user and every movie.', 'start': 667.776, 'duration': 8.106}], 'summary': 'Machine learning reverse engineers the process by finding factors consistent with ratings.', 'duration': 29.991, 'max_score': 645.891, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI645891.jpg'}, {'end': 763.209, 'src': 'embed', 'start': 734.602, 'weight': 0, 'content': [{'end': 745.547, 'text': "that didn't watch a movie and you get the vector that resulted from that learning process and you get the movie vector that resulted from that process and you do the inner product,", 'start': 734.602, 'duration': 10.945}, {'end': 751.07, 'text': 'lo and behold, you get a rating which is actually consistent with how that viewer rates the movie.', 'start': 745.547, 'duration': 5.523}, {'end': 752.51, 'text': "That's the idea.", 'start': 751.87, 'duration': 0.64}, {'end': 763.209, 'text': 'Now, this actually, the solution I described, is one of the winning solutions in the competition that I mentioned.', 
'start': 755.433, 'duration': 7.776}], 'summary': 'Winning solution in competition uses vectors for accurate movie ratings.', 'duration': 28.607, 'max_score': 734.602, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI734602.jpg'}, {'end': 839.969, 'src': 'embed', 'start': 817.081, 'weight': 3, 'content': [{'end': 825.707, 'text': 'they are going to rely on historical records of previous customers and how their credit behavior turned out and then try to reverse engineer the system.', 'start': 817.081, 'duration': 8.626}, {'end': 829.51, 'text': "And when they get the system frozen, they're going to apply it to a future customer.", 'start': 826.048, 'duration': 3.462}, {'end': 830.671, 'text': "That's the deal.", 'start': 830.131, 'duration': 0.54}, {'end': 838.048, 'text': 'So what are the components here? First, you have the applicant information.', 'start': 831.332, 'duration': 6.716}, {'end': 839.969, 'text': 'And the applicant information.', 'start': 838.809, 'duration': 1.16}], 'summary': 'Using historical customer records to reverse engineer credit system for future applicants.', 'duration': 22.888, 'max_score': 817.081, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI817081.jpg'}, {'end': 937.233, 'src': 'embed', 'start': 907.765, 'weight': 4, 'content': [{'end': 911.549, 'text': 'The output y is simply the decision, either to extend credit or not to extend credit.', 'start': 907.765, 'duration': 3.784}, {'end': 917.815, 'text': "It's plus 1 and minus 1.", 'start': 911.729, 'duration': 6.086}, {'end': 920.818, 'text': "And being a good or bad customer, that is from the bank's point of view.", 'start': 917.815, 'duration': 3.003}, {'end': 926.046, 'text': 'Now, we have after that the target function.', 'start': 923.024, 'duration': 3.022}, {'end': 933.371, 'text': 'The target function is a function from a domain X, which is the set of all of 
these.', 'start': 926.466, 'duration': 6.905}, {'end': 937.233, 'text': 'So it is a set of vectors of d dimensions.', 'start': 934.832, 'duration': 2.401}], 'summary': 'The target function determines credit extension based on customer behavior.', 'duration': 29.468, 'max_score': 907.765, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI907765.jpg'}], 'start': 466.201, 'title': 'Machine learning applications', 'summary': 'Covers a machine learning approach for movie rating, used in a competition and aiming to automate the rating process, as well as a metaphor of credit approval in machine learning, predicting credit extension based on customer data.', 'chapters': [{'end': 779.867, 'start': 466.201, 'title': 'Machine learning for movie rating', 'summary': 'Explains a machine learning approach for movie rating, using a vector of factors for viewers and movies, with the winning solution being used in a competition and the approach aiming to automate the rating process without the need for manual analysis.', 'duration': 313.666, 'highlights': ['The machine learning approach for movie rating involves using a vector of factors for viewers and movies, with the aim of automating the rating process. The machine learning approach involves using a vector of factors for viewers and movies to automate the rating process without the need for manual analysis.', 'The described solution for movie rating is one of the winning solutions in a competition, demonstrating its real-world applicability. The described solution for movie rating is one of the winning solutions in a competition, demonstrating its real-world applicability.', 'The learning process involves reverse engineering the rating by starting with random factors and gradually adjusting them based on actual ratings, eventually resulting in meaningful factors that produce consistent ratings. 
The learning process involves reverse engineering the rating by starting with random factors and gradually adjusting them based on actual ratings, eventually resulting in meaningful factors that produce consistent ratings.']}, {'end': 1010.471, 'start': 780.347, 'title': 'Credit approval metaphor', 'summary': "Explains the metaphor of credit approval in the context of machine learning, using historical customer data to learn an ideal credit approval formula, aiming to predict whether to extend credit or not based on the customer's application information and previous credit behavior.", 'duration': 230.124, 'highlights': ["The bank uses historical records of previous customers' credit behavior to reverse engineer a system for credit approval. The bank relies on historical records of previous customers' credit behavior to reverse engineer a system for credit approval, aiming to predict whether to extend credit or not based on the customer's application information and previous credit behavior.", 'The input x represents the customer application, while the output y indicates the decision to either extend credit or deny it. The input x represents the customer application, while the output y indicates the decision to either extend credit or deny it, with the target function being the ideal credit approval formula, which is unknown and needs to be learned from data.', "The data used for learning the target function is based on previous customer application records, including input information and the output of their credit behavior. 
The data used for learning the target function is based on previous customer application records, including input information and the output of their credit behavior, aiming to predict credit approval based on historical records of previous customers' credit behavior."]}], 'duration': 544.27, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI466201.jpg', 'highlights': ['The described solution for movie rating is one of the winning solutions in a competition, demonstrating its real-world applicability.', 'The learning process involves reverse engineering the rating by starting with random factors and gradually adjusting them based on actual ratings, eventually resulting in meaningful factors that produce consistent ratings.', 'The machine learning approach for movie rating involves using a vector of factors for viewers and movies, with the aim of automating the rating process.', "The bank relies on historical records of previous customers' credit behavior to reverse engineer a system for credit approval, aiming to predict whether to extend credit or not based on the customer's application information and previous credit behavior.", 'The input x represents the customer application, while the output y indicates the decision to either extend credit or deny it, with the target function being the ideal credit approval formula, which is unknown and needs to be learned from data.', "The data used for learning the target function is based on previous customer application records, including input information and the output of their credit behavior, aiming to predict credit approval based on historical records of previous customers' credit behavior."]}, {'end': 1534.401, 'segs': [{'end': 1037.731, 'src': 'embed', 'start': 1010.791, 'weight': 0, 'content': [{'end': 1014.693, 'text': 'All of this makes sense when you are talking about you have 100, 000 of those guys.', 'start': 1010.791, 'duration': 3.902}, {'end': 
1018.655, 'text': 'Then you pretty much say, OK, I will capture what the essence of that function is.', 'start': 1015.093, 'duration': 3.562}, {'end': 1020.256, 'text': 'So this is the data.', 'start': 1019.375, 'duration': 0.881}, {'end': 1026.32, 'text': 'And then you use the data, which is the historical records, in order to get the hypothesis.', 'start': 1020.776, 'duration': 5.544}, {'end': 1033.708, 'text': 'The hypothesis is the formal name we are going to call the formula we get to approximate the target function.', 'start': 1026.821, 'duration': 6.887}, {'end': 1037.731, 'text': 'So the hypothesis lives in the same world as the target function.', 'start': 1034.288, 'duration': 3.443}], 'summary': 'Using historical records, we derive a hypothesis to approximate the target function.', 'duration': 26.94, 'max_score': 1010.791, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1010791.jpg'}, {'end': 1131, 'src': 'embed', 'start': 1101.373, 'weight': 1, 'content': [{'end': 1104.174, 'text': 'Otherwise, the target function is a mysterious quantity for us.', 'start': 1101.373, 'duration': 2.801}, {'end': 1108.255, 'text': 'And eventually, we would like to produce the final hypothesis.', 'start': 1105.314, 'duration': 2.941}, {'end': 1117.377, 'text': 'The final hypothesis is the formula the bank is going to use in order to approve or deny credit, with the hope that G hopefully approximates that F.', 'start': 1108.475, 'duration': 8.902}, {'end': 1122.778, 'text': 'Now, what connects those two guys? 
This will be the learning algorithm.', 'start': 1117.377, 'duration': 5.401}, {'end': 1131, 'text': 'So the learning algorithm takes the examples and will produce the final hypothesis, as we described in the example of the movie rating.', 'start': 1123.178, 'duration': 7.822}], 'summary': 'Learning algorithm produces final hypothesis for credit approval based on examples.', 'duration': 29.627, 'max_score': 1101.373, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1101373.jpg'}, {'end': 1264.241, 'src': 'embed', 'start': 1234.66, 'weight': 2, 'content': [{'end': 1236.522, 'text': 'So there is no downside.', 'start': 1234.66, 'duration': 1.862}, {'end': 1241.405, 'text': 'The upside is not obvious here, but it will become obvious as we go through the theory.', 'start': 1237.482, 'duration': 3.923}, {'end': 1246.769, 'text': 'The hypothesis set will play a pivotal role in the theory of learning.', 'start': 1242.006, 'duration': 4.763}, {'end': 1250.171, 'text': 'It will tell us can we learn, and how well we learn, and whatnot.', 'start': 1247.229, 'duration': 2.942}, {'end': 1255.816, 'text': 'Therefore, having it as an explicit component in the problem statement will make the theory go through.', 'start': 1250.652, 'duration': 5.164}, {'end': 1259.218, 'text': "So that's why we have this figure.", 'start': 1256.636, 'duration': 2.582}, {'end': 1264.241, 'text': 'So now, let me focus on the solution components of that figure.', 'start': 1259.778, 'duration': 4.463}], 'summary': 'Hypothesis set is pivotal in theory of learning, aiding in understanding learning capability and making theory comprehensive.', 'duration': 29.581, 'max_score': 1234.66, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1234660.jpg'}, {'end': 1388.916, 'src': 'embed', 'start': 1359.895, 'weight': 3, 'content': [{'end': 1366.039, 'text': 'So if you are asked what is the learning model 
you are using, you are actually choosing both a hypothesis set and a learning algorithm.', 'start': 1359.895, 'duration': 6.144}, {'end': 1368.08, 'text': "We'll see the perceptron in a moment.", 'start': 1366.699, 'duration': 1.381}, {'end': 1370.221, 'text': 'So the perceptron, this will be the perceptron model.', 'start': 1368.1, 'duration': 2.121}, {'end': 1372.523, 'text': 'And this will be the perceptron learning algorithm.', 'start': 1370.582, 'duration': 1.941}, {'end': 1374.184, 'text': 'This could be neural network.', 'start': 1372.843, 'duration': 1.341}, {'end': 1376.045, 'text': 'And this would be back propagation.', 'start': 1374.604, 'duration': 1.441}, {'end': 1380.268, 'text': "This could be support vector machines of some kind, let's say radial basis function version.", 'start': 1376.505, 'duration': 3.763}, {'end': 1382.249, 'text': 'And this would be the quadratic programming.', 'start': 1380.588, 'duration': 1.661}, {'end': 1388.916, 'text': 'So every time you have a model, there is a hypothesis set, and then there is an algorithm that will do the searching and produce one of those guys.', 'start': 1382.569, 'duration': 6.347}], 'summary': 'Choosing a learning model involves a hypothesis set and a learning algorithm, such as perceptron, neural network with back propagation, or support vector machines with quadratic programming.', 'duration': 29.021, 'max_score': 1359.895, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1359895.jpg'}], 'start': 1010.791, 'title': 'Machine learning concepts', 'summary': 'Covers machine learning basics, including target function, hypothesis, and data. 
it also delves into learning algorithm, hypothesis set, and solution components, emphasizing the importance of historical records and training examples in approximating the unknown target function.', 'chapters': [{'end': 1101.233, 'start': 1010.791, 'title': 'Machine learning basics', 'summary': 'Discusses the concept of target function, hypothesis, and data in machine learning, aiming to approximate the unknown target function with the known hypothesis, using historical records and training examples.', 'duration': 90.442, 'highlights': ['The target function, denoted as f, is unknown, while the hypothesis, denoted as g, is known and created by us, aiming to approximate f well.', 'The data consists of capital N points, serving as the data set, with the output always denoted as y.', 'The historical records are used to obtain the hypothesis, which is the formal name for the formula approximating the target function.', 'The diagram illustrates the unknown target function, which is ideal but never known, and the training examples serve as the means to understand it.']}, {'end': 1259.218, 'start': 1101.373, 'title': 'Learning algorithm and hypothesis set', 'summary': "Discusses the learning algorithm's role in producing the final hypothesis for credit approval, emphasizing the importance of the hypothesis set and its impact on the theory of learning.", 'duration': 157.845, 'highlights': ['The learning algorithm produces the final hypothesis, which is the formula for credit approval, aiming to approximate the target function.', 'The hypothesis set, representing a set of candidate formulas, plays a pivotal role in the theory of learning, determining learning capabilities and outcomes.', 'The inclusion of a hypothesis set poses no downside as it aligns with practical considerations, and its explicit presence in the problem statement facilitates the development of learning theory.', 'The learning algorithm creates the formula from a pre-set model of candidate formulas, known 
as the hypothesis set, to pick one hypothesis for credit approval.']}, {'end': 1534.401, 'start': 1259.778, 'title': 'Solution components and hypothesis set', 'summary': 'Discusses the solution components of the target function, learning algorithm, and hypothesis set, particularly focusing on the perceptron model and its application in determining credit approval based on attributes like salary and outstanding debt.', 'duration': 274.623, 'highlights': ['The learning model consists of a hypothesis set and a learning algorithm, such as the perceptron model and perceptron learning algorithm, used to produce hypotheses (G) for solving the problem.', 'The perceptron model uses weights (w) for attributes like salary and outstanding debt, and a linear formula to calculate a credit score, which is then compared to a threshold for credit card approval or denial.', 'The hypothesis set formalization defines the functional form of all hypotheses, including the credit score and threshold comparison for credit approval or denial.']}], 'duration': 523.61, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1010791.jpg', 'highlights': ['The historical records are used to obtain the hypothesis, which is the formal name for the formula approximating the target function.', 'The learning algorithm produces the final hypothesis, which is the formula for credit approval, aiming to approximate the target function.', 'The hypothesis set, representing a set of candidate formulas, plays a pivotal role in the theory of learning, determining learning capabilities and outcomes.', 'The learning model consists of a hypothesis set and a learning algorithm, such as the perceptron model and perceptron learning algorithm, used to produce hypotheses (G) for solving the problem.']}, {'end': 2046.096, 'segs': [{'end': 1560.066, 'src': 'embed', 'start': 1534.921, 'weight': 2, 'content': [{'end': 1540.502, 'text': 'Well, the function that takes the 
real number and produces the sign, plus 1 or minus 1, is called the sign.', 'start': 1534.921, 'duration': 5.581}, {'end': 1544.843, 'text': 'So when you take the sign of this thing, this will indeed be plus 1 or minus 1.', 'start': 1540.942, 'duration': 3.901}, {'end': 1546.623, 'text': 'And this will give the decision you want.', 'start': 1544.843, 'duration': 1.78}, {'end': 1548.984, 'text': 'And that will be the form of your hypothesis.', 'start': 1547.043, 'duration': 1.941}, {'end': 1552.144, 'text': "Now, let's put it in color.", 'start': 1549.784, 'duration': 2.36}, {'end': 1560.066, 'text': 'And you realize that what defines h is your choice of wi and the threshold.', 'start': 1553.465, 'duration': 6.601}], 'summary': 'The sign function produces the sign of a real number, yielding a decision for the hypothesis based on the choice of wi and threshold.', 'duration': 25.145, 'max_score': 1534.921, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1534921.jpg'}, {'end': 1598.237, 'src': 'embed', 'start': 1575.09, 'weight': 1, 'content': [{'end': 1586.133, 'text': 'But what we vary to get one hypothesis or another and what the algorithm needs to vary in order to choose the final hypothesis are those parameters which in this case are wi and the threshold.', 'start': 1575.09, 'duration': 11.043}, {'end': 1590.207, 'text': "So let's look at it visually.", 'start': 1588.866, 'duration': 1.341}, {'end': 1595.734, 'text': "Let's assume that the data you are working with is linearly separable.", 'start': 1590.608, 'duration': 5.126}, {'end': 1598.237, 'text': 'In this case, for example, you have nine data points.', 'start': 1595.814, 'duration': 2.423}], 'summary': 'Vary parameters to get hypotheses, with 9 linearly separable data points.', 'duration': 23.147, 'max_score': 1575.09, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1575090.jpg'}, {'end': 
1739.254, 'src': 'embed', 'start': 1712.144, 'weight': 3, 'content': [{'end': 1718.427, 'text': "And what is the 0 term? It's the threshold, which you conveniently call w0 with a plus sign, multiplied by the 1.", 'start': 1712.144, 'duration': 6.283}, {'end': 1721.228, 'text': 'So indeed, this will be the formula equivalent to that.', 'start': 1718.427, 'duration': 2.801}, {'end': 1722.348, 'text': 'So it looks better.', 'start': 1721.608, 'duration': 0.74}, {'end': 1725.389, 'text': 'And this is the standard notation we are going to use.', 'start': 1723.388, 'duration': 2.001}, {'end': 1731.091, 'text': 'And now that we put it as a vector form, which will simplify matters.', 'start': 1726.529, 'duration': 4.562}, {'end': 1739.254, 'text': 'So in this case, you will be having an inner product between a vector w, a column vector, and a vector x.', 'start': 1731.551, 'duration': 7.703}], 'summary': 'Introducing the standard notation for a simplified inner product formula.', 'duration': 27.11, 'max_score': 1712.144, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1712144.jpg'}, {'end': 1849.653, 'src': 'embed', 'start': 1826.753, 'weight': 4, 'content': [{'end': 1834.437, 'text': 'It means that when you apply your formula with the current w, the w is the one that the algorithm will play with.', 'start': 1826.753, 'duration': 7.684}, {'end': 1836.038, 'text': 'apply it to this particular x.', 'start': 1834.437, 'duration': 1.601}, {'end': 1841.15, 'text': 'Then what happens? You get something that is not the credit behavior you want.', 'start': 1837.409, 'duration': 3.741}, {'end': 1842.39, 'text': 'It is misclassified.', 'start': 1841.47, 'duration': 0.92}, {'end': 1846.112, 'text': 'So what do we do when a point is misclassified? 
We have to do something.', 'start': 1843.551, 'duration': 2.561}, {'end': 1849.653, 'text': 'So what the algorithm does, it updates the weight vector.', 'start': 1846.912, 'duration': 2.741}], 'summary': 'Algorithm updates weight vector for misclassified points.', 'duration': 22.9, 'max_score': 1826.753, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1826753.jpg'}, {'end': 1914.773, 'src': 'heatmap', 'start': 1856.114, 'weight': 1, 'content': [{'end': 1859.775, 'text': 'And this is the formula that it does.', 'start': 1856.114, 'duration': 3.661}, {'end': 1861.136, 'text': "So I'll explain it in a moment.", 'start': 1859.995, 'duration': 1.141}, {'end': 1868.997, 'text': 'Let me first try to explain the inner product in terms of agreement or disagreement.', 'start': 1861.536, 'duration': 7.461}, {'end': 1881.32, 'text': 'If you have the vector x and the vector w this way, their inner product will be positive, and the sign will give you a plus 1.', 'start': 1870.878, 'duration': 10.442}, {'end': 1887.628, 'text': 'If they are this way, the inner product will be negative, and the sign will be minus 1.', 'start': 1881.32, 'duration': 6.308}, {'end': 1897.635, 'text': "So being misclassified means that either they are this way, and the output should be minus 1, or it's this way, and the output should be plus 1.", 'start': 1887.628, 'duration': 10.007}, {'end': 1899.196, 'text': "That's what makes it misclassified.", 'start': 1897.635, 'duration': 1.561}, {'end': 1914.773, 'text': 'Right?. 
If you look here at this formula, it takes the old w and adds something that depends on the misclassified point, both in terms of the xn and yn.', 'start': 1899.596, 'duration': 15.177}], 'summary': 'Explaining the inner product and misclassification in formula.', 'duration': 58.659, 'max_score': 1856.114, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1856114.jpg'}, {'end': 2031.486, 'src': 'embed', 'start': 2002.83, 'weight': 0, 'content': [{'end': 2004.19, 'text': "It means it's misclassified.", 'start': 2002.83, 'duration': 1.36}, {'end': 2011.992, 'text': 'So now you would like to adjust the weights, that is, change, move around that purple line, such that the point is classified correctly.', 'start': 2004.691, 'duration': 7.301}, {'end': 2016.235, 'text': 'If you apply the learning rule, you will find that you are actually moving in this direction,', 'start': 2012.392, 'duration': 3.843}, {'end': 2022.619, 'text': 'which means that the blue point will likely be correctly classified after that iteration.', 'start': 2016.235, 'duration': 6.384}, {'end': 2029.124, 'text': "There is a problem, because let's say that I actually move this guy this direction.", 'start': 2023.8, 'duration': 5.324}, {'end': 2031.486, 'text': 'Well, this one, I got it right.', 'start': 2029.744, 'duration': 1.742}], 'summary': 'Adjust weights to correctly classify points using learning rule.', 'duration': 28.656, 'max_score': 2002.83, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2002830.jpg'}], 'start': 1534.921, 'title': 'Perceptron learning algorithm', 'summary': 'Explains the concept of the perceptron hypothesis and learning algorithm, involving choosing parameters wi and a threshold to separate data points into different classes. 
It introduces the perceptron learning algorithm as a method to navigate through the space of hypotheses, updating weight vectors based on misclassified points to achieve correct classification, while addressing its limitations and iterations.', 'chapters': [{'end': 1756.258, 'start': 1534.921, 'title': 'Perceptron hypothesis and learning algorithm', 'summary': 'Explains the concept of the perceptron hypothesis and learning algorithm, which involves choosing parameters wi and threshold to separate data points into different classes, illustrated with a visual representation of linearly separable data points and the process of adjusting parameters to achieve a correct classification.', 'duration': 221.337, 'highlights': ['The learning algorithm involves playing around with parameters wi and threshold to move a line around trying to arrive at a correct classification, illustrated with a visual representation of linearly separable data points.', 'The function for the perceptron hypothesis takes a real number and produces the sign, plus 1 or minus 1, based on the choice of wi and the threshold, which defines one hypothesis versus the other.', 'Introduction of an artificial coordinate x0 as an artificial constant, allowing the formula to simplify and standardizing the notation for the perceptron hypothesis. 
']}, {'end': 2046.096, 'start': 1759.071, 'title': 'Perceptron learning algorithm', 'summary': 'Introduces the perceptron learning algorithm as a method to navigate through the space of hypotheses, updating weight vectors based on misclassified points to achieve correct classification, while addressing its limitations and iterations.', 'duration': 287.025, 'highlights': ['The Perceptron Learning Algorithm is a method that updates weight vectors based on misclassified points, aiming to achieve correct classification, illustrated through the manipulation of vectors to correct misclassifications.', "The algorithm takes existing training data, such as customers' credit behavior, and updates the weight vector to improve classification, either by adding or subtracting vectors based on misclassifications.", 'The iterations of the Perceptron Learning Algorithm adjust the weight vector to correctly classify the selected point, but each update considers only that point, so other points may become misclassified.']}], 'duration': 511.175, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI1534921.jpg', 'highlights': ['The Perceptron Learning Algorithm updates weight vectors based on misclassified points to achieve correct classification.', 'The learning algorithm involves playing around with parameters wi and threshold to move a line around trying to arrive at a correct classification, illustrated with a visual representation of linearly separable data points.', 'The function for the perceptron hypothesis takes a real number and produces the sign, plus 1 or minus 1, based on the choice of wi and the threshold, which defines one hypothesis versus the other.', 'Introduction of an artificial coordinate x0 as an artificial constant, allowing the 
formula to simplify and standardizing the notation for the perceptron hypothesis.', "The algorithm takes existing training data, such as customers' credit behavior, and updates the weight vector to improve classification, either by adding or subtracting vectors based on misclassifications.", 'The iterations of the Perceptron Learning Algorithm adjust the weight vector to correctly classify the selected point, but each update considers only that point, so other points may become misclassified.']}, {'end': 2686.618, 'segs': [{'end': 2069.907, 'src': 'embed', 'start': 2047.056, 'weight': 3, 'content': [{'end': 2056.081, 'text': 'Well, the good thing about the perceptron learning algorithm is that all you need to do is, for iterations 1, 2, 3, 4, et cetera,', 'start': 2047.056, 'duration': 9.025}, {'end': 2058.944, 'text': 'pick a misclassified point, any one you like.', 'start': 2056.081, 'duration': 2.863}, {'end': 2068.326, 'text': 'And then apply the iteration to it, the iteration we just talked about, which is this one, the top one.', 'start': 2062.101, 'duration': 6.225}, {'end': 2069.907, 'text': "And that's it.", 'start': 2069.527, 'duration': 0.38}], 'summary': 'The perceptron learning algorithm repeatedly picks a misclassified point and applies the update to it.', 'duration': 22.851, 'max_score': 2047.056, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2047056.jpg'}, {'end': 2121.894, 'src': 'embed', 'start': 2091.155, 'weight': 4, 'content': [{'end': 2094.056, 'text': "It's a linear model, and this is your algorithm.", 'start': 2091.155, 'duration': 2.901}, {'end': 2100.28, 'text': 'All you need to do is be very patient, because one, two, three, four, this is really long.', 'start': 2094.775, 'duration': 5.505}, {'end': 2101.5, 'text': 'At times, it can be very long.', 'start': 2100.36, 'duration': 1.14}, {'end': 2106.023, 'text': "But that's the promise, as long as the data is linearly separable.", 
'start': 2101.981, 'duration': 4.042}, {'end': 2108.765, 'text': 'So now we have one learning model.', 'start': 2107.024, 'duration': 1.741}, {'end': 2116.37, 'text': 'And if I give you now credit data from a bank, previous customers and their credit behavior,', 'start': 2109.225, 'duration': 7.145}, {'end': 2121.894, 'text': 'you can actually run the perceptron learning algorithm and come up with a final hypothesis g that you can hand to the bank.', 'start': 2116.37, 'duration': 5.524}], 'summary': 'The perceptron is a linear model; the algorithm may take a long time, but it is guaranteed to converge as long as the data is linearly separable.', 'duration': 30.739, 'max_score': 2091.155, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2091155.jpg'}, {'end': 2363.459, 'src': 'embed', 'start': 2337.083, 'weight': 0, 'content': [{'end': 2343.187, 'text': 'Anytime you have the data that is given to you with the output explicitly given,', 'start': 2337.083, 'duration': 6.104}, {'end': 2346.269, 'text': 'here is the user and movie and here is the rating.', 'start': 2343.187, 'duration': 3.082}, {'end': 2350.611, 'text': 'Here is the previous customer, and here is their credit behavior.', 'start': 2347.669, 'duration': 2.942}, {'end': 2356.034, 'text': "It's as if a supervisor is helping you out, in order to be able to classify the future ones.", 'start': 2351.111, 'duration': 4.923}, {'end': 2357.615, 'text': "That's why it's called supervised.", 'start': 2356.334, 'duration': 1.281}, {'end': 2363.459, 'text': "So let's take an example of coin recognition, just to be able to contrast it with unsupervised learning in a moment.", 'start': 2358.236, 'duration': 5.223}], 'summary': 'Supervised learning uses labeled data to predict future outcomes. 
It uses data such as previous customers and their credit behavior for classification.', 'duration': 26.376, 'max_score': 2337.083, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2337083.jpg'}, {'end': 2482.921, 'src': 'embed', 'start': 2456.129, 'weight': 1, 'content': [{'end': 2466.011, 'text': 'For unsupervised learning, instead of having the examples, the training data, having this form, which is the input plus the correct target,', 'start': 2456.129, 'duration': 9.882}, {'end': 2471.533, 'text': 'the correct output, the customer and how they behaved in reality in credit.', 'start': 2466.011, 'duration': 5.522}, {'end': 2477.374, 'text': 'We are going to have examples that have less information, so much less that it is laughable.', 'start': 2472.553, 'duration': 4.821}, {'end': 2480.575, 'text': 'I am just going to tell you', 'start': 2479.575, 'duration': 1}, {'end': 2482.921, 'text': 'what the input is.', 'start': 2481.96, 'duration': 0.961}], 'summary': 'In unsupervised learning, the examples carry only the input, without the correct output.', 'duration': 26.792, 'max_score': 2456.129, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2456129.jpg'}, {'end': 2652.988, 'src': 'embed', 'start': 2624.077, 'weight': 2, 'content': [{'end': 2629.48, 'text': 'And this is why a set like that, which looks like a complete jungle, is actually useful.', 'start': 2624.077, 'duration': 5.403}, {'end': 2632.983, 'text': 'Let me give you another interesting example of unsupervised learning,', 'start': 2630.101, 'duration': 2.882}, {'end': 2636.865, 'text': 'where I give you the input without the output and you are actually in a better situation to learn.', 'start': 2632.983, 'duration': 3.882}, {'end': 2644.13, 'text': "Let's say that your company, or your school in this case, is sending you for a semester in Rio de Janeiro.", 'start': 2638.366, 'duration': 5.764}, {'end': 2652.988, 'text': "So you're 
very excited, and you decide that you'd better learn some Portuguese in order to be able to speak the language when you arrive.", 'start': 2645.865, 'duration': 7.123}], 'summary': 'Unsupervised learning can be useful, like learning Portuguese for a trip to Rio de Janeiro.', 'duration': 28.911, 'max_score': 2624.077, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2624077.jpg'}], 'start': 2047.056, 'title': 'Learning in data science', 'summary': 'Covers the perceptron learning algorithm, its application to credit data, types of learning (supervised, unsupervised, reinforcement), and the significance of labeled and unlabeled data, providing examples of coin recognition and language learning.', 'chapters': [{'end': 2137.585, 'start': 2047.056, 'title': 'Perceptron learning algorithm', 'summary': 'The chapter discusses the perceptron learning algorithm, emphasizing its simplicity and promise of finding a correct solution if the data is linearly separable, and it also highlights the application of the algorithm to credit data from a bank.', 'duration': 90.529, 'highlights': ['The perceptron learning algorithm involves iterating through misclassified points and applying a specific iteration to them, eventually leading to a correct solution if the data was originally linearly separable.', 'The algorithm provides the simplest possible learning model, a linear model, which requires patience due to the potential length of the process.', 'The application of the algorithm to credit data from a bank enables the creation of a classification model based on historical records, though its accuracy for predicting future customer behaviors remains uncertain.']}, {'end': 2686.618, 'start': 2137.945, 'title': 'Types of learning in data science', 'summary': 'The chapter discusses the common premise of learning, types of learning (supervised, unsupervised, reinforcement), and provides examples of coin recognition and language learning, emphasizing the 
significance of labeled and unlabeled data and the potential of unsupervised learning.', 'duration': 548.673, 'highlights': ["Supervised learning is when data is given with the output explicitly provided, such as a customer's credit behavior, and is utilized to train a system to classify future data. Supervised learning involves data with explicit output, enabling the system to classify future data, such as in customer credit behavior.", 'Unsupervised learning involves examples with less information, where only the input is given without the target function, leading to the need to predict the output using clusters and ambiguous boundaries. Unsupervised learning deals with less information, requiring the prediction of output using clusters and ambiguous boundaries, as seen in the coin recognition and language learning examples.', 'An example of unsupervised learning is language learning through exposure to a radio station in Portuguese, where the meaning of words is learned without explicit instruction. 
An unsupervised learning example is language learning through exposure to a Portuguese radio station, illustrating the potential of unsupervised learning in language acquisition.']}], 'duration': 639.562, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2047056.jpg', 'highlights': ['Supervised learning involves data with explicit output, enabling the system to classify future data, such as in customer credit behavior.', 'Unsupervised learning involves examples with less information, where only the input is given without the correct output, leading to the need to predict the output using clusters and ambiguous boundaries.', 'An example of unsupervised learning is language learning through exposure to a radio station in Portuguese, where the meaning of words is learned without explicit instruction.', 'The perceptron learning algorithm involves iterating through misclassified points and applying a specific iteration to them, eventually leading to a correct solution if the data was originally linearly separable.', 'The algorithm provides the simplest possible learning model, a linear model, which requires patience due to the potential length of the process.', 'The application of the algorithm to credit data from a bank enables the creation of a classification model based on historical records, though its accuracy for predicting future customer behaviors remains uncertain.']}, {'end': 3363.27, 'segs': [{'end': 2731.082, 'src': 'embed', 'start': 2705.117, 'weight': 1, 'content': [{'end': 2712.178, 'text': 'So you can think of unsupervised learning in one way or another as a way of getting a higher level representation of the input,', 'start': 2705.117, 'duration': 7.061}, {'end': 2714.759, 'text': "whether it's extremely high level, as in clusters.", 'start': 2712.178, 'duration': 2.581}, {'end': 2718.6, 'text': 'you forgot all the attributes and you just tell me a label or higher level,', 'start': 2714.759, 
'duration': 3.841}, {'end': 2724.261, 'text': 'as in this a better representation than just the crude input into some model in your mind.', 'start': 2718.6, 'duration': 5.661}, {'end': 2731.082, 'text': "Now let's talk about reinforcement learning.", 'start': 2729.322, 'duration': 1.76}], 'summary': 'Unsupervised learning aims to achieve a higher level representation of input data, such as through clustering, enabling a better understanding of the input for model processing.', 'duration': 25.965, 'max_score': 2705.117, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2705117.jpg'}, {'end': 2812.3, 'src': 'embed', 'start': 2784.398, 'weight': 0, 'content': [{'end': 2787.78, 'text': "But when you choose an output, I'm going to tell you how well you are doing.", 'start': 2784.398, 'duration': 3.382}, {'end': 2792.966, 'text': 'Reinforcement learning is interesting because it is mostly our own experience in learning.', 'start': 2788.945, 'duration': 4.021}, {'end': 2797.408, 'text': 'Think of a toddler and a hot cup of tea in front of her.', 'start': 2793.466, 'duration': 3.942}, {'end': 2799.849, 'text': "She's looking at it, and she's very curious.", 'start': 2798.108, 'duration': 1.741}, {'end': 2804.23, 'text': 'So she reaches to touch, and she starts crying.', 'start': 2800.709, 'duration': 3.521}, {'end': 2806.911, 'text': 'The reward is very negative for trying.', 'start': 2804.77, 'duration': 2.141}, {'end': 2812.3, 'text': "Now, next time she looks at it, and she remembers the previous experience, and she doesn't touch it.", 'start': 2807.836, 'duration': 4.464}], 'summary': 'Reinforcement learning: toddler learns not to touch hot cup after negative experience.', 'duration': 27.902, 'max_score': 2784.398, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2784398.jpg'}, {'end': 2913.141, 'src': 'embed', 'start': 2883.665, 'weight': 3, 'content': 
[{'end': 2885.846, 'text': 'So now reinforcement learning comes in handy.', 'start': 2883.665, 'duration': 2.181}, {'end': 2890.729, 'text': 'What you are going to do, you are going to have the computer choose any output.', 'start': 2886.566, 'duration': 4.163}, {'end': 2892.891, 'text': 'A crazy move, for all you care.', 'start': 2891.17, 'duration': 1.721}, {'end': 2896.033, 'text': 'And then see what happens eventually.', 'start': 2893.931, 'duration': 2.102}, {'end': 2898.955, 'text': 'So this computer is playing against another computer.', 'start': 2896.133, 'duration': 2.822}, {'end': 2899.975, 'text': 'Both of them want to learn.', 'start': 2899.015, 'duration': 0.96}, {'end': 2904.317, 'text': 'And you make a move, and eventually you win or lose.', 'start': 2901.336, 'duration': 2.981}, {'end': 2910.92, 'text': 'So you propagate back the credit because of winning or losing, according to a very specific and sophisticated formula,', 'start': 2904.697, 'duration': 6.223}, {'end': 2913.141, 'text': 'into all the moves that happened.', 'start': 2910.92, 'duration': 2.221}], 'summary': 'Reinforcement learning involves computers making moves, learning from outcomes, and propagating credit based on a specific formula.', 'duration': 29.476, 'max_score': 2883.665, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2883665.jpg'}, {'end': 3045.77, 'src': 'embed', 'start': 3024.067, 'weight': 4, 'content': [{'end': 3032.396, 'text': 'And that supervised learning problem will give you a training set, some points mapped to plus 1, some points mapped to minus 1.', 'start': 3024.067, 'duration': 8.329}, {'end': 3034.939, 'text': "And then I'm going to give you a test point that is unlabeled.", 'start': 3032.396, 'duration': 2.543}, {'end': 3045.77, 'text': 'Your task is to look at the examples, learn the target function, apply it to the test point, and then decide what the value of the function is.', 'start': 3035.66, 
'duration': 10.11}], 'summary': 'Supervised learning with labeled training set to predict test point value.', 'duration': 21.703, 'max_score': 3024.067, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3024067.jpg'}, {'end': 3326.764, 'src': 'embed', 'start': 3266.104, 'weight': 5, 'content': [{'end': 3273.007, 'text': 'How in the world am I going to tell what the learning outside is? That sounds about right.', 'start': 3266.104, 'duration': 6.903}, {'end': 3276.729, 'text': "But we are in trouble, because that's the premise of learning.", 'start': 3273.868, 'duration': 2.861}, {'end': 3283.292, 'text': 'If the goal was to memorize the examples I gave you, that would be memorizing, not learning.', 'start': 3277.249, 'duration': 6.043}, {'end': 3287.254, 'text': 'Learning is to figure out a pattern that applies outside.', 'start': 3283.832, 'duration': 3.422}, {'end': 3291.436, 'text': 'And now we realize that outside, I cannot say anything.', 'start': 3288.074, 'duration': 3.362}, {'end': 3301.146, 'text': 'Does this mean that learning is doomed? 
Well, this is going to be a very short course.', 'start': 3293.397, 'duration': 7.749}, {'end': 3304.771, 'text': 'The good news is that learning is alive and well.', 'start': 3301.226, 'duration': 3.545}, {'end': 3311.862, 'text': 'And we are going to show that without compromising our basic premise.', 'start': 3306.534, 'duration': 5.328}, {'end': 3317.396, 'text': 'The target function will continue to be unknown.', 'start': 3313.493, 'duration': 3.903}, {'end': 3320.739, 'text': 'And we still mean unknown.', 'start': 3318.437, 'duration': 2.302}, {'end': 3323.301, 'text': 'And we will be able to learn.', 'start': 3321.54, 'duration': 1.761}, {'end': 3326.764, 'text': 'And that will be the subject of the next lecture.', 'start': 3324.462, 'duration': 2.302}], 'summary': 'Learning is about figuring out patterns that apply outside, even if the target function remains unknown.', 'duration': 60.66, 'max_score': 3266.104, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3266104.jpg'}], 'start': 2687.299, 'title': 'Language and reinforcement learning', 'summary': "Discusses language modeling, unsupervised learning, and reinforcement learning principles based on a toddler's experience. 
it also explores reinforcement learning using backgammon as an example and the challenges of supervised learning in a puzzle scenario.", 'chapters': [{'end': 2830.774, 'start': 2687.299, 'title': 'Language learning and reinforcement', 'summary': "Discusses the process of developing a model of language in the mind, the concept of unsupervised learning for higher level representation, and the principles of reinforcement learning based on a toddler's experience, illustrating the process of curiosity and reward.", 'duration': 143.475, 'highlights': ["Reinforcement learning is based on our own experience, similar to a toddler's curiosity and learning from the consequences of touching a hot cup of tea.", 'Unsupervised learning provides a higher level representation of input, analogous to developing a model of language in the mind.', 'Learning a language involves developing a model in the mind, being eager to learn, and fixing the learned language in the mind for faster language acquisition.', 'Unsupervised learning aids in getting a higher level representation of input, which can be extremely high level, such as clusters, or a better representation than the crude input into a model in the mind.', "The process of reinforcement learning is illustrated through a toddler's experience of curiosity, touching a hot cup of tea, and learning from the consequences, leading to the concept of curiosity and reward in learning."]}, {'end': 2948.413, 'start': 2830.774, 'title': 'Reinforcement learning in backgammon', 'summary': 'Explains the concept of reinforcement learning, using the example of backgammon to illustrate how a computer can learn to play the game by making random moves and then propagating back the credit based on winning or losing, eventually leading to the development of a backgammon champion in a few days of cpu time.', 'duration': 117.639, 'highlights': ['The process of reinforcement learning is illustrated using the example of backgammon, where the computer makes 
random moves and then propagates back the credit based on winning or losing, ultimately leading to the development of a backgammon champion in a few days of CPU time.', 'Reinforcement learning is described as a key application in playing games, with backgammon being highlighted as an important game for the system to learn.']}, {'end': 3363.27, 'start': 2950.014, 'title': 'Machine learning learning puzzle', 'summary': 'Discusses the concept of supervised learning, presenting a learning puzzle where the target function is unknown, and demonstrates the challenges and potential solutions to learning outside the provided training set.', 'duration': 413.256, 'highlights': ['The chapter presents a learning puzzle where the target function is unknown, and the task is to learn the function and apply it to a test point, resulting in the audience providing varied responses indicating the challenges of learning an unknown function. learning puzzle, target function is unknown, varied responses', 'The speaker emphasizes the difficulty of learning outside the provided training set, as there are infinite functions that fit the given points and behave differently outside, illustrating the challenges of generalization in learning. difficulty of learning outside the training set, infinite functions, challenges of generalization', 'The chapter concludes with the assertion that learning is possible even with an unknown target function, hinting at potential solutions to learning in such scenarios. 
learning is possible with an unknown target function, potential solutions to learning']}], 'duration': 675.971, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI2687299.jpg', 'highlights': ["Reinforcement learning is based on our own experience, similar to a toddler's curiosity and learning from consequences.", 'Unsupervised learning provides a higher level representation of input, analogous to developing a model of language in the mind.', "The process of reinforcement learning is illustrated through a toddler's experience of curiosity, touching a hot cup of tea, and learning from the consequences.", 'The process of reinforcement learning is illustrated using the example of backgammon, where the computer makes random moves and then propagates back the credit based on winning or losing.', 'The chapter presents a learning puzzle where the target function is unknown, and the task is to learn the function and apply it to a test point, resulting in the audience providing varied responses indicating the challenges of learning an unknown function.', 'The speaker emphasizes the difficulty of learning outside the provided training set, as there are infinite functions that fit the given points and behave differently outside, illustrating the challenges of generalization in learning.', 'The chapter concludes with the assertion that learning is possible even with an unknown target function, hinting at potential solutions to learning in such scenarios.', 'Unsupervised learning aids in getting a higher level representation of input, which can be extremely high level, such as clusters, or a better representation than the crude input into a model in the mind.']}, {'end': 3999.585, 'segs': [{'end': 3470.125, 'src': 'embed', 'start': 3449.52, 'weight': 0, 'content': [{'end': 3461.983, 'text': 'However, if you apply the perceptron learning algorithm that is guaranteed to converge to a correct solution in the case of linear 
separability and you apply it to data that is not linearly separable,', 'start': 3449.52, 'duration': 12.463}, {'end': 3462.943, 'text': 'bad things happen.', 'start': 3461.983, 'duration': 0.96}, {'end': 3470.125, 'text': "Not only is it going not to converge, obviously it's not going to converge, because it terminates when there are no misclassified points.", 'start': 3463.563, 'duration': 6.562}], 'summary': 'Perceptron learning algorithm guarantees convergence to a correct solution in the case of linear separability, but fails to converge when applied to non-linearly separable data.', 'duration': 20.605, 'max_score': 3449.52, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3449520.jpg'}, {'end': 3528.46, 'src': 'embed', 'start': 3506.501, 'weight': 1, 'content': [{'end': 3514.628, 'text': "There's also a question of, how does the rate of convergence of the perceptron change with the dimensionality of the data? Badly.", 'start': 3506.501, 'duration': 8.127}, {'end': 3518.451, 'text': "That's the answer.", 'start': 3515.949, 'duration': 2.502}, {'end': 3521.334, 'text': 'You can build pathological cases where it really will take forever.', 'start': 3518.491, 'duration': 2.843}, {'end': 3528.46, 'text': 'However, I did not give the perceptron learning algorithm in the first lecture to tell you that this is the great algorithm that you need to learn.', 'start': 3521.954, 'duration': 6.506}], 'summary': "The perceptron's convergence rate worsens with data dimensionality, leading to potential infinite learning times.", 'duration': 21.959, 'max_score': 3506.501, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3506501.jpg'}, {'end': 3594.076, 'src': 'embed', 'start': 3569.465, 'weight': 5, 'content': [{'end': 3576.791, 'text': 'it will become very clear that there is a separation between the target function there is a pattern to detect and whether we can 
learn it.', 'start': 3569.465, 'duration': 7.326}, {'end': 3579.253, 'text': "It's very difficult for me to explain it in two minutes.", 'start': 3577.011, 'duration': 2.242}, {'end': 3581.054, 'text': 'It will take a full lecture to get there.', 'start': 3579.553, 'duration': 1.501}, {'end': 3585.678, 'text': 'But the essence of it is that you take the data.', 'start': 3581.575, 'duration': 4.103}, {'end': 3594.076, 'text': 'You apply your learning algorithm, and there are some things you can explicitly detect that will tell you whether you learned or not.', 'start': 3586.559, 'duration': 7.517}], 'summary': 'Separation between target function and learning can be explicitly detected.', 'duration': 24.611, 'max_score': 3569.465, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3569465.jpg'}, {'end': 3750.347, 'src': 'embed', 'start': 3724.357, 'weight': 2, 'content': [{'end': 3730.322, 'text': 'When you study linear regression under statistics, there is a lot of mathematics that goes with it, a lot of assumptions,', 'start': 3724.357, 'duration': 5.965}, {'end': 3731.863, 'text': 'because that is the purpose of the goal.', 'start': 3730.322, 'duration': 1.541}, {'end': 3739.505, 'text': 'In general, machine learning tries to make the least assumptions cover the most territory.', 'start': 3732.763, 'duration': 6.742}, {'end': 3741.185, 'text': 'These go together.', 'start': 3740.485, 'duration': 0.7}, {'end': 3746.506, 'text': "So it is not a mathematical discipline, but it's not a purely applied discipline.", 'start': 3742.185, 'duration': 4.321}, {'end': 3750.347, 'text': 'It spans both the mathematical to a certain extent,', 'start': 3746.966, 'duration': 3.381}], 'summary': 'Linear regression in statistics involves a lot of mathematics and assumptions, while machine learning aims to minimize assumptions and cover a broad area.', 'duration': 25.99, 'max_score': 3724.357, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3724357.jpg'}, {'end': 3797.174, 'src': 'embed', 'start': 3768.174, 'weight': 3, 'content': [{'end': 3771.676, 'text': 'Data mining has a huge intersection with machine learning.', 'start': 3768.174, 'duration': 3.502}, {'end': 3775.418, 'text': 'There are lots of disciplines around that actually share some value.', 'start': 3772.136, 'duration': 3.282}, {'end': 3781.983, 'text': "But the point is, the premise that you saw is so broad that it shouldn't be surprising that people, at different times,", 'start': 3776.118, 'duration': 5.865}, {'end': 3786.066, 'text': 'develop a particular discipline with its own jargons to deal with that discipline.', 'start': 3781.983, 'duration': 4.083}, {'end': 3797.174, 'text': "So what I'm giving you are machine learning as the mainstream goes and that can be applied as widely as possible to applications,", 'start': 3786.926, 'duration': 10.248}], 'summary': 'Data mining intersects with machine learning, with broad applications.', 'duration': 29, 'max_score': 3768.174, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3768174.jpg'}, {'end': 3867.749, 'src': 'embed', 'start': 3839.442, 'weight': 4, 'content': [{'end': 3841.964, 'text': 'For example, in support vector machines, it would be quadratic programming.', 'start': 3839.442, 'duration': 2.522}, {'end': 3843.765, 'text': 'It happens to be the one that works with that.', 'start': 3842.164, 'duration': 1.601}, {'end': 3849.449, 'text': 'But optimization is not something that machine learning people study for its own sake.', 'start': 3844.125, 'duration': 5.324}, {'end': 3854.434, 'text': 'They obviously studied to understand it better, and to choose the correct optimization method.', 'start': 3850.109, 'duration': 4.325}, {'end': 3855.735, 'text': 'Now problems.', 'start': 3854.955, 'duration': 0.78}, {'end': 3859.54, 'text': 'the 
question is alluding to something that will become clear when we talk about neural networks,', 'start': 3855.735, 'duration': 3.805}, {'end': 3862.443, 'text': 'which is local minimum versus global minimum and whatnot.', 'start': 3859.54, 'duration': 2.903}, {'end': 3867.749, 'text': 'And it is impossible to put this in any perspective before we get the details of neural networks.', 'start': 3862.984, 'duration': 4.765}], 'summary': 'Machine learning involves studying optimization methods to understand and choose the correct approach, with a focus on local versus global minimum in neural networks.', 'duration': 28.307, 'max_score': 3839.442, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3839442.jpg'}, {'end': 3986.618, 'src': 'embed', 'start': 3960.061, 'weight': 6, 'content': [{'end': 3965.545, 'text': 'And every time someone asks a question, the lecture number comes to my mind.', 'start': 3960.061, 'duration': 5.484}, {'end': 3966.926, 'text': "I know when I'm going to talk about it.", 'start': 3965.605, 'duration': 1.321}, {'end': 3969.867, 'text': 'So what you described is called sampling bias.', 'start': 3967.646, 'duration': 2.221}, {'end': 3971.829, 'text': 'And I will describe it in detail.', 'start': 3970.548, 'duration': 1.281}, {'end': 3978.093, 'text': "But when you use the biased data, let's say the bank uses historical records.", 'start': 3972.289, 'duration': 5.804}, {'end': 3980.854, 'text': 'So it sees the people who applied and were accepted.', 'start': 3978.533, 'duration': 2.321}, {'end': 3986.618, 'text': 'And for those guys, it can actually predict what the credit behavior is, because it has their credit history.', 'start': 3981.275, 'duration': 5.343}], 'summary': 'Sampling bias occurs when using biased historical data for prediction.', 'duration': 26.557, 'max_score': 3960.061, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3960061.jpg'}], 'start': 3363.33, 'title': 'Limitations and relationship', 'summary': 'Discusses the limitations of the perceptron learning algorithm, including its inability to handle non-linearly separable data and the impact of dimensionality on convergence. It also explores the relationship between machine learning and statistics, highlighting differences in assumptions and applicability in various fields.', 'chapters': [{'end': 3679.895, 'start': 3363.33, 'title': 'Perceptron learning algorithm', 'summary': 'Discusses the limitations of the perceptron learning algorithm, including its inability to handle non-linearly separable data, and the impact of dimensionality on convergence, while emphasizing the importance of avoiding reliance on visual inspection of data.', 'duration': 316.565, 'highlights': ['The perceptron learning algorithm is limited in handling non-linearly separable data, leading to convergence issues and potential degradation in solution quality. The perceptron learning algorithm is guaranteed to converge to a correct solution in the case of linear separability, but if applied to non-linearly separable data, it leads to convergence issues and can move from a very good solution to a terrible solution.', 'The dimensionality of the data significantly impacts the rate of convergence of the perceptron learning algorithm, with cases where convergence may take an exceedingly long time. The rate of convergence of the perceptron changes badly with the dimensionality of the data, and there can be pathological cases where it really will take forever for the algorithm to converge.', 'Emphasizes the importance of avoiding visual inspection of data and relying on learning algorithms to determine the presence of patterns. 
The chapter stresses the importance of not visually inspecting data to determine linear separability, and instead emphasizes the reliance on learning algorithms to detect patterns in the data.']}, {'end': 3999.585, 'start': 3680.135, 'title': 'Machine learning and statistics', 'summary': 'Discusses the relationship between machine learning and statistics, emphasizing the differences in assumptions and mathematical rigor, and the broad applicability of machine learning in various fields such as data mining and computational learning.', 'duration': 319.45, 'highlights': ['Machine learning aims to make the least assumptions and cover the most territory, contrasting with the mathematical rigor and numerous assumptions in statistics, making it applicable to practical and scientific domains.', 'The intersection of machine learning with data mining and other disciplines, and the broadness of its premise, leading to the development of specific disciplines with their own jargon to deal with the diversity of applications.', 'Discussion about the use of optimization methods in machine learning, where specific methods such as quadratic programming are chosen based on their suitability for the task at hand.',
'Explanation of sampling bias in data collection and the challenges associated with making biased decisions when using historical records for credit approval, especially for rejected applicants.']}], 'duration': 636.255, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI3363330.jpg', 'highlights': ['The perceptron learning algorithm is limited in handling non-linearly separable data, leading to convergence issues and potential degradation in solution quality.', 'The dimensionality of the data significantly impacts the rate of convergence of the perceptron learning algorithm, with cases where convergence may take an exceedingly long time.', 'Machine learning aims to make the least assumptions and cover the most territory, contrasting with the mathematical rigor and numerous assumptions in statistics, making it applicable to practical and scientific domains.', 'The intersection of machine learning with data mining and other disciplines, and the broadness of its premise, leading to the development of specific disciplines with their own jargon to deal with the diversity of applications.', 'Discussion about the use of optimization methods in machine learning, where specific methods such as quadratic programming are chosen based on their suitability for the task at hand.', 'Emphasizes the importance of avoiding visual inspection of data and relying on learning algorithms to determine the presence of patterns.', 'Explanation of sampling bias in data collection and the challenges associated with making biased decisions when using historical 
records for credit approval, especially for rejected applicants.']}, {'end': 4873.047, 'segs': [{'end': 4050.944, 'src': 'embed', 'start': 4021.142, 'weight': 2, 'content': [{'end': 4023.664, 'text': 'The data set in this case is not completely representative.', 'start': 4021.142, 'duration': 2.522}, {'end': 4029.729, 'text': "And there is a particular principle in learning that we'll talk about, which is sampling bias, that deals with this case.", 'start': 4024.084, 'duration': 5.645}, {'end': 4032.257, 'text': 'Another question from here.', 'start': 4031.437, 'duration': 0.82}, {'end': 4037.319, 'text': 'You explained that we need to have a lot of data to learn.', 'start': 4034.338, 'duration': 2.981}, {'end': 4045.982, 'text': 'So how do you decide how much amount of data that is required for a particular problem in order to be able to come up with a reasonable?', 'start': 4037.499, 'duration': 8.483}, {'end': 4047.623, 'text': 'Good question.', 'start': 4047.042, 'duration': 0.581}, {'end': 4050.944, 'text': 'So let me tell you the theoretical and the practical answer.', 'start': 4047.883, 'duration': 3.061}], 'summary': 'Discussing sampling bias and determining the amount of data required for learning.', 'duration': 29.802, 'max_score': 4021.142, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI4021142.jpg'}, {'end': 4188.307, 'src': 'embed', 'start': 4150.595, 'weight': 0, 'content': [{'end': 4155.895, 'text': "So the larger the hypothesis set is, probably I'll be able to better fit the data.", 'start': 4150.595, 'duration': 5.3}, {'end': 4163.337, 'text': 'But that, as you were explaining, might be a bad thing to do, because when the new data point comes, there might be trouble.', 'start': 4157.057, 'duration': 6.28}, {'end': 4168.339, 'text': 'So how do you decide the size of your? 
OK, you are asking all the right questions, and all of them are coming up.', 'start': 4163.358, 'duration': 4.981}, {'end': 4169.559, 'text': 'This is, again, part of the theory.', 'start': 4168.359, 'duration': 1.2}, {'end': 4171.08, 'text': 'But let me try to explain this.', 'start': 4169.64, 'duration': 1.44}, {'end': 4174.763, 'text': 'As we mentioned, learning is about being able to predict.', 'start': 4172.401, 'duration': 2.362}, {'end': 4181.05, 'text': 'So you are using the data not to memorize it, but to figure out what the pattern is.', 'start': 4175.504, 'duration': 5.546}, {'end': 4188.307, 'text': "And if you figure out a pattern that applies to all the data, and it's a reasonable pattern, then you have a chance that it will generalize outside.", 'start': 4182.265, 'duration': 6.042}], 'summary': 'A larger hypothesis set may fit the data better, but could cause trouble with new data points. Deciding its size is crucial for generalization.', 'duration': 37.712, 'max_score': 4150.595, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI4150595.jpg'}, {'end': 4447.092, 'src': 'embed', 'start': 4422.415, 'weight': 1, 'content': [{'end': 4428.239, 'text': 'In general, the learning algorithm has the form of minimizing an error function.', 'start': 4422.415, 'duration': 5.824}, {'end': 4430.9, 'text': 'You can think of the perceptron.', 'start': 4429.74, 'duration': 1.16}, {'end': 4434.803, 'text': 'What does the algorithm do? 
It tries to minimize the classification error.', 'start': 4431.181, 'duration': 3.622}, {'end': 4439.806, 'text': 'That is your error function, and you are minimizing it using this particular update rule.', 'start': 4435.484, 'duration': 4.322}, {'end': 4442.949, 'text': "And in other cases, we'll see that we are minimizing an error function.", 'start': 4440.267, 'duration': 2.682}, {'end': 4447.092, 'text': 'Now, the minimization aspect is an optimization question.', 'start': 4443.749, 'duration': 3.343}], 'summary': 'Learning algorithm minimizes error function to optimize performance.', 'duration': 24.677, 'max_score': 4422.415, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI4422415.jpg'}, {'end': 4765.875, 'src': 'embed', 'start': 4735.632, 'weight': 5, 'content': [{'end': 4737.613, 'text': 'Because that would be the more proper context for that.', 'start': 4735.632, 'duration': 1.981}, {'end': 4747.722, 'text': 'Another question is regarding the hypothesis set.', 'start': 4741.617, 'duration': 6.105}, {'end': 4757.809, 'text': 'Are there Bayesian hierarchical procedures to narrow down the hypothesis set?', 'start': 4747.722, 'duration': 10.087}, {'end': 4760.811, 'text': 'The choice of the hypothesis set and the model in general is model selection.', 'start': 4757.929, 'duration': 2.882}, {'end': 4765.875, 'text': 'And there is quite a bit of stuff that we are going to talk about in model selection, when we talk about validation.', 'start': 4760.871, 'duration': 5.004}], 'summary': 'Discussing Bayesian hierarchical procedures for narrowing the hypothesis set in model selection.', 'duration': 30.243, 'max_score': 4735.632, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI4735632.jpg'}, {'end': 4859.19, 'src': 'embed', 'start': 4823.353, 'weight': 3, 'content': [{'end': 4827.515, 'text': 'There are methods of hierarchies and ramifications of it in 
generalization.', 'start': 4823.353, 'duration': 4.162}, {'end': 4830.156, 'text': 'I may touch upon it when I get to support vector machines.', 'start': 4828.015, 'duration': 2.141}, {'end': 4832.757, 'text': 'But again, there is a lot of theory.', 'start': 4830.556, 'duration': 2.201}, {'end': 4838.239, 'text': 'And if you read a book on machine learning written by someone from pure theory,', 'start': 4833.417, 'duration': 4.822}, {'end': 4840.42, 'text': 'you would think that you are reading about a completely different subject.', 'start': 4838.239, 'duration': 2.181}, {'end': 4845.263, 'text': "It's respectable stuff, but different from the other stuff that is practiced.", 'start': 4841.32, 'duration': 3.943}, {'end': 4847.684, 'text': "So one of the things that I'm trying to do.", 'start': 4845.723, 'duration': 1.961}, {'end': 4859.19, 'text': "I'm trying to pick from all the components of machine learning the big picture that gives you the understanding of the concept and the tools to use it in practice.", 'start': 4847.684, 'duration': 11.506}], 'summary': 'The speaker aims to distill practical insights from machine learning theory for a comprehensive understanding and practical application.', 'duration': 35.837, 'max_score': 4823.353, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI4823353.jpg'}], 'start': 4000.265, 'title': 'Machine learning fundamentals', 'summary': 'Explores the impact of data and hypothesis set size on learning, emphasizing the importance of data size for generalization, and the trade-off between hypothesis set size and generalization. 
It also delves into defining points, patterns, mathematical foundations, and the relevance of Bayesian principles in model selection.', 'chapters': [{'end': 4510.139, 'start': 4000.265, 'title': 'Learning and data size in machine learning', 'summary': 'Discusses the impact of data size and hypothesis set size on learning, emphasizing the importance of data size in generalization, and the trade-off between hypothesis set size and generalization, while also addressing the role of learning algorithm and error function in machine learning.', 'duration': 509.874, 'highlights': ['The importance of data size in generalization and the trade-off between hypothesis set size and generalization. The chapter emphasizes the significance of data size in generalization, highlighting the trade-off between hypothesis set size and generalization, where a larger hypothesis set may overfit the data, while a smaller one may underfit, and discusses the impact of data size on performance.', 'The role of learning algorithm and error function in machine learning. The learning algorithm plays a secondary role in determining generalization behavior, while the choice of error function or error measure directly affects the learning algorithm, and will be covered in the topic of error and noise.', 'Sampling bias and the impact of data size on learning. The chapter addresses the principle of sampling bias and its impact on learning, emphasizing the practical lack of control over data size in most cases and the importance of understanding the impact of data size on learning and system performance.']}, {'end': 4873.047, 'start': 4510.959, 'title': 'Machine learning concepts and mathematical foundations', 'summary': 'Delves into the technical ways of defining points, the essence of machine learning, the importance of patterns, mathematical definitions for learning, and the relevance of Bayesian principles in model selection.', 'duration': 362.088, 'highlights': ['The chapter explores the technical ways 
of defining points and their ramifications for machine learning, emphasizing the necessity of patterns and mathematical definitions for learning. Emphasis on technical ways of defining points and their ramifications for machine learning, necessity of patterns and mathematical definitions for learning.', 'It discusses the essence of machine learning, emphasizing the importance of patterns and the presence of data for effective learning. Emphasis on the essence of machine learning, importance of patterns and presence of data for effective learning.', 'The chapter also touches upon the relevance of Bayesian principles in model selection, highlighting the different schools of thought in dealing with the subject. Relevance of Bayesian principles in model selection, different schools of thought in dealing with the subject.']}], 'duration': 872.782, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mbyG85GZ0PI/pics/mbyG85GZ0PI4000265.jpg', 'highlights': ['The importance of data size in generalization and the trade-off between hypothesis set size and generalization', 'The role of learning algorithm and error function in machine learning', 'Sampling bias and the impact of data size on learning', 'The chapter explores the technical ways of defining points and their ramifications for machine learning', 'It discusses the essence of machine learning, emphasizing the importance of patterns and the presence of data for effective learning', 'The chapter also touches upon the relevance of Bayesian principles in model selection']}], 'highlights': ['The payout for achieving a 10% improvement in the in-house system by Netflix was $1 million, showcasing the substantial value attributed to performance enhancements in machine learning applications.', 'The crucial role of data in machine learning applications is emphasized, with the speaker highlighting the necessity of data for successful machine learning implementation and its direct correlation to the 
potential for financial gains.', 'The importance of machine learning in a wide spectrum of applications is underscored, with a specific example of its application in financial forecasting that can lead to significant financial gains even with small improvements in performance.', 'The learning process involves reverse engineering the rating by starting with random factors and gradually adjusting them based on actual ratings, eventually resulting in meaningful factors that produce consistent ratings.', 'The machine learning approach for movie rating involves using a vector of factors for viewers and movies, with the aim of automating the rating process.', "The bank relies on historical records of previous customers' credit behavior to reverse engineer a system for credit approval, aiming to predict whether to extend credit or not based on the customer's application information and previous credit behavior.", 'The learning algorithm produces the final hypothesis, which is the formula for credit approval, aiming to approximate the target function.', 'The hypothesis set, representing a set of candidate formulas, plays a pivotal role in the theory of learning, determining learning capabilities and outcomes.', 'The Perceptron Learning Algorithm updates weight vectors based on misclassified points to achieve correct classification.', 'Supervised learning involves data with explicit output, enabling the system to classify future data, such as in customer credit behavior.', 'Unsupervised learning involves examples with less information, where only the input is given without the target function, leading to the need to predict the output using clusters and ambiguous boundaries.', "Reinforcement learning is based on our own experience, similar to a toddler's curiosity and learning from consequences.", 'The perceptron learning algorithm is limited in handling non-linearly separable data, leading to convergence issues and potential degradation in solution quality.', 'The 
dimensionality of the data significantly impacts the rate of convergence of the perceptron learning algorithm, with cases where convergence may take an exceedingly long time.', 'Machine learning aims to make the least assumptions and cover the most territory, contrasting with the mathematical rigor and numerous assumptions in statistics, making it applicable to practical and scientific domains.', 'The importance of data size in generalization and the trade-off between hypothesis set size and generalization', 'The role of learning algorithm and error function in machine learning', 'The chapter explores the technical ways of defining points and their ramifications for machine learning', 'It discusses the essence of machine learning, emphasizing the importance of patterns and the presence of data for effective learning', 'The chapter also touches upon the relevance of Bayesian principles in model selection']}