title
Scikit Learn Machine Learning SVM Tutorial with Python p. 2 - Example

description
In this machine learning tutorial, we cover a very basic, yet powerful, example of machine learning for image recognition. The point of this video is to get you familiar with machine learning in Python with sklearn, but also to show you that the actual machine learning part is the easy part. The real hard part is everything else: getting data, organizing data, labeling data, scaling data, and more.
Playlist link: https://www.youtube.com/watch?v=URTZ2jKCgBc&list=PLQVvvaa0QuDd0flgGphKCej-9jp-QdzZ3&index=2
Sample code: http://pythonprogramming.net
http://seaofbtc.com
http://sentdex.com
http://hkinsley.com
https://twitter.com/sentdex
Bitcoin donations: 1GV7srgR4NJx4vrk7avCmmVQQrqmv87ty6
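
For reference, the digits example covered in the video can be sketched in a few lines of scikit-learn. This is a minimal sketch, not the video's exact code: the values gamma=0.001 and C=100, the 25% held-out split, and random_state=0 are illustrative assumptions. It trains svm.SVC on every digit except the last, predicts the held-back one, and then compares two gamma values on a held-out split to show that gamma matters.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Bundled 8x8 handwritten digits: 1,797 samples, 64 pixel features each, labels 0-9.
digits = datasets.load_digits()

# Train on every sample except the last one, which is held back for a test prediction.
x, y = digits.data[:-1], digits.target[:-1]

# gamma and C are illustrative (assumed) values, not tuned ones.
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(x, y)

# predict() expects a 2-D array, so slice with [-1:] to keep a one-row matrix.
prediction = clf.predict(digits.data[-1:])
print("Prediction:", prediction[0], "actual:", digits.target[-1])

# To see that gamma matters, score two settings on a held-out 25% split.
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)
scores = {}
for gamma in (0.001, 0.1):
    model = svm.SVC(gamma=gamma, C=100).fit(x_train, y_train)
    scores[gamma] = model.score(x_test, y_test)  # mean accuracy on the test split
    print(f"gamma={gamma}: held-out accuracy {scores[gamma]:.3f}")
```

To look at the held-back digit the way the video does, import matplotlib.pyplot as plt and call plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation="nearest") followed by plt.show(). A large gamma makes the RBF kernel so narrow that the model effectively memorizes its training points, which is one way to see the overfitting discussed in the transcript. Recent scikit-learn versions default to gamma="scale" when gamma is not passed.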

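The description lists scaling data among the hard parts. As a minimal sketch of what that means in practice, the block below rescales a small, made-up table of house features (price, square footage, bedrooms; all numbers are hypothetical) in two ways: StandardScaler gives each column mean 0 and standard deviation 1, and MinMaxScaler with feature_range=(-1, 1) maps each column's minimum to -1 and maximum to +1. The choice of these two scalers is an assumption; the transcript here does not name a specific method.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical rows: [price in dollars, square footage, number of bedrooms].
houses = np.array([
    [10_000.0, 600.0, 1.0],
    [85_000.0, 1_400.0, 3.0],
    [200_000.0, 2_600.0, 5.0],
])

# Standardize: subtract each column's mean and divide by its standard deviation.
standardized = StandardScaler().fit_transform(houses)

# Min-max scale: map each column's minimum to -1 and its maximum to +1.
to_unit_range = MinMaxScaler(feature_range=(-1, 1)).fit_transform(houses)

print(standardized.round(2))
print(to_unit_range.round(2))
```

In a real pipeline, fit the scaler on the training data only and then apply its transform method to the test data, so that no information from the test set leaks into the scaling.
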
detail
{'title': 'Scikit Learn Machine Learning SVM Tutorial with Python p. 2 - Example', 'heatmap': [{'end': 814.399, 'start': 778.065, 'weight': 1}], 'summary': "Covers a Python ML tutorial with scikit-learn, discussing the Iris dataset, SVM in machine learning, recognizing and classifying digits, and testing with the digits data. It emphasizes the importance of data acquisition and organization, the challenges of applying machine learning in the real world, and the concept of SVM using a simple visual demonstration. It also explains the process of recognizing and classifying digits using machine learning, including the importance of data normalization and the setup of a classifier using scikit-learn's svm.SVC. Additionally, it covers fitting a model, running predictions with varying gamma values, observing accuracy, gradient descent in machine learning, and the training process, highlighting the use of Python for quick prototyping.", 'chapters': [{'end': 82.59, 'segs': [{'end': 32.095, 'src': 'embed', 'start': 2.998, 'weight': 0, 'content': [{'end': 9.339, 'text': "What's going on, everybody? 
Welcome to another Python programming tutorial with machine learning using scikit-learn.", 'start': 2.998, 'duration': 6.341}, {'end': 17.081, 'text': "In this video, what we're going to be doing is covering a real quick example of how simple machine learning actually is.", 'start': 10.239, 'duration': 6.842}, {'end': 22.942, 'text': 'And just to illustrate that the bulk of your task is actually not the machine learning.', 'start': 17.841, 'duration': 5.101}, {'end': 26.483, 'text': 'it is the acquisition and structuring and organization of data.', 'start': 22.942, 'duration': 3.541}, {'end': 32.095, 'text': "But that's not fun, so the first thing we're gonna do is show an example.", 'start': 27.714, 'duration': 4.381}], 'summary': 'Python programming tutorial with machine learning using scikit-learn emphasizing the importance of data acquisition and organization.', 'duration': 29.097, 'max_score': 2.998, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD42998.jpg'}, {'end': 82.59, 'src': 'embed', 'start': 49.799, 'weight': 2, 'content': [{'end': 52.96, 'text': 'So search on my channel and you can find those.', 'start': 49.799, 'duration': 3.161}, {'end': 54.74, 'text': 'They should be on just like the main page even.', 'start': 53, 'duration': 1.74}, {'end': 62.183, 'text': 'So anyway, import matplotlib.pyplot as plt for short.', 'start': 55.2, 'duration': 6.983}, {'end': 69.285, 'text': "The next thing we're going to do is we're going to use the default scikit-learn dataset.", 'start': 63.703, 'duration': 5.582}, {'end': 75.627, 'text': 'So scikit-learn comes with sample datasets that we can actually just use.', 'start': 69.345, 'duration': 6.282}, {'end': 81.789, 'text': "So we're going to go ahead and do from sklearn import datasets.", 'start': 76.167, 'duration': 5.622}, {'end': 82.59, 'text': 'And this has..', 'start': 82.01, 'duration': 0.58}], 'summary': 'Using scikit-learn to import datasets for 
analysis and visualization.', 'duration': 32.791, 'max_score': 49.799, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD449799.jpg'}], 'start': 2.998, 'title': 'Python ML tutorial with scikit-learn', 'summary': 'Discusses the simplicity of machine learning, emphasizing the importance of data acquisition and organization, and demonstrates using scikit-learn for a quick machine learning task.', 'chapters': [{'end': 82.59, 'start': 2.998, 'title': 'Python ML tutorial with scikit-learn', 'summary': 'Discusses the simplicity of machine learning, emphasizing the importance of data acquisition and organization, while demonstrating an example of using scikit-learn for a quick machine learning task.', 'duration': 79.592, 'highlights': ['The bulk of the task in machine learning is the acquisition, structuring, and organization of data.', 'The chapter demonstrates using scikit-learn for a quick machine learning task.', "Importing matplotlib's pyplot and using the built-in scikit-learn datasets are key steps in the example."]}], 'duration': 79.592, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD42998.jpg', 'highlights': ['The bulk of the task in machine learning is the acquisition, structuring, and organization of data.', 'The chapter demonstrates using scikit-learn for a quick machine learning task.', "Importing matplotlib's pyplot and using the built-in scikit-learn datasets are key steps in the example."]}, {'end': 326.239, 'segs': [{'end': 113.501, 'src': 'embed', 'start': 84.932, 'weight': 0, 'content': [{'end': 87.054, 'text': 'You know the Iris datasets.', 'start': 84.932, 'duration': 2.122}, {'end': 90.157, 'text': "It's got numbers and I think it has those.", 'start': 87.054, 'duration': 3.103}, {'end': 98.246, 'text': "Oregon, if you've ever like done any searching of machine learning tutorials like the example that people always use, is,", 'start': 90.157, 
'duration': 8.089}, {'end': 101.169, 'text': "I wanna say it's Oregon like Oregon.", 'start': 98.246, 'duration': 2.923}, {'end': 103.371, 'text': 'house prices, like they did housing prices.', 'start': 101.169, 'duration': 2.202}, {'end': 111.259, 'text': 'And then so the idea was you could use machine learning to come up with a perfect algorithm to price any house based on various variables.', 'start': 103.371, 'duration': 7.888}, {'end': 113.501, 'text': "So I'm pretty sure that one's in there too.", 'start': 111.919, 'duration': 1.582}], 'summary': 'The Iris dataset and the Oregon housing-price example are cited as common datasets in machine learning tutorials.', 'duration': 28.569, 'max_score': 84.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD484932.jpg'}, {'end': 151.738, 'src': 'embed', 'start': 123.57, 'weight': 1, 'content': [{'end': 127.694, 'text': 'you have no idea how to go out into the real world and do machine learning,', 'start': 123.57, 'duration': 4.124}, {'end': 135.421, 'text': 'because you have to get the data somehow and nobody provides you data in the format that you really kind of need it in.', 'start': 127.694, 'duration': 7.727}, {'end': 144.268, 'text': "So anyway, especially if you're going to label it as like well, we're going to be doing stocks and you need to label that stock as buy, buy, buy, buy,", 'start': 135.861, 'duration': 8.407}, {'end': 148.072, 'text': "buy, all the way up to when it's no longer a buy and it's a sell and sell, sell, sell, sell, sell.", 'start': 144.268, 'duration': 3.804}, {'end': 151.738, 'text': 'We need to be able to label our data.', 'start': 148.877, 'duration': 2.861}], 'summary': 'Challenges in obtaining labeled data for machine learning in real-world scenarios, particularly in stock prediction.', 'duration': 28.168, 'max_score': 123.57, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4123570.jpg'}, {'end': 
263.668, 'src': 'embed', 'start': 235.248, 'weight': 3, 'content': [{'end': 239.15, 'text': "SVM's going to come through and be like, there's your groups for you, because you labeled the groups.", 'start': 235.248, 'duration': 3.902}, {'end': 240.87, 'text': "So that's what SVM does for you.", 'start': 239.19, 'duration': 1.68}, {'end': 245.472, 'text': 'Unsupervised learning is basically like this.', 'start': 242.751, 'duration': 2.721}, {'end': 251.035, 'text': "Just for anyone who's curious, remember we had all these orange dots and over here I think we had green.", 'start': 245.472, 'duration': 5.563}, {'end': 258.458, 'text': 'You know, you just would feed the data to the algorithm like this and then the algorithm is expected to just kind of figure out,', 'start': 251.035, 'duration': 7.423}, {'end': 260.238, 'text': 'you know, how to divide these groups up.', 'start': 258.458, 'duration': 1.78}, {'end': 263.668, 'text': "It doesn't really know anything about the groups.", 'start': 261.807, 'duration': 1.861}], 'summary': 'SVM uses labeled data to identify groups, while unsupervised learning divides data without any knowledge of the groups.', 'duration': 28.42, 'max_score': 235.248, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4235248.jpg'}, {'end': 326.239, 'src': 'embed', 'start': 272.594, 'weight': 4, 'content': [{'end': 277.738, 'text': "So we're going to go ahead and import svm and what it's going to do is it's going to divvy up numbers,", 'start': 272.594, 'duration': 5.144}, {'end': 286.203, 'text': "so we're going to use the digits data set and it's going to categorize 0s, 1s, 2s, 3s, 4s, 5s, 6s, 7s, 8s and 9s for us.", 'start': 277.738, 'duration': 8.465}, {'end': 293.307, 'text': "And then we're going to shove a new version through and it's going to have to, based on the previous examples,", 'start': 286.363, 'duration': 6.944}, {'end': 295.929, 'text': 'kind of guess for us what that number is.', 'start': 293.307, 
'duration': 2.622}, {'end': 305.414, 'text': 'So the primary focus of machine learning is kind of like to skip the..', 'start': 296.049, 'duration': 9.365}, {'end': 312.808, 'text': "I don't know, experience-gaining aspect of learning.", 'start': 309.625, 'duration': 3.183}, {'end': 321.014, 'text': 'So anywhere where you think that just kind of like seat time or time in the field would gain a lot of experience for you,', 'start': 312.888, 'duration': 8.126}, {'end': 324.717, 'text': 'you can use machine learning to just gain all that experience for you instantly.', 'start': 321.014, 'duration': 3.703}, {'end': 326.239, 'text': 'So that can help.', 'start': 325.638, 'duration': 0.601}], 'summary': 'Using SVM to categorize the numbers in the digits data set and to predict new ones.', 'duration': 53.645, 'max_score': 272.594, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4272594.jpg'}], 'start': 84.932, 'title': 'Iris datasets and SVM in machine learning', 'summary': "Discusses the Iris dataset and the Oregon house-price example as common tutorial datasets, and the challenges of applying machine learning in the real world due to data availability and labeling requirements. It also explains the concept of support vector machines (SVM) using a simple visual demonstration and emphasizes the use of labeled plots for classification. 
Additionally, it introduces SVM and unsupervised learning, highlighting SVM's ability to categorize numbers and guess new ones, and the speaker's view of machine learning as a shortcut to experience.", 'chapters': [{'end': 165.723, 'start': 84.932, 'title': 'Iris datasets and machine learning challenges', 'summary': 'Discusses the Iris dataset and the Oregon house-price example as common tutorial datasets, and the challenges of applying machine learning in the real world due to data availability and labeling requirements.', 'duration': 80.791, 'highlights': ['The Iris dataset and a housing-price example (Oregon house prices, where machine learning is used to price houses from various variables) are commonly used in machine learning tutorials.', 'A common complaint is the challenge of transitioning from learning scikit-learn to applying machine learning in the real world due to data unavailability in the required format and the need for labeled data, particularly in scenarios such as stock market predictions.', "The need for labeled data is highlighted in the context of stock market predictions, as obtaining data labeled with 'buy' and 'sell' indications is a crucial requirement that is often not readily available.", 'The chapter mentions importing datasets from scikit-learn and importing svm (support vector machines) from sklearn.']}, {'end': 218.183, 'start': 166.323, 'title': 'SVM visualization', 'summary': 'Explains the concept of support vector machines (SVM) using a simple visual demonstration and emphasizes the use of labeled plots for classification.', 'duration': 51.86, 'highlights': ['Support Vector Machines (SVM) visualization is demonstrated using a simple graph to illustrate the concept.', 'Emphasizes the use of labeled plots for classification in SVM.']}, {'end': 326.239, 'start': 219.724, 'title': 'SVM and unsupervised learning', 'summary': "Introduces support vector machines (SVM) and 
unsupervised learning, highlighting SVM's ability to categorize numbers and guess new ones, and the speaker's view of machine learning as a shortcut to experience.", 'duration': 106.515, 'highlights': ['SVM categorizes the digits dataset into the classes 0 through 9 and can guess new numbers based on previous examples.', 'The speaker presents machine learning as a shortcut to experience: the model learns from many examples rather than from time spent in the field.', 'Support Vector Machines (SVM) are explained as categorizing and dividing groups based on labeled data, which makes SVM an example of supervised rather than unsupervised learning.']}], 'duration': 241.307, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD484932.jpg', 'highlights': ['The Iris dataset and a housing-price example (Oregon house prices, where machine learning is used to price houses from various variables) are commonly used in machine learning tutorials.', 'A common complaint is the challenge of transitioning from learning scikit-learn to applying machine learning in the real world due to data unavailability in the required format and the need for labeled data, particularly in scenarios such as stock market predictions.', "The need for labeled data is highlighted in the context of stock market predictions, as obtaining data labeled with 'buy' and 'sell' indications is a crucial requirement that is often not readily available.", 'Support Vector Machines (SVM) visualization is demonstrated using a simple graph to illustrate the concept.', 'SVM categorizes the digits dataset into the classes 0 through 9 and can guess new numbers based on previous examples.', 'The speaker presents machine learning as a shortcut to experience: the model learns from many examples rather than from time spent in the field.', 'Support Vector Machines (SVM) are explained as categorizing and dividing groups based on labeled data, serving as an example of 
supervised learning, in contrast to unsupervised learning.']}, {'end': 581.271, 'segs': [{'end': 388.991, 'src': 'embed', 'start': 344.592, 'weight': 1, 'content': [{'end': 349.995, 'text': "After seeing a bunch of examples, it's going to be like well, they all have all this and some of them vary a little bit,", 'start': 344.592, 'duration': 5.403}, {'end': 357.319, 'text': "but the difference between a 4 and a 9 is it's got either a straight line or a curved line or some sort of line at the top, something like that.", 'start': 349.995, 'duration': 7.324}, {'end': 359.879, 'text': "Anyway, so we've got that.", 'start': 358.038, 'duration': 1.841}, {'end': 360.839, 'text': "Now we're going to go ahead.", 'start': 359.899, 'duration': 0.94}, {'end': 362.019, 'text': "We're going to say the digits.", 'start': 360.879, 'duration': 1.14}, {'end': 363.78, 'text': 'This is for the digits data set.', 'start': 362.48, 'duration': 1.3}, {'end': 367.962, 'text': 'digits = datasets.load_digits()', 'start': 364.06, 'duration': 3.902}, {'end': 371.283, 'text': 'And this comes with like a whole bunch of data.', 'start': 368.322, 'duration': 2.961}, {'end': 375.584, 'text': 'And you can look at the digits data set if you want.', 'start': 371.943, 'duration': 3.641}, {'end': 378.786, 'text': "It's like labeled and then you've got your example and you've got..", 'start': 375.945, 'duration': 2.841}, {'end': 386.51, 'text': 'Basically like the, not necessarily pixels, but basically what makes up that digit saved in this.', 'start': 379.966, 'duration': 6.544}, {'end': 388.991, 'text': "So for example, we can say, you've got all kinds of stuff.", 'start': 386.79, 'duration': 2.201}], 'summary': 'Analyzing digits data set and identifying key features for classification.', 'duration': 44.399, 'max_score': 344.592, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4344592.jpg'}, {'end': 453.378, 'src': 'embed', 'start': 417.582, 'weight': 0, 
'content': [{'end': 428.748, 'text': 'And then we can even do, for example, we can say print..', 'start': 417.582, 'duration': 11.166}, {'end': 434.851, 'text': 'digits.images[0], something like this.', 'start': 430.031, 'duration': 4.82}, {'end': 440.876, 'text': 'Okay, so this is basically the image of the digit right here.', 'start': 437.415, 'duration': 3.461}, {'end': 444.076, 'text': 'So this is just like an example of a digit.', 'start': 441.836, 'duration': 2.24}, {'end': 453.378, 'text': "And so, as you're gonna find too, one of the more difficult things for machine learning is that we have to convert everything to numbers, right?", 'start': 444.416, 'duration': 8.962}], 'summary': 'Convert images of digits to numbers for machine learning.', 'duration': 35.796, 'max_score': 417.582, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4417582.jpg'}, {'end': 504.039, 'src': 'embed', 'start': 475.832, 'weight': 4, 'content': [{'end': 480.378, 'text': "And it's okay if your data is in a range from negative three to positive three, or something like that.", 'start': 475.832, 'duration': 4.546}, {'end': 483.18, 'text': 'But you want basically all of the data sets to follow that.', 'start': 480.718, 'duration': 2.462}, {'end': 487.885, 'text': "So you wouldn't want like, like, for example, housing prices, let's say you had to normalize that.", 'start': 483.18, 'duration': 4.705}, {'end': 489.066, 'text': "You'd have to normalize that data.", 'start': 487.885, 'duration': 1.181}, {'end': 496.351, 'text': "So like if you've got a house, you know, let's say you've got prices of houses anywhere from 10,000 to 200,000.", 'start': 489.066, 'duration': 7.285}, {'end': 499.135, 'text': "Well, we need to normalize that, because we've got, you know, that data.", 'start': 496.352, 'duration': 2.783}, {'end': 504.039, 'text': "We've got square footage data, number of bedrooms data, all that data is not 
really normalized.", 'start': 499.135, 'duration': 4.904}], 'summary': 'Data needs to be normalized within a range of -3 to +3 to ensure consistency across all data sets.', 'duration': 28.207, 'max_score': 475.832, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4475832.jpg'}, {'end': 581.271, 'src': 'embed', 'start': 536.01, 'weight': 5, 'content': [{'end': 539.071, 'text': "So if you've seen that, then you'll probably recognize some of this code.", 'start': 536.01, 'duration': 3.061}, {'end': 542.172, 'text': "So anyway, we're just going to use the same stuff that they did.", 'start': 539.471, 'duration': 2.701}, {'end': 544.113, 'text': 'So they had CLF for classifier.', 'start': 542.192, 'duration': 1.921}, {'end': 549.197, 'text': 'And that equals SVM dot SVC.', 'start': 545.654, 'duration': 3.543}, {'end': 551.7, 'text': 'And then you can set gamma here.', 'start': 549.918, 'duration': 1.782}, {'end': 554.402, 'text': 'You can set this if you want.', 'start': 552.561, 'duration': 1.841}, {'end': 558.667, 'text': "And we'll kind of play with gamma in a little bit to show you that it matters.", 'start': 555.303, 'duration': 3.364}, {'end': 563.511, 'text': 'But for now I just want to show you how quickly we can do a machine learning algorithm.', 'start': 559.407, 'duration': 4.104}, {'end': 572.662, 'text': 'There are also, built into scikit-learn, there is a functionality to just automatically get gamma for you.', 'start': 565.193, 'duration': 7.469}, {'end': 577.528, 'text': "And it'll just kind of automatically pick the best version of gamma using machine learning.", 'start': 573.323, 'duration': 4.205}, {'end': 581.271, 'text': 'You can use that as well.', 'start': 580.37, 'duration': 0.901}], 'summary': 'Using scikit-learn, quickly implement machine learning algorithms with automatic gamma selection.', 'duration': 45.261, 'max_score': 536.01, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4536010.jpg'}], 'start': 326.339, 'title': 'Recognizing digits with machine learning', 'summary': "Discusses the process of recognizing and classifying digits using machine learning, including analyzing patterns and exploring the digits data set in Python, as well as the importance of data normalization and the setup of a machine learning classifier using scikit-learn's svm.SVC.", 'chapters': [{'end': 363.78, 'start': 326.339, 'title': 'Recognizing digits with machine learning', 'summary': 'Discusses how a machine learns to differentiate between digits, such as 9, 8, and 4, by analyzing patterns like closed tops and varying lines, in order to aid in recognizing and classifying digits.', 'duration': 37.441, 'highlights': ['The machine learns to differentiate between digits based on patterns like closed tops and varying lines, aiding in recognizing and classifying digits.', 'By analyzing many examples, the machine identifies the difference between digits, such as 9, 8, and 4, based on specific patterns like closed tops, straight lines, or curved lines.', 'The discussion focuses on the process of recognizing and classifying digits within the digits data set.']}, {'end': 453.378, 'start': 364.06, 'title': 'Exploring digits data set', 'summary': 'Introduces the digits data set in Python, containing labeled data and images of digits, and discusses the process of converting everything to numbers for machine learning.', 'duration': 89.318, 'highlights': ['The digits data set in scikit-learn comes with labeled data and images of digits, which can be accessed through digits.data, digits.target, and digits.images.', 'The data set stores each digit as an array of numbers, together with a label saying which number that array represents.', 'Converting everything to numbers is one of the more difficult things for machine learning, as highlighted in the 
chapter.']}, {'end': 581.271, 'start': 454.218, 'title': 'Data normalization and machine learning', 'summary': "Discusses the importance of normalizing features to a comparable range, such as converting housing price data to a range from -1 to 1, and demonstrates the setup of a machine learning classifier using scikit-learn's svm.SVC.", 'duration': 127.053, 'highlights': ['Data normalization ensures that all features are on a comparable scale, for example by mapping housing prices between 10,000 and 200,000 into a range from -1 to 1.', "Setting up a machine learning classifier using scikit-learn's svm.SVC involves defining the classifier and potentially adjusting the gamma value for better performance.", 'Scikit-learn offers tools, such as grid search with cross-validation, to select a good gamma value automatically, simplifying the process of setting up the classifier.']}], 'duration': 254.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4326339.jpg', 'highlights': ['The machine learns to differentiate between digits based on patterns like closed tops and varying lines, aiding in recognizing and classifying digits.', 'By analyzing many examples, the machine identifies the difference between digits, such as 9, 8, and 4, based on specific patterns like closed tops, straight lines, or curved lines.', 'The discussion focuses on the process of recognizing and classifying digits within the digits data set.', 'The digits data set in scikit-learn comes with labeled data and images of digits, which can be accessed through digits.data, digits.target, and digits.images.', 'Data normalization ensures that all features are on a comparable scale, for example by mapping housing prices between 10,000 and 200,000 into a range from -1 to 1.', "Setting up a machine learning classifier using scikit-learn's svm.SVC involves defining the classifier and potentially adjusting the gamma 
value for optimal performance.", 'Scikit-Learn offers functionality to automatically determine the best gamma value using machine learning, simplifying the process of setting up the classifier.', 'Converting everything to numbers is one of the more difficult things for machine learning, as highlighted in the chapter.']}, {'end': 938.128, 'segs': [{'end': 658.133, 'src': 'embed', 'start': 624.798, 'weight': 0, 'content': [{'end': 629.54, 'text': "So we can just print, I don't know what the length is, but I'm pretty sure it's pretty large.", 'start': 624.798, 'duration': 4.742}, {'end': 630.381, 'text': 'So digits.data.', 'start': 629.58, 'duration': 0.801}, {'end': 639.856, 'text': 'printlang.digits.data So we have a total of, yeah, 1,797 examples of digits.', 'start': 631.647, 'duration': 8.209}, {'end': 641.137, 'text': 'So zero through nine.', 'start': 640.437, 'duration': 0.7}, {'end': 642.619, 'text': 'Actually, I think it might be one through nine.', 'start': 641.438, 'duration': 1.181}, {'end': 643.82, 'text': "I'm not sure if they have zeros or not.", 'start': 642.639, 'duration': 1.181}, {'end': 647.845, 'text': 'Anyway, so printlang, so we have that many examples.', 'start': 643.98, 'duration': 3.865}, {'end': 658.133, 'text': "So we're going to load up 1,796 examples of the examples and we're going to use that as learning data.", 'start': 648.205, 'duration': 9.928}], 'summary': "Using 1,796 examples of digits (possibly 1-9), we'll use it as learning data.", 'duration': 33.335, 'max_score': 624.798, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4624798.jpg'}, {'end': 722.177, 'src': 'embed', 'start': 684.71, 'weight': 1, 'content': [{'end': 697.439, 'text': "and then what we're gonna do is we're gonna do CLF, dot, fit and then XY, and then now we're gonna actually go ahead and call for a prediction.", 'start': 684.71, 'duration': 12.729}, {'end': 716.995, 'text': "so we're gonna say print, print 
('Prediction:', clf.predict(digits.data[-1])), and then we're going to predict digits.data, the negative first element.", 'start': 697.439, 'duration': 19.556}, {'end': 722.177, 'text': 'So this will be where we actually predict what is the negative first element.', 'start': 717.035, 'duration': 5.142}], 'summary': 'Using clf to predict the last (negative first) element.', 'duration': 37.467, 'max_score': 684.71, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4684710.jpg'}, {'end': 829.447, 'src': 'heatmap', 'start': 778.065, 'weight': 2, 'content': [{'end': 785.473, 'text': "Sometimes it's actually really hard to see the image because it was originally very small and we've zoomed in on this big plot and it's kind of confusing to look at.", 'start': 778.065, 'duration': 7.408}, {'end': 786.714, 'text': "So anyway, that's what's going on there.", 'start': 785.533, 'duration': 1.181}, {'end': 791.72, 'text': 'plt.imshow and then finally we just call plt.show, real simple.', 'start': 787.555, 'duration': 4.165}, {'end': 793.261, 'text': 'So we save and run that.', 'start': 792.481, 'duration': 0.78}, {'end': 797.509, 'text': 'And the prediction is that this is a number eight.', 'start': 794.247, 'duration': 3.262}, {'end': 803.092, 'text': "And as we can see in the image, it's most likely an eight.", 'start': 798.65, 'duration': 4.442}, {'end': 804.333, 'text': 'It looks like it goes like this.', 'start': 803.152, 'duration': 1.181}, {'end': 809.316, 'text': "It's kind of missing a number here, but the only thing that this would even be close to is like a six or something.", 'start': 804.413, 'duration': 4.903}, {'end': 812.217, 'text': "So we're pretty content.", 'start': 809.836, 'duration': 2.381}, {'end': 814.399, 'text': 'You know, it seems like it knew that number.', 'start': 812.277, 'duration': 2.122}, {'end': 819.441, 'text': 'So the next thing we can do is show like another example.', 'start': 814.879, 'duration': 
4.562}, {'end': 829.447, 'text': 'So what if we did this? What if we change the data set that we memorize all the way to the negative tenth? Okay.', 'start': 819.541, 'duration': 9.906}], 'summary': 'Using Python, the model predicts that a hard-to-read digit image is an eight; the speaker notes the only close alternative would be a six.', 'duration': 51.382, 'max_score': 778.065, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4778065.jpg'}, {'end': 913.228, 'src': 'embed', 'start': 880.687, 'weight': 3, 'content': [{'end': 884.53, 'text': 'This is basically how this data set goes, right?', 'start': 880.687, 'duration': 3.843}, {'end': 889.234, 'text': "And if you've ever heard the term overfit, here's how overfit works.", 'start': 885.231, 'duration': 4.003}, {'end': 891.296, 'text': "It's gonna go like this.", 'start': 890.315, 'duration': 0.981}, {'end': 897.261, 'text': "It's gonna be like, okay, so here is how the data set works.", 'start': 891.316, 'duration': 5.945}, {'end': 899.082, 'text': "It's like this.", 'start': 898.161, 'duration': 0.921}, {'end': 903.351, 'text': "And then the green is, I don't know, like this.", 'start': 900.203, 'duration': 3.148}, {'end': 911.087, 'text': "I can't tell if that's even a green or a blue.", 'start': 909.626, 'duration': 1.461}, {'end': 912.168, 'text': 'Whoops, we covered over that blue.', 'start': 911.127, 'duration': 1.041}, {'end': 913.228, 'text': 'Anyway, you get the idea.', 'start': 912.268, 'duration': 0.96}], 'summary': 'Discussion about overfitting and interpreting data visually.', 'duration': 32.541, 'max_score': 880.687, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4880687.jpg'}], 'start': 581.331, 'title': 'Machine learning and testing with digits data', 'summary': "Covers the process of storing and testing examples in a machine learning algorithm using digits data, which includes 1,797 examples of digits from zero 
through nine. It also explains the prediction of the last (negative first) element and demonstrates the machine's prediction of an image as number eight, along with model fitting and the concept of overfitting.", 'chapters': [{'end': 722.177, 'start': 581.331, 'title': 'Machine learning and testing with digits data', 'summary': 'Introduces the process of storing and testing examples in a machine learning algorithm using digits data, containing 1,797 examples of digits from zero through nine, and then predicting the value of the last (negative first) element.', 'duration': 140.846, 'highlights': ['The digits.data array contains a total of 1,797 examples of digits, from zero through nine; all but the last are used as learning data for the machine learning algorithm.', 'The chapter demonstrates the process of testing the machine learning algorithm against the last element, the negative first element, and predicting its value.', 'The process involves storing all the answers and examples in digits.data and digits.target, then using clf.fit to train on all but the last example and clf.predict to predict the value of the last one.']}, {'end': 938.128, 'start': 723.962, 'title': 'Machine prediction and model fitting', 'summary': "Demonstrates the machine's prediction of an image as number eight and fitting the model with the fit function, while explaining the concept of overfitting and its implications.", 'duration': 214.166, 'highlights': ["Demonstrates the machine's prediction of an image as number eight: the machine predicts the image is an eight, despite minor imperfections in the image.", 'Explains the concept of overfitting and its implications: an overfit model follows the training data so closely that it generalizes poorly to new data.']}], 'duration': 356.797, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4581331.jpg', 'highlights': ['The digits.data array contains a total of 1,797 examples of digits, from zero through 
nine, used as learning data for the machine learning algorithm.', 'The process involves storing the examples in digits.data and their answers in digits.target, training the classifier with clf.fit, and then using clf.predict to predict the value of the last element (index -1).', "Demonstrates the machine's prediction of an image as number eight. The machine predicts an image as number eight with confidence, despite minor imperfections in the image.", 'Explains the concept of overfitting and its implications. Overfitting is explained as fitting a line too closely to the training numbers, and its negative implications are highlighted.']}, {'end': 1426.356, 'segs': [{'end': 992.072, 'src': 'embed', 'start': 938.789, 'weight': 0, 'content': [{'end': 946.654, 'text': "Anyway, that's what we're doing here: we're fitting, then we're going to run the prediction, and then we're showing it.", 'start': 938.789, 'duration': 7.865}, {'end': 952.397, 'text': "So, for example, what if we decide, okay, we're going to predict the negative 2 element.", 'start': 946.974, 'duration': 5.423}, {'end': 953.938, 'text': "Let's show the negative 2 element.", 'start': 952.477, 'duration': 1.461}, {'end': 955.399, 'text': 'Save and run that.', 'start': 954.718, 'duration': 0.681}, {'end': 962.483, 'text': "It's predicting it is a 9, and sure enough, we can definitely tell that's a 9.", 'start': 956.88, 'duration': 5.603}, {'end': 964.905, 'text': "Let's go ahead and do the negative 3rd element, for example.", 'start': 962.483, 'duration': 2.422}, {'end': 970.774, 'text': "I don't even know what that is.", 'start': 969.853, 'duration': 0.921}, {'end': 975.098, 'text': "Eight. 
Maybe, I guess. Yeah, it says it's an eight as well.", 'start': 970.774, 'duration': 4.324}, {'end': 984.826, 'text': "It's kind of fun that it pops up before I can see what the algorithm says: zero, and that is a zero.", 'start': 975.098, 'duration': 9.728}, {'end': 985.847, 'text': 'So okay, you get the idea.', 'start': 984.826, 'duration': 1.021}, {'end': 987.608, 'text': "So it's obviously fairly accurate.", 'start': 985.847, 'duration': 1.761}, {'end': 992.072, 'text': "But what if we change gamma to, let's say, 0.1?", 'start': 987.608, 'duration': 4.464}], 'summary': 'Fitting and running predictions on elements, achieving fairly accurate results.', 'duration': 53.283, 'max_score': 938.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4938789.jpg'}, {'end': 1175.872, 'src': 'embed', 'start': 1150.645, 'weight': 2, 'content': [{'end': 1157.787, 'text': "What is happening right in this example? Usually it's referred to, I believe, as alpha or the learning rate,", 'start': 1150.645, 'duration': 7.142}, {'end': 1163.289, 'text': 'and basically it wants to get down the hill, and you can make the learning rate really high.', 'start': 1157.787, 'duration': 5.502}, {'end': 1165.47, 'text': 'So the jumps might be this big, right?', 'start': 1163.649, 'duration': 1.821}, {'end': 1172.171, 'text': 'And so really, if the jump is that big, you only need to make basically one, two, three, four jumps to reach the base of the hill.', 'start': 1165.81, 'duration': 6.361}, {'end': 1175.872, 'text': 'So your machine learning algorithm would be like lightning fast, right? 
It would get there.', 'start': 1172.491, 'duration': 3.381}], 'summary': 'A high learning rate means bigger jumps, so the algorithm reaches the bottom of the hill in fewer steps.', 'duration': 25.227, 'max_score': 1150.645, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD41150645.jpg'}, {'end': 1284.338, 'src': 'embed', 'start': 1261.285, 'weight': 4, 'content': [{'end': 1269.127, 'text': "And in this machine learning, what we've done with clf.fit is the training, right? So here's our data set.", 'start': 1261.285, 'duration': 7.842}, {'end': 1270.788, 'text': "Here's where we train the data.", 'start': 1269.508, 'duration': 1.28}, {'end': 1272.869, 'text': 'And we left the last 10 for testing.', 'start': 1270.988, 'duration': 1.881}, {'end': 1274.429, 'text': 'We train the data here.', 'start': 1273.269, 'duration': 1.16}, {'end': 1284.338, 'text': 'And just for concepts: training is where the scientist knows the answer and then the computer also knows the answer.', 'start': 1274.829, 'duration': 9.509}], 'summary': 'Machine learning training involves using most of the data for training and leaving the last 10 for testing.', 'duration': 23.053, 'max_score': 1261.285, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD41261285.jpg'}, {'end': 1358.389, 'src': 'embed', 'start': 1327.849, 'weight': 5, 'content': [{'end': 1330.071, 'text': 'And so a lot of people will use it to kind of prototype.', 'start': 1327.849, 'duration': 2.222}, {'end': 1335.823, 'text': 'You can also use other languages; people will use C or Java for machine learning.', 'start': 1332.281, 'duration': 3.542}, {'end': 1337.344, 'text': "Again, you're not going to write your own.", 'start': 1335.963, 'duration': 1.381}, {'end': 1338.965, 'text': "You're going to use someone else's algorithm.", 'start': 1337.364, 'duration': 1.601}, {'end': 1347.585, 'text': "And there's also a pretty popular machine learning kind of 
prototyping tool called Octave, and you can do a lot of machine learning there.", 'start': 1339.225, 'duration': 8.36}, {'end': 1352.107, 'text': "Again, it doesn't really matter what you're using, because in general,", 'start': 1347.865, 'duration': 4.242}, {'end': 1358.389, 'text': "you're going to be spending most of your time organizing your dataset to work with machine learning.", 'start': 1352.107, 'duration': 6.282}], 'summary': 'Various tools like Octave, C, and Java are used for machine learning prototyping, with a focus on organizing datasets.', 'duration': 30.54, 'max_score': 1327.849, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD41327849.jpg'}], 'start': 938.789, 'title': 'Machine learning fundamentals', 'summary': 'Covers fitting a model, running predictions with varying gamma values, and observing accuracy. It also discusses gradient descent in machine learning, emphasizing the importance of the learning rate in optimization. Additionally, it introduces the concept of machine learning, explaining the training process and the use of Python for quick prototyping.', 'chapters': [{'end': 1048.75, 'start': 938.789, 'title': 'Model prediction and accuracy', 'summary': 'Discusses fitting a model and running predictions with varying gamma values, observing the accuracy of the predictions on specific elements of the dataset.', 'duration': 109.961, 'highlights': ['The model predicts the negative 2 element as 9 with good accuracy.', 'Changing gamma to 0.1 results in an incorrect prediction of 3 for the same element, showcasing the impact of parameter adjustment on prediction accuracy.',
'When gamma is changed to 0.001, the model correctly predicts the element as 9, demonstrating the effect of further parameter modification on prediction correctness.']}, {'end': 1234.315, 'start': 1049.231, 'title': 'Gradient descent in machine learning', 'summary': "Discusses the concept of gradient descent in machine learning, highlighting the importance of the learning rate (gamma) in determining the speed and accuracy of reaching the base of the 'mountain' in the optimization process.", 'duration': 185.084, 'highlights': ['The importance of the learning rate (gamma) in determining the speed and accuracy of gradient descent in machine learning. The learning rate (gamma) significantly influences the speed and accuracy of the algorithm, with higher values leading to faster but less accurate convergence, while lower values result in slower but more accurate convergence.', "The analogy of gradient descent to getting down a mountain, with the learning rate (gamma) determining the size of the 'jumps' in the optimization process. The concept of gradient descent is likened to navigating down a mountain, where the learning rate (gamma) determines the size of 'jumps' made to reach the base, impacting the speed and efficiency of the algorithm.", 'The trade-off between speed and accuracy in gradient descent, based on the learning rate (gamma). 
The choice of the learning rate (gamma) involves a trade-off between speed and accuracy, as a higher rate leads to faster convergence but potential overshooting, while a lower rate is more stable but converges more slowly.']}, {'end': 1426.356, 'start': 1235.56, 'title': 'Introduction to machine learning', 'summary': 'Introduces the concept of machine learning, explaining the training process, testing, and the importance of dataset organization. It also mentions the use of Python for quick prototyping and the upcoming project on analyzing stock fundamental data using machine learning.', 'duration': 190.796, 'highlights': ['Machine learning training involves fitting the data and leaving some for testing, with the computer using previous answers to come up with the best answer. The training process involves leaving the last 10 samples for testing, and the computer fits the data based on the known answers. It then uses the previous answers to make predictions, with the success rate depending on the value of gamma.', 'Python is commonly used for machine learning prototyping due to its quick development, and scikit-learn provides a diagram for choosing the best algorithm. Python is preferred for quick prototyping, and scikit-learn offers a diagram for selecting the most suitable machine learning algorithm for a task.', 'The upcoming project involves analyzing stock fundamental data using machine learning to identify important fundamentals and select suitable investment algorithms. 
The project will focus on using machine learning to analyze stock fundamental data, aiming to identify important fundamentals and determine suitable investment algorithms.']}], 'duration': 487.567, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTeVOb8gaD4/pics/KTeVOb8gaD4938789.jpg', 'highlights': ['The model predicts the negative 2 element as 9 with good accuracy.', 'Changing gamma to 0.001 results in a prediction of 9 for the same element, demonstrating the effect of further parameter modification on prediction correctness.', 'The importance of the learning rate (gamma) in determining the speed and accuracy of gradient descent in machine learning.', "The analogy of gradient descent to getting down a mountain, with the learning rate (gamma) determining the size of the 'jumps' in the optimization process.", 'Machine learning training involves fitting the data and leaving some for testing, with the computer using previous answers to come up with the best answer.', 'Python is commonly used for machine learning prototyping due to its quick development, and scikit-learn provides a diagram for choosing the best algorithm.']}], 'highlights': ['The bulk of the task in machine learning is the acquisition, structuring, and organization of data.', 'The machine learns to differentiate between digits based on patterns like closed tops and varying lines, aiding in recognizing and classifying digits.', 'The importance of data acquisition and organization, the challenges of applying machine learning in the real world, and the concept of SVM using a simple visual demonstration.', 'The model predicts the negative 2 element as 9 with good accuracy.', 'The process involves storing the examples in digits.data and their answers in digits.target, training the classifier with clf.fit, and then using clf.predict to predict the value of the last element (index -1).', 'The discussion focuses on the process of recognizing and classifying digits within the digits data set.',
'The Iris dataset is commonly used in machine learning tutorials, and an example of Oregon house prices shows machine learning algorithms pricing houses based on various variables.', "The need for labeled data is highlighted in the context of stock market predictions, as obtaining data labeled with 'buy' and 'sell' indications is a crucial requirement that is often not readily available.", 'The machine predicts an image as number eight with confidence, despite minor imperfections in the image.', 'The importance of the learning rate (gamma) in determining the speed and accuracy of gradient descent in machine learning.']}
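The learning-rate idea described in the segments above (big jumps reach the base of the hill in a few steps, at the risk of overshooting; small jumps are slower) can be sketched with plain one-dimensional gradient descent. This is a minimal illustration under stated assumptions, not code from the video: the "hill" f(x) = x**2, the starting point 10.0, the four-step budget, and the names `gradient_descent` and `grad_f` are all hypothetical. Note that scikit-learn documents `gamma` in `svm.SVC` as a kernel coefficient (for the RBF kernel, among others), not as a gradient-descent step size, so this sketch illustrates the general learning-rate concept the summaries describe rather than what `gamma` does inside SVC.

```python
from typing import Callable, List


def gradient_descent(
    grad: Callable[[float], float], x0: float, learning_rate: float, steps: int
) -> List[float]:
    """Plain 1-D gradient descent; returns every position visited, starting with x0."""
    path = [x0]
    x = x0
    for _ in range(steps):
        # Each "jump" moves downhill by learning_rate times the local slope.
        x = x - learning_rate * grad(x)
        path.append(x)
    return path


def grad_f(x: float) -> float:
    """Slope of the hypothetical hill f(x) = x**2, whose base (minimum) is at x = 0."""
    return 2.0 * x


if __name__ == "__main__":
    # A small, a well-sized, and a too-large learning rate, four jumps each.
    for lr in (0.05, 0.5, 1.1):
        path = gradient_descent(grad_f, x0=10.0, learning_rate=lr, steps=4)
        print(f"learning_rate={lr}: {[round(p, 3) for p in path]}")
```

With these assumed values, a step of 0.05 creeps slowly toward 0, a step of 0.5 lands on the minimum in a single jump, and a step of 1.1 overshoots further on every jump. On this particular quadratic the "right" step size is known exactly; on real loss surfaces it has to be tuned, which is the speed versus stability trade-off the summaries mention.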