title

Training a machine learning model with scikit-learn

description

Now that we're familiar with the famous iris dataset, let's actually use a classification model in scikit-learn to predict the species of an iris! We'll learn how the K-nearest neighbors (KNN) model works, and then walk through the four steps for model training and prediction in scikit-learn. Finally, we'll see how easy it is to try out a different classification model, namely logistic regression.
Download the notebook: https://github.com/justmarkham/scikit-learn-videos
Iris dataset: http://archive.ics.uci.edu/ml/datasets/Iris
Nearest Neighbors user guide: http://scikit-learn.org/stable/modules/neighbors.html
KNeighborsClassifier class documentation: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
Logistic Regression user guide: http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
LogisticRegression class documentation: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Videos from An Introduction to Statistical Learning: https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:
1) WATCH my scikit-learn video series:
https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
2) SUBSCRIBE for more videos:
https://www.youtube.com/dataschool?sub_confirmation=1
3) JOIN "Data School Insiders" to access bonus content:
https://www.patreon.com/dataschool
4) ENROLL in my Machine Learning course:
https://www.dataschool.io/learn/
5) LET'S CONNECT!
- Newsletter: https://www.dataschool.io/subscribe/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- LinkedIn: https://www.linkedin.com/in/justmarkham/
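The four steps for model training and prediction mentioned above can be sketched in code. This is a minimal sketch assuming a recent scikit-learn release: the n_neighbors spelling and the 2-D input to predict reflect the current API, and the two "unknown iris" measurements are the example values used in the video.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier  # step 1: import the class

# Load the iris data: X is the 150x4 feature matrix, y the 150 response values
iris = load_iris()
X, y = iris.data, iris.target

# Step 2: instantiate the estimator; n_neighbors (the K value) is a tuning
# parameter, and all unspecified parameters keep their sensible defaults
knn = KNeighborsClassifier(n_neighbors=1)

# Step 3: fit the model with data (trains in place, no assignment needed)
knn.fit(X, y)

# Step 4: make predictions for new observations; a plain Python list works
# because NumPy converts it to an array of the appropriate shape
print(knn.predict([[3, 5, 4, 2], [5, 4, 3, 2]]))  # the video reports [2 1] for K=1
```

predict returns a NumPy array of encoded species (0, 1, or 2), unlike fit, which returns the fitted estimator itself.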

detail

Summary: Covers training a K-nearest neighbors (KNN) classification model in scikit-learn, including the four-step training and prediction pattern, selection of the K value, the impact of K on decision boundaries, model tuning, and the need for model evaluation procedures.

Chapter 1: Introduction and review (0:00-0:59)

In the previous video, we discussed the famous iris dataset and loaded it into scikit-learn, learned important machine learning terminology such as features, response, observations, regression, and classification, and covered scikit-learn's four key requirements for working with data. This video answers three questions: What is the K-nearest neighbors classification model? What are the four steps for model training and prediction in scikit-learn? And how can this pattern be applied to other machine learning models? We begin with a quick review of the iris dataset, since it's the data used to train the model.

Chapter 2: How K-nearest neighbors classification works (1:00-6:27)

The iris dataset has 150 observations, each representing an iris flower. There are four features, represented by the first four columns of data, which are the sepal and petal measurements for each iris. The response variable is the species of each iris, shown in the fifth column. Because the response variable is categorical, this is known as a classification problem.

To classify an unknown iris, the model calculates the numerical distance between the unknown iris and each of the 150 known irises, and selects the K known irises with the smallest distance. Euclidean distance is often used as the distance metric, but other metrics can be used instead. The response values of those nearest neighbors are then tallied, and the most common value is used as the prediction.

A KNN classification map with K=1 shows that a new observation's predicted class is simply the class of its single nearest neighbor. In a map with K=5, the boundaries between colors, known as decision boundaries, have changed because more neighbors are taken into account when making predictions: the predicted response for the same new observation is now blue instead of green, because four of its five nearest neighbors are blue. The white areas are regions in which KNN can't make a clear decision because there's a tie between two classes. KNN is a very simple machine learning model, but it can make highly accurate predictions if the different classes in the dataset have very dissimilar feature values.

Chapter 3: The four-step pattern for training and prediction (6:35-13:55)

After loading the iris data, X is a two-dimensional array with 150 rows and four columns, and y is a one-dimensional array of length 150, since there's one response value for each observation. When loading your own data into scikit-learn, make sure it meets the four key requirements of input data outlined in the previous video.

Scikit-learn provides a uniform interface to machine learning models, and thus there's a common pattern that can be reused across different models:

1. Import the relevant class. In this case, import KNeighborsClassifier from sklearn.neighbors.
2. Instantiate the estimator. The argument n_neighbors=1 tells the KNN object that when it runs the K-nearest neighbors algorithm, it should look for the one nearest neighbor; n_neighbors is known as a tuning parameter, or hyperparameter. Any parameters without a specified value are set to their defaults, and scikit-learn's defaults are sensible enough that you can get started with a new model without researching the meaning of every parameter.
3. Fit the model with data. This is the model training step, in which the model learns the relationship between the features and the response, though the underlying mathematical process through which this learning occurs varies by model. Use the fit method on the KNN object, passing it the feature matrix X and the response vector y. The operation occurs in place, so the result doesn't need to be assigned to another object.
4. Make predictions for new observations. Pass the measurements of an unknown iris to the predict method, and the fitted model predicts the species based on what it learned in the previous step. The method expects a NumPy array, but a Python list also works, since NumPy automatically converts it to an array of the appropriate shape. Unlike fit, predict does return an object: a NumPy array with the predicted response values.

Chapter 4: Model tuning, other models, and next steps (13:55-19:47)

With K=1, the prediction for the first unknown iris was 2 and the prediction for the second was 1. Trying a different value for K, such as 5, is known as model tuning, in which you vary the arguments passed to the model. There's no need to import the class again: just instantiate the model with n_neighbors=5, fit it with the data, and make predictions. This time, the model predicts the value 1 for both unknown irises.

Because scikit-learn's models have a uniform interface, the same four-step pattern can be used on a different model with relative ease. For example, logistic regression, which, despite its name, is another model used for classification: import LogisticRegression from sklearn.linear_model, instantiate the model with all of the default parameters, fit the model with data, and make predictions. This time, the model predicts 2 for the first unknown iris and 0 for the second.

Which model produced the correct predictions? We don't know, because these are out-of-sample observations, meaning we don't know the true response values. Our goal with supervised learning is to build models that generalize to new data, yet we often aren't able to truly measure how well our models will perform out of sample. That doesn't mean we're forced to guess: the next video begins to discuss model evaluation procedures, which estimate that performance and help with choosing the best value of K and deciding between KNN and logistic regression for a particular task.

Recommended reading: the nearest neighbors section of the scikit-learn user guide, which explains the available nearest neighbor algorithms and how to use them effectively, and the class documentation for KNeighborsClassifier. It's useful to become familiar with the structure of the class documentation, since all classes are documented in the same manner.
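The model tuning and model-swapping steps walked through above can be sketched as follows. This is a minimal sketch assuming a recent scikit-learn release; max_iter is raised above its default only so the current default solver converges cleanly on this data (the video used an older release whose defaults differed, so the logistic regression output may not match exactly).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]  # measurements for two unknown irises

# Model tuning: same class, different value of the K hyperparameter
# (no need to import the class again)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.predict(X_new))  # the video reports [1 1] for K=5

# Same four-step pattern, different model: logistic regression
logreg = LogisticRegression(max_iter=1000)  # extra iterations for convergence
logreg.fit(X, y)
print(logreg.predict(X_new))  # the video (older scikit-learn) reports [2 0]
```

Note that neither set of predictions can be checked for correctness here: X_new contains out-of-sample observations whose true species are unknown, which is exactly why the next video turns to model evaluation procedures.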