title
Machine Learning Tutorial Python - 16: Hyper parameter Tuning (GridSearchCV)
description
In this Python machine learning tutorial for beginners, we will look into:
1) how to tune the hyperparameters of a machine learning model
2) how to choose the best model for a given machine learning problem
We will start by comparing the traditional train_test_split approach with k-fold cross validation. Then we will see how GridSearchCV runs k-fold cross validation through a convenient API to find the parameters that give the best performance. RandomizedSearchCV is another class in the sklearn library that does the same thing as GridSearchCV without running an exhaustive search, which saves computation time and resources. We will also see how to use GridSearchCV to find the best model among several classification algorithms. At the end there is an interesting exercise for you to solve.
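The train_test_split versus k-fold comparison can be sketched like this, assuming scikit-learn is installed. The SVC settings (kernel='rbf', C=30, gamma='auto') and the seeds 0, 1 and 2 are arbitrary illustrative choices, not necessarily what the video's notebook uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

iris = load_iris()

# Same model, three different 70/30 splits: each split puts different
# samples in the test set, so the single-split score can move around.
split_scores = []
for seed in (0, 1, 2):
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.3, random_state=seed
    )
    model = SVC(kernel="rbf", C=30, gamma="auto")  # arbitrary, untuned values
    model.fit(X_train, y_train)
    split_scores.append(model.score(X_test, y_test))
print("train_test_split scores:", split_scores)

# 5-fold cross validation: every sample is tested exactly once and the
# five fold scores are averaged into one more stable estimate.
cv_scores = cross_val_score(
    SVC(kernel="rbf", C=30, gamma="auto"), iris.data, iris.target, cv=5
)
print("5-fold scores:", cv_scores)
print("mean:", cv_scores.mean())
```

The three single-split scores may disagree with one another, while rerunning the cross-validation part gives the same mean every time, because for a classifier `cv=5` uses a fixed, unshuffled stratified 5-fold split.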
#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCourse #HyperParameter #GridSearchCV #sklearntutorials #scikitlearntutorials
Exercise: https://github.com/codebasics/py/blob/master/ML/15_gridsearch/exercise.md
Code in this tutorial: https://github.com/codebasics/py/blob/master/ML/15_gridsearch/15_grid_search.ipynb
Do you want to learn technology from me? Check https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description for my affordable video courses.
Exercise solution: https://github.com/codebasics/py/blob/master/ML/15_gridsearch/Exercise/15_grid_search_cv_exercise.ipynb
Topics that are covered in this Video:
00:00 Introduction
00:45 train_test_split to find model performance
01:37 K fold cross validation
04:44 GridSearchCV for hyperparameter tuning
10:18 RandomizedSearchCV
12:35 Choosing best model
15:25 Exercise
Next Video:
Deep Learning Tutorial Python, Tensorflow And Keras: Introduction and Installation: https://www.youtube.com/watch?v=oPa20mUgJi8&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=18
Popular Playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV
Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg
Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy
matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl
Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1
Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8
Tools and Libraries:
Scikit learn tutorials
Sklearn tutorials
Machine learning with scikit learn tutorials
Machine learning with sklearn tutorials
🌎 My Website For Video Courses: https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description
Need help building software or data analytics and AI solutions? My company https://www.atliq.com/ can help. Click on the Contact button on that website.
#️⃣ Social Media #️⃣
🔗 Discord: https://discord.gg/r42Kbuk
📸 Dhaval's Personal Instagram: https://www.instagram.com/dhavalsays/
📸 Codebasics Instagram: https://www.instagram.com/codebasicshub/
🔊 Facebook: https://www.facebook.com/codebasicshub
📱 Twitter: https://twitter.com/codebasicshub
📝 Linkedin (Personal): https://www.linkedin.com/in/dhavalsays/
📝 Linkedin (Codebasics): https://www.linkedin.com/company/codebasics/
🔗 Patreon: https://www.patreon.com/codebasics?fan_landing=true
detail
Summary: Covers ML model selection and hyperparameter tuning on sklearn's iris dataset. A single train_test_split run with arbitrarily chosen SVM parameters scores 95%, but the score changes from split to split, so the video moves to k-fold cross validation, then to GridSearchCV, which tries every parameter combination in one call (best score 0.98), and to RandomizedSearchCV, which tries only a random subset of combinations to save computation. Finally, GridSearchCV is used to compare SVM, random forest and logistic regression; SVM comes out best at 98%.

Chapter 1 (00:01-04:37): ML model selection and optimization
Choosing the best model and tuning its hyperparameters, using an SVM on sklearn's iris flower dataset. A 70/30 train_test_split gives a 95% score with arbitrarily chosen parameters, but the score depends on which samples land in the training and test sets, so the method cannot be relied on. K-fold cross validation divides the samples into n folds, uses each fold once as the test set and averages the scores. Trying kernel and C values by hand this way is manual and repetitive, so a for loop is introduced to compute the average score for each combination.
00:01-01:57 Choosing an ML model and hyperparameter tuning
01:58-04:37 K-fold cross validation for model optimization
Transcript excerpts:
[00:35] What kind of kernel and C and gamma values should I be using? There are just so many values to choose from. The process of choosing the optimal parameters is called hyperparameter tuning. In my Jupyter notebook, I have loaded the iris flower dataset, shown here in table format. The traditional approach to this problem is to use the train_test_split method to split the dataset into training and test sets. Here I'm using a 70/30 partition, and let's say we first try the SVM model.
[01:33] I don't know what the best parameters are, so I'm just going with some values. The issue is that the score might vary based on your train and test sets. Right now my score is 95%, but if I execute this again, the X_train and X_test samples change, so the score changes from 0.95 to 1.0. I cannot rely on this method because the score changes with my samples. For that reason, we use k-fold cross validation. I have a video on k-fold cross validation, so if you want a detailed look, pause here and watch it; I will just give you an overview. As shown in the diagram, in k-fold cross validation we divide our data samples into n folds. Here I am showing 5 folds.
[02:55] Here I have tried cross_val_score with 5 folds: cv=5 means 5-fold. I tried this method on different values of kernel and C. Here the kernel is linear; here it is rbf with C=10, and here C is 20. For each of these combinations I found the scores; you can see there are five values here, and these are the scores from the five iterations.

Chapter 2 (04:37-10:01): Optimizing the model with GridSearchCV
Nested loops become inconvenient as the number of parameters grows. sklearn's GridSearchCV does the same search in a single call: you supply a parameter grid (C of 1, 10 and 20; kernel of rbf and linear) and it evaluates every combination with k-fold cross validation. The cv_results_ are loaded into a pandas DataFrame for a readable view, and the best score is 0.98.
04:37-06:37 Optimizing the model with GridSearchCV
06:38-10:01 GridSearchCV for model tuning
Transcript excerpts:
[04:37] This way I can find the optimal score using hyperparameter tuning. But this approach also has issues: if I have four parameters, for example, I have to run four nested loops, which means too many iterations, and it's just not convenient. Luckily, sklearn provides an API called GridSearchCV, which does exactly the same thing.
[06:09] I want the value of C to be 1, 10 and 20; these are the different values you want to try. The second parameter is kernel, and you want its value to be rbf and linear. So these are the two parameters. GridSearchCV has other parameters too, for example how many cross validations you want to run.
[07:06] Once this is done, you train the model by passing iris.data and iris.target, and then print the cross validation results. If you look at these results, you will notice the mean test score. The CV results are not easy to view, but luckily sklearn provides a way to load them into a data frame. Here I have the sklearn documentation, and it says that this can be imported into a pandas DataFrame. That's the next thing I'm going to do, and I think all of you are experts in pandas by now.
[07:55] You just create a pandas DataFrame and supply cv_results_ as the input, and when I run this, I get a nice tabular view. Here you can see the C parameter values, then the kernel values, and the scores from each individual split. We ran five-fold cross validation, which is why you get split0 to split4. Then you have the mean test score as well.
[08:56] So we already did hyperparameter tuning of these parameters, and you see how this works. You can have many, many parameters: all you have to do is supply them in the parameter grid, and GridSearchCV will try every combination of them using k-fold cross validation and show all the results in this nice pandas DataFrame. I can call dir() on my classifier to see what other properties this object has, such as best_estimator_, best_params_ and best_score_. Let me try the best score first: clf.best_score_ is 0.98.

Chapter 3 (10:01-16:28): Hyperparameter tuning and model selection
RandomizedSearchCV tries random combinations of parameter values instead of every combination, which makes it the more practical choice for large datasets and limited computation power. To choose the best model, a parameter grid is defined for each of SVM, random forest and logistic regression, and GridSearchCV is run on each; SVM gives the best score, 98%, on the iris dataset. The video ends with an exercise and stresses hands-on practice with different parameters.
10:01-13:29 Hyperparameter tuning and model selection: GridSearchCV vs RandomizedSearchCV
13:29-16:28 Hyperparameter tuning and model selection: choosing the best model with GridSearchCV
Transcript excerpts:
[10:52] To tackle this computation problem, the sklearn library comes with another class called RandomizedSearchCV. RandomizedSearchCV will not try every single combination of parameters; it tries random combinations of the parameter values, and you can choose how many iterations to run. Let me show you how that works. Here I imported the RandomizedSearchCV class from sklearn.model_selection, and the API looks much the same as GridSearchCV. I supplied my parameter grid and my cross validation value, which is again five-fold cross validation. The most interesting parameter here is n_iter: I want to try only two combinations. With GridSearchCV we tried six in total, numbered 0 to 5.
[12:27] ...then you just want to try random values of parameters and go with whatever comes out best. All right, we looked into hyperparameter tuning. Now I want to show you how to choose the best model for your given problem. For our iris dataset I'm going to try three classifiers: SVM, random forest and logistic regression, and I want to figure out which one gives me the best performance. You have to define your parameter grid, and I'm defining it as a simple JSON object, or simple Python dictionary, where I'm saying I want to try the SVM model with these parameters and random forest with these other parameters. I want the number of trees in the random forest to be 1, 5 and 10; n_estimators is an argument of the random forest classifier. Similarly, C is a parameter of the logistic regression classifier. Once I have initialized this dictionary, I can write a simple for loop. This for loop just goes through the dictionary values and runs GridSearchCV for each of them. The first argument of GridSearchCV is the classifier, which is your model, so it tries each of these classifiers one by one with the corresponding parameter grid I have specified in the dictionary. The parameter grid is the second argument, and cross validation is five.
[14:46] So my conclusion is that the best model for my iris dataset problem is SVM, which gives me a 98% score with these parameters. So not only did we do hyperparameter tuning, we also selected the best model. I have used only three models for the demonstration; you could use 100 models, for example. This is more of a trial-and-error approach, but in practice it works really well, and this is what people use to figure out the best model and the best parameters. Now comes the most interesting part of my tutorial, which is the exercise. You have to do this exercise, guys; just by watching the video you are not going to learn anything, so please move your butt.
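Before GridSearchCV, the transcript describes trying kernel and C combinations by hand with cross_val_score and then automating that with a for loop. A minimal sketch of that loop, assuming scikit-learn is installed; the kernel (rbf, linear) and C (1, 10, 20) values come from the transcript, while gamma='auto' and the `avg_scores` dictionary are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
kernels = ["rbf", "linear"]
C_values = [1, 10, 20]

# One nested loop per hyperparameter: every (kernel, C) pair gets its own
# 5-fold cross validation, and we keep the mean of the five fold scores.
avg_scores = {}
for kernel in kernels:
    for C in C_values:
        scores = cross_val_score(
            SVC(kernel=kernel, C=C, gamma="auto"), iris.data, iris.target, cv=5
        )
        avg_scores[f"{kernel}_{C}"] = scores.mean()

# Pick the key with the highest average score.
best = max(avg_scores, key=avg_scores.get)
print(avg_scores)
print("best combination:", best)
```

Each additional hyperparameter adds another level of nesting, which is the inconvenience GridSearchCV is meant to remove.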
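The GridSearchCV step, with cv_results_ loaded into pandas, might look like the sketch below, assuming scikit-learn and pandas are installed. The grid values come from the transcript; gamma='auto' and the columns selected for printing are illustrative choices. `return_train_score=False` is already the default in current scikit-learn and is passed only to make that explicit.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()

# 3 values of C x 2 kernels = 6 combinations, each scored with 5-fold CV.
clf = GridSearchCV(
    SVC(gamma="auto"),
    {"C": [1, 10, 20], "kernel": ["rbf", "linear"]},
    cv=5,
    return_train_score=False,
)
clf.fit(iris.data, iris.target)

# cv_results_ is a dict of arrays (one row per combination, with
# split0_test_score ... split4_test_score and mean_test_score columns).
df = pd.DataFrame(clf.cv_results_)
print(df[["param_C", "param_kernel", "mean_test_score"]])

print("best params:", clf.best_params_)
print("best score:", clf.best_score_)
```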
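RandomizedSearchCV takes the same grid but evaluates only `n_iter` randomly sampled combinations. A sketch assuming scikit-learn and pandas; `n_iter=2` mirrors the transcript, while `random_state=0` is an added assumption so that the two sampled combinations are reproducible.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

iris = load_iris()

# Sample only 2 of the 6 possible (C, kernel) combinations.
rs = RandomizedSearchCV(
    SVC(gamma="auto"),
    {"C": [1, 10, 20], "kernel": ["rbf", "linear"]},
    cv=5,
    return_train_score=False,
    n_iter=2,
    random_state=0,  # assumption: fixed only to make the sample reproducible
)
rs.fit(iris.data, iris.target)

df = pd.DataFrame(rs.cv_results_)[["param_C", "param_kernel", "mean_test_score"]]
print(df)
print("best params:", rs.best_params_)
```

Without `random_state`, a rerun may sample a different pair of combinations and report a different best score; that is the trade-off for not searching exhaustively.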
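Choosing the best model repeats GridSearchCV once per candidate model. A minimal sketch assuming scikit-learn and pandas; the SVM grid and the random forest `n_estimators` values (1, 5, 10) come from the transcript, while the logistic regression C values, `max_iter=1000` and the `model_params` dictionary layout are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()

# Each entry pairs a model with the parameter grid to search for it.
model_params = {
    "svm": {
        "model": SVC(gamma="auto"),
        "params": {"C": [1, 10, 20], "kernel": ["rbf", "linear"]},
    },
    "random_forest": {
        "model": RandomForestClassifier(),
        "params": {"n_estimators": [1, 5, 10]},
    },
    "logistic_regression": {
        "model": LogisticRegression(max_iter=1000),  # assumed settings
        "params": {"C": [1, 5, 10]},  # assumed grid
    },
}

# Run a 5-fold grid search per model and record its best result.
rows = []
for name, mp in model_params.items():
    clf = GridSearchCV(mp["model"], mp["params"], cv=5, return_train_score=False)
    clf.fit(iris.data, iris.target)
    rows.append({"model": name, "best_score": clf.best_score_, "best_params": clf.best_params_})

results = pd.DataFrame(rows, columns=["model", "best_score", "best_params"])
print(results.sort_values("best_score", ascending=False))
```

Scores can vary slightly between runs because RandomForestClassifier is not seeded here; pass `random_state` to it if you need reproducible numbers.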