title
Python Machine Learning Tutorial (Data Science)
description
Python Machine Learning Tutorial - Learn how to predict the kind of music people like.
👍 Subscribe for more Python tutorials like this: https://goo.gl/6PYaGF
👉 The CSV file used in this tutorial: https://bit.ly/3muqqta
🚀 Learn Python in one hour: https://youtu.be/kqtD5dpn9C8
🚀 Python (Full Course): https://www.youtube.com/watch?v=_uQrJ0TkZlc
Want to learn more from me?
Courses: https://codewithmosh.com
Twitter: https://twitter.com/moshhamedani
Facebook: https://www.facebook.com/programmingwithmosh/
Blog: http://programmingwithmosh.com
#Python, #MachineLearning, #Jupyter
TABLE OF CONTENTS
0:00:00 Introduction
0:00:59 What is Machine Learning?
0:02:58 Machine Learning in Action
0:05:45 Libraries and Tools
0:10:40 Importing a Data Set
0:17:01 Jupyter Shortcuts
0:22:53 A Real Machine Learning Problem
0:26:09 Preparing the Data
0:29:15 Learning and Predicting
0:33:20 Calculating the Accuracy
0:39:41 Persisting Models
0:42:55 Visualizing a Decision Tree
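The Importing a Data Set chapter loads a Kaggle CSV into a pandas data frame and inspects it with `describe()` and `values`. Below is a minimal sketch of those calls, assuming pandas is installed (Anaconda ships it). So that it runs without the download, it reads a tiny CSV from a string. The column names and values are made up for illustration. With the real file, you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the downloaded CSV so the example runs
# offline; with the real file, use pd.read_csv("path/to/file.csv") instead.
csv_text = """Rank,Name,Platform,Year,Global_Sales
1,Game A,Wii,2006,10.0
2,Game B,NES,1985,8.0
3,Game C,Wii,2008,6.0
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # count, mean, std, min, quartiles and max of numeric columns
print(df.values)      # the same data as a two-dimensional NumPy array
```

`df.values[0]` is the first row (here the hypothetical `Game A`). Newer pandas code often uses `df.to_numpy()`, which the pandas documentation recommends over `.values`.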
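The Preparing the Data, Learning and Predicting, and Calculating the Accuracy chapters follow the workflow described in the transcript:

- separate the input features from the output label,
- reserve about 20% of the rows for testing,
- fit a decision tree,
- compare its predictions with the held-out labels.

The sketch below uses scikit-learn's `train_test_split`, `DecisionTreeClassifier` and `accuracy_score`. The data is a small hypothetical stand-in for the tutorial's CSV. The columns `age`, `gender` (encoded as 1/0) and `genre` are assumptions, not read from the linked file.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the tutorial's CSV (assumed columns: age,
# gender encoded as 1/0, genre). Swap in pd.read_csv(...) for real data.
music_data = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30, 31, 33, 37,
            20, 21, 25, 26, 27, 30, 31, 34, 35],
    "gender": [1] * 9 + [0] * 9,
    "genre": ["HipHop"] * 3 + ["Jazz"] * 3 + ["Classical"] * 3
             + ["Dance"] * 3 + ["Acoustic"] * 3 + ["Classical"] * 3,
})

# Preparing the data: input features (X) and the label to predict (y).
X = music_data.drop(columns=["genre"])
y = music_data["genre"]

# Reserve 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Learning: fit a decision tree on the training rows only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Predicting: ask the model about rows it has not seen during training.
predictions = model.predict(X_test)

# Accuracy: fraction of test predictions that match the true labels.
score = accuracy_score(y_test, predictions)
print(f"test accuracy: {score:.2f}")
```

With only 18 rows, the test set holds 4 samples, so the accuracy can change noticeably with a different `random_state`. A single 100% score, like the one in the summary, says little on its own. Repeating the split or using cross-validation gives a steadier estimate.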
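For the Persisting Models and Visualizing a Decision Tree chapters, a trained model can be saved with joblib, and the tree can be exported in Graphviz DOT format. This is a sketch, not necessarily the video's exact code. Some older examples import joblib through `sklearn.externals`, which was removed in scikit-learn 0.23, so the standalone `joblib` package (installed alongside scikit-learn) is used here. The file names and toy data are hypothetical.

```python
import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Hypothetical toy data: age, gender encoded as 1/0, and a genre label.
X = pd.DataFrame({
    "age": [20, 23, 26, 30, 20, 25, 27, 31],
    "gender": [1, 1, 1, 1, 0, 0, 0, 0],
})
y = ["HipHop", "HipHop", "Jazz", "Jazz", "Dance", "Dance", "Acoustic", "Acoustic"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Persisting: save the trained model to disk, then load it back
# without retraining (file name is hypothetical).
joblib.dump(model, "music-recommender.joblib")
restored = joblib.load("music-recommender.joblib")

# Visualizing: write the tree in Graphviz DOT format. class_names must
# follow the order of model.classes_, so take it from there.
export_graphviz(
    restored,
    out_file="music-recommender.dot",
    feature_names=list(X.columns),
    class_names=[str(c) for c in restored.classes_],
    label="all",
    rounded=True,
    filled=True,
)

# The restored model predicts just like the original one.
print(restored.predict(pd.DataFrame({"age": [21], "gender": [0]})))
```

The `.dot` file can be opened in a Graphviz viewer or rendered with the `dot` command-line tool, for example `dot -Tpng music-recommender.dot -o tree.png`. Passing `feature_names` and `class_names` makes each node show the rule it tests and the genre it predicts.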
detail
{'title': 'Python Machine Learning Tutorial (Data Science)', 'heatmap': [{'end': 540.532, 'start': 503.698, 'weight': 0.941}, {'end': 1762.645, 'start': 1729.146, 'weight': 0.751}, {'end': 1881.525, 'start': 1849.364, 'weight': 0.744}, {'end': 2541.025, 'start': 2471.102, 'weight': 0.985}, {'end': 2595.29, 'start': 2563.168, 'weight': 0.737}, {'end': 2774.812, 'start': 2739.245, 'weight': 1}], 'summary': 'This tutorial on Python machine learning in Jupyter Notebook covers music recommendation, machine learning concepts, an introduction to scikit-learn, pandas data frames, a pointer to a full Python course, data cleaning, and model training, and it achieves 100% accuracy in music genre prediction using a decision tree algorithm.', 'chapters': [{'end': 54.012, 'segs': [{'end': 48.511, 'src': 'embed', 'start': 19.705, 'weight': 0, 'content': [{'end': 24.087, 'text': "You'll learn how to build a model that can learn and predict the kind of music people like.", 'start': 19.705, 'duration': 4.382}, {'end': 26.787, 'text': 'So, by the end of this one-hour tutorial,', 'start': 24.767, 'duration': 2.02}, {'end': 32.747, 'text': "you will have a good understanding of machine learning basics and you'll be able to learn more intermediate to advanced level concepts.", 'start': 26.787, 'duration': 5.96}, {'end': 36.909, 'text': "You don't need any prior knowledge in machine learning, but you need to know Python fairly well.", 'start': 33.268, 'duration': 3.641}, {'end': 40.19, 'text': "If you don't, I've got a couple of tutorials for you here on my channel.", 'start': 37.309, 'duration': 2.881}, {'end': 41.59, 'text': 'The links are below this video.', 'start': 40.47, 'duration': 1.12}, {'end': 44.71, 'text': "I'm Mosh Hamedani, and I'm super excited to be your instructor.", 'start': 42.27, 'duration': 2.44}, {'end': 48.511, 'text': 'On this channel, I have tons of programming tutorials that you might find helpful.', 'start': 45.13, 'duration': 3.381}], 'summary': 'Learn to build a music preference prediction 
model in one hour that covers machine learning basics and prepares learners for intermediate to advanced concepts; no prior machine learning knowledge is required, but a good understanding of Python is.', 'duration': 28.806, 'max_score': 19.705, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA019705.jpg'}], 'start': 1.658, 'title': 'Music recommendation with Python', 'summary': 'A tutorial on machine learning using Python and Jupyter Notebook that solves a real-world problem, predicting music preferences, and provides a foundational understanding of machine learning that prepares learners for intermediate to advanced concepts. The session lasts one hour.', 'chapters': [{'end': 54.012, 'start': 1.658, 'title': 'Music recommendation with Python', 'summary': 'A tutorial on machine learning using Python and Jupyter Notebook that solves a real-world problem, predicting music preferences, and provides a foundational understanding of machine learning that prepares learners for intermediate to advanced concepts. The session lasts one hour.', 'duration': 52.354, 'highlights': ['The tutorial aims to solve a real-world problem using machine learning and Python, specifically building a model to predict the kind of music people like.', 'By the end of the one-hour tutorial, learners will have a good understanding of machine learning basics and will be able to learn more intermediate to advanced level concepts.', 'The instructor emphasizes that prior knowledge in machine learning is not necessary, but a decent understanding of Python is required.']}], 'duration': 52.354, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01658.jpg', 'highlights': ['The tutorial aims to solve a real-world problem using machine learning and Python, specifically building a model to predict the kind of music people like.', 'By the end of the one-hour tutorial, learners will have a good understanding of 
machine learning basics and will be able to learn more intermediate to advanced level concepts.', 'The instructor emphasizes that prior knowledge in machine learning is not necessary, but a decent understanding of Python is required.']}, {'end': 388.328, 'segs': [{'end': 112.015, 'src': 'embed', 'start': 81.667, 'weight': 2, 'content': [{'end': 87.613, 'text': 'If you want to build this program using traditional programming techniques, your program is going to get overly complex.', 'start': 81.667, 'duration': 5.946}, {'end': 95.221, 'text': "You will have to come up with lots of rules to look for specific curves, edges and colors in an image to tell if it's a cat or a dog.", 'start': 87.613, 'duration': 7.608}, {'end': 98.244, 'text': 'But if I give you a black and white photo, your rules may not work.', 'start': 95.221, 'duration': 3.023}, {'end': 99.084, 'text': 'They may break.', 'start': 98.244, 'duration': 0.84}, {'end': 105.351, 'text': "Then you'll have to rewrite them, or I may give you a picture of a cat or a dog from a different angle that you did not anticipate.", 'start': 99.084, 'duration': 6.267}, {'end': 112.015, 'text': 'So solving this problem using traditional programming techniques is going to get overly complex or sometimes impossible.', 'start': 105.911, 'duration': 6.104}], 'summary': 'Traditional programming for image recognition gets complex and may break with new inputs.', 'duration': 30.348, 'max_score': 81.667, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA081667.jpg'}, {'end': 203.853, 'src': 'embed', 'start': 135.272, 'weight': 0, 'content': [{'end': 139.695, 'text': 'For example, we give it thousands or tens of thousands of pictures of cats and dogs.', 'start': 135.272, 'duration': 4.423}, {'end': 144.218, 'text': 'Our model will then find and learn patterns in the input data,', 'start': 140.295, 'duration': 3.923}, {'end': 149.922, 'text': "so we can give it a new 
picture of a cat that it hasn't seen before and ask it: is it a cat, a dog or a horse?", 'start': 144.218, 'duration': 5.704}, {'end': 152.444, 'text': 'And it will tell us with a certain level of accuracy.', 'start': 150.262, 'duration': 2.182}, {'end': 156.586, 'text': 'The more input data we give it, the more accurate our model is going to be.', 'start': 152.984, 'duration': 3.602}, {'end': 158.968, 'text': 'So that was a very basic example.', 'start': 157.367, 'duration': 1.601}, {'end': 167.633, 'text': 'But machine learning has other applications in self-driving cars, robotics, language processing, vision processing, forecasting', 'start': 159.348, 'duration': 8.285}, {'end': 171.195, 'text': 'things like stock market trends and the weather, games, and so on.', 'start': 167.633, 'duration': 3.562}, {'end': 173.757, 'text': "So that's the basic idea about machine learning.", 'start': 171.735, 'duration': 2.022}, {'end': 175.958, 'text': "Next, we'll look at machine learning in action.", 'start': 174.257, 'duration': 1.701}, {'end': 184.52, 'text': 'A machine learning project involves a number of steps.', 'start': 182.318, 'duration': 2.202}, {'end': 189.443, 'text': 'The first step is to import our data, which often comes in the form of a CSV file.', 'start': 185.06, 'duration': 4.383}, {'end': 191.845, 'text': 'You might have a database with lots of data.', 'start': 189.763, 'duration': 2.082}, {'end': 197.509, 'text': 'We can simply export that data and store it in a CSV file for the purpose of our machine learning project.', 'start': 192.165, 'duration': 5.344}, {'end': 203.853, 'text': 'So we import our data. Next we need to clean it, and this involves tasks such as removing duplicated data.', 'start': 198.149, 'duration': 5.704}], 'summary': 'Machine learning uses input data to make accurate predictions in various applications, such as identifying cats and dogs with high accuracy and processing large datasets for forecasting.', 'duration': 68.581, 
'max_score': 135.272, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0135272.jpg'}, {'end': 281.708, 'src': 'embed', 'start': 239.02, 'weight': 4, 'content': [{'end': 247.771, 'text': 'we need to split it into two segments: one for training our model and the other for testing it to make sure that our model produces the right result.', 'start': 239.02, 'duration': 8.751}, {'end': 254.619, 'text': 'For example, if we have a thousand pictures of cats and dogs, we can reserve 80% for training and the other 20% for testing.', 'start': 248.392, 'duration': 6.227}, {'end': 261.723, 'text': 'The next step is to create a model, and this involves selecting an algorithm to analyze the data.', 'start': 256.141, 'duration': 5.582}, {'end': 267.584, 'text': "There are so many different machine learning algorithms out there, such as decision trees, neural networks, and so on.", 'start': 262.243, 'duration': 5.341}, {'end': 272.165, 'text': 'Each algorithm has pros and cons in terms of accuracy and performance.', 'start': 268.164, 'duration': 4.001}, {'end': 277.387, 'text': "So the algorithm you choose depends on the kind of problem you're trying to solve and your input data.", 'start': 272.566, 'duration': 4.821}, {'end': 281.708, 'text': "Now the good news is that we don't have to explicitly program an algorithm.", 'start': 278.047, 'duration': 3.661}], 'summary': 'Data is split into 80% for training and 20% for testing. Various machine learning algorithms are available for model creation.', 'duration': 42.688, 'max_score': 239.02, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0239020.jpg'}, {'end': 329.053, 'src': 'embed', 'start': 302.355, 'weight': 6, 'content': [{'end': 308.879, 'text': 'Back to our example of cats and dogs, we can ask our model: is this a cat or a dog? 
And our model will make a prediction.', 'start': 302.355, 'duration': 6.524}, {'end': 311.64, 'text': 'Now the prediction is not always accurate.', 'start': 309.599, 'duration': 2.041}, {'end': 315.743, 'text': "In fact, when you start out, it's very likely that your predictions are inaccurate.", 'start': 311.84, 'duration': 3.903}, {'end': 319.565, 'text': 'So we need to evaluate the predictions and measure their accuracy.', 'start': 316.123, 'duration': 3.442}, {'end': 329.053, 'text': "Then we need to get back to our model and either select a different algorithm that is going to produce a more accurate result for the kind of problem we're trying to solve,", 'start': 320.245, 'duration': 8.808}], 'summary': 'Model predictions may be inaccurate initially, requiring evaluation and potentially selecting a different algorithm for increased accuracy.', 'duration': 26.698, 'max_score': 302.355, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0302355.jpg'}, {'end': 371.757, 'src': 'embed', 'start': 341.204, 'weight': 7, 'content': [{'end': 343.666, 'text': "Next we'll look at the libraries and tools for machine learning.", 'start': 341.204, 'duration': 2.462}, {'end': 355.325, 'text': "In this lecture we're going to look at the popular Python libraries that we use in machine learning projects.", 'start': 350.342, 'duration': 4.983}, {'end': 361.009, 'text': 'The first one is NumPy, which provides a multi-dimensional array. It is a very, very popular library.', 'start': 355.325, 'duration': 5.684}, {'end': 367.034, 'text': 'The second one is pandas, which is a data analysis library that provides a concept called a data frame.', 'start': 361.009, 'duration': 6.025}, {'end': 371.757, 'text': 'A data frame is a two-dimensional data structure similar to an Excel spreadsheet.', 'start': 367.034, 'duration': 4.723}], 'summary': 'Overview of popular Python libraries for machine learning: NumPy and pandas.', 'duration': 30.553, 
'max_score': 341.204, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0341204.jpg'}], 'start': 64.062, 'title': 'Machine learning', 'summary': 'Introduces machine learning for solving complex problems with applications in image recognition, self-driving cars, robotics, language processing, and forecasting. it also outlines the steps of a machine learning project, including importing and cleaning data, model selection and training, and popular python libraries for machine learning.', 'chapters': [{'end': 184.52, 'start': 64.062, 'title': 'Introduction to machine learning', 'summary': 'Introduces machine learning as a technique to solve complex problems by building models that learn patterns from large datasets, with applications in image recognition, self-driving cars, robotics, language processing, and forecasting.', 'duration': 120.458, 'highlights': ['Machine learning involves building a model with large datasets to learn patterns, enabling it to accurately classify new data.', 'Traditional programming techniques for image recognition can become overly complex or impossible when dealing with complex tasks or expanding the scope to new categories.', 'Machine learning has diverse applications, including self-driving cars, robotics, language processing, vision processing, and forecasting.']}, {'end': 388.328, 'start': 185.06, 'title': 'Machine learning project steps', 'summary': 'Outlines the steps of a machine learning project, including importing and cleaning the data, splitting it for training and testing, selecting and training a model, evaluating predictions, and exploring popular python libraries for machine learning.', 'duration': 203.268, 'highlights': ['Importing and cleaning data involves tasks such as removing duplicated and irrelevant data to ensure the input data is in good and clean shape.', 'Splitting the data into training and testing segments, for example, reserving 80% for training and 20% for 
testing, is crucial to ensure the model produces accurate results.', 'Selecting an algorithm, such as decision trees or neural networks, depends on the problem being solved and the input data, with libraries like scikit-learn providing these algorithms.', 'Evaluating predictions and measuring their accuracy is essential, as it helps in determining whether to select a different algorithm or fine-tune the model parameters for optimization.', 'The chapter also explores popular Python libraries for machine learning, including numpy, pandas, and matplotlib, each providing essential functionalities for data manipulation, analysis, and visualization.']}], 'duration': 324.266, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA064062.jpg', 'highlights': ['Machine learning involves building a model with large datasets to learn patterns, enabling it to accurately classify new data.', 'Machine learning has diverse applications, including self-driving cars, robotics, language processing, vision processing, and forecasting.', 'Traditional programming techniques for image recognition can become overly complex or impossible when dealing with complex tasks or expanding the scope to new categories.', 'Importing and cleaning data involves tasks such as removing duplicated and irrelevant data to ensure the input data is in good and clean shape.', 'Splitting the data into training and testing segments, for example, reserving 80% for training and 20% for testing, is crucial to ensure the model produces accurate results.', 'Selecting an algorithm, such as decision trees or neural networks, depends on the problem being solved and the input data, with libraries like scikit-learn providing these algorithms.', 'Evaluating predictions and measuring their accuracy is essential, as it helps in determining whether to select a different algorithm or fine-tune the model parameters for optimization.', 'The chapter also explores popular Python 
libraries for machine learning, including NumPy, pandas, and Matplotlib, each providing essential functionalities for data manipulation, analysis, and visualization.']}, {'end': 972.22, 'segs': [{'end': 428.132, 'src': 'embed', 'start': 389.109, 'weight': 0, 'content': [{'end': 396.013, 'text': 'The next library is scikit-learn, which is one of the most popular machine learning libraries that provides all these common algorithms,', 'start': 389.109, 'duration': 6.904}, {'end': 398.375, 'text': 'like decision trees, neural networks and so on.', 'start': 396.013, 'duration': 2.362}, {'end': 404.695, 'text': 'Now, when working with machine learning projects, we use an environment called Jupyter for writing our code.', 'start': 400.051, 'duration': 4.644}, {'end': 408.337, 'text': 'Technically, we can still use VS Code or any other code editor,', 'start': 405.255, 'duration': 3.082}, {'end': 413.701, 'text': 'but these editors are not ideal for machine learning projects because we frequently need to inspect the data,', 'start': 408.337, 'duration': 5.364}, {'end': 416.944, 'text': 'and that is really hard in environments like VS Code and the terminal.', 'start': 413.701, 'duration': 3.243}, {'end': 423.869, 'text': "If you're working with a table of 10 or 20 columns, visualizing this data in a terminal window is really, really difficult and messy.", 'start': 417.544, 'duration': 6.325}, {'end': 425.63, 'text': "So that's why we use Jupyter.", 'start': 424.39, 'duration': 1.24}, {'end': 428.132, 'text': 'It makes it really easy to inspect our data.', 'start': 425.971, 'duration': 2.161}], 'summary': 'Scikit-learn is a popular ML library, used with Jupyter for easy data inspection.', 'duration': 39.023, 'max_score': 389.109, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0389109.jpg'}, {'end': 511.686, 'src': 'embed', 'start': 482.566, 'weight': 1, 'content': [{'end': 490.214, 'text': 'Now, the beautiful thing about 
Anaconda is that it will install Jupyter, as well as all those popular data science libraries like NumPy,', 'start': 482.566, 'duration': 7.648}, {'end': 491.094, 'text': 'Pandas and so on.', 'start': 490.214, 'duration': 0.88}, {'end': 493.497, 'text': "So we don't have to manually install them using pip.", 'start': 491.395, 'duration': 2.102}, {'end': 500.856, 'text': 'Now, as part of the next step, Anaconda is suggesting that we install Microsoft VS Code.', 'start': 496.531, 'duration': 4.325}, {'end': 503.698, 'text': "We already have this on our machine, so we don't have to install it.", 'start': 500.856, 'duration': 2.842}, {'end': 511.686, 'text': "We can click Continue and close the installation, and finally we can move the installer to the trash because we don't need it in the future.", 'start': 503.698, 'duration': 7.988}], 'summary': 'Anaconda installs Jupyter and popular data science libraries like NumPy and pandas, eliminating the need for manual installation using pip.', 'duration': 29.12, 'max_score': 482.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0482566.jpg'}, {'end': 540.532, 'src': 'heatmap', 'start': 503.698, 'weight': 0.941, 'content': [{'end': 511.686, 'text': "We can click Continue and close the installation, and finally we can move the installer to the trash because we don't need it in the future.", 'start': 503.698, 'duration': 7.988}, {'end': 521.284, 'text': 'All right, now open up a terminal window and type jupyter (spelled with a y), a space, and notebook.', 'start': 511.686, 'duration': 9.598}, {'end': 524.445, 'text': 'This will start the notebook server on your machine.', 'start': 522.224, 'duration': 2.221}, {'end': 526.946, 'text': 'So press Enter, and there you go.', 'start': 524.765, 'duration': 2.181}, {'end': 532.289, 'text': "This will start the notebook server on your machine. You can see these default messages here; don't worry about them.", 'start': 527.607, 'duration': 4.682}, {'end': 
540.532, 'text': 'Now it automatically opens a browser window pointing to localhost port 8888.', 'start': 532.809, 'duration': 7.723}], 'summary': 'Installation completed, start notebook server at localhost:8888.', 'duration': 36.834, 'max_score': 503.698, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0503698.jpg'}, {'end': 589.855, 'src': 'embed', 'start': 563.956, 'weight': 2, 'content': [{'end': 573.946, 'text': "I'm going to go to desktop, here's my desktop, I don't have anything here, and then click new, I want to create a notebook for Python 3.", 'start': 563.956, 'duration': 9.99}, {'end': 577.43, 'text': 'In this notebook we can write Python code and execute it line by line.', 'start': 573.946, 'duration': 3.484}, {'end': 581.094, 'text': 'We can easily visualize our data as you will see over the next few videos.', 'start': 577.47, 'duration': 3.624}, {'end': 582.916, 'text': "So let's go ahead with this.", 'start': 581.614, 'duration': 1.302}, {'end': 587.614, 'text': "Alright, here's our first notebook you can see.", 'start': 585.333, 'duration': 2.281}, {'end': 589.855, 'text': "by default it's called untitled.", 'start': 587.614, 'duration': 2.241}], 'summary': 'Creating a python 3 notebook to write and execute code, visualize data.', 'duration': 25.899, 'max_score': 563.956, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0563956.jpg'}, {'end': 673.168, 'src': 'embed', 'start': 644.96, 'weight': 4, 'content': [{'end': 650.488, 'text': "Alright, in this lecture we're going to download a data set from a very popular website called Kaggle.com.", 'start': 644.96, 'duration': 5.528}, {'end': 653.832, 'text': 'Kaggle is basically a place to do data science projects.', 'start': 650.988, 'duration': 2.844}, {'end': 660.341, 'text': 'So the first thing you need to do is to create an account, you can sign up with Facebook, Google, or using a custom 
email and password.', 'start': 654.193, 'duration': 6.148}, {'end': 669.146, 'text': 'Once you sign up, then come back here on Kaggle.com, here in the search bar, search for video game sales.', 'start': 661.002, 'duration': 8.144}, {'end': 673.168, 'text': "This is the name of a very popular data set that we're going to use in this lecture.", 'start': 669.806, 'duration': 3.362}], 'summary': 'Lecture focuses on downloading popular video game sales dataset from kaggle.com.', 'duration': 28.208, 'max_score': 644.96, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0644960.jpg'}, {'end': 882.205, 'src': 'embed', 'start': 854.259, 'weight': 3, 'content': [{'end': 860.362, 'text': 'so here in the second segment we can call one of the methods of the data frame that is df.describe.', 'start': 854.259, 'duration': 6.103}, {'end': 868.258, 'text': 'Now when we run this program, you can see the output for each segment right next to it.', 'start': 862.242, 'duration': 6.016}, {'end': 874.513, 'text': "So here's our first segment, here we have these three lines, and this is the output of the last line.", 'start': 868.719, 'duration': 5.794}, {'end': 882.205, 'text': "Below that we have our second segment, here we're calling the describe method, and right below that we have the output of this segment.", 'start': 875.199, 'duration': 7.006}], 'summary': 'Demonstrating the df.describe method with program output.', 'duration': 27.946, 'max_score': 854.259, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0854259.jpg'}], 'start': 389.109, 'title': 'Introduction to scikit-learn and jupyter with anaconda', 'summary': 'Covers the introduction of scikit-learn, emphasizing the importance of using jupyter for inspecting and visualizing data in machine learning projects. 
it also explains the installation of jupyter using anaconda, including installing anaconda for python 3.7, starting the notebook server, creating a jupyter notebook, and loading a dataset from a csv file. additionally, it demonstrates the process of downloading and manipulating a dataset from kaggle using pandas in jupyter, focusing on importing, inspecting, and visualizing the data.', 'chapters': [{'end': 428.132, 'start': 389.109, 'title': 'Machine learning with scikit-learn', 'summary': 'Introduces scikit-learn, a popular machine learning library, and emphasizes the importance of using jupyter for inspecting and visualizing data in machine learning projects.', 'duration': 39.023, 'highlights': ['Jupyter is preferred over VS code or other editors for machine learning projects as it makes it easy to inspect and visualize data.', 'Scikit-learn is highlighted as one of the most popular machine learning libraries that provides common algorithms like decision trees and neural networks.']}, {'end': 972.22, 'start': 429.073, 'title': 'Installing jupyter with anaconda', 'summary': 'Explains how to install jupyter using anaconda, including installing anaconda for python 3.7, starting the notebook server, creating a jupyter notebook, and loading a dataset from a csv file. 
it also demonstrates how to download a dataset from kaggle and manipulate it using pandas in jupyter, highlighting the process of importing, inspecting, and visualizing the data.', 'duration': 543.147, 'highlights': ['Anaconda automatically installs Jupyter and popular data science libraries like NumPy, Pandas, eliminating the need for manual installation using pip.', 'Demonstrates creating a Jupyter notebook for Python 3, writing and executing Python code, and visualizing data, showcasing the practical application of Jupyter for data analysis and visualization.', 'Shows the process of downloading a dataset from Kaggle, including creating an account, searching for a dataset, and downloading a dataset with over 16,000 rows and 11 columns.', 'Illustrates importing the dataset into Jupyter using Pandas, inspecting the data using df.describe() and visualizing basic information about each column, such as count, mean, standard deviation, and minimum value.']}], 'duration': 583.111, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0389109.jpg', 'highlights': ['Jupyter is preferred over VS code or other editors for machine learning projects as it makes it easy to inspect and visualize data.', 'Anaconda automatically installs Jupyter and popular data science libraries like NumPy, Pandas, eliminating the need for manual installation using pip.', 'Demonstrates creating a Jupyter notebook for Python 3, writing and executing Python code, and visualizing data, showcasing the practical application of Jupyter for data analysis and visualization.', 'Illustrates importing the dataset into Jupyter using Pandas, inspecting the data using df.describe() and visualizing basic information about each column, such as count, mean, standard deviation, and minimum value.', 'Shows the process of downloading a dataset from Kaggle, including creating an account, searching for a dataset, and downloading a dataset with over 16,000 rows and 11 
columns.', 'Scikit-learn is highlighted as one of the most popular machine learning libraries that provides common algorithms like decision trees and neural networks.']}, {'end': 1355.102, 'segs': [{'end': 1047.784, 'src': 'embed', 'start': 972.22, 'weight': 0, 'content': [{'end': 978.385, 'text': 'So quite often when we work with a new data set, we call the describe method to get some basic statistics about our data.', 'start': 972.22, 'duration': 6.165}, {'end': 980.885, 'text': 'Let me show you another useful attribute.', 'start': 979.125, 'duration': 1.76}, {'end': 986.926, 'text': "So, in the next segment, let's type df.values.", 'start': 981.405, 'duration': 5.521}, {'end': 988.627, 'text': "Let's run this.", 'start': 986.926, 'duration': 1.701}, {'end': 991.547, 'text': 'As you can see, this returns a two-dimensional array.', 'start': 988.627, 'duration': 2.92}, {'end': 997.748, 'text': 'The first square bracket indicates the outer array and the second one represents the inner array.', 'start': 991.547, 'duration': 6.201}, {'end': 1002.929, 'text': 'So the first element in our outer array is an array itself.', 'start': 998.008, 'duration': 4.921}, {'end': 1007.93, 'text': 'These are the values in this array, which represent the first row in our data set.', 'start': 1002.929, 'duration': 5.001}, {'end': 1011.772, 'text': 'That is the video game ranked 1, which is called Wii Sports.', 'start': 1008.61, 'duration': 3.162}, {'end': 1015.615, 'text': 'So this was a basic overview of Pandas data frames.', 'start': 1012.833, 'duration': 2.782}, {'end': 1019.518, 'text': "In the next section I'm going to show you some of the useful shortcuts in Jupyter.", 'start': 1016.155, 'duration': 3.363}, {'end': 1029.724, 'text': "I'm going to show you some of the most useful shortcuts in Jupyter.", 'start': 1026.642, 'duration': 3.082}, {'end': 1034.31, 'text': 'Now the first thing I want you to pay attention to is this green bar on the left.', 'start': 1030.465, 
'duration': 3.845}, {'end': 1039.375, 'text': 'This indicates that this cell is currently in edit mode, so we can write code here.', 'start': 1034.75, 'duration': 4.625}, {'end': 1047.784, 'text': 'Now, if we press the Escape key, green turns to blue, and that means this cell is currently in command mode.', 'start': 1040.215, 'duration': 7.569}], 'summary': 'Using pandas for basic data statistics and exploring Jupyter shortcuts.', 'duration': 75.564, 'max_score': 972.22, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0972220.jpg'}, {'end': 1260.922, 'src': 'embed', 'start': 1233.627, 'weight': 5, 'content': [{'end': 1240.651, 'text': 'So this notebook file that we have here includes our source code organized in cells, as well as the output for each cell.', 'start': 1233.627, 'duration': 7.024}, {'end': 1245.294, 'text': "That is why it's different from a regular .py file, where we only have the source code.", 'start': 1240.951, 'duration': 4.343}, {'end': 1250.091, 'text': 'Here we also have auto-completion and IntelliSense.', 'start': 1246.448, 'duration': 3.643}, {'end': 1256.157, 'text': "So in this cell, let's type df, our data frame, followed by a dot.", 'start': 1250.091, 'duration': 6.066}, {'end': 1260.922, 'text': 'Now, if you press Tab, you can see all the attributes and methods of this object.', 'start': 1256.157, 'duration': 4.765}], 'summary': 'Notebook file contains source code in cells and provides auto-completion and IntelliSense.', 'duration': 27.295, 'max_score': 1233.627, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01233627.jpg'}, {'end': 1314.218, 'src': 'embed', 'start': 1285.282, 'weight': 4, 'content': [{'end': 1290.326, 'text': 'In this case it generates descriptive statistics that summarize the central tendency and so on.', 'start': 1285.282, 'duration': 5.044}, {'end': 1300.46, 'text': 'Similar to VS Code, we can also convert a line to 
comment by pressing command and slash on Mac or control slash on Windows, like this.', 'start': 1291.713, 'duration': 8.747}, {'end': 1306.224, 'text': 'Now this line is a comment, we can press the same shortcut one more time to remove the comment.', 'start': 1301.12, 'duration': 5.104}, {'end': 1309.914, 'text': 'So these were some of the most useful shortcuts in Jupyter.', 'start': 1307.352, 'duration': 2.562}, {'end': 1314.218, 'text': "Now, over the next few lectures, we're going to work on a real machine learning project.", 'start': 1310.455, 'duration': 3.763}], 'summary': 'Jupyter shortcuts enhance efficiency for data analysis, with a focus on machine learning.', 'duration': 28.936, 'max_score': 1285.282, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01285282.jpg'}], 'start': 972.22, 'title': 'Pandas data frames overview and jupyter shortcuts', 'summary': 'Provides an overview of pandas data frames, including basic statistics and two-dimensional arrays, and introduces useful jupyter shortcuts and tips such as switching modes, keyboard shortcuts, running code, managing cells, and file differences.', 'chapters': [{'end': 1019.518, 'start': 972.22, 'title': 'Pandas data frames overview', 'summary': 'Provides a basic overview of pandas data frames, showcasing the describe method for basic statistics and the df.values attribute for a two dimensional array, and hints at the next section on useful shortcuts of jupyter.', 'duration': 47.298, 'highlights': ['The df.values attribute returns a two dimensional array, with the outer and inner arrays indicated by square brackets.', 'The first element in the outer array represents the first row in the data set, showcasing the values for the video game with ranking 1, named Wii Sports.', 'The describe method is used to obtain basic statistics about the data set, providing essential insights during data analysis.']}, {'end': 1355.102, 'start': 1026.642, 'title': 
'Jupyter shortcuts and tips', 'summary': 'Introduces useful jupyter shortcuts and tips, including switching between edit and command modes, keyboard shortcuts, running code, managing cells, using tooltips and comments, and the difference between jupyter notebook files and regular .py files.', 'duration': 328.46, 'highlights': ['Jupyter shortcuts for managing cells and running code', 'Using tooltips and auto-completion in Jupyter', 'Introduction to Jupyter notebook files']}], 'duration': 382.882, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA0972220.jpg', 'highlights': ['The describe method is used to obtain basic statistics about the data set, providing essential insights during data analysis.', 'The df.values attribute returns a two dimensional array, with the outer and inner arrays indicated by square brackets.', 'The first element in the outer array represents the first row in the data set, showcasing the values for the video game with ranking 1, named Wii Sports.', 'Jupyter shortcuts for managing cells and running code', 'Using tooltips and auto-completion in Jupyter', 'Introduction to Jupyter notebook files']}, {'end': 2252.218, 'segs': [{'end': 1462.759, 'src': 'embed', 'start': 1355.442, 'weight': 0, 'content': [{'end': 1361.567, 'text': 'In fact, I have a comprehensive Python course that teaches you everything about Python from the basics to more advanced concepts.', 'start': 1355.442, 'duration': 6.125}, {'end': 1366.05, 'text': 'So after you watch this tutorial, if you want to learn more, you may want to look at my Python course.', 'start': 1361.847, 'duration': 4.203}, {'end': 1370.674, 'text': 'It comes with a 30 day money back guarantee and a certificate of completion you can add to your resume.', 'start': 1366.27, 'duration': 4.404}, {'end': 1373.276, 'text': "In case you're interested, the link is below this video.", 'start': 1371.094, 'duration': 2.182}, {'end': 1381.493, 'text': "Over the 
next few lectures we're going to work on a real machine learning project.", 'start': 1378.191, 'duration': 3.302}, {'end': 1384.014, 'text': 'Imagine we have an online music store.', 'start': 1382.133, 'duration': 1.881}, {'end': 1392.579, 'text': "When our users sign up, we ask their age and gender, and based on their profile, we recommend various music albums they're likely to buy.", 'start': 1384.735, 'duration': 7.844}, {'end': 1396.161, 'text': 'So in this project we want to use machine learning to increase sales.', 'start': 1393.079, 'duration': 3.082}, {'end': 1402.705, 'text': 'So we want to build a model and feed it some sample data based on our existing users.', 'start': 1396.901, 'duration': 5.804}, {'end': 1407.548, 'text': 'Our model will learn the patterns in our data, so we can ask it to make predictions.', 'start': 1403.405, 'duration': 4.143}, {'end': 1412.612, 'text': 'When a new user signs up, we tell our model: hey, we have a new user with this profile.', 'start': 1408.149, 'duration': 4.463}, {'end': 1415.254, 'text': 'What kind of music is this user interested in?', 'start': 1412.612, 'duration': 2.642}, {'end': 1418.437, 'text': 'Our model will say jazz or hip hop or whatever.', 'start': 1415.835, 'duration': 2.602}, {'end': 1421.099, 'text': 'And based on that, we can make suggestions to the user.', 'start': 1418.857, 'duration': 2.242}, {'end': 1423.46, 'text': "So this is the problem we're going to solve.", 'start': 1421.799, 'duration': 1.661}, {'end': 1426.622, 'text': 'Now back to the list of steps in a machine learning project.', 'start': 1424.221, 'duration': 2.401}, {'end': 1428.503, 'text': 'First we need to import our data.', 'start': 1427.282, 'duration': 1.221}, {'end': 1430.744, 'text': 'Then we should prepare or clean it.', 'start': 1428.843, 'duration': 1.901}, {'end': 1434.285, 'text': 'Next we select a machine learning algorithm to build a model.', 'start': 1431.264, 'duration': 3.021}, {'end':
1437.346, 'text': 'We train our model and ask it to make predictions.', 'start': 1434.725, 'duration': 2.621}, {'end': 1441.308, 'text': 'And finally we evaluate our algorithm to see its accuracy.', 'start': 1437.886, 'duration': 3.422}, {'end': 1446.49, 'text': "If it's not accurate, we either fine-tune our model or select a different algorithm.", 'start': 1441.508, 'duration': 4.982}, {'end': 1448.453, 'text': "So let's focus on the first step.", 'start': 1447.17, 'duration': 1.283}, {'end': 1451.059, 'text': 'Download the CSV file below this video.', 'start': 1449.255, 'duration': 1.804}, {'end': 1454.326, 'text': "This is a very basic CSV that I've created for this project.", 'start': 1451.46, 'duration': 2.866}, {'end': 1457.053, 'text': "It's just some random made-up data; it's not real.", 'start': 1454.547, 'duration': 2.506}, {'end': 1462.759, 'text': 'So we have a table with three columns: age, gender, and genre.', 'start': 1458.598, 'duration': 4.161}], 'summary': 'Comprehensive Python course available with 30-day money-back guarantee and certificate. Machine learning project involves recommending music based on user profile.
The project includes steps like importing and cleaning data, selecting an algorithm, training, predicting, and evaluating model accuracy.', 'duration': 107.317, 'max_score': 1355.442, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01355442.jpg'}, {'end': 1592.48, 'src': 'embed', 'start': 1563.412, 'weight': 7, 'content': [{'end': 1565.395, 'text': 'Next we need to prepare or clean the data.', 'start': 1563.412, 'duration': 1.983}, {'end': 1567.218, 'text': "And that's the topic for the next lecture.", 'start': 1565.675, 'duration': 1.543}, {'end': 1577.356, 'text': 'The second step in a machine learning project is cleaning or preparing the data.', 'start': 1573.715, 'duration': 3.641}, {'end': 1582.077, 'text': 'And that involves tasks such as removing duplicates, null values, and so on.', 'start': 1577.796, 'duration': 4.281}, {'end': 1588.339, 'text': "Now, in this particular data set, we don't have to do any kind of cleaning because we don't have any duplicates and, as you can see,", 'start': 1582.557, 'duration': 5.782}, {'end': 1590.899, 'text': 'all rows have values for all columns.', 'start': 1588.339, 'duration': 2.56}, {'end': 1592.48, 'text': "So we don't have null values.", 'start': 1591.259, 'duration': 1.221}], 'summary': 'Preparing data in machine learning; this dataset has no duplicates or null values.', 'duration': 29.068, 'max_score': 1563.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01563412.jpg'}, {'end': 1642.482, 'src': 'embed', 'start': 1611.28, 'weight': 6, 'content': [{'end': 1613.141, 'text': 'The input set and the output set.', 'start': 1611.28, 'duration': 1.861}, {'end': 1618.886, 'text': 'The output set, which in this case is the genre column, contains the predictions.', 'start': 1613.662, 'duration': 5.224}, {'end': 1625.231, 'text': "So we're telling our model that if we have a user who is 20 years old and male, they like
hip hop.", 'start': 1619.386, 'duration': 5.845}, {'end': 1628.813, 'text': 'Once we train our model, then we give it a new input set.', 'start': 1625.951, 'duration': 2.862}, {'end': 1634.057, 'text': 'For example, we say, hey, we have a new user who is 21 years old, and is a male.', 'start': 1629.353, 'duration': 4.704}, {'end': 1642.482, 'text': "What is the genre of the music that this user probably likes? As you can see, in our input set, we don't have a sample for a 21 year old male.", 'start': 1634.597, 'duration': 7.885}], 'summary': 'Using input-output sets to predict music genre preferences.', 'duration': 31.202, 'max_score': 1611.28, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01611280.jpg'}, {'end': 1762.645, 'src': 'heatmap', 'start': 1729.146, 'weight': 0.751, 'content': [{'end': 1730.707, 'text': 'In this case, genre.', 'start': 1729.146, 'duration': 1.561}, {'end': 1737.933, 'text': 'once again this returns a new dataset, by convention we use a lowercase y to represent that.', 'start': 1731.807, 'duration': 6.126}, {'end': 1739.894, 'text': 'So that is our output dataset.', 'start': 1738.193, 'duration': 1.701}, {'end': 1748.402, 'text': "Let's inspect that as well, so in this dataset we only have the predictions or the answers.", 'start': 1740.555, 'duration': 7.847}, {'end': 1753.066, 'text': 'So we have prepared our data, next we need to create a model using an algorithm.', 'start': 1749.303, 'duration': 3.763}, {'end': 1762.645, 'text': 'The next step is to build a model using a machine learning algorithm.', 'start': 1759.602, 'duration': 3.043}], 'summary': 'Creating a new dataset with predictions and building a model using a machine learning algorithm.', 'duration': 33.499, 'max_score': 1729.146, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01729146.jpg'}, {'end': 1881.525, 'src': 'heatmap', 'start': 1849.364, 'weight': 0.744, 
'content': [{'end': 1851.565, 'text': 'finally we need to ask our model to make a prediction.', 'start': 1849.364, 'duration': 2.201}, {'end': 1855.987, 'text': 'So we can ask it what is the kind of music that a 21 year old male likes?', 'start': 1852.125, 'duration': 3.862}, {'end': 1862.971, 'text': "Now, before we do that, let's temporarily inspect our initial data set, that is, music data.", 'start': 1856.568, 'duration': 6.403}, {'end': 1865.992, 'text': 'So, look what we got here.', 'start': 1864.331, 'duration': 1.661}, {'end': 1872.939, 'text': "As I told you earlier, I've assumed that men between 20 and 25 like hip hop music.", 'start': 1867.435, 'duration': 5.504}, {'end': 1878.543, 'text': 'But here we only have three samples for men aged 20, 23, and 25.', 'start': 1873.359, 'duration': 5.184}, {'end': 1881.525, 'text': "We don't have a sample for a 21 year old male.", 'start': 1878.543, 'duration': 2.982}], 'summary': 'Model predicts music preference for 21-year-old male using limited data.', 'duration': 32.161, 'max_score': 1849.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01849364.jpg'}, {'end': 2235.953, 'src': 'embed', 'start': 2208.376, 'weight': 4, 'content': [{'end': 2215.141, 'text': "So the accuracy score is 1 or 100%, but if you run this one more time, you're going to see a different result,", 'start': 2208.376, 'duration': 6.765}, {'end': 2220.646, 'text': "because every time we split our data set into training and test sets, we'll have different data sets,", 'start': 2215.141, 'duration': 5.505}, {'end': 2224.269, 'text': 'because this function randomly picks data for training and testing.', 'start': 2220.646, 'duration': 3.623}, {'end': 2225.449, 'text': 'Let me show you.', 'start': 2224.989, 'duration': 0.46}, {'end': 2229.19, 'text': 'So put the cursor in the cell, now you can see this cell is activated.', 'start': 2225.809, 'duration': 3.381}, {'end': 2235.953, 'text': 'Note 
that if you click this button here, it will run this cell and also insert a new cell below this cell.', 'start': 2229.751, 'duration': 6.202}], 'summary': 'Accuracy score is 100%, but varies on re-runs due to random data splitting.', 'duration': 27.577, 'max_score': 2208.376, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02208376.jpg'}], 'start': 1355.442, 'title': 'Python course and data cleaning for music genre prediction', 'summary': 'Covers a Python course with a 30-day money-back guarantee and a certificate of completion, as well as a machine learning project aimed at increasing sales by making music album recommendations based on user profiles. Additionally, it includes the process of data cleaning and model training for music genre prediction using a decision tree algorithm, which reaches an accuracy score of 100% on one run, though the score varies between runs because the data is split randomly.', 'chapters': [{'end': 1511.758, 'start': 1355.442, 'title': 'Python course and machine learning project', 'summary': 'Covers a comprehensive Python course with a 30-day money-back guarantee and a certificate of completion, followed by a machine learning project aimed at increasing sales by using a model to make music album recommendations based on user profiles.', 'duration': 156.316, 'highlights': ['A comprehensive Python course is available with a 30-day money-back guarantee and a certificate of completion.', 'The machine learning project aims to increase sales by recommending music albums based on user profiles.', 'The steps in the machine learning project include importing data, preparing and cleaning it, selecting a machine learning algorithm, training the model, and evaluating its accuracy.', 'The CSV file for the project contains data on age, gender, and genre, with specific assumptions about music preferences based on age and gender.']}, {'end': 2252.218, 'start': 1511.758, 'title': 'Data cleaning and model training for music genre prediction', 'summary': 'Covers the process of reading a 
CSV file into a data frame using pandas, preparing the data by splitting it into input and output sets, creating a model using a decision tree algorithm, training the model to make predictions, and measuring the accuracy of the model on a held-out test set, which reaches 100% on one run, though the score varies between runs because the split is random.', 'duration': 740.46, 'highlights': ["The data set doesn't require cleaning as it lacks duplicates and null values.", 'The data set is split into input and output sets to train the model, which aids in predicting music genres based on user attributes.', 'The decision tree algorithm is utilized to create a model for predicting music genres based on user attributes.', 'The model is trained on a training set and its accuracy is measured on a separate test set; it reaches 100% on one run, but the score varies between runs because the split is random.']}], 'duration': 896.776, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA01355442.jpg', 'highlights': ['The machine learning project aims to increase sales by recommending music albums based on user profiles.', 'A comprehensive Python course is available with a 30-day money-back guarantee and a certificate of completion.', 'The steps in the machine learning project include importing data, preparing and cleaning it, selecting a machine learning algorithm, training the model, and evaluating its accuracy.', 'The CSV file for the project contains data on age, gender, and genre, with specific assumptions about music preferences based on age and gender.', 'The model is trained on a training set and its accuracy is measured on a separate test set; it reaches 100% on one run, but the score varies between runs because the split is random.', 'The decision tree algorithm is utilized to create a model for predicting music genres based on user attributes.', 'The data set is split into input and output sets to train the model, which aids in predicting music genres based on user attributes.', "The data set doesn't require cleaning as it lacks duplicates and null values."]}, {'end': 2976.157, 'segs': [{'end': 2334.941, 'src': 'embed', 
'start': 2304.539, 'weight': 0, 'content': [{'end': 2307.021, 'text': 'Look, the accuracy immediately dropped to 0.4. One more time, now', 'start': 2304.539, 'duration': 2.482}, {'end': 2308.161, 'text': '46 percent, 40 percent, 26 percent.', 'start': 2307.021, 'duration': 1.14}, {'end': 2309.562, 'text': "It's really, really bad.", 'start': 2308.161, 'duration': 1.401}, {'end': 2320.59, 'text': 'The reason this is happening is that we are using very little data to train this model.', 'start': 2315.606, 'duration': 4.984}, {'end': 2323.392, 'text': 'This is one of the key concepts in machine learning.', 'start': 2321.09, 'duration': 2.302}, {'end': 2328.296, 'text': 'The more data we give our model and the cleaner that data is, the better the result we get.', 'start': 2323.792, 'duration': 4.504}, {'end': 2334.941, 'text': 'So if you have duplicates, irrelevant data, or incomplete values, our model will learn bad patterns in our data.', 'start': 2328.816, 'duration': 6.125}], 'summary': 'Accuracy dropped to between 26% and 46% across runs, emphasizing the importance of sufficient and clean data in machine learning.', 'duration': 30.402, 'max_score': 2304.539, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02304539.jpg'}, {'end': 2378.811, 'src': 'embed', 'start': 2354.11, 'weight': 1, 'content': [{'end': 2359.472, 'text': 'Some machine learning problems require thousands or even millions of samples to train a model.', 'start': 2354.11, 'duration': 5.362}, {'end': 2362.354, 'text': 'The more complex the problem is, the more data we need.', 'start': 2359.972, 'duration': 2.382}, {'end': 2369.942, 'text': "For example, here we're only dealing with a table of three columns, but if you want to build a model to tell if a picture is a cat or a dog,", 'start': 2362.755, 'duration': 7.187}, {'end': 2372.705, 'text': "or a horse or a lion, we'll need millions of pictures.", 'start': 2369.942, 'duration': 2.763}, {'end': 2375.668, 'text': 'The 
more animals we want to support, the more pictures we need.', 'start': 2373.085, 'duration': 2.583}, {'end': 2378.811, 'text': "In the next lecture we're going to talk about model persistence.", 'start': 2376.328, 'duration': 2.483}], 'summary': 'Complex machine learning problems can require thousands or even millions of samples for training, such as distinguishing between different animals in pictures.', 'duration': 24.701, 'max_score': 2354.11, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02354110.jpg'}, {'end': 2541.025, 'src': 'heatmap', 'start': 2457.228, 'weight': 2, 'content': [{'end': 2465.616, 'text': 'On the top, from the sklearn.externals module, we import joblib.', 'start': 2457.228, 'duration': 8.388}, {'end': 2470.361, 'text': 'This joblib object has methods for saving and loading models.', 'start': 2466.397, 'duration': 3.964}, {'end': 2477.468, 'text': 'So, after we train our model, we simply call joblib.dump', 'start': 2471.102, 'duration': 6.366}, {'end': 2484.274, 'text': 'and give it two arguments: our model and the name of the file in which we want to store this model.', 'start': 2478.869, 'duration': 5.405}, {'end': 2490.639, 'text': "Let's call that music-recommender.joblib.", 'start': 2484.954, 'duration': 5.685}, {'end': 2492.621, 'text': "That's all we have to do.", 'start': 2491.66, 'duration': 0.961}, {'end': 2499.567, 'text': "For now, I'm going to comment out this line; we don't want to make any predictions, we just want to store our trained model in a file.", 'start': 2493.301, 'duration': 6.266}, {'end': 2509.914, 'text': "So, let's run the cell with Control and Enter. Okay, look, in the output we have an array that contains the name of our model file.", 'start': 2500.167, 'duration': 9.747}, {'end': 2512.575, 'text': 'So this is the return value of the dump method.', 'start': 2510.434, 'duration': 2.141}, {'end': 2520.94, 'text': "Now back to our desktop, right next to my notebook you can see our 
joblib file. This is where our model is stored; it's simply a binary file.", 'start': 2513.776, 'duration': 7.164}, {'end': 2523.27, 'text': 'Now back to our Jupyter notebook.', 'start': 2521.889, 'duration': 1.381}, {'end': 2527.834, 'text': "As I told you before, in a real application, we don't want to train a model every time.", 'start': 2523.891, 'duration': 3.943}, {'end': 2537.642, 'text': "So, let's comment out these few lines. I've selected these few lines; on Mac we can press Command and slash, and on Windows, Control and slash.", 'start': 2528.414, 'duration': 9.228}, {'end': 2541.025, 'text': 'Okay, these lines are commented out now.', 'start': 2538.863, 'duration': 2.162}], 'summary': 'Using joblib from sklearn.externals to save and load trained models.', 'duration': 35.393, 'max_score': 2457.228, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02457228.jpg'}, {'end': 2608.238, 'src': 'heatmap', 'start': 2563.168, 'weight': 3, 'content': [{'end': 2567.115, 'text': "Let's print predictions and see if our model is behaving correctly or not.", 'start': 2563.168, 'duration': 3.947}, {'end': 2570.361, 'text': 'So, Control and Enter, and there you go.', 'start': 2567.797, 'duration': 2.564}, {'end': 2573.367, 'text': 'So this is how we persist and load models.', 'start': 2571.042, 'duration': 2.325}, {'end': 2586.626, 'text': "Earlier in this section I told you that decision trees are the easiest to understand, and that's why we started machine learning with decision trees.", 'start': 2580.123, 'duration': 6.503}, {'end': 2590.348, 'text': "In this section we're going to export our model in a visual format.", 'start': 2587.206, 'duration': 3.142}, {'end': 2593.63, 'text': 'So you will see how this model makes predictions.', 'start': 2590.688, 'duration': 2.942}, {'end': 2595.29, 'text': 'That is really, really cool.', 'start': 2594.07, 'duration': 1.22}, {'end': 2595.851, 'text': 'Let me show you.', 'start': 
2595.37, 'duration': 0.481}, {'end': 2608.238, 'text': "So, once again I've simplified this code: we simply import our data set, create input and output sets, create a model, and train it.", 'start': 2596.531, 'duration': 11.707}], 'summary': 'Demonstrating model persistence and visualization in machine learning using decision trees.', 'duration': 45.07, 'max_score': 2563.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02563168.jpg'}, {'end': 2779.434, 'src': 'heatmap', 'start': 2739.245, 'weight': 4, 'content': [{'end': 2746.89, 'text': 'So, set label to all, then rounded to True, and finally filled to True.', 'start': 2739.245, 'duration': 7.645}, {'end': 2749.551, 'text': 'So this is the end result.', 'start': 2748.37, 'duration': 1.181}, {'end': 2752.573, 'text': "Now let's run this cell using Control and Enter.", 'start': 2750.472, 'duration': 2.101}, {'end': 2760.402, 'text': "Okay, here we have a new file, music-recommender.dot (dot dot, that's a little bit funny).", 'start': 2753.954, 'duration': 6.448}, {'end': 2763.506, 'text': 'So we want to open this file with VS Code.', 'start': 2761.143, 'duration': 2.363}, {'end': 2766.51, 'text': 'So drag and drop this into a VS Code window.', 'start': 2763.987, 'duration': 2.523}, {'end': 2771.83, 'text': "Okay, here's the DOT format.", 'start': 2770.109, 'duration': 1.721}, {'end': 2774.812, 'text': "It's a textual language for describing graphs.", 'start': 2772.31, 'duration': 2.502}, {'end': 2779.434, 'text': 'Now to visualize this graph, we need to install an extension in VS Code.', 'start': 2775.532, 'duration': 3.902}], 'summary': 'Demonstrated exporting the tree with label set to all, rounded set to True, and filled set to True, followed by opening the generated .dot file in VS Code and installing a graph visualization extension.', 'duration': 25.48, 'max_score': 2739.245, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02739245.jpg'}, {'end': 2880.942, 'src': 'embed', 'start': 2852.63, 'weight': 5, 'content': [{'end': 2856.211, 'text': "So here we're classifying people based on their profile.", 'start': 2852.63, 'duration': 3.581}, {'end': 2859.032, 'text': 'That is the reason we have the word class here.', 'start': 2856.631, 'duration': 2.401}, {'end': 2863.932, 'text': 'So a user who is 30 years or older belongs to the classical class.', 'start': 2859.632, 'duration': 4.3}, {'end': 2866.453, 'text': 'That is, people who like classical music.', 'start': 2864.633, 'duration': 1.82}, {'end': 2872.477, 'text': 'Now what if this condition is true? That means the user is younger than 30.', 'start': 2867.193, 'duration': 5.284}, {'end': 2874.198, 'text': 'So, now we check the gender.', 'start': 2872.477, 'duration': 1.721}, {'end': 2880.942, 'text': "If it's less than 0.5, which basically means if it equals 0, then we're dealing with a female.", 'start': 2874.898, 'duration': 6.044}], 'summary': 'Classifying users based on age and gender for music preferences.', 'duration': 28.312, 'max_score': 2852.63, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02852630.jpg'}, {'end': 2929.453, 'src': 'embed', 'start': 2901.181, 'weight': 6, 'content': [{'end': 2904.843, 'text': 'So this is the decision tree that our model uses to make predictions.', 'start': 2901.181, 'duration': 3.662}, {'end': 2908.484, 'text': "Now, if you're wondering why we have these floating-point numbers like 25.5,", 'start': 2905.323, 'duration': 3.161}, {'end': 2915.587, 'text': 'these are basically the rules that our model generates based on the patterns that it finds in our data set.', 'start': 2908.484, 'duration': 7.103}, {'end': 2920.529, 'text': "As we give our model more data, these rules will change, so they're not always the same.", 'start': 2916.207, 'duration': 
4.322}, {'end': 2926.312, 'text': 'Also, the more columns or features we have, the more complex our decision tree is going to get.', 'start': 2921.289, 'duration': 5.023}, {'end': 2929.453, 'text': 'Currently we have only two features: age and gender.', 'start': 2926.732, 'duration': 2.721}], 'summary': 'Model generates rules based on data patterns; decision tree complexity increases with more features.', 'duration': 28.272, 'max_score': 2901.181, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02901181.jpg'}], 'start': 2253.118, 'title': 'Model training & data visualization', 'summary': "Emphasizes data cleaning's significance, data size's impact on model accuracy, the need for model persistence, and visualizing decision trees using VS Code, covering interpreting the tree and the impact of data on model rules and complexity.", 'chapters': [{'end': 2752.573, 'start': 2253.118, 'title': 'Model training & data cleaning', 'summary': 'Covers the importance of data cleaning and the impact of data size on model accuracy, along with discussing the need for model persistence to save time in training, and the process of exporting a decision tree model in a visual format.', 'duration': 499.455, 'highlights': ['The accuracy score dropped to 40%, 46%, 40%, and 26% across successive runs when the test size was changed from 0.2 to 0.8, indicating the significant impact of training-data size on model accuracy.', 'The importance of data cleaning in machine learning is emphasized, as the quality and quantity of data significantly impact model performance.', 'Model persistence avoids retraining the model every time: the trained model is saved to a file with joblib.dump and loaded again to make predictions.', 'The process of exporting a decision tree model in a visual format using tree.export_graphviz is presented, demonstrating how the model makes predictions in a graphical format.']}, {'end': 2976.157, 'start': 2753.954, 'title': 'Visualizing 
decision trees with VS Code', 'summary': 'Explains how to visualize a decision tree using the DOT format in VS Code, including installing the required extension, interpreting the decision tree, and the impact of data on model rules and complexity.', 'duration': 222.203, 'highlights': ['The DOT format is used to describe graphs and can be visualized in VS Code by installing the Graphviz (dot) language extension by Stephanvs.', 'The decision tree classifies users based on their profile, with conditions like age and gender determining the music genre they are interested in.', 'The rules for the decision tree are generated based on patterns found in the dataset, and as more data is given, these rules may change and become more complex.']}], 'duration': 723.039, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7eh4d6sabA0/pics/7eh4d6sabA02253118.jpg', 'highlights': ['The accuracy score dropped to 40%, 46%, 40%, and 26% across successive runs when the test size was changed from 0.2 to 0.8, indicating the significant impact of training-data size on model accuracy.', 'The importance of data cleaning in machine learning is emphasized, as the quality and quantity of data significantly impact model performance.', 'Model persistence avoids retraining the model every time: the trained model is saved to a file with joblib.dump and loaded again to make predictions.', 'The process of exporting a decision tree model in a visual format using tree.export_graphviz is presented, demonstrating how the model makes predictions in a graphical format.', 'The DOT format is used to describe graphs and can be visualized in VS Code by installing the Graphviz (dot) language extension by Stephanvs.', 'The decision tree classifies users based on their profile, with conditions like age and gender determining the music genre they are interested in.', 'The rules for the decision tree are generated based on patterns found in the dataset, and as more data is given, these rules may change and become more 
complex.']}], 'highlights': ['The tutorial aims to solve a real-world problem using machine learning and Python, specifically building a model to predict the kind of music people like.', 'The machine learning project aims to increase sales by recommending music albums based on user profiles.', 'The instructor emphasizes that prior knowledge in machine learning is not necessary, but a decent understanding of Python is required.', 'By the end of the one-hour tutorial, learners will have a good understanding of machine learning basics and will be able to learn more intermediate to advanced level concepts.', 'Machine learning involves building a model with large datasets to learn patterns, enabling it to classify new data.', 'The model is trained on a training set and its accuracy is measured on a separate test set; it reaches 100% on one run, but the score varies between runs because the split is random.', 'The decision tree algorithm is utilized to create a model for predicting music genres based on user attributes.', 'The importance of data cleaning in machine learning is emphasized, as the quality and quantity of data significantly impact model performance.', 'Model persistence avoids retraining the model every time: the trained model is saved to a file with joblib.dump and loaded again to make predictions.', 'The accuracy score dropped to 40%, 46%, 40%, and 26% across successive runs when the test size was changed from 0.2 to 0.8, indicating the significant impact of training-data size on model accuracy.']}
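The accuracy discussion in this transcript (100% on one run, then 40%, 46%, 40%, and 26% once the test size goes from 0.2 to 0.8) comes down to two mechanics: the train/test split is random, and a larger test fraction leaves fewer rows to learn from. The following is a minimal, stdlib-only sketch of both ideas. The helper names `split_train_test` and `accuracy` and the sample rows are hypothetical; in practice scikit-learn's `train_test_split` and `accuracy_score` do this work, and this code does not reproduce their exact behavior.

```python
import random


def split_train_test(rows, labels, test_size=0.2, seed=None):
    """Randomly hold out a fraction of the samples for testing.

    Stdlib-only illustration of the idea behind scikit-learn's
    train_test_split; it is not that function's implementation.
    The same seed always produces the same split.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if len(rows) < 2:
        raise ValueError("need at least two samples to split")
    if not 0 < test_size < 1:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    # seed=None draws a fresh random order, so unseeded runs differ.
    random.Random(seed).shuffle(indices)
    # Keep at least one sample on each side of the split.
    n_test = min(len(rows) - 1, max(1, round(len(rows) * test_size)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels (1.0 means 100%)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical (age, gender) rows and genres, not the tutorial's CSV.
    X = [[20, 1], [23, 1], [25, 1], [26, 1], [29, 1],
         [30, 1], [31, 1], [33, 1], [37, 1], [20, 0]]
    y = ["HipHop"] * 3 + ["Jazz"] * 3 + ["Classical"] * 3 + ["Dance"]
    for test_size in (0.2, 0.8):
        X_train, X_test, y_train, y_test = split_train_test(X, y, test_size)
        # 0.2 keeps 8 rows for training; 0.8 leaves only 2.
        print(test_size, len(X_train), "train /", len(X_test), "test")
    print(accuracy(["Jazz", "HipHop", "Classical", "Jazz"],
                   ["Jazz", "HipHop", "Jazz", "Jazz"]))  # 0.75
```

Passing a fixed `seed` makes runs repeatable, which is the purpose `train_test_split` serves with its `random_state` argument; leaving it unset reproduces the run-to-run variation described in the transcript.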
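The persistence step in the transcript saves the trained model with `joblib.dump(model, 'music-recommender.joblib')` and loads it back later instead of retraining. One caveat: `from sklearn.externals import joblib`, as used in the video, only works on older scikit-learn releases (it was removed in 0.23); current code imports it directly with `import joblib`, which is installed as a scikit-learn dependency. The sketch below shows the same save-then-load round trip with the standard library's `pickle`. `LookupModel`, `save_model`, and `load_model` are hypothetical names, and the model is a trivial lookup table rather than a decision tree.

```python
import pickle
import tempfile
from pathlib import Path


class LookupModel:
    """Hypothetical stand-in for a trained model (not a decision tree).

    It maps (age, gender) pairs seen during "training" to a genre, just so
    there is a fitted object to save and restore.
    """

    def __init__(self, table):
        self.table = {tuple(key): genre for key, genre in table.items()}

    def predict(self, rows):
        # Profiles that were never seen get None instead of a guess.
        return [self.table.get(tuple(row)) for row in rows]


def save_model(model, path):
    """Write the trained model to a binary file (the role joblib.dump plays)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read a model back from disk (the role joblib.load plays).

    Only load files you trust: unpickling can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical training result; the genres are illustrative only.
    model = LookupModel({(21, 1): "HipHop", (31, 0): "Jazz"})
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "music-recommender.pkl"
        save_model(model, path)        # train once, then save
        restored = load_model(path)    # later: load instead of retraining
        print(restored.predict([[21, 1], [31, 0], [40, 1]]))
        # prints ['HipHop', 'Jazz', None]
```

joblib is usually the better choice for scikit-learn estimators because it handles objects carrying large NumPy arrays more efficiently; with either format, only load files from sources you trust.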