title

Machine Learning for Everybody – Full Course

description

Learn Machine Learning in a way that is accessible to absolute beginners. You will learn the basics of Machine Learning and how to use TensorFlow to implement many different concepts.
✏️ Kylie Ying developed this course. Check out her channel: https://www.youtube.com/c/YCubed
⭐️ Code and Resources ⭐️
🔗 Supervised learning (classification/MAGIC): https://colab.research.google.com/drive/16w3TDn_tAku17mum98EWTmjaLHAJcsk0?usp=sharing
🔗 Supervised learning (regression/bikes): https://colab.research.google.com/drive/1m3oQ9b0oYOT-DXEy0JCdgWPLGllHMb4V?usp=sharing
🔗 Unsupervised learning (seeds): https://colab.research.google.com/drive/1zw_6ZnFPCCh6mWDAd_VBMZB4VkC3ys2q?usp=sharing
🔗 Datasets (note: for the bikes dataset, you may need to open the downloaded CSV file and remove special characters)
🔗 MAGIC dataset: https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope
🔗 Bikes dataset: https://archive.ics.uci.edu/ml/datasets/Seoul+Bike+Sharing+Demand
🔗 Seeds/wheat dataset: https://archive.ics.uci.edu/ml/datasets/seeds
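The special-character cleanup mentioned for the bikes CSV can also be done in code. A minimal sketch, assuming the downloaded file is latin-1 encoded (the real file's name and encoding may differ — adjust for your download):

```python
import os
import tempfile

def clean_csv(in_path, out_path, source_encoding="latin-1"):
    """Strip non-ASCII characters (e.g. the degree sign in a 'Temperature(°C)' header)."""
    with open(in_path, "r", encoding=source_encoding) as f:
        text = f.read()
    cleaned = text.encode("ascii", errors="ignore").decode("ascii")
    with open(out_path, "w", encoding="ascii") as f:
        f.write(cleaned)

# Demo on a small stand-in file; for the real dataset, pass the path of the
# downloaded CSV instead (the header below is illustrative, not the actual one).
src = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="latin-1")
src.write("Date,Temperature(\u00b0C)\n01/12/2017,-5.2\n")
src.close()
clean_path = src.name + ".clean"
clean_csv(src.name, clean_path)
with open(clean_path) as f:
    header = f.readline().strip()
print(header)  # the degree sign is gone
os.remove(src.name)
os.remove(clean_path)
```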
🏗 Google provided a grant to make this course possible.
⭐️ Contents ⭐️
⌨️ (0:00:00) Intro
⌨️ (0:00:58) Data/Colab Intro
⌨️ (0:08:45) Intro to Machine Learning
⌨️ (0:12:26) Features
⌨️ (0:17:23) Classification/Regression
⌨️ (0:19:57) Training Model
⌨️ (0:30:57) Preparing Data
⌨️ (0:44:43) K-Nearest Neighbors
⌨️ (0:52:42) KNN Implementation
⌨️ (1:08:43) Naive Bayes
⌨️ (1:17:30) Naive Bayes Implementation
⌨️ (1:19:22) Logistic Regression
⌨️ (1:27:56) Log Regression Implementation
⌨️ (1:29:13) Support Vector Machine
⌨️ (1:37:54) SVM Implementation
⌨️ (1:39:44) Neural Networks
⌨️ (1:47:57) TensorFlow
⌨️ (1:49:50) Classification NN using TensorFlow
⌨️ (2:10:12) Linear Regression
⌨️ (2:34:54) Lin Regression Implementation
⌨️ (2:57:44) Lin Regression using a Neuron
⌨️ (3:00:15) Regression NN using TensorFlow
⌨️ (3:13:13) K-Means Clustering
⌨️ (3:23:46) Principal Component Analysis
⌨️ (3:33:54) K-Means and PCA Implementations
🎉 Thanks to our Champion and Sponsor supporters:
👾 Raymond Odero
👾 Agustín Kussrow
👾 aldo ferretti
👾 Otis Morgan
👾 DeezMaster
--
Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://freecodecamp.org/news

detail

{'title': 'Machine Learning for Everybody – Full Course', 'heatmap': [{'end': 14029.679, 'start': 13892.797, 'weight': 1}], 'summary': 'The full course on machine learning covers topics such as data analysis, fundamental concepts, model implementation, probabilistic classification, svms, neural networks, regression modeling, unsupervised learning, and dimensionality reduction, with practical demonstrations and achievements including an 82% accuracy in k nearest neighbors model, 96.4% probability calculation in probabilistic classification, 87% accuracy in svms, 81% accuracy in neural network training, and an r-squared score of around 0.38 in regression modeling.', 'chapters': [{'end': 417.261, 'segs': [{'end': 28.16, 'src': 'embed', 'start': 0.069, 'weight': 0, 'content': [{'end': 5.651, 'text': 'Kylie Ying has worked at many interesting places such as MIT, CERN, and Free Code Camp.', 'start': 0.069, 'duration': 5.582}, {'end': 9.032, 'text': "She's a physicist, engineer, and basically a genius.", 'start': 6.091, 'duration': 2.941}, {'end': 14.654, 'text': "And now she's going to teach you about machine learning in a way that is accessible to absolute beginners.", 'start': 9.612, 'duration': 5.042}, {'end': 19.954, 'text': "What's up, you guys? 
So welcome to Machine Learning for Everyone.", 'start': 15.33, 'duration': 4.624}, {'end': 28.16, 'text': 'If you are someone who is interested in machine learning and you think you are considered as everyone, then this video is for you.', 'start': 20.694, 'duration': 7.466}], 'summary': 'Kylie ying, a physicist and engineer, will teach machine learning to beginners.', 'duration': 28.091, 'max_score': 0.069, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg69.jpg'}, {'end': 132.42, 'src': 'embed', 'start': 65.102, 'weight': 1, 'content': [{'end': 70.067, 'text': 'So this here is the UCI machine learning repository.', 'start': 65.102, 'duration': 4.965}, {'end': 73.409, 'text': 'And basically they just have a ton of data sets that we can access.', 'start': 70.587, 'duration': 2.822}, {'end': 77.973, 'text': 'And I found this really cool one called the Magic Gamma Telescope data set.', 'start': 73.83, 'duration': 4.143}, {'end': 83.515, 'text': "So, in this dataset, if you don't wanna read all this information, to summarize,", 'start': 78.874, 'duration': 4.641}, {'end': 90.657, 'text': "what I think is going on is there's this gamma telescope and we have all these high energy particles hitting the telescope.", 'start': 83.515, 'duration': 7.142}, {'end': 98.459, 'text': "Now there's a camera, there's a detector that actually records certain patterns of how this light hits the camera.", 'start': 91.197, 'duration': 7.262}, {'end': 106.348, 'text': 'and we can use properties of those patterns in order to predict what type of particle caused that radiation.', 'start': 99.379, 'duration': 6.969}, {'end': 110.132, 'text': 'So whether it was a gamma particle or some other like hadron.', 'start': 106.448, 'duration': 3.684}, {'end': 118.61, 'text': 'Down here, these are all of the attributes of those patterns that we collect in the camera.', 'start': 113.587, 'duration': 5.023}, {'end': 124.154, 'text': "So you can 
see that there's some length, width, size, asymmetry, etc.", 'start': 119.011, 'duration': 5.143}, {'end': 132.42, 'text': "Now we're going to use all these properties to help us discriminate the patterns and whether or not they came from a gamma particle or a hadron.", 'start': 124.755, 'duration': 7.665}], 'summary': 'Uci ml repository offers magic gamma telescope dataset for predicting particle type using camera patterns.', 'duration': 67.318, 'max_score': 65.102, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg65102.jpg'}, {'end': 338.077, 'src': 'embed', 'start': 304.54, 'weight': 4, 'content': [{'end': 307.242, 'text': 'What else do we have? fdist and class.', 'start': 304.54, 'duration': 2.702}, {'end': 312.599, 'text': 'Okay, great.', 'start': 311.598, 'duration': 1.001}, {'end': 318.284, 'text': 'Now in order to label those as these columns down here in our data frame.', 'start': 313.14, 'duration': 5.144}, {'end': 322.627, 'text': 'so basically, this command here just reads some CSV file that you pass in.', 'start': 318.284, 'duration': 4.343}, {'end': 328.972, 'text': 'CSV is comma separated values and turns that into a pandas data frame object.', 'start': 322.627, 'duration': 6.345}, {'end': 338.077, 'text': 'So now if I pass in a names here, then it basically assigns these labels to the columns of this data set.', 'start': 330.134, 'duration': 7.943}], 'summary': "Using 'fdist' and 'class' to label columns in a csv file to a pandas data frame.", 'duration': 33.537, 'max_score': 304.54, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg304540.jpg'}, {'end': 388.617, 'src': 'embed', 'start': 359.759, 'weight': 5, 'content': [{'end': 371.469, 'text': "So if I actually go down here and I do data frame class.unique, you'll see that I have either Gs or Hs, and these stand for gammas or hadrons.", 'start': 359.759, 'duration': 11.71}, {'end': 
378.446, 'text': 'And our computer is not so good at understanding letters, right? Our computer is really good at understanding numbers.', 'start': 373.081, 'duration': 5.365}, {'end': 384.152, 'text': "So what we're going to do is we're going to convert this to zero for G and one for H.", 'start': 379.007, 'duration': 5.145}, {'end': 388.617, 'text': "So here, I'm going to set this equal to this.", 'start': 384.152, 'duration': 4.465}], 'summary': 'Converting gs to 0s and hs to 1s in data frame class.unique.', 'duration': 28.858, 'max_score': 359.759, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg359759.jpg'}], 'start': 0.069, 'title': 'Machine learning and data analysis', 'summary': "Covers machine learning for absolute beginners, discussing supervised and unsupervised learning models, and demonstrates programming on google colab using the uci machine learning repository's magic gamma telescope dataset. it also covers importing a dataset, creating column labels, and converting class labels from letters to numbers using pandas and numpy libraries.", 'chapters': [{'end': 132.42, 'start': 0.069, 'title': 'Machine learning for everyone', 'summary': "Covers machine learning for absolute beginners, discussing supervised and unsupervised learning models and demonstrating how to program it on google colab, using the uci machine learning repository's magic gamma telescope dataset.", 'duration': 132.351, 'highlights': ['Kylie Ying, a physicist and engineer, is teaching machine learning for absolute beginners, introducing supervised and unsupervised learning models and demonstrating programming on Google CoLab.', "The UCI machine learning repository provides a dataset called the Magic Gamma Telescope dataset, which involves using patterns from a gamma telescope's camera to predict the type of particle causing radiation, with attributes like length, width, size, and asymmetry being used for discrimination."]}, 
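The loading and relabeling steps described in this chapter — `read_csv` with a `names=` list for the unlabeled columns, then mapping the class letters to numbers (0 for G, 1 for H, as stated above) — can be sketched as follows. The column list is abbreviated and the two data rows are placeholders, not rows from the actual MAGIC file:

```python
import io
import pandas as pd

# Abbreviated column names in the style of the MAGIC dataset description;
# the raw file has no header row, so names= assigns labels to the columns.
cols = ["fLength", "fWidth", "fSize", "fAsym", "fDist", "class"]

# Stand-in for reading the downloaded file, e.g. pd.read_csv(path, names=cols).
raw = io.StringIO(
    "28.7,16.0,2.64,27.7,81.8,g\n"
    "31.6,11.7,2.51,23.8,205.2,h\n"
)
df = pd.read_csv(raw, names=cols)

# The computer handles numbers better than letters: g -> 0, h -> 1.
df["class"] = (df["class"] == "h").astype(int)
print(df["class"].tolist())  # [0, 1]
```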
{'end': 417.261, 'start': 133.28, 'title': 'Data analysis and dataset labeling', 'summary': 'Covers importing a dataset, creating column labels, and converting class labels from letters to numbers for better computer understanding, including the use of pandas and numpy libraries.', 'duration': 283.981, 'highlights': ['Importing the dataset using pandas read_csv and creating column labels based on attribute names.', 'Converting class labels from letters to numerical values (0 and 1) for better computer understanding.', "Using pandas to convert class labels to numerical values and ensuring the computer's ability to understand the data."]}], 'duration': 417.192, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg69.jpg', 'highlights': ['Kylie Ying teaches machine learning for absolute beginners, covering supervised and unsupervised learning models.', "Demonstrates programming on Google CoLab and uses the UCI machine learning repository's Magic Gamma Telescope dataset.", "The dataset involves using patterns from a gamma telescope's camera to predict the type of particle causing radiation.", 'Attributes like length, width, size, and asymmetry are used for discrimination in the dataset.', 'Importing the dataset using pandas read_csv and creating column labels based on attribute names.', 'Converting class labels from letters to numerical values (0 and 1) for better computer understanding.', "Using pandas to convert class labels to numerical values and ensuring the computer's ability to understand the data."]}, {'end': 2079.06, 'segs': [{'end': 509.912, 'src': 'embed', 'start': 480.38, 'weight': 0, 'content': [{'end': 489.231, 'text': "And features are just things that we're going to pass into our model in order to help us predict the label, which in this case is the class column.", 'start': 480.38, 'duration': 8.851}, {'end': 495.649, 'text': 'So for sample zero, I have 10 different features.', 'start': 489.812, 
'duration': 5.837}, {'end': 503.951, 'text': 'so I have 10 different values that I can pass into some model and I can spit out the class, the label.', 'start': 495.649, 'duration': 8.302}, {'end': 509.912, 'text': 'And I know the true label here is G, so this is actually supervised learning.', 'start': 504.351, 'duration': 5.561}], 'summary': 'In supervised learning, 10 features are used to predict the class label g.', 'duration': 29.532, 'max_score': 480.38, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg480380.jpg'}, {'end': 627.664, 'src': 'embed', 'start': 593.582, 'weight': 1, 'content': [{'end': 595.423, 'text': 'The first one is supervised learning.', 'start': 593.582, 'duration': 1.841}, {'end': 598.885, 'text': "And in supervised learning, we're using labeled inputs.", 'start': 596.163, 'duration': 2.722}, {'end': 601.346, 'text': 'So this means whatever input we get,', 'start': 599.325, 'duration': 2.021}, {'end': 610.35, 'text': 'we have a corresponding output label in order to train models and to learn outputs of different new inputs that we might feed our model.', 'start': 601.346, 'duration': 9.004}, {'end': 618.014, 'text': 'So for example, I might have these pictures, okay? 
To a computer, all these pictures are are pixels.', 'start': 612.011, 'duration': 6.003}, {'end': 620.175, 'text': "They're pixels with a certain color.", 'start': 618.534, 'duration': 1.641}, {'end': 627.664, 'text': 'Now in supervised learning, all of these inputs have a label associated with them.', 'start': 621.943, 'duration': 5.721}], 'summary': 'Supervised learning uses labeled inputs to train models and learn outputs.', 'duration': 34.082, 'max_score': 593.582, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg593582.jpg'}, {'end': 825.43, 'src': 'embed', 'start': 794.314, 'weight': 2, 'content': [{'end': 795.915, 'text': 'There are two different categories.', 'start': 794.314, 'duration': 1.601}, {'end': 798.536, 'text': "That's a piece of qualitative data.", 'start': 796.575, 'duration': 1.961}, {'end': 807.882, 'text': 'Another example might be, okay, we have a bunch of different nationalities, maybe a nationality or a nation or a location.', 'start': 800.057, 'duration': 7.825}, {'end': 811.224, 'text': 'That might also be an example of categorical data.', 'start': 808.462, 'duration': 2.762}, {'end': 815.981, 'text': "Now, in both of these, there's no inherent order.", 'start': 813.038, 'duration': 2.943}, {'end': 825.43, 'text': "It's not like we can rate US one and France two, Japan three, et cetera.", 'start': 816.541, 'duration': 8.889}], 'summary': 'Two categories: qualitative and categorical data.', 'duration': 31.116, 'max_score': 794.314, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg794314.jpg'}, {'end': 1464.261, 'src': 'embed', 'start': 1437.233, 'weight': 3, 'content': [{'end': 1443.199, 'text': 'So we actually break up our whole data set that we have into three different types of data sets.', 'start': 1437.233, 'duration': 5.966}, {'end': 1448.004, 'text': 'We call it the training data set, the validation data set and the 
testing data set.', 'start': 1443.339, 'duration': 4.665}, {'end': 1455.071, 'text': 'And you might have 60% here, 20% and 20% or 80, 10 and 10.', 'start': 1449.005, 'duration': 6.066}, {'end': 1456.853, 'text': 'It really depends on how many statistics you have.', 'start': 1455.071, 'duration': 1.782}, {'end': 1458.795, 'text': 'I think either of those would be acceptable.', 'start': 1456.933, 'duration': 1.862}, {'end': 1464.261, 'text': 'So what we do is then we feed the training data set into our model.', 'start': 1460.58, 'duration': 3.681}], 'summary': 'Data set is divided into training, validation, and testing sets, with distribution like 60%, 20%, and 20% or 80%, 10%, and 10% depending on statistics.', 'duration': 27.028, 'max_score': 1437.233, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg1437233.jpg'}], 'start': 421.058, 'title': 'Machine learning fundamentals', 'summary': 'Covers fundamental concepts of machine learning, including data features, supervised and unsupervised learning, types of data, and model evaluation. 
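The 60/20/20 train/validation/test split described above follows a shuffle-then-cut pattern; a minimal NumPy sketch (the array here is placeholder data standing in for the data frame):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(100).reshape(50, 2)  # placeholder dataset: 50 samples, 2 features

# Shuffle the rows, then cut at the 60% and 80% marks.
shuffled = rng.permutation(data)
train, valid, test = np.split(
    shuffled, [int(0.6 * len(shuffled)), int(0.8 * len(shuffled))]
)
print(len(train), len(valid), len(test))  # 30 10 10
```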
it emphasizes 10 features used for classification, ai, and data science concepts, different data types, and evaluation of a supervised learning model using a diabetes dataset.', 'chapters': [{'end': 509.912, 'start': 421.058, 'title': 'Understanding data features and classification', 'summary': 'Explains the concept of features in a data set, with 10 features used to predict the class label, g or h, in a supervised learning classification model.', 'duration': 88.854, 'highlights': ['The data set contains multiple samples, each having 10 different features for predicting the class label, leading to a supervised learning scenario with the true label being G.', 'Features are the values passed into the model to help predict the class label, in this case, the class column containing G for gamma or H for hadron.']}, {'end': 770.209, 'start': 512.432, 'title': 'Introduction to machine learning', 'summary': 'Explains the concepts of machine learning, ai, and data science, emphasizing supervised and unsupervised learning, with examples and comparisons between them.', 'duration': 257.777, 'highlights': ['The chapter explains the concepts of machine learning, AI, and data science', 'Emphasizes supervised and unsupervised learning, with examples and comparisons between them', 'Supervised learning uses labeled inputs to train models and make predictions', 'Unsupervised learning involves using unlabeled data to learn about patterns in the data']}, {'end': 1178.44, 'start': 770.289, 'title': 'Types of data in machine learning', 'summary': 'Explains the different types of data in machine learning, covering qualitative, nominal, ordinal, and quantitative data, as well as supervised learning tasks including classification and regression.', 'duration': 408.151, 'highlights': ['The chapter explains the different types of data in machine learning, covering qualitative, nominal, ordinal, and quantitative data.', 'Supervised learning tasks are covered, including classification and 
regression.']}, {'end': 2079.06, 'start': 1179.241, 'title': 'Supervised learning model evaluation', 'summary': 'Discusses the process of training and evaluating a supervised learning model, using a diabetes dataset as an example, including the concepts of feature vector, target vector, training, validation, testing, loss functions, and model performance metrics.', 'duration': 899.819, 'highlights': ["The process of training and evaluating a supervised learning model involves splitting the dataset into training, validation, and testing sets, with the training set used to adjust the model's predictions and the validation set serving as a reality check for the model's ability to handle unseen data.", "Loss functions, such as L1, L2, and binary cross entropy, are used to measure the difference between the model's predictions and the actual labels, with smaller loss indicating better model performance.", "The accuracy metric is described as a measure of the model's performance, with an example illustrating the calculation of accuracy based on the model's predictions and the actual labels."]}], 'duration': 1658.002, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg421058.jpg', 'highlights': ['The data set contains 10 features for predicting the class label in a supervised learning scenario.', 'Supervised learning uses labeled inputs to train models and make predictions.', 'The chapter explains the different types of data in machine learning, covering qualitative, nominal, ordinal, and quantitative data.', 'The process of training and evaluating a supervised learning model involves splitting the dataset into training, validation, and testing sets.']}, {'end': 2642.047, 'segs': [{'end': 2143.647, 'src': 'embed', 'start': 2113.912, 'weight': 0, 'content': [{'end': 2124.083, 'text': "I'm going to set train valid and test to be equal to, This, um, so numpy dot split, I'm just splitting up the data frame.", 'start': 
2113.912, 'duration': 10.171}, {'end': 2130.464, 'text': "And if I do this sample where I'm sampling everything, this will basically shuffle my data.", 'start': 2125.123, 'duration': 5.341}, {'end': 2139.206, 'text': "Um, now if I, I want to pass in where exactly I'm splitting my data set.", 'start': 2130.484, 'duration': 8.722}, {'end': 2143.647, 'text': 'So the first split is going to be maybe at 60%.', 'start': 2139.406, 'duration': 4.241}], 'summary': 'Splitting data into train, valid, and test sets, shuffling with 60% split', 'duration': 29.735, 'max_score': 2113.912, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2113912.jpg'}, {'end': 2290.863, 'src': 'embed', 'start': 2258.721, 'weight': 1, 'content': [{'end': 2269.228, 'text': "Now, In, so I'm actually going to import something known as, uh, the standard scaler from sklearn.", 'start': 2258.721, 'duration': 10.507}, {'end': 2280.018, 'text': "So if I come up here, I can go to sklearn.preprocessing and I'm going to import, um, standard scaler.", 'start': 2269.748, 'duration': 10.27}, {'end': 2282.24, 'text': 'I have to run that cell.', 'start': 2280.038, 'duration': 2.202}, {'end': 2284.36, 'text': "I'm going to come back down here.", 'start': 2283.339, 'duration': 1.021}, {'end': 2288.622, 'text': "And now I'm going to create a scalar and use that scale.", 'start': 2284.38, 'duration': 4.242}, {'end': 2290.863, 'text': 'So standard scalar.', 'start': 2288.662, 'duration': 2.201}], 'summary': 'Imported standard scaler from sklearn for data scaling.', 'duration': 32.142, 'max_score': 2258.721, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2258721.jpg'}, {'end': 2376.605, 'src': 'embed', 'start': 2346.109, 'weight': 4, 'content': [{'end': 2354.266, 'text': 'So what am I stacking? 
Well, I have to pass in something so that it can stack X and Y.', 'start': 2346.109, 'duration': 8.157}, {'end': 2360.689, 'text': 'And now, Okay, so NumPy is very particular about dimensions, right?', 'start': 2354.266, 'duration': 6.423}, {'end': 2367.355, 'text': 'So in this specific case, our X is a two-dimensional object, but Y is only a one-dimensional thing.', 'start': 2361.149, 'duration': 6.206}, {'end': 2369.097, 'text': "It's only a vector of values.", 'start': 2367.395, 'duration': 1.702}, {'end': 2376.605, 'text': 'So in order to now reshape it into a 2D item, we have to call numpy.reshape.', 'start': 2369.958, 'duration': 6.647}], 'summary': 'Using numpy to reshape 1d vector y into 2d object for stacking x and y.', 'duration': 30.496, 'max_score': 2346.109, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2346109.jpg'}, {'end': 2459.605, 'src': 'embed', 'start': 2428.976, 'weight': 2, 'content': [{'end': 2445.112, 'text': "So remember that this is the gammas, and then, if we print that and we do the same thing, but zero, we'll see that there's around 7,", 'start': 2428.976, 'duration': 16.136}, {'end': 2449.084, 'text': '000 of the gammas, but only around 4, 000 of the hadrons.', 'start': 2445.112, 'duration': 3.972}, {'end': 2451.406, 'text': 'So that might actually become an issue.', 'start': 2449.565, 'duration': 1.841}, {'end': 2459.605, 'text': 'And instead, what we want to do is we want to oversample our training data set.', 'start': 2452.904, 'duration': 6.701}], 'summary': 'Around 7,000 gammas and 4,000 hadrons, oversampling needed.', 'duration': 30.629, 'max_score': 2428.976, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2428976.jpg'}, {'end': 2557.009, 'src': 'embed', 'start': 2525.349, 'weight': 3, 'content': [{'end': 2529.731, 'text': "And what that's doing is saying, okay, take more of the less class.", 'start': 2525.349, 
'duration': 4.382}, {'end': 2538.857, 'text': 'So take take the less class and keep sampling from there to increase the size of our data set of that smaller class so that they now match.', 'start': 2530.312, 'duration': 8.545}, {'end': 2556.408, 'text': "So if I do this and I scale data set and I pass in the training data set where oversample is true, So let's say this is train and then X train,", 'start': 2541.018, 'duration': 15.39}, {'end': 2557.009, 'text': 'Y train.', 'start': 2556.408, 'duration': 0.601}], 'summary': 'Oversample the smaller class to match the larger class in the dataset.', 'duration': 31.66, 'max_score': 2525.349, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2525349.jpg'}, {'end': 2652.924, 'src': 'embed', 'start': 2623.115, 'weight': 5, 'content': [{'end': 2629.859, 'text': "Now. the reason why I'm switching that to false is because my validation and my test sets are for the purpose of you know.", 'start': 2623.115, 'duration': 6.744}, {'end': 2635.323, 'text': "if I have data that I haven't seen yet, how does my sample perform on those?", 'start': 2629.859, 'duration': 5.464}, {'end': 2639.405, 'text': "And I don't want to oversample for that right now.", 'start': 2636.143, 'duration': 3.262}, {'end': 2642.047, 'text': "Like, I don't care about balancing those.", 'start': 2639.445, 'duration': 2.602}, {'end': 2652.924, 'text': "I want to know if I have a random set of data that's unlabeled, can I trust my model, right? 
So that's why I'm not oversampling.", 'start': 2642.147, 'duration': 10.777}], 'summary': 'Validation and test sets are not oversampled to assess model performance on unlabeled data.', 'duration': 29.809, 'max_score': 2623.115, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2623115.jpg'}], 'start': 2079.08, 'title': 'Data scaling, splitting, and oversampling for machine learning', 'summary': 'Covers the process of splitting data into training, validation, and test sets, using numpy for shuffling and scaling while maintaining relative scaling and distribution. it also explains oversampling to balance training data by increasing the size of the smaller class, resulting in evenly rebalanced classes, with a focus on gamma and hadron classes and the use of random oversampling.', 'chapters': [{'end': 2318.975, 'start': 2079.08, 'title': 'Data scaling and splitting for machine learning', 'summary': 'Demonstrates the process of splitting data into training, validation, and test sets, using numpy to shuffle and scale the data for machine learning, with an emphasis on maintaining relative scaling and distribution.', 'duration': 239.895, 'highlights': ['Creating training, validation, and test data sets by splitting the data frame using numpy, with 60% for training, 20% for validation, and 20% for testing, ensuring relative scaling and distribution.', 'Utilizing a standard scaler from sklearn to scale the data, ensuring that the values are relative to the mean and standard deviation of each specific column.', 'Importing the standard scaler from sklearn.preprocessing and fitting it to the data to transform the values, maintaining the relative scaling to improve the results.']}, {'end': 2642.047, 'start': 2320.307, 'title': 'Oversampling for balanced training data', 'summary': 'Explains the process of oversampling to balance the training data set by increasing the size of the smaller class, resulting in evenly rebalanced 
classes, with a focus on the number of gamma and hadron classes and the use of random oversampling to achieve this.', 'duration': 321.74, 'highlights': ['The training data set consists of around 7000 gamma class instances and around 4000 hadron class instances, indicating an imbalance that needs to be addressed through oversampling.', "The process of oversampling involves increasing the size of the smaller class to achieve a balanced data set, with the use of the 'random oversampler' from 'imblearn.oversampling' library to achieve this.", "Reshaping the data into a 2D array is essential, involving the use of numpy.reshape with dimensions specified as '-1, 1', which infers the dimension value based on the length of y, ensuring compatibility for stacking.", 'The validation and test data sets are not oversampled, as they are intended for evaluating model performance on unseen data, and balanced classes are not a priority for these sets.']}], 'duration': 562.967, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2079080.jpg', 'highlights': ['Creating training, validation, and test data sets using numpy with 60% for training, 20% for validation, and 20% for testing', 'Utilizing a standard scaler from sklearn to scale the data, ensuring relative scaling and distribution', 'The training data set consists of around 7000 gamma class instances and around 4000 hadron class instances', 'Oversampling involves increasing the size of the smaller class to achieve a balanced data set', "Reshaping the data into a 2D array is essential, involving the use of numpy.reshape with dimensions specified as '-1, 1'", 'The validation and test data sets are not oversampled, as they are intended for evaluating model performance on unseen data']}, {'end': 3450.695, 'segs': [{'end': 2703.247, 'src': 'embed', 'start': 2675.504, 'weight': 3, 'content': [{'end': 2681.408, 'text': "And I'm going to tell you guys a little bit about each of these 
models, and then I'm going to show you how we can do that in our code.", 'start': 2675.504, 'duration': 5.904}, {'end': 2687.013, 'text': "So the first model that we're going to learn about is KNN or K nearest neighbors.", 'start': 2682.569, 'duration': 4.444}, {'end': 2692.657, 'text': "Okay So here I've already drawn a plot on the Y axis.", 'start': 2688.013, 'duration': 4.644}, {'end': 2694.959, 'text': 'I have the number of kids.', 'start': 2693.017, 'duration': 1.942}, {'end': 2697.185, 'text': 'that a family might have.', 'start': 2696.004, 'duration': 1.181}, {'end': 2703.247, 'text': 'And then on the x axis, I have their income in terms of 1000s per year.', 'start': 2697.385, 'duration': 5.862}], 'summary': 'Introducing knn model for predicting family size based on income.', 'duration': 27.743, 'max_score': 2675.504, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2675504.jpg'}, {'end': 2974.665, 'src': 'embed', 'start': 2942.336, 'weight': 5, 'content': [{'end': 2947.999, 'text': 'And this K is basically telling us okay, how many neighbors do we use in order to judge what the label is?', 'start': 2942.336, 'duration': 5.663}, {'end': 2955.937, 'text': 'So usually we use a K of, maybe you know, three or five depends on how big our data set is.', 'start': 2949.093, 'duration': 6.844}, {'end': 2960.439, 'text': 'But here I would say maybe a logical number would be three or five.', 'start': 2955.977, 'duration': 4.462}, {'end': 2965.862, 'text': "So let's say that we take K to be equal to three.", 'start': 2960.94, 'duration': 4.922}, {'end': 2974.665, 'text': 'Okay, well of this data point that I drew over here, Let me use green to highlight this.', 'start': 2966.723, 'duration': 7.942}], 'summary': 'Using k-nearest neighbors algorithm with a k of 3 or 5 to determine labels for data points.', 'duration': 32.329, 'max_score': 2942.336, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2942336.jpg'}, {'end': 3190.435, 'src': 'embed', 'start': 3157.672, 'weight': 0, 'content': [{'end': 3160.413, 'text': "Okay So that's k nearest neighbors.", 'start': 3157.672, 'duration': 2.741}, {'end': 3163.234, 'text': "So now we've learned about k nearest neighbors.", 'start': 3161.213, 'duration': 2.021}, {'end': 3166.535, 'text': "Let's see how we would be able to do that within our code.", 'start': 3163.934, 'duration': 2.601}, {'end': 3171.037, 'text': "So here, I'm going to label the section k nearest neighbors.", 'start': 3167.676, 'duration': 3.361}, {'end': 3176.209, 'text': "and we're actually going to use a package from sklearn.", 'start': 3173.468, 'duration': 2.741}, {'end': 3183.693, 'text': "So the reason why we use these packages is so that we don't have to manually code all of these things ourself,", 'start': 3176.71, 'duration': 6.983}, {'end': 3190.435, 'text': "because it would be really difficult and, chances are, the way that we would code it either would have bugs or it'd be really slow or, I don't know,", 'start': 3183.693, 'duration': 6.742}], 'summary': 'Introduction to k-nearest neighbors and using sklearn for implementation.', 'duration': 32.763, 'max_score': 3157.672, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg3157672.jpg'}, {'end': 3432.956, 'src': 'embed', 'start': 3399.014, 'weight': 1, 'content': [{'end': 3401.055, 'text': 'And recall is saying okay.', 'start': 3399.014, 'duration': 2.041}, {'end': 3407.177, 'text': 'out of all the ones that we know are truly positive, how many do we actually get right? 
Okay.', 'start': 3401.055, 'duration': 6.122}, {'end': 3411.799, 'text': 'So, going back to this over here, our precision score.', 'start': 3408.497, 'duration': 3.302}, {'end': 3420.508, 'text': "so Again, precision out of all the ones that we've labeled as the specific class, how many of them are actually that class?", 'start': 3411.799, 'duration': 8.709}, {'end': 3423.73, 'text': "It's 77 and 84%.", 'start': 3420.829, 'duration': 2.901}, {'end': 3432.956, 'text': 'Now recall how out of all the ones that are actually this class, how many of those did we get? This is 68% and 89%.', 'start': 3423.73, 'duration': 9.226}], 'summary': 'Precision score: 77-84%, recall: 68-89%', 'duration': 33.942, 'max_score': 3399.014, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg3399014.jpg'}], 'start': 2642.147, 'title': 'K nearest neighbors model and its implementation', 'summary': 'Introduces the k nearest neighbors (knn) model, explaining its use of euclidean distance and k value for classification, while demonstrating its implementation using sklearn package with 82% accuracy, 77% and 84% precision scores, and 68% and 89% recall scores.', 'chapters': [{'end': 3157.232, 'start': 2642.147, 'title': 'K nearest neighbors model', 'summary': "Introduces the concept of k nearest neighbors (knn) model, explaining the algorithm's use of euclidean distance and k value to classify unlabeled data points based on majority voting of the nearest neighbors in a 2d plot.", 'duration': 515.085, 'highlights': ['The K Nearest Neighbors (KNN) model is introduced, utilizing Euclidean distance and a specified K value to classify unlabeled data points based on the majority of the nearest neighbors.', 'Explanation of how the KNN model functions in a 2D plot, using the example of income and number of kids to demonstrate the process of classifying data points based on proximity and majority voting.', 'Clarification on the process of determining 
the K value and the identification of the nearest neighbors for classifying data points in the KNN model.']}, {'end': 3450.695, 'start': 3157.672, 'title': 'Implementing k nearest neighbors with sklearn', 'summary': 'Demonstrates how to implement k nearest neighbors using sklearn package, achieving 82% accuracy in classification with precision scores of 77% and 84%, and recall scores of 68% and 89%.', 'duration': 293.023, 'highlights': ['The chapter demonstrates how to implement k nearest neighbors using the sklearn package, achieving 82% accuracy in classification.', 'The precision scores for class zero and class one are 77% and 84%, while the recall scores are 68% and 89%.', 'The F1 score is a combination of the precision and recall score.']}], 'duration': 808.548, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg2642147.jpg', 'highlights': ['The chapter demonstrates k nearest neighbors using sklearn with 82% accuracy', 'Precision scores for class zero and class one are 77% and 84%', 'Recall scores are 68% and 89%', 'Introduction of KNN model using Euclidean distance and a specified K value', 'Explanation of KNN model functioning in a 2D plot using income and number of kids', 'Clarification on determining the K value and identifying nearest neighbors']}, {'end': 5779.089, 'segs': [{'end': 3489.517, 'src': 'embed', 'start': 3451.276, 'weight': 0, 'content': [{'end': 3456.819, 'text': "So we're actually gonna mostly look at this one because we have an unbalanced test data set.", 'start': 3451.276, 'duration': 5.543}, {'end': 3461.022, 'text': 'So here we have a measure of 72 and 87, or 0.72 and 0.87, which is not too shabby.', 'start': 3457.54, 'duration': 3.482}, {'end': 3461.322, 'text': 'All right.', 'start': 3461.042, 'duration': 0.28}, {'end': 3462.483, 'text': 'Well, what if we made this three?', 'start': 3461.362, 'duration': 1.121}, {'end': 3478.394, 'text': 'So we actually see that okay, so what was 
it originally with one?', 'start': 3474.693, 'duration': 3.701}, {'end': 3484.255, 'text': 'We see that our F1 score is now.', 'start': 3480.834, 'duration': 3.421}, {'end': 3489.517, 'text': 'it was 0.72 and then 0.87, and then our accuracy was 82%.', 'start': 3484.255, 'duration': 5.262}], 'summary': 'Analyzing unbalanced test data, improving f1 score to 0.87, accuracy at 82%', 'duration': 38.241, 'max_score': 3451.276, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg3451276.jpg'}, {'end': 3726.284, 'src': 'embed', 'start': 3695.26, 'weight': 1, 'content': [{'end': 3709.131, 'text': "So, according to this data set, which is data that I made up off the top of my head, so it's not actually real COVID data, but according to this data,", 'start': 3695.26, 'duration': 13.871}, {'end': 3713.294, 'text': 'the probability of having COVID, given that you tested positive, is 96.4%.', 'start': 3709.131, 'duration': 4.163}, {'end': 3721.2, 'text': "All right, now with that, let's talk about Bayes' rule.", 'start': 3713.294, 'duration': 7.906}, {'end': 3723.821, 'text': 'which is this section here.', 'start': 3722.079, 'duration': 1.742}, {'end': 3726.284, 'text': "Let's ignore this bottom part for now.", 'start': 3724.502, 'duration': 1.782}], 'summary': 'Probability of having covid, given a positive test, is 96.4%.', 'duration': 31.024, 'max_score': 3695.26, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg3695260.jpg'}, {'end': 3992.762, 'src': 'embed', 'start': 3960.114, 'weight': 2, 'content': [{'end': 3962.075, 'text': "All right, now let's plug in some numbers for that.", 'start': 3960.114, 'duration': 1.961}, {'end': 3966.858, 'text': 'The probability of having a positive test given that I have the disease is 0.99.', 'start': 3962.536, 'duration': 4.322}, {'end': 3969.779, 'text': 'And then the probability that I have the disease.', 'start': 3966.858, 
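The Bayes' rule computation being set up in this segment can be written out directly. The 0.99 sensitivity and 0.1 prior appear in the transcript; the 0.05 false-positive rate is an assumption, chosen because it reproduces the 68.75% posterior quoted in the chapter summary:

```python
# Bayes' rule: P(disease | +) = P(+ | disease) * P(disease) / P(+)
p_pos_given_disease = 0.99   # sensitivity, from the transcript
p_disease = 0.1              # prior probability of disease, from the transcript
p_pos_given_healthy = 0.05   # assumed false-positive rate (not stated in this excerpt)

# law of total probability for the denominator P(+)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # 0.6875
```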
'duration': 2.921}, {'end': 3976.078, 'text': 'is this value over here 0.1,, okay?', 'start': 3972.717, 'duration': 3.361}, {'end': 3986.06, 'text': 'And then the probability that I have a positive test at all should be okay.', 'start': 3980.359, 'duration': 5.701}, {'end': 3992.762, 'text': 'what is the probability that I have a positive test, given that I actually have the disease and then having the disease?', 'start': 3986.06, 'duration': 6.702}], 'summary': 'Probability of positive test given disease is 0.99, probability of having disease is 0.1.', 'duration': 32.648, 'max_score': 3960.114, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg3960114.jpg'}, {'end': 4372.432, 'src': 'embed', 'start': 4342.558, 'weight': 3, 'content': [{'end': 4354.018, 'text': "Okay, Alright, so in naive Bayes, the point of it being naive is that we're actually this joint probability.", 'start': 4342.558, 'duration': 11.46}, {'end': 4358.162, 'text': "we're just assuming that all of these different things are all independent.", 'start': 4354.018, 'duration': 4.144}, {'end': 4365.469, 'text': "So, in my soccer example, you know the probability that we're playing soccer,", 'start': 4358.222, 'duration': 7.247}, {'end': 4370.891, 'text': "or the probability that you know it's windy and it's rainy and and it's Wednesday.", 'start': 4365.469, 'duration': 5.422}, {'end': 4372.432, 'text': 'all these things are independent.', 'start': 4370.891, 'duration': 1.541}], 'summary': 'Naive bayes assumes independence in calculating joint probability.', 'duration': 29.874, 'max_score': 4342.558, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg4342558.jpg'}, {'end': 4635.468, 'src': 'embed', 'start': 4605.181, 'weight': 5, 'content': [{'end': 4607.481, 'text': 'we maybe we have like the evidence for that.', 'start': 4605.181, 'duration': 2.3}, {'end': 4609.882, 'text': 'we have the 
answers for that, based on our training set.', 'start': 4607.481, 'duration': 2.401}, {'end': 4620.623, 'text': 'So this principle of going through each of these and finding whatever class, whatever category, maximizes this expression on the right.', 'start': 4612.142, 'duration': 8.481}, {'end': 4635.468, 'text': 'this is something known as MAP, for short or maximum, a Posteriori pick the hypothesis.', 'start': 4620.623, 'duration': 14.845}], 'summary': 'Using the training set, we apply the map principle to maximize expression for hypothesis selection.', 'duration': 30.287, 'max_score': 4605.181, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg4605181.jpg'}, {'end': 5263.193, 'src': 'embed', 'start': 5187.187, 'weight': 6, 'content': [{'end': 5204.08, 'text': "Oops, let's draw it sharper, but it fits our shape up there a lot better, right? All right, so that is what we call logistic regression.", 'start': 5187.187, 'duration': 16.893}, {'end': 5216.787, 'text': "We're basically trying to fit our data to the sigmoid function, okay? 
And when we only have, you know, data point.", 'start': 5204.1, 'duration': 12.687}, {'end': 5224.973, 'text': "So if we only have one feature x, and that's what we call simple logistic regression.", 'start': 5217.027, 'duration': 7.946}, {'end': 5229.316, 'text': "But then if we have, you know so that's only x zero.", 'start': 5226.594, 'duration': 2.722}, {'end': 5237.942, 'text': 'but then if we have x zero, x one all the way to x n, we call this multiple logistic regression,', 'start': 5229.316, 'duration': 8.626}, {'end': 5242.646, 'text': "because there are multiple features that we're considering when we're building our model.", 'start': 5237.942, 'duration': 4.704}, {'end': 5244.925, 'text': 'logistic regression.', 'start': 5243.684, 'duration': 1.241}, {'end': 5247.766, 'text': "So I'm going to put that here.", 'start': 5246.125, 'duration': 1.641}, {'end': 5256.37, 'text': 'And again, from SK learn, this linear model, we can import logistic regression.', 'start': 5248.486, 'duration': 7.884}, {'end': 5263.193, 'text': 'Right And just like how we did above, we can repeat all of this.', 'start': 5257.451, 'duration': 5.742}], 'summary': 'Logistic regression fits data to sigmoid function to build a model with multiple features.', 'duration': 76.006, 'max_score': 5187.187, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg5187187.jpg'}, {'end': 5373.898, 'src': 'embed', 'start': 5336.38, 'weight': 8, 'content': [{'end': 5339.702, 'text': 'So here, this is decent precision, 65%, recall 71, F1 68 or 82, total accuracy of 77.', 'start': 5336.38, 'duration': 3.322}, {'end': 5345.044, 'text': "Okay, so it performs slightly better than Naive Bayes, but it's still not as good as KNN.", 'start': 5339.702, 'duration': 5.342}, {'end': 5362.992, 'text': 'Alright, so the last model for classification that I wanted to talk about is something called support vector machines, or SVMs for short.', 'start': 5352.867, 
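A minimal sketch of the sklearn call just described, used out of the box as in the video; the data here is a synthetic stand-in, not the MAGIC features from the course notebook:

```python
# Logistic regression with sklearn's default settings on toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # toy stand-in features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # a linearly separable labeling rule

lg_model = LogisticRegression()  # out of the box, as in the video
lg_model.fit(X, y)
train_acc = (lg_model.predict(X) == y).mean()
print(train_acc)
```

With more than one feature column in `X`, this is the "multiple logistic regression" case described above; a single column would be the simple case.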
'duration': 10.125}, {'end': 5373.898, 'text': 'So what exactly is an SVM model, I have two different features x zero and x one on the axes.', 'start': 5365.333, 'duration': 8.565}], 'summary': 'Model achieves 65% precision, 71% recall, and 77% accuracy, outperforming naive bayes but not knn.', 'duration': 37.518, 'max_score': 5336.38, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg5336380.jpg'}, {'end': 5638.059, 'src': 'embed', 'start': 5610.618, 'weight': 9, 'content': [{'end': 5617.8, 'text': 'Okay, so these both here, these are our margins in our SVMs.', 'start': 5610.618, 'duration': 7.182}, {'end': 5620.821, 'text': 'And our goal is to maximize those margins.', 'start': 5618.62, 'duration': 2.201}, {'end': 5627.303, 'text': 'So not only do we want the line that best separates the two different classes, we want the line that has the largest margin.', 'start': 5620.841, 'duration': 6.462}, {'end': 5634.856, 'text': 'and the data points that lie on the margin lines, the data.', 'start': 5628.991, 'duration': 5.865}, {'end': 5638.059, 'text': "So basically, these are the data points that's helping us define our divider.", 'start': 5634.916, 'duration': 3.143}], 'summary': 'Goal: maximize margins in svms for best class separation.', 'duration': 27.441, 'max_score': 5610.618, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg5610618.jpg'}], 'start': 3451.276, 'title': 'Probabilistic classification models', 'summary': 'Discusses the application of naive bayes and logistic regression in probabilistic classification, demonstrating the impact on f1 score and accuracy, with a detailed explanation and examples, including a 96.4% probability of having covid given a positive test, and a comparison with svm model for classification.', 'chapters': [{'end': 3771.175, 'start': 3451.276, 'title': 'Naive bayes and unbalanced test data', 'summary': "Discusses the 
impact of changing the measure on f1 score and accuracy, the concept of conditional probability and bayes' rule, and demonstrates bayes' rule in action using an example, including a 96.4% probability of having covid given a positive test.", 'duration': 319.899, 'highlights': ['Demonstrates the impact of changing the measure on F1 score and accuracy, with the accuracy decreasing to 81% when the measure is changed to three and remaining at 82% when changed to five.', 'Explains the concept of conditional probability and calculates a 96.4% probability of having COVID given a positive test using a made-up dataset.', "Introduces Bayes' rule and its application in calculating the probability of an event given another event, based on the probability of the second event given the first, the probabilities of the events, and demonstrates its use in an example."]}, {'end': 4280.83, 'start': 3771.875, 'title': 'Naive bayes classification', 'summary': "Discusses the application of bayes' rule in determining the probability of having a disease given a positive test, with a detailed explanation on naive bayes classification and its components, using an example of classifying soccer playing based on weather conditions.", 'duration': 508.955, 'highlights': ['The probability of having a positive test given the disease is 0.99, and the probability of having the disease is 0.1, resulting in a 68.75% probability of having the disease given a positive test.', 'Naive Bayes classification involves calculating the posterior probability of a specific class given evidence, based on the likelihood, prior probability, and evidence, as demonstrated through an example of classifying soccer playing based on weather conditions.']}, {'end': 4764.604, 'start': 4284.452, 'title': 'Naive bayes: probabilistic classification', 'summary': 'Discusses the principles of naive bayes, explaining the concept of proportional probability and independence assumption, and concludes with the application of naive
bayes in a classification problem with a focus on the maximum a posteriori (map) principle, followed by a comparison with logistic regression.', 'duration': 480.152, 'highlights': ['Naive Bayes assumes independence among features, allowing the joint probability to be represented as a multiplication of individual feature probabilities, simplifying the probability calculation.', 'The Maximum A Posteriori (MAP) principle is employed to select the category that maximizes the expression of conditional probabilities, aiming to minimize misclassification probability.', 'The application of Naive Bayes in a classification problem results in a 72% accuracy, which is evaluated using precision, recall, and F1 score metrics.']}, {'end': 5335.84, 'start': 4765.626, 'title': 'Understanding logistic regression', 'summary': "Explains the concept of logistic regression, transforming linear regression to a probability function using the sigmoid function, and its application in simple and multiple logistic regression models, as well as the usage of logistic regression in machine learning with python's sk learn library.", 'duration': 570.214, 'highlights': ['Logistic regression transforms linear regression into a probability function using the sigmoid function to ensure the range of the probability stays between 0 and 1, enhancing its applicability for classification tasks.', 'The concept of simple logistic regression is introduced for models with a single feature, while multiple logistic regression is explained for models with multiple features, showcasing the versatility of logistic regression in handling different types of data.', "The use of logistic regression in machine learning is demonstrated through the application of Python's SK learn library, including the default logistic regression model and the option to customize parameters such as penalties and validation data for model optimization."]}, {'end': 5779.089, 'start': 5336.38, 'title': 'Svm model for classification', 
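The MAP-based classification described above is what sklearn's Naive Bayes estimators implement. GaussianNB (a Gaussian likelihood per feature) is shown here as one concrete variant, on synthetic two-cluster data; the transcript excerpt does not pin down which variant the course uses:

```python
# Naive Bayes: fit per-class feature distributions, then pick the class
# that maximizes the posterior (the MAP rule described above).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),   # class 0 cluster
               rng.normal( 2.0, 1.0, size=(100, 2))])  # class 1 cluster
y = np.array([0] * 100 + [1] * 100)

nb_model = GaussianNB()
nb_model.fit(X, y)  # estimates per-class means/variances from the training set
print(nb_model.predict([[-2.0, -2.0], [2.0, 2.0]]))  # one point near each cluster
```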
'summary': 'Discusses the svm model for classification, with a focus on finding the hyperplane that best separates two classes and maximizing margins, while also highlighting its limitations with outliers and its potential use with one-dimensional data sets.', 'duration': 442.709, 'highlights': ['SVM model achieves 77% total accuracy, performing slightly better than Naive Bayes but not as good as KNN.', 'The goal of SVM is to find the hyperplane that best differentiates two classes and maximize margins, with support vectors defining the divider.', 'SVM model may not be robust to outliers, potentially changing the position of support vectors and affecting its performance.', 'SVM can be used with one-dimensional data sets through creating a projection, such as using X and X^2 to separate the classes.']}], 'duration': 2327.813, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg3451276.jpg', 'highlights': ['Demonstrates the impact of changing the measure on F1 score and accuracy, with the accuracy decreasing to 81% when the measure is changed to three and remaining at 82% when changed to five.', 'Explains the concept of conditional probability and calculates a 96.4% probability of having COVID given a positive test using a made-up dataset.', 'The probability of having a positive test given the disease is 0.99, and the probability of having the disease is 0.1, resulting in a 68.75% probability of having the disease given a positive test.', 'Naive Bayes classification involves calculating the posterior probability of a specific class given evidence, based on the likelihood, prior probability, and evidence, as demonstrated through an example of classifying soccer playing based on weather conditions.', 'Naive Bayes assumes independence among features, allowing the joint probability to be represented as a multiplication of individual feature probabilities, simplifying the probability calculation.', 'The Maximum A
Posteriori (MAP) principle is employed to select the category that maximizes the expression of conditional probabilities, aiming to minimize misclassification probability.', 'Logistic regression transforms linear regression into a probability function using the sigmoid function to ensure the range of the probability stays between 0 and 1, enhancing its applicability for classification tasks.', 'The concept of simple logistic regression is introduced for models with a single feature, while multiple logistic regression is explained for models with multiple features, showcasing the versatility of logistic regression in handling different types of data.', 'SVM model achieves 77% total accuracy, performing slightly better than Naive Bayes but not as good as KNN.', 'The goal of SVM is to find the hyperplane that best differentiates two classes and maximize margins, with support vectors defining the divider.']}, {'end': 6405.539, 'segs': [{'end': 5861.157, 'src': 'embed', 'start': 5833.084, 'weight': 1, 'content': [{'end': 5835.905, 'text': "how do you define the hyperplane that we're going to use?", 'start': 5833.084, 'duration': 2.821}, {'end': 5843.103, 'text': 'So anyways, this transformation that we did down here, this is known as the kernel trick.', 'start': 5837, 'duration': 6.103}, {'end': 5854.789, 'text': "So when we go from x to some coordinate x and then x squared, what we're doing is we are applying a kernel.", 'start': 5847.105, 'duration': 7.684}, {'end': 5856.47, 'text': "So that's why it's called the kernel trick.", 'start': 5855.029, 'duration': 1.441}, {'end': 5861.157, 'text': "So SVMs are actually really powerful and you'll see that here.", 'start': 5858.615, 'duration': 2.542}], 'summary': 'Exploring the definition and power of svms through the kernel trick.', 'duration': 28.073, 'max_score': 5833.084, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg5833084.jpg'}, {'end': 5957.247, 'src': 
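The x → (x, x²) projection described for the kernel trick can be made explicit. Lifting a 1-D dataset by hand and fitting a linear SVM is the illustrative version of what a kernel (e.g. `SVC(kernel='poly')`) does implicitly; the data below is made up:

```python
# 1-D data that no single threshold on x can separate: inner points are
# class 0, outer points are class 1.
import numpy as np
from sklearn.svm import SVC

x = np.linspace(-3, 3, 61)
y = (np.abs(x) > 1.55).astype(int)

# After lifting x -> (x, x^2), a straight line (roughly x^2 = const)
# separates the two classes, so a linear SVM can fit it.
X_lifted = np.column_stack([x, x ** 2])
svm_model = SVC(kernel='linear').fit(X_lifted, y)
acc = (svm_model.predict(X_lifted) == y).mean()
print(acc)
```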
'embed', 'start': 5903.797, 'weight': 0, 'content': [{'end': 5905.478, 'text': "Let's see if I can hover over this.", 'start': 5903.797, 'duration': 1.681}, {'end': 5916.126, 'text': 'Right, so again, you see a lot of these different parameters here that you can go back and change if you were creating a production level model.', 'start': 5906.279, 'duration': 9.847}, {'end': 5921.61, 'text': "Okay, but in this specific case, we'll just use it out of the box again.", 'start': 5916.146, 'duration': 5.464}, {'end': 5931.111, 'text': "So if I make predictions, you'll note that wow, the accuracy actually jumps to 87% with the SVM.", 'start': 5924.389, 'duration': 6.722}, {'end': 5936.593, 'text': "And even with class zero, there's nothing less than, you know, point eight, which is great.", 'start': 5931.912, 'duration': 4.681}, {'end': 5943.296, 'text': "And for class one, I mean, everything's at 0.9, which is higher than anything that we had seen to this point.", 'start': 5937.794, 'duration': 5.502}, {'end': 5950.361, 'text': "So, so far we've gone over four different classification models.", 'start': 5946.758, 'duration': 3.603}, {'end': 5954.585, 'text': "We've done SVMs, logistic regression, naive Bayes, and KNN.", 'start': 5950.401, 'duration': 4.184}, {'end': 5957.247, 'text': 'And these are just simple ways on how to implement them.', 'start': 5955.085, 'duration': 2.162}], 'summary': 'Achieved 87% accuracy with svm model, class one scores 0.9, covered four classification models: svm, logistic regression, naive bayes, and knn.', 'duration': 53.45, 'max_score': 5903.797, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg5903797.jpg'}, {'end': 6012.146, 'src': 'embed', 'start': 5984.185, 'weight': 5, 'content': [{'end': 5988.927, 'text': 'Now, the final type of model that I wanted to talk about is known as a neural net or neural network.', 'start': 5984.185, 'duration': 4.742}, {'end': 5992.675, 'text': 'And 
neural nets look something like this.', 'start': 5990.214, 'duration': 2.461}, {'end': 5994.957, 'text': 'So you have an input layer.', 'start': 5993.456, 'duration': 1.501}, {'end': 5996.598, 'text': 'This is where all your features would go.', 'start': 5995.017, 'duration': 1.581}, {'end': 6001.06, 'text': 'And they have all these arrows pointing to some sort of hidden layer.', 'start': 5997.478, 'duration': 3.582}, {'end': 6004.242, 'text': 'And then all these arrows point to some sort of output layer.', 'start': 6001.701, 'duration': 2.541}, {'end': 6012.146, 'text': 'So what does, what does all this mean? Each of these layers in here, this is something known as a neuron.', 'start': 6005.443, 'duration': 6.703}], 'summary': 'Neural networks consist of input, hidden, and output layers, each containing neurons.', 'duration': 27.961, 'max_score': 5984.185, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg5984185.jpg'}, {'end': 6202.025, 'src': 'embed', 'start': 6170.081, 'weight': 4, 'content': [{'end': 6175.964, 'text': "And the reason why we introduced these is so that our entire model doesn't collapse on itself and become a linear model.", 'start': 6170.081, 'duration': 5.883}, {'end': 6185.248, 'text': 'So over here, this is something known as a sigmoid function, it runs between zero and one, tanh runs between negative one all the way to one.', 'start': 6177.605, 'duration': 7.643}, {'end': 6192.051, 'text': 'And this is relu, which anything less than zero is zero, and then anything greater than zero is linear.', 'start': 6185.848, 'duration': 6.203}, {'end': 6202.025, 'text': 'So, with these activation functions, every single output of a neuron is no longer just the linear combination of these.', 'start': 6194.642, 'duration': 7.383}], 'summary': 'Introduction of sigmoid, tanh, and relu activation functions to prevent collapse of the model into a linear one.', 'duration': 31.944, 'max_score': 6170.081, 
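The three activation functions named above, sketched in NumPy:

```python
# The non-linearities described in the transcript.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes output into (0, 1)

def tanh(x):
    return np.tanh(x)             # squashes output into (-1, 1)

def relu(x):
    return np.maximum(0, x)       # 0 below zero, linear above zero

print(sigmoid(0.0), tanh(0.0), relu(-2.0), relu(3.0))  # 0.5 0.0 0.0 3.0
```

Without one of these applied at each neuron, stacked layers collapse into a single linear map, which is the point the transcript makes.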
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg6170081.jpg'}, {'end': 6256.258, 'src': 'embed', 'start': 6222.644, 'weight': 3, 'content': [{'end': 6224.945, 'text': 'And then we do this thing called training,', 'start': 6222.644, 'duration': 2.301}, {'end': 6233.227, 'text': 'where we have to feed the loss back into the model and make certain adjustments to the model to improve this predicted output.', 'start': 6224.945, 'duration': 8.282}, {'end': 6237.068, 'text': "Let's talk a little bit about the training.", 'start': 6235.247, 'duration': 1.821}, {'end': 6244.31, 'text': "What exactly goes on during that step? Let's go back and take a look at our L2 loss function.", 'start': 6237.208, 'duration': 7.102}, {'end': 6256.258, 'text': "this is what our l two loss function looks like it's a quadratic formula, right? Well, up here, the error is really, really, really, really large.", 'start': 6245.456, 'duration': 10.802}], 'summary': 'During training, the model adjusts based on l2 loss function to reduce error.', 'duration': 33.614, 'max_score': 6222.644, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg6222644.jpg'}, {'end': 6386.305, 'src': 'embed', 'start': 6353.047, 'weight': 6, 'content': [{'end': 6357.791, 'text': 'So my new value, this is what we call a weight update.', 'start': 6353.047, 'duration': 4.744}, {'end': 6363.316, 'text': "I'm gonna take W zero, and I'm gonna set some new value for W zero.", 'start': 6357.851, 'duration': 5.465}, {'end': 6376.668, 'text': "And what I'm going to set for that is the old value of w zero, plus some factor, which I'll just call alpha for now, times, whatever this arrow is.", 'start': 6364.221, 'duration': 12.447}, {'end': 6386.305, 'text': "So that's basically saying, okay, take our old w zero, our old weight, and just decrease it this way.", 'start': 6376.989, 'duration': 9.316}], 'summary': 'Updating 
weight w0 with a new value using a factor alpha.', 'duration': 33.258, 'max_score': 6353.047, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg6353047.jpg'}], 'start': 5781.771, 'title': 'Svms and neural networks', 'summary': 'Introduces svms achieving 87% accuracy in dataset separation, and explains neural network structure, neurons, activation functions, and training using gradient descent.', 'chapters': [{'end': 5983.825, 'start': 5781.771, 'title': 'Understanding svms in machine learning', 'summary': 'Introduces support vector machines (svms) and demonstrates their effectiveness with a dataset separation accuracy of 87%, outperforming other classification models like logistic regression, naive bayes, and knn.', 'duration': 202.054, 'highlights': ['SVM model achieves dataset separation accuracy of 87%', 'Explanation of the kernel trick and its application in SVMs', 'Comparison of SVM with other classification models']}, {'end': 6405.539, 'start': 5984.185, 'title': 'Neural networks and training models', 'summary': 'Explains the structure of a neural network, the role of neurons, activation functions, and the training process using gradient descent for model adjustment.', 'duration': 421.354, 'highlights': ['Neural networks consist of input layer, hidden layers, and output layer, with neurons representing features and applying weights through an activation function.', 'The activation functions like sigmoid, tanh, and relu introduce non-linearities to prevent the collapse of the model into a linear form, essential for complex data representation.', 'Training the model involves using gradient descent to adjust weights based on the L2 loss function, ensuring the predicted value is closer to the true value.', 'The weight update in gradient descent involves adjusting the old weight by a small factor (alpha) in the direction of decreasing loss, contributing to model improvement.']}], 'duration': 623.768, 
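The weight update described above, new w0 = old w0 + alpha * (negative gradient), applied to the L2 loss for a single weight; the data point (x=1, y=3) and alpha=0.1 are illustrative choices:

```python
# One gradient-descent step on the quadratic L2 loss L = (y_true - w*x)^2.
def step(w, x, y_true, alpha=0.1):
    grad = -2 * x * (y_true - w * x)  # dL/dw
    return w + alpha * (-grad)        # move against the gradient (alpha = learning rate)

w = 0.0
for _ in range(50):
    w = step(w, x=1.0, y_true=3.0)
print(round(w, 3))  # 3.0 -- converges to the loss-minimizing weight
```

A larger alpha takes bigger steps (faster but riskier), a smaller one converges more slowly, matching the discussion of the learning rate.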
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg5781771.jpg', 'highlights': ['SVM model achieves dataset separation accuracy of 87%', 'Explanation of the kernel trick and its application in SVMs', 'Comparison of SVM with other classification models', 'Training the model involves using gradient descent to adjust weights based on the L2 loss function, ensuring the predicted value is closer to the true value', 'The activation functions like sigmoid, tanh, and relu introduce non-linearities to prevent the collapse of the model into a linear form, essential for complex data representation', 'Neural networks consist of input layer, hidden layers, and output layer, with neurons representing features and applying weights through an activation function', 'The weight update in gradient descent involves adjusting the old weight by a small factor (alpha) in the direction of decreasing loss, contributing to model improvement']}, {'end': 7403.178, 'segs': [{'end': 6433.893, 'src': 'embed', 'start': 6405.899, 'weight': 1, 'content': [{'end': 6410.663, 'text': 'The reason why I use a plus here is because this here is the negative gradient, right?', 'start': 6405.899, 'duration': 4.764}, {'end': 6414.666, 'text': 'If this were just the, if you were to use the actual gradient, this should be a minus.', 'start': 6410.823, 'duration': 3.843}, {'end': 6420.823, 'text': 'Now this alpha is something that we call the learning rate okay?', 'start': 6416.96, 'duration': 3.863}, {'end': 6433.893, 'text': "And that adjusts how quickly we're taking steps and that might tell our that will ultimately control how long it takes for our neural net to converge,", 'start': 6420.843, 'duration': 13.05}], 'summary': 'Explaining the use of learning rate and negative gradient in adjusting neural net convergence.', 'duration': 27.994, 'max_score': 6405.899, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg6405899.jpg'}, {'end': 6493.211, 'src': 'embed', 'start': 6459.856, 'weight': 2, 'content': [{'end': 6464.7, 'text': "After we calculate the loss, we're calculating gradients, making adjustments in the model.", 'start': 6459.856, 'duration': 4.844}, {'end': 6468.944, 'text': "So we're setting all the all the weights to something adjusted slightly.", 'start': 6464.72, 'duration': 4.224}, {'end': 6476.685, 'text': "And then we're saying, Okay, let's take the training set and run it through the model again, and go through this loop all over again.", 'start': 6470.663, 'duration': 6.022}, {'end': 6484.028, 'text': "So for machine learning, we already have seen some libraries that we use, right, we've already seen SK learn.", 'start': 6477.085, 'duration': 6.943}, {'end': 6493.211, 'text': "But when we start going into neural networks, this is kind of what we're trying to program.", 'start': 6485.308, 'duration': 7.903}], 'summary': 'Iteratively adjust weights in neural network for machine learning.', 'duration': 33.355, 'max_score': 6459.856, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg6459856.jpg'}, {'end': 6984.948, 'src': 'embed', 'start': 6958.924, 'weight': 0, 'content': [{'end': 6966.786, 'text': 'So here we do see that you know, our validation accuracy improves from around point seven,', 'start': 6958.924, 'duration': 7.862}, {'end': 6972.253, 'text': 'seven or something all the way up to somewhere around point maybe 81..', 'start': 6966.786, 'duration': 5.467}, {'end': 6973.454, 'text': 'And our loss is decreasing.', 'start': 6972.253, 'duration': 1.201}, {'end': 6974.075, 'text': 'So this is good.', 'start': 6973.495, 'duration': 0.58}, {'end': 6982.125, 'text': 'It is expected that the validation loss and accuracy is performing worse than the training loss or accuracy.', 'start': 6974.756, 'duration': 7.369}, 
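The loop described here (forward pass, compute loss, compute gradients, adjust weights, run the training set through again) can be sketched from scratch for a single sigmoid neuron. The course hands this loop to TensorFlow, so this NumPy version is purely illustrative:

```python
# A from-scratch training loop for one sigmoid neuron on toy data.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)  # toy binary labels

w, b, alpha = np.zeros(2), 0.0, 0.5        # alpha is the learning rate
for epoch in range(300):
    p = sigmoid(X @ w + b)                 # forward pass -> predictions
    grad_w = X.T @ (p - y) / len(y)        # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= alpha * grad_w                    # weight update: step against the gradient
    b -= alpha * grad_b

train_acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(train_acc)
```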
{'end': 6984.948, 'text': "And that's because our model is training on that data.", 'start': 6982.185, 'duration': 2.763}], 'summary': 'Validation accuracy improved from 0.77 to around 0.81, and loss is decreasing, indicating positive progress.', 'duration': 26.024, 'max_score': 6958.924, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg6958924.jpg'}, {'end': 7047.539, 'src': 'embed', 'start': 7010.709, 'weight': 3, 'content': [{'end': 7012.77, 'text': 'hey, what do we set these hyper parameters to?', 'start': 7010.709, 'duration': 2.061}, {'end': 7023.03, 'text': "So what I'm actually going to do is I'm going to rewrite this so that we can do something what's known as a grid search.", 'start': 7014.508, 'duration': 8.522}, {'end': 7031.592, 'text': 'So we can search through an entire space of hey, what happens if we have 64 nodes and 64 nodes, or 16 nodes and 16 nodes, and so on?', 'start': 7023.07, 'duration': 8.522}, {'end': 7047.539, 'text': 'And then, on top of all that, we can, you know, we can change this learning rate, we can change how many epochs, we can change,', 'start': 7038.177, 'duration': 9.362}], 'summary': 'Discussing grid search for hyperparameters with variations like 64 nodes and 16 nodes, and adjusting learning rate and epochs.', 'duration': 36.83, 'max_score': 7010.709, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg7010709.jpg'}, {'end': 7296.272, 'src': 'embed', 'start': 7257.691, 'weight': 4, 'content': [{'end': 7263.9, 'text': 'drop out prob, LR, batch size and epochs.', 'start': 7257.691, 'duration': 6.209}, {'end': 7267.743, 'text': 'Okay And then now we have both the model and the history.', 'start': 7264.241, 'duration': 3.502}, {'end': 7275.767, 'text': "And what I'm going to do is again, I want to plot the loss for the history.", 'start': 7268.783, 'duration': 6.984}, {'end': 7282.571, 'text': "I'm also going to plot the 
accuracy probably should have done them side by side, that probably would have been easier.", 'start': 7275.787, 'duration': 6.784}, {'end': 7291.756, 'text': "Okay, so what I'm going to do is split up, split this up.", 'start': 7282.591, 'duration': 9.165}, {'end': 7296.272, 'text': 'and that will be subplots.', 'start': 7293.63, 'duration': 2.642}], 'summary': 'Analyzing model performance with loss and accuracy plots.', 'duration': 38.581, 'max_score': 7257.691, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg7257691.jpg'}], 'start': 6405.899, 'title': 'Neural network training', 'summary': 'Explains back propagation, emphasizing the use of learning rate and uniform weight updates. it covers tensorflow for training models achieving 81% accuracy and decreasing loss. additionally, it discusses optimizing machine learning parameters through grid search and visualization.', 'chapters': [{'end': 6476.685, 'start': 6405.899, 'title': 'Back propagation in neural networks', 'summary': 'Explains the concept of back propagation in neural networks, emphasizing the use of learning rate to adjust convergence and prevent divergence, while making uniform updates to all weights based on the calculated loss gradients.', 'duration': 70.786, 'highlights': ['The learning rate (alpha) controls the speed of convergence or divergence in the neural network, impacting the time taken for convergence and potential divergence if set too high.', 'Uniform updates are made to all weights (W0, W1, Wn) after calculating the loss and its gradient, demonstrating the process of back propagation in adjusting the model.', 'The process involves calculating the loss, computing gradients, and making adjustments to the model weights, followed by iterating through the training set to repeat the loop.']}, {'end': 6984.948, 'start': 6477.085, 'title': 'Tensorflow for neural networks', 'summary': 'Covers the use of tensorflow for defining and training 
neural network models, including the process of defining layers, compiling the model, and training it with specified parameters, achieving a validation accuracy of around 81% and decreasing loss.', 'duration': 507.863, 'highlights': ['The chapter introduces the use of TensorFlow for defining neural network models, making it easy to create sequential neural nets with interconnected dense layers, and defining the output layer for classification.', 'The process of compiling the neural network model using TensorFlow is detailed, including specifying the optimizer, learning rate, loss function, and additional metrics like accuracy for evaluation.', 'The training process of the neural network model using TensorFlow is discussed, including setting the number of epochs, batch size, and validation split, leading to achieving a validation accuracy of around 81% and decreasing loss.']}, {'end': 7403.178, 'start': 6984.988, 'title': 'Optimizing machine learning parameters', 'summary': "Discusses the process of finding optimal hyperparameters for a machine learning model, including grid search for parameters like nodes, learning rate, dropout probability, batch size, and epochs, and visualizing the model's loss and accuracy.", 'duration': 418.19, 'highlights': ['The process of finding optimal hyperparameters for a machine learning model involves grid search for parameters like nodes, learning rate, dropout probability, batch size, and epochs.', "The model's training process includes changing the number of nodes, dropout probability, learning rate, batch size, and epochs to optimize its performance.", "Visualizing the model's loss and accuracy using subplots helps in understanding its training history and performance.", 'The dropout layer in the model randomly chooses certain nodes to not be trained in a given iteration, helping prevent overfitting.', "The function 'train model' is defined to take input variables X train, Y train, number of nodes, dropout probability, learning 
rate, batch size, and epochs to train the machine learning model."]}], 'duration': 997.279, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg6405899.jpg', 'highlights': ['The training process achieves a validation accuracy of around 81% and decreases loss', 'The learning rate controls the speed of convergence or divergence in the neural network', 'Uniform updates are made to all weights after calculating the loss and its gradient', 'The process of finding optimal hyperparameters involves grid search for parameters like nodes, learning rate, dropout probability, batch size, and epochs', "Visualizing the model's loss and accuracy using subplots helps in understanding its training history and performance"]}, {'end': 9379.254, 'segs': [{'end': 7657.688, 'src': 'embed', 'start': 7622.973, 'weight': 0, 'content': [{'end': 7626.716, 'text': "I don't know if this is the proper syntax, but that's probably what I should have done.", 'start': 7622.973, 'duration': 3.743}, {'end': 7630.038, 'text': "But instead, you know, we'll just stick with what we have here.", 'start': 7627.376, 'duration': 2.662}, {'end': 7637.276, 'text': "So you'll see at the end, with the 64 nodes, it seems like this is our best performance.", 'start': 7631.553, 'duration': 5.723}, {'end': 7645.661, 'text': '64 nodes with a dropout of 0.2, a learning rate of 0.001, and a batch size of 64.', 'start': 7637.737, 'duration': 7.924}, {'end': 7657.688, 'text': 'And it does seem like, yes, the validation, the fake validation, but the validation loss is decreasing and then the accuracy is increasing,', 'start': 7645.661, 'duration': 12.027}], 'summary': 'Best performance achieved with 64 nodes, 0.2 dropout, 0.001 learning rate, and batch size of 64.', 'duration': 34.715, 'max_score': 7622.973, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg7622973.jpg'}, {'end': 8206.201, 'src': 'embed', 'start': 
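The grid search described above — try every combination of node count, dropout probability, learning rate, and batch size, and keep the configuration with the lowest validation loss (in the video, 64 nodes, dropout 0.2, learning rate 0.001, batch size 64) — reduces to a few nested loops. The `train_model` function below is a hypothetical stand-in: in the course it trains a TensorFlow model and evaluates it on the validation split, here it just scores how far each combination is from the known-best one so the sketch is self-contained:

```python
from itertools import product

def train_model(num_nodes, dropout_prob, lr, batch_size):
    """Hypothetical stand-in for the course's training function.
    Returns a fake 'validation loss' that is smallest for the
    configuration the video found best."""
    target = (64, 0.2, 0.001, 64)
    params = (num_nodes, dropout_prob, lr, batch_size)
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(params, target))

best_loss, best_params = float("inf"), None
for num_nodes, dropout_prob, lr, batch_size in product(
        [16, 32, 64], [0.0, 0.2], [0.001, 0.01], [32, 64]):
    val_loss = train_model(num_nodes, dropout_prob, lr, batch_size)
    if val_loss < best_loss:  # keep the model with the least validation loss
        best_loss = val_loss
        best_params = (num_nodes, dropout_prob, lr, batch_size)

print(best_params)  # → (64, 0.2, 0.001, 64)
```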
8169.478, 'weight': 1, 'content': [{'end': 8182.303, 'text': 'some line of best fit that will help us decrease this measure of error with respect to all the data points that we have in our data set and try to come up with the best prediction for all of them.', 'start': 8169.478, 'duration': 12.825}, {'end': 8186.905, 'text': 'This is known as simple linear regression.', 'start': 8182.963, 'duration': 3.942}, {'end': 8198.075, 'text': 'regression And basically, that means you know, our equation looks something like this.', 'start': 8188.246, 'duration': 9.829}, {'end': 8206.201, 'text': "Now there's also multiple linear regression.", 'start': 8199.436, 'duration': 6.765}], 'summary': 'Using linear regression to minimize error and make predictions for data points.', 'duration': 36.723, 'max_score': 8169.478, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg8169478.jpg'}, {'end': 8271.489, 'src': 'embed', 'start': 8239.789, 'weight': 4, 'content': [{'end': 8242.95, 'text': 'Now you guys might have noticed that I have some assumptions over here.', 'start': 8239.789, 'duration': 3.161}, {'end': 8247.352, 'text': "And you might be asking, Okay, Kylie, what in the world do these assumptions mean? 
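The simple linear regression just described — a line of best fit that minimizes the squared residuals across all data points — has a closed-form solution. A minimal NumPy sketch on synthetic data (the slope and intercept values are illustrative assumptions, not from the course):

```python
import numpy as np

# Synthetic data following y = 2x + 5 plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2 * x + 5 + rng.normal(0, 1, 200)

# Least-squares estimates for y ≈ b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)  # the quantities the error metrics are built from
print(round(b1, 1), round(b0, 1))
```

Multiple linear regression is the same idea with several x columns; libraries such as scikit-learn solve both cases with one `fit` call.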
So let's go over them.", 'start': 8242.971, 'duration': 4.381}, {'end': 8251.055, 'text': 'The first one is linearity.', 'start': 8249.114, 'duration': 1.941}, {'end': 8256.277, 'text': "And what that means is, let's say I have a data set.", 'start': 8253.876, 'duration': 2.401}, {'end': 8268.005, 'text': 'linearity just means okay, my, does my data follow a linear pattern??', 'start': 8263.763, 'duration': 4.242}, {'end': 8271.489, 'text': 'Does y increase as x increases??', 'start': 8268.467, 'duration': 3.022}], 'summary': 'Kylie discusses assumptions, including linearity in data analysis.', 'duration': 31.7, 'max_score': 8239.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg8239789.jpg'}, {'end': 8912.914, 'src': 'embed', 'start': 8884.903, 'weight': 2, 'content': [{'end': 8889.287, 'text': 'And we take the square root of that mean squared error.', 'start': 8884.903, 'duration': 4.384}, {'end': 8897.541, 'text': "And so now the term in which you know, we're defining our error is now in terms of that dollar sign symbol again.", 'start': 8889.867, 'duration': 7.674}, {'end': 8902.245, 'text': "So that's a pro of root mean squared error is that now we can say okay,", 'start': 8897.861, 'duration': 4.384}, {'end': 8907.309, 'text': 'our error according to this metric is this many dollar signs off from our predictor?', 'start': 8902.245, 'duration': 5.064}, {'end': 8912.914, 'text': "Okay, so it's in the same unit, which is one of the pros of root mean squared error.", 'start': 8907.329, 'duration': 5.585}], 'summary': 'The root mean squared error allows expressing error in dollars and same unit.', 'duration': 28.011, 'max_score': 8884.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg8884903.jpg'}, {'end': 9212.398, 'src': 'embed', 'start': 9189.039, 'weight': 3, 'content': [{'end': 9196.086, 'text': "And now when r squared is towards one, that 
means that that's usually a sign that we have a good predictor.", 'start': 9189.039, 'duration': 7.047}, {'end': 9201.39, 'text': "It's one of the signs, not the only one.", 'start': 9199.769, 'duration': 1.621}, {'end': 9206.534, 'text': "So over here, I also have, you know, that there's this adjusted R squared.", 'start': 9202.531, 'duration': 4.003}, {'end': 9209.416, 'text': 'And what that does, it just adjusts for the number of terms.', 'start': 9206.614, 'duration': 2.802}, {'end': 9212.398, 'text': 'So x1, x2, x3, etc.', 'start': 9209.916, 'duration': 2.482}], 'summary': 'High r squared indicates good prediction, adjusted r squared accounts for terms.', 'duration': 23.359, 'max_score': 9189.039, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg9189039.jpg'}], 'start': 7404.779, 'title': 'Model training, evaluation, and regression in ml', 'summary': 'Covers training and evaluating neural network models achieving 88% accuracy, comparing neural nets with svm model, and explaining linear regression with examples, assumptions, and evaluation methods such as mae and mse.', 'chapters': [{'end': 7738.544, 'start': 7404.779, 'title': 'Neural network model training and evaluation', 'summary': 'Details the process of training a neural network model with specific parameters, evaluating its performance, and selecting the model with the least validation loss, achieving 88% accuracy with 64 nodes, 0.2 dropout, and 0.001 learning rate.', 'duration': 333.765, 'highlights': ['The model achieved an accuracy of around 88% with 64 nodes, 0.2 dropout, and 0.001 learning rate.', 'The chapter describes the process of selecting the model with the least validation loss.', 'The transcript explains the difference between accuracy on the validation data set and during training.']}, {'end': 8239.329, 'start': 7739.912, 'title': 'Model evaluation and regression in ml', 'summary': 'Discusses model evaluation, comparing neural nets 
with svm model, achieving an 87% accuracy score, and then shifts to regression, explaining linear regression, finding the line of best fit, and minimizing the sum of residuals for prediction.', 'duration': 499.417, 'highlights': ['Explaining model evaluation and comparison with SVM model', 'Introduction to regression and explaining linear regression', 'Minimizing sum of residuals in linear regression']}, {'end': 8537.552, 'start': 8239.789, 'title': 'Assumptions of linear regression', 'summary': 'Explains the assumptions of linear regression, including linearity, independence, normality, and homoscedasticity, with examples and implications for the appropriateness of using linear regression.', 'duration': 297.763, 'highlights': ['The first assumption of linear regression is linearity, where the data should follow a linear trajectory, with y increasing or decreasing at a constant rate as x increases.', 'The second assumption is independence, meaning that each point in the data set should have no influence on the others and should be independent.', 'Normality and homoscedasticity are the other assumptions, where the residuals should be normally distributed around the line of best fit, and the variance of the points should remain constant throughout.']}, {'end': 8827.965, 'start': 8537.852, 'title': 'Evaluating linear regression models', 'summary': 'Discusses evaluating linear regression models through mean absolute error (mae) and mean squared error (mse), providing insights into their mathematical formulations, advantages, and applications in quantifying prediction accuracy.', 'duration': 290.113, 'highlights': ['Mean Squared Error (MSE)', 'Mean Absolute Error (MAE)']}, {'end': 9379.254, 'start': 8829.212, 'title': 'Evaluating linear regression models', 'summary': 'Covers the evaluation of linear regression models, discussing the pros and cons of mean squared error, root mean squared error, and coefficient of determination (r squared) and providing an example of 
predicting bike rental count using regression.', 'duration': 550.042, 'highlights': ['The chapter discusses the pro of root mean squared error, which allows for error measurement in the same unit as the predictor, providing a more intuitive understanding of the error.', 'The formula for the coefficient of determination (r squared) is explained, indicating that a higher r squared value is a sign of a good predictor.', 'An example of predicting bike rental count using regression is provided, demonstrating the practical application of linear regression in a supervised learning context.']}], 'duration': 1974.475, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg7404779.jpg', 'highlights': ['The model achieved an accuracy of around 88% with 64 nodes, 0.2 dropout, and 0.001 learning rate.', 'Introduction to regression and explaining linear regression', 'The chapter discusses the pro of root mean squared error, which allows for error measurement in the same unit as the predictor, providing a more intuitive understanding of the error.', 'The formula for the coefficient of determination (r squared) is explained, indicating that a higher r squared value is a sign of a good predictor.', 'The first assumption of linear regression is linearity, where the data should follow a linear trajectory, with y increasing or decreasing at a constant rate as x increases.']}, {'end': 11523.616, 'segs': [{'end': 9805.495, 'src': 'embed', 'start': 9779.306, 'weight': 1, 'content': [{'end': 9790.669, 'text': "So let me now data frame and I'm going to drop wind, visibility, and functional.", 'start': 9779.306, 'duration': 11.363}, {'end': 9792.189, 'text': 'All right.', 'start': 9791.969, 'duration': 0.22}, {'end': 9795.331, 'text': 'And the axis again is a column.', 'start': 9793.45, 'duration': 1.881}, {'end': 9796.051, 'text': "So that's one.", 'start': 9795.371, 'duration': 0.68}, {'end': 9805.495, 'text': 'So if I look at my data set, 
now, I have just the temperature, the humidity, the dew point temperature, radiation, rain, and snow.', 'start': 9796.992, 'duration': 8.503}], 'summary': 'Data frame modified: wind, visibility, and functional dropped, leaving 6 columns.', 'duration': 26.189, 'max_score': 9779.306, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg9779306.jpg'}, {'end': 10122.94, 'src': 'embed', 'start': 10090.841, 'weight': 2, 'content': [{'end': 10096.224, 'text': "So if I look at x train temp, it's literally just the temperature.", 'start': 10090.841, 'duration': 5.383}, {'end': 10100.726, 'text': "Okay, and I'm doing this first to show you simple linear regression.", 'start': 10096.244, 'duration': 4.482}, {'end': 10104.008, 'text': 'Alright, so right now I can create a regressor.', 'start': 10100.746, 'duration': 3.262}, {'end': 10107.59, 'text': 'So I can say the temp regressor here.', 'start': 10104.888, 'duration': 2.702}, {'end': 10111.936, 'text': "And then I'm going to, you know, make a linear regression model.", 'start': 10108.675, 'duration': 3.261}, {'end': 10122.94, 'text': 'And just like before, I can simply fix fit my x train temp, y train temp in order to train train this linear regression model.', 'start': 10112.096, 'duration': 10.844}], 'summary': 'Demonstrating simple linear regression using temperature data.', 'duration': 32.099, 'max_score': 10090.841, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg10090841.jpg'}, {'end': 10515, 'src': 'embed', 'start': 10457.368, 'weight': 0, 'content': [{'end': 10462.25, 'text': "So let's go ahead and also score this regressor and let's see how the R squared performs now.", 'start': 10457.368, 'duration': 4.882}, {'end': 10469.712, 'text': 'So if I test this on the test data set, what happens? 
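The evaluation metrics discussed in this section — MAE, MSE, RMSE, and the coefficient of determination R² — are each a one-liner in NumPy; `score` on a scikit-learn regressor reports the same R² value. The predictions below are made up for illustration:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 10.0, 12.0])
y_pred = np.array([2.5, 5.5, 7.0, 11.0, 12.5])

residuals = y_true - y_pred
mae = np.mean(np.abs(residuals))   # mean absolute error → 0.6
mse = np.mean(residuals ** 2)      # mean squared error → 0.4
rmse = np.sqrt(mse)                # back in the units of y (e.g. bike count)
# R^2 = 1 - (sum of squared residuals / total sum of squares)
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, rmse, round(r2, 3))
```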
All right.', 'start': 10462.39, 'duration': 7.322}, {'end': 10471.393, 'text': 'So our R squared seems to improve.', 'start': 10469.752, 'duration': 1.641}, {'end': 10473.314, 'text': 'It went from 0.4 to 0.52, which is a good sign.', 'start': 10471.413, 'duration': 1.901}, {'end': 10481.524, 'text': "Okay, And I can't necessarily plot.", 'start': 10473.334, 'duration': 8.19}, {'end': 10483.405, 'text': 'you know every single dimension.', 'start': 10481.524, 'duration': 1.881}, {'end': 10487.208, 'text': 'but this is just to say okay, this is improved, right?', 'start': 10483.405, 'duration': 3.803}, {'end': 10488.389, 'text': 'All right.', 'start': 10487.228, 'duration': 1.161}, {'end': 10493.992, 'text': 'so one cool thing that you can do with TensorFlow is you can actually do regression, but with a neural net.', 'start': 10488.389, 'duration': 5.603}, {'end': 10500.556, 'text': "So here I'm going to.", 'start': 10497.114, 'duration': 3.442}, {'end': 10509.556, 'text': 'we already have our training data for just the temperature and just for all the different columns.', 'start': 10504.372, 'duration': 5.184}, {'end': 10512.278, 'text': "So I'm not gonna bother with splitting up the data again.", 'start': 10509.596, 'duration': 2.682}, {'end': 10515, 'text': "I'm just gonna go ahead and start building the model.", 'start': 10513.378, 'duration': 1.622}], 'summary': 'Improved r squared from 0.4 to 0.52 indicates progress in regression analysis using tensorflow.', 'duration': 57.632, 'max_score': 10457.368, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg10457368.jpg'}, {'end': 11505.164, 'src': 'embed', 'start': 11478.633, 'weight': 4, 'content': [{'end': 11486.841, 'text': "So you can see that our neural net for the larger values, it seems like it's a little bit more spread out.", 'start': 11478.633, 'duration': 8.208}, {'end': 11492.447, 'text': 'And it seems like we tend to underestimate a little bit down 
here in this area.', 'start': 11487.141, 'duration': 5.306}, {'end': 11499.254, 'text': 'Okay And for some reason, these are way off as well.', 'start': 11492.467, 'duration': 6.787}, {'end': 11505.164, 'text': "But yeah, so we've basically used a linear regressor and a neural net.", 'start': 11500.981, 'duration': 4.183}], 'summary': 'Neural net underestimates for larger values, linear regressor also used.', 'duration': 26.531, 'max_score': 11478.633, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg11478633.jpg'}], 'start': 9380.25, 'title': 'Data analysis and regression modeling', 'summary': 'Covers importing and describing a dataset, cleaning and analyzing csv data using python, preparing datasets for linear regression and neural net modeling, achieving an r-squared score of around 0.38, implementing regression in tensorflow, demonstrating training of neural net models that predict bike count from temperature, and comparing the performance of linear regression and neural net models, including insights into the appropriateness of each model.', 'chapters': [{'end': 9450.711, 'start': 9380.25, 'title': 'Importing and describing data set', 'summary': 'Covers importing a data set and describing its attributes, including bike count, hour, temperature, humidity, wind, visibility, dew point, radiation, rain, snow, and functional, with a drag-and-drop import method.', 'duration': 70.461, 'highlights': ['The data set includes attributes such as bike count, hour, temperature, humidity, wind, visibility, dew point, radiation, rain, snow, and functional.', 'The import method involves dragging and dropping the data for easy access and analysis.', 'Credit is given to the source of the data set, UCI, and the process of importing is described.']}, {'end': 9718.856, 'start': 9452.409, 'title': 'Data cleaning and analysis with python', 'summary': 'Explains the process of cleaning and analyzing csv data using python, including removing
unnecessary columns, converting data to integers, filtering data based on specific criteria, and visualizing the relationships between different variables.', 'duration': 266.447, 'highlights': ['Creating a data frame from a CSV file and removing unnecessary columns, resulting in a more manageable dataset.', "Converting data to integers based on specific criteria, such as mapping 'yes' to 1, to facilitate computer processing.", 'Filtering the data frame to include only entries where the hour equals 12, simplifying the example and focusing on specific data points.', 'Visualizing the relationships between different variables by plotting bike count against specific labels, providing insights into their impact on the bike count.']}, {'end': 10333.236, 'start': 9719.557, 'title': 'Linear regression for bike prediction', 'summary': 'Covers the process of preparing the dataset for linear regression analysis, including dropping irrelevant features, splitting the data into training, validation, and test sets, and demonstrating simple linear regression for predicting the number of bikes based on temperature, achieving an r-squared score of around 0.38.', 'duration': 613.679, 'highlights': ['Dropping irrelevant features like wind, visibility, and functional to focus on temperature, humidity, dew point temperature, radiation, rain, and snow.', 'Splitting the data into training, validation, and test sets using numpy.split, with the training set containing 80% of the data.', 'Demonstrating simple linear regression for predicting the number of bikes based on temperature, achieving an R-squared score of around 0.38.']}, {'end': 10614.528, 'start': 10334.471, 'title': 'Linear regression and neural net in tensorflow', 'summary': 'Discusses the implementation of linear regression and multiple linear regression to predict bike count from temperature and other features, with the r squared value improving from 0.4 to 0.52, and then introduces the use of a neural net for regression in tensorflow.', 'duration': 280.057,
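The `numpy.split` call mentioned above for carving a shuffled dataset into training, validation, and test sets can be sketched as follows; the 60/20/20 fractions and the stand-in array are illustrative assumptions:

```python
import numpy as np

# Hypothetical stand-in for the shuffled rows of the bikes DataFrame
data = np.arange(1000).reshape(500, 2)
np.random.default_rng(0).shuffle(data)

# Cut at the 60% and 80% marks: everything before the first index is
# training, between the two is validation, after the second is test
train, valid, test = np.split(
    data, [int(0.6 * len(data)), int(0.8 * len(data))])

print(len(train), len(valid), len(test))  # → 300 100 100
```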
'highlights': ["The R squared value improved from 0.4 to 0.52 after implementing multiple linear regression, indicating a positive impact on the model's performance.", 'Introduction of using a neural net for regression in TensorFlow, with the implementation of normalization and a single dense layer with one unit to build the model.']}, {'end': 11101.013, 'start': 10614.528, 'title': 'Neural net regression training', 'summary': 'Demonstrates the training of a neural net model for bike-count prediction from temperature, with two different setups, including a single node and a real neural net, using various parameters and evaluation metrics. The chapter also discusses the comparison with linear regression and the challenges of working with machine learning.', 'duration': 486.485, 'highlights': ['The chapter demonstrates the training of a neural net model for bike-count prediction from temperature, with two different setups, including a single node and a real neural net.', "The training involves setting different parameters, such as learning rate, loss function, and epochs, to optimize the model's performance.", 'The comparison between linear regression and neural net models is discussed, highlighting the different approaches and training processes.', 'Challenges and considerations in working with machine learning models, such as the need for experimentation and the impact of data distribution on model performance, are addressed.']}, {'end': 11523.616, 'start': 11103.055, 'title': 'Comparison of linear regressor and neural net', 'summary': 'Discusses training a neural net model and comparing its mean squared error with a linear regressor, revealing that the neural net has a larger mean squared error, despite the decreasing loss curve and insights into the appropriateness of each model.', 'duration': 420.561, 'highlights': ['The neural net model has a larger mean squared error than the linear regressor, despite the decreasing loss curve, indicating the potential limitations of the neural net in this
context.', 'The chapter emphasizes the importance of determining the most appropriate model for a specific context, with instances where a multiple linear regressor might work better than a neural net, highlighting the need for practical evaluation of model performance.', 'Insights are provided into the differences in prediction patterns between the linear regressor and neural net, showcasing that the neural net tends to underestimate values in certain areas, while the linear regressor aligns closely with the true values, offering practical observations for model performance.']}], 'duration': 2143.366, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg9380250.jpg', 'highlights': ['The R squared value improved from 0.4 to 0.52 after implementing multiple linear regression', 'The data set includes attributes such as bike count, hour, temperature, humidity, wind, visibility, dew point, radiation, rain, snow, and functional', 'Demonstrating simple linear regression for predicting the number of bikes based on temperature, achieving an R-squared score of around 0.38', 'The chapter demonstrates the training of a neural net model for bike-count prediction from temperature, with two different setups, including a single node and a real neural net', 'Insights are provided into the differences in prediction patterns between the linear regressor and neural net, showcasing that the neural net tends to underestimate values in certain areas, while the linear regressor aligns closely with the true values']}, {'end': 12133.58, 'segs': [{'end': 11552.392, 'src': 'embed', 'start': 11524.176, 'weight': 2, 'content': [{'end': 11530.641, 'text': 'But for example, with the one-dimensional case, a linear regressor would never be able to see this curve.', 'start': 11524.176, 'duration': 6.465}, {'end': 11539.689, 'text': "I mean I'm not saying this is a great model either, but I'm just saying like hey,", 'start': 11533.263, 'duration': 6.426}, {'end':
11542.993, 'text': "sometimes it might be more appropriate to use something that's not linear.", 'start': 11539.689, 'duration': 3.304}, {'end': 11549.091, 'text': 'So yeah, I will leave regression at that.', 'start': 11545.43, 'duration': 3.661}, {'end': 11552.392, 'text': 'Okay, so we just talked about supervised learning.', 'start': 11549.951, 'duration': 2.441}], 'summary': 'Linear regressor is limited in capturing non-linear relationships; suggesting the need for alternative models in certain cases.', 'duration': 28.216, 'max_score': 11524.176, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg11524176.jpg'}, {'end': 11609.692, 'src': 'embed', 'start': 11578.032, 'weight': 0, 'content': [{'end': 11582.275, 'text': 'So with unsupervised learning, we have a bunch of unlabeled data.', 'start': 11578.032, 'duration': 4.243}, {'end': 11593.622, 'text': "And what can we do with that? Can we learn anything from this data? So the first algorithm that we're gonna discuss is known as k-means clustering.", 'start': 11583.215, 'duration': 10.407}, {'end': 11602.667, 'text': "What k-means clustering is trying to do is it's trying to compute k clusters from the data.", 'start': 11594.202, 'duration': 8.465}, {'end': 11609.692, 'text': 'So in this example below, I have a bunch of scattered points.', 'start': 11605.808, 'duration': 3.884}], 'summary': 'Unsupervised learning: using k-means clustering to compute k clusters from unlabeled data.', 'duration': 31.66, 'max_score': 11578.032, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg11578032.jpg'}, {'end': 11882.71, 'src': 'embed', 'start': 11855.228, 'weight': 3, 'content': [{'end': 11863.554, 'text': "So there's this group here, there's this group here, and then blue is kind of just this group here.", 'start': 11855.228, 'duration': 8.326}, {'end': 11865.576, 'text': "It hasn't really touched any of the points 
yet.", 'start': 11863.735, 'duration': 1.841}, {'end': 11875.164, 'text': 'So the next step, three, that we do is we actually go and we recalculate the centroids.', 'start': 11867.758, 'duration': 7.406}, {'end': 11882.71, 'text': 'So we compute new centroids based on the points that we have in all the centroids.', 'start': 11875.224, 'duration': 7.486}], 'summary': 'Data points recalculated to update centroids for groups.', 'duration': 27.482, 'max_score': 11855.228, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg11855228.jpg'}, {'end': 12153.31, 'src': 'embed', 'start': 12114.871, 'weight': 4, 'content': [{'end': 12122.795, 'text': "And so that's when we know that we can stop iterating between steps two and three is when we've converged on some solution,", 'start': 12114.871, 'duration': 7.924}, {'end': 12125.276, 'text': "when we've reached some stable point.", 'start': 12122.795, 'duration': 2.481}, {'end': 12131.8, 'text': 'And so now, because none of these points are really changing out of their clusters anymore, we can go back to the user and say hey,', 'start': 12126.197, 'duration': 5.603}, {'end': 12133.58, 'text': 'these are our three clusters.', 'start': 12131.8, 'duration': 1.78}, {'end': 12153.31, 'text': "And this process, something known as expectation maximization, This part where we're assigning the points to the closest centroid.", 'start': 12135.441, 'duration': 17.869}], 'summary': 'Using expectation maximization, reaching stable clusters allows stopping iteration.', 'duration': 38.439, 'max_score': 12114.871, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12114871.jpg'}], 'start': 11524.176, 'title': 'Unsupervised learning: k-means clustering', 'summary': 'Discusses the need for non-linear models in regression, introduces unsupervised learning, and explains the k-means clustering algorithm which aims to compute k clusters from 
unlabeled data by iteratively choosing centroids, assigning points to the closest centroid, and recalculating centroids. it also explains the iterative process of computing centroids and clusters, resulting in the formation of stable clusters.', 'chapters': [{'end': 11906.864, 'start': 11524.176, 'title': 'Unsupervised learning: k-means clustering', 'summary': 'Discusses the need for non-linear models in regression, introduces unsupervised learning, and explains the k-means clustering algorithm, which aims to compute k clusters from unlabeled data by iteratively choosing centroids, assigning points to the closest centroid, and recalculating centroids.', 'duration': 382.688, 'highlights': ['The chapter emphasizes the need for non-linear models in regression to capture complex relationships in data.', 'The chapter introduces unsupervised learning and the concept of working with unlabeled data to derive insights.', 'The k-means clustering algorithm is explained, detailing the iterative process of choosing centroids, assigning points to the closest centroid, and recalculating centroids to form k clusters from unlabeled data.']}, {'end': 12133.58, 'start': 11908.744, 'title': 'Iterative centroid computation', 'summary': 'Explains the iterative process of computing centroids and clusters, in which the points are assigned to the closest centroid based on distance, resulting in the formation of stable clusters.', 'duration': 224.836, 'highlights': ['The process involves iteratively recomputing centroids and assigning points to the closest centroid based on distance.', 'Stability is achieved when the points no longer change clusters after iterations, indicating convergence to a solution.']}], 'duration': 609.404, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg11524176.jpg', 'highlights': ['The k-means clustering algorithm is explained, detailing the iterative process of choosing centroids, assigning points to the 
closest centroid, and recalculating centroids to form k clusters from unlabeled data.', 'The chapter introduces unsupervised learning and the concept of working with unlabeled data to derive insights.', 'The chapter emphasizes the need for non-linear models in regression to capture complex relationships in data.', 'The process involves iteratively recomputing centroids and assigning points to the closest centroid based on distance.', 'Stability is achieved when the points no longer change clusters after iterations, indicating convergence to a solution.']}, {'end': 12832.72, 'segs': [{'end': 12200.733, 'src': 'embed', 'start': 12135.441, 'weight': 0, 'content': [{'end': 12153.31, 'text': "And this process, something known as expectation maximization, This part where we're assigning the points to the closest centroid.", 'start': 12135.441, 'duration': 17.869}, {'end': 12157.654, 'text': 'this is something this is our expectation step.', 'start': 12153.31, 'duration': 4.344}, {'end': 12173.986, 'text': "And this part where we're computing the new centroids, this is our maximization step, okay? 
So that's expectation maximization.", 'start': 12159.956, 'duration': 14.03}, {'end': 12183.948, 'text': 'And we use this in order to compute the centroids and assign all the points to clusters according to those centroids.', 'start': 12175.187, 'duration': 8.761}, {'end': 12190.25, 'text': "And then we're recomputing all that over again until we reach some stable point where nothing is changing anymore.", 'start': 12184.648, 'duration': 5.602}, {'end': 12196.132, 'text': "Alright, so that's our first example of unsupervised learning.", 'start': 12190.27, 'duration': 5.862}, {'end': 12200.733, 'text': 'And basically, what this is doing is trying to find some structure, some pattern in the data.', 'start': 12196.432, 'duration': 4.301}], 'summary': 'Using expectation maximization for unsupervised learning to find data patterns and structure.', 'duration': 65.292, 'max_score': 12135.441, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12135441.jpg'}, {'end': 12282.832, 'src': 'embed', 'start': 12250.543, 'weight': 2, 'content': [{'end': 12258.887, 'text': 'And what do I mean by dimensionality reduction is: if I have a bunch of features like x1, x2, x3, x4, et cetera,', 'start': 12250.543, 'duration': 8.344}, {'end': 12261.929, 'text': 'can I just reduce that down to one dimension?', 'start': 12258.887, 'duration': 3.042}, {'end': 12266.391, 'text': 'that gives me the most information about how all of these points are spread relative to one another.', 'start': 12261.929, 'duration': 4.462}, {'end': 12268.832, 'text': "And that's what PCA is for.", 'start': 12267.171, 'duration': 1.661}, {'end': 12282.832, 'text': "So PCA, principal component analysis: let's say I have some points in the x0 and x1 feature space.", 'start': 12268.952, 'duration': 13.88}], 'summary': 'PCA reduces multi-dimensional data to one dimension to capture key information.', 'duration': 32.289, 'max_score': 12250.543, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12250543.jpg'}, {'end': 12379.399, 'src': 'embed', 'start': 12349.558, 'weight': 3, 'content': [{'end': 12357.543, 'text': 'How do we display, how do we demonstrate, that this point is further away from this point than this point?', 'start': 12349.558, 'duration': 7.985}, {'end': 12363.828, 'text': 'And we can do that using principal component analysis.', 'start': 12360.885, 'duration': 2.943}, {'end': 12369.533, 'text': 'Take what you know about linear regression and just forget about it for a second.', 'start': 12366.451, 'duration': 3.082}, {'end': 12370.814, 'text': 'Otherwise you might get confused.', 'start': 12369.593, 'duration': 1.221}, {'end': 12379.399, 'text': 'PCA is a way of trying to find the direction in the space with the largest variance.', 'start': 12371.114, 'duration': 8.285}], 'summary': 'Demonstrating distance using PCA to find direction in space.', 'duration': 29.841, 'max_score': 12349.558, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12349558.jpg'}, {'end': 12683.036, 'src': 'embed', 'start': 12655.117, 'weight': 4, 'content': [{'end': 12663.899, 'text': "And we're saying, alright, how much distance is there between that projection residual,", 'start': 12655.117, 'duration': 8.782}, {'end': 12667.119, 'text': "and we're trying to minimize that for all of these points.", 'start': 12663.899, 'duration': 3.22}, {'end': 12683.036, 'text': 'So that actually equates to this largest variance dimension, this dimension here, the PCA dimension.', 'start': 12668.399, 'duration': 14.637}], 'summary': 'Minimizing projection residual to find largest variance dimension in PCA.', 'duration': 27.919, 'max_score': 12655.117, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12655117.jpg'}], 'start': 
12135.441, 'title': 'Expectation maximization and principal component analysis', 'summary': 'Covers expectation maximization for unsupervised learning to compute centroids and assign points to clusters, and principal component analysis for dimensionality reduction and data visualization.', 'chapters': [{'end': 12224.778, 'start': 12135.441, 'title': 'Expectation maximization in unsupervised learning', 'summary': 'Explains the expectation maximization process, used in unsupervised learning to compute centroids and assign points to clusters until reaching a stable point, aiming to find structure and patterns in the data.', 'duration': 89.337, 'highlights': ['The process involves assigning points to the closest centroid in the expectation step and computing new centroids in the maximization step, iteratively recomputing until reaching a stable point.', 'Unsupervised learning aims to find structure and patterns in data by clustering points based on their proximity to each other, helping to identify which cluster a new point should belong to.']}, {'end': 12832.72, 'start': 12225.999, 'title': 'Principal component analysis', 'summary': 'Explains principal component analysis (pca) as a technique for dimensionality reduction, allowing the transformation of multi-dimensional data to a lower dimension while retaining the most information, which is crucial for data visualization and model building.', 'duration': 606.721, 'highlights': ['PCA is used for dimensionality reduction to transform multi-dimensional data into a lower dimension while retaining the most information.', 'It helps in finding a direction in the space with the largest variance to represent the dataset in a lower dimension.', 'PCA minimizes projection residuals or maximizes the variance between points to identify the dimension with the largest variance.']}], 'duration': 697.279, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12135441.jpg', 
'highlights': ['Unsupervised learning aims to find structure and patterns in data by clustering points based on their proximity to each other, helping to identify which cluster a new point should belong to.', 'The process involves assigning points to the closest centroid in the expectation step and computing new centroids in the maximization step, iteratively recomputing until reaching a stable point.', 'PCA is used for dimensionality reduction to transform multi-dimensional data into a lower dimension while retaining the most information.', 'PCA helps in finding a direction in the space with the largest variance to represent the dataset in a lower dimension.', 'PCA minimizes projection residuals or maximizes the variance between points to identify the dimension with the largest variance.']}, {'end': 14031.881, 'segs': [{'end': 12862.523, 'src': 'embed', 'start': 12833.221, 'weight': 4, 'content': [{'end': 12839.627, 'text': "Now, finally, let's move on to implementing the unsupervised learning part of this class.", 'start': 12833.221, 'duration': 6.406}, {'end': 12847.693, 'text': "Here again, I'm on the UCI machine learning repository and I have a seeds data set where, you know,", 'start': 12840.268, 'duration': 7.425}, {'end': 12851.596, 'text': 'I have a bunch of kernels that belong to three different types of wheat.', 'start': 12847.693, 'duration': 3.903}, {'end': 12853.937, 'text': "So there's Kama, Rosa and Canadian.", 'start': 12851.656, 'duration': 2.281}, {'end': 12862.523, 'text': 'And the different features that we have access to are, you know, geometric parameters of those wheat kernels.', 'start': 12855.018, 'duration': 7.505}], 'summary': 'Implementing unsupervised learning on the UCI seeds dataset with three types of wheat kernels', 'duration': 29.302, 'max_score': 12833.221, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12833221.jpg'}, {'end': 12951.721, 'src': 'embed', 'start': 
12906.396, 'weight': 3, 'content': [{'end': 12912.74, 'text': "Okay, and then we're going to import all the classics again, so pandas.", 'start': 12906.396, 'duration': 6.344}, {'end': 12929.856, 'text': "And then I'm also going to import Seaborn, because I'm going to want that for this specific class.", 'start': 12924.034, 'duration': 5.822}, {'end': 12935.577, 'text': 'Okay. Great.', 'start': 12931.696, 'duration': 3.881}, {'end': 12951.721, 'text': 'So now our columns that we have in our seed data set are the area, the perimeter, the compactness, the length, width, asymmetry,', 'start': 12936.177, 'duration': 15.544}], 'summary': 'Importing pandas and Seaborn for this specific class.', 'duration': 45.325, 'max_score': 12906.396, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12906396.jpg'}, {'end': 13572.322, 'src': 'embed', 'start': 13546.338, 'weight': 1, 'content': [{'end': 13560.241, 'text': "This one's messy, right? So if I come down here and I say compactness and asymmetry, and I'm trying to do this in 2D, this is what my scatter plot.", 'start': 13546.338, 'duration': 13.903}, {'end': 13562.442, 'text': 'So this is what you know.', 'start': 13560.281, 'duration': 2.161}, {'end': 13567.524, 'text': 'my k-means is telling me for these two dimensions, for compactness and asymmetry.', 'start': 13562.442, 'duration': 5.082}, {'end': 13572.322, 'text': 'if we just look at those two, These are our three classes, right?', 'start': 13567.524, 'duration': 4.798}], 'summary': 'Using k-means, identified 3 classes based on compactness and asymmetry in scatter plot.', 'duration': 25.984, 'max_score': 13546.338, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg13546338.jpg'}, {'end': 13698.217, 'src': 'embed', 'start': 13674.284, 'weight': 2, 'content': [{'end': 13681.887, 'text': "So, PCA, we're reducing the dimension, but we're mapping all these, like you
know, seven dimensions.", 'start': 13674.284, 'duration': 7.603}, {'end': 13683.548, 'text': "I don't know if there are seven.", 'start': 13681.887, 'duration': 1.661}, {'end': 13684.209, 'text': 'I made that number up.', 'start': 13683.548, 'duration': 0.661}, {'end': 13688.251, 'text': "But we're mapping multiple dimensions into a lower dimension number.", 'start': 13684.249, 'duration': 4.002}, {'end': 13691.093, 'text': "Right And so let's see how that works.", 'start': 13689.132, 'duration': 1.961}, {'end': 13698.217, 'text': 'So from SK learn, decomposition, I can import PCA, and that will be my PCA model.', 'start': 13692.293, 'duration': 5.924}], 'summary': 'Pca reduces multiple dimensions into a lower dimension number.', 'duration': 23.933, 'max_score': 13674.284, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg13674284.jpg'}, {'end': 14029.679, 'src': 'heatmap', 'start': 13892.797, 'weight': 1, 'content': [{'end': 13894.638, 'text': 'So let me actually take these two.', 'start': 13892.797, 'duration': 1.841}, {'end': 13907.782, 'text': 'Instead of the cluster data frame, I want the, this is the k-means.', 'start': 13902.42, 'duration': 5.362}, {'end': 13915.565, 'text': 'PCA data frame, this is still going to be class, but now x and y are going to be the two PCA dimensions.', 'start': 13908.8, 'duration': 6.765}, {'end': 13921.469, 'text': 'Okay So these are my two PCA dimensions.', 'start': 13916.586, 'duration': 4.883}, {'end': 13925.873, 'text': "And you can see that, you know, they're, they're pretty spread out.", 'start': 13921.529, 'duration': 4.344}, {'end': 13931.697, 'text': "And then here, I'm going to go to my truth classes.", 'start': 13927.974, 'duration': 3.723}, {'end': 13937.241, 'text': "Again, it's PCA one PCA two, but instead of K means this should be truth, PCA data frame.", 'start': 13932.337, 'duration': 4.904}, {'end': 13949.285, 'text': 'So you can see that, like in the truth 
data frame, along these two dimensions, we actually are doing fairly well in terms of separation, right?', 'start': 13939.218, 'duration': 10.067}, {'end': 13957.651, 'text': 'It does seem like this is slightly more separable than the other, like dimensions that we had been looking at up here.', 'start': 13949.705, 'duration': 7.946}, {'end': 13960.593, 'text': "So that's a good sign.", 'start': 13959.592, 'duration': 1.001}, {'end': 13966.096, 'text': 'And up here, you can see that, hey, some of these correspond to one another.', 'start': 13962.874, 'duration': 3.222}, {'end': 13977.443, 'text': 'I mean, for the most part, our algorithm, our unsupervised clustering algorithm is able to give us, is able to spit out what the proper labels are.', 'start': 13966.136, 'duration': 11.307}, {'end': 13983.327, 'text': 'I mean if you map these specific labels to the different types of kernels.', 'start': 13977.724, 'duration': 5.603}, {'end': 13987.009, 'text': 'but for example this one might all be the Kama kernels and same here,', 'start': 13983.327, 'duration': 3.682}, {'end': 13990.572, 'text': 'and then these might all be the Canadian kernels and these might all be the Canadian kernels.', 'start': 13987.009, 'duration': 3.563}, {'end': 13996.381, 'text': 'So it does struggle a little bit with where they overlap,', 'start': 13991.658, 'duration': 4.723}, {'end': 14006.688, 'text': 'but for the most part our algorithm is able to find the three different categories and do a fairly good job at predicting them without any information from us.', 'start': 13996.381, 'duration': 10.307}, {'end': 14009.81, 'text': "We haven't given our algorithm any labels.", 'start': 14006.768, 'duration': 3.042}, {'end': 14012.103, 'text': "So that's the gist of unsupervised learning.", 'start': 14010.481, 'duration': 1.622}, {'end': 14014.785, 'text': 'I hope you guys enjoyed this course.', 'start': 14012.963, 'duration': 1.822}, {'end': 14017.628, 'text': 'I hope a lot of these examples
made sense.', 'start': 14014.985, 'duration': 2.643}, {'end': 14026.816, 'text': "If there are certain things that I have done and you're somebody with more experience than me, please feel free to correct me in the comments,", 'start': 14018.729, 'duration': 8.087}, {'end': 14029.679, 'text': 'and we can all, as a community, learn from this together.', 'start': 14026.816, 'duration': 2.863}], 'summary': 'Unsupervised learning algorithm predicts three categories fairly well without labels.', 'duration': 136.882, 'max_score': 13892.797, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg13892797.jpg'}, {'end': 14017.628, 'src': 'embed', 'start': 13991.658, 'weight': 0, 'content': [{'end': 13996.381, 'text': 'So it does struggle a little bit with where they overlap,', 'start': 13991.658, 'duration': 4.723}, {'end': 14006.688, 'text': 'but for the most part our algorithm is able to find the three different categories and do a fairly good job at predicting them without any information from us.', 'start': 13996.381, 'duration': 10.307}, {'end': 14009.81, 'text': "We haven't given our algorithm any labels.", 'start': 14006.768, 'duration': 3.042}, {'end': 14012.103, 'text': "So that's the gist of unsupervised learning.", 'start': 14010.481, 'duration': 1.622}, {'end': 14014.785, 'text': 'I hope you guys enjoyed this course.', 'start': 14012.963, 'duration': 1.822}, {'end': 14017.628, 'text': 'I hope a lot of these examples made sense.', 'start': 14014.985, 'duration': 2.643}], 'summary': 'Algorithm predicts three categories fairly well without labels. 
demonstrates unsupervised learning.', 'duration': 25.97, 'max_score': 13991.658, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg13991658.jpg'}], 'start': 12833.221, 'title': 'Unsupervised learning and dimensionality reduction', 'summary': 'Covers implementing unsupervised learning on a seeds dataset, applying k-means clustering and pca to reduce dimensions from seven to two, and evaluating clustering performance on a dataset with 210 samples, achieving good separation and prediction of three categories.', 'chapters': [{'end': 13130.573, 'start': 12833.221, 'title': 'Implementing unsupervised learning on seeds data', 'summary': 'Covers implementing unsupervised learning on a seeds dataset containing geometric parameters of wheat kernels to cluster different wheat varieties, using pandas and seaborn to visualize the data.', 'duration': 297.352, 'highlights': ['The chapter covers implementing unsupervised learning on a seeds dataset containing geometric parameters of wheat kernels to cluster different wheat varieties', 'Using pandas to import the seeds dataset and visualize the data using Seaborn']}, {'end': 13648.546, 'start': 13131.693, 'title': 'K-means clustering visualization', 'summary': 'Demonstrates the implementation of k-means clustering with a specific focus on visualizing the clusters in 2d and higher dimensions, showcasing the effectiveness of the clustering algorithm.', 'duration': 516.853, 'highlights': ['The chapter introduces k-means clustering and demonstrates its application on a dataset with three classes, illustrating the separation of classes based on the area, perimeter, and compactness, indicating the initial similarity among some classes.', 'The implementation of k-means clustering with 2D visualization showcases the distinct clustering of classes based on compactness and asymmetry, highlighting the effectiveness of the algorithm in identifying the separability of classes in the 
dataset.', "The visualization of k-means clustering in higher dimensions emphasizes the improved assessment of class separability, indicating the algorithm's capability to accurately identify and distinguish between different groups within the dataset."]}, {'end': 14031.881, 'start': 13650.157, 'title': 'Unsupervised learning and dimensionality reduction', 'summary': 'Discusses the application of k-means clustering and principal component analysis (pca) in unsupervised learning, reducing the dimensions from seven to two, and evaluating the clustering performance on a dataset with 210 samples and seven features, achieving a fairly good separation and prediction of three different categories.', 'duration': 381.724, 'highlights': ['The chapter demonstrates the application of k-means clustering in unsupervised learning, evaluating the clustering performance on a dataset with 210 samples and seven features, and achieving a fairly good separation and prediction of three different categories.', 'The chapter explains the process of reducing dimensions from seven to two using principal component analysis (PCA), mapping the data into a lower dimension, and visualizing the transformed data points in a two-dimensional representation.', 'The chapter discusses the evaluation of the clustering performance on the transformed data using PCA, demonstrating a fairly good separation and prediction of three different categories, with some struggle in overlapping areas but overall effective prediction without any labeled information.']}], 'duration': 1198.66, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i_LwzRVP7bg/pics/i_LwzRVP7bg12833221.jpg', 'highlights': ['The chapter demonstrates the application of k-means clustering in unsupervised learning, evaluating the clustering performance on a dataset with 210 samples and seven features, and achieving a fairly good separation and prediction of three different categories.', 'The implementation of k-means 
clustering with 2D visualization showcases the distinct clustering of classes based on compactness and asymmetry, highlighting the effectiveness of the algorithm in identifying the separability of classes in the dataset.', 'The chapter explains the process of reducing dimensions from seven to two using principal component analysis (PCA), mapping the data into a lower dimension, and visualizing the transformed data points in a two-dimensional representation.', 'Using pandas to import the seeds dataset and visualize the data using Seaborn', 'The chapter covers implementing unsupervised learning on a seeds dataset containing geometric parameters of wheat kernels to cluster different wheat varieties']}], 'highlights': ['Achieved 96.4% probability in probabilistic classification', 'Demonstrated k nearest neighbors with 82% accuracy', 'SVM model achieved 87% dataset separation accuracy', 'Achieved an accuracy of around 88% with 64 nodes', 'R squared value improved from 0.4 to 0.52 after implementing multiple linear regression', 'Demonstrated simple linear regression with R-squared score of around 0.38', 'Explained k-means clustering algorithm for unsupervised learning', 'Demonstrated the application of k-means clustering in unsupervised learning', 'Explained PCA for dimensionality reduction']}
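The k-means procedure the transcript walks through — pick k centroids, assign every point to its closest centroid (the expectation step), recompute each centroid from its assigned points (the maximization step), and repeat until nothing changes — can be sketched in plain NumPy. This is an illustrative sketch on synthetic 2D data, not the course's notebook code (the course uses scikit-learn's `KMeans`); the farthest-point initialization is an assumption added here to keep the toy run stable.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Expectation-maximization loop for k-means.

    E-step: assign each point to its closest centroid.
    M-step: recompute each centroid as the mean of its assigned points.
    Stop when no point changes cluster between iterations (convergence).
    """
    rng = np.random.default_rng(seed)
    # Farthest-point initialization (an assumption of this sketch; the
    # course's scikit-learn KMeans uses k-means++ by default).
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        dist_to_nearest = np.min(
            [np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[dist_to_nearest.argmax()])
    centroids = np.array(centroids, dtype=float)

    labels = None
    for _ in range(max_iter):
        # E-step: distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # stable point reached: nothing is changing anymore
        labels = new_labels
        # M-step: recompute each centroid from its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Three well-separated synthetic clusters of scattered points in 2D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(50, 2))
               for c in (0.0, 3.0, 6.0)])
labels, centroids = kmeans(X, k=3)
```

With well-separated blobs like these, each synthetic cluster lands in its own k-means cluster; on real data like the seeds kernels, overlapping classes can still end up mixed, as the transcript notes.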
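The PCA idea described in the transcript — find the direction with the largest variance (equivalently, minimize the projection residuals) and project the data onto the top components — can likewise be sketched with an eigendecomposition of the covariance matrix. This is a minimal NumPy illustration, not the course's code (the course calls `sklearn.decomposition.PCA`); the 210×7 synthetic shape merely echoes the seeds dataset's 210 samples and seven geometric features.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto the n_components directions of largest variance.

    The principal components are the top eigenvectors of the covariance
    matrix; projecting onto them keeps the most variance, which is the
    same as minimizing the projection residuals.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, top]

# Synthetic stand-in: 210 samples, 7 features, most variance along one axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(210, 7))
X[:, 0] *= 10.0                # inflate the variance of feature 0
X2 = pca(X, n_components=2)    # seven dimensions down to two
```

Projecting onto the top two components is the "seven dimensions down to two" mapping the transcript describes before scatter-plotting PCA1 against PCA2.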