Coursnap

title
KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Training | Edureka

description
** Python for Data Science: https://www.edureka.co/data-science-python-certification-course ** This Edureka video on KNN Algorithm will help you to build your base by covering the theoretical, mathematical and implementation part of the KNN algorithm in Python. Topics covered under this video includes: 1. What is KNN Algorithm? 2. Industrial Use case of KNN Algorithm 3. How things are predicted using KNN Algorithm 4. How to choose the value of K? 5. KNN Algorithm Using Python 6. Implementation of KNN Algorithm from scratch Check out our playlist for more videos: http://bit.ly/2taym8X Subscribe to our channel to get video updates. Hit the subscribe button above. #KNNAlgorithm #MachineLearningUsingPython #MachineLearningTraining How it Works? 1. This is a 5 Week Instructor led Online Course,40 hours of assignment and 20 hours of project work 2. We have a 24x7 One-on-One LIVE Technical Support to help you with any problems you might face or any clarifications you may require during the course. 3. At the end of the training you will be working on a real time project for which we will provide you a Grade and a Verifiable Certificate! - - - - - - - - - - - - - - - - - About the Course Edureka’s Machine Learning Course using Python is designed to make you grab the concepts of Machine Learning. The Machine Learning training will provide deep understanding of Machine Learning and its mechanism. As a Data Scientist, you will be learning the importance of Machine Learning and its implementation in python programming language. Furthermore, you will be taught Reinforcement Learning which in turn is an important aspect of Artificial Intelligence. You will be able to automate real life scenarios using Machine Learning Algorithms. Towards the end of the course, we will be discussing various practical use cases of Machine Learning in python programming language to enhance your learning experience. After completing this Machine Learning Certification Training using Python, you should be able to: Gain insight into the 'Roles' played by a Machine Learning Engineer Automate data analysis using python Describe Machine Learning Work with real-time data Learn tools and techniques for predictive modeling Discuss Machine Learning algorithms and their implementation Validate Machine Learning algorithms Explain Time Series and it’s related concepts Gain expertise to handle business in future, living the present - - - - - - - - - - - - - - - - - - - Why learn Machine Learning with Python? Data Science is a set of techniques that enables the computers to learn the desired behavior from data without explicitly being programmed. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science. This course exposes you to different classes of machine learning algorithms like supervised, unsupervised and reinforcement algorithms. This course imparts you the necessary skills like data pre-processing, dimensional reduction, model evaluation and also exposes you to different machine learning algorithms like regression, clustering, decision trees, random forest, Naive Bayes and Q-Learning. For more information, Please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll free). Instagram: https://www.instagram.com/edureka_learning/ Facebook: https://www.facebook.com/edurekaIN/ Twitter: https://twitter.com/edurekain LinkedIn: https://www.linkedin.com/company/edureka

detail
{'title': 'KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Training | Edureka', 'heatmap': [{'end': 134.351, 'start': 93.988, 'weight': 0.737}, {'end': 491.452, 'start': 459.271, 'weight': 0.742}, {'end': 560.574, 'start': 512.924, 'weight': 0.701}, {'end': 621.669, 'start': 571.222, 'weight': 0.796}, {'end': 775.013, 'start': 760.326, 'weight': 0.955}], 'summary': "Introduces the significance of knn algorithm, which contributes to over 35% of amazon's revenue, and highlights its industrial applications for concept search and t-shirt size prediction, while also covering its implementation using the iris dataset with 97 training data set and 53 test data set, achieving an accuracy rate of 97.29%.", 'chapters': [{'end': 589.41, 'segs': [{'end': 79.16, 'src': 'embed', 'start': 45.727, 'weight': 7, 'content': [{'end': 49.408, 'text': 'We will drill down to the working of algorithm in depth while learning the algorithm.', 'start': 45.727, 'duration': 3.681}, {'end': 54.41, 'text': 'You will also understand the significance of K or what does this case stands for in the nearest neighbor algorithm.', 'start': 49.488, 'duration': 4.922}, {'end': 59.152, 'text': "and then we'll see how the prediction is made using Canon algorithm manually or mathematically.", 'start': 54.97, 'duration': 4.182}, {'end': 59.712, 'text': 'All right.', 'start': 59.472, 'duration': 0.24}, {'end': 64.494, 'text': 'Now, once we are done with that theoretical concept, will start the practical or the demo session,', 'start': 60.112, 'duration': 4.382}, {'end': 67.255, 'text': "where we'll learn how to implement Canon algorithm using python.", 'start': 64.494, 'duration': 2.761}, {'end': 68.696, 'text': "So let's start our session.", 'start': 67.715, 'duration': 0.981}, {'end': 71.777, 'text': 'So, starting with what is Canon algorithm?', 'start': 69.516, 'duration': 2.261}, {'end': 79.16, 'text': 'Well, k nearest neighbor is a simple algorithm that stores all the available cases and classify the new data or case based on a similarity measure.', 'start': 72.337, 'duration': 6.823}], 'summary': 'Learn about k-nearest neighbor algorithm and its implementation in python.', 'duration': 33.433, 'max_score': 45.727, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE45727.jpg'}, {'end': 139.589, 'src': 'heatmap', 'start': 93.988, 'weight': 1, 'content': [{'end': 98.151, 'text': "Well in general K&N is used in search application where you're looking for similar items.", 'start': 93.988, 'duration': 4.163}, {'end': 104.095, 'text': 'That is when your task is some form of fine items similar to this one, then you call the search as a K&N search.', 'start': 98.611, 'duration': 5.484}, {'end': 112.414, 'text': 'But what is this cane K&N? Well, this K denotes the number of nearest neighbor which are voting class of the new data or the testing data.', 'start': 104.916, 'duration': 7.498}, {'end': 118.699, 'text': 'For example, if K equal 1 then the testing data are given the same label as the closest example in the training set.', 'start': 112.754, 'duration': 5.945}, {'end': 126.445, 'text': 'Similarly, if K equal 3 the labels of the three closest classes are checked and the most common label is assigned to the testing data.', 'start': 119.219, 'duration': 7.226}, {'end': 129.267, 'text': 'So this is what a KN KN algorithm means.', 'start': 126.965, 'duration': 2.302}, {'end': 130.668, 'text': 'So moving on ahead.', 'start': 129.887, 'duration': 0.781}, {'end': 134.351, 'text': "Let's see some of the example of scenarios where KNN is used in the industry.", 'start': 130.848, 'duration': 3.503}, {'end': 139.589, 'text': "So let's see the industrial application of Canon algorithm starting with recommender system.", 'start': 135.228, 'duration': 4.361}], 'summary': 'K&n algorithm is used in search applications to find similar items, with k denoting the number of nearest neighbors for classification.', 'duration': 40.978, 'max_score': 93.988, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE93988.jpg'}, {'end': 182.877, 'src': 'embed', 'start': 155.873, 'weight': 0, 'content': [{'end': 161.834, 'text': 'this Canon algorithm applies to recommending products like an Amazon, or for recommending media like in case of Netflix,', 'start': 155.873, 'duration': 5.961}, {'end': 164.455, 'text': 'or even for recommending advertisement to display to a user.', 'start': 161.834, 'duration': 2.621}, {'end': 169.16, 'text': 'If I am not wrong, almost all of you must have used Amazon for shopping right?', 'start': 165.055, 'duration': 4.105}, {'end': 175.007, 'text': 'So, just to tell you, more than 35% of amazon.com revenue is generated by its recommendation engine.', 'start': 169.381, 'duration': 5.626}, {'end': 182.877, 'text': "So what's their strategy Amazon uses recommendation as a targeted marketing tool in both the email campaigns around most of its website pages.", 'start': 175.548, 'duration': 7.329}], 'summary': "Canon algorithm drives over 35% of amazon's revenue through targeted recommendations.", 'duration': 27.004, 'max_score': 155.873, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE155873.jpg'}, {'end': 232.745, 'src': 'embed', 'start': 210.939, 'weight': 3, 'content': [{'end': 219.586, 'text': 'So next industrial application of Canon algorithm is concept search or searching semantically similar documents and classifying documents containing similar topics.', 'start': 210.939, 'duration': 8.647}, {'end': 227.613, 'text': 'So, as you know, the data on the internet is increasing exponentially every single second, their billions and billions of documents on the internet.', 'start': 220.267, 'duration': 7.346}, {'end': 232.745, 'text': 'Each document on the internet contains multiple Concepts that could be a potential Concept.', 'start': 228.24, 'duration': 4.505}], 'summary': 'Canon algorithm for concept search in billions of internet documents.', 'duration': 21.806, 'max_score': 210.939, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE210939.jpg'}, {'end': 403.355, 'src': 'embed', 'start': 379.644, 'weight': 6, 'content': [{'end': 387.266, 'text': 'or you could divide up your data and use something like cross validation technique to test several values of K in order to determine which works best for your data.', 'start': 379.644, 'duration': 7.622}, {'end': 395.828, 'text': 'For example, if n equal thousand cases then in that case the optimal value of K lies somewhere in between 1 to 19, but is unless you try it.', 'start': 387.466, 'duration': 8.362}, {'end': 396.828, 'text': 'You cannot be sure of it.', 'start': 395.908, 'duration': 0.92}, {'end': 400.353, 'text': 'So, you know how the algorithm is working on a higher level.', 'start': 397.831, 'duration': 2.522}, {'end': 403.355, 'text': "Let's move on and see how things are predicted using Canon algorithm.", 'start': 400.533, 'duration': 2.822}], 'summary': 'Use cross-validation to test k values and optimize algorithm performance.', 'duration': 23.711, 'max_score': 379.644, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE379644.jpg'}, {'end': 491.452, 'src': 'heatmap', 'start': 459.271, 'weight': 0.742, 'content': [{'end': 465.514, 'text': 'Well, this Manhattan distance is used to calculate the distance between real vector using the sum of their absolute difference.', 'start': 459.271, 'duration': 6.243}, {'end': 474.638, 'text': 'in this case, the Manhattan distance between the point P 1 and P 2 is mod of 5 minus 1, plus mod value of 4 minus 1, which results to 3 plus 4,', 'start': 465.514, 'duration': 9.124}, {'end': 476.624, 'text': 'that is, 7..', 'start': 474.638, 'duration': 1.986}, {'end': 481.587, 'text': 'So this slide shows the difference between Euclidean and Manhattan distance from point A to point B.', 'start': 476.624, 'duration': 4.963}, {'end': 486.289, 'text': 'So Euclidean distance is nothing but the direct or the least possible distance between A and B.', 'start': 481.587, 'duration': 4.702}, {'end': 491.452, 'text': 'Whereas the Manhattan distance is a distance between A and B measured along the axis at right angle.', 'start': 486.289, 'duration': 5.163}], 'summary': 'Manhattan distance measures 7 between points p1 and p2, with euclidean distance representing the direct distance.', 'duration': 32.181, 'max_score': 459.271, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE459271.jpg'}, {'end': 560.574, 'src': 'heatmap', 'start': 512.924, 'weight': 0.701, 'content': [{'end': 515.527, 'text': 'So for this will be using the KNN algorithm.', 'start': 512.924, 'duration': 2.603}, {'end': 517.749, 'text': 'So the very first thing what we need to do.', 'start': 516.188, 'duration': 1.561}, {'end': 519.97, 'text': 'We need to calculate the Euclidean distance.', 'start': 517.989, 'duration': 1.981}, {'end': 524.993, 'text': 'So now that you have a new data of height 161 centimeter and weight as 61 kg.', 'start': 520.59, 'duration': 4.403}, {'end': 528.716, 'text': 'So the very first thing that will do is will calculate the Euclidean distance.', 'start': 525.574, 'duration': 3.142}, {'end': 538.211, 'text': 'which is nothing but the square root of 161 minus 158 whole square plus 61 minus 58 whole square and square root of that is 4.24.', 'start': 529.304, 'duration': 8.907}, {'end': 539.412, 'text': "Let's drag and drop it.", 'start': 538.211, 'duration': 1.201}, {'end': 542.294, 'text': 'So these are the various Euclidean distance of other points.', 'start': 539.832, 'duration': 2.462}, {'end': 544.635, 'text': "Now let's suppose K equal to 5.", 'start': 543.114, 'duration': 1.521}, {'end': 545.976, 'text': 'then the algorithm what it does.', 'start': 544.635, 'duration': 1.341}, {'end': 552.121, 'text': 'it searches for the five customer closest to the new customer that is more similar to the new data in terms of its attribute.', 'start': 545.976, 'duration': 6.145}, {'end': 553.589, 'text': 'For K equal 5.', 'start': 552.668, 'duration': 0.921}, {'end': 555.69, 'text': "Let's find the top 5 minimum Euclidean distance.", 'start': 553.589, 'duration': 2.101}, {'end': 560.574, 'text': 'So these are the distance which we are going to use 1 2 3 4 and 5.', 'start': 556.131, 'duration': 4.443}], 'summary': 'Using knn algorithm to find 5 closest customers based on attributes.', 'duration': 47.65, 'max_score': 512.924, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE512924.jpg'}], 'start': 7.509, 'title': 'Knn and canon algorithms', 'summary': "Introduces the k nearest neighbor (knn) algorithm and its significance, including its impact on amazon's revenue, contributing to over 35% of its revenue. it also explains the industrial applications of the canon algorithm for concept search, highlighting the importance of k value selection and the calculation of euclidean and manhattan distances, with a practical example of predicting t-shirt size using knn algorithm.", 'chapters': [{'end': 209.977, 'start': 7.509, 'title': 'Understanding knn algorithm', 'summary': "Introduces the k nearest neighbor algorithm (knn), explaining its working, significance of k, industry applications, and the impact of knn on amazon's revenue, with knn contributing to over 35% of its revenue.", 'duration': 202.468, 'highlights': ['KNN contributes to over 35% of amazon.com revenue, used for targeted marketing and increasing average order value.', 'KNN is used in recommender systems for suggesting relevant products, generating revenue and increasing upsell and cross-sell opportunities.', 'KNN algorithm searches through the entire training data set for K most similar instances to make predictions for unseen data.', 'KNN is applied in various industries such as e-commerce (Amazon), media (Netflix), and advertising for recommending products and content to users based on their preferences.', 'The K in KNN denotes the number of nearest neighbors which vote for the class of the new or testing data, determining the label assignment for the testing data.']}, {'end': 589.41, 'start': 210.939, 'title': 'Canon algorithm for concept search', 'summary': 'Explains the industrial applications of canon algorithm for concept search, including classifying documents and predicting classes using knn algorithm, highlighting the importance of k value selection and the calculation of euclidean and manhattan distances, with a practical example of predicting t-shirt size using knn algorithm.', 'duration': 378.471, 'highlights': ['The chapter explains the industrial applications of Canon algorithm for concept search, including classifying documents and predicting classes using KNN algorithm, highlighting the importance of K value selection and the calculation of Euclidean and Manhattan distances, with a practical example of predicting t-shirt size using KNN algorithm.', 'The data on the internet is increasing exponentially every single second, with billions of documents containing multiple potential concepts, leading to the need for extracting concepts from vast amounts of data using Canon algorithm.', 'The Canon algorithm has various use cases, such as concept search, classifying documents, handwriting detection, image recognition, and video recognition, demonstrating its versatility in industrial applications.', 'The significance of the K value in KNN algorithm is highlighted, emphasizing the need to choose the optimal value of K through techniques like cross-validation, with an example indicating the range of optimal K values for a specific dataset.', 'The chapter provides a detailed explanation of how a Canon algorithm works by predicting classes using KNN algorithm, emphasizing the selection of the number of nearest neighbors (K) and the process of determining the class based on majority voting of the nearest neighbors.', 'The calculation and significance of Euclidean and Manhattan distances in KNN algorithm are explained, with practical examples demonstrating the application of these distance measures in predicting classes, such as using Euclidean distance to calculate the similarity between a new customer and existing customers for predicting t-shirt size.', 'A practical example of predicting t-shirt size using KNN algorithm is provided, demonstrating the process of calculating Euclidean distances of new data points and identifying the nearest neighbors to make predictions based on attribute similarity.', 'The practical example of predicting t-shirt size using KNN algorithm showcases the process of selecting the number of nearest neighbors (K), ranking the minimum Euclidean distances, and making predictions based on the majority class of the nearest neighbors.']}], 'duration': 581.901, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE7509.jpg', 'highlights': ['KNN contributes to over 35% of amazon.com revenue, used for targeted marketing and increasing average order value.', 'KNN is used in recommender systems for suggesting relevant products, generating revenue and increasing upsell and cross-sell opportunities.', 'The K in KNN denotes the number of nearest neighbors which vote for the class of the new or testing data, determining the label assignment for the testing data.', 'The chapter explains the industrial applications of Canon algorithm for concept search, including classifying documents and predicting classes using KNN algorithm, highlighting the importance of K value selection and the calculation of Euclidean and Manhattan distances, with a practical example of predicting t-shirt size using KNN algorithm.', 'The data on the internet is increasing exponentially every single second, with billions of documents containing multiple potential concepts, leading to the need for extracting concepts from vast amounts of data using Canon algorithm.', 'The Canon algorithm has various use cases, such as concept search, classifying documents, handwriting detection, image recognition, and video recognition, demonstrating its versatility in industrial applications.', 'The significance of the K value in KNN algorithm is highlighted, emphasizing the need to choose the optimal value of K through techniques like cross-validation, with an example indicating the range of optimal K values for a specific dataset.', 'The chapter provides a detailed explanation of how a Canon algorithm works by predicting classes using KNN algorithm, emphasizing the selection of the number of nearest neighbors (K) and the process of determining the class based on majority voting of the nearest neighbors.']}, {'end': 803.392, 'segs': [{'end': 617.121, 'src': 'embed', 'start': 589.47, 'weight': 1, 'content': [{'end': 594.132, 'text': 'But before we drill down to the coding part, let me just tell you why people call KNN as a lazy learner.', 'start': 589.47, 'duration': 4.662}, {'end': 599.896, 'text': "Well Canon for classification is a very simple algorithm, but that's not why they are called lazy.", 'start': 594.835, 'duration': 5.061}, {'end': 604.498, 'text': "Canon is a lazy learner because it doesn't have a discriminative function from the training data.", 'start': 600.176, 'duration': 4.322}, {'end': 606.938, 'text': 'But what it does it memorizes the training data.', 'start': 604.698, 'duration': 2.24}, {'end': 612.18, 'text': 'There is no learning phase of the model and all of the work happens at the time of prediction is requested.', 'start': 607.258, 'duration': 4.922}, {'end': 617.121, 'text': "So as such there's the reason why Canon is often referred to as lazy learning algorithm.", 'start': 612.34, 'duration': 4.781}], 'summary': 'Knn is called a lazy learner as it memorizes training data and has no learning phase.', 'duration': 27.651, 'max_score': 589.47, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE589470.jpg'}, {'end': 655.324, 'src': 'embed', 'start': 626.111, 'weight': 0, 'content': [{'end': 628.692, 'text': 'So this data set consists of 150 observation.', 'start': 626.111, 'duration': 2.581}, {'end': 631.353, 'text': 'We have four features and one class label.', 'start': 628.932, 'duration': 2.421}, {'end': 635.354, 'text': 'the four features include the sepal length, sepal width, petal length and the petal width,', 'start': 631.353, 'duration': 4.001}, {'end': 638.375, 'text': 'whereas the class label decides which flower belongs to which category.', 'start': 635.354, 'duration': 3.021}, {'end': 642.001, 'text': 'So this was the description of the data set which we are using.', 'start': 639.46, 'duration': 2.541}, {'end': 646.142, 'text': "now let's move on and see what are the step-by-step solution to perform a KNN algorithm.", 'start': 642.001, 'duration': 4.141}, {'end': 649.823, 'text': "So first we'll start by handling the data what we have to do.", 'start': 646.622, 'duration': 3.201}, {'end': 655.324, 'text': 'We have to open the data set from the CSV format and split the data set into train and test part.', 'start': 649.883, 'duration': 5.441}], 'summary': 'Data set: 150 observations, 4 features, 1 class label. steps for knn algorithm.', 'duration': 29.213, 'max_score': 626.111, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE626111.jpg'}, {'end': 692.394, 'src': 'embed', 'start': 660.465, 'weight': 2, 'content': [{'end': 666.387, 'text': "Once we calculate the distance next we'll look for the neighbor and select K neighbors which are having the least distance from a new point.", 'start': 660.465, 'duration': 5.922}, {'end': 671.216, 'text': "Now once we get our neighbor then we'll generate a response from a set of data instances.", 'start': 667.193, 'duration': 4.023}, {'end': 676.16, 'text': 'So this will decide whether the new point belongs to class A or class B.', 'start': 671.957, 'duration': 4.203}, {'end': 680.624, 'text': "Finally, we'll create the accuracy function and in the end we'll tie it all together in the main function.", 'start': 676.16, 'duration': 4.464}, {'end': 685.228, 'text': "So let's start with our code for implementing KNN algorithm using Python.", 'start': 681.545, 'duration': 3.683}, {'end': 688.471, 'text': "I'll be using Jupiter notebook Python 3.0 installed on it.", 'start': 685.328, 'duration': 3.143}, {'end': 692.394, 'text': "Now, let's move on and see how KNN algorithm can be implemented using Python.", 'start': 689.011, 'duration': 3.383}], 'summary': 'Implementing knn algorithm using python to classify data instances into classes a or b.', 'duration': 31.929, 'max_score': 660.465, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE660465.jpg'}, {'end': 776.733, 'src': 'heatmap', 'start': 745.015, 'weight': 4, 'content': [{'end': 750.539, 'text': 'A ratio of 67 is to 33 for test is to train as a standard ratio, which is used for this purpose.', 'start': 745.015, 'duration': 5.524}, {'end': 759.546, 'text': "So let's Define a function as load data set that loads a CSV with a provided file name and split it randomly into training and test data set using the provided split ratio.", 'start': 751.02, 'duration': 8.526}, {'end': 765.991, 'text': 'So this is our function load data set which is using file name split ratio training data set and testing data set as its input.', 'start': 760.326, 'duration': 5.665}, {'end': 769.133, 'text': "All right, so let's execute the run button and check for any errors.", 'start': 766.511, 'duration': 2.622}, {'end': 771.475, 'text': "So it's executed with zero errors.", 'start': 769.793, 'duration': 1.682}, {'end': 773.176, 'text': "Let's test this function.", 'start': 772.195, 'duration': 0.981}, {'end': 775.013, 'text': 'so this is our training set.', 'start': 773.872, 'duration': 1.141}, {'end': 776.733, 'text': 'testing set, load data set.', 'start': 775.013, 'duration': 1.72}], 'summary': 'A function was defined to load a dataset, splitting it into 67% training and 33% testing data, executed without errors.', 'duration': 31.718, 'max_score': 745.015, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE745015.jpg'}, {'end': 809.478, 'src': 'embed', 'start': 785.237, 'weight': 3, 'content': [{'end': 791.16, 'text': "let's see what our training data set and test data set it's dividing into, so it's giving a count of training data set and testing data set.", 'start': 785.237, 'duration': 5.923}, {'end': 797.286, 'text': 'The total number of training data set has split into is 97 and total number of test data set we have is 53..', 'start': 791.52, 'duration': 5.766}, {'end': 803.152, 'text': 'So total number of training data set we have here is 97 and total number of test data set we have here is 53.', 'start': 797.286, 'duration': 5.866}, {'end': 803.392, 'text': 'All right.', 'start': 803.152, 'duration': 0.24}, {'end': 806.455, 'text': 'Okay, so our function load data set is performing well.', 'start': 803.572, 'duration': 2.883}, {'end': 809.478, 'text': "So let's move ahead to step 2 which is similarity.", 'start': 806.995, 'duration': 2.483}], 'summary': "Training data set: 97, test data set: 53. function 'load dataset' performing well.", 'duration': 24.241, 'max_score': 785.237, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE785237.jpg'}], 'start': 589.47, 'title': 'Knn algorithm implementation and python', 'summary': 'Covers the theoretical session of knn algorithm, its implementation using the iris dataset with 150 observations, 4 features, and 1 class label, and implementing knn algorithm in python using jupiter notebook, with 97 training data set and 53 test data set.', 'chapters': [{'end': 676.16, 'start': 589.47, 'title': 'Knn algorithm implementation with iris dataset', 'summary': 'Covers the theoretical session of knn algorithm, its lazy learning nature, and describes the practical implementation using the iris dataset with 150 observations, 4 features, and 1 class label.', 'duration': 86.69, 'highlights': ['The iris dataset consists of 150 observations, 4 features (sepal length, sepal width, petal length, petal width), and 1 class label, determining the category of the flower.', 'KNN is considered a lazy learner as it memorizes the training data and performs all the work at the time of prediction, without a separate learning phase.', 'The algorithm for KNN implementation involves handling the dataset, calculating the distance between data instances, selecting K neighbors with the least distance, and generating a response to determine the class of the new point.']}, {'end': 803.392, 'start': 676.16, 'title': 'Implementing knn algorithm in python', 'summary': 'Covers implementing knn algorithm in python using jupiter notebook, loading and splitting a csv data set into a 67:33 ratio for training and testing, with a total of 97 training data set and 53 test data set.', 'duration': 127.232, 'highlights': ['Loading and splitting the data set into a 67:33 ratio for training and testing, resulting in 97 training data set and 53 test data set.', 'Executing the function to load the data set with a split ratio of 0.66 and verifying the count of training and testing data set with 97 and 53 respectively.', 'Creating the accuracy function and tying it all together in the main function for implementing the KNN algorithm in Python.']}], 'duration': 213.922, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE589470.jpg', 'highlights': ['KNN algorithm uses 150 observations, 4 features, and 1 class label', 'KNN is a lazy learner, memorizing training data and performing prediction without a separate learning phase', 'Algorithm for KNN involves handling dataset, calculating distances, selecting K neighbors, and generating response', 'Data set split into 67:33 ratio, resulting in 97 training data set and 53 test data set', 'Function executed to load data set with split ratio of 0.66, resulting in 97 training and 53 testing data', 'Accuracy function created and integrated into main function for KNN algorithm implementation in Python']}, {'end': 1191.142, 'segs': [{'end': 912.846, 'src': 'embed', 'start': 878.247, 'weight': 1, 'content': [{'end': 879.568, 'text': "Let's test the function.", 'start': 878.247, 'duration': 1.321}, {'end': 887.434, 'text': 'suppose the data one or the first instance consists of data point as 2, 2, 2 and it belongs to class a, and data to consist of 4, 4,', 'start': 879.568, 'duration': 7.866}, {'end': 889.356, 'text': '4 and it belongs to class B.', 'start': 887.434, 'duration': 1.922}, {'end': 896.502, 'text': 'So when we calculate the Euclidean distance of data 1 to data 2 and what we have to do we have to consider only first three features of them.', 'start': 889.356, 'duration': 7.146}, {'end': 898.284, 'text': "All right, so let's print the distance.", 'start': 896.843, 'duration': 1.441}, {'end': 901.323, 'text': 'As you can see here the distance comes out to be 3.464.', 'start': 899.082, 'duration': 2.241}, {'end': 905.244, 'text': 'All right, so this is nothing but the square root of 4 minus 2 whole square.', 'start': 901.323, 'duration': 3.921}, {'end': 912.846, 'text': 'So this distance is nothing but the Euclidean distance and it is calculated as square root of 4 minus 2 whole square, plus 4 minus 2 whole square.', 'start': 905.884, 'duration': 6.962}], 'summary': 'Tested function calculates euclidean distance as 3.464.', 'duration': 34.599, 'max_score': 878.247, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE878247.jpg'}, {'end': 981.76, 'src': 'embed', 'start': 952.822, 'weight': 3, 'content': [{'end': 959.827, 'text': 'So this is how our get neighbors function look like it takes training data set and test instance and K as its input here.', 'start': 952.822, 'duration': 7.005}, {'end': 962.589, 'text': 'The K is nothing but the number of nearest neighbor you want to check for.', 'start': 959.847, 'duration': 2.742}, {'end': 963.07, 'text': 'All right.', 'start': 962.869, 'duration': 0.201}, {'end': 969.374, 'text': "So basically what you'll be getting from this get neighbors function is K different points having least Euclidean distance from the test instance.", 'start': 963.55, 'duration': 5.824}, {'end': 970.835, 'text': "All right, let's execute it.", 'start': 969.614, 'duration': 1.221}, {'end': 972.977, 'text': 'So the function executed without any errors.', 'start': 971.075, 'duration': 1.902}, {'end': 974.844, 'text': "So let's test our function.", 'start': 973.723, 'duration': 1.121}, {'end': 981.76, 'text': 'So suppose the training data set includes the data like 2, 2, 2 and it belongs to Class A and other data includes 4, 4,', 'start': 975.144, 'duration': 6.616}], 'summary': 'Function finds k nearest neighbors for test instance.', 'duration': 28.938, 'max_score': 952.822, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE952822.jpg'}, {'end': 1037.157, 'src': 'embed', 'start': 1010.815, 'weight': 2, 'content': [{'end': 1017.036, 'text': 'Now once you are located the most similar neighbor for a test instance next task is to predict a response based on those neighbors.', 'start': 1010.815, 'duration': 6.221}, {'end': 1024.917, 'text': 'So how we can do that? Well, we can do this by allowing each neighbor to vote for the class attribute and take the majority vote as a prediction.', 'start': 1017.457, 'duration': 7.46}, {'end': 1026.438, 'text': "Let's see how we can do that.", 'start': 1025.298, 'duration': 1.14}, {'end': 1029.759, 'text': 'So we have a function as get response which takes neighbors as the input.', 'start': 1026.799, 'duration': 2.96}, {'end': 1033.46, 'text': 'Well, this neighbor was nothing but the output of this get neighbor function.', 'start': 1030.319, 'duration': 3.141}, {'end': 1037.157, 'text': 'The output of get neighbor function will be fed to get response.', 'start': 1034.342, 'duration': 2.815}], 'summary': 'Predict response based on majority vote from neighbors.', 'duration': 26.342, 'max_score': 1010.815, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE1010815.jpg'}, {'end': 1155.192, 'src': 'embed', 'start': 1123.699, 'weight': 0, 'content': [{'end': 1127.082, 'text': 'all right, so the ratio will be two by three, which is nothing but 66.66.', 'start': 1123.699, 'duration': 3.383}, {'end': 1131.181, 'text': 'so our accuracy rate is 66.66.', 'start': 1127.082, 'duration': 4.099}, {'end': 1136.728, 'text': "So, now that you have created all the function that are required for Canon algorithm, let's compile them into one single main function.", 'start': 1131.181, 'duration': 5.547}, {'end': 1144.779, 'text': "All right, so there's our main function and we are using Iris data set with a split of 0.67 and the value of K is 3.", 'start': 1137.249, 'duration': 7.53}, {'end': 1145.079, 'text': "Let's see.", 'start': 1144.779, 'duration': 0.3}, {'end': 1146.681, 'text': 'What is the accuracy score of this??', 'start': 1145.119, 'duration': 1.562}, {'end': 1149.408, 'text': 'check how accurate our model is.', 'start': 1147.347, 'duration': 2.061}, {'end': 1155.192, 'text': 'so in training data set we have 113 values and in the test data set we have 37 values.', 'start': 1149.408, 'duration': 5.784}], 'summary': 'The accuracy rate for the canon algorithm is 66.66% with 113 values in the training dataset and 37 values in the test dataset.', 'duration': 31.493, 'max_score': 1123.699, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE1123699.jpg'}], 'start': 803.572, 'title': 'Euclidean distance calculation and k nearest neighbors algorithm', 'summary': 'Discusses calculating euclidean distance and k nearest neighbors, achieving a distance of 3.464 and an accuracy rate of 97.29% with specific examples and data instances.', 'chapters': [{'end': 920.228, 'start': 803.572, 'title': 'Euclidean distance calculation', 'summary': 'Discusses calculating euclidean distance to determine similarity between data instances, limiting the distance calculation to specific attributes, and executing the euclidean function successfully, with a specific example resulting in a distance of 3.464.', 'duration': 116.656, 'highlights': ['The chapter discusses calculating Euclidean distance to determine similarity between data instances, limiting the distance calculation to specific attributes, and executing the Euclidean function successfully, with a specific example resulting in a distance of 3.464.', 'The Euclidean distance function is defined to take instance 1, instance 2, and length as parameters, where the length denotes how many attributes to include, and it executes successfully without any errors.', 'The Euclidean distance between two data instances with the first three features considered results in a calculated distance of 3.464, demonstrating the application of the Euclidean distance measure in a specific scenario.']}, {'end': 1191.142, 'start': 920.228, 'title': 'K nearest neighbors algorithm', 'summary': 'Explains the process of calculating k nearest neighbors, predicting responses based on those neighbors, and evaluating the accuracy of the knn model, achieving an accuracy rate of 97.29% using the iris dataset with a split of 0.67 and k value of 3.', 'duration': 270.914, 'highlights': ["The accuracy rate achieved using the Iris dataset with a split of 0.67 and K value of 3 is 97.29%. The model's accuracy rate is quantified, indicating the effectiveness of the KNN algorithm using specific dataset parameters.", "The process of predicting responses based on the nearest neighbors is explained, utilizing a 'get response' function that allows each neighbor to vote for the class attribute and takes the majority vote as a prediction. The 'get response' function's role in predicting responses based on neighbors is described, emphasizing the majority vote mechanism for prediction.", "The concept of calculating K nearest neighbors and selecting the K most similar neighbors from the training set for a given test instance is detailed, with an example demonstrating the prediction of a test instance's class based on its nearest neighbor. The step-by-step process of calculating K nearest neighbors and using them to predict the class of a test instance is explained, illustrating the practical application of the KNN algorithm."]}], 'duration': 387.57, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/6kZ-OPLNcgE/pics/6kZ-OPLNcgE803572.jpg', 'highlights': ['The accuracy rate achieved using the Iris dataset with a split of 0.67 and K value of 3 is 97.29%.', 'The Euclidean distance between two data instances with the first three features considered results in a calculated distance of 3.464.', "The process of predicting responses based on the nearest neighbors is explained, utilizing a 'get response' function that allows each neighbor to vote for the class attribute and takes the majority vote as a prediction.", "The concept of calculating K nearest neighbors and selecting the K most similar neighbors from the training set for a given test instance is detailed, with an example demonstrating the prediction of a test instance's class based on its nearest neighbor."]}], 'highlights': ['KNN contributes to over 35% of amazon.com revenue, used for targeted marketing and increasing average order value.', 'The accuracy rate achieved using the Iris dataset with a split of 0.67 and K value of 3 is 97.29%.', 'KNN is used in recommender systems for suggesting relevant products, generating revenue and increasing upsell and cross-sell opportunities.', 'The K in KNN denotes the number of nearest neighbors which vote for the class of the new or testing data, determining the label assignment for the testing data.', 'The Canon algorithm has various use cases, such as concept search, classifying documents, handwriting detection, image recognition, and video recognition, demonstrating its versatility in industrial applications.', "The process of predicting responses based on the nearest neighbors is explained, utilizing a 'get response' function that allows each neighbor to vote for the class attribute and takes the majority vote as a prediction.", 'KNN algorithm uses 150 observations, 4 features, and 1 class label', 'The data on the internet is increasing exponentially every single second, with billions of documents containing multiple potential concepts, leading to the need for extracting concepts from vast amounts of data using Canon algorithm.']}