title

K Nearest Neighbour Easily Explained with Implementation

description

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:
In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
In k-NN regression, the output is the property value for the object. This value is the average of the values of k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.
Both for classification and regression, a useful technique can be used to assign weight to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
References : Jose Portilla Project Implementation. This video is dedicated to him. thank you for serving the community
github url: https://github.com/krishnaik06/K-Nearest-Neighour
You can buy my book on Finance with ML and DL using python from the below link
amazon url: https://www.amazon.in/Hands-Python-Finance-implementing-strategies/dp/1789346371/ref=sr_1_1?keywords=Krish+naik&qid=1560843206&s=gateway&sr=8-1

detail

{'title': 'K Nearest Neighbour Easily Explained with Implementation', 'heatmap': [{'end': 489.728, 'start': 441.662, 'weight': 0.764}, {'end': 629.357, 'start': 583.77, 'weight': 0.792}, {'end': 677.466, 'start': 636.002, 'weight': 0.851}, {'end': 752.251, 'start': 722.761, 'weight': 0.796}], 'summary': 'Explains the intuition and implementation of the k nearest neighbor algorithm for classification and regression, highlighting its effectiveness in solving non-linear classified data points. it covers the process of selecting the optimal k value through iterative testing, determining a k value of 23 with 95% accuracy for model optimization, and provides guidance for applying the algorithm to various use cases on kaggle.', 'chapters': [{'end': 308.848, 'segs': [{'end': 48.948, 'src': 'embed', 'start': 22.554, 'weight': 0, 'content': [{'end': 28.757, 'text': 'k nearest neighbor is a wonderful algorithm if you want to solve for a non-linear classified data points,', 'start': 22.554, 'duration': 6.203}, {'end': 35.981, 'text': 'if that basically means that if your data point is distributed in a non-linear manner, you cannot just draw a straight line and classify those points.', 'start': 28.757, 'duration': 7.224}, {'end': 39.282, 'text': 'definitely k nearest neighbor, you can basically use it.', 'start': 35.981, 'duration': 3.301}, {'end': 44.105, 'text': 'so let us just go ahead and try to see, with the help of an example, how does a k nearest neighbor work.', 'start': 39.282, 'duration': 4.823}, {'end': 48.948, 'text': "so first of all i'm going to consider it for a classification problem statement.", 'start': 44.625, 'duration': 4.323}], 'summary': 'K nearest neighbor is useful for non-linear classification. it is effective for data distributed in a non-linear manner.', 'duration': 26.394, 'max_score': 22.554, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy422554.jpg'}, {'end': 108.588, 'src': 'embed', 'start': 82.537, 'weight': 2, 'content': [{'end': 87.399, 'text': 'this value basically indicates that how many nearest neighbor we are going to consider.', 'start': 82.537, 'duration': 4.862}, {'end': 94.463, 'text': 'okay, now, suppose, if i consider k is equal to 5 and suppose tomorrow, after my model is getting trained, okay,', 'start': 87.399, 'duration': 7.064}, {'end': 97.765, 'text': 'and suppose i get my new data point somewhere here.', 'start': 94.463, 'duration': 3.302}, {'end': 101.728, 'text': 'okay, my new data point is somewhere here.', 'start': 97.765, 'duration': 3.963}, {'end': 108.588, 'text': "okay, at this particular data point, what i will do is that first of all i'll just check What is my K value?", 'start': 101.728, 'duration': 6.86}], 'summary': 'The k value indicates the number of nearest neighbors to consider, with an example of k=5, and the process of utilizing it for new data points.', 'duration': 26.051, 'max_score': 82.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy482537.jpg'}, {'end': 171.996, 'src': 'embed', 'start': 145.871, 'weight': 3, 'content': [{'end': 152.072, 'text': "okay now, from this new point, what i'll do is that i have taken my k is equal to 5 as my near neighbor value.", 'start': 145.871, 'duration': 6.201}, {'end': 158.093, 'text': "so what i'm going to do is that i'm going to see that which are the five nearest points from this point by using euclidean distance.", 'start': 152.072, 'duration': 6.021}, {'end': 161.454, 'text': 'again, two distance parameters are used euclidean and manhattan.', 'start': 158.093, 'duration': 3.361}, {'end': 164.515, 'text': "we'll discuss about euclidean and manhattan very clearly.", 'start': 161.454, 'duration': 3.061}, {'end': 167.895, 'text': 'so my first point suppose this is my nearest point.', 'start': 164.515, 'duration': 3.38}, {'end': 171.996, 'text': 'this is my second nearest point, third nearest point, fourth nearest point and fifth nearest point.', 'start': 167.895, 'duration': 4.101}], 'summary': 'Using k=5 as near neighbor value, finding 5 nearest points using euclidean distance.', 'duration': 26.125, 'max_score': 145.871, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4145871.jpg'}, {'end': 219.492, 'src': 'embed', 'start': 191.235, 'weight': 4, 'content': [{'end': 197.819, 'text': "Now in order to find out the distance between this point, the Euclidean distance formula is basically, I'll just write it as ED.", 'start': 191.235, 'duration': 6.584}, {'end': 207.544, 'text': "I'll say it as root of x2 minus x1 whole square plus y2 minus y1 whole square.", 'start': 198.359, 'duration': 9.185}, {'end': 212.247, 'text': 'so this is the formula that we basically use for calculating the euclidean distance.', 'start': 207.544, 'duration': 4.703}, {'end': 219.492, 'text': 'similarly, manhattan distance works in a different way in manhattan distance, instead of directly calculating the point,', 'start': 212.247, 'duration': 7.245}], 'summary': 'Euclidean distance formula calculates distance between points.', 'duration': 28.257, 'max_score': 191.235, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4191235.jpg'}, {'end': 313.013, 'src': 'embed', 'start': 289.061, 'weight': 5, 'content': [{'end': 295.414, 'text': 'so what it happens is that this new point it treats it at based on the distance, whatever point it is getting,', 'start': 289.061, 'duration': 6.353}, {'end': 299.001, 'text': 'it behaves in that way and it gets classified to that particular group itself.', 'start': 295.414, 'duration': 3.587}, {'end': 301.44, 'text': 'So I hope it is pretty much clear.', 'start': 299.718, 'duration': 1.722}, {'end': 308.848, 'text': 'So, in this particular use case, whenever we get a new point and whenever we need to do a classification problem statement, what it does is that,', 'start': 301.5, 'duration': 7.348}, {'end': 313.013, 'text': 'based on the K value, it will try to find out which is the K nearest neighbor.', 'start': 308.848, 'duration': 4.165}], 'summary': 'A new point is classified based on distance and k nearest neighbors.', 'duration': 23.952, 'max_score': 289.061, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4289061.jpg'}], 'start': 1.442, 'title': 'K nearest neighbor algorithm', 'summary': 'Covers the intuition and implementation of the k nearest neighbor algorithm for both classification and regression use cases, highlighting its effectiveness in solving for non-linear classified data points. it explains the working of k nearest neighbor algorithm for a classification problem, using the example of selecting k value and calculating distance using euclidean or manhattan distance. additionally, it discusses the concept of finding five nearest neighbors using euclidean distance and manhattan distance, and classifying a new point based on the majority category among the neighbors.', 'chapters': [{'end': 39.282, 'start': 1.442, 'title': 'K nearest neighbor algorithm', 'summary': 'Covers the intuition and implementation of the k nearest neighbor algorithm for both classification and regression use cases, highlighting its effectiveness in solving for non-linear classified data points.', 'duration': 37.84, 'highlights': ['K nearest neighbor is a wonderful algorithm for solving non-linear classified data points, making it effective for data points distributed in a non-linear manner.', 'The algorithm is useful for both classification and regression use cases, providing a versatile solution for various data analysis scenarios.']}, {'end': 126.128, 'start': 39.282, 'title': 'Working of k nearest neighbor algorithm', 'summary': 'Explains the working of k nearest neighbor algorithm for a classification problem, using the example of selecting k value and calculating distance using euclidean or manhattan distance.', 'duration': 86.846, 'highlights': ['The k nearest neighbor algorithm is explained using an example of a classification problem, where the algorithm requires the selection of a k value to determine the number of nearest neighbors to consider.', 'The example discusses the calculation of distance using Euclidean distance or Manhattan distance for identifying the nearest neighbors for a new data point.']}, {'end': 308.848, 'start': 126.989, 'title': 'Nearest neighbor classification', 'summary': 'Discusses the concept of finding five nearest neighbors using euclidean distance and manhattan distance, and classifying a new point based on the majority category among the neighbors.', 'duration': 181.859, 'highlights': ['The chapter explains the process of finding the five nearest neighbors using Euclidean distance and mentions using k=5 as the near neighbor value. The speaker discusses the process of finding the five nearest neighbors using Euclidean distance and specifies the choice of k=5 as the near neighbor value.', 'The chapter defines the Euclidean distance formula as the root of (x2-x1)^2 + (y2-y1)^2 and provides an explanation for calculating Euclidean distance. The chapter provides the Euclidean distance formula as the root of (x2-x1)^2 + (y2-y1)^2 and explains the process of calculating Euclidean distance.', 'The chapter outlines the process of classifying a new point based on the majority category among the neighbors and explains the concept of the new point being classified based on the majority category of its neighbors. The chapter explains the process of classifying a new point based on the majority category among the neighbors and emphasizes the concept of the new point being classified based on the majority category of its neighbors.']}], 'duration': 307.406, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy41442.jpg', 'highlights': ['K nearest neighbor is effective for non-linear classified data points.', 'The algorithm is useful for both classification and regression use cases.', 'The algorithm requires the selection of a k value for determining nearest neighbors.', 'The chapter explains the process of finding the five nearest neighbors using Euclidean distance.', 'The chapter defines the Euclidean distance formula and explains the process of calculating Euclidean distance.', 'The chapter outlines the process of classifying a new point based on the majority category among the neighbors.']}, {'end': 689.258, 'segs': [{'end': 340.767, 'src': 'embed', 'start': 308.848, 'weight': 0, 'content': [{'end': 313.013, 'text': 'based on the K value, it will try to find out which is the K nearest neighbor.', 'start': 308.848, 'duration': 4.165}, {'end': 317.818, 'text': 'Okay And suppose I found out that from, and always remember this K value should be odd.', 'start': 313.033, 'duration': 4.785}, {'end': 326.102, 'text': 'if it is even, then there will be a scenario where you may get an equal number of uh one equal number of categories on both the side.', 'start': 318.559, 'duration': 7.543}, {'end': 333.164, 'text': 'that is basically, on this side you may get two categories and in this side also you may get two categories if i select k values equal to four.', 'start': 326.102, 'duration': 7.062}, {'end': 340.767, 'text': "so always make sure you select a value of odd and i'll also tell i'll also show you how to select this k value, how you can come up with that.", 'start': 333.164, 'duration': 7.603}], 'summary': 'Using odd k values for k-nearest neighbor to avoid equal categories on both sides.', 'duration': 31.919, 'max_score': 308.848, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4308848.jpg'}, {'end': 377.02, 'src': 'embed', 'start': 347.429, 'weight': 1, 'content': [{'end': 349.591, 'text': 'so pretty simple algorithm.', 'start': 347.429, 'duration': 2.162}, {'end': 356.001, 'text': 'in this, what we do is that we find the nearest neighbor okay, based on the k value that we have selected,', 'start': 349.591, 'duration': 6.41}, {'end': 362.411, 'text': 'then whichever category is having the near maximum number of nearest neighbor, it belongs, it belongs to that particular category.', 'start': 356.001, 'duration': 6.41}, {'end': 365.577, 'text': 'Now this is with respect to the classification use case.', 'start': 363.237, 'duration': 2.34}, {'end': 372.159, 'text': 'Okay Now what about regression? In regression use case, what we do is that everything is similar.', 'start': 366.038, 'duration': 6.121}, {'end': 374.299, 'text': 'Okay Everything is similar.', 'start': 372.879, 'duration': 1.42}, {'end': 377.02, 'text': 'Okay Here also we select a K value.', 'start': 374.919, 'duration': 2.101}], 'summary': 'Algorithm finds nearest neighbor based on k value for classification and regression.', 'duration': 29.591, 'max_score': 347.429, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4347429.jpg'}, {'end': 489.728, 'src': 'heatmap', 'start': 441.662, 'weight': 0.764, 'content': [{'end': 458.499, 'text': 'average mean of all this point mean of all the nearest neighbor that it has found out all the just try to find out the mean of all the nearest neighbor and this particular new point will get that particular value.', 'start': 441.662, 'duration': 16.837}, {'end': 466.825, 'text': 'what, when whatever mean will come out of this particular thing, that particular point will get that particular value as its uh predicted point.', 'start': 458.499, 'duration': 8.326}, {'end': 470.367, 'text': 'so this is with respect to regression.', 'start': 466.825, 'duration': 3.542}, {'end': 472.668, 'text': 'so this was about two different things.', 'start': 470.367, 'duration': 2.301}, {'end': 477.352, 'text': 'again, always remember in k nearest neighbor will also get impacted by outliers,', 'start': 472.668, 'duration': 4.684}, {'end': 484.504, 'text': 'Because if there are a lot of outliers then it will be very difficult for a classification use case.', 'start': 478.7, 'duration': 5.804}, {'end': 489.728, 'text': 'It may get treated in a wrong manner.', 'start': 485.405, 'duration': 4.323}], 'summary': 'K-nearest neighbor regression calculates mean of nearest neighbors to predict new point values. outliers impact classification.', 'duration': 48.066, 'max_score': 441.662, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4441662.jpg'}, {'end': 534.898, 'src': 'embed', 'start': 506.556, 'weight': 2, 'content': [{'end': 512.818, 'text': "imbalanced dataset I've already created a lot of video on how to fix imbalanced dataset and with respect to outliers also.", 'start': 506.556, 'duration': 6.262}, {'end': 517.842, 'text': 'So make sure that you fix both these issues and then you try to apply k-nearest neighbor.', 'start': 513.139, 'duration': 4.703}, {'end': 523.905, 'text': "Now this was about k-nearest neighbor, now we'll go ahead and try to understand the implementation part.", 'start': 518.261, 'duration': 5.644}, {'end': 526.848, 'text': 'And the implementation is also very, very simple.', 'start': 524.725, 'duration': 2.123}, {'end': 534.898, 'text': "And I'll make sure that I'll upload this and I'll provide a link of the GitHub URL in my description of this video.", 'start': 527.088, 'duration': 7.81}], 'summary': 'Fix imbalanced dataset and outliers, then apply k-nearest neighbor for implementation.', 'duration': 28.342, 'max_score': 506.556, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4506556.jpg'}, {'end': 629.357, 'src': 'heatmap', 'start': 583.77, 'weight': 0.792, 'content': [{'end': 587.371, 'text': 'So for this, what I do is that I apply a standard scalar.', 'start': 583.77, 'duration': 3.601}, {'end': 593.584, 'text': 'and apart from the target class, i will apply the standard scale on all these features.', 'start': 588.622, 'duration': 4.962}, {'end': 594.764, 'text': 'so here it is.', 'start': 593.584, 'duration': 1.18}, {'end': 598.565, 'text': "i've applied it and you can see that here is my scale feature.", 'start': 594.764, 'duration': 3.801}, {'end': 604.747, 'text': 'i have imported standard scalar which, like standard scalar, then i have done fit after dropping the target class.', 'start': 598.565, 'duration': 6.182}, {'end': 606.548, 'text': 'then i have done transform.', 'start': 604.747, 'duration': 1.801}, {'end': 613.15, 'text': 'as soon as i do transform and always remember that standard scalar basically means that all the values will get transformed.', 'start': 606.548, 'duration': 6.602}, {'end': 616.531, 'text': 'uh, based on the standard normal distribution of that particular data.', 'start': 613.15, 'duration': 3.381}, {'end': 621.531, 'text': 'okay, So after doing that and after I convert that into a data frame, my data looks something like this', 'start': 616.531, 'duration': 5}, {'end': 626.855, 'text': "OK, and this is how my data looks like after the standard scale I've applied.", 'start': 622.412, 'duration': 4.443}, {'end': 629.357, 'text': 'Next is that I have so many features.', 'start': 627.256, 'duration': 2.101}], 'summary': 'Applied standard scalar to features, achieving standard normal distribution.', 'duration': 45.587, 'max_score': 583.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4583770.jpg'}, {'end': 677.466, 'src': 'heatmap', 'start': 636.002, 'weight': 0.851, 'content': [{'end': 640.806, 'text': 'I can just use Seaborn and try to see that how my data is actually distributed.', 'start': 636.002, 'duration': 4.804}, {'end': 647.151, 'text': "So I'll be taking there with respect to we as my target class so that the classification point can be shown very clearly.", 'start': 641.046, 'duration': 6.105}, {'end': 652.228, 'text': 'Now, after executing this, now you can see that this is how my data is actually distributed.', 'start': 647.864, 'duration': 4.364}, {'end': 656.912, 'text': 'Now, you can see that a whole lot of distribution is there.', 'start': 652.869, 'duration': 4.043}, {'end': 659.535, 'text': 'It is like intermingled quite a lot.', 'start': 656.973, 'duration': 2.562}, {'end': 661.136, 'text': "There's a lot of overlapping.", 'start': 659.695, 'duration': 1.441}, {'end': 666.842, 'text': "So, definitely, I can't just apply a logistic regression into it or a decision tool because it will take much time.", 'start': 661.557, 'duration': 5.285}, {'end': 669.923, 'text': 'so i prefer using k nearest neighbor for this.', 'start': 667.222, 'duration': 2.701}, {'end': 677.466, 'text': "okay, so after this, uh, i don't have to do train test split also because today i'm going to show you with the help of cross validation,", 'start': 669.923, 'duration': 7.543}], 'summary': 'Using seaborn to visualize data distribution and choosing k nearest neighbor for classification', 'duration': 41.464, 'max_score': 636.002, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4636002.jpg'}], 'start': 308.848, 'title': 'K-nearest neighbor algorithm', 'summary': 'Explains the k-nearest neighbor algorithm, emphasizing the significance of selecting an odd k value, its applications in classification and regression use cases, and the impact of outliers and imbalanced datasets.', 'chapters': [{'end': 526.848, 'start': 308.848, 'title': 'K-nearest neighbor algorithm', 'summary': 'Explains the k-nearest neighbor algorithm, highlighting the significance of selecting an odd k value, and its applications in both classification and regression use cases, while emphasizing the impact of outliers and imbalanced datasets.', 'duration': 218, 'highlights': ['The importance of selecting an odd K value in K-nearest neighbor algorithm Selecting an odd K value is crucial to avoid scenarios where an even K value may result in an equal number of categories on both sides, impacting the accuracy of classification.', 'Application of K-nearest neighbor in classification and regression use cases The algorithm is utilized for identifying the nearest neighbor based on the selected K value, where for classification, the category with the near maximum number of nearest neighbors is assigned, while for regression, the mean of the nearest neighbors is used as the predicted value.', 'Impact of outliers and imbalanced datasets on K-nearest neighbor The algorithm may face challenges in accurate prediction and identification if the dataset contains numerous outliers or is imbalanced, necessitating the need to address these issues prior to applying K-nearest neighbor.']}, {'end': 689.258, 'start': 527.088, 'title': 'Data classification and standardization', 'summary': 'Covers the process of standardizing a classified dataset using a standard scalar, visualizing the data distribution through a pair plot, and choosing the k-nearest neighbor algorithm for classification.', 'duration': 162.17, 'highlights': ['The dataset contains independent features and the output feature is a target class that needs to be classified into one or zero.', 'Standardization of the variables is done using a standard scalar, transforming the values based on the standard normal distribution of the data.', 'Visualization of the data distribution is carried out using a pair plot to understand the overlapping and distribution of the features, leading to the decision to use the k-nearest neighbor algorithm for classification.', 'The use of cross-validation will be demonstrated for training the model in the future, in addition to the traditional train-test split method.']}], 'duration': 380.41, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4308848.jpg', 'highlights': ['The importance of selecting an odd K value in K-nearest neighbor algorithm Selecting an odd K value is crucial to avoid scenarios where an even K value may result in an equal number of categories on both sides, impacting the accuracy of classification.', 'Application of K-nearest neighbor in classification and regression use cases The algorithm is utilized for identifying the nearest neighbor based on the selected K value, where for classification, the category with the near maximum number of nearest neighbors is assigned, while for regression, the mean of the nearest neighbors is used as the predicted value.', 'Impact of outliers and imbalanced datasets on K-nearest neighbor The algorithm may face challenges in accurate prediction and identification if the dataset contains numerous outliers or is imbalanced, necessitating the need to address these issues prior to applying K-nearest neighbor.']}, {'end': 1080.124, 'segs': [{'end': 778.468, 'src': 'heatmap', 'start': 722.761, 'weight': 1, 'content': [{'end': 727.102, 'text': 'But let us just think that the k value is 1 and try to do the fit.', 'start': 722.761, 'duration': 4.341}, {'end': 729.643, 'text': "And after doing the fit, I'm doing the predict.", 'start': 727.762, 'duration': 1.881}, {'end': 734.205, 'text': 'after doing the predict, i can also find out the confusion matrix.', 'start': 730.083, 'duration': 4.122}, {'end': 740.767, 'text': "from this confusion matrix you can see that the precision and recall value i'm getting somewhere around 91 if my k value is one.", 'start': 734.205, 'duration': 6.562}, {'end': 745.308, 'text': 'but still it will lead to underfitting because my k value is just one.', 'start': 740.767, 'duration': 4.541}, {'end': 752.251, 'text': 'and uh, with the, if my k value is one, it is very difficult to just uh predict based on just one nearest neighbor.', 'start': 745.308, 'duration': 6.943}, {'end': 755.633, 'text': 'So what I have to do is that I have to choose a K value.', 'start': 752.711, 'duration': 2.922}, {'end': 760.556, 'text': 'Now for choosing a K value, what I do is that I have made a list called an accuracy rate.', 'start': 756.173, 'duration': 4.383}, {'end': 763.738, 'text': 'Then I will run a loop from 1 to 40.', 'start': 761.277, 'duration': 2.461}, {'end': 766.56, 'text': 'Now just see what this loop is all about.', 'start': 763.738, 'duration': 2.822}, {'end': 770.623, 'text': "This 1 to 40 values, I'm going to consider the K value.", 'start': 767.101, 'duration': 3.522}, {'end': 775.026, 'text': "Suppose for K is equal to 1, I'll try to find out what is the accuracy that I got over there.", 'start': 771.103, 'duration': 3.923}, {'end': 778.468, 'text': 'For K is equal to 2, what is the accuracy? For K is equal to 3,.', 'start': 775.306, 'duration': 3.162}], 'summary': 'Using k=1 leads to 91 precision and recall, but underfitting, leading to the need to choose a higher k value through testing k values from 1 to 40.', 'duration': 44.263, 'max_score': 722.761, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4722761.jpg'}, {'end': 856.977, 'src': 'embed', 'start': 821.666, 'weight': 2, 'content': [{'end': 823.207, 'text': 'and this is my target class.', 'start': 821.666, 'duration': 1.541}, {'end': 829.131, 'text': "And here, the cross-validation value I'm going to take is basically 10, okay? So 10 different experiments.", 'start': 823.567, 'duration': 5.564}, {'end': 830.973, 'text': 'You know how cross-validation works.', 'start': 829.472, 'duration': 1.501}, {'end': 833.034, 'text': 'Train and test split will be internally happening.', 'start': 831.093, 'duration': 1.941}, {'end': 837.298, 'text': 'For 10 different experiments, the train and test data set will always be different.', 'start': 833.635, 'duration': 3.663}, {'end': 840.38, 'text': "Then finally, I'm going to append the accuracy rate of it.", 'start': 837.878, 'duration': 2.502}, {'end': 844.853, 'text': "okay, now see this after i append this accuracy rate, similarly what i've done.", 'start': 840.731, 'duration': 4.122}, {'end': 848.774, 'text': 'instead of calculating this accuracy rate, i can also calculate error rate.', 'start': 844.853, 'duration': 3.921}, {'end': 856.977, 'text': 'all i have to do is that i just have to write uh, in the accuracy scale, in the accuracy rate i just have to write instead of this score of dot,', 'start': 848.774, 'duration': 8.203}], 'summary': 'Using 10-fold cross-validation, the speaker will run 10 experiments and record accuracy rates, with the possibility of calculating error rates as well.', 'duration': 35.311, 'max_score': 821.666, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4821666.jpg'}, {'end': 1020.841, 'src': 'embed', 'start': 990.816, 'weight': 0, 'content': [{'end': 994.78, 'text': 'is actually running within this 93% accuracy.', 'start': 990.816, 'duration': 3.964}, {'end': 1001.787, 'text': "I'll suggest that K is equal to 23 would be a very good value for selecting the K value and this is how you have to select the K value.", 'start': 995.541, 'duration': 6.246}, {'end': 1012.418, 'text': 'You have to see that from where the error rate or the accuracy rate was continuously stable at a specific point and it was not like zigzag line like how we have over here.', 'start': 1001.807, 'duration': 10.611}, {'end': 1020.841, 'text': 'so now what i can do is that i can just select my k nearest neighbor as one over here and let us go ahead and select k neighbors 23,', 'start': 1012.978, 'duration': 7.863}], 'summary': 'Select k=23 for 93% accuracy in k-nearest neighbor algorithm.', 'duration': 30.025, 'max_score': 990.816, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4990816.jpg'}, {'end': 1059.432, 'src': 'embed', 'start': 1035.786, 'weight': 4, 'content': [{'end': 1043.247, 'text': "guys. uh, Very simple, very simple algorithm, wonderful code over here that I've written already for you.", 'start': 1035.786, 'duration': 7.461}, {'end': 1047.788, 'text': 'All you have to do is that download it from the github and start working with your own use case.', 'start': 1043.247, 'duration': 4.541}, {'end': 1052.93, 'text': "There's a whole lot of bunch of use cases in Kaggle which you can apply with the help of k-news, neighbor.", 'start': 1047.788, 'duration': 5.142}, {'end': 1054.89, 'text': 'I hope you like this particular videos, guys.', 'start': 1052.93, 'duration': 1.96}, {'end': 1056.991, 'text': 'Please gives a thumbs up.', 'start': 1055.37, 'duration': 1.621}, {'end': 1059.432, 'text': 'like this particular video, Please just subscribe the channel.', 'start': 1056.991, 'duration': 2.441}], 'summary': 'Simple algorithm available on github for various use cases with k-nearest neighbor.', 'duration': 23.646, 'max_score': 1035.786, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy41035786.jpg'}], 'start': 689.258, 'title': 'Selecting k value for k nearest neighbor', 'summary': 'Discusses the process of selecting the optimal k value for k nearest neighbor algorithm by iteratively testing accuracy rates from 1 to 40, aiming to select a stable k value for model optimization. it determines a k value of 23 with 95% accuracy and provides guidance for applying the algorithm to various use cases on kaggle.', 'chapters': [{'end': 840.38, 'start': 689.258, 'title': 'K-neighbors classifier for k value derivation', 'summary': 'Explains the process of deriving the k value for k-neighbors classifier by iteratively testing accuracy rates from 1 to 40, aiming to select a stable k value for model optimization, with an initial precision and recall value of around 91 at k=1.', 'duration': 151.122, 'highlights': ['The K value is initially set to 1 for deriving its value through an iterative process, where precision and recall values were observed to be around 91% at k=1.', 'A loop from 1 to 40 is run to calculate accuracy rates for different K values, aiming to select a stable K value based on the accuracy rate plot.', 'Cross-validation with 10 different experiments is used to internally split train and test datasets for KNN model evaluation.']}, {'end': 1080.124, 'start': 840.731, 'title': 'Selecting k value for k nearest neighbor', 'summary': 'Discusses the process of selecting the optimal k value for k nearest neighbor algorithm by analyzing error rate and accuracy, determining k value of 23 with 95% accuracy, and providing guidance for applying the algorithm to various use cases on kaggle.', 'duration': 239.393, 'highlights': ['By analyzing the error rate and accuracy, the chapter demonstrates the process of selecting the optimal K value for K Nearest Neighbor algorithm, determining a K value of 23 with 95% accuracy.', 'The chapter provides guidance for applying the K Nearest Neighbor algorithm to various use cases on Kaggle, emphasizing its simplicity and effectiveness in classification tasks.', 'The speaker encourages viewers to download the provided code from GitHub and start working with their own use cases, while also urging them to like, subscribe, and enable notifications for future video updates.']}], 'duration': 390.866, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/wTF6vzS9fy4/pics/wTF6vzS9fy4689258.jpg', 'highlights': ['Determining a K value of 23 with 95% accuracy.', 'Loop from 1 to 40 calculates accuracy rates for different K values.', 'Cross-validation with 10 different experiments is used for KNN model evaluation.', 'Precision and recall values observed to be around 91% at k=1.', 'Provides guidance for applying the K Nearest Neighbor algorithm to various use cases on Kaggle.', 'Emphasizes simplicity and effectiveness in classification tasks.']}], 'highlights': ['Determining a K value of 23 with 95% accuracy.', 'The algorithm requires the selection of a k value for determining nearest neighbors.', 'The importance of selecting an odd K value in K-nearest neighbor algorithm Selecting an odd K value is crucial to avoid scenarios where an even K value may result in an equal number of categories on both sides, impacting the accuracy of classification.', 'Loop from 1 to 40 calculates accuracy rates for different K values.', 'The chapter outlines the process of classifying a new point based on the majority category among the neighbors.', 'The algorithm is useful for both classification and regression use cases.', 'The chapter explains the process of finding the five nearest neighbors using Euclidean distance.', 'Cross-validation with 10 different experiments is used for KNN model evaluation.', 'Application of K-nearest neighbor in classification and regression use cases The algorithm is utilized for identifying the nearest neighbor based on the selected K value, where for classification, the category with the near maximum number of nearest neighbors is assigned, while for regression, the mean of the nearest neighbors is used as the predicted value.', 'Precision and recall values observed to be around 91% at k=1.', 'Emphasizes simplicity and effectiveness in classification tasks.', 'The chapter defines the Euclidean distance formula and explains the process of calculating Euclidean distance.', 'The algorithm may face challenges in accurate prediction and identification if the dataset contains numerous outliers or is imbalanced, necessitating the need to address these issues prior to applying K-nearest neighbor.', 'Impact of outliers and imbalanced datasets on K-nearest neighbor The algorithm may face challenges in accurate prediction and identification if the dataset contains numerous outliers or is imbalanced, necessitating the need to address these issues prior to applying K-nearest neighbor.', 'Explains the intuition and implementation of the k nearest neighbor algorithm for classification and regression, highlighting its effectiveness in solving non-linear classified data points.']}