title
Live Day 6 - Discussing KMeans, Hierarchical and DBSCAN Clustering Algorithms

description
Join the community session: https://ineuron.ai/course/Mega-Community . All the materials will be uploaded there. Live ML Playlist: https://www.youtube.com/playlist?list=PLZoTAELRMXVPjaAzURB77Kz0YXxj65tYz The OneNeuron Lifetime subscription has been extended. On the OneNeuron platform you will be able to get 100+ courses (at least 20 courses will be added monthly, based on your demand). Features of the course: 1. You can raise any course demand (fulfilled within 45-60 days). 2. You can access the innovation lab from iNeuron. 3. You can use our incubation based on your ideas. 4. Live sessions coming soon (mostly by Feb). Use coupon code KRISH10 for an additional 10% discount. And many more... Enroll now. OneNeuron link: https://one-neuron.ineuron.ai/ Direct call to our team in case of any queries: 8788503778 6260726925 9538303385 866003424
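The k-means walkthrough in the session detail below follows the classic steps: pick a k, initialize k centroids, assign every point to its nearest centroid, move each centroid to the average of its assigned points, and repeat until nothing changes. A minimal NumPy sketch of those steps (the `kmeans` helper and its `init` parameter are illustrative, not code from the video):

```python
import numpy as np

def kmeans(X, k, n_iter=100, init=None, seed=0):
    """Plain k-means (Lloyd's algorithm): assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points,
    repeating until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize k centroids (randomly from the data unless given).
    if init is not None:
        centroids = np.array(init, dtype=float)
    else:
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # nearest-centroid assignment
        # Step 3: update each centroid to the average of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged: no further update needed
            break
        centroids = new
    return centroids, labels

# Two tiny, well-separated groups; a fixed init makes the result deterministic.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centroids, labels = kmeans(X, k=2, init=[[0.0, 0.0], [10.0, 0.0]])
# labels -> [0, 0, 1, 1]; centroids -> [[0.0, 0.5], [10.0, 0.5]]
```

With random initialization the outcome depends on the starting centroids, which is exactly the problem the session's k-means++ discussion addresses.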

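The session also leans on two quantities: WCSS (the within-cluster sum of squares plotted against k to draw the elbow curve) and the silhouette score used to validate the chosen k. A small self-contained sketch of both, assuming only NumPy; the silhouette formula used here is the standard s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to its own cluster and b(i) the smallest mean distance to any other cluster:

```python
import numpy as np

def wcss(X, labels, centroids):
    # Within-cluster sum of squares: squared distance from each point to
    # its own cluster's centroid, summed over all points.
    return float(sum(((X[labels == j] - c) ** 2).sum()
                     for j, c in enumerate(centroids)))

def silhouette(X, labels):
    """Mean silhouette score over all points (near 1 = tight, well-separated
    clusters; near 0 = overlapping clusters)."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    scores = []
    for i in range(len(X)):
        own = (labels == labels[i])
        own[i] = False                    # exclude the point itself
        if not own.any():                 # singleton cluster: define s(i) = 0
            scores.append(0.0)
            continue
        a = D[i, own].mean()              # mean distance within own cluster
        b = min(D[i, labels == c].mean()  # best mean distance to another cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two tight clusters far apart: tiny WCSS, silhouette close to 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 0.5], [10.0, 0.5]])
print(wcss(X, labels, centroids))   # 1.0 (four points, each 0.5 from its centroid)
print(silhouette(X, labels))        # roughly 0.9
```

For the elbow method you would compute `wcss` for k = 1, 2, 3, ... and look for the k where the curve's abrupt drop flattens out, then use the silhouette score to validate that choice.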
detail
{'title': 'Live Day 6 - Discussing KMeans, Hierarchical and DBSCAN Clustering Algorithms', 'heatmap': [{'end': 342.841, 'start': 281.123, 'weight': 0.735}, {'end': 1605.188, 'start': 1553.958, 'weight': 0.781}, {'end': 2079.168, 'start': 2029.177, 'weight': 0.746}, {'end': 2884.773, 'start': 2733.231, 'weight': 0.727}, {'end': 4297.032, 'start': 4246.213, 'weight': 1}], 'summary': 'Discusses k-means clustering in unsupervised machine learning, including its application in ensemble techniques and the steps involved in the k-means algorithm. It also covers the process of k-means clustering, centroid initialization, and point updating, explaining the elbow method for determining the optimized k value and finding the optimal k value using WCSS, the elbow curve method, and silhouette score validation. Additionally, it explains k-means clustering, hierarchical clustering, and DBSCAN clustering, highlighting their differences and advantages, and covers implementing clustering algorithms and silhouette analysis to determine the optimal number of clusters for k-means clustering.', 'chapters': [{'end': 760.795, 'segs': [{'end': 196.99, 'src': 'embed', 'start': 171.243, 'weight': 0, 'content': [{'end': 176.748, 'text': 'k means clustering and this is a kind of unsupervised machine learning.', 'start': 171.243, 'duration': 5.505}, {'end': 182.653, 'text': 'okay, now, always remember, unsupervised machine learning basically means that.', 'start': 176.748, 'duration': 5.905}, {'end': 193.202, 'text': "uh, the one and the most important thing is that in unsupervised machine learning, in unsupervised ml, you don't have any specific output.", 'start': 182.653, 'duration': 10.549}, {'end': 194.268, 'text': 'Okay.', 'start': 194.068, 'duration': 0.2}, {'end': 196.99, 'text': "So you don't have any specific output.", 'start': 194.969, 'duration': 2.021}], 'summary': 'K-means clustering is an unsupervised ml method with no specific output.', 'duration': 25.747, 'max_score': 171.243,
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk171243.jpg'}, {'end': 248.887, 'src': 'embed', 'start': 221.031, 'weight': 1, 'content': [{'end': 225.072, 'text': 'And there are various techniques like k-means, it is hierarchical clustering and all.', 'start': 221.031, 'duration': 4.041}, {'end': 229.613, 'text': "So first of all, we'll try to understand about k-means and how does it specifically work.", 'start': 225.132, 'duration': 4.481}, {'end': 231.054, 'text': "It's simple.", 'start': 230.554, 'duration': 0.5}, {'end': 233.934, 'text': 'Suppose you have a data points like this.', 'start': 231.754, 'duration': 2.18}, {'end': 235.955, 'text': "Let's say that you have a data points.", 'start': 234.114, 'duration': 1.841}, {'end': 239.839, 'text': 'Suppose you have some data points like this.', 'start': 237.717, 'duration': 2.122}, {'end': 243.382, 'text': "Okay Let's say that this is your F1 feature, F2 feature.", 'start': 240.48, 'duration': 2.902}, {'end': 248.887, 'text': 'And based on this in two dimensional, probably I will be plotting this points.', 'start': 244.003, 'duration': 4.884}], 'summary': 'Introduction to k-means and clustering techniques in data analysis.', 'duration': 27.856, 'max_score': 221.031, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk221031.jpg'}, {'end': 342.841, 'src': 'heatmap', 'start': 281.123, 'weight': 0.735, 'content': [{'end': 286.386, 'text': "specifically, it will be very much useful and then we'll try to understand about math, intuition also.", 'start': 281.123, 'duration': 5.263}, {'end': 289.889, 'text': 'now always understand, guys, uh, where does clustering gets used?', 'start': 286.386, 'duration': 3.503}, {'end': 293.111, 'text': 'okay, in most of the ensemble techniques.', 'start': 289.889, 'duration': 3.222}, {'end': 295.052, 'text': 'i told you about custom ensemble technique.', 'start': 293.111, 
'duration': 1.941}, {'end': 299.475, 'text': 'right, so custom ensemble techniques.', 'start': 295.052, 'duration': 4.423}, {'end': 303.998, 'text': 'in custom ensemble techniques, you know whenever we are probably creating a model.', 'start': 299.475, 'duration': 4.523}, {'end': 309.101, 'text': 'First of all, on our data set, what we do is that we create clusters.', 'start': 304.838, 'duration': 4.263}, {'end': 310.702, 'text': 'So suppose this is my data set.', 'start': 309.281, 'duration': 1.421}, {'end': 316.046, 'text': 'During my model creation, the first algorithm we will probably apply will be clustering algorithm.', 'start': 311.303, 'duration': 4.743}, {'end': 321.771, 'text': 'And after that, it is obviously good that we can apply regression or classification problem.', 'start': 316.867, 'duration': 4.904}, {'end': 326.034, 'text': 'Suppose in this clustering, I have two or three groups.', 'start': 322.431, 'duration': 3.603}, {'end': 328.336, 'text': "Let's say that I have two or three groups over here.", 'start': 326.114, 'duration': 2.222}, {'end': 338.7, 'text': 'For each group, we can apply a separate supervised machine learning algorithm if we know the specific output that we really want to take ahead.', 'start': 329.016, 'duration': 9.684}, {'end': 342.841, 'text': "I'll talk about this and give you some of the examples as I go ahead.", 'start': 339.18, 'duration': 3.661}], 'summary': 'Clustering is used in custom ensemble techniques for creating separate supervised machine learning algorithms for different groups in a dataset.', 'duration': 61.718, 'max_score': 281.123, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk281123.jpg'}, {'end': 520.695, 'src': 'embed', 'start': 490.554, 'weight': 3, 'content': [{'end': 494.676, 'text': 'The second step is that we initialize k number of centroids.', 'start': 490.554, 'duration': 4.122}, {'end': 502.624, 'text': 'k number of centroids.', 'start': 
501.223, 'duration': 1.401}, {'end': 506.326, 'text': 'now, in this particular case, i know my k value is 2.', 'start': 502.624, 'duration': 3.702}, {'end': 508.467, 'text': 'so we will be initializing randomly.', 'start': 506.326, 'duration': 2.141}, {'end': 510.709, 'text': "let's say that k is equal to 2.", 'start': 508.467, 'duration': 2.242}, {'end': 511.85, 'text': 'so what we can actually do?', 'start': 510.709, 'duration': 1.141}, {'end': 515.072, 'text': "let's say that this is this is my one centroid.", 'start': 511.85, 'duration': 3.222}, {'end': 516.933, 'text': 'this is my one centroid.', 'start': 515.072, 'duration': 1.861}, {'end': 518.734, 'text': "i will i'll put it in another color.", 'start': 516.933, 'duration': 1.801}, {'end': 520.695, 'text': 'so this will be my one centroid.', 'start': 518.734, 'duration': 1.961}], 'summary': 'In k-means clustering, we initialize 2 centroids randomly.', 'duration': 30.141, 'max_score': 490.554, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk490554.jpg'}, {'end': 697.964, 'src': 'embed', 'start': 672.216, 'weight': 4, 'content': [{'end': 676.559, 'text': 'The reason we compute the average is that because we need to update the centroid.', 'start': 672.216, 'duration': 4.343}, {'end': 678.08, 'text': 'So compute the average.', 'start': 676.859, 'duration': 1.221}, {'end': 680.352, 'text': 'To update centroid.', 'start': 678.991, 'duration': 1.361}, {'end': 684.935, 'text': 'To update centroids.', 'start': 683.574, 'duration': 1.361}, {'end': 688.978, 'text': 'Okay So here you will be able to see that what I am actually doing.', 'start': 686.096, 'duration': 2.882}, {'end': 694.142, 'text': 'As soon as we compute the average, this centroid is going to move to some other location.', 'start': 689.498, 'duration': 4.644}, {'end': 697.964, 'text': 'So what location it will move? 
It will obviously become somewhere in center.', 'start': 694.802, 'duration': 3.162}], 'summary': 'Computing average to update centroids and move centroid to center.', 'duration': 25.748, 'max_score': 672.216, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk672216.jpg'}], 'start': 34.804, 'title': 'K-means clustering in ml', 'summary': 'Discusses k-means clustering in unsupervised machine learning, its application in ensemble techniques, and the steps involved in k-means algorithm, including trying different k values, initializing centroids, and finding points near the centroids.', 'chapters': [{'end': 540.813, 'start': 34.804, 'title': 'K-means clustering in ml', 'summary': 'Discusses k-means clustering in unsupervised machine learning, its application in ensemble techniques, and the steps involved in k-means algorithm, including trying different k values, initializing centroids, and finding points near the centroids.', 'duration': 506.009, 'highlights': ['K-means clustering is a type of unsupervised machine learning that aims to create clusters of similar data based on features, with applications in ensemble techniques. K-means clustering is discussed in the context of unsupervised machine learning, highlighting its purpose of creating clusters of similar data based on features and its application in ensemble techniques.', 'The chapter explains the steps involved in K-means clustering, including trying different k values, initializing centroids, and finding points near the centroids. The chapter outlines the steps involved in K-means clustering, such as trying different k values, randomly initializing centroids, and identifying which points are near to the centroids.', 'The K-means algorithm involves trying different k values, initializing k number of centroids, and finding points near the centroids to form clusters. 
The K-means algorithm encompasses steps like trying different k values, randomly initializing k number of centroids, and determining the points near the centroids to form clusters.']}, {'end': 760.795, 'start': 541.394, 'title': 'K-means clustering process', 'summary': 'Explains the process of k-means clustering, where centroids are initialized, distances to centroids are calculated, points are assigned to nearest centroids, centroids are updated based on average, and the process iterates until no further updates are needed.', 'duration': 219.401, 'highlights': ['The process involves initializing k number of centroids and calculating distances to identify points nearest to each centroid.', 'Based on the shortest distance, points are assigned to the nearest centroid, with those nearer to a specific centroid being marked with a corresponding color.', 'Centroids are updated by computing the average of points assigned to them, moving the centroid to a new location, and iterating the process until no further updates are required.']}], 'duration': 725.991, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk34804.jpg', 'highlights': ['K-means clustering is a type of unsupervised machine learning with applications in ensemble techniques.', 'The chapter outlines the steps involved in K-means clustering, such as trying different k values, randomly initializing centroids, and identifying which points are near to the centroids.', 'The K-means algorithm encompasses steps like trying different k values, randomly initializing k number of centroids, and determining the points near the centroids to form clusters.', 'The process involves initializing k number of centroids and calculating distances to identify points nearest to each centroid.', 'Centroids are updated by computing the average of points assigned to them, moving the centroid to a new location, and iterating the process until no further updates are required.']}, 
{'end': 1087.678, 'segs': [{'end': 821.696, 'src': 'embed', 'start': 790.996, 'weight': 6, 'content': [{'end': 794.417, 'text': 'Okay, because of your internet it is taking in different pixels.', 'start': 790.996, 'duration': 3.421}, {'end': 802.48, 'text': 'Okay, so I hope everybody has understood the steps that you have actually followed in initializing the centroids,', 'start': 795.518, 'duration': 6.962}, {'end': 805.441, 'text': 'in updating the centroids and in updating the points.', 'start': 802.48, 'duration': 2.961}, {'end': 807.602, 'text': 'Is it clear everybody?', 'start': 805.621, 'duration': 1.981}, {'end': 808.682, 'text': 'with respect to k-means?', 'start': 807.602, 'duration': 1.08}, {'end': 815.231, 'text': 'yes, everybody, everybody.', 'start': 810.888, 'duration': 4.343}, {'end': 820.355, 'text': 'is it clear?', 'start': 815.231, 'duration': 5.124}, {'end': 821.696, 'text': 'everybody. is it clear.', 'start': 820.355, 'duration': 1.341}], 'summary': 'Discussion on initializing, updating centroids and points in k-means algorithm.', 'duration': 30.7, 'max_score': 790.996, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk790996.jpg'}, {'end': 895.913, 'src': 'embed', 'start': 849.778, 'weight': 0, 'content': [{'end': 857.201, 'text': 'Now, Elbow method says something very much important, because this will actually help us to find out what is the optimized K value,', 'start': 849.778, 'duration': 7.423}, {'end': 864.503, 'text': 'whether the K value should be 2, whether the K value is going to be 3, whether the K value is going to become 4..', 'start': 857.201, 'duration': 7.302}, {'end': 866.926, 'text': 'okay and always understand.', 'start': 864.503, 'duration': 2.423}, {'end': 870.01, 'text': 'suppose this is my data set.', 'start': 866.926, 'duration': 3.084}, {'end': 871.892, 'text': 'suppose this is my data set initially.', 'start': 870.01, 'duration': 1.882}, {'end': 873.093, 'text': 
"let's say that i have my data.", 'start': 871.892, 'duration': 1.201}, {'end': 874.455, 'text': 'points like this.', 'start': 873.093, 'duration': 1.362}, {'end': 878.88, 'text': 'we cannot go ahead and directly say say that okay, k is equal to 2 is going to work.', 'start': 874.455, 'duration': 4.425}, {'end': 881.948, 'text': 'so obviously we are going to Go with iteration.', 'start': 878.88, 'duration': 3.068}, {'end': 885.01, 'text': 'For i is equal to probably 1 to 10.', 'start': 882.469, 'duration': 2.541}, {'end': 887.79, 'text': 'I am going to move towards iteration from 1 to 10.', 'start': 885.01, 'duration': 2.78}, {'end': 888.151, 'text': "Let's say.", 'start': 887.79, 'duration': 0.361}, {'end': 895.913, 'text': 'So for every iteration we will construct a graph with respect to k value and with respect to something called as WCSS.', 'start': 888.551, 'duration': 7.362}], 'summary': 'Using the elbow method to find optimized k value for clustering through iteration from 1 to 10.', 'duration': 46.135, 'max_score': 849.778, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk849778.jpg'}, {'end': 998.682, 'src': 'embed', 'start': 969.871, 'weight': 2, 'content': [{'end': 972.353, 'text': 'Okay It is going to be obviously greater.', 'start': 969.871, 'duration': 2.482}, {'end': 975.055, 'text': "Okay Let's say that my first point is coming over here.", 'start': 972.653, 'duration': 2.402}, {'end': 982.841, 'text': 'Fine So within k is equal to 1 initially we took and we found out the distance of WCSS and it is a very huge value.', 'start': 975.876, 'duration': 6.965}, {'end': 987.485, 'text': 'Okay Because we are going to compute the distance between each and every point to the centroid.', 'start': 983.402, 'duration': 4.083}, {'end': 989.747, 'text': 'Now the next thing that I am actually going to do.', 'start': 988.045, 'duration': 1.702}, {'end': 995.822, 'text': 'is that now we will go with next value 
that is k is equal to 2.', 'start': 991.041, 'duration': 4.781}, {'end': 998.682, 'text': 'Now in k is equal to 2, I will initialize two points.', 'start': 995.822, 'duration': 2.86}], 'summary': 'Initial k=1 WCSS distance is very high, next k=2 points will be initialized.', 'duration': 28.811, 'max_score': 969.871, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk969871.jpg'}, {'end': 1087.678, 'src': 'embed', 'start': 1036.694, 'weight': 3, 'content': [{'end': 1039.958, 'text': 'Then k is equal to 4 will come here to 5, 6 like this it will go.', 'start': 1036.694, 'duration': 3.264}, {'end': 1043.44, 'text': 'Okay So, here if I probably join this line.', 'start': 1040.678, 'duration': 2.762}, {'end': 1049.365, 'text': 'You will be able to see that there will be abrupt changes in the WCSS value.', 'start': 1044.181, 'duration': 5.184}, {'end': 1058.713, 'text': 'Okay In the WCSS value, there will be abrupt changes and this is basically called as elbow curve.', 'start': 1050.867, 'duration': 7.846}, {'end': 1062.777, 'text': 'Now why we say it as elbow curve because it is in the shape of elbow.', 'start': 1059.514, 'duration': 3.263}, {'end': 1068.462, 'text': 'Okay And here at one specific point, there will be an abrupt change and then it will be straight.', 'start': 1063.557, 'duration': 4.905}, {'end': 1072.905, 'text': 'So that is the reason why we basically say this as elbow.', 'start': 1069.482, 'duration': 3.423}, {'end': 1075.287, 'text': 'Okay? So this is a very important thing.', 'start': 1073.646, 'duration': 1.641}, {'end': 1078.89, 'text': 'See, in finding the K value, we use the elbow method.', 'start': 1075.728, 'duration': 3.162}, {'end': 1087.678, 'text': "But for validating purpose, how do we validate that this model is performing well? We use the silhouette score that I'll show you just in some time.", 'start': 1078.951, 'duration': 8.727}], 'summary': 'K value of 4 will result in abrupt changes in WCSS, forming an elbow curve, validated by silhouette score.', 'duration': 50.984, 'max_score': 1036.694, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1036694.jpg'}], 'start': 761.375, 'title': 'K-means clustering and optimal k value', 'summary': 'Covers the process of k-means clustering, centroid initialization, and point updating. It explains the elbow method for determining the optimized k value and discusses finding the optimal k value using WCSS, the elbow curve method, and silhouette score validation.', 'chapters': [{'end': 874.455, 'start': 761.375, 'title': 'Understanding k-means clustering and the elbow method', 'summary': 'Discusses the process of initializing centroids, updating centroids, and updating points in k-means clustering, and also explains the elbow method for determining the optimized k value for clustering, with an emphasis on the importance of this method in choosing the ideal k value.', 'duration': 113.08, 'highlights': ['The elbow method helps to find the optimized K value for clustering, whether it should be 2, 3, or 4, and emphasizes the importance of this method in choosing the ideal K value.', 'The chapter explains the process of initializing centroids, updating centroids, and updating points in k-means clustering, emphasizing the understanding of these steps for effective implementation.', 'The chapter discusses the impact of internet connectivity on the clarity of screens and pixel differentiation, highlighting the importance of a stable internet connection for effective learning and visualization.']}, {'end': 1087.678, 'start': 874.455, 'title': 'Finding optimal k value for clustering', 'summary': 'Discusses the iterative process of finding the optimal k value for clustering, using the within-cluster sum of squares (WCSS) and
the elbow curve method to determine the ideal number of clusters, followed by validation using the silhouette score.', 'duration': 213.223, 'highlights': ['Explaining the process of finding the optimal K value through iterative construction of graphs and WCSS computation. Iteration from 1 to 10 is used to construct a graph and compute the distance of WCSS for each centroid.', 'Defining WCSS as within-cluster sum of squares and discussing the significance of its value in relation to the distance from centroids. Detailed explanation of WCSS as within-cluster sum of squares and its impact on the distance from centroids.', 'Describing the process of initializing centroids and computing the distance of WCSS, highlighting the significant increase in distance for k=1. Explanation of the significant increase in WCSS distance when initializing centroids for k=1.', 'Discussing the process of iteratively increasing k value and observing the decrease in WCSS, leading to the identification of the elbow curve as the optimal K value. Iterative observation of decreasing WCSS and identification of the elbow curve as the optimal K value.', 'Introducing the concept of using the elbow curve to identify the optimal K value through abrupt changes in WCSS. Explanation of the use of the elbow curve to identify the optimal K value through abrupt changes in WCSS.', "Mentioning the use of the silhouette score for validating the model's performance after finding the optimal K value. Introduction of the silhouette score for validating the model's performance after finding the optimal K value."]}], 'duration': 326.303, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk761375.jpg', 'highlights': ['The elbow method helps to find the optimized K value for clustering, emphasizing its importance in choosing the ideal K value.', 'Explaining the process of finding the optimal K value through iterative construction of graphs and WCSS computation.', 'Describing the process of initializing centroids and computing the distance of WCSS, highlighting the significant increase in distance for k=1.', 'Discussing the process of iteratively increasing k value and observing the decrease in WCSS, leading to the identification of the elbow curve as the optimal K value.', 'Introducing the concept of using the elbow curve to identify the optimal K value through abrupt changes in WCSS.', "Mentioning the use of the silhouette score for validating the model's performance after finding the optimal K value.", 'The chapter explains the process of initializing centroids, updating centroids, and updating points in k-means clustering, emphasizing the understanding of these steps for effective implementation.']}, {'end': 1562.894, 'segs': [{'end': 1114.569, 'src': 'embed', 'start': 1088.419, 'weight': 1, 'content': [{'end': 1094.381, 'text': 'But understand that in k-means clustering, we need to update the centroids.', 'start': 1088.419, 'duration': 5.962}, {'end': 1096.602, 'text': 'And based on that, we calculate the distance.', 'start': 1094.741, 'duration': 1.861}, {'end': 1105.526, 'text': "And as the k value keep on increasing, you will be able to see that the distance will become normal or the WCSS value will become normal.", 'start': 1097.022, 'duration': 8.504}, {'end': 1111.468, 'text': 'And then we really need to find out which is the feasible k value where the abrupt change.', 'start': 1106.206, 'duration': 5.262},
{'end': 1114.569, 'text': 'See over here, suppose abrupt change is there and then it is normal.', 'start': 1111.808, 'duration': 2.761}], 'summary': 'In k-means clustering, increasing k leads to normal distance values and we need to find the feasible k value with an abrupt change.', 'duration': 26.15, 'max_score': 1088.419, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1088419.jpg'}, {'end': 1163.78, 'src': 'embed', 'start': 1135.715, 'weight': 0, 'content': [{'end': 1140.477, 'text': 'first of all we need to construct this elbow curve, then see the changes where it is basically happening.', 'start': 1135.715, 'duration': 4.762}, {'end': 1142.338, 'text': "We'll need to find out the abrupt change.", 'start': 1140.497, 'duration': 1.841}, {'end': 1148.16, 'text': 'And once we get the abrupt change, we basically say that this may be the K value.', 'start': 1142.698, 'duration': 5.462}, {'end': 1149.6, 'text': 'So K is equal to 4.', 'start': 1148.34, 'duration': 1.26}, {'end': 1150.841, 'text': "As an example, I'm telling you.", 'start': 1149.6, 'duration': 1.241}, {'end': 1157.374, 'text': 'Okay, so unless and until if you really want to find the cluster, it is very much simple.', 'start': 1151.929, 'duration': 5.445}, {'end': 1163.78, 'text': 'We take a K value, we initialize K number of centroids, we compute the average to update the centroids.', 'start': 1157.694, 'duration': 6.086}], 'summary': 'Construct elbow curve to find k value, which is 4, for clustering.', 'duration': 28.065, 'max_score': 1135.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1135715.jpg'}, {'end': 1359.873, 'src': 'embed', 'start': 1336.693, 'weight': 2, 'content': [{'end': 1343.639, 'text': 'Okay, so hierarchical clustering will first of all find out the nearest point and try to compute the distance between them,', 'start': 1336.693, 'duration': 6.946}, {'end': 1347.983, 
'text': 'and just try to combine them together into one into one.', 'start': 1343.639, 'duration': 4.344}, {'end': 1351.065, 'text': 'What do we do? We basically combine them into one group.', 'start': 1348.483, 'duration': 2.582}, {'end': 1354.048, 'text': 'Okay, so P1 and P2 has been combined.', 'start': 1351.846, 'duration': 2.202}, {'end': 1357.891, 'text': "Let's say then it will go and find out the other nearest points.", 'start': 1354.809, 'duration': 3.082}, {'end': 1359.873, 'text': "So let's say P6 and P7 are near.", 'start': 1357.951, 'duration': 1.922}], 'summary': 'Hierarchical clustering combines nearest points into groups.', 'duration': 23.18, 'max_score': 1336.693, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1336693.jpg'}, {'end': 1482.569, 'src': 'embed', 'start': 1455.849, 'weight': 3, 'content': [{'end': 1461.352, 'text': "Okay And then finally, you'll be seeing that P6, P7 is the nearest group to this.", 'start': 1455.849, 'duration': 5.503}, {'end': 1465.874, 'text': 'So this will totally get combined and it may look something like this.', 'start': 1461.712, 'duration': 4.162}, {'end': 1468.836, 'text': 'So this will become a total group like this.', 'start': 1466.675, 'duration': 2.161}, {'end': 1471.801, 'text': 'So all the groups are combined.', 'start': 1470.58, 'duration': 1.221}, {'end': 1476.264, 'text': "So finally you'll be able to see that there will be one more line which will get combined like this.", 'start': 1472.201, 'duration': 4.063}, {'end': 1480.527, 'text': 'This is basically called as a dendrogram.', 'start': 1477.725, 'duration': 2.802}, {'end': 1482.569, 'text': 'Dendrogram. Okay.', 'start': 1481.348, 'duration': 1.221}], 'summary': 'P6, P7 form nearest group, creating total combined groups in a dendrogram.', 'duration': 26.72, 'max_score': 1455.849, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1455849.jpg'}],
'start': 1088.419, 'title': 'Clustering algorithms', 'summary': 'Explains k-means clustering and finding feasible k value, including updating centroids, calculating distances, constructing an elbow curve, and the impact of k value. it also discusses hierarchical clustering, including finding nearest data points, computing distances, forming a dendrogram, and determining the number of groups.', 'chapters': [{'end': 1202.503, 'start': 1088.419, 'title': 'K-means clustering and finding feasible k value', 'summary': 'Explains the process of updating centroids and calculating distances in k-means clustering, determining the feasible k value by constructing an elbow curve and identifying the abrupt change, and the impact of k value on the number of clusters.', 'duration': 114.084, 'highlights': ["The process of updating centroids and calculating distances in k-means clustering is essential for determining the feasible k value and model complexity, involving checking different k values and WCS's values.", 'Constructing an elbow curve and identifying the abrupt change is crucial for determining the feasible k value, which impacts the number of clusters obtained in k-means clustering, such as obtaining 4 clusters with k=4.']}, {'end': 1562.894, 'start': 1204.085, 'title': 'Hierarchical clustering', 'summary': 'Discusses hierarchical clustering, a simple algorithm that finds nearest data points, computes distances, and combines them into clusters, forming a dendrogram to determine the number of groups.', 'duration': 358.809, 'highlights': ['Hierarchical clustering finds nearest points, computes distances, and combines them into clusters. The algorithm first finds the nearest point, computes the distance, and combines them into one cluster.', 'A dendrogram is formed to determine the number of groups. 
A dendrogram is created to visualize the clustering and determine the number of groups based on the longest vertical line with no horizontal line passing through it.']}], 'duration': 474.475, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1088419.jpg', 'highlights': ['Constructing an elbow curve and identifying the abrupt change is crucial for determining the feasible k value, which impacts the number of clusters obtained in k-means clustering, such as obtaining 4 clusters with k=4.', "The process of updating centroids and calculating distances in k-means clustering is essential for determining the feasible k value and model complexity, involving checking different k values and WCS's values.", 'Hierarchical clustering finds nearest points, computes distances, and combines them into clusters. The algorithm first finds the nearest point, computes the distance, and combines them into one cluster.', 'A dendrogram is formed to determine the number of groups. A dendrogram is created to visualize the clustering and determine the number of groups based on the longest vertical line with no horizontal line passing through it.']}, {'end': 2311.257, 'segs': [{'end': 1729.545, 'src': 'embed', 'start': 1680.123, 'weight': 0, 'content': [{'end': 1689.869, 'text': 'okay?. 
At that point of time, hierarchical clustering will keep on constructing this kind of dendrograms and it will be taking many, many, many time,', 'start': 1680.123, 'duration': 9.746}, {'end': 1690.81, 'text': 'lot time, right?', 'start': 1689.869, 'duration': 0.941}, {'end': 1693.572, 'text': 'So hierarchical clustering will take more time.', 'start': 1691.31, 'duration': 2.262}, {'end': 1699.15, 'text': 'okay, maximum time that it is going to basically take.', 'start': 1694.628, 'duration': 4.522}, {'end': 1707.294, 'text': 'okay, so it is very much important that you understand which is making it basically taking more time.', 'start': 1699.15, 'duration': 8.144}, {'end': 1713.317, 'text': 'so if your data set is small, you may go ahead with hierarchical clustering.', 'start': 1707.294, 'duration': 6.023}, {'end': 1716.458, 'text': 'if your data set is large, go with k-means clustering.', 'start': 1713.317, 'duration': 3.141}, {'end': 1720.163, 'text': 'go with k-means clustering.', 'start': 1718.863, 'duration': 1.3}, {'end': 1724.904, 'text': 'In short, both will take more time, but k-means will perform better than hierarchical clustering.', 'start': 1720.223, 'duration': 4.681}, {'end': 1729.545, 'text': 'Okay?. 
See, guys, you will be forming this kind of dendrograms right?', 'start': 1725.505, 'duration': 4.04}], 'summary': 'For small datasets, use hierarchical clustering; for large datasets, use k-means clustering, as it performs better.', 'duration': 49.422, 'max_score': 1680.123, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1680123.jpg'}, {'end': 1803.752, 'src': 'embed', 'start': 1772.202, 'weight': 2, 'content': [{'end': 1783.846, 'text': 'So for validating clustering models we are going to use something called as, so we are going to basically use something called a Silhouette score.', 'start': 1772.202, 'duration': 11.644}, {'end': 1786.687, 'text': "I'll show you what Silhouette score is.", 'start': 1784.787, 'duration': 1.9}, {'end': 1788.628, 'text': "I'm going to just open the Wikipedia.", 'start': 1786.847, 'duration': 1.781}, {'end': 1793.79, 'text': 'So this is how a Silhouette score looks like.', 'start': 1791.589, 'duration': 2.201}, {'end': 1796.551, 'text': 'A very, very amazing topic.', 'start': 1794.87, 'duration': 1.681}, {'end': 1803.752, 'text': 'Okay How do we validate? whether my model basically has perfect 3 or 4 model.', 'start': 1796.911, 'duration': 6.841}], 'summary': 'Validating clustering models using the Silhouette score to determine model accuracy.', 'duration': 31.55, 'max_score': 1772.202, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1772202.jpg'}, {'end': 1929.892, 'src': 'embed', 'start': 1898.146, 'weight': 1, 'content': [{'end': 1899.867, 'text': 'right. 
so this is a problem.', 'start': 1898.146, 'duration': 1.721}, {'end': 1905.711, 'text': 'so for this there is an algorithm which is called, as k means, plus plus, and what this k means plus plus will do,', 'start': 1899.867, 'duration': 5.844}, {'end': 1907.853, 'text': 'which i will probably show in practical.', 'start': 1905.711, 'duration': 2.142}, {'end': 1913.777, 'text': 'this will make sure that all the centroids that are initialized it is very, very far.', 'start': 1907.853, 'duration': 5.924}, {'end': 1915.578, 'text': 'okay, all the centroids.', 'start': 1913.777, 'duration': 1.801}, {'end': 1917.179, 'text': 'that is basically there.', 'start': 1915.578, 'duration': 1.601}, {'end': 1918.66, 'text': 'it is initialized very, very far.', 'start': 1917.179, 'duration': 1.481}, {'end': 1924.685, 'text': "we'll see that in practical application, where specifically those centroids are basically used.", 'start': 1918.66, 'duration': 6.025}, {'end': 1929.892, 'text': 'now let me go ahead and let me show you with respect to Silhouette clustering.', 'start': 1924.685, 'duration': 5.207}], 'summary': 'K means plus plus algorithm ensures centroids are initialized very far apart, demonstrated in silhouette clustering.', 'duration': 31.746, 'max_score': 1898.146, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1898146.jpg'}, {'end': 2079.168, 'src': 'heatmap', 'start': 2029.177, 'weight': 0.746, 'content': [{'end': 2033.34, 'text': "So I'm going to do the summation and I'm also going to do the average of all this distance.", 'start': 2029.177, 'duration': 4.163}, {'end': 2041.966, 'text': 'So here you can see that when I say distance of i comma j, i basically means this point, j basically means all these points.', 'start': 2034.1, 'duration': 7.866}, {'end': 2045.039, 'text': 'I is nothing but it is the centroid.', 'start': 2043.198, 'duration': 1.841}, {'end': 2047.139, 'text': 'So here is nothing but this is 
the centroid.', 'start': 2045.079, 'duration': 2.06}, {'end': 2048.739, 'text': "Let's say that I am having this centroid.", 'start': 2047.179, 'duration': 1.56}, {'end': 2055.021, 'text': 'Okay So I am going to compute all the distance over here which is mentioned by this.', 'start': 2050.8, 'duration': 4.221}, {'end': 2059.822, 'text': 'And this value that you see that I am actually dividing by C of I minus 1.', 'start': 2055.501, 'duration': 4.321}, {'end': 2063.223, 'text': 'In short I am actually trying to calculate the average distance.', 'start': 2059.822, 'duration': 3.401}, {'end': 2069.545, 'text': 'So this is the first point where I am actually computing the A of I.', 'start': 2065.083, 'duration': 4.462}, {'end': 2072.226, 'text': 'Okay Now similarly what I will do is that.', 'start': 2069.545, 'duration': 2.681}, {'end': 2079.168, 'text': 'What I will do is that the next point will be that suppose I have computed A.', 'start': 2073.922, 'duration': 5.246}], 'summary': 'Calculating average distance from centroid for data points.', 'duration': 49.991, 'max_score': 2029.177, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2029177.jpg'}, {'end': 2233.12, 'src': 'embed', 'start': 2156.022, 'weight': 4, 'content': [{'end': 2163.269, 'text': 'average. 
okay, now, tell me if i try to find out the relationship between a of i and b of i.', 'start': 2156.022, 'duration': 7.247}, {'end': 2179.626, 'text': 'if my cluster model is good, will a of i will be greater than b of i, or will b of i will be greater than a of i if i have a good clustering model?', 'start': 2163.269, 'duration': 16.357}, {'end': 2188.937, 'text': 'tell me If I have a crude clustering model, will A of I is greater than B of I will be greater than B of I,', 'start': 2179.626, 'duration': 9.311}, {'end': 2192.022, 'text': 'or whether B of I will be greater than A of I?', 'start': 2188.937, 'duration': 3.085}, {'end': 2194.286, 'text': 'Out of this, if we have a really good model.', 'start': 2192.022, 'duration': 2.264}, {'end': 2195.568, 'text': 'Tell me guys.', 'start': 2195.107, 'duration': 0.461}, {'end': 2214.401, 'text': 'Tell me, guys, whether A of I will be greater than B of I or whether B of I will be greater than A of I in a good model?', 'start': 2207.52, 'duration': 6.881}, {'end': 2222.183, 'text': 'Obviously, the distance between B of I will be greater than A of I in a good model.', 'start': 2215.362, 'duration': 6.821}, {'end': 2233.12, 'text': 'Okay, that basically means if I talk about silhouette clustering, the values will be between minus 1 to plus 1.', 'start': 2222.203, 'duration': 10.917}], 'summary': 'In a good clustering model, b of i will be greater than a of i.', 'duration': 77.098, 'max_score': 2156.022, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2156022.jpg'}], 'start': 1564.781, 'title': 'Comparing clustering methods', 'summary': 'Explains the differences between hierarchical and k-means clustering, highlighting the time performance for large datasets. 
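The a(i)/b(i) comparison above is the silhouette coefficient: s(i) = (b(i) - a(i)) / max(a(i), b(i)). In a good model b(i), the mean distance to the nearest other cluster, is much larger than a(i), the mean distance within the point's own cluster, pushing s(i) towards +1; a bad model pushes it towards -1. A small hand-rolled sketch with invented points:

```python
import math

def silhouette(point, own_cluster, other_clusters):
    """s(i) = (b - a) / max(a, b): a is the mean distance to the other
    members of the point's own cluster, b is the mean distance to the
    members of the nearest other cluster."""
    others = [p for p in own_cluster if p != point]
    a = sum(math.dist(point, p) for p in others) / len(others)
    b = min(sum(math.dist(point, p) for p in c) / len(c)
            for c in other_clusters)
    return (b - a) / max(a, b)

tight = [(0, 0), (0, 1), (1, 0)]     # the point's own, compact cluster
far = [(10, 10), (10, 11)]           # a distant second cluster
s = silhouette((0, 0), tight, [far])
print(round(s, 3))  # close to +1, since b (inter) >> a (intra)
```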
it covers challenges with k-means clustering initialization, importance of k-means++ algorithm, and significance of silhouette clustering in validating cluster models.', 'chapters': [{'end': 1803.752, 'start': 1564.781, 'title': 'Hierarchical vs k-means clustering', 'summary': 'Explains the difference between hierarchical and k-means clustering, highlighting that hierarchical clustering takes more time for large datasets, while k-means clustering performs better. it also discusses the validation of clustering models using the Silhouette score.', 'duration': 238.971, 'highlights': ['Hierarchical clustering takes more time for large datasets Hierarchical clustering will take more time when dealing with a large number of data points and constructing dendrograms.', 'K-means clustering performs better than hierarchical clustering In large datasets, k-means clustering is recommended as it performs better than hierarchical clustering, even though both methods take more time.', 'Validation of clustering models using the Silhouette score The chapter introduces the use of the Silhouette score to validate clustering models, providing a visual representation of the score and its importance in determining the quality of clustering.']}, {'end': 2311.257, 'start': 1804.632, 'title': 'K-means clustering and silhouette clustering', 'summary': 'Covers the challenges with k-means clustering initialization, the importance of k-means++ algorithm for initializing centroids, and the significance of silhouette clustering in validating cluster models, with insights into the calculation of a of i and b of i and their impact on clustering models.', 'duration': 506.625, 'highlights': ['The importance of k-means++ algorithm for initializing centroids The k-means++ algorithm ensures that centroids are initialized very far apart, reducing the risk of incorrect initialization and the potential for multiple clusters, enhancing the accuracy of k-means clustering model.', 'Significance of silhouette clustering in 
validating cluster models Silhouette clustering provides values between -1 and +1, with a higher value indicating a better clustering model, and the comparison between a of i and b of i determines the quality of the clustering model.', 'Insights into the calculation of a of i and b of i The calculation of a of i involves computing the average distance of points within a cluster from its centroid, while b of i involves finding the average distance of points from other clusters, and the comparison between a of i and b of i indicates the quality of the clustering model.']}], 'duration': 746.476, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk1564781.jpg', 'highlights': ['K-means clustering performs better than hierarchical clustering for large datasets', 'The importance of k-means++ algorithm for initializing centroids to enhance accuracy', 'Validation of clustering models using the Silhouette score for determining quality', 'Hierarchical clustering takes more time for large datasets due to dendrogram construction', 'Significance of silhouette clustering in validating cluster models with values between -1 and +1', 'Insights into the calculation of a of i and b of i for determining clustering model quality']}, {'end': 2650.707, 'segs': [{'end': 2344.978, 'src': 'embed', 'start': 2312.56, 'weight': 0, 'content': [{'end': 2316.302, 'text': 'Okay, so this is the outcome with respect to silhouette clustering.', 'start': 2312.56, 'duration': 3.742}, {'end': 2318.743, 'text': 'So everybody wants to do a practical session.', 'start': 2316.922, 'duration': 1.821}, {'end': 2321.084, 'text': 'Just let me know.', 'start': 2320.423, 'duration': 0.661}, {'end': 2329.007, 'text': 'I hope everybody understood till here.', 'start': 2327.466, 'duration': 1.541}, {'end': 2333.089, 'text': "We'll also discuss about dbScan, don't worry.", 'start': 2331.208, 'duration': 1.881}, {'end': 2336.61, 'text': 'dbScan is pretty much amazing than 
k-means and hierarchical means.', 'start': 2333.389, 'duration': 3.221}, {'end': 2344.978, 'text': 'I will discuss about what is exactly DBSCAN and then we will discuss about the practical problem and then we will finish it off.', 'start': 2340.037, 'duration': 4.941}], 'summary': 'Discussion on silhouette clustering and a practical session with dbscan, a more effective method than k-means and hierarchical clustering.', 'duration': 32.418, 'max_score': 2312.56, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2312560.jpg'}, {'end': 2500.623, 'src': 'embed', 'start': 2457.946, 'weight': 1, 'content': [{'end': 2458.907, 'text': "I'm just kidding anyhow.", 'start': 2457.946, 'duration': 0.961}, {'end': 2460.448, 'text': "If you don't like also, it's okay.", 'start': 2459.087, 'duration': 1.361}, {'end': 2468.654, 'text': 'Okay Now in DB scan clustering, what are the important things? Okay.', 'start': 2461.709, 'duration': 6.945}, {'end': 2492.76, 'text': "Shall I start everyone? Okay, shall I start everybody? 
Okay, so let's start with respect to DBSCAN clustering.", 'start': 2469.494, 'duration': 23.266}, {'end': 2495.882, 'text': "And let's understand some of the important points over here.", 'start': 2493.301, 'duration': 2.581}, {'end': 2500.623, 'text': 'The first point that you really need to remember is something called as core points.', 'start': 2496.582, 'duration': 4.041}], 'summary': 'Db scan clustering: key point is core points.', 'duration': 42.677, 'max_score': 2457.946, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2457946.jpg'}, {'end': 2632.339, 'src': 'embed', 'start': 2600.339, 'weight': 3, 'content': [{'end': 2603.682, 'text': 'Now understand one thing, this point is definitely an outlier.', 'start': 2600.339, 'duration': 3.343}, {'end': 2610.529, 'text': 'Even though this is an outlier, with the help of k means what I am actually doing, I am actually grouping this into another group.', 'start': 2605.084, 'duration': 5.445}, {'end': 2618.237, 'text': 'So can we have a scenario wherein a kind of clustering algorithm is there where we can leave the outlier separately.', 'start': 2611.53, 'duration': 6.707}, {'end': 2620.774, 'text': 'and this outlier.', 'start': 2619.474, 'duration': 1.3}, {'end': 2632.339, 'text': 'in this particular algorithm and this is basically we will be using dbScan to leave out the outlier and this point will be called as a noisy point noisy point,', 'start': 2620.774, 'duration': 11.565}], 'summary': 'Use dbscan to identify and separate noisy outlier points.', 'duration': 32, 'max_score': 2600.339, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2600339.jpg'}], 'start': 2312.56, 'title': 'Dbscan clustering and noise points', 'summary': 'Covers an overview of dbscan clustering, highlighting its superiority over k-means and hierarchical means. it also discusses core points, min points, and border points. 
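The outlier problem being set up here can be shown in a few lines: a k-means-style nearest-centroid assignment must put every point, outlier included, into some cluster, while a density-style rule can flag it as noise instead. The centroids, eps threshold and points below are invented for illustration:

```python
import math

centroids = [(0, 0), (10, 10)]
outlier = (50, 50)

# k-means-style assignment: every point gets a cluster, even an outlier
label = min(range(len(centroids)),
            key=lambda i: math.dist(outlier, centroids[i]))
print("k-means puts the outlier in cluster", label)

# density-style rule: a point with no neighbour within eps is noise
data = [(0, 0), (0, 1), (10, 10), (10, 11), outlier]
eps = 2.0

def is_noise(p):
    return not any(q != p and math.dist(p, q) <= eps for q in data)

print("outlier flagged as noise:", is_noise(outlier))  # True
```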
additionally, it explains the concept of noise points in clustering, addressing the challenge of outliers in k-means clustering and the use of dbscan to identify and remove noisy points, thereby enhancing clustering accuracy.', 'chapters': [{'end': 2528.033, 'start': 2312.56, 'title': 'Dbscan clustering overview', 'summary': 'Covers an overview of dbscan clustering, emphasizing its superiority over k-means and hierarchical means, and discussing key concepts such as core points, min points, and border points.', 'duration': 215.473, 'highlights': ["The chapter emphasizes the superiority of DBSCAN over k-means and hierarchical means, stating that DBSCAN is 'pretty much amazing' (quantifiable data: qualitative statement).", 'The instructor discusses key concepts of DBSCAN clustering, including core points, min points, and border points, highlighting their importance in understanding the algorithm (quantifiable data: specific concepts emphasized).', 'The practical session on DBSCAN clustering is mentioned, indicating a hands-on approach to learning the algorithm (quantifiable data: practical application).']}, {'end': 2650.707, 'start': 2531.416, 'title': 'Understanding noise points in clustering', 'summary': 'Discusses the concept of noise points in clustering, highlighting the challenge of outliers in k-means clustering and the solution of using dbscan to identify and relieve noisy points, thereby improving the clustering accuracy.', 'duration': 119.291, 'highlights': ['The concept of noise points in clustering is addressed, emphasizing the challenge of outliers in k-means clustering and the potential to combine outliers into separate clusters, illustrating the need for a clustering algorithm that can identify and exclude outliers, thereby improving clustering accuracy.', 'The challenge of outliers in k-means clustering is highlighted, illustrating how outliers can be incorrectly grouped into clusters, leading to the need for a clustering algorithm that can 
differentiate and relieve noisy points, ultimately improving clustering accuracy.']}], 'duration': 338.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2312560.jpg', 'highlights': ["DBSCAN is 'pretty much amazing' compared to k-means and hierarchical means.", 'The instructor discusses key concepts of DBSCAN clustering: core points, min points, and border points.', 'Practical session on DBSCAN clustering indicates a hands-on approach to learning the algorithm.', 'Addressing the challenge of outliers in k-means clustering and the potential to combine outliers into separate clusters.', 'Illustrating the need for a clustering algorithm that can identify and exclude outliers, thereby improving clustering accuracy.']}, {'end': 3302.787, 'segs': [{'end': 2884.773, 'src': 'heatmap', 'start': 2733.231, 'weight': 0.727, 'content': [{'end': 2738.853, 'text': 'if I say my min point is equal to 4, which is again a hyperparameter.', 'start': 2733.231, 'duration': 5.622}, {'end': 2749.076, 'text': 'that basically means I can, if I have 4, at least 4 points over here, at least 4 points over here, okay, at least 4 points near to me,', 'start': 2738.853, 'duration': 10.223}, {'end': 2750.377, 'text': 'near to this particular circle.', 'start': 2749.076, 'duration': 1.301}, {'end': 2759.22, 'text': 'based on this epsilon value, then what will happen is that this point, this red point, will actually become a core point,', 'start': 2750.377, 'duration': 8.843}, {'end': 2763.099, 'text': 'a core point which is basically given over here.', 'start': 2760.757, 'duration': 2.342}, {'end': 2771.966, 'text': 'Okay? If it has at least that many number of min points inside or near to this particular, within this epsilon.', 'start': 2764.2, 'duration': 7.766}, {'end': 2774.588, 'text': 'Okay? 
Within this particular cluster.', 'start': 2773.147, 'duration': 1.441}, {'end': 2778.551, 'text': 'Suppose this is my cluster with the help of epsilon I have actually created it.', 'start': 2775.629, 'duration': 2.922}, {'end': 2783.875, 'text': 'Okay? Is there a particular unit of epsilon or we simply take the unit of distance? No.', 'start': 2779.111, 'duration': 4.764}, {'end': 2786.537, 'text': 'Epsilon value will also get selected through some way.', 'start': 2784.315, 'duration': 2.222}, {'end': 2789.316, 'text': "Okay? I'll show you.", 'start': 2787.678, 'duration': 1.638}, {'end': 2790.917, 'text': "I'll show you in the practical application.", 'start': 2789.396, 'duration': 1.521}, {'end': 2791.338, 'text': "Don't worry.", 'start': 2790.977, 'duration': 0.361}, {'end': 2797.223, 'text': "Okay Now the next thing is that let's say I have another point over here.", 'start': 2791.758, 'duration': 5.465}, {'end': 2799.205, 'text': "Let's say that I have another point over here.", 'start': 2797.283, 'duration': 1.922}, {'end': 2802.769, 'text': 'And this is my circle with respect to epsilon.', 'start': 2799.686, 'duration': 3.083}, {'end': 2803.569, 'text': 'I have created it.', 'start': 2802.829, 'duration': 0.74}, {'end': 2807.794, 'text': "Let's say that here I have only one point.", 'start': 2804.41, 'duration': 3.384}, {'end': 2813.708, 'text': 'I have only one point inside this particular cluster.', 'start': 2809.787, 'duration': 3.921}, {'end': 2818.61, 'text': 'At that point, this point becomes something called as border point.', 'start': 2814.548, 'duration': 4.062}, {'end': 2821.13, 'text': 'Border point.', 'start': 2820.13, 'duration': 1}, {'end': 2823.431, 'text': 'Border point also we have discussed over here.', 'start': 2821.77, 'duration': 1.661}, {'end': 2826.252, 'text': 'Right? So border point is also there.', 'start': 2824.471, 'duration': 1.781}, {'end': 2836.072, 'text': 'Okay? 
So here I am saying that at least one, at least one, if it is only one it is present, okay, then it will become a border point.', 'start': 2827.192, 'duration': 8.88}, {'end': 2838.935, 'text': 'If it has four, definitely this will become a core point.', 'start': 2836.613, 'duration': 2.322}, {'end': 2840.738, 'text': 'Core point like how we have this red color.', 'start': 2839.016, 'duration': 1.722}, {'end': 2846.284, 'text': 'Okay So, and there will be one more scenario.', 'start': 2843.14, 'duration': 3.144}, {'end': 2847.946, 'text': 'Suppose I have this one cluster.', 'start': 2846.304, 'duration': 1.642}, {'end': 2851.35, 'text': "Let's say this is my epsilon.", 'start': 2850.069, 'duration': 1.281}, {'end': 2857.573, 'text': "And suppose if I don't have any points near this, then this will definitely become my noise point.", 'start': 2851.85, 'duration': 5.723}, {'end': 2861.275, 'text': 'And this noise point will nothing be but this will be a cluster.', 'start': 2858.253, 'duration': 3.022}, {'end': 2865.537, 'text': 'Okay So here I have actually discussed about the noise point also.', 'start': 2862.715, 'duration': 2.822}, {'end': 2872.707, 'text': 'Right? 
So I hope everybody is able to understand the key terms.', 'start': 2869.386, 'duration': 3.321}, {'end': 2878.95, 'text': 'Now, what is basically happening is that whenever we have a noise point, like in this particular scenario,', 'start': 2872.828, 'duration': 6.122}, {'end': 2884.773, 'text': "we have a noise point and we don't find any points inside this, any core point or border point.", 'start': 2878.95, 'duration': 5.823}], 'summary': 'Understanding clustering, core points, border points, noise points, and epsilon values in practical applications.', 'duration': 151.542, 'max_score': 2733.231, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2733231.jpg'}, {'end': 2789.316, 'src': 'embed', 'start': 2760.757, 'weight': 0, 'content': [{'end': 2763.099, 'text': 'a core point which is basically given over here.', 'start': 2760.757, 'duration': 2.342}, {'end': 2771.966, 'text': 'Okay? If it has at least that many number of min points inside or near to this particular, within this epsilon.', 'start': 2764.2, 'duration': 7.766}, {'end': 2774.588, 'text': 'Okay? Within this particular cluster.', 'start': 2773.147, 'duration': 1.441}, {'end': 2778.551, 'text': 'Suppose this is my cluster with the help of epsilon I have actually created it.', 'start': 2775.629, 'duration': 2.922}, {'end': 2783.875, 'text': 'Okay? Is there a particular unit of epsilon or we simply take the unit of distance? No.', 'start': 2779.111, 'duration': 4.764}, {'end': 2786.537, 'text': 'Epsilon value will also get selected through some way.', 'start': 2784.315, 'duration': 2.222}, {'end': 2789.316, 'text': "Okay? 
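The three point types just defined (core, border, noise) can be sketched directly from their definitions. This is a plain-Python illustration with invented eps and min_pts values, assuming Euclidean distance and counting the point itself in its own eps-neighbourhood, as DBSCAN conventionally does:

```python
import math

def classify(points, eps, min_pts):
    """Label each point core / border / noise. A point is core when its
    eps-neighbourhood (itself included) holds at least min_pts points;
    border when it is not core but lies within eps of a core point;
    noise otherwise."""
    def neighbours(p):
        return [q for q in points if math.dist(p, q) <= eps]
    core = {p for p in points if len(neighbours(p)) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbours(p)):
            labels[p] = "border"
        else:
            labels[p] = "noise"
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1),   # dense blob -> core points
       (2.2, 1),                         # near the blob -> border
       (9, 9)]                           # isolated -> noise
labels = classify(pts, eps=1.5, min_pts=4)
print(labels[(2.2, 1)], labels[(9, 9)])  # border noise
```

A full DBSCAN would additionally chain density-reachable core points into clusters and label the noise points -1, as scikit-learn's DBSCAN does.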
I'll show you.", 'start': 2787.678, 'duration': 1.638}], 'summary': 'Discussion on selecting epsilon value for clustering.', 'duration': 28.559, 'max_score': 2760.757, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2760757.jpg'}, {'end': 3073.65, 'src': 'embed', 'start': 3041.982, 'weight': 1, 'content': [{'end': 3045.745, 'text': 'These are all outliers are not combined inside a group.', 'start': 3041.982, 'duration': 3.763}, {'end': 3052.051, 'text': 'But whichever are nearer as a core point and the broader point, separate-separate groups are actually created.', 'start': 3046.706, 'duration': 5.345}, {'end': 3057.596, 'text': 'Right? So this is how amazing a DB scan clustering is.', 'start': 3053.853, 'duration': 3.743}, {'end': 3063.666, 'text': 'right. a db scan clustering is pretty much amazing.', 'start': 3059.284, 'duration': 4.382}, {'end': 3067.407, 'text': 'that is basically the outcome of this here in k-means clustering.', 'start': 3063.666, 'duration': 3.741}, {'end': 3068.087, 'text': 'you can see this.', 'start': 3067.407, 'duration': 0.68}, {'end': 3073.65, 'text': "all these points has also been taken as blue color as one group, because i'll be considering this as one group,", 'start': 3068.087, 'duration': 5.563}], 'summary': 'Db scan clustering creates separate groups based on core and broader points, showing amazing results.', 'duration': 31.668, 'max_score': 3041.982, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3041982.jpg'}, {'end': 3158.682, 'src': 'embed', 'start': 3094.081, 'weight': 3, 'content': [{'end': 3095.242, 'text': 'Download the code.', 'start': 3094.081, 'duration': 1.161}, {'end': 3103.846, 'text': 'So download the code guys.', 'start': 3095.262, 'duration': 8.584}, {'end': 3105.386, 'text': "I've given you the GitHub link.", 'start': 3104.246, 'duration': 1.14}, {'end': 3107.087, 'text': 'Quickly download and 
keep your file ready.', 'start': 3105.426, 'duration': 1.661}, {'end': 3114.63, 'text': "I'm also going to download.", 'start': 3113.45, 'duration': 1.18}, {'end': 3132.474, 'text': "I'm going to open my Anaconda prompt.", 'start': 3130.634, 'duration': 1.84}, {'end': 3135.235, 'text': 'Probably open my Jupyter notebook.', 'start': 3133.655, 'duration': 1.58}, {'end': 3145.71, 'text': "We'll do one practical problem.", 'start': 3144.209, 'duration': 1.501}, {'end': 3156.5, 'text': "I've given you the link guys, please open it.", 'start': 3154.658, 'duration': 1.842}, {'end': 3158.682, 'text': 'So this is what we are going to do today.', 'start': 3156.6, 'duration': 2.082}], 'summary': 'Download the code from the provided github link for a practical problem in anaconda and jupyter.', 'duration': 64.601, 'max_score': 3094.081, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3094081.jpg'}, {'end': 3242.567, 'src': 'embed', 'start': 3200.805, 'weight': 4, 'content': [{'end': 3204.187, 'text': 'And please open the file and keep it ready and then we will try to explain.', 'start': 3200.805, 'duration': 3.382}, {'end': 3204.627, 'text': 'See this.', 'start': 3204.267, 'duration': 0.36}, {'end': 3206.468, 'text': 'This will be amazing.', 'start': 3205.407, 'duration': 1.061}, {'end': 3208.829, 'text': 'Here you will be able to see amazing things.', 'start': 3206.868, 'duration': 1.961}, {'end': 3217.373, 'text': 'How do you come to know that overfitting or underfitting is happening?', 'start': 3213.671, 'duration': 3.702}, {'end': 3218.733, 'text': "You don't know the real value right?", 'start': 3217.393, 'duration': 1.34}, {'end': 3222.335, 'text': 'So in clustering there will not be any underfitting or overfitting.', 'start': 3218.793, 'duration': 3.542}, {'end': 3242.567, 'text': 'Everybody ready? Everybody ready? 
Just give me a confirmation.', 'start': 3225.977, 'duration': 16.59}], 'summary': 'Explanation of file opening and identifying overfitting/underfitting in clustering.', 'duration': 41.762, 'max_score': 3200.805, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3200805.jpg'}], 'start': 2651.767, 'title': 'Understanding dbscan clustering', 'summary': 'Explains the concepts of dbscan clustering, emphasizing the identification of core points and border points, and the advantages over traditional clustering methods. it encourages practical application through a github repository download, with a focus on problem-solving and career success stories in data science.', 'chapters': [{'end': 2915.523, 'start': 2651.767, 'title': 'Dbscan algorithm explained', 'summary': 'Explains the dbscan algorithm, which uses a minimum points hyperparameter and an epsilon value as a radius to classify points into core, border, and noise points, with an example of how a point becomes a core point with at least 4 nearby points within its epsilon radius.', 'duration': 263.756, 'highlights': ['The DBSCAN algorithm utilizes a minimum points hyperparameter and an epsilon value as a radius to classify points into core, border, and noise points, with the example of a point becoming a core point if it has at least 4 nearby points within its epsilon radius.', 'The epsilon value represents the radius of a specific circle, while the minimum points hyperparameter determines the minimum number of points needed within the epsilon radius for a point to be classified as a core point.', 'If a point has only one nearby point within its epsilon radius, it becomes a border point, and if it has no nearby points, it is classified as a noise point, or an outlier, and is not included in any group.']}, {'end': 3302.787, 'start': 2915.903, 'title': 'Understanding dbscan clustering', 'summary': 'Explains the concepts of dbscan clustering, emphasizing the 
identification of core points and border points, and the advantages over traditional clustering methods. it also encourages the practical application of the concepts through a github repository download, with a focus on practical problem-solving and career success stories in data science.', 'duration': 386.884, 'highlights': ['The chapter emphasizes the identification of core points and border points in DBScan clustering, highlighting the process of combining them into a single group and the criteria of at least one core point for a border point to be formed.', 'The presenter contrasts DBScan clustering with traditional methods like K-means, highlighting the superior grouping of core and border points in DBScan clustering compared to the traditional clustering approach.', 'The practical application of the concepts is encouraged through the distribution of a GitHub repository link for download, signaling a focus on hands-on problem-solving and practical learning in data science.', 'The chapter shares success stories of individuals making career transitions in data science, serving as motivation and inspiration for the audience.', 'The presenter encourages audience engagement through likes and confirmations, creating an interactive learning environment and promoting active participation.', 'The emphasis on the absence of underfitting or overfitting in clustering, contrasting it with other machine learning concepts, reinforces the unique characteristics of clustering algorithms.']}], 'duration': 651.02, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk2651767.jpg', 'highlights': ['The DBSCAN algorithm utilizes a minimum points hyperparameter and an epsilon value as a radius to classify points into core, border, and noise points, with the example of a point becoming a core point if it has at least 4 nearby points within its epsilon radius.', 'The chapter emphasizes the identification of core points and border 
points in DBScan clustering, highlighting the process of combining them into a single group and the criteria of at least one core point for a border point to be formed.', 'The presenter contrasts DBScan clustering with traditional methods like K-means, highlighting the superior grouping of core and border points in DBScan clustering compared to the traditional clustering approach.', 'The practical application of the concepts is encouraged through the distribution of a GitHub repository link for download, signaling a focus on hands-on problem-solving and practical learning in data science.', 'The emphasis on the absence of underfitting or overfitting in clustering, contrasting it with other machine learning concepts, reinforces the unique characteristics of clustering algorithms.']}, {'end': 3692.298, 'segs': [{'end': 3400.06, 'src': 'embed', 'start': 3376.87, 'weight': 3, 'content': [{'end': 3383.955, 'text': 'first of all, we are just trying to generate some samples with some two features and we are saying that, okay, it should have four centroids,', 'start': 3376.87, 'duration': 7.085}, {'end': 3387.338, 'text': 'or see centroids itself, with some features.', 'start': 3383.955, 'duration': 3.383}, {'end': 3390.88, 'text': "i'm trying to generate some x and y data randomly.", 'start': 3387.338, 'duration': 3.542}, {'end': 3396.704, 'text': 'okay, and this particular data set will basically be used in performing clustering algorithms.', 'start': 3390.88, 'duration': 5.824}, {'end': 3400.06, 'text': 'okay, Forget about range underscore n, underscore clusters,', 'start': 3396.704, 'duration': 3.356}], 'summary': 'Generating data with two features and four centroids for clustering algorithms.', 'duration': 23.19, 'max_score': 3376.87, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3376870.jpg'}, {'end': 3448.188, 'src': 'embed', 'start': 3419.782, 'weight': 0, 'content': [{'end': 3424.445, 'text': 'with one 
feature, which is my output, which belongs to a specific class.', 'start': 3419.782, 'duration': 4.663}, {'end': 3431.81, 'text': 'okay, so that you can actually do with the help of make underscore blobs.', 'start': 3424.445, 'duration': 7.365}, {'end': 3437.761, 'text': "okay, now Let's say how to apply k-means clustering algorithm.", 'start': 3431.81, 'duration': 5.951}, {'end': 3440.162, 'text': 'So as I said that I will be using WCSS.', 'start': 3437.781, 'duration': 2.381}, {'end': 3443.325, 'text': 'WCSS basically means within cluster sum of square.', 'start': 3440.783, 'duration': 2.542}, {'end': 3448.188, 'text': "So I'm going to import k-means over here for i in range 1 comma 11.", 'start': 3444.105, 'duration': 4.083}], 'summary': 'Using the k-means algorithm to compute WCSS for k in range 1-11', 'duration': 28.406, 'max_score': 3419.782, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3419782.jpg'}, {'end': 3549.6, 'src': 'embed', 'start': 3512.853, 'weight': 1, 'content': [{'end': 3517.775, 'text': 'So if I talk about the last abrupt change, Here I have the specific value with respect to this.', 'start': 3512.853, 'duration': 4.922}, {'end': 3522.776, 'text': 'Okay, I have one specific value with respect to this.', 'start': 3520.335, 'duration': 2.441}, {'end': 3524.416, 'text': 'This is my abrupt change.', 'start': 3523.136, 'duration': 1.28}, {'end': 3526.477, 'text': 'From here, the changes are normal.', 'start': 3524.856, 'duration': 1.621}, {'end': 3529.698, 'text': "So I'm going to basically select k is equal to 4.", 'start': 3526.857, 'duration': 2.841}, {'end': 3531.258, 'text': "Now what I'm actually going to do.", 'start': 3529.698, 'duration': 1.56}, {'end': 3549.6, 'text': "with the help of Solheart, I hope, I'm sorry, with the help of silhouette score, we are going to compare whether k is equal to 4 is valid or not.", 'start': 3531.258, 'duration': 18.342}], 'summary': 'Using k=4, we compare the abrupt-change choice against the silhouette score.', 'duration': 36.747, 'max_score': 3512.853, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3512853.jpg'}, {'end': 3613.228, 'src': 'embed', 'start': 3581.631, 'weight': 2, 'content': [{'end': 3586.332, 'text': "But I'm just going to talk about like what are the important things we need to see over here.", 'start': 3581.631, 'duration': 4.701}, {'end': 3590.173, 'text': 'With respect to different different clusters see, see this.', 'start': 3586.852, 'duration': 3.321}, {'end': 3592.714, 'text': 'clusters two, three, four, five, six.', 'start': 3590.173, 'duration': 2.541}, {'end': 3599.157, 'text': "I'm going to basically compare whether the K value should be 4 or not with the help of Silhouette scoring.", 'start': 3592.714, 'duration': 6.443}, {'end': 3602.019, 'text': "Okay So let's go over here.", 'start': 3599.457, 'duration': 2.562}, {'end': 3604.801, 'text': "And here you can see that I'm applying this one.", 'start': 3602.78, 'duration': 2.021}, {'end': 3613.228, 'text': 'First, I will go with respect to for loop for n underscore clusters in range underscore clusters, different, different cluster values are there.', 'start': 3606.663, 'duration': 6.565}], 'summary': 'Comparing k values (4 or not) using silhouette scoring for clusters 2-6.', 'duration': 31.597, 'max_score': 3581.631, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3581631.jpg'}, {'end': 3652.625, 'src': 'embed', 'start': 3627.737, 'weight': 4, 'content': [{'end': 3633.621, 'text': 'After I did fit predict on x, I am using this score on x comma cluster label.', 'start': 3627.737, 'duration': 5.884}, {'end': 3634.702, 'text': 'Now what this is going to do?', 'start': 3633.681, 'duration': 1.021}, {'end': 3636.903, 'text': 'Understand in Silhouette what did we discuss?', 'start': 3635.182, 'duration': 1.721}, {'end':
3638.924, 'text': 'We discussed that.', 'start': 3637.323, 'duration': 1.601}, {'end': 3639.765, 'text': 'what did we discuss?', 'start': 3638.924, 'duration': 0.841}, {'end': 3642.827, 'text': 'It will try to find out all the clusters.', 'start': 3640.365, 'duration': 2.462}, {'end': 3650.543, 'text': 'the clusters over here like this, and it will try to calculate the distance between them, which is the a of i.', 'start': 3643.598, 'duration': 6.945}, {'end': 3652.625, 'text': 'then it will try to compute the b of i.', 'start': 3650.543, 'duration': 2.082}], 'summary': 'Using fit predict on x, then calculating distances between clusters using silhouette method.', 'duration': 24.888, 'max_score': 3627.737, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3627737.jpg'}], 'start': 3302.787, 'title': 'Implementing clustering algorithms and silhouette analysis', 'summary': 'Covers implementing k-means clustering and dbscan, using k-means clustering to find the minimal within-cluster sum of square value, generating data for clustering algorithms, utilizing silhouette analysis to determine the optimal number of clusters for k-means clustering, ultimately selecting k=4 based on the abrupt change, and using silhouette scoring to validate the choice.', 'chapters': [{'end': 3492.503, 'start': 3302.787, 'title': 'Implementing k-means clustering and dbscan', 'summary': 'Covers implementing k-means clustering and dbscan, using k-means clustering to find the minimal within-cluster sum of square value and generating data for clustering algorithms.', 'duration': 189.716, 'highlights': ['Using k-means clustering to find the minimal within-cluster sum of square value: The speaker imports k-means clustering and applies it with different k values to find the minimal within-cluster sum of square value, using the k-means++ initialization technique and drawing a graph to visualize the process.', 'Generating data for clustering 
algorithms: The speaker explains the process of generating sample data with specified features and centroids using make_blobs, which will be used in performing clustering algorithms and trying out different cluster values to find the silhouette score.']}, {'end': 3692.298, 'start': 3493.123, 'title': 'Silhouette analysis for k-means clustering', 'summary': 'Covers using silhouette analysis to determine the optimal number of clusters for k-means clustering, ultimately selecting k=4 based on the abrupt change and using silhouette scoring to validate the choice.', 'duration': 199.175, 'highlights': ['Selecting k=4 based on abrupt change The speaker selects k=4 for clustering based on the last abrupt change in the graph, indicating a shift in the pattern of the data.', 'Using Silhouette scoring to validate k=4 The speaker employs Silhouette scoring to compare and validate the choice of k=4 for clustering, ensuring its validity for the dataset.', 'Explanation of Silhouette scoring process The speaker explains the process of Silhouette scoring, involving the calculation of distances between clusters, computation of scores, and the interpretation of values between -1 to +1, where higher values indicate better clustering.', 'Iterative comparison of cluster values using for loop An iterative comparison of different cluster values using a for loop, starting from 2, to assess the suitability of k=4 for clustering based on Silhouette scoring.', 'Visualization of clustering results The code includes visualization functions to display the clustering results in the form of graphs, aiding in the interpretation and assessment of the clustering outcome.']}], 'duration': 389.511, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3302787.jpg', 'highlights': ['Using k-means clustering to find the minimal within-cluster sum of square value: The speaker imports k-means clustering and applies it with different k values to find the 
minimal within-cluster sum of square value, using the k-means++ initialization technique and drawing a graph to visualize the process.', 'Selecting k=4 based on abrupt change The speaker selects k=4 for clustering based on the last abrupt change in the graph, indicating a shift in the pattern of the data.', 'Using Silhouette scoring to validate k=4 The speaker employs Silhouette scoring to compare and validate the choice of k=4 for clustering, ensuring its validity for the dataset.', 'Generating data for clustering algorithms: The speaker explains the process of generating sample data with specified features and centroids using make_blobs, which will be used in performing clustering algorithms and trying out different cluster values to find the silhouette score.', 'Explanation of Silhouette scoring process The speaker explains the process of Silhouette scoring, involving the calculation of distances between clusters, computation of scores, and the interpretation of values between -1 to +1, where higher values indicate better clustering.']}, {'end': 3904.819, 'segs': [{'end': 3748.201, 'src': 'embed', 'start': 3715.811, 'weight': 0, 'content': [{'end': 3719.535, 'text': 'I told you the value will be between minus 1 to plus 1.', 'start': 3715.811, 'duration': 3.724}, {'end': 3722.558, 'text': "And I'm actually getting 0.704, which is very, very good.", 'start': 3719.535, 'duration': 3.023}, {'end': 3727.903, 'text': 'Okay And then for n underscore cluster is equal to 3, 0.588.', 'start': 3722.959, 'duration': 4.944}, {'end': 3732.348, 'text': "Then n underscore cluster is equal to 4, I'm getting 0.65, which is pretty much amazing.", 'start': 3727.904, 'duration': 4.444}, {'end': 3736.832, 'text': 'And then for n underscore cluster is equal to 5, the average score is 0.563.', 'start': 3732.768, 'duration': 4.064}, {'end': 3740.475, 'text': 'And n underscore cluster is equal to 6, you are saying 0.45.', 'start': 3736.832, 'duration': 3.643}, {'end': 3748.201, 'text': 
"Here, directly you can actually say that, fine, for n underscore cluster is equal to 2, I'm getting an amazing score of 0.704.", 'start': 3740.475, 'duration': 7.726}], 'summary': 'Achieved high scores with n_cluster values: 3 (0.588), 4 (0.65), 5 (0.563), and 2 (0.704).', 'duration': 32.39, 'max_score': 3715.811, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3715811.jpg'}, {'end': 3881.733, 'src': 'embed', 'start': 3837.015, 'weight': 1, 'content': [{'end': 3838.417, 'text': 'Okay Negative value over here.', 'start': 3837.015, 'duration': 1.402}, {'end': 3841.9, 'text': 'This is my output, my score.', 'start': 3838.797, 'duration': 3.103}, {'end': 3844.742, 'text': 'This point that you see dotted points.', 'start': 3842.74, 'duration': 2.002}, {'end': 3845.543, 'text': 'This is my score.', 'start': 3844.822, 'duration': 0.721}, {'end': 3848.065, 'text': '0.58, whatever it is.', 'start': 3846.944, 'duration': 1.121}, {'end': 3849.506, 'text': 'This is basically my score.', 'start': 3848.305, 'duration': 1.201}, {'end': 3852.889, 'text': 'Okay So obviously this basically indicates that this point is near.', 'start': 3849.787, 'duration': 3.102}, {'end': 3854.791, 'text': 'The other cluster point is nearer to this.', 'start': 3852.929, 'duration': 1.862}, {'end': 3856.613, 'text': "So I'm actually getting a negative value.", 'start': 3855.091, 'duration': 1.522}, {'end': 3859.512, 'text': 'So this you really need to understand.', 'start': 3857.91, 'duration': 1.602}, {'end': 3867.461, 'text': "Now similarly if I go with respect to n underscore cluster is equal to 4, this looks good because here I don't have any negative value.", 'start': 3860.353, 'duration': 7.108}, {'end': 3875.231, 'text': 'And here you can see how coolly it has basically divided the points amazingly with the help of k is equal to 4.', 'start': 3868.122, 'duration': 7.109}, {'end': 3881.733, 'text': 'Right? 
And similarly if I go with 5, obviously you can see some negative values are here, some dotted line negative value are there.', 'start': 3875.231, 'duration': 6.502}], 'summary': 'Data analysis scores: k=4 has no negative values, k=5 has some negative values.', 'duration': 44.718, 'max_score': 3837.015, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3837015.jpg'}], 'start': 3692.779, 'title': 'Cluster analysis and k value selection', 'summary': 'Discusses evaluating silhouette scores for different cluster numbers, with a focus on obtaining a high score of 0.704 for 2 clusters and a deeper analysis of negative values. it also covers the process of selecting the k value for clustering, determining k=4 as the optimal choice due to the absence of negative values and the effective division of points, leading to the selection of n=4, k=4.', 'chapters': [{'end': 3802.55, 'start': 3692.779, 'title': 'Cluster analysis: silhouette scores', 'summary': 'Discusses evaluating silhouette scores for different cluster numbers, with a focus on obtaining a high score of 0.704 for 2 clusters and a deeper analysis of negative values for potential clusters.', 'duration': 109.771, 'highlights': ['The average silhouette score for n_cluster=2 is 0.704, indicating a very good clustering performance.', 'The next highest silhouette score is 0.65 for n_cluster=4, followed by 0.588 for n_cluster=3, and 0.563 for n_cluster=5.', 'The significance of negative values in evaluating cluster performance is highlighted, cautioning against solely relying on high positive scores without considering negative values.']}, {'end': 3904.819, 'start': 3803.57, 'title': 'Choose k value for clustering', 'summary': 'Discusses the process of selecting the k value for clustering based on the presence of negative values and the proximity of points, determining that k=4 is the optimal choice due to the absence of negative values and the effective division of 
points, leading to the selection of n=4, k=4.', 'duration': 101.249, 'highlights': ['The chapter emphasizes the impact of negative values on the AI model, leading to the decision to avoid clusters with negative values. The speaker explains that negative values indicate proximity issues between clusters, leading to the decision to avoid clusters with negative values.', 'The presentation highlights the effectiveness of k=4 in dividing the points without negative values, leading to the decision to choose k=4 over k=2 or k=6. The speaker emphasizes the effectiveness of k=4 in creating a generalized model and avoiding negative values, leading to the decision to choose k=4 over k=2 or k=6.']}], 'duration': 212.04, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3692779.jpg', 'highlights': ['The average silhouette score for n_cluster=2 is 0.704, indicating very good clustering performance.', 'The significance of negative values in evaluating cluster performance is highlighted, cautioning against solely relying on high positive scores without considering negative values.', 'The presentation highlights the effectiveness of k=4 in dividing the points without negative values, leading to the decision to choose k=4 over k=2 or k=6.']}, {'end': 4719.669, 'segs': [{'end': 3930.916, 'src': 'embed', 'start': 3904.819, 'weight': 0, 'content': [{'end': 3909.202, 'text': 'Now, should we compare this with the elbow method? Here also I got 4.', 'start': 3904.819, 'duration': 4.383}, {'end': 3911.063, 'text': 'Right? So both are actually matching.', 'start': 3909.202, 'duration': 1.861}, {'end': 3916.446, 'text': 'So this indicates that with the help of this clustering, this silhouette score,', 'start': 3911.483, 'duration': 4.963}, {'end': 3921.209, 'text': 'we can definitely come to a conclusion and validate our clustering model in an amazing way.', 'start': 3916.446, 'duration': 4.763}, {'end': 3924.972, 'text': 'Okay?
So I hope everybody is able to understand.', 'start': 3922.47, 'duration': 2.502}, {'end': 3930.916, 'text': 'Right? I hope everybody is able to understand this specific thing.', 'start': 3927.453, 'duration': 3.463}], 'summary': 'The silhouette analysis and the elbow method both point to k=4, validating the clustering model.', 'duration': 26.097, 'max_score': 3904.819, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3904819.jpg'}, {'end': 4209.489, 'src': 'embed', 'start': 4156.702, 'weight': 1, 'content': [{'end': 4166.929, 'text': 'it is a phenomena that skews.', 'start': 4156.702, 'duration': 10.227}, {'end': 4181.764, 'text': 'that skews the result of an algorithm in favor or against an idea.', 'start': 4166.929, 'duration': 14.835}, {'end': 4190.944, 'text': 'against an idea.', 'start': 4189.924, 'duration': 1.02}, {'end': 4198.746, 'text': "I'll make you understand the definition, but understand what I have actually written over here.", 'start': 4191.805, 'duration': 6.941}, {'end': 4203.667, 'text': 'It is a phenomenon that skews the result of an algorithm in favor or against an idea.', 'start': 4199.166, 'duration': 4.501}, {'end': 4209.489, 'text': 'Whenever I say this specific idea, this idea I will just talk about the training dataset initially.', 'start': 4204.208, 'duration': 5.281}], 'summary': 'Phenomenon skews algorithm results in favor or against an idea.', 'duration': 52.787, 'max_score': 4156.702, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk4156702.jpg'}, {'end': 4297.032, 'src': 'heatmap', 'start': 4246.213, 'weight': 1, 'content': [{'end': 4248.536, 'text': 'it may be in favor of that or it may be against of that.', 'start': 4246.213, 'duration': 2.323}, {'end': 4250.998, 'text': 'That basically means it may perform well, it may not perform well.', 'start': 4248.596, 'duration': 2.402}, {'end': 4254.842, 'text': 'If it is not performing well, that
basically means the accuracy is down, right.', 'start': 4251.518, 'duration': 3.324}, {'end': 4258.445, 'text': 'If the accuracy is better at that point of time, what we will say?', 'start': 4255.242, 'duration': 3.203}, {'end': 4261.068, 'text': 'See, if the accuracy is better, that time what we will say?', 'start': 4259.006, 'duration': 2.062}, {'end': 4262.629, 'text': 'we will come up with two terms from here.', 'start': 4261.068, 'duration': 1.561}, {'end': 4266.413, 'text': 'Obviously, you understand, okay, there are two scenarios of bias now here.', 'start': 4263.23, 'duration': 3.183}, {'end': 4274.227, 'text': 'If it is in favor, that basically means it is performing well with respect to the training data set, I will basically say that it has high bias.', 'start': 4267.461, 'duration': 6.766}, {'end': 4280.232, 'text': 'If it is not able to perform well with the training data set, then here I will say it as low bias.', 'start': 4275.508, 'duration': 4.724}, {'end': 4287.059, 'text': 'I hope everybody is able to understand in this specific thing, because many many many people has this kind of confusion.', 'start': 4281.974, 'duration': 5.085}, {'end': 4294.09, 'text': "Now similarly, if I talk about variance, let's say, about variance because you need to understand the definition.", 'start': 4288.22, 'duration': 5.87}, {'end': 4297.032, 'text': 'A definition is very much important.', 'start': 4294.751, 'duration': 2.281}], 'summary': 'The speaker frames bias relative to the training data: a model performing well on the training set is described as having high bias, and one performing poorly as having low bias.', 'duration': 50.819, 'max_score': 4246.213, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk4246213.jpg'}, {'end': 4430.01, 'src': 'embed', 'start': 4404.85, 'weight': 2, 'content': [{'end': 4411.056, 'text': 'It refers to the changes in the model when using different portion of the training or test data.', 'start': 4404.85, 'duration':
6.206}, {'end': 4417.32, 'text': 'refers to the changes basically means whether it is able to give a good prediction or wrong predictions.', 'start': 4412.136, 'duration': 5.184}, {'end': 4417.68, 'text': "That's it.", 'start': 4417.34, 'duration': 0.34}, {'end': 4424.285, 'text': 'So in this particular scenario, if it gives a good prediction, I may definitely say it as low variance.', 'start': 4418.221, 'duration': 6.064}, {'end': 4430.01, 'text': 'That basically means the accuracy with the accuracy with respect to the test data is also very good.', 'start': 4424.305, 'duration': 5.705}], 'summary': 'Analyzing model changes based on training/test data portions and accuracy for variance assessment.', 'duration': 25.16, 'max_score': 4404.85, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk4404850.jpg'}], 'start': 3904.819, 'title': 'Clustering validation and bias-variance in ML', 'summary': 'Discusses the use of the silhouette score to validate clustering models, with the silhouette analysis and the elbow method both pointing to k=4. It also covers bias and variance in ML, providing definitions, examples, and scenarios to explain these concepts, giving a clear understanding.', 'chapters': [{'end': 3956.695, 'start': 3904.819, 'title': 'Validating clustering with silhouette score', 'summary': 'Discusses the use of the silhouette score to compare and validate clustering models, with the silhouette score and the elbow method agreeing on k=4, indicating a successful validation approach for clustering models.', 'duration': 51.876, 'highlights': ['Using the silhouette score to compare and validate clustering models, with the silhouette score and the elbow method agreeing on k=4, indicating a successful validation approach. (Relevance: 5)', 'The ability to come to a conclusion and validate clustering models with the help of silhouette score, indicating an effective validation method.
(Relevance: 4)', 'Demonstrating the understanding and validation of the specific concept, emphasizing the clarity and comprehension of the validation approach. (Relevance: 3)', 'The process of validating a model and understanding the code, highlighting the practical application of the validation technique. (Relevance: 2)', 'The potential to try out and understand the code, indicating accessibility and practical implementation of the validation process. (Relevance: 1)']}, {'end': 4719.669, 'start': 3957.116, 'title': 'Bias and variance in machine learning', 'summary': 'The session covered the definitions of bias and variance in machine learning, with bias being the phenomenon that skews the result of an algorithm in favor or against an idea, and variance referring to the changes in the model when using different portions of the training or test data. The session also included examples and scenarios to explain high bias, low bias, high variance, and low variance, providing a clear understanding of these concepts.', 'duration': 762.553, 'highlights': ["Bias is the phenomenon that skews the result of an algorithm in favor or against an idea, leading to scenarios of high bias and low bias based on the model's performance with the training dataset. The definition of bias was explained as a phenomenon that skews the result of an algorithm in favor or against an idea, leading to scenarios of high bias and low bias based on the model's performance with the training dataset.", 'Variance refers to the changes in the model when using different portions of the training or test data, resulting in scenarios of high variance and low variance based on the accuracy of predictions with the test data.
Variance was defined as the changes in the model when using different portions of the training or test data, resulting in scenarios of high variance and low variance based on the accuracy of predictions with the test data.', 'Examples and scenarios were provided to explain high bias, low bias, high variance, and low variance, offering a clear understanding of these concepts in machine learning. The session included examples and scenarios to explain high bias, low bias, high variance, and low variance, providing a clear understanding of these concepts in machine learning.']}], 'duration': 814.85, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/461Opp1TShk/pics/461Opp1TShk3904819.jpg', 'highlights': ['Using the silhouette score to compare and validate clustering models, with the silhouette score and the elbow method agreeing on k=4, indicating a successful validation approach.', "Bias is the phenomenon that skews the result of an algorithm in favor or against an idea, leading to scenarios of high bias and low bias based on the model's performance with the training dataset.", 'Variance refers to the changes in the model when using different portions of the training or test data, resulting in scenarios of high variance and low variance based on the accuracy of predictions with the test data.']}], 'highlights': ['K-means clustering is a type of unsupervised machine learning with applications in ensemble techniques.', 'The Elbow method helps to find the optimized K value for clustering, emphasizing its importance in choosing the ideal K value.', "DBSCAN is 'pretty much amazing' compared to k-means and hierarchical clustering.", 'The average silhouette score for n_cluster=2 is 0.704, indicating very good clustering performance.', 'Using the silhouette score to compare and validate clustering models, with the silhouette score and the elbow method agreeing on k=4, indicating a successful validation approach.']}
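The elbow-plus-silhouette workflow the chapters above describe (make_blobs data, WCSS for a range of k, then silhouette validation of the chosen k) can be sketched in code. This is a minimal illustration assuming scikit-learn; the sample size, random seeds, and k ranges are assumptions, not values taken from the video:

```python
# Sketch of the lecture's workflow: generate blob data, run the elbow
# method on WCSS, then validate candidate k values with silhouette scores.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Two features, four true centroids, echoing the make_blobs example.
X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=42)

# Elbow method: WCSS (within-cluster sum of squares, sklearn's inertia_)
# for k = 1..10; the last abrupt drop in this curve suggests a k.
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    km.fit(X)
    wcss.append(km.inertia_)

# Silhouette validation for candidate cluster counts 2..6.
silhouette = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    labels = km.fit_predict(X)
    # Mean score in [-1, +1]; higher is better, negative values flag
    # points that sit closer to a neighboring cluster.
    silhouette[k] = silhouette_score(X, labels)
```

Here `inertia_` is scikit-learn's name for the WCSS quantity the speaker computes, and `silhouette_score(X, labels)` corresponds to the "score on x comma cluster label" call mentioned in the transcript.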
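The DBSCAN chapter's classification of points into core, border, and noise (a minimum-points hyperparameter plus an epsilon radius) can likewise be sketched. A minimal illustration assuming scikit-learn; the two-moons dataset, `eps=0.2`, and the sample size are illustrative assumptions, while `min_samples=4` mirrors the transcript's "at least 4 nearby points" example:

```python
# Sketch of DBSCAN's core/border/noise bookkeeping on a dataset
# where density-based clustering beats K-means (two moons).
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# A point is core if at least min_samples points (itself included)
# fall within its eps radius.
db = DBSCAN(eps=0.2, min_samples=4).fit(X)

labels = db.labels_                      # label -1 marks noise points
core = np.zeros(len(X), dtype=bool)
core[db.core_sample_indices_] = True
border = (labels != -1) & ~core          # clustered but not core
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Core and border points that are density-reachable end up in the same cluster label, matching the chapter's point that a border point is attached to a group through at least one core point.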