title
K Means Clustering Algorithm | K Means In Python | Machine Learning Algorithms |Simplilearn

description
🔥Professional Certificate Course In AI And Machine Learning by IIT Kanpur (India Only): https://www.simplilearn.com/iitk-professional-certificate-course-ai-machine-learning?utm_campaign=23AugustTubebuddyExpPCPAIandML&utm_medium=DescriptionFF&utm_source=youtube
🔥AI & Machine Learning Bootcamp (US Only): https://www.simplilearn.com/ai-machine-learning-bootcamp?utm_campaign=MachineLearning-Xvwt7y2jf5E&utm_medium=Descriptionff&utm_source=youtube
🔥 Purdue Post Graduate Program In AI And Machine Learning: https://www.simplilearn.com/pgp-ai-machine-learning-certification-training-course?utm_campaign=MachineLearning-Xvwt7y2jf5E&utm_medium=Descriptionff&utm_source=youtube
🔥AI Engineer Masters Program (Discount Code - YTBE15): https://www.simplilearn.com/masters-in-artificial-intelligence?utm_campaign=SCE-AIMasters&utm_medium=DescriptionFF&utm_source=youtube
This K-Means clustering algorithm tutorial video takes you through machine learning basics, the types of clustering algorithms, what K-Means clustering is, and how K-Means clustering works, with examples and a Python demo of K-Means clustering for color compression. This Machine Learning algorithm tutorial video is ideal for beginners who want to learn how K-Means clustering works. The following topics are covered in this K-Means Clustering Algorithm Tutorial:
1. Types of Machine Learning ( 07:08 )
2. What is K Means Clustering? ( 00:10 )
3. Applications of K Means Clustering ( 09:27 )
4. Common distance measures ( 10:20 )
5. How does K Means Clustering work? ( 12:27 )
6. K Means Clustering Algorithm ( 20:08 )
7. Demo In Python: K Means Clustering ( 26:20 )
8. Use case: Color compression In Python ( 38:38 )
Dataset Link - https://drive.google.com/drive/folders/1RdPdB3FkqoTYj6w8tZfbWE8K_dw_JaAe
What is Machine Learning: Machine Learning is an application of Artificial Intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed.
Subscribe to our channel for more Machine Learning Tutorials: https://www.youtube.com/user/Simplilearn?sub_confirmation=1
For a more detailed understanding of the K-Means Clustering Algorithm, do visit: https://bit.ly/2CbZAUg You will find in-depth content on Machine Learning. Browse further to discover similar resources on related topics, made available to you as a learning path. Enjoy top-quality learning for FREE.
You can also go through the slides here: https://goo.gl/B6k4R6
Machine Learning Articles: https://www.simplilearn.com/what-is-artificial-intelligence-and-why-ai-certification-article?utm_campaign=Kmeans-Clustering-Algorithm-Xvwt7y2jf5E&utm_medium=Tutorials&utm_source=youtube
#KMeansClusteringAlgorithm #KMeansClustering #KMeansClusteringInMachineLearning #ClusteringInMachineLearning #KMeansAlgorithm #KMeans #MachineLearningAlgorithm #MachineLearning #Simplilearn
➡️ About Post Graduate Program In AI And Machine Learning
This AI ML course is designed to enhance your career in AI and ML by demystifying concepts like machine learning, deep learning, NLP, computer vision, reinforcement learning, and more. You'll also have access to 4 live sessions, led by industry experts, covering the latest advancements in AI such as generative modeling, ChatGPT, OpenAI, and chatbots.
✅ Key Features
- Post Graduate Program certificate and Alumni Association membership
- Exclusive hackathons and Ask Me Anything sessions by IBM
- 3 Capstones and 25+ Projects with industry data sets from Twitter, Uber, Mercedes Benz, and many more
- Master Classes delivered by Purdue faculty and IBM experts
- Simplilearn's JobAssist helps you get noticed by top hiring companies
- Gain access to 4 live online sessions on the latest AI trends such as ChatGPT, generative AI, explainable AI, and more
- Learn about the applications of ChatGPT, OpenAI, Dall-E, Midjourney & other prominent tools
✅ Skills Covered
- ChatGPT
- Generative AI
- Explainable AI
- Generative Modeling
- Statistics
- Python
- Supervised Learning
- Unsupervised Learning
- NLP
- Neural Networks
- Computer Vision
- And Many More…
👉 Learn More At:
🔥 Enroll for FREE Machine Learning Course & Get your Completion Certificate: https://www.simplilearn.com/learn-machine-learning-basics-skillup?utm_campaign=MachineLearning&utm_medium=Description&utm_source=youtube
🔥🔥 Interested in Attending Live Classes? Call Us: IN - 18002127688 / US - +18445327688
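The Python demo listed above (topic 7) is described in the detail section below as using scikit-learn's make_blobs to create test data and the standard KMeans estimator to form the clusters. The exact notebook is not reproduced here; the following is a minimal sketch under those assumptions, with the sample size, k = 4, and random_state values chosen purely for illustration.

# Minimal sketch of the K-Means demo: blob-shaped test data, a KMeans fit,
# and a scatter plot with the final centroids. Parameter values are illustrative.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 300 test points scattered around 4 centers stand in for a real dataset.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit K-Means with k = 4 and read back each point's cluster label and the centroids.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_

# Points colored by their assigned cluster; converged centroids marked with black crosses.
plt.scatter(X[:, 0], X[:, 1], c=labels, s=20, cmap="viridis")
plt.scatter(centers[:, 0], centers[:, 1], c="black", s=120, marker="x")
plt.show()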

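The detail section below also explains the elbow method: fit K-Means for a range of k values, record the within sum of squares (WSS) for each, and pick the k where the curve bends. A hedged sketch of that loop follows; it assumes scikit-learn, where the WSS of a fitted model is exposed as inertia_, and the synthetic data and the 1-10 range for k are illustrative rather than taken from the video's dataset.

# Elbow-method sketch: WSS falls sharply until k reaches the natural number of
# clusters, then flattens; the bend in the curve is taken as the working k.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.70, random_state=0)

k_values = range(1, 11)
wss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(model.inertia_)  # within-cluster sum of squares for this k

plt.plot(list(k_values), wss, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("within sum of squares (WSS)")
plt.show()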
detail
{'title': 'K Means Clustering Algorithm | K Means In Python | Machine Learning Algorithms |Simplilearn', 'heatmap': [{'end': 520.198, 'start': 422.415, 'weight': 0.714}, {'end': 754.392, 'start': 664.426, 'weight': 0.874}, {'end': 999.454, 'start': 901.402, 'weight': 0.798}, {'end': 1727.714, 'start': 1656.169, 'weight': 0.867}], 'summary': "Covers k-means clustering algorithm, explaining its unsupervised learning process to group data based on similarities with 'k' value determining clusters, illustrating its application through examples like cricket, walmart store locations, and color compression, and demonstrating its implementation in python for data analysis.", 'chapters': [{'end': 173.875, 'segs': [{'end': 53.304, 'src': 'embed', 'start': 3.868, 'weight': 0, 'content': [{'end': 7.355, 'text': 'Hello and welcome to the session on K-means clustering.', 'start': 3.868, 'duration': 3.487}, {'end': 9.981, 'text': "I'm Mohan Kumar from Simply Learn.", 'start': 7.756, 'duration': 2.225}, {'end': 12.626, 'text': 'So what is K-means clustering??', 'start': 10.482, 'duration': 2.144}, {'end': 18.005, 'text': 'K-means clustering is unsupervised learning algorithm.', 'start': 13.247, 'duration': 4.758}, {'end': 22.946, 'text': "in this case, you don't have labeled data, unlike in supervised learning.", 'start': 18.005, 'duration': 4.941}, {'end': 30.229, 'text': 'so you have a set of data and you want to group them and, as the name suggests, you want to put them into clusters,', 'start': 22.946, 'duration': 7.283}, {'end': 37.011, 'text': 'which means objects that are similar in nature, similar in characteristics, need to be put together.', 'start': 30.229, 'duration': 6.782}, {'end': 40.432, 'text': "so that's what k-means clustering is all about.", 'start': 37.011, 'duration': 3.421}, {'end': 47.758, 'text': 'the term k is basically is a number, so we need to tell the system how many clusters we need to perform.', 'start': 40.432, 'duration': 7.326}, {'end': 50.061, 'text': 'so if k is equal to two, there will be two clusters.', 'start': 47.758, 'duration': 2.303}, {'end': 53.304, 'text': 'if k is equal to three, three clusters, and so on and so forth.', 'start': 50.061, 'duration': 3.243}], 'summary': 'K-means clustering groups similar data into clusters based on a specified number, k.', 'duration': 49.436, 'max_score': 3.868, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E3868.jpg'}, {'end': 103.771, 'src': 'embed', 'start': 74.457, 'weight': 2, 'content': [{'end': 82.019, 'text': "let's say you received data of a lot of players from maybe all over the country or all over the world,", 'start': 74.457, 'duration': 7.562}, {'end': 93.282, 'text': 'and this data has information about the runs scored by the people or by the player and the wickets taken by the player and, based on this information,', 'start': 82.019, 'duration': 11.263}, {'end': 99.507, 'text': 'We need to cluster this data into two clusters batsman and bowlers.', 'start': 93.662, 'duration': 5.845}, {'end': 101.289, 'text': 'So this is an interesting example.', 'start': 99.807, 'duration': 1.482}, {'end': 103.771, 'text': "Let's see how we can perform this.", 'start': 101.769, 'duration': 2.002}], 'summary': 'Cluster player data into batsman and bowlers based on runs and wickets.', 'duration': 29.314, 'max_score': 74.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E74457.jpg'}], 'start': 3.868, 'title': 'K-means 
clustering', 'summary': "Explains k-means clustering, an unsupervised learning algorithm used to group data into clusters based on similarities, with the 'k' value determining the number of clusters, illustrated with a cricket example.", 'chapters': [{'end': 173.875, 'start': 3.868, 'title': 'K-means clustering: unsupervised learning', 'summary': "Explains k-means clustering, an unsupervised learning algorithm used to group data into clusters based on similarities, with the 'k' value determining the number of clusters, illustrated with a cricket example.", 'duration': 170.007, 'highlights': ["K-means clustering is an unsupervised learning algorithm used to group data into clusters based on similarities, with the 'k' value determining the number of clusters.", "The 'k' value in K-means clustering determines the number of clusters, where each cluster represents objects with similar characteristics.", "An example of using K-means clustering is demonstrated in the context of cricket, where the data of players' runs and wickets is clustered into batsmen and bowlers."]}], 'duration': 170.007, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E3868.jpg', 'highlights': ["K-means clustering is an unsupervised learning algorithm used to group data into clusters based on similarities, with the 'k' value determining the number of clusters.", "The 'k' value in K-means clustering determines the number of clusters, where each cluster represents objects with similar characteristics.", "An example of using K-means clustering is demonstrated in the context of cricket, where the data of players' runs and wickets is clustered into batsmen and bowlers."]}, {'end': 721.433, 'segs': [{'end': 260.122, 'src': 'embed', 'start': 230.048, 'weight': 5, 'content': [{'end': 233.689, 'text': 'one point closer to these data points and another closer to these data points.', 'start': 230.048, 'duration': 3.641}, {'end': 236.11, 'text': 'they can be assigned randomly anywhere.', 'start': 233.689, 'duration': 2.421}, {'end': 238.331, 'text': "OK, so that's the first step.", 'start': 236.57, 'duration': 1.761}, {'end': 242.972, 'text': 'The next step is to determine the distance of each of the data points.', 'start': 238.491, 'duration': 4.481}, {'end': 247.594, 'text': 'from each of the randomly assigned centroid.', 'start': 243.752, 'duration': 3.842}, {'end': 254.919, 'text': 'So, for example, we take this point and find the distance from this centroid and the distance from this centroid.', 'start': 248.035, 'duration': 6.884}, {'end': 260.122, 'text': 'This point is taken and the distance is found from this centroid and this centroid and so on and so forth.', 'start': 255.199, 'duration': 4.923}], 'summary': 'Determine distance of data points from centroids, part of clustering process.', 'duration': 30.074, 'max_score': 230.048, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E230048.jpg'}, {'end': 393.613, 'src': 'embed', 'start': 367.706, 'weight': 2, 'content': [{'end': 375.569, 'text': 'and that means our algorithm has converged, convergence has occurred And we have the cluster, two clusters.', 'start': 367.706, 'duration': 7.863}, {'end': 377.99, 'text': 'We have the clusters with a centroid.', 'start': 375.89, 'duration': 2.1}, {'end': 380.67, 'text': 'So this process is repeated.', 'start': 378.11, 'duration': 2.56}, {'end': 389.872, 'text': 'The process of calculating the distance and repositioning the 
centroid is repeated till the repositioning stops,', 'start': 381.211, 'duration': 8.661}, {'end': 393.613, 'text': 'which means that the algorithm has converged.', 'start': 389.872, 'duration': 3.741}], 'summary': 'Algorithm converged with 2 clusters after iterative centroid repositioning.', 'duration': 25.907, 'max_score': 367.706, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E367706.jpg'}, {'end': 521.379, 'src': 'heatmap', 'start': 415.128, 'weight': 4, 'content': [{'end': 422.115, 'text': 'and then we will talk about how k-means clustering works and go into the details of k-means clustering algorithm,', 'start': 415.128, 'duration': 6.987}, {'end': 426.637, 'text': 'And then we will end with a demo and a use case for k-means clustering.', 'start': 422.415, 'duration': 4.222}, {'end': 427.918, 'text': "So let's begin.", 'start': 426.998, 'duration': 0.92}, {'end': 430.339, 'text': 'First of all, what are the types of clustering?', 'start': 428.338, 'duration': 2.001}, {'end': 436.763, 'text': 'There are primarily two categories of clustering hierarchical clustering and then partitional clustering.', 'start': 430.74, 'duration': 6.023}, {'end': 445.948, 'text': 'And each of these categories are further subdivided into agglomerative and divisive clustering and k-means and fuzzy c-means clustering.', 'start': 437.363, 'duration': 8.585}, {'end': 450.01, 'text': "Let's take a quick look at what each of these types of clustering are.", 'start': 446.488, 'duration': 3.522}, {'end': 460.99, 'text': 'In hierarchical clustering, the clusters have a tree-like structure and hierarchical clustering is further divided into agglomerative and divisive.', 'start': 450.942, 'duration': 10.048}, {'end': 464.565, 'text': 'Agglomerative clustering is a bottom-up approach.', 'start': 461.843, 'duration': 2.722}, {'end': 470.828, 'text': 'We begin with each element as a separate cluster and merge them into successively larger clusters.', 'start': 464.725, 'duration': 6.103}, {'end': 474.09, 'text': 'So for example, we have A, B, C, D, E, F.', 'start': 470.848, 'duration': 3.242}, {'end': 478.752, 'text': 'We start by combining B and C, form one cluster, D and E, form one more.', 'start': 474.09, 'duration': 4.662}, {'end': 484.916, 'text': 'Then we combine D, E, and F, one more bigger cluster, and then add B, C to that, and then finally A to it.', 'start': 479.113, 'duration': 5.803}, {'end': 490.519, 'text': 'Compared to that, divisive clustering or divisive clustering is the top-down approach.', 'start': 485.316, 'duration': 5.203}, {'end': 496.182, 'text': 'We begin with the whole set and proceed to divide it into successively smaller clusters.', 'start': 490.759, 'duration': 5.423}, {'end': 497.942, 'text': 'So we have A, B, C, D, E, F.', 'start': 496.222, 'duration': 1.72}, {'end': 504.246, 'text': 'We first take that as a single cluster and then break it down into A, B, C, D, E, and F.', 'start': 497.942, 'duration': 6.304}, {'end': 512.152, 'text': 'Then we have partitional clustering split into two subtypes, k-means clustering and fuzzy c-means.', 'start': 505.226, 'duration': 6.926}, {'end': 520.198, 'text': 'In k-means clustering, the objects are divided into the number of clusters mentioned by the number k.', 'start': 512.592, 'duration': 7.606}, {'end': 521.379, 'text': "That's where the k comes from.", 'start': 520.198, 'duration': 1.181}], 'summary': 'Introduction to types of clustering: hierarchical, partitional, k-means clustering 
algorithm, and use case demonstration.', 'duration': 106.251, 'max_score': 415.128, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E415128.jpg'}, {'end': 537.303, 'src': 'embed', 'start': 512.592, 'weight': 0, 'content': [{'end': 520.198, 'text': 'In k-means clustering, the objects are divided into the number of clusters mentioned by the number k.', 'start': 512.592, 'duration': 7.606}, {'end': 521.379, 'text': "That's where the k comes from.", 'start': 520.198, 'duration': 1.181}, {'end': 527.44, 'text': 'So, if we say k is equal to 2, the objects are divided into two clusters c1 and c2.', 'start': 521.719, 'duration': 5.721}, {'end': 537.303, 'text': 'And the way it is done is the features or characteristics are compared and all objects having similar characteristics are clubbed together.', 'start': 528.021, 'duration': 9.282}], 'summary': 'K-means clustering divides objects into k clusters based on similar characteristics.', 'duration': 24.711, 'max_score': 512.592, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E512592.jpg'}, {'end': 588.691, 'src': 'embed', 'start': 559.293, 'weight': 1, 'content': [{'end': 562.855, 'text': 'in c means objects can belong to more than one cluster.', 'start': 559.293, 'duration': 3.562}, {'end': 566.598, 'text': 'so that is the primary difference between k means and fuzzy c means.', 'start': 562.855, 'duration': 3.743}, {'end': 570.64, 'text': 'So what are some of the applications of k-means clustering?', 'start': 567.398, 'duration': 3.242}, {'end': 581.587, 'text': 'K-means clustering is used in a variety of examples or variety of business cases in real life, starting from academic performance diagnostic systems,', 'start': 571.22, 'duration': 10.367}, {'end': 585.309, 'text': 'search engines and wireless sensor networks and many more.', 'start': 581.587, 'duration': 3.722}, {'end': 588.691, 'text': 'So let us take a little deeper look at each of these examples.', 'start': 585.529, 'duration': 3.162}], 'summary': 'K-means clustering has various real-life applications, including academic performance diagnostic systems, search engines, and wireless sensor networks.', 'duration': 29.398, 'max_score': 559.293, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E559293.jpg'}, {'end': 673.59, 'src': 'embed', 'start': 648.863, 'weight': 3, 'content': [{'end': 657.965, 'text': 'there is euclidean distance, there is manhattan distance, then we have squared euclidean distance measure and cosine distance measure.', 'start': 648.863, 'duration': 9.102}, {'end': 662.206, 'text': 'these are some of the distance measures supported by k-means clustering.', 'start': 657.965, 'duration': 4.241}, {'end': 663.646, 'text': "let's take a look at each of these.", 'start': 662.206, 'duration': 1.44}, {'end': 666.247, 'text': 'what is euclidean distance measure?', 'start': 664.426, 'duration': 1.821}, {'end': 669.128, 'text': 'this is nothing but the distance between two points.', 'start': 666.247, 'duration': 2.881}, {'end': 673.59, 'text': 'so we have learnt in high school how to find the distance between two points.', 'start': 669.128, 'duration': 4.462}], 'summary': 'K-means clustering supports various distance measures such as euclidean, manhattan, squared euclidean, and cosine.', 'duration': 24.727, 'max_score': 648.863, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E648863.jpg'}], 'start': 174.055, 'title': 'K-means clustering', 'summary': 'Covers the k-means clustering process, k=2, random centroid assignment, distance calculation, and repositioning for convergence, resulting in two clusters. it also includes an overview of k-means clustering, types, applications, distance measures, and algorithm explanation with examples and use cases.', 'chapters': [{'end': 393.613, 'start': 174.055, 'title': 'K-means clustering process', 'summary': 'Explains the k-means clustering process, where k=2, centroids are randomly assigned, distances are calculated for data points from centroids, and centroids are repositioned until convergence, resulting in two clusters.', 'duration': 219.558, 'highlights': ['The chapter explains the k-means clustering process, where k=2, centroids are randomly assigned, distances are calculated for data points from centroids, and centroids are repositioned until convergence, resulting in two clusters.', 'The next step is to determine the distance of each of the data points from each of the randomly assigned centroid.', 'The process of calculating the distance and repositioning the centroid is repeated till the repositioning stops, which means that the algorithm has converged.']}, {'end': 721.433, 'start': 394.033, 'title': 'K-means clustering overview', 'summary': 'Covers an overview of k-means clustering, including types of clustering, applications, distance measures, and a detailed explanation of k-means clustering algorithm, with examples and use cases.', 'duration': 327.4, 'highlights': ['K-means clustering is used in various business cases including academic performance diagnostic systems, search engines, and wireless sensor networks.', 'Explanation of distance measures including Euclidean distance, Manhattan distance, squared Euclidean distance, and cosine distance.', 'Types of clustering, including hierarchical clustering and partitional clustering, with a focus on k-means and fuzzy c-means clustering.']}], 'duration': 547.378, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E174055.jpg', 'highlights': ['The chapter explains the k-means clustering process, where k=2, centroids are randomly assigned, distances are calculated for data points from centroids, and centroids are repositioned until convergence, resulting in two clusters.', 'K-means clustering is used in various business cases including academic performance diagnostic systems, search engines, and wireless sensor networks.', 'The process of calculating the distance and repositioning the centroid is repeated till the repositioning stops, which means that the algorithm has converged.', 'Explanation of distance measures including Euclidean distance, Manhattan distance, squared Euclidean distance, and cosine distance.', 'Types of clustering, including hierarchical clustering and partitional clustering, with a focus on k-means and fuzzy c-means clustering.', 'The next step is to determine the distance of each of the data points from each of the randomly assigned centroid.']}, {'end': 1033.334, 'segs': [{'end': 751.151, 'src': 'embed', 'start': 721.653, 'weight': 1, 'content': [{'end': 723.435, 'text': 'So that is the Manhattan distance measure.', 'start': 721.653, 'duration': 1.782}, {'end': 725.576, 'text': 'Then we have cosine distance measure.', 'start': 723.635, 'duration': 1.941}, {'end': 732.501, 'text': 'In this case, we 
take the angle between the two vectors formed by joining the points from the origin.', 'start': 725.896, 'duration': 6.605}, {'end': 735.302, 'text': 'So that is the cosine distance measure.', 'start': 732.861, 'duration': 2.441}, {'end': 742.027, 'text': 'OK, so that was a quick overview about the various distance measures that are supported by k-means.', 'start': 735.623, 'duration': 6.404}, {'end': 746.37, 'text': "Now let's go and check how exactly k-means clustering works.", 'start': 742.547, 'duration': 3.823}, {'end': 751.151, 'text': 'okay. so this is how k-means clustering works.', 'start': 747.33, 'duration': 3.821}], 'summary': 'Overview of manhattan and cosine distance measures in k-means clustering', 'duration': 29.498, 'max_score': 721.653, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E721653.jpg'}, {'end': 793.858, 'src': 'embed', 'start': 767.614, 'weight': 0, 'content': [{'end': 775.56, 'text': 'maybe k is equal to three or four or five to start with, and then, As we progress, we keep changing until we get the best clusters.', 'start': 767.614, 'duration': 7.946}, {'end': 782.328, 'text': 'Or there is a technique called ELBO technique whereby we can determine the value of k.', 'start': 775.961, 'duration': 6.367}, {'end': 788.816, 'text': 'What should be the best value of k? How many clusters should be formed? So once we have the value of k, we specify that.', 'start': 782.328, 'duration': 6.488}, {'end': 793.858, 'text': 'and then the system will assign that many centroid.', 'start': 789.236, 'duration': 4.622}], 'summary': 'Determining optimal k value is critical for clustering algorithms to assign centroids.', 'duration': 26.244, 'max_score': 767.614, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E767614.jpg'}, {'end': 848.37, 'src': 'embed', 'start': 821.348, 'weight': 5, 'content': [{'end': 825.932, 'text': 'And thereby, we have k number of initial clusters.', 'start': 821.348, 'duration': 4.584}, {'end': 828.794, 'text': 'However, this is not the final clusters.', 'start': 826.473, 'duration': 2.321}, {'end': 839.083, 'text': 'The next step it does is for the new groups, for the clusters that have been formed, it calculates the mean position,', 'start': 829.155, 'duration': 9.928}, {'end': 842.365, 'text': 'thereby calculates the new centroid position.', 'start': 839.083, 'duration': 3.282}, {'end': 846.829, 'text': 'The position of the centroid moves compared to the randomly allocated one.', 'start': 842.846, 'duration': 3.983}, {'end': 848.37, 'text': "So it's an iterative process.", 'start': 846.989, 'duration': 1.381}], 'summary': 'Iterative process to calculate new centroid position for k initial clusters.', 'duration': 27.022, 'max_score': 821.348, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E821348.jpg'}, {'end': 999.454, 'src': 'heatmap', 'start': 901.402, 'weight': 0.798, 'content': [{'end': 906.728, 'text': "so let's do a little bit of visualization and see if we can explain it better.", 'start': 901.402, 'duration': 5.326}, {'end': 908.049, 'text': "Let's take an example.", 'start': 907.168, 'duration': 0.881}, {'end': 910.672, 'text': 'If we have a data set for a grocery shop.', 'start': 908.309, 'duration': 2.363}, {'end': 921.4, 'text': "so let's say we have a data set for a grocery shop and now we want to find out how many clusters this has to be spread across.", 'start': 
911.112, 'duration': 10.288}, {'end': 925.183, 'text': 'so how do we find the optimum number of clusters?', 'start': 921.4, 'duration': 3.783}, {'end': 927.565, 'text': 'there is a technique called the elbow method.', 'start': 925.183, 'duration': 2.382}, {'end': 938.552, 'text': 'so when these clusters are formed, there is a parameter called within sum of squares, And the lower this value is, the better the cluster is.', 'start': 927.565, 'duration': 10.987}, {'end': 942.633, 'text': 'That means all these points are very close to each other.', 'start': 938.672, 'duration': 3.961}, {'end': 954.178, 'text': 'So we use this within sum of squares as a measure to find the optimum number of clusters that can be formed for a given data set.', 'start': 942.993, 'duration': 11.185}, {'end': 962.741, 'text': 'So we create clusters or we let the system create clusters of a variety of numbers, maybe of 10.', 'start': 954.458, 'duration': 8.283}, {'end': 975.03, 'text': '10 clusters and for each value of k the within ss is measured and the value of k, which has the least amount of within ss or wss,', 'start': 962.741, 'duration': 12.289}, {'end': 978.693, 'text': 'that is taken as the optimum value of k.', 'start': 975.03, 'duration': 3.663}, {'end': 981.435, 'text': 'so this is the diagrammatic representation.', 'start': 978.693, 'duration': 2.742}, {'end': 989.806, 'text': 'so we have on the y-axis the within sum of squares or wss, And on the x-axis we have the number of clusters.', 'start': 981.435, 'duration': 8.371}, {'end': 999.454, 'text': 'So, as you can imagine, if you have k is equal to 1,, which means all the data points are in a single cluster, the within-SS value will be very high,', 'start': 989.947, 'duration': 9.507}], 'summary': 'Using the elbow method to find optimum number of clusters based on within sum of squares.', 'duration': 98.052, 'max_score': 901.402, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E901402.jpg'}, {'end': 981.435, 'src': 'embed', 'start': 925.183, 'weight': 2, 'content': [{'end': 927.565, 'text': 'there is a technique called the elbow method.', 'start': 925.183, 'duration': 2.382}, {'end': 938.552, 'text': 'so when these clusters are formed, there is a parameter called within sum of squares, And the lower this value is, the better the cluster is.', 'start': 927.565, 'duration': 10.987}, {'end': 942.633, 'text': 'That means all these points are very close to each other.', 'start': 938.672, 'duration': 3.961}, {'end': 954.178, 'text': 'So we use this within sum of squares as a measure to find the optimum number of clusters that can be formed for a given data set.', 'start': 942.993, 'duration': 11.185}, {'end': 962.741, 'text': 'So we create clusters or we let the system create clusters of a variety of numbers, maybe of 10.', 'start': 954.458, 'duration': 8.283}, {'end': 975.03, 'text': '10 clusters and for each value of k the within ss is measured and the value of k, which has the least amount of within ss or wss,', 'start': 962.741, 'duration': 12.289}, {'end': 978.693, 'text': 'that is taken as the optimum value of k.', 'start': 975.03, 'duration': 3.663}, {'end': 981.435, 'text': 'so this is the diagrammatic representation.', 'start': 978.693, 'duration': 2.742}], 'summary': 'Elbow method uses within sum of squares to find optimal clusters.', 'duration': 56.252, 'max_score': 925.183, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E925183.jpg'}], 
'start': 721.653, 'title': 'K-means clustering and elbow method', 'summary': 'Provides an overview of manhattan and cosine distance measures in k-means clustering, iterative process of determining best clusters, and explains the elbow method for determining the optimum number of clusters using within sum of squares, with optimum k value as 2, 3, or 4.', 'chapters': [{'end': 896.998, 'start': 721.653, 'title': 'K-means clustering overview', 'summary': 'Provides an overview of manhattan and cosine distance measures in k-means clustering and explains the iterative process of determining the best clusters through centroid allocation, distance measurement, and centroid position calculation.', 'duration': 175.345, 'highlights': ['The chapter explains the iterative process of determining the best clusters through centroid allocation, distance measurement, and centroid position calculation, with the elbow method used to determine the best value of k.', 'The chapter provides an overview of Manhattan and cosine distance measures in k-means clustering, highlighting the process of assigning data points to the closest centroid based on minimum distance measure.', 'The chapter describes the iterative process of k-means clustering, where the centroid position is calculated based on the mean position of the clusters, and the convergence is determined by the movement of the centroids.']}, {'end': 1033.334, 'start': 897.318, 'title': 'Elbow method for cluster analysis', 'summary': 'Explains the elbow method for determining the optimum number of clusters in a dataset by using the within sum of squares as a measure, with a visualization showing the decrease in within sum of squares as the number of clusters increases, indicating an optimum value of k as either 2, 3, or 4.', 'duration': 136.016, 'highlights': ['The elbow method uses within sum of squares as a measure to find the optimum number of clusters, where a lower value indicates better cluster formation.', 'A diagrammatic representation is used to show the decrease in within sum of squares as the number of clusters increases, suggesting an optimum value of k as either 2, 3, or 4.', 'By creating clusters for a variety of numbers and measuring the within sum of squares for each value of k, the optimum value of k is determined by the k value with the least amount of within sum of squares.']}], 'duration': 311.681, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E721653.jpg', 'highlights': ['The chapter explains the iterative process of determining the best clusters through centroid allocation, distance measurement, and centroid position calculation, with the elbow method used to determine the best value of k.', 'The chapter provides an overview of Manhattan and cosine distance measures in k-means clustering, highlighting the process of assigning data points to the closest centroid based on minimum distance measure.', 'The elbow method uses within sum of squares as a measure to find the optimum number of clusters, where a lower value indicates better cluster formation.', 'A diagrammatic representation is used to show the decrease in within sum of squares as the number of clusters increases, suggesting an optimum value of k as either 2, 3, or 4.', 'By creating clusters for a variety of numbers and measuring the within sum of squares for each value of k, the optimum value of k is determined by the k value with the least amount of within sum of squares.', 'The chapter describes the 
iterative process of k-means clustering, where the centroid position is calculated based on the mean position of the clusters, and the convergence is determined by the movement of the centroids.']}, {'end': 1622.03, 'segs': [{'end': 1132.192, 'src': 'embed', 'start': 1105.749, 'weight': 2, 'content': [{'end': 1114.816, 'text': 'They have been randomly assigned points and only thing that has been done was the data points which are closest to them have been assigned to them.', 'start': 1105.749, 'duration': 9.067}, {'end': 1122.263, 'text': 'but now, in this step, the actual centroid will be calculated, which may be for each of these data sets somewhere in the middle.', 'start': 1115.316, 'duration': 6.947}, {'end': 1132.192, 'text': "so that's like the mean point that will be calculated and the centroid will actually be positioned or repositioned there same with c2.", 'start': 1122.263, 'duration': 9.929}], 'summary': 'Data points randomly assigned. centroid calculated for repositioning.', 'duration': 26.443, 'max_score': 1105.749, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1105749.jpg'}, {'end': 1200.238, 'src': 'embed', 'start': 1171.995, 'weight': 3, 'content': [{'end': 1174.196, 'text': 'So this is the new grouping.', 'start': 1171.995, 'duration': 2.201}, {'end': 1176.037, 'text': 'So some points will be reassigned.', 'start': 1174.316, 'duration': 1.721}, {'end': 1179.36, 'text': 'And again, the centroid will be calculated.', 'start': 1176.738, 'duration': 2.622}, {'end': 1184.423, 'text': "And if the centroid doesn't change, so that is a repetitive process, iterative process.", 'start': 1179.8, 'duration': 4.623}, {'end': 1191.488, 'text': "And if the centroid doesn't change, once the centroid stops changing, that means the algorithm has converged.", 'start': 1184.623, 'duration': 6.865}, {'end': 1200.238, 'text': 'this is our final cluster, with this as the centroid, c1 and c2 as the centroids, these data points as a part of each cluster.', 'start': 1192.128, 'duration': 8.11}], 'summary': 'New grouping with reassigned points and converged algorithm with final centroids c1 and c2', 'duration': 28.243, 'max_score': 1171.995, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1171995.jpg'}, {'end': 1332.404, 'src': 'embed', 'start': 1308.325, 'weight': 1, 'content': [{'end': 1319.592, 'text': 'which means that the position of the randomly selected centroids will now change and they will be the mean positions of this newly formed k groups.', 'start': 1308.325, 'duration': 11.267}, {'end': 1327.078, 'text': 'and once that is done, we once again repeat this process of calculating the distance right.', 'start': 1319.592, 'duration': 7.486}, {'end': 1328.439, 'text': 'so this is what we are doing.', 'start': 1327.078, 'duration': 1.361}, {'end': 1332.404, 'text': 'as a part of step four, we repeat step two and three.', 'start': 1328.439, 'duration': 3.965}], 'summary': 'Iteratively update centroids to find mean positions of k groups in k-means clustering.', 'duration': 24.079, 'max_score': 1308.325, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1308325.jpg'}, {'end': 1576.834, 'src': 'embed', 'start': 1550.348, 'weight': 4, 'content': [{'end': 1560.87, 'text': 'once again we find, and then it is possible that after iterating through three or four or five times, the centroid will stop moving,', 'start': 1550.348, 
'duration': 10.522}, {'end': 1569.252, 'text': 'in the sense that when you calculate the new value of the centroid that will be same as the original value or there will be very marginal change.', 'start': 1560.87, 'duration': 8.382}, {'end': 1576.834, 'text': 'So that is when we say convergence has occurred and that is our final cluster.', 'start': 1569.672, 'duration': 7.162}], 'summary': 'Convergence occurs after 3-5 iterations when centroid stops moving.', 'duration': 26.486, 'max_score': 1550.348, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1550348.jpg'}, {'end': 1622.03, 'src': 'embed', 'start': 1596.962, 'weight': 0, 'content': [{'end': 1606.935, 'text': 'walmart wants to open a chain of stores across the state of florida and it wants to find the optimal store locations.', 'start': 1596.962, 'duration': 9.973}, {'end': 1618.048, 'text': 'now the issue here is, if they open too many stores close to each other, obviously the they will not make profit, but If the stores are too far apart,', 'start': 1606.935, 'duration': 11.113}, {'end': 1619.669, 'text': 'then they will not have enough sales.', 'start': 1618.048, 'duration': 1.621}, {'end': 1622.03, 'text': 'So how do they optimize this?', 'start': 1620.329, 'duration': 1.701}], 'summary': 'Walmart aims to open stores across florida while optimizing store locations to balance profitability and sales.', 'duration': 25.068, 'max_score': 1596.962, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1596962.jpg'}], 'start': 1033.334, 'title': 'K-means clustering process', 'summary': 'Elucidates the iterative k-means clustering process, encompassing initial centroid selection, iterative centroid calculation and reallocation, and convergence, exemplified with a practical case of determining optimal store locations for walmart in florida.', 'chapters': [{'end': 1386.636, 'start': 1033.334, 'title': 'K-means clustering process', 'summary': 'Explains the k-means clustering process, involving the random selection of initial centroids, iterative calculation and repositioning of centroids, and reallocation of data points until convergence, with an illustrative example provided.', 'duration': 353.302, 'highlights': ['The process of k-means clustering involves the random selection of initial centroids and iterative calculation and repositioning of centroids until convergence.', 'Data points are allocated to the centroid closest to them, and centroids are recalculated as the mean position of the new groups formed, with potential reallocation of points during the process.', 'The iterative process continues until the centroids stop changing, indicating convergence and the formation of the final clusters.']}, {'end': 1622.03, 'start': 1386.636, 'title': 'K-means clustering process', 'summary': 'Explains the iterative process of k-means clustering, involving steps to assign data points to centroids based on distance, updating centroids, and iterating until convergence, with an application example of finding optimal store locations for walmart in florida.', 'duration': 235.394, 'highlights': ['The iterative process of K-means clustering involves assigning data points to centroids based on distance and updating centroids until convergence.', 'Application example of finding optimal store locations for Walmart in Florida using K-means clustering to avoid opening stores too close or too far apart.']}], 'duration': 588.696, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1033334.jpg', 'highlights': ['Application example of finding optimal store locations for Walmart in Florida using K-means clustering to avoid opening stores too close or too far apart.', 'The process of k-means clustering involves the random selection of initial centroids and iterative calculation and repositioning of centroids until convergence.', 'Data points are allocated to the centroid closest to them, and centroids are recalculated as the mean position of the new groups formed, with potential reallocation of points during the process.', 'The iterative process of K-means clustering involves assigning data points to centroids based on distance and updating centroids until convergence.', 'The iterative process continues until the centroids stop changing, indicating convergence and the formation of the final clusters.']}, {'end': 2219.042, 'segs': [{'end': 1727.714, 'src': 'heatmap', 'start': 1622.57, 'weight': 0, 'content': [{'end': 1632.655, 'text': 'Now, for an organization like Walmart, which is an e-commerce giant, they already have the addresses of their customers in their database.', 'start': 1622.57, 'duration': 10.085}, {'end': 1641.859, 'text': 'So they can actually use this information or this data and use k-means clustering to find the optimal location.', 'start': 1632.955, 'duration': 8.904}, {'end': 1648.343, 'text': 'Now, before we go into the Python notebook and show you the live code,', 'start': 1642.319, 'duration': 6.024}, {'end': 1656.169, 'text': 'I wanted to take you through very quickly a summary of the code in the slides and then we will go into the Python notebook.', 'start': 1648.343, 'duration': 7.826}, {'end': 1665.878, 'text': 'so in this block we are basically importing all the required libraries like numpy, matplotlib and so on,', 'start': 1656.169, 'duration': 9.709}, {'end': 1673.572, 'text': "and we are loading the data that is available in the form of, let's say, the addresses.", 'start': 1667.066, 'duration': 6.506}, {'end': 1677.656, 'text': 'for simplicity sake, we will just take them as some data points.', 'start': 1673.572, 'duration': 4.084}, {'end': 1686.305, 'text': 'then the next thing we do is quickly do a scatter plot to see how they are related to each other with respect to each other.', 'start': 1677.656, 'duration': 8.649}, {'end': 1692.771, 'text': 'so in the scatter plot we see that there are a few distinct groups already being formed,', 'start': 1686.305, 'duration': 6.466}, {'end': 1699.819, 'text': 'so you can actually get an idea about how the cluster would look and how many clusters, what is the optimal number of clusters?', 'start': 1692.771, 'duration': 7.048}, {'end': 1704.761, 'text': 'and then starts the actual k-means clustering process.', 'start': 1700.319, 'duration': 4.442}, {'end': 1727.714, 'text': 'so we will assign each of these points to the centroids and then check whether they are the optimal distance which is the shortest distance and assign each of the points data points to the centroids and then go through this iterative process till the whole process converges and finally we get an output like this.', 'start': 1704.761, 'duration': 22.953}], 'summary': 'Walmart uses k-means clustering to find optimal locations for customer addresses.', 'duration': 55.086, 'max_score': 1622.57, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1622570.jpg'}, {'end': 1797.104, 'src': 
'embed', 'start': 1772.962, 'weight': 2, 'content': [{'end': 1779.511, 'text': 'We have a few examples here which we will demonstrate how k-means clustering is used,', 'start': 1772.962, 'duration': 6.549}, {'end': 1783.356, 'text': 'and even there is a small implementation of k-means clustering as well.', 'start': 1779.511, 'duration': 3.845}, {'end': 1785.018, 'text': "Okay, so let's get started.", 'start': 1783.937, 'duration': 1.081}, {'end': 1794.263, 'text': 'Okay, so this block is basically importing the various libraries that are required, like matplotlib and numpy, and so on and so forth,', 'start': 1785.318, 'duration': 8.945}, {'end': 1797.104, 'text': 'which would be used as a part of the code.', 'start': 1794.263, 'duration': 2.841}], 'summary': 'Demonstrating k-means clustering with examples and small implementation.', 'duration': 24.142, 'max_score': 1772.962, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1772962.jpg'}, {'end': 2128.885, 'src': 'embed', 'start': 2101.054, 'weight': 3, 'content': [{'end': 2111.841, 'text': 'So as explained in the slides, the first step that is done in case of k-means clustering is to randomly assign some centroids.', 'start': 2101.054, 'duration': 10.787}, {'end': 2121.053, 'text': 'so, as a first step, we randomly allocate a couple of centroids, which we call here we are calling as centers,', 'start': 2112.421, 'duration': 8.632}, {'end': 2128.885, 'text': 'and then we put this in a loop and we take it through an iterative process For each of the data points.', 'start': 2121.053, 'duration': 7.832}], 'summary': 'K-means clustering involves randomly assigning centroids and iterating through data points in an iterative process.', 'duration': 27.831, 'max_score': 2101.054, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E2101054.jpg'}], 'start': 1622.57, 'title': "Walmart's customer location optimization", 'summary': 'Explores utilizing customer data to optimize store locations through k-means clustering, demonstrating data analysis using python libraries and live code. 
it also explains k-means clustering process and its implementation in python, including creating test data and standard k-means functionality, with an additional rough implementation of the algorithm.', 'chapters': [{'end': 1677.656, 'start': 1622.57, 'title': "Walmart's customer location optimization", 'summary': 'Explores how walmart can utilize customer data to optimize store locations through k-means clustering, demonstrating data analysis using python libraries and live code.', 'duration': 55.086, 'highlights': ['Walmart uses k-means clustering to find optimal store locations using customer address data.', 'The process involves importing necessary libraries such as numpy and matplotlib and loading the available address data.']}, {'end': 2219.042, 'start': 1677.656, 'title': 'K-means clustering for store location optimization', 'summary': 'Explains the process of k-means clustering and its implementation in python, demonstrating the use of make_blobs to create test data and the standard k-means functionality to create clusters, with an additional rough implementation of the k-means algorithm.', 'duration': 541.386, 'highlights': ['The process of K-means clustering and its implementation in Python are explained, demonstrating the use of make_blobs to create test data and the standard k-means functionality to create clusters.', 'A rough implementation of the k-means algorithm is demonstrated, involving the random allocation of centroids, iterative assignment of data points to centroids, calculation of new centroids, and convergence checking.']}], 'duration': 596.472, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E1622570.jpg', 'highlights': ['Walmart uses k-means clustering to find optimal store locations using customer address data.', 'The process involves importing necessary libraries such as numpy and matplotlib and loading the available address data.', 'The process of K-means clustering and its implementation in Python are explained, demonstrating the use of make_blobs to create test data and the standard k-means functionality to create clusters.', 'A rough implementation of the k-means algorithm is demonstrated, involving the random allocation of centroids, iterative assignment of data points to centroids, calculation of new centroids, and convergence checking.']}, {'end': 3016.3, 'segs': [{'end': 2272.414, 'src': 'embed', 'start': 2242.801, 'weight': 1, 'content': [{'end': 2244.823, 'text': "So that's a small flaw here.", 'start': 2242.801, 'duration': 2.022}, {'end': 2247.906, 'text': 'So that is something additional checks may have to be added here.', 'start': 2244.883, 'duration': 3.023}, {'end': 2252.29, 'text': 'But again, as mentioned, this is not the most sophisticated implementation.', 'start': 2247.926, 'duration': 4.364}, {'end': 2257.055, 'text': 'This is like a kind of a rough implementation of the k-means clustering.', 'start': 2252.911, 'duration': 4.144}, {'end': 2261.961, 'text': 'So if we execute this code, this is what we get as the output.', 'start': 2257.756, 'duration': 4.205}, {'end': 2265.485, 'text': 'So this is the definition of this particular function.', 'start': 2261.981, 'duration': 3.504}, {'end': 2272.414, 'text': 'And then we call that find underscore clusters and we pass our data x and the number of clusters, which is four.', 'start': 2265.926, 'duration': 6.488}], 'summary': 'Rough k-means clustering implementation with 4 clusters, additional checks needed.', 'duration': 29.613, 'max_score': 
2242.801, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E2242801.jpg'}, {'end': 2318.735, 'src': 'embed', 'start': 2287.745, 'weight': 2, 'content': [{'end': 2289.587, 'text': 'This is the final position of the centroids.', 'start': 2287.745, 'duration': 1.842}, {'end': 2296.493, 'text': 'And as you can see visually also, this appears like a kind of a center of all these points here, right?', 'start': 2289.627, 'duration': 6.866}, {'end': 2299.275, 'text': 'Similarly, this is like the center of all these points.', 'start': 2296.593, 'duration': 2.682}, {'end': 2306.906, 'text': 'So this is the example or this is an example of an implementation of k-means clustering.', 'start': 2300.902, 'duration': 6.004}, {'end': 2318.735, 'text': 'And next, we will move on to see a couple of examples of how k-means clustering is used in maybe some real life scenarios or use cases.', 'start': 2307.707, 'duration': 11.028}], 'summary': 'Centroids represent center of data points. example of k-means clustering implementation.', 'duration': 30.99, 'max_score': 2287.745, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E2287745.jpg'}, {'end': 2425.819, 'src': 'embed', 'start': 2395.815, 'weight': 0, 'content': [{'end': 2400.379, 'text': 'And this will then create clusters of only 16 clusters.', 'start': 2395.815, 'duration': 4.564}, {'end': 2403.301, 'text': 'So these colors, there are millions of colors.', 'start': 2400.479, 'duration': 2.822}, {'end': 2407.264, 'text': 'And now we need to bring it down to 16 colors.', 'start': 2403.942, 'duration': 3.322}, {'end': 2410.627, 'text': 'So we use k is equal to 16.', 'start': 2407.344, 'duration': 3.283}, {'end': 2414.15, 'text': 'And this is how, when we visualize, this is how it looks.', 'start': 2410.627, 'duration': 3.523}, {'end': 2417.232, 'text': 'These are all about 16 million possible colors.', 'start': 2414.25, 'duration': 2.982}, {'end': 2421.175, 'text': 'The input color space has 16 million possible colors.', 'start': 2417.573, 'duration': 3.602}, {'end': 2422.356, 'text': 'And we just.', 'start': 2421.696, 'duration': 0.66}, {'end': 2425.819, 'text': 'sub compress it to 16 colors.', 'start': 2423.357, 'duration': 2.462}], 'summary': 'Using k-means clustering to reduce 16 million colors to 16 clusters.', 'duration': 30.004, 'max_score': 2395.815, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E2395815.jpg'}, {'end': 2650.53, 'src': 'embed', 'start': 2626.932, 'weight': 5, 'content': [{'end': 2637.4, 'text': 'So then what we will do is we will use k-means clustering to create just 16 clusters for the various colors and then apply that to the image.', 'start': 2626.932, 'duration': 10.468}, {'end': 2646.087, 'text': 'Now, what will happen is since the data is large because there are millions of colors, using regular k-means may be a little time consuming.', 'start': 2637.48, 'duration': 8.607}, {'end': 2650.53, 'text': 'So there is another version of k-means which is called mini batch k-means.', 'start': 2646.147, 'duration': 4.383}], 'summary': 'Using mini batch k-means to create 16 clusters for millions of colors in image', 'duration': 23.598, 'max_score': 2626.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E2626932.jpg'}, {'end': 2783.968, 'src': 'embed', 'start': 2755.803, 'weight': 3, 'content': [{'end': 2763.832, 
'text': 'Now we will reduce that to 16 colors using k-means clustering and we do the same process like before.', 'start': 2755.803, 'duration': 8.029}, {'end': 2771.802, 'text': 'We reshape it and then we cluster the colors to 16 and then we render the image once again.', 'start': 2764.073, 'duration': 7.729}, {'end': 2778.966, 'text': 'And we will see that the color, the quality of the image slightly deteriorates as you can see here.', 'start': 2772.442, 'duration': 6.524}, {'end': 2783.968, 'text': 'This has much finer details in this which are probably missing here.', 'start': 2778.986, 'duration': 4.982}], 'summary': 'Using k-means clustering to reduce image to 16 colors, resulting in slight deterioration of image quality.', 'duration': 28.165, 'max_score': 2755.803, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E2755803.jpg'}, {'end': 2989.277, 'src': 'embed', 'start': 2962.928, 'weight': 4, 'content': [{'end': 2969.952, 'text': 'then we understood the distance measures, what are the different types of distance measures supported by k-means clustering?', 'start': 2962.928, 'duration': 7.024}, {'end': 2972.733, 'text': 'and we focused on k-means clustering.', 'start': 2969.952, 'duration': 2.781}, {'end': 2980.537, 'text': 'we talked about its applications and how exactly the process flow works for the k-means clustering, and then finally,', 'start': 2972.733, 'duration': 7.804}, {'end': 2984.78, 'text': 'we ended up with a demo and a couple of use cases.', 'start': 2980.537, 'duration': 4.243}, {'end': 2989.277, 'text': 'all right, so with that we come to the end of this video.', 'start': 2985.393, 'duration': 3.884}], 'summary': 'Explored distance measures in k-means clustering and its applications, with a demo and use cases.', 'duration': 26.349, 'max_score': 2962.928, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E2962928.jpg'}], 'start': 2219.042, 'title': 'K-means clustering in color compression', 'summary': 'Discusses the flaws in k-means clustering, an example of its implementation, and its application in color compression, reducing 16 million possible colors to 16, resulting in the loss of some image information. 
it also covers the process flow of k-means clustering, including types of clustering, distance measures, applications, and use cases.', 'chapters': [{'end': 2604.622, 'start': 2219.042, 'title': 'K-means clustering in python', 'summary': 'Discusses the flaws in k-means clustering, an example of its implementation, and how it is used in color compression, reducing 16 million possible colors to 16, resulting in the loss of some image information.', 'duration': 385.58, 'highlights': ['Color compression using k-means clustering reduces 16 million possible colors to 16, resulting in reduced image quality while preserving most information.', 'Flaws in k-means clustering include convergence issues when the change is very minor, requiring additional checks to be added.', 'Implementation of k-means clustering is demonstrated with an example, where clusters are represented by different colors and centroids are visually depicted.']}, {'end': 3016.3, 'start': 2604.622, 'title': 'Color compression using k-means clustering', 'summary': 'Discusses the application of k-means clustering to compress colors in images, reducing them to 16 clusters, with minimal loss of information, and the process flow of k-means clustering, covering types of clustering, distance measures, applications, and use cases.', 'duration': 411.678, 'highlights': ['The chapter discusses the application of k-means clustering to compress colors in images, reducing them to 16 clusters, with minimal loss of information.', 'The process flow of k-means clustering, covering types of clustering, distance measures, applications, and use cases.', 'The use of mini batch k-means to process large data sets efficiently.']}], 'duration': 797.258, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Xvwt7y2jf5E/pics/Xvwt7y2jf5E2219042.jpg', 'highlights': ['Color compression using k-means reduces 16M colors to 16, preserving most info.', 'Flaws in k-means include convergence issues with minor changes, requiring checks.', 'Demonstrates k-means clustering with example, visually depicting clusters and centroids.', 'Application of k-means to compress colors in images, reducing to 16 clusters with minimal info loss.', 'Process flow of k-means covers types of clustering, distance measures, applications, and use cases.', 'Efficient processing of large datasets using mini batch k-means.']}], 'highlights': ["K-means clustering is an unsupervised learning algorithm used to group data into clusters based on similarities, with the 'k' value determining the number of clusters.", 'Application example of finding optimal store locations for Walmart in Florida using K-means clustering to avoid opening stores too close or too far apart.', 'Color compression using k-means reduces 16M colors to 16, preserving most info.', 'The process of K-means clustering and its implementation in Python are explained, demonstrating the use of make_blobs to create test data and the standard k-means functionality to create clusters.', 'The chapter explains the iterative process of determining the best clusters through centroid allocation, distance measurement, and centroid position calculation, with the technique called ELBO used to determine the best value of k.', 'The chapter provides an overview of Manhattan and cosine distance measures in k-means clustering, highlighting the process of assigning data points to the closest centroid based on minimum distance measure.', "The 'k' value in K-means clustering determines the number of clusters, where each cluster 
represents objects with similar characteristics.", "An example of using K-means clustering is demonstrated in the context of cricket, where the data of players' runs and wickets is clustered into batsmen and bowlers.", 'The process of calculating the distance and repositioning the centroid is repeated till the repositioning stops, which means that the algorithm has converged.', 'Walmart uses k-means clustering to find optimal store locations using customer address data.']}
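Two of the demos summarized above are easier to follow with short reference sketches. First, the "rough implementation" of K-Means the transcript walks through: pick k data points as starting centers, assign every point to its nearest center, recompute each center as the mean of its points, and stop when the centers stop moving. The function below is an illustrative reconstruction under those steps, not the notebook's exact code; as the transcript notes, an exact-equality convergence test is a weak point, so a tolerance or an iteration cap would be a sensible extra check.

# Rough K-Means loop, reconstructed from the steps described above.
import numpy as np
from sklearn.metrics import pairwise_distances_argmin

def find_clusters(X, n_clusters, rseed=2):
    # 1. Randomly choose n_clusters data points as the initial centers.
    rng = np.random.RandomState(rseed)
    i = rng.permutation(X.shape[0])[:n_clusters]
    centers = X[i]
    while True:
        # 2. Label each point with the index of its closest center.
        labels = pairwise_distances_argmin(X, centers)
        # 3. Recompute each center as the mean of the points assigned to it.
        new_centers = np.array([X[labels == j].mean(axis=0)
                                for j in range(n_clusters)])
        # 4. Stop once the centers no longer move (exact match; see note above).
        if np.all(centers == new_centers):
            break
        centers = new_centers
    return centers, labels

Calling centers, labels = find_clusters(X, 4) on blob data like that in the earlier sketch returns four centers that sit near the visual middles of the four groups, which matches the output described in the transcript.

Second, the color-compression use case: cluster an image's pixel colors into 16 groups with MiniBatchKMeans (the faster variant the transcript mentions for millions of pixels) and redraw the image using only the 16 cluster-center colors. The sketch below uses scikit-learn's bundled sample image rather than the dataset linked in the description, so treat it as an outline of the approach rather than the video's exact code.

# Color compression: 16 million possible colors reduced to 16 cluster centers.
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_sample_image

image = load_sample_image("china.jpg")     # (height, width, 3) array of uint8 pixels
pixels = image.reshape(-1, 3) / 255.0      # flatten to (n_pixels, 3), scaled to [0, 1]

# Mini-batch K-Means trades a little accuracy for speed on large pixel arrays.
kmeans = MiniBatchKMeans(n_clusters=16, random_state=0).fit(pixels)
recolored = kmeans.cluster_centers_[kmeans.labels_]   # every pixel -> its center color
compressed = recolored.reshape(image.shape)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(image)
axes[0].set_title("Original (16 million possible colors)")
axes[1].imshow(compressed)
axes[1].set_title("16-color compression")
for ax in axes:
    ax.axis("off")
plt.show()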