title
K Means Clustering Algorithm | K Means Example in Python | Machine Learning Algorithms | Edureka

description
This Edureka Machine Learning tutorial (Machine Learning Tutorial with Python Blog: https://goo.gl/fe7ykh ) series presents another video on "K-Means Clustering Algorithm". Within the video you will learn the concepts of K-Means clustering and its implementation using Python. Below are the topics covered in today's session: 1. What is Clustering? 2. Types of Clustering 3. What is K-Means Clustering? 4. How does the K-Means Algorithm work? 5. K-Means Clustering Using Python. Machine Learning Tutorial Playlist: https://goo.gl/UxjTxm

detail
{'title': 'K Means Clustering Algorithm | K Means Example in Python | Machine Learning Algorithms | Edureka', 'heatmap': [{'end': 1581.623, 'start': 1554.878, 'weight': 0.972}, {'end': 1623.388, 'start': 1607.443, 'weight': 1}], 'summary': 'Covers the introduction and implementation of k-means clustering in python, its real-life applications in businesses, types of clustering algorithms, determining optimal k value, and detailed implementation in python with steps and recommendations.', 'chapters': [{'end': 190.903, 'segs': [{'end': 91.297, 'src': 'embed', 'start': 40.902, 'weight': 0, 'content': [{'end': 44.503, 'text': "So let me tell you more about what we'll learn today or how this session is designed.", 'start': 40.902, 'duration': 3.601}, {'end': 48.664, 'text': 'Well, this k-means clustering session is designed in a way that in the first part,', 'start': 44.983, 'duration': 3.681}, {'end': 52.464, 'text': 'you get your basics cleared and understand the concept and the algorithm behind it.', 'start': 48.664, 'duration': 3.8}, {'end': 56.805, 'text': 'Once you get comfortable with it, we will start its implementation using Python.', 'start': 53.004, 'duration': 3.801}, {'end': 60.918, 'text': "I've added some real-life scenarios to give you a proper understanding of the topic.", 'start': 57.439, 'duration': 3.479}, {'end': 64.599, 'text': "So in short you can say that these are the agenda for today's session.", 'start': 61.457, 'duration': 3.142}, {'end': 67.821, 'text': "We'll be learning about what is clustering types of clustering.", 'start': 64.638, 'duration': 3.183}, {'end': 74.486, 'text': "What exactly is k-means clustering? How does k-means algorithm work? 
And then finally we'll implement k-means using Python.", 'start': 68.101, 'duration': 6.385}, {'end': 80.33, 'text': 'Yeah, one more thing: this implementation part will be divided into two different parts. In the first part,', 'start': 74.866, 'duration': 5.464}, {'end': 88.635, 'text': "we'll learn how to implement k-means in Python from scratch, all right, and in the next part we'll use an existing Python library to implement the k-means algorithm.", 'start': 80.37, 'duration': 8.265}, {'end': 91.297, 'text': "All right, and then finally we'll compare their outputs.", 'start': 88.955, 'duration': 2.342}], 'summary': 'Session covers basics of k-means clustering, implementing in python, and comparing outputs.', 'duration': 50.395, 'max_score': 40.902, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY40902.jpg'}, {'end': 152.569, 'src': 'embed', 'start': 113.534, 'weight': 1, 'content': [{'end': 114.695, 'text': 'So what is clustering?', 'start': 113.534, 'duration': 1.161}, {'end': 119.158, 'text': 'In simple words, if you want, clustering is nothing but creating different groups.', 'start': 115.336, 'duration': 3.822}, {'end': 120.799, 'text': 'The groups consist of similar elements.', 'start': 119.158, 'duration': 1.641}, {'end': 122.88, 'text': 'If you want a definition kind of thing,', 'start': 121.3, 'duration': 1.58}, {'end': 128.904, 'text': 'then you can define clustering as a process of dividing the data sets into groups consisting of similar data points.', 'start': 122.88, 'duration': 6.024}, {'end': 132.086, 'text': 'The points within one cluster are as similar as possible,', 'start': 128.904, 'duration': 3.182}, {'end': 137.289, 'text': 'whereas the points belonging to other clusters are completely different from the points in the first cluster.', 'start': 132.406, 'duration': 4.883}, {'end': 142.163, 'text': 'All right, and one more thing: clustering is often referred to as an unsupervised 
learning technique.', 'start': 137.981, 'duration': 4.182}, {'end': 149.887, 'text': 'Now you would ask that what is the use of clustering or where is it used? All right, so it is very important to know so where it is used.', 'start': 142.624, 'duration': 7.263}, {'end': 152.569, 'text': "So let's understand clustering with an analogy.", 'start': 150.168, 'duration': 2.401}], 'summary': 'Clustering is the process of dividing data sets into groups of similar data points, often referred to as an unsupervised learning technique.', 'duration': 39.035, 'max_score': 113.534, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY113534.jpg'}], 'start': 6.926, 'title': 'K-means clustering', 'summary': 'Introduces k-means clustering, its python implementation, and real-life scenarios. it covers the algorithm behind k-means, its types, and comparing implementations. it also discusses the concept of clustering, defining it as the process of dividing data sets into groups consisting of similar data points, and explains the use of clustering with an analogy of diners in a restaurant. 
it highlights that clustering is an unsupervised learning technique.', 'chapters': [{'end': 91.297, 'start': 6.926, 'title': 'Understanding k-means clustering', 'summary': 'Introduces the concept of clustering, with a focus on k-means clustering, including its implementation using python, and real-life scenarios, covering topics like the algorithm behind k-means, its types, and comparing implementations.', 'duration': 84.371, 'highlights': ['The session covers the basics of clustering and the concept and algorithm behind k-means, followed by its implementation using Python, including real-life scenarios for better understanding.', 'The main focus of the session is on k-means clustering, with the agenda covering topics such as types of clustering, the working of the k-means algorithm, and the implementation of k-means using Python.', "The implementation of k-means using Python is divided into two parts: implementing from scratch and using Python's inbuilt library, with a final comparison of their outputs."]}, {'end': 190.903, 'start': 91.897, 'title': 'Introduction to k-means clustering', 'summary': 'Discusses the concept of clustering, defining it as the process of dividing data sets into groups consisting of similar data points, and explains the use of clustering with an analogy of diners in a restaurant. 
it also highlights that clustering is an unsupervised learning technique.', 'duration': 99.006, 'highlights': ['The process of clustering involves dividing data sets into groups consisting of similar data points, with points within one cluster being as similar as possible, while those belonging to other clusters are completely different (e.g., diners at different restaurant tables).', 'Clustering is referred to as an unsupervised learning technique, and it is important to understand its use and applications.']}], 'duration': 183.977, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY6926.jpg', 'highlights': ['The main focus is on k-means clustering, covering types, algorithm, and Python implementation.', 'Clustering involves dividing data into groups of similar data points, akin to diners at different restaurant tables.', 'Clustering is an unsupervised learning technique with important use and applications.', 'The session covers basics of clustering, k-means concept, algorithm, Python implementation, and real-life scenarios.', "K-means implementation in Python is divided into two parts: from scratch and using Python's inbuilt library, with a final comparison."]}, {'end': 332.591, 'segs': [{'end': 314.308, 'src': 'embed', 'start': 241.471, 'weight': 0, 'content': [{'end': 244.012, 'text': 'Now the next question comes where is it used?', 'start': 241.471, 'duration': 2.541}, {'end': 251.415, 'text': "So let's see, one by one, Flickr maps of photo and other map sites uses clustering to reduce the number of markers on a map.", 'start': 244.612, 'duration': 6.803}, {'end': 255.097, 'text': 'Even the Amazon is using the popular recommendation system,', 'start': 251.955, 'duration': 3.142}, {'end': 260.019, 'text': 'which is using the clustering to show you the recommended list of product according to your past purchase history.', 'start': 255.097, 'duration': 4.922}, {'end': 262.18, 'text': 'And yes, even the 
Netflix.', 'start': 260.619, 'duration': 1.561}, {'end': 265.161, 'text': 'it recommends you the movies based on your watch history right?', 'start': 262.18, 'duration': 2.981}, {'end': 268.543, 'text': 'Whatever you have watched, it will show you some similar movies related to it.', 'start': 265.522, 'duration': 3.021}, {'end': 269.285, 'text': 'All right.', 'start': 269.085, 'duration': 0.2}, {'end': 274.35, 'text': 'So how do you think these recommended lists are generated? Well, all the concept lies behind this clustering.', 'start': 269.586, 'duration': 4.764}, {'end': 275.051, 'text': 'All right.', 'start': 274.831, 'duration': 0.22}, {'end': 280.036, 'text': 'Well in general you can say that clustering can be used to segment the customers or market in the marketing.', 'start': 275.411, 'duration': 4.625}, {'end': 285.101, 'text': 'It can be used by the social network site and marketing new groups based on users data.', 'start': 280.597, 'duration': 4.504}, {'end': 285.742, 'text': 'All right.', 'start': 285.522, 'duration': 0.22}, {'end': 288.865, 'text': "So let's move on and see how business is using clustering.", 'start': 286.222, 'duration': 2.643}, {'end': 292.358, 'text': 'Well clustering can help the business to manage their data better.', 'start': 289.477, 'duration': 2.881}, {'end': 298.961, 'text': 'Generally, they use clustering for image segmentation grouping web pages market segmentation and information retrieval.', 'start': 292.939, 'duration': 6.022}, {'end': 307.025, 'text': 'For example, in a retail business, the data clustering helps in analyzing the customer shopping behavior, sales campaigns and customer retention.', 'start': 299.501, 'duration': 7.524}, {'end': 314.308, 'text': 'in case of insurance company, a clustering is deployed in the field of fraud detection, risk factor identification and customer retention efforts.', 'start': 307.025, 'duration': 7.283}], 'summary': 'Clustering is used in various industries such as amazon, 
netflix, marketing, and business for data management and customer segmentation.', 'duration': 72.837, 'max_score': 241.471, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY241471.jpg'}], 'start': 191.043, 'title': 'Clustering in business', 'summary': 'Discusses the concept of clustering, its applications in businesses such as retail, amazon, and netflix, and its use in customer segmentation, fraud detection, and market analysis. it explores the different types of clustering and its significance in managing customer behavior, sales campaigns, and customer retention.', 'chapters': [{'end': 332.591, 'start': 191.043, 'title': 'Clustering in business', 'summary': 'Discusses the concept of clustering, its applications in businesses such as retail, amazon, and netflix, and its use in customer segmentation, fraud detection, and market analysis. it explores the different types of clustering and its significance in managing customer behavior, sales campaigns, and customer retention.', 'duration': 141.548, 'highlights': ['Clustering is used in businesses for customer segmentation, fraud detection, risk factor identification, and customer retention efforts.', 'Retail businesses use clustering for analyzing customer shopping behavior, sales campaigns, and customer retention.', 'Amazon utilizes clustering in the popular recommendation system to show recommended products based on past purchase history.', "Netflix uses clustering to recommend movies based on users' watch history.", 'Clustering is used to segment customers or markets in marketing and by social network sites to form new groups based on user data.', 'Businesses generally use clustering for image segmentation, web page grouping, market segmentation, and information retrieval.']}], 'duration': 141.548, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY191043.jpg', 'highlights': ['Clustering is used in 
businesses for customer segmentation, fraud detection, risk factor identification, and customer retention efforts.', 'Retail businesses use clustering for analyzing customer shopping behavior, sales campaigns, and customer retention.', 'Amazon utilizes clustering in the popular recommendation system to show recommended products based on past purchase history.', "Netflix uses clustering to recommend movies based on users' watch history.", 'Clustering is used to segment customers or markets in marketing and by social network sites to form new groups based on user data.', 'Businesses generally use clustering for image segmentation, web page grouping, market segmentation, and information retrieval.']}, {'end': 501.38, 'segs': [{'end': 360.2, 'src': 'embed', 'start': 333.135, 'weight': 1, 'content': [{'end': 336.576, 'text': 'exclusive clustering, overlapping clustering and hierarchical clustering.', 'start': 333.135, 'duration': 3.441}, {'end': 338.597, 'text': "Let's start with exclusive clustering.", 'start': 336.917, 'duration': 1.68}, {'end': 345.86, 'text': 'Well, exclusive clustering is a hard clustering in which each data point or item belongs exclusively to one cluster.', 'start': 338.977, 'duration': 6.883}, {'end': 350.362, 'text': 'An example of such clustering is k-means clustering. In the example shown below,', 'start': 345.86, 'duration': 4.502}, {'end': 355.344, 'text': 'you can see that all the blue data points lie within the blue cluster and all the pink data points', 'start': 350.362, 'duration': 4.982}, {'end': 356.825, 'text': 'lie within the pink cluster.', 'start': 355.524, 'duration': 1.301}, {'end': 360.2, 'text': 'Both these clusters are entirely different from each other.', 'start': 357.398, 'duration': 2.802}], 'summary': 'Exclusive clustering assigns data points exclusively to one cluster, e.g., k-means.', 'duration': 27.065, 'max_score': 333.135, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY333135.jpg'}, {'end': 394.21, 'src': 'embed', 'start': 369.325, 'weight': 2, 'content': [{'end': 376.029, 'text': 'Well, overlapping clustering is a soft clustering in which a data point or item can belong to multiple clusters.', 'start': 369.325, 'duration': 6.704}, {'end': 379.271, 'text': 'An example of such clustering is fuzzy, or c-means, clustering.', 'start': 376.029, 'duration': 3.242}, {'end': 384.422, 'text': 'Even in the diagram, you can see that some of the blue data points overlap with the pink data points.', 'start': 379.938, 'duration': 4.484}, {'end': 385.122, 'text': 'All right.', 'start': 384.862, 'duration': 0.26}, {'end': 388.425, 'text': 'So this is what happens in c-means clustering, or fuzzy clustering.', 'start': 385.523, 'duration': 2.902}, {'end': 390.407, 'text': "That's what overlapping clustering is.", 'start': 388.685, 'duration': 1.722}, {'end': 391.848, 'text': 'All right, fine.', 'start': 390.867, 'duration': 0.981}, {'end': 394.21, 'text': 'Next comes hierarchical clustering.', 'start': 392.548, 'duration': 1.662}], 'summary': 'Overlapping clustering allows data points to belong to multiple clusters, as seen in fuzzy or c-means clustering.', 'duration': 24.885, 'max_score': 369.325, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY369325.jpg'}, {'end': 483.033, 'src': 'embed', 'start': 448.018, 'weight': 0, 'content': [{'end': 452.179, 'text': 'Okay, so the final tree contains all the clusters combined into a single cluster.', 'start': 448.018, 'duration': 4.161}, {'end': 452.6, 'text': 'All right.', 'start': 452.179, 'duration': 0.421}, {'end': 455.541, 'text': 'So this is what hierarchical clustering is and how it works.', 'start': 452.6, 'duration': 2.941}, {'end': 459.082, 'text': 'Okay, a hierarchical clustering can be visualized as a dendrogram.', 'start': 
455.541, 'duration': 3.541}, {'end': 459.282, 'text': 'all right.', 'start': 459.082, 'duration': 0.2}, {'end': 463.164, 'text': 'Till now we have covered about what is clustering, where it is used,', 'start': 460.123, 'duration': 3.041}, {'end': 467.246, 'text': 'how the business is using clustering and what are the different types of clustering available.', 'start': 463.164, 'duration': 4.082}, {'end': 470.808, 'text': 'All right now since our main focus would be on k-means clustering.', 'start': 467.266, 'duration': 3.542}, {'end': 472.088, 'text': "So let's focus on that.", 'start': 470.848, 'duration': 1.24}, {'end': 479.671, 'text': 'So what is k-means clustering? Well k-means is a clustering algorithm whose main goal is to find the groups in the data.', 'start': 472.508, 'duration': 7.163}, {'end': 483.033, 'text': 'The number of groups or cluster is represented by K.', 'start': 480.192, 'duration': 2.841}], 'summary': 'Hierarchical clustering combines all clusters into a single cluster. 
k-means aims to find groups in data based on a specified number of clusters (k).', 'duration': 35.015, 'max_score': 448.018, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY448018.jpg'}], 'start': 333.135, 'title': 'Clustering types and k-means algorithm', 'summary': 'Explains exclusive, overlapping, and hierarchical clustering, with a focus on k-means clustering, an algorithm to find groups in the data represented by k.', 'chapters': [{'end': 501.38, 'start': 333.135, 'title': 'Clustering: exclusive, overlapping, hierarchical', 'summary': 'Explains exclusive clustering where data points exclusively belong to one cluster, overlapping clustering where data points belong to multiple clusters, and hierarchical clustering where clusters are combined based on similarity, with a focus on k-means clustering, an algorithm to find groups in the data represented by k.', 'duration': 168.245, 'highlights': ['The chapter explains exclusive clustering as a hard clustering in which data points exclusively belong to one cluster, exemplified by k-means clustering.', 'It also covers overlapping clustering as a soft clustering in which data points belong to multiple clusters, exemplified by fuzzy or c-means clustering.', 'The example of hierarchical clustering is illustrated with a case where clusters are combined based on similarity, with the final tree containing all the clusters combined into a single cluster.', 'The focus shifts to k-means clustering, which is described as an algorithm to find groups in the data represented by K, running iteratively to assign each data point to one of the K groups based on the provided features.']}], 'duration': 168.245, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY333135.jpg', 'highlights': ['The focus shifts to k-means clustering, which is described as an algorithm to find groups in the data represented by K, running iteratively 
to assign each data point to one of the K groups based on the provided features.', 'The chapter explains exclusive clustering as a hard clustering in which data points exclusively belong to one cluster, exemplified by k-means clustering.', 'It also covers overlapping clustering as a soft clustering in which data points belong to multiple clusters, exemplified by fuzzy or c-means clustering.', 'The example of hierarchical clustering is illustrated with a case where clusters are combined based on similarity, with the final tree containing all the clusters combined into a single cluster.']}, {'end': 1026.84, 'segs': [{'end': 571.676, 'src': 'embed', 'start': 532.846, 'weight': 0, 'content': [{'end': 540.34, 'text': 'Okay. She asked you to pick three different items of clothing at random, and those will be the starting points for three separate clusters.', 'start': 532.846, 'duration': 7.494}, {'end': 545.289, 'text': 'And then you have to go through that massive initial cluster and look at each item of clothing in turn.', 'start': 540.34, 'duration': 4.949}, {'end': 550.96, 'text': 'Now, at this point, you have to compare attributes such as water temperature, drying temperature,', 'start': 545.916, 'duration': 5.044}, {'end': 557.045, 'text': 'and color with each of the three starting items and then place the new item into the best cluster, or the best pile.', 'start': 550.96, 'duration': 6.085}, {'end': 563.23, 'text': 'Okay. So what do you think is the definition of a best cluster or a best pile? 
Well, it is not based on a whole cluster.', 'start': 557.405, 'duration': 5.825}, {'end': 570.155, 'text': "It relies purely on the starting point. Technically that starting item is known as a centroid, but don't worry about that.", 'start': 563.75, 'duration': 6.405}, {'end': 571.676, 'text': "We'll get to know about that later.", 'start': 570.175, 'duration': 1.501}], 'summary': 'Using three random clothing items as starting points, items are clustered by attributes like water temperature, drying temperature, and color to determine the best pile for each item.', 'duration': 38.83, 'max_score': 532.846, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY532846.jpg'}, {'end': 668.569, 'src': 'embed', 'start': 642.891, 'weight': 1, 'content': [{'end': 649.413, 'text': 'For example, document classification, where you can group documents into multiple categories based on tags,', 'start': 642.891, 'duration': 6.522}, {'end': 651.254, 'text': 'topics and contents of the document.', 'start': 649.413, 'duration': 1.841}, {'end': 657.757, 'text': 'This is a very standard grouping problem, and the k-means algorithm is a highly suitable algorithm for this purpose.', 'start': 651.674, 'duration': 6.083}, {'end': 661.418, 'text': "Now, let's proceed and understand the k-means algorithm in depth.", 'start': 658.457, 'duration': 2.961}, {'end': 668.569, 'text': 'Imagine you have some data and your task is to plot the data points on a graph and divide the points into three different clusters.', 'start': 662.026, 'duration': 6.543}], 'summary': 'K-means algorithm is used for document classification, creating clusters and categories based on tags, topics, and contents, suitable for dividing data points into three clusters.', 'duration': 25.678, 'max_score': 642.891, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY642891.jpg'}, {'end': 902.231, 'src': 'embed', 'start': 876.267, 
'weight': 3, 'content': [{'end': 881.089, 'text': 'Well, you can always check the quality of the cluster data just by adding up the variance within each cluster.', 'start': 876.267, 'duration': 4.822}, {'end': 883.991, 'text': "So there's the sum total of variance within the three clusters.", 'start': 881.549, 'duration': 2.442}, {'end': 890.841, 'text': "Well, there's a fact that k-means clustering cannot see which one of them is the best clustering, but what it does.", 'start': 884.751, 'duration': 6.09}, {'end': 897.667, 'text': 'it keeps a track of these clusters and their total variance and repeat the same steps from the scratch, but with different starting point.', 'start': 890.841, 'duration': 6.826}, {'end': 902.231, 'text': "So let's see once again, we are here at the beginning to choose three random points.", 'start': 898.547, 'duration': 3.684}], 'summary': 'K-means clustering tracks total variance and repeats steps with different starting points.', 'duration': 25.964, 'max_score': 876.267, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY876267.jpg'}, {'end': 942.528, 'src': 'embed', 'start': 909.257, 'weight': 5, 'content': [{'end': 911.799, 'text': 'Okay, so algorithm what it does?', 'start': 909.257, 'duration': 2.542}, {'end': 915.722, 'text': 'it picks three initial cluster and then it adds the remaining point to the cluster,', 'start': 911.799, 'duration': 3.923}, {'end': 921.069, 'text': 'having the nearest mean again recalculating the mean each time a new point is added to the cluster.', 'start': 915.722, 'duration': 5.347}, {'end': 923.612, 'text': 'Now, once we have the cluster data,', 'start': 921.81, 'duration': 1.802}, {'end': 931.16, 'text': 'will calculate the sum of variation within each cluster and again repeat the same steps again and again pick three initial cluster cluster,', 'start': 923.612, 'duration': 7.548}, {'end': 935.125, 'text': 'the remaining point and finally take the 
sum of variation within each cluster.', 'start': 931.16, 'duration': 3.965}, {'end': 942.528, 'text': 'Now, at this point, our algorithm knows that is the best clustering so far as it has the lowest sum of variation,', 'start': 935.965, 'duration': 6.563}], 'summary': 'Algorithm picks 3 clusters, adds points, recalculates mean, and minimizes variation to find best clustering.', 'duration': 33.271, 'max_score': 909.257, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY909257.jpg'}], 'start': 502.101, 'title': 'Clothes clustering and k-means algorithm', 'summary': 'Discusses clothes clustering for laundry based on attributes such as water temperature, drying temperature, and color, aiming to identify the best cluster for each item. it also explains the iterative process of determining centroids for clusters in the k-means algorithm, its application in document classification, and the process of assigning data points to clusters based on nearest mean, and calculating the sum of variation within each cluster.', 'chapters': [{'end': 590.517, 'start': 502.101, 'title': 'Clothes clustering for laundry', 'summary': 'Discusses the process of clustering clothes for laundry based on attributes such as water temperature, drying temperature, and color, with the goal of identifying the best cluster for each item.', 'duration': 88.416, 'highlights': ['Clothes clustering involves dividing items into clusters based on attributes like water temperature, drying temperature, and color, with the goal of placing new items in the best-matching cluster.', 'The process starts by randomly picking three items of clothing, which serve as the starting points for separate clusters.', 'The definition of a best cluster is based on placing each item closer to or farther away from the starting item, depending on their similarity in attributes.']}, {'end': 1026.84, 'start': 591.278, 'title': 'Understanding k-means algorithm', 'summary': 
'Explains the iterative process of determining centroids for clusters in the k-means algorithm, its application in document classification, and the iterative nature of the algorithm with the number of clusters and iterations. it also covers the process of assigning data points to clusters based on nearest mean and calculating the sum of variation within each cluster, and the iterative nature of the algorithm with the number of clusters and iterations.', 'duration': 435.562, 'highlights': ['Application of k-means algorithm in document classification The k-means algorithm can be applied to document classification to create clusters of documents in multiple categories based on tags, topics, and contents, making it highly suitable for this purpose.', 'Iterative nature of the k-means algorithm with the number of clusters and iterations The algorithm iterates with different numbers of clusters and iterations, and the quality of the clustering can be checked by adding up the variance within each cluster, with the algorithm repeating the same steps with different starting points and keeping track of the clusters and their total variance.', 'Process of assigning data points to clusters based on nearest mean and calculating the sum of variation within each cluster The algorithm measures the distance of data points from the centroids, assigns the points to the nearest clusters, recalculates the mean each time a new point is added to the cluster, and calculates the sum of variation within each cluster to determine the best clustering.']}], 'duration': 524.739, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY502101.jpg', 'highlights': ['Clothes clustering involves dividing items into clusters based on attributes like water temperature, drying temperature, and color, with the goal of placing new items in the best-matching cluster.', 'Application of k-means algorithm in document classification to create clusters of 
documents in multiple categories based on tags, topics, and contents.', 'The process starts by randomly picking three items of clothing, which serve as the starting points for separate clusters.', 'Iterative nature of the k-means algorithm with the number of clusters and iterations, and the quality of the clustering can be checked by adding up the variance within each cluster.', 'The definition of a best cluster is based on placing each item closer to or farther away from the starting item, depending on their similarity in attributes.', 'The algorithm measures the distance of data points from the centroids, assigns the points to the nearest clusters, recalculates the mean each time a new point is added to the cluster, and calculates the sum of variation within each cluster to determine the best clustering.']}, {'end': 1225.836, 'segs': [{'end': 1065.931, 'src': 'embed', 'start': 1027.3, 'weight': 0, 'content': [{'end': 1030.767, 'text': 'Yeah, we need to pick a bunch of starting points before finalizing the cluster.', 'start': 1027.3, 'duration': 3.467}, {'end': 1031.308, 'text': 'All right.', 'start': 1031.087, 'duration': 0.221}, {'end': 1034.011, 'text': 'Now coming to one of the most important questions.', 'start': 1031.969, 'duration': 2.042}, {'end': 1037.115, 'text': 'How will you decide what value you should use for K?', 'start': 1034.452, 'duration': 2.663}, {'end': 1042.981, 'text': 'Well, sometimes you will find that the number of clusters to create is obvious, like K equal 3,', 'start': 1037.395, 'duration': 5.586}, {'end': 1047.126, 'text': 'but sometimes it is not that easy to judge or guess the number of clusters manually.', 'start': 1042.981, 'duration': 4.145}, {'end': 1052.892, 'text': 'So what will you do in this case? 
Well, one of the options to decide the value of K is a hit and trial method,', 'start': 1047.746, 'duration': 5.146}, {'end': 1055.768, 'text': 'where you just need to try different values of K.', 'start': 1053.467, 'duration': 2.301}, {'end': 1063.81, 'text': 'So, starting with the minimum possible value of K, that is, K equal 1, which symbolizes that all the data points lie within one single cluster,', 'start': 1055.768, 'duration': 8.042}, {'end': 1065.931, 'text': 'K equal 1 is the worst case scenario.', 'start': 1063.81, 'duration': 2.121}], 'summary': 'Deciding the value of k for clustering involves a hit and trial method, starting with k=1, and sometimes it is obvious like k=3.', 'duration': 38.631, 'max_score': 1027.3, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1027300.jpg'}, {'end': 1163.735, 'src': 'embed', 'start': 1125.602, 'weight': 1, 'content': [{'end': 1128.643, 'text': 'Then in that case the variation becomes 0, obviously right?', 'start': 1125.602, 'duration': 3.041}, {'end': 1133.184, 'text': 'So you can say that the number of clusters is inversely proportional to total variation.', 'start': 1129.163, 'duration': 4.021}, {'end': 1136.725, 'text': 'If the number of clusters increases, the total variation decreases.', 'start': 1133.744, 'duration': 2.981}, {'end': 1142.459, 'text': 'Finally, when we plot the reduction in variance for each value of K, we will get a graph like this,', 'start': 1137.515, 'duration': 4.944}, {'end': 1149.544, 'text': 'where on the x-axis we have the number of clusters K and on the y-axis we have reduction in variance on the graph.', 'start': 1142.459, 'duration': 7.085}, {'end': 1156.329, 'text': "You can see that there's a huge reduction in variance with K equal 3 but after that there is no such steep change in the variation.", 'start': 1149.564, 'duration': 6.765}, {'end': 1163.735, 'text': 'So this point of change is known as the elbow point and the value of this 
point is the one which decides the value of K.', 'start': 1156.87, 'duration': 6.865}], 'summary': 'Number of clusters inversely proportional to total variation; elbow method identifies optimal k value.', 'duration': 38.133, 'max_score': 1125.602, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1125602.jpg'}, {'end': 1212.928, 'src': 'embed', 'start': 1183.99, 'weight': 3, 'content': [{'end': 1186.592, 'text': 'But before that, let me just summarize the algorithm for you.', 'start': 1183.99, 'duration': 2.602}, {'end': 1188.4, 'text': 'Okay, So what we do?', 'start': 1186.893, 'duration': 1.507}, {'end': 1197.545, 'text': 'initially we randomly choose K examples as initial centroid and, while true, we create K clusters by assigning each example to the closest centroid.', 'start': 1188.4, 'duration': 9.145}, {'end': 1199.621, 'text': 'So, first of all what we do.', 'start': 1198.32, 'duration': 1.301}, {'end': 1204.303, 'text': 'we randomly choose K examples as initial centroids, then we check the condition.', 'start': 1199.621, 'duration': 4.682}, {'end': 1207.045, 'text': 'while true, then what we have to do in that case,', 'start': 1204.303, 'duration': 2.742}, {'end': 1212.928, 'text': 'unless and until it is true will be creating K different cluster by assigning each example to the closest centroid.', 'start': 1207.045, 'duration': 5.883}], 'summary': 'Algorithm randomly chooses k examples as initial centroids and creates k clusters by assigning each example to the closest centroid.', 'duration': 28.938, 'max_score': 1183.99, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1183990.jpg'}], 'start': 1027.3, 'title': 'Finding optimal k for clustering', 'summary': 'Discusses the process of determining the optimal value of k for clustering, emphasizing the importance of selecting the right number of clusters and suggesting a hit and trial method for 
deciding the value of k, starting from k=1.', 'chapters': [{'end': 1065.931, 'start': 1027.3, 'title': 'Finding optimal k for clustering', 'summary': 'Discusses the process of determining the optimal value of k for clustering, emphasizing the importance of selecting the right number of clusters and suggesting a hit and trial method for deciding the value of k, starting from k=1.', 'duration': 38.631, 'highlights': ['The hit and trial method is suggested for deciding the value of K, starting with the minimum possible value of K, that is, K equal 1, which symbolizes that all the data points lie within one single cluster. (Relevance: 3)', "It's mentioned that sometimes the number of clusters to create are obvious, like K equal 3, but sometimes it is not easy to judge or guess the number of clusters manually. (Relevance: 2)"]}, {'end': 1225.836, 'start': 1066.371, 'title': 'K-means algorithm summary', 'summary': 'Explains how the k-means algorithm works, with key points including the relationship between the number of clusters and total variance, the concept of the elbow point to determine the value of k, and the process of implementing k-means using python.', 'duration': 159.465, 'highlights': ['The number of clusters are indirectly proportional to total variation, with the total variation decreasing as the number of clusters increases. As the number of clusters increases, the total variation within each cluster gets smaller, and when the total number of clusters equals the total number of points, the variation becomes 0.', 'The concept of the elbow point helps determine the value of K, where a graph shows a significant reduction in variance at K=3, indicating the elbow point as the value that decides the optimal K. 
The reduction in variance plot shows a significant reduction in variance at K=3, and the elbow point is identified as the value that decides the optimal K.', 'The k-means algorithm involves randomly choosing K examples as initial centroids, creating K clusters by assigning each example to the closest centroid, and computing K new centroids by averaging examples in the cluster to find the exact number of clusters. The k-means algorithm includes the process of randomly choosing K examples as initial centroids, creating K clusters by assigning each example to the closest centroid, and computing K new centroids to find the exact number of clusters.']}], 'duration': 198.536, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1027300.jpg', 'highlights': ['The hit and trial method is suggested for deciding the value of K, starting with the minimum possible value of K, that is, K equal 1, which symbolizes that all the data points lie within one single cluster. (Relevance: 3)', 'The concept of the elbow point helps determine the value of K, where a graph shows a significant reduction in variance at K=3, indicating the elbow point as the value that decides the optimal K. The reduction in variance plot shows a significant reduction in variance at K=3, and the elbow point is identified as the value that decides the optimal K.', 'The number of clusters are indirectly proportional to total variation, with the total variation decreasing as the number of clusters increases. 
As the number of clusters increases, the total variation within each cluster gets smaller, and when the total number of clusters equals the total number of points, the variation becomes 0.', 'The k-means algorithm involves randomly choosing K examples as initial centroids, creating K clusters by assigning each example to the closest centroid, and computing K new centroids by averaging examples in the cluster to find the exact number of clusters.', "It's mentioned that sometimes the number of clusters to create are obvious, like K equal 3, but sometimes it is not easy to judge or guess the number of clusters manually. (Relevance: 2)"]}, {'end': 1624.589, 'segs': [{'end': 1261.183, 'src': 'embed', 'start': 1226.176, 'weight': 0, 'content': [{'end': 1227.977, 'text': "Now, let's move on to the implementation part.", 'start': 1226.176, 'duration': 1.801}, {'end': 1231.81, 'text': "Let's see how to implement the k-means clustering algorithm in Python.", 'start': 1228.889, 'duration': 2.921}, {'end': 1233.931, 'text': "So, in the first step, what you'll do?", 'start': 1232.59, 'duration': 1.341}, {'end': 1239.793, 'text': 'you will be importing all the important modules, such as pandas, numpy, sklearn and pyplot, and the next step,', 'start': 1233.931, 'duration': 5.862}, {'end': 1242.174, 'text': 'you will be importing the iris data set into Python,', 'start': 1239.793, 'duration': 2.381}, {'end': 1247.516, 'text': 'which is a very standard data set for classification problems and can be downloaded from the UCI repositories.', 'start': 1242.174, 'duration': 5.342}, {'end': 1253.139, 'text': "In the third part, you'll view the data set, study it, and then convert the data set into a pandas data frame.", 'start': 1248.197, 'duration': 4.942}, {'end': 1260.563, 'text': 'All right, as step four, we will finally perform the k-means analysis on it and see the result of the cluster as a cluster vector and also as a plot.', 'start': 1253.399, 'duration': 7.164}, {'end': 1261.183, 'text': 
'All right.', 'start': 1260.963, 'duration': 0.22}], 'summary': 'Implement the k-means clustering algorithm in Python using the iris dataset, importing modules like pandas, numpy, sklearn, and pyplot, and performing k-means analysis to visualize the cluster results.', 'duration': 35.007, 'max_score': 1226.176, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1226176.jpg'}, {'end': 1323.09, 'src': 'embed', 'start': 1296.552, 'weight': 3, 'content': [{'end': 1303.576, 'text': 'We are defining a random seed as 200 and defining K for the k-means clustering, or the total number of clusters that we want is 3.', 'start': 1296.552, 'duration': 7.024}, {'end': 1305.677, 'text': 'The next task is choosing the first centroid.', 'start': 1303.576, 'duration': 2.101}, {'end': 1306.438, 'text': 'All right.', 'start': 1306.218, 'duration': 0.22}, {'end': 1314.344, 'text': "Fine. So these numbers lie between 1 and 80, right? So for choosing centroid, I'll be using any random number between 0 and 80, fine.", 'start': 1307.054, 'duration': 7.29}, {'end': 1318.366, 'text': 'So here plot dot scatter what we are doing here.', 'start': 1316.084, 'duration': 2.282}, {'end': 1323.09, 'text': 'We are creating a scatter plot such that on x-axis there will be data frame with the x-array.', 'start': 1318.406, 'duration': 4.684}], 'summary': 'Defining random seed as 200, k-means cluster k=3, choosing centroid and creating scatter plot.', 'duration': 26.538, 'max_score': 1296.552, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1296552.jpg'}, {'end': 1585.645, 'src': 'heatmap', 'start': 1554.878, 'weight': 4, 'content': [{'end': 1560.361, 'text': 'total number of cluster that I want is three initial values one total number of jobs, performance one, and so on.', 'start': 1554.878, 'duration': 5.483}, {'end': 1562.263, 'text': "So next what we'll do?", 'start': 1560.361, 'duration': 1.902}, 
{'end': 1564.264, 'text': 'learn the label executed.', 'start': 1562.263, 'duration': 2.001}, {'end': 1569.067, 'text': 'And then finally plotted executed.', 'start': 1566.345, 'duration': 2.722}, {'end': 1574.51, 'text': 'But what we see that when we compare the result the colors are in different order.', 'start': 1571.308, 'duration': 3.202}, {'end': 1581.623, 'text': 'There are some points which you can take note of the k-means clustering is a very sensitive to scale due to its reliance on Euclidean distance.', 'start': 1575.12, 'duration': 6.503}, {'end': 1585.645, 'text': "So be sure to normalize your data if they're likely to be scaling problem.", 'start': 1582.023, 'duration': 3.622}], 'summary': '3 clusters sought with 1 initial value, 1 job performance, and more; k-means clustering sensitive to scale', 'duration': 25.284, 'max_score': 1554.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1554878.jpg'}, {'end': 1624.589, 'src': 'heatmap', 'start': 1607.443, 'weight': 1, 'content': [{'end': 1609.563, 'text': 'I hope you have enjoyed listening to this video.', 'start': 1607.443, 'duration': 2.12}, {'end': 1617.526, 'text': 'Please be kind enough to like it and you can comment any of your doubts and queries and we will reply them at the earliest.', 'start': 1609.924, 'duration': 7.602}, {'end': 1623.388, 'text': 'Do look out for more videos in our playlist and subscribe to Edureka channel to learn more.', 'start': 1617.906, 'duration': 5.482}, {'end': 1624.589, 'text': 'Happy learning.', 'start': 1623.929, 'duration': 0.66}], 'summary': 'Encourage likes, comments, and subscriptions for edureka channel.', 'duration': 17.146, 'max_score': 1607.443, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1607443.jpg'}], 'start': 1226.176, 'title': 'Implementing k-means clustering in python', 'summary': 'Details the implementation of the k-means clustering 
algorithm in python, covering the steps to import necessary modules, import the iris dataset, convert the dataset to a panda dataframe, perform k-means analysis, and utilize a python library for implementation. it also outlines the process of implementing k-means clustering with a random seed of 200 and k=3, creating scatter plots, assigning data points to clusters, updating centroids, and repeating the process until convergence, with a mention of the sensitivity of k-means clustering to scale and the recommendation to normalize the data.', 'chapters': [{'end': 1296.492, 'start': 1226.176, 'title': 'Implementing k-means clustering in python', 'summary': 'Details the implementation of the k-means clustering algorithm in python, covering the steps to import necessary modules, import the iris dataset, convert the dataset to a panda dataframe, perform k-means analysis, and utilize a python library for implementation.', 'duration': 70.316, 'highlights': ['The chapter details the implementation of the k-means clustering algorithm in Python, covering the steps to import necessary modules, import the Iris dataset, convert the dataset to a Panda dataframe, perform k-means analysis, and utilize a Python library for implementation.', 'The implementation involves importing important modules such as Panda, numpy, sklearn, and piplot, followed by importing the Iris dataset for classification problems, which can be downloaded from the UCI repositories.', 'The next step involves viewing and studying the dataset, then converting it into a Panda dataframe for further analysis and manipulation.', 'The final step includes performing k-means analysis on the dataset, visualizing the cluster result as a cluster vector and plot, and implementing the k-means clustering algorithm using Python libraries, including writing a program from scratch and utilizing the sklearn library.', 'The first method of implementation involves importing pandas, numpy, and matplotlib, defining a Panda dataframe 
variable, and specifying the x and y coordinates for the dataset.']}, {'end': 1624.589, 'start': 1296.552, 'title': 'K-means clustering process', 'summary': 'Outlines the process of implementing k-means clustering with a random seed of 200 and k=3, creating scatter plots, assigning data points to clusters, updating centroids, and repeating the process until convergence, with a mention of the sensitivity of k-means clustering to scale and the recommendation to normalize the data.', 'duration': 328.037, 'highlights': ['K-means clustering process involves selecting a random seed, defining the number of clusters (K=3), choosing the initial centroids, creating scatter plots, assigning data points to clusters, updating centroids, and repeating the process until convergence. The process involves defining a random seed as 200, setting K for k-means clustering as 3, choosing the initial centroids, creating scatter plots, assigning data points to clusters, updating centroids, and repeating the process until convergence.', 'The sensitivity of K-means clustering to scale is mentioned and the recommendation to normalize the data is provided. 
The transcript mentions the sensitivity of K-means clustering to scale and recommends normalizing the data to address scaling problems.']}], 'duration': 398.413, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/1XqG0kaJVHY/pics/1XqG0kaJVHY1226176.jpg', 'highlights': ['The chapter details the implementation of the k-means clustering algorithm in Python, covering the steps to import necessary modules, import the Iris dataset, convert the dataset to a Panda dataframe, perform k-means analysis, and utilize a Python library for implementation.', 'The implementation involves importing important modules such as Panda, numpy, sklearn, and piplot, followed by importing the Iris dataset for classification problems, which can be downloaded from the UCI repositories.', 'The next step involves viewing and studying the dataset, then converting it into a Panda dataframe for further analysis and manipulation.', 'K-means clustering process involves selecting a random seed, defining the number of clusters (K=3), choosing the initial centroids, creating scatter plots, assigning data points to clusters, updating centroids, and repeating the process until convergence.', 'The sensitivity of K-means clustering to scale is mentioned and the recommendation to normalize the data is provided.']}], 'highlights': ['The k-means algorithm involves randomly choosing K examples as initial centroids, creating K clusters by assigning each example to the closest centroid, and computing K new centroids by averaging examples in the cluster to find the exact number of clusters.', 'Clustering is used in businesses for customer segmentation, fraud detection, risk factor identification, and customer retention efforts.', 'The focus shifts to k-means clustering, which is described as an algorithm to find groups in the data represented by K, running iteratively to assign each data point to one of the K groups based on the provided features.', 'The hit and trial method is suggested 
for deciding the value of K, starting with the minimum possible value of K, that is, K equal 1, which symbolizes that all the data points lie within one single cluster.', 'The chapter details the implementation of the k-means clustering algorithm in Python, covering the steps to import necessary modules, import the Iris dataset, convert the dataset to a Panda dataframe, perform k-means analysis, and utilize a Python library for implementation.']}