title
Top 5 Algorithms used in Data Science | Data Science Tutorial | Data Mining Tutorial | Edureka

description
(Data Science Training - https://www.edureka.co/data-science-r-programming-certification-course) This tutorial gives you an overview of the most common algorithms used in Data Science. Here, you will learn what activities Data Scientists do and how they use algorithms like Decision Tree, Random Forest, Association Rule Mining, Linear Regression and K-Means Clustering. To learn more about Data Science, click here: http://goo.gl/9HsPlv. The topics related to R, machine learning, Hadoop and various other algorithms are covered extensively in our course “Data Science”. For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll free). Instagram: https://www.instagram.com/edureka_learning/ Facebook: https://www.facebook.com/edurekaIN/ Twitter: https://twitter.com/edurekain LinkedIn: https://www.linkedin.com/company/edureka
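In the decision-tree segment of the transcript, the speaker says information gain decides which attribute becomes the root node: the higher the gain, the shorter the tree. Below is a minimal Python sketch of that criterion, assuming categorical features stored in dicts and Shannon entropy measured in bits. The function names and the toy "go/stay" data are hypothetical illustrations, not code from the course.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Drop in entropy after splitting the rows on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    total = len(labels)
    # Weighted average entropy of the child nodes produced by the split.
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_root(rows, labels, features):
    """Choose the feature with the highest information gain as the root node."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: should I go to the movie?
rows = [
    {"weather": "sunny", "friends_free": "yes"},
    {"weather": "sunny", "friends_free": "no"},
    {"weather": "rainy", "friends_free": "yes"},
    {"weather": "rainy", "friends_free": "no"},
]
labels = ["go", "stay", "go", "stay"]

for f in ("weather", "friends_free"):
    print(f, information_gain(rows, labels, f))
print("root:", best_root(rows, labels, ["weather", "friends_free"]))
```

On this toy data, splitting on `friends_free` separates the classes perfectly (a gain of 1.0 bit) while `weather` gains 0.0, so `friends_free` becomes the root. A full tree builder would repeat the same choice recursively on each child node.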

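The random forest segment of the transcript explains the final step with friends voting on a movie: four say yes, one says no, and the majority wins. Below is a minimal Python sketch of that aggregation step only, assuming each "tree" is any callable that maps a sample to a class label. The friend rules and movie fields are hypothetical. A real random forest also trains each tree on a bootstrap sample with a random subset of features, which this sketch leaves out.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common orders equal counts by first occurrence (Python 3.7+).
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, sample):
    """Ask every tree for a prediction and combine them by majority vote."""
    return majority_vote([tree(sample) for tree in trees])


# Hypothetical "friends" acting as five already-trained trees.
friends = [
    lambda m: "watch" if m["rating"] >= 7 else "skip",
    lambda m: "watch" if m["genre"] == "thriller" else "skip",
    lambda m: "watch" if m["length_min"] <= 150 else "skip",
    lambda m: "watch" if m["rating"] >= 6 else "skip",
    lambda m: "skip",  # the one friend who calls it a stupid movie
]

movie = {"rating": 7.5, "genre": "thriller", "length_min": 130}
print(forest_predict(friends, movie))  # four "watch" votes vs one "skip"
```

A plain majority mirrors the 4:1 movie example. Note that scikit-learn's `RandomForestClassifier` instead combines trees by averaging their predicted class probabilities (soft voting), which can differ from a simple head count.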
detail
{'title': 'Top 5 Algorithms used in Data Science | Data Science Tutorial | Data Mining Tutorial | Edureka', 'heatmap': [{'end': 3966.633, 'start': 3922.424, 'weight': 1}], 'summary': 'Provides an overview of the top five widely used data science algorithms, discusses the impact of data analysis and machine learning, explores chess engines and machine learning, explains the random forest algorithm, covers data mining techniques, introduces regression for predicting apartment prices, and discusses the k-means clustering technique and cluster performance evaluation, with real-world applications in telecom churn prediction and beyond.', 'chapters': [{'end': 221.182, 'segs': [{'end': 74.185, 'src': 'embed', 'start': 25.762, 'weight': 0, 'content': [{'end': 31.003, 'text': 'So I also have a hands-on lab session towards the end, so I will do that as well.', 'start': 25.762, 'duration': 5.241}, {'end': 33.785, 'text': 'So, the top five algorithms used in data science.', 'start': 31.563, 'duration': 2.222}, {'end': 42.211, 'text': "So we have picked the ones which are quite popular and widely used, right, friends? All right, let's get started.", 'start': 34.025, 'duration': 8.186}, {'end': 44.093, 'text': 'By the end of the session,', 'start': 42.612, 'duration': 1.481}, {'end': 46.895, 'text': 'you will know what data science is,', 'start': 44.093, 'duration': 2.802}, {'end': 48.516, 'text': 'what a data scientist does,', 'start': 46.895, 'duration': 1.621}, {'end': 50.518, 'text': 'what the nature of the job is, and the', 'start': 48.516, 'duration': 2.002}, {'end': 61.502, 'text': 'top five data science algorithms: decision tree, random forest, association rule mining, which is one of the popular algorithms too,', 'start': 50.518, 'duration': 10.984}, {'end': 65.483, 'text': 'linear regression and k-means clustering.', 'start': 61.502, 'duration': 3.981}, {'end': 70.424, 'text': "That's very, very widely used, right?
That's one of the clustering algorithms.", 'start': 65.522, 'duration': 4.902}, {'end': 74.185, 'text': 'And there are so many algorithms, friends, so it is not to you know.', 'start': 71.084, 'duration': 3.101}], 'summary': 'Overview of top 5 data science algorithms: decision tree, random forest, association rule mining, linear regression, and k-means clustering.', 'duration': 48.423, 'max_score': 25.762, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu425762.jpg'}, {'end': 147.18, 'src': 'embed', 'start': 101.351, 'weight': 2, 'content': [{'end': 106.052, 'text': 'But you should know it because data scientists are going to be in demand.', 'start': 101.351, 'duration': 4.701}, {'end': 111.434, 'text': "They're already in demand, by the way, and they're going to be more and more in demand, right?", 'start': 106.292, 'duration': 5.142}, {'end': 116.495, 'text': 'And we all know data is coming from various quarters, right?', 'start': 112.654, 'duration': 3.841}, {'end': 122.071, 'text': 'Data is not just restricted to coming from very structured sources.', 'start': 118.409, 'duration': 3.662}, {'end': 126.354, 'text': 'Today, the data is actually coming from a myriad of sources, right?', 'start': 122.472, 'duration': 3.882}, {'end': 133.618, 'text': 'So we are talking about sensors, we are talking about logs, we are talking about social media, we are talking about wearables,', 'start': 126.834, 'duration': 6.784}, {'end': 135.599, 'text': 'we are talking about all the devices, right?', 'start': 133.618, 'duration': 1.981}, {'end': 140.575, 'text': 'Data is coming from different pockets, different sources, right?', 'start': 137.552, 'duration': 3.023}, {'end': 144.918, 'text': 'And this is only going to multiply further and further, right?', 'start': 140.835, 'duration': 4.083}, {'end': 147.18, 'text': 'So this is never going to come down, right?', 'start': 145.239, 'duration': 1.941}], 'summary': 'Data scientists 
are and will be in high demand due to increasing data from diverse sources.', 'duration': 45.829, 'max_score': 101.351, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4101351.jpg'}, {'end': 207.302, 'src': 'embed', 'start': 178.718, 'weight': 4, 'content': [{'end': 184.843, 'text': 'Data science is nothing but extracting meaningful and actionable knowledge from data, right?', 'start': 178.718, 'duration': 6.125}, {'end': 190.599, 'text': 'So I think you would have also heard about one thing, right?', 'start': 185.283, 'duration': 5.316}, {'end': 195.119, 'text': 'In 60s, people were talking about the 40s, 30s.', 'start': 191.159, 'duration': 3.96}, {'end': 197.6, 'text': 'in 1930s, 40s, there was a gold rush, right?', 'start': 195.119, 'duration': 2.481}, {'end': 198.98, 'text': 'You all would have heard about that.', 'start': 197.64, 'duration': 1.34}, {'end': 207.302, 'text': 'People were just fighting for gold, and lots and lots of people actually went to California to look for gold, right?', 'start': 199.54, 'duration': 7.762}], 'summary': 'Data science extracts knowledge from data; the speaker likens it to the California gold rush, which drew many people to California.', 'duration': 28.584, 'max_score': 178.718, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4178718.jpg'}], 'start': 0.387, 'title': 'Data science insights', 'summary': 'Provides an overview of top five widely used data science algorithms and discusses the future scope and opportunities for data scientists, highlighting the growing demand and significant career prospects for the next decade.', 'chapters': [{'end': 74.185, 'start': 0.387, 'title': 'Data science algorithms overview', 'summary': 'Provides an overview of data science algorithms, focusing on the top five widely used algorithms: decision tree, random forest, association rule mining, linear regression, and k-means clustering, with a speaker having over 20 
years of technology experience and a hands-on lab session.', 'duration': 73.798, 'highlights': ['The speaker has over 20 years of experience in technology, with the last three to four years dedicated to big data and analytics, and will be presenting on widely used algorithms in data science.', 'The top five data science algorithms covered in the session are decision tree, random forest, association rule mining, linear regression, and k-means clustering, which are popular and widely used.', 'The session includes a hands-on lab session towards the end, providing practical application of the discussed algorithms.']}, {'end': 221.182, 'start': 74.185, 'title': 'Data science: future scope and opportunities', 'summary': 'Discusses the growing demand for data scientists due to the increasing volume and diversity of data sources, projecting a bright future for data analysis and data science with a significant scope for the next decade.', 'duration': 146.997, 'highlights': ['Data scientists are in high demand and will continue to be in even greater demand in the future. The speaker emphasizes the current and future demand for data scientists, indicating a positive outlook for the field.', 'Data is coming from various sources such as sensors, logs, social media, wearables, and devices, with a projection of further multiplication. The discussion highlights the diverse and expanding sources of data, suggesting a significant increase in data volume and variety.', 'Data science involves extracting meaningful and actionable knowledge from data, likening data to gold in terms of its value. 
The comparison of data to gold emphasizes its significance and value, aligning with the concept of data science as a valuable pursuit for extracting meaningful insights.']}], 'duration': 220.795, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4387.jpg', 'highlights': ['The top five data science algorithms covered are decision tree, random forest, association rule mining, linear regression, and k-means clustering.', 'The session includes a hands-on lab session towards the end, providing practical application of the discussed algorithms.', 'Data scientists are in high demand and will continue to be in even greater demand in the future.', 'Data is coming from various sources such as sensors, logs, social media, wearables, and devices, with a projection of further multiplication.', 'Data science involves extracting meaningful and actionable knowledge from data, likening data to gold in terms of its value.']}, {'end': 614.152, 'segs': [{'end': 246.723, 'src': 'embed', 'start': 221.542, 'weight': 0, 'content': [{'end': 231.33, 'text': 'If you know how to make meaningful insights out of that, it is worth the weight of gold, or even more than that,', 'start': 221.542, 'duration': 9.788}, {'end': 241.418, 'text': 'because a lot of critical decisions that can drive the economy or that can drive the direction of a company, a lot of things,', 'start': 231.33, 'duration': 10.088}, {'end': 246.723, 'text': 'actually depends on how we analyze the data, how smartly we analyze the data right?', 'start': 241.418, 'duration': 5.305}], 'summary': 'Meaningful data insights drive critical decisions for economy and companies.', 'duration': 25.181, 'max_score': 221.542, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4221542.jpg'}, {'end': 588.321, 'src': 'embed', 'start': 540.912, 'weight': 1, 'content': [{'end': 546.277, 'text': 'They know how to write new techniques, 
marrying some of the existing algorithms.', 'start': 540.912, 'duration': 5.365}, {'end': 553.041, 'text': 'So they know how to do extensive analysis, deep data mining, all those things, and they are well versed in machine learning as well.', 'start': 547.017, 'duration': 6.024}, {'end': 555.282, 'text': "So it's a combination of everything.", 'start': 553.721, 'duration': 1.561}, {'end': 558.664, 'text': "So it's not something that can be mastered easily.", 'start': 555.542, 'duration': 3.122}, {'end': 560.105, 'text': 'It takes some time.', 'start': 558.844, 'duration': 1.261}, {'end': 563.847, 'text': 'It takes years to become a very good data scientist.', 'start': 561.046, 'duration': 2.801}, {'end': 573.273, 'text': 'So, moving on: machine learning, as we are saying, is a method of teaching computers to make improved predictions based on data.', 'start': 565.448, 'duration': 7.825}, {'end': 578.796, 'text': "It's a huge field with hundreds of different algorithms for solving many different problems.", 'start': 574.293, 'duration': 4.503}, {'end': 583.358, 'text': 'So machine learning is basically a branch of artificial intelligence, right?', 'start': 578.836, 'duration': 4.522}, {'end': 588.321, 'text': "It's, you know, the study of systems that can learn from data, right?", 'start': 583.478, 'duration': 4.843}], 'summary': 'Becoming a proficient data scientist takes years, involving extensive analysis and machine learning expertise to master various algorithms and solve diverse problems.', 'duration': 47.409, 'max_score': 540.912, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4540912.jpg'}], 'start': 221.542, 'title': 'The importance of data analysis', 'summary': 'Discusses the significance of data analysis in driving critical decisions, the skills and expertise required to become a top-notch data scientist, and the impact of machine learning on various industries, 
emphasizing the complexity and time required to master these fields.', 'chapters': [{'end': 614.152, 'start': 221.542, 'title': 'The power of data science', 'summary': 'Discusses the significance of data analysis in driving critical decisions, the skills and expertise required to become a top-notch data scientist, and the impact of machine learning on various industries, emphasizing the complexity and time required to master these fields.', 'duration': 392.61, 'highlights': ['Data analysis drives critical decisions in the economy and within companies, emphasizing the importance of smart data analysis. The chapter highlights the crucial role of data analysis in driving critical decisions within the economy and companies, underscoring the significance of smart data analysis in influencing outcomes and directions.', 'Becoming a top-notch data scientist requires a combination of talent, knowledge, statistics, and software engineering skills. The chapter emphasizes the multifaceted skills and expertise required to excel as a data scientist, including talent, knowledge in statistics and software engineering, and the necessity to work on algorithms and formulae.', 'Machine learning is a complex field within artificial intelligence, used in various industries including healthcare for diagnosis and medication recommendations. 
The chapter highlights the complexity of machine learning within the field of artificial intelligence and its widespread applications, particularly in healthcare, where machines are replacing humans in diagnosis and medication recommendations.']}], 'duration': 392.61, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4221542.jpg', 'highlights': ['Data analysis drives critical decisions in the economy and within companies, emphasizing the importance of smart data analysis.', 'Becoming a top-notch data scientist requires a combination of talent, knowledge, statistics, and software engineering skills.', 'Machine learning is a complex field within artificial intelligence, used in various industries including healthcare for diagnosis and medication recommendations.']}, {'end': 1469.541, 'segs': [{'end': 718.896, 'src': 'embed', 'start': 614.592, 'weight': 0, 'content': [{'end': 617.993, 'text': 'And it is penetrating more and more.', 'start': 614.592, 'duration': 3.401}, {'end': 620.373, 'text': 'And we all know,', 'start': 618.753, 'duration': 1.62}, {'end': 625.875, 'text': 'just to give you a small perspective, how many people have heard about Deep Blue?', 'start': 620.373, 'duration': 5.502}, {'end': 630.356, 'text': 'Deep Blue in chess.', 'start': 627.455, 'duration': 2.901}, {'end': 631.456, 'text': 'Has anyone heard about that?', 'start': 630.356, 'duration': 1.1}, {'end': 633.643, 'text': 'Okay, no? All right.', 'start': 632.182, 'duration': 1.461}, {'end': 638.744, 'text': 'Sorry, guys, it is a little noisy out here.', 'start': 635.443, 'duration': 3.301}, {'end': 648.007, 'text': 'Okay, well, Deep Blue was one of the projects that was done by IBM back in, I think, the late 90s.', 'start': 640.145, 'duration': 7.862}, {'end': 651.228, 'text': 'Then came Fritz.', 'start': 648.768, 'duration': 2.46}, {'end': 653.189, 'text': 'These are some of the chess engines, right?', 'start': 651.469, 'duration': 1.72}, 
{'end': 659.911, 'text': "So today, if you go to, you know, if you're a chess person,", 'start': 653.789, 'duration': 6.122}, {'end': 664.734, 'text': 'I think you will see one of the most powerful engines, called Komodo, right?', 'start': 659.911, 'duration': 4.823}, {'end': 666.654, 'text': 'It goes by the name Komodo Dragon, right?', 'start': 665.094, 'duration': 1.56}, {'end': 670.415, 'text': 'So, you know, it can play', 'start': 667.214, 'duration': 3.201}, {'end': 673.335, 'text': 'with one piece,', 'start': 670.415, 'duration': 2.92}, {'end': 680.677, 'text': 'you know, one missing piece, right? Against established grandmasters, and it is able to, you know, defeat them.', 'start': 673.335, 'duration': 7.342}, {'end': 683.137, 'text': 'So it is becoming more and more powerful.', 'start': 681.137, 'duration': 2}, {'end': 690.219, 'text': 'Even Kasparov, one of the most esteemed grandmasters, who was probably the best player at one point,', 'start': 683.418, 'duration': 6.801}, {'end': 693.09, 'text': 'lost to Deep Blue, right?', 'start': 690.489, 'duration': 2.601}, {'end': 698.691, 'text': 'So, I mean, the more they are trained, the better they become, right?', 'start': 693.51, 'duration': 5.181}, {'end': 700.391, 'text': 'All of this is, you know,', 'start': 698.871, 'duration': 1.52}, {'end': 703.252, 'text': 'in a way, machine learning, right?', 'start': 700.391, 'duration': 2.861}, {'end': 711.614, 'text': 'So the idea is, you know, the overall goal is to devise learning algorithms that do the learning automatically,', 'start': 703.672, 'duration': 7.942}, {'end': 714.775, 'text': 'without much human intervention or assistance, right?', 'start': 711.614, 'duration': 3.161}, {'end': 718.896, 'text': 'There are various models, which we have listed.', 'start': 716.075, 'duration': 2.821}], 'summary': 'Advancements in chess engines, 
such as Deep Blue and Komodo, showcase the increasing power of machine learning: Komodo can defeat established grandmasters, and Deep Blue even defeated Kasparov.', 'duration': 104.304, 'max_score': 614.592, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4614592.jpg'}, {'end': 797.209, 'src': 'embed', 'start': 761.108, 'weight': 5, 'content': [{'end': 763.528, 'text': 'you know, you can become a data scientist.', 'start': 761.108, 'duration': 2.42}, {'end': 770.65, 'text': 'You can get into data science, because data science is not just statistics and Python and all that, right? Or R, for that matter.', 'start': 763.528, 'duration': 7.122}, {'end': 772.831, 'text': "There's a lot more to it.", 'start': 771.05, 'duration': 1.781}, {'end': 775.472, 'text': 'You know, you still need to do a lot of data analysis', 'start': 773.371, 'duration': 2.101}, {'end': 778.932, 'text': 'that can be done using big data technologies, using Java itself, right?', 'start': 775.472, 'duration': 3.46}, {'end': 780.873, 'text': 'Lots and lots of possibilities are there.', 'start': 779.253, 'duration': 1.62}, {'end': 783.624, 'text': "Things don't stop at one point here.", 'start': 781.723, 'duration': 1.901}, {'end': 785.665, 'text': 'Right, friends? All right.', 'start': 784.104, 'duration': 1.561}, {'end': 787.465, 'text': 'Moving on.', 'start': 787.005, 'duration': 0.46}, {'end': 797.209, 'text': 'Do you know what actuarial science is?
Actuary.', 'start': 790.786, 'duration': 6.423}], 'summary': 'Data science includes statistics, Python, R, data analysis, big data technologies, and Java, offering numerous possibilities.', 'duration': 36.101, 'max_score': 761.108, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4761108.jpg'}, {'end': 1473.882, 'src': 'embed', 'start': 1445.013, 'weight': 4, 'content': [{'end': 1447.754, 'text': "That's what Pratik has tried to say, information gain and all.", 'start': 1445.013, 'duration': 2.741}, {'end': 1456.837, 'text': 'It definitely comes into the picture when you try to determine what the root node should be.', 'start': 1448.014, 'duration': 8.823}, {'end': 1466.42, 'text': 'So you need to compare the candidate root nodes and pick the one with the highest information gain; the higher the information gain, the shorter the tree.', 'start': 1456.957, 'duration': 9.463}, {'end': 1469.541, 'text': "That's probably the most effective way to do things.", 'start': 1466.46, 'duration': 3.081}, {'end': 1471.501, 'text': 'So I want you to remember that.', 'start': 1470.081, 'duration': 1.42}, {'end': 1473.882, 'text': 'Sorry, my example was not all that great.', 'start': 1471.741, 'duration': 2.141}], 'summary': 'Information gain is crucial in choosing the root node: the attribute with the highest information gain is chosen, which tends to produce a shorter, more effective tree.', 'duration': 28.869, 'max_score': 1445.013, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu41445013.jpg'}], 'start': 614.592, 'title': 'Chess engines and machine learning', 'summary': 'Discusses the increasing power of chess engines like Deep Blue and Komodo, capable of defeating grandmasters. 
It also explores machine learning, including the decision tree algorithm and its process of determining the root node and branching based on information gain.', 'chapters': [{'end': 693.09, 'start': 614.592, 'title': 'Advancement in chess engines', 'summary': 'Discusses the increasing power of chess engines such as Deep Blue and Komodo, which can defeat established grandmasters; Deep Blue defeated Kasparov in the late 90s.', 'duration': 78.498, 'highlights': ['The chess engine Komodo, known for its power, can play with a missing piece and defeat established grandmasters.', 'Deep Blue, a project by IBM in the late 90s, was able to defeat grandmasters like Kasparov, showcasing the increasing penetration and power of chess engines.', 'The increasing power of chess engines is evident from the fact that even established grandmasters like Kasparov have lost to them.']}, {'end': 1469.541, 'start': 693.51, 'title': 'Machine learning and decision tree', 'summary': 'Discusses the concept of machine learning and its goal of devising learning algorithms to automatically learn without human intervention, also delving into a detailed explanation of the decision tree algorithm and the process of determining the root node and branching based on information gain.', 'duration': 776.031, 'highlights': ['The overall goal is to devise learning algorithms that do the learning automatically, without much human intervention or assistance. The goal of machine learning is to create learning algorithms that can learn automatically without human intervention.', 'The decision tree algorithm is explained in detail, demonstrating the process of determining the root node and branching based on information gain. 
The detailed explanation of the decision tree algorithm, including the process of determining the root node and branching based on information gain.', "Learning is a continuous process, and individuals with a Java background can transition into data science by leveraging Java's capabilities in data analysis. Individuals with a Java background can transition into data science as learning is a continuous process and Java can be used for data analysis in data science."]}], 'duration': 854.949, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu4614592.jpg', 'highlights': ['The chess engine Komodo, known for its power, can play with a missing piece and defeat established grandmasters.', 'Deep Blue, a project by IBM in the late 90s, was able to defeat grandmasters like Kasparov, showcasing the increasing penetration and power of chess engines.', 'The increasing power of chess engines is evident from the fact that even established grandmasters like Kasparov have lost to them.', 'The overall goal is to devise learning algorithms that do the learning automatically, without much human intervention or assistance.', 'The detailed explanation of the decision tree algorithm, including the process of determining the root node and branching based on information gain.', 'Individuals with a Java background can transition into data science as learning is a continuous process and Java can be used for data analysis in data science.']}, {'end': 1762.12, 'segs': [{'end': 1627.315, 'src': 'embed', 'start': 1598.044, 'weight': 0, 'content': [{'end': 1601.885, 'text': 'Of course we are looking at somebody who can give us the right one, correct?', 'start': 1598.044, 'duration': 3.841}, {'end': 1605.266, 'text': "But we also, you know, even in the doctor's case, right,", 'start': 1602.525, 'duration': 2.741}, {'end': 1607.446, 'text': 'we always seek a second opinion and all that, right?', 'start': 1605.266, 'duration': 2.18}, {'end': 
1609.867, 'text': 'Second opinion, third opinion, whatever, right?', 'start': 1607.526, 'duration': 2.341}, {'end': 1612.507, 'text': 'With a group of friends, the success rate is higher, right?', 'start': 1610.287, 'duration': 2.22}, {'end': 1614.608, 'text': "We don't just bank on one decision.", 'start': 1612.827, 'duration': 1.781}, {'end': 1617.868, 'text': 'Random forest is all about that, right?', 'start': 1615.068, 'duration': 2.8}, {'end': 1619.469, 'text': 'If you understand that, right?', 'start': 1617.968, 'duration': 1.501}, {'end': 1623.612, 'text': 'Instead of asking, ask your best friend, right?', 'start': 1621.59, 'duration': 2.022}, {'end': 1627.315, 'text': "You're like asking your best friend here.", 'start': 1624.072, 'duration': 3.243}], 'summary': 'Seeking multiple opinions can lead to higher success rates, as in the case of random forest decision-making.', 'duration': 29.271, 'max_score': 1598.044, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu41598044.jpg'}, {'end': 1671.666, 'src': 'embed', 'start': 1644.271, 'weight': 1, 'content': [{'end': 1655.353, 'text': 'but what we are trying to say is the consolidation of various decisions coming from different decision trees within the random forest.', 'start': 1644.271, 'duration': 11.082}, {'end': 1660.016, 'text': 'So suppose I have, like, four people up there.', 'start': 1655.833, 'duration': 4.183}, {'end': 1664.82, 'text': "Four of my friends say yes: hey, boss, you go and watch this movie, it's pretty good, right?", 'start': 1660.016, 'duration': 4.804}, {'end': 1665.861, 'text': 'Four people are saying yes.', 'start': 1664.981, 'duration': 0.88}, {'end': 1669.544, 'text': "One person says it's a stupid movie, right?", 'start': 1666.702, 'duration': 2.842}, {'end': 1671.666, 'text': 'But would you go for the movie or not?', 'start': 1669.985, 'duration': 1.681}], 'summary': 'Random forest consolidates decisions from various decision trees to 
make a final choice, similar to seeking opinions from friends.', 'duration': 27.395, 'max_score': 1644.271, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu41644271.jpg'}, {'end': 1739.368, 'src': 'embed', 'start': 1714.1, 'weight': 2, 'content': [{'end': 1719.562, 'text': "So, for a more accurate recommendation, you ask a bunch of friends, that's what we said, and they vote on whether you'd like a movie.", 'start': 1714.1, 'duration': 5.462}, {'end': 1727.203, 'text': "The majority will decide the final outcome, right? So again, friend one says, so you're making a decision, no here, yes.", 'start': 1719.582, 'duration': 7.621}, {'end': 1734.265, 'text': 'So this is like evened out, right? So here, yes, yes, yes, right? There are three yeses.', 'start': 1727.463, 'duration': 6.802}, {'end': 1735.765, 'text': 'Okay, friend two.', 'start': 1734.785, 'duration': 0.98}, {'end': 1739.368, 'text': 'Friend three, there is, like, one.', 'start': 1737.287, 'duration': 2.081}], 'summary': "Using friends' votes for movie recommendation, majority rule applied with 3 'yes' votes.", 'duration': 25.268, 'max_score': 1714.1, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu41714100.jpg'}], 'start': 1470.081, 'title': 'Random forest algorithm', 'summary': 'Explains the random forest algorithm, emphasizing the benefit of seeking multiple opinions for critical decisions and the higher success rate of group opinions over individual opinions, akin to a method of decision-making based on the majority opinion, which typically yields more accurate predictions.', 'chapters': [{'end': 1644.271, 'start': 1470.081, 'title': 'Random forest algorithm', 'summary': 'Explains the random forest algorithm, highlighting the importance of seeking multiple opinions for making critical decisions and the higher success rate of group opinions over individual opinions.', 'duration': 174.19, 'highlights': ['The Random 
Forest algorithm emphasizes the importance of seeking multiple opinions for critical decisions, as it has a higher success rate than relying on a single opinion.', 'It illustrates the concept using the example of asking for opinions when deciding to watch a movie, emphasizing the benefits of seeking the opinion of a group of friends over just asking one person.', "The algorithm's approach is likened to seeking second opinions in medical cases, emphasizing the higher success rate of considering multiple opinions rather than relying solely on one decision."]}, {'end': 1762.12, 'start': 1644.271, 'title': 'Random forest decision making', 'summary': 'Explains the concept of random forest as a method of decision-making based on the majority opinion, akin to seeking recommendations from friends, with the majority vote determining the final outcome, which typically yields more accurate predictions.', 'duration': 117.849, 'highlights': ['Random forest is a method of decision-making based on the majority opinion, similar to seeking recommendations from friends. The majority vote determines the final outcome, which typically yields more accurate predictions.', 'In the scenario provided, if four out of five friends recommend watching a movie, the majority vote (4:1) leads to the decision to watch it, demonstrating how the majority opinion guides the decision-making process.']}], 'duration': 292.039, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu41470081.jpg', 'highlights': ['The Random Forest algorithm emphasizes the importance of seeking multiple opinions for critical decisions, as it has a higher success rate than relying on a single opinion.', 'Random forest is a method of decision-making based on the majority opinion, similar to seeking recommendations from friends. 
The majority vote determines the final outcome, which typically yields more accurate predictions.', 'It illustrates the concept using the example of asking for opinions when deciding to watch a movie, emphasizing the benefits of seeking the opinion of a group of friends over just asking one person.', "The algorithm's approach is likened to seeking second opinions in medical cases, emphasizing the higher success rate of considering multiple opinions rather than relying solely on one decision.", 'In the scenario provided, if four out of five friends recommend watching a movie, the majority vote (4:1) leads to the decision to watch it, demonstrating how the majority opinion guides the decision-making process.']}, {'end': 2391.668, 'segs': [{'end': 1789.837, 'src': 'embed', 'start': 1763.7, 'weight': 0, 'content': [{'end': 1771.49, 'text': 'So a random forest is nothing but an ensemble made of many decision tree models, right?', 'start': 1763.7, 'duration': 7.79}, {'end': 1776.692, 'text': 'An ensemble, you know, combines the results from different models, correct?', 'start': 1772.011, 'duration': 4.681}, {'end': 1782.575, 'text': 'So it usually has a higher success rate than a single decision tree, right?', 'start': 1777.093, 'duration': 5.482}, {'end': 1786.976, 'text': 'Because the answer comes from many different, you know, decision trees.', 'start': 1782.615, 'duration': 4.361}, {'end': 1789.837, 'text': "It's a combination of decision trees.", 'start': 1787.737, 'duration': 2.1}], 'summary': 'Random forest is an ensemble of decision trees that usually has a higher success rate than a single tree.', 'duration': 26.137, 'max_score': 1763.7, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu41763700.jpg'}, {'end': 1875.429, 'src': 'embed', 'start': 1852.068, 'weight': 1, 'content': [{'end': 1860.976, 'text': 'So from this you can mine the data and figure out the association between various items 
within a basket, right?', 'start': 1852.068, 'duration': 8.908}, {'end': 1866.281, 'text': 'So you will come to a conclusion like if you assess things right?', 'start': 1861.537, 'duration': 4.744}, {'end': 1875.429, 'text': "So, because there's lots and lots of science behind product placement in a shop and also how they want to market a particular product right?", 'start': 1866.601, 'duration': 8.828}], 'summary': 'Data mining can reveal associations between items in a basket to optimize product placement and marketing strategies.', 'duration': 23.361, 'max_score': 1852.068, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu41852068.jpg'}, {'end': 2067.591, 'src': 'embed', 'start': 2033.93, 'weight': 2, 'content': [{'end': 2041.471, 'text': "There's lots and lots of, you know, the Apriori algorithm is one of the most famous ones used here.", 'start': 2033.93, 'duration': 7.541}, {'end': 2045.893, 'text': 'So this is a very popular one, a very, very interesting thing.', 'start': 2041.972, 'duration': 3.921}, {'end': 2053.842, 'text': 'In fact, shared something, a story which some of you may have heard about.', 'start': 2046.793, 'duration': 7.049}, {'end': 2061.387, 'text': 'There is an association of purchase of beer and diaper, right? Beer and diaper.', 'start': 2054.684, 'duration': 6.703}, {'end': 2067.591, 'text': 'So this is a very classic case of association rule mining.', 'start': 2061.987, 'duration': 5.604}], 'summary': 'The Apriori algorithm is popular for association rule mining, e.g. 
beer and diaper purchase association.', 'duration': 33.661, 'max_score': 2033.93, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42033930.jpg'}, {'end': 2129.243, 'src': 'embed', 'start': 2096.476, 'weight': 3, 'content': [{'end': 2097.878, 'text': 'It was found during the weekend.', 'start': 2096.476, 'duration': 1.402}, {'end': 2103.708, 'text': 'the beer and diaper were always bought together.', 'start': 2098.683, 'duration': 5.025}, {'end': 2105.549, 'text': 'so people are trying to research.', 'start': 2103.708, 'duration': 1.841}, {'end': 2107.852, 'text': 'what is the reason for that?', 'start': 2105.549, 'duration': 2.303}, {'end': 2108.752, 'text': 'so beer is like.', 'start': 2107.852, 'duration': 0.9}, {'end': 2112.816, 'text': 'you know, people drink and all that, especially the men folks.', 'start': 2108.752, 'duration': 4.064}, {'end': 2115.719, 'text': 'and diaper again, where is the diaper landing?', 'start': 2112.816, 'duration': 2.903}, {'end': 2120.978, 'text': 'So when they researched it, what is the relationship here?', 'start': 2117.355, 'duration': 3.623}, {'end': 2122.859, 'text': 'Why this pattern?', 'start': 2121.878, 'duration': 0.981}, {'end': 2124.62, 'text': 'So they were trying to research on that.', 'start': 2123.079, 'duration': 1.541}, {'end': 2129.243, 'text': 'So they found out two theories, which are quite interesting here.', 'start': 2124.68, 'duration': 4.563}], 'summary': 'Research found that beer and diapers were often bought together, leading to two interesting theories.', 'duration': 32.767, 'max_score': 2096.476, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42096476.jpg'}, {'end': 2251.89, 'src': 'embed', 'start': 2222.394, 'weight': 4, 'content': [{'end': 2224.235, 'text': "so there's lots and lots of association.", 'start': 2222.394, 'duration': 1.841}, {'end': 2230.936, 'text': "right, it's a popular, well 
researched method for discovering interesting relations between variables in large data, right?", 'start': 2224.235, 'duration': 6.701}, {'end': 2236.52, 'text': 'So it would indicate if the customer buys onions, okay, I think we are getting onions here as well.', 'start': 2231.516, 'duration': 5.004}, {'end': 2240.783, 'text': 'Potatoes together likely to buy hamburger meat right?', 'start': 2237.16, 'duration': 3.623}, {'end': 2243.344, 'text': "So that's pretty interesting, right?", 'start': 2241.183, 'duration': 2.161}, {'end': 2247.527, 'text': 'I mean the people will also buy this right? Because they will fry it.', 'start': 2243.444, 'duration': 4.083}, {'end': 2248.688, 'text': 'along with this, you know?', 'start': 2247.527, 'duration': 1.161}, {'end': 2251.89, 'text': 'The people who bought egg, they will also buy some.', 'start': 2249.028, 'duration': 2.862}], 'summary': 'Association analysis reveals interesting relationships between customer purchases, such as the correlation between buying onions and potatoes with hamburger meat.', 'duration': 29.496, 'max_score': 2222.394, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42222394.jpg'}, {'end': 2325.969, 'src': 'embed', 'start': 2295.075, 'weight': 5, 'content': [{'end': 2297.616, 'text': 'you know support and confidence.', 'start': 2295.075, 'duration': 2.541}, {'end': 2298.636, 'text': 'These are the two things.', 'start': 2297.776, 'duration': 0.86}, {'end': 2302.698, 'text': "and there's one more thing called lift, which is probably not there, okay?", 'start': 2298.636, 'duration': 4.062}, {'end': 2309.4, 'text': 'So the you know, these are some of the terms that are actually used in.', 'start': 2303.278, 'duration': 6.122}, {'end': 2316.2, 'text': 'you know, the confidence is nothing, but you know, given the person buying something, right?', 'start': 2309.4, 'duration': 6.8}, {'end': 2325.969, 'text': "Given the person buying something, buying 
an item, let's say, the person who is buying orange juice is also buying soda, okay?", 'start': 2316.541, 'duration': 9.428}], 'summary': 'Support, confidence, and lift are terms used in association rule mining to analyze purchase behavior.', 'duration': 30.894, 'max_score': 2295.075, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42295075.jpg'}], 'start': 1763.7, 'title': 'Data mining techniques', 'summary': 'Covers random forest ensemble models and association rule mining for analyzing customer shopping patterns, product placement, and the relationship between product association and consumer behavior. It also explains support, confidence, and lift in association analysis.', 'chapters': [{'end': 2096.417, 'start': 1763.7, 'title': 'Random forest & association rule mining', 'summary': "Discusses the concept of random forest as an ensemble of decision tree models with a higher success rate than a single tree, and association rule mining's application in analyzing customer shopping patterns and product placement in supermarkets.", 'duration': 332.717, 'highlights': ['Random forest is an ensemble made with many decision tree models, usually achieving a higher success rate than a single decision tree. Random forest is an ensemble of decision tree models with a higher success rate.', 'Association rule mining involves analyzing shopping basket data to determine associations between items, influencing product placement and marketing strategies in supermarkets. Association rule mining analyzes shopping basket data to determine associations between items, influencing product placement and marketing strategies.', 'A classic case of association rule mining is the association between the purchase of beer and diapers, demonstrating the practical application of this concept. 
The classic case of association rule mining is the association between the purchase of beer and diapers, demonstrating the practical application of this concept.']}, {'end': 2391.668, 'start': 2096.476, 'title': 'Product association and consumer behavior', 'summary': 'Delves into the intriguing relationship between product association and consumer behavior, uncovering patterns such as the correlation between beer and diaper purchases, as well as explaining the concepts of support, confidence, and lift in association analysis.', 'duration': 295.192, 'highlights': ['The correlation between beer and diaper purchases was discovered through research, revealing that beer drinkers bought diapers to reduce trips to the bathroom while enjoying their drink, and mothers would request diapers alongside other errands. Correlation between beer and diaper purchases', 'Association analysis can uncover interesting correlations, such as the likelihood of customers buying onions and potatoes together also purchasing hamburger meat, thereby influencing product promotions and placements. Correlation between onions, potatoes, and hamburger meat purchases', 'The explanation of support, confidence, and lift in association analysis provides valuable insights into understanding consumer behavior and the relationships between product purchases. 
Explanation of support, confidence, and lift in association analysis']}], 'duration': 627.968, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu41763700.jpg', 'highlights': ['Random forest is an ensemble of decision tree models with a higher success rate.', 'Association rule mining analyzes shopping basket data to determine associations between items, influencing product placement and marketing strategies.', 'The classic case of association rule mining is the association between the purchase of beer and diapers, demonstrating the practical application of this concept.', 'Correlation between beer and diaper purchases', 'Correlation between onions, potatoes, and hamburger meat purchases', 'Explanation of support, confidence, and lift in association analysis provides valuable insights into understanding consumer behavior and the relationships between product purchases.']}, {'end': 2864.471, 'segs': [{'end': 2452.001, 'src': 'embed', 'start': 2423.354, 'weight': 0, 'content': [{'end': 2431.837, 'text': 'So regression is the very, very popular algorithm that is used for prediction and forecasting, right?', 'start': 2423.354, 'duration': 8.483}, {'end': 2438.796, 'text': 'So you can actually, you know, there are different kinds of regression.', 'start': 2433.934, 'duration': 4.862}, {'end': 2447.239, 'text': 'So if the variable is continuous in nature, right, then we can use linear regression.', 'start': 2439.376, 'duration': 7.863}, {'end': 2452.001, 'text': 'Otherwise, you know, we may need to use logistic regression and all that stuff.', 'start': 2447.299, 'duration': 4.702}], 'summary': 'Regression is a popular algorithm for prediction and forecasting. 
Linear regression is used when the target variable is continuous, while logistic regression is typically used when the target is categorical.', 'duration': 28.647, 'max_score': 2423.354, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42423354.jpg'}, {'end': 2596.274, 'src': 'embed', 'start': 2565.895, 'weight': 2, 'content': [{'end': 2566.496, 'text': 'All these are.', 'start': 2565.895, 'duration': 0.601}, {'end': 2568.995, 'text': 'independent variables right.', 'start': 2567.754, 'duration': 1.241}, {'end': 2571.378, 'text': 'all these are, like you know, independent.', 'start': 2568.995, 'duration': 2.383}, {'end': 2572.739, 'text': 'and what is the dependent one?', 'start': 2571.378, 'duration': 1.361}, {'end': 2577.463, 'text': 'the price of the apartment, right, so the price of the apartment, in order to predict that.', 'start': 2572.739, 'duration': 4.724}, {'end': 2582.128, 'text': 'you need all these things right for you to make the prediction right.', 'start': 2577.463, 'duration': 4.665}, {'end': 2586.792, 'text': 'so you can, you know the more and more you know you apply all these things right.', 'start': 2582.128, 'duration': 4.664}, {'end': 2592.772, 'text': 'then you can come up with a formula, a simple formula which can somewhat, you know.', 'start': 2586.792, 'duration': 5.98}, {'end': 2596.274, 'text': 'you can say maybe what is the influence?', 'start': 2592.772, 'duration': 3.502}], 'summary': 'Independent variables impact apartment price prediction.', 'duration': 30.379, 'max_score': 2565.895, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42565895.jpg'}, {'end': 2694.958, 'src': 'embed', 'start': 2668.115, 'weight': 3, 'content': [{'end': 2676.197, 'text': "So, when you do all these things, towards the end, you can even disregard the ones which don't influence the price of the apartment significantly.", 'start': 2668.115, 'duration': 8.082}, {'end': 
2684.579, 'text': 'So, you will not have 20 things you can come up with, or even more, to determine the price of an apartment.', 'start': 2676.777, 'duration': 7.802}, {'end': 2690.381, 'text': 'You will not take all the 20 to compute the regression, to come up with the prediction.', 'start': 2684.879, 'duration': 5.502}, {'end': 2694.958, 'text': 'six or seven significant ones, right?', 'start': 2692.376, 'duration': 2.582}], 'summary': 'Selecting six or seven significant factors for apartment price prediction.', 'duration': 26.843, 'max_score': 2668.115, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42668115.jpg'}], 'start': 2392.048, 'title': 'Regression for predicting apartment prices', 'summary': 'Introduces regression for prediction, especially linear regression for continuous variables, and discusses factors influencing apartment price prediction. It also explains the concept of linear regression, emphasizing statistically significant factors and improved prediction accuracy with more data.', 'chapters': [{'end': 2565.075, 'start': 2392.048, 'title': 'Introduction to regression for prediction', 'summary': 'Introduces regression as a popular algorithm used for prediction and forecasting, especially linear regression for continuous variables, and discusses the factors influencing the prediction of apartment prices using regression.', 'duration': 173.027, 'highlights': ['Regression is a popular algorithm used for prediction and forecasting, with linear regression being quite popular. It is used for predicting the price of an apartment by considering factors such as square footage, location, amenities, and neighborhood safety.', 'Linear regression is used when the target variable is continuous, while logistic regression is typically used when the target is categorical. 
It is important to consider various factors such as land price, amenities, neighborhood safety, and proximity to transportation and facilities when predicting apartment prices.']}, {'end': 2864.471, 'start': 2565.895, 'title': 'Understanding linear regression', 'summary': 'Explains the concept of linear regression in predicting apartment prices based on independent variables, emphasizing the importance of statistically significant factors and the gradual improvement of prediction accuracy with more data.', 'duration': 298.576, 'highlights': ['Linear regression is used to predict apartment prices based on independent variables. The chapter discusses how independent variables such as crime rate, amenities, land price, and proximity to school are used to predict apartment prices.', 'Emphasizing the significance of statistically significant factors in the regression model. The chapter highlights the importance of identifying statistically significant factors that significantly affect the price of the apartment, suggesting that only six or seven significant variables are crucial for computing the regression and making predictions.', 'Improvement of prediction accuracy with more data in linear regression. The chapter explains that the prediction accuracy in linear regression improves with more data, making the model more firm and accurate as more variables and data are considered.']}], 'duration': 472.423, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42392048.jpg', 'highlights': ['Regression is a popular algorithm used for prediction and forecasting, with linear regression being quite popular. It is used for predicting the price of an apartment by considering factors such as square footage, location, amenities, and neighborhood safety.', 'Linear regression is used when the target variable is continuous, while logistic regression is typically used when the target is categorical. 
It is important to consider various factors such as land price, amenities, neighborhood safety, and proximity to transportation and facilities when predicting apartment prices.', 'Linear regression is used to predict apartment prices based on independent variables. The chapter discusses how independent variables such as crime rate, amenities, land price, and proximity to school are used to predict apartment prices.', 'Emphasizing the significance of statistically significant factors in the regression model. The chapter highlights the importance of identifying statistically significant factors that significantly affect the price of the apartment, suggesting that only six or seven significant variables are crucial for computing the regression and making predictions.', 'Improvement of prediction accuracy with more data in linear regression. The chapter explains that the prediction accuracy in linear regression improves with more data, making the model more firm and accurate as more variables and data are considered.']}, {'end': 3188.746, 'segs': [{'end': 2934.4, 'src': 'embed', 'start': 2892.65, 'weight': 0, 'content': [{'end': 2906.931, 'text': 'So this is like what we are saying, is objects are classified into groups as much dissimilar as possible from one group to another,', 'start': 2892.65, 'duration': 14.281}, {'end': 2909.773, 'text': 'but as much similar as possible within each group.', 'start': 2906.931, 'duration': 2.842}, {'end': 2923.524, 'text': 'So, can you think of any examples in this thing, friends? Clustering, has anyone heard about it? Some of you may have, but you know.', 'start': 2912.976, 'duration': 10.548}, {'end': 2934.4, 'text': 'Okay, clustering is nothing but grouping some of the pockets, right?', 'start': 2929.557, 'duration': 4.843}], 'summary': 'Objects are classified into dissimilar groups, with similarity within each group. 
Clustering is about grouping objects into pockets.', 'duration': 41.75, 'max_score': 2892.65, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42892650.jpg'}, {'end': 3049.282, 'src': 'embed', 'start': 2962.457, 'weight': 1, 'content': [{'end': 2967.3, 'text': "probably I'll be showing the same example here as well when I show you the lab session.", 'start': 2962.457, 'duration': 4.843}, {'end': 2971.382, 'text': 'In an organization, I want to keep a group.', 'start': 2968.02, 'duration': 3.362}, {'end': 2981.449, 'text': 'What do we do, why we need to group, because we always have, there are a lot of HR policies we have in an organization.', 'start': 2973.284, 'duration': 8.165}, {'end': 2989.708, 'text': "So, HR policy will not be for an individual, generally, right? We don't, you know, the policy is never for an individual.", 'start': 2982.305, 'duration': 7.403}, {'end': 2992.448, 'text': 'It is for a cluster, correct?', 'start': 2990.088, 'duration': 2.36}, {'end': 2995.409, 'text': 'So when I say cluster, they will announce.', 'start': 2993.009, 'duration': 2.4}, {'end': 3001.011, 'text': 'generally after the appraisal cycle ends and all that is, come back and announce.', 'start': 2995.409, 'duration': 5.602}, {'end': 3009.594, 'text': "hey, I'm going to give you, like you know, 20% raise for the top 5% performers, right?", 'start': 3001.011, 'duration': 8.583}, {'end': 3011.495, 'text': 'So they are given a significant raise.', 'start': 3009.974, 'duration': 1.521}, {'end': 3021.344, 'text': 'Then I want to give maybe 10% raise for the other 50% people who are medium.', 'start': 3012.099, 'duration': 9.245}, {'end': 3030.87, 'text': 'And there is another 5% for the people who kind of met some expectations.', 'start': 3023.306, 'duration': 7.564}, {'end': 3037.454, 'text': 'And the last one is generally they are sent to training or they are laid off, whatever.', 'start': 3031.931, 'duration': 5.523}, {'end': 3048.121, 
'text': 'So the HR policy, and how do they do? These are all the clusters again, right? Top performance, top 5%, bottom 5%, the next 20%, whatever.', 'start': 3038.293, 'duration': 9.828}, {'end': 3049.282, 'text': 'These are all clusters.', 'start': 3048.161, 'duration': 1.121}], 'summary': 'HR policies categorize employees into clusters based on performance for rewards and actions.', 'duration': 86.825, 'max_score': 2962.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42962457.jpg'}], 'start': 2864.871, 'title': 'K-means clustering technique and cluster performance evaluation', 'summary': 'Covers the k-means clustering technique for grouping objects so that each group is internally similar and dissimilar from the other groups, and cluster performance evaluation including raise allocation based on performance levels and emphasizing cluster similarity and dissimilarity.', 'chapters': [{'end': 2989.708, 'start': 2864.871, 'title': 'K-means clustering technique', 'summary': 'Discusses the k-means clustering technique, which involves classifying objects into groups that are as dissimilar as possible from one another but as similar as possible within each group, with a mention of its application in organizational HR policies.', 'duration': 124.837, 'highlights': ['K-means clustering involves classifying objects into groups that are as dissimilar as possible from one another but as similar as possible within each group.', 'The technique can be applied in organizational HR policies for grouping individuals based on similarities and dissimilarities.', 'HR policies in an organization are not designed for individuals, leading to the need for grouping individuals based on similarities.']}, {'end': 3188.746, 'start': 2990.088, 'title': 'Cluster performance evaluation', 'summary': 'Discusses the process of cluster performance evaluation, including the allocation of raises based on performance levels and the concept of clustering techniques such as k-means, 
emphasizing the need for similarity within clusters and dissimilarity between clusters.', 'duration': 198.658, 'highlights': ['The chapter discusses the process of cluster performance evaluation, including the allocation of raises based on performance levels and the concept of clustering techniques such as k-means. The company allocates raises based on performance levels, with the top 5% of performers receiving a 20% raise, the next 50% receiving a 10% raise, those who met some expectations receiving a 5% raise, and the lowest performers being sent for training or laid off. The concept of clustering techniques like k-means is emphasized.', 'Emphasizing the need for similarity within clusters and dissimilarity between clusters. Clusters are expected to be as similar as possible within the group, with top performers grouped together and dissimilar from average performers. There is an emphasis on dissimilarity between clusters, ensuring that objects in different groups are as different as possible.']}], 'duration': 323.875, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu42864871.jpg', 'highlights': ['K-means clustering involves classifying objects into groups that are as dissimilar as possible from one another but as similar as possible within each group.', 'The technique can be applied in organizational HR policies for grouping individuals based on similarities and dissimilarities.', 'The chapter discusses the process of cluster performance evaluation, including the allocation of raises based on performance levels and the concept of clustering techniques such as k-means.', 'The company allocates raises based on performance levels, with the top 5% of performers receiving a 20% raise, the next 50% receiving a 10% raise, those who met some expectations receiving a 5% raise, and the lowest performers being sent for training or laid off.', 'Emphasizing the need for similarity within clusters and dissimilarity between clusters.', 'There is an emphasis on dissimilarity between clusters, ensuring that objects in different groups are as different as possible.']}, {'end': 
3752.116, 'segs': [{'end': 3270.136, 'src': 'embed', 'start': 3240.905, 'weight': 0, 'content': [{'end': 3241.886, 'text': "That's what we are trying to say.", 'start': 3240.905, 'duration': 0.981}, {'end': 3245.289, 'text': 'So number of clusters is what we mean by K.', 'start': 3242.206, 'duration': 3.083}, {'end': 3249.392, 'text': "So what we are trying to say, I'll just show you the dataset, friends, to start with.", 'start': 3245.289, 'duration': 4.103}, {'end': 3254.836, 'text': "I have workers.txt, so okay, I'll just show you the file, dataset.", 'start': 3250.032, 'duration': 4.804}, {'end': 3270.136, 'text': 'Workers.txt, this is what I have, right? So what I have here is like, you know, I have like 50 workers, and they have scores, right? So 93.12.', 'start': 3259.68, 'duration': 10.456}], 'summary': 'Dataset contains 50 workers with scores, including 93.12.', 'duration': 29.231, 'max_score': 3240.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43240905.jpg'}, {'end': 3359.929, 'src': 'embed', 'start': 3332.49, 'weight': 3, 'content': [{'end': 3338.734, 'text': 'Similarly, there is a way to even determine what is the best, how many clusters you can have.', 'start': 3332.49, 'duration': 6.244}, {'end': 3347.84, 'text': 'So somebody may have five clusters versus 10 clusters versus whatever.', 'start': 3340.655, 'duration': 7.185}, {'end': 3356.928, 'text': "If I'm going to create 50 clusters here, that's going to be, completely useless here, right? 
So I need to have meaningful number of clusters.", 'start': 3348.32, 'duration': 8.608}, {'end': 3359.929, 'text': 'You know, at the same time, I cannot have only two clusters.', 'start': 3357.488, 'duration': 2.441}], 'summary': 'Determining the optimal number of clusters is crucial for meaningful results in clustering analysis.', 'duration': 27.439, 'max_score': 3332.49, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43332490.jpg'}, {'end': 3460.832, 'src': 'embed', 'start': 3433.225, 'weight': 1, 'content': [{'end': 3438.529, 'text': 'And so this one, so there is something called the elbow curve.', 'start': 3433.225, 'duration': 5.304}, {'end': 3441.292, 'text': "I think you will, I don't want to go into the details here.", 'start': 3438.59, 'duration': 2.702}, {'end': 3443.793, 'text': 'So the elbow curve.', 'start': 3441.852, 'duration': 1.941}, {'end': 3445.535, 'text': 'this is like number of clusters right?', 'start': 3443.793, 'duration': 1.742}, {'end': 3446.936, 'text': 'If I do this, right?', 'start': 3445.915, 'duration': 1.021}, {'end': 3460.832, 'text': 'So if I do the, if I try to use the k-means here, I can at least, before applying the k-means, I should know how many number of clusters I can have.', 'start': 3447.685, 'duration': 13.147}], 'summary': 'Elbow curve helps determine optimal number of clusters for k-means.', 'duration': 27.607, 'max_score': 3433.225, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43433225.jpg'}, {'end': 3709.521, 'src': 'embed', 'start': 3680.936, 'weight': 5, 'content': [{'end': 3687.538, 'text': 'Even the system how it works, is it there if I want to create? 
Four, set, four clusters.', 'start': 3680.936, 'duration': 6.602}, {'end': 3693.38, 'text': 'right, if you know, even if the HR wants to create four clusters, they will probably form four groups.', 'start': 3687.538, 'duration': 5.842}, {'end': 3704.8, 'text': "Let's say experts, let's say medium performers, let's say average performers, and let's say bottom-most performers, right?", 'start': 3693.62, 'duration': 11.18}, {'end': 3709.521, 'text': 'And they may, you know, with some basic knowledge.', 'start': 3705.461, 'duration': 4.06}], 'summary': 'The system can create four clusters based on performance levels.', 'duration': 28.585, 'max_score': 3680.936, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43680936.jpg'}], 'start': 3188.966, 'title': 'K-means clustering', 'summary': "Discusses k-means clustering for a dataset of 50 workers' performance scores, emphasizing the need for a meaningful number of clusters to separate high performers from low performers. 
It also explores the concept of the elbow curve to determine the ideal number of clusters, with key points including visual interpretation, cluster IDs, and cluster accuracy.", 'chapters': [{'end': 3431.764, 'start': 3188.966, 'title': 'K-means clustering for worker performance', 'summary': "Discusses the process of k-means clustering for a dataset of 50 workers' performance scores, emphasizing the need for a meaningful number of clusters to separate high performers from low performers.", 'duration': 242.798, 'highlights': ["K-means clustering aims to create distinct clusters within a dataset of 50 workers' performance scores, ensuring meaningful separation between high performers and low performers.", 'The dataset consists of 50 workers and their performance scores, with a need to determine meaningful clusters to separate high performers from low performers.', 'The process of K-means clustering for the dataset involves ensuring a meaningful number of clusters to effectively distinguish between different ranges of performance scores.']}, {'end': 3752.116, 'start': 3433.225, 'title': 'Understanding k-means clustering', 'summary': 'Explores the concept of the elbow curve to determine the ideal number of clusters for k-means clustering, with key points including the visual interpretation of the elbow curve, the identification of cluster IDs, and the evaluation of cluster accuracy.', 'duration': 318.891, 'highlights': ['The chapter explains the concept of the elbow curve, used to determine the ideal number of clusters for K-means clustering, through visual interpretation. Visual interpretation of the elbow curve to identify the ideal number of clusters for K-means clustering.', 'It discusses the process of identifying cluster IDs and the significance of cluster accuracy evaluation in K-means clustering. 
Identification of cluster IDs and evaluation of cluster accuracy in K-means clustering.', 'The chapter presents a simple example to illustrate the process of forming clusters and the potential adjustments based on performance over time. Illustration of the process of forming clusters and potential adjustments based on performance over time.']}], 'duration': 563.15, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43188966.jpg', 'highlights': ["K-means clustering aims to create distinct clusters within a dataset of 50 workers' performance scores, ensuring meaningful separation between high performers and low performers.", 'The chapter explains the concept of the elbow curve, used to determine the ideal number of clusters for K-means clustering, through visual interpretation.', 'The dataset consists of 50 workers and their performance scores, with a need to determine meaningful clusters to separate high performers from low performers.', 'The process of K-means clustering for the dataset involves ensuring a meaningful number of clusters to effectively distinguish between different ranges of performance scores.', 'It discusses the process of identifying cluster IDs and the significance of cluster accuracy evaluation in K-means clustering.', 'The chapter presents a simple example to illustrate the process of forming clusters and the potential adjustments based on performance over time.']}, {'end': 4398.617, 'segs': [{'end': 3835.636, 'src': 'embed', 'start': 3804.802, 'weight': 0, 'content': [{'end': 3806.904, 'text': 'Once you know, I try to move the centroid.', 'start': 3804.802, 'duration': 2.102}, {'end': 3813.339, 'text': 'Every time, you know, the centroid is the center point, right? Center point is like a point, right? 
So that keeps shifting gradually.', 'start': 3806.964, 'duration': 6.375}, {'end': 3818.383, 'text': "So then after everything is done, right, at some point it stops; there's nothing left to move.", 'start': 3813.68, 'duration': 4.703}, {'end': 3824.348, 'text': "That's when we stop, and we have perfectly formed clusters, correct?", 'start': 3818.783, 'duration': 5.565}, {'end': 3829.992, 'text': "So I'll show you a simple... How do we decide the centroid?", 'start': 3824.668, 'duration': 5.324}, {'end': 3832.133, 'text': 'No, Jaya, the point is, how do we decide?', 'start': 3830.052, 'duration': 2.081}, {'end': 3835.636, 'text': 'It is decided in a random fashion, Jaya.', 'start': 3832.594, 'duration': 3.042}], 'summary': 'The centroid is gradually shifted until perfectly formed clusters are achieved.', 'duration': 30.834, 'max_score': 3804.802, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43804802.jpg'}, {'end': 3966.633, 'src': 'heatmap', 'start': 3922.424, 'weight': 1, 'content': [{'end': 3940.088, 'text': 'Yeah? I can also do something like this: I can put the data, assign it to a variable, and if you see this,', 'start': 3922.424, 'duration': 17.664}, {'end': 3944.39, 'text': 'I can put the cluster information against each record.', 'start': 3940.088, 'duration': 4.302}, {'end': 3948.893, 'text': 'Sorry, and I can plot this as well.', 'start': 3946.031, 'duration': 2.862}, {'end': 3960.411, 'text': 'So, I can plot this; see, look at this, right? 
This is what I tried to plot.', 'start': 3952.529, 'duration': 7.882}, {'end': 3966.633, 'text': 'With the cluster information.', 'start': 3965.413, 'duration': 1.22}], 'summary': 'Data can be assigned to a variable, and the cluster information can be plotted for analysis.', 'duration': 44.209, 'max_score': 3922.424, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43922424.jpg'}, {'end': 4032.998, 'src': 'embed', 'start': 3999.876, 'weight': 1, 'content': [{'end': 4001.657, 'text': 'Yeah, this is something which we already saw.', 'start': 3999.876, 'duration': 1.781}, {'end': 4009.459, 'text': 'Okay, so friends, I think this is what I wanted to show you: just the basics of how we execute k-means.', 'start': 4002.637, 'duration': 6.822}, {'end': 4011.399, 'text': "It's a very powerful algorithm.", 'start': 4009.659, 'duration': 1.74}, {'end': 4015.507, 'text': 'And there is a question from Amruta.', 'start': 4012.205, 'duration': 3.302}, {'end': 4027.615, 'text': 'The data set has categorical and continuous variables.', 'start': 4016.348, 'duration': 11.267}, {'end': 4032.998, 'text': 'Does K-means accept such a data set? 
It can accept categorical as well.', 'start': 4027.835, 'duration': 5.163}], 'summary': 'K-means is a powerful algorithm that can accept categorical and continuous variables in the data set.', 'duration': 33.122, 'max_score': 3999.876, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43999876.jpg'}, {'end': 4118.795, 'src': 'embed', 'start': 4091.228, 'weight': 3, 'content': [{'end': 4101.317, 'text': 'Because we apply k-means for telecom churn prediction, this data set will have customer information that is both categorical and continuous.', 'start': 4091.228, 'duration': 10.089}, {'end': 4104.482, 'text': "So I didn't understand how to apply it in this case.", 'start': 4101.84, 'duration': 2.642}, {'end': 4112.89, 'text': 'You will have both coming in this case, Amruta? Customer information, we just, okay.', 'start': 4106.604, 'duration': 6.286}, {'end': 4118.795, 'text': 'Okay, I need to see this, Amruta.', 'start': 4116.912, 'duration': 1.883}], 'summary': 'Using k-means for telecom churn prediction with customer data.', 'duration': 27.567, 'max_score': 4091.228, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu44091228.jpg'}, {'end': 4255.453, 'src': 'embed', 'start': 4200.036, 'weight': 4, 'content': [{'end': 4202.617, 'text': 'But my point is, you know, how do we determine?', 'start': 4200.036, 'duration': 2.581}, {'end': 4209.582, 'text': 'For example, a lot of people will not know what the ideal way is, what the ideal number of clusters should be, right?', 'start': 4202.617, 'duration': 6.965}, {'end': 4217.046, 'text': 'So the elbow curve is one thing, but you know, that is something you need to.', 'start': 4210.422, 'duration': 6.624}, {'end': 4226.539, 'text': "You know, you need to understand the basics very clearly, because it's not enough for you to just know the mechanics of clustering.", 'start': 4217.046, 'duration': 
9.493}, {'end': 
4232.564, 'text': 'How do we even do clustering? I think churn prediction is what Amruta is asking about.', 'start': 4226.579, 'duration': 5.985}, {'end': 4240.451, 'text': 'There are so many things people use clustering for; they even predict crime rates and all that.', 'start': 4232.684, 'duration': 7.767}, {'end': 4245.134, 'text': 'They predict what the number of fatalities in accidents could be.', 'start': 4240.891, 'duration': 4.243}, {'end': 4247.997, 'text': 'There are so many things they do with clustering techniques.', 'start': 4245.675, 'duration': 2.322}, {'end': 4255.453, 'text': 'Right, so then book to improve statistical mathematics.', 'start': 4248.769, 'duration': 6.684}], 'summary': 'Understanding clustering basics is crucial for various predictions and applications, such as predicting crime rates and fatalities in accidents.', 'duration': 55.417, 'max_score': 4200.036, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu44200036.jpg'}], 'start': 3752.496, 'title': 'K-means clustering algorithm and its use in telecom churn prediction', 'summary': 'Explains the k-means clustering algorithm, which involves shuffling data points into clusters, shifting centroids to form homogeneous clusters, and plotting the final clusters, with the algorithm being capable of accepting both categorical and continuous variables. 
Additionally, it discusses the application of k-means for telecom churn prediction, addressing challenges in applying k-means to categorical and continuous customer information, determining the ideal number of clusters, and the potential applications of clustering techniques in predicting crime rates and fatalities.', 'chapters': [{'end': 4090.187, 'start': 3752.496, 'title': 'K-means clustering algorithm', 'summary': 'Explains the k-means clustering algorithm, which involves shuffling data points into clusters, shifting centroids to form homogeneous clusters, and plotting the final clusters, with the algorithm being capable of accepting both categorical and continuous variables.', 'duration': 337.691, 'highlights': ["The algorithm involves shuffling data points into clusters, shifting centroids gradually to form homogeneous clusters, and stopping when there's no further movement, resulting in perfectly formed clusters.", 'The process includes plotting the clusters, showing the gradual shifting of centroids, and assigning cluster information to each data point, with the algorithm being powerful and capable of accepting categorical as well as continuous variables.', "The presenter addresses the audience's questions about the data distribution among clusters, emphasizing that the algorithm can handle both categorical and continuous variables, and expressing confidence that there is a way to address data set combinations."]}, {'end': 4398.617, 'start': 4091.228, 'title': 'K-means for telecom churn prediction', 'summary': 'Discusses the application of k-means for telecom churn prediction, addressing challenges in applying k-means to categorical and continuous customer information, determining the ideal number of clusters, and the potential applications of clustering techniques in predicting crime rates and fatalities.', 'duration': 307.389, 'highlights': ['The chapter addresses challenges in applying k-means to categorical and continuous customer information, 
suggesting the conversion of categorical data to factor variables.', 'It emphasizes the importance of understanding the basics of clustering, particularly in determining the ideal number of clusters, such as using the elbow curve method.', 'The potential applications of clustering techniques are highlighted, including predicting crime rates, fatalities in accidents, and improving statistical mathematics.']}], 'duration': 646.121, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BfowBtIxNu4/pics/BfowBtIxNu43752496.jpg', 'highlights': ["The algorithm involves shuffling data points into clusters, shifting centroids gradually to form homogeneous clusters, and stopping when there's no further movement, resulting in perfectly formed clusters.", 'The process includes plotting the clusters, showing the gradual shifting of centroids, and assigning cluster information to each data point, with the algorithm being powerful and capable of accepting categorical as well as continuous variables.', "The presenter addresses the audience's questions about the data distribution among clusters, emphasizing that the algorithm can handle both categorical and continuous variables, and expressing confidence that there is a way to address data set combinations.", 'The chapter addresses challenges in applying k-means to categorical and continuous customer information, suggesting the conversion of categorical data to factor variables.', 'It emphasizes the importance of understanding the basics of clustering, particularly in determining the ideal number of clusters, such as using the elbow curve method.', 'The potential applications of clustering techniques are highlighted, including predicting crime rates, fatalities in accidents, and improving statistical mathematics.']}], 'highlights': ['Data science involves extracting meaningful and actionable knowledge from data, likening data to gold in terms of its value.', 'Data is coming from various sources such as 
sensors, logs, social media, wearables, and devices, with data volumes projected to multiply further.', 'The top five data science algorithms covered are decision tree, random forest, association rule mining, linear regression, and k-means clustering.', 'The session includes a hands-on lab towards the end, providing practical application of the discussed algorithms.', 'Data scientists are in high demand and will continue to be in even greater demand in the future.', 'Data analysis drives critical decisions in the economy and within companies, emphasizing the importance of smart data analysis.', 'Becoming a top-notch data scientist requires a combination of talent, knowledge, statistics, and software engineering skills.', 'Machine learning is a complex field within artificial intelligence, used in various industries including healthcare for diagnosis and medication recommendations.', 'The increasing power of chess engines is evident from the fact that even established grandmasters like Kasparov have lost to them.', 'The Random Forest algorithm mirrors the practice of seeking multiple opinions for critical decisions, since combining many opinions tends to succeed more often than relying on a single one.', 'Random forest is an ensemble of decision tree models and typically achieves a higher success rate than a single decision tree.', 'Association rule mining analyzes shopping basket data to determine associations between items, influencing product placement and marketing strategies.', 'Regression is a popular technique for prediction and forecasting, and linear regression is one of its most widely used forms. 
It is used, for example, to predict the price of an apartment from factors such as square footage, location, amenities, and neighborhood safety.', 'K-means clustering involves classifying objects into groups that are as dissimilar as possible from one another but as similar as possible within each group.', 'The company allocates raises based on performance levels: the top 5% of performers receive a 20% raise, the next 50% receive a 10% raise, and the lowest 5% are sent for training or laid off.', "K-means clustering aims to create distinct clusters within a dataset of 50 workers' performance scores, ensuring meaningful separation between high performers and low performers.", "The algorithm involves shuffling data points into clusters, shifting centroids gradually to form homogeneous clusters, and stopping when there's no further movement, resulting in perfectly formed clusters."]}
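The k-means chapters summarized above describe a loop (assign each point to its nearest centroid, move each centroid, stop when nothing moves) and the elbow curve for choosing the number of clusters. As an illustration only, here is a minimal Python sketch of that loop (Lloyd's algorithm) on one-dimensional performance scores; it is not the code used in the session's lab. The function names `kmeans_1d` and `wcss`, the seeds, and the 50 worker scores are all hypothetical. Standard k-means computes Euclidean distances and means, so the sketch uses numeric scores only; categorical columns would first need a numeric encoding or a different clustering method.

```python
import random


def kmeans_1d(scores, k, max_iter=100, seed=0):
    """Hypothetical helper: Lloyd's k-means on 1-D data, returns (centroids, labels)."""
    rng = random.Random(seed)
    # Pick k of the scores at random as the initial centroids.
    centroids = rng.sample(scores, k)
    labels = [0] * len(scores)
    for _ in range(max_iter):
        # Assignment step: each score joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(x - centroids[j])) for x in scores]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [x for x, lab in zip(scores, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # no centroid moved, so the clusters are stable
        centroids = new_centroids
    return centroids, labels


def wcss(scores, centroids, labels):
    """Within-cluster sum of squared distances (the y-axis of an elbow curve)."""
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(scores, labels))


if __name__ == "__main__":
    # Hypothetical scores for 50 workers, drawn from three loose groups.
    rng = random.Random(42)
    scores = ([rng.gauss(90, 3) for _ in range(5)]
              + [rng.gauss(70, 5) for _ in range(40)]
              + [rng.gauss(40, 4) for _ in range(5)])
    # Elbow curve data: WCSS for k = 1..6.
    for k in range(1, 7):
        c, lab = kmeans_1d(scores, k)
        print(k, round(wcss(scores, c, lab), 1))
```

Plotting the printed WCSS values against k gives the elbow curve; the k after which the curve stops dropping steeply is the usual choice. Because the initial centroids are random, results can vary with the seed, so in practice it is common to try several seeds and keep the run with the lowest WCSS.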