title

Machine Learning Full Course - Learn Machine Learning 10 Hours | Machine Learning Tutorial | Edureka

description

🔥 Machine Learning Engineer Masters Program (Use Code "๐๐๐๐๐๐๐๐๐"): https://www.edureka.co/masters-program/machine-learning-engineer-training
This Edureka Machine Learning Full Course video will help you understand and learn Machine Learning algorithms in detail. This Machine Learning tutorial is ideal for both beginners and professionals who want to master Machine Learning algorithms. Below are the topics covered in this Machine Learning tutorial:
00:00 Introduction to Machine Learning Full Course
2:47 What is Machine Learning?
4:08 AI vs ML vs Deep Learning
5:43 How does Machine Learning work?
6:18 Types of Machine Learning
6:43 Supervised Learning
8:38 Supervised Learning Examples
11:49 Unsupervised Learning
13:54 Unsupervised Learning Examples
16:09 Reinforcement Learning
18:39 Reinforcement Learning Examples
19:34 AI vs Machine Learning vs Deep Learning
22:09 Examples of AI
23:39 Examples of Machine Learning
25:04 What is Deep Learning?
25:54 Example of Deep Learning
27:29 Machine Learning vs Deep Learning
33:49 Jupyter Notebook Tutorial
34:49 Installation
50:24 Machine Learning Tutorial
51:04 Classification Algorithm
51:39 Anomaly Detection Algorithm
52:14 Clustering Algorithm
53:34 Regression Algorithm
54:14 Demo: Iris Dataset
1:12:11 Stats & Probability for Machine Learning
1:16:16 Categories of Data
1:16:36 Qualitative Data
1:17:51 Quantitative Data
1:20:55 What is Statistics?
1:23:25 Statistics Terminologies
1:24:30 Sampling Techniques
1:27:15 Random Sampling
1:28:05 Systematic Sampling
1:28:35 Stratified Sampling
1:29:35 Types of Statistics
1:32:21 Descriptive Statistics
1:37:36 Measures of Spread
1:44:01 Information Gain & Entropy
1:56:08 Confusion Matrix
2:00:53 Probability
2:03:19 Probability Terminologies
2:04:55 Types of Events
2:05:35 Probability Distribution
2:10:45 Types of Probability
2:11:10 Marginal Probability
2:11:40 Joint Probability
2:12:35 Conditional Probability
2:13:30 Use-Case
2:17:25 Bayes Theorem
2:23:40 Inferential Statistics
2:24:00 Point Estimation
2:26:50 Interval Estimate
2:30:10 Margin of Error
2:34:20 Hypothesis Testing
2:41:25 Supervised Learning Algorithms
2:42:40 Regression
2:44:05 Linear vs Logistic Regression
2:49:55 Understanding Linear Regression Algorithm
3:11:10 Logistic Regression Curve
3:18:34 Titanic Data Analysis
3:58:39 Decision Tree
3:58:59 What is Classification?
4:01:24 Types of Classification
4:08:35 Decision Tree
4:14:20 Decision Tree Terminologies
4:18:05 Entropy
4:44:05 Credit Risk Detection Use-case
4:51:45 Random Forest
5:00:40 Random Forest Use-Cases
5:04:29 Random Forest Algorithm
5:16:44 KNN Algorithm
5:20:09 KNN Algorithm Working
5:27:24 KNN Demo
5:35:05 Naive Bayes
5:40:55 Naive Bayes Working
5:44:25 Industrial Use of Naive Bayes
5:50:25 Types of Naive Bayes
5:51:25 Steps involved in Naive Bayes
5:52:05 PIMA Diabetic Test Use Case
6:04:55 Support Vector Machine
6:10:20 Non-Linear SVM
6:12:05 SVM Use-case
6:13:30 K-Means Clustering & Association Rule Mining
6:16:33 Types of Clustering
6:17:34 K-Means Clustering
6:17:59 K-Means Working
6:21:54 Pros & Cons of K-Means Clustering
6:23:44 K-Means Demo
6:28:44 Hierarchical Clustering
6:31:14 Association Rule Mining
6:34:04 Apriori Algorithm
6:39:19 Apriori Algorithm Demo
6:43:29 Reinforcement Learning
6:46:39 Reinforcement Learning: Counter-Strike Example
6:53:59 Markov Decision Process
6:58:04 Q-Learning
7:02:39 The Bellman Equation
7:12:14 Transitioning to Q-Learning
7:17:29 Implementing Q-Learning
7:23:33 Machine Learning Projects
7:38:53 Who is a ML Engineer?
7:39:28 ML Engineer Job Trends
7:40:43 ML Engineer Salary Trends
7:42:33 ML Engineer Skills
7:44:08 ML Engineer Job Description
7:45:53 ML Engineer Resume
7:54:48 Machine Learning Interview Questions
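Two of the statistics topics listed above, Entropy (1:44:01) and Information Gain, come down to one short formula that the Decision Tree section (3:58:39) builds on. A minimal, dependency-free Python sketch; the toy "yes"/"no" labels are made up for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy: -sum(p_i * log2(p_i)) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent_labels, splits):
    """Entropy of the parent set minus the size-weighted entropy of each split."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in splits)
    return entropy(parent_labels) - weighted

# A 50/50 class mix has maximal entropy (1 bit); a split that separates the
# classes perfectly recovers all of it as information gain.
parent = ["yes"] * 5 + ["no"] * 5
print(entropy(parent))                                      # 1.0
print(information_gain(parent, [["yes"] * 5, ["no"] * 5]))  # 1.0
```

A decision tree chooses, at each node, the attribute whose split maximizes this gain, which is why the course covers entropy before the tree algorithm itself.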
Edureka Machine Learning Training
🔵 Machine Learning Course using Python: http://bit.ly/38BaJco
🔵 Machine Learning Engineer Masters Program: http://bit.ly/2UYS46r
🔵 Python Masters Program: https://bit.ly/3cVibjY
🔵 Python Programming Training: http://bit.ly/38ykZCg
🔵 Data Scientist Masters Program: http://bit.ly/31ZsWOn
PG Diploma in Artificial Intelligence and Machine Learning with NIT Warangal: https://www.edureka.co/post-graduate/machine-learning-and-ai
🔴 Subscribe to our channel to get the latest video updates: https://goo.gl/6ohpTV
⏩ NEW Top 10 Technologies To Learn In 2024 - https://www.youtube.com/watch?v=vaLXPv0ewHU
Telegram: https://t.me/edurekaupdates
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
SlideShare: https://www.slideshare.net/EdurekaIN
Castbox: https://castbox.fm/networks/505?country=IN
Meetup: https://www.meetup.com/edureka/
Community: https://www.edureka.co/community/
For more information, please write back to us at sales@edureka.in or call us at IND: 9606058406 / US: +18338555775 (toll-free).
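The hands-on Jupyter section of the course (50:24 onward) opens by printing the versions of the libraries the iris demo depends on. A stand-alone sketch of that sanity check; the library list (scipy, numpy, matplotlib, pandas, scikit-learn) is taken from the course, and each import is guarded so the script still runs on a partial install:

```python
import sys

# Interpreter version first, as the demo does.
print("Python", sys.version.split()[0])

# The demo checks scipy, numpy, matplotlib, pandas and scikit-learn;
# report anything missing instead of crashing.
for name in ("scipy", "numpy", "matplotlib", "pandas", "sklearn"):
    try:
        module = __import__(name)
        print(name, getattr(module, "__version__", "unknown"))
    except ImportError:
        print(name, "not installed")
```

If any library is reported missing, the course recommends fixing the environment (it suggests the Anaconda distribution) before continuing with the demo.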

detail

{'title': 'Machine Learning Full Course - Learn Machine Learning 10 Hours | Machine Learning Tutorial | Edureka', 'heatmap': [{'end': 9373.162, 'start': 9021.705, 'weight': 1}], 'summary': 'The 10-hour machine learning course covers the huge demand for ml, its applications, and the prediction of 40% new application projects requiring ml co-developers by 2022, generating $3.9 trillion. it delves into supervised, unsupervised, and reinforcement learning, statistical concepts, decision trees, logistic regression, random forest, knn, naive bayes, svm, clustering, association rule mining, reinforcement learning, robot navigation, latest trends, essential skills, and job trends with an average us salary of $111,490 and 7,19,646 rupees in india.', 'chapters': [{'end': 911.179, 'segs': [{'end': 159.987, 'src': 'embed', 'start': 123.856, 'weight': 0, 'content': [{'end': 126.557, 'text': 'As a part of fifth module, we have reinforcement learning.', 'start': 123.856, 'duration': 2.701}, {'end': 131.759, 'text': 'Here we are going to discuss about reinforcement learning in depth and also about Q-learning algorithm.', 'start': 126.957, 'duration': 4.802}, {'end': 135.281, 'text': "Finally, in the end, it's all about to make you industry ready.", 'start': 132.419, 'duration': 2.862}, {'end': 141.663, 'text': 'so here we are going to discuss about three different projects, which are based on supervised learning,', 'start': 136.261, 'duration': 5.402}, {'end': 144.643, 'text': 'unsupervised learning and reinforcement learning.', 'start': 141.663, 'duration': 2.98}, {'end': 151.085, 'text': "finally, in the end, i'll tell you about some of the skills that you need to become a machine learning engineer okay,", 'start': 144.643, 'duration': 6.442}, {'end': 156.727, 'text': "and also i'll be discussing about some of the important questions that are asked in a machine learning interview.", 'start': 151.085, 'duration': 5.642}, {'end': 159.987, 'text': 'fine, with this we come to the 
end of this agenda.', 'start': 156.727, 'duration': 3.26}], 'summary': 'Module 5 covers reinforcement learning, q-learning, and industry-relevant projects and skills for machine learning engineers.', 'duration': 36.131, 'max_score': 123.856, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ123856.jpg'}, {'end': 572.819, 'src': 'embed', 'start': 540.212, 'weight': 1, 'content': [{'end': 544.194, 'text': 'Now one strain the resulting predictive model can be deployed to the production environment.', 'start': 540.212, 'duration': 3.982}, {'end': 545.454, 'text': 'You can see a mobile app.', 'start': 544.514, 'duration': 0.94}, {'end': 551.116, 'text': 'For example, once deployed it is ready to recognize the new pictures right now.', 'start': 545.474, 'duration': 5.642}, {'end': 555.318, 'text': 'You might be wondering why this category of machine learning is named as supervised learning.', 'start': 551.156, 'duration': 4.162}, {'end': 564.476, 'text': 'Well, it is called as supervised learning because the process of an algorithm learning from the training data set can be thought of as a teacher supervising the learning process.', 'start': 555.974, 'duration': 8.502}, {'end': 572.819, 'text': 'We know the correct answers the algorithm iteratively makes while predicting on the training data and is corrected by the teacher.', 'start': 565.197, 'duration': 7.622}], 'summary': 'Supervised learning enables algorithm to learn from training data with teacher supervision.', 'duration': 32.607, 'max_score': 540.212, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ540212.jpg'}, {'end': 644.763, 'src': 'embed', 'start': 618.065, 'weight': 2, 'content': [{'end': 622.309, 'text': 'So these are just an example of supervised learning next comes the weather app.', 'start': 618.065, 'duration': 4.244}, {'end': 624.571, 'text': 'Based on some of the prior knowledge.', 'start': 
622.87, 'duration': 1.701}, {'end': 629.274, 'text': 'like when it is sunny, the temperature is higher, when it is cloudy, humidity is higher.', 'start': 624.571, 'duration': 4.703}, {'end': 630.134, 'text': 'any kind of that?', 'start': 629.274, 'duration': 0.86}, {'end': 632.055, 'text': 'they predict the parameters for a given time.', 'start': 630.134, 'duration': 1.921}, {'end': 639.2, 'text': 'So this is also an example of supervised learning, as you are feeding the data to the machine and telling that whenever it is sunny,', 'start': 632.676, 'duration': 6.524}, {'end': 640.4, 'text': 'the temperature should be higher.', 'start': 639.2, 'duration': 1.2}, {'end': 642.622, 'text': 'whenever it is cloudy, the humidity should be higher.', 'start': 640.4, 'duration': 2.222}, {'end': 644.763, 'text': "So it's an example of supervised learning.", 'start': 642.982, 'duration': 1.781}], 'summary': 'Supervised learning predicts weather parameters based on prior knowledge, such as temperature and humidity changes with weather conditions.', 'duration': 26.698, 'max_score': 618.065, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ618065.jpg'}], 'start': 6.994, 'title': 'Machine learning and its applications', 'summary': "Discusses the huge demand for machine learning, predicting 40% new application development projects requiring machine learning co-developers by 2022, generating a revenue of around 3.9 trillion dollars. it introduces a new well-structured machine learning full course and outlines the six different modules it covers. additionally, it covers the fundamentals of machine learning, including supervised, unsupervised, and reinforcement learning, along with the differentiation between machine learning, artificial intelligence, and deep learning. 
the chapter also explains the distinctions between artificial intelligence, machine learning, and deep learning, delves into the process of machine learning, and provides a comprehensive overview of supervised learning, including its mathematical definition, examples, and popular use cases in various domains. furthermore, it explains the application of supervised and unsupervised learning in predicting credit card holder's credit worthiness, healthcare sector's patient readmission rates, and retail sector's customer purchasing patterns.", 'chapters': [{'end': 85.196, 'start': 6.994, 'title': 'Machine learning: huge demand and new course', 'summary': 'Discusses the huge demand for machine learning, with a prediction of 40% new application development projects requiring machine learning co-developers by 2022, generating a revenue of around 3.9 trillion dollars. it introduces a new well-structured machine learning full course and outlines the six different modules it covers.', 'duration': 78.202, 'highlights': ["Gartner predicts that by 2022, there'll be at least 40% of new application development projects requiring machine learning co-developers, generating a revenue of around 3.9 trillion dollars.", 'Introduction of a well-structured machine learning full course at Edureka designed to gradually cover beginner to advanced topics.', 'The course is designed to cover six different modules, including an introduction to machine learning, statistics and probability, and more.']}, {'end': 277.625, 'start': 85.696, 'title': 'Introduction to machine learning', 'summary': 'Covers the fundamentals of machine learning, including supervised, unsupervised, and reinforcement learning, along with the differentiation between machine learning, artificial intelligence, and deep learning. 
the session also includes practical projects and essential skills for becoming a machine learning engineer.', 'duration': 191.929, 'highlights': ['The chapter covers the fundamentals of machine learning, including supervised, unsupervised, and reinforcement learning.', 'Practical projects and essential skills for becoming a machine learning engineer are included in the session.', 'Differentiation between machine learning, artificial intelligence, and deep learning is explained.']}, {'end': 664.268, 'start': 278.197, 'title': 'Ai, machine learning & supervised learning', 'summary': 'Explains the distinctions between artificial intelligence, machine learning, and deep learning, delves into the process of machine learning, and provides a comprehensive overview of supervised learning, including its mathematical definition, examples, and popular use cases in various domains.', 'duration': 386.071, 'highlights': ['Supervised learning is a machine learning method where each instance of a training data set consists of different input attributes and an expected output, and the algorithm learns the input pattern that generates the expected output.', 'The training data set for supervised learning can be composed of labeled pictures of ducks and non-ducks, resulting in a predictive model capable of associating a label (duck or not duck) to new images.', 'Popular use cases of supervised learning include speech automation in mobile phones, weather prediction based on prior knowledge, and biometric attendance validation in various sectors.']}, {'end': 911.179, 'start': 664.268, 'title': 'Machine learning in banking, healthcare, and retail', 'summary': "Explains the application of supervised and unsupervised learning in predicting credit card holder's credit worthiness, healthcare sector's patient readmission rates, and retail sector's customer purchasing patterns. 
it also delves into the concept of unsupervised learning, its algorithms, and real-life examples to illustrate its application.", 'duration': 246.911, 'highlights': ['Supervised learning is used to predict the credit worthiness of a credit card holder by building a machine learning model to look for faulty attributes by providing it with data on delinquent and non-delinquent customers.', "In the healthcare sector, supervised learning is used to predict patient's readmission rates by building a regression model, using data on patient's treatment, administration, and readmissions to show variables that best correlate with readmission.", 'In the retail sector, supervised learning is used to analyze the product that a customer buys together, by building a model to identify frequent item sets and Association rule from the transactional data.', 'Unsupervised learning aims to model the underlying structure or distribution in the data in order to learn more about the data, with examples such as clustering.', 'Unsupervised learning is used to classify strangers at a party based on attributes like gender, age group, dressing, education, or any other observable criteria.']}], 'duration': 904.185, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ6994.jpg', 'highlights': ['By 2022, 40% new app projects need ML co-developers, generating $3.9 trillion revenue', 'Edureka offers a structured ML course with six modules, covering beginner to advanced topics', 'Fundamentals of ML include supervised, unsupervised, and reinforcement learning', "Supervised learning predicts credit card holder's credit worthiness by building a model", "Supervised learning predicts healthcare sector's patient readmission rates using regression model", "Supervised learning analyzes retail sector's customer purchasing patterns by identifying frequent item sets", 'Unsupervised learning aims to model underlying data structure, with examples such as 
clustering']}, {'end': 3094.239, 'segs': [{'end': 1327.417, 'src': 'embed', 'start': 1303.948, 'weight': 3, 'content': [{'end': 1310.531, 'text': 'Those working on it took pride in their crafts building bricks and chiseling stone that was going to be placed into the great structure.', 'start': 1303.948, 'duration': 6.583}, {'end': 1318.834, 'text': 'So, as AI researchers, we should think of ourselves as humble brick makers whose job is to study how to build components,', 'start': 1311.131, 'duration': 7.703}, {'end': 1321.995, 'text': 'example passes planners or learning algorithm or etc.', 'start': 1318.834, 'duration': 3.161}, {'end': 1327.417, 'text': 'anything that someday someone and somewhere will integrate into the intelligent systems.', 'start': 1321.995, 'duration': 5.422}], 'summary': 'Ai researchers should think of themselves as humble brick makers, studying how to build components for integration into intelligent systems.', 'duration': 23.469, 'max_score': 1303.948, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ1303948.jpg'}, {'end': 2008.116, 'src': 'embed', 'start': 1969.625, 'weight': 1, 'content': [{'end': 1974.987, 'text': "but we don't know what the neurons are supposed to model and what these layers of neuron were doing collectively.", 'start': 1969.625, 'duration': 5.362}, {'end': 1977.712, 'text': 'So he failed to interpret the result.', 'start': 1975.692, 'duration': 2.02}, {'end': 1984.494, 'text': 'on the other hand, machine learning algorithm like decision tree gives us a crisp rule for white chose and what it chose.', 'start': 1977.712, 'duration': 6.782}, {'end': 1987.954, 'text': 'So it is particularly easy to interpret the reasoning behind it.', 'start': 1985.034, 'duration': 2.92}, {'end': 1995.456, 'text': 'Therefore the algorithms like decision tree and linear or logistic regression are primarily used in industry for interpretability.', 'start': 1988.495, 'duration': 6.961}, 
{'end': 1997.977, 'text': 'Let me summarize things for you.', 'start': 1996.176, 'duration': 1.801}, {'end': 2001.077, 'text': 'machine learning uses algorithm to parse the data.', 'start': 1997.977, 'duration': 3.1}, {'end': 2005.238, 'text': 'learn from the data and make informed decision based on what it has learned.', 'start': 2001.077, 'duration': 4.161}, {'end': 2008.116, 'text': 'Fine at this deep learning structures,', 'start': 2005.754, 'duration': 2.362}], 'summary': 'Machine learning algorithms like decision tree and regression are used in industry for interpretability.', 'duration': 38.491, 'max_score': 1969.625, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ1969625.jpg'}, {'end': 2685.171, 'src': 'embed', 'start': 2656.713, 'weight': 5, 'content': [{'end': 2660.395, 'text': 'so So, as you can see, we have renamed this particular cell.', 'start': 2656.713, 'duration': 3.682}, {'end': 2668.043, 'text': 'Now auto save option should be on the next to the title as you can see last checkpoint a few days ago unsaved changes.', 'start': 2661.036, 'duration': 7.007}, {'end': 2670.465, 'text': 'The auto save option is always on.', 'start': 2668.783, 'duration': 1.682}, {'end': 2678.748, 'text': 'What we do is with an accurate name we can find the selection and this particular notebook very easily from the notebook home page.', 'start': 2671.184, 'duration': 7.564}, {'end': 2685.171, 'text': "So if you select your browser's home tab and refresh you will find this new window name displayed here again.", 'start': 2679.228, 'duration': 5.943}], 'summary': 'A cell has been renamed, and the auto-save option is always on.', 'duration': 28.458, 'max_score': 2656.713, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ2656713.jpg'}, {'end': 2922.263, 'src': 'embed', 'start': 2897.147, 'weight': 0, 'content': [{'end': 2903.051, 'text': "how well the script 
performs and what information is stored in the metadata, especially if it's a large data set.", 'start': 2897.147, 'duration': 5.904}, {'end': 2908.794, 'text': "So our Python script accesses the iris data set here that's built into one of the Python packages.", 'start': 2903.631, 'duration': 5.163}, {'end': 2916.119, 'text': 'Now all we are looking into to do is to read in slightly large number of items and calculate some basic operations on the data set.', 'start': 2909.295, 'duration': 6.824}, {'end': 2922.263, 'text': 'So first of all, what we need to do is from sklearn import the data set.', 'start': 2916.539, 'duration': 5.724}], 'summary': 'Python script accesses iris dataset to perform operations on a large data set.', 'duration': 25.116, 'max_score': 2897.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ2897147.jpg'}, {'end': 3101.028, 'src': 'embed', 'start': 3076.454, 'weight': 4, 'content': [{'end': 3084.936, 'text': 'Classification is a supervised learning approach in which the computer program learns from the input given to it and then uses this learning to classify new observation.', 'start': 3076.454, 'duration': 8.482}, {'end': 3094.239, 'text': 'Some examples of classification problems are speech organization handwriting organization biometric identification document classification Etc.', 'start': 3085.416, 'duration': 8.823}, {'end': 3101.028, 'text': 'So next is the anomaly detection algorithm where you identify the unusual data point.', 'start': 3095.527, 'duration': 5.501}], 'summary': 'Supervised learning for classification and anomaly detection explained.', 'duration': 24.574, 'max_score': 3076.454, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ3076454.jpg'}], 'start': 911.659, 'title': 'Machine learning applications and comparison with deep learning', 'summary': 'Covers applications of unsupervised learning in banking, 
healthcare, and retail sectors, reinforcement learning, and discusses the differences between machine learning and deep learning, including training and testing time, interpretability, and practical applications.', 'chapters': [{'end': 1266.287, 'start': 911.659, 'title': 'Applications of machine learning', 'summary': 'Discusses the applications of unsupervised learning in banking, healthcare, and retail sectors, as well as the concept of reinforcement learning, its use cases, and its application in different industries, with examples and statistics.', 'duration': 354.628, 'highlights': ['Reinforcement learning in banking sector', 'Reinforcement learning in healthcare sector', 'Reinforcement learning in retail sector', 'Explanation of reinforcement learning and its application', 'Overview of artificial intelligence and machine learning']}, {'end': 1907.764, 'start': 1266.507, 'title': 'Understanding ai, machine learning, and deep learning', 'summary': 'Provides an overview of ai, machine learning, and deep learning, highlighting the importance of ai in replicating human behavior, the role of machine learning in data-driven decision-making, and the increasing significance of deep learning in recognizing complex patterns and features. 
it covers the challenges that led to the development of machine learning, the evolution from symbolic approaches to statistical methods, and the comparison between machine learning and deep learning in terms of data dependencies, hardware dependencies, feature engineering, problem-solving approach, and execution time.', 'duration': 641.257, 'highlights': ['AI replicates human behavior and enables machines to learn from experience.', 'Developing AI is akin to building a complex structure, with AI researchers focusing on creating components integrated into intelligent systems.', "AI examples in everyday life include Apple's Siri, playing computer, Tesla's self-driving car, based on deep learning and natural language processing.", 'Machine learning enables computers to make data-driven decisions and improve over time when exposed to new data.', 'Deep learning, a subset of machine learning, employs large amounts of data to learn complex functions without relying on specific algorithms.', 'Comparison between machine learning and deep learning in terms of data dependencies, hardware dependencies, feature engineering, problem-solving approach, and execution time.']}, {'end': 2154.679, 'start': 1908.125, 'title': 'Comparison: machine learning vs. 
deep learning', 'summary': 'Discusses the differences between machine learning and deep learning, highlighting aspects such as training and testing time, interpretability, and practical applications, emphasizing that deep learning is a subfield of machine learning and is usually behind the most human-like artificial intelligence.', 'duration': 246.554, 'highlights': ['Deep learning algorithm takes much less time to run during testing compared to machine learning algorithms.', 'Interpretability is a critical factor for comparison of machine learning and deep learning, with machine learning algorithms like decision tree and logistic regression being primarily used in the industry for interpretability.', "Deep learning is a subfield of machine learning, and it is usually what's behind the most human-like artificial intelligence.", 'Jupyter is a modern tool that allows data scientists to record their complete analysis process and is derived from the combination of Julia, Python, and R.', 'It is strongly recommended to install Python and Jupyter using Anaconda distribution, which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.']}, {'end': 2386.11, 'start': 2154.719, 'title': 'Introduction to jupyter notebook', 'summary': "Introduces the jupyter notebook, explaining its components and functionalities, and demonstrates the interface's key features including launching, file management, and workspace organization.", 'duration': 231.391, 'highlights': ['A Jupyter notebook is fundamentally a JSON file with metadata, notebook format, and list of cells.', 'The Jupyter user interface has various components, necessitating familiarity for daily use.', 'The file tab in Jupyter displays a list of current files in the directory.', 'The upload button enables adding files to the notebook space, while the new menu presents options for text files, folders, terminal, and Python 3.', 'The refresh button in Jupyter is used 
to update the display but is not necessary as the display is reactive to changes in the underlying file structure.']}, {'end': 2678.748, 'start': 2386.611, 'title': 'Jupyter notebook overview', 'summary': 'Explains the functionalities of a jupyter notebook with details on notebook and file count, security features, and configuration options, and emphasizes the workflow and usage of jupyter notebooks.', 'duration': 292.137, 'highlights': ['The notebook section provides options for duplicating, moving, viewing, editing, or deleting notebooks, with a count of 18 notebooks and 0 running scripts.', 'The file section displays and updates counts for the files in the notebook, revealing the presence of 7 files, including data sets, CSV files, and text files.', 'Jupyter notebooks are designed for creating, organizing, and presenting analysis steps, and can incorporate interactive elements for user engagement.', 'Default security mechanisms for Jupyter notebooks include the sanitization of raw HTML and the restriction of running external JavaScripts, aiming to mitigate the risk of executing malicious code.', 'A security digest, combined with a secret known only to the notebook creator, is utilized to prevent the addition of malicious code to the notebook, enhancing its security.']}, {'end': 3094.239, 'start': 2679.228, 'title': 'Jupyter notebook basics', 'summary': 'Introduces the basics of jupyter notebook, including creating and running python code, accessing and analyzing datasets, and understanding machine learning concepts with examples and practical applications.', 'duration': 415.011, 'highlights': ['Jupyter Notebook automatically assigns the extension IPYNB to the file and marks it as running in the Jupyter environment, along with creating it in the local disk space.', 'Executing Python code in Jupyter Notebook and accessing the output with real-life applications of machine learning algorithms like classification.', 'Understanding the basics of Jupyter Notebook, 
including creating, running, and saving Python code, as well as accessing and analyzing datasets with practical examples.']}], 'duration': 2182.58, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ911659.jpg', 'highlights': ['Reinforcement learning in banking sector', 'Reinforcement learning in healthcare sector', 'Reinforcement learning in retail sector', 'Comparison between machine learning and deep learning', 'AI replicates human behavior and enables machines to learn from experience', 'Deep learning algorithm takes much less time to run during testing compared to machine learning algorithms', 'Jupyter notebooks are designed for creating, organizing, and presenting analysis steps', 'Default security mechanisms for Jupyter notebooks include the sanitization of raw HTML and the restriction of running external JavaScripts', 'Understanding the basics of Jupyter Notebook, including creating, running, and saving Python code']}, {'end': 4165.13, 'segs': [{'end': 3406.445, 'src': 'embed', 'start': 3371.378, 'weight': 2, 'content': [{'end': 3372.679, 'text': "Okay, so there's the code.", 'start': 3371.378, 'duration': 1.301}, {'end': 3376.782, 'text': "Let's just copy it copied and let's paste it.", 'start': 3373.019, 'duration': 3.763}, {'end': 3377.943, 'text': 'Okay First.', 'start': 3377.322, 'duration': 0.621}, {'end': 3379.064, 'text': 'Let me summarize things for you.', 'start': 3377.983, 'duration': 1.081}, {'end': 3380.024, 'text': 'What we are doing here.', 'start': 3379.364, 'duration': 0.66}, {'end': 3385.869, 'text': 'We are just checking the version of the different libraries, starting with python, will first check what version of python we are working on.', 'start': 3380.044, 'duration': 5.825}, {'end': 3391.112, 'text': "then we'll check what are the version of sci-fi we are using, then numpy matplotlib, then Panda, then scikit-learn.", 'start': 3385.869, 'duration': 5.243}, {'end': 3395.796, 'text': 
"Okay, so let's execute the run button and see what are the various version of libraries which we are using.", 'start': 3391.353, 'duration': 4.443}, {'end': 3396.803, 'text': 'Hit the run.', 'start': 3396.403, 'duration': 0.4}, {'end': 3399.234, 'text': 'So we are working on python 3.6.', 'start': 3396.923, 'duration': 2.311}, {'end': 3406.445, 'text': '4 sci-fi 1.0 numpy 1.14 matplotlib 2.12 pandas 0.22 and scikit-learn of version 0.19.', 'start': 3399.234, 'duration': 7.211}], 'summary': 'Checked versions of python (3.6), sci-fi (1.0), numpy (1.14), matplotlib (2.12), pandas (0.22), and scikit-learn (0.19).', 'duration': 35.067, 'max_score': 3371.378, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ3371378.jpg'}, {'end': 3480.436, 'src': 'embed', 'start': 3441.214, 'weight': 1, 'content': [{'end': 3445.077, 'text': "if everything is working smoothly, then now it's the time to load the data set.", 'start': 3441.214, 'duration': 3.863}, {'end': 3450.842, 'text': "So, as I said, I'll be using the iris flower data set for this tutorial, but before loading the data set,", 'start': 3445.317, 'duration': 5.525}, {'end': 3455.005, 'text': "let's import all the modules function and the object which we are going to use in this tutorial.", 'start': 3450.842, 'duration': 4.163}, {'end': 3457.176, 'text': 'Same I have already written the set of code.', 'start': 3455.575, 'duration': 1.601}, {'end': 3458.838, 'text': "So let's just copy and paste them.", 'start': 3457.257, 'duration': 1.581}, {'end': 3460.559, 'text': "Let's load all the libraries.", 'start': 3459.158, 'duration': 1.401}, {'end': 3464.643, 'text': 'So these are the various libraries which will be using in our tutorial.', 'start': 3461.36, 'duration': 3.283}, {'end': 3467.425, 'text': 'So everything should work fine without an error.', 'start': 3465.523, 'duration': 1.902}, {'end': 3471.769, 'text': 'If you get an error just stop you need to work on your 
SciPy environment before you continue any further.', 'start': 3467.605, 'duration': 4.164}, {'end': 3473.971, 'text': 'So I guess everything should work fine.', 'start': 3472.389, 'duration': 1.582}, {'end': 3475.332, 'text': "Let's hit the run button and see.", 'start': 3474.011, 'duration': 1.321}, {'end': 3478.434, 'text': 'Okay, it worked.', 'start': 3477.373, 'duration': 1.061}, {'end': 3480.436, 'text': "So let's now move ahead and load the data.", 'start': 3478.755, 'duration': 1.681}], 'summary': 'Loading the iris flower data set using various libraries and ensuring smooth functionality.', 'duration': 39.222, 'max_score': 3441.214, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ3441214.jpg'}, {'end': 3652.382, 'src': 'embed', 'start': 3624.509, 'weight': 3, 'content': [{'end': 3630.212, 'text': 'So this is how my sample data set looks: sepal length, sepal width, petal length, petal width and the class.', 'start': 3624.509, 'duration': 5.703}, {'end': 3637.098, 'text': "Okay So this is how our data set looks like now, let's move on and look at the summary of each attribute.", 'start': 3630.856, 'duration': 6.242}, {'end': 3643.56, 'text': 'What if I want to find out the count, mean, the minimum and the maximum values and some other percentiles as well?', 'start': 3637.598, 'duration': 5.962}, {'end': 3652.382, 'text': "So what should I do then? For that, print data set dot describe. 
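The describe() step narrated here can be sketched in a few lines; a minimal illustration using pandas, with a small made-up stand-in for the iris data (the tutorial itself loads the full 150-row CSV from a URL):

```python
import pandas as pd

# A tiny, hypothetical stand-in for the iris data set; the tutorial
# loads the real CSV, but the describe() call works the same way.
dataset = pd.DataFrame({
    "sepal-length": [5.1, 4.9, 6.3, 5.8],
    "sepal-width":  [3.5, 3.0, 3.3, 2.7],
    "petal-length": [1.4, 1.4, 6.0, 5.1],
    "petal-width":  [0.2, 0.2, 2.5, 1.9],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

# describe() reports count, mean, std, min, max and the 25/50/75
# percentiles for each numeric column (the string 'class' column is skipped).
summary = dataset.describe()
print(summary)
```

Note that the non-numeric species column is excluded from the summary automatically, which is why the output only covers the four measurement columns.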
What it will give? Let's see.", 'start': 3643.66, 'duration': 8.722}], 'summary': 'Analyzing sample dataset for attribute summary and descriptive statistics.', 'duration': 27.873, 'max_score': 3624.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ3624509.jpg'}, {'end': 3916.937, 'src': 'embed', 'start': 3889.568, 'weight': 4, 'content': [{'end': 3894.589, 'text': 'Now if you want you can also create a histogram of each input variable to get a clear idea of the distribution.', 'start': 3889.568, 'duration': 5.021}, {'end': 3896.249, 'text': "So let's create a histogram for it.", 'start': 3894.849, 'duration': 1.4}, {'end': 3898.09, 'text': 'So data set dot hist, okay.', 'start': 3896.429, 'duration': 1.661}, {'end': 3899.69, 'text': 'I need to see it.', 'start': 3898.47, 'duration': 1.22}, {'end': 3900.97, 'text': 'So plot dot show.', 'start': 3899.85, 'duration': 1.12}, {'end': 3901.47, 'text': "Let's see.", 'start': 3901.13, 'duration': 0.34}, {'end': 3906.731, 'text': "So there's my histogram and it seems that we have two input variables that have a Gaussian distribution.", 'start': 3901.71, 'duration': 5.021}, {'end': 3910.872, 'text': 'So this is useful to note as we can use the algorithms that can exploit this assumption.', 'start': 3907.031, 'duration': 3.841}, {'end': 3916.937, 'text': 'Okay, so next comes the multivariate plot now that we have created the univariate plot to understand about each attribute.', 'start': 3911.172, 'duration': 5.765}], 'summary': 'Created histogram for input variables, 2 with gaussian distribution.', 'duration': 27.369, 'max_score': 3889.568, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ3889568.jpg'}, {'end': 3957.312, 'src': 'embed', 'start': 3925.004, 'weight': 9, 'content': [{'end': 3928.266, 'text': 'This can be helpful to spot structured relationships between input variables.', 'start': 3925.004, 'duration': 
3.262}, {'end': 3929.988, 'text': "Okay, so let's create a scatter matrix.", 'start': 3928.446, 'duration': 1.542}, {'end': 3933.431, 'text': 'So for creating a scatter plot we need scatter matrix.', 'start': 3930.248, 'duration': 3.183}, {'end': 3935.873, 'text': 'and we need to pass our data set into it.', 'start': 3933.991, 'duration': 1.882}, {'end': 3939.576, 'text': 'Okay, and then what I want I want to see it so plot dot show.', 'start': 3936.033, 'duration': 3.543}, {'end': 3942.018, 'text': 'So this is how my scatter matrix looks like.', 'start': 3939.756, 'duration': 2.262}, {'end': 3944.641, 'text': "Notice the diagonal grouping of some pairs, right?", 'start': 3942.018, 'duration': 2.623}, {'end': 3947.783, 'text': 'So this suggests a high correlation and a predictable relationship.', 'start': 3944.821, 'duration': 2.962}, {'end': 3949.745, 'text': 'All right, this was our multivariate plot.', 'start': 3947.984, 'duration': 1.761}, {'end': 3957.312, 'text': "Now, let's move on and evaluate some algorithms; it's time to create some models of the data and estimate their accuracy on the basis of unseen data.", 'start': 3949.865, 'duration': 7.447}], 'summary': 'Creating scatter matrix to visualize structured relationships with high correlation in multivariate data, then evaluating algorithms for modeling with accuracy estimation.', 'duration': 32.308, 'max_score': 3925.004, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ3925004.jpg'}, {'end': 4143.18, 'src': 'embed', 'start': 4116.111, 'weight': 0, 'content': [{'end': 4121.493, 'text': 'Okay, so model underscore selection, but before doing that what we have to do is split our training data set into two parts.', 'start': 4116.111, 'duration': 5.382}, {'end': 4129.497, 'text': 'Okay, so dot train underscore test underscore split what we want to split is the value of X and Y.', 'start': 4121.714, 'duration': 7.783}, {'end': 4141.52, 'text': 'Okay, 
and my test size equals the validation size, which is 0.20, correct, and my random state is equal to seed.', 'start': 4129.497, 'duration': 12.023}, {'end': 4143.18, 'text': 'So what is the seed doing here?', 'start': 4141.96, 'duration': 1.22}], 'summary': 'Data set split into 80% training and 20% validation with a random seed.', 'duration': 27.069, 'max_score': 4116.111, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ4116111.jpg'}], 'start': 3095.527, 'title': 'Data analysis techniques and model evaluation', 'summary': 'Covers anomaly detection and clustering for identifying unusual patterns, understanding clustering, regression, and machine learning algorithms, checking library versions, loading datasets, data visualization, and model evaluation, emphasizing the need for validation data sets and the process of splitting data.', 'chapters': [{'end': 3147.746, 'start': 3095.527, 'title': 'Anomaly detection & clustering in data analysis', 'summary': 'Discusses anomaly detection for identifying unusual patterns and its applications in business such as intrusion detection and fraud detection, followed by the use of clustering algorithms to group data based on similar conditions.', 'duration': 52.219, 'highlights': ['Anomaly detection is a technique used to identify unusual patterns that do not conform to expected behavior, with applications such as intrusion detection, system health monitoring, and fraud detection in credit card transactions.', 'Clustering algorithm is used to group data based on similar conditions, enabling tasks like identifying the types of houses in a segment or the type of customers buying a product.']}, {'end': 3369.937, 'start': 3147.746, 'title': 'Understanding clustering, regression, and machine learning algorithms', 'summary': 'Explains the concepts of clustering, regression, and machine learning algorithms, providing examples and applications, and demonstrating how to choose the 
best machine learning model using the iris dataset in python.', 'duration': 222.191, 'highlights': ['The chapter provides an example of clustering by considering a rental store owner clustering customers into 10 different groups based on their purchasing habits to tailor separate strategies for each group.', 'Regression algorithm is highlighted as an important and widely used machine learning and statistics tool for making predictions from data by learning the relationship between the features of the data and some observed continuous valued response, with applications such as stock price prediction.', 'The iris dataset is introduced as a small but well-understood project for machine learning, consisting of 150 observations of iris flowers with four measurement columns in centimeters and a fifth column specifying the species of the flower, providing a good opportunity for practicing classification problems.', 'The chapter demonstrates how to start coding using Anaconda with Python 3 installed, launching a Jupyter notebook for writing and executing Python code, and checking the versions of Python libraries to keep the video short, interactive, and informative.']}, {'end': 3851.665, 'start': 3371.378, 'title': 'Checking library versions and loading iris dataset', 'summary': 'discusses checking the version of different libraries including python, SciPy, numpy, matplotlib, pandas, and scikit-learn, executing the code to verify the versions, loading the iris flower dataset using pandas, and creating visualization plots such as box and whisker plots.', 'duration': 480.287, 'highlights': ['The chapter covers checking the version of different libraries including Python, SciPy, numpy, matplotlib, pandas, and scikit-learn, and executing the code to verify the versions. 
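The version check described in this chapter can be reproduced with a few print statements; a minimal sketch showing the pattern for two of the libraries (scipy, matplotlib, and sklearn follow the same `__version__` convention, and your versions will be much newer than the 2018-era ones in the video):

```python
import sys
import numpy
import pandas

# Print the interpreter and library versions, as the tutorial does
# before loading any data. Only numpy and pandas are shown here;
# scipy, matplotlib, and sklearn expose __version__ the same way.
print("Python:", sys.version)
print("numpy:", numpy.__version__)
print("pandas:", pandas.__version__)
```

Running this before the rest of the notebook is a cheap way to confirm the environment is set up, which is exactly the point of the step in the video.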
Python 3.6.4, SciPy 1.0, numpy 1.14, matplotlib 2.1.2, pandas 0.22, and scikit-learn 0.19 versions are used.', "The process of loading the iris flower dataset using pandas is explained, including specifying the URL for data retrieval, defining column names, and loading the dataset using pandas' read_csv function.", 'Creating visualization plots such as box and whisker plots using the pandas plot function, with the specification of layout structure and coordinate sharing, is demonstrated.']}, {'end': 4165.13, 'start': 3852.146, 'title': 'Data visualization and model evaluation', 'summary': 'Discusses the visualization of input attributes, including histograms and scatter plots, and the evaluation of model accuracy through validation data set creation and tenfold cross validation, with an emphasis on the need for a validation data set and the process of splitting the data set into training and validation sets.', 'duration': 312.984, 'highlights': ["The chapter emphasizes the need for a validation data set to assess the model's performance on unseen data, and the process involves splitting the loaded data into training (80%) and validation (20%) sets, with the importance of using a seed value to keep the random split reproducible.", 'It discusses the creation of various visualizations such as histograms to understand the distribution of input variables and scatter plots to identify the interaction between different attributes, highlighting the presence of a Gaussian distribution in two input variables and the structured relationships and high correlation observed in the scatter matrix.', 'The chapter describes the process of creating a test harness using tenfold cross validation to estimate the accuracy of the model, involving the splitting of the data set into 10 parts for training and testing, and emphasizes the use of algorithms that can exploit the assumption of Gaussian distribution in input variables.', "It explains the significance of evaluating the model's 
accuracy on unseen data and the use of statistical methods to estimate the accuracy of the best model, with a focus on the need for a concrete estimate by evaluating the model on actual unseen data."]}], 'duration': 1069.603, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ3095527.jpg', 'highlights': ['Anomaly detection identifies unusual patterns for intrusion detection, system health monitoring, and fraud detection.', 'Clustering groups data based on similar conditions for tasks like identifying customer segments.', 'Regression is a widely used tool for making predictions from data, with applications such as stock price prediction.', 'The iris dataset is a well-understood project for machine learning, providing a good opportunity for practicing classification problems.', 'The chapter demonstrates how to start coding using Anaconda with Python 3 installed and checking library versions.', 'The process of loading the iris flower dataset using pandas is explained, including specifying the URL for data retrieval.', 'Creating visualization plots such as box and whisker plots using the pandas plot function is demonstrated.', "The chapter emphasizes the need for a validation data set to assess the model's performance on unseen data.", 'The creation of various visualizations such as histograms and scatter plots is discussed.', 'The chapter describes the process of creating a test harness using tenfold cross validation to estimate the accuracy of the model.']}, {'end': 5404.308, 'segs': [{'end': 4222.624, 'src': 'embed', 'start': 4192.64, 'weight': 5, 'content': [{'end': 4194.183, 'text': 'giving a percentage example.', 'start': 4192.64, 'duration': 1.543}, {'end': 4197.906, 'text': "It's 98% accurate or 99% accurate, things like that.", 'start': 4194.343, 'duration': 3.563}, {'end': 4203.992, 'text': "Okay, so we'll be using this scoring variable when we run the build and evaluate each model in the next step.", 
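The split-and-score workflow narrated here (an 80/20 train/validation split, then 10-fold cross-validation with the 'accuracy' scoring metric) can be sketched as follows. The names X, Y, validation_size, and seed mirror the tutorial's, but the data below is a synthetic stand-in rather than the iris CSV:

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.linear_model import LogisticRegression

# Synthetic iris-shaped stand-in: 150 samples, 4 features, 3 classes.
rng = np.random.RandomState(7)
X = rng.rand(150, 4)
Y = rng.randint(0, 3, size=150)

# Hold out 20% as a validation set; the fixed seed makes the random
# split reproducible across runs, which is what the narrator's 'seed' does.
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, Y, test_size=validation_size, random_state=seed)

# 10-fold cross-validation on the training portion; 'accuracy' is the
# fraction of correctly predicted instances, reported as the scoring metric.
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
scores = cross_val_score(LogisticRegression(max_iter=1000),
                         X_train, Y_train, cv=kfold, scoring="accuracy")
print("mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

Because the labels here are random, the accuracy will hover near chance; on the real iris data the same harness yields the high scores discussed in the video.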
'start': 4198.046, 'duration': 5.946}, {'end': 4206.174, 'text': 'So the next part is building models.', 'start': 4204.692, 'duration': 1.482}, {'end': 4211.597, 'text': "Till now we don't know which algorithm would be good for this problem or what configuration to use.", 'start': 4207.034, 'duration': 4.563}, {'end': 4214.118, 'text': "So let's begin with six different algorithms.", 'start': 4211.977, 'duration': 2.141}, {'end': 4220.723, 'text': "I'll be using logistic regression, linear discriminant analysis, k-nearest neighbors, classification and regression trees,", 'start': 4214.339, 'duration': 6.384}, {'end': 4222.624, 'text': 'naive Bayes and support vector machines.', 'start': 4220.723, 'duration': 1.901}], 'summary': 'Using 6 algorithms to build models, aiming for 98-99% accuracy in scoring.', 'duration': 29.984, 'max_score': 4192.64, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ4192640.jpg'}, {'end': 4276.188, 'src': 'embed', 'start': 4233.551, 'weight': 1, 'content': [{'end': 4238.734, 'text': 'which included the KNN algorithm, the CART algorithm, naive Bayes and the support vector machines.', 'start': 4233.551, 'duration': 5.183}, {'end': 4246.82, 'text': 'Okay, so we reset the random number seed before each run to ensure that the evaluation of each algorithm is performed using exactly the same data splits.', 'start': 4239.034, 'duration': 7.786}, {'end': 4249.341, 'text': 'It ensures the results are directly comparable.', 'start': 4247.12, 'duration': 2.221}, {'end': 4251.883, 'text': 'Okay, so let me just copy and paste it.', 'start': 4249.581, 'duration': 2.302}, {'end': 4259.235, 'text': 'Okay So what we are doing here, we are building six different types of models.', 'start': 4252.864, 'duration': 6.371}, {'end': 4266.821, 'text': 'We are building logistic regression, linear discriminant analysis, k-nearest neighbors, decision tree, Gaussian naive Bayes and the support vector machine.', 'start': 4259.435, 
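Spot-checking the six model families named here might look like the sketch below; this is not the video's exact code, and the data is a synthetic stand-in, but the pattern of resetting the seed so every model sees identical folds is the one the narrator describes:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in data (150 samples, 4 features, 3 classes).
rng = np.random.RandomState(7)
X = rng.rand(150, 4)
Y = rng.randint(0, 3, size=150)

models = [
    ("LR", LogisticRegression(max_iter=1000)),
    ("LDA", LinearDiscriminantAnalysis()),
    ("KNN", KNeighborsClassifier()),
    ("CART", DecisionTreeClassifier()),
    ("NB", GaussianNB()),
    ("SVM", SVC()),
]

results = {}
for name, model in models:
    # Re-create the KFold with the same seed for every model, so each
    # algorithm is evaluated on exactly the same data splits and the
    # scores are directly comparable.
    kfold = KFold(n_splits=10, shuffle=True, random_state=7)
    scores = cross_val_score(model, X, Y, cv=kfold, scoring="accuracy")
    results[name] = scores.mean()
    print("%s: %.3f (%.3f)" % (name, scores.mean(), scores.std()))
```

Comparing the printed means is how the tutorial decides which model to carry forward to the validation step.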
'duration': 7.386}, {'end': 4270.564, 'text': "Okay. Next what we'll do is evaluate each model in turn.", 'start': 4267.001, 'duration': 3.563}, {'end': 4276.188, 'text': 'Okay So what is this? So we have six different models and an accuracy estimation for each one of them.', 'start': 4270.584, 'duration': 5.604}], 'summary': 'Building and evaluating six different models including logistic regression, knn, cart, lda, naive bayes, and svm with accuracy estimations.', 'duration': 42.637, 'max_score': 4233.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ4233551.jpg'}, {'end': 4328.463, 'src': 'embed', 'start': 4295.088, 'weight': 4, 'content': [{'end': 4295.988, 'text': 'What is the accuracy?', 'start': 4295.088, 'duration': 0.9}, {'end': 4297.308, 'text': 'and so and so okay.', 'start': 4295.988, 'duration': 1.32}, {'end': 4302.929, 'text': 'So from the output, it seems that the LDA algorithm was the most accurate model that we tested.', 'start': 4297.769, 'duration': 5.16}, {'end': 4307.53, 'text': 'Now we want to get an idea of the accuracy of the model on our validation set or the testing data set.', 'start': 4302.969, 'duration': 4.561}, {'end': 4311.131, 'text': 'So this will give us an independent final check on the accuracy of the best model.', 'start': 4307.63, 'duration': 3.501}, {'end': 4319.294, 'text': 'It is always valuable to keep a testing data set, just in case you overfit to the testing data set or you have a data leak;', 'start': 4311.807, 'duration': 7.487}, {'end': 4321.556, 'text': 'both will result in an overly optimistic result.', 'start': 4319.294, 'duration': 2.262}, {'end': 4328.463, 'text': 'Okay, you can run the LDA model directly on the validation set and summarize the result as a final score,', 'start': 4321.837, 'duration': 6.626}], 'summary': 'The LDA model was the most accurate, validated on the testing data set.', 'duration': 33.375, 'max_score': 4295.088, 
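The final, independent check on the validation set that the narration describes can be sketched as follows; the LDA model and the 80/20 split follow the tutorial, while the data is a synthetic stand-in and `accuracy_score` is one common way to summarize the result:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# Synthetic stand-in data (150 samples, 4 features, 3 classes).
rng = np.random.RandomState(7)
X = rng.rand(150, 4)
Y = rng.randint(0, 3, size=150)

# Same 80/20 split used during model selection.
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, Y, test_size=0.20, random_state=7)

# Fit the best model on the training data only, then score it once on
# the held-out validation data: an independent final estimate.
lda = LinearDiscriminantAnalysis().fit(X_train, Y_train)
predictions = lda.predict(X_validation)
final_accuracy = accuracy_score(Y_validation, predictions)
print("validation accuracy: %.3f" % final_accuracy)
```

Because the validation rows were never used for training or model selection, this single number is the concrete estimate the transcript says to report.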
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ4295088.jpg'}, {'end': 4361.7, 'src': 'embed', 'start': 4335.902, 'weight': 3, 'content': [{'end': 4345.709, 'text': 'Statistics and probability are essential because these disciplines form the basic foundation of all machine learning algorithms, deep learning,', 'start': 4335.902, 'duration': 9.807}, {'end': 4348.331, 'text': 'artificial intelligence and data science.', 'start': 4345.709, 'duration': 2.622}, {'end': 4352.934, 'text': 'In fact, mathematics and probability are behind everything around us.', 'start': 4348.931, 'duration': 4.003}, {'end': 4361.7, 'text': 'From shapes, patterns and colors to the count of petals in a flower, mathematics is embedded in each and every aspect of our lives.', 'start': 4353.414, 'duration': 8.286}], 'summary': 'Statistics and probability are foundational to machine learning, ai, and data science, influencing everything in our lives.', 'duration': 25.798, 'max_score': 4335.902, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ4335902.jpg'}, {'end': 4867.422, 'src': 'embed', 'start': 4835.949, 'weight': 0, 'content': [{'end': 4840.29, 'text': "I'm sure all of you know what an independent variable and a dependent variable are, right?", 'start': 4835.949, 'duration': 4.341}, {'end': 4846.572, 'text': 'A dependent variable is any variable whose value depends on another, independent variable.', 'start': 4840.591, 'duration': 5.981}, {'end': 4850.291, 'text': 'So guys, that much knowledge I expect all of you to have.', 'start': 4847.569, 'duration': 2.722}, {'end': 4855.995, 'text': "So now let's move on and look at our next topic, which is: what is statistics?", 'start': 4851.332, 'duration': 4.663}, {'end': 4867.422, 'text': 'Now coming to the formal definition of statistics, statistics is an area of applied mathematics which is concerned with data collection, analysis,', 'start': 
4856.975, 'duration': 10.447}], 'summary': 'Dependent variable depends on independent variable. statistics is applied mathematics for data analysis.', 'duration': 31.473, 'max_score': 4835.949, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ4835949.jpg'}], 'start': 4165.13, 'title': 'Statistical concepts in machine learning', 'summary': 'Covers the process of evaluating six different models in machine learning and identifies the most accurate model, emphasizes the importance of statistics and probability in machine learning and data science, discusses understanding statistics in mathematics, delves into the concept of population and sample in statistics, and explains sampling techniques and probability, providing a comprehensive overview of statistical concepts with practical examples and applications.', 'chapters': [{'end': 4330.665, 'start': 4165.13, 'title': 'Model evaluation process', 'summary': 'Describes the process of building and evaluating six different models using logistic regression, linear discriminant analysis, k nearest neighbor, regression trees, gaussian naive bayes, and support vector machine, with the most accurate model identified as linear discriminant analysis.', 'duration': 165.535, 'highlights': ['Linear Discriminant Analysis identified as the most accurate model', 'Use of accuracy scoring metric for model evaluation', 'Building and evaluating six different types of models']}, {'end': 4855.995, 'start': 4335.902, 'title': 'Importance of statistics and probability in machine learning', 'summary': 'Discusses the importance of statistics and probability as the foundational concepts in machine learning and data science, covering topics such as data categories, types of statistics, measures of center and spread, probability distributions, and types of probability, while also emphasizing the ubiquity and significance of data in decision-making and analysis.', 'duration': 520.093, 
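The measures of center and spread catalogued in this section (mean, median, mode, variance, standard deviation) can all be computed with Python's standard library; a minimal sketch with made-up numbers, not the data sets used in the video:

```python
import statistics

# Eight made-up data points, purely illustrative.
data = [63, 70, 70, 72, 88, 90, 96, 110]

mean = statistics.mean(data)          # average of all values
median = statistics.median(data)      # middle value of the ordered data
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
stdev = statistics.stdev(data)        # square root of the sample variance

print(mean, median, mode)  # 82.375 80.0 70
```

With an even number of points the median is the average of the two middle values, which is why it comes out as 80.0 here; the same definitions are the ones the session walks through.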
'highlights': ['The chapter discusses the importance of statistics and probability as the foundational concepts in machine learning and data science', 'Covering topics such as data categories, types of statistics, measures of center and spread, probability distributions, and types of probability', 'Emphasizing the ubiquity and significance of data in decision-making and analysis']}, {'end': 4999.258, 'start': 4856.975, 'title': 'Understanding statistics in mathematics', 'summary': 'Discusses the various aspects of statistics, including data collection, interpretation, and visualization, and provides examples of real-world problems that can be solved using statistical methods, such as testing the effectiveness of a new drug and analyzing sales data for business improvement.', 'duration': 142.283, 'highlights': ['Statistics encompasses data collection, interpretation, and visualization, along with analysis.', 'Examples of real-world problems solvable by statistics include testing the effectiveness of a new drug and analyzing sales data for business improvement.', 'Using statistical methods to visualize data, collect data, and interpret data is a fundamental aspect of statistics.', 'Data analysis in statistics involves identifying the relationship between different variables or components in a business.']}, {'end': 5133.759, 'start': 5000.539, 'title': 'Understanding population and sample in statistics', 'summary': 'Explains the concept of population and sample in statistics, emphasizing the importance of choosing a representative sample, and discusses the purpose of sampling and its relevance in statistical analysis, using the example of a survey on the eating habits of teenagers in the us.', 'duration': 133.22, 'highlights': ['Sampling is a statistical method that deals with the selection of individual observations within a population, performed to infer statistical knowledge about a population.', 'The importance of choosing a sample that represents the entire 
population is emphasized, as a well-chosen sample will contain most of the information about a particular population parameter.', 'The example of conducting a survey on the eating habits of teenagers in the US, with a population of over 42 million, illustrates the impracticality of surveying the entire population, highlighting the necessity and relevance of sampling in such scenarios.']}, {'end': 5404.308, 'start': 5135.009, 'title': 'Sampling techniques and probability', 'summary': 'Discusses the concept of sampling as a method to draw inference about the entire population, focusing on probability sampling techniques including random, systematic, and stratified sampling, as well as the types of statistics - descriptive and inferential.', 'duration': 269.299, 'highlights': ['The chapter explains probability sampling techniques including random, systematic, and stratified sampling, which are shortcuts to studying the entire population, ensuring each member has an equal chance of being selected.', 'Descriptive and inferential statistics are introduced as the two major types of statistics, which will be discussed in depth in the session.', 'The chapter briefly mentions non-probability sampling techniques, including snowball, quota, judgment, and convenience sampling, but focuses solely on probability sampling techniques.']}], 'duration': 1239.178, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ4165130.jpg', 'highlights': ['Linear Discriminant Analysis identified as the most accurate model', 'Use of accuracy scoring metric for model evaluation', 'Building and evaluating six different types of models', 'The chapter discusses the importance of statistics and probability as foundational concepts in machine learning and data science', 'Covering topics such as data categories, types of statistics, measures of center and spread, probability distributions, and types of probability', 'Sampling is a statistical method that 
deals with the selection of individual observations within a population, performed to infer statistical knowledge about a population', 'The importance of choosing a sample that represents the entire population is emphasized', 'The chapter explains probability sampling techniques including random, systematic, and stratified sampling, which are shortcuts to studying the entire population, ensuring each member has an equal chance of being selected']}, {'end': 6670.445, 'segs': [{'end': 5543.777, 'src': 'embed', 'start': 5506.786, 'weight': 0, 'content': [{'end': 5512.23, 'text': 'you will take a sample set of the class, which is basically a few people from the entire class.', 'start': 5506.786, 'duration': 5.444}, {'end': 5517.074, 'text': 'You have already grouped the class into large, medium, and small.', 'start': 5513.291, 'duration': 3.783}, {'end': 5524.159, 'text': 'In this method, you basically build a statistical model and expand it for the entire population in the class.', 'start': 5517.954, 'duration': 6.205}, {'end': 5528.941, 'text': 'So guys, that was a brief understanding of descriptive and inferential statistics.', 'start': 5524.779, 'duration': 4.162}, {'end': 5532.043, 'text': "So that's the difference between descriptive and inferential.", 'start': 5529.342, 'duration': 2.701}, {'end': 5541.188, 'text': "Now in the next section, we'll go in depth about descriptive statistics, all right? 
So let's discuss more about descriptive statistics.", 'start': 5532.583, 'duration': 8.605}, {'end': 5543.777, 'text': 'So, like I mentioned earlier,', 'start': 5542.256, 'duration': 1.521}], 'summary': 'Using a sample set, a statistical model is built to represent the entire class population, discussing the difference between descriptive and inferential statistics.', 'duration': 36.991, 'max_score': 5506.786, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ5506786.jpg'}, {'end': 5954.194, 'src': 'embed', 'start': 5929.415, 'weight': 4, 'content': [{'end': 5935.099, 'text': "So to better understand how quartile and interquartile are calculated, let's look at a small example.", 'start': 5929.415, 'duration': 5.684}, {'end': 5943.407, 'text': 'Now, this data set basically represents the marks of 100 students ordered from the lowest to the highest scores.', 'start': 5935.902, 'duration': 7.505}, {'end': 5947.009, 'text': 'Alright, so the quartiles lie in the following ranges.', 'start': 5944.087, 'duration': 2.922}, {'end': 5954.194, 'text': 'Now, the first quartile, which is also known as Q1, it lies between the 25th and the 26th observation.', 'start': 5947.029, 'duration': 7.165}], 'summary': 'Explanation of quartile calculation using student marks data.', 'duration': 24.779, 'max_score': 5929.415, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ5929415.jpg'}, {'end': 6148.366, 'src': 'embed', 'start': 6118.095, 'weight': 8, 'content': [{'end': 6120.135, 'text': 'N is the number of samples in your data set.', 'start': 6118.095, 'duration': 2.04}, {'end': 6123.476, 'text': "Now let's look at sample variance.", 'start': 6121.395, 'duration': 2.081}, {'end': 6127.578, 'text': 'Now sample variance is the average of squared differences from the mean.', 'start': 6123.976, 'duration': 3.602}, {'end': 6131.82, 'text': 'Here xi is any data point or any sample in 
your data set.', 'start': 6128.158, 'duration': 3.662}, {'end': 6135.201, 'text': 'X bar is the mean of your sample.', 'start': 6132.32, 'duration': 2.881}, {'end': 6136.942, 'text': "It's not the mean of your population.", 'start': 6135.301, 'duration': 1.641}, {'end': 6138.522, 'text': "It's the mean of your sample.", 'start': 6137.422, 'duration': 1.1}, {'end': 6141.804, 'text': 'And if you notice, n here is a smaller n.', 'start': 6138.843, 'duration': 2.961}, {'end': 6144.005, 'text': "It's the number of data points in your sample.", 'start': 6141.804, 'duration': 2.201}, {'end': 6148.366, 'text': 'And this is basically the difference between sample and population variance.', 'start': 6144.905, 'duration': 3.461}], 'summary': 'Sample variance is the average of squared differences from the mean, with n representing the number of data points in the sample.', 'duration': 30.271, 'max_score': 6118.095, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ6118095.jpg'}], 'start': 5404.904, 'title': 'Statistical measures and information gain', 'summary': 'Covers descriptive and inferential statistics, measures of central tendency, measures of spread, and information gain in machine learning, using examples to illustrate concepts and enhance understanding.', 'chapters': [{'end': 5567.889, 'start': 5404.904, 'title': 'Descriptive vs inferential stats', 'summary': 'Explains the basics of descriptive and inferential statistics, illustrating descriptive statistics using the example of determining the average shirt size of students in a class and discussing measures of central tendency and measures of variability.', 'duration': 162.985, 'highlights': ['Descriptive statistics is a method used to describe and understand the features of a specific data set.', 'Inferential statistics makes inferences and predictions about a population based on a sample of data.', 'Descriptive statistics includes measures of central tendency 
and measures of variability.']}, {'end': 5861.495, 'start': 5568.502, 'title': 'Measures of center and spread', 'summary': 'Discusses the measures of center (mean, median, and mode) and measures of spread (range, interquartile range, variance, and standard deviation) with examples from a car dataset, showcasing the calculation and application of these statistical measures.', 'duration': 292.993, 'highlights': ['Mean is the average of all values in a sample, calculated by summing up all values and dividing by the number of samples.', 'Median is the middle value in a sample set, calculated by arranging values in ascending or descending order and choosing the middle value or the average of two middle values for an even number of data points.', 'Mode is the most recurrent value in a sample set, representing the value that occurs most often.', 'Measures of spread include range, interquartile range, variance, and standard deviation, providing insights into the variability or spread of a dataset.']}, {'end': 6227.663, 'start': 5861.895, 'title': 'Measures of spread: range, iqr, variance, and standard deviation', 'summary': 'Covers measures of spread such as range, interquartile range, variance, and standard deviation, explaining their calculations and applications with examples, aiming to enhance understanding of variability in data sets.', 'duration': 365.768, 'highlights': ['Calculation of Interquartile Range (IQR)', 'Calculation of Variance', 'Calculation of Standard Deviation', 'Explanation of Range']}, {'end': 6520.823, 'start': 6227.783, 'title': 'Information gain and entropy', 'summary': 'Discusses the importance of information gain and entropy in building machine learning models, explaining entropy as a measure of uncertainty in data and information gain as an indicator of feature relevance, and provides a use case example to understand their significance.', 'duration': 293.04, 'highlights': ['Decision trees and random forest heavily rely on information gain and 
entropy, key in machine learning algorithms.', 'Entropy is a measure of uncertainty in the data, quantified using a specific formula involving instances, classes, and event probability.', 'Information gain measures how much information a feature provides about the final outcome, calculated using a formula with terms like entropy and number of instances.', 'A use case is provided to demonstrate the application of information gain and entropy in predicting match playability based on weather conditions and predictor variables like outlook, humidity, and wind.', 'Decision trees are used to cluster data and make predictions, with the root node being crucial in the decision-making process.']}, {'end': 6670.445, 'start': 6520.823, 'title': 'Decision tree analysis', 'summary': 'Explains the process of decision tree analysis, emphasizing the significance of selecting the outlook variable as the root node due to its 100% pure subset, and the use of entropy to measure impurity and uncertainty in the data set.', 'duration': 149.622, 'highlights': ["The outlook variable has been chosen as the root node due to its 100% pure subset with the 'overcast' value, demonstrating the significance of selecting variables with lower entropy for building a precise model.", 'Entropy is utilized to measure the impurity or uncertainty of a variable, with lower entropy indicating a more significant variable for predicting precise outcomes in the decision tree.', 'The significance of selecting the best attribute for the root node in a decision tree to predict the most precise outcome is emphasized, with the outlook variable being chosen for its significance.']}], 'duration': 1265.541, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ5404904.jpg', 'highlights': ['Decision trees and random forest heavily rely on information gain and entropy, key in machine learning algorithms.', 'Descriptive statistics is a method used to describe and 
understand the features of a specific data set.', 'Inferential statistics makes inferences and predictions about a population based on a sample of data.', 'Descriptive statistics includes measures of central tendency and measures of variability.', 'Mean is the average of all values in a sample, calculated by summing up all values and dividing by the number of samples.', 'Median is the middle value in a sample set, calculated by arranging values in ascending or descending order and choosing the middle value or the average of two middle values for an even number of data points.', 'Mode is the most recurrent value in a sample set, representing the value that occurs most often.', 'Information gain measures how much information a feature provides about the final outcome, calculated using a formula with terms like entropy and number of instances.', 'A use case is provided to demonstrate the application of information gain and entropy in predicting match playability based on weather conditions and predictor variables like outlook, humidity, and wind.', "The outlook variable has been chosen as the root node due to its 100% pure subset with the 'overcast' value, demonstrating the significance of selecting variables with lower entropy for building a precise model."]}, {'end': 7733.052, 'segs': [{'end': 7017.203, 'src': 'embed', 'start': 6989.548, 'weight': 8, 'content': [{'end': 6994.811, 'text': "Now guys, what is a confusion matrix? 
Now don't get confused, this is not any complex topic.", 'start': 6989.548, 'duration': 5.263}, {'end': 7001.114, 'text': 'Now a confusion matrix is a matrix that is often used to describe the performance of a model.', 'start': 6995.311, 'duration': 5.803}, {'end': 7006.677, 'text': 'And this is specifically used for classification models or a classifier.', 'start': 7002.075, 'duration': 4.602}, {'end': 7017.203, 'text': 'And what it does is it will calculate the accuracy or it will calculate the performance of your classifier by comparing your actual results and your predicted results.', 'start': 7007.298, 'duration': 9.905}], 'summary': 'Confusion matrix evaluates classifier performance for classification models.', 'duration': 27.655, 'max_score': 6989.548, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ6989548.jpg'}, {'end': 7099.937, 'src': 'embed', 'start': 7072.339, 'weight': 0, 'content': [{'end': 7081.443, 'text': "You'll feed all of these 165 observations to your classifier and it will predict the output every time a new patient's detail is fed to the classifier.", 'start': 7072.339, 'duration': 9.104}, {'end': 7089.386, 'text': "Now out of these 165 cases, let's say that the classifier predicted yes 110 times and no 55 times.", 'start': 7082.623, 'duration': 6.763}, {'end': 7092.869, 'text': 'all right.', 'start': 7092.088, 'duration': 0.781}, {'end': 7094.931, 'text': 'so yes, basically stands for.', 'start': 7092.869, 'duration': 2.062}, {'end': 7099.937, 'text': 'yes, the person has a disease and no stands for no, the person does not have a disease.', 'start': 7094.931, 'duration': 5.006}], 'summary': 'Classifier predicted yes 110 times and no 55 times out of 165 cases.', 'duration': 27.598, 'max_score': 7072.339, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ7072339.jpg'}, {'end': 7656.409, 'src': 'embed', 'start': 7629.839, 'weight': 
4, 'content': [{'end': 7635.161, 'text': 'Now what this means is that the probability value is denoted by the area of the graph.', 'start': 7629.839, 'duration': 5.322}, {'end': 7645.345, 'text': 'So whatever value that you get here, which is basically one, is the probability that a random variable will lie between the range A and B.', 'start': 7636.021, 'duration': 9.324}, {'end': 7648.446, 'text': 'So I hope all of you have understood the probability density function.', 'start': 7645.345, 'duration': 3.101}, {'end': 7656.409, 'text': "It's basically the probability of finding the value of a continuous random variable between the range A and B.", 'start': 7648.846, 'duration': 7.563}], 'summary': 'Probability value denotes area of graph, one is probability of random variable lying between range a and b.', 'duration': 26.57, 'max_score': 7629.839, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ7629839.jpg'}, {'end': 7716.031, 'src': 'embed', 'start': 7672.677, 'weight': 3, 'content': [{'end': 7680.659, 'text': 'Meaning that the idea behind this function is that the data near the mean occurs more frequently than the data away from the mean.', 'start': 7672.677, 'duration': 7.982}, {'end': 7686.06, 'text': 'So what it means to say is that the data around the mean represents the entire data set.', 'start': 7681.339, 'duration': 4.721}, {'end': 7691.78, 'text': 'So if you just take a sample of data around the mean, it can represent the entire data set.', 'start': 7686.878, 'duration': 4.902}, {'end': 7698.383, 'text': 'Now similar to the probability density function, the normal distribution appears as a bell curve.', 'start': 7692.16, 'duration': 6.223}, {'end': 7703.145, 'text': 'Now when it comes to normal distribution, there are two important factors.', 'start': 7698.963, 'duration': 4.182}, {'end': 7707.627, 'text': 'We have the mean of the population and the standard deviation.', 'start': 7703.665, 
'duration': 3.962}, {'end': 7712.829, 'text': 'So the mean in the graph determines the location of the center of the graph.', 'start': 7708.348, 'duration': 4.481}, {'end': 7716.031, 'text': 'And the standard deviation determines the height of the graph.', 'start': 7713.31, 'duration': 2.721}], 'summary': 'Normal distribution: data near mean occurs more frequently, represented by bell curve with mean and standard deviation.', 'duration': 43.354, 'max_score': 7672.677, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ7672677.jpg'}], 'start': 6671.245, 'title': 'Decision tree: information gain and entropy', 'summary': 'Discusses the use of information gain and entropy in decision trees, emphasizing the selection of the variable with the highest information gain for the root node. it also covers the calculation of information gain for different attributes, with specific values provided, and introduces the concept of a confusion matrix for evaluating classification model performance. 
additionally, it explains the concept of confusion matrix using a practical example and delves into probability basics including terminologies, types of events, and probability distribution functions.', 'chapters': [{'end': 6797.268, 'start': 6671.245, 'title': 'Decision tree: information gain and entropy', 'summary': 'Discusses the use of information gain and entropy in decision trees to determine the best variable for splitting the dataset, with a focus on calculating entropy and information gain for different attributes, and emphasizing the selection of the variable with the highest information gain for the root node.', 'duration': 126.023, 'highlights': ['The chapter discusses the use of information gain and entropy in decision trees to determine the best variable for splitting the dataset.', 'Calculating entropy and information gain for different attributes is emphasized.', 'Emphasizing the selection of the variable with the highest information gain for the root node.']}, {'end': 7029.683, 'start': 6798.088, 'title': 'Information gain in decision trees', 'summary': 'Discusses the calculation of information gain for different attributes, with the outlook variable having the highest gain at 0.247, followed by humidity at 0.151 and temperature at 0.029, and then introduces the concept of a confusion matrix for evaluating the performance of classification models.', 'duration': 231.595, 'highlights': ['The outlook variable has the highest information gain of 0.247, making it the most suitable for splitting the data at the root node.', 'The humidity variable has an information gain of 0.151, which is lower than that of the outlook variable.', 'The temperature attribute has the lowest information gain of 0.029, making it the least suitable for splitting the data at the root node.', 'Introduces the confusion matrix as a tool for evaluating the performance of classification models by comparing actual and predicted results.']}, {'end': 7733.052, 'start': 7030.183, 
'title': 'Confusion matrix & probability basics', 'summary': 'Explains the concept of confusion matrix using an example of 165 patients to calculate the accuracy of a model, and then delves into the basics of probability including terminologies, types of events, and probability distribution functions such as probability density function, normal distribution, and central limit theorem.', 'duration': 702.869, 'highlights': ['The chapter explains the concept of confusion matrix using an example of 165 patients to calculate the accuracy of a model', 'The basics of probability including terminologies, types of events, and probability distribution functions are covered']}], 'duration': 1061.807, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ6671245.jpg', 'highlights': ['The outlook variable has the highest information gain of 0.247.', 'The humidity variable has an information gain of 0.151.', 'The temperature attribute has the lowest information gain of 0.029.', 'The chapter discusses the use of information gain and entropy in decision trees.', 'Calculating entropy and information gain for different attributes is emphasized.', 'Emphasizing the selection of the variable with the highest information gain for the root node.', 'Introduces the confusion matrix as a tool for evaluating the performance of classification models.', 'The chapter explains the concept of confusion matrix using an example of 165 patients to calculate the accuracy of a model.', 'The basics of probability including terminologies, types of events, and probability distribution functions are covered.']}, {'end': 8822.951, 'segs': [{'end': 7762.114, 'src': 'embed', 'start': 7733.492, 'weight': 1, 'content': [{'end': 7741.799, 'text': 'Now, the central limit theorem states that the sampling distribution of the mean of any independent random variable will be normal,', 'start': 7733.492, 'duration': 8.307}, {'end': 7744.761, 'text': 'or nearly normal
if the sample size is large enough.', 'start': 7741.799, 'duration': 2.962}, {'end': 7746.643, 'text': "Now that's a little confusing.", 'start': 7745.221, 'duration': 1.422}, {'end': 7748.264, 'text': 'Okay, let me break it down for you.', 'start': 7746.963, 'duration': 1.301}, {'end': 7755.169, 'text': 'Now, in simple terms, if we had a large population and we divided it into many samples,', 'start': 7748.824, 'duration': 6.345}, {'end': 7762.114, 'text': 'then the mean of all the samples from the population will be almost equal to the mean of the entire population.', 'start': 7755.169, 'duration': 6.945}], 'summary': 'Central limit theorem: sampling mean approaches population mean with large samples.', 'duration': 28.622, 'max_score': 7733.492, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ7733492.jpg'}, {'end': 8584.972, 'src': 'embed', 'start': 8554.985, 'weight': 4, 'content': [{'end': 8557.306, 'text': 'All you have to do is substitute the value.', 'start': 8554.985, 'duration': 2.321}, {'end': 8563.708, 'text': "If you want a second or two, I'm gonna pause on the screen so that you can go through this in a clearer way.", 'start': 8558.346, 'duration': 5.362}, {'end': 8568.628, 'text': 'Remember that you need to calculate two probabilities.', 'start': 8566.127, 'duration': 2.501}, {'end': 8578.29, 'text': "The first probability that you need to calculate is the event of picking a blue ball from bag A given that you're picking exactly two blue balls.", 'start': 8569.168, 'duration': 9.122}, {'end': 8584.972, 'text': 'The second probability you need to calculate is the event of picking exactly two blue balls.', 'start': 8579.491, 'duration': 5.481}], 'summary': 'Calculating two probabilities: picking blue balls from bag a and picking exactly two blue balls.', 'duration': 29.987, 'max_score': 8554.985, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ8554985.jpg'}, {'end': 8830.895, 'src': 'embed', 'start': 8802.501, 'weight': 0, 'content': [{'end': 8806.463, 'text': 'Apart from interval estimation, we also have something known as margin of error.', 'start': 8802.501, 'duration': 3.962}, {'end': 8809.484, 'text': "So I'll be discussing all of this in the upcoming slides.", 'start': 8806.763, 'duration': 2.721}, {'end': 8812.886, 'text': "So first let's understand what is interval estimate.", 'start': 8809.945, 'duration': 2.941}, {'end': 8821.19, 'text': 'An interval or range of values which are used to estimate a population parameter is known as an interval estimation.', 'start': 8813.967, 'duration': 7.223}, {'end': 8822.951, 'text': "That's very understandable.", 'start': 8821.71, 'duration': 1.241}, {'end': 8828.354, 'text': "Basically what they're trying to say is you're going to estimate the value of a parameter.", 'start': 8823.211, 'duration': 5.143}, {'end': 8830.895, 'text': "Let's say you're trying to find the mean of a population.", 'start': 8828.474, 'duration': 2.421}], 'summary': 'Interval estimation involves estimating a population parameter by defining a range of values, such as the mean of a population.', 'duration': 28.394, 'max_score': 8802.501, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ8802501.jpg'}], 'start': 7733.492, 'title': 'Probability and estimation', 'summary': "Discusses the central limit theorem and types of probability, emphasizing the mean of each sample from a large population and important types of probability. 
it also covers probability in training and salary using a dataset of 105 candidates and highlights bayes' theorem and point estimation methods such as method of moments and maximum likelihood.", 'chapters': [{'end': 7932.318, 'start': 7733.492, 'title': 'Central limit theorem & types of probability', 'summary': 'Discusses the central limit theorem, stating that the mean of each sample from a large population will be almost equal to the mean of the entire population, along with a clear understanding of its application. additionally, it covers the important types of probability: marginal, joint, and conditional probability, with examples and their significance in problem-solving.', 'duration': 198.826, 'highlights': ['The Central Limit Theorem states that the mean of each sample from a large population will be almost equal to the mean of the entire population, with a graph illustrating the concept for clarity.', 'The accuracy of the resemblance to the normal distribution depends on the number of sample points considered and the shape of the underlying population.', 'Central Limit Theorem holds true only for a large data set, as small data sets show more deviations due to the scaling factor.', 'Marginal probability, or unconditional probability, is the probability of an event occurring unconditioned on any other event.', 'Joint probability measures the likelihood of two events happening at the same time, illustrated with an example of finding the probability of drawing a four and a red card.']}, {'end': 8228.909, 'start': 7932.318, 'title': 'Probability in training and salary', 'summary': "Covers joint probability, conditional probability, and marginal probability using a dataset of 105 candidates, where 45 have undergone edureka's training, and highlights the calculation of the probabilities for different scenarios.", 'duration': 296.591, 'highlights': ["The probability that a candidate has undergone Edureka's training is 45 divided by 105, resulting in a value of 
approximately 0.42.", "The joint probability of a candidate attending Edureka's training and having a good package is 30 divided by 105.", 'The conditional probability of a candidate having a good package given that they have not undergone training is 5 divided by 60, as only 5 out of 60 candidates without training have received a good package.']}, {'end': 8822.951, 'start': 8229.17, 'title': "Understanding bayes' theorem and point estimation", 'summary': "Introduces bayes' theorem, emphasizing its importance in statistics and probability, and then discusses point estimation, including methods like method of moments, maximum likelihood, bayes estimator, and best unbiased estimators, with a focus on interval estimation and margin of error.", 'duration': 593.781, 'highlights': ["Bayes' Theorem is a very important concept in statistics and probability, used in the naive Bayes algorithm and Gmail spam filtering.", "Explanation of Bayes' Theorem and its mathematical representation, defining terms like likelihood ratio, posterior, and prior, illustrating its application in solving a probability problem.", 'Introduction to point estimation, focusing on the concept of using sample data to estimate a single value representing an unknown population parameter, and the methods like method of moments, maximum likelihood, Bayes estimator, and best unbiased estimators.', 'Explanation of interval estimation, describing it as a range of values used to estimate a population parameter, and the concept of margin of error in statistics.']}], 'duration': 1089.459, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ7733492.jpg', 'highlights': ['The Central Limit Theorem states that the mean of each sample from a large population will be almost equal to the mean of the entire population, with a graph illustrating the concept for clarity.', "Bayes' Theorem is a very important concept in statistics and probability, used in the naive Bayes
algorithm and Gmail spam filtering.", 'Introduction to point estimation, focusing on the concept of using sample data to estimate a single value representing an unknown population parameter, and the methods like method of moments, maximum likelihood, Bayes estimator, and best unbiased estimators.', 'The accuracy of the resemblance to the normal distribution depends on the number of sample points considered and the shape of the underlying population.', "The probability that a candidate has undergone Edureka's training is 45 divided by 105, resulting in a value of approximately 0.42."]}, {'end': 10724.409, 'segs': [{'end': 9373.162, 'src': 'heatmap', 'start': 9021.705, 'weight': 1, 'content': [{'end': 9026.007, 'text': 'You can say that it is a deviation from the actual point estimate.', 'start': 9021.705, 'duration': 4.302}, {'end': 9029.408, 'text': 'Now the margin of error can be calculated using this formula.', 'start': 9026.167, 'duration': 3.241}, {'end': 9033.909, 'text': 'Now ZC here denotes the critical value or the confidence interval.', 'start': 9029.888, 'duration': 4.021}, {'end': 9040.619, 'text': 'and this is multiplied by standard deviation divided by root of the sample size.', 'start': 9034.57, 'duration': 6.049}, {'end': 9042.782, 'text': 'Alright, N is basically the sample size.', 'start': 9040.839, 'duration': 1.943}, {'end': 9046.728, 'text': "Now let's understand how you can estimate the confidence intervals.", 'start': 9043.303, 'duration': 3.425}, {'end': 9055.948, 'text': 'So guys, the level of confidence which is denoted by C is the probability that the interval estimate contains the population parameter.', 'start': 9047.483, 'duration': 8.465}, {'end': 9059.67, 'text': "Let's say that you're trying to estimate the mean, all right?", 'start': 9056.468, 'duration': 3.202}, {'end': 9066.674, 'text': 'So the level of confidence is the probability that the interval estimate contains the population parameter.', 'start': 9059.93, 'duration': 
6.744}, {'end': 9072.696, 'text': 'So this interval between minus ZC and ZC, or the area beneath this curve,', 'start': 9067.307, 'duration': 5.389}, {'end': 9077.804, 'text': 'is nothing but the probability that the interval estimate contains the population parameter.', 'start': 9072.696, 'duration': 5.108}, {'end': 9081.33, 'text': 'It should basically contain the value that you are predicting.', 'start': 9078.806, 'duration': 2.524}, {'end': 9085.031, 'text': 'Now these are known as critical values.', 'start': 9082.89, 'duration': 2.141}, {'end': 9088.712, 'text': 'This is basically your lower limit and your higher limit confidence level.', 'start': 9085.211, 'duration': 3.501}, {'end': 9091.212, 'text': "Also there's something known as the z-score.", 'start': 9089.292, 'duration': 1.92}, {'end': 9094.853, 'text': 'Now this score can be calculated by using the standard normal table.', 'start': 9091.292, 'duration': 3.561}, {'end': 9100.935, 'text': "If you look it up anywhere on Google, you'll find the z-score table or the standard normal table.", 'start': 9095.554, 'duration': 5.381}, {'end': 9105.096, 'text': "To understand how this is done, let's look at a small example.", 'start': 9101.555, 'duration': 3.541}, {'end': 9109.52, 'text': "Let's say that the level of confidence is 90%.", 'start': 9105.676, 'duration': 3.844}, {'end': 9114.364, 'text': 'This means that you are 90% confident that the interval contains the population mean.', 'start': 9109.52, 'duration': 4.844}, {'end': 9123.151, 'text': 'So the remaining 10%, which is out of 100%, the remaining 10% is equally distributed on these tail regions.', 'start': 9115.685, 'duration': 7.466}, {'end': 9125.113, 'text': 'So you have 0.05 here and 0.05 over here.', 'start': 9123.171, 'duration': 1.942}, {'end': 9131.736, 'text': "So on either side of C, you'll distribute the other leftover percentage.", 'start': 9127.835, 'duration': 3.901}, {'end': 9138.378, 'text': 'Now these Z-scores are calculated from 
the table as I mentioned before.', 'start': 9132.336, 'duration': 6.042}, {'end': 9141.159, 'text': '1.645 is calculated from the standard normal table.', 'start': 9138.398, 'duration': 2.761}, {'end': 9144.68, 'text': 'So guys, this is how you estimate the level of confidence.', 'start': 9141.179, 'duration': 3.501}, {'end': 9150.345, 'text': 'So to sum it up let me tell you the steps that are involved in constructing a confidence interval.', 'start': 9145.481, 'duration': 4.864}, {'end': 9153.848, 'text': "First you'll start by identifying a sample statistic.", 'start': 9150.745, 'duration': 3.103}, {'end': 9158.331, 'text': 'Okay this is the statistic that you will use to estimate a population parameter.', 'start': 9154.188, 'duration': 4.143}, {'end': 9160.793, 'text': 'This can be anything like the mean of the sample.', 'start': 9158.651, 'duration': 2.142}, {'end': 9163.295, 'text': 'Next you will select a confidence level.', 'start': 9161.273, 'duration': 2.022}, {'end': 9168.159, 'text': 'Now the confidence level describes the uncertainty of a sampling method.', 'start': 9163.655, 'duration': 4.504}, {'end': 9172.877, 'text': "right. 
after that you'll find something known as the margin of error right.", 'start': 9168.715, 'duration': 4.162}, {'end': 9179.279, 'text': 'we discussed margin of error earlier, so you find this based on the equation that i explained in the previous slide.', 'start': 9172.877, 'duration': 6.402}, {'end': 9182.72, 'text': "then you'll finally specify the confidence interval all right.", 'start': 9179.279, 'duration': 3.441}, {'end': 9186.502, 'text': "now let's look at a problem statement to better understand this concept.", 'start': 9182.72, 'duration': 3.782}, {'end': 9192.403, 'text': 'A random sample of 32 textbook prices is taken from a local college bookstore.', 'start': 9187.222, 'duration': 5.181}, {'end': 9197.824, 'text': 'The mean of the sample is so, so, and so, and the sample standard deviation is this.', 'start': 9192.763, 'duration': 5.061}, {'end': 9205.186, 'text': 'Use a 95% confidence level and find the margin of error for the mean price of all textbooks in the bookstore.', 'start': 9198.444, 'duration': 6.742}, {'end': 9208.546, 'text': 'Okay, now this is a very straightforward question.', 'start': 9205.666, 'duration': 2.88}, {'end': 9210.207, 'text': 'If you want, you can read the question again.', 'start': 9208.606, 'duration': 1.601}, {'end': 9215.588, 'text': 'All you have to do is you have to just substitute the values into the equation.', 'start': 9211.027, 'duration': 4.561}, {'end': 9219.974, 'text': 'All right, so guys, we know the formula for margin of error.', 'start': 9216.169, 'duration': 3.805}, {'end': 9222.757, 'text': 'You take the z-score from the table.', 'start': 9220.454, 'duration': 2.303}, {'end': 9228.864, 'text': "After that, we have deviation, which is 23.44, right, and that's standard deviation.", 'start': 9223.318, 'duration': 5.546}, {'end': 9231.587, 'text': 'And n stands for the number of samples.', 'start': 9229.305, 'duration': 2.282}, {'end': 9235.132, 'text': 'Now the number of samples is 32, basically 32
textbooks.', 'start': 9231.888, 'duration': 3.244}, {'end': 9238.738, 'text': 'So approximately your margin of error is going to be around 8.12.', 'start': 9235.777, 'duration': 2.961}, {'end': 9242.92, 'text': 'This is a pretty simple question.', 'start': 9238.738, 'duration': 4.182}, {'end': 9245.28, 'text': 'All right, I hope all of you understood this.', 'start': 9243.36, 'duration': 1.92}, {'end': 9254.103, 'text': "Now that you know the idea behind confidence interval, let's move ahead to one of the most important topics in statistical inference,", 'start': 9245.941, 'duration': 8.162}, {'end': 9255.704, 'text': 'which is hypothesis testing.', 'start': 9254.103, 'duration': 1.601}, {'end': 9264.713, 'text': 'So basically statisticians use hypothesis testing to formally check whether the hypothesis is accepted or rejected.', 'start': 9257.127, 'duration': 7.586}, {'end': 9278.744, 'text': 'Hypothesis testing is an inferential statistical technique used to determine whether there is enough evidence in a data sample to infer that a certain condition holds true for an entire population.', 'start': 9265.593, 'duration': 13.151}, {'end': 9286.149, 'text': 'So to understand the characteristics of a general population, we take a random sample and we analyze the properties of the sample.', 'start': 9279.384, 'duration': 6.765}, {'end': 9294.715, 'text': 'We test whether or not the identified conclusion represents the population accurately and finally we interpret their results.', 'start': 9286.949, 'duration': 7.766}, {'end': 9300.719, 'text': 'Now whether or not to accept the hypothesis depends upon the percentage value that we get from the hypothesis.', 'start': 9295.115, 'duration': 5.604}, {'end': 9304.942, 'text': "So to better understand this, let's look at a small example.", 'start': 9301.539, 'duration': 3.403}, {'end': 9309.586, 'text': 'Before that, there are a few steps that are followed in hypothesis testing.', 'start': 9305.942, 'duration': 3.644}, 
{'end': 9313.811, 'text': 'You begin by stating the null and the alternative hypothesis.', 'start': 9310.047, 'duration': 3.764}, {'end': 9318.656, 'text': "I'll tell you what exactly these terms are, and then you formulate an analysis plan.", 'start': 9313.851, 'duration': 4.805}, {'end': 9324.162, 'text': 'After that, you analyze the sample data, and finally, you can interpret the results.', 'start': 9319.697, 'duration': 4.465}, {'end': 9335.234, 'text': "Now to understand the entire hypothesis testing, we'll look at a good example, okay? Now consider four boys, Nick, John, Bob, and Harry.", 'start': 9326.07, 'duration': 9.164}, {'end': 9342.698, 'text': 'These boys were caught bunking a class and they were asked to stay back at school and clean their classroom as a punishment.', 'start': 9335.775, 'duration': 6.923}, {'end': 9348.789, 'text': 'right. so what john did is he decided that four of them would take turns to clean their classrooms.', 'start': 9343.323, 'duration': 5.466}, {'end': 9353.995, 'text': 'he came up with a plan of writing each of their names on chits and putting them in a bowl.', 'start': 9348.789, 'duration': 5.206}, {'end': 9359.24, 'text': 'now, every day they had to pick up a name from the bowl and that person had to clean the class right.', 'start': 9353.995, 'duration': 5.245}, {'end': 9360.622, 'text': 'that sounds pretty fair enough.', 'start': 9359.24, 'duration': 1.382}, {'end': 9365.497, 'text': "Now it has been three days and everybody's name has come up except John's.", 'start': 9361.295, 'duration': 4.202}, {'end': 9373.162, 'text': 'Assuming that this event is completely random and free of bias, what is the probability of John not cheating?', 'start': 9365.998, 'duration': 7.164}], 'summary': 'Understanding confidence intervals and hypothesis testing in statistics, with an example of margin of error calculation and hypothesis testing scenario.', 'duration': 351.457, 'max_score': 9021.705, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ9021705.jpg'}, {'end': 9100.935, 'src': 'embed', 'start': 9072.696, 'weight': 3, 'content': [{'end': 9077.804, 'text': 'is nothing but the probability that the interval estimate contains the population parameter.', 'start': 9072.696, 'duration': 5.108}, {'end': 9081.33, 'text': 'It should basically contain the value that you are predicting.', 'start': 9078.806, 'duration': 2.524}, {'end': 9085.031, 'text': 'Now these are known as critical values.', 'start': 9082.89, 'duration': 2.141}, {'end': 9088.712, 'text': 'This is basically your lower limit and your higher limit confidence level.', 'start': 9085.211, 'duration': 3.501}, {'end': 9091.212, 'text': "Also there's something known as the z-score.", 'start': 9089.292, 'duration': 1.92}, {'end': 9094.853, 'text': 'Now this score can be calculated by using the standard normal table.', 'start': 9091.292, 'duration': 3.561}, {'end': 9100.935, 'text': "If you look it up anywhere on Google, you'll find the z-score table or the standard normal table.", 'start': 9095.554, 'duration': 5.381}], 'summary': 'Interval estimate contains population parameter, critical values, z-score, and standard normal table.', 'duration': 28.239, 'max_score': 9072.696, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ9072696.jpg'}, {'end': 9608.849, 'src': 'embed', 'start': 9580.019, 'weight': 2, 'content': [{'end': 9582.482, 'text': 'the model is retrained with the help of the new data.', 'start': 9580.019, 'duration': 2.463}, {'end': 9588.056, 'text': 'Now, when we talk about supervised learning, there are not just one, but quite a few algorithms here.', 'start': 9583.092, 'duration': 4.964}, {'end': 9595.182, 'text': 'So we have linear regression, logistic regression, decision tree, we have random forest, we have naive Bayes classifiers.', 'start': 9588.116, 'duration': 7.066}, {'end': 9603.489,
'text': 'So linear regression is used to estimate real values, for example the cost of houses, the number of calls, the total sales.', 'start': 9595.682, 'duration': 7.807}, {'end': 9606.007, 'text': 'based on the continuous variables.', 'start': 9604.085, 'duration': 1.922}, {'end': 9608.849, 'text': 'so that is what linear regression is.', 'start': 9606.007, 'duration': 2.842}], 'summary': 'Various algorithms such as linear regression, logistic regression, decision tree, random forest, and Naive Bayes classifiers are used for supervised learning to estimate real values based on continuous variables.', 'duration': 28.83, 'max_score': 9580.019, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ9580019.jpg'}, {'end': 10365.881, 'src': 'embed', 'start': 10343.966, 'weight': 0, 'content': [{'end': 10353.108, 'text': 'here the equation of line changes to y equal minus of MX plus C, where Y is the time taken to travel a fixed distance, X is the speed of vehicle,', 'start': 10343.966, 'duration': 9.142}, {'end': 10356.649, 'text': 'M is the negative slope of the line and C is the y-intercept of the line.', 'start': 10353.108, 'duration': 3.541}, {'end': 10357.329, 'text': 'All right.', 'start': 10357.129, 'duration': 0.2}, {'end': 10360.617, 'text': "Now, let's get back to our independent and dependent variable.", 'start': 10358.015, 'duration': 2.602}, {'end': 10365.881, 'text': 'So in that term Y is our dependent variable and X that is our independent variable.', 'start': 10360.837, 'duration': 5.044}], 'summary': 'Equation of line: y = -mx + c. 
y is time, x is speed.', 'duration': 21.915, 'max_score': 10343.966, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ10343966.jpg'}], 'start': 8823.211, 'title': 'Interval estimation and regression analysis', 'summary': 'Discusses interval estimation in statistics, emphasizing the difference between point estimate and interval estimate, explaining confidence interval and margin of error. it also covers linear regression, including its basics, selection criteria, and applications, with an emphasis on reducing error between actual and predicted values.', 'chapters': [{'end': 8948.822, 'start': 8823.211, 'title': 'Interval estimation in statistics', 'summary': 'Discusses interval estimation in statistics, highlighting the difference between point estimate and interval estimate, emphasizing the accuracy of interval estimate, and explaining the concepts of confidence interval and margin of error.', 'duration': 125.611, 'highlights': ['Interval estimate is more accurate than point estimate as it provides a range within which the value might occur, exemplified by the difference between predicting a specific time and a time range for reaching a theater.', 'Confidence interval is a significant measure in assessing the importance of a machine learning model, indicating the confidence that the interval estimated contains the population parameter or mean.', 'Statisticians use confidence interval to quantify the uncertainty associated with a sample estimate of a population parameter, emphasizing its importance in statistical analysis.']}, {'end': 9182.72, 'start': 8948.902, 'title': 'Confidence interval & margin of error', 'summary': 'Explains confidence interval, with an example of a survey showing a 99% confidence level and a confidence interval of 100 to 200 cans of cat food, and then discusses the margin of error and the steps involved in constructing a confidence interval.', 'duration': 233.818, 'highlights': ['The 
chapter explains the concept of confidence interval using a survey example with a 99% confidence level and a confidence interval of 100 to 200 cans of cat food.', 'It details the calculation and significance of the margin of error, which is the greatest possible distance between the point estimate and the value of the parameter being estimated.', 'The process of constructing a confidence interval is explained, involving identifying a sample statistic, selecting a confidence level, finding the margin of error, and specifying the confidence interval.']}, {'end': 9643.682, 'start': 9182.72, 'title': 'Margin of error and hypothesis testing', 'summary': 'Covers the calculation of margin of error for a sample mean, with a 95% confidence level yielding a margin of error of approximately 8.12, and delves into hypothesis testing, illustrating its steps and practical application through a classroom cleaning example and the concept of supervised learning.', 'duration': 460.962, 'highlights': ['Calculation of margin of error', 'Explanation of hypothesis testing', 'Practical application of hypothesis testing in a classroom scenario', 'Introduction to supervised learning']}, {'end': 10220.256, 'start': 9643.682, 'title': 'Linear regression overview', 'summary': 'Covers the basics of linear regression, including its types, uses, comparison with logistic regression, selection criteria, and applications, while emphasizing its importance in predictive modeling and business decision-making.', 'duration': 576.574, 'highlights': ['Linear regression and logistic regression are compared based on the type of function they map data to, with linear regression used for continuous variables and logistic regression for categorical variables.', 'Linear regression is used for evaluating trends and sales estimates, analyzing the impact of price changes, and assessing risk in financial services and insurance domains.', 'The criteria for selecting linear regression include its classification and 
regression capabilities, data quality, computational complexity, and comprehensibility.']}, {'end': 10724.409, 'start': 10220.256, 'title': 'Linear regression and regression line', 'summary': 'Explains the concept of linear regression, its mathematical implementation, and the process of finding the regression line using the least square method, with an emphasis on reducing the error between actual and predicted values.', 'duration': 504.153, 'highlights': ['The process of finding the regression line using the least square method', 'Explanation of the equation of the regression line (y = MX + C)', 'Calculating the predicted values of Y for given M and C', 'Plotting the points and the regression line on the graph', 'Understanding the relationship between speed and distance traveled']}], 'duration': 1901.198, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ8823211.jpg', 'highlights': ['Confidence interval quantifies uncertainty in sample estimate, crucial in statistical analysis.', 'Interval estimate provides a range for more accurate prediction, exemplified by theater arrival time.', 'Margin of error measures the greatest possible distance between point estimate and parameter value.', 'Linear regression used for trend evaluation, sales estimates, price impact analysis, and risk assessment.', 'Criteria for selecting linear regression include classification, regression capabilities, and data quality.']}, {'end': 12080.966, 'segs': [{'end': 10747.993, 'src': 'embed', 'start': 10724.409, 'weight': 1, 'content': [{'end': 10730.973, 'text': 'the line with the least error will be the line of linear regression or regression line, and it will also be the best-fit line.', 'start': 10724.409, 'duration': 6.564}, {'end': 10731.573, 'text': 'All right.', 'start': 10731.353, 'duration': 0.22}, {'end': 10733.694, 'text': 'So this is how things work in computer.', 'start': 10731.993, 'duration': 1.701}, {'end': 10738.397, 
'text': 'So what it does is it performs n number of iterations for different values of M.', 'start': 10734.315, 'duration': 4.082}, {'end': 10740.951, 'text': 'for different values of M.', 'start': 10739.511, 'duration': 1.44}, {'end': 10747.993, 'text': 'It will calculate the equation of line where y equal MX plus C, right? So as the value of M changes the line is changing.', 'start': 10740.951, 'duration': 7.042}], 'summary': 'Computer performs n iterations to find best-fit line y=mx+c.', 'duration': 23.584, 'max_score': 10724.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ10724409.jpg'}, {'end': 10800.148, 'src': 'embed', 'start': 10771.304, 'weight': 2, 'content': [{'end': 10778.211, 'text': "Now that we have calculated the best fit line, it's time to check the goodness of it or to check how good a model is performing.", 'start': 10771.304, 'duration': 6.907}, {'end': 10782.075, 'text': 'So in order to do that, we have a method called the R square method.', 'start': 10778.651, 'duration': 3.424}, {'end': 10790.163, 'text': 'So what is this R square? 
Well R squared value is a statistical measure of how close the data are to the fitted regression line in general.', 'start': 10782.575, 'duration': 7.588}, {'end': 10793.046, 'text': 'It is considered that a higher R-squared value model is a good model.', 'start': 10790.203, 'duration': 2.843}, {'end': 10800.148, 'text': 'but you can also have a lower R-squared value for a good model as well or a higher R-squared value for a model that does not fit at all.', 'start': 10793.542, 'duration': 6.606}], 'summary': 'R-squared measures the goodness of fit of a regression model, with higher values indicating a better fit.', 'duration': 28.844, 'max_score': 10771.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ10771304.jpg'}, {'end': 11127.237, 'src': 'embed', 'start': 11097.562, 'weight': 0, 'content': [{'end': 11100.644, 'text': 'Obviously this type of information can be extremely valuable.', 'start': 11097.562, 'duration': 3.082}, {'end': 11101.265, 'text': 'All right.', 'start': 11101.065, 'duration': 0.2}, {'end': 11104.613, 'text': 'All right, so this was all about the theoretical concept.', 'start': 11102.25, 'duration': 2.363}, {'end': 11108.016, 'text': "Now, let's move on to the coding part and understand the code in depth.", 'start': 11104.833, 'duration': 3.183}, {'end': 11114.203, 'text': "So for implementing linear regression using Python, I'll be using Anaconda with Jupyter Notebook installed on it.", 'start': 11108.717, 'duration': 5.486}, {'end': 11121.711, 'text': 'So all right, this is our Jupyter notebook and we are using Python 3 on it.', 'start': 11118.107, 'duration': 3.604}, {'end': 11122.192, 'text': 'All right.', 'start': 11122.012, 'duration': 0.18}, {'end': 11127.237, 'text': 'So we are going to use a data set consisting of head size and human brain weight of different people.', 'start': 11122.816, 'duration': 4.421}], 'summary': 'Introduction to implementing linear regression using Python and a 
dataset of head sizes and human brain data.', 'duration': 29.675, 'max_score': 11097.562, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ11097562.jpg'}, {'end': 11467.38, 'src': 'embed', 'start': 11439.932, 'weight': 4, 'content': [{'end': 11443.035, 'text': 'So here you are getting the curve which is nothing but three different straight lines.', 'start': 11439.932, 'duration': 3.103}, {'end': 11446.158, 'text': 'So here we need to make a new way to solve this problem.', 'start': 11443.475, 'duration': 2.683}, {'end': 11450.662, 'text': 'So this has to be formulated into equation and hence we come up with logistic regression.', 'start': 11446.618, 'duration': 4.044}, {'end': 11455.614, 'text': 'So here the outcome is either 0 or 1 which is the main rule of logistic regression.', 'start': 11451.272, 'duration': 4.342}, {'end': 11458.416, 'text': 'So with this a resulting curve cannot be formulated.', 'start': 11455.914, 'duration': 2.502}, {'end': 11461.877, 'text': 'So hence a main aim to bring the values to 0 and 1 is fulfilled.', 'start': 11458.696, 'duration': 3.181}, {'end': 11467.38, 'text': 'So that is how we came up with logistic regression now here once it gets formulated into an equation.', 'start': 11462.258, 'duration': 5.122}], 'summary': 'Logistic regression formulates outcome into 0 or 1, fulfilling aim of bringing values to 0 and 1.', 'duration': 27.448, 'max_score': 11439.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ11439932.jpg'}, {'end': 11797.377, 'src': 'embed', 'start': 11768.661, 'weight': 3, 'content': [{'end': 11770.962, 'text': 'Now logistic regression helps you to predict your weather.', 'start': 11768.661, 'duration': 2.301}, {'end': 11775.504, 'text': 'For example, it is used to predict whether it is raining or not whether it is sunny.', 'start': 11771.422, 'duration': 4.082}, {'end': 11780.247, 'text': 'Is it cloudy 
or not? So all these things can be predicted using logistic regression.', 'start': 11775.865, 'duration': 4.382}, {'end': 11786.15, 'text': 'Whereas you need to keep in mind that both linear regression and logistic regression can be used in predicting the weather.', 'start': 11780.647, 'duration': 5.503}, {'end': 11790.372, 'text': 'So in that case linear regression helps you to predict what will be the temperature tomorrow.', 'start': 11786.43, 'duration': 3.942}, {'end': 11797.377, 'text': "whereas logistic regression will only tell you whether it's going to rain or not, whether it's cloudy or not, or whether it's going to snow or not.", 'start': 11790.832, 'duration': 6.545}], 'summary': 'Logistic regression predicts weather conditions, linear regression predicts temperature.', 'duration': 28.716, 'max_score': 11768.661, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ11768661.jpg'}], 'start': 10724.409, 'title': 'Linear and logistic regression analysis', 'summary': 'Covers linear regression, r squared method, and its application yielding an r-squared value of 0.63. 
Additionally, it discusses the calculation and implications of r square values and logistic regression, including its practical implementation in data science projects.', 'chapters': [{'end': 10808.496, 'start': 10724.409, 'title': 'Linear regression and r squared method', 'summary': "Explains the process of linear regression and the iterative approach to finding the best-fit line, followed by the evaluation of the model's performance using the r squared method.", 'duration': 84.087, 'highlights': ['The process of linear regression involves performing n iterations for different values of M to calculate the best-fit line, where the value of M resulting in the minimum distance between actual and predicted values is selected (quantifiable data: n iterations).', 'The R squared method is used to evaluate the goodness of the model, with a higher R squared value indicating a better fit, though a lower R squared value can also indicate a good model, and it is also known as the coefficient of determination (quantifiable data: R squared value).']}, {'end': 11097.222, 'start': 10808.876, 'title': 'Calculation of r square in regression analysis', 'summary': 'Discusses the calculation of r square, where the r square value is found to be approximately 0.3, indicating a poor fit of the data to the regression line. 
It explains how different r square values signify the proximity of actual values to the regression line and the implications of low r square values in certain fields.', 'duration': 288.346, 'highlights': ['The R square value is approximately 0.3, indicating a poor fit of the data to the regression line.', 'Different R square values signify the proximity of actual values to the regression line, with higher values indicating a closer fit.', 'Low R square values in fields predicting human behavior are expected, and significant predictors can still draw important conclusions.']}, {'end': 11561.723, 'start': 11097.562, 'title': 'Implementing linear regression and logistic regression', 'summary': "Covers the implementation of linear regression using python, with detailed steps on importing the dataset, calculating coefficients, plotting the linear model, and evaluating the model's r-squared value, achieving an r-squared value of 0.63. Additionally, it explains the concept and implementation of logistic regression, highlighting the differences from linear regression and the use of a threshold value for classification.", 'duration': 464.161, 'highlights': ['The tutorial explains the implementation of linear regression using Python, covering the steps of importing the dataset, calculating coefficients, and plotting the linear model.', 'It details the calculation of the R-squared value for the linear regression model, achieving an R-squared value of 0.63, indicating a good model fit.', 'The chapter delves into the concept and implementation of logistic regression, highlighting its use in predicting outcomes of a categorical dependent variable, with an emphasis on the need for discrete or categorical outcomes.', 'It explains the need for logistic regression over linear regression, emphasizing the requirement for discrete outcomes and the clipping of values between 0 and 1.', 'The tutorial introduces the sigmoid function curve as a key component of logistic regression, 
highlighting its ability to convert values to binary format and the use of a threshold value for classification, ensuring discrete outputs.']}, {'end': 12080.966, 'start': 11562.847, 'title': 'Logistic regression in data science', 'summary': 'Explains the concept of logistic regression, including the equation formation and transformation, the major differences between linear and logistic regression, its use in weather prediction, classification problems, and illness determination, and outlines the practical implementation of logistic regression through two projects on titanic data analysis and suv car data analysis.', 'duration': 518.119, 'highlights': ['The equation transformation from a straight line equation to a logistic regression equation is explained, including the need to transform the equation to ensure the range of the predicted value is between 0 and 1.', 'The major differences between linear regression and logistic regression, such as the nature of the variable being continuous in linear regression and discrete in logistic regression, and the type of problems they solve, are outlined.', 'The application of logistic regression in weather prediction, classification problems, and illness determination is discussed, highlighting its role in predicting discrete values and multi-class classification.', 'The practical implementation of logistic regression through projects on Titanic data analysis and SUV car data analysis is introduced, emphasizing the prediction of survival chances based on various factors and the analysis of passenger features.']}], 'duration': 1356.557, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ10724409.jpg', 'highlights': ['The tutorial explains the implementation of linear regression using Python, achieving an R-squared value of 0.63', 'The process of linear regression involves performing n iterations for different values of M to calculate the best-fit line (quantifiable data: 
n iterations)', 'The R squared method is used to evaluate the goodness of the model, with a higher R squared value indicating a better fit (quantifiable data: R squared value)', 'The application of logistic regression in weather prediction, classification problems, and illness determination is discussed', 'The equation transformation from a straight line equation to a logistic regression equation is explained']}, {'end': 12910.136, 'segs': [{'end': 12110.155, 'src': 'embed', 'start': 12081.467, 'weight': 10, 'content': [{'end': 12086.069, 'text': 'So all of these are very very interesting questions and you would be going through all of them one by one.', 'start': 12081.467, 'duration': 4.602}, {'end': 12090.651, 'text': 'So in this stage, you need to analyze your data and explore your data as much as you can.', 'start': 12086.509, 'duration': 4.142}, {'end': 12096.364, 'text': 'Then my third step is to wrangle your data. Now, data wrangling basically means cleaning your data.', 'start': 12091.48, 'duration': 4.884}, {'end': 12097.225, 'text': 'So over here.', 'start': 12096.724, 'duration': 0.501}, {'end': 12101.828, 'text': 'you can simply remove the unnecessary items, or, if you have null values in the data set,', 'start': 12097.225, 'duration': 4.603}, {'end': 12104.811, 'text': 'you can just clear that data and then you can take it forward.', 'start': 12101.828, 'duration': 2.983}, {'end': 12110.155, 'text': 'So in this step, you can build your model using the train data set and then you can test it using a test data set.', 'start': 12105.271, 'duration': 4.884}], 'summary': 'Data analysis involves exploring, cleaning, and modeling data for testing with train and test datasets.', 'duration': 28.688, 'max_score': 12081.467, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12081467.jpg'}, {'end': 12147.294, 'src': 'embed', 'start': 12116.76, 'weight': 0, 'content': [{'end': 12120.743, 'text': 'you will check the 
accuracy so as to ensure how accurate your values are.', 'start': 12116.76, 'duration': 3.983}, {'end': 12125.307, 'text': "So I hope you guys got these five steps that you're going to implement in logistic regression.", 'start': 12121.063, 'duration': 4.244}, {'end': 12128.029, 'text': "So now let's go into all these steps in detail.", 'start': 12125.747, 'duration': 2.282}, {'end': 12131.912, 'text': 'So number one, we have to collect your data or you can say import the libraries.', 'start': 12128.349, 'duration': 3.563}, {'end': 12134.314, 'text': 'So let me show you the implementation part as well.', 'start': 12132.292, 'duration': 2.022}, {'end': 12138.717, 'text': 'So I just open my Jupyter notebook and I just implement all of these steps side by side.', 'start': 12134.694, 'duration': 4.023}, {'end': 12142.447, 'text': 'So guys, this is my Jupyter notebook.', 'start': 12140.885, 'duration': 1.562}, {'end': 12147.294, 'text': "So first let me just rename Jupyter notebook to let's say Titanic data analysis.", 'start': 12142.868, 'duration': 4.426}], 'summary': 'Five steps for logistic regression implementation, including data collection and Jupyter notebook usage.', 'duration': 30.534, 'max_score': 12116.76, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12116760.jpg'}, {'end': 12234.175, 'src': 'embed', 'start': 12206.618, 'weight': 6, 'content': [{'end': 12209.759, 'text': "So these are the libraries that I'll be needing in this Titanic data analysis.", 'start': 12206.618, 'duration': 3.141}, {'end': 12211.939, 'text': 'So now let me just import my data set.', 'start': 12210.159, 'duration': 1.78}, {'end': 12213.54, 'text': "So I'll take a variable.", 'start': 12212.319, 'duration': 1.221}, {'end': 12216.42, 'text': "Let's say Titanic data and using the pandas.", 'start': 12213.58, 'duration': 2.84}, {'end': 12219.441, 'text': 'I will just read my CSV or you can say the data set.', 'start': 
12216.6, 'duration': 2.841}, {'end': 12223.302, 'text': "I'll write the name of my data set that is Titanic dot CSV.", 'start': 12220.601, 'duration': 2.701}, {'end': 12225.744, 'text': 'Now I have already shown you the data set.', 'start': 12223.942, 'duration': 1.802}, {'end': 12228.147, 'text': 'So over here, let me just print the top 10 rows.', 'start': 12226.065, 'duration': 2.082}, {'end': 12234.175, 'text': "So for that I'll just say I'll take the variable Titanic data dot head and I'll say the top 10 rows.", 'start': 12228.228, 'duration': 5.947}], 'summary': 'Imported titanic data using pandas and displayed top 10 rows.', 'duration': 27.557, 'max_score': 12206.618, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12206618.jpg'}, {'end': 12319.762, 'src': 'embed', 'start': 12288.618, 'weight': 3, 'content': [{'end': 12293.743, 'text': 'So here the number of passengers which are there in the original data set we have is 891.', 'start': 12288.618, 'duration': 5.125}, {'end': 12296.465, 'text': 'So around this number were traveling in the Titanic ship.', 'start': 12293.743, 'duration': 2.722}, {'end': 12297.746, 'text': 'So over here.', 'start': 12297.146, 'duration': 0.6}, {'end': 12300.429, 'text': 'my first step is done, where you have just collected data,', 'start': 12297.746, 'duration': 2.683}, {'end': 12304.913, 'text': 'imported all the libraries and found out the total number of passengers which are traveling in Titanic.', 'start': 12300.429, 'duration': 4.484}, {'end': 12308.316, 'text': "So now let me just go back to the presentation and let's see what is my next step.", 'start': 12305.533, 'duration': 2.783}, {'end': 12312.56, 'text': "So we're done with collecting the data. The next step is to analyze your data.", 'start': 12309.079, 'duration': 3.481}, {'end': 12319.762, 'text': 'So over here we will be creating different plots to check the relationship between variables, as in how one variable is 
affecting the other.', 'start': 12312.92, 'duration': 6.842}], 'summary': 'The original data set has 891 passengers, approximately the number on the Titanic. Next step is to analyze the data by creating plots to examine variable relationships.', 'duration': 31.144, 'max_score': 12288.618, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12288618.jpg'}, {'end': 12469.978, 'src': 'embed', 'start': 12442.433, 'weight': 2, 'content': [{'end': 12446.836, 'text': 'and if we see the people who survived here, we can see the majority of females survived.', 'start': 12442.433, 'duration': 4.403}, {'end': 12449.678, 'text': 'So this basically concludes the gender analysis of the survival rate.', 'start': 12447.177, 'duration': 2.501}, {'end': 12455.542, 'text': 'So it appears on average women were more than three times more likely to survive than men next.', 'start': 12450.019, 'duration': 5.523}, {'end': 12459.245, 'text': 'Let us plot another plot where we have the hue as the passenger class.', 'start': 12455.582, 'duration': 3.663}, {'end': 12465.629, 'text': 'So over here we can see which class the passenger was traveling in whether it was traveling in class 1 2 or 3.', 'start': 12459.345, 'duration': 6.284}, {'end': 12467.671, 'text': 'So for that I just write the same command.', 'start': 12465.629, 'duration': 2.042}, {'end': 12469.978, 'text': "I'll say sns.countplot.", 'start': 12467.771, 'duration': 2.207}], 'summary': 'On average, women were over three times more likely to survive than men. 
passenger class distribution also impacted survival rates.', 'duration': 27.545, 'max_score': 12442.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12442433.jpg'}, {'end': 12521.651, 'src': 'embed', 'start': 12484.593, 'weight': 1, 'content': [{'end': 12490.079, 'text': 'So over here, you can see I have blue for first class orange for second class and green for the third class.', 'start': 12484.593, 'duration': 5.486}, {'end': 12494.898, 'text': 'So here the passengers who did not survive were majorly of the third class, or you can say,', 'start': 12490.757, 'duration': 4.141}, {'end': 12501.26, 'text': 'the lowest class or the cheapest class to get into the Titanic, and the people who did survive majorly belong to the higher classes.', 'start': 12494.898, 'duration': 6.362}, {'end': 12505.622, 'text': 'So here one and two have more rise than the passengers who were traveling in the third class.', 'start': 12501.68, 'duration': 3.942}, {'end': 12511.103, 'text': 'So here we have concluded that the passengers who did not survive are majorly of third class or, you can say,', 'start': 12506.122, 'duration': 4.981}, {'end': 12517.165, 'text': 'the lowest class and the passengers who were traveling in first and second class would tend to survive more next.', 'start': 12511.103, 'duration': 6.062}, {'end': 12519.026, 'text': 'Let us plot a graph for the age distribution.', 'start': 12517.185, 'duration': 1.841}, {'end': 12521.651, 'text': 'Over here, I can simply use my data.', 'start': 12519.547, 'duration': 2.104}], 'summary': 'Most non-survivors were from third class, while first and second class passengers had higher survival rates.', 'duration': 37.058, 'max_score': 12484.593, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12484593.jpg'}, {'end': 12578.392, 'src': 'embed', 'start': 12546.469, 'weight': 4, 'content': [{'end': 12548.61, 'text': 'So this 
is the analysis on the age column.', 'start': 12546.469, 'duration': 2.141}, {'end': 12554.354, 'text': 'So we saw that we have more young passengers and more middle-aged passengers which are traveling in the Titanic.', 'start': 12548.65, 'duration': 5.704}, {'end': 12557.357, 'text': 'So next let me plot a graph of fare as well.', 'start': 12555.135, 'duration': 2.222}, {'end': 12559.098, 'text': "So I'll say Titanic data.", 'start': 12557.637, 'duration': 1.461}, {'end': 12563.521, 'text': "I'll say fare and again I'll plot a histogram.", 'start': 12560.519, 'duration': 3.002}, {'end': 12564.182, 'text': "So I'll say hist.", 'start': 12563.561, 'duration': 0.621}, {'end': 12569.886, 'text': 'So here you can see the fare size is between 0 and 100.', 'start': 12566.403, 'duration': 3.483}, {'end': 12573.809, 'text': 'Now, let me add the bin size so as to make it more clear over here.', 'start': 12569.886, 'duration': 3.923}, {'end': 12578.392, 'text': "I'll say bins equals to let's say 20 and I'll increase the figure size as well.", 'start': 12573.889, 'duration': 4.503}], 'summary': 'Analysis shows majority of Titanic passengers are young or middle-aged.', 'duration': 31.923, 'max_score': 12546.469, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12546469.jpg'}, {'end': 12678.53, 'src': 'embed', 'start': 12652.589, 'weight': 11, 'content': [{'end': 12660.311, 'text': 'and then we have very few values for 2, 3, 4 and so on. Next, if I go above, we saw this column as well; similarly we can do for Parch.', 'start': 12652.589, 'duration': 7.722}, {'end': 12665.053, 'text': 'So next we have Parch or you can say the number of parents or children which were aboard the Titanic.', 'start': 12660.992, 'duration': 4.061}, {'end': 12667.154, 'text': 'So you similarly can do this as well.', 'start': 12665.533, 'duration': 1.621}, {'end': 12668.554, 'text': 'Then we have the ticket number.', 'start': 12667.314, 'duration': 
1.24}, {'end': 12671.535, 'text': "So I don't think any analysis is required for ticket.", 'start': 12669.034, 'duration': 2.501}, {'end': 12672.966, 'text': 'Then we have fare.', 'start': 12672.125, 'duration': 0.841}, {'end': 12673.366, 'text': 'so fare.', 'start': 12672.966, 'duration': 0.4}, {'end': 12678.53, 'text': 'we have already discussed, as in the people who tend to travel in the first class usually pay the highest fare,', 'start': 12673.366, 'duration': 5.164}], 'summary': 'Data analysis: low values for Parch 2, 3, 4; first class pays highest fare.', 'duration': 25.941, 'max_score': 12652.589, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12652589.jpg'}, {'end': 12822.059, 'src': 'embed', 'start': 12796.838, 'weight': 7, 'content': [{'end': 12802.261, 'text': "So here if you don't want to see these numbers, you can also plot a heat map and then you can visually analyze it.", 'start': 12796.838, 'duration': 5.423}, {'end': 12803.682, 'text': 'So let me just do that as well.', 'start': 12802.381, 'duration': 1.301}, {'end': 12805.843, 'text': "So I'll say sns dot heatmap.", 'start': 12803.742, 'duration': 2.101}, {'end': 12811.327, 'text': "I'll say yticklabels.", 'start': 12810.066, 'duration': 1.261}, {'end': 12820.679, 'text': "
including survival rates based on gender, class, age, and fare size. It also delves into detailed Titanic dataset analysis, survival analysis, gender-based analysis, passenger class, and data wrangling to remove null values, with 891 total passengers.', 'chapters': [{'end': 12304.913, 'start': 12081.467, 'title': 'Logistic regression data analysis', 'summary': 'Explains the 5 steps for logistic regression data analysis, including importing necessary libraries, reading the Titanic dataset, and finding the total number of passengers to be 891.', 'duration': 223.446, 'highlights': ['The chapter explains the 5 steps for logistic regression data analysis', 'finding the total number of passengers to be 891', 'importing necessary libraries and reading the Titanic dataset']}, {'end': 12589.781, 'start': 12305.533, 'title': 'Data analysis techniques and visualizations', 'summary': 'Covers the process of analyzing and visualizing data, including creating plots to check the relationship between variables and using graphs to compare survival rates based on gender, class, age, and fare size in the Titanic dataset.', 'duration': 284.248, 'highlights': ['The majority of passengers who did not survive belonged to the third class, while those who survived were mostly from the higher classes, indicating a correlation between passenger class and survival rate.', 'On average, women were more than three times more likely to survive than men, highlighting a significant gender-based difference in survival rates.', 'The analysis of age distribution revealed a higher population of young passengers and an average age group traveling on the Titanic.', 'The fare analysis showed that the majority of fares were between 0 and 100, providing insight into the distribution of fare prices among passengers.']}, {'end': 12910.136, 'start': 12590.441, 'title': 'Titanic data analysis', 'summary': 'Covers data analysis of the Titanic dataset, including columns left, survival analysis, gender-based 
analysis, passenger class, sibling/spouse and parent/children count, data wrangling to remove null values affecting accuracy, and visualization of missing data using a heat map.', 'duration': 319.695, 'highlights': ['Passenger class analysis', 'Data wrangling process', 'Visualization of missing data using heat map', 'Survival analysis based on gender', 'Sibling/spouse and parent/children count analysis']}], 'duration': 828.669, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12081467.jpg', 'highlights': ['The chapter explains the 5 steps for logistic regression data analysis', 'The majority of passengers who did not survive belonged to the third class', 'On average, women were more than three times more likely to survive than men', 'Finding the total number of passengers to be 891', 'The analysis of age distribution revealed a higher population of young passengers', 'The fare analysis showed that the majority of fares were between 0 and 100', 'Importing necessary libraries and reading the Titanic dataset', 'Visualization of missing data using heat map', 'Survival analysis based on gender', 'Passenger class analysis', 'Data wrangling process', 'Sibling/spouse and parent/children count analysis']}, {'end': 15037.777, 'segs': [{'end': 12989.764, 'src': 'embed', 'start': 12952.136, 'weight': 2, 'content': [{'end': 12952.936, 'text': 'So dropping it.', 'start': 12952.136, 'duration': 0.8}, {'end': 12954.817, 'text': "I'll just say Titanic underscore data.", 'start': 12952.956, 'duration': 1.861}, {'end': 12960.762, 'text': "and I'll simply type dot drop and the column which I need to drop, so I have to drop the cabin column.", 'start': 12955.377, 'duration': 5.385}, {'end': 12965.626, 'text': "I'll mention axis equals 1 and I'll say inplace equals True as well.", 'start': 12961.602, 'duration': 4.024}, {'end': 12971.811, 'text': "So now again, I'll just print the head and let us see whether this column 
has been removed from the data set or not.', 'start': 12967.187, 'duration': 4.624}, {'end': 12974.453, 'text': "So I'll say Titanic data dot head.", 'start': 12972.331, 'duration': 2.122}, {'end': 12978.456, 'text': "So as you can see here, we don't have the cabin column anymore.", 'start': 12976.034, 'duration': 2.422}, {'end': 12981.238, 'text': 'Now, you can also drop the NA values.', 'start': 12979.257, 'duration': 1.981}, {'end': 12989.764, 'text': "So I'll say Titanic data dot dropna to drop all the NaN values, or you can say NaN, which is not a number, and I'll say inplace equals True.", 'start': 12981.399, 'duration': 8.365}], 'summary': "Dropped 'Cabin' column and NaN values from Titanic dataset.", 'duration': 37.628, 'max_score': 12952.136, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12952136.jpg'}, {'end': 13041.036, 'src': 'embed', 'start': 13014.437, 'weight': 6, 'content': [{'end': 13018.958, 'text': 'So this will basically help me to check whether my null values have been removed from the data set or not.', 'start': 13014.437, 'duration': 4.521}, {'end': 13021.778, 'text': "So as you can see here, I don't have any null values.", 'start': 13019.358, 'duration': 2.42}, {'end': 13024.499, 'text': "So it's entirely black now.", 'start': 13022.239, 'duration': 2.26}, {'end': 13026.079, 'text': 'You can actually check the sum as well.', 'start': 13024.519, 'duration': 1.56}, {'end': 13027.02, 'text': 'So I just go above.', 'start': 13026.099, 'duration': 0.921}, {'end': 13032.841, 'text': 'So I just copy this part and I just use the sum function to calculate the sum.', 'start': 13029.02, 'duration': 3.821}, {'end': 13041.036, 'text': 'So here that tells me that the data set is clean, as in the data set does not contain any null value or any NaN value.', 'start': 13034.235, 'duration': 6.801}], 'summary': 'The data set has been confirmed to be clean, with no null or NaN values.', 'duration': 26.599, 'max_score': 
13014.437, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ13014437.jpg'}, {'end': 13098.858, 'src': 'embed', 'start': 13071.651, 'weight': 4, 'content': [{'end': 13077.453, 'text': 'we will convert this categorical variable into some dummy variables, and this can be done using Pandas,', 'start': 13071.651, 'duration': 5.802}, {'end': 13079.754, 'text': 'because logistic regression just takes two values.', 'start': 13077.453, 'duration': 2.301}, {'end': 13084.814, 'text': 'So whenever you apply machine learning, you need to make sure that there are no string values present,', 'start': 13080.453, 'duration': 4.361}, {'end': 13087.775, 'text': "because it won't be taking these as your input variables.", 'start': 13084.814, 'duration': 2.961}, {'end': 13092.596, 'text': "So using a string you can't predict anything, but in my case, I have the Survived column.", 'start': 13088.235, 'duration': 4.361}, {'end': 13098.858, 'text': 'So I need to predict how many people tend to survive and how many did not, so zero stands for did not survive and one stands for survived.', 'start': 13092.736, 'duration': 6.122}], 'summary': 'Using Pandas to convert categorical variables into dummy variables for logistic regression, predicting survival (0 for not survived, 1 for survived).', 'duration': 27.207, 'max_score': 13071.651, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ13071651.jpg'}, {'end': 13441.175, 'src': 'embed', 'start': 13411.779, 'weight': 7, 'content': [{'end': 13414.761, 'text': 'So this was all about my data wrangling, or just cleaning the data.', 'start': 13411.779, 'duration': 2.982}, {'end': 13417.762, 'text': 'Then my next step is training and testing your data.', 'start': 13415.461, 'duration': 2.301}, {'end': 13422.145, 'text': 'So here we will split the data set into train subset and test subset,', 'start': 13418.223, 'duration': 3.922}, {'end': 
13427.008, 'text': "and then what we'll do is build a model on the train data and then predict the output on your test data set.", 'start': 13422.145, 'duration': 4.863}, {'end': 13430.07, 'text': 'So let me just go back to Jupyter and let us implement this as well.', 'start': 13427.548, 'duration': 2.522}, {'end': 13431.467, 'text': 'Over here.', 'start': 13431.107, 'duration': 0.36}, {'end': 13432.969, 'text': 'I need to train my data set.', 'start': 13431.567, 'duration': 1.402}, {'end': 13437.652, 'text': "So I'll just put this in heading 3.", 'start': 13432.989, 'duration': 4.663}, {'end': 13441.175, 'text': 'So over here you need to define your dependent variable and independent variable.', 'start': 13437.652, 'duration': 3.523}], 'summary': 'Data will be split into train and test subsets and a model will be built for prediction.', 'duration': 29.396, 'max_score': 13411.779, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ13411779.jpg'}, {'end': 13878.157, 'src': 'embed', 'start': 13835.492, 'weight': 1, 'content': [{'end': 13842.274, 'text': 'Then we have built our model on the train data and then predicted the output on the test data set, and then my fifth step is to check the accuracy.', 'start': 13835.492, 'duration': 6.782}, {'end': 13847.035, 'text': 'So here we have calculated accuracy to almost 78%, which is quite good.', 'start': 13842.714, 'duration': 4.321}, {'end': 13849.315, 'text': 'You cannot say that accuracy is bad.', 'start': 13847.655, 'duration': 1.66}, {'end': 13852.236, 'text': 'So here it tells me how accurate your results are.', 'start': 13850.035, 'duration': 2.201}, {'end': 13856.517, 'text': 'So here my accuracy score defines that, and hence we got a good accuracy.', 'start': 13852.636, 'duration': 3.881}, {'end': 13860.878, 'text': 'So now moving ahead, let us see the second project, that is SUV data analysis.', 'start': 13857.437, 'duration': 3.441}, {'end': 13868.773, 
'text': 'So in this, a car company has released a new SUV in the market, and they are using the previous data about the sales of their SUV.', 'start': 13861.65, 'duration': 7.123}, {'end': 13872.835, 'text': 'They want to predict the category of people who might be interested in buying this.', 'start': 13868.793, 'duration': 4.042}, {'end': 13878.157, 'text': 'So using logistic regression, you need to find what factors made people more interested in buying this SUV.', 'start': 13872.835, 'duration': 5.322}], 'summary': 'Model achieved 78% accuracy; predicting SUV buyers using logistic regression.', 'duration': 42.665, 'max_score': 13835.492, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ13835492.jpg'}, {'end': 14269.291, 'src': 'embed', 'start': 14242.337, 'weight': 0, 'content': [{'end': 14247.539, 'text': 'All right, so over here I get the accuracy as 0.89, so we want to know the accuracy in percentage.', 'start': 14242.337, 'duration': 5.202}, {'end': 14255.201, 'text': 'So I just have to multiply it by a hundred, and if I run this it gives me 89%, so I hope you guys are clear with whatever I have taught you today.', 'start': 14247.559, 'duration': 7.642}, {'end': 14263.346, 'text': 'So here I have taken my independent variables as age and salary, and then we have calculated how many people can purchase the SUV,', 'start': 14255.74, 'duration': 7.606}, {'end': 14266.188, 'text': 'and then we have evaluated our model by checking the accuracy.', 'start': 14263.346, 'duration': 2.842}, {'end': 14269.291, 'text': 'So over here we get the accuracy as 89%, which is great.', 'start': 14266.629, 'duration': 2.662}], 'summary': 'Model accuracy is 89%, based on age and salary, for predicting SUV purchases.', 'duration': 26.954, 'max_score': 14242.337, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ14242337.jpg'}, {'end': 14447.772, 'src': 'embed', 'start': 
14421.013, 'weight': 3, 'content': [{'end': 14424.796, 'text': 'So the machine predicts that something fishy is going on in the transaction.', 'start': 14421.013, 'duration': 3.783}, {'end': 14428.518, 'text': 'So in order to confirm it, it sends you a notification alert.', 'start': 14425.416, 'duration': 3.102}, {'end': 14429.299, 'text': 'All right.', 'start': 14429.059, 'duration': 0.24}, {'end': 14431.641, 'text': 'Well, this is one of the use cases of classification.', 'start': 14429.479, 'duration': 2.162}, {'end': 14438.306, 'text': 'You can even use it to classify different items like fruits on the basis of their taste, color, size or weight.', 'start': 14431.921, 'duration': 6.385}, {'end': 14445.09, 'text': 'A machine well trained using the classification algorithm can easily predict the class or the type of fruit whenever new data is given to it.', 'start': 14438.306, 'duration': 6.784}, {'end': 14446.411, 'text': 'Not just the fruit.', 'start': 14445.691, 'duration': 0.72}, {'end': 14447.772, 'text': 'It can be any item.', 'start': 14446.551, 'duration': 1.221}], 'summary': 'Machine predicts suspicious transaction, sends alert. 
Classification algorithm used for fruit identification.', 'duration': 26.759, 'max_score': 14421.013, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ14421013.jpg'}, {'end': 14820.32, 'src': 'embed', 'start': 14798.996, 'weight': 9, 'content': [{'end': 14809.448, 'text': 'Well, the modern systems are now able to use the k-nearest neighbor for visual pattern recognition to scan and detect hidden packages in the bottom bin of a shopping cart at the checkout.', 'start': 14798.996, 'duration': 10.452}, {'end': 14815.554, 'text': 'If an object is detected which matches exactly the object listed in the database,', 'start': 14809.848, 'duration': 5.706}, {'end': 14820.32, 'text': "then the price of the spotted product could even automatically be added to the customer's bill.", 'start': 14815.554, 'duration': 4.766}], 'summary': "Modern systems use k-nearest neighbor for visual pattern recognition to detect hidden packages in shopping carts, automatically adding the price to the customer's bill if a match is found.", 'duration': 21.324, 'max_score': 14798.996, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ14798996.jpg'}], 'start': 12910.136, 'title': 'Logistic regression and classification', 'summary': 'Covers data wrangling for logistic regression, model training and evaluation with 78% accuracy, prediction of SUV purchases with 89% accuracy, and discusses classification algorithms and their real-life applications, such as fraud detection and fruit classification.', 'chapters': [{'end': 13051.158, 'start': 12910.136, 'title': 'Data wrangling in Titanic dataset', 'summary': "Discusses the process of data wrangling in the Titanic dataset, including dropping the 'Cabin' column and removing null values, resulting in a clean dataset with no null or NaN values.", 'duration': 141.022, 'highlights': ["The process of data wrangling involves removing the 'Cabin' 
column from the Titanic dataset to clean the data.", 'The next step in data wrangling is removing null values from the dataset using the dropna() function, resulting in a clean dataset with no null or NaN values.', 'The clean dataset is confirmed using a heatmap visualization, showing the removal of null values and ensuring the heatmap is entirely black with no null values.', 'The chapter also emphasizes the use of the sum function to verify the absence of null or NaN values in the dataset, confirming the cleanliness of the data.']}, {'end': 13557.525, 'start': 13051.158, 'title': 'Data wrangling for logistic regression', 'summary': 'Discusses the process of data wrangling for logistic regression, including converting string values to categorical variables, creating dummy variables using Pandas, and splitting the data set into training and testing subsets using sklearn, aiming to predict survival rates of passengers on the Titanic.', 'duration': 506.367, 'highlights': ['The process of converting string values to categorical variables using Pandas, including creating dummy variables for gender and passenger class, is crucial for implementing logistic regression, ensuring that the model can handle only numerical input variables. 
This is essential for predicting survival rates, where zero stands for not surviving and one stands for surviving, with the gender and passenger class variables being converted to categorical dummy variables.', 'The next step involves data concatenation using Pandas to combine the newly created categorical columns, such as gender, embark location, and passenger class, into a single data set, streamlining the data for logistic regression analysis.', "The chapter further delves into the division of the data set into training and testing subsets using sklearn's 'train_test_split' function, with a split size of 0.3, to facilitate model building on the training data and prediction on the test data, aiding in evaluating the effectiveness of the logistic regression model in predicting survival outcomes."]}, {'end': 14045.702, 'start': 13558.604, 'title': 'Logistic regression model training and evaluation', 'summary': 'Details the process of training a logistic regression model using sklearn library, making predictions, evaluating model performance by calculating accuracy, precision, recall, f1 score, and using confusion matrix, resulting in an accuracy score of 78%, and discusses the application of logistic regression in predicting suv purchases based on user demographic data.', 'duration': 487.098, 'highlights': ['The process of training a logistic regression model using sklearn library, making predictions, and evaluating model performance by calculating accuracy, precision, recall, and F1 score is detailed, resulting in an accuracy score of 78%.', 'The application of logistic regression in predicting SUV purchases based on user demographic data, where age and estimated salary are used to predict the likelihood of a person purchasing an SUV, is discussed.']}, {'end': 14380.511, 'start': 14046.042, 'title': 'Logistic regression and classification', 'summary': 'Covers the process of training a logistic regression model to predict suv purchases, achieving an accuracy of 
89%, and delves into the importance and process of classification, comparing linear and logistic regression, and exploring real-life use cases of logistic regression, highlighting the relevance of data analysis and predictive analysis.', 'duration': 334.469, 'highlights': ['The accuracy of the logistic regression model for predicting SUV purchases is 89%.', 'The chapter covers the process of training a logistic regression model to predict SUV purchases.', 'The chapter delves into the importance and process of classification, comparing linear and logistic regression, and exploring real-life use cases of logistic regression.']}, {'end': 14688.329, 'start': 14381.292, 'title': 'Classification algorithms in ML', 'summary': 'Covers the use cases of classification algorithms, including fraud detection and fruit classification, and discusses various techniques such as decision tree, random forest, and naive Bayes.', 'duration': 307.037, 'highlights': ['The chapter covers the use cases of classification algorithms, including fraud detection and fruit classification.', 'It discusses various techniques such as decision tree, random forest, and naive Bayes.', 'It explains how classification algorithms can be used for fraud detection and to classify different items based on their characteristics like taste, color, size, or weight.']}, {'end': 15037.777, 'start': 14688.849, 'title': 'Probability, KNN algorithm, decision tree', 'summary': 'Covers the concept of probability with examples of false positive and false negative statements, followed by the explanation of the KNN algorithm and its applications in visual pattern recognition and retail, and finally, the easy-to-understand nature and real-life application of decision trees.', 'duration': 348.928, 'highlights': ['The chapter covers the concept of probability with examples of false positive and false negative statements,', 'followed by the explanation of the KNN algorithm and its applications in visual pattern recognition and 
retail,', 'and finally, the easy-to-understand nature and real-life application of decision trees.']}], 'duration': 2127.641, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ12910136.jpg', 'highlights': ['Logistic regression model for SUV purchases has 89% accuracy', 'Logistic regression model for survival prediction has 78% accuracy', "Data wrangling involves removing 'Cabin' column from Titanic dataset", 'Classification algorithms used for fraud detection and fruit classification', 'Process of converting string values to categorical variables using Pandas', 'Training and evaluating logistic regression model for survival prediction', 'Use of sum function to verify absence of null or NaN values in dataset', "Division of dataset into training and testing subsets using 'train_test_split'", 'Application of logistic regression in predicting SUV purchases based on demographic data', 'Explanation of KNN algorithm and its applications in visual pattern recognition and retail']}, {'end': 16485.133, 'segs': [{'end': 15348.504, 'src': 'embed', 'start': 15318.038, 'weight': 6, 'content': [{'end': 15322.263, 'text': 'Well, you can say that pruning is just the opposite of the splitting we are doing here.', 'start': 15318.038, 'duration': 4.225}, {'end': 15324.746, 'text': 'We are just removing a sub-node of a decision tree.', 'start': 15322.303, 'duration': 2.443}, {'end': 15327.869, 'text': "We'll see more about pruning later in this session.", 'start': 15325.447, 'duration': 2.422}, {'end': 15331.792, 'text': "All right, let's move on ahead; next is parent or child node.", 'start': 15328.249, 'duration': 3.543}, {'end': 15339.217, 'text': 'Well, first of all, the root node is always the parent node and all other nodes associated with it are known as child nodes.', 'start': 15332.292, 'duration': 6.925}, {'end': 15348.504, 'text': 'Well, you can understand it in a way that all the top nodes are parent nodes and all the 
bottom nodes which are derived from a top node are child nodes.', 'start': 15339.618, 'duration': 8.886}], 'summary': 'Pruning removes decision tree sub-nodes. The root node is the parent node; child nodes are derived from it.', 'duration': 30.466, 'max_score': 15318.038, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ15318038.jpg'}, {'end': 15394.39, 'src': 'embed', 'start': 15365.509, 'weight': 7, 'content': [{'end': 15367.911, 'text': 'you decide which question to ask and when.', 'start': 15365.509, 'duration': 2.402}, {'end': 15368.752, 'text': 'So how will you do that?', 'start': 15367.911, 'duration': 0.841}, {'end': 15371.474, 'text': "So let's first of all visualize the decision tree.", 'start': 15369.332, 'duration': 2.142}, {'end': 15374.217, 'text': "So this is the decision tree which we'll be creating manually.", 'start': 15371.654, 'duration': 2.563}, {'end': 15374.757, 'text': 'All right.', 'start': 15374.237, 'duration': 0.52}, {'end': 15376.799, 'text': "First of all, let's have a look at the data set.", 'start': 15375.037, 'duration': 1.762}, {'end': 15384.786, 'text': 'You have outlook, temperature, humidity, and windy as your different attributes, on the basis of which you have to predict whether you can play or not.', 'start': 15377.439, 'duration': 7.347}, {'end': 15391.467, 'text': 'So which one among them should you pick first? Answer: determine the best attribute that classifies the training data.', 'start': 15385.583, 'duration': 5.884}, {'end': 15392.008, 'text': 'All right.', 'start': 15391.728, 'duration': 0.28}, {'end': 15394.39, 'text': 'So how will you choose the best attribute?', 'start': 15392.328, 'duration': 2.062}], 'summary': 'Visualize decision trees and determine the best attribute for classifying training data.', 'duration': 28.881, 'max_score': 15365.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ15365509.jpg'}, {'end': 
15433.093, 'src': 'embed', 'start': 15406.871, 'weight': 1, 'content': [{'end': 15414.855, 'text': 'So what is this Gini index? The Gini index is the measure of impurity or purity used in building a decision tree in the CART algorithm.', 'start': 15406.871, 'duration': 7.984}, {'end': 15415.336, 'text': 'All right.', 'start': 15415.116, 'duration': 0.22}, {'end': 15417.297, 'text': 'Next is information gain.', 'start': 15415.896, 'duration': 1.401}, {'end': 15423.643, 'text': 'This information gain is the decrease in entropy after the data set is split on the basis of an attribute.', 'start': 15417.757, 'duration': 5.886}, {'end': 15428.869, 'text': 'Constructing a decision tree is all about finding an attribute that returns the highest information gain.', 'start': 15423.643, 'duration': 5.226}, {'end': 15433.093, 'text': "All right, so you'll be selecting the node that would give you the highest information gain.", 'start': 15429.269, 'duration': 3.824}], 'summary': 'The Gini index and information gain are used in decision tree construction.', 'duration': 26.222, 'max_score': 15406.871, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ15406871.jpg'}, {'end': 15793.135, 'src': 'embed', 'start': 15767.128, 'weight': 2, 'content': [{'end': 15777.648, 'text': 'So this was all about entropy. All right, next is: what is information gain? Well, what information gain does is measure the reduction in entropy.', 'start': 15767.128, 'duration': 10.52}, {'end': 15781.37, 'text': 'It decides which attribute should be selected as the decision node.', 'start': 15778.329, 'duration': 3.041}, {'end': 15788.033, 'text': 'If S is our total collection, then information gain equals the entropy, which we calculated just now,', 'start': 15781.93, 'duration': 6.103}, {'end': 15792.434, 'text': 'minus the weighted average multiplied by the entropy of each feature.', 'start': 15788.033, 'duration': 4.401}, {'end': 15793.135, 'text': 
"Don't worry.", 'start': 15792.774, 'duration': 0.361}], 'summary': 'Information gain measures the reduction in entropy to select the decision node.', 'duration': 26.007, 'max_score': 15767.128, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ15767128.jpg'}, {'end': 16079.906, 'src': 'embed', 'start': 16004.378, 'weight': 3, 'content': [{'end': 16009.3, 'text': 'So, as calculated, the entropy for Sunny is 0.971, right?', 'start': 16004.378, 'duration': 4.922}, {'end': 16014.562, 'text': "So what we'll do is multiply 5 by 14 with 0.971, right?", 'start': 16009.82, 'duration': 4.742}, {'end': 16021.905, 'text': 'Well, this was the calculation of information when Outlook equals Sunny, but Outlook also equals overcast and rainy.', 'start': 16015.362, 'duration': 6.543}, {'end': 16024.459, 'text': "So in that case, what will we do?", 'start': 16022.478, 'duration': 1.981}, {'end': 16027.981, 'text': "Similarly, we'll calculate everything for overcast and rainy.", 'start': 16024.459, 'duration': 3.522}, {'end': 16036.545, 'text': 'For overcast, the weighted average is 4 by 14, multiplied by its entropy, that is 0, and for rainy it is the same, 5 by 14, 3 yeses', 'start': 16027.981, 'duration': 8.564}, {'end': 16039.266, 'text': 'and 2 nos, multiplied by its entropy, that is 0.971,', 'start': 16036.545, 'duration': 2.721}, {'end': 16044.429, 'text': 'and finally we will take the sum of all of them, which equals 0.693, right?', 'start': 16039.266, 'duration': 5.163}, {'end': 16049.583, 'text': "Next we'll calculate the information gained.", 'start': 16047.362, 'duration': 2.221}, {'end': 16053.584, 'text': 'What we did earlier was the information taken from Outlook.', 'start': 16049.583, 'duration': 4.001}, {'end': 16057.345, 'text': 'Now we are calculating what is the information we are gaining from Outlook.', 'start': 16053.884, 'duration': 3.461}, {'end': 16064.327, 'text': 'Right now, this information gain 
equals the total entropy minus the information that is taken from Outlook.', 'start': 16057.345, 'duration': 6.982}, {'end': 16069.508, 'text': 'All right, so the total entropy we had was 0.94, minus the information', 'start': 16064.687, 'duration': 4.821}, {'end': 16072.921, 'text': 'we took from Outlook, which is 0.693.', 'start': 16069.568, 'duration': 3.353}, {'end': 16077.705, 'text': 'So the value of information gained from Outlook comes to 0.247.', 'start': 16072.921, 'duration': 4.784}, {'end': 16077.945, 'text': 'All right.', 'start': 16077.705, 'duration': 0.24}, {'end': 16079.906, 'text': 'So next, what do we have to do?', 'start': 16078.766, 'duration': 1.14}], 'summary': 'Calculation of entropy and information gain from Outlook: entropy for Sunny is 0.971, and information gained from Outlook is 0.247.', 'duration': 75.528, 'max_score': 16004.378, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ16004378.jpg'}, {'end': 16285.036, 'src': 'embed', 'start': 16258.59, 'weight': 5, 'content': [{'end': 16265.332, 'text': 'So, for Outlook, as you can see, the information was 0.693 and its information gain was 0.247.', 'start': 16258.59, 'duration': 6.742}, {'end': 16273.513, 'text': 'In the case of temperature, the information was around 0.911 and the information gain was equal to 0.029.', 'start': 16265.332, 'duration': 8.181}, {'end': 16281.135, 'text': 'In the case of humidity, the information gain was 0.152, and in the case of windy, the information gain was 0.048.', 'start': 16273.513, 'duration': 7.622}, {'end': 16285.036, 'text': "So what we'll do is select the attribute with the maximum gain.", 'start': 16281.135, 'duration': 3.901}], 'summary': 'Information gain per attribute: Outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048; Outlook is the maximum.', 'duration': 26.446, 'max_score': 16258.59, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ16258590.jpg'}, {'end': 
16467.362, 'src': 'embed', 'start': 16435.442, 'weight': 0, 'content': [{'end': 16438.902, 'text': 'a tree model will outperform a classical regression model.', 'start': 16435.442, 'duration': 3.46}, {'end': 16445.844, 'text': 'And in the third case, if you need to build a model which is easy to explain to people, a decision tree model will always do better than a linear model,', 'start': 16438.902, 'duration': 6.942}, {'end': 16450.216, 'text': 'as decision tree models are simpler to interpret than linear regression.', 'start': 16446.454, 'duration': 3.762}, {'end': 16451.097, 'text': 'All right.', 'start': 16450.836, 'duration': 0.261}, {'end': 16458.821, 'text': "Now, let's move on ahead and see how you can write a decision tree classifier from scratch in Python using the CART algorithm.", 'start': 16452.377, 'duration': 6.444}, {'end': 16459.541, 'text': 'All right.', 'start': 16459.321, 'duration': 0.22}, {'end': 16464.06, 'text': "For this I'll be using a Jupyter notebook with Python 3 installed on it.", 'start': 16460.177, 'duration': 3.883}, {'end': 16467.362, 'text': "All right, so let's open Anaconda and the Jupyter notebook.", 'start': 16464.279, 'duration': 3.083}], 'summary': 'Decision tree models outperform linear regression, especially for easy interpretation. Creating a decision tree classifier using the CART algorithm in Python will be demonstrated.', 'duration': 31.92, 'max_score': 16435.442, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ16435442.jpg'}], 'start': 15037.857, 'title': 'Decision trees and attribute selection', 'summary': 'Provides an overview of decision trees, the CART algorithm, and attribute selection concepts such as Gini index, information gain, reduction in variance, and entropy. 
it discusses the calculation of information gain for different outlook parameters resulting in the highest entropy and emphasizes the importance of calculating metrics for selecting the root node.', 'chapters': [{'end': 15365.509, 'start': 15037.857, 'title': 'Understanding decision trees', 'summary': 'Provides an overview of decision trees and the cart algorithm, discussing the process of building a tree, handling data separation, and terminologies like root node, leaf node, splitting, and pruning.', 'duration': 327.652, 'highlights': ['The chapter provides an overview of decision trees and the CART algorithm', 'Discussing the process of building a tree and handling data separation', 'Understanding decision tree terminologies like root node, leaf node, splitting, and pruning']}, {'end': 15856.734, 'start': 15365.509, 'title': 'Decision tree: attribute selection', 'summary': 'Discusses the process of attribute selection in decision tree construction, covering concepts such as gini index, information gain, reduction in variance, entropy, and information gain. 
it emphasizes the importance of calculating these metrics for selecting the root node in building the decision tree.', 'duration': 491.225, 'highlights': ['The Gini index is the measure of impurity or purity used in building a decision tree in the CART algorithm.', 'Information gain is the decrease in entropy after the data set is split on the basis of an attribute, and it is crucial in identifying the best attribute for the decision tree.', 'Reduction in variance is an algorithm used for continuous target variables or regression problems, where the split with lower variance is selected as the criterion to split the population.', 'Entropy is a metric measuring the impurity of data, with its value being influenced by the degree of randomness in the data.', 'Information gain measures the reduction in entropy and is crucial for selecting the decision node attribute.']}, {'end': 16079.906, 'start': 15857.446, 'title': 'Decision tree information gain', 'summary': 'Discusses the calculation of information gain for the different outlook parameters (sunny, overcast, rainy) in a decision tree, with the information gain from outlook coming to 0.247.', 'duration': 222.46, 'highlights': ['The entropy for the sunny outlook is 0.971, overcast has an entropy of 0, and rainy has an entropy of 0.971, so sunny and rainy are tied for the highest entropy.', 'The information gained from outlook is calculated as 0.247, which is the difference between the total entropy (0.94) and the information taken from outlook (0.693).', 'Weighted averages and entropies are calculated for each outlook parameter to determine the information gain, with sunny and rainy each carrying a weight of 5/14.']}, {'end': 16485.133, 'start': 16080.167, 'title': 'Decision tree classification', 'summary': 'Discusses the calculation of information gain in a decision tree, with the gain for windy working out to 0.048. 
it further explores the subdivision of the root node based on attributes like outlook, temperature, humidity, and windy, and the concept of pruning to optimize the decision tree model. additionally, it highlights the factors influencing the selection of tree-based models over linear models.', 'duration': 404.966, 'highlights': ['The information gain for Windy is 0.048, compared with the gains for Outlook (0.247), Temperature (0.029), and Humidity (0.152).', 'The decision tree is further subdivided based on attributes like Outlook, Temperature, Humidity, and Windy, resulting in the selection of Outlook as the root node.', 'The chapter introduces the concept of pruning to optimize the decision tree model, which involves reducing complexity and obtaining the optimal solution.', 'Factors influencing the selection of tree-based models over linear models are discussed, emphasizing the role of linearity, non-linearity, and interpretability in model selection.']}], 'duration': 1447.276, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ15037857.jpg', 'highlights': ['The chapter provides an overview of decision trees and the CART algorithm', 'The Gini index is the measure of impurity or purity used in building a decision tree in the CART algorithm', 'Information gain measures the reduction in entropy and is crucial for selecting the decision node attribute', 'The entropy for the sunny outlook is 0.971, overcast has an entropy of 0, and rainy has an entropy of 0.971, so sunny and rainy are tied for the highest entropy', 'The information gained from outlook is calculated as 0.247, which is the difference between the total entropy (0.94) and the information taken from outlook (0.693)', 'The information gain for Windy is 0.048, compared with the gains for Outlook (0.247), Temperature (0.029), and Humidity (0.152)', 'Understanding decision tree terminologies like root node, leaf node, splitting, 
and pruning', 'The decision tree is further subdivided based on attributes like Outlook, Temperature, Humidity, and Windy, resulting in the selection of Outlook as the root node', 'Factors influencing the selection of tree-based models over linear models are discussed, emphasizing the role of linearity, non-linearity, and interpretability in model selection', 'Weighted averages and entropies are calculated for each outlook parameter to determine the information gain, with sunny having the highest weighted average of 5/14']}, {'end': 17945.55, 'segs': [{'end': 16749.911, 'src': 'embed', 'start': 16717.921, 'weight': 2, 'content': [{'end': 16721.125, 'text': 'Next what we are doing here with defining a function as information gain.', 'start': 16717.921, 'duration': 3.204}, {'end': 16723.563, 'text': 'So what this information gain function?', 'start': 16721.784, 'duration': 1.779}, {'end': 16730.426, 'text': 'does it calculates the information gain using the uncertainty of the starting node, minus the weighted impurity of the child node?', 'start': 16723.563, 'duration': 6.863}, {'end': 16733.287, 'text': 'The next function is find the best split.', 'start': 16731.026, 'duration': 2.261}, {'end': 16743.649, 'text': 'Well, this function is used to find the best question to ask by iterating over every feature or value and then calculating the information gain for the detail explanation on the code.', 'start': 16733.787, 'duration': 9.862}, {'end': 16745.81, 'text': 'You can find the code in the description given below.', 'start': 16743.669, 'duration': 2.141}, {'end': 16749.911, 'text': 'All right next will define a class as leaf for classifying the data.', 'start': 16745.83, 'duration': 4.081}], 'summary': 'Defining functions for information gain and best split, and classifying data', 'duration': 31.99, 'max_score': 16717.921, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ16717920.jpg'}, {'end': 16853.001, 
'src': 'embed', 'start': 16828.547, 'weight': 4, 'content': [{'end': 16834.55, 'text': 'So if we have reached this position, then you have already found a feature or value which will be used to partition the data set.', 'start': 16828.547, 'duration': 6.003}, {'end': 16840.157, 'text': 'Then what you will do is recursively build the true branch and similarly recursively build the false branch.', 'start': 16835.296, 'duration': 4.861}, {'end': 16845.539, 'text': 'So return decision underscore node, and inside that we will be passing the question, the true branch and the false branch.', 'start': 16840.898, 'duration': 4.641}, {'end': 16846.499, 'text': 'So what will it do?', 'start': 16845.559, 'duration': 0.94}, {'end': 16849.28, 'text': 'it will return a question node, and this question node', 'start': 16846.499, 'duration': 2.781}, {'end': 16853.001, 'text': 'records the best feature or the value to ask at this point.', 'start': 16849.28, 'duration': 3.721}], 'summary': 'Recursive process to build a decision tree based on the best feature for partitioning the data.', 'duration': 24.454, 'max_score': 16828.547, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ16828547.jpg'}, {'end': 17066.806, 'src': 'embed', 'start': 17043.652, 'weight': 1, 'content': [{'end': 17053.934, 'text': 'Now needless to say that credit card companies have a very vested interest in identifying financial transactions that are illegitimate and criminal in nature.', 'start': 17043.652, 'duration': 10.282}, {'end': 17060.082, 'text': 'and also I would like to mention this point that, according to the Federal Reserve payments study,', 'start': 17055, 'duration': 5.082}, {'end': 17066.806, 'text': 'Americans used credit cards to pay for 26.2 billion purchases in 2012,', 'start': 17060.082, 'duration': 6.724}], 'summary': 'Credit card companies aim to identify illegitimate transactions; Americans made 26.2 billion credit card purchases in 2012.', 
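The entropy arithmetic quoted in the transcript above (total entropy 0.94, information from Outlook 0.693, gain 0.247) can be checked with a short sketch in plain Python. This is illustrative only, built from the standard play-tennis class counts; it is not the instructor's notebook code:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent_labels, child_groups):
    """Entropy of the parent minus the weighted entropy of the child subsets."""
    total = len(parent_labels)
    weighted = sum(len(g) / total * entropy(g) for g in child_groups)
    return entropy(parent_labels) - weighted

# Play-tennis target variable: 9 'yes' and 5 'no' days overall.
play = ['yes'] * 9 + ['no'] * 5

# Split by Outlook: sunny (2 yes / 3 no), overcast (4 yes), rainy (3 yes / 2 no).
sunny = ['yes'] * 2 + ['no'] * 3
overcast = ['yes'] * 4
rainy = ['yes'] * 3 + ['no'] * 2

print(round(entropy(play), 2))                                     # total entropy, 0.94
print(round(information_gain(play, [sunny, overcast, rainy]), 3))  # gain for Outlook, 0.247
```

Running this reproduces the 0.94 and 0.247 figures from the lecture, confirming that the gain is just total entropy minus the weighted child entropy (0.693) computed by the information-gain function described above.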
'duration': 23.154, 'max_score': 17043.652, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ17043652.jpg'}, {'end': 17609.042, 'src': 'embed', 'start': 17583.535, 'weight': 0, 'content': [{'end': 17590.397, 'text': 'therefore the final outcome will be in the favor of the class B, and that is how random forest actually works upon.', 'start': 17583.535, 'duration': 6.862}, {'end': 17601.819, 'text': 'now, one really beautiful thing about this particular algorithm is that it is one of the versatile algorithms which is capable of performing both regression as well as classification.', 'start': 17590.397, 'duration': 11.422}, {'end': 17609.042, 'text': "Now let's try to understand random forest further, with a very beautiful example.", 'start': 17604.096, 'duration': 4.946}], 'summary': 'Random forest is versatile, capable of performing both regression and classification.', 'duration': 25.507, 'max_score': 17583.535, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ17583535.jpg'}], 'start': 16485.133, 'title': 'Decision trees and random forest in python', 'summary': "Covers the implementation of decision tree classifier and algorithm in python, including functions for calculating information gain, finding the best split, and a use case of credit risk detection. 
it also demonstrates the use of the random forest algorithm for credit card fraud detection based on an applicant's income and age, highlighting its importance in predicting fraudulent loan applications.", 'chapters': [{'end': 16715.88, 'start': 16485.133, 'title': 'Decision tree classifier in python', 'summary': 'Covers the implementation of a decision tree classifier in Python using a Jupyter notebook, explaining data set initialization, unique value identification, partitioning, and impurity calculation.', 'duration': 230.747, 'highlights': ['Data set initialization and feature explanation', 'Identifying unique values in the data set', 'Partitioning the data set based on questions', 'Calculating Gini impurity for the data set', 'Testing numeric values and classifying questions']}, {'end': 17042.852, 'start': 16717.921, 'title': 'Decision tree algorithm', 'summary': 'Describes the implementation of a decision tree algorithm, including functions for calculating information gain, finding the best split, classifying data, and printing the tree, with a simple use case of credit risk detection.', 'duration': 324.931, 'highlights': ["The function 'information gain' calculates the information gain using the uncertainty of the starting node minus the weighted impurity of the child nodes.", "The 'find the best split' function is used to find the best question to ask by iterating over every feature or value and then calculating the information gain.", "The 'build tree' function is used to recursively build the true and false branches based on the partitioning and information gain.", "The 'classify' function is used to decide whether to follow the true branch or the false branch by comparing the feature values stored in the node to the example."]}, {'end': 17609.042, 'start': 17043.652, 'title': 'Credit card fraud detection with random forest', 'summary': "Highlights the importance of identifying fraudulent transactions for credit card companies, the risk and loss associated with 
unauthorized transactions, and demonstrates how random forest algorithm can be used to predict fraudulent loan applications based on applicant's income and age.", 'duration': 565.39, 'highlights': ['The chapter emphasizes the importance of identifying fraudulent transactions for credit card companies to minimize the estimated loss of US $6.1 billion due to unauthorized transactions in 2012, as reported by the Federal Reserve payments study.', "The transcript explains the process of using random forest algorithm to predict fraudulent loan applications based on an applicant's income and age, assessing the risk and probability of loan approval based on different income categories and age groups.", 'The detailed explanation of random forest algorithm provides insight into its mechanism of compiling results from multiple decision trees, and its capability of performing both regression and classification, making it a versatile algorithm for fraud detection.']}, {'end': 17945.55, 'start': 17609.062, 'title': 'Decision making: edge of tomorrow', 'summary': "Discusses how to make a decision on watching edge of tomorrow based on friends' opinions and preferences, using decision trees and probabilities, with a final outcome dependent on compiled results of friends' votes.", 'duration': 336.488, 'highlights': ["Using friends' opinions and preferences to make a decision on watching Edge of Tomorrow", 'Decision trees and probabilities in determining the likelihood of liking Edge of Tomorrow', "Final outcome dependent on compiled results of friends' votes"]}], 'duration': 1460.417, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ16485133.jpg', 'highlights': ['The detailed explanation of random forest algorithm provides insight into its mechanism of compiling results from multiple decision trees, and its capability of performing both regression and classification, making it a versatile algorithm for fraud detection.', 'The 
chapter emphasizes the importance of identifying fraudulent transactions for credit card companies to minimize the estimated loss of US $6.1 billion due to unauthorized transactions in 2012, as reported by the Federal Reserve payments study.', "The transcript explains the process of using the random forest algorithm to predict fraudulent loan applications based on an applicant's income and age, assessing the risk and probability of loan approval based on different income categories and age groups.", 'The detailed explanation of the random forest algorithm provides insight into its mechanism of compiling results from multiple decision trees, and its capability of performing both regression and classification, making it a versatile algorithm for fraud detection.']}, {'end': 19524.768, 'segs': [{'end': 18040.6, 'src': 'embed', 'start': 18000.666, 'weight': 0, 'content': [{'end': 18009.096, 'text': 'random forest is actually used to make out whether the applicant will be a defaulter or a non-defaulter,', 'start': 18000.666, 'duration': 8.43}, {'end': 18013.782, 'text': 'so that it can accordingly approve or reject the loan applications.', 'start': 18009.096, 'duration': 4.686}, {'end': 18016.905, 'text': 'right so that is how random forest is being used in banking.', 'start': 18013.782, 'duration': 3.123}, {'end': 18019.627, 'text': 'talking about medicine,', 'start': 18017.566, 'duration': 2.061}, {'end': 18028.833, 'text': 'random forest is widely used in the medicine field to predict beforehand what the probability is that a person will actually have a particular disease or not.', 'start': 18019.627, 'duration': 9.206}, {'end': 18033.195, 'text': "right so it's actually used to look at the various disease trends.", 'start': 18028.833, 'duration': 4.362}, {'end': 18038.959, 'text': "let's say, you want to figure out what is the probability that a person will have diabetes or not.", 'start': 18033.195, 'duration': 5.764}, {'end': 18040.6, 'text': 'and so what would you do?', 'start': 18038.959, 'duration': 1.641}], 
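The bagging-plus-majority-vote idea the transcript describes (many different trees, each seeing its own sample, with the final class decided by vote) can be sketched minimally. Here one-split "stumps" stand in for full decision trees to keep it short, and the toy loan rows and every function name are invented for illustration; this is not the video's code:

```python
import random
from collections import Counter

def train_stump(rows):
    """Fit a one-split 'tree': the (feature, threshold, polarity) triple
    with the fewest misclassifications on this sample."""
    best, best_err = None, float("inf")
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            for polarity in (0, 1):
                err = sum(((x[f] >= threshold) ^ polarity) != y for x, y in rows)
                if err < best_err:
                    best, best_err = (f, threshold, polarity), err
    return best

def predict_stump(stump, x):
    f, threshold, polarity = stump
    return (x[f] >= threshold) ^ polarity

def train_forest(rows, n_trees=25):
    # Each "tree" is fit on its own bootstrap sample, so the trees differ.
    return [train_stump(random.choices(rows, k=len(rows))) for _ in range(n_trees)]

def predict_forest(forest, x):
    # The majority vote across all trees decides the final class.
    return Counter(predict_stump(s, x) for s in forest).most_common(1)[0][0]

random.seed(0)
# Hypothetical loan rows: (income, age) -> 1 = likely defaulter, 0 = non-defaulter.
rows = [((1, 25), 1), ((2, 30), 1), ((2, 60), 1),
        ((8, 40), 0), ((9, 35), 0), ((12, 50), 0)]
forest = train_forest(rows)
print(predict_forest(forest, (10, 45)), predict_forest(forest, (1, 28)))
```

A real random forest would also draw a random subset of features at each split and grow full trees, but the averaging-of-many-diverse-models mechanism is the same.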
'summary': 'Random forest used in banking to predict loan defaults, and in medicine to predict disease probabilities.', 'duration': 39.934, 'max_score': 18000.666, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18000666.jpg'}, {'end': 18088.481, 'src': 'embed', 'start': 18062.757, 'weight': 2, 'content': [{'end': 18073.086, 'text': 'and then you will finally compile the results of all those variables and then you will make a final decision as to whether the person will have diabetes in the near future or not.', 'start': 18062.757, 'duration': 10.329}, {'end': 18077.048, 'text': 'That is how random forest will be used in medicine sector.', 'start': 18074.164, 'duration': 2.884}, {'end': 18084.076, 'text': 'Now random forest is also actually used to find out the land use.', 'start': 18078.209, 'duration': 5.867}, {'end': 18088.481, 'text': 'For example, I want to set up a particular industry in certain area.', 'start': 18084.276, 'duration': 4.205}], 'summary': 'Random forest used for predicting diabetes and land use.', 'duration': 25.724, 'max_score': 18062.757, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18062757.jpg'}, {'end': 18222.19, 'src': 'embed', 'start': 18196.554, 'weight': 3, 'content': [{'end': 18205.059, 'text': 'so that is how they track each and every particular move of yours and then they try to predict whether you will be moving out or not.', 'start': 18196.554, 'duration': 8.505}, {'end': 18208.541, 'text': 'so that is how they identify the customer churn.', 'start': 18205.059, 'duration': 3.482}, {'end': 18215.225, 'text': 'so these all are various domains where random forest is used, and this is not the only list.', 'start': 18208.541, 'duration': 6.684}, {'end': 18220.889, 'text': 'so there are numerous other examples which actually are using random forests.', 'start': 18215.225, 'duration': 5.664}, {'end': 18222.19, 'text': 
'that makes it so special.', 'start': 18220.889, 'duration': 1.301}], 'summary': 'Random forests are used to track and predict customer churn, among various other domains.', 'duration': 25.636, 'max_score': 18196.554, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18196554.jpg'}, {'end': 18268, 'src': 'embed', 'start': 18243.715, 'weight': 14, 'content': [{'end': 18251.276, 'text': 'So here T is the total number of the predicted variables that you have in your data set, and out of those total predicted variables,', 'start': 18243.715, 'duration': 7.561}, {'end': 18255.097, 'text': 'you will select some randomly, some few features out of those.', 'start': 18251.276, 'duration': 3.821}, {'end': 18259.918, 'text': 'Now why we are actually selecting a few features only.', 'start': 18255.537, 'duration': 4.381}, {'end': 18268, 'text': 'the reason is that if you will select all the predicted variables or the total predicted variables, then each of your decision tree will be same.', 'start': 18259.918, 'duration': 8.082}], 'summary': 'Select only a few features out of the total predicted variables to avoid identical decision trees.', 'duration': 24.285, 'max_score': 18243.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18243715.jpg'}, {'end': 18344.362, 'src': 'embed', 'start': 18315.993, 'weight': 13, 'content': [{'end': 18321.638, 'text': 'So the first important step is to select certain number of features out of all the features.', 'start': 18315.993, 'duration': 5.645}, {'end': 18324.28, 'text': "Now let's move on to the second step.", 'start': 18322.518, 'duration': 1.762}, {'end': 18326.804, 'text': "Let's say for any node D.", 'start': 18324.561, 'duration': 2.243}, {'end': 18329.907, 'text': 'Now the first step is to calculate the best split at that point.', 'start': 18326.804, 'duration': 3.103}, {'end': 18334.893, 'text': 'So you know that 
decision tree, how decision tree is actually implemented.', 'start': 18330.788, 'duration': 4.105}, {'end': 18344.362, 'text': 'So you pick up the most significant variable, right, and then you will split that particular node into further child nodes.', 'start': 18335.534, 'duration': 8.828}], 'summary': 'Select features, calculate best split, implement decision tree', 'duration': 28.369, 'max_score': 18315.993, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18315993.jpg'}, {'end': 18817.492, 'src': 'embed', 'start': 18769.426, 'weight': 4, 'content': [{'end': 18776.189, 'text': "so let's say, if the variance is, say, x for decision tree, but for random forest, let's say,", 'start': 18769.426, 'duration': 6.763}, {'end': 18779.791, 'text': 'we have implemented n number of decision trees parallely.', 'start': 18776.189, 'duration': 3.602}, {'end': 18787.703, 'text': 'So my entire variance gets averaged upon and my final variance actually becomes x upon n.', 'start': 18780.211, 'duration': 7.492}, {'end': 18793.572, 'text': 'So that is how the entire variance actually goes down as compared to other algorithms, right?', 'start': 18787.703, 'duration': 5.869}, {'end': 18800.672, 'text': 'Now, second most important feature is that it works well for both classification and regression problems,', 'start': 18794.605, 'duration': 6.067}, {'end': 18807.881, 'text': 'and by far I have come across this is one and the only algorithm which works equally well for both of them.', 'start': 18800.672, 'duration': 7.209}, {'end': 18817.492, 'text': 'Be it classification kind of problem or a regression kind of problem, right? 
then it really runs efficient on large databases.', 'start': 18808.421, 'duration': 9.071}], 'summary': 'Random forest reduces variance by averaging, works well for classification and regression, and efficient on large databases.', 'duration': 48.066, 'max_score': 18769.426, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18769426.jpg'}, {'end': 18862.169, 'src': 'embed', 'start': 18837.512, 'weight': 8, 'content': [{'end': 18846.576, 'text': 'is because it has got certain implicit methods which actually take care and remove all the outliers and all the missing data,', 'start': 18837.512, 'duration': 9.064}, {'end': 18853.079, 'text': "and you really don't have to take care about all that thing while you are in the stages of input preparation.", 'start': 18846.576, 'duration': 6.503}, {'end': 18857.581, 'text': 'So random forest is all here to take care of everything else.', 'start': 18853.219, 'duration': 4.362}, {'end': 18862.169, 'text': 'and next is it performs implicit feature selection.', 'start': 18858.767, 'duration': 3.402}], 'summary': 'Random forest handles outliers, missing data, and feature selection automatically.', 'duration': 24.657, 'max_score': 18837.512, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18837512.jpg'}, {'end': 18911.238, 'src': 'embed', 'start': 18885.423, 'weight': 7, 'content': [{'end': 18892.928, 'text': 'no matter how so random forest will automatically take care and it will implement all those 500 decision trees.', 'start': 18885.423, 'duration': 7.505}, {'end': 18897.631, 'text': 'and those all 500 decision trees will be different from each other.', 'start': 18892.928, 'duration': 4.703}, {'end': 18905.876, 'text': 'and this is because it has got implicit methods which will automatically collect different parameters itself out of all the variables that you have.', 'start': 18897.631, 'duration': 8.245}, {'end': 
18910.298, 'text': 'right, then it can be easily grown in parallel.', 'start': 18905.876, 'duration': 4.422}, {'end': 18911.238, 'text': 'Why is that so?', 'start': 18910.298, 'duration': 0.94}], 'summary': 'Random forest implements 500 different decision trees, allowing for parallel growth.', 'duration': 25.815, 'max_score': 18885.423, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18885423.jpg'}, {'end': 18958.545, 'src': 'embed', 'start': 18935.01, 'weight': 5, 'content': [{'end': 18941.614, 'text': 'And the last point is that it has got methods for balancing error in unbalanced datasets.', 'start': 18935.01, 'duration': 6.604}, {'end': 18946.537, 'text': 'Now what exactly are unbalanced datasets? Let me just give you an example of that.', 'start': 18942.234, 'duration': 4.303}, {'end': 18956.503, 'text': "So let's say you're working on a dataset, fine? And you create a random forest model and get 90% accuracy immediately.", 'start': 18947.137, 'duration': 9.366}, {'end': 18958.545, 'text': 'Fantastic, you think right?', 'start': 18956.924, 'duration': 1.621}], 'summary': 'Methods for balancing error in unbalanced datasets, achieving 90% accuracy with a random forest model.', 'duration': 23.535, 'max_score': 18935.01, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ18935010.jpg'}, {'end': 19078.165, 'src': 'embed', 'start': 19054.578, 'weight': 11, 'content': [{'end': 19061.803, 'text': 'Similarly, if K equals 3, the labels of the three closest points are checked and the most common label is assigned to the testing data.', 'start': 19054.578, 'duration': 7.225}, {'end': 19064.625, 'text': 'So this is what the KNN algorithm means.', 'start': 19062.323, 'duration': 2.302}, {'end': 19066.046, 'text': 'So moving on ahead.', 'start': 19065.246, 'duration': 0.8}, {'end': 19069.709, 'text': "Let's see some example scenarios where KNN is used in the 
industry.", 'start': 19066.226, 'duration': 3.483}, {'end': 19074.944, 'text': "So let's see the industrial application of Canon algorithm starting with recommender system.", 'start': 19070.603, 'duration': 4.341}, {'end': 19078.165, 'text': 'Well, the biggest use case of Canon search is a recommender system.', 'start': 19075.364, 'duration': 2.801}], 'summary': 'Knn algorithm assigns most common label to testing data. it is used in recommender systems in industry.', 'duration': 23.587, 'max_score': 19054.578, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19054578.jpg'}, {'end': 19118.236, 'src': 'embed', 'start': 19091.229, 'weight': 12, 'content': [{'end': 19097.19, 'text': 'this Canon algorithm applies to recommending products like an Amazon, or for recommending media like in case of Netflix,', 'start': 19091.229, 'duration': 5.961}, {'end': 19099.831, 'text': 'or even for recommending advertisement to display to a user.', 'start': 19097.19, 'duration': 2.641}, {'end': 19104.512, 'text': "If I'm not wrong, almost all of you must have used Amazon for shopping right?", 'start': 19100.411, 'duration': 4.101}, {'end': 19110.374, 'text': 'So, just to tell you, more than 35% of amazon.com revenue is generated by its recommendation engine.', 'start': 19104.752, 'duration': 5.622}, {'end': 19118.236, 'text': "So what's their strategy Amazon uses recommendation as a targeted marketing tool in both the email campaigns around most of its website pages.", 'start': 19110.894, 'duration': 7.342}], 'summary': "Canon algorithm powers amazon's recommendation engine, generating over 35% of revenue.", 'duration': 27.007, 'max_score': 19091.229, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19091229.jpg'}, {'end': 19168.104, 'src': 'embed', 'start': 19146.297, 'weight': 10, 'content': [{'end': 19154.944, 'text': 'So next industrial application of Canon algorithm is 
concept search, or searching semantically similar documents and classifying documents containing similar topics.', 'start': 19146.297, 'duration': 8.647}, {'end': 19162.971, 'text': 'So, as you know, the data on the internet is increasing exponentially every single second; there are billions and billions of documents on the internet.', 'start': 19155.625, 'duration': 7.346}, {'end': 19168.104, 'text': 'Each document on the internet contains multiple concepts that could be a potential concept.', 'start': 19163.599, 'duration': 4.505}], 'summary': 'KNN algorithm for concept search in billions of internet documents.', 'duration': 21.807, 'max_score': 19146.297, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19146297.jpg'}, {'end': 19355.504, 'src': 'embed', 'start': 19296.661, 'weight': 15, 'content': [{'end': 19301.63, 'text': 'So by now, I guess you know how the KNN algorithm works and what the significance of K in the KNN algorithm is.', 'start': 19296.661, 'duration': 4.969}, {'end': 19304.015, 'text': 'So how will you choose the value of K?', 'start': 19302.352, 'duration': 1.663}, {'end': 19308.278, 'text': 'So keep in mind that K is the most important parameter in the KNN algorithm.', 'start': 19305.057, 'duration': 3.221}, {'end': 19312.64, 'text': "So let's see, when you build a K nearest neighbor classifier, how will you choose a value of K?", 'start': 19308.738, 'duration': 3.902}, {'end': 19315.02, 'text': 'Well, you might have a specific value of K in mind,', 'start': 19312.82, 'duration': 2.2}, {'end': 19322.643, 'text': 'or you could divide up your data and use something like a cross validation technique to test several values of K in order to determine which works best for your data.', 'start': 19315.02, 'duration': 7.623}, {'end': 19331.186, 'text': 'For example, if n equals a thousand cases, then the optimal value of K lies somewhere between 1 and 19, but unless you try it,', 
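The cross-validation route to choosing K mentioned above can be sketched in plain Python. The t-shirt rows and helper names below are hypothetical (loosely echoing the video's t-shirt example), and `math.dist` requires Python 3.8+; this is a sketch, not the course's code:

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(train, x, k):
    """Majority label among the k nearest neighbours of x."""
    neighbours = sorted(train, key=lambda row: dist(row[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]           # hold out one point at a time
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

# Toy t-shirt data: (height_cm, weight_kg) -> size.
data = [((158, 58), 'M'), ((160, 59), 'M'), ((160, 60), 'M'), ((163, 61), 'M'),
        ((165, 63), 'L'), ((167, 64), 'L'), ((168, 66), 'L'), ((170, 68), 'L')]

# Try several odd values of k and keep the one that cross-validates best.
best_k = max([1, 3, 5], key=lambda k: loo_accuracy(data, k))
print(best_k, knn_predict(data, (161, 61), best_k))
```

Odd values of k are used so the binary vote cannot tie; swapping `dist` for a Manhattan-distance helper changes only the `key` function.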
'start': 19322.843, 'duration': 8.343}, {'end': 19332.186, 'text': 'You cannot be sure of it.', 'start': 19331.286, 'duration': 0.9}, {'end': 19335.711, 'text': 'So, you know how the algorithm is working on a higher level.', 'start': 19333.209, 'duration': 2.502}, {'end': 19338.713, 'text': "Let's move on and see how things are predicted using the KNN algorithm.", 'start': 19335.911, 'duration': 2.802}, {'end': 19344.437, 'text': 'Remember, I told you the KNN algorithm uses the least distance measure in order to find its nearest neighbors.', 'start': 19339.173, 'duration': 5.264}, {'end': 19347.079, 'text': "So let's see how these distances are calculated.", 'start': 19345.117, 'duration': 1.962}, {'end': 19350.381, 'text': 'Well, there are several distance measures which can be used.', 'start': 19348.179, 'duration': 2.202}, {'end': 19355.504, 'text': "So to start with, we'll mainly focus on Euclidean distance and Manhattan distance in this session.", 'start': 19350.861, 'duration': 4.643}], 'summary': 'Choosing the value of K in the KNN algorithm involves testing several values through cross validation, with the optimal value lying somewhere between 1 and 19 for 1000 cases.', 'duration': 58.843, 'max_score': 19296.661, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19296661.jpg'}, {'end': 19473.567, 'src': 'embed', 'start': 19443.921, 'weight': 18, 'content': [{'end': 19447.463, 'text': 'now our task is to predict the t-shirt size of that particular customer.', 'start': 19443.921, 'duration': 3.542}, {'end': 19450.886, 'text': "So for this we'll be using the KNN algorithm.", 'start': 19448.284, 'duration': 2.602}, {'end': 19453.107, 'text': 'So the very first thing that we need to do.', 'start': 19451.546, 'duration': 1.561}, {'end': 19455.329, 'text': 'We need to calculate the Euclidean distance.', 'start': 19453.347, 'duration': 1.982}, {'end': 19460.352, 'text': 'So now that you have new data of height 
161 centimeters and weight of 61 kg.', 'start': 19455.969, 'duration': 4.383}, {'end': 19464.075, 'text': "So the very first thing that we'll do is calculate the Euclidean distance.", 'start': 19460.953, 'duration': 3.122}, {'end': 19473.567, 'text': 'which is nothing but the square root of (161 minus 158) squared plus (61 minus 58) squared, and the square root of that is 4.24.', 'start': 19464.679, 'duration': 8.888}], 'summary': 'Using the KNN algorithm, predict t-shirt size based on the Euclidean distance of new data (161 cm height, 61 kg), resulting in 4.24.', 'duration': 29.646, 'max_score': 19443.921, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19443921.jpg'}], 'start': 17946.23, 'title': 'Random forest applications', 'summary': "Covers diverse applications of random forest, including predicting loan defaulters, disease probability, customer churn, and weather conditions. It also discusses the algorithm's versatility, scalability, and its industrial applications in recommender systems.", 'chapters': [{'end': 18196.034, 'start': 17946.23, 'title': 'Application of random forest', 'summary': 'Explains the application of random forest in diverse domains such as banking, medicine, and marketing, highlighting its use in predicting loan defaulters, disease probability, and customer churn.', 'duration': 249.804, 'highlights': ['Random forest is used in banking to predict loan defaulters, making decisions on loan applications based on the analysis of various variables.', 'Random forest is widely used in medicine to predict disease probability by analyzing various patient variables such as glucose concentration, BMI, insulin levels, and age.', 'Random forest is utilized to predict land use, aiding in decision-making for setting up industries by analyzing parameters such as vegetation, urban population, and distance from transportation modes.', 'In marketing, random forest is applied to identify customer churn by 
analyzing customer behavior, purchasing history, and preferences.']}, {'end': 18470.84, 'start': 18196.554, 'title': 'Understanding random forest algorithm', 'summary': 'Discusses the application and working of the random forest algorithm, highlighting how random forest identifies customer churn and the step-by-step process of creating multiple decision trees through feature selection, node splitting, and majority voting.', 'duration': 274.286, 'highlights': ['Random forest identifies customer churn by tracking and predicting customer behavior, making it applicable in various domains.', 'The step-by-step process of creating multiple decision trees involves feature selection, node splitting, and majority voting to compile results for accurate predictions.', 'Feature selection in random forest involves randomly choosing a subset of predictor variables to avoid identical decision trees and enhance model intelligence.']}, {'end': 18932.869, 'start': 18472.099, 'title': 'Random forest in weather prediction', 'summary': "The chapter discusses using random forest for weather prediction, with a dataset of 14 days' weather conditions and target variable 'play'. 
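The bootstrap-plus-majority-voting process summarized in these highlights can be sketched roughly as follows. The toy dataset and the one-feature "stump" standing in for a full decision tree are illustrative assumptions, not the video's actual demo:

```python
# Rough sketch of the random-forest recipe: each "tree" (simplified here to a
# one-feature decision stump) trains on a bootstrap sample with a randomly
# chosen feature, and the forest predicts by majority vote.
import random
from collections import Counter

random.seed(0)

# Toy rows: (feature vector, label); the label happens to follow feature 0.
data = [([0, 1], 'no'), ([1, 1], 'yes'), ([1, 0], 'yes'), ([0, 0], 'no')] * 5

def train_stump(sample, feature):
    """A one-feature 'tree': map each feature value to its majority label."""
    by_value = {}
    for x, y in sample:
        by_value.setdefault(x[feature], []).append(y)
    return {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}

def train_forest(rows, n_trees=7):
    forest = []
    for _ in range(n_trees):
        sample = [random.choice(rows) for _ in rows]   # bootstrap sample
        feature = random.randrange(len(rows[0][0]))    # random feature subset (size 1)
        forest.append((feature, train_stump(sample, feature)))
    return forest

def predict(forest, x):
    # Each stump votes; unseen feature values fall back to 'no' (arbitrary default).
    votes = Counter(stump.get(x[f], 'no') for f, stump in forest)
    return votes.most_common(1)[0][0]                  # majority vote

forest = train_forest(data)
print(predict(forest, [1, 1]))
```

A real random forest grows full decision trees over random feature subsets at every split; the bootstrap sampling and the majority vote are the parts this sketch preserves.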
It explains the decision-making process for predicting whether a match will take place based on weather conditions and highlights the features and benefits of random forest, emphasizing its accuracy, versatility, scalability, and automatic handling of input preparation and feature selection.", 'duration': 460.77, 'highlights': ['Random forest averages the entire variance across the decision trees, reducing the final variance to x/n.', 'Random forest works well for both classification and regression problems.', 'Random forest is scalable and efficient for large databases.', 'Random forest performs implicit input preparation, handling outliers and missing data.', 'Random forest automatically performs feature selection while implementing multiple decision trees.']}, {'end': 19145.332, 'start': 18935.01, 'title': 'Random forest and knn algorithms', 'summary': "Explains the features of random forests, including their ability to balance errors in unbalanced datasets, and details the industrial applications of the KNN algorithm, particularly in recommender systems, citing that over 35% of Amazon's revenue is generated by its recommendation engine.", 'duration': 210.322, 'highlights': ['Random forest models address bias in unbalanced datasets, ensuring balanced errors and decision-making across all classes, preventing biased decisions towards a specific class.', "K Nearest Neighbor (KNN) algorithm is extensively used in recommender systems, contributing to over 35% of Amazon's revenue through targeted marketing and product recommendation strategies.", 'Amazon uses its recommendation engine for targeted marketing, generating over 35% of its revenue, and employs strategies like recommending products based on user browsing behavior to increase average order value.']}, {'end': 19524.768, 'start': 19146.297, 'title': 'KNN algorithm applications and working', 'summary': 'The chapter discusses the industrial applications of the KNN algorithm, including concept search and document classification, and 
explains how the KNN algorithm works for predicting classes using Euclidean and Manhattan distances, with examples and implications.', 'duration': 378.471, 'highlights': ['The KNN algorithm is applied for concept search, document classification, and other advanced applications like handwriting detection, image recognition, and video recognition, to handle the increasing amount of data on the internet.', 'The working of the KNN algorithm is explained, emphasizing the significance of K, the number of nearest neighbors, in predicting the class of a new point based on the majority of its closest neighbors.', 'The significance of the value of K in the KNN algorithm is highlighted, showcasing the importance of choosing the optimal K value through techniques like cross-validation, especially in cases where the optimal value of K lies between 1 and 19 for a dataset with a thousand cases.', 'The chapter explains the calculation of distances in KNN using Euclidean and Manhattan distance measures, providing a clear understanding of the differences between the two distance metrics and their practical application in predicting attributes such as t-shirt size based on height and weight data.', 'The detailed explanation of how the KNN algorithm calculates distances and predicts classes, including the calculation of Euclidean distances and the process of finding the top K minimum Euclidean distances to predict the class of a new data point.']}], 'duration': 1578.538, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ17946230.jpg', 'highlights': ['Random forest is used in banking to predict loan defaulters, making decisions on loan applications based on the analysis of various variables.', 'Random forest is widely used in medicine to predict disease probability by analyzing various patient variables such as glucose concentration, BMI, insulin levels, and age.', 
'Random forest is utilized to predict land use, aiding in decision-making for setting up industries by analyzing parameters such as vegetation, urban population, and distance from transportation modes.', 'Random forest is applied in marketing to identify customer churn by analyzing customer behavior, purchasing history, and preferences.', 'Random forest averages the entire variance across the decision trees, reducing the final variance to x/n.', 'Random forest models address bias in unbalanced datasets, ensuring balanced errors and decision-making across all classes, preventing biased decisions towards a specific class.', 'Random forest works well for both classification and regression problems.', 'Random forest is scalable and efficient for large databases.', 'Random forest performs implicit input preparation, handling outliers and missing data.', 'Random forest automatically performs feature selection while implementing multiple decision trees.', 'The KNN algorithm is applied for concept search, document classification, and other advanced applications like handwriting detection, image recognition, and video recognition, to handle the increasing amount of data on the internet.', "K Nearest Neighbor (KNN) algorithm is extensively used in recommender systems, contributing to over 35% of Amazon's revenue through targeted marketing and product recommendation strategies.", 'Amazon uses its recommendation engine for targeted marketing, generating over 35% of its revenue, and employs strategies like recommending products based on user browsing behavior to increase average order value.', 'The step-by-step process of creating multiple decision trees involves feature selection, node splitting, and majority voting to compile results for accurate predictions.', 'Feature selection in random forest involves randomly choosing a subset of predictor variables to avoid identical decision trees and enhance model intelligence.', 'The working of the KNN algorithm is explained using the 
KNN approach, emphasizing the significance of K, the number of nearest neighbors, in predicting the class of a new point based on the majority of its closest neighbors.', 'The significance of the value of K in the KNN algorithm is highlighted, showcasing the importance of choosing the optimal K value through techniques like cross-validation, especially in cases where the optimal value of K lies between 1 and 19 for a dataset with a thousand cases.', 'The detailed explanation of how the KNN algorithm calculates distances and predicts classes, including the calculation of Euclidean distances and the process of finding the top K minimum Euclidean distances to predict the class of a new data point.', 'The chapter explains the calculation of distances in KNN using Euclidean and Manhattan distance measures, providing a clear understanding of the differences between the two distance metrics and their practical application in predicting attributes such as t-shirt size based on height and weight data.']}, {'end': 21234.958, 'segs': [{'end': 19552.482, 'src': 'embed', 'start': 19524.828, 'weight': 4, 'content': [{'end': 19529.49, 'text': 'But before we drill down to the coding part, let me just tell you why people call KNN a lazy learner.', 'start': 19524.828, 'duration': 4.662}, {'end': 19535.256, 'text': "Well, KNN for classification is a very simple algorithm, but that's not why it is called lazy.", 'start': 19530.214, 'duration': 5.042}, {'end': 19539.877, 'text': "KNN is a lazy learner because it doesn't build a discriminative function from the training data.", 'start': 19535.536, 'duration': 4.341}, {'end': 19542.318, 'text': 'What it does is memorize the training data.', 'start': 19540.058, 'duration': 2.26}, {'end': 19547.54, 'text': 'There is no learning phase for the model, and all of the work happens at the time a prediction is requested.', 'start': 19542.618, 'duration': 4.922}, {'end': 19552.482, 'text': "That is 
the reason why KNN is often referred to as a lazy learning algorithm.", 'start': 19547.7, 'duration': 4.782}], 'summary': 'KNN is called a lazy learner because it memorizes the training data and has no discriminative function.', 'duration': 27.654, 'max_score': 19524.828, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19524828.jpg'}, {'end': 19595.067, 'src': 'embed', 'start': 19561.488, 'weight': 2, 'content': [{'end': 19564.05, 'text': 'So this data set consists of 150 observations.', 'start': 19561.488, 'duration': 2.562}, {'end': 19566.732, 'text': 'We have four features and one class label.', 'start': 19564.29, 'duration': 2.442}, {'end': 19570.714, 'text': 'the four features include the sepal length, the sepal width, the petal length and the petal width,', 'start': 19566.732, 'duration': 3.982}, {'end': 19573.796, 'text': 'whereas the class label decides which flower belongs to which category.', 'start': 19570.714, 'duration': 3.082}, {'end': 19577.378, 'text': 'So this was the description of the data set which we are using.', 'start': 19574.837, 'duration': 2.541}, {'end': 19581.481, 'text': "now let's move on and see the step-by-step solution to perform the KNN algorithm.", 'start': 19577.378, 'duration': 4.103}, {'end': 19585.184, 'text': "So first we'll start by handling the data.", 'start': 19582.003, 'duration': 3.181}, {'end': 19590.685, 'text': 'We have to open the data set from the CSV format and split the data set into train and test parts.', 'start': 19585.244, 'duration': 5.441}, {'end': 19595.067, 'text': "next we'll tackle similarity, where we have to calculate the distance between two data instances.", 'start': 19590.685, 'duration': 4.382}], 'summary': 'Data set has 150 observations with 4 features and 1 class label. 
Discussion on KNN algorithm steps.', 'duration': 33.579, 'max_score': 19561.488, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19561488.jpg'}, {'end': 19803.473, 'src': 'embed', 'start': 19773.993, 'weight': 10, 'content': [{'end': 19777.974, 'text': 'which is nothing but the square root of the sum of squared differences between two arrays of numbers.', 'start': 19773.993, 'duration': 3.981}, {'end': 19782.428, 'text': 'Additionally, we want to control which fields to include in the distance calculation.', 'start': 19778.687, 'duration': 3.741}, {'end': 19785.909, 'text': 'So specifically we only want to include the first four attributes.', 'start': 19782.728, 'duration': 3.181}, {'end': 19789.53, 'text': 'So our approach will be to limit the Euclidean distance to a fixed length.', 'start': 19786.409, 'duration': 3.121}, {'end': 19792.51, 'text': "All right, so let's define our Euclidean function.", 'start': 19790.09, 'duration': 2.42}, {'end': 19798.772, 'text': 'So this is our Euclidean distance function, which takes instance 1, instance 2 and length as parameters.', 'start': 19792.77, 'duration': 6.002}, {'end': 19803.473, 'text': 'instance 1 and instance 2 are the two points between which you want to calculate the Euclidean distance.', 'start': 19798.772, 'duration': 4.701}], 'summary': 'Limit Euclidean distance to first four attributes for controlled calculation.', 'duration': 29.48, 'max_score': 19773.993, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19773993.jpg'}, {'end': 19917.127, 'src': 'embed', 'start': 19888.18, 'weight': 9, 'content': [{'end': 19895.184, 'text': 'So this is what the get neighbors function looks like: it takes the training data set, a test instance and K as its input.', 'start': 19888.18, 'duration': 7.004}, {'end': 19897.946, 'text': 'The K is nothing but the number of nearest neighbors you want to check for.', 'start': 19895.204, 
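A minimal sketch of the two helpers described here, assuming the usual layout where the class label sits in the last column. The `[4, 4, 4, 'b']` row completes the transcript's truncated example, and the `[5, 5, 5]` test instance is an assumed value:

```python
# Sketch of the transcript's helpers: a Euclidean distance restricted to the
# first `length` attributes, and get_neighbors, which returns the k training
# rows closest to a test instance.
import math
import operator

def euclidean_distance(instance1, instance2, length):
    # Only the first `length` attributes take part in the distance,
    # so a class label stored in the last column is ignored.
    return math.sqrt(sum((instance1[i] - instance2[i]) ** 2
                         for i in range(length)))

def get_neighbors(training_set, test_instance, k):
    length = len(test_instance) - 1          # skip the label column
    distances = [(row, euclidean_distance(test_instance, row, length))
                 for row in training_set]
    distances.sort(key=operator.itemgetter(1))
    return [row for row, _ in distances[:k]]

# The transcript's toy check: [2, 2, 2] belongs to class 'a' and (assumed
# completion of the truncated example) [4, 4, 4] belongs to class 'b'.
train = [[2, 2, 2, 'a'], [4, 4, 4, 'b']]
print(get_neighbors(train, [5, 5, 5, 'b'], 1))  # → [[4, 4, 4, 'b']]
```

The same distance function reproduces the earlier t-shirt example: `euclidean_distance([161, 61], [158, 58], 2)` gives √18 ≈ 4.24.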
'duration': 2.742}, {'end': 19898.426, 'text': 'All right.', 'start': 19898.226, 'duration': 0.2}, {'end': 19904.75, 'text': "So basically what you'll be getting from this get neighbors function is K different points having the least Euclidean distance from the test instance.", 'start': 19898.906, 'duration': 5.844}, {'end': 19906.191, 'text': "All right, let's execute it.", 'start': 19904.97, 'duration': 1.221}, {'end': 19908.332, 'text': 'So the function executed without any errors.', 'start': 19906.431, 'duration': 1.901}, {'end': 19910.203, 'text': "So let's test our function.", 'start': 19909.102, 'duration': 1.101}, {'end': 19917.127, 'text': 'So suppose the training data set includes the data like 2, 2, 2 and it belongs to Class A, and other data includes 4, 4,', 'start': 19910.523, 'duration': 6.604}], 'summary': 'The get neighbors function takes the training data, a test instance, and K as input, finding the K points with the least Euclidean distance from the test instance.', 'duration': 28.947, 'max_score': 19888.18, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19888180.jpg'}, {'end': 19972.496, 'src': 'embed', 'start': 19946.173, 'weight': 8, 'content': [{'end': 19952.395, 'text': 'Now, once you have located the most similar neighbors for a test instance, the next task is to predict a response based on those neighbors.', 'start': 19946.173, 'duration': 6.222}, {'end': 19960.277, 'text': 'So how can we do that? 
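The voting step this question leads into can be sketched as a tiny `get_response` helper; the sample neighbors list is illustrative:

```python
# Sketch of the voting step: each neighbour votes with its class label
# (assumed to sit in the last column) and the majority label wins.
from collections import Counter

def get_response(neighbors):
    votes = Counter(row[-1] for row in neighbors)  # tally class labels
    return votes.most_common(1)[0][0]              # majority vote

neighbors = [[1, 1, 1, 'a'], [2, 2, 2, 'a'], [3, 3, 3, 'b']]
print(get_response(neighbors))  # → a
```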
Well, we can do this by allowing each neighbor to vote for the class attribute and taking the majority vote as the prediction.', 'start': 19952.815, 'duration': 7.462}, {'end': 19961.797, 'text': "Let's see how we can do that.", 'start': 19960.677, 'duration': 1.12}, {'end': 19965.098, 'text': 'So we have a function, get response, which takes neighbors as the input.', 'start': 19962.177, 'duration': 2.921}, {'end': 19968.819, 'text': 'Well, this neighbors argument is nothing but the output of the get neighbors function.', 'start': 19965.678, 'duration': 3.141}, {'end': 19972.496, 'text': 'The output of the get neighbors function will be fed to get response.', 'start': 19969.7, 'duration': 2.796}], 'summary': 'Predict a response by allowing neighbors to vote for the class attribute and taking the majority vote as the prediction.', 'duration': 26.323, 'max_score': 19946.173, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19946173.jpg'}, {'end': 20045.788, 'src': 'embed', 'start': 20020.816, 'weight': 5, 'content': [{'end': 20028.418, 'text': "So for this I'll be defining a function, get accuracy, and inside that I'll be passing my test data set and the predictions.", 'start': 20020.816, 'duration': 7.602}, {'end': 20030.579, 'text': 'The get accuracy function, when we execute it, runs without any error.', 'start': 20028.418, 'duration': 2.161}, {'end': 20033.396, 'text': "Let's check it for a sample data set.", 'start': 20031.352, 'duration': 2.044}, {'end': 20040.727, 'text': 'So we have our test data set as 1, 1, 1, which belongs to class A, 2, 2, 2, which again belongs to class A, and 3, 3, 3, which belongs to class B,', 'start': 20033.836, 'duration': 6.891}, {'end': 20045.788, 'text': 'and my prediction for the first test data is that it belongs to class A, which is true.', 'start': 20040.727, 'duration': 5.061}], 'summary': 'Defined a get accuracy function to test predictions on a sample dataset, giving 66.66% accuracy.', 'duration': 
24.972, 'max_score': 20020.816, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ20020816.jpg'}, {'end': 20090.071, 'src': 'embed', 'start': 20059.058, 'weight': 0, 'content': [{'end': 20062.101, 'text': 'All right, so the ratio will be two by three, which is nothing but 66.66%.', 'start': 20059.058, 'duration': 3.043}, {'end': 20066.539, 'text': 'so our accuracy rate is 66.66%.', 'start': 20062.101, 'duration': 4.438}, {'end': 20072.105, 'text': "So, now that you have created all the functions that are required for the KNN algorithm, let's compile them into one single main function.", 'start': 20066.539, 'duration': 5.566}, {'end': 20080.134, 'text': "All right, so there's our main function and we are using the Iris data set with a split of 0.67 and the value of K is 3.", 'start': 20072.606, 'duration': 7.528}, {'end': 20080.435, 'text': "Let's see.", 'start': 20080.134, 'duration': 0.301}, {'end': 20084.184, 'text': 'What is the accuracy score of this? 
Let's check how accurate our model is.', 'start': 20080.475, 'duration': 3.709}, {'end': 20090.071, 'text': 'So in the training data set we have 113 values and in the test data set we have 37 values.', 'start': 20084.765, 'duration': 5.306}], 'summary': 'The model, using the iris dataset with a split of 0.67 and a K value of 3, achieves 97.29% accuracy based on 113 training and 37 test data values.', 'duration': 31.013, 'max_score': 20059.058, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ20059058.jpg'}, {'end': 20152.814, 'src': 'embed', 'start': 20125.006, 'weight': 11, 'content': [{'end': 20130.768, 'text': "Then we'll understand what Bayes theorem is, which serves as the logic behind the naive Bayes algorithm.", 'start': 20125.006, 'duration': 5.762}, {'end': 20131.788, 'text': 'Moving forward,', 'start': 20131.268, 'duration': 0.52}, {'end': 20136.549, 'text': "I'll explain the steps involved in the naive Bayes algorithm one by one and finally,", 'start': 20131.848, 'duration': 4.701}, {'end': 20140.951, 'text': "I'll finish off this video with a demo on naive Bayes using the scikit-learn package.", 'start': 20136.549, 'duration': 4.402}, {'end': 20145.892, 'text': 'Now naive Bayes is a simple but surprisingly powerful algorithm for predictive analysis.', 'start': 20141.551, 'duration': 4.341}, {'end': 20152.814, 'text': 'It is a classification technique based on Bayes theorem with an assumption of independence among predictors.', 'start': 20146.732, 'duration': 6.082}], 'summary': 'Naive Bayes algorithm is a powerful classification technique based on Bayes theorem and predictor independence.', 'duration': 27.808, 'max_score': 20125.006, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ20125006.jpg'}, {'end': 20216.573, 'src': 'embed', 'start': 20189.298, 'weight': 12, 'content': [{'end': 20192.962, 'text': 'which is alternatively known as the Bayes 
law or the Bayes rule,', 'start': 20189.298, 'duration': 3.664}, {'end': 20199.67, 'text': 'describes the probability of an event based on prior knowledge of the conditions that might be related to the event.', 'start': 20192.962, 'duration': 6.708}, {'end': 20202.633, 'text': 'Now, Bayes theorem is a way to figure out conditional probability.', 'start': 20199.67, 'duration': 2.963}, {'end': 20210.602, 'text': 'The conditional probability is the probability of an event happening given that it has some relationship to one or more other events.', 'start': 20203.133, 'duration': 7.469}, {'end': 20216.573, 'text': 'For example, your probability of getting a parking space is connected to the time of the day you park,', 'start': 20211.388, 'duration': 5.185}], 'summary': 'Bayes theorem calculates conditional probability based on prior knowledge.', 'duration': 27.275, 'max_score': 20189.298, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ20189298.jpg'}, {'end': 20480.679, 'src': 'embed', 'start': 20454.244, 'weight': 13, 'content': [{'end': 20457.846, 'text': 'So now we know the math which is involved behind Bayes theorem.', 'start': 20454.244, 'duration': 3.602}, {'end': 20461.788, 'text': "Let's see how we can implement this in a real-life scenario.", 'start': 20458.386, 'duration': 3.402}, {'end': 20470.753, 'text': 'So suppose we have a data set in which we have the outlook and the humidity, and we need to find out whether we should play or not on that day.', 'start': 20462.569, 'duration': 8.184}, {'end': 20472.875, 'text': 'So the outlook can be sunny,', 'start': 20471.294, 'duration': 1.581}, {'end': 20475.836, 'text': 'overcast or rain; the humidity can be high', 'start': 20472.875, 'duration': 2.961}, {'end': 20480.679, 'text': 'or normal; and the wind is categorized into two features, which are the weak and the strong winds.', 'start': 20475.836, 'duration': 4.843}], 'summary': 'Implement Bayes theorem in 
real-life to decide whether to play based on weather data.', 'duration': 26.435, 'max_score': 20454.244, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ20454244.jpg'}, {'end': 20751.766, 'src': 'embed', 'start': 20718.501, 'weight': 3, 'content': [{'end': 20721.483, 'text': 'We remove the less significant words, which are the stop words,', 'start': 20718.501, 'duration': 2.982}, {'end': 20729.649, 'text': 'from the documents or the articles, and then we apply the naive Bayes classifier for classifying the news content based on the news category.', 'start': 20721.483, 'duration': 8.166}, {'end': 20737.615, 'text': 'Now, this is by far one of the best examples of the naive Bayes classifier, which is spam filtering.', 'start': 20730.37, 'duration': 7.245}, {'end': 20743.099, 'text': 'Naive Bayes classifiers are a popular statistical technique for email filtering.', 'start': 20738.456, 'duration': 4.643}, {'end': 20751.766, 'text': 'They typically use bag-of-words features to identify spam email, an approach commonly used in text classification as well.', 'start': 20743.62, 'duration': 8.146}], 'summary': 'Naive Bayes classifier is used for spam filtering, removing stop words and using bag-of-words features for email filtering.', 'duration': 33.265, 'max_score': 20718.501, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ20718501.jpg'}, {'end': 20924.879, 'src': 'embed', 'start': 20904.474, 'weight': 1, 'content': [{'end': 20918.64, 'text': 'An empirical comparison of naive Bayes versus five popular classifiers on 15 medical data sets shows that naive Bayes is well suited for medical applications and has high performance in most of the examined medical problems.', 'start': 20904.474, 'duration': 14.166}, {'end': 20924.879, 'text': 'Now in the past various statistical methods have been used for modeling in the area of disease diagnosis.', 'start': 20919.338, 
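The play/no-play reasoning described above can be sketched numerically. The counts below are the classic 14-day "play tennis" frequencies (9 yes days, 5 no days; 2 of the 9 yes days are sunny, and so on), which this video appears to reference; treat them as illustrative assumptions:

```python
# Sketch of the naive Bayes computation for the play/no-play example.
# Frequencies are from the classic 14-day "play tennis" dataset (assumed).
counts = {'yes': 9, 'no': 5}                       # class counts out of 14 days
likelihood = {
    'yes': {'sunny': 2 / 9, 'high': 3 / 9, 'weak': 6 / 9},
    'no':  {'sunny': 3 / 5, 'high': 4 / 5, 'weak': 2 / 5},
}

def posterior(evidence):
    """P(class | evidence) via Bayes theorem with the naive independence assumption."""
    total = sum(counts.values())
    scores = {}
    for cls, n in counts.items():
        p = n / total                      # prior P(class)
        for feature in evidence:
            p *= likelihood[cls][feature]  # naive step: multiply per-feature likelihoods
        scores[cls] = p
    norm = sum(scores.values())            # normalise so the posteriors sum to 1
    return {cls: p / norm for cls, p in scores.items()}

result = posterior(['sunny', 'high', 'weak'])
print(result)
```

With these counts the 'no' posterior comes out larger than the 'yes' posterior, so the classifier would predict not playing on a sunny, high-humidity, weak-wind day.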
'duration': 5.541}], 'summary': 'Naive Bayes outperforms 5 classifiers on 15 medical datasets, proving its suitability for medical applications.', 'duration': 20.405, 'max_score': 20904.474, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ20904474.jpg'}], 'start': 19524.828, 'title': 'Machine learning algorithms and applications', 'summary': 'Provides an overview of the KNN algorithm and demonstrates its application with a dataset of 150 observations. It also covers the implementation of the KNN algorithm in Python, achieving 97.29% accuracy for the iris dataset. Additionally, it explains the naive Bayes algorithm, Bayes theorem, and their real-life applications, including probabilities and industrial use cases such as news categorization, spam filtering, and medical diagnosis.', 'chapters': [{'end': 19610.431, 'start': 19524.828, 'title': 'KNN algorithm overview', 'summary': "Introduces the KNN algorithm, explaining its 'lazy learner' nature, and provides a practical demonstration using a dataset of 150 observations with four features and one class label.", 'duration': 85.603, 'highlights': ["The KNN algorithm is referred to as 'lazy learning' because it memorizes the training data and does not have a discriminative function from the training data.", 'The practical implementation uses a dataset with 150 observations, four features, and one class label to demonstrate the step-by-step solution for performing the KNN algorithm.', "The hands-on part involves handling the dataset by opening it from CSV format, splitting it into train and test parts, calculating the distance between data instances, selecting K neighbors with the least distance, and generating a response to decide the new point's class."]}, {'end': 20101.745, 'start': 19611.508, 'title': 'KNN algorithm in Python', 'summary': 'Covers the implementation of the KNN algorithm in Python for evaluating the accuracy of the model, achieving an accuracy rate of 97.29% 
with a split ratio of 0.67 for the iris dataset.', 'duration': 490.237, 'highlights': ['The accuracy rate achieved for the KNN algorithm with a split ratio of 0.67 for the Iris dataset is 97.29%.', "The function 'get accuracy' evaluates the accuracy of the model by calculating the ratio of correct predictions to all predictions made, resulting in an accuracy rate of 66.66% for a sample dataset.", "The 'get response' function allows each neighbor to vote for the class attribute and takes the majority vote as a prediction.", "The 'get neighbors' function returns the K most similar neighbors from the training set for a given test instance.", 'The Euclidean distance function is used to calculate the distance between two data instances, with a specific focus on including only the first four attributes.']}, {'end': 20649.531, 'start': 20106.986, 'title': 'Naive Bayes algorithm and Bayes theorem', 'summary': 'Explains the concepts of the naive Bayes algorithm and Bayes theorem, outlining the steps involved in the naive Bayes algorithm, and demonstrating its application using a real-life scenario with quantifiable probabilities, ultimately showcasing the probability of playing or not based on weather conditions.', 'duration': 542.545, 'highlights': ['Explanation of Naive Bayes Algorithm', 'Explanation of Bayes Theorem', 'Implementation of Bayes Theorem in Real-Life Scenario']}, {'end': 21234.958, 'start': 20649.531, 'title': 'Applications of naive Bayes classifier', 'summary': 'Discusses the industrial applications of the naive Bayes classifier, including news categorization, spam filtering, object detection, medical diagnosis, and weather prediction, highlighting its effectiveness through empirical comparisons and the types of naive Bayes models available under the scikit-learn library.', 'duration': 585.427, 'highlights': ['Empirical comparison of naive Bayes versus five popular classifiers on 15 medical data sets shows that naive Bayes is well suited for medical applications and 
has high performance in most of the examined medical problems.', "Spam filtering is one of the best examples of the naive Bayes classifier, with its roots in the 1990s, and has been successful in tailoring itself to individual users' needs and providing low false-positive spam detection rates.", 'The chapter discusses the industrial applications of the naive Bayes classifier, including news categorization, spam filtering, object detection, medical diagnosis, and weather prediction, highlighting its effectiveness through empirical comparisons and the types of naive Bayes models available under the scikit-learn library.']}], 'duration': 1710.13, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ19524828.jpg', 'highlights': ['The accuracy rate achieved for the KNN algorithm with a split ratio of 0.67 for the Iris dataset is 97.29%', 'Empirical comparison of naive Bayes versus five popular classifiers on 15 medical data sets shows that naive Bayes is well suited for medical applications and has high performance', 'The practical implementation uses a dataset with 150 observations, four features, and one class label to demonstrate the step-by-step solution for performing the KNN algorithm', "Spam filtering is one of the best examples of the naive Bayes classifier, with its roots in the 1990s, and has been successful in tailoring itself to individual users' needs and providing low false-positive spam detection rates", "The KNN algorithm is referred to as 'lazy learning' because it memorizes the training data and does not have a discriminative function from the training data", "The 'get accuracy' function evaluates the accuracy of the model by calculating the ratio of correct predictions to all predictions made, resulting in an accuracy rate of 66.66% for a sample dataset", 'The chapter discusses the industrial applications of the naive Bayes classifier, including news categorization, spam filtering, object detection, medical diagnosis, 
and weather prediction, highlighting its effectiveness through empirical comparisons and the types of naive Bayes models available under the scikit-learn library', "The hands-on part involves handling the dataset by opening it from CSV format, splitting it into train and test parts, calculating the distance between data instances, selecting K neighbors with the least distance, and generating a response to decide the new point's class", "The 'get response' function allows each neighbor to vote for the class attribute and takes the majority vote as a prediction", "The 'get neighbors' function returns the K most similar neighbors from the training set for a given test instance", 'The Euclidean distance function is used to calculate the distance between two data instances, with a specific focus on including only the first four attributes', 'Explanation of Naive Bayes Algorithm', 'Explanation of Bayes Theorem', 'Implementation of Bayes Theorem in Real-Life Scenario']}, {'end': 22596.367, 'segs': [{'end': 21661.825, 'src': 'embed', 'start': 21628.816, 'weight': 3, 'content': [{'end': 21632.037, 'text': 'We are using the get prediction and the get accuracy methods as well.', 'start': 21628.816, 'duration': 3.221}, {'end': 21643.841, 'text': 'So, guys, as you can see, the output of this one shows us that we are splitting the 768 rows into 514, which is the training, and 254, 
data, achieving a 68% accuracy. Adjusting split ratio can yield different accuracies.', 'duration': 33.009, 'max_score': 21628.816, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ21628816.jpg'}, {'end': 21851.941, 'src': 'embed', 'start': 21826.946, 'weight': 0, 'content': [{'end': 21833.049, 'text': 'So the expected output is dataset.target, and the predicted is using the predicted model,', 'start': 21826.946, 'duration': 6.103}, {'end': 21838.151, 'text': 'and the model we are using is the Gaussian NB here. Now, to summarize the model which is created,', 'start': 21833.049, 'duration': 5.102}, {'end': 21842.913, 'text': 'we calculate the confusion matrix and the classification report.', 'start': 21838.191, 'duration': 4.722}, {'end': 21851.941, 'text': 'So guys, as you see the classification report, we have the precision of 0.96. We have the recall of 0.96.', 'start': 21844.073, 'duration': 7.868}], 'summary': 'Using Gaussian NB model, achieved precision and recall of 0.96 in classification report.', 'duration': 24.995, 'max_score': 21826.946, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ21826946.jpg'}, {'end': 22017.742, 'src': 'embed', 'start': 21996.542, 'weight': 4, 'content': [{'end': 22005.465, 'text': 'The kernel trick basically means to transform your data into another dimension so that you can easily draw a hyperplane between the different classes of the data.', 'start': 21996.542, 'duration': 8.923}, {'end': 22010.767, 'text': 'Nonlinear data is basically data which cannot be separated with a straight line.', 'start': 22006.205, 'duration': 4.562}, {'end': 22014.248, 'text': 'So SVM can even be used on nonlinear data sets.', 'start': 22011.327, 'duration': 2.921}, {'end': 22016.709, 'text': 'You just have to use the kernel functions to do this.', 'start': 22014.568, 'duration': 2.141}, {'end': 22017.742, 'text': 'all right.', 'start': 
22017.262, 'duration': 0.48}], 'summary': 'Kernel trick transforms data for SVM to draw hyperplane between classes, even on nonlinear data.', 'duration': 21.2, 'max_score': 21996.542, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ21996542.jpg'}, {'end': 22143.68, 'src': 'embed', 'start': 22119.102, 'weight': 1, 'content': [{'end': 22126.327, 'text': 'Alright. so basically, the hyperplane which has the maximum distance from the support vectors is the optimal hyperplane,', 'start': 22119.102, 'duration': 7.225}, {'end': 22130.871, 'text': 'and this distance between the hyperplane and the support vectors is known as the margin.', 'start': 22126.327, 'duration': 4.544}, {'end': 22141.158, 'text': 'Alright. so, to sum it up, SVM is used to classify data by using a hyperplane such that the distance between the hyperplane and the support vectors is maximum.', 'start': 22131.291, 'duration': 9.867}, {'end': 22143.68, 'text': 'So basically your margin has to be maximum.', 'start': 22141.458, 'duration': 2.222}], 'summary': 'SVM classifies data using a hyperplane with maximum margin.', 'duration': 24.578, 'max_score': 22119.102, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ22119102.jpg'}, {'end': 22496.755, 'src': 'embed', 'start': 22468.291, 'weight': 2, 'content': [{'end': 22475.177, 'text': 'it will make a prediction and find out to which cluster that particular data point belongs.', 'start': 22468.291, 'duration': 6.886}, {'end': 22479.861, 'text': 'So one of the most important algorithms in unsupervised learning is clustering.', 'start': 22475.237, 'duration': 4.624}, {'end': 22482.524, 'text': "So let's understand exactly what is clustering.", 'start': 22479.921, 'duration': 2.603}, {'end': 22488.429, 'text': 'So clustering basically is the process of dividing the data sets into groups 
consisting of similar data points.', 'start': 22482.944, 'duration': 5.485}, {'end': 22496.755, 'text': 'And it means grouping of objects based on the information found in the data describing the objects or their relationships.', 'start': 22489.51, 'duration': 7.245}], 'summary': 'Clustering algorithm divides data into similar groups, aiding unsupervised learning.', 'duration': 28.464, 'max_score': 22468.291, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ22468291.jpg'}], 'start': 21235.759, 'title': 'Machine learning fundamentals', 'summary': 'Covers using Jupyter Notebook for Python programming, implementing a naive Bayes classifier achieving 68% accuracy, using Gaussian NB model in scikit-learn achieving a precision and recall of 0.96, understanding SVM and its real-world application, and an overview of unsupervised learning use cases in various fields.', 'chapters': [{'end': 21288.235, 'start': 21235.759, 'title': 'Using Jupyter Notebook for Python programming', 'summary': 'Covers launching Jupyter Notebook, importing necessary modules, and loading a dataset, including converting string elements to float for calculations.', 'duration': 52.476, 'highlights': ['The process of launching Jupyter Notebook and using it for Python programming is explained.', "The function 'load CSV' is created to import the Pima Indian diabetes dataset using the CSV reader method and converting string elements to float for calculation purposes.", "The necessary modules such as CSV, math, and random are imported before creating the 'load CSV' function."]}, {'end': 21684.949, 'start': 21288.235, 'title': 'Implementing naive Bayes classifier', 'summary': 'Explains the process of splitting the data for training and testing, calculating mean and standard deviation for each attribute, making predictions using Gaussian probability density function, and estimating the accuracy of the model, achieving 68% accuracy when split into 67% 
training and 33% test data sets', 'duration': 396.714, 'highlights': ['The model achieves 68% accuracy when split into 67% training and 33% test data sets', 'The process involves calculating mean and standard deviation for each attribute', 'The chapter explains the process of splitting the data for training and testing', 'Making predictions using Gaussian probability density function', 'Estimating the accuracy of the model']}, {'end': 22017.742, 'start': 21685.609, 'title': 'Using Gaussian NB model in scikit-learn', 'summary': 'Discusses using the Gaussian NB model in scikit-learn to easily fit a model to the Iris dataset, achieving a precision and recall of 0.96, and explores the features and applications of support vector machine (SVM) in classification and regression problems.', 'duration': 332.133, 'highlights': ['Using the Gaussian NB model in scikit-learn to fit a model to the Iris dataset, achieving a precision and recall of 0.96', 'Exploring the features and applications of support vector machine (SVM) in classification and regression problems', 'The ease of using scikit-learn to fit a model to a dataset, and its applications in various fields such as face recognition and cancer classification']}, {'end': 22406.154, 'start': 22017.742, 'title': 'Understanding SVM and its real-world application', 'summary': 'Explains how support vector machines (SVM) work, including the concept of drawing a hyperplane to separate classes, the use of support vectors, and the application of kernel functions for nonlinear data. 
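The naive Bayes walkthrough summarized above (summarize each attribute by mean and standard deviation per class, then score new points with the Gaussian probability density function and pick the most probable class) can be sketched in plain Python. This is a minimal illustration on an invented toy dataset, not the video's Pima Indians diabetes code:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def stdev(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def summarize_by_class(rows):
    # Each row is [feature_1, ..., feature_n, class_label].
    grouped = {}
    for row in rows:
        grouped.setdefault(row[-1], []).append(row[:-1])
    # Per class: (mean, stdev) for every feature column.
    return {label: [(mean(col), stdev(col)) for col in zip(*vectors)]
            for label, vectors in grouped.items()}

def gaussian_pdf(x, mu, sigma):
    # Gaussian probability density function used to score a feature value.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def predict(summaries, row):
    # Multiply per-feature likelihoods and pick the most probable class.
    best_label, best_prob = None, -1.0
    for label, stats in summaries.items():
        prob = 1.0
        for x, (mu, sigma) in zip(row, stats):
            prob *= gaussian_pdf(x, mu, sigma)
        if prob > best_prob:
            best_label, best_prob = label, prob
    return best_label

# Toy data: two features plus a binary class label.
train = [[1.0, 2.1, 0], [1.2, 1.9, 0], [0.9, 2.0, 0],
         [3.0, 4.2, 1], [3.2, 3.9, 1], [2.9, 4.0, 1]]
model = summarize_by_class(train)
print(predict(model, [1.1, 2.0]))  # → 0
print(predict(model, [3.1, 4.1]))  # → 1
```

A train/test split plus a get-accuracy ratio, as in the chapter, would simply run `predict` over held-out rows and count correct predictions.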
Additionally, it discusses a real-world use case of SVM in cancer classification, demonstrating its high accuracy even with small data sets.', 'duration': 388.412, 'highlights': ['SVM draws a decision boundary (hyperplane) to separate classes, using support vectors to maximize the margin', 'Kernel functions transform nonlinear data into linear space for SVM analysis, improving classification accuracy', 'Real-world use case: SVM in cancer classification demonstrated high accuracy even with small data sets, outperforming Naive Bayes']}, {'end': 22596.367, 'start': 22411.132, 'title': 'Unsupervised learning overview', 'summary': 'Introduces unsupervised learning, emphasizing its use in clustering unstructured and unlabeled data to draw inferences and make predictions, with key use cases including marketing, insurance, seismic studies, and movie recommendations.', 'duration': 185.235, 'highlights': ['Unsupervised learning is used to cluster input data into classes based on statistical properties, such as speed limit, acceleration, or average, without labeled responses.', 'Clustering is the process of dividing data sets into groups of similar data points, focusing on identifying groups of similar records and labeling records accordingly.', 'The goal of clustering is to determine intrinsic groups in unlabeled data, with use cases including marketing, insurance, seismic studies, and movie recommendations.']}], 'duration': 1360.608, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ21235759.jpg', 'highlights': ['Using Gaussian NB model in scikit-learn achieves precision and recall of 0.96', 'SVM draws a decision boundary to separate classes, using support vectors to maximize the margin', 'Unsupervised learning clusters input data into classes based on statistical properties', 'Model achieves 68% accuracy when split into 67% training and 33% test data sets', 'Kernel functions transform nonlinear data into linear space 
for SVM analysis']}, {'end': 24094.727, 'segs': [{'end': 22686.046, 'src': 'embed', 'start': 22617.194, 'weight': 1, 'content': [{'end': 22620.216, 'text': 'So an example of this is the K-means clustering.', 'start': 22617.194, 'duration': 3.022}, {'end': 22623.738, 'text': 'So K-means clustering does this exclusive kind of clustering.', 'start': 22620.256, 'duration': 3.482}, {'end': 22626.54, 'text': 'So secondly we have overlapping clustering.', 'start': 22624.098, 'duration': 2.442}, {'end': 22628.901, 'text': 'So it is also known as soft clusters.', 'start': 22626.6, 'duration': 2.301}, {'end': 22636.466, 'text': 'In this an item can belong to multiple clusters as its degree of association with each cluster is shown.', 'start': 22629.301, 'duration': 7.165}, {'end': 22643.85, 'text': 'And for example, we have fuzzy or the C-means clustering, which is being used for overlapping clustering.', 'start': 22637.246, 'duration': 6.604}, {'end': 22646.571, 'text': 'And finally, we have the hierarchical clustering.', 'start': 22643.97, 'duration': 2.601}, {'end': 22654.195, 'text': 'So when two clusters have a parent-child relationship or a tree-like structure, then it is known as hierarchical clustering.', 'start': 22646.591, 'duration': 7.604}, {'end': 22659.558, 'text': 'So as you can see here from the example, we have a parent-child kind of relationship in the cluster given here.', 'start': 22654.215, 'duration': 5.343}, {'end': 22663.52, 'text': "So let's understand what exactly is k-means clustering.", 'start': 22660.339, 'duration': 3.181}, {'end': 22670.002, 'text': 'So k-means clustering is an algorithm whose main goal is to group similar elements of data points into a cluster.', 'start': 22663.72, 'duration': 6.282}, {'end': 22676.243, 'text': 'And it is the process by which objects are classified into a predefined number of groups,', 'start': 22670.742, 'duration': 5.501}, {'end': 22686.046, 'text': 'so that they are as much dissimilar as possible from one group 
to another group, but as similar as possible within each group.', 'start': 22676.243, 'duration': 9.803}], 'summary': 'Different types of clustering: K-means, overlapping, hierarchical. K-means aims to group similar data points into clusters.', 'duration': 68.852, 'max_score': 22617.194, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ22617194.jpg'}, {'end': 22843.705, 'src': 'embed', 'start': 22818.69, 'weight': 0, 'content': [{'end': 22826.815, 'text': "so let's assume first of all we compute the sum squared error, which is the SSE, for some value of k, for example.", 'start': 22818.69, 'duration': 8.125}, {'end': 22829.196, 'text': "let's take two, four, six and eight.", 'start': 22826.815, 'duration': 2.381}, {'end': 22838.162, 'text': 'now the SSE, which is the sum squared error, is defined as the sum of the squared distance between each member of the cluster and its centroid.', 'start': 22829.196, 'duration': 8.966}, {'end': 22843.705, 'text': 'mathematically, it is given by the equation which is provided here.', 'start': 22838.162, 'duration': 5.543}], 'summary': 'Compute sum squared error (SSE) for values 2, 4, 6, 8.', 'duration': 25.015, 'max_score': 22818.69, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ22818690.jpg'}, {'end': 23010.293, 'src': 'embed', 'start': 22986.395, 'weight': 7, 'content': [{'end': 22992.598, 'text': 'So typically people do not clean the data for k-means clustering, or even if they clean it,', 'start': 22986.395, 'duration': 6.203}, {'end': 22997.701, 'text': 'sometimes there is noisy and outlier data which affects the whole model.', 'start': 22992.598, 'duration': 5.103}, {'end': 22999.982, 'text': 'So that was all for k-means clustering.', 'start': 22998.021, 'duration': 1.961}, {'end': 23005.891, 'text': "So what we're going to do is now use k-means clustering for the 
movie data set.', 'start': 23000.609, 'duration': 5.282}, {'end': 23010.293, 'text': 'So we have to find out the number of clusters and divide it accordingly.', 'start': 23005.971, 'duration': 4.322}], 'summary': 'K-means clustering for movie dataset to find clusters.', 'duration': 23.898, 'max_score': 22986.395, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ22986395.jpg'}, {'end': 23193.628, 'src': 'embed', 'start': 23158.751, 'weight': 3, 'content': [{'end': 23160.532, 'text': 'so it might depend, see,', 'start': 23158.751, 'duration': 1.781}, {'end': 23168.498, 'text': "that's exactly what I was going to say: initially, the main challenge in k-means clustering is to define the number of centers, which is the k.", 'start': 23160.532, 'duration': 7.966}, {'end': 23177.883, 'text': 'so as you can see here, the third cluster and the 0th cluster are very, very close to each other.', 'start': 23168.498, 'duration': 9.385}, {'end': 23181.624, 'text': 'so, guys, they probably could have been in one another cluster.', 'start': 23177.883, 'duration': 3.741}, {'end': 23186.926, 'text': 'And the other disadvantage was that we do not exactly know how the points are to be arranged.', 'start': 23181.624, 'duration': 5.302}, {'end': 23193.628, 'text': "so it's very difficult to force the data into any other cluster, which makes our analysis a little difficult.", 'start': 23186.926, 'duration': 6.702}], 'summary': 'K-means clustering faces challenges in defining number of centers and arranging points, making analysis difficult.', 'duration': 34.877, 'max_score': 23158.751, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ23158751.jpg'}, {'end': 23251.043, 'src': 'embed', 'start': 23227.471, 'weight': 4, 'content': [{'end': 23234.113, 'text': 'In a soft cluster, any point can belong to more than one cluster at 
a time with a certain affinity value towards each.', 'start': 23227.471, 'duration': 6.642}, {'end': 23241.977, 'text': 'Fuzzy C-means assigns the degree of membership, which ranges from 0 to 1, of an object to a given cluster.', 'start': 23234.613, 'duration': 7.364}, {'end': 23251.043, 'text': 'so there is a stipulation that the sum of fuzzy memberships of an object to all the clusters it belongs to must be equal to 1.', 'start': 23241.977, 'duration': 9.066}], 'summary': 'Fuzzy C-means assigns membership values ranging from 0 to 1 to each object in soft clusters, ensuring the sum of memberships across all clusters equals 1.', 'duration': 23.572, 'max_score': 23227.471, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ23227471.jpg'}, {'end': 23471.317, 'src': 'embed', 'start': 23450.467, 'weight': 6, 'content': [{'end': 23460.172, 'text': 'The marketing team at the retail stores should target customers who buy bread and butter and provide them an offer so that they buy a third item like an egg.', 'start': 23450.467, 'duration': 9.705}, {'end': 23468.236, 'text': 'So if a customer buys bread and butter and sees a discount or an offer on eggs, he will be encouraged to spend more money and buy the eggs.', 'start': 23460.432, 'duration': 7.804}, {'end': 23471.317, 'text': 'Now, this is what market basket analysis is all about.', 'start': 23468.656, 'duration': 2.661}], 'summary': 'Target bread and butter customers with egg offer to increase spending.', 'duration': 20.85, 'max_score': 23450.467, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ23450467.jpg'}, {'end': 23543.557, 'src': 'embed', 'start': 23514.369, 'weight': 2, 'content': [{'end': 23522.178, 'text': 'now there are three common ways to measure a particular association, because we have to find these rules on the basis of some statistics, right?', 'start': 23514.369, 'duration': 7.809}, {'end': 
23524.44, 'text': 'so what we do is use support,', 'start': 23522.178, 'duration': 2.262}, {'end': 23533.728, 'text': 'confidence and lift. now these are the three common ways and measures to have a look at the association rule mining and know exactly how good that rule is.', 'start': 23524.44, 'duration': 9.288}, {'end': 23535.57, 'text': 'so first of all, we have support.', 'start': 23533.728, 'duration': 1.842}, {'end': 23543.557, 'text': "so support gives the fraction of the transactions which contain both items A and B, so it's basically the frequency of the item set across all transactions,", 'start': 23535.57, 'duration': 7.987}], 'summary': 'Three common measures for association rule mining are support, confidence, and lift.', 'duration': 29.188, 'max_score': 23514.369, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ23514369.jpg'}, {'end': 23687.76, 'src': 'embed', 'start': 23659.341, 'weight': 5, 'content': [{'end': 23669.851, 'text': 'it uses the frequent item sets to generate the association rules and it is based on the concept that a subset of a frequent item set must also be a frequent item set.', 'start': 23659.341, 'duration': 10.51}, {'end': 23675.154, 'text': "so let's understand what is a frequent item set and how all of these work together.", 'start': 23670.271, 'duration': 4.883}, {'end': 23687.76, 'text': 'so if we take the following transactions of items, we have transaction t1 to t5 and the items are 1, 3, 4, 2, 3, 5, 1, 2, 3, 5, 2, 5 and 1, 3, 5.', 'start': 23675.154, 'duration': 12.606}], 'summary': 'Using frequent item sets to generate association rules from transactions.', 'duration': 28.419, 'max_score': 23659.341, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ23659341.jpg'}], 'start': 22596.367, 'title': 'Clustering and association rule mining', 'summary': 'Covers different types of clustering such as exclusive, overlapping, and 
hierarchical clustering, including K-means clustering and C-means clustering. It also delves into association rule mining, the Apriori algorithm, and their applications in market basket analysis.', 'chapters': [{'end': 22659.558, 'start': 22596.367, 'title': 'Types of clustering in data analysis', 'summary': 'Discusses the three types of clustering: exclusive clustering, overlapping clustering, and hierarchical clustering, with examples such as K-means clustering and C-means clustering, highlighting the characteristics and applications of each.', 'duration': 63.191, 'highlights': ['Exclusive clustering, also known as hard clustering, involves an item belonging exclusively to one cluster, exemplified by K-means clustering.', 'Overlapping clustering, or soft clusters, allows an item to belong to multiple clusters based on its degree of association, demonstrated by fuzzy or C-means clustering.', 'Hierarchical clustering involves a parent-child relationship or a tree-like structure between clusters, as seen in the given example.']}, {'end': 23084.356, 'start': 22660.339, 'title': 'Understanding k-means clustering', 'summary': 'Explains the k-means clustering algorithm, which aims to group similar data points into clusters by identifying the number of clusters, finding centroids, and iteratively assigning data points to the closest cluster based on the Euclidean distance, and using the elbow method to determine the optimal number of clusters. It also discusses the pros and cons of k-means clustering. 
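The k-means procedure and the elbow method's SSE described above (pick initial centroids, assign each point to the closest centroid by Euclidean distance, recompute centroids until they converge, then compare SSE across values of k) can be sketched in plain Python; the toy points are invented for illustration and are not the movie dataset from the video:

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids picked from the data
    clusters = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids have converged
            break
        centroids = new_centroids
    return centroids, clusters

def sse(centroids, clusters):
    # Sum of squared distances between each member and its cluster centroid.
    return sum(math.dist(p, c) ** 2
               for c, cl in zip(centroids, clusters) for p in cl)

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 8), (8, 8.5)]
# Elbow method: SSE drops sharply once k reaches the true number of groups.
for k in (1, 2, 3):
    cents, cls = kmeans(points, k)
    print(k, round(sse(cents, cls), 2))
```

Plotting SSE against k and picking the value where the curve bends (the "elbow") is the quantitative rule of thumb the chapter describes.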
Additionally, it outlines a use case of applying k-means clustering to a movie dataset to group movies into clusters based on Facebook likes.', 'duration': 424.017, 'highlights': ['The K-means clustering algorithm works by identifying the number of clusters, finding centroids, calculating the Euclidean distance of data points from the centroids, and iteratively assigning data points to the closest cluster, repeating the process until the centroids converge.', 'The chapter emphasizes the importance of using the elbow method to decide the number of clusters, which involves computing the sum squared error (SSE) for different numbers of clusters and choosing the K at which the SSE decreases abruptly, providing a quantitative method for determining the optimal number of clusters.', 'The discussion on the pros and cons of K-means clustering highlights its simplicity and automatic assignment of items to clusters as pros, while noting the heavy task of defining the number of clusters and the inability to handle noisy data and outliers as cons, providing insights into the practical considerations of using K-means clustering.', 'The use case of applying K-means clustering to a movie dataset to group movies based on Facebook likes demonstrates a practical application of the algorithm, showcasing its relevance and applicability in real-world scenarios.', 'The chapter also outlines the initial steps for using K-means clustering in a practical setting, including importing necessary libraries, examining the shape and contents of the dataset, and preparing the data for analysis, providing a step-by-step guide for implementing K-means clustering in a data analysis project.']}, {'end': 23471.317, 'start': 23084.356, 'title': 'Clustering techniques in machine learning', 'summary': 'Explores k-means clustering, fuzzy c-means clustering, and hierarchical clustering, highlighting key points and quantifiable data, such as the 
number of clusters, pros and cons, and its applications in market basket analysis.', 'duration': 386.961, 'highlights': ['K-means clustering is performed with 5 clusters, showing close proximity between clusters 3 and 0.', 'Fuzzy C-means clustering allows data points to belong to multiple clusters with membership degrees ranging from 0 to 1.', 'Hierarchical clustering does not require specifying the number of clusters beforehand and may correspond to meaningful taxonomies.', 'Market basket analysis uncovers associations between items in retail transactions, aiding in targeted customer offers.']}, {'end': 24094.727, 'start': 23471.697, 'title': 'Association rule mining', 'summary': 'Discusses association rule mining and the Apriori algorithm, explaining the concepts and measures of support, confidence, and lift, using an example of creating rules and applying them to market basket analysis problems.', 'duration': 623.03, 'highlights': ['Association rule mining is a technique that shows how items are associated with each other, with examples like customers who purchase bread have a 60% likelihood of also purchasing jam, and customers who purchase a laptop are more likely to purchase laptop bags.', 'The measures used to evaluate association rules are support, confidence, and lift, each providing specific insights such as the frequency of items in the whole item set, the likelihood of items occurring together, and the strength of the rule over random co-occurrences.', 'The Apriori algorithm uses frequent item sets to generate association rules and is based on the concept that a subset of a frequent item set must also be a frequent item set.', 'Demonstrates the process of data cleanup and consolidation for market basket analysis, including the removal of irrelevant data and the consolidation of items into one transaction per row.']}], 'duration': 1498.36, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ22596367.jpg', 
'highlights': ['The elbow method aids in determining the optimal number of clusters using sum squared error (SSE).', 'K-means clustering algorithm involves identifying clusters, finding centroids, and iteratively assigning data points.', 'Association rule mining evaluates rules using support, confidence, and lift measures for specific insights.', 'K-means clustering is performed with 5 clusters, showing close proximity between clusters 3 and 0.', 'Fuzzy C-means clustering allows data points to belong to multiple clusters with membership degrees ranging from 0 to 1.', 'The Apriori algorithm uses frequent item sets to generate association rules based on the subset concept.', 'Market basket analysis uncovers associations between items in retail transactions, aiding in targeted customer offers.', 'The use case of applying K-means clustering to a movie dataset demonstrates its relevance and applicability.', 'Hierarchical clustering involves a parent-child relationship or a tree-like structure between clusters.', 'Overlapping clustering allows an item to belong to multiple clusters based on its degree of association.']}, {'end': 25032.136, 'segs': [{'end': 24117.115, 'src': 'embed', 'start': 24094.867, 'weight': 0, 'content': [{'end': 24105.501, 'text': 'So now that we have structured the data properly, the next step is to generate the frequent item set that has support of at least 7%.', 'start': 24094.867, 'duration': 10.634}, {'end': 24108.865, 'text': 'Now this number is chosen so that you can get close enough.', 'start': 24105.501, 'duration': 3.364}, {'end': 24114.331, 'text': "Now what we're going to do is generate the rules with the corresponding support, confidence and lift.", 'start': 24108.865, 'duration': 5.466}, {'end': 24117.115, 'text': 'So we have given the minimum support as 0.7.', 'start': 24114.331, 'duration': 2.784}], 'summary': 'Generating frequent item set with support of at least 7% and minimum support of 0.7.', 'duration': 22.248, 'max_score': 
24094.867, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ24094867.jpg'}, {'end': 24187.657, 'src': 'embed', 'start': 24140.893, 'weight': 1, 'content': [{'end': 24150.32, 'text': 'If we filter the data frame using the standard Pandas code for large lift 6 and high confidence 0.8, this is what the output is going to look like.', 'start': 24140.893, 'duration': 9.427}, {'end': 24155.448, 'text': 'These are 1, 2, 3, 4, 5, 6, 7, 8.', 'start': 24151.051, 'duration': 4.397}, {'end': 24161.639, 'text': 'So as you can see here, we have the 8 rules, which are the final rules, which are given by the association rule mining.', 'start': 24155.455, 'duration': 6.184}, {'end': 24169.025, 'text': 'And this is how all the industries, or any of these large retailers we talk about,', 'start': 24162.32, 'duration': 6.705}, {'end': 24180.474, 'text': 'they tend to know how their products are used and how exactly they should rearrange and provide the offers on the products so that people spend more and more money and time in their shop.', 'start': 24169.025, 'duration': 11.449}, {'end': 24183.575, 'text': 'so that was all about association rule mining.', 'start': 24181.034, 'duration': 2.541}, {'end': 24187.657, 'text': "So, guys, that's all for unsupervised learning.", 'start': 24183.575, 'duration': 4.082}], 'summary': 'Association rule mining yields 8 final rules with large lift 6 and high confidence 0.8, benefiting industries like large retailers.', 'duration': 46.764, 'max_score': 24140.893, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ24140893.jpg'}, {'end': 24236.606, 'src': 'embed', 'start': 24211.138, 'weight': 3, 'content': [{'end': 24220.45, 'text': 'Reinforcement learning is a part of machine learning where an agent is put in an environment and learns to behave in this environment by performing certain actions.', 'start': 24211.138, 'duration': 
9.312}, {'end': 24229.222, 'text': 'Okay, so it basically performs actions, and it either gets a reward or a punishment for those actions, observing the reward it gets from them.', 'start': 24220.67, 'duration': 8.552}, {'end': 24236.606, 'text': 'Reinforcement learning is all about taking an appropriate action in order to maximize the reward in a particular situation.', 'start': 24229.702, 'duration': 6.904}], 'summary': 'Reinforcement learning involves maximizing rewards by performing actions in an environment.', 'duration': 25.468, 'max_score': 24211.138, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ24211138.jpg'}, {'end': 24277.807, 'src': 'embed', 'start': 24249.594, 'weight': 4, 'content': [{'end': 24254.917, 'text': 'Here the reinforcement agent decides what actions to take in order to perform a given task.', 'start': 24249.594, 'duration': 5.323}, {'end': 24257.038, 'text': 'In the absence of a training data set,', 'start': 24255.357, 'duration': 1.681}, {'end': 24259.499, 'text': 'it is bound to learn from its experience itself.', 'start': 24257.158, 'duration': 2.341}, {'end': 24260.319, 'text': 'All right.', 'start': 24259.779, 'duration': 0.54}, {'end': 24270.564, 'text': "so reinforcement learning is all about an agent who's put in an unknown environment and is going to use a trial-and-error method in order to figure out the environment and then come up with an outcome.", 'start': 24260.319, 'duration': 10.245}, {'end': 24274.005, 'text': "Okay. Now, let's look at reinforcement learning with an analogy.", 'start': 24270.924, 'duration': 3.081}, {'end': 24277.807, 'text': 'So consider a scenario wherein a baby is learning how to walk.', 'start': 24274.506, 'duration': 3.301}], 'summary': 'Reinforcement learning: agent learns by trial and error in unknown environment.', 'duration': 28.213, 'max_score': 24249.594, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ24249594.jpg'}, {'end': 24670.485, 'src': 'embed', 'start': 24645.442, 'weight': 6, 'content': [{'end': 24651.848, 'text': "So if you haven't already realized it the basic aim of the RL agent is to maximize the reward.", 'start': 24645.442, 'duration': 6.406}, {'end': 24655.391, 'text': "Now, how does that happen? Let's try to understand this in depth.", 'start': 24652.328, 'duration': 3.063}, {'end': 24662.578, 'text': 'So the agent must be trained in such a way that he takes the best action so that the reward is maximum,', 'start': 24655.992, 'duration': 6.586}, {'end': 24667.903, 'text': 'because the end goal of reinforcement learning is to maximize your reward based on a set of actions.', 'start': 24662.578, 'duration': 5.325}, {'end': 24670.485, 'text': 'So let me explain this with a small game.', 'start': 24668.564, 'duration': 1.921}], 'summary': 'Rl agent aims to maximize reward by taking best actions.', 'duration': 25.043, 'max_score': 24645.442, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ24645442.jpg'}, {'end': 24803.048, 'src': 'embed', 'start': 24777.717, 'weight': 7, 'content': [{'end': 24784.28, 'text': 'So exploration, like the name suggests, is about exploring and capturing more information about an environment.', 'start': 24777.717, 'duration': 6.563}, {'end': 24790.223, 'text': 'on the other hand, exploitation is about using the already known exploited information to heighten the rewards.', 'start': 24784.28, 'duration': 5.943}, {'end': 24794.264, 'text': 'So, guys, consider the fox and tiger example that we discussed now here.', 'start': 24790.663, 'duration': 3.601}, {'end': 24800.587, 'text': 'the Fox eats only the meat chunks which are close to him, but he does not eat the meat chunks which are closer to the tiger.', 'start': 24794.264, 'duration': 6.323}, {'end': 24803.048, 'text': 'Okay, 
even though they might give him more rewards.', 'start': 24800.607, 'duration': 2.441}], 'summary': 'Exploration involves capturing information, while exploitation uses known information to maximize rewards.', 'duration': 25.331, 'max_score': 24777.717, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ24777717.jpg'}, {'end': 24874.01, 'src': 'embed', 'start': 24848.411, 'weight': 8, 'content': [{'end': 24853.034, 'text': 'in a way, the purpose of reinforcement learning is to solve a Markov decision process.', 'start': 24848.411, 'duration': 4.623}, {'end': 24856.941, 'text': 'Okay, so there are a few parameters that are used to get to the solution.', 'start': 24853.499, 'duration': 3.442}, {'end': 24862.484, 'text': 'So the parameters include the set of actions, the set of states, the rewards,', 'start': 24857.361, 'duration': 5.123}, {'end': 24866.126, 'text': "the policy that you're taking to approach the problem and the value that you get.", 'start': 24862.484, 'duration': 3.642}, {'end': 24874.01, 'text': 'Okay, so to sum it up, the agent must take an action A to transition from a start state to the end state S.', 'start': 24866.746, 'duration': 7.264}], 'summary': 'Reinforcement learning aims to solve a markov decision process using parameters like actions, states, rewards, policy, and value.', 'duration': 25.599, 'max_score': 24848.411, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ24848411.jpg'}], 'start': 24094.867, 'title': 'Association rule mining and reinforcement learning', 'summary': 'Discusses generating frequent item sets with a support of at least 7%, deriving rules with support, confidence, and lift, and the impact of association rule mining on industries. 
it also explains reinforcement learning, an agent-environment interaction process, where the agent learns to maximize rewards through trial and error, with examples from various scenarios.', 'chapters': [{'end': 24206.105, 'start': 24094.867, 'title': 'Association rule mining', 'summary': 'Discusses generating frequent item sets with a support of at least 7%, deriving rules with support, confidence, and lift, filtering data frame using pandas code, and the impact of association rule mining on industries, with a focus on large retailers.', 'duration': 111.238, 'highlights': ['The final rules given by the association rule mining provide 8 rules, derived using a support threshold of 0.7.', 'The frequent item set is generated with a support of at least 7% to get close enough, along with rules having high lift and confidence values.', 'The impact of association rule mining on industries, especially large retailers, is highlighted, emphasizing how it helps them understand product usage and optimize offers to increase customer spending and time in their shops.', 'Discussion on unsupervised learning, including the creation of rules without providing labels to the data and the application of different clustering techniques such as k-means, c-means, and hierarchical clustering.']}, {'end': 24590.346, 'start': 24211.138, 'title': 'Understanding reinforcement learning', 'summary': 'Explains reinforcement learning, an agent-environment interaction process, where the agent learns to maximize rewards through trial and error, with examples from a baby learning to walk and a player learning to play counter-strike.', 'duration': 379.208, 'highlights': ['Reinforcement learning involves an agent learning to behave in a given environment through actions, either receiving rewards or punishments based on those actions.', 'In reinforcement learning, the agent learns from its experience without an expected output, aiming to maximize the reward in a given situation.', 'Reinforcement 
learning mirrors human learning from mistakes through trial and error, with the agent using a trial-and-error method to figure out the environment and achieve an outcome.', 'The reinforcement learning process involves an agent and an environment, with the agent making decisions based on observations and receiving rewards or punishments from the environment.', 'The chapter provides a detailed analogy of a baby learning to walk and a player learning to play Counter-Strike to illustrate reinforcement learning concepts and processes.', 'Key concepts in reinforcement learning include the agent, environment, actions, states, rewards, and policy, each playing a crucial role in the learning process.']}, {'end': 25032.136, 'start': 24590.827, 'title': 'Reinforcement learning concepts', 'summary': 'Explains the concepts of value and action value, reward maximization, discounting, exploration, exploitation, and markov decision process in reinforcement learning, emphasizing the importance of maximizing reward and choosing the optimum policy.', 'duration': 441.309, 'highlights': ['The value and action value (Q value) are defined, and examples are used to clarify the concepts.', "The concept of reward maximization in reinforcement learning is detailed, emphasizing the agent's goal to maximize rewards based on a set of actions.", 'The concept of discounting in reward is explained, with the role of the gamma value in determining the discount value.', 'The concepts of exploration and exploitation are defined, using the fox and tiger example to illustrate the trade-off between exploring for bigger rewards and exploiting known information.', 'The Markov decision process is introduced as a mathematical approach for mapping solutions in reinforcement learning, detailing the parameters involved and the goal of maximizing rewards by choosing the optimum policy.']}], 'duration': 937.269, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ24094867.jpg', 'highlights': ['The frequent item set is generated with a support of at least 7% to get close enough, along with rules having high lift and confidence values.', 'The final rules given by the association rule mining provide 8 rules, derived using a support threshold of 0.7.', 'The impact of association rule mining on industries, especially large retailers, is highlighted, emphasizing how it helps them understand product usage and optimize offers to increase customer spending and time in their shops.', 'Reinforcement learning involves an agent learning to behave in a given environment through actions, either receiving rewards or punishments based on those actions.', 'In reinforcement learning, the agent learns from its experience without an expected output, aiming to maximize the reward in a given situation.', 'The reinforcement learning process involves an agent and an environment, with the agent making decisions based on observations and receiving rewards or punishments from the environment.', "The concept of reward maximization in reinforcement learning is detailed, emphasizing the agent's goal to maximize rewards based on a set of actions.", 'The concepts of exploration and exploitation are defined, using the fox and tiger example to illustrate the trade-off between exploring for bigger rewards and exploiting known information.', 'The Markov decision process is introduced as a mathematical approach for mapping solutions in reinforcement learning, detailing the parameters involved and the goal of maximizing rewards by choosing the optimum policy.']}, {'end': 26585.269, 'segs': [{'end': 25204.394, 'src': 'embed', 'start': 25164.395, 'weight': 2, 'content': [{'end': 25172.284, 'text': 'Now the task is to enable the robots so that they can find the shortest route from any given location to another location on their own.', 'start': 25164.395, 'duration': 7.889}, 
{'end': 25178.691, 'text': 'Now the agents in this case are the robots, the environment is the automobile factory warehouse.', 'start': 25172.964, 'duration': 5.727}, {'end': 25180.698, 'text': "So let's talk about the states.", 'start': 25179.437, 'duration': 1.261}, {'end': 25188.223, 'text': 'So the states are the locations in which a particular robot is present at a particular instant of time, which will denote its state.', 'start': 25181.198, 'duration': 7.025}, {'end': 25191.285, 'text': 'Now machines understand numbers rather than letters.', 'start': 25188.864, 'duration': 2.421}, {'end': 25194.107, 'text': "So let's map the location codes to numbers.", 'start': 25191.745, 'duration': 2.362}, {'end': 25199.631, 'text': 'So as you can see here, we have mapped location L1 to the state 0, L2 to 1 and so on.', 'start': 25194.487, 'duration': 5.144}, {'end': 25204.394, 'text': 'We have L8 as state 7 and L9 as state 8.', 'start': 25199.691, 'duration': 4.703}], 'summary': 'Robots to find shortest route in auto warehouse using location codes and state mapping.', 'duration': 39.999, 'max_score': 25164.395, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ25164395.jpg'}, {'end': 25323.161, 'src': 'embed', 'start': 25297.242, 'weight': 3, 'content': [{'end': 25303.887, 'text': 'Now with this we can construct a reward table which contains all the reward values mapping between all possible states.', 'start': 25297.242, 'duration': 6.645}, {'end': 25311.052, 'text': 'So, as you can see here in the table, the positions which are marked green have a positive reward and, as you can see here,', 'start': 25304.467, 'duration': 6.585}, {'end': 25315.396, 'text': 'we have all the possible rewards that a robot can get by moving in between the different states.', 'start': 25311.052, 'duration': 4.344}, {'end': 25317.637, 'text': 'Now comes an interesting decision.', 'start': 25315.996, 'duration': 1.641}, {'end': 
25323.161, 'text': 'Now remember that the factory administrator prioritized L6 to be the topmost.', 'start': 25318.158, 'duration': 5.003}], 'summary': 'A reward table maps all possible states with positive rewards for robot movement, with factory prioritizing l6.', 'duration': 25.919, 'max_score': 25297.242, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ25297242.jpg'}, {'end': 25450.04, 'src': 'embed', 'start': 25421.253, 'weight': 5, 'content': [{'end': 25426.598, 'text': 'It happens primarily because the robot does not have a way to remember the directions to proceed.', 'start': 25421.253, 'duration': 5.345}, {'end': 25430.341, 'text': 'So our job now is to enable the robot with the memory.', 'start': 25427.118, 'duration': 3.223}, {'end': 25433.264, 'text': 'Now, this is where the Bellman equation comes into play.', 'start': 25430.982, 'duration': 2.282}, {'end': 25439.155, 'text': 'So as you can see here, the main purpose of the Bellman equation is to enable the robot with memory.', 'start': 25433.86, 'duration': 5.295}, {'end': 25440.558, 'text': "That's the thing we're going to use.", 'start': 25439.215, 'duration': 1.343}, {'end': 25450.04, 'text': "So the equation goes something like this: V(s) = max over a of [R(s, a) + gamma * V(s')].", 'start': 25441.178, 'duration': 8.862}], 'summary': 'Enabling the robot with memory using the bellman equation for maximum reward.', 'duration': 28.787, 'max_score': 25421.253, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ25421253.jpg'}, {'end': 25756.37, 'src': 'embed', 'start': 25727.578, 'weight': 4, 'content': [{'end': 25732.342, 'text': 'So a Markov decision process is a discrete time stochastic control process.', 'start': 25727.578, 'duration': 4.764}, {'end': 25742.026, 'text': 'It provides a mathematical framework for modeling decision-making in situations where the outcomes 
are partly random and partly under the control of the decision maker.', 'start': 25733.084, 'duration': 8.942}, {'end': 25749.668, 'text': 'Now we need to give this concept a mathematical shape, most likely an equation, which can then be taken further.', 'start': 25742.686, 'duration': 6.982}, {'end': 25756.37, 'text': 'Now, you might be surprised that we can do this with the help of the Bellman equation with a few minor tweaks.', 'start': 25750.528, 'duration': 5.842}], 'summary': 'Markov decision process models decision-making with stochastic outcomes, using the bellman equation for mathematical representation.', 'duration': 28.792, 'max_score': 25727.578, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ25727578.jpg'}, {'end': 25960.776, 'src': 'embed', 'start': 25917.759, 'weight': 6, 'content': [{'end': 25918.399, 'text': 'In reality,', 'start': 25917.759, 'duration': 0.64}, {'end': 25927.162, 'text': 'the reward system can be very complex and particularly modeling sparse rewards is an active area of research in the domain of reinforcement learning.', 'start': 25918.399, 'duration': 8.763}, {'end': 25934.185, 'text': "So by now we have got the equation, so what we're going to do now is transition to Q-learning.", 'start': 25927.762, 'duration': 6.423}, {'end': 25941.387, 'text': 'So this equation gives us the value of going to a particular state taking the stochasticity of the environment into account.', 'start': 25934.805, 'duration': 6.582}, {'end': 25949.228, 'text': 'Now, we have also learned very briefly about the idea of living penalty, which deals with associating each move of the robot with a reward.', 'start': 25941.983, 'duration': 7.245}, {'end': 25956.894, 'text': 'So Q learning possesses an idea of assessing the quality of an action that is taken to move to a state,', 'start': 25949.849, 'duration': 7.045}, {'end': 25960.776, 'text': 'rather than determining the possible value 
of the state which is being moved to.', 'start': 25956.894, 'duration': 3.882}], 'summary': 'Sparse rewards in reinforcement learning. transition to q learning with stochasticity and living penalty.', 'duration': 43.017, 'max_score': 25917.759, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ25917759.jpg'}, {'end': 26195.226, 'src': 'embed', 'start': 26166.329, 'weight': 8, 'content': [{'end': 26169.051, 'text': 'So how do we capture this change and the real difference?', 'start': 26166.329, 'duration': 2.722}, {'end': 26177.078, 'text': 'We calculate the new Q(S, A) with the same formula and subtract the previously known Q(S, A) from it.', 'start': 26169.712, 'duration': 7.366}, {'end': 26180.636, 'text': 'So this will in turn give us the temporal difference.', 'start': 26177.754, 'duration': 2.882}, {'end': 26186.02, 'text': 'Now, the equation that we just derived gives the temporal difference in the Q values,', 'start': 26181.396, 'duration': 4.624}, {'end': 26190.683, 'text': 'which further helps to capture the random changes which the environment may impose.', 'start': 26186.02, 'duration': 4.663}, {'end': 26195.226, 'text': 'Now the new Q is updated as the following.', 'start': 26191.383, 'duration': 3.843}], 'summary': 'Calculate new q values using temporal difference equation.', 'duration': 28.897, 'max_score': 26166.329, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ26166329.jpg'}, {'end': 26304.93, 'src': 'embed', 'start': 26276.542, 'weight': 1, 'content': [{'end': 26284.164, 'text': "So, first of all, let's map each of the above locations in the warehouse to numbers or the states, so that it will ease our calculations right?", 'start': 26276.542, 'duration': 7.622}, {'end': 26295.587, 'text': "So what I'm going to do is create a new python3 file in the Jupyter notebook and I'll name it as Q learning numpy.", 'start': 26284.824, 'duration': 
10.763}, {'end': 26298.788, 'text': "Okay, so let's define the states.", 'start': 26296.368, 'duration': 2.42}, {'end': 26304.93, 'text': "But before that what we need to do is import numpy because we're going to use numpy for this purpose.", 'start': 26299.608, 'duration': 5.322}], 'summary': 'Creating a python file for q learning numpy to define states using numpy.', 'duration': 28.388, 'max_score': 26276.542, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ26276542.jpg'}, {'end': 26538.034, 'src': 'embed', 'start': 26514.284, 'weight': 0, 'content': [{'end': 26521.35, 'text': "into the q of next state and we'll take np.argmax of q of next state, minus q of the current state.", 'start': 26514.284, 'duration': 7.066}, {'end': 26526.69, 'text': "We're going to then update the Q values using the Bellman equation, as you can see here.", 'start': 26522.088, 'duration': 4.602}, {'end': 26534.333, 'text': "You have the Bellman equation and we're going to update the Q values and after that we're going to initialize the optimal route with a starting location.", 'start': 26527.09, 'duration': 7.243}, {'end': 26538.034, 'text': 'Now here we do not know what the next location yet.', 'start': 26535.193, 'duration': 2.841}], 'summary': 'Using bellman equation to update q values for optimal route.', 'duration': 23.75, 'max_score': 26514.284, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ26514284.jpg'}], 'start': 25032.436, 'title': 'Reinforcement learning for robot navigation', 'summary': 'Explores reinforcement learning for robot navigation in an automobile factory warehouse, covering q-learning, bellman equation, stochasticity, and the implementation of the q learning algorithm with specific parameters and iterations to find the optimal route.', 'chapters': [{'end': 25341.946, 'start': 25032.436, 'title': 'Reinforcement learning for robot navigation', 'summary': 
'Discusses the application of reinforcement learning in enabling autonomous robots to navigate an automobile factory warehouse, with a focus on exploring states, actions, and rewards to find the optimal policy for maximizing rewards and incorporating priority locations.', 'duration': 309.51, 'highlights': ['The task involves enabling robots to find the shortest route from any given location to another within the factory, with a focus on exploring states, actions, and rewards for decision-making.', 'The chapter delves into the mapping of location codes to numbers to represent states and the set of actions as the possible states of the robot, illustrating the relationship between states and actions.', 'The construction of a reward table is discussed, emphasizing the mapping of reward values between all possible states and the incorporation of priority locations with very high rewards to reflect their importance.']}, {'end': 25749.668, 'start': 25342.767, 'title': 'Bellman equation and reinforcement learning', 'summary': 'Introduces the bellman equation as a key component of reinforcement learning and q-learning, explaining its role in enabling the robot with memory and the value of being in a particular state, while also discussing the introduction of stochasticity through markov decision process.', 'duration': 406.901, 'highlights': ['The Bellman equation is introduced as a key component of reinforcement learning, enabling the robot with memory and the value of being in a particular state.', "Explanation of the Bellman equation, including the components such as state, action, reward, and discount factor, and its role in guiding the robot's decision-making process.", 'Introduction of stochasticity in decision-making through the Markov decision process, which models situations where outcomes are partly random and partly under the control of the decision maker.']}, {'end': 26249.446, 'start': 25750.528, 'title': 'Q learning basics and implementation', 'summary': 
'Covers the introduction of stochasticity in the bellman equation and transition to q learning, with a focus on the idea of living penalty and the temporal difference method in calculating q values, providing insights into the implementation of q learning.', 'duration': 498.918, 'highlights': ['The introduction of stochasticity in the Bellman equation and transition to Q learning', 'The idea of living penalty and its significance in reinforcement learning', 'The temporal difference method in calculating Q values and its role in capturing random changes in the environment']}, {'end': 26585.269, 'start': 26249.947, 'title': 'Implementing q learning algorithm for robot path finding', 'summary': 'Outlines the implementation of the q learning algorithm for path finding in a warehouse environment, involving mapping of states, defining actions, rewards, and optimal routes using python and numpy, with parameters gamma as 0.75 and alpha as 0.9, and the q learning process involving 1000 iterations to obtain the optimal route from l9 to l1.', 'duration': 335.322, 'highlights': ['The Q learning process involves 1000 iterations to obtain the optimal route from a starting location to an end location, using parameters gamma as 0.75 and alpha as 0.9, along with the Bellman equation for updating Q values.', 'The chapter outlines the implementation of the Q learning algorithm for path finding in a warehouse environment, involving mapping of states, defining actions, rewards, and optimal routes using Python and numpy.', 'The process includes defining the states and mapping them to numbers, defining actions and the transition to the next state, as well as creating a reward table.', "The Q values are initialized to be all zeros, and the rewards matrix is copied to a new one for the Q learning process, with the ending state's priority set to the highest value of 999.", "A function 'get optimal route' is defined to obtain the optimal route for reaching the end location from the starting 
location, involving iterating through the neighbor locations, computing temporal difference, and updating the Q values using the Bellman equation."]}], 'duration': 1552.833, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ25032436.jpg', 'highlights': ['The Q learning process involves 1000 iterations to obtain the optimal route from a starting location to an end location, using parameters gamma as 0.75 and alpha as 0.9, along with the Bellman equation for updating Q values.', 'The chapter outlines the implementation of the Q learning algorithm for path finding in a warehouse environment, involving mapping of states, defining actions, rewards, and optimal routes using Python and numpy.', 'The task involves enabling robots to find the shortest route from any given location to another within the factory, with a focus on exploring states, actions, and rewards for decision-making.', 'The construction of a reward table is discussed, emphasizing the mapping of reward values between all possible states and the incorporation of priority locations with very high rewards to reflect their importance.', 'Introduction of stochasticity in decision-making through the Markov decision process, which models situations where outcomes are partly random and partly under the control of the decision maker.', 'The Bellman equation is introduced as a key component of reinforcement learning, enabling the robot with memory and the value of being in a particular state.', 'The introduction of stochasticity in the Bellman equation and transition to Q learning', 'The idea of living penalty and its significance in reinforcement learning', 'The temporal difference method in calculating Q values and its role in capturing random changes in the environment', 'The chapter delves into the mapping of location codes to numbers to represent states and the set of actions as the possible states of the robot, illustrating the relationship between states 
and actions.']}, {'end': 27692.796, 'segs': [{'end': 26724.472, 'src': 'embed', 'start': 26681.087, 'weight': 1, 'content': [{'end': 26686.552, 'text': 'Now as the repository states, there are primarily three major features of TensorFlow.js.', 'start': 26681.087, 'duration': 5.465}, {'end': 26690.435, 'text': 'develop machine learning and deep learning models in your browser itself.', 'start': 26687.092, 'duration': 3.343}, {'end': 26693.577, 'text': 'run pre-existing TensorFlow models within the browser.', 'start': 26690.435, 'duration': 3.142}, {'end': 26696.459, 'text': 'retrain or fine-tune these pre-existing models as well.', 'start': 26693.577, 'duration': 2.882}, {'end': 26702.203, 'text': 'And if you are familiar with Keras the high-level layers API will seem quite familiar.', 'start': 26696.959, 'duration': 5.244}, {'end': 26705.687, 'text': 'But there are plenty of examples available on GitHub repository.', 'start': 26702.785, 'duration': 2.902}, {'end': 26709.249, 'text': 'So do check out those links to quicken your learning curve.', 'start': 26705.747, 'duration': 3.502}, {'end': 26715.792, 'text': "And as I mentioned earlier, I'll leave the links to all of these open source machine learning projects in the description below.", 'start': 26709.709, 'duration': 6.083}, {'end': 26719.334, 'text': "Now next, what we're going to discuss is Detectron.", 'start': 26715.812, 'duration': 3.522}, {'end': 26724.472, 'text': 'It is developed by Facebook and made a huge splash when it was earlier launched in 2018.', 'start': 26719.91, 'duration': 4.562}], 'summary': 'Tensorflow.js offers 3 major features, with examples available on github. 
detectron was launched by facebook in 2018.', 'duration': 43.385, 'max_score': 26681.087, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ26681087.jpg'}, {'end': 27008.379, 'src': 'embed', 'start': 26965.47, 'weight': 3, 'content': [{'end': 26972.618, 'text': "Now, if you follow a few researchers on social media, you must have come across some of the images I'm showing here in a video form.", 'start': 26965.47, 'duration': 7.148}, {'end': 26977.303, 'text': 'A stick human running across the terrain or trying to stand up or some sort.', 'start': 26973.138, 'duration': 4.165}, {'end': 26980.406, 'text': 'Now that, my friends, is reinforcement learning in action.', 'start': 26977.743, 'duration': 2.663}, {'end': 26986.993, 'text': "Now, here's a signature example of it, a framework to create a simulated humanoid to imitate multiple motion skills.", 'start': 26980.846, 'duration': 6.147}, {'end': 26997.131, 'text': "So let's have a look at the top 10 skills which are required to become a successful machine learning engineer.", 'start': 26992.208, 'duration': 4.923}, {'end': 27001.774, 'text': 'So starting with programming languages, Python is the lingua franca of machine learning.', 'start': 26997.592, 'duration': 4.182}, {'end': 27008.379, 'text': "You may have had exposure to Python even if you weren't previously in programming or in a computer science-related field.", 'start': 27002.135, 'duration': 6.244}], 'summary': 'Reinforcement learning demonstrated through simulated humanoid, top 10 skills for a machine learning engineer include python proficiency.', 'duration': 42.909, 'max_score': 26965.47, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ26965470.jpg'}, {'end': 27051.258, 'src': 'embed', 'start': 27026.184, 'weight': 7, 'content': [{'end': 27031.567, 'text': 'Now, if you want a job in machine learning, you will probably have to learn all of 
these languages at some point.', 'duration': 5.383}, {'end': 27038.051, 'text': 'C++ can help in speeding code up, whereas R works great in statistics and plots.', 'start': 27032.088, 'duration': 5.963}, {'end': 27043.475, 'text': 'And Hadoop is Java based, so you probably need to implement mappers and reducers in Java.', 'start': 27038.472, 'duration': 5.003}, {'end': 27045.616, 'text': 'Now, next we have linear algebra.', 'start': 27043.915, 'duration': 1.701}, {'end': 27051.258, 'text': "You'll need to be intimately familiar with matrices, vectors and matrix multiplication.", 'start': 27046.176, 'duration': 5.082}], 'summary': 'Job in machine learning requires learning c++, r, java, and linear algebra.', 'duration': 25.074, 'max_score': 27026.184, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27026184.jpg'}, {'end': 27259.351, 'src': 'embed', 'start': 27228.537, 'weight': 4, 'content': [{'end': 27231.438, 'text': 'now coming to our next point, which is the natural language processing.', 'start': 27228.537, 'duration': 2.901}, {'end': 27239.327, 'text': 'Now, since it combines computer science and linguistics, there are a bunch of libraries, like NLTK and Gensim, and the techniques,', 'start': 27232.018, 'duration': 7.309}, {'end': 27243.552, 'text': 'such as sentiment analysis and summarization, that are unique to NLP.', 'start': 27239.327, 'duration': 4.225}, {'end': 27248.978, 'text': 'Now, audio and video processing has a frequent overlap with the natural language processing.', 'start': 27244.092, 'duration': 4.886}, {'end': 27253.704, 'text': 'However, natural language processing can be applied to non-audio data like text.', 'start': 27249.398, 'duration': 4.306}, {'end': 27259.351, 'text': 'Voice and audio analysis involves extracting useful information from the audio signals themselves.', 'start': 27254.104, 'duration': 5.247}], 'summary': 'Natural language processing involves 
unique techniques like sentiment analysis and summarization, and can be applied to non-audio data like text.', 'duration': 30.814, 'max_score': 27228.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27228537.jpg'}, {'end': 27350.879, 'src': 'embed', 'start': 27314.994, 'weight': 5, 'content': [{'end': 27319.035, 'text': "You won't really be able to help your organization explore new business opportunities.", 'start': 27314.994, 'duration': 4.041}, {'end': 27321.557, 'text': 'So this is a must-have skill.', 'start': 27319.535, 'duration': 2.022}, {'end': 27324.239, 'text': 'Now, next, we have effective communication.', 'start': 27321.557, 'duration': 2.682}, {'end': 27329.984, 'text': "You'll need to explain the machine learning concepts to the people with little to no expertise in the field.", 'start': 27324.239, 'duration': 5.745}, {'end': 27330.565, 'text': 'Chances are,', 'start': 27329.984, 'duration': 0.581}, {'end': 27337.851, 'text': "you'll need to work with a team of engineers as well as many other teams, so communication is going to make all of this much easier.", 'start': 27330.565, 'duration': 7.286}, {'end': 27350.879, 'text': 'companies searching for a strong machine learning engineer are looking for someone who can clearly and fluently translate their technical findings to a non-technical team such as marketing or sales department.', 'start': 27338.351, 'duration': 12.528}], 'summary': 'Machine learning engineers need skills in exploring opportunities, effective communication, and technical translation for non-technical teams.', 'duration': 35.885, 'max_score': 27314.994, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27314994.jpg'}, {'end': 27376.302, 'src': 'embed', 'start': 27350.879, 'weight': 8, 'content': [{'end': 27360.578, 'text': 'next on our list, we have rapid prototyping, so iterating on ideas as quickly as 
possible is mandatory for finding one that works. In machine learning,', 'duration': 9.699}, {'end': 27366.239, 'text': 'this applies to everything from picking up the right model to working on projects such as A-B testing.', 'start': 27360.578, 'duration': 5.661}, {'end': 27376.302, 'text': 'You need to know a group of techniques used to quickly fabricate a scale model of a physical part or assembly using the three-dimensional computer-aided design,', 'start': 27366.64, 'duration': 9.662}], 'summary': 'Rapid prototyping is essential for finding successful machine learning ideas, including a-b testing.', 'duration': 25.423, 'max_score': 27350.879, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27350879.jpg'}, {'end': 27412.559, 'src': 'embed', 'start': 27385.744, 'weight': 6, 'content': [{'end': 27390.765, 'text': 'Every month, new neural network models come out that outperform the previous architecture.', 'start': 27385.744, 'duration': 5.021}, {'end': 27398.088, 'text': 'It also means being aware of the news regarding the development of the tools, the changelog, the conferences and much more.', 'start': 27390.765, 'duration': 7.323}, {'end': 27400.908, 'text': 'You need to know about the theories and algorithms.', 'start': 27398.088, 'duration': 2.82}, {'end': 27406.19, 'text': 'Now, this you can achieve by reading the research papers, blogs, the conferences, videos,', 'start': 27400.908, 'duration': 5.282}, {'end': 27409.851, 'text': 'and also you need to focus on the online community, which changes very quickly.', 'start': 27406.19, 'duration': 3.661}, {'end': 27412.559, 'text': 'So expect and cultivate this change.', 'start': 27410.538, 'duration': 2.021}], 'summary': 'New neural network models outperforming previous architecture monthly. 
Stay informed about tools, changelogs, theories, and algorithms through research papers, blogs, conferences, videos, and the online community.', 'duration': 26.815, 'max_score': 27385.744, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27385744.jpg'}, {'end': 27673.301, 'src': 'embed', 'start': 27645.369, 'weight': 0, 'content': [{'end': 27651.29, 'text': 'so the average salary in the US is around 111 thousand four hundred and ninety dollars,', 'start': 27645.369, 'duration': 5.921}, {'end': 27656.531, 'text': 'and the average salary in India is around seven lakh nineteen thousand six hundred and forty-six rupees.', 'start': 27651.29, 'duration': 5.241}, {'end': 27660.133, 'text': "That's a very good average salary for any particular profession.", 'start': 27657.171, 'duration': 2.962}, {'end': 27667.928, 'text': 'So, moving forward, if we have a look at the salary of an entry-level machine learning engineer, so the salary ranges from $76,000 or $77,', 'start': 27660.653, 'duration': 7.275}, {'end': 27668.318, 'text': '000 to $151,000 per annum.', 'start': 27667.928, 'duration': 0.39}, {'end': 27669.099, 'text': "That's a huge salary.", 'start': 27668.338, 'duration': 0.761}, {'end': 27673.301, 'text': 'And if you talk about the bonus here, we have like $3,000 to $25,000 depending on the work you do and the project you are working on.', 'start': 27669.179, 'duration': 4.122}], 'summary': 'US average salary: $111,490. India average salary: 7,19,646 INR. 
entry-level machine learning engineer salary ranges from $76,000 to $151,000 per annum, with bonuses from $3,000 to $25,000.', 'duration': 27.932, 'max_score': 27645.369, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27645369.jpg'}], 'start': 26586.029, 'title': 'Machine learning trends and skills', 'summary': 'Discusses the latest trends in open-source machine learning projects, including tensorflow.js and detectron, and highlights the top 10 technical and non-technical skills required for a successful machine learning engineer. it also explores essential skills for a machine learning engineer and provides job trends and salary insights, with an average salary of $111,490 in the us and 7,19,646 rupees in india.', 'chapters': [{'end': 26986.993, 'start': 26586.029, 'title': 'Open source machine learning projects', 'summary': 'Discusses the latest trends in machine learning, highlighting tensorflow.js, detectron, densepose, and other open-source machine learning projects, with emphasis on their features and impact in various domains such as healthcare, finance, and natural language processing.', 'duration': 400.964, 'highlights': ['Tensorflow.js: A popular release with the potential to change habits, enabling machine learning and deep learning models development, running pre-existing TensorFlow models within the browser, and retraining or fine-tuning pre-existing models.', 'Detectron: A Facebook-developed framework implementing state-of-the-art object detection, written in Python, and containing over 70 pre-trained models.', 'DensePose: Allows training and evaluating dense human pose using the RCNN model, with included open-source code and visualization notebooks for the COCO dataset.', 'Reinforcement Learning in Action: Framework to create a simulated humanoid to imitate multiple motion skills, showcasing reinforcement learning.']}, {'end': 27350.879, 'start': 26992.208, 'title': 'Top 10 skills for machine 
learning engineer', 'summary': 'Highlights the top 10 technical and non-technical skills required for a successful machine learning engineer, including python programming, linear algebra, neural network architectures, natural language processing, industry knowledge, and effective communication.', 'duration': 358.671, 'highlights': ['Python is the lingua franca of machine learning, and a solid understanding of classes and data structures is important.', 'Familiarity with languages like C++, R, and Java, as well as linear algebra, statistics, and advanced signal processing techniques, is necessary.', 'Applied maths, including numerical analysis and algorithm theory, provides a significant edge in selecting and applying machine learning techniques.', 'Comprehensive understanding of neural network architectures and natural language processing is crucial for solving complex problems in machine learning.', 'Industry knowledge and effective communication are vital soft skills for understanding business needs and conveying technical concepts to non-technical teams.']}, {'end': 27692.796, 'start': 27350.879, 'title': 'Skills for machine learning engineer', 'summary': 'Explores the essential skills for a successful machine learning engineer, including rapid prototyping, staying updated, and bonus skills like physics, reinforcement learning, and computer vision. 
It also provides job trends and salary insights for machine learning engineers, with an average salary of $111,490 in the US and 7,19,646 rupees in India.', 'duration': 341.917, 'highlights': ['Machine learning engineer job trends and salary insights', 'Importance of staying updated in machine learning', 'Rapid prototyping in machine learning']}], 'duration': 1106.767, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ26586029.jpg', 'highlights': ['Average salary of $111,490 in the US and 7,19,646 rupees in India', 'Tensorflow.js enables machine learning and deep learning models development', 'Detectron is a Facebook-developed framework for object detection', 'Python is the lingua franca of machine learning', 'Comprehensive understanding of neural network architectures and natural language processing is crucial', 'Industry knowledge and effective communication are vital soft skills', 'Importance of staying updated in machine learning', 'Familiarity with languages like C++, R, and Java is necessary', 'Rapid prototyping in machine learning', 'Reinforcement Learning in Action: Framework to create a simulated humanoid']}, {'end': 29743.231, 'segs': [{'end': 27721.245, 'src': 'embed', 'start': 27692.796, 'weight': 0, 'content': [{'end': 27699.743, 'text': 'the company you are working for and the percentage that they give to the engineer or the developer for that particular project.', 'start': 27692.796, 'duration': 6.947}, {'end': 27704.571, 'text': 'Now the total pay comes to around $76,000 or $75,000 to $162,000.', 'start': 27700.347, 'duration': 4.224}, {'end': 27709.355, 'text': 'And this is just for the entry-level machine learning engineer.', 'start': 27704.571, 'duration': 4.784}, {'end': 27714.399, 'text': 'Just imagine if you become an experienced machine learning engineer, your salary is gonna go through the roof.', 'start': 27709.735, 'duration': 4.664}, {'end': 27721.245, 'text': 'So, now that we have 
understood who exactly is a machine learning engineer, the various salary trends,', 'start': 27715.64, 'duration': 5.605}], 'summary': 'Entry-level machine learning engineer earns $75,000-$162,000. salary increases with experience.', 'duration': 28.449, 'max_score': 27692.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27692796.jpg'}, {'end': 27754.791, 'src': 'embed', 'start': 27731.296, 'weight': 5, 'content': [{'end': 27739.001, 'text': "Now, programming languages are a big deal when it comes to machine learning because you don't just need to have proficiency in one language.", 'start': 27731.296, 'duration': 7.705}, {'end': 27744.424, 'text': 'You might require proficiency in Python, Java, R or C++,', 'start': 27739.441, 'duration': 4.983}, {'end': 27750.888, 'text': 'because you might be working in a Hadoop environment where you require Java programming to do the MapReduce codings.', 'start': 27744.424, 'duration': 6.464}, {'end': 27754.791, 'text': 'And sometimes R is very great for visualization purposes.', 'start': 27751.309, 'duration': 3.482}], 'summary': 'Proficiency in multiple languages like python, java, r or c++ is essential for machine learning, especially in hadoop environment for mapreduce coding.', 'duration': 23.495, 'max_score': 27731.296, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27731296.jpg'}, {'end': 28278.839, 'src': 'embed', 'start': 28250.643, 'weight': 6, 'content': [{'end': 28259.006, 'text': 'Companies searching for a strong machine learning engineer are looking for someone who can clearly and fluently translate their technical findings to a non-technical team.', 'start': 28250.643, 'duration': 8.363}, {'end': 28265.551, 'text': 'now. 
rapid prototyping is another skill which is very much required for any machine learning engineer.', 'start': 28259.867, 'duration': 5.684}, {'end': 28271.955, 'text': 'so iterating on ideas as quickly as possible is mandatory for finding the one that works in machine learning.', 'start': 28265.551, 'duration': 6.404}, {'end': 28278.839, 'text': 'this applies to everything from picking the right model to working on projects such as A-B testing and much more.', 'start': 28271.955, 'duration': 6.884}], 'summary': 'ML engineer should translate findings, rapid prototype, iterate quickly, and work on A/B testing.', 'duration': 28.196, 'max_score': 28250.643, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ28250643.jpg'}, {'end': 28499.517, 'src': 'embed', 'start': 28474.529, 'weight': 1, 'content': [{'end': 28484.554, 'text': "provides a machine learning engineer master's program that is aligned in such a way that it will get you acquainted with all the skills that are required to become a machine learning engineer,", 'start': 28474.529, 'duration': 10.025}, {'end': 28493.494, 'text': 'and that too in the correct format. As part of the machine learning interview series,', 'start': 28484.554, 'duration': 8.94}, {'end': 28496.295, 'text': 'I wanted to let you know what the market is like right now.', 'start': 28493.554, 'duration': 2.741}, {'end': 28499.517, 'text': 'There are a lot of openings in the machine learning field.', 'start': 28496.336, 'duration': 3.181}], 'summary': "Machine learning engineer master's program covers all required skills, with many job openings in the field.", 'duration': 24.988, 'max_score': 28474.529, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ28474529.jpg'}, {'end': 28598.389, 'src': 'embed', 'start': 28571.761, 'weight': 7, 'content': [{'end': 28576.604, 'text': 'So first thing is this session is divided into 
three broad components.', 'start': 28571.761, 'duration': 4.843}, {'end': 28580.247, 'text': 'So first thing is machine learning core interview questions.', 'start': 28576.624, 'duration': 3.623}, {'end': 28586.544, 'text': 'So, within this core interview questions, we are more interested with the theoretical aspects of the machine learning,', 'start': 28580.782, 'duration': 5.762}, {'end': 28591.706, 'text': "like how we're going to ask you the theoretical questions and you can explain those in an efficient manner.", 'start': 28586.544, 'duration': 5.162}, {'end': 28598.389, 'text': "Then second is the technical part where we're going to see the interview questions related to the Python.", 'start': 28592.026, 'duration': 6.363}], 'summary': 'Session covers machine learning core interview questions and technical python questions.', 'duration': 26.628, 'max_score': 28571.761, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ28571761.jpg'}, {'end': 28826.986, 'src': 'embed', 'start': 28802.119, 'weight': 4, 'content': [{'end': 28810.315, 'text': 'So model learns through the observations which are there and it tries to identify some patterns and structure which would be hidden within the data.', 'start': 28802.119, 'duration': 8.196}, {'end': 28812.983, 'text': 'So in this case, we are not labeling the data.', 'start': 28810.782, 'duration': 2.201}, {'end': 28822.545, 'text': 'So model is given a data and model is left on itself to learn the patterns and relationships out of the data by creating clusters.', 'start': 28813.083, 'duration': 9.462}, {'end': 28826.986, 'text': 'So clustering is one of the major techniques which are used in the unsupervised learning.', 'start': 28822.565, 'duration': 4.421}], 'summary': 'Model learns patterns and structures from unlabeled data using clustering in unsupervised learning.', 'duration': 24.867, 'max_score': 28802.119, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ28802119.jpg'}, {'end': 28949.35, 'src': 'embed', 'start': 28921.543, 'weight': 2, 'content': [{'end': 28928.706, 'text': "you will get your score more, which is like a reward, and when you don't, you do this kind of action, your school will get penalized.", 'start': 28921.543, 'duration': 7.163}, {'end': 28933.028, 'text': 'So based on this environment your reinforcement learning will try to learn those things.', 'start': 28929.026, 'duration': 4.002}, {'end': 28938.061, 'text': 'So that is one example, and you can also give some examples related to the AlphaGo,', 'start': 28933.618, 'duration': 4.443}, {'end': 28943.105, 'text': 'which is a Go playing game which recently was developed by Google based company.', 'start': 28938.061, 'duration': 5.044}, {'end': 28949.35, 'text': 'and you can also say that chess based games can be automated with the reinforcement learning.', 'start': 28943.105, 'duration': 6.245}], 'summary': 'Reinforcement learning rewards performance, penalizes failure, and drives learning in games like alphago and chess.', 'duration': 27.807, 'max_score': 28921.543, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ28921543.jpg'}, {'end': 29677.932, 'src': 'embed', 'start': 29654.628, 'weight': 9, 'content': [{'end': 29662.453, 'text': 'and for the no case for the rejection case, out of 60 people the model says 50, which is again a good thing.', 'start': 29654.628, 'duration': 7.825}, {'end': 29669.647, 'text': 'but Your model says for 10 people who are actually defaulters, but model says you should give them loan.', 'start': 29662.453, 'duration': 7.194}, {'end': 29677.932, 'text': 'So those are the things which can be looked into the future, but confusion matrix will help you to understand how your model is performing,', 'start': 29669.847, 'duration': 8.085}], 'summary': 'Out of 60 people, model 
predicted 50 non-defaulters. for 10 defaulters, model suggests giving them a loan. confusion matrix aids in understanding model performance.', 'duration': 23.304, 'max_score': 29654.628, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ29654628.jpg'}], 'start': 27692.796, 'title': 'Machine learning engineer skills & trends', 'summary': 'Discusses essential skills, salary trends, and job responsibilities of machine learning engineers, emphasizing programming languages, industry knowledge, and rapid prototyping. it also highlights the high demand for machine learning engineers and provides an overview of the machine learning interview process and different types of machine learning, including supervised, unsupervised, and reinforcement learning.', 'chapters': [{'end': 28288.805, 'start': 27692.796, 'title': 'Machine learning engineer: skills, trends & salary', 'summary': 'Discusses the skills, salary trends, and job responsibilities of a machine learning engineer, emphasizing the importance of programming languages, calculus and statistics, signal processing, neural networks, and language processing, and stressing the need for industry knowledge, effective communication, and rapid prototyping as non-technical skills.', 'duration': 596.009, 'highlights': ["Machine learning engineer's salary ranges from $75,000 to $162,000 for an entry-level position and significantly increases for experienced engineers.", 'Proficiency in multiple programming languages such as Python, Java, R, and C++ is crucial for machine learning engineers due to diverse job requirements, including Hadoop environment and visualization purposes.', 'Calculus and statistics are essential skills for machine learning engineers, with a focus on matrix multiplication, calculus, and statistics for understanding machine learning algorithms.', 'Industry knowledge, effective communication, and rapid prototyping are crucial non-technical skills for machine 
learning engineers to address real pain points, communicate findings, and iterate on ideas quickly.']}, {'end': 28474.529, 'start': 28289.412, 'title': 'Machine learning engineer skills and job market', 'summary': 'Highlights the essential skills required for a machine learning engineer, including staying updated on new neural network models and industry developments, having relevant education and experience in computer science and statistics, and working on machine learning-related projects. it also emphasizes the high demand for machine learning engineers across various industries, ensuring job opportunities.', 'duration': 185.117, 'highlights': ['Machine learning engineers need to stay updated with new neural network models and industry developments through research papers, blogs, and conference videos.', 'Relevant education and experience in computer science, statistics, and data analysis are crucial for landing a job as a machine learning engineer.', 'Working on machine learning-related projects involving AI and neural networks is essential for landing a job as a machine learning engineer.', 'The high demand for machine learning engineers in various industries, including tech giants, gaming, graphics, banking, and retail, ensures ample job opportunities in the field.']}, {'end': 28720.291, 'start': 28474.529, 'title': 'Machine learning engineer program overview', 'summary': 'Highlights the demand for machine learning engineers, the confusion around job titles, and provides an overview of the machine learning interview process, including core theoretical questions, technical python-related questions, and scenario-based questions.', 'duration': 245.762, 'highlights': ['There is a high demand for machine learning engineers in the market with many job openings.', 'Confusion exists regarding job titles such as data scientists, machine learning engineers, deep learning engineers, and data analysts, with a focus on emphasizing the importance of the job description 
during the interview process.', 'An overview of the machine learning interview process, including core theoretical questions, technical Python-related questions, and scenario-based questions.']}, {'end': 28882.898, 'start': 28720.551, 'title': 'Types of machine learning', 'summary': 'Explains supervised learning using labeled data for pattern identification, unsupervised learning using unlabeled data for pattern recognition through clustering, and reinforcement learning based on trial and error.', 'duration': 162.347, 'highlights': ['Supervised learning uses labeled data to train the model to identify patterns, such as distinguishing between apples and bananas based on provided labels, and applies the learned rules to make predictions on new data.', 'Unsupervised learning involves training the model to identify patterns and relationships in unlabeled data through techniques like clustering, where the model learns to group similar items together without explicit labels.', 'Reinforcement learning entails the model learning through trial and error, where it makes decisions and improves based on the outcomes of its actions.']}, {'end': 29258.737, 'start': 28883.547, 'title': 'Reinforcement learning and machine learning', 'summary': 'Discusses reinforcement learning, its application in gaming, and examples like alphago and semi-supervised algorithms. 
it also covers the differences between deep learning and machine learning, classification and regression in supervised learning, and addresses concepts like selection bias and precision and recall.', 'duration': 375.19, 'highlights': ['Reinforcement learning in gaming, with examples like Mario and AlphaGo, and semi-supervised algorithms explained.', 'Distinction between deep learning and machine learning, with emphasis on feature extraction and classification.', 'Explanation of classification and regression in supervised learning, with examples of continuous variable prediction and class prediction.', 'Definition and example of selection bias in statistical sampling.', 'Explanation of precision and recall in the context of examples.']}, {'end': 29743.231, 'start': 29259.118, 'title': 'Recall and precision in data science', 'summary': 'Discusses the concept of recall and precision in data science, using examples and formulas, and explains the difference between inductive and deductive learning, emphasizing the importance of model performance evaluation through confusion matrix.', 'duration': 484.113, 'highlights': ['The concepts of recall and precision are explained using examples and formulas, demonstrating the calculation of recall ratio and precision.', "The importance of evaluating model performance using confusion matrix is emphasized, showing how it helps in understanding the model's performance based on predicted and actual data.", 'The difference between inductive and deductive learning is explained through relatable examples, emphasizing the approaches of learning through observations and conclusions.']}], 'duration': 2050.435, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ27692796.jpg', 'highlights': ["Machine learning engineer's salary ranges from $75,000 to $162,000 for an entry-level position and significantly increases for experienced engineers.", 'There is a high demand for machine learning 
engineers in the market with many job openings.', 'Reinforcement learning entails the model learning through trial and error, where it makes decisions and improves based on the outcomes of its actions.', 'The high demand for machine learning engineers in various industries ensures ample job opportunities in the field.', 'Supervised learning uses labeled data to train the model to identify patterns and applies the learned rules to make predictions on new data.', 'Proficiency in multiple programming languages such as Python, Java, R, and C++ is crucial for machine learning engineers due to diverse job requirements.', 'Industry knowledge, effective communication, and rapid prototyping are crucial non-technical skills for machine learning engineers.', 'An overview of the machine learning interview process, including core theoretical questions, technical Python-related questions, and scenario-based questions.', 'Reinforcement learning in gaming, with examples like Mario and AlphaGo, and semi-supervised algorithms explained.', "The importance of evaluating model performance using confusion matrix is emphasized, showing how it helps in understanding the model's performance based on predicted and actual data."]}, {'end': 31527.203, 'segs': [{'end': 29830.089, 'src': 'embed', 'start': 29784.51, 'weight': 0, 'content': [{'end': 29788.631, 'text': 'In this question, the clustering is given to you as K-means clustering,', 'start': 29784.51, 'duration': 4.121}, {'end': 29792.653, 'text': 'but sometimes the interview will just ask you how is KNN different from K-means?', 'start': 29788.631, 'duration': 4.022}, {'end': 29796.194, 'text': 'First thing you have to do is you have to understand the difference,', 'start': 29793.233, 'duration': 2.961}, {'end': 29803.314, 'text': 'as k means is a unsupervised technique algorithm and knn is a supervised technique.', 'start': 29796.194, 'duration': 7.12}, {'end': 29806.717, 'text': 'in knn is used as a supervised algorithm.', 'start': 
29803.314, 'duration': 3.403}, {'end': 29811.9, 'text': 'KNN is used for classification and regression, and K-means is used for clustering.', 'start': 29806.717, 'duration': 5.183}, {'end': 29813.721, 'text': "as it's a clustering algorithm.", 'start': 29811.9, 'duration': 1.821}, {'end': 29817.083, 'text': 'it is used to create the clusters of your data.', 'start': 29813.721, 'duration': 3.362}, {'end': 29821.606, 'text': 'k within KNN basically means it tries to observe the k neighbors.', 'start': 29817.083, 'duration': 4.523}, {'end': 29830.089, 'text': 'so in the case of regression, it will try to identify the surrounding k neighbors and take the average and give you the output.', 'start': 29821.606, 'duration': 8.483}], 'summary': 'K-means is unsupervised; KNN is supervised, used for classification and regression.', 'duration': 45.579, 'max_score': 29784.51, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ29784510.jpg'}, {'end': 29925.66, 'src': 'embed', 'start': 29898.504, 'weight': 2, 'content': [{'end': 29902.226, 'text': 'The full form of ROC is receiver operating characteristic curve.', 'start': 29898.504, 'duration': 3.722}, {'end': 29906.773, 'text': 'and its fundamental use started with diagnostic test evaluations.', 'start': 29902.631, 'duration': 4.142}, {'end': 29916.336, 'text': 'So its application started in the medical field, and it is used in the machine learning field for performance evaluation of classification-related algorithms.', 'start': 29907.293, 'duration': 9.043}, {'end': 29925.66, 'text': 'So in simplistic terms, it is the plot of the true positive rate, which is also called the sensitivity, against the false positive rate.', 'start': 29916.776, 'duration': 8.884}], 'summary': "ROC is used in medicine and machine learning for evaluating classification algorithms' performance.", 'duration': 27.156, 'max_score': 29898.504, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ29898504.jpg'}, {'end': 30073.035, 'src': 'embed', 'start': 30050.948, 'weight': 3, 'content': [{'end': 30059.932, 'text': 'So this is one of the very important concepts which you, interviewers, trying to understand as how you try to distinguish between the type of errors.', 'start': 30050.948, 'duration': 8.984}, {'end': 30066.134, 'text': 'or do you understand as how this type 1 and type 2 errors are impacting the performance of your model?', 'start': 30059.932, 'duration': 6.202}, {'end': 30073.035, 'text': 'So interviewers trying to understand those things so you can give some good examples to say how these are going to impact.', 'start': 30066.514, 'duration': 6.521}], 'summary': 'Interviewers seek examples of type 1 and type 2 errors impacting model performance.', 'duration': 22.087, 'max_score': 30050.948, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ30050948.jpg'}, {'end': 30354.279, 'src': 'embed', 'start': 30324.188, 'weight': 4, 'content': [{'end': 30327.491, 'text': 'So in those cases, if you look for the model accuracy,', 'start': 30324.188, 'duration': 3.303}, {'end': 30333.014, 'text': "your model accuracy will mostly be higher and it won't give you a complete picture of your model performance.", 'start': 30327.491, 'duration': 5.523}, {'end': 30341.835, 'text': 'So model accuracy is just a subset of the model performance and there are more metrics that you have to look to understand the model performance.', 'start': 30333.533, 'duration': 8.302}, {'end': 30347.497, 'text': 'Next question is what is the difference between Gini impurity and entropy in decision tree?', 'start': 30342.936, 'duration': 4.561}, {'end': 30348.757, 'text': 'So both.', 'start': 30347.957, 'duration': 0.8}, {'end': 30354.279, 'text': 'first thing this both the things are used as a impurity measure in decision tree.', 'start': 
30348.757, 'duration': 5.522}], 'summary': "Model accuracy doesn't provide complete model performance picture; gini impurity and entropy measure impurity in decision tree.", 'duration': 30.091, 'max_score': 30324.188, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ30324188.jpg'}, {'end': 30494.378, 'src': 'embed', 'start': 30466.348, 'weight': 5, 'content': [{'end': 30472.53, 'text': 'So as your entropy keeps on decreasing, your information gain keeps on increasing.', 'start': 30466.348, 'duration': 6.182}, {'end': 30480.233, 'text': 'So both are related with each other and as your entropy is decreasing, your information gain will keep on increasing.', 'start': 30473.03, 'duration': 7.203}, {'end': 30486.695, 'text': 'Your information gain will keep on increasing as your nodes are getting pure and pure.', 'start': 30481.253, 'duration': 5.442}, {'end': 30494.378, 'text': "So node purity basically says as you're getting specific classes within the nodes, those nodes are getting purer.", 'start': 30486.975, 'duration': 7.403}], 'summary': 'As entropy decreases, information gain increases, leading to purer nodes.', 'duration': 28.03, 'max_score': 30466.348, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ30466348.jpg'}, {'end': 30648.714, 'src': 'embed', 'start': 30616.052, 'weight': 6, 'content': [{'end': 30619.976, 'text': 'The next part is how do we ensure our model is not overfitting.', 'start': 30616.052, 'duration': 3.924}, {'end': 30624.3, 'text': 'So for this there are multiple ways that we can control the overfitting.', 'start': 30620.356, 'duration': 3.944}, {'end': 30627.2, 'text': 'First is we collect more data.', 'start': 30624.818, 'duration': 2.382}, {'end': 30633.844, 'text': 'So as we have lesser data our model will try to get very much exact to what is there within the training data.', 'start': 30627.22, 'duration': 6.624}, {'end': 
30636.626, 'text': 'Getting more data may help it to generalize well.', 'start': 30634.164, 'duration': 2.462}, {'end': 30640.969, 'text': 'Second is using ensemble methods that average the models.', 'start': 30637.126, 'duration': 3.843}, {'end': 30648.714, 'text': 'So when we split our data into multiple components or we use multiple models which will try to understand data in a different manner.', 'start': 30641.429, 'duration': 7.285}], 'summary': 'To prevent overfitting, consider collecting more data and using ensemble methods to average models.', 'duration': 32.662, 'max_score': 30616.052, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ30616052.jpg'}, {'end': 31150.866, 'src': 'embed', 'start': 31123.73, 'weight': 7, 'content': [{'end': 31128.851, 'text': 'how do you screen for outliers and what should you do if you find one?', 'start': 31123.73, 'duration': 5.121}, {'end': 31138.413, 'text': 'in this case, your interviewer is interested to know as how you give importance to the outliers, how you understand them and, when you find them,', 'start': 31128.851, 'duration': 9.562}, {'end': 31142.494, 'text': 'how you are trying to improve your models with the outliers, how you take your decisions.', 'start': 31138.413, 'duration': 4.081}, {'end': 31144.795, 'text': 'once you know you get the outliers.', 'start': 31142.494, 'duration': 2.301}, {'end': 31148.585, 'text': 'so first thing, you screen the outliers using different ways.', 'start': 31144.795, 'duration': 3.79}, {'end': 31150.866, 'text': 'So some of the ways are one is the box plot.', 'start': 31148.605, 'duration': 2.261}], 'summary': 'Screen outliers using different methods like box plot.', 'duration': 27.136, 'max_score': 31123.73, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ31123730.jpg'}], 'start': 29743.231, 'title': 'Key machine learning concepts', 'summary': 'Discusses deductive 
vs inductive learning, knn and k-means clustering, roc curve and model performance, gini impurity and entropy in decision trees, and outliers screening and handling. it emphasizes the significance of these concepts in model evaluation and practical applications in data analysis and machine learning.', 'chapters': [{'end': 29853.101, 'start': 29743.231, 'title': 'Deductive vs inductive learning', 'summary': 'Discusses the differences between deductive and inductive learning, and the distinctions between knn and k-means clustering, highlighting knn as a supervised technique for classification and regression, and k-means as an unsupervised technique for clustering.', 'duration': 109.87, 'highlights': ['KNN is a supervised technique used for classification and regression, while K-means is an unsupervised technique used for clustering.', 'KNN uses k neighbors for observing surrounding data points, enabling regression by identifying the surrounding k neighbors and taking the average and classification based on the majority class in the surrounding k neighbors.', 'K-means creates K clusters for the input data, serving as a clustering algorithm to group data points into clusters.']}, {'end': 30341.835, 'start': 29853.481, 'title': 'Understanding roc curve and model performance', 'summary': 'Explains the significance of roc curve in evaluating model performance, distinguishing between type 1 and type 2 errors, and the importance of model accuracy in model performance, with emphasis on trade-offs and domain-specific examples.', 'duration': 488.354, 'highlights': ['ROC curve is a plot of true positive rates against false positive rates, used for binary classification algorithm performance evaluation.', 'Explanation of type 1 and type 2 errors, with examples and the importance of managing trade-offs between false positives and false negatives based on domain-specific requirements.', 'Model accuracy is a subset of model performance, and the importance of considering various 
performance measures other than model accuracy based on specific domain contexts.']}, {'end': 31123.73, 'start': 30342.936, 'title': 'Difference between gini impurity and entropy in decision trees', 'summary': 'Explains the difference between gini impurity and entropy in decision trees, the concept of impurity, the relationship between entropy and information gain, the concept of overfitting and methods to prevent it, and the explanation of ensemble learning technique in machine learning.', 'duration': 780.794, 'highlights': ['The relationship between entropy and information gain', 'Explanation of overfitting and methods to prevent it', 'Explanation of ensemble learning technique in machine learning', 'The concept of impurity in decision trees and the difference between Gini impurity and entropy']}, {'end': 31527.203, 'start': 31123.73, 'title': 'Outliers screening and handling', 'summary': 'Discusses the importance of outliers in data analysis, methods for screening outliers including using box plots, probabilistic and statistical models, linear models, and proximity based models, as well as strategies for handling outliers such as dropping, capping, and imputing data. 
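The outlier-screening and capping strategies summarized above can be sketched in plain Python. This is an illustrative sketch (the helper names `iqr_fences`, `screen_outliers`, and `cap_outliers` are my own, not from the video), using the standard 1.5 × IQR box-plot fences:

```python
def iqr_fences(values):
    """Tukey box-plot fences: (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    xs = sorted(values)

    def quantile(q):
        # Linear interpolation between order statistics.
        pos = q * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    spread = q3 - q1
    return q1 - 1.5 * spread, q3 + 1.5 * spread

def screen_outliers(values):
    """Return the points a box plot would flag as outliers."""
    low, high = iqr_fences(values)
    return [x for x in values if x < low or x > high]

def cap_outliers(values):
    """'Capping': clip extreme points to the fences instead of dropping them."""
    low, high = iqr_fences(values)
    return [min(max(x, low), high) for x in values]

data = [10, 12, 11, 13, 12, 11, 95]  # 95 is an obvious outlier
print(screen_outliers(data))         # -> [95]
print(cap_outliers(data)[-1])        # -> 14.75 (clipped to the upper fence)
```

Dropping or imputing, the other strategies the speaker mentions, would simply filter the flagged points out or replace them by a percentile or business-rule value.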
additionally, it covers the concepts of collinearity, multicollinearity, eigenvectors, and eigenvalues, with practical examples and applications in data analysis and linear transformations.', 'duration': 403.473, 'highlights': ['Methods for screening outliers include box plots, probabilistic and statistical models, linear models, and proximity based models, providing various techniques to identify and treat outliers in data analysis.', 'Strategies for handling outliers involve dropping, capping, and imputing data based on percentiles or business rules, providing practical approaches to manage outliers in data sets.', 'The concepts of collinearity, multicollinearity, eigenvectors, and eigenvalues are explained with practical examples and applications in data analysis and linear transformations, providing insights into these fundamental concepts and their real-world relevance.']}], 'duration': 1783.972, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ29743231.jpg', 'highlights': ['KNN is a supervised technique for classification and regression.', 'K-means is an unsupervised technique for clustering.', 'ROC curve evaluates binary classification algorithm performance.', 'Understanding type 1 and type 2 errors and their trade-offs.', 'Importance of considering various performance measures beyond accuracy.', 'Relationship between entropy and information gain in decision trees.', 'Methods to prevent overfitting and the concept of ensemble learning.', 'Techniques for screening and handling outliers in data analysis.']}, {'end': 32748.684, 'segs': [{'end': 31551.713, 'src': 'embed', 'start': 31528.544, 'weight': 0, 'content': [{'end': 31535.927, 'text': 'What is AB testing? 
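The eigenvector and eigenvalue discussion above reduces to the defining property A·v = λ·v: an eigenvector is a direction the linear transformation only stretches. A minimal NumPy check of that property (assuming NumPy is available; the matrix is an arbitrary example, not from the video):

```python
import numpy as np

# An arbitrary symmetric 2x2 matrix: its eigenvectors are the directions the
# linear transformation only stretches (by the eigenvalue), without rotating.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Check the defining property A @ v == lambda * v for every pair.
# np.linalg.eig returns eigenvectors as COLUMNS, hence the transpose.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

print(np.sort(eigenvalues))  # the eigenvalues of [[2, 1], [1, 2]] are 1 and 3
```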
So AB testing is a statistical hypothesis testing which tries to compare different cases.', 'start': 31528.544, 'duration': 7.383}, {'end': 31542.009, 'text': 'So in our cases we want to measure how different model performs as compared to each other.', 'start': 31535.967, 'duration': 6.042}, {'end': 31551.713, 'text': "So assume that in production you have a model which is already running and tries to see how your users are clicking through your products and they're buying those products.", 'start': 31542.349, 'duration': 9.364}], 'summary': 'Ab testing compares different models to measure user engagement and product purchases.', 'duration': 23.169, 'max_score': 31528.544, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ31528544.jpg'}, {'end': 31616.56, 'src': 'embed', 'start': 31588.917, 'weight': 7, 'content': [{'end': 31595.622, 'text': 'as in the cases where we want to identify when a user clicks on the pages to increase the outcome of the interest.', 'start': 31588.917, 'duration': 6.705}, {'end': 31596.924, 'text': 'So some websites.', 'start': 31595.642, 'duration': 1.282}, {'end': 31597.504, 'text': 'what they do?', 'start': 31596.924, 'duration': 0.58}, {'end': 31607.553, 'text': 'they try to introduce different functionalities to different users and see how different functionalities are creating a better outcome and better revenues,', 'start': 31597.504, 'duration': 10.049}, {'end': 31609.234, 'text': 'and they use those things in the future.', 'start': 31607.553, 'duration': 1.681}, {'end': 31616.56, 'text': 'So what is cluster sampling? 
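The AB-testing comparison described above is usually settled with a two-sample hypothesis test such as the t-test the video mentions. A minimal sketch with made-up conversion numbers, computing Welch's t-statistic by hand (the function name and the data are illustrative, not from the video):

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t-statistic: difference of the sample means
    scaled by the combined standard error (unequal variances allowed)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

# Hypothetical daily conversion rates (%) under the old and the new model.
control = [2.1, 2.3, 1.9, 2.2, 2.0, 2.1]
variant = [2.6, 2.8, 2.5, 2.7, 2.9, 2.6]

t_stat = welch_t(variant, control)
print(round(t_stat, 1))  # a large |t| means the lift is unlikely to be chance
```

In practice you would turn the statistic into a p-value (e.g. with `scipy.stats.ttest_ind`) before deciding whether to roll the new model out to all users.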
So when you have a population and within a population, you have different clusters available.', 'start': 31610.195, 'duration': 6.365}], 'summary': 'Websites use cluster sampling to test different functionalities for better outcomes and revenues.', 'duration': 27.643, 'max_score': 31588.917, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ31588917.jpg'}, {'end': 31660.086, 'src': 'embed', 'start': 31632.43, 'weight': 1, 'content': [{'end': 31636.352, 'text': 'These are the different clusters which we have within which there are different data scientists.', 'start': 31632.43, 'duration': 3.922}, {'end': 31643.252, 'text': 'When we try to randomly select those clusters for our analysis, that is called as the cluster sampling.', 'start': 31636.832, 'duration': 6.42}, {'end': 31650.118, 'text': 'So in this case, the sample is nothing but different clusters and we are trying to select those samples.', 'start': 31644.153, 'duration': 5.965}, {'end': 31660.086, 'text': 'So for example, if managers are your samples, then companies are basically clusters and we do the clustering of those different companies.', 'start': 31650.919, 'duration': 9.167}], 'summary': 'Cluster sampling involves randomly selecting clusters for analysis, e.g. 
managers as samples and companies as clusters.', 'duration': 27.656, 'max_score': 31632.43, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ31632430.jpg'}, {'end': 31734.292, 'src': 'embed', 'start': 31705.022, 'weight': 5, 'content': [{'end': 31711.024, 'text': 'So whenever you have a lower Gini score, you go with that feature for the splitting of your nodes.', 'start': 31705.022, 'duration': 6.002}, {'end': 31715.965, 'text': 'Entropy is a measure of impurity or randomness within your data.', 'start': 31712.084, 'duration': 3.881}, {'end': 31720.286, 'text': "It's like how misclassified your classes are within the nodes.", 'start': 31716.045, 'duration': 4.241}, {'end': 31722.127, 'text': "So it's for the binary classification.", 'start': 31720.406, 'duration': 1.721}, {'end': 31728.588, 'text': 'When you have the binary classification, we have the probability of success and the probability of failure.', 'start': 31722.503, 'duration': 6.085}, {'end': 31734.292, 'text': 'So for the positive class, we have the probability of success and for the other one, we have the probability of failure.', 'start': 31728.648, 'duration': 5.644}], 'summary': 'Gini score guides node splitting; entropy measures data impurity in binary classification.', 'duration': 29.27, 'max_score': 31705.022, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ31705022.jpg'}, {'end': 31860.463, 'src': 'embed', 'start': 31834.326, 'weight': 2, 'content': [{'end': 31840.29, 'text': 'So the core libraries which are there within Python are NumPy, SciPy, Pandas, and scikit-learn.', 'start': 31834.326, 'duration': 5.964}, {'end': 31842.171, 'text': 'So first of all, NumPy.', 'start': 31841.111, 'duration': 1.06}, {'end': 31844.853, 'text': 'NumPy is a numerical library to deal with the data.', 'start': 31842.311, 'duration': 2.542}, {'end': 31847.875, 'text': "So as the name suggests, it's
Numerical Python.", 'start': 31845.153, 'duration': 2.722}, {'end': 31849.556, 'text': 'So it tries to deal with the numbers.', 'start': 31847.895, 'duration': 1.661}, {'end': 31855.06, 'text': 'So all the core libraries, such as SciPy, Pandas, and scikit-learn, use NumPy to store the data.', 'start': 31849.896, 'duration': 5.164}, {'end': 31860.463, 'text': 'So that is the core storage format for all the data in data analysis.', 'start': 31855.28, 'duration': 5.183}], 'summary': 'NumPy, SciPy, Pandas, and scikit-learn are core Python libraries for data analysis, with NumPy serving as the storage format for the data.', 'duration': 26.137, 'max_score': 31834.326, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ31834326.jpg'}, {'end': 32010.658, 'src': 'embed', 'start': 31983.276, 'weight': 8, 'content': [{'end': 31989.659, 'text': "So, the preference would come based on what you're currently performing in the data analysis stage.", 'start': 31983.276, 'duration': 6.383}, {'end': 31995.841, 'text': "So, for example, when you're doing a quick analysis within your data and you want quick access to the charts,", 'start': 31990.019, 'duration': 5.822}, {'end': 31998.652, 'text': 'then you can use matplotlib, for example.', 'start': 31996.431, 'duration': 2.221}, {'end': 32005.696, 'text': 'matplotlib provides you quick access to bar charts, pie charts, histograms, line charts, and scatter plots.', 'start': 31998.652, 'duration': 7.044}, {'end': 32010.658, 'text': 'So, for quick analysis and data exploration, you can go with matplotlib.', 'start': 32005.936, 'duration': 4.722}], 'summary': 'Use matplotlib for quick data analysis and exploration, providing quick access to bar charts, pie charts, histograms, line charts, and scatter plots.', 'duration': 27.382, 'max_score': 31983.276, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ31983276.jpg'}, {'end': 32221.551, 'src': 'embed', 'start': 32189.793, 'weight': 3, 'content': [{'end': 32193.495, 'text': 'How can you handle duplicate values in a data set for a variable in Python?', 'start': 32189.793, 'duration': 3.702}, {'end': 32199.578, 'text': 'So in this case you may have to write code and show the interviewer how you can achieve this.', 'start': 32193.815, 'duration': 5.763}, {'end': 32202.56, 'text': 'So you can just import the Pandas library,', 'start': 32199.818, 'duration': 2.742}, {'end': 32221.551, 'text': 'show that you are just reading a file using pd.read_csv, and you use the built-in DataFrame.duplicated() method to get the list of all the duplicate values within the data, and just show that you can also drop those rows if they were created by some mistake from the collector.', 'start': 32202.56, 'duration': 18.991}], 'summary': 'Handle duplicate values in a data set using Python and the Pandas library, dropping the duplicates if necessary.', 'duration': 31.758, 'max_score': 32189.793, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ32189793.jpg'}, {'end': 32313.714, 'src': 'embed', 'start': 32279.712, 'weight': 9, 'content': [{'end': 32281.512, 'text': 'so you have to start with the complete program.', 'start': 32279.712, 'duration': 1.8}, {'end': 32285.933, 'text': 'you have to show all the steps which are involved within the accuracy part.', 'start': 32281.512, 'duration': 4.421}, {'end': 32287.994, 'text': 'so you have to start importing your data.', 'start': 32285.933, 'duration': 2.061}, {'end': 32289.905, 'text': 'Just take some random data.', 'start': 32288.464, 'duration': 1.441}, {'end': 32299.128, 'text': 'You can just show that you are reading some data and try to separate your data into the X and Y, the
target data and the predictor data,', 'start': 32289.945, 'duration': 9.183}, {'end': 32302.97, 'text': 'and try to create a split within the data of train and test validations.', 'start': 32299.128, 'duration': 3.842}, {'end': 32306.191, 'text': 'You can use whatever ratio you want to use.', 'start': 32303.45, 'duration': 2.741}, {'end': 32313.714, 'text': 'You can use 80%, 70%, 50% as you like, but you need to give some justification to that also if your interviewer asks for it.', 'start': 32306.211, 'duration': 7.503}], 'summary': 'Demonstrate complete program with data import, splitting, and ratio justification.', 'duration': 34.002, 'max_score': 32279.712, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ32279712.jpg'}, {'end': 32427.879, 'src': 'embed', 'start': 32398.123, 'weight': 4, 'content': [{'end': 32403.404, 'text': 'So you can go for three simple steps where you can try to increase the accuracy of your model.', 'start': 32398.123, 'duration': 5.281}, {'end': 32409.365, 'text': 'First simple step would be try to see if you can make some tweaking into your probability cutoff.', 'start': 32403.624, 'duration': 5.741}, {'end': 32412.886, 'text': 'So the default probability cutoff is 50%.', 'start': 32409.626, 'duration': 3.26}, {'end': 32419.368, 'text': 'So the ones which are above the 50% are tagged as one and the ones below the 50% probability are tagged as zero.', 'start': 32412.886, 'duration': 6.482}, {'end': 32427.879, 'text': 'So if you are changing it to something like 0.8 and then checking the accuracy if it makes changes to your model, it is still good.', 'start': 32419.848, 'duration': 8.031}], 'summary': 'Increase model accuracy by adjusting probability cutoff, e.g. 
to 0.8.', 'duration': 29.756, 'max_score': 32398.123, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ32398123.jpg'}, {'end': 32506.886, 'src': 'embed', 'start': 32480.786, 'weight': 6, 'content': [{'end': 32488.554, 'text': 'we will now move on to the scenario based questions, where we will see some puzzle kind of things, and the other are real problems,', 'start': 32480.786, 'duration': 7.768}, {'end': 32493.52, 'text': 'where the interview is related to understand as how we use, try to solve those problems.', 'start': 32488.554, 'duration': 4.966}, {'end': 32494.541, 'text': "let's start with those.", 'start': 32493.52, 'duration': 1.021}, {'end': 32500.9, 'text': "So you're given a data set consisting of variables having more than 30% of missing values.", 'start': 32495.535, 'duration': 5.365}, {'end': 32506.886, 'text': "Let's say out of 50 variables, eight variables are missing values higher than 30%.", 'start': 32501.401, 'duration': 5.485}], 'summary': 'In scenario-based questions, 8 out of 50 variables have over 30% missing values.', 'duration': 26.1, 'max_score': 32480.786, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ32480786.jpg'}], 'start': 31528.544, 'title': 'Various data analysis concepts and techniques', 'summary': 'Covers topics such as ab testing for comparing model performance, cluster sampling and binary classification tree algorithm, python libraries for data analysis, and handling duplicate values and improving model accuracy in python, providing insights into statistical hypothesis testing, sampling methods, data analysis libraries, and model accuracy improvement techniques.', 'chapters': [{'end': 31609.234, 'start': 31528.544, 'title': 'Understanding ab testing', 'summary': "Explains ab testing as a statistical hypothesis testing used to compare different models' performance, measuring user behavior and outcomes to identify 
the model that provides better results, using examples from website functionalities.', 'duration': 80.69, 'highlights': ["AB testing is a statistical hypothesis testing comparing different models' performance, measuring user behavior and outcomes.", 'It involves introducing a new model to a limited user group, capturing their behavior, and using hypothesis testing like the t-test and ANOVA test to identify the better model.', 'Many websites use AB testing to introduce different functionalities to users and measure the impact on outcomes and revenues.']}, {'end': 31795.165, 'start': 31610.195, 'title': 'Cluster sampling and binary classification tree algorithm', 'summary': 'Explains cluster sampling and the binary classification tree algorithm, where the former involves randomly selecting clusters from a population, and the latter utilizes Gini and entropy parameters to decide on variable splits for nodes, aiming to lower Gini or entropy values.', 'duration': 184.97, 'highlights': ['Cluster sampling involves randomly selecting clusters from a population.', 'Binary classification tree algorithm uses Gini and entropy parameters to decide on variable splits for nodes.', 'Entropy is a measure of impurity or randomness within the data for binary classification.', 'Gini score is calculated for each node, and the feature with a lower Gini score is chosen for the splitting of nodes.']}, {'end': 32189.103, 'start': 31795.685, 'title': 'Python libraries for data analysis', 'summary': 'Discusses the core libraries used in Python for data analysis, including NumPy, SciPy, Pandas, and scikit-learn, as well as the differences between matplotlib, seaborn, and bokeh for data visualization.', 'duration': 393.418, 'highlights': ['NumPy, SciPy, Pandas, and scikit-learn are core libraries used for data analysis in Python, with NumPy serving as the numerical library for data storage and manipulation.', 'Matplotlib provides quick access to basic charts for data exploration, Seaborn is suitable for in-depth
analysis and provides additional types of graphs, while Bokeh is used for interactive visualization and web publishing.', 'The main difference between a pandas Series and a single-column DataFrame is that a Series can only store a single column, while a DataFrame can store multiple columns and offers additional functions.']}, {'end': 32748.684, 'start': 32189.793, 'title': 'Handling duplicate values and improving model accuracy in python', 'summary': 'Discusses handling duplicate values in Python using the Pandas library, and improving model accuracy by showing a basic machine learning program to check the accuracy of the data set and ways to improve it, followed by dealing with missing values in a dataset and an SQL query to make recommendations, showcasing technical and problem-solving skills.', 'duration': 558.891, 'highlights': ['The chapter discusses handling duplicate values in Python using the Pandas library and demonstrates methods to identify and remove duplicate values from a data set, showcasing technical proficiency in Python and data manipulation.', 'The chapter also showcases a basic machine learning program to check the accuracy of a dataset, emphasizing the importance of using performance metrics and differentiating between test set and training set.', 'Furthermore, the chapter provides insights into improving model accuracy by adjusting probability cutoff, identifying better features, and creating new features, demonstrating problem-solving and analytical skills in machine learning.', 'The chapter also addresses handling missing values in a dataset by suggesting creating a new feature, removing the data, or using clustering and distribution to handle missing values, showcasing problem-solving skills in data preprocessing.', 'Additionally, the chapter presents an SQL query scenario to make recommendations, highlighting the understanding of database queries and problem-solving abilities in a database context.']}], 'duration': 1220.14, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ31528544.jpg', 'highlights': ["AB testing measures user behavior and outcomes, comparing different models' performance.", 'Cluster sampling involves randomly selecting clusters from a population.', 'Python libraries for data analysis include NumPy, SciPy, Pandas, and scikit-learn.', 'Handling duplicate values in Python using the Pandas library is discussed.', 'Improving model accuracy by adjusting probability cutoff and identifying better features.', 'Binary classification tree algorithm uses Gini and entropy parameters for variable splits.', 'The chapter provides insights into improving model accuracy and handling missing values.', 'Websites use AB testing to introduce different functionalities to users and measure impact.', 'Matplotlib provides quick access to basic charts for data exploration.', 'The chapter showcases a basic machine learning program to check dataset accuracy.']}, {'end': 34699.991, 'segs': [{'end': 32797.754, 'src': 'embed', 'start': 32771.093, 'weight': 0, 'content': [{'end': 32777.515, 'text': 'and in follow up, what is the probability of making money from this game if you play it six times?', 'start': 32771.093, 'duration': 6.422}, {'end': 32786.177, 'text': 'so the first condition says if the sum of the values on the dice equals seven, then you win twenty-one dollars,', 'start': 32777.515, 'duration': 8.662}, {'end': 32789.697, 'text': 'but for all other cases you have to pay five dollars.', 'start': 32786.177, 'duration': 3.52}, {'end': 32797.754, 'text': 'So in this case, we first list all the possible cases, as we have fair six-sided dice and we have two of them', 'start': 32790.807, 'duration': 6.947}], 'summary': 'Probability of making money from game played 6 times is calculated using fair six-sided dice with specific conditions.', 'duration': 26.661, 'max_score': 32771.093, 'thumbnail':
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ32771093.jpg'}, {'end': 33047.828, 'src': 'embed', 'start': 33022.612, 'weight': 3, 'content': [{'end': 33031.72, 'text': 'The second question asks: what is the chance a user will be shown only a single ad in 100 stories?', 'start': 33022.612, 'duration': 9.108}, {'end': 33037.285, 'text': 'If you see this question, it is an example of the binomial distribution.', 'start': 33032.32, 'duration': 4.965}, {'end': 33042.39, 'text': 'So as we just saw in question three, the binomial distribution takes three parameters.', 'start': 33037.826, 'duration': 4.564}, {'end': 33047.828, 'text': 'First is the probability of success and failure, which in our case is 4%.', 'start': 33042.771, 'duration': 5.057}], 'summary': 'Binomial distribution: probability of single ad in 100 stories is 4%.', 'duration': 25.216, 'max_score': 33022.612, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ33022612.jpg'}, {'end': 33798.236, 'src': 'embed', 'start': 33770.537, 'weight': 1, 'content': [{'end': 33777.19, 'text': "Let's suppose that when you build a classification model, you achieve an accuracy of 96%.", 'start': 33770.537, 'duration': 6.653}, {'end': 33779.911, 'text': "Why shouldn't you be happy with your model performance?", 'start': 33777.19, 'duration': 2.721}, {'end': 33781.691, 'text': 'What can you do about it?', 'start': 33780.391, 'duration': 1.3}, {'end': 33787.393, 'text': 'So first thing: this question is mostly related to the domain of cancer detection.', 'start': 33782.191, 'duration': 5.202}, {'end': 33798.236, 'text': 'The interviewer is trying to gauge how well you understand how model accuracy behaves across different data distributions.', 'start': 33789.013, 'duration': 9.223}], 'summary': 'Building a classification model with 96% accuracy may not be sufficient, especially in
cancer detection.', 'duration': 27.699, 'max_score': 33770.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ33770537.jpg'}, {'end': 33983.772, 'src': 'embed', 'start': 33957.765, 'weight': 4, 'content': [{'end': 33962.846, 'text': 'You tried a time series regression model and got higher accuracy than the decision tree model.', 'start': 33957.765, 'duration': 5.081}, {'end': 33972.429, 'text': 'Can this happen? Why? So first thing as it is a time series data time series data mostly linear in nature.', 'start': 33963.426, 'duration': 9.003}, {'end': 33976.05, 'text': 'So as the next value would be related with the previous value.', 'start': 33972.849, 'duration': 3.201}, {'end': 33979.431, 'text': "So let's assume we want to look at the stock prices.", 'start': 33976.33, 'duration': 3.101}, {'end': 33983.772, 'text': 'So stock prices, which are today, are related with the yesterday.', 'start': 33979.451, 'duration': 4.321}], 'summary': 'Time series regression model achieved higher accuracy than decision tree due to linear nature of time series data.', 'duration': 26.007, 'max_score': 33957.765, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ33957765.jpg'}, {'end': 34301.914, 'src': 'embed', 'start': 34275.683, 'weight': 2, 'content': [{'end': 34282.449, 'text': "So you're asked to build a multiple regression model, but your model R square isn't as good as you wanted.", 'start': 34275.683, 'duration': 6.766}, {'end': 34285.372, 'text': 'For improvement, you remove the intercept term.', 'start': 34283.13, 'duration': 2.242}, {'end': 34287.674, 'text': 'Now your model R square becomes 0.8 from 0.3.', 'start': 34285.832, 'duration': 1.842}, {'end': 34289.075, 'text': 'Is it possible and how?', 'start': 34287.674, 'duration': 1.401}, {'end': 34293.568, 'text': 'So, first thing, what is R square?', 'start': 34291.747, 'duration': 1.821}, {'end': 34301.914, 
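The 96%-accuracy cancer-detection discussion above turns on confusion-matrix metrics. A small sketch with hypothetical counts (the numbers are illustrative, not from the video) showing why accuracy alone misleads on imbalanced classes, and why recall, precision, and F1 are the better yardsticks:

```python
def metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics for a binary classifier."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical screening data: 960 healthy patients, 40 with cancer.
# A lazy model that predicts "healthy" for everyone is 96% accurate
# yet catches zero cancer cases (recall = 0).
acc, prec, rec, f1 = metrics(tp=0, fp=0, fn=40, tn=960)
print(acc, rec)  # -> 0.96 0.0
```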
'text': "So R square says how your linear regression model, how much of the variance out of the data it's trying to explain?", 'start': 34294.009, 'duration': 7.905}], 'summary': 'Improving r square from 0.3 to 0.8 by removing intercept term in multiple regression model.', 'duration': 26.231, 'max_score': 34275.683, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ34275683.jpg'}], 'start': 32748.684, 'title': 'Probability and predictive modeling', 'summary': 'Covers various topics such as probability in a dice game, binomial distribution, subscription renewal prediction for dish tv using logistic regression and neural networks, cancer detection model performance, and model improvement in regression and recommendation systems.', 'chapters': [{'end': 33146.487, 'start': 32748.684, 'title': 'Probability and binomial distribution', 'summary': 'Discusses the probability of winning in a dice game, the probability of making money from the game when played six times, and the use of binomial distribution to calculate the probability of ads being shown in news stories.', 'duration': 397.803, 'highlights': ['The probability of making money from the dice game when played six times is analyzed, showing that winning is possible if two, three, four, five, or six games are won out of six, and this scenario can be simulated using binomial distribution.', 'Calculation of the expected number of ads shown in 100 new stories using two different options, which yields the same result of one ad per 25 stories or 4% chance per story.', 'Application of binomial distribution to calculate the probability of a user being shown only a single ad in 100 stories, and the probability of no ads being shown at all.']}, {'end': 33446.11, 'start': 33147.773, 'title': 'Subscription renewal prediction for dish tv', 'summary': 'Explains how to predict subscription renewal for dish tv, including the data needed, analysis to be done, and building 
predictive models using algorithms like logistic regression and neural networks.', 'duration': 298.337, 'highlights': ['The chapter explains the need for data to predict subscription renewal, including variables such as active channel hours, number of kids and adults in the household, and channel usage trends, aiding in understanding customer behavior and predicting future subscriptions.', 'The chapter discusses the analysis to be performed, focusing on classification analysis to categorize customers as likely to subscribe or not, as well as the building of predictive models using historical data and algorithms such as logistic regression and neural networks.', 'The chapter addresses the approach to solving a problem with limited data, using the example of mapping nicknames to real names by gathering context from sources like Twitter tweets or customer feedback to identify relationships and apply NLP algorithms to determine real names.']}, {'end': 34275.663, 'start': 33446.11, 'title': 'Probability, cancer detection, model performance', 'summary': 'Discusses probability in coin tosses, cancer detection model performance, and the use of pca for data reduction and explains concepts relevant to each topic.', 'duration': 829.553, 'highlights': ['The probability of the next coin toss being a head after observing 10 heads in a row is calculated using conditional and combined probability, resulting in a 0.7531 probability.', 'Explaining the impact of class distribution on model accuracy, the chapter illustrates that in cancer detection, accuracy may not be a reliable performance metric due to the imbalance in positive and negative cases, highlighting the need for alternative performance metrics like recall, precision, specificity, and F1 score.', 'Discussing the use of time series regression models for time series data, the chapter explains that the linear correlation inherent in time series data makes time series regression models more accurate than decision tree 
models.', 'Addressing the issue of low bias and high variance in models, the chapter presents techniques such as bagging algorithms, regularization, and feature importance to mitigate overfitting and reduce variance in models.', 'Clarifying the purpose of PCA, the chapter emphasizes that PCA is not solely for addressing multicollinearity but also for reducing the dimensionality of features by leveraging variance, irrespective of collinearity.']}, {'end': 34699.991, 'start': 34275.683, 'title': 'Model improvement, overfitting, and recommendation systems', 'summary': 'Discusses improving a regression model by removing the intercept term, resulting in a significant increase in r square, the issue of overfitting in a random forest model with 10,000 trees, and an overview of recommendation systems, including collaborative and content-based filtering.', 'duration': 424.308, 'highlights': ['The chapter discusses improving a regression model by removing the intercept term, resulting in a significant increase in R square.', 'The issue of overfitting in a random forest model with 10,000 trees is addressed, where training error is 0.0 but the validation error is 34.23.', 'An overview of recommendation systems, including collaborative and content-based filtering, is provided.']}], 'duration': 1951.307, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/GwIo3gDZCVQ/pics/GwIo3gDZCVQ32748684.jpg', 'highlights': ['Probability of making money from dice game analyzed using binomial distribution', 'Analysis of class distribution impact on model accuracy in cancer detection', 'Improving regression model by removing intercept term significantly increases R square', 'Application of binomial distribution to calculate probability of user being shown only a single ad in 100 stories', 'Use of time series regression models for time series data explained']}], 'highlights': ['By 2022, 40% new app projects need ML co-developers, generating $3.9 trillion revenue', 
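The R-square jump from 0.3 to 0.8 after dropping the intercept, discussed above, comes from the denominator of R² changing: with an intercept it measures variation around the mean, without one it measures variation around zero, so the score inflates without the fit improving. A sketch with synthetic data (assuming NumPy; the data-generating numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 5.0 + 0.3 * x + rng.normal(0.0, 1.0, 200)  # big intercept, weak slope

# With an intercept, R^2 compares residuals to the variation around the mean.
X1 = np.column_stack([np.ones_like(x), x])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
r2_with = 1 - np.sum((y - X1 @ beta1) ** 2) / np.sum((y - y.mean()) ** 2)

# Without an intercept, the usual formula compares residuals to the variation
# around ZERO, a much larger denominator, so R^2 inflates even though the
# fit through the origin is actually worse.
X0 = x.reshape(-1, 1)
beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
r2_without = 1 - np.sum((y - X0 @ beta0) ** 2) / np.sum(y ** 2)

print(r2_with < r2_without)  # True: the jump is an artifact of the formula
```

This is why the two R² values are not comparable, and why the apparent "improvement" should not be trusted.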
'Linear Discriminant Analysis identified as the most accurate model', 'Decision trees and random forest heavily rely on information gain and entropy, key in ML algorithms', 'The Central Limit Theorem states that the mean of each sample from a large population will be almost equal to the mean of the entire population', 'Confidence interval quantifies uncertainty in sample estimate, crucial in statistical analysis', 'The tutorial explains the implementation of linear regression using Python, achieving an R-squared value of 0.63', 'Logistic regression model for SUV purchases has 89% accuracy', 'The detailed explanation of random forest algorithm provides insight into its mechanism of compiling results from multiple decision trees', 'The Q learning process involves 1000 iterations to obtain the optimal route from a starting location to an end location', 'Average salary of $111,490 in the US and 7,19,646 rupees in India', "Machine learning engineer's salary ranges from $75,000 to $162,000 for an entry-level position and significantly increases for experienced engineers", 'KNN is a supervised technique for classification and regression', "AB testing measures user behavior and outcomes, comparing different models' performance", 'Probability of making money from dice game analyzed using binomial distribution', 'Improving regression model by removing intercept term significantly increases R square', 'Application of binomial distribution to calculate probability of user being shown only a single ad in 100 stories']}
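The dice-game question covered above (win $21 when two dice sum to 7, pay $5 otherwise, played six times) can be checked directly with the binomial distribution, as the transcript suggests: profit requires at least two wins out of six. A quick verification in Python:

```python
from math import comb

p_win = 6 / 36  # of 36 ordered outcomes of two dice, 6 sum to 7, so p = 1/6
ev_per_game = p_win * 21 - (1 - p_win) * 5
print(round(ev_per_game, 4))  # -> -0.6667: each play loses money on average

# Net winnings over six plays with w wins: 21*w - 5*(6 - w) = 26*w - 30,
# which is positive only for w >= 2. W follows Binomial(n=6, p=1/6).
p_profit = sum(comb(6, w) * p_win**w * (1 - p_win)**(6 - w)
               for w in range(2, 7))
print(round(p_profit, 4))  # -> 0.2632: about a 26% chance of ending ahead
```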