title
Live Day 4- Discussing Decision Tree And Ensemble Machine Learning Algorithms

description
Join the community session at https://ineuron.ai/course/Mega-Community , where all the materials will be uploaded. Playlist: https://www.youtube.com/watch?v=11unm2hmvOQ&list=PLZoTAELRMXVMgtxAboeAx-D9qbnY94Yay The OneNeuron Lifetime subscription has been extended. On the OneNeuron platform you will get access to 100+ courses (at least 20 courses will be added monthly, based on your demand). Features of the course: 1. You can raise any course demand (fulfilled within 45-60 days). 2. You can access the innovation lab from iNeuron. 3. You can use our incubation based on your ideas. 4. Live sessions coming soon (mostly till Feb). Use coupon code KRISH10 for an additional 10% discount. And many more. Enroll now. OneNeuron link: https://one-neuron.ineuron.ai/ Call our team directly in case of any queries: 8788503778 6260726925 9538303385 866003424

detail
{'title': 'Live Day 4- Discussing Decision Tree And Ensemble Machine Learning Algorithms', 'heatmap': [{'end': 3659.454, 'start': 3514.047, 'weight': 0.838}, {'end': 3847.616, 'start': 3793.771, 'weight': 0.701}], 'summary': 'Discusses decision tree algorithm, its application in regression and classification, analysis on tennis playing prediction and weather data, concepts of entropy, gini impurity, information gain, feature selection, and decision tree regressor, and practical implementation with the iris dataset.', 'chapters': [{'end': 401.149, 'segs': [{'end': 293.517, 'src': 'embed', 'start': 173.208, 'weight': 1, 'content': [{'end': 202.263, 'text': 'okay now is it audible okay now is it audible Perfect.', 'start': 173.208, 'duration': 29.055}, {'end': 203.104, 'text': 'Loud and clear.', 'start': 202.523, 'duration': 0.581}, {'end': 208.007, 'text': "Okay I'm extremely sorry.", 'start': 205.765, 'duration': 2.242}, {'end': 210.128, 'text': 'This mute button is there.', 'start': 208.327, 'duration': 1.801}, {'end': 213.99, 'text': 'So this mute button got pressed by mistake.', 'start': 211.369, 'duration': 2.621}, {'end': 218.893, 'text': 'Okay Perfect.', 'start': 218.313, 'duration': 0.58}, {'end': 223.956, 'text': 'So this is the agenda of this session.', 'start': 222.135, 'duration': 1.821}, {'end': 226.117, 'text': 'We will try to complete this all things.', 'start': 224.296, 'duration': 1.821}, {'end': 230.06, 'text': 'Again, here we are going to understand the mathematical equations and all.', 'start': 227.078, 'duration': 2.982}, {'end': 232.037, 'text': 'Okay Yeah.', 'start': 230.28, 'duration': 1.757}, {'end': 244.686, 'text': 'Okay Perfect.', 'start': 244.126, 'duration': 0.56}, {'end': 249.27, 'text': 'Okay So my voice is basically very much clear.', 'start': 246.348, 'duration': 2.922}, {'end': 260.257, 'text': 'So how was the experience with respect to day one, day two, day three, day three, all the sessions were good.', 'start': 249.79, 
'duration': 10.467}, {'end': 274.327, 'text': 'Right All the sessions, how was the experience? Good enough.', 'start': 266.843, 'duration': 7.484}, {'end': 281.11, 'text': 'Right I hope it was amazing for you all.', 'start': 276.088, 'duration': 5.022}, {'end': 283.591, 'text': 'You may have probably learned a lot.', 'start': 281.19, 'duration': 2.401}, {'end': 293.517, 'text': 'Yeah Great.', 'start': 289.214, 'duration': 4.303}], 'summary': 'Session agenda: understanding mathematical equations, positive feedback on previous sessions.', 'duration': 120.309, 'max_score': 173.208, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC4173208.jpg'}, {'end': 401.149, 'src': 'embed', 'start': 343.61, 'weight': 0, 'content': [{'end': 352.151, 'text': "why, i'll tell you because in the later stages, when we will be learning about um, when we will probably be learning about the ensemble techniques,", 'start': 343.61, 'duration': 8.541}, {'end': 355.772, 'text': 'it will definitely, like you, will be able to learn it a lot.', 'start': 352.151, 'duration': 3.621}, {'end': 359.933, 'text': 'okay, means the main accuracy, you know, which is basically there.', 'start': 355.772, 'duration': 4.161}, {'end': 363.354, 'text': "Don't worry about the practical part, guys, right now.", 'start': 360.553, 'duration': 2.801}, {'end': 365.115, 'text': 'For KNN and Naive Bayes, I know.', 'start': 363.514, 'duration': 1.601}, {'end': 372.417, 'text': 'Later on when I get time, I will directly create those things and probably upload a video or take a live session.', 'start': 365.695, 'duration': 6.722}, {'end': 375.898, 'text': "Okay So let's focus on the sessions now.", 'start': 372.457, 'duration': 3.441}, {'end': 385.061, 'text': 'Okay Okay.', 'start': 376.978, 'duration': 8.083}, {'end': 387.382, 'text': "So hit like and let's start.", 'start': 385.481, 'duration': 1.901}, {'end': 401.149, 'text': "Again some spammers, spammers are 
everywhere, everywhere spammers are there, can't help, okay fine not a problem, okay.", 'start': 389.036, 'duration': 12.113}], 'summary': 'Later stages will cover ensemble techniques; the practical part for KNN and Naive Bayes is deferred to a later video or live session.', 'duration': 57.539, 'max_score': 343.61, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC4343610.jpg'}], 'start': 173.208, 'title': 'Mathematical equations and decision tree algorithm overview', 'summary': 'Covers understanding mathematical equations and positive feedback, and introduces the decision tree algorithm, outlining its significance in solving use cases and its relevance to future learning of ensemble techniques. It also discusses the upcoming seven-day live session series, covering topics like EDA and deep learning.', 'chapters': [{'end': 293.517, 'start': 173.208, 'title': 'Mathematical equations session', 'summary': 'Focused on understanding mathematical equations and overall positive feedback from the participants, with a good learning experience.', 'duration': 120.309, 'highlights': ['All sessions received positive feedback from participants, indicating an overall good learning experience.', "The chapter's agenda was to understand mathematical equations and aimed to complete all related topics.", 'The speaker apologized for a technical difficulty: the mute button had been pressed by mistake.']}, {'end': 401.149, 'start': 293.817, 'title': 'Decision tree algorithm overview', 'summary': 'Discusses the upcoming sessions, including the plan for a seven-day live session series covering topics like EDA and deep learning, and specifically focuses on introducing the decision tree algorithm, highlighting its significance in solving various use cases and its relevance to future learning of ensemble techniques.', 'duration': 107.332, 'highlights': ['The upcoming sessions will include seven days 
live sessions, covering topics like EDA and deep learning, with the aim to cover as much as possible based on time constraints.', 'Introduction of the decision tree algorithm, emphasizing its significance in solving various use cases and its relevance to learning about ensemble techniques in the future.', 'Plans to address practical aspects like KNN and Naive Bayes in future sessions, with the intention to create and upload related content or conduct live sessions.']}], 'duration': 227.941, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC4173208.jpg', 'highlights': ['Introduction of the decision tree algorithm, emphasizing its significance in solving various use cases and its relevance to learning about ensemble techniques in the future.', 'The upcoming sessions will include seven days live sessions, covering topics like EDA and deep learning, with the aim to cover as much as possible based on time constraints.', 'All sessions received positive feedback from participants, indicating an overall good learning experience.', "The chapter's agenda was to understand mathematical equations and aimed to complete all related topics.", 'Plans to address practical aspects like KNN and Naive Bayes in future sessions, with the intention to create and upload related content or conduct live sessions.', 'The speaker apologized for a technical difficulty: the mute button had been pressed by mistake.']}, {'end': 785.319, 'segs': [{'end': 534.566, 'src': 'embed', 'start': 425.22, 'weight': 0, 'content': [{'end': 433.745, 'text': "this particular part, okay, and we'll try to understand by taking a lot of examples where we will be solving a lot of.", 'start': 425.22, 'duration': 8.525}, {'end': 437.827, 'text': 'well, we will take a specific data set and try to solve those problems.', 'start': 433.745, 'duration': 4.082}, {'end': 441.353, 
'text': 'okay, Now, coming to the decision tree, one thing you need to understand.', 'start': 437.827, 'duration': 3.526}, {'end': 444.393, 'text': "Let's say that I'm writing an if-else condition.", 'start': 441.813, 'duration': 2.58}, {'end': 451.135, 'text': "I'll say that if age is less than 8, let's say I'm writing this condition.", 'start': 445.013, 'duration': 6.122}, {'end': 458.757, 'text': "If age is less than or equal to 18, I'm going to say print go to college.", 'start': 451.875, 'duration': 6.882}, {'end': 465.198, 'text': "Let's say that here I'm printing print college.", 'start': 459.397, 'duration': 5.801}, {'end': 484.993, 'text': "And then I'll write else if age is greater than 18 and or let me write it down like this.", 'start': 468.598, 'duration': 16.395}, {'end': 494.342, 'text': 'Age is less than or equal to 35 and say print work.', 'start': 489.037, 'duration': 5.305}, {'end': 507.75, 'text': 'Then again I will write else, if age is, let me put this condition little bit better, okay.', 'start': 498.527, 'duration': 9.223}, {'end': 515.633, 'text': 'So that you will be able to understand why I am specifically writing this age, conditions and all, you will be able to understand, just a second.', 'start': 509.271, 'duration': 6.362}, {'end': 527.984, 'text': 'Okay Then I will write here, elif age is greater than 18.', 'start': 517.494, 'duration': 10.49}, {'end': 534.566, 'text': "age is less than or equal to 35, i'm going to say print work.", 'start': 527.984, 'duration': 6.582}], 'summary': 'Teaching decision tree with age-based conditions and examples.', 'duration': 109.346, 'max_score': 425.22, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC4425220.jpg'}], 'start': 401.57, 'title': 'Decision trees for regression and classification', 'summary': 'Discusses the purpose and implementation of decision trees, highlighting their use in solving regression and classification problems. 
it emphasizes understanding the concept through examples and practical application, and the implementation of the algorithm based on nested if-else conditions with a focus on solving classification and regression problems.', 'chapters': [{'end': 444.393, 'start': 401.57, 'title': 'Understanding decision trees for regression and classification', 'summary': 'Discusses the exact purpose of decision trees, emphasizing their use in solving regression and classification problems, with a focus on understanding the concept through examples and practical application.', 'duration': 42.823, 'highlights': ['The purpose of decision trees is to solve regression and classification problems, providing a practical solution to real-world data sets.', 'The session emphasizes understanding decision trees through examples and practical application.']}, {'end': 785.319, 'start': 445.013, 'title': 'Decision tree algorithm', 'summary': 'Discusses the implementation of a decision tree algorithm based on nested if-else conditions, illustrating the process with an example and emphasizing its application in solving classification and regression problems.', 'duration': 340.306, 'highlights': ['The chapter discusses the implementation of a decision tree algorithm based on nested if-else conditions The transcript explains the process of converting nested if-else conditions into a decision tree algorithm, demonstrating the steps involved in creating nodes and making decisions based on specific conditions.', 'Emphasizes its application in solving classification and regression problems The chapter highlights the practical application of the decision tree algorithm in solving classification and regression problems, showcasing its significance in problem-solving and decision-making.', 'Illustrates the process with an example The transcript provides a practical example to illustrate the process of creating decision trees based on age conditions, offering a clear demonstration of how the algorithm 
works in a real-world scenario.']}], 'duration': 383.749, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC4401570.jpg', 'highlights': ['The chapter emphasizes understanding decision trees through examples and practical application.', 'The purpose of decision trees is to solve regression and classification problems, providing a practical solution to real-world data sets.', 'The chapter highlights the practical application of the decision tree algorithm in solving classification and regression problems, showcasing its significance in problem-solving and decision-making.', 'The chapter discusses the implementation of a decision tree algorithm based on nested if-else conditions, demonstrating the steps involved in creating nodes and making decisions based on specific conditions.', 'The transcript provides a practical example to illustrate the process of creating decision trees based on age conditions, offering a clear demonstration of how the algorithm works in a real-world scenario.']}, {'end': 1076.178, 'segs': [{'end': 893.142, 'src': 'embed', 'start': 858.807, 'weight': 2, 'content': [{'end': 860.388, 'text': 'How many total number of yes are there?', 'start': 858.807, 'duration': 1.581}, {'end': 862.248, 'text': 'There, you will be able to find out.', 'start': 860.788, 'duration': 1.46}, {'end': 863.289, 'text': 'there are nine, yes.', 'start': 862.248, 'duration': 1.041}, {'end': 868.09, 'text': 'See one, two, three, four, five, six, seven, eight, nine.', 'start': 863.949, 'duration': 4.141}, {'end': 872.332, 'text': "And how many no's are there? 
One, two, three, four, five.", 'start': 868.59, 'duration': 3.742}, {'end': 873.472, 'text': 'I think one, two.', 'start': 872.812, 'duration': 0.66}, {'end': 876.328, 'text': 'three, four, five.', 'start': 874.266, 'duration': 2.062}, {'end': 884.635, 'text': "so nine yes and five no's are there okay, so what we are going to do in this specific thing?", 'start': 876.328, 'duration': 8.307}, {'end': 893.142, 'text': "now we have nine yes and five no's, and the first node that i have actually taken is basically outlook.", 'start': 884.635, 'duration': 8.507}], 'summary': "There are 9 'yes' and 5 'no's. the first node is 'outlook'.", 'duration': 34.335, 'max_score': 858.807, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC4858807.jpg'}, {'end': 1028.72, 'src': 'embed', 'start': 1002.429, 'weight': 0, 'content': [{'end': 1006.174, 'text': 'I will explain why it is taken as Outlook as a specifically.', 'start': 1002.429, 'duration': 3.745}, {'end': 1008.337, 'text': 'I have just randomly selected right now.', 'start': 1006.194, 'duration': 2.143}, {'end': 1009.539, 'text': 'I will talk about it.', 'start': 1008.538, 'duration': 1.001}, {'end': 1010.1, 'text': "Don't worry.", 'start': 1009.639, 'duration': 0.461}, {'end': 1014.166, 'text': "Let's say that I have randomly selected one feature which is Outlook.", 'start': 1010.701, 'duration': 3.465}, {'end': 1016.649, 'text': 'I will talk about it.', 'start': 1015.087, 'duration': 1.562}, {'end': 1019.714, 'text': 'why i have specifically.', 'start': 1017.532, 'duration': 2.182}, {'end': 1023.296, 'text': "why can't i went like see, it is up to it.", 'start': 1019.714, 'duration': 3.582}, {'end': 1026.758, 'text': 'it is up to the decision tree to select any of the feature here.', 'start': 1023.296, 'duration': 3.462}, {'end': 1028.72, 'text': 'i have specifically taken outlook.', 'start': 1026.758, 'duration': 1.962}], 'summary': 'Explaining the choice of outlook 
as a specific feature for discussion.', 'duration': 26.291, 'max_score': 1002.429, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41002429.jpg'}], 'start': 785.839, 'title': 'Decision tree analysis on tennis playing prediction', 'summary': "Focuses on using decision tree analysis to predict tennis playing based on input features and emphasizes the analysis of the outlook feature, whose root node holds all 9 'yes' and 5 'no' outcomes in the dataset, highlighting the importance of trust and focus in learning.", 'chapters': [{'end': 884.635, 'start': 785.839, 'title': 'Decision tree classification problem', 'summary': "Discusses a classification problem statement using a decision tree to predict whether a person will play tennis based on input features like outlook, temperature, humidity, and wind; the dataset contains 9 'yes' and 5 'no' outcomes, and outlook is taken as the first node.", 'duration': 98.796, 'highlights': ['The chapter introduces a classification problem statement for predicting if a person will play tennis using a decision tree.', 'The model aims to predict if a person will play tennis based on input features like outlook, temperature, humidity, and wind.', "The dataset contains 9 'yes' and 5 'no' outcomes (14 records), and outlook is taken as the first node of the tree."]}, {'end': 1076.178, 'start': 884.635, 'title': 'Decision tree analysis: outlook feature', 'summary': "Discusses the decision tree analysis focusing on the outlook feature, with three categories - sunny, overcast, and rain, and analyzes the count of 'yes' and 'no' outcomes for each category to see how pure each split is, emphasizing the importance of trust and focus in learning.", 'duration': 191.543, 'highlights': ['The chapter discusses the decision tree analysis focusing on the outlook feature and identifies three categories - sunny, overcast, and rain. 
The analysis focuses on the outlook feature and its three unique categories.', "Analyzing the count of 'yes' and 'no' outcomes for each category - sunny has 2 'yes' and 3 'no' outcomes. The analysis provides quantitative data on the count of 'yes' and 'no' outcomes for the sunny category.", 'Emphasizes the importance of trust and focus in learning and discourages distractions during the session. The chapter stresses the significance of trust and focus in the learning process, discouraging distractions to avoid skipping important information.']}], 'duration': 290.339, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC4785839.jpg', 'highlights': ['The model aims to predict if a person will play tennis based on input features like outlook, temperature, humidity, and wind.', 'The chapter introduces a classification problem statement for predicting if a person will play tennis using a decision tree.', "Analyzing the count of 'yes' and 'no' outcomes for each category - sunny has 2 'yes' and 3 'no' outcomes.", "The dataset contains 9 'yes' and 5 'no' outcomes (14 records), and outlook is taken as the first node of the tree.", 'The chapter discusses the decision tree analysis focusing on the outlook feature and identifies three categories - sunny, overcast, and rain.']}, {'end': 1419.18, 'segs': [{'end': 1170.493, 'src': 'embed', 'start': 1136.714, 'weight': 0, 'content': [{'end': 1140.037, 'text': 'Right. 
So here you can basically say that in rain.', 'start': 1136.714, 'duration': 3.323}, {'end': 1144.72, 'text': "in the case of rain, if I take an example, how many number of yes and no's are there?", 'start': 1140.037, 'duration': 4.683}, {'end': 1145.981, 'text': 'You just tell me now.', 'start': 1145.1, 'duration': 0.881}, {'end': 1154.285, 'text': "It will be 3 yes and 2 no's.", 'start': 1151.383, 'duration': 2.902}, {'end': 1157.006, 'text': 'Okay Understand.', 'start': 1155.305, 'duration': 1.701}, {'end': 1158.467, 'text': 'Understanding algorithm.', 'start': 1157.126, 'duration': 1.341}, {'end': 1160.788, 'text': 'Once you understand it.', 'start': 1159.687, 'duration': 1.101}, {'end': 1163.709, 'text': 'Then everything you will be able to understand.', 'start': 1161.488, 'duration': 2.221}, {'end': 1165.73, 'text': 'Okay Okay.', 'start': 1164.29, 'duration': 1.44}, {'end': 1168.812, 'text': 'Perfect Now these all are there.', 'start': 1165.75, 'duration': 3.062}, {'end': 1170.493, 'text': 'Okay Now see this.', 'start': 1169.092, 'duration': 1.401}], 'summary': "In the case of rain, there are 3 yes and 2 no's.", 'duration': 33.779, 'max_score': 1136.714, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41136714.jpg'}, {'end': 1353.854, 'src': 'embed', 'start': 1302.485, 'weight': 3, 'content': [{'end': 1307.47, 'text': "And again, how do we calculate that which feature we should take next? I'll discuss about it.", 'start': 1302.485, 'duration': 4.985}, {'end': 1311.054, 'text': "Let's say that after this, I take up temperature.", 'start': 1307.87, 'duration': 3.184}, {'end': 1315.175, 'text': 'I take up temperature and I start splitting again.', 'start': 1312.714, 'duration': 2.461}, {'end': 1318.336, 'text': 'okay?. 
Since this is impure, okay?', 'start': 1315.175, 'duration': 3.161}, {'end': 1325.239, 'text': 'And this split will happen until we get finally a pure split, okay?', 'start': 1318.656, 'duration': 6.583}, {'end': 1327.06, 'text': 'Similarly, with respect to rain,', 'start': 1325.559, 'duration': 1.501}, {'end': 1336.603, 'text': "we will go ahead and take another feature and we'll keep on splitting unless and until we get a leaf node which is completely pure, okay?", 'start': 1327.06, 'duration': 9.543}, {'end': 1341.145, 'text': 'So I hope you understood how this exactly works, okay?', 'start': 1337.183, 'duration': 3.962}, {'end': 1344.43, 'text': 'Now, two questions, two questions.', 'start': 1342.009, 'duration': 2.421}, {'end': 1345.991, 'text': 'is that, Krish?', 'start': 1344.43, 'duration': 1.561}, {'end': 1353.854, 'text': 'the first thing is that how do we calculate this purity and how do we come to know that this is a pure split?', 'start': 1345.991, 'duration': 7.863}], 'summary': 'Discussing feature selection and split purity in decision tree modeling.', 'duration': 51.369, 'max_score': 1302.485, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41302485.jpg'}], 'start': 1076.178, 'title': 'Decision tree analysis and purity', 'summary': "Presents decision tree analysis on weather data, revealing that overcast has 4 'yes' and 0 'no's, followed by rain with 3 'yes' and 2 'no's, and sunny with 2 'yes' and 3 'no's. 
It also discusses pure leaf nodes, feature selection based on impurity, and the use of entropy and Gini impurity to determine purity, as well as information gain for feature selection.", 'chapters': [{'end': 1228.381, 'start': 1076.178, 'title': 'Decision tree analysis on weather data', 'summary': "Presents a decision tree analysis on weather data: overcast is a pure node with 4 'yes' and 0 'no's, while rain with 3 'yes' and 2 'no's and sunny with 2 'yes' and 3 'no's are impure.", 'duration': 152.203, 'highlights': ["Overcast has 4 'yes' and 0 'no's, so it is a pure node and needs no further splitting.", "Rain has 3 'yes' and 2 'no's, so the node is impure and must be split further.", "Sunny has 2 'yes' and 3 'no's, so it is just as impure as the rain node and must also be split further."]}, {'end': 1419.18, 'start': 1228.381, 'title': 'Decision tree split and purity', 'summary': 'Discusses the concept of pure leaf nodes in decision trees, the process of feature selection based on impurity, and the use of entropy and Gini impurity to determine purity, as well as information gain for feature selection.', 'duration': 190.799, 'highlights': ["The concept of pure leaf nodes in decision trees is explained, where a node is considered pure if it contains all 'yes' or all 'no', leading to a final decision without further splitting. 
The concept of pure leaf nodes in decision trees is explained, where a node is considered pure if it contains all 'yes' or all 'no', leading to a final decision without further splitting.", 'The process of feature selection based on impurity is discussed, where impure nodes lead to further splitting until a pure leaf node is achieved.', 'The use of entropy and Gini impurity to determine purity in decision trees is mentioned, highlighting their role in identifying pure splits and leaf nodes.', 'The concept of information gain for feature selection is introduced as a method to solve the problem of feature selection in decision trees.']}], 'duration': 343.002, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41076178.jpg', 'highlights': ["Overcast has the highest purity with 4 'yes' and 0 'no's.", "Rain has 3 'yes' and 2 'no's, indicating impurity.", "Sunny has 2 'yes' and 3 'no's, also indicating impurity.", 'Pure leaf nodes lead to final decisions without further splitting.', 'Feature selection is based on impurity and leads to pure leaf nodes.', 'Entropy and Gini impurity determine purity in decision trees.', 'Information gain is used for feature selection in decision trees.']}, {'end': 2468.5, 'segs': [{'end': 1448.725, 'src': 'embed', 'start': 1419.6, 'weight': 0, 'content': [{'end': 1426.523, 'text': "Okay So now let's go ahead and let's understand about entropy or Gini coefficient or information gain.", 'start': 1419.6, 'duration': 6.923}, {'end': 1434.309, 'text': 
'Okay So I hope, can I get a confirmation you have understood till here? Yes.', 'start': 1426.963, 'duration': 7.346}, {'end': 1437.352, 'text': "I hope everybody's understood till here.", 'start': 1435.751, 'duration': 1.601}, {'end': 1441.5, 'text': 'entropy or guinea coefficient?', 'start': 1439.378, 'duration': 2.122}, {'end': 1443.882, 'text': "oh sorry, guinea coefficient, i'm saying guinea impurity.", 'start': 1441.5, 'duration': 2.382}, {'end': 1448.725, 'text': "also, you can say over here i'll write it as guinea impurity, not coefficient.", 'start': 1443.882, 'duration': 4.843}], 'summary': 'Discussion on entropy, gini coefficient, and information gain.', 'duration': 29.125, 'max_score': 1419.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41419600.jpg'}, {'end': 1529.703, 'src': 'embed', 'start': 1502.136, 'weight': 4, 'content': [{'end': 1506.7, 'text': 'Again, guys, I am not going to derive this formula because you need to understand how the decision tree is actually working.', 'start': 1502.136, 'duration': 4.564}, {'end': 1511.504, 'text': 'I will discuss about each and every component that is present in the formula.', 'start': 1507.621, 'duration': 3.883}, {'end': 1527.439, 'text': 'So, H of S is equal to minus P plus, I will talk about what is minus, what is P plus, log base 2, P plus, minus P minus, log base 2 p minus.', 'start': 1511.584, 'duration': 15.855}, {'end': 1529.703, 'text': 'So this is the formula for entropy.', 'start': 1527.499, 'duration': 2.204}], 'summary': 'Explanation of the entropy formula and its components in decision tree analysis.', 'duration': 27.567, 'max_score': 1502.136, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41502136.jpg'}, {'end': 1651.765, 'src': 'embed', 'start': 1619.61, 'weight': 3, 'content': [{'end': 1620.811, 'text': "Let's take this example.", 'start': 1619.61, 'duration': 1.201}, {'end': 
1626.728, 'text': 'How do we calculate the entropy of this? So I have already shown you the entropy formula over here.', 'start': 1620.911, 'duration': 5.817}, {'end': 1629.49, 'text': "Now let's understand the components.", 'start': 1626.848, 'duration': 2.642}, {'end': 1635.033, 'text': 'Okay I will write H of S is equal to minus sign is there.', 'start': 1629.85, 'duration': 5.183}, {'end': 1636.815, 'text': 'What is P plus?', 'start': 1635.914, 'duration': 0.901}, {'end': 1644.78, 'text': 'P plus basically means that what is the probability of yes?', 'start': 1636.935, 'duration': 7.845}, {'end': 1648.442, 'text': 'What is the probability of yes?', 'start': 1646.381, 'duration': 2.061}, {'end': 1651.765, 'text': 'This is a simple thing for you all out of this.', 'start': 1648.603, 'duration': 3.162}], 'summary': 'Calculating entropy using probability and entropy formula.', 'duration': 32.155, 'max_score': 1619.61, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41619610.jpg'}, {'end': 1886.327, 'src': 'embed', 'start': 1851.508, 'weight': 1, 'content': [{'end': 1863.26, 'text': "What will be the entropy of this node? What will be the entropy of this node? Okay, what is the entropy of this node? Let's calculate.", 'start': 1851.508, 'duration': 11.752}, {'end': 1864.401, 'text': "So here I'm going to write.", 'start': 1863.441, 'duration': 0.96}, {'end': 1866.862, 'text': "So here I'm going to just make a graph.", 'start': 1865.121, 'duration': 1.741}, {'end': 1875.344, 'text': 'H of S minus, what is P plus? 
P plus is nothing but 3 by 6.', 'start': 1867.202, 'duration': 8.142}, {'end': 1878.946, 'text': 'Log base 2, 3 by 6.', 'start': 1875.344, 'duration': 3.602}, {'end': 1883.903, 'text': "Minus, 3 no's are there, 3 by 6.", 'start': 1878.946, 'duration': 4.957}, {'end': 1886.327, 'text': 'log base 2 3 by 6.', 'start': 1883.903, 'duration': 2.424}], 'summary': 'Entropy of node to be calculated using given probabilities.', 'duration': 34.819, 'max_score': 1851.508, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41851508.jpg'}], 'start': 1419.6, 'title': 'Decision tree concepts', 'summary': 'Covers entropy, Gini impurity, and information gain in decision tree classification, including formulas and examples. It emphasizes understanding how a decision tree works and provides an example resulting in an entropy value of approximately 0.94 for the root node.', 'chapters': [{'end': 1557.907, 'start': 1419.6, 'title': 'Entropy and gini impurity in decision trees', 'summary': 'Discusses the concepts of entropy and Gini impurity in decision tree classification, including their formulas and when to use them, and emphasizes understanding the working of decision trees.', 'duration': 138.307, 'highlights': ['The chapter emphasizes the understanding of the working of decision trees and discusses the concepts of entropy and Gini impurity in decision tree classification.', 'The formula for entropy is H(S) = -(p+) log2(p+) - (p-) log2(p-), where p+ and p- are the probabilities of yes and no, and the default decision tree classification uses Gini impurity.', 'The formula for Gini impurity is 1 - sum over i = 1..n of (p_i)^2.']}, {'end': 2253.325, 'start': 1559.228, 'title': 'Entropy calculation and feature selection', 'summary': 'Discusses the calculation of entropy to determine the purity of splits in decision tree nodes, with an example demonstrating how to calculate entropy and the resulting graph, and then delves into the problem of feature selection for further 
splitting.', 'duration': 694.097, 'highlights': ["The chapter begins by demonstrating the calculation of entropy for a decision tree node with a split of 3 Yes and 3 No's, resulting in an entropy of 1, indicating a 50-50% probability of Yes and No, and explains that entropy will always be between 0 and 1, where a value of 0 represents a pure split and 1 represents an impure split.", 'The discussion then moves to graphically representing entropy based on the probability of Yes and No, illustrating that a 50-50% probability results in an entropy of 1, while a probability of 0 or 1 results in an entropy of 0, providing a clear understanding of the relationship between probability and entropy.', 'Further, the chapter addresses the problem of feature selection for splitting, presenting an example with multiple features and categories, and raises the question of how to decide which feature to take for splitting, setting the stage for the subsequent discussion on feature selection.']}, {'end': 2468.5, 'start': 2254.165, 'title': 'Information gain computation', 'summary': 'Explains the calculation of information gain using the entropy formula and provides an example of computing information gain for a specific split, resulting in an entropy value of approximately 0.94 for the root node.', 'duration': 214.335, 'highlights': ['The chapter explains the calculation of information gain using the entropy formula. It provides a step-by-step explanation of how to calculate information gain using the entropy formula, demonstrating the process of computing entropy for the root node and emphasizing the importance of entropy in information gain computation.', "Example of computing information gain for a specific split resulting in an entropy value of approximately 0.94 for the root node. 
It presents an example of computing information gain for a specific split, detailing the values of 'yes' and 'no' categories and guiding through the process of calculating entropy for the root node, which yields an entropy value of approximately 0.94."]}], 'duration': 1048.9, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC41419600.jpg', 'highlights': ['The chapter emphasizes the understanding of the working of decision trees and discusses the concepts of entropy and Gini impurity in decision tree classification.', 'The formula for entropy, H of S, is given by H of S = -P+log2(P+)-P-log2(P-), and the default decision tree classification uses Gini impurity.', "The chapter begins by demonstrating the calculation of entropy for a decision tree node with a split of 3 Yes and 3 No's, resulting in an entropy of 1, indicating a 50-50% probability of Yes and No, and explains that entropy will always be between 0 and 1, where a value of 0 represents a pure split and 1 represents an impure split.", 'The chapter explains the calculation of information gain using the entropy formula. It provides a step-by-step explanation of how to calculate information gain using the entropy formula, demonstrating the process of computing entropy for the root node and emphasizing the importance of entropy in information gain computation.', "Example of computing information gain for a specific split resulting in an entropy value of approximately 0.94 for the root node. It presents an example of computing information gain for a specific split, detailing the values of 'yes' and 'no' categories and guiding through the process of calculating entropy for the root node, which yields an entropy value of approximately 0.94."]}, {'end': 2886.596, 'segs': [{'end': 2511.269, 'src': 'embed', 'start': 2469.14, 'weight': 2, 'content': [{'end': 2474.401, 'text': "What is S of V and what is S and what is H of SV? 
It's very, very simple.", 'start': 2469.14, 'duration': 5.261}, {'end': 2477.282, 'text': 'Okay Very, very simple.', 'start': 2475.861, 'duration': 1.421}, {'end': 2478.342, 'text': 'I will discuss about it.', 'start': 2477.502, 'duration': 0.84}, {'end': 2500.846, 'text': 'okay. now, very important.', 'start': 2497.685, 'duration': 3.161}, {'end': 2501.606, 'text': 'just have a look.', 'start': 2500.846, 'duration': 0.76}, {'end': 2505.867, 'text': 'okay, everybody see this graph.', 'start': 2501.606, 'duration': 4.261}, {'end': 2507.408, 'text': 'okay, see this graph.', 'start': 2505.867, 'duration': 1.541}, {'end': 2509.008, 'text': 'i will talk about h of sv.', 'start': 2507.408, 'duration': 1.6}, {'end': 2511.269, 'text': "first of all, i'll talk about h of sv.", 'start': 2509.008, 'duration': 2.261}], 'summary': 'Discussion on s, v, and h of sv, with emphasis on simplicity and importance.', 'duration': 42.129, 'max_score': 2469.14, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC42469140.jpg'}, {'end': 2570.409, 'src': 'embed', 'start': 2538.603, 'weight': 0, 'content': [{'end': 2543.927, 'text': 'okay. so here again you will write tell me minus 6 by 8.', 'start': 2538.603, 'duration': 5.324}, {'end': 2549.002, 'text': 'log base 2 6 by 8.', 'start': 2543.927, 'duration': 5.075}, {'end': 2551.803, 'text': 'minus 2 by 8.', 'start': 2549.002, 'duration': 2.801}, {'end': 2554.084, 'text': 'log base 2 2 by 8.', 'start': 2551.803, 'duration': 2.281}, {'end': 2557.244, 'text': 'i hope everybody knows this, how we got it.', 'start': 2554.084, 'duration': 3.16}, {'end': 2559.365, 'text': 'okay. 
so please write it down.', 'start': 2557.244, 'duration': 2.121}, {'end': 2562.946, 'text': "i'll try to write it down again for you so that your simplification.", 'start': 2559.365, 'duration': 3.581}, {'end': 2566.167, 'text': 'and again people will say, chris, no, i did not understand, okay.', 'start': 2562.946, 'duration': 3.221}, {'end': 2570.409, 'text': "so Let's see this, okay?", 'start': 2566.167, 'duration': 4.242}], 'summary': 'Solving logarithmic expressions and simplifying fractions.', 'duration': 31.806, 'max_score': 2538.603, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC42538603.jpg'}], 'start': 2469.14, 'title': 'Entropy calculation and feature selection', 'summary': 'Covers the entropy calculation for categories and categorical data, emphasizing practical understanding and application. it also discusses the process of selecting the best feature for splitting based on information gain values.', 'chapters': [{'end': 2645.228, 'start': 2469.14, 'title': 'Entropy calculation for categories', 'summary': 'Discusses the calculation of entropy for categories in the context of s of v and h of sv, involving specific calculations and explanations with a focus on practical understanding and application.', 'duration': 176.088, 'highlights': ['The process involves calculating the entropy of category 1 and category 2 separately by using specific formulas, resulting in the values of 0.81 for H of C1 and 1 for H of C2.', 'The calculation method includes determining the entropy for each category using the formula: - (p1 * log2(p1)) - (p2 * log2(p2)), where p1 and p2 represent the probabilities of different outcomes within each category.', 'The speaker emphasizes the importance of practical understanding by encouraging the audience to actively participate in the calculation process and reinforcing the concept through interactive questioning.']}, {'end': 2743.252, 'start': 2645.248, 'title': 'Entropy 
calculation for categorical data', 'summary': 'Explains the calculation of entropy for categorical data using a specific equation, with detailed steps and numerical examples for sample and category calculations.', 'duration': 98.004, 'highlights': ['The equation for calculating entropy for categorical data involves the summation of the ratio of samples for each category multiplied by the entropy of that category, illustrated with the specific example of 8 samples out of 14 for category 1 with an entropy of 0.81.', "Numerical examples are provided for calculating the number of samples for each category, with a specific instance of 9 'yes' and 5 'no' samples out of a total of 14 samples, resulting in an 8/14 ratio for category 1 and a 6/14 ratio for category 2.", 'Explanation is given regarding the multiplication of the sample ratio with the entropy of each category, with the relevant entropy values provided, such as 0.81 for category 1 and 1 for category 2, forming the components of the entropy calculation equation.']}, {'end': 2886.596, 'start': 2743.252, 'title': 'Selecting the best feature for splitting', 'summary': 'Discusses the process of selecting the best feature for splitting, comparing the information gain of feature 1 and feature 2, and concludes that feature 2 should be used for splitting based on the calculated gain values.', 'duration': 143.344, 'highlights': ['The information gain of S, F2 is greater than the gain of S, F1, indicating that feature 2 should be used for splitting first.', 'The gain for S, feature two is 0.00051, while the gain for S, F1 is 0.041, demonstrating the difference in information gain between the two features.', 'Encouragement for audience interaction by requesting likes and stating a reward for reaching a certain number of likes.']}], 'duration': 417.456, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC42469140.jpg', 'highlights': ['The process involves calculating the 
entropy of category 1 and category 2 separately using specific formulas, resulting in the values of 0.81 for H of C1 and 1 for H of C2.', 'The equation for calculating entropy for categorical data involves the summation of the ratio of samples for each category multiplied by the entropy of that category, illustrated with the specific example of 8 samples out of 14 for category 1 with an entropy of 0.81.', 'The information gain of S, F2 is greater than the gain of S, F1, indicating that feature 2 should be used for splitting first.']}, {'end': 3544.136, 'segs': [{'end': 3056.991, 'src': 'embed', 'start': 3023.033, 'weight': 0, 'content': [{'end': 3030.235, 'text': 'Right? Then I will say 1 by, 1 by 4 plus 1 by 4 is nothing but 2 by 4, which is nothing but 1 by 2.', 'start': 3023.033, 'duration': 7.202}, {'end': 3032.779, 'text': 'So I will be getting 0.5.', 'start': 3030.235, 'duration': 2.544}, {'end': 3037.381, 'text': 'Now here you understand this is a complete impure split, right?', 'start': 3032.779, 'duration': 4.602}, {'end': 3045.405, 'text': 'If you have an impure split in entropy, the output, you are getting it as 1, right?', 'start': 3038.682, 'duration': 6.723}, {'end': 3056.991, 'text': 'Whereas in the case of Gini impurity, you are getting how much? 
As 0, sorry, 0.5..', 'start': 3046.006, 'duration': 10.985}], 'summary': 'Impure split results in 0.5 in gini impurity, 1 in entropy.', 'duration': 33.958, 'max_score': 3023.033, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43023033.jpg'}, {'end': 3308.435, 'src': 'embed', 'start': 3277.035, 'weight': 2, 'content': [{'end': 3279.556, 'text': 'Entropy should be used when less number of features are there.', 'start': 3277.035, 'duration': 2.521}, {'end': 3282.277, 'text': 'Gini should be used when huge number of features are there.', 'start': 3280.036, 'duration': 2.241}, {'end': 3283.017, 'text': 'Very simple.', 'start': 3282.437, 'duration': 0.58}, {'end': 3286.218, 'text': 'Okay Okay.', 'start': 3284.357, 'duration': 1.861}, {'end': 3288.579, 'text': "Let's say that I have F1 and output.", 'start': 3286.478, 'duration': 2.101}, {'end': 3294.387, 'text': "Okay So this F1, let's say that I have values like I've sorted order values.", 'start': 3288.879, 'duration': 5.508}, {'end': 3297.23, 'text': "I'm sorting these features and basically doing this.", 'start': 3294.447, 'duration': 2.783}, {'end': 3300.373, 'text': "Let's say that initially I have these features like this.", 'start': 3297.29, 'duration': 3.083}, {'end': 3308.435, 'text': "Okay And let's say I have values like 2.3, 1.3, 4, 5, 7, 3.", 'start': 3300.613, 'duration': 7.822}], 'summary': 'Use entropy for fewer features, gini for more features.', 'duration': 31.4, 'max_score': 3277.035, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43277035.jpg'}, {'end': 3456.186, 'src': 'embed', 'start': 3426.45, 'weight': 3, 'content': [{'end': 3433.074, 'text': 'So in this now you will be having two records which will basically say how many yes and no are there and remaining all records will come over here.', 'start': 3426.45, 'duration': 6.624}, {'end': 3436.996, 'text': 'Then again information gain 
will be computed here.', 'start': 3434.334, 'duration': 2.662}, {'end': 3440.921, 'text': 'information gain will be computed.', 'start': 3439.44, 'duration': 1.481}, {'end': 3443.381, 'text': "Then again, what will happen? They'll go to the next record.", 'start': 3440.961, 'duration': 2.42}, {'end': 3449.484, 'text': "Then again, they'll create another feature where they'll say less than or equal to 3, and they will create this many nodes.", 'start': 3443.782, 'duration': 5.702}, {'end': 3456.186, 'text': "Again, they'll try to understand how many yes or no are there, and then they'll again compute the information gain.", 'start': 3450.324, 'duration': 5.862}], 'summary': "The process involves computing information gain for different features and their nodes to understand the distribution of 'yes' and 'no' outcomes.", 'duration': 29.736, 'max_score': 3426.45, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43426450.jpg'}], 'start': 2886.596, 'title': 'Calculating information gain and gini vs entropy', 'summary': "Covers the process of calculating information gain and gini impurity, with an explanation of gini impurity's formula and the difference between gini impurity and entropy, providing insights on their applications in decision trees.", 'chapters': [{'end': 3021.333, 'start': 2886.596, 'title': 'Calculating information gain and using gini impurity', 'summary': 'Covers the process of calculating information gain by evaluating the paths and selecting the one with the highest information gain, followed by an explanation of gini impurity and its formula 1 - σp^2, where p is the probability of each output, enabling the calculation of impurity in decision trees.', 'duration': 134.737, 'highlights': ['The process of calculating information gain involves evaluating all paths and selecting the one with the highest information gain.', 'Explanation of Gini impurity and its formula 1 - Σp^2, where p is the 
probability of each output, enabling the calculation of impurity in decision trees.', 'The Gini impurity formula 1 - Σp^2 is used for calculating impurity in decision trees, with p representing the probability of each output.', 'The Gini impurity formula 1 - Σp^2 is particularly useful for decision trees, as it allows for the calculation of impurity based on the probabilities of different outputs.']}, {'end': 3544.136, 'start': 3023.033, 'title': 'Gini vs entropy in decision tree', 'summary': 'Explains the difference between gini impurity and entropy, highlighting that gini impurity is faster and should be used for a large number of features, while entropy is slower and should be used for a smaller set of features, with a detailed explanation of how continuous features are handled in decision trees.', 'duration': 521.103, 'highlights': ['Gini impurity is faster and should be used for a large number of features. Gini impurity is faster than entropy, making it suitable for decision trees with a large number of features, such as 100 or 200, as it requires less time for execution.', 'Entropy is slower and should be used for a smaller set of features. Entropy is slower than Gini impurity, making it suitable for decision trees with a smaller set of features, as it takes more time for execution, especially when dealing with a huge number of features.', 'Explanation of how continuous features are handled in decision trees. 
The speaker provides a detailed explanation of how decision trees handle continuous features, including sorting values, calculating information gain, and selecting the best split based on information gain, offering a clear understanding of the process.']}], 'duration': 657.54, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC42886596.jpg', 'highlights': ['The process of calculating information gain involves evaluating all paths and selecting the one with the highest information gain.', 'Explanation of Gini impurity and its formula 1 - Σp^2, where p is the probability of each output, enabling the calculation of impurity in decision trees.', 'Gini impurity is faster and should be used for a large number of features. Gini impurity is faster than entropy, making it suitable for decision trees with a large number of features, such as 100 or 200, as it requires less time for execution.', 'Explanation of how continuous features are handled in decision trees. 
The speaker provides a detailed explanation of how decision trees handle continuous features, including sorting values, calculating information gain, and selecting the best split based on information gain, offering a clear understanding of the process.']}, {'end': 3926.864, 'segs': [{'end': 3602.9, 'src': 'embed', 'start': 3570.255, 'weight': 2, 'content': [{'end': 3580.918, 'text': 'if you remember from our logistic linear regression, how do we calculate 1 by 2 m summation of i is equal to 1 to n, y hat minus y whole square.', 'start': 3570.255, 'duration': 10.663}, {'end': 3587.307, 'text': 'Y hat of I minus Y whole squared.', 'start': 3584.425, 'duration': 2.882}, {'end': 3588.868, 'text': 'This is what is mean squared error.', 'start': 3587.627, 'duration': 1.241}, {'end': 3598.396, 'text': 'So what it will do, first based on F1 feature, it will try to assign a mean value and then it will compute the MSE value.', 'start': 3589.749, 'duration': 8.647}, {'end': 3602.9, 'text': 'Okay And then it will go ahead and do the splitting.', 'start': 3599.217, 'duration': 3.683}], 'summary': 'Mean squared error calculates the MSE value based on feature f1 for splitting.', 'duration': 32.645, 'max_score': 3570.255, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43570255.jpg'}, {'end': 3847.616, 'src': 'heatmap', 'start': 3793.771, 'weight': 0.701, 'content': [{'end': 3801.318, 'text': 'okay, now, if i talk about hyperparameter, see, this is what is the formula that gets applied over MSE?', 'start': 3793.771, 'duration': 7.547}, {'end': 3808.923, 'text': "okay, Now, let's see in this hyperparameter, always understand, decision tree leads to overfitting,", 'start': 3801.318, 'duration': 7.605}, {'end': 3813.025, 'text': 'because we are just going to divide the nodes to whatever level we want.', 'start': 3808.923, 'duration': 4.102}, {'end': 3817.627, 'text': 'Okay So this obviously will lead to overfitting.', 'start': 
3814.446, 'duration': 3.181}, {'end': 3822.709, 'text': 'Now in order to prevent overfitting, we perform two important steps.', 'start': 3818.888, 'duration': 3.821}, {'end': 3829.792, 'text': 'One is post pruning and one is pre pruning.', 'start': 3822.909, 'duration': 6.883}, {'end': 3837.952, 'text': "So this two post-pruning and pre-pruning is a condition, let's say that I have done some splits.", 'start': 3832.93, 'duration': 5.022}, {'end': 3840.673, 'text': 'I have done some splits.', 'start': 3839.753, 'duration': 0.92}, {'end': 3843.594, 'text': "Let's say over here, I have seven yes and two no.", 'start': 3841.173, 'duration': 2.421}, {'end': 3847.616, 'text': 'And again, probably I do the further split like this.', 'start': 3845.215, 'duration': 2.401}], 'summary': 'Hyperparameter formula applied over MSE leads to overfitting, prevented by post-pruning and pre-pruning.', 'duration': 53.845, 'max_score': 3793.771, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43793771.jpg'}, {'end': 3881.144, 'src': 'embed', 'start': 3853.278, 'weight': 4, 'content': [{'end': 3857.82, 'text': 'there is more than 80% chances that this node is saying that the output is yes.', 'start': 3853.278, 'duration': 4.542}, {'end': 3863.92, 'text': 'So should we further do more pruning? 
The answer is no.', 'start': 3858.4, 'duration': 5.52}, {'end': 3866.781, 'text': 'We can close it and we can cut the branch from here.', 'start': 3864.22, 'duration': 2.561}, {'end': 3871.502, 'text': 'This technique is basically called as post-pruning.', 'start': 3868.401, 'duration': 3.101}, {'end': 3875.943, 'text': 'That basically means, first of all, you create your decision tree,', 'start': 3872.182, 'duration': 3.761}, {'end': 3881.144, 'text': 'then probably see the decision tree and see that whether there is an extra branch or not, and just try to cut it.', 'start': 3875.943, 'duration': 5.201}], 'summary': 'More than 80% chance of yes output, no need for further pruning, post-pruning involves cutting unnecessary branches.', 'duration': 27.866, 'max_score': 3853.278, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43853278.jpg'}, {'end': 3931.406, 'src': 'embed', 'start': 3899.637, 'weight': 0, 'content': [{'end': 3901.438, 'text': 'you may say that what is the max depth?', 'start': 3899.637, 'duration': 1.801}, {'end': 3903.859, 'text': 'What is the max depth?', 'start': 3902.939, 'duration': 0.92}, {'end': 3905.66, 'text': 'How many max leaf you can have?', 'start': 3903.939, 'duration': 1.721}, {'end': 3908.722, 'text': 'So this all parameters.', 'start': 3907.061, 'duration': 1.661}, {'end': 3910.763, 'text': 'you can set it with grid search CV.', 'start': 3908.722, 'duration': 2.041}, {'end': 3916.607, 'text': 'And you can try it and you can basically come up with a pre-pruning technique.', 'start': 3912.625, 'duration': 3.982}, {'end': 3926.864, 'text': 'I hope everybody is able to understand, right? 
So this is the idea about decision tree regressor.', 'start': 3920.802, 'duration': 6.062}, {'end': 3931.406, 'text': 'And what I will also do is that I will show you one practical example.', 'start': 3927.585, 'duration': 3.821}], 'summary': 'Parameters like max depth and max leaf can be set using grid search cv to come up with a pre-pruning technique for decision tree regressor.', 'duration': 31.769, 'max_score': 3899.637, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43899637.jpg'}], 'start': 3544.696, 'title': 'Decision tree regressor', 'summary': 'Explains the use of mean squared error as the cost function in decision trees, its application in assigning mean values, computing MSE for splitting, process of splitting based on continuous variable categories, and post-pruning and pre-pruning techniques to prevent overfitting.', 'chapters': [{'end': 3602.9, 'start': 3544.696, 'title': 'Mean squared error in decision trees', 'summary': 'The chapter explains the use of mean squared error as the cost function in decision trees, and its application in assigning mean values and computing MSE for splitting.', 'duration': 58.204, 'highlights': ['The cost function used in decision trees is mean squared error, or mean absolute error, with the former being calculated as 1/2m * Σ(i=1 to n) of (ŷi - y)², where ŷi is the predicted value and y is the actual value.', 'The mean squared error is utilized to assign a mean value based on a feature and then compute the MSE value before proceeding with the splitting process.']}, {'end': 3926.864, 'start': 3604.801, 'title': 'Decision tree regressor', 'summary': 'Explains the process of splitting based on continuous variable categories, using mean squared error, and discusses post-pruning and pre-pruning techniques to prevent overfitting in decision tree regressor.', 'duration': 322.063, 'highlights': ['The process of splitting based on continuous variable categories involves obtaining 
different categories and calculating the mean value after the split, which becomes the output, leading to reaching the leaf node. This is the difference between decision tree regressor and classifier.', 'In decision tree regressor, the mean squared error is used instead of entropy, and hyperparameters like max depth and max leaf can be set using grid search CV to prevent overfitting.', 'Post-pruning involves identifying nodes with a high probability of output and cutting off further splits, while pre-pruning is determined by hyperparameters like max depth and max leaf to prevent overfitting.']}], 'duration': 382.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43544696.jpg', 'highlights': ['The cost function used in decision trees is mean squared error, or mean absolute error, with the former being calculated as 1/2m * Σ(i=1 to n) of (ŷi - y)², where ŷi is the predicted value and y is the actual value.', 'The mean squared error is utilized to assign a mean value based on a feature and then compute the MSE value before proceeding with the splitting process.', 'In decision tree regressor, the mean squared error is used instead of entropy, and hyperparameters like max depth and max leaf can be set using grid search CV to prevent overfitting.', 'The process of splitting based on continuous variable categories involves obtaining different categories and calculating the mean value after the split, which becomes the output, leading to reaching the leaf node. This is the difference between decision tree regressor and classifier.', 'Post-pruning involves identifying nodes with a high probability of output and cutting off further splits, while pre-pruning is determined by hyperparameters like max depth and max leaf to prevent overfitting.']}, {'end': 4689.986, 'segs': [{'end': 4108.011, 'src': 'embed', 'start': 4006.681, 'weight': 3, 'content': [{'end': 4007.921, 'text': 'This is fun, right? 
If I do it.', 'start': 4006.681, 'duration': 1.24}, {'end': 4032.797, 'text': 'pd import matplotlib.pyplot as plt ok import So this basic things I have with me.', 'start': 4010.339, 'duration': 22.458}, {'end': 4041.283, 'text': 'So I will go and take any dataset that I want from sklearn.datasets import.', 'start': 4033.157, 'duration': 8.126}, {'end': 4043.745, 'text': "Let's say that I'm going to take load iris dataset.", 'start': 4041.403, 'duration': 2.342}, {'end': 4047.908, 'text': 'Iris dataset.', 'start': 4047.167, 'duration': 0.741}, {'end': 4051.751, 'text': 'So everybody go ahead and write it down.', 'start': 4049.989, 'duration': 1.762}, {'end': 4057.255, 'text': "Okay And then I'm going to upload the iris dataset.", 'start': 4053.652, 'duration': 3.603}, {'end': 4058.816, 'text': "So I'm going to write load iris.", 'start': 4057.295, 'duration': 1.521}, {'end': 4084.438, 'text': 'in my iris dataset then the next step once you get your iris dataset Okay.', 'start': 4062.359, 'duration': 22.079}, {'end': 4086.819, 'text': 'So this is my iris.data.', 'start': 4084.778, 'duration': 2.041}, {'end': 4089.641, 'text': 'Okay These are all my features.', 'start': 4088.36, 'duration': 1.281}, {'end': 4092.022, 'text': 'The four features will be there over here.', 'start': 4089.741, 'duration': 2.281}, {'end': 4097.045, 'text': 'These four features are petal length, petal width, sepal length and sepal width.', 'start': 4093.083, 'duration': 3.962}, {'end': 4099.526, 'text': 'Okay This is my independent features.', 'start': 4097.285, 'duration': 2.241}, {'end': 4108.011, 'text': 'Okay Then if I really want to apply for classifier, for decision tree classifier.', 'start': 4100.127, 'duration': 7.884}], 'summary': 'Using sklearn.datasets, load iris dataset with 4 features for classifier.', 'duration': 101.33, 'max_score': 4006.681, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC44006681.jpg'}, {'end': 
4204.426, 'src': 'embed', 'start': 4162.74, 'weight': 7, 'content': [{'end': 4166.662, 'text': "Then I'll probably show you how you can go ahead with pruning.", 'start': 4162.74, 'duration': 3.922}, {'end': 4175.068, 'text': 'So by default, what are the parameters over here? If you probably go and see in the classifier over here, you have criterion.', 'start': 4167.863, 'duration': 7.205}, {'end': 4175.428, 'text': 'See this.', 'start': 4175.108, 'duration': 0.32}, {'end': 4178.51, 'text': 'The first parameter is criterion.', 'start': 4175.948, 'duration': 2.562}, {'end': 4179.89, 'text': 'By default, it is gini.', 'start': 4178.77, 'duration': 1.12}, {'end': 4183.057, 'text': 'okay, then you have splitter.', 'start': 4181.194, 'duration': 1.863}, {'end': 4188.801, 'text': 'splitter basically means how you are going to split, and there also you have two types best and random.', 'start': 4183.057, 'duration': 5.744}, {'end': 4193.327, 'text': 'you can randomly select the features and do it okay, you should always go with best.', 'start': 4188.801, 'duration': 4.526}, {'end': 4196.564, 'text': 'Max depth is a hyperparameter.', 'start': 4194.944, 'duration': 1.62}, {'end': 4199.065, 'text': 'Minimum samples leaf is a hyperparameter.', 'start': 4197.205, 'duration': 1.86}, {'end': 4204.426, 'text': 'Max features, how many number of features we are going to take in order to fix that, that is also a hyperparameter.', 'start': 4199.125, 'duration': 5.301}], 'summary': 'Explanation of decision tree pruning parameters: criterion, splitter, max depth, min sample leaf, and max features.', 'duration': 41.686, 'max_score': 4162.74, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC44162740.jpg'}, {'end': 4308.691, 'src': 'embed', 'start': 4258.391, 'weight': 0, 'content': [{'end': 4263.392, 'text': 'So tree, sorry, tree, tree, tree, tree, tree, tree, tree, tree, tree, tree, tree, tree.', 'start': 4258.391, 'duration': 
5.001}, {'end': 4266.793, 'text': 'It should be classified tree dot plot.', 'start': 4264.592, 'duration': 2.201}, {'end': 4269.974, 'text': 'Okay I have to also import a tree.', 'start': 4266.833, 'duration': 3.141}, {'end': 4282.305, 'text': 'Okay So I have to basically import tree from sklearn import tree.', 'start': 4274.54, 'duration': 7.765}, {'end': 4286.647, 'text': "Again, I'm getting error.", 'start': 4284.586, 'duration': 2.061}, {'end': 4289.169, 'text': 'Has no attribute plot.', 'start': 4287.688, 'duration': 1.481}, {'end': 4296.213, 'text': 'Why? Let me just see the documentation guys.', 'start': 4289.469, 'duration': 6.744}, {'end': 4302.537, 'text': 'Just seeing the sklearn documentation.', 'start': 4300.936, 'duration': 1.601}, {'end': 4308.691, 'text': 'Just a second.', 'start': 4307.95, 'duration': 0.741}], 'summary': 'Error in importing tree from sklearn for tree dot plot.', 'duration': 50.3, 'max_score': 4258.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC44258391.jpg'}, {'end': 4689.986, 'src': 'embed', 'start': 4674.337, 'weight': 6, 'content': [{'end': 4677.638, 'text': 'all the information is given in the description of this particular video.', 'start': 4674.337, 'duration': 3.301}, {'end': 4679.379, 'text': 'thank you guys, have a great day.', 'start': 4677.638, 'duration': 1.741}, {'end': 4687.624, 'text': "keep on rocking, keep on learning, and all these things will be given in the core community sessions that i'm going to provide you.", 'start': 4679.379, 'duration': 8.245}, {'end': 4689.986, 'text': 'okay, thank you, bye, bye, keep.', 'start': 4687.624, 'duration': 2.362}], 'summary': 'Community sessions will provide all information for learning and growth.', 'duration': 15.649, 'max_score': 4674.337, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC44674337.jpg'}], 'start': 3927.585, 'title': 'Decision tree in 
practice', 'summary': 'Discusses decision tree pruning, post-pruning techniques, and creating a classifier using the iris dataset, covering importing libraries, hyperparameter setting, fitting the classifier, and visualizing the resulting decision tree.', 'chapters': [{'end': 4005.901, 'start': 3927.585, 'title': 'Decision tree pruning and practical example', 'summary': 'The chapter discusses decision tree pruning and demonstrates practical examples, including post-pruning techniques and the process of importing libraries, in a lecture scenario.', 'duration': 78.316, 'highlights': ['The chapter includes a practical example demonstrating post-pruning techniques.', 'For a two-class problem, the Gini impurity always lies between 0 and 0.5; it never reaches 1.', 'The process of importing libraries is demonstrated in the lecture.']}, {'end': 4689.986, 'start': 4006.681, 'title': 'Decision tree classifier with iris dataset', 'summary': 'The chapter covers the process of creating a decision tree classifier using the iris dataset, including importing the dataset, inspecting its features, setting hyperparameters, fitting the classifier, and visualizing the resulting decision tree.', 'duration': 683.305, 'highlights': ['The chapter demonstrates the process of creating a decision tree classifier using the Iris dataset, including importing the dataset, inspecting its features, setting hyperparameters, fitting the classifier, and visualizing the resulting decision tree.', 'The iris dataset contains four independent features: petal length, petal width, sepal length, and sepal width.', "The decision tree classifier's hyperparameters include criterion, splitter, max depth, minimum samples leaf, and max features, which are essential for the model's performance optimization.", 'The process involves fitting the classifier to the iris dataset and visualizing the resulting decision tree to understand its structure and decision-making process.', 'Post pruning is discussed as a technique to determine if further 
splits are required in the decision tree, allowing for better model interpretation and optimization.', 'The chapter encourages audience participation and interaction, and also emphasizes the importance of exploring and utilizing the provided code and resources for further learning and understanding.', 'The presenter also promotes the utilization of the One Neuron platform and its benefits before an impending price increase, while encouraging continuous learning and active participation in community sessions.']}], 'duration': 762.401, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/dGNJ-feQLC4/pics/dGNJ-feQLC43927585.jpg', 'highlights': ['The chapter demonstrates the process of creating a decision tree classifier using the Iris dataset, including importing the dataset, inspecting its features, setting hyperparameters, fitting the classifier, and visualizing the resulting decision tree.', "The decision tree classifier's hyperparameters include criterion, splitter, max depth, minimum samples leaf, and max features, which are essential for the model's performance optimization.", 'The iris dataset contains four independent features: petal length, petal width, sepal length, and sepal width.', 'Post pruning is discussed as a technique to determine if further splits are required in the decision tree, allowing for better model interpretation and optimization.', 'The chapter includes a practical example demonstrating post-pruning techniques.', 'The process involves fitting the classifier to the iris dataset and visualizing the resulting decision tree to understand its structure and decision-making process.', 'For a two-class problem, the Gini impurity always lies between 0 and 0.5; it never reaches 1.', 'The process of importing libraries is demonstrated in the lecture.', 'The chapter encourages audience participation and interaction, and also emphasizes the importance of exploring and utilizing the provided code and resources for further learning and 
understanding.', 'The presenter also promotes the utilization of the One Neuron platform and its benefits before an impending price increase, while encouraging continuous learning and active participation in community sessions.']}], 'highlights': ["The decision tree algorithm's significance in solving various use cases and its relevance to learning about ensemble techniques in the future.", 'Understanding decision trees through examples and practical application.', 'The model aims to predict if a person will play tennis based on input features like outlook, temperature, humidity, and wind.', "Overcast has the highest purity with 4 'yes' and 0 'no's.", 'The chapter emphasizes the understanding of the working of decision trees and discusses the concepts of entropy and Gini impurity in decision tree classification.', 'The process involves calculating the entropy of category 1 and category 2 separately using specific formulas, resulting in the values of 0.81 for H of C1 and 1 for H of C2.', 'The process of calculating information gain involves evaluating all paths and selecting the one with the highest information gain.', 'The cost function used in the decision tree regressor is mean squared error or mean absolute error, with the former being calculated as 1/(2m) * Σ(i=1 to m) of (ŷi - yi)², where ŷi is the predicted value and yi is the actual value for sample i.', 'The chapter demonstrates the process of creating a decision tree classifier using the Iris dataset, including importing the dataset, inspecting its features, setting hyperparameters, fitting the classifier, and visualizing the resulting decision tree.']}
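The highlights above mention entropy values of 0.81 and 1 for the two child categories, a Gini range of 0 to 0.5, and picking the split with the highest information gain. The following is a minimal sketch of those three quantities in plain Python, not the code shown in the session. The function names and the label counts (6 yes / 2 no and 3 yes / 3 no) are assumptions chosen only because they reproduce the quoted entropies; the session's actual counts are not given in this section.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical split (assumed counts, not taken from the session).
c1 = ["yes"] * 6 + ["no"] * 2   # 6 yes / 2 no
c2 = ["yes"] * 3 + ["no"] * 3   # 3 yes / 3 no
parent = c1 + c2                # 9 yes / 5 no

print("H(C1) =", round(entropy(c1), 2))
print("H(C2) =", round(entropy(c2), 2))
print("gain  =", round(information_gain(parent, [c1, c2]), 3))
print("Gini of a 50/50 two-class node =", gini(["yes", "no"]))
```

With two classes the Gini impurity peaks at 0.5 for an even split; with more classes (iris has three) the maximum is higher, which is why the 0 to 0.5 range should be read as a two-class statement.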