title

Complete Machine Learning In 6 Hours | Krish Naik

description

All the materials are available in the below link
https://github.com/krishnaik06/The-Grand-Complete-Data-Science-Materials/tree/main
Visit https://krishnaik.in for data science blogs
Time Stamps:
00:00:00 Introduction
00:01:25 AI Vs ML vs DL vs Data Science
00:07:56 Machine Learning and Deep Learning
00:09:05 Regression And Classification
00:18:14 Linear Regression Algorithm
01:07:14 Ridge And Lasso Regression Algorithms
01:33:08 Logistic Regression Algorithm
02:13:52 Linear Regression Practical Implementation
02:28:30 Ridge And Lasso Regression Practical Implementation
02:54:21 Naive Bayes Algorithms
03:16:02 KNN Algorithm Intuition
03:23:47 Decision Tree Classification Algorithms
03:57:05 Decision Tree Regression Algorithms
04:02:57 Practical Implementation Of Decision Tree Classifier
04:09:14 Ensemble Bagging And Boosting Techniques
04:21:29 Random Forest Classifier And Regressor
04:29:58 Boosting, Adaboost Machine Learning Algorithms
04:47:30 K Means Clustering Algorithm
05:01:54 Hierarchical Clustering Algorithms
05:11:28 Silhouette Clustering - Validating Clusters
05:17:46 DBSCAN Clustering Algorithms
05:25:57 Clustering Practical Examples
05:35:51 Bias And Variance Algorithms
05:43:44 Xgboost Classifier Algorithms
06:00:00 Xgboost Regressor Algorithms
06:19:04 SVM Machine Learning Algorithm
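The linear regression portion of the session (00:18:14) works through the hypothesis hθ(x) = θ0 + θ1·x, the cost function J = 1/(2m) · Σ(hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)², and the gradient-descent update with learning rate α. A minimal plain-Python sketch of that update loop (the function name, data, and α value here are illustrative, not from the video):

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=20000):
    """Fit h(x) = theta0 + theta1 * x by minimising
    J = 1/(2m) * sum((h(x_i) - y_i)^2) with gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        preds = [theta0 + theta1 * x for x in xs]
        # Partial derivatives of J with respect to theta0 and theta1.
        d0 = sum(p - y for p, y in zip(preds, ys)) / m
        d1 = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / m
        # Simultaneous update scaled by the learning rate alpha.
        theta0 -= alpha * d0
        theta1 -= alpha * d1
    return theta0, theta1
```

As the session stresses, a large α makes the updates overshoot and jump around without ever settling at the global minimum, which is why a small learning rate is the safer choice.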
---------------------------------------------------------------------------------------------------------------------
►Data Science Projects:
https://www.youtube.com/watch?v=S_F_c9e2bz4&list=PLZoTAELRMXVPS-dOaVbAux22vzqdgoGhG&pp=iAQB
►Learn In One Tutorials
Statistics in 6 hours: https://www.youtube.com/watch?v=LZzq1zSL1bs&t=9522s&pp=ygUVa3Jpc2ggbmFpayBzdGF0aXN0aWNz
Machine Learning In 6 Hours: https://www.youtube.com/watch?v=JxgmHe2NyeY&t=4733s&pp=ygUba3Jpc2ggbmFpayBtYWNoaW5lIGxlYXJuaW5n
Deep Learning 5 hours : https://www.youtube.com/watch?v=d2kxUVwWWwU&t=1210s&pp=ygUYa3Jpc2ggbmFpayBkZWVwIGxlYXJuaW5n
►Learn In a Week Playlist
Statistics: https://www.youtube.com/watch?v=11unm2hmvOQ&list=PLZoTAELRMXVMgtxAboeAx-D9qbnY94Yay
Machine Learning : https://www.youtube.com/watch?v=z8sxaUw_f-M&list=PLZoTAELRMXVPjaAzURB77Kz0YXxj65tYz
Deep Learning: https://www.youtube.com/watch?v=8arGWdq_KL0&list=PLZoTAELRMXVPiyueAqA_eQnsycC_DSBns
NLP : https://www.youtube.com/watch?v=w3coRFpyddQ&list=PLZoTAELRMXVNNrHSKv36Lr3_156yCo6Nn
►Detailed Playlist:
Stats For Data Science In Hindi : https://www.youtube.com/watch?v=7y3XckjaVOw&list=PLTDARY42LDV6YHSRo669_uDDGmUEmQnDJ&pp=gAQB
Machine Learning In English : https://www.youtube.com/watch?v=bPrmA1SEN2k&list=PLZoTAELRMXVPBTrWtJkn3wWQxZkmTXGwe
Machine Learning In Hindi : https://www.youtube.com/watch?v=7uwa9aPbBRU&list=PLTDARY42LDV7WGmlzZtY-w9pemyPrKNUZ&pp=gAQB
Complete Deep Learning: https://www.youtube.com/watch?v=YFNKnUhm_-s&list=PLZoTAELRMXVPGU70ZGsckrMdr0FteeRUi
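The clustering portion of the session (04:47:30 onward) motivates K-Means with a customer-segmentation example over age and salary: group similar people together, then target each group separately. A plain-Python sketch of Lloyd's K-Means algorithm under that framing (initial centroids are passed in explicitly for simplicity, and the data and names are illustrative; libraries like scikit-learn's KMeans handle initialisation for you):

```python
def kmeans(points, centroids, iterations=20):
    """Lloyd's k-means on (age, salary)-style 2-D points.

    `centroids` is the list of initial cluster centres.
    Returns the final centroids and the cluster memberships.
    """
    centroids = list(centroids)
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins its nearest centroid.
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2
                                  + (y - centroids[c][1]) ** 2)
            clusters[i].append((x, y))
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, clusters
```

Each resulting cluster is then a segment (e.g. young high earners vs. older mid earners) that a product or ad campaign can be targeted at, exactly as in the video's example.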

detail

{'title': 'Complete Machine Learning In 6 Hours| Krish Naik', 'heatmap': [{'end': 957.431, 'start': 714.047, 'weight': 0.733}, {'end': 3582.774, 'start': 3341.61, 'weight': 0.717}, {'end': 5492.076, 'start': 5248.493, 'weight': 0.775}, {'end': 7880.019, 'start': 7399.221, 'weight': 0.759}, {'end': 12178.843, 'start': 11695.303, 'weight': 0.899}, {'end': 12892.679, 'start': 12651.037, 'weight': 0.702}, {'end': 14563.11, 'start': 14082.278, 'weight': 0.916}, {'end': 16473.094, 'start': 15994.039, 'weight': 0.718}, {'end': 17198.126, 'start': 16944.229, 'weight': 0.716}], 'summary': 'Covers essential machine learning concepts, including ai vs ml vs dl vs data science, linear regression, ridge and lasso regression, logistic regression, f-score, model selection, probability and classification, decision trees, ensemble techniques, clustering algorithms, xgboost classifier, and practical examples, providing in-depth insights and model evaluations with boston house pricing dataset achieving 96% accuracy.', 'chapters': [{'end': 96.352, 'segs': [{'end': 34.125, 'src': 'embed', 'start': 6.658, 'weight': 0, 'content': [{'end': 9.699, 'text': "so today's session, what all things we are basically going to discuss.", 'start': 6.658, 'duration': 3.041}, {'end': 14.28, 'text': 'so first of all, we are going to discuss about different types of machine learning algorithm,', 'start': 9.699, 'duration': 4.581}, {'end': 17.261, 'text': 'like how many different types of machine learning algorithm are there?', 'start': 14.28, 'duration': 2.981}, {'end': 23.542, 'text': 'understand, the purpose of taking this session is to clear the interviews.', 'start': 17.261, 'duration': 6.281}, {'end': 30.044, 'text': 'okay, clear the interviews once you go for a data science interviews and all the main purpose is to clear the interviews.', 'start': 23.542, 'duration': 6.502}, {'end': 34.125, 'text': 'i have seen people who knew machine learning algorithms in a proper way.', 'start': 30.044, 
'duration': 4.081}], 'summary': 'Discussion on different machine learning algorithms to clear data science interviews.', 'duration': 27.467, 'max_score': 6.658, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY6658.jpg'}], 'start': 6.658, 'title': 'Machine learning algorithms and techniques', 'summary': 'Covers various machine learning algorithms, essential for interview preparation, including ai vs ml vs dl vs data science, supervised vs unsupervised ml, linear regression, r square, adjusted r square, ridge, and lasso regression.', 'chapters': [{'end': 96.352, 'start': 6.658, 'title': 'Machine learning algorithms and techniques', 'summary': 'Will cover the different types of machine learning algorithms, the importance of understanding them for interview preparation, and specific topics including ai vs ml vs dl vs data science, supervised ml vs unsupervised ml, linear regression, r square and adjusted r square, and ridge and lasso regression.', 'duration': 89.694, 'highlights': ['The main purpose is to clear the interviews for data science, as understanding and explaining machine learning algorithms properly can lead to successful recruitment.', 'Specific topics to be discussed include AI vs ML vs DL vs Data Science, supervised ML vs unsupervised ML, linear regression, R square and adjusted R square, and ridge and lasso regression.', 'Understanding these topics and algorithms can significantly impact interview success, as individuals who could explain algorithms effectively were able to secure positions.', 'The chapter will delve into the differences between supervised ML and unsupervised ML, providing a comprehensive understanding of the two approaches.', 'The session will cover the mathematical and geometric intuition behind linear regression, aiding in a thorough comprehension of the topic.']}], 'duration': 89.694, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY6658.jpg', 'highlights': ['Understanding and explaining ML algorithms can lead to successful recruitment', 'Explaining algorithms effectively can secure positions', 'Comprehensive understanding of supervised and unsupervised ML approaches is crucial', 'Mathematical and geometric intuition behind linear regression is covered', 'Specific topics include AI vs ML vs DL vs Data Science, supervised vs unsupervised ML, linear regression, R square, adjusted R square, ridge, and lasso regression']}, {'end': 1074.629, 'segs': [{'end': 231.185, 'src': 'embed', 'start': 205.055, 'weight': 8, 'content': [{'end': 212.941, 'text': 'So through this what happens is that it understands your behavior and it is being able to do its task without asking you anything.', 'start': 205.055, 'duration': 7.886}, {'end': 217.065, 'text': 'The second example that I would like to take up in is Amazon.in.', 'start': 213.362, 'duration': 3.703}, {'end': 224.463, 'text': 'Now Amazon.in again, if you buy an iPhone, then it may recommend you a headphones.', 'start': 217.966, 'duration': 6.497}, {'end': 231.185, 'text': 'So this kind of recommendation is also a part of AI module that is integrated with the Amazon.in website.', 'start': 224.804, 'duration': 6.381}], 'summary': 'Ai enables personalized recommendations on amazon.in, improving user experience.', 'duration': 26.13, 'max_score': 205.055, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY205055.jpg'}, {'end': 495.979, 'src': 'embed', 'start': 465.484, 'weight': 5, 'content': [{'end': 467.245, 'text': 'Sometime I was given a deep learning project.', 'start': 465.484, 'duration': 1.761}, {'end': 473.608, 'text': 'So as a data scientist, if I consider where does data scientist fall into this, it will be a part of everything.', 'start': 467.765, 'duration': 5.843}, {'end': 481.592, 'text': 'So, if I 
talk about machine learning and deep learning, with respect to any kind of problem statement that we solve,', 'start': 474.709, 'duration': 6.883}, {'end': 485.234, 'text': 'the majority of the business use cases will be falling in two sections.', 'start': 481.592, 'duration': 3.642}, {'end': 487.415, 'text': 'One is supervised machine learning.', 'start': 485.954, 'duration': 1.461}, {'end': 490.096, 'text': 'one is unsupervised machine learning.', 'start': 488.155, 'duration': 1.941}, {'end': 495.979, 'text': 'so most of the problems that you are basically solving this is, with respect to this, two problem statement,', 'start': 490.096, 'duration': 5.883}], 'summary': 'Data scientists handle supervised and unsupervised machine learning for business use cases.', 'duration': 30.495, 'max_score': 465.484, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY465484.jpg'}, {'end': 707.481, 'src': 'embed', 'start': 677.236, 'weight': 0, 'content': [{'end': 682.774, 'text': 'So I have values like as discussed 24, 72, 23, 71, 24, 25, 71.5.', 'start': 677.236, 'duration': 5.538}, {'end': 689.102, 'text': 'Okay, so this kind of data I have.', 'start': 682.779, 'duration': 6.323}, {'end': 693.185, 'text': 'See, this is my output variable, which is my dependent feature.', 'start': 689.823, 'duration': 3.362}, {'end': 695.826, 'text': 'Now, in this particular dependent feature.', 'start': 693.565, 'duration': 2.261}, {'end': 701.469, 'text': "now, whenever I'm trying to find out the output, and in this particular output, you have a continuous variable.", 'start': 695.826, 'duration': 5.643}, {'end': 707.481, 'text': 'when you have a continuous variable, then this becomes a regression problem statement.', 'start': 702.296, 'duration': 5.185}], 'summary': 'The data includes values such as 24, 72, 23, 71, 24, 25, 71.5, indicating a regression problem.', 'duration': 30.245, 'max_score': 677.236, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY677236.jpg'}, {'end': 957.431, 'src': 'heatmap', 'start': 714.047, 'weight': 0.733, 'content': [{'end': 718.07, 'text': 'suppose i am populating this particular data set with the help of scatter plot.', 'start': 714.047, 'duration': 4.023}, {'end': 721.894, 'text': "then, in order to basically solve this problem, what we'll do?", 'start': 718.07, 'duration': 3.824}, {'end': 729.432, 'text': 'suppose? if i take an example of linear regression, i will try to draw a straight line, And this particular line is my equation,', 'start': 721.894, 'duration': 7.538}, {'end': 732.335, 'text': 'which is called as y is equal to mx plus c.', 'start': 729.432, 'duration': 2.903}, {'end': 736.238, 'text': 'And with the help of this particular equation, I will try to find out the predicted points.', 'start': 732.335, 'duration': 3.903}, {'end': 738.3, 'text': 'So this will be my predicted point.', 'start': 736.839, 'duration': 1.461}, {'end': 740.002, 'text': 'this will be my predicted point.', 'start': 738.3, 'duration': 1.702}, {'end': 745.627, 'text': 'this, this, any new points that I see over here will basically be my predicted point with respect to y.', 'start': 740.002, 'duration': 5.625}, {'end': 749.569, 'text': 'So in this way, we basically solve a regression problem statement.', 'start': 746.327, 'duration': 3.242}, {'end': 751.79, 'text': 'So this is very much important to understand.', 'start': 749.889, 'duration': 1.901}, {'end': 756.972, 'text': "Let's go to the, always understand, in a regression problem statement, your output will be a continuous variable.", 'start': 752.25, 'duration': 4.722}, {'end': 760.454, 'text': 'The second one is basically a classification problem.', 'start': 757.553, 'duration': 2.901}, {'end': 762.875, 'text': 'Now in classification problem.', 'start': 761.255, 'duration': 1.62}, {'end': 771.3, 'text': "suppose I have a data set, let's say that 
number of hours study, number of study hours, number of play hours.", 'start': 762.875, 'duration': 8.425}, {'end': 773.992, 'text': 'So this is my independent feature.', 'start': 772.571, 'duration': 1.421}, {'end': 776.574, 'text': "Let's say a number of sleeping hours.", 'start': 774.012, 'duration': 2.562}, {'end': 780.837, 'text': 'And finally, I have my output, which will be pass or fail.', 'start': 777.054, 'duration': 3.783}, {'end': 786.762, 'text': 'So in this, I have all this as my independent features, and this is my dependent feature.', 'start': 781.778, 'duration': 4.984}, {'end': 789.744, 'text': 'So I will be having some values like this.', 'start': 787.703, 'duration': 2.041}, {'end': 794.348, 'text': 'And here either you will be pass or fail or pass or fail.', 'start': 790.885, 'duration': 3.463}, {'end': 803.313, 'text': 'Now whenever you have in your output fixed number of categories, then that becomes a classification problem.', 'start': 796.107, 'duration': 7.206}, {'end': 807.536, 'text': 'Suppose it just has two outputs, then it becomes a binary classification.', 'start': 803.713, 'duration': 3.823}, {'end': 813.281, 'text': 'If you have more than two different categories, at that time it becomes a multi-class classification.', 'start': 807.956, 'duration': 5.325}, {'end': 818.265, 'text': 'So this is the difference between regression problem statement and the classification problem statement.', 'start': 813.561, 'duration': 4.704}, {'end': 822.971, 'text': "Now let's go ahead and let's discuss about something called as unsupervised machine learning.", 'start': 819.306, 'duration': 3.665}, {'end': 831.124, 'text': "Now in unsupervised machine learning, which is my second main topic, over here I'm just going to write unsupervised machine learning.", 'start': 823.672, 'duration': 7.452}, {'end': 837.952, 'text': 'Now what exactly is unsupervised machine learning? 
Here, whenever I talk about, there are two main problem statement that we solve.', 'start': 831.825, 'duration': 6.127}, {'end': 841.837, 'text': 'One is clustering, one is dimensionality reduction.', 'start': 838.112, 'duration': 3.725}, {'end': 844.439, 'text': "Let's take one example of a specific data set.", 'start': 842.117, 'duration': 2.322}, {'end': 849.605, 'text': "Over here, let's say that my data set is something called as salary and age.", 'start': 844.6, 'duration': 5.005}, {'end': 855.672, 'text': "Now in this scenario, We don't have any output variable, no output variable, no dependent variable.", 'start': 849.925, 'duration': 5.747}, {'end': 862.941, 'text': 'Then what kind of assumptions were that we can take out from this particular data set? Suppose I have salary and age as my values.', 'start': 855.952, 'duration': 6.989}, {'end': 867.146, 'text': 'So in this particular case, I would like to do something called as clustering.', 'start': 863.241, 'duration': 3.905}, {'end': 874.38, 'text': "Now, why clustering is used? 
Just understand, let's say I am going to do something called as customer segmentation.", 'start': 868.075, 'duration': 6.305}, {'end': 876.502, 'text': 'Now, what does this customer segmentation do?', 'start': 874.78, 'duration': 1.722}, {'end': 882.786, 'text': 'Clustering basically means that, based on this data, I will try to find out similar groups, groups of people.', 'start': 876.922, 'duration': 5.864}, {'end': 884.528, 'text': 'Suppose this is my one group.', 'start': 883.247, 'duration': 1.281}, {'end': 886.149, 'text': 'This is my another group.', 'start': 884.988, 'duration': 1.161}, {'end': 887.47, 'text': 'This is my third group.', 'start': 886.409, 'duration': 1.061}, {'end': 890.092, 'text': "Let's say that I was able to create this many groups.", 'start': 887.75, 'duration': 2.342}, {'end': 891.533, 'text': 'These many groups are clusters.', 'start': 890.152, 'duration': 1.381}, {'end': 893.814, 'text': "I'll say cluster 1, 2, 3.", 'start': 891.553, 'duration': 2.261}, {'end': 897.578, 'text': 'Each and every cluster will be specifying some information.', 'start': 893.815, 'duration': 3.763}, {'end': 903.942, 'text': 'This cluster may specify that this person, he was very young, but he was able to get some amazing salary.', 'start': 898.318, 'duration': 5.624}, {'end': 910.906, 'text': 'This person, it may specify that these people are basically having more age and they are getting good salary.', 'start': 904.322, 'duration': 6.584}, {'end': 916.51, 'text': 'These people are like middle class background where with respect to the age, the salary is not that much increasing.', 'start': 911.347, 'duration': 5.163}, {'end': 919.812, 'text': 'So here, what we are doing, we are doing clustering.', 'start': 917.25, 'duration': 2.562}, {'end': 921.213, 'text': 'We are grouping them together.', 'start': 919.832, 'duration': 1.381}, {'end': 922.714, 'text': 'Main thing is grouping.', 'start': 921.533, 'duration': 1.181}, {'end': 924.995, 'text': 'This word is 
very much important.', 'start': 923.635, 'duration': 1.36}, {'end': 933.717, 'text': 'Now, why do we use this? My company launches a product and I want to just target this particular product to rich people.', 'start': 925.416, 'duration': 8.301}, {'end': 938.201, 'text': "Let's say product one is for rich people, product two is for middle class people.", 'start': 934.157, 'duration': 4.044}, {'end': 945.467, 'text': 'So if I make this kind of clusters, I will be able to target my ads only to this kind of people.', 'start': 938.701, 'duration': 6.766}, {'end': 947.529, 'text': "Let's say that this is the rich people.", 'start': 946.087, 'duration': 1.442}, {'end': 949.569, 'text': 'This is the middle class people.', 'start': 948.249, 'duration': 1.32}, {'end': 957.431, 'text': 'I will be able to target this particular ad or this particular product or send this particular things to those specific group of people.', 'start': 949.589, 'duration': 7.842}], 'summary': 'Using scatter plots for linear regression, classification, and unsupervised learning; clustering for customer segmentation.', 'duration': 243.384, 'max_score': 714.047, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY714047.jpg'}, {'end': 771.3, 'src': 'embed', 'start': 740.002, 'weight': 7, 'content': [{'end': 745.627, 'text': 'this, this, any new points that I see over here will basically be my predicted point with respect to y.', 'start': 740.002, 'duration': 5.625}, {'end': 749.569, 'text': 'So in this way, we basically solve a regression problem statement.', 'start': 746.327, 'duration': 3.242}, {'end': 751.79, 'text': 'So this is very much important to understand.', 'start': 749.889, 'duration': 1.901}, {'end': 756.972, 'text': "Let's go to the, always understand, in a regression problem statement, your output will be a continuous variable.", 'start': 752.25, 'duration': 4.722}, {'end': 760.454, 'text': 'The second one is basically a 
classification problem.', 'start': 757.553, 'duration': 2.901}, {'end': 762.875, 'text': 'Now in classification problem.', 'start': 761.255, 'duration': 1.62}, {'end': 771.3, 'text': "suppose I have a data set, let's say that number of hours study, number of study hours, number of play hours.", 'start': 762.875, 'duration': 8.425}], 'summary': 'Understanding regression problems and classification problems in data analysis.', 'duration': 31.298, 'max_score': 740.002, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY740002.jpg'}], 'start': 97.072, 'title': 'Ai, ml, dl, and data science', 'summary': 'Differentiates ai, ml, dl, and data science, illustrating their applications with examples, explains the roles and interconnections of ai, machine learning, and data science, and discusses supervised and unsupervised machine learning concepts and applications.', 'chapters': [{'end': 251.41, 'start': 97.072, 'title': 'Ai vs ml vs dl vs data science', 'summary': 'Discusses the differences between ai, ml, dl, and data science, defining ai as the process of creating applications capable of autonomous decision-making, with examples including ai modules in netflix for movie recommendations and amazon.in for product recommendations.', 'duration': 154.338, 'highlights': ['AI is defined as creating applications capable of autonomous decision-making. Artificial Intelligence is the process of creating applications that can perform tasks without human intervention, such as providing movie recommendations in Netflix and product recommendations in Amazon.in.', 'Examples of AI applications include AI modules in Netflix for movie recommendations and Amazon.in for product recommendations. 
Netflix and Amazon.in both utilize AI modules to provide personalized recommendations based on user behavior, such as recommending action movies when a user frequently watches them and suggesting complementary products like headphones when a user buys an iPhone.', 'The ads in YouTube channels are also recommended using an AI engine, contributing to business-driven goals. AI engines integrated into YouTube channels are responsible for recommending ads to users, contributing to business-driven goals and providing a source of income for content creators.']}, {'end': 530.308, 'start': 251.79, 'title': 'Ai, machine learning, and data science overview', 'summary': 'Explains the roles of ai, machine learning, and data science, highlighting their interconnections and emphasizing the goal of creating ai applications. it provides insights into the subsets of ai, the role of deep learning in mimicking human brain, and the broad scope of responsibilities of a data scientist, including the types of machine learning algorithms used in solving business use cases.', 'duration': 278.518, 'highlights': ['AI applications are prevalent in various domains, including self-driving cars, where AI integrates with the vehicle to drive automatically, demonstrating the widespread implementation and impact of AI.', 'Machine learning serves as a subset of AI, providing statistical tools for data analysis, visualization, predictions, and forecasting, emphasizing its crucial role in AI development.', 'Deep learning, a subset of machine learning, aims to mimic the human brain through multi-layered neural networks, enabling the training of machines or applications for solving complex use cases, showcasing its significant advancements in AI technology.', 'Data scientists have a broad range of responsibilities, including utilizing machine learning and deep learning algorithms to solve business use cases, emphasizing the integration of various AI components in their role.', 'The majority of business 
use cases fall into supervised machine learning and unsupervised machine learning, with problem statements encompassing regression, classification, clustering, and dimensionality reduction, showcasing the diverse applications of machine learning in real-world scenarios.']}, {'end': 1074.629, 'start': 530.308, 'title': 'Supervised and unsupervised machine learning', 'summary': 'Discusses the key concepts of supervised and unsupervised machine learning, explaining the components of supervised machine learning, the difference between regression and classification, and the applications of clustering and dimensionality reduction in unsupervised machine learning.', 'duration': 544.321, 'highlights': ['Supervised machine learning involves training a model to predict output based on input features, with independent and dependent features, and distinguishes between regression and classification problems. Supervised machine learning involves training a model to predict output based on input features, with independent and dependent features. It distinguishes between regression, involving continuous variables, and classification, involving fixed categories.', 'Clustering in unsupervised machine learning involves grouping similar data points to identify customer segments for targeted marketing, while dimensionality reduction aims to reduce the number of features for more efficient processing. Clustering in unsupervised machine learning involves grouping similar data points to identify customer segments for targeted marketing. Dimensionality reduction aims to reduce the number of features for more efficient processing, using algorithms like PCA.', 'The chapter outlines a comprehensive list of supervised and unsupervised machine learning algorithms to be covered, including linear regression, ridge and lasso, logistic regression, decision tree, AdaBoost, random forest, Gradient Boosting, XGBoost, and K-Means, DBSCAN, Hercule Clustering, and K-Nearest Neighbor Clustering. 
The chapter outlines a comprehensive list of supervised and unsupervised machine learning algorithms to be covered, including linear regression, ridge and lasso, logistic regression, decision tree, AdaBoost, random forest, Gradient Boosting, XGBoost, and K-Means, DBSCAN, Hercule Clustering, and K-Nearest Neighbor Clustering.']}], 'duration': 977.557, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY97072.jpg', 'highlights': ['AI is creating applications for autonomous decision-making, e.g., Netflix and Amazon.in.', 'AI applications include personalized recommendations in Netflix and Amazon.in.', 'AI engines in YouTube channels recommend ads, contributing to business goals.', 'AI is prevalent in self-driving cars, showcasing its widespread impact.', 'Machine learning provides statistical tools for data analysis and predictions.', 'Deep learning aims to mimic the human brain through multi-layered neural networks.', 'Data scientists utilize machine learning and deep learning algorithms for business use cases.', 'Supervised machine learning involves training a model to predict output based on input features.', 'Unsupervised machine learning involves clustering and dimensionality reduction.', 'Comprehensive list of supervised and unsupervised machine learning algorithms is outlined.']}, {'end': 2941.44, 'segs': [{'end': 2461.679, 'src': 'embed', 'start': 2432.677, 'weight': 5, 'content': [{'end': 2438.821, 'text': 'Because see, out of all these three lines, which is the best fit line? This is the best fit line, right? This is the best fit line.', 'start': 2432.677, 'duration': 6.144}, {'end': 2444.826, 'text': 'When I had this best fit line, my point that came over here was here itself.', 'start': 2439.602, 'duration': 5.224}, {'end': 2451.811, 'text': 'This was my point that came over here, right? 
And I want to basically come to this region because this is my global minima.', 'start': 2445.186, 'duration': 6.625}, {'end': 2461.679, 'text': 'When I basically am over here, the distance between the predicted and the real point is very, very less right?', 'start': 2453.874, 'duration': 7.805}], 'summary': 'Identifying the best fit line for a point and minimizing the distance between predicted and real points.', 'duration': 29.002, 'max_score': 2432.677, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY2432677.jpg'}, {'end': 2512.104, 'src': 'embed', 'start': 2473.826, 'weight': 1, 'content': [{'end': 2479.029, 'text': 'Here also you are assuming many things, right? And then you are probably calculating and you are creating this gradient descent.', 'start': 2473.826, 'duration': 5.203}, {'end': 2486.276, 'text': 'But the thing should be that probably you come to one point over here and then you reach towards this.', 'start': 2479.873, 'duration': 6.403}, {'end': 2489.518, 'text': 'So, for that specific reason, how do you do that??', 'start': 2486.956, 'duration': 2.562}, {'end': 2494.4, 'text': 'How do I first of all come to a point and then move towards this global minima?', 'start': 2490.298, 'duration': 4.102}, {'end': 2498.981, 'text': 'So, for that specific case, we will be using one convergence algorithm.', 'start': 2495.18, 'duration': 3.801}, {'end': 2508.143, 'text': 'Because if I come to one specific point, after that I just need to keep on updating theta1 instead of using different different theta1 value.', 'start': 2499.501, 'duration': 8.642}, {'end': 2512.104, 'text': 'So for this, we use something called as convergence algorithm.', 'start': 2508.923, 'duration': 3.181}], 'summary': 'Discussing the use of convergence algorithm for reaching global minima.', 'duration': 38.278, 'max_score': 2473.826, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY2473826.jpg'}, {'end': 2596.827, 'src': 'embed', 'start': 2541.719, 'weight': 7, 'content': [{'end': 2549.083, 'text': 'And then it will be derivative of theta j with respect to this j of theta zero.', 'start': 2541.719, 'duration': 7.364}, {'end': 2551.608, 'text': 'and theta1.', 'start': 2550.808, 'duration': 0.8}, {'end': 2554.07, 'text': 'So this should happen.', 'start': 2552.389, 'duration': 1.681}, {'end': 2559.412, 'text': 'that basically means after we reach to a specific point of theta.', 'start': 2554.07, 'duration': 5.342}, {'end': 2565.075, 'text': 'after performing this particular operation, we should be able to come to the global minima.', 'start': 2559.412, 'duration': 5.663}, {'end': 2569.677, 'text': 'And this specific thing that you are able to see is called as derivative.', 'start': 2565.675, 'duration': 4.002}, {'end': 2572.798, 'text': 'This is called as derivative.', 'start': 2571.318, 'duration': 1.48}, {'end': 2576.16, 'text': 'Derivative basically means I am trying to find out the slope.', 'start': 2572.838, 'duration': 3.322}, {'end': 2580.089, 'text': 'Derivative, which I can also say it as slope.', 'start': 2577.685, 'duration': 2.404}, {'end': 2582.493, 'text': 'This equation will definitely work guys.', 'start': 2580.77, 'duration': 1.723}, {'end': 2584.376, 'text': 'Trust me, this will definitely work.', 'start': 2582.934, 'duration': 1.442}, {'end': 2586.68, 'text': "Why it will work? 
I'll just draw it, show it to you.", 'start': 2584.496, 'duration': 2.184}, {'end': 2588.683, 'text': "Let's say that this is my cost function.", 'start': 2587.08, 'duration': 1.603}, {'end': 2591.908, 'text': "Let's say that I've got this gradient descent.", 'start': 2590.185, 'duration': 1.723}, {'end': 2596.827, 'text': "And let's say that my first point is somewhere here.", 'start': 2593.706, 'duration': 3.121}], 'summary': 'Derivative of theta is used to find global minima in gradient descent.', 'duration': 55.108, 'max_score': 2541.719, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY2541719.jpg'}, {'end': 2832.692, 'src': 'embed', 'start': 2807.189, 'weight': 0, 'content': [{'end': 2812.632, 'text': 'If I select a small number, then it will start taking small small steps to move towards the optimal minima.', 'start': 2807.189, 'duration': 5.443}, {'end': 2818.067, 'text': 'But if I take an alpha value, a huge value, if it is a huge value, then what will happen?', 'start': 2813.686, 'duration': 4.381}, {'end': 2828.611, 'text': 'This updation of the theta1 will keep on jumping here and there and the situation will be that it will never reach the global minima.', 'start': 2818.628, 'duration': 9.983}, {'end': 2832.692, 'text': 'So it is a very very good decision to take our alpha as small value.', 'start': 2829.271, 'duration': 3.421}], 'summary': 'Small alpha leads to small steps for optimal minima.', 'duration': 25.503, 'max_score': 2807.189, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY2807189.jpg'}], 'start': 1075.109, 'title': 'Linear regression fundamentals', 'summary': "Covers linear regression basics, introduction to linear regression, cost function and minimization, gradient descent and cost function analysis, and convergence algorithm in linear regression, discussing andrew ng's notation, equations for best fit line, cost function j, 
and global minima, aiming to predict output based on input features involving squared error function and specialized algorithms in deep learning.", 'chapters': [{'end': 1243.414, 'start': 1075.109, 'title': 'Linear regression basics', 'summary': "Covers the fundamentals of linear regression, including problem statement, model training, hypothesis testing, and best fit line creation, aiming to predict output based on input features, as well as the equations used for linear line representation. andrew ng's notation for linear regression is credited.", 'duration': 168.305, 'highlights': ['The training dataset is used to create a model through hypothesis testing, aiming to predict output based on input features, such as age and weight, and verify its performance using performance metrics.', 'The concept of linear regression involves finding the best fit line to predict output based on input features, defining the relationship between y and x as a linear function.', 'The various equations, including y = mx + c, y = beta 0 + beta 1 * x, and theta of x = theta 0 + theta 1 * x, are used to represent the linear line in linear regression.', 'The credit for the linear regression algorithm goes to Andrew Ng, based on his notation and explanation of the fundamentals.']}, {'end': 1620.941, 'start': 1243.814, 'title': 'Introduction to linear regression', 'summary': 'Introduces the concept of linear regression and explains the equations for a best fit line, intercept, slope, and cost function, emphasizing the objective of minimizing the distance between predicted and actual points.', 'duration': 377.127, 'highlights': ["The equation of a best fit line hθ = θ0 + θ1 * x is demonstrated, emphasizing the role of intercept (θ0) and slope (θ1) in representing the line's characteristics. 
", "The objective of linear regression is to minimize the distance between predicted and actual points, achieved through the comparison of the best fit line's predictions with the given data points.", 'The concept of cost function is introduced, emphasizing its role in quantifying the distance between predicted and actual points, essential for the derivation and evaluation of the best fit line.']}, {'end': 1918.214, 'start': 1621.422, 'title': 'Cost function and minimization', 'summary': 'Discusses the cost function j of theta zero and theta one and the minimization task by adjusting parameters theta 0 and theta 1, in order to find the best fit line for the equation hθ of x = θ0 + θ1 * x, involving squared error function and the need to minimize the 1 by 2m summation of hθ of x of i minus y of i whole square.', 'duration': 296.792, 'highlights': ['The chapter discusses the cost function J of theta zero and theta one and the minimization task by adjusting parameters theta 0 and theta 1.', 'Involvement of squared error function in the cost function and the need to minimize the 1 by 2m summation of hθ of x of i minus y of i whole square. 
', 'Explanation of the equation hθ of x = θ0 + θ1 * x and its significance when θ0 is 0, indicating the best fit line passing through the origin.']}, {'end': 2494.4, 'start': 1918.935, 'title': 'Gradient descent and cost function analysis', 'summary': 'Explains the concept of gradient descent and cost function analysis through the example of different theta 1 values, showcasing their impact on the cost function j of theta 1, demonstrating how the slope affects the best-fit line and identifying the global minima.', 'duration': 575.465, 'highlights': ['The cost function j of theta 1 is calculated for different theta 1 values, revealing that when theta 1 is 1, the cost function is zero, indicating a perfect fit for the best-fit line.', 'With theta 1 as 0.5, the cost function j of theta 1 is approximately 0.58, showcasing a relatively higher cost than when theta 1 is 1.', 'When theta 1 is 0, the cost function j of theta 1 is approximately 2.3, highlighting a significantly higher cost and demonstrating the impact of different slope values on the cost function. 
', 'The concept of gradient descent is introduced as a method to identify the global minima, crucial for determining the best-fit line with the least distance between predicted and real points.']}, {'end': 2941.44, 'start': 2495.18, 'title': 'Convergence algorithm in linear regression', 'summary': 'Explains the convergence algorithm used in linear regression, detailing the iterative process and the impact of learning rate on converging to the global minima, emphasizing the avoidance of local minima and the need for specialized algorithms in deep learning.', 'duration': 446.26, 'highlights': ['The convergence algorithm is an iterative process that updates theta1 to reach the global minima, ensuring optimization in linear regression.', 'The impact of learning rate is crucial, with a smaller value leading to slower convergence but reducing the risk of overshooting, while a larger value can cause erratic behavior and hinder convergence.', 'The avoidance of local minima is emphasized, as the gradient descent in linear regression typically avoids getting stuck, while specialized algorithms like RMS prop and Adam optimizers are utilized in deep learning to address this challenge. 
']}], 'duration': 1866.331, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY1075109.jpg', 'highlights': ['The training dataset is used to create a model through hypothesis testing, aiming to predict output based on input features, such as age and weight, and verify its performance using performance metrics.', 'The concept of linear regression involves finding the best fit line to predict output based on input features, defining the relationship between y and x as a linear function.', "The equation of a best fit line hθ = θ0 + θ1 * x is demonstrated, emphasizing the role of intercept (θ0) and slope (θ1) in representing the line's characteristics.", 'The concept of cost function is introduced, emphasizing its role in quantifying the distance between predicted and actual points, essential for the derivation and evaluation of the best fit line.', 'The cost function j of theta 1 is calculated for different theta 1 values, revealing that when theta 1 is 1, the cost function is zero, indicating a perfect fit for the best-fit line.', 'The concept of gradient descent is introduced as a method to identify the global minima, crucial for determining the best-fit line with the least distance between predicted and real points.', 'The convergence algorithm updates theta1 through continuous iteration to reach the global minima, ensuring optimization in linear regression.', 'The learning rate determines the speed of convergence, with a smaller value leading to slower but more stable progress, while a larger value can cause erratic behavior and hinder convergence.', 'In linear regression, the gradient descent typically avoids getting stuck in local minima, while specialized algorithms like RMS prop and Adam optimizers are 
used in deep learning to address this challenge.']}, {'end': 4132.877, 'segs': [{'end': 3228.72, 'src': 'embed', 'start': 3200.736, 'weight': 7, 'content': [{'end': 3203.958, 'text': 'Okay, theta 1 of x will become a constant in this particular case.', 'start': 3200.736, 'duration': 3.222}, {'end': 3212.523, 'text': 'In this case, because theta 1 of x is there, so if I try to find out derivative of theta 1 into x, only I will be getting x.', 'start': 3204.858, 'duration': 7.665}, {'end': 3216.104, 'text': "Why square will not be there? It's easy, right? x square means 2x.", 'start': 3212.523, 'duration': 3.581}, {'end': 3220.927, 'text': 'This is the derivative of x square, right? So that square went and 1 by 2, 2 by 2 got cancelled.', 'start': 3216.585, 'duration': 4.342}, {'end': 3225.919, 'text': 'So this will be now my convergence algorithm.', 'start': 3223.237, 'duration': 2.682}, {'end': 3228.72, 'text': 'So here we have discussed about linear regression.', 'start': 3226.099, 'duration': 2.621}], 'summary': 'Derivation of x yields x in linear regression convergence algorithm.', 'duration': 27.984, 'max_score': 3200.736, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY3200736.jpg'}, {'end': 3582.774, 'src': 'heatmap', 'start': 3341.61, 'weight': 0.717, 'content': [{'end': 3344.231, 'text': "It's just like coming down a mountain.", 'start': 3341.61, 'duration': 2.621}, {'end': 3349.833, 'text': "Now let's discuss about two performance metrics which is important in this particular case.", 'start': 3345.391, 'duration': 4.442}, {'end': 3355.034, 'text': 'One is R square and adjusted R square.', 'start': 3350.313, 'duration': 4.721}, {'end': 3363.727, 'text': 'We usually use this performance matrix to verify how our model is and how good our model is with respect to linear regression.', 'start': 3357.842, 'duration': 5.885}, {'end': 3369.953, 'text': 'So R square is basically given, R square is a performance 
matrix to check how good this specific model is.', 'start': 3364.188, 'duration': 5.765}, {'end': 3377.619, 'text': 'So here we basically have a formula which is like 1 minus sum of residual divided by sum of total.', 'start': 3370.573, 'duration': 7.046}, {'end': 3380.019, 'text': 'Now this is the formula of R square.', 'start': 3378.598, 'duration': 1.421}, {'end': 3383.483, 'text': 'Now what is this sum of residual? I can basically write like this.', 'start': 3380.66, 'duration': 2.823}, {'end': 3389.248, 'text': 'Summation of yi minus yi hat whole square.', 'start': 3383.503, 'duration': 5.745}, {'end': 3392.211, 'text': 'This yi hat is nothing but h theta of x.', 'start': 3389.528, 'duration': 2.683}, {'end': 3393.172, 'text': 'Just consider in this way.', 'start': 3392.211, 'duration': 0.961}, {'end': 3402.194, 'text': 'Divided by summation of Y of I minus Y mean Y, mean Y whole square.', 'start': 3393.892, 'duration': 8.302}, {'end': 3403.094, 'text': 'This is the formula.', 'start': 3402.374, 'duration': 0.72}, {'end': 3406.315, 'text': "I'll try to explain you what this formula definitely says.", 'start': 3403.114, 'duration': 3.201}, {'end': 3412.618, 'text': "Okay So first thing first, let's consider that this is my, this is my problem statement that I'm trying to solve.", 'start': 3406.616, 'duration': 6.002}, {'end': 3414.338, 'text': 'Suppose these are my data points.', 'start': 3413.078, 'duration': 1.26}, {'end': 3422.347, 'text': 'And if I try to create the best fit line, This yi hat, yi hat basically means this specific point.', 'start': 3415.278, 'duration': 7.069}, {'end': 3427.191, 'text': 'We are trying to find out the difference between these things.', 'start': 3422.527, 'duration': 4.664}, {'end': 3429.033, 'text': "Let's say that these are my points.", 'start': 3427.812, 'duration': 1.221}, {'end': 3431.935, 'text': 'I am trying to find out the difference between this predicted.', 'start': 3429.573, 'duration': 2.362}, {'end': 3433.136, 
'text': 'This is my predicted.', 'start': 3432.255, 'duration': 0.881}, {'end': 3437.44, 'text': 'The point in green color are my predicted points which I have denoted as yi hat.', 'start': 3433.176, 'duration': 4.264}, {'end': 3442.555, 'text': 'And always understand, This is what sum of residual is.', 'start': 3438.44, 'duration': 4.115}, {'end': 3447.559, 'text': 'Sum of residual is nothing but difference between this point to this point, this point to this point, this point to this point,', 'start': 3442.595, 'duration': 4.964}, {'end': 3448.36, 'text': 'this point to this point.', 'start': 3447.559, 'duration': 0.801}, {'end': 3450.421, 'text': 'And I am doing all the summation of those.', 'start': 3448.74, 'duration': 1.681}, {'end': 3455.004, 'text': 'Now, the next point which is very much important, here is my x and y.', 'start': 3451.562, 'duration': 3.442}, {'end': 3459.123, 'text': 'What is this? yi minus y, y bar.', 'start': 3456.081, 'duration': 3.042}, {'end': 3463.005, 'text': 'y bar is nothing but mean, mean of y.', 'start': 3459.783, 'duration': 3.222}, {'end': 3467.208, 'text': 'If I calculate the mean of y, then I will probably get a line which looks like this.', 'start': 3463.005, 'duration': 4.203}, {'end': 3469.329, 'text': 'I will get a line something like this.', 'start': 3467.228, 'duration': 2.101}, {'end': 3477.855, 'text': 'And then I will probably try to calculate the distance between each and every point and this specific point with respect to the distance between this point and this point.', 'start': 3469.53, 'duration': 8.325}, {'end': 3481.091, 'text': 'the denominator will definitely be high, right?', 'start': 3478.59, 'duration': 2.501}, {'end': 3486.172, 'text': 'This value, obviously this value, will be higher than this value, right?', 'start': 3481.151, 'duration': 5.021}, {'end': 3488.152, 'text': 'The reason why it will be higher?', 'start': 3486.652, 'duration': 1.5}, {'end': 3491.473, 'text': 'because the mean of this 
particular value, distance, will obviously be higher.', 'start': 3488.152, 'duration': 3.321}, {'end': 3498.854, 'text': 'So this 1 minus high, this will be a low value and this will be a high value.', 'start': 3492.273, 'duration': 6.581}, {'end': 3507.946, 'text': 'When I try to divide low by high, low by high, then obviously this entire number will become a small number.', 'start': 3500.035, 'duration': 7.911}, {'end': 3512.168, 'text': 'When this is a small number, 1 minus small number will be a big number.', 'start': 3508.326, 'duration': 3.842}, {'end': 3516.732, 'text': 'So this basically shows that our R square has fitted properly.', 'start': 3512.769, 'duration': 3.963}, {'end': 3520.514, 'text': 'It has basically got a very good R square.', 'start': 3517.792, 'duration': 2.722}, {'end': 3527.239, 'text': "Now tell me, can I get this entire R square a negative number? Let's say that in this particular case, I got 90%.", 'start': 3520.634, 'duration': 6.605}, {'end': 3531.542, 'text': 'Can I get this R square as negative number? There will be situation guys.', 'start': 3527.239, 'duration': 4.303}, {'end': 3540.932, 'text': 'What if I create a best fit line which looks like this? If I create this best fit line, which looks like this, then this value will be quite high.', 'start': 3531.662, 'duration': 9.27}, {'end': 3548.197, 'text': 'It is only possible when this value will be higher than higher than this value.', 'start': 3541.753, 'duration': 6.444}, {'end': 3556.322, 'text': "Okay But in a usual scenario, it will not happen because obviously we'll try to fit a line, which will be at least good.", 'start': 3549.658, 'duration': 6.664}, {'end': 3559.604, 'text': "It's not just like pulling one line somewhere.", 'start': 3557.022, 'duration': 2.582}, {'end': 3564.467, 'text': "We don't want to create a best fit line, which is worse than this, right? 
Worse than this.", 'start': 3560.784, 'duration': 3.683}, {'end': 3569.249, 'text': 'So, in this particular scenario, you will be saying that in R square.', 'start': 3565.447, 'duration': 3.802}, {'end': 3573.35, 'text': 'now, here you will be able to see one amazing feature about R square.', 'start': 3569.249, 'duration': 4.101}, {'end': 3575.451, 'text': "is that, let's say, one scenario?", 'start': 3573.35, 'duration': 2.101}, {'end': 3582.774, 'text': "Suppose I have features like, let's say that my feature is something like, let's say I have a price of a house.", 'start': 3576.131, 'duration': 6.643}], 'summary': "The r square and adjusted r square performance metrics verify the model's goodness of fit with respect to linear regression, ensuring a proper fit and preventing negative values.", 'duration': 241.164, 'max_score': 3341.61, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY3341610.jpg'}, {'end': 3670.095, 'src': 'embed', 'start': 3626.829, 'weight': 1, 'content': [{'end': 3627.989, 'text': 'And let me change the example.', 'start': 3626.829, 'duration': 1.16}, {'end': 3631.832, 'text': "See, first case I got my R square as 85%, let's say.", 'start': 3628.55, 'duration': 3.282}, {'end': 3635.335, 'text': 'Now, as soon as I added location, I got a 90%.', 'start': 3632.853, 'duration': 2.482}, {'end': 3639.518, 'text': "Now let's say that I added one more feature with gender is going to stay.", 'start': 3635.335, 'duration': 4.183}, {'end': 3642.513, 'text': 'gender like male or female is going to stay.', 'start': 3640.712, 'duration': 1.801}, {'end': 3645.216, 'text': 'You know that gender is no way correlated to price.', 'start': 3642.954, 'duration': 2.262}, {'end': 3653.142, 'text': 'But even though I add one feature, there is a scenario that my R square will still increase and it may become 91%.', 'start': 3645.976, 'duration': 7.166}, {'end': 3657.285, 'text': 'Even though my feature is not 
that important.', 'start': 3653.142, 'duration': 4.143}, {'end': 3659.587, 'text': 'even gender is not that important.', 'start': 3657.285, 'duration': 2.302}, {'end': 3668.333, 'text': 'the R square formula works in such a way that if I keep on adding features and that are not nowhere correlated, this is obviously nowhere correlated.', 'start': 3659.587, 'duration': 8.746}, {'end': 3670.095, 'text': 'This is not correlated with price.', 'start': 3668.393, 'duration': 1.702}], 'summary': 'Adding unrelated features increased r square from 90% to 91%.', 'duration': 43.266, 'max_score': 3626.829, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY3626829.jpg'}, {'end': 3743.037, 'src': 'embed', 'start': 3713.098, 'weight': 2, 'content': [{'end': 3718.621, 'text': 'So in order to prevent this situation, what we do, we basically use something called as adjusted R square.', 'start': 3713.098, 'duration': 5.523}, {'end': 3723.004, 'text': "Now what is this adjusted R square and how it will work? 
I'll also show it to you.", 'start': 3718.802, 'duration': 4.202}, {'end': 3725.446, 'text': 'Very, very nice concept of adjusted R square.', 'start': 3723.344, 'duration': 2.102}, {'end': 3731.009, 'text': 'So adjusted R square, R square adjusted is given by the formula.', 'start': 3725.886, 'duration': 5.123}, {'end': 3743.037, 'text': 'is given by the formula 1 minus 1 minus r square multiplied by n minus 1, where n is the total number of samples, n minus p minus 1.', 'start': 3732.395, 'duration': 10.642}], 'summary': 'Adjusted r square is calculated using 1 - (1 - r square) * (n - 1), where n is the total number of samples.', 'duration': 29.939, 'max_score': 3713.098, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY3713098.jpg'}, {'end': 4139.282, 'src': 'embed', 'start': 4111.56, 'weight': 0, 'content': [{'end': 4118.185, 'text': 'if i try to calculate the cost function, what will be the value of j of theta 0, comma theta 1?', 'start': 4111.56, 'duration': 6.625}, {'end': 4122.729, 'text': "let's say that in this particular case, since it is passing through the origin, my theta 0 will be 0.", 'start': 4118.185, 'duration': 4.544}, {'end': 4127.794, 'text': 'okay, So what will be the value of So?', 'start': 4122.729, 'duration': 5.065}, {'end': 4132.877, 'text': 'here? obviously you can see that there is no difference, so it will obviously become 0..', 'start': 4127.794, 'duration': 5.083}, {'end': 4139.282, 'text': 'Now understand this data that you see, right? 
This data is basically called as training data.', 'start': 4132.877, 'duration': 6.405}], 'summary': 'Calculating cost function for theta 0 and theta 1, results in theta 0 as 0 and sum of error as 0.', 'duration': 27.722, 'max_score': 4111.56, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY4111560.jpg'}], 'start': 2941.44, 'title': 'Linear regression and metrics', 'summary': 'Covers the gradient descent algorithm, emphasizing convergence and update formulas, and discusses r square and adjusted r square as performance metrics, with a scenario demonstrating the impact of adding an irrelevant feature. it also explains the concept and calculation of adjusted r square in regression analysis, highlighting its significance and practical applications.', 'chapters': [{'end': 3336.449, 'start': 2941.44, 'title': 'Understanding gradient descent algorithm', 'summary': 'Explains the gradient descent algorithm in deep learning, emphasizing the importance of convergence and the update formulas for theta 0 and theta 1 in the context of linear regression.', 'duration': 395.009, 'highlights': ['The chapter emphasizes the importance of convergence in the gradient descent algorithm and explains that convergence will occur when J of theta is minimized to a very low value, providing a key insight into the algorithm.', 'The update formulas for theta 0 and theta 1 are provided, indicating the iterative process involved in the gradient descent algorithm, with the learning rate alpha and the summation of h theta of x and y of i being essential components for updating the values.', 'The concept of derivative of theta 1 with respect to theta 1 x is explained, highlighting the role of x in the update process, providing clarity on the derivative calculation within the algorithm.', "The discussion of R square and adjusted R square in the context of convex functions and multiple features is presented, providing insights into the complexity 
and visualization of gradient descent in higher dimensions, offering a broader perspective on the algorithm's application."]}, {'end': 3512.168, 'start': 3336.469, 'title': 'Performance metrics in linear regression', 'summary': "Discusses r square and adjusted r square as performance metrics for linear regression, explaining the formulas and their significance in evaluating the model's goodness of fit and the variance explained, with emphasis on the calculation of summation of residuals and the impact of mean value on the denominator.", 'duration': 175.699, 'highlights': ['Explanation of the R square performance metric and its formula: The chapter provides a detailed explanation of the R square performance metric used to check the goodness of a specific linear regression model, presenting the formula 1 minus sum of residual divided by sum of total.', 'Calculation of the sum of residuals and its significance: The speaker delves into the calculation of the sum of residuals, illustrating it as the difference between predicted and actual points, emphasizing its role in evaluating model accuracy.', 'Impact of mean value on the denominator in the formula: The chapter highlights the influence of the mean value of y on the denominator, explaining how it affects the variance and the resulting impact on the overall R square value.']}, {'end': 3712.377, 'start': 3512.769, 'title': 'R square and model impact', 'summary': 'Explains the concept of r square, its significance in model fitting, and the impact of adding irrelevant features on r square, exemplifying a scenario where an irrelevant feature increases r square from 90% to 91%.', 'duration': 199.608, 'highlights': ['Explanation of R square and its significance in model fitting: The chapter provides an explanation of the significance of R square in model fitting.', 'Impact of adding irrelevant features on R square: The transcript highlights the impact of adding irrelevant features on R square, showcasing a scenario where an 
irrelevant feature increases R square from 90% to 91%.', 'Illustration of how adding correlated features can increase R square The transcript illustrates how adding correlated features, such as location, can increase R square, demonstrating an increase from 85% to 90% with the addition of a correlated feature.']}, {'end': 4132.877, 'start': 3713.098, 'title': 'Adjusted r square in regression analysis', 'summary': 'Explains the concept of adjusted r square in regression analysis, demonstrating how the adjusted r square value is calculated and how it is affected by the number of predictors, with an emphasis on the importance of understanding this concept in statistical analysis and its practical applications.', 'duration': 419.779, 'highlights': ['The concept of adjusted R square is explained, with the formula 1 - (1 - r square) * (n - 1) / (n - p - 1), where n is the total number of samples and p is the number of features or predictors, and its significance in evaluating the correlation between predictors and the dependent variable is emphasized.', 'The impact of the number of predictors on the adjusted R square value is illustrated through scenarios where increasing the number of predictors leads to changes in the adjusted R square value, highlighting the importance of considering the correlation between predictors and the dependent variable in statistical analysis.', 'The practical implications of adjusted R square are emphasized, stressing its role in evaluating the impact of the number of predictors on the goodness of fit in regression analysis and its relevance in understanding the trade-off between model complexity and explanatory power.', 'A roadmap for upcoming discussions on topics including ridge and lasso regression, assumptions of linear regression, logistic regression, and confusion matrix is outlined, providing an overview of the subsequent content to be covered and setting the context for further exploration of statistical analysis techniques and 
their applications.']}], 'duration': 1191.437, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY2941440.jpg', 'highlights': ['The chapter emphasizes the importance of convergence in the gradient descent algorithm and explains that convergence will occur when J of theta is minimized to a very low value, providing a key insight into the algorithm.', 'The concept of adjusted R square is explained, with the formula 1 - (1 - r square) * (n - 1) / (n - p - 1), where n is the total number of samples and p is the number of features or predictors, and its significance in evaluating the correlation between predictors and the dependent variable is emphasized.', 'The update formulas for theta 0 and theta 1 are provided, indicating the iterative process involved in the gradient descent algorithm, with the learning rate alpha and the summation of h theta of x and y of i being essential components for updating the values.', 'The chapter provides a detailed explanation of the R square performance metric used to check the goodness of a specific linear regression model, presenting the formula 1 minus sum of residual divided by sum of total.', 'The impact of adding irrelevant features on R square is highlighted, showcasing a scenario where an irrelevant feature increases R square from 90% to 91%.', 'The concept of derivative of theta 1 with respect to theta 1 x is explained, highlighting the role of x in the update process, providing clarity on the derivative calculation within the algorithm.', 'The practical implications of adjusted R square are emphasized, stressing its role in evaluating the impact of the number of predictors on the goodness of fit in regression analysis and its relevance in understanding the trade-off between model complexity and explanatory power.', "The discussion of R square and adjusted R square in the context of convex functions and multiple features is presented, providing insights into the complexity 
and visualization of gradient descent in higher dimensions, offering a broader perspective on the algorithm's application."]}, {'end': 5579.13, 'segs': [{'end': 4254.43, 'src': 'embed', 'start': 4181.14, 'weight': 0, 'content': [{'end': 4188.285, 'text': "So if my new data points are here, let's consider that I want to basically come up with this new data point.", 'start': 4181.14, 'duration': 7.145}, {'end': 4194.81, 'text': "Now in this particular scenario, if I want to predict with respect to this particular point, let's say my predicted point is here.", 'start': 4188.585, 'duration': 6.225}, {'end': 4199.152, 'text': 'Is this the difference between the predicted and the real point?', 'start': 4195.97, 'duration': 3.182}, {'end': 4199.712, 'text': 'quite huge?', 'start': 4199.152, 'duration': 0.56}, {'end': 4201.953, 'text': 'Yes or no??', 'start': 4201.453, 'duration': 0.5}, {'end': 4207.796, 'text': 'So this is basically creating a condition which is called as overfitting.', 'start': 4202.614, 'duration': 5.182}, {'end': 4220.683, 'text': 'That basically means, even though my model has given or trained well with the training data, or let me write it down properly over here.', 'start': 4208.617, 'duration': 12.066}, {'end': 4228.31, 'text': 'So this condition, since you can see that over here, my each and every point is basically passing through the best fit line.', 'start': 4221.601, 'duration': 6.709}, {'end': 4233.256, 'text': 'So because of that, what happens? It causes something called as overfitting.', 'start': 4228.89, 'duration': 4.366}, {'end': 4236.62, 'text': 'So you really need to understand what is overfitting.', 'start': 4234.217, 'duration': 2.403}, {'end': 4250.168, 'text': 'Now what does overfitting mean? Overfitting basically means my model performs well with training data but it fails to perform well with test data.', 'start': 4237.401, 'duration': 12.767}, {'end': 4254.43, 'text': 'Now what is the test data over here? 
The test data is basically these points.', 'start': 4250.748, 'duration': 3.682}], 'summary': 'Overfitting occurs when model performs well with training data but fails with test data.', 'duration': 73.29, 'max_score': 4181.14, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY4181140.jpg'}, {'end': 4370.489, 'src': 'embed', 'start': 4319.716, 'weight': 6, 'content': [{'end': 4321.137, 'text': 'It gives bad accuracy.', 'start': 4319.716, 'duration': 1.421}, {'end': 4322.238, 'text': "I'll say that model.", 'start': 4321.217, 'duration': 1.021}, {'end': 4328.624, 'text': 'Always remember, whenever I talk about bias, then you can understand that it is something related to the training data.', 'start': 4322.879, 'duration': 5.745}, {'end': 4334.289, 'text': 'Whenever I talk about test data, at that point of time, you talk about variance.', 'start': 4329.384, 'duration': 4.905}, {'end': 4339.953, 'text': 'And that specifically, whenever you talk about variance, that basically means we are talking about the test data.', 'start': 4334.669, 'duration': 5.284}, {'end': 4344.958, 'text': 'So for an overfitting, you will basically have low bias and high variance.', 'start': 4340.093, 'duration': 4.865}, {'end': 4351.84, 'text': 'low bias with respect to the training data and high variance with respect to the test data.', 'start': 4345.598, 'duration': 6.242}, {'end': 4362.464, 'text': 'now, if the model accuracy is bad with training data and the model accuracy is also bad with test data,', 'start': 4351.84, 'duration': 10.624}, {'end': 4367.766, 'text': 'in this scenario we basically say it as underfitting.', 'start': 4362.464, 'duration': 5.302}, {'end': 4370.489, 'text': 'so these are the two conditions that are.', 'start': 4367.766, 'duration': 2.723}], 'summary': 'Overfitting means low bias on training data but high variance on test data; underfitting means poor accuracy on both training and test data.', 'duration': 50.773, 'max_score': 4319.716, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY4319716.jpg'}, {'end': 4788.334, 'src': 'embed', 'start': 4755.067, 'weight': 10, 'content': [{'end': 4756.807, 'text': 'And then we will try to create a new line.', 'start': 4755.067, 'duration': 1.74}, {'end': 4757.968, 'text': 'When we.', 'start': 4757.208, 'duration': 0.76}, {'end': 4760.209, 'text': 'Sorry, it is 2, 2, not 3.', 'start': 4757.968, 'duration': 2.241}, {'end': 4760.969, 'text': 'Just a second, guys.', 'start': 4760.209, 'duration': 0.76}, {'end': 4763.11, 'text': '0 plus.', 'start': 4760.989, 'duration': 2.121}, {'end': 4768.293, 'text': '1 multiplied by 2 square which is nothing but 4.', 'start': 4764.609, 'duration': 3.684}, {'end': 4771.697, 'text': 'So now my cost function will not stop over here.', 'start': 4768.293, 'duration': 3.404}, {'end': 4774.42, 'text': 'So we are going to still reduce this.', 'start': 4772.418, 'duration': 2.002}, {'end': 4781.468, 'text': 'Now in order to reduce this, again theta1 value will get changed and then we will get a next best fit line for this point.', 'start': 4774.941, 'duration': 6.527}, {'end': 4788.334, 'text': 'Now what will happen in this scenario once we have this best fit line? 
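The segment above gives the lecture's rule of thumb: bias refers to performance on the training data, variance to performance on the test data; low bias with high variance means overfitting, and bad accuracy on both sides means underfitting. A minimal sketch of that rule (the function name `diagnose_fit` and the 0.80 accuracy cutoff are illustrative assumptions, not from the video):

```python
def diagnose_fit(train_acc, test_acc, good=0.80):
    """Classify a model's regime from train/test accuracy.

    Lecture rule of thumb: bias <-> training data, variance <-> test data.
    The `good` threshold (0.80) is an arbitrary illustrative cutoff.
    """
    low_bias = train_acc >= good      # performs well on training data
    low_variance = test_acc >= good   # performs well on test data
    if low_bias and low_variance:
        return "generalized"          # low bias, low variance
    if low_bias and not low_variance:
        return "overfitting"          # low bias, high variance
    return "underfitting"             # high bias: bad on training data

# e.g. 99% train accuracy but 60% test accuracy -> overfitting
```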
We will definitely get a kind of small difference.', 'start': 4782.249, 'duration': 6.085}], 'summary': 'Cost function will be reduced, theta1 value will change for next best fit line.', 'duration': 33.267, 'max_score': 4755.067, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY4755067.jpg'}, {'end': 4856.049, 'src': 'embed', 'start': 4829.93, 'weight': 1, 'content': [{'end': 4840.858, 'text': 'Now this small value plus 1 plus 1.3 square, or let me consider that my slope is now one simple value that is 5.', 'start': 4829.93, 'duration': 10.928}, {'end': 4845.401, 'text': 'So if I get this, it is 2.25, 2.25 plus small value.', 'start': 4840.858, 'duration': 4.543}, {'end': 4847.023, 'text': 'it will be less than 3 only, right?', 'start': 4845.401, 'duration': 1.622}, {'end': 4853.888, 'text': 'It will obviously be less than 3 or equal to 3, but understand what is happening the value is getting reduced from 4 to 3..', 'start': 4847.423, 'duration': 6.465}, {'end': 4856.049, 'text': 'So this is the importance of ridge.', 'start': 4853.888, 'duration': 2.161}], 'summary': "Ridge reduces the value from 4 to 3, ensuring it's less than 3.", 'duration': 26.119, 'max_score': 4829.93, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY4829930.jpg'}, {'end': 5492.076, 'src': 'heatmap', 'start': 5248.493, 'weight': 0.775, 'content': [{'end': 5254.556, 'text': 'we are just trying to reduce the cost function in such a way that it will definitely never become zero,', 'start': 5248.493, 'duration': 6.063}, {'end': 5258.717, 'text': 'but it will basically reduce based on the lambda and the slope value.', 'start': 5254.556, 'duration': 4.161}, {'end': 5267.579, 'text': 'In most of the scenario, if you ask me, we should definitely try both the regularization and see that wherever the performance matrix is good,', 'start': 5259.357, 'duration': 8.222}, {'end': 5268.299, 
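The passage above describes repeatedly changing theta1 to reduce the squared-error cost of a line through the origin until the best fit is reached. A hedged sketch of that update loop for a single slope parameter (the data points, learning rate, and step count are made-up illustrative values):

```python
def cost(theta1, xs, ys):
    """J(theta1) = (1/2m) * sum((theta1*x - y)^2) for a line through the origin."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, theta1=0.0, lr=0.1, steps=200):
    """Repeatedly move theta1 against the gradient of the cost."""
    m = len(xs)
    for _ in range(steps):
        grad = sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m
        theta1 -= lr * grad
    return theta1

# Points lying exactly on y = 2x: the cost can be driven to ~0, the
# transcript's situation where the fit passes through every training point.
theta = gradient_descent([1, 2, 3], [2, 4, 6])
```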
'text': 'we should use that.', 'start': 5267.579, 'duration': 0.72}, {'end': 5274.741, 'text': 'What is cross validation basically means I will try to use different different lambda value and basically use it.', 'start': 5268.759, 'duration': 5.982}, {'end': 5278.142, 'text': 'So in a short, let me write it down again.', 'start': 5275.601, 'duration': 2.541}, {'end': 5288.8, 'text': "For ridge regression, which is an L2 norm, here I'm simply writing my cost function in this particular case will be little bit different.", 'start': 5278.993, 'duration': 9.807}, {'end': 5302.687, 'text': 'Here I can definitely write my cost function as h theta x of i, minus y of i whole square, plus lambda, multiplied by slope square.', 'start': 5289.38, 'duration': 13.307}, {'end': 5304.427, 'text': 'what is the purpose of this?', 'start': 5302.687, 'duration': 1.74}, {'end': 5306.367, 'text': 'the purpose is very simple.', 'start': 5304.427, 'duration': 1.94}, {'end': 5309.028, 'text': 'here we are preventing overfitting.', 'start': 5306.367, 'duration': 2.661}, {'end': 5311.509, 'text': 'this was with respect to the ridge regression, that is, l2.', 'start': 5309.028, 'duration': 2.481}, {'end': 5320.331, 'text': 'now. now, if i go ahead and discuss about the next one, which is called as lasso regression, which is also called as l1 regularization,', 'start': 5311.509, 'duration': 8.822}, {'end': 5334.34, 'text': 'in the case of lasso regression, your cost function will be h theta of x, of i, minus y of i whole square, plus lambda, multiplied by mode of slope.', 'start': 5320.331, 'duration': 14.009}, {'end': 5339.504, 'text': 'So here you have this specific thing and what is the purpose? 
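The two penalized cost functions written down above, ridge (L2: squared error plus lambda times slope squared) and lasso (L1: squared error plus lambda times the modulus of the slope), can be sketched directly. The numeric checks reuse the transcript's own example: fit errors all zero, lambda 1, and a slope of 2 (cost 4) or 1.5 (cost 2.25), which is why the regularized cost never reaches zero and keeps shrinking the slope:

```python
def ridge_cost(pred_errors, slope, lam):
    """L2 / ridge: squared-error cost plus lambda * slope^2."""
    return sum(e ** 2 for e in pred_errors) + lam * slope ** 2

def lasso_cost(pred_errors, slope, lam):
    """L1 / lasso: squared-error cost plus lambda * |slope|."""
    return sum(e ** 2 for e in pred_errors) + lam * abs(slope)

# Even when every residual is 0 (the overfit line passes through each
# training point), the penalty term keeps the cost positive, so gradient
# descent continues to reduce the slope.
```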
The purpose are two.', 'start': 5334.981, 'duration': 4.523}, {'end': 5346.51, 'text': 'One is prevent overfitting and the second one is something called as feature selection.', 'start': 5340.165, 'duration': 6.345}, {'end': 5349.891, 'text': 'So, these two are the outcomes of the entire thing.', 'start': 5347.27, 'duration': 2.621}, {'end': 5353.193, 'text': 'See, with respect to this lasso, right, you have slopes.', 'start': 5350.131, 'duration': 3.062}, {'end': 5360.396, 'text': 'Slopes here, you will be having theta 0 plus theta 1 plus theta 2 plus theta 3, like this up to theta n.', 'start': 5353.753, 'duration': 6.643}, {'end': 5363.718, 'text': 'Now, when you will have this many number of thetas, when you have many number of features.', 'start': 5360.396, 'duration': 3.322}, {'end': 5367.602, 'text': "And when you have many number of features, that basically means you'll have multiple slopes, right?", 'start': 5364.398, 'duration': 3.204}, {'end': 5373.929, 'text': 'Those features that are not performing well or that has no contribution in finding out your output.', 'start': 5368.243, 'duration': 5.686}, {'end': 5376.332, 'text': 'that coefficient value will be almost nil, right?', 'start': 5373.929, 'duration': 2.403}, {'end': 5378.254, 'text': 'It will be very much near to zero.', 'start': 5376.372, 'duration': 1.882}, {'end': 5381.759, 'text': 'In short, you are neglecting that value by using modulus.', 'start': 5379.036, 'duration': 2.723}, {'end': 5382.9, 'text': 'You are not squaring them up.', 'start': 5381.779, 'duration': 1.121}, {'end': 5384.802, 'text': 'You are not increasing those values.', 'start': 5383.341, 'duration': 1.461}, {'end': 5391.769, 'text': 'Now, I will continue and probably I will also discuss about the assumptions of linear regressions.', 'start': 5385.303, 'duration': 6.466}, {'end': 5395.593, 'text': 'So what are the assumptions of linear regression in this particular scenario?', 'start': 5392.55, 'duration': 3.043}, {'end': 
5406.065, 'text': 'So assumption is that number one point, Linear regression if our features are in normal or Gaussian distribution.', 'start': 5396.554, 'duration': 9.511}, {'end': 5414.056, 'text': 'If our features follow this particular distribution, it is obviously good, our model will get trained well.', 'start': 5407.707, 'duration': 6.349}, {'end': 5418.6, 'text': 'So there is one concept which is called as feature transformation.', 'start': 5415.097, 'duration': 3.503}, {'end': 5425.666, 'text': 'Now in feature transformation, always understand what will happen if a model does not follow a Gaussian distribution,', 'start': 5419.421, 'duration': 6.245}, {'end': 5431.811, 'text': 'then we apply some kind of mathematical equation onto the data and try to convert them into normal or Gaussian distribution.', 'start': 5425.666, 'duration': 6.145}, {'end': 5438.757, 'text': 'The second assumption that I would definitely like to make is that, Standard Scaler or Standardization.', 'start': 5432.592, 'duration': 6.165}, {'end': 5445.323, 'text': 'Standardization is nothing but it is a kind of scaling your data by using Z-score.', 'start': 5439.398, 'duration': 5.925}, {'end': 5447.165, 'text': 'I hope everybody remembers Z-score.', 'start': 5445.463, 'duration': 1.702}, {'end': 5449.427, 'text': 'This is what we basically apply.', 'start': 5447.846, 'duration': 1.581}, {'end': 5453.03, 'text': 'There your mean is equal to 0 and standard deviation equal to 1.', 'start': 5449.607, 'duration': 3.423}, {'end': 5458.914, 'text': 'See guys, wherever you have gradient descent involved, it is good to basically do standardization.', 'start': 5453.03, 'duration': 5.884}, {'end': 5466.22, 'text': 'Because if our initial point is a small point somewhere here, then to reach the global minima, our training will happen quickly.', 'start': 5459.695, 'duration': 6.525}, {'end': 5473.645, 'text': 'Otherwise, what will happen if your values are quite huge, then your graph may be
very big and the point can come any over there.', 'start': 5466.98, 'duration': 6.665}, {'end': 5480.469, 'text': 'And the third point is that this linear regression works with respect to linearity.', 'start': 5474.265, 'duration': 6.204}, {'end': 5484.091, 'text': 'It works if your data is linearly separable.', 'start': 5480.569, 'duration': 3.522}, {'end': 5488.214, 'text': "I'll not say linearly separable, but this linearity will come into picture.", 'start': 5484.111, 'duration': 4.103}, {'end': 5492.076, 'text': 'If your data is too much linear, it will obviously be able to give a very good answer.', 'start': 5488.614, 'duration': 3.462}], 'summary': 'Using ridge and lasso regression to prevent overfitting and feature selection in linear regression, also discussing assumptions of linear regression.', 'duration': 243.583, 'max_score': 5248.493, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY5248493.jpg'}], 'start': 4132.877, 'title': 'Machine learning concepts', 'summary': 'Explores training data, cost minimization, overfitting, underfitting, generalization, ridge and lasso regression, l1 and l2 regularization, including their impact on predictions, bias, variance, and feature selection, with emphasis on the use of hyperparameters, cross-validation, and assumptions for linear regression.', 'chapters': [{'end': 4201.953, 'start': 4132.877, 'title': 'Training data and cost minimization', 'summary': 'Explains the concept of training data, minimizing the cost function, and the impact of new data points on predictions in a machine learning model.', 'duration': 69.076, 'highlights': ['The data used for creating the model is called training data, and the objective is to minimize the cost function, currently at zero.', "The chapter emphasizes the need to minimize the cost function, which reflects the accuracy of the model's predictions.", 'The impact of new data points on predictions is illustrated, highlighting 
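The standardization assumption above, scale each feature with the Z-score so its mean becomes 0 and its standard deviation becomes 1, which helps gradient descent reach the global minimum faster, can be sketched in plain Python (an illustration of the formula, not scikit-learn's `StandardScaler`):

```python
def standardize(values):
    """Z-score scaling: (x - mean) / std, giving mean 0 and std 1."""
    m = len(values)
    mean = sum(values) / m
    # population standard deviation, as in the usual z-score definition
    std = (sum((v - mean) ** 2 for v in values) / m) ** 0.5
    return [(v - mean) / std for v in values]

z = standardize([2, 4, 6, 8])  # centered and rescaled copies of the inputs
```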
the potential disparity between predicted and real points.']}, {'end': 4610.308, 'start': 4202.614, 'title': 'Understanding overfitting, underfitting, and generalization', 'summary': 'Explains the concepts of overfitting, underfitting, and generalization in machine learning, emphasizing the issues of low bias and high variance in overfitting, low bias and low variance in generalization, and high bias and high variance in underfitting, using examples and a cost function formula.', 'duration': 407.694, 'highlights': ['The chapter elaborates on the concept of overfitting, where the model performs well with training data but fails to perform well with test data, with an emphasis on the issues of low bias and high variance, illustrated through model accuracy percentages and the difference between predicted and real values.', 'It discusses the scenario of underfitting, where the model gives bad accuracy for both the training data and the test data, highlighting the issues of high bias and high variance.', 'The chapter emphasizes the importance of having a generalized model that can provide accurate outputs for new data, and uses examples to demonstrate the impact of overfitting, underfitting, and generalization on model performance and the difference between predicted and real values.', 'It explains the cost function formula and the calculation of the difference between predicted and real values, underlining its significance in evaluating model performance and identifying overfitting, underfitting, and generalization.']}, {'end': 5174.152, 'start': 4610.308, 'title': 'Ridge and lasso regression', 'summary': 'Explains the concepts of ridge and lasso regression, including the use of l2 regularization in ridge to prevent overfitting by adding a unique parameter, and the use of l1 regularization in lasso for feature selection based on the mode of slope.', 'duration': 563.844, 'highlights': ['In Ridge regression, L2 regularization adds a unique parameter, lambda multiplied by 
slope square, to prevent overfitting. Ridge regression uses L2 regularization to prevent overfitting by adding a parameter lambda multiplied by the slope square.', 'Ridge regression aims to create a generalized model with low bias and low variance to avoid overfitting. Ridge regression aims to create a generalized model with low bias and low variance to avoid overfitting.', 'Lasso regression, using L1 regularization, incorporates the mode of slope to facilitate feature selection. Lasso regression utilizes L1 regularization and the mode of slope for feature selection.', "The mode of slope in Lasso regression helps in neglecting features with very small coefficients, facilitating feature selection. Lasso regression's mode of slope helps in neglecting features with very small coefficients, facilitating feature selection.", 'Lambda, a hyperparameter in both Ridge and Lasso regression, determines the rate of lessening the steepness or making it grow higher. Lambda, a hyperparameter, determines the rate of lessening the steepness or making it grow higher in both Ridge and Lasso regression.']}, {'end': 5579.13, 'start': 5174.253, 'title': 'Importance of l1 and l2 regularization', 'summary': 'Discusses the importance of l1 and l2 regularization (lasso and ridge regression) in preventing overfitting, performing feature selection, and finding the best fit line, with emphasis on the use of hyperparameter lambda and cross-validation, as well as the assumptions and checks for linear regression and feature transformation.', 'duration': 404.877, 'highlights': ['L1 regularization (LASSO) helps in preventing overfitting and feature selection by neglecting unimportant features, with the use of hyperparameter lambda and cross-validation to find the exact value. 
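The feature-selection claim above, that L1 regularization drives the coefficients of non-contributing features to (almost) zero, is commonly illustrated with the one-dimensional soft-thresholding operator, the closed-form lasso update for a single standardized feature. This is a standard result sketched here as an aside; the video itself does not derive it:

```python
def soft_threshold(rho, lam):
    """One-dimensional lasso update: shrink rho toward zero by lam,
    and clamp to exactly zero when |rho| <= lam.

    This is why L1 performs feature selection: a weakly contributing
    coefficient does not just get small, it becomes exactly 0, whereas
    the L2 (ridge) penalty only shrinks coefficients toward zero.
    """
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0
```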
Preventing overfitting, feature selection, use of hyperparameter lambda, cross-validation', 'L2 regularization (Ridge regression) also prevents overfitting and involves a cost function with the addition of lambda multiplied by the square of the slope. Preventing overfitting, cost function with lambda and slope square', "Assumptions of linear regression include features following a normal or Gaussian distribution, feature transformation to achieve this distribution, standardization using Z-score for scaling, and the importance of linearity in the data for the model's effectiveness. Assumptions of linear regression, feature transformation, standardization using Z-score, importance of linearity", 'The importance of checking for multicollinearity, as highly correlated features may not both be necessary, and the concept of variation inflation factor for solving multicollinearity. Checking for multicollinearity, concept of variation inflation factor']}], 'duration': 1446.253, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY4132877.jpg', 'highlights': ['The data used for creating the model is called training data, and the objective is to minimize the cost function, currently at zero.', "The chapter emphasizes the need to minimize the cost function, which reflects the accuracy of the model's predictions.", 'The impact of new data points on predictions is illustrated, highlighting the potential disparity between predicted and real points.', 'The chapter elaborates on the concept of overfitting, where the model performs well with training data but fails to perform well with test data, with an emphasis on the issues of low bias and high variance, illustrated through model accuracy percentages and the difference between predicted and real values.', 'It discusses the scenario of underfitting, where the model gives bad accuracy for both the training data and the test data, highlighting the issues of high bias and high 
variance.', 'The chapter emphasizes the importance of having a generalized model that can provide accurate outputs for new data, and uses examples to demonstrate the impact of overfitting, underfitting, and generalization on model performance and the difference between predicted and real values.', 'In Ridge regression, L2 regularization adds a unique parameter, lambda multiplied by slope square, to prevent overfitting.', 'Ridge regression aims to create a generalized model with low bias and low variance to avoid overfitting.', 'Lasso regression, using L1 regularization, incorporates the mode of slope to facilitate feature selection.', 'The mode of slope in Lasso regression helps in neglecting features with very small coefficients, facilitating feature selection.', 'Lambda, a hyperparameter in both Ridge and Lasso regression, determines the rate of lessening the steepness or making it grow higher.', 'L1 regularization (LASSO) helps in preventing overfitting and feature selection by neglecting unimportant features, with the use of hyperparameter lambda and cross-validation to find the exact value.', 'L2 regularization (Ridge regression) also prevents overfitting and involves a cost function with the addition of lambda multiplied by the square of the slope.', "Assumptions of linear regression include features following a normal or Gaussian distribution, feature transformation to achieve this distribution, standardization using Z-score for scaling, and the importance of linearity in the data for the model's effectiveness.", 'The importance of checking for multicollinearity, as highly correlated features may not both be necessary, and the concept of variation inflation factor for solving multicollinearity.']}, {'end': 6669.638, 'segs': [{'end': 5609.49, 'src': 'embed', 'start': 5579.51, 'weight': 2, 'content': [{'end': 5584.636, 'text': 'But if you are almost satisfied with this assumptions, you will definitely be able to outperform in linear regression.', 'start': 
5579.51, 'duration': 5.126}, {'end': 5587.239, 'text': 'So you have got an idea of the assumptions.', 'start': 5585.096, 'duration': 2.143}, {'end': 5589.281, 'text': 'You have also got an idea of multiple things.', 'start': 5587.279, 'duration': 2.002}, {'end': 5593.085, 'text': "Okay Now let's go towards something called as logistic regression.", 'start': 5589.501, 'duration': 3.584}, {'end': 5595.206, 'text': 'now logistic regression.', 'start': 5593.906, 'duration': 1.3}, {'end': 5600.108, 'text': 'what logistic regression is the first type of algorithm that we are going to learn in classification?', 'start': 5595.206, 'duration': 4.902}, {'end': 5603.328, 'text': "let's say that in classification i have one example.", 'start': 5600.108, 'duration': 3.22}, {'end': 5609.49, 'text': 'you know, suppose i have, say, number of hours, study hours and number of play hours.', 'start': 5603.328, 'duration': 6.162}], 'summary': 'Understanding assumptions leads to better performance in linear regression and introduction to logistic regression in classification.', 'duration': 29.98, 'max_score': 5579.51, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY5579510.jpg'}, {'end': 5732.715, 'src': 'embed', 'start': 5708.448, 'weight': 0, 'content': [{'end': 5714.991, 'text': 'If it is greater than five, greater than three, it is basically showing data points with respect to pass.', 'start': 5708.448, 'duration': 6.543}, {'end': 5722.133, 'text': "Now, can't we solve this problem first with linear regression? 
Now with the help of linear regression?', 'start': 5715.991, 'duration': 6.142}, {'end': 5726.734, 'text': 'here the first point will be that, yes, I can definitely draw a best fit line.', 'start': 5722.133, 'duration': 4.601}, {'end': 5730.895, 'text': 'My best fit line in this particular scenario may be something like this.', 'start': 5726.834, 'duration': 4.061}, {'end': 5732.715, 'text': 'It may look something like this.', 'start': 5731.335, 'duration': 1.38}], 'summary': 'Using linear regression to draw best fit line for data points.', 'duration': 24.267, 'max_score': 5708.448, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY5708448.jpg'}, {'end': 6373.072, 'src': 'embed', 'start': 6347.025, 'weight': 1, 'content': [{'end': 6353.647, 'text': 'Here is my training set with two outputs and I hope everybody knows about theta of z.', 'start': 6347.025, 'duration': 6.622}, {'end': 6357.149, 'text': 'It is nothing but 1 over 1 plus e to the power of minus z.', 'start': 6353.647, 'duration': 3.502}, {'end': 6361.61, 'text': 'Here your z is nothing but theta 0 plus theta 1 multiplied by x1.', 'start': 6357.149, 'duration': 4.461}, {'end': 6364.19, 'text': 'So this is your theta 0.', 'start': 6361.971, 'duration': 2.219}, {'end': 6367.411, 'text': 'Now what we have to do, we have to select this theta.', 'start': 6364.19, 'duration': 3.221}, {'end': 6373.072, 'text': "Now in this particular case, let's consider that my theta 0 is 0 because it is passing through the origin.", 'start': 6367.631, 'duration': 5.441}], 'summary': 'Training set with 2 outputs, using theta of z for computations.', 'duration': 26.047, 'max_score': 6347.025, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY6347025.jpg'}, {'end': 6432.527, 'src': 'embed', 'start': 6400.625, 'weight': 3, 'content': [{'end': 6402.526, 'text': 'Now everything will be same.', 'start': 6400.625, 'duration': 1.901},
{'end': 6405.028, 'text': 'Obviously, you know the cost function of linear regression,', 'start': 6402.906, 'duration': 2.122}, {'end': 6410.292, 'text': 'because the first best fit line that you are probably creating is with the help of linear regression.', 'start': 6405.028, 'duration': 5.264}, {'end': 6414.728, 'text': 'Now, in this particular case, in the case of linear regression.', 'start': 6411.584, 'duration': 3.144}, {'end': 6420.554, 'text': 'so here you can basically write j of theta, 1 is nothing but 1 by m.', 'start': 6414.728, 'duration': 5.826}, {'end': 6432.527, 'text': 'summation of i is equal to 1 to m, 1 by 2, and here you have h theta of x, minus y, of i, i whole square.', 'start': 6420.554, 'duration': 11.973}], 'summary': 'Linear regression cost function: j(θ₁) = 1/m ∑ᵢ₁/2(hθ(xᵢ) - yᵢ)²', 'duration': 31.902, 'max_score': 6400.625, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY6400625.jpg'}], 'start': 5579.51, 'title': 'Logistic regression fundamentals', 'summary': "Covers logistic regression basics, application in binary and multiclass classification, limitations of linear regression for classification, the role of the sigmoid function in handling outliers, ensuring output values between 0 and 1, explaining the concept of logistic regression, the sigmoid function, and its role in creating a decision boundary, and the logistic regression cost function's non-convex nature and challenge of local minima.", 'chapters': [{'end': 6035.071, 'start': 5579.51, 'title': 'Understanding logistic regression', 'summary': 'Discusses the basics of logistic regression, its application in binary and multiclass classification, and the limitations of using linear regression for classification problems, emphasizing the need for a sigmoid function to handle outliers and ensure output values between 0 and 1.', 'duration': 455.561, 'highlights': ['Logistic regression is introduced as the first algorithm for 
classification, specifically suited for binary classification and can also be used for multiclass classification, providing a clear distinction between categories.', 'The limitations of using linear regression for classification problems are highlighted, with a focus on the impact of outliers on the line and the need to ensure output values between 0 and 1, emphasizing the necessity for logistic regression and the sigmoid function to address these challenges.', 'The concept of decision boundary is defined in logistic regression, emphasizing the need for the output values to be between 0 and 1, essential for binary classification problems.']}, {'end': 6301.361, 'start': 6035.831, 'title': 'Logistic regression and sigmoid function', 'summary': 'Explains the concept of logistic regression and the sigmoid function, highlighting the use of the sigmoid function to squash the linear regression line to create a decision boundary, and the key assumption that g(z) is greater than or equal to 0.5 when z is greater than or equal to 0.', 'duration': 265.53, 'highlights': ['The sigmoid function is used to squash the linear regression line to create a decision boundary. This explains how the sigmoid function is utilized to squash the linear regression line to create a decision boundary in logistic regression.', 'The key assumption that g(z) is greater than or equal to 0.5 when z is greater than or equal to 0. 
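The sigmoid squashing and the 0.5 decision boundary described above, g(z) = 1 / (1 + e^(-z)) with g(z) >= 0.5 exactly when z >= 0, can be sketched as follows (the `predict` helper and its parameter names are illustrative, not from the video):

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)): squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta0, theta1, x1, threshold=0.5):
    """Squash the linear output z = theta0 + theta1*x1 and apply the
    0.5 decision boundary; g(z) >= 0.5 exactly when z >= 0."""
    return 1 if sigmoid(theta0 + theta1 * x1) >= threshold else 0
```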
The chapter emphasizes the important assumption that g(z) is greater than or equal to 0.5 when z is greater than or equal to 0 in logistic regression.']}, {'end': 6669.638, 'start': 6301.361, 'title': 'Logistic regression cost function', 'summary': 'Discusses the logistic regression cost function and its non-convex nature, explaining the difference between linear regression and logistic regression cost functions and the challenge of local minima in the latter.', 'duration': 368.277, 'highlights': ['The logistic regression cost function is a non-convex function, unlike the convex function of linear regression cost, leading to local minima problems and making it unsuitable for use (relevance score: 5)', 'The challenge arises due to the non-differentiable nature and multiple local minima, hindering the reach of the global minima in logistic regression cost function optimization (relevance score: 4)', 'The chapter also emphasizes the difference in the nature of convex function and non-convex function, specifically related to gradient descent and its impact on reaching global minima (relevance score: 3)', 'Explanation of the logistic regression cost function and its difference from the linear regression cost function, along with the impact of the sigmoid activation function on the parameter theta1 (relevance score: 2)', 'Introduction of the training set and the concept of binary classification, highlighting the need for selecting theta and the definition of cost function for logistic regression (relevance score: 1)']}], 'duration': 1090.128, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY5579510.jpg', 'highlights': ['Logistic regression is introduced as the first algorithm for classification, suitable for binary and multiclass classification.', 'The sigmoid function is used to squash the linear regression line to create a decision boundary.', 'The limitations of using linear regression for classification problems 
are highlighted, emphasizing the necessity for logistic regression and the sigmoid function.', 'The logistic regression cost function is a non-convex function, leading to local minima problems.', 'The challenge arises due to the non-differentiable nature and multiple local minima in logistic regression cost function optimization.']}, {'end': 7754.621, 'segs': [{'end': 7012.887, 'src': 'embed', 'start': 6986.551, 'weight': 2, 'content': [{'end': 6991.014, 'text': 'And this 1 minus 1 is 0, so 0 multiplied by anything will be 0.', 'start': 6986.551, 'duration': 4.463}, {'end': 6993.295, 'text': 'if y is equal to 0, then what will happen?', 'start': 6991.014, 'duration': 2.281}, {'end': 6998.617, 'text': 'my cost function will be so when it is 0, this will minus.', 'start': 6993.295, 'duration': 5.322}, {'end': 7001.899, 'text': 'y will become 0 0 multiplied by 0, anything is 0.', 'start': 6998.617, 'duration': 3.282}, {'end': 7003.62, 'text': 'so here you will be able to see that i am.', 'start': 7001.899, 'duration': 1.721}, {'end': 7009.183, 'text': 'i will be having minus log 1, minus h, theta of x of i.', 'start': 7003.62, 'duration': 5.563}, {'end': 7012.887, 'text': 'so this both the condition has been proved by this cost function.', 'start': 7009.183, 'duration': 3.704}], 'summary': 'The cost function demonstrates that 0 multiplied by anything is 0; when y=0, the function becomes minus log 1.', 'duration': 26.336, 'max_score': 6986.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY6986551.jpg'}, {'end': 7481.936, 'src': 'embed', 'start': 7454.979, 'weight': 0, 'content': [{'end': 7459.341, 'text': 'If my model is basically just outputting 0, 0, 0, 0, 0.', 'start': 7454.979, 'duration': 4.362}, {'end': 7463.365, 'text': 'If it is outputting 0, 0, 0, obviously, most of the answer will be zeros.', 'start': 7459.342, 'duration': 4.023}, {'end': 7468.568, 'text': 'But this will be a scenario like, you know, 
where it is just outputting one thing.', 'start': 7463.605, 'duration': 4.963}, {'end': 7470.75, 'text': 'Then also it is able to get 90% accuracy.', 'start': 7468.768, 'duration': 1.982}, {'end': 7474.032, 'text': 'So you should only not be dependent on accuracy.', 'start': 7471.23, 'duration': 2.802}, {'end': 7477.534, 'text': 'So there are lot of terminologies that we will basically use.', 'start': 7474.652, 'duration': 2.882}, {'end': 7481.936, 'text': 'One terminology that we specifically use is something called as precision.', 'start': 7477.754, 'duration': 4.182}], 'summary': 'Model outputting 0, 0, 0; 90% accuracy not sole factor; precision terminology.', 'duration': 26.957, 'max_score': 7454.979, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY7454979.jpg'}], 'start': 6669.999, 'title': 'Logistic regression cost function and performance metrics in binary classification', 'summary': 'Explains the need to modify the logistic regression cost function to address local minima, proposing a new cost function guaranteeing global minima. it also details logistic regression cost function and performance metrics for binary classification, including confusion matrix, accuracy calculation, precision, and recall, emphasizing the impact of imbalanced datasets on model evaluation.', 'chapters': [{'end': 6774.672, 'start': 6669.999, 'title': 'Logistic regression cost function', 'summary': 'Explains the need to modify the logistic regression cost function to address the issue of local minima, proposing a new cost function that guarantees a global minima.', 'duration': 104.673, 'highlights': ['The logistic regression cost function needs to be modified to address the issue of local minima, where the slope is zero and theta1 does not get updated, introducing the proposal for a new cost function. 
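The modified cost function proposed above, -log(h(x)) when y = 1 and -log(1 - h(x)) when y = 0, combines into the single convex expression -y log(h) - (1 - y) log(1 - h), which is what lets gradient descent reach a global minimum. A sketch (the `eps` clamp is an added numerical guard against log(0), not part of the lecture's derivation):

```python
import math

def logistic_cost(h, y, eps=1e-12):
    """Log loss for one example: -y*log(h) - (1 - y)*log(1 - h).

    For y = 1 the second term vanishes (cost = -log h, which is 0 when
    h = 1); for y = 0 the first term vanishes. Unlike squared error fed
    through a sigmoid, this cost is convex, so there are no local minima.
    """
    h = min(max(h, eps), 1 - eps)  # clamp predictions away from 0 and 1
    return -y * math.log(h) - (1 - y) * math.log(1 - h)
```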
(relevance score: 5)', 'The new proposed logistic regression cost function is represented as minus log of h of theta of x for y=1 and minus log of 1 minus h of theta of x for y=0, ensuring a global minima due to the use of log. (relevance score: 4)', 'The chapter emphasizes the utilization of the new cost function to ensure the achievement of a global minima, neglecting the previous cost function. (relevance score: 3)']}, {'end': 7123.923, 'start': 6775.692, 'title': 'Logistic regression cost function', 'summary': 'Explains the logistic regression cost function, where the cost is 0 when y is 1 and h theta of x is 1, and a different curve is obtained when y is 0, leading to a gradient descent. the final cost function is derived and the convergence algorithm for updating theta 1 is detailed.', 'duration': 348.231, 'highlights': ['The cost is 0 when y is 1 and h theta of x is 1 When y is equal to 1 and h theta of x is equal to 1, the cost function will be 0, indicating a perfect fit.', 'Different curve obtained when y is 0 When y is 0, a different curve is obtained, leading to a gradient descent and contributing to the final cost function.', 'Derivation of the final cost function The final cost function is derived as Cost of h theta of x of i comma y is equal to minus y log h theta of x of i minus (1 minus y) log of 1 minus h theta of x of i, with specific cases when y is equal to 1 and y is equal to 0.']}, {'end': 7754.621, 'start': 7124.323, 'title': 'Performance metrics in binary classification', 'summary': 'Discusses performance metrics for binary classification, including confusion matrix, accuracy calculation, precision, recall, and the impact of imbalanced datasets on model evaluation, emphasizing the importance of precision and recall in different problem scenarios.', 'duration': 630.298, 'highlights': ['The accuracy of the model is calculated using the confusion matrix, and in the given scenario, it is determined to be 57% based on the true positives, true negatives, false positives, and
false negatives.', 'The impact of imbalanced datasets on model accuracy is illustrated using an example where a biased model achieving 90% accuracy is highlighted to emphasize that accuracy alone may not be a reliable metric for evaluation.', 'Precision and recall are explained as important metrics for evaluating model performance, with precision focusing on the accuracy of positive predictions and recall emphasizing the accurate identification of true positives.', 'The distinction between precision and recall is exemplified through use cases such as spam classification and cancer detection, highlighting the prioritization of false positives in spam classification and false negatives in cancer detection.', "The importance of considering the end user's perspective when choosing between precision and recall is emphasized, with the example of stock market prediction illustrating the differing implications for individuals and companies."]}], 'duration': 1084.622, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY6669999.jpg', 'highlights': ['The logistic regression cost function needs modification to address local minima, proposing a new cost function (relevance score: 5)', 'The new proposed logistic regression cost function ensures global minima due to the use of log (relevance score: 4)', 'The accuracy of the model is calculated using the confusion matrix, and in the given scenario, it is determined to be 57% based on the true positives, true negatives, false positives, and false negatives (relevance score: 3)', 'The impact of imbalanced datasets on model accuracy is illustrated using an example where a biased model achieving 90% accuracy is highlighted to emphasize that accuracy alone may not be a reliable metric for evaluation (relevance score: 2)', 'Precision and recall are explained as important metrics for evaluating model performance, with precision focusing on the accuracy of positive predictions and 
recall emphasizing the accurate identification of true positives (relevance score: 1)']}, {'end': 8510.181, 'segs': [{'end': 7961.923, 'src': 'embed', 'start': 7932.728, 'weight': 3, 'content': [{'end': 7936.769, 'text': 'That basically means you are giving importance to both precision and recall.', 'start': 7932.728, 'duration': 4.041}, {'end': 7941.871, 'text': 'If your false positive is more important, then at that point of time, you reduce your beta value.', 'start': 7937.43, 'duration': 4.441}, {'end': 7946.553, 'text': 'If false negative is greater than false positive, then your beta value is increasing.', 'start': 7942.331, 'duration': 4.222}, {'end': 7953.28, 'text': 'Beta is a deciding parameter to decide your F1 score or F2 score or F point score.', 'start': 7947.978, 'duration': 5.302}, {'end': 7954.36, 'text': 'Now, first thing.', 'start': 7953.72, 'duration': 0.64}, {'end': 7956.341, 'text': "first, what is the agenda of today's session?", 'start': 7954.36, 'duration': 1.981}, {'end': 7961.923, 'text': 'First of all, we will complete practicals for all the algorithms that we have discussed.', 'start': 7956.381, 'duration': 5.542}], 'summary': 'The importance of precision and recall is determined by beta value to calculate f1, f2, and f point scores.', 'duration': 29.195, 'max_score': 7932.728, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY7932728.jpg'}, {'end': 8088.63, 'src': 'embed', 'start': 8064.691, 'weight': 4, 'content': [{'end': 8072.441, 'text': 'we will be using fit intercept, everything as such, but here the main aim is to find out the coefficients, which is basically indicated by theta 0,', 'start': 8064.691, 'duration': 7.75}, {'end': 8075.204, 'text': 'theta 1 and all the first thing.', 'start': 8072.441, 'duration': 2.763}, {'end': 8088.63, 'text': "we'll start with linear regression and then we will go ahead and discuss with ridge and lasso i'm just going to make this as 
markdown how many different libraries of for linear regression you can do with stats,", 'start': 8075.204, 'duration': 13.426}], 'summary': 'Analyzing linear regression and its coefficients, exploring ridge and lasso methods.', 'duration': 23.939, 'max_score': 8064.691, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY8064691.jpg'}, {'end': 8303.888, 'src': 'embed', 'start': 8267.574, 'weight': 0, 'content': [{'end': 8269.314, 'text': 'And based on that, I have that specific value.', 'start': 8267.574, 'duration': 1.74}, {'end': 8275.179, 'text': "Now, the next thing that I'm going to do, probably I should also be able to add the target feature name over here.", 'start': 8270.396, 'duration': 4.783}, {'end': 8285.085, 'text': "So what I will do, I will just convert this into DF and then I will also say DF dot columns and I'll set it to DF dot target.", 'start': 8275.499, 'duration': 9.586}, {'end': 8288.048, 'text': 'Okay And let me change this to dataset.', 'start': 8285.566, 'duration': 2.482}, {'end': 8294.072, 'text': "So I'm going to change this to dataset and I'm going to say dataset dot columns is equal to DF dot target.", 'start': 8288.748, 'duration': 5.324}, {'end': 8303.888, 'text': 'So if I execute this and now if I probably print my dataset dot head, you will be able to see this specific thing.', 'start': 8294.252, 'duration': 9.636}], 'summary': 'Converting specific values to dataset and printing head.', 'duration': 36.314, 'max_score': 8267.574, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY8267574.jpg'}, {'end': 8413.717, 'src': 'embed', 'start': 8384.808, 'weight': 1, 'content': [{'end': 8387.05, 'text': "I don't know what the hell it means, but it's fine.", 'start': 8384.808, 'duration': 2.242}, {'end': 8390.072, 'text': 'We have some kind of data over here properly in front of you.', 'start': 8387.111, 'duration': 2.961}, {'end': 
8392.955, 'text': 'So these are my independent features.', 'start': 8391.093, 'duration': 1.862}, {'end': 8394.897, 'text': 'What are these? These all are my independent features.', 'start': 8392.995, 'duration': 1.902}, {'end': 8399.661, 'text': 'If you want the features detail, here you can see it, right? Everything.', 'start': 8394.957, 'duration': 4.704}, {'end': 8404.466, 'text': 'What is CRM? This basically means per capita crime rate by town, which is important.', 'start': 8399.741, 'duration': 4.725}, {'end': 8410.191, 'text': 'ZN, it is proportional of residential land zone for lots over 25, 000 square feet.', 'start': 8405.106, 'duration': 5.085}, {'end': 8413.717, 'text': 'So this is my df, I did not do much.', 'start': 8412.056, 'duration': 1.661}], 'summary': 'Data includes crm and zn as independent features for analysis.', 'duration': 28.909, 'max_score': 8384.808, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY8384808.jpg'}, {'end': 8519.928, 'src': 'embed', 'start': 8492.775, 'weight': 5, 'content': [{'end': 8497.877, 'text': 'it should have definitely said that it is probably in millions or okay, but that is not a problem, i think,', 'start': 8492.775, 'duration': 5.102}, {'end': 8500.518, 'text': 'but mostly it will be in millions somewhere.', 'start': 8497.877, 'duration': 2.641}, {'end': 8501.839, 'text': 'i think it should be here.', 'start': 8500.518, 'duration': 1.321}, {'end': 8507.839, 'text': "Okay, I cannot see it, but probably if I put more time, I'll be able to understand it, okay?", 'start': 8503.536, 'duration': 4.303}, {'end': 8509.781, 'text': 'So, over here, what is the thing?', 'start': 8508.32, 'duration': 1.461}, {'end': 8510.181, 'text': 'main thing?', 'start': 8509.781, 'duration': 0.4}, {'end': 8514.344, 'text': 'These all are my independent features and this is my dependent feature, right?', 'start': 8510.321, 'duration': 4.023}, {'end': 8519.928, 'text': "So if I'm 
trying to solve linear regression, I have to divide my independent and dependent features properly.", 'start': 8514.764, 'duration': 5.164}], 'summary': 'Discussing the need to properly divide independent and dependent features in linear regression.', 'duration': 27.153, 'max_score': 8492.775, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY8492775.jpg'}], 'start': 7755.201, 'title': 'Evaluating F-score and F-beta', 'summary': 'discusses the use of the F-score and F-beta to evaluate precision and recall, emphasizing the importance of false positives and false negatives, and the impact of different beta values on the scores. it also covers practical sessions on algorithms including linear regression, ridge, lasso, Naive Bayes, and KNN, with a focus on hyperparameter tuning, using the Boston house pricing dataset for analysis and understanding the intuition behind the classification and regression algorithms.', 'chapters': [{'end': 7954.36, 'start': 7755.201, 'title': 'Understanding F-score and F-beta', 'summary': 'discusses the use of the F-score and F-beta to evaluate precision and recall, emphasizing the importance of false positives and false negatives, and the impact of different beta values on the scores.', 'duration': 199.159, 'highlights': ['The F-beta score is calculated using the formula F_beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall). 
Different beta values impact the importance given to false positives and false negatives, with beta determining the F1, F0.5, or F2 score.', 'The F1 score is obtained when beta equals 1, indicating equal importance given to precision and recall; adjusting the beta value allows for prioritizing false positives or false negatives based on their importance.', 'Lowering the beta value to 0.5 emphasizes the importance of false positives, while increasing it to 2 gives more significance to false negatives, thereby affecting the F0.5 and F2 scores.']}, {'end': 8510.181, 'start': 7954.36, 'title': 'Practicals on algorithms and hyperparameter tuning', 'summary': 'covers practical sessions on algorithms including linear regression, ridge, lasso, Naive Bayes, and KNN, with a focus on hyperparameter tuning, using the Boston house pricing dataset for analysis and understanding the intuition behind the classification and regression algorithms.', 'duration': 555.821, 'highlights': ['The session includes practicals for all the discussed algorithms and hyperparameter tuning, with a focus on simple examples.', 'The chapter covers algorithms like Naive Bayes and KNN, emphasizing understanding their intuition and the probability theorems behind them.', 'The practical problem involves using linear regression, ridge, and lasso on the Boston house pricing dataset. 
', 'The speaker emphasizes the importance of learning and understanding basic concepts, providing guidance on using the necessary libraries and loading the datasets.']}], 'duration': 754.98, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY7755201.jpg', 'highlights': ['The F-beta score is calculated using the formula F_beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall), with different beta values impacting the importance given to false positives and false negatives.', 'The F1 score is obtained when beta equals 1, indicating equal importance given to precision and recall, while adjusting the beta value allows for prioritizing false positives or false negatives based on their importance.', 'Lowering the beta value to 0.5 emphasizes the importance of false positives, while increasing it to 2 gives more significance to false negatives, thereby affecting the F0.5 and F2 scores.', 'The session includes practicals for all the discussed algorithms, hyperparameter tuning, and simple examples.', 'The practical problem involves using linear regression, ridge, and lasso on the Boston house pricing dataset.', 'The chapter covers algorithms like Naive Bayes and KNN, with a focus on understanding their intuition.', 'The speaker emphasizes the importance of learning and understanding basic concepts, providing guidance on using necessary libraries and loading datasets.']}, {'end': 10482.081, 'segs': [{'end': 8657.709, 'src': 'embed', 'start': 8631.806, 'weight': 3, 'content': [{'end': 8637.171, 'text': "Always remember, whenever I definitely start with linear regression, 
I'll definitely not go directly with linear regression.", 'start': 8631.806, 'duration': 5.365}, {'end': 8642.776, 'text': "Instead, what I will do is that I'll try to go with ridge regression and lasso regression,", 'start': 8637.831, 'duration': 4.945}, {'end': 8646.018, 'text': 'because there are a lot of options with respect to hyperparameter tuning.', 'start': 8642.776, 'duration': 3.242}, {'end': 8648.681, 'text': 'But I will just show you how linear regression is done.', 'start': 8646.459, 'duration': 2.222}, {'end': 8653.645, 'text': 'So basically, you really need to use a lot of libraries, okay, over here.', 'start': 8649.301, 'duration': 4.344}, {'end': 8657.709, 'text': 'And based on these libraries, these libraries will try to install, okay.', 'start': 8654.106, 'duration': 3.603}], 'summary': 'Start with ridge and lasso regression before linear regression due to hyperparameter tuning. utilize multiple libraries for implementation.', 'duration': 25.903, 'max_score': 8631.806, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY8631806.jpg'}, {'end': 8838.383, 'src': 'embed', 'start': 8812.494, 'weight': 4, 'content': [{'end': 8817.9, 'text': "i'm just going to use scoring is equal to you can use mean squared error, negative mean squared error.", 'start': 8812.494, 'duration': 5.406}, {'end': 8821.304, 'text': "let's say that i'm going to use negative mean squared error again.", 'start': 8817.9, 'duration': 3.404}, {'end': 8822.706, 'text': 'where do you find all these things?', 'start': 8821.304, 'duration': 1.402}, {'end': 8831.436, 'text': 'you will be able to see in the sk learn page of linear crossval score and then finally in the crossval score you give cross validation value as 5, 10,', 'start': 8822.706, 'duration': 8.73}, {'end': 8832.478, 'text': 'whatever you want.', 'start': 8831.436, 'duration': 1.042}, {'end': 8833.959, 'text': "so after this, what i'm actually going to do?", 'start': 
8832.478, 'duration': 1.481}, {'end': 8838.383, 'text': "i'm just going to basically from this how many scores i will get.", 'start': 8834.64, 'duration': 3.743}], 'summary': 'Using cross-validation score to get multiple scores.', 'duration': 25.889, 'max_score': 8812.494, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY8812494.jpg'}, {'end': 8961.663, 'src': 'embed', 'start': 8923.685, 'weight': 2, 'content': [{'end': 8926.167, 'text': 'whatever you want to predict automatically, the prediction will be done.', 'start': 8923.685, 'duration': 2.482}, {'end': 8932.874, 'text': "So I'm just going to remove this and focus on ridge regression right now because I want to show how hyperparameter tuning is done in ridge regression.", 'start': 8926.488, 'duration': 6.386}, {'end': 8944.839, 'text': "So for ridge regression, the simple thing is that I'll be using two different libraries from sklearn.linear linear underscore model.", 'start': 8933.074, 'duration': 11.765}, {'end': 8947.619, 'text': "i'm going to import ridge.", 'start': 8944.839, 'duration': 2.78}, {'end': 8951, 'text': 'so for the ridge it is also present in linear underscore model.', 'start': 8947.619, 'duration': 3.381}, {'end': 8961.663, 'text': "for doing the hyper parameter tuning i will be using from sk learn dot model underscore selection and then i'm going to import grid search cv.", 'start': 8951, 'duration': 10.663}], 'summary': 'The focus is on demonstrating hyperparameter tuning in ridge regression using sklearn.linear_model and grid search cv.', 'duration': 37.978, 'max_score': 8923.685, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY8923685.jpg'}, {'end': 9902.715, 'src': 'embed', 'start': 9877.727, 'weight': 1, 'content': [{'end': 9882.75, 'text': "i'm just going to fit it Now if I go and probably try to do the calculation.", 'start': 9877.727, 'duration': 5.023}, {'end': 
9887.532, 'text': 'so if I go and see my R2 score, it is also coming somewhere around 68%, 67%.', 'start': 9882.75, 'duration': 4.782}, {'end': 9893.975, 'text': "Now, since this is just a linear regression, you won't be able to get 100% because you are drawing a straight line right?", 'start': 9887.532, 'duration': 6.443}, {'end': 9899.898, 'text': 'So for that, you basically have to use other algorithms like XGBoost and all NapeBias.', 'start': 9894.315, 'duration': 5.583}, {'end': 9900.958, 'text': 'so many algorithms are there.', 'start': 9899.898, 'duration': 1.06}, {'end': 9902.715, 'text': "It's okay.", 'start': 9902.355, 'duration': 0.36}], 'summary': 'Linear regression r2 score around 68%-67%; for 100%, use xgboost or other algorithms.', 'duration': 24.988, 'max_score': 9877.727, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY9877727.jpg'}, {'end': 10278.097, 'src': 'embed', 'start': 10251.077, 'weight': 0, 'content': [{'end': 10257.442, 'text': "so i'm just going to say grid search cv and i'm going to apply it for model one param.", 'start': 10251.077, 'duration': 6.365}, {'end': 10259.704, 'text': 'grid is equal to params.', 'start': 10257.442, 'duration': 2.262}, {'end': 10262.206, 'text': "this parameter that i'm specifically trying to apply,", 'start': 10259.704, 'duration': 2.502}, {'end': 10268.911, 'text': 'since this is a classification problem and i am not pretty sure that whether true positive is important or true negative is important,', 'start': 10262.206, 'duration': 6.705}, {'end': 10270.913, 'text': "i'm going to use f1 scoring.", 'start': 10268.911, 'duration': 2.002}, {'end': 10278.097, 'text': 'okay, f1 scoring is basically again the parametric term which we discussed yesterday, which is nothing but performance metrics,', 'start': 10270.913, 'duration': 7.184}], 'summary': 'Using grid search cv for model one param with f1 scoring for a classification problem.', 'duration': 27.02, 
'max_score': 10251.077, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY10251077.jpg'}], 'start': 8510.321, 'title': 'Implementing regression and model selection', 'summary': 'Discusses linear regression data preparation, cross-validation, ridge regression, grid search cv for model selection, train-test split, logistic regression, with demonstrations of alpha values, max iteration, model performance improvement, and best model achieving 96% accuracy.', 'chapters': [{'end': 8689.513, 'start': 8510.321, 'title': 'Linear regression data preparation', 'summary': 'Discusses dividing the dataset into independent and dependent features, using iloc to select features, and initializing linear regression for analysis.', 'duration': 179.192, 'highlights': ['The chapter emphasizes the importance of dividing the dataset into independent and dependent features, using iloc to select features, and initializing linear regression for analysis.', 'The speaker explains the process of dividing the dataset into independent and dependent features, using iloc to select features and emphasizes the significance of properly dividing the dataset for linear regression analysis.', 'The speaker highlights the importance of properly dividing the dataset into independent and dependent features for linear regression analysis, emphasizing the use of iloc to select features and the significance of this process in the analysis.']}, {'end': 9181.109, 'start': 8689.513, 'title': 'Implementing cross-validation and ridge regression', 'summary': 'Covers the implementation of cross-validation using cross val score and the application of hyperparameter tuning in ridge regression using grid search cv, with a demonstration of alpha values and max iteration.', 'duration': 491.596, 'highlights': ['The chapter covers the implementation of cross-validation using cross val score and the application of hyperparameter tuning in ridge regression using grid 
search cv. This is the central theme of the chapter, demonstrating the implementation of cross-validation and hyperparameter tuning in ridge regression.', 'The demonstration of alpha values and max iteration in ridge regression for hyperparameter tuning. The transcript provides an explanation and demonstration of using alpha values and max iteration for hyperparameter tuning in ridge regression.', "Explanation of dividing the train and test data in cross-validation and its importance. The speaker emphasizes the importance of cross-validation in dividing train and test data, ensuring all combinations are taken care of and the model's accuracy is assessed."]}, {'end': 9606.937, 'start': 9181.109, 'title': 'Implementing gridsearchcv for model selection', 'summary': 'Discusses implementing gridsearchcv for model selection, showcasing how ridge regression and lasso regression are compared using gridsearchcv with parameters like alpha values, resulting in improved performance from -37 to -32 for ridge regression and further improvement to -29 by adding more parameters, while lasso regression shows performance of -35.', 'duration': 425.828, 'highlights': ["Ridge regression with GridSearchCV resulted in improved performance from -37 to -32 by selecting the best alpha value, showcasing the effectiveness of parameter tuning. {'initial_performance': -37, 'improved_performance': -32}", "Further improvement in Ridge regression to -29 was achieved by adding more parameters, demonstrating the impact of parameter selection on model performance. {'improved_performance': -29}", "Lasso regression achieved a performance of -35 using GridSearchCV, showing how different parameters can affect model performance. 
{'performance': -35}"]}, {'end': 10079.192, 'start': 9607.858, 'title': 'Implementing train-test split and model selection', 'summary': 'Discusses implementing train-test split using 33%, evaluating model performance using r2 score, and applying logistic regression on the breast cancer dataset, with emphasis on class imbalance.', 'duration': 471.334, 'highlights': ['Implementing train-test split using 33% The chapter covers implementing train-test split using 33% for model evaluation, demonstrating the distribution of test and train data.', 'Evaluating model performance using R2 score The transcript discusses evaluating model performance using R2 score, with values ranging from 67% to 68% for linear regression and ridge regression, aiming for higher performance.', 'Applying logistic regression on the breast cancer dataset, emphasizing class imbalance The chapter showcases the application of logistic regression on the breast cancer dataset and highlights the need to address class imbalance, with 357 instances of one class and 212 instances of the other class.']}, {'end': 10482.081, 'start': 10079.193, 'title': 'Logistic regression and model evaluation', 'summary': 'Covers train test split, logistic regression parameters such as l1 and l2 norms, c value, class weight, grid search cv, f1 scoring, confusion matrix, accuracy score, and roc, with the best model achieving 95% score and 96% accuracy on test data.', 'duration': 402.888, 'highlights': ['Logistic regression parameters such as l1 and l2 norms, c value, and class weight are discussed, with class weight being applicable for imbalanced datasets resulting in 96% accuracy on test data.', 'Grid search cv and f1 scoring are applied to find the best model with a score of 95%.', 'Confusion matrix and accuracy score are used to evaluate the model performance, achieving 96% accuracy on test data.', 'Explanation of base theorem and Naive Bayes algorithm for classification is provided.']}], 'duration': 1971.76, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY8510321.jpg', 'highlights': ['Demonstration of alpha values and max iteration in ridge regression for hyperparameter tuning.', 'Ridge regression with GridSearchCV resulted in improved performance from -37 to -32 by selecting the best alpha value.', 'Logistic regression parameters such as l1 and l2 norms, c value, and class weight are discussed, with class weight being applicable for imbalanced datasets resulting in 96% accuracy on test data.', 'Confusion matrix and accuracy score are used to evaluate the model performance, achieving 96% accuracy on test data.', 'The chapter emphasizes the importance of dividing the dataset into independent and dependent features, using iloc to select features, and initializing linear regression for analysis.']}, {'end': 12006.409, 'segs': [{'end': 10635.915, 'src': 'embed', 'start': 10600.511, 'weight': 1, 'content': [{'end': 10604.513, 'text': "So it's the simple, the formula will be very simple, right? 
Which we have already discussed in stats.', 'start': 10600.511, 'duration': 4.002}, {'end': 10607.635, 'text': 'It is nothing but probability of probability of red.', 'start': 10604.973, 'duration': 2.662}, {'end': 10611.83, 'text': 'multiplied by probability of green given red.', 'start': 10608.729, 'duration': 3.101}, {'end': 10615.29, 'text': 'So this specific thing is called as conditional probability.', 'start': 10612.15, 'duration': 3.14}, {'end': 10617.031, 'text': 'Here understand what is happening.', 'start': 10615.77, 'duration': 1.261}, {'end': 10621.232, 'text': 'Probability of green marble given the red marble event has occurred.', 'start': 10617.131, 'duration': 4.101}, {'end': 10622.932, 'text': 'Here both the events are independent.', 'start': 10621.432, 'duration': 1.5}, {'end': 10624.732, 'text': 'Now let me write it down very nicely.', 'start': 10623.072, 'duration': 1.66}, {'end': 10635.915, 'text': 'So I can write probability of A and B is equal to probability of A multiplied by probability of B divided by probability of A.', 'start': 10625.212, 'duration': 10.703}], 'summary': 'Conditional probability is the probability of the green marble given that the red marble event has occurred, following the formula P(A and B) = P(A) multiplied by P(B given A), equivalently P(B given A) = P(A and B) divided by P(A).', 'duration': 35.404, 'max_score': 10600.511, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY10600511.jpg'}, {'end': 10866.672, 'src': 'embed', 'start': 10832.735, 'weight': 6, 'content': [{'end': 10836.336, 'text': 'And this will be equal to probability of A.', 'start': 10832.735, 'duration': 3.601}, {'end': 10839.037, 'text': 'That is X1, X2 like this up to Xn.', 'start': 10836.336, 'duration': 2.701}, {'end': 10843.52, 'text': 'so probability of y multiplied by probability of a given y.', 'start': 10839.517, 'duration': 4.003}, {'end': 10855.367, 'text': 'now, if i try to expand this, 
Then this will basically become something like this: the probability of y, multiplied by the probability of x1 given y, multiplied by the probability of x2 given y, multiplied by the probability of x3 given y, and so on up to xn.

So both the formulas are written over here: what is the probability with respect to yes, and what is the probability with respect to no. Now, one common thing you can see is that the denominator is fixed. It is definitely fixed; it is not going to change for either of them. So I can consider it a constant, and I can definitely ignore it in both equations.

So your final answer will be this one; these formulas you have to remember. Now let's solve a problem; this will be a very, very interesting problem. Let's say I have a dataset with the features day, outlook, temperature, humidity and wind. Let me just copy this dataset for you all. From this dataset I want to take out some information; let's take out the outlook table first.

So here I will write the probability of yes given sunny, hot: it is the probability of yes, multiplied by the probability of sunny given yes, multiplied by the probability of hot given yes, divided by the probability of sunny multiplied by the probability of hot.

The probability of yes given sunny, hot, where sunny and hot are my independent features, comes out to be 0.031, and the probability of no given sunny, hot is 0.085. Now we'll try to normalize: 0.085 divided by (0.031 plus 0.085) is 0.73, which is 73 percent, and 1 minus 0.73 is 0.27, which is 27 percent, if the input comes as sunny and hot. So if the weather is sunny and hot, what will the person do, will he play or not? The answer is no.

Okay, now my next question: if your new data is overcast and mild, tell me, what will the probability be using Naive Bayes? You can add any number of features.
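The sunny-and-hot calculation above can be reproduced in a few lines. A minimal sketch: the class priors (9 yes, 5 no out of 14) and the sunny counts come from the lecture's outlook table, while the hot counts (2 yes, 2 no) are assumed from the classic play-tennis dataset the lecture uses.

```python
# Naive Bayes by hand for the play-tennis example from the lecture.
p_yes, p_no = 9 / 14, 5 / 14

# Conditional probabilities read off the frequency tables.
p_sunny_given_yes, p_hot_given_yes = 2 / 9, 2 / 9
p_sunny_given_no, p_hot_given_no = 3 / 5, 2 / 5

# Numerators only -- the shared denominator P(sunny)*P(hot) is a constant,
# so (exactly as in the lecture) we drop it and normalize at the end.
score_yes = p_yes * p_sunny_given_yes * p_hot_given_yes   # ~0.031
score_no = p_no * p_sunny_given_no * p_hot_given_no       # ~0.086

total = score_yes + score_no
print(round(score_no / total, 2))   # ~0.73 -> 73% chance of "no"
print(round(score_yes / total, 2))  # ~0.27 -> 27% chance of "yes"
```

Ignoring the fixed denominator changes nothing: normalizing the two numerators yields the same 73% / 27% split the lecture arrives at.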
This part of the lecture also covers the application of probabilities in machine learning for predicting outcomes under specific conditions, and then explores the k-nearest neighbor algorithm for classification and regression problems, including distance calculation methods and the factors that affect the algorithm's performance.

To recap the ground covered: independent and dependent events were illustrated with rolling a dice and drawing marbles, conditional probability was worked out for events such as taking out red and green marbles, and Bayes' theorem was introduced as the crux behind Naive Bayes. The probabilities for binary classification were then derived, with the equations for yes and no and the normalization step that turns them into final percentages.

For the binary classification problem on the outlook feature, the dataset has 14 records, 9 yes and 5 no, across three categories: sunny with 2 yes and 3 no, overcast with 4 yes and 0 no, and rain with 3 yes and 2 no. This gives the probabilities 2/9 for sunny given yes, 4/9 for overcast given yes and 3/9 for rain given yes, with the corresponding probabilities of no for each category. These probabilities were then applied to conditions such as sunny and hot, normalized to predict whether the person will play, and extended with additional features such as humidity and wind.

Now, the k-nearest neighbor algorithm can solve both classification and regression problems by finding the k nearest neighbors of a given data point. The Euclidean distance is calculated as sqrt((x2 - x1)^2 + (y2 - y1)^2), while the Manhattan distance is |x2 - x1| + |y2 - y1|. The hyperparameter k determines how many nearest neighbors to consider, and the algorithm performs poorly with outliers and imbalanced datasets.
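The two distance measures can be written out directly. A minimal sketch with made-up points; the (1,2)/(4,6) pair is chosen so the Euclidean distance is the familiar 3-4-5 triangle:

```python
import math

def euclidean(p, q):
    """Straight-line distance: sqrt((x2 - x1)^2 + (y2 - y1)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Grid distance: |x2 - x1| + |y2 - y1|."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

KNN then simply takes the k training points with the smallest distance to the query and votes (classification) or averages (regression).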
And again, how do we calculate which feature we should take next? I'll discuss that. Let's say that after this, I take up temperature, and I start splitting again, since this node is impure. This splitting will continue until we finally get a pure split. Similarly, with respect to rain, we will take another feature and keep on splitting until we get a leaf node which is completely pure. I hope you understood how this exactly works.

Now, two questions. The first is: Krish, how do we calculate this purity, and how do we come to know that a split is pure? Just by seeing how many yes's and no's there are, I can definitely say whether it is a pure split or not. But to measure it, we use two different things: one is entropy, and the other is the Gini impurity (people also call it the Gini coefficient, but I'll write it as Gini impurity). So we will try to understand how entropy and Gini impurity work in a decision tree, which will help us determine whether a split is a pure split and whether a node is a leaf node.

Then the second thing, your first most important question: why did I select outlook? How are the features selected? Here you have a topic called information gain. And if you know both of these, your problem is solved.

So let's go ahead and understand entropy, Gini impurity and information gain. The entropy formula is H(S) = -p+ log2(p+) - p- log2(p-), where p+ is the fraction of positive (yes) samples and p- the fraction of negative (no) samples. The Gini impurity formula is 1 minus the summation over i from 1 to n of p_i squared. I will also talk about when you should use Gini impurity and when you should use entropy; by default, decision tree classification uses Gini impurity.

Now let's take one specific example. I have feature 1 as my root node, and in this root node I have 6 yes and 3 no. Let's say that this feature has two categories, and based on these two categories a split has happened. In category C1 I have 3 yes and 3 no, and in category C2 I have 3 yes and 0 no. Always understand: if I do the summation, 3 yes plus 3 yes is the 6 yes of the root, and 3 no plus 0 no is the 3 no. Now let's calculate the entropy of this; I have already shown you the entropy formula.
Then the sample size |S_v|, and H(S_v) — don't worry, guys, if you have not understood the formula, I'll discuss each and every parameter. The information gain of splitting S on an attribute A is Gain(S, A) = H(S) - sum over v of (|S_v| / |S|) * H(S_v). Let's say that I'm taking this feature 1 split. You have already seen what feature 1 is: it has two categories, C1 and C2. The root has 9 yes and 5 no, C1 has 6 yes and 2 no, and C2 has 3 yes and 3 no.

Again, you can use a calculator if you want. Now I have definitely found out H(S); this is specifically for the root node. Now let's see the next important thing: what is S_v, what is S, and what is H(S_v)? Very important, just have a look. H(S_v) is the entropy of each child node: the entropy of category 1 you need to find, and the entropy of category 2 you need to find. And |S_v| divided by |S| weights each category by its share of the samples.

We compute this for every candidate split, and whichever one has the highest information gain, we will select that specific feature. Now the question arises: Krish, obviously this is good, but you had written about Gini impurity. What is the purpose of that? Please explain. And why is Gini impurity basically used?
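Putting those pieces together, the information gain of the feature 1 split (root 9/5, children 6/2 and 3/3) can be computed as a quick check (a minimal sketch; helper names are my own):

```python
import math

def entropy(n_yes, n_no):
    total = n_yes + n_no
    return -sum((c / total) * math.log2(c / total)
                for c in (n_yes, n_no) if c > 0)

def information_gain(parent, children):
    """Gain(S, A) = H(S) - sum(|Sv|/|S| * H(Sv)) over the child nodes.
    Each node is a (n_yes, n_no) pair of class counts."""
    n_parent = sum(parent)
    weighted = sum((sum(child) / n_parent) * entropy(*child)
                   for child in children)
    return entropy(*parent) - weighted

# Lecture example: root 9 yes / 5 no, split into C1 = 6/2 and C2 = 3/3.
gain = information_gain((9, 5), [(6, 2), (3, 3)])
print(round(gain, 3))  # ~0.048
```

A gain of about 0.048 says this split only slightly reduces the impurity; among the candidate features, the one with the largest gain wins.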
So let me go ahead with Gini impurity. I told you that yes, you can obviously use entropy.

Now, continuous features. In F1, the decision tree will basically first of all sort the values, so I have 1.3, then 2.3, then 3, then 4, then 5, and then 6. Now, whenever you have a continuous feature, how will it work in this case? First of all, your decision tree node will take only the first record and say: is the value less than or equal to 1.3? So here you will be getting two branches, yes or no. For yes, your output will be put over here, and for the no branch you'll have another node. Over here, how many records will you be having?
In this particular case, you'll be having one record, and in the other branch you'll be having around five to six records, and there also you'll be able to see how many yes's and no's there are. This will definitely be a leaf node. So in the first instance, the tree will calculate the information gain of this split. Once that information gain is obtained, what will it do? It will take the first two records and create a new split: say, is the value less than or equal to 2.3? So in that node you'll have two records with their yes/no counts, and all the remaining records will come to the other side. Then again the information gain will be computed. Then it will go to the next record, create another condition, less than or equal to 3, create those nodes, again see how many yes's and no's there are, and again compute the information gain. Like this, it will do it for each and every record. And finally, whichever threshold gives the highest information gain, it will select that specific value of the feature and split the node there. So whenever you have a continuous feature, this is how it works: the information gain of every candidate threshold is computed, the best one gets selected, and from there the splitting happens.

Now let's go ahead and understand the next topic: how does all of this work in a decision tree regressor? Because in a decision tree regressor, my output is a continuous variable. Suppose I have feature one and feature two, and the output is continuous; any value can be there. So in this particular case, how do I split it?
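That record-by-record threshold scan can be sketched as follows; the feature values mirror the sorted list from the lecture, but the yes/no labels are made up for illustration:

```python
import math

def entropy(labels):
    h = 0.0
    for v in set(labels):
        p = labels.count(v) / len(labels)
        h -= p * math.log2(p)
    return h

def best_threshold(values, labels):
    """Try 'x <= t' for each sorted value and keep the split with the
    highest information gain, as the lecture describes."""
    parent_h = entropy(labels)
    best = (None, -1.0)
    for t in sorted(set(values)):
        left = [l for x, l in zip(values, labels) if x <= t]
        right = [l for x, l in zip(values, labels) if x > t]
        if not left or not right:       # a one-sided split is useless
            continue
        weighted = (len(left) / len(labels)) * entropy(left) \
                 + (len(right) / len(labels)) * entropy(right)
        gain = parent_h - weighted
        if gain > best[1]:
            best = (t, gain)
    return best

f1 = [1.3, 2.3, 3, 4, 5, 6]
play = ['no', 'no', 'no', 'yes', 'yes', 'yes']
print(best_threshold(f1, play))  # threshold 3 separates these labels perfectly
```

With these toy labels the scan lands on x <= 3, which yields two pure children and hence the maximum possible gain.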
So let's say that the F1 feature is getting selected. Now, when it is selected, what value will the node carry? First of all, the mean of the entire output will get calculated. So here I will have the mean, and the cost function used here is not the Gini impurity or entropy; here we use the mean squared error (you can also use the mean absolute error). Now, what is mean squared error? If you remember from our linear regression: 1/(2m) times the summation over i of (y-hat of i minus y of i) squared. This is what mean squared error is.

So first, based on the F1 feature, it will assign the mean value, then it will compute the MSE value, and then it will go ahead and do the splitting. When it splits based on the categories of the feature, I will be having different categories, and after the split some records will go to one node and some to another. Each node then gets the mean value of the records that reached it; that will be its output. And then the MSE gets calculated again at each node. As the MSE gets reduced, that basically means we are reaching near the leaf node, and the same thing happens on the other side. Finally, when you follow a path down, whatever mean value is present at the leaf, that will be your output. This is the difference between the decision tree regressor and the classifier: here, instead of using entropy and Gini, you use mean squared error or mean absolute error.

Now let's go to one more topic, which is called the hyperparameters. Tell me: if I keep on growing a decision tree to any depth, what kind of problem will it face? You want me to explain the regressor part first? Okay, let's do the regression decision tree. Let's say I have feature F1, and this is my output: values like 20, 24, 26, 28, 30, with feature one having some categories. Let's say I have done the division by F1, that is, this feature. Initially, tell me, what is the mean of this output? That mean value will get assigned over here. Then, using MSE, the mean squared error, you calculate the cost; suppose I get an MSE of some 37 or 47, something like that. And then I will try to split this. Then I will be getting two or three more nodes; it depends. Those nodes will be part of the tree, and the mean will change again at each node. Suppose these two records go here, right?
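For the toy outputs above, the node mean and MSE can be computed directly. A minimal sketch (helper names are my own; I use the plain 1/n average rather than the 1/(2m) convention written in the linear-regression formula):

```python
def node_value(y):
    """A regression node predicts the mean of the targets that reach it."""
    return sum(y) / len(y)

def node_mse(y):
    """Mean squared error of the node against its own mean; this is the
    cost the regressor tries to reduce with every split."""
    mean = node_value(y)
    return sum((v - mean) ** 2 for v in y) / len(y)

y = [20, 24, 26, 28, 30]      # target values from the lecture's example
print(node_value(y))          # 25.6 -> prediction at the root
print(round(node_mse(y), 2))  # 11.84 -> cost before any split

# After a split that sends [20, 24] left and [26, 28, 30] right,
# each child's MSE is computed against its own mean:
print(node_mse([20, 24]), round(node_mse([26, 28, 30]), 2))  # 4.0 2.67
```

Both child costs are well below the root's 11.84, which is the "MSE gets reduced as we approach the leaf" behavior the lecture describes.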
Then again, MSE will get calculated.', 'start': 14437.585, 'duration': 6.102}, {'end': 14446.208, 'text': "I'm just taking as an example over here.", 'start': 14444.367, 'duration': 1.841}, {'end': 14447.809, 'text': 'Just try to assume this thing.', 'start': 14446.448, 'duration': 1.361}, {'end': 14453.071, 'text': 'Now, if I talk about hyperparameters, see, this is the formula that gets applied for MSE.', 'start': 14448.169, 'duration': 4.902}, {'end': 14460.41, 'text': "Now, let's see in this hyperparameter, always understand decision tree leads to overfitting,", 'start': 14454.508, 'duration': 5.902}, {'end': 14464.491, 'text': 'because we are just going to divide the nodes to whatever level we want.', 'start': 14460.41, 'duration': 4.081}, {'end': 14467.632, 'text': 'So this obviously will lead to overfitting.', 'start': 14465.351, 'duration': 2.281}, {'end': 14472.714, 'text': 'Now in order to prevent overfitting, we perform two important steps.', 'start': 14468.893, 'duration': 3.821}, {'end': 14476.775, 'text': 'One is post pruning and one is pre pruning.', 'start': 14472.934, 'duration': 3.841}, {'end': 14480.656, 'text': 'So these two, post pruning and pre pruning, are conditions.', 'start': 14477.435, 'duration': 3.221}, {'end': 14482.457, 'text': "Let's say that I have done some splits.", 'start': 14480.716, 'duration': 1.741}, {'end': 14485.176, 'text': 'I have done some splits.', 'start': 14484.255, 'duration': 0.921}, {'end': 14488.097, 'text': "Let's say over here, I have seven yes and two no.", 'start': 14485.676, 'duration': 2.421}, {'end': 14492.119, 'text': 'And again, probably I do the further split like this.', 'start': 14489.718, 'duration': 2.401}, {'end': 14497.782, 'text': 'Now, in this particular scenario, you know that if seven yes and two nos are there, there is a maximum.', 'start': 14492.56, 'duration': 5.222}, {'end': 14502.345, 'text': 'there is more than 80% chances that this node is saying that the output is yes.', 'start': 
14497.782, 'duration': 4.563}, {'end': 14508.443, 'text': 'So should we further do more pruning? The answer is no.', 'start': 14502.905, 'duration': 5.538}, {'end': 14511.305, 'text': 'We can close it and we can cut the branch from here.', 'start': 14508.743, 'duration': 2.562}, {'end': 14515.087, 'text': 'This technique is basically called as post pruning.', 'start': 14511.965, 'duration': 3.122}, {'end': 14519.531, 'text': 'That basically means, first of all, you create your decision tree,', 'start': 14515.748, 'duration': 3.783}, {'end': 14524.735, 'text': 'then probably see the decision tree and see whether there is an extra branch or not, and just try to cut it.', 'start': 14519.531, 'duration': 5.204}, {'end': 14527.737, 'text': 'There is one more thing which is called as pre pruning.', 'start': 14525.675, 'duration': 2.062}, {'end': 14531.974, 'text': 'Now pre-pruning is decided by hyperparameters.', 'start': 14528.513, 'duration': 3.461}, {'end': 14537.637, 'text': 'What kind of hyperparameters? 
You can basically say that how many number of decision tree needs to be used.', 'start': 14532.515, 'duration': 5.122}, {'end': 14539.758, 'text': 'Not number of decision tree, sorry.', 'start': 14538.277, 'duration': 1.481}, {'end': 14541.978, 'text': 'Over here.', 'start': 14541.358, 'duration': 0.62}, {'end': 14543.799, 'text': 'you may say that what is the max depth?', 'start': 14541.978, 'duration': 1.821}, {'end': 14546.2, 'text': 'What is the max depth?', 'start': 14545.28, 'duration': 0.92}, {'end': 14548.001, 'text': 'How many max leaves you can have?', 'start': 14546.28, 'duration': 1.721}, {'end': 14551.062, 'text': 'So all these parameters', 'start': 14549.401, 'duration': 1.661}, {'end': 14552.963, 'text': 'you can set with grid search CV.', 'start': 14551.062, 'duration': 1.901}, {'end': 14558.968, 'text': 'And you can try it and you can basically come up with a pre-pruning technique.', 'start': 14554.966, 'duration': 4.002}, {'end': 14563.11, 'text': 'So this is the idea about decision tree regressor.', 'start': 14559.729, 'duration': 3.381}], 'summary': 'Decision tree uses information gain to split continuous features; regression uses mean squared error; hyperparameters prevent overfitting through post and pre pruning.', 'duration': 480.832, 'max_score': 14082.278, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY14082278.jpg'}, {'end': 14259.891, 'src': 'embed', 'start': 14228.865, 'weight': 0, 'content': [{'end': 14233.927, 'text': 'because in decision tree regressor my output is a continuous variable.', 'start': 14228.865, 'duration': 5.062}, {'end': 14242.01, 'text': 'So suppose if I have one feature one feature two and this output is a continuous feature it will be continuous any value can be there.', 'start': 14234.267, 'duration': 7.743}, {'end': 14250.007, 'text': "So in this particular case, how do I split it? 
So let's say that F1C feature is getting selected.", 'start': 14243.424, 'duration': 6.583}, {'end': 14259.891, 'text': 'Now in this F1C feature, what value will come when it is getting selected? First of all, the entire mean will get calculated of the output.', 'start': 14251.127, 'duration': 8.764}], 'summary': 'In decision tree regressor, the output is a continuous variable, requiring calculation of the mean for selected features.', 'duration': 31.026, 'max_score': 14228.865, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY14228865.jpg'}, {'end': 14511.305, 'src': 'embed', 'start': 14480.716, 'weight': 3, 'content': [{'end': 14482.457, 'text': "Let's say that I have done some splits.", 'start': 14480.716, 'duration': 1.741}, {'end': 14485.176, 'text': 'I have done some splits.', 'start': 14484.255, 'duration': 0.921}, {'end': 14488.097, 'text': "Let's say over here, I have seven yes and two no.", 'start': 14485.676, 'duration': 2.421}, {'end': 14492.119, 'text': 'And again, probably I do the further split like this.', 'start': 14489.718, 'duration': 2.401}, {'end': 14497.782, 'text': 'Now, in this particular scenario, you know that if seven yes and two nos are there, there is a maximum.', 'start': 14492.56, 'duration': 5.222}, {'end': 14502.345, 'text': 'there is more than 80% chances that this node is saying that the output is yes.', 'start': 14497.782, 'duration': 4.563}, {'end': 14508.443, 'text': 'So should we further do more pruning? 
The answer is no.', 'start': 14502.905, 'duration': 5.538}, {'end': 14511.305, 'text': 'We can close it and we can cut the branch from here.', 'start': 14508.743, 'duration': 2.562}], 'summary': 'After analyzing splits, with 7 yes and 2 no, decision indicates over 80% probability of yes, no further pruning needed.', 'duration': 30.589, 'max_score': 14480.716, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY14480716.jpg'}], 'start': 12006.769, 'title': 'Decision trees and gini impurity', 'summary': 'Covers decision trees for regression and classification, explaining the use of nodes and leaf nodes, decision tree classification, pure and impure split, entropy, gini impurity, information gain, and prevention of overfitting through pruning and hyperparameters.', 'chapters': [{'end': 12283.694, 'start': 12006.769, 'title': 'Understanding decision trees for regression and classification', 'summary': 'Discusses the purpose of decision trees, demonstrating their use for solving regression and classification problems by creating nodes and leaf nodes, which are visualized representations of nested if-else conditions, and can be used to solve specific problem statements. it also emphasizes the use of decision trees in visualizing and solving problems, using a specific dataset.', 'duration': 276.925, 'highlights': ['The chapter demonstrates the use of decision trees for solving regression and classification problems by creating nodes and leaf nodes, providing a visualized representation of nested if-else conditions. (Relevance: 5)', 'It emphasizes the use of decision trees in visualizing and solving problems, using a specific dataset. (Relevance: 4)', 'Decision trees are visualized representations of nested if-else conditions, used to solve specific problem statements. (Relevance: 3)', 'The chapter discusses the purpose of decision trees and their use for solving regression and classification problems. 
(Relevance: 2)', 'The chapter also discusses the concept of using specific datasets to understand and solve problems. (Relevance: 1)']}, {'end': 12753.223, 'start': 12283.694, 'title': 'Decision tree classification', 'summary': 'Explains a classification problem statement using a decision tree to predict whether a person will play tennis based on input features like outlook, temperature, humidity, and wind, and discusses the concepts of pure and impure split, and the use of entropy and gini coefficient to determine purity in decision tree.', 'duration': 469.529, 'highlights': ['The chapter explains a classification problem statement using a decision tree to predict whether a person will play tennis based on input features like outlook, temperature, humidity, and wind. Classification problem statement, prediction of playing tennis, input features: outlook, temperature, humidity, wind.', 'The chapter discusses the concepts of pure and impure split in decision tree, where pure split means having all yes or all no in a node, and impure split requires further splitting until a pure leaf node is achieved. Explanation of pure and impure split, criteria for pure split, continuous splitting until a pure leaf node is achieved.', 'The use of entropy and Gini coefficient is explained in determining whether a split is pure or not, and to identify if a node is a leaf node in the decision tree. Explanation of entropy and Gini coefficient, role in determining split purity and identifying leaf nodes.']}, {'end': 13080.125, 'start': 12753.644, 'title': 'Entropy and gini impurity', 'summary': 'Discusses the concepts of entropy and gini impurity, including their formulas, use cases, and the calculation of entropy based on a specific example, demonstrating a pure split resulting in an entropy of zero.', 'duration': 326.481, 'highlights': ['The chapter discusses the concepts of entropy and Gini impurity, including their formulas and use cases. 
Concepts of entropy and Gini impurity, including formulas and use cases.', 'Demonstrates the calculation of entropy based on a specific example, resulting in a pure split with an entropy of zero. Calculation of entropy based on a specific example, resulting in a pure split with entropy of zero.']}, {'end': 13848.535, 'start': 13082.006, 'title': 'Entropy and information gain in decision trees', 'summary': 'Discusses the concept of entropy and information gain in decision trees, illustrating how to calculate entropy for a root node and its categories, and how to use information gain to determine which feature to use for splitting, with an example demonstrating the superiority of feature 2 over feature 1.', 'duration': 766.529, 'highlights': ['The chapter discusses the concept of entropy and information gain in decision trees, illustrating how to calculate entropy for a root node and its categories, and how to use information gain to determine which feature to use for splitting, with an example demonstrating the superiority of feature 2 over feature 1. Concept of entropy and information gain, calculation of entropy for root node and categories, use of information gain to determine feature for splitting.', 'The formula for entropy (H of S) is discussed, along with a step-by-step example of calculating the entropy for a root node, demonstrating the calculation of probability and subsequent entropy value of 0.94. Formula for entropy, step-by-step example of calculating entropy for a root node, demonstration of probability calculation and resulting entropy value.', 'The process of using information gain to determine the feature for splitting is explained, with a detailed example showing the calculation of information gain for feature 1 and feature 2, emphasizing the superiority of feature 2 for splitting. 
Explanation of using information gain for feature selection, detailed example of information gain calculation for feature 1 and feature 2, emphasis on the superiority of feature 2.']}, {'end': 14867.597, 'start': 13850.239, 'title': 'Understanding decision tree and gini impurity', 'summary': 'Covers the differences between gini impurity and entropy, when to use each, the calculation of entropy and information gain for continuous features, the usage of mean squared error in decision tree regressor, and the prevention of overfitting through post pruning and pre pruning with hyperparameters.', 'duration': 1017.358, 'highlights': ['The calculation of entropy and information gain for continuous features involves sorting the values, creating decision tree nodes based on comparisons, computing information gain for each record, and selecting the feature with the highest information gain for splitting. Sorting values, creating decision tree nodes, computing information gain, selecting feature for splitting', 'Decision tree regressor uses mean squared error or mean absolute error for calculating the cost function, and the splitting is based on the mean value assigned to a feature. Usage of mean squared error, splitting based on mean value', 'To prevent overfitting, post pruning and pre pruning with hyperparameters are performed, where post pruning involves cutting unnecessary branches in the decision tree and pre pruning sets maximum depth and maximum leaf parameters. Post pruning, pre pruning with hyperparameters, setting maximum depth and maximum leaf parameters']}], 'duration': 2860.828, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY12006769.jpg', 'highlights': ['The chapter demonstrates the use of decision trees for solving regression and classification problems by creating nodes and leaf nodes, providing a visualized representation of nested if-else conditions. 
(Relevance: 5)', 'The chapter explains a classification problem statement using a decision tree to predict whether a person will play tennis based on input features like outlook, temperature, humidity, and wind. (Relevance: 4)', 'The chapter discusses the concepts of entropy and Gini impurity, including their formulas and use cases. (Relevance: 3)', 'The formula for entropy (H of S) is discussed, along with a step-by-step example of calculating the entropy for a root node, demonstrating the calculation of probability and subsequent entropy value of 0.94. (Relevance: 2)', 'To prevent overfitting, post pruning and pre pruning with hyperparameters are performed, where post pruning involves cutting unnecessary branches in the decision tree and pre pruning sets maximum depth and maximum leaf parameters. (Relevance: 1)']}, {'end': 16113.306, 'segs': [{'end': 15603.902, 'src': 'embed', 'start': 15574.657, 'weight': 1, 'content': [{'end': 15578.02, 'text': 'Then this model 2 may be a teacher with respect to chemistry.', 'start': 15574.657, 'duration': 3.363}, {'end': 15585.006, 'text': "Let's say model 3 is basically a teacher of maths and model 4 is a teacher of geography.", 'start': 15578.52, 'duration': 6.486}, {'end': 15591.991, 'text': 'Now suppose, if you are trying to solve one problem, obviously if the physics teacher is not able to solve that particular problem,', 'start': 15585.726, 'duration': 6.265}, {'end': 15597.356, 'text': 'then probably chemistry can help, or maths can help, or geography can help, or someone can help.', 'start': 15591.991, 'duration': 5.365}, {'end': 15603.902, 'text': 'So when we combine this many expertise together, they will be able to give you the output in an efficient way.', 'start': 15597.476, 'duration': 6.426}], 'summary': 'Models 2, 3, and 4 have expertise in chemistry, maths, and geography respectively, collaborating to efficiently solve problems.', 'duration': 29.245, 'max_score': 15574.657, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY15574657.jpg'}, {'end': 15641.109, 'src': 'embed', 'start': 15617.833, 'weight': 2, 'content': [{'end': 15625.52, 'text': 'okay, boosting is, you can just say that it is a sequential set of all the models combined together,', 'start': 15617.833, 'duration': 7.687}, {'end': 15631.065, 'text': 'and these all models that are initialized are usually weak learners, and when they are combined together, they become a strong learner.', 'start': 15625.52, 'duration': 5.545}, {'end': 15634.326, 'text': 'And based on this strong learner, they give an amazing output.', 'start': 15631.625, 'duration': 2.701}, {'end': 15641.109, 'text': 'And right now, if I say in most of the Kaggle competition, they use different types of boosting or bagging technique.', 'start': 15634.946, 'duration': 6.163}], 'summary': 'Boosting combines weak learners to create a strong learner, yielding impressive results in kaggle competitions.', 'duration': 23.276, 'max_score': 15617.833, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY15617833.jpg'}, {'end': 15730.25, 'src': 'embed', 'start': 15703.604, 'weight': 0, 'content': [{'end': 15709.226, 'text': 'Does it not lead to overfitting whenever you probably have a decision tree right?', 'start': 15703.604, 'duration': 5.622}, {'end': 15710.807, 'text': 'It leads to something like overfitting.', 'start': 15709.346, 'duration': 1.461}, {'end': 15716.813, 'text': 'Why overfitting? 
it completely splits all the feature till its complete depth.', 'start': 15710.847, 'duration': 5.966}, {'end': 15721.619, 'text': 'Overfitting basically means for training data the accuracy is high, for test data the accuracy is low.', 'start': 15717.174, 'duration': 4.445}, {'end': 15730.25, 'text': 'So training data when the accuracy is high, I may basically say it as high bias and then I may basically say it as sorry, not high bias,', 'start': 15722.06, 'duration': 8.19}], 'summary': 'Decision trees can lead to overfitting by completely splitting all features, resulting in high training data accuracy but low test data accuracy.', 'duration': 26.646, 'max_score': 15703.604, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY15703604.jpg'}, {'end': 16063.28, 'src': 'embed', 'start': 16032.083, 'weight': 4, 'content': [{'end': 16033.583, 'text': 'also i can give to a specific model.', 'start': 16032.083, 'duration': 1.5}, {'end': 16040.626, 'text': 'so internally that random forest should take care of over here these things are there and this is how random forest works.', 'start': 16033.583, 'duration': 7.043}, {'end': 16046.108, 'text': 'only the difference between random forest classify and regression is that in regression again, whatever output you are basically getting,', 'start': 16040.626, 'duration': 5.482}, {'end': 16047.729, 'text': 'you basically do the mean.', 'start': 16046.108, 'duration': 1.621}, {'end': 16049.349, 'text': "that's it average.", 'start': 16047.729, 'duration': 1.62}, {'end': 16050.25, 'text': 'you just do the average.', 'start': 16049.349, 'duration': 0.901}, {'end': 16054.613, 'text': "you'll be able to get the output based on all the models, output that you are actually getting.", 'start': 16050.25, 'duration': 4.363}, {'end': 16057.716, 'text': "now let's talk about some of the important points in random forest.", 'start': 16054.613, 'duration': 3.103}, {'end': 16063.28, 'text': 
'the first thing, first question, is that is normalization required in random forest?', 'start': 16057.716, 'duration': 5.564}], 'summary': 'Random forest handles classification and regression, averaging outputs from multiple models.', 'duration': 31.197, 'max_score': 16032.083, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY16032083.jpg'}], 'start': 14867.597, 'title': 'Decision tree pruning and ensemble techniques', 'summary': 'Discusses decision tree post-pruning, highlighting the process after certain splits, and covers ensemble techniques, emphasizing bagging and boosting within the context of multiple algorithms and the working of random forest classifier and regressor in addressing variance.', 'chapters': [{'end': 14947.655, 'start': 14867.597, 'title': 'Decision tree post-pruning process', 'summary': 'Discusses the post-pruning process in decision tree modeling, where it is observed that after certain splits, no further splits are required, and the importance of maximum value being less than 0.5 is highlighted, with an exception of a value of 0.667, which needs further investigation.', 'duration': 80.058, 'highlights': ['The importance of maximum value being less than 0.5 is emphasized, with an exception of a value of 0.667, which needs further investigation.', 'Post-pruning process is discussed, where it is observed that after certain splits, no further splits are required, indicating the decision-making process of the model.', 'The process of plotting a graph using sklearn import tree and defining the agenda for the decision tree model is briefly mentioned.']}, {'end': 15162.65, 'start': 14947.655, 'title': 'Ensemble techniques and bagging vs. 
boosting', 'summary': 'Covers the basics of ensemble techniques, discussing the difference between bagging and boosting, and the use of multiple algorithms to solve problems, emphasizing the concepts of bagging and boosting within ensemble techniques.', 'duration': 214.995, 'highlights': ['The chapter covers the basics of ensemble techniques, discussing the difference between bagging and boosting, and the use of multiple algorithms to solve problems. It introduces the concepts of ensemble techniques, specifically focusing on the comparison between bagging and boosting, and the use of multiple algorithms to solve problems.', 'Bagging and boosting are the two specific ways used in ensemble techniques to combine multiple models to solve a problem. The discussion delves into the specific methods of bagging and boosting within ensemble techniques, highlighting their role in combining multiple models to address problems.', 'In bagging, multiple models are created, such as logistic, decision trees, and knn, to solve a problem by combining their outputs. Bagging involves the creation of multiple models, including logistic, decision trees, and knn, to collectively address a problem by combining their outputs.']}, {'end': 15597.356, 'start': 15162.65, 'title': 'Bagging and boosting techniques', 'summary': 'Introduces the concepts of bootstrap aggregating (bagging) and boosting, describing how bagging involves training multiple models with row sampling and combining their outputs through majority voting, while boosting sequentially combines weak learner models to create a strong learner model.', 'duration': 434.706, 'highlights': ['Bagging involves training multiple models with row sampling and combining their outputs through majority voting. The concept of bagging includes training multiple models with different rows using row sampling and combining their outputs through majority voting.', "Boosting sequentially combines weak learner models to create a strong learner model. 
Boosting is a sequential combination of weak learner models, where each weak learner's output is fed to the next model, and when combined, they create a strong learner model.", 'In bagging, the majority voting is used to aggregate the outputs of different models trained with row sampling. Bagging involves aggregating the outputs of different models trained with row sampling using majority voting to determine the final output.', 'In boosting, weak learner models are combined sequentially to create a strong learner model. Boosting involves combining weak learner models sequentially to create a strong learner model, where the output of one model feeds into the next model.']}, {'end': 16113.306, 'start': 15597.476, 'title': 'Random forest: boosting, bagging and algorithm', 'summary': 'Explains the concept of boosting, bagging, and the working of random forest classifier and regressor. it emphasizes converting high variance to low variance and addresses the need for normalization in random forest.', 'duration': 515.83, 'highlights': ['Random forest combines multiple decision trees to create a generalized model with low bias and low variance, addressing the overfitting problem encountered in individual decision trees. By combining multiple decision trees, random forest creates a generalized model with low bias and low variance, addressing the overfitting problem encountered in individual decision trees.', 'Normalization is not required in random forest or decision tree, as the splits made by decision trees are not impacted by minimizing the data. Normalization is not required in random forest or decision tree, as the splits made by decision trees are not impacted by minimizing the data.', 'Random forest involves row sampling and feature sampling for each model, and the majority voting from these models helps in converting high variance to low variance. 
Random forest involves row sampling and feature sampling for each model, and the majority voting from these models helps in converting high variance to low variance.']}], 'duration': 1245.709, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY14867597.jpg', 'highlights': ['Random forest combines multiple decision trees to create a generalized model with low bias and low variance, addressing the overfitting problem encountered in individual decision trees.', 'The process of plotting a graph using sklearn import tree and defining the agenda for the decision tree model is briefly mentioned.', 'Boosting sequentially combines weak learner models to create a strong learner model.', 'Bagging involves training multiple models with row sampling and combining their outputs through majority voting.', 'The importance of maximum value being less than 0.5 is emphasized, with an exception of a value of 0.667, which needs further investigation.']}, {'end': 18383.182, 'segs': [{'end': 16198.602, 'src': 'embed', 'start': 16176.491, 'weight': 2, 'content': [{'end': 16185.899, 'text': 'So second thing we are going to discuss about is boosting technique and this is The first algorithm that we are going to discuss about is AdaBoost.', 'start': 16176.491, 'duration': 9.408}, {'end': 16190.68, 'text': 'So AdaBoost, we are going to discuss about how does AdaBoost work.', 'start': 16186.579, 'duration': 4.101}, {'end': 16195.561, 'text': "Now let's solve the first boosting technique which is called as AdaBoost.", 'start': 16191.36, 'duration': 4.201}, {'end': 16198.602, 'text': 'And this is a boosting technique.', 'start': 16196.622, 'duration': 1.98}], 'summary': 'Discussion about adaboost, a boosting technique algorithm.', 'duration': 22.111, 'max_score': 16176.491, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY16176491.jpg'}, {'end': 17198.126, 'src': 'heatmap', 'start': 
16944.229, 'weight': 0.716, 'content': [{'end': 16956.516, 'text': 'randomly creates those numbers between 0 to 1 and whichever bucket it will come in, like 07 to 014, 014 to 07, basically means 014..', 'start': 16944.229, 'duration': 12.287}, {'end': 16960.016, 'text': 'Then 0, 2, 1, 2.', 'start': 16956.516, 'duration': 3.5}, {'end': 16961.397, 'text': 'See how the bucket is getting created.', 'start': 16960.017, 'duration': 1.38}, {'end': 16962.898, 'text': 'This value is getting added to this.', 'start': 16961.457, 'duration': 1.441}, {'end': 16964.218, 'text': 'So that becomes this bucket.', 'start': 16962.938, 'duration': 1.28}, {'end': 16965.958, 'text': '0.21 plus 0.537.', 'start': 16964.718, 'duration': 1.24}, {'end': 16972.54, 'text': 'How much it is? It is nothing but 0.747.', 'start': 16965.958, 'duration': 6.582}, {'end': 16978.461, 'text': 'Then 0.747 to 0.751.', 'start': 16972.54, 'duration': 5.921}, {'end': 16979.901, 'text': 'Like this, you create all the buckets.', 'start': 16978.461, 'duration': 1.44}, {'end': 16981.542, 'text': 'Okay. You can create all the buckets.', 'start': 16980.022, 'duration': 1.52}, {'end': 16985.643, 'text': 'Now tell me which record is basically having the biggest bucket size.', 'start': 16982.002, 'duration': 3.641}, {'end': 16987.644, 'text': 'Obviously, this record.', 'start': 16986.523, 'duration': 1.121}, {'end': 16995.448, 'text': 'So, if I randomly create a number between 0 to 1, what is the highest probability that the values will be going in?', 'start': 16988.384, 'duration': 7.064}, {'end': 17000.07, 'text': 'So in this particular case, most of the wrong records will be passed along with the other records.', 'start': 16995.808, 'duration': 4.262}, {'end': 17004.753, 'text': 'Obviously, other records, there are chances that other records will go to the next decision tree.', 'start': 17000.471, 'duration': 4.282}, {'end': 17010.196, 'text': 'But understand maximum number will go with the wrong records 
because the bucket is high over here.', 'start': 17005.113, 'duration': 5.083}, {'end': 17012.284, 'text': 'the bucket is high over here.', 'start': 17011.124, 'duration': 1.16}, {'end': 17019.186, 'text': 'so most of the time this specific record will get selected and then it will be gone to the second tree.', 'start': 17012.284, 'duration': 6.902}, {'end': 17021.707, 'text': 'now suppose i have this all records.', 'start': 17019.186, 'duration': 2.521}, {'end': 17023.727, 'text': 'so this is my first stump.', 'start': 17021.707, 'duration': 2.02}, {'end': 17025.768, 'text': 'this is my second stump.', 'start': 17023.727, 'duration': 2.041}, {'end': 17027.008, 'text': 'this is my third stump.', 'start': 17025.768, 'duration': 1.24}, {'end': 17029.629, 'text': 'similarly the third stump from the second stump.', 'start': 17027.008, 'duration': 2.621}, {'end': 17031.329, 'text': 'whichever wrong records will be going.', 'start': 17029.629, 'duration': 1.7}, {'end': 17033.15, 'text': 'maximum number of records will go over here.', 'start': 17031.329, 'duration': 1.821}, {'end': 17034.671, 'text': 'Then again it will be trained.', 'start': 17033.61, 'duration': 1.061}, {'end': 17036.692, 'text': "Like this we'll be having lot of stumps.", 'start': 17034.871, 'duration': 1.821}, {'end': 17039.313, 'text': 'Minimum 100 decision trees can be added.', 'start': 17037.252, 'duration': 2.061}, {'end': 17043.296, 'text': 'You know that every decision tree will give one output for a new test data.', 'start': 17039.633, 'duration': 3.663}, {'end': 17044.816, 'text': 'New test data.', 'start': 17043.996, 'duration': 0.82}, {'end': 17046.477, 'text': 'This weak learner will give one output.', 'start': 17044.916, 'duration': 1.561}, {'end': 17047.978, 'text': 'This weak learner will give one output.', 'start': 17046.497, 'duration': 1.481}, {'end': 17051.02, 'text': 'This weak learner and this weak learner will be giving one output.', 'start': 17048.298, 'duration': 2.722}, {'end': 
17059.308, 'text': 'obviously, at a time, complexity will be more. now, from this particular output, suppose it is a binary classification, i will be getting 0, 1, 1, 1.', 'start': 17051.56, 'duration': 7.748}, {'end': 17064.132, 'text': 'so again, over here, majority voting will happen and the output will be 1.', 'start': 17059.308, 'duration': 4.824}, {'end': 17072.1, 'text': 'in case of regression problem, i will be having a continuous value over here and for this the average will be computed,', 'start': 17064.132, 'duration': 7.968}, {'end': 17073.82, 'text': 'and that will give me an output over here.', 'start': 17072.1, 'duration': 1.72}, {'end': 17077.022, 'text': 'So for regression, the average will be done.', 'start': 17074.141, 'duration': 2.881}, {'end': 17081.403, 'text': 'For classification, what will happen? Majority will be happening.', 'start': 17077.042, 'duration': 4.361}, {'end': 17084.485, 'text': 'So everywhere that same part will be going on.', 'start': 17081.584, 'duration': 2.901}, {'end': 17086.365, 'text': 'Buckets is very much simple guys.', 'start': 17084.985, 'duration': 1.38}, {'end': 17090.007, 'text': 'Buckets basically means based on this weights, normalized weight, we are going to create bucket.', 'start': 17086.425, 'duration': 3.582}, {'end': 17093.31, 'text': 'So that whichever records has the highest bucket.', 'start': 17090.487, 'duration': 2.823}, {'end': 17099.036, 'text': 'based on this randomly creating number, you know it will select those specific buckets and put it into random forest.', 'start': 17093.31, 'duration': 5.726}, {'end': 17101.658, 'text': 'We understand why this bucket size is big.', 'start': 17099.216, 'duration': 2.442}, {'end': 17107.845, 'text': 'The other wrong records which are present, right? 
Suppose they are more than four to five wrong records, their bucket size will also be bigger.', 'start': 17102.339, 'duration': 5.506}, {'end': 17115.147, 'text': 'And because based on this randomly creating number between 0 to 1, most of the wrong records will be selected and given to the second stump.', 'start': 17108.625, 'duration': 6.522}, {'end': 17118.409, 'text': 'Similarly, this particular decision tree will be doing some mistakes.', 'start': 17115.768, 'duration': 2.641}, {'end': 17123.41, 'text': 'then that wrong records will get updated, all the weights will get updated and it will be passed to the next decision tree.', 'start': 17118.409, 'duration': 5.001}, {'end': 17128.032, 'text': 'Guys, when I say wrong record, the output will be same only, you know, 0 and 1.', 'start': 17123.791, 'duration': 4.241}, {'end': 17133.595, 'text': 'So interesting everyone? I hope you understood so much of maths in AdaBoost and how AdaBoost actually work.', 'start': 17128.032, 'duration': 5.563}, {'end': 17138.718, 'text': 'Three main things, one is total error, one is performance of stump, and one is the new sample weight.', 'start': 17133.955, 'duration': 4.763}, {'end': 17141.179, 'text': 'These things are getting calculated extensively.', 'start': 17139.178, 'duration': 2.001}, {'end': 17147.003, 'text': 'Max normalized weight was basically used because the sum of all these weights are approximately equal to one.', 'start': 17141.82, 'duration': 5.183}, {'end': 17150.205, 'text': 'When boosting, why not take the last output? 
No, no, no.', 'start': 17147.603, 'duration': 2.602}, {'end': 17153.226, 'text': 'We have to give the importance of every decision tree output.', 'start': 17150.585, 'duration': 2.641}, {'end': 17156.028, 'text': 'Every decision tree output are important.', 'start': 17154.347, 'duration': 1.681}, {'end': 17162.39, 'text': 'Okay, let me talk about one model which is called as black box model versus white box.', 'start': 17156.528, 'duration': 5.862}, {'end': 17170.112, 'text': 'What is the difference between black box model and white box? If I take an example of linear regression, tell me what kind of model it is.', 'start': 17162.91, 'duration': 7.202}, {'end': 17172.833, 'text': 'Is it a white box model or black box?', 'start': 17171.033, 'duration': 1.8}, {'end': 17178.716, 'text': 'If I take an example of random forest, Is this a white box or black box?', 'start': 17173.573, 'duration': 5.143}, {'end': 17183.118, 'text': 'If I take an example of decision tree, it is a white box or black box model.', 'start': 17178.796, 'duration': 4.322}, {'end': 17187.16, 'text': 'If I take an example of ANN, is it a white box or black box model?', 'start': 17183.158, 'duration': 4.002}, {'end': 17198.126, 'text': 'Linear regression is basically called as a white box model because here you can basically visualize how the theta value is basically changing and how it is coming to a global minima and all those things.', 'start': 17188.001, 'duration': 10.125}], 'summary': 'Adaboost creates decision trees, uses majority voting for classification, and averages for regression.', 'duration': 253.897, 'max_score': 16944.229, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY16944229.jpg'}, {'end': 17115.147, 'src': 'embed', 'start': 17086.425, 'weight': 1, 'content': [{'end': 17090.007, 'text': 'Buckets basically means based on this weights, normalized weight, we are going to create bucket.', 'start': 17086.425, 'duration': 3.582}, 
{'end': 17093.31, 'text': 'So that whichever records has the highest bucket.', 'start': 17090.487, 'duration': 2.823}, {'end': 17099.036, 'text': 'based on this randomly creating code, you know it will select those specific buckets and put it into random forest.', 'start': 17093.31, 'duration': 5.726}, {'end': 17101.658, 'text': 'We understand why this bucket size is big.', 'start': 17099.216, 'duration': 2.442}, {'end': 17107.845, 'text': 'The other wrong records which are present, right? Suppose they are more than four to five wrong records, their bucket size will also be bigger.', 'start': 17102.339, 'duration': 5.506}, {'end': 17115.147, 'text': 'And because based on this randomly creating number between 0 to 1, most of the wrong records will be selected and given to the second stump.', 'start': 17108.625, 'duration': 6.522}], 'summary': 'Creating buckets based on weights for records, impacting bucket size with wrong records count.', 'duration': 28.722, 'max_score': 17086.425, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY17086425.jpg'}, {'end': 17162.39, 'src': 'embed', 'start': 17133.955, 'weight': 5, 'content': [{'end': 17138.718, 'text': 'Three main things, one is total error, one is performance of stump, and one is the new sample weight.', 'start': 17133.955, 'duration': 4.763}, {'end': 17141.179, 'text': 'These things are getting calculated extensively.', 'start': 17139.178, 'duration': 2.001}, {'end': 17147.003, 'text': 'Max normalized weight was basically used because the sum of all these weights are approximately equal to one.', 'start': 17141.82, 'duration': 5.183}, {'end': 17150.205, 'text': 'When boosting, why not take the last output? 
No, no, no.', 'start': 17147.603, 'duration': 2.602}, {'end': 17153.226, 'text': 'We have to give the importance of every decision tree output.', 'start': 17150.585, 'duration': 2.641}, {'end': 17156.028, 'text': 'Every decision tree output are important.', 'start': 17154.347, 'duration': 1.681}, {'end': 17162.39, 'text': 'Okay, let me talk about one model which is called as black box model versus white box.', 'start': 17156.528, 'duration': 5.862}], 'summary': 'Calculation of total error, stump performance, and new sample weight is essential in boosting models, emphasizing the importance of every decision tree output.', 'duration': 28.435, 'max_score': 17133.955, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY17133955.jpg'}, {'end': 17760.866, 'src': 'embed', 'start': 17728.637, 'weight': 8, 'content': [{'end': 17733.922, 'text': 'again it will be calculated and here you can see that all the points are in its own location.', 'start': 17728.637, 'duration': 5.285}, {'end': 17736.865, 'text': 'so here now no update will actually happen.', 'start': 17733.922, 'duration': 2.943}, {'end': 17740.147, 'text': "let's say that there was one point which was red color over here.", 'start': 17736.865, 'duration': 3.282}, {'end': 17742.693, 'text': 'then this would have become green color.', 'start': 17741.092, 'duration': 1.601}, {'end': 17749.478, 'text': 'But since the updation has happened perfectly, we are not going to update it and we are not going to update the centroid right?', 'start': 17742.933, 'duration': 6.545}, {'end': 17751.579, 'text': 'So now you can understand that.', 'start': 17749.838, 'duration': 1.741}, {'end': 17760.866, 'text': 'yes, now we have actually got the perfect centroid and now this will be considered as one group and this will be basically considered as the another group.', 'start': 17751.579, 'duration': 9.287}], 'summary': 'Data points are in their locations, no updates occurred, 
resulting in perfect centroids and two distinct groups.', 'duration': 32.229, 'max_score': 17728.637, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY17728637.jpg'}, {'end': 17844.362, 'src': 'embed', 'start': 17815.899, 'weight': 0, 'content': [{'end': 17819.74, 'text': 'We cannot go ahead and directly say that, okay, k is equal to 2 is going to work.', 'start': 17815.899, 'duration': 3.841}, {'end': 17826.483, 'text': 'So obviously, we are going to go with iteration for i is equal to probably 1 to 10.', 'start': 17820.34, 'duration': 6.143}, {'end': 17829.604, 'text': "I'm going to move towards iteration from 1 to 10, let's say.", 'start': 17826.483, 'duration': 3.121}, {'end': 17837.167, 'text': 'So for every iteration, we will construct a graph with respect to k value and with respect to something called as WCSS.', 'start': 17830.024, 'duration': 7.143}, {'end': 17844.362, 'text': 'Now what is this WCSS? WCSS basically means within cluster sum of square.', 'start': 17838.174, 'duration': 6.188}], 'summary': 'Exploring k values through iteration for i=1 to 10 to construct graphs based on wcss', 'duration': 28.463, 'max_score': 17815.899, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY17815899.jpg'}, {'end': 18241.968, 'src': 'embed', 'start': 18213.398, 'weight': 4, 'content': [{'end': 18215.419, 'text': 'So they are also going to combine into one group.', 'start': 18213.398, 'duration': 2.021}, {'end': 18221.805, 'text': 'So once they combine into one group, then we have P6 and P7, which will be obviously greater than the previous distance.', 'start': 18215.7, 'duration': 6.105}, {'end': 18227.85, 'text': 'And we may get this kind of computation and another combination of cluster will get formed over here.', 'start': 18222.345, 'duration': 5.505}, {'end': 18232.275, 'text': 'Then you have seen that okay, P3 and P5 are nearer to each other.', 
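The aggregation step described for boosting ensembles — majority voting for classification, averaging for regression — can be sketched in a few lines. This is an illustrative sketch, not code from the video; the function names are my own.

```python
from collections import Counter

def aggregate_classification(tree_outputs):
    # Majority vote across the weak learners' predictions.
    return Counter(tree_outputs).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    # Average of the weak learners' continuous predictions.
    return sum(tree_outputs) / len(tree_outputs)

# The transcript's example: four stumps predict 0, 1, 1, 1,
# so the majority vote gives 1.
final_class = aggregate_classification([0, 1, 1, 1])
final_value = aggregate_regression([2.0, 3.0, 4.0])
```

The same two aggregation rules apply to random forests; what differs in boosting is how the training sets for the successive trees are built.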
So we are going to combine this: I am going to combine P3 and P5, and let's say that this distance is greater than the previous one.

Chapter summaries — Machine Learning Algorithms (covers interview Q&A on random forest, the AdaBoost algorithm with a worked stump performance of 0.895, the AdaBoost weight update, k-means clustering, the elbow method, and hierarchical clustering):

- Random forest interview Q&A: random forest is not impacted by outliers, while KNN is. In bagging, random forest is commonly used, and custom bagging techniques can also be created to combine different algorithms.
- Understanding the AdaBoost algorithm: the first step assigns equal weights to all input records so that the weights sum to 1 — with seven records, 1/7 each. Next, a feature is selected based on information gain and entropy, and a one-level decision tree (a "stump") is trained on the records. The total error is then calculated — in the example, 1/7 — and the performance of the stump is evaluated with a formula that gives 0.895.
- AdaBoost weight update and model training: the sample weight of each correctly classified record is updated with a formula involving the stump performance, giving about 0.05, while the weight of the misclassified record becomes about 0.349. The weights then sum to about 0.649, so each is divided by 0.649, giving normalized weights of about 0.07 and 0.537. Buckets are ranges built from the normalized weights, and records are assigned to the next stump's training set via randomly generated numbers falling into those ranges. The chapter also distinguishes black box from white box models: linear regression and decision trees are white box models, while random forests and artificial neural networks are black box models.
- K-means clustering algorithm: an unsupervised machine learning algorithm that creates clusters of similar data within a dataset. It can also be used ahead of supervised models in ensemble-style pipelines, where a separate supervised model is applied to each cluster. The iterative steps are: choose centroids, assign each point to its nearest centroid by Euclidean distance, recompute each centroid as the average of its points, and repeat until the clusters stop changing.
- K-means and hierarchical clustering: the elbow method finds the optimized k value by iterating over k, plotting the within-cluster sum of squares (WCSS), and looking for the abrupt change in WCSS, which indicates the optimal k; as k increases, the WCSS values level off. Hierarchical clustering combines the nearest data points step by step into clusters, forming a dendrogram, and the number of clusters is found by identifying the longest vertical line that no horizontal line passes through.

Hierarchical vs k-means, transcript:

Okay, I've already uploaded a lot of practical videos with respect to hierarchical clustering. Now tell me: is the maximum effort, or maximum time, taken by k-means or by hierarchical clustering? This is a question for you. Yes guys, the number of clusters may be three, but here I'm just showing you how many lines it may be passed by.
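The AdaBoost weight-update numbers quoted in the summaries (equal weights of 1/7, stump performance 0.895, updated weights near 0.05 and 0.349, then normalization) can be reproduced directly. A minimal sketch — note that exact arithmetic gives about 0.058 rather than the video's rounded 0.05, and normalized weights of exactly 1/12 and 1/2 rather than 0.07 and 0.537, which come from rounding before normalizing. The `pick_record` helper is my own illustration of the bucket selection.

```python
import math

n = 7                      # seven records
w = 1 / n                  # equal initial sample weights, summing to 1
te = 1 / n                 # one misclassified record -> total error 1/7

# Performance of the stump: 1/2 * ln((1 - TE) / TE) ~= 0.895
alpha = 0.5 * math.log((1 - te) / te)

w_correct = w * math.exp(-alpha)   # correct records shrink: ~0.058
w_wrong = w * math.exp(alpha)      # the wrong record grows:  ~0.350

total = 6 * w_correct + w_wrong    # ~0.70 (the video's rounding gives 0.649)
norm_correct = w_correct / total   # exactly 1/12 ~= 0.083
norm_wrong = w_wrong / total       # exactly 1/2

# Bucket selection: cumulative ranges over the normalized weights; a
# random draw r in [0, 1) then picks a record, so the heavily weighted
# (misclassified) record is picked most often for the next stump.
def pick_record(norm_weights, r):
    cum = 0.0
    for i, nw in enumerate(norm_weights):
        cum += nw
        if r < cum:
            return i
    return len(norm_weights) - 1

weights = [norm_correct] * 6 + [norm_wrong]
# The six correct records together cover [0, 0.5); the wrong record's
# bucket covers [0.5, 1.0), so a draw like r = 0.95 selects it.
```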
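The k-means loop outlined above — assign points to the nearest centroid by Euclidean distance, recompute centroids as averages, repeat — together with the WCSS quantity used by the elbow method, can be sketched as follows. The data and initial centroids are made up for illustration, and a production version would also handle empty clusters.

```python
import math

def assign(points, centroids):
    # Assign each point to the cluster of its nearest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        clusters[dists.index(min(dists))].append(p)
    return clusters

def update(clusters):
    # Move each centroid to the average of its assigned points.
    return [tuple(sum(coord) / len(pts) for coord in zip(*pts))
            for pts in clusters]

def wcss(clusters, centroids):
    # Within-cluster sum of squares, plotted against k in the elbow method.
    return sum(math.dist(p, c) ** 2
               for pts, c in zip(clusters, centroids)
               for p in pts)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids = [(0, 0), (10, 10)]
for _ in range(5):                # a few iterations are enough here
    clusters = assign(points, centroids)
    centroids = update(clusters)
```

Running the elbow method simply repeats this for k = 1, 2, 3, ... and plots `wcss` against k, looking for the abrupt drop.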
maximum effort taken by k means or hierarchical clustering?', 'duration': 27.264, 'max_score': 18424.791, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY18424791.jpg'}, {'end': 18516.99, 'src': 'embed', 'start': 18471.535, 'weight': 0, 'content': [{'end': 18480.502, 'text': 'at that point of time, hierarchical clustering will keep on constructing this kind of dendrograms and it will be taking many, many, many time,', 'start': 18471.535, 'duration': 8.967}, {'end': 18481.963, 'text': 'lot time, right.', 'start': 18480.502, 'duration': 1.461}, {'end': 18487.887, 'text': 'so hierarchical clustering will take more time, maximum time that it is going to basically take.', 'start': 18481.963, 'duration': 5.924}, {'end': 18493.471, 'text': 'so it is very much important that you understand which is making it basically taking more time.', 'start': 18487.887, 'duration': 5.584}, {'end': 18498.354, 'text': 'so if your data set is small, you may go ahead with hierarchical clustering.', 'start': 18493.471, 'duration': 4.883}, {'end': 18504.279, 'text': 'if your data set is large, go with k-means clustering, go with k-means clustering.', 'start': 18499.475, 'duration': 4.804}, {'end': 18509.283, 'text': 'in short, both will take more time, but k-mean will perform better than hierarchical clustering.', 'start': 18504.279, 'duration': 5.004}, {'end': 18516.99, 'text': 'see, guys, you will be forming this kind of dendograms right, and just imagine, if you have 10 features and many data points,', 'start': 18509.283, 'duration': 7.707}], 'summary': 'Hierarchical clustering takes more time than k-means clustering. 
k-means performs better with large datasets.', 'duration': 45.455, 'max_score': 18471.535, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY18471535.jpg'}, {'end': 19012.772, 'src': 'embed', 'start': 18964.346, 'weight': 1, 'content': [{'end': 18966.627, 'text': 'Let me just show you what is siloed clustering.', 'start': 18964.346, 'duration': 2.281}, {'end': 18971.071, 'text': 'Siloed clustering formula will be something like this B of I.', 'start': 18966.627, 'duration': 4.444}, {'end': 18973.473, 'text': 'so here you have siloed clustering.', 'start': 18971.071, 'duration': 2.402}, {'end': 18980.497, 'text': 'This is the formula b of i, minus a of i, max of a of i, comma b of i.', 'start': 18973.473, 'duration': 7.024}, {'end': 18982.658, 'text': 'if c of i is greater than 1.', 'start': 18980.497, 'duration': 2.161}, {'end': 18992.704, 'text': 'right. so by this you will be getting the value between minus 1 to plus 1, and more the value is towards plus 1, the more good your model is.', 'start': 18982.658, 'duration': 10.046}, {'end': 18999.688, 'text': 'more the value is towards minus 1, more bad your model is, because if it is towards minus 1, that basically means your.', 'start': 18992.704, 'duration': 6.984}, {'end': 19002.189, 'text': 'a of i is obviously greater than b of i.', 'start': 18999.688, 'duration': 2.501}, {'end': 19005.67, 'text': 'So this is the outcome with respect to silhouette clustering.', 'start': 19002.409, 'duration': 3.261}, {'end': 19012.772, 'text': 'If S is equal to zero, that basically means still your model needs to be, basically the clustering needs to be improved.', 'start': 19006.23, 'duration': 6.542}], 'summary': 'Siloed clustering formula measures model performance from -1 to +1, with 0 indicating need for improvement.', 'duration': 48.426, 'max_score': 18964.346, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY18964346.jpg'}, {'end': 19165.282, 'src': 'embed', 'start': 19138.208, 'weight': 5, 'content': [{'end': 19144.931, 'text': 'So can we have a scenario wherein a kind of clustering algorithm is there where we can leave the outlier separately.', 'start': 19138.208, 'duration': 6.723}, {'end': 19146.712, 'text': 'And this outlier.', 'start': 19145.391, 'duration': 1.321}, {'end': 19155.095, 'text': 'in this particular algorithm, and this is basically we will be using dbScan to leave the outlier and this point will be called as a noisy point.', 'start': 19146.712, 'duration': 8.383}, {'end': 19157.836, 'text': 'Noisy point or I can also say it as an outlier.', 'start': 19155.595, 'duration': 2.241}, {'end': 19159.737, 'text': 'So this will be a noise point.', 'start': 19158.256, 'duration': 1.481}, {'end': 19165.282, 'text': 'For this kind of algorithm where you want to skip the outliers, we can definitely use dbScan.', 'start': 19160.157, 'duration': 5.125}], 'summary': 'Utilize dbscan clustering algorithm to identify and separate outliers, also known as noisy points.', 'duration': 27.074, 'max_score': 19138.208, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY19138208.jpg'}, {'end': 19464.636, 'src': 'embed', 'start': 19433.419, 'weight': 4, 'content': [{'end': 19434.84, 'text': 'This will become a border point.', 'start': 19433.419, 'duration': 1.421}, {'end': 19438.402, 'text': 'That basically means, yes, this can be the part of this specific group.', 'start': 19435.28, 'duration': 3.122}, {'end': 19442.163, 'text': 'So what we are doing, whenever there is a noise, we are going to neglect it.', 'start': 19438.882, 'duration': 3.281}, {'end': 19445.005, 'text': 'Wherever there is a broader and core points, we are going to combine it.', 'start': 19442.204, 'duration': 2.801}, {'end': 19449.987, 'text': "So I'll show you 
one more diagram, which is an amazing diagram, which will help you understand more in this.", 'start': 19445.125, 'duration': 4.862}, {'end': 19452.709, 'text': 'A k-means clustering and a hierarchical mean clustering.', 'start': 19450.228, 'duration': 2.481}, {'end': 19453.669, 'text': 'Now see this, everybody.', 'start': 19452.749, 'duration': 0.92}, {'end': 19456.131, 'text': 'Now the right-hand side of diagram.', 'start': 19454.09, 'duration': 2.041}, {'end': 19460.173, 'text': 'that you see is based on dvScan clustering.', 'start': 19457.191, 'duration': 2.982}, {'end': 19464.636, 'text': 'And the left-hand side is basically your traditional clustering method.', 'start': 19461.254, 'duration': 3.382}], 'summary': 'Border point identified for specific group, noise neglected, k-means and hierarchical mean clustering used.', 'duration': 31.217, 'max_score': 19433.419, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY19433419.jpg'}], 'start': 18383.623, 'title': 'Clustering algorithms', 'summary': 'Covers hierarchical clustering, k-means, and dbscan, detailing the process, time complexity, and validation methods. it also emphasizes the importance of k-means, including the k-means++ algorithm, and explains key concepts of dbscan, with practical application provided through a github link.', 'chapters': [{'end': 18563.957, 'start': 18383.623, 'title': 'Hierarchical clustering and k-means', 'summary': 'Explains the process of hierarchical clustering and k-means, detailing how to determine the number of clusters, the time complexity of both methods, and the validation of clustering models using silhout score.', 'duration': 180.334, 'highlights': ['Hierarchical clustering takes more time than K-means, especially with large datasets Hierarchical clustering will take more time, maximum time that it is going to basically take. 
So it is very much important that you understand which is making it basically taking more time.', 'Determining the number of clusters in hierarchical clustering The number of clusters is determined by counting the number of lines a straight line passes through, indicating the clusters in hierarchical clustering.', 'Validation of clustering models using Silhout score Clustering models are validated using something called Silhout score, which is used to assess the quality of clusters formed.']}, {'end': 19180.876, 'start': 18565.057, 'title': 'Understanding k-means and dbscan clustering', 'summary': 'Covers the importance of k-means clustering, including the issue of wrong initialization of centroids and the solution provided by the k-means++ algorithm, as well as the significance of silhouette clustering in validating cluster models. additionally, the transcript delves into the key concepts of dbscan clustering, such as core points, border points, and noise points.', 'duration': 615.819, 'highlights': ['The importance of k-means clustering and the issue of wrong initialization of centroids The issue of wrong initialization of centroids in k-means clustering can lead to incorrect clustering, with the possibility of creating more clusters than intended due to the proximity of centroids, emphasizing the need for centroids to be initialized very far apart.', 'The significance of silhouette clustering in validating cluster models Silhouette clustering plays a crucial role in validating cluster models, with the values ranging between -1 to +1, where a value closer to +1 indicates a better clustering model, while a value closer to -1 implies that the distance within clusters is greater than the distance to neighboring clusters.', 'Key concepts of DBScan clustering, including core points, border points, and noise points DBScan clustering introduces core points, border points, and noise points, offering a method to identify and exclude outliers, providing a more robust 
clustering approach compared to traditional algorithms like k-means or hierarchical means.']}, {'end': 19535.542, 'start': 19181.116, 'title': 'Understanding dbscan clustering', 'summary': 'Introduces the concept of dbscan clustering, explaining key terms such as epsilon, min points, core points, border points, noise points, and outliers, highlighting its advantages over traditional clustering methods and emphasizing its practical application through a github link for a hands-on exercise.', 'duration': 354.426, 'highlights': ['The chapter explains the concept of epsilon and min points in DBSCAN clustering, where epsilon represents the radius of a circle and min points represent the minimum number of points required inside the circle to classify a point as a core point, with a practical example demonstrating the determination of core, border, and noise points.', 'It highlights the use of DBSCAN clustering over traditional methods like k-means, showcasing its ability to group points based on core and border points, avoiding the inclusion of outliers within a group, thus emphasizing the superiority of DBSCAN clustering.', 'The instructor provides a github link for a practical exercise, encouraging the audience to download the code for a hands-on session, reinforcing the practical application of DBSCAN clustering and its relevance in real-world problem-solving.', 'The chapter emphasizes the neglect of noise points and the combination of core and border points in DBSCAN clustering, illustrating the process of identifying outliers and forming distinct groups based on proximity and density, enhancing the understanding of the clustering technique.', 'The instructor stresses the simplicity of using DBSCAN clustering, encouraging the audience to directly utilize it without concerns, instilling confidence in the practical application of the clustering method for data analysis and problem-solving.']}], 'duration': 1151.919, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY18383623.jpg', 'highlights': ['Validation of clustering models using Silhout score', 'Determining the number of clusters in hierarchical clustering', 'Hierarchical clustering takes more time than K-means', 'The importance of k-means clustering and the issue of wrong initialization of centroids', 'Key concepts of DBScan clustering, including core points, border points, and noise points', 'The chapter explains the concept of epsilon and min points in DBSCAN clustering', 'The instructor provides a github link for a practical exercise', 'The chapter emphasizes the neglect of noise points and the combination of core and border points in DBSCAN clustering', 'The significance of silhouette clustering in validating cluster models', 'The instructor stresses the simplicity of using DBSCAN clustering']}, {'end': 20597.649, 'segs': [{'end': 20373.717, 'src': 'embed', 'start': 20340.094, 'weight': 0, 'content': [{'end': 20344.895, 'text': "Okay If I, if I just talk about the definition of variance, I'm just going to refer like this.", 'start': 20340.094, 'duration': 4.801}, {'end': 20363.506, 'text': 'The variance refers to the changes in the model when using, when using different portion of the training or test data.', 'start': 20345.675, 'duration': 17.831}, {'end': 20366.549, 'text': "Now let's understand this particular definition.", 'start': 20364.827, 'duration': 1.722}, {'end': 20373.717, 'text': 'Variance refers to the changes in the model when using different proportion of the test training data or test data.', 'start': 20368.111, 'duration': 5.606}], 'summary': 'Variance measures model changes with different training/test data proportions.', 'duration': 33.623, 'max_score': 20340.094, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY20340094.jpg'}, {'end': 20438.445, 'src': 'embed', 'start': 20400.845, 'weight': 1, 
'content': [{'end': 20406.448, 'text': 'And suppose in this particular training data, it gets trained and performs well, here I am actually talking about bias.', 'start': 20400.845, 'duration': 5.603}, {'end': 20414.744, 'text': 'But when we come with respect to the prediction of the specific model, At that point of time I can use other training data.', 'start': 20407.468, 'duration': 7.276}, {'end': 20418.988, 'text': 'that basically means that training data may not be similar, or I can also use test data.', 'start': 20414.744, 'duration': 4.244}, {'end': 20422.751, 'text': 'Now in this test data, what we do, we do some kind of predictions.', 'start': 20419.789, 'duration': 2.962}, {'end': 20424.272, 'text': 'These are my predictions.', 'start': 20423.292, 'duration': 0.98}, {'end': 20427.415, 'text': 'And in this prediction, again I may get two scenarios.', 'start': 20425.193, 'duration': 2.222}, {'end': 20431.759, 'text': 'I may get two scenarios, which is basically mentioned by variance.', 'start': 20429.037, 'duration': 2.722}, {'end': 20438.445, 'text': 'It refers to the changes in the model when using different portion of the training or test data.', 'start': 20432.239, 'duration': 6.206}], 'summary': 'Training data may lead to bias, while test data may lead to variance in model predictions.', 'duration': 37.6, 'max_score': 20400.845, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY20400845.jpg'}, {'end': 20572.607, 'src': 'embed', 'start': 20548.481, 'weight': 8, 'content': [{'end': 20555.563, 'text': 'The last scenario is that this is the scenario that we want because it is low bias and low variance.', 'start': 20548.481, 'duration': 7.082}, {'end': 20562.345, 'text': 'Okay Many, many people have basically asked me the definition with respect to bias and variance.', 'start': 20557.804, 'duration': 4.541}, {'end': 20568.586, 'text': "And here I've actually discussed and this indicates, this gives me a 
generalized model.", 'start': 20562.645, 'duration': 5.941}, {'end': 20572.607, 'text': 'And this is what is our aim when we are working as a data scientist.', 'start': 20569.426, 'duration': 3.181}], 'summary': 'Low bias and low variance scenario aims for a generalized model.', 'duration': 24.126, 'max_score': 20548.481, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY20548481.jpg'}], 'start': 19536.127, 'title': 'Clustering techniques, validation, and understanding bias', 'summary': 'Covers using k-means clustering, silhouette scoring, and wcss to validate models, especially focusing on k-means clustering. it also delves into evaluating k=4 using silhouette cluster score, visualizing data, and silhouette analysis for different clusters. additionally, it discusses the concept of bias and variance in machine learning with examples and scenarios.', 'chapters': [{'end': 19710.361, 'start': 19536.127, 'title': 'Clustering techniques and validation', 'summary': 'Explores the process of using k-means clustering and silhouette scoring to validate clustering models, as well as the use of wcss to determine the optimal number of clusters, with a particular focus on implementing k-means clustering algorithm.', 'duration': 174.234, 'highlights': ['The chapter discusses using k-means clustering and silhouette scoring to validate clustering models. The transcript details the process of using k-means clustering and silhouette scoring from sklearn to validate clustering models.', 'The process of determining the optimal number of clusters using WCSS is explained, with a focus on implementing the K-means clustering algorithm. 
The process of determining the optimal number of clusters using WCSS and implementing the K-means clustering algorithm is outlined, including the use of different k values and centroid values to minimize WCSS.', 'The use of WCSS (Within-Cluster Sum of Squares) to determine the optimal number of clusters is demonstrated. The use of WCSS to determine the optimal number of clusters, including the process of importing k-means and using different k values to minimize WCSS, is demonstrated.']}, {'end': 19878.794, 'start': 19710.361, 'title': 'Silhouette cluster score for k=4', 'summary': 'Discusses evaluating the validity of k=4 using silhouette cluster score and comparing different cluster values to determine the best k value, with an emphasis on the importance of the silhouette scoring and the visualization of the data through graphs.', 'duration': 168.433, 'highlights': ['The chapter discusses evaluating the validity of K=4 using Silhouette cluster score The speaker evaluates the validity of K=4 using Silhouette cluster score to determine the best K value.', 'Comparing different cluster values to determine the best K value The speaker compares different cluster values to determine the best K value, emphasizing the importance of finding the optimal K value for clustering.', 'Importance of Silhouette scoring The importance of Silhouette scoring is emphasized, where a higher score towards plus one indicates better clustering.', 'Emphasis on the visualization of the data through graphs The importance of visualizing the data through graphs is highlighted, with the speaker mentioning the use of code to display the data properly in the form of graphs.']}, {'end': 20068.611, 'start': 19878.794, 'title': 'Silhouette analysis for clustering', 'summary': 'Demonstrates the silhouette analysis for different cluster numbers, with the highest average silhouette score of 0.704 for 2 clusters and the need to consider negative values when determining the optimal cluster number.', 
'duration': 189.817, 'highlights': ['For n_cluster=2, the average silhouette score is 0.704, indicating a strong clustering performance.', 'It is important to consider negative values when evaluating the cluster solutions, as they indicate suboptimal clustering.', 'The analysis reveals that selecting 2 clusters may perform well due to the absence of negative values and effective points division.', 'The average silhouette score decreases for n_cluster=3, 4, and 5, indicating a less optimal clustering performance.', 'The silhouette analysis suggests choosing a cluster number of 4 or 2, as they exhibit effective points division and absence of negative values.']}, {'end': 20597.649, 'start': 20069.131, 'title': 'Understanding bias and variance', 'summary': 'Discusses the concept of bias and variance in machine learning, providing examples and scenarios to illustrate the difference between high and low bias as well as high and low variance, aiming to create a generalized model for efficient data science work.', 'duration': 528.518, 'highlights': ['The chapter discusses the concept of bias and variance in machine learning, providing examples and scenarios to illustrate the difference between high and low bias as well as high and low variance. The discussion covers the concept of bias and variance in machine learning, providing examples and scenarios to differentiate between high and low bias as well as high and low variance.', 'The session efficiently covers topics including k-means, hierarchical clustering, silhouette score, DBSCAN, and clustering, with plans to cover SVM, SVR, XGBoost, and PCA in the upcoming session. 
The session covers k-means, hierarchical clustering, silhouette score, DBSCAN, and clustering, with plans for upcoming topics including SVM, SVR, XGBoost, and PCA.', 'The variance refers to the changes in the model when using different portions of the training or test data, leading to scenarios of low variance with good accuracy and high variance with bad accuracy. Variance refers to the changes in the model when using different portions of the training or test data, resulting in low variance with good accuracy and high variance with bad accuracy scenarios.']}], 'duration': 1061.522, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY19536127.jpg', 'highlights': ['The chapter discusses using k-means clustering and silhouette scoring to validate clustering models.', 'The process of determining the optimal number of clusters using WCSS is explained, with a focus on implementing the K-means clustering algorithm.', 'The use of WCSS (Within-Cluster Sum of Squares) to determine the optimal number of clusters is demonstrated.', 'The chapter discusses evaluating the validity of K=4 using Silhouette cluster score.', 'Importance of Silhouette scoring is emphasized, where a higher score towards plus one indicates better clustering.', 'For n_cluster=2, the average silhouette score is 0.704, indicating a strong clustering performance.', 'The analysis reveals that selecting 2 clusters may perform well due to the absence of negative values and effective points division.', 'The chapter discusses the concept of bias and variance in machine learning, providing examples and scenarios to illustrate the difference between high and low bias as well as high and low variance.', 'The session efficiently covers topics including k-means, hierarchical clustering, silhouette score, DBSCAN, and clustering, with plans to cover SVM, SVR, XGBoost, and PCA in the upcoming session.']}, {'end': 23864.931, 'segs': [{'end': 20635.5, 'src': 'embed', 
'start': 20599.509, 'weight': 0, 'content': [{'end': 20601.23, 'text': 'And I hope you have understood this.', 'start': 20599.509, 'duration': 1.721}, {'end': 20619.474, 'text': "Okay. So let's consider a credit data set, and let's say this is an approval label.", 'start': 20603.23, 'duration': 16.244}, {'end': 20624.757, 'text': 'So we are going to take this sample data set and understand how XGBoost works.', 'start': 20620.835, 'duration': 3.922}, {'end': 20633.477, 'text': 'Suppose salary is less than or equal to 50 and the credit is bad, so the loan approval will be 0.', 'start': 20626.007, 'duration': 7.47}, {'end': 20635.5, 'text': 'That basically means he or she will not get the loan.', 'start': 20633.477, 'duration': 2.023}], 'summary': 'Analyzing a credit dataset to understand XGBoost and the loan approval process.', 'duration': 35.991, 'max_score': 20599.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY20599509.jpg'}, {'end': 20694.217, 'src': 'embed', 'start': 20667.283, 'weight': 7, 'content': [{'end': 20674.608, 'text': 'If it is greater than 50K and the credit is normal, then also we are going to get the loan approved.', 'start': 20667.283, 'duration': 7.325}, {'end': 20677.81, 'text': 'So this is my data set.', 'start': 20676.709, 'duration': 1.101}, {'end': 20686.615, 'text': 'So how does the XGBoost classifier work? 
Understand the full form of XGBoost is extreme gradient boosting.', 'start': 20677.95, 'duration': 8.665}, {'end': 20689.615, 'text': 'extreme gradient boosting.', 'start': 20688.334, 'duration': 1.281}, {'end': 20694.217, 'text': 'so we will basically understand about extreme gradient boosting.', 'start': 20689.615, 'duration': 4.602}], 'summary': 'Xgboost classifier works on datasets with income over 50k.', 'duration': 26.934, 'max_score': 20667.283, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY20667283.jpg'}, {'end': 21469.956, 'src': 'embed', 'start': 21432.785, 'weight': 8, 'content': [{'end': 21433.225, 'text': 'Next step.', 'start': 21432.785, 'duration': 0.44}, {'end': 21436.766, 'text': 'The third step, what we do is that we calculate the information gain.', 'start': 21433.345, 'duration': 3.421}, {'end': 21444.728, 'text': 'now information gain is nothing, but in this particular case the root node similarity weight we will try to add up.', 'start': 21437.666, 'duration': 7.062}, {'end': 21451.631, 'text': 'so i will be getting 0.33 minus this particular top root node.', 'start': 21444.728, 'duration': 6.903}, {'end': 21457.613, 'text': "whatever split has happened, that similarity weight i'll take 0 plus 0.33 minus 0.14.", 'start': 21451.631, 'duration': 5.982}, {'end': 21460.374, 'text': 'so point minus 0.14.', 'start': 21457.613, 'duration': 2.761}, {'end': 21461.734, 'text': 'and if i do it it is nothing.', 'start': 21460.374, 'duration': 1.36}, {'end': 21469.956, 'text': 'but just open your calculator again And 0.33 minus 0.14..', 'start': 21461.734, 'duration': 8.222}], 'summary': 'Calculating information gain using similarity weights, resulting in 0.19 difference.', 'duration': 37.171, 'max_score': 21432.785, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY21432785.jpg'}], 'start': 20599.509, 'title': 'Understanding xgboost 
classifier', 'summary': 'Introduces the xgboost classifier for classification and regression, explaining decision tree construction, similarity weight calculation, and information gain, with a focus on a specific example. it also covers the process of further split in binary decision tree, inferencing using base models, and the working of xgboost classifier and regressor, including the use of learning rate, sigmoid activation function, and hyperparameter tuning.', 'chapters': [{'end': 20666.363, 'start': 20599.509, 'title': 'Xgboost for credit approval', 'summary': 'Explores the use of xgboost to predict loan approvals based on a sample credit dataset, where specific salary and credit score conditions determine the approval outcome.', 'duration': 66.854, 'highlights': ['Using XGBoost to predict loan approvals based on specific salary and credit score conditions.', 'If salary is less than or equal to 50 and credit is bad, the loan approval is 0.', 'If salary is greater than 50 and credit is good, the loan approval is 1.']}, {'end': 21491.561, 'start': 20667.283, 'title': 'Understanding xgboost classifier', 'summary': 'Introduces the xgboost classifier, explaining its use for both classification and regression problems, the process of constructing decision trees, and the steps involved in creating an xgboost classifier, with a focus on the specific example of constructing a binary decision tree based on salary feature and calculating similarity weight and information gain.', 'duration': 824.278, 'highlights': ['XGBoost can be used for both classification and regression problems XGBoost is versatile and can be used for both classification and regression problems.', 'The process of constructing decision trees in XGBoost involves creating a base model that gives an output of probability 0.5 The process of constructing decision trees in XGBoost begins with creating a base model that gives an output of probability 0.5.', 'The steps involved in creating an XGBoost 
classifier include creating a binary decision tree, calculating the similarity weight, and calculating the information gain The steps involved in creating an XGBoost classifier include creating a binary decision tree, calculating the similarity weight, and calculating the information gain.', 'Example of constructing a binary decision tree based on salary feature and calculating similarity weight and information gain A specific example is provided for constructing a binary decision tree based on salary feature and calculating similarity weight and information gain.']}, {'end': 21782.913, 'start': 21491.901, 'title': 'Binary decision tree and inference in xg boost', 'summary': "Focuses on the process of further split in binary decision tree using information gain, with an example of splitting a node based on the 'credit' feature, and then delves into the inferencing part using base models and probability calculation.", 'duration': 291.012, 'highlights': ["The process of further split in binary decision tree using information gain The speaker explains the concept of information gain (0.19) to select the specific node for the split, followed by an example of splitting a node based on the 'Credit' feature.", 'Inferencing part using base models and probability calculation The discussion includes the application of log function to calculate the real probability in the case of base models, with an example of a record going through the base model and the subsequent probability calculation.']}, {'end': 22700.538, 'start': 21782.913, 'title': 'Working of xgboost classifier and regressor', 'summary': 'Explores the functioning of xgboost classifier and regressor, explaining the process of constructing decision trees, calculating similarity weight, information gain, and inferencing, highlighting the use of learning rate, sigmoid activation function, and hyperparameter tuning for boosting decision tree outputs.', 'duration': 917.625, 'highlights': ['The process of constructing 
decision trees and calculating similarity weight is crucial in XGBoost. The chapter details the process of constructing decision trees and calculating similarity weight for each split, with examples demonstrating the computation of similarity weight and information gain.', 'The use of learning rate and sigmoid activation function in XGBoost for classification is explained. The explanation of learning rate and the application of the sigmoid activation function for classification problems in XGBoost are emphasized.', 'Hyperparameter tuning and pre-pruning are important aspects of XGBoost to address overfitting. The chapter discusses the significance of hyperparameter tuning and pre-pruning in XGBoost to mitigate overfitting and enhance model performance.']}, {'end': 23864.931, 'start': 22700.538, 'title': 'Decision trees, svm, and cost function', 'summary': 'Covers the construction of decision trees, the principles of svm, and the minimization and maximization of the cost function to increase the marginal plane, along with the introduction of parameters such as c and eta to manage errors in the model.', 'duration': 1164.393, 'highlights': ['The chapter covers the construction of decision trees, which involves constructing separate decision trees for different alphas, leading to specific outputs in a regression tree.', 'Explains the principles of SVM, focusing on creating marginal planes with maximum distance to efficiently divide data points, addressing hard and soft marginal planes based on overlapping points and errors.', 'Discusses the minimization and maximization of the cost function to increase the marginal plane, along with the introduction of parameters such as C and Eta to manage errors in the model.']}], 'duration': 3265.422, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JxgmHe2NyeY/pics/JxgmHe2NyeY20599509.jpg', 'highlights': ['XGBoost can be used for both classification and regression problems (3)', 'The process of 
constructing decision trees in XGBoost involves creating a base model that gives an output of probability 0.5 (2)', 'The process of further split in binary decision tree using information gain (1)', 'The process of constructing decision trees and calculating similarity weight is crucial in XGBoost (4)', 'The use of learning rate and sigmoid activation function in XGBoost for classification is explained (5)', 'Hyperparameter tuning and pre-pruning are important aspects of XGBoost to address overfitting (6)', 'Using XGBoost to predict loan approvals based on specific salary and credit score conditions (7)', 'Explains the principles of SVM, focusing on creating marginal planes with maximum distance to efficiently divide data points, addressing hard and soft marginal planes based on overlapping points and errors (8)', 'Discuss the minimization and maximization of the cost function to increase the marginal plane, along with the introduction of parameters such as C and Eta to manage errors in the model (9)']}], 'highlights': ['Understanding and explaining ML algorithms can lead to successful recruitment', 'AI is creating applications for autonomous decision-making, e.g., Netflix and Amazon.in', 'The training dataset is used to create a model through hypothesis testing', 'The logistic regression cost function needs modification to address local minima', 'F-score is calculated using the formula 1 plus beta square precision multiplied by recall divided by beta square multiplied by precision plus recall', 'AdaBoost achieves a total error of 0.895', 'XGBoost can be used for both classification and regression problems', 'The chapter discusses using k-means clustering and silhouette scoring to validate clustering models', 'The process of constructing decision trees in XGBoost involves creating a base model that gives an output of probability 0.5', 'The process of further split in binary decision tree using information gain']}
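The XGBoost walkthrough in the transcript computes a similarity weight per node and then an information gain per split by hand (0 + 0.33 − 0.14 ≈ 0.19). A minimal Python sketch of those two formulas follows; the residual values are hypothetical stand-ins chosen to land near the video's quoted numbers, not the exact records from its credit-approval dataset.

```python
def similarity_weight(residuals, prev_probs, lam=1.0):
    """XGBoost similarity score for a classification node:
    (sum of residuals)^2 / (sum of p * (1 - p) + lambda)."""
    numerator = sum(residuals) ** 2
    denominator = sum(p * (1 - p) for p in prev_probs) + lam
    return numerator / denominator

def information_gain(left_sim, right_sim, root_sim):
    """Gain of a split = left similarity + right similarity - root similarity."""
    return left_sim + right_sim - root_sim

# With every previous probability at the 0.5 base prediction, a node whose
# residuals are [0.5, 0.5, -0.5] scores 0.25 / 1.75 ~= 0.14, close to the
# root similarity quoted in the walkthrough.
root = similarity_weight([0.5, 0.5, -0.5], [0.5, 0.5, 0.5])
print(round(root, 2))                               # 0.14

# The video's split: left similarity 0, right similarity 0.33,
# so gain = 0 + 0.33 - 0.14 = 0.19.
print(round(information_gain(0.0, 0.33, root), 2))  # 0.19
```

The larger the gain, the better the candidate split; in full XGBoost a split is kept only if its gain exceeds the pruning threshold gamma.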