title
Different Types of Feature Engineering Encoding Techniques

description
This video discusses the different types of feature engineering encoding techniques. Please fill in the form below to get all the feature engineering materials. Google Form URL: https://docs.google.com/forms/d/e/1FAIpQLSdve8QIZ7w_7UsEHbDMIHEWykHTN3qa9gPr8_Kyd-frzvHmFA/viewform?usp=sf_link Feature Engineering playlist: https://www.youtube.com/watch?v=NgoLMsaZ4HU&list=PLZoTAELRMXVPwYGE2PXD3x0bfKnR0cJjN

detail
{'title': 'Different Types of Feature Engineering Encoding Techniques', 'heatmap': [{'end': 624.364, 'start': 606.099, 'weight': 0.715}, {'end': 741.186, 'start': 703.494, 'weight': 0.801}, {'end': 876.431, 'start': 835.908, 'weight': 0.836}, {'end': 972.029, 'start': 924.204, 'weight': 0.719}, {'end': 1146.405, 'start': 1013.222, 'weight': 0.949}, {'end': 1204.519, 'start': 1168.885, 'weight': 0.761}, {'end': 1320.62, 'start': 1302.185, 'weight': 0.842}], 'summary': 'Krish offers free feature engineering materials on independence day, requiring viewers to fill a form with their email ids to receive the complete zip file with advanced techniques, open for two days, and sending the materials to the participants within three days. the video discusses encoding techniques for categorical variables, emphasizing significance and potential impact on machine learning algorithms, covering nominal and ordinal encoding techniques. it also focuses on predicting salary based on education level, highlighting the impact of different degrees on salary levels and covers one hot encoding, creation of dummy variables, and disadvantages, emphasizing the need to delete one column to avoid multicollinearity, and techniques for handling multiple categories in one-hot encoding leading to a significant reduction and successful application in a competition, resulting in a prize.', 'chapters': [{'end': 69.679, 'segs': [{'end': 38.22, 'src': 'embed', 'start': 0.831, 'weight': 0, 'content': [{'end': 3.213, 'text': 'Hello all, my name is Krish and welcome to my YouTube channel.', 'start': 0.831, 'duration': 2.382}, {'end': 8.998, 'text': 'Before moving ahead, guys, I wish you all a very happy Independence Day and in this particular day,', 'start': 3.753, 'duration': 5.245}, {'end': 15.623, 'text': 'I am going to provide you some of the feature engineering materials that I have prepared personally for free of cost.', 'start': 8.998, 'duration': 6.625}, {'end': 19.786, 'text': 'You just have to 
fill the form that I have given in the description.', 'start': 16.023, 'duration': 3.763}, {'end': 27.713, 'text': 'I just need your email id so that once I get your email id I will be sending you the complete zip file of the feature engineering.', 'start': 20.107, 'duration': 7.606}, {'end': 33.718, 'text': 'that I have personally created, solving each and every techniques over there very easily.', 'start': 28.055, 'duration': 5.663}, {'end': 38.22, 'text': 'you just need to know Python, you just need to know Pandas, you just need to know NumPy.', 'start': 33.718, 'duration': 4.502}], 'summary': 'Krish offers free feature engineering materials for independence day, accessible via email submission.', 'duration': 37.389, 'max_score': 0.831, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40831.jpg'}, {'end': 79.387, 'src': 'embed', 'start': 52.007, 'weight': 1, 'content': [{'end': 59.01, 'text': 'and this particular link will just be open for two days, guys, and then I will be forwarding you the complete materials within three days.', 'start': 52.007, 'duration': 7.003}, {'end': 62.713, 'text': "So, first of all, I'll just get how many people are there and then, finally,", 'start': 59.45, 'duration': 3.263}, {'end': 69.679, 'text': "I'll shoot down a mail to everyone at once related to the zip file that I'm going to provide with respect to the feature engineering playlist.", 'start': 62.713, 'duration': 6.966}, {'end': 79.387, 'text': 'So today we are basically going to discuss the types of encoding techniques, and whenever I am talking about encoding techniques,', 'start': 70.32, 'duration': 9.067}], 'summary': 'Link open for 2 days, complete materials in 3 days, discussing encoding techniques', 'duration': 27.38, 'max_score': 52.007, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb4052007.jpg'}], 'start': 0.831, 'title': 'A free feature engineering 
materials giveaway', 'summary': 'Discusses krish offering free feature engineering materials on independence day, requiring viewers to fill a form with their email ids to receive the complete zip file with advanced techniques, open for two days, and sending the materials to the participants within three days.', 'chapters': [{'end': 69.679, 'start': 0.831, 'title': 'Free feature engineering materials giveaway', 'summary': 'Discusses krish offering free feature engineering materials on independence day, requiring viewers to fill a form with their email ids to receive the complete zip file with advanced techniques, which will be open for two days, followed by krish sending the materials to the participants within three days.', 'duration': 68.848, 'highlights': ['Krish is offering free feature engineering materials on Independence Day, requiring viewers to fill a form with their email ids to receive the complete zip file with advanced techniques, which will be open for two days.', 'Viewers just need to know Python, Pandas, and NumPy to follow the complete feature engineering part.', 'The link for the free materials will be open for two days, and Krish will send the materials to the participants within three days.']}], 'duration': 68.848, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40831.jpg', 'highlights': ['Krish is offering free feature engineering materials on Independence Day, requiring viewers to fill a form with their email ids to receive the complete zip file with advanced techniques, which will be open for two days.', 'The link for the free materials will be open for two days, and Krish will send the materials to the participants within three days.', 'Viewers just need to know Python, Pandas, and NumPy to follow the complete feature engineering part.']}, {'end': 260.824, 'segs': [{'end': 120.055, 'src': 'embed', 'start': 70.32, 'weight': 0, 'content': [{'end': 79.387, 'text': 'So today we are basically 
going to discuss the types of encoding techniques, and whenever I am talking about encoding techniques,', 'start': 70.32, 'duration': 9.067}, {'end': 83.471, 'text': 'that basically means we are discussing about the categorical variables.', 'start': 79.387, 'duration': 4.084}, {'end': 88.161, 'text': 'We are discussing about the categorical variables.', 'start': 85.919, 'duration': 2.242}, {'end': 93.785, 'text': 'So before moving ahead guys, let us understand what is a categorical variable.', 'start': 88.821, 'duration': 4.964}, {'end': 95.827, 'text': 'Now you have seen some data set.', 'start': 94.326, 'duration': 1.501}, {'end': 98.789, 'text': 'In some data set you have a feature name called as gender.', 'start': 95.887, 'duration': 2.902}, {'end': 103.033, 'text': 'So in gender you usually have male, female or female, male.', 'start': 99.37, 'duration': 3.663}, {'end': 114.33, 'text': 'right now this two variables are basically categories and in each and every row this will be getting repeated whenever I am considering a particular data set.', 'start': 103.72, 'duration': 10.61}, {'end': 120.055, 'text': 'so you should understand that if we try to provide this values directly to the machine learning algorithm,', 'start': 114.33, 'duration': 5.725}], 'summary': 'Discussing encoding techniques for categorical variables', 'duration': 49.735, 'max_score': 70.32, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb4070320.jpg'}, {'end': 214.854, 'src': 'embed', 'start': 160.706, 'weight': 1, 'content': [{'end': 166.71, 'text': 'Now whenever you are discussing about categorical variables, you need to understand 2 different types of categorical variables.', 'start': 160.706, 'duration': 6.004}, {'end': 169.611, 'text': 'Let me just rub this and let me make you understand.', 'start': 167.23, 'duration': 2.381}, {'end': 178.936, 'text': 'So the first type is basically called as nominal encoding.', 'start': 170.972, 
'duration': 7.964}, {'end': 188.476, 'text': 'So the encoding techniques are basically of two types and the second one is something with respect to ordinal encoding.', 'start': 182.452, 'duration': 6.024}, {'end': 196.7, 'text': 'And this encoding techniques guys is with respect to ordinal categorical variables and nominal categorical variables.', 'start': 189.656, 'duration': 7.044}, {'end': 203.824, 'text': "So I'll just try to make you understand what exactly is nominal categorical variables and ordinal categorical variables.", 'start': 196.76, 'duration': 7.064}, {'end': 211.629, 'text': "Understand, suppose I have some feature where I don't have to worry about the arrangement of the categories.", 'start': 204.505, 'duration': 7.124}, {'end': 214.854, 'text': 'nominal category.', 'start': 213.513, 'duration': 1.341}], 'summary': 'Understanding 2 types of categorical variables: nominal encoding and ordinal encoding.', 'duration': 54.148, 'max_score': 160.706, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40160706.jpg'}], 'start': 70.32, 'title': 'Categorical variables encoding', 'summary': 'Covers encoding techniques for categorical variables, emphasizing significance and potential impact on machine learning algorithms, and discusses nominal and ordinal encoding techniques for converting categories into integer or floating point values.', 'chapters': [{'end': 120.055, 'start': 70.32, 'title': 'Types of encoding techniques for categorical variables', 'summary': 'Covers the discussion on encoding techniques for categorical variables, emphasizing the significance of categorical variables and the potential impact on machine learning algorithms.', 'duration': 49.735, 'highlights': ['Categorical variables such as gender (e.g., male, female) are discussed in relation to encoding techniques, emphasizing the repetitive nature of these categories within datasets and their potential impact on machine learning 
algorithms.', 'The importance of understanding categorical variables in the context of machine learning algorithms is highlighted, particularly in terms of their potential direct input to algorithms.']}, {'end': 260.824, 'start': 120.055, 'title': 'Categorical variables encoding', 'summary': 'Discusses the encoding of categorical variables, covering nominal and ordinal encoding techniques, with a focus on converting categories into integer or floating point values for machine learning algorithms.', 'duration': 140.769, 'highlights': ['The encoding techniques include nominal and ordinal encoding, with a focus on converting categories into integer or floating point values for machine learning algorithms.', 'Ordinal encoding involves arranging categories based on ranks, while nominal encoding does not require the arrangement of categories.', 'The discussion includes examples of nominal categories such as gender and state columns, emphasizing the lack of concern for the ordering of category variables.']}], 'duration': 190.504, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb4070320.jpg', 'highlights': ['Covers encoding techniques for categorical variables, emphasizing significance and potential impact on machine learning algorithms.', 'Discusses nominal and ordinal encoding techniques for converting categories into integer or floating point values.', 'The importance of understanding categorical variables in the context of machine learning algorithms is highlighted.', 'Categorical variables such as gender (e.g., male, female) are discussed in relation to encoding techniques, emphasizing the repetitive nature of these categories within datasets and their potential impact on machine learning algorithms.', 'The encoding techniques include nominal and ordinal encoding, with a focus on converting categories into integer or floating point values for machine learning algorithms.', 'Ordinal encoding involves arranging 
categories based on ranks, while nominal encoding does not require the arrangement of categories.']}, {'end': 595.88, 'segs': [{'end': 311.137, 'src': 'embed', 'start': 261.363, 'weight': 0, 'content': [{'end': 262.705, 'text': 'Let me just give you some example.', 'start': 261.363, 'duration': 1.342}, {'end': 264.386, 'text': 'Suppose I have a data set.', 'start': 263.125, 'duration': 1.261}, {'end': 269.35, 'text': 'In that data set, I want to see, I want to predict the salary of a person.', 'start': 264.566, 'duration': 4.784}, {'end': 272.131, 'text': 'So suppose I have some education column in them.', 'start': 269.89, 'duration': 2.241}, {'end': 280.317, 'text': "And in this education column, I'm basically going to specify what all degrees that particular person is having.", 'start': 273.893, 'duration': 6.424}, {'end': 284.32, 'text': "So he may be having bachelor's degree or he may be having B.Com.", 'start': 280.397, 'duration': 3.923}, {'end': 290.291, 'text': 'he may be having PhD, okay, and he may be having masters now.', 'start': 284.929, 'duration': 5.362}, {'end': 298.533, 'text': 'whenever he has all this kind of features, I can rearrange this rearrange based on the type of degree that he holds.', 'start': 290.291, 'duration': 8.242}, {'end': 307.016, 'text': 'now always remember, guys a PhD person will be getting a better salary when compared to your B.Com or your BE person right, and similarly,', 'start': 298.533, 'duration': 8.483}, {'end': 309.817, 'text': 'your masters will also be better than BE right.', 'start': 307.016, 'duration': 2.801}, {'end': 311.137, 'text': 'so you can rearrange this.', 'start': 309.817, 'duration': 1.32}], 'summary': 'Predict salary based on education; phd earns more.', 'duration': 49.774, 'max_score': 261.363, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40261363.jpg'}, {'end': 447.557, 'src': 'embed', 'start': 418.695, 'weight': 5, 'content': [{'end': 
421.737, 'text': 'so in ordinal encoding, here I am going to discuss about label encoding.', 'start': 418.695, 'duration': 3.042}, {'end': 434.852, 'text': 'And the second technique that I am going to discuss about is target guided ordinal encoding.', 'start': 424.867, 'duration': 9.985}, {'end': 439.974, 'text': 'We will discuss about all these things very nicely.', 'start': 437.673, 'duration': 2.301}, {'end': 442.835, 'text': 'But just see this all the categories.', 'start': 440.354, 'duration': 2.481}, {'end': 447.557, 'text': 'and within this particular category we will discuss how does one hot encoding work.', 'start': 442.835, 'duration': 4.722}], 'summary': 'Discussion on label encoding, target guided ordinal encoding, and one hot encoding.', 'duration': 28.862, 'max_score': 418.695, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40418695.jpg'}, {'end': 510.792, 'src': 'embed', 'start': 489.045, 'weight': 3, 'content': [{'end': 500.649, 'text': 'Suppose in the state column, in each and every records, I have only three states like Germany, three countries I will say, Germany, France and Spain.', 'start': 489.045, 'duration': 11.604}, {'end': 501.989, 'text': 'Suppose this is my example.', 'start': 500.689, 'duration': 1.3}, {'end': 510.792, 'text': 'Now, if I need to apply one hot encoding, all I have to do is that, how many number of categories are there, that many number of columns get created.', 'start': 503.33, 'duration': 7.462}], 'summary': 'Using one hot encoding for three countries creates three columns', 'duration': 21.747, 'max_score': 489.045, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40489045.jpg'}, {'end': 592.996, 'src': 'embed', 'start': 561.957, 'weight': 4, 'content': [{'end': 566.643, 'text': 'Now understand over here we have converted this categories into dummy variables.', 'start': 561.957, 'duration': 4.686}, {'end': 
568.446, 'text': 'These are basically called as dummy variables.', 'start': 566.663, 'duration': 1.783}, {'end': 572.371, 'text': 'And we can do this with the help of pandas also with the help of sklearn also.', 'start': 569.046, 'duration': 3.325}, {'end': 577.558, 'text': 'In pandas you have something called as get underscore dummies pd dot get underscore dummies which we can basically apply.', 'start': 572.451, 'duration': 5.107}, {'end': 583.905, 'text': "but don't worry about this coding and the practice problem and all, guys, I will be sharing you the whole feature engineering.", 'start': 577.998, 'duration': 5.907}, {'end': 586.709, 'text': 'just make sure you fill the google form over there,', 'start': 583.905, 'duration': 2.804}, {'end': 592.996, 'text': 'provide your email address so that I can forward you the complete zip file of that feature engineering, what I have actually prepared for you all,', 'start': 586.709, 'duration': 6.287}], 'summary': 'The transcript discusses converting categories into dummy variables using pandas and sklearn, with a promise to share feature engineering material.', 'duration': 31.039, 'max_score': 561.957, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40561957.jpg'}], 'start': 261.363, 'title': 'Predicting salary based on education and encoding techniques in data science', 'summary': "Discusses rearranging a dataset based on education level to predict a person's salary, highlighting the impact of different degrees on salary levels. 
it also covers ordinal and nominal encoding in data science, including techniques like one hot encoding, label encoding, and target guided ordinal encoding, with an emphasis on the application of one hot encoding to nominal categorical variables.", 'chapters': [{'end': 311.137, 'start': 261.363, 'title': 'Predicting salary based on education', 'summary': "Discusses rearranging a dataset based on education level to predict a person's salary, highlighting the impact of different degrees on salary levels.", 'duration': 49.774, 'highlights': ['Rearranging dataset based on education level to predict salary', 'PhD holders receive higher salary compared to B.Com or BE holders', 'Masters degree holders have higher salary than BE holders']}, {'end': 595.88, 'start': 311.137, 'title': 'Encoding techniques in data science', 'summary': 'Discusses the concept of ordinal and nominal encoding in data science, including techniques like one hot encoding, label encoding, and target guided ordinal encoding, with an emphasis on the application of one hot encoding to nominal categorical variables.', 'duration': 284.743, 'highlights': ['The chapter discusses the concept of ordinal and nominal encoding in data science, including techniques like one hot encoding, label encoding, and target guided ordinal encoding. The chapter presents the concept of ordinal and nominal encoding along with techniques like one hot encoding, label encoding, and target guided ordinal encoding.', 'The application of one hot encoding to nominal categorical variables is explained with an example involving three categories: Germany, France, and Spain. The application of one hot encoding to nominal categorical variables is explained using an example involving three categories: Germany, France, and Spain.', 'The concept of dummy variable trap is introduced, which involves converting categories into dummy variables and is applicable using pandas or sklearn. 
The concept of dummy variable trap is introduced, involving the conversion of categories into dummy variables, which can be applied using pandas or sklearn.']}], 'duration': 334.517, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40261363.jpg', 'highlights': ['PhD holders receive higher salary compared to B.Com or BE holders', 'Masters degree holders have higher salary than BE holders', 'Rearranging dataset based on education level to predict salary', 'The application of one hot encoding to nominal categorical variables is explained with an example involving three categories: Germany, France, and Spain', 'The concept of dummy variable trap is introduced, involving the conversion of categories into dummy variables, which can be applied using pandas or sklearn', 'The chapter discusses the concept of ordinal and nominal encoding in data science, including techniques like one hot encoding, label encoding, and target guided ordinal encoding']}, {'end': 926.725, 'segs': [{'end': 640.008, 'src': 'heatmap', 'start': 606.099, 'weight': 0, 'content': [{'end': 609.101, 'text': 'So suppose I have 1 over here that basically means I am talking about Germany.', 'start': 606.099, 'duration': 3.002}, {'end': 612.562, 'text': 'If I have 1 over here that basically means I am talking about France.', 'start': 609.561, 'duration': 3.001}, {'end': 617.565, 'text': 'If I have both 0 over here that basically means we are talking about Spain.', 'start': 613.003, 'duration': 4.562}, {'end': 620.587, 'text': 'So what we can do is that we can skip this whole column.', 'start': 618.185, 'duration': 2.402}, {'end': 624.364, 'text': 'we can delete that whole column and this two records.', 'start': 621.363, 'duration': 3.001}, {'end': 627.745, 'text': 'when it is 00, it will basically specify this pain column.', 'start': 624.364, 'duration': 3.381}, {'end': 629.545, 'text': 'and that is how simple it is.', 'start': 627.745, 'duration': 1.8}, 
{'end': 632.086, 'text': 'and this is basically called as dummy variable trap.', 'start': 629.545, 'duration': 2.541}, {'end': 637.127, 'text': 'always remember, whenever you are doing one hot encoding, always make sure that you have to delete one of the column either.', 'start': 632.086, 'duration': 5.041}, {'end': 640.008, 'text': 'it may be the first column, last column, second column, anything.', 'start': 637.127, 'duration': 2.881}], 'summary': 'Explaining the process of dummy variable trap and one hot encoding, suggesting to delete one of the columns.', 'duration': 33.909, 'max_score': 606.099, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40606099.jpg'}, {'end': 681.555, 'src': 'embed', 'start': 655.509, 'weight': 3, 'content': [{'end': 663.913, 'text': 'Now what is the disadvantage of one hot encoding we need to understand? Always remember guys, suppose over here I have just three categories.', 'start': 655.509, 'duration': 8.404}, {'end': 666.815, 'text': 'But just let me consider one more column.', 'start': 664.574, 'duration': 2.241}, {'end': 671.517, 'text': 'Suppose I will be considering one column called as pin code.', 'start': 667.495, 'duration': 4.022}, {'end': 681.555, 'text': 'Pin code, right? 
Now in pin code, remember, I will be having various pin codes based on the city that I am staying.', 'start': 674.859, 'duration': 6.696}], 'summary': 'One hot encoding has disadvantages; pin codes vary by city.', 'duration': 26.046, 'max_score': 655.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40655509.jpg'}, {'end': 741.186, 'src': 'heatmap', 'start': 703.494, 'weight': 0.801, 'content': [{'end': 709.159, 'text': 'Now, just imagine if I try to convert this pin code into dummy variables at that time.', 'start': 703.494, 'duration': 5.665}, {'end': 715.145, 'text': 'if I have 100 unique categories, that basically means 99 columns will get created like this kind of columns.', 'start': 709.159, 'duration': 5.986}, {'end': 721.77, 'text': 'So again, remember, guys, if the number of columns are increasing, that basically means you are increasing the number of dimension,', 'start': 715.845, 'duration': 5.925}, {'end': 724.353, 'text': 'and that leads to something called as curse of dimensionality.', 'start': 721.77, 'duration': 2.583}, {'end': 734.201, 'text': 'So you should always worry about this and always remember if you have many number of categories, many number of category features over here.', 'start': 724.915, 'duration': 9.286}, {'end': 737.643, 'text': "So make sure you don't apply one hot encoding to that.", 'start': 734.922, 'duration': 2.721}, {'end': 741.186, 'text': 'Because you see that more number of columns is getting created.', 'start': 738.064, 'duration': 3.122}], 'summary': 'Applying one hot encoding to 100 unique categories creates 99 columns, leading to curse of dimensionality.', 'duration': 37.692, 'max_score': 703.494, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40703494.jpg'}, {'end': 864.666, 'src': 'embed', 'start': 835.908, 'weight': 1, 'content': [{'end': 841.154, 'text': 'this particular row is having some higher, you 
know, value with respect to the PhD that we have got.', 'start': 835.908, 'duration': 5.246}, {'end': 844.377, 'text': 'Okay So this is how label encoding is done.', 'start': 841.676, 'duration': 2.701}, {'end': 845.438, 'text': 'Again, very simple.', 'start': 844.557, 'duration': 0.881}, {'end': 848.219, 'text': 'We have something called as label encoder library in sklearn.', 'start': 845.558, 'duration': 2.661}, {'end': 851.34, 'text': 'And this is basically applied to the ordinal categories.', 'start': 848.839, 'duration': 2.501}, {'end': 853.961, 'text': 'Ordinal categories.', 'start': 852.881, 'duration': 1.08}, {'end': 859.604, 'text': 'Okay So this was pretty much simple.', 'start': 857.743, 'duration': 1.861}, {'end': 864.666, 'text': "Now we'll be moving to the next topic, which is called as one hot encoding with multiple categories.", 'start': 859.884, 'duration': 4.782}], 'summary': 'The row has higher value compared to the phd. label encoding and one hot encoding are explained.', 'duration': 28.758, 'max_score': 835.908, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40835908.jpg'}, {'end': 876.431, 'src': 'heatmap', 'start': 835.908, 'weight': 0.836, 'content': [{'end': 841.154, 'text': 'this particular row is having some higher, you know, value with respect to the PhD that we have got.', 'start': 835.908, 'duration': 5.246}, {'end': 844.377, 'text': 'Okay So this is how label encoding is done.', 'start': 841.676, 'duration': 2.701}, {'end': 845.438, 'text': 'Again, very simple.', 'start': 844.557, 'duration': 0.881}, {'end': 848.219, 'text': 'We have something called as label encoder library in sklearn.', 'start': 845.558, 'duration': 2.661}, {'end': 851.34, 'text': 'And this is basically applied to the ordinal categories.', 'start': 848.839, 'duration': 2.501}, {'end': 853.961, 'text': 'Ordinal categories.', 'start': 852.881, 'duration': 1.08}, {'end': 859.604, 'text': 'Okay So this was 
pretty much simple.', 'start': 857.743, 'duration': 1.861}, {'end': 864.666, 'text': "Now we'll be moving to the next topic, which is called as one hot encoding with multiple categories.", 'start': 859.884, 'duration': 4.782}, {'end': 876.431, 'text': 'One hot encoding with multiple categories.', 'start': 867.647, 'duration': 8.784}], 'summary': 'Row has high value in phd. label encoding and one hot encoding explained.', 'duration': 40.523, 'max_score': 835.908, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40835908.jpg'}, {'end': 915.14, 'src': 'embed', 'start': 889.784, 'weight': 2, 'content': [{'end': 897.111, 'text': 'and this is getting repeated in most of the columns is for 50 categories will get repeated in most of the features right now.', 'start': 889.784, 'duration': 7.327}, {'end': 900.433, 'text': 'one technique that we basically use and this is with respect to nominal guys,', 'start': 897.111, 'duration': 3.322}, {'end': 905.898, 'text': "nominal categories i'm talking about now one hot encoding with multiple categories how to solve it?", 'start': 900.433, 'duration': 5.465}, {'end': 915.14, 'text': "now? what i can do is that first of all, i'll try to find out which is the, and This technique that I am discussing over here is from researchers, guys.", 'start': 905.898, 'duration': 9.242}], 'summary': 'Most columns have repeated 50 categories, discussing one-hot encoding for nominal categories.', 'duration': 25.356, 'max_score': 889.784, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40889784.jpg'}], 'start': 595.88, 'title': 'Encoding techniques and demonstration', 'summary': 'Covers one hot encoding, creation of dummy variables, and disadvantages, emphasizing the need to delete one column to avoid multicollinearity. 
it also demonstrates the curse of dimensionality, introduces label encoding for ordinal categories, and discusses one hot encoding for multiple categories, citing a technique used in kdd orange competition.', 'chapters': [{'end': 703.014, 'start': 595.88, 'title': 'One hot encoding and dummy variables', 'summary': 'Explains the concept of one hot encoding, the creation of dummy variables, and the disadvantages of one hot encoding, highlighting the need to delete one of the columns to avoid multicollinearity and the potential increase in the number of columns with one hot encoding.', 'duration': 107.134, 'highlights': ['The need to delete one of the columns to avoid multicollinearity when performing one hot encoding, as the total number of columns created will be the total number of categories minus one, such as five categories creating four columns.', 'The explanation of one hot encoding and the creation of dummy variables to represent different categories, such as using 1 to represent Germany, 1 to represent France, and 0 to represent Spain.', 'The disadvantages of one hot encoding, particularly in scenarios where there are numerous categories or frequent repetition of categories, such as pin codes in bigger cities.']}, {'end': 926.725, 'start': 703.494, 'title': 'Encoding techniques demonstration', 'summary': 'Demonstrates the curse of dimensionality due to one hot encoding, introduces label encoding for ordinal categories, and discusses one hot encoding for multiple categories, citing a technique used in kdd orange competition.', 'duration': 223.231, 'highlights': ['One hot encoding leads to curse of dimensionality with 100 unique categories creating 99 columns, emphasizing the need to avoid it for many category features. 
One hot encoding results in 99 columns for 100 unique categories, contributing to the curse of dimensionality.', "Label encoding assigns ranks to ordinal categories based on common sense, providing labels in ascending or descending order to indicate their importance, using scikit-learn's LabelEncoder.", 'Introduction to one hot encoding for multiple categories, with a technique used in the KDD Orange competition, recommended for nominal categories to solve the issue of multiple categories being repeated in most features.']}], 'duration': 330.845, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40595880.jpg', 'highlights': ['The need to delete one of the columns to avoid multicollinearity when performing one hot encoding, as the total number of columns created will be the total number of categories minus one, such as five categories creating four columns.', "Label encoding assigns ranks to ordinal categories based on common sense, providing labels in ascending or descending order to indicate their importance, using scikit-learn's LabelEncoder.", 'Introduction to one hot encoding for multiple categories, with a technique used in the KDD Orange competition, recommended for nominal categories to solve the issue of multiple categories being repeated in most features.', 'The disadvantages of one hot encoding, particularly in scenarios where there are numerous categories or frequent repetition of categories, such as pin codes in bigger cities.']}, {'end': 1446.588, 'segs': [{'end': 
972.029, 'src': 'embed', 'start': 946.89, 'weight': 0, 'content': [{'end': 952.093, 'text': 'So what they did is that they applied one hot encoding only to the top 10 categories.', 'start': 946.89, 'duration': 5.203}, {'end': 960.262, 'text': 'So by that, suppose I have 50 categories, and from this suppose I consider that 10 categories have been repeated more number of times.', 'start': 952.893, 'duration': 7.369}, {'end': 963.204, 'text': "So I'll just take this 10 and create my 9 columns.", 'start': 960.302, 'duration': 2.902}, {'end': 963.724, 'text': "That's it.", 'start': 963.364, 'duration': 0.36}, {'end': 966.446, 'text': "I don't have to create 49 columns guys.", 'start': 964.544, 'duration': 1.902}, {'end': 972.029, 'text': "I'll just create 9 columns with respect to one hot encoding and that will be more than sufficient.", 'start': 966.626, 'duration': 5.403}], 'summary': 'Applied one-hot encoding only to the top 10 of 50 categories, reducing the encoded columns from 49 to 9.', 'duration': 25.139, 'max_score': 946.89, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40946890.jpg'}, {'end': 1146.405, 'src': 'heatmap', 'start': 1012.281, 'weight': 2, 'content': [{'end': 1013.222, 'text': 'let me just check it out.', 'start': 1012.281, 'duration': 0.941}, {'end': 1030.713, 'text': 'okay, so one more is something called as target guided ordinal categories.', 'start': 1013.222, 'duration': 17.491}, {'end': 1035.09, 'text': 'now in this, guys, remember how we have to perform this Here.', 'start': 1030.713, 'duration': 4.377}, {'end': 1039.032, 'text': 'I will just not take one feature, that is, having number of categories.', 'start': 1035.09, 'duration': 3.942}, {'end': 1040.733, 'text': 'I will also take the output variable.', 'start': 1039.093, 'duration': 1.64}, {'end': 1043.534, 'text': 'I will also take the output variable.', 'start': 1041.934, 'duration': 1.6}, {'end': 1048.116, 'text': 'And suppose I have over here categories like 
A, B, C, D.', 'start': 1043.574, 'duration': 4.542}, {'end': 1048.916, 'text': 'It may be ordinal.', 'start': 1048.116, 'duration': 0.8}, {'end': 1049.856, 'text': 'It may be nominal.', 'start': 1048.976, 'duration': 0.88}, {'end': 1052.757, 'text': 'We do not care that much in this case.', 'start': 1050.917, 'duration': 1.84}, {'end': 1056.699, 'text': 'Now suppose I take all these particular categories.', 'start': 1053.158, 'duration': 3.541}, {'end': 1060.5, 'text': 'And I am considering a classification problem, guys.', 'start': 1056.719, 'duration': 3.781}, {'end': 1063.373, 'text': 'Suppose my output for A is 1.', 'start': 1060.78, 'duration': 2.593}, {'end': 1073.056, 'text': 'for D is also 1, C is 0, D is 1 and again I have A is equal to 1, D is equal to 0 and like this it is repeated many number of times.', 'start': 1063.373, 'duration': 9.683}, {'end': 1080.798, 'text': 'What we do with the help of this is that we will be considering this two column and, based on this two column, with each and every categories,', 'start': 1073.496, 'duration': 7.302}, {'end': 1082.458, 'text': 'I will try to find out the mean.', 'start': 1080.798, 'duration': 1.66}, {'end': 1091.534, 'text': 'I will try to find out the mean of this, mean of this particular values.', 'start': 1086.832, 'duration': 4.702}, {'end': 1098.736, 'text': 'And when I am calculating the mean, that basically means wherever a value is 1 and wherever a value is 0,', 'start': 1092.454, 'duration': 6.282}, {'end': 1101.017, 'text': 'I am just going to consider with respect to those features.', 'start': 1098.736, 'duration': 2.281}, {'end': 1106.439, 'text': 'I am going to consider with respect to those features and I am going to compute the mean of these values.', 'start': 1101.998, 'duration': 4.441}, {'end': 1117.108, 'text': 'Now when you see mean, when you see mean, in short you are basically trying to find out the number of values for which a was basically 1.', 'start': 1106.92, 'duration': 
10.188}, {'end': 1127.442, 'text': 'for which A was basically 1, and suppose from this you found out that your A number is basically given.', 'start': 1117.108, 'duration': 10.334}, {'end': 1130.446, 'text': 'or suppose over here you have got 0.73.', 'start': 1127.442, 'duration': 3.004}, {'end': 1133.294, 'text': 'for B, your mean is 0.6.', 'start': 1130.446, 'duration': 2.848}, {'end': 1134.695, 'text': 'okay for C.', 'start': 1133.294, 'duration': 1.401}, {'end': 1137.298, 'text': 'for C is basically, you have got 0.4.', 'start': 1134.695, 'duration': 2.603}, {'end': 1140, 'text': 'okay, based on this mean values that you have calculated.', 'start': 1137.298, 'duration': 2.702}, {'end': 1146.405, 'text': 'now, once you have this mean now, what you have to do is that you have to assign this,', 'start': 1140, 'duration': 6.405}], 'summary': 'Target guided ordinal encoding computes the mean of the output variable for each category and uses those means to rank the categories.', 'duration': 36.635, 'max_score': 1012.281, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb401012281.jpg'}, {'end': 1204.519, 'src': 'heatmap', 'start': 1168.885, 'weight': 0.761, 'content': [{'end': 1173.729, 'text': 'so what you will do is that you will rearrange this now, wherever your 0.73 is highest, suppose.', 'start': 1168.885, 'duration': 4.844}, {'end': 1178.312, 'text': 'so a will be getting the highest rank, then b will be getting the second highest rank,', 'start': 1173.729, 'duration': 4.583}, {'end': 1184.097, 'text': 'c will be getting the less highest rank and based on that particular rank, you will be assigning labels.', 'start': 1178.312, 'duration': 5.785}, {'end': 1185.578, 'text': 'okay, suppose i have four categories.', 'start': 1184.097, 'duration': 1.481}, {'end': 1190.981, 'text': 'so a will get assigned 4, 3, 2, 1 and again it will assign at 4, 3, 2,', 'start': 1185.578, 'duration': 5.403}, {'end': 1196.071, 'text': '1 and the computation 
will be taken up and this value will be given to the machine learning algorithms.', 'start': 1190.981, 'duration': 5.09}, {'end': 1199.995, 'text': 'And always remember guys because of the ordinal I am giving this whole number.', 'start': 1196.612, 'duration': 3.383}, {'end': 1204.519, 'text': 'If suppose this was nominal, it would have been a different scenario.', 'start': 1201.016, 'duration': 3.503}], 'summary': 'Rank the categories by their mean output value and assign whole-number labels by rank before passing them to the machine learning algorithm.', 'duration': 35.634, 'max_score': 1168.885, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb401168885.jpg'}, {'end': 1326.563, 'src': 'heatmap', 'start': 1302.185, 'weight': 0.842, 'content': [{'end': 1309.029, 'text': 'And suppose I get like 0.73 over here, 0.6, 0.5, 0.4, okay.', 'start': 1302.185, 'duration': 6.844}, {'end': 1311.41, 'text': 'So this particular value will be replaced here.', 'start': 1309.369, 'duration': 2.041}, {'end': 1316.618, 'text': 'It will not be converted into categories like how we did it in ordinal.', 'start': 1313.457, 'duration': 3.161}, {'end': 1320.62, 'text': 'It will be converted into this nominal values.', 'start': 1317.139, 'duration': 3.481}, {'end': 1323.921, 'text': 'Now you may be thinking where this will be basically applicable.', 'start': 1320.94, 'duration': 2.981}, {'end': 1326.563, 'text': 'Guys I gave you an example of the pin code right.', 'start': 1324.442, 'duration': 2.121}], 'summary': 'In mean encoding, each category is replaced directly by its mean output value rather than by a rank; applicable in scenarios like pin codes.', 'duration': 24.378, 'max_score': 1302.185, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb401302185.jpg'}, {'end': 1426.273, 'src': 'embed', 'start': 1400.209, 'weight': 3, 'content': [{'end': 1407.895, 'text': 'now you have understood, for nominal and for ordinal, which encoding you can basically use, but which will be the best one.', 'start': 
1400.209, 'duration': 7.686}, {'end': 1411.117, 'text': 'suppose you have this kind of scenario of PIN code right.', 'start': 1407.895, 'duration': 3.222}, {'end': 1413.079, 'text': 'you should basically go with mean encoding.', 'start': 1411.117, 'duration': 1.962}, {'end': 1417.892, 'text': 'Now, suppose you have ordinal variables and there also you have many categories.', 'start': 1414.469, 'duration': 3.423}, {'end': 1421.956, 'text': 'So then that time you can basically go with target encoding guys.', 'start': 1418.452, 'duration': 3.504}, {'end': 1426.273, 'text': "okay, so I'll be discussing more about this in the next session.", 'start': 1423.191, 'duration': 3.082}], 'summary': 'For nominal features with many categories, such as PIN codes: mean encoding. For ordinal features with many categories: target guided encoding.', 'duration': 26.064, 'max_score': 1400.209, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb401400209.jpg'}], 'start': 927.105, 'title': 'Handling categorical variable encoding', 'summary': 'Covers techniques for handling multiple categories in one-hot encoding: encoding only the top 10 most repeated categories significantly reduces the number of columns, an approach that was applied successfully in a competition and led to a prize. 
It also discusses various techniques for encoding categorical variables, such as one-hot encoding for nominal categories, target-guided ordinal categories, and mean encoding based on mean values.', 'chapters': [{'end': 989.073, 'start': 927.105, 'title': 'Handling multiple categories in one-hot encoding', 'summary': 'Explains how to handle multiple categories in one-hot encoding by identifying and encoding only the top 10 repeating categories, resulting in a significant reduction of columns and successful application in a competition, leading to a prize.', 'duration': 61.968, 'highlights': ['Identifying and encoding only the top 10 repeating categories reduced the number of columns from 49 to 9, simplifying the encoding process. This approach was successfully applied in a competition, leading to a prize.', 'The top 10 categories of the feature were observed to be repeated most often, leading to the decision to apply one-hot encoding only to these top 10 categories, which proved to be effective in ensemble techniques and competition performance.']}, {'end': 1446.588, 'start': 989.073, 'title': 'Categorical variable encoding', 'summary': 'Discusses techniques for encoding categorical variables, including one hot encoding for nominal categories, target guided ordinal categories, and mean encoding, highlighting the process of assigning ranks based on mean values.', 'duration': 457.515, 'highlights': ['The technique of target guided ordinal categories involves assigning ranks to categories based on mean values derived from the output variable, allowing for the encoding of ordinal categories.', 'Mean encoding is discussed as a technique for encoding nominal categories by replacing them with their respective mean values based on the output variable, making it suitable for scenarios such as encoding PIN codes. 
', 'One hot encoding is mentioned as a technique for encoding nominal categories by converting them into binary dummy variables, and it is highlighted as less suitable for scenarios with a large number of categories such as PIN codes.']}], 'duration': 519.483, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/OTPz5plKb40/pics/OTPz5plKb40927105.jpg', 'highlights': ['Identifying and encoding only the top 10 repeating categories resulted in a significant reduction of columns from 49 to 9, simplifying the encoding process.', 'The top 10 categories of the feature were observed to be repeated most often, leading to the decision to apply one-hot encoding only to these top 10 categories, which proved to be effective in ensemble techniques and competition performance.', 'The technique of target guided ordinal categories involves assigning ranks to categories based on mean values derived from the output variable, allowing for the encoding of ordinal categories.', 'Mean encoding is discussed as a technique for encoding nominal categories by replacing them with their respective mean values based on the output variable, making it suitable for scenarios such as encoding PIN codes.', 'One hot encoding is mentioned as a technique for encoding nominal categories by converting them into binary dummy variables, and it is highlighted as less suitable for scenarios with a large number of categories such as PIN codes.']}], 'highlights': ['Krish offers free feature engineering materials on Independence Day, requiring viewers to fill a form with their email ids to receive the complete zip file with advanced techniques, open for two days, and sending the materials to the participants within three days.', 'Covers encoding techniques for categorical 
variables, emphasizing significance and potential impact on machine learning algorithms.', 'PhD holders receive higher salary compared to B.Com or BE holders', 'The need to delete one of the columns to avoid multicollinearity when performing one hot encoding, as the total number of columns created will be the total number of categories minus one, such as five categories creating four columns.', 'Identifying and encoding only the top 10 repeating categories resulted in a significant reduction of columns from 49 to 9, simplifying the encoding process.']}