title
Feature Selection Techniques Easily Explained | Machine Learning
description
Feature selection is the process of automatically or manually selecting the features that contribute most to the prediction variable or output you are interested in. Irrelevant features in your data can decrease the accuracy of a model, because the model ends up learning from features that carry no useful signal.
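A minimal sketch of the univariate selection idea covered in the video, assuming scikit-learn's SelectKBest with the chi2 score function. The synthetic data, the column names (ram, battery_power, noise_a, noise_b) and the helper rank_features are hypothetical and are not taken from the video's notebook:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2


def rank_features(X, y, k=2):
    """Return the k features with the highest chi-square score against y.

    Hypothetical helper for illustration; chi2 requires non-negative features.
    """
    selector = SelectKBest(score_func=chi2, k=k)
    selector.fit(X, y)
    scores = pd.DataFrame({"Specs": X.columns, "Score": selector.scores_})
    return scores.nlargest(k, "Score")


# Synthetic, non-negative data: two columns depend on the class label,
# two columns are pure noise (all values are made up for this sketch).
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 4, size=n)  # four hypothetical price ranges
X = pd.DataFrame({
    "ram": 500 + 800 * y + rng.integers(0, 200, size=n),
    "battery_power": 1000 + 100 * y + rng.integers(0, 500, size=n),
    "noise_a": rng.integers(0, 100, size=n),
    "noise_b": rng.integers(0, 100, size=n),
})

top = rank_features(X, y, k=2)
print(top)
```

In this sketch the columns that move with the class label should score far above the noise columns, which is the behaviour the description refers to when it says irrelevant features can be dropped.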
References: https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e
#FeatureSelectionTechnique
GitHub URL: https://github.com/krishnaik06/Feature-Selection-techniques
You can buy my book on finance with ML and DL at the URL below:
https://www.amazon.in/Hands-Python-Finance-implementing-strategies/dp/1789346371/ref=sr_1_1?keywords=krish+naik&qid=1560267294&s=gateway&sr=8-1
detail
{'title': 'Feature Selection Techniques Easily Explained | Machine Learning', 'heatmap': [{'end': 402.165, 'start': 381.676, 'weight': 0.724}, {'end': 982.297, 'start': 948.103, 'weight': 1}], 'summary': 'Explains the impact of feature selection on model accuracy, presenting practical techniques including univariate selection, feature importance, and correlation matrix with heat map, achieving 92% accuracy with cross validation and hyperparameters for phone model feature selection.', 'chapters': [{'end': 72.807, 'segs': [{'end': 72.807, 'src': 'embed', 'start': 16.702, 'weight': 0, 'content': [{'end': 22.684, 'text': 'after the number of features increases the threshold value, what happens is that it decreases the accuracy of the model.', 'start': 16.702, 'duration': 5.982}, {'end': 28.386, 'text': 'whenever we are giving those data to our training our model, the model basically gets confused because it is learning too much of data.', 'start': 22.684, 'duration': 5.702}, {'end': 37.297, 'text': 'so in order to actually resolve that particular situation, what we do is that we do not select all the features from a particular data set.', 'start': 28.906, 'duration': 8.391}, {'end': 40.721, 'text': 'instead, we apply various techniques of feature selection.', 'start': 37.297, 'duration': 3.424}, {'end': 45.628, 'text': "so some of the techniques that i'm going to show you in practical today are basically like univariate selection.", 'start': 40.721, 'duration': 4.907}, {'end': 53.435, 'text': 'okay, the other technique is something called as feature importance, and the third technique is basically called as correlation matrix with heat map.', 'start': 46.69, 'duration': 6.745}, {'end': 56.677, 'text': 'these all techniques are very, very efficient, you know.', 'start': 53.435, 'duration': 3.242}, {'end': 61.02, 'text': "so I'm actually going to show you all these particular techniques with the help of code.", 'start': 56.677, 'duration': 4.343}, {'end': 66.222, 'text': 
"so make sure you watch this video till the end And this particular notebook file I'll be sharing it.", 'start': 61.02, 'duration': 5.202}, {'end': 72.807, 'text': "I'll be uploading this in my GitHub link and I'll be sharing the URL in this particular description of this particular video.", 'start': 66.902, 'duration': 5.905}], 'summary': 'Increasing features decreases model accuracy; use feature selection techniques: univariate selection, feature importance, correlation matrix with heat map.', 'duration': 56.105, 'max_score': 16.702, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ16702.jpg'}], 'start': 1.476, 'title': 'Feature selection techniques', 'summary': 'Discusses the impact of feature selection on model accuracy due to curse of dimensionality and presents practical techniques including univariate selection, feature importance, and correlation matrix with heat map for efficient feature selection.', 'chapters': [{'end': 72.807, 'start': 1.476, 'title': 'Feature selection techniques', 'summary': 'Discusses the impact of feature selection on model accuracy due to curse of dimensionality and presents practical techniques including univariate selection, feature importance, and correlation matrix with heat map for efficient feature selection.', 'duration': 71.331, 'highlights': ['The curse of dimensionality leads to decreased model accuracy when the number of features in a dataset surpasses a threshold value, causing confusion for the model due to excessive data.', 'Feature selection techniques like univariate selection, feature importance, and correlation matrix with heat map are efficient methods to address the curse of dimensionality and enhance model performance.', 'The presenter will demonstrate the practical application of feature selection techniques through code and will share the notebook file via GitHub for further reference.']}], 'duration': 71.331, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ1476.jpg', 'highlights': ['Feature selection techniques like univariate selection, feature importance, and correlation matrix with heat map are efficient methods to address the curse of dimensionality and enhance model performance.', 'The curse of dimensionality leads to decreased model accuracy when the number of features in a dataset surpasses a threshold value, causing confusion for the model due to excessive data.', 'The presenter will demonstrate the practical application of feature selection techniques through code and will share the notebook file via GitHub for further reference.']}, {'end': 237.775, 'segs': [{'end': 128.667, 'src': 'embed', 'start': 96.795, 'weight': 0, 'content': [{'end': 99.436, 'text': 'First component is that suppose I have all the set of features.', 'start': 96.795, 'duration': 2.641}, {'end': 103.118, 'text': "From this particular set of features, I'll be selecting the best subset.", 'start': 99.836, 'duration': 3.282}, {'end': 108.52, 'text': "How I'll be selecting the best subset? 
There are various techniques that we basically apply.", 'start': 104.218, 'duration': 4.302}, {'end': 112.662, 'text': 'Some of the techniques I would like to call it as something called as ANOVA test.', 'start': 108.8, 'duration': 3.862}, {'end': 115.383, 'text': 'I hope everybody has heard of ANOVA test.', 'start': 113.422, 'duration': 1.961}, {'end': 118.124, 'text': 'it is a statistical method or the other method.', 'start': 115.383, 'duration': 2.741}, {'end': 122.585, 'text': 'one more method that we basically have is something called as chi-square test right.', 'start': 118.124, 'duration': 4.461}, {'end': 128.667, 'text': 'and one more method I would specifically call it something called as coefficient.', 'start': 122.585, 'duration': 6.082}], 'summary': 'Select best feature subset using ANOVA, chi-square, and correlation coefficient methods.', 'duration': 31.872, 'max_score': 96.795, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ96795.jpg'}, {'end': 169.885, 'src': 'embed', 'start': 139.218, 'weight': 2, 'content': [{'end': 142.98, 'text': 'these three techniques help us to select some important features.', 'start': 139.218, 'duration': 3.762}, {'end': 144.461, 'text': 'you know some important features.', 'start': 142.98, 'duration': 1.481}, {'end': 150.385, 'text': 'now, when I say selecting some important features, that basically means that these features will be very,', 'start': 144.461, 'duration': 5.924}, {'end': 153.687, 'text': 'very much correlated with our target output.', 'start': 150.385, 'duration': 3.302}, {'end': 155.508, 'text': 'you know, with our target output.', 'start': 153.687, 'duration': 1.821}, {'end': 163.555, 'text': 'now, when I say it is very, very much correlated, that basically means that let me just take an example of the correlation coefficient.', 'start': 155.508, 'duration': 8.047}, {'end': 163.956, 'text': 'and that time.', 'start': 163.555, 'duration': 0.401}, 
{'end': 169.885, 'text': 'what happens is that suppose I have my independent feature than X and my dependent feature on my output feature is Y.', 'start': 163.956, 'duration': 5.929}], 'summary': 'Techniques for selecting highly correlated features to the target output are crucial.', 'duration': 30.667, 'max_score': 139.218, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ139218.jpg'}, {'end': 241.299, 'src': 'embed', 'start': 215.753, 'weight': 1, 'content': [{'end': 221.459, 'text': 'And this correlation minus one to plus one is basically for Pearson correlation, Pearson correlation coefficient.', 'start': 215.753, 'duration': 5.706}, {'end': 229.867, 'text': "So what I was trying to say is that Here we'll be selecting the best subset by using some statistical tools like ANOVA,", 'start': 221.859, 'duration': 8.008}, {'end': 232.17, 'text': 'Chi-square or correlation matrix.', 'start': 229.867, 'duration': 2.303}, {'end': 237.775, 'text': "And over here, I'll also be showing you a correlation matrix example with the help of a data set over here.", 'start': 232.55, 'duration': 5.225}, {'end': 239.017, 'text': 'Just wait for some time.', 'start': 238.096, 'duration': 0.921}, {'end': 241.299, 'text': 'Now, this was the first technique of filter method.', 'start': 239.417, 'duration': 1.882}], 'summary': 'Select best subset using anova, chi-square, and correlation matrix for filter method.', 'duration': 25.546, 'max_score': 215.753, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ215753.jpg'}], 'start': 73.307, 'title': 'Feature selection techniques', 'summary': 'Discusses the basics of feature selection, focusing on filter method techniques such as anova, chi-square test, and correlation coefficient, which help in selecting important features highly correlated with the target output.', 'chapters': [{'end': 237.775, 'start': 73.307, 'title': 'Feature 
selection techniques', 'summary': 'Discusses the basics of feature selection, focusing on filter method techniques such as anova, chi-square test, and correlation coefficient, which help in selecting important features highly correlated with the target output.', 'duration': 164.468, 'highlights': ['The filter method for feature selection involves three sub-components: ANOVA test, chi-square test, and correlation coefficient, which are statistical techniques used to select important features correlated with the target output.', 'Correlation coefficient is a key statistical tool for feature selection, with values ranging from -1 to +1, indicating the degree of correlation between independent and dependent features.', 'The chapter emphasizes the importance of selecting features highly correlated with the target output, using statistical tools like ANOVA, chi-square, and correlation matrix.']}], 'duration': 164.468, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ73307.jpg', 'highlights': ['The filter method for feature selection involves ANOVA test, chi-square test, and correlation coefficient.', 'Correlation coefficient is a key statistical tool for feature selection, with values ranging from -1 to +1.', 'The chapter emphasizes the importance of selecting features highly correlated with the target output.']}, {'end': 436.996, 'segs': [{'end': 299.918, 'src': 'embed', 'start': 258.012, 'weight': 0, 'content': [{'end': 263.174, 'text': "Here you don't have to apply any statistical stuff, only a simple mechanism you have to apply.", 'start': 258.012, 'duration': 5.162}, {'end': 265.315, 'text': 'There are three basic mechanism in this.', 'start': 263.514, 'duration': 1.801}, {'end': 268.736, 'text': 'so one is basically like forward selection.', 'start': 266.275, 'duration': 2.461}, {'end': 277.5, 'text': 'again. 
this method is basically used to select the most important features from the particular data set with respect to the target output.', 'start': 268.736, 'duration': 8.764}, {'end': 280.562, 'text': 'now, first technique is something called as forward selection.', 'start': 277.5, 'duration': 3.062}, {'end': 283.343, 'text': 'now forward selection works in a simple way.', 'start': 280.562, 'duration': 2.781}, {'end': 286.965, 'text': "okay, we'll just read this particular sentence so you'll be able to understand.", 'start': 283.343, 'duration': 3.622}, {'end': 292.737, 'text': 'a forward selection is an iterative method in which we start with having no feature in the model.', 'start': 286.965, 'duration': 5.772}, {'end': 296.557, 'text': 'in each iteration we keep adding the feature which best improves our model.', 'start': 292.737, 'duration': 3.82}, {'end': 299.918, 'text': 'now let me just give you an example how forward selection works.', 'start': 296.557, 'duration': 3.361}], 'summary': 'Forward selection is an iterative method to select important features, improving the model with each iteration.', 'duration': 41.906, 'max_score': 258.012, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ258012.jpg'}, {'end': 390.14, 'src': 'embed', 'start': 361.372, 'weight': 3, 'content': [{'end': 365.594, 'text': 'now backward elimination works in a slightly different way.', 'start': 361.372, 'duration': 4.222}, {'end': 368.676, 'text': 'suppose these are my independent features and this is my output feature.', 'start': 365.594, 'duration': 3.082}, {'end': 373.958, 'text': "what I'll do is that I'll take all these independent features, you know, and train them on our model.", 'start': 368.676, 'duration': 5.282}, {'end': 378.661, 'text': "okay, before training it in our model, I'll just apply a statistical test.", 'start': 373.958, 'duration': 4.703}, {'end': 381.676, 'text': 'okay, statistical test,', 
'start': 379.795, 'duration': 1.881}, {'end': 390.14, 'text': 'and this particular test will basically say which feature is having the lowest impact on the target variable.', 'start': 381.676, 'duration': 8.464}], 'summary': 'Backward elimination identifies and removes features with the lowest impact on the target variable.', 'duration': 28.768, 'max_score': 361.372, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ361372.jpg'}, {'end': 405.506, 'src': 'heatmap', 'start': 381.676, 'weight': 0.724, 'content': [{'end': 390.14, 'text': 'and this particular test will basically say which feature is having the lowest impact on the target variable.', 'start': 381.676, 'duration': 8.464}, {'end': 395.742, 'text': 'that basically means that the correlation between the independent and the target features is nothing, okay.', 'start': 390.14, 'duration': 5.602}, {'end': 402.165, 'text': 'now, if we are having that kind of feature, we can definitely skip that feature because it has no impact on the output.', 'start': 395.742, 'duration': 6.423}, {'end': 405.506, 'text': 'now, in this particular case, we basically have something called a chi-square test.', 'start': 402.165, 'duration': 3.341}], 'summary': 'Identifying the feature with the lowest impact on the target variable using a chi-square test.', 'duration': 23.83, 'max_score': 381.676, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ381676.jpg'}, {'end': 436.996, 'src': 'embed', 'start': 402.165, 'weight': 4, 'content': [{'end': 405.506, 'text': 'now, in this particular case, we basically have something called a chi-square test.', 'start': 402.165, 'duration': 3.341}, {'end': 405.987, 'text': 'again here.', 'start': 405.506, 'duration': 0.481}, {'end': 414.215, 'text': 'Now, in chi-square stats, we will be calculating the p-value of this particular feature, and if the p-value is less than 
0.05,', 'start': 406.747, 'duration': 7.468}, {'end': 418.059, 'text': 'we basically consider that this particular feature is useful.', 'start': 414.215, 'duration': 3.844}, {'end': 425.988, 'text': 'If the p-value is greater than 0.05, we consider that this feature is not useful and this does not have any impact on the output feature.', 'start': 418.46, 'duration': 7.528}, {'end': 429.751, 'text': 'So this is how the backward elimination is basically implemented.', 'start': 426.468, 'duration': 3.283}, {'end': 433.153, 'text': "And don't worry about this chi-square and p-value.", 'start': 430.131, 'duration': 3.022}, {'end': 436.996, 'text': 'This is basically a statistical method wherein we will be getting a p-value,', 'start': 433.213, 'duration': 3.783}], 'summary': 'Chi-square test calculates p-value to determine feature usefulness, with threshold of 0.05.', 'duration': 34.831, 'max_score': 402.165, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ402165.jpg'}], 'start': 238.096, 'title': 'Feature selection methods', 'summary': 'Covers wrapper method forward selection, forward and backward feature selection techniques, and backward elimination with the chi-square test, providing insights into iterative feature selection, model improvement, and statistical analysis with a p-value threshold of 0.05.', 'chapters': [{'end': 299.918, 'start': 238.096, 'title': 'Wrapper method: forward selection', 'summary': 'Explains the wrapper method, specifically the forward selection technique, which is an iterative method for selecting the best features to improve the model without the need for statistical analysis.', 'duration': 61.822, 'highlights': ["Forward selection is an iterative method where features are added to the model in iterations to improve the model's performance, simplifying the feature selection process without the need for statistical analysis.", 'The wrapper method, particularly the forward selection 
technique, is used to select the most important features from the dataset with respect to the target output, providing a simplified mechanism for feature selection.']}, {'end': 381.676, 'start': 299.918, 'title': 'Forward and backward feature selection', 'summary': 'Discusses forward selection and backward elimination techniques for feature selection in machine learning, where forward selection involves iteratively adding features to the model and evaluating accuracy, and backward elimination involves removing features based on statistical tests.', 'duration': 81.758, 'highlights': ['Forward selection involves iteratively adding features to the model and evaluating accuracy. The model is trained with one feature initially, and then additional features are iteratively added to check for improved accuracy.', 'Backward elimination entails removing features based on statistical tests. All the independent features are initially considered, and then a statistical test is applied before training the model to determine feature removal.']}, {'end': 436.996, 'start': 381.676, 'title': 'Backward elimination in chi-square test', 'summary': 'Discusses the implementation of backward elimination using the chi-square test to identify the features with the lowest impact on the target variable, determining their usefulness based on a p-value threshold of 0.05.', 'duration': 55.32, 'highlights': ['The chi-square test is used to calculate the p-value of a feature, and if the p-value is less than 0.05, it is considered useful for impacting the output feature.', 'Features with a p-value greater than 0.05 are considered not useful and can be skipped as they have no impact on the output.', 'The chapter emphasizes the use of chi-square and p-value as a statistical method for feature selection in backward elimination.']}], 'duration': 198.9, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ238096.jpg', 'highlights': ['Forward selection 
simplifies feature selection without statistical analysis.', 'Wrapper method, forward selection, selects important features for target output.', 'Forward selection iteratively adds features to improve model accuracy.', 'Backward elimination removes features based on statistical tests.', 'Chi-square test calculates p-value of a feature for impact on output.', 'Features with p-value > 0.05 are considered not useful and can be skipped.']}, {'end': 860.716, 'segs': [{'end': 462.079, 'src': 'embed', 'start': 436.996, 'weight': 0, 'content': [{'end': 443.101, 'text': 'which will be the value with respect to the correlation between the independent feature and the output feature.', 'start': 436.996, 'duration': 6.105}, {'end': 446.204, 'text': 'Now that is how a backward elimination method is done.', 'start': 443.402, 'duration': 2.802}, {'end': 449.987, 'text': 'now in recursive feature of elimination.', 'start': 446.684, 'duration': 3.303}, {'end': 452.39, 'text': 'what we do is that my basically the third technique.', 'start': 449.987, 'duration': 2.403}, {'end': 453.771, 'text': 'you can read it from here.', 'start': 452.39, 'duration': 1.381}, {'end': 458.556, 'text': 'it is a greedy optimization algorithm which aims to find the best performing feature subset.', 'start': 453.771, 'duration': 4.785}, {'end': 460.237, 'text': 'so here it will not just go.', 'start': 458.556, 'duration': 1.681}, {'end': 462.079, 'text': 'it will not randomly select any feature.', 'start': 460.237, 'duration': 1.842}], 'summary': 'Backward elimination and recursive feature elimination aim to find the best performing feature subset.', 'duration': 25.083, 'max_score': 436.996, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ436996.jpg'}, {'end': 538.798, 'src': 'embed', 'start': 514.994, 'weight': 3, 'content': [{'end': 523.003, 'text': "I've eliminated these techniques because I don't use it too much and you will also not use it 
because the reason is that whenever you get a dataset,", 'start': 514.994, 'duration': 8.009}, {'end': 530.351, 'text': 'it will be a huge dataset and if you try to apply this forward selection and backward elimination a huge data set it will definitely take time.', 'start': 523.003, 'duration': 7.348}, {'end': 533.033, 'text': "okay, so I I would prefer that you don't follow this.", 'start': 530.351, 'duration': 2.682}, {'end': 538.798, 'text': 'but yes, if your data set is in medium size or small size, you can go ahead with forward selection on backward.', 'start': 533.033, 'duration': 5.765}], 'summary': 'Avoid using forward selection and backward elimination for large datasets, but consider for medium or small ones.', 'duration': 23.804, 'max_score': 514.994, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ514994.jpg'}, {'end': 587.135, 'src': 'embed', 'start': 555.369, 'weight': 1, 'content': [{'end': 563.892, 'text': 'The embedded techniques, what it does is that it creates a lot of subset from this particular dataset, from this particular independent feature.', 'start': 555.369, 'duration': 8.523}, {'end': 568.174, 'text': 'Sometime it may just give aid to the model, it may find the accuracy.', 'start': 564.232, 'duration': 3.942}, {'end': 576.482, 'text': 'then it may give a B to the model, or it may find the accuracy, or it may just give B to the model and find the accuracy, you know,', 'start': 568.954, 'duration': 7.528}, {'end': 587.135, 'text': 'and it will try to do all the permutation and combination, like a, a, B, a, C, a D, a E, or it can have a B, C, a ABCD, ABCDE.', 'start': 576.482, 'duration': 10.653}], 'summary': 'Embedded techniques create subsets to aid model accuracy.', 'duration': 31.766, 'max_score': 555.369, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ555369.jpg'}, {'end': 703.647, 'src': 'embed', 'start': 675.228, 
'weight': 2, 'content': [{'end': 678.109, 'text': 'guys, now, in the univariate selection, it is nothing,', 'start': 675.228, 'duration': 2.881}, {'end': 685.933, 'text': 'but it is a statistical test and it can be used to select those features that have the strongest relationship with the output values.', 'start': 678.109, 'duration': 7.824}, {'end': 693.002, 'text': 'that basically means that your independent feature and your output feature are very, very strongly related.', 'start': 686.758, 'duration': 6.244}, {'end': 697.844, 'text': 'that basically means that suppose, if your independent feature value is increasing, the output will be increasing.', 'start': 693.002, 'duration': 4.842}, {'end': 701.286, 'text': 'if your independent feature value is decreasing, your output will actually be decreasing.', 'start': 697.844, 'duration': 3.442}, {'end': 703.647, 'text': 'so all these particular techniques will basically get applied.', 'start': 701.286, 'duration': 2.361}], 'summary': 'Univariate selection uses statistical tests to select features strongly related to output values.', 'duration': 28.419, 'max_score': 675.228, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ675228.jpg'}], 'start': 436.996, 'title': 'Feature selection techniques', 'summary': 'Covers feature selection techniques such as backward elimination, recursive feature elimination, forward selection, and embedded methods. 
it includes practical implementations using libraries like select k best and chi-square test in machine learning use cases and presents univariate selection using statistical tests with a practical example of a mobile handset dataset.', 'chapters': [{'end': 475.612, 'start': 436.996, 'title': 'Feature selection techniques', 'summary': 'Explains the backward elimination method and recursive feature elimination, with the latter using a greedy optimization algorithm to find the best performing feature subset.', 'duration': 38.616, 'highlights': ['Recursive feature elimination uses a greedy optimization algorithm to find the best performing feature subset, not randomly selecting any feature and adding the next performing feature in each iteration with respect to the target output.', 'Backward elimination method is explained in the chapter, focusing on the correlation between the independent feature and the output feature.']}, {'end': 860.716, 'start': 475.612, 'title': 'Feature selection techniques', 'summary': 'Discusses feature selection techniques, including forward selection and backward elimination for small datasets, and embedded methods that create subsets and select the best performing features, along with practical implementations using libraries like select k best and chi-square test in the context of machine learning use cases, and presents univariate selection using statistical tests and libraries like select k best for feature selection with a practical example of a mobile handset data set.', 'duration': 385.104, 'highlights': ['The chapter presents embedded methods as a feature selection technique, which creates subsets from independent features, tries all permutations and combinations, and selects the subset with the highest accuracy for model training, applicable to medium or small-sized datasets. 
Embedded methods create subsets, try permutations and combinations, and select the subset with the highest accuracy for model training, suitable for medium or small-sized datasets.', 'The chapter explains univariate selection using statistical tests and libraries like select K best to identify features strongly related to output values, with a practical example using a mobile handset data set and the select K best library from scikit-learn. Univariate selection uses statistical tests and select K best to identify features strongly related to output values, demonstrated with a practical example using a mobile handset data set and the select K best library.', 'The chapter discusses forward selection and backward elimination as feature selection techniques for small datasets, highlighting their ease of practical implementation using iteration of loops, and advises against using them for huge datasets in machine learning use cases. Forward selection and backward elimination are discussed as feature selection techniques for small datasets, emphasizing their ease of practical implementation using iteration of loops, and advising against using them for huge datasets in machine learning use cases.']}], 'duration': 423.72, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ436996.jpg', 'highlights': ['Recursive feature elimination uses a greedy optimization algorithm to find the best performing feature subset, not randomly selecting any feature and adding the next performing feature in each iteration with respect to the target output.', 'The chapter presents embedded methods as a feature selection technique, which creates subsets from independent features, tries all permutations and combinations, and selects the subset with the highest accuracy for model training, applicable to medium or small-sized datasets.', 'The chapter explains univariate selection using statistical tests and libraries like select K best to identify 
features strongly related to output values, with a practical example using a mobile handset data set and the SelectKBest class from scikit-learn.', 'The chapter discusses forward selection and backward elimination as feature selection techniques for small datasets, highlighting their ease of practical implementation using iteration of loops, and advises against using them for huge datasets in machine learning use cases.', 'Backward elimination method is explained in the chapter, focusing on the correlation between the independent feature and the output feature.']}, {'end': 1055.781, 'segs': [{'end': 911.877, 'src': 'embed', 'start': 877.167, 'weight': 2, 'content': [{'end': 879.868, 'text': 'Because, as you know, in the curse of dimensionality, what happens?', 'start': 877.167, 'duration': 2.701}, {'end': 885.73, 'text': 'that if I increase the number of features after a particular threshold value, the accuracy of the model decreases.', 'start': 879.868, 'duration': 5.862}, {'end': 885.95, 'text': 'you know?', 'start': 885.73, 'duration': 0.22}, {'end': 891.331, 'text': "So for that I'm basically using this particular univariate selection in this univariate selection.", 'start': 886.29, 'duration': 5.041}, {'end': 893.792, 'text': "I'll be using a library which is called SelectKBest.", 'start': 891.371, 'duration': 2.421}, {'end': 895.812, 'text': 'and in this particular SelectKBest,', 'start': 894.412, 'duration': 1.4}, {'end': 901.434, 'text': "what I'm doing is that here I just give a score function as chi-square and my K value is 10.", 'start': 895.812, 'duration': 5.622}, {'end': 906.535, 'text': "this will basically, I'm applying the SelectKBest class to extract top 10 best features.", 'start': 901.434, 'duration': 5.101}, {'end': 910.176, 'text': 'you know and this is with respect to your output feature that we have now.', 'start': 906.535, 'duration': 3.641}, {'end': 911.877, 'text': "what I'll do is that I will initialize this,", 'start': 910.176, 
'duration': 1.701}], 'summary': 'Using univariate selection with a chi-square score function and k value of 10 to extract top 10 best features.', 'duration': 34.71, 'max_score': 877.167, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ877167.jpg'}, {'end': 956.093, 'src': 'embed', 'start': 930.296, 'weight': 5, 'content': [{'end': 936.878, 'text': 'okay, now, this particular fit underscore score will basically calculate the score with respect to the chi-square test value.', 'start': 930.296, 'duration': 6.582}, {'end': 941.72, 'text': 'if you remember the chi-square formula, you can just google it and understand how the chi-square formula works.', 'start': 936.878, 'duration': 4.842}, {'end': 948.103, 'text': 'but based on the chi-square formula, it will calculate some scores with respect to each and every feature and the target output.', 'start': 941.72, 'duration': 6.383}, {'end': 956.093, 'text': "so Here we are storing all the scores in this particular data frame, that is df.scores and my column names I'm creating as x.columns.", 'start': 948.103, 'duration': 7.99}], 'summary': 'The fit_score calculates chi-square test scores for each feature and target output.', 'duration': 25.797, 'max_score': 930.296, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ930296.jpg'}, {'end': 996.262, 'src': 'heatmap', 'start': 948.103, 'weight': 1, 'content': [{'end': 956.093, 'text': "so Here we are storing all the scores in this particular data frame, that is df.scores and my column names I'm creating as x.columns.", 'start': 948.103, 'duration': 7.99}, {'end': 964.185, 'text': "I'm concatenating in the next statement for better visualization and I'm just renaming the column as specs and score.", 'start': 956.694, 'duration': 7.491}, {'end': 967.569, 'text': 'So here you can see that these are all my features.', 'start': 964.385, 'duration': 3.184}, {'end': 969.049, 
'text': 'I have total 19 features.', 'start': 967.569, 'duration': 1.48}, {'end': 973.132, 'text': 'okay, and you can see that in my heading I have specs and scores.', 'start': 969.049, 'duration': 4.083}, {'end': 973.832, 'text': 'battery power.', 'start': 973.132, 'duration': 0.7}, {'end': 978.595, 'text': 'this is my score, the highest score, the more important those feature is.', 'start': 973.832, 'duration': 4.763}, {'end': 982.297, 'text': 'now, from all this, you can see that the RAM has basically the highest score.', 'start': 978.595, 'duration': 3.702}, {'end': 987.119, 'text': 'you know, and it is obvious that if the RAM size basically increases, the price of the phone also increases.', 'start': 982.297, 'duration': 4.822}, {'end': 988.28, 'text': 'it is just a common sense name.', 'start': 987.119, 'duration': 1.161}, {'end': 996.262, 'text': 'so all those values that are basically higher over here are basically much more correlated with the output feature that we have in our data set.', 'start': 988.88, 'duration': 7.382}], 'summary': 'Analyzed 19 features, with ram being most important; higher ram correlates with higher phone price.', 'duration': 48.159, 'max_score': 948.103, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ948103.jpg'}, {'end': 1029.517, 'src': 'embed', 'start': 1004.604, 'weight': 4, 'content': [{'end': 1011.545, 'text': 'with respect to the score that i have now, you can see that my top 10 features that i have selected is basically ram, pixel height, battery power,', 'start': 1004.604, 'duration': 6.941}, {'end': 1017.167, 'text': 'pixel width, mobile weight in interior memory, sc, underscore, w, talk time, fc and sch.', 'start': 1011.545, 'duration': 5.622}, {'end': 1019.987, 'text': 'So all these features are my 10 best feature.', 'start': 1018.285, 'duration': 1.702}, {'end': 1025.291, 'text': 'What I can do is that I can just take this 10 feature because these are the most 
10 best feature and I can give it to my model.', 'start': 1020.287, 'duration': 5.004}, {'end': 1029.517, 'text': 'Okay, now the next technique.', 'start': 1026.292, 'duration': 3.225}], 'summary': 'Top 10 features selected for model: ram, pixel height, battery power, pixel width, mobile weight, internal memory, sc_w, talk time, fc, sc_h.', 'duration': 24.913, 'max_score': 1004.604, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ1004604.jpg'}, {'end': 1059.305, 'src': 'embed', 'start': 1036.644, 'weight': 0, 'content': [{'end': 1044.128, 'text': 'So I think, I think that the double digit value, based on the distribution that you have over there, you can just consider till nine,', 'start': 1036.644, 'duration': 7.484}, {'end': 1046.031, 'text': "and I've just selected it as 10.", 'start': 1044.128, 'duration': 1.903}, {'end': 1049.715, 'text': 'And this will definitely probably work, because I have executed the further code.', 'start': 1046.031, 'duration': 3.684}, {'end': 1055.781, 'text': 'with respect to this, I was able to get 92% accuracy with cross validation and also applying hyperparameters.', 'start': 1049.715, 'duration': 6.066}, {'end': 1059.305, 'text': 'Now, the next technique is something called as feature importance.', 'start': 1056.302, 'duration': 3.003}], 'summary': 'Achieved 92% accuracy with cross validation and hyperparameter tuning. Moving on to feature importance.', 'duration': 22.661, 'max_score': 1036.644, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ1036644.jpg'}], 'start': 860.716, 'title': 'Feature selection for price prediction and phone model', 'summary': 'Discusses selecting the best features for price prediction using univariate selection with a chi-square score function and k value of 10 to extract the top 10 best features, considering the impact of dimensional threshold.
it also covers feature selection for a phone model, emphasizing the importance of ram size and presenting the top 10 selected features based on their correlation with the output feature, achieving 92% accuracy with cross validation and hyperparameters.', 'chapters': [{'end': 973.132, 'start': 860.716, 'title': 'Selecting best features for price prediction', 'summary': 'Discusses the process of selecting the best features for price prediction, using univariate selection with a chi-square score function and k value of 10 to extract the top 10 best features, considering the impact of dimensional threshold on model accuracy.', 'duration': 112.416, 'highlights': ['Implementing univariate selection with a chi-square score function and K value of 10 to extract the top 10 best features The speaker explains the process of using univariate selection with a chi-square score function and a K value of 10 to extract the top 10 best features for price prediction.', 'Impact of increasing the number of features beyond a threshold value on model accuracy The chapter emphasizes the impact of increasing the number of features beyond a certain threshold on the decrease in model accuracy, highlighting the importance of feature selection.', 'Calculation of scores using the chi-square test value for each feature and the target output The speaker describes the calculation of scores for each feature with respect to the chi-square test value and the target output, providing insights into the scoring process for feature selection.']}, {'end': 1055.781, 'start': 973.132, 'title': 'Feature selection for phone model', 'summary': 'Discusses feature selection for a phone model, emphasizing the importance of ram size and presenting the top 10 selected features based on their correlation with the output feature, achieving 92% accuracy with cross validation and hyperparameters.', 'duration': 82.649, 'highlights': ['The RAM has the highest score and is correlated with the output feature, with an 
increase in RAM size leading to a higher phone price.', 'The top 10 selected features for the phone model are RAM, pixel height, battery power, pixel width, mobile weight, internal memory, sc_w, talk time, fc, and sc_h.', 'Selecting the top 10 features and incorporating them into the model resulted in 92% accuracy with cross validation and hyperparameter tuning.']}], 'duration': 195.065, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ860716.jpg', 'highlights': ['Selecting the top 10 features and incorporating them into the model resulted in 92% accuracy with cross validation and hyperparameter tuning.', 'RAM has the highest score and is correlated with the output feature, with an increase in RAM size leading to a higher phone price.', 'Implementing univariate selection with a chi-square score function and K value of 10 to extract the top 10 best features.', 'Impact of increasing the number of features beyond a threshold value on model accuracy.', 'The top 10 selected features for the phone model are RAM, pixel height, battery power, pixel width, mobile weight, internal memory, sc_w, talk time, fc, and sc_h.', 'Calculation of scores using the chi-square test value for each feature and the target output.']}, {'end': 1379.823, 'segs': [{'end': 1159.135, 'src': 'embed', 'start': 1126.374, 'weight': 3, 'content': [{'end': 1131.063, 'text': 'now, after I do fit, If I write model dot feature underscore importance.', 'start': 1126.374, 'duration': 4.689}, {'end': 1132.844, 'text': 'this is with respect to all the features.', 'start': 1131.063, 'duration': 1.781}, {'end': 1137.348, 'text': "I'm getting different, different values assigned, and that is all because of feature importance.", 'start': 1132.844, 'duration': 4.504}, {'end': 1140.47, 'text': 'Okay. And this basically assigns a different value.', 'start': 1137.728, 'duration': 2.742}, {'end': 1143.152, 'text': 'Now from this, what I can do is that I can get the top 10
values.', 'start': 1140.51, 'duration': 2.642}, {'end': 1150.068, 'text': "Uh, when I'm getting the top 10 values, these are basically my top 10 values, right? So, sorry, these are not the top 10 values.", 'start': 1143.813, 'duration': 6.255}, {'end': 1159.135, 'text': "What I'll do is that I will try to create a series from this okay, with the column names in that, and I'll try to just plot that in a bar graph,", 'start': 1150.109, 'duration': 9.026}], 'summary': 'Analyzing feature importance to identify top 10 values for plotting in a bar graph.', 'duration': 32.761, 'max_score': 1126.374, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ1126374.jpg'}, {'end': 1197.719, 'src': 'embed', 'start': 1162.077, 'weight': 0, 'content': [{'end': 1166.84, 'text': 'Now, here also, you can see that RAM has the highest value of 0.40, okay?', 'start': 1162.077, 'duration': 4.763}, {'end': 1172.804, 'text': 'Similarly, like our previous technique that we have basically used over here, which is my univariate selection,', 'start': 1167.06, 'duration': 5.744}, {'end': 1176.387, 'text': 'you can also see that RAM was basically having the highest value.', 'start': 1172.804, 'duration': 3.583}, {'end': 1180.538, 'text': 'Okay, but with respect to feature importance also, it is also getting the highest score.', 'start': 1176.874, 'duration': 3.664}, {'end': 1182.641, 'text': 'Then I have battery power, pixel width, pixel height.', 'start': 1180.558, 'duration': 2.083}, {'end': 1188.988, 'text': 'So this particular is basically decreasing and this is what the scores it has basically got with the help of feature importance.', 'start': 1182.981, 'duration': 6.007}, {'end': 1194.134, 'text': 'So I can select all these particular features, give it to my model and train it and get the uh accuracy.', 'start': 1189.128, 'duration': 5.006}, {'end': 1197.719, 'text': 'the last technique is something called as correlation matrix with heat 
map.', 'start': 1194.134, 'duration': 3.585}], 'summary': 'RAM has the highest feature importance score (0.40), followed by battery power, pixel width, and pixel height; the last technique introduced is the correlation matrix with heat map.', 'duration': 35.642, 'max_score': 1162.077, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ1162077.jpg'}, {'end': 1285.216, 'src': 'embed', 'start': 1257.419, 'weight': 1, 'content': [{'end': 1260.282, 'text': 'now you know that price range is my output feature.', 'start': 1257.419, 'duration': 2.863}, {'end': 1261.743, 'text': 'now this particular pricing.', 'start': 1260.282, 'duration': 1.461}, {'end': 1270.365, 'text': 'if I see, with respect to my RAM, the correlation, the value is 0.92, and we know that our correlation range is between 0 to 1, right.', 'start': 1261.743, 'duration': 8.622}, {'end': 1276.93, 'text': 'so you can see that even over here, the price range of ram and, sorry, the, the correlation between price range and ram,', 'start': 1270.365, 'duration': 6.565}, {'end': 1278.631, 'text': 'is basically a very higher value.', 'start': 1276.93, 'duration': 1.701}, {'end': 1285.216, 'text': 'similarly, you can see that the price range, the correlation between price range and battery power, is 0.2 right.', 'start': 1278.631, 'duration': 6.585}], 'summary': 'High correlation (0.92) between price range and RAM, 0.2 with battery power.', 'duration': 27.797, 'max_score': 1257.419, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ1257419.jpg'}, {'end': 1338.753, 'src': 'embed', 'start': 1313.096, 'weight': 4, 'content': [{'end': 1319.341, 'text': 'Now you know that you just have to choose what all features are very, very important with respect to the target output and just select.', 'start': 1313.096, 'duration': 6.245}, {'end': 1325.225, 'text': 'I think this is the most efficient thing, guys,
because you will be considering that if the correlation is more than 0.2,', 'start': 1319.401, 'duration': 5.824}, {'end': 1327.186, 'text': 'just select those number of features from here.', 'start': 1325.225, 'duration': 1.961}, {'end': 1334.11, 'text': 'you know from here and just use those in the particular model training right with respect to the output feature.', 'start': 1327.726, 'duration': 6.384}, {'end': 1338.753, 'text': 'so just select those, those features that are basically having higher value with respect to price range.', 'start': 1334.11, 'duration': 4.643}], 'summary': 'Select features with correlation > 0.2 for efficient model training.', 'duration': 25.657, 'max_score': 1313.096, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ1313096.jpg'}], 'start': 1056.302, 'title': 'Feature importance and selection in ML', 'summary': 'Demonstrates feature importance using a tree-based classifier to extract top 10 features and emphasizes the significance of selecting features with a correlation value greater than 0.2 with respect to the output feature for efficient model training.', 'chapters': [{'end': 1162.037, 'start': 1056.302, 'title': 'Feature importance in machine learning', 'summary': 'Demonstrates the implementation of feature importance using a tree-based classifier to extract the top 10 features from the dataset, providing insights into the relevance of each feature towards the output value.', 'duration': 105.735, 'highlights': ['The feature importance property of the model provides a score for each feature, with higher scores indicating greater relevance towards the output value.', 'Implementation involves using a built-in tree-based model, the extra trees classifier, to extract the top 10 features from the dataset.', "After fitting the model, the 'model.feature_importances_' attribute yields a different value for each feature, enabling the selection of the top 10 values for
further analysis and visualization."]}, {'end': 1379.823, 'start': 1162.077, 'title': 'Feature selection techniques', 'summary': 'Discusses feature selection techniques, including univariate selection, feature importance, and correlation matrix with a heat map, emphasizing the importance of selecting features with a correlation value greater than 0.2 with respect to the output feature for efficient model training.', 'duration': 217.746, 'highlights': ['The correlation between price range and RAM is 0.92, indicating a strong positive correlation, which is crucial for selecting features for model training.', 'The technique emphasizes selecting features with a correlation value greater than 0.2 with respect to the output feature for efficient model training.', 'The feature importance analysis reveals that RAM has the highest score, indicating its significance in determining the price range of mobile handsets.', 'The chapter introduces the correlation matrix with a heat map technique, which visualizes the correlation between independent features and the output feature, aiding in feature selection for model training.', 'The chapter concludes by emphasizing the importance of carefully selecting features with high correlation values with respect to the output feature for efficient model training.']}], 'duration': 323.521, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EqLBAmtKMnQ/pics/EqLBAmtKMnQ1056302.jpg', 'highlights': ['The feature importance analysis reveals that RAM has the highest score, indicating its significance in determining the price range of mobile handsets.', 'The correlation between price range and RAM is 0.92, indicating a strong positive correlation, which is crucial for selecting features for model training.', 'The chapter introduces the correlation matrix with a heat map technique, which visualizes the correlation between independent features and the output feature, aiding in feature selection for model training.', "After 
fitting the model, the 'model.feature_importances_' attribute yields a different value for each feature, enabling the selection of the top 10 values for further analysis and visualization.", 'The technique emphasizes selecting features with a correlation value greater than 0.2 with respect to the output feature for efficient model training.']}], 'highlights': ['Feature selection techniques like univariate selection, feature importance, and correlation matrix with heat map are efficient methods to address the curse of dimensionality and enhance model performance.', 'The curse of dimensionality leads to decreased model accuracy when the number of features in a dataset surpasses a threshold value, causing confusion for the model due to excessive data.', 'Selecting the top 10 features and incorporating them into the model resulted in 92% accuracy with cross validation and hyperparameter tuning.', 'RAM has the highest score and is correlated with the output feature, with an increase in RAM size leading to a higher phone price.', 'The feature importance analysis reveals that RAM has the highest score, indicating its significance in determining the price range of mobile handsets.']}