Coursnap

title
Feature Selection

description

detail
{'title': 'Feature Selection', 'heatmap': [{'end': 740.984, 'start': 610.135, 'weight': 0.812}, {'end': 814.051, 'start': 778.331, 'weight': 0.807}, {'end': 829.676, 'start': 814.551, 'weight': 0.761}, {'end': 1225.041, 'start': 1204.505, 'weight': 0.837}], 'summary': 'Covers feature reduction, subset selection, and methods in machine learning, emphasizing the need for optimal feature subsets to improve classification accuracy and classifier efficiency, including unsupervised and supervised evaluation methods, heuristic algorithms, univariate and multivariate feature selection, and the importance of selecting relevant and uncorrelated features with high positive or negative correlation values.', 'chapters': [{'end': 562.791, 'segs': [{'end': 82.905, 'src': 'embed', 'start': 47.364, 'weight': 0, 'content': [{'end': 49.204, 'text': 'For this, we need a distance function.', 'start': 47.364, 'duration': 1.84}, {'end': 53.285, 'text': 'This distance function is computed in terms of the features.', 'start': 49.544, 'duration': 3.741}, {'end': 63.809, 'text': 'If the number of features is large, there is a problem because the distance that you get is not, may not be representative of the actual distance.', 'start': 53.806, 'duration': 10.003}, {'end': 68.17, 'text': 'So, this is a reason why feature reduction becomes important.', 'start': 64.369, 'duration': 3.801}, {'end': 82.905, 'text': 'Now, we have seen that or you know that information about the target.', 'start': 76.781, 'duration': 6.124}], 'summary': 'Feature reduction is important when dealing with a large number of features, as it affects the representation of actual distance.', 'duration': 35.541, 'max_score': 47.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw447364.jpg'}, {'end': 177.557, 'src': 'embed', 'start': 151.13, 'weight': 1, 'content': [{'end': 163.096, 'text': 'this may not hold always that just because you have more features does not 
mean you have more information or better classification performance.', 'start': 151.13, 'duration': 11.966}, {'end': 170.954, 'text': 'If you look at the slide, this is one typical scenario,', 'start': 165.071, 'duration': 5.883}, {'end': 177.557, 'text': 'as if you keep the number of training examples fixed and the training set is not extremely large,', 'start': 170.954, 'duration': 6.603}], 'summary': "Having more features doesn't necessarily lead to better information or classification performance.", 'duration': 26.427, 'max_score': 151.13, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4151130.jpg'}, {'end': 236.657, 'src': 'embed', 'start': 209.563, 'weight': 4, 'content': [{'end': 220.544, 'text': 'In algorithms such as k nearest neighbor, these irrelevant features introduce noise and they and they fool the learning algorithm.', 'start': 209.563, 'duration': 10.981}, {'end': 233.214, 'text': 'because you are trying to find which instances are close together, these irrelevant features or noisy features will make the result wrong.', 'start': 220.544, 'duration': 12.67}, {'end': 236.657, 'text': 'Secondly, you may have redundant features.', 'start': 234.275, 'duration': 2.382}], 'summary': 'Irrelevant and redundant features introduce noise in k-nearest neighbor algorithms.', 'duration': 27.094, 'max_score': 209.563, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4209563.jpg'}, {'end': 358.182, 'src': 'embed', 'start': 329.522, 'weight': 2, 'content': [{'end': 341.11, 'text': 'When you have too many features, too many dimensions, this will lead to degradation of the learning algorithm, more computational time,', 'start': 329.522, 'duration': 11.588}, {'end': 344.993, 'text': 'and this phenomena is called the curse of dimensionality.', 'start': 341.11, 'duration': 3.883}, {'end': 350.336, 'text': 'To overcome this curse of dimensionality, we want to do 
feature reduction.', 'start': 345.393, 'duration': 4.943}, {'end': 358.182, 'text': 'There are two types of feature reduction, one is called feature selection, the other is called feature extraction.', 'start': 350.577, 'duration': 7.605}], 'summary': 'The curse of dimensionality degrades the learning algorithm when there are too many features and increases computational time. Feature reduction is needed, and it has two types: feature selection and feature extraction.', 'duration': 28.66, 'max_score': 329.522, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4329522.jpg'}, {'end': 531.273, 'src': 'embed', 'start': 480.834, 'weight': 3, 'content': [{'end': 485.317, 'text': 'whereas in feature selection you select a subset of the features.', 'start': 480.834, 'duration': 4.483}, {'end': 489.319, 'text': 'We will talk about feature extraction in the next class.', 'start': 485.357, 'duration': 3.962}, {'end': 505.349, 'text': 'So, in both these cases what you are seeking to optimize is you want to either improve or maintain classification accuracy.', 'start': 489.94, 'duration': 15.409}, {'end': 514.948, 'text': 'and you want to simplify classifier complexity.', 'start': 511.807, 'duration': 3.141}, {'end': 531.273, 'text': 'Earlier we saw a curve which showed that as you increase the number of features.', 'start': 525.371, 'duration': 5.902}], 'summary': 'Feature selection and feature extraction both aim to improve or maintain classification accuracy while simplifying classifier complexity.', 'duration': 50.439, 'max_score': 480.834, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4480834.jpg'}], 'start': 17.774, 'title': 'Feature reduction in machine learning', 'summary': 'Covers instance-based learning, curse of dimensionality, and the need for feature reduction to improve classification accuracy and simplify classifier complexity due to large feature sets impacting distance computation and classifier 
performance, and the negative impact of irrelevant and redundant features on classifier performance.', 'chapters': [{'end': 177.557, 'start': 17.774, 'title': 'Instance-based learning & feature reduction', 'summary': 'Covers instance-based learning and feature reduction, emphasizing the importance of feature selection due to the impact of large feature sets on distance computation and classification performance.', 'duration': 159.783, 'highlights': ['Feature reduction is crucial due to the potential distortion in distance computation caused by a large number of features, impacting the accuracy of instance-based learning.', 'The assumption that more features always lead to better information or classification performance is challenged, highlighting that increased features do not necessarily equate to improved information or classification performance.']}, {'end': 562.791, 'start': 177.557, 'title': 'Curse of dimensionality & feature reduction', 'summary': 'Discusses the curse of dimensionality in machine learning, emphasizing the negative impact of irrelevant and redundant features on classifier performance, and the need for feature reduction through selection or extraction to improve or maintain classification accuracy and simplify classifier complexity.', 'duration': 385.234, 'highlights': ['The curse of dimensionality in machine learning is caused by irrelevant and redundant features, leading to degradation of the learning algorithm and increased computational time.', 'Irrelevant features in algorithms like k-nearest neighbor introduce noise, misleading the learning algorithm and impacting the accuracy of results.', 'Redundant features with limited training examples may degrade the performance of the learning algorithm by not contributing additional information.', 'Feature reduction aims to optimize classification accuracy and simplify classifier complexity, either by selecting a subset of features or by extracting a new subspace with a smaller number of 
dimensions.']}], 'duration': 545.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw417774.jpg', 'highlights': ['Feature reduction is crucial due to potential distortion in distance computation caused by a large number of features, impacting the accuracy of instance-based learning.', 'The assumption that more features always lead to better information or classification performance is challenged, highlighting that increased features do not necessarily equate to improved information or classification performance.', 'The curse of dimensionality in machine learning is caused by irrelevant and redundant features, leading to degradation of the learning algorithm and increased computational time.', 'Feature reduction aims to optimize classification accuracy and simplify classifier complexity, either by selecting a subset of features or by extracting a new subspace with a smaller number of dimensions.', 'Irrelevant features in algorithms like k-nearest neighbor introduce noise, misleading the learning algorithm and impacting the accuracy of results.']}, {'end': 718.433, 'segs': [{'end': 644.492, 'src': 'embed', 'start': 563.292, 'weight': 0, 'content': [{'end': 570.475, 'text': 'Now, we said that in feature selection we want to select a subset of the original feature set.', 'start': 563.292, 'duration': 7.183}, {'end': 574.177, 'text': 'Now, let us see how we can select that subset.', 'start': 571.275, 'duration': 2.902}, {'end': 587.331, 'text': 'So, you can see that if we have n features the number of subsets possible is 2 to the power n.', 'start': 575.876, 'duration': 11.455}, {'end': 604.753, 'text': 'and it is impossible for us to enumerate each of these possible subsets and check how good it is because the number of subsets is exponential.', 'start': 592.606, 'duration': 12.147}, {'end': 609.495, 'text': 'So, we need a method which works in reasonable time.', 'start': 605.093, 'duration': 4.402}, {'end': 
624.895, 'text': 'The methods that we can use for feature subset selection can be optimum methods, can be heuristic methods, or can be randomized methods.', 'start': 610.135, 'duration': 14.76}, {'end': 639.168, 'text': 'However, we can use optimum methods if the hypothesis space or the feature subset space has a structure,', 'start': 629.84, 'duration': 9.328}, {'end': 644.492, 'text': 'so that we can have an optimum algorithm which works in polynomial time.', 'start': 639.168, 'duration': 5.324}], 'summary': 'Feature selection aims to pick a subset of n features, for which there are 2^n possible subsets. We need a method that works in reasonable time, such as an optimum, heuristic, or randomized method.', 'duration': 81.2, 'max_score': 563.292, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4563292.jpg'}, {'end': 718.433, 'src': 'embed', 'start': 680.214, 'weight': 2, 'content': [{'end': 689.358, 'text': 'In unsupervised methods, we do not evaluate the subset over the training examples.', 'start': 680.214, 'duration': 9.144}, {'end': 694.7, 'text': 'We evaluate the information content in some way in an unsupervised way.', 'start': 690.298, 'duration': 4.402}, {'end': 697.762, 'text': 'These methods are called filter methods.', 'start': 695.241, 'duration': 2.521}, {'end': 705.385, 'text': 'In supervised methods also called wrapper methods.', 'start': 701.743, 'duration': 3.642}, {'end': 714.812, 'text': 'we evaluate the feature subset by using it on a learning algorithm.', 'start': 708.43, 'duration': 6.382}, {'end': 718.433, 'text': 'These are called supervised methods or wrapper methods.', 'start': 714.952, 'duration': 3.481}], 'summary': 'Unsupervised (filter) methods do not evaluate subsets over the training examples, while supervised (wrapper) methods evaluate feature subsets by using them with a learning algorithm.', 'duration': 38.219, 'max_score': 680.214, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4680214.jpg'}], 'start': 563.292, 'title': 'Feature subset selection', 'summary': 'Discusses challenges in selecting subsets from an original feature set, methods for feature subset selection, including optimum, heuristic, and randomized methods, focusing on evaluating subsets through unsupervised and supervised methods.', 'chapters': [{'end': 718.433, 'start': 563.292, 'title': 'Feature subset selection', 'summary': 'Discusses the challenges of selecting a subset from an original feature set, the exponential number of possible subsets, and the methods for feature subset selection, including optimum, heuristic, and randomized methods, with a focus on evaluating subsets through unsupervised and supervised methods.', 'duration': 155.141, 'highlights': ['The number of subsets possible for n features is 2 to the power n, making it impossible to enumerate each possible subset due to the exponential number of subsets. With n features, the number of possible subsets is 2 to the power n, posing a challenge due to the exponential number of subsets.', 'Feature subset selection can utilize optimum, heuristic, or randomized methods based on the structure of the hypothesis space or the feature subset space. The selection of feature subsets can involve optimum, heuristic, or randomized methods, determined by the structure of the hypothesis space or the feature subset space.', 'Evaluation of feature subsets can be performed using unsupervised methods, which assess the information content in an unsupervised way, or supervised methods, which involve using a learning algorithm. 
Feature subset evaluation can be conducted using unsupervised methods, assessing information content, or supervised methods, involving the use of a learning algorithm.']}], 'duration': 155.141, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4563292.jpg', 'highlights': ['The number of subsets possible for n features is 2 to the power n, making it impossible to enumerate each possible subset due to the exponential number of subsets.', 'Feature subset selection can utilize optimum, heuristic, or randomized methods based on the structure of the hypothesis space or the feature subset space.', 'Evaluation of feature subsets can be performed using unsupervised methods, which assess the information content in an unsupervised way, or supervised methods, which involve using a learning algorithm.']}, {'end': 856.233, 'segs': [{'end': 814.051, 'src': 'heatmap', 'start': 751.21, 'weight': 0, 'content': [{'end': 759.796, 'text': 'the search algorithm will select a feature subset and score it on the objective function and find the goodness of the feature subset.', 'start': 751.21, 'duration': 8.586}, {'end': 765.5, 'text': 'Based on this it will decide which part of the search space to explore next.', 'start': 760.356, 'duration': 5.144}, {'end': 771.125, 'text': 'And after this module is completed, you get a final feature subset,', 'start': 766.18, 'duration': 4.945}, {'end': 777.47, 'text': 'and this final feature subset is used by your machine learning or pattern recognition algorithm.', 'start': 771.125, 'duration': 6.345}, {'end': 784.616, 'text': 'So, you want to pick the subset that is optimal or near optimal with respect to the objective function.', 'start': 778.331, 'duration': 6.285}, {'end': 793.941, 'text': 'The feature subset can be evaluated by two methods.', 'start': 790.599, 'duration': 3.342}, {'end': 803.425, 'text': 'In supervised methods or wrapper methods, we train using the selected subset and we 
estimate the error on the validation set.', 'start': 794.041, 'duration': 9.384}, {'end': 814.051, 'text': 'In unsupervised or filter methods, we look at only the input and we select the subset which has the most information.', 'start': 804.746, 'duration': 9.305}], 'summary': 'Search algorithm selects optimal feature subset for machine learning.', 'duration': 52.215, 'max_score': 751.21, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4751210.jpg'}, {'end': 851.291, 'src': 'heatmap', 'start': 814.551, 'weight': 0.761, 'content': [{'end': 818.053, 'text': 'Now, these two types of methods are illustrated in this picture.', 'start': 814.551, 'duration': 3.502}, {'end': 829.676, 'text': 'So, in the filter method, the search algorithm comes up with a feature subset that is evaluated for information content and based on that,', 'start': 819.173, 'duration': 10.503}, {'end': 834.618, 'text': 'search algorithm proceeds and finally this module gives a final feature subset.', 'start': 829.676, 'duration': 4.942}, {'end': 839.486, 'text': 'In the wrapper based method or supervised method,', 'start': 835.945, 'duration': 3.541}, {'end': 851.291, 'text': 'the search algorithm outputs a feature subset which is again used with a pattern recognition of machine learning algorithm and the prediction accuracy is obtained,', 'start': 839.486, 'duration': 11.805}], 'summary': 'Filter method evaluates feature subset, wrapper method uses pattern recognition for prediction accuracy.', 'duration': 36.74, 'max_score': 814.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4814551.jpg'}, {'end': 856.233, 'src': 'embed', 'start': 829.676, 'weight': 2, 'content': [{'end': 834.618, 'text': 'search algorithm proceeds and finally this module gives a final feature subset.', 'start': 829.676, 'duration': 4.942}, {'end': 839.486, 'text': 'In the wrapper based method or supervised 
method,', 'start': 835.945, 'duration': 3.541}, {'end': 851.291, 'text': 'the search algorithm outputs a feature subset which is again used with a pattern recognition of machine learning algorithm and the prediction accuracy is obtained,', 'start': 839.486, 'duration': 11.805}, {'end': 856.233, 'text': 'which is fed to the search algorithm, and after this whole module is completed,', 'start': 851.291, 'duration': 4.942}], 'summary': 'Search algorithm outputs feature subset for pattern recognition, improving prediction accuracy.', 'duration': 26.557, 'max_score': 829.676, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4829676.jpg'}], 'start': 720.374, 'title': 'Feature selection in machine learning', 'summary': 'Discusses the optimization problem of feature selection in machine learning, comparing supervised and unsupervised methods, and the importance of selecting an optimal or near-optimal feature subset for machine learning or pattern recognition algorithms.', 'chapters': [{'end': 856.233, 'start': 720.374, 'title': 'Feature selection in machine learning', 'summary': 'Discusses the optimization problem of feature selection in machine learning, comparing supervised and unsupervised methods, and the importance of selecting an optimal or near-optimal feature subset for machine learning or pattern recognition algorithms.', 'duration': 135.859, 'highlights': ['The feature selection process in machine learning involves an optimization problem where a search algorithm selects a feature subset and scores it on an objective function to find the goodness of the subset.', 'Two methods for evaluating feature subsets are illustrated: supervised methods involve training using the selected subset and estimating the error on the validation set, while unsupervised or filter methods focus on selecting the subset with the most information.', 'The search algorithm in the wrapper-based method outputs a feature subset used in 
pattern recognition or machine learning algorithms, and the prediction accuracy is obtained and fed back to the search algorithm.']}], 'duration': 135.859, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4720374.jpg', 'highlights': ['The feature selection process in machine learning involves an optimization problem where a search algorithm selects a feature subset and scores it on an objective function to find the goodness of the subset.', 'Two methods for evaluating feature subsets are illustrated: supervised methods involve training using the selected subset and estimating the error on the validation set, while unsupervised or filter methods focus on selecting the subset with the most information.', 'The search algorithm in the wrapper-based method outputs a feature subset used in pattern recognition or machine learning algorithms, and the prediction accuracy is obtained and fed back to the search algorithm.']}, {'end': 1190.759, 'segs': [{'end': 929.927, 'src': 'embed', 'start': 856.233, 'weight': 2, 'content': [{'end': 859.834, 'text': 'you have a feature subset which is used by your machine learning algorithm.', 'start': 856.233, 'duration': 3.601}, {'end': 867.7, 'text': 'So, these are the two different methods of two different frameworks of feature selection.', 'start': 860.234, 'duration': 7.466}, {'end': 873.585, 'text': 'Now, how do you do the feature selection algorithm?', 'start': 868.901, 'duration': 4.684}, {'end': 884.454, 'text': 'So, first of all, what you can do is that if features are redundant, you may not use all the features.', 'start': 874.365, 'duration': 10.089}, {'end': 888.397, 'text': 'So, you can find uncorrelated features.', 'start': 885.154, 'duration': 3.243}, {'end': 902.98, 'text': 'You start with some features and then, when you try to introduce the new feature,', 'start': 898.257, 'duration': 4.723}, {'end': 911.604, 'text': 'you do not take that feature if that feature is 
highly correlated with another feature which already contains the information about that feature.', 'start': 902.98, 'duration': 8.624}, {'end': 921.75, 'text': 'So, you select only uncorrelated features and for selecting, so you have to select uncorrelated features.', 'start': 912.085, 'duration': 9.665}, {'end': 929.927, 'text': 'and you also have to eliminate irrelevant features.', 'start': 924.224, 'duration': 5.703}], 'summary': 'Feature selection involves eliminating redundant and irrelevant features, and selecting uncorrelated features to improve machine learning algorithms.', 'duration': 73.694, 'max_score': 856.233, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4856233.jpg'}, {'end': 1051.341, 'src': 'embed', 'start': 960.828, 'weight': 0, 'content': [{'end': 964.314, 'text': 'So, let us first write about the forward selection algorithm.', 'start': 960.828, 'duration': 3.486}, {'end': 998.077, 'text': 'you start with empty feature set and then you add features one by one.', 'start': 990.871, 'duration': 7.206}, {'end': 1017.653, 'text': 'So, you try each of the remaining features and for each of them you estimate the accuracy.', 'start': 1000.439, 'duration': 17.214}, {'end': 1051.341, 'text': 'So, you estimate the classification or regression error for adding each specific feature and you select the feature that gives maximum improvement.', 'start': 1021.966, 'duration': 29.375}], 'summary': 'Forward selection algorithm adds features one by one to maximize accuracy.', 'duration': 90.513, 'max_score': 960.828, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4960828.jpg'}, {'end': 1190.759, 'src': 'embed', 'start': 1125.06, 'weight': 1, 'content': [{'end': 1143.646, 'text': 'In backward search, what should you start with? 
In backward search you start with the full feature set, then you try removing features.', 'start': 1125.06, 'duration': 18.586}, {'end': 1150.908, 'text': 'You try removing features.', 'start': 1148.867, 'duration': 2.041}, {'end': 1155.897, 'text': 'from the features that you have.', 'start': 1154.056, 'duration': 1.841}, {'end': 1164.379, 'text': 'You find that feature whose removal gives rise to maximum improvement in performance.', 'start': 1156.637, 'duration': 7.742}, {'end': 1183.384, 'text': 'You try removing, you drop the feature with the smallest improvement.', 'start': 1168.16, 'duration': 15.224}, {'end': 1190.759, 'text': 'or with smallest impact on error.', 'start': 1186.855, 'duration': 3.904}], 'summary': 'In backward search, start with full feature set and remove features giving maximum performance improvement.', 'duration': 65.699, 'max_score': 1125.06, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw41125060.jpg'}], 'start': 856.233, 'title': 'Feature selection methods and heuristic algorithms', 'summary': 'Discusses feature selection in machine learning, emphasizing the importance of uncorrelated and relevant features. 
It also covers forward and backward selection algorithms for feature subset selection, detailing their processes and approach for improving model performance and efficiency.', 'chapters': [{'end': 929.927, 'start': 856.233, 'title': 'Feature selection methods', 'summary': 'Discusses the process of feature selection in machine learning, emphasizing the importance of selecting uncorrelated and relevant features to improve model performance and efficiency.', 'duration': 73.694, 'highlights': ['The importance of selecting uncorrelated features to avoid redundancy and improve model performance.', 'The process of eliminating irrelevant features to enhance the efficiency of the machine learning algorithm.']}, {'end': 1190.759, 'start': 931.588, 'title': 'Heuristic algorithms for feature selection', 'summary': 'Discusses two heuristic algorithms for feature subset selection, forward selection and backward selection. Forward selection starts with an empty feature set and adds features one by one, estimating the accuracy for each candidate and selecting the feature that gives maximum improvement. Backward selection starts with the full feature set and tries removing features to find the one whose removal gives maximum improvement in performance.', 'duration': 259.171, 'highlights': ['The forward selection algorithm starts with an empty feature set and adds features one by one, estimating the accuracy for each and selecting the feature that gives maximum improvement.', 'The backward selection algorithm starts with the full feature set and tries removing features to find the one that gives maximum improvement in performance. 
The backward selection algorithm begins with the full feature set and tries removing features to find the one that gives maximum improvement in performance.', 'The process of feature selection is based on evaluating the improvement in performance when adding or removing features, and the algorithm stops when there is no significant improvement. The feature selection process involves evaluating the improvement in performance when adding or removing features, and the algorithm stops when there is no significant improvement.']}], 'duration': 334.526, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw4856233.jpg', 'highlights': ['The forward selection algorithm iteratively adds features one by one, estimating the accuracy for each and selecting the feature that gives maximum improvement.', 'The backward selection algorithm begins with the full feature set and tries removing features to find the one that gives maximum improvement in performance.', 'The importance of selecting uncorrelated features to avoid redundancy and improve model performance.', 'The process of eliminating irrelevant features to enhance the efficiency of the machine learning algorithm.']}, {'end': 1422.362, 'segs': [{'end': 1281.48, 'src': 'heatmap', 'start': 1204.505, 'weight': 0, 'content': [{'end': 1215.294, 'text': 'Now, feature selection methods can be univariate methods which look at one feature at a time or can be multivariate feature.', 'start': 1204.505, 'duration': 10.789}, {'end': 1225.041, 'text': 'In univariate methods, so what we do is that we look at each feature independently of the other features.', 'start': 1215.894, 'duration': 9.147}, {'end': 1237.831, 'text': 'For this we can look for different methods, for example Pearson correlation, coefficient f score, chi square signal to noise ratio,', 'start': 1225.902, 'duration': 11.929}, {'end': 1239.733, 'text': 'mutual information, etcetera.', 'start': 1237.831, 'duration': 
1.902}, {'end': 1245.978, 'text': 'And based on the method that we select, we will rank the features by importance.', 'start': 1240.574, 'duration': 5.404}, {'end': 1258.948, 'text': 'And the user will select a cutoff and we will take the top m features above the cutoff or top few features or features which have ranked above the cutoff.', 'start': 1246.598, 'duration': 12.35}, {'end': 1261.99, 'text': 'I will just talk about one or two of these methods.', 'start': 1259.068, 'duration': 2.922}, {'end': 1272.416, 'text': 'But these univariate methods, basically what they do is that they measure some type of correlation between two random variables.', 'start': 1263.772, 'duration': 8.644}, {'end': 1281.48, 'text': 'In this case we are measuring the correlation between a particular feature and the target variable.', 'start': 1272.976, 'duration': 8.504}], 'summary': 'Univariate feature selection methods rank features and select the top m using measures such as the Pearson correlation coefficient, F-score, chi-square, and mutual information.', 'duration': 84.936, 'max_score': 1204.505, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw41204505.jpg'}, {'end': 1422.362, 'src': 'embed', 'start': 1333.669, 'weight': 3, 'content': [{'end': 1338.711, 'text': 'So, you want to select features which have high positive or high negative correlation.', 'start': 1333.669, 'duration': 5.042}, {'end': 1340.926, 'text': 'this picture.', 'start': 1340.106, 'duration': 0.82}, {'end': 1351.711, 'text': 'this slide illustrates, you know, some points x and y, and for these points correlation is minus 1,', 'start': 1340.926, 'duration': 10.785}, {'end': 1355.392, 'text': 'for these points, correlation is between 0 and minus 1,', 'start': 1351.711, 'duration': 3.681}, {'end': 1358.574, 'text': 'for these points, correlation is between 0 and plus 1,', 'start': 1355.392, 'duration': 3.182}, {'end': 1366.007, 'text': 'these points, 
correlation is plus 1, and for these points, correlation is 0..', 'start': 1358.574, 'duration': 7.433}, {'end': 1368.488, 'text': 'So, that is Pearson correlation coefficient.', 'start': 1366.007, 'duration': 2.481}, {'end': 1373.291, 'text': 'I will just briefly mention another method called signal to noise ratio,', 'start': 1368.508, 'duration': 4.783}, {'end': 1380.815, 'text': 'which measures the difference in means divided by the difference in standard deviation between the two classes,', 'start': 1373.291, 'duration': 7.524}, {'end': 1390.1, 'text': 'and the formula is given by mu x minus mu y, divided by sigma x minus sigma y, and large values indicate a strong correlation.', 'start': 1380.815, 'duration': 9.285}, {'end': 1394.823, 'text': 'So, for every feature you can find the value of SNR.', 'start': 1390.6, 'duration': 4.223}, {'end': 1405.152, 'text': 'on with y and then sort them according to this value and select the largest values or those values which are above a cutoff.', 'start': 1395.783, 'duration': 9.369}, {'end': 1412.92, 'text': 'In contrast to the univariate selection methods, we also have multivariate selection methods.', 'start': 1406.494, 'duration': 6.426}, {'end': 1417.865, 'text': 'In multivariate selection methods, they consider all the features.', 'start': 1413.5, 'duration': 4.365}, {'end': 1422.362, 'text': 'For example, we have talked about linear regression.', 'start': 1418.68, 'duration': 3.682}], 'summary': 'Select features with high correlation or snr for better classification.', 'duration': 88.693, 'max_score': 1333.669, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw41333669.jpg'}], 'start': 1196.544, 'title': 'Feature subset selection methods', 'summary': 'Discusses univariate feature selection methods including pearson correlation, coefficient f score, chi square signal to noise ratio, and mutual information to rank features by importance and select top m features 
based on a user-defined cutoff. It also explains the correlation coefficient formula and the importance of selecting features with high positive or high negative correlation, with correlation values ranging from -1 to 1. Additionally, it introduces the Pearson correlation coefficient and signal-to-noise ratio as feature selection methods, with SNR measuring the difference in means divided by the difference in standard deviation, and the need to consider multivariate selection methods such as linear regression.', 'chapters': [{'end': 1281.48, 'start': 1196.544, 'title': 'Algorithm for feature subset selection', 'summary': 'Discusses the univariate feature selection methods including Pearson correlation coefficient, F-score, chi-square, signal-to-noise ratio, and mutual information to rank features by importance and select top m features based on a user-defined cutoff.', 'duration': 84.936, 'highlights': ['Univariate feature selection methods include Pearson correlation coefficient, F-score, chi-square, signal-to-noise ratio, and mutual information to rank features by importance.', 'The user will select a cutoff and the algorithm will take the top m features, or the features ranked above the cutoff.', 'Univariate methods measure the correlation between a feature and the target variable, helping in feature selection.']}, {'end': 1366.007, 'start': 1282, 'title': 'Correlation coefficient and feature selection', 'summary': 'Explains the correlation coefficient formula and the importance of selecting features with high positive or high negative correlation, with correlation values ranging from -1 to 1.', 'duration': 84.007, 'highlights': ['The correlation coefficient measures the strength and direction of the relationship between two variables, with values ranging from -1 to 1.', 'Features whose correlation with the target is close to 1 or -1 are strongly correlated with it and should be considered for selection.', 'The formula for the correlation coefficient is given by r = Σ(xi - x̄)(yi - ȳ) 
/ (√Σ(xi - x̄)²√Σ(yi - ȳ)²), where x̄ and ȳ are the average values of the random variables x and y, respectively.']}, {'end': 1422.362, 'start': 1366.007, 'title': 'Feature selection methods', 'summary': 'Introduces the Pearson correlation coefficient and signal-to-noise ratio as feature selection methods, with SNR measuring the difference in means divided by the difference in standard deviation, and the need to consider multivariate selection methods such as linear regression.', 'duration': 56.355, 'highlights': ['Signal to noise ratio (SNR) measures the difference in means divided by the difference in standard deviation between the two classes, and large values indicate a strong correlation.', 'For every feature, the SNR value can be computed; the features are then sorted by this value, selecting the largest values or those above a cutoff.', 'In contrast to univariate selection methods, multivariate selection methods, such as linear regression, consider all the features.']}], 'duration': 225.818, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw41196544.jpg', 'highlights': ['Univariate feature selection methods include Pearson correlation coefficient, F-score, chi-square, signal-to-noise ratio, and mutual information to rank features by importance.', 'The user will select a cutoff and the algorithm will take the top m features, or the features ranked above the cutoff.', 'Univariate methods measure the correlation between a feature and the target variable, helping in feature selection.', 'The correlation coefficient measures the strength and direction of the relationship between two variables, with values ranging from -1 to 1.', 'Features whose correlation with the target is close to 1 or -1 are strongly correlated with it and should be considered for selection.', 'Signal to noise ratio (SNR) measures the difference in means divided by the difference in standard deviation between the two classes, and large values indicate a strong 
correlation.', 'For every feature, the SNR value can be computed; the features are then sorted by this value, selecting the largest values or those above a cutoff.', 'In contrast to univariate selection methods, multivariate selection methods, such as linear regression, consider all the features.']}, {'end': 1559.338, 'segs': [{'end': 1559.338, 'src': 'embed', 'start': 1423.003, 'weight': 0, 'content': [{'end': 1426.525, 'text': 'So, linear regression will find a set of weights.', 'start': 1423.003, 'duration': 3.522}, {'end': 1436.992, 'text': 'In our class we refer to those weights as beta 0, beta 1, beta 2, ..., beta n, which are the coefficients of the n attributes and the value of the bias.', 'start': 1427.606, 'duration': 9.386}, {'end': 1440.034, 'text': 'The classification of a point is given by this.', 'start': 1437.752, 'duration': 2.282}, {'end': 1448.076, 'text': 'this equation of the line and the coefficient matrix is given by W transpose.', 'start': 1442.355, 'duration': 5.721}, {'end': 1451.697, 'text': 'Now you look at the values of this matrix.', 'start': 1448.936, 'duration': 2.761}, {'end': 1455.698, 'text': 'if W is small, the corresponding feature.', 'start': 1451.697, 'duration': 4.001}, {'end': 1462.999, 'text': 'if you have w i x i and w i is small, x i has less contribution to y.', 'start': 1455.698, 'duration': 7.301}, {'end': 1467.64, 'text': 'but if w i is large, x i has large contribution to y.', 'start': 1462.999, 'duration': 4.641}, {'end': 1470.641, 'text': 'if w i is small, it has small contribution.', 'start': 1467.64, 'duration': 3.001}, {'end': 1486.844, 'text': 'So, for example, if w values are 10, 0.01, minus 9, the features whose coefficients are 10 and minus 9, they have high contribution to the output,', 'start': 1471.821, 'duration': 15.023}, {'end': 1492.105, 'text': 'but the one with coefficient 0.01 has low contribution to the output.', 'start': 1486.844, 'duration': 5.261}, {'end': 1493.865, 'text': 'So, we can drop this feature.', 
'start': 1492.185, 'duration': 1.68}, {'end': 1497.966, 'text': 'So, this is multivariate feature selection.', 'start': 1495.725, 'duration': 2.241}, {'end': 1504.068, 'text': 'The W can be obtained by any of the linear classifiers.', 'start': 1499.625, 'duration': 4.443}, {'end': 1507.129, 'text': 'For example, we have seen a method in our class.', 'start': 1504.528, 'duration': 2.601}, {'end': 1514.974, 'text': 'A variant of this multivariate feature selection approach is called recursive feature elimination.', 'start': 1508.11, 'duration': 6.864}, {'end': 1524.4, 'text': 'In recursive feature elimination, you compute the W on all the features and you remove the feature with the smallest value of W.', 'start': 1515.414, 'duration': 8.986}, {'end': 1528.026, 'text': 'Then you recompute w on the reduced data.', 'start': 1525.68, 'duration': 2.346}, {'end': 1529.83, 'text': 'So, initially you have n features.', 'start': 1528.166, 'duration': 1.664}, {'end': 1536.152, 'text': 'you remove the feature with the smallest w, you are left with n minus 1 features.', 'start': 1530.871, 'duration': 5.281}, {'end': 1547.375, 'text': 'On these n minus 1 features you recompute w by invoking your algorithm again and recursively you keep doing this until the stopping criterion is met.', 'start': 1536.692, 'duration': 10.683}, {'end': 1550.716, 'text': 'This is about multivariate feature selection.', 'start': 1547.735, 'duration': 2.981}, {'end': 1553.857, 'text': "With this we come to the end of today's lecture.", 'start': 1551.116, 'duration': 2.741}, {'end': 1558.818, 'text': 'In the next volume, next part we will talk about feature extraction.', 'start': 1554.117, 'duration': 4.701}, {'end': 1559.338, 'text': 'Thank you.', 'start': 1558.978, 'duration': 0.36}], 'summary': 'Linear regression finds coefficients for multivariate feature selection, where smaller coefficients indicate less contribution to the output, leading to a method called recursive feature elimination.', 'duration': 
136.335, 'max_score': 1423.003, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw41423003.jpg'}], 'start': 1423.003, 'title': 'Multivariate feature selection', 'summary': 'Explains multivariate feature selection using linear regression to find coefficients of attributes, emphasizing the impact of coefficient values on feature contributions, and discusses recursive feature elimination as a method for selecting relevant features.', 'chapters': [{'end': 1559.338, 'start': 1423.003, 'title': 'Multivariate feature selection', 'summary': 'Explains multivariate feature selection using linear regression to find coefficients of attributes, emphasizing the impact of coefficient values on feature contributions, and discusses recursive feature elimination as a method for selecting relevant features, concluding with a mention of the upcoming topic on feature extraction.', 'duration': 136.335, 'highlights': ['The coefficients of the attributes in linear regression, denoted as beta 0, beta 1, beta 2, ..., beta n, determine the contribution of each feature to the output.', 'The magnitude of each coefficient in W determines how much the corresponding feature contributes to the output; coefficients with large magnitude (such as 10 or -9) indicate high contribution, while coefficients near zero (such as 0.01) indicate low contribution.', "Multivariate feature selection involves dropping features with small coefficients, while retaining those with large coefficients, to optimize the model's performance.", 'Recursive feature elimination is a variant of multivariate feature selection in which the feature with the smallest coefficient is removed and the weights are recomputed on the remaining features, repeating until a stopping criterion is met, thereby reducing the feature set.', 'The upcoming topic to be discussed is feature extraction, marking the end of the lecture on multivariate feature selection.']}], 'duration': 136.335, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/KTzXVnRlnw4/pics/KTzXVnRlnw41423003.jpg', 'highlights': ['The magnitude of each coefficient in W determines how much the corresponding feature contributes to the output; coefficients with large magnitude (such as 10 or -9) indicate high contribution, while coefficients near zero (such as 0.01) indicate low contribution.', 'Recursive feature elimination is a variant of multivariate feature selection in which the feature with the smallest coefficient is removed and the weights are recomputed on the remaining features, repeating until a stopping criterion is met, thereby reducing the feature set.', "Multivariate feature selection involves dropping features with small coefficients, while retaining those with large coefficients, to optimize the model's performance.", 'The coefficients of the attributes in linear regression, denoted as beta 0, beta 1, beta 2, ..., beta n, determine the contribution of each feature to the output.', 'The upcoming topic to be discussed is feature extraction, marking the end of the lecture on multivariate feature selection.']}], 'highlights': ['Feature reduction aims to optimize classification accuracy and reduce classifier complexity.', 'The assumption that more features always lead to better information or classification performance is challenged.', 'The curse of dimensionality in machine learning is caused by irrelevant and redundant features.', 'The number of subsets possible for n features is 2 to the power n, making it impractical to enumerate each possible subset when n is large.', 'Feature subset selection can utilize optimum, heuristic, or randomized methods based on the structure of the hypothesis space or the feature subset space.', 'The feature selection process in machine learning involves an optimization problem where a search algorithm selects a feature subset and scores it on an objective function.', 'The forward selection algorithm iteratively adds features one by one, estimating the accuracy for each and selecting the feature that gives maximum improvement.', 
'Univariate feature selection methods include Pearson correlation coefficient, F-score, chi-square, signal-to-noise ratio, and mutual information to rank features by importance.', 'The magnitude of each coefficient in W determines how much the corresponding feature contributes to the output; coefficients with large magnitude (such as 10 or -9) indicate high contribution, while coefficients near zero (such as 0.01) indicate low contribution.', 'Recursive feature elimination is a variant of multivariate feature selection in which the feature with the smallest coefficient is removed and the weights are recomputed on the remaining features, repeating until a stopping criterion is met, thereby reducing the feature set.']}
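
The univariate ranking described in the transcript can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the lecture: the function names `pearson`, `rank_features`, and `select_top`, and the toy data, are hypothetical. It computes the Pearson correlation between each feature and the target using the formula quoted in the highlights, ranks features by the absolute value of r (so that strong negative correlation counts as much as strong positive correlation), and keeps the top m.

```python
import math


def pearson(xs, ys):
    """Pearson correlation r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    # A constant column has zero variance; treat it as uncorrelated.
    return 0.0 if den == 0 else num / den


def rank_features(columns, target):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scores = {name: pearson(col, target) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def select_top(columns, target, m):
    """Keep the names of the m features most correlated with the target."""
    return [name for name, _ in rank_features(columns, target)[:m]]


# Hypothetical toy data: one positively correlated feature, one negatively
# correlated feature, and one feature that is uncorrelated with the target.
columns = {
    "f_pos": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f_neg": [10.0, 8.0, 6.0, 4.0, 2.0],
    "f_noise": [3.0, 1.0, 3.0, 1.0, 3.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

selected = select_top(columns, target, m=2)
```

With this toy data the two perfectly correlated features are kept and the uncorrelated one is dropped. A user-defined cutoff on |r| could replace the fixed m, as the transcript mentions.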