title

Kernels Introduction - Practical Machine Learning Tutorial with Python p.29

description

In this machine learning tutorial, we introduce the concept of kernels. Kernels can be used with the Support Vector Machine to view the data from a new perspective, translating it into higher dimensions in the hope of finding a linearly separable case.
https://pythonprogramming.net
https://twitter.com/sentdex
https://www.facebook.com/pythonprogramming.net/
https://plus.google.com/+sentdex

detail

{'title': 'Kernels Introduction - Practical Machine Learning Tutorial with Python p.29', 'heatmap': [{'end': 259.668, 'start': 230.113, 'weight': 0.784}], 'summary': 'Covers challenges of non-linearly separable data in support vector machines, adding dimensions leads to a 50% increase in data, potential training issues, using kernels to augment svms for handling nonlinear data, and the concept of inner product and its interchangeability for classification algorithms.', 'chapters': [{'end': 169.478, 'segs': [{'end': 56.907, 'src': 'embed', 'start': 30.01, 'weight': 1, 'content': [{'end': 40.201, 'text': "Now unfortunately, it's just the case that in the real world, you are almost certainly not going to get linearly separable data.", 'start': 30.01, 'duration': 10.191}, {'end': 41.823, 'text': "It's just highly unlikely.", 'start': 40.321, 'duration': 1.502}, {'end': 50.164, 'text': 'so, for example, what we have here is a two-dimensional feature set with the classes plotted on.', 'start': 43.641, 'duration': 6.523}, {'end': 56.907, 'text': 'you got the plus class and the minus class here and, of course, if i was to ask you okay, what are the support vectors right,', 'start': 50.164, 'duration': 6.743}], 'summary': 'Real-world data is unlikely to be linearly separable.', 'duration': 26.897, 'max_score': 30.01, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c30010.jpg'}, {'end': 169.478, 'src': 'embed', 'start': 117.974, 'weight': 0, 'content': [{'end': 121.036, 'text': "you're doing image analysis or video analysis basically the same thing.", 'start': 117.974, 'duration': 3.062}, {'end': 126.919, 'text': 'You might have hundreds of features or thousands of features even.', 'start': 122.316, 'duration': 4.603}, {'end': 132.684, 'text': 'And so looking at the data we have right here, obviously we just added one more dimension.', 'start': 127.461, 'duration': 5.223}, {'end': 134.665, 'text': 'We went from two 
dimensions to three dimensions.', 'start': 132.744, 'duration': 1.921}, {'end': 141.709, 'text': "Not a horrible thing, but actually, in reality, that's a 50% increase in data, right?", 'start': 135.065, 'duration': 6.644}, {'end': 145.291, 'text': 'And what was that main downfall of the support vector machine??', 'start': 141.789, 'duration': 3.502}, {'end': 145.892, 'text': 'Oh right,', 'start': 145.491, 'duration': 0.401}, {'end': 150.894, 'text': 'It was training it because if you have a large size of data,', 'start': 146.332, 'duration': 4.562}, {'end': 157.218, 'text': 'it is extremely cumbersome to train it because of that quadratic programming and optimization problem that we have.', 'start': 150.894, 'duration': 6.324}, {'end': 169.478, 'text': 'So is it really a great idea to be multiplying our data set maybe by 1.5x? Probably not.', 'start': 157.995, 'duration': 11.483}], 'summary': 'Increasing dimensions by 50% can lead to 50% increase in data, affecting training of support vector machine.', 'duration': 51.504, 'max_score': 117.974, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c117974.jpg'}], 'start': 1.937, 'title': 'Support vector machine reality check', 'summary': 'Discusses the challenges of working with non-linearly separable data in support vector machines, and the implications of adding dimensions to the feature set, leading to a 50% increase in data and potential training issues.', 'chapters': [{'end': 169.478, 'start': 1.937, 'title': 'Support vector machine reality check', 'summary': 'Discusses the challenges of working with non-linearly separable data in support vector machines, and the implications of adding dimensions to the feature set, leading to a 50% increase in data and potential training issues.', 'duration': 167.541, 'highlights': ['Adding a new dimension to the feature set can lead to a 50% increase in data, posing potential training issues for the Support Vector Machine. 
The addition of a new dimension to the feature set resulted in a 50% increase in data, which could lead to training issues for the Support Vector Machine due to its quadratic programming and optimization problem.', "Real-world data is unlikely to be linearly separable, presenting challenges for the Support Vector Machine. In the real world, linearly separable data is highly unlikely, posing a challenge for the Support Vector Machine's functionality.", 'Challenges arise when working with large feature sets, such as in image or video analysis, due to potential data multiplication and training complexity. When dealing with large feature sets, such as in image or video analysis, the addition of dimensions could lead to a significant increase in data, resulting in potential training complexity for the Support Vector Machine.']}], 'duration': 167.541, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c1937.jpg', 'highlights': ['Adding a new dimension to the feature set can lead to a 50% increase in data, posing potential training issues for the Support Vector Machine.', 'Real-world data is unlikely to be linearly separable, presenting challenges for the Support Vector Machine.', 'Challenges arise when working with large feature sets, such as in image or video analysis, due to potential data multiplication and training complexity.']}, {'end': 580.955, 'segs': [{'end': 267.831, 'src': 'heatmap', 'start': 230.113, 'weight': 0, 'content': [{'end': 237.317, 'text': 'and may find themselves under the impression that kernels are really just something used in a support vector machine.', 'start': 230.113, 'duration': 7.204}, {'end': 237.777, 'text': "and that's it.", 'start': 237.317, 'duration': 0.46}, {'end': 239.438, 'text': "That's not the case.", 'start': 238.658, 'duration': 0.78}, {'end': 243.4, 'text': 'A kernel is a similarity function.', 'start': 239.879, 'duration': 3.521}, {'end': 246.502, 'text': 'It takes two 
inputs and outputs their similarity.', 'start': 243.46, 'duration': 3.042}, {'end': 247.903, 'text': 'Simple as that.', 'start': 247.182, 'duration': 0.721}, {'end': 254.327, 'text': 'Being that this is a machine learning tutorial, some of you might be thinking already well,', 'start': 249.266, 'duration': 5.061}, {'end': 257.327, 'text': "why don't people just use kernels as machine learning classifiers?", 'start': 254.327, 'duration': 3}, {'end': 259.668, 'text': 'And I would say that they do.', 'start': 257.648, 'duration': 2.02}, {'end': 267.831, 'text': 'It just so happens that we can actually use kernels to augment or add to our support vector machine, hopefully.', 'start': 260.569, 'duration': 7.262}], 'summary': 'Kernels are not just for support vector machines; they can augment classifiers.', 'duration': 80.226, 'max_score': 230.113, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c230113.jpg'}, {'end': 386.736, 'src': 'embed', 'start': 356.623, 'weight': 3, 'content': [{'end': 365.351, 'text': "So, with kernels, there's really just one major element to kernels, and that is that they are done using inner product.", 'start': 356.623, 'duration': 8.728}, {'end': 371.818, 'text': "For the purposes of what we're going to be using them for and the purposes of this entire tutorial.", 'start': 366.952, 'duration': 4.866}, {'end': 372.859, 'text': 'what is inner product?', 'start': 371.818, 'duration': 1.041}, {'end': 374.021, 'text': 'what is dot product?', 'start': 372.859, 'duration': 1.162}, {'end': 374.741, 'text': 'how do they relate??', 'start': 374.021, 'duration': 0.72}, {'end': 377.124, 'text': 'They are the same thing.', 'start': 375.582, 'duration': 1.542}, {'end': 386.736, 'text': "If you pull up NumPy right now, you make a couple vectors and you do np.inner or np.dot, you're going to find.", 'start': 377.164, 'duration': 9.572}], 'summary': 'Kernels are done using inner product, which is the 
same as dot product.', 'duration': 30.113, 'max_score': 356.623, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c356623.jpg'}, {'end': 444.428, 'src': 'embed', 'start': 418.506, 'weight': 5, 'content': [{'end': 424.849, 'text': 'So, to find out if we can use a kernel, we know that in order to use a kernel, we have to be able to use inner product.', 'start': 418.506, 'duration': 6.343}, {'end': 430.471, 'text': "And the way that we can find out is basically we're trying to get to some sort of new dimensional space.", 'start': 424.889, 'duration': 5.582}, {'end': 435.813, 'text': "So up to this point, we've been using basically we've got x.", 'start': 431.011, 'duration': 4.802}, {'end': 437.874, 'text': "We're dealing in an x space.", 'start': 435.813, 'duration': 2.061}, {'end': 444.428, 'text': 'Okay, because our feature sets are denoted as x1, x2, and so on.', 'start': 439.42, 'duration': 5.008}], 'summary': 'Exploring usage of kernel and inner product in dimensional space.', 'duration': 25.922, 'max_score': 418.506, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c418506.jpg'}, {'end': 585.407, 'src': 'embed', 'start': 560.2, 'weight': 4, 'content': [{'end': 565.164, 'text': "Anyway, 5 dimensions or 50 dimensions, does it matter? No, it doesn't, because it's just going to return a scalar.", 'start': 560.2, 'duration': 4.964}, {'end': 572.949, 'text': 'So modifying x you know our x space to a z space, an unknown dimension space.', 'start': 565.524, 'duration': 7.425}, {'end': 578.473, 'text': 'is that going to have an effect on the classification algorithm, that old sign algorithm?', 'start': 572.949, 'duration': 5.524}, {'end': 580.955, 'text': 'Nope, no problem there.', 'start': 579.634, 'duration': 1.321}, {'end': 585.407, 'text': 'Now, how about the constraints? 
Probably where a lot of people stopped watching the series.', 'start': 582.406, 'duration': 3.001}], 'summary': "Modifying dimensions doesn't affect classification algorithm or constraints.", 'duration': 25.207, 'max_score': 560.2, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c560200.jpg'}], 'start': 169.498, 'title': 'Understanding kernels and inner product in machine learning', 'summary': 'Discusses the potential of using kernels to augment support vector machines for handling nonlinear data, emphasizing their role in avoiding high processing costs. it also explains the concept of kernels and inner product, highlighting their interchangeability and essentiality for classification algorithms.', 'chapters': [{'end': 355.642, 'start': 169.498, 'title': 'Kernels in machine learning', 'summary': 'Discusses the potential of using kernels to perform calculations in implausibly infinite dimensions, explaining their role in augmenting support vector machines to handle nonlinear data and avoid high processing costs.', 'duration': 186.144, 'highlights': ['Kernels enable calculations in implausibly infinite dimensions, offering a way to transform nonlinear data and create linearly separable situations, thus avoiding high processing costs associated with adding dimensions (e.g. 50 or more).', 'Kernels serve as a similarity function, taking two inputs and outputting their similarity, providing a valuable tool for working with nonlinear data by transforming it to higher dimensions and creating linearly separable situations.', 'Using kernels to augment support vector machines allows for the handling of nonlinear data by transforming it to higher dimensions, potentially requiring a significant number of dimensions (e.g. 
100 or a million) to achieve linear separability, highlighting the potential computational savings.']}, {'end': 580.955, 'start': 356.623, 'title': 'Understanding kernels and inner product', 'summary': 'Explains the concept of kernels and inner product, emphasizing that they are interchangeable and essential for classification algorithms, with a focus on the use of inner product and the interchangeability of x and z spaces.', 'duration': 224.332, 'highlights': ['Kernels are done using inner product, which is interchangeable with dot product, as confirmed by np.inner or np.dot yielding the same results.', 'In order to use a kernel, one must be able to use inner product and determine if interchangeability between x and z spaces will affect the classification algorithm.', 'The classification algorithm is based on the sign of w.x plus b, which essentially constitutes a dot product, making it feasible to interchange x with z space regardless of the dimensions.', 'The interchangeability of x and z spaces will not affect the classification algorithm based on the sign algorithm, as it returns a scalar value.', 'The use of inner product and interchangeability of x and z spaces are fundamental to understanding the concept of kernels and their relevance in classification algorithms.']}], 'duration': 411.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c169498.jpg', 'highlights': ['Kernels enable calculations in implausibly infinite dimensions, transforming nonlinear data to create linearly separable situations, avoiding high processing costs.', 'Using kernels to augment support vector machines allows for handling nonlinear data by transforming it to higher dimensions, potentially requiring a significant number of dimensions (e.g. 
100 or a million) to achieve linear separability, highlighting the potential computational savings.', 'Kernels serve as a similarity function, providing a valuable tool for working with nonlinear data by transforming it to higher dimensions and creating linearly separable situations.', 'Kernels are done using inner product, which is interchangeable with dot product, as confirmed by np.inner or np.dot yielding the same results.', 'The interchangeability of x and z spaces will not affect the classification algorithm based on the sign algorithm, as it returns a scalar value.', 'The use of inner product and interchangeability of x and z spaces are fundamental to understanding the concept of kernels and their relevance in classification algorithms.']}, {'end': 920.966, 'segs': [{'end': 632.8, 'src': 'embed', 'start': 607.552, 'weight': 0, 'content': [{'end': 624.177, 'text': 'So the first constraint was the requirement that y sub i, multiplied by x sub i, w, plus, b, minus one, was greater than or equal to zero.', 'start': 607.552, 'duration': 16.625}, {'end': 626.278, 'text': "right, and again that's.", 'start': 624.177, 'duration': 2.101}, {'end': 627.018, 'text': "uh, we didn't have.", 'start': 626.278, 'duration': 0.74}, {'end': 631.139, 'text': "we don't have the dot there, but it's just kind of expected that you you know that by this point.", 'start': 627.018, 'duration': 4.121}, {'end': 632.8, 'text': "but anyway, that is, there's your dot.", 'start': 631.139, 'duration': 1.661}], 'summary': 'First constraint: y_i (x_i . w + b) - 1 >= 0', 'duration': 25.248, 'max_score': 607.552, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c607552.jpg'}, {'end': 685.6, 'src': 'embed', 'start': 653.441, 'weight': 1, 'content': [{'end': 662.926, 'text': "What about that other constraint? 
Well, the other formal constraint was that basically, it wasn't even really a constraint, it's just a value.", 'start': 653.441, 'duration': 9.485}, {'end': 669.729, 'text': 'Once we found, if you recall, the quadratic programming problem was basically going to give us some values for alpha.', 'start': 663.106, 'duration': 6.623}, {'end': 673.591, 'text': 'So that was going to be the value for eventually w.', 'start': 670.27, 'duration': 3.321}, {'end': 685.6, 'text': 'And that was going to eventually give us basically the sum over alpha i, y i, x i.', 'start': 674.972, 'duration': 10.628}], 'summary': 'Quadratic programming problem yields values for alpha, eventually leading to sum over alpha i, y i, x i.', 'duration': 32.159, 'max_score': 653.441, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c653441.jpg'}, {'end': 840.732, 'src': 'embed', 'start': 815.004, 'weight': 2, 'content': [{'end': 819.365, 'text': 'The main value here is that you can go out to all those dimensions without actually paying the processing cost.', 'start': 815.004, 'duration': 4.361}, {'end': 826.908, 'text': "And that's the entire reason why we're going to use kernels rather than some sort of function that just creates new dimensions,", 'start': 820.306, 'duration': 6.602}, {'end': 828.728, 'text': 'like what we wrote out before.', 'start': 826.908, 'duration': 1.82}, {'end': 833.85, 'text': "So that's what we're going to be talking about in the next tutorial is actually applying a kernel,", 'start': 828.748, 'duration': 5.102}, {'end': 839.011, 'text': 'working out that kernel by hand and truly showing it.', 'start': 833.85, 'duration': 5.161}, {'end': 840.732, 'text': 'Because we can actually do it with one of the kernels.', 'start': 839.031, 'duration': 1.701}], 'summary': 'Using kernels allows for efficient dimension expansion without added processing cost.', 'duration': 25.728, 'max_score': 815.004, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c815004.jpg'}, {'end': 911.09, 'src': 'embed', 'start': 870.599, 'weight': 3, 'content': [{'end': 877.306, 'text': 'We can restate many of the machine learning algorithms or even create our own machine learning algorithms purely with kernels.', 'start': 870.599, 'duration': 6.707}, {'end': 882.932, 'text': 'So a kernel just takes two inputs and outputs the similarity of them using the inner product.', 'start': 877.727, 'duration': 5.205}, {'end': 890.257, 'text': "Inner product is a projection of, let's say, X1 onto X2.", 'start': 883.653, 'duration': 6.604}, {'end': 897.782, 'text': "Basically, how much overlapping do we have going on there, which is how and why it's considered to be a degree of similarity.", 'start': 890.577, 'duration': 7.205}, {'end': 905.747, 'text': 'So we know that we can use a kernel to help us transform our feature space, our X space,', 'start': 898.902, 'duration': 6.845}, {'end': 911.09, 'text': 'because every interaction with that feature space is an inner product reaction.', 'start': 905.747, 'duration': 5.343}], 'summary': 'Kernels can transform feature space using inner product, increasing similarity.', 'duration': 40.491, 'max_score': 870.599, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c870599.jpg'}], 'start': 582.406, 'title': 'Constraints in quadratic programming and understanding kernels in support vector machines', 'summary': 'Discusses constraints in quadratic programming, including the requirement of y sub i multiplied by x sub i, w, plus, b, minus one, being greater than or equal to zero. 
it also explains the concept of kernels in support vector machines, highlighting their ability to transform feature sets and the role of inner product in measuring similarity.', 'chapters': [{'end': 760.317, 'start': 582.406, 'title': 'Constraints in quadratic programming', 'summary': 'Discusses the constraints in quadratic programming, including the requirement of y sub i multiplied by x sub i, w, plus, b, minus one, being greater than or equal to zero, and the values for alpha in the quadratic programming problem eventually giving the sum over alpha i, y i, x i.', 'duration': 177.911, 'highlights': ['The requirement that y sub i multiplied by x sub i, w, plus, b, minus one, be greater than or equal to zero is a major constraint in the quadratic programming.', 'The quadratic programming problem gives us values for alpha, eventually determining the sum over alpha i, y i, x i, which is an important aspect to deal with.']}, {'end': 920.966, 'start': 760.717, 'title': 'Understanding kernels in support vector machines', 'summary': 'Explains the concept of kernels in support vector machines, highlighting their ability to transform feature sets, the role of inner product in measuring similarity, and the versatility of kernels in machine learning algorithms.', 'duration': 160.249, 'highlights': ['The main value of kernels is the ability to go out to all those dimensions without paying the processing cost. Kernels allow for expanding to multiple dimensions without incurring processing costs.', 'A kernel is just a similarity function and can be used to restate many machine learning algorithms or create new ones. Kernels serve as a similarity function and can be applied to various machine learning algorithms.', 'Inner product is a projection of X1 onto X2, representing the degree of similarity between the two inputs. 
The inner product measures the similarity between two inputs based on their overlap.', 'Every interaction with the feature space is an inner product reaction, enabling the transformation of the feature space using kernels. Kernels facilitate the transformation of the feature space through inner product interactions.']}], 'duration': 338.56, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9IfT8KXX_9c/pics/9IfT8KXX_9c582406.jpg', 'highlights': ['The requirement that y sub i multiplied by x sub i, w, plus, b, minus one, be greater than or equal to zero is a major constraint in the quadratic programming.', 'The quadratic programming problem gives us values for alpha, eventually determining the sum over alpha i, y i, x i, which is an important aspect to deal with.', 'Kernels allow for expanding to multiple dimensions without incurring processing costs.', 'A kernel is just a similarity function and can be used to restate many machine learning algorithms or create new ones.', 'Inner product measures the similarity between two inputs based on their overlap.', 'Kernels facilitate the transformation of the feature space through inner product interactions.']}], 'highlights': ['Kernels enable calculations in implausibly infinite dimensions, transforming nonlinear data to create linearly separable situations, avoiding high processing costs.', 'Using kernels to augment support vector machines allows for handling nonlinear data by transforming it to higher dimensions, potentially requiring a significant number of dimensions (e.g. 
100 or a million) to achieve linear separability, highlighting the potential computational savings.', 'The interchangeability of x and z spaces will not affect the classification algorithm based on the sign algorithm, as it returns a scalar value.', 'The use of inner product and interchangeability of x and z spaces are fundamental to understanding the concept of kernels and their relevance in classification algorithms.', 'The requirement that y sub i multiplied by x sub i, w, plus, b, minus one, be greater than or equal to zero is a major constraint in the quadratic programming.', 'The quadratic programming problem gives us values for alpha, eventually determining the sum over alpha i, y i, x i, which is an important aspect to deal with.', 'Kernels allow for expanding to multiple dimensions without incurring processing costs.', 'A kernel is just a similarity function and can be used to restate many machine learning algorithms or create new ones.', 'Inner product measures the similarity between two inputs based on their overlap.', 'Kernels facilitate the transformation of the feature space through inner product interactions.', 'Challenges arise when working with large feature sets, such as in image or video analysis, due to potential data multiplication and training complexity.', 'Real-world data is unlikely to be linearly separable, presenting challenges for the Support Vector Machine.', 'Adding a new dimension to the feature set can lead to a 50% increase in data, posing potential training issues for the Support Vector Machine.']}
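The two central claims in the summary above — that np.inner and np.dot agree for vectors, and that a kernel computes the inner product in a higher-dimensional z space without ever building that space — can be checked directly. The sketch below is not from the video itself; the second-order polynomial kernel and its explicit feature map are a standard example used here purely for illustration.

```python
import numpy as np

# Two small feature vectors in the original x space.
x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 4.0])

# For 1-D vectors, np.inner and np.dot return the same scalar (here 11.0).
assert np.inner(x1, x2) == np.dot(x1, x2)

# A second-order polynomial kernel: K(a, b) = (1 + a.b)^2.
def poly_kernel(a, b):
    return (1.0 + np.dot(a, b)) ** 2

# The explicit feature map z(a) this kernel corresponds to for 2-D input:
# z(a) = [1, sqrt(2)a1, sqrt(2)a2, a1^2, sqrt(2)a1a2, a2^2], a 6-D z space.
def z(a):
    a1, a2 = a
    return np.array([1.0,
                     np.sqrt(2) * a1,
                     np.sqrt(2) * a2,
                     a1 ** 2,
                     np.sqrt(2) * a1 * a2,
                     a2 ** 2])

# The kernel yields the 6-D inner product, (1 + 11)^2 = 144,
# while only ever touching the original 2-D vectors.
assert np.isclose(poly_kernel(x1, x2), np.dot(z(x1), z(x2)))
```

The same equality holds whatever the input dimension, which is the computational saving the video emphasizes: the kernel returns the scalar the sign(w.x + b) decision rule needs, with no explicit trip through the z space.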