title

Lecture 15 - Kernel Methods

description

Kernel Methods - Extending SVM to infinite-dimensional spaces using the kernel trick, and to non-separable data using soft margins. Lecture 15 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, http://creativecommons.org/licenses/by-nc-nd/3.0/
This lecture was recorded on May 22, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.
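As a rough illustration of the kernel trick named in the description, the sketch below checks numerically that a second-order polynomial kernel computed in the input space matches an inner product taken after an explicit transform. The function names (`phi`, `dot`, `poly_kernel`), the square-root-of-two scaling of the transform, and the sample points are assumptions chosen for this example; they are not taken from the course materials.

```python
import math

def phi(x):
    """Explicit second-order transform of a 2-D point.

    The sqrt(2) factors are an assumed scaling that makes the inner
    product in z space equal to (1 + x.x')^2.
    """
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, x1 * x1, x2 * x2, r2 * x1, r2 * x2, r2 * x1 * x2]

def dot(u, v):
    """Plain Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, xp, degree=2):
    """K(x, x') = (1 + x.x')^degree, computed without leaving the input space."""
    return (1.0 + dot(x, xp)) ** degree

# Hypothetical sample points, used only to compare the two computations.
x, xp = (0.5, -1.0), (2.0, 3.0)
k_direct = poly_kernel(x, xp)      # kernel evaluated in x space
k_via_z = dot(phi(x), phi(xp))     # inner product after the explicit transform
print(k_direct, k_via_z)
```

The two printed values agree up to floating-point rounding, which is the point of the trick: the kernel needs only an inner product in the input space, while the explicit route has to build every coordinate of the transformed vectors first.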

detail

{'title': 'Lecture 15 - Kernel Methods', 'heatmap': [{'end': 758.383, 'start': 702.489, 'weight': 0.744}, {'end': 987.424, 'start': 892.194, 'weight': 0.787}, {'end': 1695.431, 'start': 1642.762, 'weight': 0.808}, {'end': 2725.525, 'start': 2623.839, 'weight': 0.787}, {'end': 3195.859, 'start': 3053.004, 'weight': 0.702}, {'end': 3807.056, 'start': 3568.718, 'weight': 0.763}], 'summary': 'This lecture covers support vector machines, emphasizing the importance of margin maximization and the use of lagrangian and quadratic programming to solve for alphas, explores support vectors, kernel methods, inner product in z space, kernel functions, transformation, validation, error measures, constraints on alpha and beta, and dimensionality in the context of svm.', 'chapters': [{'end': 151.065, 'segs': [{'end': 90.033, 'src': 'embed', 'start': 0.703, 'weight': 0, 'content': [{'end': 1.204, 'text': 'Welcome back.', 'start': 0.703, 'duration': 0.501}, {'end': 3.586, 'text': 'Last time, we introduced support vector machines.', 'start': 1.224, 'duration': 2.362}, {'end': 30.814, 'text': 'And if you think of linear models as economy cars, which is what we said when we introduced them,', 'start': 23.551, 'duration': 7.263}, {'end': 34.736, 'text': 'you can think of support vector machines as the luxury line of those cars.', 'start': 30.814, 'duration': 3.922}, {'end': 43.819, 'text': 'And indeed, they are nothing but a linear model in the simplest form, except that they actually are a little bit more keen on the performance.', 'start': 35.976, 'duration': 7.843}, {'end': 47.801, 'text': 'And the key to the performance was the idea of the margin.', 'start': 44.88, 'duration': 2.921}, {'end': 55.85, 'text': 'is that if the data is linearly separable, there is more than one line that can separate the data.', 'start': 48.686, 'duration': 7.164}, {'end': 63.694, 'text': 'And if you take the line that has the biggest margin, furthest away from the closest point, then you have an 
advantage.', 'start': 56.77, 'duration': 6.924}, {'end': 69.437, 'text': "It's both an intuitive advantage and an advantage that can be theoretically established,", 'start': 63.994, 'duration': 5.443}, {'end': 72.459, 'text': 'which we did through the idea of the growth function in this case.', 'start': 69.437, 'duration': 3.022}, {'end': 78.443, 'text': "And after we determined that it's a good idea to maximize the margin, we set out to do that.", 'start': 73.599, 'duration': 4.844}, {'end': 84.568, 'text': 'And after a chain of mathematics, we ended up with a Lagrangian that we are going to maximize.', 'start': 79.344, 'duration': 5.224}, {'end': 87.35, 'text': 'And the Lagrangian has very interesting properties.', 'start': 85.209, 'duration': 2.141}, {'end': 90.033, 'text': "It's quadratic, so it's a simple function.", 'start': 87.491, 'duration': 2.542}], 'summary': 'Support vector machines maximize margin for better performance.', 'duration': 89.33, 'max_score': 0.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU703.jpg'}, {'end': 135.381, 'src': 'embed', 'start': 110.208, 'weight': 3, 'content': [{'end': 116.071, 'text': 'Now, quadratic programming will have problems with solving this if the number of examples is bigger.', 'start': 110.208, 'duration': 5.863}, {'end': 120.433, 'text': 'So once you get to thousands, it becomes an issue.', 'start': 116.811, 'duration': 3.622}, {'end': 123.535, 'text': 'And then there are all kinds of heuristics to deal with that case.', 'start': 120.793, 'duration': 2.742}, {'end': 129.598, 'text': 'And in general, quadratic programming sometimes needs babysitting, tweaking, limiting range and whatnot.', 'start': 124.115, 'duration': 5.483}, {'end': 135.381, 'text': 'But at least someone else wrote it, and we only have to do these things in order to get the solution, rather than to write this from scratch.', 'start': 129.978, 'duration': 5.403}], 'summary': 'Quadratic 
programming struggles with large datasets, requiring heuristics and tweaking, but saves time compared to writing from scratch.', 'duration': 25.173, 'max_score': 110.208, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU110208.jpg'}], 'start': 0.703, 'title': 'Support vector machines', 'summary': 'Introduces support vector machines as a luxury line of linear models, emphasizing the importance of maximizing the margin for performance, and the use of lagrangian and quadratic programming to solve for alphas, with the majority of alphas being 0.', 'chapters': [{'end': 151.065, 'start': 0.703, 'title': 'Support vector machines: luxury line of linear models', 'summary': 'Introduces support vector machines as a luxury line of linear models, emphasizing the importance of maximizing the margin for performance, and the use of lagrangian and quadratic programming to solve for alphas, with the majority of alphas being 0.', 'duration': 150.362, 'highlights': ['The importance of maximizing the margin for performance by selecting the line with the biggest margin, as it provides an advantage both intuitively and theoretically, as established through the idea of the growth function.', 'The use of Lagrangian, a quadratic function with simple inequality constraints and one equality constraint, to be maximized to solve for alphas, with the majority of alphas being 0, providing an interesting interpretation.', 'The introduction of support vector machines as a luxury line of linear models, highlighting their performance and keenness, compared to the economy cars represented by linear models.', 'The use of quadratic programming to solve for alphas, with potential issues when the number of examples is large, necessitating the use of heuristics and babysitting to deal with the problem.']}], 'duration': 150.362, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU703.jpg', 
'highlights': ['The importance of maximizing the margin for performance by selecting the line with the biggest margin, providing an advantage both intuitively and theoretically.', 'The use of Lagrangian, a quadratic function with simple inequality constraints and one equality constraint, to be maximized to solve for alphas, with the majority of alphas being 0, providing an interesting interpretation.', 'The introduction of support vector machines as a luxury line of linear models, highlighting their performance and keenness, compared to the economy cars represented by linear models.', 'The use of quadratic programming to solve for alphas, with potential issues when the number of examples is large, necessitating the use of heuristics and babysitting to deal with the problem.']}, {'end': 500.744, 'segs': [{'end': 205.351, 'src': 'embed', 'start': 173.091, 'weight': 0, 'content': [{'end': 176.434, 'text': 'the support vectors are the ones that achieve the margin.', 'start': 173.091, 'duration': 3.343}, {'end': 178.796, 'text': 'They are sitting exactly at the critical point here.', 'start': 176.534, 'duration': 2.262}, {'end': 181.618, 'text': 'And they are used to define the plane.', 'start': 179.657, 'duration': 1.961}, {'end': 193.906, 'text': 'And the most important aspect about them is the fact that you can predict a bound on the out-of-sample error based on the number of support vectors you get.', 'start': 182.459, 'duration': 11.447}, {'end': 199.549, 'text': 'And it is the normal form of dividing the complexity in terms of the number of parameters,', 'start': 194.487, 'duration': 5.062}, {'end': 205.351, 'text': 'in this case the non-zero alphas or the number of support vectors that corresponds to it, divided by more or less the number of examples.', 'start': 199.549, 'duration': 5.802}], 'summary': 'Support vectors define the plane, predict out-of-sample error, and divide complexity based on non-zero alphas or support vectors.', 'duration': 32.26, 'max_score': 
173.091, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU173091.jpg'}, {'end': 368.606, 'src': 'embed', 'start': 324.652, 'weight': 1, 'content': [{'end': 334.758, 'text': 'But the hypothesis we get is really, although it looks very, very complex, it really belongs to a simple set, because it maximizes the margin.', 'start': 324.652, 'duration': 10.106}, {'end': 338.68, 'text': 'So we get the benefit of a fairly low out-of-sample error,', 'start': 335.058, 'duration': 3.622}, {'end': 342.603, 'text': 'in spite of the fact that we captured the fitting very well by getting the 0 in sample error.', 'start': 338.68, 'duration': 3.923}, {'end': 344.984, 'text': 'Now, this is exaggerated.', 'start': 343.303, 'duration': 1.681}, {'end': 346.704, 'text': 'I grant you that.', 'start': 345.004, 'duration': 1.7}, {'end': 348.785, 'text': 'But it has an element of truth in it.', 'start': 347.044, 'duration': 1.741}, {'end': 352.066, 'text': 'And it captures what support vector machines do.', 'start': 349.185, 'duration': 2.881}, {'end': 356.108, 'text': 'They allow you to go very sophisticated without fully paying the price for it.', 'start': 352.507, 'duration': 3.601}, {'end': 363.821, 'text': 'Today, we are going to continue this by extending the support vector machines in the basic case.', 'start': 358.756, 'duration': 5.065}, {'end': 368.606, 'text': 'And we are going to cover the main method, which is the kernel methods, in the bulk of the lecture.', 'start': 364.582, 'duration': 4.024}], 'summary': 'Support vector machines maximize margin, leading to low out-of-sample error and sophisticated performance without paying the full price.', 'duration': 43.954, 'max_score': 324.652, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU324652.jpg'}, {'end': 416.259, 'src': 'embed', 'start': 392.239, 'weight': 5, 'content': [{'end': 400.066, 'text': 'The other topic is 
to extend support vector machines from the linearly separable case to the non-linearly separable case,', 'start': 392.239, 'duration': 7.827}, {'end': 402.788, 'text': 'allowing yourself to make errors.', 'start': 400.066, 'duration': 2.722}, {'end': 407.072, 'text': 'This is pretty much that if you were using perceptrons and went to pocket,', 'start': 402.828, 'duration': 4.244}, {'end': 416.259, 'text': 'this would be if you went from the support vector machines that we introduced that we are going to label now hard margin because they strictly obey the margin,', 'start': 407.072, 'duration': 9.187}], 'summary': 'Extending support vector machines to handle non-linearly separable cases.', 'duration': 24.02, 'max_score': 392.239, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU392239.jpg'}, {'end': 487.472, 'src': 'embed', 'start': 459.801, 'weight': 4, 'content': [{'end': 465.602, 'text': 'If you remember from last lecture, the way Z manifests itself in the computation is very simple.', 'start': 459.801, 'duration': 5.801}, {'end': 467.902, 'text': 'You do an inner product in the Z space.', 'start': 466.042, 'duration': 1.86}, {'end': 471.203, 'text': "And from then on, it's a regular quadratic programming problem.", 'start': 468.363, 'duration': 2.84}, {'end': 478.485, 'text': 'And the dimensionality of the problem depends on the number of examples, not on the dimensionality of the Z space, once you get the inner product.', 'start': 471.723, 'duration': 6.762}, {'end': 487.472, 'text': 'And when you get the result back, you count the number of support vectors, which really depends again on the number of examples,', 'start': 479.245, 'duration': 8.227}], 'summary': 'Z manifests in computation via inner product, leading to regular quadratic programming problem. 
support vectors depend on number of examples.', 'duration': 27.671, 'max_score': 459.801, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU459801.jpg'}], 'start': 151.065, 'title': 'Support vectors, nonlinear transform, and kernel methods', 'summary': "Explores support vectors' role in defining margin, their impact on error prediction, use of nonlinear transform for non-linearly separable data, and extension of support vector machines using kernel methods to handle complex problems and outliers.", 'chapters': [{'end': 348.785, 'start': 151.065, 'title': 'Support vectors and nonlinear transform', 'summary': 'Explains the concept of support vectors, their role in defining the margin, and their impact on in-sample and out-of-sample error prediction, while also discussing the use of nonlinear transform to handle non-linearly separable data and its impact on hypothesis complexity and generalization.', 'duration': 197.72, 'highlights': ['The support vectors achieve the margin and are used to define the plane, and the number of support vectors can predict the out-of-sample error based on the in-sample quantity. The support vectors achieve the margin and define the plane, and the number of support vectors can predict the out-of-sample error based on the in-sample quantity, providing a measure for error prediction.', 'The use of nonlinear transform leads to a complex hypothesis set in a high-dimensional space, but it maximizes the margin, resulting in low out-of-sample error despite capturing the fitting well. The use of nonlinear transform results in a complex hypothesis set in a high-dimensional space, but it maximizes the margin, leading to low out-of-sample error despite capturing the fitting well.', 'The complexity of the hypothesis set is impacted by the nonlinear transform, but the resulting hypothesis, despite being complex, belongs to a simple set due to maximizing the margin. 
The complexity of the hypothesis set is impacted by the nonlinear transform, but the resulting hypothesis, despite being complex, belongs to a simple set due to maximizing the margin.']}, {'end': 500.744, 'start': 349.185, 'title': 'Kernel methods in support vector machines', 'summary': 'Covers the extension of support vector machines using kernel methods to go to a high-dimensional or infinite-dimensional space without paying the price for it and allowing some errors, expanding the ability to deal with complex problems and not letting outliers dictate an unduly complex nonlinear transformation.', 'duration': 151.559, 'highlights': ['The chapter covers the extension of support vector machines using kernel methods to go to a high-dimensional or infinite-dimensional space without paying the price for it and allowing some errors This extension expands the ability to deal with complex problems and not letting outliers dictate an unduly complex nonlinear transformation.', "The kernels allow you to go to the Z space without paying the price for it, and the computation of the Z space manifests itself as a simple inner product, making the problem's dimensionality depend on the number of examples, not on the dimensionality of the Z space The inner product in the Z space simplifies the computation, and the problem's dimensionality depends on the number of examples, not on the dimensionality of the Z space.", 'The other topic is to extend support vector machines from the linearly separable case to the non-linearly separable case, allowing yourself to make errors This extension allows for the transition from hard margin (strictly obeying the margin) to soft margin (allowing some errors) in support vector machines, expanding the capability to handle non-linearly separable cases.']}], 'duration': 349.679, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU151065.jpg', 'highlights': ['The support vectors achieve the margin and 
define the plane, and the number of support vectors can predict the out-of-sample error based on the in-sample quantity, providing a measure for error prediction.', 'The use of nonlinear transform results in a complex hypothesis set in a high-dimensional space, but it maximizes the margin, leading to low out-of-sample error despite capturing the fitting well.', 'The complexity of the hypothesis set is impacted by the nonlinear transform, but the resulting hypothesis, despite being complex, belongs to a simple set due to maximizing the margin.', 'The chapter covers the extension of support vector machines using kernel methods to go to a high-dimensional or infinite-dimensional space without paying the price for it and allowing some errors.', "The inner product in the Z space simplifies the computation, and the problem's dimensionality depends on the number of examples, not on the dimensionality of the Z space.", 'This extension allows for the transition from hard margin (strictly obeying the margin) to soft margin (allowing some errors) in support vector machines, expanding the capability to handle non-linearly separable cases.']}, {'end': 794.828, 'segs': [{'end': 551.945, 'src': 'embed', 'start': 501.225, 'weight': 0, 'content': [{'end': 504.487, 'text': "But basically, the dimensionality of Z explicitly doesn't appear.", 'start': 501.225, 'duration': 3.262}, {'end': 507.912, 'text': 'Nonetheless, we still have to take an inner product in the Z space.', 'start': 505.268, 'duration': 2.644}, {'end': 513.059, 'text': 'So in this view graph, I am going to zoom in to the very simple question.', 'start': 508.954, 'duration': 4.105}, {'end': 519.148, 'text': 'What do I need from the Z space in order to be able to carry out the machinery that I have seen so far?', 'start': 513.62, 'duration': 5.528}, {'end': 523.07, 'text': 'So what do we do?', 'start': 521.249, 'duration': 1.821}, {'end': 524.751, 'text': 'We have a Lagrangian to solve.', 'start': 523.51, 'duration': 
1.241}, {'end': 526.952, 'text': 'So the Lagrangian looks like this.', 'start': 525.451, 'duration': 1.501}, {'end': 533.275, 'text': "And since we are interested in what we do in the z space, I'm going to make these purple.", 'start': 527.972, 'duration': 5.303}, {'end': 540.478, 'text': 'So in order to be able to carry out the Lagrangian, I need to get the inner product in the z space.', 'start': 534.515, 'duration': 5.963}, {'end': 547.601, 'text': 'But getting an inner product in the z space is less demanding than getting the actual vector in the z space.', 'start': 540.918, 'duration': 6.683}, {'end': 548.982, 'text': 'Think of it this way.', 'start': 548.021, 'duration': 0.961}, {'end': 551.945, 'text': 'I am a guardian of the Z space.', 'start': 550.123, 'duration': 1.822}], 'summary': 'Understanding the z space and its role in carrying out the lagrangian machinery.', 'duration': 50.72, 'max_score': 501.225, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU501225.jpg'}, {'end': 619.182, 'src': 'embed', 'start': 596.263, 'weight': 1, 'content': [{'end': 606.251, 'text': 'So in this slide we are going through step by step in the entire process to see if we ever need anything out of the z space other than the inner product.', 'start': 596.263, 'duration': 9.988}, {'end': 610.635, 'text': 'So in forming the Lagrangian, we need the inner product.', 'start': 607.772, 'duration': 2.863}, {'end': 613.097, 'text': "Let's look at the constraints.", 'start': 612.116, 'duration': 0.981}, {'end': 615.098, 'text': 'We have to pass the constraints to quadratic programming.', 'start': 613.137, 'duration': 1.961}, {'end': 617.18, 'text': 'So this is the first constraint.', 'start': 615.899, 'duration': 1.281}, {'end': 618.621, 'text': "I don't see any z.", 'start': 617.38, 'duration': 1.241}, {'end': 619.182, 'text': 'So we are cool.', 'start': 618.621, 'duration': 0.561}], 'summary': 'Lagrangian formation requires inner 
product; constraints are passed to quadratic programming.', 'duration': 22.919, 'max_score': 596.263, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU596263.jpg'}, {'end': 761.545, 'src': 'heatmap', 'start': 702.489, 'weight': 2, 'content': [{'end': 710.536, 'text': 'How do I solve for b? I solve for b by taking any support vector, and solving for this equation.', 'start': 702.489, 'duration': 8.047}, {'end': 714.499, 'text': 'So I take a support vector m, and plug it in.', 'start': 710.576, 'duration': 3.923}, {'end': 716.98, 'text': 'Am I in trouble because I have the w? No.', 'start': 715.019, 'duration': 1.961}, {'end': 718.822, 'text': 'We already saw that w is here.', 'start': 717.04, 'duration': 1.782}, {'end': 719.722, 'text': 'It has this form.', 'start': 718.902, 'duration': 0.82}, {'end': 721.083, 'text': 'So I can plug it in here.', 'start': 719.762, 'duration': 1.321}, {'end': 726.147, 'text': 'And all I need in order to solve for b here is this fellow.', 'start': 721.483, 'duration': 4.664}, {'end': 732.191, 'text': 'Done. We only deal with z as far as the inner product is concerned.', 'start': 727.608, 'duration': 4.583}, {'end': 735.416, 'text': 'Now, that raises a very interesting possibility.', 'start': 733.134, 'duration': 2.282}, {'end': 745.444, 'text': 'If I am able to compute the inner product in the Z space without visiting the Z space, I still can carry this machinery.', 'start': 736.697, 'duration': 8.747}, {'end': 749.067, 'text': 'We can even move further.', 'start': 747.626, 'duration': 1.441}, {'end': 758.383, 'text': 'If I can carry the inner product in the Z space without knowing what the Z space is, I still will be OK.', 'start': 750.288, 'duration': 8.095}, {'end': 761.545, 'text': "You may wonder, how am I going to do that? 
That's a different question.", 'start': 759.384, 'duration': 2.161}], 'summary': 'Solving for b involves computing the inner product in the z space without visiting or knowing the z space.', 'duration': 59.056, 'max_score': 702.489, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU702489.jpg'}], 'start': 501.225, 'title': 'Inner product in z space', 'summary': 'Emphasizes the necessity of obtaining the inner product in the z space to carry out the lagrangian machinery, simplifying operations and parameter solving without directly accessing the z space.', 'chapters': [{'end': 551.945, 'start': 501.225, 'title': 'Inner product in z space', 'summary': 'Explains the necessity of obtaining the inner product in the z space to carry out the lagrangian machinery, emphasizing that it is less demanding than obtaining the actual vector in the z space.', 'duration': 50.72, 'highlights': ['Obtaining the inner product in the Z space is a prerequisite for carrying out the Lagrangian machinery.', 'The process of getting an inner product in the Z space is less demanding than obtaining the actual vector in the Z space.', "The dimensionality of Z explicitly doesn't appear, but an inner product in the Z space is still required."]}, {'end': 794.828, 'start': 551.985, 'title': 'Inner product in z space', 'summary': 'Discusses leveraging inner products in the z space to simplify operations, including using inner products for forming the lagrangian and constraints, and solving for parameters like w and b without directly accessing the z space.', 'duration': 242.843, 'highlights': ['Using inner products for forming the Lagrangian and constraints The chapter emphasizes the use of inner products in the Z space for forming the Lagrangian and constraints, demonstrating a reliance on inner products in the optimization process.', 'Solving for parameters like w and b without directly accessing the Z space The approach involves solving for 
parameters such as w and b without directly accessing the Z space, indicating the ability to perform computations and solve equations solely based on inner products.', 'Leveraging inner products to carry out support vector machinery without visiting the Z space The discussion highlights the possibility of carrying out support vector machinery in the x space without visiting the Z space, indicating the potential to perform complex operations solely based on inner products.']}], 'duration': 293.603, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU501225.jpg', 'highlights': ['Obtaining the inner product in the Z space is a prerequisite for carrying out the Lagrangian machinery.', 'Using inner products for forming the Lagrangian and constraints.', 'The process of getting an inner product in the Z space is less demanding than obtaining the actual vector in the Z space.', 'Solving for parameters like w and b without directly accessing the Z space.', "The dimensionality of Z explicitly doesn't appear, but an inner product in the Z space is still required.", 'The approach involves solving for parameters such as w and b without directly accessing the Z space.', 'Leveraging inner products to carry out support vector machinery without visiting the Z space.']}, {'end': 1493.105, 'segs': [{'end': 820.638, 'src': 'embed', 'start': 795.308, 'weight': 3, 'content': [{'end': 801.732, 'text': 'And usually we have our stunned silence moments in machine learning where you do something and you know that the existence is sufficient.', 'start': 795.308, 'duration': 6.424}, {'end': 809.275, 'text': "So let's look at this idea as being a generalized inner product, x and x dash.", 'start': 804.133, 'duration': 5.142}, {'end': 811.815, 'text': 'We transform them and take an inner product into this space.', 'start': 809.575, 'duration': 2.24}, {'end': 814.916, 'text': 'We are going to treat it as if it was a generalized inner product in 
the x space.', 'start': 811.855, 'duration': 3.061}, {'end': 817.957, 'text': 'So what are the components? You take two points.', 'start': 815.076, 'duration': 2.881}, {'end': 820.638, 'text': 'We are going to label them x and x dash in the input space.', 'start': 818.117, 'duration': 2.521}], 'summary': 'In machine learning, transforming and taking generalized inner product in input space.', 'duration': 25.33, 'max_score': 795.308, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU795308.jpg'}, {'end': 875.506, 'src': 'embed', 'start': 845.854, 'weight': 4, 'content': [{'end': 849.616, 'text': 'And therefore, their inner product will be a function that is determined by X and X dash.', 'start': 845.854, 'duration': 3.762}, {'end': 851.837, 'text': "So this is the function that I'm looking for.", 'start': 850.397, 'duration': 1.44}, {'end': 856.794, 'text': 'Now we are going to call this the kernel, hence the name.', 'start': 854.813, 'duration': 1.981}, {'end': 859.236, 'text': 'So this is the kernel we are going to use.', 'start': 857.535, 'duration': 1.701}, {'end': 862.938, 'text': 'A kernel will correspond to some z space.', 'start': 859.556, 'duration': 3.382}, {'end': 868.362, 'text': 'And as I mentioned, this will be labeled as an inner product.', 'start': 865.08, 'duration': 3.282}, {'end': 869.943, 'text': "I put it between quotations because it's general.", 'start': 868.422, 'duration': 1.521}, {'end': 870.963, 'text': 'Between x and x dash.', 'start': 869.963, 'duration': 1}, {'end': 875.506, 'text': "It's not a straight inner product, but an inner product after a transformation.", 'start': 871.244, 'duration': 4.262}], 'summary': 'The kernel function determined by x and x dash corresponds to z space.', 'duration': 29.652, 'max_score': 845.854, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU845854.jpg'}, {'end': 987.424, 'src': 'heatmap', 
'start': 892.194, 'weight': 0.787, 'content': [{'end': 896.078, 'text': "And I'm using a nonlinear transformation, which happens to be second-order polynomial.", 'start': 892.194, 'duration': 3.884}, {'end': 897.279, 'text': 'We have seen that a number of times.', 'start': 896.098, 'duration': 1.181}, {'end': 906.664, 'text': 'So what do we have? We have a transformation that takes the vector x, produces the vector z.', 'start': 898.04, 'duration': 8.624}, {'end': 909.946, 'text': 'And that would be the full second-order guy.', 'start': 906.664, 'duration': 3.282}, {'end': 914.548, 'text': 'So we have six coordinates corresponding to all terms of the second order, involving x1 and x2.', 'start': 909.966, 'duration': 4.582}, {'end': 915.628, 'text': 'And this is the guy.', 'start': 914.928, 'duration': 0.7}, {'end': 916.429, 'text': 'We used that before.', 'start': 915.708, 'duration': 0.721}, {'end': 930.593, 'text': 'And therefore, if you want to get the kernel, which is formally the inner product between the transformation of x and x dash, you will get this.', 'start': 918.149, 'duration': 12.444}, {'end': 931.534, 'text': 'Nothing mysterious.', 'start': 930.833, 'duration': 0.701}, {'end': 936.097, 'text': 'You are just going to substitute for this for x, and substitute for it again for x dash.', 'start': 931.574, 'duration': 4.523}, {'end': 940, 'text': 'Multiply the corresponding terms, and add them up, and this is what you get.', 'start': 936.838, 'duration': 3.162}, {'end': 945.964, 'text': 'So the only lesson we are learning here is that, indeed, this is just a function of x and x dash.', 'start': 940.901, 'duration': 5.063}, {'end': 948.486, 'text': "If I didn't know this was an inner product, I can look at this.", 'start': 946.245, 'duration': 2.241}, {'end': 950.027, 'text': 'This is a function I can compute.', 'start': 948.767, 'duration': 1.26}, {'end': 954.411, 'text': 'Fine. Now we come to the trick.', 'start': 951.208, 'duration': 3.203}, {'end': 
962.068, 'text': 'Can we compute this kernel without transforming x and x dash?', 'start': 957.173, 'duration': 4.895}, {'end': 967.692, 'text': "So let's look at the example again.", 'start': 965.09, 'duration': 2.602}, {'end': 970.934, 'text': "I'm going to now improvise a kernel.", 'start': 968.292, 'duration': 2.642}, {'end': 976.658, 'text': "It doesn't transform things to z space and then does the inner product.", 'start': 972.655, 'duration': 4.003}, {'end': 978.499, 'text': 'It just tells you what the kernel is.', 'start': 976.838, 'duration': 1.661}, {'end': 987.424, 'text': "And then I'm going to convince you that this kernel actually corresponds to a transformation to some z space, and taking an inner product there.", 'start': 979.459, 'duration': 7.965}], 'summary': 'Using second-order polynomial transformation to compute kernel without transforming x and x dash.', 'duration': 95.23, 'max_score': 892.194, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU892194.jpg'}, {'end': 945.964, 'src': 'embed', 'start': 915.708, 'weight': 2, 'content': [{'end': 916.429, 'text': 'We used that before.', 'start': 915.708, 'duration': 0.721}, {'end': 930.593, 'text': 'And therefore, if you want to get the kernel, which is formally the inner product between the transformation of x and x dash, you will get this.', 'start': 918.149, 'duration': 12.444}, {'end': 931.534, 'text': 'Nothing mysterious.', 'start': 930.833, 'duration': 0.701}, {'end': 936.097, 'text': 'You are just going to substitute for this for x, and substitute for it again for x dash.', 'start': 931.574, 'duration': 4.523}, {'end': 940, 'text': 'Multiply the corresponding terms, and add them up, and this is what you get.', 'start': 936.838, 'duration': 3.162}, {'end': 945.964, 'text': 'So the only lesson we are learning here is that, indeed, this is just a function of x and x dash.', 'start': 940.901, 'duration': 5.063}], 'summary': 'Kernel is the 
inner product between transformation of x and x dash, a function of x and x dash.', 'duration': 30.256, 'max_score': 915.708, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU915708.jpg'}, {'end': 1133.914, 'src': 'embed', 'start': 1108.753, 'weight': 1, 'content': [{'end': 1117.018, 'text': 'Look at the difference between computing this quantity and actually going to the 100th-order transformation,', 'start': 1108.753, 'duration': 8.265}, {'end': 1122.93, 'text': 'getting this expanded, and getting the other one expanded, and then doing the inner product.', 'start': 1117.768, 'duration': 5.162}, {'end': 1124.23, 'text': "So let's see how this works.", 'start': 1123.19, 'duration': 1.04}, {'end': 1126.431, 'text': "That's called the polynomial kernel.", 'start': 1124.791, 'duration': 1.64}, {'end': 1131.093, 'text': 'Now I take a d-dimensional space.', 'start': 1128.892, 'duration': 2.201}, {'end': 1133.914, 'text': 'Not 2, but general d.', 'start': 1131.313, 'duration': 2.601}], 'summary': 'Comparing computation and transformation orders in polynomial kernel in d-dimensional space.', 'duration': 25.161, 'max_score': 1108.753, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU1108753.jpg'}, {'end': 1337.395, 'src': 'embed', 'start': 1310.703, 'weight': 0, 'content': [{'end': 1315.485, 'text': 'You can adjust the scales a little bit, not fully, by taking your kernel instead of being 1 plus.', 'start': 1310.703, 'duration': 4.782}, {'end': 1323.409, 'text': 'You have scales A and B that will mitigate a little bit the diversity of the coefficients you get here.', 'start': 1315.925, 'duration': 7.484}, {'end': 1328.531, 'text': 'But the bottom line is that a kernel of this form does correspond to an inner product in a higher space.', 'start': 1323.949, 'duration': 4.582}, {'end': 1337.395, 'text': 'And by computing it just in the X space using this formula, I am 
doing all I need to do in order to carry out the SV machinery.', 'start': 1329.151, 'duration': 8.244}], 'summary': 'Adjust scales with kernel, mitigates coefficients diversity, corresponds to inner product in higher space.', 'duration': 26.692, 'max_score': 1310.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU1310703.jpg'}], 'start': 795.308, 'title': 'Kernel functions and inner products', 'summary': 'Explains kernel functions, such as polynomial kernels, and their computational complexity, demonstrating their correspondence to inner products in higher-dimensional spaces, with benefits for support vector machines and generalization.', 'chapters': [{'end': 915.628, 'start': 795.308, 'title': 'Generalized inner product in machine learning', 'summary': 'Discusses the concept of a generalized inner product in machine learning, emphasizing the function of x and x dash in determining the kernel, which corresponds to a z space and is used as an inner product after a transformation.', 'duration': 120.32, 'highlights': ['The kernel corresponds to a z space and is determined by the inner product of X and X dash, with z being exclusively a function of X and z dash being exclusively a function of X dash.', 'The function of X and X dash determines the kernel, which is used as an inner product after a transformation, illustrated through a second-order polynomial transformation in a two-dimensional Euclidean space.', 'The concept of a generalized inner product in machine learning is discussed, highlighting the idea of transforming X and X dash and taking an inner product into a space, treated as a function exclusively determined by X and X dash.']}, {'end': 1493.105, 'start': 915.708, 'title': 'Kernel functions and inner products', 'summary': 'Explains the concept of kernel functions, including examples of polynomial kernels, and their computational complexity, demonstrating how they correspond to inner products in 
higher-dimensional spaces, and the benefits of utilizing them for support vector machines and generalization.', 'duration': 577.397, 'highlights': ['The kernel function is essentially the inner product between the transformation of x and x dash, allowing for computational simplicity and direct computation without transforming x and x dash.', "The polynomial kernel's computational complexity remains the same regardless of the order, making it a simple operation to carry out, with the benefits of mapping to a higher-dimensional space without explicitly visiting it.", 'The kernel function corresponds to an inner product in an infinite-dimensional space, providing the benefits of a nonlinear transformation without concerns about generalization issues.']}], 'duration': 697.797, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU795308.jpg', 'highlights': ['The kernel function corresponds to an inner product in an infinite-dimensional space, providing the benefits of a nonlinear transformation without concerns about generalization issues.', "The polynomial kernel's computational complexity remains the same regardless of the order, making it a simple operation to carry out, with the benefits of mapping to a higher-dimensional space without explicitly visiting it.", 'The kernel function is essentially the inner product between the transformation of x and x dash, allowing for computational simplicity and direct computation without transforming x and x dash.', 'The concept of a generalized inner product in machine learning is discussed, highlighting the idea of transforming X and X dash and taking an inner product into a space, treated as a function exclusively determined by X and X dash.', 'The function of X and X dash determines the kernel, which is used as an inner product after a transformation, illustrated through a second-order polynomial transformation in a two-dimensional Euclidean space.', 'The kernel corresponds 
to a z space and is determined by the inner product of X and X dash, with z being exclusively a function of X and z dash being exclusively a function of X dash.']}, {'end': 2009.595, 'segs': [{'end': 1526.498, 'src': 'embed', 'start': 1495.325, 'weight': 2, 'content': [{'end': 1500.046, 'text': "Let's take the kernel, but apply it, in this case, to one-dimensional space.", 'start': 1495.325, 'duration': 4.721}, {'end': 1502.307, 'text': 'So x and x dash are both scalars.', 'start': 1500.446, 'duration': 1.861}, {'end': 1504.427, 'text': 'So I call them x and x dash.', 'start': 1502.887, 'duration': 1.54}, {'end': 1509.428, 'text': "And I'm going to take gamma to be 1, which is my modulating constant here.", 'start': 1505.908, 'duration': 3.52}, {'end': 1510.769, 'text': 'So I get this fellow.', 'start': 1509.909, 'duration': 0.86}, {'end': 1514.871, 'text': 'So now let me express this using Taylor series.', 'start': 1511.869, 'duration': 3.002}, {'end': 1518.173, 'text': 'First, I do the following.', 'start': 1516.072, 'duration': 2.101}, {'end': 1526.498, 'text': 'I expand this, so I get x squared, x dash squared, and minus twice x x dash, and minus twice gets the minus and becomes a plus.', 'start': 1518.953, 'duration': 7.545}], 'summary': 'Applying kernel to 1d space with gamma=1, using taylor series expansion.', 'duration': 31.173, 'max_score': 1495.325, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU1495325.jpg'}, {'end': 1695.431, 'src': 'heatmap', 'start': 1598.926, 'weight': 3, 'content': [{'end': 1608.253, 'text': 'And why am I doing that? 
Because I am going to separate this into an inner product, something coming from X, and something coming from X dash.', 'start': 1598.926, 'duration': 9.327}, {'end': 1610.335, 'text': "And I want to make sure that it's the same.", 'start': 1608.693, 'duration': 1.642}, {'end': 1614.458, 'text': 'Once it is the same, then the dimensionality is really this summation.', 'start': 1611.075, 'duration': 3.383}, {'end': 1616.3, 'text': 'Each of these is a coordinate.', 'start': 1614.938, 'duration': 1.362}, {'end': 1621.964, 'text': 'And this is the contribution to the inner product by this coordinate.', 'start': 1617.06, 'duration': 4.904}, {'end': 1626.028, 'text': 'So here, I am getting this x dash, multiplied by x.', 'start': 1622.365, 'duration': 3.663}, {'end': 1628.83, 'text': 'Both of them are normalized by e to the minus x squared and e to the minus x dash squared, respectively.', 'start': 1626.028, 'duration': 2.802}, {'end': 1637.958, 'text': 'So if I want to see what is the transformation of the first guy, it would be e to the minus x squared, multiplied by x to the k.', 'start': 1628.99, 'duration': 8.968}, {'end': 1638.759, 'text': "That's one coordinate.", 'start': 1637.958, 'duration': 0.801}, {'end': 1641.861, 'text': 'And as k goes from 0 to infinity, I get different coordinates.', 'start': 1639.179, 'duration': 2.682}, {'end': 1645.183, 'text': 'I would be ready to go, except for the annoying constants.', 'start': 1642.762, 'duration': 2.421}, {'end': 1648.164, 'text': "So let's put them in purple.", 'start': 1645.763, 'duration': 2.401}, {'end': 1652.205, 'text': 'What do you do with them? 
You divide them between red and blue.', 'start': 1649.584, 'duration': 2.621}, {'end': 1654.766, 'text': 'Take the square root of that, and put it in the red.', 'start': 1652.765, 'duration': 2.001}, {'end': 1657.266, 'text': 'And take the other square root, and put it in the blue.', 'start': 1655.266, 'duration': 2}, {'end': 1660.487, 'text': 'And now we have formally two identical vectors.', 'start': 1657.867, 'duration': 2.62}, {'end': 1663.828, 'text': 'One is the transformed version of x, and one is the transformed version of x dash.', 'start': 1660.908, 'duration': 2.92}, {'end': 1665.269, 'text': 'And this is the inner product.', 'start': 1664.208, 'duration': 1.061}, {'end': 1671.371, 'text': 'And it happens to be an infinite dimensional space, because you are summing for 0 until infinity.', 'start': 1665.509, 'duration': 5.862}, {'end': 1675.814, 'text': 'Now, this is a very interesting kernel.', 'start': 1673.552, 'duration': 2.262}, {'end': 1680.018, 'text': "It's called radial basis function kernel, if that rings a bell.", 'start': 1675.934, 'duration': 4.084}, {'end': 1681.619, 'text': "Indeed, that's the subject of the next lecture.", 'start': 1680.078, 'duration': 1.541}, {'end': 1685.723, 'text': 'So let us look at this kernel in action.', 'start': 1682.78, 'duration': 2.943}, {'end': 1688.145, 'text': "And it's very interesting, because it's a very sophisticated kernel.", 'start': 1685.903, 'duration': 2.242}, {'end': 1689.886, 'text': 'It corresponds to an infinite dimensional space.', 'start': 1688.225, 'duration': 1.661}, {'end': 1695.431, 'text': 'Nonetheless, we can carry it out by computing a very simple exponential between the x points.', 'start': 1690.267, 'duration': 5.164}], 'summary': 'Explaining the inner product and transformation in an infinite dimensional space using a radial basis function kernel.', 'duration': 46.257, 'max_score': 1598.926, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU1598926.jpg'}, {'end': 1689.886, 'src': 'embed', 'start': 1665.509, 'weight': 0, 'content': [{'end': 1671.371, 'text': 'And it happens to be an infinite dimensional space, because you are summing for 0 until infinity.', 'start': 1665.509, 'duration': 5.862}, {'end': 1675.814, 'text': 'Now, this is a very interesting kernel.', 'start': 1673.552, 'duration': 2.262}, {'end': 1680.018, 'text': "It's called radial basis function kernel, if that rings a bell.", 'start': 1675.934, 'duration': 4.084}, {'end': 1681.619, 'text': "Indeed, that's the subject of the next lecture.", 'start': 1680.078, 'duration': 1.541}, {'end': 1685.723, 'text': 'So let us look at this kernel in action.', 'start': 1682.78, 'duration': 2.943}, {'end': 1688.145, 'text': "And it's very interesting, because it's a very sophisticated kernel.", 'start': 1685.903, 'duration': 2.242}, {'end': 1689.886, 'text': 'It corresponds to an infinite dimensional space.', 'start': 1688.225, 'duration': 1.661}], 'summary': 'Radial basis function kernel operates in an infinite dimensional space for summing from 0 to infinity.', 'duration': 24.377, 'max_score': 1665.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU1665509.jpg'}, {'end': 1844.914, 'src': 'embed', 'start': 1819.514, 'weight': 1, 'content': [{'end': 1828.282, 'text': 'Can you tell me what is the out-of-sample error? Can you bound it above? 
Oh, it looks like it should be less than 10%.', 'start': 1819.514, 'duration': 8.768}, {'end': 1830.364, 'text': 'I have gone to an infinite dimensional space.', 'start': 1828.282, 'duration': 2.082}, {'end': 1832.125, 'text': 'You are a witness to that.', 'start': 1830.564, 'duration': 1.561}, {'end': 1836.028, 'text': 'I used what is effectively an infinite number of parameters.', 'start': 1833.406, 'duration': 2.622}, {'end': 1840.411, 'text': 'Completely suicidal in terms of generalization.', 'start': 1837.389, 'duration': 3.022}, {'end': 1844.014, 'text': 'But hey, I get 9 support vectors.', 'start': 1841.692, 'duration': 2.322}, {'end': 1844.914, 'text': 'I can claim victory.', 'start': 1844.094, 'duration': 0.82}], 'summary': 'Out-of-sample error is less than 10%, using infinite parameters, with 9 support vectors.', 'duration': 25.4, 'max_score': 1819.514, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU1819514.jpg'}], 'start': 1495.325, 'title': 'Kernel transformation in machine learning', 'summary': 'Delves into the application of kernel to one-dimensional space, deriving the taylor series expansion and demonstrating the inner product transformation. 
it also discusses the use of radial basis function kernel to transform data into an infinite-dimensional space, resulting in a small number of support vectors and a respectable margin despite using an infinite number of parameters.', 'chapters': [{'end': 1665.269, 'start': 1495.325, 'title': 'Kernel application in one-dimensional space', 'summary': 'Explains the application of kernel to one-dimensional space, deriving the taylor series expansion, and demonstrating the inner product transformation, emphasizing the significance of the transformation and dimensionality.', 'duration': 169.944, 'highlights': ['The chapter explains the application of kernel to one-dimensional space, deriving the Taylor series expansion, and demonstrating the inner product transformation, emphasizing the significance of the transformation and dimensionality.', 'The Taylor series expansion is derived to express the kernel, involving the terms x, x dash, and a modulating constant gamma of 1.', 'The inner product transformation is highlighted, showcasing the separation into components from X and X dash, ensuring the same dimensionality and contribution to the inner product by each coordinate.', 'The transformation of the vectors X and X dash is formalized, normalizing by e to the power of minus x theta and separating the constants between red and blue for identical vectors.', 'The chapter emphasizes the significance of the inner product transformation and the dimensionality, aiming to convince the audience of the existence of a z space as an inner product.']}, {'end': 2009.595, 'start': 1665.509, 'title': 'Radial basis function kernel', 'summary': 'Discusses the transformation of data into an infinite-dimensional space using a radial basis function kernel, resulting in a small number of support vectors and a respectable margin, despite the use of an infinite number of parameters.', 'duration': 344.086, 'highlights': ['Transformation to infinite-dimensional space using radial basis function 
kernel The chapter demonstrates the transformation of data into an infinite-dimensional space using a radial basis function kernel, resulting in a small number of support vectors and a respectable margin, despite the use of an infinite number of parameters.', 'Small number of support vectors in the infinite-dimensional space The transformation yields only nine support vectors out of 100 points in the infinite-dimensional space, showcasing the effectiveness of the method in reducing the complexity of the problem.', 'Effectiveness of the method despite the use of infinite parameters The approach demonstrates its effectiveness in achieving a low out-of-sample error rate, likely to be less than 10%, despite employing an infinite number of parameters, showcasing its robustness and efficiency.']}], 'duration': 514.27, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU1495325.jpg', 'highlights': ['The transformation to infinite-dimensional space using radial basis function kernel results in a small number of support vectors and a respectable margin, despite using an infinite number of parameters.', 'The approach demonstrates its effectiveness in achieving a low out-of-sample error rate, likely to be less than 10%, despite employing an infinite number of parameters, showcasing its robustness and efficiency.', 'The Taylor series expansion is derived to express the kernel, involving the terms x, x dash, and a modulating constant gamma of 1.', 'The inner product transformation is highlighted, showcasing the separation into components from X and X dash, ensuring the same dimensionality and contribution to the inner product by each coordinate.', 'The transformation of the vectors X and X dash is formalized, normalizing by e to the power of minus x theta and separating the constants between red and blue for identical vectors.']}, {'end': 2733.072, 'segs': [{'end': 2055.695, 'src': 'embed', 'start': 2030.626, 'weight': 0, 
'content': [{'end': 2036.708, 'text': "So now let's look at if I give you a kernel and it's a valid kernel that corresponds to an inner product in some z space,", 'start': 2030.626, 'duration': 6.082}, {'end': 2037.908, 'text': 'how do you formulate the problem?', 'start': 2036.708, 'duration': 1.2}, {'end': 2039.929, 'text': 'This is just formality.', 'start': 2038.068, 'duration': 1.861}, {'end': 2040.629, 'text': 'You already know.', 'start': 2039.989, 'duration': 0.64}, {'end': 2042.07, 'text': "But let's take it step by step.", 'start': 2040.729, 'duration': 1.341}, {'end': 2045.291, 'text': 'What do you do? You remember quadratic programming? Yes, I do.', 'start': 2042.19, 'duration': 3.101}, {'end': 2049.853, 'text': 'And in quadratic programming, we have this huge matrix.', 'start': 2047.052, 'duration': 2.801}, {'end': 2053.074, 'text': 'That is the big Q matrix that you pass onto the algorithm.', 'start': 2050.433, 'duration': 2.641}, {'end': 2055.695, 'text': 'And you compute it in terms of inner products.', 'start': 2053.674, 'duration': 2.021}], 'summary': 'Formulate the problem using valid kernel, involving quadratic programming and computing big q matrix in terms of inner products.', 'duration': 25.069, 'max_score': 2030.626, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2030626.jpg'}, {'end': 2193.385, 'src': 'embed', 'start': 2160.402, 'weight': 2, 'content': [{'end': 2162.502, 'text': 'You choose a kernel, and it will give you a different model.', 'start': 2160.402, 'duration': 2.1}, {'end': 2166.643, 'text': 'So, if you have ever been curious, in the middle of all of this junk, what is the model?', 'start': 2162.802, 'duration': 3.841}, {'end': 2168.363, 'text': "What is the hypothesis that I'm working with?", 'start': 2166.683, 'duration': 1.68}, {'end': 2170.704, 'text': 'It happens to have this functional form.', 'start': 2168.804, 'duration': 1.9}, {'end': 2172.765, 'text': 'The 
kernel you choose appears here.', 'start': 2171.404, 'duration': 1.361}, {'end': 2175.085, 'text': 'It gets summed up with coefficients.', 'start': 2173.365, 'duration': 1.72}, {'end': 2178.006, 'text': 'The coefficients happen to be determined by alpha.', 'start': 2175.625, 'duration': 2.381}, {'end': 2181.117, 'text': 'They all happen to agree in sign with the label.', 'start': 2178.996, 'duration': 2.121}, {'end': 2183.599, 'text': "That's one of the artifacts of that, because alphas are non-negative.", 'start': 2181.177, 'duration': 2.422}, {'end': 2185.9, 'text': 'And we have plus b.', 'start': 2184.059, 'duration': 1.841}, {'end': 2187.801, 'text': "And again, plus b is the one that we haven't solved for.", 'start': 2185.9, 'duration': 1.901}, {'end': 2193.385, 'text': 'But I can solve for it using the other one, and I end up with this equation for it.', 'start': 2188.202, 'duration': 5.183}], 'summary': 'Choosing a kernel yields a model with coefficients determined by alpha and functional form.', 'duration': 32.983, 'max_score': 2160.402, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2160402.jpg'}, {'end': 2328.863, 'src': 'embed', 'start': 2301.185, 'weight': 1, 'content': [{'end': 2304.887, 'text': 'This transformation, in order to get this thing, I need to know what xn is.', 'start': 2301.185, 'duration': 3.702}, {'end': 2307.329, 'text': 'But we have seen this before.', 'start': 2306.088, 'duration': 1.241}, {'end': 2314.17, 'text': 'Remember the hidden layer in neural networks? It got a nonlinear transform based on the data set.', 'start': 2307.649, 'duration': 6.521}, {'end': 2316.132, 'text': 'So this is not foreign to us.', 'start': 2314.771, 'duration': 1.361}, {'end': 2319.455, 'text': 'But this tells you why this looks very simple.', 'start': 2317.353, 'duration': 2.102}, {'end': 2322.077, 'text': "Where is the infinite dimensional space? 
I'm only determining this.", 'start': 2319.555, 'duration': 2.522}, {'end': 2326.721, 'text': 'This is the solution after all of the manipulation has been done.', 'start': 2322.498, 'duration': 4.223}, {'end': 2328.863, 'text': 'And that is why it has this form.', 'start': 2327.082, 'duration': 1.781}], 'summary': 'Nonlinear transformations like in neural networks explain the simplicity and form of the solution.', 'duration': 27.678, 'max_score': 2301.185, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2301185.jpg'}, {'end': 2433.999, 'src': 'embed', 'start': 2406, 'weight': 3, 'content': [{'end': 2409.223, 'text': 'By the way, in support vector machines, you will come up with your own kernels.', 'start': 2406, 'duration': 3.223}, {'end': 2414.788, 'text': "So it's a good idea to just ask yourself what are the conditions to get the kernel right?", 'start': 2409.903, 'duration': 4.885}, {'end': 2419.152, 'text': 'In order to get to be the valid kernel, there are three approaches.', 'start': 2415.369, 'duration': 3.783}, {'end': 2423.136, 'text': 'First approach we have already seen.', 'start': 2421.855, 'duration': 1.281}, {'end': 2430.817, 'text': 'This is by construction, conceptual construction, if not explicit construction, like we did with the polynomial.', 'start': 2425.475, 'duration': 5.342}, {'end': 2433.999, 'text': 'We looked at it, and we realized that there is a polynomial thing.', 'start': 2431.318, 'duration': 2.681}], 'summary': 'In support vector machines, defining valid kernels involves three approaches, with one being conceptual construction like with polynomials.', 'duration': 27.999, 'max_score': 2406, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2406000.jpg'}, {'end': 2590.984, 'src': 'embed', 'start': 2559.713, 'weight': 4, 'content': [{'end': 2561.013, 'text': 'The following statement holds.', 'start': 2559.713, 'duration': 
1.3}, {'end': 2564.835, 'text': 'The kernel that you wrote down is a valid kernel.', 'start': 2561.734, 'duration': 3.101}, {'end': 2574.761, 'text': "That is, the Z space that you're talking about actually exists, if and only if two conditions in conjunction are satisfied.", 'start': 2564.855, 'duration': 9.906}, {'end': 2577.782, 'text': 'One is the fact that the kernel is symmetric.', 'start': 2575.521, 'duration': 2.261}, {'end': 2579.742, 'text': 'That should be abundantly obvious.', 'start': 2578.302, 'duration': 1.44}, {'end': 2584.583, 'text': 'Symmetric being K of X and X dash being equal to K of X dash and X.', 'start': 2579.842, 'duration': 4.741}, {'end': 2590.984, 'text': "Well, this is supposed to be the dot product in the Z space, right? So we're going to transform X and X dash into Z and Z dash.", 'start': 2584.583, 'duration': 6.401}], 'summary': "A valid kernel exists if the symmetric condition is satisfied, i.e., k(x, x') = k(x', x).", 'duration': 31.271, 'max_score': 2559.713, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2559713.jpg'}, {'end': 2725.525, 'src': 'heatmap', 'start': 2623.839, 'weight': 0.787, 'content': [{'end': 2631.965, 'text': 'So if this was a genuine inner product, and you had it explicitly, each of these will be the inner product, this one between Z1 transpose Z1.', 'start': 2623.839, 'duration': 8.126}, {'end': 2634.668, 'text': 'This would be Z2 transpose Z2, et cetera.', 'start': 2632.226, 'duration': 2.442}, {'end': 2642.694, 'text': "And therefore, this thing could be decomposed as an outer product between Z's standing and Z's sitting.", 'start': 2635.969, 'duration': 6.725}, {'end': 2644.596, 'text': 'And you will get that.', 'start': 2643.975, 'duration': 0.621}, {'end': 2654.988, 'text': 'So the condition here on that matrix, without visiting the Z, is that when you put these numbers to your pleasant surprise, This needs to be positive,', 'start': 2645.917, 
'duration': 9.071}, {'end': 2655.808, 'text': 'semi-definite.', 'start': 2654.988, 'duration': 0.82}, {'end': 2659.97, 'text': 'That is, in matrix language, this matrix should be greater than or equal to 0.', 'start': 2656.208, 'duration': 3.762}, {'end': 2664.752, 'text': "That's what positive semi-definite really means, conceptually.", 'start': 2659.97, 'duration': 4.782}, {'end': 2669.214, 'text': 'And this should be true for any choice of the points.', 'start': 2666.053, 'duration': 3.161}, {'end': 2672.178, 'text': "And that is Mercer's condition.", 'start': 2671.078, 'duration': 1.1}, {'end': 2674.219, 'text': 'Now, you can see the difficulty.', 'start': 2673.178, 'duration': 1.041}, {'end': 2677.999, 'text': 'If I want to satisfy that this is true for any, I choose the point set.', 'start': 2674.259, 'duration': 3.74}, {'end': 2683.701, 'text': 'So obviously, I have to have some math helping me to corner that this has to be positive or negative for some reason.', 'start': 2678.1, 'duration': 5.601}, {'end': 2685.981, 'text': 'But this is indeed the condition.', 'start': 2684.701, 'duration': 1.28}, {'end': 2693.603, 'text': "And if you look at the case where you know the transformation into the Z and you put this as an outer product between a bunch of Z's and a bunch of Z's,", 'start': 2686.341, 'duration': 7.262}, {'end': 2696.985, 'text': 'What you are going to get is patently positive, semi-definite.', 'start': 2694.483, 'duration': 2.502}, {'end': 2698.746, 'text': 'Because what is positive, semi-definite?', 'start': 2697.305, 'duration': 1.441}, {'end': 2706.511, 'text': 'You put a sleeping vector here and the same vector standing here, and you are guaranteed to get a number greater than or equal to 0 for any vector.', 'start': 2698.766, 'duration': 7.745}, {'end': 2708.192, 'text': "That's what positive semi-definite means.", 'start': 2706.571, 'duration': 1.621}, {'end': 2712.795, 'text': 'If you put that and the matrix happens to be the outer product 
of these guys,', 'start': 2708.933, 'duration': 3.862}, {'end': 2717.439, 'text': 'then the guy sleeping here gets multiplied by z and the other guy is the transpose of that.', 'start': 2712.795, 'duration': 4.644}, {'end': 2721.702, 'text': 'So you get a number squared, and a number squared is always greater than or equal to 0.', 'start': 2717.739, 'duration': 3.963}, {'end': 2723.283, 'text': 'So the necessity part is obvious.', 'start': 2721.702, 'duration': 1.581}, {'end': 2725.525, 'text': 'Sufficiency is a very elaborate thing to prove.', 'start': 2723.543, 'duration': 1.982}], 'summary': "Mercer's condition for positive semi-definite matrices is crucial in inner product decomposition.", 'duration': 101.686, 'max_score': 2623.839, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2623839.jpg'}, {'end': 2672.178, 'src': 'embed', 'start': 2645.917, 'weight': 5, 'content': [{'end': 2654.988, 'text': 'So the condition here on that matrix, without visiting the Z, is that when you put these numbers to your pleasant surprise, This needs to be positive,', 'start': 2645.917, 'duration': 9.071}, {'end': 2655.808, 'text': 'semi-definite.', 'start': 2654.988, 'duration': 0.82}, {'end': 2659.97, 'text': 'That is, in matrix language, this matrix should be greater than or equal to 0.', 'start': 2656.208, 'duration': 3.762}, {'end': 2664.752, 'text': "That's what positive semi-definite really means, conceptually.", 'start': 2659.97, 'duration': 4.782}, {'end': 2669.214, 'text': 'And this should be true for any choice of the points.', 'start': 2666.053, 'duration': 3.161}, {'end': 2672.178, 'text': "And that is Mercer's condition.", 'start': 2671.078, 'duration': 1.1}], 'summary': "Matrix must be positive semi-definite for mercer's condition to hold.", 'duration': 26.261, 'max_score': 2645.917, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2645917.jpg'}], 'start': 
2009.635, 'title': 'Svm kernels & validation', 'summary': 'Covers the use of kernels in svm, detailing problem formulation, hypothesis construction, and transformation into nonlinear space. it also explains validating kernels through conceptual construction, math properties, and improvisation, emphasizing symmetry and positive semi-definiteness of the kernel matrix.', 'chapters': [{'end': 2322.077, 'start': 2009.635, 'title': 'Support vector machines and kernels', 'summary': 'Discusses the use of kernels in support vector machines, explaining the formulation of the problem, construction of hypothesis, and the transformation into an infinite-dimensional nonlinear space.', 'duration': 312.442, 'highlights': ['The formulation of the problem involves passing a matrix computed in terms of inner products through quadratic programming, which allows the construction of hypothesis in terms of the kernel. The formulation of the problem involves passing a matrix computed in terms of inner products through quadratic programming, which allows the construction of hypothesis in terms of the kernel. The transformation into an infinite-dimensional nonlinear space is illustrated through the construction of the hypothesis.', 'The transformation into an infinite-dimensional nonlinear space is illustrated through the construction of the hypothesis, which is determined without looking at the data set. The transformation into an infinite-dimensional nonlinear space is illustrated through the construction of the hypothesis, which is determined without looking at the data set. The chapter emphasizes the dependency of the transformation on the dataset and compares it to the hidden layer in neural networks.', "The model's functional form is determined by the kernel chosen, and the hypothesis is represented as a sum of terms with coefficients determined by alpha. 
The model's functional form is determined by the kernel chosen, and the hypothesis is represented as a sum of terms with coefficients determined by alpha. The chapter emphasizes the relationship between the chosen kernel and the resulting model."]}, {'end': 2733.072, 'start': 2322.498, 'title': 'Validating kernels in svm', 'summary': "Explains the process of validating kernels in support vector machines, including three approaches: conceptual construction, using math properties (mercer's condition), and improvisation, highlighting the necessity for a kernel to be symmetric and the requirement for the kernel matrix to be positive semi-definite for any point set.", 'duration': 410.574, 'highlights': ["The kernel can be validated using three approaches: conceptual construction, math properties (Mercer's condition), and improvisation, with the necessity for the kernel to be symmetric and the requirement for the kernel matrix to be positive semi-definite for any point set.", 'The necessity for the kernel to be symmetric is emphasized, as K of X and X dash being equal to K of X dash and X is a crucial condition for the kernel to represent the dot product in the Z space.', "The requirement for the kernel matrix to be positive semi-definite for any choice of points is highlighted as Mercer's condition, which is necessary for the validity of the kernel in support vector machines."]}], 'duration': 723.437, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2009635.jpg', 'highlights': ['The formulation of the problem involves passing a matrix computed in terms of inner products through quadratic programming, allowing the construction of hypothesis in terms of the kernel.', 'The transformation into an infinite-dimensional nonlinear space is illustrated through the construction of the hypothesis, determined without looking at the data set, emphasizing the dependency of the transformation on the dataset and comparing it to 
the hidden layer in neural networks.', "The model's functional form is determined by the kernel chosen, and the hypothesis is represented as a sum of terms with coefficients determined by alpha, emphasizing the relationship between the chosen kernel and the resulting model.", "The kernel can be validated using three approaches: conceptual construction, math properties (Mercer's condition), and improvisation, with the necessity for the kernel to be symmetric and the requirement for the kernel matrix to be positive semi-definite for any point set.", 'The necessity for the kernel to be symmetric is emphasized, as K of X and X dash being equal to K of X dash and X is a crucial condition for the kernel to represent the dot product in the Z space.', "The requirement for the kernel matrix to be positive semi-definite for any choice of points is highlighted as Mercer's condition, necessary for the validity of the kernel in support vector machines."]}, {'end': 3593.335, 'segs': [{'end': 2761.198, 'src': 'embed', 'start': 2733.112, 'weight': 0, 'content': [{'end': 2739.277, 'text': "And if you manage to establish this for any kernel, then you establish that the Z space exists, even if you don't know what the Z space is.", 'start': 2733.112, 'duration': 6.165}, {'end': 2743.75, 'text': 'Done with kernels.', 'start': 2742.97, 'duration': 0.78}, {'end': 2744.851, 'text': "That's half the deal.", 'start': 2744.131, 'duration': 0.72}, {'end': 2754.335, 'text': 'And now we are going to the case where the data is not linearly separable, and we still insist on separating them with making some errors.', 'start': 2745.411, 'duration': 8.924}, {'end': 2761.198, 'text': 'And this brings us back to the old dichotomy between two types of non-separable.', 'start': 2755.275, 'duration': 5.923}], 'summary': 'Establishing existence of z space for any kernel is half the deal in handling non-linearly separable data.', 'duration': 28.086, 'max_score': 2733.112, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2733112.jpg'}, {'end': 2875.928, 'src': 'embed', 'start': 2819.036, 'weight': 1, 'content': [{'end': 2823.238, 'text': 'And then there is a seriously non-separable case, as in you get this.', 'start': 2819.036, 'duration': 4.202}, {'end': 2825.119, 'text': "It's not a question of outliers.", 'start': 2823.658, 'duration': 1.461}, {'end': 2828.08, 'text': "It's just the surface is there, and you have to go to a nonlinear transformation.", 'start': 2825.139, 'duration': 2.941}, {'end': 2831.147, 'text': 'Kernels deal with this.', 'start': 2829.804, 'duration': 1.343}, {'end': 2838.284, 'text': 'Soft margin support vector machines deal with this.', 'start': 2835.337, 'duration': 2.947}, {'end': 2849.567, 'text': 'And in all reality, when you deal with a practical data set, the chances are the data set will have aspects of both.', 'start': 2840.279, 'duration': 9.288}, {'end': 2858.894, 'text': 'It will have a built-in nonlinearity, and still, even modulo that nonlinearity, some annoying guys are there just to test your learning ability.', 'start': 2850.347, 'duration': 8.547}, {'end': 2869.182, 'text': 'And therefore, you will be combining the kernel with the soft margin support vector machines in almost all the problems that you encounter.', 'start': 2860.095, 'duration': 9.087}, {'end': 2871.746, 'text': "Now let's focus on this.", 'start': 2870.746, 'duration': 1}, {'end': 2873.467, 'text': "I'm now back to the X space.", 'start': 2872.126, 'duration': 1.341}, {'end': 2875.928, 'text': 'The data is not linearly separable.', 'start': 2874.407, 'duration': 1.521}], 'summary': 'Kernel and soft margin svms handle non-separable data with nonlinearity, encountered in practical datasets.', 'duration': 56.892, 'max_score': 2819.036, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2819036.jpg'}, {'end': 2988.715, 'src': 'embed', 
'start': 2959.019, 'weight': 3, 'content': [{'end': 2962.381, 'text': "I'm going to define my error measure based on violating the margin.", 'start': 2959.019, 'duration': 3.362}, {'end': 2964.543, 'text': "So let's see what I mean.", 'start': 2963.322, 'duration': 1.221}, {'end': 2969.781, 'text': 'This point that used to be here has violated the margin.', 'start': 2967.099, 'duration': 2.682}, {'end': 2974.385, 'text': "Now I'm not saying that once you put this here, the same solution will hold or whatever.", 'start': 2970.502, 'duration': 3.883}, {'end': 2979.068, 'text': "I'm just illustrating to you what is a violation of the margin, and how do I quantify it.", 'start': 2974.485, 'duration': 4.583}, {'end': 2980.389, 'text': 'So this is just an illustration.', 'start': 2979.308, 'duration': 1.081}, {'end': 2984.732, 'text': "So this point went in, in spite of the fact that it's correctly classified.", 'start': 2980.709, 'duration': 4.023}, {'end': 2988.715, 'text': "Yes, because this is the line, and it's on the blue side of the line, so to speak.", 'start': 2984.933, 'duration': 3.782}], 'summary': 'Error measure based on violating the margin, illustrated with a point violation.', 'duration': 29.696, 'max_score': 2959.019, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2959019.jpg'}, {'end': 3195.859, 'src': 'heatmap', 'start': 3053.004, 'weight': 0.702, 'content': [{'end': 3056.288, 'text': "And now I'm going to penalize you for the total violation you made.", 'start': 3053.004, 'duration': 3.284}, {'end': 3060.213, 'text': "What is the total violation? 
I'm just going to add up these violations.", 'start': 3056.969, 'duration': 3.244}, {'end': 3062.884, 'text': 'We have seen error measures before.', 'start': 3061.584, 'duration': 1.3}, {'end': 3066.965, 'text': "We know that it's largely hand-waving, because I have something in mind.", 'start': 3063.264, 'duration': 3.701}, {'end': 3073.567, 'text': "Either I'm thinking of an optimizer, and I want to hand something friendly to it, or I'm thinking of something that is analytically plausible.", 'start': 3067.365, 'duration': 6.202}, {'end': 3075.228, 'text': 'This is no different.', 'start': 3074.307, 'duration': 0.921}, {'end': 3077.688, 'text': 'Why did I choose this instead of the square?', 'start': 3075.848, 'duration': 1.84}, {'end': 3078.988, 'text': 'Why did I choose this instead of that?', 'start': 3077.728, 'duration': 1.26}, {'end': 3082.649, 'text': 'All of these are considerations that will come up when you see the result of choosing this.', 'start': 3079.109, 'duration': 3.54}, {'end': 3083.67, 'text': 'This is reasonable.', 'start': 3082.909, 'duration': 0.761}, {'end': 3085.95, 'text': 'This does seem like violating the margin.', 'start': 3084.11, 'duration': 1.84}, {'end': 3088.591, 'text': 'This does seem like measuring the violation of the margin.', 'start': 3086.31, 'duration': 2.281}, {'end': 3093.756, 'text': 'So in the absence of further evidence one way or the other, this is a good error measure to have.', 'start': 3088.991, 'duration': 4.765}, {'end': 3099.882, 'text': 'And then when I plug this error measure in what we had, things will collapse completely back to where we solved it already.', 'start': 3094.136, 'duration': 5.746}, {'end': 3101.463, 'text': 'So this is the big advantage here.', 'start': 3100.242, 'duration': 1.221}, {'end': 3104.927, 'text': 'So that is going to be my error measure.', 'start': 3103.125, 'duration': 1.802}, {'end': 3108.65, 'text': "Now, the new optimization I'm going to do is the following.", 'start': 3106.209, 
'duration': 2.441}, {'end': 3113.453, 'text': "It used to be that I'm minimizing this, because minimizing this maximizes the margin.", 'start': 3109.391, 'duration': 4.062}, {'end': 3114.934, 'text': 'That was what we did in the last lecture.', 'start': 3113.473, 'duration': 1.461}, {'end': 3123.178, 'text': "And now I'm going to add an error term that corresponds to the violation of the margin, and it's going to be this.", 'start': 3116.094, 'duration': 7.084}, {'end': 3128.02, 'text': 'This is the quantity that I promised you captures the violation of the margin.', 'start': 3124.677, 'duration': 3.343}, {'end': 3135.986, 'text': 'And this is a constant that gives me the relative importance of this term versus this term.', 'start': 3128.981, 'duration': 7.005}, {'end': 3140.45, 'text': 'This is no different from our notion of augmented error.', 'start': 3137.307, 'duration': 3.143}, {'end': 3147.815, 'text': 'Augmented error, we used to have the in-sample performance, which I guess would be the violation of the margin here.', 'start': 3142.071, 'duration': 5.744}, {'end': 3149.817, 'text': "If you are violating too much, you'll start making errors.", 'start': 3147.855, 'duration': 1.962}, {'end': 3154.159, 'text': 'plus lambda times a regularization term.', 'start': 3150.677, 'duration': 3.482}, {'end': 3156.9, 'text': 'This looks pretty much like a regularization term, like weight decay.', 'start': 3154.179, 'duration': 2.721}, {'end': 3160.722, 'text': 'So this C is actually 1 over the other lambda.', 'start': 3157.941, 'duration': 2.781}, {'end': 3164.384, 'text': 'But this is a standard formulation in SVM for a good reason.', 'start': 3161.222, 'duration': 3.162}, {'end': 3166.785, 'text': 'C will appear in a very nice way in the solution.', 'start': 3164.404, 'duration': 2.381}, {'end': 3170.087, 'text': 'So this is an augmented error that gives different weight.', 'start': 3167.726, 'duration': 2.361}, {'end': 3175.65, 'text': "If I have C close to infinity, 
then what am I saying? You'd better not violate the margins.", 'start': 3170.447, 'duration': 5.203}, {'end': 3180.329, 'text': 'Because the slightest violation, you mess up what you are minimizing.', 'start': 3176.547, 'duration': 3.782}, {'end': 3187.974, 'text': "So the end result is that you are going to pick xi's, all of them close to 0, and then the data had better be linearly separable.", 'start': 3180.79, 'duration': 7.184}, {'end': 3189.215, 'text': "And that's what you are solving for.", 'start': 3188.034, 'duration': 1.181}, {'end': 3190.856, 'text': 'So you go back to the hard margin.', 'start': 3189.575, 'duration': 1.281}, {'end': 3195.859, 'text': 'If C is very, very small, then you could be violating the margin right and left.', 'start': 3191.756, 'duration': 4.103}], 'summary': 'Introducing a new error measure for optimization in svm, with emphasis on the role of the parameter c in controlling margin violation.', 'duration': 142.855, 'max_score': 3053.004, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU3053004.jpg'}, {'end': 3226.497, 'src': 'embed', 'start': 3196.78, 'weight': 5, 'content': [{'end': 3198.781, 'text': 'So nominally, you are getting a great margin.', 'start': 3196.78, 'duration': 2.001}, {'end': 3201.874, 'text': 'But you are violating it very frequently.', 'start': 3200.332, 'duration': 1.542}, {'end': 3204.076, 'text': 'And there is a compromise here.', 'start': 3202.615, 'duration': 1.461}, {'end': 3205.398, 'text': "But that's what you are minimizing.", 'start': 3204.297, 'duration': 1.101}, {'end': 3211.425, 'text': 'Subject to, this is what I had before.', 'start': 3206.399, 'duration': 5.026}, {'end': 3214.668, 'text': 'And now the condition adds xi to it.', 'start': 3211.965, 'duration': 2.703}, {'end': 3217.351, 'text': "So I'm requiring this to be the case.", 'start': 3215.79, 'duration': 1.561}, {'end': 3220.856, 'text': "And I said that xi's are non-negative.", 
'start': 3218.735, 'duration': 2.121}, {'end': 3223.456, 'text': "I'm only penalizing the violating of the margin.", 'start': 3220.996, 'duration': 2.46}, {'end': 3226.497, 'text': "I'm not rewarding the anti-violation of the margin.", 'start': 3223.717, 'duration': 2.78}], 'summary': 'Frequent margin violations are penalized, not rewarded, in the optimization process.', 'duration': 29.717, 'max_score': 3196.78, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU3196780.jpg'}], 'start': 2733.112, 'title': 'Kernel, svm, and error measures', 'summary': 'Delves into establishing z space for any kernel, handling non-separable data types, and introducing error measures based on margin violations with support vector machines to achieve optimal solutions.', 'chapters': [{'end': 2818.175, 'start': 2733.112, 'title': 'Kernel and non-separable data', 'summary': 'Discusses the significance of establishing the z space for any kernel, addressing the dichotomy between two types of non-separable data, and the trade-off between errors and generalization in the case of non-linearly separable data.', 'duration': 85.063, 'highlights': ['Establishing the Z space for any kernel is crucial for proving the existence of the Z space, even without knowing its nature.', 'Addressing the trade-off between errors and generalization in non-linearly separable data, emphasizing the significance of avoiding inordinately complex solutions to prevent huge generalization errors.', 'Discussing the dichotomy between two types of non-separable data and the impact of outliers, highlighting the challenge of dealing with high-dimensional nonlinear spaces and the potential increase in the number of support vectors.']}, {'end': 3593.335, 'start': 2819.036, 'title': 'Support vector machines and error measures', 'summary': 'Discusses the application of support vector machines in handling non-separable cases, introducing error measures based on margin 
violations, and formulating the lagrangian to minimize the margin violation and achieve the optimal solution.', 'duration': 774.299, 'highlights': ['Support vector machines and kernels are used to handle non-separable cases in practical datasets, combining kernel with soft margin support vector machines.', 'Introduction of error measure based on quantifying margin violations, with the aim of penalizing and minimizing the violation of the margin.', 'Formulation of the Lagrangian to include the new variable xi for penalizing the violation of the margin, leading to the same Lagrangian as before and achieving the same solution.']}], 'duration': 860.223, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU2733112.jpg', 'highlights': ['Establishing the Z space for any kernel is crucial for proving the existence of the Z space, even without knowing its nature.', 'Support vector machines and kernels are used to handle non-separable cases in practical datasets, combining kernel with soft margin support vector machines.', 'Addressing the trade-off between errors and generalization in non-linearly separable data, emphasizing the significance of avoiding inordinately complex solutions to prevent huge generalization errors.', 'Introduction of error measure based on quantifying margin violations, with the aim of penalizing and minimizing the violation of the margin.', 'Discussing the dichotomy between two types of non-separable data and the impact of outliers, highlighting the challenge of dealing with high-dimensional nonlinear spaces and the potential increase in the number of support vectors.', 'Formulation of the Lagrangian to include the new variable xi for penalizing the violation of the margin, leading to the same Lagrangian as before and achieving the same solution.']}, {'end': 4069.88, 'segs': [{'end': 3625.072, 'src': 'embed', 'start': 3594.375, 'weight': 2, 'content': [{'end': 3600.237, 'text': 'The only 
ramification of beta that we have is that, because beta is greater than or equal to 0,', 'start': 3594.375, 'duration': 5.862}, {'end': 3606.898, 'text': 'and we have this condition, alpha is not only greater than or equal to 0, which is what it used to be;', 'start': 3600.237, 'duration': 6.661}, {'end': 3609.64, 'text': 'it also cannot be bigger than c.', 'start': 3606.898, 'duration': 2.742}, {'end': 3612.502, 'text': "Because if it's bigger than c, this quantity becomes negative.", 'start': 3609.64, 'duration': 2.862}, {'end': 3616.465, 'text': 'And all of a sudden, I cannot find a legitimate beta to make this true.', 'start': 3613.083, 'duration': 3.382}, {'end': 3625.072, 'text': 'So the only thing out of all of this adventure is that we are going to require that alpha be at most c.', 'start': 3617.386, 'duration': 7.686}], 'summary': 'Beta must be greater than or equal to 0, and alpha must be at most c.', 'duration': 30.697, 'max_score': 3594.375, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU3594375.jpg'}, {'end': 3892.043, 'src': 'embed', 'start': 3864.535, 'weight': 0, 'content': [{'end': 3868.278, 'text': 'So as long as you are violating, you are a support vector.', 'start': 3864.535, 'duration': 3.743}, {'end': 3871.12, 'text': 'Not a clean support vector, but a support vector nonetheless.', 'start': 3868.758, 'duration': 2.362}, {'end': 3882.716, 'text': 'Now, the value of c is a very important parameter here, because it tells us how much violation we have versus the width of the yellow region.', 'start': 3873.669, 'duration': 9.047}, {'end': 3892.043, 'text': 'And this is a quantity that will be decided in a practical problem using old-fashioned cross-validation.', 'start': 3883.337, 'duration': 8.706}], 'summary': 'C determines violation vs. 
yellow region width in support vector machines.', 'duration': 27.508, 'max_score': 3864.535, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU3864535.jpg'}, {'end': 3975.872, 'src': 'embed', 'start': 3940.804, 'weight': 1, 'content': [{'end': 3943.086, 'text': 'I never told you to check that the data is linearly separable.', 'start': 3940.804, 'duration': 2.282}, {'end': 3946.528, 'text': 'I give you the data, minimize this subject to that.', 'start': 3943.106, 'duration': 3.422}, {'end': 3950.912, 'text': 'Now if the data is not linearly separable, subject to that will be impossible to satisfy.', 'start': 3946.969, 'duration': 3.943}, {'end': 3952.253, 'text': 'There will be no feasible solution.', 'start': 3951.112, 'duration': 1.141}, {'end': 3957.097, 'text': "Nonetheless, this didn't prevent me from getting a dual and passing it to quadratic programming.", 'start': 3953.074, 'duration': 4.023}, {'end': 3959.499, 'text': 'And maybe quadratic programming will give me back a solution.', 'start': 3957.778, 'duration': 1.721}, {'end': 3962.422, 'text': "So now I'm in a strange world.", 'start': 3959.94, 'duration': 2.482}, {'end': 3975.872, 'text': 'So the key thing to realize is that the translation from the primal form minimizing w transpose w to the dual form maximizing with respect to alpha to the Lagrangian.', 'start': 3963.342, 'duration': 12.53}], 'summary': 'Even if the data is not linearly separable and the primal has no feasible solution, the dual can still be formed and passed to quadratic programming, which may return a solution.', 'duration': 35.068, 'max_score': 3940.804, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU3940804.jpg'}, {'end': 4045.715, 'src': 'embed', 'start': 4017.006, 'weight': 3, 'content': [{'end': 4018.707, 'text': 'Quadratic programming passes alphas back to you.', 'start': 4017.006, 'duration': 1.701}, {'end': 4023.071, 'text': 
"Now, it's impossible that all of a sudden, the data became 
linearly separable.", 'start': 4019.928, 'duration': 3.143}, {'end': 4024.573, 'text': "You don't have to worry.", 'start': 4023.752, 'duration': 0.821}, {'end': 4027.715, 'text': 'You can always check if the solution separates the data.', 'start': 4024.933, 'duration': 2.782}, {'end': 4031.845, 'text': 'You can evaluate the solution on every point, compare it with the label.', 'start': 4028.422, 'duration': 3.423}, {'end': 4037.028, 'text': "And when you realize that it's not agreeing with the label, you realize that something is wrong.", 'start': 4032.885, 'duration': 4.143}, {'end': 4039.87, 'text': "So you don't have to go through the combinatorial problem.", 'start': 4037.749, 'duration': 2.121}, {'end': 4045.715, 'text': 'Is this linearly separable in the first place? Should I run the perceptron first to see if it converges before? No, no, no, no.', 'start': 4039.95, 'duration': 5.765}], 'summary': 'Quadratic programming checks data separability and avoids combinatorial problem.', 'duration': 28.709, 'max_score': 4017.006, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU4017006.jpg'}], 'start': 3594.375, 'title': 'Constraints on alpha and beta', 'summary': "Discusses the constraint on alpha, where it is required to be at most c, and the implications of this condition on the solution, with beta not being a factor in the new constraint, and the importance of the parameter 'c' in soft margin support vector machines, emphasizing the use of cross-validation for optimal determination.", 'chapters': [{'end': 3669.576, 'start': 3594.375, 'title': 'Constraint on alpha and beta', 'summary': 'Discusses the constraint on alpha, where it is required to be at most c, and the implications of this condition on the solution, with beta not being a factor in the new constraint.', 'duration': 75.201, 'highlights': ['The only ramification of beta is that alpha needs to be at most c, with the added condition of less than or 
equal to C. This constraint is inherited from the previous slide.', 'If alpha is bigger than c, the quantity becomes negative and a legitimate beta cannot be found to make the condition true.', 'The solution now involves alpha being non-negative, with the added condition of less than or equal to C.']}, {'end': 4069.88, 'start': 3671.457, 'title': 'Support vector machines: types and parameters', 'summary': "Discusses the types of support vectors and the importance of the parameter 'c' in soft margin support vector machines, emphasizing the use of cross-validation for optimal determination, along with technical considerations for non-linearly separable data and the validity of the dual solution.", 'duration': 398.423, 'highlights': ['The value of c is a very important parameter, as it determines how much violation occurs versus the width of the margin, and is decided using cross-validation in practical problems.', 'The chapter explains that linear separability need not be checked before applying the machinery: if the data is not linearly separable, the primal problem has no feasible solution, yet the dual can still be passed to quadratic programming.', 'The discussion addresses the validity of the dual solution returned by quadratic programming, with the reassurance that separation can be checked after obtaining the solution by evaluating it on every point and comparing with the labels.']}], 
'duration': 475.505, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU3594375.jpg', 'highlights': ['The value of c is a very important parameter, as it determines how much violation occurs versus the width of the margin, and is decided using cross-validation in practical problems.', 'The chapter explains that linear separability need not be checked before applying the machinery: if the data is not linearly separable, the primal problem has no feasible solution, yet the dual can still be passed to quadratic programming.', 'The only ramification of beta is that alpha needs to be at most c, with the added condition of less than or equal to C. This constraint is inherited from the previous slide.', 'The discussion addresses the validity of the dual solution returned by quadratic programming, with the reassurance that separation can be checked after obtaining the solution by evaluating it on every point and comparing with the labels.', 'If alpha is bigger than c, the quantity becomes negative and a legitimate beta cannot be found to make the condition true.']}, {'end': 4690.082, 'segs': [{'end': 4121.292, 'src': 'embed', 'start': 4093.471, 'weight': 0, 'content': [{'end': 4096.493, 'text': '1 in our mind used to correspond to w0.', 'start': 4093.471, 'duration': 3.022}, {'end': 4103.776, 'text': 'We made it a point at the beginning of discussing support vectors that there is no w0.', 'start': 4099.352, 'duration': 4.424}, {'end': 4106.219, 'text': 'We took it out and called it b, the bias.', 'start': 4104.197, 'duration': 2.022}, {'end': 4107.62, 'text': 'We treated it differently.', 'start': 4106.559, 'duration': 1.061}, {'end': 4112.404, 'text': 'So now we are working with both w0 and b.', 'start': 4108.941, 'duration': 3.463}, {'end': 4116.148, 'text': "Because if you have a constant, you may not call it w0, but 
effectively it's w0.", 'start': 4112.404, 'duration': 3.744}, {'end': 4118.029, 'text': "It's the guy that gets multiplied by the constant.", 'start': 4116.247, 'duration': 1.782}, {'end': 4121.292, 'text': 'So what gives? Now I have two guys that play the same role.', 'start': 4118.71, 'duration': 2.582}], 'summary': 'In support vector discussion, w0 became b, introducing two parameters.', 'duration': 27.821, 'max_score': 4093.471, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU4093471.jpg'}, {'end': 4313.041, 'src': 'embed', 'start': 4283.961, 'weight': 1, 'content': [{'end': 4290.723, 'text': 'And how many support vectors am I going to get? I cannot get more than 2, because I only have 2, even if I go to 100-dimensional space.', 'start': 4283.961, 'duration': 6.762}, {'end': 4297.645, 'text': 'So the linearity is an impression that requires further assumptions.', 'start': 4291.023, 'duration': 6.622}, {'end': 4299.225, 'text': 'But in general, it will not hold.', 'start': 4297.705, 'duration': 1.52}, {'end': 4306.056, 'text': 'Yes, and for example, the RBF kernel, in its form, it looks like infinite dimensional.', 'start': 4299.571, 'duration': 6.485}, {'end': 4313.041, 'text': 'But in reality, I think its effective dimension is very small, because the higher order terms decay very fast.', 'start': 4306.096, 'duration': 6.945}], 'summary': 'Support vector machine has 2 support vectors, may not hold linearity, rbf kernel appears infinite dimensional but has a small effective dimension due to decay of higher order terms.', 'duration': 29.08, 'max_score': 4283.961, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU4283961.jpg'}, {'end': 4424.395, 'src': 'embed', 'start': 4396.094, 'weight': 3, 'content': [{'end': 4401.436, 'text': 'The other aspect is that the major success of support vector machines is really in classification.', 'start': 4396.094, 
'duration': 5.342}, {'end': 4407.563, 'text': 'They are not as successful competitively in regression.', 'start': 4402.96, 'duration': 4.603}, {'end': 4409.084, 'text': "That's the practical experience.", 'start': 4407.823, 'duration': 1.261}, {'end': 4415.568, 'text': "So I have found that it's not worth the amount of time to go into that.", 'start': 4409.664, 'duration': 5.904}, {'end': 4424.395, 'text': 'Is it safe to assume, then, that if you do the transformation to an infinite dimensional space, the data will be linearly separable there?', 'start': 4417.17, 'duration': 7.225}], 'summary': 'Support vector machines excel in classification but not in regression, making it not worth the time to pursue.', 'duration': 28.301, 'max_score': 4396.094, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU4396094.jpg'}, {'end': 4517.197, 'src': 'embed', 'start': 4487.333, 'weight': 4, 'content': [{'end': 4493.019, 'text': 'It just has a certain reliability that it has to have in order not to complain.', 'start': 4487.333, 'duration': 5.686}, {'end': 4497.624, 'text': 'So invariably, when you use quadratic programming, there will be a complaint one way or the other.', 'start': 4493.38, 'duration': 4.244}, {'end': 4504.652, 'text': 'But I have learned not to be completely discouraged by that, and tweak, limit variables and whatnot.', 'start': 4498.666, 'duration': 5.986}, {'end': 4508.256, 'text': 'But this is just completely a practical situation, depending on the package.', 'start': 4504.672, 'duration': 3.584}, {'end': 4511.353, 'text': 'Going back to the previous question.', 'start': 4510.052, 'duration': 1.301}, {'end': 4517.197, 'text': 'so when you said safe but not certain, does that mean just in very degenerate cases?', 'start': 4511.353, 'duration': 5.844}], 'summary': 'Quadratic programming has reliability issues, but can be tweaked for practical use.', 'duration': 29.864, 'max_score': 4487.333, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU4487333.jpg'}, {'end': 4688.14, 'src': 'embed', 'start': 4657.035, 'weight': 5, 'content': [{'end': 4660.517, 'text': 'There are packages specifically for SVM that use heuristics.', 'start': 4657.035, 'duration': 3.482}, {'end': 4666.258, 'text': "So they don't specifically pass on the thing to quadratic programming directly, but try to break it into pieces,", 'start': 4660.537, 'duration': 5.721}, {'end': 4670.96, 'text': 'get support vectors for each case and then get the union and so on the hierarchy methods and other methods.', 'start': 4666.258, 'duration': 4.702}, {'end': 4676.726, 'text': 'So they are basically heuristic methods for solving SVM when straightforward quadratic programming will fail.', 'start': 4671.36, 'duration': 5.366}, {'end': 4681.232, 'text': 'And these are also available, and should be used when you have too many data points.', 'start': 4677.107, 'duration': 4.125}, {'end': 4688.14, 'text': "I think that's it.", 'start': 4687.279, 'duration': 0.861}], 'summary': 'Svm packages use heuristic methods to handle large datasets and solve when quadratic programming fails.', 'duration': 31.105, 'max_score': 4657.035, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU4657035.jpg'}], 'start': 4069.94, 'title': 'Support vector machines and dimensionality', 'summary': 'Discusses the transformation to z-space for support vector machines, emphasizing the treatment of constant coordinates and the role of w0 and b in the solution. 
it also explores the relationship between support vectors and dimensionality, the impact of rbf kernel on effective dimension, and the challenges of generalizing svm to regression.', 'chapters': [{'end': 4158.944, 'start': 4069.94, 'title': 'Support vector machines in z-space', 'summary': 'Discusses the transformation to z-space for support vector machines, emphasizing the treatment of constant coordinates and the role of w0 and b in the solution, ensuring that the weights will go to 0 while the bulk of the bias will go to b.', 'duration': 89.004, 'highlights': ['The transformation to z-space for support vector machines involves the treatment of constant coordinates and the role of w0 and b in the solution, ensuring that the weights will go to 0 while the bulk of the bias will go to b.', 'The chapter addresses the handling of constant coordinates in z-space, clarifying the role of w0 and b in the solution, and emphasizes that all corresponding weights will go to 0 while the bulk of the bias will go to b.', 'In z-space, the treatment of constant coordinates and the distinction between w0 and b in the solution is highlighted, ensuring that all corresponding weights will go to 0 while the bulk of the bias will go to b.']}, {'end': 4690.082, 'start': 4160.044, 'title': 'Svm support vectors and dimensionality', 'summary': 'Discusses the relationship between support vectors and dimensionality, the behavior of support vectors in higher dimensions, the impact of rbf kernel on the effective dimension, and the challenges of generalizing svm to regression. it also covers the limitations of transforming data to infinite-dimensional space, the reliability of quadratic programming, and the potential of combining kernels to create new ones.', 'duration': 530.038, 'highlights': ['The number of support vectors is likely to increase with the increasing dimension, but the exact form depends on the data set and the position of the data set, including the interior points. 
The relationship between the number of support vectors and the dimensionality is discussed, indicating that the support vectors are likely to increase with the dimension. However, the exact form depends on the data set and the position of the data set, including the interior points.', 'The RBF kernel may appear infinite-dimensional in form, but its effective dimension is very small due to the fast decay of higher-order terms with both exponential and factorial terms. The impact of the RBF kernel on the effective dimension is explained, highlighting that despite its appearance of infinite dimensionality, the effective dimension is very small due to the rapid decay of higher-order terms with both exponential and factorial terms.', 'The practical success of support vector machines lies in classification rather than regression, and the generalization to regression is not extensively covered due to its technical complexity and limited practical success. The limitations of generalizing support vector machines to regression are discussed, emphasizing that while there is a substantial body of knowledge for generalizing to regression, the focus remains on the practical success of SVM in classification rather than regression.', 'Quadratic programming may raise complaints if the matrix given is not positive definite, but despite the complaints, the solution tends to be reliable in most cases. The reliability of quadratic programming is discussed, noting that while complaints may arise if the given matrix is not positive definite, the solution remains reliable in most cases despite the complaints.', 'The scalability of problems that can be solved by SVMs through quadratic programming depends on the software used, with some packages designed specifically for handling large datasets through heuristic methods. 
The scalability of problems that can be solved by SVMs through quadratic programming is addressed, highlighting that the scalability depends on the software used, and specific packages are designed to handle large datasets using heuristic methods.']}], 'duration': 620.142, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/XUj5JbQihlU/pics/XUj5JbQihlU4069940.jpg', 'highlights': ['The transformation to z-space for support vector machines involves the treatment of constant coordinates and the role of w0 and b in the solution, ensuring that the corresponding weights will go to 0 while the bulk of the bias will go to b.', 'The RBF kernel may appear infinite-dimensional in form, but its effective dimension is very small due to the fast decay of higher-order terms with both exponential and factorial terms.', 'The number of support vectors is likely to increase with increasing dimension, but the exact form depends on the data set and the position of the data set, including the interior points.', 'The limitations of generalizing support vector machines to regression are discussed, emphasizing that while there is a substantial body of knowledge for generalizing to regression, the focus remains on the practical success of SVM in classification rather than regression.', 'The reliability of quadratic programming is discussed, noting that while complaints may arise if the given matrix is not positive definite, the solution remains reliable in most cases despite the complaints.', 'The scalability of problems that can be solved by SVMs through quadratic programming is addressed, highlighting that the scalability depends on the software used, and specific packages are designed to handle large datasets using heuristic methods.']}], 'highlights': ['The importance of maximizing the margin for performance by selecting the line with the biggest margin, providing an advantage both intuitively and theoretically.', 'The use of Lagrangian, a quadratic function with simple 
inequality constraints and one equality constraint, to be maximized to solve for alphas, with the majority of alphas being 0, providing an interesting interpretation.', 'The support vectors achieve the margin and define the plane, and the number of support vectors can predict the out-of-sample error based on the in-sample quantity, providing a measure for error prediction.', 'The use of a nonlinear transform results in a complex hypothesis set in a high-dimensional space, but it maximizes the margin, leading to low out-of-sample error while still fitting the data well.', 'Obtaining the inner product in the Z space is a prerequisite for carrying out the Lagrangian machinery.', 'The kernel function corresponds to an inner product in an infinite-dimensional space, providing the benefits of a nonlinear transformation without concerns about generalization issues.', 'The transformation to infinite-dimensional space using the radial basis function kernel results in a small number of support vectors and a respectable margin, despite using an infinite number of parameters.', 'The formulation of the problem involves passing a matrix computed in terms of inner products through quadratic programming, allowing the construction of the hypothesis in terms of the kernel.', 'Establishing that a kernel is valid amounts to proving that a corresponding Z space exists, even without knowing what that Z space is.', 'The value of C is a very important parameter, as it determines how much violation occurs versus the width of the margin, and is decided using cross-validation in practical problems.', 'The transformation to z-space for support vector machines involves the treatment of constant coordinates and the role of w0 and b in the solution, ensuring that the corresponding weights will go to 0 while the bulk of the bias will go to b.', 'The RBF kernel may appear infinite-dimensional in form, but its effective dimension is very small due to the fast decay of higher-order terms with both exponential and factorial 
terms.']}