title
Lecture 16 - Radial Basis Functions
description
Radial Basis Functions - An important learning model that connects several machine learning models and techniques. Lecture 16 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - https://itunes.apple.com/us/course/machine-learning/id515364596 and on the course website - http://work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, see http://creativecommons.org/licenses/by-nc-nd/3.0/
This lecture was recorded on May 24, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.
detail
{'title': 'Lecture 16 - Radial Basis Functions', 'heatmap': [{'end': 349.026, 'start': 294.844, 'weight': 0.737}, {'end': 596.237, 'start': 385.113, 'weight': 1}, {'end': 3059.982, 'start': 2906.421, 'weight': 0.901}], 'summary': 'Discusses svm generalization, introduces radial basis functions (rbf) in machine learning, covers learning algorithms, unsupervised learning, compares svm and neural networks, optimization of parameters for gaussian-based neural networks, rbf implementation, function approximation, and interprets rbfs for smooth interpolation and simulating two-layer neural networks.', 'chapters': [{'end': 248.212, 'segs': [{'end': 64.251, 'src': 'embed', 'start': 0.703, 'weight': 0, 'content': [{'end': 3.564, 'text': 'The following program is brought to you by Caltech.', 'start': 0.703, 'duration': 2.861}, {'end': 15.889, 'text': 'Welcome back.', 'start': 15.369, 'duration': 0.52}, {'end': 31.54, 'text': 'Last time we talked about kernel methods, which is a generalization of the basic SVM algorithm to accommodate feature spaces Z,', 'start': 18.109, 'duration': 13.431}, {'end': 44.511, 'text': "which are possibly infinite and which we don't have to explicitly know or transform our inputs to in order to be able to carry out the support vector machinery.", 'start': 31.54, 'duration': 12.971}, {'end': 50.978, 'text': 'And the idea was to define a kernel that captures the inner product in that space.', 'start': 45.512, 'duration': 5.466}, {'end': 58.446, 'text': 'And if you can compute that kernel, the generalized inner product for the z space.', 'start': 51.779, 'duration': 6.667}, {'end': 64.251, 'text': 'this is the only operation you need in order to carry the algorithm and in order to interpret the solution after you get it.', 'start': 58.446, 'duration': 5.805}], 'summary': 'Kernel methods in svm generalizes basic algorithm to accommodate feature spaces, allowing computation of kernel to carry out the support vector machinery.', 'duration': 63.548, 
'max_score': 0.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc703.jpg'}, {'end': 128.6, 'src': 'embed', 'start': 104.119, 'weight': 1, 'content': [{'end': 114.068, 'text': 'So with this, we went into another way to generalize SVM, not by having a nonlinear transform in this case, but by having an allowance for errors.', 'start': 104.119, 'duration': 9.949}, {'end': 117.17, 'text': 'Errors in this case would be violations of the margin.', 'start': 114.268, 'duration': 2.902}, {'end': 119.953, 'text': 'The margin is the currency we use in SVM.', 'start': 117.571, 'duration': 2.382}, {'end': 128.6, 'text': 'And we added a term to the objective function that allows us to violate the margin for different points, according to the variable psi.', 'start': 120.754, 'duration': 7.846}], 'summary': 'Generalized svm with allowance for errors to violate margin.', 'duration': 24.481, 'max_score': 104.119, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc104119.jpg'}], 'start': 0.703, 'title': 'Svm generalization', 'summary': 'Discusses kernel methods, rbfs, and generalizing svm by accommodating possibly infinite feature spaces, allowing margin violations, and achieving better generalization prospects, resulting in an identical solution to the original svm.', 'chapters': [{'end': 82.62, 'start': 0.703, 'title': 'Kernel methods and rbfs at caltech', 'summary': 'Discusses kernel methods and the rbf kernel, highlighting the generalization of the basic svm algorithm to accommodate possibly infinite feature spaces and the computation of the rbf kernel in terms of x, corresponding to an infinite dimensional space.', 'duration': 81.917, 'highlights': ['The chapter discusses the generalization of the basic SVM algorithm to accommodate feature spaces Z, which are possibly infinite, without the need to explicitly know or transform inputs.', 'It emphasizes the importance 
of defining a kernel that captures the inner product in the space, as it is the only operation needed to carry the algorithm and interpret the solution.', 'The RBF kernel, suitable for discussing radial basis functions, is presented as a simple computation in terms of x, corresponding to an infinite dimensional space.']}, {'end': 248.212, 'start': 83.22, 'title': 'Generalizing svm with margin errors', 'summary': 'Discusses generalizing support vector machines by allowing margin violations using a term in the objective function, providing flexibility in accommodating outliers and achieving better generalization prospects, resulting in an identical solution to the original svm.', 'duration': 164.992, 'highlights': ['By introducing a term in the objective function to allow margin violations, the chapter provides a way to accommodate outliers and achieve better generalization prospects, resulting in an identical solution to the original SVM.', 'The parameter c in the objective function provides a degree of freedom in design, allowing for different levels of tolerance towards violations, ultimately influencing the margin and the number of support vectors.', 'The modification of the problem statement to allow margin errors does not affect the solution, as it still involves applying quadratic programming with similar constraints, except for the limitation of alpha n by capital C.']}], 'duration': 247.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc703.jpg', 'highlights': ['The chapter discusses generalizing the basic SVM algorithm to accommodate possibly infinite feature spaces without explicit knowledge or transformation.', 'Introduces a term in the objective function to allow margin violations, accommodating outliers and achieving better generalization prospects.', 'Emphasizes the importance of defining a kernel capturing the inner product in the space, as the only operation needed to carry the algorithm and 
interpret the solution.']}, {'end': 582.394, 'segs': [{'end': 349.026, 'src': 'heatmap', 'start': 294.844, 'weight': 0.737, 'content': [{'end': 295.904, 'text': 'Very small overhead.', 'start': 294.844, 'duration': 1.06}, {'end': 300.367, 'text': 'There is a particular criteria that makes it better than just choosing a random separating plane.', 'start': 296.465, 'duration': 3.902}, {'end': 304.029, 'text': 'And therefore, it does reflect on the out-of-sample performance.', 'start': 301.007, 'duration': 3.022}, {'end': 309.145, 'text': "Today's topic is a new model, which is radial basis functions.", 'start': 305.663, 'duration': 3.482}, {'end': 313.608, 'text': "Not so new, because we had a version of it under SVM, and we'll be able to relate to it.", 'start': 309.225, 'duration': 4.383}, {'end': 315.389, 'text': "But it's an interesting model.", 'start': 314.248, 'duration': 1.141}, {'end': 322.233, 'text': 'In its own right, it captures a particular understanding of the input space that we will talk about.', 'start': 316.11, 'duration': 6.123}, {'end': 334.419, 'text': 'But the most important aspect that the radial basis functions provide for us is the fact that they relate to so many facets of machine learning that we have already touched on,', 'start': 323.234, 'duration': 11.185}, {'end': 341.082, 'text': "and other aspects that we didn't touch on in pattern recognition, that it's worthwhile to understand the model and see how it relates.", 'start': 334.419, 'duration': 6.663}, {'end': 345.604, 'text': 'It almost serves as a glue between so many different topics in machine learning.', 'start': 341.422, 'duration': 4.182}, {'end': 349.026, 'text': 'And this is one of the important aspects of studying the subject.', 'start': 345.904, 'duration': 3.122}], 'summary': 'Radial basis functions model relates to many facets of machine learning and serves as a glue between different topics.', 'duration': 54.182, 'max_score': 294.844, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc294844.jpg'}, {'end': 341.082, 'src': 'embed', 'start': 316.11, 'weight': 0, 'content': [{'end': 322.233, 'text': 'In its own right, it captures a particular understanding of the input space that we will talk about.', 'start': 316.11, 'duration': 6.123}, {'end': 334.419, 'text': 'But the most important aspect that the radial basis functions provide for us is the fact that they relate to so many facets of machine learning that we have already touched on,', 'start': 323.234, 'duration': 11.185}, {'end': 341.082, 'text': "and other aspects that we didn't touch on in pattern recognition, that it's worthwhile to understand the model and see how it relates.", 'start': 334.419, 'duration': 6.663}], 'summary': 'Radial basis functions relate to many facets of machine learning and pattern recognition.', 'duration': 24.972, 'max_score': 316.11, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc316110.jpg'}, {'end': 412.057, 'src': 'embed', 'start': 381.151, 'weight': 1, 'content': [{'end': 384.533, 'text': 'Obviously, it should relate to the RBF kernel, and it will.', 'start': 381.151, 'duration': 3.382}, {'end': 394.619, 'text': 'And finally, it will relate to regularization, which is actually the origin in function approximation for the study of RBFs.', 'start': 385.113, 'duration': 9.506}, {'end': 398.921, 'text': "Let's first describe the basic radial basis function model.", 'start': 394.699, 'duration': 4.222}, {'end': 410.957, 'text': 'The idea here is that every point in your data set will influence the value of the hypothesis at every point x.', 'start': 401.454, 'duration': 9.503}, {'end': 412.057, 'text': "That's nothing new.", 'start': 410.957, 'duration': 1.1}], 'summary': 'The transcript discusses the rbf kernel, regularization, and the radial basis function model in function approximation.', 'duration': 30.906, 
'max_score': 381.151, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc381151.jpg'}], 'start': 248.792, 'title': 'Radial basis functions model', 'summary': 'Introduces the concept of radial basis functions (rbf) in machine learning, emphasizing its role as a technique for classification with minimal overhead and its ability to relate to various facets of machine learning. it also explains the standard form of the rbf model and its influence on the hypothesis at different data points.', 'chapters': [{'end': 582.394, 'start': 248.792, 'title': 'Radial basis functions model', 'summary': 'Introduces the concept of radial basis functions (rbf) and its relevance in machine learning, emphasizing its role as a technique for classification with minimal overhead and its ability to relate to various facets of machine learning. it also explains the standard form of the rbf model and its influence on the hypothesis at different data points.', 'duration': 333.602, 'highlights': ['The radial basis functions (RBF) serve as a technique for classification with minimal overhead, often chosen as the model of choice by many, and its ability to relate to various facets of machine learning. technique for classification, minimal overhead, model of choice, relates to various facets of machine learning', 'The chapter explains the standard form of the radial basis function (RBF) model, emphasizing its influence on the hypothesis at different data points based on the distance and the value of the target. standard form of RBF model, influence based on distance, value of the target', 'The concept of radial basis functions (RBF) is introduced, highlighting its relevance in machine learning and ability to capture a particular understanding of the input space, serving as a glue between different topics in machine learning. 
relevance in machine learning, captures understanding of input space, serves as a glue between different topics']}], 'duration': 333.602, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc248792.jpg', 'highlights': ['The radial basis functions (RBF) serve as a technique for classification with minimal overhead, often chosen as the model of choice by many, and its ability to relate to various facets of machine learning.', 'The chapter explains the standard form of the radial basis function (RBF) model, emphasizing its influence on the hypothesis at different data points based on the distance and the value of the target.', 'The concept of radial basis functions (RBF) is introduced, highlighting its relevance in machine learning and ability to capture a particular understanding of the input space, serving as a glue between different topics in machine learning.']}, {'end': 2217.648, 'segs': [{'end': 609.389, 'src': 'embed', 'start': 583.235, 'weight': 2, 'content': [{'end': 587.576, 'text': 'But this is basically the model in its simplest form, and its most popular form.', 'start': 583.235, 'duration': 4.341}, {'end': 589.437, 'text': 'Most people will use a Gaussian like this.', 'start': 587.836, 'duration': 1.601}, {'end': 592.838, 'text': 'And this will be the functional form for the hypothesis.', 'start': 590.017, 'duration': 2.821}, {'end': 596.237, 'text': 'Now we have the model.', 'start': 595.256, 'duration': 0.981}, {'end': 604.384, 'text': 'The next question we normally ask is, what is the learning algorithm? What is a learning algorithm in general? 
You want to find the parameters.', 'start': 596.638, 'duration': 7.746}, {'end': 606.827, 'text': 'And we call the parameters w1 up to wn.', 'start': 604.444, 'duration': 2.383}, {'end': 609.389, 'text': 'And they have this functional form.', 'start': 607.607, 'duration': 1.782}], 'summary': 'The gaussian model is widely used for hypothesis with parameters w1 to wn.', 'duration': 26.154, 'max_score': 583.235, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc583235.jpg'}, {'end': 720.949, 'src': 'embed', 'start': 663.955, 'weight': 0, 'content': [{'end': 671.26, 'text': "When we, let's say, ask, ambitiously, to have the in-sample error being 0, I want to be exactly right on the data points.", 'start': 663.955, 'duration': 7.305}, {'end': 674.242, 'text': 'I should expect that I will be able to do that.', 'start': 672.1, 'duration': 2.142}, {'end': 686.631, 'text': "Why? Because really, I have quite a number of parameters here, don't I? 
I have N data points, and I'm trying to learn N parameters.", 'start': 674.702, 'duration': 11.929}, {'end': 692.636, 'text': 'Notwithstanding the generalization ramifications of that statement,', 'start': 688.393, 'duration': 4.243}, {'end': 698.826, 'text': 'it should be easy to get parameters that really knock down the in-sample error to 0..', 'start': 692.636, 'duration': 6.19}, {'end': 707.737, 'text': "So in doing that, what I'm going to do, I'm going to apply this to every point xn, and ask that the output of the hypothesis be equal to yn.", 'start': 698.826, 'duration': 8.911}, {'end': 708.938, 'text': 'No error at all.', 'start': 708.257, 'duration': 0.681}, {'end': 712.182, 'text': 'So indeed, the in-sample error will be 0.', 'start': 709.299, 'duration': 2.883}, {'end': 714.385, 'text': "So let's substitute in the equation here.", 'start': 712.182, 'duration': 2.203}, {'end': 719.949, 'text': 'And this is true for all n up to N.', 'start': 715.806, 'duration': 4.143}, {'end': 720.949, 'text': 'And here is what you have.', 'start': 719.949, 'duration': 1}], 'summary': 'Achieving in-sample error of 0 by matching n parameters to n data points.', 'duration': 56.994, 'max_score': 663.955, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc663955.jpg'}, {'end': 886.039, 'src': 'embed', 'start': 851.025, 'weight': 3, 'content': [{'end': 856.69, 'text': 'the solution is very simply just w equals the inverse of phi times y.', 'start': 851.025, 'duration': 5.665}, {'end': 862.22, 'text': 'In that case, you interpret your solution as exact interpolation.', 'start': 856.69, 'duration': 5.53}, {'end': 873.569, 'text': 'Because what you are really doing is, on the points that you know the value, which are the training points, you are getting the value exactly.', 'start': 863.001, 'duration': 10.568}, {'end': 875.07, 'text': "That's what you solve for.", 'start': 873.689, 'duration': 1.381}, {'end': 886.039, 'text': 
'And now the kernel, which is the Gaussian in this case, what it does is interpolate between the points to give you the value on the other x's.', 'start': 875.911, 'duration': 10.128}], 'summary': 'The solution for w is w = inverse of phi times y, providing exact interpolation for training points and interpolation between points using a gaussian kernel.', 'duration': 35.014, 'max_score': 851.025, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc851025.jpg'}, {'end': 1074.346, 'src': 'embed', 'start': 1025.974, 'weight': 4, 'content': [{'end': 1033.501, 'text': "And you probably in your mind think that gamma matters also in relation to the distance between the points, because that's what the interpolation is.", 'start': 1025.974, 'duration': 7.527}, {'end': 1036.483, 'text': 'And we will discuss the choice of gamma towards the end.', 'start': 1033.921, 'duration': 2.562}, {'end': 1041.588, 'text': 'After we settle all the other parameters, we will go and visit gamma and see how we can choose it wisely.', 'start': 1036.584, 'duration': 5.004}, {'end': 1047.907, 'text': 'With this in mind, we have a model.', 'start': 1046.146, 'duration': 1.761}, {'end': 1051.451, 'text': 'But that model, if you look at it, is a regression model.', 'start': 1048.147, 'duration': 3.304}, {'end': 1053.912, 'text': 'I consider the output to be real-valued.', 'start': 1051.851, 'duration': 2.061}, {'end': 1059.236, 'text': 'And I match the real-valued output to the target output, which is also real-valued.', 'start': 1054.913, 'duration': 4.323}, {'end': 1063.238, 'text': 'Often, we will use RBFs for classification.', 'start': 1060.316, 'duration': 2.922}, {'end': 1069.683, 'text': 'So when you look at h of x, which used to be regression this way, it gives you a real number.', 'start': 1063.999, 'duration': 5.684}, {'end': 1074.346, 'text': 'Now we are going to take, as usual, the sign of this quantity.', 'start': 1070.743, 
'duration': 3.603}], 'summary': 'Discussing the choice of gamma for regression and rbf classification models.', 'duration': 48.372, 'max_score': 1025.974, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc1025973.jpg'}, {'end': 1226.73, 'src': 'embed', 'start': 1194.549, 'weight': 5, 'content': [{'end': 1196.269, 'text': 'This is the nearest neighbor method.', 'start': 1194.549, 'duration': 1.72}, {'end': 1196.99, 'text': "Let's look at it.", 'start': 1196.309, 'duration': 0.681}, {'end': 1202.934, 'text': 'The idea of nearest neighbor is that I give you a data set, and each data set has a value yn.', 'start': 1198.09, 'duration': 4.844}, {'end': 1205.417, 'text': 'It could be a label if you are talking about classification.', 'start': 1203.035, 'duration': 2.382}, {'end': 1206.838, 'text': 'It could be a real value.', 'start': 1205.437, 'duration': 1.401}, {'end': 1213.244, 'text': 'And what you do for classifying other points or assigning values to other points is very simple.', 'start': 1207.959, 'duration': 5.285}, {'end': 1217.627, 'text': 'You look at the closest point within the training set.', 'start': 1213.984, 'duration': 3.643}, {'end': 1219.568, 'text': 'to the point you are considering.', 'start': 1218.328, 'duration': 1.24}, {'end': 1220.688, 'text': 'So you have x.', 'start': 1219.588, 'duration': 1.1}, {'end': 1226.73, 'text': 'You look at what is x sub n in the training set that is closest to me in Euclidean distance.', 'start': 1220.688, 'duration': 6.042}], 'summary': 'Nearest neighbor method assigns values based on closest point in training set.', 'duration': 32.181, 'max_score': 1194.549, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc1194549.jpg'}, {'end': 1382.219, 'src': 'embed', 'start': 1357.097, 'weight': 6, 'content': [{'end': 1365.241, 'text': 'So in order to make it less abrupt, The nearest neighbor is modified to becoming k 
nearest neighbors.', 'start': 1357.097, 'duration': 8.144}, {'end': 1369.726, 'text': "That is, instead of taking the value of the closest point, you look, let's say,", 'start': 1365.541, 'duration': 4.185}, {'end': 1375.873, 'text': 'for the 3 closest points or the 5 closest points or the 7 closest points and then take a vote.', 'start': 1369.726, 'duration': 6.147}, {'end': 1380.117, 'text': 'If most of them are plus 1, you consider yourself plus 1.', 'start': 1376.753, 'duration': 3.364}, {'end': 1382.219, 'text': 'That helps even things out a little bit.', 'start': 1380.117, 'duration': 2.102}], 'summary': 'Modified nearest neighbor to k nearest neighbors for better classification by considering 3, 5, or 7 closest points and taking a vote.', 'duration': 25.122, 'max_score': 1357.097, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc1357097.jpg'}, {'end': 1557.099, 'src': 'embed', 'start': 1530.425, 'weight': 8, 'content': [{'end': 1538.11, 'text': "you take K, which is the number of centers in this case, and hopefully it's much smaller than n, so that the generalization worry is mitigated.", 'start': 1530.425, 'duration': 7.685}, {'end': 1541.712, 'text': 'And you define the centers.', 'start': 1539.751, 'duration': 1.961}, {'end': 1548.737, 'text': 'These are vectors, mu 1 up to mu sub k, as the centers of the radial basis functions.', 'start': 1541.732, 'duration': 7.005}, {'end': 1554.318, 'text': 'Instead of having x1 up to xn, the data points themselves being the center.', 'start': 1549.677, 'duration': 4.641}, {'end': 1557.099, 'text': 'Now, those guys live in the same space.', 'start': 1554.838, 'duration': 2.261}], 'summary': 'Define k centers, mu 1 up to mu sub k, as the centers of radial basis functions, aiming for k to be much smaller than n to mitigate generalization worry.', 'duration': 26.674, 'max_score': 1530.425, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc1530425.jpg'}, {'end': 1989.565, 'src': 'embed', 'start': 1960.97, 'weight': 10, 'content': [{'end': 1967.274, 'text': "And we'll get not to the global minimum, the finding of which is NP-hard, but a local minimum, hopefully a decent local minimum.", 'start': 1960.97, 'duration': 6.304}, {'end': 1969.175, 'text': "We'll do exactly the same thing here.", 'start': 1967.674, 'duration': 1.501}, {'end': 1975.097, 'text': 'So here is the iterative algorithm for solving this problem, the K-means.', 'start': 1971.615, 'duration': 3.482}, {'end': 1977.739, 'text': "And it's called Lloyd's algorithm.", 'start': 1976.138, 'duration': 1.601}, {'end': 1984.202, 'text': 'It is extremely simple to the level where the contrast between this algorithm not only in the specification of it,', 'start': 1978.219, 'duration': 5.983}, {'end': 1989.565, 'text': 'by how quickly it converges and the fact that finding the global minimum is NP-hard is rather mind-boggling.', 'start': 1984.202, 'duration': 5.363}], 'summary': 'Iterative k-means algorithm finds local minimum efficiently.', 'duration': 28.595, 'max_score': 1960.97, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc1960970.jpg'}], 'start': 583.235, 'title': 'Learning algorithms and interpolation methods', 'summary': 'Covers the RBF learning algorithm, interpolation with gaussian kernels, rbfs for classification, nearest neighbor method, and radial basis function with k-means clustering, supported by specific data points and parameter modifications.', 'chapters': [{'end': 737.579, 'start': 583.235, 'title': 'RBF learning algorithm', 'summary': 'The chapter explains the RBF model in its simplest form, highlighting the learning algorithm to find parameters and aiming to minimize the in-sample error to 0, supported by the number of parameters and data 
points.', 'duration': 154.344, 'highlights': ['The RBF model in its simplest form is explained, with emphasis on using a Gaussian as the functional form for the hypothesis.', 'The learning algorithm is detailed, focusing on finding parameters w1 up to wn and aiming to minimize the in-sample error to 0 by evaluating the hypothesis on the data points.', 'The significance of the number of parameters and data points in relation to minimizing the in-sample error is highlighted, indicating the potential ease of achieving an in-sample error of 0.']}, {'end': 1047.907, 'start': 738.439, 'title': 'Interpolation with gaussian kernels', 'summary': 'Explains the process of solving equations and unknowns using matrix form with n data points and the effect of the gamma parameter on the interpolation, showing the impact of small and large gamma values on the gaussian interpolation.', 'duration': 309.468, 'highlights': ['The solution involves solving N equations with N unknowns in matrix form using the matrix phi and vectors w and y, resulting in the exact interpolation of values at training points.', 'The gamma parameter significantly affects the outcome of the interpolation, as a small gamma results in a wide Gaussian interpolation, while a large gamma leads to a narrow Gaussian interpolation, impacting the meaningfulness of the interpolations between points. 
', 'A small gamma results in successful interpolation between points, while a large gamma leads to poor interpolation due to the quick dying out of the Gaussian influence between points, emphasizing the importance of gamma in relation to the distance between points.']}, {'end': 1419.569, 'start': 1048.147, 'title': 'Rbfs for classification and nearest neighbor method', 'summary': 'The chapter explores the use of radial basis functions (rbfs) for classification, including the transformation of real-valued outputs to yes-no decisions, and the modification of the nearest neighbor method to k-nearest neighbors for smoothing the surface of the classification boundary.', 'duration': 371.422, 'highlights': ['The chapter explains the transformation of real-valued outputs to yes-no decisions for classification using RBFs, and the minimization of mean squared error to match the plus-minus 1 target, akin to linear regression for classification.', 'The discussion on the nearest neighbor method details the classification of points based on the label of the nearest point within the training set, and the modification to k-nearest neighbors for smoothing the classification boundary and reducing fluctuations.']}, {'end': 2217.648, 'start': 1419.569, 'title': 'Radial basis function and k-means clustering', 'summary': "Discusses the concepts of radial basis function and k-means clustering, with a focus on modifying the interpolation model to address the issue of having n parameters based on n data 
points, introducing k as the number of centers to mitigate generalization worries, and detailing the iterative algorithm for solving the k-means problem using lloyd's algorithm.", 'duration': 798.079, 'highlights': ['Modifying the interpolation model to address the issue of having N parameters based on N data points The chapter addresses the problem of having N parameters based on N data points in the model and modifies it by introducing K as the number of centers to mitigate generalization worries.', 'Introducing K as the number of centers to mitigate generalization worries The concept of introducing K as the number of centers is emphasized to mitigate generalization worries and reduce the ratio between data points and parameters.', "Detailing the iterative algorithm for solving the K-means problem using Lloyd's algorithm The chapter details the iterative algorithm for solving the K-means problem using Lloyd's algorithm, which involves iteratively minimizing the clustering and center parameters to converge to a local minimum."]}], 'duration': 1634.413, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc583235.jpg', 'highlights': ['The learning algorithm aims to minimize the in-sample error to 0 by evaluating the hypothesis on the data points.', 'The significance of the number of parameters and data points in relation to minimizing the in-sample error is highlighted.', 'The model for linear regression in its simplest form is explained, with emphasis on using a Gaussian model as the functional form for the hypothesis.', 'The solution involves solving N equations with N unknowns in matrix form using the matrix phi and vectors w and y, ensuring exact interpolation of values at training points.', 'The gamma parameter has a significant impact on the interpolation outcome, with small gamma leading to wide Gaussian interpolation and large gamma resulting in narrow Gaussian interpolation, affecting the meaningfulness of 
interpolations between points.', 'The discussion on the nearest neighbor method details the classification of points based on the label of the nearest point within the training set.', 'The modification to k-nearest neighbors for smoothing the classification boundary and reducing fluctuations is discussed.', 'The chapter explains the transformation of real-valued outputs to yes-no decisions for classification using RBFs, and the minimization of mean squared error to match the plus-minus 1 target, akin to linear regression for classification.', 'The chapter addresses the problem of having N parameters based on N data points in the model and modifies it by introducing K as the number of centers to mitigate generalization worries.', 'The concept of introducing K as the number of centers is emphasized to mitigate generalization worries and reduce the ratio between data points and parameters.', "The chapter details the iterative algorithm for solving the K-means problem using Lloyd's algorithm, which involves iteratively minimizing the clustering and center parameters to converge to a local minimum."]}, {'end': 2718.594, 'segs': [{'end': 2256.423, 'src': 'embed', 'start': 2218.409, 'weight': 0, 'content': [{'end': 2223.572, 'text': 'Depending on your initial configuration, you will end up with one local minimum or another.', 'start': 2218.409, 'duration': 5.163}, {'end': 2227.835, 'text': 'But again, exactly the same situation as we had with neural networks.', 'start': 2224.313, 'duration': 3.522}, {'end': 2231.357, 'text': 'We did converge to a local minimum with backpropagation.', 'start': 2228.315, 'duration': 3.042}, {'end': 2234.079, 'text': 'And that minimum depended on the initial weights.', 'start': 2231.937, 'duration': 2.142}, {'end': 2239.64, 'text': 'Here, it will depend on the initial centers, or the initial clustering, whichever way you want to begin.', 'start': 2234.679, 'duration': 4.961}, {'end': 2247.281, 'text': 'And the way you do it is, try different 
starting points, and you get different solutions.', 'start': 2240.96, 'duration': 6.321}, {'end': 2249.622, 'text': 'And you can evaluate which one is better,', 'start': 2247.641, 'duration': 1.981}, {'end': 2254.902, 'text': 'because you can definitely evaluate this objective function for all of them and pick one out of a number of runs.', 'start': 2249.622, 'duration': 5.28}, {'end': 2256.423, 'text': 'That usually works very nicely.', 'start': 2255.142, 'duration': 1.281}], 'summary': 'Different initial configurations lead to different local minima, similar to neural networks. evaluating objective function helps choose the best solution.', 'duration': 38.014, 'max_score': 2218.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2218409.jpg'}, {'end': 2421.51, 'src': 'embed', 'start': 2389.608, 'weight': 2, 'content': [{'end': 2391.449, 'text': 'So here are my initial centers.', 'start': 2389.608, 'duration': 1.841}, {'end': 2392.79, 'text': 'Totally random.', 'start': 2392.07, 'duration': 0.72}, {'end': 2399.655, 'text': 'Looks like a terribly stupid thing to have three centers near to each other, and have this entire area empty.', 'start': 2393.311, 'duration': 6.344}, {'end': 2404.238, 'text': "But let's hope that Lloyd's algorithm will place them a little bit more strategically.", 'start': 2399.735, 'duration': 4.503}, {'end': 2409.342, 'text': 'Now you iterate.', 'start': 2408.702, 'duration': 0.64}, {'end': 2411.684, 'text': 'I would like you to stare at this.', 'start': 2410.183, 'duration': 1.501}, {'end': 2414.746, 'text': 'I will even make it bigger.', 'start': 2412.925, 'duration': 1.821}, {'end': 2421.51, 'text': "Stare at it, because I'm going to do a full iteration now.", 'start': 2419.069, 'duration': 2.441}], 'summary': "Initial centers placed randomly, hoping for strategic placement with lloyd's algorithm.", 'duration': 31.902, 'max_score': 2389.608, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2389608.jpg'}, {'end': 2729.998, 'src': 'embed', 'start': 2699.49, 'weight': 3, 'content': [{'end': 2702.332, 'text': 'So some of them are on this side, some of them are on this side.', 'start': 2699.49, 'duration': 2.842}, {'end': 2706.929, 'text': 'And indeed, they serve completely different purposes.', 'start': 2704.528, 'duration': 2.401}, {'end': 2715.252, 'text': "And it's rather remarkable that we get two solutions using the same kernel, which is the RBF kernel,", 'start': 2708.149, 'duration': 7.103}, {'end': 2718.594, 'text': 'using such an incredibly different diversity of approaches.', 'start': 2715.252, 'duration': 3.342}, {'end': 2729.998, 'text': 'So this was just to show you the difference between when you do the choice of important points in an unsupervised way and here patently in a supervised way.', 'start': 2719.194, 'duration': 10.804}], 'summary': 'Comparison of two solutions using rbf kernel for different purposes.', 'duration': 30.508, 'max_score': 2699.49, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2699490.jpg'}], 'start': 2218.409, 'title': 'Clustering in unsupervised learning', 'summary': "Discusses local minima dependency on initial centers, the need for trying different starting points, and the evaluation of objective function for selecting the best solution in clustering, with a focus on lloyd's algorithm. 
it also compares unsupervised learning with supervised learning, emphasizes the use of random initialization and the convergence of centers in lloyd's algorithm, and provides a comparison of rbf centers and support vectors.", 'chapters': [{'end': 2333.058, 'start': 2218.409, 'title': 'Local minima in clustering', 'summary': "Discusses the dependency of local minima on initial centers in clustering, the need to try different starting points for obtaining different solutions, and the evaluation of objective function for selecting the best solution, with the focus on lloyd's algorithm for clustering.", 'duration': 114.649, 'highlights': ["Lloyd's algorithm requires trying different starting points to get different solutions, allowing the evaluation of the objective function for selecting the best solution, which usually works very nicely.", 'The convergence to a local minimum in clustering depends on the initial centers, similar to the situation with neural networks and backpropagation.', 'The algorithm works by taking the data points and carrying out the clustering, focusing on the inputs without the labels or the target function.']}, {'end': 2718.594, 'start': 2333.138, 'title': 'Unsupervised learning and clustering', 'summary': "Discusses the process of unsupervised learning, focusing on clustering, and compares it with supervised learning, emphasizing the use of random initialization and the convergence of centers in lloyd's algorithm, with a comparison of rbf centers and support vectors.", 'duration': 385.456, 'highlights': ["The convergence of centers in Lloyd's algorithm is illustrated through multiple iterations, showing the movement and eventual clustering of points, highlighting the rapid convergence and the reasonableness of the clustering despite the incidental nature of the data points.", 'The comparison between RBF centers and support vectors demonstrates the distinct purposes and approaches of each, emphasizing their roles in capturing data inputs and 
representing the separating surface, with a focus on the diversity of approaches despite using the same RBF kernel.', 'The process of unsupervised learning is explained, emphasizing the challenge of obtaining similarity based on inputs rather than behavior with the target function, and the potential issues arising from using unsupervised learning, such as the dilemma of propagating the influence of centers with both positive and negative points.']}], 'duration': 500.185, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2218409.jpg', 'highlights': ["Lloyd's algorithm requires trying different starting points to get different solutions, allowing the evaluation of the objective function for selecting the best solution, which usually works very nicely.", 'The convergence to a local minimum in clustering depends on the initial centers, similar to the situation with neural networks and backpropagation.', "The convergence of centers in Lloyd's algorithm is illustrated through multiple iterations, showing the movement and eventual clustering of points, highlighting the rapid convergence and the reasonableness of the clustering despite the incidental nature of the data points.", 'The comparison between RBF centers and support vectors demonstrates the distinct purposes and approaches of each, emphasizing their roles in capturing data inputs and representing the separating surface, with a focus on the diversity of approaches despite using the same RBF kernel.']}, {'end': 3276.951, 'segs': [{'end': 2743.448, 'src': 'embed', 'start': 2719.194, 'weight': 0, 'content': [{'end': 2729.998, 'text': 'So this was just to show you the difference between when you do the choice of important points in an unsupervised way and here patently in a supervised way.', 'start': 2719.194, 'duration': 10.804}, {'end': 2733.8, 'text': 'choosing the support vectors was very much dependent on the value of the target.', 'start': 2729.998, 
'duration': 3.802}, {'end': 2739.644, 'text': 'The other thing you need to notice is that the support vectors have to be points from the data.', 'start': 2734.5, 'duration': 5.144}, {'end': 2743.448, 'text': "The mu's here are not points from the data.", 'start': 2741.386, 'duration': 2.062}], 'summary': 'Comparison of unsupervised and supervised selection of support vectors.', 'duration': 24.254, 'max_score': 2719.194, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2719194.jpg'}, {'end': 2799.722, 'src': 'embed', 'start': 2768.864, 'weight': 1, 'content': [{'end': 2771.726, 'text': 'I tell you K, capital K equals 9.', 'start': 2768.864, 'duration': 2.862}, {'end': 2774.507, 'text': "You go and you do your Lloyd's algorithm, and you come up with the centers.", 'start': 2771.726, 'duration': 2.781}, {'end': 2777.489, 'text': 'And half the problem of the choice is now solved.', 'start': 2774.787, 'duration': 2.702}, {'end': 2781.071, 'text': "And it's the big half, because the centers are vectors of d dimension.", 'start': 2777.749, 'duration': 3.322}, {'end': 2783.992, 'text': 'And now I found the centers without even touching the labels.', 'start': 2781.391, 'duration': 2.601}, {'end': 2785.033, 'text': "I didn't touch Yn.", 'start': 2784.072, 'duration': 0.961}, {'end': 2787.194, 'text': "So I know that I didn't contaminate anything.", 'start': 2785.433, 'duration': 1.761}, {'end': 2793.037, 'text': 'And indeed, I have only the weights, which happen to be capital K weights, determined using the labels.', 'start': 2787.474, 'duration': 5.563}, {'end': 2795.979, 'text': 'And therefore, I have good hopes for generalization.', 'start': 2793.337, 'duration': 2.642}, {'end': 2799.722, 'text': 'So now I look at here.', 'start': 2797.981, 'duration': 1.741}], 'summary': "Using lloyd's algorithm, k=9 centers found without touching labels, ensuring good generalization.", 'duration': 30.858, 'max_score': 2768.864, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2768864.jpg'}, {'end': 2838.882, 'src': 'embed', 'start': 2808.928, 'weight': 2, 'content': [{'end': 2811.77, 'text': 'I want this to be true for all the data points, if I can.', 'start': 2808.928, 'duration': 2.842}, {'end': 2816.954, 'text': 'And I ask myself, how many equations, how many unknowns? So I end up with n equations.', 'start': 2812.451, 'duration': 4.503}, {'end': 2819.495, 'text': 'Same thing, I want this to be true for all the data points.', 'start': 2817.154, 'duration': 2.341}, {'end': 2821.615, 'text': 'I have N data points, so I have n equations.', 'start': 2819.655, 'duration': 1.96}, {'end': 2825.377, 'text': "How many unknowns? The unknowns are the w's.", 'start': 2822.196, 'duration': 3.181}, {'end': 2827.258, 'text': 'And I have K of them.', 'start': 2825.957, 'duration': 1.301}, {'end': 2831.559, 'text': 'And K is less than n.', 'start': 2827.998, 'duration': 3.561}, {'end': 2833.92, 'text': 'I have more equations than unknowns.', 'start': 2831.559, 'duration': 2.361}, {'end': 2835.261, 'text': 'So something has to give.', 'start': 2834.32, 'duration': 0.941}, {'end': 2838.882, 'text': 'And this fellow is the one that has to give.', 'start': 2835.281, 'duration': 3.601}], 'summary': 'N data points lead to n equations with k unknowns, where k < n.', 'duration': 29.954, 'max_score': 2808.928, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2808928.jpg'}, {'end': 3067.692, 'src': 'heatmap', 'start': 2894.049, 'weight': 3, 'content': [{'end': 2896.97, 'text': 'You take (phi transpose phi) inverse times phi transpose y,', 'start': 2894.049, 'duration': 2.921}, {'end': 2901.012, 'text': 'and that will give you the value of w that minimizes the mean square difference between these guys.', 'start': 2896.97, 'duration': 4.042}, {'end': 2905.981, 'text': 'So you have the pseudo-inverse 
instead of the exact interpolation.', 'start': 2902.918, 'duration': 3.063}, {'end': 2911.085, 'text': 'And in this case, you are not guaranteed that you will get the correct value at every data point.', 'start': 2906.421, 'duration': 4.664}, {'end': 2913.727, 'text': 'So you are going to be making an in-sample error.', 'start': 2911.846, 'duration': 1.881}, {'end': 2915.409, 'text': 'But we know that this is not a bad thing.', 'start': 2913.947, 'duration': 1.462}, {'end': 2920.753, 'text': 'On the other hand, we are only determining capital K weights, so the chances of generalization are good.', 'start': 2915.889, 'duration': 4.864}, {'end': 2926.974, 'text': 'Now, I would like to take this and put it as a graphical network.', 'start': 2923.55, 'duration': 3.424}, {'end': 2929.717, 'text': 'And this will help me relate it to neural networks.', 'start': 2927.394, 'duration': 2.323}, {'end': 2931.318, 'text': 'This is the second link.', 'start': 2930.117, 'duration': 1.201}, {'end': 2935.363, 'text': 'We already related RBF to nearest neighbor methods, similarity methods.', 'start': 2931.699, 'duration': 3.664}, {'end': 2937.725, 'text': 'Now we are going to relate it to neural networks.', 'start': 2935.823, 'duration': 1.902}, {'end': 2939.407, 'text': 'Let me first put the diagram.', 'start': 2938.086, 'duration': 1.321}, {'end': 2943.242, 'text': "So here's my illustration of it.", 'start': 2941.721, 'duration': 1.521}, {'end': 2944.803, 'text': 'I have x.', 'start': 2943.282, 'duration': 1.521}, {'end': 2952.409, 'text': 'I am computing the radial aspect, the distance from mu 1 up to mu K.', 'start': 2944.803, 'duration': 7.606}, {'end': 2956.753, 'text': 'And then handing it to a nonlinearity, in this case the Gaussian nonlinearity.', 'start': 2952.409, 'duration': 4.344}, {'end': 2961.396, 'text': 'You can have other basis functions, like we had the cylinder in one case, but cylinder is a bit extreme.', 'start': 2956.773, 'duration': 4.623}, {'end': 
2963.057, 'text': 'But there are other functions.', 'start': 2961.656, 'duration': 1.401}, {'end': 2971.444, 'text': 'You get features that are combined with weights in order to give you the output.', 'start': 2963.738, 'duration': 7.706}, {'end': 2976.528, 'text': 'Now, this one could be just passing the sum if you are doing regression.', 'start': 2972.784, 'duration': 3.744}, {'end': 2979.41, 'text': 'It could be hard threshold if you are doing classification.', 'start': 2976.568, 'duration': 2.842}, {'end': 2980.571, 'text': 'It could be something else.', 'start': 2979.43, 'duration': 1.141}, {'end': 2985.136, 'text': 'But what I care about is that this configuration looks familiar to us.', 'start': 2981.532, 'duration': 3.604}, {'end': 2986.237, 'text': "It's layers.", 'start': 2985.576, 'duration': 0.661}, {'end': 2989.36, 'text': 'I extract features, and then I go to output.', 'start': 2987.058, 'duration': 2.302}, {'end': 2990.821, 'text': "So let's look at the features.", 'start': 2989.8, 'duration': 1.021}, {'end': 3004.186, 'text': 'The features are These fellows, right? 
Now, if you look at these features, they depend on D.', 'start': 2992.523, 'duration': 11.663}, {'end': 3005.968, 'text': 'Mu, in general, are parameters.', 'start': 3004.186, 'duration': 1.782}, {'end': 3013.136, 'text': "If I didn't have this slick Lloyd's algorithm, and K-means, and the unsupervised thing, I need to determine what these guys are.", 'start': 3006.729, 'duration': 6.407}, {'end': 3018.787, 'text': 'And once you determine them, the value of the feature depends on the data set.', 'start': 3014.102, 'duration': 4.685}, {'end': 3023.292, 'text': 'And when the value of the feature depends on the data set, all bets are off.', 'start': 3019.688, 'duration': 3.604}, {'end': 3030.319, 'text': "It's no longer a linear model, pretty much like a neural network doing the first layer, extracting the features.", 'start': 3023.472, 'duration': 6.847}, {'end': 3039.618, 'text': "Now, the good thing is that because we used only the inputs in order to compute mu, it's almost linear.", 'start': 3031.556, 'duration': 8.062}, {'end': 3050.04, 'text': "We got the benefit of the pseudo-inverse because in this case we didn't have to go back and adjust mu because we don't like the value of the output.", 'start': 3040.318, 'duration': 9.722}, {'end': 3054.041, 'text': "These were frozen forever based on inputs, and then we only had to get the w's.", 'start': 3050.2, 'duration': 3.841}, {'end': 3059.982, 'text': "And the w's now look like multiplicative factors, in which case it's linear on those w's, and we get the solution.", 'start': 3054.341, 'duration': 5.641}, {'end': 3067.692, 'text': 'Now, in radial basis functions, there is often a bias term added.', 'start': 3062.89, 'duration': 4.802}], 'summary': 'Using radial basis functions in neural networks for feature extraction and linear model computation.', 'duration': 173.643, 'max_score': 2894.049, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2894049.jpg'}, 
{'end': 3119.091, 'src': 'embed', 'start': 3088.809, 'weight': 5, 'content': [{'end': 3091.63, 'text': 'So here is the RBF network.', 'start': 3088.809, 'duration': 2.821}, {'end': 3092.63, 'text': 'We just saw it.', 'start': 3092.05, 'duration': 0.58}, {'end': 3096.012, 'text': 'And I pointed x in red.', 'start': 3093.471, 'duration': 2.541}, {'end': 3099.813, 'text': 'This is what gets passed to this, gets the features, and gets you the output.', 'start': 3096.232, 'duration': 3.581}, {'end': 3104.995, 'text': 'And here is a neural network that is comparable in structure.', 'start': 3101.494, 'duration': 3.501}, {'end': 3108.497, 'text': 'So you start with the input, you start with the input.', 'start': 3106.696, 'duration': 1.801}, {'end': 3112.686, 'text': 'Now you compute features, and here you do.', 'start': 3109.724, 'duration': 2.962}, {'end': 3119.091, 'text': 'And the features here depend on the distance, and they are such that when the distance is large, the influence dies.', 'start': 3113.687, 'duration': 5.404}], 'summary': 'Comparison of rbf network and neural network structures for feature computation and influence based on distance.', 'duration': 30.282, 'max_score': 3088.809, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3088809.jpg'}, {'end': 3163.355, 'src': 'embed', 'start': 3142.817, 'weight': 4, 'content': [{'end': 3152.606, 'text': 'One interpretation is that what radial basis function networks do is look at local regions in the space and worry about them,', 'start': 3142.817, 'duration': 9.789}, {'end': 3154.467, 'text': 'without worrying about the faraway points.', 'start': 3152.606, 'duration': 1.861}, {'end': 3157.79, 'text': 'So I have a function that is in this space.', 'start': 3155.548, 'duration': 2.242}, {'end': 3159.752, 'text': 'I look at this part, and I want to learn it.', 'start': 3157.83, 'duration': 1.922}, {'end': 3163.355, 'text': 'So I get a basis function that 
captures it, or a couple of them, et cetera.', 'start': 3160.212, 'duration': 3.143}], 'summary': 'Radial basis function networks focus on local regions in space to capture functions.', 'duration': 20.538, 'max_score': 3142.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3142817.jpg'}, {'end': 3267.63, 'src': 'embed', 'start': 3238.755, 'weight': 6, 'content': [{'end': 3241.717, 'text': 'This is a two-layer network, and this is a two-layer network.', 'start': 3238.755, 'duration': 2.962}, {'end': 3250.001, 'text': 'And pretty much any two-layer network of this type of structure lends itself to being a support vector machine.', 'start': 3242.517, 'duration': 7.484}, {'end': 3257.605, 'text': 'The first layer takes care of the kernel, and the second one is the linear combination that is built in in support vector machines.', 'start': 3251.041, 'duration': 6.564}, {'end': 3260.846, 'text': 'So you can solve a support vector machine by choosing a kernel.', 'start': 3258.265, 'duration': 2.581}, {'end': 3267.63, 'text': 'And you can picture in your mind that I have one of those, where the first part is getting the kernel, and the second part is getting the linear part.', 'start': 3261.147, 'duration': 6.483}], 'summary': 'A two-layer network can function as a support vector machine with distinct roles for each layer.', 'duration': 28.875, 'max_score': 3238.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3238755.jpg'}], 'start': 2719.194, 'title': 'Svm and neural network comparison', 'summary': 'Compares supervised and unsupervised learning in support vector machines, emphasizing the importance of choosing support vectors and solving equations to obtain weights for generalization, and explores the similarities and differences between radial basis function (rbf) networks and neural networks, highlighting their respective focus and adaptability to 
support vector machine structures.', 'chapters': [{'end': 2913.727, 'start': 2719.194, 'title': 'Supervised vs unsupervised learning in support vector machines', 'summary': 'Explains the difference between supervised and unsupervised learning in support vector machines, emphasizing the importance of choosing support vectors, finding centers, and solving equations to obtain weights for generalization, with a focus on minimizing mean square difference.', 'duration': 194.533, 'highlights': ['The importance of choosing support vectors in a supervised way is emphasized as it is dependent on the value of the target. Supervised choice of support vectors depends on the target value.', "Centers are obtained using Lloyd's algorithm with capital K centers, solving half of the choice problem without touching the labels. Lloyd's algorithm with K centers solves half of the choice problem without using labels.", 'The process involves solving n equations with K unknowns, where K is less than n, leading to making the equations give in a mean squared sense. Solving n equations with K unknowns results in mean squared difference.', 'Using the pseudo-inverse of phi transpose phi, the weights are determined to minimize the mean square difference, although it may result in in-sample error. 
Weights are determined using pseudo-inverse to minimize mean square difference, leading to in-sample error.']}, {'end': 3276.951, 'start': 2913.947, 'title': 'Comparing rbf and neural networks', 'summary': 'Explores the similarities and differences between radial basis function (rbf) networks and neural networks, highlighting how rbf networks focus on local regions in the space and the almost linear relationship between the inputs and weights, while neural networks involve learned features and backpropagation for nonlinearity, with both types of networks being adaptable to support vector machine structures.', 'duration': 363.004, 'highlights': ['RBF networks focus on local regions in the space, resulting in minimal interference from faraway points, while neural networks are influenced by the combination of features and are not as locally focused. RBF networks emphasize local regions, minimizing interference from distant points, while neural networks are influenced by feature combinations and lack this local focus.', 'The almost linear relationship between the inputs and weights in RBF networks, due to the use of pseudo-inverse and the frozen nature of parameters, contrasts with the non-linear nature of learned features in neural networks obtained through backpropagation. RBF networks exhibit an almost linear relationship between inputs and weights, facilitated by the frozen nature of parameters and the use of pseudo-inverse, while neural networks involve non-linear learned features obtained through backpropagation.', 'Both RBF and neural networks can be structured as two-layer networks, aligning with the adaptability of this structure to support vector machine implementation. 
RBF and neural networks can be structured as two-layer networks, demonstrating their adaptability to support vector machine implementation.']}], 'duration': 557.757, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc2719194.jpg', 'highlights': ['Supervised choice of support vectors depends on the target value.', "Lloyd's algorithm with K centers solves half of the choice problem without using labels.", 'Solving n equations with K unknowns results in mean squared difference.', 'Weights are determined using pseudo-inverse to minimize mean square difference, leading to in-sample error.', 'RBF networks emphasize local regions, minimizing interference from distant points, while neural networks are influenced by feature combinations and lack this local focus.', 'RBF networks exhibit an almost linear relationship between inputs and weights, facilitated by the frozen nature of parameters and the use of pseudo-inverse, while neural networks involve non-linear learned features obtained through backpropagation.', 'RBF and neural networks can be structured as two-layer networks, demonstrating their adaptability to support vector machine implementation.']}, {'end': 3608.356, 'segs': [{'end': 3333.321, 'src': 'embed', 'start': 3306.883, 'weight': 0, 'content': [{'end': 3311.164, 'text': 'Now I have parameters w1, wk, and then I have also gamma.', 'start': 3306.883, 'duration': 4.281}, {'end': 3316.268, 'text': 'And you can see that this is actually pretty important because, as you saw, if we choose it wrong,', 'start': 3311.744, 'duration': 4.524}, {'end': 3321.652, 'text': 'the interpolation becomes very poor and it does depend on the spacing in the data set and whatnot.', 'start': 3316.268, 'duration': 5.384}, {'end': 3327.696, 'text': 'So it might be a good idea to choose gamma in order to also minimize the in-sample error, get performance.', 'start': 3321.992, 'duration': 5.704}, {'end': 3333.321, 'text': 'So of course I 
could do that, and I could do it for w, for all I care.', 'start': 3328.817, 'duration': 4.504}], 'summary': 'Choosing the right parameters like w1, wk, and gamma is crucial for improving interpolation and minimizing in-sample error for better performance.', 'duration': 26.438, 'max_score': 3306.883, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3306883.jpg'}, {'end': 3443.458, 'src': 'embed', 'start': 3353.33, 'weight': 1, 'content': [{'end': 3356.971, 'text': 'Start with random values, and then descend, and then you get a solution.', 'start': 3353.33, 'duration': 3.641}, {'end': 3363.633, 'text': 'However, it will be a shame to do that, because these guys have such a simple algorithm that goes with them.', 'start': 3357.791, 'duration': 5.842}, {'end': 3365.774, 'text': 'If gamma is fixed, this is a snap.', 'start': 3364.093, 'duration': 1.681}, {'end': 3367.935, 'text': 'You do the pseudo-inverse, and you get exactly that.', 'start': 3366.174, 'duration': 1.761}, {'end': 3374.821, 'text': "So it is a good idea to separate that. For this one, it's inside the exponential and this and that.", 'start': 3368.695, 'duration': 6.126}, {'end': 3376.923, 'text': "I don't think I have any hope of finding a shortcut.", 'start': 3374.921, 'duration': 2.002}, {'end': 3379.225, 'text': 'I probably will have to do gradient descent for this guy.', 'start': 3377.023, 'duration': 2.202}, {'end': 3382.868, 'text': 'But I might as well do gradient descent for this guy, not for these guys.', 'start': 3379.725, 'duration': 3.143}, {'end': 3384.45, 'text': 'And the way this is done', 'start': 3383.469, 'duration': 0.981}, {'end': 3387.18, 'text': 'is by an iterative approach.', 'start': 3385.879, 'duration': 1.301}, {'end': 3390.021, 'text': 'You fix one, and solve for the others.', 'start': 3388.02, 'duration': 2.001}, {'end': 3392.703, 'text': 'This seems to be the theme of the lecture.', 'start': 3391.302, 'duration': 
1.401}, {'end': 3397.966, 'text': 'And in this case, it is a pretty famous algorithm, a variation of that algorithm.', 'start': 3393.864, 'duration': 4.102}, {'end': 3401.468, 'text': 'The algorithm is called EM, expectation maximization.', 'start': 3398.106, 'duration': 3.362}, {'end': 3409.272, 'text': 'And it is used for solving the case of mixture of Gaussians, which we actually have, except that we are not calling them probabilities.', 'start': 3402.248, 'duration': 7.024}, {'end': 3412.354, 'text': 'We are calling them bases that are implementing a target.', 'start': 3409.292, 'duration': 3.062}, {'end': 3413.895, 'text': 'So here is the idea.', 'start': 3413.014, 'duration': 0.881}, {'end': 3416.699, 'text': 'Fix gamma.', 'start': 3416.038, 'duration': 0.661}, {'end': 3419.322, 'text': 'That we have done before.', 'start': 3418.461, 'duration': 0.861}, {'end': 3420.784, 'text': 'We have been fixing gamma all through.', 'start': 3419.362, 'duration': 1.422}, {'end': 3428.052, 'text': 'So if you want to solve for w based on fixing gamma, you just solve for it using the pseudo-inverse.', 'start': 3421.565, 'duration': 6.487}, {'end': 3429.434, 'text': "So now we have w's.", 'start': 3428.553, 'duration': 0.881}, {'end': 3432.998, 'text': 'Now you fix them.', 'start': 3432.358, 'duration': 0.64}, {'end': 3433.739, 'text': 'They are frozen.', 'start': 3433.179, 'duration': 0.56}, {'end': 3439.155, 'text': 'And you minimize the error, the squared error, with respect to gamma, one parameter.', 'start': 3435.573, 'duration': 3.582}, {'end': 3442.197, 'text': 'It would be pretty easy to gradient descent with respect to one parameter.', 'start': 3439.395, 'duration': 2.802}, {'end': 3443.458, 'text': 'You find the minimum.', 'start': 3442.657, 'duration': 0.801}], 'summary': 'Simple algorithm em used for solving mixture of gaussians with iterative approach and fixed parameters.', 'duration': 90.128, 'max_score': 3353.33, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3353330.jpg'}, {'end': 3557.006, 'src': 'embed', 'start': 3507.416, 'weight': 5, 'content': [{'end': 3511.72, 'text': 'And in that case, you adjust the width of the Gaussian according to the region you are in the space.', 'start': 3507.416, 'duration': 4.304}, {'end': 3523.58, 'text': "Now, very quickly, I'm going to go through two aspects of RBF, one of them relating it to kernel methods, which we already have seen the beginning of.", 'start': 3514.935, 'duration': 8.645}, {'end': 3526.601, 'text': "We have used it as a kernel, so we'd like to compare the performance.", 'start': 3523.98, 'duration': 2.621}, {'end': 3529.543, 'text': 'And then I will relate it to regularization.', 'start': 3527.142, 'duration': 2.401}, {'end': 3535.286, 'text': "It's interesting that RBFs, as I described them like, intuitive local influence all of that,", 'start': 3529.643, 'duration': 5.643}, {'end': 3539.949, 'text': 'you will find in a moment that they are completely based on regularization.', 'start': 3535.286, 'duration': 4.663}, {'end': 3543.451, 'text': "And that's how they arose in the first place in function approximation.", 'start': 3539.989, 'duration': 3.462}, {'end': 3549.317, 'text': "So let's do the RBF versus its kernel version.", 'start': 3545.092, 'duration': 4.225}, {'end': 3554.223, 'text': 'Last lecture, we had a kernel, which is the RBF kernel.', 'start': 3550.439, 'duration': 3.784}, {'end': 3557.006, 'text': 'And we had a solution with 9 support vectors.', 'start': 3554.724, 'duration': 2.282}], 'summary': 'Rbf has intuitive local influence, based on regularization, with 9 support vectors for function approximation.', 'duration': 49.59, 'max_score': 3507.416, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3507416.jpg'}], 'start': 3277.251, 'title': 'Gaussian parameter optimization and em algorithm for mixture 
of gaussians', 'summary': 'Discusses the optimization of parameters for gaussian-based neural networks, focusing on the significance of choosing the width parameter (gamma) and the efficiency of the algorithm when gamma is fixed. it also covers the em algorithm for solving the case of mixture of gaussians, emphasizing the iterative process of fixing parameters, gradient descent, convergence, and the relation of rbf to kernel methods and regularization.', 'chapters': [{'end': 3392.703, 'start': 3277.251, 'title': 'Gaussian parameter optimization for neural networks', 'summary': 'Discusses the optimization of parameters for gaussian-based neural networks, emphasizing the significance of choosing the width parameter (gamma) to minimize in-sample error and improve performance, while also highlighting the efficiency of the algorithm when gamma is fixed.', 'duration': 115.452, 'highlights': ['The significance of choosing the width parameter (gamma) to minimize in-sample error and improve performance is emphasized, with the consequences of choosing it wrong leading to poor interpolation and dependency on the spacing in the dataset.', 'The efficiency of the algorithm when gamma is fixed is highlighted, indicating that the pseudo-inference process becomes straightforward, contrasting with the iterative approach required when gamma is treated as a genuine parameter.', 'The iterative approach of fixing one parameter and solving for the others is mentioned as the theme of the lecture, demonstrating the process of parameter optimization for Gaussian-based neural networks.']}, {'end': 3608.356, 'start': 3393.864, 'title': 'Em algorithm for mixture of gaussians', 'summary': 'Discusses the em algorithm for solving the case of mixture of gaussians, emphasizing the iterative process of fixing parameters, gradient descent, convergence, and the relation of rbf to kernel methods and regularization.', 'duration': 214.492, 'highlights': ['The algorithm EM (Expectation Maximization) is 
used for solving the case of mixture of Gaussians, with an iterative approach of fixing parameters, performing gradient descent, and achieving convergence.', "The iterative process alternates between fixing gamma to solve for the w's directly and fixing the w's to minimize the error with respect to gamma by gradient descent, yielding a combination of w's and gammas.", 'The method involves adjusting the width of the Gaussian according to the region in the space, and it is related to regularization in function approximation.', 'The discussion also covers the relation of RBF to kernel methods and its performance comparison, as well as its usage in support vector machines for classification. 
The discussion also encompasses the relation of RBF to kernel methods, its performance comparison, and its usage in support vector machines for classification.']}], 'duration': 331.105, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3277251.jpg', 'highlights': ['The significance of choosing the width parameter (gamma) to minimize in-sample error and improve performance is emphasized, with the consequences of choosing it wrong leading to poor interpolation and dependency on the spacing in the dataset.', 'The efficiency of the algorithm when gamma is fixed is highlighted, indicating that the pseudo-inverse solution becomes straightforward, contrasting with the iterative approach required when gamma is treated as a genuine parameter.', 'The iterative approach of fixing one parameter and solving for the others is mentioned as the theme of the lecture, demonstrating the process of parameter optimization for Gaussian-based neural networks.', 'The algorithm EM (Expectation Maximization) is used for solving the case of mixture of Gaussians, with an iterative approach of fixing parameters, performing gradient descent, and achieving convergence.', "The iterative process involves fixing parameters such as w and gamma, performing gradient descent, and minimizing the error with respect to gamma, leading to a combination of w's and gammas.", 'The method involves adjusting the width of the Gaussian according to the region in the space, and it is related to regularization in function approximation.', 'The discussion also covers the relation of RBF to kernel methods and its performance comparison, as well as its usage in support vector machines for classification.']}, {'end': 4005.396, 'segs': [{'end': 3723.173, 'src': 'embed', 'start': 3696.223, 'weight': 0, 'content': [{'end': 3703.046, 'text': "Just to be fair to the poor straight RBF implementation, the data doesn't cluster normally.", 'start': 3696.223, 'duration': 
6.823}, {'end': 3706.047, 'text': 'And I chose the 9, because I got 9 here.', 'start': 3703.666, 'duration': 2.381}, {'end': 3709.408, 'text': 'So the SVM has the home advantage here.', 'start': 3706.667, 'duration': 2.741}, {'end': 3710.869, 'text': 'This is just a comparison.', 'start': 3710.008, 'duration': 0.861}, {'end': 3712.449, 'text': "I didn't optimize the number of things.", 'start': 3711.109, 'duration': 1.34}, {'end': 3713.69, 'text': "I didn't do anything.", 'start': 3712.489, 'duration': 1.201}, {'end': 3717.831, 'text': "So if this guy ends up performing better, it's better.", 'start': 3714.39, 'duration': 3.441}, {'end': 3718.632, 'text': 'SVM is good.', 'start': 3717.991, 'duration': 0.641}, {'end': 3723.173, 'text': 'But it really has a little bit of unfair advantage in this comparison.', 'start': 3719.032, 'duration': 4.141}], 'summary': 'Svm with 9 data points has home advantage, but may have unfair advantage in comparison.', 'duration': 26.95, 'max_score': 3696.223, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3696223.jpg'}, {'end': 3776.982, 'src': 'embed', 'start': 3748.874, 'weight': 2, 'content': [{'end': 3752.979, 'text': 'And the first thing you realize is that the in-sample error is not 0.', 'start': 3748.874, 'duration': 4.105}, {'end': 3754.24, 'text': 'There are points that are misclassified.', 'start': 3752.979, 'duration': 1.261}, {'end': 3754.921, 'text': 'Not a surprise.', 'start': 3754.28, 'duration': 0.641}, {'end': 3758.204, 'text': "I had only k centers, and I'm trying to minimize mean square error.", 'start': 3754.941, 'duration': 3.263}, {'end': 3762.568, 'text': 'It is possible that some points close to the boundary will go one way or the other.', 'start': 3758.524, 'duration': 4.044}, {'end': 3765.871, 'text': "I'm interpreting the signal as being closer to plus 1 or minus 1.", 'start': 3762.608, 'duration': 3.263}, {'end': 3767.413, 'text': "Sometimes it will 
cross, and that's what I get.", 'start': 3765.871, 'duration': 1.542}, {'end': 3769.335, 'text': 'So this is the guy that I get.', 'start': 3767.793, 'duration': 1.542}, {'end': 3772.818, 'text': 'Here is the guy that I got last time from the SVM.', 'start': 3770.376, 'duration': 2.442}, {'end': 3776.982, 'text': 'Rather interesting.', 'start': 3776.181, 'duration': 0.801}], 'summary': 'Using k centers to minimize mean square error, some points close to the boundary may be misclassified in svm.', 'duration': 28.108, 'max_score': 3748.874, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3748874.jpg'}, {'end': 3837.361, 'src': 'embed', 'start': 3812.28, 'weight': 1, 'content': [{'end': 3819.847, 'text': 'You know the ramifications of doing unsupervised learning and what you miss out by choosing the centers without knowing the label versus the advantages of support vectors and whatnot.', 'start': 3812.28, 'duration': 7.567}, {'end': 3826.418, 'text': 'So the final item that I promised was RBF versus regularization.', 'start': 3822.677, 'duration': 3.741}, {'end': 3834.74, 'text': 'It turns out that you can derive RBFs entirely based on regularization.', 'start': 3827.018, 'duration': 7.722}, {'end': 3837.361, 'text': "You're not talking about inference of a point.", 'start': 3835.601, 'duration': 1.76}], 'summary': 'Unsupervised learning vs. support vectors and rbfs, based on regularization', 'duration': 25.081, 'max_score': 3812.28, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3812280.jpg'}], 'start': 3609.397, 'title': 'Rbf implementation and function approximation', 'summary': 'Compares two methods of implementing rbf, highlighting the unfair advantage of svm due to abnormal data clustering. 
it also discusses in-sample error, support vector machines, and rbfs in function approximation, emphasizing the importance of smoothness constraints and the advantages of using support vectors over unsupervised learning.', 'chapters': [{'end': 3748.293, 'start': 3609.397, 'title': 'Comparison of rbf implementation', 'summary': 'Compares two routes of implementing rbf, one using linear regression after unsupervised learning of centers, and the other using svm with 9 centers, highlighting the unfair advantage of svm in this comparison due to the data not clustering normally.', 'duration': 138.896, 'highlights': ['The SVM implementation has a home advantage due to the data not clustering normally, and it unfairly outperforms the regular RBF implementation.', 'The chapter compares two routes of implementing RBF: one using linear regression after unsupervised learning of centers, and the other using SVM with 9 centers.', 'The number of terms in both implementations is 9, with the addition of bias in the SVM implementation for exact comparability.', 'The centers in the SVM implementation are general centers, mu k, which do not have to be points from the data set, while in the regular RBF implementation, the centers are points from the data set.', 'The two routes of RBF implementation involve completely different methods: linear regression after unsupervised learning of centers and SVM with maximized margin and equated with a kernel.']}, {'end': 4005.396, 'start': 3748.874, 'title': 'Svm and rbf in function approximation', 'summary': 'Discusses the in-sample error, support vector machines, and rbfs in function approximation, emphasizing the importance of smoothness constraints and the advantages of using support vectors over unsupervised learning.', 'duration': 256.522, 'highlights': ['The chapter discusses the in-sample error and the benefits of using support vectors in function approximation. 
The in-sample error is not 0, and the speaker emphasizes the benefits of using support vectors over unsupervised learning.', 'The speaker compares two solutions from support vector machines and emphasizes the benefits of using support vectors. The speaker compares two solutions from support vector machines and highlights the advantages of using support vectors, mentioning the improvement in tracking and achieving 0 in-sample error.', 'The chapter explains the derivation of RBFs based on regularization and the importance of smoothness constraints. The chapter explains the derivation of RBFs based on regularization, emphasizing the importance of smoothness constraints and the estimation of derivative size for smoothness.']}], 'duration': 395.999, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc3609397.jpg', 'highlights': ['The SVM implementation has a home advantage due to the data not clustering normally, and it unfairly outperforms the regular RBF implementation.', 'The chapter compares two routes of implementing RBF: one using linear regression after unsupervised learning of centers, and the other using SVM with 9 centers.', 'The chapter discusses the in-sample error and the benefits of using support vectors in function approximation. The in-sample error is not 0, and the speaker emphasizes the benefits of using support vectors over unsupervised learning.', 'The centers in the SVM implementation are general centers, mu k, which do not have to be points from the data set, while in the regular RBF implementation, the centers are points from the data set.', 'The chapter explains the derivation of RBFs based on regularization and the importance of smoothness constraints. 
The chapter explains the derivation of RBFs based on regularization, emphasizing the importance of smoothness constraints and the estimation of derivative size for smoothness.']}, {'end': 4914.837, 'segs': [{'end': 4108.243, 'src': 'embed', 'start': 4077.07, 'weight': 0, 'content': [{'end': 4089.134, 'text': 'First, can you explain again, how does an SVM simulate a two-level neural network? Look at the RBF in order to get a hint.', 'start': 4077.07, 'duration': 12.064}, {'end': 4101.434, 'text': 'What does this feature do? It actually computes the kernel, right? Think of what this guy is doing as implementing the kernel.', 'start': 4090.454, 'duration': 10.98}, {'end': 4108.243, 'text': "What is it implementing? It's implementing theta, the sigmoidal function, the tanh in this case, of this guy.", 'start': 4101.635, 'duration': 6.608}], 'summary': 'SVM simulates a two-level neural network by computing the kernel and implementing the sigmoidal function.', 'duration': 31.173, 'max_score': 4077.07, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc4077070.jpg'}, {'end': 4392.105, 'src': 'embed', 'start': 4362.04, 'weight': 1, 'content': [{'end': 4369.146, 'text': "The way I'm going to take advantage of it is to separate the variables into two groups, the expectation and the maximization.", 'start': 4362.04, 'duration': 7.106}, {'end': 4370.947, 'text': "That's according to the EM algorithm.", 'start': 4369.266, 'duration': 1.681}, {'end': 4376.492, 'text': "And when I fix one of them, when I fix gamma, then I can solve for wk's directly.", 'start': 4371.708, 'duration': 4.784}, {'end': 4377.333, 'text': 'I get them.', 'start': 4376.872, 'duration': 0.461}, {'end': 4378.274, 'text': "So that's one step.", 'start': 4377.593, 'duration': 0.681}, {'end': 4384.799, 'text': "And then I fix w's that I have, and then try to optimize with respect to gamma according to the mean square error.", 'start': 4378.994, 
'duration': 5.805}, {'end': 4392.105, 'text': "So I take this guy with w's being constant, gamma being a variable, and I apply this to every point in the training.", 'start': 4384.819, 'duration': 7.286}], 'summary': 'Using em algorithm to optimize variables and minimize mean square error.', 'duration': 30.065, 'max_score': 4362.04, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc4362040.jpg'}, {'end': 4542.845, 'src': 'embed', 'start': 4515.429, 'weight': 2, 'content': [{'end': 4519.111, 'text': 'It turns out in hindsight that this is the underlying assumption.', 'start': 4515.429, 'duration': 3.682}, {'end': 4524.955, 'text': 'Because when we looked at solving the approximation problem with smoothness, we ended up with those radial basis functions.', 'start': 4520.232, 'duration': 4.723}, {'end': 4527.337, 'text': "There is another motivation which I didn't refer to.", 'start': 4525.315, 'duration': 2.022}, {'end': 4528.938, 'text': "It's a good opportunity to raise it.", 'start': 4527.377, 'duration': 1.561}, {'end': 4540.024, 'text': "Let's say that I have a data set, up to And I'm going to assume that there is noise, but it's a funny noise.", 'start': 4529.54, 'duration': 10.484}, {'end': 4542.845, 'text': "It's not noise in the value y.", 'start': 4540.284, 'duration': 2.561}], 'summary': 'Using smoothness to solve approximation problem leads to radial basis functions, with a mention of data set and noise.', 'duration': 27.416, 'max_score': 4515.429, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc4515429.jpg'}, {'end': 4683.161, 'src': 'embed', 'start': 4650.789, 'weight': 4, 'content': [{'end': 4652.631, 'text': 'Because by the time you get here, it will have died out.', 'start': 4650.789, 'duration': 1.842}, {'end': 4653.471, 'text': "So it's all relative.", 'start': 4652.731, 'duration': 0.74}, {'end': 4662.135, 'text': "But, relatively 
speaking, It's a good idea to have the width of the Gaussian comparable to the distances between the points,", 'start': 4653.892, 'duration': 8.243}, {'end': 4663.755, 'text': 'so that there is a genuine interpolation.', 'start': 4662.135, 'duration': 1.62}, {'end': 4669.797, 'text': 'And the objective criteria for choosing gamma will affect that.', 'start': 4664.916, 'duration': 4.881}, {'end': 4673.558, 'text': 'Because when we solve for gamma, we are using the k centers.', 'start': 4669.857, 'duration': 3.701}, {'end': 4678.259, 'text': 'So you have points that have the center of the Gaussian.', 'start': 4674.478, 'duration': 3.781}, {'end': 4683.161, 'text': 'But you need to worry about that Gaussian covering the data points that are nearby.', 'start': 4678.599, 'duration': 4.562}], 'summary': 'Choosing width of gaussian relative to data points for genuine interpolation.', 'duration': 32.372, 'max_score': 4650.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc4650789.jpg'}, {'end': 4901.524, 'src': 'embed', 'start': 4876.729, 'weight': 3, 'content': [{'end': 4883.974, 'text': 'Are there cases when RBFs are actually better than SVMs? 
You can run them in a number of cases.', 'start': 4876.729, 'duration': 7.245}, {'end': 4890.738, 'text': 'And if the data is clustered in a particular way and the clusters happen to have a common value,', 'start': 4884.154, 'duration': 6.584}, {'end': 4894.22, 'text': 'then you would expect that doing the unsupervised learning will get me ahead.', 'start': 4890.738, 'duration': 3.482}, {'end': 4901.524, 'text': 'Whereas the SVMs now are on the boundary, and they have to be such that the cancellations of RBFs will give me the right value.', 'start': 4894.78, 'duration': 6.744}], 'summary': 'Rbfs are better than svms in clustered data with common value for unsupervised learning.', 'duration': 24.795, 'max_score': 4876.729, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc4876729.jpg'}], 'start': 4005.816, 'title': 'Radial basis functions and svms', 'summary': 'Covers the interpretation of radial basis functions for smooth interpolation, svms simulating two-layer neural networks, challenges of rbf in high dimensional spaces, impact of gamma in clustering, and the use of rbfs over svms in certain cases.', 'chapters': [{'end': 4228.522, 'start': 4005.816, 'title': 'Understanding radial basis functions and svms', 'summary': 'Explains how radial basis functions are inherently self-regularized and their interpretation in providing smooth interpolation, as well as how svms simulate two-layer neural networks by implementing the kernel and support vector machinery.', 'duration': 222.706, 'highlights': ['Radial basis functions are inherently self-regularized and provide the smoothest interpolation. Radial basis functions are interpreted as inherently self-regularized, providing the smoothest interpolation, emphasizing the preference for smoothness versus fitting.', 'SVMs simulate two-layer neural networks by implementing the kernel and support vector machinery. 
SVMs simulate two-layer neural networks through the implementation of the kernel and the support vector machinery, with the kernel representing the sigmoidal function and the support vectors becoming the units of the second layer.', 'Choosing the number of centers in clustering is a challenging open question with various heuristics for determining the meritorious addition of a center based on the minimization of the objective function. Choosing the number of centers in clustering remains an open question, with heuristics suggesting the meritorious addition of a center based on the minimization of the objective function, indicating the difficulty in determining the optimal number of centers.']}, {'end': 4484.221, 'start': 4228.943, 'title': 'Rbf: cross-validation and high dimensionality', 'summary': 'Discusses the challenges of determining the number of clusters for rbf, the practicality of rbf in high dimensional input spaces, the choice of gamma, and the use of em algorithm for optimization, while highlighting the ease of convergence and the minimal use of hidden layers in neural networks.', 'duration': 255.278, 'highlights': ["The EM algorithm is used to separate variables into two groups, allowing for the direct solving of wk's when gamma is fixed, followed by the optimization with respect to gamma, resulting in efficient convergence and a successful algorithm in practice.", "It's not necessary to have more than two layers in neural networks for approximation purposes, with the minority of users actually employing more than two layers, highlighting the minimal restriction imposed by support vector machines.", 'Determining the number of clusters for RBF is challenging, with a reasonable number of clusters often providing comparable performance, while the absolute hit in terms of the number of clusters needed is rare.', 'RBF may face difficulties in high dimensional input spaces, with issues such as funny distances and sparsity affecting the choice of gamma and 
the expectation of good interpolation.', 'The curse of dimensionality is inherent in RBF, as in other methods, with difficulties arising due to the high dimensional space and few points, making it difficult to expect good interpolation.']}, {'end': 4694.072, 'start': 4484.241, 'title': 'Radial basis functions and gaussian interpolation', 'summary': 'Discusses the motivation behind using radial basis functions for interpolation, including the assumption of smoothness, consideration of input noise, and the impact of choosing the width parameter (gamma) for gaussian interpolation.', 'duration': 209.831, 'highlights': ['The assumption of smoothness leads to the use of radial basis functions for solving the approximation problem with smoothness, with the underlying assumption being that the target function should be smooth.', 'Consideration of input noise, particularly the assumption of Gaussian noise in the input, leads to the realization that the hypothesis value should not change much by changing x, resulting in Gaussian interpolation.', 'Choosing the width parameter (gamma) for Gaussian interpolation should involve making the width of the Gaussian comparable to the distances between the points to achieve genuine interpolation, with objective criteria for choosing gamma based on the k centers and the coverage of nearby data points.']}, {'end': 4914.837, 'start': 4695.093, 'title': 'Understanding the role of gamma and clustering', 'summary': 'Discusses the importance of gamma in clustering, the relationship between the number of clusters and vc dimension, and the potential use of rbfs over svms in certain cases.', 'duration': 219.744, 'highlights': ['The number of clusters and the VC dimension are related, as the choice of the number of clusters affects the complexity of the hypothesis set.', 'In cases where the data is clustered in a particular way and the clusters have a common value, unsupervised learning with RBFs may outperform SVMs.', 'The main utility of gamma 
is for the K-centers, with both cases having an in-sample error of 0 and the same generalization behavior.', 'In some scenarios, a half-cooked clustering approach can be used to represent some points before the supervised stage of learning takes over.', 'Choosing the number of clusters is related to the number of parameters, ultimately affecting the VC dimension.']}], 'duration': 909.021, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/O8CfrnOPtLc/pics/O8CfrnOPtLc4005816.jpg', 'highlights': ['SVMs simulate two-layer neural networks through the implementation of the kernel and the support vector machinery, with the kernel representing the sigmoidal function and the support vectors becoming the units of the second layer.', "The EM algorithm is used to separate variables into two groups, allowing for the direct solving of wk's when gamma is fixed, followed by the optimization with respect to gamma, resulting in efficient convergence and a successful algorithm in practice.", 'The assumption of smoothness leads to the use of radial basis functions for solving the approximation problem with smoothness, with the underlying assumption being that the target function should be smooth.', 'In cases where the data is clustered in a particular way and the clusters have a common value, unsupervised learning with RBFs may outperform SVMs.', 'Choosing the width parameter (gamma) for Gaussian interpolation should involve making the width of the Gaussian comparable to the distances between the points to achieve genuine interpolation, with objective criteria for choosing gamma based on the k centers and the coverage of nearby data points.']}], 'highlights': ['The radial basis functions (RBF) serve as a technique for classification with minimal overhead, often chosen as the model of choice by many, and its ability to relate to various facets of machine learning.', 'The chapter discusses generalizing the basic SVM algorithm to accommodate possibly infinite 
feature spaces without explicit knowledge or transformation.', 'The learning algorithm aims to minimize the in-sample error to 0 by evaluating the hypothesis on the data points.', 'The significance of the number of parameters and data points in relation to minimizing the in-sample error is highlighted.', 'The model for linear regression in its simplest form is explained, with emphasis on using a Gaussian model as the functional form for the hypothesis.', 'The discussion on the nearest neighbor method details the classification of points based on the label of the nearest point within the training set.', "Lloyd's algorithm requires trying different starting points to get different solutions, allowing the evaluation of the objective function for selecting the best solution, which usually works very nicely.", 'Supervised choice of support vectors depends on the target value.', 'The significance of choosing the width parameter (gamma) to minimize in-sample error and improve performance is emphasized, with the consequences of choosing it wrong leading to poor interpolation and dependency on the spacing in the dataset.', 'SVMs simulate two-layer neural networks through the implementation of the kernel and the support vector machinery, with the kernel representing the sigmoidal function and the support vectors becoming the units of the second layer.']}