title

Support Vector Machines Part 1 (of 3): Main Ideas!!!

description

Support Vector Machines are one of the most mysterious methods in Machine Learning. This StatQuest sweeps away the mystery to let you know how they work.
Part 2: The Polynomial Kernel: https://youtu.be/Toet3EiSFcM
Part 3: The Radial (RBF) Kernel: https://youtu.be/Qc5IyLW_hns
NOTE: This StatQuest assumes you already know about...
The bias/variance tradeoff: https://youtu.be/EuBBz3bI-aA
Cross Validation: https://youtu.be/fSytzGwwBVw
ALSO NOTE: This StatQuest is based on the description of Support Vector Machines, and associated concepts, found on pages 337 to 354 of An Introduction to Statistical Learning with Applications in R: http://faculty.marshall.usc.edu/gareth-james/ISL/
I also found this blogpost helpful for understanding the Kernel Trick: https://blog.statsbot.co/support-vector-machines-tutorial-c1618e635e93
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - https://statquest.gumroad.com/l/wvtmc
Paperback - https://www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - https://www.amazon.com/dp/B09ZG79HXC
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...a cool StatQuest t-shirt or sweatshirt:
https://shop.spreadshirt.com/statquest-with-josh-starmer/
...buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
0:40 Basic concepts and Maximal Margin Classifiers
4:35 Soft Margins (allowing misclassifications)
6:46 Soft Margin and Support Vector Classifiers
12:23 Intuition behind Support Vector Machines
15:25 The polynomial kernel function
17:30 The radial basis function (RBF) kernel
18:32 The kernel trick
19:31 Summary of concepts
#statquest #SVM

detail

{'title': 'Support Vector Machines Part 1 (of 3): Main Ideas!!!', 'heatmap': [{'end': 344.568, 'start': 317.907, 'weight': 0.728}, {'end': 449.053, 'start': 357.227, 'weight': 0.794}, {'end': 978.386, 'start': 956.588, 'weight': 0.725}], 'summary': 'Covers support vector machines, maximal margin classifiers, threshold impact on misclassifications, support vector classifiers in multi-dimensional spaces, support vector machines as an improvement over maximal margin classifiers, and the use of kernel functions to systematically increase dimensions and find support vector classifiers, including polynomial and radial kernels.', 'chapters': [{'end': 269.556, 'segs': [{'end': 31.034, 'src': 'embed', 'start': 0.209, 'weight': 0, 'content': [{'end': 6.215, 'text': 'Support vector machines have a lot of terminology associated with them.', 'start': 0.209, 'duration': 6.006}, {'end': 7.676, 'text': 'Brace yourself.', 'start': 6.935, 'duration': 0.741}, {'end': 15.403, 'text': "StatQuest Hello, I'm Josh Starmer and welcome to StatQuest.", 'start': 9.678, 'duration': 5.725}, {'end': 20.588, 'text': "Today we're going to talk about support vector machines and they're going to be clearly explained.", 'start': 15.743, 'duration': 4.845}, {'end': 31.034, 'text': 'This stat quest assumes that you are already familiar with the trade-off that plagues all of machine learning, the bias-variance trade-off.', 'start': 22.129, 'duration': 8.905}], 'summary': "Introduction to support vector machines explained by statquest's josh starmer.", 'duration': 30.825, 'max_score': 0.209, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE209.jpg'}, {'end': 211.979, 'src': 'embed', 'start': 181.887, 'weight': 1, 'content': [{'end': 185.049, 'text': 'and thus the margin would be smaller than it was before.', 'start': 181.887, 'duration': 3.162}, {'end': 195.27, 'text': 'And if we move the threshold to the right a little bit, then the distance 
between the obese observation and the threshold would get smaller.', 'start': 186.971, 'duration': 8.299}, {'end': 198.912, 'text': 'And again, the margin would be smaller.', 'start': 196.511, 'duration': 2.401}, {'end': 206.136, 'text': 'When we use the threshold that gives us the largest margin to make classifications.', 'start': 200.493, 'duration': 5.643}, {'end': 209.097, 'text': 'heads up terminology alert.', 'start': 206.136, 'duration': 2.961}, {'end': 211.979, 'text': 'we are using a maximal margin classifier.', 'start': 209.097, 'duration': 2.882}], 'summary': 'Using a maximal margin classifier, adjusting the threshold can decrease the margin between obese observation and the threshold.', 'duration': 30.092, 'max_score': 181.887, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE181887.jpg'}, {'end': 281.282, 'src': 'embed', 'start': 247.799, 'weight': 3, 'content': [{'end': 254.644, 'text': 'Now, if we got this new observation, we would classify it as not obese,', 'start': 247.799, 'duration': 6.845}, {'end': 260.608, 'text': 'even though most of the not obese observations are much further away than the obese observations.', 'start': 254.644, 'duration': 5.964}, {'end': 269.556, 'text': 'So, maximum margin classifiers are super sensitive to outliers in the training data, and that makes them pretty lame.', 'start': 262.41, 'duration': 7.146}, {'end': 281.282, 'text': 'Can we do better? Yes! 
To make a threshold that is not so sensitive to outliers, we must allow misclassifications.', 'start': 271.017, 'duration': 10.265}], 'summary': 'Maximal margin classifiers are sensitive to outliers in the training data; a threshold that is less sensitive must allow misclassifications.', 'duration': 33.483, 'max_score': 247.799, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE247799.jpg'}], 'start': 0.209, 'title': 'SVMs and maximal margin classifiers', 'summary': 'Discusses support vector machines and their use in classifying observations based on a maximal margin classifier, ensuring the largest margin for accurate classifications. It also explores the limitations of maximal margin classifiers, highlighting their sensitivity to outliers in the training data, which makes them less effective in classifying new observations.', 'chapters': [{'end': 211.979, 'start': 0.209, 'title': 'Support vector machines', 'summary': 'The chapter explains support vector machines and their use in classifying observations based on a maximal margin classifier, ensuring the largest margin for accurate classifications.', 'duration': 211.77, 'highlights': ['The chapter focuses on explaining support vector machines and their application in classifying observations based on a maximal margin classifier, ensuring the largest margin for accurate classifications.', 'The threshold, halfway between edge observations in a training data set, is used to create the largest margin for classifying new observations, ensuring accurate classification.', 'The margin, as the shortest distance between the observations and the threshold, is maximized to ensure accurate classifications using a maximal margin classifier.']}, {'end': 269.556, 'start': 213.399, 'title': 'Maximal margin classifiers', 'summary': 'The chapter discusses the limitations of maximal margin classifiers, highlighting their sensitivity to outliers in the training data, which makes them less effective in classifying new observations.', 'duration': 56.157, 
'highlights': ['Maximal margin classifiers are sensitive to outliers in the training data, which can lead to misclassification of new observations.', 'The maximum margin classifier would be super close to the obese observations and really far from the majority of the observations that are not obese.', 'This sensitivity to outliers makes maximal margin classifiers less effective in classifying new observations.']}], 'duration': 269.347, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE209.jpg', 'highlights': ['The chapter focuses on explaining support vector machines and their application in classifying observations based on a maximal margin classifier, ensuring the largest margin for accurate classifications.', 'The threshold, halfway between edge observations in a training data set, is used to create the largest margin for classifying new observations, ensuring accurate classification.', 'The margin, as the shortest distance between the observations and the threshold, is maximized to ensure accurate classifications using a maximal margin classifier.', 'Maximal margin classifiers are sensitive to outliers in the training data, which can lead to misclassification of new observations.', 'This sensitivity to outliers makes maximal margin classifiers less effective in classifying new observations.']}, {'end': 481.061, 'segs': [{'end': 344.568, 'src': 'heatmap', 'start': 307.924, 'weight': 0, 'content': [{'end': 316.286, 'text': 'Choosing a threshold that allows misclassifications is an example of the bias-variance trade-off that plagues all of machine learning.', 'start': 307.924, 'duration': 8.362}, {'end': 324.988, 'text': 'In other words, before we allowed misclassifications, we picked a threshold that was very sensitive to the training data.', 'start': 317.907, 'duration': 7.081}, {'end': 326.589, 'text': 'It had low bias.', 'start': 325.469, 'duration': 1.12}, {'end': 330.557, 'text': 'and it performed poorly 
when we got new data.', 'start': 327.935, 'duration': 2.622}, {'end': 332.498, 'text': 'It had high variance.', 'start': 331.177, 'duration': 1.321}, {'end': 344.568, 'text': 'In contrast, when we picked a threshold that was less sensitive to the training data and allowed misclassifications, so it had higher bias,', 'start': 334.46, 'duration': 10.108}], 'summary': 'Choosing a threshold with low bias led to poor performance with new data, while a less sensitive threshold resulted in higher bias.', 'duration': 24.574, 'max_score': 307.924, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE307924.jpg'}, {'end': 394.224, 'src': 'embed', 'start': 366.012, 'weight': 3, 'content': [{'end': 375.338, 'text': 'So the question is, how do we know that this soft margin is better than this soft margin? The answer is simple.', 'start': 366.012, 'duration': 9.326}, {'end': 386.125, 'text': 'We use cross-validation to determine how many misclassifications and observations to allow inside of the soft margin to get the best classification.', 'start': 375.879, 'duration': 10.246}, {'end': 394.224, 'text': 'For example, if cross-validation determined that this was the best soft margin,', 'start': 387.779, 'duration': 6.445}], 'summary': 'Cross-validation determines the best soft margin for classification.', 'duration': 28.212, 'max_score': 366.012, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE366012.jpg'}, {'end': 455.735, 'src': 'heatmap', 'start': 357.227, 'weight': 4, 'content': [{'end': 364.451, 'text': 'When we allow misclassifications, the distance between the observations and the threshold is called a soft margin.', 'start': 357.227, 'duration': 7.224}, {'end': 375.338, 'text': 'So the question is, how do we know that this soft margin is better than this soft margin? 
The answer is simple.', 'start': 366.012, 'duration': 9.326}, {'end': 386.125, 'text': 'We use cross-validation to determine how many misclassifications and observations to allow inside of the soft margin to get the best classification.', 'start': 375.879, 'duration': 10.246}, {'end': 394.224, 'text': 'For example, if cross-validation determined that this was the best soft margin,', 'start': 387.779, 'duration': 6.445}, {'end': 402.911, 'text': 'then we would allow one misclassification and two observations that are correctly classified to be within the soft margin.', 'start': 394.224, 'duration': 8.687}, {'end': 414.86, 'text': 'Bam!, When we use a soft margin to determine the location of a threshold, brace yourself, we have another terminology alert.', 'start': 404.192, 'duration': 10.668}, {'end': 424.415, 'text': 'then we are using a soft margin classifier, aka a support vector classifier, to classify observations.', 'start': 416.254, 'duration': 8.161}, {'end': 435.037, 'text': 'The name support vector classifier comes from the fact that the observations on the edge and within the soft margin are called support vectors.', 'start': 426.036, 'duration': 9.001}, {'end': 438.278, 'text': 'Super small bam.', 'start': 436.897, 'duration': 1.381}, {'end': 449.053, 'text': 'Note, if each observation had a mass measurement and a height measurement, then the data would be two-dimensional.', 'start': 439.918, 'duration': 9.135}, {'end': 455.735, 'text': 'When the data are two-dimensional, a support vector classifier is a line.', 'start': 450.694, 'duration': 5.041}], 'summary': 'Soft margin classifier uses cross-validation to allow misclassifications and observations for best classification.', 'duration': 29.699, 'max_score': 357.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE357227.jpg'}], 'start': 271.017, 'title': 'Threshold and bias-variance trade-off', 'summary': 'Discusses the impact of choosing 
thresholds on misclassifications and model performance, illustrating the bias-variance trade-off in machine learning and the need for a threshold less sensitive to outliers.', 'chapters': [{'end': 364.451, 'start': 271.017, 'title': 'Threshold and bias-variance trade-off', 'summary': 'Discusses the need for a threshold less sensitive to outliers, illustrating the bias-variance trade-off in machine learning and the impact of choosing thresholds on misclassifications and model performance.', 'duration': 93.434, 'highlights': ['Choosing a threshold less sensitive to outliers illustrates the bias-variance trade-off in machine learning. The chapter emphasizes the impact of choosing thresholds on misclassifications and model performance, demonstrating the bias-variance trade-off in machine learning.', 'Picking a threshold with low bias led to poor performance with new data, while a threshold with higher bias performed better with new data. It explains how a threshold with low bias had poor performance with new data, while a threshold with higher bias performed better, outlining the trade-off between bias and variance.', 'Allowing misclassifications results in a threshold less sensitive to training data and lower variance but higher bias. 
The chapter details how allowing misclassifications leads to a threshold less sensitive to training data, resulting in lower variance but higher bias, demonstrating the impact on model performance.']}, {'end': 481.061, 'start': 366.012, 'title': 'Support vector classifier', 'summary': 'The chapter discusses using cross-validation to determine the best soft margin for a support vector classifier. In the example, the best soft margin allows one misclassification and two correctly classified observations within the soft margin. Support vectors are the observations on the edge and within the soft margin, and a support vector classifier in a two-dimensional space is a line.', 'duration': 115.049, 'highlights': ['Cross-validation is used to determine the best soft margin, allowing one misclassification and two correctly classified observations within the soft margin.', 'Support vectors are the observations on the edge and within the soft margin of a support vector classifier.', 'In a two-dimensional space, a support vector classifier is represented by a line, with the soft margin measured from two points.']}], 'duration': 210.044, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE271017.jpg', 'highlights': ['Choosing a threshold less sensitive to outliers illustrates the bias-variance trade-off in machine learning.', 'Picking a threshold with low bias led to poor performance with new data, while a threshold with higher bias performed better with new data.', 'Allowing misclassifications results in a threshold less sensitive to training data and lower variance but higher bias.', 'Cross-validation is used to determine the best soft margin, allowing one misclassification and two correctly classified observations within the soft margin.', 'Support vectors are the observations on the edge and within the soft margin of a support vector classifier.', 'In a two-dimensional space, a support vector classifier is represented by a line, with the soft margin measured from two 
points.']}, {'end': 706.77, 'segs': [{'end': 610.487, 'src': 'embed', 'start': 585.376, 'weight': 0, 'content': [{'end': 592.719, 'text': 'And when the data are in two dimensions, the support vector classifier is a one-dimensional line in a two-dimensional space.', 'start': 585.376, 'duration': 7.343}, {'end': 600.543, 'text': 'In mathematical jargon, a line is a flat affine one-dimensional subspace.', 'start': 595.16, 'duration': 5.383}, {'end': 610.487, 'text': 'And when the data are three-dimensional, the support vector classifier is a two-dimensional plane in a three-dimensional space.', 'start': 602.623, 'duration': 7.864}], 'summary': 'Support vector classifier forms a line in two dimensions and a plane in three dimensions.', 'duration': 25.111, 'max_score': 585.376, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE585376.jpg'}, {'end': 706.77, 'src': 'embed', 'start': 648.797, 'weight': 1, 'content': [{'end': 652.62, 'text': "But we generally only use the term when we can't draw it on paper.", 'start': 648.797, 'duration': 3.823}, {'end': 657.544, 'text': 'Small bam, because this is just more terminology.', 'start': 654.221, 'duration': 3.323}, {'end': 661.368, 'text': 'Ugh.', 'start': 658.986, 'duration': 2.382}, {'end': 670.632, 'text': 'Support vector classifiers seem pretty cool because they can handle outliers. And because they can allow misclassifications,', 'start': 661.368, 'duration': 9.264}, {'end': 673.134, 'text': 'they can handle overlapping classifications.', 'start': 670.632, 'duration': 2.502}, {'end': 685.621, 'text': 'But what if this was our training data and we had tons of overlap? 
In this new example, with tons of overlap, we are now looking at drug dosages.', 'start': 675.015, 'duration': 10.606}, {'end': 690.624, 'text': 'And the red dots represent patients that were not cured.', 'start': 687.182, 'duration': 3.442}, {'end': 695.107, 'text': 'And the green dots represent patients that were cured.', 'start': 692.045, 'duration': 3.062}, {'end': 702.748, 'text': "In other words, the drug doesn't work if the dosage is too small or too large.", 'start': 696.744, 'duration': 6.004}, {'end': 706.77, 'text': 'It only works when the dosage is just right.', 'start': 704.108, 'duration': 2.662}], 'summary': 'Support vector classifiers handle outliers, misclassifications, and overlapping classifications, but a drug dosage example shows training data with tons of overlap.', 'duration': 57.973, 'max_score': 648.797, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE648797.jpg'}], 'start': 482.863, 'title': 'Support vector classifier', 'summary': 'Explains support vector classifiers in multi-dimensional spaces, including formation of planes and hyperplanes, handling outliers and overlapping classifications, and a drug dosage example with too much overlap for a support vector classifier to handle.', 'chapters': [{'end': 706.77, 'start': 482.863, 'title': 'Support vector classifier in multi-dimensions', 'summary': 'The chapter explains the concept of support vector classifiers in multi-dimensional spaces, highlighting the formation of planes and hyperplanes, the handling of outliers and overlapping classifications, and a drug dosage example with too much overlap for a support vector classifier to handle.', 'duration': 223.907, 'highlights': ['Support vector classifiers form planes in three-dimensional spaces and hyperplanes in four or more dimensions, providing a practical approach for multi-dimensional classification.', 'Support vector classifiers can handle outliers and, because they allow misclassifications, overlapping classifications.', 'A drug dosage 
example, in which the drug only works when the dosage is neither too small nor too large, has so much overlap that a support vector classifier cannot handle it.']}], 'duration': 223.907, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE482863.jpg', 'highlights': ['Support vector classifiers form planes in three-dimensional spaces and hyperplanes in four or more dimensions, providing a practical approach for multi-dimensional classification.', 'Support vector classifiers can handle outliers and, because they allow misclassifications, overlapping classifications.', 'A drug dosage example, in which the drug only works when the dosage is neither too small nor too large, has so much overlap that a support vector classifier cannot handle it.']}, {'end': 853.108, 'segs': [{'end': 761.553, 'src': 'embed', 'start': 731.086, 'weight': 0, 'content': [{'end': 741.931, 'text': "Since maximal margin classifiers and support vector classifiers can't handle this data, it's high time we talked about support vector machines.", 'start': 731.086, 'duration': 10.845}, {'end': 749.714, 'text': "So let's start by getting an intuitive sense of the main ideas behind support vector machines.", 'start': 743.871, 'duration': 5.843}, {'end': 754.827, 'text': 'We start by adding a y-axis so we can draw a graph.', 'start': 751.104, 'duration': 3.723}, {'end': 761.553, 'text': 'The x-axis coordinates in this graph will be the dosages that we have already observed.', 'start': 756.429, 'duration': 5.124}], 'summary': "Support vector machines handle data that others can't, introducing main ideas with a graph.", 'duration': 30.467, 'max_score': 731.086, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE731086.jpg'}, {'end': 827.049, 'src': 'embed', 'start': 797.526, 'weight': 1, 'content': [{'end': 803.691, 'text': 'Since each observation has 
x- and y-axis coordinates, the data are now two-dimensional.', 'start': 797.526, 'duration': 6.165}, {'end': 807.732, 'text': 'And now that the data are two-dimensional,', 'start': 805.29, 'duration': 2.442}, {'end': 814.478, 'text': 'we can draw a support vector classifier that separates the people who were cured from the people who were not cured.', 'start': 807.732, 'duration': 6.746}, {'end': 820.904, 'text': 'And the support vector classifier can be used to classify new observations.', 'start': 816.34, 'duration': 4.564}, {'end': 827.049, 'text': 'For example, if a new observation had this dosage,', 'start': 822.465, 'duration': 4.584}], 'summary': 'Data is now two-dimensional, allowing support vector classifier to separate cured from not cured people and classify new observations.', 'duration': 29.523, 'max_score': 797.526, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE797526.jpg'}], 'start': 708.451, 'title': 'Support vector machines', 'summary': 'Introduces support vector machines as an improvement over maximal margin classifiers and support vector classifiers, showcasing its applicability in classifying observations.', 'chapters': [{'end': 853.108, 'start': 708.451, 'title': 'Support vector machines', 'summary': 'Introduces support vector machines as an improvement over maximal margin classifiers and support vector classifiers, explaining the concept using a two-dimensional graph and showcasing its applicability in classifying observations.', 'duration': 144.657, 'highlights': ['Support vector machines are introduced as an improvement over maximal margin classifiers and support vector classifiers, which struggle with certain types of data.', 'The concept of support vector machines is explained using a two-dimensional graph, with x-axis representing dosages and y-axis representing the square of dosages, showcasing the separation of cured and not cured individuals using a support vector classifier.', 
"The support vector classifier's ability to classify new observations is demonstrated, with examples of how dosages are squared to determine the classification of the observations."]}], 'duration': 144.657, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE708451.jpg', 'highlights': ['Support vector machines are introduced as an improvement over maximal margin classifiers and support vector classifiers.', 'The concept of support vector machines is explained using a two-dimensional graph, showcasing the separation of cured and not cured individuals using a support vector classifier.', "The support vector classifier's ability to classify new observations is demonstrated, with examples of how dosages are squared to determine the classification of the observations."]}, {'end': 1231.153, 'segs': [{'end': 985.972, 'src': 'heatmap', 'start': 956.588, 'weight': 0.725, 'content': [{'end': 964.57, 'text': 'When d equals 1, the polynomial kernel computes the relationships between each pair of observations in one dimension.', 'start': 956.588, 'duration': 7.982}, {'end': 971.862, 'text': 'And these relationships are used to find a support vector classifier.', 'start': 967.699, 'duration': 4.163}, {'end': 978.386, 'text': 'When d equals 2, we get a second dimension based on dosages squared.', 'start': 973.603, 'duration': 4.783}, {'end': 985.972, 'text': 'And the polynomial kernel computes the two-dimensional relationships between each pair of observations.', 'start': 979.948, 'duration': 6.024}], 'summary': 'Polynomial kernel computes relationships for d=1 and d=2 dimensions.', 'duration': 29.384, 'max_score': 956.588, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE956588.jpg'}, {'end': 1038.582, 'src': 'embed', 'start': 1008.847, 'weight': 2, 'content': [{'end': 1012.79, 'text': 'And those relationships are used to find a support vector classifier.', 'start': 
1008.847, 'duration': 3.943}, {'end': 1021.296, 'text': 'And when D equals 4 or more, then we get even more dimensions to find a support vector classifier.', 'start': 1014.831, 'duration': 6.465}, {'end': 1031.117, 'text': 'In summary, the polynomial kernel systematically increases dimensions by setting d, the degree of the polynomial.', 'start': 1023.251, 'duration': 7.866}, {'end': 1038.582, 'text': 'And the relationships between each pair of observations are used to find a support vector classifier.', 'start': 1032.578, 'duration': 6.004}], 'summary': 'Using relationships to find support vector classifiers with increasing dimensions when d equals 4 or more.', 'duration': 29.735, 'max_score': 1008.847, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE1008847.jpg'}, {'end': 1096.778, 'src': 'embed', 'start': 1070.407, 'weight': 1, 'content': [{'end': 1079.39, 'text': 'However, when using it on a new observation like this, the radial kernel behaves like a weighted nearest neighbor model.', 'start': 1070.407, 'duration': 8.983}, {'end': 1089.934, 'text': 'In other words, the closest observations, aka the nearest neighbors, have a lot of influence on how we classify the new observation.', 'start': 1080.689, 'duration': 9.245}, {'end': 1096.778, 'text': 'And observations that are further away have relatively little influence on the classification.', 'start': 1091.175, 'duration': 5.603}], 'summary': 'Radial kernel acts as a weighted nearest neighbor model, giving closest observations significant influence on new observation classification.', 'duration': 26.371, 'max_score': 1070.407, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE1070407.jpg'}, {'end': 1177.434, 'src': 'embed', 'start': 1153.168, 'weight': 0, 'content': [{'end': 1163.01, 'text': 'The kernel trick reduces the amount of computation required for support vector machines by avoiding the math 
that transforms the data from low to high dimensions.', 'start': 1153.168, 'duration': 9.842}, {'end': 1170.032, 'text': 'And it makes calculating the relationships in the infinite dimensions used by the radial kernel possible.', 'start': 1164.351, 'duration': 5.681}, {'end': 1177.434, 'text': 'However, regardless of how the relationships are calculated, the concepts are the same.', 'start': 1171.893, 'duration': 5.541}], 'summary': 'Kernel trick reduces computation for svm, enables calculation in infinite dimensions.', 'duration': 24.266, 'max_score': 1153.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE1153168.jpg'}], 'start': 853.108, 'title': 'Support vector machines and kernel functions', 'summary': 'Delves into the main concepts of support vector machines, including the use of kernel functions such as polynomial and radial kernels to systematically increase dimensions and find support vector classifiers. it also explores the implementation of the kernel trick to reduce computation and calculate high dimensional relationships in infinite dimensions used by the radial kernel.', 'chapters': [{'end': 918.567, 'start': 853.108, 'title': 'Support vector machines', 'summary': 'Explains the main ideas behind support vector machines, which involve starting with data in a relatively low dimension, moving the data into a higher dimension, and finding a support vector classifier that separates the higher dimensional data into two groups.', 'duration': 65.459, 'highlights': ['The main ideas behind support vector machines involve starting with data in a relatively low dimension, moving the data into a higher dimension, and finding a support vector classifier that separates the higher dimensional data into two groups.', 'The data started in one dimension and was moved to two dimensions.', 'The chapter also addresses the choice of creating y-axis coordinates with dosage squared instead of other alternatives like 
dosage cubed or pi divided by 4 times the square root of dosage.']}, {'end': 1231.153, 'start': 920.268, 'title': 'Support vector machines and kernel functions', 'summary': "Explains how kernel functions like polynomial and radial kernels systematically increase dimensions to find support vector classifiers, with polynomial kernel using a parameter 'd' to compute relationships between observations and radial kernel behaving like a weighted nearest neighbor model. the kernel trick is also discussed, which reduces computation by calculating high dimensional relationships without actually transforming the data, making it possible to calculate relationships in infinite dimensions used by the radial kernel.", 'duration': 310.885, 'highlights': ["The polynomial kernel systematically increases dimensions by setting d, the degree of the polynomial, to compute relationships between observations and find a support vector classifier. The polynomial kernel uses a parameter 'd' to systematically increase dimensions, with d=1 computing relationships in one dimension, d=2 computing two-dimensional relationships, d=3 computing three-dimensional relationships, and d=4 or more increasing dimensions further to find a support vector classifier.", 'The radial kernel behaves like a weighted nearest neighbor model, where the closest observations have a lot of influence on classifying new observations. The radial kernel, finding support vector classifiers in infinite dimensions, behaves like a weighted nearest neighbor model, with the closest observations having a significant influence on classifying new observations while those further away have relatively little influence.', 'The kernel trick reduces computation by calculating high dimensional relationships without actually transforming the data, making it possible to calculate relationships in infinite dimensions used by the radial kernel. 
The kernel trick reduces computation by calculating high dimensional relationships without actually transforming the data, making it possible to calculate relationships in infinite dimensions used by the radial kernel, and it avoids the math that transforms the data from low to high dimensions.']}], 'duration': 378.045, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/efR1C6CvhmE/pics/efR1C6CvhmE853108.jpg', 'highlights': ['The kernel trick reduces computation by calculating high dimensional relationships without actually transforming the data, making it possible to calculate relationships in infinite dimensions used by the radial kernel.', 'The radial kernel behaves like a weighted nearest neighbor model, where the closest observations have a lot of influence on classifying new observations.', 'The polynomial kernel systematically increases dimensions by setting d, the degree of the polynomial, to compute relationships between observations and find a support vector classifier.']}], 'highlights': ['Maximal margin classifiers use the threshold that gives the largest margin to make classifications.', 'The threshold, halfway between the edge observations, creates the largest margin for classifying new observations.', 'Maximal margin classifiers are sensitive to outliers, leading to misclassification of new observations.', 'Choosing a threshold less sensitive to outliers illustrates the bias-variance trade-off in machine learning.', 'Cross-validation is used to determine the best soft margin, allowing one misclassification and two correctly classified observations within the soft margin.', 'Support vector classifiers form planes in three-dimensional spaces and hyperplanes in four or more dimensions, providing a practical approach for multi-dimensional classification.', 'Support vector machines are introduced as an improvement over maximal margin classifiers and support vector classifiers.', 'The kernel trick reduces computation by calculating high dimensional 
relationships without actually transforming the data, making it possible to calculate relationships in infinite dimensions used by the radial kernel.']}
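
The dosage-squared idea and the kernel trick described above can be sketched in a few lines of plain Python. Everything specific here is an assumption made up for illustration, not taken from the video: the dosage values, the hand-picked separating line (a fitted support vector classifier would choose its own), and the polynomial kernel written as (a*b + r)^d with r = 1/2 and d = 2. With those settings the kernel value equals the dot product of the points (a, a^2, 1/2) and (b, b^2, 1/2), which is the sense in which the kernel computes high-dimensional relationships without transforming the data.

```python
import math

# Hypothetical training data (made up for illustration): the drug only
# works when the dosage is neither too small nor too large.
dosages = [1.0, 2.0, 3.0, 9.0, 10.0, 11.0, 18.0, 19.0, 20.0]
cured = [False, False, False, True, True, True, False, False, False]


def feature_map(dosage):
    """Explicitly move a 1-D dosage into higher dimensions: (x, x^2, 1/2)."""
    return (dosage, dosage ** 2, 0.5)


def dot(u, v):
    """Dot product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


def polynomial_kernel(a, b, r=0.5, d=2):
    """Polynomial kernel (a*b + r)^d, computed directly from the 1-D dosages."""
    return (a * b + r) ** d


# A hand-picked line y = SLOPE * x + INTERCEPT in the (dosage, dosage^2) plane.
# It is an assumption for this sketch, not a fitted support vector classifier.
SLOPE = 20.5
INTERCEPT = -87.0


def classify(dosage):
    """Predict 'cured' when the point (x, x^2) falls below the line."""
    x, y, _ = feature_map(dosage)
    return y < SLOPE * x + INTERCEPT


# In one dimension no single threshold separates the classes, but in two
# dimensions the straight line above does.
predictions = [classify(x) for x in dosages]

# Kernel trick check: the kernel value for every pair of dosages equals the
# dot product of the explicitly transformed points.
kernel_matches_explicit_map = all(
    math.isclose(polynomial_kernel(a, b), dot(feature_map(a), feature_map(b)))
    for a in dosages
    for b in dosages
)

print("line separates the classes:", predictions == cured)
print("kernel equals explicit dot product:", kernel_matches_explicit_map)
```

In practice a library would fit the classifier and choose r and d, typically by cross-validation; the sketch only shows why squaring the dosages makes the data separable by a straight line and why the kernel never needs the transformed coordinates.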