title

StatQuest: Linear Discriminant Analysis (LDA) clearly explained.

description

LDA is surprisingly simple, and anyone can understand it. Here I avoid the complex linear algebra and use illustrations to show you what LDA does, so you will know when to use it and how to interpret the results. Sample R code is at the StatQuest GitHub:
https://github.com/StatQuest/linear_discriminant_analysis_demo/blob/master/linear_discriminant_analysis_demo.R
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - https://statquest.gumroad.com/l/wvtmc
Paperback - https://www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - https://www.amazon.com/dp/B09ZG79HXC
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...a cool StatQuest t-shirt or sweatshirt:
https://shop.spreadshirt.com/statquest-with-josh-starmer/
...buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
0:59 Motivation for LDA
5:03 LDA Main Idea
5:29 LDA with 2 categories and 2 variables
7:07 How LDA creates new axes
10:03 LDA with 2 categories and 3 or more variables
10:57 LDA for 3 categories
13:39 Similarities between LDA and PCA
#statquest #LDA #ML
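The linked demo is written in R; as a rough Python sketch of the same workflow (hypothetical toy data and scikit-learn standing in for the R demo — an illustration, not the StatQuest code):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical stand-in for the video's gene-expression example:
# 50 patients the drug helped and 50 it didn't, measured on two "genes".
rng = np.random.default_rng(0)
helped = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
not_helped = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(50, 2))
X = np.vstack([helped, not_helped])
y = np.array([0] * 50 + [1] * 50)

# LDA finds the single new axis that best separates the two categories
# and projects the 2-D data onto it (2 variables -> 1 number line).
lda = LinearDiscriminantAnalysis(n_components=1)
projected = lda.fit_transform(X, y)
print(projected.shape)  # (100, 1)
```

With two categories LDA can produce at most one discriminant axis, which is why `n_components=1`; with three categories (as later in the video) it would produce two.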

detail

{'title': 'StatQuest: Linear Discriminant Analysis (LDA) clearly explained.', 'heatmap': [{'end': 420.694, 'start': 397.177, 'weight': 0.83}, {'end': 753.098, 'start': 590.487, 'weight': 0.794}, {'end': 770.301, 'start': 754.498, 'weight': 0.708}], 'summary': 'Explains linear discriminant analysis (lda) and its potential application in understanding the effectiveness of a cancer drug through gene expression. it discusses the challenge of visualizing high-dimensional data and how lda maximizes data separability, providing a better approach than pca for decision-making. the concept of lda is emphasized, focusing on minimizing variation and maximizing the distance between means, with comparisons to pca and its superior category separation.', 'chapters': [{'end': 220.459, 'segs': [{'end': 92.359, 'src': 'embed', 'start': 4.17, 'weight': 0, 'content': [{'end': 8.491, 'text': 'StatQuest Stats are coming at you.', 'start': 4.17, 'duration': 4.321}, {'end': 16.192, 'text': 'StatQuest Stats are gonna find you.', 'start': 8.511, 'duration': 7.681}, {'end': 26.194, 'text': 'StatQuest Watch out! 
Hello and welcome to StatQuest.', 'start': 18.993, 'duration': 7.201}, {'end': 33.435, 'text': 'StatQuest is brought to you by the friendly folks in the genetics department at the University of North Carolina at Chapel Hill.', 'start': 26.754, 'duration': 6.681}, {'end': 38.765, 'text': "Today we're going to be talking about linear discriminant analysis.", 'start': 34.983, 'duration': 3.782}, {'end': 43.406, 'text': "Which, let's be honest, sounds really fancy.", 'start': 40.365, 'duration': 3.041}, {'end': 46.868, 'text': 'And it kind of is, but not really.', 'start': 44.587, 'duration': 2.281}, {'end': 48.148, 'text': 'I think we can understand it.', 'start': 47.128, 'duration': 1.02}, {'end': 51.15, 'text': "Let's see what it does and then we'll work it out.", 'start': 49.289, 'duration': 1.861}, {'end': 58.172, 'text': "That is, let's look at some examples of why we might need linear discriminant analysis and then we'll talk about the details of how it works.", 'start': 51.47, 'duration': 6.702}, {'end': 61.734, 'text': 'Imagine that we have this cancer drug.', 'start': 59.893, 'duration': 1.841}, {'end': 65.388, 'text': 'And that cancer drug works great for some people.', 'start': 62.728, 'duration': 2.66}, {'end': 69.29, 'text': 'But for other people, it just makes them feel worse.', 'start': 66.589, 'duration': 2.701}, {'end': 71.051, 'text': 'Wah, wah.', 'start': 69.931, 'duration': 1.12}, {'end': 74.832, 'text': 'We want to figure out who to give the drug to.', 'start': 72.432, 'duration': 2.4}, {'end': 77.734, 'text': "We want to give it to people who it's going to help.", 'start': 75.433, 'duration': 2.301}, {'end': 81.195, 'text': "But we don't want to give it to people that it might harm.", 'start': 78.494, 'duration': 2.701}, {'end': 89.718, 'text': "And since I'm a geneticist and I work in a genetics department, the way I answer all my questions is to look at gene expression.", 'start': 82.715, 'duration': 7.003}, {'end': 92.359, 'text': 'Maybe gene 
expression can help us decide.', 'start': 90.278, 'duration': 2.081}], 'summary': 'Statquest introduces linear discriminant analysis for gene expression analysis in cancer drug treatment.', 'duration': 88.189, 'max_score': 4.17, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc4170.jpg'}], 'start': 4.17, 'title': 'Linear discriminant analysis and using gene expression for drug response', 'summary': 'Introduces linear discriminant analysis and its potential application in understanding the effectiveness of a cancer drug. it also discusses the use of gene expression to determine drug response, identifying individuals who benefit from the drug and those who may experience adverse effects.', 'chapters': [{'end': 65.388, 'start': 4.17, 'title': 'Linear discriminant analysis', 'summary': 'Introduces linear discriminant analysis, a statistical method, and its potential application in understanding the effectiveness of a cancer drug.', 'duration': 61.218, 'highlights': ['Linear discriminant analysis is being discussed in this chapter, with potential application in understanding the effectiveness of a cancer drug.', 'The chapter is brought by the genetics department at the University of North Carolina at Chapel Hill.', 'Linear discriminant analysis is presented as a method that sounds fancy but can be understood, with a promise to explore its functionality and application.', 'The example of a cancer drug that works great for some people is used to illustrate the potential need for linear discriminant analysis.']}, {'end': 220.459, 'start': 66.589, 'title': 'Using gene expression for drug response', 'summary': 'Discusses the use of gene expression to determine drug response, showing how gene x and multiple genes are used to identify individuals who benefit from the drug and those who may experience adverse effects.', 'duration': 153.87, 'highlights': ['Using gene X to determine drug response Gene X shows that low gene 
transcripts indicate drug effectiveness, while high gene transcripts indicate ineffectiveness.', 'Comparison of using one gene vs. multiple genes for drug response Using multiple genes provides better categorization of drug effectiveness than using a single gene, as demonstrated by the examples of using two and three genes.', 'The use of three genes to decide drug response The example of using three genes, including gene Z on the Z-axis, demonstrates the attempt to use a three-dimensional plane to separate individuals who benefit from the drug and those who do not.']}], 'duration': 216.289, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc4170.jpg', 'highlights': ['Linear discriminant analysis is being discussed in this chapter, with potential application in understanding the effectiveness of a cancer drug.', 'Using gene X to determine drug response Gene X shows that low gene transcripts indicate drug effectiveness, while high gene transcripts indicate ineffectiveness.', 'Comparison of using one gene vs. 
multiple genes for drug response Using multiple genes provides better categorization of drug effectiveness than using a single gene, as demonstrated by the examples of using two and three genes.', 'The example of using three genes, including gene Z on the Z-axis, demonstrates the attempt to use a three-dimensional plane to separate individuals who benefit from the drug and those who do not.', 'The chapter is brought by the genetics department at the University of North Carolina at Chapel Hill.', 'Linear discriminant analysis is presented as a method that sounds fancy but can be understood, with a promise to explore its functionality and application.', 'The example of a cancer drug that works great for some people is used to illustrate the potential need for linear discriminant analysis.']}, {'end': 446.002, 'segs': [{'end': 349.289, 'src': 'embed', 'start': 318.713, 'weight': 3, 'content': [{'end': 328.039, 'text': 'Linear Discriminant Analysis is like PCA, but it focuses on maximizing the separability among the known categories.', 'start': 318.713, 'duration': 9.326}, {'end': 332.682, 'text': "Here, we're going to start with a super simple example.", 'start': 329.76, 'duration': 2.922}, {'end': 337.725, 'text': "we're just going to try to reduce a two-dimensional graph to a 1D graph.", 'start': 333.663, 'duration': 4.062}, {'end': 349.289, 'text': 'That is to say, we want to take this two-dimensional graph, aka an XY graph, and reduce it to a one-dimensional graph, aka a number line,', 'start': 338.805, 'duration': 10.484}], 'summary': 'Linear discriminant analysis maximizes separability among known categories, simplifying 2d to 1d.', 'duration': 30.576, 'max_score': 318.713, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc318713.jpg'}, {'end': 446.002, 'src': 'heatmap', 'start': 397.177, 'weight': 0, 'content': [{'end': 408.984, 'text': 'LDA uses the information from both genes to create a new axis and it 
projects the data onto this new axis in a way to maximize the separation of the two categories.', 'start': 397.177, 'duration': 11.807}, {'end': 420.694, 'text': 'So the general concept here is that LDA creates a new axis and it projects the data onto that new axis in a way that maximizes the separation of the two categories.', 'start': 410.472, 'duration': 10.222}, {'end': 426.635, 'text': "Now let's look at the nitty gritty details and figure out how LDA does that.", 'start': 422.554, 'duration': 4.081}, {'end': 436.597, 'text': 'How does LDA create the new axis? The new axis is created according to two criteria that are considered simultaneously.', 'start': 427.915, 'duration': 8.682}, {'end': 446.002, 'text': 'The first criteria is that once the data is projected onto the new axis, we want to maximize the distance between the two means.', 'start': 437.54, 'duration': 8.462}], 'summary': 'Lda creates a new axis to maximize separation of two categories.', 'duration': 70.057, 'max_score': 397.177, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc397177.jpg'}], 'start': 221.829, 'title': 'Visualizing high-dimensional data and maximizing data separability with lda', 'summary': 'Discusses the challenge of visualizing high-dimensional data and the impossibility of drawing 4 or 10,000-dimensional graphs, drawing parallels to principal component analysis. 
it also explores how linear discriminant analysis maximizes separability among categories by creating a new axis to project data, providing a better approach than pca for making decisions based on separability.', 'chapters': [{'end': 267.233, 'start': 221.829, 'title': 'Visualizing high-dimensional data', 'summary': 'Discusses the challenge of visualizing high-dimensional data, highlighting the difficulty in accurately separating categories in 3d space and the impossibility of drawing 4 or 10,000-dimensional graphs, drawing parallels to the problem encountered in principal component analysis.', 'duration': 45.404, 'highlights': ['The challenge of accurately separating categories in 3D space due to the difficulty in visualizing three dimensions on a flat computer screen.', 'The impossibility of drawing 4 or 10,000-dimensional graphs, presenting a significant obstacle in visualizing high-dimensional data.', 'Drawing parallels to the problem encountered in principal component analysis, emphasizing the recurring nature of the visualization challenge in high-dimensional data analysis.']}, {'end': 446.002, 'start': 267.653, 'title': 'Maximizing data separability with lda', 'summary': 'Discusses how linear discriminant analysis (lda) maximizes separability among categories by creating a new axis to project data in a way that maximizes the distance between the means of the two categories, providing a better approach than pca for making decisions based on separability.', 'duration': 178.349, 'highlights': ['LDA creates a new axis and projects the data onto it to maximize the separation of the two categories, providing a better approach than PCA for making decisions based on separability.', 'PCA reduces dimensions by focusing on genes with the most variation, which is incredibly useful when plotting data with a lot of dimensions or genes onto a simple XY plot.', 'LDA focuses on maximizing the separability among the known categories, providing a better approach than PCA for 
making decisions based on separability.']}], 'duration': 224.173, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc221829.jpg', 'highlights': ['LDA focuses on maximizing the separability among the known categories, providing a better approach than PCA for making decisions based on separability.', 'The impossibility of drawing 4 or 10,000-dimensional graphs, presenting a significant obstacle in visualizing high-dimensional data.', 'The challenge of accurately separating categories in 3D space due to the difficulty in visualizing three dimensions on a flat computer screen.', 'Drawing parallels to the problem encountered in principal component analysis, emphasizing the recurring nature of the visualization challenge in high-dimensional data analysis.', 'PCA reduces dimensions by focusing on genes with the most variation, which is incredibly useful when plotting data with a lot of dimensions or genes onto a simple XY plot.']}, {'end': 688.059, 'segs': [{'end': 474.27, 'src': 'embed', 'start': 447.662, 'weight': 1, 'content': [{'end': 454.984, 'text': 'Here we have a green mu character, which is a Greek character representing the mean for the green category,', 'start': 447.662, 'duration': 7.322}, {'end': 458.785, 'text': 'and a red mu representing the mean for the red category.', 'start': 454.984, 'duration': 3.801}, {'end': 463.966, 'text': 'The second criteria is that we want to minimize the variation.', 'start': 460.345, 'duration': 3.621}, {'end': 469.706, 'text': 'which LDA calls scatter and is represented by s squared within each category.', 'start': 464.622, 'duration': 5.084}, {'end': 474.27, 'text': 'On the left side, we see the scatter around the green dots.', 'start': 471.007, 'duration': 3.263}], 'summary': 'Using lda to minimize variation in green and red categories.', 'duration': 26.608, 'max_score': 447.662, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc447662.jpg'}, {'end': 574.894, 'src': 'embed', 'start': 520.035, 'weight': 0, 'content': [{'end': 529.222, 'text': 'And ideally, the denominator would be very small in that the scatter, the variation of the data around each mean in each category, would be small.', 'start': 520.035, 'duration': 9.187}, {'end': 536.899, 'text': "Now, I know this isn't a very complicated equation, but to make things simpler later on in this discussion,", 'start': 530.256, 'duration': 6.643}, {'end': 541.421, 'text': "let's call the difference between the two means d for distance.", 'start': 536.899, 'duration': 4.522}, {'end': 548.544, 'text': 'So we can replace the difference between the two means with d.', 'start': 543.222, 'duration': 5.322}, {'end': 554.547, 'text': 'Now I want to show you an example of why both the distance between the two means and the scatter are important.', 'start': 548.544, 'duration': 6.003}, {'end': 557.167, 'text': "Here's a new data set.", 'start': 556.067, 'duration': 1.1}, {'end': 560.049, 'text': 'We still just have two categories, green and red.', 'start': 557.227, 'duration': 2.822}, {'end': 567.812, 'text': "In this case, there's a little bit of overlap on the y-axis, but lots of spread along the x-axis.", 'start': 560.909, 'duration': 6.903}, {'end': 574.894, 'text': "If we only maximize the distance between the means, then we'll get something like this.", 'start': 569.252, 'duration': 5.642}], 'summary': 'Minimize scatter, maximize distance between means for effective data analysis.', 'duration': 54.859, 'max_score': 520.035, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc520034.jpg'}], 'start': 447.662, 'title': 'Linear discriminant analysis', 'summary': 'Explains the concept of linear discriminant analysis (lda), emphasizing the importance of minimizing variation and considering the difference 
between means squared over the sum of scatter in determining discriminant functions. it also highlights the importance of maximizing the distance between means and minimizing scatter in lda, with illustrations and discussions on multiple dimensions and categories.', 'chapters': [{'end': 500.252, 'start': 447.662, 'title': 'Linear discriminant analysis', 'summary': 'Explains the concept of linear discriminant analysis (lda), emphasizing the importance of minimizing variation and considering the difference between means squared over the sum of scatter in determining discriminant functions.', 'duration': 52.59, 'highlights': ['Linear Discriminant Analysis (LDA) involves minimizing variation and considering the difference between means squared over the sum of scatter in determining discriminant functions.', 'The green mu represents the mean for the green category, the red mu represents the mean for the red category, and the scatter around the green and red dots is considered simultaneously.', 'We have a ratio of the difference between the two means squared over the sum of the scatter, where the numerator is squared due to uncertainty in the comparison of means.']}, {'end': 688.059, 'start': 501.412, 'title': 'Linear discriminant analysis', 'summary': 'Explains the importance of maximizing the distance between means and minimizing scatter in linear discriminant analysis, illustrated by examples and discussing the process for multiple dimensions and categories.', 'duration': 186.647, 'highlights': ['The importance of optimizing both the distance between means and scatter is demonstrated through examples. By optimizing both criteria, good separation can be achieved, illustrated by the example of maximizing distance and minimizing scatter resulting in better separation.', 'The process for multiple dimensions and categories in Linear Discriminant Analysis remains the same, involving creating a new axis to maximize the distance between means and minimize scatter. 
The process for multiple dimensions and categories in Linear Discriminant Analysis involves creating a new axis to maximize the distance between means and minimize scatter, as shown in the example of LDA with three genes.', 'The measurement of distances among means changes when dealing with multiple categories in Linear Discriminant Analysis. When dealing with multiple categories, the measurement of distances among means changes, as shown by the example of having three categories in contrast to two categories.']}], 'duration': 240.397, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc447662.jpg', 'highlights': ['Linear Discriminant Analysis (LDA) involves minimizing variation and considering the difference between means squared over the sum of scatter in determining discriminant functions.', 'The importance of optimizing both the distance between means and scatter is demonstrated through examples. By optimizing both criteria, good separation can be achieved, illustrated by the example of maximizing distance and minimizing scatter resulting in better separation.', 'The process for multiple dimensions and categories in Linear Discriminant Analysis remains the same, involving creating a new axis to maximize the distance between means and minimize scatter.']}, {'end': 910.691, 'segs': [{'end': 781.748, 'src': 'heatmap', 'start': 754.498, 'weight': 0.708, 'content': [{'end': 760.779, 'text': 'Suddenly, being able to create two axes that maximize the separation of the three categories is super cool.', 'start': 754.498, 'duration': 6.281}, {'end': 767.04, 'text': "It's way better than drawing a 10,000 dimension figure that we can't even imagine what it would look like.", 'start': 761.239, 'duration': 5.801}, {'end': 770.301, 'text': "Here's an example using real data.", 'start': 768.361, 'duration': 1.94}, {'end': 773.502, 'text': "I'm trying to separate three categories and I've got 10,000 genes.", 'start': 770.781, 
'duration': 2.721}, {'end': 781.748, 'text': 'plotting the raw data would require 10,000 axes, we used LDA to reduce the number to two.', 'start': 775.663, 'duration': 6.085}], 'summary': 'Using lda, reduced 10,000 dimensions to two for data visualization.', 'duration': 27.25, 'max_score': 754.498, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc754498.jpg'}, {'end': 910.691, 'src': 'embed', 'start': 907.747, 'weight': 0, 'content': [{'end': 910.691, 'text': 'Tune in next time for another exciting StatQuest!.', 'start': 907.747, 'duration': 2.944}], 'summary': 'Next statquest coming soon!', 'duration': 2.944, 'max_score': 907.747, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc907747.jpg'}], 'start': 689.594, 'title': 'Linear discriminant analysis and its comparison with pca', 'summary': "Explains how linear discriminant analysis reduces 10,000 dimensions to two, improving visualization and separation of categories, and compares lda and pca, demonstrating lda's superior category separation and both methods' ranking of axes by importance.", 'chapters': [{'end': 789.115, 'start': 689.594, 'title': 'Linear discriminant analysis', 'summary': 'Explains how linear discriminant analysis creates two axes to separate data, reducing 10,000 dimensions to two, resulting in improved visualization and easier separation of categories, demonstrated with real data.', 'duration': 99.521, 'highlights': ['Linear Discriminant Analysis (LDA) creates two axes to separate data, reducing the number of dimensions from 10,000 to two, resulting in improved visualization and easier separation of categories.', 'Using LDA to reduce 10,000 dimensions to two allows for the easy visualization of three separate categories, despite imperfect separation.', 'The three central points for each category define a plane, creating new optimized x and y axes to separate the categories, which is 
particularly advantageous when dealing with data from 10,000 genes.']}, {'end': 910.691, 'start': 790.776, 'title': 'Lda vs pca: comparison and similarities', 'summary': 'Compares lda and pca using the same data set, highlighting that lda separates categories better than pca, and both methods rank axes by importance, with lda maximizing category separation and pca focusing on genes with the most variation.', 'duration': 119.915, 'highlights': ['LDA separates categories better than PCA, as seen in the plot with less overlap between the categories (quantifiable data not provided).', 'Both LDA and PCA rank the new axes by importance, with PC1 accounting for the most variation in the data and LD1 accounting for the most variation between categories (quantifiable data not provided).', 'LDA and PCA both allow the examination of genes driving the new axes, with PCA using loading scores and LDA correlating genes with the new axes (quantifiable data not provided).', 'LDA and PCA share the goal of dimensionality reduction, with PCA focusing on genes with the most variation and LDA maximizing category separation (quantifiable data not provided).']}], 'duration': 221.097, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/azXCzI57Yfc/pics/azXCzI57Yfc689594.jpg', 'highlights': ['LDA creates two axes to separate data, reducing dimensions from 10,000 to two for improved visualization and easier category separation.', 'LDA separates categories better than PCA, with less overlap between categories.', 'Both LDA and PCA rank new axes by importance, with PC1 accounting for the most variation and LD1 accounting for the most variation between categories.', 'LDA and PCA share the goal of dimensionality reduction, with PCA focusing on genes with the most variation and LDA maximizing category separation.']}], 'highlights': ['Using gene X to determine drug response Gene X shows that low gene transcripts indicate drug effectiveness, while high gene transcripts 
indicate ineffectiveness.', 'Comparison of using one gene vs. multiple genes for drug response Using multiple genes provides better categorization of drug effectiveness than using a single gene, as demonstrated by the examples of using two and three genes.', 'The example of using three genes, including gene Z on the Z-axis, demonstrates the attempt to use a three-dimensional plane to separate individuals who benefit from the drug and those who do not.', 'LDA focuses on maximizing the separability among the known categories, providing a better approach than PCA for making decisions based on separability.', 'The impossibility of drawing 4 or 10,000-dimensional graphs, presenting a significant obstacle in visualizing high-dimensional data.', 'Linear Discriminant Analysis (LDA) involves minimizing variation and considering the difference between means squared over the sum of scatter in determining discriminant functions.', 'LDA creates two axes to separate data, reducing dimensions from 10,000 to two for improved visualization and easier category separation.', 'Both LDA and PCA rank new axes by importance, with PC1 accounting for the most variation and LD1 accounting for the most variation between categories.']}
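The ratio described in the transcript — the squared distance between the two category means, d², divided by the summed scatter s₁² + s₂² — is easy to compute directly. Below is a small illustration with made-up data (the `lda_criterion` helper and the numbers are mine, not from the StatQuest R demo), showing why maximizing the distance between the means alone is not enough:

```python
import numpy as np

def lda_criterion(a, b, axis):
    """Score a candidate 1-D axis by the video's ratio:
    (distance between the two means)^2 / (sum of the scatter)."""
    axis = axis / np.linalg.norm(axis)
    pa, pb = a @ axis, b @ axis                      # project both categories
    d_squared = (pa.mean() - pb.mean()) ** 2         # d^2: distance between means
    scatter = ((pa - pa.mean()) ** 2).sum() + ((pb - pb.mean()) ** 2).sum()
    return d_squared / scatter

rng = np.random.default_rng(1)
# Lots of spread along x, but the categories differ mainly along y --
# the situation from the video where distance alone picks the wrong axis.
green = rng.normal(loc=[0.0, 0.0], scale=[3.0, 0.3], size=(40, 2))
red = rng.normal(loc=[0.5, 2.0], scale=[3.0, 0.3], size=(40, 2))

score_x = lda_criterion(green, red, np.array([1.0, 0.0]))
score_y = lda_criterion(green, red, np.array([0.0, 1.0]))
print(score_y > score_x)  # True: y wins once scatter is accounted for
```

Maximizing this ratio over all candidate axes is exactly what LDA does; the y-axis here scores far higher because it keeps the scatter within each category small while still separating the means.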