title

Covariance, Clearly Explained!!!

description

Covariance is one of those statistical terms that you might have heard before but didn't quite understand. It sounds fancy, but it's really quite simple, and it is a computational stepping stone to many other interesting concepts like correlation. This video describes covariance, what it does and doesn't do, how it's computed, and why it is more useful as a computational stepping stone than as an end in itself.
NOTE: This StatQuest assumes you already know about "variance". If not, check out the quest: https://youtu.be/SzZ6GpcfoQY
And if you are interested in Part 2, which explains Correlation, here's the link: https://youtu.be/xZ_z8KWkhXE
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - https://statquest.gumroad.com/l/wvtmc
Paperback - https://www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - https://www.amazon.com/dp/B09ZG79HXC
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...a cool StatQuest t-shirt or sweatshirt:
https://shop.spreadshirt.com/statquest-with-josh-starmer/
...buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
0:24 Review of variance
1:07 Motivation for Covariance
5:59 Types of Covariance relationships
7:32 How to calculate covariance
17:00 Why covariance is hard to interpret
20:31 Motivation for Correlation
21:20 Summary
Correction:
16:45 We should be dividing by 4-1, not 5-1. Oops! :)
#statquest #covariance
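
A small worked sketch of the ideas listed above (how covariance is calculated, and why its value is hard to interpret). It assumes the usual sample covariance: multiply each pair's differences from the two means, add the products up, and divide by n - 1. The numbers below are made up for illustration and are NOT the data from the video, and the helper name sample_covariance is hypothetical.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Sum of (x - x_bar) * (y - y_bar) over all pairs, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    products = ((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


# Hypothetical paired measurements from five cells (illustration only).
gene_x = [2.0, 4.0, 6.0, 8.0, 10.0]
gene_y = [1.0, 3.0, 2.0, 5.0, 4.0]

# Positive value, so the trend is classified as positive.
cov_xy = sample_covariance(gene_x, gene_y)

# The covariance of a variable with itself is its variance.
cov_xx = sample_covariance(gene_x, gene_x)

# Doubling every value keeps the relationship the same but
# multiplies the covariance by 2 * 2 = 4.
doubled = [2 * x for x in gene_x]
cov_doubled = sample_covariance(doubled, doubled)

print(cov_xy, cov_xx, cov_doubled)
```

The last two values show the scale sensitivity discussed at 17:00: the dots stay on the same straight line, yet the covariance is four times larger, so the number alone says little about how strong the relationship is.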

detail

{'title': 'Covariance, Clearly Explained!!!', 'heatmap': [{'end': 83.605, 'start': 52.964, 'weight': 0.984}, {'end': 189.285, 'start': 162.358, 'weight': 0.783}, {'end': 290.377, 'start': 261.749, 'weight': 1}, {'end': 1002.549, 'start': 973.508, 'weight': 0.705}, {'end': 1024.592, 'start': 1008.587, 'weight': 0.828}], 'summary': 'Explains covariance and correlation in the context of gene data analysis, showcasing the relationship between gene x and gene y, with specific examples and calculations, highlighting the computational significance and limitations of interpreting covariance values, and emphasizing its role as a foundation for correlation analysis.', 'chapters': [{'end': 177.482, 'segs': [{'end': 31.567, 'src': 'embed', 'start': 0.461, 'weight': 3, 'content': [{'end': 9.631, 'text': "my cat can't do stats in the window, so i'll do stats for her all day long.", 'start': 0.461, 'duration': 9.17}, {'end': 12.654, 'text': 'stat quest.', 'start': 9.631, 'duration': 3.023}, {'end': 16.398, 'text': "hello, i'm josh starmer and welcome to stat quest.", 'start': 12.654, 'duration': 3.744}, {'end': 23.125, 'text': "today we're going to talk about covariance and this is part one in a two-part series on covariance and correlation.", 'start': 16.398, 'duration': 6.727}, {'end': 29.507, 'text': 'This StatQuest assumes that you are already familiar with the concept of variance.', 'start': 24.685, 'duration': 4.822}, {'end': 31.567, 'text': 'If not, check out the quest.', 'start': 30.007, 'duration': 1.56}], 'summary': 'Statquest series on covariance and correlation by josh starmer.', 'duration': 31.106, 'max_score': 0.461, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY461.jpg'}, {'end': 122.934, 'src': 'heatmap', 'start': 52.964, 'weight': 0, 'content': [{'end': 55.485, 'text': 'Then we estimated the mean, X bar.', 'start': 52.964, 'duration': 2.521}, {'end': 58.526, 'text': 'And then we estimated the variance.', 
'start': 56.745, 'duration': 1.781}, {'end': 65.849, 'text': "Bam And that's our review of variance.", 'start': 62.567, 'duration': 3.282}, {'end': 76.152, 'text': 'Now imagine that in addition to counting mRNA transcripts for gene X, we also counted gene Y transcripts in the same five cells.', 'start': 67.489, 'duration': 8.663}, {'end': 83.605, 'text': 'Alternatively, you can imagine we counted the number of red apples in the same five grocery stores.', 'start': 77.533, 'duration': 6.072}, {'end': 90.587, 'text': "Note, if you're wondering why gene Y is perpendicular to gene X, don't sweat it.", 'start': 85.165, 'duration': 5.422}, {'end': 93.288, 'text': 'The reason will become clear in just a bit.', 'start': 91.007, 'duration': 2.281}, {'end': 100.99, 'text': 'Anyway, just like we did for gene X, we can estimate the mean for gene Y.', 'start': 94.908, 'duration': 6.082}, {'end': 107.272, 'text': 'And since gene Y is on the Y-axis, we will use Y-bar to represent its mean value.', 'start': 100.99, 'duration': 6.282}, {'end': 110.45, 'text': 'And we can estimate the variance.', 'start': 108.729, 'duration': 1.721}, {'end': 122.934, 'text': "Bam So far, we've estimated the mean and variance for two different genes measured in the same five cells.", 'start': 114.331, 'duration': 8.603}], 'summary': 'Estimation of mean and variance for gene x and gene y in five cells.', 'duration': 60.367, 'max_score': 52.964, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY52964.jpg'}, {'end': 177.482, 'src': 'embed', 'start': 153.147, 'weight': 2, 'content': [{'end': 160.571, 'text': 'This pair of measurements came from another cell, and both measurements are greater than their respective mean values.', 'start': 153.147, 'duration': 7.424}, {'end': 164.819, 'text': 'Since the measurements were taken in pairs.', 'start': 162.358, 'duration': 2.461}, {'end': 172.261, 'text': 'the question is do the measurements taken as pairs 
tell us something that the individual measurements do not?', 'start': 164.819, 'duration': 7.442}, {'end': 177.482, 'text': 'Covariance is one way to try to answer this question.', 'start': 174.121, 'duration': 3.361}], 'summary': 'Both measurements in a pair are greater than their respective mean values, prompting exploration of covariance.', 'duration': 24.335, 'max_score': 153.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY153147.jpg'}], 'start': 0.461, 'title': 'Understanding covariance and correlation', 'summary': 'Explains the concepts of covariance and correlation in the context of estimating mean and variance for genes in the same cells, providing insights into the data relationship.', 'chapters': [{'end': 177.482, 'start': 0.461, 'title': 'Understanding covariance and correlation', 'summary': 'Explains covariance and correlation in the context of estimating the mean and variance for different genes measured in the same cells, with a focus on understanding the relationship between the measurements and how they can provide insights into the data.', 'duration': 177.021, 'highlights': ['The chapter provides an introduction to covariance and correlation, emphasizing the estimation of mean and variance for two different genes measured in the same cells or grocery stores.', 'It explains the concept using the analogy of counting mRNA transcripts for gene X and gene Y in the same cells or counting different types of apples in the same grocery stores.', 'The chapter highlights the significance of covariance as a way to understand if the measurements taken in pairs provide additional insights compared to individual measurements.', 'It emphasizes the relevance of understanding covariance and correlation in analyzing data relationships, setting the stage for the subsequent part of the series on correlation.']}], 'duration': 177.021, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY461.jpg', 'highlights': ['The chapter provides an introduction to covariance and correlation, emphasizing the estimation of mean and variance for two different genes measured in the same cells or grocery stores.', 'It explains the concept using the analogy of counting mRNA transcripts for gene X and gene Y in the same cells or counting different types of apples in the same grocery stores.', 'The chapter highlights the significance of covariance as a way to understand if the measurements taken in pairs provide additional insights compared to individual measurements.', 'It emphasizes the relevance of understanding covariance and correlation in analyzing data relationships, setting the stage for the subsequent part of the series on correlation.']}, {'end': 346.237, 'segs': [{'end': 233.576, 'src': 'embed', 'start': 209.553, 'weight': 0, 'content': [{'end': 218.836, 'text': 'This relationship low measurements for both genes in some cells and high measurements for both genes in other cells can be summarized with this line', 'start': 209.553, 'duration': 9.283}, {'end': 233.576, 'text': 'Note, the line that represents this particular relationship has a positive slope and it reflects the positive trend where the values for gene X and gene Y increase together.', 'start': 220.512, 'duration': 13.064}], 'summary': 'Positive relationship between gene x and gene y observed in cell measurements.', 'duration': 24.023, 'max_score': 209.553, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY209553.jpg'}, {'end': 316.99, 'src': 'heatmap', 'start': 261.749, 'weight': 1, 'content': [{'end': 277.542, 'text': 'If the data had looked like this and relatively low values for gene X corresponded with relatively high values for gene Y and relatively high values for gene X corresponded with relatively low values for gene Y,', 'start': 261.749, 
'duration': 15.793}, {'end': 290.377, 'text': 'then the relationship would have a negative slope and reflect the negative trend that the values for gene X increase as the values for gene Y decrease.', 'start': 279.172, 'duration': 11.205}, {'end': 304.623, 'text': 'If the data had looked like this and every value for gene X was paired with the same value for gene Y, then there would be no trend,', 'start': 293.338, 'duration': 11.285}, {'end': 307.985, 'text': 'positive or negative, between gene X and gene Y.', 'start': 304.623, 'duration': 3.362}, {'end': 316.99, 'text': 'This is because if you told me that you got the same measurement for gene Y found in all of the other cells,', 'start': 309.448, 'duration': 7.542}], 'summary': 'Data showing gene x and gene y have a negative relationship with lower gene x values corresponding to higher gene y values, and no trend if all gene x values are paired with the same gene y value.', 'duration': 55.241, 'max_score': 261.749, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY261749.jpg'}], 'start': 179.202, 'title': 'Correlation of gene x and gene y', 'summary': 'Explores the relationship between gene x and gene y, revealing a positive trend with a positive slope, indicating high values for gene x correspond to high values for gene y, and vice versa, and also delves into scenarios of negative slope and no trend between the genes.', 'chapters': [{'end': 346.237, 'start': 179.202, 'title': 'Correlation of gene x and gene y', 'summary': 'Explains the relationship between the values of gene x and gene y, showing a positive trend with a positive slope indicating that high values for gene x correspond to high values for gene y, and vice versa, and also discusses scenarios of negative slope and no trend between the genes.', 'duration': 167.035, 'highlights': ['The positive slope of the line representing the relationship between gene X and gene Y indicates that high values for 
gene X correspond to high values for gene Y, and vice versa.', 'The chapter also discusses scenarios of negative slope and no trend between gene X and gene Y, providing a comprehensive understanding of the relationship between the genes.', 'Explanation of how the data would look if there were no trend, positive or negative, between gene X and gene Y, providing a clear illustration of scenarios with no relationship between the genes.']}], 'duration': 167.035, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY179202.jpg', 'highlights': ['The positive slope of the line representing the relationship between gene X and gene Y indicates high correspondence.', 'The chapter discusses scenarios of negative slope and no trend between gene X and gene Y.', 'Explanation of data with no trend, positive or negative, between gene X and gene Y.']}, {'end': 472.431, 'segs': [{'end': 472.431, 'src': 'embed', 'start': 382.326, 'weight': 0, 'content': [{'end': 386.069, 'text': 'Okay, we just covered the main idea behind covariance.', 'start': 382.326, 'duration': 3.743}, {'end': 389.452, 'text': "It's so important that I'm going to repeat it.", 'start': 387.09, 'duration': 2.362}, {'end': 394.837, 'text': 'Covariance can classify these three types of relationships.', 'start': 391.034, 'duration': 3.803}, {'end': 399.186, 'text': 'One, relationships with positive trends.', 'start': 396.244, 'duration': 2.942}, {'end': 403.328, 'text': 'Two, relationships with negative trends.', 'start': 400.527, 'duration': 2.801}, {'end': 409.552, 'text': 'And three, times when there is no relationship because there is no trend.', 'start': 404.809, 'duration': 4.743}, {'end': 416.856, 'text': 'Bam! 
The other main idea behind covariance is kind of a bummer.', 'start': 410.933, 'duration': 5.923}, {'end': 421.859, 'text': 'Covariance, in and of itself, is not very interesting.', 'start': 418.137, 'duration': 3.722}, {'end': 427.876, 'text': 'What I mean by this is that you will never calculate covariance and be done for the day.', 'start': 423.032, 'duration': 4.844}, {'end': 435.963, 'text': 'Instead, covariance is a computational stepping stone to something that is interesting, like correlation.', 'start': 429.317, 'duration': 6.646}, {'end': 443.009, 'text': 'Because I like repeating myself, let me repeat the second main idea behind covariance.', 'start': 437.924, 'duration': 5.085}, {'end': 450.295, 'text': 'Covariance is a computational stepping stone to something that is interesting, like correlation.', 'start': 444.45, 'duration': 5.845}, {'end': 454.806, 'text': "So let's talk about how covariance is calculated.", 'start': 452.065, 'duration': 2.741}, {'end': 459.987, 'text': 'Covariance is calculated with a slightly nasty looking thing.', 'start': 456.486, 'duration': 3.501}, {'end': 465.289, 'text': 'To get an intuitive sense for how covariance is calculated,', 'start': 461.568, 'duration': 3.721}, {'end': 472.431, 'text': "let's go back to the mean value for gene X and extend the green line to the top of the graph.", 'start': 465.289, 'duration': 7.142}], 'summary': "Covariance classifies relationships into positive, negative, or no trend. 
it's a stepping stone to correlation.", 'duration': 90.105, 'max_score': 382.326, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY382326.jpg'}], 'start': 346.237, 'title': 'Understanding covariance', 'summary': 'Explains the main idea behind covariance, emphasizing its role in classifying relationships and as a computational stepping stone to correlation.', 'chapters': [{'end': 472.431, 'start': 346.237, 'title': 'Understanding covariance and its importance', 'summary': 'Explains the main idea behind covariance, which is its ability to classify relationships into positive trends, negative trends, and no relationship, and highlights its role as a computational stepping stone to correlation.', 'duration': 126.194, 'highlights': ['Covariance can classify relationships into positive trends, negative trends, and no relationship, serving as a computational stepping stone to correlation, making it an important concept. (relevance score: 5)', 'Covariance is a computational stepping stone to something interesting like correlation, implying its significance in statistical analysis. (relevance score: 4)', 'Covariance is not very interesting in itself and is a computational stepping stone to something more interesting like correlation, emphasizing its role as a precursor to correlation. (relevance score: 3)', 'The main idea behind covariance is its ability to classify relationships into positive trends, negative trends, and no relationship, making it a crucial concept in data analysis. (relevance score: 2)', 'Covariance is calculated with a slightly nasty looking thing and is related to the mean value for gene X, providing insight into the computational aspect of covariance. 
(relevance score: 1)']}], 'duration': 126.194, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY346237.jpg', 'highlights': ['Covariance can classify relationships into positive trends, negative trends, and no relationship, serving as a computational stepping stone to correlation, making it an important concept.', 'Covariance is a computational stepping stone to something interesting like correlation, implying its significance in statistical analysis.', 'Covariance is not very interesting in itself and is a computational stepping stone to something more interesting like correlation, emphasizing its role as a precursor to correlation.', 'The main idea behind covariance is its ability to classify relationships into positive trends, negative trends, and no relationship, making it a crucial concept in data analysis.', 'Covariance is calculated with a slightly nasty looking thing and is related to the mean value for gene X, providing insight into the computational aspect of covariance.']}, {'end': 632.12, 'segs': [{'end': 632.12, 'src': 'embed', 'start': 565.813, 'weight': 0, 'content': [{'end': 571.619, 'text': 'So we plug in the values for gene X, and that gives us a negative difference.', 'start': 565.813, 'duration': 5.806}, {'end': 579.545, 'text': 'Now we plug in the values for gene Y, and that gives us another negative difference.', 'start': 573.24, 'duration': 6.305}, {'end': 588.412, 'text': 'Again, since both differences are negative, multiplying them together gives us a positive value.', 'start': 581.306, 'duration': 7.106}, {'end': 598.971, 'text': 'So we see that when the values for gene X and gene Y are both less than their respective averages, we end up with positive values.', 'start': 590.006, 'duration': 8.965}, {'end': 611.978, 'text': 'Bam! 
The remaining three cells are to the right of the solid green line, so we see that they are all greater than the average value for gene X.', 'start': 600.051, 'duration': 11.927}, {'end': 619.322, 'text': 'And they are all above the solid red line, so we see that they are also greater than the average value for gene Y.', 'start': 611.978, 'duration': 7.344}, {'end': 629.359, 'text': 'Thus, when we plug in the values and do the math, we end up with positive values.', 'start': 621.015, 'duration': 8.344}, {'end': 632.12, 'text': "Doing the math, I'm doing the math.", 'start': 630.479, 'duration': 1.641}], 'summary': 'When gene x and gene y values are below average, result is positive.', 'duration': 66.307, 'max_score': 565.813, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY565813.jpg'}], 'start': 474.391, 'title': 'Analyzing gene expression data', 'summary': 'Explains how to analyze gene expression data by comparing gene measurements to their respective means, resulting in positive values when both genes are less than their averages and vice versa, using specific examples and calculations.', 'chapters': [{'end': 632.12, 'start': 474.391, 'title': 'Analyzing gene expression data', 'summary': 'Explains how to analyze gene expression data by comparing gene measurements to their respective means, resulting in positive values when both genes are less than their averages and vice versa, using specific examples and calculations.', 'duration': 157.729, 'highlights': ['The chapter emphasizes that when gene X and gene Y values are both less than their respective averages, the resulting product is positive, illustrated through examples and calculations.', 'By analyzing specific data points, it is shown that when both gene X and gene Y values are less than their means, the resulting differences are negative, and when multiplied, yield a positive value.', 'The chapter demonstrates that when gene X and gene Y measurements 
are greater than their respective averages, the resulting values are positive, supported by specific examples and calculations.']}], 'duration': 157.729, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY474391.jpg', 'highlights': ['The chapter demonstrates that when gene X and gene Y measurements are greater than their respective averages, the resulting values are positive, supported by specific examples and calculations.', 'The chapter emphasizes that when gene X and gene Y values are both less than their respective averages, the resulting product is positive, illustrated through examples and calculations.', 'By analyzing specific data points, it is shown that when both gene X and gene Y values are less than their means, the resulting differences are negative, and when multiplied, yield a positive value.']}, {'end': 745.598, 'segs': [{'end': 697.563, 'src': 'embed', 'start': 666.285, 'weight': 0, 'content': [{'end': 673.402, 'text': 'And then we divide by the number of measurements, n, which in this case is 5, minus 1.', 'start': 666.285, 'duration': 7.117}, {'end': 677.446, 'text': 'And ultimately, we end up with a covariance equal to 116.', 'start': 673.402, 'duration': 4.044}, {'end': 689.817, 'text': 'Since the covariance value, 116, is positive, it means that the slope of the relationship between gene X and gene Y is positive.', 'start': 677.446, 'duration': 12.371}, {'end': 697.563, 'text': 'In other words, when the covariance value is positive, we classify the trend as positive.', 'start': 691.198, 'duration': 6.365}], 'summary': 'Covariance of 116 indicates positive relationship between gene x and gene y.', 'duration': 31.278, 'max_score': 666.285, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY666285.jpg'}, {'end': 745.598, 'src': 'embed', 'start': 723.141, 'weight': 3, 'content': [{'end': 732.848, 'text': "More importantly, the covariance 
value doesn't tell us if the points are relatively close to the dotted line or relatively far from the dotted line.", 'start': 723.141, 'duration': 9.707}, {'end': 738.713, 'text': 'Again, it just tells us that the slope of the relationship is positive.', 'start': 734.109, 'duration': 4.604}, {'end': 745.598, 'text': "We'll talk about why the covariance value is so hard to interpret later.", 'start': 741.315, 'duration': 4.283}], 'summary': 'Covariance indicates positive slope, not proximity to line.', 'duration': 22.457, 'max_score': 723.141, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY723141.jpg'}], 'start': 632.4, 'title': 'Covariance calculation and interpretation', 'summary': 'Explains the process of calculating covariance, resulting in a positive value of 116, indicating a positive trend between gene X and gene Y, and discusses the limitations of interpreting covariance values.', 'chapters': [{'end': 745.598, 'start': 632.4, 'title': 'Covariance calculation and interpretation', 'summary': 'Explains the process of calculating covariance, arriving at a positive covariance value of 116, and the significance of a positive covariance value in indicating a positive trend between gene X and gene Y, while also highlighting the limitations of interpreting covariance values.', 'duration': 113.198, 'highlights': ['The covariance value of 116 signifies a positive slope in the relationship between gene X and gene Y, indicating a positive trend (116).', 'The positive covariance value contributes positive values to the total covariance in the context of the data (positive values).', 'The number of measurements, n, in this case, is 5, resulting in a covariance of 116 (n=5, covariance=116).', "The covariance value does not provide information about the steepness of the relationship's slope or the proximity of the data points to the line (interpretation limitations).", "The chapter emphasizes the difficulty in 
interpreting the covariance value, highlighting its limitations in depicting the relationship's characteristics (interpretation challenges)."]}], 'duration': 113.198, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY632400.jpg', 'highlights': ['The covariance value of 116 signifies a positive slope in the relationship between gene X and gene Y, indicating a positive trend (116)', 'The number of measurements, n, in this case, is 5, resulting in a covariance of 116 (n=5, covariance=116)', 'The positive covariance value contributes positive values to the total covariance in the context of the data (positive values)', "The covariance value does not provide information about the steepness of the relationship's slope or the proximity of the data points to the line (interpretation limitations)", "The chapter emphasizes the difficulty in interpreting the covariance value, highlighting its limitations in depicting the relationship's characteristics (interpretation challenges)"]}, {'end': 1102.064, 'segs': [{'end': 782.064, 'src': 'embed', 'start': 747.34, 'weight': 0, 'content': [{'end': 754.946, 'text': 'And remember, even though covariance is hard to interpret, it is a computational stepping stone to more interesting things.', 'start': 747.34, 'duration': 7.606}, {'end': 762.679, 'text': "Bam! 
Now let's imagine we got different values for gene Y.", 'start': 756.087, 'duration': 6.592}, {'end': 768.141, 'text': 'Just like before, we can graph the data using pairs of X and Y-axis values.', 'start': 762.679, 'duration': 5.462}, {'end': 776.723, 'text': 'The mean value for gene X is the same as before, 17.6.', 'start': 770.201, 'duration': 6.522}, {'end': 782.064, 'text': 'And the mean value for gene Y is 20.2.', 'start': 776.723, 'duration': 5.341}], 'summary': "Covariance is a computational stepping stone, with gene x's mean at 17.6 and gene y's at 20.2.", 'duration': 34.724, 'max_score': 747.34, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY747340.jpg'}, {'end': 880.894, 'src': 'embed', 'start': 843.47, 'weight': 1, 'content': [{'end': 848.933, 'text': 'In summary, data in these quadrants contribute negative values to the covariance.', 'start': 843.47, 'duration': 5.463}, {'end': 861.099, 'text': 'Now we add up each term, and then we divide by the number of measurements n, which is 5 minus 1.', 'start': 850.914, 'duration': 10.185}, {'end': 864.761, 'text': 'And ultimately, we end up with a covariance equal to negative 105.15.', 'start': 861.099, 'duration': 3.662}, {'end': 880.894, 'text': 'Since the covariance value negative 105.15 is negative, it means that the slope of the relationship between gene X and gene Y is negative.', 'start': 864.761, 'duration': 16.133}], 'summary': 'Covariance is negative, -105.15, indicating a negative slope in gene x and gene y relationship.', 'duration': 37.424, 'max_score': 843.47, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY843470.jpg'}, {'end': 1002.549, 'src': 'heatmap', 'start': 954.741, 'weight': 4, 'content': [{'end': 960.223, 'text': 'Doing the rest of the math gives us zero in the numerator, and the whole thing equals zero.', 'start': 954.741, 'duration': 5.482}, {'end': 970.207, 'text': 
'Likewise, when every value for gene Y corresponds to the same value for gene X, the covariance equals zero.', 'start': 962.024, 'duration': 8.183}, {'end': 983.812, 'text': 'In this last case we can see that even though there are multiple values for gene X and Y, there is still no trend, because as gene X increases,', 'start': 973.508, 'duration': 10.304}, {'end': 987.212, 'text': 'gene Y increases and decreases.', 'start': 984.369, 'duration': 2.843}, {'end': 995.401, 'text': 'In other words, the negative value for this point is canceled out by this positive point.', 'start': 988.634, 'duration': 6.767}, {'end': 1002.549, 'text': 'And this positive point is canceled out by this negative point.', 'start': 997.343, 'duration': 5.206}], 'summary': 'Covariance between gene x and y is zero when their values correspond and no trend is observed.', 'duration': 32.471, 'max_score': 954.741, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY954741.jpg'}, {'end': 1032.934, 'src': 'heatmap', 'start': 1008.587, 'weight': 0.828, 'content': [{'end': 1017.39, 'text': 'So we see that the covariance equals zero when there is no relationship between gene X and gene Y.', 'start': 1008.587, 'duration': 8.803}, {'end': 1024.592, 'text': "Double bam! 
Now let's talk about why the covariance value is hard to interpret.", 'start': 1017.39, 'duration': 7.202}, {'end': 1032.934, 'text': "To see why the covariance value is difficult to interpret, let's go all the way back to looking at just gene X.", 'start': 1025.872, 'duration': 7.062}], 'summary': 'Covariance equals zero when no relationship between gene x and gene y.', 'duration': 24.347, 'max_score': 1008.587, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY1008587.jpg'}, {'end': 1059.979, 'src': 'embed', 'start': 1025.872, 'weight': 5, 'content': [{'end': 1032.934, 'text': "To see why the covariance value is difficult to interpret, let's go all the way back to looking at just gene X.", 'start': 1025.872, 'duration': 7.062}, {'end': 1038.911, 'text': 'and calculate the covariance between gene X and itself.', 'start': 1034.969, 'duration': 3.942}, {'end': 1043.152, 'text': 'Just like before, we can plot the data.', 'start': 1040.81, 'duration': 2.342}, {'end': 1051.756, 'text': 'And the mean value for gene X is the same as before, 17.6.', 'start': 1044.953, 'duration': 6.803}, {'end': 1055.697, 'text': 'And we use 17.6 on the y-axis as well.', 'start': 1051.756, 'duration': 3.941}, {'end': 1059.979, 'text': 'Now we are ready to calculate the covariance.', 'start': 1057.558, 'duration': 2.421}], 'summary': 'Exploring the difficulty of interpreting covariance, finding mean value for gene x is 17.6.', 'duration': 34.107, 'max_score': 1025.872, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY1025872.jpg'}], 'start': 747.34, 'title': 'Covariance in gene data analysis', 'summary': 'Delves into the computational significance of covariance and visualization of gene data analysis, with mean values of gene x as 17.6 and gene y as 20.2. 
it explains the determination of the relationship between the two genes through resulting negative numbers and discusses the interpretation of covariance, yielding a negative value of -105.15 for gene x and gene y.', 'chapters': [{'end': 841.865, 'start': 747.34, 'title': 'Covariance and gene data analysis', 'summary': 'Discusses the concept of covariance, its computational significance, and the visualization of gene data analysis, with mean values of gene x as 17.6 and gene y as 20.2, showing how the relationship between the two genes is determined through the resulting negative numbers.', 'duration': 94.525, 'highlights': ['The mean value for gene Y is 20.2, while for gene X, it is 17.6.', 'Explaining the visualization of data points and the determination of their relationship through resulting negative numbers.', 'Emphasizing the computational significance of covariance as a stepping stone to more interesting concepts.']}, {'end': 1102.064, 'start': 843.47, 'title': 'Covariance and its interpretation in gene analysis', 'summary': 'Discusses the calculation of covariance for gene x and gene y, yielding a negative value of -105.15, indicating a negative slope in the relationship. 
It also explains how covariance equals zero when there is no relationship between gene X and gene Y, and delves into the challenges of interpreting covariance values.', 'duration': 258.594, 'highlights': ['The calculation of covariance for gene X and gene Y results in a negative value of -105.15, indicating a negative slope in the relationship.', 'The explanation of how covariance equals zero when there is no relationship between gene X and gene Y, as seen when every value for gene X corresponds to the same value for gene Y.', 'The discussion on the challenges of interpreting covariance values, particularly when calculating the covariance for gene X with itself and its connection to estimating variance.']}], 'duration': 354.724, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY747340.jpg', 'highlights': ['The mean value for gene Y is 20.2, while for gene X, it is 17.6.', 'The calculation of covariance for gene X and gene Y results in a negative value of -105.15, indicating a negative slope in the relationship.', 'Explaining the visualization of data points and the determination of their relationship through resulting negative numbers.', 'Emphasizing the computational significance of covariance as a stepping stone to more interesting concepts.', 'The explanation of how covariance equals zero when there is no relationship between gene X and gene Y, as seen when every value for gene X corresponds to the same value for gene Y.', 'The discussion on the challenges of interpreting covariance values, particularly when calculating the covariance for gene X with itself and its connection to estimating variance.']}, {'end': 1341.989, 'segs': [{'end': 1277.469, 'src': 'embed', 'start': 1102.064, 'weight': 0, 'content': [{'end': 1107.387, 'text': 'Now, when we do the math, we get 102.', 'start': 1102.064, 'duration': 5.323}, {'end': 1115.012, 'text': 'Since the covariance value is positive, we know that the relationship 
between gene X and itself has a positive slope.', 'start': 1107.387, 'duration': 7.625}, {'end': 1120.375, 'text': "So let's move the graph and the covariance value over here.", 'start': 1116.993, 'duration': 3.382}, {'end': 1126.298, 'text': 'and see what happens when we multiply the data by 2.', 'start': 1121.755, 'duration': 4.543}, {'end': 1131.821, 'text': 'Now, the X and Y axis labels on the right are twice what they are on the left.', 'start': 1126.298, 'duration': 5.523}, {'end': 1136.644, 'text': 'And the new mean values are twice what they were before.', 'start': 1133.402, 'duration': 3.242}, {'end': 1141.807, 'text': 'But the relative positions of the data did not change.', 'start': 1138.405, 'duration': 3.402}, {'end': 1147.71, 'text': 'And each dot still falls on the same straight line with positive slope.', 'start': 1143.247, 'duration': 4.463}, {'end': 1153.796, 'text': 'In other words, the only thing that changed was the scale that the data is on.', 'start': 1149.232, 'duration': 4.564}, {'end': 1163.585, 'text': 'However, when we do the math, we get covariance equals 408, which is 4 times what we got before.', 'start': 1155.037, 'duration': 8.548}, {'end': 1170.912, 'text': 'Thus, we see that the covariance value changes even when the relationship does not.', 'start': 1165.347, 'duration': 5.565}, {'end': 1179.877, 'text': 'In other words, covariance values are sensitive to the scale of the data, and this makes them difficult to interpret.', 'start': 1172.992, 'duration': 6.885}, {'end': 1192.386, 'text': 'This sensitivity to scale also prevents the covariance value from telling us if the data are close to the dotted line that represents the relationship or far from it.', 'start': 1181.598, 'duration': 10.788}, {'end': 1202.219, 'text': 'In this example, the covariance on the left, when each point is on the dotted line, is 102.', 'start': 1194.147, 'duration': 8.072}, {'end': 1210.245, 'text': 'And the covariance on the right, when the data are 
relatively far from the dotted line, is 381.', 'start': 1202.219, 'duration': 8.026}, {'end': 1215.769, 'text': 'So, in this case, when the data are far from the line, the covariance is larger.', 'start': 1210.245, 'duration': 5.524}, {'end': 1220.753, 'text': "Now let's just change the scale on the right hand side.", 'start': 1217.651, 'duration': 3.102}, {'end': 1224.476, 'text': 'And recalculate the covariance.', 'start': 1222.414, 'duration': 2.062}, {'end': 1229.92, 'text': 'And now the covariance is less for the data that does not fall on the line.', 'start': 1225.717, 'duration': 4.203}, {'end': 1240.235, 'text': "If you're thinking, I sure wish there was something to describe relationships that wasn't sensitive to the scale of the data, then you're in luck.", 'start': 1232.088, 'duration': 8.147}, {'end': 1245.158, 'text': 'Calculating covariance is the first step in calculating correlation.', 'start': 1241.275, 'duration': 3.883}, {'end': 1250.603, 'text': 'Correlation describes relationships and is not sensitive to the scale of the data.', 'start': 1245.939, 'duration': 4.664}, {'end': 1255.867, 'text': "And we'll talk about correlation more in the next video in this series.", 'start': 1251.804, 'duration': 4.063}, {'end': 1263.9, 'text': "It's also worth mentioning that covariance values are used as stepping stones in a wide variety of analyses.", 'start': 1257.495, 'duration': 6.405}, {'end': 1271.705, 'text': 'For example, covariance values were used for principal component analysis PCA,', 'start': 1265.581, 'duration': 6.124}, {'end': 1277.469, 'text': 'and are still used in other settings as computational stepping stones to other more interesting things.', 'start': 1271.705, 'duration': 5.764}], 'summary': 'Covariance values change with scale but correlation is not sensitive to scale.', 'duration': 175.405, 'max_score': 1102.064, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY1102064.jpg'}, 
{'end': 1341.989, 'src': 'embed', 'start': 1335.784, 'weight': 9, 'content': [{'end': 1337.966, 'text': 'The links to do this are in the description below.', 'start': 1335.784, 'duration': 2.182}, {'end': 1341.989, 'text': 'Alright, until next time, quest on!', 'start': 1338.867, 'duration': 3.122}], 'summary': 'Links for further action in the description below. Quest on!', 'duration': 6.205, 'max_score': 1335.784, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY1335784.jpg'}], 'start': 1102.064, 'title': 'Covariance influence', 'summary': 'Discusses the impact of scaling on covariance, showing that doubling the data scale results in a fourfold increase in covariance, and emphasizes the sensitivity of covariance to data scale and its role as a foundation for correlation analysis. It also concludes with a call to action for support and subscription to StatQuest.', 'chapters': [{'end': 1163.585, 'start': 1102.064, 'title': 'Covariance and scaling in data analysis', 'summary': "Explains the impact of scaling data on covariance, demonstrating that while scaling the data by 2 results in the axis labels and mean values also doubling, the covariance value increases by a factor of 4, emphasizing the scale's influence on covariance.", 'duration': 61.521, 'highlights': ['When scaling the data by 2, the covariance value increases by a factor of 4, reaching 408, indicating the significant impact of scaling on covariance.', 'The relationship between gene X and itself has a positive slope, as indicated by the positive covariance value of 102, showcasing the nature of the relationship between the variables.', 'The relative positions of the data remain unchanged when scaling by 2, with the dots still falling on the same straight line with a positive slope, emphasizing the impact of scaling solely on the data scale and not its relative positions.']}, {'end': 1311.775, 'start': 1165.347, 'title': 'Understanding covariance in 
data analysis', 'summary': 'Explains the sensitivity of covariance values to data scale, illustrates the impact on covariance when data are close or far from the line, and highlights the role of covariance as a stepping stone for correlation and other analyses.', 'duration': 146.428, 'highlights': ['The covariance on the right, when the data are relatively far from the dotted line, is 381, showing that, in this case, the covariance is larger when the data are far from the line.', 'The covariance is less for the data that does not fall on the line when the scale is changed, indicating the impact of data scale on covariance.', 'Covariance is the first step in calculating correlation, which describes relationships and is not sensitive to the scale of the data, providing a more reliable measure.', 'Covariance values are used as stepping stones in various analyses, such as principal component analysis (PCA), demonstrating their significance in data analysis.']}, {'end': 1341.989, 'start': 1312.276, 'title': 'StatQuest end', 'summary': 'Covers the conclusion of StatQuest with a call to action for support and subscription, emphasizing the importance of correlation calculations and computational settings.', 'duration': 29.713, 'highlights': ['The conclusion emphasizes the importance of correlation calculations and computational settings, urging support and subscription to StatQuest.', 'The transcript mentions the availability of original songs, t-shirts, hoodies, and the option to donate money to support StatQuest.', 'The chapter ends with an encouraging message for the viewers to continue on their quest.']}], 'duration': 239.925, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/qtaqvPAeEJY/pics/qtaqvPAeEJY1102064.jpg', 'highlights': ['Doubling data scale results in fourfold increase in covariance, reaching 408', 'Covariance is less for data not falling on the line when scale is changed', 'Covariance values are used in principal component analysis (PCA)', 
'Covariance is the first step in calculating correlation, which is not sensitive to the scale of the data', 'Positive covariance value of 102 showcases the nature of the relationship between variables', 'Covariance on the right, when data are far from the line, is 381', 'Scaling data by 2 results in unchanged relative positions of the data', 'Covariance is sensitive to data scale and serves as a foundation for correlation analysis', 'Conclusion emphasizes importance of correlation calculations and computational settings', 'Encouraging message for viewers to continue on their quest']}], 'highlights': ['Covariance is a computational stepping stone to correlation, implying its significance in statistical analysis.', 'Covariance can classify relationships into positive trends, negative trends, and no relationship, serving as a computational stepping stone to correlation, making it an important concept.', 'The chapter provides an introduction to covariance and correlation, emphasizing the estimation of mean and variance for two different genes measured in the same cells or grocery stores.', 'The positive slope of the line representing the relationship between gene X and gene Y indicates a positive trend.', 'The chapter demonstrates that when gene X and gene Y measurements are greater than their respective averages, the resulting values are positive, supported by specific examples and calculations.', 'Doubling data scale results in fourfold increase in covariance, reaching 408', 'Covariance is calculated with a slightly nasty looking formula that is built around the mean values for gene X and gene Y, providing insight into the computational aspect of covariance.', "The chapter emphasizes the difficulty in interpreting the covariance value, highlighting its limitations in depicting the relationship's characteristics", 'Covariance is not very interesting in itself and is a computational stepping stone to something more interesting like correlation, emphasizing its role as a 
precursor to correlation.', 'The chapter highlights the significance of covariance as a way to understand if the measurements taken in pairs provide additional insights compared to individual measurements.']}