Coursnap

title
False Discovery Rates, FDR, clearly explained

description
One of the best ways to prevent p-hacking is to adjust p-values for multiple testing. This StatQuest explains how the Benjamini-Hochberg method corrects for multiple-testing and FDR. For a complete index of all the StatQuest videos, check out: https://statquest.org/video-index/

detail
{'title': 'False Discovery Rates, FDR, clearly explained', 'heatmap': [{'end': 681.648, 'start': 628.945, 'weight': 0.735}, {'end': 753.969, 'start': 693.163, 'weight': 0.715}, {'end': 1030.311, 'start': 1016.945, 'weight': 0.707}, {'end': 1102.292, 'start': 1083.995, 'weight': 0.794}], 'summary': 'Explains the concept of false discovery rates (FDR) in genetics research, showing that comparing 10,000 genes between samples from the same distribution yields about 500 false positives because roughly 5% of such tests give a p-value less than 0.05, and discusses the Benjamini-Hochberg method to limit false positives.', 'chapters': [{'end': 55.036, 'segs': [{'end': 55.036, 'src': 'embed', 'start': 1.121, 'weight': 0, 'content': [{'end': 9.447, 'text': "Holy freaking smokes! It's time for StatQuest! Hello and welcome to StatQuest.", 'start': 1.121, 'duration': 8.326}, {'end': 16.872, 'text': 'StatQuest is brought to you by the friendly folks in the genetics department at the University of North Carolina at Chapel Hill.', 'start': 10.187, 'duration': 6.685}, {'end': 21.715, 'text': "Today we're going to be talking about false discovery rates or FDR.", 'start': 17.753, 'duration': 3.962}, {'end': 32.916, 'text': "If you've ever seen or done anything with high throughput sequencing, chances are you've heard of false discovery rates, FDR, before.", 'start': 24.217, 'duration': 8.699}, {'end': 35.298, 'text': 'You may have even used them.', 'start': 33.857, 'duration': 1.441}, {'end': 40.903, 'text': 'But where do false discovery rates come from? 
And how do they work?', 'start': 36.459, 'duration': 4.444}, {'end': 48.55, 'text': 'Before we get down to the nitty gritty, let me blurt out the main idea of this whole StatQuest.', 'start': 42.645, 'duration': 5.905}, {'end': 55.036, 'text': 'False discovery rates are a tool to weed out bad data that looks good.', 'start': 49.611, 'duration': 5.425}], 'summary': 'Statquest discusses false discovery rates to weed out bad data in high throughput sequencing.', 'duration': 53.915, 'max_score': 1.121, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo1121.jpg'}], 'start': 1.121, 'title': 'False discovery rates', 'summary': 'Introduces the concept of false discovery rates (fdr) as a tool to weed out bad data in high throughput sequencing, commonly used in genetics research at the university of north carolina at chapel hill.', 'chapters': [{'end': 55.036, 'start': 1.121, 'title': 'Statquest: understanding false discovery rates', 'summary': 'Introduces the concept of false discovery rates (fdr) as a tool to weed out bad data in high throughput sequencing, commonly used in genetics research at the university of north carolina at chapel hill.', 'duration': 53.915, 'highlights': ['False discovery rates (FDR) are a tool to weed out bad data that looks good in high throughput sequencing.', 'StatQuest is brought to you by the genetics department at the University of North Carolina at Chapel Hill.', 'The chapter discusses the origin and functionality of false discovery rates (FDR) in the context of high throughput sequencing.']}], 'duration': 53.915, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo1121.jpg', 'highlights': ['False discovery rates (FDR) are a tool to weed out bad data in high throughput sequencing.', 'The chapter discusses the origin and functionality of false discovery rates (FDR) in high throughput sequencing.', 'StatQuest is brought to you by the 
genetics department at the University of North Carolina at Chapel Hill.']}, {'end': 329.858, 'segs': [{'end': 111.664, 'src': 'embed', 'start': 56.388, 'weight': 0, 'content': [{'end': 58.749, 'text': "Now let's get down to the nitty-gritty.", 'start': 56.388, 'duration': 2.361}, {'end': 65.37, 'text': "Let's start with an example of measuring gene expression using RNA sequencing.", 'start': 60.289, 'duration': 5.081}, {'end': 75.252, 'text': "Here we're going to plot the measurements or the read counts for a gene called gene X, which is an imaginary gene, on a graph,", 'start': 66.55, 'duration': 8.702}, {'end': 79.613, 'text': 'with the y-axis being gene counts and the x-axis being the samples.', 'start': 75.252, 'duration': 4.361}, {'end': 85.414, 'text': 'For this example, imagine that we are looking at normal wild-type mice.', 'start': 80.813, 'duration': 4.601}, {'end': 90.712, 'text': "Later on, we'll be comparing them to mice that have been treated with a drug.", 'start': 86.69, 'duration': 4.022}, {'end': 100.278, 'text': "Isn't it funny that normal mice are called wild type? 
If someone said I was a wild type, I don't think they would also think I was normal.", 'start': 91.993, 'duration': 8.285}, {'end': 106.981, 'text': "Anyway, here's our measurement for the first mouse that we do this RNA sequencing on.", 'start': 101.779, 'duration': 5.202}, {'end': 111.664, 'text': "And here's the measurement for the second mouse that we do the RNA sequencing on.", 'start': 108.042, 'duration': 3.622}], 'summary': 'Measuring gene expression using rna sequencing on normal wild-type mice and comparing them to mice treated with a drug.', 'duration': 55.276, 'max_score': 56.388, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo56388.jpg'}, {'end': 170.612, 'src': 'embed', 'start': 141.697, 'weight': 1, 'content': [{'end': 145.059, 'text': "Rarely, we'll get a value that is much larger than the mean.", 'start': 141.697, 'duration': 3.362}, {'end': 149.803, 'text': "and rarely we'll get a value that is much smaller than the mean.", 'start': 146.502, 'duration': 3.301}, {'end': 156.346, 'text': 'We can summarize the distribution of the measurements using this bell-shaped curve.', 'start': 151.224, 'duration': 5.122}, {'end': 162.569, 'text': 'Most of the measurements which are close to the mean will come from the middle of this curve.', 'start': 157.747, 'duration': 4.822}, {'end': 170.612, 'text': 'The rare measurement that is significantly larger than the average would come from the right side of the bell-shaped curve.', 'start': 163.929, 'duration': 6.683}], 'summary': 'Bell-shaped curve summarizes distribution of measurements.', 'duration': 28.915, 'max_score': 141.697, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo141697.jpg'}, {'end': 301.042, 'src': 'embed', 'start': 216.032, 'weight': 3, 'content': [{'end': 225.094, 'text': 'If we did a statistical test to compare sample number 1 to sample number 2, the p-value would be large greater 
than 0.05,', 'start': 216.032, 'duration': 9.062}, {'end': 227.014, 'text': 'because the two samples overlap.', 'start': 225.094, 'duration': 1.92}, {'end': 232.635, 'text': "Very rarely, we'll get two samples that do not overlap.", 'start': 228.674, 'duration': 3.961}, {'end': 236.674, 'text': 'When this happens, the p-value will be less than 0.05.', 'start': 233.612, 'duration': 3.062}, {'end': 249.905, 'text': 'This is called a false positive because the small p-value suggests that the samples are from two types of mice or two separate distributions,', 'start': 236.674, 'duration': 13.231}, {'end': 250.826, 'text': 'and this is false.', 'start': 249.905, 'duration': 0.921}, {'end': 261.194, 'text': "Normally, false positives are rare, unless you're a p-hacker, but that's another StatQuest, already on YouTube.", 'start': 252.567, 'duration': 8.627}, {'end': 265.371, 'text': 'Anyways, normally, false positives are rare.', 'start': 262.21, 'duration': 3.161}, {'end': 269.552, 'text': '95% of the time, the two samples will overlap.', 'start': 266.611, 'duration': 2.941}, {'end': 272.613, 'text': 'This will result in a p-value greater than 0.05.', 'start': 270.252, 'duration': 2.361}, {'end': 277.154, 'text': "5% of the time, they don't.", 'start': 272.613, 'duration': 4.541}, {'end': 282.276, 'text': 'This will result in a false positive with a p-value less than 0.05.', 'start': 278.175, 'duration': 4.101}, {'end': 287.658, 'text': 'But human and mouse cells have at least 10,000 transcribed genes.', 'start': 282.276, 'duration': 5.382}, {'end': 301.042, 'text': 'If we took two samples from the same type of mice and compared all 10,000 genes, well, 5% of 10,000 equals 500 false positives.', 'start': 289.977, 'duration': 11.065}], 'summary': 'Comparing two samples from the same type of mice across 10,000 genes yields about 5% false positives, roughly 500 genes.', 'duration': 85.01, 'max_score': 216.032, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo216032.jpg'}], 'start': 56.388, 'title': 'Gene expression measurement and p-value in gene comparison', 'summary': 'Illustrates RNA sequencing for gene expression measurement with gene X as an example, demonstrating measurement variability and consistency in normal wild-type mice. It also discusses p-values and false positives in gene comparison, emphasizing that comparing 10,000 genes between samples from the same type of mice yields about 500 false positives, because roughly 5% of such tests give a p-value less than 0.05.', 'chapters': [{'end': 214.272, 'start': 56.388, 'title': 'Gene expression measurement with rna sequencing', 'summary': 'Illustrates measuring gene expression using RNA sequencing with an example of gene X, showcasing the variability in measurements and the distribution of values, which can be summarized using a bell-shaped curve, and comparing samples from normal wild-type mice to demonstrate the consistency of measurements.', 'duration': 157.884, 'highlights': ['RNA sequencing measures gene expression using read counts and showcases the variability in measurements.', 'Demonstrates the concept of bell-shaped curve to summarize the distribution of measurements, with most values clustering around the mean.', 'Compares different sets of measurements from normal wild-type mice to showcase consistency in measurements.']}, {'end': 329.858, 'start': 216.032, 'title': 'P-value and false positives in gene comparison', 'summary': 'Discusses the concept of p-value and false positives in gene comparison, highlighting that 95% of the time two samples from the same distribution will overlap, resulting in a p-value greater than 0.05, while the remaining 5% of the time they will not, leading to about 500 false positives in the comparison of 10,000 genes.', 'duration': 113.826, 'highlights': ['Comparing all 10,000 genes between two samples from the same type of mice yields about 500 false positives, as 5% of 10,000 equals 500.', '95% of the time, the two samples will overlap, 
resulting in a p-value greater than 0.05; the other 5% of the time they do not, producing a false positive with a p-value less than 0.05.', 'A small p-value suggests that the samples are from two types of mice or two separate distributions; when they actually come from the same distribution this is a false positive, so about 500 genes would appear interesting when they are not.']}], 'duration': 273.47, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo56388.jpg', 'highlights': ['RNA sequencing measures gene expression using read counts and showcases the variability in measurements.', 'Demonstrates the concept of bell-shaped curve to summarize the distribution of measurements, with most values clustering around the mean.', 'Compares different sets of measurements from normal wild-type mice to showcase consistency in measurements.', 'Comparing all 10,000 genes between two samples from the same type of mice yields about 500 false positives, as 5% of 10,000 equals 500.', '95% of the time, the two samples will overlap, resulting in a p-value greater than 0.05; the other 5% of the time they do not, producing a false positive with a p-value less than 0.05.', 'A small p-value suggests that the samples are from two types of mice or two separate distributions; when they actually come from the same distribution this is a false positive, so about 500 genes would appear interesting when they are not.']}, {'end': 1106.076, 'segs': [{'end': 359.371, 'src': 'embed', 'start': 330.619, 'weight': 4, 'content': [{'end': 335.162, 'text': 'In particular, it is used for the Benjamini-Hochberg method.', 'start': 330.619, 'duration': 4.543}, {'end': 343.768, 'text': "Now there's a high probability that I just mispronounced Benjamini or Hochberg, and if I did, I apologize.", 'start': 336.102, 'duration': 7.666}, {'end': 352.204, 'text': "Before we talk about the details of the Benjamini-Hochberg method, let's review the concepts that it's based on.", 'start': 345.618, 'duration': 6.586}, {'end': 359.371, 'text': "We'll start by generating 10,000 p-values from samples taken from the same distribution.", 'start': 353.606, 'duration': 5.765}], 
'summary': 'Reviewing concepts and generating 10,000 p-values for the Benjamini-Hochberg method', 'duration': 28.752, 'max_score': 330.619, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo330619.jpg'}, {'end': 490.637, 'src': 'embed', 'start': 421.655, 'weight': 1, 'content': [{'end': 425.736, 'text': 'On the y-axis, we have the number of p-values in each bin.', 'start': 421.655, 'duration': 4.081}, {'end': 428.368, 'text': '510 p-values, or 5.1%, are less than 0.05.', 'start': 427.247, 'duration': 1.121}, {'end': 431.75, 'text': 'Close to 5% of the p-values are between 0.05 and 0.1.', 'start': 428.368, 'duration': 3.382}, {'end': 436.234, 'text': 'Actually, each bin contains about 5% of the p-values, about 500 p-values per bin.', 'start': 431.75, 'duration': 4.484}, {'end': 460.714, 'text': "Since the p-values are uniformly distributed, there's an equal probability that a test p-value falls into any one of these bins.", 'start': 436.254, 'duration': 24.46}, {'end': 468.216, 'text': "Now, let's look at how p-values are distributed when they come from two different distributions.", 'start': 462.375, 'duration': 5.841}, {'end': 479.979, 'text': 'And by two different distributions, I mean two different types of mice, or we have wild type versus knockout, or control versus drugged.', 'start': 469.817, 'duration': 10.162}, {'end': 482.96, 'text': "We're just comparing two different situations.", 'start': 480.619, 'duration': 2.341}, {'end': 490.637, 'text': 'Like before, we start off with test number one, but now we have two different distributions.', 'start': 484.335, 'duration': 6.302}], 'summary': '5.1% of p-values are less than 0.05, and each bin contains about 500 p-values.', 'duration': 68.982, 'max_score': 421.655, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo421655.jpg'}, {'end': 681.648, 'src': 'heatmap', 'start': 628.945, 'weight': 0.735, 'content': 
[{'end': 633.629, 'text': 'Since the samples come from different distributions, the p-values are skewed.', 'start': 628.945, 'duration': 4.684}, {'end': 639.414, 'text': 'The remaining 9, 000 active genes might not be affected by the drug.', 'start': 634.73, 'duration': 4.684}, {'end': 644.819, 'text': 'This means the measurements for most of the genes will come from the same distribution.', 'start': 640.395, 'duration': 4.424}, {'end': 650.184, 'text': 'The p-values for these genes should be uniformly distributed.', 'start': 646.34, 'duration': 3.844}, {'end': 659.187, 'text': 'The histogram of p-values we obtain from all 10, 000 genes is the sum of the two separate histograms.', 'start': 651.465, 'duration': 7.722}, {'end': 665.513, 'text': 'The uniformly distributed p-values come from the genes unaffected by the drug.', 'start': 660.589, 'duration': 4.924}, {'end': 673.48, 'text': 'The p-values on the left side are a mixture from genes affected by the drug and genes unaffected by the drug.', 'start': 666.594, 'duration': 6.886}, {'end': 681.648, 'text': 'By eye, we can see where the p-values are uniformly distributed and determine how many tests are in each bin.', 'start': 674.501, 'duration': 7.147}], 'summary': '10,000 gene study: 9,000 unaffected by drug, showing uniform p-value distribution.', 'duration': 52.703, 'max_score': 628.945, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo628945.jpg'}, {'end': 753.969, 'src': 'heatmap', 'start': 693.163, 'weight': 0.715, 'content': [{'end': 698.246, 'text': 'We can extend this line and use it as a cutoff to identify the true positives.', 'start': 693.163, 'duration': 5.083}, {'end': 705.071, 'text': "Since we usually use a cutoff of 0.05, we're going to focus on these p-values.", 'start': 699.567, 'duration': 5.504}, {'end': 708.453, 'text': 'Roughly 450 p-values less than 0.05 are above the dotted line.', 'start': 706.231, 'duration': 2.222}, {'end': 
715.636, 'text': 'and roughly 450 p-values less than 0.05', 'start': 712.955, 'duration': 2.681}, {'end': 718.416, 'text': 'are below the dotted line.', 'start': 715.636, 'duration': 2.78}, {'end': 729.998, 'text': 'One way to isolate the true positives from the false positives would be to only consider the smallest 450 p-values.', 'start': 719.736, 'duration': 10.262}, {'end': 739.598, 'text': 'This procedure works fairly well because the p-values within the bins are skewed for the genes affected by the drug', 'start': 730.839, 'duration': 8.759}, {'end': 744.061, 'text': '(note: this histogram is for p-values between 0 and 0.05)', 'start': 739.598, 'duration': 4.463}, {'end': 748.424, 'text': 'and spread evenly for the genes not affected by the drug.', 'start': 744.061, 'duration': 4.363}, {'end': 753.969, 'text': 'BAM! If you can understand these concepts,', 'start': 749.725, 'duration': 4.244}], 'summary': 'About 450 p-values less than 0.05 above and below the line help isolate true positives from false positives.', 'duration': 60.806, 'max_score': 693.163, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo693163.jpg'}, {'end': 753.969, 'src': 'embed', 'start': 719.736, 'weight': 3, 'content': [{'end': 729.998, 'text': 'One way to isolate the true positives from the false positives would be to only consider the smallest 450 p-values.', 'start': 719.736, 'duration': 10.262}, {'end': 739.598, 'text': 'This procedure works fairly well because the p-values within the bins are skewed for the genes affected by the drug', 'start': 730.839, 'duration': 8.759}, {'end': 744.061, 'text': '(note: this histogram is for p-values between 0 and 0.05)', 'start': 739.598, 'duration': 4.463}, {'end': 748.424, 'text': 'and spread evenly for the genes not affected by the drug.', 'start': 744.061, 'duration': 4.363}, {'end': 753.969, 'text': 'BAM! 
If you can understand these concepts,', 'start': 749.725, 'duration': 4.244}], 'summary': 'Isolate true positives by considering the smallest 450 p-values, which are skewed for the genes affected by the drug.', 'duration': 34.233, 'max_score': 719.736, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo719736.jpg'}, {'end': 809.439, 'src': 'embed', 'start': 779.514, 'weight': 0, 'content': [{'end': 787.476, 'text': 'The Benjamini-Hochberg method adjusts p-values in a way that limits the number of false positives that are reported as significant.', 'start': 779.514, 'duration': 7.962}, {'end': 791.694, 'text': 'Adjust p-values means that it makes them larger.', 'start': 788.553, 'duration': 3.141}, {'end': 802.317, 'text': 'For example, before the false discovery rate correction, your p-value might be 0.04, i.e. significant.', 'start': 792.954, 'duration': 9.363}, {'end': 809.439, 'text': 'After the FDR correction, your p-value might be 0.06, no longer significant.', 'start': 803.697, 'duration': 5.742}], 'summary': 'The Benjamini-Hochberg method limits false positives by adjusting p-values.', 'duration': 29.925, 'max_score': 779.514, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo779514.jpg'}, {'end': 1044.378, 'src': 'heatmap', 'start': 1016.945, 'weight': 0.707, 'content': [{'end': 1019.726, 'text': "And here I've plugged in the remaining adjusted p-values.", 'start': 1016.945, 'duration': 2.781}, {'end': 1025.289, 'text': 'That false positive p-value is no longer significant.', 'start': 1020.927, 'duration': 4.362}, {'end': 1030.311, 'text': "Hooray! 
Now let's look at a huge example.", 'start': 1026.089, 'duration': 4.222}, {'end': 1037.973, 'text': 'The blue boxes represent the p-values from when the samples came from two separate distributions.', 'start': 1031.909, 'duration': 6.064}, {'end': 1044.378, 'text': 'That is to say, these p-values are for genes that were affected by the drug.', 'start': 1039.355, 'duration': 5.023}], 'summary': 'After adjustment the false positive p-value is no longer significant; the blue boxes mark p-values for genes affected by the drug.', 'duration': 27.433, 'max_score': 1016.945, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo1016945.jpg'}, {'end': 1106.076, 'src': 'heatmap', 'start': 1083.995, 'weight': 0.794, 'content': [{'end': 1086.557, 'text': "Here, I've shown you the adjusted p-values.", 'start': 1083.995, 'duration': 2.562}, {'end': 1090.22, 'text': 'The false positives are now all greater than 0.05.', 'start': 1087.677, 'duration': 2.543}, {'end': 1095.425, 'text': 'But these true positives remain less than 0.05.', 'start': 1090.22, 'duration': 5.205}, {'end': 1102.292, 'text': 'Double bam! Hooray! We made it to the end.', 'start': 1095.425, 'duration': 6.867}, {'end': 1106.076, 'text': 'Tune in next time for another exciting StatQuest.', 'start': 1102.972, 'duration': 3.104}], 'summary': 'Adjusted p-values show false positives > 0.05 and true positives < 0.05.', 'duration': 22.081, 'max_score': 1083.995, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo1083995.jpg'}], 'start': 330.619, 'title': 'P-value analysis', 'summary': 'Discusses the benjamini-hochberg method, generating 10,000 p-values with about 5.1% less than 0.05, and p-value distribution from two different distributions. 
It also explains the p-value concept, p-value distributions, and how the Benjamini-Hochberg method is applied to limit false positives, along with insights into adjusting p-values and identifying true positives.', 'chapters': [{'end': 517.917, 'start': 330.619, 'title': 'Benjamini-hochberg method & p-value distribution', 'summary': 'Reviews the concepts behind the Benjamini-Hochberg method by generating 10,000 p-values from samples taken from the same distribution, with about 5.1% of the p-values being less than 0.05, and then looks at the distribution of p-values from two different distributions.', 'duration': 187.298, 'highlights': ['The review of the concepts behind the Benjamini-Hochberg method starts by generating 10,000 p-values from samples taken from the same distribution, with about 5.1% of the p-values being less than 0.05.', 'The p-values from samples taken from the same distribution are uniformly distributed, with each bin containing about 5% of the p-values, approximately 500 p-values per bin.', 'The chapter explains the distribution of p-values when they come from two different distributions, illustrating the differences when comparing two different situations, such as wild type versus knockout or control versus drugged.']}, {'end': 1106.076, 'start': 519.256, 'title': 'P-value analysis and benjamini-hochberg method', 'summary': 'Explains the concept of p-values, their distribution when samples come from different distributions, and the application of the Benjamini-Hochberg method to limit false positives, with insights into adjusting p-values and identifying true positives.', 'duration': 586.82, 'highlights': ['The Benjamini-Hochberg method adjusts p-values to limit false positives, making them larger and ensuring less than 5% of significant results are false positives.', 'The distribution of p-values varies when samples come from different distributions, with p-values skewed and closer to zero for different distributions, and uniformly distributed for samples from the same distribution.', 'Identifying true positives involves using a cutoff to isolate the smallest p-values, which works well due to the skewed 
nature of p-values for genes affected by the drug.']}], 'duration': 775.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/K8LQSvtjcEo/pics/K8LQSvtjcEo330619.jpg', 'highlights': ['The Benjamini-Hochberg method adjusts p-values to limit false positives, ensuring less than 5% of significant results are false positives.', 'The p-values from samples taken from the same distribution are uniformly distributed, with each bin containing about 5% of the p-values, approximately 500 p-values per bin.', 'The chapter explains the distribution of p-values when they come from two different distributions, illustrating the differences when comparing two different situations.', 'Identifying true positives involves using a cutoff to isolate the smallest p-values, which works well due to the skewed nature of p-values for genes affected by the drug.', 'The review of the concepts behind the Benjamini-Hochberg method starts by generating 10,000 p-values from samples taken from the same distribution, with about 5.1% of the p-values being less than 0.05.']}], 'highlights': ['The Benjamini-Hochberg method adjusts p-values to limit false positives, ensuring less than 5% of significant results are false positives.', 'Comparing all 10,000 genes between two samples from the same type of mice yields about 500 false positives, as 5% of 10,000 equals 500.', 'False discovery rates (FDR) are a tool to weed out bad data that looks good in high throughput sequencing.', 'RNA sequencing measures gene expression using read counts and showcases the variability in measurements.', 'A small p-value suggests that the samples are from two types of mice or two separate distributions; when they actually come from the same distribution this is a false positive, so about 500 genes would appear interesting when they are not.']}
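
The transcript describes the Benjamini-Hochberg method as adjusting p-values so that they become larger, which limits how many false positives are reported as significant. The sketch below is one common way to write that adjustment in plain Python; it is not code from the video. The function name `benjamini_hochberg` and the four example p-values are made up for illustration. It assumes the usual formulation: rank the p-values from smallest to largest, scale each one by (number of tests / rank), and, working down from the largest, never let an adjusted value exceed the one ranked just above it.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Hypothetical helper for illustration; assumes p_values is a
    non-empty list of floats between 0 and 1.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        # Keep the smaller of this candidate and the previous adjusted value,
        # so adjusted p-values never increase as the rank decreases.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]  # made-up example values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

With these made-up inputs, the raw p-value of 0.04 is adjusted to roughly 0.053, so it would no longer pass a 0.05 cutoff. That is the same kind of change the transcript illustrates with 0.04 becoming 0.06.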