Live Day 2 - Basic to Intermediate Statistics

Join the community session at https://ineuron.ai/course/Mega-Project-Foundation where all the materials will be uploaded.
Yes, everybody is enrolled in the community session, right? So just give me a confirmation. I will just remove this now and probably we can start today's topic. So yesterday, if you remember, we discussed all the basic things. Today we are going to move from basics to intermediate stats, specifically for data science. Okay, so this is what we are going to discuss, and there are many topics that I am probably going to cover today. We are going to cover measures of central tendency, then measures of dispersion, then we will probably start with the Gaussian distribution. Then, fourth, we are going to understand the Z-score. Then we are going to understand the standard normal distribution. And there are some more topics that we really need to cover. So many people have a problem with the background. intermediate stats for data science including measure of central tendency, measure of dispersion, gaussian distribution, and z score.', 'chapters': [{'end': 125.593, 'start': 32.366, 'title': 'Community session enrollment', 'summary': "Emphasizes the importance of enrolling in the community session and finding materials from the yesterday's video, along with emphasizing the readiness for the live session.", 'duration': 93.227, 'highlights': ['The importance of enrolling in the community session and accessing materials from the previous video is emphasized, ensuring everyone is ready for the live session.', 'Encouraging audience participation by asking for likes and confirmation of enrollment in the community session.', 'Emphasizing the availability of materials in the previous video and the importance of being prepared for the live session.']}, {'end': 259.875, 'start': 126.474, 'title': 'Intermediate stats for data science', 'summary': 'Covers moving from basics to intermediate stats for data science, including measure of central 
tendency, measure of dispersion, gaussian distribution, and z score.', 'duration': 133.401, 'highlights': ['The chapter covers measure of central tendency, measure of dispersion, Gaussian distribution, and Z score for data science.', 'The instructor plans to move from basics to intermediate stats, specifically for data science.', 'The instructor seeks feedback on the background and encourages engagement by asking for likes from the audience.']}], 'duration': 227.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU32366.jpg', 'highlights': ['The chapter covers measure of central tendency, measure of dispersion, Gaussian distribution, and Z score for data science.', 'The instructor plans to move from basics to intermediate stats, specifically for data science.', 'The importance of enrolling in the community session and accessing materials from the previous video is emphasized, ensuring everyone is ready for the live session.', 'Emphasizing the availability of materials in the previous video and the importance of being prepared for the live session.', 'Encouraging audience participation by asking for likes and confirmation of enrollment in the community session.', 'The instructor seeks feedback on the background and encourages engagement by asking for likes from the audience.']}, {'end': 643.813, 'segs': [{'end': 448.861, 'src': 'embed', 'start': 392.673, 'weight': 1, 'content': [{'end': 395.915, 'text': 'here specifically, we are talking about average.', 'start': 392.673, 'duration': 3.242}, {'end': 404.725, 'text': 'now, with population And with sample, we really need to understand the formulas of p.', 'start': 395.915, 'duration': 8.81}, {'end': 407.568, 'text': 'And we will try to understand in this specific way.', 'start': 404.725, 'duration': 2.843}, {'end': 410.552, 'text': 'Population is basically given by capital N.', 'start': 407.648, 'duration': 2.904}, {'end': 415.277, 'text': 'Sample is given by 
small n, which we have discussed in the last session.', 'start': 410.552, 'duration': 4.725}, {'end': 418.996, 'text': 'Now coming to the first thing.', 'start': 417.293, 'duration': 1.703}, {'end': 426.947, 'text': 'whenever we are probably discussing about mean, you need to remember that we are trying to find out the average of a specific distribution.', 'start': 418.996, 'duration': 7.951}, {'end': 437.537, 'text': "So let's say that my data sets look something like this three, three for 5, comma 5, comma 6.", 'start': 427.587, 'duration': 9.95}, {'end': 446.86, 'text': 'so if i really want to find out the mean of this population, okay, mean of this population i can basically give by a symbol which is mu.', 'start': 437.537, 'duration': 9.323}, {'end': 448.861, 'text': 'okay, this is mu.', 'start': 446.86, 'duration': 2.001}], 'summary': 'Understanding the average in population and sample using formulas of p, with a specific example of finding the mean of a given data set.', 'duration': 56.188, 'max_score': 392.673, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU392673.jpg'}, {'end': 538.242, 'src': 'embed', 'start': 510.525, 'weight': 0, 'content': [{'end': 514.126, 'text': 'now, if i try to find out the average, please have somebody help me out.', 'start': 510.525, 'duration': 3.601}, {'end': 524.532, 'text': '4, 7, 8, 9, 12, 16, 17, 22, 28, 28 by 10, which is nothing but 2.8.', 'start': 514.126, 'duration': 10.406}, {'end': 534.579, 'text': 'Okay so if I am considering x as my random variable, which is the population data set, if I really want to find out the mean, which is denoted by mu.', 'start': 524.532, 'duration': 10.047}, {'end': 538.242, 'text': 'so here you will be able to see that I am able to find out the average.', 'start': 534.579, 'duration': 3.663}], 'summary': 'Average of population data set (x) is 2.8.', 'duration': 27.717, 'max_score': 510.525, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU510525.jpg'}, {'end': 647.614, 'src': 'embed', 'start': 623.409, 'weight': 2, 'content': [{'end': 629.891, 'text': "because I want, because in the real world industry, when you are working, when you're explaining someone as a data scientist,", 'start': 623.409, 'duration': 6.482}, {'end': 632.691, 'text': 'you really need to use this well-known notation.', 'start': 629.891, 'duration': 2.8}, {'end': 637.292, 'text': 'You can use your own notation, whatever you like, but think of a larger point of view.', 'start': 632.731, 'duration': 4.561}, {'end': 643.813, 'text': 'Here, you really need to make sure that whatever standards is being followed, we need to try to follow in that specific way.', 'start': 637.772, 'duration': 6.041}, {'end': 647.614, 'text': 'Okay So this was the basic things with respect to mean.', 'start': 644.133, 'duration': 3.481}], 'summary': 'In the real world industry, data scientists need to use well-known notation and follow standards to ensure effective communication.', 'duration': 24.205, 'max_score': 623.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU623409.jpg'}], 'start': 259.875, 'title': 'Arithmetic mean and its calculation', 'summary': 'Covers the review and introduction to arithmetic mean, mean calculation with a dataset of 10 elements, and the importance of using well-known notations in the real-world industry, emphasizing the average of 3.2 in a specific example.', 'chapters': [{'end': 448.861, 'start': 259.875, 'title': 'Arithmetic mean in statistics', 'summary': 'Covers a review of previous session, introduction to arithmetic mean for population and sample, and the importance of understanding the formulas of p and the concept of average in statistics.', 'duration': 188.986, 'highlights': ['Introduction to arithmetic mean for population and sample The chapter starts with the 
introduction to arithmetic mean for population and sample as a key topic for discussion.', 'Review of previous session on statistics The speaker asks attendees to recall the topics from the previous session on statistics, indicating a focus on building on existing knowledge.', 'Emphasis on understanding the formulas of p and the concept of average in statistics The importance of understanding the formulas of p and the concept of average in statistics is emphasized, with the distinction between population and sample discussed.']}, {'end': 510.525, 'start': 448.861, 'title': 'Mean calculation and random variable x', 'summary': 'Covers the calculation of the mean using the formula mu = (summation of x of i) / n, using a dataset of 10 elements and a random variable x, emphasizing the iteration through the elements and the division by capital n.', 'duration': 61.664, 'highlights': ['The formula for calculating the mean, mu = (summation of X of i) / N, is explained using a dataset of 10 elements, providing a clear understanding of the process.', 'The concept of random variable X and its values in the dataset is introduced, emphasizing the iteration through the elements and the division by capital N for mean calculation.', 'The detailed breakdown of the dataset values and their iteration through the formula, 1 + 1 + 1 + 2 + 2 + 3 + 3 + 4 + 5 + 5 + 6, provides a practical demonstration of the mean calculation process.']}, {'end': 643.813, 'start': 510.525, 'title': 'Arithmetic mean and notations', 'summary': 'Covers the calculation of arithmetic mean with an example dataset, yielding an average of 3.2, and emphasizes the importance of using well-known notations in the real-world industry.', 'duration': 133.288, 'highlights': ['The average of the dataset 4, 7, 8, 9, 12, 16, 17, 22, 28, 28 is calculated to be 3.2, highlighting the application of arithmetic mean.', 'Emphasizes the significance of using well-known notations in the real-world industry, particularly in the 
context of working as a data scientist.', 'Discusses the use of notation x bar and the formula for sample mean, demonstrating the importance of proper notation in statistical calculations.']}], 'duration': 383.938, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU259875.jpg', 'highlights': ['The average of the dataset 4, 7, 8, 9, 12, 16, 17, 22, 28, 28 is calculated to be 3.2, highlighting the application of arithmetic mean.', 'The formula for calculating the mean, mu = (summation of X of i) / N, is explained using a dataset of 10 elements, providing a clear understanding of the process.', 'Emphasizes the significance of using well-known notations in the real-world industry, particularly in the context of working as a data scientist.', 'The detailed breakdown of the dataset values and their iteration through the formula, 1 + 1 + 1 + 2 + 2 + 3 + 3 + 4 + 5 + 5 + 6, provides a practical demonstration of the mean calculation process.', 'Emphasis on understanding the formulas of p and the concept of average in statistics The importance of understanding the formulas of p and the concept of average in statistics is emphasized, with the distinction between population and sample discussed.']}, {'end': 1004.67, 'segs': [{'end': 786.893, 'src': 'embed', 'start': 740.859, 'weight': 1, 'content': [{'end': 745.722, 'text': 'It refers to the measure used to determine the center of the distribution of the data.', 'start': 740.859, 'duration': 4.863}, {'end': 749.044, 'text': 'Okay So average and mean are one in the same guys.', 'start': 746.022, 'duration': 3.022}, {'end': 751.205, 'text': 'Understand average mean.', 'start': 749.544, 'duration': 1.661}, {'end': 754.928, 'text': 'Okay We use the same formula that is basically used.', 'start': 751.706, 'duration': 3.222}, {'end': 758.65, 'text': 'Okay So it actually refers to the measure.', 'start': 755.008, 'duration': 3.642}, {'end': 762.993, 'text': 'It is, it refers to the 
measure used to determine the center of the distribution of the data.', 'start': 759.17, 'duration': 3.823}, {'end': 770.835, 'text': 'Okay So, um, So this was the part with respect to central tendency.', 'start': 763.493, 'duration': 7.342}, {'end': 773.178, 'text': "Now let's go ahead and let's try to solve some problems.", 'start': 771.055, 'duration': 2.123}, {'end': 777.442, 'text': 'Obviously, I have given you a lot of examples with respect to mean.', 'start': 774.039, 'duration': 3.403}, {'end': 781.146, 'text': "But now let's go ahead and try to understand median.", 'start': 778.103, 'duration': 3.043}, {'end': 786.893, 'text': "And why do we specifically use median? So I'm going to take the same data set, whatever data set I have used over here.", 'start': 781.727, 'duration': 5.166}], 'summary': 'Transcript discusses measures of central tendency, focusing on mean and median.', 'duration': 46.034, 'max_score': 740.859, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU740859.jpg'}, {'end': 918.651, 'src': 'embed', 'start': 891.083, 'weight': 0, 'content': [{'end': 893.904, 'text': 'There is a huge difference with respect to this mean.', 'start': 891.083, 'duration': 2.821}, {'end': 896.585, 'text': 'And why it is basically added? 
Because of this number.', 'start': 894.284, 'duration': 2.301}, {'end': 899.406, 'text': 'We consider this number as outliers.', 'start': 897.065, 'duration': 2.341}, {'end': 903.154, 'text': 'Okay So we consider this numbers and outliers.', 'start': 900.447, 'duration': 2.707}, {'end': 909.088, 'text': 'Outliers really have a adverse impact on the entire distribution.', 'start': 903.735, 'duration': 5.353}, {'end': 913.21, 'text': 'Okay adverse impact on the entire distribution.', 'start': 909.71, 'duration': 3.5}, {'end': 918.651, 'text': 'So that is the reason why we specifically use, why we should be very much careful with outliers.', 'start': 913.23, 'duration': 5.421}], 'summary': 'Outliers have a huge adverse impact on the distribution.', 'duration': 27.568, 'max_score': 891.083, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU891083.jpg'}, {'end': 988.818, 'src': 'embed', 'start': 962.034, 'weight': 4, 'content': [{'end': 967.059, 'text': 'always understand, in median, the first thing that you really need to do is sort the numbers.', 'start': 962.034, 'duration': 5.025}, {'end': 969.281, 'text': 'so first step is sort the numbers.', 'start': 967.059, 'duration': 2.222}, {'end': 973.244, 'text': 'so over here you can see that the numbers are already sorted right.', 'start': 969.281, 'duration': 3.963}, {'end': 978.809, 'text': 'if your numbers is not sorted at that point of time, you will be able to see that.', 'start': 973.244, 'duration': 5.565}, {'end': 982.093, 'text': 'uh, you know, you probably have to sort it right now by default.', 'start': 978.809, 'duration': 3.284}, {'end': 985.235, 'text': 'i have made sure that the number is already sorted.', 'start': 982.093, 'duration': 3.142}, {'end': 988.818, 'text': 'okay, So do you define distribution in statistical term?', 'start': 985.235, 'duration': 3.583}], 'summary': 'Sorting numbers is the first step in understanding the median, as 
demonstrated with sorted numbers in the given example.', 'duration': 26.784, 'max_score': 962.034, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU962034.jpg'}], 'start': 644.133, 'title': 'Central tendency and outliers', 'summary': 'Discusses the concept of central tendency, focusing on mean, median, and mode, and emphasizes the impact of outliers on the distribution, showcasing a significant change in mean with the introduction of an outlier.', 'chapters': [{'end': 1004.67, 'start': 644.133, 'title': 'Central tendency and outliers', 'summary': 'Discusses the concept of central tendency, focusing on mean, median, and mode, and emphasizes the impact of outliers on the distribution, showcasing a significant change in mean with the introduction of an outlier.', 'duration': 360.537, 'highlights': ['Mean is a measure used to determine the center of the distribution of the data. Mean is defined as a measure used to determine the center of the distribution of the data, serving as one of the central measures of tendency.', 'The introduction of an outlier significantly impacted the mean value. The addition of an outlier (100) to the dataset resulted in a substantial change in the mean value from 3.2 to 12, highlighting the substantial impact of outliers on the distribution.', 'Outliers have a major adverse impact on the distribution of the central data. Outliers are noted to have a significant adverse impact on the entire distribution, emphasizing the need for caution and specialized techniques in dealing with outliers in statistics and data science.', 'Median is used to mitigate the impact of outliers on the central tendency. The use of median is advocated as a technique to mitigate the impact of outliers on the central tendency, providing a more robust measure in scenarios with significant outliers.', 'Sorting of numbers is a crucial step in calculating the median. 
The importance of sorting numbers as the initial step in calculating the median is emphasized, highlighting the essential procedure in determining the median of a dataset.']}], 'duration': 360.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU644133.jpg', 'highlights': ['The introduction of an outlier significantly impacted the mean value. The addition of an outlier (100) to the dataset resulted in a substantial change in the mean value from 3.2 to 12, highlighting the substantial impact of outliers on the distribution.', 'Mean is a measure used to determine the center of the distribution of the data. Mean is defined as a measure used to determine the center of the distribution of the data, serving as one of the central measures of tendency.', 'Median is used to mitigate the impact of outliers on the central tendency. The use of median is advocated as a technique to mitigate the impact of outliers on the central tendency, providing a more robust measure in scenarios with significant outliers.', 'Outliers have a major adverse impact on the distribution of the central data. Outliers are noted to have a significant adverse impact on the entire distribution, emphasizing the need for caution and specialized techniques in dealing with outliers in statistics and data science.', 'Sorting of numbers is a crucial step in calculating the median. 
The importance of sorting numbers as the initial step in calculating the median is emphasized, highlighting the essential procedure in determining the median of a dataset.']}, {'end': 1736.92, 'segs': [{'end': 1109.095, 'src': 'embed', 'start': 1082.224, 'weight': 2, 'content': [{'end': 1086.848, 'text': 'Now understand, even though the outlier is added see outlier basically means what?', 'start': 1082.224, 'duration': 4.624}, {'end': 1091.552, 'text': 'Outlier is a number which is completely different from the entire distribution.', 'start': 1087.448, 'duration': 4.104}, {'end': 1095.183, 'text': 'Okay? Completely different from the entire distribution.', 'start': 1092.361, 'duration': 2.822}, {'end': 1100.568, 'text': 'Over here, you can see that 100 is completely different from the entire distribution.', 'start': 1095.244, 'duration': 5.324}, {'end': 1109.095, 'text': "Now what if, now your question may rise that, okay Krish, what if I have one more number like this? Let's say that I have one more number 112.", 'start': 1101.249, 'duration': 7.846}], 'summary': 'An outlier is a number completely different from the distribution, e.g., 100 and 112.', 'duration': 26.871, 'max_score': 1082.224, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU1082224.jpg'}, {'end': 1253.726, 'src': 'embed', 'start': 1225.775, 'weight': 0, 'content': [{'end': 1230.979, 'text': 'If the number of elements is even, then we probably take the central two elements.', 'start': 1225.775, 'duration': 5.204}, {'end': 1233.701, 'text': 'We try to find out the average and we try to calculate it.', 'start': 1231.099, 'duration': 2.602}, {'end': 1236.562, 'text': 'Right? But understand one thing over here.', 'start': 1234.221, 'duration': 2.341}, {'end': 1244.003, 'text': 'What is the main purpose? 
Initially, when we did not add outlier and we tried to calculate the mean, at that time I got 3.2.', 'start': 1236.722, 'duration': 7.281}, {'end': 1253.726, 'text': 'Okay? When I tried to calculate by adding an outlier, my mean was 12.', 'start': 1244.003, 'duration': 9.723}], 'summary': 'When an outlier is added, the mean increases from 3.2 to 12.', 'duration': 27.951, 'max_score': 1225.775, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU1225775.jpg'}, {'end': 1584.642, 'src': 'embed', 'start': 1546.774, 'weight': 3, 'content': [{'end': 1555, 'text': "So what I'm saying, the missing value will be replaced with most frequent occurring element.", 'start': 1546.774, 'duration': 8.226}, {'end': 1571.353, 'text': 'Right? So we can definitely say that most frequent element, you can actually get it by using mode.', 'start': 1562.005, 'duration': 9.348}, {'end': 1578.438, 'text': 'Okay? Which is most frequently used and this specifically works well with categorical variable.', 'start': 1571.793, 'duration': 6.645}, {'end': 1583.161, 'text': "Okay? 
Now, let's take another example.", 'start': 1580.319, 'duration': 2.842}, {'end': 1584.642, 'text': 'Suppose I have a feature age.', 'start': 1583.541, 'duration': 1.101}], 'summary': 'Replace missing values with mode, best for categorical variables.', 'duration': 37.868, 'max_score': 1546.774, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU1546774.jpg'}, {'end': 1641.047, 'src': 'embed', 'start': 1615.376, 'weight': 4, 'content': [{'end': 1621.002, 'text': 'Which do you think, based on this scenario, that is, ages of students, we should definitely apply?', 'start': 1615.376, 'duration': 5.626}, {'end': 1622.724, 'text': 'Just tell me this answer.', 'start': 1621.883, 'duration': 0.841}, {'end': 1628.643, 'text': "In this particular case, definitely I would suggest let's go with mean.", 'start': 1625.362, 'duration': 3.281}, {'end': 1634.065, 'text': "Because I know students' age will basically range from one value to one value.", 'start': 1629.063, 'duration': 5.002}, {'end': 1637.566, 'text': "It won't extend more than that specific value.", 'start': 1634.845, 'duration': 2.721}, {'end': 1641.047, 'text': 'So here a domain knowledge will also come into existence.', 'start': 1637.886, 'duration': 3.161}], 'summary': 'Suggest using mean based on limited age range and domain knowledge.', 'duration': 25.671, 'max_score': 1615.376, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU1615376.jpg'}], 'start': 1004.83, 'title': 'Understanding distributions and calculating median', 'summary': 'Discusses the process of finding the median in a set of numbers, including sorting the numbers and identifying the central element, as well as the concept of outliers, with an example of finding the median in a set of 11 numbers resulting in a median of 3.', 'chapters': [{'end': 1100.568, 'start': 1004.83, 'title': 'Understanding distributions and calculating median', 
'summary': 'Discusses the process of finding the median in a set of numbers, including sorting the numbers and identifying the central element, as well as the concept of outliers, with an example of finding the median in a set of 11 numbers resulting in a median of 3.', 'duration': 95.738, 'highlights': ['The process of finding the median involves sorting the numbers and identifying the central element in the set, demonstrated with an example of finding the median in a set of 11 numbers resulting in a median of 3.', 'Explanation of outliers as numbers completely different from the entire distribution, illustrated with the example of 100 being an outlier in the given distribution.']}, {'end': 1736.92, 'start': 1101.249, 'title': 'Understanding median and mode', 'summary': 'Discusses the concept of median and mode, emphasizing their calculation methodologies, their impact on the distribution, and their application in handling outliers and missing data in different scenarios, and highlights the importance of domain knowledge in choosing between mean, median, and mode for different use cases.', 'duration': 635.671, 'highlights': ['When the number of elements is even, the median is calculated by taking the average of the middle two elements; it is robust to outliers, as seen in the example where the median remained close to the initial value even with the addition of two outliers.', 'In the context of mode, the most frequent element is identified, and it is suitable for both categorical and integer variables, especially working well with categorical variables as demonstrated by the application in a dataset representing types of flowers with missing values.', 'The discussion highlights the importance of domain knowledge in choosing between mean, median, and mode for different use cases, as exemplified by the decision to use mean for the ages of students due to the expected range of values and the consideration of global population data where mean may not be suitable.', 'The session 
emphasizes the engagement of the audience and the encouragement for likes, demonstrating a dynamic and interactive teaching style to facilitate understanding and engagement.', 'The transcript includes a lighthearted and engaging teaching approach, aiming to create an interactive and enjoyable learning environment for the audience.']}], 'duration': 732.09, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU1004830.jpg', 'highlights': ['The process of finding the median involves sorting the numbers and identifying the central element in the set, demonstrated with an example of finding the median in a set of 11 numbers resulting in a median of 3.', 'When the number of elements is even, the median is calculated by taking the average of the middle two elements; it is robust to outliers, as seen in the example where the median remained close to the initial value even with the addition of two outliers.', 'Explanation of outliers as numbers completely different from the entire distribution, illustrated with the example of 100 being an outlier in the given distribution.', 'In the context of mode, the most frequent element is identified, and it is suitable for both categorical and integer variables, especially working well with categorical variables as demonstrated by the application in a dataset representing types of flowers with missing values.', 'The discussion highlights the importance of domain knowledge in choosing between mean, median, and mode for different use cases, as exemplified by the decision to use mean for the ages of students due to the expected range of values and the consideration of global population data where mean may not be suitable.']}, {'end': 2405.841, 'segs': [{'end': 1821.813, 'src': 'embed', 'start': 1773.64, 'weight': 0, 'content': [{'end': 1777.084, 'text': 'Okay So these are the two topics that we are probably going to discuss.', 'start': 1773.64, 'duration': 3.444}, {'end': 1786.78, 'text': "Okay So, let's 
go ahead and let's discuss about this.", 'start': 1783.058, 'duration': 3.722}, {'end': 1789.461, 'text': 'Now, first topic is basically with respect to variance.', 'start': 1787.1, 'duration': 2.361}, {'end': 1795.204, 'text': 'Now, how do we define variance? Variance is a concept of measure of dispersion.', 'start': 1790.302, 'duration': 4.902}, {'end': 1798.126, 'text': 'Okay, And probably for an interviewer also.', 'start': 1795.645, 'duration': 2.481}, {'end': 1800.327, 'text': 'this may be a confusing question.', 'start': 1798.126, 'duration': 2.201}, {'end': 1807.19, 'text': 'they may ask candidates, you know, and they may probably make them understand different, different things and they may again confuse you.', 'start': 1800.327, 'duration': 6.863}, {'end': 1811.913, 'text': 'Okay But when I say dispersion, dispersion basically means spread.', 'start': 1807.491, 'duration': 4.422}, {'end': 1816.107, 'text': 'Okay Please make sure that you remember this word.', 'start': 1813.024, 'duration': 3.083}, {'end': 1817.689, 'text': 'This basically means spread.', 'start': 1816.328, 'duration': 1.361}, {'end': 1818.95, 'text': 'Okay Spread.', 'start': 1818.37, 'duration': 0.58}, {'end': 1821.813, 'text': 'How spread, how well spread your data is.', 'start': 1819.351, 'duration': 2.462}], 'summary': 'Discussion on variance as a measure of dispersion and its relevance for interviews, emphasizing the concept of spread.', 'duration': 48.173, 'max_score': 1773.64, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU1773640.jpg'}, {'end': 1919.297, 'src': 'embed', 'start': 1875.04, 'weight': 3, 'content': [{'end': 1878.203, 'text': 'is that how this two distribution is different?', 'start': 1875.04, 'duration': 3.163}, {'end': 1881.566, 'text': 'okay?. 
We really need to understand okay?', 'start': 1878.203, 'duration': 3.363}, {'end': 1885.49, 'text': 'And probably interviewer may say you and he may confuse you in dispersion.', 'start': 1881.626, 'duration': 3.864}, {'end': 1888.053, 'text': 'What is variance? He may definitely confuse you.', 'start': 1886.011, 'duration': 2.042}, {'end': 1896.22, 'text': 'So, for that specific reason, if you really want to identify Okay, how two distribution are different,', 'start': 1888.533, 'duration': 7.687}, {'end': 1899.082, 'text': 'at that point of time we may use variance and standard deviation.', 'start': 1896.22, 'duration': 2.862}, {'end': 1903.845, 'text': "Now let's go ahead and let's try to understand the formula with respect to variance and standard deviation.", 'start': 1899.582, 'duration': 4.263}, {'end': 1908.049, 'text': 'And here also, I will probably talk about two different things.', 'start': 1904.486, 'duration': 3.563}, {'end': 1911.291, 'text': 'And here one very, very important interview question will come.', 'start': 1908.569, 'duration': 2.722}, {'end': 1913.553, 'text': 'One is population variance.', 'start': 1912.011, 'duration': 1.542}, {'end': 1919.297, 'text': 'And one is about sample variance.', 'start': 1917.155, 'duration': 2.142}], 'summary': 'Understanding the differences between distributions using variance and standard deviation, with emphasis on population and sample variance.', 'duration': 44.257, 'max_score': 1875.04, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU1875040.jpg'}, {'end': 2048.52, 'src': 'embed', 'start': 2009.65, 'weight': 7, 'content': [{'end': 2011.331, 'text': "That basically means I'm teaching well, okay.", 'start': 2009.65, 'duration': 1.681}, {'end': 2014.492, 'text': "So first thing first, let's go and calculate.", 'start': 2011.991, 'duration': 2.501}, {'end': 2017.574, 'text': "Now we'll go and calculate different, different things.", 'start': 2015.132, 
'duration': 2.442}, {'end': 2019.814, 'text': "Don't worry, guys, you'll also be getting assignments.", 'start': 2017.714, 'duration': 2.1}, {'end': 2023.036, 'text': "So with respect to population, we'll go and calculate the mu.", 'start': 2020.455, 'duration': 2.581}, {'end': 2028.096, 'text': 'What is mu in this particular case? What is mu? Quickly.', 'start': 2024.252, 'duration': 3.844}, {'end': 2029.297, 'text': '5 plus 4 plus 3.', 'start': 2028.116, 'duration': 1.181}, {'end': 2029.517, 'text': '12, 14, 16, 17.', 'start': 2029.297, 'duration': 0.22}, {'end': 2031.479, 'text': '17 divided by 6 is nothing but 2.83.', 'start': 2029.517, 'duration': 1.962}, {'end': 2033.561, 'text': 'Okay So here mu is basically 2.83, 2.83.', 'start': 2031.479, 'duration': 2.082}, {'end': 2048.52, 'text': 'Okay 2.83, 2.83, 2.83, okay.', 'start': 2033.561, 'duration': 14.959}], 'summary': 'Teaching involves calculations and assignments, with population mu calculated as 2.83.', 'duration': 38.87, 'max_score': 2009.65, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2009650.jpg'}, {'end': 2335.484, 'src': 'embed', 'start': 2310.528, 'weight': 5, 'content': [{'end': 2319.051, 'text': 'Right Spread when we say spread is basically high, that basically means the elements that is present in the central region is more.', 'start': 2310.528, 'duration': 8.523}, {'end': 2323.552, 'text': 'Okay So always understand that specific thing.', 'start': 2320.491, 'duration': 3.061}, {'end': 2335.484, 'text': 'Clear everyone? Clear everyone? Yes or no? 
Whenever I talk about more variance, that basically means the data is more dispersed.', 'start': 2324.878, 'duration': 10.606}], 'summary': 'High spread means more elements in central region, more variance means more dispersed data.', 'duration': 24.956, 'max_score': 2310.528, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2310528.jpg'}], 'start': 1737.2, 'title': 'Measures of dispersion', 'summary': 'Covers the concepts of variance and standard deviation, key topics for interview preparation, emphasizing dispersion as the spread of data. it also explores the importance of variance in identifying differences between distributions and the spread of data.', 'chapters': [{'end': 1821.813, 'start': 1737.2, 'title': 'Measure of dispersion: variance and standard deviation', 'summary': 'Covers the measure of dispersion, focusing on the concepts of variance and standard deviation, which are key topics for interview preparation, emphasizing the idea that dispersion refers to the spread of data.', 'duration': 84.613, 'highlights': ['The chapter discusses the measure of dispersion, specifically focusing on two main topics: variance and standard deviation, which are essential concepts for data analysis and interview preparation.', 'Variance is defined as a concept of measure of dispersion, and it is emphasized that dispersion refers to the spread of data, a crucial point for understanding data analysis.', "The concept of dispersion is explained as 'spread,' emphasizing the importance of understanding how well spread the data is, which is a fundamental aspect of data analysis and interpretation."]}, {'end': 2405.841, 'start': 1821.954, 'title': 'Understanding variance and standard deviation', 'summary': 'Explores the concept of mean, variance, and standard deviation, and demonstrates the calculation process using datasets, emphasizing the importance of variance in identifying differences between distributions and the 
spread of data.', 'duration': 583.887, 'highlights': ['The importance of variance in identifying differences between distributions is emphasized by comparing two datasets with the same mean but different variance, showcasing the relevance of variance in measuring the spread of data.', 'The detailed calculation process for both population variance and sample variance is explained, with formulas and step-by-step calculations provided for better understanding.', 'The concept of variance is reinforced by highlighting its role in indicating the spread of data, with a clear explanation that higher variance signifies greater dispersion of data, enhancing the comprehension of the topic.', 'The significance of understanding variance and its relationship with the spread of data is reiterated, with the explanation focusing on the central region and the dispersion of elements within the datasets.', 'The chapter provides practical examples and encourages interactive learning by engaging the audience in questions and discussions to reinforce understanding and application of the concepts.']}], 'duration': 668.641, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU1737200.jpg', 'highlights': ['The chapter discusses the measure of dispersion, focusing on variance and standard deviation, essential for data analysis and interview preparation.', 'Variance is defined as a measure of dispersion, emphasizing the spread of data, crucial for understanding data analysis.', 'The importance of understanding how well spread the data is, a fundamental aspect of data analysis and interpretation, is highlighted.', 'The importance of variance in identifying differences between distributions is emphasized, showcasing its relevance in measuring the spread of data.', 'The detailed calculation process for both population variance and sample variance is explained, enhancing comprehension of the topic.', 'Higher variance signifies greater dispersion 
of data, reinforcing the understanding of the topic.', 'The significance of understanding variance and its relationship with the spread of data is reiterated, focusing on the central region and the dispersion of elements within the datasets.', 'The chapter provides practical examples and encourages interactive learning to reinforce understanding and application of the concepts.']}, {'end': 2905.63, 'segs': [{'end': 2470.635, 'src': 'embed', 'start': 2445.809, 'weight': 0, 'content': [{'end': 2452.41, 'text': 'Now if you see standard deviation formula, it is nothing but root of variance, okay? Root of variance.', 'start': 2445.809, 'duration': 6.601}, {'end': 2458.591, 'text': 'Now here you can see when the standard deviation is smaller, that basically means you are having a very huge curve.', 'start': 2453.03, 'duration': 5.561}, {'end': 2463.665, 'text': 'That basically means the data is not that much distributed.', 'start': 2460.02, 'duration': 3.645}, {'end': 2470.635, 'text': 'When you have a big standard deviation like 50, 60 and all, you can see your data is highly distributed.', 'start': 2464.306, 'duration': 6.329}], 'summary': 'Standard deviation is the root of variance. smaller std dev indicates less distributed data, while larger std dev like 50 or 60 indicates highly distributed data.', 'duration': 24.826, 'max_score': 2445.809, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2445809.jpg'}, {'end': 2576.194, 'src': 'embed', 'start': 2550.831, 'weight': 2, 'content': [{'end': 2555.554, 'text': 'What is the mean in this particular case? What is the mean? 
Mean is nothing but 2.83.', 'start': 2550.831, 'duration': 4.723}, {'end': 2558.076, 'text': "Right Let's consider this one.", 'start': 2555.554, 'duration': 2.522}, {'end': 2560.669, 'text': 'The mean is 2.83.', 'start': 2558.516, 'duration': 2.153}, {'end': 2567.831, 'text': 'Now, from this mean, your data will be distributed because mean is basically specifying your measure of central tendency.', 'start': 2560.669, 'duration': 7.162}, {'end': 2572.073, 'text': 'It basically says that where the center is there for that specific distribution.', 'start': 2567.851, 'duration': 4.222}, {'end': 2576.194, 'text': 'So where the center is actually present in that specific distribution.', 'start': 2573.213, 'duration': 2.981}], 'summary': 'Mean in this case is 2.83, specifying central tendency of data.', 'duration': 25.363, 'max_score': 2550.831, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2550831.jpg'}, {'end': 2744.515, 'src': 'heatmap', 'start': 2687.851, 'weight': 1, 'content': [{'end': 2691.993, 'text': "With the help of variance, definitely you'll be able to understand how the data is spread.", 'start': 2687.851, 'duration': 4.142}, {'end': 2699.637, 'text': "And with standard deviation you'll be able to understand that between one standard deviation to the right and the left,", 'start': 2692.853, 'duration': 6.784}, {'end': 2702.178, 'text': 'what may be the range of data that may be falling.', 'start': 2699.637, 'duration': 2.541}, {'end': 2707.956, 'text': 'Okay? So standard deviation is nothing but it is the square root of variance.', 'start': 2703.075, 'duration': 4.881}, {'end': 2713.157, 'text': 'That basically means from the mean, right, how far an element can be.', 'start': 2708.616, 'duration': 4.541}, {'end': 2714.598, 'text': "Let's consider that if I consider 5.", 'start': 2713.217, 'duration': 1.381}, {'end': 2720.879, 'text': 'Okay? 
Now for 5, if you try to calculate, it may fall somewhere here.', 'start': 2714.598, 'duration': 6.281}, {'end': 2729.641, 'text': 'So how you are going to represent 5? You will say that it falls in 1.5 standard deviation from the mean.', 'start': 2721.539, 'duration': 8.102}, {'end': 2733.571, 'text': 'from the mean.', 'start': 2732.571, 'duration': 1}, {'end': 2737.893, 'text': 'So this kind of definition you will be able to tell them.', 'start': 2734.792, 'duration': 3.101}, {'end': 2744.515, 'text': 'Okay? So that basically means from the mean how far a specific number is with respect to standard deviation.', 'start': 2738.393, 'duration': 6.122}], 'summary': 'Variance and standard deviation measure data spread and distance from mean; standard deviation is the square root of variance.', 'duration': 56.664, 'max_score': 2687.851, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2687851.jpg'}, {'end': 2720.879, 'src': 'embed', 'start': 2692.853, 'weight': 1, 'content': [{'end': 2699.637, 'text': "And with standard deviation you'll be able to understand that between one standard deviation to the right and the left,", 'start': 2692.853, 'duration': 6.784}, {'end': 2702.178, 'text': 'what may be the range of data that may be falling.', 'start': 2699.637, 'duration': 2.541}, {'end': 2707.956, 'text': 'Okay? So standard deviation is nothing but it is the square root of variance.', 'start': 2703.075, 'duration': 4.881}, {'end': 2713.157, 'text': 'That basically means from the mean, right, how far an element can be.', 'start': 2708.616, 'duration': 4.541}, {'end': 2714.598, 'text': "Let's consider that if I consider 5.", 'start': 2713.217, 'duration': 1.381}, {'end': 2720.879, 'text': 'Okay? 
Now for 5, if you try to calculate, it may fall somewhere here.', 'start': 2714.598, 'duration': 6.281}], 'summary': 'Standard deviation measures the range of data from the mean, providing insight into the spread of data.', 'duration': 28.026, 'max_score': 2692.853, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2692853.jpg'}, {'end': 2850.833, 'src': 'embed', 'start': 2821.36, 'weight': 4, 'content': [{'end': 2826.263, 'text': 'This is the first step to find outliers.', 'start': 2821.36, 'duration': 4.903}, {'end': 2830.025, 'text': 'How do we find an outlier? Okay.', 'start': 2827.103, 'duration': 2.922}, {'end': 2836.469, 'text': 'So probably we are going to discuss in this the first and with the help of code also you can basically do.', 'start': 2830.385, 'duration': 6.084}, {'end': 2845.314, 'text': "Okay Now with respect to percentiles, let's try to understand what is percentiles and how do you find out percentiles? Okay.", 'start': 2837.409, 'duration': 7.905}, {'end': 2850.833, 'text': 'Now, before understanding percentile, you basically need to understand about percentage.', 'start': 2846.252, 'duration': 4.581}], 'summary': 'Discussion on finding outliers using percentiles and code.', 'duration': 29.473, 'max_score': 2821.36, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2821360.jpg'}, {'end': 2914.974, 'src': 'embed', 'start': 2886.625, 'weight': 5, 'content': [{'end': 2888.586, 'text': 'Sample variance is n minus 1.', 'start': 2886.625, 'duration': 1.961}, {'end': 2889.987, 'text': "At last, I'll put up a question.", 'start': 2888.586, 'duration': 1.401}, {'end': 2891.168, 'text': 'I want to give you an assignment.', 'start': 2890.047, 'duration': 1.121}, {'end': 2893.009, 'text': "Okay So don't worry.", 'start': 2891.908, 'duration': 1.101}, {'end': 2894.47, 'text': "I'll talk about that also.", 'start': 2893.509, 'duration': 0.961}, 
{'end': 2896.706, 'text': 'Okay Yes.', 'start': 2895.566, 'duration': 1.14}, {'end': 2898.947, 'text': "Sample variance, I'll give you as an assignment.", 'start': 2896.967, 'duration': 1.98}, {'end': 2901.348, 'text': 'I want some kind of answers from you.', 'start': 2899.588, 'duration': 1.76}, {'end': 2904.37, 'text': "Okay And we'll discuss after we complete some topics.", 'start': 2901.708, 'duration': 2.662}, {'end': 2905.63, 'text': "Okay So don't worry.", 'start': 2904.55, 'duration': 1.08}, {'end': 2911.172, 'text': 'So over here, if I really want to find out the numbers, percentage of the numbers that are odd.', 'start': 2906.43, 'duration': 4.742}, {'end': 2914.974, 'text': 'Okay, So how do you basically apply a formula over here?', 'start': 2911.893, 'duration': 3.081}], 'summary': 'Sample variance is n-1, and an assignment will be given. discussion to follow completion of topics.', 'duration': 28.349, 'max_score': 2886.625, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2886625.jpg'}], 'start': 2411.477, 'title': 'Understanding standard deviation and data distribution', 'summary': 'Delves into the relationship between standard deviation and variance, highlighting the impact of mean on data distribution. 
it also explains the calculation of data range within different standard deviations and introduces percentiles and quartiles for outlier detection.', 'chapters': [{'end': 2572.073, 'start': 2411.477, 'title': 'Understanding standard deviation and variance', 'summary': 'Discusses the relationship between standard deviation and variance, illustrating how smaller standard deviation indicates less distributed data and higher variance is associated with more dispersed data, with a specific example demonstrating the calculation of standard deviation from variance and the impact of mean on data distribution.', 'duration': 160.596, 'highlights': ['Illustrating the relationship between standard deviation and data distribution Smaller standard deviation indicates less distributed data, while higher variance is associated with more dispersed data.', 'Demonstrating the calculation of standard deviation from variance The standard deviation is calculated as the root of the variance, with a specific example resulting in a standard deviation of 1.345 from a variance of 1.81.', 'Explaining the impact of mean on data distribution The mean specifies the measure of central tendency, influencing the distribution of data around the center.']}, {'end': 2905.63, 'start': 2573.213, 'title': 'Understanding standard deviation and percentiles', 'summary': 'Explains the concept of standard deviation, demonstrating how to calculate the range of data falling within different standard deviations, and introduces the concept of percentiles and quartiles as the first step to find outliers, with a mention of the assignment on sample variance.', 'duration': 332.417, 'highlights': ['Standard deviation is the square root of variance, which helps understand the range of data within different standard deviations. 
Standard deviation is the square root of variance, allowing an understanding of the range of data within different standard deviations.', 'Introduction to percentiles and quartiles as the first step to find outliers. Introducing percentiles and quartiles as the initial step to identify outliers in the data.', 'Mention of the assignment on sample variance for further discussion. The mention of an upcoming assignment on sample variance for further discussion and understanding.']}], 'duration': 494.153, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2411477.jpg', 'highlights': ['Illustrating the relationship between standard deviation and data distribution Smaller standard deviation indicates less distributed data, while higher variance is associated with more dispersed data.', 'Demonstrating the calculation of standard deviation from variance The standard deviation is calculated as the root of the variance, with a specific example resulting in a standard deviation of 1.345 from a variance of 1.81.', 'Explaining the impact of mean on data distribution The mean specifies the measure of central tendency, influencing the distribution of data around the center.', 'Standard deviation is the square root of variance, which helps understand the range of data within different standard deviations. Standard deviation is the square root of variance, allowing an understanding of the range of data within different standard deviations.', 'Introduction to percentiles and quartiles as the first step to find outliers. Introducing percentiles and quartiles as the initial step to identify outliers in the data.', 'Mention of the assignment on sample variance for further discussion. 
The mention of an upcoming assignment on sample variance for further discussion and understanding.']}, {'end': 3648.528, 'segs': [{'end': 2937.184, 'src': 'embed', 'start': 2906.43, 'weight': 0, 'content': [{'end': 2911.172, 'text': 'So over here, if I really want to find out the numbers, percentage of the numbers that are odd.', 'start': 2906.43, 'duration': 4.742}, {'end': 2914.974, 'text': 'Okay, So how do you basically apply a formula over here?', 'start': 2911.893, 'duration': 3.081}, {'end': 2931.841, 'text': 'So I can basically say percentage is equal to number of numbers, number of numbers that are odd, divided by total numbers.', 'start': 2915.474, 'duration': 16.367}, {'end': 2937.184, 'text': 'So if I really try to calculate how many numbers are odd? 1, 2, 3.', 'start': 2934.263, 'duration': 2.921}], 'summary': 'Calculating the percentage of odd numbers from a given set using a formula.', 'duration': 30.754, 'max_score': 2906.43, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2906430.jpg'}, {'end': 3057.158, 'src': 'embed', 'start': 3013.379, 'weight': 1, 'content': [{'end': 3039.229, 'text': 'What is a percentile? 
So a percentile is a value, a percentile is a value below which a certain percentage of observations lie.', 'start': 3013.379, 'duration': 25.85}, {'end': 3044.792, 'text': 'Okay So this is the definition of percentile.', 'start': 3041.71, 'duration': 3.082}, {'end': 3048.333, 'text': 'It is basically saying, it is a value.', 'start': 3045.572, 'duration': 2.761}, {'end': 3051.575, 'text': 'If I say, okay, this number is the 25 percentile.', 'start': 3048.794, 'duration': 2.781}, {'end': 3057.158, 'text': 'This basically says that 25 percentage of the entire distribution is less than that particular value.', 'start': 3052.195, 'duration': 4.963}], 'summary': 'A percentile is a value below which a certain percentage of observations lie, e.g., 25th percentile indicates 25% of the distribution is less than that value.', 'duration': 43.779, 'max_score': 3013.379, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU3013379.jpg'}, {'end': 3279.615, 'src': 'embed', 'start': 3244.289, 'weight': 2, 'content': [{'end': 3252.834, 'text': 'yes, so 80 percentile will basically be my answer for this.', 'start': 3244.289, 'duration': 8.545}, {'end': 3259.979, 'text': 'that basically means, if i really want to find out what this 10 value percentile is, it is 80..', 'start': 3252.834, 'duration': 7.145}, {'end': 3262.221, 'text': 'Now understand what is the main meaning out of it?', 'start': 3259.979, 'duration': 2.242}, {'end': 3274.591, 'text': 'The main meaning is that 80% please listen to me very, very carefully 80% of the entire distribution is less than 10..', 'start': 3262.821, 'duration': 11.77}, {'end': 3279.615, 'text': 'Right? 
80% of the entire distribution is less than 10.', 'start': 3274.591, 'duration': 5.024}], 'summary': '80% of the distribution is less than 10, 80 percentile.', 'duration': 35.326, 'max_score': 3244.289, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU3244289.jpg'}, {'end': 3560.664, 'src': 'embed', 'start': 3517.433, 'weight': 3, 'content': [{'end': 3518.774, 'text': 'And 5.25 will be in between this.', 'start': 3517.433, 'duration': 1.341}, {'end': 3524.584, 'text': "But right now, I don't see any element between this.", 'start': 3521.243, 'duration': 3.341}, {'end': 3531.026, 'text': 'So what we do is that we take 5th and 6th index and then we do the average and we calculate the value.', 'start': 3525.004, 'duration': 6.022}, {'end': 3535.867, 'text': 'In this particular case, my answer will be 5.', 'start': 3531.486, 'duration': 4.381}, {'end': 3540.129, 'text': 'Okay So 5 is the value for 25 percentile.', 'start': 3535.867, 'duration': 4.262}, {'end': 3542.769, 'text': 'Clear or no?', 'start': 3541.589, 'duration': 1.18}, {'end': 3548.191, 'text': 'Clear or no?', 'start': 3547.471, 'duration': 0.72}, {'end': 3552.061, 'text': 'You want white background?', 'start': 3550.66, 'duration': 1.401}, {'end': 3559.083, 'text': 'You want this background, guys?', 'start': 3557.763, 'duration': 1.32}, {'end': 3560.664, 'text': 'I think black was good.', 'start': 3559.644, 'duration': 1.02}], 'summary': 'Calculating 25th percentile yields a value of 5.', 'duration': 43.231, 'max_score': 3517.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU3517433.jpg'}], 'start': 2906.43, 'title': 'Odd number percentage and percentiles', 'summary': 'Covers calculating the percentage of odd numbers, yielding an example of 60%, and understanding percentile concepts in data through practical examples and formulas, emphasizing comprehension and practical application.', 
'chapters': [{'end': 2959.24, 'start': 2906.43, 'title': 'Calculating odd number percentage', 'summary': 'Explains how to calculate the percentage of odd numbers, using the formula percentage = number of odd numbers / total numbers, illustrated with an example resulting in 60%.', 'duration': 52.81, 'highlights': ['Illustrates the formula for calculating percentage by dividing the number of odd numbers by total numbers, resulting in 60%.', 'Introduces the concept of percentiles as a very important topic to understand.']}, {'end': 3648.528, 'start': 2960.222, 'title': 'Understanding percentiles in data', 'summary': 'Explains the concept of percentiles through a practical example and a formula, demonstrating how to calculate percentile ranking and find the value at a specific percentile in a dataset, with a focus on understanding the underlying meaning and implications.', 'duration': 688.306, 'highlights': ['Explaining the concept of percentile The speaker provides a clear definition of a percentile as a value below which a certain percentage of observations lie, using a dataset to illustrate the concept and its practical application.', 'Calculating the percentile ranking of 10 The step-by-step calculation of the percentile ranking of 10 is demonstrated, resulting in an 80th percentile, signifying that 80% of the entire distribution is less than 10.', 'Assigning the assignment of finding the percentile ranking of 11 The audience is engaged in a practical exercise to calculate the percentile ranking of 11, with the correct answer being 85, indicating that 85% of the dataset is less than 11.', 'Determining the value at the 25th percentile A clear method for calculating the value at the 25th percentile is presented, resulting in the value 5, demonstrating a practical application of percentile calculation in datasets.']}], 'duration': 742.098, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU2906430.jpg', 
'highlights': ['Illustrates the formula for calculating percentage by dividing the number of odd numbers by total numbers, resulting in 60%.', 'Explaining the concept of percentile as a value below which a certain percentage of observations lie, using a dataset to illustrate the concept and its practical application.', 'Calculating the percentile ranking of 10, resulting in an 80th percentile, signifying that 80% of the entire distribution is less than 10.', 'Determining the value at the 25th percentile, resulting in the value 5, demonstrating a practical application of percentile calculation in datasets.', 'Introduces the concept of percentiles as a very important topic to understand.']}, {'end': 5174.551, 'segs': [{'end': 3771.473, 'src': 'embed', 'start': 3735.182, 'weight': 0, 'content': [{'end': 3746.289, 'text': "so if i use 75 divided by 100 multiplied by 21, this is nothing, but i'll use my calculator now.", 'start': 3735.182, 'duration': 11.107}, {'end': 3751.472, 'text': "so i'm going to basically say 75 divided by 100 multiplied by 21.", 'start': 3746.289, 'duration': 5.183}, {'end': 3751.852, 'text': 'so it is 15.75.', 'start': 3751.472, 'duration': 0.38}, {'end': 3764.427, 'text': 'understand, 15.75 is the index position.', 'start': 3751.852, 'duration': 12.575}, {'end': 3766.969, 'text': 'Now go and count which is 15.75 from the top.', 'start': 3765.267, 'duration': 1.702}, {'end': 3767.379, 'text': '1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.', 'start': 3766.989, 'duration': 0.39}, {'end': 3769.551, 'text': '15.75 is the sum of these two numbers.', 'start': 3767.389, 'duration': 2.162}, {'end': 3771.473, 'text': 'So my answer is, my answer is 9.', 'start': 3769.691, 'duration': 1.782}], 'summary': 'By calculating an index position of 15.75, the final answer is 9.', 'duration': 36.291, 'max_score': 3735.182, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU3735182.jpg'}, {'end': 3919.917, 
'src': 'embed', 'start': 3891.169, 'weight': 1, 'content': [{'end': 3899.433, 'text': 'The fourth topic that we should discuss about third quartile, which is also said as Q3.', 'start': 3891.169, 'duration': 8.264}, {'end': 3902.815, 'text': 'And the fifth topic we basically discuss about maximum.', 'start': 3900.174, 'duration': 2.641}, {'end': 3910.09, 'text': 'And with the help of this, we will be using these values to basically remove the outliers.', 'start': 3904.346, 'duration': 5.744}, {'end': 3919.917, 'text': "Okay So let's take one example and let's see that by the help of five number summary, how do we remove an outlier? Okay.", 'start': 3910.811, 'duration': 9.106}], 'summary': 'Discussion on third quartile (q3), maximum, and using values to remove outliers.', 'duration': 28.748, 'max_score': 3891.169, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU3891169.jpg'}, {'end': 4620.766, 'src': 'embed', 'start': 4573.905, 'weight': 2, 'content': [{'end': 4584.713, 'text': 'Okay So median obviously over here, how much median is? 
Median is nothing but 5.', 'start': 4573.905, 'duration': 10.808}, {'end': 4589.357, 'text': "Now let's draw a plot which is called as box plot.", 'start': 4584.713, 'duration': 4.644}, {'end': 4594.541, 'text': 'By this specific data, you can definitely draw a box plot.', 'start': 4590.378, 'duration': 4.163}, {'end': 4597.773, 'text': 'Now, how does a box plot basically get drawn?', 'start': 4595.33, 'duration': 2.443}, {'end': 4608.763, 'text': "So you will be having x-axis, and let's consider that in this particular x-axis you have values like minus 2, 0, 2, 4, 6, 8, 10..", 'start': 4598.413, 'duration': 10.35}, {'end': 4620.766, 'text': 'Okay So this is your x-axis.', 'start': 4608.764, 'duration': 12.002}], 'summary': 'Median is 5, and a box plot can be drawn with x-axis values from -2 to 10.', 'duration': 46.861, 'max_score': 4573.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU4573905.jpg'}, {'end': 4870.76, 'src': 'heatmap', 'start': 4807.563, 'weight': 3, 'content': [{'end': 4815.37, 'text': 'Summation of i is equal to 1 to n x of i minus x bar whole square divided by n minus 1.', 'start': 4807.563, 'duration': 7.807}, {'end': 4819.334, 'text': "This n minus 1 why we do it it is also called as Bessel's correction.", 'start': 4815.37, 'duration': 3.964}, {'end': 4822.117, 'text': 'We also say it as degree of freedom.', 'start': 4820.055, 'duration': 2.062}, {'end': 4833.352, 'text': 'Okay, degree of freedom, and i have probably made this video in my stats playlist why sample variance is divided by n minus one.', 'start': 4822.137, 'duration': 11.215}, {'end': 4836.575, 'text': 'you can go and search for that because of time constraint.', 'start': 4833.352, 'duration': 3.223}, {'end': 4837.296, 'text': 'again same thing.', 'start': 4836.575, 'duration': 0.721}, {'end': 4848.371, 'text': 'i will not explain you, but over here you can definitely understand these things now, okay, So overall, this was 
the topic that I want to cover.', 'start': 4837.296, 'duration': 11.075}, {'end': 4851.252, 'text': "Tomorrow, we'll basically start with distribution.", 'start': 4848.831, 'duration': 2.421}, {'end': 4853.773, 'text': "We'll finish up normal distribution.", 'start': 4851.272, 'duration': 2.501}, {'end': 4856.775, 'text': "We'll finish up log normal distribution, standard normal distributions.", 'start': 4853.793, 'duration': 2.982}, {'end': 4859.676, 'text': "We'll discuss multiple things.", 'start': 4857.875, 'duration': 1.801}, {'end': 4862.577, 'text': "Then we'll move towards confidence interval.", 'start': 4859.816, 'duration': 2.761}, {'end': 4867.959, 'text': "Then we'll move towards p-value, hypothesis testing, z-test, t-test, chi-squared test, ANOVA test.", 'start': 4862.597, 'duration': 5.362}, {'end': 4869.8, 'text': 'This test, that test, everything.', 'start': 4868.359, 'duration': 1.441}, {'end': 4870.76, 'text': "We'll try to finish it up.", 'start': 4869.82, 'duration': 0.94}], 'summary': 'Explained sample variance, degree of freedom, and upcoming topics in statistics.', 'duration': 63.197, 'max_score': 4807.563, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU4807563.jpg'}, {'end': 4964.198, 'src': 'embed', 'start': 4938.787, 'weight': 4, 'content': [{'end': 4945.573, 'text': 'okay, so, with respect to one neuron, you will basically be getting 100 plus courses.', 'start': 4938.787, 'duration': 6.786}, {'end': 4950.016, 'text': 'right now, every month, we are recording 100 plus courses.', 'start': 4945.573, 'duration': 4.443}, {'end': 4953.84, 'text': 'and remember, guys, right now it is for lifetime subscription.', 'start': 4950.016, 'duration': 3.824}, {'end': 4960.045, 'text': 'that basically means you just have to pay 6000 plus gst to take up all these courses.', 'start': 4953.84, 'duration': 6.205}, {'end': 4961.466, 'text': 'name the courses that you want.', 'start': 4960.045, 
'duration': 1.421}, {'end': 4964.198, 'text': 'suppose you want to go with blockchain.', 'start': 4962.418, 'duration': 1.78}], 'summary': 'Access 100+ courses for lifetime subscription at a cost of 6000 plus gst', 'duration': 25.411, 'max_score': 4938.787, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU4938787.jpg'}], 'start': 3648.728, 'title': 'Statistical analysis and ineuron offer', 'summary': "Covers statistical analysis topics including calculating the 75th percentile, five number summary, outlier identification, and data visualization. additionally, it discusses an ineuron platform lifetime subscription offer, providing access to 100+ courses and a 10% discount with the code 'krish10'.", 'chapters': [{'end': 3823.777, 'start': 3648.728, 'title': 'Calculating 75th percentile', 'summary': 'Covered the calculation of the 75th percentile, where the value was found to be 9 using a specific formula and explanation, and the instructor encountered technical issues with the font and brightness settings during the session.', 'duration': 175.049, 'highlights': ['The 75th percentile value was found to be 9: the formula 75/100*21 gives an index position of 15.75, and the value at that position in the sorted data is 9.', 'The instructor encountered technical issues with font and brightness settings, leading to difficulties in the session.']}, {'end': 4418.877, 'start': 3824.718, 'title': 'Five number summary & outlier removal', 'summary': 'Discusses the five number summary, including minimum, first quartile (q1), median, third quartile (q3), and maximum, and how to use these values to remove outliers using lower and higher fences, with a specific example, formulae, and computation steps explained in detail.', 'duration': 594.159, 'highlights': ['The chapter discusses the five number summary, including minimum, first quartile (Q1), median, third quartile (Q3), and maximum, and how to use these values to remove outliers 
using lower and upper fences, with a specific example, formulae, and computation steps explained in detail.', 'It explains the process of outlier removal using lower and upper fences, with a specific example and detailed computation steps: the lower fence is Q1 minus 1.5 × IQR and the upper fence is Q3 plus 1.5 × IQR.', 'It walks through the computation of both fences for outlier removal: any value below Q1 minus 1.5 × IQR or above Q3 plus 1.5 × IQR is treated as an outlier.', 'It details the computation of key statistical values for outlier removal, including the first quartile (Q1), third quartile (Q3), and the interquartile range (IQR), with a step-by-step calculation and explanation of percentile values and IQR.', 'It provides a step-by-step explanation of calculating the first quartile (Q1), third quartile (Q3), and interquartile range (IQR) using percentile values and specific formulae, with interactive engagement and calculation verification.']}, {'end': 4733.749, 'start': 4418.877, 'title': 'Identifying outliers and creating box plots', 'summary': 'Discusses identifying outliers using the 1.5 × IQR method, removing the outlier 27 from the dataset, computing the five-number summary (1, 3, 5, 7, 9), and creating a box plot to visualize the distribution.', 'duration': 314.872, 'highlights': ['The outlier 27 is removed from the dataset using the 1.5 × IQR method, resulting in a five-number summary of 1, 3, 5, 7, and 9.', 'The process of creating a box plot to visualize the distribution of the data is explained, with the key elements being the minimum (1), Q1 (3), median (5), Q3 (7), and maximum (9). 
A step-by-step explanation of creating a box plot is provided, emphasizing the representation of the minimum, Q1, median, Q3, and maximum values in the plot.', 'The concept of outliers is clarified using the lower and upper fences, which are derived from the interquartile range (IQR) and serve to identify and remove outliers.']}, {'end': 4909.303, 'start': 4736.21, 'title': 'Statistics explanation and data visualization', 'summary': 'Covers an explanation of statistics and data visualization, including topics like variance, data visualization applications, and upcoming topics like distribution, confidence interval, and hypothesis testing.', 'duration': 173.093, 'highlights': ['The explanation received 1000 likes, indicating positive reception from the audience.', 'The lecturer plans to cover a range of advanced topics in the upcoming sessions, including distribution, confidence interval, hypothesis testing, and various tests, reflecting a comprehensive curriculum.', 'The application of box plots in identifying outliers is explained, providing a visualization method for outlier detection.', "The formula for sample variance is detailed, along with the rationale for dividing by n minus 1, known as Bessel's correction and tied to the degrees of freedom."]}, {'end': 5174.551, 'start': 4909.884, 'title': 'iNeuron lifetime subscription offer', 'summary': "Discusses the iNeuron platform's lifetime subscription offer, providing access to 100+ courses and emphasizing the affordability and breadth of content, with a 10% discount available using the code 'Krish10'. 
The platform allows users to request new courses and modules, with plans to add an additional 100 courses in the future.", 'duration': 264.667, 'highlights': ['The iNeuron platform offers a lifetime subscription for 100+ courses at 6000 plus GST, with plans to add 100 more courses in the future.', "Users can request new courses and modules, and get a 10% discount using the code 'Krish10'.", 'The platform features diverse courses including blockchain, mon stack, data science, ML masters with 100-200 videos, AIOps, deep learning, stats, C++, DSA with Python, MERN stack, aptitude, GCP, AWS, and more.', 'The speaker encourages sharing the offer with friends to spread knowledge and mentions the availability of net banking for payment.', 'The session will continue the next day at 7 p.m., focusing on advanced topics.']}], 'duration': 1525.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Ims3L_hfLJU/pics/Ims3L_hfLJU3648728.jpg', 'highlights': ['The 75th percentile was calculated with the formula 75/100*21, which gives an index position of 15.75; the value at that position is 9.', 'The chapter discusses the five-number summary, including minimum, first quartile (Q1), median, third quartile (Q3), and maximum, and how to use these values to remove outliers using lower and upper fences, with a specific example, formulae, and computation steps explained in detail.', 'The process of creating a box plot to visualize the distribution of the data is explained, with the key elements being the minimum (1), Q1 (3), median (5), Q3 (7), and maximum (9). 
A step-by-step explanation of creating a box plot is provided, emphasizing the representation of the minimum, Q1, median, Q3, and maximum values in the plot.', 'The lecturer plans to cover a range of advanced topics in the upcoming sessions, including distribution, confidence interval, hypothesis testing, and various tests, reflecting a comprehensive curriculum.', 'The iNeuron platform offers a lifetime subscription for 100+ courses at 6000 plus GST, with plans to add 100 more courses in the future.']}], 'highlights': ['The instructor plans to move from basics to intermediate stats, specifically for data science.', 'The chapter covers measures of central tendency, measures of dispersion, Gaussian distribution, and Z score for data science.', 'The importance of enrolling in the community session and accessing materials from the previous video is emphasized, ensuring everyone is ready for the live session.', 'The introduction of an outlier significantly impacted the mean value. Adding an outlier (100) to the dataset changed the mean from 3.2 to 12, showing how strongly a single outlier can shift the mean.', 'The chapter discusses the measure of dispersion, focusing on variance and standard deviation, essential for data analysis and interview preparation.', 'Illustrating the relationship between standard deviation and data distribution: a smaller standard deviation indicates less spread-out data, while a higher variance indicates more dispersed data.', 'The 75th percentile was calculated with the formula 75/100*21, which gives an index position of 15.75; the value at that position is 9.', 'The process of creating a box plot to visualize the distribution of the data is explained, with the key elements being the minimum (1), Q1 (3), median (5), Q3 (7), and maximum (9). 
A step-by-step explanation of creating a box plot is provided, emphasizing the representation of the minimum, Q1, median, Q3, and maximum values in the plot.', 'The iNeuron platform offers a lifetime subscription for 100+ courses at 6000 plus GST, with plans to add 100 more courses in the future.']}
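
The percentile, fence, and five-number-summary steps summarized above can be sketched in Python. This is a minimal sketch under stated assumptions, not the instructor's own code: the function names are hypothetical, the dataset is an illustrative one chosen so that it reproduces the figures quoted in the summaries (Q1 = 3, Q3 = 7, outlier 27), and fractional positions are resolved by linear interpolation between neighbouring values, which is one common convention. The position formula percentile/100 * (n + 1) is inferred from the quoted 75/100*21 calculation, which would correspond to 20 values.

```python
def percentile_position(p, n):
    """1-based position of the p-th percentile among n sorted values: p/100 * (n + 1)."""
    return p / 100 * (n + 1)


def percentile_value(values, p):
    """Value at the p-th percentile, interpolating linearly between neighbours (assumed convention)."""
    data = sorted(values)
    n = len(data)
    # Clamp the position so it always falls inside the data.
    pos = min(max(percentile_position(p, n), 1), n)
    lower = int(pos)      # 1-based index of the value just below the position
    frac = pos - lower    # fractional distance towards the next value
    if lower == n:
        return float(data[-1])
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


def iqr_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k * IQR and Q3 + k * IQR."""
    q1 = percentile_value(values, 25)
    q3 = percentile_value(values, 75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Illustrative dataset (an assumption, not taken from the text above).
data = [1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 8, 8, 9, 27]

q1 = percentile_value(data, 25)
q3 = percentile_value(data, 75)
lower_fence, upper_fence = iqr_fences(data)

# Anything outside the fences is treated as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
cleaned = [x for x in data if lower_fence <= x <= upper_fence]

# Assumption: quartiles come from the original data, while minimum, median,
# and maximum are taken after the outlier has been removed.
five_number_summary = (min(cleaned), q1, percentile_value(cleaned, 50), q3, max(cleaned))

print("75th percentile position for 20 values:", percentile_position(75, 20))
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
print("Five-number summary:", five_number_summary)
```

Other conventions for fractional positions exist (for example, taking the nearest value), so library functions may return slightly different quartiles on the same data.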