Bayesian Learning

Lecture notes, Module 4 Part B. This lecture covers Bayesian learning and its applications: Bayesian probability as partial belief, Bayes theorem, a worked cancer-diagnosis example, the maximum a posteriori and maximum likelihood hypotheses, a Bayesian view of least-squares linear regression, the Bayes optimal classifier, and Gibbs sampling.
Bayesian Learning in Classification

Good morning, and welcome to today's lecture. Today we will talk about Bayesian learning, which is Part B of Module 4. In the last class we gave a crash course on probability; today we will see how probability is used for learning, especially for classification, and how probability is used for modeling concepts.

Bayesian probability is the notion of probability that interprets probabilities as partial beliefs. Bayesian estimation calculates the validity of a proposition based on two things: first, a prior estimate of its probability, and second, new relevant evidence. From these the posterior Bayesian estimate is computed, and the key to this is an important theorem called Bayes theorem, which we introduce now.

Bayes theorem deals with how to find the probability of a hypothesis given the data. Given several competing hypotheses, you can compute the probability of each individual hypothesis given the data, and so find out which is the most probable, or most likely, hypothesis. According to Bayes theorem,

    P(h | D) = P(D | h) P(h) / P(D)

This is very easy to derive from the product rule of probability: P(h, D) = P(h | D) P(D) = P(D | h) P(h), and dividing through by P(D) gives the formula.
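To make the theorem concrete, here is a minimal sketch (a hypothetical helper, not from the lecture) that applies Bayes theorem over a discrete hypothesis space by normalizing prior times likelihood:

```python
def posteriors(priors, likelihoods):
    """Bayes theorem over a discrete hypothesis space.

    priors[i]      = P(h_i), the prior probability of hypothesis i
    likelihoods[i] = P(D | h_i), the likelihood of the data under h_i
    Returns P(h_i | D) = P(D | h_i) P(h_i) / P(D),
    where P(D) = sum_i P(D | h_i) P(h_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(D), the normalizer
    return [j / evidence for j in joint]
```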
Bayes Rule and a Worked Example

Bayes rule is the most important formula in Bayesian learning. In it:

- P(h) is the prior probability of the hypothesis h.
- P(D | h) is the likelihood of the data: if h were true, the probability that the data D would be generated.
- P(D) is the marginal probability of the data.

Now let us see an application of Bayes theorem. Suppose you want to know whether a patient has cancer or not. This particular example is taken from Tom Mitchell's book Machine Learning.
A patient takes a lab test and the result is positive. The test returns a correct positive result in only 98 percent of the cases in which the disease is actually present, and a correct negative result in only 97 percent of the cases in which the disease is not present. Furthermore, you know that 0.008 of the entire population have this cancer. So:

    P(cancer) = 0.008          P(not cancer) = 0.992
    P(+ | cancer) = 0.98       P(- | cancer) = 0.02
    P(+ | not cancer) = 0.03   P(- | not cancer) = 0.97

Given a positive result, the posterior probability of cancer is proportional to P(+ | cancer) P(cancer) = 0.98 x 0.008 = 0.00784, and the posterior probability of not cancer is proportional to P(+ | not cancer) P(not cancer) = 0.03 x 0.992 = 0.02976. These two, normalized so they sum to 1, give P(cancer | +) of about 0.21, so it is more likely that the patient does not have cancer. This is an application of Bayes theorem.
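As a check on the arithmetic, a short sketch of the cancer example with the values from the lecture:

```python
# Prior and test characteristics from the lecture.
p_cancer = 0.008
p_not_cancer = 0.992
p_pos_given_cancer = 0.98      # correct positive rate when disease is present
p_pos_given_not_cancer = 0.03  # false positive rate (1 - 0.97)

# Unnormalized posteriors for a positive test result.
joint_cancer = p_pos_given_cancer * p_cancer       # 0.98 * 0.008 = 0.00784
joint_not = p_pos_given_not_cancer * p_not_cancer  # 0.03 * 0.992 = 0.02976

p_cancer_given_pos = joint_cancer / (joint_cancer + joint_not)
print(f"P(cancer | +) = {p_cancer_given_pos:.3f}")  # about 0.21
```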
MAP and Maximum Likelihood Hypotheses

How can Bayes theorem be applied to find a hypothesis in machine learning? Based on Bayes theorem, we can find the most likely hypothesis, which is called the maximum a posteriori (MAP) hypothesis. Here H is the hypothesis space, and out of all hypotheses h in H we want the one for which P(h | D) is maximized. Since P(D) is independent of the particular hypothesis, it drops out:

    h_MAP = argmax_{h in H} P(h | D) = argmax_{h in H} P(D | h) P(h)

In the event that the prior probabilities of all hypotheses are equal (that is, before you have any data, all hypotheses are equally probable), you choose the hypothesis for which P(D | h) is maximum. This is the maximum likelihood (ML) hypothesis:

    h_ML = argmax_{h in H} P(D | h)
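A hedged sketch of the two criteria over a finite hypothesis space; the priors and likelihoods below are illustrative placeholders, chosen so the two criteria disagree:

```python
def h_map(hypotheses, priors, likelihoods):
    # argmax_h P(D | h) P(h); P(D) drops out since it is the same for all h.
    return max(hypotheses, key=lambda h: likelihoods[h] * priors[h])

def h_ml(hypotheses, likelihoods):
    # argmax_h P(D | h); appropriate when all priors are equal.
    return max(hypotheses, key=lambda h: likelihoods[h])

hypotheses = ["h1", "h2"]
priors = {"h1": 0.8, "h2": 0.2}
likelihoods = {"h1": 0.3, "h2": 0.6}
print(h_map(hypotheses, priors, likelihoods))  # h1: 0.8*0.3 = 0.24 > 0.2*0.6
print(h_ml(hypotheses, likelihoods))           # h2: 0.6 > 0.3
```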
Linear Regression and Maximum Likelihood

Now we will see an example of how, in finding the least-squares line, we can apply Bayes theorem to find the most likely hypothesis. We have already talked about linear regression, which can be used to learn a real-valued function. Suppose the data is generated in the following fashion: there is a target function f, and each data point (x_i, d_i) is generated as

    d_i = f(x_i) + epsilon_i

where epsilon_i is an error term that follows a normal distribution with mean 0 and standard deviation sigma. So we can think of d_i as coming from a normal distribution whose mean is f(x_i) and whose variance is sigma^2.
The maximum likelihood hypothesis is the one that maximizes the likelihood of the observed data:

    h_ML = argmax_h prod_i p(d_i | h)
         = argmax_h prod_i 1/sqrt(2 pi sigma^2) exp(-(d_i - h(x_i))^2 / (2 sigma^2))

Taking logarithms and dropping the terms that do not depend on h, this is the hypothesis for which -sum_i (d_i - h(x_i))^2 is maximized, which is exactly the least-squares criterion: h_ML is the function for which the sum of squared errors is minimized. So this is a Bayesian explanation of why we choose to minimize the sum of squared errors in order to find the linear regression.
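A small numpy sketch of this equivalence; the true line (slope 2, intercept 1) and noise level are made-up illustrative values. Fitting by least squares recovers the Gaussian maximum likelihood hypothesis:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
d = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=x.size)  # d_i = f(x_i) + eps_i

# Least squares minimizes sum_i (d_i - (w1*x_i + w0))^2,
# which under Gaussian noise is the maximum likelihood hypothesis.
A = np.column_stack([x, np.ones_like(x)])
(w1, w0), *_ = np.linalg.lstsq(A, d, rcond=None)
print(w1, w0)  # close to the true values 2.0 and 1.0
```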
Bayes Optimal Classifier

Now the question is: suppose we are given training data in which each training instance is labelled with the class it belongs to, and we are then given a test instance x and asked for the optimal classification of x. The naive answer would be to find the most probable hypothesis using the MAP criterion and then apply that hypothesis to the test example, but this is not necessarily correct. From the training data we learn h_MAP, and h_MAP is the most probable hypothesis, but it does not necessarily give the most probable classification.

The Bayes optimal classifier outputs, for a classification problem, that class for which the sum over the entire hypothesis space of P(v_j | h_i) P(h_i | D) is maximum:

    v* = argmax_{v_j} sum_{h_i in H} P(v_j | h_i) P(h_i | D)

For example, suppose three candidate hypotheses have posterior probabilities P(h1 | D) = 0.4, P(h2 | D) = 0.3, and P(h3 | D) = 0.3, and h1 classifies the test instance as + while h2 and h3 classify it as -. Then h_MAP = h1 predicts +, but summing over all hypotheses gives probability 0.4 for + and 0.6 for -, so the Bayes optimal classification is -.
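The three-hypothesis example as a sketch, with the posteriors and votes as stated above:

```python
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}  # P(h_i | D)
predicts = {"h1": "+", "h2": "-", "h3": "-"}   # each h_i's classification of x

def bayes_optimal(classes, posterior, predicts):
    # argmax_v sum_i P(v | h_i) P(h_i | D); here P(v | h_i) is 1 if h_i
    # predicts v and 0 otherwise, so the sum collects posterior mass per class.
    score = lambda v: sum(p for h, p in posterior.items() if predicts[h] == v)
    return max(classes, key=score)

print(bayes_optimal(["+", "-"], posterior, predicts))  # '-' with weight 0.6
```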
Why is this called optimal? It is optimal in the sense that no other classifier, using the same hypothesis space and the same prior knowledge, can outperform it on average. But, as you can see, since the size of the hypothesis space is typically huge, it is not possible to apply the Bayes optimal classifier directly: you would have to apply every possible hypothesis to the test instance, which is intractable. So we have to use some approximation of the Bayes optimal classifier, and for that we can use Gibbs sampling.

The Gibbs Algorithm

The Bayes optimal classifier applies each hypothesis to the test instance and weights their contributions according to their posterior probabilities. In the Gibbs algorithm we instead randomly choose a single hypothesis according to P(h | D) and use it to classify the new instance. It is a surprising result that the error of the Gibbs algorithm is quite tightly bounded: if the expectation is taken over target hypotheses drawn at random according to the prior probability distribution, then the expected error of the Gibbs classifier is less than or equal to twice the error of the Bayes optimal classifier.
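A sketch of the Gibbs algorithm under the same setup as above: draw one hypothesis from the posterior and use it alone.

```python
import random

def gibbs_classify(posterior, predicts):
    # Sample a single h ~ P(h | D) and classify with it; in expectation
    # its error is at most twice that of the Bayes optimal classifier.
    hypotheses = list(posterior)
    weights = [posterior[h] for h in hypotheses]
    h = random.choices(hypotheses, weights=weights, k=1)[0]
    return predicts[h]

print(gibbs_classify({"h1": 0.4, "h2": 0.3, "h3": 0.3},
                     {"h1": "+", "h2": "-", "h3": "-"}))
```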
Highlights

- Bayesian probability interprets probability as partial beliefs and is used for modeling concepts, particularly in classification.
- Bayesian estimation calculates the validity of a proposition from a prior estimate and new relevant evidence; Bayes theorem, whose key components are the prior P(h) and the likelihood P(D | h), gives the probability of a hypothesis given the data.
- In the cancer example from Tom Mitchell's book, P(+ | cancer) = 0.98 and P(cancer) = 0.008, so even after a positive test it remains more probable that the patient does not have cancer.
- The MAP hypothesis maximizes P(D | h) P(h); when all prior probabilities are equal, the ML hypothesis maximizes P(D | h).
- For linear regression with Gaussian noise, the maximum likelihood hypothesis is the one that minimizes the sum of squared errors.
- The MAP hypothesis does not necessarily yield the most probable classification; the Bayes optimal classifier maximizes sum_{h_i} P(v_j | h_i) P(h_i | D), but is intractable for large hypothesis spaces.
- Gibbs sampling chooses one hypothesis from the posterior to classify the instance; its expected error is at most twice that of the Bayes optimal classifier.