title

Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning Algorithm | Edureka

description

** Machine Learning Training with Python: https://www.edureka.co/data-science-python-certification-course **
This Edureka video gives you a detailed and comprehensive understanding of the Naive Bayes Classifier algorithm in Python. At the end of the video, you will work through a demo example of Naive Bayes. Below are the topics covered in this tutorial:
1. What is Naive Bayes?
2. Bayes Theorem and its use
3. Mathematical Working of Naive Bayes
4. Step-by-step Programming in Naive Bayes
5. Prediction Using Naive Bayes
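The demo outlined in topics 2 to 5 can be sketched in plain Python. It covers Bayes theorem on a deck of cards, then a Gaussian Naive Bayes built step by step on the Pima Indians diabetes data. This is a minimal sketch under stated assumptions, not the code used in the video. The function names, the seeded shuffle and the 12-row toy dataset standing in for the 768 Pima records are all hypothetical.

The sketch follows the steps described in the video:
- a random 67:33 train/test split;
- separating rows by class;
- computing the mean and standard deviation of each attribute per class;
- evaluating a Gaussian probability density for each attribute value;
- choosing the class with the largest probability.

It also multiplies in each class prior P(class), which a bare product of densities leaves out.

```python
import math
import random


def bayes_posterior(likelihood, prior, evidence):
    """Bayes theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence


def split_dataset(dataset, split_ratio=0.67, seed=1):
    """Shuffle rows with a fixed seed and split them into training and test sets."""
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    train_size = int(len(rows) * split_ratio)
    return rows[:train_size], rows[train_size:]


def separate_by_class(dataset):
    """Group rows by class value (assumed to be the last column)."""
    separated = {}
    for row in dataset:
        separated.setdefault(row[-1], []).append(row)
    return separated


def mean(numbers):
    return sum(numbers) / len(numbers)


def stdev(numbers):
    """Sample standard deviation (N - 1 in the denominator)."""
    avg = mean(numbers)
    return math.sqrt(sum((x - avg) ** 2 for x in numbers) / (len(numbers) - 1))


def summarize(rows):
    """(mean, stdev) for each attribute column; the class column is skipped."""
    attribute_columns = list(zip(*rows))[:-1]
    return [(mean(col), stdev(col)) for col in attribute_columns]


def summarize_by_class(dataset):
    """Map each class to (prior P(class), attribute summaries for that class)."""
    return {
        cls: (len(rows) / len(dataset), summarize(rows))
        for cls, rows in separate_by_class(dataset).items()
    }


def gaussian_probability(x, mu, sigma):
    """Gaussian probability density of x, given an attribute's mean and stdev."""
    exponent = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return exponent / (math.sqrt(2 * math.pi) * sigma)


def class_scores(summaries, row):
    """Prior times the product of per-attribute densities (the naive assumption)."""
    scores = {}
    for cls, (prior, attribute_summaries) in summaries.items():
        score = prior
        for (mu, sigma), x in zip(attribute_summaries, row):
            score *= gaussian_probability(x, mu, sigma)
        scores[cls] = score
    return scores


def predict(summaries, row):
    """Return the class with the largest score."""
    scores = class_scores(summaries, row)
    return max(scores, key=scores.get)


def accuracy(summaries, test_set):
    """Percentage of test rows whose predicted class equals the true class."""
    correct = sum(predict(summaries, row) == row[-1] for row in test_set)
    return correct / len(test_set) * 100.0


# Card example: P(King | Face) = P(Face | King) * P(King) / P(Face) = 1/3
p_king_given_face = bayes_posterior(1.0, 4 / 52, 12 / 52)

# Hypothetical toy data: [attribute_1, attribute_2, class]
toy_data = [
    [1.0, 20.0, 0], [1.2, 21.0, 0], [0.8, 19.5, 0],
    [1.1, 20.5, 0], [0.9, 19.0, 0], [1.3, 21.5, 0],
    [3.0, 30.0, 1], [3.2, 31.0, 1], [2.8, 29.5, 1],
    [3.1, 30.5, 1], [2.9, 29.0, 1], [3.3, 31.5, 1],
]
train, test = split_dataset(toy_data, split_ratio=0.67)
model = summarize_by_class(train)
test_accuracy = accuracy(model, test)
```

In practice, scikit-learn's sklearn.naive_bayes module provides GaussianNB for this model, along with MultinomialNB and BernoulliNB for count and binary features. The sketch makes two assumptions:
- every attribute has a non-zero standard deviation within each class;
- there are at least two training rows per class.

With many attributes, summing log densities instead of multiplying them avoids numeric underflow.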
Check out our playlist for more videos: http://bit.ly/2taym8X
PG in Artificial Intelligence and Machine Learning with NIT Warangal : https://www.edureka.co/post-graduate/machine-learning-and-ai
Post Graduate Certification in Data Science with IIT Guwahati - https://www.edureka.co/post-graduate/data-science-program
(450+ Hrs || 9 Months || 20+ Projects & 100+ Case studies)
Subscribe to our channel to get video updates. Hit the subscribe button above.
#MachineLearningUsingPython #MachineLearningTraning
How it Works?
1. This is a 5-week instructor-led online course with 40 hours of assignments and 20 hours of project work.
2. We provide 24x7 one-on-one LIVE technical support to help you with any problems you might face or any clarifications you may require during the course.
3. At the end of the training, you will work on a real-time project for which we will provide you with a grade and a verifiable certificate!
- - - - - - - - - - - - - - - - -
About the Course
Edureka’s Machine Learning Course using Python is designed to help you grasp the concepts of Machine Learning. The training provides a deep understanding of Machine Learning and how it works. As a Data Scientist, you will learn why Machine Learning matters and how to implement it in the Python programming language. You will also be taught Reinforcement Learning, an important aspect of Artificial Intelligence. You will be able to automate real-life scenarios using Machine Learning algorithms. Towards the end of the course, we will discuss various practical use cases of Machine Learning in Python to enhance your learning experience.
After completing this Machine Learning Certification Training using Python, you should be able to:
Gain insight into the 'Roles' played by a Machine Learning Engineer
Automate data analysis using python
Describe Machine Learning
Work with real-time data
Learn tools and techniques for predictive modeling
Discuss Machine Learning algorithms and their implementation
Validate Machine Learning algorithms
Explain Time Series and its related concepts
Gain expertise to handle business in future, living the present
- - - - - - - - - - - - - - - - - - -
Why learn Machine Learning with Python?
Machine Learning is a set of techniques that enables computers to learn the desired behavior from data without being explicitly programmed. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science. This course exposes you to different classes of machine learning algorithms, such as supervised, unsupervised and reinforcement learning algorithms. It teaches you the necessary skills, such as data pre-processing, dimensionality reduction and model evaluation, and also exposes you to machine learning algorithms such as regression, clustering, decision trees, random forest, Naive Bayes and Q-Learning.
For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll free).
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka

detail

{'title': 'Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning Algorithm | Edureka', 'heatmap': [{'end': 424.616, 'start': 395.508, 'weight': 0.935}, {'end': 965.943, 'start': 927.491, 'weight': 1}], 'summary': 'Covers the concept of naive bayes algorithm, likelihood tables, bayes theorem, industrial applications, model creation, data splitting, gaussian probability calculation, and implementation with scikit-learn, achieving 68% accuracy with a 67:33 training-test split ratio and 0.96 precision and recall on the iris dataset.', 'chapters': [{'end': 367.796, 'segs': [{'end': 74.138, 'src': 'embed', 'start': 47.566, 'weight': 0, 'content': [{'end': 51.909, 'text': 'Now, naive Bayes is a simple but surprisingly powerful algorithm for predictive analysis.', 'start': 47.566, 'duration': 4.343}, {'end': 58.855, 'text': 'It is a classification technique based on Bayes theorem with an assumption of independence among predictors.', 'start': 52.75, 'duration': 6.105}, {'end': 63.532, 'text': 'It comprises two parts: naive and Bayes.', 'start': 59.79, 'duration': 3.742}, {'end': 72.057, 'text': 'In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature,', 'start': 63.532, 'duration': 8.525}, {'end': 74.138, 'text': 'even if these features depend on each other.', 'start': 72.057, 'duration': 2.081}], 'summary': 'Naive bayes: simple but powerful classification algorithm based on bayes theorem with assumption of independence among predictors.', 'duration': 26.572, 'max_score': 47.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM47566.jpg'}, {'end': 219.125, 'src': 'embed', 'start': 189.28, 'weight': 1, 'content': [{'end': 195.585, 'text': 'Now, suppose evidence is provided: for instance, someone looks at the card and sees that the single card is a face card.', 'start': 189.28, 'duration': 6.305}, {'end': 
201.797, 'text': "The probability of King given that it's a face card can be calculated using Bayes theorem with this formula.", 'start': 196.334, 'duration': 5.463}, {'end': 210.241, 'text': "Now, since every King is also a face card, the probability of face, given that it's a king, is equal to 1,", 'start': 202.737, 'duration': 7.504}, {'end': 219.125, 'text': 'and since there are three face cards in each suit, that is, the Jack, King and Queen, the probability of a face card is equal to 12 by 52, that is,', 'start': 210.241, 'duration': 8.884}], 'summary': "Using bayes theorem, the probability of a king given that it's a face card can be calculated as 1/3.", 'duration': 29.845, 'max_score': 189.28, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM189280.jpg'}, {'end': 376.762, 'src': 'embed', 'start': 348.11, 'weight': 2, 'content': [{'end': 355.395, 'text': 'Now, using these terms, Bayes theorem can be rephrased as: the posterior probability equals the prior probability times the likelihood ratio.', 'start': 348.11, 'duration': 7.285}, {'end': 363.854, 'text': 'So now we know the maths involved behind Bayes theorem.', 'start': 360.251, 'duration': 3.603}, {'end': 367.796, 'text': "Let's see how we can implement this in a real-life scenario.", 'start': 364.394, 'duration': 3.402}, {'end': 376.762, 'text': 'So suppose we have a data set in which we have the outlook and the humidity, and we need to find out whether we should play or not on that day.', 'start': 368.597, 'duration': 8.165}], 'summary': 'Bayes theorem expresses the posterior probability as the prior probability times the likelihood ratio; it can be implemented in real-life scenarios, such as determining whether to play based on outlook and humidity.', 'duration': 28.652, 'max_score': 348.11, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM348110.jpg'}], 'start': 7.338, 'title': 'Naive bayes algorithm', 'summary': 
'Introduces the concept of naive bayes algorithm for classification, explaining its logic, steps, and application in probability theory. It also provides an example of bayes theorem using a deck of playing cards.', 'chapters': [{'end': 367.796, 'start': 7.338, 'title': 'Naive bayes algorithm overview', 'summary': 'Introduces the concept of naive bayes algorithm for classification, explaining its logic, steps involved, and its application in probability theory. It also provides an example of bayes theorem using a deck of playing cards.', 'duration': 360.458, 'highlights': ['Naive Bayes algorithm is a classification technique based on Bayes theorem, assuming independence among predictors, and is particularly useful for very large datasets. It is mentioned that the Naive Bayes algorithm is simple, powerful, and useful for very large datasets.', 'Bayes theorem is used to figure out conditional probability, relating the prior and posterior probabilities, and can be rephrased as the posterior probability equals the prior probability times the likelihood ratio. The explanation of Bayes theorem and its rephrasing as the posterior probability equals the prior probability times the likelihood ratio is provided.', "An example of Bayes theorem is given using a deck of playing cards to calculate the probability of a card being a king given that it's a face card. 
An example of Bayes theorem using a deck of playing cards is provided to calculate the probability of a card being a king given that it's a face card."]}], 'duration': 360.458, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM7338.jpg', 'highlights': ['Naive Bayes algorithm is a classification technique based on Bayes theorem, assuming independence among predictors, and is particularly useful for very large datasets.', "An example of Bayes theorem is given using a deck of playing cards to calculate the probability of a card being a king given that it's a face card.", 'Bayes theorem is used to figure out conditional probability, relating the prior and posterior probabilities, and can be rephrased as the posterior probability equals the prior probability times the likelihood ratio.']}, {'end': 559.438, 'segs': [{'end': 424.616, 'src': 'heatmap', 'start': 368.597, 'weight': 1, 'content': [{'end': 376.762, 'text': 'So suppose we have a data set in which we have the outlook and the humidity, and we need to find out whether we should play or not on that day.', 'start': 368.597, 'duration': 8.165}, {'end': 378.884, 'text': 'So the outlook can be sunny,', 'start': 377.323, 'duration': 1.561}, {'end': 381.846, 'text': 'overcast or rain, and the humidity can be high or', 'start': 378.884, 'duration': 2.962}, {'end': 386.689, 'text': 'normal, and the wind is categorized into two features, which are weak and strong winds.', 'start': 381.846, 'duration': 4.843}, {'end': 391.826, 'text': 'First of all, we will create a frequency table using each attribute of the data set.', 'start': 387.645, 'duration': 4.181}, {'end': 395.107, 'text': 'So the frequency table for the outlook looks like this.', 'start': 392.727, 'duration': 2.38}, {'end': 402.59, 'text': 'We have sunny, overcast and rainy. The frequency table of humidity looks like this, and the frequency table of wind looks like this.', 'start': 395.508, 'duration': 7.082}, {'end': 
407.712, 'text': 'We have strong and weak for wind, and high and normal ranges for humidity.', 'start': 403.25, 'duration': 4.462}, {'end': 411.933, 'text': 'Now, for each frequency table, we will generate a likelihood table.', 'start': 408.472, 'duration': 3.461}, {'end': 417.113, 'text': 'Now the likelihood table contains the probability of a particular day.', 'start': 412.631, 'duration': 4.482}, {'end': 424.616, 'text': 'Suppose we take Sunny, and we take play as yes and no; then we compute the probability of Sunny given that we play.', 'start': 417.633, 'duration': 6.983}], 'summary': 'Analyzing data to predict play based on outlook, humidity, and wind features', 'duration': 43.336, 'max_score': 368.597, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM368597.jpg'}, {'end': 572.465, 'src': 'embed', 'start': 538.078, 'weight': 0, 'content': [{'end': 546.603, 'text': 'Now, to get the probability of yes for playing on that day, we just need to divide its likelihood by the sum of the likelihoods of both yes and no.', 'start': 538.078, 'duration': 8.525}, {'end': 551.374, 'text': 'So the probability of playing tomorrow, which is yes, is 0.55.', 'start': 547.412, 'duration': 3.962}, {'end': 555.536, 'text': 'The probability of not playing is equal to 0.45.', 'start': 551.374, 'duration': 4.162}, {'end': 559.438, 'text': 'But this is based upon the data which we already have with us.', 'start': 555.536, 'duration': 3.902}, {'end': 572.465, 'text': 'So now you have an idea of what exactly naive Bayes is, how it works, and how it can be implemented on a particular data set.', 'start': 559.458, 'duration': 13.007}], 'summary': 'Naive bayes gives a 55% probability of playing and a 45% probability of not playing based on available data.', 'duration': 34.387, 'max_score': 538.078, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM538078.jpg'}], 'start': 
368.597, 'title': 'Likelihood tables and bayes theorem', 'summary': 'Explains the process of creating likelihood tables for weather attributes and using bayes theorem to calculate the probability of playing based on weather conditions, resulting in a 0.55 probability of playing and 0.45 probability of not playing.', 'chapters': [{'end': 411.933, 'start': 368.597, 'title': 'Likelihood table creation', 'summary': 'Explains the process of creating frequency and likelihood tables for attributes like outlook, humidity, and wind in order to make decisions based on the data set, with a focus on categorizing features and generating frequency and likelihood tables.', 'duration': 43.336, 'highlights': ['The process involves creating frequency tables for each attribute of the data set, including outlook, humidity, and wind. The frequency tables categorize attributes like outlook, humidity, and wind.', 'The attributes like outlook, humidity, and wind are categorized into specific features such as sunny, overcast, rainy for outlook and high, normal for humidity. Attributes such as outlook and humidity are categorized into specific features for analysis.', 'Likelihood tables are generated for each frequency table created in the previous step. Likelihood tables are created based on the frequency tables for further analysis.']}, {'end': 559.438, 'start': 412.631, 'title': 'Likelihood table and bayes theorem', 'summary': 'Discusses the calculation of likelihood tables for different weather conditions and the use of bayes theorem to determine the probability of playing based on given weather conditions, resulting in a 0.55 probability of playing and 0.45 probability of not playing.', 'duration': 146.807, 'highlights': ['The likelihood table is created for different weather attributes, including outlook, humidity, and wind, with calculated probabilities for playing yes and no. 
Likelihood of yes for sunny: 0.59, likelihood of no for sunny: 0.40', 'Bayes theorem is used to calculate the likelihood of playing yes or no based on given weather conditions, resulting in likelihoods of 0.019 for playing yes and 0.016 for playing no on a specific day with a rainy outlook, high humidity, and weak wind. Probability of playing yes: 0.55, probability of not playing: 0.45']}], 'duration': 190.841, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM368597.jpg', 'highlights': ['Bayes theorem calculates probability of playing based on weather conditions: 0.55 probability of playing and 0.45 probability of not playing', 'Likelihood tables are generated for each frequency table created in the previous step', 'The process involves creating frequency tables for each attribute of the data set, including outlook, humidity, and wind']}, {'end': 1194.239, 'segs': [{'end': 585.349, 'src': 'embed', 'start': 559.458, 'weight': 2, 'content': [{'end': 572.465, 'text': 'So now you have an idea of what exactly naive Bayes is, how it works, and how it can be implemented on a particular data set.', 'start': 559.458, 'duration': 13.007}, {'end': 574.966, 'text': "Let's see where it is used in the industry.", 'start': 573.045, 'duration': 1.921}, {'end': 580.224, 'text': 'Now, starting with our first industrial use case, which is news categorization,', 'start': 576.121, 'duration': 4.103}, {'end': 585.349, 'text': 'or we can use the term text classification to broaden the spectrum of this algorithm.', 'start': 580.224, 'duration': 5.125}], 'summary': 'Naive bayes used in news categorization and text classification in industry.', 'duration': 25.891, 'max_score': 559.458, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM559458.jpg'}, {'end': 657.792, 'src': 'embed', 'start': 627.51, 'weight': 4, 'content': [{'end': 635.654, 'text': 'from the documents or the 
articles, and then we apply the naive Bayes classifier for classifying the news contents based on the news code.', 'start': 627.51, 'duration': 8.144}, {'end': 643.64, 'text': 'Now, this is by far one of the best examples of the naive Bayes classifier, which is spam filtering.', 'start': 636.394, 'duration': 7.246}, {'end': 649.125, 'text': "Now, naive Bayes classifiers are a popular statistical technique for email filtering.", 'start': 644.461, 'duration': 4.664}, {'end': 657.792, 'text': 'They typically use bag-of-words features to identify spam email, an approach commonly used in text classification as well.', 'start': 649.625, 'duration': 8.167}], 'summary': 'Naive bayes classifier used for news content classification, popular for email filtering.', 'duration': 30.282, 'max_score': 627.51, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM627510.jpg'}, {'end': 830.886, 'src': 'embed', 'start': 810.5, 'weight': 0, 'content': [{'end': 824.644, 'text': 'An empirical comparison of naive Bayes versus five popular classifiers on 15 medical data sets shows that naive Bayes is well suited for medical applications and has high performance in most of the examined medical problems.', 'start': 810.5, 'duration': 14.144}, {'end': 830.886, 'text': 'Now, in the past, various statistical methods have been used for modeling in the area of disease diagnosis.', 'start': 825.365, 'duration': 5.521}], 'summary': 'Naive bayes performs well against 5 popular classifiers on 15 medical datasets, showing high performance for disease diagnosis.', 'duration': 20.386, 'max_score': 810.5, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM810500.jpg'}, {'end': 910.324, 'src': 'embed', 'start': 886.74, 'weight': 3, 'content': [{'end': 895.921, 'text': 'This domain remains a research topic in which scientists and mathematicians are working to produce a model or an algorithm that will accurately predict the 
weather.', 'start': 886.74, 'duration': 9.181}, {'end': 899.922, 'text': 'Now, a Bayesian approach-based model is created,', 'start': 895.921, 'duration': 4.001}, {'end': 906.683, 'text': 'where posterior probabilities are used to calculate the likelihood of each class label for an input data instance,', 'start': 899.922, 'duration': 6.761}, {'end': 910.324, 'text': 'and the one with the maximum likelihood is considered the resulting output.', 'start': 906.683, 'duration': 3.641}], 'summary': 'Scientists are developing a bayesian approach model to predict weather accurately.', 'duration': 23.584, 'max_score': 886.74, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM886740.jpg'}, {'end': 955.053, 'src': 'embed', 'start': 927.491, 'weight': 5, 'content': [{'end': 930.833, 'text': 'Now, there are three types of naive Bayes model in the scikit-learn library.', 'start': 927.491, 'duration': 3.342}, {'end': 938.157, 'text': 'The first one is Gaussian: it is used in classification and assumes that the features follow a normal distribution.', 'start': 931.493, 'duration': 6.664}, {'end': 940.52, 'text': 'Next, we have multinomial.', 'start': 938.919, 'duration': 1.601}, {'end': 942.182, 'text': 'It is used for discrete counts.', 'start': 940.701, 'duration': 1.481}, {'end': 948.127, 'text': "For example, let's say we have a text classification problem. Here, compared with Bernoulli trials,", 'start': 942.242, 'duration': 5.885}, {'end': 955.053, 'text': 'we go one step further: instead of whether a word occurs in the document, we count how often the word occurs in the document.', 'start': 948.127, 'duration': 6.926}], 'summary': 'Scikit-learn library has three types of naive bayes models: gaussian, multinomial, and bernoulli for different data distributions and classification needs.', 'duration': 27.562, 'max_score': 927.491, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM927491.jpg'}, {'end': 965.943, 'src': 'heatmap', 'start': 927.491, 'weight': 1, 'content': [{'end': 930.833, 'text': 'Now, there are three types of naive Bayes model in the scikit-learn library.', 'start': 927.491, 'duration': 3.342}, {'end': 938.157, 'text': 'The first one is Gaussian: it is used in classification and assumes that the features follow a normal distribution.', 'start': 931.493, 'duration': 6.664}, {'end': 940.52, 'text': 'Next, we have multinomial.', 'start': 938.919, 'duration': 1.601}, {'end': 942.182, 'text': 'It is used for discrete counts.', 'start': 940.701, 'duration': 1.481}, {'end': 948.127, 'text': "For example, let's say we have a text classification problem. Here, compared with Bernoulli trials,", 'start': 942.242, 'duration': 5.885}, {'end': 955.053, 'text': 'we go one step further: instead of whether a word occurs in the document, we count how often the word occurs in the document.', 'start': 948.127, 'duration': 6.926}, {'end': 961.879, 'text': 'You can think of it as the number of times each outcome is observed over the given number of trials.', 'start': 955.634, 'duration': 6.245}, {'end': 965.943, 'text': 'And finally, we have the Bernoulli type of naive Bayes.', 'start': 962.66, 'duration': 3.283}], 'summary': 'Scikit-learn has three types of naive bayes models: gaussian, multinomial, and bernoulli, used for classification and discrete counts.', 'duration': 38.452, 'max_score': 927.491, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM927491.jpg'}, {'end': 1021.228, 'src': 'embed', 'start': 994.235, 'weight': 1, 'content': [{'end': 999.958, 'text': 'and what are the different steps one can take to create a Bayesian model and use naive Bayes to predict the output.', 'start': 994.235, 'duration': 5.723}, {'end': 1003.88, 'text': 'So here, to understand it better, we are going to 
predict the onset of diabetes.', 'start': 1000.578, 'duration': 3.302}, {'end': 1011.783, 'text': 'Now, this problem comprises 768 observations of medical details for Pima Indian patients.', 'start': 1004.72, 'duration': 7.063}, {'end': 1021.228, 'text': 'The records describe instantaneous measurements taken from the patients, such as their age, the number of times pregnant, and blood workup.', 'start': 1012.644, 'duration': 8.584}], 'summary': 'Create a bayesian model using naive bayes to predict diabetes onset from 768 pima indian patient observations.', 'duration': 26.993, 'max_score': 994.235, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM994235.jpg'}], 'start': 559.458, 'title': 'Naive bayes in industry', 'summary': 'Explores the industrial applications of naive bayes, including news categorization, spam filtering, medical data analysis, object detection, weather prediction, and diabetes onset prediction, showcasing its effectiveness in various domains and its usage in the scikit-learn library for model creation.', 'chapters': [{'end': 1194.239, 'start': 559.458, 'title': 'Naive bayes in industry', 'summary': 'Explores the industrial applications of naive bayes, including news categorization, spam filtering, medical data analysis, object detection, weather prediction, and diabetes onset prediction, showcasing its effectiveness in various domains and its usage in the scikit-learn library for model creation.', 'duration': 634.781, 'highlights': ['Naive Bayes is effectively used in news categorization and text classification, where it helps in classifying news articles based on user preferences and removing heterogeneity in layout and categorization. 
News categorization, text classification, web crawling, tokenization, stop words removal, spam filtering, bag-of-words features, email filtering, statistical technique, and scikit-learn library usage.', 'Naive Bayes is a popular technique for email filtering, using bag-of-words features to identify spam emails and achieving low false-positive spam detection rates, making it one of the oldest ways of spam filtering with roots in the 1990s. Email filtering, spam detection, bag-of-words features, low false-positive rates, and historical roots in the 1990s.', 'Naive Bayes is widely applied in medical data analysis, showing high performance in various medical problems and being well-suited for medical applications, as evidenced by empirical comparisons with other classifiers on 15 medical datasets. Medical data analysis, high performance, empirical comparisons, 15 medical datasets, and suitability for medical applications.', 'Naive Bayes is used for predicting weather, which has been a challenging problem for the meteorological department, with a focus on creating a model that accurately predicts weather using a Bayesian approach and posterior probabilities. Weather prediction, Bayesian approach, posterior probabilities, and accuracy in weather prediction.', 'Naive Bayes is utilized for predicting the onset of diabetes using the Pima Indian diabetes dataset and involves handling the data, summarizing it, making predictions, and evaluating accuracy. Diabetes onset prediction, Pima Indian diabetes dataset, data handling, summarization, prediction, and accuracy evaluation.', 'The scikit-learn library provides three types of naive Bayes models (Gaussian, multinomial, and Bernoulli), each suited for different types of data and text classification problems. 
Scikit-learn library, Gaussian, multinomial, Bernoulli, and text classification.']}], 'duration': 634.781, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM559458.jpg', 'highlights': ['Naive Bayes is widely applied in medical data analysis, showing high performance in various medical problems and being well-suited for medical applications, as evidenced by empirical comparisons with other classifiers on 15 medical datasets.', 'Naive Bayes is utilized for predicting the onset of diabetes using the Pima Indian diabetes dataset and involves handling the data, summarizing it, making predictions, and evaluating accuracy.', 'Naive Bayes is effectively used in news categorization and text classification, where it helps in classifying news articles based on user preferences and removing heterogeneity in layout and categorization.', 'Naive Bayes is used for predicting weather, which has been a challenging problem for the meteorological department, with a focus on creating a model that accurately predicts weather using a Bayesian approach and posterior probabilities.', 'Naive Bayes is a popular technique for email filtering, using bag-of-words features to identify spam emails and achieving low false-positive spam detection rates, making it one of the oldest ways of spam filtering with roots in the 1990s.', 'The scikit-learn library provides three types of naive Bayes models (Gaussian, multinomial, and Bernoulli), each suited for different types of data and text classification problems.']}, {'end': 1428.084, 'segs': [{'end': 1219.597, 'src': 'embed', 'start': 1194.239, 'weight': 0, 'content': [{'end': 1204.028, 'text': 'we need to split the data into a training data set that naive Bayes can use to make predictions and a test data set that we can use to evaluate the accuracy of the model.', 'start': 1194.239, 'duration': 9.789}, {'end': 1212.235, 'text': 'We need to split the data set randomly into training and testing data set in 
a ratio that is usually 70 to 30.', 'start': 1204.809, 'duration': 7.426}, {'end': 1216.035, 'text': 'But for this example, I am going to use 67 and 33.', 'start': 1212.235, 'duration': 3.8}, {'end': 1219.597, 'text': 'Now, 70:30 is a common ratio for testing algorithms.', 'start': 1216.035, 'duration': 3.562}], 'summary': 'Data split into 67% training and 33% testing for model evaluation.', 'duration': 25.358, 'max_score': 1194.239, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1194239.jpg'}, {'end': 1269.561, 'src': 'embed', 'start': 1243.18, 'weight': 1, 'content': [{'end': 1247.544, 'text': 'Now, for example, if there are two class values and seven numerical attributes,', 'start': 1243.18, 'duration': 4.364}, {'end': 1255.85, 'text': 'then we need a mean and a standard deviation for each combination of attribute and class value, which makes 14 attribute summaries.', 'start': 1247.544, 'duration': 8.306}, {'end': 1264.097, 'text': 'So we can break the preparation of this summary down into the following subtasks: separating data by class, calculating the mean,', 'start': 1256.631, 'duration': 7.466}, {'end': 1269.561, 'text': 'calculating the standard deviation, summarizing the data set and summarizing attributes by class.', 'start': 1264.097, 'duration': 5.464}], 'summary': 'To summarize data with 2 class values and 7 numerical attributes, we need 14 attribute summaries including mean and standard deviation.', 'duration': 26.381, 'max_score': 1243.18, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1243180.jpg'}, {'end': 1379.96, 'src': 'embed', 'start': 1349.057, 'weight': 3, 'content': [{'end': 1351.84, 'text': 'we can calculate the mean and standard deviation for each attribute.', 'start': 1349.057, 'duration': 2.783}, {'end': 1362.689, 'text': "Now, this function groups the values for each attribute across our 
data instances into their own lists so that we can compute the mean and standard deviation values for each attribute.", 'start': 1352.603, 'duration': 10.086}, {'end': 1365.971, 'text': 'Next comes summarizing attributes by class.', 'start': 1363.67, 'duration': 2.301}, {'end': 1372.315, 'text': 'We can pull it all together by first separating our training data set into instances grouped by class,', 'start': 1366.131, 'duration': 6.184}, {'end': 1374.697, 'text': 'then calculating the summaries for each attribute.', 'start': 1372.315, 'duration': 2.382}, {'end': 1379.96, 'text': 'Now, we are ready to make predictions using the summaries prepared from our training data.', 'start': 1375.557, 'duration': 4.403}], 'summary': 'Calculate mean and standard deviation for each attribute, summarize attributes by class, and make predictions using training data.', 'duration': 30.903, 'max_score': 1349.057, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1349057.jpg'}, {'end': 1428.084, 'src': 'embed', 'start': 1386.522, 'weight': 4, 'content': [{'end': 1389.742, 'text': 'then selecting the class with the largest probability as a prediction.', 'start': 1386.522, 'duration': 3.22}, {'end': 1400.145, 'text': 'Now we can divide this whole method into four tasks: calculating the Gaussian probability density function, calculating class probabilities,', 'start': 1390.563, 'duration': 9.582}, {'end': 1402.926, 'text': 'making a prediction and then estimating the accuracy.', 'start': 1400.145, 'duration': 2.781}, {'end': 1406.518, 'text': 'Now, to calculate the Gaussian probability density function,', 'start': 1403.817, 'duration': 2.701}, {'end': 1411.799, 'text': 'we use the Gaussian function to estimate the probability of a given attribute value,', 'start': 1407.058, 'duration': 4.741}, {'end': 1416.421, 'text': 'given the known mean and the standard deviation of the attribute estimated from the training data.', 
'start': 1411.799, 'duration': 4.622}, {'end': 1424.503, 'text': 'As you can see, the parameters are x, the mean, and the standard deviation. Now, in the calculate probability function,', 'start': 1417.661, 'duration': 6.842}, {'end': 1428.084, 'text': 'we calculate the exponent first, then calculate the main division.', 'start': 1424.543, 'duration': 3.541}], 'summary': 'The method involves 4 tasks: calculating the Gaussian probability density function, calculating class probabilities, making a prediction, and estimating accuracy.', 'duration': 41.562, 'max_score': 1386.522, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1386522.jpg'}], 'start': 1194.239, 'title': 'Naive Bayes model, data splitting, and Gaussian probability calculation', 'summary': 'Covers the process of splitting data into 67:33 training and testing sets, preparing attribute summaries for a naive Bayes model, calculating the Gaussian probability density function, calculating class probabilities, making predictions, and estimating accuracy.', 'chapters': [{'end': 1310.504, 'start': 1194.239, 'title': 'Naive Bayes model and data splitting', 'summary': 'Explains the process of splitting data into training and testing sets in the ratio of 67:33, and the preparation of attribute summaries for a naive Bayes model, which involves calculating the mean and standard deviation of each attribute by class value.', 'duration': 116.265, 'highlights': ['The data set is split randomly into training and testing sets in the ratio of 67:33, and the ratio can be adjusted when testing the algorithm.', 'The summary of the training data consists of the mean and standard deviation of each attribute by class value, giving 14 attribute summaries for two class values and seven numerical attributes.', 
'The subtasks for preparing the summary of the data set are separating data by class, calculating the mean, calculating the standard deviation, and summarizing attributes by class.']}, {'end': 1428.084, 'start': 1310.504, 'title': 'Gaussian probability calculation in machine learning', 'summary': 'Explains the process of calculating the Gaussian probability density function, calculating class probabilities, making predictions, and estimating accuracy, with a focus on summarizing data and calculating the mean and standard deviation of each attribute.', 'duration': 117.58, 'highlights': ['The process involves calculating the mean and standard deviation for each attribute, summarizing attributes by class, and making predictions using the summaries prepared from the training data.', 'The method can be divided into four tasks: calculating the Gaussian probability density function, calculating class probabilities, making predictions, and estimating accuracy.', 'To calculate the Gaussian probability density function, the Gaussian function is used to estimate the probability of a given attribute value, given the mean and standard deviation of the attribute estimated from the training data.']}], 'duration': 233.845, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1194239.jpg', 'highlights': ['The data set is split randomly into training and testing sets in the ratio of 67:33, and the ratio can be adjusted when testing the algorithm.', 'The summary of the training data involves the mean and standard deviation of each attribute by class 
value, resulting in 14 attribute summaries for two class values and seven numerical attributes.', 'The preparation of the summary involves subtasks such as separating data by class, calculating the mean, calculating the standard deviation, and summarizing attributes by class.', 'The process involves calculating the mean and standard deviation for each attribute, summarizing attributes by class, and making predictions using the summaries prepared from the training data.', 'The method can be divided into four tasks: calculating the Gaussian probability density function, calculating class probabilities, making predictions, and estimating accuracy.', 'To calculate the Gaussian probability density function, the Gaussian function is used to estimate the probability of a given attribute value, given the mean and standard deviation of the attribute estimated from the training data.']}, {'end': 1590.956, 'segs': [{'end': 1455.637, 'src': 'embed', 'start': 1428.644, 'weight': 0, 'content': [{'end': 1431.865, 'text': 'This lets us fit the equation nicely into two lines.', 'start': 1428.644, 'duration': 3.221}, {'end': 1436.137, 'text': 'Now the next task is calculating the class probabilities.', 'start': 1432.714, 'duration': 3.423}, {'end': 1440.241, 'text': 'Now that we can calculate the probability of an attribute belonging to a class,', 'start': 1436.137, 'duration': 4.104}, {'end': 1448.81, 'text': 'we can combine the probabilities of all the attribute values for a data instance and come up with a probability of the entire data instance belonging to the class.', 'start': 1440.241, 'duration': 8.569}, {'end': 1452.474, 'text': 'So now that we have calculated the class probabilities,', 'start': 1449.952, 'duration': 2.522}, {'end': 1455.637, 'text': "it's time to finally make our first prediction.", 'start': 1452.575, 'duration': 3.062}], 'summary': 'Calculate class probabilities and make the first prediction.', 'duration': 26.993, 'max_score': 1428.644, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1428644.jpg'}, {'end': 1590.956, 'src': 'embed', 'start': 1534.823, 'weight': 1, 'content': [{'end': 1538.044, 'text': 'We are also using the get predictions and get accuracy methods.', 'start': 1534.823, 'duration': 3.221}, {'end': 1549.868, 'text': 'So, guys, as you can see, the output shows that we are splitting the 768 rows into 514 training rows and 254', 'start': 1539.284, 'duration': 10.584}, {'end': 1555.71, 'text': 'test rows, and the accuracy of this model is 68%.', 'start': 1549.868, 'duration': 5.842}, {'end': 1560.031, 'text': 'Now we can play with the proportions of training and test data used here.', 'start': 1555.71, 'duration': 4.321}, {'end': 1567.83, 'text': 'For example, we can change the split ratio to 70:30 or 80:20 to get a different accuracy.', 'start': 1560.686, 'duration': 7.144}, {'end': 1572.313, 'text': 'So suppose I change the split ratio from 0.67 to 0.8.', 'start': 1568.671, 'duration': 3.642}, {'end': 1578.276, 'text': 'As you can see, we get an accuracy of 62%,', 'start': 1572.313, 'duration': 5.963}, {'end': 1585.894, 'text': 'so splitting at 0.67 gave us a better result, which was 68%.', 'start': 1578.276, 'duration': 7.618}, {'end': 1590.956, 'text': 'So this is how you can implement a Gaussian naive Bayes classifier.', 'start': 1585.894, 'duration': 5.062}], 'summary': 'Splitting 768 rows into 514 training and 254 test rows, with 68% accuracy. 
Adjusting the split ratio affects accuracy.', 'duration': 56.133, 'max_score': 1534.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1534823.jpg'}], 'start': 1428.644, 'title': 'Implementing a naive Bayes classifier', 'summary': 'Explains the process of calculating class probabilities, making predictions, and evaluating the accuracy of a naive Bayes classifier on a dataset of 768 rows, which reaches an accuracy of 68% with a 67:33 training-test split, and demonstrates the impact of changing the split ratio on accuracy.', 'chapters': [{'end': 1590.956, 'start': 1428.644, 'title': 'Implementing a naive Bayes classifier', 'summary': 'Explains the process of calculating class probabilities, making predictions, and evaluating the accuracy of a naive Bayes classifier on a dataset of 768 rows, which reaches an accuracy of 68% with a 67:33 training-test split, and demonstrates the impact of changing the split ratio on accuracy.', 'duration': 162.312, 'highlights': ['The model achieved an accuracy of 68% with a 67:33 training-test split ratio.', 'The process of calculating class probabilities, making predictions, and evaluating the accuracy of the naive Bayes classifier is explained step by step.', 'Demonstrates the impact of changing the split ratio on accuracy, with an 80:20 split giving 62% accuracy.']}], 'duration': 162.312, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1428644.jpg', 'highlights': ['The process of calculating class probabilities, making predictions, and evaluating the accuracy of the naive Bayes classifier is explained step by step.', 'The model achieved an accuracy of 68% with a 67:33 training-test split ratio.', 'Demonstrates the impact of changing the split ratio on accuracy, with an 80:20 split giving 62% accuracy.']}, 
{'end': 1817.32, 'segs': [{'end': 1688.303, 'src': 'embed', 'start': 1664.454, 'weight': 1, 'content': [{'end': 1671.231, 'text': 'So here we are going to use the GaussianNB model, which is already present in the sklearn (scikit-learn) library.', 'start': 1664.454, 'duration': 6.777}, {'end': 1680.337, 'text': 'So first of all, we need to import the sklearn datasets and metrics modules, and we also need to import GaussianNB.', 'start': 1672.532, 'duration': 7.805}, {'end': 1688.303, 'text': 'Now, once all these libraries are loaded, we need to load the data set, which is the iris data set.', 'start': 1681.858, 'duration': 6.445}], 'summary': 'Using the Gaussian NB model from the sklearn library to analyze the iris data set.', 'duration': 23.849, 'max_score': 1664.454, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1664454.jpg'}, {'end': 1757.949, 'src': 'embed', 'start': 1732.952, 'weight': 0, 
'content': [{'end': 1739.054, 'text': 'So the expected output is dataset.target, and the predicted output is produced by the model,', 'start': 1732.952, 'duration': 6.102}, {'end': 1744.156, 'text': 'and the model we are using here is GaussianNB. Now, to summarize the model that was created,', 'start': 1739.054, 'duration': 5.102}, {'end': 1748.937, 'text': 'we calculate the confusion matrix and the classification report.', 'start': 1744.196, 'duration': 4.741}, {'end': 1757.949, 'text': 'So, guys, as you can see in the classification report, we have a precision of 0.96 and a recall of 0.96.', 'start': 1750.078, 'duration': 7.871}], 'summary': 'Using the Gaussian NB model, we achieved a precision and recall of 0.96 in the classification report.', 'duration': 24.997, 'max_score': 1732.952, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1732952.jpg'}], 'start': 1591.616, 'title': 'Using Gaussian NB with scikit-learn', 'summary': 'Demonstrates how easily the Gaussian naive Bayes model from the scikit-learn library can be fitted to the iris dataset, achieving a precision and recall of 0.96.', 'chapters': [{'end': 1817.32, 'start': 1591.616, 'title': 'Using Gaussian NB with scikit-learn', 'summary': 'Demonstrates how easily the Gaussian naive Bayes model from the scikit-learn library can be fitted to the iris dataset, achieving a precision and recall of 0.96.', 'duration': 225.704, 'highlights': ['The chapter demonstrates how easily the Gaussian Naive Bayes model from the scikit-learn library can be fitted to the iris dataset.', 'The scikit-learn Gaussian NB model achieves a precision and recall of 0.96 on the iris dataset.']}], 'duration': 225.704, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vz_xuxYS2PM/pics/vz_xuxYS2PM1591616.jpg', 'highlights': ['The scikit-learn Gaussian NB model achieves a precision and recall of 0.96 on the iris dataset.', 'The chapter demonstrates how easily the 
Gaussian Naive Bayes model from the scikit-learn library can be fitted to the iris dataset.']}], 'highlights': ['The naive Bayes algorithm is a classification technique based on Bayes theorem that assumes independence among predictors, and it is particularly useful for very large datasets.', 'Naive Bayes is widely applied in medical data analysis and performs well on a range of medical problems, as shown by empirical comparisons with other classifiers on 15 medical datasets.', 'The scikit-learn library provides three types of naive Bayes models (Gaussian, multinomial, and Bernoulli), each suited to different types of data, including text classification problems.', 'The model achieved an accuracy of 68% with a 67:33 training-test split ratio.', 'The scikit-learn Gaussian NB model achieves a precision and recall of 0.96 on the iris dataset.']}
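The from-scratch pipeline walked through above (separate the training rows by class, summarize each attribute by its mean and standard deviation, score each class with the Gaussian probability density function, pick the most probable class, then measure accuracy on a random train/test split) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code shown in the video: every function name is hypothetical, the class value is assumed to sit in the last column, and the two-class data set is synthetic, generated only for illustration. The transcript does not mention class priors; this sketch multiplies each class score by its prior (the share of training rows in that class), as standard naive Bayes does.

```python
import math
import random
from statistics import mean, stdev  # stdev is the sample (N - 1) standard deviation


def separate_by_class(dataset):
    """Group rows by class value (assumed to be the last column)."""
    separated = {}
    for row in dataset:
        separated.setdefault(row[-1], []).append(row)
    return separated


def summarize(rows):
    """(mean, stdev) for each attribute column, skipping the class column.

    stdev needs at least two rows per class.
    """
    columns = list(zip(*rows))[:-1]
    return [(mean(col), stdev(col)) for col in columns]


def summarize_by_class(dataset):
    """Map each class to (prior, per-attribute summaries)."""
    separated = separate_by_class(dataset)
    return {
        cls: (len(rows) / len(dataset), summarize(rows))
        for cls, rows in separated.items()
    }


def gaussian_pdf(x, mu, sigma):
    """Gaussian density: compute the exponent first, then the main division."""
    exponent = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return exponent / (math.sqrt(2 * math.pi) * sigma)


def class_probabilities(summaries, row):
    """Unnormalized score per class: prior times the product of attribute densities."""
    scores = {}
    for cls, (prior, attr_summaries) in summaries.items():
        score = prior
        # zip() stops at the number of attributes, so a trailing class value is ignored.
        for (mu, sigma), x in zip(attr_summaries, row):
            score *= gaussian_pdf(x, mu, sigma)
        scores[cls] = score
    return scores


def predict(summaries, row):
    """Return the class with the largest score."""
    scores = class_probabilities(summaries, row)
    return max(scores, key=scores.get)


def split_dataset(dataset, split_ratio, seed=None):
    """Randomly split rows into train/test sets by the given ratio."""
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    train_size = int(len(rows) * split_ratio)
    return rows[:train_size], rows[train_size:]


def accuracy(test_set, predictions):
    """Percentage of test rows whose class value matches the prediction."""
    correct = sum(1 for row, pred in zip(test_set, predictions) if row[-1] == pred)
    return correct / len(test_set) * 100.0


def make_toy_data(n_per_class=50, seed=0):
    """Synthetic two-class data (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_per_class):
        rows.append([rng.gauss(1.0, 0.2), rng.gauss(20.0, 1.0), 0])
        rows.append([rng.gauss(3.0, 0.2), rng.gauss(30.0, 1.0), 1])
    return rows


if __name__ == "__main__":
    data = make_toy_data()
    train, test = split_dataset(data, 0.67, seed=1)
    summaries = summarize_by_class(train)
    predictions = [predict(summaries, row) for row in test]
    print(f"Split {len(data)} rows into train={len(train)} and test={len(test)} rows")
    print(f"Accuracy: {accuracy(test, predictions):.1f}%")
```

With `split_ratio=0.67`, `int(768 * 0.67)` gives 514 training rows and 254 test rows, which matches the split reported in the walkthrough. Because the split is random, the accuracy from a single split varies between runs and seeds, so a gap such as 68% versus 62% between two split ratios may be partly noise. With many attributes, summing log densities instead of multiplying densities avoids floating-point underflow.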
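The scikit-learn steps described in the transcript (import `datasets`, `metrics` and `GaussianNB`, load the iris data set, fit, then compare `dataset.target` with the predictions through a confusion matrix and classification report) might look like the sketch below. It assumes scikit-learn is installed. Because the expected labels are `dataset.target`, the walkthrough appears to score the model on the same rows it was fitted on, so the 0.96 precision and recall are best read as training-set figures; the held-out split at the end is an addition for comparison and is not part of the video.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the iris data set (150 rows, 4 numeric features, 3 classes).
dataset = datasets.load_iris()

# Fit Gaussian naive Bayes on the full data set, as in the walkthrough.
model = GaussianNB()
model.fit(dataset.data, dataset.target)

# Compare the known labels with the model's predictions on the same rows.
expected = dataset.target
predicted = model.predict(dataset.data)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

# Addition for comparison: score on rows the model has not seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.33, random_state=42
)
held_out = GaussianNB().fit(X_train, y_train)
print("held-out accuracy:", metrics.accuracy_score(y_test, held_out.predict(X_test)))
```

The `test_size=0.33` and `random_state=42` values are arbitrary choices for the sketch; a 0.33 test fraction simply mirrors the 67:33 split used earlier.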