title
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With Example | Simplilearn
description
🔥 Purdue Post Graduate Program In AI And Machine Learning: https://www.simplilearn.com/pgp-ai-machine-learning-certification-training-course?utm_campaign=TempLink1&utm_medium=DescriptionFirstFold&utm_source=youtube
🔥IIT Kanpur Professional Certificate Course In AI And Machine Learning (India Only): https://www.simplilearn.com/iitk-professional-certificate-course-ai-machine-learning?utm_campaign=TempLink1&utm_medium=DescriptionFirstFold&utm_source=youtube
🔥AI & Machine Learning Bootcamp(US Only): https://www.simplilearn.com/ai-machine-learning-bootcamp?utm_campaign=TempLink1&utm_medium=DescriptionFirstFold&utm_source=youtube
🔥AI Engineer Masters Program (Discount Code - YTBE15): https://www.simplilearn.com/masters-in-artificial-intelligence?utm_campaign=TempLink1&utm_medium=DescriptionFirstFold&utm_source=youtube
🔥 Professional Certificate Program In Generative AI And Machine Learning (India Only) - https://www.simplilearn.com/iitr-generative-ai-machine-learning-program?utm_campaign=TempLink1&utm_medium=DescriptionFirstFold&utm_source=youtube
This Naive Bayes Classifier tutorial video will introduce you to the basic concepts of the Naive Bayes classifier, the Naive Bayes algorithm, and Bayes' theorem in general. You will understand conditional probability, where the Naive Bayes classifier is used, and how the Naive Bayes algorithm works. By the end of this video, you will also implement the Naive Bayes algorithm for text classification in Python.
Dataset Link - https://drive.google.com/drive/folders/1yqGMb98BG2rdP2CP8o6dipuZBt7Hexfd
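The text-classification demo can be sketched roughly as below with scikit-learn's TfidfVectorizer and MultinomialNB, the tools the video's demo uses. The headlines and labels here are toy data invented for illustration; the real dataset is at the link above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy headlines and labels, invented for illustration only
headlines = [
    "stock markets rally as tech shares surge",
    "central bank raises interest rates again",
    "team wins championship after dramatic final",
    "star striker scores hat trick in derby",
]
labels = ["business", "business", "sports", "sports"]

# TF-IDF turns each headline into a weighted word-count vector;
# multinomial Naive Bayes then scores each class via Bayes' rule
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(headlines, labels)

print(model.predict(["shares fall after interest rate decision"])[0])
```

Words the vectorizer has never seen ("fall", "decision") are simply ignored at prediction time; the known words "shares" and "interest" pull the headline toward the business class.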
The topics covered in this Naive Bayes video are as follows:
00:00 - 01:06 Introduction and Agenda
01:06 - 05:45 What is Naive Bayes?
05:45 - 06:30 Why do we need Naive Bayes?
06:30 - 20:17 Understanding Naive Bayes Classifier
20:17 - 22:36 Advantages of Naive Bayes Classifier
22:36 - 43:45 Demo - Text Classification using Naive Bayes
✅ Subscribe to our Channel to learn more about the top Technologies: https://bit.ly/2VT4WtH
⏩ Check out the Machine Learning tutorial videos: https://bit.ly/3fFR4f4
For a more detailed understanding of the Naive Bayes Classifier, do visit: https://bit.ly/2DHxctD
You can also go through the Slides here: https://goo.gl/Cw9wqy
#NaiveBayesClassifier #NaiveBayes #NaiveBayesAlgorithm #NaiveBayesInMachineLearning #NaiveBayesMachineLearning #NaiveBayesClassifierExample #MachineLearningAlgorithms #MachineLearning #Simplilearn
What is Naive Bayes Classifier?
Naive Bayes is a supervised learning algorithm based on applying Bayes' theorem with a "naive" assumption of independence between features. Bayes' rule gives the formula for the probability of Y given X. The algorithm is called naive because it assumes the features X are independent of one another given the class.
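Bayes' rule can be worked through in plain Python using the counts quoted later in the video's shopping demo (30 records, 11 of them on weekdays; 24 purchases overall, 9 of which fell on weekdays):

```python
# Counts from the video's shopping demo: 30 records in total,
# 11 on weekdays; 24 purchases overall, 9 of them on weekdays.
total = 30
weekdays = 11
buys = 24
weekday_buys = 9

p_b = weekdays / total             # P(B): the day is a weekday
p_a = buys / total                 # P(A): the customer buys
p_b_given_a = weekday_buys / buys  # P(B|A): weekday, given a purchase

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_buy_given_weekday = p_b_given_a * p_a / p_b
print(round(p_buy_given_weekday, 3))  # 9/11, about 0.818
```

Since roughly 82% of weekday visits end in a purchase versus 18% that do not, the demo concludes customers will most likely buy on a weekday.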
➡️ About Caltech Post Graduate Program In Data Science
This Post Graduate Program in Data Science leverages Caltech's academic eminence. The program covers critical Data Science topics like Python programming, R programming, Machine Learning, Deep Learning, and Data Visualization tools through an interactive learning model with live sessions by global practitioners and practical labs.
✅ Key Features
- Simplilearn's JobAssist helps you get noticed by top hiring companies
- Caltech PG program in Data Science completion certificate
- Earn up to 14 CEUs from Caltech CTME
- Masterclasses delivered by distinguished Caltech faculty and IBM experts
- Caltech CTME Circle membership
- Online convocation by Caltech CTME Program Director
- IBM certificates for IBM courses
- Access to hackathons and Ask Me Anything sessions from IBM
- 25+ hands-on projects from the likes of Amazon, Walmart, Uber, and many more
- Seamless access to integrated labs
- Capstone projects in 3 domains
- Simplilearn’s Career Assistance to help you get noticed by top hiring companies
- 8X higher interaction in live online classes by industry experts
✅ Skills Covered
- Exploratory Data Analysis
- Descriptive Statistics
- Inferential Statistics
- Model Building and Fine Tuning
- Supervised and Unsupervised Learning
- Ensemble Learning
- Deep Learning
- Data Visualization
👉Learn more at: https://bit.ly/3fouyY0
🔥 Enroll for FREE Machine Learning Course & Get your Completion Certificate: https://www.simplilearn.com/learn-machine-learning-basics-skillup?utm_campaign=MachineLearning&utm_medium=Description&utm_source=youtube
🔥🔥 Interested in Attending Live Classes? Call Us: IN - 18002127688 / US - +18445327688
detail
{'title': 'Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With Example | Simplilearn', 'heatmap': [{'end': 368.886, 'start': 337.869, 'weight': 0.836}, {'end': 631.33, 'start': 547.402, 'weight': 0.782}, {'end': 814.321, 'start': 784.255, 'weight': 0.725}, {'end': 1656.154, 'start': 1572.751, 'weight': 0.738}], 'summary': "Introduces the naive bayes classifier and its applications in spam filtering, news text classification, and sentimental analysis, with a promise to delve into its workings and advantages, concluding with a practical python coding example. it covers the basic concepts of the bayes theorem, its application in the naive bayes classifier, and provides use cases in various fields. the application of probability in data analysis to predict purchase behavior is discussed, analyzing traits like day, discount, and free delivery. it also explores probability calculation for purchase decisions using likelihood tables and reveals insights into purchase behavior. additionally, it covers text classification using python's naive bayes classifier to classify news headlines, challenges of weighing words in text data analysis, and the relevance of tf.idf vectorizer in determining document meaning. the chapter explains text classification with tf-idf vectorizer and multinomial nb model, achieving overall accuracy and covers naive bayes classifier implementation for accurate predictions in various categories using tokenization and tf-idf vectorization.", 'chapters': [{'end': 64.876, 'segs': [{'end': 31.536, 'src': 'embed', 'start': 2.805, 'weight': 0, 'content': [{'end': 5.266, 'text': 'Introducing Naive Bayes Classifier.', 'start': 2.805, 'duration': 2.461}, {'end': 8.927, 'text': 'Have you ever wondered how your mail provider implements spam filtering?', 'start': 5.466, 'duration': 3.461}, {'end': 17.249, 'text': 'Or how online news channels perform news text classification? 
Or how companies perform sentimental analysis of their audience on social media?', 'start': 9.247, 'duration': 8.002}, {'end': 22.991, 'text': 'All of this and more is done through a machine learning algorithm called Naive Bayes Classifier.', 'start': 17.449, 'duration': 5.542}, {'end': 25.232, 'text': 'Welcome to Naive Bayes Tutorial.', 'start': 23.171, 'duration': 2.061}, {'end': 26.793, 'text': 'My name is Richard Kirshner.', 'start': 25.412, 'duration': 1.381}, {'end': 28.594, 'text': "I'm with the Simply Learn team.", 'start': 27.053, 'duration': 1.541}, {'end': 31.536, 'text': "That's www.simplylearn.com.", 'start': 29.014, 'duration': 2.522}], 'summary': 'Naive bayes classifier is used for spam filtering, news text classification, and sentimental analysis in machine learning.', 'duration': 28.731, 'max_score': 2.805, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo02805.jpg'}, {'end': 74.763, 'src': 'embed', 'start': 44.824, 'weight': 1, 'content': [{'end': 51.808, 'text': 'Why do we need Naive Bayes? 
And understanding Naive Bayes Classifier, a much more in-depth of how the math works in the background.', 'start': 44.824, 'duration': 6.984}, {'end': 59.833, 'text': "Finally we'll get into the advantages of the naive Bayes classifier in the machine learning setup and then we'll roll up our sleeves and do my favorite part.", 'start': 52.068, 'duration': 7.765}, {'end': 64.876, 'text': "We'll actually do some Python coding and do some text classification using the naive Bayes.", 'start': 60.113, 'duration': 4.763}, {'end': 66.758, 'text': 'What is naive Bayes?', 'start': 65.357, 'duration': 1.401}, {'end': 74.763, 'text': "Let's start with a basic introduction to the Bayes theorem, named after Thomas Bayes from the 1700s, who first coined this in the Western literature.", 'start': 67.118, 'duration': 7.645}], 'summary': 'Introduction to naive bayes, advantages, and python coding for text classification.', 'duration': 29.939, 'max_score': 44.824, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo044824.jpg'}], 'start': 2.805, 'title': 'Introduction to naive bayes classifier', 'summary': 'Introduces naive bayes classifier and its applications in spam filtering, news text classification, and sentimental analysis, along with a promise to delve into its workings and advantages, concluding with a practical python coding example.', 'chapters': [{'end': 64.876, 'start': 2.805, 'title': 'Introduction to naive bayes classifier', 'summary': 'Introduces naive bayes classifier and its applications in spam filtering, news text classification, and sentimental analysis, along with a promise to delve into its workings and advantages, concluding with a practical python coding example.', 'duration': 62.071, 'highlights': ['The chapter covers the introduction and applications of Naive Bayes Classifier in spam filtering, news text classification, and sentimental analysis.', 'The chapter promises an in-depth understanding of the 
workings of Naive Bayes Classifier and its advantages in machine learning setup.', 'The chapter concludes with a practical demonstration of text classification using Naive Bayes through Python coding.']}], 'duration': 62.071, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo02805.jpg', 'highlights': ['The chapter covers the introduction and applications of Naive Bayes Classifier in spam filtering, news text classification, and sentimental analysis.', 'The chapter promises an in-depth understanding of the workings of Naive Bayes Classifier and its advantages in machine learning setup.', 'The chapter concludes with a practical demonstration of text classification using Naive Bayes through Python coding.']}, {'end': 426.958, 'segs': [{'end': 131.853, 'src': 'embed', 'start': 104.305, 'weight': 0, 'content': [{'end': 107.126, 'text': 'So the probability of getting two heads equals one-fourth.', 'start': 104.305, 'duration': 2.821}, {'end': 112.107, 'text': 'You can see on our data set, we have two heads, and this occurs once out of the four possibilities.', 'start': 107.386, 'duration': 4.721}, {'end': 116.428, 'text': 'And then the probability of at least one tail occurs three-quarters of the time.', 'start': 112.367, 'duration': 4.061}, {'end': 120.929, 'text': "You'll see on three of the coin tosses, we have tails in them, and out of four, that's three-fourths.", 'start': 116.528, 'duration': 4.401}, {'end': 127.211, 'text': 'And the probability of the second coin being a head, given the first coin is a tail, is one half.', 'start': 121.209, 'duration': 6.002}, {'end': 131.853, 'text': 'And the probability of getting two heads, given the first coin is a head, is one half.', 'start': 127.371, 'duration': 4.482}], 'summary': 'Probability of getting two heads = 1/4, at least one tail = 3/4, second coin being head given first coin is tail = 1/2.', 'duration': 27.548, 'max_score': 104.305, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0104305.jpg'}, {'end': 165.272, 'src': 'embed', 'start': 137.755, 'weight': 1, 'content': [{'end': 142.577, 'text': 'But when you have something more complex, you can see where these formulas really come in and work.', 'start': 137.755, 'duration': 4.822}, {'end': 149.461, 'text': 'So the Bayes Theorem gives us the conditional probability of an event A given another event B has occurred.', 'start': 142.897, 'duration': 6.564}, {'end': 154.764, 'text': 'In this case the first coin toss will be B and the second coin toss A.', 'start': 149.841, 'duration': 4.923}, {'end': 160.408, 'text': "This could be confusing because we've actually reversed the order of them and go from B to A instead of A to B.", 'start': 154.764, 'duration': 5.644}, {'end': 162.55, 'text': "You'll see this a lot when you work in probabilities.", 'start': 160.408, 'duration': 2.142}, {'end': 165.272, 'text': "The reason is we're looking for event A.", 'start': 162.99, 'duration': 2.282}], 'summary': 'Bayes theorem calculates conditional probability, useful for complex events and probabilities.', 'duration': 27.517, 'max_score': 137.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0137755.jpg'}, {'end': 211.342, 'src': 'embed', 'start': 180.101, 'weight': 2, 'content': [{'end': 184.904, 'text': 'given A has occurred times the probability of A over the probability of B.', 'start': 180.101, 'duration': 4.803}, {'end': 188.206, 'text': 'This simple formula can be moved around just like any algebra formula.', 'start': 184.904, 'duration': 3.302}, {'end': 196.87, 'text': 'And we could do the probability of A after a given B times probability of B equals the probability of B given A times probability of A.', 'start': 188.306, 'duration': 8.564}, {'end': 199.692, 'text': 'You can easily move that around and multiply it and divide it out.', 'start': 196.87, 
'duration': 2.822}, {'end': 202.254, 'text': "Let us apply Bayes' Theorem to our example.", 'start': 200.092, 'duration': 2.162}, {'end': 203.695, 'text': 'Here we have our two quarters.', 'start': 202.434, 'duration': 1.261}, {'end': 211.342, 'text': "And we'll notice that the first two probabilities of getting two heads and at least one tail, we compute directly off the data.", 'start': 203.916, 'duration': 7.426}], 'summary': "Bayes' theorem can be applied to calculate probabilities using given data.", 'duration': 31.241, 'max_score': 180.101, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0180101.jpg'}, {'end': 247.267, 'src': 'embed', 'start': 222.492, 'weight': 4, 'content': [{'end': 227.896, 'text': "The second condition, the second set, three and four, we're going to explore a little bit more in detail.", 'start': 222.492, 'duration': 5.404}, {'end': 232.139, 'text': 'Now we stick to a simple example with two coins because you can easily understand the math.', 'start': 228.016, 'duration': 4.123}, {'end': 236.802, 'text': "The probability of throwing a tail doesn't matter what comes before it, and the same with the head.", 'start': 232.339, 'duration': 4.463}, {'end': 238.083, 'text': "So it's still going to be 50% or one half.", 'start': 236.962, 'duration': 1.121}, {'end': 247.267, 'text': "But when that probability gets more complicated, let's say you have a D6 dice or some other instance, then this formula really comes in handy.", 'start': 239.344, 'duration': 7.923}], 'summary': 'Exploring coin toss probabilities and dice outcomes with formulas.', 'duration': 24.775, 'max_score': 222.492, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0222492.jpg'}, {'end': 368.886, 'src': 'heatmap', 'start': 337.869, 'weight': 0.836, 'content': [{'end': 341.452, 'text': "There's also regression, but we're going to be in the classification side.", 'start': 
337.869, 'duration': 3.583}, {'end': 344.174, 'text': 'And then under classification is your naive Bayes.', 'start': 341.692, 'duration': 2.482}, {'end': 348.636, 'text': "Let's go ahead and glance into where is Naive Bay's use.", 'start': 344.574, 'duration': 4.062}, {'end': 350.557, 'text': "Let's look at some of the use scenarios for it.", 'start': 348.716, 'duration': 1.841}, {'end': 353.458, 'text': 'As a classifier, we use it in face recognition.', 'start': 350.857, 'duration': 2.601}, {'end': 356.039, 'text': 'Is this Cindy, or is it not Cindy or whoever?', 'start': 353.698, 'duration': 2.341}, {'end': 362.482, 'text': 'Or it might be used to identify parts of the face that they then feed into another part of the face recognition program.', 'start': 356.56, 'duration': 5.922}, {'end': 365.004, 'text': 'This is the eye, this is the nose, this is the mouth.', 'start': 362.563, 'duration': 2.441}, {'end': 366.044, 'text': 'Weather prediction.', 'start': 365.344, 'duration': 0.7}, {'end': 368.886, 'text': 'Is it going to be rainy or sunny? 
Medical recognition.', 'start': 366.304, 'duration': 2.582}], 'summary': 'The transcript discusses the use of naive bayes in classification, with examples including face recognition, weather prediction, and medical recognition.', 'duration': 31.017, 'max_score': 337.869, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0337869.jpg'}, {'end': 371.808, 'src': 'embed', 'start': 344.574, 'weight': 3, 'content': [{'end': 348.636, 'text': "Let's go ahead and glance into where is Naive Bay's use.", 'start': 344.574, 'duration': 4.062}, {'end': 350.557, 'text': "Let's look at some of the use scenarios for it.", 'start': 348.716, 'duration': 1.841}, {'end': 353.458, 'text': 'As a classifier, we use it in face recognition.', 'start': 350.857, 'duration': 2.601}, {'end': 356.039, 'text': 'Is this Cindy, or is it not Cindy or whoever?', 'start': 353.698, 'duration': 2.341}, {'end': 362.482, 'text': 'Or it might be used to identify parts of the face that they then feed into another part of the face recognition program.', 'start': 356.56, 'duration': 5.922}, {'end': 365.004, 'text': 'This is the eye, this is the nose, this is the mouth.', 'start': 362.563, 'duration': 2.441}, {'end': 366.044, 'text': 'Weather prediction.', 'start': 365.344, 'duration': 0.7}, {'end': 368.886, 'text': 'Is it going to be rainy or sunny? 
Medical recognition.', 'start': 366.304, 'duration': 2.582}, {'end': 369.786, 'text': 'News prediction.', 'start': 369.066, 'duration': 0.72}, {'end': 371.808, 'text': "It's also used in medical diagnosis.", 'start': 369.926, 'duration': 1.882}], 'summary': 'Naive bayes used in face recognition, weather and news prediction, medical diagnosis.', 'duration': 27.234, 'max_score': 344.574, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0344574.jpg'}], 'start': 65.357, 'title': 'Naive bayes', 'summary': 'Introduces the basic concepts of the bayes theorem and demonstrates its application through a simple example of coin tossing, highlighting the principles of conditional probability and the formulation of the bayes theorem. it also explains the concept of conditional probability using the example of coin tosses, introduces the bayes theorem and its application in the naive bayes classifier, and provides use cases of naive bayes in various fields such as face recognition, weather prediction, medical diagnosis, and news classification.', 'chapters': [{'end': 202.254, 'start': 65.357, 'title': 'Introduction to naive bayes', 'summary': 'Introduces the basic concepts of the bayes theorem and demonstrates its application through a simple example of coin tossing, highlighting the principles of conditional probability and the formulation of the bayes theorem.', 'duration': 136.897, 'highlights': ['The Bayes theorem gives us the conditional probability of an event A given another event B has occurred, demonstrating the principle of conditional probability and its application in determining the probability of an event given the occurrence of another event.', 'The demonstration of the probability of getting two heads equals one-fourth and the probability of at least one tail occurs three-quarters of the time serves as a practical example of conditional probability.', "The explanation of the Bayes theorem's formula and its 
flexibility in rearrangement, such as the probability of A given B times probability of B equals the probability of B given A times probability of A, illustrates the algebraic manipulation of conditional probability formulas."]}, {'end': 426.958, 'start': 202.434, 'title': 'Understanding naive bayes classifier', 'summary': 'Explains the concept of conditional probability using the example of coin tosses, introduces the bayes theorem and its application in the naive bayes classifier, and provides use cases of naive bayes in various fields such as face recognition, weather prediction, medical diagnosis, and news classification.', 'duration': 224.524, 'highlights': ['The chapter provides a detailed explanation of conditional probability using the example of coin tosses, demonstrating the computation of probabilities for different outcomes like getting two heads and at least one tail, with specific percentages such as 75% for three tails out of four trials.', "It introduces the Bayes Theorem and its application in calculating conditional probabilities, with a focus on the probability of one event given another, and presents a simple example to illustrate the formula's use in determining the likelihood of specific coin toss outcomes.", 'The chapter explores the use cases of naive Bayes in various fields such as face recognition, weather prediction, medical diagnosis, and news classification, emphasizing its role as a classifier in identifying objects or predicting outcomes based on given data.']}], 'duration': 361.601, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo065357.jpg', 'highlights': ['The demonstration of the probability of getting two heads equals one-fourth and the probability of at least one tail occurs three-quarters of the time serves as a practical example of conditional probability.', 'The Bayes theorem gives us the conditional probability of an event A given another event B has occurred, 
demonstrating the principle of conditional probability and its application in determining the probability of an event given the occurrence of another event.', "The explanation of the Bayes theorem's formula and its flexibility in rearrangement, such as the probability of A given B times probability of B equals the probability of B given A times probability of A, illustrates the algebraic manipulation of conditional probability formulas.", 'The chapter explores the use cases of naive Bayes in various fields such as face recognition, weather prediction, medical diagnosis, and news classification, emphasizing its role as a classifier in identifying objects or predicting outcomes based on given data.', 'The chapter provides a detailed explanation of conditional probability using the example of coin tosses, demonstrating the computation of probabilities for different outcomes like getting two heads and at least one tail, with specific percentages such as 75% for three tails out of four trials.', "It introduces the Bayes Theorem and its application in calculating conditional probabilities, with a focus on the probability of one event given another, and presents a simple example to illustrate the formula's use in determining the likelihood of specific coin toss outcomes."]}, {'end': 822.245, 'segs': [{'end': 476.795, 'src': 'embed', 'start': 442.102, 'weight': 0, 'content': [{'end': 445.663, 'text': "And let's go ahead and start applying it to some actual data so you can see what that looks like.", 'start': 442.102, 'duration': 3.561}, {'end': 448.964, 'text': "So we're going to start with the shopping demo problem statement.", 'start': 446.003, 'duration': 2.961}, {'end': 453.666, 'text': "And remember we're going to solve this first in table form so you can see what the math looks like.", 'start': 449.144, 'duration': 4.522}, {'end': 455.506, 'text': "And then we're going to solve it in Python.", 'start': 453.786, 'duration': 1.72}, {'end': 458.907, 'text': 'And in here 
we want to predict whether the person will purchase a product.', 'start': 455.686, 'duration': 3.221}, {'end': 460.668, 'text': "Are they going to buy or don't buy?", 'start': 458.987, 'duration': 1.681}, {'end': 462.429, 'text': "Very important if you're running a business,", 'start': 460.988, 'duration': 1.441}, {'end': 467.671, 'text': 'you want to know how to maximize your profits or at least maximize the purchase of the people coming into your store.', 'start': 462.429, 'duration': 5.242}, {'end': 471.613, 'text': "And we're going to look at a specific combination of different variables.", 'start': 467.851, 'duration': 3.762}, {'end': 475.355, 'text': "In this case, we're going to look at the day, the discount, and the free delivery.", 'start': 471.713, 'duration': 3.642}, {'end': 476.795, 'text': 'And you can see here under the day.', 'start': 475.515, 'duration': 1.28}], 'summary': 'Applying data analysis to predict purchase behavior in a business to maximize profits.', 'duration': 34.693, 'max_score': 442.102, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0442102.jpg'}, {'end': 542.4, 'src': 'embed', 'start': 511.32, 'weight': 5, 'content': [{'end': 514.121, 'text': "We're showing you the first 15 of those rows for this demo.", 'start': 511.32, 'duration': 2.801}, {'end': 516.943, 'text': 'Now the actual data file you can request.', 'start': 514.322, 'duration': 2.621}, {'end': 522.308, 'text': "just type in below under the comments on the YouTube video and we'll send you some more information and send you that file.", 'start': 516.943, 'duration': 5.365}, {'end': 525.752, 'text': 'As you can see here, the file is very simple, columns and rows.', 'start': 522.509, 'duration': 3.243}, {'end': 530.676, 'text': 'We have the day, the discount, the free delivery, and did the person purchase or not.', 'start': 525.952, 'duration': 4.724}, {'end': 535.478, 'text': 'And then we have under the day, whether it 
was a weekday, a holiday, was it the weekend?', 'start': 531.036, 'duration': 4.442}, {'end': 542.4, 'text': 'This is a pretty simple set of data and long before computers, people used to look at this data and calculate this all by hand.', 'start': 535.778, 'duration': 6.622}], 'summary': 'Demo showcases first 15 rows of simple data set with columns for day, discount, free delivery, purchase status, and day type.', 'duration': 31.08, 'max_score': 511.32, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0511320.jpg'}, {'end': 631.33, 'src': 'heatmap', 'start': 547.402, 'weight': 0.782, 'content': [{'end': 552.244, 'text': "Also note in today's world we're not usually looking at three different variables in 30 rows.", 'start': 547.402, 'duration': 4.842}, {'end': 559.528, 'text': "Nowadays, because we're able to collect data so much, we're usually looking at 27, 30 variables across hundreds of rows.", 'start': 552.644, 'duration': 6.884}, {'end': 566.691, 'text': "The first thing we want to do is we're going to take this data and, based on the data set containing our three inputs day,", 'start': 559.968, 'duration': 6.723}, {'end': 571.534, 'text': "discount and free delivery we're going to go ahead and populate that to frequency tables for each attribute.", 'start': 566.691, 'duration': 4.843}, {'end': 576.999, 'text': 'So we want to know if they had a discount, how many people buy and did not buy.', 'start': 571.914, 'duration': 5.085}, {'end': 578.821, 'text': 'Did they have a discount? Yes or no.', 'start': 577.019, 'duration': 1.802}, {'end': 580.863, 'text': 'Do we have a free delivery? Yes or no.', 'start': 579.021, 'duration': 1.842}, {'end': 585.948, 'text': "On those days, how many people made a purchase? How many people didn't? And the same with the three days of the week.", 'start': 581.123, 'duration': 4.825}, {'end': 589.192, 'text': 'Was it a weekday, a weekend, a holiday? And did they buy? 
Yes or no.', 'start': 586.008, 'duration': 3.184}, {'end': 594.357, 'text': 'As we dig in deeper to this table for our Bayes Theorem, let the event buy be A.', 'start': 589.392, 'duration': 4.965}, {'end': 598.001, 'text': 'Now remember when we looked at the coins I said we really want to know what the outcome is.', 'start': 594.677, 'duration': 3.324}, {'end': 602.225, 'text': "Did the person buy or not? And that's usually event A is what you're looking for.", 'start': 598.321, 'duration': 3.904}, {'end': 606.51, 'text': 'And the independent variables discount, free delivery, and day be B.', 'start': 602.426, 'duration': 4.084}, {'end': 608.472, 'text': "So we'll call that probability of B.", 'start': 606.51, 'duration': 1.962}, {'end': 611.976, 'text': 'Now let us calculate the likelihood table for one of the variables.', 'start': 608.472, 'duration': 3.504}, {'end': 615.52, 'text': "Let's start with day which includes weekday, weekend, and holiday.", 'start': 612.336, 'duration': 3.184}, {'end': 618.362, 'text': 'And let us start by summing all of our rows.', 'start': 615.72, 'duration': 2.642}, {'end': 620.964, 'text': 'So we have the weekday row.', 'start': 618.702, 'duration': 2.262}, {'end': 624.366, 'text': "And out of the weekdays, there's 9 plus 2, so it's 11 weekdays.", 'start': 621.244, 'duration': 3.122}, {'end': 627.608, 'text': "There's 8 weekend days and 11 holidays.", 'start': 624.626, 'duration': 2.982}, {'end': 628.909, 'text': "Wow, that's a lot of holidays.", 'start': 627.868, 'duration': 1.041}, {'end': 631.33, 'text': 'And then we want to sum up the total number of days.', 'start': 629.109, 'duration': 2.221}], 'summary': 'Analyzing a dataset with 27-30 variables across hundreds of rows to create frequency tables and calculate likelihood for different variables.', 'duration': 83.928, 'max_score': 547.402, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0547402.jpg'}, {'end': 752.951, 
'src': 'embed', 'start': 704.127, 'weight': 4, 'content': [{'end': 708.93, 'text': 'So when we look at that, probability of the weekday without a purchase is going to be .', 'start': 704.127, 'duration': 4.803}, {'end': 709.851, 'text': '33 or 33%.', 'start': 708.93, 'duration': 0.921}, {'end': 715.755, 'text': "Let's take a look at this, at different probabilities, and, based on this likelihood table,", 'start': 709.851, 'duration': 5.904}, {'end': 718.86, 'text': "let's go ahead and calculate conditional probabilities as below.", 'start': 715.755, 'duration': 3.105}, {'end': 724.358, 'text': 'The first three we just did, the probability of making a purchase on the weekday is 11 out of 30, or roughly 36 or 37%, .', 'start': 719.14, 'duration': 5.218}, {'end': 734.761, 'text': "367 The probability of not making a purchase at all, doesn't matter what day of the week, is roughly 0.2 or 20%.", 'start': 724.358, 'duration': 10.403}, {'end': 740.364, 'text': 'And the probability of a weekday no purchase is roughly 2 out of 6.', 'start': 734.761, 'duration': 5.603}, {'end': 743.706, 'text': 'So 2 out of 6 of our no purchases were made on the weekday.', 'start': 740.364, 'duration': 3.342}, {'end': 748.629, 'text': "And then finally we take our P If you looked, we've kept the symbols up there.", 'start': 743.946, 'duration': 4.683}, {'end': 752.951, 'text': "We've got probability of B, probability of A, probability of B if A.", 'start': 748.649, 'duration': 4.302}], 'summary': 'The probability of making a purchase on a weekday is 36-37%.', 'duration': 48.824, 'max_score': 704.127, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0704127.jpg'}, {'end': 819.242, 'src': 'heatmap', 'start': 781.413, 'weight': 7, 'content': [{'end': 784.255, 'text': "and that'd be the probability of no purchase done on the weekday.", 'start': 781.413, 'duration': 2.842}, {'end': 794.243, 'text': 'and this is important because we can look 
at this and say as the probability of buying on the weekday is more than the probability of not buying on the weekday,', 'start': 784.255, 'duration': 9.988}, {'end': 798.026, 'text': 'we can conclude that customers will most likely buy the product on a weekday.', 'start': 794.243, 'duration': 3.783}, {'end': 802.317, 'text': "Now we've kept our chart simple and we're only looking at one aspect,", 'start': 798.396, 'duration': 3.921}, {'end': 806.218, 'text': 'so you should be able to look at the table and come up with the same information or the same conclusion.', 'start': 802.317, 'duration': 3.901}, {'end': 808.239, 'text': 'That should be kind of intuitive at this point.', 'start': 806.318, 'duration': 1.921}, {'end': 811.12, 'text': 'Next, we can take the same setup.', 'start': 808.739, 'duration': 2.381}, {'end': 814.321, 'text': 'We have the frequency tables of all three independent variables.', 'start': 811.52, 'duration': 2.801}, {'end': 819.242, 'text': "Now we can construct the likelihood tables for all three of the variables we're working with.", 'start': 814.541, 'duration': 4.701}], 'summary': 'Analyzing probability of weekday purchases to predict customer behavior.', 'duration': 37.829, 'max_score': 781.413, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0781413.jpg'}], 'start': 426.958, 'title': 'Applying probability in data analysis to predict purchase behavior', 'summary': 'Discusses the application of probability in data analysis to predict purchase behavior in a shopping demo problem, analyzing traits like day, discount, and free delivery. 
It concludes that customers are most likely to buy products on weekdays based on probability calculations.', 'chapters': [{'end': 492.125, 'start': 426.958, 'title': 'Probability in data analysis', 'summary': 'The chapter discusses using probability in data analysis, applying it to a shopping demo problem to predict purchase behavior based on variables like day, discount, and free delivery.', 'duration': 65.167, 'highlights': ['The chapter discusses applying probability to a shopping demo problem to predict purchase behavior based on variables like day, discount, and free delivery.', 'The chapter explains the importance of predicting purchase behavior in business to maximize profits or the purchases of the people coming into the store.', 'The chapter demonstrates solving the problem first in table form and then in Python to showcase the mathematical and computational aspects.']}, {'end': 822.245, 'start': 492.305, 'title': 'Maximizing purchase traits from data', 'summary': 'Explores analyzing a small data set to maximize traits influencing purchases, including day, discount, and free delivery, and uses probability calculations to conclude that customers are most likely to buy products on weekdays.', 'duration': 329.94, 'highlights': ['The chapter explores analyzing a small data set to maximize traits influencing purchases, including day, discount, and free delivery.', 'Probability calculations are used to conclude that customers are most likely to buy products on weekdays.', 'The data set consists of 30 rows and includes columns for day, discount, free delivery, and purchase status.', 'The likelihood table is constructed for the variables, and probability calculations reveal a 36-37% likelihood of purchasing on weekdays and a 20% likelihood of not making a purchase regardless of the day.', 'The probability of no purchase on weekdays is calculated to be roughly 17-18%, indicating that customers are more likely to buy products on weekdays.']}], 'duration': 395.287, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0426958.jpg', 'highlights': ['The chapter demonstrates solving the problem first in table form and then in Python to showcase the mathematical and computational aspects.', 'The chapter discusses applying probability to a shopping demo problem to predict purchase behavior based on variables like day, discount, and free delivery.', 'The chapter explains the importance of predicting purchase behavior in business to maximize profits or the purchases of the people coming into the store.', 'The chapter explores analyzing a small data set to maximize traits influencing purchases, including day, discount, and free delivery.', 'The likelihood table is constructed for the variables, and probability calculations reveal a 36-37% likelihood of purchasing on weekdays and a 20% likelihood of not making a purchase regardless of the day.', 'The data set consists of 30 rows and includes columns for day, discount, free delivery, and purchase status.', 'The probability of no purchase on weekdays is calculated to be roughly 17-18%, indicating that customers are more likely to buy products on weekdays.', 'Probability calculations are used to conclude that customers are most likely to buy products on weekdays.']}, {'end': 1352.33, 'segs': [{'end': 871.992, 'src': 'embed', 'start': 840.262, 'weight': 1, 'content': [{'end': 844.364, 'text': 'Does that lead to a purchase or not? 
And this is where it starts getting really exciting.', 'start': 840.262, 'duration': 4.102}, {'end': 851.846, 'text': 'Let us use these three likelihood tables to calculate whether a customer will purchase a product on a specific combination of day,', 'start': 844.544, 'duration': 7.302}, {'end': 854.708, 'text': 'discount and free delivery or not purchase.', 'start': 851.846, 'duration': 2.862}, {'end': 857.108, 'text': 'Here let us take a combination of these factors.', 'start': 855.048, 'duration': 2.06}, {'end': 861.23, 'text': 'Day equals holiday, discount equals yes, free delivery equals yes.', 'start': 857.408, 'duration': 3.822}, {'end': 865.071, 'text': "Let's dig deeper into the math and actually see what this looks like.", 'start': 861.79, 'duration': 3.281}, {'end': 871.992, 'text': "And we're going to start with looking for the probability of them not purchasing on the following combinations of days.", 'start': 865.431, 'duration': 6.561}], 'summary': 'Using likelihood tables to predict customer purchase based on day, discount, and free delivery combinations.', 'duration': 31.73, 'max_score': 840.262, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0840262.jpg'}, {'end': 987.61, 'src': 'embed', 'start': 958.995, 'weight': 2, 'content': [{'end': 963.038, 'text': 'And then that is going to be multiplied by the probability of them not making a purchase.', 'start': 958.995, 'duration': 4.043}, {'end': 968.061, 'text': "And then we want to divide that by the total probabilities, and they're multiplied together.", 'start': 963.378, 'duration': 4.683}, {'end': 974.064, 'text': 'So we have the probability of a discount, the probability of a free delivery, and the probability of it being on a holiday.', 'start': 968.221, 'duration': 5.843}, {'end': 979.706, 'text': 'When we plug those numbers in, we see that 1 out of 6 were no purchase on a discounted day,', 'start': 974.284, 'duration': 5.422}, {'end': 
987.61, 'text': '2 out of 6 were a no purchase on a free delivery day and 3 out of 6 were a no purchase on a holiday.', 'start': 979.706, 'duration': 7.904}], 'summary': '1 out of 6 no purchase on discounted day, 2 out of 6 on free delivery, 3 out of 6 on holiday.', 'duration': 28.615, 'max_score': 958.995, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0958995.jpg'}, {'end': 1208.357, 'src': 'embed', 'start': 1176.397, 'weight': 0, 'content': [{'end': 1178.518, 'text': 'given these three different variables.', 'start': 1176.397, 'duration': 2.121}, {'end': 1182.82, 'text': "So if it's on a holiday, if it's with a discount and has free delivery,", 'start': 1178.758, 'duration': 4.062}, {'end': 1187.223, 'text': "then there's an 84.71% chance that the customer is going to come in and make a purchase.", 'start': 1182.82, 'duration': 4.403}, {'end': 1189.244, 'text': 'Hooray! They purchased our stuff.', 'start': 1187.483, 'duration': 1.761}, {'end': 1190.024, 'text': "We're making money.", 'start': 1189.344, 'duration': 0.68}, {'end': 1195.247, 'text': "If you own a shop, the bottom line is you want to make some money so you can keep your shop open and make a living.", 'start': 1190.124, 'duration': 5.123}, {'end': 1199.59, 'text': 'Now I promised you that we were going to be finishing up the math here within a few pages.', 'start': 1195.447, 'duration': 4.143}, {'end': 1202.032, 'text': "So we're going to move on and we're going to do two steps.", 'start': 1199.891, 'duration': 2.141}, {'end': 1208.357, 'text': 'The first step is I want you to understand why you want to use the naive Bayes.', 'start': 1202.292, 'duration': 6.065}], 'summary': '84.71% chance of purchase with holiday, discount, and free delivery, emphasizing the importance of making money as a shop owner and the relevance of using Naive Bayes.', 'duration': 31.96, 'max_score': 1176.397, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01176397.jpg'}, {'end': 1306.078, 'src': 'embed', 'start': 1283.429, 'weight': 3, 'content': [{'end': 1291.872, 'text': "This is why it's used in a lot of our predictions on online shopping carts, referrals, and spam filters: there's no time delay,", 'start': 1283.429, 'duration': 8.443}, {'end': 1298.294, 'text': "as there would be with a neural network or one of the other more involved setups where you're doing classification.", 'start': 1291.872, 'duration': 6.422}, {'end': 1302.255, 'text': "And certainly there are a lot of other tools out there in machine learning that can handle these,", 'start': 1298.414, 'duration': 3.841}, {'end': 1306.078, 'text': 'but most of them are not as fast as the naive Bayes.', 'start': 1302.915, 'duration': 3.163}], 'summary': 'Naive Bayes is used in predictions for online shopping, referrals, and spam filters due to its speed and efficiency in classification.', 'duration': 22.649, 'max_score': 1283.429, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01283429.jpg'}, {'end': 1358.673, 'src': 'embed', 'start': 1333.839, 'weight': 4, 'content': [{'end': 1341.104, 'text': "another one that overlaps, and because the two overlap, they can then predict the unknowns for the group that they haven't done the second study on,", 'start': 1333.839, 'duration': 7.265}, {'end': 1341.884, 'text': 'or vice versa.', 'start': 1341.104, 'duration': 0.78}, {'end': 1346.687, 'text': "So it's very powerful in that it is not sensitive to the irrelevant features and, in fact,", 'start': 1342.064, 'duration': 4.623}, {'end': 1350.029, 'text': "you can use it to help predict features that aren't even in there.", 'start': 1346.687, 'duration': 3.342}, {'end': 1352.33, 'text': "So now we're down to my favorite part.", 'start': 1350.289, 'duration': 2.041}, {'end': 1355.532, 'text': "We're going to 
roll up our sleeves and do some actual programming.", 'start': 1352.35, 'duration': 3.182}, {'end': 1358.673, 'text': "We're going to do the use case text classification.", 'start': 1355.592, 'duration': 3.081}], 'summary': 'Overlap of studies can predict unknowns, not sensitive to irrelevant features, and powerful for predicting unseen features. next, programming for text classification will be done.', 'duration': 24.834, 'max_score': 1333.839, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01333839.jpg'}], 'start': 822.245, 'title': 'Probability calculation and naive bayes classifier', 'summary': 'Discusses probability calculation for purchase decisions using likelihood tables to reveal insights into purchase behavior, along with an 84.71% likelihood of purchase based on specific conditions. it also explores the advantages of the naive bayes classifier, including simplicity, efficiency with smaller training data, scalability, real-time prediction capability, and insensitivity to irrelevant features.', 'chapters': [{'end': 881.753, 'start': 822.245, 'title': 'Probability calculation for purchase decisions', 'summary': 'Discusses the use of likelihood tables to calculate the probability of a customer making a purchase based on specific combinations of day, discount, and free delivery, leading to exciting insights into purchase behavior.', 'duration': 59.508, 'highlights': ['The chapter emphasizes the use of likelihood tables to calculate the probability of a customer making a purchase based on specific combinations of day, discount, and free delivery.', "It mentions the calculation of probabilities for a discount leading to a purchase, free delivery's impact on purchase decisions, and the probability of a customer purchasing a product on a specific combination of day, discount, and free delivery.", 'The chapter delves into the probability of a customer not purchasing on specific combinations of days, 
providing insights into purchase behavior based on different factors.']}, {'end': 1195.247, 'start': 882.133, 'title': 'Conditional probability analysis', 'summary': 'Explores the application of conditional probability to analyze customer purchase behavior, revealing an 84.71% likelihood of purchase when a holiday, discount, and free delivery coincide, along with a 15.29% chance of no purchase, based on the given conditions.', 'duration': 313.114, 'highlights': ['The likelihood of a purchase is 84.71% when a holiday, discount, and free delivery coincide, while the chance of no purchase is 15.29%, derived from the conditional probabilities.', 'The probability of a no buy on a discounted day is 0.178, and the probability of a purchase on a day with a holiday, discount, and free delivery is 0.986, representing the key findings of the analysis.', 'The calculation involves multiplying the probabilities of discount, free delivery, and holiday by the overall probability of purchase, then dividing by the total probabilities, resulting in the percentages of 84.71% for purchase and 15.29% for no purchase, crucial for understanding customer behavior and maximizing sales potential.']}, {'end': 1352.33, 'start': 1195.447, 'title': 'Advantages of naive bayes classifier', 'summary': 'Discusses the six advantages of the naive bayes classifier, including its simplicity, efficiency with smaller training data, scalability, real-time prediction capability, and insensitivity to irrelevant features, making it a powerful tool in machine learning applications.', 'duration': 156.883, 'highlights': ['It needs less training data, so if you have smaller amounts of data, this is a great powerful tool for that.', "It's fast and can be used in real-time predictions, making it ideal for applications like online shopping carts and spam filters.", "It's not sensitive to irrelevant features, allowing for solid predictability even with missing or overlapping data.", "Handles both continuous and 
discrete data, and it's highly scalable with the number of predictors and data points.", "Very simple and easy to implement, and it's a simple algebraic function."]}], 'duration': 530.085, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo0822245.jpg', 'highlights': ['The likelihood of a purchase is 84.71% when a holiday, discount, and free delivery coincide', 'The chapter emphasizes the use of likelihood tables to calculate the probability of a customer making a purchase', 'The calculation involves multiplying the probabilities of discount, free delivery, and holiday by the overall probability of purchase', "It's fast and can be used in real-time predictions, making it ideal for applications like online shopping carts and spam filters", "It's not sensitive to irrelevant features, allowing for solid predictability even with missing or overlapping data"]}, {'end': 1886.845, 'segs': [{'end': 1394.403, 'src': 'embed', 'start': 1367.317, 'weight': 1, 'content': [{'end': 1373.941, 'text': 'So you can plug that into Python code and do that on your own time so you can walk through it since we walked through all the information on it.', 'start': 1367.317, 'duration': 6.624}, {'end': 1380.027, 'text': "But we're going to do a Python code doing text classification, very popular for doing the Naive Bayes.", 'start': 1374.221, 'duration': 5.806}, {'end': 1388.296, 'text': "So we're going to use our new tool to perform a text classification of news headlines and classify news into different topics for a news website.", 'start': 1380.368, 'duration': 7.928}, {'end': 1394.403, 'text': 'As you can see here, we have a nice image of the Google News and then related on the right, subgroups.', 'start': 1388.556, 'duration': 5.847}], 'summary': 'Using python for text classification of news headlines into topics.', 'duration': 27.086, 'max_score': 1367.317, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01367317.jpg'}, {'end': 1555.553, 'src': 'embed', 'start': 1524.273, 'weight': 3, 'content': [{'end': 1526.174, 'text': 'Remember, I said three of these were about graphing?', 'start': 1524.273, 'duration': 1.901}, {'end': 1535.179, 'text': "Well, we need our matplotlib.pyplot as plt, and you'll see that plt is a very common setup, as is the sns and, just like it, the np,", 'start': 1526.554, 'duration': 8.625}, {'end': 1539.901, 'text': "and we're going to import seaborn as sns and we're going to do the sns.set.", 'start': 1535.179, 'duration': 4.722}, {'end': 1545.404, 'text': 'Now seaborn sits on top of pyplot and it just makes a really nice heatmap.', 'start': 1540.241, 'duration': 5.163}, {'end': 1550.107, 'text': "It's really good for heatmaps, and if you're not familiar with heatmaps, that just means we give it a color scale.", 'start': 1545.664, 'duration': 4.443}, {'end': 1555.553, 'text': 'The term comes from the idea that the brighter red something is, the hotter that point of the data is.', 'start': 1550.507, 'duration': 5.046}], 'summary': 'Import matplotlib.pyplot as plt and seaborn as sns to create a heatmap for data visualization.', 'duration': 31.28, 'max_score': 1524.273, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01524273.jpg'}, {'end': 1656.154, 'src': 'heatmap', 'start': 1572.751, 'weight': 0.738, 'content': [{'end': 1578.074, 'text': "And then from sklearn.datasets, we're going to import fetch_20newsgroups.", 'start': 1572.751, 'duration': 5.323}, {'end': 1584.117, 'text': 'Very common one for analyzing, tokenizing words and setting them up and exploring how the words work,', 'start': 1578.574, 'duration': 5.543}, {'end': 1586.798, 'text': "and how do you categorize different things when you're dealing with documents?", 'start': 1584.117, 'duration': 2.681}, {'end': 1592.481, 'text': 'And then we set our data 
equal to fetch_20newsgroups, so our data variable will have the data in it.', 'start': 1587.258, 'duration': 5.223}, {'end': 1596.884, 'text': "And we're going to go ahead and just print the target names, data.target_names.", 'start': 1593.161, 'duration': 3.723}, {'end': 1598.185, 'text': "And let's see what that looks like.", 'start': 1596.944, 'duration': 1.241}, {'end': 1605.41, 'text': "And you'll see here we have AltAtheism, CompGraphics, CompOS, MSWindows.Miscellaneous.", 'start': 1598.745, 'duration': 6.665}, {'end': 1610.813, 'text': 'And it goes all the way down to TalkPolitics.Miscellaneous, TalkReligion.Miscellaneous.', 'start': 1605.95, 'duration': 4.863}, {'end': 1614.856, 'text': "These are the categories they've already assigned to this newsgroup.", 'start': 1611.014, 'duration': 3.842}, {'end': 1621.561, 'text': "And it's called Fetch20 because you'll see there's, I believe there's 20 different topics in here, or 20 different categories as we scroll down.", 'start': 1615.196, 'duration': 6.365}, {'end': 1629.068, 'text': "Now we've gone through the 20 different categories and we're going to go ahead and start defining all the categories and set up our data.", 'start': 1621.96, 'duration': 7.108}, {'end': 1633.673, 'text': "So we're actually going to go ahead and get the data all set up and take a look at our data.", 'start': 1629.288, 'duration': 4.385}, {'end': 1638.519, 'text': "And let's move this over to our Jupyter notebook and let's see what this code does.", 'start': 1634.094, 'duration': 4.425}, {'end': 1641.141, 'text': "First we're going to set our categories.", 'start': 1639.5, 'duration': 1.641}, {'end': 1648.948, 'text': "Now if you noticed up here I could have just as easily set this equal to data.target underscore names because it's the same thing.", 'start': 1641.422, 'duration': 7.526}, {'end': 1652.291, 'text': 'But we want to kind of spell it out for you so you can see the different categories.', 'start': 1649.328, 'duration': 
2.963}, {'end': 1656.154, 'text': 'It kind of makes it more visual so you can see what your data is looking like in the background.', 'start': 1652.691, 'duration': 3.463}], 'summary': 'Using fetch20newsgroups to analyze and categorize 20 different topics in data.', 'duration': 83.403, 'max_score': 1572.751, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01572751.jpg'}, {'end': 1746.003, 'src': 'embed', 'start': 1717.834, 'weight': 0, 'content': [{'end': 1721.975, 'text': "And we'll see test number five is a different article, but it's another article in here.", 'start': 1717.834, 'duration': 4.141}, {'end': 1726.237, 'text': "And maybe you're curious and you want to see just how many articles are in here.", 'start': 1722.396, 'duration': 3.841}, {'end': 1730.978, 'text': 'We could do length of train dot data.', 'start': 1726.277, 'duration': 4.701}, {'end': 1735.479, 'text': "And if we run that, you'll see that the training data has 11, 314 articles.", 'start': 1731.338, 'duration': 4.141}, {'end': 1739.821, 'text': "So we're not going to go through all those articles.", 'start': 1738.36, 'duration': 1.461}, {'end': 1741.041, 'text': "That's a lot of articles.", 'start': 1739.861, 'duration': 1.18}, {'end': 1746.003, 'text': "But we can look at one of them just so you can see what kind of information is coming out of it and what we're looking at.", 'start': 1741.281, 'duration': 4.722}], 'summary': 'Training data contains 11,314 articles, too many to review individually.', 'duration': 28.169, 'max_score': 1717.834, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01717834.jpg'}, {'end': 1785.1, 'src': 'embed', 'start': 1758.408, 'weight': 2, 'content': [{'end': 1764.071, 'text': "Now we've looked at it and that's pretty complicated when you look at one of these articles to try to figure out how do you weight this.", 'start': 1758.408, 'duration': 5.663}, 
{'end': 1769.713, 'text': 'If you look down here, we have different words, and maybe take the word "from". Well, "from" is probably in all the articles,', 'start': 1764.091, 'duration': 5.622}, {'end': 1775.736, 'text': "so it's not going to have a lot of meaning as far as trying to figure out whether this article fits one of the categories or not.", 'start': 1769.713, 'duration': 6.023}, {'end': 1780.538, 'text': 'So trying to figure out which category it fits in based on these words is where the challenge comes in.', 'start': 1776.016, 'duration': 4.522}, {'end': 1785.1, 'text': "Now that we've viewed our data, we're going to dive in and do the actual predictions.", 'start': 1780.938, 'duration': 4.162}], 'summary': 'Challenging to categorize articles based on common words, diving into predictions now.', 'duration': 26.692, 'max_score': 1758.408, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01758408.jpg'}, {'end': 1864.636, 'src': 'embed', 'start': 1834.319, 'weight': 4, 'content': [{'end': 1834.98, 'text': 'You can look it up.', 'start': 1834.319, 'duration': 0.661}, {'end': 1837.863, 'text': 'The notation for the math is usually tf-idf.', 'start': 1835.12, 'duration': 2.743}, {'end': 1841.325, 'text': "And that's just a way of weighting the words.", 'start': 1839.304, 'duration': 2.021}, {'end': 1848.908, 'text': "And it weights the words based on how many times they're used in a document, and how many documents they're used in.", 'start': 1841.585, 'duration': 7.323}, {'end': 1850.229, 'text': "And it's a well-used formula.", 'start': 1849.028, 'duration': 1.201}, {'end': 1851.29, 'text': "It's been around for a while.", 'start': 1850.249, 'duration': 1.041}, {'end': 1853.451, 'text': "It's a little confusing to put this in here.", 'start': 1851.59, 'duration': 1.861}, {'end': 1858.833, 'text': "But let's just note that it goes in there and weights the different words in the document for 
us.", 'start': 1854.211, 'duration': 4.622}, {'end': 1860.774, 'text': "That way we don't have to wait.", 'start': 1859.433, 'duration': 1.341}, {'end': 1862.375, 'text': 'And if you put a weight on it.', 'start': 1860.794, 'duration': 1.581}, {'end': 1864.636, 'text': 'if you remember, I was talking about that up here earlier.', 'start': 1862.375, 'duration': 2.261}], 'summary': 'Tf.idf notation weighs words based on usage in documents.', 'duration': 30.317, 'max_score': 1834.319, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01834319.jpg'}], 'start': 1352.35, 'title': 'Text analysis with python and tf.idf', 'summary': "Covers text classification using python's naive bayes classifier to classify news headlines, with a focus on training data and graphical visualization. it also covers text data analysis, creation of training and testing sets, and analyzing 11,314 articles. additionally, it explains the challenges of weighing words in text data analysis and the relevance of tf.idf vectorizer in determining document meaning.", 'chapters': [{'end': 1621.561, 'start': 1352.35, 'title': 'Text classification with python', 'summary': 'Covers the implementation of a python code for text classification using the naive bayes classifier to classify news headlines into different topics, with a focus on the process of training the data set and the use of graphical visualization, based on the fetch20newsgroups dataset.', 'duration': 269.211, 'highlights': ['The chapter covers the implementation of a Python code for text classification using the Naive Bayes classifier to classify news headlines into different topics, with a focus on the process of training the data set and the use of graphical visualization. It also includes a request to obtain the data for the shopping cart. 
The script is written in Python 3.5 and is suitable for Python 3.x versions.', 'The Python script includes basic imports, training the data set, and generating a graph to visualize the data. It also emphasizes the importance of graphical visualization in data analysis for better understanding. It utilizes the fetch20newsgroups dataset for analyzing and categorizing news topics. The target names include categories such as AltAtheism, CompGraphics, MSWindows.Miscellaneous, TalkReligion.Miscellaneous, and more.', 'The script utilizes Python libraries such as matplotlib, numpy, and seaborn for graphing and visualization purposes. It explains the significance of these libraries in creating visual representations of data, particularly heatmaps for displaying color-scaled data. The fetch20newsgroups dataset is used for analyzing and categorizing news topics based on tokenizing words and exploring word categorization in documents.']}, {'end': 1758.188, 'start': 1621.96, 'title': 'Text data analysis', 'summary': 'Covers setting up data for text analysis, extracting categories, creating training and testing sets, and analyzing the number of articles, with the training data containing 11,314 articles.', 'duration': 136.228, 'highlights': ['The training data contains 11,314 articles, providing a substantial amount of data for analysis.', 'The process involves setting up categories, creating training and testing sets, and examining individual articles to understand the nature of the data.', 'The code demonstrates fetching 20 newsgroups, extracting subsets for training and testing, and printing out specific articles for analysis.']}, {'end': 1886.845, 'start': 1758.408, 'title': 'Text data analysis with tf.idf', 'summary': 'Discusses the challenges of weighing words in text data analysis, introduces the tf.idf vectorizer for weighing words, and explains its relevance in determining the meaning of a document.', 'duration': 128.437, 'highlights': ['The chapter discusses the 
challenges of weighing words in text data analysis, emphasizing the difficulty in determining the category a document fits in based on its words and the relevance of specific words.', "Introduces the tf.idf vectorizer for weighing words, explaining its function in weighing words based on their usage in documents and its role in determining the value of words in describing a document's content.", "Explains the relevance of specific words in determining the meaning of a document, highlighting the low weight of common words like 'from' and the higher weight of descriptive words like 'criminal' and 'weapons'."]}], 'duration': 534.495, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01352350.jpg', 'highlights': ['The training data contains 11,314 articles, providing a substantial amount of data for analysis.', 'The chapter covers the implementation of a Python code for text classification using the Naive Bayes classifier to classify news headlines into different topics, with a focus on the process of training the data set and the use of graphical visualization.', 'The chapter discusses the challenges of weighing words in text data analysis, emphasizing the difficulty in determining the category a document fits in based on its words and the relevance of specific words.', 'The script utilizes Python libraries such as matplotlib, numpy, and seaborn for graphing and visualization purposes. 
It explains the significance of these libraries in creating visual representations of data, particularly heatmaps for displaying color-scaled data.', "Introduces the tf.idf vectorizer for weighing words, explaining its function in weighing words based on their usage in documents and its role in determining the value of words in describing a document's content."]}, {'end': 2621.961, 'segs': [{'end': 2041.612, 'src': 'embed', 'start': 2013.391, 'weight': 1, 'content': [{'end': 2017.673, 'text': 'And once we go into our naive Bayes, we want to put the train target in there.', 'start': 2013.391, 'duration': 4.282}, {'end': 2026.119, 'text': "So the train data that's been mapped to the TF-IDF vectorizer is now going through the multinomial NB, and then we're telling it,", 'start': 2017.794, 'duration': 8.325}, {'end': 2027.12, 'text': 'well, these are the answers.', 'start': 2026.119, 'duration': 1.001}, {'end': 2029.542, 'text': 'these are the answers to the different documents.', 'start': 2027.64, 'duration': 1.902}, {'end': 2036.688, 'text': 'So this document that has all these words with these different weights from the first part is going to be whatever category it comes out of.', 'start': 2029.722, 'duration': 6.966}, {'end': 2041.612, 'text': "Maybe it's the talk show or the article on religion, miscellaneous.", 'start': 2036.808, 'duration': 4.804}], 'summary': 'Using the TF-IDF vectorizer, train data goes through multinomial NB to categorize documents.', 'duration': 28.221, 'max_score': 2013.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo02013391.jpg'}, {'end': 2112.963, 'src': 'embed', 'start': 2087.523, 'weight': 4, 'content': [{'end': 2094.969, 'text': 'The confusion matrix, which is confusing just by its very name, is basically going to ask how confused is our answer?', 'start': 2087.523, 'duration': 7.446}, {'end': 2099.633, 'text': 'Did it get it correct or did it miss some things in there or have some 
missed labels?', 'start': 2095.208, 'duration': 4.425}, {'end': 2104.696, 'text': "And then we're going to put that on a heat map, so we'll have some nice colors to look at to see how that plots out.", 'start': 2099.993, 'duration': 4.703}, {'end': 2109.5, 'text': "Let's go ahead and take this code and take a walk through it and see what that looks like.", 'start': 2104.997, 'duration': 4.503}, {'end': 2112.963, 'text': "So back to our Jupyter notebook, I'm going to put the code in there.", 'start': 2110.041, 'duration': 2.922}], 'summary': 'Exploring confusion matrix on jupyter notebook with code walkthrough.', 'duration': 25.44, 'max_score': 2087.523, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo02087523.jpg'}, {'end': 2514.47, 'src': 'embed', 'start': 2487.833, 'weight': 2, 'content': [{'end': 2493.658, 'text': 'So when we take our definition, our function, and we run all these things through, kudos, we made it.', 'start': 2487.833, 'duration': 5.825}, {'end': 2501.484, 'text': 'We were able to correctly classify texts into different groups based on which category they belong to using the naive Bayes classifier.', 'start': 2493.858, 'duration': 7.626}, {'end': 2507.987, 'text': 'Now, we did throw in the pipeline, the TF-IDF vectorizer, we threw in the graphs.', 'start': 2501.724, 'duration': 6.263}, {'end': 2514.47, 'text': "Those are all things that you don't necessarily have to know to understand the Naive Bayes setup or classifier.", 'start': 2508.067, 'duration': 6.403}], 'summary': 'Successfully classified texts into groups using naive bayes and tf-idf vectorizer.', 'duration': 26.637, 'max_score': 2487.833, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo02487833.jpg'}, {'end': 2555.099, 'src': 'embed', 'start': 2529.997, 'weight': 0, 'content': [{'end': 2535.682, 'text': "You don't have to know those to understand Naive Bayes, but they 
certainly help for understanding the industry and data science.", 'start': 2529.997, 'duration': 5.685}, {'end': 2540.185, 'text': 'And we can see our categorizer, our Naive Bayes classifier.', 'start': 2535.962, 'duration': 4.223}, {'end': 2547.972, 'text': 'We were able to predict the category religion, space, motorcycles, autos, politics and properly classify all these different things.', 'start': 2540.385, 'duration': 7.587}, {'end': 2550.714, 'text': 'we pushed into our prediction and our trained model.', 'start': 2547.972, 'duration': 2.742}, {'end': 2553.096, 'text': "Let's go ahead and wrap it up.", 'start': 2551.194, 'duration': 1.902}, {'end': 2555.099, 'text': "And let's just go through what we covered.", 'start': 2553.598, 'duration': 1.501}], 'summary': 'Naive Bayes accurately classified religion, space, motorcycles, autos, and politics categories.', 'duration': 25.102, 'max_score': 2529.997, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo02529997.jpg'}], 'start': 1887.305, 'title': 'Text classification and Naive Bayes', 'summary': 'Explains text classification with a TF-IDF vectorizer and multinomial NB model, achieving overall accuracy, and covers Naive Bayes classifier implementation for accurate predictions in various categories using tokenization and TF-IDF vectorization.', 'chapters': [{'end': 2309.917, 'start': 1887.305, 'title': 'Text classification and evaluation', 'summary': "Explains the process of text classification using a TF-IDF vectorizer and multinomial NB model, followed by evaluation using a confusion matrix and heat map to assess the model's performance, achieving overall accuracy with some misclassifications.", 'duration': 422.612, 'highlights': ['The process of text classification using a TF-IDF vectorizer and multinomial NB model', 'Evaluation using a confusion matrix and heat map']}, {'end': 2621.961, 'start': 2310.298, 'title': 'Naive Bayes text classification', 'summary': 
'Covers the implementation of a Naive Bayes classifier for text categorization, achieving accurate predictions for different categories including religion, space, automobiles, and politics, using techniques such as tokenization, TF-IDF vectorization, and pipeline processing.', 'duration': 311.663, 'highlights': ['The chapter demonstrates the successful use of a Naive Bayes classifier to accurately predict the categories of texts, including religion, space, automobiles, and politics, showcasing the practical application of techniques such as tokenization, TF-IDF vectorization, and pipeline processing.']}], 'duration': 734.656, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/l3dZ6ZNFjo0/pics/l3dZ6ZNFjo01887305.jpg', 'highlights': ['The chapter demonstrates the successful use of a Naive Bayes classifier to accurately predict the categories of texts, including religion, space, automobiles, and politics, showcasing the practical application of techniques such as tokenization and TF-IDF vectorization.', 'The process of text classification using a TF-IDF vectorizer and multinomial NB model', 'Evaluation using a confusion matrix and heat map']}], 'highlights': ['The likelihood of a purchase is 84.71% when a holiday, discount, and free delivery coincide', 'The training data contains 11,314 articles, providing a substantial amount of data for analysis', 'The chapter covers the implementation of Python code for text classification using the Naive Bayes classifier to classify news headlines into different topics', 'The chapter demonstrates the successful use of a Naive Bayes classifier to accurately predict the categories of texts, including religion, space, automobiles, and politics', 'The chapter promises an in-depth understanding of the workings of the Naive Bayes classifier and its advantages in a machine learning setup', 'The chapter explores analyzing a small data set to identify the traits influencing purchases, including day, discount, and free delivery', 'The chapter discusses applying probability to a shopping demo problem to predict purchase behavior based on variables like day, discount, and free delivery', 'The chapter explains the importance of predicting purchase behavior in business to maximize profits or purchases by the people coming into the store', "The chapter introduces the TF-IDF vectorizer for weighting words, explaining how it weights words based on their usage in documents and its role in determining the value of words in describing a document's content", 'The chapter provides a detailed explanation of conditional probability using the example of coin tosses, demonstrating the computation of 
probabilities for different outcomes like getting two heads and at least one tail, with specific percentages such as 75% for three tails out of four trials', 'The chapter explores the use cases of Naive Bayes in various fields such as face recognition, weather prediction, medical diagnosis, and news classification, emphasizing its role as a classifier in identifying objects or predicting outcomes based on given data']}
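
The pipeline the segments above walk through (train data mapped through a TF-IDF vectorizer, fed to a multinomial NB model along with the train target, then checked with a confusion matrix drawn as a heatmap) can be sketched roughly as follows. The tiny corpus, category names, and test sentences here are illustrative stand-ins for the 20 Newsgroups data used in the video's demo:

```python
# Sketch of the demo's flow: TF-IDF vectorizer -> multinomial Naive Bayes,
# then a confusion matrix. The toy corpus below is a hypothetical stand-in
# for the 20 Newsgroups training data used in the video.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import confusion_matrix

train_data = [
    "the rocket launched into orbit around the moon",
    "nasa announced a new mission to mars",
    "the motorcycle engine roared down the highway",
    "new helmets improve rider safety on a bike",
]
train_target = ["space", "space", "motorcycles", "motorcycles"]

# The TF-IDF weights from the first step flow straight into MultinomialNB;
# fit() pairs each weighted document with its answer (the train target).
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_data, train_target)

test_data = ["a mission to orbit mars", "the bike engine on the highway"]
predicted = model.predict(test_data)
print(predicted)  # expect: ['space' 'motorcycles']

# "How confused is our answer?" -- rows are true labels, columns predictions.
mat = confusion_matrix(["space", "motorcycles"], predicted,
                       labels=model.classes_)
print(mat)
# In the video this matrix is rendered as a heatmap, e.g. with seaborn:
# sns.heatmap(mat, annot=True, fmt='d')
```

With only two disjoint vocabularies this toy model classifies both test sentences correctly, so the confusion matrix is diagonal; on the real 11,314-article dataset the off-diagonal cells show the misclassifications the video discusses.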