title
Text Classification Using BERT & Tensorflow | Deep Learning Tutorial 47 (Tensorflow, Keras & Python)

description
Using BERT and TensorFlow 2.0, we will write simple code to classify emails as spam or not spam. BERT will be used to generate a sentence encoding for every email, and after that we will use a simple neural network with one dropout layer and one output layer.

What is BERT? https://www.youtube.com/watch?v=7kLi8u2dJz0
Code: https://github.com/codebasics/deep-learning-keras-tf-tutorial/blob/master/47_BERT_text_classification/BERT_email_classification-handle-imbalance.ipynb

Do you want to learn technology from me? Check https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description for my affordable video courses.

Deep learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO
Machine learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

🔖 Hashtags 🔖
#BERTModel #bertmodelnlppython #BERTtextclassification #BERTtutorial #tensorflowbert #tensorflowberttutorial

🌎 My Website For Video Courses: https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description

Need help building software or data analytics and AI solutions? My company https://www.atliq.com/ can help. Click on the Contact button on that website.

🎥 Codebasics Hindi channel: https://www.youtube.com/channel/UCTmFBhuhMibVoSfYom1uXEg

#️⃣ Social Media #️⃣
🔗 Discord: https://discord.gg/r42Kbuk
📸 Dhaval's Personal Instagram: https://www.instagram.com/dhavalsays/
📸 Instagram: https://www.instagram.com/codebasicshub/
🔊 Facebook: https://www.facebook.com/codebasicshub
📱 Twitter: https://twitter.com/codebasicshub
📝 LinkedIn (Personal): https://www.linkedin.com/in/dhavalsays/
📝 LinkedIn (Codebasics): https://www.linkedin.com/company/codebasics/

❗❗ DISCLAIMER: All opinions expressed in this video are my own and not those of my employers.
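
The classifier the description mentions can be sketched in a few lines of Keras code. This is a minimal sketch rather than the exact notebook code: the TF Hub handles, the 0.1 dropout rate, and the layer names are assumptions chosen to match what the video describes (a BERT preprocessing layer, a BERT encoder whose pooled output is a 768-length vector, one dropout layer, and one sigmoid output neuron).

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the ops the preprocessing layer needs

# TF Hub handles assumed here; check tfhub.dev for current versions.
bert_preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")

# Functional API: raw email text in, spam probability out.
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
preprocessed = bert_preprocess(text_input)
bert_outputs = bert_encoder(preprocessed)  # dict; 'pooled_output' has shape (batch, 768)
x = tf.keras.layers.Dropout(0.1, name="dropout")(bert_outputs["pooled_output"])
output = tf.keras.layers.Dense(1, activation="sigmoid", name="output")(x)

model = tf.keras.Model(inputs=[text_input], outputs=[output])
model.summary()

The summary shows 769 trainable parameters (768 weights plus one bias in the output neuron); the BERT weights themselves stay frozen because hub.KerasLayer defaults to trainable=False.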

detail
{'title': 'Text Classification Using BERT & Tensorflow | Deep Learning Tutorial 47 (Tensorflow, Keras & Python)', 'heatmap': [{'end': 1123.574, 'start': 1085.347, 'weight': 0.775}, {'end': 1461.316, 'start': 1397.32, 'weight': 0.959}, {'end': 1529.782, 'start': 1505.933, 'weight': 0.835}], 'summary': 'Tutorial covers using bert for email classification, including generating a 768-length embedding vector, handling imbalanced data by downsampling majority class to 15% spam and 85% ham emails, creating neural network layers with dropout and dense layers, achieving an accuracy of 91%, and discussing text preprocessing, encoding, and cosine similarity.', 'chapters': [{'end': 75.727, 'segs': [{'end': 75.727, 'src': 'embed', 'start': 23.322, 'weight': 0, 'content': [{'end': 31.944, 'text': 'So we saw in a previous video that the purpose of BERT is to generate an embedding vector for the entire sentence.', 'start': 23.322, 'duration': 8.622}, {'end': 36.326, 'text': 'And that is something that we can feed into our neural network and do the training.', 'start': 32.445, 'duration': 3.881}, {'end': 40.347, 'text': 'So here we will generate a vector of 768 length.', 'start': 36.646, 'duration': 3.701}, {'end': 42.868, 'text': 'Why 768? We have covered that in a previous video.', 'start': 40.607, 'duration': 2.261}, {'end': 51.953, 'text': 'then we will supply that to a very simple neural network with only one dense layer.', 'start': 43.768, 'duration': 8.185}, {'end': 61.539, 'text': 'one neuron in the dense layer as an output will also put a dropout layer in between, just to tackle the overfitting.', 'start': 51.953, 'duration': 9.586}, {'end': 67.842, 'text': 'Now, if you open this bird box, by the way, it has two components preprocessing and encoding.', 'start': 62.419, 'duration': 5.423}, {'end': 70.084, 'text': 'And we talked about that in a previous video as well.', 'start': 68.182, 'duration': 1.902}, {'end': 73.466, 'text': 'So previous video watching that is quite a prerequisite.', 'start': 70.124, 'duration': 3.342}, {'end': 75.727, 'text': "So let's jump into coding now.", 'start': 73.726, 'duration': 2.001}], 'summary': 'Bert generates a 768-length embedding vector for sentence input to a neural network with a single dense layer and a dropout layer to tackle overfitting.', 'duration': 52.405, 'max_score': 23.322, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA23322.jpg'}], 'start': 0.618, 'title': 'Using bert for email classification', 'summary': 'Discusses the use of bert for email classification, including generating a 768-length embedding vector for sentences and training a neural network with a dense layer and a dropout layer to tackle overfitting.', 'chapters': [{'end': 75.727, 'start': 0.618, 'title': 'Bert for email classification', 'summary': 'Discusses using bert for email classification, generating a 768-length embedding vector for sentences, and training a neural network with a dense layer and a dropout layer for tackling overfitting.', 'duration': 75.109, 'highlights': ['BERT converts an email sentence into a 768-length embedding vector, which can be fed into a neural network for training.', 'The neural network for email classification consists of a simple architecture with one dense layer and a dropout layer to tackle overfitting.', 'The purpose of BERT is to generate an embedding vector for the entire sentence, which is then used for email classification.']}], 'duration': 75.109, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA618.jpg', 'highlights': ['BERT converts an email sentence into a 768-length embedding vector for neural network training.', 'The neural network for email classification has a simple architecture with one dense layer and a dropout layer.', "BERT's purpose is to generate an embedding vector for the entire sentence, used for email classification."]}, {'end': 545.196, 'segs': [{'end': 224.68, 'src': 'embed', 'start': 192.861, 'weight': 0, 'content': [{'end': 194.963, 'text': 'So there is definitely an imbalance.', 'start': 192.861, 'duration': 2.102}, {'end': 209.933, 'text': 'So downsampling technique, what it says is from 4825 ham emails, just pick any random 747 samples, discard the rest of them.', 'start': 195.743, 'duration': 14.19}, {'end': 215.716, 'text': 'that way then you have equal number of spam and ham email.', 'start': 210.994, 'duration': 4.722}, {'end': 224.68, 'text': "now, obviously, when you're discarding rest of the samples, you are losing your training data, which in certain cases might not be good.", 'start': 215.716, 'duration': 8.964}], 'summary': 'Imbalance in email data: downsampling to 747 samples from 4825 ham emails creates equal spam and ham, but leads to loss of training data.', 'duration': 31.819, 'max_score': 192.861, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA192861.jpg'}, {'end': 478.471, 'src': 'embed', 'start': 441.658, 'weight': 1, 'content': [{'end': 443.96, 'text': 'I want to create a binary column called spam.', 'start': 441.658, 'duration': 2.302}, {'end': 453.299, 'text': 'you know boolean like one or zero, that kind of thing, and you know how to create a new column.', 'start': 443.96, 'duration': 9.339}, {'end': 467.086, 'text': 'when you do this, it will create a new column and the way you create a new column is you go through the existing data frame, the category column,', 'start': 453.299, 'duration': 13.787}, {'end': 469.327, 'text': 'and you use the function apply.', 'start': 467.086, 'duration': 2.241}, {'end': 478.471, 'text': "so if you've seen my pandas videos, you will get an understanding that on category column we are applying some transformation.", 'start': 469.327, 'duration': 9.144}], 'summary': "Create a binary 'spam' column using apply function on the category column.", 'duration': 36.813, 'max_score': 441.658, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA441658.jpg'}], 'start': 77.168, 'title': 'Handling imbalanced email data', 'summary': "Discusses handling an imbalanced dataset of 4825 ham emails and 747 spam emails, achieving a balanced distribution by downsampling the majority class to 15% spam emails and 85% ham emails. 
it also outlines creating a binary 'spam' column using a lambda function in python.", 'chapters': [{'end': 330.752, 'start': 77.168, 'title': 'Handling imbalanced data in email classification', 'summary': 'Discusses handling an imbalanced dataset consisting of 4825 ham emails and 747 spam emails, with a focus on downsampling the majority class to achieve a balanced distribution by randomly selecting 747 samples, resulting in 15% spam emails and 85% ham emails.', 'duration': 253.584, 'highlights': ['The dataset contains 4825 ham emails and 747 spam emails, indicating an imbalance in the data set.', 'The downsampling technique involves randomly selecting 747 samples from the majority class to achieve a balanced distribution, resulting in 15% spam emails and 85% ham emails.', 'Other techniques for handling imbalanced data include SMOTE oversampling and can be explored through additional resources.']}, {'end': 545.196, 'start': 330.752, 'title': 'Balancing data and creating binary column', 'summary': "Outlines the process of balancing a data frame to have an equal number of spam and ham emails, discarding the rest of the sample, and creating a binary 'spam' column through the use of lambda function in python.", 'duration': 214.444, 'highlights': ['Balancing the data frame by discarding the rest of the sample to have an equal number of spam and ham emails. The speaker discards the rest of the sample and picks 747 samples from the HAM category to achieve an equal number of spam and ham emails.', "Creating a binary 'spam' column using a lambda function to designate 1 for spam and 0 for ham. The speaker uses a lambda function to create a new column where spam is designated as 1 and ham as 0, ensuring a binary representation of the 'spam' category.", 'Explaining the process of applying a transformation using the function apply on the category column. 
The speaker explains the use of the apply function to apply a transformation on the category column, providing insights into the process of creating a new column based on the existing data.']}], 'duration': 468.028, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA77168.jpg', 'highlights': ['The downsampling technique involves randomly selecting 747 samples from the majority class to achieve a balanced distribution, resulting in 15% spam emails and 85% ham emails.', "Creating a binary 'spam' column using a lambda function to designate 1 for spam and 0 for ham.", 'Balancing the data frame by discarding the rest of the sample to have an equal number of spam and ham emails.']}, {'end': 1054.817, 'segs': [{'end': 583.153, 'src': 'embed', 'start': 545.256, 'weight': 6, 'content': [{'end': 552.68, 'text': "I'm ready to do train test split and we have seen this again and again in so many of my machine learning videos.", 'start': 545.256, 'duration': 7.424}, {'end': 555.982, 'text': 'We use train test split from sklearn.', 'start': 552.76, 'duration': 3.222}, {'end': 559.904, 'text': 'And this is a standard way of doing it.', 'start': 557.023, 'duration': 2.881}, {'end': 573.432, 'text': 'You will use stratify so that in your train and test sample the distribution of the categories is equal.', 'start': 563.826, 'duration': 9.606}, {'end': 578.987, 'text': 'Alright, my Xtrain etc is ready.', 'start': 576.423, 'duration': 2.564}, {'end': 583.153, 'text': 'I will just print the head of Xtrain just to make sure.', 'start': 579.007, 'duration': 4.146}], 'summary': 'Using train test split from sklearn to ensure equal distribution of categories for machine learning model training.', 'duration': 37.897, 'max_score': 545.256, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA545256.jpg'}, {'end': 677.684, 'src': 'embed', 'start': 610.21, 'weight': 0, 'content': [{'end': 615.932, 'text': "so you're just using this url to download those train models and you'll realize that when you execute this,", 'start': 610.21, 'duration': 5.722}, {'end': 626.578, 'text': "You'll see star for a few seconds or minutes based on your speed, because it is downloading the train model locally on your computer.", 'start': 616.826, 'duration': 9.752}, {'end': 640.232, 'text': 'Now, as you have seen in our presentation that my goal is to supply sentence to BERT model and create this length vector.', 'start': 629.727, 'duration': 10.505}, {'end': 648.076, 'text': "Let's write a simple Python function that takes the sentence as an input and returns this vector as an output.", 'start': 641.193, 'duration': 6.883}, {'end': 654.78, 'text': 'Meanwhile, this download got completed, so my BERT preprocess and encoder is ready.', 'start': 649.437, 'duration': 5.343}, {'end': 657.626, 'text': 'want to write a function.', 'start': 656.445, 'duration': 1.181}, {'end': 677.684, 'text': 'you know something like this when I say get sentence embedding and when I supply a sentence like this, it should just return me 768 len vector.', 'start': 657.626, 'duration': 20.058}], 'summary': 'Downloading train models and creating sentence embeddings using bert for 768 len vector.', 'duration': 67.474, 'max_score': 610.21, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA610210.jpg'}, {'end': 774.527, 'src': 'embed', 'start': 739.515, 'weight': 1, 'content': [{'end': 751.878, 'text': 'you call 
preprocess on your sentences, obviously, and you get the.', 'start': 739.515, 'duration': 12.363}, {'end': 753.678, 'text': 'actually there is a spelling mistake.', 'start': 751.878, 'duration': 1.8}, {'end': 756.799, 'text': 'so you call preprocess on sentences.', 'start': 753.678, 'duration': 3.121}, {'end': 765.443, 'text': 'you get preprocess text as an output and that you supply into bot encoder.', 'start': 756.799, 'duration': 8.644}, {'end': 769.245, 'text': 'the code is so simple, friends.', 'start': 765.443, 'duration': 3.802}, {'end': 771.085, 'text': "it's just like a function pointer.", 'start': 769.245, 'duration': 1.84}, {'end': 774.527, 'text': "you know it's a simple function.", 'start': 771.085, 'duration': 3.442}], 'summary': 'Preprocessing sentences with a simple function pointer.', 'duration': 35.012, 'max_score': 739.515, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA739515.jpg'}, {'end': 898.92, 'src': 'embed', 'start': 799.289, 'weight': 2, 'content': [{'end': 802.972, 'text': "that's why watching previous video is very much essential.", 'start': 799.289, 'duration': 3.683}, {'end': 810.239, 'text': "i'm just going to return this, by the way, and that would be my sentence encoding,", 'start': 802.972, 'duration': 7.267}, {'end': 817.597, 'text': "And when you run it you can see it's returning a tensor of shape 2768..", 'start': 811.513, 'duration': 6.084}, {'end': 825.783, 'text': 'So this particular tensor is my sentence and encoding for the first sentence.', 'start': 817.597, 'duration': 8.186}, {'end': 838.131, 'text': 'For the second sentence, here is my encoding and this encodings are returned by pre-trained BERT model, which I have downloaded from TF Hub website.', 'start': 826.423, 'duration': 11.708}, {'end': 848.835, 'text': 'Great, now let me call sentence encoding for some simple words, you know, like banana, grapes, mangoes and so on.', 'start': 839.651, 'duration': 9.184}, {'end': 853.357, 'text': 'I want to see what this really means.', 'start': 849.395, 'duration': 3.962}, {'end': 859.079, 'text': 'What is the benefit of having this bird encoding? 
So.', 'start': 854.937, 'duration': 4.142}, {'end': 865.978, 'text': 'I will try cosine similarity.', 'start': 863.757, 'duration': 2.221}, {'end': 872.982, 'text': 'Cosine similarity is a way to measure how similar two vectors are.', 'start': 866.018, 'duration': 6.964}, {'end': 882.266, 'text': 'If two vectors are pointing in the same direction, then that means the cosine similarities will be close to one.', 'start': 873.742, 'duration': 8.524}, {'end': 890.671, 'text': "Now, if you don't know about cosine similarity, you can do core basics cosine similarity.", 'start': 882.907, 'duration': 7.764}, {'end': 894.439, 'text': "You'll find a very simple video.", 'start': 891.618, 'duration': 2.821}, {'end': 898.92, 'text': 'I try to make videos which even a high school student can understand it easily.', 'start': 894.879, 'duration': 4.041}], 'summary': 'Demonstrating sentence encoding using pre-trained bert model and cosine similarity for similarity measurement.', 'duration': 99.631, 'max_score': 799.289, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA799289.jpg'}, {'end': 1054.817, 'src': 'embed', 'start': 1026.516, 'weight': 3, 'content': [{'end': 1028.957, 'text': 'Three is Jeff Bezos, four is Elon Musk.', 'start': 1026.516, 'duration': 2.441}, {'end': 1034.819, 'text': 'See 98% because when they train board model on all the Wikipedia text.', 'start': 1030.217, 'duration': 4.602}, {'end': 1039.579, 'text': 'The context in which Jeff Bezos and Elon Musk appear would be similar.', 'start': 1036.019, 'duration': 3.56}, {'end': 1044.982, 'text': 'When they would be talking about Jeff Bezos, you will see words such as entrepreneur, rich.', 'start': 1040.601, 'duration': 4.381}, {'end': 1051.834, 'text': 'you know, like company name, uh, their plants and so on.', 'start': 1046.909, 'duration': 4.925}, {'end': 1054.817, 'text': "so that's why their cosine similarity is similar.", 'start': 1051.834, 'duration': 2.983}], 'summary': 'Jeff bezos and elon musk have a 98% similarity in context, based on training a model on all wikipedia text, leading to similar cosine similarity.', 'duration': 28.301, 'max_score': 1026.516, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA1026516.jpg'}], 'start': 545.256, 'title': 'Bert model and text embedding', 'summary': 'Covers the process of using pre-trained bert models to create 768-length sentence embeddings, emphasizing text preprocessing and encoding, with a final output being a tensor of shape 2768, and discussing the concept of cosine similarity using quantifiable data.', 'chapters': [{'end': 711.092, 'start': 545.256, 'title': 'Bert model for text embedding', 'summary': 'Covers the process of train test split, utilizing stratify for equal category distribution, and downloading and using pre-trained bert models to create sentence embeddings, resulting in a 768 length vector for each input sentence.', 'duration': 165.836, 'highlights': ['The process of train test split from sklearn is demonstrated, with emphasis on using stratify to ensure equal distribution of categories in the train and test samples.', 'The method of downloading pre-trained BERT models from the TensorFlow Hub website is explained, highlighting the process of creating sentence embeddings, resulting in a 768 length vector for each input sentence.', 'A Python function for generating sentence embeddings using the pre-trained BERT model is presented, with the capability to handle 
multiple statements and return multiple 768 length vectors.']}, {'end': 825.783, 'start': 711.572, 'title': 'Text preprocessing and encoding', 'summary': 'Discusses the process of text preprocessing and encoding, emphasizing the importance of understanding its simplicity and the need for watching the previous video, with the final output being a tensor of shape 2768.', 'duration': 114.211, 'highlights': ['The importance of understanding the simplicity of the text preprocessing and encoding process', 'Emphasizing the need for watching the previous video for better comprehension', 'The final output being a tensor of shape 2768']}, {'end': 1054.817, 'start': 826.423, 'title': 'Bert model and cosine similarity', 'summary': 'Discusses the use of pre-trained bert model for sentence encoding and demonstrates the concept of cosine similarity using examples of word vectors, with a focus on understanding the benefits and limitations, including quantifiable data such as cosine similarity values.', 'duration': 228.394, 'highlights': ['The pre-trained BERT model is used for sentence encoding, returning encodings for simple words like banana, grapes, and mangoes, with each embedding being a vector of size 768.', 'The concept of cosine similarity is explained as a way to measure the similarity of two vectors, with a focus on the angle between vectors and how it affects the similarity, supported by examples of cosine similarity values such as 0.99 and 0.84.', 'The limitations of cosine similarity and word embeddings are highlighted, emphasizing the contextual training on Wikipedia and Google Books, with an example showing a 98% cosine similarity between Jeff Bezos and Elon Musk due to the similar context in which they appear.']}], 'duration': 509.561, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA545256.jpg', 'highlights': ['The process of creating sentence embeddings using pre-trained BERT models results in a 768-length vector for each input sentence.', 'The importance of understanding the simplicity of text preprocessing and encoding is emphasized.', 'The concept of cosine similarity is explained as a way to measure the similarity of two vectors, with a focus on the angle between vectors and how it affects the similarity, supported by examples of cosine similarity values such as 0.99 and 0.84.', 'The limitations of cosine similarity and word embeddings are highlighted, emphasizing the contextual training on Wikipedia and Google Books, with an example showing a 98% cosine similarity between Jeff Bezos and Elon Musk due to the similar context in which they appear.', 'The method of downloading pre-trained BERT models from the TensorFlow Hub website is explained, highlighting the process of creating sentence embeddings, resulting in a 768-length vector for each input sentence.', 'The pre-trained BERT model is used for sentence encoding, returning encodings for simple words like banana, grapes, and mangoes, with each embedding being a vector of size 768.', 'The process of train test split from sklearn is demonstrated, with emphasis on using stratify to ensure equal distribution of categories in the train and test samples.', 'A Python function for generating sentence embeddings using the pre-trained BERT model is presented, with the capability to handle multiple statements and return multiple 768-length vectors.', 'The final output is a tensor of shape 2768.']}, {'end': 1753.754, 'segs': [{'end': 1123.574, 'src': 'heatmap', 'start': 1054.817, 
'weight': 2, 'content': [{'end': 1056.298, 'text': 'anyway, you can play with it.', 'start': 1054.817, 'duration': 1.481}, {'end': 1063.305, 'text': "but let's get back to our main objective, which is to build the classification model.", 'start': 1056.298, 'duration': 7.007}, {'end': 1068.923, 'text': 'so now we have this function, which is returning the sentence vector.', 'start': 1063.305, 'duration': 5.618}, {'end': 1071.505, 'text': 'But I mean, we are not going to use this function.', 'start': 1069.704, 'duration': 1.801}, {'end': 1079.929, 'text': 'We will be using these two pre-processing and encoding functions into our BERT layer.', 'start': 1071.525, 'duration': 8.404}, {'end': 1085.347, 'text': "Now I'm going to create model.", 'start': 1080.77, 'duration': 4.577}, {'end': 1093.614, 'text': 'so there are two ways of creating the tensorflow model one is sequential and one is functional.', 'start': 1085.347, 'duration': 8.267}, {'end': 1100.64, 'text': 'I would suggest you read this article of regarding sequential and functional model in our video.', 'start': 1093.614, 'duration': 7.026}, {'end': 1108.167, 'text': 'so far we have created sequential models, which are which look something like this there is a sequence of layers,', 'start': 1100.64, 'duration': 7.527}, {'end': 1112.827, 'text': "And you know how to create sequential model, because that's what we have done so far.", 'start': 1109.505, 'duration': 3.322}, {'end': 1116.35, 'text': 'But functional model looks something like this.', 'start': 1113.368, 'duration': 2.982}, {'end': 1123.574, 'text': "You create an input layer, then you create another layer and pass the first layer as if you're calling a function.", 'start': 1117.85, 'duration': 5.724}], 'summary': 'Objective: build a classification model using pre-processing and encoding functions, create a tensorflow model with sequential and functional approaches.', 'duration': 58.01, 'max_score': 1054.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA1054817.jpg'}, {'end': 1248.204, 'src': 'embed', 'start': 1200.447, 'weight': 1, 'content': [{'end': 1206.131, 'text': 'now that input layer, i will pass it to bird preprocess.', 'start': 1200.447, 'duration': 5.684}, {'end': 1207.411, 'text': 'what is bird preprocess?', 'start': 1206.131, 'duration': 1.28}, {'end': 1208.232, 'text': 'we have seen it here.', 'start': 1207.411, 'duration': 0.821}, {'end': 1211.695, 'text': 'This is BERT preprocess and this is BERT encoder.', 'start': 1209.434, 'duration': 2.261}, {'end': 1213.915, 'text': "So I'm passing that as a text input.", 'start': 1211.895, 'duration': 2.02}, {'end': 1226.699, 'text': 'What I get is a preprocessed text and that I supplied to BERT encoder.', 'start': 1214.536, 'duration': 12.163}, {'end': 1237.943, 'text': 'OK, this is fairly straightforward and you get outputs as a result.', 'start': 1230.761, 'duration': 7.182}, {'end': 1248.204, 'text': 'so these are BERT layers, basically BERT layers.', 'start': 1240.936, 'duration': 7.268}], 'summary': 'Input passes through bert preprocess and encoder, yielding preprocessed text and bert layer outputs.', 'duration': 47.757, 'max_score': 1200.447, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA1200447.jpg'}, {'end': 1461.316, 'src': 'heatmap', 'start': 1397.32, 'weight': 0.959, 'content': [{'end': 1406.702, 'text': 'So here I got my model and I will just control enter.', 'start': 1397.32, 'duration': 
9.382}, {'end': 1414.854, 'text': 'Okay Just print a model summary to take a look.', 'start': 1409.623, 'duration': 5.231}, {'end': 1424.135, 'text': 'Here you see trainable parameters is 769, because 768 is my input.', 'start': 1416.237, 'duration': 7.898}, {'end': 1430.698, 'text': 'so if you look at our diagram here, so this this layer will be of size 768.', 'start': 1424.135, 'duration': 6.563}, {'end': 1436.482, 'text': 'so you need to train those parameters and one parameter for this.', 'start': 1430.698, 'duration': 5.784}, {'end': 1448.669, 'text': "so 769 and remaining parameters are non-trainable because they are coming from bird and we don't need to worry about retraining them.", 'start': 1436.482, 'duration': 12.187}, {'end': 1455.663, 'text': 'okay, now comes model compile And we will pretty much use.', 'start': 1448.669, 'duration': 6.994}, {'end': 1461.316, 'text': 'Adam is an optimizer loss binary cross entropy because our output is binary one zero.', 'start': 1456.786, 'duration': 4.53}], 'summary': 'Model has 769 trainable parameters, using adam optimizer and binary cross entropy loss.', 'duration': 63.996, 'max_score': 1397.32, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA1397320.jpg'}, {'end': 1544.478, 'src': 'heatmap', 'start': 1505.933, 'weight': 0, 'content': [{'end': 1508.714, 'text': 'it is just printing those metrics.', 'start': 1505.933, 'duration': 2.781}, {'end': 1513.197, 'text': 'the actual training is driven by this loss function.', 'start': 1508.714, 'duration': 4.483}, {'end': 1519.096, 'text': 'It took some time, but eventually my accuracy came out to be 91%.', 'start': 1514.474, 'duration': 4.622}, {'end': 1522.378, 'text': 'Precision and recall is this much.', 'start': 1519.096, 'duration': 3.282}, {'end': 1529.782, 'text': 'Now I made my data set balance so I can rely on accuracy, but in general.', 'start': 1524.279, 'duration': 5.503}, {'end': 1535.393, 'text': 'if you are training on imbalanced data set, you should not rely on accuracy.', 'start': 1531.491, 'duration': 3.902}, {'end': 1544.478, 'text': 'okay, and you can watch my other video, but when you do model evaluation again, accuracy is coming to be 90 percent, which is okay.', 'start': 1535.393, 'duration': 9.085}], 'summary': 'Training achieved 91% accuracy, emphasized model evaluation using balanced dataset.', 'duration': 30.004, 'max_score': 1505.933, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA1505933.jpg'}, {'end': 1678.313, 'src': 'embed', 'start': 1604.021, 'weight': 4, 'content': [{'end': 1609.083, 'text': 'So this is truth on y-axis, x-axis is predicted.', 'start': 1604.021, 'duration': 5.062}, {'end': 1616.647, 'text': '154 times, I had 0 as a truth and my model predicted that to be 0.', 'start': 1609.103, 'duration': 7.544}, {'end': 1625.411, 'text': '185 times, I had a spam email and my model predicted it to be a spam email.', 'start': 1616.647, 'duration': 8.764}, {'end': 1628.433, 'text': 'And these are the errors.', 'start': 1627.072, 'duration': 1.361}, {'end': 1640.43, 'text': 'so, on two occasion what happened was i had a spam email, but my model says it is not spam and 33 times the truth.', 'start': 1629.568, 'duration': 10.862}, {'end': 1646.331, 'text': 'basically, my email was not spam, but it said it is spam.', 'start': 1640.43, 'duration': 5.901}, {'end': 1654.513, 'text': 'all right, when a print classification report i see good accuracy, good 
precision and good recall.', 'start': 1646.331, 'duration': 8.182}, {'end': 1659.44, 'text': "So it's important that your F1 score is higher.", 'start': 1656.158, 'duration': 3.282}, {'end': 1665.004, 'text': 'So here in both the cases, the F1 score came out to be 90%.', 'start': 1659.52, 'duration': 5.484}, {'end': 1676.392, 'text': 'And when I do inference on couple of sentences, so see the first three looks spammy and the values are, see more than 0.5.', 'start': 1665.004, 'duration': 11.388}, {'end': 1678.313, 'text': "More than 0.5 means it's spam.", 'start': 1676.392, 'duration': 1.921}], 'summary': 'Model achieved 90% f1 score, good accuracy, precision, and recall; 33 false positives for spam emails.', 'duration': 74.292, 'max_score': 1604.021, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA1604021.jpg'}, {'end': 1734.996, 'src': 'embed', 'start': 1708.82, 'weight': 5, 'content': [{'end': 1714.124, 'text': "it's a ready-made jupyter notebook, but i want you all to Try this out.", 'start': 1708.82, 'duration': 5.304}, {'end': 1719.648, 'text': 'you know, copy paste this code, try to tweak it and try to see how this is working.', 'start': 1714.124, 'duration': 5.524}, {'end': 1724.77, 'text': 'It is using some advanced APS text data set from directory, etc.', 'start': 1719.728, 'duration': 5.042}, {'end': 1726.551, 'text': 'Prefetch We have covered all of that.', 'start': 1724.95, 'duration': 1.601}, {'end': 1728.793, 'text': 'TensorFlow data set and all of that.', 'start': 1726.571, 'duration': 2.222}, {'end': 1733.415, 'text': 'And just try to understand this particular example.', 'start': 1729.533, 'duration': 3.882}, {'end': 1734.996, 'text': 'It is about movie reviews.', 'start': 1733.515, 'duration': 1.481}], 'summary': 'Try out the ready-made jupyter notebook with advanced aps text dataset and tensorflow dataset for movie reviews.', 'duration': 26.176, 'max_score': 1708.82, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA1708820.jpg'}], 'start': 1054.817, 'title': 'Building tensorflow model with bert and creating neural network layers', 'summary': 'Covers building a tensorflow model with bert, comparing sequential and functional model creation, utilizing bert layers for non-sequential architecture, creating neural network layers with dropout and dense layers, achieving an accuracy of 91% with precision and recall of 90%, and concludes with an exercise to try out a tensorflow tutorial on classifying text with bert.', 'chapters': [{'end': 1248.204, 'start': 1054.817, 'title': 'Building tensorflow model with bert', 'summary': 'Discusses building a tensorflow model with bert, comparing sequential and functional model creation, and utilizing bert layers for non-sequential architecture.', 'duration': 193.387, 'highlights': ['The chapter explores creating a TensorFlow model with BERT, emphasizing the use of pre-processing and encoding functions into the BERT layer.', 'It compares sequential and functional model creation in TensorFlow, highlighting the benefits of the functional model for non-sequential architecture.', 'The use of BERT layers is explained, showcasing the process of passing input layers to BERT preprocess and BERT encoder for generating outputs.']}, {'end': 1753.754, 'start': 1248.204, 'title': 'Creating neural network layers and model', 'summary': 'Covers the creation of neural network layers with dropout and dense layers, model construction, compilation, 
training, evaluation, and inference, achieving an accuracy of 91% with precision and recall of 90%, and concludes with an exercise to try out a tensorflow tutorial on classifying text with bert.', 'duration': 505.55, 'highlights': ["The model achieved an accuracy of 91% with precision and recall of 90%. The model's accuracy was 91% with precision and recall both at 90%.", 'The confusion matrix revealed 154 correct predictions for non-spam emails and 185 correct predictions for spam emails, with some misclassifications. The confusion matrix showed 154 correct predictions for non-spam emails, 185 correct predictions for spam emails, and some misclassifications.', 'The F1 score for both cases was 90%. The F1 score for both cases was 90%.', 'An exercise is provided to try out a TensorFlow tutorial on classifying text with BERT. An exercise is given to try out a TensorFlow tutorial on classifying text with BERT, encouraging viewers to copy and tweak the code to understand the example.']}], 'duration': 698.937, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hOCDJyZ6quA/pics/hOCDJyZ6quA1054817.jpg', 'highlights': ['The model achieved an accuracy of 91% with precision and recall of 90%.', 'The use of BERT layers is explained, showcasing the process of passing input layers to BERT preprocess and BERT encoder for generating outputs.', 'It compares sequential and functional model creation in TensorFlow, highlighting the benefits of the functional model for non-sequential architecture.', 'The chapter explores creating a TensorFlow model with BERT, emphasizing the use of pre-processing and encoding functions into the BERT layer.', 'The confusion matrix revealed 154 correct predictions for non-spam emails, 185 correct predictions for spam emails, with some misclassifications.', 'An exercise is provided to try out a TensorFlow tutorial on classifying text with BERT, encouraging viewers to copy and tweak the code to understand the example.', 'The F1 score for both cases was 90%.']}], 'highlights': ['BERT converts an email sentence into a 768-length embedding vector for neural network training.', 'The neural network for email classification has a simple architecture with one dense layer and a dropout layer.', "BERT's purpose is to generate an embedding vector for the entire sentence, used for email classification.", 'The downsampling technique involves randomly selecting 747 samples from the majority class to achieve a balanced distribution, resulting in 15% spam emails and 85% ham emails.', "Creating a binary 'spam' column using a lambda function to designate 1 for spam and 0 for ham.", 'Balancing the data frame by discarding the rest of the sample to have an equal number of spam and ham emails.', 'The process of creating sentence embeddings using pre-trained BERT models results in a 768-length vector for each input sentence.', 'The importance of understanding the simplicity of text preprocessing and encoding is emphasized.', 'The concept of cosine similarity is explained as a way to measure the similarity of two vectors, with a focus on the angle between vectors and how it affects the similarity, supported by examples of cosine similarity values such as 0.99 and 0.84.', 'The limitations of cosine similarity and word embeddings are highlighted, emphasizing the contextual training on Wikipedia and Google Books, with an example showing a 98% cosine similarity between Jeff Bezos and Elon Musk due to the similar context in which they appear.', 'The method of downloading 
pre-trained BERT models from the TensorFlow Hub website is explained, highlighting the process of creating sentence embeddings, resulting in a 768-length vector for each input sentence.', 'The pre-trained BERT model is used for sentence encoding, returning encodings for simple words like banana, grapes, and mangoes, with each embedding being a vector of size 768.', 'The process of train test split from sklearn is demonstrated, with emphasis on using stratify to ensure equal distribution of categories in the train and test samples.', 'A Python function for generating sentence embeddings using the pre-trained BERT model is presented, with the capability to handle multiple statements and return multiple 768-length vectors.', 'The final output is a tensor of shape (2, 768).', 'The model achieved an accuracy of 91% with precision and recall of 90%.', 'The use of BERT layers is explained, showcasing the process of passing input layers to BERT preprocess and BERT encoder for generating outputs.', 'It compares sequential and functional model creation in TensorFlow, highlighting the benefits of the functional model for non-sequential architecture.', 'The chapter explores creating a TensorFlow model with BERT, emphasizing the use of pre-processing and encoding functions into the BERT layer.', 'The confusion matrix revealed 154 correct predictions for non-spam emails, 185 correct predictions for spam emails, with some misclassifications.', 'An exercise is provided to try out a TensorFlow tutorial on classifying text with BERT, encouraging viewers to copy and tweak the code to understand the example.', 'The F1 score for both classes was 90%.']}
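
The chapter summaries above walk through a concrete workflow: balance the dataset, encode the labels, split the data, embed the text with BERT, build and train a small classifier, and evaluate it. The sketches below illustrate each step under stated assumptions; they are not the notebook's exact code. For the data step, the file name spam.csv and the column names Category and Message are assumptions based on the SMS spam dataset the video appears to use. Note that the 15% spam / 85% ham figure describes the raw dataset; after downsampling, both classes contain 747 rows each.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("spam.csv")          # assumed columns: 'Category' ('ham'/'spam') and 'Message'
print(df["Category"].value_counts())  # roughly 4825 ham vs 747 spam

# Downsample the majority (ham) class to the size of the spam class,
# discarding the remaining ham rows (at the cost of some training data).
df_spam = df[df["Category"] == "spam"]
df_ham = df[df["Category"] == "ham"].sample(n=len(df_spam), random_state=42)
df_balanced = pd.concat([df_ham, df_spam])

# Binary label created with apply + a lambda: 1 for spam, 0 for ham.
df_balanced["spam"] = df_balanced["Category"].apply(lambda x: 1 if x == "spam" else 0)

# stratify keeps the spam/ham ratio identical in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    df_balanced["Message"], df_balanced["spam"],
    stratify=df_balanced["spam"], random_state=42)
print(X_train.head())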
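
The 768-length sentence encodings come from two TF Hub layers, a preprocessor and an encoder, wrapped in a helper like the get_sentence_embedding function the summaries mention. A minimal sketch, assuming the bert_en_uncased preprocess/encoder handles (check tfhub.dev for current versions) and that tensorflow_text is installed so the preprocessing ops are registered:

import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the ops the preprocessing layer needs

bert_preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")

def get_sentence_embedding(sentences):
    """Return one 768-length pooled BERT vector per input sentence."""
    preprocessed = bert_preprocess(sentences)
    return bert_encoder(preprocessed)["pooled_output"]

emb = get_sentence_embedding([
    "500$ discount, hurry up!",
    "Are you up for a volleyball game tomorrow?",
])
print(emb.shape)  # (2, 768): two sentences, one 768-length vector each

For two input sentences the pooled output therefore has shape (2, 768), which is what the transcript's "tensor of shape 2768" refers to.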
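
Cosine similarity measures how closely two of these vectors point in the same direction (values near 1 mean very similar). Continuing from the get_sentence_embedding helper above, a quick check on the fruit names and the Jeff Bezos / Elon Musk comparison from the transcript might look like this; the exact values depend on the model version:

from sklearn.metrics.pairwise import cosine_similarity

words = ["banana", "grapes", "mango", "jeff bezos", "elon musk", "bill gates"]
e = get_sentence_embedding(words).numpy()

# Fruits should be more similar to each other than to people, and the two
# entrepreneurs should score high against each other because BERT saw them
# in similar contexts during pre-training.
print(cosine_similarity([e[0]], [e[1]]))  # banana vs grapes
print(cosine_similarity([e[0]], [e[3]]))  # banana vs jeff bezos
print(cosine_similarity([e[3]], [e[4]]))  # jeff bezos vs elon musk (~0.98 in the video)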
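
The detail also contrasts the Sequential and functional ways of defining a Keras model; the video builds the BERT classifier in the functional style, where each layer is called like a function on the previous one. A toy side-by-side sketch of the two styles (on a plain dense head, not the BERT model itself):

import tensorflow as tf

# Sequential style: a plain stack of layers, fine for straight-line models.
seq_model = tf.keras.Sequential([
    tf.keras.Input(shape=(768,)),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Functional style: each layer is called on the previous output, which also
# allows branches, multiple inputs/outputs and dict-valued layers like BERT.
inputs = tf.keras.Input(shape=(768,))
x = tf.keras.layers.Dropout(0.1)(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
func_model = tf.keras.Model(inputs=inputs, outputs=outputs)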
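
Training and evaluation, continuing with the model from the sketch after the video description and the X_train / y_train split from the data-preparation sketch. The metric list mirrors what the video reports (accuracy, precision, and recall all around 90-91%); the epoch count is an assumption:

import tensorflow as tf

metrics = [
    tf.keras.metrics.BinaryAccuracy(name="accuracy"),
    tf.keras.metrics.Precision(name="precision"),
    tf.keras.metrics.Recall(name="recall"),
]
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=metrics)

model.fit(X_train, y_train, epochs=5)   # epochs=5 is an assumption
model.evaluate(X_test, y_test)          # the video reports roughly 90-91% accuracy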
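
Finally, the confusion matrix, classification report, and 0.5-threshold inference that the summaries describe, continuing from the fitted model. The sample sentences are made-up placeholders, not the ones from the notebook:

from sklearn.metrics import classification_report, confusion_matrix

# Probabilities above 0.5 are treated as spam (1), otherwise ham (0).
y_prob = model.predict(X_test)
y_pred = (y_prob.flatten() > 0.5).astype(int)

print(confusion_matrix(y_test, y_pred))       # rows = truth, columns = prediction
print(classification_report(y_test, y_pred))  # per-class precision, recall and F1

samples = [
    "You have won a $1000 reward, reply now to claim it!",
    "Hey, are we still meeting for lunch tomorrow?",
]
print(model.predict(samples))  # values above 0.5 are classified as spam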