title
Word Embedding and Word2Vec, Clearly Explained!!!
description
Words are great, but if we want to use them as input to a neural network, we have to convert them to numbers. One of the most popular methods for assigning numbers to words is to use a Neural Network to create Word Embeddings. In this StatQuest, we go through the steps required to create Word Embeddings, and show how we can visualize and validate them. We then talk about one of the most popular Word Embedding tools, word2vec. BAM!!!
Note, this StatQuest assumes that you are already familiar with...
The Basics of how Neural Networks Work: https://youtu.be/CqOfi41LfDw
The Basics of how Backpropagation Works: https://youtu.be/IN2XmBhILt4
How the Softmax function works: https://youtu.be/KpKog-L9veg
How Cross Entropy works: https://youtu.be/6ArSys5qHAU
If you'd like to support StatQuest, please consider...
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
4:25 Building a Neural Network to do Word Embedding
8:18 Visualizing and Validating the Word Embedding
10:42 Summary of Main Ideas
11:44 word2vec
13:36 Speeding up training with Negative Sampling
#StatQuest #word2vec
detail
Overview: The video explains how a neural network can assign numbers to words (word embeddings) so that similar words end up with similar numbers, shows how to visualize and validate the embeddings, and then introduces word2vec and how negative sampling speeds up its training.

0:00 Awesome song and introduction
If you want to turn words into numbers, and you want those numbers to make sense, use word embeddings, and similar words will have similar numbers. Josh Starmer introduces word embedding and word2vec, thanks Lightning for sponsoring (it makes using the cloud as easy as using your laptop), and gives a special thanks to Alex Lavarie and the students at Boston University's SPARC for test-watching this StatQuest. Always be curious!

People communicate with words, but a lot of machine learning algorithms, including neural networks, don't work well with raw words. So if we want to plug words into a neural network or some other machine learning algorithm, we need a way to turn the words into numbers. Just assigning each word a random number doesn't work well: similar words that are used in similar ways, like "great" and "awesome", could end up with wildly different numbers (say, 4.2 and -32.1), so learning how to correctly process "great" wouldn't help the network correctly use "awesome", and the network would need a lot more complexity and training. It would be much better if similar words used in similar ways were given similar numbers, so that learning how to use one word helps with learning how to use the other at the same time. And because the same word can appear in different contexts, or be made plural, or be used in some other way, it also helps to assign each word more than one number, so the network can more easily adjust to different contexts.
4:25 Building a Neural Network to do Word Embedding
The good news is that a super simple neural network can do all of that work for us. Bam! Imagine the training data consists of just two phrases: "Troll 2 is great!" and "Gymkata is great!". Both phrases put Troll 2 and Gymkata in the same context, because someone (no names) thinks these two terrible movies are great. The first step is to create one input for each unique word; here there are four unique words, so there are four inputs. Each input is connected to at least one activation function that uses the identity function, so the output value equals the input value; the activation functions don't do anything except give us a place to do addition. The number of activation functions determines how many numbers we associate with each word, and the weights on the connections from the inputs to the activation functions will ultimately be those numbers. In this example we want two numbers per word, so we use two activation functions, and, like always, the weights start out with random values that will be optimized with backpropagation.
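As a minimal sketch of that setup (the vocabulary comes from the two phrases above, but the specific weight values are made up for illustration), feeding a word into the network is the same as multiplying a one-hot vector by the input-to-activation weight matrix, which simply picks out that word's row of weights:

```python
import numpy as np

# Vocabulary from the two training phrases: "Troll 2 is great!" and "Gymkata is great!"
vocab = ["Troll 2", "is", "great", "Gymkata"]

# One row of weights per word, one column per activation function (2 numbers per word).
# These values are made up; in the video they start out random and are learned by backpropagation.
weights_to_activations = np.array([
    [ 0.1, -0.4],   # Troll 2
    [ 0.9,  0.6],   # is
    [-0.2,  0.8],   # great
    [ 0.1, -0.5],   # Gymkata
])

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

# Multiplying a one-hot input by the weight matrix simply selects that word's row,
# so the weights going to the activation functions ARE the word's embedding.
print(one_hot("Troll 2") @ weights_to_activations)   # -> [ 0.1 -0.4]
print(one_hot("Gymkata") @ weights_to_activations)   # -> [ 0.1 -0.5]
```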
To optimize those weights, the network needs something to predict, so it is trained to predict the next word in each phrase. The activation functions are connected to one output per word in the vocabulary, those connections get their own randomly initialized weights, the outputs are run through the softmax function (because there are multiple outputs for classification), and cross entropy is used as the loss function for backpropagation. Because Troll 2 and Gymkata appear in the same context in the training data, we hope backpropagation will make their weights more and more similar, and that is what happens: we start with random weights and end up with new weights.

8:18 Visualizing and Validating the Word Embedding
The new weights on the connections from the inputs to the activation functions are the word embeddings. Plotting each word using its two embedding values as coordinates shows that Troll 2 and Gymkata end up close together, and the trained network correctly predicts the next word for each input word, which validates the embeddings.

10:42 Summary of Main Ideas
Having similar words with similar embeddings means training a neural network to process language is easier, because learning how one word is used helps the network learn how similar words are used. Double bam!
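The whole toy network can be written out in a few lines of PyTorch. This is only a sketch (the training pairs, learning rate, and number of epochs are choices made for this example, not taken from the video), but it has the same structure: one-hot inputs, two identity activations, one output per word, softmax plus cross entropy, and backpropagation:

```python
import torch
import torch.nn as nn

vocab = ["Troll 2", "is", "great", "Gymkata"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

# Training pairs (input word -> next word) from "Troll 2 is great" and "Gymkata is great".
pairs = [("Troll 2", "is"), ("is", "great"), ("Gymkata", "is")]
inputs  = torch.tensor([word_to_idx[a] for a, b in pairs])
targets = torch.tensor([word_to_idx[b] for a, b in pairs])

class TinyEmbeddingNet(nn.Module):
    def __init__(self, vocab_size=4, embedding_dim=2):
        super().__init__()
        # Equivalent to a one-hot input times a weight matrix, feeding identity "activations".
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        # Weights from the activation functions to one output per word.
        self.out = nn.Linear(embedding_dim, vocab_size, bias=False)

    def forward(self, word_ids):
        return self.out(self.embed(word_ids))  # logits; softmax is applied inside the loss

model = TinyEmbeddingNet()
loss_fn = nn.CrossEntropyLoss()              # softmax + cross entropy, as in the video
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):                     # backpropagation
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

# The learned embedding weights are the word embeddings; "Troll 2" and "Gymkata" tend to
# end up with similar values because both are trained to predict the same next word, "is".
print(model.embed.weight.data)
```

Printing model.embed.weight before and after the loop is an easy way to reproduce the before/after comparison from the video.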
11:44 word2vec
So far the network only predicts the next word in each phrase, and just predicting the next word doesn't give a lot of context for understanding each word. word2vec, a popular method for creating word embeddings, uses two strategies to include more context. The first, called Continuous Bag of Words, uses the surrounding words to predict the word that occurs in the middle; for example, it could use "Troll 2" and "great" to predict the word between them, "is". The second, called Skip Gram, uses the word in the middle to predict the surrounding words; for example, it could use "is" to predict "Troll 2", "great", and "Gymkata".
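If you want to try both strategies on your own text, the gensim library exposes them through a single flag. A minimal sketch (the toy corpus below is just the two phrases from the video, and the parameter names follow gensim 4.x):

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens. Real word2vec is trained on huge corpora.
sentences = [
    ["troll2", "is", "great"],
    ["gymkata", "is", "great"],
]

# sg=0 -> Continuous Bag of Words: surrounding words predict the middle word.
cbow = Word2Vec(sentences, vector_size=100, window=2, sg=0, min_count=1)

# sg=1 -> Skip Gram: the middle word predicts the surrounding words.
skipgram = Word2Vec(sentences, vector_size=100, window=2, sg=1, min_count=1)

# Each word now has a 100-number embedding.
print(skipgram.wv["troll2"].shape)         # (100,)
print(skipgram.wv.most_similar("troll2"))  # words with the most similar embeddings
```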
In practice, word2vec uses a lot more than two embedding values per word — typically 100 or more activation functions — and, instead of a vocabulary of four words and phrases, a vocabulary of about three million words and phrases. That means the total number of weights to optimize is 3 million words and phrases, times at least 100 weights going from each word to the activation functions, times 2 for the weights that get us from the activation functions to the outputs, for a total of about 600 million weights. So training can be slow.
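Written out as a quick calculation (using the 3-million-word vocabulary and 100 embedding values per word quoted in the video):

```python
vocab_size    = 3_000_000   # words and phrases in word2vec's vocabulary
embedding_dim = 100         # at least 100 numbers (activation functions) per word

weights_in  = vocab_size * embedding_dim   # inputs -> activation functions
weights_out = embedding_dim * vocab_size   # activation functions -> outputs

print(weights_in + weights_out)            # 600000000 weights to optimize
```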
13:36 Speeding up training with Negative Sampling
One way word2vec speeds things up is negative sampling, which works by randomly selecting a small subset of words we don't want to predict and optimizing only those. For example, say we want the word "aardvark" to predict the word "a". On the input side, only "aardvark" gets a 1 and every other word gets a 0, so we can ignore the weights coming from every word but "aardvark", because the other words multiply their weights by zero; that alone removes close to 300 million weights from the optimization step. However, that still leaves 300 million weights after the activation functions. So word2vec randomly selects a few words it doesn't want to predict — in practice between 2 and 20, but in this example just one, "abandon" — and only uses the output values for "a" and "abandon", which means the weights leading to all of the other possible outputs can be ignored for this round of backpropagation. In the end, out of the 600 million total weights in the network, only about 300 are optimized per step.
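Here is one way to sketch that idea in PyTorch, using the sigmoid-based objective that word2vec's negative sampling is usually written with. Everything here — the word ids, the tiny vocabulary, and the single sampled negative — is made up for illustration; the point is just to show that only the rows for the input word, the positive word, and the sampled negative word get gradients:

```python
import torch
import torch.nn.functional as F

# Tiny numbers so the sketch runs instantly; in word2vec the vocabulary is ~3 million
# and embedding_dim is 100 or more.
vocab_size, embedding_dim = 10, 100

in_vecs  = torch.randn(vocab_size, embedding_dim, requires_grad=True)  # inputs -> activations
out_vecs = torch.randn(vocab_size, embedding_dim, requires_grad=True)  # activations -> outputs

center   = torch.tensor(0)    # e.g. "aardvark" (word ids here are made up)
positive = torch.tensor(1)    # the word we DO want to predict, e.g. "a"
negative = torch.tensor([2])  # randomly chosen word(s) we DON'T want to predict, e.g. "abandon"

v_c = in_vecs[center]                  # 100 weights for the input word
pos_score = out_vecs[positive] @ v_c   # uses 100 weights for the positive word
neg_score = out_vecs[negative] @ v_c   # uses 100 weights per sampled negative word

# Push the positive score up and the sampled negative score(s) down.
loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum())
loss.backward()

# Only the rows we touched get non-zero gradients: 100 + 100 + 100 = 300 weights this step.
touched_rows = (in_vecs.grad.abs().sum(dim=1) > 0).sum() + (out_vecs.grad.abs().sum(dim=1) > 0).sum()
print(touched_rows.item() * embedding_dim)  # 300
```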
Hooray! We've made it to the end of another exciting StatQuest. If you like this StatQuest and want to see more, please subscribe. And if you want to support StatQuest, consider contributing to the Patreon campaign, becoming a channel member, buying one or two of the original songs, a t-shirt, or a hoodie, or just donating — there's something for everyone, and the links are in the description. Until next time, quest on!