title

LSTM Networks - The Math of Intelligence (Week 8)

description

Recurrent networks can be improved to remember long-range dependencies by using what's called a Long Short-Term Memory (LSTM) cell. Let's build one using just NumPy! I'll go over the cell components as well as the forward and backward pass logic.
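To make the cell components concrete, here is a minimal sketch of one LSTM forward step in NumPy. It follows the textbook formulation (forget, input, and output gates plus a candidate vector, which the video calls the input modulation gate) and is not taken from the linked repository. The names lstm_step, W, and b, and the packing of all four gates into one weight matrix, are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    # Squashes values into (0, 1), so each gate acts as a soft switch.
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell (hypothetical helper).

    x      : input vector at this time step, shape (input_size,)
    h_prev : previous hidden state, shape (hidden_size,)
    c_prev : previous cell state, shape (hidden_size,)
    W      : packed gate weights, shape (4 * hidden_size, hidden_size + input_size)
    b      : packed gate biases, shape (4 * hidden_size,)
    """
    n = h_prev.shape[0]
    z = np.concatenate([h_prev, x])      # previous hidden state joined with the input
    gates = W @ z + b                    # input times weight, add a bias
    f = sigmoid(gates[0:n])              # forget gate: how much old cell state to keep
    i = sigmoid(gates[n:2 * n])          # input gate: how much new content to save
    o = sigmoid(gates[2 * n:3 * n])      # output gate: how much cell state to expose
    g = np.tanh(gates[3 * n:])           # candidate values (input modulation)
    c = f * c_prev + i * g               # new cell state (long-term memory)
    h = o * np.tanh(c)                   # new hidden state (working memory)
    return h, c

# Run the cell over a short random sequence to check the shapes.
rng = np.random.default_rng(0)
input_size, hidden_size = 5, 3
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(4, input_size)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

Because the gates use a sigmoid, their outputs are values between 0 and 1 rather than strict on/off switches. The additive update of the cell state is the part that helps gradients survive over many time steps.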
Code for this video:
https://github.com/llSourcell/LSTM_Networks
Please subscribe! And like. And comment. That's what keeps me going.
More learning resources:
https://www.youtube.com/watch?v=ftMq5ps503w
https://www.youtube.com/watch?v=cdLUzrjnlr4
https://www.youtube.com/watch?v=hWgGJeAvLws
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/
Join us in the Wizards Slack channel:
http://wizards.herokuapp.com/
And please support me on Patreon:
https://www.patreon.com/user?u=3191693
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/
Sign up for my newsletter for exciting updates in the field of AI:
https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI sports betting bot, WagerGPT! (500 spots available):
https://www.wagergpt.co

detail

{'title': 'LSTM Networks - The Math of Intelligence (Week 8)', 'heatmap': [{'end': 1027.753, 'start': 946.582, 'weight': 0.756}, {'end': 2678.736, 'start': 2651.209, 'weight': 0.979}], 'summary': 'Explores building lstm networks using numpy to generate eminem lyrics, delves into recurrent networks, the vanishing gradient problem, long-term dependencies, training through back propagation, and the significance of lstm cells in neural networks, covering applications in nlp, image classification, and text prediction.', 'chapters': [{'end': 86.05, 'segs': [{'end': 68.946, 'src': 'embed', 'start': 0.109, 'weight': 0, 'content': [{'end': 9.596, 'text': "Hello world, it's Siraj and our task today is to build a recurrent network, a type of recurrent network called an LSTM or long,", 'start': 0.109, 'duration': 9.487}, {'end': 13.179, 'text': 'short-term memory network to generate Eminem lyrics.', 'start': 9.596, 'duration': 3.583}, {'end': 20.525, 'text': "That's our task, and we're going to build this without using any libraries, just NumPy, which is for matrix math,", 'start': 13.479, 'duration': 7.046}, {'end': 24.387, 'text': 'because we want to learn about the math behind LSTM networks.', 'start': 20.525, 'duration': 3.862}, {'end': 34.305, 'text': "We've talked about recurrent networks earlier on in the series, and LSTMs are the next logical step in that progression of neural network learnings.", 'start': 25.308, 'duration': 8.997}, {'end': 36.209, 'text': "So that's what we're going to do today.", 'start': 34.926, 'duration': 1.283}, {'end': 39.49, 'text': "We're gonna first talk about recurring networks.", 'start': 38.049, 'duration': 1.441}, {'end': 46.796, 'text': "We're gonna do a little refresher on what recurring networks are, how we can improve them, and then we'll talk about LSTM networks and how they are,", 'start': 39.51, 'duration': 7.286}, {'end': 49.177, 'text': 'improvements and the mathematics behind them.', 'start': 46.796, 'duration': 2.381}, {'end': 
50.879, 'text': "Then we'll build all of this.", 'start': 49.217, 'duration': 1.662}, {'end': 56.483, 'text': "Build as in we're going to look, I'm gonna explain the code, because there's quite a lot of code to go over here.", 'start': 51.359, 'duration': 5.124}, {'end': 57.404, 'text': "There's a lot of code.", 'start': 56.743, 'duration': 0.661}, {'end': 68.946, 'text': "But we're going to look at all the code and then I'm going to code out manually the forward propagation parts for both the greater recurrent network and then the LSTM cell itself.", 'start': 58.624, 'duration': 10.322}], 'summary': 'Siraj will build an lstm network to generate eminem lyrics using numpy for matrix math and will explain the code manually.', 'duration': 68.837, 'max_score': 0.109, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY109.jpg'}], 'start': 0.109, 'title': 'Building lstm network for generating eminem lyrics', 'summary': 'Focuses on building a long short-term memory (lstm) network using only numpy to generate eminem lyrics, providing an overview of recurrent networks, discussing improvements and the mathematics behind lstms, and manually coding the forward propagation parts for both the recurrent network and the lstm cell.', 'chapters': [{'end': 86.05, 'start': 0.109, 'title': 'Building lstm network for generating eminem lyrics', 'summary': 'Focuses on building a long short-term memory (lstm) network using only numpy to generate eminem lyrics, providing an overview of recurrent networks, discussing improvements and the mathematics behind lstms, and manually coding the forward propagation parts for both the recurrent network and the lstm cell.', 'duration': 85.941, 'highlights': ['The chapter focuses on building a long short-term memory (LSTM) network using only NumPy to generate Eminem lyrics. 
This is the primary goal of the chapter and sets the context for the entire discussion.', 'Providing an overview of recurrent networks, discussing improvements and the mathematics behind LSTMs. The chapter covers the basics of recurrent networks, their enhancements, and delves into the mathematical aspects of LSTM networks.', 'Manually coding the forward propagation parts for both the recurrent network and the LSTM cell. The chapter involves manual coding of the forward propagation parts for both the recurrent network and the LSTM cell, emphasizing a hands-on approach to understanding the implementation.']}], 'duration': 85.941, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY109.jpg', 'highlights': ['The chapter focuses on building a long short-term memory (LSTM) network using only NumPy to generate Eminem lyrics.', 'Providing an overview of recurrent networks, discussing improvements and the mathematics behind LSTMs.', 'Manually coding the forward propagation parts for both the recurrent network and the LSTM cell.']}, {'end': 445.512, 'segs': [{'end': 147.561, 'src': 'embed', 'start': 108.266, 'weight': 0, 'content': [{'end': 108.746, 'text': 'We know this.', 'start': 108.266, 'duration': 0.48}, {'end': 110.706, 'text': "They're useful for learning sequential data.", 'start': 108.806, 'duration': 1.9}, {'end': 120.709, 'text': "A series of video frames, text, music, anything that is a sequence of data, that's where recurrent networks do really well.", 'start': 111.567, 'duration': 9.142}, {'end': 121.969, 'text': "That's what they're made for.", 'start': 121.049, 'duration': 0.92}, {'end': 122.91, 'text': "They're very simple.", 'start': 122.149, 'duration': 0.761}, {'end': 127.351, 'text': 'You have your input, your input data, and then you have your hidden state, and then you have an output.', 'start': 122.95, 'duration': 4.401}, {'end': 133.714, 'text': 'And so the difference between recurrent nets and 
feedforward nets are that recurrent nets have something different here.', 'start': 127.811, 'duration': 5.903}, {'end': 139.917, 'text': "So in a normal feedforward net, we would have our input, our hidden layer, and then our output layer, and that's it.", 'start': 133.974, 'duration': 5.943}, {'end': 143.599, 'text': 'And then we would have two weight matrices between each of these layers.', 'start': 139.957, 'duration': 3.642}, {'end': 147.561, 'text': 'that are, those are matrices, right that we multiply right?', 'start': 143.599, 'duration': 3.962}], 'summary': 'Recurrent networks excel in processing sequential data, utilizing input, hidden state, and output, with a distinction from feedforward nets in terms of weight matrices.', 'duration': 39.295, 'max_score': 108.266, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY108266.jpg'}, {'end': 332.762, 'src': 'embed', 'start': 303.731, 'weight': 4, 'content': [{'end': 306.733, 'text': "and no, I'm just kidding, the problem is that one with whatever.", 'start': 303.731, 'duration': 3.002}, {'end': 307.554, 'text': "so here's the problem.", 'start': 306.733, 'duration': 0.821}, {'end': 309.155, 'text': 'the problem is actually really interesting.', 'start': 307.554, 'duration': 1.601}, {'end': 311.157, 'text': "it's called the vanishing gradient problem.", 'start': 309.155, 'duration': 2.002}, {'end': 312.738, 'text': "ok, so it's called the vanishing gradient problem.", 'start': 311.157, 'duration': 1.581}, {'end': 315.158, 'text': "so let's say you're like what is this?", 'start': 313.118, 'duration': 2.04}, {'end': 320.88, 'text': "so let's say we, we are trying to predict the next word in a sequence of text, which is actually what we are trying to do right now.", 'start': 315.158, 'duration': 5.722}, {'end': 323.46, 'text': "duh, but let's say that's what we're trying to do.", 'start': 320.88, 'duration': 2.58}, {'end': 327.601, 'text': "and let's say we're 
trying to predict the last word in this sentence, the grass is green.", 'start': 323.46, 'duration': 4.141}, {'end': 329.221, 'text': "right. so all we're given.", 'start': 327.601, 'duration': 1.62}, {'end': 332.762, 'text': "so we know that the word is, you know, green, but let's say we're trying to predict it.", 'start': 329.221, 'duration': 3.541}], 'summary': 'The vanishing gradient problem in predicting text sequences.', 'duration': 29.031, 'max_score': 303.731, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY303731.jpg'}], 'start': 86.77, 'title': 'Recurrent networks and the vanishing gradient problem', 'summary': 'Explains the concept of recurrent networks and their usefulness in learning sequential data, as well as delves into the vanishing gradient problem and its impact on predicting distant words in a sequence.', 'chapters': [{'end': 164.771, 'start': 86.77, 'title': 'Understanding recurrent networks', 'summary': 'Explains the concept of recurrent networks, highlighting their usefulness in learning sequential data like video frames, text, and music, and the key difference between recurrent and feedforward networks.', 'duration': 78.001, 'highlights': ['Recurrent networks are useful for learning sequential data such as video frames, text, and music, and are designed for this purpose.', 'The key difference between recurrent and feedforward networks lies in the addition of another weight matrix in recurrent networks.', 'Recurrent networks have input data, hidden state, and output, while feedforward networks have input, hidden layer, and output layer with weight matrices in between.']}, {'end': 445.512, 'start': 164.771, 'title': 'Recurrent networks and the vanishing gradient problem', 'summary': 'Explains the concept of recurrent networks as a chain of feed forward networks, and delves into the vanishing gradient problem, using examples to illustrate its impact on predicting distant words in a 
sequence.', 'duration': 280.741, 'highlights': ['Recurrent networks are explained as a series of feed forward networks, where both input data points and hidden states are fed in at every time step for training. This method of training involves feeding in both the input data and the hidden state at every time step, creating a series of feed forward networks.', 'The vanishing gradient problem is introduced, affecting the ability of recurrent networks to predict distant words in a sequence. The vanishing gradient problem is described as a challenge for recurrent networks in predicting distant words in a sequence, illustrated through examples of predicting words with varying distances in a text.']}], 'duration': 358.742, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY86770.jpg', 'highlights': ['Recurrent networks are useful for learning sequential data such as video frames, text, and music, and are designed for this purpose.', 'The key difference between recurrent and feedforward networks lies in the addition of another weight matrix in recurrent networks.', 'Recurrent networks have input data, hidden state, and output, while feedforward networks have input, hidden layer, and output layer with weight matrices in between.', 'Recurrent networks are explained as a series of feed forward networks, where both input data points and hidden states are fed in at every time step for training.', 'The vanishing gradient problem is introduced, affecting the ability of recurrent networks to predict distant words in a sequence.']}, {'end': 809.017, 'segs': [{'end': 510.734, 'src': 'embed', 'start': 445.733, 'weight': 0, 'content': [{'end': 447.474, 'text': "So of course he's going to speak fluent French.", 'start': 445.733, 'duration': 1.741}, {'end': 453.479, 'text': "So we've got to find a way for our network to be able to remember long-term dependencies.", 'start': 447.834, 'duration': 5.645}, {'end': 457.383, 'text': 'And 
so the whole idea behind recurrent networks.', 'start': 454.901, 'duration': 2.482}, {'end': 465.689, 'text': 'the whole reason that we feed in the hidden state from the previous time step for every new iteration is so that we can have a form of neural memory.', 'start': 457.383, 'duration': 8.306}, {'end': 469.392, 'text': "That's why we don't just feed in the previous input.", 'start': 466.49, 'duration': 2.902}, {'end': 472.633, 'text': 'But we feed it in the previous hidden state.', 'start': 470.072, 'duration': 2.561}, {'end': 479.194, 'text': 'because the hidden state is that matrix that represents the learnings at the learnings of the network,', 'start': 472.633, 'duration': 6.561}, {'end': 484.496, 'text': 'so that we can Give it a form of memory like what is remember before and the new data points.', 'start': 479.194, 'duration': 5.302}, {'end': 487.056, 'text': "but the problem and you would think, okay, So that's all you need, right.", 'start': 484.496, 'duration': 2.56}, {'end': 487.516, 'text': 'of course,', 'start': 487.056, 'duration': 0.46}, {'end': 497.079, 'text': 'the hidden state is going to be updated more and more and the whole idea of Recurrence is made so that we have a form of neural memory for sequences specifically.', 'start': 487.516, 'duration': 9.563}, {'end': 504.351, 'text': 'but the problem is Now, whenever we are back propagating, the gradient tends to vanish.', 'start': 497.079, 'duration': 7.272}, {'end': 506.072, 'text': "So it's called the vanishing gradient problem.", 'start': 504.491, 'duration': 1.581}, {'end': 506.992, 'text': 'So let me explain this.', 'start': 506.412, 'duration': 0.58}, {'end': 510.734, 'text': "So whenever we're forward propagating, we have our input data right?", 'start': 507.352, 'duration': 3.382}], 'summary': 'Recurrent networks maintain neural memory for sequences, but face vanishing gradient problem.', 'duration': 65.001, 'max_score': 445.733, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY445733.jpg'}, {'end': 576.123, 'src': 'embed', 'start': 545.908, 'weight': 3, 'content': [{'end': 552.555, 'text': 'and then we use that error value to compute the partial derivative with respect to our weights going backwards in our network recursively.', 'start': 545.908, 'duration': 6.647}, {'end': 557.821, 'text': "We're performing gradient descent or, in the context of neural networks, back propagation,", 'start': 552.876, 'duration': 4.945}, {'end': 561.565, 'text': 'because we are back propagating an error gradient across every layer.', 'start': 557.821, 'duration': 3.744}, {'end': 568.697, 'text': 'But what happens is that as the gradient Remember, this is a chain of operations, the chain rule.', 'start': 562.266, 'duration': 6.431}, {'end': 576.123, 'text': 'What happens is, as we are propagating this gradient value backwards across every layer,', 'start': 569.658, 'duration': 6.465}], 'summary': 'Backward propagation computes partial derivative with respect to weights in neural networks.', 'duration': 30.215, 'max_score': 545.908, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY545908.jpg'}, {'end': 711.703, 'src': 'embed', 'start': 679.459, 'weight': 4, 'content': [{'end': 679.759, 'text': "It's good.", 'start': 679.459, 'duration': 0.3}, {'end': 687.426, 'text': "It's good for us and so So that's the whole point of performing gradient descent, aka back propagation.", 'start': 679.819, 'duration': 7.607}, {'end': 688.947, 'text': 'And so the gradient gets smaller.', 'start': 687.787, 'duration': 1.16}, {'end': 700.975, 'text': 'And so what this means is that the magnitude of change in the first layers of the network is going to be smaller than the magnitude of change in the tail end of the network,', 'start': 689.268, 'duration': 11.707}, {'end': 701.776, 'text': 'the last layers.', 'start': 700.975, 
'duration': 0.801}, {'end': 707.019, 'text': 'So the last layers are going to be more affected by the change.', 'start': 702.356, 'duration': 4.663}, {'end': 711.703, 'text': 'But the first layers are going to be not as affected because the gradient update is smaller.', 'start': 707.24, 'duration': 4.463}], 'summary': 'Performing gradient descent reduces magnitude of change in network layers, affecting last layers more than first layers.', 'duration': 32.244, 'max_score': 679.459, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY679459.jpg'}, {'end': 747.725, 'src': 'embed', 'start': 723.574, 'weight': 5, 'content': [{'end': 730.329, 'text': 'and If E of these factors is smaller than one, the gradients may vanish in time.', 'start': 723.574, 'duration': 6.755}, {'end': 733.092, 'text': 'If larger than one, an exploding might happen.', 'start': 730.429, 'duration': 2.663}, {'end': 735.074, 'text': "So that's called the exploding gradient problem.", 'start': 733.132, 'duration': 1.942}, {'end': 736.635, 'text': 'The gradient itself is too big.', 'start': 735.134, 'duration': 1.501}, {'end': 737.976, 'text': 'So it can go either direction.', 'start': 736.935, 'duration': 1.041}, {'end': 739.298, 'text': "Usually it's a vanishing gradient.", 'start': 738.056, 'duration': 1.242}, {'end': 747.725, 'text': 'But yeah, this is a problem, right? 
We want to somehow maintain that gradient value as we are back propagating.', 'start': 739.738, 'duration': 7.987}], 'summary': 'Exploding gradient problem occurs when e is larger than one, causing the gradient to become too big and potentially go in either direction.', 'duration': 24.151, 'max_score': 723.574, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY723574.jpg'}, {'end': 801.492, 'src': 'embed', 'start': 770.525, 'weight': 6, 'content': [{'end': 772.667, 'text': 'maintain, remember, you know whatever you want to call it.', 'start': 770.525, 'duration': 2.142}, {'end': 778.513, 'text': 'And so the solution for this is called using an LSTM cell, or a long short-term memory cell.', 'start': 772.987, 'duration': 5.526}, {'end': 781.957, 'text': 'And if you think about it, the knowledge of our recurrent network is pretty chaotic.', 'start': 778.814, 'duration': 3.143}, {'end': 788.604, 'text': "Like, let's say, we're trying to you know, caption a video, a set of a video, right?", 'start': 782.237, 'duration': 6.367}, {'end': 792.826, 'text': "And it sees a guy and he's eating a burger and the Statue of Liberty is behind him.", 'start': 788.944, 'duration': 3.882}, {'end': 796.629, 'text': 'So then the network thinks, okay, he must be in the United States.', 'start': 792.866, 'duration': 3.763}, {'end': 801.492, 'text': "But then he's eating sushi and it thinks, oh, he must be in Japan just because it's seeing sushi.", 'start': 796.929, 'duration': 4.563}], 'summary': 'Using lstm cell to address chaotic knowledge in recurrent network.', 'duration': 30.967, 'max_score': 770.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY770525.jpg'}], 'start': 445.733, 'title': 'Recurrent neural networks and training', 'summary': 'Explains recurrent networks for long-term dependencies, addressing the vanishing gradient problem and the use of hidden states. 
it also discusses neural network training through back propagation, the impact of gradient descent on weight updates, and the use of lstm cells to address vanishing or exploding gradients.', 'chapters': [{'end': 510.734, 'start': 445.733, 'title': 'Recurrent neural networks', 'summary': 'Explains the concept of recurrent networks for remembering long-term dependencies in neural networks, addressing the vanishing gradient problem and the use of hidden states for neural memory.', 'duration': 65.001, 'highlights': ['Recurrent networks are designed to remember long-term dependencies by using hidden states from previous time steps to provide a form of neural memory for sequences.', 'The vanishing gradient problem occurs during backpropagation in recurrent networks, leading to the gradient becoming extremely small and hindering learning.', 'Hidden states in recurrent networks are used to represent the learnings of the network and provide a form of memory for remembering previous data points.']}, {'end': 809.017, 'start': 510.774, 'title': 'Neural network training and gradient descent', 'summary': 'Discusses the process of training neural networks through back propagation, including the computation of error values, the impact of gradient descent on weight updates, and the problem of vanishing or exploding gradients, with a solution involving the use of lstm cells.', 'duration': 298.243, 'highlights': ['The process of back propagation involves computing the error value and using it to update weights going backwards in the network recursively, ultimately aiming to minimize the error value through gradient descent. 
This process involves computing the partial derivative with respect to the error and weights, recursively propagating the gradient value backwards across every layer, with the goal of minimizing the error value through gradient descent.', 'The impact of gradient descent on weight updates is such that the magnitude of change in the first layers of the network is smaller than the magnitude of change in the last layers, with the potential issue of vanishing or exploding gradients resulting from factors like weights and activation functions. The magnitude of change in the first layers is smaller due to the decreasing gradient value, while factors like weights and activation functions can lead to vanishing or exploding gradients, presenting challenges in maintaining the gradient value during back propagation.', 'The solution for the vanishing or exploding gradient problem involves using an LSTM cell, which helps in maintaining the gradient value during back propagation, especially in scenarios where the knowledge of recurrent networks can be chaotic and prone to forgetting important information. 
The solution to the vanishing or exploding gradient problem involves the use of an LSTM cell, which helps in maintaining the gradient value during back propagation, particularly in scenarios where recurrent networks may struggle with retaining important information in a chaotic context.']}], 'duration': 363.284, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY445733.jpg', 'highlights': ['Recurrent networks use hidden states to remember long-term dependencies.', 'The vanishing gradient problem hinders learning in recurrent networks.', 'Hidden states in recurrent networks represent the learnings and provide memory for previous data points.', 'Back propagation involves recursively updating weights to minimize error through gradient descent.', 'Gradient descent leads to smaller changes in early layers and potential vanishing or exploding gradients.', 'LSTM cells address vanishing or exploding gradients in recurrent networks.', 'LSTM cells help maintain gradient value in chaotic contexts for recurrent networks.']}, {'end': 1250.487, 'segs': [{'end': 882.579, 'src': 'embed', 'start': 856.546, 'weight': 0, 'content': [{'end': 861.947, 'text': "it's just a more complicated or more extensive series of matrix operations.", 'start': 856.546, 'duration': 5.401}, {'end': 863.488, 'text': 'so let me let me talk about what this is.', 'start': 861.947, 'duration': 1.541}, {'end': 865.028, 'text': 'okay, let me let me give this a go.', 'start': 863.488, 'duration': 1.54}, {'end': 868.529, 'text': 'so LSTM cells consist of three gates.', 'start': 865.028, 'duration': 3.501}, {'end': 876.154, 'text': "you've got an input gate right here, you have your output gate right here, you have your forget gate right here and then you have a cell state.", 'start': 868.529, 'duration': 7.625}, {'end': 882.579, 'text': "Now you also see this input modulation gate, so that's actually only used sometimes, but let's just forget about that.", 
'start': 876.454, 'duration': 6.125}], 'summary': 'LSTM cells consist of three gates and a cell state, with an input modulation gate used occasionally.', 'duration': 26.033, 'max_score': 856.546, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY856546.jpg'}, {'end': 946.042, 'src': 'embed', 'start': 918.455, 'weight': 4, 'content': [{'end': 923.36, 'text': 'So in a way you can think of an LSTM cell as kind of like a mini neural network.', 'start': 918.455, 'duration': 4.905}, {'end': 929.566, 'text': 'These gates all have weights themselves, so they all have their own set of weight matrices.', 'start': 923.96, 'duration': 5.606}, {'end': 934.11, 'text': 'That means that an LSTM cell is fully differentiable.', 'start': 929.966, 'duration': 4.144}, {'end': 943.359, 'text': 'That means that we can compute the derivative of each of these components or gates, which means that we can update them over time.', 'start': 934.49, 'duration': 8.869}, {'end': 946.042, 'text': 'We can have them learn over time.', 'start': 943.399, 'duration': 2.643}], 'summary': 'LSTM cell is fully differentiable with its own weight matrices, allowing for learning over time.', 'duration': 27.587, 'max_score': 918.455, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY918455.jpg'}, {'end': 1027.753, 'src': 'heatmap', 'start': 946.582, 'weight': 0.756, 'content': [{'end': 946.923, 'text': 'And so.', 'start': 946.582, 'duration': 0.341}, {'end': 951.927, 'text': 'So these are the equations for each of these, right?', 'start': 949.944, 'duration': 1.983}, {'end': 953.851, 'text': 'So for the forget gate, or input gate,', 'start': 951.927, 'duration': 1.924}, {'end': 957.156, 'text': "or our output gate, it's input times weight, add a bias, activate,", 'start': 953.851, 'duration': 3.305}, {'end': 962.786, 'text': 'where the input consists of the input and the hidden state from the previous time 
step.', 'start': 957.156, 'duration': 5.63}, {'end': 968.426, 'text': 'So each of these gates has its own set of weight values,', 'start': 962.786, 'duration': 5.64}, {'end': 977.971, 'text': "and so what we want is a way for our model to know what to forget, what to remember, and what to pay attention to in what it's learned, right?", 'start': 968.426, 'duration': 9.545}, {'end': 981.153, 'text': "What is relevant? That's called the attention mechanism.", 'start': 978.311, 'duration': 2.842}, {'end': 993.313, 'text': "What is the relevant data in everything that it's learned? What is the relevant part of what it's being fed in this time step to remember, and what should it forget?", 'start': 981.553, 'duration': 11.76}, {'end': 995.674, 'text': 'The cell state is the long-term memory.', 'start': 993.313, 'duration': 2.361}, {'end': 1000.078, 'text': 'It represents all of the learnings across all of time.', 'start': 995.674, 'duration': 4.404}, {'end': 1004.562, 'text': "The hidden state is akin to working memory, so it's kind of like a current memory.", 'start': 1000.078, 'duration': 4.484}, {'end': 1010.987, 'text': 'The forget gate, also called the remember vector, learns what to forget and what to remember.', 'start': 1004.562, 'duration': 6.425}, {'end': 1013.189, 'text': 'Right, a one or zero, binary outcome.', 'start': 1010.987, 'duration': 2.202}, {'end': 1021.612, 'text': 'The input gate, also called a save vector, determines how much of the input to let into the cell state: what to save and what not to.', 'start': 1013.869, 'duration': 7.743}, {'end': 1024.473, 'text': 'And the output gate is akin to an attention mechanism.', 'start': 1021.612, 'duration': 2.861}, {'end': 1027.753, 'text': 'What part of that data should it focus on?', 'start': 1024.913, 'duration': 2.84}], 'summary': 'LSTM model uses gates to forget, remember, and focus on data. 
aims for relevant learning and memory retention.', 'duration': 81.171, 'max_score': 946.582, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY946582.jpg'}, {'end': 1021.612, 'src': 'embed', 'start': 993.313, 'weight': 1, 'content': [{'end': 995.674, 'text': 'The cell state is the long-term memory.', 'start': 993.313, 'duration': 2.361}, {'end': 1000.078, 'text': 'It represents all of the learnings across all of time.', 'start': 995.674, 'duration': 4.404}, {'end': 1004.562, 'text': "The hidden state is akin to working memory, so it's kind of like a current memory.", 'start': 1000.078, 'duration': 4.484}, {'end': 1010.987, 'text': 'The forget gate, also called the remember vector, learns what to forget and what to remember.', 'start': 1004.562, 'duration': 6.425}, {'end': 1013.189, 'text': 'Right, a one or zero, binary outcome.', 'start': 1010.987, 'duration': 2.202}, {'end': 1021.612, 'text': 'The input gate, also called a save vector, determines how much of the input to let into the cell state: what to save and what not to.', 'start': 1013.869, 'duration': 7.743}], 'summary': 'LSTM has long-term and current memory, forgets based on binary outcome, and saves input selectively.', 'duration': 28.299, 'max_score': 993.313, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY993313.jpg'}, {'end': 1226.294, 'src': 'embed', 'start': 1205.239, 'weight': 2, 'content': [{'end': 1215.243, 'text': "So there's a lot of different ways that we can frame our problem, and, you know, LSTMs can be applied to almost any problem. Very useful.", 'start': 1205.239, 'duration': 10.004}, {'end': 1217.805, 'text': "And so what are some other great examples? 
Well, I've got two.", 'start': 1215.703, 'duration': 2.102}, {'end': 1221.209, 'text': 'One is for automatic speech recognition with TensorFlow.', 'start': 1218.366, 'duration': 2.843}, {'end': 1226.294, 'text': "So yes, it's a little abstracted with TensorFlow, but it's a really cool use case and definitely something to check out.", 'start': 1221.509, 'duration': 4.785}], 'summary': 'Lstms can be applied to various problems, including automatic speech recognition with tensorflow.', 'duration': 21.055, 'max_score': 1205.239, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY1205239.jpg'}], 'start': 809.037, 'title': 'Lstm in neural networks', 'summary': "Discusses lstm cells' significance in updating information accurately in neural networks over time, and their structure including three gates and a cell state. it also explains the working of lstm in neural networks, applications in nlp, image classification, and provides examples of automatic speech recognition and visualization of lstms.", 'chapters': [{'end': 946.042, 'start': 809.037, 'title': 'Lstm cell in neural networks', 'summary': 'Describes the need for lstm cells to update information accurately in neural networks over time, and explains the structure and functionality of lstm cells, consisting of three gates and a cell state, allowing for differentiability and learning over time.', 'duration': 137.005, 'highlights': ['The LSTM cell is a solution to update information accurately in neural networks, replacing the RNN cell, and consists of three gates (input, output, and forget) and a cell state, allowing for learning over time.', 'LSTM cells have their own set of weight matrices for each gate, making them fully differentiable and enabling the computation of derivatives for learning over time.', 'LSTM cells consist of three gates (input, output, and forget) and a cell state, allowing for differentiability and learning over time.']}, {'end': 1250.487, 
'start': 946.582, 'title': 'Understanding lstm in neural networks', 'summary': 'Explains the working of lstm in neural networks, focusing on forget gate, input gate, output gate, cell state, and hidden state, and their role in remembering, forgetting, and attention mechanisms. it also mentions the applications in nlp, image classification, and provides examples of automatic speech recognition and visualization of lstms.', 'duration': 303.905, 'highlights': ['The cell state is the long-term memory, representing all learnings across time, while the hidden state is akin to working memory, reflecting the current memory.', 'The forget gate, also known as the Remember vector, learns what to forget and what to remember, resulting in a binary outcome of one or zero.', 'The input gate, also called a save vector, determines how much of the input to let into the cell state, deciding what to save and what not to.', 'LSTMs are utilized for various applications such as NLP tasks, image classification, and automatic speech recognition, with examples including translation, sentiment analysis, and predicting sequences from sequential data.']}], 'duration': 441.45, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY809037.jpg', 'highlights': ['The LSTM cell is a solution to update information accurately in neural networks, replacing the RNN cell, and consists of three gates (input, output, and forget) and a cell state, allowing for learning over time.', 'The cell state is the long-term memory, representing all learnings across time, while the hidden state is akin to working memory, reflecting the current memory.', 'LSTMs are utilized for various applications such as NLP tasks, image classification, and automatic speech recognition, with examples including translation, sentiment analysis, and predicting sequences from sequential data.', 'The forget gate, also known as the Remember vector, learns what to forget and what to 
remember, resulting in a binary outcome of one or zero.', 'LSTM cells have their own set of weight matrices for each gate, making them fully differentiable and enabling the computation of derivatives for learning over time.']}, {'end': 2005.731, 'segs': [{'end': 1326.324, 'src': 'embed', 'start': 1268.138, 'weight': 0, 'content': [{'end': 1271.359, 'text': 'This just jumble of numbers with no explanation.', 'start': 1268.138, 'duration': 3.221}, {'end': 1278.861, 'text': "This represents the forward pass, the series of matrix operations that we're computing to get our output, our predicted output.", 'start': 1271.359, 'duration': 7.502}, {'end': 1280.482, 'text': "Okay, so, and we'll look at this in code.", 'start': 1278.861, 'duration': 1.621}, {'end': 1282.899, 'text': 'And so, okay, so.', 'start': 1280.482, 'duration': 2.417}, {'end': 1285.32, 'text': 'Our recurrent neural network is going to remember.', 'start': 1282.899, 'duration': 2.421}, {'end': 1293.785, 'text': "So what we're going to do is, given our Eminem text, we're going to predict the next word in the sequence, given the previous words.", 'start': 1285.32, 'duration': 8.465}, {'end': 1301.45, 'text': 'So what that means is the input is going to be all of those words that we have, just every word, and our output will be the same thing,', 'start': 1294.186, 'duration': 7.264}, {'end': 1302.831, 'text': 'but just moved over one.', 'start': 1301.45, 'duration': 1.381}, {'end': 1305.793, 'text': "That means that, you know, it'll look like this.", 'start': 1303.171, 'duration': 2.622}, {'end': 1310.856, 'text': "Like, you know, during training, we have a series of, you know, training iterations, right? 
So we'll say, you know, like.", 'start': 1305.833, 'duration': 5.023}, {'end': 1313.317, 'text': "Let's say, you know, my name is Slim.", 'start': 1311.956, 'duration': 1.361}, {'end': 1316.059, 'text': 'Shady would be like the text.', 'start': 1313.317, 'duration': 2.742}, {'end': 1317.579, 'text': 'So we would say my.', 'start': 1316.059, 'duration': 1.52}, {'end': 1318.7, 'text': "And we're trying to predict name.", 'start': 1317.579, 'duration': 1.121}, {'end': 1320.741, 'text': "Okay? We've got that.", 'start': 1318.7, 'duration': 2.041}, {'end': 1323.923, 'text': 'Compute the error, backpropagate.', 'start': 1320.741, 'duration': 3.182}, {'end': 1326.324, 'text': 'Next iteration, weights are updated.', 'start': 1323.923, 'duration': 2.401}], 'summary': 'Explaining the forward pass and next-word prediction in a recurrent neural network with an example from the Eminem text, leading to weight updates.', 'duration': 58.186, 'max_score': 1268.138, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY1268138.jpg'}, {'end': 1511.596, 'src': 'embed', 'start': 1485.342, 'weight': 3, 'content': [{'end': 1493.826, 'text': 'and then we have our array of expected output values and finally we have our LSTM cell, which we initialize right here, giving it our inputs,', 'start': 1485.342, 'duration': 8.484}, {'end': 1498.348, 'text': 'our outputs, the amount of recurrence and the learning rate, just like we did for the recurrent network.', 'start': 1493.826, 'duration': 4.522}, {'end': 1506.171, 'text': "Remember, it's like a mini network, it's like a network in a network, and if you think of the gates as networks, it's a network in a network,", 'start': 1498.348, 'duration': 7.823}, {'end': 1511.596, 'text': "inside of a network, or no, it's three networks inside of a network, inside of a network.", 'start': 1506.171, 'duration': 5.425}], 'summary': 'Introduction to LSTM cells and their network structure.', 'duration': 26.254, 'max_score': 
1485.342, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY1485342.jpg'}, {'end': 1629.998, 'src': 'embed', 'start': 1601.866, 'weight': 4, 'content': [{'end': 1610.311, 'text': "So that's us initializing our LSTM cell, and now we can run forward propagation through the LSTM cell itself.", 'start': 1601.866, 'duration': 8.445}, {'end': 1615.113, 'text': "so it's going to return our cell states, our hidden states, our forget gate, our.", 'start': 1610.311, 'duration': 4.802}, {'end': 1624.216, 'text': 'So we can, you know, store in our array of values C, which is a cell state, and then the output,', 'start': 1615.533, 'duration': 8.683}, {'end': 1629.998, 'text': 'and so we can compute that using the forward prop function of the LSTM cell.', 'start': 1624.216, 'duration': 5.782}], 'summary': 'Initializing the LSTM cell for forward propagation with cell states, hidden states, and forget gate.', 'duration': 28.132, 'max_score': 1601.866, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY1601866.jpg'}, {'end': 1934.053, 'src': 'embed', 'start': 1903.343, 'weight': 5, 'content': [{'end': 1905.105, 'text': "So the way we do that, it's three steps.", 'start': 1903.343, 'duration': 1.762}, {'end': 1909.868, 'text': 'We compute the error times the recurrent network weight matrix,', 'start': 1905.165, 'duration': 4.703}, {'end': 1920.057, 'text': 'and then we set the input values of the LSTM cell for recurrence and then, finally,', 'start': 1909.868, 'duration': 10.189}, {'end': 1923.119, 'text': 'we set the cell state of the LSTM cell for recurrence.', 'start': 1920.057, 'duration': 3.062}, {'end': 1925.06, 'text': "That's pre-updates.", 'start': 1924.18, 'duration': 0.88}, {'end': 1934.053, 'text': 'And then we recursively call this back propagation using these newly computed values 
right?', 'start': 1925.761, 'duration': 8.292}], 'summary': 'The process involves three steps: computing the error times the recurrent weight matrix, setting the LSTM cell input values and cell state for recurrence, and recursively calling back propagation.', 'duration': 30.71, 'max_score': 1903.343, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY1903343.jpg'}, {'end': 2005.731, 'src': 'embed', 'start': 1975.697, 'weight': 6, 'content': [{'end': 1986.5, 'text': 'So then, notice this update step.', 'start': 1975.697, 'duration': 10.803}, {'end': 1989.081, 'text': 'What this update step is, is RMSprop.', 'start': 1986.52, 'duration': 2.561}, {'end': 1994.965, 'text': "It's a way of decaying our learning rates over time, and this improves convergence.", 'start': 1989.121, 'duration': 5.844}, {'end': 1998.807, 'text': "There's a lot of different methodologies for improving on gradient descent.", 'start': 1994.985, 'duration': 3.822}, {'end': 2005.731, 'text': "There's Adam, there's RMSprop, there's AdaGrad, there's a bunch of these, and RMSprop is one of them.", 'start': 1999.227, 'duration': 6.504}], 'summary': 'RMSprop is a method for decaying learning rates, improving convergence.', 'duration': 30.034, 'max_score': 1975.697, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY1975697.jpg'}], 'start': 1250.547, 'title': 'LSTM cell in recurrent networks', 'summary': 'Covers LSTM cell implementation for text prediction, including forward pass, training iterations, weight matrix initialization, activation functions, and backpropagation steps.', 'chapters': [{'end': 1367.449, 'start': 1250.547, 'title': 'Recurrent network and LSTM for text prediction', 'summary': 'Covers the implementation of a recurrent network using LSTM cells for text prediction, where the input is a sequence of words and the output is the next word, with a focus on the forward pass and training iterations.', 'duration': 116.902, 'highlights': ['The 
input for the recurrent neural network is a sequence of words, and the output is the next word in the sequence. This highlights the key focus of the text prediction task.', 'The jumble of numbers represents the forward pass, which involves a series of matrix operations to compute the predicted output. This highlights the process of the forward pass for computing predicted output.', 'During training iterations, the network predicts the next word, computes the error, and propagates it to update the weights. This highlights the iterative process of training the recurrent network.']}, {'end': 2005.731, 'start': 1369.219, 'title': 'Lstm cell initialization and propagation', 'summary': 'Covers the initialization and forward and backward propagation of the lstm cell in recurrent networks, including details on weight matrix initialization, activation functions, and backpropagation steps for gradient updates and error logging.', 'duration': 636.512, 'highlights': ['The chapter explains the initialization of the LSTM cell, including the weight matrix, learning rate, and arrays for storing input, cell states, output values, hidden states, and gate values.', 'It discusses the forward propagation process in a loop for the LSTM cell, including setting inputs, computing cell states, hidden states, forget gate, input gates, and output gates, and calculating the output prediction.', 'It details the steps for backward propagation, which involves updating weight matrices, computing error values, and propagating errors back through the LSTM cell for gradient updates and error logging.', 'The chapter also introduces RMS prop as a method for decaying learning rates over time to improve convergence in gradient descent.']}], 'duration': 755.184, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY1250547.jpg', 'highlights': ['The jumble of numbers represents the forward pass, involving matrix operations.', 'The input for the recurrent 
neural network is a sequence of words, and the output is the next word.', 'During training iterations, the network predicts the next word, computes the error, and updates the weights.', 'The chapter explains the initialization of the LSTM cell, including weight matrix and arrays for storing input, cell states, output values, and gate values.', 'It discusses the forward propagation process in a loop for the LSTM cell, including computing cell states, hidden states, forget gate, input gates, and output gates.', 'It details the steps for backward propagation, involving updating weight matrices, computing error values, and propagating errors back through the LSTM cell.', 'The chapter introduces RMS prop as a method for decaying learning rates over time to improve convergence in gradient descent.']}, {'end': 2238.987, 'segs': [{'end': 2044.771, 'src': 'embed', 'start': 2006.052, 'weight': 0, 'content': [{'end': 2007.613, 'text': "But here's the formula, I'll just put it up there.", 'start': 2006.052, 'duration': 1.561}, {'end': 2009.634, 'text': "Okay, so that's what this is.", 'start': 2008.313, 'duration': 1.321}, {'end': 2016.747, 'text': 'So now we will, so the sample function is the same thing as a forward propagation function.', 'start': 2011.803, 'duration': 4.944}, {'end': 2022.132, 'text': "It's just, it's what we're gonna use once we've trained our model to predict or to generate new words.", 'start': 2017.088, 'duration': 5.044}, {'end': 2023.353, 'text': "So it's the same thing right?", 'start': 2022.432, 'duration': 0.921}, {'end': 2031.84, 'text': "We have our input and for a number of words that we define, we'll say you know, generate words or predict words for as many iterations as we define.", 'start': 2023.393, 'duration': 8.447}, {'end': 2032.781, 'text': "So it's the same thing.", 'start': 2032.121, 'duration': 0.66}, {'end': 2034.022, 'text': 'So we can just skip that.', 'start': 2033.242, 'duration': 0.78}, {'end': 2036.004, 'text': 'Now for our 
LSTM cell.', 'start': 2034.803, 'duration': 1.201}, {'end': 2042.47, 'text': "So for our LSTM cell, We've given it the same parameters as we did for our recurrent network.", 'start': 2036.404, 'duration': 6.066}, {'end': 2044.771, 'text': 'It is, after all, a mini network in and of itself.', 'start': 2042.51, 'duration': 2.261}], 'summary': 'The sample function is used for forward propagation and predicting/generating new words after training the model.', 'duration': 38.719, 'max_score': 2006.052, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2006052.jpg'}, {'end': 2101.596, 'src': 'embed', 'start': 2065.98, 'weight': 1, 'content': [{'end': 2067.101, 'text': 'Our learning rate as well.', 'start': 2065.98, 'duration': 1.121}, {'end': 2069.562, 'text': "And now, we're going to create weight matrices.", 'start': 2067.481, 'duration': 2.081}, {'end': 2074.322, 'text': "We'll initialize these weight matrices randomly, just like we would for any kind of neural network.", 'start': 2069.601, 'duration': 4.721}, {'end': 2081.726, 'text': "We'll initialize weight matrices for our three gate values, for our forget, our input gate, and our output gate, as well as our cell state.", 'start': 2074.904, 'duration': 6.822}, {'end': 2084.527, 'text': 'So the cell state itself, let me go up here.', 'start': 2081.766, 'duration': 2.761}, {'end': 2093.549, 'text': 'has a a set of weight matrices right, just like all recurrent network, all neural networks.', 'start': 2086.382, 'duration': 7.167}, {'end': 2101.596, 'text': 'and so The node has its own set of weight matrices, that that we multiply to get that output value right?', 'start': 2093.549, 'duration': 8.047}], 'summary': 'Weight matrices for gate values initialized randomly for recurrent network.', 'duration': 35.616, 'max_score': 2065.98, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2065980.jpg'}, {'end': 
2136.824, 'src': 'embed', 'start': 2114.007, 'weight': 5, 'content': [{'end': 2121.251, 'text': "you can think of these as gates, you can think of them as layers even, But they're called gates, But layers are very similar.", 'start': 2114.007, 'duration': 7.244}, {'end': 2122.432, 'text': "You know what I'm saying.", 'start': 2121.271, 'duration': 1.161}, {'end': 2125.674, 'text': 'the layers are very similar input times, weight, add a bias, activate.', 'start': 2122.432, 'duration': 3.242}, {'end': 2128.517, 'text': 'But we call them gates to differentiate.', 'start': 2125.674, 'duration': 2.843}, {'end': 2136.824, 'text': 'Not to be confused with so many terms here, not to not to can be confused with the actual mathematical term differentiate, But to discriminate right.', 'start': 2128.517, 'duration': 8.307}], 'summary': 'Neural network layers function as gates to process input data, differentiating through bias and activation.', 'duration': 22.817, 'max_score': 2114.007, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2114007.jpg'}, {'end': 2180.699, 'src': 'embed', 'start': 2149.154, 'weight': 3, 'content': [{'end': 2152.918, 'text': 'remember, all of these gates are differentiable.', 'start': 2149.154, 'duration': 3.764}, {'end': 2154.58, 'text': "So that's where these gradient values will go.", 'start': 2152.918, 'duration': 1.662}, {'end': 2159.285, 'text': 'so then we can update them through back propagation, because we back propagate Through.', 'start': 2154.58, 'duration': 4.705}, {'end': 2160.546, 'text': "that's the cell itself.", 'start': 2159.285, 'duration': 1.261}, {'end': 2164.169, 'text': "We don't just back propagate through the recurrent network at a high level.", 'start': 2160.906, 'duration': 3.263}, {'end': 2168.494, 'text': 'we are back propagating, that is, we are computing gradient values for each of these gates,', 'start': 2164.169, 'duration': 4.325}, {'end': 2173.276, 'text': 
'and So we were updating weights for the input forget cell state,', 'start': 2168.494, 'duration': 4.782}, {'end': 2180.699, 'text': 'the output gate where and the Outer level weight matrix as well for the recurrent network.', 'start': 2173.276, 'duration': 7.423}], 'summary': 'Differentiable gates allow back propagation to update weights for input, forget cell state, output gate, and outer level weight matrix.', 'duration': 31.545, 'max_score': 2149.154, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2149154.jpg'}, {'end': 2223.99, 'src': 'embed', 'start': 2198.035, 'weight': 4, 'content': [{'end': 2202.758, 'text': 'So then we have our activation function, sigmoid, just like before, and our derivative, just like before.', 'start': 2198.035, 'duration': 4.723}, {'end': 2205.979, 'text': 'And so what we do is we add another activation function here.', 'start': 2202.858, 'duration': 3.121}, {'end': 2208.501, 'text': 'This is good practice in LSTM networks.', 'start': 2206.48, 'duration': 2.021}, {'end': 2211.823, 'text': 'You usually see the tanh function applied pretty much all the time.', 'start': 2208.561, 'duration': 3.262}, {'end': 2218.166, 'text': "And you might be asking, well, why do we use tanh over other activation functions? I've got a great video on this.", 'start': 2212.303, 'duration': 5.863}, {'end': 2223.99, 'text': "It's called, which activation function should I use? Search which activation function should I use? Great video on this.", 'start': 2218.567, 'duration': 5.423}], 'summary': "In lstm networks, tanh is commonly used as an activation function, with a video titled 'which activation function should i use?' 
explaining the reason.", 'duration': 25.955, 'max_score': 2198.035, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2198035.jpg'}], 'start': 2006.052, 'title': 'Lstm networks', 'summary': 'Covers neural network training with lstm cells, including initialization of matrices and word generation, as well as understanding lstm networks, focusing on gates, differentiation from layers, gradient computation, and the usage of tanh activation function.', 'chapters': [{'end': 2114.007, 'start': 2006.052, 'title': 'Neural network training with lstm cell', 'summary': 'Explains the process of training a neural network using lstm cells, including the initialization of input, output, cell state, and weight matrices as well as the generation of new words through forward propagation.', 'duration': 107.955, 'highlights': ['The chapter explains the process of training a neural network using LSTM cells The transcript discusses the training process of a neural network using LSTM cells to predict or generate new words.', 'Initialization of input, output, cell state, and weight matrices The chapter details the initialization process of input, output, and cell state, along with the random initialization of weight matrices for the LSTM cell.', 'Generation of new words through forward propagation The transcript mentions the usage of the sample function, which is equivalent to forward propagation, to generate new words once the model is trained.']}, {'end': 2238.987, 'start': 2114.007, 'title': 'Understanding lstm networks', 'summary': 'Explains the concept of gates in lstm networks, their differentiation from layers, initialization of gates, computation of gradient values for back propagation, and the usage of tanh activation function to prevent vanishing gradient problem.', 'duration': 124.98, 'highlights': ['LSTM gates are initialized and gradient values are computed for back propagation. 
All gates in LSTM networks are initialized, and gradient values are computed for back propagation, allowing for the updating of weights and learning what to forget, remember, and pay attention to.', 'Tanh activation function is used in LSTM networks to prevent the vanishing gradient problem and provide stronger gradients. The tanh activation function is preferred in LSTM networks as it prevents the vanishing gradient problem and provides stronger gradients by centering the data around zero, unlike the sigmoid function.', "Differentiation between gates and layers in LSTM networks. The explanation of gates in LSTM networks and the differentiation from layers is provided, emphasizing the similar input, weight, bias, and activation process while using the term 'gates' to distinguish them."]}], 'duration': 232.935, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2006052.jpg', 'highlights': ['The chapter explains the process of training a neural network using LSTM cells.', 'Initialization of input, output, cell state, and weight matrices is detailed.', 'Generation of new words through forward propagation is mentioned.', 'All gates in LSTM networks are initialized, and gradient values are computed for back propagation.', 'Tanh activation function is used in LSTM networks to prevent the vanishing gradient problem and provide stronger gradients.', 'Differentiation between gates and layers in LSTM networks is explained.']}, {'end': 2702.599, 'segs': [{'end': 2418.862, 'src': 'embed', 'start': 2393.907, 'weight': 1, 'content': [{'end': 2402.193, 'text': "And to get our actual output, our predicted output, we'll multiply our output gate times the activated version of our cell state.", 'start': 2393.907, 'duration': 8.286}, {'end': 2404.475, 'text': "And that's gonna give us our predicted output.", 'start': 2402.793, 'duration': 1.682}, {'end': 2414.658, 'text': 'We can then return our cell state, our predicted output, 
forget gate, input gate, cell, and then our output gate as well.', 'start': 2406.189, 'duration': 8.469}, {'end': 2416.079, 'text': "That's forward propagation.", 'start': 2415.078, 'duration': 1.001}, {'end': 2418.862, 'text': "Alright, and that's the equation that I showed up there.", 'start': 2417.02, 'duration': 1.842}], 'summary': 'Forward propagation equation: multiply output gate by activated cell state to get predicted output.', 'duration': 24.955, 'max_score': 2393.907, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2393907.jpg'}, {'end': 2468.999, 'src': 'embed', 'start': 2440.861, 'weight': 3, 'content': [{'end': 2444.242, 'text': 'we want to prevent the gradient from vanishing, so clipping it helps that.', 'start': 2440.861, 'duration': 3.381}, {'end': 2449.183, 'text': "and we'll multiply the error by the activated cell state to compute the output derivative.", 'start': 2444.242, 'duration': 4.941}, {'end': 2455.264, 'text': "and Then we'll compute the output update, which is the output derivatives times, the activated output times, the input,", 'start': 2449.183, 'duration': 6.081}, {'end': 2460.289, 'text': "and And then we'll compute the derivative of the cell state, which is error times, output times,", 'start': 2455.264, 'duration': 5.025}, {'end': 2462.551, 'text': 'the derivative of the cell state plus the derivative cell.', 'start': 2460.289, 'duration': 2.262}, {'end': 2468.999, 'text': "So we're computing derivatives of all of these components in the backwards order, as before.", 'start': 2462.992, 'duration': 6.007}], 'summary': 'Prevent vanishing gradient through activation and derivative computation.', 'duration': 28.138, 'max_score': 2440.861, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2440861.jpg'}, {'end': 2530.527, 'src': 'embed', 'start': 2505.57, 'weight': 2, 'content': [{'end': 2511.212, 'text': 'the cell state and 
the hidden state, So many different parameters that we have computed gradients for and back.', 'start': 2505.57, 'duration': 5.642}, {'end': 2515.253, 'text': 'propagation is going to let us compute all of those right, recursively,', 'start': 2511.212, 'duration': 4.041}, {'end': 2521.378, 'text': 'Computing the error with respect to our weights for every single component We have in that network,', 'start': 2516.454, 'duration': 4.924}, {'end': 2524.701, 'text': 'in the reverse order that we did forward propagation.', 'start': 2521.378, 'duration': 3.323}, {'end': 2527.464, 'text': "and so Yeah, it's just.", 'start': 2524.701, 'duration': 2.763}, {'end': 2530.527, 'text': "you know, it's more Matrix math than we did before.", 'start': 2527.464, 'duration': 3.063}], 'summary': 'Backpropagation computes gradients for cell and hidden states recursively in reverse order.', 'duration': 24.957, 'max_score': 2505.57, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2505570.jpg'}, {'end': 2638.786, 'src': 'embed', 'start': 2610.772, 'weight': 4, 'content': [{'end': 2617.956, 'text': "And then this catch or this if statement will say if our error is small enough, that is, it's between this range here.", 'start': 2610.772, 'duration': 7.184}, {'end': 2623.219, 'text': 'then we can go ahead and sample, which means predict new words, generate new words.', 'start': 2617.956, 'duration': 5.263}, {'end': 2624.759, 'text': 'Our network is trained.', 'start': 2623.559, 'duration': 1.2}, {'end': 2625.94, 'text': 'our error is smallest.', 'start': 2624.759, 'duration': 1.181}, {'end': 2630.502, 'text': "let's go ahead and now just predict new words without having to compute backpropagation to update our weights.", 'start': 2625.94, 'duration': 4.562}, {'end': 2632.483, 'text': 'Our weights are already updated enough.', 'start': 2630.802, 'duration': 1.681}, {'end': 2638.786, 'text': 'So then we can go ahead and define a seed 
word and then predict some new text by calling that sample function, which is the forward propagation.', 'start': 2632.703, 'duration': 6.083}], 'summary': 'Trained network can predict new words without backpropagation.', 'duration': 28.014, 'max_score': 2610.772, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2610772.jpg'}, {'end': 2678.736, 'src': 'heatmap', 'start': 2651.209, 'weight': 0.979, 'content': [{'end': 2654.911, 'text': "And then once it's done, I actually have a saved copy.", 'start': 2651.209, 'duration': 3.702}, {'end': 2663.074, 'text': "It's spitting out some pretty dope lyrics, right? These are some pretty dope lyrics, right? That's it for this lesson.", 'start': 2657.172, 'duration': 5.902}, {'end': 2665.896, 'text': 'Definitely look at this Jupyter Notebook afterwards.', 'start': 2663.415, 'duration': 2.481}, {'end': 2670.658, 'text': "Look at those links that I've sent you in the description as well as inside of the Jupyter Notebook.", 'start': 2666.456, 'duration': 4.202}, {'end': 2674.26, 'text': 'And make sure that you understand how to use it.', 'start': 2671.418, 'duration': 2.842}, {'end': 2678.736, 'text': 'why, at least why to use LSTM cells?', 'start': 2675.513, 'duration': 3.223}], 'summary': 'Generating dope lyrics using lstm cells, with saved copies and helpful resources provided.', 'duration': 27.527, 'max_score': 2651.209, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2651209.jpg'}, {'end': 2702.599, 'src': 'embed', 'start': 2678.736, 'weight': 0, 'content': [{'end': 2682.099, 'text': "the reason is because it's to remember long-term dependencies.", 'start': 2678.736, 'duration': 3.363}, {'end': 2685.743, 'text': "that's at least a high level that you should have got from this video.", 'start': 2682.099, 'duration': 3.644}, {'end': 2689.426, 'text': 'it learns what to forget, what to remember and what to pay 
attention to.', 'start': 2685.743, 'duration': 3.683}, {'end': 2695.953, 'text': 'those are the three things that an LSTM cell, as opposed to a regular recurrent network, lets you do, ok.', 'start': 2689.426, 'duration': 6.527}, {'end': 2701.258, 'text': "so yeah, please subscribe for more programming videos and for now I've got to learn to forget.", 'start': 2695.953, 'duration': 5.305}, {'end': 2702.599, 'text': 'so thanks for watching.', 'start': 2701.258, 'duration': 1.341}], 'summary': 'Lstm cell handles long-term dependencies, learns what to forget, remember, and pay attention to.', 'duration': 23.863, 'max_score': 2678.736, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2678736.jpg'}], 'start': 2238.987, 'title': 'Lstm and text generation', 'summary': 'Covers lstm cell forward propagation, backward propagation in recurrent networks, and lstm cell for text generation. it includes computations for forget, input, and output gates, preventing gradient vanishing, and aims for small error rates during training.', 'chapters': [{'end': 2418.862, 'start': 2238.987, 'title': 'Lstm cell forward propagation', 'summary': 'Covers the forward propagation for an lstm cell, including the computations for forget gate, cell state update, input gate, output gate, and predicted output.', 'duration': 179.875, 'highlights': ['The forward propagation for an LSTM cell involves computing the forget gate, updating the cell state, computing the input gate, updating the cell state again, and finally computing the output gate and predicted output.', 'The forget gate is computed by multiplying the input by the forget gate and then activating the product, followed by updating the cell state by multiplying it by the forget gate to determine what to forget.', 'To compute the predicted output, the dot product between the output gate and the cell state is computed, followed by multiplying the output gate by the activated version of the 
cell state to obtain the predicted output.']}, {'end': 2549.429, 'start': 2419.342, 'title': 'Backward propagation in recurrent networks', 'summary': 'Discusses the process of backward propagation in recurrent networks, emphasizing the computation of derivatives and updates for various components, with a focus on preventing gradient vanishing and recursively computing error with respect to weights.', 'duration': 130.087, 'highlights': ['The process involves computing the derivatives and updates for various components in the reverse order of forward propagation, recursively computing error with respect to weights for every single component in the network.', 'Clipping values to keep gradients from exploding and updating gradient values for forget, input, cell, and output components are crucial steps in the backward propagation process.', 'The backward propagation involves approximately six to nine more steps than a recurrent network, and the update step includes updating gates using the computed gradient values.']}, {'end': 2702.599, 'start': 2552.401, 'title': 'LSTM cell and text generation', 'summary': 'Explains the implementation of an LSTM cell for text generation, including loading text, training the network, and generating new lyrics, aiming to remember long-term dependencies and achieve small error rates during training.', 'duration': 150.198, 'highlights': ['The LSTM cell is designed to remember long-term dependencies and achieve small error rates during training, enabling the network to learn what to forget, what to remember, and what to pay attention to.', 'The process involves loading Eminem text files of lyrics, computing unique words, initializing the recurrent network with hyperparameters, and training for 5,000 iterations with a learning rate of 0.001.',
 'The network is trained to predict new words and generate new lyrics, and if the error is small enough, it can sample and predict new words without having to compute backpropagation to update weights.']}], 'duration': 463.612, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/9zhrxE5PQgY/pics/9zhrxE5PQgY2238987.jpg', 'highlights': ['The LSTM cell is designed to remember long-term dependencies and achieve small error rates during training.', 'The forward propagation for an LSTM cell involves computing the forget gate, updating the cell state, computing the input gate, updating the cell state again, and finally computing the output gate and predicted output.', 'The process involves computing the derivatives and updates for various components in the reverse order of forward propagation, recursively computing error with respect to weights for every single component in the network.', 'Clipping values to keep gradients from exploding and updating gradient values for forget, input, cell, and output components are crucial steps in the backward propagation process.', 'The network is trained to predict new words and generate new lyrics, and if the error is small enough, it can sample and predict new words without having to compute backpropagation to update weights.']}], 'highlights': ['The LSTM cell is designed to remember long-term dependencies and achieve small error rates during training.', 'The chapter focuses on building a long short-term memory (LSTM) network using only NumPy to generate Eminem lyrics.', 'The vanishing gradient problem hinders 
learning in recurrent networks.', 'The LSTM cell is a solution to update information accurately in neural networks, replacing the RNN cell, and consists of three gates (input, output, and forget) and a cell state, allowing for learning over time.', 'The vanishing gradient problem is introduced, affecting the ability of recurrent networks to predict distant words in a sequence.', 'LSTM cells address vanishing or exploding gradients in recurrent networks.', 'The jumble of numbers represents the forward pass, involving matrix operations.', 'The input for the recurrent neural network is a sequence of words, and the output is the next word.', 'The chapter explains the process of training a neural network using LSTM cells.', 'The forward propagation for an LSTM cell involves computing the forget gate, updating the cell state, computing the input gate, updating the cell state again, and finally computing the output gate and predicted output.']}