title
Recurrent Neural Network - The Math of Intelligence (Week 5)
description
Recurrent neural networks let us learn from sequential data (time series, music, audio, video frames, etc.). We're going to build one from scratch in NumPy (including backpropagation) to generate a sequence of words in the style of Franz Kafka.
Code for this video:
https://github.com/llSourcell/recurrent_neural_network
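As a rough orientation before opening the repository, here is a minimal sketch of the kind of character-level forward step the video describes: build two lookup dictionaries, turn a character into a one-hot vector, then compute the current hidden state from the previous hidden state and the current input. The toy corpus, the weight names (Wxh, Whh, Why, bh, by), the helper names (one_hot, step) and the 0.01 initialization scale are assumptions made for illustration and are not taken from the linked code; only the hidden size of 100 comes from the video.

```python
import numpy as np

# Toy corpus standing in for the book text (an assumption for illustration).
data = "hello world"
chars = sorted(set(data))
vocab_size = len(chars)

# Two lookup tables: character -> index and index -> character.
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

hidden_size = 100  # number of hidden neurons mentioned in the video

# Hypothetical parameter names; small random weights, zero biases.
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01   # input -> hidden
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01  # hidden -> hidden (the recurrent matrix)
Why = rng.standard_normal((vocab_size, hidden_size)) * 0.01   # hidden -> output
bh = np.zeros((hidden_size, 1))
by = np.zeros((vocab_size, 1))


def one_hot(ch):
    """Return a column vector of zeros with a single 1 at the character's index."""
    x = np.zeros((vocab_size, 1))
    x[char_to_ix[ch]] = 1.0
    return x


def step(x, h_prev):
    """One forward time step: new hidden state and next-character probabilities."""
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
    y = Why @ h + by
    e = np.exp(y - y.max())  # subtract the max for numerical stability
    p = e / e.sum()          # softmax over the vocabulary
    return h, p


# Run the untrained network over the toy text, carrying the hidden state forward.
h = np.zeros((hidden_size, 1))
for ch in data[:-1]:
    h, p = step(one_hot(ch), h)
```

With untrained weights the probabilities in p are close to uniform; training would adjust the three weight matrices and two biases so that the character that actually comes next gets a higher probability.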
More learning resources:
https://www.youtube.com/watch?v=hWgGJeAvLws
https://www.youtube.com/watch?v=cdLUzrjnlr4
https://medium.freecodecamp.org/dive-into-deep-learning-with-these-23-online-courses-bf247d289cc0
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
https://deeplearning4j.org/lstm.html
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
detail
{'title': 'Recurrent Neural Network - The Math of Intelligence (Week 5)', 'heatmap': [{'end': 739.325, 'start': 680.577, 'weight': 0.731}, {'end': 851.875, 'start': 819.087, 'weight': 0.712}, {'end': 1127.61, 'start': 1062.668, 'weight': 1}, {'end': 1204.541, 'start': 1148.258, 'weight': 0.752}, {'end': 1648.434, 'start': 1585.857, 'weight': 0.805}, {'end': 2052.219, 'start': 2020.134, 'weight': 0.887}], 'summary': "Covers generating words in 'metamorphosis' style, understanding neural network computation, the importance of sequence in data and recurrent networks, recurrent network model training, neural network model parameters, and recurrent neural network training, with practical applications and recommendations for training in the cloud.", 'chapters': [{'end': 42.39, 'segs': [{'end': 42.39, 'src': 'embed', 'start': 0.069, 'weight': 0, 'content': [{'end': 3.652, 'text': "Hello world, it's Siraj, and today we're going to generate words.", 'start': 0.069, 'duration': 3.583}, {'end': 10.197, 'text': "So, given some book or movie script or any kind of text corpus, it's plug and play.", 'start': 3.932, 'duration': 6.265}, {'end': 12.979, 'text': 'so you can give it any kind of text corpus.', 'start': 10.197, 'duration': 2.782}, {'end': 17.243, 'text': 'it will learn how to generate words in the style of that corpus of text.', 'start': 12.979, 'duration': 4.264}, {'end': 19.586, 'text': "And in this case, we're gonna give it a book.", 'start': 17.803, 'duration': 1.783}, {'end': 27.136, 'text': 'The book is called Metamorphosis by Franz Kafka, which was a really crazy, weird writer from the 20th century.', 'start': 19.766, 'duration': 7.37}, {'end': 28.257, 'text': "Anyway, he's a cool dude.", 'start': 27.196, 'duration': 1.061}, {'end': 31.321, 'text': "Anyway, we're gonna generate words in the style of that book.", 'start': 28.638, 'duration': 2.683}, {'end': 33.023, 'text': 'And this can be applied to any text.', 'start': 31.501, 'duration': 1.522}, {'end': 34.565, 
'text': 'Any type of text.', 'start': 33.524, 'duration': 1.041}, {'end': 36.086, 'text': "It doesn't just have to be words.", 'start': 34.585, 'duration': 1.501}, {'end': 37.306, 'text': 'It can be code.', 'start': 36.486, 'duration': 0.82}, {'end': 40.008, 'text': 'It can be HTML, whatever it is.', 'start': 37.867, 'duration': 2.141}, {'end': 41.429, 'text': "But that's what we're going to do.", 'start': 40.508, 'duration': 0.921}, {'end': 42.39, 'text': 'No libraries.', 'start': 41.709, 'duration': 0.681}], 'summary': "Siraj demonstrates word generation from 'metamorphosis' by kafka using any text corpus.", 'duration': 42.321, 'max_score': 0.069, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA69.jpg'}], 'start': 0.069, 'title': "Generating words in 'metamorphosis' style", 'summary': "Discusses a plug and play approach for generating words in the style of 'metamorphosis' by franz kafka, applicable to any type of text without the need for libraries.", 'chapters': [{'end': 42.39, 'start': 0.069, 'title': "Generating words in the style of 'metamorphosis'", 'summary': "Discusses generating words in the style of the book 'metamorphosis' by franz kafka using a plug and play approach, applicable to any type of text without the need for libraries.", 'duration': 42.321, 'highlights': ['Applicable to any type of text, including code and HTML, without the need for libraries.', "Demonstrates generating words in the style of 'Metamorphosis' by Franz Kafka using a plug and play approach.", 'Applies machine learning to learn how to generate words in the style of a given text corpus.']}], 'duration': 42.321, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA69.jpg', 'highlights': ['Applicable to any type of text, including code and HTML, without the need for libraries.', 'Applies machine learning to learn how to generate words in the style of a given text corpus.', 
"Demonstrates generating words in the style of 'Metamorphosis' by Franz Kafka using a plug and play approach."]}, {'end': 350.138, 'segs': [{'end': 73.465, 'src': 'embed', 'start': 42.61, 'weight': 0, 'content': [{'end': 43.25, 'text': 'Just NumPy.', 'start': 42.61, 'duration': 0.64}, {'end': 48.954, 'text': "So I'm going to go through the derivation, the forward propagation, calculating the loss, all the math.", 'start': 43.33, 'duration': 5.624}, {'end': 50.915, 'text': 'So get ready for some math.', 'start': 49.334, 'duration': 1.581}, {'end': 53.777, 'text': 'Put on your linear algebra and calculus hats.', 'start': 51.055, 'duration': 2.722}, {'end': 57.339, 'text': 'Okay? So this is kind of what it looks like, this first image here.', 'start': 54.157, 'duration': 3.182}, {'end': 61.169, 'text': "And I'm gonna actually code it as well.", 'start': 58.865, 'duration': 2.304}, {'end': 66.679, 'text': "so it's, I'm not just gonna glaze over, I'm gonna code it so we can see the outputs as I go.", 'start': 61.169, 'duration': 5.51}, {'end': 71.503, 'text': "but the very end part will be I'm gonna code the important parts.", 'start': 66.679, 'duration': 4.824}, {'end': 73.465, 'text': 'let me just say that okay, so okay.', 'start': 71.503, 'duration': 1.962}], 'summary': 'Derivation and implementation of numpy operations for forward propagation and loss calculation, with coding demonstration.', 'duration': 30.855, 'max_score': 42.61, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA42610.jpg'}, {'end': 114.532, 'src': 'embed', 'start': 87.658, 'weight': 3, 'content': [{'end': 92.602, 'text': 'okay. 
so character by character by character, not word by word by word, okay.', 'start': 87.658, 'duration': 4.944}, {'end': 97.765, 'text': "so it's gonna be trained for a thousand iterations, and the more you train it, the better it's going to get.", 'start': 92.602, 'duration': 5.163}, {'end': 103.627, 'text': "so if you leave this thing running overnight on your laptop, then by the time you wake up it'll be really good.", 'start': 97.765, 'duration': 5.862}, {'end': 106.028, 'text': "however, I wouldn't recommend training on your laptop.", 'start': 103.627, 'duration': 2.401}, {'end': 112.511, 'text': 'as my song says, I train my models in the cloud now because my laptop takes longer, right?', 'start': 106.028, 'duration': 6.483}, {'end': 114.532, 'text': 'so what is a recurrent network?', 'start': 112.511, 'duration': 2.021}], 'summary': 'Training for 1000 iterations improves character model in recurrent network.', 'duration': 26.874, 'max_score': 87.658, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA87658.jpg'}, {'end': 221.449, 'src': 'embed', 'start': 193.507, 'weight': 2, 'content': [{'end': 196.95, 'text': 'if you want to add a bias, which you should in practice, you should add a bias.', 'start': 193.507, 'duration': 3.443}, {'end': 203.115, 'text': "I've built neural networks without biases before, for example, but you really should add a bias, and I'll talk about why in a second.", 'start': 197.691, 'duration': 5.424}, {'end': 207.218, 'text': 'But you should add a bias value, and then you activate the output of that.', 'start': 203.575, 'duration': 3.643}, {'end': 216.125, 'text': 'And by activate I mean you take the output of that dot product plus bias operation, the output of that, and you feed it into an activation function,', 'start': 207.258, 'duration': 8.867}, {'end': 221.449, 'text': "a non-linearity, whether that's a sigmoid or tanh or rectified linear unit.", 'start': 216.125, 'duration': 
5.324}], 'summary': 'Neural networks should include a bias for improved performance and activation function for non-linearity.', 'duration': 27.942, 'max_score': 193.507, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA193507.jpg'}, {'end': 351.279, 'src': 'embed', 'start': 324.501, 'weight': 1, 'content': [{'end': 328.844, 'text': 'So the largest function, the function on the outside.', 'start': 324.501, 'duration': 4.343}, {'end': 337.97, 'text': "here is then the output layer, because we're feeding it, the output of all that chain of computation that already occurred.", 'start': 328.844, 'duration': 9.126}, {'end': 340.792, 'text': "right. so that's what that is.", 'start': 338.65, 'duration': 2.142}, {'end': 344.434, 'text': "so it's a composite function and we would use feed forward nets anytime.", 'start': 340.792, 'duration': 3.642}, {'end': 350.138, 'text': 'we have two variables that are related: temperature and location, height and weight, car speed and brand.', 'start': 344.434, 'duration': 5.704}, {'end': 351.279, 'text': 'these are all mappings.', 'start': 350.138, 'duration': 1.141}], 'summary': 'Composite function with feed forward nets used for mapping related variables.', 'duration': 26.778, 'max_score': 324.501, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA324501.jpg'}], 'start': 42.61, 'title': 'Understanding neural network computation', 'summary': 'Explains neural network computation as a series of matrix operations and activation functions, emphasizing the role of biases and the concept of a feed-forward network as a composite function. 
It also covers the derivation, forward propagation, and training of a character-level recurrent network using numpy, with a mention of training for a thousand iterations and the recommendation to train models in the cloud.', 'chapters': [{'end': 123.272, 'start': 42.61, 'title': 'Understanding recurrent networks with numpy', 'summary': 'The chapter covers the derivation, forward propagation, and training of a character-level recurrent network using numpy, with a mention of training for a thousand iterations and the recommendation to train models in the cloud.', 'duration': 80.662, 'highlights': ['The chapter explains the derivation, forward propagation, and training of a character-level recurrent network using NumPy.', 'It mentions that the network will be trained for a thousand iterations, with improved performance over time.', 'The recommendation is made to train models in the cloud due to the extended time required for training on a laptop.', 'A distinction is made between character-level and word-level recurrent networks, with a focus on generating characters sequentially.']}, {'end': 350.138, 'start': 123.272, 'title': 'Understanding neural network computation', 'summary': 'The chapter explains neural network computation as a series of matrix operations and activation functions, highlighting the crucial role of biases and the concept of a feed-forward network as a composite function.', 'duration': 226.866, 'highlights': ['The role of biases in neural networks is crucial as it affects the computation graph and the application of activation functions, emphasizing the necessity to add bias values for effective learning.', "A feed-forward network is depicted as a composite function, with nested functions representing each layer, and the output layer being the outermost function, providing a clear understanding of the network's structure and functionality.", 'The explanation of neural networks as a series of matrix operations and activation functions, emphasizing the neurons as the 
output of matrix operations and the need for activation functions to enable learning of both linear and nonlinear functions.']}], 'duration': 307.528, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA42610.jpg', 'highlights': ['The chapter explains the derivation, forward propagation, and training of a character-level recurrent network using NumPy.', "A feed-forward network is depicted as a composite function, with nested functions representing each layer, and the output layer being the outermost function, providing a clear understanding of the network's structure and functionality.", 'The role of biases in neural networks is crucial as it affects the computation graph and the application of activation functions, emphasizing the necessity to add bias values for effective learning.', 'The recommendation is made to train models in the cloud due to the extended time required for training on a laptop.']}, {'end': 932.891, 'segs': [{'end': 499.804, 'src': 'embed', 'start': 394.551, 'weight': 0, 'content': [{'end': 398.812, 'text': 'The stock prices before are what matter to the current stock price.', 'start': 394.551, 'duration': 4.261}, {'end': 405.113, 'text': "that's actually debatable in the context of stock prices, but it applies to all time series data.", 'start': 398.812, 'duration': 6.301}, {'end': 406.073, 'text': 'so video right.', 'start': 405.113, 'duration': 0.96}, {'end': 411.115, 'text': 'if you want to generate the next frame in a video, It matters what frames came before it.', 'start': 406.073, 'duration': 5.042}, {'end': 416.296, 'text': "you can't just learn a mapping between a frame and the time that that frame shows up.", 'start': 411.115, 'duration': 5.181}, {'end': 423.15, 'text': "because then what happens is Given some new time, you can't just generate a frame based on nothing else, right?", 'start': 416.296, 'duration': 6.854}, {'end': 425.512, 'text': 'It depends on the frames 
that came before it.', 'start': 423.491, 'duration': 2.021}, {'end': 427.654, 'text': "You see what I'm saying? The sequence matters.", 'start': 425.712, 'duration': 1.942}, {'end': 429.055, 'text': 'The sequence matters here.', 'start': 427.934, 'duration': 1.121}, {'end': 431.797, 'text': 'The alphabet or lyrics of a song.', 'start': 429.315, 'duration': 2.482}, {'end': 437.662, 'text': "You can't just generate a lyric or an alphabet depending on the index that it's at.", 'start': 432.097, 'duration': 5.565}, {'end': 439.543, 'text': "You've got to know what came before it.", 'start': 437.882, 'duration': 1.661}, {'end': 443.426, 'text': 'Try to recite the alphabet backwards.', 'start': 440.044, 'duration': 3.382}, {'end': 447.75, 'text': 'And so to get into neuroscience for a second, try to recite the alphabet backwards.', 'start': 443.506, 'duration': 4.244}, {'end': 451.873, 'text': "It's hard, right? Z, Y, X, W key, Q, U.", 'start': 447.97, 'duration': 3.903}, {'end': 453.294, 'text': "Okay, see, I can't even do it right now.", 'start': 451.873, 'duration': 1.421}, {'end': 454.915, 'text': "I'm not going to edit that out.", 'start': 454.014, 'duration': 0.901}, {'end': 456.836, 'text': 'So, or any song.', 'start': 455.375, 'duration': 1.461}, {'end': 457.937, 'text': 'Try to recite a song backwards.', 'start': 456.876, 'duration': 1.061}, {'end': 460.038, 'text': "You can't because you learned it in a sequence.", 'start': 457.977, 'duration': 2.061}, {'end': 461.839, 'text': "It's a kind of conditional memory.", 'start': 460.338, 'duration': 1.501}, {'end': 467.363, 'text': "What you remember depends on what you've stored previously, right? 
It's conditional memory in that way.", 'start': 462.2, 'duration': 5.163}, {'end': 471.065, 'text': 'And that is what recurrent networks help us do.', 'start': 467.743, 'duration': 3.322}, {'end': 473.367, 'text': 'They help us compute conditional memory.', 'start': 471.105, 'duration': 2.262}, {'end': 477.25, 'text': 'They help us compute the next value in a sequence of values.', 'start': 473.547, 'duration': 3.703}, {'end': 479.891, 'text': "So that's what recurrent networks are good at.", 'start': 478.11, 'duration': 1.781}, {'end': 480.792, 'text': "That's what they're made for.", 'start': 479.911, 'duration': 0.881}, {'end': 483.454, 'text': "And it's not like this is some new technology.", 'start': 481.292, 'duration': 2.162}, {'end': 486.595, 'text': 'Recurrent networks are invented, they were invented in the 80s.', 'start': 483.694, 'duration': 2.901}, {'end': 488.637, 'text': 'Neural networks were invented in the 50s.', 'start': 487.056, 'duration': 1.581}, {'end': 490.538, 'text': 'But why is this super hot right now?', 'start': 488.677, 'duration': 1.861}, {'end': 491.799, 'text': 'Why are you watching this video??', 'start': 490.878, 'duration': 0.921}, {'end': 499.804, 'text': 'Because with the invention of bigger data and bigger computing power, when you take these recurrent networks and give them those two things,', 'start': 492.039, 'duration': 7.765}], 'summary': 'Sequence matters in stock prices, video frames, and memory storage. 
recurrent networks compute conditional memory and are invented in the 80s.', 'duration': 105.253, 'max_score': 394.551, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA394551.jpg'}, {'end': 557.996, 'src': 'embed', 'start': 537.524, 'weight': 5, 'content': [{'end': 547.39, 'text': "and that's really what makes it different than a feed forward network is that we're adding a third weight matrix and what the third weight matrix is doing is it's connecting the current hidden state,", 'start': 537.524, 'duration': 9.866}, {'end': 553.213, 'text': 'So the hidden state at the current time step, to the hidden state at the previous time step.', 'start': 548.851, 'duration': 4.362}, {'end': 557.996, 'text': "So it's a recurrent weight matrix and you'll see programmatically and mathematically what I'm talking about.", 'start': 553.213, 'duration': 4.783}], 'summary': 'Recurrent neural network adds a third weight matrix to connect current and previous hidden states.', 'duration': 20.472, 'max_score': 537.524, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA537524.jpg'}, {'end': 739.325, 'src': 'heatmap', 'start': 680.577, 'weight': 0.731, 'content': [{'end': 684.24, 'text': "and so the blue arrow is what's different here compared to a feed-forward network.", 'start': 680.577, 'duration': 3.663}, {'end': 693.305, 'text': "we're feeding in the previous hidden state, as was the input, to compute the current hidden state and To compute our output, our y value,", 'start': 684.24, 'duration': 9.065}, {'end': 696.807, 'text': "and we're using a loss function to improve our network every time.", 'start': 693.305, 'duration': 3.502}, {'end': 703.029, 'text': 'And so if you think of what that recurrence looks like, it looks like this so remember that feed forward network We just looked at.', 'start': 696.807, 'duration': 6.222}, {'end': 708.672, 'text': 'the difference here 
Is that we are feeding in the output of the hidden state back into the input.', 'start': 703.029, 'duration': 5.643}, {'end': 711.193, 'text': 'the output of this weight times bias?', 'start': 708.672, 'duration': 2.521}, {'end': 714.054, 'text': "Activate operation, that's in a layer.", 'start': 712.173, 'duration': 1.881}, {'end': 721.612, 'text': 'Okay. so of the formula for a recurrent network looks like this, which basically says that the current hidden state, HT,', 'start': 714.654, 'duration': 6.958}, {'end': 731.056, 'text': 'is a function of the previous hidden state and the current input, Okay, and the theta value right here are the parameters of the function,', 'start': 721.612, 'duration': 9.444}, {'end': 733.999, 'text': 'so the network learns to use H of T as a lossy.', 'start': 731.056, 'duration': 2.943}, {'end': 735.02, 'text': 'summary of the task.', 'start': 733.999, 'duration': 1.021}, {'end': 739.325, 'text': 'relevant aspects of the past sequence of inputs up to T.', 'start': 735.02, 'duration': 4.305}], 'summary': 'Recurrent network feeds previous hidden state as input, uses loss function to improve, and learns to use h of t as a summary of the past sequence of inputs up to t.', 'duration': 58.748, 'max_score': 680.577, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA680577.jpg'}, {'end': 851.875, 'src': 'heatmap', 'start': 819.087, 'weight': 0.712, 'content': [{'end': 823.651, 'text': "And from our forward pass we're going to calculate the probability for every possible next char,", 'start': 819.087, 'duration': 4.564}, {'end': 826.935, 'text': 'given that t according to the state of the model using the parameters.', 'start': 823.651, 'duration': 3.284}, {'end': 833.361, 'text': "And then we're going to measure our error as the distance between the previous probability value and the target char.", 'start': 827.455, 'duration': 5.906}, {'end': 838.346, 'text': "So that's what acts as 
our label, the next char in the sequence.", 'start': 833.421, 'duration': 4.925}, {'end': 839.327, 'text': 'And we just keep doing that.', 'start': 838.366, 'duration': 0.961}, {'end': 845.891, 'text': "So it's a dynamic error right, and so we, once we have that error value, We'll use it to help us calculate,", 'start': 839.887, 'duration': 6.004}, {'end': 851.875, 'text': 'to help us Calculate our gradients for each of our parameters to see the impact they have on the loss.', 'start': 845.891, 'duration': 5.984}], 'summary': 'Calculating probabilities and measuring errors to update parameters in model training.', 'duration': 32.788, 'max_score': 819.087, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA819087.jpg'}], 'start': 350.138, 'title': 'Importance of sequence in data and recurrent networks', 'summary': 'Discusses the importance of sequence in data, emphasizing the need to consider preceding data. it also explains how recurrent networks enable conditional memory, outperforming other machine learning models in accuracy by leveraging bigger data and computing power.', 'chapters': [{'end': 461.839, 'start': 350.138, 'title': 'Importance of sequence in data', 'summary': 'Discusses the importance of sequence in data, using examples such as stock prices, video frames, and song lyrics, emphasizing the need to consider the preceding data for accurate predictions and generation.', 'duration': 111.701, 'highlights': ['Stock prices are a great example of when time matters, as the previous stock prices significantly impact the current stock price. The chapter emphasizes the impact of previous stock prices on the current stock price, highlighting the importance of considering preceding data in financial analysis.', 'In the context of generating the next frame in a video, it is crucial to consider the frames that came before it, as they influence the generation of the subsequent frame. 
The discussion emphasizes the need to consider preceding frames in video generation, demonstrating the significance of sequence in the context of video data processing.', 'The chapter also mentions the importance of sequence in learning the alphabet or lyrics of a song, as attempting to recite them backwards illustrates the impact of conditional memory and the need for preceding knowledge. The chapter discusses the conditional memory associated with learning the alphabet or song lyrics, highlighting the challenge of reciting them backwards and emphasizing the significance of sequence in memory recall.']}, {'end': 932.891, 'start': 462.2, 'title': 'Recurrent networks and conditional memory', 'summary': 'Explains how recurrent networks enable conditional memory, leveraging bigger data and computing power to outperform other machine learning models in accuracy, with a detailed highlight on the structure and function of recurrent networks, the use of recurrent weight matrix, and the application in time series prediction and sequential data generation.', 'duration': 470.691, 'highlights': ['Recurrent networks leverage bigger data and computing power to outperform other machine learning models in accuracy With the invention of bigger data and computing power, recurrent networks outperform other machine learning models in accuracy.', 'The structure and function of recurrent networks Recurrent networks are designed for computing conditional memory, computing the next value in a sequence of values, and using a third weight matrix to connect the current hidden state to the previous hidden state.', 'The use of recurrent weight matrix The recurrent weight matrix connects the current hidden state to the previous hidden state, enabling the network to remember sequential data and compute conditional memory.', 'Application in time series prediction and sequential data generation Recurrent networks are applied in time series prediction, weather forecasting, stock prices, traffic 
volume, sequential data generation, music, video, audio, and binary addition.']}], 'duration': 582.753, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA350138.jpg', 'highlights': ['The chapter emphasizes the impact of previous stock prices on the current stock price, highlighting the importance of considering preceding data in financial analysis.', 'The discussion emphasizes the need to consider preceding frames in video generation, demonstrating the significance of sequence in the context of video data processing.', 'The chapter discusses the conditional memory associated with learning the alphabet or song lyrics, highlighting the challenge of reciting them backwards and emphasizing the significance of sequence in memory recall.', 'Recurrent networks leverage bigger data and computing power to outperform other machine learning models in accuracy.', 'Recurrent networks are designed for computing conditional memory, computing the next value in a sequence of values, and using a third weight matrix to connect the current hidden state to the previous hidden state.', 'The recurrent weight matrix connects the current hidden state to the previous hidden state, enabling the network to remember sequential data and compute conditional memory.', 'Recurrent networks are applied in time series prediction, weather forecasting, stock prices, traffic volume, sequential data generation, music, video, audio, and binary addition.']}, {'end': 1374.184, 'segs': [{'end': 986.46, 'src': 'embed', 'start': 933.211, 'weight': 0, 'content': [{'end': 936.614, 'text': 'And so once we understand the intuition behind recurrent networks,', 'start': 933.211, 'duration': 3.403}, {'end': 941.598, 'text': 'then we can move on to LSTM networks and bidirectional networks and recursive networks.', 'start': 936.614, 'duration': 4.984}, {'end': 947.202, 'text': 'Those are more advanced networks and they solve some problems with recurrent networks.', 
'start': 942.078, 'duration': 5.124}, {'end': 950.705, 'text': "But before you get there, you've got to understand recurrent networks.", 'start': 947.502, 'duration': 3.203}, {'end': 952.968, 'text': 'okay. so this code contains four parts.', 'start': 950.925, 'duration': 2.043}, {'end': 955.611, 'text': 'the first part is for us to load the training data.', 'start': 952.968, 'duration': 2.643}, {'end': 959.015, 'text': "then we'll define our network, then we'll define our loss function,", 'start': 955.611, 'duration': 3.404}, {'end': 964.121, 'text': 'and the loss function is going to contain both the forward pass and the backward pass.', 'start': 959.015, 'duration': 5.106}, {'end': 966.865, 'text': 'so the real meat of the code is going to happen in the loss function,', 'start': 964.121, 'duration': 2.744}, {'end': 974.411, 'text': "And what it's going to do is it's going to return the gradient values that we can then use to update our weights later on during training.", 'start': 967.265, 'duration': 7.146}, {'end': 977.713, 'text': 'But the meat of the code is going to happen in the loss function.', 'start': 974.491, 'duration': 3.222}, {'end': 984.638, 'text': "And once we've defined that, we'll write a function to then make predictions, which in this case would be to generate words.", 'start': 978.293, 'duration': 6.345}, {'end': 986.46, 'text': "And we'll train the network as well.", 'start': 985.099, 'duration': 1.361}], 'summary': 'Understanding recurrent networks is crucial before moving on to more advanced networks such as LSTM and bidirectional networks, as explained in the transcript.', 'duration': 53.249, 'max_score': 933.211, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA933211.jpg'}, {'end': 1127.61, 'src': 'heatmap', 'start': 1062.668, 'weight': 1, 'content': [{'end': 1066.85, 'text': "And it's going to tell us how many unique chars there are, which matters to us,", 'start': 1062.668, 
'duration': 4.182}, {'end': 1073.314, 'text': 'because we want to make a vector of the size of the number of chars that there are.', 'start': 1066.85, 'duration': 6.464}, {'end': 1075.655, 'text': 'So let me go ahead and print that out.', 'start': 1073.834, 'duration': 1.821}, {'end': 1079.618, 'text': "And it's going to tell us exactly what the deal is.", 'start': 1076.536, 'duration': 3.082}, {'end': 1081.058, 'text': "And so we've got.", 'start': 1080.378, 'duration': 0.68}, {'end': 1085.341, 'text': "Care. that's how many characters it has.", 'start': 1083.06, 'duration': 2.281}, {'end': 1090.885, 'text': 'okay, so the data has 137 K characters and 81 of them are unique.', 'start': 1085.341, 'duration': 5.544}, {'end': 1092.747, 'text': 'okay, good, good to know, good to know.', 'start': 1090.885, 'duration': 1.862}, {'end': 1096.309, 'text': 'our next step is to calculate the vocab size.', 'start': 1092.747, 'duration': 3.562}, {'end': 1101.593, 'text': "Okay, so, we're going to calculate the vocab size because we want to be able to feed vectors into our network.", 'start': 1096.309, 'duration': 5.284}, {'end': 1104.775, 'text': "We can't just feed in raw String care.", 'start': 1101.593, 'duration': 3.182}, {'end': 1105.856, 'text': 'you know, chars.', 'start': 1104.775, 'duration': 1.081}, {'end': 1113.861, 'text': "we've got to convert the chars to vectors because a vector is an array of float values in this case, and vector is an array,", 'start': 1105.856, 'duration': 8.005}, {'end': 1121.866, 'text': "a list of numbers in the context of machine learning, and so So we'll calculate the vocab size to help us do this.", 'start': 1113.861, 'duration': 8.005}, {'end': 1127.61, 'text': "so we're going to create two dictionaries and both of these dictionaries are going to convert the.", 'start': 1121.866, 'duration': 5.744}], 'summary': 'Data contains 137k characters, 81 unique. 
calculating vocab size for vector conversion.', 'duration': 64.942, 'max_score': 1062.668, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1062668.jpg'}, {'end': 1096.309, 'src': 'embed', 'start': 1066.85, 'weight': 4, 'content': [{'end': 1073.314, 'text': 'because we want to make a vector of the size of the number of chars that there are.', 'start': 1066.85, 'duration': 6.464}, {'end': 1075.655, 'text': 'So let me go ahead and print that out.', 'start': 1073.834, 'duration': 1.821}, {'end': 1079.618, 'text': "And it's going to tell us exactly what the deal is.", 'start': 1076.536, 'duration': 3.082}, {'end': 1081.058, 'text': "And so we've got.", 'start': 1080.378, 'duration': 0.68}, {'end': 1085.341, 'text': "Care. that's how many characters it has.", 'start': 1083.06, 'duration': 2.281}, {'end': 1090.885, 'text': 'okay, so the data has 137 K characters and 81 of them are unique.', 'start': 1085.341, 'duration': 5.544}, {'end': 1092.747, 'text': 'okay, good, good to know, good to know.', 'start': 1090.885, 'duration': 1.862}, {'end': 1096.309, 'text': 'our next step is to calculate the vocab size.', 'start': 1092.747, 'duration': 3.562}], 'summary': 'Data has 137k characters and 81 are unique. 
calculating vocab size next.', 'duration': 29.459, 'max_score': 1066.85, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1066850.jpg'}, {'end': 1212.024, 'src': 'heatmap', 'start': 1136.968, 'weight': 5, 'content': [{'end': 1141.132, 'text': "One will convert from character to integer, which is the one that I've just written,", 'start': 1136.968, 'duration': 4.164}, {'end': 1147.297, 'text': "and then the next one is going to say let's convert the integers to characters.", 'start': 1141.132, 'duration': 6.165}, {'end': 1156.643, 'text': "And once we've done that, we can go ahead and say well, let's print all of the values that it's storing, because these are our dictionaries,", 'start': 1148.258, 'duration': 8.385}, {'end': 1160.405, 'text': "that we're going to use in a second to convert our values into vectors.", 'start': 1156.643, 'duration': 3.762}, {'end': 1162.166, 'text': "So let's go ahead and print that.", 'start': 1161.106, 'duration': 1.06}, {'end': 1171.812, 'text': "And what's the deal here? Oh, enumerate, right? Enumerate, right.", 'start': 1162.807, 'duration': 9.005}, {'end': 1176.179, 'text': 'Great. 
So here are our vectors.', 'start': 1172.392, 'duration': 3.787}, {'end': 1182.906, 'text': "right there It's a dictionary, or here here are dictionaries, one for characters to integers and one for integers to characters.", 'start': 1176.179, 'duration': 6.727}, {'end': 1188.232, 'text': 'Okay, so once we have that now.', 'start': 1182.906, 'duration': 5.326}, {'end': 1190.335, 'text': "So we've done that already.", 'start': 1188.232, 'duration': 2.103}, {'end': 1196.977, 'text': "and so then we're going to say let's Create a vector for character a.", 'start': 1190.335, 'duration': 6.642}, {'end': 1199.878, 'text': 'so this is what vectorization looks like for for us.', 'start': 1196.977, 'duration': 2.901}, {'end': 1202.5, 'text': "So let's say we want to create a vector for the character a.", 'start': 1199.878, 'duration': 2.622}, {'end': 1204.541, 'text': "so we'll say it will initialize the vectors, empty.", 'start': 1202.5, 'duration': 2.041}, {'end': 1212.024, 'text': "So it's just a vector of zeros, of the size of the vocab, Okay, and so of the size of our vocab.", 'start': 1204.541, 'duration': 7.483}], 'summary': "Converting characters to integers and vice versa, printing and creating vectors for character 'a'.", 'duration': 75.056, 'max_score': 1136.968, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1136968.jpg'}, {'end': 1344.673, 'src': 'embed', 'start': 1314.797, 'weight': 6, 'content': [{'end': 1318.402, 'text': "So we want to say that our network's gonna have 100 neurons for its hidden layer.", 'start': 1314.797, 'duration': 3.605}, {'end': 1332.53, 'text': "And then we're gonna say that we want There are going to be 25 characters that are generated at every time step.", 'start': 1318.442, 'duration': 14.088}, {'end': 1333.47, 'text': "That's our sequence length.", 'start': 1332.55, 'duration': 0.92}, {'end': 1337.151, 'text': 'And then our learning rate is going to be this very small 
number.', 'start': 1333.87, 'duration': 3.281}, {'end': 1340.192, 'text': "Because if it's too low, then it's never going to converge.", 'start': 1337.952, 'duration': 2.24}, {'end': 1344.673, 'text': "But if it's too high, then it will overshoot and it's just never going to converge.", 'start': 1340.212, 'duration': 4.461}], 'summary': 'Network has 100 neurons in hidden layer, 25 characters generated at each time step, and a small learning rate for convergence.', 'duration': 29.876, 'max_score': 1314.797, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1314797.jpg'}], 'start': 933.211, 'title': 'Recurrent networks and model training', 'summary': 'Emphasizes understanding recurrent networks, loading training data, defining network, loss function, and prediction function. It also covers loading training data, calculating vocab size, creating dictionaries, and defining model parameters for a 3-layer neural network with 100 neurons in the hidden layer and a sequence length of 25 characters for generation.', 'chapters': [{'end': 986.46, 'start': 933.211, 'title': 'Understanding recurrent networks and training data', 'summary': 'Highlights the importance of understanding recurrent networks before moving on to advanced networks, and explains the process of loading training data, defining the network, loss function, and prediction function.', 'duration': 53.249, 'highlights': ['The chapter emphasizes the importance of understanding recurrent networks before delving into advanced networks like LSTM, bidirectional, and recursive networks.', 'The code contains four parts: loading the training data, defining the network, defining the loss function, and implementing the prediction function.', 'The loss function is crucial as it contains both the forward and backward pass, providing the gradient values necessary for updating weights during training.', 'The training process involves making predictions, typically 
generating words, and training the network to improve performance.']}, {'end': 1374.184, 'start': 987.32, 'title': 'Neural network training data loading and model parameter definition', 'summary': 'Outlines the process of loading training data, calculating vocab size, creating dictionaries for character-to-integer conversion, and defining model parameters for a neural network with 3 layers, 100 neurons in the hidden layer, and a sequence length of 25 characters for generation.', 'duration': 386.864, 'highlights': ['The data has 137K characters and 81 unique characters, which is important for creating a vector of the size of the number of characters. The data consists of 137K characters with 81 unique characters, crucial for generating vectors.', 'Creation of two dictionaries for character-to-integer and integer-to-character conversion, essential for converting values into vectors. The process involves creating two dictionaries for character-to-integer and integer-to-character conversion, critical for transforming values into vectors.', 'Defining the network with 3 layers, 100 neurons in the hidden layer, and a sequence length of 25 characters for generation. 
The network is defined with 3 layers, 100 neurons in the hidden layer, and a sequence length of 25 characters for generation.']}], 'duration': 440.973, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA933211.jpg', 'highlights': ['The loss function is crucial as it contains both the forward and backward pass, providing the gradient values necessary for updating weights during training.', 'The chapter emphasizes the importance of understanding recurrent networks before delving into advanced networks like LSTM, bidirectional, and recursive networks.', 'The code contains four parts: loading the training data, defining the network, defining the loss function, and implementing the prediction function.', 'The training process involves making predictions, typically generating words, and training the network to improve performance.', 'The data has 137K characters and 81 unique characters, which is important for creating a vector of the size of the number of characters.', 'Creation of two dictionaries for character-to-integer and integer-to-character conversion, essential for converting values into vectors.', 'Defining the network with 3 layers, 100 neurons in the hidden layer, and a sequence length of 25 characters for generation.']}, {'end': 1905.337, 'segs': [{'end': 1673.515, 'src': 'heatmap', 'start': 1585.857, 'weight': 0, 'content': [{'end': 1587.458, 'text': 'This is the forward pass,', 'start': 1585.857, 'duration': 1.601}, {'end': 1603.951, 'text': "is the dot product between the input to hidden state weight matrix and the input data That's this term right here plus the dot product between the hidden state and the hidden state to hidden state matrix and the hidden state.", 'start': 1587.458, 'duration': 16.493}, {'end': 1610.213, 'text': "and then we add the hidden bias and that's going to give us the hidden state value at the current time step.", 'start': 1603.951, 'duration': 6.262}, {'end': 1611.914, 
'text': "right. so that's what that represents.", 'start': 1610.213, 'duration': 1.701}, {'end': 1614.615, 'text': 'and then we take that value and we feed it.', 'start': 1611.914, 'duration': 2.701}, {'end': 1619.616, 'text': 'we compute a dot product with the next weight matrix and that is the hidden state to the output,', 'start': 1614.615, 'duration': 5.001}, {'end': 1629.059, 'text': "and then we add that output bias value, and that's going to give us the unnormalized log probabilities for the next chars,", 'start': 1619.616, 'duration': 9.443}, {'end': 1637.361, 'text': 'which we then squash into probability values using this function, p, which is actually right here, p right here,', 'start': 1629.059, 'duration': 8.302}, {'end': 1638.641, 'text': "but I'll talk about that in a second, okay?", 'start': 1637.361, 'duration': 1.28}, {'end': 1643.542, 'text': "So that's our forward pass and then for our backward pass, the backward pass is going to be.", 'start': 1638.941, 'duration': 4.601}, {'end': 1648.434, 'text': "Before I talk about the backward pass, let's talk about the loss for a second.", 'start': 1645.652, 'duration': 2.782}, {'end': 1650.936, 'text': 'so the loss is the negative log likelihood.', 'start': 1648.434, 'duration': 2.502}, {'end': 1654.339, 'text': "So it's the negative log value of p, and p", 'start': 1650.936, 'duration': 3.403}, {'end': 1655.72, 'text': 'is this function here,', 'start': 1654.339, 'duration': 1.381}, {'end': 1659.663, 'text': 'which is represented programmatically by this.', 'start': 1655.72, 'duration': 3.943}, {'end': 1662.526, 'text': "Right here, right, so it's the.", 'start': 1660.804, 'duration': 1.722}, {'end': 1673.515, 'text': "it's e to the x, where x is the output value that it received, divided by the sum of e to each of the output values.", 'start': 1662.526, 'duration': 10.989}], 'summary': 'The forward pass computes hidden state and output probabilities, while 
the loss is the negative log likelihood.', 'duration': 87.658, 'max_score': 1585.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1585857.jpg'}, {'end': 1730.632, 'src': 'embed', 'start': 1706.107, 'weight': 2, 'content': [{'end': 1712.013, 'text': "Given an error value, we're gonna compute the partial derivative of the error with respect to each weight recursively.", 'start': 1706.107, 'duration': 5.906}, {'end': 1715.557, 'text': "So the reason we're using the chain rule is this.", 'start': 1712.373, 'duration': 3.184}, {'end': 1722.684, 'text': 'So, because we have three weight matrices we have the input to hidden, hidden to output and hidden to hidden.', 'start': 1715.777, 'duration': 6.907}, {'end': 1725.807, 'text': 'we want to compute gradient values for all three of those.', 'start': 1722.684, 'duration': 3.123}, {'end': 1726.989, 'text': "So that's what this looks like.", 'start': 1725.847, 'duration': 1.142}, {'end': 1730.632, 'text': 'We want to compute gradient values for all three of those weight matrices.', 'start': 1727.469, 'duration': 3.163}], 'summary': 'Compute partial derivatives of error with respect to each weight recursively to compute gradient values for three weight matrices.', 'duration': 24.525, 'max_score': 1706.107, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1706107.jpg'}], 'start': 1374.524, 'title': 'Neural network model parameters and recurrent network forward and backward pass', 'summary': 'Discusses the definition of model parameters for a character-level recurrent network, emphasizing input, hidden, and output states. 
It also explains the forward and backward pass in a recurrent network, including loss computation, back propagation, and weight matrix updates.', 'chapters': [{'end': 1523.577, 'start': 1374.524, 'title': 'Neural network model parameters', 'summary': 'Discusses the definition of model parameters, including the initialization of weight matrices and biases for a character-level recurrent network, with specific emphasis on input, hidden, and output states.', 'duration': 149.053, 'highlights': ["The weights from the input to the hidden state are initialized randomly using NumPy's random randn function as a matrix of shape hidden size by vocab size, scaled by 0.01 for a character-level recurrent network.", 'The recurrent weight matrix is defined for the transition from the hidden state to the next hidden state in the network, specifying the weights between these states.', 'The third weight matrix corresponds to the mapping from the hidden states to the network outputs, with shape vocab size by hidden size.']}, {'end': 1905.337, 'start': 1530.845, 'title': 'Recurrent network forward and backward pass', 'summary': 'Explains the forward pass in a recurrent network, detailing the calculation of the hidden state, and the backward pass involving loss computation, back propagation using the chain rule, and updating weight matrices.', 'duration': 374.492, 'highlights': ["The forward pass involves a series of matrix operations, including dot products between input and hidden state weight matrices, hidden bias addition, and output bias addition, resulting in unnormalized log probabilities for the next characters. 
The forward pass is a series of matrix operations involving dot products between weight matrices, addition of biases, and computation of unnormalized log probabilities, providing insight into the network's calculation process.", "The loss is defined as the negative log likelihood, computed using the negative log value of the probability function, and utilized to perform back propagation using the chain rule to compute gradients for each weight matrix. The chapter details the computation of loss as the negative log likelihood and the subsequent use of the chain rule for back propagation to calculate gradients for each weight matrix, providing a comprehensive understanding of the network's training process.", "The chain rule is employed to recursively compute the partial derivatives of the error with respect to each weight, facilitating the computation of gradient values for all weight matrices, enabling their simultaneous update during training. The chain rule's application in recursively computing partial derivatives of errors with respect to weights is highlighted, showcasing its role in calculating gradient values for all weight matrices and enabling simultaneous updates during training."]}], 'duration': 530.813, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1374524.jpg', 'highlights': ['The forward pass involves matrix operations, resulting in unnormalized log probabilities for next characters.', 'The loss is defined as negative log likelihood, computed using the negative log value of the probability function.', 'The chain rule is employed to recursively compute partial derivatives of the error with respect to each weight.']}, {'end': 2733.21, 'segs': [{'end': 1949.618, 'src': 'embed', 'start': 1905.337, 'weight': 1, 'content': [{'end': 1913.653, 'text': "anyway, to define our loss function, Our loss function is going to be so we're going to give it our inputs and our targets as its parameters,", 
'start': 1905.337, 'duration': 8.316}, {'end': 1917.716, 'text': 'as well as the hidden state from the previous time step.', 'start': 1913.653, 'duration': 4.063}, {'end': 1923, 'text': "okay, so then let's define our parameters that we're going to store these values in.", 'start': 1917.716, 'duration': 5.284}, {'end': 1926.023, 'text': "so I'm going to define four parameters.", 'start': 1923, 'duration': 3.023}, {'end': 1930.106, 'text': "Okay, these are lists that we're going to store values at every time step in.", 'start': 1926.023, 'duration': 4.083}, {'end': 1931.887, 'text': 'okay, as we compute them.', 'start': 1930.106, 'duration': 1.781}, {'end': 1934.869, 'text': 'So these are empty dictionaries.', 'start': 1932.748, 'duration': 2.121}, {'end': 1936.03, 'text': 'so x of x.', 'start': 1934.869, 'duration': 1.161}, {'end': 1943.635, 'text': 'so xs will store the one-hot encoded input characters for each of the 25 time steps.', 'start': 1936.03, 'duration': 7.605}, {'end': 1946.836, 'text': 'so, this will store the input characters.', 'start': 1943.635, 'duration': 3.201}, {'end': 1949.618, 'text': 'Hs is going to store the hidden state outputs.', 'start': 1946.836, 'duration': 2.782}], 'summary': 'Defining loss function and parameters; storing input and hidden state values.', 'duration': 44.281, 'max_score': 1905.337, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1905337.jpg'}, {'end': 2003.426, 'src': 'embed', 'start': 1975.783, 'weight': 0, 'content': [{'end': 1980.928, 'text': 'the hs currently with the previous hidden state, and using the equal sign would just create a reference.', 'start': 1975.783, 'duration': 5.145}, {'end': 1983.43, 'text': 'but we want to create a whole separate array.', 'start': 1980.928, 'duration': 2.502}, {'end': 1984.851, 'text': "So that's why we don't, we don't.", 'start': 1983.43, 'duration': 1.421}, {'end': 1994.479, 'text': "We don't want hs 
with the element negative 1 to automatically change if h previous has changed, so we'll create an entirely new copy of it.", 'start': 1985.552, 'duration': 8.927}, {'end': 1999.784, 'text': "and so then we'll initialize our loss as 0 and then, okay, so we'll initialize our loss as 0.", 'start': 1994.479, 'duration': 5.305}, {'end': 2003.426, 'text': 'so this is our loss, scalar value.', 'start': 1999.784, 'duration': 3.642}], 'summary': 'Copying the previous hidden state into a separate array to avoid sharing a reference, then initializing loss as 0.', 'duration': 27.643, 'max_score': 1975.783, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1975783.jpg'}, {'end': 2052.219, 'src': 'heatmap', 'start': 2020.134, 'weight': 0.887, 'content': [{'end': 2028.181, 'text': "so the forward pass is going to be We're going to start off with that one of k representation.", 'start': 2020.134, 'duration': 8.047}, {'end': 2037.631, 'text': 'We place a zero vector as the t-th input, and then inside that t-th input, we use the integer in inputs list to set the correct value there.', 'start': 2028.221, 'duration': 9.41}, {'end': 2039.273, 'text': "Okay, so that's on that second line.", 'start': 2037.811, 'duration': 1.462}, {'end': 2042.094, 'text': "And then once we have that, we're going to compute the hidden state.", 'start': 2039.853, 'duration': 2.241}, {'end': 2043.995, 'text': 'Now remember, I showed you the equation before.', 'start': 2042.134, 'duration': 1.861}, {'end': 2045.716, 'text': 'We just repeat that equation here.', 'start': 2044.356, 'duration': 1.36}, {'end': 2052.219, 'text': 'And then we compute our output, just like I showed before, and then our probabilities, the probabilities for the next chars.', 'start': 2046.197, 'duration': 6.022}], 'summary': 'The forward pass computes hidden state and output probabilities for next characters.', 'duration': 32.085, 'max_score': 2020.134, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA2020134.jpg'}, {'end': 2321.036, 'src': 'embed', 'start': 2290.907, 'weight': 4, 'content': [{'end': 2293.531, 'text': 'then what happens is as the gradient is moving.', 'start': 2290.907, 'duration': 2.624}, {'end': 2298.677, 'text': "by moving, I mean you're computing the dot product of it for every layer with the current weight matrix,", 'start': 2293.531, 'duration': 5.146}, {'end': 2304.584, 'text': "wherever you're at using the partial derivative, and the gradient value will get smaller and smaller.", 'start': 2298.677, 'duration': 5.907}, {'end': 2306.645, 'text': "there's a problem with the recurrent networks.", 'start': 2304.584, 'duration': 2.061}, {'end': 2309.127, 'text': "That's called the vanishing gradient problem.", 'start': 2306.725, 'duration': 2.402}, {'end': 2313.15, 'text': "Okay, and so it gets smaller and smaller, and there's a way to prevent that.", 'start': 2309.127, 'duration': 4.023}, {'end': 2321.036, 'text': 'one way is to clip those values by defining some interval that they can reach.', 'start': 2313.15, 'duration': 7.886}], 'summary': 'Vanishing gradient problem occurs in recurrent networks, preventing it by clipping values within a defined interval.', 'duration': 30.129, 'max_score': 2290.907, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA2290907.jpg'}, {'end': 2418.28, 'src': 'embed', 'start': 2387.639, 'weight': 3, 'content': [{'end': 2390.942, 'text': "and then we're gonna do a backward pass to calculate all those gradient values.", 'start': 2387.639, 'duration': 3.303}, {'end': 2393.764, 'text': "And then we're going to update the model using a technique.", 'start': 2391.502, 'duration': 2.262}, {'end': 2400.128, 'text': "It's a type of gradient descent technique called Adagrad, which is just, it just decays the learning rate.", 'start': 
2393.884, 'duration': 6.244}, {'end': 2401.629, 'text': "But it's gradient descent.", 'start': 2400.168, 'duration': 1.461}, {'end': 2402.71, 'text': "You'll see what I'm talking about.", 'start': 2401.849, 'duration': 0.861}, {'end': 2403.55, 'text': "It's not complicated.", 'start': 2402.75, 'duration': 0.8}, {'end': 2404.911, 'text': "But it's called Adagrad.", 'start': 2403.99, 'duration': 0.921}, {'end': 2408.153, 'text': "So we're going to create two arrays of chars from the data file.", 'start': 2405.551, 'duration': 2.602}, {'end': 2410.535, 'text': 'The target one is going to be shifted from the input one.', 'start': 2408.193, 'duration': 2.342}, {'end': 2413.076, 'text': 'So we basically just shift it by one, as you notice here.', 'start': 2410.795, 'duration': 2.281}, {'end': 2418.28, 'text': 'So now we have our inputs and our targets, right? And these numbers are actually character values in the dictionary.', 'start': 2413.377, 'duration': 4.903}], 'summary': 'Using the Adagrad technique, a form of gradient descent, to update the model, and creating input and target arrays of chars from the data file.', 'duration': 30.641, 'max_score': 2387.639, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA2387639.jpg'}, {'end': 2719.038, 'src': 'embed', 'start': 2696.24, 'weight': 5, 'content': [{'end': 2703.585, 'text': "This is one thing that's being worked on, but the idea is that you'll get different results for whatever set of matrices", 'start': 2696.24, 'duration': 7.345}, {'end': 2708.549, 'text': 'that you add deeper layers to. There are different papers on this.', 'start': 2703.585, 'duration': 4.964}, {'end': 2714.634, 'text': "but yes, adding deeper layers is going to give you better results, and that's deep learning, recurrent nets applied to deep learning.", 'start': 2708.549, 'duration': 6.085}, {'end': 2719.038, 'text': 'But this is a simple three layer feed forward network that works really well,', 'start': 2714.634, 'duration': 
4.404}], 'summary': 'Adding deeper layers improves results in deep learning.', 'duration': 22.798, 'max_score': 2696.24, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA2696240.jpg'}], 'start': 1905.337, 'title': 'Recurrent neural network training', 'summary': 'Discusses the process of training a recurrent neural network, including initializing loss, performing forward and backward passes, using the adagrad technique for parameter updates, addressing challenges of vanishing gradients, implementing deeper layers, and practical applications of the network.', 'chapters': [{'end': 1994.479, 'start': 1905.337, 'title': 'Loss function and parameter definition', 'summary': 'Discusses the definition of the loss function, parameters for storing values at every time step, and the initialization of the hidden state outputs, with a focus on creating separate arrays to avoid automatic chaining.', 'duration': 89.142, 'highlights': ['The loss function is defined with inputs, targets, and hidden state from the previous time step, and four parameters are defined to store values at each time step.', 'The parameters include lists to store one-hot encoded input characters, input characters, hidden state outputs, target values, and normalized probabilities for characters.', 'The process involves initializing the hidden state outputs by creating a whole separate array to avoid automatic chaining from the previous hidden state.']}, {'end': 2733.21, 'start': 1994.479, 'title': 'Recurrent neural network training', 'summary': 'Explains the process of training a recurrent neural network, including initializing loss, performing forward and backward passes, using the adagrad technique for parameter updates, and generating text. 
it also discusses the challenges of vanishing gradients, the implementation of deeper layers for improved results, and provides practical applications of the network.', 'duration': 738.731, 'highlights': ['The chapter explains the process of training a recurrent neural network, including initializing loss, performing forward and backward passes, using the Adagrad technique for parameter updates, and generating text. The transcript provides a detailed explanation of the training process for a recurrent neural network, covering the initialization of loss, forward and backward passes, and the use of the Adagrad technique for parameter updates.', 'Challenges of vanishing gradients in recurrent networks are discussed, along with techniques to mitigate the issue, such as clipping values or using LSTM networks. The transcript addresses the vanishing gradient problem in recurrent networks and suggests methods to mitigate the issue, including value clipping and the application of LSTM networks.', 'The implementation of deeper layers in the network for improved results is explored, with the explanation that the addition of deeper layers depends on the specific matrices and can yield different outcomes. The transcript delves into the implementation of deeper layers in the network for improved results, highlighting that the choice of matrices for adding deeper layers can lead to different outcomes.', 'Practical applications of the recurrent neural network are mentioned, including generating various types of text, such as Wikipedia articles, fake news, and code. 
The transcript outlines practical applications of the recurrent neural network, including the generation of diverse text types, such as Wikipedia articles, fake news, and code.']}], 'duration': 827.873, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BwmddtPFWtA/pics/BwmddtPFWtA1905337.jpg', 'highlights': ['The process involves initializing the hidden state outputs by creating a whole separate array to avoid automatic chaining from the previous hidden state.', 'The parameters include lists to store one-hot encoded input characters, input characters, hidden state outputs, target values, and normalized probabilities for characters.', 'The loss function is defined with inputs, targets, and hidden state from the previous time step, and four parameters are defined to store values at each time step.', 'The transcript provides a detailed explanation of the training process for a recurrent neural network, covering the initialization of loss, forward and backward passes, and the use of the Adagrad technique for parameter updates.', 'Challenges of vanishing gradients in recurrent networks are discussed, along with techniques to mitigate the issue, such as clipping values or using LSTM networks.', 'The implementation of deeper layers in the network for improved results is explored, with the explanation that the addition of deeper layers depends on the specific matrices and can yield different outcomes.', 'Practical applications of the recurrent neural network are mentioned, including generating various types of text, such as Wikipedia articles, fake news, and code.']}], 'highlights': ['Recurrent networks leverage bigger data and computing power to outperform other machine learning models in accuracy.', 'The loss function is crucial as it contains both the forward and backward pass, providing the gradient values necessary for updating weights during training.', 'The chapter emphasizes the importance of understanding recurrent networks before 
delving into advanced networks like LSTM, bidirectional, and recursive networks.', 'The forward pass involves matrix operations, resulting in unnormalized log probabilities for next characters.', 'The recommendation is made to train models in the cloud due to the extended time required for training on a laptop.', 'The chapter discusses the conditional memory associated with learning the alphabet or song lyrics, highlighting the challenge of reciting them backwards and emphasizing the significance of sequence in memory recall.', 'The process involves initializing the hidden state outputs by creating a whole separate array to avoid automatic chaining from the previous hidden state.', 'The chapter emphasizes the impact of previous stock prices on the current stock price, highlighting the importance of considering preceding data in financial analysis.', 'The data has 137K characters and 81 unique characters, which is important for creating a vector of the size of the number of characters.', 'The role of biases in neural networks is crucial as it affects the computation graph and the application of activation functions, emphasizing the necessity to add bias values for effective learning.']}