title
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorflow Tutorial | Edureka

description
( TensorFlow Training - https://www.edureka.co/ai-deep-learning-with-tensorflow ) This Edureka Recurrent Neural Networks tutorial video (Blog: https://goo.gl/4zxMfU) will help you understand why we need Recurrent Neural Networks (RNNs) and what exactly they are. It also explains a few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use-case of LSTM to predict the next word using a sample short story. Below are the topics covered in this tutorial: 1. Why Not Feedforward Networks? 2. What Are Recurrent Neural Networks? 3. Training A Recurrent Neural Network 4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient 5. Long Short-Term Memory Networks (LSTMs) 6. LSTM Use-Case Subscribe to our channel to get video updates. Hit the subscribe button above. Check our complete Deep Learning With TensorFlow playlist here: https://goo.gl/cck4hE PG in Artificial Intelligence and Machine Learning with NIT Warangal : https://www.edureka.co/post-graduate/machine-learning-and-ai Post Graduate Certification in Data Science with IIT Guwahati - https://www.edureka.co/post-graduate/data-science-program (450+ Hrs || 9 Months || 20+ Projects & 100+ Case studies) - - - - - - - - - - - - - - How it Works? 1. This is a 21-hour online, live, instructor-led course. Weekend class: 7 sessions of 3 hours each. 2. We have 24x7 One-on-One LIVE Technical Support to help you with any problems you might face or any clarifications you may require during the course. 3. At the end of the training you will have to undergo a 2-hour LIVE Practical Exam based on which we will provide you a Grade and a Verifiable Certificate! - - - - - - - - - - - - - - About the Course Edureka's Deep Learning with TensorFlow course will help you to learn the basic concepts of TensorFlow, the main functions, operations and the execution pipeline. 
Starting with a simple “Hello World” example, throughout the course you will be able to see how TensorFlow can be used in curve fitting, regression, classification and minimization of error functions. This concept is then explored in the Deep Learning world. You will evaluate the common, and not so common, deep neural networks and see how these can be exploited in the real world with complex raw data using TensorFlow. In addition, you will learn how to apply TensorFlow for backpropagation to tune the weights and biases while the Neural Networks are being trained. Finally, the course covers different types of Deep Architectures, such as Convolutional Networks, Recurrent Networks and Autoencoders. Delve into neural networks, implement Deep Learning algorithms, and explore layers of data abstraction with the help of this Deep Learning with TensorFlow course. - - - - - - - - - - - - - - Who should go for this course? The following professionals can go for this course: 1. Developers aspiring to be a 'Data Scientist' 2. Analytics Managers who are leading a team of analysts 3. Business Analysts who want to understand Deep Learning (ML) Techniques 4. Information Architects who want to gain expertise in Predictive Analytics 5. Professionals who want to capture and analyze Big Data 6. Analysts wanting to understand Data Science methodologies However, Deep Learning is not limited to one particular industry or skill set; it can be used by anyone to enhance their portfolio. - - - - - - - - - - - - - - Why Learn Deep Learning With TensorFlow? TensorFlow is one of the best libraries to implement Deep Learning. TensorFlow is a software library for numerical computation of mathematical expressions, using data flow graphs. Nodes in the graph represent mathematical operations, while the edges represent the multidimensional data arrays (tensors) that flow between them. It was created by Google and tailored for Machine Learning. 
In fact, it is being widely used to develop solutions with Deep Learning. Machine learning is one of the fastest-growing and most exciting fields out there, and Deep Learning represents its true bleeding edge. Deep learning is primarily a study of multi-layered neural networks, spanning a vast range of model architectures. Traditional neural networks relied on shallow nets, composed of one input layer, one hidden layer and one output layer. Deep-learning networks are distinguished from these ordinary neural networks by having more hidden layers, or so-called more depth. These kinds of nets are capable of discovering hidden structures within unlabeled and unstructured data (e.g. images, sound, and text), which constitutes the vast majority of data in the world. For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll-free). Facebook: https://www.facebook.com/edurekaIN/ Twitter: https://twitter.com/edurekain LinkedIn: https://www.linkedin.com/company/edureka

detail
{'title': 'Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorflow Tutorial | Edureka', 'heatmap': [{'end': 739.716, 'start': 713.106, 'weight': 0.71}, {'end': 784.721, 'start': 757.302, 'weight': 0.747}, {'end': 977.932, 'start': 870.254, 'weight': 0.715}, {'end': 1044.545, 'start': 999.264, 'weight': 0.704}, {'end': 1355.435, 'start': 1321.334, 'weight': 0.712}], 'summary': 'This tutorial explores the limitations of feedforward networks, the need for recurrent neural networks, backpropagation, vanishing gradient, and LSTM use cases. It also covers implementing RNN and LSTM models with TensorFlow, including 50,000 iterations and a batch size of 1000 for coherent story generation and addressing long-term dependency issues.', 'chapters': [{'end': 170.609, 'segs': [{'end': 44.632, 'src': 'embed', 'start': 16.733, 'weight': 0, 'content': [{'end': 21.955, 'text': "After that, we'll understand how recurrent neural networks solve those issues, and we'll understand how exactly it works.", 'start': 16.733, 'duration': 5.222}, {'end': 26.996, 'text': "Then we'll focus on various issues with recurrent neural networks, namely vanishing and exploding gradients.", 'start': 22.455, 'duration': 4.541}, {'end': 32.96, 'text': "After that we'll understand how LSTMs, that is long short term memory units, solve this problem.", 'start': 27.654, 'duration': 5.306}, {'end': 35.622, 'text': "And finally we'll be looking at an LSTM use case.", 'start': 33.38, 'duration': 2.242}, {'end': 39.767, 'text': "So guys let's begin, we'll understand why can't we use feed forward networks.", 'start': 36.323, 'duration': 3.444}, {'end': 44.632, 'text': 'Now let us take an example of a feed forward network that is used for image classification.', 'start': 40.688, 'duration': 3.944}], 'summary': 'The transcript discusses recurrent neural networks, LSTMs, and feed forward networks for image classification.', 'duration': 27.899, 'max_score': 16.733, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc16733.jpg'}, {'end': 136.227, 'src': 'embed', 'start': 110.934, 'weight': 1, 'content': [{'end': 116.115, 'text': 'That is, output at t plus one has no relation with output at t minus two, t minus one, and at t.', 'start': 110.934, 'duration': 5.181}, {'end': 120.918, 'text': 'So basically we cannot use feed forward networks for predicting the next word in a sentence.', 'start': 116.835, 'duration': 4.083}, {'end': 127.562, 'text': 'Similarly, you can think of many other examples where we need the previous output, some information from the previous output,', 'start': 121.358, 'duration': 6.204}, {'end': 129.082, 'text': 'so as to infer the new output.', 'start': 127.562, 'duration': 1.52}, {'end': 130.743, 'text': 'This is just one small example.', 'start': 129.363, 'duration': 1.38}, {'end': 132.685, 'text': 'There are many other examples that you can think of.', 'start': 130.764, 'duration': 1.921}, {'end': 136.227, 'text': "So we'll move forward and understand how we can solve this particular problem.", 'start': 133.165, 'duration': 3.062}], 'summary': 'Feed forward networks cannot predict next word in a sentence without previous output.', 'duration': 25.293, 'max_score': 110.934, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc110934.jpg'}], 'start': 0.069, 'title': 'Recurrent neural networks', 'summary': 'Discusses the limitations of feedforward networks, the need for recurrent neural networks, emphasizing the dependency of outputs, the importance of previous information, and an lstm use case.', 'chapters': [{'end': 170.609, 'start': 0.069, 'title': 'Recurrent neural networks', 'summary': 'Discusses the limitations of feedforward networks and the need for recurrent neural networks, focusing on issues and solutions, with an emphasis on the dependency of outputs and the importance of previous information, and 
finally looking at an LSTM use case.', 'duration': 170.54, 'highlights': ['Recurrent neural networks solve the limitations of feedforward networks by addressing the dependency of outputs and the importance of previous information. RNNs address the issue of outputs being independent of each other in feedforward networks, highlighting the need for output dependency and the importance of previous information.', 'The example of predicting the next word in a sentence illustrates the need for previous output information, which feedforward networks cannot provide. The example of predicting the next word in a sentence demonstrates the necessity of previous output information, which feedforward networks lack, showcasing the limitation of feedforward networks.', 'Understanding the functioning of LSTMs in solving the issues of vanishing and exploding gradients in recurrent neural networks. Exploration of how LSTMs address the issues of vanishing and exploding gradients in recurrent neural networks, providing a solution to the challenges faced.']}], 'duration': 170.54, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc69.jpg', 'highlights': ['Recurrent neural networks address the dependency of outputs and the importance of previous information, solving limitations of feedforward networks.', 'The example of predicting the next word in a sentence demonstrates the necessity of previous output information, showcasing the limitation of feedforward networks.', 'LSTMs address the issues of vanishing and exploding gradients in recurrent neural networks, providing a solution to the challenges faced.']}, {'end': 505.732, 'segs': [{'end': 310.8, 'src': 'embed', 'start': 277.15, 'weight': 0, 'content': [{'end': 281.855, 'text': 'and these predictions will be used as inputs in order to predict what exercise will happen today.', 'start': 277.15, 'duration': 4.705}, {'end': 287.858, 'text': 'Similarly, if you have missed your gym say for 
two days, three days, or one week, so you need to roll back.', 'start': 282.515, 'duration': 5.343}, {'end': 291.16, 'text': 'You need to go to the last day when you went to the gym.', 'start': 288.158, 'duration': 3.002}, {'end': 293.781, 'text': 'you need to figure out what exercise you did on that day.', 'start': 291.16, 'duration': 2.621}, {'end': 298.624, 'text': "feed that as an input, and then only you'll be getting the relevant output as to what exercise will happen today.", 'start': 293.781, 'duration': 4.843}, {'end': 301.706, 'text': "Now what I'll do, I'll convert these things into a vector.", 'start': 299.325, 'duration': 2.381}, {'end': 303.129, 'text': 'Now, what is a vector?', 'start': 302.148, 'duration': 0.981}, {'end': 305.793, 'text': 'Vector is nothing but a list of numbers, all right?', 'start': 303.35, 'duration': 2.443}, {'end': 310.8, 'text': 'So this is the new information, guys, along with the information from the prediction at the previous time step.', 'start': 306.113, 'duration': 4.687}], 'summary': 'Predict exercises based on past gym attendance using vectorization.', 'duration': 33.65, 'max_score': 277.15, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc277150.jpg'}, {'end': 382.857, 'src': 'embed', 'start': 348.63, 'weight': 2, 'content': [{'end': 354.215, 'text': 'Similarly, this new output that we have got will take some information from that feed in as an input to our network,', 'start': 348.63, 'duration': 5.585}, {'end': 357.898, 'text': 'along with the new information to get the new prediction, and this process keeps on repeating.', 'start': 354.215, 'duration': 3.683}, {'end': 361.31, 'text': 'Now let me show you the math behind the recurrent neural networks.', 'start': 358.649, 'duration': 2.661}, {'end': 365.771, 'text': 'So this is the structure of a recurrent neural network guys, let me explain you what happens here.', 'start': 362.01, 'duration': 3.761}, {'end': 
371.293, 'text': 'Now consider a time t equals to zero, we have input x naught, and we need to figure out what is h naught.', 'start': 366.171, 'duration': 5.122}, {'end': 382.857, 'text': 'So, according to this equation, h of zero is equal to wi weight matrix, multiplied by our input x of zero plus WR into H of zero.', 'start': 371.733, 'duration': 11.124}], 'summary': 'Recurrent neural networks use input data to make predictions, with a process that repeats using specific equations and matrices.', 'duration': 34.227, 'max_score': 348.63, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc348630.jpg'}, {'end': 505.732, 'src': 'embed', 'start': 473.702, 'weight': 3, 'content': [{'end': 476.143, 'text': 'Now you must be thinking how to train a recurrent neural network.', 'start': 473.702, 'duration': 2.441}, {'end': 480.165, 'text': 'So a recurrent neural network uses the backpropagation algorithm for training.', 'start': 476.963, 'duration': 3.202}, {'end': 483.327, 'text': 'But backpropagation happens for every timestamp.', 'start': 480.666, 'duration': 2.661}, {'end': 486.61, 'text': 'That is why it is commonly called backpropagation through time.', 'start': 483.888, 'duration': 2.722}, {'end': 491.413, 'text': "And I've discussed backpropagation in detail in artificial neural network tutorial, so you can go through that.", 'start': 486.99, 'duration': 4.423}, {'end': 496.937, 'text': "Over here I won't be discussing backpropagation in detail, I'll just give you a brief introduction of what it is.", 'start': 491.913, 'duration': 5.024}, {'end': 501.62, 'text': 'Now with backpropagation there are certain issues, namely vanishing and exploding gradients.', 'start': 497.637, 'duration': 3.983}, {'end': 503.081, 'text': 'Let us see those one by one.', 'start': 501.98, 'duration': 1.101}, {'end': 505.732, 'text': 'So, in vanishing gradient, what happens?', 'start': 503.991, 'duration': 1.741}], 'summary': 
'Recurrent neural network training uses backpropagation for every timestamp, leading to vanishing and exploding gradients.', 'duration': 32.03, 'max_score': 473.702, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc473702.jpg'}], 'start': 170.969, 'title': 'Recurrent neural networks and basics', 'summary': 'Delves into recurrent neural networks, covering their understanding, implementation, and basics. it explains the concept and working of rnns, detailing the calculation of hidden states and outputs at different timestamps, and introduces the backpropagation algorithm used for training, addressing issues like vanishing and exploding gradients.', 'chapters': [{'end': 365.771, 'start': 170.969, 'title': 'Recurrent neural networks: understanding and implementation', 'summary': 'Explains the concept of recurrent neural networks using an analogy of a gym schedule, highlighting the need for considering previous timestamps and demonstrating the process with a vector representation.', 'duration': 194.802, 'highlights': ['The need to consider previous timestamps and the impact of missed gym days on predicting the current exercise, emphasizing the importance of historical data in recurrent neural networks.', 'The process of converting gym exercises into a vector representation, demonstrating how the vector contains information from the previous timestamp and the new information for predicting the current exercise.', 'The description of the structure and functioning of recurrent neural networks, emphasizing the iterative process of using previous outputs and new information as inputs for predicting the current output.']}, {'end': 505.732, 'start': 366.171, 'title': 'Recurrent neural network basics', 'summary': 'Explains the working of a recurrent neural network (rnn) using equations and examples, detailing the calculation of hidden states and outputs at different time stamps, and introduces the backpropagation algorithm 
used for training and the issues of vanishing and exploding gradients.', 'duration': 139.561, 'highlights': ['Recurrent neural network (RNN) equations and examples The chapter explains the working of a recurrent neural network (RNN) using equations and examples to calculate hidden states and outputs at different time stamps.', 'Introduction to backpropagation algorithm for RNN training The chapter introduces the backpropagation algorithm used for training a recurrent neural network (RNN), emphasizing that backpropagation happens for every timestamp and is commonly called backpropagation through time.', 'Issues of vanishing and exploding gradients in RNN The chapter briefly introduces the issues of vanishing and exploding gradients in recurrent neural networks (RNN) and mentions that these issues will be discussed further.']}], 'duration': 334.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc170969.jpg', 'highlights': ['The process of converting gym exercises into a vector representation, demonstrating how the vector contains information from the previous timestamp and the new information for predicting the current exercise.', 'The need to consider previous timestamps and the impact of missed gym days on predicting the current exercise, emphasizing the importance of historical data in recurrent neural networks.', 'The description of the structure and functioning of recurrent neural networks, emphasizing the iterative process of using previous outputs and new information as inputs for predicting the current output.', 'Introduction to backpropagation algorithm for RNN training The chapter introduces the backpropagation algorithm used for training a recurrent neural network (RNN), emphasizing that backpropagation happens for every timestamp and is commonly called backpropagation through time.', 'Recurrent neural network (RNN) equations and examples The chapter explains the working of a recurrent neural 
network (RNN) using equations and examples to calculate hidden states and outputs at different time stamps.', 'Issues of vanishing and exploding gradients in RNN The chapter briefly introduces the issues of vanishing and exploding gradients in recurrent neural networks (RNN) and mentions that these issues will be discussed further.']}, {'end': 663.999, 'segs': [{'end': 533.166, 'src': 'embed', 'start': 506.352, 'weight': 0, 'content': [{'end': 514.054, 'text': 'When you use back propagation, you tend to calculate the error, which is nothing but the actual output that you already know, minus the model output,', 'start': 506.352, 'duration': 7.702}, {'end': 516.556, 'text': 'the output that you got through your model and the square of that.', 'start': 514.054, 'duration': 2.502}, {'end': 518.537, 'text': 'So you figure out the error.', 'start': 517.155, 'duration': 1.382}, {'end': 526.12, 'text': 'With that error, what do you do? You tend to find out the change in error with respect to change in weight, or any variable.', 'start': 518.996, 'duration': 7.124}, {'end': 527.521, 'text': "So we'll call it weight here.", 'start': 526.5, 'duration': 1.021}, {'end': 533.166, 'text': 'So change of error with respect to weight multiplied by learning rate will give you the change in weight.', 'start': 527.942, 'duration': 5.224}], 'summary': 'Back propagation calculates error for adjusting weights.', 'duration': 26.814, 'max_score': 506.352, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc506352.jpg'}, {'end': 595.343, 'src': 'embed', 'start': 571.826, 'weight': 2, 'content': [{'end': 578.31, 'text': 'So there might be certain examples where you know you are trying to predict say a next word in a sentence and that sentence is pretty long.', 'start': 571.826, 'duration': 6.484}, {'end': 588.258, 'text': 'For example if I say I went to France dash dash dash I went to France then there are certain words then I say few of 
them speak dash.', 'start': 578.691, 'duration': 9.567}, {'end': 591.08, 'text': 'Now I need to predict speak what will come after speak.', 'start': 588.438, 'duration': 2.642}, {'end': 595.343, 'text': 'So for that I need to go back in time and check what was the context.', 'start': 591.42, 'duration': 3.923}], 'summary': 'Predicting the next word in a sentence, considering context', 'duration': 23.517, 'max_score': 571.826, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc571826.jpg'}, {'end': 644.513, 'src': 'embed', 'start': 613.037, 'weight': 1, 'content': [{'end': 614.839, 'text': 'And that is nothing but your vanishing gradient.', 'start': 613.037, 'duration': 1.802}, {'end': 616.54, 'text': "All right, I'll repeat it once more.", 'start': 615.299, 'duration': 1.241}, {'end': 620.001, 'text': 'So what happens in backpropagation, you first calculate the error.', 'start': 616.84, 'duration': 3.161}, {'end': 625.364, 'text': 'This error is nothing but the difference between the actual output and the model output, and the square of that.', 'start': 620.502, 'duration': 4.862}, {'end': 630.787, 'text': 'With that error, we figure out what will be the change in error when we change a particular variable, say weight.', 'start': 625.704, 'duration': 5.083}, {'end': 636.19, 'text': 'So DE by DW, multiply it with learning rate to get the change in the variable or change in the weight.', 'start': 631.167, 'duration': 5.023}, {'end': 639.691, 'text': "Now, we'll add that change in the weight to our old weight to get the new weight.", 'start': 636.63, 'duration': 3.061}, {'end': 644.513, 'text': "This is what back propagation is, guys, all right? 
I'm just giving you a small introduction to back propagation.", 'start': 639.731, 'duration': 4.782}], 'summary': 'Backpropagation calculates error and adjusts weights to minimize it.', 'duration': 31.476, 'max_score': 613.037, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc613037.jpg'}], 'start': 506.352, 'title': 'Backpropagation and vanishing gradient', 'summary': 'Explains backpropagation, error calculation, weight change, and the concept of vanishing gradient, impacting model training in long sequences.', 'chapters': [{'end': 663.999, 'start': 506.352, 'title': 'Backpropagation and vanishing gradient', 'summary': 'Explains backpropagation, including the calculation of error, finding change in weight, and the concept of vanishing gradient, which can lead to negligible weight updates, impacting the training of models in scenarios with long sequences.', 'duration': 157.647, 'highlights': ['Backpropagation involves calculating the error, finding change in weight, and updating weights to reduce error. Backpropagation includes calculating error as the difference between actual output and model output, finding change in error with respect to a variable (e.g., weight), and updating weights to reduce error.', 'The concept of vanishing gradient is explained, where a very small gradient multiplied by a learning rate can result in negligible weight updates, impacting the training of models in scenarios with long sequences. Vanishing gradient occurs when a very small change in error with respect to a variable (e.g., weight) results in negligible weight updates, particularly in scenarios with long sequences, impacting the training of models.', 'Scenario of predicting the next word in a long sentence is discussed, highlighting the complexity and potential impact on weight updates due to the vanishing gradient. 
The scenario of predicting the next word in a long sentence is discussed, emphasizing the complexity and potential impact on weight updates due to the vanishing gradient.']}], 'duration': 157.647, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc506352.jpg', 'highlights': ['Backpropagation involves calculating error, finding change in weight, and updating weights to reduce error.', 'Vanishing gradient occurs when a very small change in error with respect to a variable results in negligible weight updates, particularly in scenarios with long sequences, impacting the training of models.', 'Scenario of predicting the next word in a long sentence is discussed, emphasizing the complexity and potential impact on weight updates due to the vanishing gradient.']}, {'end': 854.125, 'segs': [{'end': 741.697, 'src': 'heatmap', 'start': 713.106, 'weight': 0, 'content': [{'end': 720.971, 'text': 'So at that time your DE by DW will keep on increasing, delta w will become large and because of that your weights, the new weight that will come,', 'start': 713.106, 'duration': 7.865}, {'end': 723.013, 'text': 'will be very different from your old weight.', 'start': 720.971, 'duration': 2.042}, {'end': 725.795, 'text': 'So these two are the problems with back propagation.', 'start': 723.513, 'duration': 2.282}, {'end': 727.956, 'text': 'Now let us see how to solve these problems.', 'start': 726.235, 'duration': 1.721}, {'end': 733.44, 'text': 'Now exploding gradients can be solved with the help of truncated BPTT, back propagation through time.', 'start': 729.037, 'duration': 4.403}, {'end': 739.716, 'text': 'So instead of starting back propagation at the last timestamp, we can choose a smaller timestamp like 10.', 'start': 733.84, 'duration': 5.876}, {'end': 741.697, 'text': 'Or we can clip the gradients at a threshold.', 'start': 739.716, 'duration': 1.981}], 'summary': 'Back propagation problems can be solved by 
truncated BPTT or gradient clipping.', 'duration': 37.458, 'max_score': 713.106, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc713106.jpg'}, {'end': 801.62, 'src': 'heatmap', 'start': 757.302, 'weight': 1, 'content': [{'end': 761.643, 'text': "In this tutorial, we'll be discussing LSTMs, that is, long short-term memory units.", 'start': 757.302, 'duration': 4.341}, {'end': 764.544, 'text': 'Now let us understand what exactly are LSTMs.', 'start': 762.163, 'duration': 2.381}, {'end': 769.047, 'text': 'So guys, we saw what are the two limitations with the recurrent neural networks.', 'start': 765.424, 'duration': 3.623}, {'end': 772.27, 'text': "Now we'll understand how we can solve that with the help of LSTMs.", 'start': 769.468, 'duration': 2.802}, {'end': 780.237, 'text': 'Now what are LSTMs? Long short term memory networks, usually called LSTMs, are nothing but a special kind of recurrent neural network.', 'start': 772.691, 'duration': 7.546}, {'end': 784.721, 'text': 'And these recurrent neural networks are capable of learning long term dependencies.', 'start': 780.658, 'duration': 4.063}, {'end': 789.926, 'text': "Now what are long term dependencies? 
I've discussed that in the previous slide, but I'll just explain it to you here as well.", 'start': 785.102, 'duration': 4.824}, {'end': 795.397, 'text': 'Now what happens sometimes we only need to look at the recent information to perform the present task.', 'start': 790.535, 'duration': 4.862}, {'end': 796.798, 'text': 'Now let me give you an example.', 'start': 795.657, 'duration': 1.141}, {'end': 801.62, 'text': 'Consider a language model trying to predict the next word based on the previous ones.', 'start': 797.278, 'duration': 4.342}], 'summary': 'LSTMs are special recurrent neural networks capable of learning long term dependencies.', 'duration': 44.318, 'max_score': 757.302, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc757302.jpg'}, {'end': 834.272, 'src': 'embed', 'start': 812.185, 'weight': 4, 'content': [{'end': 820.648, 'text': "Now in such cases, where the gap between the relevant information and the place that it's needed is small, RNNs can learn to use the past information.", 'start': 812.185, 'duration': 8.463}, {'end': 824.489, 'text': "And at that time, there won't be such problems like vanishing and exploding gradient.", 'start': 821.028, 'duration': 3.461}, {'end': 828.13, 'text': 'But there are a few cases where we need more context.', 'start': 825.069, 'duration': 3.061}, {'end': 832.932, 'text': 'Consider trying to predict the last word in the text, I grew up in France.', 'start': 828.59, 'duration': 4.342}, {'end': 834.272, 'text': 'Then there are some words.', 'start': 833.332, 'duration': 0.94}], 'summary': 'RNNs excel with close context, avoiding gradient issues. 
More context is needed in certain cases, like predicting text endings.', 'duration': 22.087, 'max_score': 812.185, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc812185.jpg'}], 'start': 664.139, 'title': 'Gradient problems in neural networks', 'summary': 'Discusses vanishing and exploding gradient problems caused by small and large values of DE by DW, and solutions such as LSTMs and gradient clipping to address these issues, ensuring better learning outcomes and handling long-term dependencies in recurrent neural networks.', 'chapters': [{'end': 723.013, 'start': 664.139, 'title': 'Vanishing and exploding gradient problem', 'summary': 'Discusses the vanishing and exploding gradient problems in neural networks, where small and large values of DE by DW lead to negligible or significant changes in weights, impacting learning and causing vanishing or exploding gradients.', 'duration': 58.874, 'highlights': ['The vanishing gradient problem occurs when DE by DW becomes very small, resulting in negligible changes in weights and hindering the learning of long-term dependencies. If DE by DW is smaller than one, the resulting delta W will be very small, leading to new weights almost equal to the old weights.', 'The exploding gradient problem occurs when DE by DW becomes very large, leading to significant changes in weights, impacting long-term dependencies. 
If DE by DW becomes greater than one, delta w will become large, resulting in new weights very different from the old weights.']}, {'end': 854.125, 'start': 723.513, 'title': 'Solving gradient problems with LSTMs', 'summary': 'Discusses the limitations of back propagation, offers solutions including truncated BPTT, gradient clipping, ReLU activation, and LSTMs, explaining how LSTMs address long-term dependencies in recurrent neural networks.', 'duration': 130.612, 'highlights': ['LSTMs address long-term dependencies in recurrent neural networks LSTMs are capable of learning long-term dependencies in recurrent neural networks, addressing the issue of needing more context for certain tasks.', 'Solutions to gradient problems include truncated BPTT, gradient clipping, ReLU activation, and LSTMs Solutions such as truncated BPTT, gradient clipping, ReLU activation, and LSTMs are discussed to address gradient problems in back propagation.', 'Explaining the need for LSTMs in cases requiring more context The chapter illustrates the requirement for LSTMs in cases where more context is needed, such as predicting the last word in a text, to narrow down relevant information.']}], 'duration': 189.986, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc664139.jpg', 'highlights': ['Solutions such as truncated BPTT, gradient clipping, ReLU activation, and LSTMs are discussed to address gradient problems in back propagation.', 'LSTMs are capable of learning long-term dependencies in recurrent neural networks, addressing the issue of needing more context for certain tasks.', 'The vanishing gradient problem occurs when DE by DW becomes very small, resulting in negligible changes in weights and hindering the learning of long-term dependencies.', 'The exploding gradient problem occurs when DE by DW becomes very large, leading to significant changes in weights, impacting long-term dependencies.', 'The chapter illustrates the 
requirement for LSTMs in cases where more context is needed, such as predicting the last word in a text, to narrow down relevant information.']}, {'end': 1361.44, 'segs': [{'end': 883.86, 'src': 'embed', 'start': 854.585, 'weight': 0, 'content': [{'end': 856.528, 'text': 'And this is nothing but long-term dependencies.', 'start': 854.585, 'duration': 1.943}, {'end': 860.573, 'text': 'And the LSTMs are capable of handling such long-term dependencies.', 'start': 857.008, 'duration': 3.565}, {'end': 865.119, 'text': 'Now LSTMs also have a chain-like structure, like recurrent neural networks.', 'start': 861.294, 'duration': 3.825}, {'end': 869.834, 'text': 'Now all the recurrent neural networks have the form of a chain of repeating modules of neural networks.', 'start': 865.592, 'duration': 4.242}, {'end': 875.556, 'text': 'Now in standard RNNs, the repeating module will have a very simple structure such as a single tanh layer that you can see.', 'start': 870.254, 'duration': 5.302}, {'end': 878.538, 'text': 'Now this tanh layer is nothing but a squashing function.', 'start': 875.836, 'duration': 2.702}, {'end': 883.86, 'text': 'Now what I mean by squashing function is to convert my values between minus one and one.', 'start': 878.938, 'duration': 4.922}], 'summary': 'Lstms handle long-term dependencies with chain-like structure and squashing function.', 'duration': 29.275, 'max_score': 854.585, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc854585.jpg'}, {'end': 977.932, 'src': 'heatmap', 'start': 870.254, 'weight': 0.715, 'content': [{'end': 875.556, 'text': 'Now in standard RNNs, the repeating module will have a very simple structure such as a single tanh layer that you can see.', 'start': 870.254, 'duration': 5.302}, {'end': 878.538, 'text': 'Now this tanh layer is nothing but a squashing function.', 'start': 875.836, 'duration': 2.702}, {'end': 883.86, 'text': 'Now what I mean by squashing function is to 
convert my values between minus one and one.', 'start': 878.938, 'duration': 4.922}, {'end': 885.721, 'text': "Alright, that's why we use tanh.", 'start': 884.18, 'duration': 1.541}, {'end': 887.922, 'text': 'And this is an example of an RNN.', 'start': 886.141, 'duration': 1.781}, {'end': 890.863, 'text': "Now we'll understand what exactly are LSTMs.", 'start': 888.282, 'duration': 2.581}, {'end': 893.024, 'text': 'Now this is a structure of an LSTM.', 'start': 891.383, 'duration': 1.641}, {'end': 899.169, 'text': 'Or if you notice, LSTM also have a chain-like structure, but the repeating module has different structures.', 'start': 893.564, 'duration': 5.605}, {'end': 903.813, 'text': 'Instead of having a single neural network layer, there are four interacting in a very special way.', 'start': 899.749, 'duration': 4.064}, {'end': 906.275, 'text': 'Now the key to LSTM is the cell state.', 'start': 904.253, 'duration': 2.022}, {'end': 910.638, 'text': "Now this particular line that I'm highlighting, this is what is called the cell state.", 'start': 906.615, 'duration': 4.023}, {'end': 913.18, 'text': 'The horizontal line running through the top of the diagram.', 'start': 911.018, 'duration': 2.162}, {'end': 914.942, 'text': 'So this is nothing but your cell state.', 'start': 913.4, 'duration': 1.542}, {'end': 918.594, 'text': 'Now you can consider the cell state as a kind of a conveyor belt.', 'start': 915.492, 'duration': 3.102}, {'end': 923.118, 'text': 'It runs straight down the entire chain with only some minor linear interactions.', 'start': 918.774, 'duration': 4.344}, {'end': 928.141, 'text': "Now what I'll do, I'll give you a walkthrough of LSTM step by step, all right? 
So we'll start with the first step.", 'start': 923.558, 'duration': 4.583}, {'end': 935.367, 'text': 'All right guys, so the first step in our LSTM is to decide what information we are going to throw away from the cell state.', 'start': 929.142, 'duration': 6.225}, {'end': 938.409, 'text': "And you know what is a cell state, right? I've discussed in the previous slide.", 'start': 935.867, 'duration': 2.542}, {'end': 941.531, 'text': 'Now this decision is made by the sigmoid layer.', 'start': 939.03, 'duration': 2.501}, {'end': 944.714, 'text': "So the layer that I'm highlighting with my cursor, it is a sigmoid layer.", 'start': 942.132, 'duration': 2.582}, {'end': 946.514, 'text': 'called the forget gate layer.', 'start': 945.354, 'duration': 1.16}, {'end': 954.357, 'text': 'It looks at ht, minus one, that is the information from the previous timestamp, and xt, which is the new input and outputs,', 'start': 946.995, 'duration': 7.362}, {'end': 957.978, 'text': 'a number between zeros and one for each number in the cell state.', 'start': 954.357, 'duration': 3.621}, {'end': 960.439, 'text': 'ct minus one, which is coming from the previous timestamp.', 'start': 957.978, 'duration': 2.461}, {'end': 965.28, 'text': 'One represents completely keep this, while a zero represents completely get rid of this.', 'start': 961.179, 'duration': 4.101}, {'end': 971.627, 'text': 'Now if we go back to our example of a language model trying to predict the next word based on all the previous ones.', 'start': 965.902, 'duration': 5.725}, {'end': 977.932, 'text': 'in such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used.', 'start': 971.627, 'duration': 6.305}], 'summary': 'Lstm has a chain-like structure with 4 interacting layers, uses cell state as a conveyor belt, and makes decisions using a sigmoid layer.', 'duration': 107.678, 'max_score': 870.254, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc870254.jpg'}, {'end': 923.118, 'src': 'embed', 'start': 891.383, 'weight': 1, 'content': [{'end': 893.024, 'text': 'Now this is a structure of an LSTM.', 'start': 891.383, 'duration': 1.641}, {'end': 899.169, 'text': 'Or if you notice, LSTM also have a chain-like structure, but the repeating module has different structures.', 'start': 893.564, 'duration': 5.605}, {'end': 903.813, 'text': 'Instead of having a single neural network layer, there are four interacting in a very special way.', 'start': 899.749, 'duration': 4.064}, {'end': 906.275, 'text': 'Now the key to LSTM is the cell state.', 'start': 904.253, 'duration': 2.022}, {'end': 910.638, 'text': "Now this particular line that I'm highlighting, this is what is called the cell state.", 'start': 906.615, 'duration': 4.023}, {'end': 913.18, 'text': 'The horizontal line running through the top of the diagram.', 'start': 911.018, 'duration': 2.162}, {'end': 914.942, 'text': 'So this is nothing but your cell state.', 'start': 913.4, 'duration': 1.542}, {'end': 918.594, 'text': 'Now you can consider the cell state as a kind of a conveyor belt.', 'start': 915.492, 'duration': 3.102}, {'end': 923.118, 'text': 'It runs straight down the entire chain with only some minor linear interactions.', 'start': 918.774, 'duration': 4.344}], 'summary': 'Lstm has a chain-like structure with four interacting neural network layers; cell state acts as a conveyor belt.', 'duration': 31.735, 'max_score': 891.383, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc891383.jpg'}, {'end': 954.357, 'src': 'embed', 'start': 929.142, 'weight': 2, 'content': [{'end': 935.367, 'text': 'All right guys, so the first step in our LSTM is to decide what information we are going to throw away from the cell state.', 'start': 929.142, 'duration': 6.225}, {'end': 938.409, 'text': "And you know what is a cell 
state, right? I've discussed in the previous slide.", 'start': 935.867, 'duration': 2.542}, {'end': 941.531, 'text': 'Now this decision is made by the sigmoid layer.', 'start': 939.03, 'duration': 2.501}, {'end': 944.714, 'text': "So the layer that I'm highlighting with my cursor, it is a sigmoid layer.", 'start': 942.132, 'duration': 2.582}, {'end': 946.514, 'text': 'called the forget gate layer.', 'start': 945.354, 'duration': 1.16}, {'end': 954.357, 'text': 'It looks at ht, minus one, that is the information from the previous timestamp, and xt, which is the new input and outputs,', 'start': 946.995, 'duration': 7.362}], 'summary': 'Lstm uses forget gate to discard information from cell state.', 'duration': 25.215, 'max_score': 929.142, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc929142.jpg'}, {'end': 1044.545, 'src': 'heatmap', 'start': 999.264, 'weight': 0.704, 'content': [{'end': 1007.687, 'text': 'So currently Ft will be nothing but the weight matrix, multiplied by Ht minus one and Xt and plus the bias,', 'start': 999.264, 'duration': 8.423}, {'end': 1010.088, 'text': 'and this equation is passed through a sigmoid layer.', 'start': 1007.687, 'duration': 2.401}, {'end': 1012.828, 'text': 'Alright, and we get an output that is zero and one.', 'start': 1010.528, 'duration': 2.3}, {'end': 1016.81, 'text': 'Zero means completely get rid of this, and one means completely keep this.', 'start': 1012.869, 'duration': 3.941}, {'end': 1019.972, 'text': 'Alright, so this is what basically is happening in the first step.', 'start': 1017.39, 'duration': 2.582}, {'end': 1022.194, 'text': 'Now let us see what happens in the next step.', 'start': 1020.392, 'duration': 1.802}, {'end': 1026.037, 'text': 'So the next step is to decide what information we are going to store.', 'start': 1022.654, 'duration': 3.383}, {'end': 1032.622, 'text': 'In the previous step we decided what information we are going to keep, but here 
we are going to decide what information we are going to store here.', 'start': 1026.457, 'duration': 6.165}, {'end': 1035.843, 'text': 'All right, what new information we are going to store in the cell state.', 'start': 1033.002, 'duration': 2.841}, {'end': 1037.242, 'text': 'Now this has two parts.', 'start': 1036.163, 'duration': 1.079}, {'end': 1044.545, 'text': "First, a sigmoid layer, this is called a sigmoid layer, which is also known as an input gate layer, decide which values we'll update.", 'start': 1037.623, 'duration': 6.922}], 'summary': 'Lstm neural network processes input data using weight matrix and bias through sigmoid layer to determine what to keep and store.', 'duration': 45.281, 'max_score': 999.264, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc999264.jpg'}, {'end': 1054.848, 'src': 'embed', 'start': 1026.457, 'weight': 3, 'content': [{'end': 1032.622, 'text': 'In the previous step we decided what information we are going to keep, but here we are going to decide what information we are going to store here.', 'start': 1026.457, 'duration': 6.165}, {'end': 1035.843, 'text': 'All right, what new information we are going to store in the cell state.', 'start': 1033.002, 'duration': 2.841}, {'end': 1037.242, 'text': 'Now this has two parts.', 'start': 1036.163, 'duration': 1.079}, {'end': 1044.545, 'text': "First, a sigmoid layer, this is called a sigmoid layer, which is also known as an input gate layer, decide which values we'll update.", 'start': 1037.623, 'duration': 6.922}, {'end': 1046.506, 'text': 'All right, so what values we need to update.', 'start': 1044.964, 'duration': 1.542}, {'end': 1054.848, 'text': "And there's also a tanh layer that creates a vector of the candidate values, c bar of t, that will be added to the state later on.", 'start': 1046.886, 'duration': 7.962}], 'summary': 'Deciding what information to store with sigmoid and tanh layers.', 'duration': 
28.391, 'max_score': 1026.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1026457.jpg'}, {'end': 1148.582, 'src': 'embed', 'start': 1118.024, 'weight': 4, 'content': [{'end': 1123.227, 'text': "we'll multiply the old cell state CT minus one with FT that we got in the first step,", 'start': 1118.024, 'duration': 5.203}, {'end': 1126.768, 'text': 'forgetting the things that we decided to forget earlier in the first step, if you can recall.', 'start': 1123.227, 'duration': 3.541}, {'end': 1128.309, 'text': 'Then what we do?', 'start': 1127.268, 'duration': 1.041}, {'end': 1135.294, 'text': 'we add it to it and ct, then we add it by the term that will come after multiplication of it and c, bar,', 'start': 1128.309, 'duration': 6.985}, {'end': 1140.857, 'text': 't and this new candidate value scaled by how much, we decided to update each state value.', 'start': 1135.294, 'duration': 5.563}, {'end': 1141.117, 'text': 'all right?', 'start': 1140.857, 'duration': 0.26}, {'end': 1148.582, 'text': 'So in the case of the language model that we were discussing, this is where we would actually drop the information about the old subject, gender,', 'start': 1141.458, 'duration': 7.124}], 'summary': 'The process involves multiplying old cell state by ft and adding it to ct, then adding the new candidate value scaled by the update factor.', 'duration': 30.558, 'max_score': 1118.024, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1118024.jpg'}, {'end': 1185.832, 'src': 'embed', 'start': 1157.188, 'weight': 5, 'content': [{'end': 1164.295, 'text': 'Now our last step is to decide what we are going to output and this output will depend on a cell state but it will be a filtered version.', 'start': 1157.188, 'duration': 7.107}, {'end': 1170.361, 'text': 'Now finally what we need to do is we need to decide what we are going to output and this output will be based on our 
cell state.', 'start': 1164.776, 'duration': 5.585}, {'end': 1177.007, 'text': 'First we need to pass HT-1 and XT through a sigmoid activation function so that we get an output that is OT.', 'start': 1170.801, 'duration': 6.206}, {'end': 1185.832, 'text': 'All right, and this OT will be in turn multiplied by the cell state after passing it through a tanh squashing function or an activation function.', 'start': 1177.768, 'duration': 8.064}], 'summary': 'Output depends on cell state, involving ht-1, xt, sigmoid activation, and tanh function.', 'duration': 28.644, 'max_score': 1157.188, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1157188.jpg'}, {'end': 1313.923, 'src': 'embed', 'start': 1279.178, 'weight': 6, 'content': [{'end': 1281.061, 'text': 'Alright, let me show you how we are going to do that.', 'start': 1279.178, 'duration': 1.883}, {'end': 1284.206, 'text': 'So this is what we are trying to do in our use case guys.', 'start': 1282.025, 'duration': 2.181}, {'end': 1288.408, 'text': "We'll feed an LSTM with correct sequences from the text of three symbols.", 'start': 1284.606, 'duration': 3.802}, {'end': 1293.591, 'text': 'For example, "had a general" and a label that is "council" in this particular example.', 'start': 1288.788, 'duration': 4.803}, {'end': 1297.593, 'text': 'Eventually our network will learn to predict the next symbol correctly.', 'start': 1294.191, 'duration': 3.402}, {'end': 1299.554, 'text': 'So obviously we need to train it on something.', 'start': 1297.773, 'duration': 1.781}, {'end': 1301.115, 'text': 'Let us see what we are going to train it on.', 'start': 1299.614, 'duration': 1.501}, {'end': 1307.078, 'text': "So we'll be training an LSTM to predict the next word using a sample short story that you can see over here.", 'start': 1301.755, 'duration': 5.323}, {'end': 1313.923, 'text': 'So it has basically 112 unique symbols, so even comma and full stop are considered as symbols.', 
'start': 1307.986, 'duration': 5.937}], 'summary': 'Training ilstm on 112 unique symbols to predict next word in short story.', 'duration': 34.745, 'max_score': 1279.178, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1279178.jpg'}, {'end': 1355.435, 'src': 'heatmap', 'start': 1321.334, 'weight': 0.712, 'content': [{'end': 1328.118, 'text': 'So what we need to do is we need to convert these unique symbols into a unique integer value based on the frequency of occurrence.', 'start': 1321.334, 'duration': 6.784}, {'end': 1329.879, 'text': "And like that we'll create a dictionary.", 'start': 1328.378, 'duration': 1.501}, {'end': 1334.221, 'text': 'For example, we have had here that will have value 20.', 'start': 1329.939, 'duration': 4.282}, {'end': 1338.203, 'text': 'A will have value six, general will have value 33.', 'start': 1334.221, 'duration': 3.982}, {'end': 1341.204, 'text': 'And then what happens, our LSTM will create 112 element vector.', 'start': 1338.203, 'duration': 3.001}, {'end': 1348.83, 'text': 'that will contain the probability of each of these words, or each of these unique integer values.', 'start': 1342.825, 'duration': 6.005}, {'end': 1355.435, 'text': "Alright. so, since 0.6 has the highest probability in this particular vector, it'll pick the index value of 0.6,", 'start': 1349.23, 'duration': 6.205}], 'summary': 'Convert symbols to integers based on frequency for lstm processing.', 'duration': 34.101, 'max_score': 1321.334, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1321334.jpg'}], 'start': 854.585, 'title': 'Understanding lstms in neural networks and language modeling', 'summary': 'Provides an in-depth understanding of lstms in neural networks, detailing their structure, functionality, and applications for language modeling. 
it covers the capability to handle long-term dependencies, the process of deciding output based on cell state, and a use case involving training lstm to predict the next word in a sentence.', 'chapters': [{'end': 1156.548, 'start': 854.585, 'title': 'Understanding lstms in neural networks', 'summary': 'Explains the structure and functionality of lstms in neural networks, highlighting the capability to handle long-term dependencies and detailing the steps involved in processing and updating cell states in a chain-like structure with four interacting neural network layers.', 'duration': 301.963, 'highlights': ['LSTMs are capable of handling long-term dependencies in neural networks. LSTMs are designed to manage and process long-term dependencies within neural networks, providing a solution for effectively capturing and utilizing information from previous timestamps.', 'Structure of LSTMs involves a chain-like structure with four interacting neural network layers. LSTMs feature a distinctive structure comprising four interacting neural network layers, highlighting the complexity and specialized design for processing and updating cell states within the network.', "Explanation of the forget gate layer in LSTMs and its role in deciding what information to discard from the cell state. The forget gate layer plays a crucial role in determining which information from the previous timestamp and new inputs should be discarded from the cell state, enhancing the network's ability to manage and prioritize relevant data.", "Description of the input gate layer in LSTMs and its function in deciding what new information to store in the cell state. 
The input gate layer is responsible for identifying and storing new information in the cell state, providing insights into the process of determining and incorporating valuable data to enhance the network's performance.", 'Detailing the process of updating the cell state in LSTMs by combining the old and new cell states with determined values. The update process of the cell state in LSTMs involves multiplying the old cell state with determined forgetting factors, adding the new cell state, and incorporating new candidate values, showcasing the intricate steps for effectively updating and managing information within the network.']}, {'end': 1361.44, 'start': 1157.188, 'title': 'Understanding lstms for language modeling', 'summary': 'Explains the process of deciding the output based on cell state in lstm, using examples and equations to illustrate its functionality, and later discussing a use case involving training lstm to predict the next word in a sentence, with a focus on converting symbols to integers and creating a probability vector.', 'duration': 204.252, 'highlights': ['The process of deciding the output based on cell state in LSTM involves passing HT-1 and X3 through a sigmoid activation function to obtain an output OT, which is then multiplied by the cell state after passing it through a tanh squashing function to produce the new output H2, which only outputs the decided value.', 'Illustrating the functionality of LSTMs, the process involves deciding what to forget, adding new information to the cell state, combining the information to obtain the new cell state, and finally obtaining the desired output by passing HT-1 and XT through a sigmoid function, multiplying it with the tanh new cell state.', 'A use case involving training LSTM to predict the next word in a sentence requires feeding LSTM with correct sequences from the text of three symbols and converting these unique symbols into a unique integer value based on frequency of occurrence, followed by 
creating a dictionary and a probability vector for each word, eventually enabling the LSTM to predict the next symbol correctly.']}], 'duration': 506.855, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc854585.jpg', 'highlights': ['LSTMs are designed to manage and process long-term dependencies within neural networks, providing a solution for effectively capturing and utilizing information from previous timestamps.', 'LSTMs feature a distinctive structure comprising four interacting neural network layers, highlighting the complexity and specialized design for processing and updating cell states within the network.', "The forget gate layer plays a crucial role in determining which information from the previous timestamp and new inputs should be discarded from the cell state, enhancing the network's ability to manage and prioritize relevant data.", "The input gate layer is responsible for identifying and storing new information in the cell state, providing insights into the process of determining and incorporating valuable data to enhance the network's performance.", 'The update process of the cell state in LSTMs involves multiplying the old cell state with determined forgetting factors, adding the new cell state, and incorporating new candidate values, showcasing the intricate steps for effectively updating and managing information within the network.', 'The process of deciding the output based on cell state in LSTM involves passing HT-1 and X3 through a sigmoid activation function to obtain an output OT, which is then multiplied by the cell state after passing it through a tanh squashing function to produce the new output H2, which only outputs the decided value.', 'A use case involving training LSTM to predict the next word in a sentence requires feeding LSTM with correct sequences from the text of three symbols and converting these unique symbols into a unique integer value based on frequency of occurrence, 
followed by creating a dictionary and a probability vector for each word, eventually enabling the LSTM to predict the next symbol correctly.']}, {'end': 1888.347, 'segs': [{'end': 1401.256, 'src': 'embed', 'start': 1377.363, 'weight': 1, 'content': [{'end': 1383.505, 'text': "We'll be using TensorFlow which is a popular Python library for implementing deep neural networks or neural networks in general.", 'start': 1377.363, 'duration': 6.142}, {'end': 1385.805, 'text': "Alright, so I'll quickly open my PyCharm now.", 'start': 1384.105, 'duration': 1.7}, {'end': 1390.926, 'text': "So guys, this is my PyCharm and over here I've already written the code in order to execute the use case that we have.", 'start': 1386.425, 'duration': 4.501}, {'end': 1393.407, 'text': 'So first we need to do is import libraries.', 'start': 1391.346, 'duration': 2.061}, {'end': 1396.007, 'text': 'NumPy, Ferraris, TensorFlow, we know.', 'start': 1394.087, 'duration': 1.92}, {'end': 1398.628, 'text': 'TensorFlow.contrib, from that we need to import RNN.', 'start': 1396.127, 'duration': 2.501}, {'end': 1401.256, 'text': 'in random collections in time.', 'start': 1399.215, 'duration': 2.041}], 'summary': 'Using tensorflow for implementing neural networks and importing libraries like numpy, ferraris, tensorflow, and tensorflow.contrib in pycharm.', 'duration': 23.893, 'max_score': 1377.363, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1377363.jpg'}, {'end': 1458.637, 'src': 'embed', 'start': 1431.749, 'weight': 2, 'content': [{'end': 1437.43, 'text': "Alright, so then we have defined training underscore file which will have our story on which we'll train our model on.", 'start': 1431.749, 'duration': 5.681}, {'end': 1439.431, 'text': 'Then what we need to do is read this file.', 'start': 1437.81, 'duration': 1.621}, {'end': 1445.172, 'text': 'So how are we gonna do that? 
First is read line by line whatever content that we have in our file.', 'start': 1439.911, 'duration': 5.261}, {'end': 1446.793, 'text': 'Then we are going to strip it.', 'start': 1445.612, 'duration': 1.181}, {'end': 1450.814, 'text': 'That means we are going to remove the first and the last white space.', 'start': 1446.893, 'duration': 3.921}, {'end': 1455.255, 'text': 'Then again we are splitting it just to remove all the white spaces that are there.', 'start': 1451.174, 'duration': 4.081}, {'end': 1458.637, 'text': "After that, we're creating an array, and then we are reshaping it.", 'start': 1455.875, 'duration': 2.762}], 'summary': 'The process involves reading, stripping, and reshaping the training data file to prepare for model training.', 'duration': 26.888, 'max_score': 1431.749, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1431749.jpg'}, {'end': 1580.863, 'src': 'embed', 'start': 1555.058, 'weight': 0, 'content': [{'end': 1561.403, 'text': "So batch size is what? After every 1000 epochs, you'll see the output, all right? 
So it'll be processing it in batches of 1000 iterations.", 'start': 1555.058, 'duration': 6.345}, {'end': 1563.485, 'text': 'Then we have N underscore input S3.', 'start': 1561.903, 'duration': 1.582}, {'end': 1568.349, 'text': 'Now the number of units in the RNN cell will keep it as 512.', 'start': 1563.865, 'duration': 4.484}, {'end': 1570.27, 'text': 'Then we need to define X and Y.', 'start': 1568.349, 'duration': 1.921}, {'end': 1577.056, 'text': 'So X will be our placeholder that will have the input values, and Y will have all the labels, all right, over cap size.', 'start': 1570.27, 'duration': 6.786}, {'end': 1580.863, 'text': "So X is a placeholder where we'll be feeding in our input dictionary.", 'start': 1577.901, 'duration': 2.962}], 'summary': 'Using batch size of 1000 for 512 units in rnn cell and defining input placeholders x and y.', 'duration': 25.805, 'max_score': 1555.058, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1555058.jpg'}, {'end': 1747.106, 'src': 'embed', 'start': 1722.508, 'weight': 5, 'content': [{'end': 1728.333, 'text': 'And this council will be fed back as a part of the new input, and our new input will be a general council.', 'start': 1722.508, 'duration': 5.825}, {'end': 1729.934, 'text': 'So it will be a general council.', 'start': 1728.353, 'duration': 1.581}, {'end': 1734.078, 'text': 'Alright, so these three words will become our new input to predict the new output, which is two.', 'start': 1730.255, 'duration': 3.823}, {'end': 1735.219, 'text': 'Alright, and so on.', 'start': 1734.438, 'duration': 0.781}, {'end': 1741.061, 'text': "So surprisingly, LSTM actually creates a story that somehow makes sense, so let's just read it.", 'start': 1735.697, 'duration': 5.364}, {'end': 1747.106, 'text': 'Had a general counselor to consider what measures they could take to outwit their common enemy, the cat.', 'start': 1741.562, 'duration': 5.544}], 'summary': 'Using lstm, a 
general council is created to predict the new output, resulting in two, and eventually forming a coherent story.', 'duration': 24.598, 'max_score': 1722.508, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1722508.jpg'}, {'end': 1830.989, 'src': 'embed', 'start': 1795.653, 'weight': 4, 'content': [{'end': 1798.195, 'text': 'and both of these issues are due to long term dependencies.', 'start': 1795.653, 'duration': 2.542}, {'end': 1804.82, 'text': 'Now in order to solve those issues because of long term dependencies we use long short term memory networks called LSTMs.', 'start': 1798.595, 'duration': 6.225}, {'end': 1809.944, 'text': 'So we understood what exactly LSTM is and after that we even saw a use case of LSTM.', 'start': 1805.4, 'duration': 4.544}, {'end': 1812.646, 'text': "So guys this is it for today's session.", 'start': 1810.545, 'duration': 2.101}, {'end': 1819.131, 'text': "Now if you're looking for live online instructor led training for artificial intelligence using TensorFlow then you can visit our website.", 'start': 1813.027, 'duration': 6.104}, {'end': 1819.732, 'text': 'Let me show you.', 'start': 1819.151, 'duration': 0.581}, {'end': 1825.877, 'text': 'So this is a website guys where you can enroll at edureka.co slash AI deep learning with TensorFlow.', 'start': 1820.677, 'duration': 5.2}, {'end': 1830.989, 'text': "All right, so it is basically a structured program where we'll be starting with the basics.", 'start': 1826.547, 'duration': 4.442}], 'summary': 'Issues due to long term dependencies are solved using lstms, with a use case presented. 
website offers ai training.', 'duration': 35.336, 'max_score': 1795.653, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1795653.jpg'}], 'start': 1361.5, 'title': 'Implementing rnn and lstm models with tensorflow', 'summary': 'Delves into implementing an rnn model with tensorflow, covering data processing and training. it also explores lstm model training with 50,000 iterations and a batch size of 1000, leading to coherent story generation and addressing long-term dependency issues. additionally, it provides insights into a training program on artificial intelligence using tensorflow.', 'chapters': [{'end': 1487.051, 'start': 1361.5, 'title': 'Implementing rnn model with tensorflow', 'summary': 'Discusses implementing an rnn model using tensorflow, including importing libraries, defining training file, reading and processing data, and feeding in the training data.', 'duration': 125.551, 'highlights': ['The chapter discusses importing libraries including NumPy, Ferraris, TensorFlow, TensorFlow.contrib, RNN, random collections, and time to evaluate the time taken for training.', 'It explains the process of reading and processing the training file, including reading line by line, stripping white spaces, splitting the content, creating an array, and reshaping the data to ensure compatibility.', 'The chapter also covers the feeding of the training data into the RNN model for execution.']}, {'end': 1888.347, 'start': 1487.331, 'title': 'Lstm model training and text generation', 'summary': 'Explains the process of training an lstm model with 50,000 iterations and a batch size of 1000, followed by text generation using the model, resulting in a coherent story. 
it also covers the use of lstm for solving long-term dependency issues and offers information about a training program on artificial intelligence using tensorflow.', 'duration': 401.016, 'highlights': ['The chapter explains the process of training an LSTM model with 50,000 iterations and a batch size of 1000. The process of training the LSTM model is detailed with specific values for iterations and batch size, providing quantifiable data.', 'Text generation using the LSTM model results in a coherent story. The LSTM model successfully generates a coherent story by predicting the next word based on the input, demonstrating the effectiveness of the model.', 'The chapter covers the use of LSTM for solving long-term dependency issues. The use of LSTM is explained as a solution for long-term dependency issues in recurrent neural networks, providing valuable insight into the functionality of LSTM.', 'Information about a training program on artificial intelligence using TensorFlow is provided. Details about a training program on artificial intelligence using TensorFlow, including the curriculum and support features, are presented, offering additional value to the audience.']}], 'duration': 526.847, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/y7qrilE-Zlc/pics/y7qrilE-Zlc1361500.jpg', 'highlights': ['The chapter covers the feeding of the training data into the RNN model for execution.', 'The chapter discusses importing libraries including NumPy, Ferraris, TensorFlow, TensorFlow.contrib, RNN, random collections, and time to evaluate the time taken for training.', 'The chapter explains the process of reading and processing the training file, including reading line by line, stripping white spaces, splitting the content, creating an array, and reshaping the data to ensure compatibility.', 'The chapter explains the process of training an LSTM model with 50,000 iterations and a batch size of 1000.', 'The chapter covers the use of LSTM for solving 
long-term dependency issues.', 'Text generation using the LSTM model results in a coherent story.', 'Information about a training program on artificial intelligence using TensorFlow is provided.']}], 'highlights': ['LSTMs address the issues of vanishing and exploding gradients in recurrent neural networks, providing a solution to the challenges faced.', 'The example of predicting the next word in a sentence demonstrates the necessity of previous output information, showcasing the limitation of feedforward networks.', 'Predicting the current exercise requires considering previous time steps and the impact of missed gym days, which emphasizes the importance of historical data in recurrent neural networks.', 'Introduction to the backpropagation algorithm for RNN training. The chapter introduces the backpropagation algorithm used for training a recurrent neural network (RNN), emphasizing that backpropagation happens at every time step and is commonly called backpropagation through time.', 'Solutions such as truncated BPTT, gradient clipping, ReLU activation, and LSTMs are discussed to address gradient problems in backpropagation.', 'LSTMs are capable of learning long-term dependencies in recurrent neural networks, addressing the issue of needing more context for certain tasks.', 'LSTMs are designed to manage and process long-term dependencies within neural networks, providing a solution for effectively capturing and utilizing information from previous time steps.', 'LSTMs feature a distinctive structure comprising four interacting neural network layers, highlighting the complexity and specialized design for processing and updating cell states within the network.', "The forget gate layer plays a crucial role in determining which information from the previous time step and new inputs should be discarded from the cell state, enhancing the network's ability to manage and prioritize relevant data.", "The input gate layer is responsible for identifying and storing new information 
in the cell state, providing insights into the process of determining and incorporating valuable data to enhance the network's performance.", 'The cell state in LSTMs is updated by multiplying the old cell state by the output of the forget gate and then adding the new candidate values scaled by the output of the input gate, showcasing the steps for updating and managing information within the network.', 'A use case involving training an LSTM to predict the next word in a sentence requires feeding the LSTM sequences of three symbols from the text, with each unique symbol converted into a unique integer value based on frequency of occurrence, followed by creating a dictionary and a probability vector for each word, eventually enabling the LSTM to predict the next symbol correctly.', 'The chapter explains the process of training an LSTM model with 50,000 iterations and a batch size of 1000.', 'Text generation using the LSTM model results in a coherent story.']}