title

Tutorial 34- LSTM Recurrent Neural Network In Depth Intuition

description

Please join my channel as a member to get additional benefits like Data Science materials, live streaming for members, and more
https://www.youtube.com/channel/UCNU_lfiiWBdtULKOw6X0Dig/join
Reference Link: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Please do subscribe to my other channel too
https://www.youtube.com/channel/UCjWY5hREA6FFYrthD0rZNIw
Connect with me here:
Twitter: https://twitter.com/Krishnaik06
Facebook: https://www.facebook.com/krishnaik06
Instagram: https://www.instagram.com/krishnaik06

detail

{'title': 'Tutorial 34- LSTM Recurrent Neural Network In Depth Intuition', 'heatmap': [{'end': 1002.1, 'start': 938.938, 'weight': 0.768}], 'summary': 'This tutorial provides an in-depth understanding of lstm recurrent neural networks, addressing the vanishing gradient problem and exploring memory cells, forget, input, and output gates. it also covers pointwise and concatenation operations, as well as the practical implementation of lstm variants for use cases.', 'chapters': [{'end': 52.829, 'segs': [{'end': 52.829, 'src': 'embed', 'start': 19.936, 'weight': 0, 'content': [{'end': 23.278, 'text': 'we have understood what are the problems of simple recurrent neural network.', 'start': 19.936, 'duration': 3.342}, {'end': 26.15, 'text': "i told you that There's a problem of vanishing gradient.", 'start': 23.278, 'duration': 2.872}, {'end': 35.437, 'text': 'right?. Whenever you have a deep recurrent neural network and suppose your output is dependent on one of the inputs which is there in the initial stages of during the back propagation,', 'start': 26.15, 'duration': 9.287}, {'end': 37.378, 'text': 'there is a problem vanishing gradient problem.', 'start': 35.437, 'duration': 1.941}, {'end': 42.822, 'text': 'that basically means, when the weights are getting updated with the help of back propagation using the chain rule,', 'start': 37.378, 'duration': 5.444}, {'end': 46.585, 'text': 'that then the weight actually becomes a very, very smaller value,', 'start': 42.822, 'duration': 3.763}, {'end': 52.829, 'text': "or the weight updation does not happen because when you're finding the derivative of that specific weight it will be a very small number.", 'start': 46.585, 'duration': 6.244}], 'summary': 'Simple recurrent neural networks face vanishing gradient problem during backpropagation, leading to very small weight updates or no updates.', 'duration': 32.893, 'max_score': 19.936, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk19936.jpg'}], 'start': 1.363, 'title': 'Lstm recurrent neural networks', 'summary': 'Delves into lstm recurrent neural networks, addressing the vanishing gradient problem in deep recurrent neural networks during back propagation and emphasizing potential solutions.', 'chapters': [{'end': 52.829, 'start': 1.363, 'title': 'Lstm recurrent neural networks', 'summary': 'Discusses lstm recurrent neural networks, highlighting the issue of vanishing gradient problem in deep recurrent neural networks during back propagation with a focus on resolving the problem.', 'duration': 51.466, 'highlights': ['The vanishing gradient problem occurs in deep recurrent neural networks during back propagation when the weight becomes a very small value, hindering weight updation.', 'The issue arises when the derivative of a specific weight becomes a very small number, leading to the problem of vanishing gradient in deep recurrent neural networks.']}], 'duration': 51.466, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk1363.jpg', 'highlights': ['The vanishing gradient problem occurs in deep recurrent neural networks during back propagation when the weight becomes a very small value, hindering weight updation.', 'The issue arises when the derivative of a specific weight becomes a very small number, leading to the problem of vanishing gradient in deep recurrent neural networks.']}, {'end': 349.075, 'segs': [{'end': 134.86, 'src': 'embed', 'start': 53.209, 'weight': 0, 'content': [{'end': 57.971, 'text': 'so if you have not seen my vanishing gradient problem, uh, just go and watch my complete deep learning playlist.', 'start': 53.209, 'duration': 4.762}, {'end': 59.772, 'text': 'all the videos have been uploaded now.', 'start': 57.971, 'duration': 1.801}, {'end': 65.975, 'text': 'today, in order to solve this particular problem, we basically use 
something called as lstm, recurrent neural networks and gru.', 'start': 59.772, 'duration': 6.203}, {'end': 69.657, 'text': "today, in this particular video, we'll discuss about lstm recurrent neural network.", 'start': 65.975, 'duration': 3.682}, {'end': 73.019, 'text': 'and remember, guys, please do follow this blog from kohlhaas blog.', 'start': 69.657, 'duration': 3.362}, {'end': 77.061, 'text': "so, kohlhaas blog, i'll completely dedicate this particular video to this particular blog.", 'start': 73.019, 'duration': 4.042}, {'end': 79.215, 'text': 'This is a wonderful explanation.', 'start': 77.634, 'duration': 1.581}, {'end': 83.616, 'text': "And just by reading it, reading the text, you'll be able to understand everything.", 'start': 79.315, 'duration': 4.301}, {'end': 90.719, 'text': 'Okay Now let us go ahead and try to understand how does a recurrent neural network, you know, work.', 'start': 83.977, 'duration': 6.742}, {'end': 96.742, 'text': 'And here you can see that LSTMs are explicitly designed to avoid the long-term dependency problem.', 'start': 91.059, 'duration': 5.683}, {'end': 101.484, 'text': 'So they are actually focused on resolving the vanishing gradient problem itself.', 'start': 97.182, 'duration': 4.302}, {'end': 104.445, 'text': 'So this is how my simple recurrent neural network looks like.', 'start': 101.964, 'duration': 2.481}, {'end': 107.326, 'text': 'And this is basically my LSTM recurrent neural network looks like.', 'start': 104.525, 'duration': 2.801}, {'end': 111.386, 'text': "Don't overcomplicate yourself by seeing so many gates,", 'start': 108.024, 'duration': 3.362}, {'end': 117.07, 'text': "but we'll try to divide this whole architecture into some steps and then we'll try to understand that.", 'start': 111.386, 'duration': 5.684}, {'end': 122.033, 'text': "So over here, the first thing that we need to discuss, we'll divide this into various components.", 'start': 117.47, 'duration': 4.563}, {'end': 126.175, 'text': 'So the 
first component is basically called as memory cell.', 'start': 122.533, 'duration': 3.642}, {'end': 129.797, 'text': "Okay We'll try to understand what exactly is memory cell.", 'start': 126.996, 'duration': 2.801}, {'end': 132.459, 'text': 'Then we have forget gate.', 'start': 129.836, 'duration': 2.623}, {'end': 134.86, 'text': 'Okay Forget gate.', 'start': 133.259, 'duration': 1.601}], 'summary': 'The lstm recurrent neural network resolves the vanishing gradient problem and uses memory cells and forget gates.', 'duration': 81.651, 'max_score': 53.209, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk53209.jpg'}, {'end': 261.546, 'src': 'embed', 'start': 230.944, 'weight': 1, 'content': [{'end': 235.385, 'text': 'So memory cell is basically used to remember and forget things.', 'start': 230.944, 'duration': 4.441}, {'end': 241.926, 'text': 'Okay Memory cell is basically used for remembering and forgetting.', 'start': 236.245, 'duration': 5.681}, {'end': 247.847, 'text': 'Okay How do we remember and forget is based on the context of the input.', 'start': 242.446, 'duration': 5.401}, {'end': 252.068, 'text': 'Okay So based on the context of the input.', 'start': 248.467, 'duration': 3.601}, {'end': 258.425, 'text': "Now suppose, I have a use case where I'm actually trying to generate text.", 'start': 253.083, 'duration': 5.342}, {'end': 261.546, 'text': 'I want to generate text.', 'start': 260.026, 'duration': 1.52}], 'summary': 'Memory cells are used for remembering and forgetting based on input context.', 'duration': 30.602, 'max_score': 230.944, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk230944.jpg'}, {'end': 320.098, 'src': 'embed', 'start': 288.872, 'weight': 4, 'content': [{'end': 291.473, 'text': "now, when i'm talking about this, my recurrent neural network.", 'start': 288.872, 'duration': 2.601}, {'end': 297.974, 'text': 'after coming to 
this particular position, if it wants to generate some more text, it should be definitely referring to my name.', 'start': 291.473, 'duration': 6.501}, {'end': 301.155, 'text': "okay, so basically, i'm talking about myself, right crush over here.", 'start': 297.974, 'duration': 3.181}, {'end': 305.036, 'text': 'so it should be uh, you know, generating the text based on this particular noun.', 'start': 301.155, 'duration': 3.881}, {'end': 307.21, 'text': 'So it should remember the context.', 'start': 305.649, 'duration': 1.561}, {'end': 312.633, 'text': 'If you are changing the context, then what happens is that what do I mean by changing the context?', 'start': 307.41, 'duration': 5.223}, {'end': 315.995, 'text': "The next line I'll write it as my brother name is Wish.", 'start': 312.673, 'duration': 3.322}, {'end': 320.098, 'text': "So my brother name is, I'm going to say Wish.", 'start': 316.396, 'duration': 3.702}], 'summary': 'Discussing context-aware generation in recurrent neural network, referring to name for text generation.', 'duration': 31.226, 'max_score': 288.872, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk288872.jpg'}], 'start': 53.209, 'title': 'Lstm and memory cells', 'summary': 'Delves into lstm recurrent neural network, addressing the vanishing gradient problem, and explores the architecture and functionality of memory cells in neural networks, encompassing memory cells, forget gate, input gate, and output gate for information retention and retrieval.', 'chapters': [{'end': 111.386, 'start': 53.209, 'title': 'Lstm recurrent neural network', 'summary': 'Discusses the lstm recurrent neural network as a solution to the vanishing gradient problem, highlighting its explicit design to avoid long-term dependency and resolve the vanishing gradient problem itself.', 'duration': 58.177, 'highlights': ['LSTMs are explicitly designed to avoid the long-term dependency problem and focus on resolving 
the vanishing gradient problem.', 'Kohlhaas blog is recommended for a comprehensive explanation of the topic.', 'The video covers the use of LSTM, recurrent neural networks, and GRU to solve the vanishing gradient problem.']}, {'end': 349.075, 'start': 111.386, 'title': 'Understanding memory cells in neural networks', 'summary': 'Explains the architecture of memory cells in neural networks, including components like memory cell, forget gate, input gate, and output gate, and their role in remembering and forgetting information based on input context and training data.', 'duration': 237.689, 'highlights': ['The memory cell is used to remember and forget information based on the context of the input, crucial for tasks like text generation.', 'The architecture includes components like memory cell, forget gate, input gate, and output gate, each playing a specific role in handling information.', 'The recurrent neural network should remember previous information and adapt to new context, as exemplified in the scenario of generating text and changing context with new information.']}], 'duration': 295.866, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk53209.jpg', 'highlights': ['LSTMs are explicitly designed to avoid the long-term dependency problem and focus on resolving the vanishing gradient problem.', 'The memory cell is used to remember and forget information based on the context of the input, crucial for tasks like text generation.', 'The architecture includes components like memory cell, forget gate, input gate, and output gate, each playing a specific role in handling information.', 'The video covers the use of LSTM, recurrent neural networks, and GRU to solve the vanishing gradient problem.', 'The recurrent neural network should remember previous information and adapt to new context, as exemplified in the scenario of generating text and changing context with new information.', 'Kohlhaas blog is recommended 
for a comprehensive explanation of the topic.']}, {'end': 606.896, 'segs': [{'end': 418.144, 'src': 'embed', 'start': 388.077, 'weight': 0, 'content': [{'end': 390.878, 'text': "And suppose my input are like, and we'll discuss about this.", 'start': 388.077, 'duration': 2.801}, {'end': 394.859, 'text': 'Okay The input is something like 1 1 1 0 0 1.', 'start': 391.159, 'duration': 3.7}, {'end': 396.701, 'text': 'Okay So total six.', 'start': 394.86, 'duration': 1.841}, {'end': 397.361, 'text': 'Yes Six.', 'start': 396.901, 'duration': 0.46}, {'end': 402.684, 'text': 'Now, when we do this pointwise operation, this basically means that I will multiply this number with this.', 'start': 398.022, 'duration': 4.662}, {'end': 405.045, 'text': 'This is what pointwise specific to location.', 'start': 402.944, 'duration': 2.101}, {'end': 406.986, 'text': 'We will be multiplying the values.', 'start': 405.565, 'duration': 1.421}, {'end': 411.723, 'text': "Okay Then I'll be multiplying this and this, this and this, and I'll be getting this one vector.", 'start': 407.086, 'duration': 4.637}, {'end': 413.904, 'text': 'So if I want to see the vector, it will look like this.', 'start': 411.783, 'duration': 2.121}, {'end': 414.824, 'text': '1, 2, 3.', 'start': 414.464, 'duration': 0.36}, {'end': 418.144, 'text': 'Then when I multiply 4 multiplied by 0, this will become 0, 0.', 'start': 414.824, 'duration': 3.32}], 'summary': 'The input is 1 1 1 0 0 1, resulting in a vector of 1, 2, 3 after pointwise multiplication.', 'duration': 30.067, 'max_score': 388.077, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk388077.jpg'}, {'end': 467.114, 'src': 'embed', 'start': 440.575, 'weight': 3, 'content': [{'end': 445.537, 'text': 'now, suppose, if the rnn, when the context changes, when we are actually providing the input from here,', 'start': 440.575, 'duration': 4.962}, {'end': 451.499, 'text': 'what will happen is that some of the 
information the rnn will forget because the context has changed.', 'start': 445.537, 'duration': 5.962}, {'end': 456.32, 'text': 'if, suppose the context has not changed, okay, if, suppose the context has not changed,', 'start': 451.499, 'duration': 4.821}, {'end': 461.162, 'text': 'this whole vectors will be something like this it will have all the values as one, one, one one.', 'start': 456.32, 'duration': 4.842}, {'end': 463.853, 'text': 'okay, that basically means that context is same.', 'start': 461.162, 'duration': 2.691}, {'end': 467.114, 'text': 'i need not forget any of any of the information.', 'start': 463.853, 'duration': 3.261}], 'summary': 'Rnn forgets some information when context changes; maintains all information if context remains the same.', 'duration': 26.539, 'max_score': 440.575, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk440575.jpg'}, {'end': 568.759, 'src': 'embed', 'start': 545.539, 'weight': 4, 'content': [{'end': 552.601, 'text': "you know, i'll completely dedicate this video to this particular blog and when you read everything, you'll be able to understand it very clearly,", 'start': 545.539, 'duration': 7.062}, {'end': 555.702, 'text': 'because lstm is nothing but long, short term memory.', 'start': 552.601, 'duration': 3.101}, {'end': 560.963, 'text': 'the rnn should be able to remember your previous information, whichever they are actually dependent on.', 'start': 555.702, 'duration': 5.261}, {'end': 568.759, 'text': 'okay, Now, this particular cell is actually called as forget gate.', 'start': 560.963, 'duration': 7.796}], 'summary': 'Dedicated to explaining lstm and rnn, including forget gate.', 'duration': 23.22, 'max_score': 545.539, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk545539.jpg'}], 'start': 349.075, 'title': 'Neural network operations', 'summary': 'Covers pointwise operations in neural networks, 
illustrating with a vector of numbers and their multiplication. it also explains lstm memory cell operations, highlighting the impact of context changes on information retention and addition, and the function of the forget cell in retaining and forgetting information.', 'chapters': [{'end': 440.575, 'start': 349.075, 'title': 'Understanding pointwise operations in neural networks', 'summary': 'Explains the pointwise operation in neural networks, demonstrating a specific example with a vector of numbers and their multiplication, resulting in a detailed understanding of the process.', 'duration': 91.5, 'highlights': ['The pointwise operation in neural networks involves multiplying corresponding elements of two vectors, as demonstrated with the example of multiplying input vectors to obtain a new vector, providing a clear understanding of the process.', 'The specific example illustrates the multiplication of input vectors, such as 1 1 1 0 0 1, resulting in a new vector with the elements 1, 2, 3, 0, 0, 6, showcasing the practical application of the pointwise operation.', 'The explanation emphasizes the impact of the pointwise operation on retaining or discarding information, as seen with the example producing 0, 0, 6 from the input vectors, highlighting the significance of the process in neural network operations.']}, {'end': 606.896, 'start': 440.575, 'title': 'Lstm memory cell operation', 'summary': 'Explains the operation of lstm memory cells, emphasizing the impact of context changes on information retention and addition, and the function of the forget cell and its role in retaining and forgetting information as inputs are provided.', 'duration': 166.321, 'highlights': ['The LSTM memory cell operation is impacted by context changes, causing the cell to forget some information and retain some based on the context, with all values in the vector being one when the context remains unchanged.', 'The forget cell in LSTM is responsible for retaining and forgetting 
information based on the input and previous output, ensuring that relevant information is retained and irrelevant information is forgotten.']}], 'duration': 257.821, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk349075.jpg', 'highlights': ['The explanation emphasizes the impact of the pointwise operation on retaining or discarding information, as seen with the example producing 0, 0, 6 from the input vectors, highlighting the significance of the process in neural network operations.', 'The specific example illustrates the multiplication of input vectors, such as 1 1 1 0 0 1, resulting in a new vector with the elements 1, 2, 3, 0, 0, 6, showcasing the practical application of the pointwise operation.', 'The pointwise operation in neural networks involves multiplying corresponding elements of two vectors, as demonstrated with the example of multiplying input vectors to obtain a new vector, providing a clear understanding of the process.', 'The LSTM memory cell operation is impacted by context changes, causing the cell to forget some information and retain some based on the context, with all values in the vector being one when the context remains unchanged.', 'The forget cell in LSTM is responsible for retaining and forgetting information based on the input and previous output, ensuring that relevant information is retained and irrelevant information is forgotten.']}, {'end': 954.811, 'segs': [{'end': 648.376, 'src': 'embed', 'start': 623.72, 'weight': 2, 'content': [{'end': 635.545, 'text': 'Okay Now this concatenation, this concatenation is basically given by, which by this particular mathematical equation, that is WFF HT minus 1 XT.', 'start': 623.72, 'duration': 11.825}, {'end': 640.047, 'text': "Now what concatenation basically means, I'll just divide this W of F.", 'start': 635.865, 'duration': 4.182}, {'end': 646.715, 'text': 'Remember, in my previous videos of recurrent neural network I told you that 
Whenever we are passing this previous output also,', 'start': 640.047, 'duration': 6.668}, {'end': 648.376, 'text': 'there will be some weights initialized over here.', 'start': 646.715, 'duration': 1.661}], 'summary': 'Concatenation represented by wff ht-1 xt in neural network with initialized weights.', 'duration': 24.656, 'max_score': 623.72, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk623720.jpg'}, {'end': 724.792, 'src': 'embed', 'start': 693.789, 'weight': 1, 'content': [{'end': 696.13, 'text': 'Since we are concatenating, this is W of H.', 'start': 693.789, 'duration': 2.341}, {'end': 698.895, 'text': 'W of H minus 1.', 'start': 697.894, 'duration': 1.001}, {'end': 700.196, 'text': 'Let me write it down like this.', 'start': 698.895, 'duration': 1.301}, {'end': 703.78, 'text': 'So, this W of F is nothing but combining two weights.', 'start': 700.216, 'duration': 3.564}, {'end': 705.522, 'text': 'Okay Concatenating two weights.', 'start': 703.88, 'duration': 1.642}, {'end': 711.187, 'text': 'The next thing is that whenever we concatenate, we are actually concatenating the input and this particular thing.', 'start': 705.582, 'duration': 5.605}, {'end': 715.631, 'text': 'So, we have actually written as HT minus 1 and HT.', 'start': 711.507, 'duration': 4.124}, {'end': 717.173, 'text': 'Okay I can also write it as common.', 'start': 715.952, 'duration': 1.221}, {'end': 720.73, 'text': 'So this two operation basically says this, and this is my bias.', 'start': 717.588, 'duration': 3.142}, {'end': 724.792, 'text': 'And this basically looks like the equation y is equal to mx plus c.', 'start': 721.17, 'duration': 3.622}], 'summary': 'Concatenating two weights results in w of h minus 1, similar to y=mx+c equation.', 'duration': 31.003, 'max_score': 693.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk693789.jpg'}, {'end': 770.298, 'src': 'embed', 
'start': 741.601, 'weight': 0, 'content': [{'end': 743.082, 'text': 'Now, sigmoid activation function.', 'start': 741.601, 'duration': 1.481}, {'end': 745.483, 'text': 'you know that it makes the input.', 'start': 743.082, 'duration': 2.401}, {'end': 749.152, 'text': 'it makes the input or whatever input we are giving.', 'start': 745.483, 'duration': 3.669}, {'end': 752.093, 'text': 'it transforms it between 0 to 1.', 'start': 749.152, 'duration': 2.941}, {'end': 756.314, 'text': 'now, remember, whenever i say this is my input right?', 'start': 752.093, 'duration': 4.221}, {'end': 761.276, 'text': 'suppose my previous input is not similar to this particular input.', 'start': 756.314, 'duration': 4.962}, {'end': 761.756, 'text': 'not similar.', 'start': 761.276, 'duration': 0.48}, {'end': 765.097, 'text': 'basically means there is that not similar.', 'start': 761.756, 'duration': 3.341}, {'end': 768.698, 'text': 'basically means that there is a change in context.', 'start': 765.097, 'duration': 3.601}, {'end': 770.298, 'text': 'change in context.', 'start': 768.698, 'duration': 1.6}], 'summary': 'Sigmoid activation function transforms input to 0 to 1, detecting change in context.', 'duration': 28.697, 'max_score': 741.601, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk741601.jpg'}, {'end': 901.744, 'src': 'embed', 'start': 869.5, 'weight': 3, 'content': [{'end': 873.461, 'text': 'okay, some of the information will still be captured, even though the context has changed.', 'start': 869.5, 'duration': 3.961}, {'end': 876.162, 'text': 'okay, and now it is working like a long, short-term memory.', 'start': 873.461, 'duration': 2.701}, {'end': 885.236, 'text': 'you are able to remember things right and remember that this whole thing is basically called as a cell state, cell state.', 'start': 876.162, 'duration': 9.074}, {'end': 886.697, 'text': 'okay, just try to understand this.', 'start': 885.236, 'duration': 
1.461}, {'end': 891.799, 'text': "i know it looks a little bit complication, but if you understand each and everything, you'll be able to understand this.", 'start': 886.697, 'duration': 5.102}, {'end': 893.2, 'text': 'okay, my first operation is done.', 'start': 891.799, 'duration': 1.401}, {'end': 895.001, 'text': 'this is basically called as my.', 'start': 893.2, 'duration': 1.801}, {'end': 901.744, 'text': 'this whole thing is called as my forget, forget.', 'start': 895.001, 'duration': 6.743}], 'summary': 'The transcript discusses the concept of cell state and memory.', 'duration': 32.244, 'max_score': 869.5, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk869500.jpg'}], 'start': 606.896, 'title': 'Neural network operations', 'summary': "Covers the concatenation operation in neural networks involving the equation wff ht - 1 xt and the combination of weights, and explores the sigmoid activation function's role in transforming inputs to values between 0 and 1, impacting context change and memory retention.", 'chapters': [{'end': 717.173, 'start': 606.896, 'title': 'Concatenation operation in neural networks', 'summary': 'Explains the concatenation operation in neural networks, involving the mathematical equation wff ht - 1 xt and the combination of weights w of t - 1 and w of t during the concatenation process.', 'duration': 110.277, 'highlights': ['The concatenation operation in neural networks is defined by the mathematical equation WFF HT - 1 XT, involving the combination of weights W of T - 1 and W of T.', 'During concatenation, two weights are combined, represented as W of H - 1 and W of H.', 'The process of concatenation involves combining the input with the specific weights, denoted as HT - 1 and HT.']}, {'end': 954.811, 'start': 717.588, 'title': 'Neural network sigmoid activation', 'summary': "Explains the concept of sigmoid activation function in neural networks and its role in transforming 
inputs to values between 0 and 1, highlighting the impact of context change on output vectors and the function's role in memory retention and forgetting.", 'duration': 237.223, 'highlights': ['The sigmoid activation function transforms inputs to values between 0 and 1, impacting the output vectors based on the similarity of the input, with more similar inputs resulting in more ones and less similar inputs leading to more zeros.', 'The function serves as a mechanism for memory retention and forgetting, capturing some information even when the context changes, and is compared to a long short-term memory system.', "The concept of forgetting in the neuron network's operation is explained, where the output vectors change based on the context change, leading to the forgetting of some information while retaining some despite the context change."]}], 'duration': 347.915, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk606896.jpg', 'highlights': ['The sigmoid activation function transforms inputs to values between 0 and 1, impacting the output vectors based on the similarity of the input, with more similar inputs resulting in more ones and less similar inputs leading to more zeros.', 'The process of concatenation involves combining the input with the specific weights, denoted as HT - 1 and HT.', 'The concatenation operation in neural networks is defined by the mathematical equation WFF HT - 1 XT, involving the combination of weights W of T - 1 and W of T.', 'The function serves as a mechanism for memory retention and forgetting, capturing some information even when the context changes, and is compared to a long short-term memory system.']}, {'end': 1405.921, 'segs': [{'end': 1002.1, 'src': 'embed', 'start': 977.837, 'weight': 3, 'content': [{'end': 986.824, 'text': 'Now again, whichever are highly positive, whichever are zeros, that will get converted into zeros because, based on my sigmoid output,', 'start': 977.837, 
'duration': 8.987}, {'end': 988.105, 'text': "I'll be having ones and zeros.", 'start': 986.824, 'duration': 1.281}, {'end': 995.912, 'text': "Again, if my matrix is completely dissimilar, it needs to add some more information when I'm actually doing the pointwise operation with tanh.", 'start': 988.766, 'duration': 7.146}, {'end': 998.459, 'text': "Okay Then finally I'll get the output over here.", 'start': 996.638, 'duration': 1.821}, {'end': 1002.1, 'text': 'Okay And then I will be adding this information to my memory.', 'start': 998.819, 'duration': 3.281}], 'summary': 'Sigmoid output converts highly positive values to ones and zeros, while dissimilar matrices need additional information for pointwise operation with tanh.', 'duration': 24.263, 'max_score': 977.837, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk977837.jpg'}, {'end': 1048.365, 'src': 'embed', 'start': 1022.408, 'weight': 2, 'content': [{'end': 1034.52, 'text': 'Okay, understand, in this step, what we are doing is that we are just adding information, adding information after this particular operation.', 'start': 1022.408, 'duration': 12.112}, {'end': 1043.002, 'text': 'you know so this sigmoid, whichever are the meaningful context it will retrieve after this point.', 'start': 1034.52, 'duration': 8.482}, {'end': 1048.365, 'text': 'wise operation, after doing the tanh function, this will give you all your values between minus one to plus one.', 'start': 1043.002, 'duration': 5.363}], 'summary': 'Adding information after sigmoid and tanh functions for meaningful context retrieval.', 'duration': 25.957, 'max_score': 1022.408, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk1022408.jpg'}, {'end': 1365.503, 'src': 'embed', 'start': 1334.629, 'weight': 0, 'content': [{'end': 1337.751, 'text': 'And the third thing is that you need to understand about the input layer.', 'start': 1334.629, 
'duration': 3.122}, {'end': 1342.093, 'text': 'And the fourth thing is basically you need to understand about the output layer.', 'start': 1339.111, 'duration': 2.982}, {'end': 1345.835, 'text': 'So this was all about LSTM.', 'start': 1343.374, 'duration': 2.461}, {'end': 1349.017, 'text': "My next video, I'll be coming up with practical implementation.", 'start': 1346.235, 'duration': 2.782}, {'end': 1351.438, 'text': 'There is a lot of varieties of LSTM.', 'start': 1349.557, 'duration': 1.881}, {'end': 1353.499, 'text': 'One is called a sequence to sequence.', 'start': 1351.738, 'duration': 1.761}, {'end': 1356.421, 'text': 'One is called as vec to sequence.', 'start': 1354.38, 'duration': 2.041}, {'end': 1359.64, 'text': 'and one is also called as vec2vec.', 'start': 1357.458, 'duration': 2.182}, {'end': 1365.503, 'text': "so we'll try to solve use cases on this and you'll be able to understand a whole lot of things, you know.", 'start': 1359.64, 'duration': 5.863}], 'summary': 'Introduction to lstm with different varieties and upcoming practical implementation.', 'duration': 30.874, 'max_score': 1334.629, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk1334629.jpg'}, {'end': 1405.921, 'src': 'embed', 'start': 1380.593, 'weight': 1, 'content': [{'end': 1384.879, 'text': 'in order to solve that, In order to remove the long-term dependency,', 'start': 1380.593, 'duration': 4.286}, {'end': 1389.404, 'text': "I'm just using LSTM so that it can remember some of the information from the past information right?", 'start': 1384.879, 'duration': 4.525}, {'end': 1397.152, 'text': "So if I don't have any context change, even my last layer output can have all the information from my first layer input itself.", 'start': 1389.764, 'duration': 7.388}, {'end': 1399.755, 'text': 'So this is what is the explanation about LSTM.', 'start': 1397.692, 'duration': 2.063}, {'end': 1401.156, 'text': 'So I hope you like this 
particular video.', 'start': 1399.815, 'duration': 1.341}, {'end': 1402.718, 'text': 'Please do subscribe to the channel.', 'start': 1401.737, 'duration': 0.981}, {'end': 1404.519, 'text': "If you have not already subscribed, I'll see you in the next video.", 'start': 1402.758, 'duration': 1.761}, {'end': 1405.14, 'text': 'Have a great day.', 'start': 1404.539, 'duration': 0.601}, {'end': 1405.921, 'text': 'Thank you and bye-bye.', 'start': 1405.16, 'duration': 0.761}], 'summary': 'Using LSTM to address long-term dependency, ensuring information retention. Subscribe for more content.', 'duration': 25.328, 'max_score': 1380.593, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk1380593.jpg'}], 'start': 954.811, 'title': 'Memory input layer operations and understanding LSTM in neural networks', 'summary': 'Discusses operations in the input layer involving sigmoid and tanh functions for meaningful context and specific value ranges. It also explains LSTM architecture, memory cells, forget gate, input and output layers, and practical implementation of LSTM variants for use cases.', 'chapters': [{'end': 1067.033, 'start': 954.811, 'title': 'Memory input layer operations', 'summary': 'Discusses the operations involved in the input layer, including sigmoid and tanh functions, involving multiplication and addition, to retrieve meaningful context and provide values within specific ranges.', 'duration': 112.222, 'highlights': ['Performing a pointwise operation of multiplication based on the sigmoid output, converting highly positive and zero values into zeros, and adding the resulting information to memory.', 'Discussing the process of adding information after a particular operation, retrieving meaningful context after performing the tanh function, and passing only the meaningful information based on the multiplication operation.']}, {'end': 1405.921, 'start': 1067.433, 'title': 'Understanding LSTM in neural networks',
'summary': 'Explains the architecture and working principle of lstm in neural networks, emphasizing the importance of memory cells, forget gate, input and output layers, and practical implementation of various lstm variants for solving use cases.', 'duration': 338.488, 'highlights': ['LSTM architecture and working principle The chapter provides a detailed explanation of the LSTM architecture, emphasizing the significance of memory cells, forget gate, and input and output layers.', 'Practical implementation of LSTM variants The chapter discusses various LSTM variants such as sequence to sequence, vec to sequence, and vec2vec, highlighting their practical implementation for solving use cases.', 'Importance of memory cells and context change The chapter emphasizes the significance of memory cells and context change in retaining and processing meaningful information, essential for understanding the functioning of LSTM in neural networks.']}], 'duration': 451.11, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rdkIOM78ZPk/pics/rdkIOM78ZPk954811.jpg', 'highlights': ['Practical implementation of LSTM variants such as sequence to sequence, vec to sequence, and vec2vec, highlighting their practical implementation for solving use cases.', 'The chapter emphasizes the significance of memory cells and context change in retaining and processing meaningful information, essential for understanding the functioning of LSTM in neural networks.', 'Discussing the process of adding information after a particular operation, retrieving meaningful context after performing the tanh function, and passing only the meaningful information based on the multiplication operation.', 'Performing a pointwise operation of multiplication based on the sigmoid output, converting highly positive and zero values into zeros, and adding the resulting information to memory.', 'LSTM architecture and working principle The chapter provides a detailed explanation of the LSTM 
architecture, emphasizing the significance of memory cells, forget gate, and input and output layers.']}], 'highlights': ['The LSTM architecture includes components like memory cell, forget gate, input gate, and output gate, each playing a specific role in handling information.', 'The vanishing gradient problem occurs in deep recurrent neural networks during back propagation when the weight becomes a very small value, hindering weight updation.', 'LSTMs are explicitly designed to avoid the long-term dependency problem and focus on resolving the vanishing gradient problem.', 'The explanation emphasizes the impact of the pointwise operation on retaining or discarding information, as seen with the example producing 0, 0, 6 from the input vectors, highlighting the significance of the process in neural network operations.', 'The process of concatenation involves combining the input with the specific weights, denoted as HT - 1 and HT.', 'Practical implementation of LSTM variants such as sequence to sequence, vec to sequence, and vec2vec, highlighting their practical implementation for solving use cases.']}
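The transcript above walks through the LSTM gate operations in words: the forget gate's sigmoid produces values near 0 or 1 that are pointwise-multiplied with the memory cell, the input gate's sigmoid selects which tanh candidate values (between -1 and +1) get added to memory, and the output gate decides what the hidden state exposes. A minimal NumPy sketch of a single LSTM time step, following the gate equations from the referenced Colah blog post — the weight/bias dictionary layout (`W['f']`, `b['f']`, etc.) is an illustrative choice, not code from the video:

```python
import numpy as np

def sigmoid(x):
    # Squashes pre-activations into (0, 1): ~1 keeps information, ~0 forgets it
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W['f'], W['i'], W['c'], W['o'] each map the concatenation of
    h_{t-1} and x_t to one gate's pre-activation; b holds the biases.
    """
    # Concatenation operation: join previous hidden state h_{t-1} with input x_t
    z = np.concatenate([h_prev, x_t])

    f = sigmoid(W['f'] @ z + b['f'])        # forget gate
    i = sigmoid(W['i'] @ z + b['i'])        # input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])  # candidate values in (-1, 1)

    # Pointwise operations on the memory cell: forget old info, add new info
    c_t = f * c_prev + i * c_tilde

    o = sigmoid(W['o'] @ z + b['o'])        # output gate
    h_t = o * np.tanh(c_t)                  # new hidden state / output
    return h_t, c_t

if __name__ == "__main__":
    hidden, inp = 4, 3
    rng = np.random.default_rng(0)
    W = {k: rng.standard_normal((hidden, hidden + inp)) for k in "fico"}
    b = {k: np.zeros(hidden) for k in "fico"}
    h, c = np.zeros(hidden), np.zeros(hidden)
    for _ in range(5):  # feed a short random sequence through the cell
        h, c = lstm_step(rng.standard_normal(inp), h, c, W, b)
    print("h_t:", np.round(h, 3))
```

Because `h_t = o * tanh(c_t)` with `o` in (0, 1), every entry of the hidden state stays strictly inside (-1, 1), while the memory cell `c_t` is an additive running sum — which is exactly why gradients flow through it more easily than through a plain RNN's repeatedly-multiplied hidden state.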