title
Deep Q Learning w/ DQN - Reinforcement Learning p.5
description
Hello and welcome to the first video about Deep Q-Learning and Deep Q Networks, or DQNs. Deep Q Networks are the deep learning/neural network versions of Q-Learning.
With DQNs, instead of a Q table to look up values, you have a model that you run inference with (make predictions from), and rather than updating the Q table, you fit (train) your model.
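A minimal sketch of that difference (the model, sizes, and values here are illustrative, not taken from the tutorial):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Tabular Q-learning: the table is the value function; look up, then assign.
q_table = np.random.uniform(-5, 0, (10, 4))  # 10 discrete states, 4 actions
state = 3
action = np.argmax(q_table[state])  # look up the Q values for a state
q_table[state][action] = 2.0        # update the table in place

# DQN: a model replaces the table; predict instead of look up, fit instead of assign.
model = Sequential([Dense(16, activation='relu', input_shape=(4,)),
                    Dense(4, activation='linear')])
model.compile(loss='mse', optimizer='adam')
observation = np.random.random((1, 4))
qs = model.predict(observation)[0]  # inference: one Q value per action
target_qs = qs.copy()
target_qs[np.argmax(qs)] = 2.0      # the updated Q value we want the model to learn
model.fit(observation, target_qs.reshape(1, 4), verbose=0)  # train toward it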
Text-based tutorial and sample code: https://pythonprogramming.net/deep-q-learning-dqn-reinforcement-learning-python-tutorial/
Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Instagram: https://instagram.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
#reinforcementlearning #machinelearning #python
detail
{'title': 'Deep Q Learning w/ DQN - Reinforcement Learning p.5', 'heatmap': [{'end': 79.023, 'start': 57.123, 'weight': 0.704}, {'end': 118.977, 'start': 97.278, 'weight': 0.716}, {'end': 174.652, 'start': 134.936, 'weight': 0.702}, {'end': 429.928, 'start': 387.609, 'weight': 0.98}, {'end': 1086.272, 'start': 1006.608, 'weight': 0.79}, {'end': 1360.69, 'start': 1298.322, 'weight': 0.783}, {'end': 1477.07, 'start': 1434.392, 'weight': 0.888}, {'end': 1803.662, 'start': 1745.535, 'weight': 0.701}], 'summary': 'Covers deep Q learning basics and neural networks for Q-learning, delves into the benefits and challenges of using neural networks for deep Q learning, discusses creating a convolutional neural network model with specific layers and parameters, and introduces implementing a replay memory of size 50,000 to ensure stability in both the training network and predictions, while emphasizing the importance of batch size for stability and learning in neural network training.', 'chapters': [{'end': 257.603, 'segs': [{'end': 86.23, 'src': 'heatmap', 'start': 57.123, 'weight': 3, 'content': [{'end': 60.366, 'text': "If you've ever looked up reinforcement learning first, you found that all the tutorials suck.", 'start': 57.123, 'duration': 3.243}, {'end': 63.829, 'text': 'and, uh, hopefully not this one.', 'start': 61.327, 'duration': 2.502}, {'end': 75.78, 'text': "and then, um, you've seen the following image. so this is how the deep Q network, or DQN, is going to, uh, learn.", 'start': 63.829, 'duration': 11.951}, {'end': 79.023, 'text': "so what's going to happen is you've got this input.", 'start': 75.78, 'duration': 3.243}, {'end': 80.805, 'text': "in this case it's an image.", 'start': 79.023, 'duration': 1.782}, {'end': 82.326, 'text': "uh, you've got some convolution layers.", 'start': 80.805, 'duration': 1.521}, {'end': 86.23, 'text': "they don't have to be convolution layers. then some fully connected layers, and you don't have to have those either.", 'start': 82.326, 'duration': 3.904}], 'summary': 'Transcript discusses the challenges of learning reinforcement learning and how the deep Q network learns.', 'duration': 42.54, 'max_score': 57.123, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY57123.jpg'},
{'end': 128.632, 'src': 'heatmap', 'start': 97.278, 'weight': 0.716, 'content': [{'end': 99.96, 'text': "And it's going to do so with a linear output.", 'start': 97.278, 'duration': 2.682}, {'end': 103.042, 'text': "So it's a regression model with many outputs, not just one.", 'start': 99.98, 'duration': 3.062}, {'end': 105.764, 'text': 'Now, some people did try to do just one output per action.', 'start': 103.282, 'duration': 2.482}, {'end': 108.446, 'text': "So basically, you'd have a model per possible action.", 'start': 106.064, 'duration': 2.382}, {'end': 110.628, 'text': "But it doesn't really work well.", 'start': 109.607, 'duration': 1.021}, {'end': 112.109, 'text': "And that's super slow.", 'start': 110.668, 'duration': 1.441}, {'end': 114.27, 'text': "That's going to take a really long time to train.", 'start': 112.97, 'duration': 1.3}, {'end': 117.395, 'text': "So, anyways, here's another example.", 'start': 115.311, 'duration': 2.084}, {'end': 118.977, 'text': "It's beautiful.", 'start': 118.436, 'duration': 0.541}, {'end': 120.359, 'text': 'Really beautiful.', 'start': 119.898, 'duration': 0.461}, {'end': 122.322, 'text': 'You got input values.', 'start': 121.341, 'duration': 0.981}, {'end': 123.704, 'text': "So, again, it doesn't have to be an image.", 'start': 122.402, 'duration': 1.302}, {'end': 128.632, 'text': 'This could be delta x, delta y for the food, and delta x, delta y for the enemy, for example.', 'start': 124.065, 'duration': 4.567}], 'summary': 'A regression model with multiple outputs is more efficient than one model per action, which would take much longer to train.', 'duration': 31.354, 'max_score': 97.278, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY97278.jpg'},
{'end': 174.652, 'src': 'heatmap', 'start': 134.936, 'weight': 0.702, 'content': [{'end': 138.057, 'text': "And then here you've got your output again with a linear activation.", 'start': 134.936, 'duration': 3.121}, {'end': 140.278, 'text': "So it's just going to output these scalar values.", 'start': 138.077, 'duration': 2.201}, {'end': 142.679, 'text': 'Well, these are your Q values.', 'start': 140.338, 'duration': 2.341}, {'end': 145.94, 'text': 'They map directly to the various actions you could take.', 'start': 142.739, 'duration': 3.201}, {'end': 150.161, 'text': "So in this case, let's say this was your output.", 'start': 146.94, 'duration': 3.221}, {'end': 151.322, 'text': 'You would take the argmax.', 'start': 150.241, 'duration': 1.081}, {'end': 155.343, 'text': 'Well, the max value here is 9.5.', 'start': 151.442, 'duration': 3.901}, {'end': 161.926, 'text': 'And the values are, you know, basically if we were to map this and get the index values, it would be 0, 1, 2, 3.', 'start': 155.343, 'duration': 6.583}, {'end': 164.307, 'text': 'So argmax would be 1.', 'start': 161.926, 'duration': 2.381}, {'end': 168.889, 'text': 'So whatever action 1 is, that would be the move that we would make.', 'start': 164.307, 'duration': 4.582}, {'end': 174.652, 'text': "Okay, so we've replaced this Q table with a deep neural network.", 'start': 170.089, 'duration': 4.563}], 'summary': 'Using a deep neural network to replace the Q table, yielding Q values and selecting the optimal action.', 'duration': 39.716, 'max_score': 134.936, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY134936.jpg'},
{'end': 202.273, 'src': 'embed', 'start': 174.872, 'weight': 0, 'content': [{'end': 179.034, 'text': 'The benefit here is we can learn way more complex environments.', 'start': 174.872, 'duration': 4.162}, {'end': 184.878, 'text': 'First of all, we can learn more complex environments just because a deep neural network is capable of actually mapping that.', 'start': 179.115, 'duration': 5.763}, {'end': 188.82, 'text': "Also, a deep neural network can take actions that it's never seen before.", 'start': 185.438, 'duration': 3.382}, {'end': 191.382, 'text': 'So with Q-learning,', 'start': 188.961, 'duration': 2.421}, {'end': 199.349, 'text': "if a certain scenario presented itself and it was outside of any of the discrete combinations that we've ever seen, well,", 'start': 191.382, 'duration': 7.967}, {'end': 202.273, 'text': "it's going to take a random action because it got initialized randomly.", 'start': 199.349, 'duration': 2.924}], 'summary': 'Deep neural networks can handle complex environments and take novel actions, enhancing learning capabilities.', 'duration': 27.401, 'max_score': 174.872, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY174872.jpg'},
{'end': 243.574, 'src': 'embed', 'start': 212.126, 'weight': 2, 'content': [{'end': 215.649, 'text': 'So, first of all, a deep neural network is going to do better in that case.', 'start': 212.126, 'duration': 3.523}, {'end': 219.393, 'text': 'so in that way it can solve for way more complex environments.', 'start': 215.649, 'duration': 3.744}, {'end': 228.622, 'text': 'but also, as we saw, as you just even barely increase that discrete size of your Q table,', 'start': 219.393, 'duration': 9.229}, {'end': 235.669, 'text': "the amount of memory that's required to maintain that Q table just explodes, right?", 'start': 228.622, 'duration': 7.047}, {'end': 243.574, 'text': "and that's both in terms of your observation space size, um, or your discrete observation space size, but also in your actions.", 'start': 235.669, 'duration': 7.905}], 'summary': 'A deep neural network performs better in complex environments, with lower memory requirements.', 'duration': 31.448, 'max_score': 212.126, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY212126.jpg'}], 'start': 1.491, 'title': 'Deep Q learning basics and neural networks for Q-learning', 'summary': 'Covers deep Q learning basics, emphasizing the importance of deep learning prerequisites and the structure of deep Q networks, with a recommendation to review the first three tutorials on deep learning basics. Additionally, it discusses the use of a deep neural network to replace the Q-table in Q-learning, enabling it to handle more complex environments and take new actions, thus reducing memory requirements and allowing for more diverse actions.', 'chapters': [{'end': 129.393, 'start': 1.491, 'title': 'Reinforcement learning: deep Q learning basics', 'summary': 'Discusses deep Q learning basics, emphasizing the importance of deep learning prerequisites and the structure of deep Q networks, with a recommendation to review the first three tutorials on deep learning basics before proceeding.', 'duration': 127.902, 'highlights': ['The importance of deep learning prerequisites is emphasized, with a suggestion to complete the first three tutorials on deep learning basics before delving into deep Q learning.', 'The structure of deep Q networks is explained, focusing on the input, convolution layers, fully connected layers, output layer, and its mapping to various actions.', 'The inefficiency of using a model per possible action is highlighted, as it is not effective and would take a long time to train.']}, {'end': 257.603, 'start': 129.473, 'title': 'Neural networks for Q-learning', 'summary': 'Discusses the use of a deep neural network to replace the Q-table in Q-learning, enabling it to handle more complex environments and take actions it has never seen before, thus reducing memory requirements and allowing for more diverse actions.', 'duration': 128.13, 'highlights': ['A deep neural network can learn more complex environments and take actions it has never seen before, recognizing and acting on scenarios it has never encountered.', 'Using a deep neural network removes the need to maintain an exploding Q-table, drastically reducing memory requirements.', 'The deep neural network allows for more diverse actions, such as introducing diagonal moves alongside the cardinal moves.']}], 'duration': 256.112, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1491.jpg', 'highlights': ['The deep neural network allows for more diverse actions, such as introducing diagonal moves alongside the cardinal moves.', 'A deep neural network can learn more complex environments and take actions it has never seen before.', 'Using a deep neural network removes the need to maintain a huge Q-table, reducing memory requirements.', 'The structure of deep Q networks is explained, focusing on the input, convolution layers, fully connected layers, output layer, and its mapping to various actions.', 'The importance of deep learning prerequisites is emphasized, with a suggestion to complete the first three tutorials on deep learning basics before delving into deep Q learning.']},
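For example, the argmax step described above, with made-up Q values mirroring the 9.5 example from the transcript:

import numpy as np

q_values = np.array([4.2, 9.5, 1.1, 3.0])  # one scalar Q value per action, from the linear output layer
action = np.argmax(q_values)               # index 1 here, so whatever action 1 is, that's the move we make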
{'end': 715.582, 'segs': [{'end': 289.434, 'src': 'embed', 'start': 257.603, 'weight': 0, 'content': [{'end': 259.084, 'text': "as well as don't move.", 'start': 257.603, 'duration': 1.481}, {'end': 261.887, 'text': "so, um, we're gonna introduce all of those as well.", 'start': 259.084, 'duration': 2.803}, {'end': 267.953, 'text': 'Anyway, so for those two reasons, neural networks are way better.', 'start': 263.449, 'duration': 4.504}, {'end': 270.716, 'text': 'The downside is neural networks are kind of finicky.', 'start': 268.534, 'duration': 2.182}, {'end': 274.92, 'text': "So we're going to have to handle for a lot of things that are finicky about neural networks.", 'start': 270.916, 'duration': 4.004}, {'end': 277.723, 'text': "Also, it's going to take a lot longer to train.", 'start': 275.621, 'duration': 2.102}, {'end': 289.434, 'text': 'So on an identical model or an identical environment, like in the blob env, where it took our Q table minutes to fully populate, basically.', 'start': 277.783, 'duration': 11.651}], 'summary': 'Neural networks are better but finicky, and take longer to train.', 'duration': 31.831, 'max_score': 257.603, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY257603.jpg'},
{'end': 346.027, 'src': 'embed', 'start': 302.285, 'weight': 1, 'content': [{'end': 306.909, 'text': "But the benefit is where it takes, I'm sorry, it's going to take hours for deep Q learning.", 'start': 302.285, 'duration': 4.624}, {'end': 317.534, 'text': 'But the benefit here is for certain environments where it takes, yes, a long time, like weeks, for deep Q learning to learn an environment,', 'start': 308.169, 'duration': 9.365}, {'end': 324.297, 'text': 'it would require petabytes of memory for a Q table to figure it out.', 'start': 317.534, 'duration': 6.763}, {'end': 326.218, 'text': "So there's your difference.", 'start': 324.777, 'duration': 1.441}, {'end': 329.659, 'text': "So really, they're going to solve different types of environments.", 'start': 326.898, 'duration': 2.761}, {'end': 332.36, 'text': 'And honestly, Q learning is pretty much useless.', 'start': 329.719, 'duration': 2.641}, {'end': 335.502, 'text': 'You can use it for cool novel little niche things for sure.', 'start': 332.44, 'duration': 3.062}, {'end': 338.603, 'text': "but you're not gonna find too much use for it in the real world,", 'start': 336.322, 'duration': 2.281}, {'end': 339.244, 'text': "I don't think.", 'start': 338.603, 'duration': 0.641}, {'end': 343.746, 'text': 'whereas deep Q learning, you can actually start to apply it to really cool things.', 'start': 339.244, 'duration': 4.502}, {'end': 346.027, 'text': 'so anyways, enough jibber-jabber.', 'start': 343.746, 'duration': 2.281}], 'summary': 'Deep Q learning takes longer to train than Q learning, but it can handle real-world environments that would require impractical amounts of memory for a Q table.', 'duration': 43.742, 'max_score': 302.285, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY302285.jpg'},
{'end': 429.928, 'src': 'heatmap', 'start': 387.609, 'weight': 0.98, 'content': [{'end': 391.271, 'text': 'so anyways, in this case we still want to retain this.', 'start': 387.609, 'duration': 3.662}, {'end': 392.592, 'text': 'So we are still going to use this.', 'start': 391.311, 'duration': 1.281}, {'end': 397.814, 'text': "And basically, the way we're going to do this is every step this agent takes, we still need to update that Q value.", 'start': 392.652, 'duration': 5.162}, {'end': 401.156, 'text': "So what we're going to do is we query for the Q value.", 'start': 398.554, 'duration': 2.602}, {'end': 403.677, 'text': 'We take that action or a random one, depending on epsilon.', 'start': 401.236, 'duration': 2.441}, {'end': 417.683, 'text': 'Then we resample the environment, figure out what the next reward would be, and then we can calculate this new Q value and then do a fit operation.', 'start': 405.037, 'duration': 12.646}, {'end': 422.165, 'text': "So people who are familiar with deep neural networks are already like, wow, that's a lot of fits.", 'start': 417.763, 'duration': 4.402}, {'end': 423.205, 'text': 'Yep, sure is.', 'start': 422.405, 'duration': 0.8}, {'end': 426.027, 'text': "Also, that's one fit at a time.", 'start': 424.186, 'duration': 1.841}, {'end': 429.928, 'text': "So, as you're going to see, when we go to write this code, we actually have to handle for that as well.", 'start': 426.087, 'duration': 3.841}], 'summary': 'Updating Q values for each agent step in the environment involves multiple fit operations.', 'duration': 42.319, 'max_score': 387.609, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY387609.jpg'},
{'end': 429.928, 'src': 'embed', 'start': 405.037, 'weight': 3, 'content': [{'end': 417.683, 'text': 'Then we resample the environment, figure out what the next reward would be, and then we can calculate this new Q value and then do a fit operation.', 'start': 405.037, 'duration': 12.646}, {'end': 422.165, 'text': "So people who are familiar with deep neural networks are already like, wow, that's a lot of fits.", 'start': 417.763, 'duration': 4.402}, {'end': 423.205, 'text': 'Yep, sure is.', 'start': 422.405, 'duration': 0.8}, {'end': 426.027, 'text': "Also, that's one fit at a time.", 'start': 424.186, 'duration': 1.841}, {'end': 429.928, 'text': "So, as you're going to see, when we go to write this code, we actually have to handle for that as well.", 'start': 426.087, 'duration': 3.841}], 'summary': 'Resampling the environment to calculate the new Q value for deep neural networks.', 'duration': 24.891, 'max_score': 405.037, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY405037.jpg'},
{'end': 474.208, 'src': 'embed', 'start': 448.331, 'weight': 4, 'content': [{'end': 453.334, 'text': "like I said, all the tutorials I've ever seen on deep Q learning have been terrible.", 'start': 448.331, 'duration': 5.003}, {'end': 456.016, 'text': "And there's so much information that's left out.", 'start': 453.995, 'duration': 2.021}, {'end': 461.099, 'text': 'To get the overarching concept, honestly, this picture is enough.', 'start': 457.337, 'duration': 3.762}, {'end': 468.804, 'text': "But then, when it comes time to actually look at code and how it really will work, can you sit down and code it after you read someone's tutorial?", 'start': 462.12, 'duration': 6.684}, {'end': 470.986, 'text': 'My hope is you really can after this one.', 'start': 469.144, 'duration': 1.842}, {'end': 474.208, 'text': "But otherwise, I don't think a tutorial exists for doing it.", 'start': 471.606, 'duration': 2.602}], 'summary': 'Deep Q learning tutorials lack necessary information, making it challenging to code after reading them.', 'duration': 25.877, 'max_score': 448.331, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY448331.jpg'},
{'end': 602.071, 'src': 'embed', 'start': 536.391, 'weight': 5, 'content': [{'end': 541.954, 'text': "Someone complained recently about my drinking on camera, saying they don't want to hear me gulp.", 'start': 536.391, 'duration': 5.563}, {'end': 549.458, 'text': "I'm drinking because my mouth is dry, and the most annoying thing ever is listening to someone talk with a dry mouth.", 'start': 544.195, 'duration': 5.263}, {'end': 556.426, 'text': "So you're welcome, jerk. So anyway, create model.", 'start': 550.923, 'duration': 5.503}, {'end': 564.69, 'text': "Okay, so we'll start with model equals a Sequential model, and now let's go ahead and make some imports.", 'start': 556.726, 'duration': 7.964}, {'end': 569.732, 'text': 'So the first thing that I need to bring in is from keras dot models.', 'start': 564.69, 'duration': 5.042}, {'end': 588.387, 'text': "Let's import Sequential, and uh, then we're going to say from keras dot layers let's import Dense, Dropout, Conv2D, MaxPooling2D.", 'start': 569.772, 'duration': 18.615}, {'end': 593.87, 'text': 'um, Activation, Flatten.', 'start': 588.387, 'duration': 5.483}, {'end': 594.911, 'text': "i think that's everything.", 'start': 593.87, 'duration': 1.041}, {'end': 597.173, 'text': 'sorry, that ran over my face, my bad.', 'start': 594.911, 'duration': 2.262}, {'end': 599.46, 'text': 'Anyway, there we go.', 'start': 598.577, 'duration': 0.883}, {'end': 602.071, 'text': 'Dense, Dropout, Conv2D.', 'start': 599.621, 'duration': 2.45}], 'summary': 'Addressing a complaint about drinking on camera while discussing model creation with imports and layers.', 'duration': 65.68, 'max_score': 536.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY536391.jpg'},
{'end': 661.502, 'src': 'embed', 'start': 629.608, 'weight': 7, 'content': [{'end': 633.03, 'text': 'um okay, so we got that stuff.', 'start': 629.608, 'duration': 3.422}, {'end': 638.752, 'text': "um, the other thing I'll go ahead and import too is from keras dot callbacks.", 'start': 633.03, 'duration': 5.722}, {'end': 641.773, 'text': "Let's import TensorBoard.", 'start': 638.792, 'duration': 2.981}, {'end': 648.176, 'text': 'We need other things, but I want to cover them when we get there.', 'start': 645.355, 'duration': 2.821}, {'end': 651.578, 'text': 'So model is equal to a Sequential model.', 'start': 648.236, 'duration': 3.342}, {'end': 660.522, 'text': "Then we're going to say model dot add, and then we're going to start adding Conv2D.", 'start': 651.918, 'duration': 8.604}, {'end': 661.502, 'text': 'Let me fix that as well.', 'start': 660.522, 'duration': 0.98}], 'summary': 'Imported modules and configured a Sequential model for adding convolutions.', 'duration': 31.894, 'max_score': 629.608, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY629608.jpg'}], 'start': 257.603, 'title': 'Deep Q learning and complaint resolution', 'summary': 'Delves into the benefits and challenges of using neural networks for deep Q learning, emphasizing practical applicability and the process of updating Q values. It also discusses addressing a complaint about drinking on camera and provides a Keras tutorial for creating a sequential model.', 'chapters': [{'end': 533.946, 'start': 257.603, 'title': 'Deep Q learning overview', 'summary': 'Explains the benefits and challenges of using neural networks for deep Q learning, including longer training times and memory requirements, as well as the limitations of Q learning. It also emphasizes the practical applicability of deep Q learning and the need for comprehensive tutorials. The process of updating Q values using neural networks and the challenges in implementing it are also discussed.', 'duration': 276.343, 'highlights': ['Neural networks are way better for certain environments, but they are finicky and take longer to train compared to Q learning.', 'Deep Q learning can take hours to learn an environment that Q learning could learn in minutes, but it is beneficial for environments where Q learning would require petabytes of memory.', 'Q learning is limited in real-world applications, while deep Q learning has practical applicability to various scenarios.', 'The process of updating Q values using neural networks involves querying for the Q value, taking an action, resampling the environment, calculating the new Q value, and performing a fit operation.', 'Comprehensive tutorials for deep Q learning are lacking, and the chapter aims to provide practical guidance for coding and implementation.']}, {'end': 715.582, 'start': 536.391, 'title': 'Dealing with complaints', 'summary': 'Discusses addressing a complaint about drinking on camera, while providing a tutorial on creating a sequential model with specific imports and callbacks in Keras.', 'duration': 179.191, 'highlights': ['Addressing a complaint about drinking on camera due to dry mouth while providing a tutorial on creating a sequential model in Keras.', "Providing specific imports for the Keras model, including 'Dense', 'Dropout', 'Conv2D', 'MaxPooling2D', 'Activation', and 'Flatten'.", "Mentioning the need for importing 'TensorBoard' from Keras callbacks and discussing the addition of convolutional layers to the model with specific parameters."]}], 'duration': 457.979, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY257603.jpg', 'highlights': ['Neural networks are more effective for specific environments, despite being finicky and requiring longer training times compared to Q learning.', 'Deep Q learning may take significantly longer to learn certain environments compared to Q learning, but it is advantageous in scenarios where Q learning would need an impractical amount of memory.', 'Q learning is of limited use in real-world applications, whereas deep Q learning has practical applicability in a wide range of scenarios.', 'The process of updating Q values using neural networks includes querying for the Q value, taking an action, resampling the environment, calculating the new Q value, and performing a fit operation.', 'The chapter acknowledges the lack of comprehensive tutorials for deep Q learning and aims to provide practical guidance for coding and implementation.', 'Addressing a complaint about drinking on camera due to dry mouth while providing a tutorial on creating a sequential model in Keras.', "Providing specific imports for the Keras model, including 'Dense', 'Dropout', 'Conv2D', 'MaxPooling2D', 'Activation', and 'Flatten'.", "Mentioning the need for importing 'TensorBoard' from Keras callbacks and discussing the addition of convolutional layers to the model with specific parameters."]},
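A sketch of the per-step update loop that overview describes. Here env (with a step() returning new_state, reward, done), model, epsilon, n_actions, and state are all assumed placeholders for illustration, and DISCOUNT is an assumed discount factor:

import numpy as np

DISCOUNT = 0.99  # assumed discount factor

# Query for the Q values and take that action, or a random one, depending on epsilon.
if np.random.random() > epsilon:
    action = np.argmax(model.predict(np.array([state]))[0])
else:
    action = np.random.randint(0, n_actions)

# Resample the environment, then calculate the new Q value and do a fit operation.
new_state, reward, done = env.step(action)
max_future_q = np.max(model.predict(np.array([new_state]))[0])
current_qs = model.predict(np.array([state]))[0]
current_qs[action] = reward if done else reward + DISCOUNT * max_future_q
model.fit(np.array([state]), current_qs.reshape(1, -1), verbose=0)  # one fit per step, as noted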
{'end': 1155.789, 'segs': [{'end': 774.468, 'src': 'embed', 'start': 747.374, 'weight': 0, 'content': [{'end': 753.417, 'text': "And again, if you don't know what max pooling is or convolutions, check out that basics tutorial because I cover it.", 'start': 747.374, 'duration': 6.043}, {'end': 755.918, 'text': 'And I also have beautifully drawn photos.', 'start': 753.437, 'duration': 2.481}, {'end': 759.8, 'text': "If you really like my other photo, you'll love those photos.", 'start': 756.358, 'duration': 3.442}, {'end': 768.084, 'text': "Then, after the max pooling, we're going to say model.add, and we're going to add a dropout layer, and we'll drop out 20%.", 'start': 761.681, 'duration': 6.403}, {'end': 771.046, 'text': "And then we're just going to do the same thing again.", 'start': 768.084, 'duration': 2.962}, {'end': 774.468, 'text': 'So this will be 2x256, so copy pasta.', 'start': 771.126, 'duration': 3.342}], 'summary': 'Tutorial covers max pooling, convolutions, 20% dropout, and a 2x256 model.', 'duration': 27.094, 'max_score': 747.374, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY747374.jpg'},
{'end': 853.475, 'src': 'embed', 'start': 809.074, 'weight': 1, 'content': [{'end': 815.016, 'text': "And it'll be env dot action space size.", 'start': 809.074, 'duration': 5.942}, {'end': 818.457, 'text': 'and then the activation.', 'start': 815.016, 'duration': 3.441}, {'end': 828.881, 'text': "activation will be linear, and then model dot compile, and we'll say loss is MSE,", 'start': 818.457, 'duration': 10.424}, {'end': 843.223, 'text': 'for mean squared error. optimizer will be the Adam optimizer with a learning rate of 0.001.', 'start': 828.881, 'duration': 14.342}, {'end': 845.185, 'text': 'And then metrics.', 'start': 843.223, 'duration': 1.962}, {'end': 847.568, 'text': 'we will track accuracy.', 'start': 845.185, 'duration': 2.383}, {'end': 853.475, 'text': 'Okay, so that is our model.', 'start': 851.192, 'duration': 2.283}], 'summary': 'Model with linear activation, MSE loss, and the Adam optimizer at a 0.001 learning rate, tracking accuracy.', 'duration': 44.401, 'max_score': 809.074, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY809074.jpg'},
{'end': 1086.272, 'src': 'heatmap', 'start': 1006.608, 'weight': 0.79, 'content': [{'end': 1012.673, 'text': "there's a few reasons, but mostly it's because this model is going to be going crazy.", 'start': 1006.608, 'duration': 6.065}, {'end': 1018.839, 'text': 'So, first of all, the model itself is initialized randomly, as all neural networks are,', 'start': 1012.773, 'duration': 6.066}, {'end': 1022.042, 'text': 'but we also are going to initialize with an epsilon, likely of 1.', 'start': 1018.839, 'duration': 3.203}, {'end': 1028.732, 'text': 'So the agent is also going to be taking random, meaningless actions.', 'start': 1023.443, 'duration': 5.289}, {'end': 1035.383, 'text': 'So initially, this model is going to be trying to fit to a whole bunch of randomness.', 'start': 1029.773, 'duration': 5.61}, {'end': 1038.836, 'text': "And that's going to be useless.", 'start': 1036.415, 'duration': 2.421}, {'end': 1046.117, 'text': "But eventually, as it's explored, as it's gotten rewards, it's going to start to hopefully figure something out.", 'start': 1039.476, 'duration': 6.641}, {'end': 1052.12, 'text': "But the problem is we're doing a .predict for every single step this agent takes.", 'start': 1046.938, 'duration': 5.182}, {'end': 1062.724, 'text': "And what we want to have is some sort of consistency in those .predicts that we're doing, because besides doing a", 'start': 1052.82, 'duration': 9.904}, {'end': 1065.409, 'text': ".predict every single step, we're also doing a", 'start': 1062.724, 'duration': 2.685}, {'end': 1067.472, 'text': '.fit every single step.', 'start': 1065.409, 'duration': 2.063}, {'end': 1076.225, 'text': "So this model, especially initially, is just going to be all over the place as it's attempting to figure things out randomly.", 'start': 1067.992, 'duration': 8.233}, {'end': 1081.289, 'text': "So the way we're going to compensate for that is we're going to have two models.", 'start': 1077.126, 'duration': 4.163}, {'end': 1082.61, 'text': "So we've got self dot model.", 'start': 1081.329, 'duration': 1.281}, {'end': 1086.272, 'text': "This is the model that we're .fitting every step.", 'start': 1083.05, 'duration': 3.222}], 'summary': 'The model is initialized randomly and epsilon starts high, which leads to inconsistency, so two models will be used to compensate.', 'duration': 79.664, 'max_score': 1006.608, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1006608.jpg'},
{'end': 1052.12, 'src': 'embed', 'start': 1023.443, 'weight': 3, 'content': [{'end': 1028.732, 'text': 'So the agent is also going to be taking random, meaningless actions.', 'start': 1023.443, 'duration': 5.289}, {'end': 1035.383, 'text': 'So initially, this model is going to be trying to fit to a whole bunch of randomness.', 'start': 1029.773, 'duration': 5.61}, {'end': 1038.836, 'text': "And that's going to be useless.", 'start': 1036.415, 'duration': 2.421}, {'end': 1046.117, 'text': "But eventually, as it's explored, as it's gotten rewards, it's going to start to hopefully figure something out.", 'start': 1039.476, 'duration': 6.641}, {'end': 1052.12, 'text': "But the problem is we're doing a .predict for every single step this agent takes.", 'start': 1046.938, 'duration': 5.182}], 'summary': 'Agent takes random actions initially, eventually learning from rewards.', 'duration': 28.677, 'max_score': 1023.443, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1023443.jpg'},
{'end': 1155.789, 'src': 'embed', 'start': 1101.042, 'weight': 2, 'content': [{'end': 1106.19, 'text': 'and then this is the one that gets, you know, gets trained every step.', 'start': 1101.042, 'duration': 5.148}, {'end': 1115.883, 'text': "make note of that, because you'll probably forget. So then what happens is, after every some number of steps or episodes or whatever,", 'start': 1106.19, 'duration': 9.693}, {'end': 1119.346, 'text': "you'll re-update your target model.", 'start': 1116.664, 'duration': 2.682}, {'end': 1123.73, 'text': "So you'll just set the weights to be equal to the model again.", 'start': 1119.406, 'duration': 4.324}, {'end': 1129.715, 'text': 'So this just keeps a little bit of sanity in your predictions, right?', 'start': 1124.591, 'duration': 5.124}, {'end': 1136.361, 'text': "So, in the predictions that you're making, this is how you'll have a little bit of stability and consistency,", 'start': 1129.775, 'duration': 6.586}, {'end': 1142.285, 'text': "so your model can actually learn something, because there's just so much randomness.", 'start': 1136.441, 'duration': 5.844}, {'end': 1144.007, 'text': "It's going to be very challenging initially,", 'start': 1142.346, 'duration': 1.661}, {'end': 1148.044, 'text': "so long as you're doing things randomly.", 'start': 1146.283, 'duration': 1.761}, {'end': 1155.789, 'text': "Okay, so that's one way we are handling for the chaos that is about to ensue.", 'start': 1149.665, 'duration': 6.124}], 'summary': 'Update the target model every few steps for stability and consistency in predictions.', 'duration': 54.747, 'max_score': 1101.042, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1101042.jpg'}], 'start': 718.217, 'title': 'Building neural network models', 'summary': 'Discusses creating a convolutional neural network model with specific layers and parameters, including activation, max pooling, dropout, and dense layers. It covers model compilation with mean squared error loss, the Adam optimizer, and tracking accuracy metrics. Additionally, it explores the initialization of neural network models to ensure stability in predictions.', 'chapters': [{'end': 853.475, 'start': 718.217, 'title': 'Creating a convolutional neural network', 'summary': 'Discusses building a convolutional neural network (CNN) model architecture with specific layers and parameters, including convolution, activation, max pooling, dropout, dense layers, and model compilation, utilizing rectified linear activation, max pooling with a 2x2 window, 20% dropout, dense layers of 2x256 and 64 units, and a final layer with linear activation. The model is compiled with mean squared error loss, the Adam optimizer with a learning rate of 0.001, and tracking accuracy metrics.', 'duration': 135.258, 'highlights': ['The model architecture includes convolution, activation, max pooling, dropout, and dense layers, with specific parameters such as rectified linear activation, a 2x2 window for max pooling, and a 20% dropout rate, followed by dense layers of 2x256 and 64 units, and a final layer with linear activation.', 'The model compilation involves setting the loss function to mean squared error, using the Adam optimizer with a learning rate of 0.001, and tracking accuracy metrics.']}, {'end': 1155.789, 'start': 853.695, 'title': 'Neural network model initialization', 'summary': 'Covers the initialization of a neural network model, including the use of two models to handle randomness and ensure stability in predictions, with the aim of enabling the model to learn effectively in the face of chaos and randomness.', 'duration': 302.094, 'highlights': ['The chapter explains the use of two models, self.model and self.targetModel, to handle the randomness and instability in predictions, ensuring consistency and stability in the learning process.', 'The model is initialized randomly with an epsilon-greedy policy for the agent, causing it to take random actions initially, which can be useless, but the model aims to learn through exploration and rewards.', 'The target model is re-updated after a certain number of steps or episodes to maintain stability and consistency in predictions, enabling the model to learn effectively amidst the initial chaos and randomness.']}], 'duration': 437.572, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY718217.jpg', 'highlights': ['The model architecture includes convolution, activation, max pooling, dropout, and dense layers with specific parameters.', 'The model compilation involves setting the loss function to mean squared error, using the Adam optimizer with a learning rate of 0.001, and tracking accuracy metrics.', 'The chapter covers the use of two models, self.model and self.targetModel, to handle the randomness and instability in predictions, ensuring consistency and stability in the learning process.', 'The model is initialized randomly with an epsilon-greedy policy for the agent, causing it to take random actions initially, which can be useless, but the model aims to learn through exploration and rewards.', 'The target model is re-updated after a certain number of steps or episodes to maintain stability and consistency in predictions, enabling the model to learn effectively amidst the initial chaos and randomness.']},
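A sketch of the model those chapters describe, reconstructed from the narration. The input shape, kernel size, and action count are assumptions for illustration; the 2x256 conv blocks, 2x2 pooling, 20% dropout, 64-unit dense layer, linear output, MSE loss, and Adam at 0.001 follow the transcript (older Keras spells the learning-rate argument lr; newer versions use learning_rate):

from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Activation, Flatten
from keras.optimizers import Adam

def create_model(input_shape=(10, 10, 3), n_actions=9):  # assumed env shapes
    model = Sequential()
    # First 256-filter conv block: conv -> relu -> 2x2 max pool -> 20% dropout
    model.add(Conv2D(256, (3, 3), input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    # Same thing again, giving the 2x256 convnet
    model.add(Conv2D(256, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Dense(n_actions, activation='linear'))  # one linear Q value per action
    model.compile(loss='mse', optimizer=Adam(lr=0.001), metrics=['accuracy'])
    return model

# Two models: one is .fit every step, the other stays stable for .predict
# and gets its weights re-synced every so many steps or episodes.
model = create_model()
target_model = create_model()
target_model.set_weights(model.get_weights())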
{'end': 1382.547, 'segs': [{'end': 1186.435, 'src': 'embed', 'start': 1156.79, 'weight': 0, 'content': [{'end': 1164.074, 'text': "Next, we're going to have self.replay_memory, and that is going to be a deque, 'deck' or 'dee-queue'.", 'start': 1156.79, 'duration': 7.284}, {'end': 1165.815, 'text': 'I always forget how to pronounce that.', 'start': 1164.154, 'duration': 1.661}, {'end': 1172.78, 'text': "To use that, we're going to say from collections import deque.", 'start': 1166.416, 'duration': 6.364}, {'end': 1180.33, 'text': "and if you don't know what a deque is, it is a set-length container.", 'start': 1175.767, 'duration': 4.563}, {'end': 1183.292, 'text': 'think of it as like an array or a list where you can say,', 'start': 1180.33, 'duration': 2.962}, {'end': 1186.435, 'text': 'I want this list to be a max size.', 'start': 1183.292, 'duration': 3.143}], 'summary': 'Introduction to using deque from collections for creating a set-length list.', 'duration': 29.645, 'max_score': 1156.79, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1156790.jpg'},
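A quick sketch of that deque (the 50,000 cap is the replay size this chapter settles on):

from collections import deque

REPLAY_MEMORY_SIZE = 50_000  # underscores instead of commas for readability

replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)
replay_memory.append(('transition', 'goes', 'here'))  # once full, the oldest entry silently falls off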
{'end': 1259.767, 'src': 'embed', 'start': 1234.857, 'weight': 1, 'content': [{'end': 1241.901, 'text': "So at least the agent is taking steps consistent with the model over time, okay? I'm trying to think of a great way to word that, but.", 'start': 1234.857, 'duration': 7.044}, {'end': 1247.623, 'text': "So we've got the prediction consistency sort of under control.", 'start': 1242.381, 'duration': 5.242}, {'end': 1250.104, 'text': "It's still going to be crazy at the start, period.", 'start': 1247.723, 'duration': 2.381}, {'end': 1252.724, 'text': "But we've got that settled.", 'start': 1250.524, 'duration': 2.2}, {'end': 1257.306, 'text': 'Now we need to handle for the fitment craziness that is going to ensue.', 'start': 1253.125, 'duration': 4.181}, {'end': 1259.767, 'text': "Because we've got two models.", 'start': 1257.806, 'duration': 1.961}], 'summary': 'The agent is taking steps consistent with the model over time, settling prediction consistency and preparing for fitment craziness.', 'duration': 24.91, 'max_score': 1234.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1234857.jpg'},
{'end': 1360.69, 'src': 'heatmap', 'start': 1298.322, 'weight': 0.783, 'content': [{'end': 1307.584, 'text': 'but having some sort of batch size will generally result in a more stable model that just learns over time.', 'start': 1298.322, 'duration': 9.262}, {'end': 1313.385, 'text': "so if you just fit with one thing, it's going to adjust all those weights in accordance with that one thing, and then it's going to get to the new thing,", 'start': 1307.584, 'duration': 5.801}, {'end': 1316.446, 'text': "do another fit, and it's going to adjust all the weights in accordance with that one thing.", 'start': 1313.385, 'duration': 3.061}, {'end': 1323.929, 'text': "As opposed to, if you throw in a batch of 64 things, it's going to adjust all the weights in accordance with all 64 things.", 'start': 1316.886, 'duration': 7.043}, {'end': 1329.372, 'text': "So it's not going to overfit to one sample and then go to the next sample and overfit to that sample and then go to another.", 'start': 1324.39, 'duration': 4.982}, {'end': 1333.554, 'text': "You know what I'm saying? So we want to handle for that too.", 'start': 1329.392, 'duration': 4.162}, {'end': 1336.255, 'text': 'And the way that we do that is we have this replay memory.', 'start': 1333.634, 'duration': 2.621}, {'end': 1341.037, 'text': "So we take this replay memory, and that can be, let's say, 50,000 steps.", 'start': 1336.836, 'duration': 4.201}, {'end': 1351.784, 'text': "And then we take a random sampling of those 50,000 steps, and that's the batch that we're going to feed off of,", 'start': 1342.118, 'duration': 9.666}, {'end': 1354.506, 'text': 'to train the neural network off of.', 'start': 1351.784, 'duration': 2.722}, {'end': 1360.69, 'text': "so that's how we're getting stability in both the training network, that's getting updated every single step,", 'start': 1354.506, 'duration': 6.184}], 'summary': 'Using a batch size of 64 helps stabilize the model and prevent overfitting, achieved through a replay memory of 50,000 steps.', 'duration': 62.368, 'max_score': 1298.322, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1298322.jpg'},
{'end': 1351.784, 'src': 'embed', 'start': 1324.39, 'weight': 2, 'content': [{'end': 1329.372, 'text': "So it's not going to overfit to one sample and then go to the next sample and overfit to that sample and then go to another.", 'start': 1324.39, 'duration': 4.982}, {'end': 1333.554, 'text': "You know what I'm saying? So we want to handle for that too.", 'start': 1329.392, 'duration': 4.162}, {'end': 1336.255, 'text': 'And the way that we do that is we have this replay memory.', 'start': 1333.634, 'duration': 2.621}, {'end': 1341.037, 'text': "So we take this replay memory, and that can be, let's say, 50,000 steps.", 'start': 1336.836, 'duration': 4.201}, {'end': 1351.784, 'text': "And then we take a random sampling of those 50,000 steps, and that's the batch that we're going to feed off of.", 'start': 1342.118, 'duration': 9.666}], 'summary': 'Using a replay memory of 50,000 steps to prevent overfitting when sampling.', 'duration': 27.394, 'max_score': 1324.39, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1324390.jpg'}], 'start': 1156.79, 'title': 'Implementing replay memory and neural network stability', 'summary': 'Introduces implementing a replay memory of size 50,000 using a deque from the collections module in Python, and discusses the use of replay memory with 50,000 steps for random sampling to train the neural network, ensuring stability in both the training network and the predictions by updating them every five episodes. It also explains the concept of replay memory in reinforcement learning and emphasizes the importance of batch size for stability and learning in neural network training.', 'chapters': [{'end': 1234.317, 'start': 1156.79, 'title': 'Implementing replay memory for reinforcement learning', 'summary': 'Introduces how to implement a replay memory of size 50,000 using a deque from the collections module in Python, and explains the concept of replay memory in reinforcement learning.', 'duration': 77.527, 'highlights': ['Implementing a replay memory of size 50,000 using a deque from the collections module in Python. The code sets the max length of the replay memory to 50,000 and uses underscores in place of commas for better readability.', "Explaining the concept of replay memory in reinforcement learning. The transcript provides an overview of replay memory and its role in keeping the agent's predictions consistent."]}, {'end': 1323.929, 'start': 1234.857, 'title': 'Model training and batch size', 'summary': 'Discusses the challenges of model fitment consistency and the impact of using a batch size in neural network training, emphasizing the importance of batch size for stability and learning.', 'duration': 89.072, 'highlights': ['The importance of model fitment consistency is emphasized, with the agent taking steps consistent with the model over time.', 'The impact of using a batch size in neural network training is explained, highlighting that using a batch size generally results in a more stable model and better learning over time.', 'The drawbacks of fitting with one value every single step in neural network training are discussed, with emphasis on utilizing a batch to adjust all the weights in accordance with multiple values for stability and improved results.']}, {'end': 1382.547, 'start': 1324.39, 'title': 'Replay memory for neural network stability', 'summary': 'Discusses the use of replay memory with 50,000 steps for random sampling to train the neural network, ensuring stability in both the training network and the predictions by updating them every five episodes.', 'duration': 58.157, 'highlights': ['Replay memory consists of 50,000 steps, from which a random sampling is drawn to train the neural network, ensuring stability.', 'The neural network is trained using a batch derived from the replay memory, promoting stability in the training.', 'Predictions are smoothed out by updating the prediction model every five episodes, ensuring smoothness and stability in the predictions and preventing overfitting.']}], 'duration': 225.757, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1156790.jpg', 'highlights': ['Implementing a replay memory of size 50,000 using a deque from the collections module in Python. The code sets the max length of the replay memory to 50,000 and uses underscores in place of commas for better readability.', 'The importance of model fitment consistency is emphasized, with the agent taking steps consistent with the model over time.', 'Replay memory consists of 50,000 steps, from which a random sampling is drawn to train the neural network, ensuring stability.']},
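The sampling step that chapter describes, sketched against the replay_memory deque from the earlier snippet (the minibatch size of 64 matches the batch the transcript mentions):

import random

MINIBATCH_SIZE = 64

# Fit on a random batch drawn from replay memory rather than on the latest
# single step, so the weights aren't yanked toward one sample at a time.
# Assumes at least MINIBATCH_SIZE transitions have been stored.
minibatch = random.sample(replay_memory, MINIBATCH_SIZE)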
{'end': 1938.602, 'segs': [{'end': 1419.722, 'src': 'embed', 'start': 1382.547, 'weight': 3, 'content': [{'end': 1392.256, 'text': "so the next thing is going to be self dot tensorboard, and that's going to be equal to ModifiedTensorBoard.", 'start': 1382.547, 'duration': 9.709}, {'end': 1404.528, 'text': "and we're going to set log_dir to be logs/var-var,", 'start': 1392.256, 'duration': 12.272}, {'end': 1411.836, 'text': "and we're going to make this an f-string, and we're going to import time.", 'start': 1404.528, 'duration': 7.308}, {'end': 1417.84, 'text': "if you're on, I want to say, Python 3.5 or older, you can't do f-strings.", 'start': 1411.836, 'duration': 6.004}, {'end': 1419.722, 'text': "you'll have to do a .format or whatever.", 'start': 1417.84, 'duration': 1.882}], 'summary': "Setting log_dir to 'logs/var-var' for TensorBoard.", 'duration': 37.175, 'max_score': 1382.547, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1382547.jpg'},
{'end': 1477.07, 'src': 'heatmap', 'start': 1434.392, 'weight': 0.888, 'content': [{'end': 1435.793, 'text': 'i thought it was getting pissy at me for this.', 'start': 1434.392, 'duration': 1.401}, {'end': 1440.017, 'text': "Anyway, let's go ahead and set model name before I forget.", 'start': 1437.194, 'duration': 2.823}, {'end': 1442.78, 'text': "And for now, it's whatever you want.", 'start': 1440.037, 'duration': 2.743}, {'end': 1445.862, 'text': "I'm going to say 256x2, because later you might try a 32x2 and a 64x2.", 'start': 1442.84, 'duration': 3.022}, {'end': 1447.604, 'text': 'Anyways, by 256x2, I mean the neural network.', 'start': 1445.902, 'duration': 1.702}, {'end': 1448.465, 'text': "It's a 256x2 convnet.", 'start': 1447.624, 'duration': 0.841}, {'end': 1459.198, 'text': "So we've got TensorBoard done.", 'start': 1448.585, 'duration': 10.613}, {'end': 1460.719, 'text': "Obviously, this doesn't yet exist.", 'start': 1459.458, 'duration': 1.261}, {'end': 1461.859, 'text': "I'll talk about that in a moment.", 'start': 1460.759, 'duration': 1.1}, {'end': 1468.845, 'text': 'And then finally, self.targetUpdateCounter equals zero.', 'start': 1462.54, 'duration': 6.305}, {'end': 1477.07, 'text': "So we're going to use this to track internally when we are ready to update that target model.", 'start': 1469.245, 'duration': 7.825}], 'summary': 'Setting the model name to 256x2 for neural network training.', 'duration': 42.678, 'max_score': 1434.392, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1434392.jpg'},
{'end': 1477.07, 'src': 'embed', 'start': 1442.84, 'weight': 0, 'content': [{'end': 1445.862, 'text': "I'm going to say 256x2, because later you might try a 32x2 and a 64x2.", 'start': 1442.84, 'duration': 3.022}, {'end': 1447.604, 'text': 'Anyways, by 256x2, I mean the neural network.', 'start': 1445.902, 'duration': 1.702}, {'end': 1448.465, 'text': "It's a 256x2 convnet.", 'start': 1447.624, 'duration': 0.841}, {'end': 1459.198, 'text': "So we've got TensorBoard done.", 'start': 1448.585, 'duration': 10.613}, {'end': 1460.719, 'text': "Obviously, this doesn't yet exist.", 'start': 1459.458, 'duration': 1.261}, {'end': 1461.859, 'text': "I'll talk about that in a moment.", 'start': 1460.759, 'duration': 1.1}, {'end': 1468.845, 'text': 'And then finally, self.targetUpdateCounter equals zero.', 'start': 1462.54, 'duration': 6.305}, {'end': 1477.07, 'text': "So we're going to use this to track internally when we are ready to update that target model.", 'start': 1469.245, 'duration': 7.825}], 'summary': 'Discussing neural network configuration and target model update tracking.', 'duration': 34.23, 'max_score': 1442.84, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1442840.jpg'},
{'end': 1703.814, 'src': 'embed', 'start': 1646.066, 'weight': 1, 'content': [{'end': 1654.692, 'text': "Can we type? So then we're going to say self.replay_memory.append.", 'start': 1646.066, 'duration': 8.626}, {'end': 1658.475, 'text': 'Wow. I really want to call that transition.', 'start': 1656.473, 'duration': 2.002}, {'end': 1668.041, 'text': 'Transition. So transition is just going to be your observation space, action, reward, new observation space, and then whether or not it was done.', 'start': 1658.675, 'duration': 9.366}, {'end': 1672.804, 'text': 'So we need to do that so we can do that new Q formula.', 'start': 1668.581, 'duration': 4.223}, {'end': 1674.545, 'text': "So that's all we're doing there.", 'start': 1672.904, 'duration': 1.641}, {'end': 1684.096, 'text': "That'll probably make more sense when we actually get to use self.updateReplayMemory or agent.updateReplayMemory.", 'start': 1677.128, 'duration': 6.968}, {'end': 1692.685, 'text': "Finally, yeah, we'll do this one last method, and then we'll save train for the next tutorial.", 'start': 1686.398, 'duration': 6.287}, {'end': 1693.767, 'text': "So that's going to take a while.", 'start': 1693.126, 'duration': 0.641}, {'end': 1697.83, 'text': 'So define get_qs.', 'start': 1694.227, 'duration': 3.603}, {'end': 1703.814, 'text': 'So we will get_qs: self, terminal state, and then step.', 'start': 1698.23, 'duration': 5.584}], 'summary': 'Implementing the transition of observation space, action, reward, and the new Q formula for self.updateReplayMemory.', 'duration': 57.748, 'max_score': 1646.066, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1646066.jpg'},
{'end': 1820.734, 'src': 'heatmap', 'start': 1745.535, 'weight': 5, 'content': [{'end': 1756.737, 'text': "And, um, then we're going to do div, whoops, div by 255.", 'start': 1745.535, 'duration': 11.202}, {'end': 1763.98, 'text': "And then zero, so model dot predict always returns a list, even though in this case we're really only predicting against one element.", 'start': 1756.737, 'duration': 7.243}, {'end': 1767.181, 'text': "It's still going to be a list of one element or an array of one element.", 'start': 1764.06, 'duration': 3.121}, {'end': 1769.302, 'text': 'So we still want the zeroth element there.', 'start': 1767.181, 'duration': 2.121}, {'end': 1773.483, 'text': "div by 255 is because we're passing in this RGB image data.", 'start': 1769.302, 'duration': 4.181}, {'end': 1778.005, 'text': 'so you can very easily normalize that data by just simply dividing by 255.', 'start': 1773.483, 'duration': 4.522}, {'end': 1783.446, 'text': "and okay, so that's it.", 'start': 1778.005, 'duration': 5.441}, {'end': 1785.208, 'text': "let's go ahead and go up to the very top here.", 'start': 1783.446, 'duration': 1.762}, {'end': 1787.471, 'text': 'let me import numpy as np.', 'start': 1785.208, 'duration': 2.263}, {'end': 1790.935, 'text': "so let's say import numpy as np.", 'start': 1787.471, 'duration': 3.464}, {'end': 1792.397, 'text': 'let me go ahead and save that.', 'start': 1790.935, 'duration': 1.462}, {'end': 1798.639, 'text': "and uh, i think i'm gonna save train for the next tutorial.", 'start': 1793.895, 'duration': 4.744}, {'end': 1800.2, 'text': "uh, because that's gonna take a while.", 'start': 1798.639, 'duration': 1.561}, {'end': 1803.662, 'text': 'so quick shout out to, uh, the recent.', 'start': 1800.2, 'duration': 3.462}, {'end': 1804.523, 'text': 'well, not recent actually.', 'start': 1803.662, 'duration': 0.861}, {'end': 1812.628, 'text': 'these are my long-term channel members, people who have been sticking around for a while: reginald roberts, seven months, luis fernando, seven months,', 'start': 1804.523, 'duration': 8.105}, {'end': 1820.734, 'text': 'stephen r scoot, harwood, eight months, amper hammer, eight months, edward williams, sams, 9 months, and Abhijit, 10 months.', 'start': 1812.628, 'duration': 8.106}], 'summary': "Data normalization by dividing RGB image data by 255. Noteworthy channel members' subscription durations: roberts 7 months, fernando 7 months, harwood 8 months, hammer 8 months, williams 9 months, abhijit 10 months.", 'duration': 22.095, 'max_score': 1745.535, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1745535.jpg'},
{'end': 1938.602, 'src': 'embed', 'start': 1913.836, 'weight': 6, 'content': [{'end': 1917.377, 'text': "uh, and that'd be it, that way.", 'start': 1913.836, 'duration': 3.541}, {'end': 1921.358, 'text': 'hopefully, maybe in the next tutorial we can actually do the train method here,', 'start': 1917.377, 'duration': 3.981}, {'end': 1927.819, 'text': 'talk about that deep Q learning stuff, and hopefully get a model training by the next video.', 'start': 1921.358, 'duration': 6.461}, {'end': 1929.1, 'text': "so i'll think about that.", 'start': 1927.819, 'duration': 1.281}, {'end': 1933.881, 'text': "you can leave your opinion below, but probably by the time you've seen this i will have already made my decision.", 'start': 1929.1, 'duration': 4.781}, {'end': 1938.602, 'text': 'so anyway, uh, yeah, until next time everybody.', 'start': 1933.881, 'duration': 4.721}], 'summary': 'Plan to discuss deep Q learning in the next tutorial, aiming to train a model by the next video.', 'duration': 24.766, 'max_score': 1913.836, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1913836.jpg'}], 'start': 1382.547, 'title': 'Modifying TensorBoard and deep Q learning', 'summary': 'Discusses modifying TensorBoard to create one log file per .fit, reducing time and file size, and setting the model name. It also covers deep Q learning, model training, and implementation of methods such as update_replay_memory and get_qs.', 'chapters': [{'end': 1587.256, 'start': 1382.547, 'title': 'Modifying TensorBoard and setting model names', 'summary': "Discusses modifying TensorBoard functionality from TensorFlow and Keras to create one log file per .fit, with the aim of reducing time taken and file size, and setting the model name to '256x2' for tracking purposes.", 'duration': 204.709, 'highlights': ["The chapter discusses modifying TensorBoard functionality from TensorFlow and Keras to create one log file per .fit, with the aim of reducing time taken and file size, and setting the model name to '256x2' for tracking purposes.", "The self.tensorboard is set to ModifiedTensorBoard with log_dir as 'logs/var-var' using f-strings and import time, to keep track of the model name and time.", 'The self.targetUpdateCounter is initialized to zero to track internally when the target model needs to be updated.']}, {'end': 1938.602, 'start': 1587.836, 'title': 'Deep Q learning update and model training', 'summary': "Covers the modification of TensorBoard, implementation of methods like update_replay_memory and get_qs, and a mention of channel members' support, aiming to guide viewers through deep Q learning and model training.", 'duration': 350.766, 'highlights': ["The chapter covers the modification of TensorBoard, implementation of methods like update_replay_memory and get_qs, and a mention of channel members' support, aiming to guide viewers through deep Q learning and model training.", 'The update_replay_memory method is implemented to handle appending transition data, including observation space, action, reward, new observation space, and termination status, to facilitate the new Q formula, providing a comprehensive approach to reinforcement learning.', 'The get_qs method is introduced for predicting against state data, involving reshaping and normalization of the input state data through NumPy operations, contributing to the development of a predictive model for reinforcement learning.', 'Acknowledgment is made to long-term channel members for their support, with specific mentions of members and their durations, reinforcing a sense of community and gratitude within the tutorial series.', 'The importance of guiding viewers through the deep Q learning process is emphasized, aiming to provide comprehensive understanding and practical application, addressing the scarcity of accessible resources for this complex topic.']}], 'duration': 556.055, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/t3fbETsIBCY/pics/t3fbETsIBCY1382547.jpg', 'highlights': ["The chapter discusses modifying TensorBoard functionality from TensorFlow and Keras to create one log file per .fit, with the aim of reducing time taken and file size, and setting the model name to '256x2' for tracking purposes.", 'The update_replay_memory method is implemented to handle appending transition data, including observation space, action, reward, new observation space, and termination status, to facilitate the new Q formula, providing a comprehensive approach to reinforcement learning.', 'The get_qs method is introduced for predicting against state data, involving reshaping and normalization of the input state data through NumPy operations, contributing to the development of a predictive model for reinforcement learning.', "The self.tensorboard is set to ModifiedTensorBoard with log_dir as 'logs/var-var' using f-strings and import time, to keep track of the model name and time.", 'The self.targetUpdateCounter is initialized to zero to track internally when the target model needs to be updated.', 'Acknowledgment is made to long-term channel members for their support, with specific mentions of members and their durations, reinforcing a sense of community and gratitude within the tutorial series.', 'The importance of guiding viewers through the deep Q learning process is emphasized, aiming to provide comprehensive understanding and practical application, addressing the scarcity of accessible resources for this complex topic.']}], 'highlights': ['The model architecture includes convolution, activation, max pooling, dropout, and dense layers with specific parameters.', 'The model compilation involves setting the loss function to mean squared error, using the Adam optimizer with a learning rate of 0.001, and tracking accuracy metrics.', 'The chapter covers the use of two models, self.model and self.targetModel, to handle the randomness and instability in predictions, ensuring consistency and stability in the learning process.', 'The target model is re-updated after a certain number of steps or episodes to maintain stability and consistency in predictions, enabling the model to learn effectively amidst the initial chaos and randomness.', 'Implementing a replay memory of size 50,000 using a deque from the collections module in Python. The code sets the max length of the replay memory to 50,000 and uses underscores in place of commas for better readability.', 'Replay memory consists of 50,000 steps, from which random samples are drawn to train the neural network, ensuring stability.', 'The update_replay_memory method is implemented to handle appending transition data, including observation space, action, reward, new observation space, and termination status, to facilitate the new Q formula, providing a comprehensive approach to reinforcement learning.', 'The get_qs method is introduced for predicting against state data, involving reshaping and normalization of the input state data through NumPy operations, contributing to the development of a predictive model for reinforcement learning.']}
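To close the loop, a sketch of the two agent methods the final chapter describes. The exact signatures in the video may differ slightly; self.replay_memory and self.model are the attributes set up earlier in the tutorial:

import numpy as np

class DQNAgentSketch:
    def update_replay_memory(self, transition):
        # transition = (observation_space, action, reward, new_observation_space, done),
        # saved so the new Q formula can be applied later during training
        self.replay_memory.append(transition)

    def get_qs(self, state):
        state = np.array(state)
        # Divide by 255 to normalize the RGB image data, add a batch dimension,
        # and take the zeroth element since .predict always returns an array of arrays.
        return self.model.predict(state.reshape(-1, *state.shape) / 255)[0]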