title

Convolutional Neural Networks - The Math of Intelligence (Week 4)

description

Convolutional networks allow us to classify images, generate them, and can even be applied to other types of data. We're going to build one in NumPy that can classify any type of alphanumeric character, and it will run in a Flask web app.
Code for this video:
https://github.com/llSourcell/Convolutional_neural_network
Please Subscribe! And like. And comment. That's what keeps me going.
More learning resources:
https://github.com/dorajam/Convolutional-Network
https://beckernick.github.io/neural-network-scratch/
https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
http://cs231n.github.io/convolutional-networks/
http://deeplearning.net/tutorial/lenet.html
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
https://www.youtube.com/watch?v=q555kfIFUCM&t=31s
Join us in the Wizards Slack channel:
http://wizards.herokuapp.com/
And please support me on Patreon:
https://www.patreon.com/user?u=3191693
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/
Signup for my newsletter for exciting updates in the field of AI:
https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available):
https://www.wagergpt.co
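
The video walks through the convolution → ReLU → max-pooling block that makes up the network's feature-learning stage. As a rough taste of what the linked repo builds, here is a minimal NumPy-only sketch of that block; the function names (`conv2d`, `relu`, `max_pool`) are illustrative, not taken from the repo, and this uses valid (no-padding, stride-1) cross-correlation for simplicity.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel (the 'receptive field') across the image,
    taking a dot product at each position. Valid mode, stride 1."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Elementwise activation: zero out negative responses."""
    return np.maximum(0, x)

def max_pool(x, size=2):
    """Downsample by keeping the max of each size x size window."""
    h, w = x.shape
    h, w = h - h % size, w - w % size  # drop any ragged edge
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# One pass of a "convolutional block" on a toy 4x4 image:
img = np.arange(16, dtype=float).reshape(4, 4)
feature_map = max_pool(relu(conv2d(img, np.ones((2, 2)))))  # -> [[30.]]
```

In the real network this block repeats several times, the result is flattened, and a fully connected layer plus softmax turns it into per-class probabilities.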

detail

{'title': 'Convolutional Neural Networks - The Math of Intelligence (Week 4)', 'heatmap': [{'end': 618.588, 'start': 547.139, 'weight': 0.834}, {'end': 801.901, 'start': 744.402, 'weight': 0.706}, {'end': 1023.165, 'start': 965.209, 'weight': 0.865}, {'end': 1162.598, 'start': 1130.556, 'weight': 0.847}, {'end': 1355.152, 'start': 1266.619, 'weight': 0.804}, {'end': 1575.973, 'start': 1437.165, 'weight': 0.805}, {'end': 1689.171, 'start': 1602.302, 'weight': 0.781}, {'end': 1826.547, 'start': 1739.706, 'weight': 0.94}, {'end': 2031.37, 'start': 1908.057, 'weight': 0.767}], 'summary': 'Explores building a convolutional network with numpy for character recognition, delves into the structure, computational cost, and application of convolutional neural networks, discusses image data processing, feature learning, classification, and the significance of fast fourier transform and max pooling, and emphasizes the implementation and deployment of convolutional neural networks using various techniques.', 'chapters': [{'end': 41.798, 'segs': [{'end': 41.798, 'src': 'embed', 'start': 0.129, 'weight': 0, 'content': [{'end': 5.831, 'text': "Hello world, it's Siraj, and we're going to build a convolutional network using no libraries.", 'start': 0.129, 'duration': 5.702}, {'end': 10.812, 'text': 'I mean, just NumPy, but no libraries, no TensorFlow, no PyTorch, none of it.', 'start': 5.991, 'duration': 4.821}, {'end': 16.594, 'text': "We're going to look at the math behind it, and we're going to build it with just NumPy for matrix math in Python.", 'start': 11.072, 'duration': 5.522}, {'end': 21.318, 'text': "Okay, and what it's gonna be able to do, let me just start off with this demo to start off with.", 'start': 17.174, 'duration': 4.144}, {'end': 27.624, 'text': "What it's gonna be able to do is recognize any character that you type in, or not type in, but draw in with your mouse.", 'start': 21.678, 'duration': 5.946}, {'end': 31.027, 'text': 'So you could draw a six like that, 
and then hit submit.', 'start': 28.104, 'duration': 2.923}, {'end': 32.408, 'text': "It'll start working.", 'start': 31.488, 'duration': 0.92}, {'end': 38.935, 'text': "and then it'll say it's a six, and then, if you don't want to use a six, you could say a letter like a any number or letter.", 'start': 33.129, 'duration': 5.806}, {'end': 41.798, 'text': "it's going to be able to detect, slash, predict.", 'start': 38.935, 'duration': 2.863}], 'summary': 'Building a convolutional network using numpy to recognize characters drawn with a mouse.', 'duration': 41.669, 'max_score': 0.129, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE129.jpg'}], 'start': 0.129, 'title': 'Building a convolutional network with numpy', 'summary': 'Delves into the process of constructing a convolutional network using numpy for matrix operations in python, enabling it to accurately identify hand-drawn characters and predict corresponding inputs.', 'chapters': [{'end': 41.798, 'start': 0.129, 'title': 'Building convolutional network with numpy', 'summary': 'Discusses building a convolutional network using only numpy for matrix math in python, enabling it to recognize any character drawn with a mouse and predict the correct character or number inputted.', 'duration': 41.669, 'highlights': ['The network is built using only NumPy for matrix math in Python, with no other libraries such as TensorFlow or PyTorch.', 'It is capable of recognizing any character drawn with a mouse and making predictions, such as identifying numbers or letters.', 'The network can accurately identify and predict characters or numbers inputted through drawing or writing.']}], 'duration': 41.669, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE129.jpg', 'highlights': ['The network is built using only NumPy for matrix math in Python, with no other libraries such as TensorFlow or PyTorch.', 'It is capable of 
recognizing any character drawn with a mouse and making predictions, such as identifying numbers or letters.', 'The network can accurately identify and predict characters or numbers inputted through drawing or writing.']}, {'end': 464.635, 'segs': [{'end': 214.901, 'src': 'embed', 'start': 172.333, 'weight': 0, 'content': [{'end': 180.579, 'text': 'more complex shapes, and eventually at the highest level, at the highest cluster level, exists the entire face, or the entire dog, or whatever it is.', 'start': 172.333, 'duration': 8.246}, {'end': 184.421, 'text': 'And this is how the mammalian visual cortex works.', 'start': 181.239, 'duration': 3.182}, {'end': 194.408, 'text': 'And so what Yann LeCun said and his team in 98, when they published probably the landmark paper of convolutional nets, which is kind of arguable,', 'start': 184.861, 'duration': 9.547}, {'end': 199.351, 'text': "I guess, because Krzyzewski's ImageNet paper was pretty good in, I think, 2012..", 'start': 194.408, 'duration': 4.943}, {'end': 201.693, 'text': "But anyway, Yann LeCun's a G, I just wanted to say that.", 'start': 199.351, 'duration': 2.342}, {'end': 209.938, 'text': 'He had the idea to be inspired by three features of the human or the mammalian visual cortex.', 'start': 202.633, 'duration': 7.305}, {'end': 214.901, 'text': 'Local connections, and that means the clusters between neurons, how each neuron,', 'start': 210.478, 'duration': 4.423}], 'summary': "Yann lecun's team in 98 published a landmark paper on convolutional nets inspired by the mammalian visual cortex.", 'duration': 42.568, 'max_score': 172.333, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE172333.jpg'}, {'end': 286.078, 'src': 'embed', 'start': 259.49, 'weight': 3, 'content': [{'end': 270.455, 'text': 'and so those three concepts were what inspired the birth of convolutional neural networks Programmatic neural networks designed to mimic the mammalian visual 
cortex.', 'start': 259.49, 'duration': 10.965}, {'end': 271.915, 'text': 'How cool is that?', 'start': 271.215, 'duration': 0.7}, {'end': 273.056, 'text': "That's so cool.", 'start': 271.975, 'duration': 1.081}, {'end': 275.056, 'text': 'so how does this thing work?', 'start': 273.056, 'duration': 2}, {'end': 276.496, 'text': "let's look at how this works.", 'start': 275.056, 'duration': 1.44}, {'end': 278.457, 'text': 'so we have a set of layers.', 'start': 276.496, 'duration': 1.961}, {'end': 280.817, 'text': "ok, and we'll talk about what these layers mean.", 'start': 278.457, 'duration': 2.36}, {'end': 282.158, 'text': 'right, what is layer?', 'start': 280.817, 'duration': 1.341}, {'end': 286.078, 'text': 'a layer in each case is a series.', 'start': 282.158, 'duration': 3.92}], 'summary': 'Convolutional neural networks were inspired by the mammalian visual cortex, using a series of layers to process visual data.', 'duration': 26.588, 'max_score': 259.49, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE259490.jpg'}, {'end': 370.062, 'src': 'embed', 'start': 343.807, 'weight': 4, 'content': [{'end': 347.69, 'text': "It's gonna make more and more sense as I go further and further in depth here.", 'start': 343.807, 'duration': 3.883}, {'end': 349.231, 'text': 'So stay with me here.', 'start': 348.21, 'duration': 1.021}, {'end': 354.455, 'text': 'So we have a receptive field, okay? That is some part of the image that we are focused on.', 'start': 349.691, 'duration': 4.764}, {'end': 355.936, 'text': 'We are by focused.', 'start': 355.015, 'duration': 0.921}, {'end': 361.118, 'text': 'I mean that is the part of the image that we apply a convolution operation to.', 'start': 356.036, 'duration': 5.082}, {'end': 365.72, 'text': 'Okay, and we take that receptive field and we slide it across the image.', 'start': 361.118, 'duration': 4.602}, {'end': 367.861, 'text': "okay?. 
You're gonna see exactly what I'm talking about in a second.", 'start': 365.72, 'duration': 2.141}, {'end': 370.062, 'text': "I'm just going it over at a high level.", 'start': 367.881, 'duration': 2.181}], 'summary': 'Explaining the concept of receptive field and convolution operation in image processing.', 'duration': 26.255, 'max_score': 343.807, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE343807.jpg'}, {'end': 411.17, 'src': 'embed', 'start': 384.886, 'weight': 5, 'content': [{'end': 390.409, 'text': 'The first reason is that not every neuron in each layer is connected to every other neuron in the next layer.', 'start': 384.886, 'duration': 5.523}, {'end': 391.83, 'text': "It's only a part of that.", 'start': 390.669, 'duration': 1.161}, {'end': 394.791, 'text': 'Because it would be a, to borrow from discrete math,', 'start': 392.19, 'duration': 2.601}, {'end': 404.081, 'text': 'a combinatorial explosion to connect every single pixel value in an image to every single pixel value in the next layer of features.', 'start': 395.131, 'duration': 8.95}, {'end': 405.964, 'text': 'It would be just a huge amount.', 'start': 404.402, 'duration': 1.562}, {'end': 411.17, 'text': 'So what we do instead is we take a part of that image and we iteratively slide over it.', 'start': 405.984, 'duration': 5.186}], 'summary': 'Not every neuron is connected to every other; avoids combinatorial explosion.', 'duration': 26.284, 'max_score': 384.886, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE384886.jpg'}], 'start': 41.798, 'title': 'Convolutional neural networks', 'summary': "Discusses the inspiration behind building convolutional networks, focusing on the hierarchical structure of mammalian visual cortex, yann lecun's contributions, local connections, layering, and spatial invariance. 
it also explains the concept, working, unique structure, and computational cost of convolutional neural networks, along with the application of dot products during the convolution operation.", 'chapters': [{'end': 259.49, 'start': 41.798, 'title': 'Neural network & mammalian visual cortex', 'summary': "Discusses the inspiration behind building a convolutional network, highlighting the hierarchical structure of mammalian visual cortex and yann lecun's contribution to convolutional nets, focusing on local connections, layering, and spatial invariance.", 'duration': 217.692, 'highlights': ["Yann LeCun's inspiration for convolutional nets was influenced by the hierarchical structure of the mammalian visual cortex, emphasizing local connections, layering, and spatial invariance, which enable the detection of features regardless of spatial variations.", 'The mammalian visual cortex operates in a hierarchical manner, with clusters of neurons representing different features, from simple lines and edges to complex shapes, ultimately enabling the recognition of entire objects like faces or animals.', "Yann LeCun and his team's 1998 paper on convolutional nets, focusing on local connections, layering, and spatial invariance, is considered a significant contribution to the field, although the landmark status is debatable compared to other influential papers like Krzyzewski's ImageNet paper in 2012."]}, {'end': 464.635, 'start': 259.49, 'title': 'Convolutional neural networks', 'summary': 'Explains the concept of convolutional neural networks, how they work, their unique structure, and the iterative sliding process, highlighting the reduced computational cost and the application of dot products during the convolution operation.', 'duration': 205.145, 'highlights': ['Convolutional neural networks are designed to mimic the mammalian visual cortex, with layers of operations applied to the input image, and every layer not connected to every other neuron in the next layer, reducing 
computational expense.', 'The receptive field in a convolutional network is the part of the image where a convolution operation is applied, and it is iteratively slid across the image, applying dot products to all the numbers, resembling a flashlight shining over the image.', 'The unique structure of convolutional networks involves a part of the image being connected iteratively, reducing the computational expense and avoiding a combinatorial explosion that would arise from connecting every single pixel value in an image to every single pixel value in the next layer of features.']}], 'duration': 422.837, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE41798.jpg', 'highlights': ["Yann LeCun's inspiration for convolutional nets was influenced by the hierarchical structure of the mammalian visual cortex, emphasizing local connections, layering, and spatial invariance, which enable the detection of features regardless of spatial variations.", 'The mammalian visual cortex operates in a hierarchical manner, with clusters of neurons representing different features, from simple lines and edges to complex shapes, ultimately enabling the recognition of entire objects like faces or animals.', "Yann LeCun and his team's 1998 paper on convolutional nets, focusing on local connections, layering, and spatial invariance, is considered a significant contribution to the field, although the landmark status is debatable compared to other influential papers like Krzyzewski's ImageNet paper in 2012.", 'Convolutional neural networks are designed to mimic the mammalian visual cortex, with layers of operations applied to the input image, and every layer not connected to every other neuron in the next layer, reducing computational expense.', 'The receptive field in a convolutional network is the part of the image where a convolution operation is applied, and it is iteratively slid across the image, applying dot products to all the 
numbers, resembling a flashlight shining over the image.', 'The unique structure of convolutional networks involves a part of the image being connected iteratively, reducing the computational expense and avoiding a combinatorial explosion that would arise from connecting every single pixel value in an image to every single pixel value in the next layer of features.']}, {'end': 813.066, 'segs': [{'end': 560.824, 'src': 'embed', 'start': 531.516, 'weight': 0, 'content': [{'end': 534.797, 'text': "There's the feature learning part, and then there's the classification part.", 'start': 531.516, 'duration': 3.281}, {'end': 541.138, 'text': 'And so for the feature learning part, what happens are three operations over and over and over again.', 'start': 535.497, 'duration': 5.641}, {'end': 543.799, 'text': 'And we can call them convolutional blocks.', 'start': 541.258, 'duration': 2.541}, {'end': 545.579, 'text': "Let's just call them convolutional blocks.", 'start': 544.139, 'duration': 1.44}, {'end': 546.739, 'text': "I'm coining the term.", 'start': 545.979, 'duration': 0.76}, {'end': 555.461, 'text': 'So what happens is we first apply convolution, then we apply ReLU, or any kind of activation, and then we apply pooling.', 'start': 547.139, 'duration': 8.322}, {'end': 556.781, 'text': 'And we repeat that.', 'start': 556.041, 'duration': 0.74}, {'end': 557.942, 'text': "That's a single block.", 'start': 556.841, 'duration': 1.101}, {'end': 560.824, 'text': 'Three operations in a single convolutional block.', 'start': 557.982, 'duration': 2.842}], 'summary': 'Feature learning involves three operations: convolution, relu activation, and pooling, repeated in a single block.', 'duration': 29.308, 'max_score': 531.516, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE531516.jpg'}, {'end': 618.588, 'src': 'heatmap', 'start': 547.139, 'weight': 0.834, 'content': [{'end': 555.461, 'text': 'So what happens is we first apply 
convolution, then we apply ReLU, or any kind of activation, and then we apply pooling.', 'start': 547.139, 'duration': 8.322}, {'end': 556.781, 'text': 'And we repeat that.', 'start': 556.041, 'duration': 0.74}, {'end': 557.942, 'text': "That's a single block.", 'start': 556.841, 'duration': 1.101}, {'end': 560.824, 'text': 'Three operations in a single convolutional block.', 'start': 557.982, 'duration': 2.842}, {'end': 563.405, 'text': 'Okay? So convolution, re-loop, pooling.', 'start': 561.164, 'duration': 2.241}, {'end': 565.386, 'text': 'Repeat Convolution, re-loop, pooling.', 'start': 563.485, 'duration': 1.901}, {'end': 567.367, 'text': 'Repeat Convolution, re-loop, pooling.', 'start': 565.466, 'duration': 1.901}, {'end': 570.649, 'text': 'Okay, and usually, you know, you have three blocks at least.', 'start': 567.687, 'duration': 2.962}, {'end': 573.37, 'text': "Unless you're building inception by Google, then you have 15 of these.", 'start': 570.989, 'duration': 2.381}, {'end': 585.433, 'text': 'But you have these convolutional blocks and at the very end then you flatten that output into a smaller dimensional vector and then you apply a fully connected layer to it.', 'start': 575.551, 'duration': 9.882}, {'end': 590.094, 'text': 'So that means that you then connect all the neurons in one layer to the next one,', 'start': 585.553, 'duration': 4.541}, {'end': 594.295, 'text': "just because we want to then harness all of the learnings that we've learned so far.", 'start': 590.094, 'duration': 4.201}, {'end': 595.895, 'text': "That's why we fully connect at the end.", 'start': 594.375, 'duration': 1.52}, {'end': 603.097, 'text': 'And then we take those learnings, and we squash it into a set of probability values with our last softmax function.', 'start': 596.335, 'duration': 6.762}, {'end': 611.386, 'text': 'And then we take the max value of those probabilities, and each of these probabilities is a probability for a specific class that it could be.', 'start': 
604.104, 'duration': 7.282}, {'end': 618.588, 'text': "And we take the max value, let's say 72%, and we'll say, okay, well, 72% for banana, and now we know it's a banana.", 'start': 611.646, 'duration': 6.942}], 'summary': 'Convolution, relu, pooling repeated in convolutional blocks; fully connected layer applied to flattened output, ending with softmax function to determine class probability.', 'duration': 71.449, 'max_score': 547.139, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE547139.jpg'}, {'end': 618.588, 'src': 'embed', 'start': 590.094, 'weight': 1, 'content': [{'end': 594.295, 'text': "just because we want to then harness all of the learnings that we've learned so far.", 'start': 590.094, 'duration': 4.201}, {'end': 595.895, 'text': "That's why we fully connect at the end.", 'start': 594.375, 'duration': 1.52}, {'end': 603.097, 'text': 'And then we take those learnings, and we squash it into a set of probability values with our last softmax function.', 'start': 596.335, 'duration': 6.762}, {'end': 611.386, 'text': 'And then we take the max value of those probabilities, and each of these probabilities is a probability for a specific class that it could be.', 'start': 604.104, 'duration': 7.282}, {'end': 618.588, 'text': "And we take the max value, let's say 72%, and we'll say, okay, well, 72% for banana, and now we know it's a banana.", 'start': 611.646, 'duration': 6.942}], 'summary': 'Utilizing learnings to derive probabilities, determining class with 72% confidence.', 'duration': 28.494, 'max_score': 590.094, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE590094.jpg'}, {'end': 664.614, 'src': 'embed', 'start': 632.794, 'weight': 2, 'content': [{'end': 635.895, 'text': 'so for step one, we are preparing a data set of images, right.', 'start': 632.794, 'duration': 3.101}, {'end': 641.258, 'text': 'so when you think of an image, you think of 
a Matrix, hopefully a matrix of pixel values.', 'start': 635.895, 'duration': 5.363}, {'end': 643.38, 'text': "if you don't think of it that way, think of it, think of it that way.", 'start': 641.258, 'duration': 2.122}, {'end': 654.464, 'text': "now You're thinking of an image as a matrix of pixel values, rows by columns, and each of these, each of these Points in the matrix,", 'start': 643.38, 'duration': 11.084}, {'end': 657.927, 'text': 'represent a pixel right between 0 and 255.', 'start': 654.464, 'duration': 3.463}, {'end': 664.614, 'text': "But it's actually better, in terms of convolutional networks, to think of an image as a three dimensional matrix.", 'start': 657.927, 'duration': 6.687}], 'summary': 'Preparing a dataset of images with pixel values represented in a three-dimensional matrix for convolutional networks.', 'duration': 31.82, 'max_score': 632.794, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE632794.jpg'}, {'end': 751.438, 'src': 'embed', 'start': 719.167, 'weight': 3, 'content': [{'end': 723.712, 'text': "We're talking about supervised learning, learning the mapping between the input data and the output label.", 'start': 719.167, 'duration': 4.545}, {'end': 727.454, 'text': 'Dog image dog label learn the mapping given a new dog image.', 'start': 724.272, 'duration': 3.182}, {'end': 728.234, 'text': 'What is a label?', 'start': 727.514, 'duration': 0.72}, {'end': 730.135, 'text': 'well?, You just learned it, right.', 'start': 728.234, 'duration': 1.901}, {'end': 734.777, 'text': 'so, and we learn it through Back propagation, back propagate to update weights.', 'start': 730.135, 'duration': 4.642}, {'end': 736.398, 'text': 'remember the rhyme, you know what it is.', 'start': 734.777, 'duration': 1.621}, {'end': 739.8, 'text': "hey, I haven't wrapped yet in the series, but I will don't worry.", 'start': 736.398, 'duration': 3.402}, {'end': 744.342, 'text': "It's coming anyway, so Every 
image is a matrix of pixel values.", 'start': 739.84, 'duration': 4.502}, {'end': 751.438, 'text': "We know this we know this they're between 0 and 255 and We can use several training data sets.", 'start': 744.402, 'duration': 7.036}], 'summary': 'Supervised learning: mapping input to output label through back propagation with pixel values between 0 and 255.', 'duration': 32.271, 'max_score': 719.167, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE719167.jpg'}, {'end': 801.901, 'src': 'heatmap', 'start': 744.402, 'weight': 0.706, 'content': [{'end': 751.438, 'text': "We know this we know this they're between 0 and 255 and We can use several training data sets.", 'start': 744.402, 'duration': 7.036}, {'end': 753.199, 'text': 'There are two really popular ones.', 'start': 751.478, 'duration': 1.721}, {'end': 755.08, 'text': "There's CIFAR and there's COCO.", 'start': 753.239, 'duration': 1.841}, {'end': 756.74, 'text': "And there's a bunch of other ones as well.", 'start': 755.56, 'duration': 1.18}, {'end': 761.341, 'text': 'But basically these are huge data sets and you can find smaller versions of them.', 'start': 756.78, 'duration': 4.561}, {'end': 767.883, 'text': 'And each of these images, their dogs, their cars, their airplanes, their people, whatever, they all have labels for them.', 'start': 762.021, 'duration': 5.862}, {'end': 770.784, 'text': 'Handmade labels by humans, which is great for us.', 'start': 768.723, 'duration': 2.061}, {'end': 774.399, 'text': "Okay, so that's step one.", 'start': 772.577, 'duration': 1.822}, {'end': 777.721, 'text': 'Step one is to get your training data, which is your images, which are your images.', 'start': 774.439, 'duration': 3.282}, {'end': 780.223, 'text': 'Step two is to perform convolution.', 'start': 778.362, 'duration': 1.861}, {'end': 787.809, 'text': "Now, you might be asking, what is convolution? 
Well, I'm here to tell you that convolution is an operation that is dope as F.", 'start': 780.323, 'duration': 7.486}, {'end': 788.61, 'text': "Here's why it's dope.", 'start': 787.809, 'duration': 0.801}, {'end': 791.452, 'text': "Because it's not just used in computer science and machine learning.", 'start': 788.85, 'duration': 2.602}, {'end': 794.054, 'text': "It's used in almost every field of engineering.", 'start': 791.833, 'duration': 2.221}, {'end': 796.396, 'text': 'Think of convolution as two paint buckets.', 'start': 794.415, 'duration': 1.981}, {'end': 799.299, 'text': 'You have one paint bucket, which is red, and another one, which is blue.', 'start': 796.777, 'duration': 2.522}, {'end': 801.901, 'text': 'And what you do is just smear it all over yourself.', 'start': 799.639, 'duration': 2.262}], 'summary': 'Data sets like cifar and coco are used with labeled images for training. convolution is a widely applicable operation.', 'duration': 57.499, 'max_score': 744.402, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE744402.jpg'}, {'end': 810.505, 'src': 'embed', 'start': 778.362, 'weight': 4, 'content': [{'end': 780.223, 'text': 'Step two is to perform convolution.', 'start': 778.362, 'duration': 1.861}, {'end': 787.809, 'text': "Now, you might be asking, what is convolution? 
Well, I'm here to tell you that convolution is an operation that is dope as F.", 'start': 780.323, 'duration': 7.486}, {'end': 788.61, 'text': "Here's why it's dope.", 'start': 787.809, 'duration': 0.801}, {'end': 791.452, 'text': "Because it's not just used in computer science and machine learning.", 'start': 788.85, 'duration': 2.602}, {'end': 794.054, 'text': "It's used in almost every field of engineering.", 'start': 791.833, 'duration': 2.221}, {'end': 796.396, 'text': 'Think of convolution as two paint buckets.', 'start': 794.415, 'duration': 1.981}, {'end': 799.299, 'text': 'You have one paint bucket, which is red, and another one, which is blue.', 'start': 796.777, 'duration': 2.522}, {'end': 801.901, 'text': 'And what you do is just smear it all over yourself.', 'start': 799.639, 'duration': 2.262}, {'end': 803.001, 'text': "No, you don't do that.", 'start': 802.401, 'duration': 0.6}, {'end': 810.505, 'text': 'What you do is you take these two paint buckets and you combine them into one paint bucket, and that new paint bucket is gonna be a new color,', 'start': 803.261, 'duration': 7.244}], 'summary': 'Convolution is a versatile operation used in various fields of engineering and beyond.', 'duration': 32.143, 'max_score': 778.362, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE778362.jpg'}], 'start': 465.015, 'title': 'Convolutional networks and image data processing', 'summary': 'Discusses the structure of convolutional networks, feature learning, and classification, along with image data processing as a three-dimensional matrix of pixel values and the significance of convolution in machine learning, using popular datasets like cifar and coco.', 'chapters': [{'end': 632.794, 'start': 465.015, 'title': 'Understanding convolutional networks', 'summary': 'Discusses the structure of a convolutional network, emphasizing the process of feature learning and classification, as well as the series of 
operations within a convolutional block, leading to the application of a fully connected layer and a softmax function for classification.', 'duration': 167.779, 'highlights': ['A convolutional network consists of feature learning and classification parts, with the former involving a series of operations within convolutional blocks and the latter involving a fully connected layer and a softmax function for classification. The chapter explains that a convolutional network can be divided into feature learning and classification parts, with feature learning involving a series of operations within convolutional blocks, and classification involving a fully connected layer and a softmax function for classification.', 'The feature learning part involves a series of operations - convolution, ReLU activation, and pooling - repeated multiple times within convolutional blocks. The feature learning part of the convolutional network entails a series of operations including convolution, ReLU activation, and pooling, repeated within convolutional blocks.', 'The process culminates in the application of a fully connected layer to harness the learnings and a softmax function to produce probability values for specific classes. 
The convolutional network culminates in the application of a fully connected layer to harness the learnings and a softmax function to produce probability values for specific classes.']}, {'end': 813.066, 'start': 632.794, 'title': 'Image data processing & convolution', 'summary': 'Discusses the preparation of image data as a three-dimensional matrix of pixel values, the use of supervised learning with associated labels, and the significance of convolution in image processing and machine learning, with popular data sets like cifar and coco.', 'duration': 180.272, 'highlights': ['The first step involves preparing image data as a three-dimensional matrix of pixel values, with three channels representing red, green, and blue, and popular training data sets like CIFAR and COCO are used (e.g. CIFAR and COCO are huge data sets with labeled images of dogs, cars, airplanes, and people).', 'Supervised learning involves learning the mapping between input data and output label, and back propagation is used to update weights based on the associated label (e.g. training data consists of images with associated labels, and back propagation is used to update weights in supervised learning).', 'Convolution is described as a fundamental operation used in various fields, including computer science and machine learning, and is likened to combining two paint buckets to create a new color, illustrating the concept of convolution (e.g. 
convolution is a fundamental operation used in various fields and is likened to combining two paint buckets to create a new color).']}], 'duration': 348.051, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE465015.jpg', 'highlights': ['The feature learning part of the convolutional network entails a series of operations including convolution, ReLU activation, and pooling, repeated within convolutional blocks.', 'The convolutional network culminates in the application of a fully connected layer to harness the learnings and a softmax function to produce probability values for specific classes.', 'The first step involves preparing image data as a three-dimensional matrix of pixel values, with three channels representing red, green, and blue, and popular training data sets like CIFAR and COCO are used.', 'Supervised learning involves learning the mapping between input data and output label, and back propagation is used to update weights based on the associated label.', 'Convolution is described as a fundamental operation used in various fields, including computer science and machine learning, and is likened to combining two paint buckets to create a new color, illustrating the concept of convolution.']}, {'end': 1612.128, 'segs': [{'end': 1023.165, 'src': 'heatmap', 'start': 965.209, 'weight': 0.865, 'content': [{'end': 969.932, 'text': "But that's the basic idea between convolution, and that's why we call it convolution,", 'start': 965.209, 'duration': 4.723}, {'end': 978.638, 'text': 'because we are Combining or convolving the weight matrix or filter or kernel, whatever you want to call it feature map, by that input.', 'start': 969.932, 'duration': 8.706}, {'end': 985.863, 'text': "We're combining it, using the help and using that output as the input for the next layer, after activating it and pulling it.", 'start': 978.638, 'duration': 7.225}, {'end': 990.666, 'text': "Okay, so that's convolution and also 
Right.", 'start': 985.863, 'duration': 4.803}, {'end': 998.229, 'text': 'so we apply it to all of those dimensions for that, for that input matrix, Okay, and that gives us our activation map or feature map or filter.', 'start': 990.666, 'duration': 7.563}, {'end': 1000.95, 'text': 'right. so many different interchangeable terms here.', 'start': 998.229, 'duration': 2.721}, {'end': 1004.671, 'text': "so, Anyway, so it's computed using the dot product.", 'start': 1000.95, 'duration': 3.721}, {'end': 1006.312, 'text': 'so you might be thinking well, okay?', 'start': 1004.671, 'duration': 1.641}, {'end': 1013.975, 'text': "I see how there's a dot product, I see how there's matrix multiplication, But how does that really tell us what features there are?", 'start': 1006.532, 'duration': 7.443}, {'end': 1019.36, 'text': "I still, you're still not making the connection, probably, and why, understandably, why this?", 'start': 1013.975, 'duration': 5.385}, {'end': 1023.165, 'text': 'these series of matrix operations help us detect features.', 'start': 1019.36, 'duration': 3.805}], 'summary': 'Convolution involves combining weight matrix or filter by input, using dot product to detect features.', 'duration': 57.956, 'max_score': 965.209, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE965209.jpg'}, {'end': 998.229, 'src': 'embed', 'start': 969.932, 'weight': 0, 'content': [{'end': 978.638, 'text': 'because we are Combining or convolving the weight matrix or filter or kernel, whatever you want to call it feature map, by that input.', 'start': 969.932, 'duration': 8.706}, {'end': 985.863, 'text': "We're combining it, using the help and using that output as the input for the next layer, after activating it and pulling it.", 'start': 978.638, 'duration': 7.225}, {'end': 990.666, 'text': "Okay, so that's convolution and also Right.", 'start': 985.863, 'duration': 4.803}, {'end': 998.229, 'text': 'so we apply it to all of 
those dimensions for that, for that input matrix, Okay, and that gives us our activation map or feature map or filter.', 'start': 990.666, 'duration': 7.563}], 'summary': 'Explains the process of combining weight matrix to generate activation maps in convolutional neural networks.', 'duration': 28.297, 'max_score': 969.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE969932.jpg'}, {'end': 1162.598, 'src': 'heatmap', 'start': 1130.556, 'weight': 0.847, 'content': [{'end': 1134.718, 'text': "So that's why it's important to make the rest of the so the data.", 'start': 1130.556, 'duration': 4.162}, {'end': 1135.418, 'text': "That's irrelevant.", 'start': 1134.758, 'duration': 0.66}, {'end': 1141.244, 'text': 'We want it to be zero right in the in the feature maps, in the filters that we learn.', 'start': 1135.438, 'duration': 5.806}, {'end': 1144.728, 'text': 'In the filters that we learn, we want the irrelevant parts to be zero.', 'start': 1141.544, 'duration': 3.184}, {'end': 1150.857, 'text': 'And in the images, okay? 
And in the input images.', 'start': 1145.109, 'duration': 5.748}, {'end': 1157.951, 'text': "So, I could actually go even more into convolution, but It's not really necessary, but it is super dope.", 'start': 1152.078, 'duration': 5.873}, {'end': 1158.672, 'text': 'It is super dope, though.', 'start': 1157.971, 'duration': 0.701}, {'end': 1160.535, 'text': 'This is a great blog post, by the way.', 'start': 1158.953, 'duration': 1.582}, {'end': 1162.598, 'text': 'I definitely encourage you to read this blog post.', 'start': 1160.615, 'duration': 1.983}], 'summary': 'Importance of making irrelevant data zero in feature maps and filters for images and input images.', 'duration': 32.042, 'max_score': 1130.556, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1130556.jpg'}, {'end': 1243, 'src': 'embed', 'start': 1215.335, 'weight': 1, 'content': [{'end': 1225.87, 'text': "but the convolution theorem states that, And so it's a general theorem that can be applied to any set of problems.", 'start': 1215.335, 'duration': 10.535}, {'end': 1233.194, 'text': "But in terms of what's relevant to us is the convolutional theorem applied to matrix operations.", 'start': 1225.91, 'duration': 7.284}, {'end': 1240.138, 'text': "So what we can do is we can say what it says is it's the input times, the kernel, It's the dot product.", 'start': 1233.494, 'duration': 6.644}, {'end': 1243, 'text': "It's a dot product between two different matrices,", 'start': 1240.138, 'duration': 2.862}], 'summary': 'The convolution theorem applies to matrix operations, involving a dot product between input and kernel matrices.', 'duration': 27.665, 'max_score': 1215.335, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1215335.jpg'}, {'end': 1361.737, 'src': 'heatmap', 'start': 1256.571, 'weight': 2, 'content': [{'end': 1261.956, 'text': "it's the same thing, but it's a more complex way of 
looking at it, or more mathematically accurate way.", 'start': 1256.571, 'duration': 5.385}, {'end': 1266.619, 'text': 'and also the fast Fourier transform is is brought up by this,', 'start': 1261.956, 'duration': 4.663}, {'end': 1274.022, 'text': 'and The Fast Fourier Transform takes some spatial data and it converts it into Fourier space, which is like a waveform.', 'start': 1266.619, 'duration': 7.403}, {'end': 1279.424, 'text': "And you see this a lot in your day-to-day life whenever you're looking at some sound.", 'start': 1274.443, 'duration': 4.981}, {'end': 1283.125, 'text': "You know, you're listening to some sound and you look at your MP3 player and you see the waves.", 'start': 1279.604, 'duration': 3.521}, {'end': 1284.905, 'text': "That's a Fourier Transform happening.", 'start': 1283.265, 'duration': 1.64}, {'end': 1286.366, 'text': "But I won't go into that.", 'start': 1285.545, 'duration': 0.821}, {'end': 1287.726, 'text': "That's for sound and audio.", 'start': 1286.406, 'duration': 1.32}, {'end': 1289.806, 'text': "But anyway, it's a really cool blog post.", 'start': 1287.766, 'duration': 2.04}, {'end': 1290.547, 'text': 'Definitely check it out.', 'start': 1289.846, 'duration': 0.701}, {'end': 1291.807, 'text': 'Okay, so back to this.', 'start': 1291.047, 'duration': 0.76}, {'end': 1295.126, 'text': 'So we talked about convolution.', 'start': 1294.065, 'duration': 1.061}, {'end': 1297.147, 'text': "Now we're gonna talk about pooling, right?", 'start': 1295.526, 'duration': 1.621}, {'end': 1298.388, 'text': 'So what is pooling?', 'start': 1297.467, 'duration': 0.921}, {'end': 1306.714, 'text': "So whenever we apply convolution to some image, what's gonna happen at every layer is we're going to get a series of feature of,", 'start': 1298.448, 'duration': 8.266}, {'end': 1310.376, 'text': 'so each of the weights are going to consist of multiple images.', 'start': 1306.714, 'duration': 3.662}, {'end': 1315.327, 'text': 'And each of these images are 
going to be Every layer.', 'start': 1310.917, 'duration': 4.41}, {'end': 1317.849, 'text': "there's going to be more and smaller images.", 'start': 1315.327, 'duration': 2.522}, {'end': 1323.793, 'text': 'so the first few layers are gonna be these huge images right, and Then at the next few layers are gonna be more of those,', 'start': 1317.849, 'duration': 5.944}, {'end': 1326.154, 'text': "but they're gonna be smaller, And it's just gonna get just like that.", 'start': 1323.793, 'duration': 2.361}, {'end': 1329.277, 'text': 'okay, and at the end we squash it with some fully connected layer.', 'start': 1326.154, 'duration': 3.123}, {'end': 1331.939, 'text': 'So we get some probability values with a softmax.', 'start': 1329.277, 'duration': 2.662}, {'end': 1334.5, 'text': 'but anyway, what pooling does?', 'start': 1331.939, 'duration': 2.561}, {'end': 1335.901, 'text': 'is it, is it dense?', 'start': 1334.5, 'duration': 1.401}, {'end': 1340.963, 'text': 'is it makes the matrix, the matrices that we learn, more dense?', 'start': 1335.901, 'duration': 5.062}, {'end': 1341.803, 'text': "Here's what I mean.", 'start': 1341.283, 'duration': 0.52}, {'end': 1351.93, 'text': "So if you perform convolution between an input and a feature matrix or a weight matrix or filter, it's going to result in a matrix right?", 'start': 1342.283, 'duration': 9.647}, {'end': 1353.811, 'text': 'But this matrix is going to be pretty big.', 'start': 1351.97, 'duration': 1.841}, {'end': 1355.152, 'text': "It's going to be a pretty big matrix.", 'start': 1353.911, 'duration': 1.241}, {'end': 1361.737, 'text': 'What we can do is we can take the most important parts of that matrix and pass that on.', 'start': 1355.693, 'duration': 6.044}], 'summary': 'Fast fourier transform converts spatial data into waveforms for sound, audio; pooling makes learned matrices more dense.', 'duration': 105.166, 'max_score': 1256.571, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1256571.jpg'}, {'end': 1389.348, 'src': 'embed', 'start': 1361.777, 'weight': 3, 'content': [{'end': 1368.762, 'text': "And what that's going to do is it's going to reduce the computational complexity of our model, okay? So that's what pooling is all about.", 'start': 1361.777, 'duration': 6.985}, {'end': 1371.488, 'text': "And so pooling sets, so there's different types of pooling.", 'start': 1369.562, 'duration': 1.926}, {'end': 1374.457, 'text': 'Max pooling is the most used type of pooling, by the way.', 'start': 1372.07, 'duration': 2.387}, {'end': 1383.126, 'text': 'So basically multiply, so what happens is we stride, we define some window size and then some stride size.', 'start': 1375.563, 'duration': 7.563}, {'end': 1389.348, 'text': "So what are the intervals that we look at? And we say, okay, so for each of these windows, let's take the max value.", 'start': 1383.166, 'duration': 6.182}], 'summary': 'Pooling reduces computational complexity, with max pooling being the most used type.', 'duration': 27.571, 'max_score': 1361.777, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1361777.jpg'}, {'end': 1575.973, 'src': 'heatmap', 'start': 1437.165, 'weight': 0.805, 'content': [{'end': 1439.669, 'text': 'we talked about activation.', 'start': 1437.165, 'duration': 2.504}, {'end': 1444.962, 'text': 'and so now, No, we talked about convolution and we talked about pooling.', 'start': 1439.669, 'duration': 5.293}, {'end': 1449.326, 'text': 'And so now the third part is normalization or activation.', 'start': 1445.322, 'duration': 4.004}, {'end': 1457.735, 'text': "So remember how I said how it would be, it's so important that we have these values that are not related to our image be zero? 
We want it to be zero.", 'start': 1449.667, 'duration': 8.068}, {'end': 1461.638, 'text': 'So the result is zero if the feature is not detected.', 'start': 1457.995, 'duration': 3.643}, {'end': 1463.941, 'text': 'Well, the way we do that is using ReLU.', 'start': 1461.698, 'duration': 2.243}, {'end': 1466.863, 'text': 'And so ReLU stands for rectified linear unit.', 'start': 1464.401, 'duration': 2.462}, {'end': 1468.605, 'text': "It's an activation function.", 'start': 1467.163, 'duration': 1.442}, {'end': 1470.386, 'text': "It's an activation function, okay?", 'start': 1468.785, 'duration': 1.601}, {'end': 1476.431, 'text': 'We use activation functions throughout neural networks and we use them because it is.', 'start': 1470.786, 'duration': 5.645}, {'end': 1483.617, 'text': 'you can also call them nonlinearities, because they make our model able to learn nonlinear functions.', 'start': 1476.431, 'duration': 7.186}, {'end': 1486.319, 'text': 'Not just linear functions, but nonlinear functions.', 'start': 1483.697, 'duration': 2.622}, {'end': 1490.843, 'text': 'So any kind of function, right? 
The universal function approximation theorem, we talked about that.', 'start': 1486.339, 'duration': 4.504}, {'end': 1493.544, 'text': 'Activation functions help make this happen.', 'start': 1491.563, 'duration': 1.981}, {'end': 1501.108, 'text': 'And so ReLU is a special kind of activation function that turns all negative numbers into zero.', 'start': 1494.184, 'duration': 6.924}, {'end': 1503.569, 'text': "So that's why it's gonna make the math easier.", 'start': 1501.408, 'duration': 2.161}, {'end': 1506.05, 'text': "It won't make the math break for our convolutional network.", 'start': 1503.629, 'duration': 2.421}, {'end': 1506.931, 'text': "So we'll apply ReLU.", 'start': 1506.07, 'duration': 0.861}, {'end': 1514.394, 'text': 'So, basically what we do is for every single pixel value in the input to this ReLU activation function, we turn it.', 'start': 1507.191, 'duration': 7.203}, {'end': 1516.075, 'text': "if it's a negative, we just say make it zero.", 'start': 1514.394, 'duration': 1.681}, {'end': 1517.016, 'text': "It's super simple.", 'start': 1516.415, 'duration': 0.601}, {'end': 1518.036, 'text': "It'll be one line of code.", 'start': 1517.056, 'duration': 0.98}, {'end': 1519.237, 'text': "You'll see exactly what I'm talking about.", 'start': 1518.056, 'duration': 1.181}, {'end': 1522.753, 'text': "Okay, so those are our blocks.", 'start': 1520.492, 'duration': 2.261}, {'end': 1525.875, 'text': "so that's how our convolutional blocks work.", 'start': 1522.753, 'duration': 3.122}, {'end': 1532.639, 'text': "However, there is another step that I didn't talk about, that is a nice-to-have and state-of-the-art convolutional networks always use it, and that's called dropout.", 'start': 1525.875, 'duration': 6.764}, {'end': 1541.448, 'text': 'so Geoffrey Hinton, one of the pioneers of deep learning, invented a technique called dropout.', 'start': 1534, 
'duration': 7.448}, {'end': 1548.694, 'text': 'And what dropout is, is a good analogy is old people, or not old people, but people who are stuck in their ways.', 'start': 1541.468, 'duration': 7.226}, {'end': 1553.919, 'text': 'Let me, okay, so what dropout does is it turns neurons on and off randomly.', 'start': 1549.035, 'duration': 4.884}, {'end': 1562.505, 'text': 'What do I mean by that? I mean, the matrices for each weight value is converted to zero randomly at some layer of the network.', 'start': 1554.259, 'duration': 8.246}, {'end': 1570.81, 'text': 'And so what happens is, by doing this, our network is forced to learn new representations for the data, new pathways that data has to flow through.', 'start': 1563.185, 'duration': 7.625}, {'end': 1572.711, 'text': "It can't always flow through this neuron.", 'start': 1571.11, 'duration': 1.601}, {'end': 1575.973, 'text': 'And the reason we use it to prevent overfitting.', 'start': 1573.011, 'duration': 2.962}], 'summary': 'The transcript covers activation, relu, and dropout in convolutional networks.', 'duration': 138.808, 'max_score': 1437.165, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1437165.jpg'}, {'end': 1518.036, 'src': 'embed', 'start': 1494.184, 'weight': 4, 'content': [{'end': 1501.108, 'text': 'And so ReLU is a special kind of activation function that turns all negative numbers into zero.', 'start': 1494.184, 'duration': 6.924}, {'end': 1503.569, 'text': "So that's why it's gonna make the math easier.", 'start': 1501.408, 'duration': 2.161}, {'end': 1506.05, 'text': "It won't make the math break for our convolutional network.", 'start': 1503.629, 'duration': 2.421}, {'end': 1506.931, 'text': "So we'll apply ReLU.", 'start': 1506.07, 'duration': 0.861}, {'end': 1514.394, 'text': 'So, basically what we do is for every single pixel value in the input to this ReLU activation function, we turn it.', 'start': 1507.191, 'duration': 7.203}, 
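The "one line of code" promised here for ReLU really is a single line in NumPy. A minimal sketch (the `relu` name is illustrative, not taken from the video's repository):

```python
import numpy as np

def relu(feature_map):
    # Rectified linear unit: replace every negative activation with zero,
    # so features that were not detected contribute nothing downstream.
    return np.maximum(feature_map, 0)

# Works unchanged on a vector, a 2-D feature map, or a whole batch,
# because np.maximum broadcasts the scalar 0 over the array.
activated = relu(np.array([[-2.0, 3.0], [0.5, -1.5]]))
```

Because broadcasting handles the scalar 0, the same line covers every activation shape the network produces.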
{'end': 1516.075, 'text': "if it's a negative, we just say make it zero.", 'start': 1514.394, 'duration': 1.681}, {'end': 1517.016, 'text': "It's super simple.", 'start': 1516.415, 'duration': 0.601}, {'end': 1518.036, 'text': "It'll be one line of code.", 'start': 1517.056, 'duration': 0.98}], 'summary': "Relu activation turns negative numbers to zero, simplifying math for convolutional networks. it's applied to every pixel value, making it super simple with just one line of code.", 'duration': 23.852, 'max_score': 1494.184, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1494184.jpg'}, {'end': 1577.413, 'src': 'embed', 'start': 1554.259, 'weight': 5, 'content': [{'end': 1562.505, 'text': 'What do I mean by that? I mean, the matrices for each weight value is converted to zero randomly at some layer of the network.', 'start': 1554.259, 'duration': 8.246}, {'end': 1570.81, 'text': 'And so what happens is, by doing this, our network is forced to learn new representations for the data, new pathways that data has to flow through.', 'start': 1563.185, 'duration': 7.625}, {'end': 1572.711, 'text': "It can't always flow through this neuron.", 'start': 1571.11, 'duration': 1.601}, {'end': 1575.973, 'text': 'And the reason we use it to prevent overfitting.', 'start': 1573.011, 'duration': 2.962}, {'end': 1577.413, 'text': 'right. 
we want to prevent overfitting.', 'start': 1575.973, 'duration': 1.44}], 'summary': 'Randomly converting weight matrices to zero at some layer forces network to learn new representations, preventing overfitting.', 'duration': 23.154, 'max_score': 1554.259, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1554259.jpg'}], 'start': 813.426, 'title': 'Convolutional networks', 'summary': 'Provides an overview of convolutional networks, explaining the concept of convolution, matrix operations, and feature map generation, as well as the significance of fast fourier transform and max pooling in reducing computational complexity.', 'chapters': [{'end': 1256.571, 'start': 813.426, 'title': 'Convolutional networks overview', 'summary': 'Explains the concept of convolution in convolutional networks, where input matrices are combined with weight matrices through a series of matrix operations to produce convolved feature maps, aiding in feature detection and filtering. it also touches upon the convolution theorem and its relevance to matrix operations.', 'duration': 443.145, 'highlights': ['Convolution in convolutional networks involves combining input matrices with weight matrices through matrix operations to produce convolved feature maps for feature detection and filtering. It explains how the input matrices are processed using weight matrices through matrix operations to generate convolved feature maps for detecting features such as curves.', 'The convolution theorem is described as a general theorem that can be applied to any set of problems, particularly to matrix operations, involving the dot product between input and kernel matrices. 
It introduces the convolution theorem and its application to matrix operations, specifically emphasizing the dot product between input and kernel matrices for feature detection.']}, {'end': 1612.128, 'start': 1256.571, 'title': 'Understanding convolutional neural networks', 'summary': 'Explains the concepts of convolution, pooling, activation, and dropout in convolutional neural networks, highlighting the application of fast fourier transform and the significance of max pooling in reducing computational complexity.', 'duration': 355.557, 'highlights': ['The Fast Fourier Transform converts spatial data into Fourier space, used in sound and audio, and is a complex and mathematically accurate way of representing data. The Fast Fourier Transform converts spatial data into Fourier space, used in sound and audio, and is a complex and mathematically accurate way of representing data.', 'Pooling reduces the computational complexity of the model by extracting the most important parts of the matrix, with max pooling being the most widely used type of pooling. Pooling reduces the computational complexity of the model by extracting the most important parts of the matrix, with max pooling being the most widely used type of pooling.', 'Activation functions like ReLU turn all negative numbers into zero, facilitating the learning of nonlinear functions and preventing math errors in convolutional networks. Activation functions like ReLU turn all negative numbers into zero, facilitating the learning of nonlinear functions and preventing math errors in convolutional networks.', 'Dropout randomly turns neurons on and off to force the network to learn new representations, preventing overfitting and increasing generalization ability. 
Dropout randomly turns neurons on and off to force the network to learn new representations, preventing overfitting and increasing generalization ability.']}], 'duration': 798.702, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE813426.jpg', 'highlights': ['Convolution involves combining input matrices with weight matrices through matrix operations to produce convolved feature maps for feature detection and filtering.', 'The convolution theorem is a general theorem applied to matrix operations, emphasizing the dot product between input and kernel matrices for feature detection.', 'The Fast Fourier Transform converts spatial data into Fourier space, used in sound and audio, and is a complex and mathematically accurate way of representing data.', 'Pooling reduces the computational complexity of the model by extracting the most important parts of the matrix, with max pooling being the most widely used type of pooling.', 'Activation functions like ReLU turn all negative numbers into zero, facilitating the learning of nonlinear functions and preventing math errors in convolutional networks.', 'Dropout randomly turns neurons on and off to force the network to learn new representations, preventing overfitting and increasing generalization ability.']}, {'end': 2279.744, 'segs': [{'end': 1640.906, 'src': 'embed', 'start': 1612.128, 'weight': 1, 'content': [{'end': 1615.209, 'text': 'but basically, dropout is not as complex as that sounds.', 'start': 1612.128, 'duration': 3.081}, {'end': 1617.391, 'text': 'Dropout can be done in three lines of code.', 'start': 1615.209, 'duration': 2.182}, {'end': 1621.233, 'text': "so definitely check out this blog post as well that I've linked.", 'start': 1617.391, 'duration': 3.842}, {'end': 1627.436, 'text': 'but what it does is it just randomly picks some neurons in a layer to Set to zero right.', 'start': 1621.233, 'duration': 6.203}, {'end': 1627.956, 'text': "so it's just.", 
'start': 1627.436, 'duration': 0.52}, {'end': 1631.318, 'text': "it's just three lines, okay, and you can look at it in this notebook and Right.", 'start': 1627.956, 'duration': 3.362}, {'end': 1631.819, 'text': "so that's.", 'start': 1631.318, 'duration': 0.501}, {'end': 1634.221, 'text': 'and then our last step is probability conversion.', 'start': 1631.819, 'duration': 2.402}, {'end': 1640.906, 'text': "So we've got this huge set of values, right? All these little small images that are represented by this huge output matrix.", 'start': 1634.241, 'duration': 6.665}], 'summary': 'Dropout can be implemented in three lines of code, randomly setting some neurons to zero in a layer.', 'duration': 28.778, 'max_score': 1612.128, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1612128.jpg'}, {'end': 1723.1, 'src': 'embed', 'start': 1674.693, 'weight': 0, 'content': [{'end': 1676.375, 'text': "And that's going to give us the most likely class.", 'start': 1674.693, 'duration': 1.682}, {'end': 1683.948, 'text': 'Okay, those are the seven steps of a full forward pass through a convolutional network looks like that.', 'start': 1677.984, 'duration': 5.964}, {'end': 1689.171, 'text': 'And so now you might be wondering well, okay, so how do we train this thing?', 'start': 1685.048, 'duration': 4.123}, {'end': 1691.472, 'text': 'Well, using gradient descent, right?', 'start': 1689.231, 'duration': 2.241}, {'end': 1698.731, 'text': 'And when applied to neural networks, gradient descent is called back propagation.', 'start': 1691.612, 'duration': 7.119}, {'end': 1701.032, 'text': 'exactly. 
I hope you got that right anyway.', 'start': 1698.731, 'duration': 2.301}, {'end': 1703.434, 'text': 'ok, so how do we learn these magic numbers?', 'start': 1701.032, 'duration': 2.402}, {'end': 1708.137, 'text': 'right, how do we learn what these weight value should be, what the feature should be?', 'start': 1703.434, 'duration': 4.703}, {'end': 1710.138, 'text': 'back propagation is how we do it right.', 'start': 1708.137, 'duration': 2.001}, {'end': 1716.042, 'text': "and so we've talked quite a bit about back propagation and gradient descent, but I'll do a little, I'll go over it again.", 'start': 1710.138, 'duration': 5.904}, {'end': 1720.639, 'text': "um, But the idea is that we have some error, that we're computing.", 'start': 1716.042, 'duration': 4.597}, {'end': 1723.1, 'text': 'This is supervised learning.', 'start': 1721.099, 'duration': 2.001}], 'summary': 'Explains 7 steps of a forward pass, back propagation for training neural networks using gradient descent, and supervised learning.', 'duration': 48.407, 'max_score': 1674.693, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1674693.jpg'}, {'end': 1826.547, 'src': 'heatmap', 'start': 1739.706, 'weight': 0.94, 'content': [{'end': 1743.347, 'text': 'repeat, repeat, softmax or squash into probability values.', 'start': 1739.706, 'duration': 3.641}, {'end': 1744.027, 'text': 'pick the biggest one.', 'start': 1743.347, 'duration': 0.68}, {'end': 1745.848, 'text': 'And we have some prediction value.', 'start': 1744.407, 'duration': 1.441}, {'end': 1751.03, 'text': 'And what we do is we compare the prediction value to the actual value and we get an error.', 'start': 1746.328, 'duration': 4.702}, {'end': 1758.233, 'text': 'And we take our error and we compute the partial derivative of the error with respect to each weight value going backwards in the network.', 'start': 1751.45, 'duration': 6.783}, {'end': 1759.233, 'text': 'Okay, like this.', 'start': 
1758.613, 'duration': 0.62}, {'end': 1765.596, 'text': "Okay, and so for regression we use the mean squared error if we're using linear regression.", 'start': 1760.754, 'duration': 4.842}, {'end': 1768.757, 'text': 'And for classification we use the softmax function.', 'start': 1766.036, 'duration': 2.721}, {'end': 1778.196, 'text': 'So remember how, in the first neural network we built, and in the linear regression example we used a, We use mean squared error to compute the error.', 'start': 1769.098, 'duration': 9.098}, {'end': 1780.057, 'text': "and now we're using the softmax.", 'start': 1778.196, 'duration': 1.861}, {'end': 1784.579, 'text': "So we'll take the, so we'll take the partial derivative of the error with respect to our weights,", 'start': 1780.057, 'duration': 4.522}, {'end': 1792.182, 'text': "And then that's going to give us the gradient value that we then update each of those weight values Recursively, going backward in the network.", 'start': 1784.579, 'duration': 7.603}, {'end': 1794.102, 'text': "and that's how it learns with those features.", 'start': 1792.182, 'duration': 1.92}, {'end': 1797.504, 'text': 'with the ideal feature, the weight matrix value should be.', 'start': 1794.102, 'duration': 3.402}, {'end': 1801.987, 'text': 'But what about the other magic numbers??', 'start': 1798.984, 'duration': 3.003}, {'end': 1805.87, 'text': 'What about the number of neurons and the number of features and the size of those features?', 'start': 1802.007, 'duration': 3.863}, {'end': 1811.094, 'text': 'and the pooling window size and the window stride? 
Well, those that is an active area of research.', 'start': 1805.87, 'duration': 5.224}, {'end': 1817.9, 'text': 'There are best practices for values that you should use for those hyper-parameters, the tuning knobs of our network.', 'start': 1811.335, 'duration': 6.565}, {'end': 1821.503, 'text': 'And Andrej Karpathy has some great material on this.', 'start': 1818.52, 'duration': 2.983}, {'end': 1826.547, 'text': "He's probably the leading source for convolutional networks right now in terms of written content.", 'start': 1822.083, 'duration': 4.464}], 'summary': 'Neural network training involves error computation, gradient update, and hyper-parameter tuning.', 'duration': 86.841, 'max_score': 1739.706, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1739706.jpg'}, {'end': 2031.37, 'src': 'heatmap', 'start': 1908.057, 'weight': 0.767, 'content': [{'end': 1914.46, 'text': "But if you have some data, like, say, customer data, or if you were to just flip the rows and columns, It doesn't matter what order they're in,", 'start': 1908.057, 'duration': 6.403}, {'end': 1915.101, 'text': "they're still.", 'start': 1914.46, 'duration': 0.641}, {'end': 1917.492, 'text': "you know, they're still features.", 'start': 1915.101, 'duration': 2.391}, {'end': 1925.197, 'text': "so a good rule of thumb is, if you swap out the rows and columns of your data set and It's just as useful like the space doesn't matter,", 'start': 1917.492, 'duration': 7.705}, {'end': 1928.159, 'text': "then you don't want to use a cnn, else you do.", 'start': 1925.197, 'duration': 2.962}, {'end': 1932.762, 'text': "Okay. 
and a last thing: a great example of using CNNs is robot learning.", 'start': 1928.159, 'duration': 4.603}, {'end': 1939.667, 'text': 'you can use a CNN for object detection and you can use a CNN for grasp learning and combine the two, and then you could get a robot that cooks,', 'start': 1932.762, 'duration': 6.905}, {'end': 1940.407, 'text': 'which is really cool.', 'start': 1939.667, 'duration': 0.74}, {'end': 1944.33, 'text': "I've got a great TensorFlow example and a great adversarial network example.", 'start': 1940.407, 'duration': 3.923}, {'end': 1946.191, 'text': "Okay, let's go into the code now.", 'start': 1944.67, 'duration': 1.521}, {'end': 1954.433, 'text': "And so what I'm going to do is look at the class for the convolutional network in NumPy as well as the prediction class.", 'start': 1947.471, 'duration': 6.962}, {'end': 1955.453, 'text': "There are two classes here.", 'start': 1954.453, 'duration': 1}, {'end': 1957.614, 'text': 'Okay, so these are our three inputs.', 'start': 1955.894, 'duration': 1.72}, {'end': 1961.855, 'text': 'Pickle is for saving and loading our serialized model.', 'start': 1958.034, 'duration': 3.821}, {'end': 1969.357, 'text': "What do I mean? Pickle is Python's way of having a platform- and language-agnostic way of saving data so you can load it up later.", 'start': 1962.135, 'duration': 7.222}, {'end': 1970.717, 'text': 'TensorFlow uses it.', 'start': 1969.757, 'duration': 0.96}, {'end': 1972.538, 'text': 'A bunch of other libraries use it as well.', 'start': 1970.757, 'duration': 1.781}, {'end': 1979.002, 'text': "NumPy is for matrix math, and we've got our own little custom class for pre-processing the data, because we don't care about that part.", 'start': 1973.078, 'duration': 5.924}, {'end': 1987.007, 'text': "We care about the machine learning part, okay? 
So let's talk about our light OCR, or optical character recognition, class.", 'start': 1979.022, 'duration': 7.985}, {'end': 1993.974, 'text': "In our initialize function, we're going to load the weights from the pickle file and then store all the labels that we've loaded.", 'start': 1987.548, 'duration': 6.426}, {'end': 2001.923, 'text': "we'll define how many rows and columns are in an image, and load up our convolutional network using the lightCNN function with our saved weights.", 'start': 1993.974, 'duration': 7.949}, {'end': 2003.565, 'text': "so, assuming we've already trained our network,", 'start': 2001.923, 'duration': 1.642}, {'end': 2008.571, 'text': 'we load it with the saved weights from the pickle file and then we define the number of pooling layers.', 'start': 2003.565, 'duration': 5.006}, {'end': 2012.457, 'text': 'okay. so once we have that, then we can use this predict function.', 'start': 2009.312, 'duration': 3.145}, {'end': 2029.83, 'text': "so, given some new image, we'll reshape the image so it's the correct size to perform the dot product between that image and the first layer of our convolutional network, and we'll feed it into our network and it's going to output a prediction probability for our class and we'll return it.", 'start': 2012.457, 'duration': 17.373}, {'end': 2031.37, 'text': 'okay? Super high level.', 'start': 2029.83, 'duration': 1.54}], 'summary': 'CNNs are useful for robot learning, such as object detection and grasp learning, with a great example being a robot that cooks.', 'duration': 123.313, 'max_score': 1908.057, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1908057.jpg'}, {'end': 1957.614, 'src': 'embed', 'start': 1932.762, 'weight': 3, 'content': [{'end': 1939.667, 'text': 'you can use a CNN for object detection and you can use a CNN for grasp learning and combine the two, and then you could get a robot that cooks,', 'start': 1932.762, 'duration': 6.905}, 
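The predict path just described (reshape the image, feed it forward, squash into probabilities, pick the biggest) ends in two small steps that fit in a few lines of NumPy. A sketch with illustrative names, not the repository's actual API:

```python
import numpy as np

def softmax(logits):
    # Shift by the max before exponentiating so np.exp cannot overflow;
    # the shift cancels in the ratio and leaves the probabilities unchanged.
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

class_scores = np.array([2.0, 1.0, 0.1])         # raw outputs of the last layer
probabilities = softmax(class_scores)            # non-negative, sums to 1
predicted_class = int(np.argmax(probabilities))  # index of the most likely class
```

`np.argmax` is the "pick the biggest one" step; its result indexes into the label list loaded from the pickle file.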
{'end': 1940.407, 'text': 'which is really cool.', 'start': 1939.667, 'duration': 0.74}, {'end': 1944.33, 'text': "I've got a great TensorFlow example and a great adversarial network example.", 'start': 1940.407, 'duration': 3.923}, {'end': 1946.191, 'text': "Okay, let's go into the code now.", 'start': 1944.67, 'duration': 1.521}, {'end': 1954.433, 'text': "And so what I'm going to do is look at the class for the convolutional network in NumPy as well as the prediction class.", 'start': 1947.471, 'duration': 6.962}, {'end': 1955.453, 'text': "There are two classes here.", 'start': 1954.453, 'duration': 1}, {'end': 1957.614, 'text': 'Okay, so these are our three inputs.', 'start': 1955.894, 'duration': 1.72}], 'summary': 'Combining CNNs for object detection and grasp learning, creating a robot that cooks.', 'duration': 24.852, 'max_score': 1932.762, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1932762.jpg'}], 'start': 1612.128, 'title': 'Convolutional neural networks', 'summary': 'Provides an overview of key steps in a forward pass through a convolutional network, including dropout, probability conversion, back propagation, and hyperparameter tuning, while emphasizing the significance of CNNs for spatial data processing and their applications in object detection and grasp learning.', 'chapters': [{'end': 1987.007, 'start': 1612.128, 'title': 'Convolutional neural networks', 'summary': 'Provides an overview of key steps in a forward pass through a convolutional network, including dropout, probability conversion, back propagation, and hyperparameter tuning, while emphasizing the significance of CNNs for spatial data processing and their applications in object detection and grasp learning.', 'duration': 374.879, 'highlights': ['The chapter outlines key steps in a forward pass through a convolutional network, including dropout, probability conversion, back propagation, and hyperparameter tuning, 
emphasizing the significance of CNNs for spatial data processing and their applications in object detection and grasp learning. Provides an overview of the key steps in a forward pass through a convolutional network, emphasizing the significance of CNNs for spatial data processing and their applications in object detection and grasp learning.', "The discussion includes the simplicity of implementing dropout in three lines of code and using softmax for probability conversion, with an emphasis on how these processes contribute to the neural network's prediction capabilities. Describes the simplicity of implementing dropout in three lines of code and using softmax for probability conversion, highlighting their contribution to the neural network's prediction capabilities.", 'The transcript also touches on the significance of back propagation in learning the ideal feature weights, along with the use of gradient descent and mean squared error for regression and classification tasks. Touches on the significance of back propagation in learning the ideal feature weights, along with the use of gradient descent and mean squared error for regression and classification tasks.', 'The chapter also emphasizes the relevance of CNNs for spatial data processing and their applications in object detection and grasp learning, providing examples of using CNNs for spatial 2D or 3D data and in robot learning, including object detection and grasp learning. 
Emphasizes the relevance of CNNs for spatial data processing and their applications in object detection and grasp learning, providing examples of using CNNs for spatial 2D or 3D data and in robot learning.']}, {'end': 2279.744, 'start': 1987.548, 'title': 'Convolutional network and prediction process', 'summary': 'Discusses the process of loading weights, using the predict function to output class probabilities, and the code structure of the convolutional network class.', 'duration': 292.196, 'highlights': ['The predict function reshapes the input image, performs dot product with the first layer of the convolutional network, and outputs class prediction probabilities.', 'The convolutional network class initializes lists for storing layers and weights, loads weights from a pickle file, and defines the predict function.', 'The process of feeding input through convolutional layers, applying activations like relu, and performing pooling and dropout to prevent overfitting is explained in detail.', 'The classification part involves flattening the layer, using dense layers for feature combination, and applying softmax function to output probability values.']}], 'duration': 667.616, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE1612128.jpg', 'highlights': ['The chapter outlines key steps in a forward pass through a convolutional network, emphasizing the significance of CNNs for spatial data processing and their applications in object detection and grasp learning.', "Describes the simplicity of implementing dropout in three lines of code and using softmax for probability conversion, highlighting their contribution to the neural network's prediction capabilities.", 'Touching on the significance of back propagation in learning the ideal feature weights, along with the use of gradient descent and mean squared error for regression and classification tasks.', 'Provides examples of using CNNs for spatial 2D or 3D data and 
in robot learning, including object detection and grasp learning.']}, {'end': 2763.455, 'segs': [{'end': 2336.809, 'src': 'embed', 'start': 2280.004, 'weight': 0, 'content': [{'end': 2282.145, 'text': "But what we're gonna do is we're gonna look at these functions as well.", 'start': 2280.004, 'duration': 2.141}, {'end': 2284.496, 'text': "So let's look at these functions.", 'start': 2283.49, 'duration': 1.006}, {'end': 2293.438, 'text': "So we'll start off with the convolutional layer function and have your notebook open with me as well so you could go over this.", 'start': 2287.654, 'duration': 5.784}, {'end': 2295.519, 'text': "The link is in the description if you don't know.", 'start': 2293.698, 'duration': 1.821}, {'end': 2296.42, 'text': 'Now you know.', 'start': 2295.92, 'duration': 0.5}, {'end': 2297.941, 'text': "If you don't know, now you know.", 'start': 2296.8, 'duration': 1.141}, {'end': 2302.724, 'text': "So for our convolutional layer, given some input image, we're going to say well,", 'start': 2298.381, 'duration': 4.343}, {'end': 2306.507, 'text': "we'll store our feature maps and the bias value in these two variables features and bias.", 'start': 2302.724, 'duration': 3.783}, {'end': 2309.969, 'text': "We'll define how big our filter or patch is going to be.", 'start': 2306.807, 'duration': 3.162}, {'end': 2311.51, 'text': 'how many features do we want?', 'start': 2309.969, 'duration': 1.541}, {'end': 2312.911, 'text': 'how big is our image?', 'start': 2311.51, 'duration': 1.401}, {'end': 2314.452, 'text': 'how many channels RGB?', 'start': 2312.911, 'duration': 1.541}, {'end': 2315.453, 'text': 'so three?', 'start': 2314.452, 'duration': 1.001}, {'end': 2316.734, 'text': 'and then how many images do we have?', 'start': 2315.453, 'duration': 1.281}, {'end': 2319.136, 'text': "So given those values, we'll define a border mode.", 'start': 2316.894, 'duration': 2.242}, {'end': 2321.157, 'text': 'So a border mode.', 'start': 2319.156, 'duration': 
2.001}, 'text': 'So when you apply the full border mode, it means that the filter can go outside the bounds of the input by filter size divided by two.', 'start': 2321.157, 'duration': 8.386}, {'end': 2332.586, 'text': 'The area outside of the input is normally padded with zeros.', 'start': 2329.884, 'duration': 2.702}, {'end': 2336.809, 'text': 'And the border mode valid is when you get an output that is smaller than the input,', 'start': 2333.166, 'duration': 3.643}], 'summary': 'Discussing the convolutional layer function, filter size, border mode, and input padding.', 'duration': 56.805, 'max_score': 2280.004, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE2280004.jpg'}, {'end': 2723.677, 'src': 'embed', 'start': 2693.124, 'weight': 1, 'content': [{'end': 2694.945, 'text': 'Gradient descent back propagation works the same way.', 'start': 2693.124, 'duration': 1.821}, {'end': 2701.609, 'text': 'We take the partial derivative of our error with respect to our weights and then recursively update our weights using that gradient value.', 'start': 2695.265, 'duration': 6.344}, {'end': 2704.251, 'text': 'Gradient equals partial derivative equals delta.', 'start': 2701.889, 'duration': 2.362}, {'end': 2705.492, 'text': 'Interchangeable words.', 'start': 2704.591, 'duration': 0.901}, {'end': 2712.816, 'text': "But here's a great simple example right here where, after the forward pass, we do the same thing in reverse order.", 'start': 2705.872, 'duration': 6.944}, {'end': 2717.399, 'text': 'So we calculate the gradient of those weights and then multiply them by the previous layer.', 'start': 2713.176, 'duration': 4.223}, {'end': 2723.677, 'text': 'And then for our JavaScript portion, we are taking the drawing from the user.', 'start': 2718.976, 'duration': 4.701}], 'summary': 'Gradient descent back propagation updates weights using partial derivatives and recursive 
multiplication.', 'duration': 30.553, 'max_score': 2693.124, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE2693124.jpg'}, {'end': 2759.112, 'src': 'embed', 'start': 2734.501, 'weight': 2, 'content': [{'end': 2740.543, 'text': 'so whenever the user actually starts moving that painting, whenever that mouse stops clicking and then the user hits the submit button,', 'start': 2734.501, 'duration': 6.042}, {'end': 2744.164, 'text': "we'll save that snapshot of that image and then feed that into the network.", 'start': 2740.543, 'duration': 3.621}, {'end': 2745.885, 'text': "And that's our Flask app.", 'start': 2744.904, 'duration': 0.981}, {'end': 2750.247, 'text': "We'll define two routes, one for our home and then one for that image, for the network.", 'start': 2746.305, 'duration': 3.942}, {'end': 2751.188, 'text': 'We can deploy it to the web.', 'start': 2750.267, 'duration': 0.921}, {'end': 2752.028, 'text': "There's a Heroku app.", 'start': 2751.208, 'duration': 0.82}, {'end': 2753.189, 'text': 'You can definitely check out the link.', 'start': 2752.048, 'duration': 1.141}, {'end': 2754.93, 'text': 'Link is in the description as well.', 'start': 2753.589, 'duration': 1.341}, {'end': 2756.091, 'text': 'Check out the notebook.', 'start': 2755.27, 'duration': 0.821}, {'end': 2757.371, 'text': "And yeah, that's it.", 'start': 2756.611, 'duration': 0.76}, {'end': 2759.112, 'text': 'Please subscribe for more programming videos.', 'start': 2757.391, 'duration': 1.721}], 'summary': 'A flask app saves image snapshots and deploys to heroku for web access.', 'duration': 24.611, 'max_score': 2734.501, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE2734501.jpg'}], 'start': 2280.004, 'title': 'Convolutional neural networks', 'summary': 'Discusses the implementation of convolutional neural networks, covering convolution, relu, max pooling, dropout, flattening, 
dense layers, softmax, forward and back propagation, and deployment using flask and heroku.', 'chapters': [{'end': 2336.809, 'start': 2280.004, 'title': 'Understanding convolutional layer functions', 'summary': 'Covers the implementation of a convolutional layer function, explaining the storage of feature maps and bias values, the definition of filter size and number of features, and the determination of border mode for input images.', 'duration': 56.805, 'highlights': ['The chapter explains the storage of feature maps and bias values for a convolutional layer function, along with the definition of filter size and number of features.', 'It also covers the determination of border mode for input images, including the concept of padding with zeros and the impact of border mode valid on output size.']}, {'end': 2763.455, 'start': 2336.809, 'title': 'Convolutional neural networks', 'summary': 'Discusses the implementation of convolutional neural networks, including convolution, relu, max pooling, dropout, flattening, dense layers, softmax, forward and back propagation, and deployment using flask and heroku.', 'duration': 426.646, 'highlights': ['The chapter explains the implementation of convolutional neural networks, including convolution, ReLU, max pooling, dropout, flattening, dense layers, softmax, forward and back propagation. It covers the key components and operations involved in convolutional neural networks.', 'The chapter discusses the deployment of the network using Flask and Heroku for web deployment. It provides insights into deploying the network using Flask and Heroku for web accessibility.', 'The chapter elaborates on the process of back propagation, involving the calculation of gradients and recursive updates of weights. 
It explains the process of back propagation, emphasizing the calculation of gradients and weight updates for network training.']}], 'duration': 483.451, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FTr3n7uBIuE/pics/FTr3n7uBIuE2280004.jpg', 'highlights': ['The chapter covers the implementation of convolutional neural networks, including convolution, ReLU, max pooling, dropout, flattening, dense layers, softmax, forward and back propagation, and deployment using Flask and Heroku.', 'It explains the process of back propagation, emphasizing the calculation of gradients and weight updates for network training.', 'The chapter provides insights into deploying the network using Flask and Heroku for web accessibility.', 'It also covers the determination of border mode for input images, including the concept of padding with zeros and the impact of border mode valid on output size.', 'The chapter explains the storage of feature maps and bias values for a convolutional layer function, along with the definition of filter size and number of features.']}], 'highlights': ['The network is built using only NumPy for matrix math in Python, with no other libraries such as TensorFlow or PyTorch.', 'It is capable of recognizing any character drawn with a mouse and making predictions, such as identifying numbers or letters.', 'The network can accurately identify and predict characters or numbers inputted through drawing or writing.', "Yann LeCun's inspiration for convolutional nets was influenced by the hierarchical structure of the mammalian visual cortex, emphasizing local connections, layering, and spatial invariance, which enable the detection of features regardless of spatial variations.", 'The mammalian visual cortex operates in a hierarchical manner, with clusters of neurons representing different features, from simple lines and edges to complex shapes, ultimately enabling the recognition of entire objects like faces or animals.', "Yann LeCun and 
his team's 1998 paper on convolutional nets, focusing on local connections, layering, and spatial invariance, is considered a significant contribution to the field, although the landmark status is debatable compared to other influential papers like Krizhevsky's ImageNet paper in 2012.", 'Convolutional neural networks are designed to mimic the mammalian visual cortex, with layers of operations applied to the input image, and every layer not connected to every other neuron in the next layer, reducing computational expense.', 'The receptive field in a convolutional network is the part of the image where a convolution operation is applied, and it is iteratively slid across the image, applying dot products to all the numbers, resembling a flashlight shining over the image.', 'The unique structure of convolutional networks involves a part of the image being connected iteratively, reducing the computational expense and avoiding a combinatorial explosion that would arise from connecting every single pixel value in an image to every single pixel value in the next layer of features.', 'The feature learning part of the convolutional network entails a series of operations including convolution, ReLU activation, and pooling, repeated within convolutional blocks.', 'The convolutional network culminates in the application of a fully connected layer to harness the learnings and a softmax function to produce probability values for specific classes.', 'The first step involves preparing image data as a three-dimensional matrix of pixel values, with three channels representing red, green, and blue, and popular training data sets like CIFAR and COCO are used.', 'Supervised learning involves learning the mapping between input data and output label, and back propagation is used to update weights based on the associated label.', 'Convolution is described as a fundamental operation used in various fields, including computer science and machine learning, and is likened to combining two 
paint buckets to create a new color, illustrating the concept of convolution.', 'Convolution involves combining input matrices with weight matrices through matrix operations to produce convolved feature maps for feature detection and filtering.', 'The convolution theorem is a general theorem applied to matrix operations, emphasizing the dot product between input and kernel matrices for feature detection.', 'The Fast Fourier Transform converts spatial data into Fourier space, used in sound and audio, and is a complex and mathematically accurate way of representing data.', 'Pooling reduces the computational complexity of the model by extracting the most important parts of the matrix, with max pooling being the most widely used type of pooling.', 'Activation functions like ReLU turn all negative numbers into zero, facilitating the learning of nonlinear functions and preventing math errors in convolutional networks.', 'Dropout randomly turns neurons on and off to force the network to learn new representations, preventing overfitting and increasing generalization ability.', 'The chapter outlines key steps in a forward pass through a convolutional network, emphasizing the significance of CNNs for spatial data processing and their applications in object detection and grasp learning.', "Describes the simplicity of implementing dropout in three lines of code and using softmax for probability conversion, highlighting their contribution to the neural network's prediction capabilities.", 'Touching on the significance of back propagation in learning the ideal feature weights, along with the use of gradient descent and mean squared error for regression and classification tasks.', 'Provides examples of using CNNs for spatial 2D or 3D data and in robot learning, including object detection and grasp learning.', 'The chapter covers the implementation of convolutional neural networks, including convolution, ReLU, max pooling, dropout, flattening, dense layers, softmax, forward and 
back propagation, and deployment using Flask and Heroku.', 'It explains the process of back propagation, emphasizing the calculation of gradients and weight updates for network training.', 'The chapter provides insights into deploying the network using Flask and Heroku for web accessibility.', 'It also covers the determination of border mode for input images, including the concept of padding with zeros and the impact of border mode valid on output size.', 'The chapter explains the storage of feature maps and bias values for a convolutional layer function, along with the definition of filter size and number of features.']}
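The forward pass that the chapter summaries above walk through — sliding a filter over a receptive field and taking dot products (valid border mode, so the output is smaller than the input), applying ReLU, max pooling, flattening, a dense layer, and a softmax that outputs class probabilities — can be sketched in plain NumPy. This is a minimal illustration only, not the code from the video's repository; every name here (`conv2d_valid`, `dense_W`, the 8x8 image, the 4 classes) is made up for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid border mode: the filter stays inside the input,
    so the output is smaller than the input (no zero padding)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product between the receptive field and the filter.
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def relu(x):
    # Turn all negative numbers into zero.
    return np.maximum(0, x)

def max_pool(x, size=2):
    # Keep only the largest value in each size-by-size block.
    H, W = x.shape
    H, W = H - H % size, W - W % size
    blocks = x[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

def softmax(z):
    # Convert raw scores into probabilities that sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

# Tiny end-to-end forward pass on a fake 8x8 grayscale "image".
rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))      # one learned filter (weights)
dense_W = rng.random((9, 4))     # fully connected layer: 4 classes

feature_map = relu(conv2d_valid(image, kernel))  # 8x8 -> 6x6
pooled = max_pool(feature_map)                   # 6x6 -> 3x3
probs = softmax(pooled.flatten() @ dense_W)      # 4 class probabilities
print(probs.sum())  # sums to 1, up to float error
```

Training (the back-propagation pass the transcript mentions) would then take the partial derivative of the error with respect to `kernel` and `dense_W` and update them in reverse order, but that is omitted from this sketch.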