title
Simple explanation of convolutional neural network | Deep Learning Tutorial 23 (Tensorflow & Python)
description
A very simple explanation of a convolutional neural network (CNN or ConvNet) that even a high school student can follow. This video involves very little math and is perfect for total beginners who have no idea what a CNN is or how it works. We will cover the following topics:
1. Why humans have traditionally been better at image recognition than computers
2. Disadvantages of using a traditional artificial neural network (ANN) for image classification
3. How the human brain recognizes images
4. How computers can use filters for feature detection
5. What the convolution operation is and how it works
6. Importance of ReLU activation in a CNN
7. Importance of the pooling operation in a CNN
8. How to handle rotation and scale in a CNN
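For readers who prefer code, topics 5 to 7 can be sketched in a few lines of plain Python. This is only a rough illustration, not the code used in the video: the function names, the 6x6 toy image and the vertical-line filter are all made up for this example. The convolution averages over the 9 cells of a 3x3 filter, following the explanation in the video; library implementations normally just sum the products.

```python
# Illustrative sketch only. Function names and toy data are assumptions
# made for this example; they do not come from the video.

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return a feature map.

    Each output value is the element-wise product of the kernel and the
    image patch, summed and divided by the number of kernel cells
    (9 for a 3x3 filter), as described in the video.
    """
    k = len(kernel)
    out = []
    for r in range(0, len(image) - k + 1, stride):
        row = []
        for c in range(0, len(image[0]) - k + 1, stride):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total / (k * k))
        out.append(row)
    return out


def relu(feature_map):
    """Replace negative values with zero and keep the rest unchanged."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value from each `size` x `size` window."""
    out = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            row.append(
                max(
                    feature_map[r + i][c + j]
                    for i in range(size)
                    for j in range(size)
                )
            )
        out.append(row)
    return out


# Toy 6x6 image: background is -1, with a vertical line of 1s in column 2.
image = [[1 if c == 2 else -1 for c in range(6)] for _ in range(6)]

# Hypothetical 3x3 "vertical line" filter.
kernel = [
    [-1, 1, -1],
    [-1, 1, -1],
    [-1, 1, -1],
]

feature_map = convolve2d(image, kernel)  # 4x4 feature map
activated = relu(feature_map)            # negatives become 0
pooled = max_pool(activated)             # 2x2 after 2x2 max pooling

for row in pooled:
    print(row)
```

The feature map is largest where the image patch matches the filter, ReLU removes the negative responses, and max pooling shrinks the 4x4 map to 2x2 while keeping the strongest response in each window.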
🔖 Hashtags 🔖
#convolutionalneuralnetwork #cnndeeplearning #cnntutorial #cnnmachinelearning #cnnalgorithm #cnnpython #cnntensorflow
Do you want to learn technology from me? Check https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description for my affordable video courses.
Deep learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO
Machine learning playlist : https://www.youtube.com/playlist?list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
Here are some good articles on CNNs:
Is CNN scale/rotation invariant?
https://stats.stackexchange.com/questions/239076/about-cnn-kernels-and-scale-rotation-invariance#:~:text=22-,1)%20The%20features%20extracted%20using%20CNN%20are%20scale%20and%20rotation,details%2C%20see%3A%20Deep%20Learning.&text=Convolution%20is%20not%20naturally%20equivariant,or%20rotation%20of%20an%20image.
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
Chapter on CNNs from the Deep Learning book: http://www.deeplearningbook.org/contents/convnets.html
🌎 My Website For Video Courses: https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description
Need help building software or data analytics and AI solutions? My company https://www.atliq.com/ can help. Click on the Contact button on that website.
#️⃣ Social Media #️⃣
🔗 Discord: https://discord.gg/r42Kbuk
📸 Dhaval's Personal Instagram: https://www.instagram.com/dhavalsays/
📸 Instagram: https://www.instagram.com/codebasicshub/
🔊 Facebook: https://www.facebook.com/codebasicshub
📱 Twitter: https://twitter.com/codebasicshub
📝 Linkedin: https://www.linkedin.com/company/codebasics/
DISCLAIMER: All opinions expressed in this video are my own and not those of my employers.
detail
{'title': 'Simple explanation of convolutional neural network | Deep Learning Tutorial 23 (Tensorflow & Python)', 'heatmap': [{'end': 421.022, 'start': 340.304, 'weight': 0.852}, {'end': 588.179, 'start': 567.624, 'weight': 0.817}, {'end': 804.092, 'start': 746.028, 'weight': 0.786}, {'end': 890.093, 'start': 843.095, 'weight': 0.767}, {'end': 992.008, 'start': 946.479, 'weight': 0.844}, {'end': 1050.38, 'start': 1018.677, 'weight': 0.743}, {'end': 1127.934, 'start': 1103.772, 'weight': 0.719}, {'end': 1224.931, 'start': 1188.297, 'weight': 0.888}], 'summary': "Tutorial on convolutional neural networks (cnn) covers the limitations of artificial neural networks for image recognition, the role of cnn in addressing these limitations, the process of feature detection using filters and creating feature maps through convolution operations, the benefits of max pooling in reducing image size and computation, and the basics of cnn including the benefits of relu and pooling, as well as insights into the creator's goals for their youtube channel.", 'chapters': [{'end': 163.996, 'segs': [{'end': 106.929, 'src': 'embed', 'start': 29.75, 'weight': 1, 'content': [{'end': 34.133, 'text': 'The issue with this presentation is that this is too much hard coded.', 'start': 29.75, 'duration': 4.383}, {'end': 40.458, 'text': 'If you have a little shift in digit 9, for example, 9 here was in the middle,', 'start': 35.174, 'duration': 5.284}, {'end': 46.863, 'text': 'but in this case it is in the left and the representation of numbers just changes.', 'start': 40.458, 'duration': 6.405}, {'end': 57.99, 'text': "It doesn't match with our original number grid and computer will not be able to recognize that this is number 9.", 'start': 47.683, 'duration': 10.307}, {'end': 60.832, 'text': 'There could be a variation since it is a handwritten digit.', 'start': 57.99, 'duration': 2.842}, {'end': 68.556, 'text': 'There could be variation in how you write it, which will change the two dimensional 
representation of numbers.', 'start': 61.752, 'duration': 6.804}, {'end': 72.118, 'text': 'And again, you will not be able to match it with the original grid.', 'start': 68.916, 'duration': 3.202}, {'end': 80.293, 'text': 'So we use artificial neural network for this kind of case to handle the variety.', 'start': 74.229, 'duration': 6.064}, {'end': 89.858, 'text': 'In this deep learning series, we have already looked at artificial neural network video on handwritten digits recognition.', 'start': 81.213, 'duration': 8.645}, {'end': 96.863, 'text': 'If you have not seen that video, please make sure you see it so that your fundamentals on artificial neural networks are clear.', 'start': 90.539, 'duration': 6.324}, {'end': 106.929, 'text': 'In that, we created a one-dimensional array by flattening the two-dimensional representation of our hand-written digit number.', 'start': 97.523, 'duration': 9.406}], 'summary': 'Hard-coded presentation causes recognition issues with number 9, requiring artificial neural network for handwritten digit recognition.', 'duration': 77.179, 'max_score': 29.75, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM29750.jpg'}, {'end': 163.996, 'src': 'embed', 'start': 134.064, 'weight': 0, 'content': [{'end': 140.066, 'text': 'in this case, the first layer, neuron itself, will be 6 million.', 'start': 134.064, 'duration': 6.002}, {'end': 143.928, 'text': "if you have, let's say, hidden layer with 4 million neurons,", 'start': 140.066, 'duration': 3.862}, {'end': 152.151, 'text': "you're talking about 24 million weights to be calculated just between the input and hidden layer.", 'start': 143.928, 'duration': 8.223}, {'end': 156.933, 'text': 'and remember, deep neural networks have many hidden layers,', 'start': 152.151, 'duration': 4.782}, {'end': 163.996, 'text': 'so this can go easily into like 500 million or 1 million of weights that you have to compute,', 'start': 156.933, 'duration': 
7.063}], 'summary': 'Neural network with 6 million neurons, 24 million weights for input-hidden layer, leading to 500-1000 million weights in deep networks.', 'duration': 29.932, 'max_score': 134.064, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM134064.jpg'}], 'start': 0.069, 'title': 'Understanding convolutional neural networks', 'summary': 'Discusses the limitations of artificial neural networks in recognizing handwritten digits, challenges with larger images, and the need for convolutional neural networks, with examples of computational complexity involved.', 'chapters': [{'end': 163.996, 'start': 0.069, 'title': 'Understanding convolutional neural networks', 'summary': 'Explains the limitations of artificial neural networks in recognizing handwritten digits and the challenges with larger images, emphasizing the need for convolutional neural networks, with examples of the computational complexity involved.', 'duration': 163.927, 'highlights': ['The computational complexity of using dense neural networks for larger images, such as a 1920x1080 image with RGB channels, is highlighted, with the mention of 24 million weights to be calculated between the input and hidden layer, and the potential for deep neural networks to involve hundreds of millions of weights.', 'The limitations of using dense neural networks for recognizing handwritten digits due to variations in representation and the need for artificial neural networks to handle such variety is explained, emphasizing the drawbacks of hard-coded approaches.', 'The concept of using artificial neural networks for recognizing handwritten digits and the challenges associated with variations in representation due to handwritten input is discussed, with an emphasis on the need for convolutional neural networks to address these challenges effectively.', 'The explanation of recognizing a handwritten digit 9 as a grid of numbers and the issues with hard-coded 
representations in computer recognition is provided, along with the impact of variations in handwritten input on the two-dimensional representation of numbers, leading to the need for artificial neural networks in such cases.']}], 'duration': 163.927, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM69.jpg', 'highlights': ['The computational complexity of using dense neural networks for larger images is highlighted, with the mention of 24 million weights to be calculated between the input and hidden layer.', 'The limitations of using dense neural networks for recognizing handwritten digits due to variations in representation and the need for artificial neural networks to handle such variety is explained.', 'The concept of using artificial neural networks for recognizing handwritten digits and the challenges associated with variations in representation due to handwritten input is discussed.', 'The explanation of recognizing a handwritten digit 9 as a grid of numbers and the issues with hard-coded representations in computer recognition is provided.']}, {'end': 487.976, 'segs': [{'end': 226.649, 'src': 'embed', 'start': 163.996, 'weight': 0, 'content': [{'end': 167.797, 'text': "and that's too much computation for your little computer.", 'start': 163.996, 'duration': 3.801}, {'end': 175.868, 'text': "see, my rabbits are getting electrical shock because it's just too much to do So.", 'start': 167.797, 'duration': 8.071}, {'end': 182.87, 'text': 'the disadvantages of using ANN or artificial neural network for image classification is too much computation.', 'start': 175.868, 'duration': 7.002}, {'end': 186.891, 'text': 'It also treats local pixels same as pixels far apart.', 'start': 183.75, 'duration': 3.141}, {'end': 193.313, 'text': "If you have koala's face in a left corner versus right corner, it is still a koala.", 'start': 187.651, 'duration': 5.662}, {'end': 195.414, 'text': "Doesn't matter where the face 
is located.", 'start': 193.593, 'duration': 1.821}, {'end': 203.199, 'text': 'so the image recognition task is centered around the locality.', 'start': 196.374, 'duration': 6.825}, {'end': 213.287, 'text': "okay, so if the pixels are moved around, it should still be able to detect the object in an image, but with ANN it's hard.", 'start': 203.199, 'duration': 10.088}, {'end': 217.794, 'text': 'So how does human recognize this image so easily?', 'start': 214.088, 'duration': 3.706}, {'end': 226.649, 'text': "So let's go into the neuroscience little bit and try to see how we as humans recognize any image so easily.", 'start': 217.834, 'duration': 8.815}], 'summary': 'Disadvantages of using ann for image classification: high computation, lacks locality, unlike human recognition.', 'duration': 62.653, 'max_score': 163.996, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM163996.jpg'}, {'end': 306.641, 'src': 'embed', 'start': 279.828, 'weight': 2, 'content': [{'end': 285.09, 'text': 'And there are different set of neurons which are connected to these neurons, which will again aggregate the results,', 'start': 279.828, 'duration': 5.262}, {'end': 292.272, 'text': "saying that if the image has Koala's head and body, it means it is Koala's image.", 'start': 285.09, 'duration': 7.182}, {'end': 297.114, 'text': 'Same thing with handwritten digit 9.', 'start': 294.513, 'duration': 2.601}, {'end': 306.641, 'text': 'There are these little edges which come together and form a loopy circle pattern which is kind of like a head of digit 9.', 'start': 297.114, 'duration': 9.527}], 'summary': 'Neural networks can recognize koalas and handwritten digit 9 based on specific patterns and connections in the neurons.', 'duration': 26.813, 'max_score': 279.828, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM279828.jpg'}, {'end': 421.022, 'src': 'heatmap', 'start': 340.304, 
'weight': 0.852, 'content': [{'end': 344.749, 'text': 'The first one is the head which is a loopy circle pattern.', 'start': 340.304, 'duration': 4.445}, {'end': 347.232, 'text': 'In the middle you have vertical line.', 'start': 345.59, 'duration': 1.642}, {'end': 350.215, 'text': 'In the end you have diagonal filter.', 'start': 348.533, 'duration': 1.682}, {'end': 363.353, 'text': 'So we take our original image and we will apply a convolution operation or a filter operation.', 'start': 354.827, 'duration': 8.526}, {'end': 367.476, 'text': 'So here I have a loopy circle pattern or a head filter.', 'start': 364.034, 'duration': 3.442}, {'end': 369.698, 'text': 'This filter right here.', 'start': 368.777, 'duration': 0.921}, {'end': 384.086, 'text': 'The way convolution operation works is you take 3 by 3 grid from your original image and multiply individual numbers with this filter.', 'start': 373.678, 'duration': 10.408}, {'end': 390.31, 'text': 'So this minus 1 is multiplied with this one, this one is multiplied with this one and so on.', 'start': 384.346, 'duration': 5.964}, {'end': 397.576, 'text': 'In the end you get a result and then you find the average which is divided by 9 because there are total 9 numbers.', 'start': 391.691, 'duration': 5.885}, {'end': 401.651, 'text': 'And whatever number you get, you put it here.', 'start': 399.249, 'duration': 2.402}, {'end': 404.953, 'text': 'Now this particular thing is called a feature map.', 'start': 402.631, 'duration': 2.322}, {'end': 409.155, 'text': 'So by doing this convolution operation, you are creating a feature map.', 'start': 405.553, 'duration': 3.602}, {'end': 414.358, 'text': 'So you do it for the second round of 3x3 grid.', 'start': 409.996, 'duration': 4.362}, {'end': 418.121, 'text': "Here I'm taking a stride of 1.", 'start': 414.759, 'duration': 3.362}, {'end': 421.022, 'text': 'You can take a stride of 2 or 3.', 'start': 418.121, 'duration': 2.901}], 'summary': 'Using convolution operation to 
create a feature map from the original image by applying a loopy circle pattern filter and finding the average of the result.', 'duration': 80.718, 'max_score': 340.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM340304.jpg'}, {'end': 401.651, 'src': 'embed', 'start': 373.678, 'weight': 3, 'content': [{'end': 384.086, 'text': 'The way convolution operation works is you take 3 by 3 grid from your original image and multiply individual numbers with this filter.', 'start': 373.678, 'duration': 10.408}, {'end': 390.31, 'text': 'So this minus 1 is multiplied with this one, this one is multiplied with this one and so on.', 'start': 384.346, 'duration': 5.964}, {'end': 397.576, 'text': 'In the end you get a result and then you find the average which is divided by 9 because there are total 9 numbers.', 'start': 391.691, 'duration': 5.885}, {'end': 401.651, 'text': 'And whatever number you get, you put it here.', 'start': 399.249, 'duration': 2.402}], 'summary': 'Convolution operation uses a 3x3 grid to multiply and average, resulting in a single value.', 'duration': 27.973, 'max_score': 373.678, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM373678.jpg'}], 'start': 163.996, 'title': 'Neural networks for image analysis', 'summary': 'Discusses the limitations of artificial neural networks (ann) for image classification, citing computational load and object recognition issues. 
it also explores how neural networks detect features in images and handwritten digits using filters, with examples of koalas and digit 9, and the process of creating feature maps through convolution operations.', 'chapters': [{'end': 226.649, 'start': 163.996, 'title': 'Ann for image classification', 'summary': 'Discusses the disadvantages of using artificial neural networks (ann) for image classification, citing the excessive computational load and the inability to effectively recognize objects regardless of their location in the image.', 'duration': 62.653, 'highlights': ['Artificial neural networks (ANN) for image classification require excessive computation, leading to a performance bottleneck for smaller computers.', 'ANN treats local pixels the same as pixels far apart, hindering its ability to effectively recognize objects regardless of their location in the image.', 'The image recognition task should be centered around the locality, allowing for the detection of objects even if the pixels are moved, which is challenging with ANN.', 'Human ability to recognize images easily is contrasted with the limitations of ANN, prompting a dive into neuroscience to understand the human recognition process.']}, {'end': 487.976, 'start': 227.768, 'title': 'Neural network feature detection', 'summary': 'Discusses how neural networks detect features in images and handwritten digits using filters, with examples of koalas and digit 9, and the process of creating feature maps through convolution operations.', 'duration': 260.208, 'highlights': ['Neural networks detect features in images using filters, as demonstrated with examples of koalas and digit 9.', 'The convolution operation involves multiplying a 3x3 grid from the original image with a filter to create a feature map.', "The benefit of creating a feature map is the ability to detect specific features, such as a loopy circle pattern for digit 9 or koala's eyes, nose, and ears."]}], 'duration': 323.98, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM163996.jpg', 'highlights': ['Artificial neural networks (ANN) for image classification require excessive computation, leading to a performance bottleneck for smaller computers.', 'The image recognition task should be centered around the locality, allowing for the detection of objects even if the pixels are moved, which is challenging with ANN.', 'Neural networks detect features in images using filters, as demonstrated with examples of koalas and digit 9.', 'The convolution operation involves multiplying a 3x3 grid from the original image with a filter to create a feature map.', 'ANN treats local pixels the same as pixels far apart, hindering its ability to effectively recognize objects regardless of their location in the image.']}, {'end': 838.813, 'segs': [{'end': 513.895, 'src': 'embed', 'start': 488.877, 'weight': 0, 'content': [{'end': 499.885, 'text': 'In summary, when you apply this filter or a convolution operation, you are generating a feature map that has that particular feature detected.', 'start': 488.877, 'duration': 11.008}, {'end': 503.267, 'text': 'So in a way, filters are nothing but the feature detectors.', 'start': 500.465, 'duration': 2.802}, {'end': 507.851, 'text': "For Koala's case, you can have eye detector.", 'start': 505.429, 'duration': 2.422}, {'end': 513.895, 'text': 'And when you apply convolution operation, in the result, see, you got these two eyes at this location.', 'start': 508.491, 'duration': 5.404}], 'summary': "Convolution operation generates feature map for detecting specific features like eyes in koala's case.", 'duration': 25.018, 'max_score': 488.877, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM488877.jpg'}, {'end': 666.635, 'src': 'heatmap', 'start': 567.624, 'weight': 2, 'content': [{'end': 578.094, 'text': 'in case of nine, we saw that we need to apply three filters the 
head, the middle part and the tail, And when you apply those,', 'start': 567.624, 'duration': 10.47}, {'end': 579.775, 'text': 'you get three feature maps.', 'start': 578.094, 'duration': 1.681}, {'end': 581.696, 'text': 'So I applied three filters.', 'start': 580.355, 'duration': 1.341}, {'end': 588.179, 'text': 'I got three feature maps, and this is how these feature maps are represented.', 'start': 582.336, 'duration': 5.843}, {'end': 596.103, 'text': "If you're reading any online article or a book, they're kind of stacked together and they almost form a 3D volume.", 'start': 588.199, 'duration': 7.904}, {'end': 610.357, 'text': 'In case of Koala, My eye, nose and ear filters will produce three different feature maps and I can apply convolution operation again.', 'start': 598.344, 'duration': 12.013}, {'end': 614.541, 'text': "And let's say this time the filter is to detect head.", 'start': 611.819, 'duration': 2.722}, {'end': 618.065, 'text': "By the way, the filter doesn't have to be 2D.", 'start': 615.282, 'duration': 2.783}, {'end': 620.207, 'text': 'It can be three dimensional as well.', 'start': 618.585, 'duration': 1.622}, {'end': 631.471, 'text': 'So just imagine this first dimension is representing eyes, and the second slice is representing nose, and third slice is representing ears.', 'start': 621.886, 'duration': 9.585}, {'end': 639.676, 'text': "And by doing that filter, you can say that Koala's head is in this particular region of an image.", 'start': 632.272, 'duration': 7.404}, {'end': 644.599, 'text': 'So you are aggregating these results using a different filter for head.', 'start': 639.696, 'duration': 4.903}, {'end': 649.624, 'text': 'and now this becomes a koala head detector.', 'start': 645.479, 'duration': 4.145}, {'end': 653.588, 'text': 'similarly, there could be koala body detector.', 'start': 649.624, 'duration': 3.964}, {'end': 657.211, 'text': 'and now we got these two new feature maps,', 'start': 653.588, 'duration': 3.623}, {'end': 
663.438, 'text': "where this feature map is saying that koala's head is at this location and koala's body is at this particular location.", 'start': 657.211, 'duration': 6.227}, {'end': 666.635, 'text': 'then we flatten these numbers.', 'start': 665.094, 'duration': 1.541}], 'summary': 'Applying filters in convolutional neural networks generates feature maps, forming 3d volumes, enabling detection of specific features in images.', 'duration': 88.541, 'max_score': 567.624, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM567624.jpg'}, {'end': 813.938, 'src': 'heatmap', 'start': 746.028, 'weight': 5, 'content': [{'end': 754.632, 'text': 'And the second portion, where we are using dense neural network, is called classification, because the first part is detecting all the features ears,', 'start': 746.028, 'duration': 8.604}, {'end': 756.573, 'text': 'nose, eyes, head and body, et cetera.', 'start': 754.632, 'duration': 1.941}, {'end': 760.955, 'text': 'And the second part is responsible for classification.', 'start': 757.493, 'duration': 3.462}, {'end': 765.033, 'text': 'We also perform a ReLU operation.', 'start': 762.512, 'duration': 2.521}, {'end': 768.814, 'text': 'So this is not a complete convolutional neural network.', 'start': 765.333, 'duration': 3.481}, {'end': 771.214, 'text': 'There are two other components.', 'start': 769.314, 'duration': 1.9}, {'end': 779.897, 'text': 'One is ReLU, which is nothing but if you have seen my activation video on the same deep learning tutorial series.', 'start': 771.475, 'duration': 8.422}, {'end': 786.523, 'text': 'We use ReLU activation to bring non-linearity in our model.', 'start': 781.46, 'duration': 5.063}, {'end': 795.367, 'text': 'So what it will do is it will take your feature map and whatever negative values are there, it just replaces that with zero.', 'start': 787.483, 'duration': 7.884}, {'end': 796.668, 'text': 'It is so easy.', 'start': 795.888, 
'duration': 0.78}, {'end': 801.05, 'text': 'And if the value is more than zero, it will keep it as it is.', 'start': 798.109, 'duration': 2.941}, {'end': 802.891, 'text': 'So you see, just look at the values.', 'start': 801.31, 'duration': 1.581}, {'end': 804.092, 'text': "It's pretty straightforward.", 'start': 802.991, 'duration': 1.101}, {'end': 813.938, 'text': 'ReLU helps with making the model nonlinear because you are picking bunch of values and making them zero.', 'start': 805.851, 'duration': 8.087}], 'summary': 'Using dense neural network for classification, including relu operation to introduce non-linearity.', 'duration': 26.455, 'max_score': 746.028, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM746028.jpg'}], 'start': 488.877, 'title': 'Convolutional filters and cnn feature maps', 'summary': 'Explores the role of convolutional filters as feature detectors, enabling location-invariant feature detection, demonstrated by detecting eyes and hands of koalas. 
it also discusses how cnns use filters to create feature maps for classification and the involvement of relu activation for non-linearity.', 'chapters': [{'end': 578.094, 'start': 488.877, 'title': 'Convolutional filter for feature detection', 'summary': 'Explains how convolutional filters act as feature detectors, allowing for location-invariant feature detection, as demonstrated in the example of detecting eyes and hands of koalas using convolution operations.', 'duration': 89.217, 'highlights': ['Convolutional filters act as feature detectors, enabling location-invariant feature detection.', 'Demonstration of detecting eyes and hands of koalas using convolution operations.', 'Explanation of applying multiple filters for detecting different parts of an object.']}, {'end': 838.813, 'start': 578.094, 'title': 'Cnn feature maps and classification', 'summary': 'Explains how convolutional neural networks use filters to create feature maps, which are then aggregated and processed through a dense neural network for classification, while also addressing the role of relu activation in bringing non-linearity to the model.', 'duration': 260.719, 'highlights': ['Convolutional operation creates feature maps representing different features like eyes, nose, and ears, which are then aggregated using different filters for head and body detection.', 'Aggregating the results using different filters leads to the creation of feature maps for head and body detection, which are then flattened and processed through a dense neural network for classification.', 'The ReLU activation function is used to introduce non-linearity by replacing negative values with zero and keeping positive values as they are, which helps in making the model non-linear.']}], 'duration': 349.936, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM488877.jpg', 'highlights': ['Convolutional filters act as feature detectors, enabling location-invariant feature 
detection.', 'Demonstration of detecting eyes and hands of koalas using convolution operations.', 'Explanation of applying multiple filters for detecting different parts of an object.', 'Convolutional operation creates feature maps representing different features like eyes, nose, and ears.', 'Aggregating the results using different filters leads to the creation of feature maps for head and body detection.', 'The ReLU activation function is used to introduce non-linearity by replacing negative values with zero.']}, {'end': 1163.488, 'segs': [{'end': 890.093, 'src': 'heatmap', 'start': 843.095, 'weight': 0.767, 'content': [{'end': 846.657, 'text': 'Do something because see for this image size.', 'start': 843.095, 'duration': 3.562}, {'end': 851.26, 'text': "If you are applying convolution, let's say with some padding.", 'start': 847.598, 'duration': 3.662}, {'end': 855.142, 'text': "You're still getting same size of image.", 'start': 852.781, 'duration': 2.361}, {'end': 857.384, 'text': 'You did not reduce the image size.', 'start': 855.363, 'duration': 2.021}, {'end': 862.667, 'text': "Sometimes people don't use padding, so they reduce the image size, but only little bit.", 'start': 857.984, 'duration': 4.683}, {'end': 867.388, 'text': 'So pooling is used to reduce the size.', 'start': 864.642, 'duration': 2.746}, {'end': 874.926, 'text': "The main purpose of pooling is to reduce the dimension so that my computer doesn't get this shock, you know.", 'start': 867.448, 'duration': 7.478}, {'end': 879.888, 'text': 'so the first pooling operation is um, the max pooling.', 'start': 874.926, 'duration': 4.962}, {'end': 887.231, 'text': 'so here you take a window of two by two and you pick the maximum number from that window and put it here.', 'start': 879.888, 'duration': 7.343}, {'end': 890.093, 'text': 'so here check this yellow window five, one, eight, two.', 'start': 887.231, 'duration': 2.862}], 'summary': 'Pooling reduces image size, e.g. 
max pooling uses 2x2 window to pick maximum number.', 'duration': 46.998, 'max_score': 843.095, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM843095.jpg'}, {'end': 1012.611, 'src': 'heatmap', 'start': 946.479, 'weight': 0, 'content': [{'end': 947.18, 'text': 'You get an idea.', 'start': 946.479, 'duration': 0.701}, {'end': 950.081, 'text': 'And we keep on taking max.', 'start': 948.72, 'duration': 1.361}, {'end': 952.882, 'text': 'And this is what we get.', 'start': 951.681, 'duration': 1.201}, {'end': 955.835, 'text': 'when our number is shifted.', 'start': 954.153, 'duration': 1.682}, {'end': 960.54, 'text': 'so see, this is the original number where we got this max pooling map.', 'start': 955.835, 'duration': 4.705}, {'end': 964.444, 'text': 'when number is shifted, you get this pooling map.', 'start': 960.54, 'duration': 3.904}, {'end': 970.33, 'text': 'so still you are detecting the loopy pattern at the top.', 'start': 964.444, 'duration': 5.886}, {'end': 981.603, 'text': 'so max pooling, along with convolution, helps you with, uh, position invariant feature detection.', 'start': 970.33, 'duration': 11.273}, {'end': 987.106, 'text': "doesn't matter where your eyes or ears are in the image, it will detect that feature.", 'start': 981.603, 'duration': 5.503}, {'end': 992.008, 'text': 'for you, there is average pooling also, instead of max.', 'start': 987.106, 'duration': 4.902}, {'end': 993.069, 'text': 'you just make an average.', 'start': 992.008, 'duration': 1.061}, {'end': 997.371, 'text': 'see five and one, six and two, eight, eight and eight, sixteen.', 'start': 993.069, 'duration': 4.302}, {'end': 999.692, 'text': 'sixteen, divided by four, is four.', 'start': 997.371, 'duration': 2.321}, {'end': 1003.834, 'text': 'so, but max pooling is more generally used.', 'start': 999.692, 'duration': 4.142}, {'end': 1005.835, 'text': 'but sometimes people use average pooling also.', 'start': 1003.834, 
'duration': 2.001}, {'end': 1012.611, 'text': "So benefits of pooling number one obvious, it's reducing your dimension and computation.", 'start': 1007.386, 'duration': 5.225}], 'summary': 'Max pooling helps with position invariant feature detection, reducing dimension and computation.', 'duration': 56.776, 'max_score': 946.479, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM946479.jpg'}, {'end': 1050.38, 'src': 'heatmap', 'start': 1018.677, 'weight': 0.743, 'content': [{'end': 1024.082, 'text': 'And the third one is the model is tolerant towards variation and distortion.', 'start': 1018.677, 'duration': 5.405}, {'end': 1033.631, 'text': "because if there is a distortion and if you're picking just a maximum number, you are capturing the main feature and you are filtering all the noise.", 'start': 1024.082, 'duration': 9.549}, {'end': 1040.377, 'text': 'So this is how a complete convolutional neural network looks like.', 'start': 1035.474, 'duration': 4.903}, {'end': 1050.38, 'text': 'In that, you will have typically a convolution and ReLU layer, then you will have pooling, then there will be another convolution ReLU pooling.', 'start': 1041.696, 'duration': 8.684}], 'summary': 'CNN model is tolerant of variation and distortion, capturing main features and filtering noise.', 'duration': 31.703, 'max_score': 1018.677, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM1018677.jpg'}, {'end': 1163.488, 'src': 'heatmap', 'start': 1103.772, 'weight': 2, 'content': [{'end': 1106.434, 'text': 'There are three benefits of convolution operation.', 'start': 1103.772, 'duration': 2.662}, {'end': 1111.619, 'text': 'The first one is connection sparsity reduces overfitting.', 'start': 1107.035, 'duration': 4.584}, {'end': 1123.85, 'text': 'Connection sparsity means not every node is connected with every other node like in artificial neural network where we call that a dense 
network.', 'start': 1112.94, 'duration': 10.91}, {'end': 1127.934, 'text': 'Here we have a filter which we move around the image.', 'start': 1124.991, 'duration': 2.943}, {'end': 1131.927, 'text': 'And at a time, we are talking about only a local region.', 'start': 1128.484, 'duration': 3.443}, {'end': 1134.169, 'text': 'So we are not affecting the whole image.', 'start': 1132.387, 'duration': 1.782}, {'end': 1138.672, 'text': 'The second benefit is that the convolution and pooling', 'start': 1135.329, 'duration': 3.343}, {'end': 1149.841, 'text': 'operations combined give you location-invariant feature detection, which means the koala could be in the left corner, in the right corner, anywhere.', 'start': 1138.672, 'duration': 11.169}, {'end': 1151.803, 'text': 'We will still detect it.', 'start': 1150.522, 'duration': 1.281}, {'end': 1163.488, 'text': 'Third is parameter sharing, which means that once you learn the parameters for a filter, you can apply them across the entire image.', 'start': 1153.106, 'duration': 10.382}], 'summary': 'Convolution operation offers benefits such as connection sparsity, location invariant feature detection, and parameter sharing.', 'duration': 50.548, 'max_score': 1103.772, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM1103772.jpg'}], 'start': 839.333, 'title': 'Max pooling and cnn benefits', 'summary': 'Discusses max pooling for image size reduction, resulting in a feature map half the size, reducing computation, and compares it to average pooling. 
it also explains the structure of cnn, emphasizing feature extraction and benefits such as connection sparsity, location invariant feature detection, and parameter sharing for efficient learning.', 'chapters': [{'end': 1012.611, 'start': 839.333, 'title': 'Max pooling for image size reduction', 'summary': 'Discusses the concept of max pooling for reducing image dimensions by taking the maximum value from a window, resulting in a new feature map half the size, leading to a significant reduction in computation, with a comparison to average pooling and emphasis on position invariant feature detection.', 'duration': 173.278, 'highlights': ['Max pooling involves taking the maximum value from a window to reduce the size of the feature map, resulting in significant computation savings, for example, reducing 16 numbers into 4.', 'The concept of stride in max pooling is explained, illustrating the process of moving a window and taking the maximum value, showcasing its effectiveness in detecting position invariant features.', 'Max pooling, in conjunction with convolution, aids in position invariant feature detection, regardless of the location of the feature in the image, emphasizing its utility in image analysis.', 'The use of average pooling as an alternative to max pooling is mentioned, although max pooling is more commonly utilized due to its benefits in dimension reduction and computation.']}, {'end': 1163.488, 'start': 1013.332, 'title': 'Convolutional neural networks: feature extraction and benefits', 'summary': 'Explains the structure of a convolutional neural network, highlighting the feature extraction process and the three main benefits of convolution operations, including connection sparsity for reducing overfitting, location invariant feature detection, and parameter sharing for efficient learning.', 'duration': 150.156, 'highlights': ['Convolution and pooling operations provide location invariant feature detection, ensuring the detection of features regardless 
of their position, resulting in improved model tolerance towards variation and distortion.', 'Connection sparsity in convolutional neural networks reduces overfitting by not connecting every node with every other node, leading to fewer parameters and improved model generalization.', 'The first convolution layer in a complete convolutional neural network detects specific features such as eye, nose, and ears, contributing to the feature extraction process.']}], 'duration': 324.155, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM839333.jpg', 'highlights': ['Max pooling reduces feature map size, saving significant computation, e.g., 16 to 4', 'Max pooling aids in position invariant feature detection, enhancing image analysis', 'Convolution and pooling operations ensure location invariant feature detection', 'Connection sparsity in CNN reduces overfitting, leading to improved generalization']}, {'end': 1431.356, 'segs': [{'end': 1224.931, 'src': 'heatmap', 'start': 1165.689, 'weight': 0, 'content': [{'end': 1175.431, 'text': 'The benefit of ReLU is that it introduces non-linearity, which is essential because when we are solving deep learning problems,', 'start': 1165.689, 'duration': 9.742}, {'end': 1177.331, 'text': 'they are non-linear by nature.', 'start': 1175.431, 'duration': 1.9}, {'end': 1182.174, 'text': 'It also speeds up training and it is faster to compute.', 'start': 1178.692, 'duration': 3.482}, {'end': 1187.497, 'text': 'Remember, with ReLU you are just doing one check: whether the number is greater than zero or not.', 'start': 1182.974, 'duration': 4.523}, {'end': 1191.659, 'text': 'If it is greater than zero, keep the number; if it is less than zero, make it zero.', 'start': 1188.297, 'duration': 3.362}, {'end': 1198.283, 'text': 'The benefit of pooling is that it reduces dimension and computation.', 'start': 1194.321, 'duration': 3.962}, {'end': 1203.986, 'text': 'It reduces overfitting and makes the model 
tolerant towards small distortions.', 'start': 1198.823, 'duration': 5.163}, {'end': 1207.898, 'text': 'How about rotation and thickness?', 'start': 1205.416, 'duration': 2.482}, {'end': 1218.686, 'text': 'By itself, a CNN cannot handle rotation and thickness,', 'start': 1207.898, 'duration': 10.788}, {'end': 1224.931, 'text': 'so you need to have training samples that include some rotated and scaled samples.', 'start': 1218.686, 'duration': 6.245}], 'summary': 'ReLU introduces non-linearity and speeds up training. Pooling reduces dimension, computation, and overfitting, making the model tolerant towards small distortions.', 'duration': 25.97, 'max_score': 1165.689, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM1165689.jpg'}, {'end': 1330.227, 'src': 'embed', 'start': 1305.656, 'weight': 1, 'content': [{'end': 1315.678, 'text': 'But this is the beauty of a convolutional neural network: it will automatically detect these filters on its own, and that is part of the training.', 'start': 1305.656, 'duration': 10.022}, {'end': 1324.701, 'text': "So when the neural network is training, or when the CNN is training, because you're supplying thousands of koala images here, using that,", 'start': 1315.678, 'duration': 9.023}, {'end': 1330.227, 'text': 'it will use backpropagation and figure out the right values for these filters.', 'start': 1324.701, 'duration': 5.526}], 'summary': 'Convolutional neural network learns filters automatically during training, using thousands of koala images and backpropagation.', 'duration': 24.571, 'max_score': 1305.656, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM1305656.jpg'}, {'end': 1404.17, 'src': 'embed', 'start': 1380.304, 'weight': 2, 'content': [{'end': 1386.788, 'text': 'I teach data science, machine learning, Python programming and career guidance on my YouTube channel.', 
'start': 1380.304, 'duration': 6.484}, {'end': 1394.869, 'text': 'If you are starting machine learning and if you are looking for a very basic beginners level of tutorials, then I have a complete playlist.', 'start': 1387.728, 'duration': 7.141}, {'end': 1404.17, 'text': 'You can start with very basic Python and pandas knowledge on this playlist and can learn machine learning in a very very easy to understand manner.', 'start': 1395.768, 'duration': 8.402}], 'summary': 'Youtube channel offers beginner-friendly tutorials on data science, machine learning, and python programming.', 'duration': 23.866, 'max_score': 1380.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM1380304.jpg'}], 'start': 1165.689, 'title': 'Convolutional neural network basics', 'summary': "Explains the benefits of relu and pooling in cnn, such as introducing non-linearity, reducing dimension and computation, handling rotation and scale, and automatically detecting filters. it also outlines the process of training a cnn and offers insights into the creator's goals for their youtube channel.", 'chapters': [{'end': 1431.356, 'start': 1165.689, 'title': 'Convolutional neural network basics', 'summary': "Explains the benefits of relu and pooling in cnn, such as introducing non-linearity, reducing dimension and computation, handling rotation and scale, and automatically detecting filters. 
it also outlines the process of training a cnn and offers insights into the creator's goals for their youtube channel.", 'duration': 265.667, 'highlights': ['The benefits of ReLU and pooling in CNN include introducing non-linearity, reducing dimension and computation, handling rotation and scale, and automatically detecting filters.', 'The process of training a CNN involves supplying thousands of images, using back propagation to figure out the right amount of filters and their values, and specifying the number and size of filters as hyperparameters.', "Insights into the creator's goals for their YouTube channel include teaching data science, machine learning, Python programming, and career guidance, with a focus on providing basic beginners level tutorials and covering deep learning topics."]}], 'duration': 265.667, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/zfiSAzpy9NM/pics/zfiSAzpy9NM1165689.jpg', 'highlights': ['The benefits of ReLU and pooling in CNN include introducing non-linearity, reducing dimension and computation, handling rotation and scale, and automatically detecting filters.', 'The process of training a CNN involves supplying thousands of images, using back propagation to figure out the right amount of filters and their values, and specifying the number and size of filters as hyperparameters.', "Insights into the creator's goals for their YouTube channel include teaching data science, machine learning, Python programming, and career guidance, with a focus on providing basic beginners level tutorials and covering deep learning topics."]}], 'highlights': ['Max pooling reduces feature map size, saving significant computation, e.g., 16 to 4', 'The benefits of ReLU and pooling in CNN include introducing non-linearity, reducing dimension and computation, handling rotation and scale, and automatically detecting filters', 'Convolutional filters act as feature detectors, enabling location-invariant feature detection', 'The 
process of training a CNN involves supplying thousands of images, using back propagation to figure out the right amount of filters and their values, and specifying the number and size of filters as hyperparameters', 'The computational complexity of using dense neural networks for larger images is highlighted, with the mention of 24 million weights to be calculated between the input and hidden layer']}