title
Gradient descent, how neural networks learn | Chapter 2, Deep learning

description
Enjoy these videos? Consider sharing one or two. Help fund future projects: https://www.patreon.com/3blue1brown
Special thanks to these supporters: http://3b1b.co/nn2-thanks
Written/interactive form of this series: https://www.3blue1brown.com/topics/neural-networks

This video was supported by Amplify Partners. For any early-stage ML startup founders, Amplify Partners would love to hear from you via 3blue1brown@amplifypartners.com

To learn more, I highly recommend the book by Michael Nielsen: http://neuralnetworksanddeeplearning.com/
The book walks through the code behind the example in these videos, which you can find here: https://github.com/mnielsen/neural-networks-and-deep-learning
MNIST database: http://yann.lecun.com/exdb/mnist/

Also check out Chris Olah's blog: http://colah.github.io/
His post on Neural networks and topology is particularly beautiful, but honestly all of the stuff there is great. And if you like that, you'll *love* the publications at distill: https://distill.pub/

For more videos, Welch Labs also has some great series on machine learning:
https://youtu.be/i8D90DkCLhI
https://youtu.be/bxe2T-V8XRs

"But I've already voraciously consumed Nielsen's, Olah's and Welch's works", I hear you say. Well well, look at you then. That being the case, I might recommend that you continue on with the book "Deep Learning" by Goodfellow, Bengio, and Courville.

Thanks to Lisha Li (@lishali88) for her contributions at the end, and for letting me pick her brain so much about the material. Here are the articles she referenced at the end:
https://arxiv.org/abs/1611.03530
https://arxiv.org/abs/1706.05394
https://arxiv.org/abs/1412.0233

Music by Vincent Rubinetti: https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

-------------------

Video timeline
0:00 - Introduction
0:30 - Recap
1:49 - Using training data
3:01 - Cost functions
6:55 - Gradient descent
11:18 - More on gradient vectors
12:19 - Gradient descent recap
13:01 - Analyzing the network
16:37 - Learning more
17:38 - Lisha Li interview
19:58 - Closing thoughts

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube: if you want to stay posted on new videos, subscribe, and click the bell to receive notifications (if you're into that). If you are new to this channel and want to see more, a good place to start is this playlist: http://3b1b.co/recommended

Various social media stuffs:
Website: https://www.3blue1brown.com
Twitter: https://twitter.com/3Blue1Brown
Patreon: https://patreon.com/3blue1brown
Facebook: https://www.facebook.com/3blue1brown
Reddit: https://www.reddit.com/r/3Blue1Brown

detail
{'title': 'Gradient descent, how neural networks learn | Chapter 2, Deep learning', 'heatmap': [{'end': 255.923, 'start': 219.617, 'weight': 0.703}, {'end': 582.091, 'start': 564.215, 'weight': 0.739}, {'end': 616.988, 'start': 599.728, 'weight': 0.938}, {'end': 840.895, 'start': 824.166, 'weight': 0.711}], 'summary': 'Explores neural network learning, covering gradient descent as the learning algorithm, neural network basics, cost function, and weight changes, achieving a classification accuracy of 98% on new images with potential for improvement, and delving into modern image recognition networks.', 'chapters': [{'end': 169.892, 'segs': [{'end': 169.892, 'src': 'embed', 'start': 124.082, 'weight': 0, 'content': [{'end': 130.023, 'text': "and it'll adjust those 13, 000 weights and biases so as to improve its performance on the training data.", 'start': 124.082, 'duration': 5.941}, {'end': 135.585, 'text': 'Hopefully this layered structure will mean that what it learns generalizes to images.', 'start': 130.803, 'duration': 4.782}, {'end': 140.667, 'text': 'beyond that training data And the way we test that is that after you train the network,', 'start': 135.585, 'duration': 5.082}, {'end': 146.537, 'text': "you show it more labeled data that it's never seen before and you see how accurately it classifies those new images.", 'start': 140.667, 'duration': 5.87}, {'end': 154.461, 'text': 'Fortunately for us, and what makes this such a common example, to start with,', 'start': 151.078, 'duration': 3.383}, {'end': 161.466, 'text': 'is that the good people behind the MNIST database have put together a collection of tens of thousands of handwritten digit images,', 'start': 154.461, 'duration': 7.005}, {'end': 164.008, 'text': "each one labeled with the numbers that they're supposed to be.", 'start': 161.466, 'duration': 2.542}, {'end': 169.892, 'text': 'And as provocative as it is to describe a machine as learning, once you actually see how it works,', 'start': 165.029, 'duration': 4.863}], 'summary': 'Neural network adjusts 13,000 weights to generalize learning to new images, tested on labeled data for accuracy.', 'duration': 45.81, 'max_score': 124.082, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w124082.jpg'}], 'start': 4.406, 'title': 'Neural network learning', 'summary': "Introduces gradient descent, the learning algorithm for neural networks, focusing on a neural network's structure for handwritten digit recognition and its ability to generalize to new images, leveraging a dataset with tens of thousands of labeled handwritten digit images.", 'chapters': [{'end': 169.892, 'start': 4.406, 'title': 'Neural network learning', 'summary': "Introduces gradient descent, the learning algorithm for neural networks, with a focus on a neural network's structure for handwritten digit recognition and its ability to generalize to new images, leveraging a dataset with tens of thousands of labeled handwritten digit images.", 'duration': 165.486, 'highlights': ['The neural network has 13,000 weights and biases that can be adjusted to improve its performance on training data.', "The network's structure aims to allow it to generalize to images beyond the training data, tested by classifying new images that it hasn't seen before.", 'The MNIST database contains tens of thousands of labeled handwritten digit images, serving as a common example for training and testing neural networks.']}], 'duration': 165.486, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w4406.jpg', 'highlights': ['The neural network has 13,000 weights and biases for performance improvement.', "The network's structure aims to generalize to new images beyond training data.", 'The MNIST database contains tens of thousands of labeled handwritten digit images.']}, {'end': 618.648, 'segs': [{'end': 215.876, 'src': 'embed', 'start': 188.286, 'weight': 4, 'content': [{'end': 194.529, 'text': 'and the weights in the weighted sum defining its activation are kind of like the strengths of those connections,', 'start': 188.286, 'duration': 6.243}, {'end': 198.951, 'text': 'and the bias is some indication of whether that neuron tends to be active or inactive.', 'start': 194.529, 'duration': 4.422}, {'end': 204.39, 'text': "And to start things off, we're just gonna initialize all of those weights and biases totally randomly.", 'start': 199.707, 'duration': 4.683}, {'end': 210.713, 'text': "Needless to say, this network is gonna perform pretty horribly on a given training example, since it's just doing something random.", 'start': 204.89, 'duration': 5.823}, {'end': 215.876, 'text': 'For example, you feed in this image of a three, and the output layer, it just looks like a mess.', 'start': 211.113, 'duration': 4.763}], 'summary': "Neural network's random initialization leads to poor performance.", 'duration': 27.59, 'max_score': 188.286, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w188286.jpg'}, {'end': 255.923, 'src': 'heatmap', 'start': 219.617, 'weight': 0.703, 'content': [{'end': 228.441, 'text': 'a way of telling the computer no bad computer that output should have activations which are zero for most neurons, but one for this neuron.', 'start': 219.617, 'duration': 8.824}, {'end': 230.722, 'text': 'What you gave me is utter trash.', 'start': 228.842, 'duration': 1.88}, {'end': 233.684, 'text': 'To say that a little more mathematically.', 'start': 231.683, 'duration': 2.001}, {'end': 241.167, 'text': 'what you do is add up the squares of the differences between each of those trash output activations and the value that you want them to have.', 'start': 233.684, 'duration': 7.483}, {'end': 244.949, 'text': "And this is what we'll call the cost of a single training example.", 'start': 241.807, 'duration': 3.142}, {'end': 252.582, 'text': 'Notice, this sum is small when the network confidently classifies the image correctly,', 'start': 245.979, 'duration': 6.603}, {'end': 255.923, 'text': "but it's large when the network seems like it doesn't really know what it's doing.", 'start': 252.582, 'duration': 3.341}], 'summary': 'Training cost measures confidence in image classification.', 'duration': 36.306, 'max_score': 219.617, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w219617.jpg'}, {'end': 319.953, 'src': 'embed', 'start': 289.525, 'weight': 3, 'content': [{'end': 292.748, 'text': 'Well, the cost function is a layer of complexity on top of that.', 'start': 289.525, 'duration': 3.223}, {'end': 297.571, 'text': 'It takes as its input those 13, 000 or so weights and biases,', 'start': 293.268, 'duration': 4.303}, {'end': 308.901, 'text': "and it spits out a single number describing how bad those weights and biases are and the way it's defined depends on the network's behavior over all the tens of thousands of pieces of training data.", 'start': 297.571, 'duration': 11.33}, {'end': 
310.522, 'text': "That's a lot to think about.", 'start': 309.521, 'duration': 1.001}, {'end': 315.789, 'text': "But just telling the computer what a crappy job it's doing isn't very helpful.", 'start': 312.366, 'duration': 3.423}, {'end': 319.953, 'text': 'You want to tell it how to change those weights and biases so that it gets better.', 'start': 316.31, 'duration': 3.643}], 'summary': 'Cost function evaluates 13,000 weights and biases, aiming to improve network behavior.', 'duration': 30.428, 'max_score': 289.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w289525.jpg'}, {'end': 582.091, 'src': 'heatmap', 'start': 538.676, 'weight': 0, 'content': [{'end': 550.705, 'text': 'changing the weights and biases to decrease it means making the output of the network on each piece of training data look less like a random array of 10 values and more like an actual decision that we want it to make.', 'start': 538.676, 'duration': 12.029}, {'end': 556.809, 'text': "It's important to remember this cost function involves an average over all of the training data.", 'start': 551.465, 'duration': 5.344}, {'end': 560.832, 'text': "so if you minimize it, it means it's a better performance on all of those samples.", 'start': 556.809, 'duration': 4.023}, {'end': 571.982, 'text': 'The algorithm for computing this gradient efficiently, which is effectively the heart of how a neural network learns, is called backpropagation,', 'start': 564.215, 'duration': 7.767}, {'end': 573.884, 'text': "and it's what I'm going to be talking about next video.", 'start': 571.982, 'duration': 1.902}, {'end': 582.091, 'text': 'There, I really want to take the time to walk through what exactly happens to each weight and each bias for a given piece of training data,', 'start': 574.584, 'duration': 7.507}], 'summary': 'Adjusting weights and biases improves network performance by making outputs resemble desired decisions. 
backpropagation is vital for efficient gradient computation and neural network learning.', 'duration': 43.415, 'max_score': 538.676, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w538676.jpg'}, {'end': 626.79, 'src': 'heatmap', 'start': 599.728, 'weight': 2, 'content': [{'end': 604.793, 'text': "one consequence of that is that it's important for this cost function to have a nice, smooth output,", 'start': 599.728, 'duration': 5.065}, {'end': 608.056, 'text': 'so that we can find a local minimum by taking little steps downhill.', 'start': 604.793, 'duration': 3.263}, {'end': 616.988, 'text': 'This is why, by the way, artificial neurons have continuously ranging activations rather than simply being active or inactive in a binary way,', 'start': 609.226, 'duration': 7.762}, {'end': 618.648, 'text': 'the way that biological neurons are.', 'start': 616.988, 'duration': 1.66}, {'end': 626.79, 'text': 'This process of repeatedly nudging an input of a function by some multiple of the negative gradient is called gradient descent.', 'start': 620.309, 'duration': 6.481}], 'summary': 'Cost function needs smooth output for gradient descent in artificial neurons.', 'duration': 27.062, 'max_score': 599.728, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w599728.jpg'}], 'start': 169.892, 'title': 'Neural network basics and cost function', 'summary': 'Covers neural network fundamentals, including neurons, weights, and biases, and explains the cost function, gradient descent, and backpropagation, emphasizing the importance of a smooth cost function and the impact of minimizing it on overall performance.', 'chapters': [{'end': 241.167, 'start': 169.892, 'title': 'Neural network fundamentals', 'summary': 'Explains the fundamentals of neural networks, including the concept of neurons, weights, biases, and the process of defining a cost function to improve network performance.', 'duration': 71.275, 'highlights': ['The process of initializing weights and biases randomly leads to poor performance on training examples.', 'Defining a cost function involves adding up the squares of the differences between output activations and the desired values.', 'Explaining the concept of neurons as being connected to all neurons in the previous layer, and how weights and biases influence their activation.']}, {'end': 618.648, 'start': 241.807, 'title': 'Neural network cost function', 'summary': 'Discusses the concept of the cost function, the gradient descent algorithm, and the role of backpropagation in efficiently computing the gradient, emphasizing the importance of a smooth cost function in network learning and the impact of minimizing it on overall performance.', 'duration': 376.841, 'highlights': ['Backpropagation is the algorithm for efficiently computing the gradient of the cost function, effectively the heart of how a neural network learns.', 'The cost function involves an average over all training data, so minimizing it leads to better performance across all samples.', "The smoothness of the cost function's output is vital for finding a local minimum through little steps downhill, influencing the need for continuously ranging activations in artificial neurons."]}], 'duration': 448.756, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w169892.jpg', 'highlights': ['Backpropagation is the algorithm for efficiently computing the gradient of the cost 
function, effectively the heart of how a neural network learns.', 'The cost function involves an average over all training data, so minimizing it leads to better performance across all samples.', "The smoothness of the cost function's output is vital for finding a local minimum through little steps downhill, influencing the need for continuously ranging activations in artificial neurons.", 'Defining a cost function involves adding up the squares of the differences between output activations and the desired values.', 'Explaining the concept of neurons as being connected to all neurons in the previous layer, and how weights and biases influence their activation.', 'The process of initializing weights and biases randomly leads to poor performance on training examples.']}, {'end': 777.755, 'segs': [{'end': 644.096, 'src': 'embed', 'start': 620.309, 'weight': 1, 'content': [{'end': 626.79, 'text': 'This process of repeatedly nudging an input of a function by some multiple of the negative gradient is called gradient descent.', 'start': 620.309, 'duration': 6.481}, {'end': 632.512, 'text': "It's a way to converge towards some local minimum of a cost function, basically a valley in this graph.", 'start': 627.351, 'duration': 5.161}, {'end': 638.353, 'text': "I'm still showing the picture of a function with two inputs, of course, because nudges in a 13,", 'start': 633.43, 'duration': 4.923}, {'end': 644.096, 'text': '000-dimensional input space are a little hard to wrap your mind around, but there is actually a nice non-spatial way to think about this.', 'start': 638.353, 'duration': 5.743}], 'summary': 'Gradient descent converges to local minimum of cost function.', 'duration': 23.787, 'max_score': 620.309, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w620309.jpg'}, {'end': 698.577, 'src': 'embed', 'start': 655.763, 'weight': 0, 'content': [{'end': 662.607, 'text': 'But importantly, the relative magnitudes of all these components kind of tells you which changes matter more.', 'start': 655.763, 'duration': 6.844}, {'end': 666.884, 'text': 'You see in our network,', 'start': 665.543, 'duration': 1.341}, {'end': 672.987, 'text': 'an adjustment to one of the weights might have a much greater impact on the cost function than the adjustment to some other weight.', 'start': 666.884, 'duration': 6.103}, {'end': 678.089, 'text': 'Some of these connections just matter more for our training data.', 'start': 674.828, 'duration': 3.261}, {'end': 688.215, 'text': 'So a way that you can think about this gradient vector of our mind-warpingly massive cost function is that it encodes the relative importance of each weight and bias.', 'start': 679.33, 'duration': 8.885}, {'end': 692.277, 'text': 'That is, which of these changes is going to carry the most bang for your buck.', 'start': 688.695, 'duration': 3.582}, {'end': 696.655, 'text': 'This really is just another way of thinking about direction.', 'start': 693.973, 'duration': 2.682}, {'end': 698.577, 'text': 'To take a simpler example,', 'start': 697.276, 'duration': 1.301}], 'summary': "The gradient vector encodes the relative importance of each weight and bias in the network's training data.", 'duration': 42.814, 'max_score': 655.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w655763.jpg'}, {'end': 774.354, 'src': 'embed', 'start': 739.784, 'weight': 3, 'content': [{'end': 742.245, 'text': "Alright, let's zoom out and sum up where we are so 
far.", 'start': 739.784, 'duration': 2.461}, {'end': 749.987, 'text': 'The network itself is this function with 784 inputs and 10 outputs, defined in terms of all of these weighted sums.', 'start': 742.785, 'duration': 7.202}, {'end': 753.588, 'text': 'The cost function is a layer of complexity on top of that.', 'start': 750.767, 'duration': 2.821}, {'end': 761.63, 'text': 'It takes the 13, 000 weights and biases as inputs and spits out a single measure of lousiness based on the training examples.', 'start': 754.088, 'duration': 7.542}, {'end': 766.831, 'text': 'and the gradient of the cost function is one more layer of complexity still.', 'start': 762.489, 'duration': 4.342}, {'end': 774.354, 'text': 'It tells us what nudges to all of these weights and biases cause the fastest change to the value of the cost function,', 'start': 767.451, 'duration': 6.903}], 'summary': 'Neural network has 784 inputs, 10 outputs, and 13,000 weights. gradient provides direction for fastest change.', 'duration': 34.57, 'max_score': 739.784, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w739784.jpg'}], 'start': 620.309, 'title': 'Gradient descent and weight changes', 'summary': 'Delves into the concept of gradient descent and the significance of weight changes in a neural network, emphasizing the role of the gradient vector in encoding the importance of each weight and bias.', 'chapters': [{'end': 777.755, 'start': 620.309, 'title': 'Gradient descent and importance of weight changes', 'summary': 'Explains the concept of gradient descent and the relative importance of weight changes in a neural network, with emphasis on the gradient vector encoding the importance of each weight and bias.', 'duration': 157.446, 'highlights': ['The gradient vector encodes the relative importance of each weight and bias, determining which changes matter more.', 'The process of gradient descent is a way to converge towards a local minimum of a cost function, with the gradient indicating the fastest change to the value of the cost function.', 'The relative magnitudes of the components of the negative gradient determine which changes to weights and biases carry the most impact, aiding in understanding which connections matter more for the training data.', 'The function with 784 inputs and 10 outputs is defined in terms of weighted sums, with the cost function providing a measure of lousiness based on the training examples.', 'The concept of gradient descent and the importance of weight changes is illustrated using a function with two variables, emphasizing the interpretation of the gradient as encoding the relative importance of each variable.']}], 'duration': 157.446, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w620309.jpg', 'highlights': ['The gradient vector encodes the relative importance of each weight and bias, determining which changes matter more.', 'The process of gradient descent converges towards a local minimum of a cost function, with the gradient indicating the fastest change.', 'The relative magnitudes of the components of the negative gradient determine which changes to weights and biases carry the most impact.', 'The function with 784 inputs and 10 outputs is defined in terms of weighted sums, with the cost function providing a measure of lousiness based on the training examples.', 'The concept of gradient descent and the importance of weight changes is illustrated using a function with two 
variables.']}, {'end': 1056.35, 'segs': [{'end': 831.97, 'src': 'embed', 'start': 802.203, 'weight': 0, 'content': [{'end': 805.883, 'text': 'It classifies about 96% of the new images that it sees correctly.', 'start': 802.203, 'duration': 3.68}, {'end': 812.345, 'text': 'And honestly, if you look at some of the examples that it messes up on, you kind of feel compelled to cut it a little slack.', 'start': 806.804, 'duration': 5.541}, {'end': 821.924, 'text': 'Now if you play around with the hidden layer structure and make a couple tweaks, you can get this up to 98%.', 'start': 816.161, 'duration': 5.763}, {'end': 822.645, 'text': "And that's pretty good!", 'start': 821.924, 'duration': 0.721}, {'end': 824.166, 'text': "It's not the best.", 'start': 823.085, 'duration': 1.081}, {'end': 831.97, 'text': 'you can certainly get better performance by getting more sophisticated than this plain vanilla network, but given how daunting the initial task is,', 'start': 824.166, 'duration': 7.804}], 'summary': 'The model correctly classifies 96% of new images, but can be improved to 98% with tweaks.', 'duration': 29.767, 'max_score': 802.203, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w802203.jpg'}, {'end': 847.176, 'src': 'heatmap', 'start': 824.166, 'weight': 0.711, 'content': [{'end': 831.97, 'text': 'you can certainly get better performance by getting more sophisticated than this plain vanilla network, but given how daunting the initial task is,', 'start': 824.166, 'duration': 7.804}, {'end': 837.693, 'text': "I just think there's something incredible about any network doing this well on images that it's never seen before,", 'start': 831.97, 'duration': 5.723}, {'end': 840.895, 'text': 'given that we never specifically told it what patterns to look for.', 'start': 837.693, 'duration': 3.202}, {'end': 847.176, 'text': 'Originally, the way that I motivated this structure was by describing a hope that we might have.', 'start': 842.614, 'duration': 4.562}], 'summary': 'Plain vanilla network performs well on unseen images without specific training, demonstrating the potential of the structure.', 'duration': 23.01, 'max_score': 824.166, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w824166.jpg'}, {'end': 947.621, 'src': 'embed', 'start': 905.905, 'weight': 3, 'content': [{'end': 908.628, 'text': "doesn't exactly pick up on the patterns that we might have hoped for.", 'start': 905.905, 'duration': 2.723}, {'end': 913.745, 'text': 'And to really drive this point home, watch what happens when you input a random image.', 'start': 909.724, 'duration': 4.021}, {'end': 921.347, 'text': 'If the system was smart, you might expect it to either feel uncertain, maybe not really activating any of those 10 output neurons,', 'start': 914.505, 'duration': 6.842}, {'end': 922.808, 'text': 'or activating them all evenly.', 'start': 921.347, 'duration': 1.461}, {'end': 927.229, 'text': 'But instead it confidently gives you some nonsense answer,', 'start': 923.608, 'duration': 3.621}, {'end': 934.491, 'text': 'as if it feels as sure that this random noise is a 5 as it does that an actual image of a 5 is a 5..', 'start': 927.229, 'duration': 7.262}, {'end': 940.636, 'text': 'Phrased differently, even if this network can recognize digits pretty well, it has no idea how to draw them.', 'start': 934.491, 'duration': 6.145}, {'end': 945.179, 'text': "A lot of this is because it's such a tightly constrained 
training setup.", 'start': 941.837, 'duration': 3.342}, {'end': 947.621, 'text': "I mean, put yourself in the network's shoes here.", 'start': 945.8, 'duration': 1.821}], 'summary': 'Neural network struggles with random input, lacks pattern recognition and certainty.', 'duration': 41.716, 'max_score': 905.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w905905.jpg'}, {'end': 1029.035, 'src': 'embed', 'start': 998.737, 'weight': 6, 'content': [{'end': 1006.182, 'text': "Shifting the focus for a moment from how networks learn to how you learn? That'll only happen if you engage actively with the material here somehow.", 'start': 998.737, 'duration': 7.445}, {'end': 1020.772, 'text': 'One pretty simple thing that I want you to do is just pause right now and think deeply for a moment about what changes you might make to this system and how it perceives images if you wanted it to better pick up on things like edges and patterns.', 'start': 1007.083, 'duration': 13.689}, {'end': 1029.035, 'text': 'But better than that, to actually engage with the material, I highly recommend the book by Michael Nielsen on deep learning and neural networks.', 'start': 1021.786, 'duration': 7.249}], 'summary': 'Engage actively with the material to enhance learning. consider changes to image perception for improved edge and pattern recognition.', 'duration': 30.298, 'max_score': 998.737, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w998737.jpg'}], 'start': 782.938, 'title': 'Neural network image recognition', 'summary': 'Discusses the performance of a neural network with two hidden layers of 16 neurons, achieving a classification accuracy of 98% on new images, with potential for improvement. 
it also explores the limitations of a basic neural network in recognizing images, highlighting expected patterns and lack of understanding in recognizing random images.', 'chapters': [{'end': 824.166, 'start': 782.938, 'title': 'Neural network image classification', 'summary': 'Discusses the performance of a neural network with two hidden layers of 16 neurons, achieving a classification accuracy of 98% on new images, with the potential to improve further through structural tweaks.', 'duration': 41.228, 'highlights': ['The neural network with two hidden layers of 16 neurons achieves a classification accuracy of 98% on new images through structural tweaks.', 'The network initially classifies about 96% of new images correctly, demonstrating its solid performance on unseen data.', "Adjusting the hidden layer structure and making tweaks can further improve the classification accuracy, showcasing the network's potential for enhancement."]}, {'end': 1056.35, 'start': 824.166, 'title': "Neural network's image recognition", 'summary': "Explores the limitations of a basic neural network in recognizing images, highlighting how it doesn't pick up on expected patterns, leading to a lack of understanding and confidence in recognizing random images, despite successfully classifying most images.", 'duration': 232.184, 'highlights': ["The network doesn't pick up on the expected patterns, leading to a lack of understanding and confidence in recognizing random images.", 'The network successfully classifies most images but lacks understanding of image patterns.', "The network's tightly constrained training setup limits its ability to perceive images and make confident decisions.", 'Engaging actively with the material and seeking additional resources is recommended for better understanding.']}], 'duration': 273.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w782938.jpg', 'highlights': ['The neural network with two hidden layers of 16 neurons achieves a classification accuracy of 98% on new images through structural tweaks.', 'The network initially classifies about 96% of new images correctly, demonstrating its solid performance on unseen data.', "Adjusting the hidden layer structure and making tweaks can further improve the classification accuracy, showcasing the network's potential for enhancement.", "The network doesn't pick up on the expected patterns, leading to a lack of understanding and confidence in recognizing random images.", "The network's tightly constrained training setup limits its ability to perceive images and make confident decisions.", 'The network successfully classifies most images but lacks understanding of image patterns.', 'Engaging actively with the material and seeking additional resources is recommended for better understanding.']}, {'end': 1213.071, 'segs': [{'end': 1135.416, 'src': 'embed', 'start': 1095.195, 'weight': 0, 'content': [{'end': 1100.859, 'text': 'But it was still able to achieve the same training accuracy as you would on a properly labeled dataset.', 'start': 1095.195, 'duration': 5.664}, {'end': 1108.203, 'text': 'Basically, the millions of weights for this particular network were enough for it to just memorize the random data,', 'start': 1101.599, 'duration': 6.604}, {'end': 1114.906, 'text': 'which kind of raises the question for whether minimizing this cost function actually corresponds to any sort of structure in the image.', 'start': 1108.203, 'duration': 6.703}, {'end': 1116.787, 'text': 'or is it 
just, you know, memorization?', 'start': 1114.906, 'duration': 1.881}, {'end': 1119.649, 'text': 'Memorize the entire data set of what the correct classification is.', 'start': 1116.807, 'duration': 2.842}, {'end': 1128.893, 'text': 'And so a couple of you know, half a year later, at ICML this year, there was not exactly rebuttal paper paper that addressed some aspects of like.', 'start': 1120.049, 'duration': 8.844}, {'end': 1132.154, 'text': 'hey, actually these networks are doing something a little bit smarter than that.', 'start': 1128.893, 'duration': 3.261}, {'end': 1135.416, 'text': 'If you look at that accuracy curve.', 'start': 1132.194, 'duration': 3.222}], 'summary': 'Training achieved same accuracy as properly labeled dataset, raising questions about cost function and structure in the image.', 'duration': 40.221, 'max_score': 1095.195, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w1095195.jpg'}, {'end': 1189.279, 'src': 'embed', 'start': 1168.508, 'weight': 2, 'content': [{'end': 1179.293, 'text': 'and so it was also interesting about that is it brings into light another paper from actually a couple of years ago which has a lot more simplifications about the network layers.', 'start': 1168.508, 'duration': 10.785}, {'end': 1183.396, 'text': 'but one of the results was saying how, if you look at the optimization landscape,', 'start': 1179.293, 'duration': 4.103}, {'end': 1189.279, 'text': 'the local minima that these networks tend to learn are actually of equal quality.', 'start': 1183.396, 'duration': 5.883}], 'summary': 'Networks tend to learn local minima of equal quality.', 'duration': 20.771, 'max_score': 1168.508, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w1168508.jpg'}, {'end': 1213.071, 'src': 'embed', 'start': 1207.564, 'weight': 3, 'content': [{'end': 1213.071, 'text': 'I also want to give a special thanks to the VC firm Amplify Partners and their support of these initial videos in the series.', 'start': 1207.564, 'duration': 5.507}], 'summary': 'Acknowledgment to vc firm amplify partners for supporting initial videos in the series', 'duration': 5.507, 'max_score': 1207.564, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w1207564.jpg'}], 'start': 1058.646, 'title': 'Modern image recognition networks', 'summary': 'Delves into modern image recognition networks, exploring a study involving a deep neural network achieving training accuracy on a randomly labeled dataset, and a rebuttal paper analyzing accuracy curves and optimization landscapes.', 'chapters': [{'end': 1213.071, 'start': 1058.646, 'title': 'Modern image recognition networks', 'summary': "Discusses how modern image recognition networks learn, including a study that shows a deep neural network achieving the same training accuracy on a randomly labeled dataset, raising questions about the network's learning process, and a rebuttal paper that suggests these networks are doing something smarter by analyzing accuracy curves and optimization landscapes.", 'duration': 154.425, 'highlights': ["The study demonstrates a deep neural network achieving the same training accuracy on a randomly labeled dataset, raising questions about the network's learning process and the potential for memorization over actual structural understanding.", 'A rebuttal paper at ICML suggests that modern image recognition networks are doing something smarter, as 
evidenced by the analysis of accuracy curves, indicating that structured datasets allow for faster attainment of accuracy levels compared to random datasets.', 'A previous paper highlights that the local minima learned by these networks are of equal quality, implying that structured datasets enable easier identification of optimal solutions.', 'The support from Patreon and the VC firm Amplify Partners is acknowledged for making the videos in the series possible.']}], 'duration': 154.425, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/IHZwWFHWa-w/pics/IHZwWFHWa-w1058646.jpg', 'highlights': ['A rebuttal paper at ICML suggests that modern image recognition networks are doing something smarter, as evidenced by the analysis of accuracy curves, indicating that structured datasets allow for faster attainment of accuracy levels compared to random datasets.', "The study demonstrates a deep neural network achieving the same training accuracy on a randomly labeled dataset, raising questions about the network's learning process and the potential for memorization over actual structural understanding.", 'A previous paper highlights that the local minima learned by these networks are of equal quality, implying that structured datasets enable easier identification of optimal solutions.', 'The support from Patreon and the VC firm Amplify Partners is acknowledged for making the videos in the series possible.']}], 'highlights': ['The neural network has 13,000 weights and biases for performance improvement.', "The network's structure aims to generalize to new images beyond training data.", 'The MNIST database contains tens of thousands of labeled handwritten digit images.', 'A rebuttal paper at ICML suggests that modern image recognition networks are doing something smarter, as evidenced by the analysis of accuracy curves, indicating that structured datasets allow for faster attainment of accuracy levels compared to random datasets.', 'The neural network with two hidden layers of 16 neurons achieves a classification accuracy of 98% on new images through structural tweaks.', 'The process of gradient descent converges towards a local minimum of a cost function, with the gradient indicating the fastest change.', 'The cost function involves an average over all training data, so minimizing it leads to better performance across all samples.', "The smoothness of the cost function's output is vital for finding a local minimum through little steps downhill, influencing the need for continuously ranging activations in artificial neurons.", 'The gradient vector encodes the relative importance of each weight and bias, determining which changes matter more.', "The study demonstrates a deep neural network achieving the same training accuracy on a randomly labeled dataset, raising questions about the network's learning process and the potential for memorization over actual structural understanding."]}
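The chapters above describe the network itself as a function with 784 pixel inputs and 10 outputs, built from layers of weighted sums plus biases (roughly 13,000 adjustable parameters in total, all initialized randomly at the start). Below is a minimal sketch of that forward pass in plain NumPy, assuming the 784-16-16-10 layer sizes quoted in the chapter summaries and a sigmoid squashing function (the series' choice of activation; the excerpts above only say activations range continuously). None of the names here come from the video's actual code; they are illustrative.

```python
import numpy as np

def sigmoid(z):
    # Squash each weighted sum into a continuously ranging activation in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes quoted in the summaries above: 784 input pixels, two hidden
# layers of 16 neurons, 10 output neurons (13,002 weights and biases in total).
sizes = [784, 16, 16, 10]

# Start with totally random weights and biases, as in the video's starting point.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]

def feedforward(a):
    """Map a 784x1 column of pixel brightnesses to a 10x1 column of output activations."""
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# A made-up "image": 784 random pixel intensities; the output, as expected, looks like a mess.
print(feedforward(rng.random((784, 1))).ravel())
```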
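The quadratic cost is described twice above: add up the squares of the differences between the trash output activations and the values you want them to have, then average that over all of the training data so that a single number says how badly the ~13,000 weights and biases are doing. Continuing the sketch above (so `feedforward`, `weights`, `biases`, and `rng` are the hypothetical names defined there), that reads:

```python
def cost_single(image, label):
    # Cost of a single training example: squared distance between the actual output
    # and the desired output (zero for most neurons, one for the correct digit).
    desired = np.zeros((10, 1))
    desired[label] = 1.0
    return float(np.sum((feedforward(image) - desired) ** 2))

def total_cost(training_data):
    # The cost function proper: it effectively takes all ~13,000 weights and biases
    # as input (they are read inside feedforward) and spits out one number, the
    # average single-example cost over the whole training set. It is small when the
    # network confidently classifies images correctly, large when it has no idea.
    return sum(cost_single(x, y) for x, y in training_data) / len(training_data)

# A toy, entirely made-up "training set" of two random images labeled 3 and 7:
toy_data = [(rng.random((784, 1)), 3), (rng.random((784, 1)), 7)]
print(total_cost(toy_data))
```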
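Gradient descent, as recapped above, is the loop of computing the gradient of that cost, nudging every weight and bias by some small multiple of the negative gradient, and repeating; the relative magnitudes of the gradient's components say which changes carry the most bang for the buck. Backpropagation, the next chapter's topic, is the efficient way to get that gradient. The sketch below instead uses a deliberately slow finite-difference estimate so it stays self-contained, and the learning rate is illustrative rather than a value taken from the video.

```python
def numerical_gradient(param, cost_fn, eps=1e-4):
    # Stand-in for backpropagation: nudge each entry of `param` up and down and
    # watch how the average cost responds. Hopelessly slow for real training,
    # but it makes the gradient concrete: one number per weight or bias saying
    # how sensitive the cost is to that particular parameter.
    grad = np.zeros_like(param)
    it = np.nditer(param, flags=["multi_index"])
    while not it.finished:
        i = it.multi_index
        original = param[i]
        param[i] = original + eps
        up = cost_fn()
        param[i] = original - eps
        down = cost_fn()
        param[i] = original
        grad[i] = (up - down) / (2 * eps)
        it.iternext()
    return grad

def gradient_descent_step(training_data, learning_rate=3.0):
    # One nudge downhill: evaluate the full gradient at the current weights and
    # biases, then move every parameter against its gradient component.
    # Repeating this step over and over is gradient descent.
    cost_fn = lambda: total_cost(training_data)
    params = weights + biases
    grads = [numerical_gradient(p, cost_fn) for p in params]
    for p, g in zip(params, grads):
        p -= learning_rate * g
```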
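Finally, the testing procedure described above: after training, show the network labeled images it has never seen and measure how often it classifies them correctly (about 96% in the video, around 98% after tweaks to the hidden layers). A minimal sketch, assuming `test_data` is another hypothetical list of (image, label) pairs and that the network's answer is taken to be its brightest output neuron:

```python
def accuracy(test_data):
    # Fraction of never-before-seen images whose brightest output activation
    # matches the correct label.
    correct = sum(int(np.argmax(feedforward(x)) == y) for x, y in test_data)
    return correct / len(test_data)

print(accuracy(toy_data))  # with random weights, expect roughly chance-level performance
```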