title
Neural Networks from Scratch - P.6 Softmax Activation

description
The what and why of the Softmax Activation function with deep learning.
Neural Networks from Scratch book: https://nnfs.io
Playlist for this series: https://www.youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3
Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Instagram: https://instagram.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
#nnfs #python #neuralnetworks

detail
{'title': 'Neural Networks from Scratch - P.6 Softmax Activation', 'heatmap': [{'end': 666.145, 'start': 610.678, 'weight': 0.744}, {'end': 1879.141, 'start': 1853.468, 'weight': 1}], 'summary': 'Covers the softmax activation function for the output layer in neural network models, including its purpose, usage, and implementation using numpy for exponentiation, normalization, batch processing, and matrix reshaping. it also addresses the limitations of existing activation functions and offers solutions, providing practical insights for training neural networks.', 'chapters': [{'end': 62.2, 'segs': [{'end': 42.616, 'src': 'embed', 'start': 4.946, 'weight': 0, 'content': [{'end': 9.668, 'text': "What's going on, everybody? Welcome to part six of the Neural Networks from Scratch video series.", 'start': 4.946, 'duration': 4.722}, {'end': 10.389, 'text': 'In this video.', 'start': 9.949, 'duration': 0.44}, {'end': 14.411, 'text': "what we're going to be talking about and covering is the Softmax activation function,", 'start': 10.389, 'duration': 4.022}, {'end': 20.966, 'text': 'which is specifically used for the output layer on our classification-style neural network models.', 'start': 14.411, 'duration': 6.555}, {'end': 23.368, 'text': 'Before we get into that, a quick update.', 'start': 21.527, 'duration': 1.841}, {'end': 27.23, 'text': 'The Neural Networks from Scratch book is now fully released.', 'start': 23.508, 'duration': 3.722}, {'end': 30.212, 'text': 'We have a hardcover, which you see here, as well as a softcover.', 'start': 27.59, 'duration': 2.622}, {'end': 32.713, 'text': 'Also, the e-book now has a PDF download.', 'start': 30.612, 'duration': 2.101}, {'end': 35.835, 'text': 'All books give you access to that PDF and e-book.', 'start': 33.474, 'duration': 2.361}, {'end': 42.616, 'text': 'and we still also have the Google Doc, so you can still highlight and ask questions in line with the text and all that.', 'start': 36.375, 'duration': 6.241}], 'summary': 'Part six covers softmax activation for classification models. neural networks from scratch book is fully released with hardcover, softcover, and e-book options.', 'duration': 37.67, 'max_score': 4.946, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU4946.jpg'}], 'start': 4.946, 'title': 'Softmax activation function', 'summary': 'Covers the softmax activation function for the output layer in classification-style neural network models. the neural networks from scratch book is fully released with hardcover, softcover, and e-book options available, providing pdf download and google doc access at nnfs.io.', 'chapters': [{'end': 62.2, 'start': 4.946, 'title': 'Neural networks: softmax activation', 'summary': 'Covers the softmax activation function, used for the output layer in classification-style neural network models. 
the neural networks from scratch book is fully released with hardcover, softcover, and e-book options available, providing pdf download and google doc access at nnfs.io.', 'duration': 57.254, 'highlights': ['The Neural Networks from Scratch book is now fully released, offering hardcover, softcover, and e-book options with PDF download and Google Doc access at nnfs.io.', 'The chapter discusses the Softmax activation function, specifically used for the output layer in classification-style neural network models.']}], 'duration': 57.254, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU4946.jpg', 'highlights': ['The Neural Networks from Scratch book is now fully released, offering hardcover, softcover, and e-book options with PDF download and Google Doc access at nnfs.io.', 'The chapter discusses the Softmax activation function, specifically used for the output layer in classification-style neural network models.']}, {'end': 784.544, 'segs': [{'end': 244.51, 'src': 'embed', 'start': 219.588, 'weight': 2, 'content': [{'end': 225.314, 'text': 'But the problem with this is like the rectified linear activation function, for example, is exclusive.', 'start': 219.588, 'duration': 5.726}, {'end': 228.454, 'text': "per neuron, they're not really connected in any way.", 'start': 226.092, 'duration': 2.362}, {'end': 232.016, 'text': 'So there is no relative comparison that you could really fairly make.', 'start': 228.534, 'duration': 3.482}, {'end': 236.158, 'text': 'The next problem is these are unbounded.', 'start': 233.417, 'duration': 2.741}, {'end': 244.51, 'text': 'So the relative closeness can vary considerably between each sample that you pass through this neural network.', 'start': 236.318, 'duration': 8.192}], 'summary': 'Challenges with rectified linear activation function: exclusivity, lack of connection, unboundedness, and variability in relative closeness between samples.', 'duration': 24.922, 'max_score': 219.588, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU219588.jpg'}, {'end': 300.273, 'src': 'embed', 'start': 270.987, 'weight': 3, 'content': [{'end': 275.211, 'text': "which in this case, we're going to be using the softmax activation function to help us solve.", 'start': 270.987, 'duration': 4.224}, {'end': 281.576, 'text': "Now, why softmax? Well, we need to kind of, again, kind of consider what's our end objective here.", 'start': 275.591, 'duration': 5.985}, {'end': 289.623, 'text': "So zooming out to a full model for the moment, what actually do we want to happen? So let's say we've got some image data.", 'start': 281.896, 'duration': 7.727}, {'end': 293.866, 'text': "We're going to pass that through the neural network, and then we get the output values.", 'start': 289.663, 'duration': 4.203}, {'end': 300.273, 'text': 'Now, what do we want those values to be? 
So ideally, these values would be a probability distribution.', 'start': 293.966, 'duration': 6.307}], 'summary': 'Using softmax activation to achieve probability distribution in neural network.', 'duration': 29.286, 'max_score': 270.987, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU270987.jpg'}, {'end': 510.652, 'src': 'embed', 'start': 482.18, 'weight': 1, 'content': [{'end': 487.602, 'text': 'So what this does for us is it solves our negatives issue by making sure no negative,', 'start': 482.18, 'duration': 5.422}, {'end': 492.004, 'text': 'or really no value can be negative at the output of the exponential function.', 'start': 487.602, 'duration': 4.402}, {'end': 496.866, 'text': 'And it does this while not tossing away the value or the meaning of that negativity.', 'start': 492.724, 'duration': 4.142}, {'end': 498.866, 'text': "It's still on a scale, let's say.", 'start': 496.926, 'duration': 1.94}, {'end': 501.007, 'text': 'So the exponentiation of 1.1 is 3 or less.', 'start': 498.946, 'duration': 2.061}, {'end': 504.529, 'text': 'three and some change.', 'start': 503.468, 'duration': 1.061}, {'end': 510.652, 'text': 'the exponentiation of negative 1.1 is 0.3329 or so.', 'start': 504.529, 'duration': 6.123}], 'summary': 'Exponential function prevents negative values, e.g. 1.1^3=3, -1.1^0.3329', 'duration': 28.472, 'max_score': 482.18, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU482180.jpg'}, {'end': 666.145, 'src': 'heatmap', 'start': 610.678, 'weight': 0.744, 'content': [{'end': 612.358, 'text': 'Checking the book to make sure it is identical.', 'start': 610.678, 'duration': 1.68}, {'end': 613.279, 'text': 'And it looks like it is.', 'start': 612.619, 'duration': 0.66}, {'end': 614.687, 'text': 'Okay, great.', 'start': 613.786, 'duration': 0.901}, {'end': 616.148, 'text': "So that's exponentiation.", 'start': 614.947, 'duration': 1.201}, {'end': 618.09, 'text': 'Really, not much to it at that point.', 'start': 616.249, 'duration': 1.841}, {'end': 624.136, 'text': "So the next step, once we've exponentiated these values, is to normalize the values.", 'start': 618.831, 'duration': 5.305}, {'end': 626.359, 'text': 'So what do we mean by normalize the values?', 'start': 624.176, 'duration': 2.183}, {'end': 635.248, 'text': "So, in our case, it's going to be a single output neuron's value, divided by the sum of all of the other output neurons in that output layer.", 'start': 626.699, 'duration': 8.549}, {'end': 639.937, 'text': 'And this gives us the probability distribution that we want.', 'start': 636.013, 'duration': 3.924}, {'end': 647.424, 'text': 'But we still want to exponentiate before this point, because again we need to get rid of all of these negative values,', 'start': 640.537, 'duration': 6.887}, {'end': 651.368, 'text': 'but we do not want to lose the meaning of the negative value.', 'start': 647.424, 'duration': 3.944}, {'end': 658.395, 'text': "So we're exponentiating to convert negatives to positives without actually losing the meaning of a negative value.", 'start': 651.488, 'duration': 6.907}, {'end': 666.145, 'text': "So continuing along in our raw Python implementation here, let's go ahead and code in normalization.", 'start': 660.441, 'duration': 5.704}], 'summary': 'Exponentiate and normalize values to achieve probability distribution in python implementation.', 'duration': 55.467, 'max_score': 610.678, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU610678.jpg'}, {'end': 647.424, 'src': 'embed', 'start': 618.831, 'weight': 0, 'content': [{'end': 624.136, 'text': "So the next step, once we've exponentiated these values, is to normalize the values.", 'start': 618.831, 'duration': 5.305}, {'end': 626.359, 'text': 'So what do we mean by normalize the values?', 'start': 624.176, 'duration': 2.183}, {'end': 635.248, 'text': "So, in our case, it's going to be a single output neuron's value, divided by the sum of all of the other output neurons in that output layer.", 'start': 626.699, 'duration': 8.549}, {'end': 639.937, 'text': 'And this gives us the probability distribution that we want.', 'start': 636.013, 'duration': 3.924}, {'end': 647.424, 'text': 'But we still want to exponentiate before this point, because again we need to get rid of all of these negative values,', 'start': 640.537, 'duration': 6.887}], 'summary': 'Exponentiate values, then normalize them for probability distribution.', 'duration': 28.593, 'max_score': 618.831, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU618831.jpg'}], 'start': 62.2, 'title': 'Activation functions in neural networks', 'summary': 'Introduces the softmax activation function, explaining its purpose and usage in training neural networks, and discusses the limitations of using accuracy to measure model performance, highlighting challenges with existing activation functions and the solution provided by exponentiation and normalization.', 'chapters': [{'end': 163.228, 'start': 62.2, 'title': 'Softmax activation function explained', 'summary': 'Introduces the softmax activation function, explaining its purpose and usage in training neural networks, using hypothetical output values and demonstrating the process of prediction and training.', 'duration': 101.028, 'highlights': ['The chapter explains the purpose of the softmax activation function in training neural networks and its usage in predicting the output values, using the example of hypothetical output values 4.8, 1.21, 2.385.', 'The instructor emphasizes the need for understanding the process of training neural networks, highlighting the complexity involved and the extensive time required to comprehend and learn the training process.', 'The instructor discusses the process of predicting output values using the softmax activation function, illustrating the determination of the largest value and its corresponding index as the prediction.', 'The chapter presents the concept of starting fresh with a new script for illustration purposes before returning to the previously built code for further development, emphasizing the step-by-step framework building approach.']}, {'end': 784.544, 'start': 163.768, 'title': 'Neural network activation functions', 'summary': 'Discusses the limitations of using accuracy to measure model performance and explains the need for a new activation function, highlighting the challenges with rectified linear and linear activation functions and the solution provided by exponentiation and normalization.', 'duration': 620.776, 'highlights': ['The need for a new activation function is explained due to the limitations of accuracy in measuring model performance, highlighting the challenges with rectified linear and linear activation functions.', 'Exponentiation is introduced as a solution to the negativity issue in the output values, ensuring that no value can be negative at the output 
without losing its meaning.', 'The process of normalization is detailed, which involves dividing each exponentiated value by the sum of all the other output values to obtain the desired probability distribution.']}], 'duration': 722.344, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU62200.jpg', 'highlights': ['The process of normalization is detailed, involving dividing each exponentiated value by the sum of all other output values to obtain the desired probability distribution.', 'Exponentiation is introduced as a solution to the negativity issue in the output values, ensuring that no value can be negative at the output without losing its meaning.', 'The need for a new activation function is explained due to the limitations of accuracy in measuring model performance, highlighting the challenges with rectified linear and linear activation functions.', 'The chapter explains the purpose of the softmax activation function in training neural networks and its usage in predicting the output values, using the example of hypothetical output values 4.8, 1.21, 2.385.']}, {'end': 1005.588, 'segs': [{'end': 863.819, 'src': 'embed', 'start': 810.571, 'weight': 1, 'content': [{'end': 819.098, 'text': 'And rather than doing really all of this, we delete and exp values just becomes np.exp.', 'start': 810.571, 'duration': 8.527}, {'end': 826.259, 'text': 'And what are we applying this exponential function to? Layer outputs.', 'start': 821.312, 'duration': 4.947}, {'end': 829.743, 'text': 'So, by default, typically NumPy functions.', 'start': 826.659, 'duration': 3.084}, {'end': 840.633, 'text': "what they're going to do is first, they will just by default, impact every value, And if you want it to be in a little bit more specific way,", 'start': 829.743, 'duration': 10.89}, {'end': 843.754, 'text': "you can become more specific, and we'll actually be showing that very shortly.", 'start': 840.633, 'duration': 3.121}, {'end': 849.875, 'text': "But by default, if you just do this, it's going to apply this to each value in total.", 'start': 844.874, 'duration': 5.001}, {'end': 852.616, 'text': "So that's a quicker way to get our exponential values.", 'start': 850.496, 'duration': 2.12}, {'end': 855.777, 'text': 'And then for the normalization values.', 'start': 853.036, 'duration': 2.741}, {'end': 863.819, 'text': "all we need to do at this point is, we're just going to say, norm underscore values is equal to the exp.", 'start': 855.777, 'duration': 8.042}], 'summary': 'Using np.exp for exponential function on layer outputs and normalization values in numpy functions.', 'duration': 53.248, 'max_score': 810.571, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU810571.jpg'}, {'end': 935.662, 'src': 'embed', 'start': 890.489, 'weight': 0, 'content': [{'end': 893.791, 'text': "These values actually do line up with what is in the book, but let's just run that one more time.", 'start': 890.489, 'duration': 3.302}, {'end': 896.772, 'text': 'And so if you did have any variance there, hopefully that would change it.', 'start': 894.111, 'duration': 2.661}, {'end': 900.413, 'text': 'Okay, so that lines up exactly.', 'start': 898.513, 'duration': 1.9}, {'end': 902.314, 'text': "And as you can see, it's a little shorter code.", 'start': 900.553, 'duration': 1.761}, {'end': 904.275, 'text': "I think it's a little more legible.", 'start': 902.474, 'duration': 1.801}, {'end': 911.318, 'text': 'But yeah, so that 
is our exponentiation.', 'start': 905.696, 'duration': 5.622}, {'end': 913.404, 'text': 'then our normalization.', 'start': 912.063, 'duration': 1.341}, {'end': 920.37, 'text': 'so, to sum up everything up to this point, for our coding of the softmax activation function we have input right,', 'start': 913.404, 'duration': 6.966}, {'end': 926.294, 'text': "which is actually going to be the output layer of data that we're going to input into this, this activation function.", 'start': 920.37, 'duration': 5.924}, {'end': 931.899, 'text': 'we exponentiate those, those input values, so each one uniquely gets exponentiated.', 'start': 926.294, 'duration': 5.605}, {'end': 935.662, 'text': 'then we normalize and then that becomes our output.', 'start': 931.899, 'duration': 3.763}], 'summary': 'Exponentiate and normalize input data for softmax activation function.', 'duration': 45.173, 'max_score': 890.489, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU890489.jpg'}, {'end': 985.623, 'src': 'embed', 'start': 959.508, 'weight': 3, 'content': [{'end': 964.051, 'text': 'Okay, so at this point we know everything that goes into the softmax activation function,', 'start': 959.508, 'duration': 4.543}, {'end': 973.636, 'text': 'and now it is just a function of actually applying and implementing this in such a way that makes sense for our actual neural network applications.', 'start': 964.051, 'duration': 9.585}, {'end': 982.882, 'text': "So the main issue that we have at this stage is we are currently working with a single kind of vector here of a layer's outputs,", 'start': 974.357, 'duration': 8.525}, {'end': 985.623, 'text': "when really we're not going to have a single layer of outputs.", 'start': 982.882, 'duration': 2.741}], 'summary': 'Implementing softmax for multi-layer outputs is the main challenge.', 'duration': 26.115, 'max_score': 959.508, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU959508.jpg'}], 'start': 784.564, 'title': 'Numpy conversion and exponential function application', 'summary': 'Covers the conversion of code to numpy, application of np.exp, exponentiation, normalization, and coding softmax activation function for neural networks.', 'chapters': [{'end': 863.819, 'start': 784.564, 'title': 'Numpy conversion and exponential function application', 'summary': 'Covers the conversion of code to numpy and the application of the exponential function np.exp to layer outputs for quicker computation, with default impact on every value.', 'duration': 79.255, 'highlights': ['The chapter covers the conversion of code to NumPy and the application of the exponential function np.exp to layer outputs for quicker computation, with default impact on every value.', 'NumPy functions impact every value by default, making it a quicker way to get exponential values.', 'Application of np.exp to layer outputs allows for quicker computation.', 'By default, NumPy functions apply to each value in total.']}, {'end': 913.404, 'start': 865.252, 'title': 'Exponentiation and normalization', 'summary': 'Covers the process of exponentiation and normalization, demonstrating its alignment with the book and the potential for variance, offering a more legible and concise code for the process.', 'duration': 48.152, 'highlights': ['The process of exponentiation and normalization aligns with the book, providing accurate and concise results.', 'Demonstrating potential variance in the process, emphasizing the 
need for accuracy and consistency in the results.', 'The code for exponentiation and normalization is presented as more legible and concise, enhancing its comprehensibility and usability.']}, {'end': 1005.588, 'start': 913.404, 'title': 'Softmax activation function for neural networks', 'summary': 'Explains the process of coding the softmax activation function, which involves exponentiating and normalizing the input values to produce the output, and the challenge of adapting this process to work with batches of inputs and outputs in neural network applications.', 'duration': 92.184, 'highlights': ['The process of coding the softmax activation function involves exponentiating and normalizing the input values to produce the output.', 'Adapting the softmax activation function to work with batches of inputs and outputs is a challenge in neural network applications.']}], 'duration': 221.024, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU784564.jpg', 'highlights': ['The process of coding the softmax activation function involves exponentiating and normalizing the input values to produce the output.', 'The chapter covers the conversion of code to NumPy and the application of the exponential function np.exp to layer outputs for quicker computation, with default impact on every value.', 'Demonstrating potential variance in the process, emphasizing the need for accuracy and consistency in the results.', 'Adapting the softmax activation function to work with batches of inputs and outputs is a challenge in neural network applications.', 'The process of exponentiation and normalization aligns with the book, providing accurate and concise results.', 'The code for exponentiation and normalization is presented as more legible and concise, enhancing its comprehensibility and usability.', 'NumPy functions impact every value by default, making it a quicker way to get exponential values.', 'Application of np.exp to layer outputs allows for quicker computation.', 'By default, NumPy functions apply to each value in total.']}, {'end': 1247.303, 'segs': [{'end': 1092.764, 'src': 'embed', 'start': 1051.057, 'weight': 1, 'content': [{'end': 1053.153, 'text': 'So How do we convert to batch?', 'start': 1051.057, 'duration': 2.096}, {'end': 1058.855, 'text': "So for exponentiating it turns out we don't actually need to do anything,", 'start': 1053.753, 'duration': 5.102}, {'end': 1066.097, 'text': 'because these NumPy functions here work by default at the individual value level.', 'start': 1058.855, 'duration': 7.242}, {'end': 1073.179, 'text': 'So if we print x values, I thought I did, I thought I auto-completed that.', 'start': 1066.597, 'duration': 6.582}, {'end': 1073.699, 'text': "I guess I didn't.", 'start': 1073.199, 'duration': 0.5}, {'end': 1075.88, 'text': 'There we have our values.', 'start': 1074.88, 'duration': 1}, {'end': 1078.661, 'text': "It's actually already done for us and it's correct.", 'start': 1075.98, 'duration': 2.681}, {'end': 1092.764, 'text': "So the next question is, okay, how do we do this step? 
So exp values, we don't need to change anything there for a batch, but for sum, we do.", 'start': 1079.382, 'duration': 13.382}], 'summary': 'Numpy functions work at individual value level, no need to change for exponentiating; sum needs modification for batch', 'duration': 41.707, 'max_score': 1051.057, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1051057.jpg'}, {'end': 1171.413, 'src': 'embed', 'start': 1114.688, 'weight': 0, 'content': [{'end': 1117.43, 'text': 'np.sum layer outputs.', 'start': 1114.688, 'duration': 2.742}, {'end': 1124.693, 'text': 'Now remember, what do we wanna do here? We actually want to iterate over here, over this 2D matrix.', 'start': 1117.93, 'duration': 6.763}, {'end': 1131.276, 'text': 'We wanna iterate over this and do this sum, this sum, this sum, and this sum.', 'start': 1125.453, 'duration': 5.823}, {'end': 1136.118, 'text': "But by default, I already told you, it's gonna do as individual values.", 'start': 1131.716, 'duration': 4.402}, {'end': 1140.901, 'text': 'So by default, what we get is a single scalar value.', 'start': 1136.519, 'duration': 4.382}, {'end': 1142.722, 'text': "That's not what we wanted.", 'start': 1141.801, 'duration': 0.921}, {'end': 1144.283, 'text': 'We really want three values.', 'start': 1142.962, 'duration': 1.321}, {'end': 1145.965, 'text': 'For sure we want three values.', 'start': 1144.824, 'duration': 1.141}, {'end': 1155.733, 'text': 'So how do we get those three values? So the first order of business is to pass the axes parameter here.', 'start': 1146.625, 'duration': 9.108}, {'end': 1157.334, 'text': "So we're going to say axes.", 'start': 1156.033, 'duration': 1.301}, {'end': 1160.537, 'text': 'And the axes by default is actually none.', 'start': 1158.115, 'duration': 2.422}, {'end': 1163.039, 'text': 'And that gives us the same value that we saw before.', 'start': 1161.217, 'duration': 1.822}, {'end': 1171.413, 'text': 'Axis 0, to put it extremely simply on a 2D matrix, is going to be the sum of columns.', 'start': 1164.345, 'duration': 7.068}], 'summary': 'Iterate over 2d matrix, obtain 3 values by using axes parameter.', 'duration': 56.725, 'max_score': 1114.688, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1114688.jpg'}, {'end': 1247.303, 'src': 'embed', 'start': 1225.046, 'weight': 3, 'content': [{'end': 1233.793, 'text': "So it's for the same reason that we want to make sure values are lining up when we do the dot product and when we're doing like a matrix product right?", 'start': 1225.046, 'duration': 8.747}, {'end': 1240.859, 'text': "So, even though np.dot it is a dot product with vectors, but it's also doing a matrix product for us, we need those values to line up.", 'start': 1234.294, 'duration': 6.565}, {'end': 1242.561, 'text': 'so we do a transpose right?', 'start': 1240.859, 'duration': 1.702}, {'end': 1247.303, 'text': 'For that same reason we need the right values to line up here.', 'start': 1244.022, 'duration': 3.281}], 'summary': 'Ensuring values line up for dot and matrix products.', 'duration': 22.257, 'max_score': 1225.046, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1225046.jpg'}], 'start': 1005.908, 'title': 'Numpy batch processing and sum layer', 'summary': 'Explores batch processing with numpy functions, demonstrating handling of exponential and sum operations with specific numerical values. 
it also explains np.sum layer outputs and the use of the axes parameter for summing rows and columns in a 2d matrix, emphasizing the need for values to line up in operations like division and matrix product.', 'chapters': [{'end': 1113.488, 'start': 1005.908, 'title': 'Batch processing with numpy', 'summary': 'Explores the process of converting individual values to batch processing using numpy functions, showcasing the ease of handling exponential and sum operations with specific numerical values, such as 8.9, -1.81, 0.2, 1.41, 1.051, and 0.026.', 'duration': 107.58, 'highlights': ['The NumPy functions work by default at the individual value level, exemplified by the effortless handling of exponentiating specific numerical values like 8.9, -1.81, and 0.2.', 'The process of batch conversion for sum operations is highlighted, emphasizing the ease of visually adding up specific numerical values such as 1.41, 1.051, and 0.026.', "The unnecessary presence of 'math' and 'math.e' is identified and removed, streamlining the processing steps for batch conversion using NumPy."]}, {'end': 1247.303, 'start': 1114.688, 'title': 'Numpy sum layer and axes parameter', 'summary': 'Explains the np.sum layer outputs and the use of the axes parameter to get the sum of rows and columns in a 2d matrix, emphasizing the need for values to line up in operations like division and matrix product.', 'duration': 132.615, 'highlights': ['The axes parameter is used to specify the axis along which the sum should be calculated, allowing us to get the sum of rows and columns in a 2D matrix.', 'Emphasizes the importance of ensuring values line up when performing operations like division and matrix product to avoid mismatched calculations.', 'Describes the default behavior of np.sum, which returns a single scalar value, and the need to use axes parameter to obtain multiple values.', 'Explains the significance of using transpose to ensure the right values line up for operations like matrix product.']}], 'duration': 241.395, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1005908.jpg', 'highlights': ['The axes parameter specifies the axis for sum calculation in a 2D matrix', 'NumPy functions handle exponentiation of specific numerical values effortlessly', 'Batch conversion for sum operations is streamlined using NumPy', 'Importance of ensuring values line up in operations like division and matrix product', 'Default behavior of np.sum returns a single scalar value']}, {'end': 2005.065, 'segs': [{'end': 1295.818, 'src': 'embed', 'start': 1247.623, 'weight': 1, 'content': [{'end': 1257.605, 'text': "Right now, if we did this, it's not going to actually be dividing what we want exponentials to be divided by.", 'start': 1247.623, 'duration': 9.982}, {'end': 1262.386, 'text': 'So what we want to do is we need to shape this correctly.', 'start': 1258.905, 'duration': 3.481}, {'end': 1268.907, 'text': 'And we could reshape this, but we can also just simply use keep dims.', 'start': 1263.166, 'duration': 5.741}, {'end': 1273.628, 'text': 'So keep, is it keep dims? 
Yeah, all one word, and then true.', 'start': 1269.567, 'duration': 4.061}, {'end': 1285.008, 'text': "run that and now it is a matrix of the exact same, I hate to say shape, but it's the same orientation?", 'start': 1274.758, 'duration': 10.25}, {'end': 1286.489, 'text': "It's the same dimensions, okay?", 'start': 1285.108, 'duration': 1.381}, {'end': 1288.792, 'text': "So it's just a sum.", 'start': 1287.37, 'duration': 1.422}, {'end': 1295.818, 'text': "so now it is literally this right here it's just the sum of these values, the sum of these values and then the sum of these values.", 'start': 1288.792, 'duration': 7.026}], 'summary': 'Reshape the matrix using keepdims=true for correct division.', 'duration': 48.195, 'max_score': 1247.623, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1247623.jpg'}, {'end': 1408.836, 'src': 'embed', 'start': 1383.633, 'weight': 3, 'content': [{'end': 1396.757, 'text': 'So one way to combat this overflow is to take all of the values in this output layer prior to exponentiation and subtract the largest value in that layer from all of the values in that layer.', 'start': 1383.633, 'duration': 13.124}, {'end': 1403.926, 'text': 'And what this causes is now the largest value will be zero and everything else is going to be less than zero.', 'start': 1396.777, 'duration': 7.149}, {'end': 1408.836, 'text': 'Now, because the largest value prior to exponentiation is actually a zero,', 'start': 1404.594, 'duration': 4.242}], 'summary': 'Subtracting the largest value from the output layer prior to exponentiation causes the largest value to become zero and all other values to become less than zero.', 'duration': 25.203, 'max_score': 1383.633, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1383633.jpg'}, {'end': 1456.01, 'src': 'embed', 'start': 1428.66, 'weight': 4, 'content': [{'end': 1440.243, 'text': 'So the final concern that you might have is what sort of impact does subtracting the max value from everything have on the actual output of the softmax activation function?', 'start': 1428.66, 'duration': 11.583}, {'end': 1446.865, 'text': "So, all other things being equal, and assuming that we don't have some sort of overflow error,", 'start': 1440.343, 'duration': 6.522}, {'end': 1456.01, 'text': "if we have two output layers and one we don't subtract the max from, and one we do after we do our exponentiation and our normalization,", 'start': 1446.865, 'duration': 9.145}], 'summary': 'Subtracting max value affects softmax output, assuming no overflow errors.', 'duration': 27.35, 'max_score': 1428.66, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1428660.jpg'}, {'end': 1490.373, 'src': 'embed', 'start': 1468.478, 'weight': 0, 'content': [{'end': 1483.348, 'text': "we are going to now go to the code that we left off on with part five and we're going to add our softmax activation class and then we're going to implement it here as well as change things here.", 'start': 1468.478, 'duration': 14.87}, {'end': 1486.61, 'text': "I've used different variable names and stuff like that.", 'start': 1483.868, 'duration': 2.742}, {'end': 1490.373, 'text': "So I'm going to attempt to convert this to be exactly what we have in the book as well.", 'start': 1486.95, 'duration': 3.423}], 'summary': 'Implementing softmax activation in code from part five.', 'duration': 21.895, 'max_score': 1468.478, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1468478.jpg'}, {'end': 1879.141, 'src': 'heatmap', 'start': 1853.468, 'weight': 1, 'content': [{'end': 1858.273, 'text': 'And at this point, because we do the forward, we now have the output, which is going to be probabilities.', 'start': 1853.468, 'duration': 4.805}, {'end': 1866.041, 'text': "So for example, we can print activation2.output, and let's just do the, there should be 300 of them.", 'start': 1858.534, 'duration': 7.507}, {'end': 1868.964, 'text': "So we're going to just do the first five.", 'start': 1866.582, 'duration': 2.382}, {'end': 1870.734, 'text': "Let's go ahead and run that.", 'start': 1869.793, 'duration': 0.941}, {'end': 1875.978, 'text': 'And what we get is, again, this is a batch.', 'start': 1871.755, 'duration': 4.223}, {'end': 1879.141, 'text': 'So we passed all of them at the same time in this case.', 'start': 1876.659, 'duration': 2.482}], 'summary': 'The output consists of 300 probabilities in a batch.', 'duration': 25.673, 'max_score': 1853.468, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1853468.jpg'}], 'start': 1247.623, 'title': 'Matrix reshaping and softmax activation', 'summary': "Discusses reshaping matrices and summing their values using 'keep dims' function and explains implementing softmax activation function to handle exponential value overflow, and demonstrates the code for a model with three output classes.", 'chapters': [{'end': 1295.818, 'start': 1247.623, 'title': 'Reshaping and summing matrix', 'summary': "Discusses reshaping a matrix and calculating the sum of its values, using the 'keep dims' function to maintain the matrix dimensions, resulting in an output that represents the sum of the original matrix values.", 'duration': 48.195, 'highlights': ["The 'keep dims' function is used to maintain the dimensions of the matrix when reshaping and summing, ensuring the output reflects the original matrix dimensions.", 'The process involves reshaping the matrix and then calculating the sum of its values, resulting in a matrix representing the sum of the original values.']}, {'end': 2005.065, 'start': 1296.815, 'title': 'Implementing softmax activation', 'summary': 'Covers the implementation of the softmax activation function, addressing the problem of overflow in exponential values, and demonstrates the code for creating a softmax activation class and testing it on a model with three output classes.', 'duration': 708.25, 'highlights': ['The softmax activation function is applied to prevent overflow in exponential values by subtracting the largest value in the output layer from all values, resulting in a range of values between zero and one.', 'The impact of subtracting the max value on the output of the softmax activation function is minimal, as it does not affect the actual output when compared to not subtracting the max value.', 'The code demonstrates the implementation of the softmax activation class and its integration into a model with three output classes, showcasing the forward method for the softmax activation and testing the output probabilities.']}], 'duration': 757.442, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/omz_NdFgWyU/pics/omz_NdFgWyU1247623.jpg', 'highlights': ['The code demonstrates the implementation of the softmax activation class and its integration into a model with three output classes, showcasing the forward method for the softmax 
activation and testing the output probabilities.', "The 'keep dims' function is used to maintain the dimensions of the matrix when reshaping and summing, ensuring the output reflects the original matrix dimensions.", 'The process involves reshaping the matrix and then calculating the sum of its values, resulting in a matrix representing the sum of the original values.', 'The softmax activation function is applied to prevent overflow in exponential values by subtracting the largest value in the output layer from all values, resulting in a range of values between zero and one.', 'The impact of subtracting the max value on the output of the softmax activation function is minimal, as it does not affect the actual output when compared to not subtracting the max value.']}], 'highlights': ['The Neural Networks from Scratch book is now fully released, offering hardcover, softcover, and e-book options at nnfs.io.', 'The chapter discusses the Softmax activation function, specifically used for the output layer in classification-style neural network models.', 'The process of coding the softmax activation function involves exponentiating and normalizing the input values to produce the output.', 'The process of normalization is detailed, involving dividing each exponentiated value by the sum of all other output values to obtain the desired probability distribution.', 'Exponentiation is introduced as a solution to the negativity issue in the output values, ensuring that no value can be negative at the output without losing its meaning.', 'The need for a new activation function is explained due to the limitations of accuracy in measuring model performance, highlighting the challenges with rectified linear and linear activation functions.', 'The chapter explains the purpose of the softmax activation function in training neural networks and its usage in predicting the output values, using the example of hypothetical output values 4.8, 1.21, 2.385.', 'The process of exponentiation and normalization aligns with the book, providing accurate and concise results.', 'The code for exponentiation and normalization is presented as more legible and concise, enhancing its comprehensibility and usability.', "The 'keep dims' function is used to maintain the dimensions of the matrix when reshaping and summing, ensuring the output reflects the original matrix dimensions.", 'The softmax activation function is applied to prevent overflow in exponential values by subtracting the largest value in the output layer from all values, resulting in a range of values between zero and one.', 'The impact of subtracting the max value on the output of the softmax activation function is minimal, as it does not affect the actual output when compared to not subtracting the max value.']}
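The transcript walks through exponentiating a single output vector and then normalizing it into a probability distribution. A minimal sketch of that step, assuming the example output values 4.8, 1.21, and 2.385 quoted in the video (the printed probabilities are approximate):

import numpy as np

# Raw outputs of the final dense layer (example values quoted in the video)
layer_outputs = [4.8, 1.21, 2.385]

# Exponentiate: every value becomes positive while the ordering is kept,
# so a negative output keeps its relative meaning instead of being clipped.
exp_values = np.exp(layer_outputs)

# Normalize: each exponentiated value divided by the sum of all of them,
# which yields a probability distribution that sums to 1.
norm_values = exp_values / np.sum(exp_values)

print(norm_values)          # roughly [0.895, 0.025, 0.080]
print(np.sum(norm_values))  # ~1.0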
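For a batch of outputs, the transcript notes that np.sum collapses the whole 2D array to a single scalar by default, and that summing along axis 1 (the spoken "axes parameter" is NumPy's axis argument) with keepdims=True produces one sum per row in a column shape that broadcasts correctly in the division. A short sketch of that behaviour, using the batch values quoted in the transcript:

import numpy as np

# A batch of three samples' layer outputs (values quoted in the transcript)
layer_outputs = np.array([[4.8, 1.21, 2.385],
                          [8.9, -1.81, 0.2],
                          [1.41, 1.051, 0.026]])

exp_values = np.exp(layer_outputs)  # applied element-wise by default

# Default behaviour: one scalar over the entire array -- not what we want
print(np.sum(exp_values))

# axis=1 sums each row; keepdims=True keeps a (3, 1) column shape
# so the division below lines up row by row
row_sums = np.sum(exp_values, axis=1, keepdims=True)
print(row_sums)

probabilities = exp_values / row_sums
print(probabilities)                  # each row is a probability distribution
print(np.sum(probabilities, axis=1))  # each row sums to ~1.0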
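Finally, the transcript describes subtracting the largest value in each row before exponentiating so the maximum becomes zero and overflow is avoided, then wrapping the whole thing into a softmax activation class with a forward method that is dropped into the part-five model code. A hedged sketch under those assumptions; the class and attribute names follow the NNFS style used in the series but may not match the book verbatim:

import numpy as np

class Activation_Softmax:
    def forward(self, inputs):
        # Subtract the row-wise max so the largest value is 0 before
        # exponentiation; exp of large positives would otherwise overflow,
        # and this shift does not change the resulting probabilities.
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize each row into a probability distribution
        self.output = exp_values / np.sum(exp_values, axis=1, keepdims=True)

# Example usage on a small batch of layer outputs
softmax = Activation_Softmax()
softmax.forward(np.array([[4.8, 1.21, 2.385],
                          [8.9, -1.81, 0.2]]))
print(softmax.output)  # rows of class probabilities, each summing to ~1.0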