title
Neural Networks from Scratch - P.5 Hidden Layer Activation Functions
description
Neural Networks from Scratch book, access the draft now: https://nnfs.io
NNFSiX Github: https://github.com/Sentdex/NNfSiX
Playlist for this series: https://www.youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3
Spiral data function: https://gist.github.com/Sentdex/454cb20ec5acf0e76ee8ab8448e6266c
Python 3 basics: https://pythonprogramming.net/introduction-learn-python-3-tutorials/
Intermediate Python (w/ OOP): https://pythonprogramming.net/introduction-intermediate-python-tutorial/
Mug link for fellow mug aficionados: https://amzn.to/3bvkZ6B
Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Instagram: https://instagram.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
#nnfs #python #neuralnetworks
detail
{'title': 'Neural Networks from Scratch - P.5 Hidden Layer Activation Functions', 'heatmap': [{'end': 2215.806, 'start': 2184.168, 'weight': 1}], 'summary': 'Explores various activation functions in neural networks, highlighting the impact of step, sigmoid, and relu functions on training reliability and output granularity, while emphasizing the advantages of using rectified linear activation functions over sigmoid due to its fast calculation, granular optimization, and addressing the vanishing gradient problem.', 'chapters': [{'end': 306.825, 'segs': [{'end': 95.302, 'src': 'embed', 'start': 50.431, 'weight': 3, 'content': [{'end': 51.452, 'text': "So let's get into that.", 'start': 50.431, 'duration': 1.021}, {'end': 55.514, 'text': 'So for an activation function, you can use a variety of different functions.', 'start': 51.852, 'duration': 3.662}, {'end': 58.356, 'text': "In this case, we're just going to look at a step function to start.", 'start': 55.554, 'duration': 2.802}, {'end': 60.277, 'text': 'So a step function is very simple.', 'start': 58.416, 'duration': 1.861}, {'end': 66.441, 'text': 'The idea is if your input to this function is greater than zero, then the output will be a one.', 'start': 60.737, 'duration': 5.704}, {'end': 69.963, 'text': 'Otherwise the output is going to be a zero.', 'start': 66.961, 'duration': 3.002}, {'end': 72.904, 'text': "So that's all there is to this step function.", 'start': 70.203, 'duration': 2.701}, {'end': 77.047, 'text': "And now let's consider using this step function as an activation function.", 'start': 73.225, 'duration': 3.822}, {'end': 81.374, 'text': 'So now taking the step function and using it as an activation function.', 'start': 77.492, 'duration': 3.882}, {'end': 89.519, 'text': 'the way that this works in your entire neural network is each neuron in the hidden layers and the output layers will have this activation function.', 'start': 81.374, 'duration': 8.145}, {'end': 95.302, 'text': 'And this comes into play after you do the inputs times the weights plus the bias.', 'start': 89.659, 'duration': 5.643}], 'summary': 'Step function used as activation in neural network for binary output', 'duration': 44.871, 'max_score': 50.431, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A50431.jpg'}, {'end': 162.579, 'src': 'embed', 'start': 138.379, 'weight': 2, 'content': [{'end': 146.426, 'text': 'So on the macro level, again, every neuron in your hidden layers as well as your output layer is going to have an activation function associated.', 'start': 138.379, 'duration': 8.047}, {'end': 151.49, 'text': "Now what's most common and what you'll see and soon learn why, generally,", 'start': 146.826, 'duration': 4.664}, {'end': 155.653, 'text': "the output layer is going to have a different activation function than what you're using in your hidden layers.", 'start': 151.49, 'duration': 4.163}, {'end': 162.579, 'text': "But for now, we're just kind of showing an example of a variety of step functions being used as activation functions in this network.", 'start': 155.974, 'duration': 6.605}], 'summary': 'Neural network layers have activation functions, with output layer often having different function than hidden layers.', 'duration': 24.2, 'max_score': 138.379, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A138379.jpg'}, {'end': 206.207, 'src': 'embed', 'start': 178.651, 'weight': 0, 'content': [{'end': 185.715, 'text': 'it 
quickly became obvious that using something like a sigmoid activation function was a little more easy to train a neural network,', 'start': 178.651, 'duration': 7.064}, {'end': 190.478, 'text': 'or a little more reliable for training a neural network, due to the granularity of the output.', 'start': 185.715, 'duration': 4.763}, {'end': 197.476, 'text': 'So again, the sigmoid activation function comes in after you do the inputs times the weights plus the bias.', 'start': 191.092, 'duration': 6.384}, {'end': 206.207, 'text': 'The difference of the sigmoid activation function to the step function is that we get a more granular output from this function,', 'start': 198.196, 'duration': 8.011}], 'summary': 'Sigmoid activation function is easier to train and provides a more granular output for neural networks.', 'duration': 27.556, 'max_score': 178.651, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A178651.jpg'}, {'end': 276.605, 'src': 'embed', 'start': 246.588, 'weight': 1, 'content': [{'end': 249.771, 'text': 'or if it outputs a zero, how close were we to outputting one?', 'start': 246.588, 'duration': 3.183}, {'end': 253.57, 'text': "it's not going to be known to the optimizer that that won't be known.", 'start': 249.771, 'duration': 3.799}, {'end': 256.752, 'text': 'So that is why something like a sigmoid,', 'start': 254.051, 'duration': 2.701}, {'end': 263.156, 'text': 'where you have a more granular output that becomes more preferable and more reliable to training an entire neural network.', 'start': 256.752, 'duration': 6.404}, {'end': 266.478, 'text': 'This brings us to the rectified linear unit activation function.', 'start': 263.577, 'duration': 2.901}, {'end': 268.9, 'text': 'This function is extremely simple.', 'start': 267.079, 'duration': 1.821}, {'end': 272.662, 'text': 'If x is greater than zero, the output is whatever x is.', 'start': 269.02, 'duration': 3.642}, {'end': 276.605, 'text': 'If x is less than or equal to zero, the output is zero.', 'start': 273.103, 'duration': 3.502}], 'summary': 'Comparison of activation functions: sigmoid vs. 
rectified linear unit with emphasis on their outputs and reliability in training neural networks.', 'duration': 30.017, 'max_score': 246.588, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A246588.jpg'}], 'start': 5.049, 'title': 'Neural network activation functions', 'summary': 'Delves into various activation functions in neural networks, emphasizing the role and impact of step, sigmoid, and relu functions on training reliability and output granularity.', 'chapters': [{'end': 155.653, 'start': 5.049, 'title': 'Neural networks activation functions', 'summary': "Discusses the activation functions, specifically focusing on the step function and its role in neural networks, emphasizing its output as either 0 or 1 and its impact on the next neurons' input.", 'duration': 150.604, 'highlights': ['The step function is discussed as an activation function in neural networks, producing an output of either 0 or 1 based on the input being greater than zero, and this output becomes the input to the next neurons.', "The activation function is applied to each neuron in the hidden layers and output layers after the inputs times the weights plus the bias, resulting in a binary output that impacts the next neurons' input.", 'The importance of using different activation functions for hidden layers and output layers in neural networks is mentioned as a common practice.']}, {'end': 306.825, 'start': 155.974, 'title': 'Neural network activation functions', 'summary': 'Discusses the significance of different activation functions in neural networks, including the sigmoid and rectified linear unit (relu) functions, highlighting their impact on training reliability and granularity of outputs.', 'duration': 150.851, 'highlights': ['The sigmoid activation function is more reliable for training a neural network due to the granularity of the output.', 'The rectified linear unit (ReLU) activation function provides granular output and allows for weights and biases to impact the output.', "The step function's lack of granularity limits its reliability for training an entire neural network."]}], 'duration': 301.776, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A5049.jpg', 'highlights': ['The sigmoid activation function is more reliable for training a neural network due to the granularity of the output.', 'The rectified linear unit (ReLU) activation function provides granular output and allows for weights and biases to impact the output.', 'The importance of using different activation functions for hidden layers and output layers in neural networks is mentioned as a common practice.', "The activation function is applied to each neuron in the hidden layers and output layers after the inputs times the weights plus the bias, resulting in a binary output that impacts the next neurons' input.", 'The step function is discussed as an activation function in neural networks, producing an output of either 0 or 1 based on the input being greater than zero, and this output becomes the input to the next neurons.']}, {'end': 755.81, 'segs': [{'end': 341.99, 'src': 'embed', 'start': 307.465, 'weight': 0, 'content': [{'end': 314.027, 'text': 'Now, why might we use the rectified linear activation function over something like sigmoid, which also produces this granular output?', 'start': 307.465, 'duration': 6.562}, {'end': 320.37, 'text': 'Well, the sigmoid activation function has an issue referred to as the vanishing gradient 
problem,', 'start': 314.667, 'duration': 5.703}, {'end': 322.75, 'text': "which won't really make much sense until we get to gradient.", 'start': 320.37, 'duration': 2.38}, {'end': 324.131, 'text': 'just know, it has a problem.', 'start': 322.75, 'duration': 1.381}, {'end': 332.739, 'text': "And from there, there's really two main reasons why we use rectified linear, obviously because it's granular so we can still optimize well with it.", 'start': 325.109, 'duration': 7.63}, {'end': 339.427, 'text': "But the two main reasons why we use rectified linear are it's fast because it's a very simple calculation.", 'start': 333.139, 'duration': 6.288}, {'end': 341.99, 'text': 'So sigmoid function is not super complicated.', 'start': 339.507, 'duration': 2.483}], 'summary': 'Rectified linear activation function is fast and granular, avoiding vanishing gradient problem of sigmoid.', 'duration': 34.525, 'max_score': 307.465, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A307465.jpg'}, {'end': 508.327, 'src': 'embed', 'start': 476.942, 'weight': 2, 'content': [{'end': 483.524, 'text': "It's just those neurons, rather than a linear activation function, are using a rectified linear activation function instead.", 'start': 476.942, 'duration': 6.582}, {'end': 490.386, 'text': "And, as you can see, while it's not perfect because the network is quite small,", 'start': 484.364, 'duration': 6.022}, {'end': 494.507, 'text': "it's a much better fit than the linear activation function than we were just looking at.", 'start': 490.386, 'duration': 4.121}, {'end': 499.785, 'text': "And this is where most people would stop and just say, hey, look, this is how it works and that's that.", 'start': 495.424, 'duration': 4.361}, {'end': 508.327, 'text': 'But the question I had as I was making this video was okay, but why does it work?', 'start': 500.525, 'duration': 7.802}], 'summary': 'Using rectified linear activation function provides a better fit than linear activation function in neural networks, prompting the question of why it works.', 'duration': 31.385, 'max_score': 476.942, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A476942.jpg'}, {'end': 551.278, 'src': 'embed', 'start': 526.048, 'weight': 3, 'content': [{'end': 535.75, 'text': 'but yet that little itty, bitty bit of that rectified kind of clipping at zero is exactly what makes it powerful.', 'start': 526.048, 'duration': 9.702}, {'end': 543.412, 'text': "as powerful as a sigmoid activation function super fast, but this is what makes it work and it's so cool.", 'start': 535.75, 'duration': 7.662}, {'end': 544.952, 'text': 'So why does it work??', 'start': 544.072, 'duration': 0.88}, {'end': 551.278, 'text': 'So, starting with just a single neuron using a rectified linear activation function,', 'start': 545.993, 'duration': 5.285}], 'summary': 'Rectified linear activation function provides super fast power to a single neuron.', 'duration': 25.23, 'max_score': 526.048, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A526048.jpg'}, {'end': 635.671, 'src': 'embed', 'start': 605.917, 'weight': 4, 'content': [{'end': 613.879, 'text': 'Now, if we add another neuron with a rectified linear activation function and we have a bias of zero and a weights of one,', 'start': 605.917, 'duration': 7.962}, {'end': 617.1, 'text': 'the actual output remains completely unchanged.', 'start': 613.879, 
'duration': 3.221}, {'end': 625.222, 'text': 'If we then adjust the bias of the second neuron, something pretty curious happens to this entire output.', 'start': 618.8, 'duration': 6.422}, {'end': 635.671, 'text': 'The function, the output, is offset just like it was offset by bias before, but it is offset vertically instead.', 'start': 625.882, 'duration': 9.789}], 'summary': 'Adding a neuron with rectified linear activation and adjusting bias causes vertical offset in the output.', 'duration': 29.754, 'max_score': 605.917, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A605917.jpg'}], 'start': 307.465, 'title': 'Activation functions in neural networks', 'summary': 'Discusses the advantages of using the rectified linear activation function over sigmoid in neural networks due to its fast and simple calculation, granular optimization, and addressing the vanishing gradient problem. it also explains the limitations of linear activation functions and demonstrates the effectiveness of rectified linear activation functions through examples and insights.', 'chapters': [{'end': 364.034, 'start': 307.465, 'title': 'Rectified linear activation function', 'summary': 'Discusses the advantages of using the rectified linear activation function over sigmoid, focusing on its fast and simple calculation, with the main reasons being its granular optimization and the vanishing gradient problem associated with sigmoid.', 'duration': 56.569, 'highlights': ['The main reasons for using the rectified linear activation function are its fast and simple calculation, making it more efficient than the sigmoid function.', 'The rectified linear activation function is faster and simpler than the sigmoid function, as it has a very simple calculation, resulting in faster optimization.', 'The vanishing gradient problem associated with the sigmoid activation function is a key reason for using the rectified linear activation function, as it allows for better gradient optimization.']}, {'end': 755.81, 'start': 364.034, 'title': 'Activation functions in neural networks', 'summary': 'Explains the importance of nonlinear activation functions in neural networks, demonstrating the limitations of linear activation functions and the effectiveness of rectified linear activation functions through examples and insights.', 'duration': 391.776, 'highlights': ['Rectified linear activation function is essential for fitting nonlinear data and outperforms linear activation functions in neural networks.', 'The rectified linear activation function is almost linear but the slight nonlinearity at zero makes it powerful and efficient.', 'The impact of adjusting weights and biases on the behavior of neurons with rectified linear activation function is explained, providing insights into the functioning of individual neurons.']}], 'duration': 448.345, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A307465.jpg', 'highlights': ['The vanishing gradient problem with sigmoid justifies using rectified linear activation function.', 'Rectified linear activation function is faster and simpler than sigmoid, leading to faster optimization.', 'Rectified linear activation function is essential for fitting nonlinear data and outperforms linear activation functions.', 'The slight nonlinearity at zero makes rectified linear activation function powerful and efficient.', 'Insights into the functioning of individual neurons are provided by adjusting weights 
and biases with rectified linear activation function.']}, {'end': 1516.754, 'segs': [{'end': 845.932, 'src': 'embed', 'start': 755.83, 'weight': 2, 'content': [{'end': 757.971, 'text': 'Uh, so pretty cool.', 'start': 755.83, 'duration': 2.141}, {'end': 766.358, 'text': "So what we're going to do now is show that how, you know, how we can fit this green sign function to begin.", 'start': 758.592, 'duration': 7.766}, {'end': 772.923, 'text': "We'll set the weight between the two hidden layers is first neurons as one same,", 'start': 766.558, 'duration': 6.365}, {'end': 777.627, 'text': 'with the weight between the first neuron in that second hidden layer in the output neuron.', 'start': 772.923, 'duration': 4.704}, {'end': 789.2, 'text': "Then we'll use the weight from the input to the first neuron to fit those first parts of that sine wave, basically to set the slope.", 'start': 778.956, 'duration': 10.244}, {'end': 798.143, 'text': "Once we've got that initial slope the way that we want it, and again, we got that slope by setting the weight coming into that initial neuron.", 'start': 789.32, 'duration': 8.823}, {'end': 805.391, 'text': 'that weight is going to impact the outputting slope, basically, of that rectified linear.', 'start': 798.903, 'duration': 6.488}, {'end': 808.775, 'text': 'So as you increase the weight, that slope gets steeper and steeper.', 'start': 805.431, 'duration': 3.344}, {'end': 811.999, 'text': 'If you decrease the weight, it gets shallower and shallower.', 'start': 808.815, 'duration': 3.184}, {'end': 815.563, 'text': 'And then eventually, if you go into the negatives, that slope starts to go downward right?', 'start': 812.039, 'duration': 3.524}, {'end': 824.245, 'text': 'So the current problem that we have is this slope is correct for the first part of this sine wave, but soon we need a different slope.', 'start': 816.502, 'duration': 7.743}, {'end': 825.065, 'text': 'We want something else.', 'start': 824.285, 'duration': 0.78}, {'end': 830.507, 'text': 'So we wanna stop the bounds of this pair of neurons.', 'start': 825.665, 'duration': 4.842}, {'end': 831.767, 'text': 'We wanna make this bounded.', 'start': 830.547, 'duration': 1.22}, {'end': 841.27, 'text': 'So if you recall from the example before, the way that we set, so the first neuron in this pair of neurons is responsible for the activation point.', 'start': 832.908, 'duration': 8.362}, {'end': 845.932, 'text': 'The second neuron can set the deactivation point.', 'start': 841.33, 'duration': 4.602}], 'summary': 'Adjust weight to control slope of green sign function for different parts of sine wave.', 'duration': 90.102, 'max_score': 755.83, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A755830.jpg'}, {'end': 939.19, 'src': 'embed', 'start': 907.091, 'weight': 1, 'content': [{'end': 907.931, 'text': 'The slope is correct.', 'start': 907.091, 'duration': 0.84}, {'end': 908.872, 'text': 'Everything looks good.', 'start': 908.051, 'duration': 0.821}, {'end': 911.252, 'text': "And now it's just simply an alignment issue.", 'start': 909.152, 'duration': 2.1}, {'end': 921.307, 'text': 'So the plan here is to use the top seven neuron pairs to fit the correct shape of this sine function,', 'start': 911.892, 'duration': 9.415}, {'end': 928.394, 'text': "and then we're going to use the bottom pair purely for controlling the offset of this function, so moving it exactly where we want.", 'start': 921.307, 'duration': 7.087}, {'end': 933.365, 'text': 'so the 
first seven just simply match that kind of the shape and the flow of the sine wave.', 'start': 928.394, 'duration': 4.971}, {'end': 939.19, 'text': "The last ones we're gonna use to actually offset and get it to line up correctly.", 'start': 934.006, 'duration': 5.184}], 'summary': 'Using 7 neuron pairs to fit sine function shape and 1 pair for offset control.', 'duration': 32.099, 'max_score': 907.091, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A907091.jpg'}, {'end': 1125.545, 'src': 'embed', 'start': 1096.928, 'weight': 0, 'content': [{'end': 1102.351, 'text': 'So we can simulate the range of input, showing where activation occurs and stops for each pair of neurons,', 'start': 1096.928, 'duration': 5.423}, {'end': 1106.853, 'text': 'and how this impacts the overall function and output of this neural network.', 'start': 1102.351, 'duration': 4.502}, {'end': 1116.919, 'text': 'So you can see how individual neurons become responsible for only really small sections and parts of the overall neural network function,', 'start': 1107.413, 'duration': 9.506}, {'end': 1118.039, 'text': 'which is really cool.', 'start': 1116.919, 'duration': 1.12}, {'end': 1125.545, 'text': "So you can kind of see as basically as it's only when both neurons are activated that their quote-unquote area of effect comes into play.", 'start': 1118.079, 'duration': 7.466}], 'summary': 'Simulating neuron activation reveals their specific impact on the overall neural network function.', 'duration': 28.617, 'max_score': 1096.928, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A1096928.jpg'}], 'start': 755.83, 'title': 'Neural network functions', 'summary': 'Covers the adjustment of weights and activation functions in a neural network, demonstrating the impact on output slope, shape modification of functions, and neuron responsibility for network functions.', 'chapters': [{'end': 825.065, 'start': 755.83, 'title': 'Neural network weight adjustment', 'summary': 'Demonstrates the process of fitting a green sign function to a neural network by adjusting the weights between hidden layers, impacting the output slope of the rectified linear, with the ability to modify the slope of the sine wave by increasing or decreasing the weights.', 'duration': 69.235, 'highlights': ['The weight between the two hidden layers is set to one, with the weight between the first neuron in the second hidden layer and the output neuron also being set.', 'Adjusting the weight coming into the initial neuron impacts the outputting slope of the rectified linear, as increasing the weight makes the slope steeper and decreasing it makes it shallower.', 'The current problem is that the initial slope is correct for the first part of the sine wave, but a different slope is needed for the subsequent part.']}, {'end': 1516.754, 'start': 825.665, 'title': 'Neural network activation function', 'summary': 'Discusses the process of modifying the bias and weight of neuron pairs to create bounded activation and deactivation points, adjusting the slope, and offset of the function to fit the shape of the sine function, ultimately demonstrating how individual neurons become responsible for small sections of the neural network function.', 'duration': 691.089, 'highlights': ['The process of modifying the bias and weight of neuron pairs to create bounded activation and deactivation points', 'Adjusting the slope and offset of the function to fit the shape of 
the sine function', 'Demonstrating how individual neurons become responsible for small sections of the neural network function']}], 'duration': 760.924, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A755830.jpg', 'highlights': ['Demonstrating how individual neurons become responsible for small sections of the neural network function', 'Adjusting the slope and offset of the function to fit the shape of the sine function', 'The weight between the two hidden layers is set to one, with the weight between the first neuron in the second hidden layer and the output neuron also being set', 'The process of modifying the bias and weight of neuron pairs to create bounded activation and deactivation points', 'Adjusting the weight coming into the initial neuron impacts the outputting slope of the rectified linear, as increasing the weight makes the slope steeper and decreasing it makes it shallower', 'The current problem is that the initial slope is correct for the first part of the sine wave, but a different slope is needed for the subsequent part']}, {'end': 1719.066, 'segs': [{'end': 1567.881, 'src': 'embed', 'start': 1517.253, 'weight': 1, 'content': [{'end': 1518.574, 'text': 'So, as a reminder,', 'start': 1517.253, 'duration': 1.321}, {'end': 1528.163, 'text': 'pep8 is a styling suggestion and I know that many people would like to remove that basically and do maybe something like that and so on.', 'start': 1518.574, 'duration': 9.589}, {'end': 1532.726, 'text': "The idea here is that we're going to have many different types of layers and that's just how we're going to organize them.", 'start': 1528.203, 'duration': 4.523}, {'end': 1539.072, 'text': 'If you really wanted to be official, the way that you would do it is you would have separate directories that would be like your layer directories.', 'start': 1533.087, 'duration': 5.985}, {'end': 1545.394, 'text': 'So you would from you know whatever package dot layers import, dense or something like that.', 'start': 1539.132, 'duration': 6.262}, {'end': 1546.494, 'text': "right. 
that's how you would do it.", 'start': 1545.394, 'duration': 1.1}, {'end': 1548.255, 'text': "that's how all the major packages are doing it.", 'start': 1546.494, 'duration': 1.761}, {'end': 1552.616, 'text': "we're going to try to keep everything simple and in one file just.", 'start': 1548.255, 'duration': 4.361}, {'end': 1553.976, 'text': "i think it's easier to learn that way.", 'start': 1552.616, 'duration': 1.36}, {'end': 1556.877, 'text': "daniel agrees, so we're going to keep it this way.", 'start': 1553.976, 'duration': 2.901}, {'end': 1559.877, 'text': 'if you want to change the names, by all means do that.', 'start': 1556.877, 'duration': 3}, {'end': 1562.198, 'text': "that's, it's really up to you.", 'start': 1559.877, 'duration': 2.321}, {'end': 1565.219, 'text': "we're here to teach you neural networks from scratch, so that you can do this.", 'start': 1562.198, 'duration': 3.021}, {'end': 1567.881, 'text': "however the heck you want, That's kind of the beauty.", 'start': 1565.219, 'duration': 2.662}], 'summary': 'Teaching neural networks from scratch with flexible styling and organization.', 'duration': 50.628, 'max_score': 1517.253, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A1517253.jpg'}, {'end': 1645.55, 'src': 'embed', 'start': 1607.041, 'weight': 0, 'content': [{'end': 1612.962, 'text': 'So, For example, if you type NNFS and then code the part number and then the file name that you want,', 'start': 1607.041, 'duration': 5.921}, {'end': 1616.803, 'text': 'this will give you a copy of the code for that part number.', 'start': 1612.962, 'duration': 3.841}, {'end': 1621.886, 'text': 'So if you want to get the sample code that way you can, you can also just use the GitHub page and so on.', 'start': 1617.384, 'duration': 4.502}, {'end': 1624.987, 'text': 'We might add more stuff to this tool later on, just know it exists.', 'start': 1621.946, 'duration': 3.041}, {'end': 1628.409, 'text': "And if you're going to use that, you'll need to upgrade to get the later parts because they're not there.", 'start': 1625.027, 'duration': 3.382}, {'end': 1631.885, 'text': "The main thing and the main reason why we're going to use.", 'start': 1629.844, 'duration': 2.041}, {'end': 1634.206, 'text': "Actually, there's two reasons we're going to use this NNFS package.", 'start': 1631.885, 'duration': 2.321}, {'end': 1637.747, 'text': 'The first is for stuff like NumPy.', 'start': 1634.466, 'duration': 3.281}, {'end': 1645.55, 'text': 'So our goal and our hope here is that everybody can follow along and get the same values as we do, at least if they want to.', 'start': 1637.787, 'duration': 7.763}], 'summary': 'Nnfs tool provides code for part numbers, aiding in numpy usage.', 'duration': 38.509, 'max_score': 1607.041, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A1607041.jpg'}, {'end': 1707.62, 'src': 'embed', 'start': 1682.929, 'weight': 5, 'content': [{'end': 1688.29, 'text': "So instead, what we're going to do is we're going to import nnfs and then do nnfs.init.", 'start': 1682.929, 'duration': 5.361}, {'end': 1697.433, 'text': "And besides setting the random seed, the other thing nnfs init is doing for us is it's setting a default data type for NumPy to use.", 'start': 1689.068, 'duration': 8.365}, {'end': 1701.596, 'text': 'So, for whatever reason, the dot product for NumPy.', 'start': 1697.914, 'duration': 3.682}, {'end': 1706.919, 'text': 'we only found this out after 
getting many people in the document and finding some people were getting slightly different values,', 'start': 1701.596, 'duration': 5.323}, {'end': 1707.62, 'text': 'and it was very weird.', 'start': 1706.919, 'duration': 0.701}], 'summary': 'Import nnfs, set random seed, and initialize default data type for numpy using nnfs.init.', 'duration': 24.691, 'max_score': 1682.929, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A1682929.jpg'}], 'start': 1517.253, 'title': 'Organizing neural network layers and using nnfs package', 'summary': 'Discusses organizing neural network layers, suggesting separate directories and one file for simplicity, while the nnfs package ensures consistent results and a default data type for numpy, aiming for a uniform user experience.', 'chapters': [{'end': 1567.881, 'start': 1517.253, 'title': 'Neural network layer organization', 'summary': 'Discusses organizing layers in neural networks, with a suggestion to use separate directories for layers and the preference to keep everything simple and in one file, while allowing flexibility for users.', 'duration': 50.628, 'highlights': ['The chapter emphasizes organizing layers in neural networks, suggesting to use separate directories for layers and to keep everything simple and in one file.', "The speaker mentions that it's easier to learn when everything is kept simple and in one file, and Daniel agrees with this approach.", 'The transcript highlights the flexibility for users to change names and organize layers according to their preferences.']}, {'end': 1719.066, 'start': 1569.042, 'title': 'Using nnfs package for neural network activation function', 'summary': 'Introduces using the nnfs package to ensure consistent results in generating code files and setting a default data type for numpy to ensure consistent values, aiming to provide a uniform experience for all users.', 'duration': 150.024, 'highlights': ['The chapter emphasizes the use of the NNFS package to ensure consistent results in generating code files and setting a default data type for NumPy, aiming to provide a uniform experience for all users.', 'The NNFS package allows users to generate code files by using a specific part number and file name, ensuring reproducibility of the code for consistent results.', 'NNFS init sets a default data type for NumPy, addressing the issue of inconsistent values in the dot product for NumPy, thus ensuring uniformity in results.']}], 'duration': 201.813, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A1517253.jpg', 'highlights': ['The NNFS package ensures consistent results and a default data type for NumPy, aiming for a uniform user experience.', 'The chapter emphasizes organizing layers in neural networks, suggesting to use separate directories for layers and to keep everything simple and in one file.', 'The NNFS package allows users to generate code files by using a specific part number and file name, ensuring reproducibility of the code for consistent results.', "The speaker mentions that it's easier to learn when everything is kept simple and in one file, and Daniel agrees with this approach.", 'The transcript highlights the flexibility for users to change names and organize layers according to their preferences.', 'NNFS init sets a default data type for NumPy, addressing the issue of inconsistent values in the dot product for NumPy, thus ensuring uniformity in results.']}, {'end': 2371.83, 'segs': 
[{'end': 1749.296, 'src': 'embed', 'start': 1719.066, 'weight': 0, 'content': [{'end': 1725.028, 'text': "so instead, what we're doing here is we're overriding some things and going to set so everyone uses the same data type.", 'start': 1719.066, 'duration': 5.962}, {'end': 1729.969, 'text': "anyway, if you're in another language, don't worry about it, but if you want to be able to replicate everything, that's how.", 'start': 1725.028, 'duration': 4.941}, {'end': 1735.031, 'text': "the next thing that we're going to use the nnfs package for is the data set again.", 'start': 1729.969, 'duration': 5.062}, {'end': 1741.173, 'text': "if you're following along in another language, i'll either copy and paste this in the description, if i forget to someone remind me.", 'start': 1735.031, 'duration': 6.142}, {'end': 1749.296, 'text': "but anyways, we're going to use this as our, as our data set, And again, this is just a function that will generate some data for us.", 'start': 1741.173, 'duration': 8.123}], 'summary': 'Utilizing nnfs package to generate data set for everyone to replicate', 'duration': 30.23, 'max_score': 1719.066, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A1719066.jpg'}, {'end': 1778.551, 'src': 'embed', 'start': 1749.676, 'weight': 3, 'content': [{'end': 1756.719, 'text': "Eventually we'll use a real dataset, but when we're trying to learn things, it's kind of useful for us to be able to specify exactly what we need,", 'start': 1749.676, 'duration': 7.043}, {'end': 1757.9, 'text': 'of exactly what size.', 'start': 1756.719, 'duration': 1.181}, {'end': 1766.603, 'text': "however many classes, however sparse you know, you can make so many tweaks here and just generate a dataset that's either easy or hard, and so on.", 'start': 1757.9, 'duration': 8.703}, {'end': 1770.765, 'text': "So it just makes a lot of sense at this stage that one, we don't want to hand type anymore.", 'start': 1766.643, 'duration': 4.122}, {'end': 1778.551, 'text': "but also it's just very convenient to use a data set like this that we just create just for learning purposes.", 'start': 1771.627, 'duration': 6.924}], 'summary': 'Creating a customizable dataset is useful for learning, allowing for easy or hard datasets to be generated for learning purposes.', 'duration': 28.875, 'max_score': 1749.676, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A1749676.jpg'}, {'end': 2217.988, 'src': 'heatmap', 'start': 2184.168, 'weight': 1, 'content': [{'end': 2194.116, 'text': "So, in fact, let me let's print print layer layer one dot output, just so we can see it right before we run it through the activation function.", 'start': 2184.168, 'duration': 9.948}, {'end': 2197.398, 'text': 'so in this case you can see lots of negative values.', 'start': 2194.116, 'duration': 3.282}, {'end': 2201.759, 'text': "okay, there's, there are some positive values, but there's also many negatives.", 'start': 2197.398, 'duration': 4.361}, {'end': 2204.54, 'text': 'so what we expect to see after we feed it through here?', 'start': 2201.759, 'duration': 2.781}, {'end': 2209.202, 'text': "because that's negative values after we've done the weights and the biases right.", 'start': 2204.54, 'duration': 4.662}, {'end': 2215.806, 'text': "so so Now it's going to go through that rectified linear function and that should make all of those negatives a zero.", 'start': 2209.202, 'duration': 6.604}, {'end': 2217.988, 'text': "Okay, 
so that's our expectation.", 'start': 2216.287, 'duration': 1.701}], 'summary': 'Demonstrating the rectified linear function on input data, expecting negative values to become zero.', 'duration': 20.59, 'max_score': 2184.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A2184168.jpg'}, {'end': 2292.853, 'src': 'embed', 'start': 2266.623, 'weight': 2, 'content': [{'end': 2276.107, 'text': 'the network is dying recall that one of the fixes that we could make immediately to solve for this is to initialize biases to some sort of non-zero number.', 'start': 2266.623, 'duration': 9.484}, {'end': 2281.503, 'text': 'For example, Uh, it will unlikely be the case that this is going to be a problem.', 'start': 2277.008, 'duration': 4.495}, {'end': 2286.728, 'text': 'Um, but now, you know, how, how could we check that? Well, you just print it out, right? Look at it.', 'start': 2282.084, 'duration': 4.644}, {'end': 2292.853, 'text': "Um, and, uh, that's how you, you know, begin to diagnose some of these errors anyway.", 'start': 2287.428, 'duration': 5.425}], 'summary': 'Initializing biases to a non-zero number can fix network issues.', 'duration': 26.23, 'max_score': 2266.623, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A2266623.jpg'}, {'end': 2340.755, 'src': 'embed', 'start': 2309.93, 'weight': 4, 'content': [{'end': 2312.772, 'text': "It's a cool name, but it's actually unbelievably simple.", 'start': 2309.93, 'duration': 2.842}, {'end': 2317.176, 'text': "So anyways, that's it for the rectified linear activation function.", 'start': 2313.313, 'duration': 3.863}, {'end': 2318.177, 'text': 'In the next tutorial.', 'start': 2317.256, 'duration': 0.921}, {'end': 2327.804, 'text': "what we're going to be talking about is the softmax activation function, which is a bit more advanced and really more specific to the output layer,", 'start': 2318.177, 'duration': 9.627}, {'end': 2329.686, 'text': 'which we kind of hope will be a distribution.', 'start': 2327.804, 'duration': 1.882}, {'end': 2332.088, 'text': 'But we will explain that in the next video.', 'start': 2329.726, 'duration': 2.362}, {'end': 2340.755, 'text': 'again, if you want to check out the book and get a jump on the material in advance, you can check out the book at nnfs.io, where we are much,', 'start': 2333.027, 'duration': 7.728}], 'summary': 'Introduction to rectified linear activation function and upcoming tutorial on softmax activation function at nnfs.io', 'duration': 30.825, 'max_score': 2309.93, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A2309930.jpg'}], 'start': 1719.066, 'title': 'Neural network training and activation functions', 'summary': 'Covers the usage of nnfs package to create challenging datasets for neural network training and explains the rectified linear activation function, emphasizing non-zero biases and hinting at upcoming discussions on the softmax activation function and advanced content at nnfs.io.', 'chapters': [{'end': 2089.429, 'start': 1719.066, 'title': 'Neural network training with nnfs package', 'summary': 'Introduces the usage of the nnfs package to generate a spiral dataset for neural network training, enabling the creation of customizable and challenging datasets for practice, with specific examples and explanations.', 'duration': 370.363, 'highlights': ['The nnfs package is used to generate a spiral dataset for neural network 
training, allowing customization of dataset size and complexity.', 'The dataset function generates a challenging dataset with overlapping classes, suitable for practice and review in neural network training.', 'The function allows the specification of the number of feature sets and classes, with a specific example of generating a dataset with three classes and 100 feature sets per class.']}, {'end': 2371.83, 'start': 2090.931, 'title': 'Activation functions in neural networks', 'summary': "Explains the implementation of the rectified linear activation function, its impact on the layer's output, and the importance of non-zero biases, while also hinting at the upcoming discussion on the softmax activation function and the advanced content available at nnfs.io.", 'duration': 280.899, 'highlights': ["The rectified linear activation function is applied to the entire layer, transforming negative values to zero and potentially impacting the network's optimization process.", "It's important to initialize biases to non-zero numbers to prevent the network from 'dying' with an abundance of zero outputs, serving as a crucial troubleshooting step in neural network development.", 'The upcoming tutorial will cover the more advanced softmax activation function, specific to the output layer, and further resources are available at nnfs.io for comprehensive training material beyond the discussed topics.']}], 'duration': 652.764, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/gmjzbpSVY1A/pics/gmjzbpSVY1A1719066.jpg', 'highlights': ['The nnfs package generates a challenging dataset for neural network training.', 'The rectified linear activation function transforms negative values to zero, impacting network optimization.', "Initializing biases to non-zero numbers prevents the network from 'dying' with an abundance of zero outputs.", 'The dataset function allows customization of dataset size and complexity for practice and review in neural network training.', 'The upcoming tutorial will cover the more advanced softmax activation function specific to the output layer.']}], 'highlights': ['The rectified linear unit (ReLU) activation function provides granular output and allows for weights and biases to impact the output.', 'The sigmoid activation function is more reliable for training a neural network due to the granularity of the output.', 'The vanishing gradient problem with sigmoid justifies using rectified linear activation function.', 'The NNFS package ensures consistent results and a default data type for NumPy, aiming for a uniform user experience.', 'The importance of using different activation functions for hidden layers and output layers in neural networks is mentioned as a common practice.']}