title

Building a neural network FROM SCRATCH (no Tensorflow/Pytorch, just numpy & math)

description

Kaggle notebook with all the code: https://www.kaggle.com/wwsalmon/simple-mnist-nn-from-scratch-numpy-no-tf-keras
Blog article with more/clearer math explanation: https://www.samsonzhang.com/2020/11/24/understanding-the-math-behind-neural-networks-by-building-one-from-scratch-no-tf-keras-just-numpy.html

detail

{'title': 'Building a neural network FROM SCRATCH (no Tensorflow/Pytorch, just numpy & math)', 'heatmap': [{'end': 115.325, 'start': 91.475, 'weight': 0.711}, {'end': 576.515, 'start': 508.251, 'weight': 0.816}, {'end': 966.145, 'start': 940.823, 'weight': 1}], 'summary': 'Demonstrates building a neural network from scratch using numpy and linear algebra, implementing a 3-layer network for digit classification with the mnist dataset, covering backpropagation, achieving 84% accuracy on training data, and 85.5% accuracy on testing data.', 'chapters': [{'end': 36.535, 'segs': [{'end': 36.535, 'src': 'embed', 'start': 0.249, 'weight': 0, 'content': [{'end': 1.41, 'text': 'Hi everyone, my name is Samson.', 'start': 0.249, 'duration': 1.161}, {'end': 4.753, 'text': "Today I'm going to be building a neural network from scratch.", 'start': 1.831, 'duration': 2.922}, {'end': 10.738, 'text': 'So not using TensorFlow, not using Keras, just NumPy with equations, linear algebra from the ground up.', 'start': 4.813, 'duration': 5.925}, {'end': 16.923, 'text': 'Anyone interested in artificial intelligence or machine learning is probably very familiar with neural networks at a high level, right?', 'start': 11.519, 'duration': 5.404}, {'end': 18.925, 'text': 'You have lots of layers, lots of nodes.', 'start': 16.963, 'duration': 1.962}, {'end': 19.966, 'text': 'you connect them all together.', 'start': 18.925, 'duration': 1.041}, {'end': 24.129, 'text': 'you can have some really complex models from it that make some cool predictions, right?', 'start': 19.966, 'duration': 4.163}, {'end': 29.312, 'text': "I find that a lot of that kind of learning right when you're just looking at stuff from the high level, is kind of wishy-washy.", 'start': 24.269, 'duration': 5.043}, {'end': 36.535, 'text': "And even if you go into TensorFlow and implement these networks, it's still a little unclear like how they work, you know, at least for me.", 'start': 29.592, 'duration': 6.943}], 'summary': 
'Samson will build a neural network from scratch using numpy, explaining its complexities and clarifying how it works.', 'duration': 36.286, 'max_score': 0.249, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU249.jpg'}], 'start': 0.249, 'title': 'Building a neural network', 'summary': 'Delves into building a neural network from scratch using numpy and linear algebra, highlighting the lack of clarity in understanding how neural networks work at a high level and the need for a deeper understanding.', 'chapters': [{'end': 36.535, 'start': 0.249, 'title': 'Building a neural network from scratch', 'summary': 'Discusses building a neural network from scratch using numpy and linear algebra, emphasizing the lack of clarity in understanding how neural networks work at a high level and the need for a deeper understanding.', 'duration': 36.286, 'highlights': ['Building a neural network from scratch using NumPy and linear algebra.', 'Emphasizing the lack of clarity in understanding how neural networks work at a high level.', 'The need for a deeper understanding of neural networks.']}], 'duration': 36.286, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU249.jpg', 'highlights': ['Building a neural network from scratch using NumPy and linear algebra.', 'The need for a deeper understanding of neural networks.', 'Emphasizing the lack of clarity in understanding how neural networks work at a high level.']}, {'end': 472.547, 'segs': [{'end': 115.325, 'src': 'heatmap', 'start': 61.768, 'weight': 0, 'content': [{'end': 71.111, 'text': "So what the MNIST dataset is, is it's tens of thousands of these 28 by 28, so pretty low res grayscale images of handwritten digits.", 'start': 61.768, 'duration': 9.343}, {'end': 78.913, 'text': 'We are going to be building a neural network that classifies images of handwritten digits and tells you what digit is written in that image.', 
'start': 71.411, 'duration': 7.502}, {'end': 85.187, 'text': "This is an overview of what everything is gonna look like, what we're gonna implement today.", 'start': 81.823, 'duration': 3.364}, {'end': 88.832, 'text': "And we're gonna start off with 28 by 28 pixel training images.", 'start': 85.207, 'duration': 3.625}, {'end': 90.954, 'text': "So that's 784 pixels overall.", 'start': 88.872, 'duration': 2.082}, {'end': 95.58, 'text': "And each of those pixels is just a pixel value, right? It's between zero and 255.", 'start': 91.475, 'duration': 4.105}, {'end': 98.223, 'text': '255 being completely white, zero being completely black.', 'start': 95.58, 'duration': 2.643}, {'end': 102.849, 'text': 'So we have m of these training images.', 'start': 100.706, 'duration': 2.143}, {'end': 105.592, 'text': 'So we can represent it as a matrix that looks like this.', 'start': 103.209, 'duration': 2.383}, {'end': 115.325, 'text': 'Each row constitutes an example, and each row is going to be 784 columns long, because each of them is going to correspond to one pixel in that image.', 'start': 105.753, 'duration': 9.572}], 'summary': 'Building a neural network to classify 28x28 pixel images of handwritten digits.', 'duration': 43.824, 'max_score': 61.768, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU61768.jpg'}, {'end': 179.07, 'src': 'embed', 'start': 150.075, 'weight': 2, 'content': [{'end': 153.598, 'text': 'So the first layer in this neural network is just going to be 784 nodes.', 'start': 150.075, 'duration': 3.523}, {'end': 159.103, 'text': "This is our input layer, right? 
Each of the 784 pixels maps to a node, and that's our first layer.", 'start': 153.638, 'duration': 5.465}, {'end': 163.867, 'text': "Our second layer is going to be our hidden layer, and it's going to have 10 units.", 'start': 159.784, 'duration': 4.083}, {'end': 171.694, 'text': 'And then our third layer is going to be our output layer, again, with 10 units and each corresponding to one digit that can be predicted.', 'start': 164.388, 'duration': 7.306}, {'end': 179.07, 'text': 'In that intro right there, I called this the first layer, this the second layer, and this the third layer.', 'start': 172.645, 'duration': 6.425}], 'summary': 'Neural network: 784 input nodes, 10 units in hidden and output layers.', 'duration': 28.995, 'max_score': 150.075, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU150075.jpg'}, {'end': 370.456, 'src': 'embed', 'start': 342.052, 'weight': 3, 'content': [{'end': 343.753, 'text': 'rather than just a linear model.', 'start': 342.052, 'duration': 1.701}, {'end': 348.075, 'text': "And that's going to make your model able to be a lot more complex and more powerful.", 'start': 343.853, 'duration': 4.222}, {'end': 354.92, 'text': "we're going to be using another really commonly used activation function called relu, or rectified linear unit.", 'start': 348.855, 'duration': 6.065}, {'end': 356.842, 'text': 'So relu just looks like this.', 'start': 355.361, 'duration': 1.481}, {'end': 357.923, 'text': "It's really simple.", 'start': 356.962, 'duration': 0.961}, {'end': 364.228, 'text': 'And how it works, so relu of x is defined as if x is greater than zero, relu of x is just equal to x.', 'start': 358.603, 'duration': 5.625}, {'end': 365.049, 'text': "It's literally linear.", 'start': 364.228, 'duration': 0.821}, {'end': 369.154, 'text': "And if x is less than zero or less than or equal to zero, it doesn't really matter.", 'start': 365.789, 'duration': 3.365}, {'end': 370.456, 'text': 
"It's equal to zero.", 'start': 369.675, 'duration': 0.781}], 'summary': 'Using relu activation function for a more powerful model.', 'duration': 28.404, 'max_score': 342.052, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU342052.jpg'}, {'end': 456.601, 'src': 'embed', 'start': 428.417, 'weight': 5, 'content': [{'end': 430.64, 'text': 'We want each of them to have a probability, right?', 'start': 428.417, 'duration': 2.223}, {'end': 438.567, 'text': "We want each of them to be a value between zero and one, where one is absolute certainty and zero is no chance that that's what it is.", 'start': 430.66, 'duration': 7.907}, {'end': 440.75, 'text': 'So to get that, we use a softmax function.', 'start': 438.968, 'duration': 1.782}, {'end': 442.832, 'text': "So I've just copy pasted an image here.", 'start': 441.29, 'duration': 1.542}, {'end': 445.334, 'text': "I didn't feel like writing this out myself.", 'start': 442.852, 'duration': 2.482}, {'end': 448.918, 'text': 'It takes each of the nodes in the layer that you feed into it.', 'start': 445.614, 'duration': 3.304}, {'end': 456.601, 'text': 'and it goes e to the z, so e to that node, divided by the sum of e to all of the nodes, right?', 'start': 450.378, 'duration': 6.223}], 'summary': 'Using softmax function to assign probability values between 0 and 1 to each node.', 'duration': 28.184, 'max_score': 428.417, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU428417.jpg'}], 'start': 36.956, 'title': 'Implementing neural network for digit classification', 'summary': 'Focuses on implementing a neural network from scratch for digit classification using the mnist dataset, consisting of 28x28 grayscale images of handwritten digits. 
it covers the construction of a 3-layer neural network and explains the process of applying weights, biases, and activation functions, including the importance of non-linear activation functions like relu and softmax for improved model performance.', 'chapters': [{'end': 128.308, 'start': 36.956, 'title': 'Neural network for digit classification', 'summary': 'Focuses on implementing a neural network from scratch for digit classification using the mnist dataset, consisting of 28x28 grayscale images of handwritten digits, aiming to build a model that can classify and identify the digits in the images.', 'duration': 91.352, 'highlights': ['The MNIST dataset comprises tens of thousands of 28x28 grayscale images of handwritten digits, serving as the basis for building the neural network for digit classification.', 'The neural network will process 28x28 pixel training images, each containing 784 pixels represented as pixel values ranging from 0 to 255, and transpose the matrix to treat each column as an example to be analyzed.']}, {'end': 236.833, 'start': 128.768, 'title': 'Neural network layers and training', 'summary': 'Covers the construction of a neural network with 3 layers - input layer, hidden layer, and output layer, each with 784, 10, and 10 units respectively, and the three parts of training: forward propagation, computing the output, and the initial variable a zero.', 'duration': 108.065, 'highlights': ['The neural network consists of an input layer with 784 nodes, a hidden layer with 10 units, and an output layer with 10 units, each corresponding to a predicted digit.', 'The three parts of training the network include forward propagation, computing the output, and the initial variable a zero, which is equal to x.']}, {'end': 472.547, 'start': 236.833, 'title': 'Neural network activation functions', 'summary': 'Explains the process of applying weights, biases, and activation functions in a neural network, including the importance of non-linear activation 
functions like relu and softmax for creating complexity and power, ultimately leading to improved model performance and output probability values between zero and one.', 'duration': 235.714, 'highlights': ['The importance of applying non-linear activation functions like relu and softmax for creating complexity and power in a neural network, ultimately leading to improved model performance. The chapter emphasizes the significance of using non-linear activation functions such as relu and softmax to add complexity and power to the neural network, improving its model performance.', 'Explanation of the relu activation function and its simple nature, where relu of x is defined as x if x is greater than zero, and zero if x is less than or equal to zero. The chapter provides an explanation of the relu activation function, defining it as equal to x if x is greater than zero and equal to zero if x is less than or equal to zero, highlighting its simplicity.', 'Detailed explanation of the softmax function used for the output layer, aiming to produce probability values between zero and one for each node, corresponding to the recognition of 10 digits. 
The chapter offers a detailed explanation of the softmax function utilized for the output layer, aiming to produce probability values between zero and one for each node, correlating to the recognition of 10 digits.']}], 'duration': 435.591, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU36956.jpg', 'highlights': ['The MNIST dataset comprises tens of thousands of 28x28 grayscale images of handwritten digits, serving as the basis for building the neural network for digit classification.', 'The neural network will process 28x28 pixel training images, each containing 784 pixels represented as pixel values ranging from 0 to 255, and transpose the matrix to treat each column as an example to be analyzed.', 'The neural network consists of an input layer with 784 nodes, a hidden layer with 10 units, and an output layer with 10 units, each corresponding to a predicted digit.', 'The importance of applying non-linear activation functions like relu and softmax for creating complexity and power in a neural network, ultimately leading to improved model performance.', 'Explanation of the relu activation function and its simple nature, where relu of x is defined as x if x is greater than zero, and zero if x is less than or equal to zero.', 'Detailed explanation of the softmax function used for the output layer, aiming to produce probability values between zero and one for each node, corresponding to the recognition of 10 digits.']}, {'end': 669.289, 'segs': [{'end': 499.682, 'src': 'embed', 'start': 473.167, 'weight': 0, 'content': [{'end': 476.668, 'text': 'So forward propagation is how you take an image and get a prediction out.', 'start': 473.167, 'duration': 3.501}, {'end': 481.209, 'text': "But that's not enough, right? 
We need good weights and biases to make these predictions.", 'start': 477.028, 'duration': 4.181}, {'end': 485.47, 'text': 'And the whole thing about machine learning is that we will learn these weights and biases right?', 'start': 481.669, 'duration': 3.801}, {'end': 490.892, 'text': 'We will run an algorithm to optimize these weights and biases as we run it over and over again.', 'start': 485.51, 'duration': 5.382}, {'end': 495.757, 'text': "We're going to do that in something called backprop, backwards propagation.", 'start': 491.272, 'duration': 4.485}, {'end': 499.682, 'text': "And what we're doing here is basically going the opposite way.", 'start': 496.458, 'duration': 3.224}], 'summary': 'Machine learning optimizes weights and biases through backpropagation for accurate predictions.', 'duration': 26.515, 'max_score': 473.167, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU473167.jpg'}, {'end': 576.515, 'src': 'heatmap', 'start': 508.251, 'weight': 0.816, 'content': [{'end': 512.993, 'text': "And then we're going to see how much each of the previous weights and biases contributed to that error.", 'start': 508.251, 'duration': 4.742}, {'end': 515.294, 'text': "And then we're going to adjust those things accordingly.", 'start': 513.373, 'duration': 1.921}, {'end': 518.534, 'text': 'So DZ2 represents sort of an error of the second layer.', 'start': 515.394, 'duration': 3.14}, {'end': 521.796, 'text': "It's how much the output layer is off by.", 'start': 519.375, 'duration': 2.421}, {'end': 523.356, 'text': 'And this is really simple.', 'start': 522.336, 'duration': 1.02}, {'end': 526.758, 'text': 'We just take our predictions and we subtract the actual labels from them.', 'start': 523.517, 'duration': 3.241}, {'end': 529.159, 'text': "And we're going to one hot encode the correct label.", 'start': 526.878, 'duration': 2.281}, {'end': 532.82, 'text': "So for example, if y equals 4, we're not going to 
subtract 4 from this.", 'start': 529.479, 'duration': 3.341}, {'end': 541.328, 'text': "We're gonna encode y equals four into this array here, where the fourth index representing the fourth class is a one, everything else has a zero.", 'start': 533.3, 'duration': 8.028}, {'end': 546.814, 'text': 'From there, we do some fancy maths to figure out how much w and b contributed to that error.', 'start': 541.388, 'duration': 5.426}, {'end': 554.282, 'text': 'So we can find this variable dw2, that is the derivative of the loss function with respect to the weights in layer two.', 'start': 547.114, 'duration': 7.168}, {'end': 556.444, 'text': "db2, this one's really easy to understand.", 'start': 554.502, 'duration': 1.942}, {'end': 563.508, 'text': 'What this literally is is an average of the absolute error, literally just the error, how much the output was off by.', 'start': 556.964, 'duration': 6.544}, {'end': 564.749, 'text': "That's for the second layer.", 'start': 563.728, 'duration': 1.021}, {'end': 576.515, 'text': "We're finding how much the second layer was off by from the prediction and then we find out correspondingly how much we should nudge our weight and our biases for this layer.", 'start': 565.109, 'duration': 11.406}], 'summary': 'Analyzing error contributions and adjusting weights and biases for better accuracy.', 'duration': 68.264, 'max_score': 508.251, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU508251.jpg'}, {'end': 564.749, 'src': 'embed', 'start': 526.878, 'weight': 1, 'content': [{'end': 529.159, 'text': "And we're going to one hot encode the correct label.", 'start': 526.878, 'duration': 2.281}, {'end': 532.82, 'text': "So for example, if y equals 4, we're not going to subtract 4 from this.", 'start': 529.479, 'duration': 3.341}, {'end': 541.328, 'text': "We're gonna encode y equals four into this array here, where the fourth index representing the fourth class is a one, everything else 
has a zero.", 'start': 533.3, 'duration': 8.028}, {'end': 546.814, 'text': 'From there, we do some fancy maths to figure out how much w and b contributed to that error.', 'start': 541.388, 'duration': 5.426}, {'end': 554.282, 'text': 'So we can find this variable dw2, that is the derivative of the loss function with respect to the weights in layer two.', 'start': 547.114, 'duration': 7.168}, {'end': 556.444, 'text': "db2, this one's really easy to understand.", 'start': 554.502, 'duration': 1.942}, {'end': 563.508, 'text': 'What this literally is is an average of the absolute error, literally just the error, how much the output was off by.', 'start': 556.964, 'duration': 6.544}, {'end': 564.749, 'text': "That's for the second layer.", 'start': 563.728, 'duration': 1.021}], 'summary': 'One-hot encode labels, calculate dw2 and db2 for error analysis.', 'duration': 37.871, 'max_score': 526.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU526878.jpg'}, {'end': 669.289, 'src': 'embed', 'start': 621.82, 'weight': 3, 'content': [{'end': 629.107, 'text': 'so after we do all this fancy calculation and figure out how much each weight term, each bias term, contributed to the error,', 'start': 621.82, 'duration': 7.287}, {'end': 630.709, 'text': 'we update our parameters accordingly.', 'start': 629.107, 'duration': 1.602}, {'end': 638.592, 'text': 'So this is a set of pretty simple equations, right? 
W1 is equal to W1 minus alpha, some learning rate alpha, times DW1.', 'start': 631.189, 'duration': 7.403}, {'end': 641.152, 'text': 'Similarly, B1 is B1 minus alpha times DB1.', 'start': 638.672, 'duration': 2.48}, {'end': 644.674, 'text': 'W2 is W2 minus alpha times DW2.', 'start': 641.933, 'duration': 2.741}, {'end': 648.975, 'text': 'And B2 is B2 minus alpha times DB2.', 'start': 644.874, 'duration': 4.101}, {'end': 651.056, 'text': "Alpha is what's called a hyperparameter.", 'start': 649.355, 'duration': 1.701}, {'end': 652.356, 'text': "It's not trained by the model.", 'start': 651.116, 'duration': 1.24}, {'end': 660.359, 'text': 'When you run this cycle, when you run gradient descent, the learning rate is a parameter that you set, not that gradient descent set.', 'start': 653.717, 'duration': 6.642}, {'end': 664.523, 'text': "So once we've updated it, we run through the whole thing again.", 'start': 661.019, 'duration': 3.504}, {'end': 669.289, 'text': 'We go through forward prop, we make a new round of predictions, we are changing our parameters,', 'start': 665.624, 'duration': 3.665}], 'summary': 'Parameters are updated using simple equations with a learning rate alpha, which is a hyperparameter set by the user.', 'duration': 47.469, 'max_score': 621.82, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU621820.jpg'}], 'start': 473.167, 'title': 'Backpropagation in machine learning', 'summary': 'Covers backpropagation for adjusting weights and biases to minimize errors, as well as one-hot encoding, and parameter update involving derivative calculation, error calculation, and gradient descent with a learning rate alpha.', 'chapters': [{'end': 546.814, 'start': 473.167, 'title': 'Backpropagation and weight optimization', 'summary': 'Explains backpropagation in machine learning, emphasizing the process of adjusting weights and biases to minimize errors in predictions, as well as one-hot encoding for label 
representation.', 'duration': 73.647, 'highlights': ['Backpropagation involves adjusting weights and biases to minimize prediction errors by finding the deviation from the actual label and optimizing the model through iterative algorithms.', 'One-hot encoding is used to represent the correct label, ensuring that the model calculates the deviation and adjusts weights accurately.', 'DZ2 represents the error of the output layer, calculated by subtracting the actual labels from the predictions and applying mathematical operations to determine the contribution of weights and biases to the error.']}, {'end': 669.289, 'start': 547.114, 'title': 'Backpropagation and parameter update', 'summary': 'Explains the backpropagation process for neural network training, involving finding the derivative of the loss function for each layer, using reverse propagation to calculate errors, and updating parameters using gradient descent with a learning rate alpha.', 'duration': 122.175, 'highlights': ['The backpropagation process involves finding the derivative of the loss function for each layer to calculate the errors and adjust the weights and biases accordingly.', 'The update equations for the parameters involve subtracting the product of the learning rate alpha and the derivative of the loss function with respect to the parameter from the parameter value.', 'Alpha, the learning rate, is a hyperparameter set by the user and not trained by the model, used in the parameter update equations during gradient descent.']}], 'duration': 196.122, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU473167.jpg', 'highlights': ['Backpropagation involves adjusting weights and biases to minimize prediction errors through iterative algorithms.', 'One-hot encoding ensures accurate weight adjustments by representing the correct label.', 'Derivative of the loss function is used to calculate errors and adjust parameters in backpropagation.', 
'Parameter update equations involve subtracting the product of learning rate and derivative of the loss function from the parameter value.', 'Alpha, the learning rate, is a user-set hyperparameter used in parameter update equations during gradient descent.']}, {'end': 1136.452, 'segs': [{'end': 716.788, 'src': 'embed', 'start': 669.289, 'weight': 0, 'content': [{'end': 675.055, 'text': 'tweaking them so that the prediction is ever closer to what the actual, correct answer should be.', 'start': 669.289, 'duration': 5.766}, {'end': 675.956, 'text': "That's math.", 'start': 675.355, 'duration': 0.601}, {'end': 678.819, 'text': "Now let's get to coding it up.", 'start': 676.877, 'duration': 1.942}, {'end': 684.686, 'text': "I'm going to attempt to do it in the next 30 minutes.", 'start': 681.465, 'duration': 3.221}, {'end': 687.987, 'text': "Bang We're going to be doing this on a site called Kaggle.", 'start': 685.066, 'duration': 2.921}, {'end': 696.21, 'text': 'Kaggle is this really great site that makes it really easy to access data sets and have notebooks, have Python notebooks, Jupyter notebooks.', 'start': 688.027, 'duration': 8.183}, {'end': 704.978, 'text': 'So our digit recognizer data set is already nicely configured here, and all we have to do is hit New Notebook, and we will be taken into a kernel.', 'start': 696.75, 'duration': 8.228}, {'end': 705.899, 'text': 'Here we are.', 'start': 705.498, 'duration': 0.401}, {'end': 707.5, 'text': 'We have a notebook.', 'start': 706.199, 'duration': 1.301}, {'end': 709.202, 'text': "And let's just rename it real quick.", 'start': 707.921, 'duration': 1.281}, {'end': 711.544, 'text': "So the first thing that we're going to do is import our package.", 'start': 709.322, 'duration': 2.222}, {'end': 716.788, 'text': "So numpy is for linear algebra, it's for working with matrices, and then pandas is just for reading the data.", 'start': 711.784, 'duration': 5.004}], 'summary': 'Coding a digit recognizer in python on kaggle 
in 30 minutes', 'duration': 47.499, 'max_score': 669.289, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU669289.jpg'}, {'end': 809.353, 'src': 'embed', 'start': 782.314, 'weight': 3, 'content': [{'end': 787.055, 'text': "So you set aside a chunk of data that you don't train on and that's your dev or your cross-validation data.", 'start': 782.314, 'duration': 4.741}, {'end': 790.337, 'text': 'And that way you can test your hyperparameters on that data.', 'start': 787.595, 'duration': 2.742}, {'end': 792.099, 'text': 'You can test your performance on that data.', 'start': 790.357, 'duration': 1.742}, {'end': 795.061, 'text': 'You eliminate the risk of overfitting to your actual training data.', 'start': 792.239, 'duration': 2.822}, {'end': 796.602, 'text': "We're going to shuffle our data.", 'start': 795.201, 'duration': 1.401}, {'end': 800.906, 'text': "So hopefully it's just np.random.shuffle, okay? Hopefully I'm not going crazy here.", 'start': 796.622, 'duration': 4.284}, {'end': 802.487, 'text': "We're going to do one thing before that.", 'start': 800.926, 'duration': 1.561}, {'end': 805.089, 'text': "We're going to get the dimensions of the data.", 'start': 802.527, 'duration': 2.562}, {'end': 809.353, 'text': 'So m and n is going to be called data.shape.', 'start': 805.189, 'duration': 4.164}], 'summary': 'Dev/cross-validation data used to test hyperparameters. Eliminates overfitting risk. 
data shuffled and dimensions obtained.', 'duration': 27.039, 'max_score': 782.314, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU782314.jpg'}, {'end': 844.209, 'src': 'embed', 'start': 816.336, 'weight': 2, 'content': [{'end': 820.298, 'text': "It's the amount of features plus one because we have this label column, but it'll be helpful to have that.", 'start': 816.336, 'duration': 3.962}, {'end': 821.539, 'text': 'So there we go.', 'start': 820.879, 'duration': 0.66}, {'end': 829.043, 'text': "And now we're going to go data dev is going to be equal to data from zero to 1,000.", 'start': 822.259, 'duration': 6.784}, {'end': 832.464, 'text': 'Okay, the first 1,000 examples, and then I want all of it.', 'start': 829.043, 'duration': 3.421}, {'end': 833.365, 'text': "So I'm going to go with that.", 'start': 832.504, 'duration': 0.861}, {'end': 835.106, 'text': "Let's transpose this as well.", 'start': 833.785, 'duration': 1.321}, {'end': 837.287, 'text': "Okay, so we're going to transpose it.", 'start': 835.666, 'duration': 1.621}, {'end': 844.209, 'text': "So if you remember, that's the thing where we flip it so that each column is an example rather than each row.", 'start': 837.307, 'duration': 6.902}], 'summary': 'Data dev includes first 1000 examples, transposed for feature analysis.', 'duration': 27.873, 'max_score': 816.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU816336.jpg'}, {'end': 935.92, 'src': 'embed', 'start': 907.16, 'weight': 5, 'content': [{'end': 909.562, 'text': 'So OK, now we have our data.', 'start': 907.16, 'duration': 2.402}, {'end': 913.905, 'text': 'Now we can start just bashing out code for the actual neural network part.', 'start': 910.002, 'duration': 3.903}, {'end': 915.186, 'text': "Let's see how much time we spent.", 'start': 913.925, 'duration': 1.261}, {'end': 916.147, 'text': 'OK, seven minutes.', 'start': 
915.346, 'duration': 0.801}, {'end': 916.807, 'text': "We'll move fast.", 'start': 916.227, 'duration': 0.58}, {'end': 917.368, 'text': "We'll move fast.", 'start': 916.827, 'duration': 0.541}, {'end': 920.75, 'text': "The first thing we're going to do is initialize all of our parameters.", 'start': 917.848, 'duration': 2.902}, {'end': 927.415, 'text': 'We need a starting w1, b1, w2, b2, right? And we have all of the dimensions for them here.', 'start': 921.01, 'duration': 6.405}, {'end': 928.315, 'text': "So we're going to go def.", 'start': 927.595, 'duration': 0.72}, {'end': 931.597, 'text': 'And this takes no arguments.', 'start': 930.337, 'duration': 1.26}, {'end': 935.92, 'text': "It doesn't need any arguments for that function because it's creating them completely from scratch.", 'start': 931.698, 'duration': 4.222}], 'summary': 'Data ready, beginning neural network code. completed initialization in 7 minutes.', 'duration': 28.76, 'max_score': 907.16, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU907160.jpg'}, {'end': 968.507, 'src': 'heatmap', 'start': 940.823, 'weight': 1, 'content': [{'end': 944.185, 'text': 'And I think, is it that? 
np.random.rand.', 'start': 940.823, 'duration': 3.362}, {'end': 946.026, 'text': "I believe it's randn.", 'start': 944.205, 'duration': 1.821}, {'end': 956.176, 'text': 'Okay. And our first, and we want the dimensions of this array to be, w1 is going to be 10 by 784.', 'start': 946.046, 'duration': 10.13}, {'end': 961.721, 'text': 'This is going to generate random values between 0 and 1 for each element of this array.', 'start': 956.176, 'duration': 5.545}, {'end': 966.145, 'text': 'So we actually subtract 0.5 from that to get it between negative 0.5 and 0.5.', 'start': 961.741, 'duration': 4.404}, {'end': 968.507, 'text': 'b1 is going to be np.random.random.', 'start': 966.145, 'duration': 2.362}], 'summary': 'Using np.random to generate random values for w1 and b1 arrays.', 'duration': 27.684, 'max_score': 940.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU940823.jpg'}, {'end': 1098.991, 'src': 'embed', 'start': 1069.958, 'weight': 6, 'content': [{'end': 1072.48, 'text': 'So a1 is going to equal to ReLU of z1.', 'start': 1069.958, 'duration': 2.522}, {'end': 1073.781, 'text': "That's exactly what we want.", 'start': 1072.5, 'duration': 1.281}, {'end': 1075.582, 'text': "Great. And now we'll do z2.", 'start': 1074.601, 'duration': 0.981}, {'end': 1080.065, 'text': 'z2 is going to be equal to w2 dot a1 now plus b2.', 'start': 1075.702, 'duration': 4.363}, {'end': 1082.667, 'text': 'And then a2 is going to be equal to softmax.', 'start': 1080.686, 'duration': 1.981}, {'end': 1093.949, 'text': "of a1, and again we're going to need to define softmax now, softmax of z, and we're going to return here.", 'start': 1084.927, 'duration': 9.022}, {'end': 1094.53, 'text': "it's going to be.", 'start': 1093.949, 'duration': 0.581}, {'end': 1098.991, 'text': "we're going to reference this formula right here and what we can actually do is np.exp.", 'start': 1094.53, 'duration': 4.461}], 'summary': 'Defining a1, z2, 
and a2 in a neural network with softmax activation.', 'duration': 29.033, 'max_score': 1069.958, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1069958.jpg'}], 'start': 669.289, 'title': 'Implementing digit recognition and neural network data splitting', 'summary': 'Involves coding a digit recognition model using a kaggle dataset, utilizing pandas for data import and conversion to numpy arrays, with a goal of creating a machine learning model within 30 minutes. it also explains how to split data into training and dev sets, shuffle the data, get dimensions of the data, and initialize parameters for a neural network, covering forward propagation with the relu and softmax functions.', 'chapters': [{'end': 759.268, 'start': 669.289, 'title': 'Implementing digit recognition with kaggle data', 'summary': 'Involves coding a digit recognition model using a kaggle dataset, utilizing pandas for data import and conversion to numpy arrays, with a goal of creating a machine learning model within 30 minutes.', 'duration': 89.979, 'highlights': ['The chapter involves coding a digit recognition model using a Kaggle dataset, aiming to complete it within 30 minutes', 'Kaggle provides easy access to well-configured datasets and Python notebooks for machine learning tasks', 'Pandas is used to import the data, which is then converted to numpy arrays for manipulation and linear algebra operations']}, {'end': 1136.452, 'start': 759.788, 'title': 'Neural network data splitting and initialization', 'summary': 'Explains how to split data into training and dev sets, shuffle the data, get dimensions of the data, and initialize parameters for a neural network. 
It also covers forward propagation with the ReLU and softmax functions.', 'duration': 376.664, 'highlights': ['The chapter explains the process of splitting data into training and dev sets to avoid overfitting, with a specific example of taking the first 1,000 examples as the dev set and transposing it for easier manipulation. Data split into training and dev sets, with dev set including the first 1,000 examples.', "It covers the process of shuffling the data using the 'np.random.shuffle' function to randomize the order of the examples for better training. Implementation of data shuffling for better training.", "The explanation includes getting the dimensions of the data using 'data.shape' to determine the number of rows and columns, where the column count is the number of features plus one because of the label column. Obtaining the dimensions of the data and adjusting the number of features.", 'The transcript details the initialization of parameters for the neural network, including the generation of random values between -0.5 and 0.5 for weights and biases. Initialization of parameters for the neural network, generating random values for weights and biases.', 'It covers the process of forward propagation with the ReLU and softmax functions, including the calculation of z1, a1, z2, and a2 using matrix operations and the ReLU function. 
Explanation of forward propagation using matrix operations and ReLU function.']}], 'duration': 467.163, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU669289.jpg', 'highlights': ['The chapter involves coding a digit recognition model using a Kaggle dataset, aiming to complete it within 30 minutes', 'Pandas is used to import the data, which is then converted to numpy arrays for manipulation and linear algebra operations', 'The chapter explains the process of splitting data into training and dev sets to avoid overfitting, with a specific example of splitting data dev from 0 to 1,000 and transposing it for easier manipulation', "It covers the process of shuffling the data using the 'mp.random.shuffle' function to randomize the order of the examples for better training", "The explanation includes getting the dimensions of the data using 'data.shape' to determine the amount of rows and features, where the amount of features is adjusted by adding one to account for the label column", 'The transcript details the initialization of parameters for the neural network, including the generation of random values between -0.5 and 0.5 for weights and biases', 'It covers the process of forward propagation with the ReLU and softmax functions, including the calculation of z1, a1, z2, and a2 using matrix operations and the ReLU function']}, {'end': 1694.105, 'segs': [{'end': 1255.305, 'src': 'embed', 'start': 1230.738, 'weight': 1, 'content': [{'end': 1240.501, 'text': "So what this is doing is it's just going through and it's saying, for each row, go to the column specified by the label in y and set it to 1, right?", 'start': 1230.738, 'duration': 9.763}, {'end': 1241.581, 'text': "And that's beautiful.", 'start': 1240.941, 'duration': 0.64}, {'end': 1242.761, 'text': "That's exactly what we want.", 'start': 1241.621, 'duration': 1.14}, {'end': 1245.802, 'text': "That'll do the one-hot encoding for us.", 'start': 1243.642, 
'duration': 2.16}, {'end': 1255.305, 'text': 'And the one last thing we want to do is just flip this, okay? one_hot_Y.T and then we can return one_hot_Y.', 'start': 1246.002, 'duration': 9.303}], 'summary': 'Code sets specified column to 1, achieving one-hot encoding.', 'duration': 24.567, 'max_score': 1230.738, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1230738.jpg'}, {'end': 1334.524, 'src': 'embed', 'start': 1286.34, 'weight': 2, 'content': [{'end': 1290.103, 'text': 'dw2 now is going to be equal to one over m.', 'start': 1286.34, 'duration': 3.763}, {'end': 1290.944, 'text': "Okay, let's define m.", 'start': 1290.103, 'duration': 0.841}, {'end': 1296.94, 'text': 'm is going to be y.size, just like we did before.', 'start': 1292.956, 'duration': 3.984}, {'end': 1302.064, 'text': '1 over m times dz2.', 'start': 1296.96, 'duration': 5.104}, {'end': 1306.108, 'text': 'Looks like this is a dz2 dot a1.T.', 'start': 1303.025, 'duration': 3.083}, {'end': 1312.555, 'text': "Yeah, I believe that's right.", 'start': 1311.795, 'duration': 0.76}, {'end': 1313.576, 'text': "I believe that's right.", 'start': 1312.575, 'duration': 1.001}, {'end': 1324.2, 'text': 'Next, db2 is just 1 over m times np.sum of dz2.', 'start': 1314.956, 'duration': 9.244}, {'end': 1329.382, 'text': 'And then we want our dz1, which is going to be our fancy formula here.', 'start': 1324.5, 'duration': 4.882}, {'end': 1334.524, 'text': "It's going to be w2 dot transpose dot..", 'start': 1329.402, 'duration': 5.122}], 'summary': 'Calculating derivatives for neural network parameters using defined formulas.', 'duration': 48.184, 'max_score': 1286.34, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1286340.jpg'}, {'end': 1408.307, 'src': 'embed', 'start': 1382.862, 'weight': 4, 'content': [{'end': 1388.664, 'text': 'This works because when booleans are converted to numbers, true converts 
to 1 and false converts to 0.', 'start': 1382.862, 'duration': 5.802}, {'end': 1391.465, 'text': "If one element in z is greater than 0, we're going to return 1.", 'start': 1388.664, 'duration': 2.801}, {'end': 1395.026, 'text': "And if otherwise, we're going to return 0, which is exactly the derivative that we want.", 'start': 1391.465, 'duration': 3.561}, {'end': 1397.606, 'text': 'Beautiful And then now the same thing as we did before.', 'start': 1395.346, 'duration': 2.26}, {'end': 1408.307, 'text': "Boop I believe it's actually going to want x as well because we're going to have x.t here.", 'start': 1397.626, 'duration': 10.681}], 'summary': 'Convert booleans to numbers, return 1 if z has element > 0, else return 0.', 'duration': 25.445, 'max_score': 1382.862, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1382862.jpg'}, {'end': 1491.35, 'src': 'embed', 'start': 1464.453, 'weight': 3, 'content': [{'end': 1471.878, 'text': "Perfect So in theory, if that's correctly implemented, that is all of the functions that we need for doing our gradient descent on our neural network.", 'start': 1464.453, 'duration': 7.425}, {'end': 1473.299, 'text': 'Def gradient descent.', 'start': 1471.958, 'duration': 1.341}, {'end': 1474.26, 'text': "That's what we're going to do now.", 'start': 1473.379, 'duration': 0.881}, {'end': 1477.262, 'text': "We're going to take x and y first because we need both of those.", 'start': 1474.68, 'duration': 2.582}, {'end': 1478.744, 'text': "And then we're going to take iterations.", 'start': 1477.282, 'duration': 1.462}, {'end': 1484.207, 'text': "Alpha And I believe that's all that we need.", 'start': 1481.185, 'duration': 3.022}, {'end': 1491.35, 'text': "So first we're going to go w1, b1, w2, b2 is equal to init params.", 'start': 1485.027, 'duration': 6.323}], 'summary': 'Functions for neural network gradient descent implemented successfully.', 'duration': 26.897, 'max_score': 
1464.453, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1464453.jpg'}, {'end': 1694.105, 'src': 'embed', 'start': 1640.975, 'weight': 0, 'content': [{'end': 1645.657, 'text': "i'm just gonna skip over all of that in editing and we're gonna go straight to the end, straight to an hour later,", 'start': 1640.975, 'duration': 4.682}, {'end': 1650.079, 'text': 'when i figured out what was wrong and finally ran the model and go here.', 'start': 1645.657, 'duration': 4.422}, {'end': 1673.821, 'text': 'But look at that 75% accuracy, 78% at iteration 250..', 'start': 1657.383, 'duration': 16.438}, {'end': 1675.622, 'text': '84% accuracy on training data is what we got.', 'start': 1673.821, 'duration': 1.801}, {'end': 1681.043, 'text': "So there's definitely a lot of room left for improvement, things like adding more layers, adding more unit cell layers,", 'start': 1676.042, 'duration': 5.001}, {'end': 1687.004, 'text': 'but 84% for something that we threw together in less than 30 minutes is not bad.', 'start': 1681.043, 'duration': 5.961}, {'end': 1687.684, 'text': "It's not bad.", 'start': 1687.024, 'duration': 0.66}, {'end': 1690.205, 'text': "So let's try out a few things.", 'start': 1687.824, 'duration': 2.381}, {'end': 1694.105, 'text': "Let's now, and I'm just gonna rip code straight off from my old thing now.", 'start': 1690.365, 'duration': 3.74}], 'summary': 'Achieved 84% accuracy on training data, with potential for improvement by adding more layers and unit cell layers.', 'duration': 53.13, 'max_score': 1640.975, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1640975.jpg'}], 'start': 1136.492, 'title': 'Neural network training', 'summary': 'Covers back propagation, gradient descent, and training progress in a neural network, achieving an 84% accuracy on the training data after 250 iterations, with potential for further improvement.', 'chapters': 
[{'end': 1397.606, 'start': 1136.492, 'title': 'Neural network back propagation', 'summary': 'Covers the back propagation process in a neural network, including one-hot encoding of labels, calculation of gradients, and implementation of the derivative of the ReLU activation function.', 'duration': 261.114, 'highlights': ['The process of one-hot encoding the labels involves creating a correctly sized matrix and setting the appropriate elements to 1, which simplifies the representation of output classes.', 'The calculation of gradients involves formulas such as dw2 = 1/m * dz2.dot(a1.T) and db2 = 1/m * np.sum(dz2), where m is the number of training examples, allowing for efficient optimization of the neural network.', 'The implementation of the derivative of the ReLU activation function involves a simple approach based on boolean conversion, where elements greater than 0 return 1 and others return 0, providing an elegant solution for calculating the derivative.']}, {'end': 1545.183, 'start': 1397.626, 'title': 'Neural network gradient descent', 'summary': 'Discusses the implementation of functions for gradient descent on a neural network, including updateParams, gradient descent, and the use of forward and back propagation, with the aim to optimize parameters w1, b1, w2, b2 for a specified number of iterations using the learning rate alpha.', 'duration': 147.557, 'highlights': ['The chapter covers the implementation of functions for gradient descent on a neural network, including updateParams, gradient descent, and the use of forward and back propagation to optimize parameters w1, b1, w2, b2 for a specified number of iterations using alpha.', 'The functions updateParams, gradient descent, forward prop, and back prop are discussed for the optimization of parameters w1, b1, w2, b2 using alpha for a specified number of iterations.', 'The process involves utilizing the forward prop function to calculate z1, a1, z2, a2, and then utilizing the back prop function to obtain dw1, db1, dw2, db2, which 
are subsequently used in the updateParams function to update the weights w1, b1, w2, b2 for a specified number of iterations using alpha.']}, {'end': 1694.105, 'start': 1545.543, 'title': 'Neural network training progress', 'summary': 'Details the process of training a neural network, including debugging errors and achieving an 84% accuracy on the training data after 250 iterations, highlighting the potential for further improvement.', 'duration': 148.562, 'highlights': ['Achieving 84% accuracy on training data after 250 iterations Significant improvement in model performance', 'Debugging errors led to initial 10% accuracy, then 75% and 78% at later iterations Impact of debugging on model performance', 'Mention of potential improvements such as adding more layers and unit cell layers Future enhancements for model']}], 'duration': 557.613, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1136492.jpg', 'highlights': ['Achieving 84% accuracy on training data after 250 iterations Significant improvement in model performance', 'The process of one-hot encoding the labels involves creating a correctly sized matrix and setting the appropriate elements to 1, which simplifies the representation of output classes.', 'The calculation of gradients involves formulas such as dw2 = 1/m * dz2 and db2 = 1/m * np.sum(dz2), where m is the number of training examples, allowing for efficient optimization of the neural network.', 'The chapter covers the implementation of functions for gradient descent on a neural network, including updateParams, gradient descent, and the use of forward and back propagation to optimize parameters w1, b1, w2, b2 for a specified number of iterations using alpha.', 'The implementation of the derivative of the ReLU activation function involves a simple approach based on boolean conversion, where elements greater than 0 return 1 and others return 0, providing an elegant solution for calculating the 
derivative.', 'Debugging errors led to initial 10% accuracy, then 75% and 78% at later iterations Impact of debugging on model performance', 'Mention of potential improvements such as adding more layers and unit cell layers Future enhancements for model']}, {'end': 1886.177, 'segs': [{'end': 1720.978, 'src': 'embed', 'start': 1694.365, 'weight': 3, 'content': [{'end': 1700.468, 'text': 'This is just a function to make that prediction and then print out the prediction and the label, and then display the image.', 'start': 1694.365, 'duration': 6.103}, {'end': 1703.229, 'text': "So once we've defined that, let's do test prediction.", 'start': 1700.508, 'duration': 2.721}, {'end': 1706.251, 'text': "Let's test out just a random number.", 'start': 1704.51, 'duration': 1.741}, {'end': 1706.731, 'text': "It doesn't matter.", 'start': 1706.271, 'duration': 0.46}, {'end': 1708.052, 'text': 'B1, B2, B2.', 'start': 1706.751, 'duration': 1.301}, {'end': 1710.453, 'text': "So bang, it's zero.", 'start': 1708.872, 'duration': 1.581}, {'end': 1714.815, 'text': 'Our model predicts a zero, and the label is zero.', 'start': 1711.333, 'duration': 3.482}, {'end': 1717.496, 'text': "Let's do the second one.", 'start': 1715.936, 'duration': 1.56}, {'end': 1718.277, 'text': 'So there we go.', 'start': 1717.656, 'duration': 0.621}, {'end': 1720.978, 'text': "Let's do this one.", 'start': 1719.237, 'duration': 1.741}], 'summary': 'A function makes a prediction and prints the result, testing with random numbers. 
Model predicts 0.', 'duration': 26.613, 'max_score': 1694.365, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1694365.jpg'}, {'end': 1844.858, 'src': 'embed', 'start': 1760.146, 'weight': 0, 'content': [{'end': 1769.269, 'text': "you'd be able to recognize it better that this model wasn't quite complex enough to capture that like, basically like a 45 degree rotation here.", 'start': 1760.146, 'duration': 9.123}, {'end': 1773.212, 'text': 'But, you know, we went through a good amount without finding one that was mislabeled.', 'start': 1769.31, 'duration': 3.902}, {'end': 1775.193, 'text': 'So this is definitely a model that actually worked.', 'start': 1773.232, 'duration': 1.961}, {'end': 1781.097, 'text': 'Just to do the last step, I want to check what the cross-validation accuracy on this is.', 'start': 1775.213, 'duration': 5.884}, {'end': 1785.34, 'text': 'X_dev, W1, b1, W2, b2.', 'start': 1781.117, 'duration': 4.223}, {'end': 1788.402, 'text': 'Okay. And then dev predictions.', 'start': 1785.54, 'duration': 2.862}, {'end': 1790.804, 'text': "Okay. Let's run that through.", 'start': 1788.622, 'duration': 2.182}, {'end': 1792.686, 'text': 'And OK, look at that.', 'start': 1790.924, 'duration': 1.762}, {'end': 1796.109, 'text': "That's an 85.5% accuracy on the dev set.", 'start': 1793.106, 'duration': 3.003}, {'end': 1797.19, 'text': "So that's not the training data.", 'start': 1796.129, 'duration': 1.061}, {'end': 1798.551, 'text': "We didn't train on this data.", 'start': 1797.65, 'duration': 0.901}, {'end': 1800.333, 'text': 'This is effectively testing data.', 'start': 1798.571, 'duration': 1.762}, {'end': 1802.535, 'text': "We haven't done any optimization for this data.", 'start': 1800.373, 'duration': 2.162}, {'end': 1804.017, 'text': 'And 85.5% accuracy.', 'start': 1802.896, 'duration': 1.121}, {'end': 1808.821, 'text': "That's pretty good.", 'start': 1807.6, 'duration': 1.221}, {'end': 1809.442, 
'text': 'There you go.', 'start': 1808.821, 'duration': 0.621}, {'end': 1810.063, 'text': "We've done it.", 'start': 1809.442, 'duration': 0.621}, {'end': 1810.583, 'text': "We've done it.", 'start': 1810.063, 'duration': 0.52}, {'end': 1817.611, 'text': 'We built a neural network from scratch and maybe, watching me code through it, watching me explain those equations, hopefully it helped a bit.', 'start': 1810.583, 'duration': 7.028}, {'end': 1824.178, 'text': "Um, I'll have a link in the description to an article, a blog post that I'll put up just with all of these notes.", 'start': 1817.611, 'duration': 6.567}, {'end': 1826.921, 'text': "I'll have a link to this notebook so you can look through all the code.", 'start': 1824.178, 'duration': 2.743}, {'end': 1832.086, 'text': 'You can look through all the equations, you can figure everything out for yourself, if you so desire.', 'start': 1826.921, 'duration': 5.165}, {'end': 1834.788, 'text': "There's a lot of other stuff to explore, implementing manually, right.", 'start': 1832.086, 'duration': 2.702}, {'end': 1838.172, 'text': 'So things like regularization, different optimization methods, right.', 'start': 1834.788, 'duration': 3.384}, {'end': 1844.858, 'text': "So instead of just gradient descent, there's gradient descent with momentum, there's RMSprop, there's Adam optimization,", 'start': 1838.172, 'duration': 6.686}], 'summary': 'Built a neural network with 85.5% accuracy on dev set, encouraging results.', 'duration': 84.712, 'max_score': 1760.146, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1760146.jpg'}, {'end': 1886.177, 'src': 'embed', 'start': 1862.716, 'weight': 1, 'content': [{'end': 1866, 'text': "and it's all the more satisfying to see on a model that you built yourself.", 'start': 1862.716, 'duration': 3.284}, {'end': 1867.449, 'text': "That'll be it for this video.", 'start': 1866.348, 'duration': 1.101}, {'end': 1871.61, 'text': "Just 
a simple demo of how you can build out all the math that's on this screen here.", 'start': 1867.469, 'duration': 4.141}, {'end': 1876.152, 'text': 'I hope this gave you a more concrete understanding of how neural networks work.', 'start': 1871.85, 'duration': 4.302}, {'end': 1880.654, 'text': 'Maybe piqued your interest to dive into the math and do some more yourself.', 'start': 1876.913, 'duration': 3.741}, {'end': 1886.177, 'text': "Thanks for watching this video and I'll see you guys in future videos if you decide to stick around.", 'start': 1880.895, 'duration': 5.282}], 'summary': 'Demo shows how to build neural network math, encouraging self-exploration.', 'duration': 23.461, 'max_score': 1862.716, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1862716.jpg'}], 'start': 1694.365, 'title': 'Model testing and neural networks', 'summary': 'Discusses testing model predictions, identifying mislabeled instances, and achieving an 85.5% accuracy on testing data. 
it also explores building a neural network from scratch, explaining equations and code, and emphasizes the satisfaction of seeing accuracy increase.', 'chapters': [{'end': 1810.583, 'start': 1694.365, 'title': 'Model testing and accuracy', 'summary': "Discusses testing a model's predictions, identifying mislabeled instances, and evaluating its accuracy, achieving an 85.5% accuracy on the testing data.", 'duration': 116.218, 'highlights': ['The model achieved an 85.5% accuracy on the testing data, without any optimization or training on this dataset, showcasing its effectiveness in making predictions.', "The mislabeled instance, where the model predicted a '3' while the label was '5', demonstrates the potential for improvement by adding more layers or units to better capture complex patterns.", "The process involves testing random numbers, displaying the prediction and label, and visually inspecting the images to assess the model's performance."]}, {'end': 1886.177, 'start': 1810.583, 'title': 'Building neural network from scratch', 'summary': 'Discusses building a neural network from scratch, explaining equations and code, with a link to detailed notes and code. 
It also emphasizes exploring other aspects like regularization and optimization methods, and the satisfaction of seeing accuracy increase.', 'duration': 75.594, 'highlights': ['The satisfaction of seeing the accuracy increase on a model built from scratch is emphasized, which can pique interest in understanding neural networks.', 'The chapter encourages exploring other aspects like regularization and different optimization methods, such as gradient descent with momentum, RMSprop, and Adam optimization, to understand their impact.', 'The availability of a link to detailed notes and code is mentioned, providing an opportunity for self-exploration and understanding of the neural network implementation.']}], 'duration': 191.812, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/w8yWXqWQYmU/pics/w8yWXqWQYmU1694365.jpg', 'highlights': ['The model achieved an 85.5% accuracy on the testing data, showcasing its effectiveness in making predictions.', 'The satisfaction of seeing the accuracy increase on a model built from scratch is emphasized, piquing interest in understanding neural networks.', 'The mislabeled instance demonstrates the potential for improvement by adding more layers or units to better capture complex patterns.', "The process involves testing random numbers, displaying the prediction and label, and visually inspecting the images to assess the model's performance.", 'The chapter encourages exploring other aspects like regularization and different optimization methods to understand their impact.', 'The availability of a link to detailed notes and code is mentioned, providing an opportunity for self-exploration and understanding of the neural network implementation.']}], 'highlights': ['Building a neural network from scratch using NumPy and linear algebra.', 'The need for a deeper understanding of neural networks.', 'Emphasizing the lack of clarity in understanding how neural networks work at a high level.', 'The MNIST dataset 
comprises tens of thousands of 28x28 grayscale images of handwritten digits, serving as the basis for building the neural network for digit classification.', 'The neural network will process 28x28 pixel training images, each containing 784 pixels represented as pixel values ranging from 0 to 255, and transpose the matrix to treat each column as an example to be analyzed.', 'The neural network consists of an input layer with 784 nodes, a hidden layer with 10 units, and an output layer with 10 units, each corresponding to a predicted digit.', 'The importance of applying non-linear activation functions like relu and softmax for creating complexity and power in a neural network, ultimately leading to improved model performance.', 'Explanation of the relu activation function and its simple nature, where relu of x is defined as x if x is greater than zero, and zero if x is less than or equal to zero.', 'Detailed explanation of the softmax function used for the output layer, aiming to produce probability values between zero and one for each node, corresponding to the recognition of 10 digits.', 'Backpropagation involves adjusting weights and biases to minimize prediction errors through iterative algorithms.', 'One-hot encoding ensures accurate weight adjustments by representing the correct label.', 'Derivative of the loss function is used to calculate errors and adjust parameters in backpropagation.', 'Parameter update equations involve subtracting the product of learning rate and derivative of the loss function from the parameter value.', 'Alpha, the learning rate, is a user-set hyperparameter used in parameter update equations during gradient descent.', 'The chapter involves coding a digit recognition model using a Kaggle dataset, aiming to complete it within 30 minutes', 'Pandas is used to import the data, which is then converted to numpy arrays for manipulation and linear algebra operations', 'The chapter explains the process of splitting data into training and dev 
sets to avoid overfitting, with a specific example of splitting data dev from 0 to 1,000 and transposing it for easier manipulation', "It covers the process of shuffling the data using the 'mp.random.shuffle' function to randomize the order of the examples for better training", "The explanation includes getting the dimensions of the data using 'data.shape' to determine the amount of rows and features, where the amount of features is adjusted by adding one to account for the label column", 'The transcript details the initialization of parameters for the neural network, including the generation of random values between -0.5 and 0.5 for weights and biases', 'It covers the process of forward propagation with the ReLU and softmax functions, including the calculation of z1, a1, z2, and a2 using matrix operations and the ReLU function', 'Achieving 84% accuracy on training data after 250 iterations Significant improvement in model performance', 'The process of one-hot encoding the labels involves creating a correctly sized matrix and setting the appropriate elements to 1, which simplifies the representation of output classes.', 'The calculation of gradients involves formulas such as dw2 = 1/m * dz2 and db2 = 1/m * np.sum(dz2), where m is the number of training examples, allowing for efficient optimization of the neural network.', 'The chapter covers the implementation of functions for gradient descent on a neural network, including updateParams, gradient descent, and the use of forward and back propagation to optimize parameters w1, b1, w2, b2 for a specified number of iterations using alpha.', 'The implementation of the derivative of the ReLU activation function involves a simple approach based on boolean conversion, where elements greater than 0 return 1 and others return 0, providing an elegant solution for calculating the derivative.', 'Debugging errors led to initial 10% accuracy, then 75% and 78% at later iterations Impact of debugging on model performance', 'Mention 
of potential improvements such as adding more layers and unit cell layers Future enhancements for model', 'The model achieved an 85.5% accuracy on the testing data, showcasing its effectiveness in making predictions.', 'The satisfaction of seeing the accuracy increase on a model built from scratch is emphasized, piquing interest in understanding neural networks.', 'The mislabeled instance demonstrates the potential for improvement by adding more layers or units to better capture complex patterns.', "The process involves testing random numbers, displaying the prediction and label, and visually inspecting the images to assess the model's performance.", 'The chapter encourages exploring other aspects like regularization and different optimization methods to understand their impact.', 'The availability of a link to detailed notes and code is mentioned, providing an opportunity for self-exploration and understanding of the neural network implementation.']}