title

The spelled-out intro to neural networks and backpropagation: building micrograd

description

This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.
Links:
- micrograd on github: https://github.com/karpathy/micrograd
- jupyter notebooks I built in this video: https://github.com/karpathy/nn-zero-to-hero/tree/master/lectures/micrograd
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- "discussion forum": nvm, use youtube comments below for now :)
- (new) Neural Networks: Zero to Hero series Discord channel: https://discord.gg/3zy8kqD9Cp , for people who'd like to chat more and go beyond youtube comments
Exercises:
you should now be able to complete the following Google Colab, good luck!:
https://colab.research.google.com/drive/1FPTx1RXtBfc4MaTkf7viZZD4U2F9gtKN?usp=sharing
Chapters:
00:00:00 intro
00:00:25 micrograd overview
00:08:08 derivative of a simple function with one input
00:14:12 derivative of a function with multiple inputs
00:19:09 starting the core Value object of micrograd and its visualization
00:32:10 manual backpropagation example #1: simple expression
00:51:10 preview of a single optimization step
00:52:52 manual backpropagation example #2: a neuron
01:09:02 implementing the backward function for each operation
01:17:32 implementing the backward function for a whole expression graph
01:22:28 fixing a backprop bug when one node is used multiple times
01:27:05 breaking up a tanh, exercising with more operations
01:39:31 doing the same thing but in PyTorch: comparison
01:43:55 building out a neural net library (multi-layer perceptron) in micrograd
01:51:04 creating a tiny dataset, writing the loss function
01:57:56 collecting all of the parameters of the neural net
02:01:12 doing gradient descent optimization manually, training the network
02:14:03 summary of what we learned, how to go towards modern neural nets
02:16:46 walkthrough of the full code of micrograd on github
02:21:10 real stuff: diving into PyTorch, finding their backward pass for tanh
02:24:39 conclusion
02:25:20 outtakes :)
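The chapters above build up to a tiny scalar-valued autograd engine. For orientation, here is a heavily compressed sketch of the kind of Value object the lecture constructs, wired up to an expression graph with inputs a, b, c, f whose forward pass produces the -8 discussed in the video. This is my condensed approximation, not the video's verbatim code (the input values are chosen to reproduce that -8 output); see the micrograd repo linked above for the real thing.

```python
import math

class Value:
    """A scalar that remembers how it was produced, so the chain rule
    can be applied backwards through the expression graph."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0              # dL/d(this value), filled in by backward()
        self._backward = lambda: None
        self._prev = set(_children)

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # local derivative of a sum is 1 for both inputs; the chain rule
            # multiplies by out.grad, and += accumulates when a node is reused
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(x*y)/dx = y and d(x*y)/dy = x, times the upstream gradient
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad   # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # topological sort so each node's _backward runs only after every
        # node downstream of it has propagated its gradient
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0              # dL/dL = 1, the base case
        for v in reversed(topo):
            v._backward()

# an expression graph with inputs a, b, c, f and a single output L
a, b, c, f = Value(2.0), Value(-3.0), Value(10.0), Value(-2.0)
e = a * b        # -6.0
d = e + c        #  4.0
L = d * f        # -8.0, the forward-pass output
L.backward()
print(a.grad, b.grad)  # 6.0 -4.0
```

The resulting gradients can be sanity-checked numerically, as the video does, by nudging one input by a small h (e.g. 0.001) and measuring (L(x+h) - L(x)) / h.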

detail

{'title': 'The spelled-out intro to neural networks and backpropagation: building micrograd', 'heatmap': [{'end': 6653.692, 'start': 6476.684, 'weight': 0.955}], 'summary': 'Explains neural network training, micrograd functionality, and backpropagation, covering topics like derivatives, activation functions, and implementing non-linear functions in Python. It also discusses the efficiency of micrograd, PyTorch implementation, and challenges in training neural networks.', 'chapters': [{'end': 270.248, 'segs': [{'end': 39.422, 'src': 'embed', 'start': 14.403, 'weight': 3, 'content': [{'end': 22.01, 'text': "we will define and train a neural net and you'll get to see everything that goes on under the hood and exactly sort of how that works on an intuitive level.", 'start': 14.403, 'duration': 7.607}, {'end': 28.855, 'text': 'now specifically, what i would like to do is i would like to take you through building of micrograd now.', 'start': 22.871, 'duration': 5.984}, {'end': 32.217, 'text': 'micrograd is this library that i released on github about two years ago,', 'start': 28.855, 'duration': 3.362}, {'end': 39.422, 'text': "but at the time i only uploaded the source code and you'd have to go in by yourself and really figure out how it works.", 'start': 32.217, 'duration': 7.205}], 'summary': 'Explore training a neural net and building micrograd library.', 'duration': 25.019, 'max_score': 14.403, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku014403.jpg'}, {'end': 111.279, 'src': 'embed', 'start': 69.801, 'weight': 0, 'content': [{'end': 75.885, 'text': 'is we can iteratively tune the weights of that neural network to minimize the loss function and therefore improve the accuracy of the network.', 'start': 69.801, 'duration': 6.084}, {'end': 83.108, 'text': 'So backpropagation would be at the mathematical core of any modern deep neural network library, like say PyTorch or JAX.', 'start': 76.545, 'duration': 
6.563}, {'end': 87.21, 'text': 'So the functionality of MicroGrad is I think best illustrated by an example.', 'start': 84.189, 'duration': 3.021}, {'end': 93.553, 'text': "So if we just scroll down here, you'll see that MicroGrad basically allows you to build out mathematical expressions.", 'start': 87.611, 'duration': 5.942}, {'end': 99.937, 'text': "And here what we are doing is we have an expression that we're building out where you have two inputs, A and B.", 'start': 94.434, 'duration': 5.503}, {'end': 104.613, 'text': "and you'll see that a and b are negative, four and two.", 'start': 101.05, 'duration': 3.563}, {'end': 111.279, 'text': 'but we are wrapping those values into this value object that we are going to build out as part of micrograd.', 'start': 104.613, 'duration': 6.666}], 'summary': 'Micrograd allows building mathematical expressions to tune neural network weights and improve accuracy.', 'duration': 41.478, 'max_score': 69.801, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku069801.jpg'}, {'end': 250.456, 'src': 'embed', 'start': 223.351, 'weight': 4, 'content': [{'end': 228.556, 'text': 'and then we can actually query this derivative of g with respect to a, for example.', 'start': 223.351, 'duration': 5.205}, {'end': 236.264, 'text': "that's a dot grad in this case it happens to be 138 and the derivative of g with respect to b, which also happens to be here, 645,", 'start': 228.556, 'duration': 7.708}, {'end': 241.088, 'text': "and this derivative we'll see soon is very important information,", 'start': 236.264, 'duration': 4.824}, {'end': 246.253, 'text': "because it's telling us how a and b are affecting g through this mathematical expression.", 'start': 241.088, 'duration': 5.165}, {'end': 250.456, 'text': 'So, in particular, a.grad is 138..', 'start': 246.934, 'duration': 3.522}], 'summary': 'Derivative of g with respect to a is 138, with respect to b is 645, providing crucial 
information on their impact on g.', 'duration': 27.105, 'max_score': 223.351, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku0223351.jpg'}], 'start': 0.229, 'title': 'Neural network training and micrograd functionality', 'summary': 'Covers the process of training neural networks and explains the concept of micrograd, an autograd engine. it also discusses the functionality of micrograd, showcasing the process of building a mathematical expression and evaluating derivatives, resulting in specific values.', 'chapters': [{'end': 87.21, 'start': 0.229, 'title': 'Neural network training explained', 'summary': 'Covers the process of training neural networks, focusing on defining and training a neural net, and explaining the concept of micrograd, an autograd engine that implements backpropagation for efficiently evaluating gradients of a loss function with respect to the weights of a neural network.', 'duration': 86.981, 'highlights': ['Micrograd is an autograd engine that implements backpropagation, allowing efficient evaluation of gradients of a loss function with respect to the weights of a neural network, crucial for iteratively tuning the weights of the network to minimize the loss function and improve accuracy.', 'The lecture aims to provide a step-by-step explanation of building micrograd, previously released on Github, and its significance in understanding the inner workings of neural network training.', 'Backpropagation, the mathematical core of modern deep neural network libraries, enables the iterative tuning of neural network weights to minimize loss function and enhance network accuracy.']}, {'end': 270.248, 'start': 87.611, 'title': 'Micrograd mathematical expression building', 'summary': 'Discusses the functionality of micrograd, showcasing the process of building a mathematical expression with two inputs, a and b, and evaluating the derivative of g with respect to a and b, resulting in 138 and 645, 
respectively.', 'duration': 182.637, 'highlights': ['MicroGrad allows building mathematical expressions with two inputs, A and B, and performing various operations such as addition, multiplication, raising to a constant power, etc.', 'Backpropagation at the node G allows the evaluation of the derivative of G with respect to internal nodes E, D, C, and the inputs A and B, resulting in the derivatives of 138 and 645 for A and B, respectively.', 'Understanding the derivatives of G with respect to A and B (138 and 645, respectively) provides insight into how small changes in A and B affect the growth of G.']}], 'duration': 270.019, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku0229.jpg', 'highlights': ['Micrograd implements backpropagation for efficient gradient evaluation of a loss function with respect to neural network weights.', 'Micrograd allows building mathematical expressions with two inputs, A and B, and performing various operations.', 'Backpropagation enables iterative tuning of neural network weights to minimize loss function and enhance accuracy.', 'The lecture provides a step-by-step explanation of building micrograd and its significance in understanding neural network training.', 'Understanding the derivatives of G with respect to A and B (138 and 645, respectively) provides insight into their impact on G.']}, {'end': 1149.047, 'segs': [{'end': 336.919, 'src': 'embed', 'start': 287.661, 'weight': 1, 'content': [{'end': 293.726, 'text': 'But it turns out that neural networks are just mathematical expressions, just like this one, but actually a slightly bit less crazy even.', 'start': 287.661, 'duration': 6.065}, {'end': 297.167, 'text': 'neural networks are just a mathematical expression.', 'start': 295.106, 'duration': 2.061}, {'end': 303.949, 'text': "they take the input data as an input and they take the weights of a neural network as an input, and it's a mathematical expression.", 'start': 
297.167, 'duration': 6.782}, {'end': 307.631, 'text': 'and the output are your predictions of your neural net or the loss function.', 'start': 303.949, 'duration': 3.682}, {'end': 313.233, 'text': "we'll see this in a bit, but basically, neural networks just happen to be a certain class of mathematical expressions,", 'start': 307.631, 'duration': 5.602}, {'end': 316.596, 'text': 'But backpropagation is actually significantly more general.', 'start': 314.013, 'duration': 2.583}, {'end': 319.039, 'text': "It doesn't actually care about neural networks at all.", 'start': 316.957, 'duration': 2.082}, {'end': 321.702, 'text': 'It only cares about arbitrary mathematical expressions.', 'start': 319.279, 'duration': 2.423}, {'end': 326.007, 'text': 'And then we happen to use that machinery for training of neural networks.', 'start': 322.123, 'duration': 3.884}, {'end': 331.975, 'text': 'Now, one more note I would like to make at this stage is that, as you see here, micrograd is a scalar-valued autograd engine.', 'start': 326.528, 'duration': 5.447}, {'end': 336.919, 'text': "So it's working on the level of individual scalars, like negative four and two,", 'start': 332.435, 'duration': 4.484}], 'summary': 'Neural networks are mathematical expressions used for predictions and training, with backpropagation being more general.', 'duration': 49.258, 'max_score': 287.661, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku0287661.jpg'}, {'end': 513.1, 'src': 'embed', 'start': 472.162, 'weight': 0, 'content': [{'end': 479.964, 'text': "so, basically, there's a lot of power that comes from only 150 lines of code, and that's all you need to understand to understand.", 'start': 472.162, 'duration': 7.802}, {'end': 482.744, 'text': 'neural network training and everything else is just efficiency.', 'start': 479.964, 'duration': 2.78}, {'end': 488.206, 'text': "and of course there's a lot to efficiency, but fundamentally, that's all 
that's happening, okay.", 'start': 482.744, 'duration': 5.462}, {'end': 491.546, 'text': "so now let's dive right in and implement micrograd step by step.", 'start': 488.206, 'duration': 3.34}, {'end': 495.667, 'text': "the first thing i'd like to do is i'd like to make sure that you have a very good understanding, intuitively,", 'start': 491.546, 'duration': 4.121}, {'end': 499.108, 'text': 'of what a derivative is and exactly what information it gives you.', 'start': 495.667, 'duration': 3.441}, {'end': 504.412, 'text': "So let's start with some basic imports that I copy-paste in every Jupyter Notebook always.", 'start': 499.788, 'duration': 4.624}, {'end': 510.818, 'text': "And let's define a function, a scalar-valued function, f of x, as follows.", 'start': 505.493, 'duration': 5.325}, {'end': 513.1, 'text': 'So I just made this up randomly.', 'start': 511.579, 'duration': 1.521}], 'summary': 'Understanding neural network training in 150 lines of code', 'duration': 40.938, 'max_score': 472.162, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku0472162.jpg'}, {'end': 664.989, 'src': 'embed', 'start': 632.075, 'weight': 3, 'content': [{'end': 634.095, 'text': "measuring what it's telling you about the function.", 'start': 632.075, 'duration': 2.02}, {'end': 636.656, 'text': 'And so if we just look up derivative.', 'start': 634.955, 'duration': 1.701}, {'end': 645.641, 'text': 'We see that, okay, so this is not a very good definition of derivative.', 'start': 642.5, 'duration': 3.141}, {'end': 647.962, 'text': 'This is a definition of what it means to be differentiable.', 'start': 645.801, 'duration': 2.161}, {'end': 655.945, 'text': 'But if you remember from your calculus, it is the limit as h goes to zero of f of x plus h minus f of x over h.', 'start': 648.682, 'duration': 7.263}, {'end': 664.989, 'text': "So, basically, what it's saying is if you slightly bump up, you're at some point x that you're
interested in, or a, and if you slightly bump up,", 'start': 655.945, 'duration': 9.044}], 'summary': 'The derivative of a function is defined as the limit as h goes to zero of f of x plus h minus f of x over h.', 'duration': 32.914, 'max_score': 632.075, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku0632075.jpg'}, {'end': 705.834, 'src': 'embed', 'start': 676.445, 'weight': 4, 'content': [{'end': 681.246, 'text': "And that's the slope of that function, the slope of that response at that point.", 'start': 676.445, 'duration': 4.801}, {'end': 687.767, 'text': 'And so we can basically evaluate the derivative here numerically by taking a very small h.', 'start': 681.966, 'duration': 5.801}, {'end': 690.527, 'text': 'Of course, the definition would ask us to take h to zero.', 'start': 687.767, 'duration': 2.76}, {'end': 694.188, 'text': "We're just going to pick a very small h, 0.001.", 'start': 690.547, 'duration': 3.641}, {'end': 696.289, 'text': "and let's say we're interested in point 3.0.", 'start': 694.188, 'duration': 2.101}, {'end': 701.151, 'text': 'so we can look at f of x, of course, as 20 and now f of x plus h.', 'start': 696.289, 'duration': 4.862}, {'end': 705.834, 'text': 'so if we slightly nudge x in a positive direction, how is the function going to respond?', 'start': 701.151, 'duration': 4.683}], 'summary': "Evaluating the derivative numerically using a small h, 0.001, at point 3.0 and observing the function's response.", 'duration': 29.389, 'max_score': 676.445, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku0676445.jpg'}], 'start': 270.268, 'title': 'Understanding micrograd, neural networks, and derivatives', 'summary': 'Discusses the concept of neural networks as mathematical expressions, the significance of backpropagation, and the efficiency of micrograd in training neural networks. 
It also introduces the concept of derivatives by defining a scalar-valued function and evaluating the derivatives of a more complex function, providing a thorough understanding of what the derivative is telling about the function. Additionally, it mentions that the autograd engine is a mere 100 lines of code, and the neural network library on top of it comprises only 150 lines of code.', 'chapters': [{'end': 491.546, 'start': 270.268, 'title': 'Understanding micrograd and neural networks', 'summary': 'Discusses the concept of neural networks as mathematical expressions, the significance of backpropagation, and the efficiency of micrograd in training neural networks, with the autograd engine being a mere 100 lines of code and the neural network library on top of it comprising only 150 lines of code.', 'duration': 221.278, 'highlights': ['Neural networks are just mathematical expressions taking input data and weights, and outputting predictions or loss functions.', 'Backpropagation is more general and does not exclusively focus on neural networks; it is utilized for arbitrary mathematical expressions.', 'Micrograd is an efficient autograd engine, operating on individual scalar values, and is designed for pedagogical purposes to comprehend backpropagation and chain rule.', 'The entire neural network library built on top of micrograd comprises only 150 lines of code, showcasing the efficiency of micrograd in training neural networks.']}, {'end': 1149.047, 'start': 491.546, 'title': 'Understanding derivatives intuitively', 'summary': 'Introduces the concept of derivatives by defining a scalar-valued function, examining its derivative at specific input points, and evaluating the derivatives of a more complex function, providing a thorough understanding of what the derivative is telling about the function.', 'duration': 657.501, 'highlights': ['The chapter introduces the concept of derivatives by defining a scalar-valued function and examining its derivative at specific input points.', 'The chapter evaluates the derivatives of a more complex function to provide a thorough understanding of what the derivative is telling about the function.', 'The concept of derivative is explained by evaluating the slope and sensitivity of the function at specific input points using a numerical approximation approach.']}], 'duration': 878.779, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku0270268.jpg', 'highlights': ['The entire neural network library built on top of micrograd comprises only 150 lines of code, showcasing the efficiency of micrograd in training neural networks.', 'Backpropagation is more general and is utilized for arbitrary mathematical expressions, not exclusively for neural networks.', 'Neural networks are just mathematical expressions taking input data and weights, and outputting predictions or loss functions.', 'The chapter evaluates the derivatives of a more complex function to provide a thorough understanding of what the derivative is telling about the function.', 'The concept of derivatives is explained by evaluating the slope and sensitivity of the function at specific input points using a numerical approximation approach.', 'Micrograd is an efficient autograd engine, operating on individual scalar values, and is designed for pedagogical purposes to comprehend backpropagation and chain rule.', 'The chapter introduces the concept of derivatives by defining a scalar-valued function and examining its derivative at specific input points.']}, {'end': 1984.955, 'segs': [{'end': 1202.442, 'src': 'embed', 'start': 1149.547, 'weight': 0, 'content': [{'end': 1150.747, 'text': "And we'd like to move to neural networks.", 'start': 1149.547, 'duration': 1.2}, {'end': 1154.789, 'text': 'Now, as I mentioned, neural networks will be pretty massive expressions, mathematical expressions.', 'start': 1151.228, 'duration': 3.561}, {'end': 1157.73, 'text': 'So we need some data structures
that maintain these expressions.', 'start': 1155.269, 'duration': 2.461}, {'end': 1159.631, 'text': "And that's what we're going to start to build out now.", 'start': 1157.87, 'duration': 1.761}, {'end': 1166.914, 'text': "So we're going to build out this value object that I showed you in the readme page of micrograd.", 'start': 1160.672, 'duration': 6.242}, {'end': 1172.477, 'text': 'So let me copy paste a skeleton of the first very simple value object.', 'start': 1167.675, 'duration': 4.802}, {'end': 1180.283, 'text': "So class value takes a single scalar value that it wraps and keeps track of, and that's it.", 'start': 1173.737, 'duration': 6.546}, {'end': 1187.37, 'text': 'So we can, for example, do value of 2.0, and then we can look at its content.', 'start': 1180.724, 'duration': 6.646}, {'end': 1195.758, 'text': 'And Python will internally use the wrapper function to return this string, oops.', 'start': 1188.051, 'duration': 7.707}, {'end': 1202.442, 'text': "So this is a value object with data equals two that we're creating here.", 'start': 1198.84, 'duration': 3.602}], 'summary': 'Building neural network structures using value objects to maintain expressions and data. 
example: value of 2.0.', 'duration': 52.895, 'max_score': 1149.547, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku01149547.jpg'}, {'end': 1273.257, 'src': 'embed', 'start': 1251.209, 'weight': 6, 'content': [{'end': 1260.017, 'text': "and so we see that what we're going to return is a new value object, and it's just it's going to be wrapping the plus of their data.", 'start': 1251.209, 'duration': 8.808}, {'end': 1264.341, 'text': 'but remember now, because data is the actual, like numbered python number.', 'start': 1260.017, 'duration': 4.324}, {'end': 1268.846, 'text': 'so this operator here is just the typical floating point plus addition.', 'start': 1264.341, 'duration': 4.505}, {'end': 1273.257, 'text': "now it's not an addition of value objects and will return a new value.", 'start': 1268.846, 'duration': 4.411}], 'summary': 'Returning a new value object wrapping the result of floating point addition of numbered python data.', 'duration': 22.048, 'max_score': 1251.209, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku01251209.jpg'}, {'end': 1770.62, 'src': 'embed', 'start': 1744.311, 'weight': 3, 'content': [{'end': 1748.274, 'text': 'We are able to build out mathematical expressions using only plus and times so far.', 'start': 1744.311, 'duration': 3.963}, {'end': 1751.076, 'text': 'They are scalar valued along the way.', 'start': 1749.375, 'duration': 1.701}, {'end': 1755.982, 'text': 'And we can do this forward pass and build out a mathematical expression.', 'start': 1751.736, 'duration': 4.246}, {'end': 1764.192, 'text': 'So we have multiple inputs here, A, B, C, and F going into a mathematical expression that produces a single output L.', 'start': 1756.563, 'duration': 7.629}, {'end': 1766.816, 'text': 'And this here is visualizing the forward pass.', 'start': 1764.192, 'duration': 2.624}, {'end': 1769.759, 'text': 'So the output of the 
forward pass is negative eight.', 'start': 1767.456, 'duration': 2.303}, {'end': 1770.62, 'text': "That's the value.", 'start': 1770.06, 'duration': 0.56}], 'summary': 'Using only plus and times, we can build mathematical expressions with multiple inputs a, b, c, and f, resulting in a single output l, with the forward pass yielding a value of negative eight.', 'duration': 26.309, 'max_score': 1744.311, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku01744311.jpg'}, {'end': 1830.481, 'src': 'embed', 'start': 1785.674, 'weight': 4, 'content': [{'end': 1793.498, 'text': "And really what we're computing for every single value here, we're going to compute the derivative of that node with respect to L.", 'start': 1785.674, 'duration': 7.824}, {'end': 1799.773, 'text': 'So the derivative of L with respect to L is just one.', 'start': 1795.532, 'duration': 4.241}, {'end': 1807.815, 'text': "And then we're going to derive what is the derivative of L with respect to F, with respect to D, with respect to C, with respect to E,", 'start': 1800.653, 'duration': 7.162}, {'end': 1810.556, 'text': 'with respect to B and with respect to A.', 'start': 1807.815, 'duration': 2.741}, {'end': 1815.538, 'text': "And in a neural network setting, you'd be very interested in the derivative of basically this loss function, L,", 'start': 1810.556, 'duration': 4.982}, {'end': 1819.571, 'text': 'respect to the weights of a neural network.', 'start': 1817.069, 'duration': 2.502}, {'end': 1822.554, 'text': 'and here of course we have just these variables a, b, c and f.', 'start': 1819.571, 'duration': 2.983}, {'end': 1830.481, 'text': "but some of these will eventually represent the weights of a neural net, and so we'll need to know how those weights are impacting the loss function.", 'start': 1822.554, 'duration': 7.927}], 'summary': 'Computing derivatives of node values for loss function in neural network', 'duration': 44.807, 'max_score': 
1785.674, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku01785674.jpg'}, {'end': 1941.335, 'src': 'embed', 'start': 1899.846, 'weight': 2, 'content': [{'end': 1905.892, 'text': 'so here grad is 0.4f and this will be end of grad.', 'start': 1899.846, 'duration': 6.046}, {'end': 1911.978, 'text': 'and now we are going to be showing both the data and the grad, initialized at zero,', 'start': 1905.892, 'duration': 6.086}, {'end': 1916.819, 'text': 'And we are just about getting ready to calculate the back propagation.', 'start': 1913.857, 'duration': 2.962}, {'end': 1924.525, 'text': 'And of course, this grad, again, as I mentioned, is representing the derivative of the output, in this case L, with respect to this value.', 'start': 1917.56, 'duration': 6.965}, {'end': 1929.889, 'text': 'So this is the derivative of L with respect to F, with respect to D, and so on.', 'start': 1925.085, 'duration': 4.804}, {'end': 1933.951, 'text': "So let's now fill in those gradients and actually do back propagation manually.", 'start': 1930.629, 'duration': 3.322}, {'end': 1937.913, 'text': "So let's start filling in these gradients and start all the way at the end, as I mentioned here.", 'start': 1934.431, 'duration': 3.482}, {'end': 1941.335, 'text': 'First, we are interested to fill in this gradient here.', 'start': 1938.694, 'duration': 2.641}], 'summary': 'Grad is 0.4f, preparing for back propagation.', 'duration': 41.489, 'max_score': 1899.846, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku01899846.jpg'}], 'start': 1149.547, 'title': 'Building value objects and expression graphs', 'summary': 'Introduces the process of building value objects for neural networks and discusses building expression graphs, demonstrating forward pass, back propagation, and computation of derivatives for each value.', 'chapters': [{'end': 1365.379, 'start': 1149.547, 'title': 'Building 
value objects for neural networks', 'summary': 'Introduces the process of building value objects for neural networks, including defining a value object class, implementing addition and multiplication operations, and demonstrating their functionality with specific values, resulting in the expected output.', 'duration': 215.832, 'highlights': ['Implementing addition and multiplication operations for value objects', 'Demonstrating the functionality of value objects with specific values', 'Defining and implementing a value object class for neural networks']}, {'end': 1984.955, 'start': 1365.399, 'title': 'Building expression graphs and visualizing mathematical expressions', 'summary': 'Discusses building expression graphs and visualizing mathematical expressions, demonstrating the process of forward pass, back propagation, and the computation of derivatives for each value, providing a deeper understanding of how each value impacts the output.', 'duration': 619.556, 'highlights': ['The chapter discusses the process of building expression graphs and visualizing mathematical expressions, showcasing the forward pass and back propagation.', 'The explanation includes the computation of derivatives for each value, providing insights into how each value impacts the output.', "The chapter emphasizes the maintenance of the derivative of the output with respect to each value, represented by the variable 'grad', initializing at zero and gradually filled during back propagation.", 'It demonstrates the process of calculating the derivatives manually, starting from the end of the expression and filling in the gradients for each value.', 'It introduces the concept of back propagation and its significance in computing the gradients for each value, providing a deeper understanding of how changes in each variable impact the output.']}], 'duration': 835.408, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku01149547.jpg', 'highlights': 
['Demonstrating the functionality of value objects with specific values', 'Defining and implementing a value object class for neural networks', "The chapter emphasizes the maintenance of the derivative of the output with respect to each value, represented by the variable 'grad', initializing at zero and gradually filled during back propagation.", 'The chapter discusses the process of building expression graphs and visualizing mathematical expressions, showcasing the forward pass and back propagation.', 'It introduces the concept of back propagation and its significance in computing the gradients for each value, providing a deeper understanding of how changes in each variable impact the output.', 'The explanation includes the computation of derivatives for each value, providing insights into how each value impacts the output.', 'Implementing addition and multiplication operations for value objects', 'It demonstrates the process of calculating the derivatives manually, starting from the end of the expression and filling in the gradients for each value.']}, {'end': 3510.179, 'segs': [{'end': 2137.541, 'src': 'embed', 'start': 2081.389, 'weight': 0, 'content': [{'end': 2085.693, 'text': "So let's here look at the derivatives of L with respect to D and F.", 'start': 2081.389, 'duration': 4.304}, {'end': 2087.114, 'text': "Let's do D first.", 'start': 2085.693, 'duration': 1.421}, {'end': 2091.437, 'text': "So what we are interested in, if I create a Markdown on here, is we'd like to know.", 'start': 2087.895, 'duration': 3.542}, {'end': 2100.424, 'text': "basically, we have that L is D times F and we'd like to know what is D, L by D, D.", 'start': 2091.437, 'duration': 8.987}, {'end': 2104.221, 'text': 'What is that? And if you know your calculus, L is D times F.', 'start': 2100.424, 'duration': 3.797}, {'end': 2106.585, 'text': 'So what is DL by DD? 
It would be F.', 'start': 2104.221, 'duration': 2.364}, {'end': 2112.801, 'text': "And if you don't believe me, we can also just derive it because the proof would be fairly straightforward.", 'start': 2108.257, 'duration': 4.544}, {'end': 2125.371, 'text': 'We go to the definition of the derivative, which is f of x plus h minus f of x divide h as a limit of h goes to zero of this kind of expression.', 'start': 2113.722, 'duration': 11.649}, {'end': 2136.019, 'text': 'So when we have L is d times f, then increasing d by h would give us the output of d plus h times f.', 'start': 2126.051, 'duration': 9.968}, {'end': 2137.541, 'text': "That's basically f of x plus h, right?", 'start': 2136.019, 'duration': 1.522}], 'summary': 'Derivative of l with respect to d is f.', 'duration': 56.152, 'max_score': 2081.389, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku02081389.jpg'}, {'end': 2299.004, 'src': 'embed', 'start': 2272.652, 'weight': 2, 'content': [{'end': 2279.555, 'text': 'Gradient check is when we are deriving this backpropagation and getting the derivative with respect to all the intermediate results.', 'start': 2272.652, 'duration': 6.903}, {'end': 2285.518, 'text': 'And then numerical gradient is just estimating it using small step size.', 'start': 2280.215, 'duration': 5.303}, {'end': 2288.839, 'text': "Now we're getting to the crux of backpropagation.", 'start': 2286.258, 'duration': 2.581}, {'end': 2295.222, 'text': 'So this will be the most important node to understand, because if you understand the gradient for this node,', 'start': 2289.26, 'duration': 5.962}, {'end': 2299.004, 'text': 'you understand all of backpropagation and all of training of neural nets, basically.', 'start': 2295.222, 'duration': 3.782}], 'summary': 'Understanding gradient for backpropagation is crucial for neural net training.', 'duration': 26.352, 'max_score': 2272.652, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku02272652.jpg'}, {'end': 2650.649, 'src': 'embed', 'start': 2620.496, 'weight': 4, 'content': [{'end': 2625.817, 'text': 'And so we can take these intermediate rates of change, if you will, and multiply them together.', 'start': 2620.496, 'duration': 5.321}, {'end': 2629.779, 'text': 'And that justifies the chain rule intuitively.', 'start': 2626.518, 'duration': 3.261}, {'end': 2631.239, 'text': 'So have a look at chain rule.', 'start': 2630.339, 'duration': 0.9}, {'end': 2638.161, 'text': "But here, really what it means for us is there's a very simple recipe for deriving what we want, which is dl by dc.", 'start': 2631.539, 'duration': 6.622}, {'end': 2650.649, 'text': 'And what we have so far is we know what we want, and we know what is the impact of D on L.', 'start': 2639.801, 'duration': 10.848}], 'summary': 'The chain rule justifies deriving dl by dc with a simple recipe.', 'duration': 30.153, 'max_score': 2620.496, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku02620496.jpg'}, {'end': 2705.473, 'src': 'embed', 'start': 2673.433, 'weight': 9, 'content': [{'end': 2691.036, 'text': 'And so the chain rule tells us that dl by dc, going through this intermediate variable, will just be simply dl by dd times dd by dc.', 'start': 2673.433, 'duration': 17.603}, {'end': 2692.496, 'text': "That's chain rule.", 'start': 2691.836, 'duration': 0.66}, {'end': 2702.418, 'text': "So this is identical to what's happening here, except z is our L, y is our D, and x is our C.", 'start': 2693.416, 'duration': 9.002}, {'end': 2705.473, 'text': 'So we literally just have to multiply these.', 'start': 2703.973, 'duration': 1.5}], 'summary': 'The chain rule states dl/dc = dl/dd * dd/dc. 
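The recipe stated here, dl/dc = dl/dd * dd/dc, can be checked numerically on a tiny version of the expression (d = c + e, L = d * f; f = -2.0 matches the lecture's dl/dd, while c and e are illustrative):

```python
h = 1e-6
c, e, f = 10.0, -6.0, -2.0

def forward(c):
    d = c + e      # intermediate node
    return d * f   # L = d * f

# chain rule: dL/dc = dL/dd * dd/dc, and d(c + e)/dc = 1.0
analytic = f * 1.0
# rise-over-run estimate of the same quantity
numeric = (forward(c + h) - forward(c)) / h

print(analytic, numeric)  # both close to -2.0
```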
applied with z as our L, y as our D, and x as our C.', 'duration': 32.04, 'max_score': 2673.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku02673433.jpg'}, {'end': 3078.882, 'src': 'embed', 'start': 3052.845, 'weight': 3, 'content': [{'end': 3058.989, 'text': 'And so in this little operation we know what the local derivatives are and we just multiply them onto the derivative always.', 'start': 3052.845, 'duration': 6.144}, {'end': 3063.732, 'text': 'So we just go through and recursively multiply on the local derivatives.', 'start': 3059.769, 'duration': 3.963}, {'end': 3065.433, 'text': "And that's what back propagation is.", 'start': 3064.252, 'duration': 1.181}, {'end': 3069.756, 'text': "It's just a recursive application of chain rule backwards through the computation graph.", 'start': 3065.793, 'duration': 3.963}, {'end': 3073.138, 'text': "Let's see this power in action just very briefly.", 'start': 3070.716, 'duration': 2.422}, {'end': 3078.882, 'text': "What we're going to do is we're going to nudge our inputs to try to make L go up.", 'start': 3073.738, 'duration': 5.144}], 'summary': 'Back propagation is a recursive application of chain rule through the computation graph.', 'duration': 26.037, 'max_score': 3052.845, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03052845.jpg'}, {'end': 3224.687, 'src': 'embed', 'start': 3182.457, 'weight': 6, 'content': [{'end': 3184.578, 'text': 'We are going to backpropagate through a neuron.', 'start': 3182.457, 'duration': 2.121}, {'end': 3189.322, 'text': 'So we want to eventually build up neural networks.', 'start': 3185.539, 'duration': 3.783}, {'end': 3192.905, 'text': "And in the simplest case, these are multilayer perceptrons, as they're called.", 'start': 3189.963, 'duration': 2.942}, {'end': 3195.087, 'text': 'So this is a two-layer neural net.', 'start': 3193.125, 'duration': 1.962}, {'end': 3197.649, 'text': "And it's 
got these hidden layers made up of neurons.", 'start': 3195.868, 'duration': 1.781}, {'end': 3199.47, 'text': 'And these neurons are fully connected to each other.', 'start': 3197.909, 'duration': 1.561}, {'end': 3206.255, 'text': 'now, biologically, neurons are very complicated devices, but we have very simple mathematical models of them.', 'start': 3200.211, 'duration': 6.044}, {'end': 3209.537, 'text': 'and so this is a very simple mathematical model of a neuron.', 'start': 3206.255, 'duration': 3.282}, {'end': 3215.581, 'text': "you have some inputs x's and then you have these synapses that have weights on them.", 'start': 3209.537, 'duration': 6.044}, {'end': 3224.687, 'text': "so the w's are weights and then the synapse interacts with the input to this neuron multiplicatively.", 'start': 3215.581, 'duration': 9.106}], 'summary': 'Introduction to backpropagation through a two-layer neural net with fully connected neurons.', 'duration': 42.23, 'max_score': 3182.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03182457.jpg'}, {'end': 3438.503, 'src': 'embed', 'start': 3409.537, 'weight': 8, 'content': [{'end': 3416.724, 'text': "so we're now going to take it through an activation function and let's say we use the tanh so that we produce the output.", 'start': 3409.537, 'duration': 7.187}, {'end': 3425.413, 'text': "so what we'd like to do here is we'd like to do the output and I'll call it O is n.tanh.", 'start': 3416.724, 'duration': 8.689}, {'end': 3427.615, 'text': "okay, but we haven't yet written the tanh.", 'start': 3425.413, 'duration': 2.202}, {'end': 3438.503, 'text': "Now. 
the reason that we need to implement another function here is that tanh is a hyperbolic function and we've only so far implemented a plus and a times,", 'start': 3428.456, 'duration': 10.047}], 'summary': 'Implementing tanh activation function for producing output.', 'duration': 28.966, 'max_score': 3409.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03409537.jpg'}], 'start': 1985.955, 'title': 'Neural networks and backpropagation', 'summary': 'Covers the concepts of derivatives, backpropagation, chain rule, and activation functions in neural networks, with a focus on manual calculations and understanding the mathematical models, emphasizing the importance of chain rule in correctly chaining derivatives together to calculate the instantaneous rate of change of z relative to x.', 'chapters': [{'end': 2272.111, 'start': 1985.955, 'title': 'Derivatives and backpropagation', 'summary': 'Explains the concept of derivatives and backpropagation using a specific example, demonstrating the calculation of derivatives of a function with respect to its variables and manually setting gradients for backpropagation.', 'duration': 286.156, 'highlights': ['The concept of derivatives is demonstrated through the calculation of the derivative of a function with respect to a variable, showcasing the application of the rise over run method and the verification of the derivative value through numerical verification.', 'The manual setting of gradients for backpropagation is illustrated, as the process involves setting the gradient for a specific node and then continuing the backpropagation to calculate the derivatives with respect to other variables.', 'The process of calculating derivatives with respect to different variables, such as D and F, is explained using symbolic expansion and the definition of the derivative, providing a detailed walkthrough of the calculation process.']}, {'end': 2619.415, 'start': 2272.652, 
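The point here is that any op, however complex, only needs two things from backprop's perspective: its forward value and its local derivative. A minimal sketch for tanh treated as a single op (the function name is illustrative):

```python
import math

def tanh_with_local_grad(x):
    # forward pass: t = (e^{2x} - 1) / (e^{2x} + 1)
    t = math.tanh(x)
    # local derivative of tanh, conveniently expressed via the output itself
    local_grad = 1 - t ** 2
    return t, local_grad

t, g = tanh_with_local_grad(0.0)
print(t, g)  # 0.0, 1.0 at the origin
```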
'title': 'Understanding chain rule in backpropagation', 'summary': 'Explains the importance of understanding chain rule in backpropagation, particularly in deriving dl by dc and dl by de, and emphasizes the application of chain rule in correctly chaining derivatives together to calculate the instantaneous rate of change of z relative to x.', 'duration': 346.763, 'highlights': ['The importance of understanding the gradient for the crux node in backpropagation', 'Deriving dl by dc and dl by de, emphasizing the problem and its solution using the chain rule', 'Explanation and application of the chain rule in correctly chaining derivatives together']}, {'end': 3181.776, 'start': 2620.496, 'title': 'Understanding chain rule and back propagation', 'summary': 'Explains the chain rule and back propagation, demonstrating how to calculate derivatives of intermediate nodes, apply chain rule, and manually back propagate through a computation graph to influence the final outcome.', 'duration': 561.28, 'highlights': ['The chapter explains the chain rule and back propagation, demonstrating how to calculate derivatives of intermediate nodes, apply chain rule, and manually back propagate through a computation graph to influence the final outcome.', 'The chain rule tells us that dl by dc, going through this intermediate variable, will just be simply dl by dd times dd by dc.', "The local derivative is simply 1.0. It's very simple.", "Because dl by dd is negative two, what is dl by dc? 
Well, it's the local gradient, 1.0 times dl by dd, which is negative two.", 'So we can imagine it almost like flowing backwards through the graph, and a plus node will simply distribute the derivative to all the leaf nodes.', "What we're going to do is we're going to nudge our inputs to try to make L go up."]}, {'end': 3510.179, 'start': 3182.457, 'title': 'Neural networks and activation functions', 'summary': "Explains the concept of backpropagation through a neuron in a multilayer perceptron, detailing the mathematical model of neurons, the role of synapses and biases, and the use of activation functions like tanh to produce the neuron's output.", 'duration': 327.722, 'highlights': ['The chapter explains the concept of backpropagation through a neuron in a multilayer perceptron', "The role of synapses and biases in the neuron's mathematical model", "Use of activation functions like tanh to produce the neuron's output"]}], 'duration': 1524.224, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku01985955.jpg', 'highlights': ['The process of calculating derivatives with respect to different variables, such as D and F, is explained using symbolic expansion and the definition of the derivative, providing a detailed walkthrough of the calculation process.', 'The concept of derivatives is demonstrated through the calculation of the derivative of a function with respect to a variable, showcasing the application of the rise over run method and the verification of the derivative value through numerical verification.', 'The importance of understanding the gradient for the crux node in backpropagation', 'The chapter explains the chain rule and back propagation, demonstrating how to calculate derivatives of intermediate nodes, apply chain rule, and manually back propagate through a computation graph to influence the final outcome.', 'Deriving dl by dc and dl by de, emphasizing the problem and its solution using the chain 
rule', 'The manual setting of gradients for backpropagation is illustrated, as the process involves setting the gradient for a specific node and then continuing the backpropagation to calculate the derivatives with respect to other variables.', 'The chapter explains the concept of backpropagation through a neuron in a multilayer perceptron', "The role of synapses and biases in the neuron's mathematical model", "Use of activation functions like tanh to produce the neuron's output", 'The chain rule tells us that dl by dc, going through this intermediate variable, will just be simply dl by dd times dd by dc.']}, {'end': 4530.288, 'segs': [{'end': 3545.199, 'src': 'embed', 'start': 3510.179, 'weight': 0, 'content': [{'end': 3514.34, 'text': 'the only thing that matters is that we know how to differentiate through any one function.', 'start': 3510.179, 'duration': 4.161}, {'end': 3516.821, 'text': 'so we take some inputs and we make an output.', 'start': 3514.34, 'duration': 2.481}, {'end': 3517.721, 'text': 'the only thing that matters.', 'start': 3516.821, 'duration': 0.9}, {'end': 3523.123, 'text': 'it can be an arbitrarily complex function as long as you know how to create the local derivative.', 'start': 3517.721, 'duration': 5.402}, {'end': 3527.444, 'text': "if you know the local derivative of how the inputs impact the output, then that's all you need.", 'start': 3523.123, 'duration': 4.321}, {'end': 3533.028, 'text': "so we're going to cluster up all of this expression and we're not going to break it down to its atomic pieces.", 'start': 3527.444, 'duration': 5.584}, {'end': 3535.03, 'text': "We're just going to directly implement tanh.", 'start': 3533.409, 'duration': 1.621}, {'end': 3545.199, 'text': "So let's do that: def tanh, and then out will be a Value of, and we need this expression here.", 'start': 3535.711, 'duration': 9.488}], 'summary': 'Differentiate any function, implement tanh, and determine local derivatives.', 'duration': 35.02, 'max_score': 
3510.179, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03510179.jpg'}, {'end': 3659.484, 'src': 'embed', 'start': 3633.345, 'weight': 2, 'content': [{'end': 3637.829, 'text': "And as long as we know the derivative of tanh, then we'll be able to backpropagate through it.", 'start': 3633.345, 'duration': 4.484}, {'end': 3639.89, 'text': "Now let's see this tanh in action.", 'start': 3638.409, 'duration': 1.481}, {'end': 3644.214, 'text': "Currently it's not squashing too much because the input to it is pretty low.", 'start': 3640.431, 'duration': 3.783}, {'end': 3654.542, 'text': "So if the bias was increased to, say, 8, then we'll see that what's flowing into the tanh now is 2.", 'start': 3644.835, 'duration': 9.707}, {'end': 3657.143, 'text': 'And tanh is squashing it to 0.96.', 'start': 3654.542, 'duration': 2.601}, {'end': 3659.484, 'text': "So we're already hitting the tail of this tanh.", 'start': 3657.143, 'duration': 2.341}], 'summary': 'Backpropagation through tanh, bias increase to 8 results in 2 flowing into tanh, squashing it to 0.96.', 'duration': 26.139, 'max_score': 3633.345, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03633345.jpg'}, {'end': 3741.703, 'src': 'embed', 'start': 3706.367, 'weight': 1, 'content': [{'end': 3712.49, 'text': 'And of course, in a typical neural network setting, what we really care about the most is the derivative of these neurons.', 'start': 3706.367, 'duration': 6.123}, {'end': 3718.934, 'text': "on the weights, specifically the W2 and W1, because those are the weights that we're going to be changing in part of the optimization.", 'start': 3712.49, 'duration': 6.444}, {'end': 3722.969, 'text': 'And the other thing that we have to remember is here we have only a single neuron,', 'start': 3719.847, 'duration': 3.122}, {'end': 3725.471, 'text': "but in the neural net you typically have many neurons and they're 
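The squashing described here is easy to reproduce: with the bias pushed up so the pre-activation reaches 2, tanh is already deep in its flat tail. A quick check:

```python
import math

n = 2.0            # pre-activation flowing into tanh once the bias is ~8
o = math.tanh(n)   # squashed output
print(o)           # about 0.96, already near the tail of tanh
```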
connected.", 'start': 3722.969, 'duration': 2.502}, {'end': 3732.496, 'text': "So this is only like one small neuron, a piece of a much bigger puzzle, and eventually there's a loss function.", 'start': 3727.312, 'duration': 5.184}, {'end': 3737.8, 'text': "that sort of measures the accuracy of the neural net, and we're backpropagating with respect to that accuracy and trying to increase it.", 'start': 3732.496, 'duration': 5.304}, {'end': 3741.703, 'text': "So let's start off backpropagation here in the end.", 'start': 3739.361, 'duration': 2.342}], 'summary': 'Neural network backpropagation involves optimizing weights to increase accuracy.', 'duration': 35.336, 'max_score': 3706.367, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03706367.jpg'}, {'end': 3884.491, 'src': 'embed', 'start': 3838.384, 'weight': 3, 'content': [{'end': 3840.405, 'text': 'So the output is this number.', 'start': 3838.384, 'duration': 2.021}, {'end': 3846.677, 'text': 'o.data is this number.', 'start': 3842.054, 'duration': 4.623}, {'end': 3852.581, 'text': 'and then what this is saying is that do by dn is 1 minus this squared.', 'start': 3846.677, 'duration': 5.904}, {'end': 3857.544, 'text': 'so 1 minus o.data squared is 0.5.', 'start': 3852.581, 'duration': 4.963}, {'end': 3867.61, 'text': 'conveniently, so the local derivative of this tanh operation here is 0.5, and so that would be do by dn.', 'start': 3857.544, 'duration': 10.066}, {'end': 3875.245, 'text': "so we can fill in that n.grad is 0.5, we'll just fill it in.", 'start': 3867.61, 'duration': 7.635}, {'end': 3884.491, 'text': 'So this is exactly 0.5, one half.', 'start': 3882.65, 'duration': 1.841}], 'summary': 'The local derivative of the tanh operation is 0.5.', 'duration': 46.107, 'max_score': 3838.384, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03838384.jpg'}, {'end': 3939.572, 'src': 'embed', 'start': 
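The 0.5 mentioned here can be verified directly: with the lecture's pre-activation n of about 0.8814, the output o = tanh(n) is about 0.7071, and the local derivative 1 - o**2 lands on one half. A quick check:

```python
import math

n = 0.8814               # the pre-activation from the lecture
o = math.tanh(n)         # about 0.7071
local_grad = 1 - o ** 2  # do/dn for the tanh node
print(o, local_grad)     # roughly 0.7071 and 0.5
```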
3910.449, 'weight': 5, 'content': [{'end': 3913.05, 'text': 'So one times 0.5 is 0.5.', 'start': 3910.449, 'duration': 2.601}, {'end': 3921.212, 'text': 'So therefore, we know that this node here, which we called this, its grad is just 0.5.', 'start': 3913.05, 'duration': 8.162}, {'end': 3925.036, 'text': 'And we know that b.grad is also 0.5.', 'start': 3921.212, 'duration': 3.824}, {'end': 3926.438, 'text': "So let's set those and let's draw.", 'start': 3925.036, 'duration': 1.402}, {'end': 3930.702, 'text': 'So those are 0.5.', 'start': 3929.06, 'duration': 1.642}, {'end': 3932.104, 'text': 'Continuing, we have another plus.', 'start': 3930.702, 'duration': 1.402}, {'end': 3934.286, 'text': "0.5, again, we'll just distribute.", 'start': 3933.205, 'duration': 1.081}, {'end': 3936.769, 'text': 'So 0.5 will flow to both of these.', 'start': 3934.927, 'duration': 1.842}, {'end': 3939.572, 'text': 'So we can set theirs.', 'start': 3937.469, 'duration': 2.103}], 'summary': "The node's grad and b.grad are both 0.5.", 'duration': 29.123, 'max_score': 3910.449, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03910449.jpg'}, {'end': 4057.484, 'src': 'embed', 'start': 4022.075, 'weight': 9, 'content': [{'end': 4024.255, 'text': "So that's the local piece of chain rule.", 'start': 4022.075, 'duration': 2.18}, {'end': 4028.397, 'text': "Let's set them and let's redraw.", 'start': 4027.196, 'duration': 1.201}, {'end': 4035.973, 'text': "So here we see that the gradient on our weight two is zero because X2's data was zero, right?", 'start': 4030.011, 'duration': 5.962}, {'end': 4039.975, 'text': 'But X2 will have the gradient 0.5 because data here was one.', 'start': 4036.534, 'duration': 3.441}, {'end': 4049.799, 'text': "And so what's interesting here right, is because the input X2 was zero. 
then because of the way the times works, of course this gradient will be zero.', 'start': 4040.815, 'duration': 8.984}, {'end': 4057.484, 'text': 'And to think about intuitively why that is: the derivative always tells us the influence of this on the final output.', 'start': 4050.339, 'duration': 7.145}], 'summary': "The local chain rule shows x2's gradient as 0.5 when data is 1, and 0 when data is 0.", 'duration': 35.409, 'max_score': 4022.075, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku04022075.jpg'}, {'end': 4168.124, 'src': 'embed', 'start': 4135.319, 'weight': 8, 'content': [{'end': 4138.924, 'text': "So if this weight goes up, then this neuron's output would have gone up.", 'start': 4135.319, 'duration': 3.605}, {'end': 4141.446, 'text': 'and proportionally because the gradient is one.', 'start': 4139.585, 'duration': 1.861}, {'end': 4145.089, 'text': 'Okay, so doing the backpropagation manually is obviously ridiculous.', 'start': 4141.466, 'duration': 3.623}, {'end': 4151.893, 'text': "So we are now going to put an end to this suffering and we're going to see how we can implement the backward pass a bit more automatically.", 'start': 4145.229, 'duration': 6.664}, {'end': 4154.354, 'text': "We're not going to be doing all of it manually out here.", 'start': 4152.053, 'duration': 2.301}, {'end': 4159.558, 'text': "It's now pretty obvious to us by example how these pluses and times are backpropagating gradients.", 'start': 4155.115, 'duration': 4.443}, {'end': 4168.124, 'text': "So let's go up to the value object and we're going to start codifying what we've seen in the examples below.", 'start': 4160.098, 'duration': 8.026}], 'summary': 'Implementing backward pass automatically, avoiding manual backpropagation.', 'duration': 32.805, 'max_score': 4135.319, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku04135319.jpg'}, {'end': 4325.351, 'src': 
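The times-node behavior described here, where a zero input zeroes out the weight's gradient, follows directly from the local derivatives of a product. A sketch with the lecture's values (x2 = 0, w2 = 1, and a gradient of 0.5 flowing in):

```python
x2, w2 = 0.0, 1.0
out_grad = 0.5            # gradient flowing into the product node

w2_grad = x2 * out_grad   # d(x2*w2)/dw2 = x2 -> 0.0: the zero input means
                          # this weight had no influence on the output
x2_grad = w2 * out_grad   # d(x2*w2)/dx2 = w2 -> 0.5

print(w2_grad, x2_grad)   # 0.0 0.5
```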
'embed', 'start': 4299.397, 'weight': 7, 'content': [{'end': 4307.939, 'text': "And basically what you're seeing here is that out's grad will simply be copied onto self's grad and other's grad, as we saw happens,", 'start': 4299.397, 'duration': 8.542}, {'end': 4309.04, 'text': 'for an addition operation.', 'start': 4307.939, 'duration': 1.101}, {'end': 4314.561, 'text': "So we're going to later call this function to propagate the gradient, having done an addition.", 'start': 4310.06, 'duration': 4.501}, {'end': 4317.041, 'text': "Let's now do multiplication.", 'start': 4316.001, 'duration': 1.04}, {'end': 4325.351, 'text': "We're going to also define _backward, and we're going to set its _backward to be backward.", 'start': 4317.562, 'duration': 7.789}], 'summary': 'Gradient propagation function for addition and multiplication defined.', 'duration': 25.954, 'max_score': 4299.397, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku04299397.jpg'}], 'start': 3510.179, 'title': 'Neural network backpropagation', 'summary': 'Delves into understanding differentiation, implementing tanh and backpropagation, emphasizing the significance of local derivatives, discussing derivative calculation, distributing gradients, and applying the chain rule for updating weights to influence neuron outputs. It also covers how to codify the process of backpropagating gradients in a value object, including examples of addition, multiplication, and tanh operations, emphasizing the use of the chain rule and local derivatives.', 'chapters': [{'end': 3741.703, 'start': 3510.179, 'title': 'Understanding differentiation and implementing tanh', 'summary': 'Discusses the importance of understanding differentiation through any function, implementing tanh, and backpropagation in neural networks. 
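The per-operation backward functions described here can be sketched as closures attached to the output Value, in the spirit of micrograd (a condensed version, not the library's exact code):

```python
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # leaf nodes have nothing to do
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # local derivative of + is 1.0, so out's grad is simply
            # routed (and accumulated) into both inputs
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # chain rule for *: each input's grad is the other
            # input's data times out's grad
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

a, b = Value(2.0), Value(-3.0)
c = a * b
c.grad = 1.0      # seed the output
c._backward()     # one manual step of backprop
print(a.grad, b.grad)  # -3.0 2.0
```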
it emphasizes the significance of knowing the local derivative and demonstrates the impact of changing bias on the tanh function output.', 'duration': 231.524, 'highlights': ['The importance of knowing the local derivative of inputs impacting the output in differentiation through any function', 'Direct implementation of tanh and its impact on squashing input values', 'Emphasizing the significance of backpropagation in neural networks for optimizing weights and increasing accuracy']}, {'end': 4154.354, 'start': 3742.648, 'title': 'Derivative and backpropagation', 'summary': 'Explains the process of backpropagation in neural networks, starting with the derivative calculation of the tanh function, distributing gradients through addition nodes, and applying the chain rule for the times operation, culminating in the understanding of how to update weights to influence neuron outputs.', 'duration': 411.706, 'highlights': ['The local derivative of the tanh function is 1 - o^2, resulting in a gradient of 0.5 for the tanh operation, facilitating backpropagation through it.', 'The addition node distributes the gradient equally to its connected nodes, resulting in a gradient of 0.5 for both nodes in this scenario.', 'The gradient for the times operation is calculated using the chain rule, resulting in a gradient of 0 for the weight where the input was zero, and -1.5 for the other weight, demonstrating the influence of weights on neuron outputs.', "The understanding of how weights influence neuron outputs is crucial, as updating weights can directly impact the neuron's output, as evidenced by the gradient values."]}, {'end': 4530.288, 'start': 4155.115, 'title': 'Backpropagating gradients in value object', 'summary': 'Explains how to codify the process of backpropagating gradients in a value object, including examples of addition, multiplication, and 10h operations, emphasizing the use of chain rule and local derivatives.', 'duration': 375.173, 'highlights': ['Defining backward 
function for addition and multiplication operations', 'Demonstrating backpropagation for tanh operation', 'Automating gradient propagation']}], 'duration': 1020.109, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku03510179.jpg', 'highlights': ['The importance of knowing the local derivative of inputs impacting the output in differentiation through any function', 'Emphasizing the significance of backpropagation in neural networks for optimizing weights and increasing accuracy', 'Direct implementation of tanh and its impact on squashing input values', 'The local derivative of the tanh function is 1 - o^2, resulting in a gradient of 0.5 for the tanh operation, facilitating backpropagation through it', "The understanding of how weights influence neuron outputs is crucial, as updating weights can directly impact the neuron's output, as evidenced by the gradient values", 'The addition node distributes the gradient equally to its connected nodes, resulting in a gradient of 0.5 for both nodes in this scenario', 'Demonstrating backpropagation for tanh operation', 'Defining backward function for addition and multiplication operations', 'Automating gradient propagation', 'The gradient for the times operation is calculated using the chain rule, resulting in a gradient of 0 for the weight where the input was zero, and -1.5 for the other weight, demonstrating the influence of weights on neuron outputs']}, {'end': 5200.036, 'segs': [{'end': 4708.15, 'src': 'embed', 'start': 4684.032, 'weight': 2, 'content': [{'end': 4691.334, 'text': 'We have to get all of its full dependencies, everything that it depends on has to propagate to it before we can continue backpropagation.', 'start': 4684.032, 'duration': 7.302}, {'end': 4697.236, 'text': 'So this ordering of graphs can be achieved using something called topological sort.', 'start': 4692.455, 'duration': 4.781}, {'end': 4706.168, 'text': 'So topological sort is basically a laying 
out of a graph such that all the edges go only from left to right, basically.', 'start': 4698.057, 'duration': 8.111}, {'end': 4708.15, 'text': 'So here we have a graph.', 'start': 4706.869, 'duration': 1.281}], 'summary': 'Topological sort is used to order graph dependencies for backpropagation.', 'duration': 24.118, 'max_score': 4684.032, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku04684032.jpg'}, {'end': 4869.645, 'src': 'embed', 'start': 4833.772, 'weight': 3, 'content': [{'end': 4844.687, 'text': 'Then we built a topological order And then we went for node in reversed of topo.', 'start': 4833.772, 'duration': 10.915}, {'end': 4853.115, 'text': 'Now, in the reverse order, because this list goes from, you know, we need to go through it in reversed order.', 'start': 4846.509, 'duration': 6.606}, {'end': 4857.179, 'text': 'So starting at O, node.backward.', 'start': 4854.156, 'duration': 3.023}, {'end': 4861.583, 'text': 'And this should be it.', 'start': 4858.86, 'duration': 2.723}, {'end': 4863.865, 'text': 'There we go.', 'start': 4863.585, 'duration': 0.28}, {'end': 4866.683, 'text': 'Those are the correct derivatives.', 'start': 4865.762, 'duration': 0.921}, {'end': 4869.645, 'text': 'Finally, we are going to hide this functionality.', 'start': 4867.243, 'duration': 2.402}], 'summary': 'Implemented topological order, derived correct results, and will hide functionality.', 'duration': 35.873, 'max_score': 4833.772, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku04833772.jpg'}, {'end': 4994.264, 'src': 'embed', 'start': 4964.117, 'weight': 5, 'content': [{'end': 4972.522, 'text': 'Say I create a single node A and then I create a B that is A plus A and then I call backward.', 'start': 4964.117, 'duration': 8.405}, {'end': 4979.432, 'text': "So what's going to happen is A is 3, and then B is A plus A.", 'start': 4975.049, 'duration': 4.383}, 
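The topological ordering described here can be sketched independently of the Value class, over a tiny expression graph given as a children map (node names are illustrative):

```python
def topo_sort(root, children):
    # lay out the graph so every node appears after all of its inputs
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in children.get(v, ()):
                build(child)
            topo.append(v)  # appended only after all dependencies
    build(root)
    return topo

# tiny expression graph: e = a*b, d = e+c, L = d*f
children = {'L': ['d', 'f'], 'd': ['e', 'c'], 'e': ['a', 'b']}
order = topo_sort('L', children)
print(order)                   # 'L' comes last
print(list(reversed(order)))   # the backward pass visits 'L' first
```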
{'end': 4981.614, 'text': "So there's two arrows on top of each other here.", 'start': 4979.432, 'duration': 2.182}, {'end': 4986.318, 'text': 'Then we can see that B is, of course, the forward pass works.', 'start': 4983.936, 'duration': 2.382}, {'end': 4990.181, 'text': 'B is just A plus A, which is 6.', 'start': 4987.018, 'duration': 3.163}, {'end': 4994.264, 'text': 'But the gradient here is not actually correct, that we calculated automatically.', 'start': 4990.181, 'duration': 4.083}], 'summary': 'Creating single nodes a and b, a=3, b=a+a=6, incorrect gradient calculated.', 'duration': 30.147, 'max_score': 4964.117, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku04964117.jpg'}, {'end': 5107.639, 'src': 'embed', 'start': 5081.017, 'weight': 0, 'content': [{'end': 5088.576, 'text': "so fundamentally what's happening here again is Basically we're going to see an issue anytime we use a variable more than once.", 'start': 5081.017, 'duration': 7.559}, {'end': 5094.197, 'text': "Until now, in these expressions above, every variable is used exactly once, so we didn't see the issue.", 'start': 5089.356, 'duration': 4.841}, {'end': 5098.577, 'text': "But here, if a variable is used more than once, what's going to happen during backward pass?", 'start': 5095.217, 'duration': 3.36}, {'end': 5101.858, 'text': "We're back-propagating from F to E to D.", 'start': 5099.258, 'duration': 2.6}, {'end': 5103.018, 'text': 'so far, so good.', 'start': 5101.858, 'duration': 1.16}, {'end': 5107.639, 'text': 'but now E calls it backward and it deposits its gradients to A and B.', 'start': 5103.018, 'duration': 4.621}], 'summary': 'Using a variable more than once may cause issues during backward pass.', 'duration': 26.622, 'max_score': 5081.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05081017.jpg'}, {'end': 5158.951, 'src': 'embed', 'start': 5128.033, 'weight': 1, 
'content': [{'end': 5129.013, 'text': 'These gradients add.', 'start': 5128.033, 'duration': 0.98}, {'end': 5136.336, 'text': 'And so instead of setting those gradients, we can simply do plus equals.', 'start': 5130.254, 'duration': 6.082}, {'end': 5138.356, 'text': 'We need to accumulate those gradients.', 'start': 5136.936, 'duration': 1.42}, {'end': 5142.578, 'text': 'Plus equals, plus equals, plus equals.', 'start': 5139.277, 'duration': 3.301}, {'end': 5151.608, 'text': 'plus equals, and this will be okay, remember, because we are initializing them at zero, so they start at zero,', 'start': 5144.824, 'duration': 6.784}, {'end': 5158.951, 'text': 'and then any contribution that flows backwards will simply add.', 'start': 5151.608, 'duration': 7.343}], 'summary': 'Accumulate gradients using plus equals to start at zero.', 'duration': 30.918, 'max_score': 5128.033, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05128033.jpg'}], 'start': 4530.288, 'title': 'Implementing backward propagation and backpropagation bug and solution', 'summary': 'Covers the implementation of backward propagation in a computational graph, including topological sorting for proper node ordering and calling dot_backward for correct derivatives. 
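The accumulation fix can be shown in isolation: for b = a + a, the + node deposits a gradient into a once per use, and += keeps both deposits where = would overwrite. A minimal sketch with a stripped-down stand-in for a Value node:

```python
class V:
    # bare-bones stand-in for a Value node: just data and grad
    def __init__(self, data):
        self.data, self.grad = data, 0.0

a = V(3.0)
b = V(a.data + a.data)   # forward pass: b = a + a = 6
b.grad = 1.0             # seed the output

# backward through the + node: both input slots happen to be 'a',
# so overwriting with '=' would lose the first deposit
a.grad += 1.0 * b.grad   # first branch
a.grad += 1.0 * b.grad   # second branch

print(a.grad)  # 2.0, the correct db/da
```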
it also discusses the bug in backpropagation due to gradient overwriting and introduces the solution of accumulating gradients using plus equals.', 'chapters': [{'end': 4893.462, 'start': 4530.288, 'title': 'Implementing backward propagation', 'summary': 'Covers the implementation of backward propagation in a computational graph, including the concept of topological sorting and its application to ensure a proper ordering of nodes for backpropagation, as well as the detailed process of calling dot_backward on all nodes in a topological order to calculate correct derivatives.', 'duration': 363.174, 'highlights': ['The chapter discusses the concept of topological sort to ensure proper ordering of nodes for backpropagation.', 'The process of calling dot_backward on all nodes in a topological order to calculate correct derivatives is explained.', 'The detailed steps of implementing backward propagation, including the use of topological sorting and the application of correct derivatives, are outlined.']}, {'end': 5200.036, 'start': 4897.228, 'title': 'Backpropagation bug and solution', 'summary': 'Discusses the bug in backpropagation, where gradients are being overwritten due to the use of variables more than once, and introduces the solution of accumulating gradients using plus equals, ensuring correct gradient calculations.', 'duration': 302.808, 'highlights': ['The bug in backpropagation occurs when a variable is used more than once, causing gradients to be overwritten, leading to incorrect gradient calculations.', 'The solution to the bug involves accumulating gradients using plus equals, ensuring that the gradients from different branches are added together to avoid overwriting and to achieve correct gradient calculations.', 'A specific example is provided where a variable A is used to calculate B (A + A) and the backward pass results in the incorrect gradient of 1 instead of 2, demonstrating the impact of the bug on gradient calculations.', "Detailed explanation is 
provided for the bug's occurrence during the backward pass, highlighting the issue of overwriting gradients when a variable is used more than once in an expression."]}], 'duration': 669.748, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku04530288.jpg', 'highlights': ['The bug in backpropagation occurs when a variable is used more than once, causing gradients to be overwritten, leading to incorrect gradient calculations.', 'The solution to the bug involves accumulating gradients using plus equals, ensuring that the gradients from different branches are added together to avoid overwriting and to achieve correct gradient calculations.', 'The chapter discusses the concept of topological sort to ensure proper ordering of nodes for backpropagation.', 'The process of calling ._backward on all nodes in a topological order to calculate correct derivatives is explained.', 'The detailed steps of implementing backward propagation, including the use of topological sorting and the application of correct derivatives, are outlined.', 'A specific example is provided where a variable A is used to calculate B (A + A) and the backward pass results in the incorrect gradient of 1 instead of 2, demonstrating the impact of the bug on gradient calculations.', "Detailed explanation is provided for the bug's occurrence during the backward pass, highlighting the issue of overwriting gradients when a variable is used more than once in an expression."]}, {'end': 5727.874, 'segs': [{'end': 5251.386, 'src': 'embed', 'start': 5200.316, 'weight': 3, 'content': [{'end': 5203.937, 'text': "So I'm not going to need any of this now that we've derived all of it.", 'start': 5200.316, 'duration': 3.621}, {'end': 5208.719, 'text': 'We are going to keep this because I want to come back to it.', 'start': 5205.898, 'duration': 2.821}, {'end': 5218.683, 'text': 'Delete the tanh, delete our earlier example, delete the step, delete this, keep the code
that draws.', 'start': 5209.899, 'duration': 8.784}, {'end': 5225.465, 'text': 'and then delete this example and leave behind only the definition of value.', 'start': 5219.901, 'duration': 5.564}, {'end': 5229.508, 'text': "and now let's come back to this non-linearity here that we implemented the tanh.", 'start': 5225.465, 'duration': 4.043}, {'end': 5237.915, 'text': 'now i told you that we could have broken down tanh into its explicit atoms in terms of other expressions, if we had the exp function.', 'start': 5229.508, 'duration': 8.407}, {'end': 5243.059, 'text': 'so, if you remember, tanh is defined like this, and we chose to develop tanh as a single function,', 'start': 5237.915, 'duration': 5.144}, {'end': 5246.382, 'text': "and we can do that because we know its derivative and we can back propagate through it.", 'start': 5243.059, 'duration': 3.323}, {'end': 5251.386, 'text': 'but we can also break down tanh into and express it as a function of exp,', 'start': 5247.182, 'duration': 4.204}], 'summary': 'Discussing the breakdown of tanh and its expression as a function of exp.', 'duration': 51.07, 'max_score': 5200.316, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05200316.jpg'}, {'end': 5413.723, 'src': 'embed', 'start': 5387.597, 'weight': 0, 'content': [{'end': 5397.368, 'text': 'So instead what happens is in Python, the way this works is you are free to define something called the rmul, and rmul is kind of like a fallback.', 'start': 5387.597, 'duration': 9.771}, {'end': 5402.633, 'text': "so if python can't do two times a, it will check.", 'start': 5397.368, 'duration': 5.265}, {'end': 5408.979, 'text': 'if, by any chance, a knows how to multiply two, and that will be called into rmul.', 'start': 5402.633, 'duration': 6.346}, {'end': 5413.723, 'text': "so because python can't do two times a, it will check is there an rmul in value?", 'start': 5408.979, 'duration': 4.744}], 'summary': 'In python,
rmul is used as a fallback for multiplication operations.', 'duration': 26.126, 'max_score': 5387.597, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05387597.jpg'}, {'end': 5478.619, 'src': 'embed', 'start': 5453.284, 'weight': 1, 'content': [{'end': 5460.37, 'text': "So we pop out the Python number, we use math.exp to exponentiate it, create a new value object, everything that we've seen before.", 'start': 5453.284, 'duration': 7.086}, {'end': 5463.912, 'text': 'The tricky part, of course, is how do you backpropagate through e to the x?', 'start': 5460.93, 'duration': 2.982}, {'end': 5469.956, 'text': 'And so here you can potentially pause the video and think about what should go here.', 'start': 5464.993, 'duration': 4.963}, {'end': 5478.619, 'text': 'Okay, so basically, we need to know what is the local derivative of e to the x.', 'start': 5473.433, 'duration': 5.186}], 'summary': 'Using math.exp to exponentiate python numbers and backpropagating through e to the x.', 'duration': 25.335, 'max_score': 5453.284, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05453284.jpg'}, {'end': 5655.054, 'src': 'embed', 'start': 5629.789, 'weight': 2, 'content': [{'end': 5636.534, 'text': 'So here we create the other value, which is just this data raised to the power of other, and other here could be, for example, negative one.', 'start': 5629.789, 'duration': 6.745}, {'end': 5638.255, 'text': "That's what we are hoping to achieve.", 'start': 5636.794, 'duration': 1.461}, {'end': 5641.506, 'text': 'And then this is the backward stub.', 'start': 5639.525, 'duration': 1.981}, {'end': 5643.307, 'text': 'And this is the fun part,', 'start': 5642.047, 'duration': 1.26}, {'end': 5655.054, 'text': 'which is what is the chain rule expression here for back propagating through the power function, where the value is raised to the power of some kind of a constant.', 'start':
5643.307, 'duration': 11.747}], 'summary': 'Calculating value raised to the power and back propagation for chain rule.', 'duration': 25.265, 'max_score': 5629.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05629789.jpg'}], 'start': 5200.316, 'title': 'Implementing non-linear functions in neural networks and arithmetic operations in python', 'summary': 'Covers implementing non-linear functions in neural networks by breaking down the tanh function and implementing arithmetic operations in python for value objects, including addition, multiplication, exponentiation, and division, addressing specific challenges and functions.', 'chapters': [{'end': 5268.26, 'start': 5200.316, 'title': 'Implementing non-linear functions in neural networks', 'summary': 'Discusses the process of breaking down the tanh function into its explicit atoms, using it as a function of exp, and implementing more expressions like exponentiation, addition, subtraction, and division, demonstrating that it yields the same results and gradients.', 'duration': 67.944, 'highlights': ['By breaking down the tanh function into its explicit atoms and expressing it as a function of exp, it forces the implementation of more expressions like exponentiation, addition, subtraction, and division, providing a good exercise to go through.', 'Developing tanh as a single function based on its derivative allows for back propagation, but breaking it down into a function of exp demonstrates that it yields the same results and gradients.', 'The chapter discusses the process of keeping certain code, deleting unnecessary elements, and focusing on the implementation of non-linear functions within neural networks.']}, {'end': 5727.874, 'start': 5268.26, 'title': 'Implementing arithmetic operations in python', 'summary': 'Explains how to implement arithmetic operations (addition, multiplication, exponentiation, and division) for value objects in python, addressing 
challenges such as handling non-value inputs and defining the rmul and pow functions.', 'duration': 459.614, 'highlights': ['Implementing the Rmul function for handling multiplication of a non-value object with a value object', 'Introducing the exp function to handle exponentiation and backpropagation through e to the x', 'Redefining the pow function to handle raising a value to a power, including the backward pass']}], 'duration': 527.558, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05200316.jpg', 'highlights': ['Implementing the Rmul function for handling multiplication of a non-value object with a value object', 'Introducing the exp function to handle exponentiation and backpropagation through e to the x', 'Redefining the pow function to handle raising a value to a power, including the backward pass', 'By breaking down the tanh function into its explicit atoms and expressing it as a function of exp, it forces the implementation of more expressions like exponentiation, addition, subtraction, and division, providing a good exercise to go through', 'Developing tanh as a single function based on its derivative allows for back propagation, but breaking it down into a function of exp demonstrates that it yields the same results and gradients', 'The chapter discusses the process of keeping certain code, deleting unnecessary elements, and focusing on the implementation of non-linear functions within neural networks']}, {'end': 6401.654, 'segs': [{'end': 5781.25, 'src': 'embed', 'start': 5751.28, 'weight': 2, 'content': [{'end': 5754.144, 'text': 'and i realized that we actually also have to know how to subtract.', 'start': 5751.28, 'duration': 2.864}, {'end': 5757.528, 'text': 'so right now a minus b will not work.', 'start': 5754.144, 'duration': 3.384}, {'end': 5761.974, 'text': 'to make it work we need one more piece of code here.', 'start': 5757.528, 'duration': 4.446}, {'end': 5766.382, 'text': 'and 
basically this is the subtraction.', 'start': 5761.974, 'duration': 4.408}, {'end': 5773.986, 'text': "and the way we're going to implement subtraction is we're going to implement it by addition of a negation and then to implement negation we're going to multiply by negative one.", 'start': 5766.382, 'duration': 7.604}, {'end': 5781.25, 'text': "so just again using the stuff we've already built and just expressing it in terms of what we have, and a minus b is not working.", 'start': 5773.986, 'duration': 7.264}], 'summary': 'To implement subtraction, we add a negation, and implement negation by multiplying by -1.', 'duration': 29.97, 'max_score': 5751.28, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05751280.jpg'}, {'end': 5953.553, 'src': 'embed', 'start': 5915.973, 'weight': 3, 'content': [{'end': 5918.955, 'text': 'And so the reason I wanted to go through this exercise is number one.', 'start': 5915.973, 'duration': 2.982}, {'end': 5922.817, 'text': 'we got to practice a few more operations and writing more backwards passes.', 'start': 5918.955, 'duration': 3.862}, {'end': 5931.082, 'text': 'And number two, I wanted to illustrate the point that the level at which you implement your operations is totally up to you.', 'start': 5923.218, 'duration': 7.864}, {'end': 5936.086, 'text': 'You can implement backward passes for tiny expressions like a single individual plus or a single times.', 'start': 5931.443, 'duration': 4.643}, {'end': 5940.167, 'text': 'Or you can implement them for, say, tanh,', 'start': 5936.886, 'duration': 3.281}, {'end': 5945.83, 'text': "which potentially you can see it as a composite operation because it's made up of all these more atomic operations.", 'start': 5940.167, 'duration': 5.663}, {'end': 5948.311, 'text': 'But really, all of this is kind of like a fake concept.', 'start': 5946.49, 'duration': 1.821}, {'end': 5953.553, 'text': 'All that matters is we have some kind of inputs and some kind of
an output, and this output is a function of the inputs in some way.', 'summary': 'Practice more operations, implement backward passes, and illustrate that the level at which operations are implemented is flexible.', 'duration': 37.58, 'max_score': 5915.973, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05915973.jpg'}, {'end': 6241.707, 'src': 'embed', 'start': 6207.424, 'weight': 0, 'content': [{'end': 6209.525, 'text': 'And we can pop out the individual number with .item.', 'start': 6207.424, 'duration': 2.101}, {'end': 6219.906, 'text': 'So basically Torch can do what we did in microGrad as a special case when your tensors are all single element tensors.', 'start': 6209.525, 'duration': 10.381}, {'end': 6229.876, 'text': 'But the big deal with PyTorch is that everything is significantly more efficient because we are working with these tensor objects and we can do lots of operations in parallel on all of these tensors.', 'start': 6220.587, 'duration': 9.289}, {'end': 6235.119, 'text': "But otherwise, what we've built very much agrees with the API of PyTorch.", 'start': 6231.715, 'duration': 3.404}, {'end': 6241.707, 'text': 'Okay, so now that we have some machinery to build out pretty complicated mathematical expressions, we can also start building out neural nets.', 'start': 6235.8, 'duration': 5.907}], 'summary': 'Micrograd is less efficient than pytorch, which can process operations in parallel on tensor objects, allowing the building of complex mathematical expressions and neural nets.', 'duration': 34.283, 'max_score': 6207.424, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku06207424.jpg'}], 'start': 5727.954, 'title': 'Neural network implementation', 'summary': 'Explores back propagation and chain rule in neural networks, emphasizing correct forward and backward passes.
it also covers the implementation of neural networks using pytorch, including tensor creation, defining gradients, arithmetic operations, and building a single neuron model, and discusses the efficiency of pytorch in handling tensor operations.', 'chapters': [{'end': 5970.851, 'start': 5727.954, 'title': 'Back propagation and chain rule', 'summary': 'Explains the application of chain rule and back propagation in neural networks, demonstrating the implementation of subtraction and composite operations, with a focus on achieving correct forward and backward passes.', 'duration': 242.897, 'highlights': ['The chapter illustrates the implementation of subtraction in neural networks by using negation and multiplication by -1, emphasizing the necessity of additional code for proper functionality.', 'It demonstrates the breakdown of a composite operation (tanh) into more atomic operations, ensuring the same forward and backward passes and identical gradients on leaf nodes, highlighting the mathematical equivalence of the operations.', 'The chapter emphasizes the flexibility in implementing operations in back propagation, stating that as long as local gradients can be written and chained, the design of the functions is at the discretion of the user.']}, {'end': 6401.654, 'start': 5972.136, 'title': 'Neural networks with pytorch', 'summary': 'Demonstrates the implementation of neural networks using pytorch, including the creation of tensors, defining gradients, performing arithmetic operations, and building a single neuron model, while highlighting the similarities and differences with micrograd and the efficiency of pytorch in handling tensor operations.', 'duration': 429.518, 'highlights': ["PyTorch allows the creation of tensors, which are n-dimensional arrays of scalars, in contrast to micrograd's scalar valued engine.", 'The process of defining gradients in PyTorch involves explicitly stating that all nodes require gradients, in contrast to the default setting of false
for leaf nodes for efficiency reasons.', "Arithmetic operations can be performed in PyTorch similar to micrograd, and tensor objects in PyTorch have 'data' and 'grad' attributes; the underlying number is pulled out with '.item()', and gradients are populated by calling '.backward()'.", 'The efficiency of PyTorch is attributed to its ability to handle operations in parallel on tensor objects, making it significantly more efficient than micrograd.', "The chapter also delves into building neural networks, starting with a single neuron model that subscribes to the PyTorch API for designing neural network modules, emphasizing the similarities with PyTorch's autograd API and the aim for efficiency."]}], 'duration': 673.7, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku05727954.jpg', 'highlights': ['PyTorch allows creation of n-dimensional tensors for efficient tensor operations.', "PyTorch's efficiency is attributed to parallel operations on tensor objects.", 'Demonstrates implementation of subtraction in neural networks using negation and multiplication by -1.', 'Emphasizes flexibility in implementing operations in back propagation.', 'Discusses the breakdown of composite operations into atomic operations for identical gradients.']}, {'end': 7240.957, 'segs': [{'end': 6457.49, 'src': 'embed', 'start': 6401.654, 'weight': 0, 'content': [{'end': 6410.837, 'text': 'And now what we want to do is for wi xi in.', 'start': 6401.654, 'duration': 9.183}, {'end': 6421.42, 'text': 'we want to multiply wi times xi and then we want to sum all of that together to come up with an activation and add also self.b on top.', 'start': 6410.837, 'duration': 10.583}, {'end': 6423.438, 'text': "That's the raw activation.", 'start': 6422.438, 'duration': 1}, {'end': 6426.199, 'text': 'And then of course we need to pass that through a non-linearity.', 'start': 6424.019, 'duration': 2.18}, {'end': 6428.69, 'text': "So what we're going to be returning is act.",
'start': 6426.739, 'duration': 1.951}, {'end': 6431.061, 'text': "tanh. And here's out.", 'start': 6428.69, 'duration': 2.371}, {'end': 6435.002, 'text': 'So now we see that we are getting some outputs.', 'start': 6432.361, 'duration': 2.641}, {'end': 6439.904, 'text': 'And we get a different output from a neuron each time because we are initializing different weights and biases.', 'start': 6435.602, 'duration': 4.302}, {'end': 6448.264, 'text': 'And then to be a bit more efficient here actually, sum, by the way, takes a second optional parameter, which is the start.', 'start': 6441.38, 'duration': 6.884}, {'end': 6451.286, 'text': 'And by default, the start is zero.', 'start': 6449.045, 'duration': 2.241}, {'end': 6455.389, 'text': 'So these elements of this sum will be added on top of zero to begin with.', 'start': 6451.626, 'duration': 3.763}, {'end': 6457.49, 'text': 'But actually we can just start with self.b.', 'start': 6455.809, 'duration': 1.681}], 'summary': 'Multiplying wi by xi and summing to get activation, with varying outputs due to different weights and biases.', 'duration': 55.836, 'max_score': 6401.654, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku06401654.jpg'}, {'end': 6653.692, 'src': 'heatmap', 'start': 6476.684, 'weight': 0.955, 'content': [{'end': 6478.866, 'text': "Next up, we're going to define a layer of neurons.", 'start': 6476.684, 'duration': 2.182}, {'end': 6481.908, 'text': 'So here we have a schematic for an MLP.', 'start': 6479.486, 'duration': 2.422}, {'end': 6489.113, 'text': "So we see that these MLPs, each layer this is one layer has actually a number of neurons and they're not connected to each other,", 'start': 6482.668, 'duration': 6.445}, {'end': 6490.694, 'text': 'but all of them are fully connected to the input.', 'start': 6489.113, 'duration': 1.581}, {'end': 6495.798, 'text': "So what is a layer of neurons?
It's just a set of neurons evaluated independently.", 'start': 6491.474, 'duration': 4.324}, {'end': 6501.642, 'text': "So in the interest of time, I'm going to do something fairly straightforward here.", 'start': 6496.838, 'duration': 4.804}, {'end': 6508.296, 'text': "Literally, a layer is just a list of neurons.", 'start': 6503.268, 'duration': 5.028}, {'end': 6512.503, 'text': 'And then how many neurons do we have? We take that as an input argument here.', 'start': 6509.178, 'duration': 3.325}, {'end': 6515.728, 'text': 'How many neurons do you want in your layer? Number of outputs in this layer.', 'start': 6512.764, 'duration': 2.964}, {'end': 6521.041, 'text': 'And so we just initialize completely independent neurons with this given dimensionality.', 'start': 6516.758, 'duration': 4.283}, {'end': 6525.684, 'text': 'And when we call on it, we just independently evaluate them.', 'start': 6521.481, 'duration': 4.203}, {'end': 6529.486, 'text': 'So now instead of a neuron, we can make a layer of neurons.', 'start': 6526.484, 'duration': 3.002}, {'end': 6531.708, 'text': "They are two-dimensional neurons, and let's have three of them.", 'start': 6529.726, 'duration': 1.982}, {'end': 6536.731, 'text': 'And now we see that we have three independent evaluations of three different neurons.', 'start': 6532.528, 'duration': 4.203}, {'end': 6543.876, 'text': "Okay, and finally, let's complete this picture and define an entire multi-layered perceptron, or MLP.", 'start': 6539.052, 'duration': 4.824}, {'end': 6548.494, 'text': 'And as we can see here in an MLP, these layers just feed into each other sequentially.', 'start': 6544.773, 'duration': 3.721}, {'end': 6553.456, 'text': "So let's come here and I'm just going to copy the code here in interest of time.", 'start': 6549.375, 'duration': 4.081}, {'end': 6556.077, 'text': 'So an MLP is very similar.', 'start': 6554.556, 'duration': 1.521}, {'end': 6564, 'text': "We're taking the number of inputs as before, but now,
instead of taking a single N out, which is number of neurons in a single layer,", 'start': 6556.877, 'duration': 7.123}, {'end': 6565.64, 'text': "we're going to take a list of N outs.", 'start': 6564, 'duration': 1.64}, {'end': 6569.542, 'text': 'And this list defines the sizes of all the layers that we want in our MLP.', 'start': 6566.121, 'duration': 3.421}, {'end': 6577.149, 'text': 'So here we just put them all together and then iterate over consecutive pairs of these sizes and create layer objects for them.', 'start': 6570.563, 'duration': 6.586}, {'end': 6580.332, 'text': 'And then in the call function, we are just calling them sequentially.', 'start': 6578.01, 'duration': 2.322}, {'end': 6582.094, 'text': "So that's an MLP really.", 'start': 6580.753, 'duration': 1.341}, {'end': 6584.596, 'text': "And let's actually re-implement this picture.", 'start': 6583.035, 'duration': 1.561}, {'end': 6588.68, 'text': 'So we want three input neurons and then two layers of four and an output unit.', 'start': 6584.716, 'duration': 3.964}, {'end': 6591.142, 'text': 'So we want.', 'start': 6589.901, 'duration': 1.241}, {'end': 6593.489, 'text': 'a three-dimensional input.', 'start': 6592.628, 'duration': 0.861}, {'end': 6594.952, 'text': 'Say this is an example input.', 'start': 6593.77, 'duration': 1.182}, {'end': 6600.099, 'text': 'We want three inputs into two layers of four and one output.', 'start': 6595.472, 'duration': 4.627}, {'end': 6602.162, 'text': 'And this, of course, is an MLP.', 'start': 6600.7, 'duration': 1.462}, {'end': 6604.285, 'text': 'And there we go.', 'start': 6603.845, 'duration': 0.44}, {'end': 6605.888, 'text': "That's a forward pass of an MLP.", 'start': 6604.586, 'duration': 1.302}, {'end': 6608.307, 'text': 'make this a little bit nicer.', 'start': 6607.026, 'duration': 1.281}, {'end': 6615.751, 'text': "you see how we have just a single element, but it's wrapped in a list, because layer always returns lists, so for convenience,", 'start':
6608.307, 'duration': 7.444}, {'end': 6620.755, 'text': 'return outs at zero if len outs is exactly a single element.', 'start': 6615.751, 'duration': 5.004}, {'end': 6628.279, 'text': 'else return the full list, and this will allow us to just get a single value out at the last layer that only has a single neuron.', 'start': 6620.755, 'duration': 7.524}, {'end': 6638.704, 'text': 'and finally, we should be able to draw_dot of n of x, and, as you might imagine, these expressions are now getting relatively involved.', 'start': 6628.279, 'duration': 10.425}, {'end': 6647.208, 'text': "so this is an entire mlp that we're defining now, all the way until a single output.", 'start': 6638.704, 'duration': 8.504}, {'end': 6653.692, 'text': 'okay, and so obviously you would never differentiate on pen and paper these expressions,', 'start': 6647.208, 'duration': 6.484}], 'summary': 'The transcript discusses defining layers of neurons and creating a multi-layered perceptron (mlp) with specific examples and code implementation.', 'duration': 177.008, 'max_score': 6476.684, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku06476684.jpg'}, {'end': 6564, 'src': 'embed', 'start': 6532.528, 'weight': 3, 'content': [{'end': 6536.731, 'text': 'And now we see that we have three independent evaluations of three different neurons.', 'start': 6532.528, 'duration': 4.203}, {'end': 6543.876, 'text': "Okay, and finally, let's complete this picture and define an entire multi-layered perceptron, or MLP.", 'start': 6539.052, 'duration': 4.824}, {'end': 6548.494, 'text': 'And as we can see here in an MLP, these layers just feed into each other sequentially.', 'start': 6544.773, 'duration': 3.721}, {'end': 6553.456, 'text': "So let's come here and I'm just going to copy the code here in interest of time.", 'start': 6549.375, 'duration': 4.081}, {'end': 6556.077, 'text': 'So an MLP is very similar.', 'start': 6554.556, 'duration': 1.521}, {'end':
6564, 'text': "We're taking the number of inputs as before, but now, instead of taking a single N out, which is number of neurons in a single layer,", 'start': 6556.877, 'duration': 7.123}], 'summary': 'Three independent evaluations of neurons and multi-layered perceptron (mlp) defined.', 'duration': 31.472, 'max_score': 6532.528, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku06532528.jpg'}, {'end': 6677.255, 'src': 'embed', 'start': 6653.692, 'weight': 4, 'content': [{'end': 6662.421, 'text': 'but with micrograd we will be able to back propagate all the way through this and backpropagate into these weights of all these neurons.', 'start': 6653.692, 'duration': 8.729}, {'end': 6663.782, 'text': "So let's see how that works.", 'start': 6663.002, 'duration': 0.78}, {'end': 6668.066, 'text': "Okay, so let's create ourselves a very simple example data set here.", 'start': 6664.303, 'duration': 3.763}, {'end': 6670.489, 'text': 'So this data set has four examples.', 'start': 6668.847, 'duration': 1.642}, {'end': 6674.813, 'text': 'And so we have four possible inputs into the neural net.', 'start': 6671.229, 'duration': 3.584}, {'end': 6677.255, 'text': 'And we have four desired targets.', 'start': 6675.573, 'duration': 1.682}], 'summary': 'Using micrograd, backpropagation can be performed through a simple neural network with 4 examples and inputs.', 'duration': 23.563, 'max_score': 6653.692, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku06653692.jpg'}, {'end': 7105.495, 'src': 'embed', 'start': 7072.958, 'weight': 5, 'content': [{'end': 7074.76, 'text': 'And those we, of course, we want to change.', 'start': 7072.958, 'duration': 1.802}, {'end': 7084.225, 'text': "Okay, so now we're going to want some convenience codes to gather up all of the parameters of the neural net so that we can operate on all of them simultaneously.", 'start': 7075.98, 'duration': 
8.245}, {'end': 7090.086, 'text': 'And every one of them, we will nudge a tiny amount based on the gradient information.', 'start': 7084.726, 'duration': 5.36}, {'end': 7094.168, 'text': "So let's collect the parameters of the neural net all in one array.", 'start': 7090.866, 'duration': 3.302}, {'end': 7105.495, 'text': "So let's create a parameters of self that just returns self.w, which is a list, concatenated with a list of self.b.", 'start': 7094.929, 'duration': 10.566}], 'summary': 'Neural net parameters are adjusted based on gradient information for simultaneous operation.', 'duration': 32.537, 'max_score': 7072.958, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku07072958.jpg'}], 'start': 6401.654, 'title': 'Neural networks and mlp implementation', 'summary': "Covers neural network activation and summation, including the efficiency of the 'sum' function, and further discusses the implementation of a multi-layered perceptron (mlp) in python, explaining neurons, layers, loss calculation, and backpropagation.", 'chapters': [{'end': 6457.49, 'start': 6401.654, 'title': 'Neural network activation and summation', 'summary': "Discusses the process of multiplying weights with inputs, summing them together to obtain raw activation, and passing it through a non-linearity to generate outputs, with the mention of the efficiency of the 'sum' function and its optional parameter.", 'duration': 55.836, 'highlights': ['The process involves multiplying weights with inputs, summing them together, and adding the bias to obtain raw activation, followed by passing it through a non-linearity to generate outputs.', 'Different outputs are obtained from a neuron each time due to the initialization of different weights and biases.', "The 'sum' function has a second optional parameter, which can be utilized for efficiency, where the elements of the sum are added on top of a specified value, such as self.b or zero."]}, {'end': 
7240.957, 'start': 6458.591, 'title': 'Implementing multi-layered perceptron', 'summary': 'Introduces the implementation of a multi-layered perceptron (mlp) in python, explaining the concept of neurons, layers, and the calculation of loss, demonstrating the iterative process of improving neural network predictions through backpropagation, and defining parameters of the neural network.', 'duration': 782.366, 'highlights': ['The chapter covers the implementation of a multi-layered perceptron (MLP) in Python, explaining the concept of neurons and layers.', 'The chapter demonstrates the iterative process of improving neural network predictions through backpropagation.', 'The chapter defines parameters of the neural network and explains the process of gathering all parameters to operate on them simultaneously.']}], 'duration': 839.303, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku06401654.jpg', 'highlights': ['The process involves multiplying weights with inputs, summing them together, and adding the bias to obtain raw activation, followed by passing it through a non-linearity to generate outputs.', "The 'sum' function has a second optional parameter, which can be utilized for efficiency, where the elements of the sum are added on top of a specified value, such as self.b or zero.", 'Different outputs are obtained from a neuron each time due to the initialization of different weights and biases.', 'The chapter covers the implementation of a multi-layered perceptron (MLP) in Python, explaining the concept of neurons and layers.', 'The chapter demonstrates the iterative process of improving neural network predictions through backpropagation.', 'The chapter defines parameters of the neural network and explains the process of gathering all parameters to operate on them simultaneously.']}, {'end': 8751.74, 'segs': [{'end': 8022.075, 'src': 'embed', 'start': 7989.498, 'weight': 1, 'content': [{'end': 7996.961, 'text': 
'And so the grads ended up accumulating and it effectively gave us a massive step size and it made us converge extremely fast.', 'start': 7989.498, 'duration': 7.463}, {'end': 8006.245, 'text': 'But basically now we have to do more steps to get to very low values of loss and get YPRED to be really good.', 'start': 7999.542, 'duration': 6.703}, {'end': 8017.492, 'text': "We can try to step a bit greater. Yeah, we're gonna get closer and closer to one, minus one, and one.", 'start': 8006.745, 'duration': 10.747}, {'end': 8022.075, 'text': 'So working with neural nets is sometimes tricky,', 'start': 8018.593, 'duration': 3.482}], 'summary': 'Neural nets yielded fast convergence with large step size, but require more steps for low loss and accurate predictions.', 'duration': 32.577, 'max_score': 7989.498, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku07989498.jpg'}, {'end': 8135.868, 'src': 'embed', 'start': 8103.886, 'weight': 0, 'content': [{'end': 8105.307, 'text': 'the network is doing what you want it to do.', 'start': 8103.886, 'duration': 1.421}, {'end': 8112.228, 'text': "Yeah, so we just have a blob of neural stuff, and we can make it do arbitrary things.", 'start': 8108.086, 'duration': 4.142}, {'end': 8114.21, 'text': "And that's what gives neural nets their power.", 'start': 8112.809, 'duration': 1.401}, {'end': 8117.651, 'text': 'This is a very tiny network with 41 parameters.', 'start': 8116.111, 'duration': 1.54}, {'end': 8126.445, 'text': 'you can build significantly more complicated neural nets with billions, at this point almost trillions, of parameters,', 'start': 8119.463, 'duration': 6.982}, {'end': 8135.868, 'text': "and it's a massive blob of neural tissue, simulated neural tissue, roughly speaking, and you can make it do extremely complex problems.", 'start': 8126.445, 'duration': 9.423}], 'summary': "Neural network's power lies in its ability to perform arbitrary tasks with billions or
trillions of parameters.", 'duration': 31.982, 'max_score': 8103.886, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku08103886.jpg'}, {'end': 8338.228, 'src': 'embed', 'start': 8310.347, 'weight': 2, 'content': [{'end': 8314.268, 'text': 'And nn.Module in PyTorch also has a zero_grad, which I refactored out here.', 'start': 8310.347, 'duration': 3.921}, {'end': 8318.393, 'text': "So that's the end of micrograd, really.", 'start': 8316.471, 'duration': 1.922}, {'end': 8326.68, 'text': "Then there's a test which you'll see basically creates two chunks of code, one in micrograd and one in PyTorch.", 'start': 8318.653, 'duration': 8.027}, {'end': 8332.525, 'text': "And we'll make sure that the forward and the backward passes agree identically for a slightly less complicated expression.", 'start': 8327.08, 'duration': 5.445}, {'end': 8333.906, 'text': 'a slightly more complicated expression.', 'start': 8332.525, 'duration': 1.381}, {'end': 8335.906, 'text': 'Everything agrees.', 'start': 8334.425, 'duration': 1.481}, {'end': 8338.228, 'text': 'So we agree with PyTorch on all of these operations.', 'start': 8336.308, 'duration': 1.92}], 'summary': 'Refactored a zero_grad mirroring nn.Module in PyTorch; verified agreement with PyTorch on all operations.', 'duration': 27.881, 'max_score': 8310.347, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku08310347.jpg'}, {'end': 8546.283, 'src': 'embed', 'start': 8515.985, 'weight': 3, 'content': [{'end': 8519.447, 'text': 'And if you just search for tanh, you get apparently 2,800 results and 406 files.', 'start': 8515.985, 'duration': 3.462}, {'end': 8521.489, 'text': "So I don't know what these files are doing, honestly.", 'start': 8519.467, 'duration': 2.022}, {'end': 8529.508, 'text': 'and why there are so many mentions of tanh.', 'start': 8527.947, 'duration': 1.561}, {'end': 8532.031, 'text': 'But unfortunately, these libraries are quite
complex.', 'start': 8530.049, 'duration': 1.982}, {'end': 8534.573, 'text': "They're meant to be used, not really inspected.", 'start': 8532.051, 'duration': 2.522}, {'end': 8542.079, 'text': 'Eventually, I did stumble on someone who tries to change the tanh backward code for some reason.', 'start': 8535.774, 'duration': 6.305}, {'end': 8546.283, 'text': 'And someone here pointed to the CPU kernel and the CUDA kernel for tanh backward.', 'start': 8542.86, 'duration': 3.423}], 'summary': 'Approximately 2,800 results across 406 files mention tanh; the libraries are complex and meant to be used, not inspected.', 'duration': 30.298, 'max_score': 8515.985, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku08515985.jpg'}], 'start': 7241.697, 'title': 'Neural network training, bugs, optimization, and libraries', 'summary': 'Delves into the training process of neural networks, emphasizing gradient descent, learning rate setting, and convergence, discusses a bug causing slower optimization, introduces the micrograd library with a successful implementation and support for max margin loss, and covers the challenges of finding the tanh backward pass in PyTorch.', 'chapters': [{'end': 7825.806, 'start': 7241.697, 'title': 'Neural network training process', 'summary': 'Discusses the process of training a neural network using gradient descent, iterating through forward pass, backward pass, and update steps, with an emphasis on the subtle art of setting the learning rate, leading to successful convergence and low loss.', 'duration': 584.109, 'highlights': ['The neural net has 41 parameters.', 'Iterative process of forward pass, backward pass, and update in gradient descent.', 'Importance of setting the learning rate for stable convergence.']}, {'end': 8289.432, 'start': 7825.806, 'title': 'Neural nets bugs and optimization', 'summary': 'Discusses a subtle bug in neural nets where the gradients accumulate without being reset to zero, resulting in slower
optimization and the need for more steps to achieve lower loss values, while also providing an intuitive understanding of neural networks and their training process.', 'duration': 463.626, 'highlights': ['The bug in neural nets where the gradients accumulate without being reset to zero results in slower optimization and the need for more steps to achieve lower loss values.', 'The intuitive understanding of neural networks and their training process is provided, emphasizing the mathematical expressions for the forward pass, loss function, backpropagation, and gradient descent.', 'The discussion on the emergent properties and complexity of neural nets highlights their power in solving extremely complex problems.']}, {'end': 8469.673, 'start': 8290.962, 'title': 'Micrograd neural networks library', 'summary': "Introduces the micrograd neural networks library, demonstrating its compatibility with PyTorch's nn.Module class, successful implementation of forward and backward passes, support for batching, and utilization of max margin loss and learning rate decay, resulting in a successful binary classification demo.", 'duration': 178.711, 'highlights': ["The micrograd neural networks library is designed to be compatible with PyTorch's nn.Module class, matching its API and incorporating similar functionalities, such as zero_grad (mirroring nn.Module in PyTorch).", 'Successful implementation of forward and backward passes in micrograd, as demonstrated by creating and testing two sets of code in both micrograd and PyTorch, ensuring agreement on all operations for slightly less and more complicated expressions.', 'The binary classification demo in micrograd involves a more complex multi-layer perceptron (MLP) and a max margin loss, supporting batching for larger datasets and demonstrating successful separation of red and blue data points on the decision surface of the neural net.', 'Utilization of learning rate decay in the training loop of micrograd, where the learning
rate is scaled as a function of the number of iterations, allowing for the fine-tuning of details as the network stabilizes near the end of the training process.']}, {'end': 8751.74, 'start': 8470.274, 'title': 'Finding the tanh backward in PyTorch', 'summary': "Covers the challenges of finding the backward pass for tanh in PyTorch, including the complexity of PyTorch's codebase and the process of registering new functions in PyTorch.", 'duration': 281.466, 'highlights': ["PyTorch's codebase complexity makes it challenging to find the backward pass for tanh, with over 2,800 results and 406 files mentioning tanh.", "Registering new functions in PyTorch involves subclassing torch.autograd.Function, implementing forward and backward passes, and using the new function as a Lego block in a larger structure of PyTorch's blocks.", "The complexity and size of PyTorch's codebase pose a challenge in finding specific functionalities, unlike the simplicity of micrograd.", 'The process of finding the tanh backward code in PyTorch involves different kernels for CPU and GPU devices, including complexities related to data types and specific operations.']}], 'duration': 1510.043, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VMj-3S1tku0/pics/VMj-3S1tku07241697.jpg', 'highlights': ['The neural net has 41 parameters.', 'The bug in neural nets where the gradients accumulate without being reset to zero results in slower optimization and the need for more steps to achieve lower loss values.', "The micrograd neural networks library is designed to be compatible with PyTorch's nn.Module class, matching its API and incorporating similar functionalities, such as zero_grad (mirroring nn.Module in PyTorch).", "PyTorch's codebase complexity makes it challenging to find the backward pass for tanh, with over 2,800 results and 406 files mentioning tanh."]}], 'highlights': ['Micrograd implements backpropagation for efficient gradient evaluation of a loss function with respect 
to neural network weights.', 'The entire neural network library built on top of micrograd comprises only 150 lines of code, showcasing the efficiency of micrograd in training neural networks.', 'The process involves multiplying weights with inputs, summing them together, and adding the bias to obtain raw activation, followed by passing it through a non-linearity to generate outputs.', 'PyTorch allows creation of n-dimensional tensors for efficient tensor operations.', 'The bug in backpropagation occurs when a variable is used more than once, causing gradients to be overwritten, leading to incorrect gradient calculations.']}
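One detail the highlights above mention is the second optional parameter of Python's built-in `sum`: it is the starting value the elements are summed onto, which lets a neuron's bias serve as the start of the weighted sum. A minimal sketch (the names `w`, `x`, and `b` are stand-in values, not the video's exact code):

```python
# sum(iterable, start): 'start' is the value the elements are added onto.
# Using the bias b as the start computes the raw activation b + w·x in one call.
w = [0.5, -1.0, 2.0]   # weights (stand-in values)
x = [1.0, 2.0, 3.0]    # inputs
b = 0.25               # bias

act = sum((wi * xi for wi, xi in zip(w, x)), b)
print(act)  # 0.25 + (0.5 - 2.0 + 6.0) = 4.75
```

Passing `self.b` as the start, as described above, is equivalent to summing the products and then adding the bias, just more compact.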
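Two of the highlighted bugs concern the same mechanism: gradients must be accumulated with `+=` inside each node's backward function (a node used more than once would otherwise have its gradient overwritten), and they must be reset to zero before each new backward pass (otherwise they accumulate across training steps and act like an oversized step size). A minimal micrograd-style sketch of both points, not the actual library code:

```python
class Value:
    """A tiny scalar autograd node, in the spirit of micrograd (sketch only)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0                 # accumulated gradient
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # '+=' rather than '=': a node used on multiple paths must
            # accumulate a contribution from each path (chain rule);
            # plain '=' would let later contributions overwrite earlier ones.
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then apply each node's chain rule in reverse order
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


a = Value(3.0)
b = a + a                    # 'a' feeds into 'b' twice
b.backward()
grad_after_backward = a.grad
print(grad_after_backward)   # 2.0 with '+=', would wrongly be 1.0 with '='

# ...and before the next training step the gradient must be zeroed,
# or stale gradients accumulate and inflate the effective step size:
a.grad = 0.0
```

The `a.grad = 0.0` at the end is what PyTorch's `nn.Module.zero_grad` does for every parameter, and what the refactored `zero_grad` described above does in micrograd.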
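The CPU and CUDA kernels being hunted for at the end all implement the local derivative of tanh, d/dx tanh(x) = 1 - tanh(x)^2, the same expression micrograd's backward uses. A quick sanity check of that derivative against a centered finite difference:

```python
import math

x = 0.7
analytic = 1 - math.tanh(x) ** 2  # local derivative used in tanh backward

h = 1e-6                          # small step for the numeric estimate
numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)

print(abs(analytic - numeric) < 1e-8)  # True
```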