title

Lecture 11 - Introduction to Neural Networks | Stanford CS229: Machine Learning (Autumn 2018)

description

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/3nqNTNo
Kian Katanforoosh
Lecturer, Computer Science
To follow along with the course schedule and syllabus, visit:
http://cs229.stanford.edu/syllabus-autumn2018.html

detail

{'title': 'Lecture 11 - Introduction to Neural Networks | Stanford CS229: Machine Learning (Autumn 2018)', 'heatmap': [{'end': 1988.136, 'start': 1926.24, 'weight': 0.782}, {'end': 2119.647, 'start': 2062.072, 'weight': 0.735}, {'end': 2219.984, 'start': 2175.478, 'weight': 0.862}, {'end': 2474.175, 'start': 2398.132, 'weight': 0.892}, {'end': 2704.567, 'start': 2556.857, 'weight': 0.824}, {'end': 3226.532, 'start': 3172.611, 'weight': 0.967}, {'end': 3387.779, 'start': 3269.264, 'weight': 0.826}, {'end': 4245.409, 'start': 4139.318, 'weight': 0.965}], 'summary': 'The lecture series on introduction to neural networks covers deep learning, logistic regression, image classification, neural network basics, disease detection, softmax multi-class network, neural network architecture, fully connected networks, vectorizing and parallelizing operations, and backpropagation, providing insights into their applications and operations with specific examples and quantifiable data.', 'chapters': [{'end': 281.201, 'segs': [{'end': 78.381, 'src': 'embed', 'start': 4.537, 'weight': 0, 'content': [{'end': 7.681, 'text': 'Hello everyone, welcome to CS229.', 'start': 4.537, 'duration': 3.144}, {'end': 13.649, 'text': "Today we're going to talk about deep learning and neural networks.", 'start': 8.823, 'duration': 4.826}, {'end': 20.939, 'text': "We're going to have two lectures on that, one today and a little bit more of it on Monday.", 'start': 15.292, 'duration': 5.647}, {'end': 24.623, 'text': "Don't hesitate to ask questions during the lecture.", 'start': 22.221, 'duration': 2.402}, {'end': 30.408, 'text': "So stop me if you don't understand something and we'll try to build the intuition around neural network together.", 'start': 25.324, 'duration': 5.084}, {'end': 35.753, 'text': 'We will actually start with an algorithm that you guys have seen previously called logistic regression.', 'start': 30.969, 'duration': 4.784}, {'end': 41.995, 'text': "Everybody remembers logistic regression? Okay, remember it's a classification algorithm.", 'start': 36.213, 'duration': 5.782}, {'end': 50.397, 'text': "We're going to do that explain how logistic regression can be interpreted as a neural network, specific case of a neural network,", 'start': 42.515, 'duration': 7.882}, {'end': 52.598, 'text': 'and then we will go to neural networks.', 'start': 50.397, 'duration': 2.201}, {'end': 56.239, 'text': 'Sounds good? So the quick intro on deep learning.', 'start': 53.058, 'duration': 3.181}, {'end': 69.816, 'text': "So deep learning is a set of techniques that is, let's say, a subset of machine learning.", 'start': 63.331, 'duration': 6.485}, {'end': 76.24, 'text': "And it's one of the growing techniques that have been used in the industry specifically for problems in computer vision,", 'start': 69.836, 'duration': 6.404}, {'end': 78.381, 'text': 'natural language processing and speech recognition.', 'start': 76.24, 'duration': 2.141}], 'summary': 'Cs229 lecture on deep learning, including logistic regression and neural networks, for computer vision, nlp, and speech recognition.', 'duration': 73.844, 'max_score': 4.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys4537.jpg'}, {'end': 144.93, 'src': 'embed', 'start': 94.246, 'weight': 1, 'content': [{'end': 116.841, 'text': "So one thing we're going to see today is that deep learning is really really computationally expensive and people had to find techniques in order to parallelize the code and use GPUs specifically in order to graphical processing units in order to be able to compute the computations in deep learning.", 'start': 94.246, 'duration': 22.595}, {'end': 122.365, 'text': 'The second part is the data available.', 'start': 117.902, 'duration': 4.463}, {'end': 129.276, 'text': 'has been growing after the internet bubble, the digitalization of the word.', 'start': 123.732, 'duration': 5.544}, {'end': 132.179, 'text': 'so now people have access to large amounts of data,', 'start': 129.276, 'duration': 2.903}, {'end': 137.284, 'text': "and this type of algorithm has the specificity of being able to learn a lot when there's a lot of data.", 'start': 132.179, 'duration': 5.105}, {'end': 144.93, 'text': 'So these models are very flexible, and the more you give them data, the more they will be able to understand the salient feature of the data.', 'start': 137.884, 'duration': 7.046}], 'summary': 'Deep learning is computationally expensive, requires parallelization and gpus, benefits from large data availability for learning flexibility.', 'duration': 50.684, 'max_score': 94.246, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys94246.jpg'}, {'end': 244.17, 'src': 'embed', 'start': 217.483, 'weight': 4, 'content': [{'end': 222.366, 'text': "Let's say for now we're constrained to the fact that there is maximum one cat per image, there's no more.", 'start': 217.483, 'duration': 4.883}, {'end': 226.788, 'text': "If you had to draw the logistic regression model, that's what you would do.", 'start': 223.947, 'duration': 2.841}, {'end': 233.162, 'text': 'You would take a cat, so this is an image of a cat, Very bad at that.', 'start': 227.709, 'duration': 5.453}, {'end': 244.17, 'text': 'Sorry In computer science, you know that images can be represented as 3D matrices.', 'start': 237.505, 'duration': 6.665}], 'summary': 'Logistic regression model constrained to one cat per image, represented as 3d matrices.', 'duration': 26.687, 'max_score': 217.483, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys217483.jpg'}], 'start': 4.537, 'title': 'Deep learning and logistic regression', 'summary': "Covers an introduction to deep learning and neural networks, focusing on logistic regression as a specific case, and highlights deep learning's application in computer vision, natural language processing, and speech recognition. it also discusses the factors contributing to the success of deep learning and the application of logistic regression for binary image classification.", 'chapters': [{'end': 78.381, 'start': 4.537, 'title': 'Cs229 - deep learning and neural networks', 'summary': 'Covers an introduction to deep learning and neural networks, with a focus on logistic regression as a specific case of a neural network, and highlights that deep learning is a subset of machine learning used in industries like computer vision, natural language processing, and speech recognition.', 'duration': 73.844, 'highlights': ['Deep learning is a subset of machine learning and is used in computer vision, natural language processing, and speech recognition', 'Logistic regression can be interpreted as a specific case of a neural network', 'The lecture will cover two sessions on deep learning and neural networks']}, {'end': 281.201, 'start': 78.401, 'title': 'Deep learning and logistic regression', 'summary': 'Discusses the factors contributing to the success of deep learning, including computational methods, data availability, and algorithm techniques, before delving into the application of logistic regression for binary classification in identifying images with cats, alongside an explanation of image representation.', 'duration': 202.8, 'highlights': ['The success of deep learning is attributed to new computational methods, the availability of large amounts of data, and the development of new algorithm techniques.', 'Deep learning is computationally expensive and utilizes parallelization techniques and GPUs for efficient computation.', 'The specificity of deep learning algorithms allows for increased learning with larger datasets, making the models more flexible and capable of understanding salient features.', 'The application of logistic regression for binary classification is demonstrated using the example of identifying images with cats, with an explanation of image representation as 3D matrices and the RGB channels.']}], 'duration': 276.664, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys4537.jpg', 'highlights': ['Deep learning is a subset of machine learning and is used in computer vision, natural language processing, and speech recognition', 'The success of deep learning is attributed to new computational methods, the availability of large amounts of data, and the development of new algorithm techniques', 'The specificity of deep learning algorithms allows for increased learning with larger datasets, making the models more flexible and capable of understanding salient features', 'Logistic regression can be interpreted as a specific case of a neural network', 'The application of logistic regression for binary classification is demonstrated using the example of identifying images with cats, with an explanation of image representation as 3D matrices and the RGB channels', 'Deep learning is computationally expensive and utilizes parallelization techniques and GPUs for efficient computation', 'The lecture will cover two sessions on deep learning and neural networks']}, {'end': 678.917, 'segs': [{'end': 389.171, 'src': 'embed', 'start': 281.902, 'weight': 0, 'content': [{'end': 284.083, 'text': 'Does that make sense?', 'start': 281.902, 'duration': 2.181}, {'end': 292.171, 'text': "So the first thing we will do in order to use logistic regression to find if there is a cat on this image, we're going to flatten this into a vector.", 'start': 285.609, 'duration': 6.562}, {'end': 299.834, 'text': "So I'm going to take all the numbers in this matrix and flatten them in a vector.", 'start': 296.213, 'duration': 3.621}, {'end': 303.556, 'text': 'Just an image to vector operation, nothing more.', 'start': 301.015, 'duration': 2.541}, {'end': 308.577, 'text': 'And now I can use my logistic regression because I have a vector input.', 'start': 305.056, 'duration': 3.521}, {'end': 324.594, 'text': "So I'm going to to take all of these and push them in an operation that we call the logistic operation, which has one part, that is wx plus b,", 'start': 309.478, 'duration': 15.116}, {'end': 327.535, 'text': 'where x is going to be the image.', 'start': 324.594, 'duration': 2.941}, {'end': 331.937, 'text': 'So wx plus b.', 'start': 330.136, 'duration': 1.801}, {'end': 334.078, 'text': 'And the second part is going to be the sigmoid.', 'start': 331.937, 'duration': 2.141}, {'end': 342.71, 'text': "Everybody's familiar with the sigmoid function? Function that takes a number between minus infinity and plus infinity, maps it between zero and one.", 'start': 336.246, 'duration': 6.464}, {'end': 344.911, 'text': "It's very convenient for classification problems.", 'start': 343.07, 'duration': 1.841}, {'end': 353.776, 'text': "And this we're going to call it y hat, which is sigmoid of what you've seen in class previously, I think it's theta transpose x.", 'start': 345.511, 'duration': 8.265}, {'end': 356.478, 'text': 'But here we will just separate the notation into w and b.', 'start': 353.776, 'duration': 2.702}, {'end': 389.171, 'text': "So can someone tell me what's the shape of W, the matrix W, vector matrix? Hm? What? Yeah, 64 by 64 by three as a, yeah.", 'start': 364.585, 'duration': 24.586}], 'summary': 'Using logistic regression to classify images by flattening into a vector and applying sigmoid function.', 'duration': 107.269, 'max_score': 281.902, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys281902.jpg'}, {'end': 463.68, 'src': 'embed', 'start': 432.853, 'weight': 4, 'content': [{'end': 436.754, 'text': "We're just changing the notations of the logistic regression that you guys have seen.", 'start': 432.853, 'duration': 3.901}, {'end': 440.716, 'text': 'And so once we have this model, we need to train it, as you know.', 'start': 437.194, 'duration': 3.522}, {'end': 451.332, 'text': 'And the process of training is that first we will initialize or parameters, these are what we call parameters.', 'start': 440.816, 'duration': 10.516}, {'end': 457.376, 'text': 'We will use the specific vocabulary of weights and bias.', 'start': 452.573, 'duration': 4.803}, {'end': 463.68, 'text': 'I believe you guys have heard this vocabulary before, weights and biases.', 'start': 459.377, 'duration': 4.303}], 'summary': 'Changing notations of logistic regression, training model with weights and biases.', 'duration': 30.827, 'max_score': 432.853, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys432853.jpg'}, {'end': 566.21, 'src': 'embed', 'start': 529.341, 'weight': 5, 'content': [{'end': 532.604, 'text': 'If you manage to minimize the loss function, you will find the right parameters.', 'start': 529.341, 'duration': 3.263}, {'end': 548.932, 'text': 'So you define a loss function that is the logistic loss, y log of y hat plus one minus y log of one minus y hat.', 'start': 534.266, 'duration': 14.666}, {'end': 552.893, 'text': 'You guys have seen this one.', 'start': 551.953, 'duration': 0.94}, {'end': 554.013, 'text': 'You remember where it comes from?', 'start': 553.013, 'duration': 1}, {'end': 560.616, 'text': 'It comes from a maximum likelihood estimation, starting from a probabilistic model.', 'start': 554.033, 'duration': 6.583}, {'end': 566.21, 'text': 'And so the idea is how can I minimize this function??', 'start': 563.168, 'duration': 3.042}], 'summary': 'Minimize logistic loss function to find right parameters', 'duration': 36.869, 'max_score': 529.341, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys529341.jpg'}, {'end': 651.171, 'src': 'embed', 'start': 619.027, 'weight': 3, 'content': [{'end': 620.247, 'text': 'and you recompute the results.', 'start': 619.027, 'duration': 1.22}, {'end': 625.929, 'text': "We're going to do some derivative later today, but just to set up the problem here.", 'start': 620.287, 'duration': 5.642}, {'end': 637.172, 'text': 'So the few things that I wanna touch on here is first how many parameters does this model have, this logistic regression, if you have to count them?', 'start': 626.749, 'duration': 10.423}, {'end': 648.45, 'text': 'So this is the number oh, 89, yeah, correct.', 'start': 646.57, 'duration': 1.88}, {'end': 651.171, 'text': 'So 12, 288 weights and one bias.', 'start': 648.99, 'duration': 2.181}], 'summary': 'Logistic regression model has 12,288 weights and one bias.', 'duration': 32.144, 'max_score': 619.027, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys619027.jpg'}], 'start': 281.902, 'title': 'Logistic regression for image classification and training', 'summary': 'Discusses using logistic regression to classify images by flattening them into a vector, applying the logistic operation, using the sigmoid function for classification, and covering the training process, including parameter initialization, optimization using gradient descent, and the use of a loss function to find the optimal parameters, with a matrix w having a shape of 1 by 12288, a total of 289 weights, and 1 bias for the model.', 'chapters': [{'end': 430.652, 'start': 281.902, 'title': 'Logistic regression for image classification', 'summary': 'Discusses using logistic regression to classify images, flattening the image into a vector, applying the logistic operation, and using the sigmoid function for classification, with the matrix w having a shape of 1 by 12288.', 'duration': 148.75, 'highlights': ['The matrix W has a shape of 1 by 12288, which is essential for the logistic operation and classification.', 'The chapter explains flattening the image into a vector as the first step in using logistic regression for image classification.', 'The sigmoid function is used for classification problems, mapping numbers between -∞ and +∞ to values between 0 and 1.']}, {'end': 678.917, 'start': 432.853, 'title': 'Training logistic regression model', 'summary': 'Covers the training process of logistic regression, including parameter initialization, optimization using gradient descent, and the use of a loss function to find the optimal parameters, with a total of 289 weights and 1 bias for the model.', 'duration': 246.064, 'highlights': ['The logistic regression model has a total of 289 weights and 1 bias.', 'The process of training involves initializing parameters, optimizing with gradient descent, and using a loss function to find the optimal parameters.', 'The loss function for logistic regression is defined as y log of y hat plus one minus y log of one minus y hat.']}], 'duration': 397.015, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys281902.jpg', 'highlights': ['The matrix W has a shape of 1 by 12288, essential for logistic operation.', 'Flattening the image into a vector is the first step in logistic regression.', 'The sigmoid function maps numbers between -∞ and +∞ to values between 0 and 1.', 'The logistic regression model has a total of 289 weights and 1 bias.', 'Training process involves initializing parameters, optimizing with gradient descent, and using a loss function.', 'The loss function for logistic regression is defined as y log of y hat plus one minus y log of one minus y hat.']}, {'end': 1432.05, 'segs': [{'end': 768.186, 'src': 'embed', 'start': 679.017, 'weight': 0, 'content': [{'end': 680.457, 'text': "We're going to use it a little further.", 'start': 679.017, 'duration': 1.44}, {'end': 683.318, 'text': "So we're starting with not too many parameters, actually.", 'start': 681.058, 'duration': 2.26}, {'end': 688.52, 'text': 'And one thing that we notice is that the number of parameters of a model depends on the size of the input.', 'start': 683.998, 'duration': 4.522}, {'end': 691.001, 'text': "We probably don't want that at some point.", 'start': 689.54, 'duration': 1.461}, {'end': 693.381, 'text': "So we're going to change it later on.", 'start': 691.241, 'duration': 2.14}, {'end': 703.724, 'text': 'So two equations that I want you to remember is the first one is neuron equals linear plus activation.', 'start': 694.642, 'duration': 9.082}, {'end': 705.065, 'text': 'So this is.', 'start': 704.625, 'duration': 0.44}, {'end': 707.815, 'text': 'the vocabulary we will use in neural networks.', 'start': 705.854, 'duration': 1.961}, {'end': 715.238, 'text': "We define a neuron as an operation that has two parts, one linear part and one activation part, and it's exactly that.", 'start': 708.455, 'duration': 6.783}, {'end': 716.459, 'text': 'This is actually a neuron.', 'start': 715.618, 'duration': 0.841}, {'end': 727.223, 'text': 'We have a linear part, wx plus b, and then we take the output of this linear part and we put it in an activation.', 'start': 719.78, 'duration': 7.443}, {'end': 728.964, 'text': 'that in this case is the sigmoid function.', 'start': 727.223, 'duration': 1.741}, {'end': 730.585, 'text': 'It can be other functions.', 'start': 729.564, 'duration': 1.021}, {'end': 734.947, 'text': 'So this is the first equation, not too hard.', 'start': 731.906, 'duration': 3.041}, {'end': 747.559, 'text': 'The second equation that I wanna set now is the model equals architecture plus parameters.', 'start': 736.55, 'duration': 11.009}, {'end': 755.845, 'text': "What does that mean? It means here we're trying to train a logistic regression.", 'start': 751.282, 'duration': 4.563}, {'end': 764.012, 'text': 'In order to be able to use it, we need an architecture, which is the following, a one-neuron neural network.', 'start': 756.266, 'duration': 7.746}, {'end': 768.186, 'text': 'and the parameters W and B.', 'start': 765.785, 'duration': 2.401}], 'summary': 'The transcript discusses the relationship between model parameters and input size, and defines equations for neurons and models in neural networks.', 'duration': 89.169, 'max_score': 679.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys679017.jpg'}, {'end': 846.832, 'src': 'embed', 'start': 787.937, 'weight': 2, 'content': [{'end': 795.261, 'text': 'today we will add neurons all over and the parameters will always be called W and B, but they will become bigger and bigger.', 'start': 787.937, 'duration': 7.324}, {'end': 798.354, 'text': 'because we have more data, we wanna be able to understand it.', 'start': 796.052, 'duration': 2.302}, {'end': 805.059, 'text': "You can get that it's going to be hard to understand what a cat is with only that many parameters.", 'start': 799.635, 'duration': 5.424}, {'end': 806.741, 'text': 'We want to have more parameters.', 'start': 805.64, 'duration': 1.101}, {'end': 814.347, 'text': 'Any questions so far? So this was just to set up the problem with logistic regression.', 'start': 810.084, 'duration': 4.263}, {'end': 822.053, 'text': "Let's try to set a new goal after the first goal we have set prior to that.", 'start': 816.529, 'duration': 5.524}, {'end': 835.847, 'text': 'So the second goal would be Find cat, lion, iguana in images.', 'start': 823.755, 'duration': 12.092}, {'end': 838.988, 'text': 'So a little different than before.', 'start': 837.728, 'duration': 1.26}, {'end': 843.631, 'text': 'Only thing we changed is that we wanna now to detect three types of animals.', 'start': 839.589, 'duration': 4.042}, {'end': 846.832, 'text': "If there's a cat on the image, I wanna know there's a cat.", 'start': 844.771, 'duration': 2.061}], 'summary': 'Increasing parameters for better understanding. new goal: detecting three types of animals in images.', 'duration': 58.895, 'max_score': 787.937, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys787937.jpg'}, {'end': 1236.811, 'src': 'embed', 'start': 1207.318, 'weight': 4, 'content': [{'end': 1208.098, 'text': "That's a good question.", 'start': 1207.318, 'duration': 0.78}, {'end': 1209.379, 'text': "We're going to talk about that now.", 'start': 1208.278, 'duration': 1.101}, {'end': 1212.16, 'text': 'Multiple image contain different animals or not.', 'start': 1210.099, 'duration': 2.061}, {'end': 1218.583, 'text': 'So, going back on what you said, because we decided to label our dataset like that after training,', 'start': 1212.54, 'duration': 6.043}, {'end': 1221.284, 'text': 'this neuron is naturally going to be there to detect cats.', 'start': 1218.583, 'duration': 2.701}, {'end': 1229.988, 'text': 'If we had changed the labeling scheme and I said that the second entry would correspond to the cat, the presence of the cat then, after training,', 'start': 1222.084, 'duration': 7.904}, {'end': 1232.929, 'text': 'you will detect that this neuron is responsible for detecting the cat.', 'start': 1229.988, 'duration': 2.941}, {'end': 1236.811, 'text': 'So the network is going to evolve depending on the way you label your dataset.', 'start': 1233.57, 'duration': 3.241}], 'summary': "Discussing dataset labeling's impact on neural network evolution.", 'duration': 29.493, 'max_score': 1207.318, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys1207318.jpg'}], 'start': 679.017, 'title': 'Neural network basics and image recognition', 'summary': 'Covers neural network basics, including the relationship between parameters and input size, neuron composition, and model architecture. additionally, it delves into modifying neural networks for image recognition, detecting multiple animals, and the influence of labeling schemes on network robustness, providing insights on parameter numbers and data requirements.', 'chapters': [{'end': 787.937, 'start': 679.017, 'title': 'Neural network basics', 'summary': 'Discusses the basics of neural networks, including the relationship between the number of parameters and input size, the composition of a neuron, and the concept of a model in terms of architecture and parameters for logistic regression.', 'duration': 108.92, 'highlights': ["We define a neuron as an operation that has two parts, one linear part and one activation part, and it's exactly that. This is actually a neuron.", 'The number of parameters of a model depends on the size of the input.', "We're trying to train a logistic regression. In order to be able to use it, we need an architecture, which is the following, a one-neuron neural network, and the parameters W and B."]}, {'end': 1432.05, 'start': 787.937, 'title': 'Neural network parameters and image recognition', 'summary': 'Discusses setting new goals for image recognition, modifying a neural network to detect multiple animals in images, and the impact of labeling scheme on network robustness, with insights into the number of parameters and data requirements.', 'duration': 644.113, 'highlights': ['The chapter discusses setting new goals for image recognition, modifying a neural network to detect multiple animals in images, and the impact of labeling scheme on network robustness.', 'Insights into the number of parameters and data requirements for the modified neural network.', "Discussion on the impact of labeling scheme on the network's ability to detect different animals in the same picture."]}], 'duration': 753.033, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys679017.jpg', 'highlights': ['The number of parameters of a model depends on the size of the input.', "We define a neuron as an operation that has two parts, one linear part and one activation part, and it's exactly that. This is actually a neuron.", 'Insights into the number of parameters and data requirements for the modified neural network.', 'The chapter discusses setting new goals for image recognition, modifying a neural network to detect multiple animals in images, and the impact of labeling scheme on network robustness.', "Discussion on the impact of labeling scheme on the network's ability to detect different animals in the same picture.", "We're trying to train a logistic regression. In order to be able to use it, we need an architecture, which is the following, a one-neuron neural network, and the parameters W and B."]}, {'end': 1778.501, 'segs': [{'end': 1476.835, 'src': 'embed', 'start': 1432.47, 'weight': 0, 'content': [{'end': 1441.332, 'text': 'Is there cases where we have a constraint where there is only one possible outcome? It means there is no cat and lion.', 'start': 1432.47, 'duration': 8.862}, {'end': 1442.812, 'text': "There's either a cat or a lion.", 'start': 1441.352, 'duration': 1.46}, {'end': 1444.673, 'text': 'There is no iguana and lion.', 'start': 1443.172, 'duration': 1.501}, {'end': 1446.213, 'text': "There's either an iguana or a lion.", 'start': 1444.693, 'duration': 1.52}, {'end': 1448.85, 'text': 'Think about healthcare.', 'start': 1447.708, 'duration': 1.142}, {'end': 1459.806, 'text': 'There are many models that are made to detect if a skin disease is present based on cell microscopic images.', 'start': 1448.97, 'duration': 10.836}, {'end': 1463.149, 'text': 'Usually, there is no overlap between diseases.', 'start': 1460.768, 'duration': 2.381}, {'end': 1466.931, 'text': 'It means you want to classify a specific disease among a large number of diseases.', 'start': 1463.189, 'duration': 3.742}, {'end': 1472.233, 'text': "So this model would still work but would not be optimal because it's longer to train.", 'start': 1467.531, 'duration': 4.702}, {'end': 1476.835, 'text': 'Maybe one disease is super, super rare and one of the neurons is never going to be trained.', 'start': 1472.573, 'duration': 4.262}], 'summary': 'In healthcare, models detect rare diseases with no overlap between them, requiring longer training.', 'duration': 44.365, 'max_score': 1432.47, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys1432470.jpg'}, {'end': 1583.598, 'src': 'embed', 'start': 1494.578, 'weight': 3, 'content': [{'end': 1500.179, 'text': 'And let the model learn with all the neurons learned together by creating interaction between them.', 'start': 1494.578, 'duration': 5.601}, {'end': 1506.301, 'text': 'Have you guys heard of Softmax? Yes, some of you, I see that.', 'start': 1501.54, 'duration': 4.761}, {'end': 1510.382, 'text': "Okay, so let's look at Softmax a little bit together.", 'start': 1507.461, 'duration': 2.921}, {'end': 1511.782, 'text': 'So we set a new goal now.', 'start': 1510.542, 'duration': 1.24}, {'end': 1526.818, 'text': 'which is we add a constraint which is unique animal on an image.', 'start': 1516.209, 'duration': 10.609}, {'end': 1533.544, 'text': 'So at most one animal on an image.', 'start': 1531.402, 'duration': 2.142}, {'end': 1537.908, 'text': "So I'm going to modify the network a little bit.", 'start': 1536.026, 'duration': 1.882}, {'end': 1541.851, 'text': 'We have our cat and there is no lion on the image.', 'start': 1539.649, 'duration': 2.202}, {'end': 1543.653, 'text': 'We flatten it.', 'start': 1542.932, 'duration': 0.721}, {'end': 1549.682, 'text': "And now I'm going to use the same scheme with the three neurons.", 'start': 1546.141, 'duration': 3.541}, {'end': 1555.984, 'text': 'A1, A2, A3.', 'start': 1551.162, 'duration': 4.822}, {'end': 1570.107, 'text': "But as an output, what I'm going to use is a softmax function.", 'start': 1563.386, 'duration': 6.721}, {'end': 1574.334, 'text': 'So, Let me be more precise.', 'start': 1570.147, 'duration': 4.187}, {'end': 1578.056, 'text': 'Let me actually introduce another notation to make it easier.', 'start': 1574.715, 'duration': 3.341}, {'end': 1583.598, 'text': 'As you know, the neuron is a linear part plus an activation.', 'start': 1579.477, 'duration': 4.121}], 'summary': 'Neural network modified to ensure at most one animal in an image using softmax function.', 'duration': 89.02, 'max_score': 1494.578, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys1494578.jpg'}, {'end': 1696.786, 'src': 'embed', 'start': 1618.737, 'weight': 2, 'content': [{'end': 1620.798, 'text': "And I'm going to use the specific formula.", 'start': 1618.737, 'duration': 2.061}, {'end': 1650.026, 'text': 'So this, if you recall, is exactly the softmax formula.', 'start': 1644.864, 'duration': 5.162}, {'end': 1674.632, 'text': 'Yeah, Okay, so now The network we have.', 'start': 1669.315, 'duration': 5.317}, {'end': 1675.853, 'text': "can you guys see where it's too small?", 'start': 1674.632, 'duration': 1.221}, {'end': 1677.294, 'text': 'Too small?', 'start': 1676.873, 'duration': 0.421}, {'end': 1683.077, 'text': "Okay, I'm going to just write this formula bigger, and then you can figure out the others.", 'start': 1677.974, 'duration': 5.103}, {'end': 1696.786, 'text': 'I guess, because e of z, three, one divided by sum from k, equals one to three of e, exponential of z, k one.', 'start': 1683.077, 'duration': 13.709}], 'summary': 'Discussion about using the softmax formula for network computation.', 'duration': 78.049, 'max_score': 1618.737, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys1618737.jpg'}], 'start': 1432.47, 'title': 'Disease detection and neural network modification', 'summary': 'Discusses the challenge of single outcome constraint in disease detection models, emphasizing the impact on rare diseases. it also explores the modification of neural networks with the introduction of softmax function and its impact on creating a probability distribution over all classes.', 'chapters': [{'end': 1493.518, 'start': 1432.47, 'title': 'Single outcome constraint in disease detection', 'summary': 'Discusses the constraint of having only one possible outcome, exemplified by the challenge of classifying specific diseases among a large number of diseases in healthcare models based on cell microscopic images, which may lead to suboptimal training for rare diseases.', 'duration': 61.048, 'highlights': ['The challenge of classifying specific diseases among a large number of diseases in healthcare models based on cell microscopic images, where there is no overlap between diseases, leading to suboptimal training for rare diseases.', 'The constraint of having only one possible outcome, exemplified by the scenario of working in a zoo with only one iguana and thousands of lions and cats, making it difficult to train a model effectively.']}, {'end': 1614.696, 'start': 1494.578, 'title': 'Softmax and neuron modification', 'summary': 'Discusses the modification of a neural network to add a constraint of at most one animal per image and the introduction of the softmax function, emphasizing the introduction of z11 and z112 as notations for the linear parts of neurons.', 'duration': 120.118, 'highlights': ['The introduction of the constraint of at most one animal per image modifies the neural network.', 'The implementation of the softmax function is used as an output with three neurons (A1, A2, A3).', 'The introduction of Z11 and Z112 as notations for the linear parts of neurons simplifies the representation of the neural network.']}, {'end': 1778.501, 'start': 1618.737, 'title': 'Softmax formula and probability distribution', 'summary': 'Discusses the softmax formula, explaining its application in a neural network and its impact on creating a probability distribution over all classes, ensuring the sum of outputs to be one.', 'duration': 159.764, 'highlights': ['The chapter highlights the application of the softmax formula in a neural network, where the sum of outputs is crucially required to be one, creating a probability distribution over all classes.', 'It explains that the softmax formula enforces a constraint where the probabilities of all classes are interdependent, requiring the sum of outputs to be one, altering the probabilistic output for each class.']}], 'duration': 346.031, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys1432470.jpg', 'highlights': ['The challenge of classifying specific diseases among a large number of diseases in healthcare models based on cell microscopic images, leading to suboptimal training for rare diseases.', 'The constraint of having only one possible outcome, making it difficult to train a model effectively.', 'The chapter highlights the application of the softmax formula in a neural network, creating a probability distribution over all classes.', 'The introduction of the constraint of at most one animal per image modifies the neural network.', 'The implementation of the softmax function is used as an output with three neurons (A1, A2, A3).']}, {'end': 2433.319, 'segs': [{'end': 1862.317, 'src': 'embed', 'start': 1828.27, 'weight': 0, 'content': [{'end': 1837.113, 'text': "We still have three neurons and although I didn't write it, this z one is equal to w one one x plus b one, z two same, z three same.", 'start': 1828.27, 'duration': 8.843}, {'end': 1840.694, 'text': "So there's three n plus three parameters.", 'start': 1838.733, 'duration': 1.961}, {'end': 1858.436, 'text': "So one question that we didn't talk about is how do we train these parameters, These parameters, the three and plus three parameters?", 'start': 1844.695, 'duration': 13.741}, {'end': 1859.216, 'text': 'how do we train them?', 'start': 1858.436, 'duration': 0.78}, {'end': 1862.317, 'text': 'You think this scheme will work or no?', 'start': 1859.756, 'duration': 2.561}], 'summary': 'Three neurons with 3n+3 parameters need training.', 'duration': 34.047, 'max_score': 1828.27, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys1828270.jpg'}, {'end': 1988.136, 'src': 'heatmap', 'start': 1901.831, 'weight': 3, 'content': [{'end': 1907.474, 'text': "What I'm going to do is I'm going to just sum it up for the three neurons.", 'start': 1901.831, 'duration': 5.643}, {'end': 1933.584, 'text': "Does this make sense? So I'm just doing three times this loss for each of the neurons.", 'start': 1926.24, 'duration': 7.344}, {'end': 1936.225, 'text': 'So we have exactly three times this.', 'start': 1934.464, 'duration': 1.761}, {'end': 1938.586, 'text': 'We sum them together.', 'start': 1937.706, 'duration': 0.88}, {'end': 1945.049, 'text': 'And if you train this loss function, you should be able to train the three neurons that you have.', 'start': 1940.007, 'duration': 5.042}, {'end': 1948.891, 'text': 'And again, talking about scarcity of one of the classes.', 'start': 1945.97, 'duration': 2.921}, {'end': 1960.123, 'text': 'If there is not many iguana, then the third term of this sum, is not going to help this neuron train towards detecting an iguana.', 'start': 1949.271, 'duration': 10.852}, {'end': 1962.827, 'text': "It's going to push it to detect no iguana.", 'start': 1960.644, 'duration': 2.183}, {'end': 1970.739, 'text': 'Any question on the loss function? Does this one make sense? Yeah.', 'start': 1967.013, 'duration': 3.726}, {'end': 1988.136, 'text': "Yeah, usually that's what will happen is that the output of this network once it's trained is going to be a probability distribution.", 'start': 1981.914, 'duration': 6.222}], 'summary': 'Training three neurons using loss function to detect classes, affected by scarcity, leading to a probability distribution output.', 'duration': 47.06, 'max_score': 1901.831, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys1901831.jpg'}, {'end': 2072.239, 'src': 'embed', 'start': 2030.758, 'weight': 2, 'content': [{'end': 2035.341, 'text': 'So you will never be able to match the output to the input, to the label.', 'start': 2030.758, 'duration': 4.583}, {'end': 2036.441, 'text': 'Does it make sense?', 'start': 2035.361, 'duration': 1.08}, {'end': 2044.146, 'text': "So what the network is probably going to do is it's probably going to send this one to one half, this one to one half and this one to zero, probably,", 'start': 2037.042, 'duration': 7.104}, {'end': 2044.946, 'text': 'which is not what you want.', 'start': 2044.146, 'duration': 0.8}, {'end': 2052.411, 'text': "Okay, let's talk about the loss function for this softmax regression.", 'start': 2048.108, 'duration': 4.303}, {'end': 2072.239, 'text': "Cause you know what's interesting about this loss is if I take this derivative, derivative of the loss 3N with respect to W21.", 'start': 2062.072, 'duration': 10.167}], 'summary': 'Neural network may incorrectly assign outputs, discussing loss function.', 'duration': 41.481, 'max_score': 2030.758, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2030758.jpg'}, {'end': 2119.647, 'src': 'heatmap', 'start': 2062.072, 'weight': 0.735, 'content': [{'end': 2072.239, 'text': "Cause you know what's interesting about this loss is if I take this derivative, derivative of the loss 3N with respect to W21.", 'start': 2062.072, 'duration': 10.167}, {'end': 2082.578, 'text': "Do you think it's going to be harder than this derivative than this one or no? It's going to be exactly the same.", 'start': 2072.399, 'duration': 10.179}, {'end': 2085.938, 'text': 'Because only one of these three terms depends on w one two.', 'start': 2082.998, 'duration': 2.94}, {'end': 2088.639, 'text': 'It means the derivative of the two others are zero.', 'start': 2086.38, 'duration': 2.259}, {'end': 2092.161, 'text': "So we're exactly at the same complexity during the derivation.", 'start': 2089.241, 'duration': 2.92}, {'end': 2101.825, 'text': "But this one, you think if you try to compute, let's say we define a loss function that corresponds roughly to that.", 'start': 2093.542, 'duration': 8.283}, {'end': 2107.387, 'text': 'If you try to compute the derivative of the loss with respect to w two, it will become much more complex.', 'start': 2101.905, 'duration': 5.482}, {'end': 2117.345, 'text': 'Because this number the output here that is going to impact the loss function directly not only depends on the parameters of w2,', 'start': 2108.618, 'duration': 8.727}, {'end': 2119.647, 'text': 'it also depends on the parameters of w1 and w3..', 'start': 2117.345, 'duration': 2.302}], 'summary': 'Derivatives of the loss function show equal complexity for w21, but much more complexity for w2.', 'duration': 57.575, 'max_score': 2062.072, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2062072.jpg'}, {'end': 2117.345, 'src': 'embed', 'start': 2089.241, 'weight': 4, 'content': [{'end': 2092.161, 'text': "So we're exactly at the same complexity during the derivation.", 'start': 2089.241, 'duration': 2.92}, {'end': 2101.825, 'text': "But this one, you think if you try to compute, let's say we define a loss function that corresponds roughly to that.", 'start': 2093.542, 'duration': 8.283}, {'end': 2107.387, 'text': 'If you try to compute the derivative of the loss with respect to w two, it will become much more complex.', 'start': 2101.905, 'duration': 5.482}, {'end': 2117.345, 'text': 'Because this number the output here that is going to impact the loss function directly not only depends on the parameters of w2,', 'start': 2108.618, 'duration': 8.727}], 'summary': 'Challenges in computing derivative of loss function with respect to w2.', 'duration': 28.104, 'max_score': 2089.241, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2089241.jpg'}, {'end': 2219.984, 'src': 'heatmap', 'start': 2134.77, 'weight': 1, 'content': [{'end': 2137.852, 'text': "So the loss function we'll define is a very common one in deep learning.", 'start': 2134.77, 'duration': 3.082}, {'end': 2140.593, 'text': "It's called the softmax cross entropy.", 'start': 2138.612, 'duration': 1.981}, {'end': 2145.515, 'text': 'Cross entropy loss.', 'start': 2142.614, 'duration': 2.901}, {'end': 2152.879, 'text': "I'm not going into the details of where it comes from, but you can get the intuition.", 'start': 2148.577, 'duration': 4.302}, {'end': 2158.042, 'text': 'Yk log.', 'start': 2155.881, 'duration': 2.161}, {'end': 2181.921, 'text': 'So it surprisingly looks like the binary logistic loss function.', 'start': 2175.478, 'duration': 6.443}, {'end': 2191.085, 'text': 'The only difference is that we will sum it up on all the classes.', 'start': 2183.762, 'duration': 7.323}, {'end': 2202.009, 'text': "Now we will take a derivative of something that looks like that later, but I'd say, if you can try it at home on this one,", 'start': 2194.266, 'duration': 7.743}, {'end': 2203.21, 'text': 'it would be a good exercise as well.', 'start': 2202.009, 'duration': 1.201}, {'end': 2211.616, 'text': 'So this binary cross-entropy loss is very likely to be used in classification problems that are multi-class.', 'start': 2204.67, 'duration': 6.946}, {'end': 2219.984, 'text': 'Okay, so this was the first part on logistic regression types of networks.', 'start': 2215.94, 'duration': 4.044}], 'summary': 'Defines softmax cross entropy loss function for multi-class classification problems.', 'duration': 56.315, 'max_score': 2134.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2134770.jpg'}, {'end': 2390.947, 'src': 'embed', 'start': 2360.184, 'weight': 5, 'content': [{'end': 2367.067, 'text': 'We could just use a linear function, right, for the sigmoid, but this becomes a linear regression.', 'start': 2360.184, 'duration': 6.883}, {'end': 2369.448, 'text': 'The whole network becomes a linear regression.', 'start': 2367.927, 'duration': 1.521}, {'end': 2373.49, 'text': 'Another one that is very common in deep learning is called the ReLU function.', 'start': 2370.148, 'duration': 3.342}, {'end': 2379.472, 'text': "It's a function that is almost linear, but for every input that is negative, it's equal to zero.", 'start': 2374.29, 'duration': 5.182}, {'end': 2383.014, 'text': 'Because we cannot have negative h, it makes sense to use this one.', 'start': 2380.233, 'duration': 2.781}, {'end': 2390.947, 'text': 'Okay, so this is called rectified linear unit, ReLU.', 'start': 2385.123, 'duration': 5.824}], 'summary': 'Neural networks use linear and relu functions for regression, common in deep learning.', 'duration': 30.763, 'max_score': 2360.184, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2360184.jpg'}], 'start': 1779.462, 'title': 'Softmax multi-class network and neural network loss functions', 'summary': 'Covers the softmax multi-class network, training parameters, modifying the loss function, and the impact of labeling scheme, as well as the complexity of derivatives in neural network loss functions, softmax cross entropy loss, and adaptation of network architecture and loss functions for regression tasks.', 'chapters': [{'end': 2052.411, 'start': 1779.462, 'title': 'Softmax multi-class network', 'summary': "Discusses the softmax multi-class network, training parameters, modifying the loss function, and the impact of labeling scheme on the network's performance.", 'duration': 272.949, 'highlights': ['The network has three neurons and a total of 3n+3 parameters, which are trained using a modified loss function.', 'The loss function is modified to sum up the loss for each of the three neurons, enabling the training of the network.', 'The impact of labeling scheme is explained, highlighting the potential mismatch between the output and the label if the labeling scheme is not compatible with the network structure.']}, {'end': 2433.319, 'start': 2062.072, 'title': 'Neural network loss functions', 'summary': 'Explains the complexity of derivatives in neural network loss functions, introduces the softmax cross entropy loss for multi-class classification, and discusses the adaptation of network architecture and loss functions for regression tasks.', 'duration': 371.247, 'highlights': ['The softmax cross entropy loss function is commonly used in deep learning for multi-class classification problems.', 'The complexity of computing derivatives varies based on the parameters involved, impacting the choice of loss function and derivative computation.', 'Adapting network architecture and loss functions for regression involves modifying activation functions and loss functions to suit the regression task.']}], 'duration': 653.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys1779462.jpg', 'highlights': ['The network has three neurons and a total of 3n+3 parameters, trained using a modified loss function.', 'The softmax cross entropy loss function is commonly used in deep learning for multi-class classification problems.', 'The impact of labeling scheme is explained, highlighting the potential mismatch between the output and the label.', 'The loss function is modified to sum up the loss for each of the three neurons, enabling the training of the network.', 'The complexity of computing derivatives varies based on the parameters involved, impacting the choice of loss function and derivative computation.', 'Adapting network architecture and loss functions for regression involves modifying activation functions and loss functions.']}, {'end': 2913.426, 'segs': [{'end': 2704.567, 'src': 'heatmap', 'start': 2433.319, 'weight': 0, 'content': [{'end': 2439.441, 'text': 'the shape of this loss is much easier to optimize for a regression task than it is for a classification task and vice versa.', 'start': 2433.319, 'duration': 6.122}, {'end': 2442.966, 'text': "Not going to go into the details of that, but that's the intuition.", 'start': 2440.458, 'duration': 2.508}, {'end': 2447.56, 'text': "Okay, let's go have fun with neural networks.", 'start': 2444.912, 'duration': 2.648}, {'end': 2474.175, 'text': 'So we stick to our first goal.', 'start': 2471.894, 'duration': 2.281}, {'end': 2483.199, 'text': 'Given an image, tell us if there is cat or no cat.', 'start': 2477.536, 'duration': 5.663}, {'end': 2487.16, 'text': 'This is one, this is zero.', 'start': 2485.64, 'duration': 1.52}, {'end': 2490.321, 'text': "But now we're going to make a network a little more complex.", 'start': 2487.94, 'duration': 2.381}, {'end': 2491.922, 'text': "We're going to add some parameters.", 'start': 2490.341, 'duration': 1.581}, {'end': 2494.403, 'text': 'So I get my picture of the cat.', 'start': 2493.102, 'duration': 1.301}, {'end': 2499.265, 'text': 'Cat is moving.', 'start': 2497.624, 'duration': 1.641}, {'end': 2508.715, 'text': "Okay, and what I'm going to do is that I'm going to put more neurons than before.", 'start': 2504.912, 'duration': 3.803}, {'end': 2512.237, 'text': 'Maybe something like that.', 'start': 2511.336, 'duration': 0.901}, {'end': 2564.861, 'text': 'So using the same notation, you see that my square bracket here is two, indicating that there is a layer here, which is the second layer.', 'start': 2556.857, 'duration': 8.004}, {'end': 2573.625, 'text': 'While this one is the first layer and this one is the third layer.', 'start': 2571.424, 'duration': 2.201}, {'end': 2580.969, 'text': "Everybody's up to speed with the notations? Cool.", 'start': 2577.187, 'duration': 3.782}, {'end': 2587.002, 'text': 'So now notice that when you make a choice of architecture,', 'start': 2583.018, 'duration': 3.984}, {'end': 2599.993, 'text': 'you have to be careful of one thing is that the output layer has to have the same number of neurons as you want the number of classes to be for a classification and one for a regression.', 'start': 2587.002, 'duration': 12.991}, {'end': 2624.106, 'text': "So how many parameters does this network have? Can someone quickly give me the thought process? So how much here? Yeah, like 3n plus 3, let's say.", 'start': 2606.259, 'duration': 17.847}, {'end': 2640.63, 'text': 'Yeah, correct.', 'start': 2639.949, 'duration': 0.681}, {'end': 2645.113, 'text': 'So in here you would have three N weights plus three biases.', 'start': 2641.05, 'duration': 4.063}, {'end': 2652.498, 'text': 'Here you would have two times three weights plus two biases, because you have three neurons connected to two neurons.', 'start': 2646.113, 'duration': 6.385}, {'end': 2655.84, 'text': 'And here you will have two times one plus one bias.', 'start': 2653.138, 'duration': 2.702}, {'end': 2658.782, 'text': 'Make sense? So this is the total number of parameters.', 'start': 2657.041, 'duration': 1.741}, {'end': 2662.925, 'text': "So you see that we didn't add too much parameters.", 'start': 2661.104, 'duration': 1.821}, {'end': 2665.227, 'text': 'Most of the parameters are still in the input layer.', 'start': 2663.426, 'duration': 1.801}, {'end': 2670.511, 'text': "Let's define some vocabulary.", 'start': 2669.11, 'duration': 1.401}, {'end': 2672.773, 'text': 'The first word is layer.', 'start': 2671.572, 'duration': 1.201}, {'end': 2676.396, 'text': 'Layer denotes neurons that are not connected to each other.', 'start': 2673.514, 'duration': 2.882}, {'end': 2678.117, 'text': 'These two neurons are not connected to each other.', 'start': 2676.496, 'duration': 1.621}, {'end': 2679.898, 'text': 'These three neurons are not connected to each other.', 'start': 2678.297, 'duration': 1.601}, {'end': 2684.481, 'text': 'We call this cluster of neurons a layer, and this has three layers.', 'start': 2680.198, 'duration': 4.283}, {'end': 2693.568, 'text': 'We would use input layer to define the first layer, output layer to define the third layer, because it directly sees the output,', 'start': 2685.362, 'duration': 8.206}, {'end': 2696.27, 'text': 'and we would call the second layer a hidden layer.', 'start': 2693.568, 'duration': 2.702}, {'end': 2704.567, 'text': 'And the reason we call it hidden is because the input and the output are hidden from this layer.', 'start': 2699.963, 'duration': 4.604}], 'summary': 'Optimizing network for task, adding parameters, defining layers and parameters', 'duration': 166.674, 'max_score': 2433.319, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2433319.jpg'}, {'end': 2704.567, 'src': 'embed', 'start': 2657.041, 'weight': 3, 'content': [{'end': 2658.782, 'text': 'Make sense? So this is the total number of parameters.', 'start': 2657.041, 'duration': 1.741}, {'end': 2662.925, 'text': "So you see that we didn't add too much parameters.", 'start': 2661.104, 'duration': 1.821}, {'end': 2665.227, 'text': 'Most of the parameters are still in the input layer.', 'start': 2663.426, 'duration': 1.801}, {'end': 2670.511, 'text': "Let's define some vocabulary.", 'start': 2669.11, 'duration': 1.401}, {'end': 2672.773, 'text': 'The first word is layer.', 'start': 2671.572, 'duration': 1.201}, {'end': 2676.396, 'text': 'Layer denotes neurons that are not connected to each other.', 'start': 2673.514, 'duration': 2.882}, {'end': 2678.117, 'text': 'These two neurons are not connected to each other.', 'start': 2676.496, 'duration': 1.621}, {'end': 2679.898, 'text': 'These three neurons are not connected to each other.', 'start': 2678.297, 'duration': 1.601}, {'end': 2684.481, 'text': 'We call this cluster of neurons a layer, and this has three layers.', 'start': 2680.198, 'duration': 4.283}, {'end': 2693.568, 'text': 'We would use input layer to define the first layer, output layer to define the third layer, because it directly sees the output,', 'start': 2685.362, 'duration': 8.206}, {'end': 2696.27, 'text': 'and we would call the second layer a hidden layer.', 'start': 2693.568, 'duration': 2.702}, {'end': 2704.567, 'text': 'And the reason we call it hidden is because the input and the output are hidden from this layer.', 'start': 2699.963, 'duration': 4.604}], 'summary': 'Explanation of neural network layers and parameters.', 'duration': 47.526, 'max_score': 2657.041, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2657041.jpg'}, {'end': 2851.081, 'src': 'embed', 'start': 2779.16, 'weight': 4, 'content': [{'end': 2786.522, 'text': "And they're going to communicate what they understood to the output neuron that is going to construct the face of the cat based on what it received.", 'start': 2779.16, 'duration': 7.362}, {'end': 2788.917, 'text': 'and be able to tell if there is a cat or not.', 'start': 2787.375, 'duration': 1.542}, {'end': 2795.122, 'text': "So the reason it's called hidden layer is because we don't really know what it's going to figure out.", 'start': 2790.118, 'duration': 5.004}, {'end': 2799.466, 'text': 'But with enough data, it should understand very complex information about the data.', 'start': 2795.443, 'duration': 4.023}, {'end': 2803.851, 'text': 'The deeper you go, the more complex information the neurons are able to understand.', 'start': 2799.927, 'duration': 3.924}, {'end': 2808.915, 'text': 'Let me give you another example, which is a house prediction example.', 'start': 2805.472, 'duration': 3.443}, {'end': 2811.558, 'text': 'House price prediction.', 'start': 2810.597, 'duration': 0.961}, {'end': 2851.081, 'text': "So let's assume that our inputs are number of bedrooms, size of the house, zip code, and wealth of the neighborhood, let's say.", 'start': 2833.752, 'duration': 17.329}], 'summary': 'Neural network can process complex data, like cat face or house price, with enough data.', 'duration': 71.921, 'max_score': 2779.16, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2779160.jpg'}], 'start': 2433.319, 'title': 'Neural network architecture and layers', 'summary': 'Covers the optimization of loss for regression and classification, introduces neural network architecture for image classification, discusses network layers, their parameters, and the significance of input, hidden, and output layers, and explains the role of hidden layers in processing complex information, with examples of cat classification and house price prediction.', 'chapters': [{'end': 2655.84, 'start': 2433.319, 'title': 'Neural network architecture for image classification', 'summary': 'Discusses the optimization of loss for regression and classification tasks, introduces the concept of neural network architecture for image classification, and explains the calculation of parameters in the network.', 'duration': 222.521, 'highlights': ['The output layer must have the same number of neurons as the number of classes for classification and one for regression.', 'The addition of neurons and layers in the network increases the number of parameters, with the calculation of weights and biases for each layer.', 'The optimization of loss is easier for a regression task than for a classification task, and vice versa, due to the shape of the loss function.']}, {'end': 2729.032, 'start': 2657.041, 'title': 'Neural network layers and parameters', 'summary': 'Discusses the concept of neural network layers, with a focus on the distribution of parameters and the definitions of input, hidden, and output layers, highlighting the significance of their roles in the network architecture and abstraction of input and output data.', 'duration': 71.991, 'highlights': ['The total number of parameters is illustrated, showing that most parameters are in the input layer.', 'The concept of layers is defined, with emphasis on neurons not being connected to each other within a layer.', 'The definitions of input, hidden, and output layers are provided, emphasizing the roles and visibility of input and output data within the network architecture.']}, {'end': 2913.426, 'start': 2729.352, 'title': "Neural network's hidden layer", 'summary': "Explains the concept of a neural network's hidden layer, illustrating how it detects fundamental concepts in images and processes complex information, using examples of cat classification and house price prediction.", 'duration': 184.074, 'highlights': ['The deeper the hidden layer, the more complex information the neurons are able to understand, with enough data.', 'The hidden layer in a neural network processes information and communicates it to the output neuron to construct the image, enabling it to classify whether there is a cat or not.', 'The example of house price prediction demonstrates how a neural network can process features like zip code and wealth to predict school quality and walkability of the neighborhood.']}], 'duration': 480.107, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2433319.jpg', 'highlights': ['The addition of neurons and layers in the network increases the number of parameters, with the calculation of weights and biases for each layer.', 'The output layer must have the same number of neurons as the number of classes for classification and one for regression.', 'The optimization of loss is easier for a regression task than for a classification task, and vice versa, due to the shape of the loss function.', 'The total number of parameters is illustrated, showing that most parameters are in the input layer.', 'The deeper the hidden layer, the more complex information the neurons are able to understand, with enough data.', 'The hidden layer in a neural network processes information and communicates it to the output neuron to construct the image, enabling it to classify whether there is a cat or not.', 'The example of house price prediction demonstrates how a neural network can process features like zip code and wealth to predict school quality and walkability of the neighborhood.', 'The concept of layers is defined, with emphasis on neurons not being connected to each other within a layer.', 'The definitions of input, hidden, and output layers are provided, emphasizing the roles and visibility of input and output data within the network architecture.']}, {'end': 3269.264, 'segs': [{'end': 2963.967, 'src': 'embed', 'start': 2943.688, 'weight': 0, 'content': [{'end': 2954.518, 'text': 'It means that we connect every input of a layer, every input to the first layer, every output of the first layer to the input of the third layer,', 'start': 2943.688, 'duration': 10.83}, {'end': 2954.839, 'text': 'and so on.', 'start': 2954.518, 'duration': 0.321}, {'end': 2959.803, 'text': 'So all the neurons from one layer to another are connected with each other.', 'start': 2955.359, 'duration': 4.444}, {'end': 2963.967, 'text': "What we're saying is that we will let the network figure these out.", 'start': 2960.484, 'duration': 3.483}], 'summary': 'Neural network connects all layer inputs, outputs, letting the network figure things out.', 'duration': 20.279, 'max_score': 2943.688, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2943688.jpg'}, {'end': 3037.425, 'src': 'embed', 'start': 2988.001, 'weight': 1, 'content': [{'end': 2993.383, 'text': 'And oftentimes the network is going to be able better than humans to find these, what are the features that are represented.', 'start': 2988.001, 'duration': 5.382}, {'end': 2999.346, 'text': 'Sometimes you may hear neural networks referred as black box models.', 'start': 2994.224, 'duration': 5.122}, {'end': 3004.088, 'text': 'The reason is we will not understand what this edge would correspond to.', 'start': 3000.006, 'duration': 4.082}, {'end': 3011.592, 'text': "It's hard to figure out that this neuron is detecting a weighted average of the input features.", 'start': 3004.689, 'duration': 6.903}, {'end': 3020.51, 'text': 'Does it make sense? Another word you might hear is end-to-end learning.', 'start': 3013.083, 'duration': 7.427}, {'end': 3030.759, 'text': "The reason we talk about end-to-end learning is because we have an input, a ground truth, and we don't constrain the network in the middle.", 'start': 3021.871, 'duration': 8.888}, {'end': 3037.425, 'text': "We let it learn whatever it has to learn, and we call it end-to-end learning because we're just training based on the input and the output.", 'start': 3030.799, 'duration': 6.626}], 'summary': 'Neural networks can outperform humans in feature detection, often seen as black box models, and utilize end-to-end learning for unconstrained network training.', 'duration': 49.424, 'max_score': 2988.001, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2988001.jpg'}, {'end': 3226.532, 'src': 'heatmap', 'start': 3172.611, 'weight': 0.967, 'content': [{'end': 3181.419, 'text': 'So it means This A3 that I have here, sorry, this here for three neurons, I wrote three equations.', 'start': 3172.611, 'duration': 8.808}, {'end': 3189.026, 'text': 'Here for three neurons in the second layer, I just wrote a single equation to summarize it.', 'start': 3182.12, 'duration': 6.906}, {'end': 3192.549, 'text': 'But the shape of these things are going to be vectors.', 'start': 3190.407, 'duration': 2.142}, {'end': 3194.631, 'text': "So let's go over the shapes.", 'start': 3193.65, 'duration': 0.981}, {'end': 3195.611, 'text': "Let's try to define them.", 'start': 3194.691, 'duration': 0.92}, {'end': 3200.416, 'text': 'Z11 is going to be X, which is N by one.', 'start': 3196.552, 'duration': 3.864}, {'end': 3208.6, 'text': 'times W, which has to be three by N because it connects three neurons to the input.', 'start': 3201.735, 'duration': 6.865}, {'end': 3212.622, 'text': 'So this Z has to be three by one.', 'start': 3210.241, 'duration': 2.381}, {'end': 3218.666, 'text': 'And it makes sense because we have three neurons.', 'start': 3213.523, 'duration': 5.143}, {'end': 3220.308, 'text': "Now let's go deeper.", 'start': 3218.706, 'duration': 1.602}, {'end': 3225.111, 'text': "A1 is just the sigmoid of Z1, so it doesn't change the shape.", 'start': 3221.588, 'duration': 3.523}, {'end': 3226.532, 'text': 'It keeps the three by one.', 'start': 3225.411, 'duration': 1.121}], 'summary': 'Discussing neural network layers and their shapes, involving 3 neurons and matrix dimensions.', 'duration': 53.921, 'max_score': 3172.611, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3172611.jpg'}], 'start': 2913.466, 'title': 'Neural network fundamentals', 'summary': 'Covers understanding fully connected neural networks, emphasizing their ability to identify important features and the concept of end-to-end learning without constraints. it also explains the equations for forward propagating input through a neural network, including input, hidden, and output layers, and the use of matrices to summarize the equations, discussing the shapes of the linear parts and activations for each layer.', 'chapters': [{'end': 3037.425, 'start': 2913.466, 'title': 'Understanding fully connected neural networks', 'summary': "Explains the concept of fully connected neural networks, emphasizing the network's ability to figure out important features and the idea of end-to-end learning without constraining the network in the middle.", 'duration': 123.959, 'highlights': ['Fully connected layer connects every input of a layer to the first layer, every output of the first layer to the input of the third layer, allowing the network to figure out important features.', "End-to-end learning involves training the network based on input and output, without constraining it in the middle, promoting the network's ability to learn whatever it needs to.", "Neural networks are sometimes referred to as black box models due to the challenge of understanding the specific features and neurons' representations."]}, {'end': 3269.264, 'start': 3075.651, 'title': 'Neural network math: forward propagation', 'summary': 'Explains the equations for forward propagating input through a neural network, where the network has an input layer, a hidden layer, and an output layer, and demonstrates the use of matrices to summarize the equations. it also discusses the shapes of the linear parts and activations for each layer.', 'duration': 193.613, 'highlights': ['The neural network consists of an input layer, a hidden layer, and an output layer, with forward propagation equations detailed for each layer, demonstrating the use of matrices for summarization.', 'Equations for forward propagating input through the neural network are demonstrated, including the linear parts and activations for each layer.', "The shapes of the linear parts and activations for each layer, along with the dimensions of weights and biases, are discussed in detail, providing insights into the network's structure and connections."]}], 'duration': 355.798, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys2913466.jpg', 'highlights': ['Fully connected layer connects every input of a layer to the first layer, promoting feature identification', 'End-to-end learning involves training the network based on input and output, promoting flexibility', 'Neural networks are sometimes referred to as black box models due to the challenge of understanding specific features']}, {'end': 3738.166, 'segs': [{'end': 3326.956, 'src': 'embed', 'start': 3269.264, 'weight': 2, 'content': [{'end': 3276.806, 'text': 'B is going to be the number of neurons, so three by one, two by one, and finally, one by one.', 'start': 3269.264, 'duration': 7.542}, {'end': 3284.129, 'text': "So I think it's usually very helpful even when coding these type of equations to know all the shapes that are involved.", 'start': 3278.047, 'duration': 6.082}, {'end': 3289.868, 'text': 'Are you guys like totally okay with the shapes? Super easy to figure out? Okay, cool.', 'start': 3285.103, 'duration': 4.765}, {'end': 3298.538, 'text': 'So now, what is interesting is that we will try to vectorize the code even more.', 'start': 3291.43, 'duration': 7.108}, {'end': 3303.443, 'text': 'Does someone remember the difference between stochastic gradient descent and gradient descent?', 'start': 3299.419, 'duration': 4.024}, {'end': 3306.026, 'text': "What's the difference?", 'start': 3305.465, 'duration': 0.561}, {'end': 3321.433, 'text': 'Exactly, stochastic gradient descent is updates the weights and the bias after you see every example.', 'start': 3314.409, 'duration': 7.024}, {'end': 3326.956, 'text': "So the direction of the gradient is quite noisy, doesn't represent very well the entire batch.", 'start': 3321.993, 'duration': 4.963}], 'summary': 'Discussion on neuron numbers, vectorization, and stochastic gradient descent.', 'duration': 57.692, 'max_score': 3269.264, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3269264.jpg'}, {'end': 3387.779, 'src': 'heatmap', 'start': 3269.264, 'weight': 0.826, 'content': [{'end': 3276.806, 'text': 'B is going to be the number of neurons, so three by one, two by one, and finally, one by one.', 'start': 3269.264, 'duration': 7.542}, {'end': 3284.129, 'text': "So I think it's usually very helpful even when coding these type of equations to know all the shapes that are involved.", 'start': 3278.047, 'duration': 6.082}, {'end': 3289.868, 'text': 'Are you guys like totally okay with the shapes? Super easy to figure out? Okay, cool.', 'start': 3285.103, 'duration': 4.765}, {'end': 3298.538, 'text': 'So now, what is interesting is that we will try to vectorize the code even more.', 'start': 3291.43, 'duration': 7.108}, {'end': 3303.443, 'text': 'Does someone remember the difference between stochastic gradient descent and gradient descent?', 'start': 3299.419, 'duration': 4.024}, {'end': 3306.026, 'text': "What's the difference?", 'start': 3305.465, 'duration': 0.561}, {'end': 3321.433, 'text': 'Exactly, stochastic gradient descent is updates the weights and the bias after you see every example.', 'start': 3314.409, 'duration': 7.024}, {'end': 3326.956, 'text': "So the direction of the gradient is quite noisy, doesn't represent very well the entire batch.", 'start': 3321.993, 'duration': 4.963}, {'end': 3333.1, 'text': "While gradient descent or batch gradient descent is update after you've seen the whole batch of examples.", 'start': 3327.437, 'duration': 5.663}, {'end': 3336.001, 'text': 'And the gradient is much more precise.', 'start': 3334.341, 'duration': 1.66}, {'end': 3338.903, 'text': 'It points to the direction you wanna go to.', 'start': 3336.021, 'duration': 2.882}, {'end': 3350.972, 'text': "So What we're trying to do now is to write down these equations if, instead of giving one single cat image,", 'start': 3341.745, 'duration': 9.227}, {'end': 3354.435, 'text': 'we had given a bunch of images that either have a cat or not a cat.', 'start': 3350.972, 'duration': 3.463}, {'end': 3359.478, 'text': 'So now, our input x.', 'start': 3355.395, 'duration': 4.083}, {'end': 3374.932, 'text': 'So what happens for an input batch? of M examples.', 'start': 3359.478, 'duration': 15.454}, {'end': 3387.779, 'text': 'So now, our x is not anymore a single column vector.', 'start': 3381.936, 'duration': 5.843}], 'summary': 'Discussion on neuron number, vectorizing code, and gradient descent types.', 'duration': 118.515, 'max_score': 3269.264, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3269264.jpg'}, {'end': 3471.264, 'src': 'embed', 'start': 3445.578, 'weight': 4, 'content': [{'end': 3453.8, 'text': 'So what we wanna do is to be able to parallelize our code or our computation as much as possible by giving batches of inputs and parallelizing these equations.', 'start': 3445.578, 'duration': 8.222}, {'end': 3458.481, 'text': "So let's see how these equations are modified when we give it a batch of M inputs.", 'start': 3454.22, 'duration': 4.261}, {'end': 3471.264, 'text': 'I will use capital letters to denote the equivalent of the lowercase letters but for a batch of input.', 'start': 3461.862, 'duration': 9.402}], 'summary': 'Parallelize computation by giving batches of inputs, modifying equations for m inputs.', 'duration': 25.686, 'max_score': 3445.578, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3445578.jpg'}, {'end': 3617.79, 'src': 'embed', 'start': 3583.427, 'weight': 6, 'content': [{'end': 3586.749, 'text': "What is the algebraic problem? We said that the number of parameters doesn't change.", 'start': 3583.427, 'duration': 3.322}, {'end': 3594.112, 'text': 'It means that w has the same shape as it had before.', 'start': 3589.21, 'duration': 4.902}, {'end': 3597.493, 'text': 'b should have the same shape as it had before.', 'start': 3595.032, 'duration': 2.461}, {'end': 3600.915, 'text': 'It should be 3 by 1.', 'start': 3597.773, 'duration': 3.142}, {'end': 3607.618, 'text': "What's the problem of this equation? Exactly.", 'start': 3600.915, 'duration': 6.703}, {'end': 3613.989, 'text': "We're summing a 3 by m matrix to a 3 by 1 vector.", 'start': 3608.547, 'duration': 5.442}, {'end': 3616.209, 'text': 'This is not possible in math.', 'start': 3615.089, 'duration': 1.12}, {'end': 3616.75, 'text': "It doesn't work.", 'start': 3616.329, 'duration': 0.421}, {'end': 3617.79, 'text': "It doesn't match.", 'start': 3616.77, 'duration': 1.02}], 'summary': 'Algebraic problem: summing a 3 by m matrix to a 3 by 1 vector is not mathematically possible.', 'duration': 34.363, 'max_score': 3583.427, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3583427.jpg'}, {'end': 3738.166, 'src': 'embed', 'start': 3684.141, 'weight': 0, 'content': [{'end': 3690.762, 'text': 'So we just keep the same number of parameters, but just repeat them in order to be able to write my code in parallel.', 'start': 3684.141, 'duration': 6.621}, {'end': 3693.903, 'text': 'This is called broadcasting.', 'start': 3692.962, 'duration': 0.941}, {'end': 3701.799, 'text': 'And what is convenient is that for those of you who do not do homework, sorry, in Math Lab or Python, MATLAB, OK.', 'start': 3694.783, 'duration': 7.016}, {'end': 3703.56, 'text': 'So in MATLAB, no, Python.', 'start': 3702.219, 'duration': 1.341}, {'end': 3712.047, 'text': 'Python So in Python, there is a package that is often used to code these equations.', 'start': 3707.824, 'duration': 4.223}, {'end': 3713.368, 'text': "It's numpy.", 'start': 3712.768, 'duration': 0.6}, {'end': 3715.13, 'text': 'Some people call it numpy.', 'start': 3713.849, 'duration': 1.281}, {'end': 3715.57, 'text': 'Not sure.', 'start': 3715.17, 'duration': 0.4}, {'end': 3723.617, 'text': 'So numpy, basically numerical Python, would directly do the broadcasting.', 'start': 3716.11, 'duration': 7.507}, {'end': 3724.557, 'text': 'It means,', 'start': 3724.057, 'duration': 0.5}, {'end': 3736.405, 'text': 'if you sum this three by M matrix with a three by one parameter vector is going to automatically reproduce the parameter vector M times so that the equation works.', 'start': 3724.557, 'duration': 11.848}, {'end': 3738.166, 'text': "It's called broadcasting.", 'start': 3737.305, 'duration': 0.861}], 'summary': 'Broadcasting in numpy allows parallel code writing and automatic reproduction of parameter vector m times.', 'duration': 54.025, 'max_score': 3684.141, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3684141.jpg'}], 'start': 3269.264, 'title': 'Vectorizing and parallelizing neural network operations', 'summary': 'Delves into vectorizing neural network code, differentiating between stochastic and gradient descent, and examines the impact of batch size. it also covers the use of broadcasting to parallelize computations with batches of inputs, maintaining parameter numbers and utilizing numpy.', 'chapters': [{'end': 3443.725, 'start': 3269.264, 'title': 'Vectorizing neural network operations', 'summary': 'Explains the concept of vectorizing neural network code, distinguishes between stochastic gradient descent and gradient descent, and discusses the impact of batch size on computation.', 'duration': 174.461, 'highlights': ['The chapter explains the concept of vectorizing neural network code', 'Distinguishes between stochastic gradient descent and gradient descent', 'Discusses the impact of batch size on computation']}, {'end': 3738.166, 'start': 3445.578, 'title': 'Parallelizing equations with broadcasting', 'summary': 'Explains the use of broadcasting to parallelize computations with batches of inputs, maintaining the number of parameters while repeating them, and the usage of numpy for broadcasting.', 'duration': 292.588, 'highlights': ['The technique of broadcasting is used to maintain the number of parameters while parallelizing computations, such as creating a vector b tilde one, which is b one repeated m times, in order to write code in parallel.', 'The concept of broadcasting allows for the automatic reproduction of the parameter vector M times when performing operations like summing a three by M matrix with a three by one parameter vector in Python using numpy.', 'The modification of equations when given a batch of M inputs involves adjusting the shapes of the equations, such as Z1 being three by M to accommodate the batch inputs, while the parameters like W1 remain the same in shape.', 'The algebraic problem arises when trying to perform operations like summing a 3 by M matrix to a 3 by 1 vector, which is resolved using the technique of broadcasting to allow for parallelized code without changing the number of parameters.']}], 'duration': 468.902, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3269264.jpg', 'highlights': ['The technique of broadcasting maintains the number of parameters while parallelizing computations', 'The concept of broadcasting allows for automatic reproduction of the parameter vector M times', 'The chapter explains the concept of vectorizing neural network code', 'Distinguishes between stochastic gradient descent and gradient descent', 'Discusses the impact of batch size on computation', 'Modifying equations for a batch of M inputs involves adjusting shapes, while parameters remain the same', 'Algebraic problem when summing a 3 by M matrix to a 3 by 1 vector is resolved using broadcasting']}, {'end': 4808.907, 'segs': [{'end': 3884.638, 'src': 'embed', 'start': 3858.599, 'weight': 2, 'content': [{'end': 3868.595, 'text': 'You need the network to be complex enough to understand very detailed feature of the face and usually This neuron, what it sees as inputs are pixels.', 'start': 3858.599, 'duration': 9.996}, {'end': 3874.016, 'text': 'So it means every edge here is the multiplication of a weight by a pixel.', 'start': 3869.335, 'duration': 4.681}, {'end': 3876.017, 'text': 'So it sees pixels.', 'start': 3875.136, 'duration': 0.881}, {'end': 3882.138, 'text': 'It cannot understand the face as a whole because it sees only pixels.', 'start': 3878.217, 'duration': 3.921}, {'end': 3884.638, 'text': "It's very granular information for it.", 'start': 3882.338, 'duration': 2.3}], 'summary': 'A complex network processes detailed face features using pixels as inputs.', 'duration': 26.039, 'max_score': 3858.599, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3858599.jpg'}, {'end': 3993.304, 'src': 'embed', 'start': 3962.1, 'weight': 1, 'content': [{'end': 3964.802, 'text': 'First, nobody knows the right answer, so you have to test it.', 'start': 3962.1, 'duration': 2.702}, {'end': 3969.806, 'text': 'So you guys talked about training set, validation set, and test set.', 'start': 3965.563, 'duration': 4.243}, {'end': 3977.652, 'text': 'So what we would do is we would try 10 different architectures, train the network on these, look at the validation,', 'start': 3970.326, 'duration': 7.326}, {'end': 3981.075, 'text': 'set accuracy of all these and decide which one seems to be the best.', 'start': 3977.652, 'duration': 3.423}, {'end': 3983.977, 'text': "That's how we figure out what's the right network size.", 'start': 3981.515, 'duration': 2.462}, {'end': 3987.099, 'text': 'On top of that, using experience is often valuable.', 'start': 3984.618, 'duration': 2.481}, {'end': 3993.304, 'text': 'So if you give me a problem, I try always to gauge how complex is the problem.', 'start': 3987.5, 'duration': 5.804}], 'summary': 'Testing 10 architectures, training on them, and using validation set to decide the best network size.', 'duration': 31.204, 'max_score': 3962.1, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3962100.jpg'}, {'end': 4084.623, 'src': 'embed', 'start': 4050.517, 'weight': 0, 'content': [{'end': 4053.875, 'text': 'Based on their complexity, I think the network should be deeper.', 'start': 4050.517, 'duration': 3.358}, {'end': 4061.14, 'text': 'The more complex usually is the problem, the more data you need in order to figure out the output, the deeper should be the network.', 'start': 4054.155, 'duration': 6.985}, {'end': 4062.461, 'text': "That's an intuition, let's say.", 'start': 4061.44, 'duration': 1.021}, {'end': 4069.085, 'text': "Okay, let's move on guys because I think we have about 12 more minutes.", 'start': 4063.221, 'duration': 5.864}, {'end': 4081.174, 'text': "Okay, let's try to write the loss function.", 'start': 4078.552, 'duration': 2.622}, {'end': 4084.623, 'text': 'for this problem.', 'start': 4083.823, 'duration': 0.8}], 'summary': 'Deeper networks needed for complex problems, approx. 12 minutes left to write the loss function.', 'duration': 34.106, 'max_score': 4050.517, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys4050517.jpg'}, {'end': 4245.409, 'src': 'heatmap', 'start': 4139.318, 'weight': 0.965, 'content': [{'end': 4140.778, 'text': 'We have to find the right values for these.', 'start': 4139.318, 'duration': 1.46}, {'end': 4143.198, 'text': 'And remember, model equals architecture plus parameter.', 'start': 4140.978, 'duration': 2.22}, {'end': 4144.1, 'text': 'We have our architecture.', 'start': 4143.22, 'duration': 0.88}, {'end': 4145.461, 'text': "If we have our parameters, we're done.", 'start': 4144.26, 'duration': 1.201}, {'end': 4156.685, 'text': 'So in order to do that, we have to define an objective function, sometimes called loss, sometimes called cost function.', 'start': 4147.64, 'duration': 9.045}, {'end': 4166.41, 'text': 'So usually, we would call it loss if there is only one example in the batch, and cost if there is multiple examples.', 'start': 4159.407, 'duration': 7.003}, {'end': 4169.192, 'text': 'in a match.', 'start': 4168.772, 'duration': 0.42}, {'end': 4177.557, 'text': "So the last function, let's define the cost function.", 'start': 4173.375, 'duration': 4.182}, {'end': 4184.822, 'text': 'The cost function J depends on Y hat and Y.', 'start': 4178.598, 'duration': 6.224}, {'end': 4187.783, 'text': 'So Y hat is A3.', 'start': 4184.822, 'duration': 2.961}, {'end': 4196.809, 'text': 'It depends on Y hat and Y.', 'start': 4195.568, 'duration': 1.241}, {'end': 4205.343, 'text': 'And we will set it to be the sum of the loss functions li.', 'start': 4198.257, 'duration': 7.086}, {'end': 4207.544, 'text': 'And I will normalize it.', 'start': 4206.163, 'duration': 1.381}, {'end': 4214.089, 'text': "It's not mandatory, but normalize it with 1 over n.", 'start': 4207.604, 'duration': 6.485}, {'end': 4217.792, 'text': "So what does this mean is that we're going for batch gradient descent.", 'start': 4214.089, 'duration': 3.703}, {'end': 4222.956, 'text': 'We want to compute the loss function for the whole batch, parallelize our code.', 'start': 4218.813, 'duration': 4.143}, {'end': 4237.507, 'text': 'and then calculate the cost function that will be then derived to give us the direction of the gradient that is the average direction of all the derivation with respect to the whole input batch.', 'start': 4223.925, 'duration': 13.582}, {'end': 4245.409, 'text': 'And li will be the loss function corresponding to one parameter.', 'start': 4239.528, 'duration': 5.881}], 'summary': 'Finding the right values for architecture plus parameters using cost function for batch gradient descent.', 'duration': 106.091, 'max_score': 4139.318, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys4139318.jpg'}, {'end': 4364.889, 'src': 'embed', 'start': 4324.252, 'weight': 3, 'content': [{'end': 4329.695, 'text': "So instead of computing all our derivatives on J, we will compute them on L, but it's totally equivalent.", 'start': 4324.252, 'duration': 5.443}, {'end': 4331.437, 'text': "There's just one more step at the end.", 'start': 4330.136, 'duration': 1.301}, {'end': 4337.541, 'text': 'Okay, so now we defined our loss function, super.', 'start': 4333.798, 'duration': 3.743}, {'end': 4343.501, 'text': 'We defined our last function, and the next step is optimize.', 'start': 4340.66, 'duration': 2.841}, {'end': 4345.742, 'text': 'So we have to compute a lot of derivatives.', 'start': 4343.761, 'duration': 1.981}, {'end': 4364.889, 'text': "And that's called backward propagation.", 'start': 4362.288, 'duration': 2.601}], 'summary': 'Compute derivatives on l instead of j, equivalent and more efficient. backward propagation involved.', 'duration': 40.637, 'max_score': 4324.252, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys4324252.jpg'}, {'end': 4451.276, 'src': 'embed', 'start': 4416.626, 'weight': 4, 'content': [{'end': 4419.288, 'text': 'So it means we have to compute all these derivatives.', 'start': 4416.626, 'duration': 2.662}, {'end': 4422.13, 'text': 'We have to compute derivative of the cost with respect to W1, W2, W3, B1, B2, B3.', 'start': 4419.348, 'duration': 2.782}, {'end': 4427.084, 'text': "You've done it with logistic regression.", 'start': 4425.663, 'duration': 1.421}, {'end': 4428.885, 'text': "We're going to do it with a neural network.", 'start': 4427.164, 'duration': 1.721}, {'end': 4431.866, 'text': "And you're going to understand why it's called backward propagation.", 'start': 4429.685, 'duration': 2.181}, {'end': 4433.887, 'text': 'Which one do you want to start with??', 'start': 4432.807, 'duration': 1.08}, {'end': 4435.408, 'text': 'Which derivative?', 'start': 4434.708, 'duration': 0.7}, {'end': 4440.231, 'text': 'You want to start with the derivative with respect to W1, W2, or W3?', 'start': 4436.209, 'duration': 4.022}, {'end': 4442.352, 'text': "Assuming we'll do the bias later.", 'start': 4441.111, 'duration': 1.241}, {'end': 4451.276, 'text': "W what? W1? You think W1 is a good idea? I don't want to do W1.", 'start': 4445.333, 'duration': 5.943}], 'summary': 'Computing derivatives for neural network. understanding backward propagation.', 'duration': 34.65, 'max_score': 4416.626, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys4416626.jpg'}], 'start': 3739.307, 'title': 'Neural networks and backpropagation', 'summary': 'Discusses the differences between neural networks and principal component analysis, the role of neurons in recognizing features, the process of determining the architecture of a neural network, and the complexity of problems requiring deeper networks, the forward and backward propagation equations, and the optimization process through derivative computation using the chain rule for efficient computation.', 'chapters': [{'end': 3983.977, 'start': 3739.307, 'title': 'Neural networks and principal component analysis', 'summary': 'Discusses the differences between neural networks and principal component analysis, the role of neurons in recognizing features, and the process of determining the architecture of a neural network.', 'duration': 244.67, 'highlights': ['Neural networks focus on output while PCA focuses on features', 'Role of neurons in recognizing features', 'Determining the architecture of a neural network']}, {'end': 4545.095, 'start': 3984.618, 'title': 'Neural network backward propagation', 'summary': 'Discusses the complexity of problems, the need for deeper networks for complex problems, the forward and backward propagation equations, and the optimization process through derivative computation, emphasizing the use of the chain rule for easier computation.', 'duration': 560.477, 'highlights': ['The need for deeper networks for complex problems', 'Explanation of forward and backward propagation equations', 'Optimization process through derivative computation']}, {'end': 4808.907, 'start': 4545.655, 'title': 'Backpropagation and derivatives', 'summary': 'Explains the concept of backpropagation, emphasizing the use of chain rule to compute derivatives with respect to different weights in a neural network, ensuring efficient computation and avoiding redundant work.', 'duration': 263.252, 'highlights': ['Backpropagation involves decomposing the derivative of the cost function with respect to the weights into simpler derivatives, using the chain rule to efficiently compute the derivatives with respect to different weights.', 'The use of forward propagation equations is essential to determine the proper path for derivative computation in order to avoid cancellations in the derivatives.', 'Tweaking a neural network involves adjusting activations, loss function, and other parameters to optimize its performance.']}], 'duration': 1069.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/MfIjxPh6Pys/pics/MfIjxPh6Pys3739307.jpg', 'highlights': ['The need for deeper networks for complex problems', 'Determining the architecture of a neural network', 'Role of neurons in recognizing features', 'Optimization process through derivative computation', 'Backpropagation involves decomposing the derivative of the cost function with respect to the weights into simpler derivatives']}], 'highlights': ['Deep learning is used in computer vision, NLP, and speech recognition', 'Success of deep learning attributed to new computational methods and large data availability', 'Specificity of deep learning algorithms allows increased learning with larger datasets', 'Logistic regression can be interpreted as a specific case of a neural network', 'Deep learning is computationally expensive and utilizes parallelization techniques and GPUs', 'The matrix W has a shape of 1 by 12288, essential for logistic operation', 'Flattening the image into a vector is the first step in logistic regression', 'Sigmoid function maps numbers between -∞ and +∞ to values between 0 and 1', 'Training process involves initializing parameters, optimizing with gradient descent, and using a loss function', 'The number of parameters of a model depends on the size of the input', 'Neuron defined as an operation with linear and activation parts', 'Discussion on setting new goals for image recognition and modifying a neural network', 'Application of the softmax formula in a neural network creates a probability distribution over all classes', 'The challenge of classifying specific diseases among a large number of diseases in healthcare models', 'The network has three neurons and a total of 3n+3 parameters, trained using a modified loss function', 'The addition of neurons and layers in the network increases the number of parameters', 'The output layer must have the same number of neurons as the number of classes for classification', 'Optimization of loss is easier for a regression task than for a classification task', 'Fully connected layer connects every input of a layer to the first layer', 'The technique of broadcasting maintains the number of parameters while parallelizing computations', 'The need for deeper networks for complex problems', 'Backpropagation involves decomposing the derivative of the cost function with respect to the weights into simpler derivatives']}