title
Deep Learning Crash Course for Beginners

description
Learn the fundamental concepts and terminology of Deep Learning, a sub-branch of Machine Learning. This course is designed for absolute beginners with no experience in programming. You will learn the key ideas behind deep learning without any code. You'll learn about Neural Networks, Machine Learning constructs like Supervised, Unsupervised and Reinforcement Learning, the various types of Neural Network architectures, and more. ✏️ Course developed by Jason Dsouza. Check out his YouTube channel: http://youtube.com/jasmcaus ⭐️ Course Contents ⭐️ ⌨️ (0:00) Introduction ⌨️ (1:18) What is Deep Learning ⌨️ (5:25) Introduction to Neural Networks ⌨️ (6:12) How do Neural Networks LEARN? ⌨️ (12:06) Core terminologies used in Deep Learning ⌨️ (12:11) Activation Functions ⌨️ (22:36) Loss Functions ⌨️ (23:42) Optimizers ⌨️ (30:10) Parameters vs Hyperparameters ⌨️ (32:03) Epochs, Batches & Iterations ⌨️ (34:24) Conclusion to Terminologies ⌨️ (35:18) Introduction to Learning ⌨️ (35:34) Supervised Learning ⌨️ (40:21) Unsupervised Learning ⌨️ (43:38) Reinforcement Learning ⌨️ (46:25) Regularization ⌨️ (51:25) Introduction to Neural Network Architectures ⌨️ (51:37) Fully-Connected Feedforward Neural Nets ⌨️ (54:05) Recurrent Neural Nets ⌨️ (1:04:40) Convolutional Neural Nets ⌨️ (1:08:07) Introduction to the 5 Steps to EVERY Deep Learning Model ⌨️ (1:08:23) 1. Gathering Data ⌨️ (1:11:27) 2. Preprocessing the Data ⌨️ (1:19:05) 3. Training your Model ⌨️ (1:19:33) 4. Evaluating your Model ⌨️ (1:19:55) 5. Optimizing your Model's Accuracy ⌨️ (1:25:15) Conclusion to the Course -- Learn to code for free and get a developer job: https://www.freecodecamp.org Read hundreds of articles on programming: https://freecodecamp.org/news

detail
{'title': 'Deep Learning Crash Course for Beginners', 'heatmap': [{'end': 465.585, 'start': 357.9, 'weight': 1}, {'end': 2059.541, 'start': 2003.282, 'weight': 0.72}, {'end': 2466.803, 'start': 2417.6, 'weight': 0.783}, {'end': 3961.823, 'start': 3900.921, 'weight': 0.741}], 'summary': 'This deep learning crash course covers the wide-reaching impact of deep learning, including defeating world champions in chess, transforming medical diagnosis, and explores topics such as neural networks, activation functions, loss functions, optimizers, supervised learning, overfitting, CNN architecture, and data gathering, providing insights into deep learning concepts and datasets for pre-processing.', 'chapters': [{'end': 137.263, 'segs': [{'end': 52.43, 'src': 'embed', 'start': 18.041, 'weight': 0, 'content': [{'end': 20.344, 'text': 'A lot of people, including me, never saw it coming.', 'start': 18.041, 'duration': 2.303}, {'end': 22.848, 'text': "It seemed impossible, but it's here now.", 'start': 20.925, 'duration': 1.923}, {'end': 24.73, 'text': 'Deep learning is everywhere.', 'start': 23.529, 'duration': 1.201}, {'end': 27.014, 'text': "It's beating physicians at diagnosing cancer.", 'start': 24.951, 'duration': 2.063}, {'end': 33.062, 'text': "It's responsible for everything from translating web pages in a matter of mere seconds to the autonomous vehicles built by Waymo and Tesla.", 'start': 27.354, 'duration': 5.708}, {'end': 37.779, 'text': 'Hi, my name is Jason, and welcome to this course on deep learning,', 'start': 34.076, 'duration': 3.703}, {'end': 47.746, 'text': "where you'll learn everything you need to get started with deep learning in Python: how to build remarkable algorithms capable of solving complex problems that weren't possible just a few decades ago.", 'start': 37.779, 'duration': 9.967}, {'end': 52.43, 'text': "We'll talk about what deep learning is and the difference between artificial intelligence and machine learning.", 'start': 47.746, 'duration': 4.684}], 'summary': 'Deep learning is revolutionizing various fields, from medical diagnosis to autonomous vehicles.', 'duration': 34.389, 'max_score': 18.041, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c18041.jpg'}, {'end': 137.263, 'src': 'embed', 'start': 111.693, 'weight': 2, 'content': [{'end': 118.816, 'text': "In 1997, Garry Kasparov, the most successful champion in the history of chess, lost to IBM's Deep Blue,", 'start': 111.693, 'duration': 7.123}, {'end': 121.156, 'text': 'one of the first computer or artificial systems.', 'start': 118.816, 'duration': 2.34}, {'end': 125.038, 'text': 'It was the first defeat of a reigning world chess champion by a computer.', 'start': 121.857, 'duration': 3.181}, {'end': 137.263, 'text': "In 2011, IBM's Watson competed in the game show Jeopardy! against its champions Brad Rutter and Ken Jennings, and won the first prize of $1 million.", 'start': 125.959, 'duration': 11.304}], 'summary': "Garry Kasparov lost to IBM's Deep Blue in 1997, the first defeat of a world chess champion by a computer. 
In 2011, IBM's Watson won $1 million on Jeopardy!", 'duration': 25.57, 'max_score': 111.693, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c111693.jpg'}], 'start': 0.595, 'title': 'The power of deep learning', 'summary': 'Delves into the wide-reaching impact of deep learning, from defeating world champions in chess to transforming medical diagnosis, and introduces an in-depth course on deep learning in Python.', 'chapters': [{'end': 137.263, 'start': 0.595, 'title': 'The power of deep learning', 'summary': 'Explores the widespread impact of deep learning, from defeating world champions in chess to revolutionizing medical diagnosis, and introduces a comprehensive course on deep learning in Python.', 'duration': 136.668, 'highlights': ["Deep Blue's defeat of Garry Kasparov, the world chess champion, in 1997 marked a significant milestone in the development of artificial intelligence.", "Deep learning's role in revolutionizing medical diagnosis, surpassing physicians in detecting cancer, underscores its transformative impact on healthcare.", 'The introduction of a comprehensive course in deep learning in Python, offering essential knowledge to build powerful algorithms, reflects the increasing demand and significance of deep learning in the technology landscape.']}], 'duration': 136.668, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c595.jpg', 'highlights': ["Deep learning's role in revolutionizing medical diagnosis, surpassing physicians in detecting cancer, underscores its transformative impact on healthcare.", 'The introduction of a comprehensive course in deep learning in Python, offering essential knowledge to build powerful algorithms, reflects the increasing demand and significance of deep learning in the technology landscape.', "Deep Blue's defeat of Garry Kasparov, the world chess champion, in 1997 marked a significant milestone in the development of artificial intelligence."]}, {'end': 717.745, 'segs': [{'end': 158.83, 'src': 'embed', 'start': 137.363, 'weight': 0, 'content': [{'end': 146.227, 'text': "In 2015, AlphaGo, a deep learning computer program created by Google's DeepMind division, defeated Lee Sedol, an 18-time world champion, at Go,", 'start': 137.363, 'duration': 8.864}, {'end': 148.688, 'text': 'a game a googol times more complex than chess.', 'start': 146.227, 'duration': 2.461}, {'end': 152.665, 'text': 'But deep learning can do more than just beat us at board games.', 'start': 150.063, 'duration': 2.602}, {'end': 158.83, 'text': 'It finds applications anywhere from self-driving vehicles to fake news detection to even predicting earthquakes.', 'start': 152.785, 'duration': 6.045}], 'summary': "AlphaGo defeated an 18-time world champion at Go, showcasing deep learning's potential in various applications.", 'duration': 21.467, 'max_score': 137.363, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c137363.jpg'}, {'end': 198.924, 'src': 'embed', 'start': 176.932, 'weight': 3, 'content': [{'end': 185.557, 'text': 'Deep learning is a machine learning technique that learns features and tasks directly from data by running inputs through a biologically inspired neural network architecture.', 'start': 176.932, 'duration': 8.625}, {'end': 193.56, 'text': 'These neural networks contain a number of hidden layers through which data is processed, allowing for the machine to go deep in its learning,', 'start': 186.337, 'duration': 7.223}, 
{'end': 196.462, 'text': 'making connections and weighing input for the best results.', 'start': 193.56, 'duration': 2.902}, {'end': 198.924, 'text': "We'll go over neural networks in the next video.", 'start': 197.102, 'duration': 1.822}], 'summary': 'Deep learning uses neural networks to learn features directly from data.', 'duration': 21.992, 'max_score': 176.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c176932.jpg'}, {'end': 306.237, 'src': 'embed', 'start': 281.923, 'weight': 2, 'content': [{'end': 289.991, 'text': 'So why has deep learning gained popularity many decades later? Well, for one, data has become much more pervasive.', 'start': 281.923, 'duration': 8.068}, {'end': 296.678, 'text': "We're living in the age of big data, and these algorithms require massive amounts of data to effectively be implemented.", 'start': 290.432, 'duration': 6.246}, {'end': 306.237, 'text': 'Second, we have hardware and architecture that are capable of handling the vast amounts of data and computational power that these algorithms require,', 'start': 297.512, 'duration': 8.725}], 'summary': 'Deep learning gains popularity due to big data and advanced hardware.', 'duration': 24.314, 'max_score': 281.923, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c281923.jpg'}, {'end': 465.585, 'src': 'heatmap', 'start': 357.9, 'weight': 1, 'content': [{'end': 362.104, 'text': 'The input layer, the output layer, and several hidden layers between the two.', 'start': 357.9, 'duration': 4.204}, {'end': 366.549, 'text': "In the next video, we'll go over the learning process of a neural network.", 'start': 363.386, 'duration': 3.163}, {'end': 375.807, 'text': 'The learning process of a neural network can be broken into two main processes, forward propagation and back propagation.', 'start': 370.083, 'duration': 5.724}, {'end': 381.451, 'text': 'Forward propagation is the propagation of information from the input layer to the output layer.', 'start': 376.628, 'duration': 4.823}, {'end': 385.854, 'text': 'We can define our input layer as several neurons, X1 through Xn.', 'start': 381.992, 'duration': 3.862}, {'end': 393.24, 'text': 'These neurons connect to the neurons of the next layer through channels, and they are assigned numerical values called weights.', 'start': 386.675, 'duration': 6.565}, {'end': 400.275, 'text': 'The inputs are multiplied by the weights and their sum is sent as input to the neurons in the hidden layer,', 'start': 394.372, 'duration': 5.903}, {'end': 406.517, 'text': 'where each neuron in turn is associated with a numerical value called the bias, which is then added to the input sum.', 'start': 400.275, 'duration': 6.242}, {'end': 413.4, 'text': 'This weighted sum is then passed through a non-linear function called the activation function,', 'start': 408.238, 'duration': 5.162}, {'end': 417.102, 'text': 'which essentially decides if that particular neuron can contribute to the next layer.', 'start': 413.4, 'duration': 3.702}, {'end': 420.825, 'text': "In the output layer, it's basically a form of probability.", 'start': 418.143, 'duration': 2.682}, {'end': 424.628, 'text': 'The neuron with the highest value determines what the output finally is.', 'start': 421.285, 'duration': 3.343}, {'end': 426.63, 'text': "So let's go over a few terms.", 'start': 424.648, 'duration': 1.982}, {'end': 430.839, 'text': 'The weight of a neuron tells us how important the neuron is.', 'start': 427.577, 'duration': 3.262}, {'end': 434.422, 'text': 'The higher the value, the more important it is in the relationship.', 'start': 431.34, 'duration': 3.082}, {'end': 438.125, 'text': 'The bias is like the neuron having an opinion in the relationship.', 'start': 435.062, 'duration': 3.063}, {'end': 441.727, 'text': 'It serves to shift the activation function to the right or to the left.', 'start': 438.585, 'duration': 3.142}, {'end': 445.23, 'text': 'If you have had some experience with high school math,', 'start': 442.248, 'duration': 2.982}, {'end': 450.894, 'text': 'you should know that adding a scalar value to a function shifts the graph either to the left or to the right.', 'start': 445.23, 'duration': 5.664}, {'end': 452.615, 'text': 'And this is exactly what the bias does.', 'start': 451.354, 'duration': 1.261}, {'end': 455.978, 'text': 'It shifts the activation function to the right or to the left.', 'start': 452.855, 'duration': 3.123}, {'end': 460.581, 'text': 'Back propagation is almost like forward propagation, except in the reverse direction.', 'start': 456.438, 'duration': 4.143}, {'end': 465.585, 'text': 'Information here is passed from the output layer to the hidden layers, not the input layer.', 'start': 460.861, 'duration': 4.724}], 'summary': 'Neural network learning involves forward and back propagation through input, hidden, and output layers.', 'duration': 107.685, 'max_score': 357.9, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c357900.jpg'}, {'end': 400.275, 'src': 'embed', 'start': 370.083, 'weight': 4, 'content': [{'end': 375.807, 'text': 'The learning process of a neural network can be broken into two main processes, forward propagation and back propagation.', 'start': 370.083, 'duration': 5.724}, {'end': 381.451, 'text': 'Forward propagation is the propagation of information from the input layer to the output layer.', 'start': 376.628, 'duration': 4.823}, {'end': 385.854, 'text': 'We can define our input layer as several neurons, X1 through Xn.', 'start': 381.992, 'duration': 3.862}, {'end': 393.24, 'text': 'These neurons connect to the neurons of the next layer through channels, and they are assigned numerical values called weights.', 'start': 386.675, 'duration': 6.565}, {'end': 400.275, 'text': 'The inputs are multiplied by the weights and their sum is sent as input to the neurons in the hidden layer,', 'start': 394.372, 'duration': 5.903}], 'summary': 'Neural network learning involves forward and back propagation, with input layer neurons connecting to hidden layer neurons through weighted channels.', 'duration': 30.192, 'max_score': 370.083, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c370083.jpg'}], 'start': 137.363, 'title': 'The rise of deep learning and its neural networks', 
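To make the forward pass just described concrete, here is a minimal NumPy sketch of one pass through a single hidden layer. The layer sizes, the random weights, and the choice of a sigmoid activation are illustrative assumptions, not values taken from the course.

```python
import numpy as np

def sigmoid(z):
    # A non-linear activation function: squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 input neurons (X1..X3), 4 hidden neurons, 2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=3)           # the input layer, X1 through Xn
W1 = rng.normal(size=(4, 3))     # channel weights into the hidden layer
b1 = rng.normal(size=4)          # one bias per hidden neuron
W2 = rng.normal(size=(2, 4))     # channel weights into the output layer
b2 = rng.normal(size=2)

# Forward propagation: inputs are multiplied by the weights, the bias is
# added to the sum, and the weighted sum passes through the activation.
hidden = sigmoid(W1 @ x + b1)
output = sigmoid(W2 @ hidden + b2)

# In the output layer the values act like probabilities: the neuron with
# the highest value determines the final output.
print(output, output.argmax())
```

Back propagation would then run in the reverse direction, nudging W1, b1, W2, and b2 to reduce the error.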
'summary': "Explores the impact of AlphaGo's 2015 victory, emphasizing deep learning's applications, ability to learn from data, and the limitations of traditional machine learning. It also explains the rise of deep learning, fueled by big data, advanced hardware, and open source software, detailing the neural network learning process, including forward and back propagation.", 'chapters': [{'end': 250.382, 'start': 137.363, 'title': 'Rise of deep learning', 'summary': "Discusses the impact of AlphaGo's victory in 2015, highlighting the far-reaching applications of deep learning, its ability to learn directly from data, and the limitations of traditional machine learning algorithms.", 'duration': 113.019, 'highlights': ['Deep learning applications range from board games like Go to self-driving vehicles, fake news detection, and earthquake prediction.', 'AlphaGo, a deep learning computer program, defeated Lee Sedol, an 18-time world champion, at Go in 2015.', 'Deep learning is a machine learning technique that learns features and tasks directly from data by running inputs through a biologically inspired neural network architecture.', "Traditional machine learning algorithms require a lot of domain expertise, human intervention, and are limited to what they're designed for.", 'The limitations of traditional machine learning algorithms are demonstrated in the challenge of defining and recognizing complex patterns like facial features.']}, {'end': 717.745, 'start': 251.162, 'title': 'Deep learning: neural networks & backpropagation', 'summary': 'Explains the concept of deep learning, emphasizing its rise in popularity due to the age of big data, advanced hardware, and streamlined model building with open source software. It also details the learning process of neural networks, including forward propagation, back propagation, and the training algorithm.', 'duration': 466.583, 'highlights': ["Deep learning's rise in popularity is due to the age of big data, advanced hardware, and streamlined model building with open source software.", 'The learning process of a neural network involves forward propagation, back propagation, and the training algorithm.', 'Neural networks form the basis of deep learning, where algorithms are inspired by the structure of the human brain.']}], 'duration': 580.382, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c137363.jpg', 'highlights': ['Deep learning applications range from board games like Go to self-driving vehicles, fake news detection, and earthquake prediction.', 'AlphaGo, a deep learning computer program, defeated Lee Sedol, an 18-time world champion, at Go in 2015.', "Deep learning's rise in popularity is due to the age of big data, advanced hardware, and streamlined model building with open source software.", 'Deep learning is a machine learning technique that learns features and tasks directly from data by running inputs through a biologically inspired neural network architecture.', 'The learning process of a neural network involves forward propagation, back propagation, and the training algorithm.']}, {'end': 1340.59, 'segs': [{'end': 748.639, 'src': 'embed', 'start': 722.368, 'weight': 2, 'content': [{'end': 726.532, 'text': "In this section, we're going to talk about the most common terminologies used in deep learning today.", 'start': 722.368, 'duration': 4.164}, {'end': 729.114, 'text': "Let's start off with the activation function.", 'start': 727.312, 'duration': 1.802}, {'end': 739.154, 'text': 'The activation function serves to introduce something called non-linearity into the network and also decides whether a particular neuron can contribute to the next layer.', 'start': 730.37, 'duration': 8.784}, 
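A segment below argues that without a non-linear activation, stacked layers collapse into a single linear layer. That claim is easy to verify numerically; here is a quick NumPy check, with made-up sizes and random values as illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)

# Two layers with NO activation function: just weights and biases
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=4)
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=3)
two_linear_layers = W2 @ (W1 @ x + b1) + b2

# A combination of linear functions is still a linear function, so the
# same network is equivalent to a single layer with weights W and bias b.
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_linear_layers, one_layer))  # True
```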
{'end': 748.639, 'text': 'But how do you decide if the neuron can fire or activate? Well, we had a couple of ideas which led to the creation of different activation functions.', 'start': 740.275, 'duration': 8.364}], 'summary': 'Common deep learning terminologies include activation functions, introducing non-linearity and determining neuron firing.', 'duration': 26.271, 'max_score': 722.368, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c722368.jpg'}, {'end': 839.887, 'src': 'embed', 'start': 807.474, 'weight': 1, 'content': [{'end': 808.315, 'text': "It's complicated, right?", 'start': 807.474, 'duration': 0.841}, {'end': 812.759, 'text': 'You would want the network to activate only one neuron and the others should be 0.', 'start': 808.875, 'duration': 3.884}, {'end': 815.821, 'text': 'Only then you would be able to say it was classified properly.', 'start': 812.759, 'duration': 3.062}, {'end': 820.297, 'text': 'In real practice, however, it is harder to train and converge it this way.', 'start': 816.932, 'duration': 3.365}, {'end': 826.426, 'text': 'It would be better if the activation was not binary and instead some probable value like 75% activated or 16% activated.', 'start': 820.558, 'duration': 5.868}, {'end': 828.85, 'text': "There's a 75% chance that it belongs to class 2, etc.", 'start': 826.426, 'duration': 2.424}, {'end': 839.887, 'text': 'Then, if more than one neuron activates, you could find which neuron fires based on which has the highest probability.', 'start': 833.985, 'duration': 5.902}], 'summary': 'Neural network activation aims for accurate classification; non-binary activation improves practicality.', 'duration': 32.413, 'max_score': 807.474, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c807474.jpg'}, {'end': 938.2, 'src': 'embed', 'start': 907.905, 'weight': 6, 'content': [{'end': 914.428, 'text': 'If all the layers are linear, the activation function of the final layer is nothing but just a linear function of the input of the first layer.', 'start': 907.905, 'duration': 6.523}, {'end': 916.309, 'text': 'Pause for a bit and think about it.', 'start': 914.428, 'duration': 1.881}, {'end': 922.172, 'text': 'This means that the entire neural network of dozens of layers can be replaced by a single layer.', 'start': 916.309, 'duration': 5.863}, {'end': 928.716, 'text': 'Remember, a combination of linear functions in a linear manner is still another linear function, and this is terrible,', 'start': 922.172, 'duration': 6.544}, {'end': 931.217, 'text': "because we've just lost the ability to stack layers.", 'start': 928.716, 'duration': 2.501}, {'end': 938.2, 'text': 'This way, no matter how much we stack, the whole network is still equivalent to a single layer with a single activation.', 'start': 931.217, 'duration': 6.983}], 'summary': "If every layer is linear, the final layer's activation is just a linear function of the input, making the entire neural network equivalent to a single layer.", 'duration': 30.295, 'max_score': 907.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c907905.jpg'}, {'end': 1104.78, 'src': 'embed', 'start': 1073.214, 'weight': 5, 'content': [{'end': 1078.336, 'text': 'So deciding between the sigmoid and the tanh will really depend on your requirement of the gradient strength.', 'start': 1073.214, 'duration': 5.122}, {'end': 1082.618, 'text': 'Like sigmoid, tanh is also a very popular and widely used 
activation function.', 'start': 1078.656, 'duration': 3.962}, {'end': 1086.619, 'text': 'And yes, like the sigmoid, tanh does have a vanishing gradient problem.', 'start': 1083.258, 'duration': 3.361}, {'end': 1093.362, 'text': 'The rectified linear unit, or ReLU function, is defined as A(x) = max(0, x).', 'start': 1086.879, 'duration': 6.483}, {'end': 1097.644, 'text': 'At first look, this would look like a linear function, right? The graph is linear in the positive axis.', 'start': 1093.362, 'duration': 4.282}, {'end': 1104.78, 'text': 'Let me tell you, ReLU is in fact non-linear in nature, and combinations of ReLU are also non-linear.', 'start': 1098.956, 'duration': 5.824}], 'summary': 'Choosing between sigmoid and tanh depends on gradient strength. ReLU is non-linear.', 'duration': 31.566, 'max_score': 1073.214, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1073214.jpg'}, {'end': 1171.163, 'src': 'embed', 'start': 1138.753, 'weight': 4, 'content': [{'end': 1141.175, 'text': 'In other words, the activation will be dense.', 'start': 1138.753, 'duration': 2.422}, {'end': 1142.677, 'text': 'And this is costly.', 'start': 1141.836, 'duration': 0.841}, {'end': 1149.424, 'text': 'Ideally, we want only a few neurons in the network to activate, thereby making the activation sparse and efficient.', 'start': 1143.137, 'duration': 6.287}, {'end': 1152.348, 'text': "Here's where the ReLU comes in.", 'start': 1151.087, 'duration': 1.261}, {'end': 1161.615, 'text': 'Imagine a network with randomly initialized weights where almost 50% of the network yields zero activation because of the characteristic of ReLU.', 'start': 1152.888, 'duration': 8.727}, {'end': 1165.378, 'text': 'It outputs zero for negative values of x.', 'start': 1162.036, 'duration': 3.342}, {'end': 1171.163, 'text': 'This means that only 50% of the neurons fire (sparse activation), making the network lighter.', 'start': 1165.378, 'duration': 5.785}], 'summary': 'Sparse activation is achieved with ReLU, resulting in a roughly 50% lighter network.', 'duration': 32.41, 'max_score': 1138.753, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1138753.jpg'}, {'end': 1270.375, 'src': 'embed', 'start': 1244.641, 'weight': 3, 'content': [{'end': 1249.263, 'text': 'Because of the advantages that ReLU offers, does this mean that you should use ReLU for everything you do?', 'start': 1244.641, 'duration': 4.622}, {'end': 1251.884, 'text': 'Or could you consider sigmoid and tanh?', 'start': 1250.043, 'duration': 1.841}, {'end': 1253.665, 'text': 'Well, both.', 'start': 1252.904, 'duration': 0.761}, {'end': 1258.547, 'text': "When you know the function that you're trying to approximate has certain characteristics,", 'start': 1254.445, 'duration': 4.102}, {'end': 1264.87, 'text': 'you should choose an activation function which will approximate the function faster, leading to faster training processes.', 'start': 1258.547, 'duration': 6.323}, {'end': 1270.375, 'text': 'For example, a sigmoid function works well for binary classification problems,', 'start': 1266.072, 'duration': 4.303}], 
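For reference, here are the three activation functions discussed above as plain NumPy functions; a minimal sketch, with the sample inputs chosen arbitrarily for illustration.

```python
import numpy as np

def sigmoid(z):
    # Bounded in (0, 1); saturates for large |z|, hence the vanishing gradient problem
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Bounded in (-1, 1); stronger gradients than sigmoid, but it also saturates
    return np.tanh(z)

def relu(z):
    # A(x) = max(0, x): zero for negative inputs, which gives sparse activations
    return np.maximum(0.0, z)

z = np.linspace(-4.0, 4.0, 9)  # arbitrary sample inputs
print(sigmoid(z))
print(tanh(z))
print(relu(z))                 # the negative half is exactly zero
```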
'summary': 'Choose activation functions based on function characteristics for faster training. Sigmoid is good for binary classification.', 'duration': 25.734, 'max_score': 1244.641, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1244641.jpg'}, {'end': 1323.289, 'src': 'embed', 'start': 1298.172, 'weight': 0, 'content': [{'end': 1305.536, 'text': 'In my definition of activation functions, I mentioned that activation functions serve to introduce something called nonlinearity in the network.', 'start': 1298.172, 'duration': 7.364}, {'end': 1313.761, 'text': 'For all intents and purposes, introducing nonlinearity simply means that your activation function must be nonlinear, that is, not a straight line.', 'start': 1306.036, 'duration': 7.725}, {'end': 1320.205, 'text': 'Mathematically, linear functions are polynomials of degree 1 that, when graphed in the xy-plane,', 'start': 1314.679, 'duration': 5.526}, {'end': 1323.289, 'text': 'are straight lines inclined to the x-axis at a certain angle.', 'start': 1320.205, 'duration': 3.084}], 'summary': 'Activation functions introduce nonlinearity in the network to prevent straight-line behavior.', 'duration': 25.117, 'max_score': 1298.172, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1298172.jpg'}], 'start': 722.368, 'title': 'Activation functions in deep learning', 'summary': 'Discusses common terminologies in deep learning, focusing on activation functions, their drawbacks, the need for non-binary activation values, and limitations of linear functions in neural networks. It also covers the importance of activation functions, benefits and drawbacks of specific functions, and considerations for choosing the appropriate activation function.', 'chapters': [{'end': 888.132, 'start': 722.368, 'title': 'Activation functions in deep learning', 'summary': 'Discusses the most common terminologies used in deep learning, focusing on activation functions and their drawbacks, with the need for non-binary activation values, and the limitations of linear functions in neural networks.', 'duration': 165.764, 'highlights': ["The activation function introduces non-linearity and decides a neuron's contribution to the next layer.", 'The drawbacks of binary activation lead to the need for non-binary activation values like probabilities, enabling better classification of neurons into classes.', 'The limitations of linear functions in neural networks are discussed, particularly regarding the derivative and gradient descent.']}, {'end': 1340.59, 'start': 888.132, 'title': 'Activation functions in neural networks', 'summary': 'Discusses the importance of activation functions in neural networks, including the limitations of linear functions, the benefits and drawbacks of sigmoid, tanh, and ReLU functions, and the considerations for choosing the appropriate activation function based on the nature of the function being approximated.', 'duration': 452.458, 'highlights': ['The entire neural network of dozens of layers can be replaced by a single layer if all connected layers are linear in nature.', 'Sigmoid and tanh functions are non-linear in nature, allowing the stacking of layers and preventing activations from blowing up, but they suffer from the vanishing gradient problem.', 'ReLU is non-linear, unbounded, and less computationally expensive, allowing for sparse activation and efficient processing, but it faces the dying ReLU problem.', 'Choosing the appropriate activation function depends on the characteristics of the function being approximated, with 
considerations for faster training processes and convergence.']}], 'duration': 618.222, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c722368.jpg', 'highlights': ["The activation function introduces non-linearity and decides a neuron's contribution to the next layer.", 'The drawbacks of binary activation lead to the need for non-binary activation values like probabilities, enabling better classification of neurons into classes.', 'The limitations of linear functions in neural networks are discussed, particularly regarding the derivative and gradient descent.', 'Choosing the appropriate activation function depends on the characteristics of the function being approximated, with considerations for faster training processes and convergence.', 'ReLU is non-linear, unbounded, and less computationally expensive, allowing for sparse activation and efficient processing, but it faces the dying ReLU problem.', 'Sigmoid and tanh functions are non-linear in nature, allowing the stacking of layers and preventing activations from blowing up, but they suffer from the vanishing gradient problem.', 'The entire neural network of dozens of layers can be replaced by a single layer if all connected layers are linear in nature.']}, {'end': 2105.408, 'segs': [{'end': 1412.912, 'src': 'embed', 'start': 1383.314, 'weight': 0, 'content': [{'end': 1384.935, 'text': 'There are plenty of loss functions out there.', 'start': 1383.314, 'duration': 1.621}, {'end': 1389.798, 'text': 'For example, under regression, we have squared error loss, absolute error loss, and Huber loss.', 'start': 1384.955, 'duration': 4.843}, {'end': 1394.062, 'text': 'In binary classification, we have binary cross entropy and hinge loss.', 'start': 1390.099, 'duration': 3.963}, {'end': 1401.846, 'text': 'In multi-class classification problems, we have the multi-class cross entropy and the Kullback-Leibler divergence loss, and so on.', 'start': 1394.702, 'duration': 7.144}, {'end': 1406.889, 'text': "The choice of the best function really depends on what kind of project you're working on.", 'start': 1403.327, 'duration': 3.562}, {'end': 1409.57, 'text': 'Different projects require different loss functions.', 'start': 1407.349, 'duration': 2.221}, {'end': 1412.912, 'text': "Now, I don't want to talk any further on loss functions right now.", 'start': 1410.271, 'duration': 2.641}], 'summary': 'Various loss functions exist for different tasks, such as regression and classification.', 'duration': 29.598, 'max_score': 1383.314, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1383314.jpg'}, {'end': 1550.021, 'src': 'embed', 'start': 1521.489, 'weight': 1, 'content': [{'end': 1525.331, 'text': 'We now come to gradient descent, often called the granddaddy of optimizers.', 'start': 1521.489, 'duration': 3.842}, {'end': 1537.034, 'text': 'Gradient descent is an iterative algorithm that starts off at a random point in the loss function and travels down its slope in steps until it reaches the lowest point or the minimum of the function.', 'start': 1526.701, 'duration': 10.333}, {'end': 1540.339, 'text': 'It is the most popular optimizer we use nowadays.', 'start': 1537.976, 'duration': 2.363}, {'end': 1542.722, 'text': "It's fast, robust and flexible.", 'start': 1540.639, 'duration': 2.083}, {'end': 1544.484, 'text': "And here's how it works.", 'start': 1543.483, 'duration': 1.001}, {'end': 1550.021, 'text': 'First, we calculate what a small 
change in each individual weight would do to the loss function.', 'start': 1545.478, 'duration': 4.543}], 'summary': 'Gradient descent is a popular, fast, and robust iterative algorithm for optimizing functions.', 'duration': 28.532, 'max_score': 1521.489, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1521489.jpg'}, {'end': 1660.154, 'src': 'embed', 'start': 1630.325, 'weight': 3, 'content': [{'end': 1633.346, 'text': 'can hinder your ability to minimize the loss function.', 'start': 1630.325, 'duration': 3.021}, {'end': 1639.005, 'text': "We don't want to make a jump so large that we skip over the optimal value for a given weight.", 'start': 1634.403, 'duration': 4.602}, {'end': 1642.987, 'text': "To make sure this doesn't happen, we use a variable called the learning rate.", 'start': 1639.526, 'duration': 3.461}, {'end': 1650.451, 'text': 'This thing is usually just a small number like 0.001 that we multiply the gradients by to scale them.', 'start': 1643.928, 'duration': 6.523}, {'end': 1654.173, 'text': 'This ensures that any changes we make to our weight are pretty small.', 'start': 1650.871, 'duration': 3.302}, {'end': 1660.154, 'text': 'In math talk, taking steps that are too large can mean that the algorithm will never converge to an optimum.', 'start': 1654.713, 'duration': 5.441}], 'summary': 'Optimize loss function with appropriate learning rate to ensure convergence.', 'duration': 29.829, 'max_score': 1630.325, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1630325.jpg'}, {'end': 1712.513, 'src': 'embed', 'start': 1685.276, 'weight': 4, 'content': [{'end': 1690.743, 'text': 'Instead of calculating the gradients for all your training examples on every pass of the gradient descent,', 'start': 1685.276, 'duration': 5.467}, {'end': 1695.029, 'text': "it's sometimes more efficient to only use a subset of the training examples each time.", 'start': 1690.743, 'duration': 4.286}, {'end': 1702.979, 'text': 'Stochastic gradient descent is an implementation that either uses batches of examples at a time or random examples on each pass.', 'start': 1695.409, 'duration': 7.57}, {'end': 1707.032, 'text': 'Stochastic gradient descent uses the concept of momentum.', 'start': 1704.031, 'duration': 3.001}, {'end': 1712.513, 'text': 'Momentum accumulates gradients of the past steps to dictate what might happen in the next steps.', 'start': 1707.472, 'duration': 5.041}], 'summary': 'Stochastic gradient descent uses subsets of training examples for efficiency and employs momentum to dictate next steps.', 'duration': 27.237, 'max_score': 1685.276, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1685276.jpg'}, {'end': 1777.505, 'src': 'embed', 'start': 1747.699, 'weight': 2, 'content': [{'end': 1751.001, 'text': 'The adaptive learning rate tends to get really, really small over time.', 'start': 1747.699, 'duration': 3.302}, {'end': 1756.045, 'text': 'RMSProp is a special version of Adagrad developed by Professor Geoffrey Hinton.', 'start': 1751.522, 'duration': 4.523}, {'end': 1762.45, 'text': 'Instead of letting all the gradients accumulate for momentum, it accumulates gradients in a fixed window.', 'start': 1756.726, 'duration': 5.724}, {'end': 1770.418, 'text': 'RMSProp is similar to Adadelta, which is another optimizer that seeks to solve some of the issues that Adagrad leaves open.', 'start': 1763.391, 'duration': 
7.027}, {'end': 1777.505, 'text': 'Adam stands for adaptive moment estimation and is another way of using past gradients to calculate the current gradient.', 'start': 1770.418, 'duration': 7.087}], 'summary': "RMSProp and Adam are adaptive learning rate optimizers, with RMSProp addressing Adagrad's issues.", 'duration': 29.806, 'max_score': 1747.699, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1747699.jpg'}, {'end': 1836.3, 'src': 'embed', 'start': 1810.066, 'weight': 5, 'content': [{'end': 1816.972, 'text': 'You may have heard me referring to the word parameters quite a bit, and often this word is confused with the term hyperparameters.', 'start': 1810.066, 'duration': 6.906}, {'end': 1821.115, 'text': "In this video, I'm going to outline the basic difference between the two.", 'start': 1817.692, 'duration': 3.423}, {'end': 1829.062, 'text': 'A model parameter is a variable that is internal to the neural network and whose values can be estimated from the data itself.', 'start': 1822.496, 'duration': 6.566}, {'end': 1832.498, 'text': 'They are required by the model when making predictions.', 'start': 1830.057, 'duration': 2.441}, {'end': 1836.3, 'text': 'These values define the skill of the model on your problem.', 'start': 1833.058, 'duration': 3.242}], 'summary': 'Model parameters are internal variables in a neural network, crucial for predictions.', 'duration': 26.234, 'max_score': 1810.066, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1810066.jpg'}, {'end': 2059.541, 'src': 'heatmap', 'start': 1999.218, 'weight': 6, 'content': [{'end': 2003.282, 'text': 'But too many epochs could spell disaster and lead to something called overfitting,', 'start': 1999.218, 'duration': 4.064}, {'end': 2009.208, 'text': "where a model has essentially memorized the patterns in the training data and performs terribly on data it's never seen before.", 'start': 2003.282, 'duration': 5.926}, 
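The pieces above (a loss function, its gradient, and a learning-rate-scaled update) fit together in a few lines. Here is a toy gradient-descent loop that fits a single weight under a squared error loss; the data and the 0.1 learning rate are illustrative assumptions (the 0.001 mentioned above would also work, just with more steps).

```python
import numpy as np

# Toy data: y is roughly 3*x, so the optimal weight is near 3.0
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

w = 0.0                # starting point for the weight
learning_rate = 0.1    # the small scaling factor discussed above

for step in range(100):
    pred = w * x
    loss = np.mean((pred - y) ** 2)       # squared error loss
    grad = np.mean(2.0 * (pred - y) * x)  # what a small change in w does to the loss
    w -= learning_rate * grad             # step down the slope

print(w)  # converges near 3.0
```

Stochastic gradient descent would compute grad on a random batch of examples instead of all 100, and optimizers like momentum, RMSProp, and Adam reshape this same update step.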
{'end': 2015.568, 'text': 'So what is the right number of epochs? Unfortunately, there is no right answer.', 'start': 2011.127, 'duration': 4.441}, {'end': 2018.149, 'text': 'The answer is different for different datasets.', 'start': 2016.048, 'duration': 2.101}, {'end': 2021.59, 'text': 'Sometimes your dataset can include millions of examples.', 'start': 2018.729, 'duration': 2.861}, {'end': 2026.031, 'text': 'Passing this entire dataset at once becomes extremely difficult.', 'start': 2022.09, 'duration': 3.941}, {'end': 2032.753, 'text': 'So what we do instead is divide the dataset into a number of batches, rather than passing the entire dataset once.', 'start': 2026.411, 'duration': 6.342}, {'end': 2038.395, 'text': 'The total number of training examples present in a single batch is called the batch size.', 'start': 2033.453, 'duration': 4.942}, {'end': 2042.216, 'text': 'The number of iterations is the number of batches needed to complete one epoch.', 'start': 2038.715, 'duration': 3.501}, {'end': 2048.157, 'text': 'Note, the number of batches is equal to the number of iterations for one epoch.', 'start': 2043.396, 'duration': 4.761}, {'end': 2052.339, 'text': "Let's say that we have a dataset of 34,000 training examples.", 'start': 2048.838, 'duration': 3.501}, {'end': 2059.541, 'text': 'If we divide the dataset into batches of 500, then it will take 68 iterations to complete one epoch.', 'start': 2052.839, 'duration': 6.702}], 'summary': 'Optimizing model training with batch size and iterations, e.g., 68 iterations for 34,000 examples in batches of 500.', 'duration': 42.998, 'max_score': 1999.218, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1999218.jpg'}, {'end': 2094.344, 'src': 'embed', 'start': 2067.934, 'weight': 8, 'content': [{'end': 2072.076, 'text': 'Before we move on, I do want to mention this, and you will see this a lot in deep learning.', 'start': 2067.934, 'duration': 4.142}, {'end': 2074.897, 'text': 'You will often have a bunch of different choices to make.', 'start': 2072.556, 'duration': 2.341}, 
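The epoch/batch/iteration bookkeeping from the 34,000-example calculation above, written out in Python (the 10-epoch figure is just an added illustration):

```python
dataset_size = 34_000   # training examples, from the example above
batch_size = 500        # examples per batch

iterations_per_epoch = dataset_size // batch_size
print(iterations_per_epoch)   # 68 iterations to complete one epoch

epochs = 10                   # hypothetical number of epochs
total_updates = epochs * iterations_per_epoch
print(total_updates)          # 680 weight updates over the whole training run
```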
{'end': 2079.579, 'text': 'How many hidden layers should I choose? Or which activation function must I use and where?', 'start': 2075.297, 'duration': 4.282}, {'end': 2084.121, 'text': 'And, to be honest, there are no clear-cut guidelines as to what your choice should always be.', 'start': 2080.079, 'duration': 4.042}, {'end': 2086.442, 'text': "That's the fun part about deep learning.", 'start': 2084.92, 'duration': 1.522}, {'end': 2091.703, 'text': "It's extremely difficult to know in the beginning what's the right combination to use for your project.", 'start': 2086.862, 'duration': 4.841}, {'end': 2094.344, 'text': 'What works for me might not work for you.', 'start': 2092.284, 'duration': 2.06}], 'summary': 'In deep learning, there are numerous choices to make, including hidden layers and activation functions, without clear guidelines, making it difficult to determine the right combination for each project.', 'duration': 26.41, 'max_score': 2067.934, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2067934.jpg'}], 'start': 1340.59, 'title': 'Loss functions, optimizers, and model parameters in deep learning', 'summary': 'Covers loss functions such as squared error loss and binary cross entropy, optimizers like gradient descent, stochastic gradient descent, Adagrad, RMSProp, and Adam, and explains the difference between model parameters and hyperparameters, emphasizing the significance of epochs, batch size, and iterations in achieving accurate model predictions.', 'chapters': [{'end': 1806.205, 'start': 1340.59, 'title': 'Loss functions and optimizers in deep learning', 'summary': 'Covers the concept of loss functions, including examples such as squared error loss and binary cross entropy, as well as optimizers like gradient descent, stochastic gradient descent, and other advanced optimizers like Adagrad, RMSProp, and Adam, emphasizing their roles in minimizing the loss function and achieving accurate model predictions.', 'duration': 465.615, 'highlights': ['The chapter covers various loss functions such as squared error loss, absolute error loss, Huber loss, binary cross entropy, hinge loss, multi-class cross entropy, and Kullback–Leibler divergence loss, emphasizing their relevance in different types of data modeling and project requirements.', 'Gradient descent is explained as an iterative algorithm that adjusts individual weights based on their gradients, with the goal of minimizing the loss function, highlighting its popularity, speed, robustness, and flexibility in optimizing neural network models.', 'The concept of learning rate is discussed, emphasizing its role in ensuring appropriate weight adjustments and preventing convergence on local minima, with the explanation of using small numbers like 0.001 to scale gradients and avoid excessively large or small changes in weights.', 'Stochastic gradient descent is introduced as a more efficient implementation of gradient descent, utilizing subsets of training examples, and incorporating momentum to dictate future steps, ultimately reducing computational expense.', 'Advanced optimizers such as Adagrad, RMSProp, and Adam are highlighted for their adaptations of learning rates to individual features, accumulation of gradients in fixed windows, and utilization of past gradients to calculate current gradients, ultimately contributing to the minimization of loss functions in training neural networks.']}, {'end': 2105.408, 'start': 1810.066, 'title': 'Model parameters vs hyperparameters', 'summary': 'Explains the difference between model parameters 
and hyperparameters, the significance of epochs, batch size, and iterations in deep learning models, and the challenges in choosing configurations for deep learning projects.', 'duration': 295.342, 'highlights': ['The chapter explains the difference between model parameters and hyperparameters, emphasizing that model parameters are internal variables of the neural network, while hyperparameters are external configurations that cannot be estimated from the data.', 'It details the importance of epochs in training deep learning models, highlighting that multiple epochs are required for the network to generalize better and improve performance, but excessive epochs can lead to overfitting.', 'The significance of batch size and iterations is explained, illustrating that dividing large datasets into smaller batches and determining the number of iterations for one epoch are crucial for efficient training in deep learning.', 'The chapter also addresses the challenges in choosing configurations for deep learning projects, mentioning that there are no clear-cut guidelines for decisions such as the number of hidden layers, activation functions, and other choices, emphasizing the need for experimentation and learning through trial and error.']}], 'duration': 764.818, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c1340590.jpg', 'highlights': ['The chapter covers various loss functions such as squared error loss, absolute error loss, Huber loss, binary cross entropy, hinge loss, multi-class cross entropy, and Kullback–Leibler divergence loss, emphasizing their relevance in different types of data modeling and project requirements.', 'Gradient descent is explained as an iterative algorithm that adjusts individual weights based on their gradients, with the goal of minimizing the loss function, highlighting its popularity, speed, robustness, and flexibility in optimizing neural network models.', 'Advanced optimizers such as Adagrad, RMSProp, and Adam are highlighted for their adaptations of learning rates to individual features, accumulation of gradients in fixed windows, and utilization of past gradients to calculate current gradients, ultimately contributing to the minimization of loss functions in training neural networks.', 'The concept of learning rate is discussed, emphasizing its role in ensuring appropriate weight adjustments and preventing convergence on local minima, with the explanation of using small numbers like 0.001 to scale gradients and avoid excessively large or small changes in weights.', 'Stochastic gradient descent is introduced as a more efficient implementation of gradient descent, utilizing subsets of training examples, and incorporating momentum to dictate future steps, ultimately reducing computational expense.', 'The chapter explains the difference between model parameters and hyperparameters, emphasizing that model parameters are internal variables of the neural network, while hyperparameters are external configurations that cannot be estimated from the data.', 'It details the importance of epochs in training deep learning models, highlighting that multiple epochs are required for the network to generalize better and improve performance, but excessive epochs can lead to overfitting.', 'The significance of batch size and iterations is explained, illustrating that dividing large datasets into smaller batches and determining the number of iterations for one epoch are crucial for efficient training in deep learning.', 'The 
chapter also addresses the challenges in choosing configurations for deep learning projects, mentioning that there are no clear-cut guidelines for decisions such as the number of hidden layers, activation functions, and other choices, emphasizing the need for experimentation and learning through trial and error.']}, {'end': 2776.669, 'segs': [{'end': 2198.837, 'src': 'embed', 'start': 2169.766, 'weight': 0, 'content': [{'end': 2176.408, 'text': 'During training, a supervised learning algorithm will search for patterns in the data that correlate with the desired outputs.', 'start': 2169.766, 'duration': 6.642}, {'end': 2185.27, 'text': 'After training, it will take in new unseen inputs and will determine which label the new inputs will be classified as based on prior training data.', 'start': 2177.168, 'duration': 8.102}, {'end': 2191.913, 'text': 'The objective of a supervised learning model is to predict the correct label for newly presented input data.', 'start': 2186.15, 'duration': 5.763}, {'end': 2198.924, 'text': 'In its most basic form, a supervised learning algorithm can simply be written as y = f(x),', 'start': 2192.514, 'duration': 6.323}], 'summary': 'Supervised learning predicts labels for new input data based on prior patterns.', 'duration': 29.071, 'max_score': 2169.766, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2169766.jpg'}, {'end': 2263.553, 'src': 'embed', 'start': 2233.137, 'weight': 1, 'content': [{'end': 2237.918, 'text': 'The most common example of classification is determining if an email is spam or not.', 'start': 2233.137, 'duration': 4.781}, {'end': 2244.52, 'text': 'With two classes to choose from, spam or not spam, this problem is called a binary classification problem.', 'start': 2238.539, 'duration': 5.981}, {'end': 2250.202, 'text': 'The algorithm will be given training data with emails that are both spam and not spam,', 'start': 2245.261, 'duration': 4.941}, {'end': 2255.864, 'text': 'and the model will find the features within the data that correlate to either class and create a mapping function.', 'start': 2250.202, 'duration': 5.662}, {'end': 2263.553, 'text': 'Then when provided with an unseen email, the model will use this function to determine whether or not the email is spam.', 'start': 2257.529, 'duration': 6.024}], 'summary': 'Binary classification identifies spam emails, creating a mapping function to make predictions.', 'duration': 30.416, 'max_score': 2233.137, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2233137.jpg'}, {'end': 2317.42, 'src': 'embed', 'start': 2294.791, 'weight': 2, 'content': [{'end': 2303.174, 'text': 'Regression is a predictive statistical process where the model attempts to find the important relationship between dependent and independent variables.', 'start': 2294.791, 'duration': 8.383}, {'end': 2310.257, 'text': 'The goal of a regression algorithm is to predict a continuous number, such as sales, income, and tax cost.', 'start': 2303.934, 'duration': 6.323}, {'end': 2317.42, 'text': 'The equation for a basic linear regression can be written as follows, where x and y represent the features of the data,', 'start': 2310.957, 'duration': 6.463}], 'summary': 'Regression predicts relationships between variables to model continuous numbers such as sales and income.', 'duration': 22.629, 'max_score': 2294.791, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2294791.jpg'}, {'end': 2445.407, 'src': 'embed', 'start': 2417.6, 'weight': 3, 'content': [{'end': 2425.667, 'text': 'Unsupervised learning is a branch of machine learning that is used to manifest underlying patterns in data and is often used in exploratory data analysis.', 'start': 2417.6, 'duration': 8.067}, {'end': 2433.653, 'text': "Unlike supervised learning, unsupervised learning does not use labeled data, but instead focuses on the data's features.", 'start': 2426.768, 'duration': 6.885}, {'end': 2437.877, 'text': 'Labeled training data has a corresponding output for each input.', 'start': 2434.614, 'duration': 3.263}, {'end': 2445.407, 'text': 'The goal of an unsupervised learning algorithm is to analyze data and find important features in that data.', 'start': 2439.284, 'duration': 6.123}], 'summary': 'Unsupervised learning discovers data patterns without labeled data.', 'duration': 27.807, 'max_score': 2417.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2417600.jpg'}, {'end': 2466.803, 'src': 'heatmap', 'start': 2417.6, 'weight': 0.783, 'content': [{'end': 2425.667, 'text': 'Unsupervised learning is a branch of machine learning that is used to manifest underlying patterns in data and is often used in exploratory data analysis.', 'start': 2417.6, 'duration': 8.067}, {'end': 2433.653, 'text': "Unlike supervised learning, unsupervised learning does not use labeled data, but instead focuses on the data's features.", 'start': 2426.768, 'duration': 6.885}, {'end': 2437.877, 'text': 'Labeled training data has a corresponding output for each input.', 'start': 2434.614, 'duration': 3.263}, {'end': 2445.407, 'text': 'The goal of an unsupervised learning algorithm is to analyze data and find important features in that data.', 'start': 2439.284, 'duration': 6.123}, {'end': 2453.351, 'text': 'Unsupervised learning will often find subgroups or hidden patterns within the dataset that a human observer might not pick up on,', 'start': 2446.187, 'duration': 7.164}, {'end': 2455.872, 'text': "and this is extremely useful, as we'll soon find out.", 'start': 2453.351, 'duration': 2.521}, {'end': 2461.594, 'text': 'Unsupervised learning can be of two types, clustering and association.', 'start': 2457.668, 'duration': 3.926}, {'end': 2466.803, 'text': 'Clustering is the simplest and among the most common applications of unsupervised learning.', 'start': 2462.336, 'duration': 4.467}], 'summary': 'Unsupervised learning finds hidden patterns in data, useful for exploratory analysis.', 'duration': 49.203, 'max_score': 2417.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2417600.jpg'}, {'end': 2581.54, 'src': 'embed', 'start': 2563.696, 'weight': 4, 'content': [{'end': 2575.939, 'text': 'This application uses unsupervised learning algorithms where a potential client queries their requirements and Airbnb learns these patterns and recommends stays and experiences which fall under the same group or cluster.', 'start': 2563.696, 'duration': 12.243}, {'end': 2581.54, 'text': 'For example, a person looking for houses in San Francisco might not be interested in finding houses in Boston.', 'start': 2575.959, 'duration': 5.581}], 'summary': 'Airbnb uses unsupervised learning to recommend stays based on client queries.', 'duration': 17.844, 'max_score': 2563.696, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2563696.jpg'}, {'end': 2754.914, 'src': 'embed', 'start': 2730.786, 'weight': 5, 'content': [{'end': 2737.534, 'text': 'And finally, value is a future reward that an agent will receive by taking an action in a particular state.', 'start': 2730.786, 'duration': 6.748}, {'end': 2742.616, 'text': 'A reinforcement learning problem can be best explained through games.', 'start': 2739.591, 'duration': 3.025}, {'end': 2750.908, 'text': "Let's take the game of Pac-Man, where the goal of the agent, or Pac-Man, is to eat the food in the grid while avoiding the ghosts on its way.", 'start': 2743.217, 'duration': 7.691}, {'end': 2754.914, 'text': 'The grid world is the interactive environment for the agent.', 'start': 2751.829, 'duration': 3.085}], 'summary': 'Reinforcement learning involves future rewards in a Pac-Man game environment.', 'duration': 24.128, 'max_score': 2730.786, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2730786.jpg'}], 'start': 2105.948, 'title': 'Learning in deep learning', 'summary': 'Covers supervised learning in deep learning, including types, importance, and examples of classification problems, as well as regression and unsupervised learning, and applications of unsupervised and reinforcement learning in platforms like Airbnb and Amazon.', 'chapters': [{'end': 2294.171, 'start': 2105.948, 'title': 'Supervised learning in deep learning', 'summary': 'Discusses supervised learning in deep learning, covering the definition, process, types, importance, and examples of classification problems, including popular algorithms such as linear classifiers, support vector machines, decision trees, k-nearest neighbors, and random forest.', 'duration': 188.223, 'highlights': ['Supervised learning involves training models on well-labeled data, with the objective of predicting the correct label for newly presented input data, typically expressed as y = f(x).', 'In supervised learning, there are two subcategories: classification and regression, with classification algorithms assigning input values to specific classes based on training data.', 'The most common example of classification is determining if an email is spam or not, with popular classification algorithms including linear classifiers, support vector machines, decision trees, k-nearest neighbors, and random forest.', 'The MNIST handwritten digits dataset serves as an example of a classification problem, where the inputs are images of handwritten digits and the output is a class label representing the digit, utilizing popular algorithms such as linear classifiers and decision trees.', 'Supervised learning is the most common sub-branch of machine learning today, often being the starting point for individuals new to machine learning.']}, {'end': 2523.96, 'start': 2294.791, 'title': 'Regression and unsupervised learning', 
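In the spirit of the y = f(x) mapping and the hours-studied regression example mentioned in the highlights, here is a tiny least-squares fit; the hours and grades are made-up illustrative numbers, not data from the course.

```python
import numpy as np

# Hypothetical hours-studied vs. test-grade data (illustrative values only)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
grade = np.array([52.0, 58.0, 65.0, 70.0, 78.0, 84.0])

# Fit the linear mapping f in y = f(x) = b0 + b1*x by least squares
b1, b0 = np.polyfit(hours, grade, deg=1)
print(b0, b1)  # intercept and slope; the slope is positive, as expected

# Supervised learning's goal: predict the label for a newly presented input
print(b0 + b1 * 7.0)  # predicted grade after 7 hours of study
```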
it also discusses unsupervised learning as a branch of machine learning used to uncover patterns in data, particularly in clustering and association.', 'duration': 229.169, 'highlights': ['The chapter explains the concept of regression as a statistical process for predicting continuous numbers, such as sales, income, and tax cost, and the use of regression algorithms in simple and multiple feature models.', "The chapter describes the application of regression in predicting a student's test grade based on hours studied, demonstrating a clear positive correlation between the independent variable, hours studied, and the dependent variable, test grade.", 'The chapter introduces unsupervised learning as a branch of machine learning used to uncover patterns in data, focusing on clustering and association, and explaining the concept of clustering as the process of grouping data into different clusters or groups.', 'The chapter discusses the types of clustering, including partitional clustering and hierarchical clustering, and mentions commonly used clustering algorithms, such as k-means, expectation maximization, and hierarchical cluster analysis.', 'The chapter highlights the importance of unsupervised learning in finding subgroups or hidden patterns within a dataset that may not be noticeable to a human observer, emphasizing its usefulness in data analysis.']}, {'end': 2776.669, 'start': 2525.683, 'title': 'Unsupervised and reinforcement learning applications', 'summary': 'Discusses the applications of unsupervised learning in platforms like airbnb and amazon, as well as credit card fraud detection, and explains the goals and applications of reinforcement learning, particularly in games like pac-man.', 'duration': 250.986, 'highlights': ['Airbnb uses unsupervised learning algorithms to recommend stays and experiences based on client queries and learned patterns, as well as Amazon for product recommendations.', 'Credit card fraud detection employs unsupervised learning algorithms to identify usage patterns and potential fraudulent behavior, triggering alarms for potential fraud cases.', 'Reinforcement learning aims to maximize cumulative rewards by learning through trial and error in interactive environments, as exemplified in games like Pac-Man.']}], 'duration': 670.721, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2105948.jpg', 'highlights': ['Supervised learning involves training models on well-labeled data, with the objective of predicting the correct label for newly presented input data, typically expressed as y = f(x).', 'The most common example of classification is determining if an email is spam or not, with popular classification algorithms including linear classifiers, support vector machines, decision trees, k-nearest neighbors, and random forest.', 'The chapter explains the concept of regression as a statistical process for predicting continuous numbers, such as sales, income, and tax cost, and the use of regression algorithms in simple and multiple feature models.', 'The chapter introduces unsupervised learning as a branch of machine learning used to uncover patterns in data, focusing on clustering and association, and explaining the concept of clustering as the process of grouping data into different clusters or groups.', 'Airbnb uses unsupervised learning algorithms to recommend stays and experiences based on client queries and learned patterns, as well as Amazon for product recommendations.', 'Reinforcement learning aims 
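The hours-studied example above maps directly onto a one-variable regression; the data points below are invented for illustration, and the fit uses NumPy's least-squares polyfit rather than any library the course prescribes.

```python
# Simple linear regression sketch: predict a test grade from hours studied.
# The data points are made up for illustration; np.polyfit does a least-squares fit.
import numpy as np

hours  = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # independent variable
grades = np.array([52., 58., 66., 71., 78., 85.])   # dependent variable

slope, intercept = np.polyfit(hours, grades, deg=1)  # grade ~ slope*hours + intercept
print(f"grade ~ {slope:.1f} * hours + {intercept:.1f}")
print("predicted grade for 4.5 h:", slope * 4.5 + intercept)
```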
to maximize cumulative rewards by learning through trial and error in interactive environments, as exemplified in games like Pac-Man.']}, {'end': 3864.652, 'segs': [{'end': 2862.237, 'src': 'embed', 'start': 2826.525, 'weight': 6, 'content': [{'end': 2829.887, 'text': 'Instead, what we could do is draw a line that looks something like this.', 'start': 2826.525, 'duration': 3.362}, {'end': 2831.909, 'text': 'Now this really fits our model the best.', 'start': 2830.167, 'duration': 1.742}, {'end': 2834.471, 'text': 'But this is overfitting.', 'start': 2832.869, 'duration': 1.602}, {'end': 2842.566, 'text': "Remember that while training, we show our network some training data and once that's done, we'd expect it to perform almost perfectly.", 'start': 2835.322, 'duration': 7.244}, {'end': 2848.649, 'text': 'The problem with this graph is that, although it is probably the best line of fit for this graph,', 'start': 2843.467, 'duration': 5.182}, {'end': 2851.991, 'text': "it is the best line of fit only if you're considering your training data.", 'start': 2848.649, 'duration': 3.342}, {'end': 2862.237, 'text': "What your network has done in this graph is memorize the patterns in the training data, and it won't give accurate predictions on data it's never seen before.", 'start': 2853.272, 'duration': 8.965}], 'summary': 'Overfitting occurs when a model memorizes training data, leading to inaccurate predictions on unseen data.', 'duration': 35.712, 'max_score': 2826.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2826525.jpg'}, {'end': 2941.271, 'src': 'embed', 'start': 2915.03, 'weight': 3, 'content': [{'end': 2923.254, 'text': 'What Dropout does is that at every iteration it randomly selects some nodes and removes them along with their incoming and outgoing connections,', 'start': 2915.03, 'duration': 8.224}, {'end': 2923.635, 'text': 'as shown.', 'start': 2923.254, 'duration': 0.381}, {'end': 2930.258, 'text': 'So each iteration has a different set of nodes and this results in a different set of outputs.', 'start': 2924.695, 'duration': 5.563}, {'end': 2932.58, 'text': 'So why do these models perform better?', 'start': 2930.959, 'duration': 1.621}, {'end': 2941.271, 'text': 'These models usually perform better than a single model as they capture more randomness and memorize less of the training data,', 'start': 2933.689, 'duration': 7.582}], 'summary': 'Dropout randomly selects nodes, improving model performance by capturing more randomness and memorizing less training data.', 'duration': 26.241, 'max_score': 2915.03, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2915030.jpg'}, {'end': 3007.074, 'src': 'embed', 'start': 2978.412, 'weight': 2, 'content': [{'end': 2984.137, 'text': 'This means that the main task facing our classifier is to be invariant to a wide variety of transformations.', 'start': 2978.412, 'duration': 5.725}, {'end': 2991.363, 'text': 'We can generate new x-y pairs easily just by applying transformations on the x-y inputs in our training set.', 'start': 2984.797, 'duration': 6.566}, {'end': 2999.229, 'text': 'Dataset augmentation has been a particularly effective technique for a specific classification problem, object recognition.', 'start': 2992.165, 'duration': 7.064}, {'end': 3007.074, 'text': 'Images are high dimensional and include an enormous range of factors of variation, many of which can easily be simulated.', 'start': 2999.99,
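The dropout mechanism just described (a fresh random subset of nodes removed at each iteration) can be shown in a few lines of NumPy; the keep probability and activation shapes below are assumptions for illustration.

```python
# Inverted-dropout sketch in NumPy (illustrative; rate and sizes are assumptions).
# At each training iteration a fresh random mask zeroes out a subset of
# activations, so every pass effectively trains a different sub-network.
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8                      # i.e. drop 20% of the nodes

def dropout(activations, training=True):
    if not training:
        return activations           # at test time, use the full network
    mask = rng.random(activations.shape) < keep_prob
    # Scale by 1/keep_prob so the expected activation magnitude is unchanged.
    return activations * mask / keep_prob

h = rng.normal(size=(4, 8))          # a batch of hidden-layer activations
print(dropout(h))                    # different nodes are zeroed on each call
```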
'duration': 7.084}], 'summary': 'Classifier must be invariant to transformations for effective dataset augmentation in object recognition.', 'duration': 28.662, 'max_score': 2978.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2978412.jpg'}, {'end': 3078.781, 'src': 'embed', 'start': 3056.433, 'weight': 1, 'content': [{'end': 3067.099, 'text': 'This means we can obtain a model with better validation set error, and thus hopefully better test set error, by stopping training at the point where the error in the validation set starts to increase.', 'start': 3056.433, 'duration': 10.666}, {'end': 3070.234, 'text': 'This strategy is known as early stopping.', 'start': 3068.032, 'duration': 2.202}, {'end': 3074.698, 'text': 'It is probably the most commonly used form of regularization in deep learning today.', 'start': 3070.894, 'duration': 3.804}, {'end': 3078.781, 'text': 'Its popularity is due to both its effectiveness and its simplicity.', 'start': 3075.338, 'duration': 3.443}], 'summary': 'Early stopping reduces validation error for better test set performance.', 'duration': 22.348, 'max_score': 3056.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c3056433.jpg'}, {'end': 3387.658, 'src': 'embed', 'start': 3362.314, 'weight': 5, 'content': [{'end': 3369.135, 'text': 'And the biggest problem, it turns out, is that plain vanilla feedforward neural networks cannot model sequential data.', 'start': 3362.314, 'duration': 6.821}, {'end': 3372.436, 'text': 'Sequential data is data in a sequence.', 'start': 3369.976, 'duration': 2.46}, {'end': 3375.117, 'text': 'For example, a sentence is a sequence of words.', 'start': 3372.736, 'duration': 2.381}, {'end': 3378.918, 'text': 'A ball moving in space is a sequence of all its position states.', 'start': 3375.557, 'duration': 3.361}, {'end': 3385.397, 'text': 'In the sentence that I had shown you, you understood each word based on your understanding of the previous words.', 'start': 3380.014, 'duration': 5.383}, {'end': 3387.658, 'text': 'This is called sequential memory.', 'start': 3386.097, 'duration': 1.561}], 'summary': 'Feedforward neural networks struggle with sequential data modeling, as seen in language comprehension and object motion tracking.', 'duration': 25.344, 'max_score': 3362.314, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c3362314.jpg'}, {'end': 3500.636, 'src': 'embed', 'start': 3476.966, 'weight': 0, 'content': [{'end': 3485.838, 'text': 'Unlike feedforward neural networks, the recurrent neural network or RNN can operate effectively on sequences of data with variable input length.', 'start': 3476.966, 'duration': 8.872}, {'end': 3488.983, 'text': 'This is how an RNN is usually represented.', 'start': 3486.719, 'duration': 2.264}, {'end': 3492.047, 'text': 'This little loop here is called the feedback loop.', 'start': 3489.824, 'duration': 2.223}, {'end': 3496.614, 'text': 'Sometimes you may find the RNNs depicted over time like this.', 'start': 3493.553, 'duration': 3.061}, {'end': 3500.636, 'text': 'The first path represents the network in the first time step.', 'start': 3497.515, 'duration': 3.121}], 'summary': 'RNNs process variable-length data effectively, leveraging feedback loops for representation over time.', 'duration': 23.67, 'max_score': 3476.966, 'thumbnail':
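Early stopping as described, halting once validation error starts to rise, is typically wired up as a training callback; the sketch below uses tf.keras.callbacks.EarlyStopping on a stand-in model, with the toy data, architecture, and patience value all assumed for illustration.

```python
# Early-stopping sketch with tf.keras (model, data, and patience are assumptions).
import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 20).astype("float32")
y = (x.sum(axis=1) > 10).astype("float32")       # toy binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop once validation loss has not improved for 3 consecutive epochs,
# and roll back to the weights from the best epoch.
stopper = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(x, y, validation_split=0.2, epochs=100, callbacks=[stopper])
```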
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c3476966.jpg'}, {'end': 3864.652, 'src': 'embed', 'start': 3844.687, 'weight': 4, 'content': [{'end': 3854.761, 'text': 'The main difference between a gated RNN and an LSTM is that the gated RNN has two gates to control its memory: an update gate and a reset gate,', 'start': 3844.687, 'duration': 10.074}, {'end': 3859.869, 'text': 'while an LSTM has three gates: an input gate, an output gate and a forget gate.', 'start': 3854.761, 'duration': 5.108}, {'end': 3864.652, 'text': 'RNNs work well for applications that involve sequences of data that change over time.', 'start': 3860.209, 'duration': 4.443}], 'summary': 'Gated RNN has 2 gates, LSTM has 3 gates; RNNs suitable for evolving data.', 'duration': 19.965, 'max_score': 3844.687, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c3844687.jpg'}], 'start': 2776.669, 'title': 'Deep learning concepts', 'summary': 'Covers overfitting in deep learning and techniques to address it, introduces neural network architectures and limitations, and explains the importance of recurrent neural networks in modeling variable length sequences and addressing short-term memory issues.', 'chapters': [{'end': 3078.781, 'start': 2776.669, 'title': 'Overfitting and regularization in deep learning', 'summary': 'Discusses the problem of overfitting in deep learning, emphasizing the impact on model performance and introducing techniques such as dropout and dataset augmentation to address overfitting and improve generalization.', 'duration': 302.112, 'highlights': ['The problem of overfitting is explained using the analogy of fitting a line to a dataset, highlighting the issues of underfitting and overfitting and their impact on model performance.', 'Explanation of the Dropout technique, which involves randomly removing nodes and their connections during each iteration to capture more randomness and force the model to generalize better.', 'Discussion on dataset augmentation as a technique to create fake data and improve generalization, particularly effective for object recognition tasks by simulating transformations on high-dimensional images.', 'Introduction of early stopping as a common form of regularization to prevent overfitting by halting training at the point where the error in the validation set starts to increase, aiming for better test set error.']}, {'end': 3430.846, 'start': 3081.123, 'title': 'Neural network architectures', 'summary': 'Introduces fully connected feedforward neural networks, activation functions, and the limitations of plain vanilla neural networks in modeling sequential data, emphasizing the importance of sequential memory and the shortcomings of traditional neural networks in parameter sharing across time.', 'duration': 349.723, 'highlights': ['The limitations of plain vanilla feedforward neural networks in modeling sequential data due to the absence of sequential memory and parameter sharing across time.', 'The introduction of fully connected feedforward neural networks and the explanation of activation functions including linear, sigmoid, hyperbolic tangent, and rectified linear unit (ReLU).', 'The impact of adding more neurons and hidden layers on the complexity and training time of a neural network, emphasizing the trade-off between network size and computational resources.', 'The explanation of the importance of sequential memory in understanding data points in a sequence, highlighting the
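To make the gated-RNN-versus-LSTM comparison concrete, the sketch below builds one layer of each in tf.keras; the unit count and input shape are assumptions, and the LSTM's extra gating and separate cell state show up as a larger parameter count.

```python
# GRU vs. LSTM sketch in tf.keras (units and input shape are assumptions).
# Both layers consume a (batch, timesteps, features) sequence.
import tensorflow as tf

gru_model  = tf.keras.Sequential([tf.keras.layers.GRU(16, input_shape=(30, 8))])
lstm_model = tf.keras.Sequential([tf.keras.layers.LSTM(16, input_shape=(30, 8))])

print("GRU parameters: ", gru_model.count_params())   # fewer: two gates
print("LSTM parameters:", lstm_model.count_params())  # more: three gates + cell state
```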
inability of traditional neural networks to achieve this.', 'The demonstration of the similarity in meaning between two different sentences, showcasing the challenge of feed-forward neural networks in assigning different weights to similar details at different points in time.']}, {'end': 3864.652, 'start': 3431.887, 'title': 'Understanding recurrent neural networks', 'summary': 'Explains the importance of recurrent neural networks (rnns) in modeling variable length sequences, maintaining sequence order, and dealing with long-term dependencies, highlighting their ability to learn from previous states, the issue of short-term memory caused by vanishing gradients, and the use of gated rnns and lstms to combat the short-term memory problem.', 'duration': 432.765, 'highlights': ['RNNs can operate effectively on sequences of data with variable input length and use feedback loops to propagate information via hidden states throughout time.', "The issue of short-term memory in RNNs is caused by vanishing and exploding gradient problems, leading to the network's difficulty in retaining information from previous steps.", 'Gated RNNs and LSTMs are used to combat the short-term memory problem in RNNs by employing mechanisms called gates to learn and control long-term dependencies.']}], 'duration': 1087.983, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c2776669.jpg', 'highlights': ['RNNs can operate effectively on sequences of data with variable input length and use feedback loops to propagate information via hidden states throughout time.', 'Introduction of early stopping as a common form of regularization to prevent overfitting by halting training at the point where the error in the validation set starts to increase, aiming for better test set error.', 'Discussion on dataset augmentation as a technique to create fake data and improve generalization, particularly effective for object recognition tasks by simulating transformations on high-dimensional images.', 'Explanation of Dropout technique, which involves randomly removing nodes and their connections during each iteration to capture more randomness and force the model to generalize better.', 'Gated RNNs and LSTMs are used to combat the short-term memory problem in RNNs by employing mechanisms called gates to learn and control long-term dependencies.', 'The limitations of plain vanilla feedforward neural networks in modeling sequential data due to the absence of sequential memory and parameter sharing across time.', 'The problem of overfitting is explained using the analogy of fitting a line to a dataset, highlighting the issues of underfitting and overfitting and their impact on model performance.']}, {'end': 4232.911, 'segs': [{'end': 3961.823, 'src': 'heatmap', 'start': 3900.921, 'weight': 0.741, 'content': [{'end': 3908.247, 'text': 'Like a fully connected neural network, a CNN is composed of an input layer, an output layer, and several hidden layers between the two.', 'start': 3900.921, 'duration': 7.326}, {'end': 3912.831, 'text': 'CNNs derive their names from the type of hidden layers it consists of.', 'start': 3909.168, 'duration': 3.663}, {'end': 3921.317, 'text': 'The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers, and normalization layers.', 'start': 3913.491, 'duration': 7.826}, {'end': 3929.862, 'text': 'This means that instead of traditional activation functions we use in feedforward neural networks, convolution and 
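The feedback loop and hidden state described above reduce to a short NumPy loop; the weight scales and sizes below are assumptions for illustration, showing the same weight matrices reused at every time step.

```python
# Minimal recurrent forward pass in NumPy (weights and sizes are assumptions).
# The hidden state h is the "feedback loop" carrying information from
# earlier steps forward, and the same weights are shared across time.
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 8, 5

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h  = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # a toy input sequence
h  = np.zeros(hidden_size)                   # initial hidden state

for t in range(seq_len):
    # The new state depends on the current input AND the previous state.
    h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)
    print(f"step {t}: h[:3] =", h[:3])
```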
pooling functions are used instead.', 'start': 3922.415, 'duration': 7.447}, {'end': 3938.149, 'text': 'More often than not, the input of a CNN is typically a two-dimensional array of neurons which correspond to the pixels of an image, for example,', 'start': 3930.242, 'duration': 7.907}, {'end': 3939.95, 'text': "if you're doing image classification.", 'start': 3938.149, 'duration': 1.801}, {'end': 3943.153, 'text': 'The output layer is typically one-dimensional.', 'start': 3940.711, 'duration': 2.442}, {'end': 3949.458, 'text': 'Convolution is a technique that allows us to extract visual features from a 2D array in small chunks.', 'start': 3943.633, 'duration': 5.825}, {'end': 3956.061, 'text': 'Each neuron in a convolution layer is responsible for a small cluster of neurons in the preceding layer.', 'start': 3950.339, 'duration': 5.722}, {'end': 3961.823, 'text': 'The bounding box that determines a cluster of neurons is called a filter, also called a kernel.', 'start': 3956.901, 'duration': 4.922}], 'summary': 'Cnns consist of input, output, and hidden layers, using convolution and pooling functions to extract visual features.', 'duration': 60.902, 'max_score': 3900.921, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c3900921.jpg'}, {'end': 3938.149, 'src': 'embed', 'start': 3909.168, 'weight': 4, 'content': [{'end': 3912.831, 'text': 'CNNs derive their names from the type of hidden layers it consists of.', 'start': 3909.168, 'duration': 3.663}, {'end': 3921.317, 'text': 'The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers, and normalization layers.', 'start': 3913.491, 'duration': 7.826}, {'end': 3929.862, 'text': 'This means that instead of traditional activation functions we use in feedforward neural networks, convolution and pooling functions are used instead.', 'start': 3922.415, 'duration': 7.447}, {'end': 3938.149, 'text': 'More often than not, the input of a CNN is typically a two-dimensional array of neurons which correspond to the pixels of an image, for example,', 'start': 3930.242, 'duration': 7.907}], 'summary': 'Cnns use convolution, pooling, fully connected & normalization layers for 2d image input.', 'duration': 28.981, 'max_score': 3909.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c3909168.jpg'}, {'end': 4110.216, 'src': 'embed', 'start': 4069.027, 'weight': 0, 'content': [{'end': 4077.589, 'text': 'Convolutional neural networks are used heavily in the field of computer vision and work well for a variety of tasks, including image recognition,', 'start': 4069.027, 'duration': 8.562}, {'end': 4083.011, 'text': 'image processing, image segmentation, video analysis and natural language processing.', 'start': 4077.589, 'duration': 5.422}, {'end': 4091.163, 'text': "In this section, I'm going to discuss the five steps that are common in every deep learning project that you build.", 'start': 4085.7, 'duration': 5.463}, {'end': 4098.648, 'text': 'These can be extended to include various other aspects, but at its very core, they are very fundamentally five steps.', 'start': 4092.124, 'duration': 6.524}, {'end': 4102.731, 'text': 'Data is at the core of what deep learning is all about.', 'start': 4099.988, 'duration': 2.743}, {'end': 4106.434, 'text': 'Your model will only be as powerful as the data you bring.', 'start': 4103.412, 'duration': 3.022}, {'end': 4110.216, 'text': 'Which brings me to the first 
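The filter-sliding-over-pixels idea can be demonstrated with a handful of NumPy lines; the 6x6 image and edge-detecting kernel below are assumptions for illustration (frameworks implement this same sliding-window operation, usually as cross-correlation).

```python
# 2-D convolution sketch in NumPy (kernel and image are assumptions).
# Each output value comes from one small neighbourhood of the input,
# exactly the "filter over a cluster of pixels" idea described above.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1., 0., -1.],      # a simple vertical-edge detector
                        [1., 0., -1.],
                        [1., 0., -1.]])
print(conv2d(image, edge_kernel))           # a 4x4 feature map
```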
step, gathering your data.', 'start': 4107.395, 'duration': 2.821}], 'summary': 'Convolutional neural networks excel in computer vision tasks. five fundamental steps in deep learning projects. data quality crucial.', 'duration': 41.189, 'max_score': 4069.027, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c4069027.jpg'}, {'end': 4196.535, 'src': 'embed', 'start': 4155.844, 'weight': 2, 'content': [{'end': 4163.154, 'text': 'but the general rule of thumb is that the amount of data you need for a well-performing model should be 10 times the number of parameters in that model.', 'start': 4155.844, 'duration': 7.31}, {'end': 4168.044, 'text': "However, this may differ from time to time depending on the type of model you're building.", 'start': 4164.042, 'duration': 4.002}, {'end': 4174.106, 'text': 'For example, in regression analysis, you should use around 10 examples per predictor variable.', 'start': 4168.323, 'duration': 5.783}, {'end': 4180.849, 'text': "For image classification, the minimum you should have is around 1000 images per class that you're trying to classify.", 'start': 4174.747, 'duration': 6.102}, {'end': 4184.451, 'text': 'While quantity of data matters, quality matters too.', 'start': 4181.349, 'duration': 3.102}, {'end': 4188.89, 'text': "There is no use having a lot of data if it's bad data.", 'start': 4185.488, 'duration': 3.402}, {'end': 4193.854, 'text': 'There are certain aspects of quality that tend to correspond to well-performing models.', 'start': 4189.611, 'duration': 4.243}, {'end': 4196.535, 'text': 'One aspect is reliability.', 'start': 4194.854, 'duration': 1.681}], 'summary': 'For a well-performing model, aim for 10 times the parameters in data; e.g., 10 examples per predictor variable in regression, and 1000 images per class in image classification.', 'duration': 40.691, 'max_score': 4155.844, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c4155844.jpg'}], 'start': 3865.453, 'title': 'Cnn architecture and data gathering', 'summary': 'Discusses the architecture and applications of cnns, emphasizing their usage in computer vision and image classification. 
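The 10x-parameters rule of thumb is easy to apply once you can count a model's parameters; the sketch below does so for an arbitrary, assumed architecture using tf.keras's count_params.

```python
# Rule-of-thumb sketch: estimate how much data a model might need
# (10x its parameter count, per the heuristic above). The architecture
# here is an arbitrary assumption just to have something to count.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

n_params = model.count_params()
print(f"trainable parameters: {n_params}")
print(f"rule-of-thumb dataset size: ~{10 * n_params} examples")
```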
it also highlights the importance of data in deep learning, providing guidelines for gathering and selecting data, with examples ranging from 150 images to trillions of data points.', 'chapters': [{'end': 4091.163, 'start': 3865.453, 'title': 'Convolutional neural networks', 'summary': 'Discusses the applications, architecture, and working of convolutional neural networks (cnns), emphasizing their usage in computer vision and various tasks like image classification, with a focus on the five common steps in deep learning projects.', 'duration': 225.71, 'highlights': ['CNNs are heavily used in computer vision for tasks like image recognition, image processing, and video analysis.', 'CNNs consist of convolutional layers, pooling layers, fully connected layers, and normalization layers, and use convolution and pooling functions instead of traditional activation functions.', 'The five common steps in deep learning projects are discussed in the section.']}, {'end': 4232.911, 'start': 4092.124, 'title': 'Data gathering for deep learning', 'summary': 'Emphasizes the importance of data in deep learning, highlighting the crucial steps in gathering and selecting data, and providing guidelines for the quantity and quality of data required, with examples ranging from 150 images to trillions of data points.', 'duration': 140.787, 'highlights': ['Data is at the core of what deep learning is all about.', 'The amount of data you need for a well-performing model should be 10 times the number of parameters in that model.', 'For image classification, the minimum you should have is around 1000 images per class.', 'Reliability refers to the degree in which you can trust your data.', "There is no use having a lot of data if it's bad data."]}], 'duration': 367.458, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c3865453.jpg', 'highlights': ['CNNs are heavily used in computer vision for tasks like image recognition, image processing, and video analysis.', 'Data is at the core of what deep learning is all about.', 'The amount of data you need for a well-performing model should be 10 times the number of parameters in that model.', 'For image classification, the minimum you should have is around 1000 images per class.', 'CNNs consist of convolutional layers, pooling layers, fully connected layers, and normalization layers, and use convolution and pooling functions instead of traditional activation functions.']}, {'end': 5137.575, 'segs': [{'end': 4275.598, 'src': 'embed', 'start': 4233.451, 'weight': 0, 'content': [{'end': 4237.753, 'text': 'Luckily for us, there are plenty of resources on the web that offer good datasets for free.', 'start': 4233.451, 'duration': 4.302}, {'end': 4240.935, 'text': 'Here are a few sites where you can begin your dataset search.', 'start': 4238.433, 'duration': 2.502}, {'end': 4249.122, 'text': 'The UCI Machine Learning Repository maintains around 500 extremely well-maintained datasets that you can use in your deep learning projects.', 'start': 4241.276, 'duration': 7.846}, {'end': 4250.724, 'text': "Kaggle's another one.", 'start': 4249.803, 'duration': 0.921}, {'end': 4253.066, 'text': "You'll love how detailed their datasets are.", 'start': 4251.044, 'duration': 2.022}, {'end': 4258.21, 'text': 'They give you info on the features, data types, number of records, and so on.', 'start': 4253.526, 'duration': 4.684}, {'end': 4261.974, 'text': "You can use their kernel too, and you won't have to download the dataset.", 'start': 
4258.871, 'duration': 3.103}, {'end': 4268.296, 'text': "Google's dataset search is still in beta, but it's one of the most amazing sites that you can find today.", 'start': 4262.774, 'duration': 5.522}, {'end': 4272.237, 'text': 'Reddit too is a great place to request for datasets you want.', 'start': 4269.416, 'duration': 2.821}, {'end': 4275.598, 'text': 'But again, there is a chance of it not being properly organized.', 'start': 4272.717, 'duration': 2.881}], 'summary': "Various websites offer free datasets, including uci ml repository with 500+ well-maintained datasets, detailed datasets on kaggle, and google's beta dataset search.", 'duration': 42.147, 'max_score': 4233.451, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c4233451.jpg'}, {'end': 4323.211, 'src': 'embed', 'start': 4298.175, 'weight': 3, 'content': [{'end': 4303.939, 'text': 'In general, we usually split a dataset into three parts, training, testing, and validating sets.', 'start': 4298.175, 'duration': 5.764}, {'end': 4311.586, 'text': "We train our models with the training set, evaluate it on the validation set and finally, once it's ready to use,", 'start': 4304.7, 'duration': 6.886}, {'end': 4313.927, 'text': 'test it one last time on the testing dataset.', 'start': 4311.586, 'duration': 2.341}, {'end': 4316.949, 'text': 'Now, it is reasonable to ask the following question.', 'start': 4314.688, 'duration': 2.261}, {'end': 4323.211, 'text': 'Why not have two sets, training and testing? In that way, the process will be much simpler.', 'start': 4317.649, 'duration': 5.562}], 'summary': 'Dataset is usually split into three parts: training, testing, and validating sets, but having only two sets can simplify the process.', 'duration': 25.036, 'max_score': 4298.175, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c4298175.jpg'}, {'end': 4535.994, 'src': 'embed', 'start': 4505.222, 'weight': 4, 'content': [{'end': 4508.943, 'text': "Of course, there are a couple of ways to do this, and you can Google them if you'd like.", 'start': 4505.222, 'duration': 3.721}, {'end': 4515.306, 'text': 'Dealing with missing data is one of the most challenging steps in the gathering of data for your deep learning projects.', 'start': 4509.284, 'duration': 6.022}, {'end': 4520.288, 'text': "Unless you're extremely lucky to land with the perfect data set, which is quite rare,", 'start': 4515.646, 'duration': 4.642}, {'end': 4523.829, 'text': 'dealing with missing data will probably take a significant chunk of your time.', 'start': 4520.288, 'duration': 3.541}, {'end': 4528.851, 'text': 'It is quite common in real world problems to miss some values of our data samples.', 'start': 4524.609, 'duration': 4.242}, {'end': 4535.994, 'text': 'This may be due to errors on the data collection, blank spaces on surveys, measurements not applicable and so on.', 'start': 4529.731, 'duration': 6.263}], 'summary': 'Dealing with missing data is a significant challenge in deep learning projects, often consuming a substantial amount of time.', 'duration': 30.772, 'max_score': 4505.222, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c4505222.jpg'}, {'end': 4827.013, 'src': 'embed', 'start': 4779.805, 'weight': 5, 'content': [{'end': 4784.928, 'text': 'The evaluation process allows us to test a model against data it has never seen before.', 'start': 4779.805, 'duration': 5.123}, {'end': 4790.251, 
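The three-way split described above is commonly done with two successive calls to scikit-learn's train_test_split; the 70/15/15 ratio and toy arrays below are assumptions for illustration.

```python
# Three-way split sketch with scikit-learn (ratios are assumptions; 70/15/15).
# train_test_split only makes one cut, so we apply it twice.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

# First cut: 70% train, 30% held out.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, random_state=42)
# Second cut: split the held-out 30% evenly into validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```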
'text': 'And this is meant to be representative of how good the model might perform in the real world.', 'start': 4785.508, 'duration': 4.743}, {'end': 4796.113, 'text': "After the evaluation process, there's a high chance that your model could be optimized further.", 'start': 4791.531, 'duration': 4.582}, {'end': 4801.856, 'text': 'Remember, we started with random weights and biases, and these were fine-tuned during backpropagation.', 'start': 4796.914, 'duration': 4.942}, {'end': 4806.899, 'text': "Well, in quite a few cases, backpropagation won't get it right the first time.", 'start': 4803.237, 'duration': 3.662}, {'end': 4807.999, 'text': "And that's okay.", 'start': 4807.259, 'duration': 0.74}, {'end': 4810.801, 'text': 'There are a few ways to optimize your model further.', 'start': 4808.46, 'duration': 2.341}, {'end': 4816.304, 'text': "Tuning hyperparameters is a good way of optimizing your model's performance.", 'start': 4812.401, 'duration': 3.903}, {'end': 4821.168, 'text': 'One way to do this is by showing the model the entire dataset multiple times.', 'start': 4817.025, 'duration': 4.143}, {'end': 4823.95, 'text': 'That is, by increasing the number of epochs.', 'start': 4821.768, 'duration': 2.182}, {'end': 4827.013, 'text': 'This is sometimes shown to improve accuracy.', 'start': 4824.831, 'duration': 2.182}], 'summary': "Evaluation tests model's real-world performance, can be optimized further through hyperparameter tuning and increasing epochs.", 'duration': 47.208, 'max_score': 4779.805, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c4779805.jpg'}, {'end': 4940.26, 'src': 'embed', 'start': 4908.376, 'weight': 7, 'content': [{'end': 4910.718, 'text': 'Getting more data and regularization.', 'start': 4908.376, 'duration': 2.342}, {'end': 4914.459, 'text': 'Getting more data is usually the best solution.', 'start': 4912.318, 'duration': 2.141}, {'end': 4918.321, 'text': 'A model trained in more data will naturally generalize better.', 'start': 4915.039, 'duration': 3.282}, {'end': 4927.505, 'text': 'Reducing the model size by reducing the number of learnable parameters in the model and with it its learning capacity is another way.', 'start': 4919.841, 'duration': 7.664}, {'end': 4934.916, 'text': 'However, by lowering the capacity of the network, you force it to learn patterns that matter or that minimize the loss.', 'start': 4928.57, 'duration': 6.346}, {'end': 4940.26, 'text': "On the other hand, reducing the network's capacity too much will lead to underfitting.", 'start': 4935.616, 'duration': 4.644}], 'summary': 'Increasing data and reducing model size improves generalization.', 'duration': 31.884, 'max_score': 4908.376, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c4908376.jpg'}], 'start': 4233.451, 'title': 'Deep learning datasets and pre-processing', 'summary': "Covers free datasets for deep learning, including uci machine learning repository, kaggle, and google's dataset search. 
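One concrete form of the regularization mentioned above is an L2 penalty on the weights, which constrains the network's complexity; the architecture and penalty strength below are assumptions for illustration.

```python
# Weight-regularization sketch in tf.keras (penalty strength is an assumption).
# The L2 term penalizes large weights, constraining the network's complexity.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu", input_shape=(20,),
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(
        1, activation="sigmoid",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```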
it also discusses creating and pre-processing datasets, handling missing data, and training neural networks with techniques like model tuning, feature scaling, and addressing overfitting.", 'chapters': [{'end': 4275.598, 'start': 4233.451, 'title': 'Free datasets for deep learning', 'summary': "Highlights various web resources for free datasets, including uci machine learning repository with around 500 well-maintained datasets, detailed datasets on kaggle, and the beta version of google's dataset search.", 'duration': 42.147, 'highlights': ['UCI Machine Learning Repository maintains around 500 extremely well-maintained datasets.', 'Kaggle provides detailed datasets with info on features, data types, and number of records.', "Google's dataset search is one of the most amazing sites for finding datasets today.", 'Reddit is a great place to request datasets but may have issues with organization.']}, {'end': 4504.288, 'start': 4275.998, 'title': 'Creating and pre-processing datasets', 'summary': 'Covers the importance of splitting datasets into training, testing, and validation sets, the logic behind the split, considerations for model tuning, and techniques like cross-validation and time-based splits in pre-processing datasets.', 'duration': 228.29, 'highlights': ['Splitting a dataset into training, testing, and validation sets is crucial, with the general practice being to train models on the training set, evaluate on the validation set, and finally test on the testing dataset.', 'The logic behind having three sets instead of two is to facilitate model tuning through feedback received from the validation set, ensuring that all sets are similar and minimizing skewing.', 'Considerations for determining the split ratio depend on the total number of samples in the data and the complexity of the model, with models with many hyperparameters requiring a large validation set and cross-validation.', 'Cross-validation, especially k-fold cross validation, is a popular technique to avoid overfitting and make multiple splits of the training and validation sets.', 'For time series data, a common technique is to split the data by time, ensuring that the training data is older than the serving data to mirror the lag between training and serving, which is suitable for very large datasets.']}, {'end': 4740.323, 'start': 4505.222, 'title': 'Handling missing data and pre-processing for deep learning', 'summary': 'Discusses the challenges of missing data in deep learning projects, the impact of imbalanced data on model bias, and the importance of feature scaling for improved model performance.', 'duration': 235.101, 'highlights': ['Dealing with missing data is one of the most challenging steps in data gathering for deep learning projects.', 'Imbalanced data can lead to model bias, and downsampling and upweighting are used to address skewed class proportions.', 'Feature scaling, including normalization and standardization, is crucial for deep learning algorithms to perform better.']}, {'end': 5137.575, 'start': 4742.124, 'title': 'Training neural networks', 'summary': 'Discusses training neural networks, including the evaluation process, optimizing models, addressing overfitting, and data augmentation, emphasizing techniques such as tuning hyperparameters, weight regularization, and dropout.', 'duration': 395.451, 'highlights': ['The evaluation process allows us to test a model against data it has never seen before to gauge its real-world performance.', 'Tuning hyperparameters by increasing the number of epochs 
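The normalization and standardization covered in the feature-scaling discussion above are one-liners with scikit-learn; the raw feature values below are assumptions for illustration.

```python
# Feature-scaling sketch with scikit-learn (the raw values are assumptions).
# Standardization rescales each feature to zero mean and unit variance;
# normalization squeezes it into the [0, 1] range.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

print(StandardScaler().fit_transform(X))  # zero mean, unit variance per column
print(MinMaxScaler().fit_transform(X))    # each column mapped to [0, 1]
```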
can improve model accuracy.', 'Addressing overfitting by getting more data or reducing the model size can help the model generalize better.', 'Applying weight regularization to the model can help in addressing overfitting by constraining the complexity of the network.', 'Data augmentation is a method of increasing the dataset without adding new data, which can improve model performance, especially in limited datasets.', 'Dropout is a technique used to reduce overfitting by randomly dropping out units or neurons in the network during training.']}], 'duration': 904.124, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/VyWAvY2CF9c/pics/VyWAvY2CF9c4233451.jpg', 'highlights': ['UCI Machine Learning Repository maintains around 500 extremely well-maintained datasets.', 'Kaggle provides detailed datasets with info on features, data types, and number of records.', "Google's dataset search is one of the most amazing sites for finding datasets today.", 'Splitting a dataset into training, testing, and validation sets is crucial.', 'Dealing with missing data is one of the most challenging steps in data gathering for deep learning projects.', 'The evaluation process allows us to test a model against data it has never seen before to gauge its real-world performance.', 'Tuning hyperparameters by increasing the number of epochs can improve model accuracy.', 'Addressing overfitting by getting more data or reducing the model size can help the model generalize better.']}], 'highlights': ["Deep learning's role in revolutionizing medical diagnosis, surpassing physicians in detecting cancer, underscores its transformative impact on healthcare.", 'Deep learning applications range from board games like Go to self-driving vehicles, fake news detection, and earthquake prediction.', 'The chapter covers various loss functions such as squared error loss, absolute error loss, Huber loss, binary cross entropy, hinge loss, multi-class cross entropy, and Kullback–Leibler divergence loss, emphasizing their relevance in different types of data modeling and project requirements.', 'Supervised learning involves training models on well-labeled data, with the objective of predicting the correct label for newly presented input data, typically expressed as y = f(x).', 'RNNs can operate effectively on sequences of data with variable input length and use feedback loops to propagate information via hidden states throughout time.', 'CNNs are heavily used in computer vision for tasks like image recognition, image processing, and video analysis.', 'UCI Machine Learning Repository maintains around 500 extremely well-maintained datasets.', 'The amount of data you need for a well-performing model should be 10 times the number of parameters in that model.', 'Splitting a dataset into training, testing, and validation sets is crucial.', 'Addressing overfitting by getting more data or reducing the model size can help the model generalize better.']}