title
Data - Deep Learning and Neural Networks with Python and Pytorch p.2
description
So now that you know the basics of what PyTorch is, let's apply it using a basic neural network example. The very first thing we have to consider is our data.
Text-based tutorials and sample code: https://pythonprogramming.net/data-deep-learning-neural-network-pytorch/
#pytorch #deeplearning #machinelearning
detail
{'title': 'Data - Deep Learning and Neural Networks with Python and Pytorch p.2', 'heatmap': [{'end': 1018.546, 'start': 980.454, 'weight': 1}], 'summary': 'Emphasizes the significance of data in neural network training, noting that 90% of time and energy is spent on acquiring and pre-processing data. it introduces working with the mnist dataset and torch vision package, covering torchvision vision data sets, custom data sets, and data loading. it discusses the importance of managing training and testing datasets, data shuffling for generalization, and balancing data for neural networks.', 'chapters': [{'end': 133.6, 'segs': [{'end': 91.328, 'src': 'embed', 'start': 25.071, 'weight': 0, 'content': [{'end': 36.381, 'text': 'pre-processing your data and how you are going to iterate over your data pretty much consumes, i would say, 90 of your time and energy.', 'start': 25.071, 'duration': 11.31}, {'end': 40.946, 'text': 'uh, at least in terms of thinking about your model, obviously training time.', 'start': 36.381, 'duration': 4.565}, {'end': 45.17, 'text': "When you're just waiting for the model to learn stuff, it can take a long time.", 'start': 41.646, 'duration': 3.524}, {'end': 53.697, 'text': "But you, the work that you are going to be putting in, this is the step where you're going to be doing probably the majority of your work.", 'start': 45.83, 'duration': 7.867}, {'end': 58.902, 'text': "So to begin, what we're going to do is work with a kind of toy data set.", 'start': 54.498, 'duration': 4.404}, {'end': 60.003, 'text': "It's going to be MNIST.", 'start': 59.022, 'duration': 0.981}, {'end': 62.686, 'text': 'This is a really popular data set to use for beginners.', 'start': 60.063, 'duration': 2.623}, {'end': 66.269, 'text': "One, because it's a machine learning, learnable data set.", 'start': 63.286, 'duration': 2.983}, {'end': 67.13, 'text': "Like it's really simple.", 'start': 66.309, 'duration': 0.821}, {'end': 68.151, 'text': "There's no question.", 
'start': 67.33, 'duration': 0.821}, {'end': 69.252, 'text': 'We can definitely learn this.', 'start': 68.191, 'duration': 1.061}, {'end': 70.293, 'text': 'We can tinker with it.', 'start': 69.292, 'duration': 1.001}, {'end': 73.617, 'text': "It just, it's a good starting data set.", 'start': 71.715, 'duration': 1.902}, {'end': 74.958, 'text': "So that's what we're going to use.", 'start': 73.757, 'duration': 1.201}, {'end': 79.803, 'text': "And we're going to use a package called Torch Vision, which you should already have installed if you haven't.", 'start': 74.978, 'duration': 4.825}, {'end': 81.484, 'text': 'Just pip install.', 'start': 80.283, 'duration': 1.201}, {'end': 84.705, 'text': 'it should just be pip install.', 'start': 81.484, 'duration': 3.221}, {'end': 86.446, 'text': 'Install torch vision like.', 'start': 84.705, 'duration': 1.741}, {'end': 89.027, 'text': 'then you can get it.', 'start': 86.446, 'duration': 2.581}, {'end': 90.507, 'text': 'so Fun fact.', 'start': 89.027, 'duration': 1.48}, {'end': 91.328, 'text': 'I also look.', 'start': 90.507, 'duration': 0.821}], 'summary': 'Pre-processing and iterating over data consumes 90% of time, starting with mnist dataset using torch vision.', 'duration': 66.257, 'max_score': 25.071, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs25071.jpg'}], 'start': 1.624, 'title': 'Data in deep learning', 'summary': 'Emphasizes the significance of data in neural network training, noting that 90% of the time and energy is spent on acquiring and pre-processing data. it also introduces working with the mnist dataset and torch vision package for beginners.', 'chapters': [{'end': 133.6, 'start': 1.624, 'title': 'Data in deep learning with python and pytorch', 'summary': 'Focuses on the importance of data in neural network training, highlighting that 90% of the time and energy is consumed in acquiring and pre-processing data. 
It introduces working with the MNIST dataset and the torchvision package for beginners.', 'duration': 131.976, 'highlights': ['Acquiring and pre-processing data consumes 90% of time and energy when thinking about the model, excluding training time. The step of acquiring data, pre-processing your data and how you are going to iterate over your data pretty much consumes, I would say, 90% of your time and energy.', "Introduction to working with the MNIST dataset for beginners in machine learning. Working with a kind of toy data set. It's going to be MNIST. This is a really popular data set to use for beginners.", "Usage of the torchvision package for loading vision datasets and transforms in PyTorch. We're going to use a package called torchvision, which you should already have installed. If you haven't, just pip install torchvision."]}], 'duration': 131.976, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs1624.jpg', 'highlights': ['Acquiring and pre-processing data consumes 90% of time and energy when thinking about the model, excluding training time.', 'Introduction to working with the MNIST dataset for beginners in machine learning.', 'Usage of the torchvision package for loading vision datasets and transforms in PyTorch.']}, {'end': 437.655, 'segs': [{'end': 224.166, 'src': 'embed', 'start': 133.6, 'weight': 0, 'content': [{'end': 145.847, 'text': "so, uh, what we're going to do first is we're going to import, uh, torch and then we're going to import torchvision and adjust my microphone here.", 'start': 133.6, 'duration': 12.247}, {'end': 153.753, 'text': "uh, the other thing we're gonna do is from torchvision we're going to import transforms and datasets.", 'start': 145.847, 'duration': 7.906}, {'end': 156.995, 'text': "So let's make sure that import works at least.", 'start': 155.114, 'duration': 1.881}, {'end': 157.976, 'text': 'It does, cool.', 'start': 157.376, 'duration': 0.6}, {'end': 159.958, 'text': 'So what is TorchVision?', 
'start': 158.577, 'duration': 1.381}, {'end': 165.582, 'text': "So Torch comes with a bunch of data sets and I can't remember now if other data sets exist now in TorchVision.", 'start': 160.098, 'duration': 5.484}, {'end': 172.447, 'text': "that aren't vision tasks, but basically it's a collection of data that is used for vision.", 'start': 165.582, 'duration': 6.865}, {'end': 176.21, 'text': 'So most training data sets with neural networks have something to do with vision,', 'start': 172.508, 'duration': 3.702}, {'end': 181.254, 'text': 'just because Vision seems to be the big thing that we benchmark against.', 'start': 176.21, 'duration': 5.044}, {'end': 184.217, 'text': 'I wish there were more data sets for other tasks.', 'start': 181.274, 'duration': 2.943}, {'end': 189.981, 'text': "I mean there's some, but vision is clearly like the main interest that people are working with,", 'start': 184.977, 'duration': 5.004}, {'end': 195.205, 'text': 'just because really neural networks are solving vision tasks and other machine learning tasks.', 'start': 189.981, 'duration': 5.224}, {'end': 198.068, 'text': "Algorithms just haven't been able to do that.", 'start': 195.846, 'duration': 2.222}, {'end': 204.533, 'text': 'So just in terms of just money from like investments and business interest.', 'start': 198.208, 'duration': 6.325}, {'end': 209.197, 'text': "it tends to be vision tasks, because that's like a low hanging fruit that we can do something with right now.", 'start': 204.533, 'duration': 4.664}, {'end': 213.621, 'text': 'But obviously there are other tasks as well, especially like when it comes to like.', 'start': 210.238, 'duration': 3.383}, {'end': 217.305, 'text': 'advertising is a huge market where we want to be able to predict what people are going to do.', 'start': 213.621, 'duration': 3.684}, {'end': 219.725, 'text': 'uh, and funneling and all that.', 'start': 218.045, 'duration': 1.68}, {'end': 224.166, 'text': 'But anyways, um, torch vision just has a 
bunch of vision data for us.', 'start': 219.765, 'duration': 4.401}], 'summary': 'Torchvision provides vision data sets for neural network training and benchmarking.', 'duration': 90.566, 'max_score': 133.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs133600.jpg'}, {'end': 268.879, 'src': 'embed', 'start': 242.009, 'weight': 5, 'content': [{'end': 247.09, 'text': 'Like cause you again, you got to convert things like categories, or you know, words have to be converted to numbers.', 'start': 242.009, 'duration': 5.081}, {'end': 248.96, 'text': 'all that kind of stuff.', 'start': 248.159, 'duration': 0.801}, {'end': 251.202, 'text': "So you're going to spend a long time doing that kind of thing.", 'start': 248.98, 'duration': 2.222}, {'end': 253.144, 'text': "Here, it's already done for us.", 'start': 251.703, 'duration': 1.441}, {'end': 257.368, 'text': "And then the other thing we typically need to do is batching, which I'll talk about in a minute.", 'start': 253.364, 'duration': 4.004}, {'end': 259.13, 'text': "And again, it's going to be done for us.", 'start': 257.767, 'duration': 1.363}, {'end': 265.135, 'text': "Now I will say in the next not the next tutorial, because we'll probably still be working on this,", 'start': 259.67, 'duration': 5.465}, {'end': 268.879, 'text': 'but in the next kind of model that we build the next data set.', 'start': 265.135, 'duration': 3.744}], 'summary': 'Data preprocessing is streamlined, saving time and effort.', 'duration': 26.87, 'max_score': 242.009, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs242009.jpg'}, {'end': 325.683, 'src': 'embed', 'start': 294.672, 'weight': 4, 'content': [{'end': 298.274, 'text': "You're going to have a training data set and then you're going to have a testing data set.", 'start': 294.672, 'duration': 3.602}, {'end': 307.516, 'text': "And it's important that you separate these 
out as soon as possible before you forget, because, in order to validate your data or your model,", 'start': 298.294, 'duration': 9.222}, {'end': 311.538, 'text': "rather you want to have what's called out of sample testing data.", 'start': 307.516, 'duration': 4.022}, {'end': 314.319, 'text': 'This is the most realistic test that we can have.', 'start': 311.978, 'duration': 2.341}, {'end': 318.2, 'text': "Basically, it's data that has never been seen before by your machine.", 'start': 314.399, 'duration': 3.801}, {'end': 325.683, 'text': 'Because if you use in-sample data, basically the machine, if it has learned to overfit,', 'start': 318.9, 'duration': 6.783}], 'summary': 'Separate training and testing data for realistic model validation.', 'duration': 31.011, 'max_score': 294.672, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs294672.jpg'}, {'end': 411.746, 'src': 'embed', 'start': 380.515, 'weight': 6, 'content': [{'end': 387.056, 'text': "So we're gonna say datasets.MNIST, And then you're going to specify where you want the data to go.", 'start': 380.515, 'duration': 6.541}, {'end': 388.657, 'text': 'I just want it to go locally.', 'start': 387.076, 'duration': 1.581}, {'end': 392.419, 'text': "So I'm going to do open and close parenthesis or quotes.", 'start': 388.697, 'duration': 3.722}, {'end': 396.781, 'text': "And we're going to say this is train equals true.", 'start': 394.02, 'duration': 2.761}, {'end': 399.643, 'text': 'Download equals true.', 'start': 397.402, 'duration': 2.241}, {'end': 401.704, 'text': 'And then any transform.', 'start': 399.663, 'duration': 2.041}, {'end': 406.967, 'text': 'So transform equals transforms dot capital C compose.', 'start': 401.744, 'duration': 5.223}, {'end': 411.746, 'text': 'And then in here, you would paste all the transforms.', 'start': 409.062, 'duration': 2.684}], 'summary': 'Using datasets.mnist to download and transform data locally for training.', 
'duration': 31.231, 'max_score': 380.515, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs380515.jpg'}], 'start': 133.6, 'title': 'Torchvision data sets and dataset management', 'summary': 'Introduces torchvision vision data sets for neural network training, emphasizing the potential for diverse data sets. it also discusses the importance of managing training and testing datasets in machine learning, highlighting the need for out-of-sample testing data and the risk of overfitting. additionally, it covers the process of preparing and formatting data for neural networks with a focus on the mnist dataset.', 'chapters': [{'end': 224.166, 'start': 133.6, 'title': 'Introduction to torchvision data sets', 'summary': 'Introduces torchvision, a collection of vision data sets used for neural network training, highlighting the focus on vision tasks and the potential for more diverse data sets in the future.', 'duration': 90.566, 'highlights': ['TorchVision provides a collection of data sets used for vision tasks, which are commonly used for benchmarking neural networks.', 'Vision tasks are the main focus due to the significant interest and investment in solving vision-related problems with neural networks.', 'While vision tasks dominate, there is potential for diverse data sets for other tasks in the future, such as predicting user behavior in advertising.', 'TorchVision import includes torch, torch vision, transforms, and data sets for initial setup.']}, {'end': 437.655, 'start': 224.586, 'title': 'Managing training and testing data', 'summary': 'Discusses the importance of managing training and testing datasets in machine learning, emphasizing the need for out-of-sample testing data and the risk of overfitting when using in-sample data. 
it also covers the process of preparing and formatting data for neural networks, with a focus on the mnist dataset.', 'duration': 213.069, 'highlights': ['The importance of separating training and testing data is emphasized, as using in-sample data can lead to overfitting, resulting in poor performance on out-of-sample data.', 'The process of preparing and formatting data for neural networks is discussed, including the need to convert categories and words to numbers, as well as the application of transforms to the data.', 'The use of the MNIST dataset for training and testing is demonstrated, with a focus on specifying data location, applying transforms, and converting data to Tensors.']}], 'duration': 304.055, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs133600.jpg', 'highlights': ['TorchVision provides benchmarking data sets for vision tasks.', 'Vision tasks are a significant focus for solving problems with neural networks.', 'Potential for diverse data sets exists for tasks beyond vision, such as predicting user behavior.', 'Importing TorchVision includes torch, torch vision, transforms, and data sets for initial setup.', 'Separating training and testing data is crucial to avoid overfitting and ensure performance on out-of-sample data.', 'Preparing and formatting data for neural networks involves converting categories and words to numbers and applying transforms.', 'The MNIST dataset is used for training and testing, involving specifying data location, applying transforms, and converting data to Tensors.']}, {'end': 886.534, 'segs': [{'end': 485.791, 'src': 'embed', 'start': 438.216, 'weight': 3, 'content': [{'end': 446.903, 'text': 'But later on, you can actually write, like if you really enjoy this, you can write your own data set and you can use this kind of same syntax.', 'start': 438.216, 'duration': 8.687}, {'end': 452.327, 'text': "Because as you'll see, there's a lot of things, like especially 
in this tutorial, it won't become as obvious.", 'start': 446.923, 'duration': 5.404}, {'end': 462.034, 'text': 'But in the coming tutorials, it will start to become pretty obvious how tedious iterating over a data set, for example, can be.', 'start': 452.408, 'duration': 9.626}, {'end': 467.358, 'text': 'I almost think it would be just as tedious to convert it to one of these data sets as well.', 'start': 462.815, 'duration': 4.543}, {'end': 473.082, 'text': 'But anyway, just know that you can do that, and then you can also write your own kind of custom transforms.', 'start': 467.418, 'duration': 5.664}, {'end': 475.983, 'text': "But for now, we'll just use the one that's built in here.", 'start': 473.182, 'duration': 2.801}, {'end': 477.484, 'text': 'And I believe.', 'start': 476.684, 'duration': 0.8}, {'end': 479.486, 'text': "that's a valid line.", 'start': 478.405, 'duration': 1.081}, {'end': 485.791, 'text': "so actually i'm going to go ahead and take this copy paste train will be set to false.", 'start': 479.486, 'duration': 6.305}], 'summary': 'The tutorial highlights the ability to write custom data sets and transforms, and the tediousness of iterating over data sets.', 'duration': 47.575, 'max_score': 438.216, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs438216.jpg'}, {'end': 534.565, 'src': 'embed', 'start': 506.168, 'weight': 2, 'content': [{'end': 509.911, 'text': 'um, Download it, and kind of in a sort of variable here.', 'start': 506.168, 'duration': 3.743}, {'end': 519.616, 'text': "The next thing we want to do is actually Load this into another type of object that's going to help us iterate over that data.", 'start': 510.351, 'duration': 9.265}, {'end': 524.599, 'text': "So let me write these two lines and then I'll explain why we're doing that, because it might not seem obvious Why.", 'start': 519.616, 'duration': 4.983}, {'end': 526.4, 'text': "okay, we've got our training and testing 
data.", 'start': 524.599, 'duration': 1.801}, {'end': 528.902, 'text': 'Why do we need to talk about how to iterate over it?', 'start': 526.781, 'duration': 2.121}, {'end': 534.565, 'text': "So the next thing I'm just going to go ahead and write here is I did not think it was going to do that.", 'start': 529.082, 'duration': 5.483}], 'summary': 'Loading data into an object to iterate over training and testing data.', 'duration': 28.397, 'max_score': 506.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs506168.jpg'}, {'end': 607.341, 'src': 'embed', 'start': 576.607, 'weight': 1, 'content': [{'end': 578.649, 'text': 'And then shuffle will be equal to true.', 'start': 576.607, 'duration': 2.042}, {'end': 579.989, 'text': "And we'll talk about shuffle as well.", 'start': 578.809, 'duration': 1.18}, {'end': 583.431, 'text': 'So first let me just copy paste this line.', 'start': 580.43, 'duration': 3.001}, {'end': 591.856, 'text': "We're going to do that same thing for test set a test and yes.", 'start': 583.451, 'duration': 8.405}, {'end': 598.696, 'text': "Okay So we've separated these out and first batch size.", 'start': 591.996, 'duration': 6.7}, {'end': 607.341, 'text': "This is how many at a time do we wanna pass to our model? 
So in theory, some of these data sets, like this one's not really that big of a data set.", 'start': 598.756, 'duration': 8.585}], 'summary': 'Discussed setting shuffle to true and defining batch size for passing data to the model.', 'duration': 30.734, 'max_score': 576.607, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs576607.jpg'}, {'end': 750.233, 'src': 'embed', 'start': 721.476, 'weight': 6, 'content': [{'end': 723.958, 'text': 'So anyway, more on that later.', 'start': 721.476, 'duration': 2.482}, {'end': 731.664, 'text': "So one reason we have to batch is because our data is just going to be so big that we probably can't any realistic example fit at all on the GPU.", 'start': 724.098, 'duration': 7.566}, {'end': 736.814, 'text': 'The second reason is because we hope this data will generalize.', 'start': 732.485, 'duration': 4.329}, {'end': 745.21, 'text': "So, if you take your entire data set and you pass it through the model, what's going to happen is like, as the model starts to be,", 'start': 737.276, 'duration': 7.934}, {'end': 748.132, 'text': 'to optimize all those little weights and all those little connections.', 'start': 745.21, 'duration': 2.922}, {'end': 750.233, 'text': "remember, there's millions of these.", 'start': 748.132, 'duration': 2.101}], 'summary': 'Batching data is necessary due to large size, millions of weights and connections, and the need for generalization.', 'duration': 28.757, 'max_score': 721.476, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs721476.jpg'}, {'end': 834.29, 'src': 'embed', 'start': 810.393, 'weight': 0, 'content': [{'end': 820.18, 'text': "there's always kind of a sweet spot batch size and, like I said, it usually is between 8 and 64, regardless of like how big your your memory is.", 'start': 810.393, 'duration': 9.787}, {'end': 825.223, 'text': "but usually is just, it's just usually, sometimes 
you're going to go even bigger than that.", 'start': 820.18, 'duration': 5.043}, {'end': 830.107, 'text': 'so, and the reason why you want almost your batch size, you do want to be as big as possible,', 'start': 825.223, 'duration': 4.884}, {'end': 834.29, 'text': "because generally that's going to impact how quickly you can train through your data.", 'start': 830.107, 'duration': 4.183}], 'summary': 'Optimal batch size for training is between 8 and 64, impacting data training speed.', 'duration': 23.897, 'max_score': 810.393, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs810393.jpg'}], 'start': 438.216, 'title': 'Custom data sets and data loading', 'summary': 'Discusses writing custom data sets and transforms, emphasizing the tedious process of iterating over a data set and the importance of batch size and shuffle for efficient training, with recommended batch sizes ranging from 8 to 64.', 'chapters': [{'end': 485.791, 'start': 438.216, 'title': 'Writing custom data sets and transforms', 'summary': 'Discusses the ability to write custom data sets and transforms in the tutorial, highlighting the tedious process of iterating over a data set and the option to use built-in or custom transforms.', 'duration': 47.575, 'highlights': ['The tutorial emphasizes the tedious process of iterating over a data set, suggesting that it will become obvious in the coming tutorials (quantifiable data: future tutorials).', 'It mentions the option to write custom data sets and use custom transforms, while currently using the built-in ones (quantifiable data: custom options).', 'The speaker mentions the possibility of writing custom data sets and using the same syntax, indicating the flexibility and potential for customization (quantifiable data: flexibility).']}, {'end': 886.534, 'start': 485.791, 'title': 'Data loading and iteration', 'summary': 'Covers the process of downloading, loading, and iterating over a dataset, 
emphasizing the importance of batch size and shuffle for efficient training in deep learning models, with recommended batch sizes ranging from 8 to 64.', 'duration': 400.743, 'highlights': ['The importance of batch size is highlighted, with a recommended range of 8 to 64 for efficient training in deep learning models.', "The significance of shuffling the dataset is explained, emphasizing its role in ensuring the model's generalization and effectiveness during training.", 'Explanation about the need for batching data due to the size of the dataset and its impact on memory usage and generalization in the model.', 'The process of downloading and loading the dataset into an object for iteration and dataset manipulation is described.']}], 'duration': 448.318, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs438216.jpg', 'highlights': ['The importance of batch size is highlighted, with a recommended range of 8 to 64 for efficient training in deep learning models.', "The significance of shuffling the dataset is explained, emphasizing its role in ensuring the model's generalization and effectiveness during training.", 'The process of downloading and loading the dataset into an object for iteration and dataset manipulation is described.', 'The tutorial emphasizes the tedious process of iterating over a data set, suggesting that it will become obvious in the coming tutorials (quantifiable data: future tutorials).', 'It mentions the option to write custom data sets and use custom transforms, while currently using the built-in ones (quantifiable data: custom options).', 'The speaker mentions the possibility of writing custom data sets and using the same syntax, indicating the flexibility and potential for customization (quantifiable data: flexibility).', 'Explanation about the need for batching data due to the size of the dataset and its impact on memory usage and generalization in the model.']}, {'end': 1136.644, 'segs': 
[{'end': 955.783, 'src': 'embed', 'start': 905.449, 'weight': 0, 'content': [{'end': 910.213, 'text': 'Whereas if you shuffle, again, the name of the game is generalization.', 'start': 905.449, 'duration': 4.764}, {'end': 913.335, 'text': 'So we want to do everything we can do to generalize.', 'start': 910.613, 'duration': 2.722}, {'end': 921.121, 'text': 'Give the neural network the opportunity to learn general principles rather than just simply figuring out little tricks.', 'start': 914.156, 'duration': 6.965}, {'end': 931.308, 'text': 'Because if there is a quicker route to get to increasing or decreasing loss, really, the neural network is going to take that route.', 'start': 921.241, 'duration': 10.067}, {'end': 939.993, 'text': 'And so you have to constantly be thinking, hmm, how can I better obfuscate overfitment? So anyway, cool.', 'start': 931.488, 'duration': 8.505}, {'end': 943.696, 'text': "So we've talked about train set, batch size, shuffle, true, cool.", 'start': 940.354, 'duration': 3.342}, {'end': 953.621, 'text': "So once we've done that, so again, in most cases, even here, like training and testing, you would have to do that split all on your own.", 'start': 944.416, 'duration': 9.205}, {'end': 955.783, 'text': "You'd have to shuffle your data all on your own.", 'start': 953.862, 'duration': 1.921}], 'summary': 'Emphasize generalization to prevent overfitting and optimize neural network performance.', 'duration': 50.334, 'max_score': 905.449, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs905449.jpg'}, {'end': 1018.546, 'src': 'heatmap', 'start': 980.454, 'weight': 1, 'content': [{'end': 990.096, 'text': "so we're going to say, for data in train set, let's just print data and then we're going to break because we don't want to run over all of them.", 'start': 980.454, 'duration': 9.642}, {'end': 991.877, 'text': "so we'll just run once.", 'start': 990.096, 'duration': 1.781}, {'end': 
995.838, 'text': "so what you have here is uh, it's that entire batch.", 'start': 991.877, 'duration': 3.961}, {'end': 1002.408, 'text': "so it'll be, it'll be 10 examples of handwritten digits and then 10 tensors of the actual output.", 'start': 995.838, 'duration': 6.57}, {'end': 1007.233, 'text': 'so the first example should be a three, the second one is a seven, then a one, a six and so on.', 'start': 1002.408, 'duration': 4.825}, {'end': 1013.28, 'text': 'so, uh, the way that we can actually, uh, confirm that is by saying we could say x, y.', 'start': 1007.233, 'duration': 6.047}, {'end': 1018.546, 'text': 'so this might be a little unclear, but remember, you guys are beyond the basics.', 'start': 1013.28, 'duration': 5.266}], 'summary': 'Each batch from the training loader contains 10 handwritten-digit images and their 10 corresponding labels.', 'duration': 38.092, 'max_score': 980.454, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs980454.jpg'}, {'end': 1074.512, 'src': 'embed', 'start': 1050.539, 'weight': 3, 'content': [{'end': 1061.968, 'text': "it's a tensor object containing, first, a tensor of tensors that is your images, and then, second, a tensor of tensors that are your labels.", 'start': 1050.539, 'duration': 11.429}, {'end': 1068.89, 'text': 'So the way that we can reference a three, for example, would be data, the zeroth, because that is your images.', 'start': 1062.348, 'duration': 6.542}, {'end': 1070.851, 'text': "And then we'll say the zeroth image.", 'start': 1069.25, 'duration': 1.601}, {'end': 1074.512, 'text': 'So this should be an image of three.', 'start': 1071.411, 'duration': 3.101}], 'summary': 'Each batch holds an images tensor and a labels tensor, so data[0][0] references the first image (here, a three).', 'duration': 23.973, 'max_score': 1050.539, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs1050539.jpg'}, {'end': 
1136.644, 'src': 'embed', 'start': 1104.883, 'weight': 4, 'content': [{'end': 1110.387, 'text': "if you don't have this, just pip install matplotlib.", 'start': 1104.883, 'duration': 5.504}, {'end': 1116.833, 'text': "and again, you guys are beyond the basics, I assume, but you would type that into your terminal, not into a Jupyter notebook.", 'start': 1110.387, 'duration': 6.446}, {'end': 1130, 'text': "Anyway. And then we're gonna say plt.imshow, and we want to imshow data[0][0] now.", 'start': 1117.413, 'duration': 12.587}, {'end': 1136.644, 'text': "Here's an issue, and it's not with my typing. I'm just gonna make a new one real quick.", 'start': 1130.06, 'duration': 6.584}], 'summary': 'Install matplotlib using pip, and use plt.imshow to display an image.', 'duration': 31.761, 'max_score': 1104.883, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs1104883.jpg'}], 'start': 887.915, 'title': 'Neural network training and data shuffling', 'summary': 'Discusses the significance of shuffling data for generalization in neural network training, emphasizing the need to prevent overfitting and iteratively increasing complexity, enabling the network to learn general principles. 
It also explains accessing specific elements within a tensor object and utilizing matplotlib to visualize images.', 'chapters': [{'end': 1002.408, 'start': 887.915, 'title': 'Neural network training and generalization', 'summary': 'The chapter discusses the importance of shuffling data for generalization in neural network training, emphasizing the need to avoid overfitting and iteratively increasing complexity, enabling the network to learn general principles.', 'duration': 114.493, 'highlights': ['Shuffling data is crucial for generalization in neural network training: shuffling data allows the neural network to learn general principles rather than specific tricks, promoting better generalization.', 'Emphasizing the need to avoid overfitting: constantly thinking about how to better avoid overfitting is essential for effective neural network training.', 'Iteratively increasing complexity to enable learning of general principles: starting with simple concepts and then slowly ratcheting up the complexity allows the network to learn general principles effectively.']}, {'end': 1136.644, 'start': 1002.408, 'title': 'Accessing and visualizing tensors', 'summary': 'The chapter explains how to access specific elements within a tensor object, such as accessing the zeroth tensor to visualize an image of three, and utilizing matplotlib to display the image.', 'duration': 134.236, 'highlights': ['The chapter explains how to access specific elements within a tensor object, such as accessing the zeroth tensor to visualize an image of three.', 'The chapter demonstrates the use of matplotlib to display the image by importing it and calling plt.imshow.']}], 'duration': 248.729, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs887915.jpg', 'highlights': ['Shuffling data promotes better generalization in neural network training.', 'Constantly guarding against overfitting is essential for effective neural network training.', 'Iteratively 
increasing complexity enables learning of general principles.', 'Accessing specific elements within a tensor object is explained.', 'The use of matplotlib to display images is demonstrated.']}, {'end': 1696.013, 'segs': [{'end': 1175.312, 'src': 'embed', 'start': 1137.124, 'weight': 1, 'content': [{'end': 1149.423, 'text': "Let us print data[0][0]. Boom. Oh, this one actually, that's curious.", 'start': 1137.124, 'duration': 12.299}, {'end': 1151.988, 'text': "Let's do data[0][0].shape.", 'start': 1149.443, 'duration': 2.545}, {'end': 1164.183, 'text': "Cool. So, as you can see, here's an immediate reason why, if you just did tutorials and only used TorchVision, you would immediately, out of the gate,", 'start': 1155.816, 'duration': 8.367}, {'end': 1164.924, 'text': 'be like, wait, what?', 'start': 1164.183, 'duration': 0.741}, {'end': 1171.669, 'text': "Because as soon as you tried to do your own dataset, notice this shape: it's a 1 by 28 by 28.", 'start': 1165.444, 'duration': 6.225}, {'end': 1175.312, 'text': 'That is not a typical image.', 'start': 1171.669, 'duration': 3.643}], 'summary': 'The data shape is 1x28x28, not a typical image.', 'duration': 38.188, 'max_score': 1137.124, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs1137124.jpg'}, {'end': 1235.836, 'src': 'embed', 'start': 1194.322, 'weight': 0, 'content': [{'end': 1199.483, 'text': 'And again, these are the things you have to start thinking about when you want to start feeding it through a neural network.', 'start': 1194.322, 'duration': 5.161}, {'end': 1204.424, 'text': 'The shaping stuff throws a lot of people off really quickly.', 'start': 1199.543, 'duration': 4.881}, {'end': 1210.885, 'text': "So I'm going to do my best to make it clear, when we do, what each number in a shape actually means.", 'start': 1204.484, 'duration': 6.401}, {'end': 1215.226, 'text': 'Even the TensorFlow and PyTorch docs are all the same.', 'start': 1211.405, 
'duration': 3.821}, {'end': 1223.41, 'text': "A lot of times they throw these shapes at you and they don't explain where did that number come from? They just like pull it out of nowhere.", 'start': 1215.806, 'duration': 7.604}, {'end': 1227.052, 'text': "And it's, it's really frustrating when you, when you have no idea how they got it.", 'start': 1223.43, 'duration': 3.622}, {'end': 1230.334, 'text': "Uh, and I'll talk about that when we get there soon.", 'start': 1227.072, 'duration': 3.262}, {'end': 1231.795, 'text': "One day, one day we'll get there.", 'start': 1230.714, 'duration': 1.081}, {'end': 1235.836, 'text': 'Um, but yeah, that, that can make learning very difficult.', 'start': 1232.615, 'duration': 3.221}], 'summary': "Understanding neural network input and shapes can be frustrating, but it's crucial for learning.", 'duration': 41.514, 'max_score': 1194.322, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs1194322.jpg'}, {'end': 1444.57, 'src': 'embed', 'start': 1399.412, 'weight': 3, 'content': [{'end': 1401.453, 'text': 'you got to make sure your data is balanced.', 'start': 1399.412, 'duration': 2.041}, {'end': 1410.598, 'text': 'now there are ways to get around imbalanced data sets by modifying the weights of specific classes when calculating loss,', 'start': 1401.453, 'duration': 9.145}, {'end': 1415.072, 'text': 'but i have never had that work out for me.', 'start': 1412.19, 'duration': 2.882}, {'end': 1418.235, 'text': "there there's research to suggest you could get away with that.", 'start': 1415.072, 'duration': 3.163}, {'end': 1420.436, 'text': 'somehow it has never worked for me.', 'start': 1418.235, 'duration': 2.201}, {'end': 1428.883, 'text': "you generally want your data set to be as balanced as possible, so one way that we can kind of you know, at least confirm a data set's balance is so,", 'start': 1420.436, 'duration': 8.447}, {'end': 1434.847, 'text': "for example, we can just 
make a counter so, and then i'm going to say counter dict.", 'start': 1428.883, 'duration': 5.964}, {'end': 1436.168, 'text': "it's probably a better way to do this.", 'start': 1434.847, 'duration': 1.321}, {'end': 1439.351, 'text': 'actually, we could just make a list and then use counter.', 'start': 1436.168, 'duration': 3.183}, {'end': 1444.57, 'text': 'Hmm, part of me wants to do that.', 'start': 1441.809, 'duration': 2.761}], 'summary': 'Balanced data is crucial for effective model training.', 'duration': 45.158, 'max_score': 1399.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs1399412.jpg'}, {'end': 1690.111, 'src': 'embed', 'start': 1661.1, 'weight': 5, 'content': [{'end': 1666.563, 'text': "In the next tutorial, we'll actually build the neural network and talk about all that.", 'start': 1661.1, 'duration': 5.463}, {'end': 1670.585, 'text': 'But honestly, data is, More important than the neural network.', 'start': 1666.624, 'duration': 3.961}, {'end': 1673.266, 'text': 'So, yeah.', 'start': 1671.066, 'duration': 2.2}, {'end': 1677.948, 'text': 'Quick shout out to some channel members who have been with me now for a month.', 'start': 1673.746, 'duration': 4.202}, {'end': 1687.59, 'text': "We've got Liam Ensby, Musa Kurt, Laurentu, Anik Das, Dylan Dai, Tim Gettings, and Otto Kopecky.", 'start': 1677.988, 'duration': 9.602}, {'end': 1690.111, 'text': 'Thank you guys very much for your support.', 'start': 1688.151, 'duration': 1.96}], 'summary': 'Data is more important than the neural network. shout out to channel members.', 'duration': 29.011, 'max_score': 1661.1, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs1661100.jpg'}], 'start': 1137.124, 'title': 'Data shapes and balancing for neural networks', 'summary': 'Explained the challenges of working with data shapes in pytorch and the importance of understanding specific dimensions. 
It also discusses the importance of balancing data for neural networks and provides a practical example of iterating over a data set to confirm its balance.', 'chapters': [{'end': 1235.836, 'start': 1137.124, 'title': 'Understanding data shapes in PyTorch', 'summary': 'Explains the challenges of working with data shapes in PyTorch, highlighting the importance of understanding the specific dimensions required for input into a neural network, which often confuses beginners.', 'duration': 98.712, 'highlights': ['The chapter emphasizes the need to understand data shapes in PyTorch for neural network input, as the non-standard shape of the data could pose immediate hurdles for beginners. Importance of understanding data shapes, challenges for beginners', 'It is mentioned that the shape of the data in PyTorch, such as 1 by 28 by 28, can be confusing for beginners and needs to be clearly explained. Confusion caused by non-standard data shape', 'The frustration of encountering unexplained data shapes in PyTorch and TensorFlow documentation was highlighted, indicating the difficulty learners face in understanding these concepts. 
Frustration with unexplained data shapes in documentation']}, {'end': 1696.013, 'start': 1236.317, 'title': 'Balancing data for neural networks', 'summary': 'Discusses the importance of balancing data for neural networks, emphasizing the impact of imbalanced data on model performance, and provides a practical example of iterating over a data set to confirm its balance.', 'duration': 459.696, 'highlights': ['The chapter emphasizes the impact of imbalanced data on model performance, with an example where a model learns to predict a specific class to minimize loss, leading to suboptimal performance.', 'It provides a practical example of iterating over a data set to confirm its balance, calculating the percentage distribution of different classes in the data set.', 'The tutorial concludes with a mention of the importance of data over the neural network and a shout-out to channel members for their support.']}], 'duration': 558.889, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/i2yPxY2rOzs/pics/i2yPxY2rOzs1137124.jpg', 'highlights': ['Importance of understanding data shapes, challenges for beginners', 'Confusion caused by non-standard data shape', 'Frustration with unexplained data shapes in documentation', 'The chapter emphasizes the impact of imbalanced data on model performance', 'Practical example of iterating over a data set to confirm its balance', 'Importance of data over the neural network and a shout-out to channel members for their support']}], 'highlights': ['Acquiring and pre-processing data consumes 90% of time and energy when thinking about the model, excluding training time.', 'Importance of managing training and testing datasets, data shuffling for generalization, and balancing data for neural networks.', 'TorchVision provides benchmarking data sets for vision tasks.', 'Separating training and testing data is crucial to avoid overfitting and ensure performance on out-of-sample data.', 'The importance of batch size is 
highlighted, with a recommended range of 8 to 64 for efficient training in deep learning models.', "The significance of shuffling the dataset is explained, emphasizing its role in ensuring the model's generalization and effectiveness during training.", 'The process of downloading and loading the dataset into an object for iteration and dataset manipulation is described.', 'Shuffling data promotes better generalization in neural network training.', 'Importance of understanding data shapes, challenges for beginners', 'The chapter emphasizes the impact of imbalanced data on model performance']}