title
Loading in your own data - Deep Learning basics with Python, TensorFlow and Keras p.2
description
Welcome to a tutorial where we'll be discussing how to load in our own outside datasets, which comes with all sorts of challenges!
First, we need a dataset. Let's grab the Dogs vs Cats dataset from Microsoft: https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765
Text tutorials and sample code: https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex
detail
{'title': 'Loading in your own data - Deep Learning basics with Python, TensorFlow and Keras p.2', 'heatmap': [{'end': 298.021, 'start': 263.358, 'weight': 1}], 'summary': 'Covers loading a dataset of 12,500 cat and dog samples, processing images using numpy and opencv, and creating a training dataset for a neural network, with a focus on achieving a 50-50 split between classes and shuffling data to prevent biased learning. it also discusses challenges of classifying dog images at different resolutions, highlighting the effectiveness at 50x50 resolution.', 'chapters': [{'end': 395.515, 'segs': [{'end': 130.183, 'src': 'embed', 'start': 2.268, 'weight': 0, 'content': [{'end': 10.153, 'text': 'What is going on everybody and welcome to part two of our deep learning with Python, TensorFlow, and Keras tutorial.', 'start': 2.268, 'duration': 7.885}, {'end': 16.315, 'text': "In this tutorial, what we're going to be talking about is how to load in an outside dataset.", 'start': 11.273, 'duration': 5.042}, {'end': 21.439, 'text': "The outside dataset we're going to use is this cats and dogs dataset from Microsoft.", 'start': 16.335, 'duration': 5.104}, {'end': 23.581, 'text': 'It was for initially a Kaggle challenge.', 'start': 21.479, 'duration': 2.102}, {'end': 32.043, 'text': "And the idea is to take pictures of cats and dogs and then identify them by feeding them through a neural network and have the neural network say whether or not that's a cat or a dog.", 'start': 24.201, 'duration': 7.842}, {'end': 34.745, 'text': 'So go ahead and download that data set.', 'start': 32.704, 'duration': 2.041}, {'end': 39.527, 'text': 'And then once you have that data set, let me pull up an example here.', 'start': 34.905, 'duration': 4.622}, {'end': 43.868, 'text': 'What you should see is like this.', 'start': 41.307, 'duration': 2.561}, {'end': 45.949, 'text': 'So you should get two directories.', 'start': 44.249, 'duration': 1.7}, {'end': 47.95, 'text': "These two things I've added 
in.", 'start': 46.009, 'duration': 1.941}, {'end': 56.274, 'text': 'But what you should have is cat and dog, and then in here you should have some images of cats and dogs, in this case a bunch of dogs.', 'start': 48.05, 'duration': 8.224}, {'end': 65.299, 'text': "Each one has 12,500 samples, so you should have plenty of examples to teach a model what's a cat and what's a dog.", 'start': 57.174, 'duration': 8.125}, {'end': 70.955, 'text': "So go ahead and download those, extract those, and then we'll come over here and we will get to work.", 'start': 66.792, 'duration': 4.163}, {'end': 74.217, 'text': "So we're going to import, first of all, numpy as np.", 'start': 71.555, 'duration': 2.662}, {'end': 80.281, 'text': "We're going to import matplotlib.pyplot as plt.", 'start': 74.237, 'duration': 6.044}, {'end': 81.461, 'text': "We're going to import os.", 'start': 80.301, 'duration': 1.16}, {'end': 84.703, 'text': "We're going to import cv2.", 'start': 81.501, 'duration': 3.202}, {'end': 88.066, 'text': "If you don't have numpy, pip install numpy.", 'start': 85.644, 'duration': 2.422}, {'end': 90.167, 'text': "If you don't have matplotlib, pip install matplotlib.", 'start': 88.086, 'duration': 2.081}, {'end': 91.027, 'text': "And if you don't have cv2,", 'start': 90.187, 'duration': 0.84}, {'end': 98.386, 'text': 'you will need to do a pip install opencv-python.', 'start': 92.321, 'duration': 6.065}, {'end': 103.549, 'text': "Alright, once we have those, basically I'm going to use matplotlib just to show the image.", 'start': 100.007, 'duration': 3.542}, {'end': 114.958, 'text': "We're going to use os to iterate through directories and join paths, cv2 to do some image operations, and then numpy to do various array operations.", 'start': 103.569, 'duration': 11.389}, {'end': 122.537, 'text': "So the first thing we're going to do is specify a data directory.", 'start': 118.821, 'duration': 3.716}, {'end': 130.183, 'text': 'My data is 
located in my X files under datasets and pet images.', 'start': 123.898, 'duration': 6.285}], 'summary': 'Tutorial on using a cats and dogs dataset for deep learning with Python, TensorFlow, and Keras.', 'duration': 127.915, 'max_score': 2.268, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE2268.jpg'}, {'end': 231.092, 'src': 'embed', 'start': 206.256, 'weight': 5, 'content': [{'end': 213.64, 'text': 'So we read in that image, and then we say cv2.IMREAD_GRAYSCALE.', 'start': 206.256, 'duration': 7.384}, {'end': 220.305, 'text': "So we're going to convert it to grayscale because one, RGB data is three times the size of grayscale data.", 'start': 213.66, 'duration': 6.645}, {'end': 225.668, 'text': "And I just don't think that color is that essential in this specific task.", 'start': 220.705, 'duration': 4.963}, {'end': 228.65, 'text': 'In a lot of identifying tasks, it is.', 'start': 226.208, 'duration': 2.442}, {'end': 231.092, 'text': 'But at least in the difference between a cat and a dog.', 'start': 228.71, 'duration': 2.382}], 'summary': 'Converting the image to grayscale cuts the data to a third of its RGB size; color is not essential for telling a cat from a dog.', 'duration': 24.836, 'max_score': 206.256, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE206256.jpg'}, {'end': 298.021, 'src': 'heatmap', 'start': 263.358, 'weight': 1, 'content': [{'end': 272.443, 'text': "Again, the only reason I'm using matplotlib here is because I don't know how to do inline in Jupyter Notebook with cv2.", 'start': 263.358, 'duration': 9.085}, {'end': 273.163, 'text': "I'm sure there's a way.", 'start': 272.483, 'duration': 0.68}, {'end': 275.365, 'text': 'If somebody knows it, go ahead and leave it below.', 'start': 273.183, 'duration': 2.182}, {'end': 279.627, 'text': "Then I'm just going to throw a break here and a break just so we can look at this picture real quick.", 
'start': 275.785, 'duration': 3.842}, {'end': 282.408, 'text': "So as you can see, it's a grayscale image of a dog.", 'start': 279.787, 'duration': 2.621}, {'end': 283.929, 'text': 'No surprise there.', 'start': 282.969, 'duration': 0.96}, {'end': 284.91, 'text': "It's kind of what we expected.", 'start': 283.969, 'duration': 0.941}, {'end': 287.392, 'text': 'Also our data.', 'start': 286.331, 'duration': 1.061}, {'end': 290.074, 'text': 'This is what our data looks like image array.', 'start': 287.472, 'duration': 2.602}, {'end': 292.156, 'text': 'Okay, just a bunch of numbers.', 'start': 290.094, 'duration': 2.062}, {'end': 294.518, 'text': "now, What if we didn't convert it to grayscale?", 'start': 292.156, 'duration': 2.362}, {'end': 298.021, 'text': "So in this case, you can see, it's just a bunch of number.", 'start': 294.558, 'duration': 3.463}], 'summary': 'Using matplotlib for image display, grayscale image of a dog, array of numbers.', 'duration': 34.663, 'max_score': 263.358, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE263358.jpg'}, {'end': 374.639, 'src': 'embed', 'start': 346.296, 'weight': 6, 'content': [{'end': 352.881, 'text': 'you know, if at all possible, there are ways to have variable sized images to make classifications on,', 'start': 346.296, 'duration': 6.585}, {'end': 358.525, 'text': "But in the interest of keeping things as simple as possible, We'd like to make everything the same shape.", 'start': 352.881, 'duration': 5.644}, {'end': 360.507, 'text': "So that's what we're going to do next.", 'start': 359.446, 'duration': 1.061}, {'end': 362.489, 'text': 'Now we have to decide on a shape.', 'start': 360.607, 'duration': 1.882}, {'end': 369.295, 'text': "So for example, what if we say image size 50? 
So maybe we're going to try every image is a 50 by 50.", 'start': 362.669, 'duration': 6.626}, {'end': 374.639, 'text': 'So the way we would do that is just new array equals cv2.resize.', 'start': 369.295, 'duration': 5.344}], 'summary': 'To simplify things, all images will be resized to 50x50 for classification.', 'duration': 28.343, 'max_score': 346.296, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE346296.jpg'}], 'start': 2.268, 'title': 'Preparing image data for neural network classification', 'summary': 'Covers loading a dataset consisting of 12,500 samples each of cats and dogs, and processing images using numpy and opencv to simplify data for classification and visualization.', 'chapters': [{'end': 70.955, 'start': 2.268, 'title': 'Loading cats and dogs dataset for neural network classification', 'summary': 'Focuses on loading the cats and dogs dataset from microsoft, originally a kaggle challenge, consisting of 12,500 samples each of cats and dogs, to be used for training a neural network to classify images as cats or dogs.', 'duration': 68.687, 'highlights': ['The tutorial introduces the process of loading an outside dataset, specifically the cats and dogs dataset from Microsoft, which was initially a Kaggle challenge.', 'The dataset consists of 12,500 samples each of cats and dogs, providing ample examples to train a model for image classification.', 'The goal is to use the dataset to train a neural network to identify whether a given image is of a cat or a dog, demonstrating the practical application of deep learning with Python, TensorFlow, and Keras.']}, {'end': 395.515, 'start': 71.555, 'title': 'Image processing with numpy and opencv', 'summary': 'Covers importing necessary libraries, iterating through image categories, converting images to grayscale, and resizing images to a standard size, aiming to simplify image data for classification and visualization.', 'duration': 323.96, 'highlights': 
['Iterating through image categories to process a dataset of pet images The chapter discusses iterating through directories and joining paths using os, aiming to process a pet image dataset.', 'Converting images to grayscale to simplify data and reduce size By using cv2 to convert images to grayscale, the chapter aims to simplify data and reduce the size from RGB to grayscale, making it more manageable for processing.', 'Resizing images to a standard size for normalization The chapter discusses the process of resizing images to a standard size, such as 50x50, to normalize the image data for simplifying classification and visualization.', 'Importing necessary libraries for image processing The chapter covers the process of importing essential libraries like numpy, matplotlib, os, and cv2 for image processing operations.']}], 'duration': 393.247, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE2268.jpg', 'highlights': ['The dataset consists of 12,500 samples each of cats and dogs, providing ample examples to train a model for image classification.', 'The tutorial introduces the process of loading an outside dataset, specifically the cats and dogs dataset from Microsoft, which was initially a Kaggle challenge.', 'The goal is to use the dataset to train a neural network to identify whether a given image is of a cat or a dog, demonstrating the practical application of deep learning with Python, TensorFlow, and Keras.', 'Iterating through image categories to process a dataset of pet images The chapter discusses iterating through directories and joining paths using os, aiming to process a pet image dataset.', 'Importing necessary libraries for image processing The chapter covers the process of importing essential libraries like numpy, matplotlib, os, and cv2 for image processing operations.', 'Converting images to grayscale to simplify data and reduce size By using cv2 to convert images to grayscale, the chapter 
aims to simplify data and reduce the size from RGB to grayscale, making it more manageable for processing.', 'Resizing images to a standard size for normalization The chapter discusses the process of resizing images to a standard size, such as 50x50, to normalize the image data for simplifying classification and visualization.']}, {'end': 617.457, 'segs': [{'end': 437.596, 'src': 'embed', 'start': 398.018, 'weight': 0, 'content': [{'end': 404.427, 'text': "I can still tell that's a dog, but eventually we can't, right? If we do 10, make it a 10 by 10, I can't tell that's a dog.", 'start': 398.018, 'duration': 6.409}, {'end': 405.428, 'text': "I don't think anybody can.", 'start': 404.467, 'duration': 0.961}, {'end': 409.274, 'text': 'We go with 20, still probably not.', 'start': 405.829, 'duration': 3.445}, {'end': 411.917, 'text': 'Maybe though, you might be able to get away with it.', 'start': 409.494, 'duration': 2.423}, {'end': 414.721, 'text': "At 30, it's still pretty hard.", 'start': 411.937, 'duration': 2.784}, {'end': 417.683, 'text': 'but now you can start to see, like you know, like the forearm or, i guess, the wrist.', 'start': 414.721, 'duration': 2.962}, {'end': 420.865, 'text': 'you know the area after the wrist, the hand of a dog.', 'start': 417.683, 'duration': 3.182}, {'end': 422.606, 'text': "i guess i don't know anyways.", 'start': 420.865, 'duration': 1.741}, {'end': 424.467, 'text': "um, that's usually longer in dogs.", 'start': 422.606, 'duration': 1.861}, {'end': 426.829, 'text': 'so i could make this classification at this point.', 'start': 424.467, 'duration': 2.362}, {'end': 429.47, 'text': "but you know, at 50 i'm pretty comfortable.", 'start': 426.829, 'duration': 2.641}, {'end': 434.854, 'text': 'but you do have to be careful because this dog takes up quite a bit of the image, whereas some of these might not like.', 'start': 429.47, 'duration': 5.384}, {'end': 437.596, 'text': "for example, i'm sure i can find one eventually.", 'start': 
434.854, 'duration': 2.742}], 'summary': 'Identifying the dog gets harder as the image shrinks; the speaker is comfortable at 50x50, with the caveat that this dog fills much of the frame.', 'duration': 39.578, 'max_score': 398.018, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE398018.jpg'}, {'end': 572.38, 'src': 'embed', 'start': 545.114, 'weight': 1, 'content': [{'end': 548.699, 'text': 'Whatever that category is, so 0 for a dog, 1 for a cat.', 'start': 545.114, 'duration': 3.585}, {'end': 551.826, 'text': "Then we're going to iterate over the images.", 'start': 550.165, 'duration': 1.661}, {'end': 553.027, 'text': "We don't need to show them.", 'start': 552.106, 'duration': 0.921}, {'end': 554.428, 'text': "We don't need to break anymore.", 'start': 553.067, 'duration': 1.361}, {'end': 558.551, 'text': 'And all I want to do now is resize with this new array.', 'start': 555.128, 'duration': 3.423}, {'end': 559.932, 'text': "So we'll come down here.", 'start': 558.971, 'duration': 0.961}, {'end': 563.034, 'text': "We'll perform that resizing operation.", 'start': 560.272, 'duration': 2.762}, {'end': 567.917, 'text': "And I think that's about it.", 'start': 564.815, 'duration': 3.102}, {'end': 572.38, 'text': 'So now we just want to append this to our training data list above there.', 'start': 567.977, 'duration': 4.403}], 'summary': 'Iterates over the images, resizes each one, and appends the result to the training data list.', 'duration': 27.266, 'max_score': 545.114, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE545114.jpg'}, {'end': 621.5, 'src': 'embed', 'start': 597.048, 'weight': 5, 'content': [{'end': 602.989, 'text': "So, except Exception as e, and you know what I'm going to do? I'm just going to pass.", 'start': 597.048, 'duration': 5.941}, {'end': 605.33, 'text': "Actually, I already know that there are some that are broken.", 'start': 602.989, 'duration': 2.341}, {'end': 609.072, 'text': 
"Normally you would throw the exception so you could read it and figure out what's going wrong.", 'start': 605.97, 'duration': 3.102}, {'end': 611.574, 'text': "But I'm going to go ahead and just pass.", 'start': 609.693, 'duration': 1.881}, {'end': 617.457, 'text': "But there's like, you'll get like an OS error and some other warning information and all that fun stuff.", 'start': 612.594, 'duration': 4.863}, {'end': 618.878, 'text': "But I'll just pass for now.", 'start': 617.517, 'duration': 1.361}, {'end': 621.5, 'text': 'Create training data.', 'start': 619.339, 'duration': 2.161}], 'summary': 'Handling broken exceptions by passing instead of throwing, resulting in os errors and warning information.', 'duration': 24.452, 'max_score': 597.048, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE597048.jpg'}], 'start': 398.018, 'title': 'Dog classification and training data creation', 'summary': 'Discusses challenges of classifying dog images at different resolutions, noting that at 50x50 resolution, it becomes easier to identify the dog. 
additionally, it covers the process of creating a training data set for a neural network, including iterating through images, mapping categories to numerical values, resizing images, and handling exceptions, with the goal of appending the processed data to the training data list.', 'chapters': [{'end': 461.48, 'start': 398.018, 'title': 'Dog classification at various resolutions', 'summary': 'Discusses the challenges of classifying a dog image at different resolutions, noting that at 50x50 resolution, it becomes easier to identify the dog, cautioning about the impact of image size on classification accuracy.', 'duration': 63.462, 'highlights': ['At 50x50 resolution, the speaker is pretty comfortable classifying the image as a dog.', 'The speaker notes challenges in identifying the dog at 10x10, 20x20, and 30x30 resolutions.', 'Caution is advised when the dog takes up a significant portion of the image, as it might impact classification accuracy.']}, {'end': 617.457, 'start': 463.197, 'title': 'Creating training data for neural network', 'summary': 'Covers the process of creating a training data set for a neural network, including iterating through images, mapping categories to numerical values, resizing images, and handling exceptions, with the goal of appending the processed data to the training data list.', 'duration': 154.26, 'highlights': ["The process involves creating an empty training data list and iterating through images to build the dataset, while also mapping categories (e.g., 'dog' and 'cat') to numerical values (e.g., 0 and 1) for classification.", 'The resizing operation is performed on the images before appending them to the training data list.', 'The chapter emphasizes the importance of handling exceptions, such as broken images, within the dataset.']}], 'duration': 219.439, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE398018.jpg', 'highlights': ['At 50x50 resolution, the speaker is pretty 
comfortable classifying the image as a dog.', "The process involves creating an empty training data list and iterating through images to build the dataset, while also mapping categories (e.g., 'dog' and 'cat') to numerical values (e.g., 0 and 1) for classification.", 'Caution is advised when the dog takes up a significant portion of the image, as it might impact classification accuracy.', 'The resizing operation is performed on the images before appending them to the training data list.', 'The speaker notes challenges in identifying the dog at 10x10, 20x20, and 30x30 resolutions.', 'The chapter emphasizes the importance of handling exceptions, such as broken images, within the dataset.']}, {'end': 1129.916, 'segs': [{'end': 707.93, 'src': 'embed', 'start': 645.208, 'weight': 0, 'content': [{'end': 653.696, 'text': 'So in the case of a binary choice like we have, cats and dogs, you want to make sure you have 50-50, right? Just as many cats and just as many dogs.', 'start': 645.208, 'duration': 8.488}, {'end': 661.763, 'text': 'Now, sometimes you can have different numbers and then when you train the model, you can inform the model and say, hey, these are our class weights.', 'start': 653.836, 'duration': 7.927}, {'end': 663.204, 'text': 'So they have weights that you can pass.', 'start': 661.883, 'duration': 1.321}, {'end': 674.451, 'text': 'And the way this will work is it will weight the loss a little differently in an attempt to handle for your imbalanced dataset.', 'start': 664.005, 'duration': 10.446}, {'end': 677.954, 'text': 'But if at all possible, you definitely want to balance the dataset instead.', 'start': 674.812, 'duration': 3.142}, {'end': 683.837, 'text': "So let's say you had a dataset that was 75% dog, 25% cat.", 'start': 678.094, 'duration': 5.743}, {'end': 685.138, 'text': 'you feed that through the neural network.', 'start': 683.837, 'duration': 1.301}, {'end': 687.839, 'text': 'the neural networks gonna learn really quickly.', 'start': 685.138, 
'duration': 2.701}, {'end': 695.003, 'text': "Just always predict dog and you'll be 75% right, and then, when it tries to learn from there, it's going to have a really, really hard time.", 'start': 687.839, 'duration': 7.164}, {'end': 701.967, 'text': "So if you balance it, so it's a perfect 50-50, you'll be better off. The next thing you want to do is shuffle the data.", 'start': 695.143, 'duration': 6.824}, {'end': 707.93, 'text': "So we've got the training data, but as you can see, the first thing we did was iterate over the category and then go from there.", 'start': 701.987, 'duration': 5.943}], 'summary': 'Balancing the dataset keeps the model from simply predicting the majority class.', 'duration': 62.722, 'max_score': 645.208, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE645208.jpg'}, {'end': 776.976, 'src': 'embed', 'start': 722.834, 'weight': 3, 'content': [{'end': 724.374, 'text': 'So we definitely wanna shuffle the data.', 'start': 722.834, 'duration': 1.54}, {'end': 729.896, 'text': 'So we can import random and then we can do a random.shuffle.', 'start': 724.414, 'duration': 5.482}, {'end': 731.316, 'text': 'There we go.', 'start': 730.196, 'duration': 1.12}, {'end': 732.016, 'text': 'Training data.', 'start': 731.396, 'duration': 0.62}, {'end': 733.816, 'text': 'Since training data is a list, which is mutable, there it is.', 'start': 732.036, 'duration': 1.78}, {'end': 735.377, 'text': "It's already shuffled at this point.", 'start': 733.876, 'duration': 1.501}, {'end': 751.663, 'text': "So, for example, we could now, for, I don't know, sample in training data, we can check that our labels are correct by doing print(sample[1]).", 'start': 736.497, 'duration': 15.166}, {'end': 752.563, 'text': 'So this will be the label.', 'start': 751.683, 'duration': 0.88}, {'end': 755.345, 'text': 'So sample zeroth would be, oh, we went through them all.', 'start': 752.603, 'duration': 2.742}, {'end': 762.908, 'text': 
"Anyways, sample zero would be, and let's just do up to 10 here, would be the actual image array itself.", 'start': 755.445, 'duration': 7.463}, {'end': 767.171, 'text': "Okay I don't want to run the whole thing all over again.", 'start': 763.368, 'duration': 3.803}, {'end': 768.972, 'text': "Well, we'll just wait for that.", 'start': 767.191, 'duration': 1.781}, {'end': 769.492, 'text': "That's fine.", 'start': 769.012, 'duration': 0.48}, {'end': 776.976, 'text': "So now let's take this data now, and now that it's shuffled, let's pack it into the variables that we're going to use,", 'start': 769.572, 'duration': 7.404}], 'summary': 'Shuffling training data using random.shuffle for better performance.', 'duration': 54.142, 'max_score': 722.834, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE722834.jpg'}, {'end': 1046.652, 'src': 'embed', 'start': 1021.349, 'weight': 6, 'content': [{'end': 1027.252, 'text': "so the last thing we're going to do is we're going to once we've created our training data, we're just going to import pickle and then, in some way,", 'start': 1021.349, 'duration': 5.903}, {'end': 1028.132, 'text': "you don't have to use pickle.", 'start': 1027.252, 'duration': 0.88}, {'end': 1033.554, 'text': "we could numpy.save or whatever somehow save your data so you don't have to redo it every time.", 'start': 1028.132, 'duration': 5.422}, {'end': 1038.353, 'text': "so pickle out equals open And I'm gonna open x.pickle.", 'start': 1033.554, 'duration': 4.799}, {'end': 1046.652, 'text': 'x.pickle wb, pickle.dump.', 'start': 1041.468, 'duration': 5.184}], 'summary': 'After creating training data, use pickle or numpy.save to avoid redoing it every time.', 'duration': 25.303, 'max_score': 1021.349, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE1021349.jpg'}], 'start': 617.517, 'title': 'Training data balancing and shuffling', 'summary': 
'Discusses the significance of achieving a 50-50 split between classes in training data for classification and emphasizes the importance of shuffling data to prevent biased learning. additionally, it covers converting data to numpy arrays, saving the dataset using pickle, and feeding it through a convolutional neural network for efficient processing.', 'chapters': [{'end': 722.434, 'start': 617.517, 'title': 'Balancing and shuffling training data', 'summary': 'Discusses the importance of balancing training data for classification, aiming for a 50-50 split between classes and the need to shuffle the data to prevent biased learning', 'duration': 104.917, 'highlights': ["It's important to balance the training data for classification, aiming for a 50-50 split between classes to prevent biased learning", 'Imbalanced datasets can lead to biased learning, such as a dataset with 75% dogs and 25% cats, resulting in the neural network quickly learning to always predict dogs', 'Shuffling the data is crucial to prevent biased learning, as feeding data in the order of all dogs followed by all cats can lead to the neural network learning to predict only the dominant class']}, {'end': 1129.916, 'start': 722.834, 'title': 'Shuffling and preparing training data', 'summary': 'Covers shuffling and preparing training data, including converting data to numpy arrays and saving the dataset using pickle, to avoid repetitive data rebuilding, before feeding it through a convolutional neural network in the next tutorial.', 'duration': 407.082, 'highlights': ['The chapter covers shuffling and preparing training data, including converting data to numpy arrays and saving the dataset using pickle, to avoid repetitive data rebuilding, before feeding it through a convolutional neural network in the next tutorial. 
Shuffling data, converting to numpy arrays, saving dataset using pickle, preparing for convolutional neural network.', 'The training data is shuffled using random.shuffle, and the process of checking the correctness of labels is demonstrated. Shuffling training data, checking label correctness.', 'The process of packing shuffled data into variables for feeding into the neural network is explained, emphasizing the conversion of data to numpy arrays. Packing shuffled data, converting to numpy arrays.', 'The importance of saving the prepared dataset using pickle to avoid repetitive data rebuilding is highlighted. Importance of saving dataset using pickle.']}], 'duration': 512.399, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/j-3vuBynnOE/pics/j-3vuBynnOE617517.jpg', 'highlights': ["It's important to balance the training data for classification, aiming for a 50-50 split between classes to prevent biased learning", 'Shuffling the data is crucial to prevent biased learning, as feeding data in the order of all dogs followed by all cats can lead to the neural network learning to predict only the dominant class', 'Imbalanced datasets can lead to biased learning, such as a dataset with 75% dogs and 25% cats, resulting in the neural network quickly learning to always predict dogs', 'The chapter covers shuffling and preparing training data, including converting data to numpy arrays and saving the dataset using pickle, to avoid repetitive data rebuilding, before feeding it through a convolutional neural network in the next tutorial', 'The training data is shuffled using random.shuffle, and the process of checking the correctness of labels is demonstrated', 'The process of packing shuffled data into variables for feeding into the neural network is explained, emphasizing the conversion of data to numpy arrays', 'The importance of saving the prepared dataset using pickle to avoid repetitive data rebuilding is highlighted']}], 'highlights': ['The 
dataset consists of 12,500 samples each of cats and dogs, providing ample examples to train a model for image classification.', 'The tutorial introduces the process of loading an outside dataset, specifically the cats and dogs dataset from Microsoft, which was initially a Kaggle challenge.', 'The goal is to use the dataset to train a neural network to identify whether a given image is of a cat or a dog, demonstrating the practical application of deep learning with Python, TensorFlow, and Keras.', 'Iterating through image categories to process a dataset of pet images The chapter discusses iterating through directories and joining paths using os, aiming to process a pet image dataset.', 'Importing necessary libraries for image processing The chapter covers the process of importing essential libraries like numpy, matplotlib, os, and cv2 for image processing operations.', 'Converting images to grayscale to simplify data and reduce size By using cv2 to convert images to grayscale, the chapter aims to simplify data and reduce the size from RGB to grayscale, making it more manageable for processing.', 'Resizing images to a standard size for normalization The chapter discusses the process of resizing images to a standard size, such as 50x50, to normalize the image data for simplifying classification and visualization.', 'At 50x50 resolution, the speaker is pretty comfortable classifying the image as a dog.', "The process involves creating an empty training data list and iterating through images to build the dataset, while also mapping categories (e.g., 'dog' and 'cat') to numerical values (e.g., 0 and 1) for classification.", 'Caution is advised when the dog takes up a significant portion of the image, as it might impact classification accuracy.', 'The resizing operation is performed on the images before appending them to the training data list.', 'The speaker notes challenges in identifying the dog at 10x10, 20x20, and 30x30 resolutions.', "It's important to balance the 
training data for classification, aiming for a 50-50 split between classes to prevent biased learning", 'Shuffling the data is crucial to prevent biased learning, as feeding data in the order of all dogs followed by all cats can lead to the neural network learning to predict only the dominant class', 'Imbalanced datasets can lead to biased learning, such as a dataset with 75% dogs and 25% cats, resulting in the neural network quickly learning to always predict dogs', 'The chapter covers shuffling and preparing training data, including converting data to numpy arrays and saving the dataset using pickle, to avoid repetitive data rebuilding, before feeding it through a convolutional neural network in the next tutorial', 'The training data is shuffled using random.shuffle, and the process of checking the correctness of labels is demonstrated', 'The process of packing shuffled data into variables for feeding into the neural network is explained, emphasizing the conversion of data to numpy arrays', 'The importance of saving the prepared dataset using pickle to avoid repetitive data rebuilding is highlighted']}
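The shuffle, pack, and save steps from the last chapter might look something like the sketch below. The helper names (pack_training_data, save_pickle, load_pickle) and the seed parameter are hypothetical, added only to make the example testable. The reshape to a trailing channel axis of size 1 is an assumption about what a convolutional layer will want in the next part; the summaries above only say the data is converted to numpy arrays.

```python
import pickle
import random

import numpy as np

IMG_SIZE = 50


def pack_training_data(training_data, img_size=IMG_SIZE, seed=None):
    """Shuffle [image, label] pairs and return (X, y) as numpy arrays."""
    pairs = list(training_data)          # copy so the caller's list is left alone
    random.Random(seed).shuffle(pairs)   # shuffles the copy in place
    X = np.array([features for features, _ in pairs])
    y = np.array([label for _, label in pairs])
    # -1 lets numpy infer the sample count; the trailing 1 is the single
    # grayscale channel (assumed layout for a later convolutional network).
    X = X.reshape(-1, img_size, img_size, 1)
    return X, y


def save_pickle(obj, path):
    """Write obj to path so the dataset need not be rebuilt on every run."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Because images and labels are shuffled together as pairs and only split afterwards, each row of X still lines up with its entry in y. As the speaker notes, numpy.save would work equally well in place of pickle.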
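To make the 75% dog, 25% cat example concrete, here is a small sketch that measures class balance and derives class weights. The video does not give a weighting formula; total / (n_classes * count) is one common heuristic and is used here as an assumption. Both function names are hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Return {label: fraction of samples} for a list of integer labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def balanced_class_weights(labels):
    """Weight each class by total / (n_classes * count), so rarer classes weigh more."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * count) for label, count in counts.items()}


# 0 = dog, 1 = cat, mirroring the 75/25 split mentioned in the video.
labels = [0] * 75 + [1] * 25
print(class_balance(labels))           # {0: 0.75, 1: 0.25}
print(balanced_class_weights(labels))  # cat errors count three times as much as dog errors
```

A dictionary of this shape is what the class_weight argument of Keras model.fit accepts, though as the speaker says, balancing the dataset itself is preferable when that is possible.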