title
Training Custom Object Detector - TensorFlow Object Detection API Tutorial p.5
description
Welcome to part 5 of the TensorFlow Object Detection API tutorial series. In this part of the tutorial, we will train our object detection model to detect our custom object. To do this, we need the images, matching TFRecords for the training and testing data, and a configuration for the model; then we can train. For us, that means we need to set up a configuration file.
Text tutorials and sample code: https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/
https://twitter.com/sentdex
https://www.facebook.com/pythonprogramming.net/
https://plus.google.com/+sentdex
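The configuration step described above mostly comes down to editing a few fields in a sample config: the batch size, the fine-tune checkpoint path, and the record paths. Below is a minimal sketch of that edit using only Python's standard library. It is not the tutorial's own code. The config excerpt is a shortened, hypothetical stand-in for a real sample file, and the directory name `my_checkpoint_dir`, the path `data/train.record`, and the helper `patch_config` are assumptions made for illustration.

```python
import re

# Hypothetical, shortened excerpt of a sample pipeline config.
# A real sample config is much longer and may be laid out differently.
SAMPLE_CONFIG = '''
train_config: {
  batch_size: 24
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/pet_train.record"
  }
}
'''


def patch_config(text, checkpoint_dir, train_record, batch_size):
    """Return the config text with checkpoint, record path, and batch size set.

    patch_config is a hypothetical helper, not part of any library.
    Lambdas are used as replacements so that backslashes in paths
    are inserted literally instead of being read as regex escapes.
    """
    text = re.sub(
        r'fine_tune_checkpoint:\s*"[^"]*"',
        lambda m: 'fine_tune_checkpoint: "{}/model.ckpt"'.format(checkpoint_dir),
        text,
    )
    text = re.sub(
        r'input_path:\s*"[^"]*pet_train\.record"',
        lambda m: 'input_path: "{}"'.format(train_record),
        text,
    )
    text = re.sub(
        r'batch_size:\s*\d+',
        lambda m: 'batch_size: {}'.format(batch_size),
        text,
    )
    return text


# Assumed values: a smaller batch size (as one might try after a memory
# error) and placeholder paths that you would replace with your own.
patched = patch_config(SAMPLE_CONFIG, "my_checkpoint_dir", "data/train.record", 12)
print(patched)
```

Editing the file by hand in a text editor works just as well; a script like this is mainly useful if you expect to regenerate the config several times, for example while experimenting with batch sizes.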
detail
{'title': 'Training Custom Object Detector - TensorFlow Object Detection API Tutorial p.5', 'heatmap': [{'end': 416.725, 'start': 382.577, 'weight': 0.806}, {'end': 614.417, 'start': 563.563, 'weight': 0.703}], 'summary': 'Tutorial covers configuring tensorflow object detection api for training, which may take up to three hours on a lower end gpu, using mobilenet for real-time object detection with emphasis on fast processing speed, and the importance of achieving a low average loss below one for object detection, using tensorboard to monitor model training.', 'chapters': [{'end': 86.278, 'segs': [{'end': 46.214, 'src': 'embed', 'start': 2.708, 'weight': 0, 'content': [{'end': 10.359, 'text': 'what is going on everybody, welcome to part five of our tensorflow object detection api tutorial series.', 'start': 2.708, 'duration': 7.651}, {'end': 18.75, 'text': "in this video, what we're going to be doing is, uh, setting up the configuration file that we need and then beginning the training process,", 'start': 10.359, 'duration': 8.391}, {'end': 21.514, 'text': 'which will take about an hour depending on your gpu.', 'start': 18.75, 'duration': 2.764}, {'end': 24.297, 'text': "so a decent gpu it'll take like an hour.", 'start': 21.514, 'duration': 2.783}, {'end': 31.646, 'text': "on a cpu, i have absolutely no idea and on like a lower end gpu, that's still like a gpu compute capability 3.0.", 'start': 24.297, 'duration': 7.349}, {'end': 34.109, 'text': "maybe like, i don't know, three hours, i'm not sure.", 'start': 31.646, 'duration': 2.463}, {'end': 37.351, 'text': "and anyway let's get started.", 'start': 34.109, 'duration': 3.242}, {'end': 43.593, 'text': 'so first, what we need to do is we need both a model and a configuration file.', 'start': 37.351, 'duration': 6.242}, {'end': 46.214, 'text': "now I'm gonna link to a bunch of resources.", 'start': 43.593, 'duration': 2.621}], 'summary': 'Setting up configuration file and training process takes about an hour on a 
decent gpu.', 'duration': 43.506, 'max_score': 2.708, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E82708.jpg'}, {'end': 90.141, 'src': 'embed', 'start': 65.025, 'weight': 1, 'content': [{'end': 73.01, 'text': "I'm not going to do that, but you can feel free to do that by checking out this configuring jobs information here.", 'start': 65.025, 'duration': 7.985}, {'end': 75.332, 'text': 'So what you can do to kind of create your own model.', 'start': 73.05, 'duration': 2.282}, {'end': 81.675, 'text': "And then you can also, if you don't want to do that, you can check out their sample configuration,", 'start': 76.832, 'duration': 4.843}, {'end': 83.896, 'text': "so you don't even have to write the configuration for yourself.", 'start': 81.675, 'duration': 2.221}, {'end': 86.278, 'text': 'And these are all the configurations.', 'start': 84.597, 'duration': 1.681}, {'end': 90.141, 'text': "We're going to be working with MobileNet.", 'start': 86.418, 'duration': 3.723}], 'summary': 'Options for configuring jobs include creating own model and using sample configurations. 
mobilenet will be used.', 'duration': 25.116, 'max_score': 65.025, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E865025.jpg'}], 'start': 2.708, 'title': 'Tensorflow object detection api tutorial', 'summary': 'Covers configuring tensorflow object detection api for training, with estimated training time of up to three hours on a lower end gpu, and provides resources for custom models.', 'chapters': [{'end': 86.278, 'start': 2.708, 'title': 'Tensorflow object detection api tutorial', 'summary': 'Covers setting up the configuration file for training the tensorflow object detection api, with training time estimated to take about an hour on a decent gpu and potentially up to three hours on a lower end gpu, and provides resources for creating custom models or using sample configurations.', 'duration': 83.57, 'highlights': ['The training process is estimated to take about an hour on a decent GPU and up to three hours on a lower end GPU.', 'Resources are provided for creating custom models or using sample configurations.', 'The chapter introduces the process of setting up the configuration file and beginning the training process for the TensorFlow object detection API.']}], 'duration': 83.57, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E82708.jpg', 'highlights': ['The training process is estimated to take about an hour on a decent GPU and up to three hours on a lower end GPU.', 'Resources are provided for creating custom models or using sample configurations.', 'The chapter introduces the process of setting up the configuration file and beginning the training process for the TensorFlow object detection API.']}, {'end': 772.496, 'segs': [{'end': 151.419, 'src': 'embed', 'start': 86.418, 'weight': 0, 'content': [{'end': 90.141, 'text': "We're going to be working with MobileNet.", 'start': 86.418, 'duration': 3.723}, {'end': 93.782, 'text': "But you can use 
different ones if you'd like.", 'start': 92.102, 'duration': 1.68}, {'end': 95.102, 'text': "I'm trying to think.", 'start': 93.882, 'duration': 1.22}, {'end': 97.143, 'text': 'Let me click on here.', 'start': 96.142, 'duration': 1.001}, {'end': 101.304, 'text': "Is this the one that had a breakdown? Yeah, this one has a breakdown of them, so it's kind of cool.", 'start': 97.183, 'duration': 4.121}, {'end': 108.725, 'text': 'So this is where you can download each of the models, and you can kind of pick which one you want.', 'start': 101.944, 'duration': 6.781}, {'end': 111.966, 'text': "Now, we're using MobileNet, again, because it is fun.", 'start': 109.225, 'duration': 2.741}, {'end': 116.067, 'text': 'fast, because i want to detect things in real time.', 'start': 112.426, 'duration': 3.641}, {'end': 121.049, 'text': 'so the more frames per second that we can detect, the better in my opinion.', 'start': 116.067, 'duration': 4.982}, {'end': 122.469, 'text': "so i'm doing that now.", 'start': 121.049, 'duration': 1.42}, {'end': 128.311, 'text': "if you're just trying to like classify images or something like that and you don't need to do it super fast,", 'start': 122.469, 'duration': 5.842}, {'end': 134.173, 'text': "then you might want to use something that's a little more accurate, like the rcnn or something like that.", 'start': 128.311, 'duration': 5.862}, {'end': 139.475, 'text': "so anyway, just know that those are options for you, but i'm going to be doing mobilenet,", 'start': 134.173, 'duration': 5.302}, {'end': 144.016, 'text': 'But they do have configuration files for basically all these models here.', 'start': 139.955, 'duration': 4.061}, {'end': 146.277, 'text': 'So feel free to choose a different one.', 'start': 144.156, 'duration': 2.121}, {'end': 147.398, 'text': "You don't have to copy me.", 'start': 146.297, 'duration': 1.101}, {'end': 151.419, 'text': "Anyway, but what we're going to do is use MobileNet.", 'start': 148.638, 'duration': 2.781}], 
'summary': 'Using mobilenet for real-time detection, prioritizing speed with higher frames per second for faster detection.', 'duration': 65.001, 'max_score': 86.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E886418.jpg'}, {'end': 371.494, 'src': 'embed', 'start': 345.287, 'weight': 3, 'content': [{'end': 350.568, 'text': "Eventually I'm just going to copy and paste my thing to be 100%, but we definitely want to fix that.", 'start': 345.287, 'duration': 5.281}, {'end': 354.81, 'text': 'So right now batch size is 24.', 'start': 353.169, 'duration': 1.641}, {'end': 362.532, 'text': "I would keep it at 24, but if you're getting a memory error, you could decrease the batch size.", 'start': 354.81, 'duration': 7.722}, {'end': 367.033, 'text': 'So you would require less memory, basically.', 'start': 364.752, 'duration': 2.281}, {'end': 371.494, 'text': 'Because batch size is like how many samples at a time are we trying to throw into this model to learn against.', 'start': 367.053, 'duration': 4.441}], 'summary': "The batch size is currently 24, but it can be decreased if there's a memory error, as it determines how many samples are thrown into the model to learn against.", 'duration': 26.207, 'max_score': 345.287, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E8345287.jpg'}, {'end': 416.725, 'src': 'heatmap', 'start': 382.577, 'weight': 0.806, 'content': [{'end': 383.597, 'text': 'Fine-tune the checkpoint.', 'start': 382.577, 'duration': 1.02}, {'end': 388.879, 'text': 'This should just be the path to the actual checkpoint file here.', 'start': 383.657, 'duration': 5.222}, {'end': 391.439, 'text': 'So it just should be SSD MobileNet.', 'start': 389.419, 'duration': 2.02}, {'end': 393.64, 'text': "So I'm just going to copy that.", 'start': 391.499, 'duration': 2.141}, {'end': 400.938, 'text': 'paste that in, and then it would be that model checkpoint.', 
'start': 396.496, 'duration': 4.442}, {'end': 402.659, 'text': "I'm gonna look and see what it says.", 'start': 400.998, 'duration': 1.661}, {'end': 405.94, 'text': "Yeah, it's just model.ckpt, so leave it that way.", 'start': 402.679, 'duration': 3.261}, {'end': 410.382, 'text': 'Continuing to scroll down, just look, so paths to be configured.', 'start': 407.341, 'duration': 3.041}, {'end': 413.444, 'text': "Again, it's not pet train record.", 'start': 410.422, 'duration': 3.022}, {'end': 416.725, 'text': 'It would just be like train.record, and then this will be.', 'start': 413.804, 'duration': 2.921}], 'summary': 'Fine-tune ssd mobilenet model with model.ckpt and train.record.', 'duration': 34.148, 'max_score': 382.577, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E8382577.jpg'}, {'end': 614.417, 'src': 'heatmap', 'start': 563.563, 'weight': 0.703, 'content': [{'end': 569.889, 'text': "I'm pretty sure we're gonna have to do that and We might be able to get away with another way, but I'm just going to move it into models.", 'start': 563.563, 'duration': 6.326}, {'end': 571.65, 'text': "I don't want to waste any time trying to figure out that.", 'start': 569.929, 'duration': 1.721}, {'end': 576.474, 'text': 'But you should actually, since we installed it, be able to do this without doing it this way.', 'start': 571.77, 'duration': 4.704}, {'end': 581.477, 'text': "But anyway, so let's open up the object-detection.", 'start': 576.654, 'duration': 4.823}, {'end': 584.119, 'text': 'And we want to take data, images.', 'start': 581.778, 'duration': 2.341}, {'end': 585.841, 'text': 'We need the model.', 'start': 585.2, 'duration': 0.641}, {'end': 587.382, 'text': 'We need the training directory.', 'start': 586.081, 'duration': 1.301}, {'end': 590.164, 'text': 'And we need this configuration file.', 'start': 587.402, 'duration': 2.762}, {'end': 596.07, 'text': "I'm just making sure that's everything.", 'start': 
594.229, 'duration': 1.841}, {'end': 600.492, 'text': "let's go ahead and copy that over, just in case something goes wrong.", 'start': 596.07, 'duration': 4.422}, {'end': 601.852, 'text': 'merge data.', 'start': 600.492, 'duration': 1.36}, {'end': 602.312, 'text': "let's go ahead.", 'start': 601.852, 'duration': 0.46}, {'end': 605.934, 'text': 'yes, merge the data.', 'start': 602.312, 'duration': 3.622}, {'end': 607.234, 'text': 'that was a really fast copy.', 'start': 605.934, 'duration': 1.3}, {'end': 610.236, 'text': 'okay, all right.', 'start': 607.234, 'duration': 3.002}, {'end': 614.417, 'text': "now we're ready to do is attempt to train.py.", 'start': 610.236, 'duration': 4.181}], 'summary': 'Preparing data and models for object detection training.', 'duration': 50.854, 'max_score': 563.563, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E8563563.jpg'}], 'start': 86.418, 'title': 'Using and setting up mobilenet for real-time object detection', 'summary': 'Covers the use of mobilenet for real-time object detection, emphasizing its fast processing speed and various model options for different accuracy levels. 
it also details the setup process, including downloading configuration files and models, editing configuration files, and running training scripts with specified directories and pipeline paths.', 'chapters': [{'end': 134.173, 'start': 86.418, 'title': 'Working with mobilenet', 'summary': 'Discusses the use of mobilenet for real-time object detection, highlighting its fast processing speed and the option to choose different models for varying accuracy.', 'duration': 47.755, 'highlights': ['The chapter highlights the use of MobileNet for fast real-time object detection, emphasizing the importance of high frames per second for better performance.', 'The speaker suggests that using MobileNet is fun and suitable for detecting things in real time.', 'Different models can be chosen based on the requirement, with MobileNet being preferred for its speed.', 'The speaker mentions that if high accuracy is not a priority, other models like rcnn can be considered for image classification.']}, {'end': 772.496, 'start': 134.173, 'title': 'Setting up mobilenet for object detection', 'summary': 'Demonstrates setting up mobilenet for object detection, including downloading configuration files and models, editing configuration files, and running training script with specified directories and pipeline paths.', 'duration': 638.323, 'highlights': ['The chapter demonstrates setting up MobileNet for object detection, including downloading configuration files and models, editing configuration files, and running training script with specified directories and pipeline paths.', 'The batch size for the model is set at 24, and users are advised to decrease it if encountering memory errors to require less memory for the training process.', "The checkpoint file path is specified as 'SSD MobileNet' and the model checkpoint is 'model.ckpt', emphasizing the need for accurate file paths in the configuration file."]}], 'duration': 686.078, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E886418.jpg', 'highlights': ['Different models can be chosen based on the requirement, with MobileNet being preferred for its speed.', 'The chapter highlights the use of MobileNet for fast real-time object detection, emphasizing the importance of high frames per second for better performance.', 'The chapter demonstrates setting up MobileNet for object detection, including downloading configuration files and models, editing configuration files, and running training script with specified directories and pipeline paths.', 'The batch size for the model is set at 24, and users are advised to decrease it if encountering memory errors to require less memory for the training process.']}, {'end': 1092.348, 'segs': [{'end': 868.711, 'src': 'embed', 'start': 775.711, 'weight': 0, 'content': [{'end': 781.533, 'text': "I'm pretty sure that's the only reason that would happen.", 'start': 775.711, 'duration': 5.822}, {'end': 795.957, 'text': 'Come on.', 'start': 795.297, 'duration': 0.66}, {'end': 797.237, 'text': 'You can do it.', 'start': 796.597, 'duration': 0.64}, {'end': 799.538, 'text': 'Basically, you want to start seeing some steps.', 'start': 798.057, 'duration': 1.481}, {'end': 800.358, 'text': "That's what we're waiting on.", 'start': 799.558, 'duration': 0.8}, {'end': 811.656, 'text': "And we're good, we're golden, we're on our way.", 'start': 809.073, 'duration': 2.583}, {'end': 816.04, 'text': "Okay, so yeah, we're beginning.", 'start': 812.536, 'duration': 3.504}, {'end': 820.124, 'text': 'So, basically, what you wanna look for is you want loss.', 'start': 816.46, 'duration': 3.664}, {'end': 821.725, 'text': 'you want the average loss.', 'start': 820.124, 'duration': 1.601}, {'end': 829.842, 'text': "um, you know your goal should be you would like to get under one, but your goal should be let's get about one.", 'start': 823.118, 'duration': 6.724}, {'end': 833.564, 'text': 
"um, definitely make sure it's below two.", 'start': 829.842, 'duration': 3.722}, {'end': 837.806, 'text': 'so some things are going to be relatively complex, especially when you have,', 'start': 833.564, 'duration': 4.242}, {'end': 844.61, 'text': "maybe if you've got multiple objects that are pretty similar to each other, um and that's or like a lot of your objects, are relatively similar,", 'start': 837.806, 'duration': 6.804}, {'end': 846.271, 'text': 'with minute differences, stuff like that.', 'start': 844.61, 'duration': 1.661}, {'end': 851.555, 'text': 'you might have to sacrifice a little bit in your expectations.', 'start': 848.052, 'duration': 3.503}, {'end': 854.517, 'text': "But at least for us, we've got one object.", 'start': 851.715, 'duration': 2.802}, {'end': 855.958, 'text': "It's macaroni and cheese.", 'start': 854.797, 'duration': 1.161}, {'end': 861.365, 'text': "In my opinion, that's a relatively easy thing to detect.", 'start': 857.802, 'duration': 3.563}, {'end': 868.711, 'text': "So, I'm expecting that we can get no problem a loss average of about 1.", 'start': 862.446, 'duration': 6.265}], 'summary': 'The goal is to achieve an average loss of about 1 for detecting macaroni and cheese.', 'duration': 93, 'max_score': 775.711, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E8775711.jpg'}, {'end': 1021.609, 'src': 'embed', 'start': 990.954, 'weight': 1, 'content': [{'end': 991.635, 'text': "Apparently, I didn't.", 'start': 990.954, 'duration': 0.681}, {'end': 993.397, 'text': 'Trained up high.', 'start': 992.576, 'duration': 0.821}, {'end': 997.244, 'text': 'I think it might be 200, though, or maybe 250.', 'start': 994.983, 'duration': 2.261}, {'end': 998.184, 'text': "I can't remember.", 'start': 997.244, 'duration': 0.94}, {'end': 999.784, 'text': "I'm pretty sure I have seen it, though.", 'start': 998.544, 'duration': 1.24}, {'end': 1003.765, 'text': 'Anyway, expect to do like 10,000 
steps.', 'start': 1001.545, 'duration': 2.22}, {'end': 1015.828, 'text': 'If you did a batch size of 24, probably about 10,000-ish steps will get you close to an average loss of about one.', 'start': 1004.485, 'duration': 11.343}, {'end': 1017.108, 'text': "But actually, we're falling pretty good.", 'start': 1015.868, 'duration': 1.24}, {'end': 1018.989, 'text': 'I wish you would update, man.', 'start': 1018.068, 'duration': 0.921}, {'end': 1021.609, 'text': "I'd like to show the people.", 'start': 1020.749, 'duration': 0.86}], 'summary': 'Expect to do about 10,000 steps with a batch size of 24 for an average loss of about one.', 'duration': 30.655, 'max_score': 990.954, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E8990954.jpg'}, {'end': 1092.348, 'src': 'embed', 'start': 1043.266, 'weight': 4, 'content': [{'end': 1047.388, 'text': 'What was it? 238? Why would it record every 238 steps? Okay.', 'start': 1043.266, 'duration': 4.122}, {'end': 1051.141, 'text': 'Okay, we go.', 'start': 1047.409, 'duration': 3.732}, {'end': 1056.386, 'text': "okay. 
so, um, i guess, every 200-ish steps it'll record.", 'start': 1051.141, 'duration': 5.245}, {'end': 1060.771, 'text': 'so, basically, once this kind of averaged out, one, uh, gets you to about one.', 'start': 1056.386, 'duration': 4.385}, {'end': 1064.715, 'text': "um, you'd probably be good to stop.", 'start': 1060.771, 'duration': 3.944}, {'end': 1070.2, 'text': "um, once we've done this, we've got a trained model and then, in the next tutorial,", 'start': 1064.715, 'duration': 5.485}, {'end': 1081.844, 'text': 'we can export the inference graph from that model and then we can use that to detect new objects and draw boxes or not new objects to detect our object in like new images and draw,', 'start': 1070.2, 'duration': 11.644}, {'end': 1083.985, 'text': 'you know, a bounding box around it.', 'start': 1081.844, 'duration': 2.141}, {'end': 1086.026, 'text': "so anyway, that's it for now.", 'start': 1083.985, 'duration': 2.041}, {'end': 1092.348, 'text': "if you've got questions, comments concerns whatever, feel free to leave them below, otherwise i'll see you in the next tutorial.", 'start': 1086.026, 'duration': 6.322}], 'summary': 'The model records data every 200-ish steps and can be used to detect new objects in images.', 'duration': 49.082, 'max_score': 1043.266, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E81043266.jpg'}], 'start': 775.711, 'title': 'Object detection and tensorboard usage', 'summary': 'Covers the importance of achieving a low average loss below one for object detection, particularly for similar objects, and using tensorboard to monitor model training, aiming for an average loss of about one after around 10,000 steps with a batch size of 24.', 'chapters': [{'end': 868.711, 'start': 775.711, 'title': 'Object detection loss averages', 'summary': 'Discusses the process of object detection, emphasizing the importance of achieving a low average loss, ideally below one, especially when 
dealing with relatively similar objects. the speaker expresses confidence in achieving a loss average of about 1 for detecting a single object, macaroni and cheese.', 'duration': 93, 'highlights': ['The speaker emphasizes the importance of achieving a low average loss, ideally below one, for object detection.', 'The speaker expresses confidence in achieving a loss average of about 1 for detecting a single object, macaroni and cheese.', 'The speaker notes the potential complexity when dealing with multiple objects that are relatively similar and the need to adjust expectations accordingly.']}, {'end': 1092.348, 'start': 868.711, 'title': 'Tensorboard for model training', 'summary': 'Discusses using tensorboard to monitor model training and suggests reaching an average loss of about one after approximately 10,000 steps with a batch size of 24.', 'duration': 223.637, 'highlights': ['Reaching an average loss of about one after approximately 10,000 steps with a batch size of 24 is suggested for model training.', 'Using TensorBoard to monitor training progress and focusing on the total loss is recommended.', 'The event file is recorded approximately every 200 steps, providing insights into the training progress.', 'Exporting the inference graph from the trained model to detect new objects in images is mentioned as a future step.']}], 'duration': 316.637, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/JR8CmWyh2E8/pics/JR8CmWyh2E8775711.jpg', 'highlights': ['The speaker emphasizes the importance of achieving a low average loss, ideally below one, for object detection.', 'Reaching an average loss of about one after approximately 10,000 steps with a batch size of 24 is suggested for model training.', 'Using TensorBoard to monitor training progress and focusing on the total loss is recommended.', 'The speaker expresses confidence in achieving a loss average of about 1 for detecting a single object, macaroni and cheese.', 'The event file is 
recorded approximately every 200 steps, providing insights into the training progress.', 'The speaker notes the potential complexity when dealing with multiple objects that are relatively similar and the need to adjust expectations accordingly.', 'Exporting the inference graph from the trained model to detect new objects in images is mentioned as a future step.']}], 'highlights': ['The training process is estimated to take about an hour on a decent GPU and up to three hours on a lower end GPU.', 'Different models can be chosen based on the requirement, with MobileNet being preferred for its speed.', 'The speaker emphasizes the importance of achieving a low average loss, ideally below one, for object detection.']}
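The stopping rule discussed in the video (watch the average loss and stop once it settles at about 1) can be sketched as a simple moving-average check. This is an illustration only: the function `first_step_at_target`, the window size, and the loss values are made up, and this is not how the training script or TensorBoard computes its smoothed curve.

```python
from collections import deque


def first_step_at_target(losses, target=1.0, window=5):
    """Return the 1-based step where the moving average of the last
    `window` losses first reaches `target` or lower, or None if it never does.

    first_step_at_target is a hypothetical helper written for this example.
    """
    recent = deque(maxlen=window)  # keeps only the most recent losses
    for step, loss in enumerate(losses, start=1):
        recent.append(loss)
        # Only judge the average once the window is completely filled.
        if len(recent) == window and sum(recent) / window <= target:
            return step
    return None


# Made-up per-step loss values, for illustration only.
made_up_losses = [9.0, 6.0, 4.0, 3.0, 2.0, 1.5, 1.2, 1.0, 0.9, 0.8, 0.7]
print(first_step_at_target(made_up_losses, target=1.0, window=3))
```

Averaging over a window rather than reacting to a single value matters because per-step loss is noisy: one low reading does not mean the model has settled.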