title

Lesson 2: Deep Learning 2019 - Data cleaning and production; SGD from scratch

description

Note: please view this lesson using the video player at http://course.fast.ai rather than directly on YouTube, to ensure you have the latest information. If you have a question, first search http://forums.fast.ai to see whether it has already been answered, and post there if not.
We start today's lesson by learning how to build your own image classification model using your own data, covering topics such as:
- Image collection
- Parallel downloading
- Creating a validation set, and
- Data cleaning, using the model to help us find data problems.
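In the lesson, fastai's `download_images` does the parallel fetch for you. As a minimal standard-library sketch of the same pattern (the `download_all` helper and its `fetch` argument are illustrative names, not fastai API):

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, max_workers=8):
    """Fetch every URL concurrently; a failed download yields None
    instead of aborting the whole batch (broken links are common
    in scraped image-URL lists)."""
    def safe_fetch(url):
        try:
            return url, fetch(url)
        except Exception:
            return url, None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with urls
        return list(pool.map(safe_fetch, urls))
```

In practice `fetch` would wrap `urllib.request.urlopen` and write the bytes to disk; fastai's `download_images` wraps this same idea with retries and a timeout.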
I'll demonstrate all these steps as I create a model that can take on the vital task of differentiating teddy bears from grizzly bears. Once we've got our data set in order, we'll then learn how to productionize our teddy-finder, and make it available online.
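The lesson productionizes with a Starlette app deployed to Render, but the shape of any such inference endpoint is the same. Here is a standard-library sketch of that shape (the `predict` stub and `CLASSES` list are placeholders, not the lesson's code — a real deployment would call the exported learner's `predict` on the uploaded image):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

CLASSES = ["black", "grizzly", "teddy"]

def predict(image_bytes):
    # Placeholder for the real model call; always answers "teddy" here.
    return CLASSES[2]

class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        body = json.dumps({"class": predict(image_bytes)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8000), ClassifyHandler).serve_forever()
```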
We've had some great additions since this lesson was recorded, so be sure to check out:
- The *production starter kits* on the course web site, such as https://course-v3.fast.ai/deployment_render.html for deploying to Render.com
- The new interactive GUI in the lesson notebook for using the model to find and fix mislabeled or incorrectly collected images.
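The GUI mentioned above builds on a simple idea: the images the trained model gets most confidently wrong (highest loss) are the best candidates for bad labels. A sketch of that ranking step, with illustrative names rather than fastai's actual API:

```python
def top_losses(losses, filenames, k=5):
    """Pair each file with its loss and return the k worst offenders --
    the images most worth a manual look for mislabeling."""
    ranked = sorted(zip(losses, filenames), key=lambda p: p[0], reverse=True)
    return ranked[:k]
```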
In the second half of the lesson we'll train a simple model from scratch, creating our own *gradient descent* loop. In the process, we'll be learning lots of new jargon, so be sure you've got a good place to take notes, since we'll be referring to this new terminology throughout the course (and there will be lots more introduced in every lesson from here on).
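The gradient descent loop in the lesson uses PyTorch, but the core idea fits in plain Python: repeatedly nudge the parameters downhill along the gradient of the loss. A sketch fitting a line by minimizing mean squared error, on synthetic data generated around y = 3x + 2:

```python
import random

# Synthetic example data: points scattered around the line y = 3x + 2
random.seed(42)
xs = [random.uniform(-1.0, 1.0) for _ in range(100)]
ys = [3.0 * x + 2.0 + random.gauss(0.0, 0.1) for x in xs]

a, b = 0.0, 0.0   # parameters to learn
lr = 0.1          # learning rate
n = len(xs)

for epoch in range(200):
    # Gradients of MSE = mean((a*x + b - y)**2) w.r.t. a and b
    grad_a = 2.0 / n * sum((a * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = 2.0 / n * sum((a * x + b - y) for x, y in zip(xs, ys))
    a -= lr * grad_a
    b -= lr * grad_b

# a and b should now be close to the true values 3 and 2
```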

detail

{'title': 'Lesson 2: Deep Learning 2019 - Data cleaning and production; SGD from scratch', 'heatmap': [{'end': 1291.142, 'start': 1139.671, 'weight': 1}, {'end': 2994.11, 'start': 2917.033, 'weight': 0.708}], 'summary': 'Covers utilizing course resources, showcasing deep learning achievements, computer vision projects, building a bear classifier, data cleaning, resnet-34 in cnn, stochastic gradient descent, fitting line to data, regression loss functions, and neural network training with notable achievements, such as achieving a new state-of-the-art accuracy for sound data and a 1.4% error rate in the bear classifier.', 'chapters': [{'end': 257.913, 'segs': [{'end': 63.385, 'src': 'embed', 'start': 29.372, 'weight': 0, 'content': [{'end': 31.713, 'text': 'One is fact resources and official course updates.', 'start': 29.372, 'duration': 2.341}, {'end': 38.136, 'text': "This is where if there's something useful for you to know during the course, we will post there.", 'start': 32.534, 'duration': 5.602}, {'end': 40.077, 'text': 'Nobody else can reply to that thread.', 'start': 38.536, 'duration': 1.541}, {'end': 43.399, 'text': 'So if you set that thread to watching and to notifications,', 'start': 40.497, 'duration': 2.902}, {'end': 47.701, 'text': "you're not going to be bugged by anybody else except stuff that we think you need to know for the course.", 'start': 43.399, 'duration': 4.302}, {'end': 52.943, 'text': "it's got all the official information about how to get set up on each platform.", 'start': 48.662, 'duration': 4.281}, {'end': 63.385, 'text': "Please note a lot of people post all kinds of other tidbits about how they've set up things on previous solutions or previous courses or other places.", 'start': 53.883, 'duration': 9.502}], 'summary': 'Official course updates and resources are provided in a dedicated thread, ensuring that only essential information reaches the learners.', 'duration': 34.013, 'max_score': 29.372, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw29372.jpg'}, {'end': 99.499, 'src': 'embed', 'start': 70.746, 'weight': 3, 'content': [{'end': 72.286, 'text': 'and they definitely work.', 'start': 70.746, 'duration': 1.54}, {'end': 77.587, 'text': 'Okay, so I would strongly suggest you follow those tips.', 'start': 72.786, 'duration': 4.801}, {'end': 85.314, 'text': 'If you do have a question about using one of these platforms, please use these discussions, not some other topic that you create,', 'start': 78.247, 'duration': 7.067}, {'end': 90.738, 'text': "because this way people that are involved in these platforms will be able to see it and things won't get messy.", 'start': 85.314, 'duration': 5.424}, {'end': 99.499, 'text': 'And then secondly, For every lesson there will be an official updates thread for that lesson.', 'start': 91.519, 'duration': 7.98}], 'summary': 'Suggest using specific discussions for platform questions to avoid confusion. 
official updates threads for every lesson.', 'duration': 28.753, 'max_score': 70.746, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw70746.jpg'}, {'end': 152.168, 'src': 'embed', 'start': 118.735, 'weight': 1, 'content': [{'end': 120.935, 'text': 'So I mentioned the idea of watching a thread.', 'start': 118.735, 'duration': 2.2}, {'end': 127.857, 'text': 'So this is a really good idea, is that you can go to a thread, like particularly those official update ones, and click at the bottom, Watching.', 'start': 120.975, 'duration': 6.882}, {'end': 132.578, 'text': "Okay, and if you do that, that's going to enable notifications for any updates to that thread.", 'start': 127.877, 'duration': 4.701}, {'end': 140.279, 'text': "Particularly if you go into, click on your little username in the top right, Preferences, and turn this on, that'll give you an email as well.", 'start': 132.998, 'duration': 7.281}, {'end': 145.52, 'text': 'Okay, so any of you that have missed some of the updates so far, go back and have a look through,', 'start': 140.74, 'duration': 4.78}, {'end': 152.168, 'text': "because we're really trying to make sure that we keep you updated with anything that we think is important.", 'start': 145.52, 'duration': 6.648}], 'summary': 'Enable thread notifications for important updates to stay updated.', 'duration': 33.433, 'max_score': 118.735, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw118735.jpg'}, {'end': 208.215, 'src': 'embed', 'start': 174.864, 'weight': 4, 'content': [{'end': 181.949, 'text': "What you should do is click, summarize this topic and it'll appear like this, which is all of the most liked ones will appear,", 'start': 174.864, 'duration': 7.085}, {'end': 185.392, 'text': "and then there'll be view 31 hidden replies or whatever in between.", 'start': 181.949, 'duration': 3.443}, {'end': 189.014, 'text': "So that's how you 
navigate these giant topics.", 'start': 185.752, 'duration': 3.262}, {'end': 197.38, 'text': "That's also why it's important you click the like button because that's the thing that's going to cause people to to see it in this recommended view.", 'start': 189.434, 'duration': 7.946}, {'end': 208.215, 'text': "So, when you come back to work hopefully you've realized by now that on the official course website course-v3.fast.ai,", 'start': 200.609, 'duration': 7.606}], 'summary': "Navigate giant topics by clicking 'like' to prioritize most liked content for recommended view.", 'duration': 33.351, 'max_score': 174.864, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw174864.jpg'}, {'end': 257.913, 'src': 'embed', 'start': 220.084, 'weight': 2, 'content': [{'end': 224.227, 'text': "and step two will be how to make sure you've got the latest Python library software.", 'start': 220.084, 'duration': 4.143}, {'end': 229.411, 'text': "Okay? much like this, but they're slightly different from platform to platform.", 'start': 224.247, 'duration': 5.164}, {'end': 232.977, 'text': "So please don't use some different set of commands you read somewhere else.", 'start': 229.491, 'duration': 3.486}, {'end': 237.645, 'text': 'Only use the commands that you read about here, and that will make everything very smooth.', 'start': 233.278, 'duration': 4.367}, {'end': 246.026, 'text': "If things aren't working for you, if you get some into some kind of messy situation, which we all do and you know,", 'start': 239.122, 'duration': 6.904}, {'end': 248.548, 'text': 'just delete your instance and start again.', 'start': 246.026, 'duration': 2.522}, {'end': 253.03, 'text': "Unless you've got mission-critical stuff there, it's the easiest way just to get out of a sticky situation.", 'start': 248.588, 'duration': 4.442}, {'end': 257.913, 'text': 'And you know, if you follow the instructions here, you really should find it works fine.', 
'start': 253.07, 'duration': 4.843}], 'summary': 'Follow specific commands for smooth python library software updates, and start afresh if needed.', 'duration': 37.829, 'max_score': 220.084, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw220084.jpg'}], 'start': 1.57, 'title': 'Utilizing course resources', 'summary': 'Emphasizes the significance of using official course updates and resources, covering accessing updates, enabling notifications, and returning to work using the official course website, for a smooth learning experience and seamless operation.', 'chapters': [{'end': 90.738, 'start': 1.57, 'title': 'Lesson 2: computer vision dive', 'summary': 'Discusses the importance of using official course updates and resources for setting up platforms, emphasizing the reliability of the provided information and discouraging the use of unofficial tips.', 'duration': 89.168, 'highlights': ['The importance of using official course updates and resources for setting up platforms is emphasized, with an assurance of reliability and daily testing.', 'Encouragement to use the designated discussions for platform-related questions to ensure visibility and avoid clutter.', 'Reminder about the two important topics pinned on the forum regarding official course updates and resources.']}, {'end': 197.38, 'start': 91.519, 'title': 'Lesson updates and notifications', 'summary': 'Covers the process for accessing official updates, enabling notifications for updates, and navigating overwhelming thread replies, with a focus on providing a seamless learning experience and ensuring updated information to the participants.', 'duration': 105.861, 'highlights': ['The most popular thread has 1.1 thousand replies, emphasizing the overwhelming amount of information to navigate.', "Participants can enable notifications for updates by clicking 'Watching' on official update threads, ensuring they stay informed about any new developments.", 
'The chapter stresses the importance of providing updated information and creating a seamless learning experience for participants.', 'The need for participants to click the like button to ensure important information is seen in the recommended view is highlighted.']}, {'end': 257.913, 'start': 200.609, 'title': 'Returning to work: course update and platform preparation', 'summary': 'Provides guidance on returning to work using the official course website, emphasizing the importance of following specific steps to update notebooks and python library software, and suggests starting afresh if encountering issues, with the assurance that following the provided instructions should ensure smooth operation.', 'duration': 57.304, 'highlights': ['The importance of following specific steps to update notebooks and Python library software is emphasized. It is advised to use the commands provided on the official course website for a smooth transition.', 'Starting afresh by deleting the instance is recommended if encountering issues, unless there are mission-critical data involved.', 'The chapter provides guidance on returning to work using the official course website, emphasizing the importance of following specific steps to update notebooks and Python library software.']}], 'duration': 256.343, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw1570.jpg', 'highlights': ['Emphasizes the significance of using official course updates and resources for a smooth learning experience and seamless operation.', "Participants can enable notifications for updates by clicking 'Watching' on official update threads, ensuring they stay informed about any new developments.", 'The importance of following specific steps to update notebooks and Python library software is emphasized. 
It is advised to use the commands provided on the official course website for a smooth transition.', 'Encouragement to use the designated discussions for platform-related questions to ensure visibility and avoid clutter.', 'The need for participants to click the like button to ensure important information is seen in the recommended view is highlighted.', 'Starting afresh by deleting the instance is recommended if encountering issues, unless there are mission-critical data involved.']}, {'end': 713.043, 'segs': [{'end': 285.469, 'src': 'embed', 'start': 260.435, 'weight': 0, 'content': [{'end': 265.898, 'text': 'So, this is what I really wanted to talk about most of all, is what people have been doing this week.', 'start': 260.435, 'duration': 5.463}, {'end': 272.191, 'text': "If you've noticed, and a lot of you have, there's been 167 people sharing their work.", 'start': 266.884, 'duration': 5.307}, {'end': 279.284, 'text': "And this is really cool because it's pretty intimidating to put yourself out there and say like, I'm new to all this, but here's what I've done.", 'start': 272.732, 'duration': 6.552}, {'end': 285.469, 'text': "And so examples of things I thought was really interesting was figuring out who's talking.", 'start': 280.124, 'duration': 5.345}], 'summary': '167 people shared their work this week, despite being new to the field, showing courage and participation in the community.', 'duration': 25.034, 'max_score': 260.435, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw260435.jpg'}, {'end': 342.037, 'src': 'embed', 'start': 317.398, 'weight': 1, 'content': [{'end': 328.386, 'text': 'One was looking at the sounds data that was used in this paper, and in this paper they were trying to figure out what kind of sound things were.', 'start': 317.398, 'duration': 10.988}, {'end': 334.11, 'text': 'they got a, as you would expect since they published a paper, they got a state-of-the-art of nearly 80% 
accuracy.', 'start': 328.986, 'duration': 5.124}, {'end': 340.876, 'text': 'Ethan Sutton then tried using the Lesson 1 techniques and got 80.5% accuracy.', 'start': 334.911, 'duration': 5.965}, {'end': 342.037, 'text': 'So I think this is pretty awesome.', 'start': 340.896, 'duration': 1.141}], 'summary': 'The paper achieved nearly 80% accuracy in sound classification, but using lesson 1 techniques, ethan sutton achieved 80.5% accuracy, showing improvement.', 'duration': 24.639, 'max_score': 317.398, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw317398.jpg'}, {'end': 385.143, 'src': 'embed', 'start': 362.554, 'weight': 2, 'content': [{'end': 369.997, 'text': 'Suvash has a new state-of-the-art accuracy for Devangari text recognition.', 'start': 362.554, 'duration': 7.443}, {'end': 372.418, 'text': "I think he's got it even higher than this now.", 'start': 370.457, 'duration': 1.961}, {'end': 376.759, 'text': 'And this is actually confirmed by the person on Twitter who created the dataset.', 'start': 373.538, 'duration': 3.221}, {'end': 379.04, 'text': "I don't think he had any idea.", 'start': 377.36, 'duration': 1.68}, {'end': 380.721, 'text': "He just posted, hey, here's a nice thing I did.", 'start': 379.08, 'duration': 1.641}, {'end': 383.502, 'text': 'And this guy on Twitter was like, oh, I made that data set.', 'start': 381.201, 'duration': 2.301}, {'end': 385.143, 'text': "Congratulations You've got a new record.", 'start': 383.542, 'duration': 1.601}], 'summary': 'Suvash achieved a new record accuracy for devangari text recognition, confirmed by the dataset creator on twitter.', 'duration': 22.589, 'max_score': 362.554, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw362554.jpg'}, {'end': 532.161, 'src': 'embed', 'start': 480.832, 'weight': 3, 'content': [{'end': 485.595, 'text': 'But the actual deep learning side is actually pretty 
straightforward.', 'start': 480.832, 'duration': 4.763}, {'end': 493.411, 'text': 'Another very cool result from Simon Willison and Natalie Down.', 'start': 488.208, 'duration': 5.203}, {'end': 503.736, 'text': 'They created a Cougar or Not web application over the weekend and won the Science Hack Day award in San Francisco.', 'start': 493.931, 'duration': 9.805}, {'end': 507.459, 'text': "And so I think that's pretty fantastic.", 'start': 504.997, 'duration': 2.462}, {'end': 511.12, 'text': 'So lots of examples of people doing really interesting work.', 'start': 508.279, 'duration': 2.841}, {'end': 518.65, 'text': "Hopefully this will be inspiring to you to think, wow, this is cool that I can do this with what I've learned.", 'start': 512.941, 'duration': 5.709}, {'end': 522.832, 'text': 'It can also be intimidating to think like, wow, these people are doing amazing things.', 'start': 519.13, 'duration': 3.702}, {'end': 528.858, 'text': "But it's important to realize that of the thousands of people doing this course, you know,", 'start': 523.354, 'duration': 5.504}, {'end': 532.161, 'text': "I'm just picking out the kind of a few of the really amazing ones.", 'start': 528.858, 'duration': 3.303}], 'summary': 'Simon willison and natalie down won science hack day award in san francisco with cougar or not web app, inspiring many.', 'duration': 51.329, 'max_score': 480.832, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw480832.jpg'}], 'start': 260.435, 'title': 'New projects and deep learning achievements', 'summary': 'Highlights the impressive showcase of 167 people sharing their work, achieving a new state-of-the-art accuracy for sound data, and breaking records in devangari text recognition. 
it also discusses impressive achievements in deep learning, including beating the previous best by more than 30% and inspiring stories about contributing to the field, as well as showcases various creative classifiers built by the community.', 'chapters': [{'end': 385.143, 'start': 260.435, 'title': 'Sharing of new projects and achievements', 'summary': 'Highlights the impressive showcase of 167 people sharing their work, including a neural network for cleaning whatsapp images, achieving a new state-of-the-art accuracy for sound data, and breaking records in devangari text recognition.', 'duration': 124.708, 'highlights': ['Suvash achieves a new state-of-the-art accuracy for Devangari text recognition, confirmed by the creator of the dataset on Twitter.', 'Ethan Sutton uses Lesson 1 techniques to achieve 80.5% accuracy in sound data analysis, surpassing the state-of-the-art accuracy of 80% achieved in a published paper.', 'A total of 167 people share their work, indicating a significant level of engagement and participation within the community.', 'Individuals showcase impressive projects, such as a neural network for cleaning WhatsApp images and analyzing sound data, demonstrating the accessibility and capabilities of current technology.']}, {'end': 713.043, 'start': 385.383, 'title': 'Deep learning achievements and inspirations', 'summary': 'Discusses impressive achievements in deep learning, including a state-of-the-art result of beating the previous best by more than 30% and inspiring stories about overcoming intimidation to contribute to the field, as well as showcases various creative classifiers built by the community.', 'duration': 327.66, 'highlights': ["Alina Harley's state-of-the-art result, beating the previous best by more than 30% and receiving recognition from a VP at a genomics analysis company, is a significant achievement in deep learning. 
Alina Harley's achievement of beating the previous best by more than 30% and receiving recognition from a VP at a genomics analysis company demonstrates the impactful advancements in deep learning.", "Inspiring stories of overcoming intimidation and contributing to the field, such as Daniel Armstrong's journey from feeling overwhelmed by the documentation to submitting a pull request, showcase the importance of perseverance and determination in deep learning. Daniel Armstrong's journey from feeling overwhelmed by the documentation to submitting a pull request showcases the importance of perseverance and determination in contributing to the field of deep learning.", 'Creative classifiers built by the community, including classifiers for Trinidad and Tobago Islander versus Masquerader, zucchini versus cucumber, and recognizing new versus old Panamanian buses, demonstrate the diverse and creative applications of deep learning. The diverse and creative applications of deep learning are showcased through the development of creative classifiers for various categories, such as Trinidad and Tobago Islander versus Masquerader, zucchini versus cucumber, and recognizing new versus old Panamanian buses.']}], 'duration': 452.608, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw260435.jpg', 'highlights': ['167 people share their work, indicating significant community engagement.', 'Ethan Sutton achieves 80.5% accuracy in sound data analysis, surpassing the state-of-the-art accuracy of 80%.', 'Suvash achieves a new state-of-the-art accuracy for Devangari text recognition, confirmed by the dataset creator on Twitter.', 'Alina Harley beats the previous best by more than 30% in deep learning, receiving recognition from a VP at a genomics analysis company.', 'Inspiring stories of overcoming intimidation and contributing to the field showcase the importance of perseverance and determination in deep learning.', 'Creative 
classifiers for various categories demonstrate the diverse and creative applications of deep learning.']}, {'end': 1086.015, 'segs': [{'end': 763.7, 'src': 'embed', 'start': 742.053, 'weight': 0, 'content': [{'end': 753.138, 'text': "using some techniques we'll be discussing in the next couple of courses to build something that can recognize complete or incomplete or foundation buildings and actually plot them on aerial satellite view.", 'start': 742.053, 'duration': 11.085}, {'end': 758.58, 'text': 'So lots and lots of fascinating projects.', 'start': 754.999, 'duration': 3.581}, {'end': 761.036, 'text': "So don't worry, it's only been one week.", 'start': 759.414, 'duration': 1.622}, {'end': 763.7, 'text': "That doesn't mean everybody has to have had a project out yet.", 'start': 761.257, 'duration': 2.443}], 'summary': 'Techniques discussed to build a system recognizing and plotting buildings on aerial view.', 'duration': 21.647, 'max_score': 742.053, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw742053.jpg'}, {'end': 836.208, 'src': 'embed', 'start': 810.173, 'weight': 4, 'content': [{'end': 815.056, 'text': 'That will take us back into more computer vision and then back into more NLP.', 'start': 810.173, 'duration': 4.883}, {'end': 822.662, 'text': "So the idea here is that it turns out that it's much better for learning if you kind of see things multiple times.", 'start': 815.397, 'duration': 7.265}, {'end': 825.964, 'text': "So, rather than being like okay, that's computer vision, you won't see it again.", 'start': 823.062, 'duration': 2.902}, {'end': 832.567, 'text': "for the rest of the course, we're actually going to come back to the two key applications, NLP and computer vision, a few weeks apart.", 'start': 825.964, 'duration': 6.603}, {'end': 836.208, 'text': "And that's going to force your brain to realize like, oh, I have to remember this.", 'start': 832.967, 'duration': 3.241}], 'summary': 
'Repetition of computer vision and nlp enhances learning efficiency.', 'duration': 26.035, 'max_score': 810.173, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw810173.jpg'}, {'end': 946.007, 'src': 'embed', 'start': 921.756, 'weight': 2, 'content': [{'end': 932.482, 'text': 'So this approach is based on a lot of research, academic research into learning theory, and one guy in particular, David Perkins from Harvard,', 'start': 921.756, 'duration': 10.726}, {'end': 933.923, 'text': 'has this really great analogy.', 'start': 932.482, 'duration': 1.441}, {'end': 936.907, 'text': "He's a researcher into learning theory.", 'start': 934.244, 'duration': 2.663}, {'end': 944.766, 'text': "He describes this approach of the whole game, which is basically if you're teaching a kid to play soccer and You don't you know, first of all,", 'start': 937.347, 'duration': 7.419}, {'end': 946.007, 'text': 'teach them about.', 'start': 944.766, 'duration': 1.241}], 'summary': 'Based on academic research, the approach uses a soccer analogy for teaching.', 'duration': 24.251, 'max_score': 921.756, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw921756.jpg'}, {'end': 988.842, 'src': 'embed', 'start': 963.111, 'weight': 3, 'content': [{'end': 968.512, 'text': 'And then you, you know, gradually over the following years learn more and more so that you can get better and better at it.', 'start': 963.111, 'duration': 5.401}, {'end': 971.973, 'text': "So this is kind of what we're trying to get you to do, is to play soccer.", 'start': 968.892, 'duration': 3.081}, {'end': 977.015, 'text': 'which in our case is to type code and look at the inputs and look at the outputs.', 'start': 972.452, 'duration': 4.563}, {'end': 988.842, 'text': "Okay, so let's dig into our first notebook, which is called Lesson 2 Download.", 'start': 979.736, 'duration': 9.106}], 'summary': 'The goal is to 
improve coding skills by learning gradually, aiming for better performance, as demonstrated through the example of playing soccer.', 'duration': 25.731, 'max_score': 963.111, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw963111.jpg'}, {'end': 1066.621, 'src': 'embed', 'start': 1016.642, 'weight': 1, 'content': [{'end': 1022.384, 'text': 'the approach is inspired by Adrian Rosebrock, who has a terrific website called PI Image Search,', 'start': 1016.642, 'duration': 5.742}, {'end': 1028.186, 'text': 'and he has this nice explanation of how to create a data set using Google Images,', 'start': 1022.384, 'duration': 5.802}, {'end': 1032.43, 'text': 'and So that was definitely an inspiration for some of the techniques we use here.', 'start': 1028.186, 'duration': 4.244}, {'end': 1035.571, 'text': 'So thank you to Adrian And you should definitely check out his site.', 'start': 1032.45, 'duration': 3.121}, {'end': 1042.557, 'text': "It's a really it's full of lots of good resources So So here we are.", 'start': 1035.632, 'duration': 6.925}, {'end': 1051.063, 'text': 'So we are going to try to create a teddy bear detector.', 'start': 1042.957, 'duration': 8.106}, {'end': 1059.697, 'text': "Thanks, We're going to try and make a teddy bear detector and we're going to try and separate teddy bears and from black bears, from grizzly bears.", 'start': 1051.063, 'duration': 8.634}, {'end': 1061.378, 'text': 'Now this is very important.', 'start': 1060.117, 'duration': 1.261}, {'end': 1066.621, 'text': "I have a three-year-old daughter and she needs to know what she's dealing with.", 'start': 1062.078, 'duration': 4.543}], 'summary': "Inspired by adrian rosebrock's pi image search, creating a teddy bear detector to distinguish teddy bears from black and grizzly bears.", 'duration': 49.979, 'max_score': 1016.642, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw1016642.jpg'}], 'start': 713.043, 'title': 'Computer vision, nlp, and teaching coding', 'summary': 'Covers various computer vision projects such as satellite image classification and building recognition, emphasizes learning through repetition and experimentation, and introduces teaching coding via analogy with a focus on creating a teddy bear detector.', 'chapters': [{'end': 919.214, 'start': 713.043, 'title': 'Computer vision and nlp projects', 'summary': 'Discusses various computer vision projects, including satellite image classification and building recognition, with a focus on learning through repetition and experimentation, as well as the recommendation to revisit course content multiple times for better understanding and retention.', 'duration': 206.171, 'highlights': ['The chapter discusses various computer vision projects, including satellite image classification and building recognition. The transcript mentions projects such as satellite image classification and building recognition with techniques to recognize complete or incomplete buildings and plot them on aerial satellite view.', 'Emphasizes learning through repetition and experimentation. The chapter advocates for going through the course content multiple times, with the recommendation to watch the videos at least three times, going through the material slowly each time to gain a deeper understanding.', 'Encourages revisiting course content for better understanding and retention. 
It advises not to stop at lesson one and to continue revisiting the content until a better understanding is achieved, promoting the idea of learning through repetition for improved retention and comprehension.']}, {'end': 1086.015, 'start': 921.756, 'title': 'Teaching coding via analogy', 'summary': 'Discusses an approach to teaching coding inspired by the analogy of teaching soccer, where learners gradually build skills over time, and introduces creating a teddy bear detector using image classification.', 'duration': 164.259, 'highlights': ['The approach of teaching coding is inspired by the analogy of teaching soccer, where learners gradually build skills over time. The approach of teaching coding is inspired by the analogy of teaching soccer, where learners are introduced to coding gradually over time, similar to learning soccer skills, aiming to create a teddy bear detector using image classification.', "Introduction to creating a teddy bear detector using image classification. The chapter introduces creating a teddy bear detector using image classification and aims to separate teddy bears, black bears, and grizzly bears for the purpose of identifying genuine teddy bears, particularly important for the author's three-year-old daughter.", "Acknowledgment of inspiration from Adrian Rosebrock's website PI Image Search for techniques used in creating the data set. 
The chapter acknowledges inspiration from Adrian Rosebrock's website PI Image Search for techniques used in creating the data set and encourages readers to explore the site for valuable resources."]}], 'duration': 372.972, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw713043.jpg', 'highlights': ['The chapter discusses various computer vision projects, including satellite image classification and building recognition.', 'Introduction to creating a teddy bear detector using image classification.', 'Emphasizes learning through repetition and experimentation.', 'The approach of teaching coding is inspired by the analogy of teaching soccer, where learners gradually build skills over time.', 'Encourages revisiting course content for better understanding and retention.', "Acknowledgment of inspiration from Adrian Rosebrock's website PI Image Search for techniques used in creating the data set."]}, {'end': 2093.074, 'segs': [{'end': 1291.142, 'src': 'heatmap', 'start': 1139.671, 'weight': 1, 'content': [{'end': 1146.276, 'text': "So I've got windows, so I go control Shift J, paste in that code.", 'start': 1139.671, 'duration': 6.605}, {'end': 1147.857, 'text': 'So this is a JavaScript console.', 'start': 1146.296, 'duration': 1.561}, {'end': 1154.34, 'text': "For those of you who haven't done any JavaScript before, I hit enter and it downloads my file for me.", 'start': 1147.877, 'duration': 6.463}, {'end': 1160.423, 'text': 'So I would call this teddies.txt and press save.', 'start': 1154.7, 'duration': 5.723}, {'end': 1167.426, 'text': 'Okay, so I now have a file of teddies, or URLs of teddies.', 'start': 1161.023, 'duration': 6.403}, {'end': 1175.388, 'text': "So then I would repeat that process for black bears and for brown bears, since that's the classifier I would want,", 'start': 1168.086, 'duration': 7.302}, {'end': 1177.549, 'text': "and I'd put each one in a file with an appropriate name.", 'start': 
1175.388, 'duration': 2.161}, {'end': 1179.37, 'text': "So that's step one.", 'start': 1178.73, 'duration': 0.64}, {'end': 1185.812, 'text': 'So step two is we now need to download those URLs to our server.', 'start': 1179.97, 'duration': 5.842}, {'end': 1196.456, 'text': "Because remember when we're using Jupyter Notebook it's not running on our computer. it's running on SageMaker or Crestle or Google Cloud or whatever.", 'start': 1185.832, 'duration': 10.624}, {'end': 1200.974, 'text': 'To do that, we start running some Jupyter cells.', 'start': 1198.413, 'duration': 2.561}, {'end': 1206.437, 'text': "So let's grab the Fast.ai library and let's start with black bears.", 'start': 1201.034, 'duration': 5.403}, {'end': 1207.798, 'text': "I've already got my black bears URL.", 'start': 1206.477, 'duration': 1.321}, {'end': 1210.439, 'text': "So I click on this cell for black bears and I'll run it.", 'start': 1207.818, 'duration': 2.621}, {'end': 1219.744, 'text': "See here how I've got three different cells doing the same thing but different information? 
This is one way I like to work with Jupyter Notebook.", 'start': 1211.54, 'duration': 8.204}, {'end': 1225.097, 'text': "It's something that a lot of people with a more strict scientific background are horrified by.", 'start': 1219.804, 'duration': 5.293}, {'end': 1226.918, 'text': 'This is not reproducible research.', 'start': 1225.117, 'duration': 1.801}, {'end': 1233.76, 'text': 'I actually click here and I run this cell to create a folder called black and a file called urls black for my black bears.', 'start': 1226.938, 'duration': 6.822}, {'end': 1241.362, 'text': 'I skip the next two cells and then I run this cell to create that folder, okay?', 'start': 1234.24, 'duration': 7.122}, {'end': 1253.62, 'text': 'And then I go down to the next section and I run the next cell, which is download images for black bears, right?', 'start': 1242.022, 'duration': 11.598}, {'end': 1256.341, 'text': "So that's just going to download my black bears to that folder.", 'start': 1253.64, 'duration': 2.701}, {'end': 1264.545, 'text': "And then I'll go back and I'll click on teddies and I run that cell and then scroll back down and I'll run this cell.", 'start': 1256.901, 'duration': 7.644}, {'end': 1268.867, 'text': "And so that way I'm just going backwards and forwards to download each of the classes that I want.", 'start': 1264.965, 'duration': 3.902}, {'end': 1272.909, 'text': "Very manual, but for me, I'm very iterative.", 'start': 1269.847, 'duration': 3.062}, {'end': 1273.969, 'text': "I'm very experimental.", 'start': 1272.929, 'duration': 1.04}, {'end': 1275.11, 'text': 'That works well for me.', 'start': 1274.29, 'duration': 0.82}, {'end': 1282.419, 'text': "If you're better at kind of planning ahead than I am, you can, you know, write a proper loop or whatever and do it that way.", 'start': 1275.757, 'duration': 6.662}, {'end': 1291.142, 'text': "So, but when you see my notebooks and see things where there's these kind of like configuration cells doing the same thing in 
different places,", 'start': 1283.019, 'duration': 8.123}], 'summary': 'Process involves downloading files, organizing urls into files, and running jupyter cells to download images to server.', 'duration': 151.471, 'max_score': 1139.671, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw1139671.jpg'}, {'end': 1177.549, 'src': 'embed', 'start': 1147.877, 'weight': 3, 'content': [{'end': 1154.34, 'text': "For those of you who haven't done any JavaScript before, I hit enter and it downloads my file for me.", 'start': 1147.877, 'duration': 6.463}, {'end': 1160.423, 'text': 'So I would call this teddies.txt and press save.', 'start': 1154.7, 'duration': 5.723}, {'end': 1167.426, 'text': 'Okay, so I now have a file of teddies, or URLs of teddies.', 'start': 1161.023, 'duration': 6.403}, {'end': 1175.388, 'text': "So then I would repeat that process for black bears and for brown bears, since that's the classifier I would want,", 'start': 1168.086, 'duration': 7.302}, {'end': 1177.549, 'text': "and I'd put each one in a file with an appropriate name.", 'start': 1175.388, 'duration': 2.161}], 'summary': 'Introduction to downloading and organizing files in javascript for teddies, black bears, and brown bears.', 'duration': 29.672, 'max_score': 1147.877, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw1147877.jpg'}, {'end': 1418.03, 'src': 'embed', 'start': 1390.302, 'weight': 2, 'content': [{'end': 1398.887, 'text': "So we've got this thing in the library called verify images, which will check all of the images in a path and will tell you if there's a problem.", 'start': 1390.302, 'duration': 8.585}, {'end': 1402.589, 'text': 'If you say delete equals true, it will actually delete it for you.', 'start': 1399.407, 'duration': 3.182}, {'end': 1407.512, 'text': "Okay, so that's a really nice easy way to end up with a clean data set.", 'start': 1402.609, 'duration': 
4.903}, {'end': 1418.03, 'text': "So at this point I now have a bears folder containing a grizzly folder and a Teddy's folder and a black folder, and In other words,", 'start': 1408.292, 'duration': 9.738}], 'summary': "The library feature 'verify images' can check and delete images, aiding in data organization.", 'duration': 27.728, 'max_score': 1390.302, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw1390302.jpg'}, {'end': 1797.702, 'src': 'embed', 'start': 1768.433, 'weight': 0, 'content': [{'end': 1774.116, 'text': "So we've downloaded some images from Google Image Search, created a classifier.", 'start': 1768.433, 'duration': 5.683}, {'end': 1776.077, 'text': "We've got a 1.4% error rate.", 'start': 1774.136, 'duration': 1.941}, {'end': 1777.257, 'text': "Let's save it.", 'start': 1776.757, 'duration': 0.5}, {'end': 1784.721, 'text': "And then as per usual, we can use the classification interpretation class to have a look at what's going on.", 'start': 1778.598, 'duration': 6.123}, {'end': 1787.843, 'text': 'And in this case, we made one mistake.', 'start': 1785.442, 'duration': 2.401}, {'end': 1791.325, 'text': 'There was one black bear classified as grizzly bear.', 'start': 1788.684, 'duration': 2.641}, {'end': 1796.721, 'text': "So that's a really good step.", 'start': 1793.359, 'duration': 3.362}, {'end': 1797.702, 'text': "We've come a long way.", 'start': 1796.981, 'duration': 0.721}], 'summary': 'Created image classifier with 1.4% error rate, identified one misclassification, making significant progress.', 'duration': 29.269, 'max_score': 1768.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw1768433.jpg'}, {'end': 1886.571, 'src': 'embed', 'start': 1862.258, 'weight': 1, 'content': [{'end': 1872.863, 'text': "Like. 
if you think about it, it's very unlikely that if there is some mislabeled data, that it's going to be predicted correctly and with high confidence.", 'start': 1862.258, 'duration': 10.605}, {'end': 1875.284, 'text': "That's really unlikely to happen.", 'start': 1873.583, 'duration': 1.701}, {'end': 1884.109, 'text': "So we're going to focus on the ones which the model saying either it's not confident of or it was confident of and it was wrong about.", 'start': 1875.324, 'duration': 8.785}, {'end': 1886.571, 'text': 'They are the things which might be mislabeled.', 'start': 1884.829, 'duration': 1.742}], 'summary': 'Model focuses on low-confidence and mislabeled data to improve accuracy.', 'duration': 24.313, 'max_score': 1862.258, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw1862258.jpg'}], 'start': 1086.869, 'title': 'Building a bear classifier', 'summary': 'Covers the process of downloading and organizing images for a bear classifier with an emphasis on reproducibility, achieving a 1.4% error rate, and identifying and removing noisy data, resulting in improved model accuracy and performance.', 'chapters': [{'end': 1317.406, 'start': 1086.869, 'title': 'Downloading images and urls for classifiers', 'summary': 'Outlines the process of finding and downloading urls of teddy bears, black bears, and brown bears for image classification, involving using google images, javascript console, and jupyter notebook to create and download files and folders. manual and iterative approach is taken for downloading each class, with an emphasis on experimental and exploratory methods.', 'duration': 230.537, 'highlights': ['The process involves finding and downloading URLs of teddy bears, black bears, and brown bears for image classification. 
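The heuristic described above — mislabeled images almost never end up both correctly predicted and highly confident — can be turned into a simple filter over model outputs. The course does this with `ClassificationInterpretation` and its top-losses ordering; the sketch below is a hand-rolled approximation (the function name and the 0.9 confidence threshold are invented for illustration):

```python
def mislabel_candidates(probs, labels, confidence=0.9):
    """Return indices worth a human look: items the model got wrong,
    plus items it got right but without much confidence.

    probs  -- list of per-class probability lists, one per example
    labels -- list of integer class labels
    """
    suspects = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax class
        if pred != y or p[pred] < confidence:
            suspects.append(i)
    return suspects
```

Everything this filter passes over (correct and confident) is, by the argument in the transcript, very unlikely to be mislabeled, so a human only reviews the short suspect list.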
This is the main objective outlined in the chapter, emphasizing the need to gather URLs for different bear classes for image classification.', 'Using Google Images and JavaScript console to obtain the list of URLs for teddy bears, black bears, and brown bears. The speaker provides a detailed guide on how to use Google Images and JavaScript console to obtain the list of URLs, demonstrating a practical approach to acquire the necessary data for image classification.', 'The manual and iterative approach is taken for downloading each class of bear images, with an emphasis on experimental and exploratory methods. The speaker explains their manual and iterative approach to downloading URLs and images for each bear class, highlighting their preference for experimental and exploratory methods in the image classification process.']}, {'end': 1530.117, 'start': 1317.846, 'title': 'Reproducibility and creativity in data processing', 'summary': 'Discusses the importance of reproducibility in data processing while also highlighting the value of human creativity in trying out different approaches. it demonstrates the process of downloading, verifying, and organizing images for deep learning, emphasizing the significance of having a consistent validation set.', 'duration': 212.271, 'highlights': ['The chapter emphasizes the importance of reproducibility in data processing, while also highlighting the value of human creativity in trying out different approaches. Importance of reproducibility in data processing, Value of human creativity in trying out different approaches.', 'The process of downloading images involves using multiple processes, with the option to handle errors better by adjusting the configuration. Use of multiple processes for downloading images, Configuration adjustment for better error handling.', "The 'verify images' function is discussed, which checks for corrupted images and allows for their deletion, ensuring a clean data set. 
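fastai's `verify_images(path, delete=True)` performs the corruption check and optional deletion described here by actually decoding every file. A rough stdlib stand-in that only sniffs JPEG magic bytes (a much weaker test than decoding, and the function name is an assumption) might look like:

```python
from pathlib import Path

def verify_jpegs(folder, delete=False):
    """Flag files that do not start with the JPEG magic bytes.

    Loose approximation of fastai's verify_images: real verification
    decodes the image; here we only check the first two bytes.
    """
    bad = []
    for f in sorted(Path(folder).iterdir()):
        if not f.is_file():
            continue
        if f.read_bytes()[:2] != b"\xff\xd8":
            bad.append(f)
            if delete:
                f.unlink()      # mirror the delete=True behaviour
    return bad
```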
Discussion of 'verify images' function, Importance of ensuring a clean data set.", 'The creation of a validation set is demonstrated, highlighting the significance of having a consistent validation set and the use of a fixed random seed to ensure reproducibility. Creation of a validation set, Significance of a consistent validation set, Use of fixed random seed for reproducibility.']}, {'end': 1844.864, 'start': 1530.117, 'title': 'Building a bear classifier with 1.4% error rate', 'summary': 'Discusses building a bear classifier, achieving a 1.4% error rate, and using a combination of human expertise and computer learning to improve the dataset quality and classification accuracy.', 'duration': 314.747, 'highlights': ['Created a bear classifier with 1.4% error rate after a couple of epochs, demonstrating successful model training and accuracy improvement.', 'Emphasized the importance of combining human expertise with computer learning to clean up noisy datasets and improve classification accuracy.', 'Explained the process of selecting learning rates using Learning Rate Finder and highlighted the significance of identifying the strongest downward slope for optimal learning rate selection.', "Highlighted the usefulness of default parameter values for model training and mentioned that copying the instructor's numbers generally leads to successful outcomes."]}, {'end': 2093.074, 'start': 1845.565, 'title': 'Identifying noisy data in deep learning models', 'summary': 'Discusses the process of identifying and removing noisy data from a dataset by leveraging the top losses and a new file deleter widget, resulting in improved model accuracy and performance.', 'duration': 247.509, 'highlights': ['The process involves identifying mislabeled data by focusing on images where the model was either not confident or confidently wrong, leading to the creation of a new file deleter widget by the San Francisco Fast.ai study group.', 'The file deleter widget allows for the removal of 
mislabeled images from the validation dataset, resulting in cleaner metrics and improved model accuracy.', 'Utilizing the file deleter widget to clean the training dataset is recommended to eliminate noise and improve model performance, and a similar process can be repeated for the test set.', 'The demonstration of identifying and removing mislabeled images using the file deleter widget showcases the practical application of the process, resulting in improved dataset quality and model performance.']}], 'duration': 1006.205, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw1086869.jpg', 'highlights': ['Created a bear classifier with 1.4% error rate after a couple of epochs, demonstrating successful model training and accuracy improvement.', 'The process involves identifying mislabeled data by focusing on images where the model was either not confident or confidently wrong, leading to the creation of a new file deleter widget by the San Francisco Fast.ai study group.', "The 'verify images' function is discussed, which checks for corrupted images and allows for their deletion, ensuring a clean data set.", 'The process involves finding and downloading URLs of teddy bears, black bears, and brown bears for image classification. 
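The consistent validation set mentioned in this section comes from seeding the random split, so cleaning passes and retraining runs are always measured against the same held-out files. fastai does this with a NumPy seed before its random-percentage split; a minimal pure-Python sketch of the same idea (the helper name is an assumption):

```python
import random

def train_valid_split(items, valid_pct=0.2, seed=42):
    """Split items into (train, valid) reproducibly.

    Seeding before shuffling means every run assigns the same items
    to the validation set, so metrics stay comparable as you clean data.
    """
    items = sorted(items)        # fixed starting order
    rng = random.Random(seed)    # local RNG: no global side effects
    rng.shuffle(items)
    n_valid = int(len(items) * valid_pct)
    return items[n_valid:], items[:n_valid]
```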
This is the main objective outlined in the chapter, emphasizing the need to gather URLs for different bear classes for image classification.']}, {'end': 4287.648, 'segs': [{'end': 2340.034, 'src': 'embed', 'start': 2290.191, 'weight': 0, 'content': [{'end': 2291.733, 'text': "So I guess the main thing I'm saying is,", 'start': 2290.191, 'duration': 1.542}, {'end': 2299.12, 'text': "if you go through this process of cleaning up your data and then you rerun your model and find it's like 0.001% better, that's normal.", 'start': 2291.733, 'duration': 7.387}, {'end': 2300.862, 'text': "Okay, that's fine.", 'start': 2299.441, 'duration': 1.421}, {'end': 2306.528, 'text': "But it's still a good idea just to make sure that you don't have too much noise in your data in case it is biased.", 'start': 2300.922, 'duration': 5.606}, {'end': 2311.213, 'text': "So at this point, we're ready to put our model in production.", 'start': 2307.329, 'duration': 3.884}, {'end': 2326.189, 'text': 'this is where I hear a lot of people ask me about you know which mega, Google, Facebook, highly distributed serving system they should use,', 'start': 2312.644, 'duration': 13.545}, {'end': 2329.57, 'text': 'and how do they use a thousand GPUs at the same time and whatever else?', 'start': 2326.189, 'duration': 3.381}, {'end': 2340.034, 'text': 'For the vast vast vast majority of things that you all do, you will want to actually run in production on a CPU, not a GPU.', 'start': 2330.99, 'duration': 9.044}], 'summary': 'Cleaning data is crucial for model accuracy; often, cpus suffice for production.', 'duration': 49.843, 'max_score': 2290.191, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw2290191.jpg'}, {'end': 2425.443, 'src': 'embed', 'start': 2398.9, 'weight': 2, 'content': [{'end': 2405.927, 'text': "So most people I know who are running apps that aren't kind of at Google scale, based on deep learning, are using CPUs.", 'start': 
2398.9, 'duration': 7.027}, {'end': 2408.49, 'text': 'And the term we use is inference, right?', 'start': 2406.768, 'duration': 1.722}, {'end': 2415.116, 'text': "So when you're not training a model, you've got a trained model and you're getting it to predict things we call that inference.", 'start': 2408.53, 'duration': 6.586}, {'end': 2420.9, 'text': "That's why we say here you probably want to use CPU for inference.", 'start': 2415.376, 'duration': 5.524}, {'end': 2425.443, 'text': "So, at inference time, you've got your pre-trained model.", 'start': 2422.261, 'duration': 3.182}], 'summary': 'For smaller-scale apps, cpus are commonly used for inference with pre-trained models.', 'duration': 26.543, 'max_score': 2398.9, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw2398900.jpg'}, {'end': 2580.537, 'src': 'embed', 'start': 2550.706, 'weight': 3, 'content': [{'end': 2554.948, 'text': "so now you've got a data bunch that actually doesn't have any data in it at all.", 'start': 2550.706, 'duration': 4.242}, {'end': 2562.531, 'text': "It's just something that knows how to transform a new image in the same way that you trained with, so that you can now do inference.", 'start': 2555.308, 'duration': 7.223}, {'end': 2567.133, 'text': 'So you can now create a CNN with this kind of fake data bunch.', 'start': 2563.691, 'duration': 3.442}, {'end': 2571.495, 'text': 'And again, you would use exactly the same model that you trained with.', 'start': 2567.173, 'duration': 4.322}, {'end': 2574.316, 'text': 'You can now load in those saved weights.', 'start': 2572.055, 'duration': 2.261}, {'end': 2580.537, 'text': "Okay, and so this is the stuff that you would do once and just once, when your web app's starting up okay?", 'start': 2574.336, 'duration': 6.201}], 'summary': 'Using fake data bunch for cnn inference, with same model and saved weights.', 'duration': 29.831, 'max_score': 2550.706, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw2550706.jpg'}, {'end': 2735.096, 'src': 'embed', 'start': 2704.166, 'weight': 4, 'content': [{'end': 2706.87, 'text': 'nice little tutorials online and kind of starter code.', 'start': 2704.166, 'duration': 2.704}, {'end': 2714.53, 'text': "You know, if in doubt, why don't you try Starlet? There's a free hosting that you can use.", 'start': 2708.171, 'duration': 6.359}, {'end': 2718.191, 'text': "There's one called Python Anywhere, for example.", 'start': 2714.59, 'duration': 3.601}, {'end': 2721.932, 'text': "The one that Simon's used, we'll mention that on the forum.", 'start': 2719.392, 'duration': 2.54}, {'end': 2727.194, 'text': "It's something you can basically package it up as a Docker thing and shoot it off and it'll serve it up for you.", 'start': 2721.952, 'duration': 5.242}, {'end': 2729.254, 'text': "So it doesn't even need to cost you any money.", 'start': 2727.214, 'duration': 2.04}, {'end': 2735.096, 'text': "And so all these classifiers that you're creating, you can turn them into web applications.", 'start': 2729.614, 'duration': 5.482}], 'summary': 'Convert classifiers into web apps using free hosting like starlet or python anywhere.', 'duration': 30.93, 'max_score': 2704.166, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw2704166.jpg'}, {'end': 2865.417, 'src': 'embed', 'start': 2837.784, 'weight': 5, 'content': [{'end': 2840.566, 'text': 'It works most of the time.', 'start': 2837.784, 'duration': 2.782}, {'end': 2844.549, 'text': "So what if we try a learning rate of 0.5? That's huge.", 'start': 2840.926, 'duration': 3.623}, {'end': 2850.214, 'text': 'What happens? 
validation loss gets pretty damn high.', 'start': 2844.81, 'duration': 5.404}, {'end': 2854.915, 'text': "Remember, this is normally something that's underneath 1.", 'start': 2850.874, 'duration': 4.041}, {'end': 2861.717, 'text': 'So if you see your validation loss do that, before we even learn what validation loss is, just know this.', 'start': 2854.915, 'duration': 6.802}, {'end': 2864.037, 'text': "If it does that, your learning rate's too high.", 'start': 2861.797, 'duration': 2.24}, {'end': 2865.417, 'text': "That's all you need to know.", 'start': 2864.557, 'duration': 0.86}], 'summary': "A learning rate of 0.5 resulted in a high validation loss, indicating it's too high.", 'duration': 27.633, 'max_score': 2837.784, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw2837784.jpg'}, {'end': 2994.11, 'src': 'heatmap', 'start': 2917.033, 'weight': 0.708, 'content': [{'end': 2923.618, 'text': 'If you go to learn.recorder is an object which is going to keep track of lots of things happening while you train.', 'start': 2917.033, 'duration': 6.585}, {'end': 2933.885, 'text': 'You can call plot losses to print to plot out the validation and training loss, and you can just see them just like gradually going down so slow.', 'start': 2923.958, 'duration': 9.927}, {'end': 2939.709, 'text': 'So if you see that happening, then you have a learning rate which is too small.', 'start': 2934.445, 'duration': 5.264}, {'end': 2943.712, 'text': 'Okay, so bump it up by 10 or bump it up by 100 and try again.', 'start': 2940.449, 'duration': 3.263}, {'end': 2954.852, 'text': "The other thing you'll see if your learning rate is too small is that your training loss will be higher than your validation loss.", 'start': 2946.328, 'duration': 8.524}, {'end': 2962.475, 'text': 'You never want a model where your training loss is higher than your validation loss.', 'start': 2956.532, 'duration': 5.943}, {'end': 2972.019, 'text': "That 
always means you haven't fitted enough, which means either your learning rate is too low or your number of epochs is too low.", 'start': 2963.175, 'duration': 8.844}, {'end': 2975.981, 'text': 'So if you have a model like that, train it some more.', 'start': 2972.379, 'duration': 3.602}, {'end': 2978.879, 'text': 'or train it with a higher learning rate.', 'start': 2976.798, 'duration': 2.081}, {'end': 2984.143, 'text': 'Okay? Too few epochs.', 'start': 2979.48, 'duration': 4.663}, {'end': 2994.11, 'text': "So what if we train for just one epoch? Our error rate certainly is better than random, it's 5%, but look at this.", 'start': 2985.324, 'duration': 8.786}], 'summary': "Learn.recorder tracks training progress, adjust learning rate if losses don't decrease. training loss shouldn't exceed validation loss.", 'duration': 77.077, 'max_score': 2917.033, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw2917033.jpg'}, {'end': 3701.384, 'src': 'embed', 'start': 3672.305, 'weight': 6, 'content': [{'end': 3676.208, 'text': "But let's now dig in and actually understand it more completely.", 'start': 3672.305, 'duration': 3.903}, {'end': 3685.895, 'text': "So we're going to create this mathematical function that takes the numbers that represent the pixels and spits out probabilities for each possible class.", 'start': 3678.309, 'duration': 7.586}, {'end': 3694.918, 'text': "And by the way, a lot of the stuff that we're using here, we are stealing from other people who are awesome, and so we are putting their details here.", 'start': 3687.03, 'duration': 7.888}, {'end': 3701.384, 'text': "So like, please check out their work, because they've got great work that we are highlighting in our course.", 'start': 3695.018, 'duration': 6.366}], 'summary': "Creating a mathematical function to analyze pixels and generate class probabilities, acknowledging use of others' work.", 'duration': 29.079, 'max_score': 3672.305, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw3672305.jpg'}, {'end': 4036.987, 'src': 'embed', 'start': 4010.569, 'weight': 7, 'content': [{'end': 4017.491, 'text': "But what you might remember from school is that when you've got like two things being multiplied together,", 'start': 4010.569, 'duration': 6.922}, {'end': 4022.633, 'text': "two things being multiplied together and then they get added up, that's called a dot product.", 'start': 4017.491, 'duration': 5.142}, {'end': 4033.256, 'text': "And then if you do that for lots and lots of different numbers i, then that's called a matrix product.", 'start': 4025.934, 'duration': 7.322}, {'end': 4036.987, 'text': 'So in fact, this whole thing can be written like this.', 'start': 4034.246, 'duration': 2.741}], 'summary': 'Dot product is used to add two things being multiplied together, and matrix product is obtained by doing it for lots of different numbers i.', 'duration': 26.418, 'max_score': 4010.569, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw4010569.jpg'}, {'end': 4196.224, 'src': 'embed', 'start': 4170.673, 'weight': 8, 'content': [{'end': 4177.975, 'text': 'Okay, if you get more data, then you can train for longer, get a higher accuracy, lower error rate, without overfitting.', 'start': 4170.673, 'duration': 7.302}, {'end': 4182.036, 'text': "Unfortunately, there's no shortcut.", 'start': 4180.835, 'duration': 1.201}, {'end': 4183.237, 'text': 'I wish there was.', 'start': 4182.576, 'duration': 0.661}, {'end': 4185.818, 'text': 'I wish there was somewhere to know ahead of time how much data you need.', 'start': 4183.296, 'duration': 2.522}, {'end': 4187.439, 'text': 'But I will say this.', 'start': 4186.678, 'duration': 0.761}, {'end': 4189.779, 'text': 'Most of the time, you need less data than you think.', 'start': 4187.578, 'duration': 2.201}, {'end': 4196.224, 'text': 'So
organizations very commonly spend too much time gathering data, getting more data than it turned out they actually needed.', 'start': 4190.279, 'duration': 5.945}], 'summary': 'More data leads to longer training, higher accuracy, lower error rate, but commonly organizations gather more data than needed.', 'duration': 25.551, 'max_score': 4170.673, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw4170673.jpg'}], 'start': 2093.074, 'title': 'Deep learning and model training', 'summary': 'Covers data cleaning, app building, using cpus for inference, troubleshooting model training, simplifying deep learning concepts, and linear algebra for machine learning, emphasizing practical insights and solutions for model accuracy, efficiency, and understanding.', 'chapters': [{'end': 2398.159, 'start': 2093.074, 'title': 'Data cleaning, app building, and productionizing models', 'summary': 'Covers data cleaning for model training, building applications in jupyter notebook, and the practicality of running models in production on cpus over gpus, emphasizing the importance of reducing biased noisy data. it also encourages creating tools for fellow practitioners and learning about ipywidgets for gui programming.', 'duration': 305.085, 'highlights': ['The importance of reducing biased noisy data is emphasized, as it can significantly impact model accuracy, and even a marginal improvement after data cleaning is considered normal. The chapter emphasizes the importance of reducing biased noisy data, as it can significantly impact model accuracy. Even a marginal improvement after data cleaning is considered normal, highlighting the need to ensure minimal noise in the data.', "Encourages creating tools for fellow practitioners and learning about IPyWidgets for GUI programming to enhance fellow experimenters' experience. 
The chapter encourages creating tools for fellow practitioners and learning about IPyWidgets for GUI programming to enhance fellow experimenters' experience, underscoring the potential for creating applications inside notebooks and the underused nature of this capability.", 'Practicality of running models in production on CPUs over GPUs is emphasized, highlighting the ease of scaling and cost-effectiveness of using CPUs for most tasks. The practicality of running models in production on CPUs over GPUs is emphasized, highlighting the ease of scaling and cost-effectiveness of using CPUs for most tasks. It also explains the limited need for GPUs unless dealing with a very busy website.']}, {'end': 2763.071, 'start': 2398.9, 'title': 'Using cpus for inference in deep learning', 'summary': 'Discusses using cpus for inference in deep learning, including the process of loading pre-trained models, creating a data bunch, and deploying web applications, with insights on utilizing starlet for web app creation and python anywhere for free hosting.', 'duration': 364.171, 'highlights': ['The chapter emphasizes the usage of CPUs for inference in deep learning for apps not at Google scale, based on deep learning, and provides insights into the process of using pre-trained models for prediction. Most non-Google scale apps based on deep learning use CPUs for inference, and the process of loading pre-trained models for prediction is discussed.', 'The chapter details the process of creating a data bunch, including the need to pass the same information used for training, such as transforms, size, and normalization, to perform inference using a fake data bunch. 
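"Passing the same information used for training" ultimately means reusing the exact preprocessing constants at serving time. A toy sketch of applying stored channel statistics to a single pixel at inference (the constants are the widely published ImageNet means and standard deviations; the function name is invented):

```python
# Channel statistics saved from training; the model only works if
# inference inputs are normalized with these exact numbers.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel (values in [0, 1]) with training stats."""
    return [(c - m) / s for c, m, s in zip(rgb, mean, std)]
```

This is the design reason for the "fake data bunch": it carries the transforms, size, and normalization forward without carrying any actual training data.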
Creating a data bunch for inference involves passing the same information used for training, such as transforms, size, and normalization, to perform inference using a fake data bunch.', 'The chapter discusses the usage of Starlet for creating web applications, highlighting its similarity to Flask and its modern approach enabling the use of asynchronous Python 3 functionalities.', 'The chapter suggests trying Starlet and Python Anywhere for creating and hosting web applications, providing a cost-effective approach for deploying web apps without the need for significant financial investment.']}, {'end': 3208.368, 'start': 2767.284, 'title': 'Troubleshooting model training', 'summary': 'Discusses common problems in model training, such as learning rate and epochs, and provides practical solutions based on validation loss and error rates, emphasizing the significance of understanding these factors to ensure model accuracy and efficiency.', 'duration': 441.084, 'highlights': ['The significance of learning rate in model training: understanding the impact of learning rate on validation loss and the need to adjust it based on validation loss improvements or deteriorations.', 'Identifying overfitting through error rate patterns: explaining the misleading notion about overfitting based on training and validation loss, emphasizing the focus on error rate as a true indicator of overfitting.', 'The impact of epochs on model training: highlighting the similarities between the effects of too few epochs and a low learning rate, and suggesting approaches to address these issues by adjusting the training duration
or learning rate.']}, {'end': 3723.5, 'start': 3208.709, 'title': 'Simplifying deep learning concepts', 'summary': 'Explains the process of converting images into matrices, creating mathematical functions to predict outcomes, error rate calculation, and setting default learning rates, aiming to simplify deep learning concepts and encourage understanding.', 'duration': 514.791, 'highlights': ['Creating a mathematical function to convert image pixels into probabilities for each possible class: the chapter explains the process of creating a mathematical function that converts the numbers from images into probabilities for each possible class, simplifying the concept of deep learning.', 'Explanation of error rate calculation and application to validation set: the chapter details the calculation of error rate as 1 minus accuracy, applied to the validation set as a best practice.', 'Understanding default learning rates and their significance in initial fine-tuning: the chapter discusses 3e-3 as a good default learning rate that works most of the time for initial fine-tuning in deep learning models.']}, {'end': 4287.648, 'start': 3725.942, 'title': 'Linear algebra for machine learning', 'summary': 'Explains the use of linear algebra in machine learning, using the example of a linear equation y = a1x1 + a2x2 to represent multiple data points, and discusses the importance of having enough data for training models.', 'duration': 561.706, 'highlights': ['The use of linear algebra in machine learning is exemplified through the representation of a linear equation y = a1x1 + a2x2 to model multiple data points, demonstrating the application of linear algebra in machine learning.
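The linear equation y = a1x1 + a2x2 applied to many data points is exactly the matrix product described in the transcript: stack each point's (x1, x2) as a row of X, and every y_i is the dot product of row i with the coefficient vector a. A dependency-free sketch:

```python
def matvec(X, a):
    """y = X @ a: each y_i is the dot product of row i of X with a,
    i.e. y_i = a1*x_i1 + a2*x_i2 for a two-column X."""
    return [sum(x_ij * a_j for x_ij, a_j in zip(row, a)) for row in X]
```

With x2 fixed at 1 for every row (the bias trick), X = [[x, 1], ...] makes this the familiar line y = a1*x + a2 evaluated at every data point at once.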
linear algebra, machine learning, linear equation, multiple data points', 'The importance of having enough data for training models is emphasized, with the recommendation to gather more data to improve accuracy and lower error rates without overfitting. data gathering, training models, accuracy, error rates, overfitting']}], 'duration': 2194.574, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw2093074.jpg', 'highlights': ['The importance of reducing biased noisy data is emphasized, as it can significantly impact model accuracy, and even a marginal improvement after data cleaning is considered normal.', 'Practicality of running models in production on CPUs over GPUs is emphasized, highlighting the ease of scaling and cost-effectiveness of using CPUs for most tasks.', 'The chapter emphasizes the usage of CPUs for inference in deep learning for apps not at Google scale, based on deep learning, and provides insights into the process of using pre-trained models for prediction.', 'Creating a data bunch for inference involves passing the same information used for training, such as transforms, size, and normalization, to perform inference using a fake data bunch.', 'The chapter suggests trying Starlet and Python Anywhere for creating and hosting web applications, providing a cost-effective approach for deploying web apps without the need for significant financial investment.', 'Understanding the impact of learning rate on validation loss and the need to adjust it based on validation loss improvements or deteriorations.', 'The chapter explains the process of creating a mathematical function that converts the numbers from images into probabilities for each possible class, simplifying the concept of deep learning.', 'The use of linear algebra in machine learning is exemplified through the representation of a linear equation y = a1x1 + a2x2 to model multiple data points, demonstrating the application of linear algebra in 
machine learning.', 'The importance of having enough data for training models is emphasized, with the recommendation to gather more data to improve accuracy and lower error rates without overfitting.']}, {'end': 5166.08, 'segs': [{'end': 4385.42, 'src': 'embed', 'start': 4328.789, 'weight': 0, 'content': [{'end': 4340.358, 'text': "yeah and I understand we're going to be learning all about this shortly you don't There's no copy of ResNet-34.", 'start': 4328.789, 'duration': 11.569}, {'end': 4343.599, 'text': 'ResNet-34 is actually what we call an architecture.', 'start': 4340.498, 'duration': 3.101}, {'end': 4344.76, 'text': "We're going to be learning a lot about this.", 'start': 4343.619, 'duration': 1.141}, {'end': 4345.84, 'text': "It's a functional form.", 'start': 4344.84, 'duration': 1}, {'end': 4349.082, 'text': 'Just like this is a linear functional form.', 'start': 4346.381, 'duration': 2.701}, {'end': 4351.003, 'text': "It doesn't take up any room.", 'start': 4349.522, 'duration': 1.481}, {'end': 4352.043, 'text': "It doesn't contain anything.", 'start': 4351.043, 'duration': 1}, {'end': 4352.823, 'text': "It's just a function.", 'start': 4352.063, 'duration': 0.76}, {'end': 4354.704, 'text': 'ResNet-34 is just a function.', 'start': 4353.164, 'duration': 1.54}, {'end': 4356.485, 'text': "It doesn't contain anything.", 'start': 4355.004, 'duration': 1.481}, {'end': 4357.666, 'text': "It doesn't store anything.", 'start': 4356.505, 'duration': 1.161}, {'end': 4366.327, 'text': "I think the confusion here is that we often use a pre-trained neural net that's been learnt on ImageNet.", 'start': 4358.026, 'duration': 8.301}, {'end': 4370.81, 'text': "In this case, we don't need to use a pre-trained neural net.", 'start': 4366.967, 'duration': 3.843}, {'end': 4385.42, 'text': 'And actually, to entirely avoid that, even getting created, you can actually pass pre-trained equals false,', 'start': 4371.31, 'duration': 14.11}], 'summary': "Resnet-34 is a functional 
form, not containing or storing anything, and doesn't require pre-training.", 'duration': 56.631, 'max_score': 4328.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw4328789.jpg'}, {'end': 4566.093, 'src': 'embed', 'start': 4537.836, 'weight': 3, 'content': [{'end': 4539.616, 'text': "So that's where the S would be student.", 'start': 4537.836, 'duration': 1.78}, {'end': 4541.637, 'text': 'That would be student gradient descent.', 'start': 4539.656, 'duration': 1.981}, {'end': 4544.438, 'text': "So that's version 1 of SGD.", 'start': 4542.537, 'duration': 1.901}, {'end': 4549.319, 'text': "Version 2 of SGD, which is what we're going to talk about today, is where we're going to have a computer,", 'start': 4545.038, 'duration': 4.281}, {'end': 4555.18, 'text': 'try lots of things and try and come up with a really good function, and that will be called stochastic gradient descent.', 'start': 4549.319, 'duration': 5.861}, {'end': 4563.371, 'text': 'So, the other one that you hear a lot on Twitter is stochastic grad student descent.', 'start': 4556.041, 'duration': 7.33}, {'end': 4566.093, 'text': "So that's the other one that you hear.", 'start': 4563.531, 'duration': 2.562}], 'summary': 'SGD is introduced via a joke: version 1, student gradient descent, has a human try lots of functions by hand, while version 2, where a computer tries lots of functions, is actual stochastic gradient descent, with stochastic grad student descent as the Twitter variant of the joke.', 'duration': 28.257, 'max_score': 4537.836, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw4537836.jpg'}, {'end': 4778.118, 'src': 'embed', 'start': 4750.353, 'weight': 4, 'content': [{'end': 4752.875, 'text': 'A vector of length 4 would be a tensor.', 'start': 4750.353, 'duration': 2.522}, {'end': 4759.185, 'text': 'A 3D array of length 3 by 4 by 6 would be a tensor.', 'start': 4753.856, 'duration': 5.329}, {'end': 4760.826, 'text': "That's all a tensor is.", 'start':
4759.966, 'duration': 0.86}, {'end': 4765.389, 'text': 'Okay, and so we have these all the time.', 'start': 4761.607, 'duration': 3.782}, {'end': 4770.953, 'text': 'For example, an image is a three-dimensional tensor.', 'start': 4765.69, 'duration': 5.263}, {'end': 4778.118, 'text': "It's got number of rows by number of columns by number of channels, normally red, green, blue.", 'start': 4771.494, 'duration': 6.624}], 'summary': 'Tensors can be of various dimensions, e.g., images are 3d tensors with dimensions for rows, columns, and channels.', 'duration': 27.765, 'max_score': 4750.353, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw4750353.jpg'}], 'start': 4288.889, 'title': 'Resnet-34 in cnn and stochastic gradient descent in pytorch', 'summary': 'Discusses the misconception of resnet-34 in a cnn model, clarifies its function without taking up storage, and provides a method to avoid loading pre-trained neural net, saving 0.2 seconds at inference time. 
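The rank and shape terminology in the tensor segment above can be made concrete in a few lines. A pure-Python sketch (the lesson uses PyTorch tensors; `shape` here is a hypothetical helper written for illustration, not a library call) that treats a regularly-nested list as a tensor and reads off its shape, whose length is the rank:

```python
def shape(t):
    """Shape of a regularly-nested list 'tensor': one entry per dimension.
    The rank (number of dimensions) is simply len(shape(t))."""
    s = []
    while isinstance(t, list):
        s.append(len(t))
        t = t[0]
    return s

vector = [0, 0, 0, 0]                  # a vector of length 4 is a rank-1 tensor
image = [[[0] * 3 for _ in range(4)]   # rank-3 tensor: 2 rows x 4 columns x 3 channels,
         for _ in range(2)]            # like a tiny RGB image (rows x cols x channels)

print(shape(vector), len(shape(vector)))  # [4] 1
print(shape(image), len(shape(image)))    # [2, 4, 3] 3
```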
additionally, it covers the basics of stochastic gradient descent (sgd), creating synthetic data for a linear model in pytorch, and understanding tensors and their usage in deep learning, with a focus on the key concepts of pytorch.', 'chapters': [{'end': 4486.508, 'start': 4288.889, 'title': 'Understanding resnet-34 in cnn', 'summary': "Discusses the misconception of resnet-34 in a cnn model, clarifies that it's just a function without taking up storage, and provides a method to avoid loading pre-trained neural net, which can save 0.2 seconds at inference time.", 'duration': 197.619, 'highlights': ['The misconception that a model created by .save, which is about 85 megabytes on disk, would be able to run without also needing a copy of ResNet-34 is clarified as ResNet-34 is just a function without containing or storing anything.', 'A method to avoid loading pre-trained neural net is provided by passing pre-trained equals false, which can save 0.2 seconds at inference time and allows writing the code in PyTorch with no loops, resulting in faster execution.', "The explanation of ResNet-34 as a mathematical function that doesn't take any storage and doesn't have to be loaded, compared to the storage requirement for pre-trained models, is emphasized, providing insights into the efficient handling of neural network architecture in CNN models."]}, {'end': 5166.08, 'start': 4487.95, 'title': 'Stochastic gradient descent in pytorch', 'summary': 'Covers the basics of stochastic gradient descent (sgd), creating synthetic data for a linear model in pytorch, and understanding tensors and their usage in deep learning, with a focus on the key concepts of pytorch.', 'duration': 678.13, 'highlights': ['The chapter explains the concept of stochastic gradient descent (SGD) and its two versions - student gradient descent and stochastic gradient descent, providing a comprehensive overview of the topic. 
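The synthetic data that the SGD chapter trains on can be sketched without PyTorch. The coefficients (3, 2) and the trailing column of ones (so the second coefficient acts as the intercept) follow the lesson's setup; the uniform noise range here is an illustrative assumption (the notebook adds a small random noise term of its own):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

n = 100
a_true = [3.0, 2.0]  # the coefficients the training loop will later try to recover
# Each row of x is (x1, 1): the constant column of ones makes a_true[1] the intercept.
x = [[random.uniform(-1.0, 1.0), 1.0] for _ in range(n)]
# y = x @ a_true plus a little uniform noise, computed row by row
y = [row[0] * a_true[0] + row[1] * a_true[1] + random.uniform(-0.25, 0.25)
     for row in x]

print(len(x), len(y))  # 100 100
```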
It provides an overview of stochastic gradient descent (SGD) and its two versions - student gradient descent and stochastic gradient descent.', 'The transcript delves into the creation of synthetic data for a linear model in PyTorch, with specific coefficients and a matrix product between x and a, demonstrating the practical application of PyTorch for creating and manipulating data. It explains the creation of synthetic data for a linear model in PyTorch, involving specific coefficients and a matrix product between x and a.', 'The concept of tensors in deep learning is thoroughly discussed, emphasizing their significance in representing data and their various dimensions and ranks, offering a fundamental understanding of tensors and their usage in deep learning. It thoroughly discusses the concept of tensors in deep learning, emphasizing their significance in representing data and their various dimensions and ranks.']}], 'duration': 877.191, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw4288889.jpg', 'highlights': ['A method to avoid loading pre-trained neural net is provided by passing pre-trained equals false, saving 0.2 seconds at inference time.', 'The misconception that a model created by .save would be able to run without needing a copy of ResNet-34 is clarified as ResNet-34 is just a function without containing or storing anything.', "The explanation of ResNet-34 as a mathematical function that doesn't take any storage and doesn't have to be loaded is emphasized, providing insights into the efficient handling of neural network architecture in CNN models.", 'The chapter explains the concept of stochastic gradient descent (SGD) and its two versions - student gradient descent and stochastic gradient descent, providing a comprehensive overview of the topic.', 'The concept of tensors in deep learning is thoroughly discussed, emphasizing their significance in representing data and their various dimensions 
and ranks.']}, {'end': 5635.159, 'segs': [{'end': 5299.737, 'src': 'embed', 'start': 5274.559, 'weight': 0, 'content': [{'end': 5282.304, 'text': "It's like, I know, and we can't print the 50 million numbers anymore, but it is literally, identically doing the same thing.", 'start': 5274.559, 'duration': 7.745}, {'end': 5292.61, 'text': 'And the reason this is hard to digest is that the human brain has a lot of trouble conceptualizing of what an equation with 50 million numbers looks like and can do.', 'start': 5282.664, 'duration': 9.946}, {'end': 5296.092, 'text': 'So you just kind of now will have to take my word for it.', 'start': 5293.23, 'duration': 2.862}, {'end': 5299.737, 'text': 'it can do things like recognize teddy bears.', 'start': 5296.596, 'duration': 3.141}], 'summary': 'Equation with 50 million numbers can recognize teddy bears.', 'duration': 25.178, 'max_score': 5274.559, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw5274559.jpg'}, {'end': 5477.886, 'src': 'embed', 'start': 5438.018, 'weight': 3, 'content': [{'end': 5440.639, 'text': 'you may also see RMSE, which is root mean squared error.', 'start': 5438.018, 'duration': 2.621}, {'end': 5443.299, 'text': 'And so the mean squared error is a loss.', 'start': 5441.239, 'duration': 2.06}, {'end': 5447.5, 'text': "it's the difference between some prediction that you've made okay,", 'start': 5443.299, 'duration': 4.201}, {'end': 5453.561, 'text': 'which you know is like the value at the line and the actual number of ice cream sales.', 'start': 5447.5, 'duration': 6.061}, {'end': 5463.444, 'text': 'And so, in the mathematics of this, people normally refer to the actual, they normally call it Y, and the prediction they normally call it Y.', 'start': 5454.862, 'duration': 8.582}, {'end': 5465.244, 'text': 'hat, as in they write it.', 'start': 5463.444, 'duration': 1.8}, {'end': 5477.886, 'text': "And so what I try to do, like when we're writing
something like a, you know, mean squared error equation,", 'start': 5470.601, 'duration': 7.285}], 'summary': 'Rmse is a measure of prediction accuracy in ice cream sales, using y and y hat.', 'duration': 39.868, 'max_score': 5438.018, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw5438018.jpg'}, {'end': 5566.996, 'src': 'embed', 'start': 5530.74, 'weight': 5, 'content': [{'end': 5538.367, 'text': 'And so then we can take the mean of that to find the average square of the differences between the actuals and the predictors.', 'start': 5530.74, 'duration': 7.627}, {'end': 5562.174, 'text': "So if you're more comfortable with mathematical notation, what we just wrote there was the sum of y hat minus y squared over n.", 'start': 5539.767, 'duration': 22.407}, {'end': 5566.996, 'text': 'So that equation is the same as that equation.', 'start': 5562.174, 'duration': 4.822}], 'summary': 'Calculating the average square of differences using mean and mathematical notation.', 'duration': 36.256, 'max_score': 5530.74, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw5530740.jpg'}], 'start': 5166.461, 'title': 'Fitting line to data and regression loss functions', 'summary': 'Explains the process of fitting a line to 100 points and regression loss functions, emphasizing the scalability of the approach to 50 million numbers, the usage of mse loss function, and its significance in regression problems.', 'chapters': [{'end': 5321.385, 'start': 5166.461, 'title': 'Fitting line to data', 'summary': 'Explains the process of fitting a line to 100 points using an approach that works equally well for 50 million numbers, utilizing an almost identical technique, and training models with functions that can recognize objects.', 'duration': 154.924, 'highlights': ['The chapter explains the process of fitting a line to 100 points using an approach that works equally well for 50 
million numbers, utilizing an almost identical technique. The technique used to fit a line to 100 points works equally well for 50 million numbers, demonstrating its versatility and scalability.', 'Training models with functions that can recognize objects. The functions learned in the process can be very powerful, allowing models to recognize objects like teddy bears, emphasizing the practical applications of the technique.', 'The concept of fitting a line to data is hard to digest for many, as it involves dealing with equations containing 50 million numbers. The difficulty in conceptualizing equations with 50 million numbers is highlighted, underscoring the challenge in understanding the process of fitting lines to data.']}, {'end': 5635.159, 'start': 5322.242, 'title': 'Regression loss functions', 'summary': 'Explains the concept of regression and the mean squared error (mse) loss function used to minimize the error between the line and the points, highlighting its importance in regression problems and the mathematical calculation involved.', 'duration': 312.917, 'highlights': ['The most common loss function for regression problems is the mean squared error (MSE), which is used to measure the difference between the predicted values (Y hat) and the actual values (Y), providing a mathematical representation for minimizing the error between the line and the points. Mean squared error (MSE) is the most common loss function for regression problems, measuring the difference between predicted values (Y hat) and actual values (Y). It provides a mathematical representation for minimizing the error between the line and the points.', 'The mean squared error (MSE) involves squaring the differences between predicted and actual values and then taking the average of those squared differences to quantify the average square of the differences between the actuals and the predictors. 
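The formula just described, MSE = Σ(ŷ − y)² / n, is a one-liner in code. A pure-Python sketch with a hand-checkable value (the lesson's PyTorch version is the same computation, roughly `((y_hat - y)**2).mean()`):

```python
def mse(y_hat, y):
    """Mean squared error: average of the squared differences
    between predictions y_hat and actuals y."""
    n = len(y)
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / n

# Hand-checkable example: the errors are 1 and 2, so MSE = (1 + 4) / 2 = 2.5
print(mse([2.0, 3.0], [1.0, 1.0]))  # 2.5
```

Squaring serves two purposes the transcript alludes to: negative and positive errors both count as error, and larger misses are penalized more heavily.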
Mean squared error (MSE) involves squaring the differences between predicted and actual values, then taking the average of those squared differences to quantify the average square of the differences between the actuals and the predictors.', 'The mathematical function for mean squared error (MSE) is expressed as the sum of (Y hat minus Y) squared divided by n, providing a formal representation for the calculation of the mean squared error. The mathematical function for mean squared error (MSE) is expressed as the sum of (Y hat minus Y) squared divided by n, providing a formal representation for the calculation of the mean squared error.']}], 'duration': 468.698, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw5166461.jpg', 'highlights': ['The technique used to fit a line to 100 points works equally well for 50 million numbers, demonstrating its versatility and scalability.', 'The concept of fitting a line to data is hard to digest for many, as it involves dealing with equations containing 50 million numbers.', 'The functions learned in the process can be very powerful, allowing models to recognize objects like teddy bears, emphasizing the practical applications of the technique.', 'Mean squared error (MSE) is the most common loss function for regression problems, measuring the difference between predicted values (Y hat) and actual values (Y).', 'Mean squared error (MSE) involves squaring the differences between predicted and actual values, then taking the average of those squared differences to quantify the average square of the differences between the actuals and the predictors.', 'The mathematical function for mean squared error (MSE) is expressed as the sum of (Y hat minus Y) squared divided by n, providing a formal representation for the calculation of the mean squared error.']}, {'end': 7125.218, 'segs': [{'end': 5799.806, 'src': 'embed', 'start': 5736.349, 'weight': 0, 'content': [{'end': 5737.99, 
'text': "They don't bother with a zero, right?", 'start': 5736.349, 'duration': 1.641}, {'end': 5748.652, 'text': "But if you want to actually see exactly what it is, you can write dot type and you can see it's a float tensor, okay?", 'start': 5738.37, 'duration': 10.282}, {'end': 5761.283, 'text': 'And so now we can calculate our predictions with this, like random guess x at a matrix product of x and a,', 'start': 5751.796, 'duration': 9.487}, {'end': 5766.888, 'text': "and we can now calculate the mean squared error of our predictions and our actuals, and that's our loss.", 'start': 5761.283, 'duration': 5.605}, {'end': 5773.313, 'text': 'Okay, so for this regression our loss is 8.9.', 'start': 5767.968, 'duration': 5.345}, {'end': 5783.326, 'text': 'And so we can now plot scatterplot of x against y, and we can plot the scatterplot of x against y hat, our predictions, and there they are.', 'start': 5773.313, 'duration': 10.013}, {'end': 5791.674, 'text': "Okay, so this is the line, sorry, line, and here's our actuals.", 'start': 5784.547, 'duration': 7.127}, {'end': 5794.657, 'text': "So that's not great, which is not surprising, it's just a guess.", 'start': 5792.054, 'duration': 2.603}, {'end': 5799.806, 'text': 'So SGD or gradient descent more generally,', 'start': 5795.778, 'duration': 4.028}], 'summary': 'Using float tensor, calculated mean squared error, loss is 8.9 for regression.', 'duration': 63.457, 'max_score': 5736.349, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw5736349.jpg'}, {'end': 6263.872, 'src': 'embed', 'start': 6230.694, 'weight': 3, 'content': [{'end': 6240.628, 'text': 'Our learning rate was too high and meant that we jumped all the way past the right answer further than we started with,', 'start': 6230.694, 'duration': 9.934}, {'end': 6243.049, 'text': 'and it got worse and worse and worse.', 'start': 6240.628, 'duration': 2.421}, {'end': 6247.01, 'text': "So that's what a 
learning rate too high does.", 'start': 6244.409, 'duration': 2.601}, {'end': 6258.433, 'text': 'On the other hand, if our learning rate is too low, then you just take tiny little steps.', 'start': 6251.271, 'duration': 7.162}, {'end': 6263.872, 'text': "And so eventually you're going to get there, doing lots and lots of calculations along the way.", 'start': 6259.814, 'duration': 4.058}], 'summary': 'High learning rate led to overshooting, low rate led to slow progress', 'duration': 33.178, 'max_score': 6230.694, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw6230694.jpg'}, {'end': 6321.368, 'src': 'embed', 'start': 6290.538, 'weight': 2, 'content': [{'end': 6291.478, 'text': "And so that's all it does.", 'start': 6290.538, 'duration': 0.94}, {'end': 6297.784, 'text': "So if you look inside the source code of any deep learning library, you'll find this.", 'start': 6291.923, 'duration': 5.861}, {'end': 6303.805, 'text': "You'll find something that just says coefficients dot subtract learning rate times gradient.", 'start': 6298.504, 'duration': 5.301}, {'end': 6312.106, 'text': "And we'll learn about some minor, not minor, we'll learn about some easy but important optimizations we can do to make this go faster.", 'start': 6305.285, 'duration': 6.821}, {'end': 6315.367, 'text': "But that's basically it.", 'start': 6314.507, 'duration': 0.86}, {'end': 6321.368, 'text': "There's a couple of other little minor issues that we don't need to talk about now, one involving zeroing out the gradients.", 'start': 6316.007, 'duration': 5.361}], 'summary': 'Deep learning libraries use coefficients, learning rate, and gradients for optimization.', 'duration': 30.83, 'max_score': 6290.538, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw6290538.jpg'}, {'end': 6458.394, 'src': 'embed', 'start': 6429.182, 'weight': 4, 'content': [{'end': 6433.966, 'text': 'And so did 
that 100 times, waiting 20 milliseconds after each one, and there it is.', 'start': 6429.182, 'duration': 4.784}, {'end': 6445.608, 'text': "Right?. So you might think that, like visualizing your algorithms with animations, is some amazing and complex thing to do, but actually, now you know it's 1,", 'start': 6433.986, 'duration': 11.622}, {'end': 6449.092, 'text': '2, 3, 4, 5, 6, 7, 8, 9, 10, 11 lines of code.', 'start': 6445.608, 'duration': 3.484}, {'end': 6454.754, 'text': 'Okay, so I think that is pretty damn cool.', 'start': 6450.713, 'duration': 4.041}, {'end': 6458.394, 'text': 'So that is SGD visualized.', 'start': 6455.814, 'duration': 2.58}], 'summary': 'Visualizing algorithms with animations is achieved by 11 lines of code.', 'duration': 29.212, 'max_score': 6429.182, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw6429182.jpg'}, {'end': 6514.725, 'src': 'embed', 'start': 6481.238, 'weight': 5, 'content': [{'end': 6482.178, 'text': 'And try to get a feel for it.', 'start': 6481.238, 'duration': 0.94}, {'end': 6484.499, 'text': 'Maybe you can even try a 3D plot.', 'start': 6482.838, 'duration': 1.661}, {'end': 6486.72, 'text': "I haven't tried that yet, but I'm sure it would work fine too.", 'start': 6484.699, 'duration': 2.021}, {'end': 6495.802, 'text': 'So, the only difference between stochastic gradient descent and this is something called mini-batches.', 'start': 6488.48, 'duration': 7.322}, {'end': 6502.34, 'text': "You'll see what we did here was we calculated the value of the loss on the whole data set on every iteration.", 'start': 6496.417, 'duration': 5.923}, {'end': 6508.702, 'text': "But if your data set is one and a half million images in ImageNet, that's going to be really slow right?", 'start': 6503.34, 'duration': 5.362}, {'end': 6514.725, 'text': "Just to do a single update of your parameters, you've got to calculate the loss on one and a half million images.", 'start': 6508.722, 
'duration': 6.003}], 'summary': 'Comparison between stochastic gradient descent and mini-batch gradient descent in handling large datasets.', 'duration': 33.487, 'max_score': 6481.238, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw6481238.jpg'}, {'end': 6828.558, 'src': 'embed', 'start': 6798.684, 'weight': 6, 'content': [{'end': 6805.727, 'text': "that your thoughts are actually there because somebody has told you you're not a math person,", 'start': 6798.684, 'duration': 7.043}, {'end': 6809.709, 'text': "but there's actually no academic research to suggest that there is such a thing.", 'start': 6805.727, 'duration': 3.982}, {'end': 6818.312, 'text': 'In fact, there are some cultures like Romania and China, where the not a math person concept never even appeared.', 'start': 6810.129, 'duration': 8.183}, {'end': 6828.558, 'text': "It's almost unheard of in some cultures for somebody to say, I'm not a math person, because they just never entered that cultural identity.", 'start': 6820.073, 'duration': 8.485}], 'summary': "Cultural differences show no evidence of 'not a math person' concept, with examples from romania and china.", 'duration': 29.874, 'max_score': 6798.684, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw6798684.jpg'}, {'end': 6909.882, 'src': 'embed', 'start': 6882.001, 'weight': 9, 'content': [{'end': 6889.925, 'text': "Okay, so the last thing I want to close with is the idea of, and we're going to look at this more next week, underfitting and overfitting.", 'start': 6882.001, 'duration': 7.924}, {'end': 6896.295, 'text': 'we just fit a line to our data.', 'start': 6892.933, 'duration': 3.362}, {'end': 6900.597, 'text': "But imagine that our data wasn't actually line shaped right?", 'start': 6896.595, 'duration': 4.002}, {'end': 6909.882, 'text': "And so if we tried to fit something which was like constant plus constant times x i.e. 
a line to it, then it's never going to fit very well right?", 'start': 6900.897, 'duration': 8.985}], 'summary': 'Discussing underfitting and overfitting, and the limitations of fitting a line to non-linear data.', 'duration': 27.881, 'max_score': 6882.001, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw6882001.jpg'}, {'end': 7016.856, 'src': 'embed', 'start': 6968.006, 'weight': 7, 'content': [{'end': 6973.748, 'text': "There are other ways to make sure that we don't overfit, and in general this is called regularization.", 'start': 6968.006, 'duration': 5.742}, {'end': 6985.267, 'text': "Regularization are all the techniques to make sure that when we train our model, it's going to work not only well on the data it's seen,", 'start': 6974.509, 'duration': 10.758}, {'end': 6986.828, 'text': "but on the data it hasn't seen yet.", 'start': 6985.267, 'duration': 1.561}, {'end': 6997.437, 'text': "So the most important thing to know when you've trained a model is actually how well does it work on data that it hasn't been trained with.", 'start': 6988.029, 'duration': 9.408}, {'end': 7004.042, 'text': "And so as we're going to learn a lot about next week, that's why we have this thing called a validation set.", 'start': 6998.337, 'duration': 5.705}, {'end': 7016.856, 'text': 'So what happens with a validation set is that we do our mini-batch SGD training loop with one set of data, with one set of teddy bears, grizzlies,', 'start': 7004.582, 'duration': 12.274}], 'summary': 'Regularization techniques prevent overfitting and ensure model generalization to unseen data.', 'duration': 48.85, 'max_score': 6968.006, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw6968006.jpg'}], 'start': 5635.159, 'title': 'Neural network training and optimization', 'summary': 'Delves into neural network training, loss function, stochastic gradient descent, and overfitting, with a
mean squared error of 8.9 and the process of parameter optimization using gradient descent and stochastic gradient descent, aiming to prevent overfitting and underfitting.', 'chapters': [{'end': 5799.806, 'start': 5635.159, 'title': 'Neural network training and loss function', 'summary': 'Explores the process of neural network training, including the concept of the loss function, guessing initial parameter values, calculating predictions, mean squared error, and the resulting loss value of 8.9 for a specific regression, along with the implementation of stochastic gradient descent.', 'duration': 164.647, 'highlights': ['The loss function is used to assess the quality of the line that fits the data, with the resulting loss value for a specific regression being 8.9. The loss function is pivotal in evaluating the accuracy of the line fitting the data, with a quantifiable loss value of 8.9 obtained for a specific regression.', 'The process involves guessing initial parameter values, in this case, a1 and a2 being both 1, and then creating a tensor for the guess. The process starts with guessing initial parameter values, such as a1 and a2 both being 1, and then creating a tensor to represent the guess.', 'The predictions are calculated using a matrix product of the input and the guessed parameters, followed by the calculation of the mean squared error between the predictions and the actual values. 
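The sequence in the highlights above — guess a = (1, 1), form predictions with a matrix product, score them with MSE — can be replayed on a tiny made-up data set. Note the toy numbers here are assumptions for illustration; the notebook's own random data is what yields the quoted loss of roughly 8.9:

```python
def predict(x, a):
    """Predictions for an n-by-2 data matrix: the matrix product x @ a, row by row."""
    return [x1 * a[0] + x2 * a[1] for x1, x2 in x]

def mse(y_hat, y):
    """Mean squared error between predictions and actuals."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)

# Tiny made-up data set generated by y = 3*x1 + 2 (true coefficients (3, 2))
x = [[-1.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [-1.0, 2.0, 5.0, 8.0]

a_guess = [1.0, 1.0]  # the lesson's starting guess: a1 = a2 = 1
loss = mse(predict(x, a_guess), y)
print(loss)  # 9.0 -- a bad guess, so the loss is large; training will shrink it
```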
The predictions are derived using a matrix product of the input and the initial parameter guess, with the subsequent computation of the mean squared error between the predicted and actual values.']}, {'end': 6340.751, 'start': 5799.806, 'title': 'Gradient descent in machine learning', 'summary': 'Discusses the concept of gradient descent for optimizing parameters in machine learning models, emphasizing the calculation of derivatives, the use of learning rate, and the iterative process of updating coefficients to minimize the loss function.', 'duration': 540.945, 'highlights': ['The chapter emphasizes the calculation of derivatives and the use of learning rate in the gradient descent process. The derivative is utilized to determine the impact of changing parameters on the mean squared error (MSE), while the learning rate is crucial in controlling the magnitude of parameter updates.', 'The iterative process of updating coefficients involves subtracting the gradients multiplied by the learning rate, aiming to minimize the loss function. The coefficients are updated iteratively by subtracting the gradients multiplied by the learning rate, with the objective of minimizing the loss function through controlled parameter adjustments.', 'The importance of selecting an appropriate learning rate is highlighted, as it influences the magnitude of parameter updates and directly impacts the convergence of the optimization process. The learning rate plays a crucial role in determining the speed and stability of the optimization process, with a high learning rate leading to overshooting and divergence, and a low learning rate resulting in slow convergence.']}, {'end': 6879.6, 'start': 6341.912, 'title': 'Understanding stochastic gradient descent', 'summary': 'Introduces stochastic gradient descent (sgd) as a method to optimize parameters in a model, using visualizations and explanations to demystify the process. 
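The update rule quoted earlier (coefficients minus learning rate times gradient), combined with the mini-batch idea, gives a complete if minimal SGD loop. This pure-Python sketch derives the MSE gradient by hand where the lesson relies on PyTorch autograd; the data, learning rate, batch size, and step count are illustrative assumptions:

```python
import random

random.seed(0)

# Noise-free data from y = 3*x1 + 2 (a column of ones makes a[1] the intercept)
x = [[random.uniform(-1.0, 1.0), 1.0] for _ in range(100)]
y = [3.0 * x1 + 2.0 for x1, _ in x]

a = [1.0, 1.0]  # starting guess for the coefficients
lr = 0.1        # too high and the loss diverges; too low and each step is tiny

for step in range(200):
    batch = random.sample(range(len(x)), 10)  # a random mini-batch: the "stochastic" part
    # d(MSE)/d(a_j) = 2/m * sum(error * x_j), derived by hand here (autograd in the lesson)
    grads = [0.0, 0.0]
    for i in batch:
        err = x[i][0] * a[0] + x[i][1] * a[1] - y[i]
        grads[0] += 2 * err * x[i][0] / len(batch)
        grads[1] += 2 * err * x[i][1] / len(batch)
    # The whole of (S)GD: coefficients -= learning_rate * gradient
    a = [a[0] - lr * grads[0], a[1] - lr * grads[1]]

print(round(a[0], 2), round(a[1], 2))  # close to the true coefficients (3, 2)
```

Computing the gradient on a random mini-batch of 10 rather than all 100 points is exactly the full-data-set-versus-mini-batch trade-off the transcript describes for ImageNet-scale data.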
It explains the concept of mini-batches, learning rate, epochs, architecture, parameters, and loss functions, providing insights into the mathematical aspects and dispelling misconceptions about math proficiency.', 'duration': 537.688, 'highlights': ['SGD visualized with matplotlib animation simplifies complex concepts into 11 lines of code, making it accessible and easy to understand. Visualizing SGD with animation simplifies complex concepts, making it accessible and easy to understand.', 'Explanation of mini-batches and their role in speeding up parameter updates, providing a practical understanding of stochastic gradient descent. Mini-batches speed up parameter updates, providing a practical understanding of stochastic gradient descent.', 'Dispelling misconceptions about math proficiency and providing insights into cultural influences on learning math, encouraging learners to embrace mathematical concepts. Dispelling misconceptions about math proficiency and providing insights into cultural influences on learning math.']}, {'end': 7125.218, 'start': 6882.001, 'title': 'Overfitting, underfitting, and regularization', 'summary': 'Emphasizes the concepts of overfitting, underfitting, and regularization to ensure models work well on unseen data, highlighted by the importance of validation sets and techniques for preventing overfitting.', 'duration': 243.217, 'highlights': ['The importance of validation sets is emphasized to evaluate how well a model generalizes to unseen data.', 'Regularization techniques are discussed to prevent overfitting and ensure models work well on unseen data.', 'The concepts of overfitting and underfitting are explained in the context of fitting mathematical functions to data to ensure optimal model performance.']}], 'duration': 1490.059, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ccMHJeQU4Qw/pics/ccMHJeQU4Qw5635159.jpg', 'highlights': ['The loss function is pivotal in evaluating the accuracy of the line 
fitting the data, with a quantifiable loss value of 8.9 obtained for a specific regression.', 'The predictions are derived using a matrix product of the input and the initial parameter guess, with the subsequent computation of the mean squared error between the predicted and actual values.', 'The iterative process of updating coefficients involves subtracting the gradients multiplied by the learning rate, aiming to minimize the loss function.', 'The learning rate plays a crucial role in determining the speed and stability of the optimization process, with a high learning rate leading to overshooting and divergence, and a low learning rate resulting in slow convergence.', 'Visualizing SGD with animation simplifies complex concepts, making it accessible and easy to understand.', 'Mini-batches speed up parameter updates, providing a practical understanding of stochastic gradient descent.', 'Dispelling misconceptions about math proficiency and providing insights into cultural influences on learning math.', 'The importance of validation sets is emphasized to evaluate how well a model generalizes to unseen data.', 'Regularization techniques are discussed to prevent overfitting and ensure models work well on unseen data.', 'The concepts of overfitting and underfitting are explained in the context of fitting mathematical functions to data to ensure optimal model performance.']}], 'highlights': ['Ethan Sutin achieves 80.5% accuracy in sound data analysis, surpassing the state-of-the-art accuracy of 80%.', 'Suvash achieves a new state-of-the-art accuracy for Devanagari text recognition, confirmed by the dataset creator on Twitter.', 'Alena Harley beats the previous best by more than 30% in deep learning, receiving recognition from a VP at a genomics analysis company.', 'Created a bear classifier with 1.4% error rate after a couple of epochs, demonstrating successful model training and accuracy improvement.', 'The importance of reducing biased noisy data is emphasized, as it 
can significantly impact model accuracy, and even a marginal improvement after data cleaning is considered normal.', 'The process involves identifying mislabeled data by focusing on images where the model was either not confident or confidently wrong, leading to the creation of a new file deleter widget by the San Francisco Fast.ai study group.', 'The technique used to fit a line to 100 points works equally well for 50 million numbers, demonstrating its versatility and scalability.', 'Mean squared error (MSE) is the most common loss function for regression problems, measuring the difference between predicted values (Y hat) and actual values (Y).', 'The loss function is pivotal in evaluating the accuracy of the line fitting the data, with a quantifiable loss value of 8.9 obtained for a specific regression.', 'The iterative process of updating coefficients involves subtracting the gradients multiplied by the learning rate, aiming to minimize the loss function.', 'Visualizing SGD with animation simplifies complex concepts, making it accessible and easy to understand.', 'The importance of validation sets is emphasized to evaluate how well a model generalizes to unseen data.', 'Regularization techniques are discussed to prevent overfitting and ensure models work well on unseen data.', 'The concepts of overfitting and underfitting are explained in the context of fitting mathematical functions to data to ensure optimal model performance.']}
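The highlights above describe the lesson's gradient-descent-from-scratch segment: guess initial coefficients a1 = a2 = 1, compute predictions for a linear model, measure mean squared error, and repeatedly subtract gradient × learning rate. Below is a minimal runnable sketch of that loop in plain Python with hand-derived gradients; note this is an illustration, not the lesson's own notebook code (the lesson uses PyTorch tensors and autograd, and the synthetic data here with true coefficients 3 and 2 is an assumption for demonstration).

```python
import random

# Synthetic data near the line y = 3x + 2 (illustrative choice, not from
# the lesson), mirroring the "fit a line to 100 points" example.
random.seed(0)
xs = [i / 50 for i in range(100)]
ys = [3 * x + 2 + random.uniform(-0.1, 0.1) for x in xs]

def mse(y_hat, y):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)

a1, a2 = 1.0, 1.0   # initial guess for both parameters, as in the transcript
lr = 0.1            # learning rate: scales the size of each update step
n = len(xs)

for epoch in range(500):
    y_hat = [a1 * x + a2 for x in xs]               # predictions
    # Hand-derived gradients of MSE with respect to a1 and a2
    # (in the lesson, PyTorch autograd computes these for you).
    grad_a1 = sum(2 * (p - t) * x for p, t, x in zip(y_hat, ys, xs)) / n
    grad_a2 = sum(2 * (p - t) for p, t in zip(y_hat, ys)) / n
    # Update step: subtract gradient times learning rate.
    a1 -= lr * grad_a1
    a2 -= lr * grad_a2

print(a1, a2, mse([a1 * x + a2 for x in xs], ys))
```

This whole-batch loop uses every point for each gradient; mini-batch SGD, as described in the transcript, would instead compute each gradient on a random subset of the points, trading a noisier estimate for much cheaper updates. A learning rate much larger than the one above can overshoot and diverge, and a much smaller one converges slowly, matching the highlighted discussion of learning-rate choice.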