title
Lecture 1 | Introduction to Convolutional Neural Networks for Visual Recognition

description
Lecture 1 gives an introduction to the field of computer vision, discussing its history and key challenges. We emphasize that computer vision encompasses a wide variety of different tasks, and that despite the recent successes of deep learning we are still a long way from realizing the goal of human-level visual intelligence.
Keywords: Computer vision, Cambrian Explosion, Camera Obscura, Hubel and Wiesel, Block World, Normalized Cut, Face Detection, SIFT, Spatial Pyramid Matching, Histogram of Oriented Gradients, PASCAL Visual Object Challenge, ImageNet Challenge
Slides: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture1.pdf
Convolutional Neural Networks for Visual Recognition
Instructors:
Fei-Fei Li: http://vision.stanford.edu/feifeili/
Justin Johnson: http://cs.stanford.edu/people/jcjohns/
Serena Yeung: http://ai.stanford.edu/~syyeung/
Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This lecture collection is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. From this lecture collection, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision.
Website: http://cs231n.stanford.edu/
For additional learning opportunities please visit: http://online.stanford.edu/

detail
{'title': 'Lecture 1 | Introduction to Convolutional Neural Networks for Visual Recognition', 'heatmap': [{'end': 840.514, 'start': 796.571, 'weight': 0.723}, {'end': 1289.073, 'start': 1250.02, 'weight': 1}, {'end': 1569.895, 'start': 1453.152, 'weight': 0.722}, {'end': 2196.171, 'start': 2149.786, 'weight': 0.789}, {'end': 2330.569, 'start': 2218.37, 'weight': 0.75}, {'end': 3064.728, 'start': 3022.963, 'weight': 0.821}], 'summary': 'The lecture covers the rapid growth of the CS231N class at Stanford University, the overwhelming presence of visual data in internet traffic, the interdisciplinary nature of computer vision, the history of vision and computer vision, the evolution of object recognition, advancements in deep learning algorithms, and practical implementation of convolutional neural networks in Python.', 'chapters': [{'end': 197.672, 'segs': [{'end': 39.286, 'src': 'embed', 'start': 4.898, 'weight': 0, 'content': [{'end': 5.999, 'text': 'Stanford University.', 'start': 4.898, 'duration': 1.101}, {'end': 9.881, 'text': 'So welcome everyone to CS231N.', 'start': 7.58, 'duration': 2.301}, {'end': 14.525, 'text': "This is an amazing, I'm super excited to offer this class again for the third time.", 'start': 10.442, 'duration': 4.083}, {'end': 20.589, 'text': "It seems that every time we offer this class, it's just growing exponentially, unlike most things in the world.", 'start': 15.105, 'duration': 5.484}, {'end': 23.551, 'text': "So this is the third time we're teaching this class.", 'start': 20.989, 'duration': 2.562}, {'end': 25.572, 'text': 'The first time we had 150 students.', 'start': 23.931, 'duration': 1.641}, {'end': 28.334, 'text': 'Last year we had 350 students, so it doubled.', 'start': 26.032, 'duration': 2.302}, {'end': 31.276, 'text': "This year we've doubled again to about 730 students when I checked this morning.", 'start': 28.654, 'duration': 2.622}, {'end': 39.286, 'text': 'So anyone who was not able to fit into the lecture hall, I 
apologize.', 'start': 34.358, 'duration': 4.928}], 'summary': 'CS231N at Stanford has roughly doubled in enrollment with each offering, from 150 students to 350 to about 730 this year.', 'duration': 34.388, 'max_score': 4.898, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G44898.jpg'}, {'end': 86.213, 'src': 'embed', 'start': 55.299, 'weight': 2, 'content': [{'end': 59.182, 'text': 'And what is computer vision? Computer vision is really the study of visual data.', 'start': 55.299, 'duration': 3.883}, {'end': 65.646, 'text': "Since there's so many people enrolled in this class, I think I probably don't need to convince you that this is an important problem,", 'start': 60.142, 'duration': 5.504}, {'end': 67.167, 'text': "but I'm still gonna try to do that anyway.", 'start': 65.646, 'duration': 1.521}, {'end': 74.729, 'text': 'So the amount of visual data in our world has really exploded just to a ridiculous degree in the last couple of years.', 'start': 67.967, 'duration': 6.762}, {'end': 78.131, 'text': 'And this is largely a result of the large number of sensors in the world.', 'start': 75.25, 'duration': 2.881}, {'end': 86.213, 'text': 'So probably most of us in this room are carrying around smartphones, and each smartphone has one, two, or maybe even three cameras on it.', 'start': 78.611, 'duration': 7.602}], 'summary': 'Computer vision is vital due to the explosion of visual data from numerous sensors and smartphones.', 'duration': 30.914, 'max_score': 55.299, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G455299.jpg'}, {'end': 123.701, 'src': 'embed', 'start': 90.715, 'weight': 1, 'content': [{'end': 97.439, 'text': "And as a result of all these sensors, there's just a crazy large massive amount of visual data being produced out there in the world each day.", 'start': 90.715, 'duration': 6.724}, {'end': 108.587, 'text': 'So one statistic that I really like to 
kind of put this in perspective is a 2015 study from Cisco that estimated that by 2017,', 'start': 98.2, 'duration': 10.387}, {'end': 113.53, 'text': 'which is where we are now that roughly 80% of all traffic on the internet would be video.', 'start': 108.587, 'duration': 4.943}, {'end': 123.701, 'text': 'So this is not even counting all the images and other types of visual data on the web, but just from a pure number of bits perspective.', 'start': 114.17, 'duration': 9.531}], 'summary': 'By 2017, 80% of internet traffic was estimated to be video data.', 'duration': 32.986, 'max_score': 90.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G490715.jpg'}], 'start': 4.898, 'title': 'Computer vision explosion and challenges of visual data', 'summary': 'Discusses the rapid growth of the CS231N class at Stanford University, from 150 students to about 730 students, reflecting the increasing significance of computer vision. It also highlights the overwhelming presence of visual data, with around 80% of internet traffic estimated to be video by 2017, and emphasizes the difficulty in understanding and utilizing this data effectively.', 'chapters': [{'end': 74.729, 'start': 4.898, 'title': 'CS231N at Stanford: computer vision explosion', 'summary': 'The chapter discusses the rapid growth of the CS231N class at Stanford University, from 150 students during the first offering to about 730 students in the current year, reflecting the increasing significance of computer vision due to the explosive growth of visual data in recent years.', 'duration': 69.831, 'highlights': ['The class size has roughly doubled twice, from 150 students during the first offering to 350 last year and about 730 in the current year. The rapid growth of the CS231N class at Stanford University is evidenced by the increase in student enrollment from 150 to about 730.', 'The explosive growth of visual data in recent years has heightened the significance of computer vision. 
The chapter emphasizes the increasing importance of computer vision, attributing it to the exponential growth of visual data in recent years.']}, {'end': 197.672, 'start': 75.25, 'title': 'Challenges of visual data', 'summary': 'Discusses the overwhelming presence of visual data due to the abundance of sensors, highlighting that by 2017, around 80% of internet traffic would be video, and emphasizes the difficulty in understanding and utilizing this data effectively.', 'duration': 122.422, 'highlights': ['By 2017, roughly 80% of all traffic on the internet would be video, indicating the substantial amount of visual data present.', 'Every second, around five hours of video are uploaded to YouTube, underlining the massive volume of visual content being added constantly.', "Visual data is described as the 'dark matter' of the internet due to its abundance and the challenge it poses for algorithms to comprehend, drawing a comparison with dark matter in physics."]}], 'duration': 192.774, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G44898.jpg', 'highlights': ['The class size has doubled from 150 students to 730 students, reflecting the exponential growth of the CS231N class at Stanford University.', 'Roughly 80% of all internet traffic would be video by 2017, indicating the substantial amount of visual data present.', 'The explosive growth of visual data has heightened the significance of computer vision, emphasizing its increasing importance.']}, {'end': 442.329, 'segs': [{'end': 253.172, 'src': 'embed', 'start': 197.672, 'weight': 0, 'content': [{'end': 203.234, 'text': "but there's no way that they could ever have an employee sit down and watch and understand and annotate every video.", 'start': 197.672, 'duration': 5.562}, {'end': 210.175, 'text': 'So if they want to catalog and serve you relevant videos and maybe monetize by putting ads in those videos.', 'start': 203.874, 'duration': 6.301}, {'end': 215.596, 
'text': "it's really crucial that we develop technologies that can dive in and automatically understand the content of visual data.", 'start': 210.175, 'duration': 5.421}, {'end': 226.634, 'text': 'So this field of computer vision is truly an interdisciplinary field and it touches on many different areas of science and engineering and technology.', 'start': 218.368, 'duration': 8.266}, {'end': 234.88, 'text': 'So, obviously computer vision is the center of the universe, but sort of as a constellation of fields around computer vision.', 'start': 227.175, 'duration': 7.705}, {'end': 240.965, 'text': 'we touch on areas like physics because we need to understand optics and image formation and how images are actually physically formed.', 'start': 234.88, 'duration': 6.085}, {'end': 249.129, 'text': 'We need to understand biology and psychology to understand how animal brains physically see and process visual information.', 'start': 241.485, 'duration': 7.644}, {'end': 253.172, 'text': 'We, of course, draw a lot on computer science, mathematics and engineering,', 'start': 249.59, 'duration': 3.582}], 'summary': 'Developing technologies for automatic video understanding is crucial for cataloging and serving relevant videos, and potentially monetizing through ads.', 'duration': 55.5, 'max_score': 197.672, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4197672.jpg'}, {'end': 301.98, 'src': 'embed', 'start': 275.564, 'weight': 3, 'content': [{'end': 280.029, 'text': 'And our lab really focuses on machine learning and the computer science side of things.', 'start': 275.564, 'duration': 4.465}, {'end': 282.712, 'text': 'I work a little bit more on language and vision.', 'start': 280.749, 'duration': 1.963}, {'end': 283.833, 'text': "I've done some projects in that.", 'start': 282.752, 'duration': 1.081}, {'end': 289.2, 'text': 'And other folks in our group have worked a little bit on the neuroscience and cognitive 
science side of things.', 'start': 284.494, 'duration': 4.706}, {'end': 296.818, 'text': 'So as a bit of introduction, you might be curious about how this course relates to other courses at Stanford.', 'start': 292.116, 'duration': 4.702}, {'end': 301.98, 'text': 'So we kind of assume a basic introductory level understanding of computer vision.', 'start': 297.458, 'duration': 4.522}], 'summary': 'The lab focuses on machine learning, vision, and language. The course assumes a basic understanding of computer vision.', 'duration': 26.416, 'max_score': 275.564, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4275564.jpg'}, {'end': 378.598, 'src': 'embed', 'start': 333.37, 'weight': 4, 'content': [{'end': 339.932, 'text': "but we're really focusing on the computer vision side of things and really focusing all of our motivation in computer vision.", 'start': 333.37, 'duration': 6.562}, {'end': 346.635, 'text': 'Also concurrently taught this quarter is CS231A, taught by Professor Silvio Savarese.', 'start': 340.932, 'duration': 5.703}, {'end': 353.038, 'text': 'And CS231A really focuses on, is a more all-encompassing computer vision course.', 'start': 347.195, 'duration': 5.843}, {'end': 359.501, 'text': "It's focusing on things like 3D reconstruction, on matching and robotic vision,", 'start': 353.518, 'duration': 5.983}, {'end': 362.503, 'text': 'and is a bit more all-encompassing with regards to vision than our course.', 'start': 359.501, 'duration': 3.002}, {'end': 370.19, 'text': 'And this course, CS231N, really focuses on a particular class of algorithms revolving around neural networks,', 'start': 363.543, 'duration': 6.647}, {'end': 374.994, 'text': 'and especially convolutional neural networks and their applications to various visual recognition tasks.', 'start': 370.19, 'duration': 4.804}, {'end': 378.598, 'text': "Of course, there's also a number of seminar courses that are taught,", 'start': 376.295, 'duration': 2.303}], 
'summary': 'Cs231n focuses on computer vision and convolutional neural networks, while cs231a covers a broader range of computer vision topics.', 'duration': 45.228, 'max_score': 333.37, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4333370.jpg'}], 'start': 197.672, 'title': 'Computer vision', 'summary': 'Emphasizes the importance of developing computer vision technologies, highlighting its interdisciplinary nature and the focus on machine learning and neural networks in courses cs231a and cs231n at stanford vision lab.', 'chapters': [{'end': 253.172, 'start': 197.672, 'title': 'Importance of computer vision', 'summary': 'Emphasizes the importance of developing computer vision technologies to automatically understand and categorize visual data for cataloging and serving relevant videos, with the field touching on various disciplines including physics, biology, psychology, computer science, mathematics, and engineering.', 'duration': 55.5, 'highlights': ['Computer vision technologies are crucial for cataloging and serving relevant videos and potentially monetizing them through ads, as it is infeasible for employees to manually annotate every video, emphasizing the need for automated content understanding.', 'The interdisciplinary nature of computer vision involves fields such as physics, biology, psychology, computer science, mathematics, and engineering, highlighting the wide-ranging impact and relevance of this technology.', 'Understanding optics, image formation, and animal brain processing is necessary in computer vision, drawing on areas like physics, biology, and psychology, demonstrating the diverse knowledge base required for advancements in this field.', 'The chapter emphasizes the interdisciplinary nature of computer vision, involving physics, biology, psychology, computer science, mathematics, and engineering, showcasing its broad impact and relevance across various scientific and technological 
domains.']}, {'end': 333.37, 'start': 253.172, 'title': 'Computer vision introduction', 'summary': 'Introduces the stanford vision lab and the focus on machine learning and computer science, assuming a basic understanding of computer vision, and mentions the intersection of deep learning and natural language processing course taught by professors manning and socher.', 'duration': 80.198, 'highlights': ['The Stanford Vision Lab is focused on machine learning and computer science, with the instructor being a PhD student in the lab.', 'Assumes a basic introductory level understanding of computer vision, suggesting that undergrads who have not seen computer vision before may have taken course CS131.', 'Mentions the overlap between the current course and the previous course taught by Professors Manning and Socher about the intersection of deep learning and natural language processing.']}, {'end': 442.329, 'start': 333.37, 'title': 'Focus on computer vision', 'summary': 'Discusses the focus on computer vision in courses cs231a and cs231n, with emphasis on neural networks and convolutional neural networks, and the importance of understanding the history of computer vision.', 'duration': 108.959, 'highlights': ['CS231N focuses on neural networks and convolutional neural networks for visual recognition tasks.', 'CS231A is an all-encompassing computer vision course covering 3D reconstruction, matching, and robotic vision.', 'Understanding the history of computer vision is critical for the development of convolutional neural networks.']}], 'duration': 244.657, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4197672.jpg', 'highlights': ['Computer vision technologies automate video annotation, crucial for content understanding and monetization.', 'Interdisciplinary nature of computer vision spans physics, biology, psychology, computer science, mathematics, and engineering.', 'Understanding optics, image formation, and 
animal brain processing is essential in computer vision.', 'Stanford Vision Lab focuses on machine learning and computer science.', 'CS231N emphasizes neural networks and convolutional neural networks for visual recognition tasks.', 'CS231A covers 3D reconstruction, matching, and robotic vision.']}, {'end': 1385.287, 'segs': [{'end': 639.887, 'src': 'embed', 'start': 540.175, 'weight': 0, 'content': [{'end': 550.9, 'text': 'From the studies of fossils, he discovered around 540 million years ago, the first animals developed eyes.', 'start': 540.175, 'duration': 10.725}, {'end': 561.612, 'text': 'And the onset of vision started this explosive speciation phase.', 'start': 552.426, 'duration': 9.186}, {'end': 564.193, 'text': 'Animals can suddenly see.', 'start': 562.372, 'duration': 1.821}, {'end': 568.336, 'text': 'Once you can see, life becomes much more proactive.', 'start': 564.834, 'duration': 3.502}, {'end': 574.498, 'text': 'Some predators went after preys and preys have to escape from predators.', 'start': 569.056, 'duration': 5.442}, {'end': 587.362, 'text': 'So the evolution or onset of vision started an evolutionary arms race and animals had to evolve quickly in order to survive as a species.', 'start': 575.198, 'duration': 12.164}, {'end': 592.187, 'text': 'So that was the beginning of vision in animals.', 'start': 588.985, 'duration': 3.202}, {'end': 603.674, 'text': 'After 540 million years, vision has developed into the biggest sensory system of almost all animals, especially intelligent animals.', 'start': 592.968, 'duration': 10.706}, {'end': 613.161, 'text': 'In humans, we have almost 50% of the neurons in our cortex involved in visual processing.', 'start': 604.355, 'duration': 8.806}, {'end': 626.17, 'text': 'It is the biggest sensory system that enables us to survive, work, move around, manipulate things, communicate, entertain, and many things.', 'start': 613.761, 'duration': 12.409}, {'end': 633.036, 'text': 'So vision is really important for 
animals and especially intelligent animals.', 'start': 626.871, 'duration': 6.165}, {'end': 639.887, 'text': 'So that was a quick story of biological vision.', 'start': 634.142, 'duration': 5.745}], 'summary': 'First animals developed eyes 540 million years ago, leading to explosive speciation and an evolutionary arms race, making vision crucial for survival and functionality in intelligent animals.', 'duration': 99.712, 'max_score': 540.175, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4540175.jpg'}, {'end': 741.249, 'src': 'embed', 'start': 705.251, 'weight': 4, 'content': [{'end': 711.013, 'text': 'In the meantime, biologists started studying the mechanism of vision.', 'start': 705.251, 'duration': 5.762}, {'end': 721.617, 'text': 'One of the most influential work, in both human vision or animal vision, as well as that inspired computer vision,', 'start': 711.813, 'duration': 9.804}, {'end': 729.42, 'text': 'is the work done by Hubel and Wiesel in the 50s and 60s using electrophysiology.', 'start': 721.617, 'duration': 7.803}, {'end': 731.521, 'text': 'what they were asking.', 'start': 730.3, 'duration': 1.221}, {'end': 741.249, 'text': 'the question is what was the visual processing mechanism like in primates in mammals?', 'start': 731.521, 'duration': 9.728}], 'summary': "Hubel and wiesel's 50s and 60s work shaped human, animal, and computer vision.", 'duration': 35.998, 'max_score': 705.251, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4705251.jpg'}, {'end': 840.514, 'src': 'heatmap', 'start': 796.571, 'weight': 0.723, 'content': [{'end': 807.698, 'text': 'But by and large, what they discovered is visual processing starts with simple structure of the visual world, oriented edges.', 'start': 796.571, 'duration': 11.127}, {'end': 813.382, 'text': 'And as information moves along the visual processing pathway,', 'start': 808.358, 'duration': 
5.024}, {'end': 822.948, 'text': 'the brain builds up the complexity of the visual information until it can recognize the complex visual world.', 'start': 813.382, 'duration': 9.566}, {'end': 830.605, 'text': 'So the history of computer vision also starts around early 60s.', 'start': 824.659, 'duration': 5.946}, {'end': 840.514, 'text': 'Block World is a set of work published by Larry Roberts which is widely known as one of the first,', 'start': 831.465, 'duration': 9.049}], 'summary': 'Visual processing starts with simple structure of the visual world, as information moves along the visual processing pathway, the brain builds up the complexity of the visual information until it can recognize the complex visual world. computer vision history starts around early 60s with block world being one of the first works.', 'duration': 43.943, 'max_score': 796.571, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4796571.jpg'}, {'end': 950.84, 'src': 'embed', 'start': 895.267, 'weight': 3, 'content': [{'end': 909.233, 'text': 'The field of computer vision has blossomed from one summer project into a field of thousands of researchers worldwide still working on some of the most fundamental problems of vision.', 'start': 895.267, 'duration': 13.966}, {'end': 912.634, 'text': 'We still have not yet solved vision.', 'start': 909.693, 'duration': 2.941}, {'end': 922.74, 'text': 'but it has grown into one of the most important and fastest growing areas of artificial intelligence.', 'start': 913.194, 'duration': 9.546}, {'end': 927.944, 'text': 'Another person that we should pay tribute to is David Marr.', 'start': 923.861, 'duration': 4.083}, {'end': 950.84, 'text': 'David Marr was a MIT vision scientist and he has written an influential book in the late 70s about what he thinks vision is and how we should go about computer vision and developing algorithms that can enable computers to recognize the visual world.', 'start': 
929.212, 'duration': 21.628}], 'summary': "Computer vision has grown into a fast-growing field with thousands of researchers, yet the problem of vision remains unsolved. david marr's influential book in the late 70s has contributed to the development of computer vision algorithms.", 'duration': 55.573, 'max_score': 895.267, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4895267.jpg'}, {'end': 1127.359, 'src': 'embed', 'start': 1099.022, 'weight': 6, 'content': [{'end': 1107.428, 'text': 'In Palo Alto, both at Stanford as well as SRI, two groups of scientists have proposed similar ideas.', 'start': 1099.022, 'duration': 8.406}, {'end': 1112.092, 'text': 'One is called generalized cylinder, one is called pictorial structure.', 'start': 1107.648, 'duration': 4.444}, {'end': 1120.396, 'text': 'The basic idea is that Every object is composed of simple geometric primitives.', 'start': 1112.452, 'duration': 7.944}, {'end': 1127.359, 'text': 'For example, a person can be pieced together by generalized cylindrical shapes,', 'start': 1120.796, 'duration': 6.563}], 'summary': 'In palo alto, two groups propose ideas: generalized cylinder and pictorial structure for object recognition.', 'duration': 28.337, 'max_score': 1099.022, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G41099022.jpg'}, {'end': 1289.073, 'src': 'heatmap', 'start': 1191.954, 'weight': 7, 'content': [{'end': 1204.422, 'text': 'So there was a lot of effort in trying to think what is the tasks in computer vision in the 60s, 70s, and 80s.', 'start': 1191.954, 'duration': 12.468}, {'end': 1211.707, 'text': 'And frankly, it was very hard to solve the problem of object recognition.', 'start': 1205.183, 'duration': 6.524}, {'end': 1225.58, 'text': "Everything I've shown you so far are very audacious, ambitious attempts but they remain at the level of toy examples or just a few examples.", 'start': 
1211.887, 'duration': 13.693}, {'end': 1234.131, 'text': 'Not a lot of progress has been made in terms of delivering something that can work in the real world.', 'start': 1226.141, 'duration': 7.99}, {'end': 1246.097, 'text': 'So, as people think about what are the problems to solve in vision, one important question came around is if object recognition is too hard,', 'start': 1235.37, 'duration': 10.727}, {'end': 1249.339, 'text': 'maybe we should first do object segmentation.', 'start': 1246.097, 'duration': 3.242}, {'end': 1258.144, 'text': 'That is the task of taking an image and group the pixels into meaningful areas.', 'start': 1250.02, 'duration': 8.124}, {'end': 1260.625, 'text': 'We might not know the pixels.', 'start': 1258.764, 'duration': 1.861}, {'end': 1269.471, 'text': 'that group together is called a person, but we can extract out all the pixels that belong to the person from its background.', 'start': 1260.625, 'duration': 8.846}, {'end': 1272.113, 'text': 'That is called image segmentation.', 'start': 1270.051, 'duration': 2.062}, {'end': 1281.178, 'text': "So here's one very early seminal work by Jitendra Malik and his student, Jianbo Shi from Berkeley,", 'start': 1272.553, 'duration': 8.625}, {'end': 1289.073, 'text': 'from using a graph theory algorithm for the problem of image segmentation.', 'start': 1281.178, 'duration': 7.895}], 'summary': 'In the 60s-80s, object recognition was challenging. focus shifted to image segmentation. 
progress in computer vision remained limited.', 'duration': 66.19, 'max_score': 1191.954, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G41191954.jpg'}, {'end': 1349.504, 'src': 'embed', 'start': 1322.789, 'weight': 9, 'content': [{'end': 1332.893, 'text': 'These are techniques such as support vector machines, boosting graphical models, including the first wave of neural network.', 'start': 1322.789, 'duration': 10.104}, {'end': 1345.442, 'text': 'And one particular work that made a lot of contribution was using AdaBoost algorithm to do real time face detection by Paul Viola and Michael Jones.', 'start': 1333.653, 'duration': 11.789}, {'end': 1349.504, 'text': "And there's a lot to admire in this work.", 'start': 1346.522, 'duration': 2.982}], 'summary': 'Techniques like support vector machines and adaboost algorithm were used for real-time face detection, making significant contributions in the field.', 'duration': 26.715, 'max_score': 1322.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G41322789.jpg'}], 'start': 443.21, 'title': 'Evolution of vision and computer vision history', 'summary': 'Explores the origins of vision, tracing back to 540 million years ago, emphasizing its significance for animal behavior and survival. it also traces the history of mechanical and computer vision, highlighting the rapid growth of computer vision as a crucial area of artificial intelligence with thousands of researchers still working on fundamental problems.', 'chapters': [{'end': 639.887, 'start': 443.21, 'title': 'Evolution of vision', 'summary': 'Explores the origins of vision, tracing back to 540 million years ago, highlighting the rapid speciation phase triggered by the development of eyes and the subsequent impact on animal behavior and survival. 
it emphasizes the significance of vision as the foremost sensory system for almost all animals, particularly intelligent ones, with humans utilizing nearly 50% of their cortex neurons for visual processing.', 'duration': 196.677, 'highlights': ['The development of eyes around 540 million years ago led to an explosive speciation phase, with the number of animal species increasing from a few to hundreds of thousands within a short period of time, sparking an evolutionary arms race and necessitating quick evolution for survival.', 'Vision has evolved into the most significant sensory system for nearly all animals, including humans, with approximately 50% of the neurons in the human cortex dedicated to visual processing, enabling crucial functions such as survival, movement, communication, and entertainment.', 'The onset of vision initiated a proactive shift in animal behavior, leading to predators pursuing preys and preys adapting to evade predators, marking the beginning of an evolutionary arms race, emphasizing the pivotal role of vision in catalyzing the survival and evolution of species.']}, {'end': 1156.632, 'start': 640.467, 'title': 'History of mechanical vision and computer vision', 'summary': 'Traces the history of mechanical vision from the camera obscura in the renaissance period to the work of hubel and wiesel in the 50s and 60s, the development of computer vision in the 60s and 70s, and the influential thought process of david marr. the field of computer vision has grown into one of the most important and fastest growing areas of artificial intelligence with thousands of researchers still working on fundamental problems.', 'duration': 516.165, 'highlights': ['The field of computer vision has blossomed from one summer project into a field of thousands of researchers worldwide still working on some of the most fundamental problems of vision. 
Highlights the growth of computer vision as it has developed from a single summer project to a field with thousands of researchers working on fundamental vision problems.', 'The work done by Hubel and Wiesel in the 50s and 60s using electrophysiology has been influential in both human vision or animal vision, as well as that inspired computer vision. Emphasizes the influential work of Hubel and Wiesel in the 50s and 60s, which has inspired both human and animal vision studies as well as computer vision.', "The history of computer vision also starts around early 60s with the publication of 'Block World' by Larry Roberts and the MIT Summer Vision Project in 1966. Illustrates the early history of computer vision, starting in the early 60s with the publication of 'Block World' and the MIT Summer Vision Project in 1966.", "David Marr's influential book in the late 70s proposed a thought process that has dominated computer vision for several decades, providing a hierarchical process for deconstructing visual information. Highlights the influential thought process proposed by David Marr in the late 70s, which has dominated computer vision for several decades, providing a hierarchical process for deconstructing visual information.", 'In the 70s, two groups of scientists in Palo Alto proposed ideas for representing objects: generalized cylinder and pictorial structure, both reducing complex object structures into simpler shapes and their geometric configuration. 
Discusses the influential ideas proposed in the 70s by scientists in Palo Alto for representing objects, emphasizing the reduction of complex object structures into simpler shapes and their geometric configuration.']}, {'end': 1385.287, 'start': 1159.673, 'title': 'Evolution of computer vision', 'summary': 'Discusses the historical evolution of computer vision from the 60s to 2000, highlighting challenges in object recognition, advancements in image segmentation, and the significant impact of machine learning techniques on real-time face detection.', 'duration': 225.614, 'highlights': ['The significant challenges in object recognition persisted from the 60s to 80s, with limited progress in delivering practical solutions for real-world applications.', 'The shift towards object segmentation arose as a potential solution to the complexity of object recognition, leading to early seminal works in image segmentation by Jitendra Malik and Jianbo Shi.', 'Advancements in machine learning techniques, particularly statistical methods like support vector machines, boosting, graphical models, and early neural networks, gained momentum around 1999-2000, culminating in the impactful use of the AdaBoost algorithm for real-time face detection by Paul Viola and Michael Jones.', "The AdaBoost algorithm enabled near real-time face detection despite the technological limitations of slow computer chips, leading to its rapid adoption in real-world applications, as evidenced by Fujifilm's integration of a real-time face detector in their digital camera within five years of the publication of the paper."]}], 'duration': 942.077, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G4443210.jpg', 'highlights': ['The development of eyes around 540 million years ago led to an explosive speciation phase, sparking an evolutionary arms race and necessitating quick evolution for survival.', 'Vision has evolved into the most significant sensory 
system for nearly all animals, including humans, with approximately 50% of the neurons in the human cortex dedicated to visual processing, enabling crucial functions such as survival, movement, communication, and entertainment.', 'The onset of vision initiated a proactive shift in animal behavior, emphasizing the pivotal role of vision in catalyzing the survival and evolution of species.', 'The field of computer vision has blossomed from one summer project into a field of thousands of researchers worldwide still working on some of the most fundamental problems of vision.', 'The work done by Hubel and Wiesel in the 50s and 60s using electrophysiology has been influential in both human vision or animal vision, as well as that inspired computer vision.', "David Marr's influential book in the late 70s proposed a thought process that has dominated computer vision for several decades, providing a hierarchical process for deconstructing visual information.", 'In the 70s, two groups of scientists in Palo Alto proposed ideas for representing objects: generalized cylinder and pictorial structure, both reducing complex object structures into simpler shapes and their geometric configuration.', 'The significant challenges in object recognition persisted from the 60s to 80s, with limited progress in delivering practical solutions for real-world applications.', 'The shift towards object segmentation arose as a potential solution to the complexity of object recognition, leading to early seminal works in image segmentation by Jitendra Malik and Jianbo Shi.', 'Advancements in machine learning techniques, particularly statistical methods like support vector machines, boosting, graphical models, and early neural networks, gained momentum around 1999-2000, culminating in the impactful use of the AdaBoost algorithm for real-time face detection by Paul Viola and Michael Jones.', 'The AdaBoost algorithm enabled near real-time face detection despite the technological limitations of slow 
computer chips, leading to its rapid adoption in real-world applications.']}, {'end': 2179.316, 'segs': [{'end': 1414.097, 'src': 'embed', 'start': 1387.364, 'weight': 4, 'content': [{'end': 1394.488, 'text': 'So as a field, we continue to explore how we can do object recognition better.', 'start': 1387.364, 'duration': 7.124}, {'end': 1408.576, 'text': 'So one of the very influential ways of thinking in the late 90s till the first 10 years of 2000 is feature-based object recognition.', 'start': 1395.068, 'duration': 13.508}, {'end': 1414.097, 'text': 'And here is a seminal work by David Lowe called SIFT feature.', 'start': 1409.256, 'duration': 4.841}], 'summary': 'Field explores improving object recognition. SIFT feature is a seminal work.', 'duration': 26.733, 'max_score': 1387.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G41387364.jpg'}, {'end': 1577.999, 'src': 'heatmap', 'start': 1453.152, 'weight': 3, 'content': [{'end': 1465.464, 'text': 'So the task of object recognition began with identifying these critical features on the object and then matching the features to a similar object.', 'start': 1453.152, 'duration': 12.312}, {'end': 1470.029, 'text': "That's an easier task than pattern matching the entire object.", 'start': 1465.544, 'duration': 4.485}, {'end': 1487.087, 'text': 'So here is a figure from his paper where it shows that several dozen SIFT features from one stop sign are identified and matched to the SIFT features of another stop sign.', 'start': 1470.629, 'duration': 16.458}, {'end': 1500.128, 'text': 'Using the same building block, which is features, diagnostic features in images, we have,', 'start': 1491.44, 'duration': 8.688}, {'end': 1505.953, 'text': 'as a field, made another step forward and started recognizing holistic things.', 'start': 1500.128, 'duration': 5.825}, {'end': 1511.758, 'text': "Here's an example algorithm called spatial pyramid matching.", 'start': 1506.433, 
'duration': 5.325}, {'end': 1521.185, 'text': 'The idea is that there are features in the images that can give us clues about which type of scene it is,', 'start': 1512.238, 'duration': 8.947}, {'end': 1525.848, 'text': "whether it's a landscape or a kitchen or a highway, and so on.", 'start': 1521.185, 'duration': 4.663}, {'end': 1538.856, 'text': 'And this particular work takes these features from different parts of the image and in different resolutions and puts them together in a feature descriptor.', 'start': 1526.468, 'duration': 12.388}, {'end': 1544.1, 'text': 'And then we run a support vector machine algorithm on top of that.', 'start': 1539.297, 'duration': 4.803}, {'end': 1553.565, 'text': 'Similarly, a very similar work has gained momentum in human recognition.', 'start': 1545.281, 'duration': 8.284}, {'end': 1559.949, 'text': 'So, putting together these features,', 'start': 1554.186, 'duration': 5.763}, {'end': 1569.895, 'text': 'we have a number of works that look at how we can compose human bodies in more realistic images and recognize them.', 'start': 1559.949, 'duration': 9.946}, {'end': 1573.617, 'text': 'So one work is called the Histogram of Gradients.', 'start': 1570.355, 'duration': 3.262}, {'end': 1577.999, 'text': 'Another work is called deformable body part models.', 'start': 1574.137, 'duration': 3.862}], 'summary': 'Object recognition uses features to match and identify objects, such as using SIFT features for stop signs and spatial pyramid matching for scene recognition. 
similar work has advanced human recognition with methods like histogram of gradients and deformable body part models.', 'duration': 38.702, 'max_score': 1453.152, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G41453152.jpg'}, {'end': 1913.481, 'src': 'embed', 'start': 1873.479, 'weight': 0, 'content': [{'end': 1885.427, 'text': 'And this is the gigantic, probably the biggest data set producing the field of AI at that time.', 'start': 1873.479, 'duration': 11.948}, {'end': 1895.012, 'text': 'And it began to push forward the algorithm development of object recognition into another phase.', 'start': 1886.087, 'duration': 8.925}, {'end': 1900.655, 'text': 'Especially important is how to benchmark the progress.', 'start': 1896.193, 'duration': 4.462}, {'end': 1913.481, 'text': 'So starting in 2009, the ImageNet team rolled out an international challenge called ImageNet Large Scale Visual Recognition Challenge.', 'start': 1901.215, 'duration': 12.266}], 'summary': 'Imagenet dataset pushed ai field with largest data set, leading to algorithm advancements and the imagenet challenge in 2009.', 'duration': 40.002, 'max_score': 1873.479, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G41873479.jpg'}, {'end': 2088.581, 'src': 'embed', 'start': 2023.318, 'weight': 1, 'content': [{'end': 2035.623, 'text': "but to go from an error rate that's unacceptable for real world application all the way to being on par with humans in ImageNet Challenge.", 'start': 2023.318, 'duration': 12.305}, {'end': 2037.525, 'text': 'the field took only a few years.', 'start': 2035.623, 'duration': 1.902}, {'end': 2050.743, 'text': 'And one particular moment you should notice on this graph is the year 2012.', 'start': 2038.766, 'duration': 11.977}, {'end': 2056.608, 'text': 'In the first two years, our error rate hovered around 25%.', 'start': 2050.743, 'duration': 5.865}, {'end': 2071.041, 
'text': "But in 2012, the error rate dropped almost 10% to 16%, even though now it's better, but that drop was very significant.", 'start': 2056.608, 'duration': 14.433}, {'end': 2088.581, 'text': 'And the winning algorithm of that year is a convolutional neural network model that beat all other algorithms around that time to win the ImageNet challenge.', 'start': 2071.962, 'duration': 16.619}], 'summary': 'In 2012, error rate dropped almost 10 percentage points to 16% with the winning algorithm being a convolutional neural network model in the imagenet challenge.', 'duration': 65.263, 'max_score': 2023.318, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42023318.jpg'}], 'start': 1387.364, 'title': 'Evolution of object recognition', 'summary': 'Discusses the transition from feature-based to holistic object recognition techniques, advancements in human recognition, improvements in benchmark datasets like pascal visual object challenge and imagenet, development of convolutional neural network models, and the significant drop in error rate during the 2012 imagenet challenge.', 'chapters': [{'end': 1577.999, 'start': 1387.364, 'title': 'Evolution of object recognition techniques', 'summary': 'Discusses the transition from feature-based object recognition to holistic recognition, exemplified by SIFT features and spatial pyramid matching, and the advancements in human recognition such as histogram of gradients and deformable body part models.', 'duration': 190.635, 'highlights': ['Transition from feature-based object recognition to holistic recognition The field transitioned from matching entire objects to identifying critical features and utilizing them for pattern matching, exemplified by SIFT features and spatial pyramid matching.', 'Advancements in human recognition techniques Advancements in recognizing human bodies in images, including the Histogram of Gradients and deformable body part models, have gained momentum in the field.', 
"Influence of David Lowe's SIFT feature David Lowe's SIFT feature was influential in identifying critical features of objects that remain invariant to changes, leading to a shift in object recognition techniques."]}, {'end': 2179.316, 'start': 1579.14, 'title': 'Evolution of object recognition', 'summary': 'Discusses the evolution of object recognition, highlighting the improvements in benchmark datasets like pascal visual object challenge and imagenet, the development of convolutional neural network models, and the significant drop in error rate during the 2012 imagenet challenge.', 'duration': 600.176, 'highlights': ['The ImageNet Challenge saw a significant drop in error rate in 2012, from about 25% to 16%, and within a few years results were on par with human performance. The error rate steadily decreased over the years, eventually reaching a level comparable to human performance, demonstrating the remarkable progress in object recognition.', 'The development of convolutional neural network models, particularly in 2012, showed the tremendous capacity and ability to make progress in the field of computer vision. The winning algorithm of the 2012 ImageNet challenge was a convolutional neural network model, marking a significant advancement in the field of computer vision.', 'The creation of ImageNet, with almost 15 million images organized in 22,000 categories, significantly pushed forward the algorithm development of object recognition. 
ImageNet, with its massive dataset and international challenge, played a crucial role in advancing the development of object recognition algorithms, setting new benchmarks for progress.']}], 'duration': 791.952, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G41387364.jpg', 'highlights': ['The creation of ImageNet, with almost 15 million images organized in 22,000 categories, significantly pushed forward the algorithm development of object recognition.', 'The winning algorithm of the 2012 ImageNet challenge was a convolutional neural network model, marking a significant advancement in the field of computer vision.', 'The error rate steadily decreased over the years, dropping sharply to 16% in 2012 and reaching a level comparable to human performance within a few years, demonstrating the remarkable progress in object recognition.', 'Advancements in recognizing human bodies in images, including the Histogram of Gradients and deformable body part models, have gained momentum in the field.', 'The field transitioned from matching entire objects to identifying critical features and utilizing them for pattern matching, exemplified by SIFT features and spatial pyramid matching.', "David Lowe's SIFT feature was influential in identifying critical features of objects that remain invariant to changes, leading to a shift in object recognition techniques."]}, {'end': 2896.967, 'segs': [{'end': 2244.147, 'src': 'embed', 'start': 2218.37, 'weight': 2, 'content': [{'end': 2226.836, 'text': 'So this relatively basic tool of image classification is super useful on its own and could be applied all over the place for many different applications.', 'start': 2218.37, 'duration': 8.466}, {'end': 2239.144, 'text': "But in this course we're also gonna talk about several other visual recognition problems that build upon many of the tools that we develop for the purpose of image classification.", 'start': 2228.957, 'duration': 10.187}, {'end': 2244.147, 'text': "We'll talk about 
other problems such as object detection or image captioning.", 'start': 2239.984, 'duration': 4.163}], 'summary': 'Image classification is useful and applicable to various applications. course covers other visual recognition problems like object detection and image captioning.', 'duration': 25.777, 'max_score': 2218.37, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42218370.jpg'}, {'end': 2330.569, 'src': 'heatmap', 'start': 2218.37, 'weight': 0.75, 'content': [{'end': 2226.836, 'text': 'So this relatively basic tool of image classification is super useful on its own and could be applied all over the place for many different applications.', 'start': 2218.37, 'duration': 8.466}, {'end': 2239.144, 'text': "But in this course we're also gonna talk about several other visual recognition problems that build upon many of the tools that we develop for the purpose of image classification.", 'start': 2228.957, 'duration': 10.187}, {'end': 2244.147, 'text': "We'll talk about other problems such as object detection or image captioning.", 'start': 2239.984, 'duration': 4.163}, {'end': 2248.27, 'text': 'So the setup in object detection is a little bit different.', 'start': 2244.967, 'duration': 3.303}, {'end': 2253.875, 'text': 'Rather than classifying an entire image as a cat or a dog or a horse or whatnot,', 'start': 2248.67, 'duration': 5.205}, {'end': 2263.463, 'text': 'instead we want to go in and draw bounding boxes and say that there is a dog here and a cat here and a car over in the background and draw these boxes describing where objects are in the image.', 'start': 2253.875, 'duration': 9.588}, {'end': 2271.071, 'text': "We'll also talk about image captioning, where given an image, the system now needs to produce a natural language sentence describing the image.", 'start': 2264.504, 'duration': 6.567}, {'end': 2275.215, 'text': 'It sounds like a really hard, complicated and different problem,', 'start': 
2271.832, 'duration': 3.383}, {'end': 2282.943, 'text': "but we'll see that many of the tools we develop in service of image classification will be reused in these other problems as well.", 'start': 2275.215, 'duration': 7.728}, {'end': 2289.727, 'text': 'So we mentioned this before in the context of the ImageNet challenge,', 'start': 2286.644, 'duration': 3.083}, {'end': 2297.995, 'text': "but one of the things that's really driven the progress of the field in recent years has been this adoption of convolutional neural networks, or CNNs,", 'start': 2289.727, 'duration': 8.268}, {'end': 2299.537, 'text': 'or sometimes called ConvNets.', 'start': 2297.995, 'duration': 1.542}, {'end': 2310.699, 'text': 'So, if we look at the algorithms that have won the ImageNet challenge for the last several years, in 2011 we see this method from Lin et al.,', 'start': 2300.495, 'duration': 10.204}, {'end': 2312.94, 'text': 'which is still hierarchical.', 'start': 2310.699, 'duration': 2.241}, {'end': 2314.52, 'text': 'it consists of multiple layers.', 'start': 2312.94, 'duration': 1.58}, {'end': 2320.243, 'text': 'So first we compute some features, next we compute some local invariances, some pooling,', 'start': 2314.981, 'duration': 5.262}, {'end': 2325.725, 'text': 'and go through several layers of processing and then finally feed this resulting descriptor to a linear SVM.', 'start': 2320.243, 'duration': 5.482}, {'end': 2329.328, 'text': "What you'll notice here is that this is still hierarchical.", 'start': 2327.186, 'duration': 2.142}, {'end': 2330.569, 'text': "We're still detecting edges.", 'start': 2329.408, 'duration': 1.161}], 'summary': 'Introduction to image classification, object detection, and image captioning using convolutional neural networks and their applications in various visual recognition problems.', 'duration': 112.199, 'max_score': 2218.37, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42218370.jpg'}, 
{'end': 2390.101, 'src': 'embed', 'start': 2359.915, 'weight': 1, 'content': [{'end': 2363.919, 'text': 'And since then, every year, the winner of ImageNet has been a neural network.', 'start': 2359.915, 'duration': 4.004}, {'end': 2367.602, 'text': 'And the trend has been that these networks are getting deeper and deeper each year.', 'start': 2364.379, 'duration': 3.223}, {'end': 2373.167, 'text': 'So AlexNet was a seven or eight layer neural network, depending on how exactly you count things.', 'start': 2368.303, 'duration': 4.864}, {'end': 2382.675, 'text': 'In 2014 we had these much deeper networks, GoogLeNet from Google, and VGG, the VGG network from Oxford, which was about 19 layers at that time.', 'start': 2373.708, 'duration': 8.967}, {'end': 2390.101, 'text': 'And then, in 2015, it got really crazy and this paper came out from Microsoft Research Asia, called residual networks,', 'start': 2383.235, 'duration': 6.866}], 'summary': 'Annual imagenet winners are neural networks, increasing in depth each year.', 'duration': 30.186, 'max_score': 2359.915, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42359915.jpg'}, {'end': 2449.654, 'src': 'embed', 'start': 2422.696, 'weight': 0, 'content': [{'end': 2430.818, 'text': "But one point that's really important is that it's true that the breakthrough moment for convolutional neural networks was in 2012,", 'start': 2422.696, 'duration': 8.122}, {'end': 2433.479, 'text': 'when these networks performed very well on the ImageNet challenge.', 'start': 2430.818, 'duration': 2.661}, {'end': 2436.64, 'text': "But they certainly weren't invented in 2012.", 'start': 2434.119, 'duration': 2.521}, {'end': 2439.561, 'text': 'These algorithms had actually been around for quite a long time before that.', 'start': 2436.64, 'duration': 2.921}, {'end': 2449.654, 'text': 'So one of the sort of foundational works in this area of convolutional neural networks was actually in the 
90s,', 'start': 2441.403, 'duration': 8.251}], 'summary': 'Convolutional neural networks had breakthrough in 2012, but existed since the 90s.', 'duration': 26.958, 'max_score': 2422.696, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42422696.jpg'}, {'end': 2581.015, 'src': 'embed', 'start': 2554.247, 'weight': 3, 'content': [{'end': 2560.552, 'text': 'Just by having more compute available, it allowed researchers to explore with larger architectures and larger models.', 'start': 2554.247, 'duration': 6.305}, {'end': 2563.714, 'text': 'And in some cases just increasing the model size,', 'start': 2560.972, 'duration': 2.742}, {'end': 2568.117, 'text': 'but still using these kind of classical approaches and classical algorithms tends to work quite well.', 'start': 2563.714, 'duration': 4.403}, {'end': 2573.841, 'text': 'So this idea of increasing computation is super important in the history of deep learning.', 'start': 2569.057, 'duration': 4.784}, {'end': 2581.015, 'text': 'I think the second key innovation that changed between now and the 90s was data.', 'start': 2576.034, 'duration': 4.981}], 'summary': 'More compute enabled larger architectures, models, and improved performance in deep learning. 
data was another key innovation.', 'duration': 26.768, 'max_score': 2554.247, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42554247.jpg'}, {'end': 2626.309, 'src': 'embed', 'start': 2597.939, 'weight': 4, 'content': [{'end': 2603.26, 'text': 'before the internet was super, super widely used and it was very difficult to collect large, varied data sets.', 'start': 2597.939, 'duration': 5.321}, {'end': 2611.923, 'text': 'But now, in the 2010s, with data sets like Pascal and ImageNet, there existed these relatively large,', 'start': 2604.44, 'duration': 7.483}, {'end': 2618.386, 'text': 'high quality labeled data sets that were again orders of magnitude bigger than the data sets available in the 90s.', 'start': 2611.923, 'duration': 6.463}, {'end': 2626.309, 'text': 'And these much larger data sets again allowed us to work with higher capacity models and train these models to actually work quite well on real world problems.', 'start': 2618.866, 'duration': 7.443}], 'summary': 'In the 2010s, large labeled data sets like pascal and imagenet enabled training higher capacity models for real world problems.', 'duration': 28.37, 'max_score': 2597.939, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42597939.jpg'}, {'end': 2705.178, 'src': 'embed', 'start': 2674.452, 'weight': 5, 'content': [{'end': 2679.355, 'text': 'And we need to continue to develop our algorithms to do even better and tackle more ambitious problems.', 'start': 2674.452, 'duration': 4.903}, {'end': 2683.629, 'text': 'Some examples of this are going back to these older ideas, in fact.', 'start': 2680.488, 'duration': 3.141}, {'end': 2689.472, 'text': 'Things like semantic segmentation or perceptual grouping, where, rather than labeling the entire image,', 'start': 2684.27, 'duration': 5.202}, {'end': 2692.613, 'text': 'we want to understand for every pixel in the image, what is it doing??', 
'start': 2689.472, 'duration': 3.141}, {'end': 2693.193, 'text': 'What does it mean?', 'start': 2692.653, 'duration': 0.54}, {'end': 2696.274, 'text': "And we'll revisit that idea a little bit later in the course.", 'start': 2694.013, 'duration': 2.261}, {'end': 2705.178, 'text': "There's definitely work going back to this idea of 3D understanding, of reconstructing the entire world, and that's still an unsolved problem,", 'start': 2697.035, 'duration': 8.143}], 'summary': 'Developing algorithms for better image understanding and 3d reconstruction.', 'duration': 30.726, 'max_score': 2674.452, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42674452.jpg'}], 'start': 2179.316, 'title': 'Image classification and deep learning evolution', 'summary': 'Introduces image classification and cnns, and discusses their applications, including recognizing food, products, and artworks. it also covers the evolution of deep learning algorithms, highlighting advancements, challenges, and potential future developments in computer vision.', 'chapters': [{'end': 2507.552, 'start': 2179.316, 'title': 'Image classification and convolutional neural networks', 'summary': 'Introduces the concept and applications of image classification, such as recognizing food, products, and artworks, and discusses the evolution and breakthrough of convolutional neural networks (cnns) from 2012 to 2015, with deeper networks performing better in the imagenet challenge.', 'duration': 328.236, 'highlights': ["The breakthrough moment for convolutional neural networks was in 2012, with the creation of the seven-layer convolutional neural network, known as AlexNet, which performed exceptionally well in the ImageNet competition. 
In 2012, the creation of AlexNet, a seven- or eight-layer convolutional neural network, marked a breakthrough in the field, performing exceptionally well in the ImageNet competition and leading to subsequent years' winners being neural networks.", 'The trend in the ImageNet challenge has been that neural networks are getting deeper each year, with GoogLeNet and VGG networks in 2014 being significantly deeper, and the introduction of 152-layer residual networks from Microsoft Research Asia. The trend in the ImageNet challenge shows that neural networks are getting deeper each year, with the introduction of significantly deeper networks, such as the 152-layer residual networks from Microsoft Research Asia.', 'The concept and applications of image classification, including recognizing food, products, and artworks, are discussed, emphasizing the broad utility of this relatively basic tool in various settings. The broad utility of image classification is emphasized, with applications including recognizing food, products, and artworks, indicating its relevance in various settings.', 'The chapter discusses the primary focus on the image classification problem and its application in various settings, highlighting its usefulness and relevance in industry and academia. The chapter focuses on the image classification problem and its application in various settings, emphasizing its usefulness and relevance in both industry and academia.', 'The foundational works in the area of convolutional neural networks are discussed, dating back to the 90s, including the creation of a convolutional neural network for recognizing digits by Yann LeCun and collaborators. 
The foundational works in the area of convolutional neural networks dating back to the 90s are discussed, including the creation of a convolutional neural network for recognizing digits by Yann LeCun and collaborators.']}, {'end': 2896.967, 'start': 2509.471, 'title': 'Evolution of deep learning', 'summary': 'Explains the surge in popularity of deep learning algorithms, attributed to advancements in computation and data availability, leading to larger models and better real-world performance, while highlighting ongoing challenges and the potential for future developments in computer vision.', 'duration': 387.496, 'highlights': ["Advancements in computation, including faster computers and the emergence of GPUs, have enabled the exploration of larger architectures and models, leading to improved deep learning performance. Moore's Law and the exponential growth in the number of transistors on chips have provided several orders of magnitude increase in computation, allowing for the utilization of larger models and architectures for deep learning.", 'The availability of larger, high-quality labeled datasets, such as Pascal and ImageNet, in the 2010s has facilitated training higher capacity models, improving their real-world performance compared to the limited labeled data sets available in the 90s. The existence of orders of magnitude larger labeled datasets in the 2010s, compared to the 90s, has enabled the training of deep learning models on more varied and extensive data, leading to improved real-world performance.', 'Challenges and open problems in computer vision, such as semantic segmentation, perceptual grouping, 3D understanding, and activity recognition, highlight the need for continued algorithmic development in tackling more ambitious tasks. 
The field of computer vision faces numerous open challenges, including semantic segmentation, 3D understanding, and activity recognition, emphasizing the ongoing need for algorithmic advancements to address complex visual tasks.']}], 'duration': 717.651, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42179316.jpg', 'highlights': ['The breakthrough moment for convolutional neural networks was in 2012, with the creation of the seven-layer convolutional neural network, known as AlexNet, which performed exceptionally well in the ImageNet competition.', 'The trend in the ImageNet challenge shows that neural networks are getting deeper each year, with the introduction of significantly deeper networks, such as the 152-layer residual networks from Microsoft Research Asia.', 'The broad utility of image classification is emphasized, with applications including recognizing food, products, and artworks, indicating its relevance in various settings.', 'Advancements in computation, including faster computers and the emergence of GPUs, have enabled the exploration of larger architectures and models, leading to improved deep learning performance.', 'The existence of orders of magnitude larger labeled datasets in the 2010s, compared to the 90s, has enabled the training of deep learning models on more varied and extensive data, leading to improved real-world performance.', 'Challenges and open problems in computer vision, such as semantic segmentation, perceptual grouping, 3D understanding, and activity recognition, highlight the need for continued algorithmic development in tackling more ambitious tasks.']}, {'end': 3470.459, 'segs': [{'end': 2984.654, 'src': 'embed', 'start': 2954.977, 'weight': 0, 'content': [{'end': 2959.439, 'text': 'It can be very useful, it can go out and make the world a better place in various ways.', 'start': 2954.977, 'duration': 4.462}, {'end': 2968.001, 'text': 'Computer vision could be applied in 
places like medical diagnosis and self-driving cars and robotics and all these different places.', 'start': 2960.139, 'duration': 7.862}, {'end': 2972.463, 'text': 'In addition to sort of tying back to this core idea of understanding human intelligence.', 'start': 2968.382, 'duration': 4.081}, {'end': 2977.487, 'text': 'So to me, I think that computer vision is this fantastically amazing, interesting field,', 'start': 2973.223, 'duration': 4.264}, {'end': 2984.654, 'text': "and I'm really glad that over the course of the quarter we'll get to really dive in and dig into all these different details about how these algorithms are working these days.", 'start': 2977.487, 'duration': 7.167}], 'summary': 'Computer vision has diverse applications, such as medical diagnosis and self-driving cars, making it a fascinating and useful field.', 'duration': 29.677, 'max_score': 2954.977, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42954977.jpg'}, {'end': 3083.761, 'src': 'heatmap', 'start': 3022.963, 'weight': 1, 'content': [{'end': 3026.846, 'text': "We're both PhD students working under Fei-Fei on various computer vision problems.", 'start': 3022.963, 'duration': 3.883}, {'end': 3033.973, 'text': 'We have an amazing teaching staff this year of 18 TAs so far, many of whom are sitting over here in the front.', 'start': 3028.227, 'duration': 5.746}, {'end': 3040.118, 'text': 'These guys are really the unsung heroes, behind the scenes, making the course run smoothly, making sure everything happens well.', 'start': 3034.553, 'duration': 5.565}, {'end': 3042.521, 'text': 'Be nice to them.', 'start': 3041.96, 'duration': 0.561}, {'end': 3048.288, 'text': "I think I also should mention that this is the third time we've taught this course.", 'start': 3044.443, 'duration': 3.845}, {'end': 3052.393, 'text': "And it's the first time that Andrej Karpathy has not been an instructor in this course.", 'start': 3048.628, 'duration': 
3.765}, {'end': 3054.796, 'text': 'He was a very close friend of mine.', 'start': 3053.214, 'duration': 1.582}, {'end': 3056.878, 'text': "He's still alive.", 'start': 3055.096, 'duration': 1.782}, {'end': 3057.319, 'text': "He's OK.", 'start': 3056.918, 'duration': 0.401}, {'end': 3057.719, 'text': "Don't worry.", 'start': 3057.339, 'duration': 0.38}, {'end': 3064.728, 'text': "But he graduated and now, so he's actually here, I think, hanging around in the lecture hall.", 'start': 3059.501, 'duration': 5.227}, {'end': 3071.237, 'text': 'So a lot of the development and the history of this course is really due to him working on it with me over the last couple of years.', 'start': 3065.129, 'duration': 6.108}, {'end': 3073.299, 'text': 'So I think you should be aware of that.', 'start': 3071.697, 'duration': 1.602}, {'end': 3081.92, 'text': 'Also about logistics, probably the best way for keeping in touch with the course staff is through Piazza.', 'start': 3075.498, 'duration': 6.422}, {'end': 3083.761, 'text': 'You should all go and sign up right now.', 'start': 3082.42, 'duration': 1.341}], 'summary': '18 tas, 3rd time teaching, andrej karpathy no longer instructor, use piazza for communication', 'duration': 60.798, 'max_score': 3022.963, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G43022963.jpg'}, {'end': 3152.575, 'src': 'embed', 'start': 3122.182, 'weight': 2, 'content': [{'end': 3125.165, 'text': 'But for the most part, most of your communication with the staff should be through Piazza.', 'start': 3122.182, 'duration': 2.983}, {'end': 3128.926, 'text': 'We also have an optional textbook this year.', 'start': 3126.465, 'duration': 2.461}, {'end': 3130.287, 'text': 'This is by no means required.', 'start': 3128.966, 'duration': 1.321}, {'end': 3132.507, 'text': 'You can go through the course totally fine without it.', 'start': 3130.687, 'duration': 1.82}, {'end': 3133.688, 'text': 'Everything will be 
self-contained.', 'start': 3132.568, 'duration': 1.12}, {'end': 3142.311, 'text': "This is sort of exciting because it's maybe the first textbook about deep learning that got published earlier this year by Ian Goodfellow,", 'start': 3134.688, 'duration': 7.623}, {'end': 3143.752, 'text': 'Yoshua Bengio and Aaron Courville.', 'start': 3142.311, 'duration': 1.441}, {'end': 3146.773, 'text': 'I put the Amazon link here in the slides.', 'start': 3144.712, 'duration': 2.061}, {'end': 3147.873, 'text': 'You can go get it if you want to.', 'start': 3146.793, 'duration': 1.08}, {'end': 3152.575, 'text': "But also, the whole content of the book is free online, so you don't even have to buy it if you don't want to.", 'start': 3148.533, 'duration': 4.042}], 'summary': 'Communication through piazza, optional textbook available, free online content', 'duration': 30.393, 'max_score': 3122.182, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G43122182.jpg'}, {'end': 3210.843, 'src': 'embed', 'start': 3179.271, 'weight': 3, 'content': [{'end': 3181.932, 'text': 'how the network is trained and tested, and whatnot, and all that.', 'start': 3179.271, 'duration': 2.661}, {'end': 3188.955, 'text': "And throughout the course, through the assignments, you'll be implementing your own convolutional neural networks from scratch in Python.", 'start': 3182.432, 'duration': 6.523}, {'end': 3193.316, 'text': "You'll be implementing the full forward and backward passes through these things and by the end,", 'start': 3189.695, 'duration': 3.621}, {'end': 3195.997, 'text': "you'll have implemented a whole convolutional neural network totally on your own.", 'start': 3193.316, 'duration': 2.681}, {'end': 3197.558, 'text': "I think that's really cool.", 'start': 3196.738, 'duration': 0.82}, {'end': 3204.461, 'text': "But we're also kind of practical and we know that in most cases people are probably not writing these things from scratch.", 
'start': 3198.418, 'duration': 6.043}, {'end': 3210.843, 'text': 'So we also want to give you a good introduction to some of the state of the art software tools that are used in practice for these things.', 'start': 3204.901, 'duration': 5.942}], 'summary': 'Learn to implement convolutional neural networks from scratch in python and explore state-of-the-art software tools.', 'duration': 31.572, 'max_score': 3179.271, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G43179271.jpg'}, {'end': 3340.809, 'src': 'embed', 'start': 3311.376, 'weight': 4, 'content': [{'end': 3315.02, 'text': 'And a large portion of your grade will be the final course project,', 'start': 3311.376, 'duration': 3.644}, {'end': 3319.445, 'text': "where you'll work in teams of one to three and produce some amazing project that will blow everyone's minds.", 'start': 3315.02, 'duration': 4.425}, {'end': 3326.077, 'text': "We have a late policy, so you have seven late days that you're free to allocate among your different homeworks.", 'start': 3320.753, 'duration': 5.324}, {'end': 3333.984, 'text': 'These are meant to cover things like minor illnesses or traveling or conferences or anything like that.', 'start': 3327.378, 'duration': 6.606}, {'end': 3340.809, 'text': "If you come to us at the end of the quarter and say that, oh, I suddenly have to go give a presentation at this conference, that's not gonna be okay.", 'start': 3334.264, 'duration': 6.545}], 'summary': 'Final project is a major part of the grade, with 7 late days for flexibility.', 'duration': 29.433, 'max_score': 3311.376, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G43311376.jpg'}, {'end': 3440.043, 'src': 'embed', 'start': 3415.498, 'weight': 5, 'content': [{'end': 3421.906, 'text': 'We also assume a little bit of knowledge coming in of computer vision, maybe at the level of CS131 or 231A.', 'start': 3415.498, 
'duration': 6.408}, {'end': 3424.85, 'text': "If you have taken those courses before, you'll be fine.", 'start': 3422.306, 'duration': 2.544}, {'end': 3431.458, 'text': "If you haven't, I think you'll be okay in this class, but you might have a tiny bit of catching up to do, but I think you'll probably be okay.", 'start': 3425.21, 'duration': 6.248}, {'end': 3433.04, 'text': 'Those are not super strict prerequisites.', 'start': 3431.478, 'duration': 1.562}, {'end': 3440.043, 'text': 'We also assume a little bit of background knowledge about machine learning, maybe at the level of CS229.', 'start': 3434.201, 'duration': 5.842}], 'summary': 'Assumes some background in computer vision and machine learning, at the level of cs131/231a and cs229 respectively.', 'duration': 24.545, 'max_score': 3415.498, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G43415498.jpg'}], 'start': 2897.227, 'title': 'Computer vision, deep learning essentials', 'summary': 'Discusses the applications and logistics of computer vision, and introduces deep learning essentials including algorithm mechanics, practical implementation of convolutional neural networks in python, and emphasizes collaboration and prerequisite knowledge in python, c/c++, calculus, linear algebra, computer vision, and machine learning.', 'chapters': [{'end': 3158.658, 'start': 2897.227, 'title': 'Computer vision and logistics of the class', 'summary': 'Discusses the excitement around computer vision, its potential applications, and the logistics of the class, including the teaching staff and preferred communication methods.', 'duration': 261.431, 'highlights': ['The chapter discusses the excitement around computer vision and its potential applications in various fields such as medical diagnosis, self-driving cars, robotics, and more. 
Potential applications in medical diagnosis, self-driving cars, and robotics.', 'The logistics of the class are explained, including the teaching staff, preferred communication methods, and the optional textbook for additional readings. Logistics of the class, including teaching staff, preferred communication methods, and optional textbook.', 'The teaching staff, including the professor and PhD students, are introduced, along with the mention of an optional textbook for the course. Introduction of teaching staff and mention of an optional textbook.']}, {'end': 3470.459, 'start': 3161.843, 'title': 'Deep learning essentials', 'summary': 'Introduces the deep mechanics of algorithms, covers practical implementation of convolutional neural networks in python, and emphasizes the importance of collaboration and prerequisite knowledge in python, c/c++, calculus, linear algebra, computer vision, and machine learning.', 'duration': 308.616, 'highlights': ['The course focuses on understanding the deep mechanics of algorithms and practical implementation of convolutional neural networks in Python. Students will implement their own convolutional neural networks from scratch in Python and gain exposure to state-of-the-art software tools like TensorFlow, Torch, and PyTorch.', 'Collaboration and adherence to the honor code are emphasized, and a late policy of seven days is in place for the homework assignments. The course emphasizes collaboration within the bounds of the honor code and provides a late policy of seven days for allocating among different homeworks.', 'Prerequisite knowledge in Python, C/C++, calculus, linear algebra, computer vision, and machine learning is essential for the course. 
The course requires a deep familiarity with Python, some familiarity with C/C++, knowledge of calculus, linear algebra, computer vision, and machine learning at the level of CS131 or 231A and CS229.']}], 'duration': 573.232, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/vT1JzLTH4G4/pics/vT1JzLTH4G42897227.jpg', 'highlights': ['Potential applications in medical diagnosis, self-driving cars, and robotics.', 'Logistics of the class, including teaching staff, preferred communication methods, and optional textbook.', 'Introduction of teaching staff and mention of an optional textbook.', 'Students will implement their own convolutional neural networks from scratch in Python and gain exposure to state-of-the-art software tools like TensorFlow, Torch, and PyTorch.', 'The course emphasizes collaboration within the bounds of the honor code and provides a late policy of seven days for allocating among different homeworks.', 'The course requires a deep familiarity with Python, some familiarity with C/C++, knowledge of calculus, linear algebra, computer vision, and machine learning at the level of CS131 or 231A and CS229.']}], 'highlights': ['The class size has grown from 150 students to about 730 students over three offerings, roughly doubling each time and reflecting the exponential growth of the CS231N class at Stanford University.', 'Roughly 80% of all internet traffic was projected to be video by 2017, indicating the substantial amount of visual data present.', 'The explosive growth of visual data has heightened the significance of computer vision, emphasizing its increasing importance.', 'Computer vision technologies automate video annotation, crucial for content understanding and monetization.', 'Interdisciplinary nature of computer vision spans physics, biology, psychology, computer science, mathematics, and engineering.', 'Understanding optics, image formation, and animal brain processing is essential in computer vision.', 'Stanford Vision Lab focuses on machine learning and computer science.', 'CS231N 
emphasizes neural networks and convolutional neural networks for visual recognition tasks.', 'The development of eyes around 540 million years ago led to an explosive speciation phase, sparking an evolutionary arms race and necessitating quick evolution for survival.', 'Vision has evolved into the most significant sensory system for nearly all animals, including humans, with approximately 50% of the neurons in the human cortex dedicated to visual processing, enabling crucial functions such as survival, movement, communication, and entertainment.', 'The onset of vision initiated a proactive shift in animal behavior, emphasizing the pivotal role of vision in catalyzing the survival and evolution of species.', 'The field of computer vision has blossomed from one summer project into a field of thousands of researchers worldwide still working on some of the most fundamental problems of vision.', 'The electrophysiology work done by Hubel and Wiesel in the 50s and 60s has been influential in the study of both human and animal vision, and it also inspired computer vision.', "David Marr's influential book in the late 70s proposed a thought process that has dominated computer vision for several decades, providing a hierarchical process for deconstructing visual information.", 'In the 70s, two groups of scientists in Palo Alto proposed ideas for representing objects: generalized cylinder and pictorial structure, both reducing complex object structures into simpler shapes and their geometric configuration.', 'The significant challenges in object recognition persisted from the 60s to 80s, with limited progress in delivering practical solutions for real-world applications.', 'The shift towards object segmentation arose as a potential solution to the complexity of object recognition, leading to early seminal works in image segmentation by Jitendra Malik and Jianbo Shi.', 'Advancements in machine learning techniques, particularly statistical methods like support vector 
machines, boosting, graphical models, and early neural networks, gained momentum around 1999-2000, culminating in the impactful use of the AdaBoost algorithm for real-time face detection by Paul Viola and Michael Jones.', 'The AdaBoost algorithm enabled near real-time face detection despite the technological limitations of slow computer chips, leading to its rapid adoption in real-world applications.', 'The creation of ImageNet, with almost 15 million images organized in 22,000 categories, significantly pushed forward the algorithm development of object recognition.', 'The winning algorithm of the 2012 ImageNet challenge was a convolutional neural network model, marking a significant advancement in the field of computer vision.', 'The error rate steadily decreased over the years, reaching a level comparable to human performance by 2012, demonstrating the remarkable progress in object recognition.', 'The breakthrough moment for convolutional neural networks was in 2012, with the creation of the seven-layer convolutional neural network, known as AlexNet, which performed exceptionally well in the ImageNet competition.', 'The trend in the ImageNet challenge shows that neural networks are getting deeper each year, with the introduction of significantly deeper networks, such as the 152-layer residual networks from Microsoft Research Asia.', 'The broad utility of image classification is emphasized, with applications including recognizing food, products, and artworks, indicating its relevance in various settings.', 'Advancements in computation, including faster computers and the emergence of GPUs, have enabled the exploration of larger architectures and models, leading to improved deep learning performance.', 'The existence of orders of magnitude larger labeled datasets in the 2010s, compared to the 90s, has enabled the training of deep learning models on more varied and extensive data, leading to improved real-world performance.', 'Challenges and open problems in computer 
vision, such as semantic segmentation, perceptual grouping, 3D understanding, and activity recognition, highlight the need for continued algorithmic development in tackling more ambitious tasks.', 'Potential applications in medical diagnosis, self-driving cars, and robotics.', 'Logistics of the class, including teaching staff, preferred communication methods, and optional textbook.', 'Introduction of teaching staff and mention of an optional textbook.', 'Students will implement their own convolutional neural networks from scratch in Python and gain exposure to state-of-the-art software tools like TensorFlow, Torch, and PyTorch.', 'The course emphasizes collaboration within the bounds of the honor code and provides a late policy of seven days for allocating among different homeworks.', 'The course requires a deep familiarity with Python, some familiarity with C/C++, knowledge of calculus, linear algebra, computer vision, and machine learning at the level of CS131 or 231A and CS229.']}