title
Week 1 – Lecture: History, motivation, and evolution of Deep Learning
description
Course website: https://bit.ly/DLSP20-web
Playlist: http://bit.ly/pDL-YouTube
Speaker: Yann LeCun
Week 1: http://bit.ly/DLSP20-01
0:00:00 – Week 1 – Lecture
LECTURE Part A: http://bit.ly/DLSP20-01-1
We discuss the motivation behind deep learning, beginning with its history and inspiration. We then cover the history of pattern recognition and introduce gradient descent and the computation of gradients by backpropagation (a minimal code sketch follows the timestamps below). Finally, we discuss the hierarchical representation of the visual cortex.
0:03:37 – Inspiration of Deep Learning and Its History, Supervised Learning
0:24:21 – History of Pattern Recognition and Introduction to Gradient Descent
0:38:56 – Computing Gradients by Backpropagation, Hierarchical Representation of the Visual Cortex
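Below is a minimal, illustrative sketch (not code from the lecture) of the two ideas named above: training a small multilayer net by gradient descent, with the gradients computed by backpropagation. It uses PyTorch, the tool covered in the course's practical sessions; the network shape, toy data, and learning rate are arbitrary choices for illustration.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(64, 10)              # toy inputs (illustrative only)
y = torch.randint(0, 3, (64,))       # toy class labels

net = nn.Sequential(                 # "modules stacked on top of each other"
    nn.Linear(10, 32),               # weighted sums (matrix multiply)
    nn.ReLU(),                       # pointwise non-linearity
    nn.Linear(32, 3),
)
criterion = nn.CrossEntropyLoss()    # objective function
lr = 0.1

for step in range(100):
    logits = net(x)                  # forward pass through the stack of modules
    loss = criterion(logits, y)      # compare output with the target output
    net.zero_grad()
    loss.backward()                  # backpropagation: the chain rule, module by module
    with torch.no_grad():
        for p in net.parameters():
            p -= lr * p.grad         # plain gradient descent step on each parameter

In practice one would usually replace the hand-written update with torch.optim.SGD and feed mini-batches rather than the full dataset; the lecture argues that this stochastic variant converges faster and tends to generalize better on large, redundant training sets.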
LECTURE Part B: http://bit.ly/DLSP20-01-2
We first discuss the evolution of CNNs, from Fukushima to LeCun to AlexNet. We then cover some applications of CNNs, such as image segmentation, autonomous vehicles, and medical image analysis. We examine the hierarchical nature of deep networks and the attributes that make them advantageous, and conclude with a discussion of generating and learning features/representations (a small CNN sketch follows the timestamps below).
0:49:25 – Evolution of CNNs
1:05:55 – Deep Learning & Feature Extraction
1:19:27 – Learning Representations
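As a companion to the CNN discussion, here is a small, hypothetical convolutional net in PyTorch (again, not the lecture's own code). Stacked convolution, non-linearity, and pooling stages extract features at progressively coarser scales, which is the hierarchical-representation idea the lecture traces from Hubel and Wiesel through Fukushima to LeNet-style models; all layer sizes below are illustrative.

import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5, padding=2),   # low-level features (edges)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling gives local shift invariance
            nn.Conv2d(8, 16, kernel_size=5, padding=2),  # mid-level features (motifs)
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        h = self.features(x)                 # learned hierarchical representation
        return self.classifier(h.flatten(1)) # final classifier on top of the features

# Example usage on a batch of 28x28 grayscale images (e.g., handwritten digits):
net = TinyConvNet()
imgs = torch.randn(4, 1, 28, 28)
print(net(imgs).shape)                       # torch.Size([4, 10])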
detail
{'title': 'Week 1 – Lecture: History, motivation, and evolution of Deep Learning', 'heatmap': [{'end': 4815.341, 'start': 4748.936, 'weight': 1}], 'summary': 'The lecture provides an introduction to the course and class structure, an overview of the deep learning course, history and evolution of neural nets, stochastic gradient descent and back propagation, neural networks in image processing, impact of convolutional nets, computer vision applications, random projections, and optimizing circuits and natural image manifolds, offering insights into key topics and historical perspectives in deep learning.', 'chapters': [{'end': 61.042, 'segs': [{'end': 28.616, 'src': 'embed', 'start': 0.384, 'weight': 0, 'content': [{'end': 3.005, 'text': 'Okay, so first of all, I have a terrible confession to make.', 'start': 0.384, 'duration': 2.621}, {'end': 11.189, 'text': 'This class is actually being run not by me, but by these two guys, Alfredo Canziani and Mark Goldstein, whose names are here.', 'start': 3.025, 'duration': 8.164}, {'end': 11.949, 'text': "They're the TAs.", 'start': 11.269, 'duration': 0.68}, {'end': 15.911, 'text': "And you'll talk to them much more often than you'll talk to me.", 'start': 12.949, 'duration': 2.962}, {'end': 18.031, 'text': "That's the first thing.", 'start': 17.371, 'duration': 0.66}, {'end': 23.794, 'text': "The other confession I have to make is that if you have questions about this class, don't ask them at the end of this course,", 'start': 18.071, 'duration': 5.723}, {'end': 26.435, 'text': 'because I have to run right after the class to catch an airplane.', 'start': 23.794, 'duration': 2.641}, {'end': 28.616, 'text': "But I can't wait until next week.", 'start': 27.395, 'duration': 1.221}], 'summary': 'Tas alfredo canziani and mark goldstein will run the class; instructor to leave for an airplane immediately after the class.', 'duration': 28.232, 'max_score': 0.384, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo30384.jpg'}, {'end': 68.828, 'src': 'embed', 'start': 38.864, 'weight': 1, 'content': [{'end': 45.65, 'text': 'I will do what I can to post the PDF of the slides on the website, probably just before the lecture,', 'start': 38.864, 'duration': 6.786}, {'end': 47.551, 'text': 'probably just a few minutes before the lecture usually.', 'start': 45.65, 'duration': 1.901}, {'end': 53.296, 'text': 'But it should be there by the time you get to class, or at least by the time I get to class.', 'start': 49.013, 'duration': 4.283}, {'end': 61.042, 'text': "There's going to be nine lectures that I'm going to teach on Monday evenings.", 'start': 56.258, 'duration': 4.784}, {'end': 68.828, 'text': 'There is also a practical session every Tuesday night that Fredo and Marc will be running.', 'start': 61.723, 'duration': 7.105}], 'summary': 'Nine monday evening lectures, pdf slides posted just before class, practical session on tuesday nights.', 'duration': 29.964, 'max_score': 38.864, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo3038864.jpg'}], 'start': 0.384, 'title': 'Introduction to course and class structure', 'summary': 'Introduces the course and class structure, highlighting the two tas, alfredo canziani and mark goldstein, and the availability of course slides on the website before each lecture.', 'chapters': [{'end': 61.042, 'start': 0.384, 'title': 'Introduction to course and class structure', 'summary': 'Outlines that the class is being run by two 
tas, alfredo canziani and mark goldstein, and mentions the availability of the course slides on the website before each lecture.', 'duration': 60.658, 'highlights': ['The class is being run by two TAs, Alfredo Canziani and Mark Goldstein, and students will interact with them more frequently than with the lecturer.', 'The course slides will be posted on the website just before the lecture, ensuring availability for students before class.']}], 'duration': 60.658, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo30384.jpg', 'highlights': ['The class is being run by two TAs, Alfredo Canziani and Mark Goldstein, and students will interact with them more frequently than with the lecturer.', 'The course slides will be posted on the website just before the lecture, ensuring availability for students before class.']}, {'end': 820.357, 'segs': [{'end': 86.481, 'src': 'embed', 'start': 61.723, 'weight': 0, 'content': [{'end': 68.828, 'text': 'There is also a practical session every Tuesday night that Fredo and Marc will be running.', 'start': 61.723, 'duration': 7.105}, {'end': 79.496, 'text': "So they'll go through some of the practical questions, some refreshers on mathematics that are necessary for this and basic concepts,", 'start': 68.948, 'duration': 10.548}, {'end': 83.86, 'text': 'some tutorials on how to use PyTorch and various other software tools.', 'start': 79.496, 'duration': 4.364}, {'end': 86.481, 'text': "And there's going to be three guest lectures.", 'start': 84.96, 'duration': 1.521}], 'summary': 'Practical sessions every tuesday, covering math, pytorch, and 3 guest lectures.', 'duration': 24.758, 'max_score': 61.723, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo3061723.jpg'}, {'end': 256.146, 'src': 'embed', 'start': 226.274, 'weight': 1, 'content': [{'end': 229.038, 'text': "because you've played with this before or other things.", 'start': 226.274, 'duration': 2.764}, {'end': 236.958, 'text': 'So, intro to supervised learning in neural nets, deep learning.', 'start': 233.316, 'duration': 3.642}, {'end': 238.599, 'text': "that's what I'm going to talk about today.", 'start': 236.958, 'duration': 1.641}, {'end': 241.38, 'text': 'what deep learning can do, what it cannot do, what are good features?', 'start': 238.599, 'duration': 2.781}, {'end': 244.681, 'text': "Deep learning is about learning representations, so I'm going to talk about.", 'start': 241.54, 'duration': 3.141}, {'end': 249.683, 'text': 'Next week will be about backpropagation and basic architectural components.', 'start': 245.322, 'duration': 4.361}, {'end': 256.146, 'text': 'so things like The fact that you build neural nets out of modules and you connect with each other, you compute gradients,', 'start': 249.683, 'duration': 6.463}], 'summary': 'Intro to supervised learning in neural nets and deep learning, covering what it can and cannot do, and good features.', 'duration': 29.872, 'max_score': 226.274, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo30226274.jpg'}, {'end': 628.405, 'src': 'embed', 'start': 600.854, 'weight': 2, 'content': [{'end': 605.236, 'text': "It's already In the space of a year, it's become dominant in natural language processing.", 'start': 600.854, 'duration': 4.382}, {'end': 615.66, 'text': "And in the last few months just three months there's been a few papers that show that self-supervised learning methods actually work 
really well in things like computer vision as well.", 'start': 605.917, 'duration': 9.743}, {'end': 623.003, 'text': 'And so my guess is that self-supervised learning is going to take over the world in the next few years.', 'start': 616.02, 'duration': 6.983}, {'end': 625.224, 'text': "So I think it's useful to hear about it in this class.", 'start': 623.063, 'duration': 2.161}, {'end': 628.405, 'text': 'So the things like.', 'start': 626.624, 'duration': 1.781}], 'summary': 'Self-supervised learning is becoming dominant in natural language processing and computer vision, with potential to take over the world in the next few years.', 'duration': 27.551, 'max_score': 600.854, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo30600854.jpg'}], 'start': 61.723, 'title': 'Deep learning course overview and introduction to deep learning in neural nets', 'summary': 'Provides an overview of the deep learning course structure, practical sessions, and key topics such as natural language processing and computer vision, as well as an introduction to supervised learning in neural nets, deep learning architecture, applications, and future potential.', 'chapters': [{'end': 226.274, 'start': 61.723, 'title': 'Deep learning course overview', 'summary': 'Outlines the structure of the deep learning course, including practical sessions, guest lectures, midterm exam, final project, group projects, and prerequisite knowledge, with a focus on topics like natural language processing, computer vision, and self-supervised learning.', 'duration': 164.551, 'highlights': ['The course includes practical sessions on Tuesdays covering practical questions, mathematics refreshers, basic concepts, PyTorch tutorials, and other software tools.', 'There will be three guest lectures on topics such as natural language processing, computer vision, and self-supervised learning.', 'There is a possibility of a midterm exam around March, with evaluation based on the midterm and final project.', 'The project will likely involve a combination of self-supervised learning and autonomous driving, potentially requiring collaboration in groups of two or three.', 'The course prerequisite includes basic familiarity with machine learning, PyTorch, TensorFlow, and training neural nets.']}, {'end': 820.357, 'start': 226.274, 'title': 'Introduction to deep learning in neural nets', 'summary': 'Covers an introduction to supervised learning in neural nets, deep learning architecture components, applications, and future potential, along with the importance of understanding the optimization and inference processes in deep learning.', 'duration': 594.083, 'highlights': ['The chapter covers an introduction to supervised learning in neural nets, deep learning architecture components, applications, and future potential. Topics include backpropagation, basic architectural components, various types of architectures, macro architectures, recurrent neural nets, applications of recurrent neural nets, and recent architectures such as memory networks, transformers, and adapters.', 'Understanding the optimization process in deep learning is crucial due to the non-convex nature of the cost function and the presence of local minima and saddle points. 
Key concepts include optimization in the convex case, gradient-based optimization, initialization tricks, normalization tricks, regularization tricks like dropout, gradient clipping, momentum, average SGD, methods for parallelizing SGD, target prop, and the Lagrangian formulation of backprop.', 'The chapter emphasizes the significance of understanding the inference processes in neural nets and introduces the concept of energy-based models for learning, whether supervised, unsupervised, or self-supervised. It covers the use of energy-based models for implementing reasoning with neural nets, representing problems with multiple possible answers, and the application of energy-based models in self-supervised learning and beyond.', 'The future potential of self-supervised learning, particularly in natural language processing and computer vision, is highlighted as a dominant trend that is likely to become increasingly influential in the field of deep learning. The chapter discusses various methods under self-supervised learning, such as variational autoencoders, denoising autoencoders, BERT, and generative adversarial networks, and the potential for achieving higher levels of machine intelligence through self-supervised learning approaches.']}], 'duration': 758.634, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo3061723.jpg', 'highlights': ['The course includes practical sessions on Tuesdays covering practical questions, mathematics refreshers, basic concepts, PyTorch tutorials, and other software tools.', 'The chapter covers an introduction to supervised learning in neural nets, deep learning architecture components, applications, and future potential.', 'The future potential of self-supervised learning, particularly in natural language processing and computer vision, is highlighted as a dominant trend that is likely to become increasingly influential in the field of deep learning.']}, {'end': 2109.891, 'segs': [{'end': 900.476, 'src': 'embed', 'start': 873.476, 'weight': 1, 'content': [{'end': 880.261, 'text': 'And they got the idea that if neurons are basically threshold units that are on or off,', 'start': 873.476, 'duration': 6.785}, {'end': 885.485, 'text': 'then by connecting neurons with each other you can build Boolean circuits and you can basically do logical inference with neurons.', 'start': 880.261, 'duration': 5.224}, {'end': 892.41, 'text': 'And so they say, you know, the brain is basically a logical inference machine because the neurons are binary.', 'start': 886.165, 'duration': 6.245}, {'end': 900.476, 'text': 'And this idea, so the idea was that a neuron computes a weighted sum of its inputs and then compares the weighted sum to a threshold.', 'start': 892.95, 'duration': 7.526}], 'summary': 'Neurons as threshold units form boolean circuits for logical inference in the brain.', 'duration': 27, 'max_score': 873.476, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo30873476.jpg'}, {'end': 960.541, 'src': 'embed', 'start': 937.281, 'weight': 0, 'content': [{'end': 945.384, 'text': "And he had the idea of what's now called Hebbian learning, which is that if two neurons fire together, then the connection that links them increases,", 'start': 937.281, 'duration': 8.103}, {'end': 947.425, 'text': "and if they don't fire together, maybe it decreases.", 'start': 945.384, 'duration': 2.041}, {'end': 954.554, 'text': "That's not an idea for a learning algorithm, but it's sort of a 
first idea perhaps.", 'start': 950.349, 'duration': 4.205}, {'end': 960.541, 'text': 'And then cybernetics was proposed by this guy Norbert Wiener, who is here.', 'start': 955.775, 'duration': 4.766}], 'summary': 'Hebbian learning suggests increased neuron connection with firing together. cybernetics proposed by norbert wiener.', 'duration': 23.26, 'max_score': 937.281, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo30937281.jpg'}, {'end': 1167.357, 'src': 'embed', 'start': 1140.841, 'weight': 3, 'content': [{'end': 1145.282, 'text': "People don't listen to the same kind of fashions, if you will.", 'start': 1140.841, 'duration': 4.441}, {'end': 1154.148, 'text': 'And then the field took off again in 1985, roughly, with the emergence of back propagation.', 'start': 1145.942, 'duration': 8.206}, {'end': 1159.471, 'text': 'So back propagation is an algorithm for training multilayer neural nets, as many of you know.', 'start': 1154.188, 'duration': 5.283}, {'end': 1164.114, 'text': "And people were looking for something like this in the 60s and basically didn't find it.", 'start': 1160.252, 'duration': 3.862}, {'end': 1167.357, 'text': "And the reason they didn't find it was because they had the wrong neurons.", 'start': 1164.775, 'duration': 2.582}], 'summary': 'Neural net training algorithm back propagation emerged in 1985, after unsuccessful search in the 60s.', 'duration': 26.516, 'max_score': 1140.841, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo301140841.jpg'}, {'end': 1299.453, 'src': 'embed', 'start': 1272.811, 'weight': 4, 'content': [{'end': 1276.192, 'text': 'So when, around 2009, 2010,,', 'start': 1272.811, 'duration': 3.381}, {'end': 1282.853, 'text': 'people realized that you could use multilayer neural nets trained with backprop and get an improvement for speech recognition.', 'start': 1276.192, 'duration': 6.661}, {'end': 1284.053, 'text': "It didn't start with ImageNet.", 'start': 1282.973, 'duration': 1.08}, {'end': 1288.303, 'text': 'It started with speech recognition around 2010.', 'start': 1284.113, 'duration': 4.19}, {'end': 1293.408, 'text': 'And within 18 months of the first papers being published on this,', 'start': 1288.303, 'duration': 5.105}, {'end': 1299.453, 'text': 'every major player in speech recognition had deployed commercial speech recognition systems that use neural nets.', 'start': 1293.408, 'duration': 6.045}], 'summary': 'By 2010, neural nets improved speech recognition, leading to rapid commercial deployment.', 'duration': 26.642, 'max_score': 1272.811, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo301272811.jpg'}, {'end': 1342.573, 'src': 'embed', 'start': 1315.785, 'weight': 5, 'content': [{'end': 1321.451, 'text': 'Then, at the end of 2012,, early 2013,, the same thing happened in computer vision,', 'start': 1315.785, 'duration': 5.666}, {'end': 1325.075, 'text': 'where the computer vision community realized deep learning convolutional nets,', 'start': 1321.451, 'duration': 3.624}, {'end': 1328.599, 'text': 'in particular work much better than whatever it is that they were using before,', 'start': 1325.075, 'duration': 3.524}, {'end': 1332.583, 'text': 'and started to switch using convolutional nets and basically abandon all previous techniques.', 'start': 1328.599, 'duration': 3.984}, {'end': 1335.306, 'text': 'So that created the second revolution now in computer vision.', 
'start': 1333.184, 'duration': 2.122}, {'end': 1342.573, 'text': 'And then three years later, around 2016 or so, the same thing happened in natural language processing,', 'start': 1336.087, 'duration': 6.486}], 'summary': 'In 2012-2013, computer vision shifted to deep learning, followed by natural language processing in 2016.', 'duration': 26.788, 'max_score': 1315.785, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo301315785.jpg'}, {'end': 1393.072, 'src': 'embed', 'start': 1361.028, 'weight': 6, 'content': [{'end': 1362.169, 'text': 'But let me get to this.', 'start': 1361.028, 'duration': 1.141}, {'end': 1368.119, 'text': "Okay, so you all know what supervised running is, I'm sure.", 'start': 1364.818, 'duration': 3.301}, {'end': 1373.8, 'text': 'And this is really what the vast majority not the vast majority,', 'start': 1368.139, 'duration': 5.661}, {'end': 1381.342, 'text': 'like 90-some percent applications of deep learning use supervised running as kind of the main thing.', 'start': 1373.8, 'duration': 7.542}, {'end': 1387.463, 'text': 'So supervised running is the process by which you collect a bunch of pairs of inputs and outputs.', 'start': 1381.362, 'duration': 6.101}, {'end': 1393.072, 'text': "of examples of, let's say, images together with a category, if you want to do image recognition,", 'start': 1388.604, 'duration': 4.468}], 'summary': 'Supervised learning is used in 90% of deep learning applications, involving pairs of inputs and outputs for image recognition.', 'duration': 32.044, 'max_score': 1361.028, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo301361028.jpg'}, {'end': 1434.685, 'src': 'embed', 'start': 1412.476, 'weight': 7, 'content': [{'end': 1423.965, 'text': 'then you tweak the parameters of the machine think of it as a parameterized function of some kind and you tweak the parameters of that function in such a way that the output gets closer to the one you want.', 'start': 1412.476, 'duration': 11.489}, {'end': 1427.428, 'text': 'Okay? 
This is a non-technical term what supervised learning is all about.', 'start': 1424.586, 'duration': 2.842}, {'end': 1431.442, 'text': 'So show a picture of a car.', 'start': 1430.161, 'duration': 1.281}, {'end': 1434.685, 'text': "If the system doesn't say car, tweak the parameters.", 'start': 1431.602, 'duration': 3.083}], 'summary': 'Supervised learning involves tweaking parameters to match desired output, as in identifying a car in a picture.', 'duration': 22.209, 'max_score': 1412.476, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo301412476.jpg'}, {'end': 1483.162, 'src': 'embed', 'start': 1455.99, 'weight': 8, 'content': [{'end': 1459.321, 'text': "That's what gradient computation and backpropagation is about.", 'start': 1455.99, 'duration': 3.331}, {'end': 1465.238, 'text': 'But before we get to this, A little bit of history again.', 'start': 1460.164, 'duration': 5.074}, {'end': 1470.359, 'text': 'So there was a flurry of models, basic models for classification.', 'start': 1465.739, 'duration': 4.62}, {'end': 1476.26, 'text': 'You know, starting with a perceptron, there was another competing model called the Eta line, which is on the top right here.', 'start': 1470.379, 'duration': 5.881}, {'end': 1480.821, 'text': 'Based on the same kind of basic architectures, computer weighted sum of inputs compared to a threshold.', 'start': 1476.42, 'duration': 4.401}, {'end': 1483.162, 'text': "If it's above the threshold, turn on.", 'start': 1480.861, 'duration': 2.301}], 'summary': 'History of basic models for classification, including perceptron and eta line.', 'duration': 27.172, 'max_score': 1455.99, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo301455990.jpg'}, {'end': 1686.215, 'src': 'embed', 'start': 1659.323, 'weight': 9, 'content': [{'end': 1667.685, 'text': 'where one stage is built by hand, where the representation of the input is a result of some hand-engineered program,', 'start': 1659.323, 'duration': 8.362}, {'end': 1671.646, 'text': 'essentially the idea of deep learning is that you learn the entire task end-to-end.', 'start': 1667.685, 'duration': 3.961}, {'end': 1683.954, 'text': 'Okay, so basically you build your pattern recognition system or whatever it is that you want to do with it as a cascade or a sequence of modules.', 'start': 1672.748, 'duration': 11.206}, {'end': 1686.215, 'text': 'All of those modules have tunable parameters.', 'start': 1684.294, 'duration': 1.921}], 'summary': 'Deep learning enables end-to-end task learning with tunable parameters.', 'duration': 26.892, 'max_score': 1659.323, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo301659323.jpg'}, {'end': 1906.045, 'src': 'embed', 'start': 1872.926, 'weight': 10, 'content': [{'end': 1874.728, 'text': "That's a basic neural net, essentially.", 'start': 1872.926, 'duration': 1.802}, {'end': 1879.411, 'text': 'Okay?. Now, why is that called a neural net at all?', 'start': 1876.009, 'duration': 3.402}, {'end': 1888.539, 'text': "It's because when you take a vector and you multiply a vector by a matrix To compute each component of the output,", 'start': 1879.731, 'duration': 8.808}, {'end': 1895.261, 'text': 'you actually compute a weighted sum of the components of the input by a corresponding row in the matrix.', 'start': 1888.539, 'duration': 6.722}, {'end': 1906.045, 'text': "right?. 
So this little symbol here there's a bunch of components of the vector coming into this layer and you take a row of the matrix,", 'start': 1895.261, 'duration': 10.784}], 'summary': 'Introduction to neural nets and matrix computation', 'duration': 33.119, 'max_score': 1872.926, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo301872926.jpg'}], 'start': 820.477, 'title': 'History and evolution of neural nets', 'summary': 'Discusses the history of cybernetics and ai, including the work of mcculloch and pitts, hebbian learning, and the concept of self-regulating systems. it also traces the evolution of neural nets from the 1950s to their resurgence in the 2010s, highlighting their impact on fields such as speech recognition and natural language processing.', 'chapters': [{'end': 1016.55, 'start': 820.477, 'title': 'History of cybernetics and ai', 'summary': 'Discusses the history of cybernetics, including the work of mcculloch and pitts in the 1940s, hebbian learning, and the concept of self-regulating systems proposed by norbert wiener, which laid the foundation for ai and neural networks.', 'duration': 196.073, 'highlights': ['The concept of self-regulating systems proposed by Norbert Wiener, which laid the foundation for AI and neural networks. Norbert Wiener proposed the idea of cybernetics, involving systems with sensors and actuators that enable feedback loops and self-regulation, forming the basis for AI and neural networks.', 'The work of McCulloch and Pitts in the 1940s, who introduced the idea of neurons as threshold units and building Boolean circuits for logical inference. McCulloch and Pitts introduced the concept of neurons as threshold units and building Boolean circuits, laying the groundwork for logical inference using neurons.', 'Hebbian learning, the idea that modifying the strength of connections between neurons can lead to learning. Donald Hebb proposed the concept of Hebbian learning, suggesting that modifying the strength of connections between neurons, based on their firing together, can facilitate learning.']}, {'end': 1358.566, 'start': 1018.011, 'title': 'Evolution of neural nets', 'summary': 'Traces the evolution of neural nets from the early 1950s to their resurgence in the 2010s, highlighting their initial development, decline, and subsequent impact on fields such as speech recognition, computer vision, and natural language processing.', 'duration': 340.555, 'highlights': ['Neural nets saw a period of disinterest and abandonment between the late 1960s and 1984, with a resurgence in 1985 following the emergence of back propagation, which allowed for training multilayer neural nets. Period of disinterest and abandonment between 1969/68 and 1984; resurgence in 1985', 'Neural nets experienced a period of resurgence in speech recognition around 2010, with major players deploying commercial speech recognition systems within 18 months of the first papers being published on using multilayer neural nets trained with backprop. Resurgence in speech recognition around 2010; deployment of commercial speech recognition systems within 18 months', 'The computer vision community abandoned previous techniques and embraced deep learning convolutional nets, marking a revolution in computer vision around 2012-2013. 
Abandonment of previous techniques and adoption of deep learning convolutional nets around 2012-2013']}, {'end': 1544.926, 'start': 1361.028, 'title': 'Supervised learning in deep learning', 'summary': 'Discusses the concept of supervised learning in deep learning, highlighting the prevalence of its usage in deep learning applications and the process of tweaking parameters to achieve the desired output, with a brief overview of the history of basic models for classification.', 'duration': 183.898, 'highlights': ['Supervised learning is the main approach in 90% of deep learning applications, involving feeding examples to the machine, tweaking parameters to achieve the desired output, and utilizing gradient computation and backpropagation. Supervised learning is the main approach in 90% of deep learning applications, involving feeding examples to the machine, tweaking parameters to achieve the desired output, and utilizing gradient computation and backpropagation.', 'The process of supervised learning entails tweaking the parameters of the machine, such as the weights in neural nets, to adjust the output closer to the desired result. The process of supervised learning entails tweaking the parameters of the machine, such as the weights in neural nets, to adjust the output closer to the desired result.', 'The chapter also provides a brief history of basic models for classification, including the perceptron and the Eta line, which laid the conceptual foundation for pattern recognition systems for several decades. The chapter also provides a brief history of basic models for classification, including the perceptron and the Eta line, which laid the conceptual foundation for pattern recognition systems for several decades.']}, {'end': 2109.891, 'start': 1546.007, 'title': 'Deep learning: end-to-end pattern recognition', 'summary': 'Introduces the concept of deep learning as a method to learn the entire pattern recognition task end-to-end, using multiple layers with tunable parameters and non-linear functions, revolutionizing the traditional two-stage process of pattern recognition with a hand-engineered feature extractor.', 'duration': 563.884, 'highlights': ['Deep learning as an end-to-end pattern recognition method Deep learning revolutionizes the traditional two-stage process of pattern recognition with a hand-engineered feature extractor by learning the entire task end-to-end using multiple layers with tunable parameters and non-linear functions.', 'The structure of a basic neural net The structure of a basic neural net involves taking an input represented as a vector, multiplying it by a matrix with tunable parameters, passing the resulting vector through a non-linear function, and repeating this process through multiple layers, with the goal of learning the task end-to-end.', 'Supervised learning and gradient descent for parameter optimization Supervised learning involves comparing the output of the system with a target output using an objective function to compute the average error, which is minimized by finding the value of parameters through gradient descent, a process akin to descending a smooth mountain to reach the valley.']}], 'duration': 1289.414, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo30820477.jpg', 'highlights': ['Norbert Wiener proposed the idea of cybernetics, involving systems with sensors and actuators that enable feedback loops and self-regulation, forming the basis for AI and neural networks.', 'McCulloch and 
Pitts introduced the concept of neurons as threshold units and building Boolean circuits, laying the groundwork for logical inference using neurons.', 'Hebbian learning, suggesting that modifying the strength of connections between neurons, based on their firing together, can facilitate learning.', 'Resurgence in 1985 following the emergence of back propagation, which allowed for training multilayer neural nets.', 'Resurgence in speech recognition around 2010; deployment of commercial speech recognition systems within 18 months.', 'Adoption of deep learning convolutional nets around 2012-2013, marking a revolution in computer vision.', 'Supervised learning is the main approach in 90% of deep learning applications, involving feeding examples to the machine, tweaking parameters to achieve the desired output, and utilizing gradient computation and backpropagation.', 'The process of supervised learning entails tweaking the parameters of the machine, such as the weights in neural nets, to adjust the output closer to the desired result.', 'The chapter also provides a brief history of basic models for classification, including the perceptron and the Eta line, which laid the conceptual foundation for pattern recognition systems for several decades.', 'Deep learning revolutionizes the traditional two-stage process of pattern recognition with a hand-engineered feature extractor by learning the entire task end-to-end using multiple layers with tunable parameters and non-linear functions.', 'The structure of a basic neural net involves taking an input represented as a vector, multiplying it by a matrix with tunable parameters, passing the resulting vector through a non-linear function, and repeating this process through multiple layers, with the goal of learning the task end-to-end.', 'Supervised learning involves comparing the output of the system with a target output using an objective function to compute the average error, which is minimized by finding the value of parameters through gradient descent, a process akin to descending a smooth mountain to reach the valley.']}, {'end': 2433.357, 'segs': [{'end': 2213.313, 'src': 'embed', 'start': 2184.266, 'weight': 0, 'content': [{'end': 2187.668, 'text': 'And it looks kind of semi-periodic, because the examples are always shown in the same order,', 'start': 2184.266, 'duration': 3.402}, {'end': 2189.869, 'text': "which is not what you're supposed to do with stochastic gradient.", 'start': 2187.668, 'duration': 2.201}, {'end': 2192.571, 'text': 'But as you can see, the path is really erratic.', 'start': 2190.69, 'duration': 1.881}, {'end': 2195.372, 'text': "Why do people use this? 
There's various reasons.", 'start': 2193.071, 'duration': 2.301}, {'end': 2201.316, 'text': 'One reason is that empirically it converges much, much faster, particularly if you have a very large training set.', 'start': 2196.493, 'duration': 4.823}, {'end': 2205.707, 'text': 'And the other reason is that you actually get better generalization in the end.', 'start': 2202.944, 'duration': 2.763}, {'end': 2209.33, 'text': 'So if you measure the performance of the system on a separate set that you you know,', 'start': 2205.767, 'duration': 3.563}, {'end': 2213.313, 'text': 'I assume you all know the concept of training set and test set and validation set.', 'start': 2209.33, 'duration': 3.983}], 'summary': 'Stochastic gradient descent converges faster with large training sets, leading to better generalization.', 'duration': 29.047, 'max_score': 2184.266, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo302184266.jpg'}, {'end': 2254.31, 'src': 'embed', 'start': 2217.277, 'weight': 1, 'content': [{'end': 2223.442, 'text': 'you get generally better generalization if you use stochastic gradient and if you actually use the real true gradient, descent.', 'start': 2217.277, 'duration': 6.165}, {'end': 2237.53, 'text': "The problem is, yes, No, it's worse.", 'start': 2225.524, 'duration': 12.006}, {'end': 2241.932, 'text': 'So, computing the entire gradient or the entire dataset, it is computationally feasible.', 'start': 2237.87, 'duration': 4.062}, {'end': 2243.413, 'text': 'I mean, you can do it.', 'start': 2242.773, 'duration': 0.64}, {'end': 2249.356, 'text': "It's not any more expensive than, you know.", 'start': 2243.653, 'duration': 5.703}, {'end': 2252.629, 'text': 'It will be less noisy, but it will be slower.', 'start': 2251.408, 'duration': 1.221}, {'end': 2254.31, 'text': 'So let me tell you why.', 'start': 2253.249, 'duration': 1.061}], 'summary': 'Using stochastic gradient gives better generalization, even though computing the entire dataset is feasible and less noisy.', 'duration': 37.033, 'max_score': 2217.277, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo302217277.jpg'}, {'end': 2362.324, 'src': 'embed', 'start': 2297.996, 'weight': 2, 'content': [{'end': 2305.317, 'text': 'where you have samples that have a lot of redundance in them, like many samples are very similar to each other, et cetera.', 'start': 2297.996, 'duration': 7.321}, {'end': 2313.356, 'text': 'So if there is any kind of coherence, If your system is capable of generalization, then that means stochastic gradient is going to be more efficient,', 'start': 2305.618, 'duration': 7.738}, {'end': 2318.561, 'text': "because if you don't use stochastic gradient, you're not going to be able to take advantage of that redundancy.", 'start': 2313.356, 'duration': 5.205}, {'end': 2323.686, 'text': "So that's one case where noise is good for you.", 'start': 2321.243, 'duration': 2.443}, {'end': 2329.419, 'text': "Okay Don't pay attention to the formula.", 'start': 2326.417, 'duration': 3.002}, {'end': 2332.242, 'text': "Don't get scared because we're going to come back to this in more details.", 'start': 2329.519, 'duration': 2.723}, {'end': 2338.126, 'text': 'But why is back propagation called back propagation? 
And again, this is very informal.', 'start': 2333.122, 'duration': 5.004}, {'end': 2343.39, 'text': "It's basically a practical application of the chain rule.", 'start': 2340.808, 'duration': 2.582}, {'end': 2350.475, 'text': 'So you can think of a neural net of the type that I showed earlier as a bunch of modules that are stacked on top of each other.', 'start': 2343.651, 'duration': 6.824}, {'end': 2353.737, 'text': 'And you can think of this as compositions of functions.', 'start': 2351.515, 'duration': 2.222}, {'end': 2358.12, 'text': 'And you all know the basic rule of calculus.', 'start': 2354.698, 'duration': 3.422}, {'end': 2362.324, 'text': 'you know how do you compute the derivative of a function composed with another function?', 'start': 2358.12, 'duration': 4.204}], 'summary': 'Stochastic gradient is efficient for redundant samples, back propagation applies the chain rule in neural networks.', 'duration': 64.328, 'max_score': 2297.996, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo302297996.jpg'}, {'end': 2433.357, 'src': 'embed', 'start': 2385.843, 'weight': 5, 'content': [{'end': 2390.667, 'text': "More generally, actually, they take multidimensional arrays as input and multidimensional arrays as output, but it doesn't matter.", 'start': 2385.843, 'duration': 4.824}, {'end': 2402.076, 'text': 'And basically, what is the generalization of this chain rule in the case of kind of functional modules that have multiple inputs,', 'start': 2392.368, 'duration': 9.708}, {'end': 2404.458, 'text': 'multiple outputs that you can view as functions right?', 'start': 2402.076, 'duration': 2.382}, {'end': 2410.483, 'text': "And basically it's the same rule if you kind of blindly apply them.", 'start': 2407.021, 'duration': 3.462}, {'end': 2416.869, 'text': "It's the same rule as you applied for regular derivatives, except here you have to use partial derivatives.", 'start': 2410.524, 'duration': 6.345}, {'end': 2419.371, 'text': 'But you know,', 'start': 2416.929, 'duration': 2.442}, {'end': 2427.973, 'text': 'What you see in the end is that if you want to compute the derivative of the difference between the output you want and the output you get,', 'start': 2420.987, 'duration': 6.986}, {'end': 2433.357, 'text': 'which is the value of your objective function with respect to any variable inside of the network,', 'start': 2427.973, 'duration': 5.384}], 'summary': 'The chain rule applies to functional modules with multiple inputs and outputs, using partial derivatives for computing the derivative.', 'duration': 47.514, 'max_score': 2385.843, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo302385843.jpg'}], 'start': 2113.253, 'title': 'Stochastic gradient descent and back propagation in neural networks', 'summary': 'Discusses the concept of stochastic gradient descent, highlighting its efficiency in computing the objective function and gradient over training sets, its erratic yet faster convergence compared to true gradient descent, and its better generalization performance, particularly in cases of large training sets and data redundancy. 
additionally, it explains the practical application of back propagation in neural networks through the chain rule and the derivative computation, involving scalar and vector functions.', 'chapters': [{'end': 2318.561, 'start': 2113.253, 'title': 'Stochastic gradient descent', 'summary': 'Discusses the concept of stochastic gradient descent, highlighting its efficiency in computing the objective function and gradient over training sets, its erratic yet faster convergence compared to true gradient descent, and its better generalization performance, particularly in cases of large training sets and data redundancy.', 'duration': 205.308, 'highlights': ['Stochastic gradient descent converges much faster, particularly with large training sets. Empirical evidence shows that stochastic gradient descent converges much faster, especially when dealing with very large training sets.', 'Stochastic gradient descent yields better generalization performance compared to true gradient descent. Using stochastic gradient descent results in better generalization performance when testing the system on a separate set, compared to using the true gradient descent.', 'Efficiency of stochastic gradient descent due to data redundancy and coherence in samples. Stochastic gradient descent is more efficient in cases where there is redundancy and coherence in the samples, as it takes advantage of this redundancy for faster computations.']}, {'end': 2433.357, 'start': 2321.243, 'title': 'Back propagation in neural networks', 'summary': 'Discusses back propagation in neural networks, explaining its practical application through the chain rule and the derivative computation, involving scalar and vector functions.', 'duration': 112.114, 'highlights': ['Back propagation in neural networks is a practical application of the chain rule, involving the computation of derivatives for scalar and vector functions.', 'Neural networks can be viewed as a stack of modules that are compositions of functions, where the derivative of a function composed with another function is computed using the product of their derivatives.', 'The generalization of the chain rule for functional modules with multiple inputs and outputs involves the use of partial derivatives.', 'The computation of the derivative of the objective function with respect to any variable inside the network is crucial for back propagation in neural networks.']}], 'duration': 320.104, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo302113253.jpg', 'highlights': ['Stochastic gradient descent converges much faster, particularly with large training sets. Empirical evidence shows that stochastic gradient descent converges much faster, especially when dealing with very large training sets.', 'Stochastic gradient descent yields better generalization performance compared to true gradient descent. Using stochastic gradient descent results in better generalization performance when testing the system on a separate set, compared to using the true gradient descent.', 'Efficiency of stochastic gradient descent due to data redundancy and coherence in samples. 
Stochastic gradient descent is more efficient in cases where there is redundancy and coherence in the samples, as it takes advantage of this redundancy for faster computations.', 'Back propagation in neural networks is a practical application of the chain rule, involving the computation of derivatives for scalar and vector functions.', 'Neural networks can be viewed as a stack of modules that are compositions of functions, where the derivative of a function composed with another function is computed using the product of their derivatives.', 'The generalization of the chain rule for functional modules with multiple inputs and outputs involves the use of partial derivatives.', 'The computation of the derivative of the objective function with respect to any variable inside the network is crucial for back propagation in neural networks.']}, {'end': 3452.401, 'segs': [{'end': 2628.655, 'src': 'embed', 'start': 2598.878, 'weight': 0, 'content': [{'end': 2603.96, 'text': 'so, first of all, well, this is a human brain, but I mean this chart is from much later,', 'start': 2598.878, 'duration': 5.082}, {'end': 2609.242, 'text': 'but all mammalian visual system is organized in a similar way.', 'start': 2603.96, 'duration': 5.282}, {'end': 2613.744, 'text': 'You have signals coming into your eyes, striking your retina.', 'start': 2610.183, 'duration': 3.561}, {'end': 2621.108, 'text': 'You have a few layers of neurons in your retina in front of your photoreceptors that kind of pre-process signal if you want.', 'start': 2614.284, 'duration': 6.824}, {'end': 2628.655, 'text': "They kind of compress it because you can't have, you know, the human eye has something like a hundred million pixels.", 'start': 2621.128, 'duration': 7.527}], 'summary': 'Mammalian visual system processes signals from eyes with around 100 million pixels.', 'duration': 29.777, 'max_score': 2598.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo302598878.jpg'}, {'end': 2846.91, 'src': 'embed', 'start': 2814.217, 'weight': 1, 'content': [{'end': 2815.738, 'text': 'And it only turns on for Jennifer Aniston.', 'start': 2814.217, 'duration': 1.521}, {'end': 2834.006, 'text': 'So with the idea that somehow the visual cortex can do pattern recognition and seems to have this sort of hierarchical structure,', 'start': 2824.222, 'duration': 9.784}, {'end': 2834.887, 'text': 'multi-layer structure.', 'start': 2834.006, 'duration': 0.881}, {'end': 2841.636, 'text': "There's also the idea that the visual process is essentially a feed-forward process.", 'start': 2834.908, 'duration': 6.728}, {'end': 2845.649, 'text': "So the process by which you recognize every object, It's very fast.", 'start': 2841.676, 'duration': 3.973}, {'end': 2846.91, 'text': 'It takes about 100 milliseconds.', 'start': 2845.689, 'duration': 1.221}], 'summary': 'Visual cortex can recognize objects in 100 milliseconds', 'duration': 32.693, 'max_score': 2814.217, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo302814217.jpg'}, {'end': 3239.077, 'src': 'embed', 'start': 3205.24, 'weight': 2, 'content': [{'end': 3208.105, 'text': 'Speech recognition experiments were somewhat successful, but not as much.', 'start': 3205.24, 'duration': 2.865}, {'end': 3217.265, 'text': 'pretty quickly we realized we could use those convolutional nets not just to recognize individual characters but to recognize groups of characters.', 'start': 3210.3, 'duration': 6.965}, {'end': 
3218.806, 'text': 'so multiple characters at a single time.', 'start': 3217.265, 'duration': 1.541}, {'end': 3227.732, 'text': "And it's because of this convolutional nature of the network, which I'll come back to in three lectures, that basically allow those systems to just.", 'start': 3219.487, 'duration': 8.245}, {'end': 3239.077, 'text': 'it will be applied to a large image and then they will just turn on whatever they see in their field of view, whatever they see,', 'start': 3228.633, 'duration': 10.444}], 'summary': 'Speech recognition experiments somewhat successful, convolutional nets recognize groups of characters.', 'duration': 33.837, 'max_score': 3205.24, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo303205240.jpg'}], 'start': 2433.357, 'title': 'Neural networks and image processing', 'summary': 'Delves into processing challenges, including matrix size and structure, drawing inspiration from the visual system, and the organization of the visual cortex. it also explores visual recognition, the fast recognition process, and convolutional neural networks, and discusses the evolution of pattern recognition, the application of convolutional nets for character and object recognition, and the use of semantic segmentation for classifying multiple categories, with practical examples and successful applications in robotics and image analysis.', 'chapters': [{'end': 2744.219, 'start': 2433.357, 'title': 'Neural network structure and image processing', 'summary': 'Discusses the challenges of processing images in neural networks, including the matrix size and structure, drawing inspiration from the mammalian visual system, and the organization of the visual cortex.', 'duration': 310.862, 'highlights': ["The visual cortex organization and mammalian visual system's compression of signals to 1 million fibers from 100 million pixels, influenced by the work of Hubel and Wiesel, is discussed, addressing the challenges in processing image inputs in neural networks.", 'The impracticality of using a completely full matrix for image inputs due to the large number of components and the need for structure hypothesis is highlighted, emphasizing the limitations in using a full matrix for practical applications.', "The challenges in the organization of the mammalian visual system and the mistake in vertebrates' wiring configuration, leading to a blind spot in the visual field, are detailed, providing insights into the evolutionary differences in wiring configuration between vertebrates and invertebrates."]}, {'end': 3155.76, 'start': 2745.14, 'title': 'Visual recognition and convolutional neural networks', 'summary': 'Discusses the feed-forward ventral pathway for visual recognition, highlighting the hierarchical structure and the fast recognition process taking about 100 milliseconds, and introduces the concept of convolutional neural networks inspired by the discoveries of hubel, wiesel, and fukushima.', 'duration': 410.62, 'highlights': ['The feed-forward ventral pathway for visual recognition is a fast process taking about 100 milliseconds, indicating the hierarchical structure of the visual cortex and the absence of time for recurrent connections. 
Fast recognition process (100 milliseconds), absence of time for recurrent connections, hierarchical structure of the visual cortex.', 'The concept of convolutional neural networks is introduced, inspired by the discoveries of Hubel, Wiesel, and Fukushima, employing local connections, weight sharing, and feature detection layers. Inspiration from Hubel, Wiesel, and Fukushima, local connections, weight sharing, feature detection layers.', 'Neurons in the inferior temporal cortex represent object categories, as discovered through experiments with patients, showcasing the specific activation for different categories like Jennifer Aniston. Representation of object categories in the inferior temporal cortex, specific activation for different categories.', "Hubel and Wiesel's discoveries on the organization of neurons in a retinotopic way and their response to oriented features led to the development of neural net models and the concept of complex cells for shift invariance. Organization of neurons in a retinotopic way, response to oriented features, development of neural net models, concept of complex cells for shift invariance."]}, {'end': 3452.401, 'start': 3158.371, 'title': 'Convolutional nets for object recognition', 'summary': 'Discusses the evolution of pattern recognition, the application of convolutional nets for character and object recognition, and the use of semantic segmentation for classifying multiple categories, with practical examples and the successful application in robotics and image analysis.', 'duration': 294.03, 'highlights': ['The revolutionary approach of using convolutional nets for character recognition and object detection was a significant departure from the traditional feature extraction and classifier model, enabling end-to-end learning of the entire task. This approach represented a departure from the traditional model of feature extraction and classifier, allowing for end-to-end learning of the entire task.', "The capability of convolutional nets to recognize not only individual characters but also groups of characters simultaneously was a breakthrough, demonstrating the potential of the network's convolutional nature and its ability to detect objects in large images. Convolutional nets exhibited the capability to recognize groups of characters simultaneously, showcasing the potential of the network's convolutional nature and its object detection ability.", 'The application of convolutional nets for simultaneous segmentation and recognition in natural images, such as face and pedestrian detection, showcased their practical utility and demonstrated the potential for driving robots and performing real-time image analysis. 
Convolutional nets were applied for simultaneous segmentation and recognition in natural images, demonstrating practical utility in tasks such as face and pedestrian detection, with potential applications in driving robots and real-time image analysis.']}], 'duration': 1019.044, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo302433357.jpg', 'highlights': ["The visual cortex organization and mammalian visual system's compression of signals to 1 million fibers from 100 million pixels, influenced by the work of Hubel and Wiesel, is discussed, addressing the challenges in processing image inputs in neural networks.", 'The feed-forward ventral pathway for visual recognition is a fast process taking about 100 milliseconds, indicating the hierarchical structure of the visual cortex and the absence of time for recurrent connections. Fast recognition process (100 milliseconds), absence of time for recurrent connections, hierarchical structure of the visual cortex.', 'The revolutionary approach of using convolutional nets for character recognition and object detection was a significant departure from the traditional feature extraction and classifier model, enabling end-to-end learning of the entire task.', "The capability of convolutional nets to recognize not only individual characters but also groups of characters simultaneously was a breakthrough, demonstrating the potential of the network's convolutional nature and its ability to detect objects in large images."]}, {'end': 4085.808, 'segs': [{'end': 3567.433, 'src': 'embed', 'start': 3478.893, 'weight': 0, 'content': [{'end': 3486.08, 'text': 'but he was really kind of transfixed by the potential applications of this and he was just about to take a sabbatical and work for a company called Mobileye,', 'start': 3478.893, 'duration': 7.187}, {'end': 3490.224, 'text': 'which was a startup in Israel at the time working on autonomous driving.', 'start': 3486.08, 'duration': 4.144}, {'end': 3494.268, 'text': 'And so a couple months after he heard my talk,', 'start': 3491.065, 'duration': 3.203}, {'end': 3499.292, 'text': 'he started working at Mobileye and he told the Mobileye people you should try this convolutional net stuff.', 'start': 3494.268, 'duration': 5.024}, {'end': 3500.233, 'text': 'this works really well.', 'start': 3499.292, 'duration': 0.941}, {'end': 3504.917, 'text': "And the engineers said, meh, we don't believe in that stuff, we have our own method.", 'start': 3501.094, 'duration': 3.823}, {'end': 3512.104, 'text': 'So he implemented it and tried it himself, beat the hell out of all the benchmarks they had.', 'start': 3506.459, 'duration': 5.645}, {'end': 3515.687, 'text': 'And all of a sudden, the whole company switched to using convolutional nets.', 'start': 3513.044, 'duration': 2.643}, {'end': 3530.439, 'text': 'And they were the first company to actually come up with a vision system for cars that you know can keep a car on a highway and can brake if there is a pedestrian or a cyclist crossing.', 'start': 3517.409, 'duration': 13.03}, {'end': 3531.52, 'text': "I'll come back to this in a minute.", 'start': 3530.559, 'duration': 0.961}, {'end': 3537.785, 'text': 'They were basically using this technique, semantic segmentation, very similar to the one I showed for the robot before.', 'start': 3532.261, 'duration': 5.524}, {'end': 3544.51, 'text': 'This was a guy by the name of Shai Shalef Schwartz.', 'start': 3540.447, 'duration': 4.063}, {'end': 3550.615, 'text': 'You have to 
be aware of the fact also that back in the 80s,', 'start': 3547.592, 'duration': 3.023}, {'end': 3556.081, 'text': 'people were really interested in implementing special types of hardware that could run neural nets really fast.', 'start': 3550.615, 'duration': 5.466}, {'end': 3560.105, 'text': 'And these are kind of a few examples of neural net chips that were actually implemented.', 'start': 3556.682, 'duration': 3.423}, {'end': 3567.433, 'text': 'I had to do with some of them, but they were implemented by people working in the same group as I was at Bell Labs in New Jersey.', 'start': 3560.126, 'duration': 7.307}], 'summary': 'Convolutional nets revolutionized vision systems, leading to successful application in car safety technology.', 'duration': 88.54, 'max_score': 3478.893, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo303478893.jpg'}, {'end': 3617.777, 'src': 'embed', 'start': 3586.954, 'weight': 4, 'content': [{'end': 3598.341, 'text': 'You go to any conference on computer architecture, you know chip like ISACC, which is the big kind of solid-state circuit conference,', 'start': 3586.954, 'duration': 11.387}, {'end': 3601.203, 'text': 'have to talk about neural net accelerators.', 'start': 3598.341, 'duration': 2.862}, {'end': 3605.231, 'text': 'And I worked on a few of those things.', 'start': 3604.13, 'duration': 1.101}, {'end': 3616.916, 'text': 'Okay, so then something happened, as I told you, around 2010, 13, 15, in speech recognition, image recognition, natural language processing,', 'start': 3607.572, 'duration': 9.344}, {'end': 3617.777, 'text': "and it's continuing.", 'start': 3616.916, 'duration': 0.861}], 'summary': 'Chip conferences discuss neural net accelerators for speech recognition, image recognition, and natural language processing.', 'duration': 30.823, 'max_score': 3586.954, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo303586954.jpg'}, {'end': 3816.504, 'src': 'embed', 'start': 3787.461, 'weight': 3, 'content': [{'end': 3792.704, 'text': 'And then that was a watershed moment for the computer vision community.', 'start': 3787.461, 'duration': 5.243}, {'end': 3796.026, 'text': 'A lot of people said, okay, you know, now we know that this thing works.', 'start': 3792.724, 'duration': 3.302}, {'end': 3816.504, 'text': 'And the whole community went from basically refusing every paper that had neural nets in them in 2011 and 2012 to refusing every paper that does not have a convolutional net in it in 2016..', 'start': 3799.57, 'duration': 16.934}], 'summary': 'Computer vision community embraced convolutional nets in 2016.', 'duration': 29.043, 'max_score': 3787.461, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo303787461.jpg'}], 'start': 3452.421, 'title': 'Impact of convolutional nets and evolution of neural nets', 'summary': 'Covers the impact of convolutional nets on autonomous driving, highlighting the adoption of semantic segmentation and performance improvements. 
it also discusses the evolution of neural nets from the 1980s to the present, including milestones such as decreased error rates on imagenet and increasing complexity of network architectures.', 'chapters': [{'end': 3544.51, 'start': 3452.421, 'title': 'Impact of convolutional nets on autonomous driving', 'summary': "Details the pivotal role of convolutional nets in revolutionizing vision systems for autonomous driving, leading to the adoption of semantic segmentation and significant improvements in performance, as exemplified by the case of mobileye's vision system for cars.", 'duration': 92.089, 'highlights': ["Mobileye's adoption of convolutional net for vision systems resulted in significant performance improvements, leading to the first vision system for cars capable of keeping a car on a highway and braking for pedestrians or cyclists.", "A professor's recommendation of convolutional nets led to the entire company switching to this technique, surpassing their previous benchmarks and revolutionizing their vision system for autonomous driving.", 'The pivotal role of convolutional nets in revolutionizing vision systems for autonomous driving was exemplified by the case of Mobileye, where the technique led to significant performance improvements and the adoption of semantic segmentation.']}, {'end': 4085.808, 'start': 3547.592, 'title': 'Evolution of neural nets in 1980s to present', 'summary': 'Discusses the evolution of neural nets from the 1980s to the present, including the resurgence of interest in neural net accelerators, the development of deep learning, and the impact on computer vision community, with key milestones such as the significant decrease in error rates on imagenet dataset and the increasing complexity of network architectures.', 'duration': 538.216, 'highlights': ['The development of deep learning and its impact on the computer vision community, including the significant decrease in error rates on ImageNet dataset and the increasing complexity of network architectures. Significant decrease in error rates on ImageNet dataset, increasing complexity of network architectures.', 'The resurgence of interest in neural net accelerators and their impact on the chip industry, including the focus on neural net accelerators in computer architecture conferences. Resurgence of interest in neural net accelerators, focus on neural net accelerators in computer architecture conferences.', 'The development of neural net chips in the 1980s and their implementation by researchers at Bell Labs, contributing to the hot topic of neural nets during that time. 
Development and implementation of neural net chips in the 1980s, hot topic of neural nets during that time.']}], 'duration': 633.387, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo303452421.jpg', 'highlights': ["Mobileye's adoption of convolutional net for vision systems resulted in significant performance improvements, leading to the first vision system for cars capable of keeping a car on a highway and braking for pedestrians or cyclists.", "A professor's recommendation of convolutional nets led to the entire company switching to this technique, surpassing their previous benchmarks and revolutionizing their vision system for autonomous driving.", 'The pivotal role of convolutional nets in revolutionizing vision systems for autonomous driving was exemplified by the case of Mobileye, where the technique led to significant performance improvements and the adoption of semantic segmentation.', 'The development of deep learning and its impact on the computer vision community, including the significant decrease in error rates on ImageNet dataset and the increasing complexity of network architectures.', 'The resurgence of interest in neural net accelerators and their impact on the chip industry, including the focus on neural net accelerators in computer architecture conferences.', 'The development of neural net chips in the 1980s and their implementation by researchers at Bell Labs, contributing to the hot topic of neural nets during that time.']}, {'end': 4677.308, 'segs': [{'end': 4136.865, 'src': 'embed', 'start': 4109.265, 'weight': 0, 'content': [{'end': 4114.47, 'text': 'And most of those things are based on two basic families of architectures,', 'start': 4109.265, 'duration': 5.205}, {'end': 4121.397, 'text': 'the sort of so-called one-pass object detection recognition architectures called RetinaNet,', 'start': 4114.47, 'duration': 6.927}, {'end': 4123.559, 'text': "Feature Pyramid Network (there's various names for it) or U-Net.", 'start': 4121.397, 'duration': 2.162}, {'end': 4126.46, 'text': 'And then another type called Mask R-CNN.', 'start': 4124.279, 'duration': 2.181}, {'end': 4128.501, 'text': 'Both of them actually originated from Facebook.', 'start': 4126.76, 'duration': 1.741}, {'end': 4133.863, 'text': 'Or the people who originated them are now at Facebook.', 'start': 4130.721, 'duration': 3.142}, {'end': 4136.865, 'text': 'They sometimes came up with it before they came to Facebook.', 'start': 4133.904, 'duration': 2.961}], 'summary': 'Two main families of architectures for object detection: retinanet, feature pyramid network/u-net, and mask r-cnn, originated from facebook.', 'duration': 27.6, 'max_score': 4109.265, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304109265.jpg'}, {'end': 4181.908, 'src': 'embed', 'start': 4155.786, 'weight': 5, 'content': [{'end': 4162.331, 'text': 'In fact, the output is a whole bunch of images, one per category, and for each category, it outputs the mask of the object from that category.', 'start': 4155.786, 'duration': 6.545}, {'end': 4166.754, 'text': "Those things can also do what's called instance segmentation.", 'start': 4164.352, 'duration': 2.402}, {'end': 4171.837, 'text': 'So if you have a whole bunch of sheep, it can tell you not just that this region is sheep,', 'start': 4166.794, 'duration': 5.043}, {'end': 4177.984, 'text': 'but it actually picks out the individual sheep and will tell them apart, and it will count the
sheep and fall asleep.', 'start': 4171.837, 'duration': 6.147}, {'end': 4181.908, 'text': "That's what you're supposed to do, right?", 'start': 4180.947, 'duration': 0.961}], 'summary': 'Ai model outputs masks for each category, performs instant segmentation, and counts individual sheep.', 'duration': 26.122, 'max_score': 4155.786, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304155786.jpg'}, {'end': 4260.238, 'src': 'embed', 'start': 4223.515, 'weight': 1, 'content': [{'end': 4226.416, 'text': "But even in academia, people weren't used to kind of distributing their code.", 'start': 4223.515, 'duration': 2.901}, {'end': 4234.241, 'text': "But deep learning is sort of somehow the race that's kind of driven people to kind of be more open about research.", 'start': 4227.877, 'duration': 6.364}, {'end': 4236.903, 'text': "So there's a lot of applications of all this.", 'start': 4235.202, 'duration': 1.701}, {'end': 4240.685, 'text': 'As I said, you know, self-driving car, this is actually a video from Mobileye.', 'start': 4237.623, 'duration': 3.062}, {'end': 4246.628, 'text': 'And Mobileye was pretty early in this, using convolutional nets for autonomous driving, to the point that in 2015,', 'start': 4241.385, 'duration': 5.243}, {'end': 4251.611, 'text': 'they had managed to shoehorn a convolutional net on the chip that they had designed for some other purpose.', 'start': 4246.628, 'duration': 4.983}, {'end': 4254.473, 'text': 'And they sold the license, the technology to Tesla.', 'start': 4252.171, 'duration': 2.302}, {'end': 4256.214, 'text': 'So the first self-driving Teslas.', 'start': 4254.533, 'duration': 1.681}, {'end': 4260.238, 'text': 'I mean self-driving, not really self-driving, but they have driving assistance right?', 'start': 4257.615, 'duration': 2.623}], 'summary': 'Deep learning encourages open research, e.g., mobileye fit convolutional net on chip for self-driving teslas.', 'duration': 36.723, 'max_score': 4223.515, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304223515.jpg'}, {'end': 4382.494, 'src': 'embed', 'start': 4356.838, 'weight': 2, 'content': [{'end': 4364.601, 'text': 'This is lifted from a paper by some of our colleagues here at NYU, where they analyzed MRI images.', 'start': 4356.838, 'duration': 7.763}, {'end': 4371.245, 'text': "So there's one big advantage to convolutional nets is that they don't need to look at a screen look at an MRI.", 'start': 4365.122, 'duration': 6.123}, {'end': 4376.689, 'text': "In particular, to be able to look at an MRI, they don't have to slice it into 2D images.", 'start': 4371.585, 'duration': 5.104}, {'end': 4378.171, 'text': 'They can look at the entire 3D volume.', 'start': 4376.709, 'duration': 1.462}, {'end': 4382.494, 'text': 'This is one property that this thing uses.', 'start': 4378.851, 'duration': 3.643}], 'summary': 'Colleagues at nyu analyzed mri images using convolutional nets, which can look at the entire 3d volume without slicing it into 2d images.', 'duration': 25.656, 'max_score': 4356.838, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304356838.jpg'}, {'end': 4579.553, 'src': 'embed', 'start': 4550.017, 'weight': 3, 'content': [{'end': 4551.699, 'text': 'And the number of training samples is nowhere near that.', 'start': 4550.017, 'duration': 1.682}, {'end': 4554.641, 'text': 'How does that work? 
But it works.', 'start': 4552.379, 'duration': 2.262}, {'end': 4562.022, 'text': 'Okay, so things we can do with deep learning today.', 'start': 4559.781, 'duration': 2.241}, {'end': 4565.004, 'text': 'You know, we can have safer cars.', 'start': 4562.223, 'duration': 2.781}, {'end': 4568.566, 'text': 'We can have better medical analysis, medical image analysis systems.', 'start': 4565.044, 'duration': 3.522}, {'end': 4573.029, 'text': 'We can have pretty good language translation, far from perfect, but useful.', 'start': 4569.627, 'duration': 3.402}, {'end': 4574.43, 'text': 'Stupid chatbots.', 'start': 4573.69, 'duration': 0.74}, {'end': 4579.553, 'text': 'You know, very good information search retrieval and filtering.', 'start': 4576.872, 'duration': 2.681}], 'summary': 'Deep learning enables safer cars, better medical analysis, useful language translation, and efficient information retrieval.', 'duration': 29.536, 'max_score': 4550.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304550017.jpg'}, {'end': 4631.715, 'src': 'embed', 'start': 4596.518, 'weight': 4, 'content': [{'end': 4597.718, 'text': "We don't have machines with common sense.", 'start': 4596.518, 'duration': 1.2}, {'end': 4600.62, 'text': "We don't have intelligent personal assistants.", 'start': 4597.778, 'duration': 2.842}, {'end': 4604.682, 'text': "We don't have smart chatbots.", 'start': 4600.68, 'duration': 4.002}, {'end': 4606.243, 'text': "We don't have household robots.", 'start': 4604.702, 'duration': 1.541}, {'end': 4610.206, 'text': "I mean, there's a lot of things we don't know how to do, right? Which is why we still do research.", 'start': 4606.283, 'duration': 3.923}, {'end': 4615.789, 'text': 'Okay, so deep learning is really about learning representations.', 'start': 4612.287, 'duration': 3.502}, {'end': 4620.712, 'text': 'But really, we should know in advance what representations are.', 'start': 4617.571, 'duration': 3.141}, {'end': 4623.613, 'text': 'So I talked about the traditional model of pattern recognition.', 'start': 4620.792, 'duration': 2.821}, {'end': 4631.715, 'text': 'But representation is really about, you know, you have your raw data.', 'start': 4625.133, 'duration': 6.582}], 'summary': 'Lack of advanced ai systems, focus on deep learning for data representation, ongoing research.', 'duration': 35.197, 'max_score': 4596.518, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304596518.jpg'}], 'start': 4086.428, 'title': 'Computer vision and deep learning applications', 'summary': 'Discusses the incredible progress in computer vision, focusing on object detection and segmentation using architectures like retinanet and mask r-cnn, and it explores deep learning applications in domains such as self-driving cars, medical imaging, and scientific research, with examples like reducing collisions by 40% in france and advancements in medical image analysis.', 'chapters': [{'end': 4196.599, 'start': 4086.428, 'title': 'Incredible progress in computer vision', 'summary': 'Discusses the incredible progress in computer vision, particularly in object detection and segmentation, citing the use of architectures like retinanet, feature pyramid network, and mask r-cnn, which can accurately detect objects, generate masks, and perform instance segmentation in real time.', 'duration': 110.171, 'highlights': ['Architectures like RetinaNet, Feature Pyramid Network, and Mask R-CNN have led to incredible
progress in computer vision, enabling the accurate detection of objects and generation of masks in real time.', 'These architectures can perform instance segmentation, accurately detecting and counting individual objects within a category, such as sheep, demonstrating the practical applications of computer vision in real-life scenarios.', 'Deep learning research has been embraced by the community, emphasizing the necessity of open research in the field.']}, {'end': 4677.308, 'start': 4196.659, 'title': 'Applications of deep learning', 'summary': 'Discusses the applications of deep learning in various domains, including self-driving cars, medical imaging, and scientific research, and highlights the positive impact such as reducing collisions by 40% in france and advancements in medical image analysis.', 'duration': 480.649, 'highlights': ["Deep learning applications in self-driving cars, such as Mobileye's use of convolutional nets for autonomous driving and the deployment of vision systems in European cars, reducing collisions by 40% in France. Mobileye's early use of convolutional nets for autonomous driving, leading to vision system deployment in European cars, reducing collisions by 40% in France.", 'The use of convolutional nets in medical imaging, specifically in analyzing MRI images and detecting malignant tumors in mammograms, showcasing the advantages of 3D convolutional nets over 2D slices. The use of 3D convolutional nets in medical imaging, analyzing MRI images and detecting malignant tumors in mammograms, showcasing the advantages over 2D slices.', 'The impact of deep learning on various domains, including physics, bioinformatics, and the challenges and mysteries in understanding its effectiveness and training methods. The impact of deep learning on domains like physics and bioinformatics, highlighting the challenges and mysteries in understanding its effectiveness and training methods.', 'The current capabilities of deep learning, including advancements in car safety, medical image analysis, language translation, information retrieval, and energy management, while noting the limitations in developing truly intelligent machines. The current capabilities of deep learning, including advancements in car safety, medical image analysis, language translation, information retrieval, and energy management, while noting the limitations in developing truly intelligent machines.', 'The importance of learning representations in deep learning and the ongoing research in understanding and developing useful representations for various tasks.
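For readers who want to try the instance segmentation described above, pre-trained Mask R-CNN models ship with torchvision. The sketch below is illustrative rather than anything shown in the lecture; the exact weight-loading arguments vary across torchvision versions, and the random tensor stands in for a real image:

```python
import torch
import torchvision

# Load a pre-trained Mask R-CNN; newer torchvision versions use
# weights="DEFAULT" instead of pretrained=True.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)           # stand-in for an RGB image with values in [0, 1]
with torch.no_grad():
    (prediction,) = model([image])        # the model returns one dict per input image

# Each detection comes with a box, a class label, a confidence score,
# and a soft mask at the resolution of the input image.
keep = prediction["scores"] > 0.5
masks = prediction["masks"][keep]         # shape (num_kept, 1, 480, 640)
labels = prediction["labels"][keep]       # e.g. several separate "sheep" instances
print(masks.shape, labels)
```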
The importance of learning representations in deep learning and the ongoing research in understanding and developing useful representations for various tasks.']}], 'duration': 590.88, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304086428.jpg', 'highlights': ['Architectures like RetinaNet, Feature Pyramid Network, and MascarCNN have led to incredible progress in computer vision, enabling the accurate detection of objects and generation of masks in real time.', "Deep learning applications in self-driving cars, such as Mobileye's use of convolutional nets for autonomous driving and the deployment of vision systems in European cars, reducing collisions by 40% in France.", 'The use of convolutional nets in medical imaging, specifically in analyzing MRI images and detecting malignant tumors in mammograms, showcasing the advantages of 3D convolutional nets over 2D slices.', 'The current capabilities of deep learning, including advancements in car safety, medical image analysis, language translation, information retrieval, and energy management, while noting the limitations in developing truly intelligent machines.', 'The importance of learning representations in deep learning and the ongoing research in understanding and developing useful representations for various tasks.', 'These architectures can perform instant segmentation, accurately detecting and counting individual objects within a category, such as sheep, demonstrating the practical applications of computer vision in real-life scenarios.', 'Deep learning research has been embraced by the community, emphasizing the necessity of open research in the field.', 'The impact of deep learning on various domains, including physics, bioinformatics, and the challenges and mysteries in understanding its effectiveness and training methods.']}, {'end': 5282.727, 'segs': [{'end': 4721.111, 'src': 'embed', 'start': 4679.258, 'weight': 0, 'content': [{'end': 4682.139, 'text': 'But there are things like tiling the space, doing random projections.', 'start': 4679.258, 'duration': 2.881}, {'end': 4692.482, 'text': 'A random projection is actually kind of, you know, like a monster that rears its head periodically, like every five years,', 'start': 4682.159, 'duration': 10.323}, {'end': 4694.362, 'text': 'and you have to whack it on the head every time it pops up.', 'start': 4692.482, 'duration': 1.88}, {'end': 4697.623, 'text': 'That was the idea behind the perceptron.', 'start': 4696.263, 'duration': 1.36}, {'end': 4701.325, 'text': 'So the first layer of perceptron is a layer of random projections.', 'start': 4697.643, 'duration': 3.682}, {'end': 4701.966, 'text': 'What does that mean?', 'start': 4701.405, 'duration': 0.561}, {'end': 4713.349, 'text': 'A random projection is a random matrix which has a smaller output dimension than input dimension, with some sort of non-linearity at the end.', 'start': 4701.986, 'duration': 11.363}, {'end': 4717.43, 'text': 'So think about a single layer neural net with non-linearities, but the weights are random.', 'start': 4713.369, 'duration': 4.061}, {'end': 4721.111, 'text': 'So you can think of this as random projections.', 'start': 4718.43, 'duration': 2.681}], 'summary': 'Random projections in perceptron, an occasional challenge to address every five years.', 'duration': 41.853, 'max_score': 4679.258, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304679258.jpg'}, {'end': 4815.341, 'src': 'heatmap', 
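As a concrete sketch of the random-projection idea just described, the snippet below builds a fixed random first layer (a random matrix with a smaller output dimension than its input, followed by a non-linearity) and trains only a linear classifier on top, which is essentially the "extreme learning machine" recipe the lecture mentions next. All data and dimensions here are made up for illustration:

```python
import torch

torch.manual_seed(0)
n, d_in, d_hid, n_classes = 512, 100, 50, 10      # note d_hid < d_in, as in the transcript

X = torch.randn(n, d_in)                          # fake inputs
y = torch.randint(0, n_classes, (n,))             # fake labels

W_rand = torch.randn(d_in, d_hid)                 # the random projection: never trained
H = torch.tanh(X @ W_rand)                        # non-linearity on top of the projection

# Only the second, linear layer is fit, here in closed form (ridge regression
# on one-hot targets) rather than by gradient descent.
Y = torch.nn.functional.one_hot(y, n_classes).float()
ridge = 1e-2 * torch.eye(d_hid)
W_out = torch.linalg.solve(H.T @ H + ridge, H.T @ Y)

pred = (torch.tanh(X @ W_rand) @ W_out).argmax(dim=1)
print("training accuracy:", (pred == y).float().mean().item())
```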
'start': 4748.936, 'weight': 1, 'content': [{'end': 4751.498, 'text': 'they call this extreme learning machines, okay?', 'start': 4748.936, 'duration': 2.562}, {'end': 4755.365, 'text': "And it's like it's ridiculous, but it exists.", 'start': 4752.523, 'duration': 2.842}, {'end': 4759.248, 'text': "They're not extreme.", 'start': 4758.147, 'duration': 1.101}, {'end': 4760.929, 'text': "I mean, they're extremely stupid, but you know.", 'start': 4759.288, 'duration': 1.641}, {'end': 4770.164, 'text': 'Right, so I was mentioning the compositionality of the world.', 'start': 4768.002, 'duration': 2.162}, {'end': 4774.568, 'text': "It's, you know, from pixels to edges to textons, motifs, parts, objects.", 'start': 4771.305, 'duration': 3.263}, {'end': 4777.931, 'text': 'In text, you have characters, words, word groups, clauses, sentences, stories.', 'start': 4775.008, 'duration': 2.923}, {'end': 4779.372, 'text': "In speech, it's the same.", 'start': 4778.571, 'duration': 0.801}, {'end': 4780.453, 'text': 'You have individual samples.', 'start': 4779.412, 'duration': 1.041}, {'end': 4787.8, 'text': 'You have, you know, spectral bands, sound, phones, phonemes, words, etc.', 'start': 4781.494, 'duration': 6.306}, {'end': 4791.692, 'text': 'You always have this kind of hierarchy.', 'start': 4790.131, 'duration': 1.561}, {'end': 4796.633, 'text': 'Okay, so here are many attempts at dismissing the whole idea of deep learning.', 'start': 4792.412, 'duration': 4.221}, {'end': 4804.096, 'text': "okay? First thing, and this is things that I've heard for decades, okay? From mostly theoreticians, but a lot of people.", 'start': 4796.633, 'duration': 7.463}, {'end': 4808.898, 'text': "And you have to know about them because they're going to come back in five years when people say, oh, deep learning sucks.", 'start': 4804.657, 'duration': 4.241}, {'end': 4815.341, 'text': 'Why not use support vector machines? Okay? Here is support vector machines here on the top left.', 'start': 4810.739, 'duration': 4.602}], 'summary': 'Discussion on extreme learning machines and hierarchy in data processing.', 'duration': 66.405, 'max_score': 4748.936, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304748936.jpg'}, {'end': 4977.073, 'src': 'embed', 'start': 4948.649, 'weight': 5, 'content': [{'end': 4954.092, 'text': 'You can write a thousand page books about the cute mathematics behind that,'
not suitable for computer vision with raw images.', 'duration': 28.424, 'max_score': 4948.649, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304948649.jpg'}, {'end': 5095.878, 'src': 'embed', 'start': 5067.676, 'weight': 4, 'content': [{'end': 5071.839, 'text': 'given that you have a large enough vector in the middle.', 'start': 5067.676, 'duration': 4.163}, {'end': 5074.541, 'text': 'So the dimension of what comes out of the first layer.', 'start': 5072.099, 'duration': 2.442}, {'end': 5077.583, 'text': "if it's high enough, potentially infinite.", 'start': 5074.541, 'duration': 3.042}, {'end': 5081.626, 'text': 'you can approximate any function you want as close as you want by making this layer go to infinity.', 'start': 5077.583, 'duration': 4.043}, {'end': 5089.411, 'text': 'So again, you talk to theoreticians and they tell you, why do you need layers? I can approximate anything I want with two layers.', 'start': 5083.564, 'duration': 5.847}, {'end': 5095.878, 'text': 'But there is an argument which is, it could be very, very expensive to do it in two layers.', 'start': 5090.892, 'duration': 4.986}], 'summary': 'Large enough vector dimension can approximate any function closely by making the layer approach infinity.', 'duration': 28.202, 'max_score': 5067.676, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo305067676.jpg'}, {'end': 5196.329, 'src': 'embed', 'start': 5170.123, 'weight': 3, 'content': [{'end': 5177.344, 'text': "If you allow yourself to do it in log n layers, where n is the number of input bits, then it's linear.", 'start': 5170.123, 'duration': 7.221}, {'end': 5182.346, 'text': 'Okay, so you go from exponential complexity to linear complexity if you allow yourself to use multiple layers.', 'start': 5178.204, 'duration': 4.142}, {'end': 5187.887, 'text': "It's as if you know when you write a program.", 'start': 5183.526, 'duration': 4.361}, {'end': 5196.329, 'text': "I'll tell you write a program in such a way that there is only two sequential steps that are necessary to run your program.", 'start': 5187.887, 'duration': 8.442}], 'summary': 'Using log n layers reduces complexity from exponential to linear.', 'duration': 26.206, 'max_score': 5170.123, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo305170123.jpg'}], 'start': 4679.258, 'title': 'Random projections and deep learning', 'summary': 'Explains the concept of random projections in the perceptron model and discusses the periodic rediscovery of deep learning. 
it highlights the application of random matrices, dismissals of deep learning, advantages of multiple layers in neural networks, and inefficiencies of support vector machines.', 'chapters': [{'end': 4721.111, 'start': 4679.258, 'title': 'Perceptron and random projections', 'summary': 'Explains the concept of random projections and their application in the perceptron model, highlighting its periodic occurrence and the use of random matrices with smaller output dimensions and non-linearities.', 'duration': 41.853, 'highlights': ['The first layer of perceptron is a layer of random projections, which are random matrices with smaller output dimension than input dimension and some sort of non-linearity at the end.', 'Random projection is a periodic concept that rears its head every five years and requires attention each time it emerges.', 'Random projection is like a single layer neural net with non-linearities, but the weights are random.']}, {'end': 5282.727, 'start': 4723.816, 'title': 'Rediscovery of deep learning', 'summary': 'Discusses the periodic rediscovery of deep learning, with a focus on dismissals of deep learning and the advantages of using multiple layers in neural networks, highlighting the inefficiencies of support vector machines and the potential for exponential complexity in two-layer systems.', 'duration': 558.911, 'highlights': ['Support Vector Machines and their inefficiencies Support Vector Machines are described as a two-layer neural net with a fixed first layer and a trainable second layer, highlighting its inefficiency in tasks such as computer vision with raw images due to the need to compare input vectors with all training samples, resulting in high computational cost.', 'The limitations of two-layer systems The discussion highlights that while any function can be approximated with two-layer systems, the number of terms needed in the middle can be exponential in the size of the input, making it very expensive, thus advocating for the use of multiple layers to reduce complexity.', 'Advantages of using multiple layers in neural networks The narrative explains the potential for reducing complexity from exponential to linear by allowing multiple layers in neural networks, drawing parallels to the design of computer circuits and program execution, emphasizing the potential for simpler computations with sequential steps.']}], 'duration': 603.469, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo304679258.jpg', 'highlights': ['Random projection is like a single layer neural net with non-linearities, but the weights are random.', 'Random projection is a periodic concept that rears its head every five years and requires attention each time it emerges.', 'The first layer of perceptron is a layer of random projections, which are random matrices with smaller output dimension than input dimension and some sort of non-linearity at the end.', 'Advantages of using multiple layers in neural networks The narrative explains the potential for reducing complexity from exponential to linear by allowing multiple layers in neural networks, drawing parallels to the design of computer circuits and program execution, emphasizing the potential for simpler computations with sequential steps.', 'The limitations of two-layer systems The discussion highlights that while any function can be approximated with two-layer systems, the number of terms needed in the middle can be exponential in the size of the input, making it very expensive, thus 
advocating for the use of multiple layers to reduce complexity.', 'Support Vector Machines and their inefficiencies Support Vector Machines are described as a two-layer neural net with a fixed first layer and a trainable second layer, highlighting its inefficiency in tasks such as computer vision with raw images due to the need to compare input vectors with all training samples, resulting in high computational cost.']}, {'end': 5934.826, 'segs': [{'end': 5311.607, 'src': 'embed', 'start': 5283.087, 'weight': 0, 'content': [{'end': 5289.069, 'text': "So the problem with this is that it takes a time that's proportional to the size of the numbers that you're trying to add.", 'start': 5283.087, 'duration': 5.982}, {'end': 5295.094, 'text': 'So circuit designers have a way of basically pre-computing the carry.', 'start': 5289.91, 'duration': 5.184}, {'end': 5302.18, 'text': "that's called carry look-ahead, so that the number of steps necessary to do an addition is actually not n, it's much less than that.", 'start': 5295.094, 'duration': 7.086}, {'end': 5311.607, 'text': "But that's at the expense of a huge increase in the complexity of the circuit, the area that it takes on the chip.", 'start': 5302.48, 'duration': 9.127}], 'summary': 'Circuit designers use carry look-ahead to reduce addition time, but complexity increases chip area.', 'duration': 28.52, 'max_score': 5283.087, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo305283087.jpg'}, {'end': 5362.842, 'src': 'embed', 'start': 5333.167, 'weight': 1, 'content': [{'end': 5340.032, 'text': "I don't call that deep, even though it technically uses backprop, but it doesn't really learn complex representations.", 'start': 5333.167, 'duration': 6.865}, {'end': 5343.754, 'text': "So there's this idea of hierarchy in deep learning.", 'start': 5342.053, 'duration': 1.701}, {'end': 5346.375, 'text': 'SVMs definitely are not deep.', 'start': 5345.214, 'duration': 1.161}, {'end': 5351.517, 'text': "Unless you learn complicated kernels, but then they're not SVMs anymore.", 'start': 5347.835, 'duration': 3.682}, {'end': 5362.842, 'text': "So what are good features? What are good representations? 
So here's an example I like.", 'start': 5355.318, 'duration': 7.524}], 'summary': 'Discussion on deep learning hierarchy and the limitations of support vector machines in learning complex representations.', 'duration': 29.675, 'max_score': 5333.167, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo305333167.jpg'}, {'end': 5447.372, 'src': 'embed', 'start': 5427.685, 'weight': 3, 'content': [{'end': 5439.709, 'text': 'So the manifold hypothesis is that the set of things that you know look natural to us live in a low dimensional surface inside the high dimensional ambient space.', 'start': 5427.685, 'duration': 12.024}, {'end': 5443.431, 'text': 'And a good example to kind of convince yourself of this.', 'start': 5440.83, 'duration': 2.601}, {'end': 5447.372, 'text': 'imagine I take lots of pictures of a person making faces right?', 'start': 5443.431, 'duration': 3.941}], 'summary': 'Manifold hypothesis: natural things exist in low-dimensional surface in high-dimensional space.', 'duration': 19.687, 'max_score': 5427.685, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo305427685.jpg'}, {'end': 5790.082, 'src': 'embed', 'start': 5757.299, 'weight': 4, 'content': [{'end': 5760.679, 'text': "That's another set of, you know, variable variables.", 'start': 5757.299, 'duration': 3.38}, {'end': 5765.943, 'text': "And what you'd like is a representation that basically individually represents each of those factors of variations.", 'start': 5760.94, 'duration': 5.003}, {'end': 5772.148, 'text': "So if there is a criterion to satisfy in learning good representations, it's that.", 'start': 5767.004, 'duration': 5.144}, {'end': 5776.672, 'text': "It's finding independent explanatory factors of variation of the data that you're looking at.", 'start': 5772.248, 'duration': 4.424}, {'end': 5790.082, 'text': 'And the bottom line is that nobody has any idea how to do this, okay? But that would be the ultimate goal of representation learning.', 'start': 5778.433, 'duration': 11.649}], 'summary': 'Representation learning aims to find independent explanatory factors of variation in data, a challenging and currently unsolved problem.', 'duration': 32.783, 'max_score': 5757.299, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo305757299.jpg'}, {'end': 5858.236, 'src': 'embed', 'start': 5823.627, 'weight': 5, 'content': [{'end': 5833.672, 'text': 'So if you assume that the surface occupied by all those examples of faces is a plane, then PCA will find the dimension of that plane.', 'start': 5823.627, 'duration': 10.045}, {'end': 5848.287, 'text': "Principal component analysis, right? But no, it's not linear, unfortunately, right? Let me give you an example.", 'start': 5834.392, 'duration': 13.895}, {'end': 5858.236, 'text': 'If you take me and my oldest son that looks like me and you place us making the same face in the same position,', 'start': 5850.449, 'duration': 7.787}], 'summary': 'Pca finds the dimension of the plane occupied by faces, illustrating non-linearity with a personal example.', 'duration': 34.609, 'max_score': 5823.627, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo305823627.jpg'}], 'start': 5283.087, 'title': 'Optimizing circuits and natural image manifolds', 'summary': 'Discusses the trade-off between time and space in circuit design, focusing on carry look-ahead implementation. 
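The remark that PCA only recovers linear manifolds is easy to check numerically. In the sketch below (an illustration, not from the lecture), the data has a single underlying factor of variation, an angle around a circle, yet PCA spreads the variance over two components because the manifold is curved:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=1000)        # the single true factor of variation
X = np.stack([np.cos(theta),                        # points on a circle embedded in 3-D,
              np.sin(theta),                        # plus a little noise in the third axis
              0.05 * rng.standard_normal(1000)], axis=1)

Xc = X - X.mean(axis=0)                             # PCA via the SVD of the centered data
s = np.linalg.svd(Xc, compute_uv=False)
explained = s**2 / np.sum(s**2)
print(explained)    # roughly [0.5, 0.5, ~0]: two linear components for a 1-D manifold
```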
it also explores the manifold hypothesis of natural images and the challenge of finding independent explanatory factors of variation in image representation.', 'chapters': [{'end': 5362.842, 'start': 5283.087, 'title': 'Optimizing addition circuits', 'summary': 'Discusses the trade-off between time and space in circuit design, where implementing carry look-ahead reduces the steps necessary for addition, but at the cost of increased circuit complexity and chip area. it also touches on the concept of depth in deep learning models and the importance of hierarchy and complex representations.', 'duration': 79.755, 'highlights': ['Implementing carry look-ahead reduces the steps necessary for addition, but increases circuit complexity and chip area.', 'The concept of depth in deep learning models is discussed, highlighting the importance of hierarchy and complex representations.', 'SVNs are not considered deep unless they learn complicated kernels.']}, {'end': 5934.826, 'start': 5363.702, 'title': 'Manifold hypothesis of natural images', 'summary': "Discusses the manifold hypothesis, emphasizing the low-dimensional surface in high-dimensional ambient space where natural images reside, illustrated by the example of images of a person's face and the challenge of finding independent explanatory factors of variation in image representation.", 'duration': 571.124, 'highlights': ["The set of things that look natural to us live in a low dimensional surface inside the high dimensional ambient space, with the example of images of a person's face suggesting a surface with less than 100 parameters determining the position of a point on that surface.", 'The ultimate goal of representation learning is to find independent explanatory factors of variation, but currently, there is no known method to achieve this.', 'Principal Component Analysis (PCA) will only find the dimension of a linear manifold, and is not suitable for identifying the complex non-linear surfaces occupied by natural images, as illustrated by examples of faces and head movements.']}], 'duration': 651.739, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/0bMe_vCZo30/pics/0bMe_vCZo305283087.jpg', 'highlights': ['Implementing carry look-ahead reduces addition steps, but increases circuit complexity.', 'The concept of depth in deep learning models emphasizes hierarchy and complex representations.', 'SVNs are not considered deep unless they learn complicated kernels.', 'The set of natural things lives in a low dimensional surface within a high dimensional space.', 'The ultimate goal of representation learning is to find independent explanatory factors of variation.', 'Principal Component Analysis (PCA) is not suitable for identifying complex non-linear surfaces occupied by natural images.']}], 'highlights': ['The lecture is run by two TAs, Alfredo Canziani and Mark Goldstein, and students will interact with them more frequently than with the lecturer.', 'The course slides will be posted on the website just before the lecture, ensuring availability for students before class.', 'The course includes practical sessions on Tuesdays covering practical questions, mathematics refreshers, basic concepts, PyTorch tutorials, and other software tools.', 'The future potential of self-supervised learning, particularly in natural language processing and computer vision, is highlighted as a dominant trend that is likely to become increasingly influential in the field of deep learning.', 'Norbert Wiener proposed the idea of cybernetics, 
involving systems with sensors and actuators that enable feedback loops and self-regulation, forming the basis for AI and neural networks.', 'McCulloch and Pitts introduced the concept of neurons as threshold units and building Boolean circuits, laying the groundwork for logical inference using neurons.', 'Resurgence in 1985 following the emergence of backpropagation, which allowed for training multilayer neural nets.', 'Adoption of deep learning convolutional nets around 2012-2013, marking a revolution in computer vision.', 'Stochastic gradient descent converges much faster, particularly with large training sets. Empirical evidence shows that stochastic gradient descent converges much faster, especially when dealing with very large training sets.', "The visual cortex organization and mammalian visual system's compression of signals to 1 million fibers from 100 million pixels, influenced by the work of Hubel and Wiesel, is discussed, addressing the challenges in processing image inputs in neural networks.", "Mobileye's adoption of convolutional net for vision systems resulted in significant performance improvements, leading to the first vision system for cars capable of keeping a car on a highway and braking for pedestrians or cyclists.", 'Architectures like RetinaNet, Feature Pyramid Network, and Mask R-CNN have led to incredible progress in computer vision, enabling the accurate detection of objects and generation of masks in real time.', 'The use of convolutional nets in medical imaging, specifically in analyzing MRI images and detecting malignant tumors in mammograms, showcasing the advantages of 3D convolutional nets over 2D slices.', 'Random projection is like a single layer neural net with non-linearities, but the weights are random.', 'The narrative explains the potential for reducing complexity from exponential to linear by allowing multiple layers in neural networks, drawing parallels to the design of computer circuits and program execution, emphasizing the potential for simpler computations with sequential steps.', 'The discussion highlights that while any function can be approximated with two-layer systems, the number of terms needed in the middle can be exponential in the size of the input, making it very expensive, thus advocating for the use of multiple layers to reduce complexity.', 'The ultimate goal of representation learning is to find independent explanatory factors of variation.']}