title
Ilya Sutskever: Deep Learning | Lex Fridman Podcast #94

description
Ilya Sutskever is the co-founder of OpenAI, one of the most cited computer scientists in history with over 165,000 citations, and, to me, one of the most brilliant and insightful minds ever in the field of deep learning. There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life than Ilya, on and off the mic.

Support this podcast by signing up with these sponsors:
- Cash App - use code "LexPodcast" and download:
  - Cash App (App Store): https://apple.co/2sPrUHe
  - Cash App (Google Play): https://bit.ly/2MlvP5w

EPISODE LINKS:
Ilya's Twitter: https://twitter.com/ilyasut
Ilya's Website: https://www.cs.toronto.edu/~ilya/

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
2:23 - AlexNet paper and the ImageNet moment
8:33 - Cost functions
13:39 - Recurrent neural networks
16:19 - Key ideas that led to success of deep learning
19:57 - What's harder to solve: language or vision?
29:35 - We're massively underestimating deep learning
36:04 - Deep double descent
41:20 - Backpropagation
42:42 - Can neural networks be made to reason?
50:35 - Long-term memory
56:37 - Language models
1:00:35 - GPT-2
1:07:14 - Active learning
1:08:52 - Staged release of AI systems
1:13:41 - How to build AGI?
1:25:00 - Question to AGI
1:32:07 - Meaning of life

CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

detail
{'title': 'Ilya Sutskever: Deep Learning | Lex Fridman Podcast #94', 'heatmap': [{'end': 5090.602, 'start': 5026.137, 'weight': 1}], 'summary': 'Ilya sutskever, co-founder of openai, discusses deep learning, cryptocurrency, and the potential of ai in various domains, covering key moments in neural network evolution, cost functions, unity across ml domains, language and visual understanding, agi development, ai model advancements, and transfer capabilities, advocating for ethical governance and aligning ai values to human values.', 'chapters': [{'end': 128.495, 'segs': [{'end': 49.352, 'src': 'embed', 'start': 0.109, 'weight': 0, 'content': [{'end': 9.454, 'text': 'The following is a conversation with Ilya Sutskever, co-founder and chief scientist of OpenAI, one of the most cited computer scientists in history,', 'start': 0.109, 'duration': 9.345}, {'end': 19.04, 'text': 'with over 165,000 citations, and, to me, one of the most brilliant and insightful minds ever in the field of deep learning.', 'start': 9.454, 'duration': 9.586}, {'end': 25.261, 'text': 'There are very few people in this world who I would rather talk to and brainstorm with about deep learning,', 'start': 20.06, 'duration': 5.201}, {'end': 30.062, 'text': 'intelligence and life in general than Ilya on and off the mic.', 'start': 25.261, 'duration': 4.801}, {'end': 32.622, 'text': 'This was an honor and a pleasure.', 'start': 30.702, 'duration': 1.92}, {'end': 36.803, 'text': 'This conversation was recorded before the outbreak of the pandemic.', 'start': 33.763, 'duration': 3.04}, {'end': 42.604, 'text': "For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way.", 'start': 37.243, 'duration': 5.361}, {'end': 43.865, 'text': 'Stay strong.', 'start': 43.264, 'duration': 0.601}, {'end': 44.985, 'text': "We're in this together.", 'start': 44.205, 'duration': 0.78}, {'end': 46.165, 'text': "We'll beat this thing.", 'start': 45.005, 'duration': 1.16}, {'end': 49.352, 'text': 'This is the Artificial Intelligence Podcast.', 'start': 47.23, 'duration': 2.122}], 'summary': 'Ilya Sutskever, co-founder of openai, with over 165,000 citations, discussed deep learning and shared insights in an ai podcast.', 'duration': 49.243, 'max_score': 0.109, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A109.jpg'}, {'end': 121.032, 'src': 'embed', 'start': 93.11, 'weight': 1, 'content': [{'end': 96.09, 'text': 'I recommend Ascent of Money as a great book on this history.', 'start': 93.11, 'duration': 2.98}, {'end': 98.491, 'text': 'Both the book and audio book are great.', 'start': 96.83, 'duration': 1.661}, {'end': 103.281, 'text': 'Debits and credits on ledgers started around 30,000 years ago.', 'start': 99.639, 'duration': 3.642}, {'end': 111.566, 'text': 'The US dollar created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, released just over 10 years ago.', 'start': 103.301, 'duration': 8.265}, {'end': 119.351, 'text': "So, given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to, and just might,", 'start': 112.086, 'duration': 7.265}, {'end': 121.032, 'text': 'redefine the nature of money.', 'start': 119.351, 'duration': 1.681}], 'summary': 'Debits and credits on ledgers date back 30,000 years. us dollar created over 200 years ago, bitcoin just over 10 years ago. 
cryptocurrency is in early stages, aiming to redefine money.', 'duration': 27.922, 'max_score': 93.11, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A93110.jpg'}], 'start': 0.109, 'title': 'Conversation with ilya sutskever', 'summary': 'Features a conversation with ilya sutskever, co-founder and chief scientist of openai, with over 165,000 citations, discussing deep learning and the potential of cryptocurrency, emphasizing the early days of development and its potential to redefine the nature of money.', 'chapters': [{'end': 128.495, 'start': 0.109, 'title': 'Conversation with ilya sutskever', 'summary': 'Features a conversation with ilya sutskever, co-founder and chief scientist of openai, with over 165,000 citations, discussing deep learning and the potential of cryptocurrency, with an emphasis on the early days of development and its potential to redefine the nature of money.', 'duration': 128.386, 'highlights': ['Ilya Sutskever is the co-founder and chief scientist of OpenAI, with over 165,000 citations, and is a prominent computer scientist in the field of deep learning.', 'Cryptocurrency, including Bitcoin, is discussed in the context of the history of money, highlighting its early days of development and potential to redefine the nature of money.', 'The conversation was recorded before the outbreak of the pandemic, and a message of support is conveyed for those affected by the crisis.']}], 'duration': 128.386, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A109.jpg', 'highlights': ['Ilya Sutskever is the co-founder and chief scientist of OpenAI, with over 165,000 citations, and is a prominent computer scientist in the field of deep learning.', 'Cryptocurrency, including Bitcoin, is discussed in the context of the history of money, highlighting its early days of development and potential to redefine the nature of money.', 'The conversation was recorded before the outbreak of the pandemic, and a message of support is conveyed for those affected by the crisis.']}, {'end': 773.688, 'segs': [{'end': 228.955, 'src': 'embed', 'start': 204.242, 'weight': 0, 'content': [{'end': 213.691, 'text': 'The first moment in which I realized that deep neural networks are powerful was when James Martens invented the Hessian Free Optimizer in 2010.', 'start': 204.242, 'duration': 9.449}, {'end': 220.248, 'text': 'and he trained a 10-layer neural network end-to-end without pre-training from scratch.', 'start': 213.691, 'duration': 6.557}, {'end': 223.393, 'text': 'And when that happened, I thought, this is it.', 'start': 221.693, 'duration': 1.7}, {'end': 228.955, 'text': 'Because if you can train a big neural network, a big neural network can represent very complicated function.', 'start': 223.974, 'duration': 4.981}], 'summary': 'In 2010, james martens invented hessian free optimizer and trained a 10-layer neural network from scratch, revealing the power of deep neural networks.', 'duration': 24.713, 'max_score': 204.242, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A204242.jpg'}, {'end': 370.198, 'src': 'embed', 'start': 346.609, 'weight': 1, 'content': [{'end': 354.436, 'text': 'So in your intuition about neural networks, does the human brain come into play as an intuition builder? 
Definitely.', 'start': 346.609, 'duration': 7.827}, {'end': 360.335, 'text': 'I mean you gotta be precise with these analogies between artificial neural networks and the brain.', 'start': 354.994, 'duration': 5.341}, {'end': 370.198, 'text': 'but there is no question that the brain is a huge source of intuition and inspiration for deep learning researchers since all the way from Rosenblatt in the sixties.', 'start': 360.335, 'duration': 9.863}], 'summary': 'The human brain is a significant source of intuition and inspiration for deep learning researchers, dating back to the sixties.', 'duration': 23.589, 'max_score': 346.609, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A346609.jpg'}, {'end': 495.34, 'src': 'embed', 'start': 470.203, 'weight': 2, 'content': [{'end': 475.129, 'text': 'Looking at the advantages versus disadvantages is a good way to figure out what is the important difference.', 'start': 470.203, 'duration': 4.926}, {'end': 479.635, 'text': 'So the brain uses spikes, which may or may not be important.', 'start': 475.69, 'duration': 3.945}, {'end': 481.297, 'text': "Yeah, it's a really interesting question.", 'start': 480.116, 'duration': 1.181}, {'end': 488.006, 'text': "Do you think it's important or not? That's one big architectural difference between artificial neural networks and..", 'start': 481.417, 'duration': 6.589}, {'end': 491.658, 'text': "It's hard to tell, but my prior is not very high.", 'start': 488.336, 'duration': 3.322}, {'end': 492.919, 'text': 'And I can say why.', 'start': 491.698, 'duration': 1.221}, {'end': 495.34, 'text': 'You know, there are people who are interested in spiking neural networks.', 'start': 493.279, 'duration': 2.061}], 'summary': 'Analyzing pros and cons of brain spikes in neural networks, with uncertain importance.', 'duration': 25.137, 'max_score': 470.203, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A470203.jpg'}], 'start': 128.495, 'title': 'Deep learning evolution and cost functions', 'summary': 'Covers the evolution of deep learning, from alexnet to current neural network success, emphasizing key moments and intuition. 
it also explores the significance of cost functions in deep learning, questioning their universality and advocating for their effectiveness, while considering alternatives like gans and spiking-based learning rules.', 'chapters': [{'end': 509.268, 'start': 128.495, 'title': 'Deep learning revolution: the evolution and intuition of neural networks', 'summary': 'Discusses the evolution of deep learning, from the pivotal alexnet paper to the current success of artificial neural networks, highlighting key moments and the role of intuition, and explores the differences between artificial neural networks and the human brain.', 'duration': 380.773, 'highlights': ['The realization that large and deep neural networks could be trained end-to-end with backpropagation marked a pivotal moment in the deep learning revolution, with the successful training of a 10-layer neural network without pre-training by James Martens in 2010 being a key indicator of the power of deep neural networks.', 'The intuition behind deep neural networks was inspired by the human brain, with the concept of artificial neurons being directly inspired by the brain, and the analogy between artificial neural networks and the brain serving as a source of intuition and inspiration for deep learning researchers.', 'The chapter explores the differences between artificial neural networks and the human brain, highlighting the advantages and disadvantages of each, and the debate over the importance of spikes in the brain as a significant architectural difference between artificial neural networks and the brain.']}, {'end': 773.688, 'start': 509.609, 'title': 'Cost functions in deep learning', 'summary': 'Discusses the significance of cost functions in deep learning, questioning their universality and exploring alternatives like gans and spiking-based learning rules, while expressing a strong belief in the effectiveness of cost functions.', 'duration': 264.079, 'highlights': ['The significance of the cost function in measuring the performance of neural networks is explored, raising questions about its universality and the potential emergence of alternatives like GANs and spiking-based learning rules.', 'The concept of cost functions in deep learning is discussed, with a strong belief expressed in their effectiveness and potential for new ways of looking at things that may involve cost functions in a less central way.', 'The discussion touches on the potential limitations of cost functions in deep learning and explores alternatives such as GANs, self-play, and spike time independent plasticity as potentially useful for designing artificial neural networks.']}], 'duration': 645.193, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A128495.jpg', 'highlights': ['The realization of training large and deep neural networks with backpropagation marked a pivotal moment in the deep learning revolution, exemplified by the successful training of a 10-layer neural network without pre-training by James Martens in 2010.', 'The intuition behind deep neural networks was inspired by the human brain, with artificial neurons directly inspired by the brain, serving as a source of intuition and inspiration for deep learning researchers.', 'The chapter explores the differences between artificial neural networks and the human brain, highlighting the advantages and disadvantages of each, and the debate over the importance of spikes in the brain as a significant architectural difference.']}, {'end': 1468.64, 
'segs': [{'end': 807.262, 'src': 'embed', 'start': 773.708, 'weight': 2, 'content': [{'end': 778.391, 'text': "So if I said something wrong here, Don't get too angry.", 'start': 773.708, 'duration': 4.683}, {'end': 780.993, 'text': 'But you sounded brilliant while saying it.', 'start': 779.432, 'duration': 1.561}, {'end': 783.555, 'text': "But the timing, that's one thing that's missing.", 'start': 781.053, 'duration': 2.502}, {'end': 787.118, 'text': 'The temporal dynamics is not captured.', 'start': 784.256, 'duration': 2.862}, {'end': 793.044, 'text': "I think that's like a fundamental property of the brain is the timing of the signals.", 'start': 787.459, 'duration': 5.585}, {'end': 794.445, 'text': 'Well, we have recurrent neural networks.', 'start': 793.304, 'duration': 1.141}, {'end': 807.262, 'text': "But you think of that as this, I mean, that's a very crude, simplified, what's that called? There's a clock, I guess, to recurrent neural networks.", 'start': 795.486, 'duration': 11.776}], 'summary': "Brain's fundamental property is timing of signals, temporal dynamics not captured in current neural networks.", 'duration': 33.554, 'max_score': 773.708, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A773708.jpg'}, {'end': 875.964, 'src': 'embed', 'start': 847.732, 'weight': 0, 'content': [{'end': 852.637, 'text': "Right now, recurrent neural networks have been superseded by transformers, but maybe one day they'll make a comeback.", 'start': 847.732, 'duration': 4.905}, {'end': 853.418, 'text': "Maybe they'll be back.", 'start': 852.718, 'duration': 0.7}, {'end': 854.019, 'text': "We'll see.", 'start': 853.719, 'duration': 0.3}, {'end': 858.701, 'text': "Let me in a small tangent say do you think they'll be back??", 'start': 855.62, 'duration': 3.081}, {'end': 869.623, 'text': "So so much of the breakthroughs recently that we'll talk about on natural language processing and language modeling has been with transformers that don't emphasize recurrence.", 'start': 859.081, 'duration': 10.542}, {'end': 875.964, 'text': 'Do you think recurrence will make a comeback? 
Well, some kind of recurrence, I think very likely.', 'start': 870.943, 'duration': 5.021}], 'summary': 'Recurrence in neural networks may make a comeback in the future, despite being superseded by transformers.', 'duration': 28.232, 'max_score': 847.732, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A847732.jpg'}, {'end': 1084.67, 'src': 'embed', 'start': 1058.584, 'weight': 1, 'content': [{'end': 1063.227, 'text': 'So in terms of deep learning to answer the question directly, the ideas were all there.', 'start': 1058.584, 'duration': 4.643}, {'end': 1067.59, 'text': 'The thing that was missing was a lot of supervised data and a lot of compute.', 'start': 1063.527, 'duration': 4.063}, {'end': 1075.706, 'text': 'Once you have a lot of supervised data and a lot of compute, then there is a third thing which is needed as well, and that is conviction.', 'start': 1069.804, 'duration': 5.902}, {'end': 1083.55, 'text': 'Conviction that if you take the right stuff, which already exists, and apply and mix with a lot of data and a lot of compute,', 'start': 1076.387, 'duration': 7.163}, {'end': 1084.67, 'text': 'that it will in fact work.', 'start': 1083.55, 'duration': 1.12}], 'summary': 'Deep learning requires supervised data, compute power, and conviction for success.', 'duration': 26.086, 'max_score': 1058.584, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A1058584.jpg'}, {'end': 1301.817, 'src': 'embed', 'start': 1261.409, 'weight': 3, 'content': [{'end': 1267.235, 'text': "And that's why today, when someone writes a paper on improving optimization of deep learning and vision,", 'start': 1261.409, 'duration': 5.826}, {'end': 1271.319, 'text': 'it improves the different NLP applications and it improves the different reinforcement learning applications.', 'start': 1267.235, 'duration': 4.084}, {'end': 1273.249, 'text': 'Reinforcement learning.', 'start': 1272.369, 'duration': 0.88}, {'end': 1277.951, 'text': 'So I would say that computer vision and NLP are very similar to each other.', 'start': 1273.289, 'duration': 4.662}, {'end': 1282.132, 'text': 'Today they differ in that they have slightly different architectures.', 'start': 1278.651, 'duration': 3.481}, {'end': 1285.913, 'text': 'We use transformers in NLP and we use convolutional neural networks in vision.', 'start': 1282.152, 'duration': 3.761}, {'end': 1291.454, 'text': "But it's also possible that one day this will change and everything will be unified with a single architecture.", 'start': 1286.533, 'duration': 4.921}, {'end': 1299.356, 'text': 'Because, if you go back a few years ago, in natural language processing there were a huge number of architectures,', 'start': 1291.854, 'duration': 7.502}, {'end': 1301.817, 'text': 'for every different tiny problem had its own architecture.', 'start': 1299.356, 'duration': 2.461}], 'summary': 'Improving deep learning optimization benefits nlp and reinforcement learning applications, with potential for unified architecture.', 'duration': 40.408, 'max_score': 1261.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A1261409.jpg'}, {'end': 1374.803, 'src': 'embed', 'start': 1340.811, 'weight': 5, 'content': [{'end': 1343.792, 'text': 'You really do need to do something about exploration.', 'start': 1340.811, 'duration': 2.981}, {'end': 1345.073, 'text': 'Your variance is much higher.', 'start': 1343.852, 'duration': 1.221}, 
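The recipe this stretch of the conversation keeps returning to (pick a cost function, then drive it down with backpropagation and gradient descent over a lot of supervised data and compute) is compact enough to sketch. A minimal PyTorch illustration; the model shape, learning rate, and toy batch are hypothetical, not anything from the episode:

```python
# Minimal sketch of the core deep learning recipe discussed above:
# a cost function, backpropagation, and a gradient-descent update.
# All sizes and hyperparameters here are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(                 # a small feed-forward classifier
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
cost_fn = nn.CrossEntropyLoss()        # the "cost function" being discussed
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(x, y):
    opt.zero_grad()
    loss = cost_fn(model(x), y)        # measure performance on this batch
    loss.backward()                    # backpropagation computes the gradients
    opt.step()                         # gradient descent updates the weights
    return loss.item()

# a toy batch standing in for "a lot of supervised data"
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))
print(train_step(x, y))
```

The unity across domains described here shows up in exactly these lines: the same loop, unchanged, trains a convolutional network for vision or a transformer for language; only the `model` and the data differ.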
{'end': 1347.514, 'text': 'But I think there is a lot of unity even there.', 'start': 1346.053, 'duration': 1.461}, {'end': 1355.202, 'text': 'And I would expect, for example, that at some point there will be some broader unification between RL and supervised learning,', 'start': 1348.234, 'duration': 6.968}, {'end': 1358.504, 'text': 'where somehow the RL will be making decisions to make the supervised learning go better.', 'start': 1355.202, 'duration': 3.302}, {'end': 1367.189, 'text': 'And it will be, I imagine, one big black box and you just shovel things into it and it just figures out what to do with whatever you shovel in it.', 'start': 1358.524, 'duration': 8.665}, {'end': 1374.803, 'text': 'I mean, reinforcement learning has some aspects of language and vision combined, almost.', 'start': 1368.178, 'duration': 6.625}], 'summary': 'Higher variance in exploration, potential unification of rl and supervised learning for better decision-making.', 'duration': 33.992, 'max_score': 1340.811, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A1340811.jpg'}], 'start': 773.708, 'title': 'Deep learning success and unity', 'summary': 'Delves into the history of deep learning success, emphasizing the transition from recurrent neural networks to transformers, and discusses the unity across machine learning domains, highlighting the overlap of ideas and principles, serving as a pivotal moment in the success of deep learning, and the potential for unification with a single architecture.', 'chapters': [{'end': 1193.234, 'start': 773.708, 'title': 'History of deep learning success', 'summary': 'Discusses the fundamental properties of the brain, the transition from recurrent neural networks to transformers, and the key ideas that led to the success of deep learning, emphasizing the importance of supervised data, compute, and conviction, with imagenet serving as a pivotal moment.', 'duration': 419.526, 'highlights': ['The transition from recurrent neural networks to transformers Recurrent neural networks have been superseded by transformers, but there is a possibility of a comeback.', 'The fundamental property of the brain: the timing of signals The timing of signals is a fundamental property of the brain, crucial in the firing of neurons.', 'The key ideas that led to the success of deep learning The success of deep learning was facilitated by the presence of supervised data, compute, and the conviction to combine them, with ImageNet serving as a pivotal moment.']}, {'end': 1468.64, 'start': 1193.635, 'title': 'Unity in machine learning domains', 'summary': 'Discusses the unity in machine learning, highlighting the overlap of ideas and principles across different domains like computer vision, natural language processing, and reinforcement learning, and the potential for unification with a single architecture.', 'duration': 275.005, 'highlights': ['The overlap of ideas and principles across different machine learning domains such as computer vision, natural language processing, and reinforcement learning, and the potential for unification with a single architecture. The speaker emphasizes the overlap of ideas and principles across different machine learning domains, such as computer vision, natural language processing, and reinforcement learning. 
They discuss the potential for unification with a single architecture, citing examples of current overlaps and the possibility of future convergence.', 'The unification and simplification of architectures in natural language processing, from a previous multitude of specialized architectures to a single transformer for different tasks. The speaker mentions the unification and simplification of architectures in natural language processing, from a previous multitude of specialized architectures to a single transformer for different tasks. They highlight the evolution towards unified architectures and the potential for similar unification in computer vision.', 'The potential for broader unification between reinforcement learning and supervised learning, with reinforcement learning influencing decision-making to improve supervised learning. The speaker discusses the potential for broader unification between reinforcement learning and supervised learning, envisioning reinforcement learning influencing decision-making to improve supervised learning. They speculate on the possibility of a single integrated system for both types of learning.']}], 'duration': 694.932, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A773708.jpg', 'highlights': ['The transition from recurrent neural networks to transformers has superseded the former, but there is a possibility of a comeback.', 'The success of deep learning was facilitated by the presence of supervised data, compute, and the conviction to combine them, with ImageNet serving as a pivotal moment.', 'The timing of signals is a fundamental property of the brain, crucial in the firing of neurons.', 'The overlap of ideas and principles across different machine learning domains, such as computer vision, natural language processing, and reinforcement learning, and the potential for unification with a single architecture is emphasized.', 'The unification and simplification of architectures in natural language processing, from a previous multitude of specialized architectures to a single transformer for different tasks, is highlighted.', 'The potential for broader unification between reinforcement learning and supervised learning, envisioning reinforcement learning influencing decision-making to improve supervised learning, is discussed.']}, {'end': 2438.598, 'segs': [{'end': 1623.306, 'src': 'embed', 'start': 1598.23, 'weight': 4, 'content': [{'end': 1611.018, 'text': "So vision is just a little example of the kind of structure and fundamental hierarchy of ideas that's already represented in our brain somehow that's represented through language.", 'start': 1598.23, 'duration': 12.788}, {'end': 1623.306, 'text': "But where does vision stop and language begin? 
That's a really interesting question.", 'start': 1611.418, 'duration': 11.888}], 'summary': "Vision and language are interconnected in the brain's fundamental hierarchy of ideas.", 'duration': 25.076, 'max_score': 1598.23, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A1598230.jpg'}, {'end': 1801.978, 'src': 'embed', 'start': 1773.036, 'weight': 1, 'content': [{'end': 1774.657, 'text': "I'm sure we'll get that as well.", 'start': 1773.036, 'duration': 1.621}, {'end': 1785.333, 'text': 'So forgive the romanticized question, but, looking back to you, what is the most beautiful or surprising idea in deep learning or AI in general,', 'start': 1775.444, 'duration': 9.889}, {'end': 1786.253, 'text': "you've come across?", 'start': 1785.333, 'duration': 0.92}, {'end': 1790.517, 'text': 'So I think the most beautiful thing about deep learning is that it actually works.', 'start': 1786.834, 'duration': 3.683}, {'end': 1794.66, 'text': 'And I mean it because you got these ideas, you got the little neural network,', 'start': 1791.718, 'duration': 2.942}, {'end': 1800.097, 'text': "you got the back propagation algorithm And then you've got some theories.", 'start': 1794.66, 'duration': 5.437}, {'end': 1801.978, 'text': 'as to you know, this is kind of like the brain.', 'start': 1800.097, 'duration': 1.881}], 'summary': 'The most beautiful thing about deep learning is that it works, utilizing neural networks and back propagation algorithms.', 'duration': 28.942, 'max_score': 1773.036, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A1773036.jpg'}, {'end': 1995.128, 'src': 'embed', 'start': 1965.024, 'weight': 0, 'content': [{'end': 1970.809, 'text': 'So do you think there are still beautiful and mysterious properties in neural networks that are yet to be discovered? Definitely.', 'start': 1965.024, 'duration': 5.785}, {'end': 1973.972, 'text': 'I think that we are still massively underestimating deep learning.', 'start': 1971.429, 'duration': 2.543}, {'end': 1979.118, 'text': 'What do you think it will look like? Like what? 
If I knew I would have done it.', 'start': 1975.475, 'duration': 3.643}, {'end': 1987.103, 'text': 'So, but if you look at all the progress from the past 10 years, I would say most of it.', 'start': 1981.219, 'duration': 5.884}, {'end': 1995.128, 'text': "I would say there've been a few cases where some were things that felt like really new ideas showed up, but by and large it was every year.", 'start': 1987.103, 'duration': 8.025}], 'summary': 'Underestimating deep learning, with potential for new discoveries.', 'duration': 30.104, 'max_score': 1965.024, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A1965024.jpg'}, {'end': 2216.46, 'src': 'embed', 'start': 2179.681, 'weight': 2, 'content': [{'end': 2183.253, 'text': 'Can you describe the main idea and Yeah, definitely.', 'start': 2179.681, 'duration': 3.572}, {'end': 2191.577, 'text': 'So what happened is that over the years some small number of researchers noticed that it is kind of weird that when you make the neural network larger,', 'start': 2183.593, 'duration': 7.984}, {'end': 2194.258, 'text': 'it works better, and it seems to go in contradiction with statistical ideas.', 'start': 2191.577, 'duration': 2.681}, {'end': 2198.541, 'text': 'And then some people made an analysis showing that actually you got this double descent bump.', 'start': 2194.839, 'duration': 3.702}, {'end': 2205.565, 'text': "And what we've done was to show that double descent occurs for pretty much all practical deep learning systems.", 'start': 2198.941, 'duration': 6.624}, {'end': 2216.46, 'text': "And that it'll be also, so can you step back? What's the X axis and the Y axis of a double descent plot? Okay, great.", 'start': 2206.471, 'duration': 9.989}], 'summary': 'Neural networks perform better when larger; double descent occurs for practical deep learning systems.', 'duration': 36.779, 'max_score': 2179.681, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A2179681.jpg'}, {'end': 2428.933, 'src': 'embed', 'start': 2400.865, 'weight': 3, 'content': [{'end': 2405.929, 'text': 'If you introduce early stop in your regularization, you can make the double descent bump almost completely disappear.', 'start': 2400.865, 'duration': 5.064}, {'end': 2412.434, 'text': 'What is early stop? 
Early stopping is when you train your model and you monitor your validation performance.', 'start': 2406.129, 'duration': 6.305}, {'end': 2416.905, 'text': "And then if at some point validation performance starts to get worse, you say, okay, let's stop training.", 'start': 2413.561, 'duration': 3.344}, {'end': 2418.366, 'text': "If you're good, you're good.", 'start': 2417.605, 'duration': 0.761}, {'end': 2419.247, 'text': "You're good enough.", 'start': 2418.787, 'duration': 0.46}, {'end': 2423.111, 'text': 'So the magic happens after that moment.', 'start': 2420.048, 'duration': 3.063}, {'end': 2424.573, 'text': "So you don't want to do the early stopping.", 'start': 2423.131, 'duration': 1.442}, {'end': 2428.933, 'text': "Well, if you don't do the early stopping, you get a very pronounced double descent.", 'start': 2425.051, 'duration': 3.882}], 'summary': 'Introducing early stop in regularization reduces double descent bump.', 'duration': 28.068, 'max_score': 2400.865, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A2400865.jpg'}], 'start': 1468.901, 'title': 'Language, visual understanding, and deep learning', 'summary': 'Delves into the challenges of language and visual scene understanding, the remarkable capabilities of deep learning, the potential for further breakthroughs, and the counterintuitive nature of the double descent phenomenon in neural networks.', 'chapters': [{'end': 1679.013, 'start': 1468.901, 'title': 'Language vs. visual understanding', 'summary': "Discusses the difficulty of language understanding and visual scene understanding, questioning the definition of 'hard' problems and the potential overlap between the two domains.", 'duration': 210.112, 'highlights': ["Language understanding may be harder than visual perception due to the challenge of achieving 'absolute top notch, 100% language understanding.' The speaker suggests that achieving complete language understanding may be harder than visual perception.", 'The difficulty of a problem depends on the current capabilities of our tools and whether the problem has been solved. The speaker emphasizes that the difficulty of a problem is relative to the tools available and the extent to which it has been solved.', 'The chapter raises the question of where vision ends and language begins, suggesting an overlap in the systems required for deep understanding in both domains. 
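The early-stopping rule defined just above (monitor validation performance, stop once it starts getting worse) is usually implemented with a patience counter. A minimal sketch, where `train_one_epoch` and `validate` are hypothetical stand-ins for your own training and evaluation routines:

```python
# Sketch of early stopping as described above: watch validation loss and
# stop once it stops improving. `train_one_epoch` and `validate` are
# hypothetical helpers, assumed to train one epoch / return validation loss.
def fit_with_early_stopping(model, max_epochs=100, patience=5):
    best_val = float("inf")
    best_state = None
    bad_epochs = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = validate(model)
        if val_loss < best_val:
            best_val = val_loss                      # still improving
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                                # got worse; stop training
    model.load_state_dict(best_state)                # keep the best checkpoint
    return best_val
```

As the exchange above notes, this is precisely the knob that hides the phenomenon: skip the early stop and train to convergence, and the double descent bump becomes pronounced.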
The discussion explores the possibility that achieving deep understanding in images or language may require the use of a similar system, leading to potential overlap between the two domains.']}, {'end': 2117.341, 'start': 1679.454, 'title': 'The beauty of deep learning', 'summary': 'Discusses the beauty and surprises of deep learning, highlighting the continuous surprises from humans, the impressive performance of neural networks, and the potential for further breakthroughs in the field.', 'duration': 437.887, 'highlights': ['The impressive performance of neural networks, with the ability to continually improve and surpass expectations, showcasing the beauty and surprises of deep learning.', 'The potential for further breakthroughs in deep learning, as the field continues to surprise and exceed previous limitations, indicating ongoing underestimation and mysterious properties in neural networks.', 'The continuous surprises and inspiration from humans, emphasizing the subjective test of continuous pleasure, wit, and new ideas, implying a unique and irreplaceable quality compared to systems.', 'The challenges and complexities in deep learning, with the deepening stack of ideas, systems, data sets, distributed programming, and GPU programming, making it difficult for a single person to excel in every layer of the stack.', 'The potential for robust progress in the field of deep learning, despite the challenges faced by individual researchers and the necessity of large compute resources for significant breakthroughs.']}, {'end': 2438.598, 'start': 2117.93, 'title': 'Deep double descent: bigger models and more data', 'summary': 'Discusses the potential breakthroughs in efficient learning, the concept of double descent in deep learning, and the impact of early stopping in regularization, emphasizing that neural networks can achieve better performance with larger sizes and highlighting the counterintuitive nature of the double descent phenomenon.', 'duration': 320.668, 'highlights': ['The concept of double descent in deep learning is discussed, revealing that increasing the size of a neural network can lead to a rapid increase in performance, followed by a decrease, and then an improvement, highlighting the counterintuitive nature of deep learning phenomena. Double descent occurs for practical deep learning systems, where increasing the size of the neural network initially leads to a rapid performance improvement, followed by a decline, and then a subsequent improvement, contrary to the expected monotonic behavior.', 'The impact of early stopping in regularization is explained, indicating that introducing early stop can minimize the double descent bump, providing insight into the role of early stopping in achieving better model performance. Introducing early stop in regularization can largely eliminate the double descent bump, showcasing the influence of early stopping in enhancing model performance.', 'The potential breakthroughs in efficient learning are mentioned, suggesting that there will be numerous breakthroughs that do not require a huge amount of compute, emphasizing the possibility of important work being done by small groups and individuals. 
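For the double descent plot described here, the x-axis is model size and the y-axis is test error after training each model to (near) zero training error. Below is a toy sweep that traces such a curve under stated assumptions: a small MLP, some label noise to sharpen the bump, and long training with no early stopping. The exact shape will vary with the setup:

```python
# Toy model-size sweep in the spirit of the deep double descent experiments:
# train each width well past fitting the training set, then measure test
# error. Label noise and the absence of early stopping make the bump near
# the interpolation threshold easier to see. Illustrative only.
import torch
import torch.nn as nn

w_true = torch.randn(20)

def make_data(n, noise=0.15):
    x = torch.randn(n, 20)
    y = (x @ w_true > 0).long()
    flip = torch.rand(n) < noise               # corrupt a fraction of labels
    y[flip] = 1 - y[flip]
    return x, y

x_tr, y_tr = make_data(200)
x_te, y_te = make_data(2000, noise=0.0)        # clean labels for evaluation

def test_error(width, steps=3000):
    net = nn.Sequential(nn.Linear(20, width), nn.ReLU(), nn.Linear(width, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):                     # train long, no early stopping
        opt.zero_grad()
        loss_fn(net(x_tr), y_tr).backward()
        opt.step()
    with torch.no_grad():
        return (net(x_te).argmax(dim=1) != y_te).float().mean().item()

for width in [2, 8, 32, 128, 512]:             # sweep across the threshold
    print(width, test_error(width))
```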
Anticipates a large number of breakthroughs in efficient learning that will not necessitate a massive amount of compute, highlighting the potential for significant contributions from small groups and individuals in the field.']}], 'duration': 969.697, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A1468901.jpg', 'highlights': ['The potential for further breakthroughs in deep learning, as the field continues to surprise and exceed previous limitations, indicating ongoing underestimation and mysterious properties in neural networks.', 'The impressive performance of neural networks, with the ability to continually improve and surpass expectations, showcasing the beauty and surprises of deep learning.', 'The concept of double descent in deep learning is discussed, revealing that increasing the size of a neural network can lead to a rapid increase in performance, followed by a decrease, and then an improvement, highlighting the counterintuitive nature of deep learning phenomena.', 'The impact of early stopping in regularization is explained, indicating that introducing early stop can minimize the double descent bump, providing insight into the role of early stopping in achieving better model performance.', 'The chapter raises the question of where vision ends and language begins, suggesting an overlap in the systems required for deep understanding in both domains.']}, {'end': 3592.455, 'segs': [{'end': 2466.685, 'src': 'embed', 'start': 2438.858, 'weight': 1, 'content': [{'end': 2447.723, 'text': 'The intuition is basically is this that when the data set has as many degrees of freedom as the model,', 'start': 2438.858, 'duration': 8.865}, {'end': 2449.544, 'text': 'then there is a one-to-one correspondence between them.', 'start': 2447.723, 'duration': 1.821}, {'end': 2454.487, 'text': 'And so small changes to the data set lead to noticeable changes in the model.', 'start': 2449.824, 'duration': 4.663}, {'end': 2457.251, 'text': 'So your model is very sensitive to all the randomness.', 'start': 2455.107, 'duration': 2.144}, {'end': 2458.753, 'text': 'It is unable to discard it.', 'start': 2457.331, 'duration': 1.422}, {'end': 2466.685, 'text': 'Whereas it turns out that when you have a lot more data than parameters or a lot more parameters than data,', 'start': 2459.594, 'duration': 7.091}], 'summary': "When data set has as many degrees of freedom as the model, small changes lead to noticeable changes in the model's sensitivity.", 'duration': 27.827, 'max_score': 2438.858, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A2438858.jpg'}, {'end': 2601.908, 'src': 'embed', 'start': 2573.387, 'weight': 0, 'content': [{'end': 2583.036, 'text': 'Well, if you look, for example, at AlphaGo or AlphaZero, The neural network of AlphaZero plays Go, which, we all agree,', 'start': 2573.387, 'duration': 9.649}, {'end': 2588.202, 'text': 'is a game that requires reasoning better than 99.9% of all humans.', 'start': 2583.036, 'duration': 5.166}, {'end': 2591.385, 'text': 'Just the neural network, without the search, just the neural network itself.', 'start': 2588.582, 'duration': 2.803}, {'end': 2601.908, 'text': "Doesn't that give us an existence proof that neural networks can reason? 
To push back and disagree a little bit, we all agree that go is reasoning.", 'start': 2592.347, 'duration': 9.561}], 'summary': "Alphazero's neural network plays go better than 99.9% of humans, showcasing the ability of neural networks to reason.", 'duration': 28.521, 'max_score': 2573.387, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A2573387.jpg'}, {'end': 2708.395, 'src': 'embed', 'start': 2684.117, 'weight': 4, 'content': [{'end': 2693.222, 'text': "I think it's definitely possible that the neural networks that will produce the reasoning breakthroughs of the future will be very similar to the architectures that exist today.", 'start': 2684.117, 'duration': 9.105}, {'end': 2702.571, 'text': 'Maybe a little bit more recurrent, maybe a little bit deeper, but Like these neural nets are so insanely powerful.', 'start': 2693.703, 'duration': 8.868}, {'end': 2704.652, 'text': "Why wouldn't they be able to learn to reason?", 'start': 2702.971, 'duration': 1.681}, {'end': 2708.395, 'text': "Humans can reason, so why can't neural networks?", 'start': 2705.613, 'duration': 2.782}], 'summary': 'Neural networks have potential for reasoning breakthroughs, possibly deeper and more recurrent.', 'duration': 24.278, 'max_score': 2684.117, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A2684117.jpg'}, {'end': 2903.074, 'src': 'embed', 'start': 2874.807, 'weight': 2, 'content': [{'end': 2883.612, 'text': 'But do you see it important to be able to try to learn something like programs? I mean, if we can, definitely.', 'start': 2874.807, 'duration': 8.805}, {'end': 2888.897, 'text': "I think it's kind of, the answer is kind of yes, if we can do it.", 'start': 2884.895, 'duration': 4.002}, {'end': 2890.418, 'text': 'We should do things that we can do it.', 'start': 2889.117, 'duration': 1.301}, {'end': 2899.943, 'text': "It's the reason we are pushing on deep learning, the fundamental reason, the root cause is that we are able to train them.", 'start': 2891.238, 'duration': 8.705}, {'end': 2903.074, 'text': 'So in other words, training comes first.', 'start': 2901.573, 'duration': 1.501}], 'summary': 'Importance of learning programs, especially deep learning for training models.', 'duration': 28.267, 'max_score': 2874.807, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A2874807.jpg'}, {'end': 3585.257, 'src': 'embed', 'start': 3554.659, 'weight': 3, 'content': [{'end': 3558.142, 'text': 'they exhibit signs of understanding the semantics, whereas the smaller language models do not.', 'start': 3554.659, 'duration': 3.483}, {'end': 3561.925, 'text': "We've seen that a few years ago when we did work on the sentiment neuron.", 'start': 3558.522, 'duration': 3.403}, {'end': 3568.21, 'text': 'We trained a small, you know, small LSTM to predict the next character in Amazon reviews.', 'start': 3561.965, 'duration': 6.245}, {'end': 3575.393, 'text': 'And we noticed that when you increase the size of the LSTM from 500 LSTM cells to 4000 LSTM cells,', 'start': 3568.81, 'duration': 6.583}, {'end': 3580.735, 'text': 'then one of the neurons starts to represent the sentiment of the article of sorry of their view.', 'start': 3575.393, 'duration': 5.342}, {'end': 3585.257, 'text': 'Now, why is that? 
Sentiment is a pretty semantic attribute.', 'start': 3582.096, 'duration': 3.161}], 'summary': 'Increasing lstm size led to sentiment representation in neuron.', 'duration': 30.598, 'max_score': 3554.659, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A3554659.jpg'}], 'start': 2438.858, 'title': 'Neural networks and training', 'summary': 'Delves into the sensitivity of neural network models to data set size and parameters, explores reasoning abilities, and emphasizes the importance of training in deep learning, while also discussing interpretability and semantic understanding in large language models.', 'chapters': [{'end': 2550.275, 'start': 2438.858, 'title': 'Neural networks and backpropagation', 'summary': "Discusses the sensitivity of models to data set size and parameters in neural networks, jeff hinton's suggestion to explore alternative training methods, and the ongoing relevance of backpropagation despite its limitations.", 'duration': 111.417, 'highlights': ['Jeff Hinton suggested exploring alternative methods of training neural networks by looking at how the brain learns, despite the ongoing usefulness of backpropagation.', 'The sensitivity of models to data set size and parameters is discussed, emphasizing that a one-to-one correspondence exists when the data set has as many degrees of freedom as the model, leading to noticeable changes in the model with small changes in the data set.', 'When there are a lot more parameters than data, the resulting solution becomes insensitive to small changes in the data set, enabling the model to discard randomness and spurious correlations.']}, {'end': 2872.666, 'start': 2550.455, 'title': 'Neural networks and reasoning', 'summary': "Discusses the potential of neural networks to reason, citing examples like alphago and alphazero, and explores the idea that neural networks can mimic reasoning processes, while also touching on the limitations of neural networks' ability to reason and the concept of finding the shortest program to make the best prediction.", 'duration': 322.211, 'highlights': ['Neural networks like AlphaGo and AlphaZero provide an existence proof that neural networks can reason, demonstrated by their ability to play Go at a superior level to humans without the need for search.', 'The architecture of future neural networks capable of reasoning may be similar to existing architectures, possibly with more recurrence and depth, leveraging the inherent power of neural networks to learn to reason like humans.', 'Neural networks are capable of reasoning, but their ability to reason is contingent on being trained on tasks that require reasoning, as they tend to solve problems in the easiest way possible when not explicitly trained for reasoning.', 'The concept of finding the shortest program that outputs the available data is a theoretical statement that can be proven mathematically, and while neural networks are not able to find the best program, they can find a small or large circuit that fits the data in some way, with the weights containing a small amount of information.', 'The training process of a neural network involves slowly transmitting entropy from the dataset to the parameters, leading to the weights containing a small amount of information, which may explain their ability to generalize well.']}, {'end': 3592.455, 'start': 2874.807, 'title': 'Training pillar and neural networks', 'summary': 'Discusses the importance of training in deep learning, the potential of 
neural networks to find small programs, the challenge of interpretability, and the ability of large language models to understand semantics from raw data.', 'duration': 717.648, 'highlights': ['The fundamental reason for pushing on deep learning is the ability to train neural networks, with training as the top priority. The chapter emphasizes the importance of training in deep learning, with the fundamental reason for pushing on deep learning being the ability to train neural networks.', 'The potential of neural networks to find small programs is discussed, with the assertion that training a deep neural network to do so should be possible in principle. The discussion explores the potential of neural networks to find small programs, suggesting that training a deep neural network to do so should be possible in principle.', 'The challenge of interpretability in neural networks is raised, with the need for self-awareness and the example of language models generating interpretable text. The chapter raises the challenge of interpretability in neural networks and the need for self-awareness, providing an example of language models generating interpretable text.', 'The ability of large language models to understand semantics from raw data is highlighted, with evidence from the differences observed in small and large LSTM cells. The chapter discusses the ability of large language models to understand semantics from raw data, supported by evidence from differences observed in small and large LSTM cells.']}], 'duration': 1153.597, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A2438858.jpg', 'highlights': ['Neural networks like AlphaGo and AlphaZero provide an existence proof that neural networks can reason, demonstrated by their ability to play Go at a superior level to humans without the need for search.', 'The sensitivity of models to data set size and parameters is discussed, emphasizing that a one-to-one correspondence exists when the data set has as many degrees of freedom as the model, leading to noticeable changes in the model with small changes in the data set.', 'The fundamental reason for pushing on deep learning is the ability to train neural networks, with training as the top priority. The chapter emphasizes the importance of training in deep learning, with the fundamental reason for pushing on deep learning being the ability to train neural networks.', 'The ability of large language models to understand semantics from raw data is highlighted, with evidence from the differences observed in small and large LSTM cells.', 'The architecture of future neural networks capable of reasoning may be similar to existing architectures, possibly with more recurrence and depth, leveraging the inherent power of neural networks to learn to reason like humans.']}, {'end': 4364.017, 'segs': [{'end': 3662.044, 'src': 'embed', 'start': 3592.536, 'weight': 0, 'content': [{'end': 3595.597, 'text': 'Is the person happy with something or is the person unhappy with something?', 'start': 3592.536, 'duration': 3.061}, {'end': 3603.142, 'text': 'And so here we had very clear evidence that a small neural net does not capture sentiment, while a large neural net does.', 'start': 3596.178, 'duration': 6.964}, {'end': 3610.387, 'text': 'And why is that? 
Well, our theory is that at some point you run out of syntax to models, you start to gotta focus on something else.', 'start': 3603.682, 'duration': 6.705}, {'end': 3619.233, 'text': 'And with size, you quickly run out of syntax to model, and then you really start to focus on the semantics, would be the idea.', 'start': 3611.147, 'duration': 8.086}, {'end': 3619.853, 'text': "That's right.", 'start': 3619.453, 'duration': 0.4}, {'end': 3624.777, 'text': "And so I don't want to imply that our models have complete semantic understanding, because that's not true.", 'start': 3619.913, 'duration': 4.864}, {'end': 3630.761, 'text': 'But they definitely are showing signs of semantic understanding, partial semantic understanding.', 'start': 3625.397, 'duration': 5.364}, {'end': 3633.363, 'text': 'But the smaller models do not show those signs.', 'start': 3630.821, 'duration': 2.542}, {'end': 3638.126, 'text': 'Can you take a step back and say what is GPT-2,,', 'start': 3634.544, 'duration': 3.582}, {'end': 3643.37, 'text': 'which is one of the big language models that was the conversation changer in the past couple of years?', 'start': 3638.126, 'duration': 5.244}, {'end': 3660.663, 'text': 'Yeah, so GPT-2 is a transformer with one and a half billion parameters that was trained on about 40 billion tokens of text which were obtained from web pages that were linked to from Reddit,', 'start': 3643.79, 'duration': 16.873}, {'end': 3662.044, 'text': 'articles with more than three upvotes.', 'start': 3660.663, 'duration': 1.381}], 'summary': 'Large neural nets capture sentiment better than smaller ones, as shown by gpt-2 with 1.5b parameters.', 'duration': 69.508, 'max_score': 3592.536, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A3592536.jpg'}, {'end': 3727.859, 'src': 'embed', 'start': 3701.562, 'weight': 4, 'content': [{'end': 3705.445, 'text': 'So the transformer uses a lot of attention, but attention existed for a few years.', 'start': 3701.562, 'duration': 3.883}, {'end': 3707.286, 'text': "So that can't be the main innovation.", 'start': 3705.885, 'duration': 1.401}, {'end': 3714.951, 'text': 'The transformer is designed in such a way that it runs really fast on the GPU.', 'start': 3708.467, 'duration': 6.484}, {'end': 3717.733, 'text': 'And that makes a huge amount of difference.', 'start': 3716.152, 'duration': 1.581}, {'end': 3718.954, 'text': 'This is one thing.', 'start': 3718.193, 'duration': 0.761}, {'end': 3722.096, 'text': 'The second thing is that the transformer is not recurrent.', 'start': 3719.394, 'duration': 2.702}, {'end': 3727.859, 'text': 'And that is really important too, because it is more shallow and therefore much easier to optimize.', 'start': 3722.896, 'duration': 4.963}], 'summary': 'The transformer runs fast on gpu, not recurrent, easier to optimize.', 'duration': 26.297, 'max_score': 3701.562, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A3701562.jpg'}, {'end': 4197.14, 'src': 'embed', 'start': 4169.908, 'weight': 5, 'content': [{'end': 4175.551, 'text': 'Like how do we release artificial intelligence models to the public?', 'start': 4169.908, 'duration': 5.643}, {'end': 4185.795, 'text': 'If we do at all, how do we privately discuss with other, even competitors, about how we manage the use of the systems and so on?', 'start': 4176.192, 'duration': 9.603}, {'end': 4190.657, 'text': 'So, from this whole experience, you released a report on it, but, in 
general,', 'start': 4186.075, 'duration': 4.582}, {'end': 4197.14, 'text': "are there any insights that you've gathered from just thinking about this, about how you release models like this?", 'start': 4190.657, 'duration': 6.483}], 'summary': 'Challenges in releasing ai models publicly and managing private discussions with competitors, leading to insights on model release.', 'duration': 27.232, 'max_score': 4169.908, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A4169908.jpg'}], 'start': 3592.536, 'title': 'Ai model advancements', 'summary': 'Discusses the limitations of small neural nets in capturing sentiment, the impact of gpt-2 with 1.5 billion parameters and 40 billion text tokens, and the need for staged release and cross-company collaboration in deploying powerful ai systems.', 'chapters': [{'end': 3633.363, 'start': 3592.536, 'title': 'Neural networks and semantic understanding', 'summary': 'Discusses how small neural nets fail to capture sentiment while large ones show signs of semantic understanding, suggesting that smaller models run out of syntax to model, leading to a focus on semantics.', 'duration': 40.827, 'highlights': ['Large neural nets show signs of semantic understanding, while small neural nets do not.', 'The theory suggests that as the size of the neural net increases, it runs out of syntax to model, leading to a focus on semantics.', 'The smaller models do not show signs of semantic understanding.']}, {'end': 4364.017, 'start': 3634.544, 'title': 'Gpt-2 and transformers: advancements and impacts', 'summary': 'Discusses the significance of gpt-2, a transformer with 1.5 billion parameters, trained on 40 billion text tokens, its impact on language models and the challenges in releasing powerful ai systems, emphasizing the need for a staged release and cross-company collaboration.', 'duration': 729.473, 'highlights': ['GPT-2 is a transformer with 1.5 billion parameters and was trained on 40 billion tokens of text from web pages linked to from Reddit, articles with more than three upvotes. Highlighting the scale of GPT-2, with 1.5 billion parameters and training on a substantial 40 billion text tokens, showcasing its significant data processing capabilities and scope of training.', 'The transformer, a combination of multiple ideas including attention, is successful due to its simultaneous combination of multiple ideas and its design for fast GPU processing. Explaining the success of the transformer, attributing it to the combination of various ideas, especially attention, and its optimization for GPU processing, highlighting its efficiency and design advantages.', 'The chapter emphasized the need for a staged release and cross-company collaboration in releasing powerful AI systems, considering the potential impact and usage of the models. 
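The transformer properties called out in these highlights (attention, non-recurrence, and a design that runs fast on a GPU) all meet in one operation. A minimal single-head causal self-attention sketch, with illustrative shapes and random weights, not the actual GPT-2 code:

```python
# Minimal single-head causal self-attention, the core transformer operation
# discussed above. Unlike an RNN, every position is processed in parallel,
# which is what makes the architecture GPU-friendly and easier to optimize.
# Shapes and random weights are illustrative only.
import math
import torch

def causal_self_attention(x, wq, wk, wv):
    # x: (seq_len, d_model); wq/wk/wv: (d_model, d_head) projections
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / math.sqrt(q.shape[-1])         # all pairs at once
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))  # no peeking at the future
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(10, 64)                               # 10 tokens, d_model = 64
wq, wk, wv = (torch.randn(64, 32) for _ in range(3))
out = causal_self_attention(x, wq, wk, wv)            # shape (10, 32)
```

GPT-2 stacks dozens of multi-head versions of this layer, together with feed-forward blocks, layer normalization, and residual connections, to reach its 1.5 billion parameters.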
Underlining the importance of a staged release and cross-company collaboration in releasing powerful AI systems, to address potential impacts and ensure responsible usage of such models.']}], 'duration': 771.481, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A3592536.jpg', 'highlights': ['Large neural nets show signs of semantic understanding, while small neural nets do not.', 'The theory suggests that as the size of the neural net increases, it runs out of syntax to model, leading to a focus on semantics.', 'The smaller models do not show signs of semantic understanding.', 'GPT-2 is a transformer with 1.5 billion parameters and was trained on 40 billion tokens of text from web pages linked to from Reddit, articles with more than three upvotes.', 'The transformer, a combination of multiple ideas including attention, is successful due to its simultaneous combination of multiple ideas and its design for fast GPU processing.', 'The chapter emphasized the need for a staged release and cross-company collaboration in releasing powerful AI systems, considering the potential impact and usage of the models.']}, {'end': 4704.849, 'segs': [{'end': 4416.629, 'src': 'embed', 'start': 4365.425, 'weight': 2, 'content': [{'end': 4369.93, 'text': 'all the AI developers are building technology, which is going to be increasingly more powerful.', 'start': 4365.425, 'duration': 4.505}, {'end': 4374.734, 'text': "And so it's..", 'start': 4370.87, 'duration': 3.864}, {'end': 4376.996, 'text': "The way to think about it is that ultimately we're all in it together.", 'start': 4374.734, 'duration': 2.262}, {'end': 4381.02, 'text': "Yeah, it's..", 'start': 4378.698, 'duration': 2.322}, {'end': 4386.406, 'text': 'I tend to believe in the better angels of our nature, but I do hope that..', 'start': 4381.02, 'duration': 5.386}, {'end': 4398.215, 'text': 'that when you build a really powerful AI system in a particular domain, that you also think about the potential negative consequences of, yeah.', 'start': 4388.55, 'duration': 9.665}, {'end': 4411.862, 'text': "It's an interesting and scary possibility that there'll be a race for AI development that would push people to close that development and not share ideas with others.", 'start': 4402.457, 'duration': 9.405}, {'end': 4414.227, 'text': "I don't love this.", 'start': 4413.567, 'duration': 0.66}, {'end': 4416.629, 'text': "I've been a pure academic for 10 years.", 'start': 4414.628, 'duration': 2.001}], 'summary': 'Ai developers building powerful tech, need to consider negative consequences and promote collaboration.', 'duration': 51.204, 'max_score': 4365.425, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A4365425.jpg'}, {'end': 4474.107, 'src': 'embed', 'start': 4442.689, 'weight': 0, 'content': [{'end': 4445.411, 'text': 'Do you think self-play will be involved?', 'start': 4442.689, 'duration': 2.722}, {'end': 4449.113, 'text': "Sort of like you've spoken about the powerful mechanism of self-play,", 'start': 4445.651, 'duration': 3.462}, {'end': 4462.582, 'text': 'where systems learn by sort of exploring the world in a competitive setting against other entities that are similarly skilled as them and so incrementally improve in this way.', 'start': 4449.113, 'duration': 13.469}, {'end': 4467.862, 'text': 'Do you think self-play will be a component of building an AGI system? 
, {'end': 4704.849, 'segs': [{'end': 4416.629, 'src': 'embed', 'start': 4365.425, 'weight': 2, 'content': [{'end': 4369.93, 'text': 'all the AI developers are building technology, which is going to be increasingly more powerful.', 'start': 4365.425, 'duration': 4.505}, {'end': 4374.734, 'text': "And so it's..", 'start': 4370.87, 'duration': 3.864}, {'end': 4376.996, 'text': "The way to think about it is that ultimately we're all in it together.", 'start': 4374.734, 'duration': 2.262}, {'end': 4381.02, 'text': "Yeah, it's..", 'start': 4378.698, 'duration': 2.322}, {'end': 4386.406, 'text': 'I tend to believe in the better angels of our nature, but I do hope that..', 'start': 4381.02, 'duration': 5.386}, {'end': 4398.215, 'text': 'that when you build a really powerful AI system in a particular domain, that you also think about the potential negative consequences of, yeah.', 'start': 4388.55, 'duration': 9.665}, {'end': 4411.862, 'text': "It's an interesting and scary possibility that there'll be a race for AI development that would push people to close that development and not share ideas with others.", 'start': 4402.457, 'duration': 9.405}, {'end': 4414.227, 'text': "I don't love this.", 'start': 4413.567, 'duration': 0.66}, {'end': 4416.629, 'text': "I've been a pure academic for 10 years.", 'start': 4414.628, 'duration': 2.001}], 'summary': 'AI developers are building increasingly powerful technology and need to consider negative consequences and promote collaboration.', 'duration': 51.204, 'max_score': 4365.425, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A4365425.jpg'}, {'end': 4474.107, 'src': 'embed', 'start': 4442.689, 'weight': 0, 'content': [{'end': 4445.411, 'text': 'Do you think self-play will be involved?', 'start': 4442.689, 'duration': 2.722}, {'end': 4449.113, 'text': "Sort of like you've spoken about the powerful mechanism of self-play,", 'start': 4445.651, 'duration': 3.462}, {'end': 4462.582, 'text': 'where systems learn by sort of exploring the world in a competitive setting against other entities that are similarly skilled as them and so incrementally improve in this way.', 'start': 4449.113, 'duration': 13.469}, {'end': 4467.862, 'text': 'Do you think self-play will be a component of building an AGI system? Yeah.', 'start': 4462.602, 'duration': 5.26}, {'end': 4474.107, 'text': 'So what I would say to build AGI, I think is going to be deep learning plus some ideas.', 'start': 4468.202, 'duration': 5.905}], 'summary': 'Self-play is a powerful mechanism for systems to learn and incrementally improve in a competitive setting, a potential component in building AGI.', 'duration': 31.418, 'max_score': 4442.689, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A4442689.jpg'}, {'end': 4660.45, 'src': 'embed', 'start': 4613.454, 'weight': 1, 'content': [{'end': 4626.1, 'text': "Or do you think it's possible to also just simulate in a photo-realistic and physics-realistic way the real world in a way that we can solve real problems with self-play in simulation?", 'start': 4613.454, 'duration': 12.646}, {'end': 4635.149, 'text': 'So I think that transfer from simulation to the real world is definitely possible and has been exhibited many times by many different groups.', 'start': 4626.841, 'duration': 8.308}, {'end': 4637.651, 'text': "It's been especially successful in vision.", 'start': 4636.05, 'duration': 1.601}, {'end': 4648.162, 'text': 'Also, OpenAI in the summer has demonstrated a robot hand which was trained entirely in simulation in a certain way that allowed for sim-to-real transfer to occur.', 'start': 4638.813, 'duration': 9.349}, {'end': 4651.903, 'text': "Is this for the Rubik's Cube? Yes, right.", 'start': 4649.941, 'duration': 1.962}, {'end': 4654.625, 'text': "I wasn't aware that was trained in simulation.", 'start': 4652.824, 'duration': 1.801}, {'end': 4656.227, 'text': 'It was trained in simulation entirely.', 'start': 4654.645, 'duration': 1.582}, {'end': 4660.45, 'text': "Really? So it wasn't in the physical, the hand wasn't trained? No.", 'start': 4657.328, 'duration': 3.122}], 'summary': "Simulation can solve real problems, as seen in OpenAI's success in training a robot hand entirely in simulation for sim-to-real transfer.", 'duration': 46.996, 'max_score': 4613.454, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A4613454.jpg'}], 'start': 4365.425, 'title': 'AGI development and self-play mechanisms', 'summary': 'Discusses the challenges and potential of building increasingly powerful AI systems, emphasizing the importance of self-play for producing surprising and creative solutions.
It also explores the potential of self-play mechanisms in simulation versus real-world applications, highlighting the success of sim-to-real transfer in robotics.', 'chapters': [{'end': 4538.653, 'start': 4365.425, 'title': 'Building AGI: challenges and potential', 'summary': 'Discusses the development of increasingly powerful AI systems, the potential negative consequences, and the components required to build an AGI system, emphasizing the importance of self-play and its ability to produce surprising and creative solutions.', 'duration': 173.228, 'highlights': ['Self-play as a component of building an AGI system: Self-play is highlighted as a crucial component in building AGI, with the ability to produce surprising and creative solutions, demonstrated through examples such as the Dota bot and AlphaZero.', 'Potential negative consequences of powerful AI systems: The chapter emphasizes the need for AI developers to consider the potential negative consequences of building powerful AI systems, including the possibility of a race for AI development leading to closed development and a lack of idea sharing.', 'The development of increasingly powerful AI systems: The chapter discusses the trend of AI developers building technology that is becoming increasingly more powerful, highlighting the need to consider the broader impact and consequences of such advancements.']}, {'end': 4704.849, 'start': 4539.213, 'title': 'Self-play mechanisms in simulation vs real world', 'summary': 'Discusses the potential of self-play mechanisms in simulation versus real-world applications, highlighting the success of sim-to-real transfer in robotics and the adaptability of policies trained in simulation.', 'duration': 165.636, 'highlights': ['OpenAI demonstrated a robot hand trained entirely in simulation for sim-to-real transfer, showing adaptability to the physical world.', 'The success of sim-to-real transfer has been exhibited multiple times by different groups, especially in vision.', 'The current power and results of reinforcement learning have been demonstrated in simulated or constrained physical environments, raising questions about its applicability in non-simulated environments.']}], 'duration': 339.424, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A4365425.jpg', 'highlights': ['Self-play as a crucial component in building AGI, producing surprising and creative solutions', 'OpenAI demonstrated a robot hand trained entirely in simulation for sim-to-real transfer', 'Potential negative consequences of powerful AI systems, including closed development and lack of idea sharing', 'The trend of AI developers building increasingly powerful technology, emphasizing the need to consider broader impact', "Success of sim-to-real transfer in robotics and vision, raising questions about reinforcement learning's applicability in non-simulated environments"]}
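The highlights describe self-play as systems improving by competing against similarly skilled copies of themselves. A minimal sketch of that loop follows; env.play and learn are hypothetical stand-ins for an environment and an update rule (for example, a policy-gradient step), not the actual Dota bot or AlphaZero training code.

    import copy
    import random

    def train_by_self_play(agent, env, learn, rounds=1000, snapshot_every=50):
        # Keep frozen snapshots of past selves so the agent always faces
        # opponents of roughly similar skill.
        opponent_pool = [copy.deepcopy(agent)]
        for step in range(rounds):
            opponent = random.choice(opponent_pool)
            trajectory, result = env.play(agent, opponent)  # hypothetical: play one full game
            learn(agent, trajectory, result)                # hypothetical: update from the outcome
            if step % snapshot_every == 0:
                opponent_pool.append(copy.deepcopy(agent))  # current self becomes a future opponent
        return agent

The opponent pool is the detail that keeps the curriculum incremental: playing only the latest self can lead to cycling strategies, while sampling past versions keeps the opposition "similarly skilled" as the agent improves.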
, {'end': 5838.993, 'segs': [{'end': 4730.344, 'src': 'embed', 'start': 4706.09, 'weight': 0, 'content': [{'end': 4712.031, 'text': "That's a clean, small scale, but clean example of a transfer from the simulated world to the physical world.", 'start': 4706.09, 'duration': 5.941}, {'end': 4717.371, 'text': 'Yeah, and I will also say that I expect the transfer capabilities of deep learning to increase in general.', 'start': 4712.131, 'duration': 5.24}, {'end': 4722.074, 'text': 'And the better the transfer capabilities are, the more useful simulation will become.', 'start': 4718.312, 'duration': 3.762}, {'end': 4730.344, 'text': 'Because then you could take, you could experience something in simulation and then learn a moral of the story,', 'start': 4723.656, 'duration': 6.688}], 'summary': 'Transfer capabilities of deep learning are expected to increase, making simulation more useful.', 'duration': 24.254, 'max_score': 4706.09, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A4706090.jpg'}, {'end': 5090.602, 'src': 'heatmap', 'start': 5026.137, 'weight': 1, 'content': [{'end': 5033.922, 'text': "It's kind of hard to judge what depth means, but there's definitely a sense in which humans don't make mistakes that these models do.", 'start': 5026.137, 'duration': 7.785}, {'end': 5037.365, 'text': 'Yes. The same is applied to autonomous vehicles.', 'start': 5034.503, 'duration': 2.862}, {'end': 5041.348, 'text': 'The same is probably going to continue being applied to a lot of artificial intelligence systems.', 'start': 5037.805, 'duration': 3.543}, {'end': 5044.09, 'text': 'We find this is the annoying thing.', 'start': 5041.788, 'duration': 2.302}, {'end': 5044.97, 'text': 'This is the process of..', 'start': 5044.11, 'duration': 0.86}, {'end': 5046.812, 'text': 'in the 21st century.', 'start': 5045.791, 'duration': 1.021}, {'end': 5056.376, 'text': 'the process of analyzing the progress of AI is the search for one case where the system fails in a big way, where humans would not.', 'start': 5046.812, 'duration': 9.564}, {'end': 5060.098, 'text': 'And then many people writing articles about it.', 'start': 5057.096, 'duration': 3.002}, {'end': 5066.3, 'text': 'And then broadly as the public generally gets convinced that the system is not intelligent.', 'start': 5060.758, 'duration': 5.542}, {'end': 5071.943, 'text': "And we like pacify ourselves by thinking it's not intelligent because of this one anecdotal case.", 'start': 5066.641, 'duration': 5.302}, {'end': 5074.024, 'text': 'And this seems to continue happening.', 'start': 5071.983, 'duration': 2.041}, {'end': 5080.477, 'text': "Yeah, I mean, there is truth to that, although I'm sure that plenty of people are also extremely impressed by the system that exists today.", 'start': 5074.614, 'duration': 5.863}, {'end': 5086.2, 'text': "But I think this connects to the earlier point we discussed, that it's just confusing to judge progress in AI.", 'start': 5080.877, 'duration': 5.323}, {'end': 5090.602, 'text': 'And, you know, you have a new robot demonstrating something.', 'start': 5087.86, 'duration': 2.742}], 'summary': 'AI systems are judged on isolated failure cases, making progress confusing to assess, even as they continue to impress many people.', 'duration': 64.465, 'max_score': 5026.137, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A5026137.jpg'}, {'end': 5220.434, 'src': 'embed', 'start': 5186.486, 'weight': 1, 'content': [{'end': 5196.008, 'text': 'but hopefully the 21st would be the creation of an AGI system and the people who have control, direct possession and control of the AGI system.', 'start': 5186.486, 'duration': 9.522}, {'end': 5205.169, 'text': 'So what do you think, after spending that evening having a discussion with the AGI system, what do you think you would do?', 'start': 5197.748, 'duration': 7.421}, {'end': 5220.434, 'text': "Well, the ideal world I'd like to imagine is one where humanity are like the board members of a company where the AGI is the CEO.", 'start': 5206.77, 'duration': 13.664}], 'summary': 'By the 21st century, an AGI 
system will hopefully be created, with humans in control, akin to board members with the AGI as CEO.', 'duration': 33.948, 'max_score': 5186.486, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A5186486.jpg'}, {'end': 5581.11, 'src': 'embed', 'start': 5551.496, 'weight': 2, 'content': [{'end': 5556.601, 'text': 'I think that the question implies that there is an objective answer, which is an external answer.', 'start': 5551.496, 'duration': 5.105}, {'end': 5558.642, 'text': 'You know, your meaning of life is X.', 'start': 5556.621, 'duration': 2.021}, {'end': 5563.186, 'text': "I think what's going on is that we exist and that's amazing.", 'start': 5558.642, 'duration': 4.544}, {'end': 5572.594, 'text': 'And we should try to make the most of it and try to maximize our own value and enjoyment of a very short time while we do exist.', 'start': 5564.267, 'duration': 8.327}, {'end': 5576.142, 'text': "It's funny because action does require an objective function.", 'start': 5573.538, 'duration': 2.604}, {'end': 5581.11, 'text': "It's definitely there in some form, but it's difficult to make it explicit.", 'start': 5576.162, 'duration': 4.948}], 'summary': 'Existence is amazing; we should maximize value and enjoyment while we exist.', 'duration': 29.614, 'max_score': 5551.496, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A5551496.jpg'}], 'start': 4706.09, 'title': 'AI transfer capabilities and governance', 'summary': 'Discusses the potential increase in the transfer capabilities of deep learning models, future governance models for AGI, and ethical considerations, including aligning AGI values with human values and maximizing enjoyment in life.', 'chapters': [{'end': 5186.486, 'start': 4706.09, 'title': 'Transfer capabilities of deep learning', 'summary': 'Discusses the potential increase in the transfer capabilities of deep learning models and the usefulness of simulation in learning, as well as the debate on the necessity of a body and consciousness for AGI systems, and the challenges in judging progress in AI.', 'duration': 480.396, 'highlights': ['The transfer capabilities of deep learning are expected to increase, making simulations more useful for learning.', 'Debate on the necessity of a body for AGI systems, with the acknowledgment of the usefulness of having a body for learning certain things.', "Discussion on the challenge of judging progress in AI and the skepticism towards deep learning models' mistakes."]}, {'end': 5387.8, 'start': 5186.486, 'title': 'Future of AGI governance', 'summary': 'Explores the ideal governance model for AGI, envisioning a democratic system where humans control and guide AGI systems to serve their interests, with the possibility of pressing the reset button if necessary.', 'duration': 201.314, 'highlights': ['Envisioning a democratic governance model for AGI, where humans have direct control and influence, akin to board members of a company. 
AGI system envisioned as the CEO, with different entities, such as cities or countries, having their own AGI representation, guided by the democratic process.', 'The potential to design AGI systems with a deep drive to help humans flourish, akin to human parents nurturing their children: belief in the possibility of programming AGI with a strong drive to delight in fulfilling the objective of aiding human flourishing.', "Emphasizing the crucial moment of relinquishing power between the creation of AGI and the democratic governance model: the importance of emulating George Washington's act of relinquishing power, highlighting the need for a transition of power to the democratic board members overseeing the AGI system."]}, {'end': 5838.993, 'start': 5389.215, 'title': 'Ethical AI and the meaning of life', 'summary': 'Discusses relinquishing control over AGI, aligning AGI values with human values, the objective function of human existence, and the source of happiness, emphasizing the importance of making the most of life and maximizing enjoyment while existing.', 'duration': 449.778, 'highlights': ['The importance of making the most of life and maximizing enjoyment while existing: the speaker emphasizes that humans should try to make the most of their existence and maximize enjoyment while they exist.', 'Discussion on relinquishing control over AGI and aligning AGI values with human values: the conversation delves into the ethical implications of relinquishing control over AGI and the mechanisms of aligning AGI values with human values.', 'The subjective nature of the meaning of life and the source of happiness: the discussion explores the subjective nature of the meaning of life and happiness, emphasizing that happiness comes from the way one looks at things and the importance of being humble in the face of uncertainty.', "The speaker's reflections on regrets and moments of pride: the speaker reflects on experiencing regrets and moments of pride, emphasizing that they try to take solace in having done the best they could at the time and that happiness comes from the way one looks at things."]}], 'duration': 1132.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/13CZPWmke6A/pics/13CZPWmke6A4706090.jpg', 'highlights': ['The transfer capabilities of deep learning are expected to increase, making simulations more useful for learning.', 'Envisioning a democratic governance model for AGI, where humans have direct control and influence, akin to board members of a company.', 'The importance of making the most of life and maximizing enjoyment while existing.']}], 'highlights': ['Ilya Sutskever, co-founder of OpenAI, with over 165,000 citations, is a prominent computer scientist in the field of deep learning.', 'Cryptocurrency, including Bitcoin, is discussed in the context of the history of money, highlighting its potential to redefine the nature of money.', 'The realization of training large and deep neural networks with backpropagation marked a pivotal moment in the deep learning revolution.', 'The success of deep learning was facilitated by the presence of supervised data, compute, and the conviction to combine them, with ImageNet serving as a pivotal moment.', 'The potential for further breakthroughs in deep learning, as the field continues to surprise and exceed previous limitations, indicating ongoing underestimation and mysterious properties in neural networks.', 'Neural networks like AlphaGo and AlphaZero provide an existence proof that neural networks can reason, as 
demonstrated by their ability to play Go at a superior level to humans without the need for search.', 'The ability of large language models to understand semantics from raw data is highlighted, with evidence from the differences observed in small and large LSTM cells.', 'Self-play as a crucial component in building AGI, producing surprising and creative solutions.', 'The transfer capabilities of deep learning are expected to increase, making simulations more useful for learning.', 'Envisioning a democratic governance model for AGI, where humans have direct control and influence, akin to board members of a company.']}
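The closing highlights connect increasing transfer capabilities to the usefulness of simulation, and the robot-hand discussion describes a policy trained entirely in simulation that worked on physical hardware. A common recipe for that kind of sim-to-real transfer is domain randomization: vary the simulator's physics so widely that the real world looks like just one more sample. The sketch below assumes a hypothetical environment API (set_params, reset, step) and is an illustration of the idea, not OpenAI's Rubik's Cube code.

    import random

    def randomized_episode(env, policy):
        # Re-randomize the simulator's physics every episode so the policy
        # cannot overfit to any single (inevitably wrong) parameter setting.
        env.set_params(
            friction=random.uniform(0.5, 1.5),    # hypothetical: contact friction scale
            mass_scale=random.uniform(0.8, 1.2),  # hypothetical: object mass scale
            action_delay=random.randint(0, 3),    # hypothetical: actuation latency in timesteps
        )
        obs = env.reset()
        done, total_reward = False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total_reward += reward
        return total_reward

A policy that scores well across all these randomized variants has to rely on strategies that are robust to physics it cannot predict, which is what lets it survive the gap between the simulator and the real world.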