title

#61: Prof. YANN LECUN: Interpolation, Extrapolation and Linearisation (w/ Dr. Randall Balestriero)

description

We are now sponsored by Weights and Biases! Please visit our sponsor link: http://wandb.me/MLST
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
Yann LeCun thinks that it's specious to say neural network models are interpolating because in high dimensions, everything is extrapolation. Recently Dr. Randall Balestriero, Dr. Jerome Pesenti and Prof. Yann LeCun released their paper "Learning in High Dimension Always Amounts to Extrapolation". This discussion has completely changed how we think about neural networks and their behaviour.
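As a rough illustration of the claim above (not taken from the paper), the sketch below swaps the convex-hull test for a simpler stand-in: the axis-aligned bounding box of the training samples. Any point inside the hull is also inside the box, so the box probability is only an upper bound on hull membership, and a loose one. It assumes i.i.d. samples with independent continuous coordinates; the function names are made up for this example.

```python
import random


def box_membership_bound(n_samples: int, dim: int) -> float:
    """Probability that a fresh point lands inside the bounding box of the samples.

    In one dimension the fresh value is the largest of n_samples + 1 i.i.d.
    values with probability 1 / (n_samples + 1), and the smallest with the
    same probability, so it falls inside the sample range with probability
    (n_samples - 1) / (n_samples + 1). With independent coordinates the
    per-dimension probabilities multiply.
    """
    return ((n_samples - 1) / (n_samples + 1)) ** dim


def simulate_box_membership(n_samples: int, dim: int, trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability using standard normal coordinates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        train = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
        test = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        lows = [min(column) for column in zip(*train)]
        highs = [max(column) for column in zip(*train)]
        if all(lo <= value <= hi for lo, value, hi in zip(lows, test, highs)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # With the sample count held fixed, the bound shrinks geometrically in the dimension.
    for dim in (1, 10, 100, 1000):
        print(dim, box_membership_bound(1000, dim))
```

Because the box is much larger than the hull, this only shows the direction of the effect. An actual hull-membership test needs a linear program, which is left out here to keep the sketch dependency-free.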
[00:00:00] Pre-intro
[00:11:58] Intro Part 1: On linearisation in NNs
[00:28:17] Intro Part 2: On interpolation in NNs
[00:47:45] Intro Part 3: On the curse
[00:57:41] LeCun intro
[00:58:18] Why is it important to distinguish between interpolation and extrapolation?
[01:03:18] Can DL models reason?
[01:06:23] The ability to change your mind
[01:07:59] Interpolation - LeCun steelman argument against NNs
[01:14:11] Should extrapolation be over all dimensions
[01:18:54] On the morphing of MNIST digits, is that interpolation?
[01:20:11] Self-supervised learning
[01:26:06] View on data augmentation
[01:27:42] TangentProp paper with Patrice Simard
[01:29:19] LeCun has no doubt that NNs will be able to perform discrete reasoning
[01:38:44] Discrete vs continuous problems?
[01:50:13] Randall introduction
[01:50:13] Are the interpolation people barking up the wrong tree?
[01:53:48] Could you steel man the interpolation argument?
[01:56:40] The definition of interpolation
[01:58:33] What if extrapolation was being outside the sample range on every dimension?
[02:01:18] On spurious dimensions and correlations don't an extrapolation make
[02:04:13] Making clock faces interpolative and why DL works at all?
[02:06:59] We discount all the human engineering which has gone into machine learning
[02:08:01] Given the curse, NNs still seem to work remarkably well
[02:10:09] Interpolation doesn't have to be linear though
[02:12:21] Does this invalidate the manifold hypothesis?
[02:14:41] Are NNs basically compositions of piecewise linear functions?
[02:17:54] How does the predictive architecture affect the structure of the latent?
[02:23:54] Spline theory of deep learning, and the view of NNs as piecewise linear decompositions
[02:29:30] Neural Decision Trees
[02:30:59] Continuous vs discrete (Keith's favourite question!)
[02:36:20] MNIST is in some sense, a harder problem than Imagenet!
[02:45:26] Randall debrief
[02:49:18] LeCun debrief
Pod version: https://anchor.fm/machinelearningstreettalk/episodes/061-Interpolation--Extrapolation-and-Linearisation-Prof--Yann-LeCun--Dr--Randall-Balestriero-e1cgdr0
Our special thanks to:
- Francois Chollet (buy his book! https://www.manning.com/books/deep-learning-with-python-second-edition)
- Alexander Mattick (Zickzack)
- Rob Lange
- Stella Biderman
References:
Learning in High Dimension Always Amounts to Extrapolation [Randall Balestriero, Jerome Pesenti, Yann LeCun]
https://arxiv.org/abs/2110.09485
A Spline Theory of Deep Learning [Dr. Balestriero, Richard Baraniuk]
https://proceedings.mlr.press/v80/balestriero18b.html
Neural Decision Trees [Dr. Balestriero]
https://arxiv.org/pdf/1702.07360.pdf
Interpolation of Sparse High-Dimensional Data [Dr. Thomas Lux]
https://tchlux.github.io/papers/tchlux-2020-NUMA.pdf
If you are an old fart and offended by the background music, here is the intro (first 60 mins) with no background music. https://drive.google.com/file/d/16bc7XJjKJzw4YdvL5rYdRZZB19dSzR70/view?usp=sharing
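The chapters on piecewise linear functions and the spline view can be made concrete with a toy example. This is a minimal sketch, assuming a single hidden ReLU layer with a scalar output and random weights. All names here are hypothetical and not taken from the papers listed above. For any input, the set of active ReLUs fixes one affine map, and the network agrees with that map everywhere in the region sharing the same activation pattern.

```python
import random


def hidden_preactivations(x, W1, b1):
    """Pre-activation of each hidden unit: one hyperplane per unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def relu_net(x, W1, b1, w2, b2):
    """Forward pass of a one-hidden-layer ReLU network with scalar output."""
    hidden = [max(0.0, p) for p in hidden_preactivations(x, W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def local_affine_map(x, W1, b1, w2, b2):
    """Return (pattern, slope, offset) for the linear region containing x.

    pattern records which hidden units are active at x. For every input z
    with the same pattern, relu_net(z) equals dot(slope, z) + offset.
    """
    pattern = tuple(p > 0.0 for p in hidden_preactivations(x, W1, b1))
    active = [j for j, is_on in enumerate(pattern) if is_on]
    slope = [sum(w2[j] * W1[j][i] for j in active) for i in range(len(x))]
    offset = b2 + sum(w2[j] * b1[j] for j in active)
    return pattern, slope, offset


def make_random_net(in_dim, hidden_dim, seed=0):
    """Random Gaussian weights, purely for illustration."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
    b2 = rng.gauss(0.0, 1.0)
    return W1, b1, w2, b2


if __name__ == "__main__":
    net = make_random_net(in_dim=2, hidden_dim=4, seed=0)
    x = [0.5, -1.0]
    pattern, slope, offset = local_affine_map(x, *net)
    affine_value = sum(s * xi for s, xi in zip(slope, x)) + offset
    print(pattern, relu_net(x, *net), affine_value)
```

Deeper networks behave the same way, except that the pattern spans every layer and the slope becomes a product of masked weight matrices.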

detail

{'title': '#61: Prof. YANN LECUN: Interpolation, Extrapolation and Linearisation (w/ Dr. Randall Balestriero)', 'heatmap': [{'end': 8513.369, 'start': 8268.151, 'weight': 1}], 'summary': 'Delves into perspectives on neural networks, challenges in ml, interpolation and generalization in machine learning, various ai topics, neural networks and decision making, reevaluating interpolation, generalization and dimensionality in machine learning, deep learning applications, hybrid systems and dataset analysis, and challenges of neural networks, covering a wide range of topics with insights from professor yann lecun.', 'chapters': [{'end': 418.107, 'segs': [{'end': 144.182, 'src': 'embed', 'start': 114.469, 'weight': 0, 'content': [{'end': 119.234, 'text': "Even if you have a million samples, you're only covering a tiny portion of the dimensions.", 'start': 114.469, 'duration': 4.765}, {'end': 129.918, 'text': 'of that space, right? Those images are in a tiny sliver of surface among the space of all possible combinations of values of pixels.', 'start': 120.055, 'duration': 9.863}, {'end': 136.98, 'text': "And so when you show the system a new image, it's very unlikely that this image is a linear combination of previous images.", 'start': 130.578, 'duration': 6.402}, {'end': 144.182, 'text': "What you're doing is extrapolation, not interpolation, okay? 
And in high dimension, all of machine learning is extrapolation, which is why it's hard.", 'start': 137.18, 'duration': 7.002}], 'summary': 'In high dimensions, machine learning is extrapolation, not interpolation, making it challenging.', 'duration': 29.713, 'max_score': 114.469, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw114469.jpg'}, {'end': 198.406, 'src': 'embed', 'start': 173.418, 'weight': 2, 'content': [{'end': 179.64, 'text': "That's why we've invented so many tricks to reduce the statistical and approximation complexity of problems,", 'start': 173.418, 'duration': 6.222}, {'end': 183.161, 'text': 'just like we do in computer science with the computational complexity of algorithms.', 'start': 179.64, 'duration': 3.521}, {'end': 187.362, 'text': 'LeCun is not saying that deep learning models are clairvoyant.', 'start': 183.561, 'duration': 3.801}, {'end': 198.406, 'text': "Yann LeCun thinks that it's specious to say that neural network models are interpolating because in high dimensions everything is extrapolation.", 'start': 188.103, 'duration': 10.303}], 'summary': 'Yann LeCun discusses reducing complexity in neural network models and challenges the idea of interpolation in high dimensions.', 'duration': 24.988, 'max_score': 173.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw173418.jpg'}], 'start': 0.069, 'title': 'Perspectives on neural networks and deep learning', 'summary': 'Discusses the limitations and misconceptions of deep learning, challenges the notion of interpolation in high dimensions, and provides a new perspective on neural networks, offering insights from Professor Yann LeCun and emphasizing the concept of recursive chopping of input space. 
It highlights the complexities of learning in high dimensions and the representability of input predictions with a single linear affine transformation, leading to a shift in understanding and decrease in the mystery surrounding neural networks.', 'chapters': [{'end': 48.051, 'start': 0.069, 'title': 'Yann LeCun on deep learning', 'summary': 'Discusses the limitation of deep learning and the misconception regarding its use for interpolation and extrapolation, with Professor Yann LeCun stating that it is not a fundamental limitation of deep learning but rather a limitation of supervised learning.', 'duration': 47.982, 'highlights': ['Professor Yann LeCun addresses the misconception about the limitation of deep learning, stating that it is not a fundamental limitation of deep learning but rather a limitation of supervised learning.', 'Yann LeCun refutes the claim that deep learning is limited to interpolation and not for extrapolation, emphasizing that the qualitative difference is not in the form of fundamentally different things from deep learning.']}, {'end': 230.097, 'start': 48.332, 'title': 'Machine learning in high dimensions', 'summary': 'Discusses the concept that in high dimensions, all of machine learning is extrapolation, challenging the notion of interpolation in neural network models and highlighting the complexities of learning in high dimensions.', 'duration': 181.765, 'highlights': ['In high dimensions, all of machine learning is extrapolation. Yann LeCun argues that in high dimensions, machine learning operates through extrapolation rather than interpolation, challenging traditional notions of learning.', 'Challenging the notion of interpolation in neural network models. The discussion challenges the idea that neural network models primarily rely on interpolation in high dimensions, asserting that it is instead extrapolation that characterizes machine learning in such contexts.', 'Complexities of learning in high dimensions. The complexities of 
learning in high dimensions are highlighted, emphasizing the challenges posed by the vast dimensionality of data and the limitations of traditional statistical generalization in such contexts.']}, {'end': 418.107, 'start': 231.798, 'title': 'New perspective on neural networks', 'summary': 'Reveals a new perspective on neural networks, emphasizing the concept of neural networks recursively chopping up the input space and the idea that each input prediction is representable with a single linear affine transformation, leading to a shift in understanding and a decrease in the mystery surrounding neural networks.', 'duration': 186.309, 'highlights': ["The revelation that neural networks recursively chop up the input space into little convex cells or polyhedra, leading to a shift in understanding of their behavior. Randall's view challenges the traditional understanding of neural networks as interpolators, suggesting that they recursively divide the input space into convex cells, similar to decision trees, which share information between layers.", 'Realization that each input prediction is representable with a single linear affine transformation, leading to a decrease in the mystery surrounding neural networks. The understanding that each input prediction can be represented with a single linear affine transformation diminishes the perceived complexity and mystery surrounding the behavior of neural networks.', 'Comparison of neural networks to decision trees and classical machine learning, leading to a decrease in the mystery surrounding neural networks. 
The comparison of neural networks to decision trees and classical machine learning techniques reduces the perceived enigmatic nature of neural networks, likening them to more familiar methods.']}], 'duration': 418.038, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw69.jpg', 'highlights': ['Yann LeCun refutes the claim that deep learning is limited to interpolation and not for extrapolation, emphasizing that the qualitative difference is not in the form of fundamentally different things from deep learning.', 'In high dimensions, all of machine learning is extrapolation. Yann LeCun argues that in high dimensions, machine learning operates through extrapolation rather than interpolation, challenging traditional notions of learning.', 'The revelation that neural networks recursively chop up the input space into little convex cells or polyhedra, leading to a shift in understanding of their behavior.', 'Realization that each input prediction is representable with a single linear affine transformation, leading to a decrease in the mystery surrounding neural networks.']}, {'end': 1740.346, 'segs': [{'end': 816.549, 'src': 'embed', 'start': 791.212, 'weight': 11, 'content': [{'end': 809.039, 'text': "Another cool thing to come out of Randall's work was a geometrically principled way of devising regularization penalty terms which can improve neural network performance by orthogonalizing the placement of those latent hyperplane boundaries to increase their representational power.", 'start': 791.212, 'duration': 17.827}, {'end': 816.549, 'text': 'In short, looking at neural networks through this piecewise linear kaleidoscope, if you will,', 'start': 810.18, 'duration': 6.369}], 'summary': "Randall's work devises regularization penalty terms to improve neural network performance by orthogonalizing latent hyperplane boundaries.", 'duration': 25.337, 'max_score': 791.212, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw791212.jpg'}, {'end': 895.082, 'src': 'embed', 'start': 866.85, 'weight': 9, 'content': [{'end': 874.073, 'text': 'where the name of the game is to perturb the input as little as possible to the nearest polyhedron with a different class.', 'start': 866.85, 'duration': 7.223}, {'end': 880.776, 'text': "Anyhow, for me, Randall's work has transformed the way I think about neural networks.", 'start': 874.933, 'duration': 5.843}, {'end': 884.757, 'text': "I know what they're doing at a much deeper level now.", 'start': 881.716, 'duration': 3.041}, {'end': 895.082, 'text': 'Each layer of a neural network contributes a new set of hyperplanes, and the ReLUs act to toggle the hyperplanes in an input-sensitive way.', 'start': 885.778, 'duration': 9.304}], 'summary': "Randall's work transformed understanding of neural networks at a deeper level.", 'duration': 28.232, 'max_score': 866.85, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw866850.jpg'}, {'end': 1119.688, 'src': 'embed', 'start': 1092.123, 'weight': 5, 'content': [{'end': 1096.544, 'text': "I mean, I couldn't help but notice that everything in the machine learning world is linear.", 'start': 1092.123, 'duration': 4.421}, {'end': 1101.805, 'text': 'All the popular algorithms are linear, and any non-linearity is a trick.', 'start': 1097.104, 'duration': 4.701}, {'end': 1107.006, 'text': 'I mean we just apply some non-linear transformation to the data before running it through our algorithms,', 'start': 1101.825, 'duration': 5.181}, {'end': 1112.267, 'text': 'just like how it is in support vector machines or kernel ridge regression and Gaussian processes.', 'start': 1107.006, 'duration': 5.261}, {'end': 1114.007, 'text': 'Deep learning models are no different.', 'start': 1112.707, 'duration': 1.3}, {'end': 1119.688, 'text': "We're just placing these ReLUs all over the 
input space to slice it and dice it.", 'start': 1114.387, 'duration': 5.301}], 'summary': 'Machine learning algorithms mostly rely on linearity with non-linear tricks, such as applying transformations or using ReLUs in deep learning models.', 'duration': 27.565, 'max_score': 1092.123, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw1092123.jpg'}, {'end': 1248.727, 'src': 'embed', 'start': 1217.764, 'weight': 0, 'content': [{'end': 1222.269, 'text': 'I imagined neural network latent space is a bit like a UMAP or a t-SNE projection plot.', 'start': 1217.764, 'duration': 4.505}, {'end': 1224.911, 'text': 'And this leads us to misunderstand their behavior.', 'start': 1222.689, 'duration': 2.222}, {'end': 1229.296, 'text': 'The latent space that these examples get projected into is not homogeneous.', 'start': 1225.312, 'duration': 3.984}, {'end': 1236.501, 'text': 'Depending on which cell you fell into in the input space or the ambient space, a different affine transformation will be applied,', 'start': 1229.616, 'duration': 6.885}, {'end': 1239.123, 'text': 'sending you to a different region of the latent space.', 'start': 1236.501, 'duration': 2.622}, {'end': 1248.727, 'text': 'So the latent space is kind of stitched together like bits of a cosmic jigsaw puzzle in the ambient space, and then, when you run UMAP on the latent,', 'start': 1239.123, 'duration': 9.604}], 'summary': 'Neural network latent space behaves non-homogeneously, leading to varied affine transformations and regions in the ambient space.', 'duration': 30.963, 'max_score': 1217.764, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw1217764.jpg'}, {'end': 1280.875, 'src': 'embed', 'start': 1254.15, 'weight': 2, 'content': [{'end': 1259.813, 'text': 'diffeomorphic transformation of the entire input space and successive layers in the neural network,', 'start': 1254.15, 'duration': 5.663}, 
{'end': 1262.334, 'text': 'or even learning the topology of the data manifold.', 'start': 1259.813, 'duration': 2.521}, {'end': 1266.859, 'text': "I think it's much better to think of neural networks as quantizing the input space,", 'start': 1262.754, 'duration': 4.105}, {'end': 1270.623, 'text': 'much like a vector search engine does using locality-sensitive hashing.', 'start': 1266.859, 'duration': 3.764}, {'end': 1274.487, 'text': 'Now imagine a classification problem on the Cartesian plane,', 'start': 1271.244, 'duration': 3.243}, {'end': 1280.875, 'text': 'where the upper right and lower left quadrants are blue and the upper left and lower right quadrants are orange.', 'start': 1274.487, 'duration': 6.388}], 'summary': 'Neural networks quantize input space like vector search engines, enabling efficient classification.', 'duration': 26.725, 'max_score': 1254.15, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw1254150.jpg'}, {'end': 1547.602, 'src': 'embed', 'start': 1513.77, 'weight': 6, 'content': [{'end': 1517.592, 'text': 'And he was showing figures from his paper, and they all had two things in common.', 'start': 1513.77, 'duration': 3.822}, {'end': 1529.297, 'text': 'First, they depicted very simple relationships that were previously undiscovered, okay? 
And second, all the fitted models were piecewise linear.', 'start': 1518.512, 'duration': 10.785}, {'end': 1538.957, 'text': "He even explicitly commented along the lines that of course there's some underlying smooth nonlinear relationship,", 'start': 1530.612, 'duration': 8.345}, {'end': 1547.602, 'text': "but absent a solid theoretical model of that which is often going to be the case for new discoveries, you're better off with simple,", 'start': 1538.957, 'duration': 8.645}], 'summary': 'Figures in paper showed simple relationships, all fitted models were piecewise linear.', 'duration': 33.832, 'max_score': 1513.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw1513770.jpg'}, {'end': 1615.472, 'src': 'embed', 'start': 1594.11, 'weight': 3, 'content': [{'end': 1604.282, 'text': 'It conjures a scene of Darth Vader, that cybernetic warlord towering over a growing neural network saying, embrace your hyperplanes.', 'start': 1594.11, 'duration': 10.172}, {'end': 1611.73, 'text': "Now, you know I've always been skeptical and often remind us of the limitations of today's machine learning.", 'start': 1605.888, 'duration': 5.842}, {'end': 1615.472, 'text': "I'd say things like, ML isn't magic learning.", 'start': 1612.33, 'duration': 3.142}], 'summary': 'Skeptical view of machine learning, comparing it to darth vader and emphasizing its limitations.', 'duration': 21.362, 'max_score': 1594.11, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw1594110.jpg'}, {'end': 1672.798, 'src': 'embed', 'start': 1642.905, 'weight': 7, 'content': [{'end': 1648.248, 'text': 'A neuron puts in a hyperplane and then lets the rest of the network chop more as needed.', 'start': 1642.905, 'duration': 5.343}, {'end': 1655.992, 'text': 'All the success of neural networks seems explained by piecewise linear functions.', 'start': 1649.468, 'duration': 6.524}, {'end': 1668.652, 'text': 
'I also find it intriguing that our own brains, our own wet neural networks, have somehow gained access to smooth, nonlinear imagination.', 'start': 1657.617, 'duration': 11.035}, {'end': 1672.798, 'text': "Just look at the laws we've defined in physics.", 'start': 1669.914, 'duration': 2.884}], 'summary': 'Neural networks rely on piecewise linear functions; brains access smooth, nonlinear imagination.', 'duration': 29.893, 'max_score': 1642.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw1642905.jpg'}], 'start': 418.747, 'title': 'Challenges and perspectives in ml', 'summary': 'Covers challenges in ml, including dimensionality and extrapolation, deceptive nature of neural networks, and implications for generalization. it also discusses the production process, importance of engineering rigor, and understanding neural networks through a piecewise linear perspective.', 'chapters': [{'end': 560.295, 'start': 418.747, 'title': 'Challenges in machine learning', 'summary': 'Discusses the challenges of dimensionality and extrapolation in machine learning, the deceptive nature of neural networks, and the implications for generalization in deep learning, highlighting the importance of domain-specific information and the role of weights & biases in mlops.', 'duration': 141.548, 'highlights': ['The challenge of the curse of dimensionality and the need to know what to ignore in the input space due to exponential growth with dimensions.', 'The problem of extrapolation outside of training data and its implications for generalization in deep learning.', 'The deceptive nature of neural networks and the incorporation of human-crafted domain knowledge, impacting their blank slate perception.', 'The importance of Weights & Biases as a developer-first MLOps platform for experiment tracking, dataset versioning, and model management.']}, {'end': 711.579, 'start': 560.295, 'title': 'Ml model production process', 'summary': 
'Discusses the importance of engineering rigor in bringing machine learning systems to production and the benefits of using weights & biases, including improved model performance, reproducibility, and management information.', 'duration': 151.284, 'highlights': ['Weights & Biases helps improve model performance and increases knowledge sharing within the team. It makes your team more resilient by increasing your knowledge sharing. It helps you build models which not only have better predictive performance, but are more safe and secure.', 'Weights & Biases facilitates reproducibility and immortalizes important decisions in the machine learning process. Because Weights and Biases orchestrates the entire machine learning process, your models are reproducible and important decisions are immortalized. This fine level of control lets you set guardrails where you need to and reduce friction wherever possible.', 'The speaker emphasizes the importance of engineering fundamentals and reproducibility in ML model development. Engineering fundamentals are so important when you bring machine learning systems to production, as is helping data scientists to become first class citizens in the entire lifecycle of model development and deployment.']}, {'end': 1175.159, 'start': 711.579, 'title': 'Understanding neural networks through piecewise linear perspective', 'summary': "Discusses randall's work on understanding neural networks as compositions of linear functions in polyhedra, providing new technical understanding and insights into neural network functioning, and the implications of piecewise linear models on the structure and learning of neural networks.", 'duration': 463.58, 'highlights': ["Randall's work demonstrates that a large class of neural networks can be entirely rewritten as compositions of linear functions arranged in polyhedra, providing new technical understanding and insights into neural network functioning. 
Randall's work shows that neural networks, including CNNs, ResNets, and RNNs, can be rewritten as compositions of linear functions in polyhedra, shedding new light on the functioning of neural networks.", 'The piecewise linear perspective on neural networks opens new avenues of technical understanding and exploration, offering a geometrically principled way of devising regularization penalty terms to improve neural network performance. The piecewise linear perspective on neural networks provides a principled way of devising regularization terms to enhance neural network performance and opens new technical understanding and exploration.', 'The discussion delves into the implications of piecewise linear models on the structure and learning of neural networks, challenging the traditional view of neural networks and emphasizing the importance of piecewise linear transformations. The chapter challenges the traditional view of neural networks, emphasizing the significance of piecewise linear transformations and their impact on the structure and learning of neural networks.']}, {'end': 1409.376, 'start': 1175.719, 'title': 'Understanding neural networks behavior', 'summary': 'Delves into the behavior of neural networks, discussing the exponential function space of nonlinear functions, the role of relu cells in defining boundaries, and the composition of piecewise linear functions in forming decision surfaces in the ambient space.', 'duration': 233.657, 'highlights': ['Neural networks quantize the input space like a vector search engine using locality-sensitive hashing, and they share information by reusing hyperplanes to form decision boundaries. 
Neural networks act as a vector search engine, quantizing the input space and sharing information by reusing hyperplanes, forming decision boundaries.', 'The composition of piecewise linear functions in the second layer forms a decision surface in the ambient space, creating a smooth appearance through the combination of piecewise linear chops. The second layer of neural networks composes piecewise linear functions to form a decision surface in the ambient space, creating a seemingly smooth appearance.', 'Negative weights allow hyperplanes to combine in interesting ways, partially canceling each other out to enable more complex decision boundaries. Negative weights in neural networks enable hyperplanes to combine and partially cancel out, allowing for more complex decision boundaries.']}, {'end': 1740.346, 'start': 1410.016, 'title': 'Piecewise linear functions in neural networks', 'summary': 'Discusses the prevalence of piecewise linear functions in neural networks, emphasizing the importance of linearizing data and the limitations of these functions in capturing smooth, nonlinear relationships, and extrapolation in the general setting.', 'duration': 330.33, 'highlights': ['Neural networks primarily utilize piecewise linear functions to chop up the input space, showcasing the necessity of linearizing data before it enters the network.', 'The limitations of piecewise linear functions in capturing smooth, nonlinear relationships and extrapolation in the general setting are emphasized, questioning the ability of neural networks to learn fundamental smooth relationships.', 'Emphasizing the prevalence of piecewise linear functions in successful deep networks, suggesting that they have abandoned the pursuit of smooth, nonlinear models in favor of simplicity and scalability.', "The contrast between neural networks' flat boundaries and sharp edges in chopping up input space and the smooth, nonlinear imagination of the human brain is highlighted, questioning the ability of 
neural networks to discover fundamental smooth relationships.", 'The importance of finding interpolative representations in machine learning is discussed, with an example illustrating the limitations of pixel grids in capturing interpolative representations compared to using Euclidean coordinates.']}], 'duration': 1321.599, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw418747.jpg', 'highlights': ['The challenge of the curse of dimensionality and the need to know what to ignore in the input space due to exponential growth with dimensions.', 'The problem of extrapolation outside of training data and its implications for generalization in deep learning.', 'The deceptive nature of neural networks and the incorporation of human-crafted domain knowledge, impacting their blank slate perception.', "Randall's work demonstrates that a large class of neural networks can be entirely rewritten as compositions of linear functions arranged in polyhedra, providing new technical understanding and insights into neural network functioning.", 'The piecewise linear perspective on neural networks opens new avenues of technical understanding and exploration, offering a geometrically principled way of devising regularization penalty terms to improve neural network performance.', 'Neural networks quantize the input space like a vector search engine using locality-sensitive hashing, and they share information by reusing hyperplanes to form decision boundaries.', 'The composition of piecewise linear functions in the second layer forms a decision surface in the ambient space, creating a smooth appearance through the combination of piecewise linear chops.', 'Negative weights allow hyperplanes to combine in interesting ways, partially canceling each other out to enable more complex decision boundaries.', 'Neural networks primarily utilize piecewise linear functions to chop up the input space, showcasing the necessity of linearizing data 
before it enters the network.', 'The limitations of piecewise linear functions in capturing smooth, nonlinear relationships and extrapolation in the general setting are emphasized, questioning the ability of neural networks to learn fundamental smooth relationships.', 'Emphasizing the prevalence of piecewise linear functions in successful deep networks, suggesting that they have abandoned the pursuit of smooth, nonlinear models in favor of simplicity and scalability.', 'The importance of finding interpolative representations in machine learning is discussed, with an example illustrating the limitations of pixel grids in capturing interpolative representations compared to using Euclidean coordinates.']}, {'end': 4079.101, 'segs': [{'end': 1931.181, 'src': 'embed', 'start': 1903.914, 'weight': 1, 'content': [{'end': 1909.735, 'text': "Well, a problem that's intrinsically interpolative, but they won't generalize systematically to anything else.", 'start': 1903.914, 'duration': 5.821}, {'end': 1917.638, 'text': "You won't generalize when looking at problems that are not interpolative in nature and problems that are outside the training distribution.", 'start': 1910.116, 'duration': 7.522}, {'end': 1920.939, 'text': 'So why do we talk about this binary notion of extrapolation?', 'start': 1918.378, 'duration': 2.561}, {'end': 1931.181, 'text': "Well, the problem with the binary convex hull notion of extrapolation is that we're promoting this idea that the moment an example falls epsilon outside the hull,", 'start': 1921.459, 'duration': 9.722}], 'summary': 'Challenges of generalizing to non-interpolative problems and promoting binary notion of extrapolation.', 'duration': 27.267, 'max_score': 1903.914, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw1903914.jpg'}, {'end': 2342.671, 'src': 'embed', 'start': 2321.295, 'weight': 15, 'content': [{'end': 2329.978, 'text': "but, as we'll discover slightly later, neural 
networks might have a slight advantage over pure play, simplex interpolation, just because they work,", 'start': 2321.295, 'duration': 8.683}, {'end': 2333.48, 'text': 'principally by figuring out boundaries of the input space to exclude.', 'start': 2329.978, 'duration': 3.502}, {'end': 2336.684, 'text': "So you know, if you're on the zero side of the relu, that is.", 'start': 2334.12, 'duration': 2.564}, {'end': 2342.671, 'text': 'So that helps a lot in high dimensions when you have a poor sampling density in a particular region of the input space.', 'start': 2337.164, 'duration': 5.507}], 'summary': 'Neural networks excel in determining input space boundaries, particularly in high dimensions with low sampling density.', 'duration': 21.376, 'max_score': 2321.295, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw2321295.jpg'}, {'end': 2499.731, 'src': 'embed', 'start': 2473.422, 'weight': 14, 'content': [{'end': 2477.843, 'text': 'which are a way of adding a kind of enforced smoothness to the model predictions.', 'start': 2473.422, 'duration': 4.421}, {'end': 2482.665, 'text': 'So if you hit the right bias, then it can be beneficial.', 'start': 2478.323, 'duration': 4.342}, {'end': 2487.446, 'text': "If you impose the wrong bias, then it's going to hurt you.", 'start': 2483.445, 'duration': 4.001}, {'end': 2489.187, 'text': 'And this is a well-known trade-off.', 'start': 2487.806, 'duration': 1.381}, {'end': 2499.731, 'text': "So, of course, the whole endeavor of machine learning is defining the right inductive biases and leaving whatever you don't know to the data,", 'start': 2489.227, 'duration': 10.504}], 'summary': 'Inductive biases in machine learning can impact model predictions. 
finding the right bias is crucial for beneficial outcomes.', 'duration': 26.309, 'max_score': 2473.422, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw2473422.jpg'}, {'end': 2587.895, 'src': 'embed', 'start': 2561.407, 'weight': 2, 'content': [{'end': 2567.433, 'text': "In order to even know that we're in an extrapolative regime, the basis functions must be based near to the training examples.", 'start': 2561.407, 'duration': 6.026}, {'end': 2572.438, 'text': "Typically, neural networks do not behave sensibly in regions where there's no training.", 'start': 2568.094, 'duration': 4.344}, {'end': 2573.119, 'text': 'examples right?', 'start': 2572.438, 'duration': 0.681}, {'end': 2578.344, 'text': 'Because their learned basis functions have been localized around the training data in the ambient space.', 'start': 2573.159, 'duration': 5.185}, {'end': 2583.49, 'text': "It's also worth noting that extrapolation and interpolation are two completely different regimes.", 'start': 2579.365, 'duration': 4.125}, {'end': 2587.895, 'text': 'Optimizing for one typically means being worse at the other.', 'start': 2583.951, 'duration': 3.944}], 'summary': 'Neural networks rely on localized basis functions near training data, impacting extrapolation and interpolation performance.', 'duration': 26.488, 'max_score': 2561.407, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw2561407.jpg'}, {'end': 2792.463, 'src': 'embed', 'start': 2766.401, 'weight': 3, 'content': [{'end': 2773.105, 'text': "Gary Marcus talks about the parlor trick of intelligence, but isn't it ironic that there are more parlor tricks going on than most people realize??", 'start': 2766.401, 'duration': 6.704}, {'end': 2781.329, 'text': 'Namely that rather than doing smooth geometric morphing via interpolation, these networks are actually chopping up and composing linear polyhedra?', 'start': 2773.565, 
'duration': 7.764}, {'end': 2788.241, 'text': 'Now, if you interpolate between two latent classes, it might traverse several polyhedra in the intermediate space.', 'start': 2782.219, 'duration': 6.022}, {'end': 2792.463, 'text': 'Along the way, it would pick up characteristics from all of those polyhedra.', 'start': 2788.762, 'duration': 3.701}], 'summary': 'Gary marcus discusses the use of parlor tricks in neural networks for interpolation and the traversal of multiple polyhedra in the intermediate space', 'duration': 26.062, 'max_score': 2766.401, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw2766401.jpg'}, {'end': 2891.701, 'src': 'embed', 'start': 2862.116, 'weight': 11, 'content': [{'end': 2865.099, 'text': "At best, we think they're learning some approximate aspects of it.", 'start': 2862.116, 'duration': 2.983}, {'end': 2875.997, 'text': 'So essentially, all machine learning problems that we need to deal with nowadays are extremely highly dimensional.', 'start': 2870.856, 'duration': 5.141}, {'end': 2880.878, 'text': 'Even basic image problems live in thousands or even millions of dimensions.', 'start': 2876.637, 'duration': 4.241}, {'end': 2888.7, 'text': 'Now, I think most people have this intuition of convex hull membership, which is to say in two dimensions as you sample more and more training data,', 'start': 2881.498, 'duration': 7.202}, {'end': 2891.701, 'text': 'the convex hull eventually fills the entire space.', 'start': 2888.7, 'duration': 3.001}], 'summary': 'Machine learning problems are highly dimensional, even in millions of dimensions, posing challenges for training data sampling.', 'duration': 29.585, 'max_score': 2862.116, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw2862116.jpg'}, {'end': 2930.421, 'src': 'embed', 'start': 2902.044, 'weight': 12, 'content': [{'end': 2908.588, 'text': 'It only works if we make some very 
strong assumptions about the regularities in the space of functions that we need to search through.', 'start': 2902.044, 'duration': 6.544}, {'end': 2913.27, 'text': 'The classical assumptions that we make in machine learning are no longer appropriate.', 'start': 2909.128, 'duration': 4.142}, {'end': 2916.192, 'text': 'So in general learning in high dimensions is intractable.', 'start': 2913.73, 'duration': 2.462}, {'end': 2919.914, 'text': 'The number of samples grows exponentially with the number of dimensions,', 'start': 2916.532, 'duration': 3.382}, {'end': 2930.421, 'text': 'and the curse of dimensionality refers to the various phenomena that arise when analyzing and organizing data in high dimensional spaces that do not occur in low dimensional settings,', 'start': 2919.914, 'duration': 10.507}], 'summary': 'Learning in high dimensions is intractable due to exponential growth of samples and curse of dimensionality.', 'duration': 28.377, 'max_score': 2902.044, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw2902044.jpg'}, {'end': 3131.229, 'src': 'embed', 'start': 3107.427, 'weight': 5, 'content': [{'end': 3115.586, 'text': 'In two dimensions, that is just the area of a disk with diameter 1, and is about 79%.', 'start': 3107.427, 'duration': 8.159}, {'end': 3124.848, 'text': 'Extending to three dimensions, imagine a ball in a cube, and we scan a square from one face through the ball to the opposite face.', 'start': 3115.586, 'duration': 9.262}, {'end': 3131.229, 'text': 'As it passes the center, we have our familiar inscribed disc in a square.', 'start': 3126.068, 'duration': 5.161}], 'summary': 'In 2d, the area of a disk with diameter 1 is about 79%, extending to 3d creates an inscribed disc in a square.', 'duration': 23.802, 'max_score': 3107.427, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw3107427.jpg'}, {'end': 3222.077, 'src': 'embed', 
'start': 3193.267, 'weight': 0, 'content': [{'end': 3204.351, 'text': 'As dimensionality grows, space expands exponentially, points grow further apart, and the volume near each point vanishes.', 'start': 3193.267, 'duration': 11.084}, {'end': 3212.795, 'text': 'In this episode, our guests argue this curse dooms traditional concepts of interpolation,', 'start': 3206.532, 'duration': 6.263}, {'end': 3218.577, 'text': 'even if we allow for the high dimensional transformative power of deep neural networks.', 'start': 3212.795, 'duration': 5.782}, {'end': 3222.077, 'text': 'Yeah, so the curse of dimensionality.', 'start': 3219.315, 'duration': 2.762}], 'summary': 'High dimensionality causes exponential space expansion and challenges traditional interpolation concepts.', 'duration': 28.81, 'max_score': 3193.267, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw3193267.jpg'}, {'end': 3283.396, 'src': 'embed', 'start': 3249.902, 'weight': 6, 'content': [{'end': 3252.785, 'text': 'my algorithm is going to have more and more trouble keeping pace.', 'start': 3249.902, 'duration': 2.883}, {'end': 3256.949, 'text': 'And so this curse can take different flavors, right?', 'start': 3252.785, 'duration': 4.164}, {'end': 3268.178, 'text': 'So this curse might have a statistical reason, in the sense that as I make my input space bigger, there would be many, many, many,', 'start': 3256.949, 'duration': 11.229}, {'end': 3273.044, 'text': 'exponentially more functions, real functions out there', 'start': 3268.178, 'duration': 4.866}, {'end': 3276.348, 'text': 'that would explain the training set, that would basically pass through the training points.', 'start': 3273.044, 'duration': 3.304}, {'end': 3283.396, 'text': 'And so the more dimensions I add, the more uncertainty I have about the true function, right?', 'start': 3277.509, 'duration': 5.887}], 'summary': 'Algorithm faces challenges with larger 
input space leading to more uncertainty.', 'duration': 33.494, 'max_score': 3249.902, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw3249902.jpg'}, {'end': 3400.755, 'src': 'embed', 'start': 3377.667, 'weight': 7, 'content': [{'end': 3387.791, 'text': 'He received his PhD in computer science from the modern-day Sorbonne University in 1987, during which he proposed an early form of backpropagation,', 'start': 3377.667, 'duration': 10.124}, {'end': 3390.972, 'text': 'which is of course the backbone for training all neural networks.', 'start': 3387.791, 'duration': 3.181}, {'end': 3396.113, 'text': 'Whilst he was at Bell Labs, he invented convolutional neural networks,', 'start': 3392.072, 'duration': 4.041}, {'end': 3400.755, 'text': 'which again are the backbone of most major deep learning architectures in production today.', 'start': 3396.113, 'duration': 4.642}], 'summary': 'In 1987, he proposed an early backpropagation and invented convolutional neural networks at bell labs.', 'duration': 23.088, 'max_score': 3377.667, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw3377667.jpg'}, {'end': 3662.67, 'src': 'embed', 'start': 3636.209, 'weight': 8, 'content': [{'end': 3645.62, 'text': 'So here we adopt a definition which is an obvious and simple generalization of interpolation in low dimension,', 'start': 3636.209, 'duration': 9.411}, {'end': 3650.286, 'text': 'which is that you interpolate when a point is in between the points you already know.', 'start': 3645.62, 'duration': 4.666}, {'end': 3662.67, 'text': 'And the generalization of this in high dimension is you interpolate when a new point is inside the convex hull of the points that you already know.', 'start': 3652.207, 'duration': 10.463}], 'summary': 'Interpolation generalization: new point inside convex hull of known points', 'duration': 26.461, 'max_score': 3636.209, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw3636209.jpg'}, {'end': 3802.505, 'src': 'embed', 'start': 3735.295, 'weight': 4, 'content': [{'end': 3737.796, 'text': 'The smallest ellipsoid that contains all the points right?', 'start': 3735.295, 'duration': 2.501}, {'end': 3742.838, 'text': 'So some points are going to be on the surface of the ellipsoid, but most of them are going to be inside.', 'start': 3737.896, 'duration': 4.942}, {'end': 3748.92, 'text': 'And for this type of and now your definition of interpolation is,', 'start': 3744.018, 'duration': 4.902}, {'end': 3753.222, 'text': 'that is a new point likely to be inside the ellipsoid of your previous point or outside?', 'start': 3748.92, 'duration': 4.302}, {'end': 3763.588, 'text': "And the answer to that is probably very different from the one in the paper, in the sense that it's very likely for a lot of natural data,", 'start': 3754.724, 'duration': 8.864}, {'end': 3768.189, 'text': 'new points are likely to be inside the containing ellipsoid.', 'start': 3763.588, 'duration': 4.601}, {'end': 3771.871, 'text': 'So it very much depends on what you mean,', 'start': 3769.05, 'duration': 2.821}, {'end': 3779.634, 'text': "but it's just that the notion of interpolation in high-dimensional space or intuition are kind of biased toward low dimension,", 'start': 3771.871, 'duration': 7.763}, {'end': 3780.994, 'text': 'and we have to be very careful what we say.', 'start': 3779.634, 'duration': 1.36}, {'end': 3785.537, 'text': 'So that was the main thing.', 'start': 3783.216, 'duration': 2.321}, {'end': 3792.802, 'text': 'And we have a bunch of, it turns out, mathematicians that worked on these questions for many years.', 'start': 3786.158, 'duration': 6.644}, {'end': 3797.465, 'text': "And there's a whole bunch of theorems about this that we survey in the paper.", 'start': 3792.982, 'duration': 4.483}, {'end': 3800.084, 'text': 'Very interesting.', 'start': 
3799.403, 'duration': 0.681}, {'end': 3802.505, 'text': "Well, we'll dig more into that in a second.", 'start': 3800.104, 'duration': 2.401}], 'summary': 'Interpolation in high-dimensional space is biased; new points likely inside containing ellipsoid.', 'duration': 67.21, 'max_score': 3735.295, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw3735295.jpg'}, {'end': 4079.101, 'src': 'embed', 'start': 4046.637, 'weight': 13, 'content': [{'end': 4054.266, 'text': 'No, I think the essence of being a scientist is to be able to change your mind in the face of evidence.', 'start': 4046.637, 'duration': 7.629}, {'end': 4059.552, 'text': 'You cannot be a scientist if you have preconceived ideas.', 'start': 4055.688, 'duration': 3.864}, {'end': 4067.236, 'text': "On the other hand, I've also been known to hold very tight to ideas that I thought were true,", 'start': 4059.832, 'duration': 7.404}, {'end': 4073.099, 'text': 'in the face of considerable differing opinion from my dear colleagues.', 'start': 4067.236, 'duration': 5.863}, {'end': 4079.101, 'text': 'So it also helps to have deeply held convictions sometimes.', 'start': 4074.399, 'duration': 4.702}], 'summary': 'Being a scientist requires open-mindedness, yet also deeply held convictions at times.', 'duration': 32.464, 'max_score': 4046.637, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw4046637.jpg'}], 'start': 1741.727, 'title': 'Interpolation and generalization in machine learning', 'summary': 'Delves into how deep learning models generalize via interpolation, the limitations of neural networks in generalization, the concept of interpolation in machine learning, understanding machine learning interpolation, the curse of dimensionality, and the challenges of interpolation vs extrapolation in high-dimensional learning.', 'chapters': [{'end': 2019.372, 'start': 1741.727, 'title': 'Deep learning models 
and generalization', 'summary': 'Discusses how deep learning models generalize via interpolation, the limitations of neural networks in generalization, and the importance of structural priors in models to enable systematic generalization, emphasizing the proximity to known situations as crucial for local generalization.', 'duration': 277.645, 'highlights': ['Deep learning models generalize via interpolation, working well with data within the training distribution but not systematically to anything outside the training distribution. Deep learning models are effective for data within the training distribution, but they struggle to generalize to data outside the training distribution.', 'Neural networks lack efficiency in generalization without relevant nonlinear transformation fed into the model by data scientists. Neural networks require data scientists to feed in relevant nonlinear transformations for efficient generalization.', 'The importance of structural priors in models to enable systematic generalization, emphasizing the proximity to known situations as crucial for local generalization. 
Structural priors in models are crucial for systematic generalization, with proximity to known situations being vital for local generalization.']}, {'end': 2232.525, 'start': 2019.892, 'title': 'Interpolation in machine learning', 'summary': 'Explores the concept of interpolation in machine learning, discussing the intrinsic nature of problems, the role of representations, and the limitations of current definitions, as well as the need for a new definition of interpolation in high dimensions.', 'duration': 212.633, 'highlights': ['The key endeavor in machine learning is to find better representations that reveal the interpolative nature of the problem, emphasizing the importance of feature engineering and representation learning.', 'The paper demonstrates that even in relatively small latent spaces on popular neural network classifiers, interpolation does not occur, challenging the traditional conceptualization of interpolation based on a few important latent factors.', 'The probability of test data being in the convex hull of the training data is near zero past a certain number of dimensions, highlighting the limitations of the current definition of interpolation in neural networks and the need for a new definition directly linked to generalization.']}, {'end': 2664.171, 'start': 2232.525, 'title': 'Understanding machine learning interpolation', 'summary': 'Discusses the concept of machine learning as an interpolation problem, the limitations in high-dimensional regression problems, the introduction of inductive biases, and the trade-off between interpolation and extrapolation in neural networks.', 'duration': 431.646, 'highlights': ['The crux of the problem is approximating a function with unique behavior in more than 20 dimensions, which becomes hopeless due to the lack of data density, as explained by Dr. Thomas Lukes from Meta AI Research. Dr. 
Thomas Lukes highlights the difficulty of approximating functions with unique behavior in high dimensions, stating that beyond 20 dimensions, data density becomes insufficient, impacting the feasibility of approximation.', 'Machine learning models struggle in an extrapolative regime, leading to the necessity of introducing inductive biases to enforce smoothness in model predictions, with the trade-off between beneficial and detrimental biases. The transcript emphasizes the challenge of making machine learning models work in an extrapolative regime, leading to the need for inductive biases to enforce smoothness in predictions, with the trade-off between beneficial and detrimental biases.', "Neural networks exhibit a boundary view in the discriminative setting, while understanding their behavior in the generative setting requires interpolation, showcasing the difference between extrapolation and interpolation. The distinction between neural networks' behavior in the discriminative and generative settings is highlighted, emphasizing the need for interpolation to understand neural network behavior in the generative setting and the difference between extrapolation and interpolation."]}, {'end': 3452.825, 'start': 2664.171, 'title': 'Understanding the curse of dimensionality', 'summary': 'Discusses the curse of dimensionality in machine learning, the manifold hypothesis, and the challenges associated with high dimensional spaces, citing examples and expert insights.', 'duration': 788.654, 'highlights': ["Yann LeCun's significant contributions to deep learning and machine learning, including his early work on backpropagation and invention of convolutional neural networks. 
Yann LeCun is a Turing Award winner known for proposing an early form of backpropagation and inventing convolutional neural networks, which are fundamental to many deep learning architectures.", 'The explanation of the curse of dimensionality, its impact on algorithms, and the challenges it poses in statistical learning, approximation, and computation. The curse of dimensionality causes challenges in statistical learning, approximation, and computation due to the exponential growth in complexity as the input space grows larger.', 'Insights into the manifold hypothesis and its implications for high-dimensional data, emphasizing the difficulty of neural networks in learning the data manifold. The chapter delves into the manifold hypothesis, stating that neural networks may struggle to learn the data manifold, which challenges the common belief in their ability to do so.']}, {'end': 3733.774, 'start': 3452.825, 'title': 'Interpolation vs extrapolation in high dimension learning', 'summary': 'Discusses the limitations of the dichotomy between interpolation and extrapolation in high-dimensional learning, emphasizing that machine learning in high dimensions operates differently from low dimension curve fitting and that any new point in high dimension is almost always outside the convex hull of existing points, challenging the conventional definitions of interpolation and extrapolation.', 'duration': 280.949, 'highlights': ['Machine learning in high dimensions operates differently from low dimension curve fitting, challenging the conventional definitions of interpolation and extrapolation The chapter emphasizes that in high dimension, the geometry is very different from the intuition formed with curve fitting in low dimension, challenging the dismissal of machine learning and deep learning as only performing interpolation. 
It is stated that any new point in high dimension is almost always going to be outside the convex hull of existing points, challenging the conventional definitions of interpolation and extrapolation.', 'The paper aims to dispel the myth that machine learning, and deep learning in particular, only does interpolation The paper aims to dispel the myth that machine learning, and deep learning in particular, only does interpolation, and provide intuition about what really takes place in machine learning and high dimension.', 'The definition of interpolation in high dimension is when a new point is inside the convex hull of the points that are already known In high dimension, the definition of interpolation is when a new point is inside the convex hull of the points that are already known, where it is pointed out that the volume of that space is tiny compared to the overall volume of space in high dimension.']}, {'end': 4079.101, 'start': 3735.295, 'title': 'Interpolation and reasoning in high-dimensional space', 'summary': "Discusses the concept of interpolation in high-dimensional space, the potential for new points to be inside or outside the containing ellipsoid, and the debate surrounding deep learning models' reasoning capabilities and their distinction from supervised learning.", 'duration': 343.806, 'highlights': ['The chapter discusses the concept of interpolation in high-dimensional space and the potential for new points to be inside or outside the containing ellipsoid. Concept of interpolation in high-dimensional space, potential for new points to be inside or outside the containing ellipsoid', "The debate surrounding deep learning models' reasoning capabilities and their distinction from supervised learning is explored, emphasizing the limitations of supervised learning in contrast to deep learning. 
Debate on reasoning capabilities and distinction from supervised learning, limitations of supervised learning", 'The chapter also delves into the potential for deep learning models to reason and extrapolate, acknowledging the challenges and the need for compatibility with deep learning. Potential for deep learning models to reason and extrapolate, challenges and need for compatibility with deep learning']}], 'duration': 2337.374, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw1741727.jpg', 'highlights': ['Deep learning models struggle to generalize to data outside the training distribution.', 'Neural networks require relevant nonlinear transformations for efficient generalization.', 'Structural priors in models are crucial for systematic generalization.', 'The key endeavor in machine learning is to find better representations revealing the interpolative nature.', 'The probability of test data being in the convex hull of the training data is near zero past a certain number of dimensions.', 'Approximating a function with unique behavior in more than 20 dimensions becomes hopeless due to the lack of data density.', 'Machine learning models struggle in an extrapolative regime, leading to the necessity of introducing inductive biases.', 'Neural networks exhibit a boundary view in the discriminative setting, while understanding their behavior in the generative setting requires interpolation.', "Yann LeCun's significant contributions to deep learning and machine learning.", 'The curse of dimensionality causes challenges in statistical learning, approximation, and computation.', 'Machine learning in high dimensions operates differently from low dimension curve fitting, challenging the conventional definitions of interpolation and extrapolation.', 'The paper aims to dispel the myth that machine learning, and deep learning in particular, only does interpolation.', 'The definition of interpolation in high dimension is 
when a new point is inside the convex hull of the points that are already known.', 'The chapter discusses the concept of interpolation in high-dimensional space and the potential for new points to be inside or outside the containing ellipsoid.', "The debate surrounding deep learning models' reasoning capabilities and their distinction from supervised learning is explored.", 'The chapter also delves into the potential for deep learning models to reason and extrapolate, acknowledging the challenges and the need for compatibility with deep learning.']}, {'end': 5393.034, 'segs': [{'end': 4364.321, 'src': 'embed', 'start': 4335.954, 'weight': 9, 'content': [{'end': 4338.175, 'text': 'So an RBF network was basically a two-layer neural net.', 'start': 4335.954, 'duration': 2.221}, {'end': 4340.416, 'text': 'Well, the first layer was very much like an SVM.', 'start': 4338.455, 'duration': 1.961}, {'end': 4341.337, 'text': 'This was before SVMs.', 'start': 4340.436, 'duration': 0.901}, {'end': 4350.658, 'text': 'that again had radial basis function responses, comparing an input to two vectors and passing it to an exponential or something like this.', 'start': 4342.697, 'duration': 7.961}, {'end': 4353.779, 'text': 'And then you would initialize the first layer.', 'start': 4351.939, 'duration': 1.84}, {'end': 4357.9, 'text': "You could train it with backprop, but it would get stuck in local minima, so that wasn't a good idea.", 'start': 4353.859, 'duration': 4.041}, {'end': 4364.321, 'text': 'You had to initialize the first layer with something like k-means or a mixture of Gaussians or something like that.', 'start': 4358.16, 'duration': 6.161}], 'summary': 'RBF network: two-layer neural net with an SVM-like first layer, initialized with k-means or a mixture of Gaussians.', 'duration': 28.367, 'max_score': 4335.954, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw4335954.jpg'}, {'end': 4414.1, 'src': 'embed', 'start': 4377.285, 
'weight': 1, 'content': [{'end': 4379.866, 'text': 'they were kind of faster than the neural nets.', 'start': 4377.285, 'duration': 2.581}, {'end': 4384.289, 'text': 'so those things, for those things, the answer to your question is is a one type.', 'start': 4379.866, 'duration': 4.423}, {'end': 4391.553, 'text': "they're basically doing interpolation with kernels and uh, and it's very much like a smooth version of nearest neighbors.", 'start': 4384.289, 'duration': 7.264}, {'end': 4399.77, 'text': 'But then for classical neural nets, where you have either a hyperbolic tangent, nonlinearity or ReLU or something of that type,', 'start': 4393.265, 'duration': 6.505}, {'end': 4404.373, 'text': 'something with a kink in it or two kinks, the answer is different.', 'start': 4399.77, 'duration': 4.603}, {'end': 4414.1, 'text': "There it's a whole cone of response that will produce a positive response versus not, if you take a combination of units right?", 'start': 4404.433, 'duration': 9.667}], 'summary': 'Faster interpolation with kernels than neural nets, different response for classical neural nets.', 'duration': 36.815, 'max_score': 4377.285, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw4377285.jpg'}, {'end': 4495.783, 'src': 'embed', 'start': 4467.489, 'weight': 2, 'content': [{'end': 4474.974, 'text': "maybe we could use a ball and ellipsoid instead, but there's kind of a key thing there, which is that it's going across all dimensions.", 'start': 4467.489, 'duration': 7.485}, {'end': 4479.216, 'text': "So you're inside the convex hull.", 'start': 4475.254, 'duration': 3.962}, {'end': 4487.079, 'text': "it's a necessary condition that, on every single dimension, your sample data point falls within the range of the training data.", 'start': 4479.216, 'duration': 7.863}, {'end': 4495.783, 'text': "We could kind of go the opposite extreme and say that you're interpolating if any dimension falls within the 
training domains, or rather,", 'start': 4487.559, 'duration': 8.224}], 'summary': 'Using ball and ellipsoid, ensure sample data falls within training data range on all dimensions.', 'duration': 28.294, 'max_score': 4467.489, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw4467489.jpg'}, {'end': 4533.348, 'src': 'embed', 'start': 4507.71, 'weight': 11, 'content': [{'end': 4512.533, 'text': "like there's a subset of dimensions that might be salient for any particular data point.", 'start': 4507.71, 'duration': 4.823}, {'end': 4519.539, 'text': 'So why are we kind of using this very exponential extreme definition? It was still exponential.', 'start': 4513.454, 'duration': 6.085}, {'end': 4531.307, 'text': 'You can try to divide the set of dimensions into the ones that are useful and the ones that are just nuisance parameters that are not relevant for the task at hand.', 'start': 4520.22, 'duration': 11.087}, {'end': 4533.348, 'text': 'But first of all, that task is very difficult.', 'start': 4531.767, 'duration': 1.581}], 'summary': 'Analyzing dimensions for relevance in data points, aiming to separate useful from nuisance parameters, with difficulty in the task.', 'duration': 25.638, 'max_score': 4507.71, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw4507710.jpg'}, {'end': 4867.581, 'src': 'embed', 'start': 4820.375, 'weight': 7, 'content': [{'end': 4825.7, 'text': 'which is why I was disagreeing with Geoff Hinton about the usefulness of unsupervised learning.', 'start': 4820.375, 'duration': 5.325}, {'end': 4835.327, 'text': 'Because if you have a task at hand for which you have data that you can use to train a supervised system,', 'start': 4826.541, 'duration': 8.786}, {'end': 4844.491, 'text': 'why would you go to the trouble of pre-training a system in unsupervised mode,', 'start': 4835.327, 'duration': 9.164}, {'end': 4852.594, 
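'text': 'knowing that the unsupervised learning problem is considerably more complicated, from every aspect you can think of,', 'start': 4844.491, 'duration': 8.103}, {'end': 4854.174, 'text': 'certainly from the theoretical point of view?', 'start': 4852.594, 'duration': 1.58}, {'end': 4864.84, 'text': 'Vladimir Vapnik actually has kind of a similar opinion, one of the few things that he and I agree on, or agreed on at least,', 'start': 4855.295, 'duration': 9.545}, {'end': 4867.581, 'text': 'which is why would you want to solve a more complex problem than you have to?', 'start': 4864.84, 'duration': 2.741}], 'summary': 'Disagreeing with Geoff Hinton about the utility of unsupervised learning due to its complexity compared to supervised learning.', 'duration': 47.206, 'max_score': 4820.375, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw4820375.jpg'}, {'end': 5103.252, 'src': 'embed', 'start': 5075.64, 'weight': 10, 'content': [{'end': 5079.982, 'text': 'You had to have pairs of things that are compatible as well as pairs of things that are incompatible.', 'start': 5075.64, 'duration': 4.342}, {'end': 5085.645, 'text': "And, as you mentioned, there's just too many ways two things can be incompatible, and so that was kind of doomed to failure.", 'start': 5079.982, 'duration': 5.663}, {'end': 5087.385, 'text': 'And, you know, I played with this.', 'start': 5085.645, 'duration': 1.74}, {'end': 5090.307, 'text': 'You know, back then this used to be called Siamese networks.', 'start': 5087.385, 'duration': 2.922}, {'end': 5096.369, 'text': 'Right, I had a paper in 1992, 1993 on doing signature verification using those techniques.', 'start': 5090.307, 'duration': 6.062}, {'end': 5103.252, 'text': 'Geoff Hinton had a slightly different method, based on maximizing mutual information, with his former student, Sue Becker.', 'start': 5096.369, 'duration': 6.883}], 'summary': 'The challenges of compatibility and incompatibility in pairs led to a failed attempt, leading to further exploration in Siamese networks and signature verification techniques.', 'duration': 27.612, 'max_score': 5075.64, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw5075640.jpg'}, {'end': 5326.004, 'src': 'embed', 'start': 5229.124, 'weight': 0, 'content': [{'end': 5244.472, 'text': "it's the idea where you have two examples in your training set and you actually do an interpolation in input space to generate a fake example that's in between the two and you train it to produce the intermediate target between the two original examples.", 'start': 5229.124, 'duration': 15.348}, {'end': 5245.272, 'text': 'Mixup.', 'start': 5244.932, 'duration': 0.34}, {'end': 5247.192, 'text': 'Mixup, yeah.', 'start': 5245.612, 'duration': 1.58}, {'end': 5250.973, 'text': 'There is that, there is distillation.', 'start': 5248.113, 'duration': 2.86}, {'end': 5257.355, 'text': 'Various techniques like this are basically implicitly kind of ways to fill in the space between samples,', 'start': 5250.973, 'duration': 6.382}, {'end': 5261.456, 'text': 'with other kind of fake samples or virtual samples, if you want.', 'start': 5257.355, 'duration': 4.101}, {'end': 5271.658, 'text': 'There was a paper by Patrice Simard and me and John Denker many years ago when we were all at Bell Labs on something called tangent prop.', 'start': 5262.336, 'duration': 9.322}, {'end': 5275.679, 'text': 'So the idea of tangent prop was kind of somewhat similar.', 'start': 5271.738, 'duration': 3.941}, {'end': 5282.642, 'text': "The idea was you take a training sample and you're going to be able to distort that training sample in several ways.", 'start': 5275.679, 'duration': 6.963}, {'end': 5285.283, 'text': 'You could generate points by data augmentation.', 'start': 5282.682, 'duration': 2.601}, {'end': 5293.807, 'text': 'But the other 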
compatibility and incompatibility in pairs led to a failed attempt, leading to further exploration in siamese networks and signature verification techniques.', 'duration': 27.612, 'max_score': 5075.64, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw5075640.jpg'}, {'end': 5326.004, 'src': 'embed', 'start': 5229.124, 'weight': 0, 'content': [{'end': 5244.472, 'text': "it's the idea where you give two examples and you have two examples in your training set and you actually do an interpolation in input space to generate a fake example that's in between the two and you train it to produce the intermediate target between the two original examples.", 'start': 5229.124, 'duration': 15.348}, {'end': 5245.272, 'text': 'Mix up.', 'start': 5244.932, 'duration': 0.34}, {'end': 5247.192, 'text': 'Mix up, yeah.', 'start': 5245.612, 'duration': 1.58}, {'end': 5250.973, 'text': 'There is that there is distillation, there is.', 'start': 5248.113, 'duration': 2.86}, {'end': 5257.355, 'text': 'various techniques like this are basically implicitly kind of ways to fill in the space between samples,', 'start': 5250.973, 'duration': 6.382}, {'end': 5261.456, 'text': 'with other kind of fake samples or virtual samples, if you want.', 'start': 5257.355, 'duration': 4.101}, {'end': 5271.658, 'text': 'There was a paper by Patrice Simard and me and John Denker many years ago when we were all at Bell Labs on something called tangent prop.', 'start': 5262.336, 'duration': 9.322}, {'end': 5275.679, 'text': 'So the idea of tangent prop was kind of Somewhat similar.', 'start': 5271.738, 'duration': 3.941}, {'end': 5282.642, 'text': "the idea was you take a training sample and you're going to be able to distort that training sample in several ways.", 'start': 5275.679, 'duration': 6.963}, {'end': 5285.283, 'text': 'You could generate points by data augmentation.', 'start': 5282.682, 'duration': 2.601}, {'end': 5293.807, 'text': 'But the other 
thing you can do is just figure out the plane in which those augmentations live.', 'start': 5285.763, 'duration': 8.044}, {'end': 5299.689, 'text': 'And that plane is going to be a tangent plane to the data manifold.', 'start': 5295.847, 'duration': 3.842}, {'end': 5312.478, 'text': 'What you want is your input-output function that the neural net learns to be invariant to those little distortions.', 'start': 5301.274, 'duration': 11.204}, {'end': 5315.24, 'text': 'And you can do this by just augmenting the data,', 'start': 5312.979, 'duration': 2.261}, {'end': 5326.004, 'text': 'or you can do this by explicitly having a regularization term that says the overall derivative of the function in the direction of the spanning vectors of that plane should be zero.', 'start': 5315.24, 'duration': 10.764}], 'summary': 'Various techniques like mix up and tangent prop implicitly fill in space between samples by generating fake samples or virtual samples, aiming for neural net invariance to data distortions.', 'duration': 96.88, 'max_score': 5229.124, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw5229124.jpg'}], 'start': 4080.089, 'title': 'Various AI topics', 'summary': 'Covers debates and challenges in neural network interpolation, high-dimensional geometry, self-supervised learning, and advancements in signature verification, including specific algorithms and techniques such as BYOL, Barlow Twins, VICReg, and data augmentation impact on training data volume.', 'chapters': [{'end': 4549.959, 'start': 4080.089, 'title': 'Neural network interpolation debate', 'summary': 'Discusses the debate around neural network interpolation, exploring examples of interpolation and extrapolation, different notions of interpolation, and the challenges of defining interpolation and extrapolation in multidimensional spaces.', 'duration': 469.87, 'highlights': ['Neural networks can exhibit extrapolation beyond the training data, which may 
not always be relevant for the problem at hand. Neural networks can extrapolate beyond the training data, potentially impacting the relevance of the extrapolated information for the specific problem.', 'Challenges in defining interpolation and extrapolation in multidimensional spaces, suggesting the need for nuanced definitions taking into account the saliency of dimensions. The debate highlights the difficulty in defining interpolation and extrapolation in multidimensional spaces, emphasizing the importance of considering the saliency of dimensions in defining these concepts.', 'Exploration of different notions of interpolation in neural networks, including comparisons with kernel-based methods and traditional neural network architectures. The discussion delves into various notions of interpolation in neural networks, drawing comparisons with kernel-based methods and traditional neural network architectures.']}, {'end': 4820.375, 'start': 4550.399, 'title': 'Interpolation in high-dimensional geometry', 'summary': 'Discusses the challenges of interpolation in high-dimensional spaces, highlighting the impact of data dimensionality and the need for a useful definition of interpolation versus extrapolation in machine learning, as well as the complexities of learning the structure of a data manifold.', 'duration': 269.976, 'highlights': ['The experiment demonstrates that as the dimension of the embedding exceeds 20, a large number of training samples are required to stay within the interpolation regime. The dimension of the embedding plays a crucial role, as it necessitates a substantial number of training samples (2 to the power of 20) to remain within the interpolation regime.', 'The dimension of the linear subspace containing the entire dataset is more significant than the dimension of the input space for the convex hull process. 
The importance of the dimension of the linear subspace that encompasses all data points is emphasized in the context of the convex hull process, overshadowing the dimension of the input space.', 'Discussion on the need for a more useful definition of interpolation versus extrapolation in machine learning, with a suggestion to consider whether the points are contained in an ellipsoid/sphere that contains all the data. The chapter raises the question of devising a more practical definition of interpolation versus extrapolation in the context of machine learning, proposing the consideration of whether the points are contained in an ellipsoid or sphere encompassing all the data.', 'The challenges of learning the structure of a data manifold are highlighted, emphasizing the complexity of this task compared to classifying objects on the manifold. The complexities of learning the structure of a data manifold are underscored, with the chapter highlighting the arduous nature of this task compared to the classification of objects on the manifold.']}, {'end': 5090.307, 'start': 4820.375, 'title': 'Self-supervised learning in ai', 'summary': 'Discusses the debate between supervised and unsupervised learning, while advocating for the utilization of unlabeled data for self-supervised learning, particularly in natural language processing and vision, with a focus on multimodal prediction architectures.', 'duration': 269.932, 'highlights': ['The debate between supervised and unsupervised learning The chapter presents a debate about the usefulness of unsupervised learning in comparison to supervised learning, highlighting the complexity and theoretical differences between the two approaches.', 'Utilization of unlabeled data for self-supervised learning The discussion emphasizes the advantage of leveraging large quantities of unlabeled data to pre-train neural networks for subsequent fine-tuning, with a focus on the success of self-supervised learning in natural language processing 
and the anticipated impact in the domain of vision.', 'Multimodal prediction architectures in self-supervised learning The chapter explores the concept of multimodal prediction architectures for self-supervised learning, specifically in the context of vision, highlighting the characteristics and benefits of both latent variable predictive architecture and joint embedding architecture.']}, {'end': 5393.034, 'start': 5090.307, 'title': 'Advancements in signature verification and data augmentation', 'summary': 'The chapter discusses advancements in signature verification techniques, including non-contrastive methods and joint embedding architectures, citing specific algorithms such as BYOL, Barlow Twins, and VICReg. Additionally, it explores the impact of data augmentation on training data volume and the concept of interpolating fake examples in the training set to fill the space between samples, along with the idea of tangent prop as a regularizer for input-output functions in neural networks.', 'duration': 302.727, 'highlights': ['The chapter discusses advancements in signature verification techniques, including non-contrastive methods and joint embedding architectures, citing specific algorithms such as BYOL, Barlow Twins, and VICReg.', 'It explores the impact of data augmentation on training data volume and suggests that augmentations are generally fairly local in dimensions that are already explored, hence not significantly increasing the data point cloud.', 'The concept of interpolating fake examples in the training set to fill the space between samples is mentioned, with examples such as mix up and distillation, along with the idea of tangent prop as a regularizer for input-output functions in neural networks.']}], 'duration': 1312.945, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw4080089.jpg', 'highlights': ['Neural networks can extrapolate beyond the training data, potentially impacting the relevance of the 
extrapolated information for the specific problem.', 'Challenges in defining interpolation and extrapolation in multidimensional spaces, emphasizing the importance of considering the saliency of dimensions in defining these concepts.', 'Exploration of different notions of interpolation in neural networks, drawing comparisons with kernel-based methods and traditional neural network architectures.', 'The dimension of the embedding plays a crucial role, as it necessitates a substantial number of training samples (2 to the power of 20) to remain within the interpolation regime.', 'The importance of the dimension of the linear subspace that encompasses all data points is emphasized in the context of the convex hull process, overshadowing the dimension of the input space.', 'The chapter raises the question of devising a more practical definition of interpolation versus extrapolation in the context of machine learning, proposing the consideration of whether the points are contained in an ellipsoid or sphere encompassing all the data.', 'The complexities of learning the structure of a data manifold are underscored, with the chapter highlighting the arduous nature of this task compared to the classification of objects on the manifold.', 'The chapter presents a debate about the usefulness of unsupervised learning in comparison to supervised learning, highlighting the complexity and theoretical differences between the two approaches.', 'The discussion emphasizes the advantage of leveraging large quantities of unlabeled data to pre-train neural networks for subsequent fine-tuning, with a focus on the success of self-supervised learning in natural language processing and the anticipated impact in the domain of vision.', 'The chapter explores the concept of multimodal prediction architectures for self-supervised learning, specifically in the context of vision, highlighting the characteristics and benefits of both latent variable predictive architecture and joint embedding 
architecture.', 'The chapter discusses advancements in signature verification techniques, including non-contrastive methods and joint embedding architectures, citing specific algorithms such as BYOL, Barlow Twins, and VICReg.', 'It explores the impact of data augmentation on training data volume and suggests that augmentations are generally fairly local in dimensions that are already explored, hence not significantly increasing the data point cloud.', 'The concept of interpolating fake examples in the training set to fill the space between samples is mentioned, with examples such as mix up and distillation, along with the idea of tangent prop as a regularizer for input-output functions in neural networks.']}, {'end': 6568.783, 'segs': [{'end': 6220.91, 'src': 'embed', 'start': 6193.907, 'weight': 4, 'content': [{'end': 6199.491, 'text': 'So I think the human mind is able to deal with all of those situations.', 'start': 6193.907, 'duration': 5.584}, {'end': 6207.498, 'text': 'We know to use differentiable, continuous stuff when we have to and use the discrete exploration when we have to as well.', 'start': 6199.491, 'duration': 8.007}, {'end': 6213.643, 'text': 'But we have to realize that humans are really, really bad at the discrete exploration stuff.', 'start': 6208.078, 'duration': 5.565}, {'end': 6215.365, 'text': 'We totally suck at it.', 'start': 6213.643, 'duration': 1.722}, {'end': 6220.91, 'text': "If we didn't, then we would be better than computers at playing chess and Go, but we're not.", 'start': 6215.365, 'duration': 5.545}], 'summary': 'Human mind adept at continuous tasks but bad at discrete exploration; inferior to computers in chess and Go.', 'duration': 27.003, 'max_score': 6193.907, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6193907.jpg'}, {'end': 6271.826, 'src': 'embed', 'start': 6242.427, 'weight': 0, 'content': [{'end': 6245.709, 'text': "Do you think that's, is that a good 
thing? Or do you think that's.", 'start': 6242.427, 'duration': 3.282}, {'end': 6248.331, 'text': 'So you like that kind of model.', 'start': 6247.21, 'duration': 1.121}, {'end': 6248.771, 'text': 'Well, OK.', 'start': 6248.351, 'duration': 0.42}, {'end': 6257.016, 'text': "There's something very interesting in the context of reinforcement learning about this, which is actor-critic models.", 'start': 6248.791, 'duration': 8.225}, {'end': 6261.859, 'text': 'And you could think of all of the stuff that actually works in reinforcement learning.', 'start': 6257.197, 'duration': 4.662}, {'end': 6267.383, 'text': "I mean, they don't work in the real world, but they work in games and stuff and simulated environments.", 'start': 6262.58, 'duration': 4.803}, {'end': 6271.826, 'text': 'They very often use actor-critic type architectures.', 'start': 6268.423, 'duration': 3.403}], 'summary': 'Discussion on the use of actor-critic models in reinforcement learning.', 'duration': 29.399, 'max_score': 6242.427, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6242427.jpg'}, {'end': 6407.425, 'src': 'embed', 'start': 6382.75, 'weight': 7, 'content': [{'end': 6396.178, 'text': 'So what you have now is a neural net inside your agent that basically can simulate the world and simulate the cost that is going to result from the state of the world in a differentiable way.', 'start': 6382.75, 'duration': 13.428}, {'end': 6401.641, 'text': 'So now you can use gradient descent or gradient-based methods for two things.', 'start': 6396.538, 'duration': 5.103}, {'end': 6405.444, 'text': 'One, for inferring a sequence of actions that will minimize a particular cost.', 'start': 6402.162, 'duration': 3.282}, {'end': 6407.425, 'text': "There's no learning there.", 'start': 6406.184, 'duration': 1.241}], 'summary': 'Neural net in agent simulates world, uses gradient descent for minimizing cost and inferring action sequences.', 'duration': 
24.675, 'max_score': 6382.75, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6382750.jpg'}, {'end': 6484.716, 'src': 'embed', 'start': 6450.194, 'weight': 5, 'content': [{'end': 6455.06, 'text': "the car will go to the right and if there's a cliff next to you, the car is going to fall off the cliff and you know you're going to die.", 'start': 6450.194, 'duration': 4.866}, {'end': 6458.802, 'text': "right?. You don't have to try this to know that this is going to happen.", 'start': 6455.06, 'duration': 3.742}, {'end': 6467.587, 'text': 'You can rely on your internal model to avoid yourself a lot of pain.', 'start': 6458.902, 'duration': 8.685}, {'end': 6471.57, 'text': 'But you pay attention to the situation.', 'start': 6467.947, 'duration': 3.623}, {'end': 6473.291, 'text': 'You pay a lot of attention to the situation.', 'start': 6471.61, 'duration': 1.681}, {'end': 6474.952, 'text': "You're completely deliberate about it.", 'start': 6473.331, 'duration': 1.621}, {'end': 6484.716, 'text': 'you imagine all kinds of scenarios and you drive slowly, so you leave yourself enough time to actually do this kind of reasoning and then,', 'start': 6476.432, 'duration': 8.284}], 'summary': 'Rely on internal model to avoid pain, drive deliberately, imagine scenarios, drive slowly.', 'duration': 34.522, 'max_score': 6450.194, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6450194.jpg'}, {'end': 6527.063, 'src': 'embed', 'start': 6498.324, 'weight': 2, 'content': [{'end': 6504.31, 'text': "I played once I'm a terrible chess player, by the way and I played once a simultaneous game against a grandmaster.", 'start': 6498.324, 'duration': 5.986}, {'end': 6513.116, 'text': 'So he was playing against like 50 other people, And so I had plenty of time to think about my move right?', 'start': 6505.491, 'duration': 7.625}, {'end': 6518.059, 'text': 'Because he had to kind of 
play with the 49 other players before getting to me.', 'start': 6513.156, 'duration': 4.903}, {'end': 6521.12, 'text': 'And so I wait for him to come and make one move.', 'start': 6518.659, 'duration': 2.461}, {'end': 6527.063, 'text': 'And then, you know, in one second, well, first of all, he does like, you know, as if I played something stupid, which I did.', 'start': 6521.52, 'duration': 5.543}], 'summary': 'Played a simultaneous chess game against a grandmaster, made a mistake in one move.', 'duration': 28.739, 'max_score': 6498.324, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6498324.jpg'}, {'end': 6568.783, 'src': 'embed', 'start': 6540.671, 'weight': 3, 'content': [{'end': 6548.676, 'text': "He doesn't have to think because I'm not, you know, I'm not good enough for him to really kind of cause his system to kick in.", 'start': 6540.671, 'duration': 8.005}, {'end': 6555.942, 'text': "And of course, you know, he beat me in 10, you know, in 10 plays, right? I mean, as I told you, I'm terrible.", 'start': 6549.357, 'duration': 6.585}, {'end': 6561.545, 'text': 'You know, I learned to drive a long time ago, but as it turned out, very recently, I.', 'start': 6557.202, 'duration': 4.343}, {'end': 6568.783, 'text': 'I went to drive a sports car on a raceway.', 'start': 6563.7, 'duration': 5.083}], 'summary': 'Speaker feels inadequate compared to someone they lost to in 10 plays, and recently drove a sports car on a raceway.', 'duration': 28.112, 'max_score': 6540.671, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6540671.jpg'}], 'start': 5393.074, 'title': 'Neural networks and decision making', 'summary': 'Explores the potential of neural networks in reasoning and optimization across domains such as speech and handwriting recognition. it also delves into backprop through time in neural networks, limitations, and its potential impact. 
furthermore, it discusses the challenges in dealing with uncertainty in decision making and the need for predictive models and differentiable approaches.', 'chapters': [{'end': 5695.928, 'start': 5393.074, 'title': 'Neural networks and reasoning', 'summary': 'Discusses the limitations of the current understanding of neural networks and emphasizes their capability in reasoning using energy minimization, differentiable operations, and optimization problems in various domains such as speech recognition, handwriting recognition, and planning.', 'duration': 302.854, 'highlights': ['Neural networks go beyond the restricted view and can perform reasoning through energy minimization and optimization problems in various domains such as speech recognition and planning. Energy minimization, optimization problems in speech recognition and planning.', 'Systems in use today perform reasoning through a minimization of an energy function with respect to latent variables, using differentiable operations. Usage of differentiable operations for reasoning through energy minimization and latent variables.', 'The ability to backpropagate gradient through different modules, inferring latent variables, and performing energy minimization demonstrates the capability of deep learning systems in reasoning. Backpropagation of gradient, inference of latent variables, and energy minimization.']}, {'end': 6042.757, 'start': 5696.869, 'title': 'Backprop through time and neural networks', 'summary': 'Discusses the concept of backprop through time in optimal control, its application to neural networks, the limitations in handling variable number of layers, and the potential for bridging the gap in computation capabilities with neural networks.', 'duration': 345.888, 'highlights': ['Optimal control theorists invented backprop through time in the 1960s, but its application to machine learning was realized only in the mid-80s. 
Historical background of backprop through time and its application to machine learning.', 'Challenges in training neural networks with a variable number of layers and the introduction of recurrent nets to handle varying number of iterations. The difficulty in training neural networks with variable layers and the role of recurrent nets in handling varying iterations.', 'Discussion on the different types of computation - finite fixed computation and discrete symbolic computation - and their potential in bridging computation capabilities with neural networks. Exploration of the different types of computation and their potential implications for neural networks.', 'The discussion of problems with discontinuous mappings from action to result, and the need to handle uncertainty in search in various types of situations. Exploration of problems with discontinuous mappings and the need to handle uncertainty in different situations.']}, {'end': 6568.783, 'start': 6042.757, 'title': 'Dealing with uncertainty in decision making', 'summary': 'Discusses the challenges of dealing with uncertainty in decision making, including the need for predictive models to represent uncertainty and the use of differentiable and discrete approaches in problem-solving, while highlighting the limitations of human ability in discrete exploration.', 'duration': 526.026, 'highlights': ['The need for predictive models to represent uncertainty The chapter emphasizes the importance of predictive models that can represent uncertainty in decision making, especially in cases where the model is not given by equations derived from first principles and the world is not completely predictable.', 'Challenges of dealing with uncertainty in decision making The chapter discusses the challenges of dealing with uncertainty in decision making, including the need to learn the model of the real world due to its complicated dynamics and the unpredictability of the world despite being deterministic.', "Limitations of 
human ability in discrete exploration The chapter highlights the limitations of human ability in discrete exploration, stating that humans are 'really really bad' at it, as evidenced by their inferior performance compared to computers in playing chess and go.", 'Use of differentiable and discrete approaches in problem-solving The chapter discusses the need to use differentiable, continuous approaches and discrete exploration in problem-solving, stating that most problems are a combination of the two and that the human mind is capable of using both approaches as needed.']}], 'duration': 1175.709, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw5393074.jpg', 'highlights': ['Neural networks can perform reasoning through energy minimization and optimization problems in speech recognition and planning.', 'Backpropagation of gradient, inference of latent variables, and energy minimization demonstrate the capability of deep learning systems in reasoning.', 'Optimal control theorists invented backprop through time in the 1960s, but its application to machine learning was realized only in the mid-80s.', 'Challenges in training neural networks with a variable number of layers and the introduction of recurrent nets to handle varying number of iterations.', 'Exploration of the different types of computation and their potential implications for neural networks.', 'The need for predictive models to represent uncertainty in decision making, especially in cases where the model is not given by equations derived from first principles.', 'Challenges of dealing with uncertainty in decision making, including the need to learn the model of the real world due to its complicated dynamics and the unpredictability of the world despite being deterministic.', 'The chapter discusses the need to use differentiable, continuous approaches and discrete exploration in problem-solving.']}, {'end': 7243.968, 'segs': [{'end': 6724.182, 'src': 
'embed', 'start': 6697.392, 'weight': 2, 'content': [{'end': 6704.035, 'text': 'But the current definition is not good enough for the current data regimes that we are in right now.', 'start': 6697.392, 'duration': 6.643}, {'end': 6705.915, 'text': "I don't know if it's precise enough.", 'start': 6704.055, 'duration': 1.86}, {'end': 6708.536, 'text': "Perfect It's wonderful.", 'start': 6707.156, 'duration': 1.38}, {'end': 6712.138, 'text': 'Cool Well, sorry, need to get into the mood again.', 'start': 6708.737, 'duration': 3.401}, {'end': 6714.719, 'text': 'Yeah Yeah.', 'start': 6713.118, 'duration': 1.601}, {'end': 6715.339, 'text': 'Hi, Randall.', 'start': 6714.759, 'duration': 0.58}, {'end': 6718.22, 'text': "It's really cool to have you here.", 'start': 6715.919, 'duration': 2.301}, {'end': 6720.44, 'text': "We've enjoyed reading the paper.", 'start': 6718.62, 'duration': 1.82}, {'end': 6724.182, 'text': "It's quite a short and concise paper, I have to say.", 'start': 6720.521, 'duration': 3.661}], 'summary': 'Discussion on current data regimes and a short, concise paper.', 'duration': 26.79, 'max_score': 6697.392, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6697392.jpg'}, {'end': 7015.988, 'src': 'embed', 'start': 6982.762, 'weight': 0, 'content': [{'end': 6984.663, 'text': "let's say, 15 or 20..", 'start': 6982.762, 'duration': 1.901}, {'end': 6990.65, 'text': 'So it does not happen because of the dimensionality, unless you have very degenerate things.', 'start': 6984.663, 'duration': 5.987}, {'end': 6997.357, 'text': 'Of course, if your generative network just spits out a constant, then in pixel space, you will have interpolation.', 'start': 6990.71, 'duration': 6.647}, {'end': 7001.221, 'text': 'But this is degenerate by definition.', 'start': 6997.757, 'duration': 3.464}, {'end': 7003.063, 'text': "I'd like to pick up on that, if you don't mind.", 'start': 7001.321, 'duration': 1.742}, {'end': 
7015.988, 'text': 'Yeah, so what I want to pick up on is again all of this hinges on definition, one which is in the paper, which is membership within this convex hull.', 'start': 7005.816, 'duration': 10.172}], 'summary': 'The discussion revolves around issues related to degenerate generative networks and convex hull membership.', 'duration': 33.226, 'max_score': 6982.762, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6982762.jpg'}], 'start': 6569.144, 'title': 'Reevaluating interpolation in machine learning', 'summary': 'Explores the limitations of current interpolation definitions in high-dimensional settings, emphasizing the need for reevaluation due to dimensionality impact. it discusses redefining interpolation in higher dimensions, task-dependent definitions, and the concept of interpolation vs extrapolation.', 'chapters': [{'end': 7003.063, 'start': 6569.144, 'title': 'Critique of interpolation in high-dimensional settings', 'summary': 'The chapter explores the limitations of the current definition of interpolation in high-dimensional settings, arguing for a reevaluation of the concept in machine learning models due to the impact of dimensionality on the occurrence of interpolation and emphasizes the need to adapt the mathematical definition to the current data regimes.', 'duration': 433.919, 'highlights': ['The current definition of interpolation in high-dimensional settings is insufficient due to the impact of dimensionality on the occurrence of interpolation in machine learning models.', 'The need to adapt the mathematical definition of interpolation to the current data regimes due to the limitations of the current definition in machine learning models. 
', 'In high-dimensional settings, interpolation does not occur in machine learning models due to the impact of dimensionality, rendering the current definition inadequate.', 'The impact of dimensionality on the occurrence of interpolation in machine learning models, emphasizing the need for a reevaluation of the concept in high-dimensional settings.', '
The concept of interpolation needs to be reevaluated in high-dimensional settings to account for the limitations of the current definition in machine learning models.']}, {'end': 7104.454, 'start': 7005.816, 'title': 'Redefining interpolation in higher dimensions', 'summary': 'Discusses the need to redefine interpolation in higher dimensions, stressing the importance of task-dependent definitions and the impact on data compression and denoising.', 'duration': 98.638, 'highlights': ['The importance of redefining interpolation in higher dimensions based on the task at hand, such as object classification or denoising, is emphasized, as it impacts data compression and the ability to differentiate between classes of objects.', 'The rigid definition of interpolation within the convex hull is acknowledged, with the need to reconsider and redefine the concept in order to match the intuition of interpolation in higher dimensions.', 'The task-dependent nature of defining interpolation is highlighted, with different tasks requiring specific definitions to effectively address factors of variation and noise in the data.']}, {'end': 7243.968, 'start': 7104.895, 'title': 'Interpolation vs extrapolation', 'summary': 'Discusses the concept of interpolation and extrapolation in machine learning, proposing a fractional concept as a route to an improved definition, and exploring the idea of using the smallest ellipsoid that encloses the data.', 'duration': 139.073, 'highlights': ['The concept of interpolation and extrapolation in machine learning is discussed, proposing a fractional concept as a route to an improved definition. Discussion on the concept of interpolation and extrapolation, proposing a fractional concept as a route to an improved definition.', 'Exploring the idea of using the smallest ellipsoid that encloses the data, which is somehow in between the convex hull and the hypercube. 
Discussion on exploring the idea of using the smallest ellipsoid that encloses the data, positioning it between the convex hull and the hypercube.']}], 'duration': 674.824, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw6569144.jpg', 'highlights': ['The current definition of interpolation in high-dimensional settings is insufficient due to the impact of dimensionality on the occurrence of interpolation in machine learning models.', 'The importance of redefining interpolation in higher dimensions based on the task at hand, such as object classification or denoising, is emphasized, as it impacts data compression and the ability to differentiate between classes of objects.', 'The concept of interpolation and extrapolation in machine learning is discussed, proposing a fractional concept as a route to an improved definition.']}, {'end': 8251.059, 'segs': [{'end': 7265.278, 'src': 'embed', 'start': 7244.368, 'weight': 1, 'content': [{'end': 7257.595, 'text': 'So for sure there is some ways in between those two extremes where you will have a meaningful interpolation and extrapolation regime that does not collapse all one way or the other way.', 'start': 7244.368, 'duration': 13.227}, {'end': 7261.196, 'text': 'So this is for sure one interesting route.', 'start': 7258.055, 'duration': 3.141}, {'end': 7265.278, 'text': "And this could potentially be, let's say, task agnostic.", 'start': 7262.217, 'duration': 3.061}], 'summary': 'Finding a meaningful interpolation and extrapolation regime between extremes, presenting an interesting task-agnostic route.', 'duration': 20.91, 'max_score': 7244.368, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw7244368.jpg'}, {'end': 7523.156, 'src': 'embed', 'start': 7487.649, 'weight': 0, 'content': [{'end': 7491.391, 'text': 'There exists some nonlinear transformation that transforms it into an interpolative space.', 'start': 
7487.649, 'duration': 3.742}, {'end': 7492.391, 'text': "That's what these people say.", 'start': 7491.431, 'duration': 0.96}, {'end': 7497.514, 'text': "But I'm wondering, I'm interested in the curse of dimensionality.", 'start': 7492.712, 'duration': 4.802}, {'end': 7507.481, 'text': "Why does deep learning work at all? And I've spoken to folks that talk about creating various priors to combat the curse of dimensionality.", 'start': 7497.775, 'duration': 9.706}, {'end': 7511.464, 'text': 'But why do you think that deep learning works at all in these high dimensional spaces?', 'start': 7507.541, 'duration': 3.923}, {'end': 7523.156, 'text': "Well, so first I might say something a bit, let's say, speculative or not agreed upon by everyone, but I don't think.", 'start': 7513.39, 'duration': 9.766}], 'summary': "Exploring deep learning's efficacy in high-dimensional spaces and combating the curse of dimensionality through various priors.", 'duration': 35.507, 'max_score': 7487.649, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw7487649.jpg'}, {'end': 8256.522, 'src': 'embed', 'start': 8225.076, 'weight': 2, 'content': [{'end': 8226.978, 'text': 'basically you are binarizing your data, right?', 'start': 8225.076, 'duration': 1.902}, {'end': 8236.091, 'text': 'And if you have good classification, it means all the classes are assigned to the same labels, which is 1 or 0.', 'start': 8227.706, 'duration': 8.385}, {'end': 8243.014, 'text': 'And if you think of it still in interpolation regime, then suddenly you are in interpolation regime.', 'start': 8236.091, 'duration': 6.923}, {'end': 8243.434, 'text': 'You are a 1.', 'start': 8243.075, 'duration': 0.359}, {'end': 8246.877, 'text': 'The new sample is a 1 after binarization.', 'start': 8243.434, 'duration': 3.443}, {'end': 8249.138, 'text': 'And you become interpolation regime.', 'start': 8247.338, 'duration': 1.8}, {'end': 8251.059, 'text': 'And you have a good 
performance.', 'start': 8249.799, 'duration': 1.26}, {'end': 8256.522, 'text': 'But this comes after this compression step, if you will, or discretization step.', 'start': 8251.54, 'duration': 4.982}], 'summary': 'Binarizing data leads to good classification with 1 or 0 labels, resulting in improved performance post-compression or discretization.', 'duration': 31.446, 'max_score': 8225.076, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw8225075.jpg'}], 'start': 7244.368, 'title': 'Generalization and dimensionality in machine learning', 'summary': 'Delves into the challenges of generalization and dimensionality in machine learning, emphasizing the impact of dimensions on generalization performance, the need for proper regularization and model training, and the limitations of deep learning in various domains. It also discusses the role of human engineering in machine learning, the successes and limitations of neural networks in extrapolation, and the validity of the manifold hypothesis in deep learning, highlighting the importance of meaningful nonlinear transformations for better generalization performance.', 'chapters': [{'end': 7624.016, 'start': 7244.368, 'title': 'Generalization and dimensionality in machine learning', 'summary': 'Discusses the challenges of generalization and dimensionality in machine learning, highlighting the need for meaningful interpolation and the impact of dimensions on generalization performance, emphasizing the importance of proper regularization and model training. It also questions the efficacy of deep learning in high-dimensional spaces and the need for task-specific preprocessing. 
Additionally, it emphasizes the limitations of deep learning in various domains such as medical and audio classification.', 'duration': 379.648, 'highlights': ['The importance of meaningful interpolation and proper extrapolation regime Discusses the need for a meaningful interpolation and extrapolation regime that does not collapse one way or the other, emphasizing the task-agnostic potential.', 'Impact of dimensions on generalization performance and deep learning Emphasizes the impact of dimensions on generalization performance, highlighting the challenges of quantifying generalization performance and the potential limitations of deep learning in high-dimensional spaces.', 'Need for proper regularization and model training in handling high dimensions Stresses the importance of assuming a perfect regularizer, model, and training regime in handling high dimensions, highlighting the challenges in practical settings.', 'Efficacy of deep learning in high-dimensional spaces and task-specific preprocessing Questions the efficacy of deep learning in high-dimensional spaces and emphasizes the usefulness of task-specific preprocessing in improving generalization.', 'Limitations of deep learning in various domains such as medical and audio classification Discusses the limitations of deep learning in various domains such as medical and audio classification, highlighting the challenges in generalization and the ad hoc nature of optimization.']}, {'end': 7940.74, 'start': 7624.136, 'title': 'Neural networks and generalization', 'summary': 'Discusses the role of human engineering in machine learning, the limitations and successes of neural networks in extrapolation, and the impact of high dimensionality on model generalization, highlighting the importance of task-dependent definitions for extrapolation.', 'duration': 316.604, 'highlights': ['The role of human engineering in machine learning Human engineering plays a crucial role in making machine learning 
work through specific architectures, guiding the direction of machine learning, and teaching machines how to perform tasks. This implies that out-of-the-box deep learning may not always work in new tasks, emphasizing the contribution of human engineering in the success of machine learning.', 'Limitations and successes of neural networks in extrapolation The discussion delves into the limitations of neural networks in extrapolating outside the training range and the surprising success of neural networks in extrapolation despite the curse of dimensionality. It emphasizes the importance of the orientation of the axis and the alignment in the orthogonal space for generalization performance.', 'Impact of high dimensionality on model generalization and the importance of task-dependent definitions for extrapolation The impact of high dimensionality on model generalization is explored, highlighting the challenge of being within a convex hull in high dimensionality and the need for a meaningful set to prevent extreme extrapolation. 
It emphasizes the correlation between the generalization performance of a model and the definition of extrapolation, suggesting the potential for task-dependent definitions to improve the precision of extrapolation.']}, {'end': 8251.059, 'start': 7941.961, 'title': 'Manifold hypothesis and interpolation in deep learning', 'summary': 'Discusses the validity of the manifold hypothesis in deep learning, the challenges of interpolation in high-dimensional spaces, and the importance of meaningful nonlinear transformations for better generalization performance in deep networks.', 'duration': 309.098, 'highlights': ['The challenges of interpolation in high-dimensional spaces Interpolating in high-dimensional spaces is challenging, as picking a new sample that lies in the convex hull of the training set is exponentially hard, especially when the dimensionality of the manifold is high.', 'Importance of meaningful nonlinear transformations for better generalization performance in deep networks Meaningful nonlinear transformations are crucial for achieving good generalization performance in deep networks, as they allow for shaping the data without excessively expanding or contracting the dimension of the space, leading to a higher generalization performance compared to other models.', 'Validity of the manifold hypothesis in deep learning The chapter explores the validity of the manifold hypothesis in deep learning, highlighting that while the actual data manifold may not be easily approximated, the model still performs useful statistical predictions in the intermediate space, supporting the manifold hypothesis.']}], 'duration': 1006.691, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw7244368.jpg', 'highlights': ['The importance of meaningful nonlinear transformations for better generalization performance in deep networks', 'The role of human engineering in machine learning', 'Impact of dimensions on generalization 
performance and deep learning', 'Limitations and successes of neural networks in extrapolation', 'Need for proper regularization and model training in handling high dimensions', 'Limitations of deep learning in various domains such as medical and audio classification']}, {'end': 9062.397, 'segs': [{'end': 8513.369, 'src': 'heatmap', 'start': 8268.151, 'weight': 1, 'content': [{'end': 8272.975, 'text': 'then you can reach classical interpolation much more easily as well.', 'start': 8268.151, 'duration': 4.824}, {'end': 8275.775, 'text': 'I have a couple of questions.', 'start': 8274.895, 'duration': 0.88}, {'end': 8281.978, 'text': "So again in table one, it was, I mean your paper's making the argument that everything is extrapolation,", 'start': 8275.856, 'duration': 6.122}, {'end': 8284.299, 'text': 'given this convex notion in high dimensional space.', 'start': 8281.978, 'duration': 2.321}, {'end': 8292.625, 'text': "But if we zoom in a little bit though, so you're using a pre-trained ResNet classifier, pre-trained on ImageNet.", 'start': 8284.379, 'duration': 8.246}, {'end': 8301.87, 'text': 'And how do all of these things change the structure of the latent space in a meaningful way? 
And the embedding space is also highly distributed.', 'start': 8293.025, 'duration': 8.845}, {'end': 8306.072, 'text': 'And we were wondering if Can you give us some intuition here?', 'start': 8302.27, 'duration': 3.802}, {'end': 8310.594, 'text': 'So is the information likely to be quite evenly distributed over the latent,', 'start': 8306.111, 'duration': 4.483}, {'end': 8315.878, 'text': "or do you think it's actually quite bunched up and sparsely encoded in few of the features?", 'start': 8310.594, 'duration': 5.284}, {'end': 8321.992, 'text': 'I think this will depend on which training setting you use.', 'start': 8318.789, 'duration': 3.203}, {'end': 8329.638, 'text': 'For example, if you start using dropout and things like that, you will try to have a more evenly distribution of your information,', 'start': 8322.191, 'duration': 7.447}, {'end': 8332.861, 'text': 'to have a more stable loss when you drop those units.', 'start': 8329.638, 'duration': 3.223}, {'end': 8335.245, 'text': "I don't think you have a general answer for that.", 'start': 8332.882, 'duration': 2.363}, {'end': 8340.308, 'text': 'It will depend on the type of regularization you have, or training is done, etc.', 'start': 8335.525, 'duration': 4.783}, {'end': 8346.395, 'text': 'But you have to keep in mind that what you try to do with gradient descent is just minimize your loss right?', 'start': 8341.17, 'duration': 5.225}, {'end': 8349.236, 'text': "But then with cross entropy loss, let's say,", 'start': 8346.934, 'duration': 2.302}, {'end': 8355.022, 'text': 'your gradient starts to vanish as you become really good and you stop learning at where you are basically.', 'start': 8349.236, 'duration': 5.786}, {'end': 8361.868, 'text': 'So given that, depending on your initialization, you will still try to make the best of what you get.', 'start': 8355.582, 'duration': 6.286}, {'end': 8370.054, 'text': 'and even if it means learning redundant dimensions, if this can reduce your loss further at a 
more rapid rate.', 'start': 8362.508, 'duration': 7.546}, {'end': 8371.476, 'text': "that's what you will do.", 'start': 8370.054, 'duration': 1.422}, {'end': 8380.343, 'text': "so, if you don't impose any regularization or anything, there is no clear reason to assume that everything is well organized and so on,", 'start': 8371.476, 'duration': 8.867}, {'end': 8382.405, 'text': "and that's what we see even in generative networks.", 'start': 8380.343, 'duration': 2.062}, {'end': 8389.61, 'text': 'you have to start putting so much regularization to try to have disentanglement and to try to make sense of those latent spaces,', 'start': 8382.405, 'duration': 7.205}, {'end': 8397.555, 'text': 'because otherwise you just try to learn what minimizes your loss with the most short-term view of your loss landscape.', 'start': 8389.61, 'duration': 7.945}, {'end': 8406.901, 'text': 'So basically that could be built if you have some specific regularizer, but otherwise it will not occur naturally.', 'start': 8398.055, 'duration': 8.846}, {'end': 8408.602, 'text': 'Of course.', 'start': 8407.602, 'duration': 1}, {'end': 8413.886, 'text': 'and again, if you wanted to only retain the minimal information to solve the task at hand,', 'start': 8408.602, 'duration': 5.284}, {'end': 8419.631, 'text': 'then you will see much more interpolation regime in that latent space.', 'start': 8414.767, 'duration': 4.864}, {'end': 8428.3, 'text': 'If you think of MNIST for example, you will disregard the translation of your digit, the rotation of the digit, all those things.', 'start': 8420.452, 'duration': 7.848}, {'end': 8436.768, 'text': 'All those things will be disregarded when you reach the latent space, and then you will basically be in interpolation regime most of the time.', 'start': 8428.9, 'duration': 7.868}, {'end': 8446.174, 'text': 'But since you keep as much information as possible to try to minimize your loss as best as you can, then you basically occupy as much as you can.', 
'start': 8437.248, 'duration': 8.926}, {'end': 8451.338, 'text': 'Unless, again, you have some degeneracies because of the ReLU or architecture tricks.', 'start': 8446.755, 'duration': 4.583}, {'end': 8458.362, 'text': 'For example, if you have a bottleneck layer, you will limit the dimensionality of the manifold you span in the latent space.', 'start': 8451.818, 'duration': 6.544}, {'end': 8462.585, 'text': 'So you can have all those parameters that can play a big role.', 'start': 8458.783, 'duration': 3.802}, {'end': 8466.908, 'text': "So in the general setting, I don't think you could assume anything.", 'start': 8463.266, 'duration': 3.642}, {'end': 8477.801, 'text': "And I think there's also a relationship too between the dimensionality of the latent space and let's say some intrinsic dimensionality of the problem.", 'start': 8467.833, 'duration': 9.968}, {'end': 8486.447, 'text': 'So if the intrinsic dimensionality of the problem only takes five dimensions to solve and yet I give a latent space of 256 dimensions,', 'start': 8477.881, 'duration': 8.566}, {'end': 8496.756, 'text': 'I think what I hear you saying is that of course gradient descent is going to make some use of those other 251 dimensions,', 'start': 8489.61, 'duration': 7.146}, {'end': 8503.621, 'text': "but they're going to have maybe a very minuscule or diminishing effect on the latent space.", 'start': 8496.756, 'duration': 6.865}, {'end': 8513.369, 'text': 'Whereas, on the other hand, if I then took that same network and increased the complexity of the problem, we could end up with, for example,', 'start': 8504.362, 'duration': 9.007}], 'summary': 'Discussion on the impact of regularization and latent space structure in high-dimensional spaces, highlighting the influence on loss minimization and information retention.', 'duration': 245.218, 'max_score': 8268.151, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw8268151.jpg'}, {'end': 
8346.395, 'src': 'embed', 'start': 8318.789, 'weight': 5, 'content': [{'end': 8321.992, 'text': 'I think this will depend on which training setting you use.', 'start': 8318.789, 'duration': 3.203}, {'end': 8329.638, 'text': 'For example, if you start using dropout and things like that, you will try to have a more evenly distribution of your information,', 'start': 8322.191, 'duration': 7.447}, {'end': 8332.861, 'text': 'to have a more stable loss when you drop those units.', 'start': 8329.638, 'duration': 3.223}, {'end': 8335.245, 'text': "I don't think you have a general answer for that.", 'start': 8332.882, 'duration': 2.363}, {'end': 8340.308, 'text': 'It will depend on the type of regularization you have, or training is done, etc.', 'start': 8335.525, 'duration': 4.783}, {'end': 8346.395, 'text': 'But you have to keep in mind that what you try to do with gradient descent is just minimize your loss right?', 'start': 8341.17, 'duration': 5.225}], 'summary': 'The impact of training settings on loss distribution and regularization varies, with a focus on minimizing loss.', 'duration': 27.606, 'max_score': 8318.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw8318789.jpg'}, {'end': 8408.602, 'src': 'embed', 'start': 8382.405, 'weight': 6, 'content': [{'end': 8389.61, 'text': 'you have to start putting so much regularization to try to have disentanglement and to try to make sense of those latent spaces,', 'start': 8382.405, 'duration': 7.205}, {'end': 8397.555, 'text': 'because otherwise you just try to learn what minimizes your loss with the most short-term view of your loss landscape.', 'start': 8389.61, 'duration': 7.945}, {'end': 8406.901, 'text': 'So basically that could be built if you have some specific regularizer, but otherwise it will not occur naturally.', 'start': 8398.055, 'duration': 8.846}, {'end': 8408.602, 'text': 'Of course.', 'start': 8407.602, 'duration': 1}], 'summary': 'Regularization 
is necessary for disentanglement and understanding latent spaces.', 'duration': 26.197, 'max_score': 8382.405, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw8382405.jpg'}, {'end': 8558.317, 'src': 'embed', 'start': 8516.051, 'weight': 0, 'content': [{'end': 8520.494, 'text': "So if we're doing some multi-class problem, we may find that it sort of arranges, you know,", 'start': 8516.051, 'duration': 4.443}, {'end': 8529.882, 'text': 'these seven dimensions to solve the dog versus hot dog problem and these 12 dimensions to solve the car versus motorcycle problem.', 'start': 8520.494, 'duration': 9.388}, {'end': 8536.607, 'text': 'It might be forced to make more compact use of that latent information space per class.', 'start': 8530.282, 'duration': 6.325}, {'end': 8539.288, 'text': "Is that fair? Yeah, so that's a very good point.", 'start': 8536.687, 'duration': 2.601}, {'end': 8543.81, 'text': 'So, first of the first things that you said, one thing to keep in mind.', 'start': 8539.349, 'duration': 4.461}, {'end': 8548.791, 'text': "so let's say, your data is even a linear manifold of dimension one,", 'start': 8543.81, 'duration': 4.981}, {'end': 8553.753, 'text': "and then you go through a deep net and then somehow it's already linearly separable.", 'start': 8548.791, 'duration': 4.962}, {'end': 8558.317, 'text': 'then you only need to learn the identity mapping with your deep net to solve the task.', 'start': 8554.313, 'duration': 4.004}], 'summary': 'Deep nets can make more compact use of latent information space per class, achieving linear separability with fewer dimensions.', 'duration': 42.266, 'max_score': 8516.051, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw8516051.jpg'}, {'end': 9024.309, 'src': 'embed', 'start': 8998.13, 'weight': 2, 'content': [{'end': 9008.356, 'text': 'the real limitation of it is that how you subdivide one region 
by adding a node does not really tell you how to subdivide another region in another part of the space.', 'start': 8998.13, 'duration': 10.226}, {'end': 9015.059, 'text': "You don't have this, let's say, communication or friendly help between regions subdivision.", 'start': 9008.976, 'duration': 6.083}, {'end': 9016.721, 'text': 'But in a deep net.', 'start': 9015.499, 'duration': 1.222}, {'end': 9024.309, 'text': 'what you do actually is that if you know how to subdivide one region, then it will automatically enforce how you subdivide nearby regions.', 'start': 9016.721, 'duration': 7.588}], 'summary': 'In deep nets, subdivision in one region enforces nearby regions to subdivide similarly.', 'duration': 26.179, 'max_score': 8998.13, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw8998130.jpg'}, {'end': 9073.764, 'src': 'embed', 'start': 9048.052, 'weight': 4, 'content': [{'end': 9056.58, 'text': 'So I think this is extremely nice to have both points of view, because then you can try to use the strengths of one to maybe improve the other.', 'start': 9048.052, 'duration': 8.528}, {'end': 9058.722, 'text': "So that's really interesting to me.", 'start': 9056.62, 'duration': 2.102}, {'end': 9062.397, 'text': 'I want to ask you this question that we ask a lot of our guests,', 'start': 9059.855, 'duration': 2.542}, {'end': 9073.764, 'text': "because it's just at least it's something that's of kind of profound interest to me is that there is this apparent dichotomy between continuous and discrete.", 'start': 9062.397, 'duration': 11.367}], 'summary': 'Examining the benefits of dual perspectives to enhance different strengths.', 'duration': 25.712, 'max_score': 9048.052, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9048052.jpg'}], 'start': 8251.54, 'title': 'Deep learning applications', 'summary': 'Delves into the impact of compression and quantization on 
latent spaces in deep learning, emphasizing information distribution, regularization influence, and the relationship between latent space dimensionality and problem complexity. It also explores the use of deep neural networks in building adaptive splines, offering insights into decision boundaries and bridging the gap between deep learning and traditional signal processing.', 'chapters': [{'end': 8717.798, 'start': 8251.54, 'title': 'Understanding latent space and deep learning', 'summary': 'Discusses the impact of compression and quantization on the structure of latent spaces in deep learning, emphasizing the distribution of information, the influence of regularization, and the relationship between latent space dimensionality and problem complexity.', 'duration': 466.258, 'highlights': ['The impact of compression and quantization on the structure of latent space is explored, showing the influence of regularization and the distribution of information. The chapter discusses how compression and quantization affect the structure of latent space, highlighting the influence of regularization on the distribution of information and the relationship between compression, quantization, and classical interpolation.', 'The relationship between latent space dimensionality and problem complexity is examined, illustrating how gradient descent and cross-entropy loss influence the organization and distribution of information in the latent space. The chapter delves into the relationship between latent space dimensionality and problem complexity, explaining how gradient descent and cross-entropy loss impact the organization and distribution of information in the latent space.', 'The use of dropout and regularization techniques to achieve a stable loss and more evenly distributed information in the latent space is discussed. 
The chapter explores the use of dropout and regularization techniques to achieve a stable loss and a more evenly distributed information in the latent space, emphasizing the impact of training settings on the distribution of information.']}, {'end': 9062.397, 'start': 8718.818, 'title': 'Deep learning and adaptive splines', 'summary': 'Explores the use of deep neural networks as a smart way to build an adaptive spline that learns its partition of the input space and parallel affine mapping, providing insights into decision boundaries and potential new learning techniques, bridging the gap between deep learning and traditional signal processing.', 'duration': 343.579, 'highlights': ['The use of deep neural networks as adaptive splines provides insights into decision boundaries, with the discovery that decision boundaries are linear in each region of the partition, potentially leading to new learning techniques and insights into how many layers to stack and which regions to subdivide.', 'The research bridges the gap between traditional signal processing and deep learning by demonstrating that a deep net is a smart way to build an adaptive spline that automatically learns its partition of the input space and parallel affine mapping, providing a new geometric avenue to study how deep neural networks organize signals in a hierarchical fashion.', 'The chapter presents the use of deep neural networks as a smart way to build an adaptive spline, revealing that at each layer, the partition of the mapping evolves by refining the regions containing more than one class within them, which can potentially open the door to building new learning techniques and understanding the behavior of these models.', 'The discussion uncovers that deep neural networks provide a mechanism where samples in some regions can guide the subdivision of regions in the space without samples, particularly powerful in high-dimensional settings where samples in all parts of the space are not feasible, 
highlighting the strength of deep nets in comparison to decision trees.', 'The research delves into the field of computational geometry, providing insights into hyperplane arrangements, half spaces, intersection of them, and hyperplane tessellation, offering visualization tools and interesting insights into understanding the behavior of deep neural networks.']}], 'duration': 810.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw8251540.jpg', 'highlights': ['The use of deep neural networks as adaptive splines provides insights into decision boundaries, potentially leading to new learning techniques.', 'The research bridges the gap between traditional signal processing and deep learning by demonstrating that a deep net is a smart way to build an adaptive spline.', 'The discussion uncovers that deep neural networks provide a mechanism where samples in some regions can guide the subdivision of regions in the space without samples, particularly powerful in high-dimensional settings.', 'The chapter presents the use of deep neural networks as a smart way to build an adaptive spline, revealing that at each layer, the partition of the mapping evolves by refining the regions containing more than one class within them.', 'The impact of compression and quantization on the structure of latent space is explored, showing the influence of regularization and the distribution of information.', 'The relationship between latent space dimensionality and problem complexity is examined, illustrating how gradient descent and cross-entropy loss influence the organization and distribution of information in the latent space.', 'The use of dropout and regularization techniques to achieve a stable loss and a more evenly distributed information in the latent space is discussed.']}, {'end': 9914.934, 'segs': [{'end': 9233.685, 'src': 'embed', 'start': 9207.692, 'weight': 5, 'content': [{'end': 9213.557, 'text': 'And then within that 
region, you have a linear transformation of the mapping, which is basically continuous.', 'start': 9207.692, 'duration': 5.865}, {'end': 9215.679, 'text': 'And both interact.', 'start': 9214.357, 'duration': 1.322}, {'end': 9218.281, 'text': 'If you adapt one, the other one changes, and so on.', 'start': 9215.759, 'duration': 2.522}, {'end': 9228.003, 'text': 'And I think having this type of hybrid systems and where somehow learning through the continuous part adapts,', 'start': 9218.881, 'duration': 9.122}, {'end': 9231.684, 'text': 'the discrete part is what is extremely powerful.', 'start': 9228.003, 'duration': 3.681}, {'end': 9233.685, 'text': "And that's, I think,", 'start': 9232.264, 'duration': 1.421}], 'summary': 'Hybrid systems involving continuous and discrete parts interact and adapt, showcasing their power in learning.', 'duration': 25.993, 'max_score': 9207.692, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9207692.jpg'}, {'end': 9332.151, 'src': 'embed', 'start': 9305.925, 'weight': 7, 'content': [{'end': 9312.956, 'text': 'because if the problem even was learnable with stochastic gradient descent, the representation would be glitchy right?', 'start': 9305.925, 'duration': 7.031}, {'end': 9314.138, 'text': "It just wouldn't, it wouldn't work.", 'start': 9313.016, 'duration': 1.122}, {'end': 9322.864, 'text': 'Yeah, I think it depends a lot too on what are you trying to achieve with the model you build.', 'start': 9316.24, 'duration': 6.624}, {'end': 9332.151, 'text': "If you just try to be as close as possible to, let's say, what the human brain is doing, then you might impose yourself to have some restriction on.", 'start': 9322.904, 'duration': 9.247}], 'summary': 'Discussion on the limitations of stochastic gradient descent in building glitchy representations and the importance of aligning model goals with human brain functions.', 'duration': 26.226, 'max_score': 9305.925, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9305925.jpg'}, {'end': 9432.257, 'src': 'embed', 'start': 9403.278, 'weight': 4, 'content': [{'end': 9409.063, 'text': 'which seems to kind of more slowly transition from interpolation to image extrapolation.', 'start': 9403.278, 'duration': 5.785}, {'end': 9415.046, 'text': "And what I'm wondering is the intuition I got from that, and I wonder if this is completely wrong or it's correct.", 'start': 9409.603, 'duration': 5.443}, {'end': 9424.492, 'text': "is that for machine learning there's a sense in which MNIST is actually a harder problem because it has to look at kind of global relationships?", 'start': 9415.046, 'duration': 9.446}, {'end': 9432.257, 'text': "Like it has to try and say, well, there's a circle over here that's kind of oriented with respect to a line that's kind of further away.", 'start': 9424.572, 'duration': 7.685}], 'summary': 'Transitioning from interpolation to image extrapolation in machine learning, MNIST poses a harder problem due to the need to discern global relationships.', 'duration': 28.979, 'max_score': 9403.278, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9403278.jpg'}, {'end': 9620.055, 'src': 'embed', 'start': 9593.673, 'weight': 1, 'content': [{'end': 9604.042, 'text': 'the same patch for all the samples and then we are looking at the proportion of the test set patches that are in interpolation regime and we report this.', 'start': 9593.673, 'duration': 10.369}, {'end': 9606.924, 'text': 'Now for the PCA plot.', 'start': 9604.802, 'duration': 2.122}, {'end': 9617.132, 'text': 'what we do is basically we look at once you extract this patch, how many principal components you need to perfectly reconstruct those patches, or,', 'start': 9606.924, 'duration': 10.208}, {'end': 9620.055, 'text': 'you could say, to explain the variance in those patches?', 'start': 9617.132, 'duration': 
2.923}], 'summary': 'Analyzing patch proportion in test set for interpolation regime and determining principal components for patch reconstruction.', 'duration': 26.382, 'max_score': 9593.673, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9593673.jpg'}, {'end': 9832.928, 'src': 'embed', 'start': 9805.694, 'weight': 0, 'content': [{'end': 9810.538, 'text': 'Now on ImageNet, a 16 by 16 patch is almost a constant texture.', 'start': 9805.694, 'duration': 4.844}, {'end': 9813.54, 'text': "You have a few different colors, but you don't have a lot of variation.", 'start': 9810.818, 'duration': 2.722}, {'end': 9818.201, 'text': 'through the spatial dimensions is basically constant.', 'start': 9814.82, 'duration': 3.381}, {'end': 9825.384, 'text': 'And what this means is that with only three or four principal components, basically one for each color channel,', 'start': 9818.722, 'duration': 6.662}, {'end': 9827.424, 'text': 'you can perfectly reconstruct all those 16 by 16 patches.', 'start': 9825.384, 'duration': 2.04}, {'end': 9832.928, 'text': 'So, to really get to the conclusions you are saying,', 'start': 9829.745, 'duration': 3.183}], 'summary': 'On imagenet, 16x16 patches show low variation, requiring only 3-4 principal components for perfect reconstruction.', 'duration': 27.234, 'max_score': 9805.694, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9805694.jpg'}], 'start': 9062.397, 'title': 'Hybrid systems and dataset analysis', 'summary': 'Discusses the efficiency of hybrid systems in combining continuous and discrete reasoning, and analyzes the differences in interpolation and extrapolation behaviors between mnist and imagenet datasets, emphasizing the impact of spatial dimensions and principal components.', 'chapters': [{'end': 9374, 'start': 9062.397, 'title': 'Dichotomy of continuous and discrete systems', 'summary': 'Discusses the apparent 
dichotomy between continuous and discrete systems, the efficiency of hybrid systems in combining both types of reasoning, and the interaction between continuous and discrete systems for adaptive training.', 'duration': 311.603, 'highlights': ['The efficiency of hybrid systems in combining discrete and continuous settings for efficient clustering and computation.', 'The interaction between continuous and discrete systems in current networks, leading to adaptive training of their discrete part through training of their continuous parameters.', 'The importance of both continuous and discrete systems interacting to avoid becoming suboptimal in certain regimes and achieve the best convergence rate.', 'The discussion on the goal and purpose of the model in determining whether to prioritize a discrete or continuous system, with the potential efficiency of a hybrid system for task acquisition.', 'The limitations of solely relying on either discrete or continuous systems, and the potential glitchiness of representation in the continuous domain.']}, {'end': 9914.934, 'start': 9374.701, 'title': 'Mnist and imagenet analysis', 'summary': 'Discusses the differences in interpolation and extrapolation behaviors between mnist and imagenet datasets, highlighting the impact of spatial dimensions and principal components on the difficulty of the problem.', 'duration': 540.233, 'highlights': ['The chapter explains that MNIST data transitions more rapidly from interpolation to extrapolation as dimensionality increases compared to ImageNet due to differences in spatial information, indicating that MNIST may be a harder problem for machine learning. 
MNIST data transitions more rapidly from interpolation to extrapolation as dimensionality increases compared to ImageNet', 'The discussion emphasizes that for a fixed dimensionality in pixel space, MNIST goes much more quickly to an extrapolation regime because the information in that amount of dimensions is greater than for the ImageNet case, providing insights into the differences in behavior between the two datasets. MNIST goes much more quickly to an extrapolation regime for a fixed dimensionality in pixel space compared to ImageNet', 'The chapter highlights that MNIST requires more principal components for a given amount of variance explained and dimensionality compared to ImageNet, but clarifies that this is due to the difference in spatial dimensions of the datasets. MNIST requires more principal components for a given amount of variance explained and dimensionality compared to ImageNet']}], 'duration': 852.537, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9062397.jpg', 'highlights': ['The efficiency of hybrid systems in combining discrete and continuous settings for efficient clustering and computation.', 'The interaction between continuous and discrete systems in current networks, leading to adaptive training of their discrete part through training of their continuous parameters.', 'The importance of both continuous and discrete systems interacting to avoid becoming suboptimal in certain regimes and achieve the best convergence rate.', 'The discussion on the goal and purpose of the model in determining whether to prioritize a discrete or continuous system, with the potential efficiency of a hybrid system for task acquisition.', 'The limitations of solely relying on either discrete or continuous systems, and the potential glitchiness of representation in the continuous domain.', 'MNIST data transitions more rapidly from interpolation to extrapolation as dimensionality increases compared to ImageNet 
due to differences in spatial information, indicating that MNIST may be a harder problem for machine learning.', 'For a fixed dimensionality in pixel space, MNIST goes much more quickly to an extrapolation regime because the information in that amount of dimensions is greater than for the ImageNet case, providing insights into the differences in behavior between the two datasets.', 'MNIST requires more principal components for a given amount of variance explained and dimensionality compared to ImageNet, but clarifies that this is due to the difference in spatial dimensions of the datasets.']}, {'end': 10832.758, 'segs': [{'end': 9972.878, 'src': 'embed', 'start': 9942.081, 'weight': 9, 'content': [{'end': 9943.722, 'text': 'what he says is.', 'start': 9942.081, 'duration': 1.641}, {'end': 9952.23, 'text': 'my purpose behind this paper is to show that, even though the intuition that people have of interpolation,', 'start': 9943.722, 'duration': 8.508}, {'end': 9963.02, 'text': 'like the intuition that we have of interpolation, is good, the mathematical definition that we have of interpolation is not useful in high dimension.', 'start': 9952.23, 'duration': 10.79}, {'end': 9968.754, 'text': 'What I thought was interesting, too, is when we asked about well, what about the manifold concept?', 'start': 9963.73, 'duration': 5.024}, {'end': 9972.878, 'text': "You know, why isn't that sort of the definition of interpolation?", 'start': 9968.935, 'duration': 3.943}], 'summary': 'Paper aims to demonstrate limitations of mathematical definition of interpolation in high dimensions.', 'duration': 30.797, 'max_score': 9942.081, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9942081.jpg'}, {'end': 10033.565, 'src': 'embed', 'start': 9990.874, 'weight': 6, 'content': [{'end': 9998.541, 'text': "in high enough dimension, interpolation doesn't work and there your manifold is just literally a convex hull.", 'start': 
9990.874, 'duration': 7.667}, {'end': 10004.447, 'text': 'you know, and so sure you can have kind of a nonlinear transformation and a nonlinear shape and whatever.', 'start': 9998.541, 'duration': 5.906}, {'end': 10007.57, 'text': "you're still hit by this curse of dimensionality.", 'start': 10004.447, 'duration': 3.123}, {'end': 10013.113, 'text': 'and And he, you know, he brought up the point that, like you know, of course,', 'start': 10007.57, 'duration': 5.543}, {'end': 10023.659, 'text': 'if your problem compresses down enough to where only a small number of transformed dimensions right latent space dimensions matter and everything,', 'start': 10013.113, 'duration': 10.546}, {'end': 10027.101, 'text': "then you can be said that you're interpolating.", 'start': 10023.659, 'duration': 3.442}, {'end': 10032.744, 'text': "you know, because we're not really hit by that curse of dimensionality, because we stripped away all the dimensionality down to these dimensions.", 'start': 10027.101, 'duration': 5.643}, {'end': 10033.565, 'text': "We've gotten lucky.", 'start': 10032.784, 'duration': 0.781}], 'summary': 'In high dimensions, interpolation fails, and a lower-dimensional latent space avoids the curse of dimensionality.', 'duration': 42.691, 'max_score': 9990.874, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9990874.jpg'}, {'end': 10174.568, 'src': 'embed', 'start': 10151.459, 'weight': 4, 'content': [{'end': 10158.402, 'text': "So I think that's an area where we gotta dive deeper with him, but I didn't hear anything definitive, you know, today.", 'start': 10151.459, 'duration': 6.943}, {'end': 10159.323, 'text': "It's a wrap.", 'start': 10158.823, 'duration': 0.5}, {'end': 10162.384, 'text': 'We just interviewed the godfather of deep learning.', 'start': 10160.043, 'duration': 2.341}, {'end': 10165.485, 'text': "How's that possible? 
I think we can just quit now.", 'start': 10162.704, 'duration': 2.781}, {'end': 10166.886, 'text': 'We might as well just shut the channel down.', 'start': 10165.545, 'duration': 1.341}, {'end': 10171.247, 'text': "Yeah I mean, obviously, after we've published this.", 'start': 10167.606, 'duration': 3.641}, {'end': 10174.568, 'text': "That's the singularity.", 'start': 10173.307, 'duration': 1.261}], 'summary': 'Interview with the godfather of deep learning. the singularity.', 'duration': 23.109, 'max_score': 10151.459, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw10151459.jpg'}, {'end': 10322.772, 'src': 'embed', 'start': 10291.426, 'weight': 5, 'content': [{'end': 10296.088, 'text': 'However, we work on it, we modify them,', 'start': 10291.426, 'duration': 4.662}, {'end': 10309.058, 'text': "we're figuring out how do we need to build these systems such that A learning system can conceivably do many of the things that people would call kind of reasoning.", 'start': 10296.088, 'duration': 12.97}, {'end': 10311.321, 'text': "And I think that's why he went into.", 'start': 10309.258, 'duration': 2.063}, {'end': 10315.405, 'text': 'let me give you an example of reasoning to sort of show.', 'start': 10311.321, 'duration': 4.084}, {'end': 10322.772, 'text': 'look, here is an example of a reasoning that neural networks already do.', 'start': 10315.405, 'duration': 7.367}], 'summary': 'Developing systems for learning and reasoning, neural networks already demonstrate reasoning capabilities.', 'duration': 31.346, 'max_score': 10291.426, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw10291426.jpg'}, {'end': 10794.529, 'src': 'embed', 'start': 10766.101, 'weight': 0, 'content': [{'end': 10780.447, 'text': "you couldn't build neural network that you know has like these kind of continuous values but still ends up with something that that's that synthesizes a a 
discrete,", 'start': 10766.101, 'duration': 14.346}, {'end': 10784.669, 'text': "you know, decision, sure, but wouldn't it be glitchy and it still wouldn't extrapolate?", 'start': 10780.447, 'duration': 4.222}, {'end': 10786.545, 'text': 'I have no idea.', 'start': 10785.444, 'duration': 1.101}, {'end': 10787.905, 'text': "I just don't know.", 'start': 10786.565, 'duration': 1.34}, {'end': 10794.529, 'text': "I think this kind of point is we're too early on to reach these conclusions that it cannot be done.", 'start': 10787.965, 'duration': 6.564}], 'summary': 'Challenges in building neural networks for synthesizing discrete decisions and extrapolating are uncertain at this early stage.', 'duration': 28.428, 'max_score': 10766.101, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw10766101.jpg'}, {'end': 10844.983, 'src': 'embed', 'start': 10816.847, 'weight': 2, 'content': [{'end': 10819.429, 'text': "that's going to be these hybrid systems working together.", 'start': 10816.847, 'duration': 2.582}, {'end': 10824.654, 'text': 'Just like you brought up with Hey look, you know um alpha, alpha zero, alpha, go, all those things are.', 'start': 10819.489, 'duration': 5.165}, {'end': 10826.576, 'text': 'it can be viewed in the same kind of hybrid way.', 'start': 10824.654, 'duration': 1.922}, {'end': 10830.697, 'text': "Right, But that's exactly I mean you said.", 'start': 10826.656, 'duration': 4.041}, {'end': 10832.758, 'text': 'this is what you said and also what when he,', 'start': 10830.697, 'duration': 2.061}, {'end': 10839.461, 'text': "when he said like so first when he said you know we're far away from that with the digits of pi and so on,", 'start': 10832.758, 'duration': 6.703}, {'end': 10844.983, 'text': 'but also when he said you know humans are actually pretty bad at chess or at discrete exploration in general.', 'start': 10839.461, 'duration': 5.522}], 'summary': 'Hybrid systems like alphazero and 
alphago are viewed in a similar way, with humans being relatively poor at chess and discrete exploration.', 'duration': 28.136, 'max_score': 10816.847, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw10816847.jpg'}], 'start': 9915.374, 'title': 'Challenges in high-dimensional interpolation, deep learning, and neural networks for discrete reasoning', 'summary': 'Discusses challenges in high-dimensional interpolation and deep learning, emphasizing the limitations in mathematical definition, the need for generalization, difficulties in deep learning, including handling reasoning and debate over interpolation and extrapolation, and the feasibility of using neural networks for discrete reasoning and potential hybrid systems.', 'chapters': [{'end': 10150.499, 'start': 9915.374, 'title': 'Challenges in high-dimensional interpolation', 'summary': 'Discusses the challenges of high-dimensional interpolation, emphasizing the limitations of the mathematical definition in high dimensions and the need for a better definition to maintain intuitive notions while working in high dimensions, with a goal of achieving generalization.', 'duration': 235.125, 'highlights': ['The mathematical definition of interpolation is not useful in high dimensions The intuition of interpolation is good, but the mathematical definition is not useful in high dimensions, highlighting the limitations of high-dimensional interpolation.', "Interpolation doesn't work in high enough dimension even for linear problems Even in the case of linear problems in high dimensions, interpolation doesn't work, as high dimensionality leads to the failure of interpolation, emphasizing the impact of dimensionality on the effectiveness of interpolation.", 'Need for a better definition of interpolation for high dimensions to achieve generalization There is a need to come up with a definition of interpolation that maintains intuitive notions while working in high 
dimensions, with the ultimate goal of achieving generalization, emphasizing the importance of developing a better definition for high-dimensional interpolation.', 'Difficult questions in high-dimensional interpolation are problem-specific and not universally applicable The challenges and concepts related to high-dimensional interpolation are problem-specific and not universally applicable, indicating the need for task-specific and problem-specific approaches in addressing high-dimensional interpolation.']}, {'end': 10472.214, 'start': 10151.459, 'title': 'Challenges in deep learning', 'summary': 'Discusses the challenges in deep learning, including the need to identify areas of disagreement, the potential for neural networks to handle reasoning, and the debate over interpolation and extrapolation in machine learning, as highlighted by the interview.', 'duration': 320.755, 'highlights': ['The interviewee emphasizes the need to identify areas of disagreement in deep learning, indicating a general sentiment of agreement among academics and the importance of understanding differing perspectives.', "The interviewee presents a more optimistic outlook on the capabilities of neural networks for reasoning, contrasting with the skepticism of some individuals, with a focus on modifying and expanding the systems' abilities.", 'The discussion delves into the debate over interpolation and extrapolation in machine learning, with objections raised against drawing strong conclusions about the capabilities of neural networks and the need for further work to enhance their capabilities.']}, {'end': 10832.758, 'start': 10472.214, 'title': 'Feasibility of neural networks for discrete reasoning', 'summary': "Discusses the feasibility of using neural networks for discrete reasoning, highlighting the challenges and potential hybrid systems, emphasizing the ongoing debate on whether it's possible, efficient, or glitchy.", 'duration': 360.544, 'highlights': ["Yann LeCun and the debate on 
neural networks' ability to generalize and interpolate, quoting Francois Chollet and Andrew Yee, questioning the possibility of neural networks to generalize and the comparison with Taylor series and Fourier analysis.", "The comparison of the human brain's continuous analog function and its ability to produce digital reasoning, raising doubts about the efficient use of neural networks for discrete reasoning.", 'The potential future use of hybrid systems combining neural networks with classic digital compute components, posing the question of whether neural networks will be the primary method for discrete reasoning or part of a hybrid structural system.', 'The continuous and distributed nature of the brain, with no discrete components found, leading to skepticism about the possibility of building neural networks that synthesize discrete decisions efficiently without glitches or extrapolation issues.']}], 'duration': 917.384, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw9915374.jpg', 'highlights': ['The challenges and concepts related to high-dimensional interpolation are problem-specific and not universally applicable, indicating the need for task-specific and problem-specific approaches in addressing high-dimensional interpolation.', 'There is a need to come up with a definition of interpolation that maintains intuitive notions while working in high dimensions, with the ultimate goal of achieving generalization, emphasizing the importance of developing a better definition for high-dimensional interpolation.', "Even in the case of linear problems in high dimensions, interpolation doesn't work, as high dimensionality leads to the failure of interpolation, emphasizing the impact of dimensionality on the effectiveness of interpolation.", 'The mathematical definition of interpolation is not useful in high dimensions The intuition of interpolation is good, but the mathematical definition is not useful in high 
dimensions, highlighting the limitations of high-dimensional interpolation.', 'The interviewee emphasizes the need to identify areas of disagreement in deep learning, indicating a general sentiment of agreement among academics and the importance of understanding differing perspectives.', 'The discussion delves into the debate over interpolation and extrapolation in machine learning, with objections raised against drawing strong conclusions about the capabilities of neural networks and the need for further work to enhance their capabilities.', "The interviewee presents a more optimistic outlook on the capabilities of neural networks for reasoning, contrasting with the skepticism of some individuals, with a focus on modifying and expanding the systems' abilities.", "Yann LeCun and the debate on neural networks' ability to generalize and interpolate, quoting Francois Chollet and Andrew Yee, questioning the possibility of neural networks to generalize and the comparison with Taylor series and Fourier analysis.", 'The potential future use of hybrid systems combining neural networks with classic digital compute components, posing the question of whether neural networks will be the primary method for discrete reasoning or part of a hybrid structural system.', "The comparison of the human brain's continuous analog function and its ability to produce digital reasoning, raising doubts about the efficient use of neural networks for discrete reasoning.", 'The continuous and distributed nature of the brain, with no discrete components found, leading to skepticism about the possibility of building neural networks that synthesize discrete decisions efficiently without glitches or extrapolation issues.']}, {'end': 11982.349, 'segs': [{'end': 10858.21, 'src': 'embed', 'start': 10832.758, 'weight': 3, 'content': [{'end': 10839.461, 'text': "when he said like so first when he said you know we're far away from that with the digits of pi and so on,", 'start': 10832.758, 'duration': 
6.703}, {'end': 10844.983, 'text': 'but also when he said you know humans are actually pretty bad at chess or at discrete exploration in general.', 'start': 10839.461, 'duration': 5.522}, {'end': 10851.526, 'text': "And that is, that's how humans do it, right? Humans build discrete reasoning.", 'start': 10845.684, 'duration': 5.842}, {'end': 10856.669, 'text': 'on top of this sort of neural continuous function.', 'start': 10852.907, 'duration': 3.762}, {'end': 10858.21, 'text': "And it's actually really hard.", 'start': 10856.709, 'duration': 1.501}], 'summary': 'Humans struggle with discrete reasoning on top of neural continuous function, making it challenging.', 'duration': 25.452, 'max_score': 10832.758, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw10832758.jpg'}, {'end': 11109.342, 'src': 'embed', 'start': 11088.111, 'weight': 9, 'content': [{'end': 11097.014, 'text': 'we need to end-to-end learn a machine where you simply input something and then out pops through forward prop, out pops an answer.', 'start': 11088.111, 'duration': 8.903}, {'end': 11099.255, 'text': 'But all of these things are just reasoning by the backdoor.', 'start': 11097.054, 'duration': 2.201}, {'end': 11101.1, 'text': 'Even with that critic.', 'start': 11100.2, 'duration': 0.9}, {'end': 11105.521, 'text': "as I understand, that's just the way of hacking the value, the advantage function right?", 'start': 11101.1, 'duration': 4.421}, {'end': 11109.342, 'text': "I'm not an expert in reinforcement learning, but I think that's what it is.", 'start': 11106.101, 'duration': 3.241}], 'summary': 'Learning a machine to input and output an answer, using reasoning by the backdoor and hacking the advantage function.', 'duration': 21.231, 'max_score': 11088.111, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw11088111.jpg'}, {'end': 11374.167, 'src': 'embed', 'start': 11345.03, 
'weight': 0, 'content': [{'end': 11352.635, 'text': 'But Anyway, my intuition is that deep learning models encode the most high frequency information into the latent space.', 'start': 11345.03, 'duration': 7.605}, {'end': 11358.879, 'text': 'And this information would be encoded in a minimally distributed way to denoise the predictive task,', 'start': 11352.775, 'duration': 6.104}, {'end': 11365.824, 'text': "which is to say there's a few dimensions of the latent which should be encoding the actual things that you trained it on.", 'start': 11358.879, 'duration': 6.945}, {'end': 11374.167, 'text': 'So my intuition is actually most of the dimensions of the latent are kind of just encoding low frequency information.', 'start': 11366.284, 'duration': 7.883}], 'summary': 'Deep learning models encode high frequency info in a minimally distributed way to denoise predictive task.', 'duration': 29.137, 'max_score': 11345.03, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw11345030.jpg'}, {'end': 11430.512, 'src': 'embed', 'start': 11407.822, 'weight': 8, 'content': [{'end': 11417.567, 'text': 'If it were too much space, I would encode the same information pretty much redundantly in many of the dimensions that I could.', 'start': 11407.822, 'duration': 9.745}, {'end': 11424.55, 'text': "But that's noisy, right? 
So you're taking a softmax, and if you're noisily aggregating over all of those features, you don't want to do that.", 'start': 11417.587, 'duration': 6.963}, {'end': 11430.512, 'text': 'But still, if I have backprop, right, then the backprop path goes through each of the dimensions.', 'start': 11425.968, 'duration': 4.544}], 'summary': 'Encoding redundant information in multiple dimensions can add noise to the softmax aggregation process, affecting backpropagation.', 'duration': 22.69, 'max_score': 11407.822, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw11407822.jpg'}, {'end': 11587.189, 'src': 'embed', 'start': 11558.175, 'weight': 7, 'content': [{'end': 11563.256, 'text': "let's say it's just barely big enough to to to classify your images.", 'start': 11558.175, 'duration': 5.081}, {'end': 11565.057, 'text': "and let's suppose you're doing multi-class.", 'start': 11563.256, 'duration': 1.801}, {'end': 11571.24, 'text': "so we've got 10, 10, 10 classes, you know, and we just barely got enough latent space for it.", 'start': 11565.057, 'duration': 6.183}, {'end': 11580.825, 'text': "My intuition would tell me that if those classes are somewhat different from one another like it's not we're classifying brown dogs from white dogs,", 'start': 11571.92, 'duration': 8.905}, {'end': 11587.189, 'text': "from like every other, you know, simple kind of dog, but they're different from each other and it's not easy to determine.", 'start': 11580.825, 'duration': 6.364}], 'summary': 'If the latent space is barely enough for 10 different classes, with distinct differences, it might pose challenges in classification.', 'duration': 29.014, 'max_score': 11558.175, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw11558175.jpg'}, {'end': 11680.809, 'src': 'embed', 'start': 11648.472, 'weight': 1, 'content': [{'end': 11657.041, 'text': "Whereas if you were to talk to a 
human, that stuff wouldn't happen as much.", 'start': 11648.472, 'duration': 8.569}, {'end': 11659.203, 'text': "Of course, there's the argument.", 'start': 11658.002, 'duration': 1.201}, {'end': 11667.826, 'text': "People that say, well, it's just interpolating or it might also be, you know, it's just sort of repeating the stuff in the training data.", 'start': 11660.024, 'duration': 7.802}, {'end': 11680.809, 'text': "I think what they'd like to see is more like the pattern that these models extract aren't sufficiently high level for them, right?", 'start': 11668.546, 'duration': 12.263}], 'summary': 'Humans offer better understanding than ai in data analysis.', 'duration': 32.337, 'max_score': 11648.472, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw11648472.jpg'}, {'end': 11853.528, 'src': 'embed', 'start': 11820.516, 'weight': 2, 'content': [{'end': 11822.277, 'text': 'But how much refining is it?', 'start': 11820.516, 'duration': 1.761}, {'end': 11825.44, 'text': 'Is it more finding things that already work versus refining??', 'start': 11822.538, 'duration': 2.902}, {'end': 11834.054, 'text': "Yeah, it's well same, but what it is not is sort of learning from scratch.", 'start': 11826.004, 'duration': 8.05}, {'end': 11842.224, 'text': "That's what, like, people, like, we cannot, you cannot initialize a neural network from zeros and then have it learn well, at least not today.", 'start': 11834.134, 'duration': 8.09}, {'end': 11848.407, 'text': "Maybe that's going to come in in some form, but initialization is actually pretty important, pretty crazy, isn't it?", 'start': 11842.324, 'duration': 6.083}, {'end': 11853.528, 'text': "and and that's crazy and that's, yeah, that's like one hint that you know we're, we're not.", 'start': 11848.407, 'duration': 5.121}], 'summary': 'Discussion on neural network initialization and its importance in learning.', 'duration': 33.012, 'max_score': 11820.516, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw11820516.jpg'}], 'start': 10832.758, 'title': 'Challenges of neural networks', 'summary': 'Discusses challenges of integrating discrete reasoning with neural networks, latent space analysis, and limitations of gpt-3, addressing issues such as information redundancy, classification accuracy, and high-level pattern extraction.', 'chapters': [{'end': 11247.756, 'start': 10832.758, 'title': 'Discrete reasoning and neural networks', 'summary': 'Discusses the challenges and potential of integrating discrete reasoning on top of neural continuous function, the limitations of human discrete reasoning, and the implications of differentiable computation in neural networks.', 'duration': 414.998, 'highlights': ['The challenges of integrating discrete reasoning on top of neural continuous function are discussed, highlighting the difficulty of performing discrete reasoning in the human mind and the potential power of training a discrete algorithm on top of a continuous function. N/A', "The limitations of human discrete reasoning are explored, emphasizing the slow and deliberate nature of performing tasks like mental multiplication of five-digit numbers and the comparison to easily achievable multiplication by children's toys. N/A", 'The preference for differentiable computation in neural networks is addressed, with a focus on the desire for end-to-end learning and the potential sophistication of intelligence through discrete reasoning, albeit in a differentiable and learnable manner. 
N/A']}, {'end': 11619.468, 'start': 11248.576, 'title': 'Neural network latent space analysis', 'summary': 'Discusses the analysis of the latent space in neural networks, addressing the issue of interpolation and extrapolation, the impact of latent space dimensions on information redundancy, and the potential for sparsity and precision in distributing information, highlighting the implications for classification accuracy and the encoding of distinct classes.', 'duration': 370.892, 'highlights': ['The latent space in neural networks determines the presence of interpolation or extrapolation, with findings showing that on a trained ResNet, interpolation does not occur in a space of 30 dimensions or more.', 'Deep learning models encode high-frequency information in a minimally distributed way to denoise the predictive task, suggesting that most latent dimensions encode low-frequency information, potentially allowing for discarding of certain dimensions.', 'The size of the latent space impacts the encoding of information, with larger latent spaces leading to redundant encoding of information, thereby affecting the backpropagation process and the overall classification accuracy.', 'The distribution of information in the latent space, while potentially increasing precision, may have a minor impact, particularly in cases where the latent space is just sufficient for classification of distinct classes, potentially leading to specific bit encodings for different classes.']}, {'end': 11982.349, 'start': 11620.376, 'title': 'Limitations of gpt-3 and neural networks', 'summary': 'Discusses the limitations of gpt-3 and neural networks, questioning the ability to extract high-level patterns and the significance of initialization in training, while highlighting the evolutionary approach to learning.', 'duration': 361.973, 'highlights': ['The discussion revolves around the limitations of GPT-3 and neural networks in extracting high-level patterns and the potential fundamental 
limitations in achieving arbitrary patterns.', 'The importance of initialization in training neural networks is emphasized, suggesting that the choice of initialization serves as a buffet for stochastic gradient descent to refine and find good combinations of weights.', 'The evolutionary approach to learning is discussed, comparing it to a massively distributed combinatorial trial-and-error search and highlighting that it differs from the conventional learning system.']}], 'duration': 1149.591, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/86ib0sfdFtw/pics/86ib0sfdFtw10832758.jpg', 'highlights': ['The challenges of integrating discrete reasoning on top of a continuous neural function are discussed, highlighting the difficulty of performing discrete reasoning in the human mind and the potential power of training a discrete algorithm on top of a continuous function.', "The limitations of human discrete reasoning are explored, emphasizing the slow and deliberate nature of performing tasks like mental multiplication of five-digit numbers and the comparison to easily achievable multiplication by children's toys.", 'The preference for differentiable computation in neural networks is addressed, with a focus on the desire for end-to-end learning and the potential sophistication of intelligence through discrete 
reasoning, albeit in a differentiable and learnable manner.', 'The latent space in neural networks determines the presence of interpolation or extrapolation, with findings showing that on a trained ResNet, interpolation does not occur in a space of 30 dimensions or more.', 'Deep learning models encode high-frequency information in a minimally distributed way to denoise the predictive task, suggesting that most latent dimensions encode low-frequency information, potentially allowing for discarding of certain dimensions.', 'The size of the latent space impacts the encoding of information, with larger latent spaces leading to redundant encoding of information, thereby affecting the backpropagation process and the overall classification accuracy.', 'The distribution of information in the latent space, while potentially increasing precision, may have a minor impact, particularly in cases where the latent space is just sufficient for classification of distinct classes, potentially leading to specific bit encodings for different classes.', 'The discussion revolves around the limitations of GPT-3 and neural networks in extracting high-level patterns and the potential fundamental limitations in achieving arbitrary patterns.', 'The importance of initialization in training neural networks is emphasized, suggesting that the choice of initialization serves as a buffet for stochastic gradient descent to refine and find good combinations of weights.', 'The evolutionary approach to learning is discussed, comparing it to a massively distributed combinatorial trial-and-error search and highlighting that it differs from the conventional learning system.']}], 'highlights': ['Yann LeCun refutes the claim that deep learning is limited to interpolation and incapable of extrapolation, emphasizing that any qualitative difference will not take the form of something fundamentally different from deep learning.', 'In high dimensions, all of machine learning is extrapolation. Yann LeCun argues that in high 
dimensions, machine learning operates through extrapolation rather than interpolation, challenging traditional notions of learning.', 'The revelation that neural networks recursively chop up the input space into little convex cells or polyhedra, leading to a shift in understanding of their behavior.', 'The realization that the prediction for each input is representable with a single affine transformation, leading to a decrease in the mystery surrounding neural networks.', 'The challenge of the curse of dimensionality and the need to know what to ignore in the input space, whose volume grows exponentially with the number of dimensions.', 'The problem of extrapolation outside of training data and its implications for generalization in deep learning.', 'The deceptive nature of neural networks and the incorporation of human-crafted domain knowledge, which undermines the perception of them as blank slates.', "Randall's work demonstrates that a large class of neural networks can be entirely rewritten as compositions of linear functions arranged in polyhedra, providing new technical understanding and insights into neural network functioning.", 'The piecewise linear perspective on neural networks opens new avenues of technical understanding and exploration, offering a geometrically principled way of devising regularization penalty terms to improve neural network performance.', 'Neural networks quantize the input space like a vector search engine using locality-sensitive hashing, and they share information by reusing hyperplanes to form decision boundaries.', 'The composition of piecewise linear functions in the second layer forms a decision surface in the ambient space, creating a smooth appearance through the combination of piecewise linear chops.', 'Neural networks can extrapolate beyond the training data, potentially impacting the relevance of the extrapolated information for the specific problem.', 'Challenges in defining interpolation and extrapolation in multidimensional spaces, emphasizing the importance of 
considering the saliency of dimensions in defining these concepts.', 'Deep learning models struggle to generalize to data outside the training distribution.', 'Neural networks require relevant nonlinear transformations for efficient generalization.', 'Structural priors in models are crucial for systematic generalization.', 'The key endeavor in machine learning is to find better representations revealing the interpolative nature.', 'The efficiency of hybrid systems in combining discrete and continuous settings for efficient clustering and computation.', 'The interaction between continuous and discrete systems in current networks, leading to adaptive training of their discrete part through training of their continuous parameters.', 'The importance of both continuous and discrete systems interacting to avoid becoming suboptimal in certain regimes and achieve the best convergence rate.', 'The challenges and concepts related to high-dimensional interpolation are problem-specific and not universally applicable, indicating the need for task-specific and problem-specific approaches in addressing high-dimensional interpolation.', 'The discussion delves into the debate over interpolation and extrapolation in machine learning, with objections raised against drawing strong conclusions about the capabilities of neural networks and the need for further work to enhance their capabilities.', "The limitations of human discrete reasoning are explored, emphasizing the slow and deliberate nature of performing tasks like mental multiplication of five-digit numbers and the comparison to easily achievable multiplication by children's toys.", 'The latent space in neural networks determines the presence of interpolation or extrapolation, with findings showing that on a trained ResNet, interpolation does not occur in a space of 30 dimensions or more.', 'The distribution of information in the latent space, while potentially increasing precision, may have a minor impact, particularly in cases 
where the latent space is just sufficient for classification of distinct classes, potentially leading to specific bit encodings for different classes.']}