title
GEOMETRIC DEEP LEARNING BLUEPRINT

description
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB

"Symmetry, as wide or narrow as you may define its meaning, is one idea by which man through the ages has tried to comprehend and create order, beauty, and perfection." -- Hermann Weyl, a German mathematician born in the late 19th century.

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact tractable given enough computational horsepower. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, and second, learning by local gradient-descent-type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, many tasks are not uniform and have strong repeating patterns as a result of the low-dimensionality and structure of the physical world. Geometric Deep Learning unifies a broad class of ML problems from the perspectives of symmetry and invariance. These principles not only underlie the breakthrough performance of convolutional neural networks and the recent success of graph neural networks, but also provide a principled way to construct new types of problem-specific inductive biases.

This week we spoke with Professor Michael Bronstein (head of graph ML at Twitter), Dr. Petar Veličković (Senior Research Scientist at DeepMind), Dr. Taco Cohen, and Prof. Joan Bruna about their new proto-book, Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. Enjoy the show!

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
https://arxiv.org/abs/2104.13478

[00:00:00] Tim Intro
[00:01:55] Fabian Fuchs article
[00:04:05] High-dimensional learning and the curse of dimensionality
[00:05:33] Inductive priors
[00:07:55] The proto-book
[00:09:37] The domains of geometric deep learning
[00:10:03] Symmetries
[00:12:03] The blueprint
[00:13:30] NNs don't deal with network structure (TEDx)
[00:14:26] Penrose - standing edition
[00:15:29] Past decade revolution (ICLR)
[00:16:34] Talking about the blueprint
[00:17:11] Interpolated nature of DL / intelligence
[00:21:29] Going back to Euclid
[00:22:42] Erlangen program
[00:24:56] "How is geometric deep learning going to have an impact?"
[00:26:36] Introduce Michael and Petar
[00:28:35] Petar intro
[00:32:52] Algorithmic reasoning
[00:36:16] Thinking fast and slow (Petar)
[00:38:12] Taco intro
[00:46:52] Deep learning is the craze now (Petar)
[00:48:38] On convolutions (Taco)
[00:53:17] Joan Bruna's voyage into geometric deep learning
[00:56:51] What is your most passionately held belief about machine learning? (Bronstein)
[00:57:57] Is the function approximation theorem still useful? (Bruna)
[01:11:52] Could an NN learn a sorting algorithm efficiently? (Bruna)
[01:17:08] Curse of dimensionality / manifold hypothesis (Bronstein)
[01:25:17] Will we ever understand approximation of deep neural networks? (Bruna)
[01:29:01] Can NNs extrapolate outside of the training data? (Bruna)
[01:31:21] What areas of math are needed for geometric deep learning? (Bruna)
[01:32:18] Graphs are really useful for representing most natural data (Petar)
[01:35:09] What was your biggest aha moment early on? (Bronstein)
[01:39:04] What gets you most excited? (Bronstein)
[01:39:46] Main show kick-off + conservation laws
[01:49:10] Graphs are king
[01:52:44] Vector spaces vs discrete
[02:00:08] Does language have a geometry? Which domains can geometry not be applied to? + Category theory
[02:04:21] Abstract categories in language from graph learning
[02:07:10] Reasoning and extrapolation in knowledge graphs
[02:15:36] Transformers are graph neural networks?
[02:21:31] Tim never liked positional embeddings
[02:24:13] Is the case for invariance overblown? Could it actually be harmful?
[02:31:24] Why is geometry a good prior?
[02:34:28] Augmentations vs architecture, and on learning approximate invariance
[02:37:04] Data augmentation vs symmetries (Taco)
[02:40:37] Could symmetries be harmful? (Taco)
[02:47:43] Discovering group structure (from Yannic)
[02:49:36] Are fractals a good analogy for physical reality?
[02:52:50] Is physical reality high-dimensional or not?
[02:54:30] Heuristics which deal with permutation blowups in GNNs
[02:59:46] Practical blueprint of building a geometric network architecture
[03:01:50] Symmetry-discovering procedures
[03:04:05] How could real-world data scientists benefit from geometric DL?
[03:07:17] Most important problem to solve in message passing in GNNs
[03:09:09] Better RL sample efficiency as a result of geometric DL (XLVIN paper)
[03:14:02] Geometric DL helping latent graph learning
[03:17:07] On intelligence
[03:23:52] Convolutions on irregular objects (Taco)

detail
Summary: The episode covers symmetries in modern machine learning, geometric deep learning and its blueprint, applications in computer vision and drug design, the challenges and potential of machine learning, scale separation and symmetry in deep learning, learning in high-dimensional spaces, transformer models, geometric and symmetric principles in data processing, and advancements in graph neural networks, emphasising throughout the importance of geometric deep learning and symmetries in diverse applications.

Chapter overview:

[00:00:00-00:01:43] Symmetries in modern machine learning
Modern machine learning operates with large, high-quality datasets, which motivates the design of rich function spaces with the capacity to interpolate over the data points. This mindset plays well with neural networks, since even the simplest choices of inductive prior yield a dense class of functions. Since the early days, researchers have adapted neural networks to exploit the low-dimensional geometry arising from physical measurements -- grids in images, sequences in time series, or position and momentum in molecules -- and their associated symmetries, such as translation or rotation. Tim notes that this epic special edition has been in the works since May, runs about three and a half hours (use the table of contents on YouTube to skip around), and that the second half is a traditional-style MLST episode while the opening is a bit of an experiment.
[00:01:43-00:28:18] Geometric deep learning
Constraining a model to functions with a known geometric property reduces the space of possible functions searched through, which means less risk of statistical error and less risk of overfitting -- and it can be done without increasing approximation error, because the true function is known for sure to have the geometric property being biased into the model. This is the motivation behind the Geometric Deep Learning proto-book by Professor Michael Bronstein, Professor Joan Bruna, Dr. Taco Cohen, and Dr. Petar Veličković.
The book's building blocks provide a rich approximation space with prescribed invariance and stability properties; combined, they form what the authors call the geometric deep learning blueprint. The authors also introduce geometric stability, which extends the notions of group invariance and equivariance to approximate symmetries -- transformations around, rather than exactly in, the group. Bronstein: if the thousand-plus pages of the book had to be compressed into a single word, it would be "symmetry" -- the fundamental idea that underpins all modern physics, to the point that the standard model of particle physics can be derived entirely from symmetry considerations. Deep learning has revolutionised data science but lacks unifying principles, so a geometric unification in the spirit of the Erlangen program -- geometric deep learning -- serves two purposes: to provide a common mathematical framework from which the most successful neural network architectures can be derived, and to give a constructive procedure for building future architectures in a principled way, applicable to grids, homogeneous spaces with global transformation groups, graphs, and manifolds. On artificial general intelligence: the question is hard because it involves terms that are not well defined -- we do not understand what human intelligence is, everybody gives the term a different meaning, we need not emulate it, and as a set of tasks "artificial intelligence" has always been a moving target.
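The invariance/equivariance vocabulary used above recurs throughout the episode; for reference, the standard definitions for a symmetry group G acting on inputs (and, for equivariance, on outputs too) can be written as:

% f is G-invariant: transforming the input leaves the output unchanged
f(g \cdot x) = f(x) \qquad \forall g \in G, \; \forall x
% f is G-equivariant: transforming the input transforms the output predictably
f(g \cdot x) = g \cdot f(x) \qquad \forall g \in G, \; \forall x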
Section breakdown:

[00:01:43-00:05:54] Deep learning on sets and the curse of dimensionality
High-dimensional learning is intractable in general -- the number of samples required grows exponentially with the number of dimensions -- so strong assumptions about the regularity of the space of functions are needed. The Janossy pooling framework provides a computationally tractable way of achieving permutation invariance: generate the k-tuples of a set and average the target function over those orderings. Setting k = n and sampling the permutations yields approximate permutation invariance that gives good results even with a small number of samples (a sketch of the idea follows below).
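A minimal sketch of the Janossy pooling idea as described above -- average a permutation-sensitive function over orderings of the input, either exactly over all permutations or approximately over a few sampled ones. The function names here are illustrative, not from any particular library:

import itertools
import random

def janossy_pooling(f, xs, k=None, num_samples=None):
    # Make a permutation-sensitive function f permutation-invariant by
    # averaging it over ordered k-tuples drawn from the set xs.
    k = len(xs) if k is None else k
    if num_samples is None:
        # Exact Janossy pooling: average over every ordered k-tuple.
        tuples = list(itertools.permutations(xs, k))
    else:
        # Approximate variant (k = n): sample random permutations instead.
        tuples = [tuple(random.sample(xs, k)) for _ in range(num_samples)]
    return sum(f(t) for t in tuples) / len(tuples)

# The exact average is identical for any ordering of the input set:
f = lambda t: sum(v * (i + 1) for i, v in enumerate(t))  # order-sensitive
print(janossy_pooling(f, [1, 2, 3]) == janossy_pooling(f, [3, 1, 2]))  # True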
[00:05:55-00:14:26] Geometric deep learning
The importance of inductive bias and the trade-off between statistical, approximation, and optimisation error motivate the search for better function spaces. Geometric priors define a new class of function spaces that reduce statistical error and overfitting without increasing approximation error, and give a principled way to incorporate prior physical knowledge into neural architectures. The blueprint rests on three core principles -- symmetry, scale separation, and geometric stability -- which can be recognised in today's popular deep neural network architectures. Among the success stories in industrial applications: geometric deep learning has been used to learn the network effects of clinically approved drugs and to predict the properties of other molecules, work that has featured on the covers of major biological journals.

[00:14:26-00:18:48] Symmetry in modern physics and deep learning
Symmetry underpins all of modern physics. Deep learning, by contrast, has revolutionised data science while lacking unifying principles, which leads to reinvention and rebranding of concepts; the geometric unification proposed here is meant to fix that, and its implementations already include the most popular architectures: convolutional networks, graph neural networks, Deep Sets, and transformers.
[00:18:49-00:28:18] Geometric deep learning: unifying principles
The Erlangen program's unifying lens -- studying geometries through their invariances and symmetries -- had significant spillover effects in mathematics, physics, and computer science, providing a blueprint for deriving the geometries one needs and demonstrating the power of unification. In the same spirit, geometric deep learning is a unifying mindset from which architectural primitives such as convolution, attention, and graph convolution can be derived from the first principles of symmetry and scale separation, yielding a generic blueprint for designing new machine learning architectures. The segment also introduces Professor Michael Bronstein, a recognised expert in graph representation learning whose work has been cited over 21,000 times and whose startup, Fabula AI, was acquired by Twitter in 2019.

[00:28:18-00:56:41] Geometric deep learning research and applications
Dr. Petar Veličković is a senior research scientist at DeepMind; he did his PhD in computer science at the University of Cambridge, where he is still based. His work has been published in the leading machine learning venues: he was first author of Graph Attention Networks, a popular convolutional layer for graphs, and of Deep Graph Infomax, a scalable unsupervised learning pipeline for graphs.
At Cambridge, Petar was suddenly exposed to a much wider wealth of computer science topics than just theoretical computer science and algorithms, and for a brief moment his interests drifted elsewhere; everything came back together in his final-year project with Professor Pietro Liò, after hearing that bioinformatics is brimming with classical and competitive-programming algorithms. Graph representation learning has since spiralled out of control in terms of the quantity of papers proposed: only a year after the Graph Attention Networks paper came out, Petar found four or five papers extending it on a single conference reviewing stack -- the field has become far more vibrant, helped by a barrier to entry that is not too high. The conversation also turns to a new way of doing science: even pure, creative science such as mathematics, long considered the hallmark of human intelligence, may be, if not replaced, then assisted by artificial intelligence -- and Petar invokes Daniel Kahneman's System 1 and System 2.
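Since Graph Attention Networks come up repeatedly, here is a single-head NumPy sketch of the attention mechanism at their core (the published layer adds multi-head attention, dropout, and output nonlinearities -- this is a simplified illustration, not the reference implementation):

import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a):
    # H: (n, d_in) node features; A: (n, n) adjacency with self-loops;
    # W: (d_in, d_out) shared weights; a: (2*d_out,) attention vector.
    Z = H @ W
    d = Z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), via the additive split of a.
    e = leaky_relu((Z @ a[:d])[:, None] + (Z @ a[d:])[None, :])
    e = np.where(A > 0, e, -np.inf)            # attend only over neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # softmax per neighbourhood
    return alpha @ Z                           # attention-weighted aggregation

# Tiny example: a 3-node path graph with self-loops.
rng = np.random.default_rng(0)
A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)
out = gat_layer(rng.normal(size=(3, 4)), A, rng.normal(size=(4, 2)), rng.normal(size=4))
print(out.shape)  # (3, 2)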
The first act closes with the blueprint presented directly: the popular architectures can all be understood as special cases of one generic structure, the geometric deep learning blueprint, which is specified by feature spaces -- and how to model them -- together with the maps between feature spaces, that is, the layers of the network.

Section breakdown:

[00:28:18-00:29:21] Graph learning research and applications
Dr. Petar Veličković (DeepMind; PhD from Trinity College, Cambridge) works on geometric deep learning, devising neural network architectures for graph representation learning and applying them to algorithmic reasoning and computational biology. Professor Michael Bronstein (Imperial College London) works on geometric deep learning with a particular focus on graph neural networks and their applications in computer vision, graphics, computational biology, and drug design.

[00:29:22-00:32:45] Geometric deep learning journey
Petar's path from competitive programming to geometric deep learning: exposure to a broader range of computer science topics at Cambridge, the shift from classical algorithms to graph representation learning, and Graph Attention Networks -- published at ICLR 2018 and now his best-known work -- followed within a year by a proliferation of extensions.

[00:32:46-00:40:21] Algorithmic reasoning and geometric deep learning
Petar proposes a neural architecture for algorithmic reasoning that operates over graphs of abstract inputs and outputs, so that classical algorithms can effectively be applied to natural inputs beyond the abstract domains they were originally designed for.
Taco Cohen discusses applying symmetry, invariance, and group theory to geometric deep learning, which has produced methods and applications across domains such as medical imaging, climate data analysis, and molecular studies. Petar, invoking Daniel Kahneman's concepts, highlights the need for "System 2" reasoning if deep learning is to achieve reasoning and generalisation -- a gap in current neural architectures. The fusion of classical algorithms and neural networks is seen as complementary, bringing together the strengths of both domains, and computer-aided mathematical proofs point towards a new mode of scientific discovery that combines human and artificial intelligence.

[00:40:21-00:56:41] Geometric deep learning blueprint
The blueprint unifies different architectures under a generic structure built from feature spaces, symmetries, and equivariant linear maps -- maps between layers that respect the structure of the feature spaces. A key selling point of the proto-book is that it offers a way of reasoning not only about existing architectures but about any future ones. The discussion covers the historical context of the geometric principles underlying deep learning, the potential for unifying fields, and the Fourier perspective on convolution, including generalised Fourier transforms for groups and the intuition that convolution is the most general kind of equivariant linear map.
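A compact sketch of the blueprint's "equivariant layers followed by an invariant readout" recipe, instantiated for the permutation group acting on a set of feature vectors (a Deep Sets-style construction; all names here are illustrative):

import numpy as np

def equivariant_layer(X, W, V, b):
    # Per-element transform plus a pooled context term. Both commute with
    # row permutations, so permuting the input rows permutes the output
    # rows identically: the layer is permutation-equivariant.
    return np.maximum(X @ W + X.mean(axis=0, keepdims=True) @ V + b, 0.0)

def invariant_readout(X, W_out):
    # Sum pooling erases the ordering entirely: permutation-invariant.
    return X.sum(axis=0) @ W_out

rng = np.random.default_rng(0)
W, V, b = rng.normal(size=(3, 8)), rng.normal(size=(3, 8)), np.zeros(8)
W_out = rng.normal(size=(8, 2))

X = rng.normal(size=(5, 3))  # a set of 5 feature vectors
perm = rng.permutation(5)

y1 = invariant_readout(equivariant_layer(X, W, V, b), W_out)
y2 = invariant_readout(equivariant_layer(X[perm], W, V, b), W_out)
print(np.allclose(y1, y2))  # True: the readout ignores input ordering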
[00:56:41-01:04:42] Machine learning challenges and depth in neural networks
Bronstein's most passionately held belief: for machine learning to make progress to the next level and achieve its potential as a transformative technology that we trust and use ubiquitously, it must be built on solid mathematical foundations. He also expects machine learning to drive future scientific breakthroughs, with a good litmus test being a Nobel Prize awarded for a discovery made by, or with the help of, an ML system. Bruna then turns to the universal approximation theorem: it has a bittersweet flavour, because it is a result that does not quantify how many parameters are needed -- if you want to classify different dog breeds, the theorem does not tell you how many neurons that requires.
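For reference, the classical one-hidden-layer statement under discussion (in its standard form, for a fixed non-polynomial activation sigma) can be written as:

\text{For every continuous } f \text{ on a compact } K \subset \mathbb{R}^d
\text{ and every } \varepsilon > 0, \text{ there exist } N \text{ and } \{c_i, w_i, b_i\}_{i=1}^{N} \text{ such that}
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} c_i \, \sigma(w_i^{\top} x + b_i) \right| < \varepsilon .

The theorem guarantees that some N exists but gives no bound on it, which is exactly the gap Bruna points to.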
Going beyond this basic approximation result requires baking two different sources of prior information into the problem. The first is invariance: putting symmetries into the architecture certainly benefits sample complexity.

Section breakdown:

[00:56:41-01:00:52] Machine learning and the universal approximation theorem
The need for solid mathematical foundations; the universal approximation theorem as a guiding principle for why deep learning works -- a general result that arbitrary functions can be approximated given enough parameters -- and its limitations once combined with the other elements of modern neural architectures.

[01:00:53-01:02:25] Role of depth in neural networks
Universal approximation is a necessary condition -- the architecture must be able to express the functions in the class -- but it is not sufficient, and it provides no quantitative guidance on how many parameters or how much depth a given task requires; how to determine the necessary depth remains the key open question.
[01:02:25-01:04:42] Neural nets and prior information
Understanding of the power of universal approximation has evolved: it becomes useful when combined with prior information about invariances, yet incorporating symmetries alone turns out not to be sufficient for efficient learning with good sample-complexity guarantees -- a second source of prior information is needed.

[01:04:43-01:15:45] Scale separation and symmetry in deep learning
An invariant model still needs a lot of examples to be able to learn, so something else must be added to the mix: what the book calls scale separation. The intuitive picture: think of classifying an image as a dog or a cat -- what you are given is a big bunch of pixels, each carrying a colour value.
The idea of taking a complicated system of interacting particles and breaking it into different scales is not new and not specific to deep learning -- it is at the basis of essentially all of physics and chemistry, and of biology too, where some experts work at the molecular level and others, such as doctors, at the level of organ function. We do not yet have the full mathematical picture of why scale separation is strictly necessary, but the empirical evidence that it is an efficient strategy is by now essentially indisputable. Asked why scale separation is kept separate rather than treated as just another symmetry, the authors note that scale and symmetry sit at the core of many physical theories, and that at a more technical level these two priors have been instrumental as a recipe for organising and building architectures.
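A toy illustration of the coarse-graining idea behind scale separation: alternate a local summary with downsampling, so that each level deals only with structure at its own scale (the helper names are purely illustrative):

import numpy as np

def coarsen(image, factor=2):
    # One coarse-graining step: average non-overlapping factor x factor
    # blocks, discarding fine-scale detail but keeping coarse structure.
    h, w = image.shape
    assert h % factor == 0 and w % factor == 0
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multiscale_summaries(image, levels=3):
    # Compose per-scale statistics instead of modelling every pixel-pixel
    # interaction at once -- the compositional structure that scale
    # separation licenses.
    summaries = []
    for _ in range(levels):
        summaries.append(float(image.std()))  # one local statistic per scale
        image = coarsen(image)
    return summaries

print(multiscale_summaries(np.arange(64.0).reshape(8, 8)))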
Section breakdown:

[01:04:43-01:08:25] Scale separation in deep learning
Breaking a complicated problem into families of sub-problems that live at different scales, in analogy with physics, chemistry, and biology; deep neural networks implement this efficiently, and while a complete mathematical account of why scale separation is strictly necessary is still missing, the empirical evidence for its efficiency is strong.

[01:08:26-01:15:45] Symmetric deep learning blueprint
Bruna describes recent work quantifying the sample-complexity gains of learning with symmetry versus learning without it: the gains are of the order of the size of the group. In the case of permutations, this means that for a learner aware of the symmetry, one training example is roughly equivalent to n! samples for an agnostic learner -- exactly what you would expect from the data-augmentation picture.
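The counting argument above in miniature: augmenting one labelled set-example with the whole permutation group produces n! training pairs, whereas a symmetry-aware learner can quotient the group out (here, crudely, by sorting to a canonical representative) and collapse the entire orbit to a single example. A toy sketch of the bookkeeping, not the construction from the paper:

import itertools
import math

def augment_with_permutations(x, y):
    # Data augmentation for an agnostic learner: one labelled example
    # becomes n! examples, one per ordering.
    return [(list(p), y) for p in itertools.permutations(x)]

def canonical(x):
    # A symmetry-aware learner sees each orbit through one representative.
    return tuple(sorted(x))

x, y = [3, 1, 2], 1.0
print(len(augment_with_permutations(x, y)) == math.factorial(len(x)))  # True
print(canonical([3, 1, 2]) == canonical([2, 3, 1]))  # same orbit -> True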
The same advantage shows up for symmetry in computational tasks such as sorting, and scale and symmetry recur as organising priors across the domains of grids, groups, and graphs.

[01:15:45-01:40:04] Challenges and applications in high-dimensional spaces
This final chapter discusses the curse of dimensionality, the magic of symmetry, the importance of low-dimensional structure, and the impact of graph neural networks in diverse applications. The sample-complexity gains from invariance are exponential in dimension -- but they are fighting a problem that is itself cursed by dimension, which is why, at the end of the day, invariance alone might not be sufficient.
Low-dimensional structure can itself be captured in the form of symmetry: an image is not just a high-dimensional vector, it has an underlying grid structure, and the grid's symmetry is what convolutional networks capture through shared weights -- which translate into the convolution operation. Tim adds that the symmetries are part of the magic here: there is something more to them than the observation that natural data varies along relatively few dimensions -- when MLST spoke to François Chollet recently, he invoked the kaleidoscope effect.
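The shared-weights point can be checked directly: circular convolution commutes with cyclic shifts, i.e. f(g.x) = g.f(x) for the translation group (a small NumPy sketch):

import numpy as np

def circular_conv(signal, kernel):
    # 1-D circular convolution: one shared kernel applied at every position.
    n = len(signal)
    return np.array([
        sum(signal[(i - j) % n] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ])

signal = np.array([1.0, 2.0, 3.0, 4.0, 0.0])
kernel = np.array([0.5, 0.25, 0.25])

# Translation equivariance: shift-then-convolve == convolve-then-shift.
lhs = circular_conv(np.roll(signal, 2), kernel)
rhs = np.roll(circular_conv(signal, kernel), 2)
print(np.allclose(lhs, rhs))  # True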
As I said before, geometric deep learning is really a device to put more structure into the target function, to make the learning problem easier, because we are promising the learner more properties about the target function.

In the domain of computational chemistry, where you can represent molecules as graphs of atoms and the bonds between them, graph neural networks have already proven impactful in detecting novel potent antibiotics that were previously completely overlooked because of their unusual structure. In the area of chip design, graph neural networks are powering systems that are developing the latest generation of Google's machine learning chips, the TPU.
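Here is a minimal sketch of the message-passing pattern behind the molecular-graph applications just mentioned, assuming nothing beyond a toy adjacency matrix and random weights; it is meant only to show the shape of the computation, not any particular published model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "molecule": four atoms, bonds as an undirected adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
H = rng.normal(size=(4, 3))                    # initial atom features
W_self = rng.normal(size=(3, 3))
W_nbr = rng.normal(size=(3, 3))

def mp_layer(A, H):
    # One round of message passing: each atom aggregates its neighbours'
    # features (A @ H) and mixes them with its own representation.
    return np.tanh(H @ W_self + (A @ H) @ W_nbr)

H = mp_layer(A, mp_layer(A, H))                # two rounds
graph_embedding = H.sum(axis=0)                # permutation-invariant readout
print(graph_embedding)
```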
As for why you might still care about data efficiency: there are applications, such as medical imaging, where acquiring labeled data is simply very costly. You have to recruit patients, you are dealing with privacy restrictions, and you are dealing with costly, highly trained doctors who have to annotate the data and come together in committees to decide on questionable cases, and so forth.

In summary (Challenges and applications in high-dimensional spaces): this part of the conversation covers the curse of dimensionality, the magic of symmetry, and the case for geometric deep learning, emphasizing the sample-complexity gains from invariance, the importance of low-dimensional structure, and the impact of graph neural networks in diverse applications.

- Curse of dimensionality: invariance and scale separation. Learning in high-dimensional spaces is extremely challenging. Invariance removes an exponential factor from the sample complexity, but an exponential factor still remains, which is why combining the symmetry prior with the scale-separation prior is crucial. Most natural data varies along very few dimensions, while today's machine learning problems are extremely high-dimensional, making standard function-interpolation approaches ineffective.
- The magic of symmetry in machine learning. Symmetry and low-dimensional structure in data are crucial for machine learning and are captured through convolutional networks and deep architectures. The curse of dimensionality manifests as statistical, approximation, and computational issues, and an algorithm must address all three simultaneously.
- Geometric deep learning: theory and applications. Geometric deep learning puts more structure into the target function, making the learning problem easier by promising the learner more properties about the target function and reducing the hypothesis class. Graph neural networks have had impactful applications in computational chemistry (detecting novel potent antibiotics), chip design (the latest generation of Google's TPUs), social networks and recommendation systems serving billions of users, and travel-time prediction in Google Maps. Open theoretical questions remain about the approximation properties of deep networks, the ability to optimize these architectures efficiently, and the fundamental mathematical intuition behind the power of depth.
- Non-Euclidean diffusion and data efficiency. Variational methods in image processing are elegant, and graph neural networks can be interpreted as neural PDEs through non-Euclidean diffusion equations. Data efficiency matters in applications such as medical imaging, and improving it by a factor of two or ten can make economically infeasible problems feasible; data efficiency is central to intelligence, motivating the search for generic priors that improve generalization.
A young mathematician based in Germany, Felix Klein, proposed the quite remarkable and groundbreaking idea that you can define a geometry by studying its group of symmetries: the kinds of transformations to which you can subject geometric forms, and which properties are or are not preserved under those transformations. These ideas proved very powerful. Emmy Noether, who worked in the same institution in Göttingen where Klein ended up, showed that if you take a physical system described variationally, you can associate a conservation law with each symmetry of the system, whereas previously conservation laws had been purely empirical discoveries.

We are told that we can think of training neural networks as a kind of program search, but you argued in your paper that algorithms possess fundamentally different qualities to deep learning methods. Francois Chollet often points out that deep learning would struggle to represent a sorting algorithm without learning it point by point, which is to say without any generalization power whatsoever.
without learning point by point,', 'start': 6146.336, 'duration': 8.464}, {'end': 6157.502, 'text': 'which is to say without any generalization power whatsoever.', 'start': 6154.8, 'duration': 2.702}, {'end': 6164.97, 'text': 'But you seem to be making the argument that the interpolative function space of neural networks can model algorithms more closely to real-world problems,', 'start': 6157.882, 'duration': 7.088}], 'summary': 'Neural networks can model algorithms closely to real-world problems.', 'duration': 30.34, 'max_score': 6134.63, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u86134630.jpg'}, {'end': 6403.183, 'src': 'heatmap', 'start': 6271.185, 'weight': 0.848, 'content': [{'end': 6277.429, 'text': 'So all these kinds of properties interpretability, compositionality and obviously also out of distribution,', 'start': 6271.185, 'duration': 6.244}, {'end': 6282.613, 'text': "generalization are plagued not by the fact that neural networks don't have the capacity to do this,", 'start': 6277.429, 'duration': 5.184}, {'end': 6289.077, 'text': 'but the routines we use to optimize them are not good enough to cross that divide.', 'start': 6282.613, 'duration': 6.464}, {'end': 6291.018, 'text': 'So, in neural algorithmic reasoning,', 'start': 6289.597, 'duration': 1.421}, {'end': 6299.522, 'text': "all that we're really trying to do is to bring these two sides closer together by making changes either in the structure of the neural network or the training regime of the neural network,", 'start': 6291.018, 'duration': 8.504}, {'end': 6307.106, 'text': "or the kinds of data that we let the neural network see, so that hopefully it's going to generalize better and extrapolate better.", 'start': 6299.522, 'duration': 7.584}, {'end': 6313.669, 'text': 'And especially on the kinds of, you know, classical algorithmic problems that we might see in a computer science textbook.', 'start': 6307.706, 'duration': 5.963}, {'end': 6317.852, 'text': "And lastly, I think I'd just like to address the point about sorting.", 'start': 6314.25, 'duration': 3.602}, {'end': 6326.599, 'text': 'We have a paper on algorithmic reasoning benchmarks that we are about to submit to the NeurIPS dataset track.', 'start': 6319.21, 'duration': 7.389}, {'end': 6331.945, 'text': "I think it should be public even now on GitHub, because that's the requirement for the conference,", 'start': 6326.679, 'duration': 5.266}, {'end': 6336.09, 'text': "where we have quite a few algorithmic tasks and we're trying to force GNNs to learn them.", 'start': 6331.945, 'duration': 4.145}, {'end': 6339.473, 'text': 'And we do have several sorting tasks in there.', 'start': 6336.77, 'duration': 2.703}, {'end': 6346.939, 'text': 'And at least in distribution, these graph neural networks are capable of imitating the steps of, say, insertion sort.', 'start': 6340.194, 'duration': 6.745}, {'end': 6353.023, 'text': "So I will say not all is lost if you're very careful about how you tune them, but obviously there is a lot of caveats.", 'start': 6347.179, 'duration': 5.844}, {'end': 6360.909, 'text': "And I hope that later during this chat, we'll also get a chance to talk a little bit about how, even though we cannot perfectly mimic algorithms,", 'start': 6353.043, 'duration': 7.866}, {'end': 6367.874, 'text': 'we can still use this concept of algorithmic execution today, now, to help expand the space of applicability of algorithms.', 'start': 6360.909, 'duration': 6.965}, {'end': 6375.821, 'text': 
Yeah, this is absolutely fascinating, because it gets to the core of what some people point to as the limitations of deep learning. Chollet spoke about this, but I don't think geometric deep learning by itself would help a neural network learn a sorting function, because discrete problems in general don't seem amenable to vector spaces: either the representation would be glitchy, or the problem is not interpolative in nature, or it is not learnable with stochastic gradient descent. So it would be fascinating if we could overcome these problems using continuous neural networks as an algorithmic substrate.

What you actually need is something a bit more fine-grained: you need all locally isomorphic parts of the graph to behave the same, which in principle gives you a bit more flexibility. I think this kind of language, moving a bit away from the group formalism, would allow us to talk about things like algorithmic invariance. I don't yet have any theory to properly prove this, but it is something I am actively working on. The only question is whether you would still call this geometric deep learning.

There are many ways in which geometric deep learning is already, at least implicitly, powering discrete approaches such as program synthesis, because there is a pretty big movement around so-called dual approaches, where you stick a geometric deep learning architecture inside a discrete tool that searches for the best solution, for example in combinatorial optimization.
DeepMind recently published a paper on this, where you can treat MIP (mixed-integer programming) problems as bipartite graphs, with variables on one side and constraints on the other, linking them whenever a variable appears in a constraint. They then run a graph neural network, which, as we just discussed, is one of the flagship models of geometric deep learning, over this bipartite graph to decide which variable the model should select next. You can train this as a separate, supervised technique to learn some kind of heuristic.

I would also like to offer another angle, in which you can think of program synthesis as nothing other than just one more way to do language modeling, because synthesizing a program is not that different from synthesizing a sentence, perhaps with more stringent checks on syntax and so forth. Any technique that is applied to language modeling could, in principle, be applied to program synthesis.
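Here is a minimal sketch of the variable-constraint bipartite encoding described above, on a hypothetical toy MIP (the instance, names, and edge list are invented for illustration; the actual DeepMind system is far richer).

```python
# Toy MIP: maximize x + 2y  subject to  c1: x + y <= 4,  c2: y <= 2.
variables = ["x", "y"]
constraints = {"c1": ["x", "y"],   # variables appearing in each constraint
               "c2": ["y"]}

# Bipartite graph: variable nodes on one side, constraint nodes on the
# other, with an edge whenever a variable appears in a constraint.
edges = [(v, c) for c, vs in constraints.items() for v in vs]
print(edges)  # [('x', 'c1'), ('y', 'c1'), ('y', 'c2')]

# A GNN run over this graph (as in the work described above) would score
# the variable nodes to decide which one to branch on next.
```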
Why are we doing that? There are multiple reasons why vector spaces are so popular in representation learning. Vectors are probably the most convenient representation for both humans and computers: we can do arithmetic operations with them, like addition or subtraction; we can represent them as arrays in the memory of the computer; and they are continuous objects, so it is very easy to use continuous optimization techniques in vector spaces.

In a hyperbolic space the situation is very different, because the volume grows exponentially with the radius, so it is much more convenient to use these spaces for graph embeddings. In fact, recent papers show that to achieve the same error when embedding a graph in a hyperbolic space of, say, 10 dimensions, you would require something like a 100-dimensional Euclidean space. Of course, I should say that metrics are just one example of a structure.
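A small numeric illustration of the hyperbolic geometry mentioned above, using the Poincaré ball model (the choice of model and the sample points are mine, not the speakers'): distance blows up near the boundary, reflecting the exponential volume growth that makes tree-like graphs embeddable in few dimensions.

```python
import numpy as np

def poincare_distance(u, v):
    # Geodesic distance in the Poincare ball model of hyperbolic space
    # (points must have Euclidean norm < 1).
    uu, vv = 1.0 - u @ u, 1.0 - v @ v
    duv = u - v
    return np.arccosh(1.0 + 2.0 * (duv @ duv) / (uu * vv))

u = np.array([0.0, 0.0])
v = np.array([0.9, 0.0])
w = np.array([0.99, 0.0])
# A Euclidean step of 0.09 near the boundary (v -> w) costs almost as much
# hyperbolic distance (~2.35) as the 0.9 step from the centre (~2.94).
print(poincare_distance(u, v), poincare_distance(v, w))
```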
Unsupervised language translation can be done by a geometric alignment of the latent spaces. In my opinion, this is not something that describes the geometry of the language; it probably describes, in a geometric way, some semantics of the world. Even though we have, linguistically speaking, very different languages, like English and Chinese, they describe the same reality.

I wouldn't use the term "language", because it is a little loaded, and some purists would probably be shocked by me saying that, for example, whales have a language. But we are studying the communication of sperm whales, in a big international collaboration called Project CETI, and I don't think you can really model the concepts that whales need to describe and deal with.

Even if we don't know what a word means, we can look at the local geometry of the words that tend to be used around it. This kind of principle has been used all over the place, and has been extended to graph-structured observations with models like DeepWalk and Node2Vec: basically the same idea, treating a node's representation as everything that is around it.
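A minimal sketch of the DeepWalk/Node2Vec idea just mentioned, assuming a toy adjacency list: sample truncated random walks and treat them as "sentences", so a node's representation is learned from its local neighbourhood exactly as a word's is learned from its context (the walk length and graph are arbitrary).

```python
import random

random.seed(0)
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(graph, start, length):
    # A truncated random walk; DeepWalk feeds such walks to a
    # word2vec-style model as if they were sentences of node "words".
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

walks = [random_walk(graph, node, 5) for node in graph for _ in range(2)]
print(walks[0])  # one "sentence" of nodes, e.g. [0, 2, 1, 0, 1]
```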
People have said for quite a long time that there is a difference between syntax and semantics. You could look at the geometric structure of spoken language, or, for example, at the topology of the connections in your brain; the topology of reference frames in your brain is how you have actually learned concepts. Would looking at the topology of spoken language tell you enough about abstract categories? That's a good question.

In summary (Geometric deep learning and neural approaches): this part of the conversation covers Klein's and Noether's groundbreaking ideas, the limitations of deep learning, the potential of geometric deep learning, neural approaches to MIP solving, and the geometry of language.

- Noether's theorem and neural algorithmic reasoning. Felix Klein proposed defining a geometry through its group of symmetries, revolutionizing how geometric forms and transformations are understood. Emmy Noether derived conservation laws from the symmetries of physical systems, a significant departure from the purely empirical discovery of such laws. Comparing classical algorithms with deep networks highlights how hard it is to optimize neural networks to imitate algorithms in practice, owing to the limitations of stochastic gradient descent and the high-dimensional spaces in which networks operate.
- Geometric deep learning and algorithmic invariance. Deep learning struggles with discrete problems such as learning a sorting function, since the representation may be glitchy, the problem non-interpolative, or not learnable by SGD; continuous neural networks as an algorithmic substrate might overcome this. Category theory may be a more suitable language for expressing algorithmic invariance, with natural graph networks generalizing permutation equivariance. Graphs serve as abstract models of relations and interactions, and some graphs are best seen as discretizations of continuous objects, which can lead to better-performing architectures. Geometric architectures are also being embedded inside discrete tools for combinatorial optimization.
- Neural approaches for MIP solving. DeepMind treats MIP problems as bipartite graphs of variables and constraints and uses graph neural networks to guide variable selection; language-modeling techniques can be applied to program synthesis; vector spaces are convenient for representation learning; and hyperbolic spaces are emerging for graph embeddings, achieving with around 10 dimensions what would take roughly 100 Euclidean dimensions.
- Geometry of language and communication. Unsupervised translation can be done by geometrically aligning latent spaces, on the assumption that different languages describe the same reality; there may be universal structures in languages, and non-human communication (such as that of sperm whales) may not align with human language concepts. The topology of spoken language and of brain connections raises questions about abstract categories, sentence structure in translation, and the impact of language on memory and testimony; the local topology of words is exploited by models like DeepWalk and Node2Vec, and category theory offers a way to reason about geometric concepts.

By reasoning, I mean extrapolating new knowledge from existing knowledge, and it feels like graph neural networks could at least be part of the solution here. In your lecture series you mentioned the work by Cranmer on explainable GNNs, where they use a form of symbolic regression to extract a symbolic model from a graph neural network. So do you think there's some really cool work we can do here?
There is a bit of a divide in the graph learning literature between people working on graph neural networks and people working on knowledge graphs, even though, at least in principle, the methods are similar; for example, you typically do some form of embedding of the nodes of the graph.

You can use standard, generic message-passing functions to model the interactions. The step forward they take is to replace these generic message-passing functions with symbolic equations. Not only does this generalize better, you also get an interpretable system: you can recover the laws of motion from your data. Think of how much time it took, historically, for people like Johannes Kepler to arrive at such laws; here they can be derived in seconds or minutes rather than years.
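A toy, hypothetical stand-in for the symbolic-regression step described above: given samples of a message as a function of pairwise distance r (here synthesized from an inverse-square law rather than taken from a trained GNN), fit a small library of candidate terms by least squares and read off the dominant one. Real symbolic regression searches over expressions; this linear fit only conveys the flavour.

```python
import numpy as np

rng = np.random.default_rng(3)

# Pretend these are messages a trained GNN sends as a function of the
# pairwise distance r (synthesized here from an inverse-square law).
r = rng.uniform(0.5, 2.0, size=200)
msg = 1.3 / r**2 + 0.01 * rng.normal(size=200)

# Library of candidate symbolic terms; least squares picks the weights.
names = ["r", "1/r", "1/r^2", "1"]
library = np.stack([r, 1.0 / r, 1.0 / r**2, np.ones_like(r)], axis=1)
coef, *_ = np.linalg.lstsq(library, msg, rcond=None)
for name, c in zip(names, coef):
    print(f"{name:6s} {c:+.3f}")   # the 1/r^2 term should dominate
```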
The point that particularly caught my attention in what you asked, Tim, was this interplay between graphs, reasoning, and extrapolation, and how that supports knowledge. How critical this is going to be depends on the environment in which you put your agent: is it a closed environment, or an open-ended environment where new information and new knowledge can, in principle, arrive at any time?

Graph neural networks have arisen as a very attractive primitive here, because a few really exciting theoretical results in recent years say that the operations of a graph neural network align really well with dynamic programming algorithms, and dynamic programming is a very standard computational primitive.
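A minimal sketch of that alignment, using Bellman-Ford shortest paths as the example (my choice; the results discussed are more general): each round of the dynamic program is a message-passing step in which every node takes a min-aggregation over messages from its neighbours.

```python
import math

# Directed weighted graph: edges[(u, v)] = weight of edge u -> v.
edges = {(0, 1): 4.0, (0, 2): 1.0, (2, 1): 2.0, (1, 3): 1.0}
n = 4

dist = [math.inf] * n
dist[0] = 0.0
for _ in range(n - 1):
    # One Bellman-Ford round is one message-passing step: every node v
    # min-aggregates the messages dist[u] + w arriving from neighbours.
    dist = [min([dist[v]] + [dist[u] + w
                for (u, t), w in edges.items() if t == v])
            for v in range(n)]
print(dist)  # [0.0, 3.0, 1.0, 4.0]
```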
In summary (Graphs and neural networks in knowledge acquisition): this part of the conversation covers the influence of language on perception, the role of graph neural networks in knowledge acquisition, and the use of graphs to model physical systems.

- The power of graphs in language and knowledge acquisition. Graph neural networks combined with symbolic regression can aid knowledge acquisition by extrapolating new knowledge from existing knowledge, as mentioned in the lecture series. Modeling physical systems (such as many-body problems) as graphs, and replacing generic message-passing functions with symbolic equations, yields interpretable systems and allows laws of motion to be derived in seconds or minutes instead of years. The divide between the graph-neural-network and knowledge-graph communities reflects the historical evolution of different fields, but presents opportunities for collaboration and knowledge sharing.
- Neural networks and algorithmic reasoning. The interplay between graphs and reasoning matters for applying knowledge in the future, and extrapolation is central to it. Algorithmic reasoning remains challenging for neural networks: graph neural networks align well with dynamic programming algorithms, but find it difficult to accommodate larger inputs than those seen in training.

I don't want to start this discussion by just saying "yes, transformers are graph neural networks, end of story", because I feel that doesn't touch on the whole picture. Let's look at it from a natural language processing angle, which is how most people have come to know transformers. Imagine you have a task specified on a sentence, and you want to exploit the fact that the words in the sentence interact: it is not just a bag of words; there is interesting structure inside this bunch of words that you might want to exploit. When we were using recurrent neural networks, we assumed that the structure between the words was a line graph.

I have effectively re-derived the transformer model equations without ever using the specific transformer lingo. From this angle, it is a model that operates over a complete graph of individual words and, once you attach embeddings to them, is permutation equivariant.
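A small numpy sketch of that reading of self-attention (dimensions and weights are arbitrary): every token aggregates messages from every other token over the complete graph, and permuting the tokens permutes the outputs, i.e. the layer is permutation equivariant.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 8))                   # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))

def self_attention(X):
    # Attention over the complete graph of tokens: every token aggregates
    # messages (values) from every token, weighted by softmax scores.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return A @ V

perm = rng.permutation(5)
# Permutation equivariance: permuting the tokens permutes the outputs.
print(np.allclose(self_attention(X[perm]), self_attention(X)[perm]))  # True
```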
You can straightforwardly generalize the sine and cosine positional coordinates used in transformers by using the eigenvectors of the graph Laplacian. There are other techniques as well: by using a special kind of structure-aware positional encoding, you can make a message-passing neural network strictly more powerful than traditional message passing, which is equivalent in power to the 1-Weisfeiler-Lehman graph isomorphism test.

You use the graph, as Petar described, to model different long-distance relations between tokens or words in a sentence, while incorporating the prior knowledge that these nodes are not in arbitrary order: they have a sentence order. And this principle applies more generally.
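A short sketch of the Laplacian-eigenvector positional encodings described above, on a path graph standing in for a sentence (the graph and the number of eigenvectors kept are arbitrary choices): on a path the non-constant eigenvectors are discrete cosine modes, the graph analogue of the transformer's sinusoidal positions.

```python
import numpy as np

# Path graph on 6 nodes: a stand-in for the "sentence" structure.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

L = np.diag(A.sum(axis=1)) - A           # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order

# Skip the constant eigenvector (eigenvalue 0); the next k columns give
# smooth positional coordinates for the nodes.
k = 2
pos_enc = eigvecs[:, 1:1 + k]
print(np.round(pos_enc, 2))              # one k-dim position per node
```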
In summary (Transformer models and applications): this part of the conversation covers transformer models, positional encoding, and their reading as graph neural networks.

- Graph neural networks and transformers. There is a debate over whether transformers are graph neural networks or are simply general enough to formulate graph problems, with something like ten different papers relating them to RNNs, Hopfield networks, and graph neural networks. The non-linearity of language structure challenges the line-graph assumption of recurrent networks; when the optimal graph structure is unknown, a complete graph is assumed and the network itself determines the important connections. The right structure is task-dependent: some tasks are better served by, say, syntax trees.
- Transformer model and positional encoding. The transformer operates over a complete graph of individual words via self-attention and is permutation equivariant. Its sine and cosine positional embeddings hint at sentential structure and connect to the discrete Fourier transform. Positional encodings can be extended to graphs using the eigenvectors of the Laplacian, enabling the detection of specialized structures (such as cycles and rings) that traditional message passing cannot capture; there is debate about how necessary position tokens are across different language structures, and a desire to learn higher-order structures.
- Transformer applications and the hardware lottery. Graphs model long-distance relations between tokens while encoding sentence order as prior knowledge, and positional encodings let message passing specialize to different portions of the graph, with training deciding whether to use that information. There is a trade-off between modeling as much as possible and learning what is hard to model, shaped by computational complexity, data availability, and hardware friendliness. The popularity of transformers owes much to the hardware lottery: GPUs execute dense matrix multiplication very efficiently, whereas a full message-passing graph neural network carries overheads; graph-oriented hardware may yet catch up with the trends in graph representation learning.
Symmetries are ubiquitous in natural data. But why exactly is it a principled approach to start with things we know, which is to say geometric primitives, and work upwards from there? What would it look like if we went top-down instead? And what makes a good prior? One way to think about it is the actual function space you are searching through, the hypothesis space: it is not just about being able to find the function easily in that space, or the simplicity of the function you find; Chollet would say it is the information-conversion ratio of that function.

The geometric stability principle tells you that if you are close enough to an element of the group, if you can describe the transformation as an approximate translation, then you will be approximately invariant, or approximately equivariant. This is actually what happens in CNNs; this was shown by Joan, and that motivation is used to explain why convolutional neural networks are so powerful.
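A small numeric check of that stability claim, under assumptions of my own (a 1-D circular convolution with global average pooling, and a shift plus small noise standing in for an "approximate translation"): the pooled features are exactly invariant to integer shifts and change only slightly under the approximate one.

```python
import numpy as np

rng = np.random.default_rng(5)

def conv_pool(x, w):
    # Circular convolution followed by global average pooling: exactly
    # invariant to integer translations of x.
    n, k = len(x), len(w)
    feats = np.array([sum(w[j] * x[(i + j) % n] for j in range(k))
                      for i in range(n)])
    return np.abs(feats).mean()

x = rng.normal(size=32)
w = np.array([1.0, -2.0, 1.0])

shifted = np.roll(x, 5)                                 # exact translation
deformed = np.roll(x, 5) + 0.01 * rng.normal(size=32)   # approximate one
print(abs(conv_pool(x, w) - conv_pool(shifted, w)))     # 0 up to float error
print(abs(conv_pool(x, w) - conv_pool(deformed, w)))    # small but non-zero
```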
high-dimensional neural network component that simulates the effects of Dijkstra.", 'start': 10005.385, 'duration': 6.562}, {'end': 10017.65, 'text': "But we're also mindful of the fact that, to compute, say, the expected travel time,", 'start': 10012.367, 'duration': 5.283}, {'end': 10021.852, 'text': "there's more factors at play than just the output of a shortest path algorithm, right?", 'start': 10017.65, 'duration': 4.202}, {'end': 10030.638, 'text': 'There could well also be some flow-related elements, maybe just some elements related to the current time of day, human psychology, whatnot, right?', 'start': 10022.273, 'duration': 8.365}, {'end': 10038.623, 'text': 'So we start off by assuming the algorithm does not give the complete picture in this high-dimensional, noisy world.', 'start': 10031.018, 'duration': 7.605}, {'end': 10045.29, 'text': 'So we always, by default, as part of our architecture, incorporate a skip connection from, just you know,', 'start': 10039.204, 'duration': 6.086}, {'end': 10048.133, 'text': 'a raw neural network encoder over the algorithm.', 'start': 10045.29, 'duration': 2.843}, {'end': 10055.059, 'text': "So in case there's any model-free information that you want to extract without looking at what the algorithm tells you, you can do that.", 'start': 10048.593, 'duration': 6.466}, {'end': 10059.544, 'text': "So maybe I don't know, Yannick, if that answers your question about approximate symmetries,", 'start': 10055.3, 'duration': 4.244}, {'end': 10062.687, 'text': "but that's kind of the vibe I got when I heard the question.", 'start': 10059.544, 'duration': 3.143}], 'summary': 'Neural network simulates Dijkstra, considers multiple factors, incorporates skip connection for model-free information.', 'duration': 57.302, 'max_score': 10005.385, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810005385.jpg'}, {'end': 10124.136, 'src': 'embed', 'start': 10088.946, 'weight': 15, 'content': [{'end': 10092.588, 'text': 'you know there are symmetries, you want to incorporate them into your problem, and so on.', 'start': 10088.946, 'duration': 3.642}, {'end': 10100.993, 'text': "And we've also talked about symbolic regression beforehand to maybe parse out symmetries of the underlying problem.", 'start': 10093.028, 'duration': 7.965}, {'end': 10105.996, 'text': "What are the current best approaches if we don't know the symmetries?", 'start': 10101.213, 'duration': 4.783}, {'end': 10108.546, 'text': 'So we have a bunch of data.', 'start': 10107.205, 'duration': 1.341}, {'end': 10115.25, 'text': "we suspect there must be some kind of symmetries at play, because they usually are in the world, right?", 'start': 10108.546, 'duration': 6.704}, {'end': 10124.136, 'text': "And if we knew them, we could describe our problems in very compact forms and solve them very efficiently, but we don't often know.", 'start': 10115.55, 'duration': 8.586}], 'summary': 'Data analysis seeks symmetries for efficient problem-solving.', 'duration': 35.19, 'max_score': 10088.946, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810088946.jpg'}, {'end': 10196.145, 'src': 'embed', 'start': 10151.451, 'weight': 3, 'content': [{'end': 10153.792, 'text': 'What is the right symmetry structure to model?', 'start': 10151.451, 'duration': 2.341}, {'end': 10156.933, 'text': 'Is it a one-dimensional translation group or a two-dimensional translation group?', 'start': 
10153.892, 'duration': 3.041}, {'end': 10165.855, 'text': 'Do we want to observe the slight vertical motions as noise and deal with them through data augmentation,', 'start': 10156.993, 'duration': 8.862}, {'end': 10169.856, 'text': 'or do you want to describe it in the structure of the group that you discover?', 'start': 10165.855, 'duration': 4.001}, {'end': 10171.957, 'text': 'So there is no single answer.', 'start': 10170.637, 'duration': 1.32}, {'end': 10175.458, 'text': 'You cannot say that one is correct and another one is wrong.', 'start': 10173.337, 'duration': 2.121}, {'end': 10182.301, 'text': 'I think this was where I was going with the question of how principled are the symmetries?', 'start': 10176.9, 'duration': 5.401}, {'end': 10187.483, 'text': 'The symmetries seem to be hierarchical just in the same way that geometries are hierarchical.', 'start': 10183.102, 'duration': 4.381}, {'end': 10193.884, 'text': 'You were saying that, for example, projective geometry subsumes Euclidean geometry.', 'start': 10187.523, 'duration': 6.361}, {'end': 10196.145, 'text': 'I had a little thought experiment.', 'start': 10193.984, 'duration': 2.161}], 'summary': 'Exploring the hierarchical nature of symmetries and their relationship to geometry.', 'duration': 44.694, 'max_score': 10151.451, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810151451.jpg'}, {'end': 10359.908, 'src': 'embed', 'start': 10334.772, 'weight': 1, 'content': [{'end': 10342.299, 'text': 'And in fact, some of these constructions that showed remarkable compression ratios were constructed semi by hand.', 'start': 10334.772, 'duration': 7.527}, {'end': 10345.161, 'text': 'I should say that in more recent times in computer vision, for example,', 'start': 10342.519, 'duration': 2.642}, {'end': 10351.645, 'text': 'the group of Michal Irani from the Weizmann Institute in Israel used similar ideas for super resolution and image denoising,', 'start': 10345.161, 'duration': 6.484}, {'end': 10359.908, 'text': 'where you can build the clean or higher resolution image from bits and pieces of the image itself.', 'start': 10351.645, 'duration': 8.263}], 'summary': 'Semi-manually constructed fractal codes achieved remarkable compression ratios; similar ideas were later used for super resolution and image denoising.', 'duration': 25.136, 'max_score': 10334.772, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810334772.jpg'}, {'end': 10395.045, 'src': 'embed', 'start': 10373.493, 'weight': 4, 'content': [{'end': 10383.86, 'text': 'I spend a lot of time thinking about this because my intuition is that deep learning works quite well because of the strict structural limitations of the data which is produced by our physical world.', 'start': 10373.493, 'duration': 10.367}, {'end': 10389.002, 'text': 'Would you say that physical reality is high-dimensional? 
or not?', 'start': 10384.3, 'duration': 4.702}, {'end': 10395.045, 'text': "If it's high-dimensional, is it because it emerged from a simple set of rules or relations like we were just talking about?", 'start': 10389.183, 'duration': 5.862}], 'summary': 'Deep learning works well due to strict structural limitations of data from physical reality.', 'duration': 21.552, 'max_score': 10373.493, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810373493.jpg'}, {'end': 10468.24, 'src': 'embed', 'start': 10440.007, 'weight': 2, 'content': [{'end': 10444.429, 'text': 'And yet, if we zoom out, we can model the system statistically.', 'start': 10440.007, 'duration': 4.422}, {'end': 10448.011, 'text': "And that's exactly the main idea of thermodynamics and statistical mechanics.", 'start': 10444.449, 'duration': 3.562}, {'end': 10453.113, 'text': 'And this macroscopic system is surprisingly simple.', 'start': 10448.791, 'duration': 4.322}, {'end': 10455.995, 'text': 'It can be described by just a few parameters, such as temperature.', 'start': 10453.333, 'duration': 2.662}, {'end': 10468.24, 'text': 'And the example of fractals that you brought up before essentially shows that you can create very complex patterns with very simple rules that are applied locally in a repeated way.', 'start': 10456.875, 'duration': 11.365}], 'summary': 'Thermodynamics and statistical mechanics model simple macroscopic systems with few parameters like temperature.', 'duration': 28.233, 'max_score': 10440.007, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810440007.jpg'}], 'start': 9100.858, 'title': 'Geometric and symmetric principles in data processing', 'summary': 'Delves into the application of geometric priors in stochastic gradient descent, the concept of geometric stability in CNNs, the challenges and solutions in algorithmic reasoning, and the hierarchical nature of symmetries and fractals in physical reality, offering insights into practical applications and trade-offs in data processing.', 'chapters': [{'end': 9348.066, 'start': 9100.858, 'title': 'Geometric priors in stochastic gradient descent', 'summary': 'Discusses the principled approach of using geometric priors in stochastic gradient descent, emphasizing the importance of choosing the right symmetry group based on specific data structures and problems, and the practicality of incorporating prior knowledge into data augmentations.', 'duration': 247.208, 'highlights': ['The importance of choosing the right symmetry group based on specific data structures and problems is emphasized. Emphasizes the significance of selecting the appropriate symmetry group for specific data structures and problems.', 'The practicality of incorporating prior knowledge into data augmentations is discussed, highlighting its success in practice. Discusses the success of incorporating prior knowledge into data augmentations and its practical application.', 'The discussion of why geometric priors are principled and their utility in dealing with natural data is explored. Explores the rationale behind the principled nature and utility of geometric priors in handling natural data.']}
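
To make the trade-off in the chapter above concrete (baking a symmetry into the architecture versus teaching it through data augmentation), here is a minimal sketch in Python/NumPy. It is illustrative only: the classifier predict and all names are stand-ins, not code from the episode or the proto-book. The symmetry group is the four planar rotations (the cyclic group C4), chosen because np.rot90 implements its action exactly.

```python
import numpy as np

def c4_orbit(x):
    # The orbit of an image (H, W, C) under the C4 group: all four 90-degree rotations.
    return [np.rot90(x, k, axes=(0, 1)) for k in range(4)]

def augment(x, rng):
    # Augmentation route: expose the model to a random group element at training time.
    # Invariance is then only learned approximately, from data.
    return np.rot90(x, rng.integers(4), axes=(0, 1))

def group_average(predict):
    # Architectural route: average predictions over the whole group.
    # The wrapped model is exactly C4-invariant by construction.
    def invariant_predict(x):
        return np.mean([predict(g) for g in c4_orbit(x)], axis=0)
    return invariant_predict

# Stand-in "model": per-channel mean intensities as logits.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3))
predict = lambda img: img.mean(axis=(0, 1))
f = group_average(predict)
assert np.allclose(f(x), f(np.rot90(x, 1, axes=(0, 1))))  # exact invariance
```

Averaging over the orbit buys exact invariance at the cost of one forward pass per group element; augmentation keeps inference cheap but makes invariance only approximate, which is precisely the trade-off weighed in this chapter.
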
, {'end': 9955.36, 'start': 9348.227, 'title': 'Geometric stability in CNNs and data augmentation', 'summary': 'Discusses the concept of geometric stability in CNNs, demonstrating its application and effectiveness in data augmentation, while also highlighting the trade-offs between building inductive priors into the model and using data augmentation.', 'duration': 607.133, 'highlights': ['The geometric stability principle in CNNs shows that being close enough to an element of the group allows for approximate invariance or equivariance, which is demonstrated in data augmentation. Demonstrates the concept of geometric stability in CNNs and its application in data augmentation.', 'In some cases, building inductive priors into the model is preferred, especially when dealing with large groups of symmetries, as shown in the example of graph neural networks. Emphasizes the preference for building inductive priors into the model in certain cases and provides an example with graph neural networks.', 'Replacing convolutions with group convolutions respecting rotational symmetries and translations led to a significant improvement in performance, demonstrating the effectiveness of building symmetries into the network. Shows the significant improvement in performance by building symmetries into the network, as demonstrated in a medical imaging problem.', 'Approximate symmetries can be beneficial in low-data regimes and for general vision engines, allowing for robustness and computational efficiency trade-offs. Emphasizes the benefits of approximate symmetries in low-data regimes and highlights the trade-offs involved in computational efficiency.', 'Using skip connections in neural network blocks can provide the model the choice to use or ignore certain symmetries, allowing for approximate invariances in the architecture. Discusses the use of skip connections to incorporate approximate invariances into the architecture.']}, {'end': 10171.957, 'start': 9955.38, 'title': 'Symmetry in algorithmic reasoning', 'summary': 'Discusses the challenges of applying classical algorithms to rich data, the incorporation of skip connections in neural network architecture, and the ambiguity in discovering group structure in problems without known symmetries.', 'duration': 216.577, 'highlights': ['Incorporating skip connections in neural network architecture to extract model-free information alongside algorithm output. The chapter discusses the incorporation of skip connections in the neural network architecture to extract model-free information alongside algorithm output, providing flexibility in capturing additional factors that influence outcomes.', 'Challenges of applying classical algorithms to rich data due to difficulty in abstractifying complex real-world data for algorithmic processing. The chapter addresses the challenges of applying classical algorithms to rich data, highlighting the difficulty in abstractifying complex real-world data for algorithmic processing, leading to information loss.', 'Ambiguity in discovering group structure in problems without known symmetries, leading to the absence of a single satisfactory approach. 
The chapter delves into the ambiguity in discovering group structure in problems without known symmetries, emphasizing the absence of a single satisfactory approach due to the ambiguous nature of the problem.']}, {'end': 10455.995, 'start': 10173.337, 'title': 'Symmetries and fractals in physical reality', 'summary': 'Discusses the hierarchical nature of symmetries, the emergence of abstract symmetries in fractal patterns, and the application of fractal coding in image compression, with a focus on the practical use of similar ideas in computer vision for super resolution and image denoising.', 'duration': 282.658, 'highlights': ['The hierarchical nature of symmetries and their relation to geometries is discussed, suggesting a similarity between the hierarchical organization of symmetries and geometries. The symmetries seem to be hierarchical just in the same way that geometries are hierarchical.', 'The discussion on the emergent abstract symmetries in fractal patterns and the potential application of fractals as an analogy for physical reality, raising the question of whether to focus on low-level primitive regularities or more abstract emergent symmetries. There would be an expanding scale symmetry which might resemble the original rule, along with emergent abstract symmetries not obviously related to the simple rule producing the pattern.', 'The mention of the famous paper by Michael Barnsley on fractal coding, its use in achieving remarkable compression ratios for natural images, and its practical application in industry, such as in the Microsoft Encarta encyclopedia. In the 90s, there was a famous paper by Michael Barnsley on fractal coding claiming remarkable compression ratios for natural images, and it was actually used in industry, including the Microsoft Encarta encyclopedia.', 'The application of similar ideas from fractal coding in computer vision for super resolution and image denoising, as demonstrated by the group of Michal Irani from the Weizmann Institute in Israel. In more recent times, similar ideas from fractal coding have been used in computer vision, such as for super resolution and image denoising by the group of Michal Irani from the Weizmann Institute in Israel.', "The discussion on the structural limitations of data produced by our physical world, the dimensionality of physical reality, and its potential description at different scales, particularly in the context of statistical mechanics and thermodynamics. Deep learning's success is attributed to the strict structural limitations of the data produced by our physical world, and the discussion delves into the dimensionality of physical reality and its potential description at different scales, particularly in the context of statistical mechanics and thermodynamics."]}]
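
The fractal point above ('very complex patterns with very simple rules that are applied locally in a repeated way') has a compact worked example. The sketch below is a toy iterated function system, the chaos game for the Sierpinski triangle, in the same family of constructions that Barnsley-style fractal coding exploits; it is purely illustrative and not the compression scheme itself.

```python
import numpy as np

def chaos_game(n_points=50000, seed=0):
    # Iterated function system for the Sierpinski triangle:
    # repeatedly jump halfway towards a randomly chosen triangle vertex.
    rng = np.random.default_rng(seed)
    vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
    points = np.empty((n_points, 2))
    p = np.array([0.3, 0.3])  # any starting point converges to the attractor
    for i in range(n_points):
        p = 0.5 * (p + vertices[rng.integers(3)])
        points[i] = p
    return points

pts = chaos_game()
# The attractor is self-similar: zooming into any corner reproduces the whole,
# an exact "expanding scale symmetry" of the kind discussed above.
```

Three affine maps, applied locally and repeatedly, generate an infinitely detailed pattern, which is the intuition behind both fractal image coding and the statistical-mechanics analogy in this chapter.
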
, 'duration': 1355.137, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u89100858.jpg', 'highlights': ['The practicality of incorporating prior knowledge into data augmentations is discussed, highlighting its success in practice.', 'Replacing convolutions with group convolutions respecting rotational symmetries and translations led to a significant improvement in performance, demonstrating the effectiveness of building symmetries into the network.', 'Using skip connections in neural network blocks can provide the model the choice to use or ignore certain symmetries, allowing for approximate invariances in the architecture.', 'The chapter discusses the incorporation of skip connections in the neural network architecture to extract model-free information alongside algorithm output, providing flexibility in capturing additional factors that influence outcomes.', 'The hierarchical nature of symmetries and their relation to geometries is discussed, suggesting a similarity between the hierarchical organization of symmetries and geometries.', 'The mention of the famous paper by Michael Barnsley on fractal coding, its use in achieving remarkable compression ratios for natural images, and its practical application in industry, such as in the Microsoft Encarta encyclopedia.', 'In more recent times, similar ideas from fractal coding have been used in computer vision, such as for super resolution and image denoising by the group of Michal Irani from the Weizmann Institute in Israel.', "Deep learning's success is attributed to the strict structural limitations of the data produced by our physical world, and the discussion delves into the dimensionality of physical reality and its potential description at different scales, particularly in the context of statistical mechanics and thermodynamics.", 'The importance of choosing the right symmetry group based on specific data structures and problems is emphasized.', 'The geometric stability principle in CNNs shows that being close enough to an element of the group allows for approximate invariance or equivariance, which is demonstrated in data augmentation.', 'Approximate symmetries can be beneficial in low-data regimes and for general vision engines, allowing for robustness and computational efficiency trade-offs.', 'The discussion on the emergent abstract symmetries in fractal patterns and the potential application of fractals as an analogy for physical reality, raising the question of whether to focus on low-level primitive regularities or more abstract emergent symmetries.', 'The chapter delves into the ambiguity in discovering group structure in problems without known symmetries, emphasizing the absence of a single satisfactory approach due to the ambiguous nature of the problem.', 'The discussion of why geometric priors are principled and their utility in dealing with natural data is explored.', 'Incorporating skip connections in neural network architecture to extract model-free information alongside algorithm output.', 'Challenges of applying classical algorithms to rich data due to difficulty in abstractifying complex real-world data for algorithmic processing.']}, {'end': 
11218.762, 'segs': [{'end': 10830.403, 'src': 'embed', 'start': 10803.391, 'weight': 1, 'content': [{'end': 10813.159, 'text': 'Can you give us a bit of a practical blueprint of how I would build a network that takes this as an input and applies this?', 'start': 10803.391, 'duration': 9.768}, {'end': 10815.32, 'text': 'How would you go about this?', 'start': 10814.08, 'duration': 1.24}, {'end': 10819.721, 'text': 'You know, what would be the building blocks that you choose, the orders and so on?', 'start': 10815.52, 'duration': 4.201}, {'end': 10822.942, 'text': 'Are there overarching principles in how to do that?', 'start': 10819.761, 'duration': 3.181}, {'end': 10830.403, 'text': "I don't think that there is really a general recipe, so it's problem dependent, but maybe one example is applications in chemistry.", 'start': 10823.722, 'duration': 6.681}], 'summary': 'Building a network depends on the problem, with an example in chemistry.', 'duration': 27.012, 'max_score': 10803.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810803391.jpg'}, {'end': 10972.473, 'src': 'embed', 'start': 10934.921, 'weight': 2, 'content': [{'end': 10941.11, 'text': "like making sense of the data you're receiving and figuring out what's the right kind of symmetry to bake into it.", 'start': 10934.921, 'duration': 6.189}, {'end': 10946.637, 'text': "I don't necessarily have a good opinion on what this model might look like.", 'start': 10942.091, 'duration': 4.546}, {'end': 10953.357, 'text': 'But what I do say is just looking at the immediate utility of the geometric deep learning blueprint.', 'start': 10947.631, 'duration': 5.726}, {'end': 10962.046, 'text': "we are like, I think, very strictly saying that we don't want to use this blueprint to propose, you know, the one true architecture.", 'start': 10953.357, 'duration': 8.689}, {'end': 10972.473, 'text': 'Rather, we make the argument that different problems require different specifications and we provide a common language that will allow, say,', 'start': 10963.627, 'duration': 8.846}], 'summary': 'Geometric deep learning blueprint offers flexible model options for diverse problems.', 'duration': 37.552, 'max_score': 10934.921, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810934921.jpg'}, {'end': 11017.984, 'src': 'embed', 'start': 10990.183, 'weight': 0, 'content': [{'end': 10995.288, 'text': 'But this blueprint kind of just provides a clear delimiting aspect to these things.', 'start': 10990.183, 'duration': 5.105}, {'end': 11003.896, 'text': 'Just like in the 1800s, you had all these different types of geometries that basically lived on completely different kinds of objects.', 'start': 10995.408, 'duration': 8.488}, {'end': 11007.798, 'text': 'geometric objects, right? 
So hyperbolic, elliptic, and so on and so forth.', 'start': 11004.196, 'duration': 3.602}, {'end': 11013.081, 'text': "And what Klein's Erlangen program allowed us to do was, among other things,", 'start': 11008.078, 'duration': 5.003}, {'end': 11017.984, 'text': 'reason about all of these geometries using the same language of group invariants and symmetries.', 'start': 11013.081, 'duration': 4.903}], 'summary': "Klein's Erlangen program unified different geometries using group invariants and symmetries.", 'duration': 27.801, 'max_score': 10990.183, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u810990183.jpg'}, {'end': 11105.323, 'src': 'embed', 'start': 11076.109, 'weight': 3, 'content': [{'end': 11080.172, 'text': "I mean it's possible that many people could benefit from this, but they just don't know about it yet.", 'start': 11076.109, 'duration': 4.063}, {'end': 11088.219, 'text': 'I was thinking that, for example, if I had a LiDAR scanner on my phone and the result is a point cloud which is not particularly useful,', 'start': 11080.592, 'duration': 7.627}, {'end': 11092.162, 'text': 'but I would presumably transform it into a mesh which would be more useful.', 'start': 11088.219, 'duration': 3.943}, {'end': 11097.246, 'text': 'But is it possible that loads of data scientists out there are sitting on data sets that they could be thinking about geometrically,', 'start': 11092.202, 'duration': 5.044}, {'end': 11097.747, 'text': "but they're not?", 'start': 11097.246, 'duration': 0.501}, {'end': 11099.76, 'text': 'Manifolds are exotic.', 'start': 11098.74, 'duration': 1.02}, {'end': 11101.461, 'text': "It's probably in the eyes of the beholder.", 'start': 11099.84, 'duration': 1.621}, {'end': 11105.323, 'text': 'And well, in machine learning, probably they are to some extent exotic.', 'start': 11101.501, 'duration': 3.822}], 'summary': 'Many people could benefit from LiDAR data transformation, but lack awareness. Data scientists may underutilize geometric thinking in machine learning.', 'duration': 29.214, 'max_score': 11076.109, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u811076109.jpg'}], 'start': 10456.875, 'title': 'Geometric deep learning', 'summary': 'Discusses challenges and applications of graph neural networks, including limitations due to the factorial size of the permutation group, implications of geometric stability issues, and the importance of locality, as well as its application in chemistry, molecule analysis, drug screening, AlphaFold, and commercial success in 3D hand reconstruction, emphasizing the importance of symmetry and equivariant message passing.', 'chapters': [{'end': 10822.942, 'start': 10456.875, 'title': 'Graph neural networks: stability and geometric considerations', 'summary': 'Discusses the challenges of graph neural networks, including limitations of graph convolutions due to the factorial size of the permutation group, implications of geometric stability issues on the graph Fourier transform, and the importance of locality in convolutional neural networks.', 'duration': 366.067, 'highlights': ['Graph neural networks are limited by the factorial size of the permutation group, necessitating reliance on heuristics like graph convolutions and limiting the ability to represent every possible operation in a neural network layer. Limitation due to the factorial size of the permutation group, reliance on heuristics, inability to represent every possible operation',
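
The factorial obstruction in the highlight above can be made tangible with a few lines of NumPy. Rather than treating all n! node orderings, a graph convolution shares one weight matrix across nodes and mixes features only along edges, so permutation equivariance holds by construction. The layer below is a generic GCN-style sketch for illustration, not any specific architecture discussed in the episode.

```python
import numpy as np

def gcn_layer(A, H, W):
    # H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W): one weight matrix W shared by
    # all nodes, so the layer never has to enumerate the n!-sized permutation group.
    A_hat = A + np.eye(A.shape[0])                     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Permutation equivariance check: relabelling nodes permutes output rows identically.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, (5, 5)); A = np.triu(A, 1); A = A + A.T   # random undirected graph
H = rng.normal(size=(5, 3)); W = rng.normal(size=(3, 4))
P = np.eye(5)[rng.permutation(5)]                                # permutation matrix
assert np.allclose(gcn_layer(P @ A @ P.T, P @ H, W), P @ gcn_layer(A, H, W))
```

The assertion is exactly the symmetry statement from the chapter: a heuristic, local operation that is equivariant by design sidesteps any explicit handling of the permutation group.
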
 'Geometric stability issues impact the effectiveness of graph neural networks, leading to instability in the presence of perturbations in node features or edge structure, limiting their applicability. Geometric instability in graph neural networks, instability due to perturbations, limitations in applicability', 'Locality is a crucial feature in many situations, as demonstrated in convolutional neural networks where compositionality properties allow for complex features to be created from simple primitives, and the importance of locality in problems that do not heavily depend on distant interactions. Locality as a crucial feature, compositionality properties in convolutional neural networks, importance of locality in problems']}, {'end': 11218.762, 'start': 10823.722, 'title': 'Geometric deep learning in applications', 'summary': 'Discusses the application of geometric deep learning in chemistry, molecule analysis, drug screening, AlphaFold, and commercial success in 3D hand reconstruction, emphasizing the importance of symmetry, equivariant message passing, and the need for different specifications for different problems.', 'duration': 395.04, 'highlights': ['The chapter discusses the application of geometric deep learning in chemistry, molecule analysis, drug screening, AlphaFold, and commercial success in 3D hand reconstruction. Applications in chemistry, molecule analysis, drug screening, AlphaFold, and commercial success in 3D hand reconstruction', 'Emphasizes the importance of symmetry and equivariant message passing in the success of virtual drug screening architectures and AlphaFold. Importance of symmetry, equivariant message passing in virtual drug screening architectures and AlphaFold', 'Highlights the need for different specifications for different problems and provides a common language for different types of data. Need for different specifications for different problems, common language for different types of data']}
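
As a sketch of the 'equivariant message passing' credited above for virtual drug screening and AlphaFold-style models, the toy update below follows the E(n)-equivariant pattern popularized by EGNN (Satorras et al., 2021): messages are built only from rotation- and translation-invariant quantities (here, squared distances and node features), and coordinates move only along difference vectors. The tanh 'networks' are stand-ins for learned MLPs, and the fully connected molecular graph is an assumption made for brevity.

```python
import numpy as np

def equivariant_step(x, h):
    # One E(n)-equivariant message-passing step.
    # x: (n, 3) atom coordinates, h: (n, d) atom features.
    n, _ = x.shape
    mask = 1.0 - np.eye(n)                                 # no self-messages
    diff = x[:, None, :] - x[None, :, :]                   # (n, n, 3), equivariant
    dist2 = (diff ** 2).sum(-1)                            # (n, n), invariant
    m = np.tanh(h.sum(-1)[None, :] + dist2) * mask         # (n, n) scalar messages (stand-in net)
    x_new = x + (diff * m[..., None]).sum(axis=1) / (n - 1)  # move along difference vectors
    h_new = np.tanh(h + (m @ h))                           # aggregate invariant messages
    return x_new, h_new

# Equivariance check: rotating the input rotates the output coordinates identically,
# while the invariant features are untouched.
rng = np.random.default_rng(0)
x, h = rng.normal(size=(6, 3)), rng.normal(size=(6, 4))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))               # random orthogonal matrix
x_rot, h_rot = equivariant_step(x @ Q.T, h)
x_ref, h_ref = equivariant_step(x, h)
assert np.allclose(x_rot, x_ref @ Q.T) and np.allclose(h_rot, h_ref)
```

The design choice is the one named in the highlights: by restricting messages to invariants and coordinate updates to equivariant directions, 3D symmetry is respected exactly rather than learned from augmented data.
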
, {'end': 12802.122, 'segs': [{'end': 11597.216, 'src': 'embed', 'start': 11570.038, 'weight': 5, 'content': [{'end': 11573.942, 'text': 'And just by, like, training this pipeline end to end with a model-free loss,', 'start': 11570.038, 'duration': 3.904}, {'end': 11581.266, 'text': 'we were able to get interesting returns in Atari games much sooner than some of the competing approaches.', 'start': 11573.942, 'duration': 7.324}, {'end': 11582.867, 'text': "So it's a very small step.", 'start': 11581.266, 'duration': 1.601}, {'end': 11589.091, 'text': 'It still requires, you know, a hundred thousand, two hundred thousand iterations of playing before', 'start': 11582.867, 'duration': 6.224}, {'end': 11591.313, 'text': 'some meaningful behavior starts to come out.', 'start': 11589.091, 'duration': 2.222}, {'end': 11597.216, 'text': "but it's a sign that we might be able to move the needle a bit backwards and not require, you know,", 'start': 11591.313, 'duration': 5.903}], 'summary': 'Training pipeline yielded early returns in Atari games, with 100,000-200,000 iterations, hinting at potential progress.', 'duration': 27.178, 'max_score': 11570.038, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u811570038.jpg'}, {'end': 11896.599, 'src': 'embed', 'start': 11872.652, 'weight': 8, 'content': [{'end': 11880.219, 'text': 'So, does this require a different neural network architecture or could geometric deep learning already deliver the goods, right?', 'start': 11872.652, 'duration': 7.567}, {'end': 11882.441, 'text': "It's almost as if it's just a representation problem.", 'start': 11880.239, 'duration': 2.202}, {'end': 11890.754, 'text': "I think it's a very important question, one which, well, I cannot claim to have the right answer to, and my definition is, I guess,", 'start': 11882.461, 'duration': 8.293}, {'end': 11896.599, 'text': 'a little bit skewed by the specific research that I do and the engineering approaches that I do.', 'start': 11890.754, 'duration': 5.845}], 'summary': 'Geometric deep learning may address the neural network architecture for the representation problem.', 'duration': 23.947, 'max_score': 11872.652, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u811872652.jpg'}, {'end': 11938.14, 'src': 'embed', 'start': 11913.931, 'weight': 3, 'content': [{'end': 11921.974, 'text': 'to figure out how to recompose them and either discover new analogies or just discover 
new conclusions that you can use in the next step of reasoning.', 'start': 11913.931, 'duration': 8.043}, {'end': 11932.038, 'text': 'And you know, it just feels really amenable to a kind of synergy of, as Daniel Kahneman puts it, system one and system two, right?', 'start': 11922.714, 'duration': 9.324}, {'end': 11938.14, 'text': 'You have the perceptive component that feeds in the raw information that you receive as your input data.', 'start': 11932.418, 'duration': 5.722}], 'summary': 'Recompose data to find new insights and conclusions for reasoning, aiming for a synergy of system one and system two.', 'duration': 24.209, 'max_score': 11913.931, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u811913931.jpg'}, {'end': 12260.136, 'src': 'embed', 'start': 12232.418, 'weight': 6, 'content': [{'end': 12235.741, 'text': 'And this was the end of my conversation with Taco Cohen.', 'start': 12232.418, 'duration': 3.323}, {'end': 12238.623, 'text': "One of the really interesting things you're getting on to,", 'start': 12236.061, 'duration': 2.562}, {'end': 12249.628, 'text': "in some of the work that you've done, is being able to think of group convolutions on homogeneous objects like spheres, for example.", 'start': 12240.581, 'duration': 9.047}, {'end': 12260.136, 'text': 'but also you moved on to irregular objects like any mesh, and you looked into things like fiber bundles and local convolutions,', 'start': 12249.628, 'duration': 10.508}], 'summary': 'Taco Cohen explores group convolutions on spheres and irregular objects like meshes.', 'duration': 27.718, 'max_score': 12232.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u812232418.jpg'}, {'end': 12428.183, 'src': 'embed', 'start': 12389.315, 'weight': 4, 'content': [{'end': 12391.977, 'text': 'Um, and, uh, you know, that just is what it is.', 'start': 12389.315, 'duration': 2.662}, {'end': 12399.641, 'text': 'If you say I have a signal on this manifold, uh, and I want, uh, to respect the symmetry as well.', 'start': 12392.057, 'duration': 7.584}, {'end': 12402.682, 'text': "If there are no global symmetries, there's nothing to respect.", 'start': 12399.661, 'duration': 3.021}, {'end': 12403.743, 'text': 'So you get no constraints.', 'start': 12402.702, 'duration': 1.041}, {'end': 12406.404, 'text': 'You can just use an arbitrary, uh, linear map.', 'start': 12403.763, 'duration': 2.641}, {'end': 12416.159, 'text': 'Um, now, it turns out there are certain other kinds of symmetries called gauge symmetries that you might still want to respect.', 'start': 12407.345, 'duration': 8.814}, {'end': 12428.183, 'text': 'And in practice, what respecting gauge symmetry will do is it will put some constraints on the filter at a particular position.', 'start': 12417.079, 'duration': 11.104}], 'summary': 'Gauge symmetries put constraints on the filter at a position.', 'duration': 38.868, 'max_score': 12389.315, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u812389315.jpg'}, {'end': 12796.957, 'src': 'embed', 'start': 12739.638, 'weight': 0, 'content': [{'end': 12749.969, 'text': 'And I really think this can help people who are new to the field to learn more quickly, to get an overview of all the things that are out there.', 'start': 12739.638, 'duration': 10.331}, {'end': 12761.249, 'text': 'And I also think that this is the start of at least one way in which we can take the black box 
of deep learning,', 'start': 12751.571, 'duration': 9.678}, {'end': 12771.433, 'text': 'which often is viewed as completely inscrutable, and actually start to open it and start to understand how the pieces connect,', 'start': 12761.249, 'duration': 10.184}, {'end': 12781.277, 'text': "which can then perhaps inform future developments that are guided by both empirical results and an understanding of what's going on.", 'start': 12771.433, 'duration': 9.844}, {'end': 12783.838, 'text': 'Amazing. Thanks so much, Taco.', 'start': 12782.337, 'duration': 1.501}, {'end': 12785.748, 'text': 'Thanks for having me.', 'start': 12785.128, 'duration': 0.62}, {'end': 12786.469, 'text': "It's been a pleasure.", 'start': 12785.849, 'duration': 0.62}, {'end': 12789.011, 'text': 'Joan, thank you so much for joining us.', 'start': 12787.21, 'duration': 1.801}, {'end': 12789.832, 'text': 'This has been amazing.', 'start': 12789.031, 'duration': 0.801}, {'end': 12791.553, 'text': 'Okay. No, thank you so much, Tim.', 'start': 12790.532, 'duration': 1.021}, {'end': 12793.995, 'text': 'It was very fun and best of luck.', 'start': 12791.573, 'duration': 2.422}, {'end': 12796.957, 'text': "And I think let's maybe get in touch.", 'start': 12794.035, 'duration': 2.922}], 'summary': 'Deep learning can be made more accessible to newcomers, guiding future developments with empirical results and understanding.', 'duration': 57.319, 'max_score': 12739.638, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u812739638.jpg'}], 'start': 11220.299, 'title': 'Advancements in graph neural networks', 'summary': 'Discusses new message passing mechanisms for graph neural networks, leveraging geometric and algorithmic concepts in reinforcement learning, exploring the potential of geometric deep learning in AGI, and understanding group convolutions on irregular objects, potentially informing future developments in deep learning.', 'chapters': [{'end': 11458.406, 'start': 11220.299, 'title': 'Advancements in message passing mechanisms for graph neural networks', 'summary': 'Discusses the development of a new message passing mechanism for graph neural networks that is able to work on structures beyond traditional graphs, such as simplicial and cell complexes, leading to the discovery of entirely new problems and potential breakthroughs in areas like reinforcement learning.', 'duration': 238.107, 'highlights': ['The development of a message passing mechanism that can work on structures beyond traditional graphs, such as simplicial and cell complexes, has opened up entirely new problems and potential breakthroughs in areas like reinforcement learning. The new message passing mechanism allows for the consideration of structures beyond traditional graphs, leading to the discovery of entirely new problems and potential breakthroughs in areas like reinforcement learning.', "The traditional graph neural network's use of the input graph for message propagation is equivalent to the Weisfeiler-Lehman graph isomorphism test, which has limitations in the kinds of structures it can detect. The traditional message passing used in graph neural networks has limitations in the kinds of structures it can detect, as it is equivalent to the Weisfeiler-Lehman graph isomorphism test.', 'The ability to regard rings as cells and perform a different form of message passing on them has been shown to be strictly more powerful than the Weisfeiler-Lehman algorithm.'
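
Since the Weisfeiler-Lehman equivalence is invoked twice above, a compact sketch of 1-WL colour refinement may help; plain message passing can distinguish at most what this test distinguishes. The classic failure case below (two triangles versus one six-cycle, both 2-regular) is exactly the ring structure that the cell-complex message passing described next is designed to detect. This is a generic textbook procedure, not code from any of the papers discussed.

```python
from collections import Counter

def wl_colors(adj, rounds):
    # 1-dimensional Weisfeiler-Lehman colour refinement on an adjacency list.
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        # Each node's new colour is a hash of its colour and its neighbours' colours.
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        colors = {v: palette[sig[v]] for v in adj}
    return Counter(colors.values())

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
# Identical colour histograms at every round: 1-WL (and hence plain message
# passing) cannot tell these graphs apart, although one contains 3-rings and
# the other a single 6-ring.
assert wl_colors(two_triangles, 3) == wl_colors(six_cycle, 3)
```
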
, 'Advancements in message passing mechanisms have created opportunities to tackle previously challenging reinforcement learning problems, leading to potential breakthroughs in the field.']}, {'end': 11872.172, 'start': 11459.107, 'title': 'Geometric concepts in reinforcement learning', 'summary': 'Discusses leveraging geometric and algorithmic concepts to improve data efficiency in reinforcement learning, highlighting the eXecuted Latent Value Iteration Network (XLVIN) as an example, which achieved interesting returns in Atari games with a model-free loss and showed potential for moving the needle in reducing the required number of interactions before meaningful behavior emerges.', 'duration': 413.065, 'highlights': ['The eXecuted Latent Value Iteration Network (XLVIN) achieved interesting returns in Atari games with a model-free loss, showing potential for reducing the required number of interactions before meaningful behavior emerges. XLVIN achieved interesting returns in Atari games with a model-free loss, signaling potential for reducing the required interactions before meaningful behavior emerges.', 'Geometric deep learning could help in getting better behaviors faster and with fewer interactions in reinforcement learning. Geometric deep learning could lead to improved behaviors with fewer interactions in reinforcement learning.', 'The emerging area of latent graph learning or latent graph inference aims to learn the graph simultaneously with using it for the underlying decision problem, posing challenges for neural network optimization. Latent graph learning poses challenges for neural network optimization by aiming to learn the graph simultaneously with using it for the underlying decision problem.', 'The potential for a qualitative breakthrough in the structural biology field is discussed, where the accurate prediction of 3D protein structures enables impactful applications in drug design and pharmaceutical pipelines. Accurate prediction of 3D protein structures can lead to qualitative breakthroughs in drug design and pharmaceutical pipelines.', 'Analogies are discussed as core to cognition, with the analogy being likened to the interstate freeway of cognition and symmetries, shedding light on the nature of intelligence and neural networks. Analogies are highlighted as core to cognition, shedding light on the nature of intelligence and neural networks.']}, {'end': 12231.877, 'start': 11872.652, 'title': 'Geometric deep learning and AGI', 'summary': 'Explores the potential of geometric deep learning in analogy making and reasoning for AGI, the challenges in simulating algorithms with graph neural networks, and the debate on anthropocentric conceptions of intelligence.', 'duration': 359.225, 'highlights': ['The challenges in simulating algorithms with graph neural networks are a key area of work, as it is super easy to do in distribution but difficult to extrapolate, posing a significant challenge in achieving AGI. Challenges in simulating algorithms with graph neural networks', 'Geometric deep learning, with the inclusion of category theory concepts, has the potential to support analogy making and reasoning for AGI, building on the existing building blocks in place. Potential of geometric deep learning in supporting analogy making and reasoning for AGI',
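
For readers unfamiliar with the classical algorithm that the XLVIN work above emulates in latent space, tabular value iteration is only a few lines. The sketch below uses a made-up two-state, two-action MDP purely for illustration; it is the textbook procedure whose update a latent GNN executor learns to imitate, not the XLVIN model itself.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=100):
    # Tabular value iteration: V(s) <- max_a [ R(a, s) + gamma * sum_t P(a, s, t) V(t) ].
    # P: (actions, states, states) transition tensor, R: (actions, states) rewards.
    V = np.zeros(R.shape[1])
    for _ in range(iters):
        V = (R + gamma * np.einsum('ast,t->as', P, V)).max(axis=0)
    return V

# Toy MDP with invented numbers, for demonstration only.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.8, 0.2]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(value_iteration(P, R))  # converges to the optimal state values
```
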
 'The debate on anthropocentric conceptions of intelligence raises questions about the definitions of artificial intelligence and the potential limitation of the famous Turing test, reflecting on the need to consider intelligence beyond a human-centric perspective. Debate on anthropocentric conceptions of intelligence and the limitation of the Turing test']}, {'end': 12802.122, 'start': 12232.418, 'title': 'Understanding group convolutions on irregular objects', 'summary': 'Discusses the challenges of applying convolutional neural networks to irregular objects such as meshes and the proposal of natural graph networks to capture local motifs, providing a principled way to understand the symmetries and constraints of such objects, potentially informing future developments in deep learning.', 'duration': 569.704, 'highlights': ['The challenges of applying convolutional neural networks to irregular objects like meshes and the proposal of natural graph networks to capture local motifs, providing a principled way to understand the symmetries and constraints of such objects, potentially informing future developments in deep learning.', 'The explanation of weight sharing in convolution and its relation to symmetry, where the use of the same filter at each position in a two-dimensional convolution is motivated by translation symmetry, and the difficulty of motivating weight sharing on general manifolds without global symmetries.', 'The discussion of the global permutation symmetry in graph neural networks and the proposal of a local version of natural graph networks to capture local motifs and symmetries, such as aromatic rings, potentially informing future developments in graph neural networks.', 'The emphasis on the importance of understanding the commonalities and underlying principles of various architectures in deep learning, which can help new practitioners quickly grasp the field and potentially lead to a better understanding of the black box of deep learning, guiding future developments.', 'The mention of natural graph networks as a generalization of equivariance and the proposal of a global and local version to process certain local motifs, such as aromatic rings, and the potential impact on understanding the symmetries and constraints of graph structures.']}], 'duration': 1581.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/bIZB1hIJ4u8/pics/bIZB1hIJ4u811220299.jpg', 'highlights': ['The development of a message passing mechanism that can work on structures beyond traditional graphs has opened up entirely new problems and potential breakthroughs in areas like reinforcement learning.', 'Advancements in message passing mechanisms have created opportunities to tackle previously challenging reinforcement learning problems.', 'The ability to regard rings as cells and perform a different form of message passing on them has been shown to be strictly more powerful than the Weisfeiler-Lehman algorithm.', 'XLVIN achieved interesting returns in Atari games with a model-free loss, signaling potential for reducing the required interactions before meaningful behavior emerges.', 'Geometric deep learning could lead to improved behaviors with fewer interactions in reinforcement learning.', 'Accurate prediction of 3D protein structures can lead to qualitative breakthroughs in drug design and pharmaceutical pipelines.', 'Analogies are highlighted as core to cognition, shedding light 
on the nature of intelligence and neural networks.', 'The challenges of applying convolutional neural networks to irregular objects like meshes and the proposal of natural graph networks to capture local motifs, potentially informing future developments in deep learning.', 'The emphasis on the importance of understanding the commonalities and underlying principles of various architectures in deep learning, which can help new practitioners quickly grasp the field and potentially lead to a better understanding of the black box of deep learning, guiding future developments.', 'The potential of geometric deep learning in supporting analogy making and reasoning for AGI.']}], 'highlights': ['Geometric deep learning introduces function spaces based on geometrical priors to reduce statistical error and overfitting without increasing approximation error.', 'The blueprint of Geometric Deep Learning consists of three core principles: symmetry, scale separation, and geometric stability, providing a rich approximation space with prescribed invariance and stability properties.', "Dr. Petar Veličković's notable contributions include being the first author of Graph Attention Networks and Deep Graph Infomax, published in leading machine learning venues.", "Graph neural networks have proven impactful in detecting novel potent antibiotics in computational chemistry, powering systems for developing the latest generation of Google's machine learning chips, and serving various content in production to billions of users, while significantly improving travel time predictions in Google Maps.", "Transformers' application in modeling long-distance relations between tokens in a sentence", 'Using skip connections in neural network blocks can provide the model the choice to use or ignore certain symmetries, allowing for approximate invariances in the architecture.', 'The practicality of incorporating prior knowledge into data augmentations is discussed, highlighting its success in practice.', 'The development of a message passing mechanism that can work on structures beyond traditional graphs has opened up entirely new problems and potential breakthroughs in areas like reinforcement learning.', 'Accurate prediction of 3D protein structures can lead to qualitative breakthroughs in drug design and pharmaceutical pipelines.', 'The potential of geometric deep learning in supporting analogy making and reasoning for AGI.']}
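
Finally, the geometric deep learning blueprint that recurs throughout this conversation can be summarized as a function skeleton: group-equivariant layers (symmetry), interleaved with local coarsening (scale separation), closed by an invariant global readout. The sketch below is a paraphrase of that recipe in code form, with every concrete layer left as a caller-supplied stand-in; it is not an implementation from the proto-book.

```python
def gdl_blueprint(equivariant_layers, coarsen, invariant_pool, head):
    # The geometric deep learning blueprint, schematically:
    #   stack group-equivariant maps        (symmetry principle),
    #   interleave local pooling/coarsening (scale separation),
    #   finish with an invariant readout    (invariance at the output).
    def model(x, domain):
        for layer in equivariant_layers:
            x = layer(x, domain)             # equivariant to the chosen symmetry group
            x, domain = coarsen(x, domain)   # e.g. grid pooling or graph clustering
        return head(invariant_pool(x))       # group-invariant global pooling + task head
    return model
```

Instantiating the stand-ins with translations on grids recovers a CNN, and with permutations on graphs a GNN, which is the unifying point the guests return to throughout.
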