title

MIT 6.S191 (2021): Deep Generative Modeling

description

MIT 6.S191 (2021): Introduction to Deep Learning
Deep Generative Modeling
Lecturer: Ava Soleimany
January 2021
For all lectures, slides, and lab materials: http://introtodeeplearning.com
Lecture Outline
0:00 - Introduction
6:03 - Why care about generative models?
8:56 - Latent variable models
11:31 - Autoencoders
17:00 - Variational autoencoders
24:30 - Priors on the latent distribution
34:38 - Reparameterization trick
38:14 - Latent perturbation and disentanglement
41:25 - Debiasing with VAEs
43:42 - Generative adversarial networks
46:14 - Intuitions behind GANs
48:27 - Training GANs
52:57 - GANs: Recent advances
57:15 - CycleGAN for unpaired translation
1:01:01 - Summary
Subscribe to stay up to date with new deep learning lectures at MIT, or follow us @MITDeepLearning on Twitter and Instagram to stay fully-connected!!
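The middle of the outline above centers on variational autoencoders: the reparameterization trick and a loss with two terms, a reconstruction term plus a KL regularization term against a standard normal prior on the latent space. As a rough companion sketch only (not taken from the course lab materials, which use TensorFlow; all function names, shapes, and the NumPy setting here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
    # Sampling is pushed into eps, so mu and log_var stay deterministic
    # functions of the input and gradients can flow through them.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_hat, mu, log_var):
    # Term 1, reconstruction: mean squared error between input and output,
    # as in the plain autoencoder.
    recon = np.mean((x - x_hat) ** 2)
    # Term 2, regularization: closed-form KL divergence between the
    # encoder's q(z|x) = N(mu, diag(sigma^2)) and the prior p(z) = N(0, I).
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1).mean()
    return recon + kl

# Toy shapes: batch of 4 inputs with 8 features, 2 latent dimensions.
x = rng.standard_normal((4, 8))
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))

z = reparameterize(mu, log_var, rng)
print(z.shape)                      # (4, 2)
# With mu = 0 and log_var = 0 (i.e. sigma = 1), q(z|x) equals the prior,
# so the KL term vanishes; a perfect reconstruction zeroes the first term.
print(vae_loss(x, x, mu, log_var))  # 0.0
```

In a real VAE, `mu`, `log_var`, and `x_hat` would come from trained encoder and decoder networks; this sketch only shows how the sampling step and the two loss terms fit together.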

detail

{'title': 'MIT 6.S191 (2021): Deep Generative Modeling', 'heatmap': [{'end': 711.691, 'start': 667.594, 'weight': 0.789}, {'end': 1152.132, 'start': 1038.075, 'weight': 0.878}, {'end': 1264.369, 'start': 1189.816, 'weight': 0.824}, {'end': 1413.891, 'start': 1289.302, 'weight': 0.73}, {'end': 2308.361, 'start': 2257.006, 'weight': 0.767}, {'end': 2676.711, 'start': 2598.763, 'weight': 0.936}, {'end': 3061.376, 'start': 2972.242, 'weight': 1}, {'end': 3130.664, 'start': 3078.725, 'weight': 0.745}], 'summary': "Discusses deep generative modeling's power, unsupervised learning fundamentals, generative models, autoencoders, variational autoencoders, regularizing vae for continuity and completeness, vaes training techniques, gans, and recent advances in gans, including examples of their applications in automatic debiasing, synthetic instance generation, and image transformation.", 'chapters': [{'end': 118.957, 'segs': [{'end': 50.745, 'src': 'embed', 'start': 10.381, 'weight': 0, 'content': [{'end': 13.983, 'text': 'Hi, everyone, and welcome to Lecture 4 of MIT 6S191.', 'start': 10.381, 'duration': 3.602}, {'end': 26.192, 'text': "In today's lecture we're going to be talking about how we can use deep learning and neural networks to build systems that not only look for patterns in data but actually can go a step beyond this,", 'start': 14.063, 'duration': 12.129}, {'end': 30.715, 'text': 'to generate brand new synthetic examples based on those learned patterns.', 'start': 26.192, 'duration': 4.523}, {'end': 34.998, 'text': 'And this, I think, is an incredibly powerful idea,', 'start': 31.636, 'duration': 3.362}, {'end': 42.501, 'text': "and it's a particular subfield of deep learning that has enjoyed a lot of success and gotten a lot of interest in the past couple of years.", 'start': 34.998, 'duration': 7.503}, {'end': 50.745, 'text': "But I think there's still tremendous, tremendous potential of this field of deep generative modeling in the future and in the years to 
come,", 'start': 43.201, 'duration': 7.544}], 'summary': "Mit 6s191 lecture 4 explores deep generative modeling's potential for generating synthetic examples based on learned patterns.", 'duration': 40.364, 'max_score': 10.381, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw10381.jpg'}], 'start': 10.381, 'title': 'Deep generative modeling', 'summary': 'Discusses the power of deep generative modeling in using neural networks to create synthetic examples based on learned patterns, highlighting its success and potential.', 'chapters': [{'end': 118.957, 'start': 10.381, 'title': 'Deep generative modeling', 'summary': 'Discusses the power of deep generative modeling in using neural networks to create synthetic examples based on learned patterns, highlighting the success and potential of this subfield of deep learning.', 'duration': 108.576, 'highlights': ['Deep generative modeling uses neural networks to create synthetic examples based on learned patterns, with synthetic images showing the power of these algorithms.', 'The subfield of deep generative modeling has enjoyed a lot of success and interest in the past couple of years, with tremendous potential in the future.', 'The lecture emphasizes the increasing relevance of deep generative modeling in a variety of application areas.']}], 'duration': 108.576, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw10381.jpg', 'highlights': ['Deep generative modeling uses neural networks to create synthetic examples based on learned patterns, with synthetic images showing the power of these algorithms.', 'The subfield of deep generative modeling has enjoyed a lot of success and interest in the past couple of years, with tremendous potential in the future.', 'The lecture emphasizes the increasing relevance of deep generative modeling in a variety of application areas.']}, {'end': 519.86, 'segs': [{'end': 219.25, 'src': 
'embed', 'start': 192.639, 'weight': 1, 'content': [{'end': 200.061, 'text': "And in contrast to supervised settings, where we're given data and labels, in unsupervised learning, we're given only data, no labels.", 'start': 192.639, 'duration': 7.422}, {'end': 209.684, 'text': 'And our goal is to train a machine learning or deep learning model to understand or build up a representation of the hidden and underlying structure in that data.', 'start': 200.701, 'duration': 8.983}, {'end': 219.25, 'text': 'And what this can do is it can allow sort of an insight into the foundational structure of the data and then, in turn,', 'start': 210.524, 'duration': 8.726}], 'summary': 'In unsupervised learning, the goal is to train a model to uncover hidden data structure.', 'duration': 26.611, 'max_score': 192.639, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw192639.jpg'}, {'end': 270.345, 'src': 'embed', 'start': 235.402, 'weight': 0, 'content': [{'end': 239.145, 'text': 'such as clustering algorithms or dimensionality reduction algorithms.', 'start': 235.402, 'duration': 3.743}, {'end': 243.846, 'text': 'Generative modeling is one example of unsupervised learning.', 'start': 240.523, 'duration': 3.323}, {'end': 257.476, 'text': 'And our goal in this case is to take as input examples from a training set and learn a model that represents the distribution of the data that is input to that model.', 'start': 244.506, 'duration': 12.97}, {'end': 260.579, 'text': 'And this can be achieved in two principal ways.', 'start': 258.337, 'duration': 2.242}, {'end': 270.345, 'text': "The first is through what is called density estimation, where let's say we are given a set of data samples and they fall according to some density.", 'start': 261.238, 'duration': 9.107}], 'summary': 'Unsupervised learning involves generative modeling to learn data distribution.', 'duration': 34.943, 'max_score': 235.402, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw235402.jpg'}, {'end': 399.629, 'src': 'embed', 'start': 370.626, 'weight': 5, 'content': [{'end': 379.39, 'text': "let's take this idea a step further and consider what could be potential impactful applications and real-world use cases of generative modeling.", 'start': 370.626, 'duration': 8.764}, {'end': 389.618, 'text': 'What generative models enable us as the users to do is to automatically uncover the underlying structure and features in a dataset.', 'start': 380.868, 'duration': 8.75}, {'end': 399.629, 'text': 'The reason this can be really important and really powerful is often we do not know how those features are distributed within a particular dataset of interest.', 'start': 390.379, 'duration': 9.25}], 'summary': 'Generative modeling automatically uncovers dataset features, enabling impactful real-world applications.', 'duration': 29.003, 'max_score': 370.626, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw370626.jpg'}, {'end': 474.334, 'src': 'embed', 'start': 445.775, 'weight': 4, 'content': [{'end': 448.577, 'text': 'like that of faces, and by doing so,', 'start': 445.775, 'duration': 2.802}, {'end': 458.206, 'text': 'actually uncover the regions of the training distribution that are underrepresented and overrepresented with respect to particular features such as skin tone.', 'start': 448.577, 'duration': 9.629}, {'end': 474.334, 'text': 'And the reason why this is so powerful is we can actually now use this information to actually adjust how the data is sampled during training to ultimately build up a more fair and more representative data set.', 'start': 459.266, 'duration': 15.068}], 'summary': 'Uncover underrepresented/overrepresented training data regions to build a fairer dataset.', 'duration': 28.559, 'max_score': 445.775, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw445775.jpg'}, {'end': 508.234, 'src': 'embed', 'start': 485.839, 'weight': 3, 'content': [{'end': 495.545, 'text': 'Another great example and use case where generative models are exceptionally powerful is this broad class of problems that can be considered outlier or anomaly detection.', 'start': 485.839, 'duration': 9.706}, {'end': 499.648, 'text': 'One example is in the case of self-driving cars,', 'start': 496.386, 'duration': 3.262}, {'end': 508.234, 'text': "where it's going to be really critical to ensure that an autonomous vehicle governed and operated by a deep neural network is able to handle all,", 'start': 499.648, 'duration': 8.586}], 'summary': 'Generative models excel at outlier detection, critical for self-driving cars.', 'duration': 22.395, 'max_score': 485.839, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw485839.jpg'}], 'start': 118.957, 'title': 'Unsupervised learning fundamentals', 'summary': 'Delves into the difference between supervised and unsupervised learning, highlighting the objective of unsupervised learning to train models without labels, enabling applications like generative modeling and dimensionality reduction for understanding data structure.', 'chapters': [{'end': 260.579, 'start': 118.957, 'title': 'Unsupervised learning fundamentals', 'summary': 'Discusses the distinction between supervised and unsupervised learning, emphasizing the goal of unsupervised learning to train models without labels to understand data structure, leading to applications like generative modeling and dimensionality reduction.', 'duration': 141.622, 'highlights': ['The chapter emphasizes the goal of unsupervised learning to train models without labels to understand data structure, leading to applications like generative modeling and dimensionality reduction.', 'Generative modeling is presented as an example 
of unsupervised learning, where the goal is to learn a model representing the distribution of the input data.', 'The lecture contrasts unsupervised learning with supervised learning, where only data, and no labels, are provided for training the model.']}, {'end': 519.86, 'start': 261.238, 'title': 'Generative modeling applications', 'summary': 'Discusses generative modeling principles, applications, and real-world use cases, including the generation of synthetic samples matching the data distribution and the automatic uncovering of underrepresented features in a dataset for fair model training.', 'duration': 258.622, 'highlights': ['Generative modeling principles and applications', 'Automatic uncovering of underrepresented features in a dataset for fair model training', 'Use of generative models for outlier or anomaly detection']}], 'duration': 400.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw118957.jpg', 'highlights': ['The chapter emphasizes the goal of unsupervised learning to train models without labels to understand data structure, leading to applications like generative modeling and dimensionality reduction.', 'The lecture contrasts unsupervised learning with supervised learning, where only data, and no labels, are provided for training the model.', 'Generative modeling is presented as an example of unsupervised learning, where the goal is to learn a model representing the distribution of the input data.', 'Use of generative models for outlier or anomaly detection', 'Automatic uncovering of underrepresented features in a dataset for fair model training', 'Generative modeling principles and applications']}, {'end': 879.527, 'segs': [{'end': 545.767, 'src': 'embed', 'start': 520.799, 'weight': 0, 'content': [{'end': 535.428, 'text': 'So generative models can actually be used to detect outliers within training distributions and use this to again improve the training process so that the resulting 
model can be better equipped to handle these edge cases and rare events.', 'start': 520.799, 'duration': 14.629}, {'end': 545.767, 'text': 'Alright, so hopefully that motivates why and how generative models can be exceptionally powerful and useful for a variety of real-world applications.', 'start': 537.159, 'duration': 8.608}], 'summary': 'Generative models can detect outliers in training data to improve model handling of rare events.', 'duration': 24.968, 'max_score': 520.799, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw520799.jpg'}, {'end': 605.89, 'src': 'embed', 'start': 572.327, 'weight': 3, 'content': [{'end': 572.907, 'text': 'And to do so.', 'start': 572.327, 'duration': 0.58}, {'end': 581.832, 'text': "I think really the best example that I've personally come across for understanding what a latent variable is is this story that is from Plato's work,", 'start': 572.907, 'duration': 8.925}, {'end': 582.413, 'text': 'The Republic.', 'start': 581.832, 'duration': 0.581}, {'end': 586.015, 'text': 'And this story is called The Myth of the Cave or The Parable of the Cave.', 'start': 582.893, 'duration': 3.122}, {'end': 587.676, 'text': 'And the story is as follows.', 'start': 586.755, 'duration': 0.921}, {'end': 596.482, 'text': 'In this myth, there are a group of prisoners, and these prisoners are constrained as part of their prison punishment to face a wall.', 'start': 588.456, 'duration': 8.026}, {'end': 605.89, 'text': "And the only things that they can see on this wall are the shadows of particular objects that are being passed in front of a fire that's behind them,", 'start': 597.043, 'duration': 8.847}], 'summary': "Plato's the republic contains a story called the myth of the cave, illustrating the concept of latent variables.", 'duration': 33.563, 'max_score': 572.327, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw572327.jpg'}, {'end': 
717.699, 'src': 'heatmap', 'start': 667.594, 'weight': 1, 'content': [{'end': 669.335, 'text': 'And this is an extremely,', 'start': 667.594, 'duration': 1.741}, {'end': 686.563, 'text': 'extremely complex problem that is very well suited to learning by neural networks because of their power to handle multidimensional data sets and to learn combinations of nonlinear functions that can approximate really complex data distributions.', 'start': 669.335, 'duration': 17.228}, {'end': 688.412, 'text': 'All right.', 'start': 687.671, 'duration': 0.741}, {'end': 699.561, 'text': "so we'll first begin by discussing a simple and foundational generative model which tries to build up this latent variable representation by actually self-encoding the input.", 'start': 688.412, 'duration': 11.149}, {'end': 702.303, 'text': 'And these models are known as autoencoders.', 'start': 700.102, 'duration': 2.201}, {'end': 711.691, 'text': "What an autoencoder is, is it's an approach for learning a lower dimensional latent space from raw data.", 'start': 703.585, 'duration': 8.106}, {'end': 717.699, 'text': 'To understand how it works, what we do is we feed in as input raw data.', 'start': 712.694, 'duration': 5.005}], 'summary': 'Neural networks handle complex data well; autoencoders learn lower dimensional spaces from raw data.', 'duration': 29.287, 'max_score': 667.594, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw667594.jpg'}, {'end': 764.563, 'src': 'embed', 'start': 737.154, 'weight': 2, 'content': [{'end': 746.241, 'text': "And so we can call this portion of the network an encoder, since it's mapping the data, x, into an encoded vector of latent variables, z.", 'start': 737.154, 'duration': 9.087}, {'end': 752.139, 'text': "So let's consider this latent space z.", 'start': 747.898, 'duration': 4.241}, {'end': 759.181, 'text': "If you've noticed, I've represented z as having a smaller size, a smaller dimensionality as the 
input x.", 'start': 752.139, 'duration': 7.042}, {'end': 764.563, 'text': 'Why would it be important to ensure the low dimensionality of this latent space?', 'start': 759.181, 'duration': 5.382}], 'summary': 'Encoder maps data x to smaller latent space z for dimensionality reduction.', 'duration': 27.409, 'max_score': 737.154, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw737154.jpg'}, {'end': 858.792, 'src': 'embed', 'start': 816.746, 'weight': 4, 'content': [{'end': 819.068, 'text': 'starting from this lower dimensional latent space.', 'start': 816.746, 'duration': 2.322}, {'end': 828.708, 'text': 'And again, this decoder portion of our autoencoder network is going to be a series of layers neural network layers, like convolutional layers.', 'start': 820.086, 'duration': 8.622}, {'end': 834.669, 'text': "that's going to then take this hidden latent vector and map it back up to the input space.", 'start': 828.708, 'duration': 5.961}, {'end': 843.471, 'text': "And we call our reconstructed output x hat because it's our prediction and it's an imperfect reconstruction of our input x.", 'start': 835.629, 'duration': 7.842}, {'end': 858.792, 'text': 'And the way that we can actually train this network is by looking at the original input x and our reconstructed output x hat and simply comparing the two and minimizing the distance between these two images.', 'start': 845.202, 'duration': 13.59}], 'summary': 'Autoencoder network uses latent space to reconstruct input data.', 'duration': 42.046, 'max_score': 816.746, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw816746.jpg'}], 'start': 520.799, 'title': 'Generative models and autoencoders', 'summary': 'Explores the application of generative models for outlier detection, training improvement, and rare event handling, as well as the use of autoencoders to learn lower dimensional latent space for data 
compression and reconstruction, with focus on minimizing mean squared error.', 'chapters': [{'end': 650.997, 'start': 520.799, 'title': 'Generative models in real-world applications', 'summary': "Discusses how generative models can be used to detect outliers within training distributions, improve the training process, and handle edge cases and rare events, and then delves into the concept of latent variable models, using the example of plato's myth of the cave to explain latent variables.", 'duration': 130.198, 'highlights': ['Generative models can be used to detect outliers within training distributions and improve the training process.', "The chapter delves into latent variable models, using the example of Plato's Myth of the Cave to explain latent variables."]}, {'end': 879.527, 'start': 652.577, 'title': 'Autoencoders: learning latent variables', 'summary': 'Discusses the use of autoencoders to learn lower dimensional latent space from raw data, enabling the compression and reconstruction of high-dimensional data through a series of neural network layers, with an emphasis on minimizing the mean squared error between the input and the reconstructed output.', 'duration': 226.95, 'highlights': ['Autoencoders are used to learn a lower dimensional latent space from raw data.', 'The importance of ensuring a low dimensionality of the latent space in autoencoders.', 'Training of the autoencoder model through the use of a decoder network and minimizing the mean squared error between the input and the reconstructed output.']}], 'duration': 358.728, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw520799.jpg', 'highlights': ['Generative models can be used to detect outliers within training distributions and improve the training process.', 'Autoencoders are used to learn a lower dimensional latent space from raw data.', 'The importance of ensuring a low dimensionality of the latent space in autoencoders.', "The 
chapter delves into latent variable models, using the example of Plato's Myth of the Cave to explain latent variables.", 'Training of the autoencoder model through the use of a decoder network and minimizing the mean squared error between the input and the reconstructed output.']}, {'end': 1613.073, 'segs': [{'end': 968.162, 'src': 'embed', 'start': 935.225, 'weight': 5, 'content': [{'end': 940.808, 'text': 'And when we constrain this latent space to a lower dimensionality,', 'start': 935.225, 'duration': 5.583}, {'end': 946.711, 'text': 'that affects the degree to which and the faithfulness to which we can actually reconstruct the input.', 'start': 940.808, 'duration': 5.903}, {'end': 954.855, 'text': "And the way you can think of this is as imposing a sort of information bottleneck during the model's training and learning process.", 'start': 947.351, 'duration': 7.504}, {'end': 959.457, 'text': "And effectively, what this bottleneck does is it's a form of compression.", 'start': 955.595, 'duration': 3.862}, {'end': 968.162, 'text': "We're taking the input data, compressing it down to a much smaller latent space, and then building back up a reconstruction.", 'start': 960.318, 'duration': 7.844}], 'summary': 'Constraining latent space affects reconstruction fidelity and imposes information bottleneck during model training.', 'duration': 32.937, 'max_score': 935.225, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw935225.jpg'}, {'end': 1037.454, 'src': 'embed', 'start': 980.429, 'weight': 0, 'content': [{'end': 989.243, 'text': 'so, in summary, these autoencoder structures use this sort of bottlenecking hidden layer to learn a compressed latent representation of the data.', 'start': 980.429, 'duration': 8.814}, {'end': 995.071, 'text': 'And we can self-supervise the training of this network by using what we call a reconstruction loss.', 'start': 989.924, 'duration': 5.147}, {'end': 1005.897, 'text': 'that 
forces the autoencoder network to encode as much information about the data as possible into a lower dimensional latent space,', 'start': 995.632, 'duration': 10.265}, {'end': 1009.759, 'text': 'while still being able to build up faithful reconstructions.', 'start': 1005.897, 'duration': 3.862}, {'end': 1018.124, 'text': 'So the way I like to think of this is automatically encoding information from the data into a lower dimensional latent space.', 'start': 1010.48, 'duration': 7.644}, {'end': 1029.645, 'text': "Let's now expand upon this idea a bit more and introduce this concept and architecture of variational autoencoders, or VAEs.", 'start': 1020.393, 'duration': 9.252}, {'end': 1037.454, 'text': 'So as we just saw, traditional autoencoders go from input to reconstructed output.', 'start': 1031.31, 'duration': 6.144}], 'summary': 'Autoencoder structures use bottlenecking hidden layer to learn compressed latent representation and self-supervise training with reconstruction loss.', 'duration': 57.025, 'max_score': 980.429, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw980429.jpg'}, {'end': 1152.132, 'src': 'heatmap', 'start': 1038.075, 'weight': 0.878, 'content': [{'end': 1043.678, 'text': 'And if we pay closer attention to this latent layer denoted here in orange,', 'start': 1038.075, 'duration': 5.603}, {'end': 1049.441, 'text': 'what you can hopefully realize is that this is just a normal layer in a neural network, just like any other layer.', 'start': 1043.678, 'duration': 5.763}, {'end': 1050.762, 'text': "It's deterministic.", 'start': 1049.861, 'duration': 0.901}, {'end': 1057.406, 'text': "If you're going to feed in a particular input to this network, you're going to get the same output, so long as the weights are the same.", 'start': 1051.302, 'duration': 6.104}, {'end': 1066.972, 'text': 'So effectively, a traditional autoencoder learns this deterministic encoding, which allows for reconstruction 
and reproduction of the input.', 'start': 1058.166, 'duration': 8.806}, {'end': 1076.257, 'text': 'In contrast, Variational autoencoders impose a stochastic or variational twist on this architecture.', 'start': 1067.912, 'duration': 8.345}, {'end': 1089.045, 'text': 'And the idea behind doing so is to generate smoother representations of the input data and improve the quality of not only of reconstructions,', 'start': 1077.058, 'duration': 11.987}, {'end': 1096.891, 'text': 'but also to actually generate new images that are similar to the input data set, but not direct reconstructions of the input data.', 'start': 1089.045, 'duration': 7.846}, {'end': 1107.701, 'text': 'And the way this is achieved is that variational autoencoders replace that deterministic layer z with a stochastic sampling operation.', 'start': 1097.671, 'duration': 10.03}, {'end': 1114.728, 'text': 'What this means is that, instead of learning the latent variables z directly for each variable,', 'start': 1108.482, 'duration': 6.246}, {'end': 1121.512, 'text': 'the variational autoencoder learns a mean and a variance associated with that latent variable.', 'start': 1115.569, 'duration': 5.943}, {'end': 1128.496, 'text': 'And what those means and variances do is that they parametrize a probability distribution for that latent variable.', 'start': 1122.172, 'duration': 6.324}, {'end': 1143.849, 'text': "So what we've done in going from an autoencoder to a variational autoencoder is going from a vector of latent variable z to learning a vector of means mu and a vector of variances sigma,", 'start': 1129.403, 'duration': 14.446}, {'end': 1152.132, 'text': 'sigma squared that parametrize these variables and define probability distributions for each of our latent variables.', 'start': 1143.849, 'duration': 8.283}], 'summary': 'Autoencoders use deterministic encoding, while variational autoencoders introduce stochasticity for smoother representations and new image generation.', 'duration': 114.057, 
'max_score': 1038.075, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1038075.jpg'}, {'end': 1089.045, 'src': 'embed', 'start': 1058.166, 'weight': 2, 'content': [{'end': 1066.972, 'text': 'So effectively, a traditional autoencoder learns this deterministic encoding, which allows for reconstruction and reproduction of the input.', 'start': 1058.166, 'duration': 8.806}, {'end': 1076.257, 'text': 'In contrast, Variational autoencoders impose a stochastic or variational twist on this architecture.', 'start': 1067.912, 'duration': 8.345}, {'end': 1089.045, 'text': 'And the idea behind doing so is to generate smoother representations of the input data and improve the quality of not only of reconstructions,', 'start': 1077.058, 'duration': 11.987}], 'summary': 'Autoencoders learn deterministic encoding, while variational autoencoders impose a stochastic twist to generate smoother representations.', 'duration': 30.879, 'max_score': 1058.166, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1058166.jpg'}, {'end': 1264.369, 'src': 'heatmap', 'start': 1189.816, 'weight': 0.824, 'content': [{'end': 1190.096, 'text': 'All right.', 'start': 1189.816, 'duration': 0.28}, {'end': 1198.183, 'text': "So now, because we've introduced this sampling operation, this stochasticity, into our model.", 'start': 1191.457, 'duration': 6.726}, {'end': 1205.316, 'text': 'What this means for the actual computation and learning process of the network, the encoder and decoder,', 'start': 1199.332, 'duration': 5.984}, {'end': 1208.237, 'text': "is that they're now probabilistic in their nature.", 'start': 1205.316, 'duration': 2.921}, {'end': 1219.144, 'text': 'And the way you can think of this is that our encoder is going to be trying to learn a probability distribution of the latent space z,', 'start': 1209.338, 'duration': 9.806}, {'end': 1220.465, 'text': 'given the input data x.', 
'start': 1219.144, 'duration': 1.321}, {'end': 1230.902, 'text': 'while the decoder is going to take that learned latent representation and compute a new probability distribution of the input x,', 'start': 1221.553, 'duration': 9.349}, {'end': 1233.685, 'text': 'given that latent distribution z.', 'start': 1230.902, 'duration': 2.783}, {'end': 1241.093, 'text': 'And these networks the encoder, the decoder are going to be defined by separate sets of weights phi and theta,', 'start': 1233.685, 'duration': 7.408}, {'end': 1252.38, 'text': "And the way that we can train this variational autoencoder is by defining a loss function that's going to be a function of the data x,", 'start': 1242.372, 'duration': 10.008}, {'end': 1254.741, 'text': 'as well as these sets of weights, phi and theta.', 'start': 1252.38, 'duration': 2.361}, {'end': 1264.369, 'text': "And what's key to how VAEs can be optimized is that this loss function is now comprised of two terms instead of just one.", 'start': 1255.762, 'duration': 8.607}], 'summary': 'Introducing stochasticity into the model makes the encoder and decoder probabilistic, leading to a loss function with two terms instead of one.', 'duration': 74.553, 'max_score': 1189.816, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1189816.jpg'}, {'end': 1413.891, 'src': 'heatmap', 'start': 1289.302, 'weight': 0.73, 'content': [{'end': 1300.235, 'text': "let's first emphasize again that our overall loss function is going to be defined and taken with respect to the sets of weights of the encoder and decoder and the input x.", 'start': 1289.302, 'duration': 10.933}, {'end': 1310.531, 'text': 'The reconstruction loss is very similar to before, right? 
And you can think of it as being driven by a log likelihood function.', 'start': 1302.227, 'duration': 8.304}, {'end': 1316.153, 'text': 'For example, for image data, the mean squared error between the input and the output.', 'start': 1310.911, 'duration': 5.242}, {'end': 1319.875, 'text': 'And we can self-supervise the reconstruction loss,', 'start': 1316.733, 'duration': 3.142}, {'end': 1327.718, 'text': 'just as before to force the latent space to learn and represent faithful representations of the input data,', 'start': 1319.875, 'duration': 7.843}, {'end': 1329.899, 'text': 'ultimately resulting in faithful reconstructions.', 'start': 1327.718, 'duration': 2.181}, {'end': 1337.998, 'text': 'The new term here, the regularization term, is a bit more interesting and completely new at this stage,', 'start': 1331.394, 'duration': 6.604}, {'end': 1341.681, 'text': "so we're going to dive in and discuss it further in a bit more detail.", 'start': 1337.998, 'duration': 3.683}, {'end': 1353.749, 'text': "So our probability distribution that's going to be computed by our encoder, q phi of z of x, is a distribution on the latent space z given the data x.", 'start': 1342.942, 'duration': 10.807}, {'end': 1365.504, 'text': "And what regularization enforces is that, as a part of this learning process, we're going to place a prior on the latent space z,", 'start': 1354.795, 'duration': 10.709}, {'end': 1372.849, 'text': 'which is effectively some initial hypothesis about what we expect the distributions of z to actually look like.', 'start': 1365.504, 'duration': 7.345}, {'end': 1376.552, 'text': 'And by imposing this regularization term,', 'start': 1373.69, 'duration': 2.862}, {'end': 1384.138, 'text': "what we can achieve is that the model will try to enforce the z's that it learns to follow this prior distribution.", 'start': 1376.552, 'duration': 7.586}, {'end': 1392.064, 'text': "And we're going to denote this prior as p This term here, d, is the regularization term.", 
'start': 1384.799, 'duration': 7.265}, {'end': 1404.766, 'text': "And what it's going to do is it's going to try to enforce a minimization of the divergence or the difference between what the encoder is trying to infer,", 'start': 1392.704, 'duration': 12.062}, {'end': 1413.891, 'text': "the probability distribution of z given x, and that prior that we're going to place on the latent variables p.", 'start': 1404.766, 'duration': 9.125}], 'summary': 'Loss function defined with respect to encoder and decoder weights, emphasizing reconstruction and regularization terms.', 'duration': 124.589, 'max_score': 1289.302, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1289302.jpg'}, {'end': 1372.849, 'src': 'embed', 'start': 1342.942, 'weight': 4, 'content': [{'end': 1353.749, 'text': "So our probability distribution that's going to be computed by our encoder, q phi of z of x, is a distribution on the latent space z given the data x.", 'start': 1342.942, 'duration': 10.807}, {'end': 1365.504, 'text': "And what regularization enforces is that, as a part of this learning process, we're going to place a prior on the latent space z,", 'start': 1354.795, 'duration': 10.709}, {'end': 1372.849, 'text': 'which is effectively some initial hypothesis about what we expect the distributions of z to actually look like.', 'start': 1365.504, 'duration': 7.345}], 'summary': 'The encoder computes q phi of z of x, a distribution on the latent space z given the data x, with regularization enforcing a prior on the latent space z.', 'duration': 29.907, 'max_score': 1342.942, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1342942.jpg'}, {'end': 1442.824, 'src': 'embed', 'start': 1418.817, 'weight': 3, 'content': [{'end': 1431.233, 'text': "we can try to keep the network from overfitting on certain parts of the latent space by enforcing the fact that we want to encourage the latent 
variables to adopt a distribution that's similar to our prior.", 'start': 1418.817, 'duration': 12.416}, {'end': 1434.878, 'text': "So we're going to go through now you know,", 'start': 1432.116, 'duration': 2.762}, {'end': 1442.824, 'text': 'both the mathematical basis for this regularization term as well as a really intuitive walkthrough of what regularization achieves,', 'start': 1434.878, 'duration': 7.946}], 'summary': 'Regularization encourages latent variables to adopt a distribution similar to the prior.', 'duration': 24.007, 'max_score': 1418.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1418817.jpg'}], 'start': 880.902, 'title': 'Learning latent representations with variational autoencoders', 'summary': 'Explains how autoencoders and variational autoencoders learn compressed latent representations through reconstruction and regularization losses, and the transformative idea of unsupervised learning.', 'chapters': [{'end': 1057.406, 'start': 880.902, 'title': 'Autoencoder: learning latent representations', 'summary': 'Explains how autoencoders use a reconstruction loss to learn compressed latent representations of data, self-supervised by forcing the network to encode maximum information into a lower dimensional space, impacting the quality of reconstructions. 
it introduces the concept of variational autoencoders (vaes) and emphasizes the transformative idea of unsupervised learning.', 'duration': 176.504, 'highlights': ['Autoencoder structures use a bottlenecking hidden layer to learn a compressed latent representation of the data, self-supervised by a reconstruction loss, forcing the network to encode as much information as possible into a lower dimensional latent space, impacting the quality of reconstructions.', "The lower the dimensionality of the latent space, the poorer the quality of reconstruction, illustrating the impact of imposing an information bottleneck during the model's training and learning process.", 'Variational autoencoders (VAEs) introduce the concept of a probabilistic latent space, enabling the generation of new data by sampling from the learned distribution, thus emphasizing the transformative idea of unsupervised learning.']}, {'end': 1613.073, 'start': 1058.166, 'title': 'Variational autoencoders: probabilistic encoding', 'summary': 'Explains how variational autoencoders use stochastic sampling to generate smoother representations of input data, introduce a regularization loss to encourage a prior on latent space, and enforce normal gaussian priors on latent variables to evenly distribute encodings.', 'duration': 554.907, 'highlights': ['Variational autoencoders use stochastic sampling to generate smoother representations of input data and improve the quality of reconstructions and new image generation.', 'Introducing a regularization loss in VAEs enforces a prior on the latent space to encourage even distribution of encodings and prevent overfitting.', 'The regularization term in VAEs uses the Kullback-Leibler divergence to encourage encodings to follow the chosen prior distribution.']}], 'duration': 732.171, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw880902.jpg', 'highlights': ['Variational autoencoders (VAEs) introduce the concept
of a probabilistic latent space, enabling the generation of new data by sampling from the learned distribution, thus emphasizing the transformative idea of unsupervised learning.', 'Autoencoder structures use a bottlenecking hidden layer to learn a compressed latent representation of the data, self-supervised by a reconstruction loss, forcing the network to encode as much information as possible into a lower dimensional latent space, impacting the quality of reconstructions.', 'Variational autoencoders use stochastic sampling to generate smoother representations of input data and improve the quality of reconstructions and new image generation.', 'Introducing a regularization loss in VAEs enforces a prior on the latent space to encourage even distribution of encodings and prevent overfitting.', 'The regularization term in VAEs uses the Kullback-Leibler divergence to encourage encodings to follow the chosen prior distribution.', "The lower the dimensionality of the latent space, the poorer the quality of reconstruction, illustrating the impact of imposing an information bottleneck during the model's training and learning process."]}, {'end': 2089.739, 'segs': [{'end': 1674.741, 'src': 'embed', 'start': 1642.028, 'weight': 2, 'content': [{'end': 1646.792, 'text': "Alright, so to do this, let's consider the following question.", 'start': 1642.028, 'duration': 4.764}, {'end': 1651.336, 'text': 'What properties do we want this to achieve from regularization?', 'start': 1647.492, 'duration': 3.844}, {'end': 1655.539, 'text': 'Why are we actually regularizing our network in the first place?', 'start': 1651.816, 'duration': 3.723}, {'end': 1665.557, 'text': 'The first key property that we want for a generative model like a VAE is what I like to think of as continuity,', 'start': 1657.093, 'duration': 8.464}, {'end': 1674.741, 'text': 'which means that if there are points that are represented closely in the latent space, they should also result in similar reconstructions,',
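The VAE loss discussed in these segments (a reconstruction term plus a KL regularization term against a standard normal prior) can be sketched numerically. This is an illustrative NumPy fragment, not code from the lecture; the function names `vae_loss` and `kl_to_standard_normal` are my own, and the `beta` weight anticipates the beta-VAE hyperparameter covered later in the lecture (`beta = 1` recovers the standard VAE).

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL divergence between the encoder's diagonal Gaussian
    # q(z|x) = N(mu, sigma^2) and the standard normal prior p(z) = N(0, I):
    # this is the regularization term D(q(z|x) || p(z)) described above.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    # Reconstruction term (mean squared error standing in for the
    # log-likelihood) plus the beta-weighted KL regularization term.
    reconstruction = np.sum((x - x_hat) ** 2)
    return reconstruction + beta * kl_to_standard_normal(mu, log_var)

x = np.array([0.5, -0.2, 0.1])
mu, log_var = np.zeros(3), np.zeros(3)  # encodings already match the prior

# A perfect reconstruction with prior-matching encodings incurs zero loss ...
print(vae_loss(x, x, mu, log_var))        # -> 0.0

# ... while divergent means are penalized by the regularization term,
# which is how the normal prior centers means and regularizes variances.
print(vae_loss(x, x, mu + 5.0, log_var))  # -> 37.5
```

Note how a mean pushed far from 0 is penalized even when the reconstruction is perfect: this is the mechanism that discourages the pointed, divergent latent distributions described in the following chapter.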
'start': 1665.557, 'duration': 9.184}], 'summary': 'Regularization in vae aims to achieve continuity in latent space for similar reconstructions.', 'duration': 32.713, 'max_score': 1642.028, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1642028.jpg'}, {'end': 1957.359, 'src': 'embed', 'start': 1930.02, 'weight': 1, 'content': [{'end': 1936.463, 'text': 'And you can also have means that are totally divergent from each other, which result in discontinuities in the latent space.', 'start': 1930.02, 'duration': 6.443}, {'end': 1941.566, 'text': 'And this can occur while still trying to optimize that reconstruction loss.', 'start': 1937.003, 'duration': 4.563}, {'end': 1947.95, 'text': 'direct consequence of not regularizing.', 'start': 1942.745, 'duration': 5.205}, {'end': 1957.359, 'text': 'In order to overcome these problems, we need to regularize the variance and the mean of these distributions that are being returned by the encoder.', 'start': 1948.15, 'duration': 9.209}], 'summary': 'Regularizing variance and mean of divergent means to avoid latent space discontinuities.', 'duration': 27.339, 'max_score': 1930.02, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1930020.jpg'}, {'end': 2021.9, 'src': 'embed', 'start': 1994.278, 'weight': 0, 'content': [{'end': 2005.96, 'text': 'And so this will ensure a smoothness and a regularity and an overlap in the latent space which will be very effective in helping us achieve these properties of continuity and completeness.', 'start': 1994.278, 'duration': 11.682}, {'end': 2010.262, 'text': 'centering the means, regularizing the variances.', 'start': 2007.816, 'duration': 2.446}, {'end': 2021.9, 'text': 'So the regularization via this normal prior, by centering each of these latent variables, regularizing their variances,', 'start': 2012.026, 'duration': 9.874}], 'summary': 'Regularization ensures smoothness, 
regularity, and overlap in latent space for continuity and completeness.', 'duration': 27.622, 'max_score': 1994.278, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1994278.jpg'}], 'start': 1613.073, 'title': 'Regularizing vae for continuity and completeness', 'summary': 'Emphasizes regularizing variational autoencoders (vaes) to achieve continuity and completeness in the latent space, highlighting the importance of properties such as smoothness, overlap, and trade-off between regularization and reconstruction quality.', 'chapters': [{'end': 1665.557, 'start': 1613.073, 'title': 'Regularizing vae for continuity', 'summary': 'Discusses the importance of regularizing variational autoencoder (vae) by considering the properties and intuition behind regularization, emphasizing the need for continuity in the generative model.', 'duration': 52.484, 'highlights': ['The need for regularization and the selection of a normal prior in VAE is discussed to achieve continuity in the generative model.', 'The chapter emphasizes the importance of continuity as a key property in a generative model like VAE.']}, {'end': 1849.772, 'start': 1665.557, 'title': 'Generative model regularization', 'summary': 'Discusses the key properties of continuity and completeness in the latent space of a generative model, emphasizing the importance of regularization in ensuring that points close in the latent space are similarly and meaningfully decoded.', 'duration': 184.215, 'highlights': ['The importance of continuity and completeness in the latent space of a generative model', 'Consequences of not regularizing the model', 'Desired properties for generative models', 'Role of normal prior in achieving regularization']}, {'end': 2089.739, 'start': 1851.514, 'title': 'Regularizing vaes with normal prior', 'summary': 'Discusses the importance of regularizing variational autoencoders (vaes) with a normal prior to achieve continuity and 
completeness in the latent space, where mean and variance are centered and regularized, enforcing smoothness and overlap, while avoiding pointed distributions and discontinuities, with a trade-off between regularization and reconstruction quality.', 'duration': 238.225, 'highlights': ['The normal prior encourages learned latent variable distributions to overlap in latent space, enforcing a centered mean and regularized variances for continuity and completeness.', 'Without regularization, VAEs may result in pointed distributions with very small variances and divergent means, leading to discontinuities in the latent space while optimizing the reconstruction loss.', 'Regularizing the variance and mean of the distributions returned by the encoder is crucial to overcome problems in VAEs, preventing discontinuities and pointed distributions.']}], 'duration': 476.666, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw1613073.jpg', 'highlights': ['The normal prior encourages learned latent variable distributions to overlap in latent space, enforcing a centered mean and regularized variances for continuity and completeness.', 'Regularizing the variance and mean of the distributions returned by the encoder is crucial to overcome problems in VAEs, preventing discontinuities and pointed distributions.', 'The need for regularization and the selection of a normal prior in VAE is discussed to achieve continuity in the generative model.', 'The chapter emphasizes the importance of continuity as a key property in a generative model like VAE.', 'Without regularization, VAEs may result in pointed distributions with very small variances and divergent means, leading to discontinuities in the latent space while optimizing the reconstruction loss.']}, {'end': 2484.868, 'segs': [{'end': 2156.376, 'src': 'embed', 'start': 2090.539, 'weight': 3, 'content': [{'end': 2098.323, 'text': 'These are all the components that define a forward 
pass through the network going from input to encoding to decoded reconstruction.', 'start': 2090.539, 'duration': 7.784}, {'end': 2103.03, 'text': "But we're still missing a critical step in putting the whole picture together.", 'start': 2099.608, 'duration': 3.422}, {'end': 2105.371, 'text': "And that's a backpropagation.", 'start': 2103.65, 'duration': 1.721}, {'end': 2112.495, 'text': "And the key here is that, because of this fact that we've introduced this stochastic sampling layer,", 'start': 2106.092, 'duration': 6.403}, {'end': 2119.699, 'text': "we now have a problem where we can't backpropagate gradients through a sampling layer that has this element of stochasticity.", 'start': 2112.495, 'duration': 7.204}, {'end': 2128.784, 'text': 'Backpropagation requires deterministic nodes, deterministic layers for which we can iteratively apply the chain rule to optimize gradients.', 'start': 2120.94, 'duration': 7.844}, {'end': 2132.372, 'text': 'optimize the loss via gradient descent.', 'start': 2130.091, 'duration': 2.281}, {'end': 2134.153, 'text': 'All right.', 'start': 2133.853, 'duration': 0.3}, {'end': 2143.779, 'text': 'VAEs introduced sort of a breakthrough idea that solved this issue of not being able to backpropagate through a sampling layer.', 'start': 2135.734, 'duration': 8.045}, {'end': 2153.064, 'text': 'And the key idea was to actually subtly re-parameterize the sampling operation such that the network could be trained completely end-to-end.', 'start': 2144.559, 'duration': 8.505}, {'end': 2156.376, 'text': 'So, as we already learned,', 'start': 2154.335, 'duration': 2.041}], 'summary': 'Vaes introduce re-parameterization to enable backpropagation through sampling layer.', 'duration': 65.837, 'max_score': 2090.539, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2090539.jpg'}, {'end': 2320.388, 'src': 'heatmap', 'start': 2257.006, 'weight': 0, 'content': [{'end': 2265.713, 'text': 'as well 
as this noise factor, epsilon, such that when we want to do back propagation through the network to update,', 'start': 2257.006, 'duration': 8.707}, {'end': 2273.717, 'text': 'we can directly backpropagate through z defined by mu and sigma squared, because this epsilon value is just taken as a constant.', 'start': 2266.454, 'duration': 7.263}, {'end': 2275.758, 'text': "It's reparameterized elsewhere.", 'start': 2274.077, 'duration': 1.681}, {'end': 2281.101, 'text': 'And this is a very, very powerful trick, the reparameterization trick,', 'start': 2276.939, 'duration': 4.162}, {'end': 2292.746, 'text': 'because it enables us to train variational autoencoders end-to-end by backpropagating with respect to z and with respect to the actual weights of the encoder network.', 'start': 2281.101, 'duration': 11.645}, {'end': 2294.867, 'text': 'All right.', 'start': 2294.607, 'duration': 0.26}, {'end': 2308.361, 'text': 'One side effect and one consequence of imposing these distributional priors on the latent variable is that we can actually sample from these latent variables and individually tune them,', 'start': 2296.394, 'duration': 11.967}, {'end': 2312.483, 'text': 'while keeping all of the other variables fixed.', 'start': 2308.361, 'duration': 4.122}, {'end': 2320.388, 'text': 'And what you can do is you can tune the value of a particular latent variable and run the decoder each time that variable is changed,', 'start': 2312.503, 'duration': 7.885}], 'summary': 'Reparameterization trick enables end-to-end training of variational autoencoders for latent variable tuning.', 'duration': 77.795, 'max_score': 2257.006, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2257006.jpg'}, {'end': 2434.181, 'src': 'embed', 'start': 2405.879, 'weight': 2, 'content': [{'end': 2408.481, 'text': 'So, if we consider the loss of a standard VAE,', 'start': 2405.879, 'duration': 2.602}, {'end': 2416.106, 'text': 'again we have this 
reconstruction term defined by a log likelihood and a regularization term defined by the KL divergence.', 'start': 2408.481, 'duration': 7.625}, {'end': 2423.732, 'text': 'Beta VAEs introduce a new hyperparameter, beta, which controls the strength of this regularization term.', 'start': 2416.906, 'duration': 6.826}, {'end': 2431.699, 'text': "And it's been shown mathematically that by increasing beta, the effect is to place constraints on the latent encoding,", 'start': 2424.392, 'duration': 7.307}, {'end': 2434.181, 'text': 'such as to encourage disentanglement.', 'start': 2431.699, 'duration': 2.482}], 'summary': 'Beta vae introduces a hyperparameter beta to control the strength of the regularization term, encouraging disentanglement.', 'duration': 28.302, 'max_score': 2405.879, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2405879.jpg'}], 'start': 2090.539, 'title': 'Variational autoencoders (vaes) and their training techniques', 'summary': 'Discusses the challenges of backpropagating through a stochastic sampling layer in vaes and introduces the breakthrough re-parameterization trick, enabling end-to-end training. it explains representing the latent vector using fixed mu and sigma scaled by random constants drawn from a prior distribution. 
the use of distributional priors on latent variables is explored, demonstrating how individual tuning can lead to disentangled and semantically meaningful representations, with examples of head pose and smile variations, and the use of beta vaes to achieve uncorrelated latent features.', 'chapters': [{'end': 2156.376, 'start': 2090.539, 'title': 'Vaes and backpropagation', 'summary': 'Discusses the challenge of backpropagating through a stochastic sampling layer and introduces the breakthrough idea of re-parameterizing the sampling operation in vaes to enable end-to-end training.', 'duration': 65.837, 'highlights': ['VAEs introduced a breakthrough idea to enable backpropagation through a sampling layer by re-parameterizing the sampling operation for end-to-end training.', 'Backpropagation requires deterministic nodes and layers, which poses a problem when dealing with stochastic sampling layers.', "The chapter emphasizes the critical step of backpropagation in completing the entire picture of the network's forward pass."]}, {'end': 2294.867, 'start': 2156.376, 'title': 'Reparameterization trick in vae', 'summary': 'Explains the reparameterization trick in vae, which involves representing the latent vector z as a sum of fixed mu and sigma, scaled by random constants drawn from a prior distribution, enabling end-to-end training of variational autoencoders.', 'duration': 138.491, 'highlights': ['The reparameterization trick involves representing the latent vector z as a sum of fixed mu and sigma, scaled by random constants drawn from a prior distribution.', 'The stochastic sampling node Z is reparameterized to include a noise factor, epsilon, as a constant, allowing direct backpropagation through z defined by mu and sigma squared.', 'The reparameterization trick enables end-to-end training of variational autoencoders by backpropagating with respect to z and the actual weights of the encoder network.']}, {'end': 2484.868, 'start': 2296.394, 'title': 'Optimizing vaes 
for disentangled latent variables', 'summary': 'Discusses the use of distributional priors on latent variables in variational autoencoders (vaes), demonstrating how individual tuning of latent variables can lead to disentangled and semantically meaningful representations, with examples of head pose and smile variations and the use of beta vaes to achieve uncorrelated latent features.', 'duration': 188.474, 'highlights': ['Beta VAEs introduce a new hyperparameter, beta, which controls the strength of the regularization term, and increasing beta encourages disentanglement, leading to relatively constant smile while perturbing head rotation.', 'The ability to tune individual latent variables and run the decoder to generate new reconstructed output enables the encoding of different latent features that can be interpreted, such as variations in head pose and smile.', 'Imposing distributional priors on latent variables in VAEs allows sampling and individual tuning of latent variables, aiming for uncorrelated latent variables to achieve the richest and most compact latent representation.']}], 'duration': 394.329, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2090539.jpg', 'highlights': ['The reparameterization trick enables end-to-end training of variational autoencoders by backpropagating with respect to z and the actual weights of the encoder network.', 'Imposing distributional priors on latent variables in VAEs allows sampling and individual tuning of latent variables, aiming for uncorrelated latent variables to achieve the richest and most compact latent representation.', 'Beta VAEs introduce a new hyperparameter, beta, which controls the strength of the regularization term, and increasing beta encourages disentanglement, leading to relatively constant smile while perturbing head rotation.', 'VAEs introduced a breakthrough idea to enable backpropagation through a sampling layer by re-parameterizing the 
sampling operation for end-to-end training.', 'The ability to tune individual latent variables and run the decoder to generate new reconstructed output enables the encoding of different latent features that can be interpreted, such as variations in head pose and smile.', 'Backpropagation requires deterministic nodes and layers, which poses a problem when dealing with stochastic sampling layers.', "The chapter emphasizes the critical step of backpropagation in completing the entire picture of the network's forward pass.", 'The stochastic sampling node Z is reparameterized to include a noise factor, epsilon, as a constant, allowing direct backpropagation through z defined by mu and sigma squared.', 'The reparameterization trick involves representing the latent vector z as a sum of fixed mu and sigma, scaled by random constants drawn from a prior distribution.']}, {'end': 3161.219, 'segs': [{'end': 2526.9, 'src': 'embed', 'start': 2502.915, 'weight': 0, 'content': [{'end': 2509.482, 'text': 'that can be used to achieve automatic debiasing of facial classification systems, facial detection systems.', 'start': 2502.915, 'duration': 6.567}, {'end': 2519.131, 'text': 'And the power and the idea of this approach is to build up a representation, a learned latent distribution of face data,', 'start': 2510.042, 'duration': 9.089}, {'end': 2526.9, 'text': 'and use this to identify regions of that latent space that are going to be overrepresented or underrepresented.', 'start': 2520.052, 'duration': 6.848}], 'summary': 'Automatic debiasing of facial classification systems using learned latent distribution.', 'duration': 23.985, 'max_score': 2502.915, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2502915.jpg'}, {'end': 2676.711, 'src': 'heatmap', 'start': 2598.763, 'weight': 0.936, 'content': [{'end': 2603.767, 'text': 'Reconstruction of the data input allows for unsupervised learning without labels.', 'start': 2598.763, 
'duration': 5.004}, {'end': 2610.566, 'text': 'We can use the reparameterization trick to train VAEs end-to-end.', 'start': 2606.445, 'duration': 4.121}, {'end': 2617.289, 'text': 'We can take hidden latent variables, perturb them to interpret their content and their meaning.', 'start': 2612.007, 'duration': 5.282}, {'end': 2621.911, 'text': 'And finally, we can sample from the latent space to generate new examples.', 'start': 2618.169, 'duration': 3.742}, {'end': 2633.555, 'text': 'But what if we wanted to focus on generating samples and synthetic samples that were as faithful to a data distribution generally as possible?', 'start': 2623.571, 'duration': 9.984}, {'end': 2641.68, 'text': "To understand how we can achieve this, we're going to transition to discuss a new type of generative model called a generative adversarial network,", 'start': 2634.454, 'duration': 7.226}, {'end': 2643.101, 'text': 'or GAN, for short.', 'start': 2641.68, 'duration': 1.421}, {'end': 2654.69, 'text': "The idea here is that we don't want to explicitly model the density or the distribution underlying some data,", 'start': 2645.223, 'duration': 9.467}, {'end': 2661.716, 'text': 'but instead just learn a representation that can be successful in generating new instances that are similar to the data.', 'start': 2654.69, 'duration': 7.026}, {'end': 2672.985, 'text': 'which means that we want to optimize to sample from a very, very complex distribution which cannot be learned and modeled directly.', 'start': 2663.151, 'duration': 9.834}, {'end': 2676.711, 'text': "Instead, we're going to have to build up some approximation of this distribution.", 'start': 2673.366, 'duration': 3.345}], 'summary': 'Using reparameterization trick to train vaes and discussing generative adversarial network (gan) for generating new instances.', 'duration': 77.948, 'max_score': 2598.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2598763.jpg'}, 
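The reparameterization trick summarized in these segments can be sketched as follows: a minimal NumPy illustration (not lecture code; the function name `reparameterize` is my own) in which z = mu + sigma * epsilon, with the noise epsilon sampled outside the deterministic path, so that gradients could flow through mu and sigma while the samples still follow N(mu, sigma^2).

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * epsilon with epsilon ~ N(0, 1). The stochasticity is
    # pushed into the constant epsilon, so mu and sigma (the encoder outputs)
    # remain on a deterministic path that backpropagation can traverse.
    epsilon = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * epsilon

mu = np.array([2.0, -1.0])
log_var = np.zeros(2)  # sigma = 1 in both latent dimensions

# Repeated sampling recovers the intended distribution N(mu, sigma^2)
z = np.stack([reparameterize(mu, log_var, rng) for _ in range(100_000)])
print(z.mean(axis=0))  # close to [2, -1]
print(z.std(axis=0))   # close to [1, 1]
```

In an actual VAE the same expression is written inside the training graph (e.g. with a deep learning framework's tensors instead of NumPy arrays), which is what makes end-to-end training possible.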
{'end': 2661.716, 'src': 'embed', 'start': 2634.454, 'weight': 1, 'content': [{'end': 2641.68, 'text': "To understand how we can achieve this, we're going to transition to discuss a new type of generative model called a generative adversarial network,", 'start': 2634.454, 'duration': 7.226}, {'end': 2643.101, 'text': 'or GAN, for short.', 'start': 2641.68, 'duration': 1.421}, {'end': 2654.69, 'text': "The idea here is that we don't want to explicitly model the density or the distribution underlying some data,", 'start': 2645.223, 'duration': 9.467}, {'end': 2661.716, 'text': 'but instead just learn a representation that can be successful in generating new instances that are similar to the data.', 'start': 2654.69, 'duration': 7.026}], 'summary': 'Exploring gans for generating similar data instances.', 'duration': 27.262, 'max_score': 2634.454, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2634454.jpg'}, {'end': 2925.403, 'src': 'embed', 'start': 2886.249, 'weight': 2, 'content': [{'end': 2893.457, 'text': "At this point, it's going to be really, really hard for the discriminator to effectively distinguish between what is real and what is fake,", 'start': 2886.249, 'duration': 7.208}, {'end': 2899.144, 'text': 'while the generator is going to continue to try to create fake data instances to fool the discriminator.', 'start': 2893.457, 'duration': 5.687}, {'end': 2906.733, 'text': 'And this is really the key intuition behind how these two components of GANs are essentially competing with each other.', 'start': 2899.965, 'duration': 6.768}, {'end': 2909.311, 'text': 'All right.', 'start': 2908.49, 'duration': 0.821}, {'end': 2912.253, 'text': 'so, to summarize how we train GANs,', 'start': 2909.311, 'duration': 2.942}, {'end': 2922.921, 'text': 'the generator is going to try to synthesize fake instances to fool a discriminator which is going to be trained to identify the synthesized instances and discriminate 
these as fake.', 'start': 2912.253, 'duration': 10.668}, {'end': 2925.403, 'text': 'To actually train.', 'start': 2924.182, 'duration': 1.221}], 'summary': "Gans involve a competition between a generator creating fake data and a discriminator identifying it as fake. the discriminator's challenge is to distinguish between real and fake data.", 'duration': 39.154, 'max_score': 2886.249, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2886249.jpg'}, {'end': 3061.376, 'src': 'heatmap', 'start': 2950.237, 'weight': 4, 'content': [{'end': 2953.46, 'text': "So let's go through how the loss function for a GAN breaks down.", 'start': 2950.237, 'duration': 3.223}, {'end': 2965.3, 'text': "The loss term for a GAN is based on that familiar cross entropy loss, and it's going to now be defined between the true and generated distributions.", 'start': 2955.477, 'duration': 9.823}, {'end': 2970.262, 'text': "So we're first going to consider the loss from the perspective of the discriminator.", 'start': 2965.94, 'duration': 4.322}, {'end': 2977.984, 'text': 'We want to try to maximize the probability that the fake data is identified as fake.', 'start': 2972.242, 'duration': 5.742}, {'end': 2983.371, 'text': "And so to break this down, here g defines the generator's output.", 'start': 2978.888, 'duration': 4.483}, {'end': 2991.257, 'text': "And so d is the discriminator's estimate of the probability that a fake instance is actually fake.", 'start': 2983.732, 'duration': 7.525}, {'end': 2997.702, 'text': "d is the discriminator's estimate of the probability that a real instance is fake.", 'start': 2992.658, 'duration': 5.044}, {'end': 3003.466, 'text': 'So 1 minus d is its probability estimate that a real instance is real.', 'start': 2998.122, 'duration': 5.344}, {'end': 3009.633, 'text': 'So together, from the point of view of the discriminator, we want to maximize this probability.', 'start': 3004.608, 'duration': 5.025}, 
{'end': 3014.859, 'text': 'Maximize probability fake is fake, maximize the estimate of probability real is real.', 'start': 3010.334, 'duration': 4.525}, {'end': 3018.635, 'text': "Now let's turn our attention to the generator.", 'start': 3016.573, 'duration': 2.062}, {'end': 3024.279, 'text': 'Remember that the generator is taking random noise and generating an instance.', 'start': 3019.515, 'duration': 4.764}, {'end': 3036.028, 'text': "It cannot directly affect the term d which shows up in the loss, because d is solely based on the discriminator's operation on the real data.", 'start': 3024.899, 'duration': 11.129}, {'end': 3037.689, 'text': 'So for the generator.', 'start': 3036.528, 'duration': 1.161}, {'end': 3045.175, 'text': "the generator is going to have the adversarial objective to the discriminator, which means it's going to try to minimize this term.", 'start': 3037.689, 'duration': 7.486}, {'end': 3061.376, 'text': 'effectively minimizing the probability that the discriminator can distinguish its generated data as fake, d of g of z.', 'start': 3046.829, 'duration': 14.547}], 'summary': 'Loss function of gan maximizes probability fake is fake, minimize probability real is fake.', 'duration': 27.747, 'max_score': 2950.237, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2950237.jpg'}, {'end': 3130.664, 'src': 'heatmap', 'start': 3078.725, 'weight': 0.745, 'content': [{'end': 3084.79, 'text': 'the discriminator is going to be as best as it possibly can be at discriminating real versus fake.', 'start': 3078.725, 'duration': 6.065}, {'end': 3091.492, 'text': 'Therefore, the ultimate goal of the generator is to synthesize fake instances that fool the best discriminator.', 'start': 3085.47, 'duration': 6.022}, {'end': 3099.515, 'text': 'And this is all put together in this min-max objective function, which has these two components optimized adversarially.', 'start': 3091.972, 'duration': 7.543}, 
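The GAN objective walked through in these segments can be sketched with scalar probabilities standing in for the discriminator's outputs D(x) and D(G(z)). This is an illustrative fragment of the cross-entropy formulation the lecture describes, not lecture code; the function names are my own, and both objectives are written as losses to minimize.

```python
import math

def discriminator_loss(d_real, d_fake):
    # The discriminator wants to maximize log D(x) + log(1 - D(G(z))):
    # probability real is real, plus probability fake is fake. Negating
    # turns that maximization into a loss to minimize.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # The generator's adversarial objective: minimize log(1 - D(G(z))),
    # i.e., push the discriminator's estimate D(G(z)) toward "real".
    return math.log(1.0 - d_fake)

# Early in training the discriminator easily spots fakes (D(G(z)) near 0),
# so its loss is small while the generator's objective is far from optimal.
print(discriminator_loss(d_real=0.9, d_fake=0.1), generator_loss(d_fake=0.1))

# At the min-max equilibrium the discriminator is maximally confused (D = 0.5).
print(discriminator_loss(d_real=0.5, d_fake=0.5), generator_loss(d_fake=0.5))
```

As the printed values suggest, improving the generator (raising D(G(z))) lowers its loss while raising the discriminator's, which is the competition between the two networks that the lecture describes.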
{'end': 3105.437, 'text': 'And then, after training, we can actually use the generator network, which is now fully trained,', 'start': 3100.395, 'duration': 5.042}, {'end': 3108.958, 'text': 'to produce new data instances that have never been seen before.', 'start': 3105.437, 'duration': 3.521}, {'end': 3111.278, 'text': "So we're going to focus on that now.", 'start': 3109.858, 'duration': 1.42}, {'end': 3118.84, 'text': 'And what is really cool is that when the trained generator of a GAN synthesizes new instances,', 'start': 3112.119, 'duration': 6.721}, {'end': 3125.662, 'text': "it's effectively learning a transformation from a distribution of noise to a target data distribution.", 'start': 3118.84, 'duration': 6.822}, {'end': 3130.664, 'text': "And that transformation, that mapping, is going to be what's learned over the course of training.", 'start': 3126.262, 'duration': 4.402}], 'summary': 'The ultimate goal is for the generator to fool the best discriminator by producing new data instances, effectively learning a transformation from noise to the target data distribution.', 'duration': 51.939, 'max_score': 3078.725, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw3078725.jpg'}], 'start': 2484.868, 'title': 'GANs and variational autoencoders', 'summary': 'Discusses the application of variational autoencoders for automatic debiasing of facial classification systems, as well as the concept and functionality of generative adversarial networks (GANs) for generating synthetic instances close to the real data distribution. It covers unsupervised learning, the reparameterization trick, perturbing latent variables, and the competition between generator and discriminator networks in GANs. 
Additionally, it explains the training process of GANs, highlighting the competition between the generator and discriminator, the loss function based on cross entropy, and the ultimate goal of synthesizing fake instances to fool the discriminator, leading to the generation of new data instances.', 'chapters': [{'end': 2825.661, 'start': 2484.868, 'title': 'Variational autoencoder and generative adversarial network', 'summary': 'Discusses the application of variational autoencoders in automatic debiasing of facial classification systems, as well as the concept and functionality of generative adversarial networks (GANs) for generating synthetic instances close to the real data distribution. The key points include unsupervised learning, the reparameterization trick, perturbing latent variables, and the competition between generator and discriminator networks in GANs.', 'duration': 340.793, 'highlights': ['Variational autoencoders (VAEs) can achieve automatic debiasing of facial classification systems by building a learned latent distribution of face data and adjusting the training process to place greater weight on underrepresented regions of the latent space.', 'VAEs eliminate the need for manual annotation of important features for debiasing, as the model learns them automatically, and also pave the way for exploring algorithmic bias and machine learning fairness.', 'Generative adversarial networks (GANs) aim to learn a representation that can successfully generate new instances similar to the data distribution, using a generator and discriminator network that compete against each other to improve the generation of synthetic data.', 'The breakthrough idea of GANs involves starting from random noise and utilizing a generative neural network to learn a transformation that maps the noise to the data distribution, with the discriminator being trained to distinguish between fake and real data.', 'Training of GANs involves the generator attempting to produce better synthetic data 
to fool the discriminator, leading to an iterative improvement process for both networks.']}, {'end': 3161.219, 'start': 2825.661, 'title': 'Understanding GAN training', 'summary': 'Explains the training process of generative adversarial networks (GANs), highlighting the competition between the generator and discriminator, the loss function based on cross entropy, and the ultimate goal of synthesizing fake instances to fool the discriminator, leading to the generation of new data instances.', 'duration': 335.558, 'highlights': ['The generator competes with the discriminator to synthesize fake instances that closely resemble real data, making it challenging for the discriminator to distinguish between real and fake data.', 'The loss function for GANs is based on cross entropy and has adversarial objectives for the discriminator and the generator, aiming for the generator to perfectly reproduce the true data distribution, making it indistinguishable from the real data.', "The generator's goal is to synthesize fake instances that fool the discriminator by minimizing the probability that the discriminator can distinguish its generated data as fake.", 'The trained generator of a GAN can produce new data instances that have never been seen before, effectively learning a transformation from a distribution of noise to a target data distribution.']}], 'duration': 676.351, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw2484868.jpg', 'highlights': ['VAEs can achieve automatic debiasing of facial classification systems by learning a latent distribution of face data.', 'GANs aim to learn a representation that can generate new instances similar to the data distribution.', 'Training of GANs involves the generator attempting to produce better synthetic data to fool the discriminator.', 'The generator competes with the discriminator to synthesize fake instances that closely resemble real data.', 'The loss function for GANs 
is based on cross entropy and has adversarial objectives for the discriminator and the generator.']}, {'end': 3712.362, 'segs': [{'end': 3253.296, 'src': 'embed', 'start': 3203.591, 'weight': 0, 'content': [{'end': 3216.769, 'text': 'And this is done by progressively adding layers of increasing spatial resolution in the case of image data and by incrementally building up both the generator and discriminator networks.', 'start': 3203.591, 'duration': 13.178}, {'end': 3224.633, 'text': 'In this way, as training progresses, it results in very well-resolved synthetic images that are output ultimately by the generator.', 'start': 3216.769, 'duration': 7.864}, {'end': 3230.436, 'text': 'So some results of this idea of a progressive GAN are displayed here.', 'start': 3225.393, 'duration': 5.043}, {'end': 3242.574, 'text': 'Another idea that has also led to tremendous improvement in the quality of synthetic examples generated by GANs is an architecture improvement called StyleGAN,', 'start': 3231.932, 'duration': 10.642}, {'end': 3249.575, 'text': 'which combines this idea of progressive growing that I introduced earlier with principles of style transfer,', 'start': 3242.574, 'duration': 7.001}, {'end': 3253.296, 'text': 'which means trying to compose an image in the style of another image.', 'start': 3249.575, 'duration': 3.721}], 'summary': 'Progressive GAN and StyleGAN improve synthetic image quality.', 'duration': 49.705, 'max_score': 3203.591, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw3203591.jpg'}, {'end': 3315.132, 'src': 'embed', 'start': 3284.852, 'weight': 2, 'content': [{'end': 3287.114, 'text': 'can be reflected in these synthetic examples.', 'start': 3284.852, 'duration': 2.262}, {'end': 3301.543, 'text': 'This same StyleGAN system has led to tremendously realistic synthetic images in the areas of both face synthesis, as well as for animals,', 'start': 3288.308, 'duration': 13.235}, {'end': 
3302.564, 'text': 'other objects as well.', 'start': 3301.543, 'duration': 1.021}, {'end': 3315.132, 'text': 'Another extension to the GAN architecture that has enabled particularly powerful applications for select problems and tasks is this idea of conditioning,', 'start': 3304.505, 'duration': 10.627}], 'summary': 'Stylegan enables realistic synthetic images for faces, animals, and objects.', 'duration': 30.28, 'max_score': 3284.852, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw3284852.jpg'}, {'end': 3585.64, 'src': 'embed', 'start': 3552.857, 'weight': 3, 'content': [{'end': 3555.338, 'text': 'And it can achieve these distribution transformations.', 'start': 3552.857, 'duration': 2.481}, {'end': 3566.111, 'text': "Finally, I'd like to consider one additional application that you may be familiar with of using cycle GANs,", 'start': 3557.827, 'duration': 8.284}, {'end': 3572.994, 'text': "and that's to transform speech and to actually use this cycle GAN technique to synthesize speech in someone else's voice.", 'start': 3566.111, 'duration': 6.883}, {'end': 3585.64, 'text': 'And the way this is done is by taking a bunch of audio recordings in one voice and audio recordings in another voice and converting those audio waveforms into an image representation,', 'start': 3573.795, 'duration': 11.845}], 'summary': "Cycle gans can transform speech to synthesize speech in someone else's voice.", 'duration': 32.783, 'max_score': 3552.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw3552857.jpg'}, {'end': 3703.44, 'src': 'embed', 'start': 3678.297, 'weight': 5, 'content': [{'end': 3685.662, 'text': "And with that I'd like to close the lecture and introduce you to the remainder of today's course,", 'start': 3678.297, 'duration': 7.365}, {'end': 3689.404, 'text': 'which is going to focus on our second lab on computer vision.', 'start': 3685.662, 'duration': 
3.742}, {'end': 3703.44, 'text': 'specifically exploring this question of debiasing in facial detection systems and using variational autoencoders to actually achieve an approach for automatic debiasing of classification systems.', 'start': 3690.436, 'duration': 13.004}], 'summary': 'Focus on debiasing facial detection using variational autoencoders.', 'duration': 25.143, 'max_score': 3678.297, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw3678297.jpg'}], 'start': 3161.759, 'title': 'Recent advances in GANs', 'summary': "Highlights recent advances in GANs, such as progressive GANs, StyleGAN, conditioning, paired and unpaired image-to-image translation, and CycleGAN, showcasing their impact on synthetic image generation and transformation. It also discusses the effectiveness of CycleGANs in transforming data manifolds and their application to speech synthesis, demonstrated by synthesizing Obama's voice from Alexander's voice using spectrogram images.", 'chapters': [{'end': 3524.526, 'start': 3161.759, 'title': 'Advances in GANs', 'summary': 'Highlights recent advances in GANs, including progressive GANs, StyleGAN, conditioning, paired and unpaired image-to-image translation, and CycleGAN, showcasing their impact on synthetic image generation and transformation.', 'duration': 362.767, 'highlights': ['Progressive GANs with iterative addition of layers and increasing spatial resolution result in well-resolved synthetic images', 'StyleGAN combines progressive growing with style transfer, resulting in realistic synthetic images for faces, animals, and objects', 'Conditioning in GAN architecture enables powerful applications for paired and unpaired image translation, as well as dynamic transformations']}, {'end': 3712.362, 'start': 3524.526, 'title': 'CycleGANs and speech synthesis', 'summary': "Discusses the effectiveness of CycleGANs in transforming data manifolds and highlights an application of using CycleGANs for speech synthesis, specifically in transforming voice representations from one person to another, demonstrated by synthesizing Obama's voice from Alexander's voice using spectrogram images.", 'duration': 187.836, 'highlights': ['CycleGANs are effective in transforming data manifolds, such as converting voice representations from one domain to another.', "The application of CycleGANs in synthesizing speech involves transforming voice representations from one person to appear like another person's voice, such as creating a spectrogram representation of Obama's voice from Alexander's voice.", 'Introduction to the next focus on debiasing in facial detection systems and using variational autoencoders for automatic debiasing of classification systems.']}], 'duration': 550.603, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BUNl0To1IVw/pics/BUNl0To1IVw3161759.jpg', 'highlights': ['Progressive GANs with iterative addition of layers and increasing spatial resolution result in well-resolved synthetic images', 'StyleGAN combines progressive growing with style transfer, resulting in realistic synthetic images for faces, animals, and objects', 'Conditioning in GAN architecture enables powerful applications for paired and unpaired image translation, as well as dynamic transformations', 'CycleGANs are effective in transforming data manifolds, such as converting voice representations from one domain to another', "The application of CycleGANs in synthesizing speech involves transforming voice representations from one person to appear like another person's voice, such as creating a spectrogram representation of Obama's voice from Alexander's voice", 'Introduction to the next focus on debiasing in facial detection systems and using variational autoencoders for automatic debiasing of classification systems']}], 'highlights': ['Progressive GANs result in well-resolved synthetic images', 'StyleGAN combines progressive growing with style transfer 
for realistic synthetic images', 'VAEs can achieve automatic debiasing of facial classification systems', 'Beta VAEs control the strength of the regularization term for disentanglement', 'VAEs use stochastic sampling to generate smoother representations of input data', 'Generative models can be used to detect outliers within training distributions', 'The reparameterization trick enables end-to-end training of variational autoencoders', 'Training of GANs involves the generator attempting to produce better synthetic data', 'The loss function for GANs is based on cross entropy and has adversarial objectives', 'Autoencoders are used to learn a lower dimensional latent space from raw data']}
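The loss breakdown narrated in the transcript segments above ("d of x", "d of g of z", and the min-max objective) can be written compactly. In the common convention where D(x) denotes the discriminator's estimate of the probability that x is real (the lecture narration uses the opposite labeling, with d as the probability an input is fake; the two are equivalent up to relabeling), the GAN objective is:

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}}\!\left[ \log D(x) \right]
+ \mathbb{E}_{z \sim p(z)}\!\left[ \log\!\left( 1 - D(G(z)) \right) \right]
```

The discriminator pushes both expectations up (real data scored as real, fakes scored as fake), while the generator pushes the second term down by making D(G(z)) approach 1.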
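The two adversarial objectives discussed in the transcript can also be sketched numerically. This is a minimal NumPy sketch, not the course's lab code; the function names and the convention that the discriminator outputs the probability an input is real are illustrative assumptions.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss the discriminator minimizes.

    d_real: discriminator outputs D(x) on real inputs (probability "real")
    d_fake: discriminator outputs D(G(z)) on generated inputs
    Minimizing this is equivalent to maximizing log D(x) + log(1 - D(G(z))).
    """
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: maximize log D(G(z)),
    i.e. push the discriminator toward scoring fakes as real."""
    return -np.mean(np.log(d_fake))

# A confident, accurate discriminator yields a small discriminator loss
# but a large generator loss; as the generator improves and D(G(z))
# rises, the generator loss falls.
d_real = np.array([0.90, 0.85, 0.95])   # D scores real data as real
d_fake = np.array([0.10, 0.15, 0.05])   # D catches the fakes
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake))
```

In practice the two networks are updated alternately by an optimizer; the arrays here stand in for a batch of discriminator outputs.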
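One of the closing highlights notes that the reparameterization trick enables end-to-end training of variational autoencoders. The trick is small enough to sketch: instead of sampling z directly from N(mu, sigma^2) (a stochastic node that blocks backpropagation), sample external noise eps ~ N(0, 1) and compute z deterministically from mu and the log-variance. This NumPy sketch is illustrative only; the variable names are assumptions, and in a real VAE mu and log_var come from an encoder network.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def reparameterize(mu, log_var):
    """Draw z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1).

    Because z is a deterministic function of (mu, log_var) plus external
    noise, gradients can flow through mu and log_var during training;
    the randomness is isolated in eps.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.exp(0.5 * np.asarray(log_var, dtype=float))  # log-variance keeps sigma > 0
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# Example: a 4-dimensional latent sample with mean 0 and unit variance.
z = reparameterize(np.zeros(4), np.zeros(4))
print(z.shape)
```

Shrinking log_var collapses the sample onto the mean, which is why perturbing mu (as in the latent-perturbation discussion) moves samples smoothly through the latent space.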