title
MIT 6.S191 (2020): Deep Generative Modeling

description
MIT 6.S191 (2020): Introduction to Deep Learning Deep Generative Modeling Lecturer: Ava Soleimany January 2020 For all lectures, slides, and lab materials: http://introtodeeplearning.com Lecture Outline 0:00 - Introduction 4:37 - Why do we care? 6:36 - Latent variable models 8:12 - Autoencoders 13:30 - Variational autoencoders 20:18 - Reparameterization trick 23:55 - Latent perturbation 26:12 - Debiasing with VAEs 30:40 - Generative adversarial networks 32:40 - Intuitions behind GANs 35:12 - GANs: Recent advances 39:38 - Summary Subscribe to stay up to date with new deep learning lectures at MIT, or follow us @MITDeepLearning on Twitter and Instagram to stay fully-connected!!
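The outline above mentions variational autoencoders (13:30) and the reparameterization trick (20:18). As a rough, hedged illustration — the helper names below are hypothetical and are not taken from the course's released labs — the trick rewrites a sample z ~ N(mu, sigma²) as z = mu + sigma · eps with eps ~ N(0, 1), so the randomness lives in eps and gradients can flow through mu and sigma:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, eps=None):
    """Reparameterized sample: z = mu + sigma * eps, with sigma = exp(0.5 * log_var).

    With eps held fixed, z is a deterministic function of mu and log_var,
    which is what lets backpropagation reach the encoder outputs.
    """
    if eps is None:
        eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to the N(0, 1) prior, summed over
    latent dimensions -- the regularization term in the VAE loss."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Encoder outputs for a 2-dimensional latent space (made-up numbers).
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, 0.0])  # sigma = 1 in both dimensions

z = sample_latent(mu, log_var)           # stochastic draw from N(mu, sigma^2)
kl = kl_to_standard_normal(mu, log_var)  # 0.5 * (0.25 + 1.0) = 0.625
```

In a full VAE this KL term is added to a reconstruction loss, and an autodiff framework differentiates through `mu + sigma * eps` exactly as it would any other deterministic layer.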

detail
{'title': 'MIT 6.S191 (2020): Deep Generative Modeling', 'heatmap': [], 'summary': 'Explores deep generative modeling, deep neural networks, generative models, autoencoders, VAEs, addressing algorithmic bias in face recognition, and CycleGAN applications, showcasing their significance in creating new data, uncovering hidden structures, and mitigating biases.', 'chapters': [{'end': 97.626, 'segs': [{'end': 56.645, 'src': 'embed', 'start': 31.874, 'weight': 0, 'content': [{'end': 37.216, 'text': 'And this is a really new and emerging field within deep learning,', 'start': 31.874, 'duration': 5.342}, {'end': 42.499, 'text': "and it's enjoying a lot of success and attention right now and in the past couple years, especially.", 'start': 37.216, 'duration': 5.283}, {'end': 48.201, 'text': 'And so this broadly can be considered this field of deep generative modeling.', 'start': 42.519, 'duration': 5.682}, {'end': 51.823, 'text': "So I'd like to start by taking a quick poll.", 'start': 48.922, 'duration': 2.901}, {'end': 54.444, 'text': 'So study these three faces for a moment.', 'start': 52.243, 'duration': 2.201}, {'end': 56.645, 'text': 'These are three faces.', 'start': 55.165, 'duration': 1.48}], 'summary': 'Emerging field in deep learning, with success and attention in past years, focused on deep generative modeling.', 'duration': 24.771, 'max_score': 31.874, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq431874.jpg'}], 'start': 9.423, 'title': 'Deep generative modeling', 'summary': 'Introduces the concept of deep generative modeling, a new field within deep learning that focuses on creating systems that can learn to generate brand new data. 
It has gained attention and success, as demonstrated by the ability to create convincing fake faces in December 2019.', 'chapters': [{'end': 97.626, 'start': 9.423, 'title': 'Deep generative modeling', 'summary': 'Introduces the concept of deep generative modeling, a new field within deep learning that focuses on creating systems that can learn to generate brand new data based on learned information, enjoying success and attention in recent years, as demonstrated by the ability to create convincing fake faces in December 2019.', 'duration': 88.203, 'highlights': ['Deep generative modeling is an emerging field within deep learning that focuses on creating systems that can learn to generate new data based on learned information.', 'The ability to create convincing fake faces was demonstrated in December 2019, showcasing the recent advancements in deep generative modeling.']}], 'duration': 88.203, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq49423.jpg', 'highlights': ['The ability to create convincing fake faces was demonstrated in December 2019, showcasing the recent advancements in deep generative modeling.', 'Deep generative modeling is an emerging field within deep learning that focuses on creating systems that can learn to generate new data based on learned information.']}, {'end': 384.637, 'segs': [{'end': 146.107, 'src': 'embed', 'start': 98.107, 'weight': 0, 'content': [{'end': 99.729, 'text': 'And today, by the end of this lecture,', 'start': 98.107, 'duration': 1.622}, {'end': 109.739, 'text': "you're going to have a sense of how a deep neural network can be used to generate data that is so realistic that it fooled many of you,", 'start': 99.729, 'duration': 10.01}, {'end': 110.8, 'text': 'or rather all of you.', 'start': 109.739, 'duration': 1.061}, {'end': 118.365, 'text': "OK, so so far in this course we've been considering this problem of supervised learning,", 'start': 112.921, 'duration': 
5.444}, {'end': 123.808, 'text': "which means we're given a set of data and a set of labels that go along with that data.", 'start': 118.365, 'duration': 5.443}, {'end': 128.731, 'text': 'And our goal is to learn a functional mapping that goes from data to labels.', 'start': 123.888, 'duration': 4.843}, {'end': 132.013, 'text': 'And in this course, this is a course on deep learning.', 'start': 129.512, 'duration': 2.501}, {'end': 137.457, 'text': "And we've been largely talking about functional mappings that are described by deep neural networks.", 'start': 132.053, 'duration': 5.404}, {'end': 140.419, 'text': 'But at the core, these mappings could be anything.', 'start': 137.977, 'duration': 2.442}, {'end': 146.107, 'text': "Now we're going to turn our attention to a new class of problems.", 'start': 142.304, 'duration': 3.803}], 'summary': 'Deep neural network can generate realistic data, transitioning to new class of problems.', 'duration': 48, 'max_score': 98.107, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq498107.jpg'}, {'end': 286.081, 'src': 'embed', 'start': 259.848, 'weight': 2, 'content': [{'end': 265.229, 'text': "that's similar to the true distribution that describes how the data was generated?", 'start': 259.848, 'duration': 5.381}, {'end': 275.833, 'text': 'So why care about generative modeling, besides the fact that it can be used to generate these realistic, human-like faces?', 'start': 266.97, 'duration': 8.863}, {'end': 286.081, 'text': 'Well, first of all, generative approaches can uncover the underlying factors and features present in a data set.', 'start': 277.435, 'duration': 8.646}], 'summary': 'Generative modeling uncovers underlying data features.', 'duration': 26.233, 'max_score': 259.848, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq4259848.jpg'}, {'end': 361.885, 'src': 'embed', 'start': 332.835, 'weight': 3, 'content': 
[{'end': 340.977, 'text': 'and use this information to actually create fairer, more representative data sets to train an unbiased classification model.', 'start': 332.835, 'duration': 8.142}, {'end': 346.111, 'text': 'Another great example is this question of outlier detection.', 'start': 342.648, 'duration': 3.463}, {'end': 352.217, 'text': 'So if we go back to the example problem of self-driving cars,', 'start': 346.151, 'duration': 6.066}, {'end': 361.885, 'text': 'most of the data that we may want to train a control network that Alexander was describing may be very common driving scenes.', 'start': 352.217, 'duration': 9.668}], 'summary': 'Create fairer data sets for unbiased models, e.g., in self-driving cars.', 'duration': 29.05, 'max_score': 332.835, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq4332835.jpg'}], 'start': 98.107, 'title': 'Deep neural networks and generative modeling', 'summary': 'Covers the use of deep neural networks for data generation, including their applications in fooling participants, supervised learning, and functional mappings. 
It also delves into unsupervised learning, generative modeling, and their significance in uncovering hidden data structures, creating fairer datasets, and outlier detection for self-driving cars.', 'chapters': [{'end': 146.107, 'start': 98.107, 'title': 'Deep neural networks for data generation', 'summary': 'Explains how deep neural networks can be used to generate realistic data, fooling all the participants, and introduces a new class of problems in the context of supervised learning and functional mappings.', 'duration': 48, 'highlights': ['Deep neural networks can be used to generate data so realistic that it fooled all participants in the lecture.', 'This lecture focuses on supervised learning, where the goal is to learn a functional mapping from data to labels.', 'The course is centered on deep learning and functional mappings described by deep neural networks.']}, {'end': 384.637, 'start': 146.827, 'title': 'Unsupervised learning and generative modeling', 'summary': 'Explores unsupervised learning, particularly generative modeling, which aims to uncover hidden data structure and generate new examples. 
It also discusses the importance of generative modeling in uncovering underlying features and creating fairer, more representative datasets, as well as its application in outlier detection for self-driving cars.', 'duration': 237.81, 'highlights': ['Generative modeling aims to uncover underlying data structure and generate new examples.', 'Generative modeling is important in uncovering underlying features and creating fairer, more representative datasets.', 'Generative modeling is valuable for outlier detection in self-driving cars.']}], 'duration': 286.53, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq498107.jpg', 'highlights': ['Deep neural networks can generate realistic data fooling all participants.', 'Supervised learning focuses on learning functional mapping from data to labels.', 'Generative modeling uncovers data structure and creates fairer datasets.', 'Generative modeling is valuable for outlier detection in self-driving cars.', 'The course is centered on deep learning and functional mappings by deep neural networks.']}, {'end': 1267.101, 'segs': [{'end': 409.925, 'src': 'embed', 'start': 384.798, 'weight': 0, 'content': [{'end': 394.86, 'text': 'And so generative models can be used to actually detect the outliers that exist within training distributions and use this to train a more robust model.', 'start': 384.798, 'duration': 10.062}, {'end': 402.343, 'text': "And so we'll really dive deeply into two classes of models that we call latent variable models.", 'start': 396.561, 'duration': 5.782}, {'end': 408.484, 'text': 'But first, before we get into those details, we have a big question.', 'start': 403.503, 'duration': 4.981}, {'end': 409.925, 'text': 'What is a latent variable?', 'start': 408.544, 'duration': 1.381}], 'summary': 'Generative models detect outliers to train a more robust model.', 'duration': 25.127, 'max_score': 384.798, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq4384798.jpg'}, {'end': 808.592, 'src': 'embed', 'start': 777.732, 'weight': 1, 'content': [{'end': 787.455, 'text': 'autoencoders are using this bottlenecking hidden layer that forces the network to learn a compressed latent representation of the data.', 'start': 777.732, 'duration': 9.723}, {'end': 793.857, 'text': 'And by using this reconstruction loss, we can train the network in a completely unsupervised manner.', 'start': 788.315, 'duration': 5.542}, {'end': 808.592, 'text': "And the name autoencoder comes from the fact that we're automatically encoding information within the data into this smaller latent space.", 'start': 795.069, 'duration': 13.523}], 'summary': 'Autoencoders learn compressed latent representation for unsupervised training.', 'duration': 30.86, 'max_score': 777.732, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq4777732.jpg'}, {'end': 871.895, 'src': 'embed', 'start': 846.293, 'weight': 2, 'content': [{'end': 852.162, 'text': 'This is a deterministic encoding that allows us to reproduce the input as best as we can.', 'start': 846.293, 'duration': 5.869}, {'end': 866.694, 'text': 'But if we want to learn a more smooth representation of the latent space and use this to actually generate new images and sample new images that are similar to the input data set,', 'start': 853.851, 'duration': 12.843}, {'end': 871.895, 'text': 'we can use a structure called a variational autoencoder to more robustly do this.', 'start': 866.694, 'duration': 5.201}], 'summary': 'Deterministic encoding for input reproduction, variational autoencoder for generating new images.', 'duration': 25.602, 'max_score': 846.293, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq4846293.jpg'}, {'end': 952.067, 'src': 'embed', 'start': 924.465, 'weight': 3, 'content': [{'end': 
933.307, 'text': 'And what we can do is sample from these described distributions to obtain a probabilistic representation of our latent space.', 'start': 924.465, 'duration': 8.842}, {'end': 942.47, 'text': "And so as you can tell and as I've emphasized, the VAE structure is just an autoencoder with this probabilistic twist.", 'start': 933.888, 'duration': 8.582}, {'end': 952.067, 'text': 'So now what this means is, instead of deterministic functions that describe the encoder and decoder,', 'start': 944.622, 'duration': 7.445}], 'summary': 'Vae structure is an autoencoder with a probabilistic twist, using sampled distributions to represent latent space.', 'duration': 27.602, 'max_score': 924.465, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq4924465.jpg'}, {'end': 1059.675, 'src': 'embed', 'start': 1033.295, 'weight': 4, 'content': [{'end': 1042.358, 'text': 'we want to place some constraints on how this probability distribution is computed and what that probability distribution resembles,', 'start': 1033.295, 'duration': 9.063}, {'end': 1045.339, 'text': 'as a part of regularizing and training our network.', 'start': 1042.358, 'duration': 2.981}, {'end': 1051.183, 'text': "And so the way that's done is by placing a prior on the latent distribution.", 'start': 1045.98, 'duration': 5.203}, {'end': 1053.629, 'text': "And that's p of z.", 'start': 1052.168, 'duration': 1.461}, {'end': 1059.675, 'text': "And so that's some initial hypothesis or guess about what the distribution of z looks like.", 'start': 1053.629, 'duration': 6.046}], 'summary': 'Constraints on probability distribution to regularize and train network.', 'duration': 26.38, 'max_score': 1033.295, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41033295.jpg'}], 'start': 384.798, 'title': 'Generative models and autoencoders', 'summary': 'Explores generative models for detecting outliers in 
training distributions and discusses the concept of latent variables. It also delves into autoencoders, which use a bottlenecking hidden layer to learn a compressed latent representation of data, and introduces variational autoencoders (VAEs) enabling the learning of a smooth representation of the latent space and the generation of new images.', 'chapters': [{'end': 477.834, 'start': 384.798, 'title': 'Generative models and latent variables', 'summary': "Explores generative models for detecting outliers in training distributions to train more robust models. It also delves into latent variable models and explains the concept of latent variables using Plato's myth of the cave.", 'duration': 93.036, 'highlights': ['Generative models can detect outliers in training distributions to train a more robust model.', "The concept of latent variables is explained using Plato's myth of the cave."]}, {'end': 1267.101, 'start': 479.974, 'title': 'Autoencoders and variational autoencoders', 'summary': 'Discusses the concept of autoencoders, which use a bottlenecking hidden layer to learn a compressed latent representation of data and can be trained in a completely unsupervised manner. 
It also introduces variational autoencoders (VAEs) as a probabilistic twist on traditional autoencoders, enabling the learning of a smooth representation of the latent space and the generation of new images through stochastic sampling operations.', 'duration': 787.127, 'highlights': ['Autoencoders use a bottlenecking hidden layer to learn a compressed latent representation of the data', 'Autoencoders can be trained in a completely unsupervised manner using a reconstruction loss', 'Variational autoencoders (VAEs) introduce a stochastic sampling operation in the latent space', 'VAEs incorporate a regularization term to enforce constraints on the probability distributions of the latent variables', 'The use of a normal Gaussian prior in VAEs encourages even distribution of the latent variables and penalizes overfitting']}], 'duration': 882.303, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq4384798.jpg', 'highlights': ['Generative models detect outliers in training distributions for robust model training', 'Autoencoders learn compressed latent representation using bottlenecking hidden layer', 'Variational autoencoders (VAEs) introduce stochastic sampling in latent space', 'VAEs enforce constraints on probability distributions of latent variables', 'VAEs use normal Gaussian prior to encourage even distribution of latent variables']}, {'end': 1592.149, 'segs': [{'end': 1290.181, 'src': 'embed', 'start': 1268.8, 'weight': 0, 'content': [{'end': 1286.237, 'text': 'But what was a really breakthrough idea that enabled VAEs to really take off was to use this little trick called a reparameterization trick to reparameterize the sampling layer such that the network can now be trained end-to-end.', 'start': 1268.8, 'duration': 17.437}, {'end': 1290.181, 'text': "And I'll give you the key idea about how this operation works.", 'start': 1286.738, 'duration': 3.443}], 'summary': 'Using reparameterization trick enabled VAEs to 
be trained end-to-end.', 'duration': 21.381, 'max_score': 1268.8, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41268800.jpg'}, {'end': 1431.985, 'src': 'embed', 'start': 1380.649, 'weight': 1, 'content': [{'end': 1386.411, 'text': "And as we saw, we can't do back propagation because we're going to get stuck at this stochastic sampling node.", 'start': 1380.649, 'duration': 5.762}, {'end': 1389.636, 'text': 'when the network is parameterized like this.', 'start': 1387.634, 'duration': 2.002}, {'end': 1399.263, 'text': "Instead, when we re-parameterize, we get this diagram, where we've now diverted the sampling operation off to the side,", 'start': 1390.736, 'duration': 8.527}, {'end': 1404.007, 'text': 'to this stochastic node epsilon, which is drawn from a normal distribution.', 'start': 1399.263, 'duration': 4.744}, {'end': 1411.493, 'text': 'And now the latent variables z are deterministic with respect to epsilon, the sampling operation.', 'start': 1404.527, 'duration': 6.966}, {'end': 1421.8, 'text': 'And so this means that we can back-propagate through Z without running into errors that arise from having stochasticity.', 'start': 1412.335, 'duration': 9.465}, {'end': 1431.985, 'text': 'And so this is a really powerful trick, because this simple re-parameterization is what actually allows for VAEs to be trained end-to-end.', 'start': 1423.34, 'duration': 8.645}], 'summary': 'Re-parameterizing vaes enables back-propagation, allowing for end-to-end training.', 'duration': 51.336, 'max_score': 1380.649, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41380649.jpg'}, {'end': 1489.739, 'src': 'embed', 'start': 1463.001, 'weight': 3, 'content': [{'end': 1471.086, 'text': 'And what is done is after that value of that latent variable is tuned, you can run the decoder to generate a reconstructed output.', 'start': 1463.001, 'duration': 8.085}, {'end': 
1483.174, 'text': "And what you'll see now in the example of these faces is that that output that results from tuning a single latent variable has a very clear and distinct semantic meaning.", 'start': 1471.807, 'duration': 11.367}, {'end': 1489.739, 'text': 'So for example, this is differences in the pose or the orientation of a face.', 'start': 1484.315, 'duration': 5.424}], 'summary': 'Tuning latent variable results in distinct semantic meanings in reconstructed output.', 'duration': 26.738, 'max_score': 1463.001, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41463001.jpg'}, {'end': 1592.149, 'src': 'embed', 'start': 1516.023, 'weight': 4, 'content': [{'end': 1522.467, 'text': 'what we would want is for each of our latent variables to be independent and uncorrelated with each other,', 'start': 1516.023, 'duration': 6.444}, {'end': 1527.611, 'text': 'to really learn the richest and most compact representation possible.', 'start': 1522.467, 'duration': 5.144}, {'end': 1533.153, 'text': "So here's the same example as before, now with faces again, where we're looking at faces.", 'start': 1528.271, 'duration': 4.882}, {'end': 1544.717, 'text': "And now we're walking along two axes, which we can interpret as pose or orientation on the x, and maybe you can tell the smile on the y-axis.", 'start': 1533.713, 'duration': 11.004}, {'end': 1553.421, 'text': "And to re-emphasize, these are reconstructed images, but they're reconstructed by keeping all other variables fixed except these two.", 'start': 1545.378, 'duration': 8.043}, {'end': 1558.284, 'text': 'and then tuning the value of those latent features.', 'start': 1554.221, 'duration': 4.063}, {'end': 1562.007, 'text': 'And this is this idea of disentanglement,', 'start': 1559.405, 'duration': 2.602}, {'end': 1571.455, 'text': 'by trying to encourage the network to learn variables that are as independent and uncorrelated with each other as possible.', 'start': 
1562.007, 'duration': 9.448}, {'end': 1579.724, 'text': 'And so to motivate the use of VAE and generative models a bit further.', 'start': 1573.462, 'duration': 6.262}, {'end': 1585.947, 'text': "let's go back to this example that I showed from the beginning of lecture the question of facial detection.", 'start': 1579.724, 'duration': 6.223}, {'end': 1592.149, 'text': "And, as I kind of mentioned, if we're given a data set with many different faces,", 'start': 1586.926, 'duration': 5.223}], 'summary': 'Encouraging independent and uncorrelated latent variables for compact representation in VAE and generative models.', 'duration': 76.126, 'max_score': 1516.023, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41516023.jpg'}], 'start': 1268.8, 'title': 'VAEs and latent variables in NN', 'summary': 'Covers reparameterization trick in VAEs, enabling end-to-end training, and explores the use of distributional priors on latent variables in neural networks to achieve independent and uncorrelated latent variables for a rich and compact representation.', 'chapters': [{'end': 1431.985, 'start': 1268.8, 'title': 'Reparameterization trick for VAEs', 'summary': 'Discusses the reparameterization trick used in variational autoencoders (VAEs), enabling end-to-end training by diverting the stochastic sampling operation for latent vectors, allowing for back-propagation through the network.', 'duration': 163.185, 'highlights': ['The reparameterization trick involves representing the sampled latent vector z as a sum of fixed vectors mu and sigma, scaled by a random constant drawn from a prior distribution, enabling end-to-end training.', 'By reparameterizing the sampling operation to a stochastic node epsilon drawn from a normal distribution, VAEs can back-propagate through the latent variables z without encountering errors from stochasticity.', 'The visualization demonstrates the diversion of the sampling operation to a stochastic 
node epsilon, allowing for deterministic latent variables with respect to the sampling operation, enabling back-propagation without encountering errors.']}, {'end': 1592.149, 'start': 1435.106, 'title': 'Understanding latent variables in neural networks', 'summary': "Explains how distributional priors on latent variables in neural networks allow for interpretation of the network's learning, enabling the tuning of single latent variables to generate reconstructed outputs with distinct semantic meanings, aiming for independent and uncorrelated latent variables for a rich and compact representation.", 'duration': 157.043, 'highlights': ['The network learns different latent variables in a totally unsupervised way, allowing for interpretation by perturbing the value of a single latent variable to understand their meaning and representation.', 'Tuning a single latent variable can result in reconstructed outputs with clear and distinct semantic meanings, such as differences in the pose or orientation of a face.', 'The aim is for each latent variable to be independent and uncorrelated with each other, to learn the richest and most compact representation possible in the compressed latent space.', 'Reconstructed images are generated by keeping all other variables fixed except the ones being tuned, demonstrating the idea of disentanglement in encouraging the network to learn independent and uncorrelated variables.', 'The use of VAE and generative models is motivated by the example of facial detection, highlighting the significance of learning and interpreting latent variables in neural networks.']}], 'duration': 323.349, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41268800.jpg', 'highlights': ['The reparameterization trick enables end-to-end training in VAEs', 'VAEs can back-propagate through latent variables using the reparameterization trick', 'The visualization demonstrates the diversion of the sampling operation 
to a stochastic node epsilon', 'Tuning a single latent variable results in reconstructed outputs with clear semantic meanings', 'Each latent variable aims to be independent and uncorrelated for a rich and compact representation', 'Reconstructed images are generated by keeping other variables fixed, demonstrating disentanglement', 'VAEs and generative models are motivated by the example of facial detection']}, {'end': 1761.361, 'segs': [{'end': 1633.858, 'src': 'embed', 'start': 1613.499, 'weight': 2, 'content': [{'end': 1628.193, 'text': "And a standard classifier that's trained on a data set that contains a lot of these types of examples will be better suited at recognizing and classifying those faces that have features similar to those shown on the left.", 'start': 1613.499, 'duration': 14.694}, {'end': 1633.858, 'text': 'And so this can generate unwanted bias in our classifier.', 'start': 1629.133, 'duration': 4.725}], 'summary': 'Training classifier on biased data can lead to unwanted bias.', 'duration': 20.359, 'max_score': 1613.499, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41613499.jpg'}, {'end': 1683.762, 'src': 'embed', 'start': 1657.984, 'weight': 0, 'content': [{'end': 1668.209, 'text': 'And so what is done is a VAE network is used to learn the underlying features of a training data set, in this case images of faces,', 'start': 1657.984, 'duration': 10.225}, {'end': 1672.531, 'text': 'in an unbiased and unsupervised manner, without any annotation.', 'start': 1668.209, 'duration': 4.322}, {'end': 1679.537, 'text': "And so here we're showing an example of one such learned latent variable, the orientation of the face.", 'start': 1673.371, 'duration': 6.166}, {'end': 1683.762, 'text': 'And again, we never told the network that orientation was important.', 'start': 1679.617, 'duration': 4.145}], 'summary': 'VAE network learns features of face images without supervision.', 'duration': 25.778, 
'max_score': 1657.984, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41657984.jpg'}, {'end': 1761.361, 'src': 'embed', 'start': 1738.643, 'weight': 1, 'content': [{'end': 1747.248, 'text': 'And similarly, these faces with rarer features, like shadows or darker skin or glasses, may be underrepresented in the data.', 'start': 1738.643, 'duration': 8.605}, {'end': 1751.651, 'text': 'And so the likelihood of selecting them during sampling will be low.', 'start': 1747.608, 'duration': 4.043}, {'end': 1761.361, 'text': 'And the way this algorithm works is to use these inferred distributions to adaptively resample data during training.', 'start': 1753.254, 'duration': 8.107}], 'summary': 'Underrepresented features have low sampling likelihood, adapting resampling during training.', 'duration': 22.718, 'max_score': 1738.643, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41738643.jpg'}], 'start': 1592.149, 'title': 'Addressing algorithmic bias in face recognition', 'summary': 'Delves into utilizing generative models like VAE to mitigate imbalances in face data sets, which can lead to algorithmic bias, by adaptively resampling data during training based on inferred distributions.', 'chapters': [{'end': 1761.361, 'start': 1592.149, 'title': 'Algorithmic bias in face recognition', 'summary': 'Discusses the use of generative models, specifically VAE, to address imbalances in face data sets, which can result in algorithmic bias by adaptively resampling data during training based on inferred distributions.', 'duration': 169.212, 'highlights': ['Generative model VAE is used to learn latent variables in an unbiased and unsupervised manner, such as orientation of the face.', 'Imbalances in the training data can result in unwanted algorithmic bias, affecting the recognition and classification of faces with certain features.', 'Adaptive resampling of data during training 
is employed based on inferred distributions to mitigate bias in the selection of images with overrepresented or underrepresented features.']}], 'duration': 169.212, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41592149.jpg', 'highlights': ['Generative model VAE learns unbiased latent variables like face orientation', 'Adaptive resampling of data during training mitigates algorithmic bias', 'Imbalances in training data affect recognition and classification of faces']}, {'end': 2228.699, 'segs': [{'end': 1822.169, 'src': 'embed', 'start': 1763.743, 'weight': 2, 'content': [{'end': 1776.974, 'text': 'And this is used to generate a more balanced and more fair training data set that can be then fed into the network to ultimately result in a unbiased classifier.', 'start': 1763.743, 'duration': 13.231}, {'end': 1780.517, 'text': "And this is exactly what you'll explore in today's lab.", 'start': 1777.234, 'duration': 3.283}, {'end': 1788.289, 'text': 'OK. 
so to reiterate and summarize some of the key points of VAEs,', 'start': 1782.56, 'duration': 5.729}, {'end': 1795.84, 'text': 'these VAEs encode a compressed representation of the world by learning these underlying latent features.', 'start': 1788.289, 'duration': 7.551}, {'end': 1804.837, 'text': 'And from this representation, we can sample to generate reconstructions of the input data in an unsupervised fashion.', 'start': 1797.091, 'duration': 7.746}, {'end': 1822.169, 'text': 'We can use the reparameterization trick to train our networks end to end and use this perturbation approach to interpret and understand the meaning behind each of these hidden latent variables.', 'start': 1805.697, 'duration': 16.472}], 'summary': 'Vaes encode compressed representation, sample to generate reconstructions, and use reparameterization trick to train networks.', 'duration': 58.426, 'max_score': 1763.743, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41763743.jpg'}, {'end': 1872.426, 'src': 'embed', 'start': 1847.679, 'weight': 3, 'content': [{'end': 1856.333, 'text': "And so the idea here is we don't want to explicitly model the density or the distribution underlying some data,", 'start': 1847.679, 'duration': 8.654}, {'end': 1861.162, 'text': "but instead just generate new instances that are similar to the data that we've seen.", 'start': 1856.333, 'duration': 4.829}, {'end': 1872.426, 'text': 'which means we want to try to sample from a really, really complex distribution, which we may not be able to learn directly in an efficient manner.', 'start': 1863.083, 'duration': 9.343}], 'summary': 'Generate new instances similar to seen data without explicitly modeling density or distribution.', 'duration': 24.747, 'max_score': 1847.679, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41847679.jpg'}, {'end': 1930.929, 'src': 'embed', 'start': 1900.819, 'weight': 5, 
'content': [{'end': 1906.024, 'text': 'And so the breakthrough to really achieving this was this GAN structure,', 'start': 1900.819, 'duration': 5.205}, {'end': 1913.171, 'text': 'where we have two neural networks, one we call a generator and one we call a discriminator, that are competing against each other.', 'start': 1906.024, 'duration': 7.147}, {'end': 1914.152, 'text': "They're adversaries.", 'start': 1913.211, 'duration': 0.941}, {'end': 1924.362, 'text': 'Specifically, the goal of the generator is to go from noise to produce imitations of data that are as close to real as possible.', 'start': 1915.013, 'duration': 9.349}, {'end': 1930.929, 'text': 'Then the discriminator network takes both the fake data as well as true data.', 'start': 1925.379, 'duration': 5.55}], 'summary': 'GAN structure uses competing neural networks to generate realistic imitations of data.', 'duration': 30.11, 'max_score': 1900.819, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41900819.jpg'}, {'end': 2069.127, 'src': 'embed', 'start': 2045.766, 'weight': 6, 'content': [{'end': 2053.672, 'text': 'the generator sees the real examples and it starts moving these fake points closer and closer to the real data,', 'start': 2045.766, 'duration': 7.906}, {'end': 2057.855, 'text': 'such that the fake data is almost following the distribution of the real data.', 'start': 2053.672, 'duration': 4.183}, {'end': 2069.127, 'text': "And eventually, it's going to be very hard for the discriminator to be able to distinguish between what's real, what's fake,", 'start': 2059.096, 'duration': 10.031}], 'summary': 'Generator moves fake points closer to real data, mimicking real data distribution.', 'duration': 23.361, 'max_score': 2045.766, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq42045766.jpg'}, {'end': 2146.093, 'src': 'embed', 'start': 2093.313, 'weight': 0, 'content': [{'end':
2101.64, 'text': 'OK, and so you can think of this as an adversarial competition between these two networks, the generator and the discriminator.', 'start': 2093.313, 'duration': 8.327}, {'end': 2111.706, 'text': "And what we can do is, after training, use the trained generator network to sample and create new data that's not been seen before.", 'start': 2102.58, 'duration': 9.126}, {'end': 2116.95, 'text': 'And so to look at examples of what we can achieve with this approach.', 'start': 2112.887, 'duration': 4.063}, {'end': 2132.169, 'text': 'The examples that I showed at the beginning of the lecture were generated by using this idea of progressively growing GANs to iteratively build more detailed image generations.', 'start': 2118.666, 'duration': 13.503}, {'end': 2139.911, 'text': 'And the way this works is that the generator and the discriminator start by having very low spatial resolution.', 'start': 2132.95, 'duration': 6.961}, {'end': 2146.093, 'text': 'And as training progresses, more and more layers are incrementally added to each of the two networks.', 'start': 2140.491, 'duration': 5.602}], 'summary': 'Adversarial competition between generator and discriminator, creating new data using trained generator network, progressively growing GANs for detailed image generations.', 'duration': 52.78, 'max_score': 2093.313, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq42093313.jpg'}, {'end': 2200.011, 'src': 'embed', 'start': 2170.43, 'weight': 1, 'content': [{'end': 2180.937, 'text': 'Another idea involves unpaired image-to-image translation, or style transfer, which uses a network called CycleGAN.', 'start': 2170.43, 'duration': 10.507}, {'end': 2186.121, 'text': "And here, we're taking a bunch of images in one domain, for example, the horse domain.", 'start': 2181.658, 'duration': 4.463}, {'end': 2190.323, 'text': 'And without having the corresponding image in another domain.', 'start': 2186.841,
'duration': 3.482}, {'end': 2200.011, 'text': 'We want to take the input image, generate an image in a new style that follows the distribution of that new style.', 'start': 2191.184, 'duration': 8.827}], 'summary': 'Using CycleGAN for unpaired image-to-image translation without corresponding images in another domain.', 'duration': 29.581, 'max_score': 2170.43, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq42170430.jpg'}], 'start': 1763.743, 'title': 'VAEs and GANs in ML', 'summary': 'Covers the use of variational autoencoders (VAEs) and generative adversarial networks (GANs) for an unbiased classifier, with examples of image generation and style transfer.', 'chapters': [{'end': 1872.426, 'start': 1763.743, 'title': 'VAEs and GANs in ML', 'summary': 'Covers the use of variational autoencoders (VAEs) to learn latent features and generative adversarial networks (GANs) to generate realistic new samples, ultimately achieving an unbiased classifier.', 'duration': 108.683, 'highlights': ['VAEs encode a compressed representation of the world by learning underlying latent features, allowing for unsupervised generation of reconstructions of input data.', 'VAEs use the reparameterization trick to train networks end to end and interpret the meaning behind hidden latent variables.', 'Generative Adversarial Networks (GANs) focus on generating new samples similar to existing data, without explicitly modeling the underlying distribution.', 'GANs aim to sample from complex distributions efficiently, providing realistic new instances of the input data.']}, {'end': 2228.699, 'start': 1873.347, 'title': 'Understanding GANs and adversarial networks', 'summary': 'Explains the adversarial approach of generative adversarial networks (GANs), involving a generator and discriminator competing to produce fake data as close to real data as possible, with examples of image generation and style transfer.', 'duration': 355.352, 'highlights':
['The GAN structure involves two neural networks, a generator, and a discriminator that compete against each other, with the generator trained to produce imitations of data close to real and the discriminator learning to distinguish between fake and real data.', 'Through adversarial competition, the generator aims to create fake data points that are almost indistinguishable from the real data, while the discriminator improves at distinguishing between real and fake data.', 'Progressively growing GANs are used to iteratively build more detailed image generations, starting with low spatial resolution and adding layers incrementally to increase the spatial resolution of the outputted generation images.', 'The CycleGAN is used for unpaired image-to-image translation, employing two generators and two discriminators with a cyclic loss term to transfer the style of one domain to another.']}], 'duration': 464.956, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq41763743.jpg', 'highlights': ['Progressively growing GANs iteratively build detailed image generations', 'CycleGAN is used for unpaired image-to-image translation with two generators and discriminators', 'VAEs use reparameterization trick to interpret meaning behind hidden latent variables', 'GANs aim to sample from complex distributions efficiently for realistic new instances', 'VAEs encode compressed representation of world for unsupervised generation of reconstructions', 'GAN structure involves two neural networks, a generator, and a discriminator', 'Through adversarial competition, the generator aims to create fake data points', 'Generative Adversarial Networks focus on generating new samples similar to existing data']}, {'end': 2433.725, 'segs': [{'end': 2270.457, 'src': 'embed', 'start': 2229.628, 'weight': 0, 'content': [{'end': 2241.532, 'text': "So maybe you'll notice in this example of going from horse to zebra that the network has not only learned 
how to transform the skin of the horse from brown to the stripes of a zebra,", 'start': 2229.628, 'duration': 11.904}, {'end': 2245.594, 'text': "but it's also changed a bit about the background in the scene.", 'start': 2241.532, 'duration': 4.062}, {'end': 2251.696, 'text': "It's learned that zebras are more likely to be found in maybe the savanna grasslands,", 'start': 2246.154, 'duration': 5.542}, {'end': 2258.398, 'text': 'so the grass is browner and maybe more savanna-like in the zebra example compared to the horse.', 'start': 2251.696, 'duration': 6.702}, {'end': 2270.457, 'text': "And what we actually did is to use this approach of CycleGAN to synthesize speech in someone else's voice.", 'start': 2260.569, 'duration': 9.888}], 'summary': 'CycleGAN transforms horse to zebra, altering skin and background; used to synthesize speech in different voice.', 'duration': 40.829, 'max_score': 2229.628, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq42229628.jpg'}, {'end': 2346.239, 'src': 'embed', 'start': 2320.167, 'weight': 2, 'content': [{'end': 2329.413, 'text': 'And what we did is we took original audio of Alexander saying that script that was played yesterday and took the audio waveforms,', 'start': 2320.167, 'duration': 9.246}, {'end': 2346.239, 'text': "converted them into the spectrogram images and then trained a CycleGAN using this information and audio recordings of Obama's voice to transfer the style of Obama's voice onto our script.", 'start': 2329.413, 'duration': 16.826}], 'summary': 'Used audio waveforms to train CycleGAN for voice style transfer.', 'duration': 26.072, 'max_score': 2320.167, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq42320167.jpg'}, {'end': 2433.725, 'src': 'embed', 'start': 2377.188, 'weight': 3, 'content': [{'end': 2387.576, 'text': "OK, so to summarize today, we've covered a lot of ground on autoencoders and
variational autoencoders and generative adversarial networks.', 'start': 2377.188, 'duration': 10.388}, {'end': 2397.504, 'text': 'And hopefully this discussion of these approaches gives you a sense of how we can use deep learning to not only learn patterns in data,', 'start': 2388.157, 'duration': 9.347}, {'end': 2403.93, 'text': 'but to use this information in a rich way to achieve generative modeling.', 'start': 2397.504, 'duration': 6.426}, {'end': 2408.054, 'text': 'And I really appreciate the great questions and discussions.', 'start': 2404.491, 'duration': 3.563}, {'end': 2414.781, 'text': 'And all of us are happy to continue that dialogue during the lab session.', 'start': 2408.775, 'duration': 6.006}, {'end': 2420.176, 'text': 'And so our lab today is going to focus on computer vision.', 'start': 2416.553, 'duration': 3.623}, {'end': 2427.4, 'text': 'And as Alexander mentioned, there is another corresponding competition for lab two.', 'start': 2420.516, 'duration': 6.884}, {'end': 2432.604, 'text': 'And we encourage you all to stick around if you wish to ask us questions.', 'start': 2428.181, 'duration': 4.423}, {'end': 2433.725, 'text': 'And thank you again.', 'start': 2433.124, 'duration': 0.601}], 'summary': 'Covered autoencoders, variational autoencoders, and GANs for generative modeling in deep learning. Lab session to focus on computer vision with corresponding competition for lab two.', 'duration': 56.537, 'max_score': 2377.188, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq42377188.jpg'}], 'start': 2229.628, 'title': 'CycleGAN applications', 'summary': "Covers the use of CycleGAN for image transformation, including altering a horse's skin to zebra stripes, transforming backgrounds, and synthesizing speech in another person's voice.
It also discusses speech transformation using CycleGAN, exemplified by converting Alexander's voice into Obama's voice, and concludes with an overview of the covered topics and an invitation to the lab session.", 'chapters': [{'end': 2270.457, 'start': 2229.628, 'title': 'CycleGAN for image transformation', 'summary': "Discusses the use of CycleGAN to transform a horse's skin to zebra stripes, altering the background to a more savanna-like environment, and applying the approach to synthesize speech in another person's voice.", 'duration': 40.829, 'highlights': ["The network learned to transform the horse's skin to zebra stripes and alter the background to a more savanna-like environment.", "The approach of CycleGAN was used to synthesize speech in someone else's voice."]}, {'end': 2433.725, 'start': 2270.578, 'title': 'Speech transformation using CycleGAN', 'summary': "The chapter discusses using CycleGAN to transform audio representations from one voice to appear like representations from another voice, demonstrated by transforming Alexander's voice into Obama's voice, and concludes with an overview of the covered topics and an invitation to the lab session.", 'duration': 163.147, 'highlights': ["The chapter demonstrates using CycleGAN to transform Alexander's voice into Obama's voice by converting audio waveforms into spectrogram images and training the CycleGAN using this information and audio recordings of Obama's voice.", 'The chapter concludes with an overview of the covered topics, including autoencoders, variational autoencoders, and generative adversarial networks, emphasizing the use of deep learning for generative modeling.', 'The chapter invites participation in the upcoming lab session focusing on computer vision and mentions the corresponding competition for lab two.']}], 'duration': 204.097, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rZufA635dq4/pics/rZufA635dq42229628.jpg', 'highlights': ["The network learned to transform the
horse's skin to zebra stripes and alter the background to a more savanna-like environment.", "The approach of CycleGAN was used to synthesize speech in someone else's voice.", "The chapter demonstrates using CycleGAN to transform Alexander's voice into Obama's voice by converting audio waveforms into spectrogram images and training the CycleGAN using this information and audio recordings of Obama's voice.", 'The chapter concludes with an overview of the covered topics, including autoencoders, variational autoencoders, and generative adversarial networks, emphasizing the use of deep learning for generative modeling.', 'The chapter invites participation in the upcoming lab session focusing on computer vision and mentions the corresponding competition for lab two.']}], 'highlights': ['The ability to create convincing fake faces was demonstrated in December 2019, showcasing the recent advancements in deep generative modeling.', 'Deep generative modeling is an emerging field within deep learning that focuses on creating systems that can learn to generate new data based on learned information.', 'Progressively growing GANs iteratively build detailed image generations', 'CycleGAN is used for unpaired image-to-image translation with two generators and discriminators', 'Generative model VAE learns unbiased latent variables like face orientation', 'Generative modeling uncovers data structure and creates fairer datasets', 'Generative models detect outliers in training distributions for robust model training', 'Deep neural networks can generate realistic data fooling all participants.', 'VAEs use reparameterization trick to interpret meaning behind hidden latent variables', 'Adaptive resampling of data during training mitigates algorithmic bias', 'Imbalances in training data affect recognition and classification of faces', 'The reparameterization trick enables end-to-end training in VAEs', 'VAEs enforce constraints on probability distributions of latent variables', 'Tuning a 
single latent variable results in reconstructed outputs with clear semantic meanings', 'Each latent variable aims to be independent and uncorrelated for a rich and compact representation', 'Reconstructed images are generated by keeping other variables fixed, demonstrating disentanglement', 'VAEs and generative models are motivated by the example of facial detection', "The network learned to transform the horse's skin to zebra stripes and alter the background to a more savanna-like environment.", "The approach of CycleGAN was used to synthesize speech in someone else's voice.", "The chapter demonstrates using CycleGAN to transform Alexander's voice into Obama's voice by converting audio waveforms into spectrogram images and training the CycleGAN using this information and audio recordings of Obama's voice.", 'The chapter concludes with an overview of the covered topics, including autoencoders, variational autoencoders, and generative adversarial networks, emphasizing the use of deep learning for generative modeling.', 'The course is centered on deep learning and functional mappings by deep neural networks.', 'Supervised learning focuses on learning functional mapping from data to labels.', 'Generative modeling is valuable for outlier detection in self-driving cars.', 'Generative Adversarial Networks focus on generating new samples similar to existing data', 'Through adversarial competition, the generator aims to create fake data points']}
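The reparameterization trick, which the segments above credit with enabling end-to-end training of VAEs, can be made concrete in a few lines: instead of sampling z directly from N(mu, sigma^2), which is not differentiable with respect to mu and sigma, we draw parameter-independent noise eps ~ N(0, I) and compute z = mu + sigma * eps. The sketch below is illustrative only, written in plain Python under my own assumptions (the function name `reparameterize` is hypothetical and this is not the course's lab code):

```python
import math
import random

def reparameterize(mu, log_var, rnd):
    """Sample z_i = mu_i + sigma_i * eps_i with eps_i ~ N(0, 1).

    Because z is a deterministic function of (mu, log_var) plus
    noise that does not depend on the parameters, gradients can
    flow through mu and log_var during training.
    """
    sigma = [math.exp(0.5 * lv) for lv in log_var]  # log-variance -> std dev
    eps = [rnd.gauss(0.0, 1.0) for _ in mu]         # noise drawn outside the "graph"
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

rnd = random.Random(0)
z = reparameterize([0.0, 1.0, -1.0], [0.0, 0.0, 0.0], rnd)  # sigma = 1 per dimension
```

With log_var = 0 (so sigma = 1), averaging many samples recovers mu, which is the sense in which each latent dimension is centered on the encoder's predicted mean.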