CS231n • Generative Models
 Generative Models
 PixelRNN and PixelCNN
 Autoencoders
 Variational Autoencoders (VAE)
 Generative Adversarial Networks (GANs)
 Citation
Generative Models

Generative models are a type of unsupervised learning.

Supervised vs Unsupervised Learning:

| | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Data structure | Data: (x, y), where x is data and y is the label | Data: x. Just data, no labels! |
| Data price | Training data is expensive in a lot of cases. | Training data is cheap! |
| Goal | Learn a function to map x -> y | Learn some underlying hidden structure of the data |
| Examples | Classification, regression, object detection, semantic segmentation, image captioning | Clustering, dimensionality reduction, feature learning, density estimation |


Autoencoders are a feature learning technique.
![]((assets/generativemodels/24.png)
 It contains an encoder and a decoder. The encoder downsamples the image while the decoder upsamples the features.
 The loss used is L2 loss.

Density estimation is where we want to learn/estimate the underlying distribution of the data!
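As a minimal illustration of density estimation, the sketch below fits a 1-D Gaussian to a handful of numbers by maximum likelihood and then draws new samples from the fitted density. The toy data and the helper names (`fit_gaussian`, `log_density`) are assumptions made for this example; real image distributions are far too complex for a single Gaussian.

```python
import math
import random
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood fit of a 1-D Gaussian; returns (mean, std)."""
    mu = statistics.fmean(samples)
    # The population standard deviation (divide by N) is the MLE.
    sigma = statistics.pstdev(samples, mu)
    return mu, sigma


def log_density(x, mu, sigma):
    """Log of the Gaussian density N(x; mu, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


# Hypothetical toy "training data".
data = [1.0, 2.0, 3.0, 4.0, 5.0]
mu, sigma = fit_gaussian(data)

# Generating new samples from the estimated distribution.
rng = random.Random(0)
new_samples = [rng.gauss(mu, sigma) for _ in range(3)]
```

Here the density is explicit: `log_density` can score any point, and sampling is a separate step that reuses the fitted parameters.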

Compared with supervised learning, unsupervised learning still has many open research problems!
Generative Models
 Given training data, generate new samples from the same distribution.
 Addresses density estimation, a core problem in unsupervised learning.
 We have different ways to do this:
 Explicit density estimation: explicitly define and solve for the density of the model, p_model(x).
 Implicit density estimation: learn a model that can sample from p_model(x) without explicitly defining it.
 Why Generative Models?
 Realistic samples for artwork, super-resolution, colorization, etc.
 Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
 Training generative models can also enable inference of latent representations that can be useful as general features.
 Taxonomy of Generative Models: ![]((assets/generativemodels/52.png)
 In this lecture we will discuss PixelRNN/CNN, Variational Autoencoders, and GANs, as they are currently the most popular models in research.
PixelRNN and PixelCNN
 In a fully visible belief network, we use the chain rule to decompose the likelihood of an image x into a product of 1-d distributions:
$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$$
 where p(x) is the likelihood of image x and each factor $p(x_i \mid x_1, \ldots, x_{i-1})$ is the probability of the i-th pixel value given all previous pixels.
 To train the model we maximize the likelihood of the training data. Because the distribution over pixel values is very complex, we express it with a neural network.
 We also need to define an ordering of the pixels, so that "previous pixels" is well defined.
 PixelRNN
 Introduced by [van den Oord et al. 2016]
 Dependency on previous pixels modeled using an RNN (LSTM)
 Generate image pixels starting from corner
 Drawback: sequential generation is slow, because you have to generate the image pixel by pixel!
 PixelCNN
 Also introduced by [van den Oord et al. 2016]
 Still generate image pixels starting from corner.
 Dependency on previous pixels now modeled using a CNN over context region
 Training is faster than PixelRNN (can parallelize convolutions since context region values known from training images)
 Generation must still proceed sequentially, so it is still slow.
 There are some tricks to improve PixelRNN & PixelCNN.
 PixelRNN and PixelCNN can generate good samples and are still an active area of research.
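The chain-rule factorization and the sequential generation it forces can be sketched on a toy example. Below, a hand-written rule (`conditional_p_one`, a hypothetical stand-in for the RNN or CNN) gives the probability that the next binary pixel is 1 given the previous pixels in raster order. This is only an illustration of the autoregressive idea, not PixelRNN or PixelCNN themselves.

```python
import math
import random


def conditional_p_one(prev_pixels):
    """Toy stand-in for the network: p(x_i = 1 | x_1, ..., x_{i-1})."""
    if not prev_pixels:
        return 0.5  # first pixel: no context yet
    # Assumed rule: a pixel tends to repeat its predecessor.
    return 0.9 if prev_pixels[-1] == 1 else 0.1


def log_likelihood(pixels):
    """log p(x) as a sum of per-pixel conditional log-probabilities."""
    total = 0.0
    for i, x in enumerate(pixels):
        p_one = conditional_p_one(pixels[:i])
        total += math.log(p_one if x == 1 else 1.0 - p_one)
    return total


def sample(num_pixels, rng):
    """Generate pixels one at a time; each step needs all earlier pixels."""
    pixels = []
    for _ in range(num_pixels):
        p_one = conditional_p_one(pixels)
        pixels.append(1 if rng.random() < p_one else 0)
    return pixels


generated = sample(16, random.Random(0))
```

During training every pixel of the image is already known, so all the conditionals in `log_likelihood` could be evaluated independently of each other. In `sample`, each pixel has to wait for the ones before it, which is why generation stays slow.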
Autoencoders
 Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.
 Consists of an encoder and a decoder.
 The encoder:
 Converts the input x to the features z. z should be smaller than x to get only the important values out of the input. We can call this dimensionality reduction.
 The encoder can be made with:
 Linear or nonlinear layers (in the early days)
 Deep fully connected NN (later)
 ReLU CNN (what we currently use on images)
 The decoder:
 We want the decoder to map the features we have produced back to an output that is similar or identical to x.
 The decoder can be built with the same techniques as the encoder; currently it uses a ReLU CNN.
 The encoder uses conv layers while the decoder uses deconv (upsampling) layers, so the spatial size decreases and then increases.
 The loss function is the L2 loss:
$$L_i = \lVert y_i - y'_i \rVert^2$$
 After training we throw away the decoder. Now we have the features we need.
 We can use this encoder to build a supervised model.
 The value of this is that it can learn a good feature representation of the input you have.
 Often we have only a small amount of data for the problem we want to solve. One way to tackle this is to train an autoencoder that learns how to extract features from images, and then train on your small dataset on top of that model.
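The encoder, decoder, and L2 loss described above can be sketched with a deliberately tiny linear autoencoder: 2-D inputs, a 1-D feature z, and plain gradient descent. The data, initial weights, learning rate, and all function names are assumptions chosen for this illustration. A real image autoencoder would use conv layers as noted above.

```python
def encode(w_e, x):
    """Encoder: map a 2-D input x to a 1-D feature z."""
    return w_e[0] * x[0] + w_e[1] * x[1]


def decode(w_d, z):
    """Decoder: map the feature z back to a 2-D reconstruction."""
    return [w_d[0] * z, w_d[1] * z]


def l2_loss(x, x_hat):
    """Squared L2 distance between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def mean_loss(w_e, w_d, data):
    return sum(l2_loss(x, decode(w_d, encode(w_e, x))) for x in data) / len(data)


def train(data, steps=500, lr=0.05):
    """Full-batch gradient descent on the mean reconstruction loss."""
    w_e, w_d = [0.1, 0.1], [0.1, 0.1]
    n = len(data)
    for _ in range(steps):
        g_e, g_d = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = encode(w_e, x)
            x_hat = decode(w_d, z)
            r = [x_hat[k] - x[k] for k in range(2)]  # reconstruction residual
            r_dot_wd = r[0] * w_d[0] + r[1] * w_d[1]
            for k in range(2):
                g_d[k] += 2.0 * r[k] * z / n         # dL/dw_d
                g_e[k] += 2.0 * r_dot_wd * x[k] / n  # dL/dw_e
        w_e = [w_e[k] - lr * g_e[k] for k in range(2)]
        w_d = [w_d[k] - lr * g_d[k] for k in range(2)]
    return w_e, w_d


# Toy data lying on a line, so one feature is enough to reconstruct it.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
loss_before = mean_loss([0.1, 0.1], [0.1, 0.1], data)
w_e, w_d = train(data)
loss_after = mean_loss(w_e, w_d, data)

# After training, the decoder is discarded and only the encoder is kept.
features = [encode(w_e, x) for x in data]
```

The `features` list is what a downstream supervised model would consume in place of the raw inputs.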
 The question is: can we generate data (images) from this autoencoder?
Variational Autoencoders (VAE)
 A probabilistic spin on autoencoders that will let us sample from the model to generate data!
 We have z as the feature vector produced by the encoder.
 We then choose the prior p(z) to be simple, e.g. Gaussian.
 This is reasonable for hidden attributes, e.g. pose or how much smile.

 The conditional p(x|z) is complex (it generates an image), so we represent it with a neural network. But we can't compute the integral $\int p(z)\, p(x \mid z)\, dz$ that appears in the following equation: ![]((assets/generativemodels/25.png)
 After working through the derivation that deals with this intractable integral, we should get this: ![]((assets/generativemodels/26.png)
 Variational Autoencoders are an approach to generative models, but their samples are blurrier and lower quality compared to the state of the art (GANs).
 Active areas of research:
 More flexible approximations, e.g. richer approximate posterior instead of diagonal Gaussian
 Incorporating structure in latent variables
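Two ingredients of the usual VAE setup follow directly from the Gaussian prior p(z) and the diagonal Gaussian approximate posterior mentioned above. One is sampling z from the posterior, and the other is the KL divergence between that posterior and the standard normal prior. The sketch below assumes the encoder outputs a mean and a log-variance per latent dimension, and it samples with the reparameterization trick. Neither detail is spelled out in these notes, so treat them as assumptions taken from the standard formulation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for one input, with a 2-D latent space.
mu = [0.3, -1.2]
log_var = [0.0, -0.5]

z = reparameterize(mu, log_var, random.Random(0))
kl = kl_to_standard_normal(mu, log_var)
```

The KL term is zero exactly when the posterior equals the prior, and it grows as the encoder's output drifts away from a standard normal.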
Generative Adversarial Networks (GANs)

GANs don’t work with any explicit density function!

Instead, take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.

Yann LeCun, who oversees AI research at Facebook, has called GANs:

"The coolest idea in deep learning in the last 20 years"


Problem: We want to sample from a complex, high-dimensional training distribution. There is no direct way to do this, as we have discussed!

Solution: Sample from a simple distribution, e.g. random noise. Learn transformation to training distribution.

So we draw a noise input from a simple distribution and feed it to a neural network, which we call the generator network, that should learn to transform it into a sample from the distribution we want.

Training GANs: a two-player game:
 Generator network: try to fool the discriminator by generating real-looking images.
 Discriminator network: try to distinguish between real and fake images.

If we are able to train the Discriminator well then we can train the generator to generate the right images.

The GAN loss function, written as a minimax game, is shown here:
![]((assets/generativemodels/27.png)

For the discriminator, the label of images produced by the generator network is 0 and the label of real images is 1.
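To make the 0/1 labelling concrete, the sketch below evaluates the standard minimax objective, mean log D(x) + mean log(1 - D(G(z))), from lists of discriminator outputs. The function name and the sample values are assumptions for illustration. The discriminator outputs are assumed to be probabilities strictly between 0 and 1.

```python
import math


def discriminator_objective(d_real, d_fake):
    """Value the discriminator tries to maximize.

    d_real: discriminator outputs D(x) on real images (target label 1).
    d_fake: discriminator outputs D(G(z)) on generated images (target label 0).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = discriminator_objective([0.5, 0.5], [0.5, 0.5])

# A discriminator that separates the two classes well scores higher.
confident = discriminator_objective([0.9, 0.95], [0.1, 0.05])
```

The generator pulls the same quantity in the opposite direction by trying to push the values in `d_fake` toward 1.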

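To train the network we alternate between:
 Gradient ascent on the discriminator.
 Gradient ascent on the generator, but with a different objective: maximize the likelihood of the discriminator being wrong, instead of minimizing the likelihood of it being correct.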
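One way to see why the generator is trained with a different objective is to compare gradients when the discriminator confidently rejects a fake. The sketch assumes D(G(z)) = sigmoid(a), where a is the discriminator's raw score (logit) on a generated sample. It then compares the derivative of the original term log(1 - D(G(z))) with that of the alternative log D(G(z)), both with respect to a. The function names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the term the generator would minimize."""
    return -sigmoid(a)


def grad_non_saturating(a):
    """d/da of log(sigmoid(a)), the term the generator maximizes instead."""
    return 1.0 - sigmoid(a)


# Early in training the discriminator easily spots fakes: a is very negative.
a = -5.0
weak_signal = grad_saturating(a)
strong_signal = grad_non_saturating(a)
```

With a = -5 the original term gives almost no gradient, while the alternative gives a gradient close to 1. The generator therefore learns fastest exactly when its samples are worst.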

You can read the full algorithm with the equations here:
![]((assets/generativemodels/28.png)

Aside: Jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes helps training, and is an active area of research.

Convolutional Architectures:
 Generator is an upsampling network with fractionally-strided convolutions.
 Discriminator is a convolutional network.
 Guidelines for stable deep Conv GANs:
 Replace any pooling layers with strided convs (discriminator) and fractionally-strided convs (generator).
 Use batch norm for both networks.
 Remove fully connected hidden layers for deeper architectures.
 Use ReLU activation in the generator for all layers except the output, which uses tanh.
 Use leaky ReLU in the discriminator for all layers.
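The upsampling and downsampling behaviour of the two networks can be checked with the usual output-size formulas for strided and fractionally-strided (transposed) convolutions. The kernel size 4, stride 2, padding 1, and the 4x4 starting size below are assumed, commonly used values rather than settings taken from these notes. The formulas assume no dilation and no extra output padding.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a strided convolution (downsampling)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a fractionally-strided convolution (upsampling)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_sizes(start, num_layers, layer_fn, kernel=4, stride=2, padding=1):
    """Apply the same layer geometry repeatedly and record each spatial size."""
    sizes = [start]
    for _ in range(num_layers):
        sizes.append(layer_fn(sizes[-1], kernel, stride, padding))
    return sizes


# Generator: each fractionally-strided conv doubles the resolution.
generator_sizes = trace_sizes(4, 4, conv_transpose_out)

# Discriminator: each strided conv halves it, with no pooling layers.
discriminator_sizes = trace_sizes(64, 4, conv_out)
```

With these assumed settings the generator grows a 4x4 feature map to 64x64 in four layers, and the discriminator mirrors that path back down.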

2017 is the year of GANs! The field has exploded and there are some really good results.

GANs for all kinds of applications are also an active area of research.

The GAN zoo can be found here: https://github.com/hindupuravinash/the-gan-zoo

Tips and tricks for using GANs: https://github.com/soumith/ganhacks

NIPS 2016 Tutorial GANs: https://www.youtube.com/watch?v=AJVyzd0rqdc
Citation
If you found our work useful, please cite it as:
@article{Chadha2020GenerativeModels,
title = {Generative Models},
author = {Chadha, Aman},
journal = {Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition},
year = {2020},
note = {\url{https://aman.ai}}
}