• Neural networks are widely used for prediction. What if they could also be used to generate new images, texts, or even audio clips?

• Imagine training a robotic arm to localize objects on a table (in order to grasp them). Collecting real data for this task is expensive: it requires positioning objects on a table, taking pictures, and labeling them with bounding boxes. Alternatively, taking screenshots in a simulation lets you virtually generate millions of labelled images. The downside is that a network trained on simulated data might not generalize to real data. Having a network that generates realistic counterparts of simulated images would be a game changer. This is one example of an application of Generative Adversarial Networks (GANs).

• This topic will give you a thorough grounding on GANs and how to apply them to cutting-edge tasks.

### Motivation

• Are networks capable of generating images of cats they have never seen? Intuitively, they should be. If a cat vs. non-cat classifier generalizes to unseen data, it means that it understands the salient features of the data (i.e. what a cat is and isn’t) instead of overfitting the training data. Similarly, a generative model should be able to generate pictures of cats it has never seen because its complexity (~ number of parameters) doesn’t allow it to memorize the training set.

• For instance, the following cats, cars and faces were generated by Karras et al. 1 using GANs. They do not exist in reality!

### The generator vs. discriminator game

• Although there exist various generative algorithms, this article will focus on the study of GANs.

• A GAN 2 involves two neural networks. The first network, called the “generator” ($G$), aims to generate realistic samples. The second network is a binary classifier called the “discriminator” ($D$), whose goal is to differentiate fake samples (label $0$) from real samples (label $1$).

• These two networks play a game. $D$ alternately receives real samples from a database and fake samples generated by $G$, and has to learn to differentiate them. At the same time, $G$ learns to fool $D$. The game ends when $G$ generates samples that are realistic enough to fool $D$. When training ends successfully, you can use $G$ to generate realistic samples. Here’s an illustration of the GAN game.

• It is common to choose a random code $z$ as input to $G$, such that $x = G(z)$ is a generated image. Later, you will learn alternative designs for $z$ that allow you to choose the features of $x$.
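The game above can be sketched with two tiny networks. The following is a minimal illustration, not a realistic architecture: the dimensions, single hidden layers, and `tanh` activations are all hypothetical choices made only to keep the sketch small. It shows the two roles — $G$ maps a random code $z$ to a sample $x = G(z)$, and $D$ maps any sample to a probability of being real.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration: a 16-dim code z,
# 32-unit hidden layers, and 64-dim "images" x.
Z_DIM, H_DIM, X_DIM = 16, 32, 64

# Generator parameters: G maps a code z to a fake sample x = G(z).
Wg1, bg1 = rng.normal(0, 0.1, (H_DIM, Z_DIM)), np.zeros(H_DIM)
Wg2, bg2 = rng.normal(0, 0.1, (X_DIM, H_DIM)), np.zeros(X_DIM)

# Discriminator parameters: D maps a sample x to a probability in (0, 1).
Wd1, bd1 = rng.normal(0, 0.1, (H_DIM, X_DIM)), np.zeros(H_DIM)
Wd2, bd2 = rng.normal(0, 0.1, (1, H_DIM)), np.zeros(1)

def G(z):
    """Generator: code z -> generated sample x."""
    h = np.tanh(Wg1 @ z + bg1)
    return np.tanh(Wg2 @ h + bg2)

def D(x):
    """Discriminator: sample x -> estimated probability that x is real."""
    h = np.tanh(Wd1 @ x + bd1)
    logit = Wd2 @ h + bd2
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid squashes to (0, 1)

z = rng.normal(size=Z_DIM)   # random code
x = G(z)                     # generated sample
p_real = D(x)                # D's belief that x came from the real data
```

In a trained GAN, $G$ would be a deep convolutional network and $z$ a fresh random draw per sample; the point here is only the interface between the two players.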

### Training GANs

• To train a GAN, you need to optimize two cost functions simultaneously.
• Discriminator cost $J^{(D)}$: $D$ is a binary classifier aiming to map generated inputs $x=G(z)$ to $y = 0$ and real inputs $x=x_{\text{real}}$ to $y = 1$. Thus, the logistic loss (binary cross-entropy loss) is appropriate:

$J^{(D)} = -\frac{1}{m_{\text{real}}}\sum_{i=1}^{m_{\text{real}}} y_{\text{real}}^{(i)}\log (D(x^{(i)})) -\frac{1}{m_{\text{gen}}}\sum_{i=1}^{m_{\text{gen}}} (1-y_{\text{gen}}^{(i)})\log (1-D(G(z^{(i)})))$
• where $m_{\text{real}}$ (resp. $m_{\text{gen}}$) is the number of real (resp. generated) examples in a batch, $y_{\text{gen}} = 0$ and $y_{\text{real}} = 1$.

• Generator cost $J^{(G)}$: Since success is measured by the ability of $G$ to fool $D$, $J^{(G)}$ should be the opposite of $J^{(D)}$:

$J^{(G)} = \frac{1}{m_{\text{gen}}}\sum_{i=1}^{m_{\text{gen}}} \log (1-D(G(z^{(i)})))$

• Note: the first term of $J^{(D)}$ does not appear in $J^{(G)}$ because it is independent of $z$ and will entail no gradient during optimization.

• You can run an optimization algorithm such as Adam 3 simultaneously using two mini-batches of real and fake samples. You can think of it as a two-step process:

1. Forward propagate a mini-batch of real samples and compute $J^{(D)}$. Then, backpropagate to compute $\frac{\partial J^{(D)}}{\partial W_D}$, where $W_D$ denotes the parameters of $D$.
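To make the two costs concrete, the snippet below evaluates $J^{(D)}$ and $J^{(G)}$ on one mini-batch, given only the discriminator's outputs. The scores `d_real` and `d_gen` are made-up values standing in for $D(x^{(i)})$ and $D(G(z^{(i)}))$; everything else follows the formulas above with $y_{\text{real}} = 1$ and $y_{\text{gen}} = 0$.

```python
import numpy as np

# Hypothetical discriminator outputs on one mini-batch:
# d_real[i] = D(x^(i)) on m_real = 3 real samples,
# d_gen[i]  = D(G(z^(i))) on m_gen = 2 generated samples.
d_real = np.array([0.9, 0.8, 0.7])
d_gen = np.array([0.2, 0.4])

# Discriminator cost J^(D): binary cross-entropy with labels
# y_real = 1 (first term) and y_gen = 0 (second term).
J_D = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_gen))

# Generator cost J^(G): the opposite of the second term of J^(D).
# (The first term of J^(D) is dropped: it does not depend on z.)
J_G = np.mean(np.log(1.0 - d_gen))
```

Note how the two players pull on the same term in opposite directions: when $D$ gets better at rejecting fakes (lower `d_gen`), $J^{(G)}$ goes up, pushing $G$ to produce samples that score higher under $D$.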