# CS230 • Interpretability of Neural Networks

- Introduction
- Interpreting the neural networks’ outputs
- Visualizing Neural Networks from the inside
- The deconvolution and its applications

# Introduction

In this topic, we discuss how to interpret the output of trained neural networks. We see how to derive saliency maps from input images and introduce some new ideas for object detection and localization. We then shift our discussion to visualizing neural nets from the inside and interpreting the output of different activations inside a trained neural network. With these tools, we can better understand and interpret the complex black box of deep neural networks.

# Interpreting the neural networks’ outputs

Consider the following situation. We have trained a CNN for animal classification to be used in a pet shop. The manager of the pet shop is a little reluctant to use our CNN classifier, since he has doubts about the decision process of our complex CNN function. We want to explain to the manager about the rationale behind our CNN’s decision and look for different ways to convince the manager that our CNN decides after it properly localizes the animal in the input image. How do we show that the model is actually looking at the animal?

## Building saliency maps through derivative analysis

One way to approach this task is to look at the derivative (gradient) of our trained CNN’s score assigned to a particular animal label (e.g. dog’s score $s_{\text{dog}}$ ) with respect to the input image. Note that by the CNN’s score we mean the pre-softmax layer’s assigned score, since the post-softmax score which has been normalized to report a probability takes all animals’ scores into account. The derivative $\frac{\partial s_{\text{dog}}(x)}{\partial x}$ provides a saliency map indicating which pixels need to be changed at least to affect the class score the most. This saliency map can further be employed for image segmentation, as it provides a measure describing how important each pixel will be for assigning the selected label to that input image.

## Building probability maps via analyzing occlusion sensitivity

Another approach to interpret the CNN’s decision is through an occlusion sensitivity analysis. Here, we block a rectangle with some fixed size in the input image and evaluate how much the occlusion affects the CNN’s score. We evaluate this score for all different ways of positioning the blocking rectangle and build a probability map showing the CNN’s confidence score after blocking the rectangular regions. Since blocking the regions where the animal is located has a higher impact on the confidence score, the resulted probability map will provide a nice visualization of the CNN’s decision-making mechanism.

## Interpreting NNs in order to localize objects

Suppose that the pet shop also wants to localize animals in the input pictures. Can we somehow use the trained CNN for this job? Here we discuss a useful idea for this task. In a typical CNN architecture, we usually flatten the output of the last conv-maxpool layer and then pass the output through a fully-connected layer before the final softmax layer. Note that flattening the conv layer’s output removes the location-based information of the CNN transformation. In order to preserve the local information, we can replace the flattening operation with a global average pooling. The global average pooling maintains the local knowledge of the image. Since we have changed the architecture of the CNN after the last convolution layer, we need to train the new weights for the last fully-connected layer. Then, taking the average based on the new trained weights of the final conv layer mapping will give a good estimate of the location of the particular object. This idea shows hot to turn a classifier CNN to a localizer CNN.

# Visualizing Neural Networks from the inside

Given a trained neural network, we can ask this basic question that how the trained model thinks of a particular object. We discuss two ideas based on gradient ascent algorithm and dataset search tools to address this problem.

## Gradient Ascent Optimizer for Visualizing NNs

To understand how the CNN model thinks of an object (e.g. a dog), we can maximize the score corresponding to that object after applying a regularization penalty to ensure that the optimal solution looks natural. The loss function can therefore be $L = s_{\text{dog}}(x) - \lambda ||x||^2$. We can apply the gradient ascent algorithm to find the optimal solution. To do this, we repeat the following process for a certain number of iterations:

- Forward propagate $x$
- Compute the objective $L$
- Backpropagate to get $dL/dx$
- Update x’s pixels with gradient Ascent

This method can further be applied to the output of any activation of the neural net, i.e. we replace $s_{\text{dog}}(x)$ to $a_j(x)$ with $a^l_j$ denoting the output of the $l$th activation in the CNN’s $j$th layer.

## Dataset Search for Visualizing NNs

Instead of optimizing the input x for visualizing the neural net, we can search the entire dataset and list the top k (e.g. k = 5) images to find the type of the images activating a particular activation the most. For example, after trying this idea on the output of one maxpool layer we might obtain the shirt pictures or the pictures with some certain green-red pattern.

# The deconvolution and its applications

## Introduction

The concept of ‘deconvolution’ can be seen as a transpose convolution, or as an inverse operation of a convolution.

One main motivation for this tool can be found in GANs. How can we make sure that the generator can produce an image from a given vector array? One possible way of doing it would be to have a fully connected layer with the correct output size. However, we are going to see that the same is possible with deconvolutions.

Another application is semantic segmentation, where a system is an encoder/decoder compresses information and then produces a mask that encompasses the input image.

## Motivation

From a trained convolutional network architecture, we would like from a given activation map to trace back all the steps back to the input and plot the pixels that triggered these activations.

In order to do so, we are going to keep the maximum value of the activation map and set other values to zero. In order to trace back the signal to the input, we see that the mathematical concept related to this operation would be a sort of inversion. To do so, we use all inverse operations used in the forward pass in reverse order to find a reconstruction of the input.

## Walkthrough with a 1D example

### Convolution

Suppose we have a vector of 8 elements with a padding of two. In other words, let’s consider the 12-dimensional vector
\begin{align*}
x =
\begin{bmatrix}
0
0
x_1
\vdots
x_8
0
0
\end{bmatrix}
\end{align*}
and perform 1D convolution on it with a filter of size $f = 4$ and stride $s =2$. By using the formula $n_y = \lceil \frac{n_x -f}{s} \rceil +1$, we see that the output shape is of $n_y = 5$.

We can see the convolution operation a matrix-vector operation: $y = W x$ where $W$ is the $5 \times 12$ matrix such that we get the system of equations:
\begin{align*}
y_1 = w_1 \cdot 0 + w_2 \cdot 0 + w_3 \cdot x_1 + w_4 \cdot x_2
y_2 = w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3 + w_4 \cdot x_4
\vdots
y_5 = w_1 \cdot x_7 + w_2 \cdot x_8 + w_3 \cdot 0 + w_4 \cdot 0
\end{align*}

### Deconvolution

The interesting part of this observation is that we can give a simple meaning to the inverse operation of the convolution operation, namely the inverse of $W$ through the equation $x = W^{-1} y$.

For now, let’s assume that $W$ is an orthogonal matrix, or in other words that $W^{-1} = W^T$. We can also characterize these matrices by the fact that rows are two by two orthogonal.

In the example of an edge detector, we would have $(w_1, w_2, w_3, w_4) = (-1, 0, 0, 0,1)$, which indeed gives an orthogonal matrix. With this assumption, we can see that it is very simple to reconstruct $x$ through the matrix-vector multiplication $x = W^T y$.

However, we note that there is one caveat to this approach: each dot product done between rows of the matrix $W^T$ and the vector $y$ contain just parts of the filter weights, which prevents us from having an implementation that is as efficient as the convolution operation.

### Trick: subpixel convolution

The idea is to tweak the convolution operation in a way so that the deconvolution operation can be done in a similar fashion. We do so by introducing some padding on the $y$ vector:
\begin{align*}
0
0
0
y_1
0
y_2
0
\vdots
y_5
0
0
0
\end{align*}

That way, the matrix involved in the operation mapping $y$ to $x$ is the same we had during the convolution operation except that:

- the weights at each row of the matrix are flipped in order (i.e. $(w_1,w_2,w_3,w_4)$ becomes $(w_4,w_3,w_2,w_1)$)
- there is no more ‘stride’ between two consecutive rows of the matrix

## Unpooling

Similarly, the unpooling operation is the inverse operation of max-pool. To do so, we use the mask (called ‘switches’) that selected maximum entries in the pooling operations in order to restore their original positions. We then set the other values to 0.