CS231n • Neural Networks and Backpropagation

Computing the analytic gradient for arbitrarily complex functions:

What is a computational graph?
 A computational graph represents any function as a graph of nodes, where each node is a simple operation.
 Computational graphs lead us naturally to a technique called backpropagation, which works even with complex models like CNNs and RNNs.

Backpropagation simple example:

Suppose we have \(f(x,y,z) = (x+y)z\).

Then the graph can be represented this way:

```
x ──┐
    (+)──> q ──┐
y ──┘          (*)──> f
z ─────────────┘
```

We introduce an intermediate variable q to hold the value of x + y. Then we have:

```python
q = x + y   # dq/dx = 1, dq/dy = 1
f = q * z   # df/dq = z, df/dz = q
```

Then:

```python
df/dq = z
df/dz = q
df/dx = df/dq * dq/dx = z * 1 = z   # chain rule
df/dy = df/dq * dq/dy = z * 1 = z   # chain rule
```
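The derivation above can be run numerically. The concrete input values below (x = -2, y = 5, z = -4) are illustrative, not from the notes:

```python
# Worked numeric example of backprop through f(x, y, z) = (x + y) * z.
# Input values are illustrative.
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y        # intermediate node: q = 3
f = q * z        # output: f = -12

# Backward pass (chain rule)
df_dz = q                 # df/dz = q = 3
df_dq = z                 # df/dq = z = -4
df_dx = df_dq * 1.0       # dq/dx = 1, so df/dx = z = -4
df_dy = df_dq * 1.0       # dq/dy = 1, so df/dy = z = -4
```

Each local gradient (dq/dx, dq/dy, df/dq, df/dz) is trivial on its own; the chain rule multiplies them along the path from f back to each input.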


So in a computational graph, we call each operation (node)
f
. For each f
we calculate the local gradient during the forward pass; then, during backpropagation, we compute the gradients with respect to the loss using the chain rule.
In a computational graph you can split each operation into pieces as simple as you want, but that increases the number of nodes. If you want fewer, larger nodes, make sure you can still compute the gradient of each node.

A bigger example:
![](assets/neuralnetsandbackprop/01.png)
 Hint: when one node's output feeds into two downstream nodes, the gradients flowing back from both branches are added together.
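The gradient-adding hint can be checked on a tiny graph where x fans out into two branches. The setup below (a = x*w1, b = x*w2, f = a + b, with example values) is assumed for illustration:

```python
# Sketch: x feeds two nodes, so gradients from both branches sum at x.
# Values and the function f = x*w1 + x*w2 are illustrative.
x, w1, w2 = 3.0, 2.0, -1.0

# Forward pass
a = x * w1
b = x * w2
f = a + b

# Backward pass: the (+) node routes df/da = df/db = 1 to both branches
df_da = 1.0
df_db = 1.0
dx = df_da * w1 + df_db * w2   # gradients from both branches add up

# Sanity check against the direct derivative: f = x*(w1 + w2)
assert dx == w1 + w2
```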

Modularized implementation: forward/backward API (example: multiply gate):

```python
class MultiplyGate(object):
    """ x, y are scalars """
    def forward(self, x, y):
        z = x * y
        # Cache x and y because the local gradients contain them.
        self.x = x
        self.y = y
        return z

    def backward(self, dz):
        dx = self.y * dz   # local gradient dz/dx = y
        dy = self.x * dz   # local gradient dz/dy = x
        return [dx, dy]
```

If you look at a deep learning framework, you will find that it follows this modularized implementation, where each class defines a forward and a backward method. For example:
 Multiplication
 Max
 Plus
 Minus
 Sigmoid
 Convolution
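The sigmoid from the list above fits the same forward/backward API. The class name and caching scheme below are illustrative, mirroring the multiply-gate example:

```python
import math

class SigmoidGate(object):
    """Sketch of a sigmoid node with the same forward/backward API
    as the multiply gate (name and structure are illustrative)."""
    def forward(self, x):
        # Cache the output s, because the local gradient is s * (1 - s).
        self.s = 1.0 / (1.0 + math.exp(-x))
        return self.s

    def backward(self, dz):
        # Chain rule: local gradient times upstream gradient dz.
        return self.s * (1.0 - self.s) * dz
```

Caching the forward output is the design choice here: the sigmoid's derivative can be written entirely in terms of its output, so nothing else needs to be stored.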


So to define a neural network as a function:
 (Before) Linear score function: \(f = Wx\)
 (Now) 2-layer neural network: \(f = W_2 \max(0, W_1 x)\)
 Where max is the ReLU non-linear function
 (Now) 3-layer neural network: \(f = W_3 \max(0, W_2 \max(0, W_1 x))\)
 And so on…

A neural network is a stack of simple operations that together form a complex function.
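The 2-layer form \(f = W_2 \max(0, W_1 x)\) can be sketched in plain Python; the matrix shapes and values below are illustrative, not from the notes:

```python
# Minimal sketch of a 2-layer network forward pass using plain Python lists.
# Weights and input are illustrative.

def matvec(W, x):
    # Matrix-vector product: one dot product per row of W.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def relu(v):
    # Elementwise max(0, .) non-linearity.
    return [max(0.0, v_i) for v_i in v]

def two_layer_net(x, W1, W2):
    h = relu(matvec(W1, x))   # hidden layer: max(0, W1*x)
    return matvec(W2, h)      # scores: W2 * h

# Tiny example: 2 inputs -> 3 hidden units -> 2 scores
W1 = [[1.0, -1.0], [0.5, 0.5], [-2.0, 1.0]]
W2 = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
scores = two_layer_net([1.0, 2.0], W1, W2)
```

Stacking another `relu(matvec(...))` around the hidden layer gives the 3-layer form, and so on.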
Citation
If you found our work useful, please cite it as:
@article{Chadha2020NeuralNetworksAndBackpropagation,
title = {Neural Networks and Backpropagation},
author = {Chadha, Aman},
journal = {Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition},
year = {2020},
note = {\url{https://aman.ai}}
}