## Neural Networks and Backpropagation

• Backpropagation lets us compute the analytic gradient of arbitrarily complex functions.

• What is a computational graph?

• A computational graph represents any function as a graph of nodes, where each node is an operation.
• Computational graphs lead naturally to a technique called backpropagation, which works even for complex models like CNNs and RNNs.
• Back-propagation simple example:

• Suppose we have $$f(x,y,z) = (x+y)z$$.

• Then the graph can be represented this way:

```
x --\
     (+)--> q --\
y --/            (*)--> f
z --------------/
```

• We introduce an intermediate variable q to hold the value of x + y.

• Then we have:

```
q = x + y              # dq/dx = 1 , dq/dy = 1
f = q * z              # df/dq = z , df/dz = q
```

• Then:

```
df/dq = z
df/dz = q
df/dx = df/dq * dq/dx = z * 1 = z       # chain rule
df/dy = df/dq * dq/dy = z * 1 = z       # chain rule
```
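As a concrete sketch of the derivation above (the input values are chosen arbitrarily for illustration), the forward and backward passes can be traced by hand:

```python
# Forward pass for f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3.0
f = q * z          # f = -12.0

# Backward pass via the chain rule
df_dq = z          # local gradient of the multiply node
df_dz = q
df_dx = df_dq * 1.0  # dq/dx = 1
df_dy = df_dq * 1.0  # dq/dy = 1
```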

• So in a computational graph, we call each operation (node) f. For each f we calculate the local gradient, and then during backpropagation we compute the gradient with respect to the loss function using the chain rule.

• In a computational graph you can split each operation into pieces as simple as you like, at the cost of having many nodes. If you want fewer, larger nodes, make sure you can still compute the gradient of each such node.

• A bigger example:

![](assets/neuralnets-and-backprop/01.png)

• Hint: when two gradients flow back into the same node during backpropagation, they are added together.
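A minimal sketch of that hint (the function and values here are made up for illustration): when a variable feeds into two branches of the graph, its total gradient is the sum of the gradients flowing back through each branch.

```python
# f(x, y) = (x + y) * x  -- x is used by both the add node and the multiply node
x, y = 3.0, 2.0
q = x + y                  # branch 1: q = 5.0
f = q * x                  # f = 15.0

# Backward pass: the gradients from both uses of x are summed
df_dq = x                  # local gradient of the multiply node w.r.t. q
df_dx_via_q = df_dq * 1.0  # path through the add node (dq/dx = 1)
df_dx_direct = q           # path directly into the multiply node
df_dx = df_dx_via_q + df_dx_direct  # total gradient at x

# Sanity check: f = x^2 + xy, so df/dx = 2x + y = 8.0
```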
• Modularized implementation: forward/backward API (example: a multiply gate):

```python
class MultiplyGate(object):
    """
    x, y are scalars
    """

    def forward(self, x, y):
        z = x * y
        self.x = x  # cache x and y, because the
        self.y = y  # derivatives contain them
        return z

    def backward(self, dz):
        dx = self.y * dz  # dz/dx = y
        dy = self.x * dz  # dz/dy = x
        return [dx, dy]
```

• If you look at a deep learning framework, you will find that it follows this modularized implementation, where each class defines a forward and a backward method. For example:

• Multiplication
• Max
• Plus
• Minus
• Sigmoid
• Convolution
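As a sketch of how another node could follow the same forward/backward API (the class and variable names here are my own, not taken from any particular framework), a sigmoid gate caches its own output, since dσ/dx = σ(x)(1 − σ(x)):

```python
import math

class SigmoidGate(object):
    """Sigmoid node following the same forward/backward API."""

    def forward(self, x):
        # Cache the output s, because the local gradient is s * (1 - s)
        self.s = 1.0 / (1.0 + math.exp(-x))
        return self.s

    def backward(self, dz):
        # Local gradient times the upstream gradient (chain rule)
        return self.s * (1.0 - self.s) * dz
```

For example, `forward(0.0)` returns 0.5, and `backward(1.0)` then returns 0.25, the sigmoid's maximum slope.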
• So to define a neural network as a function:

• (Before) Linear score function: $$f = Wx$$
• (Now) 2-layer neural network: $$f = W_2*max(0, W_1*x)$$
• Where max is the ReLU non-linearity
• (Now) 3-layer neural network: $$f = W_3*max(0, W_2*max(0, W_1*x))$$
• And so on…
• A neural network is a stack of simple operations that together form a complex function.
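The 2-layer formula above can be sketched in plain Python (the weight matrices and input here are hypothetical, chosen only to illustrate the shapes):

```python
def relu(v):
    # max(0, .) applied element-wise
    return [max(0.0, a) for a in v]

def matvec(W, v):
    # Matrix-vector product, one dot product per row
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

# Hypothetical tiny weights: 3 inputs -> 2 hidden units -> 1 output
W1 = [[0.5, -0.2, 0.1],
      [0.3,  0.8, -0.5]]
W2 = [[1.0, -1.0]]

def two_layer_net(x):
    # f = W2 * max(0, W1 * x)
    return matvec(W2, relu(matvec(W1, x)))
```

For x = [1.0, 0.0, 2.0] the hidden pre-activations are [0.7, -0.7]; the ReLU zeroes the second unit, and the output is 0.7.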

## Citation

If you found our work useful, please cite it as:

```bibtex
@article{Chadha2020NeuralNetworksAndBackpropagation,
  title   = {Neural Networks and Backpropagation},
  author  = {Chadha, Aman},
  journal = {Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition},
  year    = {2020},
  note    = {\url{https://aman.ai}}
}
```