Neural Networks and Backpropagation
-
Computing the analytic gradient for arbitrarily complex functions:
-
What is a computational graph?
- Used to represent any function, with nodes corresponding to the operations that compose it.
- Using computational graphs leads us naturally to a technique called back-propagation, even for complex models like CNNs and RNNs.
-
A simple back-propagation example:
-
Suppose we have \(f(x,y,z) = (x+y)z\).
-
Then the graph can be represented this way:
```
x ---\
      (+)--> q ---\
y ---/             (*)--> f
z -----------------/
```
-
- We made an intermediate variable `q` to hold the value of `x+y`.
- Then we have:
```
q = x + y   # dq/dx = 1, dq/dy = 1
f = q * z   # df/dq = z, df/dz = q
```
-
Then:
```
df/dq = z
df/dz = q
df/dx = df/dq * dq/dx = z * 1 = z   # Chain rule
df/dy = df/dq * dq/dy = z * 1 = z   # Chain rule
```
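As a quick numeric check (a sketch; the values x = -2, y = 5, z = -4 are just example inputs, not taken from the notes above), the forward and backward passes can be traced with a few lines of Python:

```python
# Forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass (chain rule)
df_dz = q                  # = 3
df_dq = z                  # = -4
df_dx = df_dq * 1.0        # dq/dx = 1  -> -4
df_dy = df_dq * 1.0        # dq/dy = 1  -> -4

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```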
-
- In a computational graph, we call each operation (node) `f`. For each `f` we first calculate the local gradient, and then during back-propagation we compute the gradients with respect to the loss function using the chain rule.
- In a computational graph you can split each operation into pieces as simple as you like, but that produces many nodes. If you want fewer, larger nodes, make sure you can still compute the gradient of each node.
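For instance (a sketch, not from the notes above): the sigmoid \(\sigma(x) = 1/(1+e^{-x})\) could be split into exp, add, and divide nodes, or kept as a single node, because its local gradient has the simple closed form \(\sigma(x)(1-\sigma(x))\):

```python
import math

def sigmoid_forward(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s

def sigmoid_backward(s, ds):
    # One "big" sigmoid node: its local gradient is s * (1 - s),
    # so we never need to back-propagate through exp/add/divide separately.
    return s * (1.0 - s) * ds
```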
-
A bigger example:
![](assets/neuralnets-and-backprop/01.png)
- Hint: when two branches flow backward into the same node (i.e., one variable feeds two downstream nodes), their gradients are added during back-propagation; a small sketch follows.
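A small worked sketch of this hint (the function \(f(x,y) = xy + x\) is just a made-up example in which x feeds two nodes):

```python
x, y = 3.0, -2.0

# Forward pass: x is used by both the multiply node and the add node
a = x * y        # a = -6.0
f = a + x        # f = -3.0

# Backward pass: the two gradient paths into x are summed
df_da = 1.0              # through the add node
dx_via_mul = df_da * y   # path 1: multiply node, da/dx = y
dx_via_add = 1.0         # path 2: add node, df/dx = 1
df_dx = dx_via_mul + dx_via_add   # = y + 1 = -1.0
```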
-
Modularized implementation: forward/backward API (example: a multiply gate):
```python
class MultiplyGate(object):
    """x, y are scalars"""
    def forward(self, x, y):
        z = x * y
        self.x = x  # Cache
        self.y = y  # Cache
        # We cache x and y because the local derivatives contain them.
        return z

    def backward(self, dz):
        dx = self.y * dz  # local gradient dz/dx = y
        dy = self.x * dz  # local gradient dz/dy = x
        return [dx, dy]
```
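A usage sketch for the gate above (the inputs 3.0, -4.0 and the upstream gradient 1.0 are made-up example values):

```python
gate = MultiplyGate()
z = gate.forward(3.0, -4.0)   # forward pass: z = -12.0
dx, dy = gate.backward(1.0)   # backward pass: dx = -4.0 (cached y), dy = 3.0 (cached x)
```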
-
If you look at a deep learning framework, you will find that it follows this modularized implementation, where each class has a definition for forward and backward. For example:
- Multiplication
- Max
- Plus
- Minus
- Sigmoid
- Convolution
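As a sketch (an assumption, mirroring the MultiplyGate API above rather than any particular framework's code), an add gate would simply pass the upstream gradient through to both inputs unchanged:

```python
class AddGate(object):
    """x, y are scalars"""
    def forward(self, x, y):
        # Nothing to cache: the local gradients are constant (1).
        return x + y

    def backward(self, dz):
        dx = 1.0 * dz   # d(x + y)/dx = 1
        dy = 1.0 * dz   # d(x + y)/dy = 1
        return [dx, dy]
```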
-
So, to define a neural network as a function:
- (Before) Linear score function: \(f = Wx\)
- (Now) 2-layer neural network: \(f = W_2 \max(0, W_1 x)\) (see the sketch after this list)
- Where \(\max\) is the ReLU non-linear function
- (Now) 3-layer neural network: \(f = W_3 \max(0, W_2 \max(0, W_1 x))\)
- And so on…
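A minimal NumPy sketch of the 2-layer network \(f = W_2 \max(0, W_1 x)\); the dimensions (3072-dim input, 100 hidden units, 10 scores) are assumed for illustration only and are not specified above:

```python
import numpy as np

x = np.random.randn(3072)               # input vector (assumed size)
W1 = 0.01 * np.random.randn(100, 3072)  # first-layer weights
W2 = 0.01 * np.random.randn(10, 100)    # second-layer weights

h = np.maximum(0, W1 @ x)   # ReLU(W1 x): hidden activations
f = W2 @ h                  # class scores
```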
-
A neural network is a stack of simple operations that together form a complex function.
Citation
If you found our work useful, please cite it as:
@article{Chadha2020NeuralNetworksAndBackpropagation,
title = {Neural Networks and Backpropagation},
author = {Chadha, Aman},
journal = {Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition},
year = {2020},
note = {\url{https://aman.ai}}
}