Neural Networks and Backpropagation

  • Computing the analytic gradient for arbitrarily complex functions:

    • What is a computational graph?

      • It represents any function as a graph whose nodes are the operations.
      • Computational graphs lead naturally to a technique called backpropagation, which works even for complex models such as CNNs and RNNs.
    • A simple backpropagation example:

      • Suppose we have \(f(x,y,z) = (x+y)z\).

      • The graph can then be drawn like this:

        x
          \
           (+)--> q ---(*)--> f
          /           /
        y            /
                    z
      • We introduce an intermediate variable q to hold the value of x+y.

      • Then we have:

        q = (x+y)              # dq/dx = 1 , dq/dy = 1
        f = qz                 # df/dq = z , df/dz = q
      • Then:

        df/dq = z
        df/dz = q
        df/dx = df/dq * dq/dx = z * 1 = z       # Chain rule
        df/dy = df/dq * dq/dy = z * 1 = z       # Chain rule
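
The derivation above can be traced numerically. The following is a minimal sketch in plain Python; the input values are arbitrary and chosen only for illustration, and the variable names (`df_dq`, `df_dx`, and so on) are assumptions that mirror the notation used in the notes.

```python
# Forward and backward pass for f(x, y, z) = (x + y) * z.
# The input values are arbitrary, picked only to make the numbers easy to follow.
x, y, z = -2.0, 5.0, -4.0

# Forward pass: evaluate the graph from the inputs to the output.
q = x + y        # intermediate node
f = q * z        # output node

# Backward pass: walk the graph from f back to the inputs.
df_dq = z                # local gradient of the multiply node w.r.t. q
df_dz = q                # local gradient of the multiply node w.r.t. z
df_dx = df_dq * 1.0      # chain rule: df/dq * dq/dx
df_dy = df_dq * 1.0      # chain rule: df/dq * dq/dy

print(f, df_dx, df_dy, df_dz)  # -12.0 -4.0 -4.0 3.0
```

Note that df/dx and df/dy both equal z, exactly as the chain-rule lines state.
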
    • In a computational graph, call each operation node f. For each f we can compute the local gradient before backpropagation starts; during backpropagation, the chain rule combines it with the gradient arriving from upstream to give the gradient of the loss with respect to that node's inputs.

    • You can split a computational graph into operations as simple as you like, but then the graph will have many nodes. If you want fewer nodes, you can group several operations into one larger node, as long as you can compute the gradient of that node.

    • A bigger example:


      • Hint: when one node feeds two downstream nodes, the gradient flowing back into it is the sum of the gradients coming from the two branches.
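
To illustrate the hint about adding gradients, here is a small sketch using a function that is not from the notes: f(x, y) = x*y + x, where x is used by both the multiply node and the add node. The function, the input values, and all variable names are assumptions made for this example. A central-difference estimate is included as a sanity check.

```python
# f(x, y) = x * y + x: the input x feeds two nodes (the multiply and the add).
x, y = 3.0, -2.0

# Forward pass
a = x * y      # first use of x
f = a + x      # second use of x

# Backward pass
df_da = 1.0                              # add node passes the gradient through
df_dx_via_add = 1.0                      # path f <- x
df_dx_via_mul = df_da * y                # path f <- a <- x
df_dx = df_dx_via_add + df_dx_via_mul    # gradients from both branches are added
df_dy = df_da * x

# Numerical check of df/dx with a central difference
h = 1e-6
numeric_df_dx = (((x + h) * y + (x + h)) - ((x - h) * y + (x - h))) / (2 * h)

print(df_dx, df_dy)
```

Dropping either branch would give the wrong df/dx; only the sum agrees with the numerical estimate.
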
    • Modularized implementation: forward/backward API (example code for a multiply gate):

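      class MultiplyGate(object):
        """Multiply gate; x and y are scalars."""
        def forward(self, x, y):
          z = x * y
          self.x = x  # Cache
          self.y = y  # Cache
          # We cache x and y because the local derivatives depend on them.
          return z
        def backward(self, dz):
          # dz is the upstream gradient of the loss w.r.t. this gate's output.
          dx = self.y * dz         # local gradient dz/dx = y
          dy = self.x * dz         # local gradient dz/dy = x
          return [dx, dy]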
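
As a usage sketch, gates written in this forward/backward style can be chained to reproduce the earlier example f = (x+y)z. The `AddGate` class below is a hypothetical companion written for this illustration, and the multiply gate is repeated so the block runs on its own; neither is taken from a specific framework.

```python
class AddGate:
    """Hypothetical add gate in the same forward/backward style."""
    def forward(self, x, y):
        return x + y

    def backward(self, dz):
        # d(x+y)/dx = d(x+y)/dy = 1, so the upstream gradient passes through.
        return [dz, dz]


class MultiplyGate:
    """Multiply gate for scalars, caching inputs for the backward pass."""
    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        return [self.y * dz, self.x * dz]


add, mul = AddGate(), MultiplyGate()
x, y, z = -2.0, 5.0, -4.0       # arbitrary illustrative inputs

# Forward pass through the graph
q = add.forward(x, y)
f = mul.forward(q, z)

# Backward pass in reverse order, starting from df/df = 1
dq, dz = mul.backward(1.0)
dx, dy = add.backward(dq)

print(f, dx, dy, dz)  # -12.0 -4.0 -4.0 3.0
```

Each gate only needs to know its own local gradient; the order of the backward calls is simply the reverse of the forward calls.
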
    • If you look at a deep learning framework, you will find that it follows this modularized implementation, where each operation is a class that defines forward and backward. For example:

      • Multiplication
      • Max
      • Plus
      • Minus
      • Sigmoid
      • Convolution
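
As one more illustration of the same pattern, a sigmoid can be treated as a single node, because its local gradient has the closed form s(x)(1 - s(x)). The `SigmoidGate` name and its interface are assumptions for this sketch and do not correspond to any particular framework's class.

```python
import math


class SigmoidGate:
    """Hypothetical sigmoid gate for a scalar input."""
    def forward(self, x):
        # Cache the output: the local gradient can be written in terms of it.
        self.out = 1.0 / (1.0 + math.exp(-x))
        return self.out

    def backward(self, dout):
        # Local gradient s * (1 - s), multiplied by the upstream gradient dout.
        return self.out * (1.0 - self.out) * dout


gate = SigmoidGate()
s = gate.forward(0.0)
ds = gate.backward(1.0)
print(s, ds)  # 0.5 0.25
```

Grouping the sigmoid into one node keeps the graph small while still giving a gradient that is cheap to compute from the cached output.
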
  • So, to define a neural network as a function:

    • (Before) Linear score function: \(f = Wx\)
    • (Now) 2-layer neural network: \(f = W_2*max(0, W_1*x)\)
      • Where max is the ReLU nonlinearity
    • (Now) 3-layer neural network: \(f = W_3*max(0, W_2*max(0, W_1*x))\)
    • And so on…
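
The 2-layer formula can be sketched as a forward pass in plain Python. The helper names (`matvec`, `relu`, `two_layer_net`) and the weight values are assumptions chosen for illustration; a real implementation would normally use an array library rather than nested lists.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x (a list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    """Elementwise max(0, v)."""
    return [max(0.0, vi) for vi in v]


def two_layer_net(W1, W2, x):
    """Compute f = W2 * max(0, W1 * x)."""
    h = relu(matvec(W1, x))   # hidden layer activations
    return matvec(W2, h)      # output scores


# Arbitrary illustrative weights: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[1.0, -1.0],
      [0.5, 2.0],
      [-1.0, 0.0]]
W2 = [[1.0, 2.0, 3.0]]
x = [2.0, 1.0]

scores = two_layer_net(W1, W2, x)
print(scores)  # [7.0]
```

Here the third hidden unit is negative before the ReLU and is zeroed out, so it contributes nothing to the output. A 3-layer network would apply `relu(matvec(...))` once more before the final `matvec`.
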
  • A neural network is a stack of simple operations that together form a complex operation.


If you found our work useful, please cite it as:

  title   = {Neural Networks and Backpropagation},
  author  = {Chadha, Aman},
  journal = {Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition},
  year    = {2020},
  note    = {\url{}}