Neural Networks and Backpropagation
-
Computing the analytic gradient for arbitrarily complex functions:
-
What is a computational graph?
- Used to represent any function, with nodes corresponding to the operations that compose it.
- Using computational graphs leads us naturally to a technique called back-propagation, even for complex models like CNNs and RNNs.
-
A simple back-propagation example:
-
Suppose we have \(f(x,y,z) = (x+y)z\).
-
Then the graph can be represented this way:
```
x ---\
      (+)--> q ---\
y ---/             (*)--> f
z -----------------/
```
-
- We made an intermediate variable `q` to hold the value of `x+y`.
- Then we have:
```
q = x + y   # dq/dx = 1, dq/dy = 1
f = q * z   # df/dq = z, df/dz = q
```
-
Then:
```
df/dq = z
df/dz = q
df/dx = df/dq * dq/dx = z * 1 = z   # Chain rule
df/dy = df/dq * dq/dy = z * 1 = z   # Chain rule
```
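As a quick numeric check (a sketch; the values x = -2, y = 5, z = -4 are just example inputs, not taken from the notes above), the forward and backward passes can be traced with a few lines of Python:

```python
# Forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass (chain rule)
df_dz = q                  # = 3
df_dq = z                  # = -4
df_dx = df_dq * 1.0        # dq/dx = 1  -> -4
df_dy = df_dq * 1.0        # dq/dy = 1  -> -4

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```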
-
- In a computational graph, we call each operation (node) `f`. For each `f` we first calculate the local gradient, and then during back-propagation we compute the gradients with respect to the loss function using the chain rule.
- In a computational graph you can split each operation into pieces as simple as you like, but that produces many nodes. If you want fewer, larger nodes, make sure you can still compute the gradient of each node.
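For instance (a sketch, not from the notes above): the sigmoid \(\sigma(x) = 1/(1+e^{-x})\) could be split into exp, add, and divide nodes, or kept as a single node, because its local gradient has the simple closed form \(\sigma(x)(1-\sigma(x))\):

```python
import math

def sigmoid_forward(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s

def sigmoid_backward(s, ds):
    # One "big" sigmoid node: its local gradient is s * (1 - s),
    # so we never need to back-propagate through exp/add/divide separately.
    return s * (1.0 - s) * ds
```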
-
A bigger example:
![](assets/neuralnets-and-backprop/01.png)
- Hint: when two branches flow backward into the same node (i.e., one variable feeds two downstream nodes), their gradients are added during back-propagation; a small sketch follows.
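A small worked sketch of this hint (the function \(f(x,y) = xy + x\) is just a made-up example in which x feeds two nodes):

```python
x, y = 3.0, -2.0

# Forward pass: x is used by both the multiply node and the add node
a = x * y        # a = -6.0
f = a + x        # f = -3.0

# Backward pass: the two gradient paths into x are summed
df_da = 1.0              # through the add node
dx_via_mul = df_da * y   # path 1: multiply node, da/dx = y
dx_via_add = 1.0         # path 2: add node, df/dx = 1
df_dx = dx_via_mul + dx_via_add   # = y + 1 = -1.0
```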
-
Modularized implementation: forward/backward API (example: a multiply gate):
```python
class MultiplyGate(object):
    """x, y are scalars"""
    def forward(self, x, y):
        z = x * y
        self.x = x  # Cache
        self.y = y  # Cache
        # We cache x and y because the local derivatives contain them.
        return z

    def backward(self, dz):
        dx = self.y * dz  # local gradient dz/dx = y
        dy = self.x * dz  # local gradient dz/dy = x
        return [dx, dy]
```
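A usage sketch for the gate above (the inputs 3.0, -4.0 and the upstream gradient 1.0 are made-up example values):

```python
gate = MultiplyGate()
z = gate.forward(3.0, -4.0)   # forward pass: z = -12.0
dx, dy = gate.backward(1.0)   # backward pass: dx = -4.0 (cached y), dy = 3.0 (cached x)
```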
-
If you look at a deep learning framework, you will find that it follows this modularized implementation, where each class has a definition for forward and backward. For example:
- Multiplication
- Max
- Plus
- Minus
- Sigmoid
- Convolution
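As a sketch (an assumption, mirroring the MultiplyGate API above rather than any particular framework's code), an add gate would simply pass the upstream gradient through to both inputs unchanged:

```python
class AddGate(object):
    """x, y are scalars"""
    def forward(self, x, y):
        # Nothing to cache: the local gradients are constant (1).
        return x + y

    def backward(self, dz):
        dx = 1.0 * dz   # d(x + y)/dx = 1
        dy = 1.0 * dz   # d(x + y)/dy = 1
        return [dx, dy]
```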
-
So, to define a neural network as a function:
- (Before) Linear score function: \(f = Wx\)
- (Now) 2-layer neural network: \(f = W_2 \max(0, W_1 x)\) (see the sketch after this list)
- Where \(\max\) is the ReLU non-linear function
- (Now) 3-layer neural network: \(f = W_3 \max(0, W_2 \max(0, W_1 x))\)
- And so on…
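A minimal NumPy sketch of the 2-layer network \(f = W_2 \max(0, W_1 x)\); the dimensions (3072-dim input, 100 hidden units, 10 scores) are assumed for illustration only and are not specified above:

```python
import numpy as np

x = np.random.randn(3072)               # input vector (assumed size)
W1 = 0.01 * np.random.randn(100, 3072)  # first-layer weights
W2 = 0.01 * np.random.randn(10, 100)    # second-layer weights

h = np.maximum(0, W1 @ x)   # ReLU(W1 x): hidden activations
f = W2 @ h                  # class scores
```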
-
A neural network is a stack of simple operations that together form a complex function.
Citation
If you found our work useful, please cite it as:
@article{Chadha2020NeuralNetworksAndBackpropagation,
title = {Neural Networks and Backpropagation},
author = {Chadha, Aman},
journal = {Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition},
year = {2020},
note = {\url{https://aman.ai}}
}