# Primers • Backprop Guide

- Background: Backprop
- Why understand Backprop?
- Backprop and Gradient Descent
- Primer: Differential Calculus
- Primer: Chain Rule for Backprop
- (Partial) Derivatives of Standard Layers/Loss Functions
- (Partial) Gradients of Standard Layers/Loss Functions
- References

## Background: Backprop

- From the Wikipedia article on Backprop:

Backpropagation, an abbreviation for “backward propagation of errors”, is a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of a loss function with respect to all the weights in the network. The gradient is fed to the optimization method which in turn uses it to update the weights, in an attempt to minimize the loss function.

- Note that the terms backprop and backward pass are used interchangeably. Technically, you carry out backprop during the backward pass while training your network.

## Why understand Backprop?

- Read Andrej Karpathy’s well-known post “Yes you should understand backprop”, which explains why understanding backpropagation matters and describes it as a leaky abstraction. From the post:

It is easy to fall into the trap of abstracting away the learning process – believing that you can simply stack arbitrary layers together and backprop will “magically make them work” on your data.

## Backprop and Gradient Descent

- Backprop and gradient descent work hand in hand. Backprop computes the gradient of the loss with respect to the weights, and gradient descent, a first-order iterative optimization algorithm for finding a (local) **minimum** of our (differentiable) **loss function**, uses that gradient to update the weights.
- To minimize our loss function using gradient descent, we take steps proportional to the negative of the gradient of the function at the current point.
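The update rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a training loop for a real network: the toy loss `(w - 3)^2`, the function names `loss`, `grad`, and `gradient_descent`, the learning rate, and the step count are all assumptions chosen for the example.

```python
def loss(w):
    """Toy differentiable loss with its minimum at w = 3 (assumed for illustration)."""
    return (w - 3.0) ** 2


def grad(w):
    """Analytic derivative of the toy loss: d/dw (w - 3)^2 = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate=0.1, num_steps=100):
    """Repeatedly take a step proportional to the negative gradient."""
    w = w0
    for _ in range(num_steps):
        w = w - learning_rate * grad(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))  # w_final should end up very close to 3
```

With a learning rate that is too large for this loss (anything above 1.0 here), the same loop would diverge instead of converging, which is why the step size is usually treated as a hyperparameter to tune.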

## Primer: Differential Calculus

- Calculus is the study of continuous change. It has two major sub-fields: *differential calculus*, which studies the rate of change of functions, and *integral calculus*, which studies the area under the curve.
- Differential calculus is at the core of Deep Learning, so it is important to understand what derivatives and gradients are, how they are used in Deep Learning, and what their limitations are.
- For a primer on differential calculus, please refer to Aurélien Géron’s notebook on differential calculus.
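To make the idea of a derivative as a rate of change concrete, the sketch below compares a numerical estimate (a central finite difference) with the analytic derivative. The example function `f(x) = x^2 + sin(x)`, the helper name `numerical_derivative`, and the step size `h` are assumptions picked for illustration.

```python
import math


def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def f(x):
    """Example function (assumed for illustration)."""
    return x ** 2 + math.sin(x)


def f_prime(x):
    """Analytic derivative of f: 2x + cos(x)."""
    return 2.0 * x + math.cos(x)


x0 = 1.5
approx = numerical_derivative(f, x0)
exact = f_prime(x0)
print(approx, exact)  # the two values should agree to several decimal places
```

Comparing analytic gradients against finite differences in this way is also a common sanity check (often called a gradient check) when implementing backprop by hand.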

## Primer: Chain Rule for Backprop

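- In calculus, the chain rule helps compute the derivative of **composite** functions.
- Formally, it states that if $y = f(u)$ and $u = g(x)$ are both differentiable, then:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$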

- Read more about it here.
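As a sketch of how the chain rule drives backprop, the example below differentiates a small composite function by multiplying local derivatives from the output back to the parameters. The setup is an assumption made for illustration: a single neuron `a = sigmoid(w * x + b)` with a squared-error loss, plus hypothetical helper names `forward` and `backward`. The result is checked against a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x, target):
    """Forward pass: z = w*x + b, a = sigmoid(z), loss = (a - target)^2."""
    z = w * x + b
    a = sigmoid(z)
    loss = (a - target) ** 2
    return z, a, loss


def backward(w, b, x, target):
    """Backward pass: apply the chain rule from the loss back to w and b."""
    _, a, _ = forward(w, b, x, target)
    dloss_da = 2.0 * (a - target)   # derivative of the squared error w.r.t. a
    da_dz = a * (1.0 - a)           # derivative of sigmoid w.r.t. z
    dloss_dz = dloss_da * da_dz     # chain rule: dloss/dz = dloss/da * da/dz
    dloss_dw = dloss_dz * x         # dz/dw = x
    dloss_db = dloss_dz * 1.0       # dz/db = 1
    return dloss_dw, dloss_db


w, b, x, target = 0.5, -0.2, 2.0, 1.0
dw, db = backward(w, b, x, target)

# Finite-difference check of both gradients.
eps = 1e-6
dw_numeric = (forward(w + eps, b, x, target)[2] - forward(w - eps, b, x, target)[2]) / (2 * eps)
db_numeric = (forward(w, b + eps, x, target)[2] - forward(w, b - eps, x, target)[2]) / (2 * eps)
print(dw, dw_numeric)
print(db, db_numeric)
```

Each line of `backward` corresponds to one factor in the chain rule, which is the same pattern deep learning frameworks automate layer by layer.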

## (Partial) Derivatives of Standard Layers/Loss Functions

- Sigmoid Function
- tanh
- ReLU
- Cost Function for Logistic Regression
- Cost Function for Support Vector Machines/Hinge Loss