• The partial derivative of the logistic regression cost function with respect to $$\theta_j$$ is:
$\frac{\partial J(\theta)}{\partial \theta_j} = \nabla_{\theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$
• Let’s begin with the cost function used for logistic regression, which is the average of the log loss across all training examples, as given below:

$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$
• where the logs are natural logarithms and $$h_{\theta}(x)$$ is defined as:
$\begin{array}{l} h_{\theta}(x)=g(\theta^{T} x) \\ g(z)=\frac{1}{1+e^{-z}} \end{array}$

• We use the shorthand $$\theta x^{(i)}$$ for $$\theta^{T} x^{(i)}$$, with the convention $$x_{0}^{(i)}=1$$:
$\theta x^{(i)} = \theta^{T} x^{(i)} = \theta_{0}+\theta_{1} x_{1}^{(i)}+\cdots+\theta_{n} x_{n}^{(i)}$
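• As a concrete illustration, here is a minimal Python/NumPy sketch of the hypothesis above (the function names g and h and the example values are my own choices; the intercept $$x_{0}=1$$ is folded into the feature vector so the code matches the $$\theta x^{(i)}$$ shorthand):

```python
import numpy as np

def g(z):
    # Sigmoid: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # Hypothesis: h_theta(x) = g(theta^T x); x already includes x_0 = 1
    return g(np.dot(theta, x))

# With theta = 0, h is g(0) = 0.5 regardless of x
print(h(np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, -1.0])))  # 0.5
```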
• Since our original cost function has the form:
$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$
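• Written as code, this cost is a one-liner; the sketch below assumes a design matrix X of shape (m, n+1) whose first column is all 1's (the function and variable names are illustrative, not from the original):

```python
import numpy as np

def cost(theta, X, y):
    # J(theta): average log loss; X includes a leading column of ones
    z = X @ theta                     # theta x^(i) for all examples at once
    h = 1.0 / (1.0 + np.exp(-z))      # h_theta(x^(i))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```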
• Now,
$\begin{array}{c} \log h_{\theta}\left(x^{(i)}\right)=\log \frac{1}{1+e^{-\theta x^{(i)}}}=-\log \left(1+e^{-\theta x^{(i)}}\right) \\ \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)=\log \left(1-\frac{1}{1+e^{-\theta x^{(i)}}}\right)=\log \left(e^{-\theta x^{(i)}}\right)-\log \left(1+e^{-\theta x^{(i)}}\right)=-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right) \end{array}$
• (For the second line, write $$1=\frac{1+e^{-\theta x^{(i)}}}{1+e^{-\theta x^{(i)}}}$$ so the 1's in the numerator cancel, leaving $$\frac{e^{-\theta x^{(i)}}}{1+e^{-\theta x^{(i)}}}$$; then apply $$\log \left(\frac{x}{y}\right)=\log (x)-\log (y)$$.)
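• A quick numerical check of these two identities; the value $$z = 0.7$$ stands in for $$\theta x^{(i)}$$ and is an arbitrary choice:

```python
import numpy as np

z = 0.7                               # arbitrary stand-in for theta x^(i)
h = 1.0 / (1.0 + np.exp(-z))
print(np.log(h),     -np.log1p(np.exp(-z)))       # both ≈ -0.4032
print(np.log(1 - h), -z - np.log1p(np.exp(-z)))   # both ≈ -1.1032
```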
• Plugging the two simplified expressions above into our original cost function, we obtain:
$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)}\left(\log \left(1+e^{-\theta x^{(i)}}\right)\right)+\left(1-y^{(i)}\right)\left(-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)\right)\right]$
• which can be simplified to:

$\boxed{J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta x^{(i)}-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)\right]=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta x^{(i)}-\log \left(1+e^{\theta x^{(i)}}\right)\right]}$
• where the second equality follows from:
$-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)=-\left[\log e^{\theta x^{(i)}}+\log \left(1+e^{-\theta x^{(i)}}\right)\right]=-\log \left(e^{\theta x^{(i)}}\left(1+e^{-\theta x^{(i)}}\right)\right)=-\log \left(1+e^{\theta x^{(i)}}\right)$
• (using $$\theta x^{(i)}=\log e^{\theta x^{(i)}}$$ and $$\log (x)+\log (y)=\log (x y)$$)
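• It is easy to confirm numerically that the boxed simplification agrees with the original log-loss cost; the sketch below does so on random data (the sizes, seed, and names are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(5), rng.normal(size=(5, 2))]   # 5 examples, intercept column
y = rng.integers(0, 2, size=5).astype(float)
theta = rng.normal(size=3)

z = X @ theta
h = 1.0 / (1.0 + np.exp(-z))
original   = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
simplified = -np.mean(y * z - np.log1p(np.exp(z)))
print(np.isclose(original, simplified))          # True
```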

• Now, all you need to do is compute the partial derivative of the boxed equation above w.r.t. $$\theta_{j}$$, using the following two facts:
$\begin{array}{c} \frac{\partial}{\partial \theta_{j}} y^{(i)} \theta x^{(i)}=y^{(i)} x_{j}^{(i)} \\ \frac{\partial}{\partial \theta_{j}} \log \left(1+e^{\theta x^{(i)}}\right)=\frac{x_{j}^{(i)} e^{\theta x^{(i)}}}{1+e^{\theta x^{(i)}}}=x_{j}^{(i)} h_{\theta}\left(x^{(i)}\right) \end{array}$
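• The second derivative identity can be sanity-checked against a central finite difference; the values of theta, x, and j below are arbitrary:

```python
import numpy as np

theta = np.array([0.3, -0.5])
x = np.array([1.0, 2.0])
j, eps = 1, 1e-6

def f(t):
    # log(1 + e^(theta x))
    return np.log1p(np.exp(np.dot(t, x)))

e = np.zeros_like(theta); e[j] = eps
numeric  = (f(theta + e) - f(theta - e)) / (2 * eps)
analytic = x[j] / (1.0 + np.exp(-np.dot(theta, x)))  # x_j * h_theta(x)
print(numeric, analytic)                             # both ≈ 0.6636
```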
• Finally, plugging the two components above into the expression for $$\frac{\partial J(\theta)}{\partial \theta_j}$$, we obtain the end result:
$\boxed{\frac{\partial J(\theta)}{\partial \theta_j}=-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} x_{j}^{(i)}-x_{j}^{(i)} h_{\theta}\left(x^{(i)}\right)\right)=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}}$
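• As a final check, the analytic gradient matches a finite-difference approximation of $$J(\theta)$$ component by component; the sketch below uses random data (sizes, seed, and names are again arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.c_[np.ones(20), rng.normal(size=(20, 3))]  # 20 examples, intercept column
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=4)

def J(t):
    h = 1.0 / (1.0 + np.exp(-(X @ t)))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

h = 1.0 / (1.0 + np.exp(-(X @ theta)))
analytic = X.T @ (h - y) / len(y)      # (1/m) * sum_i (h - y) x_j, all j at once

eps = 1e-6
numeric = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                    for e in np.eye(len(theta))])
print(np.allclose(analytic, numeric))  # True
```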