• The partial derivative of the logistic regression cost function with respect to \(\theta_j\) is:
\[\frac{\partial J(\theta)}{\partial \theta_j} = \nabla_{\theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\]
  • Let’s begin with the cost function used for logistic regression, which is the average of the log loss across all training examples, as given below:

    \[J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]\]
    • where the logs are natural logarithms and \(h_{\theta}(x)\) is defined as:
    \[\begin{array}{l} h_{\theta}(x)=g(\theta^{T} x) \\ g(z)=\frac{1}{1+e^{-z}} \end{array}\]
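    • As a quick sanity check on these definitions, here is a minimal NumPy sketch of the sigmoid and the hypothesis; the names `sigmoid` and `hypothesis` are illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); theta and x are 1-D arrays of the same length
    return sigmoid(np.dot(theta, x))
```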

  • We use the shorthand \(\theta x^{(i)}\) for \(\theta^{T} x^{(i)}\), with the intercept feature \(x_{0}^{(i)}=1\):
\[\theta x^{(i)} = \theta_{0}+\theta_{1} x_{1}^{(i)}+\cdots+\theta_{n} x_{n}^{(i)}\]
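  • With that shorthand, the cost above can be computed directly over a design matrix whose first column is the intercept feature. A minimal sketch is given below; the helper name `log_loss_cost` is chosen for illustration:

```python
import numpy as np

def log_loss_cost(theta, X, y):
    # X: (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    # y: (m,) vector of 0/1 labels; theta: (n+1,) parameter vector
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x^(i)) for every example
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```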
  • Since our original cost function is of the form:
\[J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]\]
  • Now,
\[\begin{array}{c} \log h_{\theta}\left(x^{(i)}\right)=\log \frac{1}{1+e^{-\theta x^{(i)}}}=-\log \left(1+e^{-\theta x^{(i)}}\right) \\ \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)=\log \left(1-\frac{1}{1+e^{-\theta x^{(i)}}}\right)=\log \left(e^{-\theta x^{(i)}}\right)-\log \left(1+e^{-\theta x^{(i)}}\right)=-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right) \end{array}\]
\[\text{because } 1-\frac{1}{1+e^{-\theta x^{(i)}}}=\frac{e^{-\theta x^{(i)}}}{1+e^{-\theta x^{(i)}}}, \text{ and then } \log \left(\frac{x}{y}\right)=\log (x)-\log (y)\]
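  • Both identities are easy to verify numerically at an arbitrary value of \(\theta x^{(i)}\); the quick sketch below just evaluates each side (variable names are illustrative):

```python
import numpy as np

z = 0.7                                   # an arbitrary value of theta x^(i)
h = 1.0 / (1.0 + np.exp(-z))              # h_theta(x^(i))

print(np.isclose(np.log(h), -np.log(1 + np.exp(-z))))          # log h = -log(1 + e^{-z})
print(np.isclose(np.log(1 - h), -z - np.log(1 + np.exp(-z))))  # log(1 - h) = -z - log(1 + e^{-z})
```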
  • Plugging the two simplified expressions above into our original cost function, we obtain:
\[J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)}\left(\log \left(1+e^{-\theta x^{(i)}}\right)\right)+\left(1-y^{(i)}\right)\left(-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)\right)\right]\]
  • which can be simplified to:

    \[\boxed{J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta x^{(i)}-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)\right]=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta x^{(i)}-\log \left(1+e^{\theta x^{(i)}}\right)\right]}\]
    • where the second equality follows from:
    \[-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)=-\left[\log e^{\theta x^{(i)}}+\log \left(1+e^{-\theta x^{(i)}}\right)\right]=-\log \left(1+e^{\theta x^{(i)}}\right)\] \[\text { because, } \log (x)+\log (y)=\log (x y)\]
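    • As a sanity check, the boxed form can be compared numerically with the original log-loss cost on random data; the sketch below assumes a design matrix whose first column is the intercept feature \(x_0^{(i)}=1\):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])   # design matrix with x_0 = 1
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n + 1)

z = X @ theta                                    # theta x^(i) for every example
h = 1.0 / (1.0 + np.exp(-z))                     # h_theta(x^(i))
original   = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
simplified = -np.mean(y * z - np.log(1 + np.exp(z)))
print(np.isclose(original, simplified))          # True: the two forms agree
```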

  • Now, all that remains is to compute the partial derivative of the boxed equation above w.r.t. \(\theta_{j}\), using the following:
\[\begin{array}{c} \frac{\partial}{\partial \theta_{j}} y^{(i)} \theta x^{(i)}=y^{(i)} x_{j}^{(i)} \\ \frac{\partial}{\partial \theta_{j}} \log \left(1+e^{\theta x^{(i)}}\right)=\frac{x_{j}^{(i)} e^{\theta x^{(i)}}}{1+e^{\theta x^{(i)}}}=x_{j}^{(i)} h_{\theta}\left(x^{(i)}\right) \end{array}\]
  • Finally, plugging the two components above into the derivative of the boxed expression, and distributing the leading \(-\frac{1}{m}\), we obtain the end result, which the sketch below also verifies numerically:
\[\boxed{\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}}\]
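  • The boxed gradient (including the \(\frac{1}{m}\) factor) can be checked against a central finite-difference approximation of \(J(\theta)\); this is a minimal sketch, and the helper names `cost` and `gradient` are illustrative:

```python
import numpy as np

def cost(theta, X, y):
    # original log-loss cost J(theta)
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), for every j at once
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (h - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((40, 1)), rng.normal(size=(40, 2))])   # x_0 = 1 prepended
y = rng.integers(0, 2, size=40).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([(cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(gradient(theta, X, y), numeric))   # True: analytic gradient matches
```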
