Primers • Partial Derivative of the Cost Function for Logistic Regression
- The partial derivative of the logistic regression cost function with respect to \(\theta_{j}\) is:
\[\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\]
- Let’s begin with the cost function used for logistic regression, which is the average of the log loss across all training examples, as given below:
\[J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]\]
- where the logs are natural logarithms and \(h_{\theta}(x)\) is the sigmoid (logistic) function, defined as:
\[h_{\theta}(x)=\frac{1}{1+e^{-\theta x}}\]
- We use the notation \(\theta x^{(i)}\) as shorthand for the inner product \(\theta^{\top} x^{(i)}=\sum_{j} \theta_{j} x_{j}^{(i)}\).
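- As a quick numerical companion to the definitions above, here is a minimal NumPy sketch (an illustration added for concreteness, not part of the derivation); the names `sigmoid`, `cost`, `X`, `y`, and `theta`, as well as the tiny dataset, are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + exp(-theta x)), applied elementwise to z = X @ theta
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum_i [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ]
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Tiny made-up dataset: 4 examples, an intercept column plus 2 features.
X = np.array([[1.0,  0.5,  1.2],
              [1.0, -1.0,  0.3],
              [1.0,  2.0, -0.7],
              [1.0,  0.1,  0.9]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.array([0.1, -0.2, 0.3])

print(cost(theta, X, y))  # prints the scalar average log loss for this theta
```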
- Since our original cost function is of the form given above, we first need to simplify the two log terms \(\log \left(h_{\theta}\left(x^{(i)}\right)\right)\) and \(\log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\).
- Now,
\[\log \left(h_{\theta}\left(x^{(i)}\right)\right)=\log \left(\frac{1}{1+e^{-\theta x^{(i)}}}\right)=-\log \left(1+e^{-\theta x^{(i)}}\right)\]
and
\[\log \left(1-h_{\theta}\left(x^{(i)}\right)\right)=\log \left(\frac{e^{-\theta x^{(i)}}}{1+e^{-\theta x^{(i)}}}\right)=-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)\]
- Plugging in the two simplified expressions above into our original cost function, we obtain:
\[J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(1+e^{-\theta x^{(i)}}\right)+\left(1-y^{(i)}\right)\left(-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)\right)\right]\]
which can be simplified to:
\[\boxed{J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta x^{(i)}-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)\right]=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta x^{(i)}-\log \left(1+e^{\theta x^{(i)}}\right)\right]}\]
- where the second equality follows from:
\[-\theta x^{(i)}-\log \left(1+e^{-\theta x^{(i)}}\right)=-\left[\log e^{\theta x^{(i)}}+\log \left(1+e^{-\theta x^{(i)}}\right)\right]=-\log \left(e^{\theta x^{(i)}}\left(1+e^{-\theta x^{(i)}}\right)\right)=-\log \left(1+e^{\theta x^{(i)}}\right)\]
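- The boxed simplification is easy to sanity-check numerically. The sketch below (again an illustrative addition with made-up data; all names are arbitrary) evaluates both the original log-loss form and the simplified form \(-\frac{1}{m} \sum_{i}\left[y^{(i)} \theta x^{(i)}-\log \left(1+e^{\theta x^{(i)}}\right)\right]\) and confirms they agree:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # 5 made-up examples, 3 features
y = rng.integers(0, 2, size=5).astype(float)
theta = rng.normal(size=3)

z = X @ theta                             # z_i = theta x^(i)
m = len(y)
h = sigmoid(z)

# Original average log loss vs. the simplified (boxed) form.
original = -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
simplified = -(1.0 / m) * np.sum(y * z - np.log1p(np.exp(z)))

print(np.isclose(original, simplified))   # True
```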
- Now, all you need to do is compute the partial derivative of the boxed equation above w.r.t. \(\theta_{j}\), using the following two components:
\[\frac{\partial}{\partial \theta_{j}}\, y^{(i)} \theta x^{(i)}=y^{(i)} x_{j}^{(i)}\]
and
\[\frac{\partial}{\partial \theta_{j}} \log \left(1+e^{\theta x^{(i)}}\right)=\frac{x_{j}^{(i)} e^{\theta x^{(i)}}}{1+e^{\theta x^{(i)}}}=x_{j}^{(i)} h_{\theta}\left(x^{(i)}\right)\]
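- The first component is immediate; the second can be double-checked with a centered finite difference, as in the sketch below (an illustrative addition; the example values of `x`, `theta`, and the coordinate `j` are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Check d/d(theta_j) log(1 + exp(theta x)) == x_j * h_theta(x)
# for one made-up example x, parameter vector theta, and coordinate j.
x = np.array([1.0, -0.4, 2.0])
theta = np.array([0.2, 0.7, -0.3])
j, eps = 1, 1e-6

f = lambda t: np.log1p(np.exp(t @ x))           # log(1 + e^{theta x})
t_plus, t_minus = theta.copy(), theta.copy()
t_plus[j] += eps
t_minus[j] -= eps

numeric = (f(t_plus) - f(t_minus)) / (2 * eps)  # centered finite difference
analytic = x[j] * sigmoid(theta @ x)            # x_j * h_theta(x)

print(np.isclose(numeric, analytic))            # True
```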
- Finally, plugging the two components above into the expression for \(\frac{\partial J(\theta)}{\partial \theta_j}\), we obtain the end result:
\[\frac{\partial J(\theta)}{\partial \theta_{j}}=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} x_{j}^{(i)}-x_{j}^{(i)} h_{\theta}\left(x^{(i)}\right)\right]=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\]
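- As a final check (an illustrative addition, not part of the original derivation), the sketch below compares this analytic gradient with a centered finite-difference approximation of \(J(\theta)\) on made-up data; all function names and values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta): average log loss over the m examples.
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # dJ/dtheta_j = (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), all j at once.
    m = len(y)
    return X.T @ (sigmoid(X @ theta) - y) / m

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))                  # 20 made-up examples, 4 features
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=4)

# Centered finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(len(theta)):
    e = np.zeros_like(theta)
    e[j] = eps
    numeric[j] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * eps)

print(np.allclose(numeric, gradient(theta, X, y)))  # True
```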