⚖️ Regularized Logistic Regression
Regularization helps prevent overfitting by penalizing large weights.
Compared to the non-regularized model, the regularized version produces smoother decision boundaries.
Cost Function (Without Regularization)
Recall the logistic regression cost function:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$
Cost Function With Regularization
We add an L2 penalty term:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
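As a sketch, this cost can be computed directly in NumPy. The function name `regularized_cost` and the data layout (a leading column of ones so that x₀ = 1) are illustrative choices, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X is (m, n+1) with a leading column of ones (x_0 = 1);
    theta is (n+1,); theta[0] is the bias and is NOT penalized.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    # Cross-entropy term, averaged over the m examples
    cost = -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    # L2 penalty over theta_1 .. theta_n only (theta[0] excluded)
    cost += (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return cost
```

Note the slice `theta[1:]` in the penalty, which implements the j = 1 to n summation from the formula.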
Regularization Term
The regularization term is:
$$\sum_{j=1}^{n}\theta_j^2$$
The sum runs over the parameters θ1, …, θn.
It explicitly excludes the bias term θ0:
- The regularization sum runs from j=1 to n
- So θ0 is not penalized
Why Exclude θ0?
The bias term θ0 controls where the decision boundary sits; it does not contribute to overfitting the way large feature weights do, so we do not want to shrink it toward zero. Only the other parameters are regularized.
Gradient Descent With Regularization
repeat until convergence:
{
For j=0 (bias), there is no regularization term for θ0:
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
For j≥1 (that is, j=1,2,…,n):
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$$
}
where:
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
These updates look identical to those of regularized linear regression, but the hypothesis hθ(x) is the sigmoid, so the underlying cost function is different.
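The two update cases above can be sketched as a single NumPy step. The function name `gradient_step` is a choice made here, and X is assumed to carry a leading column of ones for x₀:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha, lam):
    """One gradient-descent step of regularized logistic regression.

    theta[0] (the bias) gets no regularization term; theta[1:] do.
    """
    m = len(y)
    error = sigmoid(X @ theta) - y      # h_theta(x^(i)) - y^(i), shape (m,)
    grad = (X.T @ error) / m            # unregularized gradient for all j
    grad[1:] += (lam / m) * theta[1:]   # add (lambda/m) * theta_j for j >= 1 only
    return theta - alpha * grad
```

"Repeat until convergence" then becomes a loop that calls `gradient_step` until the change in theta (or in the cost) falls below a tolerance.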
Simplified Update Rule
You can also rewrite it as:
For j≥1:
$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
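Since 1 − αλ/m is slightly less than 1 for typical α and λ, each step first shrinks θj a little before applying the usual gradient step. The two forms are algebraically identical, which a quick numeric check confirms (all values below are hypothetical placeholders, including `grad_data` standing in for the data-gradient sum):

```python
# Hypothetical values for one parameter theta_j
theta_j, alpha, lam, m = 0.8, 0.1, 0.5, 4
grad_data = 0.3  # placeholder for (1/m) * sum((h - y) * x_j)

# Original form: subtract data gradient plus (lambda/m) * theta_j
form1 = theta_j - alpha * (grad_data + (lam / m) * theta_j)
# Simplified form: shrink theta_j first, then subtract the data gradient
form2 = theta_j * (1 - alpha * lam / m) - alpha * grad_data

assert abs(form1 - form2) < 1e-12
```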
Intuition
Regularization:
- Penalizes large weights
- Reduces model complexity
- Helps prevent overfitting
- Encourages smoother decision boundaries
The regularized model is less likely to overfit compared to the non-regularized one.
Summary
Regularized logistic regression modifies:
- The cost function
- The gradient updates
Key rule:
- Do not regularize θ0
- Regularize all other parameters