⚖️ Regularized Logistic Regression
Regularization helps prevent overfitting by penalizing large weights.
Compared to the non-regularized model, the regularized version produces smoother decision boundaries.
Cost Function (Without Regularization)
Recall the logistic regression cost function:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$
Cost Function With Regularization
We add an L2 penalty term:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
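As a sketch, this cost can be computed directly in NumPy. The function name `regularized_cost` and the data layout (a leading column of ones so that x₀ = 1) are illustrative choices, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X is (m, n+1) with a leading column of ones (x_0 = 1);
    theta is (n+1,); theta[0] is the bias and is NOT penalized.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    # Cross-entropy term, averaged over the m examples
    cost = -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    # L2 penalty over theta_1 .. theta_n only (theta[0] excluded)
    cost += (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return cost
```

Note the slice `theta[1:]` in the penalty, which implements the j = 1 to n summation from the formula.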
Regularization Term
The regularization term is:
$$\sum_{j=1}^{n}\theta_j^2$$
The sum runs over the parameters θ1, …, θn.
It explicitly excludes the bias term θ0:
- The regularization sum runs from j=1 to n
- So θ0 is not penalized
Why Exclude θ0?
The bias term θ0 controls where the decision boundary sits; it does not contribute to overfitting the way large feature weights do, so we do not want to shrink it toward zero. Only the other parameters are regularized.
Gradient Descent With Regularization
repeat until convergence:
{
For j=0 (bias), there is no regularization term for θ0:
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
For j≥1 (that is, j=1,2,…,n):
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$$
}
where:
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
These updates look identical to those of regularized linear regression, but the hypothesis hθ(x) is the sigmoid, so the underlying cost function is different.
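The two update cases above can be sketched as a single NumPy step. The function name `gradient_step` is a choice made here, and X is assumed to carry a leading column of ones for x₀:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha, lam):
    """One gradient-descent step of regularized logistic regression.

    theta[0] (the bias) gets no regularization term; theta[1:] do.
    """
    m = len(y)
    error = sigmoid(X @ theta) - y      # h_theta(x^(i)) - y^(i), shape (m,)
    grad = (X.T @ error) / m            # unregularized gradient for all j
    grad[1:] += (lam / m) * theta[1:]   # add (lambda/m) * theta_j for j >= 1 only
    return theta - alpha * grad
```

"Repeat until convergence" then becomes a loop that calls `gradient_step` until the change in theta (or in the cost) falls below a tolerance.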
Simplified Update Rule
You can also rewrite it as:
For j≥1:
$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
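Since 1 − αλ/m is slightly less than 1 for typical α and λ, each step first shrinks θj a little before applying the usual gradient step. The two forms are algebraically identical, which a quick numeric check confirms (all values below are hypothetical placeholders, including `grad_data` standing in for the data-gradient sum):

```python
# Hypothetical values for one parameter theta_j
theta_j, alpha, lam, m = 0.8, 0.1, 0.5, 4
grad_data = 0.3  # placeholder for (1/m) * sum((h - y) * x_j)

# Original form: subtract data gradient plus (lambda/m) * theta_j
form1 = theta_j - alpha * (grad_data + (lam / m) * theta_j)
# Simplified form: shrink theta_j first, then subtract the data gradient
form2 = theta_j * (1 - alpha * lam / m) - alpha * grad_data

assert abs(form1 - form2) < 1e-12
```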
Intuition
Regularization:
- Penalizes large weights
- Reduces model complexity
- Helps prevent overfitting
- Encourages smoother decision boundaries
The regularized model is less likely to overfit compared to the non-regularized one.
Summary
Regularized logistic regression modifies:
- The cost function
- The gradient updates
Key rule:
- Do not regularize θ0
- Regularize all other parameters