Regularized Linear Regression
Learn how regularization helps prevent overfitting in linear regression by adding a penalty term to the cost function, modifying the gradient descent update rules, and improving model generalization.
⚖️ Regularized Linear Regression
Regularization can be applied to both linear and logistic regression.
We first consider linear regression.
Gradient Descent with Regularization
We modify gradient descent so that the bias parameter $\theta_0$ is not penalized.
We do not regularize $\theta_0$.
Update Rules
Before Regularization (standard gradient descent): Repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

For $j = 0, 1, \dots, n$.
With regularization:

repeat until convergence: {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$

For $j = 1, 2, \dots, n$, add the regularization term:

$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]$$

}
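As a sketch, these update rules might be implemented as follows. The dataset, learning rate, and $\lambda$ below are illustrative assumptions, not values from the notes:

```python
import numpy as np

def gradient_step(theta, X, y, alpha=0.1, lam=1.0):
    """One regularized gradient-descent step for linear regression.

    X is m x (n+1) with a leading column of ones; theta has n+1 entries.
    theta[0] (the bias) is deliberately left out of the penalty.
    """
    m = X.shape[0]
    predictions = X @ theta                  # h_theta(x) for all examples
    gradient = (X.T @ (predictions - y)) / m
    reg = (lam / m) * theta
    reg[0] = 0.0                             # no penalty on theta_0
    return theta - alpha * (gradient + reg)

# Tiny illustrative dataset (y = 2x) with an intercept column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = gradient_step(theta, X, y)
# theta approaches the regularized optimum [4/3, 4/3], not the
# unregularized fit [0, 2], because lambda > 0 shrinks theta_1.
```

Note that with $\lambda > 0$ the iterates converge to the regularized minimum, which deliberately differs from the unregularized least-squares fit.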
Simplified Update Rule
The update for $\theta_j$ (for $j \ge 1$) can be rearranged to:

$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
Intuition
The term

$$\left( 1 - \alpha \frac{\lambda}{m} \right)$$

is slightly less than 1, so each update slightly shrinks $\theta_j$.
This is called weight decay.
The second term is exactly the same as in standard gradient descent.
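To make the shrinkage concrete, here is a tiny numeric sketch; the values of $\alpha$, $\lambda$, and $m$ are assumed for illustration, not taken from the notes:

```python
# Illustrative values for the learning rate, lambda, and m.
alpha, lam, m = 0.01, 1.0, 50

# The multiplicative factor applied to theta_j on every update:
decay = 1 - alpha * lam / m   # 1 - 0.0002 = 0.9998, just below 1
```

Each iteration multiplies $\theta_j$ by this factor before applying the usual gradient term, so the weights decay gently toward zero rather than collapsing.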
Normal Equation with Regularization
Instead of iterative gradient descent, we can use the normal equation.
Without Regularization:

$$\theta = \left( X^T X \right)^{-1} X^T y$$

With Regularization (Ridge Regression):

$$\theta = \left( X^T X + \lambda L \right)^{-1} X^T y$$
This discourages large parameter values and reduces overfitting.
Where the matrix $L$ is:

$$L = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

Properties:
- It is almost the identity matrix, except the top-left element is 0.
- Dimension: $(n+1) \times (n+1)$.
- The first diagonal entry is 0 because there is no regularization for $\theta_0$.
- The remaining diagonal entries are 1 because we regularize $\theta_1$ through $\theta_n$.
This ensures:
- $\theta_0$ (bias term) is not regularized
- All other parameters ($\theta_1, \dots, \theta_n$) are regularized
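A minimal NumPy sketch of the regularized normal equation; the dataset is assumed for illustration:

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Closed-form solution theta = (X^T X + lam * L)^(-1) X^T y,
    where L is the identity matrix with its top-left entry zeroed
    so that the bias term theta_0 is not regularized."""
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Tiny illustrative dataset: intercept column plus one feature.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = ridge_normal_equation(X, y, lam=1.0)
```

Using `np.linalg.solve` instead of computing the inverse directly is the standard numerically stable way to evaluate this closed form.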
Why Regularization Helps
If $m \le n$, then $X^T X$ is non-invertible (singular).
If $m > n$, it may or may not be invertible.
where
- $m$ = number of training examples
- $n$ = number of features
In other terms, if there are no more training examples than features ($m \le n$), then:
$X^T X$ is non-invertible.
However, with regularization we add $\lambda L$ to $X^T X$:

$$X^T X + \lambda L$$

That makes the whole term invertible (for $\lambda > 0$).
This improves numerical stability.
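The effect can be sketched numerically: with more features than examples, $X^T X$ is rank-deficient, but adding $\lambda L$ with $\lambda > 0$ restores full rank. The matrices below are assumed for illustration:

```python
import numpy as np

# m = 2 examples, n = 3 features (plus intercept): X is 2 x 4,
# so X^T X is 4 x 4 with rank at most 2 -- necessarily singular.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.0, 5.0, 6.0, 7.0]])
XtX = X.T @ X
rank_before = np.linalg.matrix_rank(XtX)   # 2 < 4: not invertible

# Add lambda * L, with L the identity minus its top-left entry.
lam = 1.0
L = np.eye(4)
L[0, 0] = 0.0
regularized = XtX + lam * L
rank_after = np.linalg.matrix_rank(regularized)   # 4: full rank
```

Since the regularized matrix has full rank, `np.linalg.solve` on it succeeds where the unregularized system would fail.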
Key Takeaways
Regularization:
- Prevents overfitting
- Shrinks large weights
- Does not penalize $\theta_0$ (the bias term)
Two Methods:
- Gradient Descent (iterative)
- Normal Equation (closed-form)
