Cost Function Regularization: Balancing Bias and Variance in Machine Learning Models
Learn how cost function regularization helps prevent overfitting in machine learning models by adding a penalty term to the cost function, controlling model complexity, and improving generalization performance.
⚖️ Cost Function Regularization
If a model is overfitting, we can reduce the influence of certain terms by increasing their cost. This discourages large weights.
Regularization balances:
- Bias
- Variance
General Regularized Cost Function
We can regularize all parameters using a single summation:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

Where the regularization term is:

$$\lambda\sum_{j=1}^{n}\theta_j^2$$

- λ is the regularization parameter that controls the strength of regularization.
- The summation is over j = 1 to n, excluding θ₀.
- This term penalizes large values of θⱼ, encouraging smaller weights and thus simpler models.
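As a concrete sketch (assuming a linear hypothesis hθ(x) = Xθ, a design matrix X whose first column is all ones, and NumPy; the function name is illustrative), the regularized cost can be computed as:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized squared-error cost J(theta).

    X is the m x (n+1) design matrix whose first column is all ones,
    so theta[0] is the intercept and is excluded from the penalty.
    """
    m = len(y)
    errors = X @ theta - y                  # h_theta(x) - y for every example
    penalty = lam * np.sum(theta[1:] ** 2)  # sum over j = 1..n, excluding theta_0
    return (errors @ errors + penalty) / (2 * m)

# Tiny illustration: theta fits the data exactly, so only the penalty remains.
X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.0, 1.0])
print(regularized_cost(theta, X, y, lam=2.0))  # 2 * 1^2 / (2 * 2) = 0.5
```

Note how the slice `theta[1:]` implements "excluding θ₀": only the feature weights are penalized, never the intercept.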
Regularization Parameter λ
Regularization shrinks parameters: the more shrinkage you see, the larger the λ.
Choosing λ correctly is essential for good generalization.
λ controls how curved the decision boundary can be.
Larger λ → stronger regularization
As λ → ∞, all parameters shrink to zero → model becomes too simple → underfitting
- Parameter weights shrink toward zero
- Reduces model complexity and makes the model more rigid/linear
- Underfitting may occur
- Bias increases
- Variance decreases
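The shrinkage effect can be seen directly with the closed-form ridge solution θ = (XᵀX + λI)⁻¹Xᵀy (a sketch on synthetic data; the identity matrix is zeroed in its first entry so that θ₀ is not penalized):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form regularized least squares; theta_0 is left unpenalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
X = np.column_stack([x**d for d in range(6)])  # degree-5 polynomial features
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

for lam in (0.001, 1.0, 1000.0):
    theta = ridge_fit(X, y, lam)
    print(lam, np.linalg.norm(theta[1:]))  # weight norm falls as lambda grows
```

The printed norms decrease monotonically as λ grows, which is exactly the "parameter weights shrink toward zero" behavior listed above.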
Example: with a very large λ, every θⱼ (j ≥ 1) is pushed toward zero, so the hypothesis flattens to h(x) ≈ θ₀ and underfits the data.
Smaller λ (as λ → 0)
→ weaker regularization → less penalty → larger weights
→ in the limit, no regularization → model may overfit
- Parameter weights grow larger
- Model becomes more complex and flexible/curvy
- Risk of overfitting
- Variance increases
- Bias decreases
Small λ → Low bias, high variance (overfitting)
Example: with λ ≈ 0, a high-degree polynomial can chase every training point, producing a wiggly curve that overfits.
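A quick sketch of this regime (illustrative NumPy code: closed-form ridge on synthetic data, with the intercept left unpenalized) shows that a near-zero λ drives the weights up and the training error down, the signature of overfitting:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution; the intercept theta_0 is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
X = np.column_stack([x**d for d in range(8)])  # degree-7 features, only 10 points
y = x + 0.3 * rng.standard_normal(10)          # noisy straight line

theta_small = ridge_fit(X, y, 1e-8)  # almost no regularization
theta_large = ridge_fit(X, y, 10.0)  # strong regularization

err_small = np.mean((X @ theta_small - y) ** 2)
err_large = np.mean((X @ theta_large - y) ** 2)
# Tiny lambda: much larger weights and lower training error,
# because the curve is bending to fit the noise.
print(np.linalg.norm(theta_small[1:]), err_small)
print(np.linalg.norm(theta_large[1:]), err_large)
```

The low training error with tiny λ is deceptive: the underlying data is a noisy line, so the extra wiggles will hurt on unseen points.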
What Happens If λ = 0?
- No regularization is applied
- The model may overfit
- We revert to standard least squares / logistic regression
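This can be checked numerically (a NumPy sketch using the closed-form regularized normal equation; the data is synthetic): with λ = 0 the regularized solution coincides with ordinary least squares.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form regularized least squares: (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30),
                     rng.standard_normal(30),
                     rng.standard_normal(30)])
y = rng.standard_normal(30)

theta_ridge = ridge_fit(X, y, 0.0)
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta_ridge, theta_ols))  # True: lambda = 0 is plain least squares
```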
How to Choose the Best λ
To select the optimal regularization parameter:
- Choose candidate λ values
- Train models for each λ
- Compute cross-validation error (without regularization)
- Select best λ + model
- Evaluate once on test set
1. Create Candidate Values
Example: a doubling grid such as λ ∈ {0, 0.01, 0.02, 0.04, 0.08, …, 10}
2. Train Models
For each value of λ:
- Train model parameters Θ
- Possibly try different model complexities (degrees, architectures, etc.)
3. Compute Cross-Validation Error
Evaluate using:

$$J_{cv}(\Theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_\Theta(x_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2$$
Important:
- Compute cross-validation error without regularization
- That means use λ = 0 when evaluating
This ensures fair comparison between models.
4. Select Best Combination
Choose the model and λ that produce the lowest cross-validation error.
5. Final Evaluation
Using the best:
- Θ
- λ
Evaluate on the test set:

$$J_{test}(\Theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\Theta(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2$$
This measures generalization performance.
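The five steps above can be sketched end to end (an illustrative NumPy-only pipeline: closed-form ridge training, squared-error costs, and a synthetic dataset; a real project would swap in its own model and λ grid):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Train: regularized normal equation (intercept unpenalized)."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def cost(theta, X, y):
    """Unregularized squared-error cost, used for both CV and test scoring."""
    errors = X @ theta - y
    return errors @ errors / (2 * len(y))

def features(x, degree=5):
    return np.column_stack([x**d for d in range(degree + 1)])

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(60)

# Split into train / cross-validation / test sets.
X_tr, y_tr = features(x[:30]), y[:30]
X_cv, y_cv = features(x[30:45]), y[30:45]
X_te, y_te = features(x[45:]), y[45:]

# 1. Candidate lambda values (a doubling grid).
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]

# 2.-4. Train a model per lambda, score each WITHOUT regularization
# on the CV set, and keep the best combination.
best_lam, best_theta = min(
    ((lam, ridge_fit(X_tr, y_tr, lam)) for lam in lambdas),
    key=lambda pair: cost(pair[1], X_cv, y_cv),
)

# 5. One final, untouched evaluation on the test set.
print("best lambda:", best_lam, "test cost:", cost(best_theta, X_te, y_te))
```

Note that `cost` takes no λ at all: the penalty belongs to training only, which is the "use λ = 0 when evaluating" rule from step 3.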
Example: Polynomial Hypothesis
Consider the function:

$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$

If we want the model to behave more like a quadratic function, we can reduce the influence of the higher-order terms:

$$\theta_3 x^3 \quad \text{and} \quad \theta_4 x^4$$
Instead of removing these features, we modify the cost function.
Regularized Cost Function
We minimize:

$$\min_\theta\ \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$

where 1000 stands for any very large penalty constant.
Effect of Large Penalty
Adding large penalty terms on θ₃ and θ₄ forces:

$$\theta_3 \approx 0, \quad \theta_4 \approx 0$$

This reduces the contribution of the terms θ₃x³ and θ₄x⁴.
As a result:
- The hypothesis becomes smoother
- Overfitting decreases
- The curve behaves more like a quadratic function
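A sketch of this effect (NumPy, closed-form solution with a per-coefficient penalty matrix; the dataset and the penalty weight 1000 are illustrative): heavily penalizing only θ₃ and θ₄ drives them toward zero while leaving the quadratic part free.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 40)
X = np.column_stack([x**d for d in range(5)])          # 1, x, x^2, x^3, x^4
y = 1 + 2*x - 3*x**2 + 0.1 * rng.standard_normal(40)   # truly quadratic data

# Penalty matrix: zero on theta_0..theta_2, large (1000) on theta_3 and theta_4.
penalty = np.diag([0.0, 0.0, 0.0, 1000.0, 1000.0])
theta = np.linalg.solve(X.T @ X + penalty, X.T @ y)

theta_free, *_ = np.linalg.lstsq(X, y, rcond=None)     # no penalty, for comparison
print("penalized:  ", np.round(theta, 3))
print("unpenalized:", np.round(theta_free, 3))
# theta_3 and theta_4 are forced near zero, so h(x) behaves like a quadratic.
```

Because only two diagonal entries carry the penalty, the quartic hypothesis keeps its features but effectively collapses to the quadratic θ₀ + θ₁x + θ₂x², which is the smoothing described above.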
