Cost Function for Neural Networks
The cost function for neural networks generalizes the logistic regression cost to multiple output units and includes regularization over all weights in the network. This post breaks down the cost function, explaining the double and triple summations, and provides intuition for how it works.
Notation
We define:
- $L$ = total number of layers in the network
- $s_l$ = number of units (excluding the bias unit) in layer $l$
- $K$ = number of output units (classes)
For multiclass problems, the output is a $K$-dimensional vector, one component per class:

$$h_\Theta(x) \in \mathbb{R}^K, \qquad (h_\Theta(x))_k = \text{the } k\text{-th output}$$
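For concreteness, here is a minimal sketch of how an integer class label is turned into such a $K$-dimensional target vector (the helper name `one_hot` and the class count below are illustrative, not from the original notes):

```python
import numpy as np

def one_hot(label, num_classes):
    """Return the K-dimensional one-hot target vector for an integer class label."""
    y = np.zeros(num_classes)
    y[label] = 1.0
    return y

# Example: class 2 out of K = 4 classes (0-indexed)
print(one_hot(2, 4))  # [0. 0. 1. 0.]
```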
Recall: Logistic Regression Cost Function
The regularized logistic regression cost is:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\big(1-h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
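A minimal NumPy sketch of this cost (the function names and the convention that `theta[0]` is the unregularized bias term are assumptions of the sketch, not part of the original notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost.

    X: (m, n+1) design matrix with a leading column of ones.
    y: (m,) vector of 0/1 labels.
    Note: the bias term theta[0] is NOT regularized.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    loss = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return loss + reg
```

With `theta = 0`, every prediction is 0.5, so the unregularized cost is exactly $\log 2 \approx 0.693$, a useful sanity check.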
Neural Network Cost Function
For neural networks, the cost function generalizes to:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[\, y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k + (1-y_k^{(i)}) \log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big) \Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^2$$
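Assuming the network outputs have already been computed by forward propagation, the whole cost can be sketched as follows (the matrix names `H` and `Y` and the bias-in-column-0 layout of each weight matrix are assumptions of this sketch):

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Neural network cost J(Theta).

    H: (m, K) matrix of network outputs, H[i, k] = (h_Theta(x^(i)))_k.
    Y: (m, K) matrix of one-hot labels.
    Thetas: list of weight matrices Theta^(l); column 0 holds the bias weights.
    """
    m = Y.shape[0]
    # Double sum: logistic loss over all examples and all output units.
    loss = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Triple sum: squared non-bias weights over all layers (skip column 0).
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return loss + reg
```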
Key Ideas
Double Sum
The term

$$\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[\, y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k + (1-y_k^{(i)}) \log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big) \Big]$$

means:
- Loop over all training examples ($i = 1, \dots, m$)
- Loop over all output units ($k = 1, \dots, K$)
- Compute logistic loss for each output
- Sum them together
This is simply the total loss across all output neurons.
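The steps above can be written as literal nested loops (in practice this would be vectorized; `H` and `Y` are assumed to hold the network outputs and one-hot labels, as in a typical NumPy implementation):

```python
import numpy as np

def double_sum_loss(H, Y):
    """Average logistic loss summed over every example i and output unit k.

    H: (m, K) network outputs, Y: (m, K) one-hot labels.
    """
    m, K = Y.shape
    total = 0.0
    for i in range(m):          # loop over training examples
        for k in range(K):      # loop over output units
            total += -(Y[i, k] * np.log(H[i, k])
                       + (1 - Y[i, k]) * np.log(1 - H[i, k]))
    return total / m
```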
Triple Sum (Regularization)
The term

$$\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^2$$

means:
- Loop over all layers ($l = 1, \dots, L-1$)
- Loop over all non-bias weights in each layer (indices $i$ and $j$)
- Square every weight
- Add them all together
Important:
- Bias weights are not regularized.
- The index $i$ here ranges over units in layer $l$, not over training examples.
- This term regularizes every non-bias weight in the entire network.
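The bullet points above can be expressed as a literal triple loop (the bias-in-column-0 layout of each `Theta` matrix is an assumption of this sketch, matching a common NumPy convention):

```python
import numpy as np

def regularization_term(Thetas, lam, m):
    """Triple sum: squared non-bias weights over all layers.

    Thetas: list of weight matrices; Theta[j, i] maps unit i of layer l
    to unit j of layer l+1, with column i = 0 holding the bias weights.
    """
    total = 0.0
    for Theta in Thetas:              # loop over layers l = 1, ..., L-1
        rows, cols = Theta.shape
        for j in range(rows):         # units in layer l+1
            for i in range(1, cols):  # units in layer l, skipping bias (i = 0)
                total += Theta[j, i] ** 2
    return lam / (2 * m) * total
```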
Intuition
Neural network cost function =
- Logistic regression loss applied to every output unit
- Plus regularization over all weights in the network
