Training a Neural Network
In this post, we will put together all the pieces we've learned about neural networks to understand how to train a neural network effectively. We will cover the cost function, backpropagation, gradient checking, and random initialization, along with key intuitions for each step.
Putting It Together
Now that we have covered forward propagation, backpropagation, and gradient checking, let's combine everything into a complete training pipeline.
1. Choose a Network Architecture
First, decide the structure of your neural network:
- Number of layers
- Number of hidden units per layer
- Number of output units
How to Choose the Architecture
- Input layer size = dimension of the feature vector
- Output layer size = number of output classes
- Hidden units:
  - More hidden units usually perform better
  - but they increase computational cost
- Default choice:
  - Use 1 hidden layer
  - If using multiple hidden layers, use the same number of units in each layer
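To make the size and cost trade-off concrete, here is a small sketch that builds the layer sizes for the default architecture (one hidden layer, or several equal-sized ones) and counts the weights it implies, with one bias weight feeding each unit. The helpers `layer_sizes` and `parameter_count` and the example sizes are hypothetical, chosen only for illustration.

```python
# Hypothetical helpers for illustrating architecture choices.
def layer_sizes(n_features, n_classes, hidden_units, hidden_layers=1):
    """Return [s_1, ..., s_L]: input size, equal-sized hidden layers, output size."""
    return [n_features] + [hidden_units] * hidden_layers + [n_classes]


def parameter_count(sizes):
    """Count weights: each layer's s_l units plus a bias unit feed every unit of the next layer."""
    return sum((s_in + 1) * s_out for s_in, s_out in zip(sizes, sizes[1:]))


# Hypothetical example: 400 input features, 10 classes, 25 hidden units
sizes = layer_sizes(400, 10, 25)
print(sizes)                                      # [400, 25, 10]
print(parameter_count(sizes))                     # 10285
print(parameter_count(layer_sizes(400, 10, 50)))  # 20560
```

Doubling the hidden units roughly doubles the number of weights, and with it the work done in every forward and backward pass.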
2. Training a Neural Network
2.1 🎲 Randomly Initialize Weights
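Initialize each $\Theta^{(l)}_{ij}$ to a small random value in $[-\epsilon_{\text{init}}, \epsilon_{\text{init}}]$ (not to zero).
This breaks symmetry: if all weights start equal, every hidden unit in a layer computes the same function and receives the same update, so the units never learn different features.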
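A minimal sketch of random initialization in plain Python, assuming each $\Theta^{(l)}$ is stored as a list of rows with the bias weight in column 0. The value $\epsilon_{\text{init}} = 0.12$ and the function names are assumptions for illustration; `default_epsilon` shows one commonly used heuristic, $\sqrt{6}/\sqrt{s_l + s_{l+1}}$, which gives roughly 0.12 for a layer with 400 inputs and 25 outputs.

```python
import math
import random


def default_epsilon(s_in, s_out):
    """One commonly used heuristic for the init range: sqrt(6) / sqrt(s_in + s_out)."""
    return math.sqrt(6) / math.sqrt(s_in + s_out)


def rand_initialize_weights(s_in, s_out, epsilon_init=0.12, rng=None):
    """Return an s_out x (s_in + 1) matrix (bias column first) with entries
    drawn uniformly from [-epsilon_init, epsilon_init]. Names are hypothetical."""
    rng = rng or random.Random()
    return [[rng.uniform(-epsilon_init, epsilon_init) for _ in range(s_in + 1)]
            for _ in range(s_out)]


rng = random.Random(0)  # seeded so the example is reproducible
theta1 = rand_initialize_weights(3, 5, rng=rng)
print(len(theta1), len(theta1[0]))         # 5 4
print(round(default_epsilon(400, 25), 4))  # 0.1188
```

Because every entry is drawn independently, no two hidden units start with identical incoming weights.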
2.2 ⏩ Forward Propagation (FP)
For each training example $x^{(i)}$, compute the activations $a^{(l)}$ layer by layer, ending with the output $h_\Theta(x^{(i)})$.
This gives the network's prediction.
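The forward pass can be sketched in plain Python as below, assuming sigmoid activations in every layer and weight matrices stored as lists of rows with the bias weight first. The function names and the hand-picked weights are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward_propagate(thetas, x):
    """Return [a^(1), ..., a^(L)] for one example x (bias units not stored).

    thetas[l] is a list of rows; each row starts with the bias weight.
    """
    activations = [list(x)]
    for theta in thetas:
        a_with_bias = [1.0] + activations[-1]  # prepend the bias unit a_0 = 1
        z = [sum(w * a for w, a in zip(row, a_with_bias)) for row in theta]
        activations.append([sigmoid(z_k) for z_k in z])
    return activations


# Hypothetical 2-2-1 network with hand-picked weights
thetas = [
    [[0.1, 0.2, -0.3], [-0.2, 0.4, 0.1]],  # Theta^(1): 2 x 3
    [[0.3, -0.5, 0.2]],                    # Theta^(2): 1 x 3
]
h = forward_propagate(thetas, [1.0, 0.5])[-1]
print(h)  # h_Theta(x): one value in (0, 1) per output unit
```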
2.3 💰 Implement the Cost Function
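Compute the regularized cost:
$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + \big(1-y_k^{(i)}\big)\log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$
This includes:
- Logistic loss summed over all $K$ output units and all $m$ examples
- Regularization term over all non-bias weights ($s_l$ is the number of units in layer $l$, not counting the bias unit)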
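As a minimal sketch, the cost can be computed directly in plain Python, assuming sigmoid output units, one-hot label vectors $y^{(i)}$, and weight matrices stored as lists of rows whose column 0 holds the (unregularized) bias weights. `hypothesis` and `cost` are hypothetical names, and this naive version does not guard against $\log 0$ if an output saturates to exactly 0 or 1.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(thetas, x):
    """Forward-propagate one example and return h_Theta(x), one value per output unit."""
    a = list(x)
    for theta in thetas:
        a_with_bias = [1.0] + a
        a = [sigmoid(sum(w * v for w, v in zip(row, a_with_bias))) for row in theta]
    return a


def cost(thetas, X, Y, lam):
    """Regularized logistic cost J(Theta).

    X: m feature vectors; Y: m one-hot label vectors of length K.
    Column 0 of every Theta holds bias weights, which are not regularized.
    """
    m = len(X)
    loss = 0.0
    for x, y in zip(X, Y):
        h = hypothesis(thetas, x)
        # Logistic loss summed over all K output units
        loss -= sum(y_k * math.log(h_k) + (1 - y_k) * math.log(1 - h_k)
                    for y_k, h_k in zip(y, h))
    reg = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return loss / m + lam / (2 * m) * reg
```

With all weights at zero every output is 0.5, so for $K = 2$ the cost is $2\ln 2$ regardless of $\lambda$, which makes a handy sanity check.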
2.4 ⏪ Backpropagation (BP)
Use backpropagation to compute the partial derivatives:
$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta)$$
This gives the gradients needed for optimization.
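Below is a sketch of backpropagation for a single example in plain Python. It assumes sigmoid activations (so $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$), the logistic cost (so the output error is $\delta^{(L)} = a^{(L)} - y$), and weight matrices stored as lists of rows with the bias weight first. `backprop_example` is a hypothetical name; it returns, for each weight matrix, the outer product of the next layer's error with the previous layer's activations, which is the per-example term summed over the training set.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def backprop_example(thetas, x, y):
    """Backpropagation for one example (x, y); hypothetical helper.

    Returns one matrix per thetas[l]: delta (error of the layer thetas[l] feeds)
    times the transposed activations entering thetas[l], bias unit included.
    """
    # Forward pass: keep each layer's activations with the bias unit prepended
    inputs = [[1.0] + list(x)]
    for theta in thetas:
        a = [sigmoid(sum(w * v for w, v in zip(row, inputs[-1]))) for row in theta]
        inputs.append([1.0] + a)
    output = inputs.pop()[1:]  # a^(L), the hypothesis h_Theta(x)

    # Output-layer error for sigmoid units with the logistic cost
    delta = [h_k - y_k for h_k, y_k in zip(output, y)]
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        a_prev = inputs[l]
        grads[l] = [[d * a_j for a_j in a_prev] for d in delta]
        if l > 0:
            # Error for the previous layer: Theta^T * delta .* a .* (1 - a), bias unit skipped
            delta = [sum(thetas[l][k][j] * delta[k] for k in range(len(delta)))
                     * a_prev[j] * (1 - a_prev[j])
                     for j in range(1, len(a_prev))]
    return grads


# Hypothetical 2-2-1 network with all-zero weights and one example with label 0
thetas = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]
grads = backprop_example(thetas, [1.0, 2.0], [0.0])
print(grads[1])  # [[0.5, 0.25, 0.25]]: output error 0.5 times [1, a_1, a_2]
```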
2.5 🔢 Gradient Checking
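Use numerical approximation to verify backpropagation. Treating all weights as one unrolled vector $\Theta = (\Theta_1, \dots, \Theta_n)$, estimate each partial derivative with a two-sided difference:
$$\frac{\partial}{\partial \Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon}$$
with a small $\epsilon$ (e.g., $10^{-4}$), and check that these estimates match the derivatives from backpropagation.
⚠️ Once verified:
- Disable gradient checking before training
- It is computationally expensive: it needs two evaluations of $J(\Theta)$ per parameter, far slower than backpropagation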
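A minimal sketch of the two-sided difference in plain Python, assuming the weights have been unrolled into one flat list and `J` is any function computing the cost from that list. `numerical_gradient`, `relative_difference`, and the toy cost are hypothetical; the toy cost has a known analytic gradient, so the estimate can be checked. The relative difference $\lVert a - b\rVert / \lVert a + b\rVert$ is one common way to compare the two gradients.

```python
import math


def numerical_gradient(J, theta, epsilon=1e-4):
    """Two-sided finite-difference estimate of the gradient of J at theta.

    theta is a flat list (all weights unrolled into one vector).
    Costs two evaluations of J per parameter, which is why it is slow.
    """
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += epsilon
        minus[i] -= epsilon
        grad.append((J(plus) - J(minus)) / (2 * epsilon))
    return grad


def relative_difference(a, b):
    """||a - b|| / ||a + b||, a scale-free measure of how close two gradients are."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum((x + y) ** 2 for x, y in zip(a, b)))
    return num / den if den > 0 else num


# Toy check: J(t) = t0^2 + 3*t0*t1 has analytic gradient [2*t0 + 3*t1, 3*t0]
def J(t):
    return t[0] ** 2 + 3 * t[0] * t[1]


theta = [1.0, 2.0]
analytic = [2 * theta[0] + 3 * theta[1], 3 * theta[0]]
print(relative_difference(numerical_gradient(J, theta), analytic))  # a very small number
```

For a network, `analytic` would be the unrolled backpropagation gradient and `J` the cost function; a large relative difference points to a bug in backpropagation.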
2.6 ⚙️ Minimize the Cost Function
Use:
- Gradient descent, or
- A built-in optimization algorithm (e.g., Octave's fminunc, or L-BFGS)
to minimize $J(\Theta)$ as a function of the weights $\Theta$, using the gradients from backpropagation.
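Plain batch gradient descent is only a few lines. This sketch assumes a function `grad_J` returning the gradient as a flat list (for a neural network, that would be the backpropagation gradient of the unrolled weights) and a fixed learning rate `alpha`; both names and the toy objective are hypothetical. In Python, `scipy.optimize.minimize` (for example with `method="L-BFGS-B"` and `jac=True`) is one readily available advanced optimizer.

```python
def gradient_descent(grad_J, theta, alpha=0.1, num_iters=1000):
    """Batch gradient descent: repeat theta := theta - alpha * grad_J(theta)."""
    theta = list(theta)
    for _ in range(num_iters):
        g = grad_J(theta)
        theta = [t - alpha * g_i for t, g_i in zip(theta, g)]
    return theta


# Toy objective J(t) = (t0 - 3)^2 + (t1 + 1)^2, whose minimum is at (3, -1)
def grad_J(t):
    return [2 * (t[0] - 3), 2 * (t[1] + 1)]


print(gradient_descent(grad_J, [0.0, 0.0]))  # approximately [3.0, -1.0]
```

A learning rate that is too large makes the updates overshoot and diverge; too small a rate needs many more iterations.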
Training Loop
During training, we iterate over all examples:
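% Assumes a 3-layer network: X (m x n), one-hot Y (m x K), Theta1, Theta2 given
m = size(X, 1);  sigmoid = @(z) 1 ./ (1 + exp(-z));
Delta1 = zeros(size(Theta1));  Delta2 = zeros(size(Theta2));
for i = 1:m
  % Forward propagation: compute activations a^(l), prepending the bias unit
  a1 = [1; X(i,:)'];  a2 = [1; sigmoid(Theta1 * a1)];  a3 = sigmoid(Theta2 * a2);
  % Backpropagation: compute delta terms d^(l) for l = 2,...,L (here L = 3)
  d3 = a3 - Y(i,:)';
  d2 = (Theta2(:,2:end)' * d3) .* a2(2:end) .* (1 - a2(2:end));
  % Accumulate gradients
  Delta2 = Delta2 + d3 * a2';  Delta1 = Delta1 + d2 * a1';
end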
For each example:
- Perform a forward pass
- Compute the error (delta) terms
- Accumulate gradients
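Once the contributions have been accumulated over all $m$ examples (call the sums $\Delta^{(l)}$), they still have to be averaged and regularized to give $\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta)$. The sketch below assumes the regularized cost with the $\frac{\lambda}{2m}$ term and bias weights in column 0, which are not regularized; `regularized_gradient` is a hypothetical name.

```python
def regularized_gradient(Delta, Theta, m, lam):
    """Turn an accumulated Delta^(l) into the gradient D^(l) of J(Theta).

    D_ij = Delta_ij / m                          for j = 0 (bias column, not regularized)
    D_ij = Delta_ij / m + (lam / m) * Theta_ij   for j >= 1
    """
    D = []
    for delta_row, theta_row in zip(Delta, Theta):
        row = [delta_row[0] / m]
        row += [d / m + (lam / m) * t for d, t in zip(delta_row[1:], theta_row[1:])]
        D.append(row)
    return D


print(regularized_gradient([[2.0, 4.0, -6.0]], [[0.5, 1.0, -1.0]], m=2, lam=1.0))
# [[1.0, 2.5, -3.5]]
```

The $\frac{\lambda}{m}\Theta_{ij}$ term is exactly the derivative of the $\frac{\lambda}{2m}\sum \Theta_{ij}^2$ penalty in $J(\Theta)$.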
Final Insight
Neural network training is simply:
- Forward propagation
- Backpropagation
- Gradient-based optimization
Larger deep learning models are trained on this same foundation.
Complete Neural Network Workflow
- Choose architecture
- Initialize weights randomly
- Implement forward propagation
- Implement cost function
- Implement backpropagation
- Perform gradient checking, then disable it
- Optimize using gradient descent or an advanced optimizer
- Train until convergence
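To tie the workflow together, here is a small end-to-end sketch in plain Python: random initialization, forward propagation, the regularized cost, backpropagation, and batch gradient descent on a toy OR dataset. Everything here is an assumption for illustration (function names, the 2-2-1 architecture, $\epsilon_{\text{init}} = 0.12$, learning rate 0.5, 3000 iterations, the data), and gradient checking is left out, as it would be once backpropagation has been verified.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_weights(sizes, rng, eps=0.12):
    """One s_out x (s_in + 1) matrix per pair of layers, entries in [-eps, eps]."""
    return [[[rng.uniform(-eps, eps) for _ in range(s_in + 1)] for _ in range(s_out)]
            for s_in, s_out in zip(sizes, sizes[1:])]


def forward(thetas, x):
    """Return (the input to each Theta with bias prepended, the output h_Theta(x))."""
    layers = [[1.0] + list(x)]
    for theta in thetas:
        a = [sigmoid(sum(w * v for w, v in zip(row, layers[-1]))) for row in theta]
        layers.append([1.0] + a)
    return layers[:-1], layers[-1][1:]


def cost(thetas, X, Y, lam):
    """Regularized logistic cost; bias weights (column 0) are not regularized."""
    m = len(X)
    J = 0.0
    for x, y in zip(X, Y):
        _, h = forward(thetas, x)
        J -= sum(y_k * math.log(h_k) + (1 - y_k) * math.log(1 - h_k) for y_k, h_k in zip(y, h))
    reg = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return J / m + lam / (2 * m) * reg


def gradients(thetas, X, Y, lam):
    """Backpropagation over all examples, then average and regularize (bias column excluded)."""
    m = len(X)
    Delta = [[[0.0] * len(row) for row in theta] for theta in thetas]
    for x, y in zip(X, Y):
        layers, h = forward(thetas, x)
        delta = [h_k - y_k for h_k, y_k in zip(h, y)]  # output error: a^(L) - y
        for l in range(len(thetas) - 1, -1, -1):
            a = layers[l]
            for k, d in enumerate(delta):
                for j, a_j in enumerate(a):
                    Delta[l][k][j] += d * a_j
            if l > 0:
                # Propagate the error back, skipping the bias unit (index 0)
                delta = [sum(thetas[l][k][j] * delta[k] for k in range(len(delta)))
                         * a[j] * (1 - a[j]) for j in range(1, len(a))]
    return [[[d / m + (lam / m * t if j > 0 else 0.0)
              for j, (d, t) in enumerate(zip(d_row, t_row))]
             for d_row, t_row in zip(D_l, theta)]
            for D_l, theta in zip(Delta, thetas)]


def train(X, Y, sizes, lam=0.0, alpha=0.5, iters=3000, seed=0):
    """Random init followed by batch gradient descent on J(Theta)."""
    thetas = init_weights(sizes, random.Random(seed))
    for _ in range(iters):
        grads = gradients(thetas, X, Y, lam)
        thetas = [[[t - alpha * g for t, g in zip(t_row, g_row)]
                   for t_row, g_row in zip(theta, grad)]
                  for theta, grad in zip(thetas, grads)]
    return thetas


# Toy data: logical OR of two binary inputs, one output unit
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0], [1], [1], [1]]
sizes = [2, 2, 1]
thetas = train(X, Y, sizes)
initial = init_weights(sizes, random.Random(0))  # same seed as train's default
print(cost(initial, X, Y, 0.0), cost(thetas, X, Y, 0.0))  # cost before vs. after training
```

The cost after training should be lower than the cost at initialization. For real data, a vectorized implementation would be far faster than these nested Python loops.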
