Putting It Together — Training a Neural Network
In this post, we will put together all the pieces we've learned about neural networks to understand how to train a neural network effectively. We will cover the cost function, backpropagation, gradient checking, and random initialization, along with key intuitions for each step.
Now that we have covered forward propagation, backpropagation, and gradient checking, let’s combine everything into a complete training pipeline.
Step 1: Choose a Network Architecture
First, decide the structure of your neural network:
- Number of layers
- Number of hidden units per layer
Guidelines
- Input layer size = dimension of feature vector
- Output layer size = number of classes
- Hidden units:
- More hidden units usually improve accuracy
- But they increase computational cost
- Default choice:
- Use 1 hidden layer
- If using multiple hidden layers, use the same number of units in each layer
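The guidelines above can be sketched as a small helper. This is a minimal illustration, not part of the original notes; the function name `make_architecture` and the default of 25 hidden units are assumptions chosen for the example.

```python
def make_architecture(n_features, n_classes, n_hidden_layers=1, n_hidden_units=25):
    # Input layer = feature dimension, output layer = number of classes,
    # and the same number of units in every hidden layer.
    return [n_features] + [n_hidden_units] * n_hidden_layers + [n_classes]

# e.g. 400 input features, 10 classes, one hidden layer of 25 units
sizes = make_architecture(400, 10)  # [400, 25, 10]
```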
Step 2: Training a Neural Network
1️⃣ Randomly Initialize Weights
Initialize each weight $\Theta^{(l)}_{ij}$ to a small random value in $[-\epsilon, \epsilon]$ (not to zero).
This breaks symmetry between hidden units and allows learning.
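A minimal NumPy sketch of random initialization, assuming a network with one hidden layer; the layer sizes (400 inputs, 25 hidden units, 10 classes) and $\epsilon = 0.12$ are illustrative choices, not values from the original notes.

```python
import numpy as np

def rand_init(l_in, l_out, eps=0.12):
    # Weights mapping l_in units (plus a bias unit) to l_out units,
    # drawn uniformly from [-eps, eps] so no two units start identical.
    return np.random.uniform(-eps, eps, size=(l_out, l_in + 1))

Theta1 = rand_init(400, 25)  # 400 input features -> 25 hidden units
Theta2 = rand_init(25, 10)   # 25 hidden units -> 10 output classes
```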
2️⃣ Forward Propagation
For each training example $(x^{(i)}, y^{(i)})$, compute the activations of every layer, ending with the output $h_\Theta(x^{(i)})$.
This gives the network’s prediction.
3️⃣ Implement the Cost Function
Compute:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + (1-y_k^{(i)})\log\big(1-(h_\Theta(x^{(i)}))_k\big)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^2$$
This includes:
- Logistic loss over all output units
- Regularization term
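As a sketch, the two pieces of the cost (logistic loss plus the regularization term, which skips the bias weights) could be computed like this; `nn_cost` and its argument layout are assumptions for the example.

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    # H, Y: (m, K) arrays of predictions and one-hot labels.
    m = Y.shape[0]
    logistic = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularize every weight except the bias column of each Theta.
    reg = (lam / (2 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in thetas)
    return logistic + reg
```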
4️⃣ Backpropagation
Use backpropagation to compute the partial derivatives $\dfrac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta)$.
This gives the gradients needed for optimization.
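For a single training example and a one-hidden-layer sigmoid network, the backward pass can be sketched as below; `backprop_one` is an illustrative name, and the delta formulas follow the standard sigmoid-derivative form $a \cdot (1 - a)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one(x, y, Theta1, Theta2):
    # Forward pass (bias units prepended at each layer)
    a1 = np.concatenate(([1.0], x))
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
    a3 = sigmoid(Theta2 @ a2)
    # Backward pass: output error, then hidden error (bias entry dropped)
    d3 = a3 - y
    d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1.0 - a2[1:])
    # Per-example gradient contributions for Theta1 and Theta2
    return np.outer(d2, a1), np.outer(d3, a2)
```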
5️⃣ Gradient Checking
Use the two-sided numerical approximation to verify backpropagation: perturb one parameter at a time and check that $\dfrac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}$ is close to the gradient computed by backpropagation.
Once verified:
- Disable gradient checking
- It is computationally expensive
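A minimal numerical gradient checker, looping over each parameter with a two-sided difference (which is why it is expensive); the function name and $\epsilon = 10^{-4}$ are illustrative.

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    # (J(theta + eps) - J(theta - eps)) / (2 * eps), one component at a time.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp.flat[i] += eps
        tm.flat[i] -= eps
        grad.flat[i] = (J(tp) - J(tm)) / (2 * eps)
    return grad
```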
6️⃣ Minimize the Cost Function
Use:
- Gradient descent, or
- A built-in optimization routine (e.g., an advanced optimizer such as conjugate gradient or L-BFGS)
to minimize $J(\Theta)$ as a function of the parameters $\Theta$.
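The gradient descent option can be sketched in a few lines; the learning rate and iteration count are illustrative defaults, and `grad` stands in for whatever function returns the backpropagation gradients.

```python
import numpy as np

def gradient_descent(theta, grad, alpha=0.1, iters=500):
    # Repeatedly step opposite the gradient to reduce the cost.
    for _ in range(iters):
        theta = theta - alpha * grad(theta)
    return theta
```

For example, minimizing $(\theta - 3)^2$ (gradient $2(\theta - 3)$) converges to $\theta \approx 3$.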
Training Loop
During training, we iterate over all examples:
for i = 1:m
  % Forward propagation: compute activations a^(l) for l = 1,...,L
  % Backpropagation: compute delta terms d^(l) for l = L, L-1, ..., 2
  % Accumulate gradients: Delta^(l) = Delta^(l) + d^(l+1) * (a^(l))'
end
For each example:
- Perform forward pass
- Compute errors
- Accumulate gradients
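The full accumulation loop above can be sketched in NumPy for a one-hidden-layer sigmoid network; `accumulate_gradients` is an illustrative name, and this version omits the regularization term for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accumulate_gradients(X, Y, Theta1, Theta2):
    # Loop over the m examples: forward pass, backward pass, sum the
    # per-example contributions, then average to get the gradients.
    Delta1, Delta2 = np.zeros_like(Theta1), np.zeros_like(Theta2)
    m = X.shape[0]
    for i in range(m):
        a1 = np.concatenate(([1.0], X[i]))
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
        a3 = sigmoid(Theta2 @ a2)
        d3 = a3 - Y[i]
        d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1.0 - a2[1:])
        Delta1 += np.outer(d2, a1)
        Delta2 += np.outer(d3, a2)
    return Delta1 / m, Delta2 / m
```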
What Is Happening Intuitively?
For each training example:
- The input moves forward through the network.
- The output prediction is computed.
- The error is measured.
- The error is sent backward.
- Gradients are accumulated.
- We update weights to reduce error.
This process repeats until convergence.
Complete Neural Network Workflow
- Choose architecture
- Initialize weights randomly
- Implement forward propagation
- Implement cost function
- Implement backpropagation
- Perform gradient checking
- Optimize using gradient descent
- Train until convergence
Final Insight
Neural network training is simply:
- Forward propagation
- Backpropagation
- Gradient-based optimization
All of deep learning is built on this foundation.
