Putting It Together — Training a Neural Network
In this post, we will put together all the pieces we've learned about neural networks to understand how to train a neural network effectively. We will cover the cost function, backpropagation, gradient checking, and random initialization, along with key intuitions for each step.
Now that we have covered forward propagation, backpropagation, and gradient checking, let’s combine everything into a complete training pipeline.
Step 1: Choose a Network Architecture
First, decide the structure of your neural network:
- Number of layers
- Number of hidden units per layer
Guidelines
- Input layer size = dimension of feature vector
- Output layer size = number of classes
- Hidden units:
- More hidden units usually improve accuracy
- But they increase computational cost
- Default choice:
- Use 1 hidden layer
- If using multiple hidden layers, use the same number of units in each layer
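The guidelines above can be sketched as a small helper. This is a minimal illustration, not part of the original notes; the function name `make_architecture` and the default of 25 hidden units are assumptions chosen for the example.

```python
def make_architecture(n_features, n_classes, n_hidden_layers=1, n_hidden_units=25):
    # Input layer = feature dimension, output layer = number of classes,
    # and the same number of units in every hidden layer.
    return [n_features] + [n_hidden_units] * n_hidden_layers + [n_classes]

# e.g. 400 input features, 10 classes, one hidden layer of 25 units
sizes = make_architecture(400, 10)  # [400, 25, 10]
```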
Step 2: Training a Neural Network
1️⃣ Randomly Initialize Weights
Initialize each weight $\Theta^{(l)}_{ij}$ to a small random value in $[-\epsilon, \epsilon]$ (not to zero).
This breaks symmetry between hidden units and allows learning.
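A minimal NumPy sketch of random initialization, assuming a network with one hidden layer; the layer sizes (400 inputs, 25 hidden units, 10 classes) and $\epsilon = 0.12$ are illustrative choices, not values from the original notes.

```python
import numpy as np

def rand_init(l_in, l_out, eps=0.12):
    # Weights mapping l_in units (plus a bias unit) to l_out units,
    # drawn uniformly from [-eps, eps] so no two units start identical.
    return np.random.uniform(-eps, eps, size=(l_out, l_in + 1))

Theta1 = rand_init(400, 25)  # 400 input features -> 25 hidden units
Theta2 = rand_init(25, 10)   # 25 hidden units -> 10 output classes
```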
2️⃣ Forward Propagation
For each training example $(x^{(i)}, y^{(i)})$, compute the activations of every layer, ending with the output $h_\Theta(x^{(i)})$.
This gives the network’s prediction.
3️⃣ Implement the Cost Function
Compute:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + (1-y_k^{(i)})\log\big(1-(h_\Theta(x^{(i)}))_k\big)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^2$$
This includes:
- Logistic loss over all output units
- Regularization term
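As a sketch, the two pieces of the cost (logistic loss plus the regularization term, which skips the bias weights) could be computed like this; `nn_cost` and its argument layout are assumptions for the example.

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    # H, Y: (m, K) arrays of predictions and one-hot labels.
    m = Y.shape[0]
    logistic = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularize every weight except the bias column of each Theta.
    reg = (lam / (2 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in thetas)
    return logistic + reg
```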
4️⃣ Backpropagation
Use backpropagation to compute the partial derivatives $\dfrac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta)$.
This gives the gradients needed for optimization.
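For a single training example and a one-hidden-layer sigmoid network, the backward pass can be sketched as below; `backprop_one` is an illustrative name, and the delta formulas follow the standard sigmoid-derivative form $a \cdot (1 - a)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one(x, y, Theta1, Theta2):
    # Forward pass (bias units prepended at each layer)
    a1 = np.concatenate(([1.0], x))
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
    a3 = sigmoid(Theta2 @ a2)
    # Backward pass: output error, then hidden error (bias entry dropped)
    d3 = a3 - y
    d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1.0 - a2[1:])
    # Per-example gradient contributions for Theta1 and Theta2
    return np.outer(d2, a1), np.outer(d3, a2)
```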
5️⃣ Gradient Checking
Use the two-sided numerical approximation to verify backpropagation: perturb one parameter at a time and check that $\dfrac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}$ is close to the gradient computed by backpropagation.
Once verified:
- Disable gradient checking
- It is computationally expensive
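A minimal numerical gradient checker, looping over each parameter with a two-sided difference (which is why it is expensive); the function name and $\epsilon = 10^{-4}$ are illustrative.

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    # (J(theta + eps) - J(theta - eps)) / (2 * eps), one component at a time.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp.flat[i] += eps
        tm.flat[i] -= eps
        grad.flat[i] = (J(tp) - J(tm)) / (2 * eps)
    return grad
```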
6️⃣ Minimize the Cost Function
Use:
- Gradient descent, or
- A built-in optimization routine (e.g., an advanced optimizer such as conjugate gradient or L-BFGS)
to minimize $J(\Theta)$ as a function of the parameters $\Theta$.
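The gradient descent option can be sketched in a few lines; the learning rate and iteration count are illustrative defaults, and `grad` stands in for whatever function returns the backpropagation gradients.

```python
import numpy as np

def gradient_descent(theta, grad, alpha=0.1, iters=500):
    # Repeatedly step opposite the gradient to reduce the cost.
    for _ in range(iters):
        theta = theta - alpha * grad(theta)
    return theta
```

For example, minimizing $(\theta - 3)^2$ (gradient $2(\theta - 3)$) converges to $\theta \approx 3$.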
Training Loop
During training, we iterate over all examples:
for i = 1:m
  % Forward propagation: compute activations a^(l) for l = 1,...,L
  % Backpropagation: compute delta terms d^(l) for l = L, L-1, ..., 2
  % Accumulate gradients: Delta^(l) = Delta^(l) + d^(l+1) * (a^(l))'
end
For each example:
- Perform forward pass
- Compute errors
- Accumulate gradients
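The full accumulation loop above can be sketched in NumPy for a one-hidden-layer sigmoid network; `accumulate_gradients` is an illustrative name, and this version omits the regularization term for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accumulate_gradients(X, Y, Theta1, Theta2):
    # Loop over the m examples: forward pass, backward pass, sum the
    # per-example contributions, then average to get the gradients.
    Delta1, Delta2 = np.zeros_like(Theta1), np.zeros_like(Theta2)
    m = X.shape[0]
    for i in range(m):
        a1 = np.concatenate(([1.0], X[i]))
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
        a3 = sigmoid(Theta2 @ a2)
        d3 = a3 - Y[i]
        d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1.0 - a2[1:])
        Delta1 += np.outer(d2, a1)
        Delta2 += np.outer(d3, a2)
    return Delta1 / m, Delta2 / m
```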
What Is Happening Intuitively?
For each training example:
- The input moves forward through the network.
- The output prediction is computed.
- The error is measured.
- The error is sent backward.
- Gradients are accumulated.
- We update weights to reduce error.
This process repeats until convergence.
Complete Neural Network Workflow
- Choose architecture
- Initialize weights randomly
- Implement forward propagation
- Implement cost function
- Implement backpropagation
- Perform gradient checking
- Optimize using gradient descent
- Train until convergence
Final Insight
Neural network training is simply:
- Forward propagation
- Backpropagation
- Gradient-based optimization
All of deep learning is built on this foundation.
