Training a Neural Network

In this post, we will put together all the pieces we've learned about neural networks to understand how to train a neural network effectively. We will cover the cost function, backpropagation, gradient checking, and random initialization, along with key intuitions for each step.

Putting It Together

Now that we have covered forward propagation, backpropagation, and gradient checking, let’s combine everything into a complete training pipeline.

1. 🔀 Choose a Network Architecture

First, decide the structure of your neural network:

- Number of layers $L$
- Number of hidden units per layer $j$
- Number of output units (the dimension of $y$)

How to Choose the Network

- Input layer size = dimension of feature vector $x^{(i)}$
- Output layer size = number of output classes
- Hidden units:
  - More units usually perform better
  - But increase computational cost
- Default choice:
  - Use 1 hidden layer
  - If using multiple hidden layers, use the same number of units in each layer
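To make the architecture choice concrete, here is a minimal sketch that turns a list of layer sizes into the shape of each weight matrix. It assumes the common convention that $\Theta^{(l)}$ has one row per unit in the next layer and one column per unit in the current layer plus a bias column. The function names and the example sizes (400 inputs, 25 hidden units, 10 classes) are hypothetical and chosen only for illustration.

```python
def theta_shapes(layer_sizes):
    """Return the (rows, cols) shape of each weight matrix Theta^(l).

    layer_sizes lists the units per layer, input first and output last,
    not counting bias units. Theta^(l) maps layer l to layer l + 1, so it
    has s_{l+1} rows and s_l + 1 columns (the extra column is the bias).
    """
    return [(n_out, n_in + 1)
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def parameter_count(layer_sizes):
    """Total number of weights across all layers, biases included."""
    return sum(rows * cols for rows, cols in theta_shapes(layer_sizes))


# Hypothetical example: 400 input features, one hidden layer of 25 units,
# 10 output classes.
sizes = [400, 25, 10]
shapes = theta_shapes(sizes)
total = parameter_count(sizes)
```

Adding hidden units grows `total` quickly, which is the computational cost mentioned in the list above.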

2. 📚 Training a Neural Network

2.1 🎲 Randomly Initialize Weights

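Initialize each $\Theta^{(l)}$ to small random values, for example drawn uniformly from $[-\epsilon_{init}, \epsilon_{init}]$, rather than to zero.

If all weights start out equal, every hidden unit in a layer computes the same activation and receives the same gradient, so the units never become different from one another. Random values break this symmetry and allow learning.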
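A minimal sketch of random initialization, assuming weights are stored as nested lists with one bias column per matrix. The helper name `random_init`, the `seed` parameter, and the default range of 0.12 are illustrative assumptions, not values from this post.

```python
import random


def random_init(layer_sizes, eps_init=0.12, seed=None):
    """Return one weight matrix per layer, entries uniform in [-eps_init, eps_init].

    Each matrix is a list of rows with shape (s_{l+1}, s_l + 1); the extra
    column holds the bias weights. eps_init=0.12 is an illustrative default,
    not a value from the post.
    """
    rng = random.Random(seed)
    return [[[rng.uniform(-eps_init, eps_init) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


# Hypothetical network: 3 inputs, 5 hidden units, 2 outputs.
thetas = random_init([3, 5, 2], seed=0)
```

Passing a fixed `seed` makes a run repeatable, which helps when debugging backpropagation.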

2.2 ⏩ Forward Propagation (FP)

For each training example $x^{(i)}$, compute:

$$h_\Theta(x^{(i)})$$

This gives the network’s prediction.
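The forward pass can be sketched in a few lines, assuming sigmoid activations in every layer and weight rows that start with the bias weight. The `forward` helper and the hand-picked AND weights are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(thetas, x):
    """Return the activations of every layer for one example x.

    thetas[l] is a list of rows; each row holds the bias weight followed by
    one weight per unit in the previous layer. The last entry of the result
    is the prediction h_Theta(x).
    """
    activations = [list(x)]
    for theta in thetas:
        a_prev = [1.0] + activations[-1]  # prepend the bias unit
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(row, a_prev))) for row in theta])
    return activations


# Hand-picked weights that make a single sigmoid unit behave like logical AND.
and_thetas = [[[-30.0, 20.0, 20.0]]]
h_11 = forward(and_thetas, [1, 1])[-1][0]   # close to 1
h_01 = forward(and_thetas, [0, 1])[-1][0]   # close to 0
```

Keeping every layer's activations, not just the output, matters because backpropagation reuses them.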

2.3 💰 Implement the Cost Function

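Compute:

$$J(\Theta)$$

This includes:

- Logistic (cross-entropy) loss summed over all output units and all training examples
- Regularization term over the weights, excluding the bias weights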
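Written out, one common form of this cost, assuming $K$ output units, $m$ examples, and a regularization strength $\lambda$ (symbols not defined in this post), is:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\left(h_\Theta(x^{(i)})_k\right) + \left(1 - y_k^{(i)}\right) \log\left(1 - h_\Theta(x^{(i)})_k\right) \right] + \frac{\lambda}{2m} \sum_{l} \sum_{i} \sum_{j \ge 1} \left(\Theta_{i,j}^{(l)}\right)^2$$

The sketch below computes this quantity under those assumptions. The helper names `predict` and `cost` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(thetas, x):
    """Forward-propagate one example and return the output activations."""
    a = list(x)
    for theta in thetas:
        a_bias = [1.0] + a
        a = [sigmoid(sum(w * v for w, v in zip(row, a_bias))) for row in theta]
    return a


def cost(thetas, X, Y, lam=0.0):
    """Regularized cross-entropy cost J(Theta) over the whole training set."""
    m = len(X)
    loss = 0.0
    for x, y in zip(X, Y):
        h = predict(thetas, x)
        loss -= sum(yk * math.log(hk) + (1 - yk) * math.log(1 - hk)
                    for yk, hk in zip(y, h))
    # Regularize every weight except the bias column (index 0 of each row).
    reg = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return loss / m + lam * reg / (2 * m)


# With all-zero weights every output is 0.5, so the unregularized cost is log(2).
zero_thetas = [[[0.0, 0.0, 0.0]]]
j = cost(zero_thetas, X=[[0, 1], [1, 1]], Y=[[0], [1]])
```

The all-zero case is a handy sanity check: any implementation of this cost should return $\log 2 \approx 0.693$ for it.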

2.4 ⏪ Backpropagation (BP)

Use backpropagation to compute:

$$\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)$$

This gives the gradients needed for optimization.
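As a rough sketch of what backpropagation does for a single example, the code below assumes sigmoid activations and the cross-entropy cost, so the output-layer error is the prediction minus the label. The name `backprop_single` and the nested-list weight layout are illustrative assumptions, and regularization is left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backprop_single(thetas, x, y):
    """Return dJ/dTheta for one example, one gradient matrix per layer.

    Assumes sigmoid activations and the cross-entropy cost, for which the
    output-layer error is simply a^(L) - y.
    """
    # Forward pass, keeping each layer's activations with the bias prepended.
    acts = [[1.0] + list(x)]
    for theta in thetas:
        out = [sigmoid(sum(w * a for w, a in zip(row, acts[-1]))) for row in theta]
        acts.append([1.0] + out)
    output = acts[-1][1:]

    delta = [h - t for h, t in zip(output, y)]
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        # Gradient for layer l is the outer product of delta and a^(l).
        grads[l] = [[d * a for a in acts[l]] for d in delta]
        if l > 0:
            # Propagate the error back through Theta^(l), skipping the bias
            # column, and multiply by the sigmoid derivative a * (1 - a).
            delta = [
                sum(thetas[l][k][j + 1] * delta[k] for k in range(len(delta)))
                * a * (1 - a)
                for j, a in enumerate(acts[l][1:])
            ]
    return grads


# No hidden layer, zero weights: the output is 0.5, so delta = 0.5 - 1 = -0.5.
grads = backprop_single([[[0.0, 0.0, 0.0]]], x=[1.0, 2.0], y=[1.0])
```

In the toy call, each gradient entry is the error times the matching input activation, with the bias activation fixed at 1.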

2.5 🎢 Gradient Checking

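Use a numerical approximation to verify that backpropagation computes the correct gradients:

$$\frac{\partial}{\partial \Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}$$

Perturb one parameter at a time by a small $\epsilon$ and compare the result with the backpropagation gradient for that parameter.

⚠️ Once verified, disable gradient checking before training. It is computationally expensive because it evaluates $J(\Theta)$ twice for every parameter.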
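A minimal sketch of the two-sided check, assuming the parameters have been flattened into a single list. The helper names, the tolerance, and the step size `eps=1e-4` are illustrative choices, and the toy cost stands in for the network cost.

```python
def numerical_gradient(cost_fn, theta, eps=1e-4):
    """Approximate dJ/dtheta_i for each i with a two-sided difference.

    theta is a flat list of parameters and cost_fn maps such a list to a
    float. eps=1e-4 is an illustrative choice.
    """
    grad = []
    for i in range(len(theta)):
        plus = theta[:i] + [theta[i] + eps] + theta[i + 1:]
        minus = theta[:i] + [theta[i] - eps] + theta[i + 1:]
        grad.append((cost_fn(plus) - cost_fn(minus)) / (2 * eps))
    return grad


def gradients_match(analytic, numeric, tol=1e-6):
    """True when every analytic entry is within tol of its numeric estimate."""
    return all(abs(a - n) <= tol for a, n in zip(analytic, numeric))


# Toy cost J(theta) = theta_0^2 + 3 * theta_1, whose exact gradient is
# [2 * theta_0, 3]. A real check would pass the network cost instead.
def toy_cost(theta):
    return theta[0] ** 2 + 3 * theta[1]


numeric = numerical_gradient(toy_cost, [2.0, 5.0])
ok = gradients_match([4.0, 3.0], numeric)
```

The loop calls `cost_fn` twice per parameter, which is why the check is only run on a small network or a handful of parameters and then switched off.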

2.6 ⚖️ Minimize the Cost Function

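Use gradient descent or a built-in advanced optimizer (for example, conjugate gradient or L-BFGS), together with the gradients from backpropagation, to minimize $J(\Theta)$.

Because $J(\Theta)$ is non-convex for a neural network, these methods may converge to a local minimum rather than the global one.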
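Plain gradient descent can be sketched as a short loop over a flat parameter list. The function name, the learning rate `alpha`, and the iteration count are illustrative assumptions, and the toy gradient stands in for the one backpropagation would supply.

```python
def gradient_descent(grad_fn, theta, alpha=0.1, iterations=100):
    """Repeat theta := theta - alpha * grad J(theta) and return the result.

    grad_fn maps a flat parameter list to its gradient; in a network it
    would wrap backpropagation. alpha and iterations are illustrative.
    """
    theta = list(theta)
    for _ in range(iterations):
        grad = grad_fn(theta)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy convex cost J = (t0 - 3)^2 + (t1 + 1)^2 with its minimum at (3, -1).
def toy_grad(theta):
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]


theta_min = gradient_descent(toy_grad, [0.0, 0.0])
```

On this convex toy problem the iterates approach (3, -1); a neural network cost offers no such guarantee of reaching the global minimum.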

Training Loop

During training, we iterate over all examples:

for i = 1:m
    % Forward propagation
    % Compute activations a^(l) for l = 1,...,L

    % Backpropagation
    % Compute delta terms d^(l) for l = L,...,2

    % Accumulate gradients
    % Delta^(l) := Delta^(l) + d^(l+1) * (a^(l))'
end

For each example:

- Perform forward pass
- Compute errors
- Accumulate gradients
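Putting the per-example steps together, the following is a small end-to-end sketch in pure Python. It assumes sigmoid activations, the regularized cross-entropy cost, and plain gradient descent. The function names, the 2-2-1 network, the logical AND data, and the hyperparameters are all hypothetical choices for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(thetas, x):
    """Activations of every layer, each with the bias unit prepended."""
    acts = [[1.0] + list(x)]
    for theta in thetas:
        out = [sigmoid(sum(w * a for w, a in zip(row, acts[-1]))) for row in theta]
        acts.append([1.0] + out)
    return acts


def cost(thetas, X, Y, lam):
    """Regularized cross-entropy cost over all examples."""
    m = len(X)
    loss = 0.0
    for x, y in zip(X, Y):
        h = forward(thetas, x)[-1][1:]
        loss -= sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                    for t, p in zip(y, h))
    reg = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return loss / m + lam * reg / (2 * m)


def gradients(thetas, X, Y, lam):
    """One pass over all m examples: forward, backward, accumulate."""
    m = len(X)
    acc = [[[0.0] * len(row) for row in theta] for theta in thetas]
    for x, y in zip(X, Y):
        acts = forward(thetas, x)                           # forward pass
        delta = [p - t for p, t in zip(acts[-1][1:], y)]    # output error
        for l in range(len(thetas) - 1, -1, -1):
            for k, d in enumerate(delta):                   # accumulate
                for j, a in enumerate(acts[l]):
                    acc[l][k][j] += d * a
            if l > 0:                                       # hidden-layer error
                delta = [sum(thetas[l][k][j + 1] * delta[k]
                             for k in range(len(delta))) * a * (1 - a)
                         for j, a in enumerate(acts[l][1:])]
    # Average, then add the regularization term for non-bias weights.
    return [[[acc[l][k][j] / m + (lam / m * w if j > 0 else 0.0)
              for j, w in enumerate(row)]
             for k, row in enumerate(theta)]
            for l, theta in enumerate(thetas)]


def train(thetas, X, Y, lam=0.0, alpha=0.1, iterations=100):
    """Plain gradient descent; returns new weights, leaving the input intact."""
    for _ in range(iterations):
        grads = gradients(thetas, X, Y, lam)
        thetas = [[[w - alpha * g for w, g in zip(row, g_row)]
                   for row, g_row in zip(theta, g_theta)]
                  for theta, g_theta in zip(thetas, grads)]
    return thetas


# Toy data: logical AND, with a hypothetical 2-2-1 network.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0], [0], [0], [1]]
rng = random.Random(0)
init = [[[rng.uniform(-0.12, 0.12) for _ in range(3)] for _ in range(2)],
        [[rng.uniform(-0.12, 0.12) for _ in range(3)]]]
trained = train(init, X, Y)
cost_before = cost(init, X, Y, 0.0)
cost_after = cost(trained, X, Y, 0.0)
```

Comparing `cost_before` with `cost_after` is a quick way to confirm the loop is moving downhill. A real run would use many more iterations or a stronger optimizer.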

Final Insight

Neural network training comes down to three repeated steps:

- Forward propagation
- Backpropagation
- Gradient-based optimization

Deep learning models are trained with this same loop.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026
