



Neural Networks Introduction

A concise cheat sheet covering core concepts, dimensions, activation functions, forward propagation, cost function, backpropagation, gradient checking, random initialization, training pipeline, and key intuition for neural networks.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Neural Networks — Revision Cheat Sheet


1️⃣ Core Concepts

Neural Network Structure

  • Input layer
  • Hidden layer(s)
  • Output layer

Each layer computes:

z^{(l)} = \Theta^{(l-1)} a^{(l-1)} \qquad a^{(l)} = g(z^{(l)})

Add bias unit:

a_0^{(l)} = 1
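The per-layer computation above can be sketched in Python/NumPy (function and variable names here are illustrative, not from the original; the sigmoid from section 3 is assumed as the activation):

```python
import numpy as np

def layer_forward(Theta, a_prev):
    """One layer's computation: z = Theta a_prev, then a = g(z).
    a_prev is assumed to already include the bias unit a_0 = 1."""
    z = Theta @ a_prev
    a = 1.0 / (1.0 + np.exp(-z))           # sigmoid activation g(z)
    return np.concatenate(([1.0], a))      # prepend the bias unit for the next layer

# hypothetical 2-unit layer fed by 3 inputs plus a bias unit
Theta = np.zeros((2, 4))
a = layer_forward(Theta, np.array([1.0, 0.5, -0.5, 2.0]))
```

With all-zero weights, every unit outputs g(0) = 0.5, which makes the sketch easy to sanity-check by hand.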

2️⃣ Dimensions

If:

  • Layer l has s_l units
  • Layer l+1 has s_{l+1} units

Then:

\Theta^{(l)} \in \mathbb{R}^{s_{l+1} \times (s_l + 1)}
  • +1 accounts for bias
  • Output layer size = number of classes
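The shape rule can be checked mechanically; here is a small NumPy sketch for an assumed 3-5-4 architecture (the sizes are an example, not from the original):

```python
import numpy as np

# Illustrative architecture: 3 inputs, one 5-unit hidden layer, 4 output classes
layer_sizes = [3, 5, 4]          # s_1, s_2, s_3

# Theta^{(l)} maps layer l (plus its bias unit) to layer l+1
Thetas = [np.zeros((layer_sizes[l + 1], layer_sizes[l] + 1))
          for l in range(len(layer_sizes) - 1)]

shapes = [T.shape for T in Thetas]   # [(5, 4), (4, 6)]
```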

3️⃣ Activation Function

Most common: Sigmoid

g(z) = \frac{1}{1 + e^{-z}}

Derivative (important for backprop):

g'(z) = g(z)\,(1 - g(z))
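Both formulas translate directly to NumPy (a minimal sketch; the function names are mine):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """g'(z) = g(z) (1 - g(z)) -- the form used in backprop."""
    g = sigmoid(z)
    return g * (1.0 - g)
```

A quick sanity check: g(0) = 0.5, so g'(0) = 0.25, the sigmoid's maximum slope.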

4️⃣ Forward Propagation

For each layer:

  1. Compute:

    z^{(l)} = \Theta^{(l-1)} a^{(l-1)}
  2. Apply activation:

    a^{(l)} = g(z^{(l)})

Final output:

h_\Theta(x) = a^{(L)}
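The full forward pass is just that two-step loop over layers; a minimal sketch for a single example (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Thetas, x):
    """Forward propagation for one example x (bias units added internally)."""
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))   # add bias unit a_0 = 1
        a = sigmoid(Theta @ a)           # z = Theta a ; a = g(z)
    return a                             # h_Theta(x) = a^{(L)}
```

With zero-initialized weights every output is g(0) = 0.5, which is a handy smoke test.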

5️⃣ Cost Function (Multiclass)

J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l} \sum_{i} \sum_{j} \big(\Theta_{i,j}^{(l)}\big)^2
  • Double sum → over training examples and outputs
  • Regularization → sum of squared weights
  • Bias terms NOT regularized
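Given predictions H and one-hot labels Y, the cost is a few NumPy lines; a sketch under the assumption that the first column of each Theta holds the bias weights:

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Regularized cross-entropy cost.
    H: (m, K) predictions in (0, 1); Y: (m, K) one-hot labels."""
    m = Y.shape[0]
    data_term = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # regularize every weight except the bias column (column 0)
    reg_term = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data_term + reg_term
```

For instance, an uninformative predictor (H = 0.5 everywhere) on a balanced two-class, two-example set gives a data term of 2 log 2.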

6️⃣ Backpropagation

Output Layer Error

\delta^{(L)} = a^{(L)} - y

Hidden Layer Error

\delta^{(l)} = \left( (\Theta^{(l)})^T \delta^{(l+1)} \right) .\!* \; a^{(l)} .\!* \; (1 - a^{(l)})

Gradient Accumulation

\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T

Final Gradient

For non-bias terms:

D^{(l)} = \frac{1}{m} \left( \Delta^{(l)} + \lambda \Theta^{(l)} \right)

For bias terms (no regularization):

D^{(l)} = \frac{1}{m} \Delta^{(l)}
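The three backprop steps, for a single training example, can be sketched as follows (a minimal illustration with invented names; a real implementation sums the Deltas over all m examples before forming D):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one(Thetas, x, y):
    """Backprop for one (x, y) pair; returns one Delta per Theta matrix."""
    # forward pass, caching each layer's activation (with its bias unit)
    activations = []
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))
        activations.append(a)
        a = sigmoid(Theta @ a)

    delta = a - y                                    # delta^{(L)} = a^{(L)} - y
    Deltas = [None] * len(Thetas)
    for l in range(len(Thetas) - 1, -1, -1):
        # this example's contribution: delta^{(l+1)} (a^{(l)})^T
        Deltas[l] = np.outer(delta, activations[l])
        if l > 0:
            a_l = activations[l]
            # delta^{(l)} = (Theta^{(l)})^T delta^{(l+1)} .* a^{(l)} .* (1 - a^{(l)})
            delta = (Thetas[l].T @ delta) * a_l * (1 - a_l)
            delta = delta[1:]                        # drop the bias unit's error
    return Deltas
```

Each Delta has the same shape as its Theta, which is an easy invariant to assert while debugging.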


7️⃣ Gradient Checking

Numerical approximation:

\frac{\partial}{\partial \Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}

Use:

  • \epsilon = 10^{-4}
  • Only for debugging
  • Disable after verification
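The two-sided approximation above is straightforward to code; a sketch (the function name and use of NumPy's nditer are my choices):

```python
import numpy as np

def numeric_grad(J, Theta, eps=1e-4):
    """Two-sided numerical gradient of cost J at Theta. Debugging only:
    this costs two full cost evaluations per parameter."""
    grad = np.zeros_like(Theta)
    it = np.nditer(Theta, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        orig = Theta[idx]
        Theta[idx] = orig + eps
        J_plus = J(Theta)
        Theta[idx] = orig - eps
        J_minus = J(Theta)
        Theta[idx] = orig                     # restore the parameter
        grad[idx] = (J_plus - J_minus) / (2 * eps)
    return grad
```

Checking it on J(Theta) = sum(Theta^2), whose exact gradient is 2 Theta, confirms the approximation.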

8️⃣ Random Initialization

Do NOT initialize weights to zero.

Initialize:

\Theta_{i,j}^{(l)} \in [-\epsilon, \epsilon]

Using:

Theta = rand(x,y) * (2*epsilon) - epsilon;

This breaks symmetry.
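An equivalent in NumPy (ε = 0.12 here is a commonly used default, assumed rather than taken from the text):

```python
import numpy as np

def rand_init(rows, cols, epsilon=0.12):
    """Uniform weights in [-epsilon, epsilon] to break symmetry."""
    return np.random.rand(rows, cols) * 2 * epsilon - epsilon
```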


9️⃣ Training Pipeline

  1. Choose architecture
  2. Randomly initialize weights
  3. Forward propagation
  4. Compute cost
  5. Backpropagation
  6. Gradient checking (once)
  7. Optimize using gradient descent
  8. Repeat until convergence
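The eight steps can be strung together in a toy end-to-end sketch. For brevity this version substitutes numerical gradients for the analytic backprop of section 6 (far too slow for real use, but it keeps the pipeline visible); the architecture, data, learning rate, and iteration count are all invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Thetas, x):
    a = x
    for Theta in Thetas:
        a = sigmoid(Theta @ np.concatenate(([1.0], a)))
    return a

def cost(Thetas, X, Y):
    H = np.array([forward(Thetas, x) for x in X])
    return -np.mean(Y * np.log(H) + (1 - Y) * np.log(1 - H))

# 1-2. choose an architecture (2 -> 2 -> 1) and randomly initialize
rng = np.random.default_rng(0)
Thetas = [rng.uniform(-0.12, 0.12, (2, 3)), rng.uniform(-0.12, 0.12, (1, 3))]

X = np.array([[0.0, 0.0], [1.0, 1.0]])   # two toy examples
Y = np.array([[0.0], [1.0]])

# 3-8. iterate: forward, cost, gradients, gradient-descent update
alpha, eps = 0.5, 1e-4
J0 = cost(Thetas, X, Y)
for _ in range(50):
    for T in Thetas:
        grad = np.zeros_like(T)
        it = np.nditer(T, flags=['multi_index'])
        for _ in it:
            i = it.multi_index
            o = T[i]
            T[i] = o + eps; Jp = cost(Thetas, X, Y)
            T[i] = o - eps; Jm = cost(Thetas, X, Y)
            T[i] = o
            grad[i] = (Jp - Jm) / (2 * eps)
        T -= alpha * grad                 # gradient descent step
J1 = cost(Thetas, X, Y)
```

After training, the cost J1 should be lower than the initial cost J0, which is the convergence signal step 8 watches for.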

🔟 Key Intuition

Neural network training is:

  • Forward pass → prediction
  • Backward pass → compute gradients
  • Gradient descent → update weights

Deep learning = repeated application of this process.
