



Neural Networks Introduction

A concise cheat sheet covering core concepts, dimensions, activation functions, forward propagation, cost function, backpropagation, gradient checking, random initialization, training pipeline, and key intuition for neural networks.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Neural Networks — Revision Cheat Sheet


1️⃣ Core Concepts

Neural Network Structure

  • Input layer
  • Hidden layer(s)
  • Output layer

Each layer computes:

z^{(l)} = \Theta^{(l-1)} a^{(l-1)} \qquad a^{(l)} = g(z^{(l)})

Add bias unit:

a_0^{(l)} = 1
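The per-layer computation above can be sketched in Python/NumPy (function and variable names here are illustrative, not from the original; the sigmoid from section 3 is assumed as the activation):

```python
import numpy as np

def layer_forward(Theta, a_prev):
    """One layer's computation: z = Theta a_prev, then a = g(z).
    a_prev is assumed to already include the bias unit a_0 = 1."""
    z = Theta @ a_prev
    a = 1.0 / (1.0 + np.exp(-z))           # sigmoid activation g(z)
    return np.concatenate(([1.0], a))      # prepend the bias unit for the next layer

# hypothetical 2-unit layer fed by 3 inputs plus a bias unit
Theta = np.zeros((2, 4))
a = layer_forward(Theta, np.array([1.0, 0.5, -0.5, 2.0]))
```

With all-zero weights, every unit outputs g(0) = 0.5, which makes the sketch easy to sanity-check by hand.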

2️⃣ Dimensions

If:

  • Layer l has s_l units
  • Layer l+1 has s_{l+1} units

Then:

\Theta^{(l)} \in \mathbb{R}^{s_{l+1} \times (s_l + 1)}
  • +1 accounts for bias
  • Output layer size = number of classes
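The shape rule can be checked mechanically; here is a small NumPy sketch for an assumed 3-5-4 architecture (the sizes are an example, not from the original):

```python
import numpy as np

# Illustrative architecture: 3 inputs, one 5-unit hidden layer, 4 output classes
layer_sizes = [3, 5, 4]          # s_1, s_2, s_3

# Theta^{(l)} maps layer l (plus its bias unit) to layer l+1
Thetas = [np.zeros((layer_sizes[l + 1], layer_sizes[l] + 1))
          for l in range(len(layer_sizes) - 1)]

shapes = [T.shape for T in Thetas]   # [(5, 4), (4, 6)]
```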

3️⃣ Activation Function

Most common: Sigmoid

g(z) = \frac{1}{1 + e^{-z}}

Derivative (important for backprop):

g'(z) = g(z)\,(1 - g(z))
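Both formulas translate directly to NumPy (a minimal sketch; the function names are mine):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """g'(z) = g(z) (1 - g(z)) -- the form used in backprop."""
    g = sigmoid(z)
    return g * (1.0 - g)
```

A quick sanity check: g(0) = 0.5, so g'(0) = 0.25, the sigmoid's maximum slope.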

4️⃣ Forward Propagation

For each layer:

  1. Compute:

    z^{(l)} = \Theta^{(l-1)} a^{(l-1)}
  2. Apply activation:

    a^{(l)} = g(z^{(l)})

Final output:

h_\Theta(x) = a^{(L)}
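The full forward pass is just that two-step loop over layers; a minimal sketch for a single example (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Thetas, x):
    """Forward propagation for one example x (bias units added internally)."""
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))   # add bias unit a_0 = 1
        a = sigmoid(Theta @ a)           # z = Theta a ; a = g(z)
    return a                             # h_Theta(x) = a^{(L)}
```

With zero-initialized weights every output is g(0) = 0.5, which is a handy smoke test.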

5️⃣ Cost Function (Multiclass)

J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l} \sum_{i} \sum_{j} \big(\Theta_{i,j}^{(l)}\big)^2
  • Double sum → over training examples and outputs
  • Regularization → sum of squared weights
  • Bias terms NOT regularized
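Given predictions H and one-hot labels Y, the cost is a few NumPy lines; a sketch under the assumption that the first column of each Theta holds the bias weights:

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Regularized cross-entropy cost.
    H: (m, K) predictions in (0, 1); Y: (m, K) one-hot labels."""
    m = Y.shape[0]
    data_term = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # regularize every weight except the bias column (column 0)
    reg_term = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data_term + reg_term
```

For instance, an uninformative predictor (H = 0.5 everywhere) on a balanced two-class, two-example set gives a data term of 2 log 2.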

6️⃣ Backpropagation

Output Layer Error

\delta^{(L)} = a^{(L)} - y

Hidden Layer Error

\delta^{(l)} = \left( (\Theta^{(l)})^T \delta^{(l+1)} \right) .\!* \; a^{(l)} .\!* \; (1 - a^{(l)})

Gradient Accumulation

\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T

Final Gradient

For non-bias terms:

D^{(l)} = \frac{1}{m} \left( \Delta^{(l)} + \lambda \Theta^{(l)} \right)

For bias terms (no regularization):

D^{(l)} = \frac{1}{m} \Delta^{(l)}
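The three backprop steps, for a single training example, can be sketched as follows (a minimal illustration with invented names; a real implementation sums the Deltas over all m examples before forming D):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one(Thetas, x, y):
    """Backprop for one (x, y) pair; returns one Delta per Theta matrix."""
    # forward pass, caching each layer's activation (with its bias unit)
    activations = []
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))
        activations.append(a)
        a = sigmoid(Theta @ a)

    delta = a - y                                    # delta^{(L)} = a^{(L)} - y
    Deltas = [None] * len(Thetas)
    for l in range(len(Thetas) - 1, -1, -1):
        # this example's contribution: delta^{(l+1)} (a^{(l)})^T
        Deltas[l] = np.outer(delta, activations[l])
        if l > 0:
            a_l = activations[l]
            # delta^{(l)} = (Theta^{(l)})^T delta^{(l+1)} .* a^{(l)} .* (1 - a^{(l)})
            delta = (Thetas[l].T @ delta) * a_l * (1 - a_l)
            delta = delta[1:]                        # drop the bias unit's error
    return Deltas
```

Each Delta has the same shape as its Theta, which is an easy invariant to assert while debugging.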


7️⃣ Gradient Checking

Numerical approximation:

\frac{\partial}{\partial \Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}

Use:

  • \epsilon = 10^{-4}
  • Only for debugging
  • Disable after verification
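The two-sided approximation above is straightforward to code; a sketch (the function name and use of NumPy's nditer are my choices):

```python
import numpy as np

def numeric_grad(J, Theta, eps=1e-4):
    """Two-sided numerical gradient of cost J at Theta. Debugging only:
    this costs two full cost evaluations per parameter."""
    grad = np.zeros_like(Theta)
    it = np.nditer(Theta, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        orig = Theta[idx]
        Theta[idx] = orig + eps
        J_plus = J(Theta)
        Theta[idx] = orig - eps
        J_minus = J(Theta)
        Theta[idx] = orig                     # restore the parameter
        grad[idx] = (J_plus - J_minus) / (2 * eps)
    return grad
```

Checking it on J(Theta) = sum(Theta^2), whose exact gradient is 2 Theta, confirms the approximation.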

8️⃣ Random Initialization

Do NOT initialize weights to zero.

Initialize:

\Theta_{i,j}^{(l)} \in [-\epsilon, \epsilon]

Using:

Theta = rand(x,y) * (2*epsilon) - epsilon;

This breaks symmetry.
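An equivalent in NumPy (ε = 0.12 here is a commonly used default, assumed rather than taken from the text):

```python
import numpy as np

def rand_init(rows, cols, epsilon=0.12):
    """Uniform weights in [-epsilon, epsilon] to break symmetry."""
    return np.random.rand(rows, cols) * 2 * epsilon - epsilon
```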


9️⃣ Training Pipeline

  1. Choose architecture
  2. Randomly initialize weights
  3. Forward propagation
  4. Compute cost
  5. Backpropagation
  6. Gradient checking (once)
  7. Optimize using gradient descent
  8. Repeat until convergence
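The eight steps can be strung together in a toy end-to-end sketch. For brevity this version substitutes numerical gradients for the analytic backprop of section 6 (far too slow for real use, but it keeps the pipeline visible); the architecture, data, learning rate, and iteration count are all invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Thetas, x):
    a = x
    for Theta in Thetas:
        a = sigmoid(Theta @ np.concatenate(([1.0], a)))
    return a

def cost(Thetas, X, Y):
    H = np.array([forward(Thetas, x) for x in X])
    return -np.mean(Y * np.log(H) + (1 - Y) * np.log(1 - H))

# 1-2. choose an architecture (2 -> 2 -> 1) and randomly initialize
rng = np.random.default_rng(0)
Thetas = [rng.uniform(-0.12, 0.12, (2, 3)), rng.uniform(-0.12, 0.12, (1, 3))]

X = np.array([[0.0, 0.0], [1.0, 1.0]])   # two toy examples
Y = np.array([[0.0], [1.0]])

# 3-8. iterate: forward, cost, gradients, gradient-descent update
alpha, eps = 0.5, 1e-4
J0 = cost(Thetas, X, Y)
for _ in range(50):
    for T in Thetas:
        grad = np.zeros_like(T)
        it = np.nditer(T, flags=['multi_index'])
        for _ in it:
            i = it.multi_index
            o = T[i]
            T[i] = o + eps; Jp = cost(Thetas, X, Y)
            T[i] = o - eps; Jm = cost(Thetas, X, Y)
            T[i] = o
            grad[i] = (Jp - Jm) / (2 * eps)
        T -= alpha * grad                 # gradient descent step
J1 = cost(Thetas, X, Y)
```

After training, the cost J1 should be lower than the initial cost J0, which is the convergence signal step 8 watches for.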

🔟 Key Intuition

Neural network training is:

  • Forward pass → prediction
  • Backward pass → compute gradients
  • Gradient descent → update weights

Deep learning = repeated application of this process.
