
Vectorized Neural Networks Model Representation

Learn how to represent neural networks in a vectorized form, transforming scalar equations into efficient matrix operations for scalable and optimized computations.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026



From Scalar Equations to Vector Form

Previously, we wrote each neuron separately.

For the hidden layer:

$$a^{(2)}_1 = g\left(\Theta^{(1)}_{10}x_0 + \Theta^{(1)}_{11}x_1 + \Theta^{(1)}_{12}x_2 + \Theta^{(1)}_{13}x_3\right)$$

$$a^{(2)}_2 = g\left(\Theta^{(1)}_{20}x_0 + \Theta^{(1)}_{21}x_1 + \Theta^{(1)}_{22}x_2 + \Theta^{(1)}_{23}x_3\right)$$

$$a^{(2)}_3 = g\left(\Theta^{(1)}_{30}x_0 + \Theta^{(1)}_{31}x_1 + \Theta^{(1)}_{32}x_2 + \Theta^{(1)}_{33}x_3\right)$$

Final hypothesis:

$$h_\Theta(x) = a^{(3)}_1 = g\left(\Theta^{(2)}_{10}a^{(2)}_0 + \Theta^{(2)}_{11}a^{(2)}_1 + \Theta^{(2)}_{12}a^{(2)}_2 + \Theta^{(2)}_{13}a^{(2)}_3\right)$$

This works, but writing one equation per neuron does not scale, so we vectorize.
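To make the scalar form concrete, here is a minimal sketch of the hidden-layer computation with explicit loops. The weight matrix `Theta1` and input `x` are made-up example values, not anything from a trained network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical Theta^(1): 3 hidden units, 3 inputs plus the bias column.
Theta1 = np.array([[0.1, 0.2, -0.3, 0.4],
                   [0.0, -0.5, 0.6, 0.1],
                   [0.7, 0.2, 0.2, -0.1]])
x = np.array([1.0, 0.5, -1.0, 2.0])  # x0 = 1 is the bias input

# One scalar equation per hidden unit, exactly as written above.
a2 = np.zeros(3)
for k in range(3):            # hidden unit k+1 of layer 2
    z = 0.0
    for i in range(4):        # inputs x0..x3
        z += Theta1[k, i] * x[i]
    a2[k] = sigmoid(z)

print(a2)
```

The nested loops make the cost obvious: every extra unit or input adds another pass, which is exactly what vectorization removes.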


Step 1 — Define the Intermediate Variable

Define the weighted sum before activation:

$$z^{(j)}_k = \Theta^{(j-1)}_{k,0} a^{(j-1)}_0 + \Theta^{(j-1)}_{k,1} a^{(j-1)}_1 + \dots + \Theta^{(j-1)}_{k,n} a^{(j-1)}_n$$

Activation becomes:

$$a^{(j)}_k = g\left(z^{(j)}_k\right)$$

Step 2 — Vector Representation

Input layer:

$$x = a^{(1)} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$

Weighted sum vector:

$$z^{(j)} = \begin{bmatrix} z^{(j)}_1 \\ z^{(j)}_2 \\ \vdots \\ z^{(j)}_{s_j} \end{bmatrix}$$

Where:

$$s_j = \text{number of units in layer } j$$

Step 3 — The Key Vectorized Equation

The entire layer becomes a single matrix multiplication:

$$z^{(j)} = \Theta^{(j-1)} a^{(j-1)}$$

Dimensions:

$$\Theta^{(j-1)} \in \mathbb{R}^{s_j \times (n+1)}, \qquad a^{(j-1)} \in \mathbb{R}^{(n+1) \times 1}, \qquad z^{(j)} \in \mathbb{R}^{s_j \times 1}$$
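In NumPy, this key equation is a single matrix-vector product. A small sketch with the same hypothetical shapes, $s_j = 3$ and $n + 1 = 4$ (the weights are illustrative values only):

```python
import numpy as np

# Hypothetical Theta^(1) with shape (s_2, n+1) = (3, 4).
Theta1 = np.array([[0.1, 0.2, -0.3, 0.4],
                   [0.0, -0.5, 0.6, 0.1],
                   [0.7, 0.2, 0.2, -0.1]])
# a^(1) with shape (n+1,), bias unit a0 = 1 already included.
a1 = np.array([1.0, 0.5, -1.0, 2.0])

# z^(2) = Theta^(1) a^(1): the whole layer in one multiplication.
z2 = Theta1 @ a1
print(z2.shape)  # (3,) == (s_2,)
```

The `@` operator contracts the $(n+1)$ dimension, so the result has one entry per unit in layer $j$, matching the dimension analysis above.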

Step 4 — Apply Activation Function

Activation is applied element-wise:

$$a^{(j)} = g\left(z^{(j)}\right)$$

If using sigmoid:

$$g(z) = \frac{1}{1 + e^{-z}}$$
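Because NumPy's `np.exp` broadcasts over arrays, the same sigmoid definition handles the whole vector $z^{(j)}$ at once, with no per-unit loop:

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function; works on scalars and arrays alike.
    return 1.0 / (1.0 + np.exp(-z))

z2 = np.array([-1.0, 0.0, 2.0])  # example z^(2)
a2 = sigmoid(z2)                 # applied element-wise
print(a2)                        # every entry lies in (0, 1); sigmoid(0) = 0.5
```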

Step 5 — Add Bias Unit

After computing the activations, prepend the bias unit $a^{(j)}_0 = 1$ so the next layer's weight matrix can multiply it:

$$a^{(j)} = \begin{bmatrix} 1 \\ a^{(j)}_1 \\ \vdots \\ a^{(j)}_{s_j} \end{bmatrix}$$
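One way to do this in NumPy is `np.insert`, which here places a 1 at index 0 of the activation vector (the activation values are arbitrary examples):

```python
import numpy as np

a2 = np.array([0.3, 0.6, 0.9])  # activations a^(2)_1 .. a^(2)_3
a2 = np.insert(a2, 0, 1.0)      # prepend the bias unit a^(2)_0 = 1
print(a2)                       # length grows from s_j to s_j + 1
```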

Step 6 — Output Layer

Repeat the same process:

$$z^{(j+1)} = \Theta^{(j)} a^{(j)}$$

$$a^{(j+1)} = g\left(z^{(j+1)}\right)$$

Final hypothesis:

$$h_\Theta(x) = a^{(j+1)}$$
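Putting Steps 1 through 6 together, here is a hedged sketch of a complete forward pass. The network shape (3 inputs, 3 hidden units, 1 output) and the weight values are assumptions for illustration, not trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward-propagate x through the layers defined by the Theta matrices."""
    a = x
    for theta in thetas:
        a = np.insert(a, 0, 1.0)  # Step 5: prepend the bias unit
        z = theta @ a             # Step 3: z^(j) = Theta^(j-1) a^(j-1)
        a = sigmoid(z)            # Step 4: element-wise activation
    return a                      # final activation = h_Theta(x)

# Hypothetical weights: 3 inputs -> 3 hidden units -> 1 output.
Theta1 = np.array([[0.1, 0.2, -0.3, 0.4],
                   [0.0, -0.5, 0.6, 0.1],
                   [0.7, 0.2, 0.2, -0.1]])   # shape (3, 4)
Theta2 = np.array([[-0.3, 1.0, -1.0, 0.5]])  # shape (1, 4)

h = forward(np.array([0.5, -1.0, 2.0]), [Theta1, Theta2])
print(h)  # a single sigmoid output in (0, 1)
```

Note that the loop body is identical for every layer: the hidden layer and the output layer are the same three operations with different weight matrices.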

The Big Picture

Each layer performs:

$$\text{Linear transformation:} \quad z = \Theta a$$

followed by

$$\text{Nonlinearity:} \quad a = g(z)$$

Stacking these layers allows neural networks to represent complex nonlinear functions.
