☠️ Advanced Multivariate Linear Algebra 2

Detailed explanation of the Normal Equation for linear regression, including matrix formulation, closed-form solution, comparison with gradient descent, and practical considerations for implementation.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


☠️ Advanced Multivariate Linear Algebra

💀 When Geometry Becomes Dangerous


Orthogonality = Independence

Orthogonality represents perpendicularity: elements (vectors, functions, or data) are independent, and their dot product or inner product is zero.

Two vectors are orthogonal if:

$v^T w = 0$

Equivalently, their dot product is zero, i.e. they form a 90° angle and are perpendicular:

$v \cdot w = 0$

In ML, orthogonality means:

  • No linear dependence
  • No shared directional information

This is why:

  • PCA finds orthogonal principal components
  • QR decomposition builds orthogonal bases
  • SVD decomposes into orthogonal directions

Orthogonality reduces redundancy.
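A minimal NumPy sketch of the idea (NumPy is an assumption here; the post doesn't prescribe a library): two vectors are orthogonal exactly when their dot product is zero.

```python
import numpy as np

# Two vectors in R^3, chosen so their dot product cancels to zero
v = np.array([1.0, 2.0, 0.0])
w = np.array([-2.0, 1.0, 5.0])

# Orthogonal iff the dot product is zero: (1)(-2) + (2)(1) + (0)(5) = 0
print(np.dot(v, w))  # 0.0
```

The same zero-inner-product check is what PCA, QR, and SVD enforce between the directions they produce.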

Eigenvectors

Non-zero vectors associated with a linear transformation (represented by a square matrix) that do not change direction when the transformation is applied.

Instead, they are only

  • scaled (stretched)
  • compressed
  • reversed

by a scalar factor known as the eigenvalue $\lambda$.

If:

$Av = \lambda v$

That means:

  • Applying transformation $A$
  • Does not change the direction of $v$
  • Only scales it by $\lambda$

Eigenvectors are:

  • Directions that remain stable under transformation.
  • Natural Directions of a Transformation

In ML:

  • PCA uses eigenvectors of the covariance matrix
  • These represent directions of maximum variance
  • Each eigenvector = principal axis of data

Eigenvalues tell you how important that direction is.
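The relation $Av = \lambda v$ can be verified numerically; a hedged sketch using NumPy's `np.linalg.eig` (the matrix here is an illustrative choice, not from the post):

```python
import numpy as np

# A simple transformation that stretches x by 2 and y by 3
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# Columns of `vecs` are eigenvectors; `vals` are the matching eigenvalues
vals, vecs = np.linalg.eig(A)

# Each eigenvector is only scaled by its eigenvalue: A v = λ v
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)

print(vals)  # [2. 3.]
```

The larger eigenvalue (3, along the y-axis) marks the direction the transformation emphasizes most, which is exactly how PCA ranks principal axes by variance.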

Singular Value Decomposition (SVD)

Factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation.

SVD states:

$X = U \Sigma V^T$

Geometrically:

  1. $V^T$ rotates the space
  2. $\Sigma$ scales each axis
  3. $U$ rotates again

So any matrix transformation can be seen as:

Rotate → Stretch → Rotate

This is why SVD is foundational in:

  • Dimensionality reduction
  • Embeddings
  • LLM weight compression
  • Recommender systems
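A short sketch of the decomposition and of why it enables compression, using NumPy's `np.linalg.svd` (the example matrix is an assumption for illustration):

```python
import numpy as np

X = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Rotate -> stretch -> rotate: U, the singular values, and V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The three factors reproduce X exactly
assert np.allclose(U @ np.diag(s) @ Vt, X)

# Compression: keep only the largest singular value for a rank-1 approximation
X1 = s[0] * np.outer(U[:, 0], Vt[0])
print(X1.shape)  # (3, 2)
```

Truncating small singular values this way is the core move behind dimensionality reduction and low-rank weight compression.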

Neural Networks as Layered Transformations

Each layer computes:

$z = Wx + b$

Which is:

  1. Linear transformation $W$
  2. Shift $b$
  3. Nonlinear activation

Geometrically:

  • Linear layers reshape space
  • Activations bend space
  • Deep networks progressively warp geometry

Training adjusts $W$ so that:

  • Classes become linearly separable
  • Desired outputs align with target directions

Deep learning is geometry engineering.
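One layer of that geometry, sketched in NumPy (the sizes, ReLU activation, and random weights are assumptions; the post doesn't fix an architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear transform W reshapes R^3 into R^4; b shifts the result
W = rng.normal(size=(4, 3))
b = np.zeros(4)

def layer(x):
    z = W @ x + b             # linear transformation + shift: z = Wx + b
    return np.maximum(z, 0)   # ReLU activation "bends" the space

x = np.array([1.0, -2.0, 0.5])
print(layer(x).shape)  # (4,)
```

Stacking several such layers composes many reshape-and-bend steps, which is the progressive warping described above.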


A matrix is a transformation of space.

All machine learning models are compositions of transformations.

If:

$y = Ax$

Then $A$ transforms vector $x$ into a new vector $y$.

Geometrically, a matrix can:

  • Stretch
  • Compress
  • Rotate
  • Reflect
  • Shear
  • Project
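Two of these transformations made concrete, with illustrative 2×2 matrices (the specific matrices are assumptions chosen for clarity):

```python
import numpy as np

# A shear: slides points horizontally in proportion to their height
shear = np.array([[1.0, 1.5],
                  [0.0, 1.0]])

# A 90-degree counterclockwise rotation
rot90 = np.array([[0.0, -1.0],
                  [1.0,  0.0]])

e2 = np.array([0.0, 1.0])   # the unit vector pointing "up"
print(shear @ e2)   # [1.5 1. ]  -> tilted sideways
print(rot90 @ e2)   # [-1.  0.]  -> now points left
```

Every matrix in this post acts on vectors the same way: multiply, and read off where space was sent.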

Linear Regression = Projection

When solving linear regression:

$\hat{\theta} = (X^T X)^{-1} X^T y$

You are not just solving equations.

You are projecting vector $y$ onto the column space of $X$.

Meaning:

  • $X$ defines a subspace (all linear combinations of the features)
  • $y$ may not lie in that subspace
  • We find the closest point in that subspace

This closest point is the orthogonal projection.

The residual error is perpendicular to the feature space.

Mathematically:

$X^T (y - X\hat{\theta}) = 0$

Geometric meaning:

The error vector is orthogonal to every feature direction.
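This orthogonality can be checked directly; a minimal NumPy sketch solving the normal equation (the small dataset is an assumption for illustration — in practice `np.linalg.lstsq` is preferred over forming $X^T X$):

```python
import numpy as np

# Two feature columns spanning a plane in R^3, and a target not in that plane
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.0])

# Normal equation: solve (X^T X) θ = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)

# The residual (error vector) is orthogonal to every column of X
residual = y - X @ theta
print(X.T @ residual)  # ~[0. 0.]
```

The fitted $X\hat{\theta}$ is exactly the orthogonal projection of $y$ onto the column space, so the leftover error has a zero dot product with each feature direction.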

