☠️ Advanced Multivariate Linear Algebra 2
Detailed explanation of the Normal Equation for linear regression, including matrix formulation, closed-form solution, comparison with gradient descent, and practical considerations for implementation.
💀 When Geometry Becomes Dangerous
Orthogonality = Independence
Orthogonality represents perpendicularity: elements (vectors, functions, or data) are independent and have a dot product or inner product of zero.
Two vectors $u$ and $v$ are orthogonal if their dot product is zero:
$u \cdot v = 0$
i.e. they form a 90° angle (they are perpendicular).
In ML, orthogonality means:
- No linear dependence
- No shared directional information
This is why:
- PCA finds orthogonal principal components
- QR decomposition builds orthogonal bases
- SVD decomposes into orthogonal directions
Orthogonality reduces redundancy.
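A minimal sketch of the zero-dot-product test, using two illustrative vectors (the values are made up for the example):

```python
import numpy as np

# Two illustrative vectors in R^3
u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 1.0, -2.0])

# Orthogonality test: dot product equals zero
# 1*2 + 2*1 + 2*(-2) = 0
print(np.dot(u, v))  # 0.0
```

Because the dot product is zero, `u` carries no directional information about `v`: this is the "no shared directional information" property the list above describes.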
Eigenvectors
Non-zero vectors associated with a linear transformation (represented by a square matrix) that do not change direction when that transformation is applied.
Instead, they are only:
- scaled (stretched)
- compressed
- reversed
by a scalar factor known as the eigenvalue.
If:
$Av = \lambda v$
That means:
- Applying transformation ( A ) to ( v )
- Does not change the direction of ( v )
- Only scales it by $\lambda$
Eigenvectors are:
- Directions that remain stable under transformation.
- Natural Directions of a Transformation
In ML:
- PCA uses eigenvectors of the covariance matrix
- These represent directions of maximum variance
- Each eigenvector = principal axis of data
Eigenvalues tell you how important that direction is.
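A sketch of this PCA connection on toy data (the dataset and its stretch factors are invented for illustration): eigen-decompose the covariance matrix and read off the direction of maximum variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D data, deliberately stretched along the x-axis
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0],
                                          [0.0, 0.5]])

# Covariance matrix of the data
C = np.cov(X, rowvar=False)

# eigh: eigen-decomposition for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(C)

# Largest eigenvalue -> principal axis (direction of maximum variance)
principal_axis = eigvecs[:, np.argmax(eigvals)]

# Eigenvector property: C v = lambda v
assert np.allclose(C @ principal_axis, eigvals.max() * principal_axis)
print(eigvals)         # variances along each principal axis
print(principal_axis)  # close to [±1, 0] for this stretched data
```

The eigenvalues here play exactly the "importance" role described above: the larger one corresponds to the stretched direction.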
Singular Value Decomposition (SVD)
Factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation.
SVD states:
$A = U \Sigma V^T$
Geometrically:
- $V^T$ rotates the space
- $\Sigma$ scales each axis
- $U$ rotates again
So any matrix transformation can be seen as:
Rotate → Stretch → Rotate
This is why SVD is foundational in:
- Dimensionality reduction
- Embeddings
- LLM weight compression
- Recommender systems
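A short sketch of the factorization and of truncation, the mechanism behind the dimensionality-reduction uses listed above (the matrix values are arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Full decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The three factors reconstruct A exactly
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Rank-1 approximation: keep only the largest singular value.
# This is the basic move behind compression and low-rank embeddings.
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(s)   # singular values, sorted in descending order
print(A1)  # best rank-1 approximation of A
```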
Neural Networks as Layered Transformations
Each layer computes:
$a = \sigma(Wx + b)$
Which is:
- Linear transformation ( W )
- Shift ( b )
- Nonlinear activation ( $\sigma$ )
Geometrically:
- Linear layers reshape space
- Activations bend space
- Deep networks progressively warp geometry
Training adjusts ( W ) so that:
- Classes become linearly separable
- Desired outputs align with target directions
Deep learning is geometry engineering.
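A minimal sketch of one such layer, with ReLU as the activation and made-up weights for a 3-input, 2-output layer:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer(x, W, b):
    # Linear transformation, shift, then nonlinear activation
    return relu(W @ x + b)

# Hypothetical weights and input for a 3 -> 2 layer
W = np.array([[1.0, -1.0, 0.5],
              [0.0,  2.0, -1.0]])
b = np.array([0.1, -0.2])
x = np.array([1.0, 0.5, 2.0])

# W @ x = [1.5, -1.0]; + b = [1.6, -1.2]; relu -> [1.6, 0.0]
print(layer(x, W, b))
```

Stacking several such calls, each with its own `W` and `b`, is exactly the "progressive warping of geometry" described above.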
A matrix is a transformation of space.
All machine learning models are compositions of transformations.
If:
$y = Ax$
Then ( A ) transforms vector ( x ) into a new vector ( y ).
Geometrically, a matrix can:
- Stretch
- Compress
- Rotate
- Reflect
- Shear
- Project
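Three of these actions sketched as concrete 2×2 matrices (the vectors are illustrative):

```python
import numpy as np

# 90° counter-clockwise rotation
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
# Horizontal shear
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])
# Projection onto the x-axis
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

print(R @ np.array([1.0, 0.0]))  # [0., 1.] — rotated
print(S @ np.array([0.0, 1.0]))  # [1., 1.] — sheared
print(P @ np.array([2.0, 3.0]))  # [2., 0.] — projected
```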
Linear Regression = Projection
When solving linear regression:
$X\beta \approx y \quad\Rightarrow\quad \hat{\beta} = (X^T X)^{-1} X^T y$
You are not just solving equations.
You are projecting vector ( y ) onto the column space of ( X ).
Meaning:
- ( X ) defines a subspace (all linear combinations of features)
- ( y ) may not lie in that space
- We find the closest point in that space
This closest point is the orthogonal projection.
The residual error is perpendicular to the feature space.
Mathematically:
$X^T (y - X\hat{\beta}) = 0$
Geometric meaning:
The error vector is orthogonal to every feature direction.
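A sketch of this projection view with a tiny made-up dataset: solve the normal equation, then verify that the residual is orthogonal to every column of ( X ). (In practice `np.linalg.lstsq` is the numerically safer route; the explicit normal equation is used here to mirror the math.)

```python
import numpy as np

# Hypothetical design matrix (intercept column + one feature) and targets
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 4.0])

# Normal equation: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat      # orthogonal projection of y onto col(X)
residual = y - y_hat

# The residual is orthogonal to every feature direction
print(X.T @ residual)  # ≈ [0, 0]
```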
