☠️ Advanced Multivariate Linear Algebra 2
Detailed explanation of the Normal Equation for linear regression, including matrix formulation, closed-form solution, comparison with gradient descent, and practical considerations for implementation.
💀 When Geometry Becomes Dangerous
Orthogonality = Independence
Orthogonality represents perpendicularity: elements (vectors, functions, or data) are independent and have a dot product or inner product of zero.
Two vectors $u$ and $v$ are orthogonal if their dot product is zero:
$u \cdot v = 0$
i.e. they form a 90° angle (they are perpendicular).
In ML, orthogonality means:
- No linear dependence
- No shared directional information
This is why:
- PCA finds orthogonal principal components
- QR decomposition builds orthogonal bases
- SVD decomposes into orthogonal directions
Orthogonality reduces redundancy.
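A minimal sketch of the zero-dot-product test, using two illustrative vectors (the values are made up for the example):

```python
import numpy as np

# Two illustrative vectors in R^3
u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 1.0, -2.0])

# Orthogonality test: dot product equals zero
# 1*2 + 2*1 + 2*(-2) = 0
print(np.dot(u, v))  # 0.0
```

Because the dot product is zero, `u` carries no directional information about `v`: this is the "no shared directional information" property the list above describes.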
Eigenvectors
Non-zero vectors associated with a linear transformation (represented by a square matrix) that do not change direction when that transformation is applied.
Instead, they are only:
- scaled (stretched)
- compressed
- reversed
by a scalar factor known as the eigenvalue.
If:
$Av = \lambda v$
That means:
- Applying transformation ( A ) to ( v )
- Does not change the direction of ( v )
- Only scales it by $\lambda$
Eigenvectors are:
- Directions that remain stable under transformation.
- Natural Directions of a Transformation
In ML:
- PCA uses eigenvectors of the covariance matrix
- These represent directions of maximum variance
- Each eigenvector = principal axis of data
Eigenvalues tell you how important that direction is.
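A sketch of this PCA connection on toy data (the dataset and its stretch factors are invented for illustration): eigen-decompose the covariance matrix and read off the direction of maximum variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D data, deliberately stretched along the x-axis
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0],
                                          [0.0, 0.5]])

# Covariance matrix of the data
C = np.cov(X, rowvar=False)

# eigh: eigen-decomposition for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(C)

# Largest eigenvalue -> principal axis (direction of maximum variance)
principal_axis = eigvecs[:, np.argmax(eigvals)]

# Eigenvector property: C v = lambda v
assert np.allclose(C @ principal_axis, eigvals.max() * principal_axis)
print(eigvals)         # variances along each principal axis
print(principal_axis)  # close to [±1, 0] for this stretched data
```

The eigenvalues here play exactly the "importance" role described above: the larger one corresponds to the stretched direction.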
Singular Value Decomposition (SVD)
Factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation.
SVD states:
$A = U \Sigma V^T$
Geometrically:
- $V^T$ rotates the space
- $\Sigma$ scales each axis
- $U$ rotates again
So any matrix transformation can be seen as:
Rotate → Stretch → Rotate
This is why SVD is foundational in:
- Dimensionality reduction
- Embeddings
- LLM weight compression
- Recommender systems
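A short sketch of the factorization and of truncation, the mechanism behind the dimensionality-reduction uses listed above (the matrix values are arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Full decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The three factors reconstruct A exactly
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Rank-1 approximation: keep only the largest singular value.
# This is the basic move behind compression and low-rank embeddings.
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(s)   # singular values, sorted in descending order
print(A1)  # best rank-1 approximation of A
```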
Neural Networks as Layered Transformations
Each layer computes:
$a = \sigma(Wx + b)$
Which is:
- Linear transformation ( W )
- Shift ( b )
- Nonlinear activation ( $\sigma$ )
Geometrically:
- Linear layers reshape space
- Activations bend space
- Deep networks progressively warp geometry
Training adjusts ( W ) so that:
- Classes become linearly separable
- Desired outputs align with target directions
Deep learning is geometry engineering.
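A minimal sketch of one such layer, with ReLU as the activation and made-up weights for a 3-input, 2-output layer:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer(x, W, b):
    # Linear transformation, shift, then nonlinear activation
    return relu(W @ x + b)

# Hypothetical weights and input for a 3 -> 2 layer
W = np.array([[1.0, -1.0, 0.5],
              [0.0,  2.0, -1.0]])
b = np.array([0.1, -0.2])
x = np.array([1.0, 0.5, 2.0])

# W @ x = [1.5, -1.0]; + b = [1.6, -1.2]; relu -> [1.6, 0.0]
print(layer(x, W, b))
```

Stacking several such calls, each with its own `W` and `b`, is exactly the "progressive warping of geometry" described above.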
A matrix is a transformation of space.
All machine learning models are compositions of transformations.
If:
$y = Ax$
Then ( A ) transforms vector ( x ) into a new vector ( y ).
Geometrically, a matrix can:
- Stretch
- Compress
- Rotate
- Reflect
- Shear
- Project
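Three of these actions sketched as concrete 2×2 matrices (the vectors are illustrative):

```python
import numpy as np

# 90° counter-clockwise rotation
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
# Horizontal shear
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])
# Projection onto the x-axis
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

print(R @ np.array([1.0, 0.0]))  # [0., 1.] — rotated
print(S @ np.array([0.0, 1.0]))  # [1., 1.] — sheared
print(P @ np.array([2.0, 3.0]))  # [2., 0.] — projected
```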
Linear Regression = Projection
When solving linear regression:
$X\beta \approx y \quad\Rightarrow\quad \hat{\beta} = (X^T X)^{-1} X^T y$
You are not just solving equations.
You are projecting vector ( y ) onto the column space of ( X ).
Meaning:
- ( X ) defines a subspace (all linear combinations of features)
- ( y ) may not lie in that space
- We find the closest point in that space
This closest point is the orthogonal projection.
The residual error is perpendicular to the feature space.
Mathematically:
$X^T (y - X\hat{\beta}) = 0$
Geometric meaning:
The error vector is orthogonal to every feature direction.
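A sketch of this projection view with a tiny made-up dataset: solve the normal equation, then verify that the residual is orthogonal to every column of ( X ). (In practice `np.linalg.lstsq` is the numerically safer route; the explicit normal equation is used here to mirror the math.)

```python
import numpy as np

# Hypothetical design matrix (intercept column + one feature) and targets
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 4.0])

# Normal equation: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat      # orthogonal projection of y onto col(X)
residual = y - y_hat

# The residual is orthogonal to every feature direction
print(X.T @ residual)  # ≈ [0, 0]
```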
