☠️ Advanced Multivariate Linear Algebra
Detailed explanation of the Normal Equation for linear regression, including matrix formulation, closed-form solution, comparison with gradient descent, and practical considerations for implementation.
Mathematical Object
A mathematical object is an abstract concept that can serve as a value, be assigned to a symbol, and therefore appear in formulas.
- Examples: numbers, expressions, shapes, functions, and sets.
- Complex objects: theorems, proofs.

Tensor
An algebraic object that describes a multilinear relationship between sets of algebraic objects associated with a vector space.
- From Latin tendere, meaning 'to stretch'.
Scalar
Scalars are the real numbers used in linear algebra.
- A single number, a 0-dimensional tensor.
- Example: $3$, $-0.5$, $\pi$, $e$
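As a quick sanity check (a minimal NumPy sketch, since these notes use NumPy below), a scalar really is a 0-dimensional array:

```python
import numpy as np

# A scalar is a 0-dimensional tensor: it has no axes at all.
s = np.array(5.0)
print(s.ndim)   # 0 axes
print(s.shape)  # empty shape tuple ()
```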
Matrices
A matrix is a 2D array of numbers (a table of numbers).
Representation:
In mathematics:
$A \in \mathbb{R}^{3 \times 4}$
- Where $A$ is a real-valued matrix with 3 rows and 4 columns.
In programming:
import numpy as np

A = np.array([[85, 76, 66, 5],
              [94, 75, 18, 28],
              [68, 40, 71, 5]])
In theory:
- Uppercase letters (A, B, X) → Matrices
- Lowercase letters (x, y, z) → Vectors or scalars
Dimension:
An $m \times n$ matrix, where:
- $m$ = number of rows
- $n$ = number of columns
- The example matrix $A$ above is $3 \times 4$.
Square Matrix
- A matrix with the same number of rows and columns ($m = n$).
Element notation: $A_{i,j}$
- The element in the $i$-th row and $j$-th column.
Example (using the $3 \times 4$ matrix $A$ above):
- $A_{1,1} = 85$ → Row 1, Column 1
- $A_{3,2} = 40$ → Row 3, Column 2
- $A_{2,3} = 18$ → Row 2, Column 3
- $A_{2,4} = 28$ → Row 2, Column 4
- $A_{6,4}$ → 6th row, 4th column does not exist
Use in Machine Learning
Matrices represent the data matrix, model parameters, and transformations.
If we have:
- $m$ training examples
- $n$ features
The data matrix is:
$X \in \mathbb{R}^{m \times n}$
Dimension $m \times n$, where:
- Each row = one training example
- Each column = one feature
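A small NumPy sketch of this layout (the dataset values are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical toy dataset: m = 4 training examples, n = 2 features
# (house size, number of bedrooms) -- illustrative values only.
X = np.array([[2104, 5],
              [1416, 3],
              [1534, 3],
              [ 852, 2]])

m, n = X.shape          # rows = examples, columns = features
first_example = X[0]    # one row = one training example
size_feature  = X[:, 0] # one column = one feature across all examples
```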
Vectors
A vector is a matrix with 1 column (an $n \times 1$ matrix).
- Represents a point in high-dimensional space.
- From Latin vector, meaning "carrier" or "driver".
- Has a direction and a magnitude (length).
Represented as:
In mathematics:
$y = \begin{bmatrix} 460 \\ 232 \\ 315 \\ 178 \end{bmatrix}$
In programming:
y = np.array([460, 232, 315, 178])
In theory:
- Lowercase letters (x, y, z) → Vectors
Dimension:
$y \in \mathbb{R}^4$
In ML, vectors represent:
- A data point → a vector
- A feature column → a direction
- A model weight vector → a direction of best fit
Example:
- $y$ above is a 4 × 1 matrix, or equivalently a 4-dimensional vector.
Element Indexing
$y_i$ = the $i$-th element.
- In mathematics, indexing usually starts at 1.
- In programming, indexing often starts at 0.
- Unless otherwise specified, assume one-indexed notation in linear algebra.
Example:
- $y_1 = 460$ (one-indexed)
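The one-indexed vs zero-indexed distinction, shown with the vector from the notes:

```python
import numpy as np

y = np.array([460, 232, 315, 178])

# Math is usually 1-indexed: y_1 = 460.
# NumPy is 0-indexed: the same element is y[0].
first = y[0]
third = y[2]   # y_3 in math notation
```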
Transpose ($A^T$)
Transpose swaps rows and columns.
If $A$ is an $m \times n$ matrix, then $A^T$ is an $n \times m$ matrix.
Element-wise: $(A^T)_{i,j} = A_{j,i}$
- A column vector becomes a row vector.
Given:
$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$
Then:
$A^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$
Used heavily in:
- Normal Equation
- Gradient derivations
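In NumPy, transposition is the `.T` attribute (a minimal sketch with a hypothetical 3×2 matrix):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])   # shape (3, 2)

At = A.T                 # shape (2, 3): rows and columns swapped
```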
Identity Matrix ($I$)
The identity matrix is the matrix equivalent of the number 1.
It is a square matrix with:
- 1’s on the diagonal
- 0’s everywhere else
Property: $AI = IA = A$
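This property is easy to verify numerically with `np.eye`:

```python
import numpy as np

I = np.eye(3)                      # 3x3 identity: 1's on the diagonal
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# Multiplying by the identity leaves A unchanged: AI = IA = A
left  = A @ I
right = I @ A
```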
Inverse Matrix ($A^{-1}$)
The inverse of a matrix plays the role of division.
- Only square matrices can have inverses.
The matrix inverse satisfies: $A A^{-1} = A^{-1} A = I$
Used in the Normal Equation:
$\theta = (X^T X)^{-1} X^T y$
Not all square matrices are invertible.
1. Invertible / Non-Singular Matrix
A matrix that can be inverted.
- It has an inverse if it is full rank (its rows and columns are linearly independent).
2. Non-Invertible / Singular / Degenerate Matrix
A matrix that does not have an inverse.
- It is not full rank (some rows or columns are linearly dependent).
Causes of a non-invertible matrix:
- Redundant features: two features related by a linear equation, $x_2 = k x_1$, e.g. size in feet and size in meters.
- More features than training examples ($m \le n$): delete some features or use regularization.
Octave functions for inverting a matrix:
- pinv(A): pseudo-inverse; returns a result even if the matrix is non-invertible.
- inv(A): inverse; only valid for invertible matrices.
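The NumPy counterparts (since the rest of these notes use NumPy) are `np.linalg.pinv` and `np.linalg.inv`:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])   # singular: row 2 = 2 * row 1

# np.linalg.inv would raise LinAlgError for A;
# the pseudo-inverse is defined for any matrix.
A_pinv = np.linalg.pinv(A)

B = np.array([[4., 7.],
              [2., 6.]])   # invertible (det = 10)
B_inv = np.linalg.inv(B)
```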
Determinant ($\det(A)$ or $|A|$)
The determinant tells us whether a matrix is invertible.
For a 2 × 2 matrix:
$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$
If $\det(A) \ne 0$: the matrix is invertible.
If $\det(A) = 0$: the matrix is singular (not invertible), and the corresponding linear system has:
- Either no solution
- Or infinitely many solutions
Use in Machine Learning:
- The Normal Equation requires matrix inversion.
Closed-form solution:
$\theta = (X^T X)^{-1} X^T y$
- In practice, we use numerical methods to avoid the instability of explicit matrix inversion.
- Regularization can make matrices invertible by adding a small value to the diagonal (Ridge Regression): $X^T X + \lambda I$.
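A minimal sketch of the closed-form solution with a small ridge term, on synthetic data (the generating weights $[2, 3]$ and $\lambda$ value are illustrative assumptions):

```python
import numpy as np

# Synthetic data: y = 2*x1 + 3*x2, no noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = X @ np.array([2.0, 3.0])

# Normal equation with ridge term lambda * I to keep X^T X invertible.
# np.linalg.solve is preferred over computing the inverse explicitly.
lam = 1e-8
theta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```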
Matrix as a Transformation
A matrix is a transformation of space.
All machine learning models are compositions of transformations.
If: $y = Ax$
Then $A$ transforms the vector $x$ into a new vector $y$.
Geometrically, a matrix can:
- Stretch
- Compress
- Rotate
- Reflect
- Shear
- Project
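One concrete example of such a transformation is a 2D rotation matrix (a standard construction, shown here rotating a point by 90 degrees):

```python
import numpy as np

# 2D rotation matrix for angle theta (here, 90 degrees)
angle = np.pi / 2
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])

x = np.array([1.0, 0.0])
y = R @ x   # rotates x from the x-axis onto the y-axis
```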
Matrix Addition/Subtraction
When is addition allowed?
Addition is done element by element.
Two matrices can be added only if they have the same dimensions.
If $A$ and $B$ are both $m \times n$ matrices,
then: $C = A + B$
where $C_{i,j} = A_{i,j} + B_{i,j}$,
and $C$ is also an $m \times n$ matrix.
Subtraction works the same way, with minus signs.
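Element-wise addition and subtraction in NumPy (with hypothetical 2×2 matrices):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

C = A + B   # element-wise; requires matching shapes
D = A - B
```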
Scalar Multiplication / Division
Scalar multiplication multiplies every element of a matrix by a single number (the scalar); scalar division divides every element by it.
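In NumPy this is ordinary `*` and `/` between a number and an array:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

B = 3 * A    # every element multiplied by 3
C = A / 2    # every element divided by 2
```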
Matrix-Matrix Multiplication
Element-wise definition: $C_{i,j} = \sum_k A_{i,k} B_{k,j}$ (sum over $k$)
Given 2 matrices:
- $A$ is $m \times n$
- $B$ is $n \times p$
Then: $C = AB$
where $C$ is a new matrix with dimensions:
- $m \times p$
- The inner dimensions must match ($n$): $(m \times n)(n \times p) \to (m \times p)$
Properties:
- Not commutative: $AB \ne BA$ in general (order matters)
- Associative: $(AB)C = A(BC)$
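A small sketch of the dimension rule and non-commutativity, using NumPy's `@` operator:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])      # 3 x 2

C = A @ B                   # inner dims match: (2x3)(3x2) -> 2x2
# Not commutative: reversing the order gives a different result
D = B @ A                   # (3x2)(2x3) -> 3x3, a different shape entirely
```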
Use in Machine Learning
Everything in deep learning is matrix multiplication:
- Inputs × Weights
- Weights × Activations
- Gradient updates
Neural network forward pass (each layer): $a = g(Wx + b)$, where $g$ is the activation function.
Backpropagation is also matrix calculus.
Understanding multivariate linear algebra makes deep learning much easier to grasp.
Vectorization: Matrix-Vector Multiplication
If:
- $A$ is an $m \times n$ matrix
- $x$ is an $n \times 1$ vector
Then $Ax$:
- Produces an $m \times 1$ vector
Use in Machine Learning
- This gives predictions for all training examples in one operation.
- Faster computation: optimized hardware usage (CPU/GPU)
- Clean mathematical formulation
Linear regression hypothesis:
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$
For all training examples at once:
$\hat{y} = X\theta$
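The vectorized prediction is a single matrix-vector product (a sketch with hypothetical data; the first column of $X$ is all ones for the intercept term):

```python
import numpy as np

# Hypothetical design matrix: column of 1's (intercept) + one feature
X = np.array([[1., 2104.],
              [1., 1416.],
              [1., 1534.]])
theta = np.array([50., 0.1])   # illustrative parameters

y_hat = X @ theta   # predictions for all training examples in one operation
```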
Dot Product ($a \cdot b$)
The dot product is defined between two vectors of the same dimension.
If $a, b \in \mathbb{R}^n$,
then their dot product is:
$a \cdot b = a^T b = \sum_{i=1}^{n} a_i b_i$
It produces a single number (a scalar).
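In NumPy, `np.dot` (or `a @ b`) computes this sum of element-wise products:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

d = np.dot(a, b)   # 1*4 + 2*5 + 3*6
```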
Summary
Key ideas:
- Vectors represent features and parameters
- Matrices represent datasets
- Matrix multiplication enables fast prediction
- Transpose and inverse enable optimization
- Vectorization is essential for performance
