



Normal Equation in Linear Regression: Formula, Intuition, and Comparison with Gradient Descent

Understand the Normal Equation in linear regression, its closed-form solution, mathematical formula, advantages, limitations, and how it compares to gradient descent for model optimization.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026



Normal Equation (Closed-Form Solution)

Instead of running many iterations of gradient descent, the Normal Equation obtains θ in a single step.

  • θ can be calculated directly at the point where the cost function is minimal, using calculus (setting the gradient of the cost function to zero), in one step instead of iterative optimization:

$\theta = (X^T X)^{-1} X^T y$

Advantages

  • No learning rate required
  • Direct computation

Limitations

  • Computationally expensive when the number of features n is very large
  • Inverting the n×n matrix XᵀX costs O(n³)

Steps:

  • Construct the design matrix X from the feature columns, adding a column of 1s as the first column (for the intercept term)
  • Construct the target vector y from the output values
  • Calculate:

$\theta = (X^T X)^{-1} X^T y$
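
As a concrete illustration (not from the original post), here is a minimal NumPy sketch of these steps; the toy data values and variable names are invented for the example:

```python
import numpy as np

# Toy data: 4 training examples with one feature (values are made up)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Step 1: design matrix X with a leading column of 1s for the intercept
X = np.column_stack([np.ones_like(x), x])

# Step 2: the target vector y is already constructed above

# Step 3: theta = (X^T X)^{-1} X^T y
# pinv (pseudo-inverse) is used instead of inv so the computation also
# works when X^T X is singular or nearly singular
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)  # [theta0, theta1], roughly the intercept and slope
```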

Mean Normalization

Feature scaling is not required for the Normal Equation method, since there is no iterative optimization whose convergence depends on the ranges of the features.

Normal Equation vs Gradient Descent:

| Feature | Gradient Descent | Normal Equation |
|---|---|---|
| Complexity | More complex; the learning rate α needs debugging | Convenient and simple to implement |
| Learning rate (α) | Must be chosen | Not needed |
| Feature scaling | Required | Not needed |
| Iterations | Many iterations required | Not required |
| Feature count n ≥ ~1 million | Efficient even when n is huge: O(kn²) | Slow when n is huge; matrix inversion costs O(n³) |
| Complex learning algorithms | Can be used for more complex learning algorithms | Not supported |
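
To make the comparison tangible, here is a hedged sketch (my own, with invented data and an illustrative α, not from the post) showing that batch gradient descent converges to the same θ that the Normal Equation returns in one step:

```python
import numpy as np

# Same toy data as in the Steps sketch above (values are invented)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
X = np.column_stack([np.ones_like(x), x])
m = len(y)

# Normal Equation: one step, no learning rate, no feature scaling
theta_ne = np.linalg.pinv(X.T @ X) @ X.T @ y

# Batch gradient descent: needs a learning rate alpha and many iterations
theta_gd = np.zeros(2)
alpha = 0.05           # illustrative choice; too large and it diverges
for _ in range(5000):  # illustrative iteration count
    gradient = (X.T @ (X @ theta_gd - y)) / m
    theta_gd -= alpha * gradient

print(theta_ne)  # direct solution
print(theta_gd)  # converges to (nearly) the same values
```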

Faster single-hypothesis prediction, given a data set and the θ values: much faster than nested for loops (see the sketch below).

Data Matrix × Parameter Vector = Prediction Vector

$h(x) = \theta_0 + \theta_1 x$
$[1,\ x] \cdot \begin{bmatrix}\theta_0 \\ \theta_1\end{bmatrix} = [h(x)]$
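
A minimal sketch of this single vectorized prediction, assuming illustrative θ values of my own:

```python
import numpy as np

theta = np.array([0.5, 1.9])  # [theta0, theta1], invented for the example
x_row = np.array([1.0, 3.0])  # [1, x] for one example with x = 3

# One dot product replaces the explicit sum theta0*1 + theta1*x
h = x_row @ theta
print(h)  # 0.5 + 1.9 * 3.0 = 6.2
```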


Usage:

  • Faster multiple-hypothesis prediction, given a data set and several candidate θ vectors
  • Much, much faster than nested for loops (see the sketch below)

Data Matrix × Parameter Matrix = Prediction Matrix

  given $h(x) = \theta_0 + \theta_1 x$
  $[1,\ x] \cdot [\Theta \text{ matrix}] = [h(x)]$

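And a corresponding sketch for the multi-hypothesis case; the three θ columns below are invented to show the shape of the computation:

```python
import numpy as np

# Design matrix: 3 examples, each row is [1, x]
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

# Parameter matrix: each COLUMN is one hypothesis [theta0, theta1]
Theta = np.array([[0.0, 1.0, -1.0],
                  [1.0, 0.5,  2.0]])

# One matrix product gives a 3x3 prediction matrix:
# entry (i, j) is the prediction of hypothesis j on example i
H = X @ Theta
print(H)
```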
