
Evaluating a Hypothesis in Neural Networks

Learn how to evaluate a hypothesis in machine learning: split data into training, cross-validation, and test sets, measure training and test error, and use error analysis to improve a model in a targeted way.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


📋 Evaluating a Hypothesis

A model that fits the training data very well is not necessarily a good hypothesis.

A model can have low training error but still perform poorly on new data due to overfitting:

  • Low training error
  • High error on unseen data

Choosing Between Multiple Models

Suppose we are trying polynomial regression with different degrees:

$d = 1, 2, 3, \dots$

Each degree defines a different hypothesis class.

We need a principled way to choose the best $d$ without biasing our evaluation.

A good model has:

  • Low training error
  • Low test error on unseen data

If training error is low but test error is high, the model is overfitting.

To properly select a model:

  1. Train parameters on the training set
  2. Choose model complexity using the cross-validation set
  3. Report final performance using the test set

Splitting the Dataset

To properly evaluate performance, we split the dataset into three parts (a small code sketch of this split follows the list). A common split is:

  • Training set: 60%
  • Cross-validation set: 20%
  • Test set: 20%
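Here is a minimal sketch of such a split, assuming the data lives in NumPy arrays X and y; the 60/20/20 proportions and the function name are just the example's choices:

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle and split the data into 60% train, 20% cross-validation, 20% test."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    idx = rng.permutation(m)                     # shuffle so the split is random
    n_train, n_cv = int(0.6 * m), int(0.2 * m)
    train, cv, test = np.split(idx, [n_train, n_train + n_cv])
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])
```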

1. 📚 Training set $J_{\text{train}}(\Theta)$

Typically 60-70% of the data.

  • Training error tells us how well the model fits known data.

Used to learn the parameters $\Theta$ by minimizing the training error

$$J_{\text{train}}(\Theta)$$

using only the training set.

2. 📘 Cross-Validation Set $J_{\text{cv}}(\Theta)$

Used for Model Selection (Validation)

For each trained model $\Theta^{(d)}$, compute

$$J_{\text{cv}}\big(\Theta^{(d)}\big)$$

using the cross-validation set.

Choose the polynomial degree:

$$d^* = \arg\min_d J_{\text{cv}}\big(\Theta^{(d)}\big)$$

This selects the model that generalizes best among the candidates.
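As an illustration, here is a minimal sketch of this selection loop for one-dimensional polynomial regression using NumPy; the arrays x_train, y_train, x_cv, y_cv, x_test, y_test and the candidate degrees are assumptions of the example:

```python
import numpy as np

def half_mse(theta, x, y):
    """Squared-error cost J(theta) for a fitted polynomial."""
    predictions = np.polyval(theta, x)
    return np.mean((predictions - y) ** 2) / 2

# Fit one model per candidate degree on the training set only.
degrees = range(1, 11)
models = {d: np.polyfit(x_train, y_train, d) for d in degrees}

# Pick the degree d* with the lowest cross-validation error.
cv_errors = {d: half_mse(theta, x_cv, y_cv) for d, theta in models.items()}
d_star = min(cv_errors, key=cv_errors.get)

# Report generalization error once, using the untouched test set.
test_error = half_mse(models[d_star], x_test, y_test)
```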

3. 📗 Test set $J_{\text{test}}(\Theta)$

The remaining 20-30% of the data.

  • Test error tells us how well the model generalizes.

After choosing $d^*$, estimate the generalization error using:

$$J_{\text{test}}\big(\Theta^{(d^*)}\big)$$

The test set is used only once, at the very end; it must remain untouched during training and model selection.

Test Set Error Examples

1. Linear Regression

For linear regression, the test error is:

$$J_{\text{test}}(\Theta) = \frac{1}{2 m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \left( h_\Theta\big(x_{\text{test}}^{(i)}\big) - y_{\text{test}}^{(i)} \right)^2$$

where:

  • $m_{\text{test}}$ is the number of test examples
  • $h_\Theta(x)$ is the hypothesis function

This measures the average squared error on unseen data.
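A minimal NumPy sketch of this computation, assuming X_test already includes the bias column and theta holds the parameters learned on the training set:

```python
import numpy as np

def linear_test_error(theta, X_test, y_test):
    """Average squared error (with the 1/2 factor) on the test set."""
    m_test = X_test.shape[0]
    predictions = X_test @ theta          # h_theta(x) = theta^T x
    return np.sum((predictions - y_test) ** 2) / (2 * m_test)
```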

2. Classification: Logistic Regression

Given a training set, learn the parameter vector $\Theta$ by minimizing the logistic regression cost function:

$$J_{\text{train}}(\Theta) = -\frac{1}{m_{\text{train}}} \sum_{i=1}^{m_{\text{train}}} \left[ y^{(i)} \log h_\Theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log \big(1 - h_\Theta(x^{(i)})\big) \right]$$

where

$$h_\Theta(x) = \sigma(\Theta^T x) = \frac{1}{1 + e^{-\Theta^T x}}$$

After learning $\Theta$ on the training set, evaluate performance on the test set.

The test set cost is:

$$J_{\text{test}}(\Theta) = -\frac{1}{m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \left[ y_{\text{test}}^{(i)} \log h_\Theta\big(x_{\text{test}}^{(i)}\big) + \big(1 - y_{\text{test}}^{(i)}\big) \log \big(1 - h_\Theta\big(x_{\text{test}}^{(i)}\big)\big) \right]$$

Important:

  • $\Theta$ is not retrained on the test set.
  • We simply plug the learned $\Theta$ into the test cost formula, as in the sketch below.
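A minimal NumPy sketch of evaluating this test cost with an already-learned theta; the array names and the small epsilon used for numerical safety are the example's assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_test_cost(theta, X_test, y_test, eps=1e-12):
    """Cross-entropy cost on the test set; theta is NOT retrained here."""
    m_test = X_test.shape[0]
    h = np.clip(sigmoid(X_test @ theta), eps, 1 - eps)   # avoid log(0)
    return -np.sum(y_test * np.log(h) + (1 - y_test) * np.log(1 - h)) / m_test
```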

Misclassification error

For classification, we often use misclassification error (also called 0/1 error).

Define:

$$\text{err}(h_\Theta(x), y) = \begin{cases} 1 & \text{if } h_\Theta(x) \ge 0.5 \text{ and } y = 0 \\ 1 & \text{if } h_\Theta(x) < 0.5 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$$

This gives:

  • 1 for an incorrect prediction
  • 0 for a correct prediction

Classification Average Test Error

The overall test error is:

$$\text{Test Error} = \frac{1}{m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \text{err}\big( h_\Theta\big(x_{\text{test}}^{(i)}\big),\, y_{\text{test}}^{(i)} \big)$$

This gives the proportion of test examples that were misclassified.
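A matching sketch of the 0/1 test error, with the same assumed array names:

```python
import numpy as np

def misclassification_error(theta, X_test, y_test):
    """Fraction of test examples whose 0.5-thresholded prediction is wrong."""
    h = 1.0 / (1.0 + np.exp(-(X_test @ theta)))   # sigmoid hypothesis
    predictions = (h >= 0.5).astype(int)
    return np.mean(predictions != y_test)
```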


Error Analysis

A practical and effective approach to solving machine learning problems is:

  1. Start with a simple algorithm
  2. Implement it quickly
  3. Evaluate it early using cross-validation data

Avoid over-engineering before you understand where the model is failing.

Step 1 — Plot Learning Curves

Learning curves help answer questions like:

  • Would more training data help?
  • Is the model suffering from high bias?
  • Is it suffering from high variance?
  • Would more features improve performance?

They give direction before investing more time; a minimal sketch of how such a curve can be computed follows.
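In the sketch below, fit(X, y) and error(model, X, y) are placeholders for whatever learner and cost function you are using; the curve records training and cross-validation error as the training set grows:

```python
import numpy as np

def learning_curve(fit, error, X_train, y_train, X_cv, y_cv, steps=10):
    """Train on growing subsets and record training/CV error at each size."""
    m = X_train.shape[0]
    sizes = np.linspace(m // steps, m, steps, dtype=int)
    train_err, cv_err = [], []
    for n in sizes:
        model = fit(X_train[:n], y_train[:n])       # train on the first n examples
        train_err.append(error(model, X_train[:n], y_train[:n]))
        cv_err.append(error(model, X_cv, y_cv))     # always evaluate on the full CV set
    return sizes, train_err, cv_err
```

If both curves plateau at a high error, the model likely suffers from high bias; a large, persistent gap between them points to high variance.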

Step 2 — Manually Inspect Errors

After evaluating on the cross-validation set:

  • Look at misclassified examples
  • Try to identify patterns in the errors

Example

Suppose:

  • 500 total emails
  • 100 misclassified

Instead of guessing improvements, manually inspect those 100 emails.

You might categorize them:

  • Phishing emails
  • Promotional emails
  • Personal emails
  • Password theft attempts

If most errors are password-theft emails, that suggests the model is missing features specific to that category.

You could then:

  • Add features related to suspicious links
  • Add features related to urgent security language
  • Detect specific keyword patterns

Step 3 — Try Improvements Systematically

Every time you introduce a change, such as:

  • Add a feature
  • Apply stemming
  • Modify preprocessing
  • Adjust regularization

you must measure its impact using a single numerical metric.

Without a numerical value, you cannot objectively compare changes.

Example: Stemming

Stemming treats variations of a word as the same root, e.g. fail, failing, and failed all collapse to fail.
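As a quick illustration of what stemming does to the vocabulary (NLTK's Porter stemmer is just one possible choice, assumed here):

```python
from nltk.stem import PorterStemmer   # assumes the nltk package is installed

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["fail", "failing", "failed"]])
# ['fail', 'fail', 'fail'] -- all three variants map to the same token
```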

If error rate drops from:

$5\% \rightarrow 3\%$

That is a strong improvement. Keep it.

Example: Case Sensitivity

Suppose distinguishing between uppercase and lowercase changes error from:

$3\% \rightarrow 3.2\%$

That is worse. Do not keep the feature.

Core Principle

Always:

  1. Make one change at a time
  2. Measure cross-validation error
  3. Keep only changes that reduce error

Avoid guessing. Let the data guide decisions.
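That workflow can be written down almost literally. In the sketch below, build_model, cv_error, and the list of candidate changes are placeholders assumed for the example:

```python
def select_changes(candidate_changes, baseline_config, build_model, cv_error):
    """Accept one change at a time; keep it only if cross-validation error drops."""
    config = dict(baseline_config)
    best = cv_error(build_model(config))
    for change in candidate_changes:            # e.g. "stemming", "case_folding"
        trial = {**config, change: True}        # apply exactly one change
        err = cv_error(build_model(trial))
        if err < best:                          # keep only changes that reduce error
            config, best = trial, err
    return config, best
```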

When troubleshooting prediction errors, we might try:

  • Getting more training examples
  • Trying smaller sets of features
  • Adding new features
  • Trying polynomial features
  • Increasing or decreasing the regularization parameter $\lambda$

Before investing effort in any of these, we need a reliable way to evaluate the resulting hypothesis.


Key Insight

Error analysis turns machine learning from random tweaking into a systematic engineering process.

Instead of asking:

"What should I try next?"

You ask:

"Where is the model failing, and why?"

Then improve it in a targeted way.
