
Machine Learning: Introduction and Core Algorithms

Beginner-friendly introduction to machine learning, covering key concepts, model types, supervised and unsupervised learning, and essential algorithms such as linear regression, logistic regression, decision trees, and clustering.

Written by Hitesh Sahu, a passionate developer and blogger.

Tue Feb 24 2026


Machine Learning 🤖

AI

AI is the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of achieving its goals.

ML

ML is the study of computer algorithms that improve automatically through experience.

  • ML is a subset of AI
  • It learns from data
  • It improves performance (P) with experience (E) while performing a task (T)

Older definition -- Arthur Samuel (1959)

The field of study that gives computers the ability to learn without being explicitly programmed.

Modern definition -- Tom Mitchell (1997)

A computer program is said to learn from experience E with respect to some task T and performance measure P if its performance on T, as measured by P, improves with experience E.

Example (spam filter):

  • E (Experience) - User-labeled emails
  • T (Task) - Classify emails as spam or not spam
  • P (Performance measure) - Fraction of correctly classified emails
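The performance measure P from the spam example can be sketched in a few lines of Python. The label lists below are made up for illustration (1 = spam, 0 = not spam); they only show what "P improves with E" would look like in numbers.

```python
def accuracy(predicted, actual):
    """Fraction of emails classified correctly (the performance measure P)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical data: true labels and two sets of predictions.
actual = [1, 0, 1, 0, 1]
with_little_experience = [0, 0, 1, 1, 1]  # spam filter trained on few emails
with_more_experience = [1, 0, 1, 0, 0]    # same filter after more labeled emails

p_before = accuracy(with_little_experience, actual)
p_after = accuracy(with_more_experience, actual)
print(p_before, p_after)  # P rises from 0.6 to 0.8, so the program is learning
```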


Use Cases

ML is powerful when:

1. Large Datasets Exist

  • Web Analytics data
  • Medical records
  • Biological data

2. Problems Are Hard to Hand-Code

  • Autonomous driving
  • Handwriting recognition
  • Natural language processing (NLP)
  • Computer vision

3. Self-Customizing Systems

  • Amazon recommendations
  • Netflix recommendations

Machine Learning Methods

1. Supervised Learning

You give the algorithm input data and the correct outputs (“right answers”), and it learns to predict outputs for new inputs.

Training set:

  • You are given labeled data.

$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$

where:

  • $x^{(i)}$ → input features
  • $y^{(i)}$ → output label

Goal: Learn a function that maps inputs → outputs.

Examples:

  • Finding a decision boundary that separates positive and negative examples
  • Spam filtering with labeled emails
  • Diabetes classification with labeled patients
  • Cancer type prediction

1.1 Regression

Regression means predicting a continuous-valued output.

Supervised learning algorithms studied:

  • Linear Regression (regression)
  • Logistic Regression (classification, despite its name)
  • Neural Networks
  • Support Vector Machines (SVM)

These methods learn a function:

$h_\theta(x) \approx y$

used for regression or classification.

Example

Housing Price Prediction (Regression)

Predict the price of a house based on its size.

  • Feature (x): House size (square feet)
  • Output (y): Price (continuous value)

We are given historical data:

| Size (sq ft) | Price ($) |
|--------------|-----------|
| 1000         | 200000    |
| 1500         | 300000    |
| 2000         | 400000    |

The algorithm may:

  • Fit a straight line (Linear Regression)
  • Fit a quadratic curve (Polynomial Regression)

Different models may produce different predictions.
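As a minimal sketch of the straight-line option, the three data points above can be fitted with ordinary least squares. The function name `fit_line` and the query size of 1250 sq ft are illustrative choices, not part of the original notes.

```python
def fit_line(xs, ys):
    """Least-squares fit of h(x) = theta0 + theta1 * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


sizes = [1000, 1500, 2000]          # feature x: size in square feet
prices = [200000, 300000, 400000]   # output y: price in dollars

theta0, theta1 = fit_line(sizes, prices)


def predict_price(size):
    return theta0 + theta1 * size


print(predict_price(1250))  # 250000.0
```

Because the sample data lies exactly on a line, the fit is perfect here; with real data the line would only approximate the prices.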

1.2 Classification

Classification means predicting a discrete category as output.

  • We train using past labeled examples.
  • Only specific categories are allowed as output (for example, 0 or 1).

Using One Feature

Breast Cancer Detection (Classification)

What is the probability this tumor is malignant?

  • Feature: Tumor size
  • Output: 0 or 1
    • Malignant (1)
    • Benign (0)
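One way to answer the question above is logistic regression, which is among the algorithms listed earlier. The sketch below trains it with plain gradient descent on a single feature. The tumor sizes, the learning rate, and the step count are made-up values for illustration, not clinical data or settings from the original notes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit h(x) = sigmoid(theta0 + theta1 * x) by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [sigmoid(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Hypothetical training set: tumor size (cm) and label (1 = malignant, 0 = benign)
tumor_sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

theta0, theta1 = train_logistic(tumor_sizes, labels)


def probability_malignant(size):
    return sigmoid(theta0 + theta1 * size)


def classify(size):
    # Threshold the probability at 0.5 to get a discrete category
    return 1 if probability_malignant(size) >= 0.5 else 0


print(classify(1.2), classify(4.8))
```

The model outputs a probability, and the 0.5 threshold turns it into one of the two allowed categories.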

Multiple Features

More than one feature:

  • Tumor size
  • Age
  • Clump thickness
  • Uniformity of cell size
  • Uniformity of cell shape

Multiple Output categories:

  • 0 → No cancer
  • 1 → Type 1 cancer
  • 2 → Type 2 cancer
  • 3 → Type 3 cancer

It is still classification because the output is from a finite set of categories.

The algorithm learns a decision boundary that separates categories.


2. Unsupervised Learning

The inputs have no labels. The system tries to find structure in the data on its own.

Goal: discover hidden structure in data.

Algorithms studied:

  • K-Means Clustering
  • Principal Component Analysis (PCA) for dimensionality reduction
  • Anomaly Detection

Training set:

Unsupervised learning uses unlabeled data:

$x^{(1)}, x^{(2)}, \dots, x^{(m)}$

where:

  • $x^{(i)}$ is the input (features)
  • There are no $y$ labels
  • There are no correct answers
  • There are no predefined categories

Goal

Discover hidden structure in the data.

"Here is the data. Can you find structure in it?"

We do not tell the algorithm what the correct output is.
We ask it to find patterns on its own.

  • Discovers hidden structure
  • Common task: Clustering
  • Advanced example: Cocktail Party Problem

2.1 Clustering

The algorithm automatically groups similar data points together.

  • Used to find patterns

We are not told:

  • How many groups exist
  • What the groups represent
  • Which example belongs to which group

The algorithm discovers that on its own.

Example

  • Given market data, identify patterns in buying behavior
  • Given news articles, find topics
  • Given data center logs, find machines that frequently work together
  • Given social network data, find groups or communities
  • Given customer data, find market segments
  • Given astronomical data, find galaxies
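Clustering can be sketched with K-Means, which is listed among the algorithms above. This is a minimal version under stated assumptions: the 2-D points are invented, and the starting centroids are picked by hand. Note that K-Means itself needs the number of clusters chosen up front (here two, via the two starting centroids); only the group membership is discovered.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for j, members in enumerate(clusters):
            if members:  # leave the centroid in place if its cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.6),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
print(clusters[0])  # the three points near (1, 1)
print(clusters[1])  # the three points near (8.5, 9)
```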

2.2 Blind Source Separation

Separating mixed signals into original independent components.

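The Cocktail Party Problem

Several people talk at the same time, and several microphones each record a different mixture of their voices.

Given only the mixed recordings: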

  • Detect that multiple sources exist
  • Separate them into independent signals
  • Recover the original voices

No labels are given:

  • We do not tell the algorithm what each voice sounds like
  • It discovers structure in the signal

Separate the original voices from mixed signals.
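One common way to formalize this, assuming each microphone records a linear mixture of the sources, is shown below. The symbols $s$, $A$, and $W$ are introduced here for illustration and do not appear in the original notes: $s^{(i)}$ is the vector of source signals at time $i$, $x^{(i)}$ is the vector of microphone recordings, $A$ is the unknown mixing matrix, and $W$ is the unmixing matrix the algorithm has to estimate.

$$
x^{(i)} = A\, s^{(i)}, \qquad \hat{s}^{(i)} = W x^{(i)}, \qquad W \approx A^{-1}
$$

Only the $x^{(i)}$ are observed; both $A$ and $s^{(i)}$ are unknown, which is why the problem counts as unsupervised.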

Difference Between Supervised and Unsupervised Learning

| Aspect | Supervised Learning | Unsupervised Learning |
|--------|---------------------|-----------------------|
| Data | Labeled data (input + correct output) | Unlabeled data (input only) |
| Goal | Learn mapping from input → output | Discover hidden structure or patterns |
| Output type | Continuous (regression) or discrete (classification) | Clusters, groups, latent structure |
| Example problems | House price prediction, spam detection | Customer segmentation, grouping news articles |
| Human guidance | Requires correct answers during training | No correct answers provided |
| Typical tasks | Regression, classification | Clustering, dimensionality reduction |
| Evaluation | Compare predictions with true labels | Evaluate structure quality (e.g., cohesion, separation) |
| Use case | When you know what you want to predict | When you want to explore unknown patterns |

3. Special Applications

Several practical ML systems were discussed:

  • Recommender Systems
  • Large-scale machine learning
  • Parallel and distributed learning
  • Computer vision using sliding window object detection

These show how ML is applied in real-world systems.


Building Machine Learning Systems

How to make ML systems work in practice.

Important concepts:

Bias vs Variance

  • High bias → underfitting
  • High variance → overfitting

Regularization helps control variance.
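To make the role of regularization concrete, here is a sketch of a regularized squared-error cost for a one-feature linear model. The exact form (an L2 penalty scaled by a parameter `lam`, with the intercept left unpenalized) is a common convention assumed here, not something stated in the original notes.

```python
def regularized_cost(theta0, theta1, xs, ys, lam):
    """Squared-error cost plus an L2 penalty on the slope theta1.

    A larger lam pushes theta1 toward zero, giving a simpler model
    (lower variance) at the price of a worse fit (higher bias).
    """
    m = len(xs)
    squared_error = sum((theta0 + theta1 * x - y) ** 2
                        for x, y in zip(xs, ys)) / (2 * m)
    penalty = (lam / (2 * m)) * theta1 ** 2
    return squared_error + penalty


# Hypothetical data that lies exactly on y = 2x
xs = [1, 2, 3]
ys = [2, 4, 6]

cost_unregularized = regularized_cost(0.0, 2.0, xs, ys, lam=0.0)
cost_regularized = regularized_cost(0.0, 2.0, xs, ys, lam=3.0)
print(cost_unregularized, cost_regularized)  # 0.0 2.0
```

With `lam=0` the perfect fit costs nothing; with `lam=3` the same parameters are penalized, so the optimizer would trade some fit for a smaller slope.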

Evaluating Learning Algorithms

Proper evaluation is essential.

Data is usually split into:

  • Training set
  • Cross-validation set
  • Test set
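A minimal sketch of such a split is shown below. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration; the original notes do not prescribe specific ratios.

```python
import random


def split_dataset(examples, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle the examples, then cut them into train / cross-validation / test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_cv = round(len(shuffled) * cv_frac)
    train = shuffled[:n_train]
    cv = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]       # whatever remains
    return train, cv, test


train, cv, test = split_dataset(range(10))
print(len(train), len(cv), len(test))  # 6 2 2
```

The model is fitted on the training set, tuned on the cross-validation set, and scored once on the test set, so the reported score is not biased by tuning.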

Common evaluation metrics:

  • Precision
  • Recall
  • F1 Score
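These three metrics can be computed directly from predictions and true labels for a binary classifier. The label lists below are invented for illustration (1 = positive class, 0 = negative class).

```python
def precision_recall_f1(predicted, actual):
    """Return (precision, recall, F1) for binary labels, where 1 is positive."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical labels: 2 true positives, 1 false positive, 2 false negatives
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(predicted, actual)
print(precision, recall, f1)
```

Here precision is 2/3, recall is 1/2, and F1 is 4/7. These metrics matter most when one class is rare, where plain accuracy can look good even for a useless classifier.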

Debugging and Improving ML Systems

Tools discussed for diagnosing problems:

  • Learning curves
  • Error analysis
  • Ceiling analysis

These techniques help answer:

What should we work on next to improve the system?

