Optical Character Recognition (OCR)
Sliding Windows and Photo OCR
Photo OCR stands for:
Photo Optical Character Recognition
Goal:
Image → Detect Text → Read Characters → Final Text
Applications:
- Google Lens
- document scanning
- self-driving cars reading signs
- helping visually impaired users
- searching photos by text
Sliding Window Detection
A small rectangle moves across the image.
flowchart LR
A[Image Patch 1] --> B[Classifier]
C[Image Patch 2] --> B
D[Image Patch 3] --> B
B --> E[Pedestrian Yes or No]
At each location:
- crop image patch
- resize if needed
- run classifier
- move window
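The loop above can be sketched in a few lines. This is a minimal stand-in, not a real detector: the image is a plain 2D list, and `classify` is a placeholder for a trained patch classifier (here just an intensity threshold).

```python
def sliding_window(image, win_h, win_w, stride, classify):
    """Slide a win_h x win_w window over a 2D image (list of lists).

    `classify` stands in for a trained classifier; it receives the
    cropped patch and returns True/False.
    """
    detections = []
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - win_h + 1, stride):
        for left in range(0, cols - win_w + 1, stride):
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            if classify(patch):
                detections.append((top, left))
    return detections

# Toy example: "detect" 3x3 patches whose total intensity is at least 9.
image = [[0] * 8 for _ in range(8)]
for r in range(2, 5):
    for c in range(3, 6):
        image[r][c] = 1

hits = sliding_window(image, 3, 3, 1, lambda p: sum(map(sum, p)) >= 9)
print(hits)  # [(2, 3)] - only the window exactly covering the bright block
```

Note how the stride appears directly as the `range` step: doubling it quarters the number of classifier calls, which is exactly the speed/accuracy trade-off described above.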
We train the classifier on labelled patches:
1. Positive Examples (y = 1)
Contains a pedestrian
2. Negative Examples (y = 0)
No pedestrian
Stride / Step Size
The amount the window moves each time.
Example:
- stride = 1 pixel
- stride = 4 pixels
- stride = 8 pixels
Small stride
- More accurate detection
- Slower
Large stride
- Faster detection
- Lower accuracy: may miss objects
Multi-Scale Detection
Objects can appear at different sizes.
So we use:
- small windows
- medium windows
- large windows
flowchart TD
A[Small Window]
B[Medium Window]
C[Large Window]
Each patch is resized before classification.
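A minimal sketch of that resize step, assuming nothing beyond plain 2D lists: nearest-neighbour resampling so that every window scale yields the same fixed-size input for the classifier (real systems use proper image-resize routines).

```python
def resize_nearest(patch, out_h, out_w):
    """Nearest-neighbour resize: map each output pixel back to the
    nearest input pixel. Normalises all window scales to one size."""
    in_h, in_w = len(patch), len(patch[0])
    return [[patch[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

# A small 2x2 window, normalised to the classifier's 4x4 input size.
small = [[1, 2],
         [3, 4]]
resized = resize_nearest(small, 4, 4)
print(resized)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```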
Example: your input images are 1000x1000 pixels.
- Sliding-window detector sizes: 10x10 and 20x20
- Stride: 2
Iterations per side ≈ 1000 / 2 = 500
Iterations per scale ≈ 500 x 500 = 250,000
For 2 scales ≈ 2 x 250,000 = 500,000 classifier runs
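The estimate above can be checked with a quick calculation. It slightly overcounts, because the window cannot slide past the image edge; the exact per-axis count for window size w is (side - w) // stride + 1.

```python
image_side = 1000
stride = 2

# Approximation used above: ignore the window size at the border.
per_side = image_side // stride      # 500
per_scale = per_side * per_side      # 250,000
total = 2 * per_scale                # two window scales

def exact_per_scale(side, w, stride):
    """Exact count: the window must fit entirely inside the image."""
    per_axis = (side - w) // stride + 1
    return per_axis * per_axis

print(total)                         # 500000
print(exact_per_scale(1000, 10, 2))  # 246016, a bit below the estimate
```

Either way, the takeaway stands: hundreds of thousands of classifier evaluations per image, which is why stride and scale choices matter so much for speed.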
Machine Learning Pipeline
Photo OCR is built as a pipeline of smaller ML systems.
Benefits:
- easier debugging
- modular design
- easier teamwork
Each stage solves one smaller problem.
flowchart TD
A[Input Image] --> B[Text Detection]
B --> C[Character Segmentation]
C --> D[Character Recognition]
D --> E[Final Text Output]
1. Text Detection
Find where text exists inside the image.
Now apply sliding windows to text.
Train classifier on:
1. Positive Examples
Image patches containing text
2. Negative Examples
Image patches without text
Text Detection Process
flowchart TD
A[Input Image]
--> B[Slide Window Across Image]
--> C[Classifier Predicts Text Probability]
--> D[Probability Heatmap]
--> E[Bounding Boxes Around Text]
Classifier outputs:
White → high-confidence text
Gray → uncertain
Black → no text
Expansion Operator
After detection, nearby white regions are expanded.
Purpose:
- merge neighboring text pixels
- form larger connected text regions
flowchart LR
A[Small White Blobs]
--> B[Expansion]
--> C[Larger Connected Regions]
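The expansion operator is essentially binary dilation. A minimal sketch on a plain 2D mask (libraries like SciPy or OpenCV provide optimised versions of this):

```python
def expand(mask, radius=1):
    """Binary dilation: a pixel becomes 1 if any pixel within `radius`
    (Chebyshev distance) is 1. This merges nearby text blobs."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                        out[r][c] = 1
    return out

# Two white blobs one pixel apart merge into one connected region.
mask = [[1, 0, 1, 0, 0]]
print(expand(mask))  # [[1, 1, 1, 1, 0]]
```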
Connected Components
Now detect connected white regions and draw boxes.
+--------------+
| Antique Mall |
+--------------+
We also filter out oddly shaped regions.
Aspect Ratio Filtering
Text regions are usually:
Wide > Tall
So discard:
- tall thin regions
- random noisy blobs
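Both steps together, as a hedged sketch: flood-fill connected components into bounding boxes, then keep only boxes wider than tall. The 1.5 aspect-ratio threshold is an illustrative choice, not a value from the source.

```python
def components(mask):
    """Label 4-connected white regions via flood fill and return their
    bounding boxes as (top, left, bottom, right)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, box = [(r, c)], [r, c, r, c]
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    box = [min(box[0], y), min(box[1], x),
                           max(box[2], y), max(box[3], x)]
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append(tuple(box))
    return boxes

def looks_like_text(box, min_aspect=1.5):
    """Keep regions that are wider than tall (typical for text lines)."""
    top, left, bottom, right = box
    return (right - left + 1) / (bottom - top + 1) >= min_aspect

mask = [[1, 1, 1, 1, 0],   # wide blob: likely text
        [0, 0, 0, 0, 1],   # tall thin blob: discarded
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1]]
text_boxes = [b for b in components(mask) if looks_like_text(b)]
print(text_boxes)  # [(0, 0, 0, 3)]
```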
2. Character Segmentation
Split the detected text into individual characters.
Example:
ANTIQUE
Desired output:
A | N | T | I | Q | U | E
1D Sliding Window
Now the window moves only horizontally.
flowchart LR
A[Character Strip]
--> B[Slide Left to Right]
--> C[Predict Split Locations]
Classifier predicts:
Should we split here?
Training Character Split Classifier
1. Positive Example
A | N
Correct split location.
2. Negative Example
AN
Inside one character.
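In place of the trained split classifier, a simple heuristic illustrates the 1D sweep: walk left to right over the strip and propose a split wherever a fully blank column sits between two inked columns. This is only a stand-in; the course's approach trains a classifier on the positive/negative split examples above.

```python
def split_columns(strip):
    """Heuristic split proposer: a split goes at every blank column
    that has ink on both sides."""
    cols = len(strip[0])
    ink = [any(row[c] for row in strip) for c in range(cols)]
    return [c for c in range(1, cols - 1)
            if not ink[c] and ink[c - 1] and ink[c + 1]]

# Two "characters" separated by one blank column (index 2).
strip = [[1, 1, 0, 1, 1],
         [1, 0, 0, 0, 1]]
print(split_columns(strip))  # [2]
```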
3. Character Recognition
Classify each character. Image of character → Predicted Letter
- [A-image] → "A"
- [N-image] → "N"
Now each segmented character becomes a classification problem.
flowchart LR
A[Character Image]
--> B[Classifier]
--> C[A-Z or 0-9]
Example:
Image of "A" → Predict "A"
This is a multiclass classification problem.
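To make the multiclass step concrete, here is a toy nearest-template classifier: predict the label whose template differs from the input in the fewest pixels. A real recognizer would be a trained model (e.g. a neural network); the 3x3 "I" and "L" templates are invented for illustration.

```python
def classify_char(image, templates):
    """Predict the label of the closest template (pixel-difference count)."""
    def dist(a, b):
        return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(templates, key=lambda label: dist(image, templates[label]))

templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# A slightly noisy "I" (one extra pixel) is still classified correctly.
print(classify_char([[0, 1, 0], [0, 1, 0], [0, 1, 1]], templates))  # I
```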
Artificial Data Synthesis
Generate synthetic training data to create huge labelled datasets cheaply.
Example:
- different fonts
- rotations
- shadows
- blur
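A minimal sketch of the idea: take one labelled example and mass-produce distorted copies. Here the distortions are just random 1-pixel shifts plus a flipped pixel of noise; real pipelines also vary fonts, rotate, and blur, as listed above.

```python
import random

def synthesize(image, n_variants, seed=0):
    """Generate distorted copies of one labelled example via random
    1-pixel shifts and single-pixel salt noise."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    variants = []
    for _ in range(n_variants):
        dy, dx = rng.choice([-1, 0, 1]), rng.choice([-1, 0, 1])
        shifted = [[image[r - dy][c - dx]
                    if 0 <= r - dy < rows and 0 <= c - dx < cols else 0
                    for c in range(cols)]
                   for r in range(rows)]
        # Flip one random pixel to simulate sensor noise.
        r, c = rng.randrange(rows), rng.randrange(cols)
        shifted[r][c] = 1 - shifted[r][c]
        variants.append(shifted)
    return variants

base = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]  # a toy "I"
fake_data = synthesize(base, 100)          # 100 training examples from one
```

All 100 variants keep the label of the original, which is what makes this cheap: one hand-labelled example yields many training examples.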
Full OCR System
flowchart TD
A[Photo]
--> B[Sliding Window Text Detection]
--> C[Text Bounding Boxes]
--> D[Character Segmentation]
--> E[Character Classification]
--> F[Combine Characters]
--> G[Final Readable Text]
Modern Computer Vision
Classical sliding windows are historically important.
Modern systems now often use:
- CNNs
- YOLO
- Faster R-CNN
- SSD
- Vision Transformers
These are much faster and more accurate.
Core Takeaway
Large machine learning systems are usually:
Many small ML models
working together
inside a pipeline
