
Anomaly Detection: Identifying Rare and Unusual Patterns in Data

Learn how anomaly detection models identify unusual data points using statistical methods such as Gaussian distributions. Understand how to detect fraud, system failures, and rare events in real-world datasets.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


👁️ Optical Character Recognition (OCR)

Sliding Windows and Photo OCR

Photo OCR stands for:

Photo Optical Character Recognition

Goal:

Image → Detect Text → Read Characters → Final Text

Applications:

  • Google Lens
  • document scanning
  • self-driving cars reading signs
  • helping visually impaired users
  • searching photos by text

Sliding Window Detection 🖼️

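A small rectangular window moves across the image, and at each position a classifier decides whether the window contains the target object (here, a pedestrian).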

flowchart LR
    A[Image Patch 1] --> B[Classifier]
    C[Image Patch 2] --> B
    D[Image Patch 3] --> B
    B --> E[Pedestrian Yes or No]

At each location:

  1. crop image patch
  2. resize if needed
  3. run classifier
  4. move window
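The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the image is assumed to be a grayscale list of rows, `classifier` stands for any trained patch classifier returning 0 or 1, and the resize step is skipped because the window is assumed to already match the classifier's input size. The names `sliding_windows`, `detect`, and the toy `bright` classifier are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def sliding_windows(image: Image, size: int, stride: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (top, left, patch) for each size x size window, moving `stride` pixels per step."""
    height, width = len(image), len(image[0])
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            # Step 1: crop the patch under the current window position
            patch = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, patch


def detect(image: Image, size: int, stride: int,
           classifier: Callable[[Image], int]) -> List[Tuple[int, int]]:
    """Return the (top, left) positions where the classifier predicts 1."""
    hits = []
    # Step 4 (move the window) is handled by the generator's loops
    for top, left, patch in sliding_windows(image, size, stride):
        if classifier(patch) == 1:  # Step 3: run the classifier on the patch
            hits.append((top, left))
    return hits


def bright(patch: Image) -> int:
    """Toy stand-in for a trained classifier: 1 if mean brightness exceeds 0.5."""
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return int(mean > 0.5)


# 6x6 dark image with a bright 2x2 square whose top-left corner is at (2, 2)
img = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        img[r][c] = 1.0

print(detect(img, size=2, stride=1, classifier=bright))  # [(2, 2)]
```

Windows that only half-overlap the square have a mean of exactly 0.5, so the strict `>` in the toy classifier keeps only the fully aligned window.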

We train a classifier:

$$\text{Image Patch} \rightarrow y \in \{0, 1\}$$

where:

1. Positive Examples ($y = 1$)

Contains pedestrian

2. Negative Examples ($y = 0$)

No pedestrian

Stride / Step Size

The number of pixels the window moves at each step.

Example:

  • stride = 1 pixel
  • stride = 4 pixels
  • stride = 8 pixels

Small stride

  • More accurate detection
  • Slower: many more windows to classify

Large stride

  • Faster detection
  • Lower accuracy: may miss objects that fall between window positions
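To see why a small stride is slower, count how many window positions fit along one side of the image. The sketch below assumes a square 1000×1000 image and a 20×20 window (illustrative values) and counts only windows that lie fully inside the image; `positions_per_side` is a hypothetical helper.

```python
def positions_per_side(image_size: int, window: int, stride: int) -> int:
    """Window positions along one axis, counting only windows fully inside the image."""
    return (image_size - window) // stride + 1


for stride in (1, 4, 8):
    n = positions_per_side(1000, 20, stride)
    # A square image needs n positions horizontally times n vertically
    print(f"stride={stride}: {n} x {n} = {n * n:,} classifier calls")
```

This prints 962,361 calls for stride 1, 60,516 for stride 4, and 15,129 for stride 8. Because the stride applies in both directions, going from stride 1 to stride 8 cuts the work by roughly 64 times.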

Multi-Scale Detection

Objects can appear at different sizes.

So we use:

  • small windows
  • medium windows
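  • large windows

flowchart TD
    A[Small Window]
    B[Medium Window]
    C[Large Window]

Each patch is resized to the classifier's fixed input size before classification, so a single trained classifier can score windows of every size.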
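Resizing lets one classifier with a fixed input size handle every window scale. A minimal sketch using nearest-neighbor sampling, assuming a hypothetical classifier input of 10×10 pixels; real systems usually use smoother interpolation, and `resize_nearest` is an illustrative helper, not a library function.

```python
from typing import List

Patch = List[List[float]]

CLASSIFIER_INPUT = 10  # hypothetical fixed input size of the trained classifier


def resize_nearest(patch: Patch, out_h: int, out_w: int) -> Patch:
    """Resize a 2-D patch by copying the nearest source pixel for each output pixel."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Small, medium and large windows are all mapped to the same 10x10 input
for window in (10, 20, 40):
    patch = [[float(r * window + c) for c in range(window)] for r in range(window)]
    resized = resize_nearest(patch, CLASSIFIER_INPUT, CLASSIFIER_INPUT)
    print(window, "->", len(resized), "x", len(resized[0]))
```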

Example: the input images are 1000×1000 pixels, with

  • sliding-window sizes: 10×10 and 20×20
  • stride: 2 pixels

Window positions per side ≈ 1000 / 2 = 500

Window positions per scale ≈ 500 × 500 = 250,000

For 2 scales: 2 × 250,000 = 500,000 classifier evaluations
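The figure of 500 positions per side is a round-number estimate. If only windows that fit entirely inside the image are counted, with $N$ the image side, $w$ the window side and $s$ the stride, the exact count is slightly smaller:

```latex
\begin{aligned}
n(w) &= \left\lfloor \frac{N - w}{s} \right\rfloor + 1, \qquad N = 1000,\; s = 2 \\
n(10) &= \frac{990}{2} + 1 = 496, \qquad n(20) = \frac{980}{2} + 1 = 491 \\
\text{total} &= n(10)^2 + n(20)^2 = 246{,}016 + 241{,}081 = 487{,}097 \approx 500{,}000
\end{aligned}
```

So the estimate of 500,000 classifier evaluations is accurate to within about 3%.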


Machine Learning Pipeline

Photo OCR is built as a pipeline of smaller ML systems.

Benefits:

  • easier debugging
  • modular design
  • easier teamwork

Each stage solves one smaller problem.

flowchart TD
    A[Input Image] --> B[Text Detection]
    B --> C[Character Segmentation]
    C --> D[Character Recognition]
    D --> E[Final Text Output]
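The modular structure can be expressed as function composition: each stage is a separate model behind a narrow interface, so it can be swapped or debugged on its own. In this sketch the stage functions are hypothetical stubs, and a string stands in for the image; only the data flow is meant to be illustrative.

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width), an illustrative choice


def run_ocr_pipeline(
    image: Any,
    detect_text: Callable[[Any], List[Box]],
    segment: Callable[[Any, Box], List[Any]],
    recognize: Callable[[Any], str],
) -> List[str]:
    """Chain text detection -> character segmentation -> character recognition."""
    words = []
    for box in detect_text(image):                            # stage 1: where is the text?
        chars = segment(image, box)                           # stage 2: split into characters
        words.append("".join(recognize(ch) for ch in chars))  # stage 3: read each character
    return words


# --- Hypothetical stub stages: a string plays the role of the image ---
def stub_detect(img: str) -> List[Box]:
    return [(0, 0, 1, 7), (0, 8, 1, 4)]  # boxes covering "ANTIQUE" and "MALL"


def stub_segment(img: str, box: Box) -> List[str]:
    _, left, _, width = box
    return list(img[left:left + width])


def stub_recognize(ch: str) -> str:
    return ch


print(run_ocr_pipeline("ANTIQUE MALL", stub_detect, stub_segment, stub_recognize))
# ['ANTIQUE', 'MALL']
```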

1. Text Detection

Find where text exists inside the image.

Now apply the same sliding-window approach to text.

Train a classifier on:

1. Positive Examples

Image patches containing text

2. Negative Examples

Image patches without text

Text Detection Process

flowchart TD
    A[Input Image]
    --> B[Slide Window Across Image]
    --> C[Classifier Predicts Text Probability]
    --> D[Probability Heatmap]
    --> E[Bounding Boxes Around Text]

Classifier outputs, shown as a heatmap:

White  → high confidence text
Gray   → uncertain
Black  → no text
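Laying out the classifier's text probability for each window position as a grid gives the heatmap described above. The sketch below maps probabilities to the three shades; the cut-offs 0.7 and 0.3 and the probability grid are made-up toy values, not numbers from a trained model.

```python
from typing import List

HIGH, LOW = 0.7, 0.3  # hypothetical confidence thresholds


def shade(p: float) -> str:
    """Map a text probability to one of the three heatmap shades."""
    if p >= HIGH:
        return "white"  # high-confidence text
    if p > LOW:
        return "gray"   # uncertain
    return "black"      # no text


def render(heatmap: List[List[float]]) -> List[str]:
    """Draw a probability grid as text: '#' white, '+' gray, '.' black."""
    symbol = {"white": "#", "gray": "+", "black": "."}
    return ["".join(symbol[shade(p)] for p in row) for row in heatmap]


# Toy probabilities, one per window position
probs = [
    [0.05, 0.10, 0.20, 0.10],
    [0.40, 0.90, 0.95, 0.50],
    [0.10, 0.80, 0.85, 0.20],
]
print("\n".join(render(probs)))
```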

Expansion Operator

After detection, white regions are expanded: every pixel within a small distance of a white pixel is also colored white.

Purpose:

  • merge neighboring text pixels
  • form larger connected text regions

flowchart LR
    A[Small White Blobs]
    --> B[Expansion]
    --> C[Larger Connected Regions]
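The expansion step is a morphological dilation. A pure-Python sketch on a binary mask (1 = white), assuming a square neighborhood of a chosen `radius`; the function name and radius values are illustrative. With radius 1 the two blobs in the example stay separate, while radius 2 merges them into one region.

```python
from typing import List

Mask = List[List[int]]  # 1 = white (likely text), 0 = black


def expand(mask: Mask, radius: int = 1) -> Mask:
    """Dilate: every pixel within `radius` rows and columns of a white pixel becomes white."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                # Paint the (2*radius+1) x (2*radius+1) square around this pixel, clipped to the image
                for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[rr][cc] = 1
    return out


blobs = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

for radius in (1, 2):
    print(f"radius={radius}")
    for row in expand(blobs, radius):
        print("".join("#" if v else "." for v in row))
```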

Connected Components

Next, find the connected white regions and draw a bounding box around each one:

┌──────────────┐
│ Antique Mall │
└──────────────┘

We also filter out implausible shapes.
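Finding connected components is a standard graph search over white pixels. This sketch uses breadth-first search with 4-connectivity (up, down, left, right), which is an assumption; 8-connectivity is an equally common choice. Each region's extreme rows and columns give its bounding box.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]           # 1 = white, 0 = black
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def bounding_boxes(mask: Mask) -> List[Box]:
    """Label 4-connected white regions with BFS and return one box per region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over this component, tracking its extent
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


mask = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
print(bounding_boxes(mask))  # [(0, 0, 1, 2), (1, 4, 2, 4)]
```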

Aspect Ratio Filtering

Text regions are usually wider than they are tall:

Width > Height

So discard:

  • tall thin regions
  • random noisy blobs
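The filtering rule can be written as a simple predicate on each bounding box. The minimum width-to-height ratio of 1.5 and the minimum area of 4 pixels are hypothetical thresholds chosen for illustration; in practice they would be tuned on data.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def keep_text_like(boxes: List[Box], min_ratio: float = 1.5, min_area: int = 4) -> List[Box]:
    """Keep boxes that are wider than tall and large enough not to be noise."""
    kept = []
    for top, left, bottom, right in boxes:
        height = bottom - top + 1
        width = right - left + 1
        if width / height >= min_ratio and width * height >= min_area:
            kept.append((top, left, bottom, right))
    return kept


candidates = [
    (10, 10, 19, 79),  # 10 tall x 70 wide: text-like line
    (0, 50, 59, 54),   # 60 tall x 5 wide: tall thin region
    (30, 30, 30, 31),  # 1 tall x 2 wide: tiny noisy speck
]
print(keep_text_like(candidates))  # [(10, 10, 19, 79)]
```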

2. Character Segmentation

Split the detected text into individual characters.

Example:

ANTIQUE

Desired output:

A | N | T | I | Q | U | E

1D Sliding Window

Now the window moves only horizontally.

flowchart LR
    A[Character Strip]
    --> B[Slide Left to Right]
    --> C[Predict Split Locations]

Classifier predicts:

Should we split here?
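The 1D sliding window can be sketched over a sequence of per-column values. As a simplification, the strip is reduced to the amount of ink in each column, and a hypothetical rule (split if the window's center column is empty) stands in for the trained split classifier; a real classifier would look at the pixel patch itself.

```python
from typing import Callable, List, Sequence


def find_splits(
    columns: Sequence[float],
    width: int,
    is_split: Callable[[Sequence[float]], bool],
) -> List[int]:
    """Slide a 1-D window (stride 1) left to right; return centers predicted as splits."""
    half = width // 2
    splits = []
    for start in range(len(columns) - width + 1):
        window = columns[start:start + width]
        if is_split(window):
            splits.append(start + half)  # the split position is the window center
    return splits


def gap_in_middle(window: Sequence[float]) -> bool:
    """Hypothetical stand-in for a trained classifier: split if the center column has no ink."""
    return window[len(window) // 2] == 0


# Ink per column of a text strip: three characters separated by blank columns
ink = [0, 3, 5, 4, 0, 2, 6, 2, 0, 5, 5, 0]
print(find_splits(ink, width=3, is_split=gap_in_middle))  # [4, 8]
```

The blank columns at the very edges are not reported because a width-3 window cannot be centered on them.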

Training Character Split Classifier

1. Positive Example

A | N

Window centered on the gap between two characters: a correct split location.

2. Negative Example

AN

Window centered inside a single character: no split here.
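Given a text strip whose character positions are annotated, labeled window centers for the split classifier can be generated automatically. This sketch assumes a hypothetical annotation format: the column of each gap between characters, plus the start and end column of each character.

```python
from typing import List, Sequence, Tuple


def split_training_examples(
    gap_columns: Sequence[int],
    char_spans: Sequence[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Return sorted (window_center, label) pairs for training the split classifier."""
    # Positive examples (label 1): windows centered on a gap between two characters
    examples = [(col, 1) for col in gap_columns]
    # Negative examples (label 0): windows centered in the middle of a single character
    examples += [((start + end) // 2, 0) for start, end in char_spans]
    return sorted(examples)


# "AN": A occupies columns 0-9, N occupies columns 13-22, gap centered at column 11
print(split_training_examples([11], [(0, 9), (13, 22)]))
# [(4, 0), (11, 1), (17, 0)]
```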


3. Character Recognition

Classify each character: Image of character → Predicted Letter

  • [A-image] → "A"
  • [N-image] → "N"

Now each segmented character becomes a classification problem.

flowchart LR
    A[Character Image]
    --> B[Classifier]
    --> C[A-Z or 0-9]

This is a multiclass classification problem: with the classes A-Z and 0-9 there are 36 possible outputs.
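The last step of a multiclass classifier is usually to turn one score per class into probabilities and pick the largest. This sketch assumes 36 classes (A-Z and 0-9, matching the diagram above) and uses softmax plus argmax; the scores are invented for the example rather than produced by a real model.

```python
import math
import string
from typing import List, Sequence

CLASSES: List[str] = list(string.ascii_uppercase + string.digits)  # 'A'-'Z' then '0'-'9'


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores to probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_char(scores: Sequence[float]) -> str:
    """Return the class label with the highest probability."""
    probs = softmax(scores)
    return CLASSES[max(range(len(probs)), key=probs.__getitem__)]


scores = [0.0] * len(CLASSES)
scores[CLASSES.index("A")] = 4.0  # pretend the model strongly favors 'A'
print(len(CLASSES), predict_char(scores))  # 36 A
```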

Artificial Data Synthesis

Generate synthetic training data to create large labeled datasets cheaply.

Example variations:

  • different fonts
  • rotations
  • shadows
  • blur
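One cheap way to multiply a labeled example is to apply random distortions that do not change its label. The sketch below shows two simple ones on a binary glyph, random shifts of up to one pixel and random pixel flips, as stand-ins for the richer fonts, rotations, shadows, and blur listed above; the glyph, flip probability, and function names are illustrative.

```python
import random
from typing import List

Bitmap = List[List[int]]  # 0 = background, 1 = ink


def shift(bitmap: Bitmap, dy: int, dx: int) -> Bitmap:
    """Translate the glyph by (dy, dx), filling uncovered pixels with background."""
    h, w = len(bitmap), len(bitmap[0])
    return [
        [bitmap[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else 0
         for c in range(w)]
        for r in range(h)
    ]


def add_noise(bitmap: Bitmap, flip_prob: float, rng: random.Random) -> Bitmap:
    """Flip each pixel with probability flip_prob (a crude stand-in for sensor noise)."""
    return [[1 - v if rng.random() < flip_prob else v for v in row] for row in bitmap]


def synthesize(glyph: Bitmap, n: int, seed: int = 0) -> List[Bitmap]:
    """Create n distorted copies of one labeled glyph; every copy keeps the original label."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        moved = shift(glyph, rng.randint(-1, 1), rng.randint(-1, 1))
        samples.append(add_noise(moved, flip_prob=0.05, rng=rng))
    return samples


# A 5x5 "I"-like glyph: one real example becomes 100 training examples
glyph_I = [
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]
data = synthesize(glyph_I, n=100)
print(len(data), len(data[0]), len(data[0][0]))  # 100 5 5
```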

Full OCR System

flowchart TD
    A[Photo]
    --> B[Sliding Window Text Detection]
    --> C[Text Bounding Boxes]
    --> D[Character Segmentation]
    --> E[Character Classification]
    --> F[Combine Characters]
    --> G[Final Readable Text]

Modern Computer Vision

Classical sliding windows are historically important.

Modern systems now often use:

  • CNNs
  • YOLO
  • Faster R-CNN
  • SSD
  • Vision Transformers

These are typically much faster and more accurate than classical sliding-window pipelines.


Core Takeaway

Large machine learning systems are usually:

Many small ML models
working together
inside a pipeline