Optical Character Recognition (OCR)
Sliding Windows and Photo OCR
Photo OCR stands for:
Photo Optical Character Recognition
Goal:
Image → Detect Text → Read Characters → Final Text
Applications:
- Google Lens
- document scanning
- self-driving cars reading signs
- helping visually impaired users
- searching photos by text
Sliding Window Detection
A small rectangle moves across the image.
flowchart LR
A[Image Patch 1] --> B[Classifier]
C[Image Patch 2] --> B
D[Image Patch 3] --> B
B --> E[Pedestrian Yes or No]
At each location:
- crop image patch
- resize if needed
- run classifier
- move window
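The loop above can be sketched in a few lines. This is a minimal stand-in, not a real detector: the image is a plain 2D list, and `classify` is a placeholder for a trained patch classifier (here just an intensity threshold).

```python
def sliding_window(image, win_h, win_w, stride, classify):
    """Slide a win_h x win_w window over a 2D image (list of lists).

    `classify` stands in for a trained classifier; it receives the
    cropped patch and returns True/False.
    """
    detections = []
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - win_h + 1, stride):
        for left in range(0, cols - win_w + 1, stride):
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            if classify(patch):
                detections.append((top, left))
    return detections

# Toy example: "detect" 3x3 patches whose total intensity is at least 9.
image = [[0] * 8 for _ in range(8)]
for r in range(2, 5):
    for c in range(3, 6):
        image[r][c] = 1

hits = sliding_window(image, 3, 3, 1, lambda p: sum(map(sum, p)) >= 9)
print(hits)  # [(2, 3)] - only the window exactly covering the bright block
```

Note how the stride appears directly as the `range` step: doubling it quarters the number of classifier calls, which is exactly the speed/accuracy trade-off described above.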
We train the classifier on labelled patches:
1. Positive Examples (y = 1)
Contains a pedestrian
2. Negative Examples (y = 0)
No pedestrian
Stride / Step Size
The amount the window moves each time.
Example:
- stride = 1 pixel
- stride = 4 pixels
- stride = 8 pixels
Small stride
- More accurate detection
- Slower
Large stride
- Faster detection
- Lower accuracy: may miss objects
Multi-Scale Detection
Objects can appear at different sizes.
So we use:
- small windows
- medium windows
- large windows
flowchart TD
A[Small Window]
B[Medium Window]
C[Large Window]
Each patch is resized before classification.
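A minimal sketch of that resize step, assuming nothing beyond plain 2D lists: nearest-neighbour resampling so that every window scale yields the same fixed-size input for the classifier (real systems use proper image-resize routines).

```python
def resize_nearest(patch, out_h, out_w):
    """Nearest-neighbour resize: map each output pixel back to the
    nearest input pixel. Normalises all window scales to one size."""
    in_h, in_w = len(patch), len(patch[0])
    return [[patch[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

# A small 2x2 window, normalised to the classifier's 4x4 input size.
small = [[1, 2],
         [3, 4]]
resized = resize_nearest(small, 4, 4)
print(resized)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```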
Example: your input images are 1000x1000 pixels.
- Sliding-window detector sizes: 10x10 and 20x20
- Stride: 2
Iterations per side ≈ 1000 / 2 = 500
Iterations per scale ≈ 500 x 500 = 250,000
For 2 scales ≈ 2 x 250,000 = 500,000 classifier runs
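The estimate above can be checked with a quick calculation. It slightly overcounts, because the window cannot slide past the image edge; the exact per-axis count for window size w is (side - w) // stride + 1.

```python
image_side = 1000
stride = 2

# Approximation used above: ignore the window size at the border.
per_side = image_side // stride      # 500
per_scale = per_side * per_side      # 250,000
total = 2 * per_scale                # two window scales

def exact_per_scale(side, w, stride):
    """Exact count: the window must fit entirely inside the image."""
    per_axis = (side - w) // stride + 1
    return per_axis * per_axis

print(total)                         # 500000
print(exact_per_scale(1000, 10, 2))  # 246016, a bit below the estimate
```

Either way, the takeaway stands: hundreds of thousands of classifier evaluations per image, which is why stride and scale choices matter so much for speed.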
Machine Learning Pipeline
Photo OCR is built as a pipeline of smaller ML systems.
Benefits:
- easier debugging
- modular design
- easier teamwork
Each stage solves one smaller problem.
flowchart TD
A[Input Image] --> B[Text Detection]
B --> C[Character Segmentation]
C --> D[Character Recognition]
D --> E[Final Text Output]
1. Text Detection
Find where text exists inside the image.
Now apply sliding windows to text.
Train classifier on:
1. Positive Examples
Image patches containing text
2. Negative Examples
Image patches without text
Text Detection Process
flowchart TD
A[Input Image]
--> B[Slide Window Across Image]
--> C[Classifier Predicts Text Probability]
--> D[Probability Heatmap]
--> E[Bounding Boxes Around Text]
Classifier outputs:
White → high-confidence text
Gray → uncertain
Black → no text
Expansion Operator
After detection, nearby white regions are expanded.
Purpose:
- merge neighboring text pixels
- form larger connected text regions
flowchart LR
A[Small White Blobs]
--> B[Expansion]
--> C[Larger Connected Regions]
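The expansion operator is essentially binary dilation. A minimal sketch on a plain 2D mask (libraries like SciPy or OpenCV provide optimised versions of this):

```python
def expand(mask, radius=1):
    """Binary dilation: a pixel becomes 1 if any pixel within `radius`
    (Chebyshev distance) is 1. This merges nearby text blobs."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                        out[r][c] = 1
    return out

# Two white blobs one pixel apart merge into one connected region.
mask = [[1, 0, 1, 0, 0]]
print(expand(mask))  # [[1, 1, 1, 1, 0]]
```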
Connected Components
Now detect connected white regions and draw boxes.
+--------------+
| Antique Mall |
+--------------+
We also filter out oddly shaped regions.
Aspect Ratio Filtering
Text regions are usually:
Wide > Tall
So discard:
- tall thin regions
- random noisy blobs
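Both steps together, as a hedged sketch: flood-fill connected components into bounding boxes, then keep only boxes wider than tall. The 1.5 aspect-ratio threshold is an illustrative choice, not a value from the source.

```python
def components(mask):
    """Label 4-connected white regions via flood fill and return their
    bounding boxes as (top, left, bottom, right)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, box = [(r, c)], [r, c, r, c]
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    box = [min(box[0], y), min(box[1], x),
                           max(box[2], y), max(box[3], x)]
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append(tuple(box))
    return boxes

def looks_like_text(box, min_aspect=1.5):
    """Keep regions that are wider than tall (typical for text lines)."""
    top, left, bottom, right = box
    return (right - left + 1) / (bottom - top + 1) >= min_aspect

mask = [[1, 1, 1, 1, 0],   # wide blob: likely text
        [0, 0, 0, 0, 1],   # tall thin blob: discarded
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1]]
text_boxes = [b for b in components(mask) if looks_like_text(b)]
print(text_boxes)  # [(0, 0, 0, 3)]
```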
2. Character Segmentation
Split the detected text into individual characters.
Example:
ANTIQUE
Desired output:
A | N | T | I | Q | U | E
1D Sliding Window
Now the window moves only horizontally.
flowchart LR
A[Character Strip]
--> B[Slide Left to Right]
--> C[Predict Split Locations]
Classifier predicts:
Should we split here?
Training Character Split Classifier
1. Positive Example
A | N
Correct split location.
2. Negative Example
AN
Inside one character.
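In place of the trained split classifier, a simple heuristic illustrates the 1D sweep: walk left to right over the strip and propose a split wherever a fully blank column sits between two inked columns. This is only a stand-in; the course's approach trains a classifier on the positive/negative split examples above.

```python
def split_columns(strip):
    """Heuristic split proposer: a split goes at every blank column
    that has ink on both sides."""
    cols = len(strip[0])
    ink = [any(row[c] for row in strip) for c in range(cols)]
    return [c for c in range(1, cols - 1)
            if not ink[c] and ink[c - 1] and ink[c + 1]]

# Two "characters" separated by one blank column (index 2).
strip = [[1, 1, 0, 1, 1],
         [1, 0, 0, 0, 1]]
print(split_columns(strip))  # [2]
```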
3. Character Recognition
Classify each character. Image of character → Predicted Letter
- [A-image] → "A"
- [N-image] → "N"
Now each segmented character becomes a classification problem.
flowchart LR
A[Character Image]
--> B[Classifier]
--> C[A-Z or 0-9]
Example:
Image of "A" → Predict "A"
This is a multiclass classification problem.
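To make the multiclass step concrete, here is a toy nearest-template classifier: predict the label whose template differs from the input in the fewest pixels. A real recognizer would be a trained model (e.g. a neural network); the 3x3 "I" and "L" templates are invented for illustration.

```python
def classify_char(image, templates):
    """Predict the label of the closest template (pixel-difference count)."""
    def dist(a, b):
        return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(templates, key=lambda label: dist(image, templates[label]))

templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# A slightly noisy "I" (one extra pixel) is still classified correctly.
print(classify_char([[0, 1, 0], [0, 1, 0], [0, 1, 1]], templates))  # I
```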
Artificial Data Synthesis
Generate synthetic training data to create huge labelled datasets cheaply.
Example:
- different fonts
- rotations
- shadows
- blur
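A minimal sketch of the idea: take one labelled example and mass-produce distorted copies. Here the distortions are just random 1-pixel shifts plus a flipped pixel of noise; real pipelines also vary fonts, rotate, and blur, as listed above.

```python
import random

def synthesize(image, n_variants, seed=0):
    """Generate distorted copies of one labelled example via random
    1-pixel shifts and single-pixel salt noise."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    variants = []
    for _ in range(n_variants):
        dy, dx = rng.choice([-1, 0, 1]), rng.choice([-1, 0, 1])
        shifted = [[image[r - dy][c - dx]
                    if 0 <= r - dy < rows and 0 <= c - dx < cols else 0
                    for c in range(cols)]
                   for r in range(rows)]
        # Flip one random pixel to simulate sensor noise.
        r, c = rng.randrange(rows), rng.randrange(cols)
        shifted[r][c] = 1 - shifted[r][c]
        variants.append(shifted)
    return variants

base = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]  # a toy "I"
fake_data = synthesize(base, 100)          # 100 training examples from one
```

All 100 variants keep the label of the original, which is what makes this cheap: one hand-labelled example yields many training examples.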
Full OCR System
flowchart TD
A[Photo]
--> B[Sliding Window Text Detection]
--> C[Text Bounding Boxes]
--> D[Character Segmentation]
--> E[Character Classification]
--> F[Combine Characters]
--> G[Final Readable Text]
Modern Computer Vision
Classical sliding windows are historically important.
Modern systems now often use:
- CNNs
- YOLO
- Faster R-CNN
- SSD
- Vision Transformers
These are much faster and more accurate.
Core Takeaway
Large machine learning systems are usually:
Many small ML models
working together
inside a pipeline
