Collaborative Filtering: Building Recommender Systems with Feature Learning
Learn how collaborative filtering powers modern recommender systems by simultaneously learning user preferences and item features from rating data. Understand the optimization objective, matrix factorization approach, and how gradient-based methods enable scalable recommendations.
Collaborative Filtering 🫱🏻🫲🏽
Recommender system technique that learns both user preferences and item features automatically from rating data.
- Unlike content-based methods, we do not know the features of movies beforehand.
- Collaborative filtering learns hidden features of users and items so it can predict missing ratings and recommend things people will likely enjoy.
Chicken-and-Egg Problem 🥚
Previously we saw two ideas:
- If movie features $x$ are known, we can learn user parameters $\theta$.
- If user parameters $\theta$ are known, we can learn movie features $x$.
Instead of alternating between them, collaborative filtering learns both simultaneously.
Why Is It Called Collaborative?
Many users rate movies.
Their ratings collaboratively help the system learn features.
Result:
- Better movie representations
- Better recommendations for everyone
```mermaid
flowchart TD
    A[Randomly Initialize User Preferences θ] --> B[Learn Movie Features x]
    B --> C[Update User Preferences θ]
    C --> D[Update Movie Features x]
    D --> E[Repeat Until Convergence]
```
Key Idea
People with similar tastes tend to like similar things.
Collaborative filtering simultaneously learns:
- user preferences
- item features
directly from the rating matrix, without manually defining features.
```mermaid
flowchart TD
    A[Users Rate Movies] --> B[Learn User Preference Vectors]
    B --> C[Learn Movie Feature Vectors]
    C --> D[Predict Missing Ratings for New Movies]
    D --> E[Generate New Recommendations]
```
Movie Feature Matrix
| Movie | Romance | Action |
|---|---|---|
| Titanic | 0.9 | 0.1 |
| Notebook | 0.95 | 0.05 |
| Avengers | 0.1 | 0.9 |
| John Wick | 0.05 | 0.95 |
where
- $x_1$ = romantic level
- $x_2$ = action level

From this we infer, e.g. for a movie with high $x_1$ and low $x_2$ (like Titanic):
- the movie is romantic
- the movie is not an action movie
No Intercept Term
Unlike previous models:
- We remove the intercept feature $x_0 = 1$.
Reason:
since the algorithm learns all features automatically, it can learn a constant feature itself if needed.
User Movie Rating Matrix
| User | Titanic | The Notebook | Avengers | John Wick |
|---|---|---|---|---|
| Alice | ⭐⭐⭐⭐⭐ | ⭐ | ⭐ | ⭐ |
| Bob | ⭐⭐⭐⭐ | ? | ⭐ | ⭐ |
| Carol | ⭐ | ⭐ | ⭐ | ⭐⭐⭐⭐⭐ |
Prediction of user $j$ rating movie $i$:

$$\hat{y}^{(i,j)} = (\theta^{(j)})^T x^{(i)}$$
User Preferences Matrix
| User | Likes Romance | Likes Action |
|---|---|---|
| Alice | 0.95 | 0.05 |
| Bob | 0.85 | 0.15 |
| Carol | 0.05 | 0.95 |
- These features are not manually defined.
- The algorithm learns them from ratings.
Observation
- Alice and Bob have similar taste
- Both dislike action movies
- Both like romantic movies
So if Bob has not rated The Notebook, we can predict:
- Bob will probably rate it highly.
The algorithm uses behavior of other users to predict what someone will like.
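The prediction is just a dot product between a user's preference vector and a movie's feature vector. A minimal sketch using the illustrative numbers from the tables above (variable names are my own):

```python
import numpy as np

# Illustrative vectors copied from the tables above; in a real system
# both would be learned from the rating data.
x_titanic = np.array([0.9, 0.1])      # [romance, action] features
x_john_wick = np.array([0.05, 0.95])
theta_alice = np.array([0.95, 0.05])  # Alice: likes romance, dislikes action

# Predicted affinity = user preferences . movie features
print(theta_alice @ x_titanic)        # high score -> likely recommendation
print(theta_alice @ x_john_wick)      # low score -> unlikely recommendation
```

With these numbers, Titanic scores 0.86 for Alice while John Wick scores only 0.095, matching the intuition above.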
Learning Movie Features
If user parameters $\theta$ are known, we can learn movie features $x$.

Minimize prediction error:

$$\min_{x^{(i)}} \; \frac{1}{2} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

Where:
- $y^{(i,j)}$ = actual rating user $j$ gave movie $i$
- $r(i,j) = 1$ if rating exists
- $\lambda$ = regularization parameter
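With user parameters held fixed, the per-movie objective can be sketched in NumPy as follows (the function name, argument names, and shapes are assumptions, not code from the original):

```python
import numpy as np

def movie_cost(x_i, Theta, y_i, r_i, lam):
    """Regularized cost of one movie's feature vector.

    x_i:   (n,)   features of movie i
    Theta: (n_u, n) user parameter matrix (held fixed)
    y_i:   (n_u,) ratings users gave movie i
    r_i:   (n_u,) 1 where a rating exists, 0 otherwise
    """
    pred = Theta @ x_i            # predicted ratings by every user
    err = (pred - y_i) * r_i      # only observed ratings contribute
    return 0.5 * np.sum(err ** 2) + (lam / 2) * np.sum(x_i ** 2)
```

Multiplying by `r_i` zeroes out the unrated entries, so missing ratings never affect the cost.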
Learning All Movie Features
```mermaid
flowchart TD
    A[Current Movie Features x] --> B[Predict User Ratings]
    B --> C[Compute Error]
    C --> D[Compute Gradient]
    D --> E[Update Features]
    E --> F[Better Predictions]
```
For all movies, apply the gradient descent update

$$x_k^{(i)} := x_k^{(i)} - \alpha \, \frac{\partial J}{\partial x_k^{(i)}}$$

that is: New Feature = Old Feature − Learning Rate × Gradient.
We are updating the movie features $x^{(i)}$.

Predicted rating:

$$(\theta^{(j)})^T x^{(i)}$$

This is:

User Preferences · Movie Features = Predicted Rating
Error Term
Prediction error:

$$\left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2$$

i.e. (predicted rating − actual rating)².
If:
- error is large → update more
- error is small → update less
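Put together, one gradient step on a single movie's features might look like this sketch (names, shapes, and hyperparameters are assumptions):

```python
import numpy as np

# One gradient descent step on movie i's feature vector, with the
# user parameter matrix Theta held fixed. alpha is the learning rate.
def update_movie_features(x_i, Theta, y_i, r_i, alpha, lam):
    err = (Theta @ x_i - y_i) * r_i   # prediction error, 0 where unrated
    grad = Theta.T @ err + lam * x_i  # gradient of the regularized cost
    return x_i - alpha * grad         # large error -> large update
```

The update size scales with the error, which is exactly the "large error → update more" behavior described above.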
Regularization ($\lambda$)
Prevents features from becoming too large.
Helps reduce overfitting.
Collaborative Filtering Algorithm
Initialize $x^{(1)}, \dots, x^{(n_m)}$ and $\theta^{(1)}, \dots, \theta^{(n_u)}$ with small random values.

We do this to break symmetry, so that different movies and users end up with different learned features.
2. Minimize the cost function
- Estimate movie features: fix $\theta$, learn $x$
- Estimate user preferences: fix $x$, learn $\theta$
- Repeat until convergence.
If we:
- fix $x$ and minimize w.r.t. $\theta$, we recover the user learning problem.
- fix $\theta$ and minimize w.r.t. $x$, we recover the movie feature learning problem.
Instead of alternating between them, we optimize both together.
Minimize cost with:
- Gradient Descent
- Advanced optimizers (e.g., Conjugate Gradient, L-BFGS)
We combine both learning problems into a single cost function:

$$J(x, \theta) = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

Where:
- $y^{(i,j)}$ = rating user $j$ gave movie $i$
- $r(i,j) = 1$ if rating exists, otherwise $0$
- $x^{(i)}$ = feature vector for movie $i$
- $\theta^{(j)}$ = parameter vector for user $j$
This objective:
- penalizes prediction error
- regularizes user parameters
- regularizes movie features
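The combined objective and its gradients translate directly into NumPy. A sketch under assumed shapes (Y and R are movies × users; X and Theta stack the movie feature and user parameter vectors as rows):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Joint regularized cost over all movies and users.

    X:     (n_m, n) movie features    Theta: (n_u, n) user parameters
    Y:     (n_m, n_u) ratings         R:     (n_m, n_u) 1 where rated
    """
    err = (X @ Theta.T - Y) * R       # error only on observed ratings
    return (0.5 * np.sum(err ** 2)
            + (lam / 2) * np.sum(X ** 2)
            + (lam / 2) * np.sum(Theta ** 2))

def cofi_grads(X, Theta, Y, R, lam):
    err = (X @ Theta.T - Y) * R
    X_grad = err @ Theta + lam * X        # dJ/dX
    Theta_grad = err.T @ X + lam * Theta  # dJ/dTheta
    return X_grad, Theta_grad
```

Both gradients reuse the same masked error matrix, which is why optimizing $x$ and $\theta$ together is no harder than optimizing either alone.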
3. Rating Prediction
Once the model is trained, the predicted rating is:

$$\hat{y}^{(i,j)} = (\theta^{(j)})^T x^{(i)}$$

If user $j$ has not rated movie $i$, we predict their rating using this value.
4. Result
The algorithm learns:
- movie feature vectors $x^{(i)}$
- user preference vectors $\theta^{(j)}$
from the rating matrix alone, without manually defining movie features.
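As an end-to-end illustration, here is a tiny, hypothetical rating matrix factorized by plain gradient descent (the matrix, hyperparameters, and iteration count are all assumptions chosen for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = np.array([[5., 1.],      # rows = movies, columns = users
              [4., 1.],
              [1., 5.]])
R = (Y > 0).astype(float)    # here every rating happens to be observed

n = 2                                        # number of latent features
X = rng.normal(scale=0.1, size=(3, n))       # movie features (random init)
Theta = rng.normal(scale=0.1, size=(2, n))   # user parameters (random init)

alpha, lam = 0.05, 0.01
for _ in range(2000):
    err = (X @ Theta.T - Y) * R              # masked prediction error
    X -= alpha * (err @ Theta + lam * X)     # update movie features
    Theta -= alpha * (err.T @ X + lam * Theta)  # update user parameters

print(np.round(X @ Theta.T, 1))  # reconstructed ratings approach Y
```

After training, the product `X @ Theta.T` fills in the whole rating matrix, including any entries that were missing, which is exactly how recommendations are generated.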
