Collaborative Filtering: Building Recommender Systems with Feature Learning
Learn how collaborative filtering powers modern recommender systems by simultaneously learning user preferences and item features from rating data. Understand the optimization objective, matrix factorization approach, and how gradient-based methods enable scalable recommendations.
Collaborative Filtering 🫱🏻🫲🏽
Recommender system technique that learns both user preferences and item features automatically from rating data.
- Unlike content-based methods, we do not know the features of movies beforehand.
- Collaborative filtering learns hidden features of users and items so it can predict missing ratings and recommend things people will likely enjoy.
Chicken-and-Egg Problem 🥚
Previously we saw two ideas:
- If movie features $x$ are known, we can learn user parameters $\theta$.
- If user parameters $\theta$ are known, we can learn movie features $x$.
Instead of alternating between them, collaborative filtering learns both simultaneously.
Why Is It Called Collaborative?
Many users rate movies.
Their ratings collaboratively help the system learn features.
Result:
- Better movie representations
- Better recommendations for everyone
```mermaid
flowchart TD
    A[Randomly Initialize User Preferences θ] --> B[Learn Movie Features x]
    B --> C[Update User Preferences θ]
    C --> D[Update Movie Features x]
    D --> E[Repeat Until Convergence]
```
Key Idea
People with similar tastes tend to like similar things.
Collaborative filtering simultaneously learns:
- user preferences
- item features
directly from the rating matrix, without manually defining features.
```mermaid
flowchart TD
    A[Users Rate Movies] --> B[Learn User Preference Vectors]
    B --> C[Learn Movie Feature Vectors]
    C --> D[Predict Missing Ratings for New Movies]
    D --> E[Generate New Recommendations]
```
Movie Feature Matrix
| Movie | Romance | Action |
|---|---|---|
| Titanic | 0.9 | 0.1 |
| Notebook | 0.95 | 0.05 |
| Avengers | 0.1 | 0.9 |
| John Wick | 0.05 | 0.95 |
where
- $x_1$ = romantic level
- $x_2$ = action level

From this we infer, e.g. for a movie with high $x_1$ and low $x_2$ (like Titanic):
- the movie is romantic
- the movie is not an action movie
No Intercept Term
Unlike previous models:
- We remove the intercept feature $x_0 = 1$.
Reason:
since the algorithm learns all features automatically, it can learn a constant feature itself if needed.
User Movie Rating Matrix
| User | Titanic | The Notebook | Avengers | John Wick |
|---|---|---|---|---|
| Alice | ⭐⭐⭐⭐⭐ | ⭐ | ⭐ | ⭐ |
| Bob | ⭐⭐⭐⭐ | ? | ⭐ | ⭐ |
| Carol | ⭐ | ⭐ | ⭐ | ⭐⭐⭐⭐⭐ |
Prediction of user $j$ rating movie $i$:

$$\hat{y}^{(i,j)} = (\theta^{(j)})^T x^{(i)}$$
User Preferences Matrix
| User | Likes Romance | Likes Action |
|---|---|---|
| Alice | 0.95 | 0.05 |
| Bob | 0.85 | 0.15 |
| Carol | 0.05 | 0.95 |
- These features are not manually defined.
- The algorithm learns them from ratings.
Observation
- Alice and Bob have similar taste
- Both dislike action movies
- Both like romantic movies
So if Bob has not rated The Notebook, we can predict:
- Bob will probably rate it highly.
The algorithm uses behavior of other users to predict what someone will like.
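The prediction is just a dot product between a user's preference vector and a movie's feature vector. A minimal sketch using the illustrative numbers from the tables above (variable names are my own):

```python
import numpy as np

# Illustrative vectors copied from the tables above; in a real system
# both would be learned from the rating data.
x_titanic = np.array([0.9, 0.1])      # [romance, action] features
x_john_wick = np.array([0.05, 0.95])
theta_alice = np.array([0.95, 0.05])  # Alice: likes romance, dislikes action

# Predicted affinity = user preferences . movie features
print(theta_alice @ x_titanic)        # high score -> likely recommendation
print(theta_alice @ x_john_wick)      # low score -> unlikely recommendation
```

With these numbers, Titanic scores 0.86 for Alice while John Wick scores only 0.095, matching the intuition above.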
Learning Movie Features
If user parameters $\theta$ are known, we can learn movie features $x$.

Minimize prediction error:

$$\min_{x^{(i)}} \; \frac{1}{2} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

Where:
- $y^{(i,j)}$ = actual rating user $j$ gave movie $i$
- $r(i,j) = 1$ if rating exists
- $\lambda$ = regularization parameter
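With user parameters held fixed, the per-movie objective can be sketched in NumPy as follows (the function name, argument names, and shapes are assumptions, not code from the original):

```python
import numpy as np

def movie_cost(x_i, Theta, y_i, r_i, lam):
    """Regularized cost of one movie's feature vector.

    x_i:   (n,)   features of movie i
    Theta: (n_u, n) user parameter matrix (held fixed)
    y_i:   (n_u,) ratings users gave movie i
    r_i:   (n_u,) 1 where a rating exists, 0 otherwise
    """
    pred = Theta @ x_i            # predicted ratings by every user
    err = (pred - y_i) * r_i      # only observed ratings contribute
    return 0.5 * np.sum(err ** 2) + (lam / 2) * np.sum(x_i ** 2)
```

Multiplying by `r_i` zeroes out the unrated entries, so missing ratings never affect the cost.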
Learning All Movie Features
```mermaid
flowchart TD
    A[Current Movie Features x] --> B[Predict User Ratings]
    B --> C[Compute Error]
    C --> D[Compute Gradient]
    D --> E[Update Features]
    E --> F[Better Predictions]
```
For all movies, apply the gradient descent update

$$x_k^{(i)} := x_k^{(i)} - \alpha \, \frac{\partial J}{\partial x_k^{(i)}}$$

that is: New Feature = Old Feature − Learning Rate × Gradient.
We are updating the movie features $x^{(i)}$.

Predicted rating:

$$(\theta^{(j)})^T x^{(i)}$$

This is:

User Preferences · Movie Features = Predicted Rating
Error Term
Prediction error:

$$\left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2$$

i.e. (predicted rating − actual rating)².
If:
- error is large → update more
- error is small → update less
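Put together, one gradient step on a single movie's features might look like this sketch (names, shapes, and hyperparameters are assumptions):

```python
import numpy as np

# One gradient descent step on movie i's feature vector, with the
# user parameter matrix Theta held fixed. alpha is the learning rate.
def update_movie_features(x_i, Theta, y_i, r_i, alpha, lam):
    err = (Theta @ x_i - y_i) * r_i   # prediction error, 0 where unrated
    grad = Theta.T @ err + lam * x_i  # gradient of the regularized cost
    return x_i - alpha * grad         # large error -> large update
```

The update size scales with the error, which is exactly the "large error → update more" behavior described above.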
Regularization ($\lambda$)
Prevents features from becoming too large.
Helps reduce overfitting.
Collaborative Filtering Algorithm
Initialize $x^{(1)}, \dots, x^{(n_m)}$ and $\theta^{(1)}, \dots, \theta^{(n_u)}$ with small random values.

We do this to break symmetry, so that different movies and users end up with different learned features.
2. Minimize the cost function
- Estimate movie features: fix $\theta$, learn $x$
- Estimate user preferences: fix $x$, learn $\theta$
- Repeat until convergence.
If we:
- fix $x$ and minimize w.r.t. $\theta$, we recover the user learning problem.
- fix $\theta$ and minimize w.r.t. $x$, we recover the movie feature learning problem.
Instead of alternating between them, we optimize both together.
Minimize cost with:
- Gradient Descent
- Advanced optimizers (e.g., Conjugate Gradient, L-BFGS)
We combine both learning problems into a single cost function:

$$J(x, \theta) = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

Where:
- $y^{(i,j)}$ = rating user $j$ gave movie $i$
- $r(i,j) = 1$ if rating exists, otherwise $0$
- $x^{(i)}$ = feature vector for movie $i$
- $\theta^{(j)}$ = parameter vector for user $j$
This objective:
- penalizes prediction error
- regularizes user parameters
- regularizes movie features
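The combined objective and its gradients translate directly into NumPy. A sketch under assumed shapes (Y and R are movies × users; X and Theta stack the movie feature and user parameter vectors as rows):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Joint regularized cost over all movies and users.

    X:     (n_m, n) movie features    Theta: (n_u, n) user parameters
    Y:     (n_m, n_u) ratings         R:     (n_m, n_u) 1 where rated
    """
    err = (X @ Theta.T - Y) * R       # error only on observed ratings
    return (0.5 * np.sum(err ** 2)
            + (lam / 2) * np.sum(X ** 2)
            + (lam / 2) * np.sum(Theta ** 2))

def cofi_grads(X, Theta, Y, R, lam):
    err = (X @ Theta.T - Y) * R
    X_grad = err @ Theta + lam * X        # dJ/dX
    Theta_grad = err.T @ X + lam * Theta  # dJ/dTheta
    return X_grad, Theta_grad
```

Both gradients reuse the same masked error matrix, which is why optimizing $x$ and $\theta$ together is no harder than optimizing either alone.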
3. Rating Prediction
Once the model is trained, the predicted rating is:

$$\hat{y}^{(i,j)} = (\theta^{(j)})^T x^{(i)}$$

If user $j$ has not rated movie $i$, we predict their rating using this value.
4. Result
The algorithm learns:
- movie feature vectors $x^{(i)}$
- user preference vectors $\theta^{(j)}$
from the rating matrix alone, without manually defining movie features.
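As an end-to-end illustration, here is a tiny, hypothetical rating matrix factorized by plain gradient descent (the matrix, hyperparameters, and iteration count are all assumptions chosen for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = np.array([[5., 1.],      # rows = movies, columns = users
              [4., 1.],
              [1., 5.]])
R = (Y > 0).astype(float)    # here every rating happens to be observed

n = 2                                        # number of latent features
X = rng.normal(scale=0.1, size=(3, n))       # movie features (random init)
Theta = rng.normal(scale=0.1, size=(2, n))   # user parameters (random init)

alpha, lam = 0.05, 0.01
for _ in range(2000):
    err = (X @ Theta.T - Y) * R              # masked prediction error
    X -= alpha * (err @ Theta + lam * X)     # update movie features
    Theta -= alpha * (err.T @ X + lam * Theta)  # update user parameters

print(np.round(X @ Theta.T, 1))  # reconstructed ratings approach Y
```

After training, the product `X @ Theta.T` fills in the whole rating matrix, including any entries that were missing, which is exactly how recommendations are generated.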
