Abstract
A tutorial covering three recommender system methods: user-based collaborative filtering, item-based collaborative filtering, and matrix factorisation. Each method is introduced gradually with an example and coded from scratch.
In this tutorial we shall introduce recommender systems based on collaborative filtering. Specifically, we shall look at three CF examples:
user-based collaborative filtering
item-based collaborative filtering
matrix factorization
The goal of this tutorial is to use the above-mentioned approaches to build a system for recommending movies (or anything, for that matter) to users based on their past viewing habits.
We shall be using examples throughout the tutorial based on the work of Ian Durbach; his profile is found here. This tutorial is based on teaching from Google’s Recommendation Systems notes and the Data Science for Industry course from the University of Cape Town. Other great resources used for this tutorial are shown below:
Chapter 22 of Joel Grus’ “Data Science from Scratch: First Principles with Python”.
Practical Deep Learning for Coders book, particularly chapter 8
You interact nearly every day with recommendation systems: algorithms that guess what products and services you might like, based on your past behavior. These systems largely rely on collaborative filtering, an approach based on linear algebra that fills in the missing values in a matrix.
A very common problem arises when you have a number of users and a number of products, and you want to recommend which products are most likely to be useful to which users: for example, recommending movies you might want to watch next on Netflix, or videos on YouTube. Further examples: Amazon recommends products you might want to buy, and Twitter recommends users you might want to follow.
A recommendation system helps users find compelling content in a large corpus. Remarkably, \(40\%\) of app installs on Google Play and \(60\%\) of watch time on YouTube come from recommendations (Google, 2022).
There are several ways to recommend items to users, such as simply recommending what’s popular, but this approach neglects a user’s existing interests. Instead, we shall look at an approach that takes each user’s existing interests into account: collaborative filtering.
Collaborative filtering is a type of recommender system that works as follows: look at what products the current user has used or liked, find other users who have used or liked similar products, and then recommend other products that those users have used or liked.
A similarity measure is a function that takes a pair of embeddings and returns a scalar measuring their similarity. To determine the degree of similarity, most recommendation systems rely on one or more of the following:
cosine
dot product
Euclidean distance
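As a quick sketch with two hypothetical binary interest vectors, all three measures can be computed in base R (the vectors `x` and `y` below are illustrative, not from our data):

```r
# two hypothetical binary user-interest vectors
x <- c(1, 0, 1, 1, 0)
y <- c(1, 1, 0, 1, 0)

sum(x * y)                                      # dot product
sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))  # cosine similarity
sqrt(sum((x - y)^2))                            # Euclidean distance
```

We shall focus on cosine similarity for the rest of the tutorial.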
We load required packages and the data set for this tutorial.
# Packages
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
# Load data
load("output/recommender.RData")
Let’s view the data that is contained in this R object.
# View Movies
head(viewed_movies, 6)
## # A tibble: 6 × 21
## userId 2001:…¹ Apoca…² Big L…³ Bourn…⁴ Clear…⁵ Crouc…⁶ Depar…⁷ Donni…⁸ Ferri…⁹
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0 1 1 0 1 0 0 0 0
## 2 20 0 0 0 1 0 1 0 0 0
## 3 187 1 0 1 0 0 1 0 1 0
## 4 198 1 0 1 0 0 1 0 0 0
## 5 212 1 0 1 0 0 0 0 0 1
## 6 222 0 1 1 0 0 0 1 0 0
## # … with 11 more variables: `Green Mile, The (1999)` <dbl>,
## # `Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)` <dbl>,
## # `Indiana Jones and the Temple of Doom (1984)` <dbl>,
## # `Interview with the Vampire: The Vampire Chronicles (1994)` <dbl>,
## # `Jumanji (1995)` <dbl>, `Kill Bill: Vol. 2 (2004)` <dbl>,
## # `Shining, The (1980)` <dbl>, `Sleepless in Seattle (1993)` <dbl>,
## # `Star Trek: Generations (1994)` <dbl>, …
We shorten the Harry Potter movie title to make the output neater, and then view the movie names to check that the name was indeed changed.
# Change Harry Potter Name
viewed_movies <- rename(viewed_movies, `Harry Potter and the Philosopher's Stone (2001)` = `Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)`)
# Movie Names Changed?
names(viewed_movies)[10:14]
## [1] "Ferris Bueller's Day Off (1986)"
## [2] "Green Mile, The (1999)"
## [3] "Harry Potter and the Philosopher's Stone (2001)"
## [4] "Indiana Jones and the Temple of Doom (1984)"
## [5] "Interview with the Vampire: The Vampire Chronicles (1994)"
Before we begin exploring our recommender methods, we also need to do some pre-processing of our data. We first need to convert the data to matrix form, otherwise some of the functions we use later will give an error.
# get list of users
sorted_my_users <- as.character(unlist(viewed_movies[,1]))
# convert to matrix
viewed_movies <- as.matrix(viewed_movies[,-1])
# make row names the users
row.names(viewed_movies) <- sorted_my_users
Let’s look at our data one last time before diving into some brief theory on our first collaborative filtering method.
# Shape
dim(viewed_movies)
## [1] 15 20
# Structure of Matrix
str(viewed_movies)
## num [1:15, 1:20] 0 0 1 1 1 0 0 1 1 1 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:15] "1" "20" "187" "198" ...
## ..$ : chr [1:20] "2001: A Space Odyssey (1968)" "Apocalypse Now (1979)" "Big Lebowski, The (1998)" "Bourne Identity, The (2002)" ...
We have 15 rows and 20 columns. The rows represent the users and the columns represent the movies. An extract of the first 5 users and the first 3 movies is shown below.
viewed_movies[1:5,1:3]
## 2001: A Space Odyssey (1968) Apocalypse Now (1979) Big Lebowski, The (1998)
## 1 0 1 1
## 20 0 0 0
## 187 1 0 1
## 198 1 0 1
## 212 1 0 1
A 0 indicates the user has not watched that movie (a 1 indicates they have). Now let’s look at collaborative filtering.
Collaborative filtering uses similarities between users and items simultaneously to provide recommendations. This allows for serendipitous recommendations; that is, collaborative filtering models can recommend an item to user A based on the interests of a similar user B.
We have our matrix for our movie recommender system. Each row represents a user. Each column represents an item (a movie). The feedback about movies falls into one of two categories:
Explicit: users specify how much they liked a particular movie by providing a numerical rating.
Implicit: if a user watches a movie, the system infers that the user is interested.
To simplify, we will assume that the feedback matrix is binary; that is, a value of 1 indicates interest in the movie.
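As a hypothetical illustration, an explicit ratings matrix (say, 0 to 5 stars, with 0 meaning unrated) can be collapsed into the binary implicit form we use throughout; the small `ratings` matrix below is made up for the example:

```r
# hypothetical explicit ratings: rows are users, columns are movies
ratings <- matrix(c(5, 0, 3,
                    0, 4, 0),
                  nrow = 2, byrow = TRUE)

# binary implicit feedback: 1 if the user interacted with the movie at all
feedback <- ifelse(ratings > 0, 1, 0)
```

Any interaction, regardless of rating, is treated as an expression of interest.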
Recall that we could simply recommend the most popular items (movies) to a user. This would be a really simple recommender system that just recommends the most popular movies a user hasn’t seen before. The popularity information is obtained by summing the values of each column of viewed_movies:
sort(apply(viewed_movies, 2, sum), decreasing = TRUE)[1:4]
## Shining, The (1980) Big Lebowski, The (1998) Apocalypse Now (1979)
## 11 10 9
## Kill Bill: Vol. 2 (2004)
## 8
This approach has an intuitive appeal but is pretty unsophisticated (everyone gets the same recommendations, barring the filtering out of seen movies!). In other words, everyone’s vote counts the same. Collaborative filtering overcomes these inadequacies. Let’s look at the three approaches below.
This type of CF takes a user’s interests into account by looking for users who are somehow similar to them, and then suggesting the things that those users are interested in. To do that, we need a way to measure how similar two users are. There are lots of different similarity measures; here we’ll use cosine similarity. We’ll apply this to vectors of \(0\)s and \(1\)s, each vector representing one user’s interests: the \(i^{th}\) element will be \(1\) if the user has the \(i^{th}\) interest, and \(0\) otherwise. Accordingly, “similar users” will mean “users whose interest vectors most nearly point in the same direction”.
Users with identical interests will have similarity 1.
Users with no identical interests will have similarity 0.
Otherwise, the similarity will fall in between, with numbers closer to 1 indicating “very similar” and numbers closer to 0 indicating “not very similar.”
To summarise user-based collaborative filtering: for each movie \(j\) not yet seen by the user, compute the cosine similarity between this user and every other user who has seen movie \(j\). The similarity score runs from 0 (completely dissimilar tastes to mine) to 1 (exactly the same tastes as mine), so each person gets a weight equal to their similarity to me. Sum these weighted votes for each unseen movie and pick the highest: the unseen movie that has been seen by the most similar users.
A possible issue with user-based CF is that it does not work as well when the number of items gets very large. In high-dimensional vector spaces most vectors are very far apart (and also point in very different directions). That is, when there are a large number of interests, the “most similar users” to a given user might not be similar at all.
Consider an example taken from DSFS, looking at Amazon: a site from which I may have bought thousands of items over the last couple of decades. You could attempt to identify similar users to me based on buying patterns, but most likely no one in the world has a purchase history that looks even remotely like mine. Whoever my “most similar” shopper is, he’s probably not similar to me at all, and his purchases would almost certainly make for lousy recommendations.
User-based CF extends the approach by changing how much each person’s vote counts. Specifically, when recommending what I should watch next, a user-based CF system will up-weight the votes of people that are “more similar” to me. In this context “similar” means “has seen many of the same movies as me”.
Cosine similarity derives its name from the fact that it measures the cosine of the angle between two non-zero vectors. The closer the vectors lie to each other, the smaller the angle, and the closer the cosine is to 1. It can be shown that for two vectors \(\boldsymbol x\) and \(\boldsymbol y\):
\[\cos(\theta) = \frac{\boldsymbol x \cdot \boldsymbol y}{||\boldsymbol x|| \ ||\boldsymbol y||} = \frac{\sum_{i=1}^{n}x_iy_i}{\sqrt{\sum_{i=1}^{n}x^2_i}\sqrt{\sum_{i=1}^{n}y^2_i}}\]
We can use the crossprod() function in R to calculate the dot products.
# function calculating cosine similarity
cosine_sim <- function(a, b){crossprod(a, b) / sqrt(crossprod(a) * crossprod(b))}
Intuitively, very dissimilar vectors point in different directions. Two binary vectors are maximally dissimilar when there is no correspondence between the non-zero entries of \(x\) and \(y\), in which case the dot product is 0. The cosine of the angle is then 0, meaning the angle is 90 degrees (the vectors are orthogonal).
For binary (non-negative) vectors, cosine similarity lies between 0 and 1 inclusive and increases with similarity. Here are a few test cases to get a feel for it:
# maximally similar
x1 <- c(1,1,1,0,0)
x2 <- c(1,1,1,0,0)
cosine_sim(x1,x2)
## [,1]
## [1,] 1
The users’ viewing histories are identical, so the similarity is 1.
# maximally dissimilar
x1 <- c(1,1,1,0,0)
x2 <- c(0,0,0,1,1)
cosine_sim(x1,x2)
## [,1]
## [1,] 0
Users are dissimilar.
# but also
x1 <- c(1,1,0,0,0,0,0)
x2 <- c(0,0,0,0,1,1,0)
cosine_sim(x1,x2)
## [,1]
## [1,] 0
The users are dissimilar, even though they share 0’s in three columns. In this binary case, only the 1’s two users share contribute to their similarity.
# try an example from our data
as.numeric(viewed_movies[1,]) # user 1's viewing history
## [1] 0 1 1 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0
as.numeric(viewed_movies[2,]) # user 2's viewing history
## [1] 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0
cosine_sim(viewed_movies[1,], viewed_movies[2,])
## [,1]
## [1,] 0
# example 2
as.numeric(viewed_movies[1,])
## [1] 0 1 1 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0
as.numeric(viewed_movies[7,])
## [1] 0 1 1 1 0 0 1 0 1 1 0 1 0 0 1 1 0 0 0 0
cosine_sim(viewed_movies[1,], viewed_movies[7,])
## [,1]
## [1,] 0.6804138
Let’s get similarities between user pairs. We’ll do this with a loop below, because it’s easier to see what’s going on, but this will be inefficient and very slow for bigger data sets.
user_similarities <- matrix(0, nrow = 15, ncol = 15)
for (i in 1:14) {
for (j in (i + 1):15) {
user_similarities[i,j] <- cosine_sim(viewed_movies[i,], viewed_movies[j,])
}
}
user_similarities <- user_similarities + t(user_similarities)
diag(user_similarities) <- 0
row.names(user_similarities) <- row.names(viewed_movies)
colnames(user_similarities) <- row.names(viewed_movies)
round(user_similarities, 3)
## 1 20 187 198 212 222 282 328 330 372 432 434
## 1 0.000 0.000 0.309 0.667 0.333 0.309 0.680 0.471 0.408 0.471 0.289 0.594
## 20 0.000 0.000 0.189 0.204 0.204 0.189 0.167 0.289 0.500 0.000 0.354 0.485
## 187 0.309 0.189 0.000 0.617 0.463 0.286 0.378 0.546 0.661 0.436 0.401 0.550
## 198 0.667 0.204 0.617 0.000 0.500 0.154 0.544 0.471 0.510 0.471 0.289 0.594
## 212 0.333 0.204 0.463 0.500 0.000 0.309 0.408 0.707 0.510 0.471 0.289 0.594
## 222 0.309 0.189 0.286 0.154 0.309 0.000 0.504 0.546 0.567 0.218 0.535 0.642
## 282 0.680 0.167 0.378 0.544 0.408 0.504 0.000 0.770 0.667 0.385 0.589 0.728
## 328 0.471 0.289 0.546 0.471 0.707 0.546 0.770 0.000 0.722 0.500 0.510 0.840
## 330 0.408 0.500 0.661 0.510 0.510 0.567 0.667 0.722 0.000 0.433 0.619 0.849
## 372 0.471 0.000 0.436 0.471 0.471 0.218 0.385 0.500 0.433 0.000 0.204 0.420
## 432 0.289 0.354 0.401 0.289 0.289 0.535 0.589 0.510 0.619 0.204 0.000 0.600
## 434 0.594 0.485 0.550 0.594 0.594 0.642 0.728 0.840 0.849 0.420 0.600 0.000
## 495 0.365 0.000 0.338 0.183 0.365 0.676 0.745 0.645 0.559 0.258 0.316 0.542
## 562 0.408 0.000 0.189 0.408 0.408 0.000 0.500 0.289 0.500 0.289 0.354 0.364
## 594 0.167 0.204 0.154 0.000 0.000 0.463 0.136 0.118 0.510 0.236 0.289 0.297
## 495 562 594
## 1 0.365 0.408 0.167
## 20 0.000 0.000 0.204
## 187 0.338 0.189 0.154
## 198 0.183 0.408 0.000
## 212 0.365 0.408 0.000
## 222 0.676 0.000 0.463
## 282 0.745 0.500 0.136
## 328 0.645 0.289 0.118
## 330 0.559 0.500 0.510
## 372 0.258 0.289 0.236
## 432 0.316 0.354 0.289
## 434 0.542 0.364 0.297
## 495 0.000 0.224 0.183
## 562 0.224 0.000 0.204
## 594 0.183 0.204 0.000
The user similarity matrix is 15 by 15, since we have 15 users in our small sample. The diagonal of the matrix is set to 0, and the row and column names identify the users. Each element of the matrix gives the similarity score between the corresponding pair of users.
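As an aside, the double loop above is easy to follow but slow for larger data sets. A sketch of the same computation in one vectorized step, assuming the binary viewed_movies matrix built earlier:

```r
# normalise each user's row to unit length, then take all pairwise dot products
norms <- sqrt(rowSums(viewed_movies^2))
normalised <- viewed_movies / norms   # divides row i by norms[i]
all_sims <- tcrossprod(normalised)    # all pairwise cosine similarities
diag(all_sims) <- 0                   # zero the diagonal, as before
```

tcrossprod() computes the matrix of dot products between all pairs of rows, which for unit-length rows is exactly the cosine similarity.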
# who are the most similar users to user 222?
sort(user_similarities["222",], decreasing = TRUE)
## 495 434 330 328 432 282 594 1
## 0.6761234 0.6416889 0.5669467 0.5455447 0.5345225 0.5039526 0.4629100 0.3086067
## 212 187 372 20 198 222 562
## 0.3086067 0.2857143 0.2182179 0.1889822 0.1543033 0.0000000 0.0000000
Let’s see if this makes sense from the viewing histories. Below we show user 222’s history, together with the user who is most similar to user 222 (user 495) and another user who is very dissimilar (user 562).
t(viewed_movies[c("222","495","562"),])
## 222 495 562
## 2001: A Space Odyssey (1968) 0 0 0
## Apocalypse Now (1979) 1 1 0
## Big Lebowski, The (1998) 1 1 0
## Bourne Identity, The (2002) 0 0 0
## Clear and Present Danger (1994) 0 0 0
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 0 0 0
## Departed, The (2006) 1 1 0
## Donnie Darko (2001) 0 0 0
## Ferris Bueller's Day Off (1986) 0 1 1
## Green Mile, The (1999) 0 0 1
## Harry Potter and the Philosopher's Stone (2001) 0 0 0
## Indiana Jones and the Temple of Doom (1984) 0 0 0
## Interview with the Vampire: The Vampire Chronicles (1994) 0 0 0
## Jumanji (1995) 1 0 0
## Kill Bill: Vol. 2 (2004) 1 1 0
## Shining, The (1980) 0 0 1
## Sleepless in Seattle (1993) 0 0 1
## Star Trek: Generations (1994) 0 0 0
## There's Something About Mary (1998) 1 0 0
## Up (2009) 1 0 0
The results above show that the similarities do indeed make sense. Users 222 and 495 share a lot of viewed movies: in fact, all but one of the movies seen by user 495 have also been watched by user 222. User 562, on the other hand, does not share any viewed movies with user 222.
As an example, let’s consider the process of recommending a movie to one user, say user 222. How would we do this with a user-based collaborative filtering system?
First, we need to know which movies user 222 has already seen (so we don’t recommend those). Below are the movies watched by user 222:
viewed_movies['222',which(viewed_movies["222",] ==1)]
## Apocalypse Now (1979) Big Lebowski, The (1998)
## 1 1
## Departed, The (2006) Jumanji (1995)
## 1 1
## Kill Bill: Vol. 2 (2004) There's Something About Mary (1998)
## 1 1
## Up (2009)
## 1
The basic idea is now to recommend what’s popular by adding up the number of users that have seen each movie, but to weight each user by their similarity to user 222.
Let’s work through the calculations for one movie, say 2001: A Space Odyssey (movie 1). The table below shows who’s seen 2001: A Space Odyssey, and how similar each person is to user 222.
seen_movie <- viewed_movies[,"2001: A Space Odyssey (1968)"]
sim_to_user <- user_similarities["222",]
cbind(seen_movie,sim_to_user)
## seen_movie sim_to_user
## 1 0 0.3086067
## 20 0 0.1889822
## 187 1 0.2857143
## 198 1 0.1543033
## 212 1 0.3086067
## 222 0 0.0000000
## 282 0 0.5039526
## 328 1 0.5455447
## 330 1 0.5669467
## 372 1 0.2182179
## 432 0 0.5345225
## 434 1 0.6416889
## 495 0 0.6761234
## 562 0 0.0000000
## 594 0 0.4629100
The basic idea in user-based collaborative filtering is that user 372’s vote counts less than user 434’s, because user 434 is more similar to user 222 (in terms of viewing history). This is shown clearly below:
cbind(seen_movie,sim_to_user)[c(10,12),]
## seen_movie sim_to_user
## 372 1 0.2182179
## 434 1 0.6416889
Note that this only means user 434 counts more in the context of making recommendations to user 222. When recommending to users other than user 222, user 372 may carry more weight.
We can now work out an overall recommendation score for 2001: A Space Odyssey by multiplying together the two elements in each row of the table above, and summing these products (i.e. taking the dot product):
# overall score for 2001: A Space Odyssey
crossprod(viewed_movies[, "2001: A Space Odyssey (1968)"], user_similarities["222",])
## [,1]
## [1,] 2.721023
Note that this score will increase with (a) the number of people who’ve seen the movie (more 1’s in the first column above) and (b) how similar those people are to user 222.
Let’s repeat this calculation for all movies and compare recommendation scores:
t(user_similarities["222",] %*% viewed_movies)
## [,1]
## 2001: A Space Odyssey (1968) 2.7210226
## Apocalypse Now (1979) 3.9239911
## Big Lebowski, The (1998) 3.9914875
## Bourne Identity, The (2002) 2.9816377
## Clear and Present Danger (1994) 0.9502956
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.8376355
## Departed, The (2006) 3.4687789
## Donnie Darko (2001) 2.0398947
## Ferris Bueller's Day Off (1986) 3.2428631
## Green Mile, The (1999) 2.7100208
## Harry Potter and the Philosopher's Stone (2001) 2.2517693
## Indiana Jones and the Temple of Doom (1984) 2.1540964
## Interview with the Vampire: The Vampire Chronicles (1994) 1.8500935
## Jumanji (1995) 2.3950504
## Kill Bill: Vol. 2 (2004) 3.7544932
## Shining, The (1980) 4.0681044
## Sleepless in Seattle (1993) 1.0298568
## Star Trek: Generations (1994) 0.4629100
## There's Something About Mary (1998) 1.6715457
## Up (2009) 2.0303629
To come up with a final recommendation, we just need to remember to remove movies user 222 has already seen, and sort the remaining movies in descending order of recommendation score.
We do that below, after tidying up the results a bit by putting them in a data frame.
user_scores <- data.frame(title = colnames(viewed_movies),
score = as.vector(user_similarities["222",] %*% viewed_movies),
seen = as.vector(viewed_movies["222",]))
user_scores %>% filter(seen == 0) %>% arrange(desc(score))
## title score seen
## 1 Shining, The (1980) 4.0681044 0
## 2 Ferris Bueller's Day Off (1986) 3.2428631 0
## 3 Bourne Identity, The (2002) 2.9816377 0
## 4 2001: A Space Odyssey (1968) 2.7210226 0
## 5 Green Mile, The (1999) 2.7100208 0
## 6 Harry Potter and the Philosopher's Stone (2001) 2.2517693 0
## 7 Indiana Jones and the Temple of Doom (1984) 2.1540964 0
## 8 Donnie Darko (2001) 2.0398947 0
## 9 Interview with the Vampire: The Vampire Chronicles (1994) 1.8500935 0
## 10 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.8376355 0
## 11 Sleepless in Seattle (1993) 1.0298568 0
## 12 Clear and Present Danger (1994) 0.9502956 0
## 13 Star Trek: Generations (1994) 0.4629100 0
Therefore, our top recommendation for user 222 is “The Shining”.
Now that we’ve understood the calculations, let’s get recommendations for one more user, user 372:
# recommendations for user 372
user_scores <- data.frame(title = colnames(viewed_movies),
score = as.vector(user_similarities["372",] %*% viewed_movies),
seen = as.vector(viewed_movies["372",]))
user_scores %>% filter(seen == 0) %>% arrange(desc(score))
## title score seen
## 1 Big Lebowski, The (1998) 4.0650630 0
## 2 Kill Bill: Vol. 2 (2004) 2.8549736 0
## 3 Ferris Bueller's Day Off (1986) 2.7562755 0
## 4 Green Mile, The (1999) 2.6736052 0
## 5 Departed, The (2006) 2.4185378 0
## 6 Indiana Jones and the Temple of Doom (1984) 2.2477932 0
## 7 Bourne Identity, The (2002) 1.9421211 0
## 8 Harry Potter and the Philosopher's Stone (2001) 1.8245012 0
## 9 Up (2009) 1.8138306 0
## 10 Donnie Darko (2001) 1.7895325 0
## 11 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.7609370 0
## 12 Jumanji (1995) 1.5111410 0
## 13 Interview with the Vampire: The Vampire Chronicles (1994) 1.3092749 0
## 14 There's Something About Mary (1998) 1.3070169 0
## 15 Sleepless in Seattle (1993) 0.9573901 0
## 16 Clear and Present Danger (1994) 0.8914885 0
## 17 Star Trek: Generations (1994) 0.2357023 0
We would recommend “The Big Lebowski” to user 372.
We essentially just combine and generalise each step from the previous section into a function that takes user (a number or string), user_sim (the similarity matrix between all users), and viewed_mov (a matrix showing which movies each user has or hasn’t seen).
# a function to generate a recommendation for any user
user_based_recommendations <- function(user, user_sim, viewed_mov){
# turn into character if not already
user <- ifelse(is.character(user), user, as.character(user))
# get scores
user_scores <- data.frame(title = colnames(viewed_mov),
score = as.vector(user_sim[user,] %*% viewed_mov),
seen = as.vector(viewed_mov[user,]))
# sort unseen movies by score and remove the 'seen' column
user_scores %>%
filter(seen == 0) %>%
arrange(desc(score)) %>%
select(-seen)
}
Let’s check the function is working by running it on a user we’ve used before:
user_based_recommendations(user = 222, user_sim = user_similarities, viewed_mov = viewed_movies)
## title score
## 1 Shining, The (1980) 4.0681044
## 2 Ferris Bueller's Day Off (1986) 3.2428631
## 3 Bourne Identity, The (2002) 2.9816377
## 4 2001: A Space Odyssey (1968) 2.7210226
## 5 Green Mile, The (1999) 2.7100208
## 6 Harry Potter and the Philosopher's Stone (2001) 2.2517693
## 7 Indiana Jones and the Temple of Doom (1984) 2.1540964
## 8 Donnie Darko (2001) 2.0398947
## 9 Interview with the Vampire: The Vampire Chronicles (1994) 1.8500935
## 10 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.8376355
## 11 Sleepless in Seattle (1993) 1.0298568
## 12 Clear and Present Danger (1994) 0.9502956
## 13 Star Trek: Generations (1994) 0.4629100
Now do it for all users with lapply:
lapply(sorted_my_users, user_based_recommendations, user_similarities, viewed_movies)
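lapply returns an unnamed list, which makes it hard to tell which element belongs to which user. A small sketch, reusing the sorted_my_users vector created earlier, that names the results:

```r
# compute recommendations for every user and name each element by user ID
all_recs <- lapply(sorted_my_users, user_based_recommendations,
                   user_similarities, viewed_movies)
names(all_recs) <- sorted_my_users

# recommendations for a given user can then be retrieved by name
head(all_recs[["222"]])
```

This keeps the per-user data frames from the earlier function but makes lookups by user ID straightforward.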
We now improve on this by displaying all the recommendation scores in the \(15 \times 20\) matrix relating users to movies, with zeros in the cells where a user has already watched a movie.
# New User_Based_Recommendations Function
user_based_recommendations_2 <- function(user, user_sim, viewed_mov){
# turn into character if not already
user <- ifelse(is.character(user), user, as.character(user))
# get scores
user_scores <- data.frame(title = colnames(viewed_mov),
score = as.vector(user_sim[user,] %*% viewed_mov),
seen = as.vector(viewed_mov[user,]))
# sort unseen movies by score and remove the 'seen' column
user_scores %>%
select(-seen)
}
# Create Matrix with users and movies recommendation rating
recommendation_scores <- matrix(nrow = nrow(user_similarities), ncol = ncol(viewed_movies))
row.names(recommendation_scores) <- row.names(user_similarities)
colnames(recommendation_scores) <- colnames(viewed_movies)
for (i in 1:nrow(user_similarities)){
recommendation_scores[i,] <- user_based_recommendations_2(user = row.names(user_similarities)[i]
, user_sim = user_similarities, viewed_mov = viewed_movies)[,"score"]
}
# set scores of already-watched movies to 0
recommendation_scores[viewed_movies == 1] <- 0
# See result
t(round(recommendation_scores,2)) %>% head(5)
## 1 20 187 198 212 222 282 328 330
## 2001: A Space Odyssey (1968) 3.25 1.87 0.00 0.00 0.00 2.72 3.88 0.00 0.00
## Apocalypse Now (1979) 0.00 1.83 3.66 3.60 3.70 0.00 0.00 0.00 0.00
## Big Lebowski, The (1998) 0.00 2.23 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## Bourne Identity, The (2002) 2.44 0.00 2.72 2.61 2.71 2.98 0.00 0.00 0.00
## Clear and Present Danger (1994) 0.00 0.49 0.86 1.26 0.93 0.95 1.41 1.31 1.26
## 372 432 434 495 562 594
## 2001: A Space Odyssey (1968) 0.00 2.91 0 2.89 2.45 1.32
## Apocalypse Now (1979) 0.00 3.95 0 0.00 2.78 0.00
## Big Lebowski, The (1998) 4.07 4.44 0 0.00 3.29 2.03
## Bourne Identity, The (2002) 1.94 0.00 0 2.81 2.01 1.55
## Clear and Present Danger (1994) 0.89 0.89 0 0.91 0.77 0.46
Our new recommendation matrix has users on the rows and movie titles on the columns (the result above is transposed for display). The elements show the recommendation score of each movie for each user; where a user has already watched the movie, the entry is 0.
# Test if it works with User 222
sort(recommendation_scores["222",], decreasing = T) %>% head(5)
## Shining, The (1980) Ferris Bueller's Day Off (1986)
## 4.068104 3.242863
## Bourne Identity, The (2002) 2001: A Space Odyssey (1968)
## 2.981638 2.721023
## Green Mile, The (1999)
## 2.710021
“The Shining” is again the top recommendation for user 222.
A variant on the above is a k-nearest-neighbours approach that bases recommendations only on the k most similar users. This is faster when there are many users. An example of this implementation, with different data, is done here. Using the same data we have been working with, let’s implement the KNN approach for recommendations.
KNN <- function(user, user_sim, viewed_mov, k){
# turn into character if not already
user <- ifelse(is.character(user), user, as.character(user))
# keep the k users most similar to `user`; set all other similarities to 0
sim_peeps <- names(sort(user_sim[user,], decreasing = T))[1:k]
user_sim[user,which(!colnames(user_sim) %in% sim_peeps)] <- 0
# get scores
user_scores <- data.frame(title = colnames(viewed_mov),
score = as.vector(user_sim[user,] %*% viewed_mov),
seen = as.vector(viewed_mov[user,]))
# sort unseen movies by score and remove the 'seen' column
user_scores %>%
filter(seen == 0) %>%
arrange(desc(score)) %>%
select(-seen)
}
# Using KNN Function
KNN(328, user_sim = user_similarities, viewed_mov = viewed_movies, k = 5)
## title score
## 1 Green Mile, The (1999) 2.3316562
## 2 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.5618559
## 3 Jumanji (1995) 1.5618559
## 4 There's Something About Mary (1998) 1.5618559
## 5 Clear and Present Danger (1994) 0.8401681
## 6 Interview with the Vampire: The Vampire Chronicles (1994) 0.7216878
## 7 Sleepless in Seattle (1993) 0.7216878
## 8 Star Trek: Generations (1994) 0.0000000
# Using Recommender Function with all users
user_based_recommendations(user = 328, user_sim = user_similarities, viewed_mov = viewed_movies)
## title score
## 1 Green Mile, The (1999) 4.0734508
## 2 Jumanji (1995) 3.0242372
## 3 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 2.8674803
## 4 There's Something About Mary (1998) 2.2252517
## 5 Interview with the Vampire: The Vampire Chronicles (1994) 1.8953941
## 6 Clear and Present Danger (1994) 1.3115726
## 7 Sleepless in Seattle (1993) 1.1282141
## 8 Star Trek: Generations (1994) 0.1178511
Here we provide an alternative approach, where we compute similarities between items directly. We can then generate suggestions for each user by aggregating items that are similar to the items they already like.
Item-based collaborative filtering works very similarly to its user-based counterpart, although you might find it slightly less intuitive. It is also based on similarities, but similarities between movies rather than users.
There are two main conceptual parts to item-based collaborative filtering: computing the similarity between every pair of movies, and then scoring each unseen movie by its similarity to the movies a user has already seen. That is, for each movie \(j\) not yet seen by the user, we compute the similarity between movie \(j\) and each movie the user has already seen, sum these similarities, and pick the highest: the unseen movie that is most similar to those already seen.
This contrasts with user-based CF, where we recommend an unseen movie that has been seen by the most similar users. In item-based CF, we recommend unseen movies that are most similar to the movies the user has already seen.
Let’s start by computing the similarities between all pairs of movies. We can reuse the same code we used to compute user similarities, if we first transpose the viewed_movies matrix.
# transpose the viewed_movies matrix
movies_user <- t(viewed_movies)
# get all similarities between MOVIES
movie_similarities <- matrix(0, nrow = 20, ncol = 20)
for (i in 1:19) {
for (j in (i + 1):20) {
movie_similarities[i,j] <- cosine_sim(viewed_movies[,i], viewed_movies[,j])
}
}
movie_similarities <- movie_similarities + t(movie_similarities)
diag(movie_similarities) <- 0
row.names(movie_similarities) <- colnames(viewed_movies)
colnames(movie_similarities) <- colnames(viewed_movies)
We can use the result to see, for example, what movies are most similar to “Apocalypse Now”:
sort(movie_similarities[,"Apocalypse Now (1979)"], decreasing = TRUE) %>% head(5)
## Departed, The (2006) Big Lebowski, The (1998)
## 0.7559289 0.7378648
## Kill Bill: Vol. 2 (2004) There's Something About Mary (1998)
## 0.7071068 0.6666667
## Ferris Bueller's Day Off (1986)
## 0.6299408
Let’s again look at a concrete example of recommending a movie to a particular user, say user 372.
User 372 has seen the following movies:
which(viewed_movies["372", ] == 1)
## 2001: A Space Odyssey (1968) Apocalypse Now (1979)
## 1 2
## Shining, The (1980)
## 16
Another way of doing the same thing:
ratings_red %>%
filter(userId == 372) %>%
select(userId, title)
## # A tibble: 3 × 2
## userId title
## <int> <chr>
## 1 372 2001: A Space Odyssey (1968)
## 2 372 Apocalypse Now (1979)
## 3 372 Shining, The (1980)
We now implement the main idea behind item-based filtering. For each movie, we find the similarities between that movie and each of the three movies user 372 has seen, and sum up those similarities. The resulting sum is that movie’s “recommendation score”.
We start by identifying the movies the user has seen:
user_seen <- ratings_red %>%
filter(userId == 372) %>%
select(title) %>%
unlist() %>%
as.character()
user_seen
## [1] "2001: A Space Odyssey (1968)" "Apocalypse Now (1979)"
## [3] "Shining, The (1980)"
We then compute the similarities between all movies and these “seen” movies. For example, similarities for the first seen movie, 2001: A Space Odyssey are:
sort(movie_similarities[,user_seen[1]], decreasing = TRUE) %>% head(5)
## Shining, The (1980)
## 0.7977240
## Donnie Darko (2001)
## 0.7559289
## Big Lebowski, The (1998)
## 0.7171372
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)
## 0.6761234
## Harry Potter and the Philosopher's Stone (2001)
## 0.6761234
We can do the same for each of the three seen movies or, more simply, do all three at once:
movie_similarities[,user_seen] %>% head(6)
## 2001: A Space Odyssey (1968)
## 2001: A Space Odyssey (1968) 0.0000000
## Apocalypse Now (1979) 0.5039526
## Big Lebowski, The (1998) 0.7171372
## Bourne Identity, The (2002) 0.4629100
## Clear and Present Danger (1994) 0.2672612
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 0.6761234
## Apocalypse Now (1979)
## 2001: A Space Odyssey (1968) 0.5039526
## Apocalypse Now (1979) 0.0000000
## Big Lebowski, The (1998) 0.7378648
## Bourne Identity, The (2002) 0.5443311
## Clear and Present Danger (1994) 0.4714045
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 0.2981424
## Shining, The (1980)
## 2001: A Space Odyssey (1968) 0.7977240
## Apocalypse Now (1979) 0.6030227
## Big Lebowski, The (1998) 0.7627701
## Bourne Identity, The (2002) 0.6154575
## Clear and Present Danger (1994) 0.4264014
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 0.5393599
Each movie’s recommendation score is obtained by summing across columns, each column representing a seen movie:
# apply over rows (MARGIN = 1): sum each movie's similarities across the seen-movie columns
sort(apply(movie_similarities[, user_seen], 1, sum), decreasing = TRUE) %>% head(5)
## Big Lebowski, The (1998)
## 2.217772
## Ferris Bueller's Day Off (1986)
## 1.885133
## Kill Bill: Vol. 2 (2004)
## 1.881231
## Donnie Darko (2001)
## 1.858952
## Indiana Jones and the Temple of Doom (1984)
## 1.777577
The preceding explanation hopefully makes the details of the calculations clear, but it is quite unwieldy. We can do all the calculations more neatly as:
user_scores <- tibble(title = row.names(movie_similarities),
score = apply(movie_similarities[,user_seen], 1, sum),
seen = viewed_movies["372",])
user_scores %>%
filter(seen == 0) %>%
arrange(desc(score)) %>% head(5)
## # A tibble: 5 × 3
## title score seen
## <chr> <dbl> <dbl>
## 1 Big Lebowski, The (1998) 2.22 0
## 2 Ferris Bueller's Day Off (1986) 1.89 0
## 3 Kill Bill: Vol. 2 (2004) 1.88 0
## 4 Donnie Darko (2001) 1.86 0
## 5 Indiana Jones and the Temple of Doom (1984) 1.78 0
Again we will end up recommending “The Big Lebowski” to this particular user.
Let’s repeat the process to generate a recommendation for one more user, user 222:
# do for user 222
user <- "222"
user_seen <- ratings_red %>%
filter(userId == user) %>%
select(title) %>%
unlist() %>%
as.character()
user_scores <- tibble(title = row.names(movie_similarities),
score = apply(movie_similarities[,user_seen],1,sum),
seen = viewed_movies[user,])
user_scores %>%
filter(seen == 0) %>%
arrange(desc(score))
## # A tibble: 13 × 3
## title score seen
## <chr> <dbl> <dbl>
## 1 Bourne Identity, The (2002) 4.18 0
## 2 Ferris Bueller's Day Off (1986) 3.92 0
## 3 Shining, The (1980) 3.79 0
## 4 Donnie Darko (2001) 3.76 0
## 5 Harry Potter and the Philosopher's Stone (2001) 3.59 0
## 6 Green Mile, The (1999) 3.39 0
## 7 2001: A Space Odyssey (1968) 3.38 0
## 8 Indiana Jones and the Temple of Doom (1984) 3.09 0
## 9 Interview with the Vampire: The Vampire Chronicles (1994) 2.89 0
## 10 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 2.87 0
## 11 Clear and Present Danger (1994) 2.39 0
## 12 Sleepless in Seattle (1993) 2.04 0
## 13 Star Trek: Generations (1994) 1.24 0
Here we see a different top recommendation (“The Bourne Identity”) from the one produced by the user-based system.
# a function to generate an item-based recommendation for any user
item_based_recommendations <- function(user, movie_sim, viewed_mov){
  # turn into character if not already
  user <- ifelse(is.character(user), user, as.character(user))
  # get the movies the user has seen
  user_seen <- row.names(movie_sim)[viewed_mov[user, ] == TRUE]
  # get scores
  user_scores <- tibble(title = row.names(movie_sim),
                        score = apply(movie_sim[, user_seen], 1, sum),
                        seen = viewed_mov[user, ])
  # sort unseen movies by score and remove the 'seen' column
  user_scores %>%
    filter(seen == 0) %>%
    arrange(desc(score)) %>%
    select(-seen)
}
Let’s check that it’s working with a user we’ve seen before, user 372:
item_based_recommendations(user = 372, movie_sim = movie_similarities, viewed_mov = viewed_movies)
## # A tibble: 17 × 2
## title score
## <chr> <dbl>
## 1 Big Lebowski, The (1998) 2.22
## 2 Ferris Bueller's Day Off (1986) 1.89
## 3 Kill Bill: Vol. 2 (2004) 1.88
## 4 Donnie Darko (2001) 1.86
## 5 Indiana Jones and the Temple of Doom (1984) 1.78
## 6 Departed, The (2006) 1.75
## 7 Green Mile, The (1999) 1.73
## 8 Harry Potter and the Philosopher's Stone (2001) 1.66
## 9 Bourne Identity, The (2002) 1.62
## 10 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.51
## 11 Up (2009) 1.49
## 12 There's Something About Mary (1998) 1.35
## 13 Jumanji (1995) 1.22
## 14 Clear and Present Danger (1994) 1.17
## 15 Interview with the Vampire: The Vampire Chronicles (1994) 1.16
## 16 Sleepless in Seattle (1993) 0.951
## 17 Star Trek: Generations (1994) 0.333
And now we generate recommendations for all users with lapply():
lapply(sorted_my_users, item_based_recommendations, movie_similarities, viewed_movies) %>% head(5)
## [[1]]
## # A tibble: 14 × 2
## title score
## <chr> <dbl>
## 1 Departed, The (2006) 3.39
## 2 Kill Bill: Vol. 2 (2004) 3.39
## 3 Ferris Bueller's Day Off (1986) 3.38
## 4 2001: A Space Odyssey (1968) 3.22
## 5 Bourne Identity, The (2002) 3.13
## 6 Donnie Darko (2001) 2.91
## 7 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 2.63
## 8 Harry Potter and the Philosopher's Stone (2001) 2.61
## 9 Up (2009) 2.61
## 10 There's Something About Mary (1998) 2.40
## 11 Jumanji (1995) 2.24
## 12 Interview with the Vampire: The Vampire Chronicles (1994) 1.48
## 13 Sleepless in Seattle (1993) 1.35
## 14 Star Trek: Generations (1994) 0.333
##
## [[2]]
## # A tibble: 16 × 2
## title score
## <chr> <dbl>
## 1 Donnie Darko (2001) 2.36
## 2 Kill Bill: Vol. 2 (2004) 2.25
## 3 Departed, The (2006) 2.23
## 4 2001: A Space Odyssey (1968) 2.12
## 5 There's Something About Mary (1998) 2.12
## 6 Shining, The (1980) 2.06
## 7 Big Lebowski, The (1998) 2.04
## 8 Ferris Bueller's Day Off (1986) 1.94
## 9 Green Mile, The (1999) 1.93
## 10 Up (2009) 1.90
## 11 Apocalypse Now (1979) 1.83
## 12 Interview with the Vampire: The Vampire Chronicles (1994) 1.69
## 13 Indiana Jones and the Temple of Doom (1984) 1.53
## 14 Sleepless in Seattle (1993) 1.22
## 15 Clear and Present Danger (1994) 1.21
## 16 Star Trek: Generations (1994) 0.408
##
## [[3]]
## # A tibble: 13 × 2
## title score
## <chr> <dbl>
## 1 Departed, The (2006) 3.93
## 2 Bourne Identity, The (2002) 3.88
## 3 Harry Potter and the Philosopher's Stone (2001) 3.75
## 4 Ferris Bueller's Day Off (1986) 3.73
## 5 Apocalypse Now (1979) 3.68
## 6 Green Mile, The (1999) 3.62
## 7 Jumanji (1995) 3.21
## 8 Indiana Jones and the Temple of Doom (1984) 3.21
## 9 There's Something About Mary (1998) 3.13
## 10 Up (2009) 3.12
## 11 Sleepless in Seattle (1993) 2.08
## 12 Clear and Present Danger (1994) 2.06
## 13 Star Trek: Generations (1994) 0.5
##
## [[4]]
## # A tibble: 14 × 2
## title score
## <chr> <dbl>
## 1 Donnie Darko (2001) 3.49
## 2 Kill Bill: Vol. 2 (2004) 3.44
## 3 Ferris Bueller's Day Off (1986) 3.39
## 4 Bourne Identity, The (2002) 3.31
## 5 Apocalypse Now (1979) 3.24
## 6 Departed, The (2006) 3.13
## 7 Harry Potter and the Philosopher's Stone (2001) 3.12
## 8 Clear and Present Danger (1994) 2.62
## 9 Up (2009) 2.55
## 10 Jumanji (1995) 2.26
## 11 There's Something About Mary (1998) 2.20
## 12 Interview with the Vampire: The Vampire Chronicles (1994) 1.97
## 13 Sleepless in Seattle (1993) 1.44
## 14 Star Trek: Generations (1994) 0
##
## [[5]]
## # A tibble: 14 × 2
## title score
## <chr> <dbl>
## 1 Kill Bill: Vol. 2 (2004) 3.73
## 2 Donnie Darko (2001) 3.68
## 3 Departed, The (2006) 3.61
## 4 Bourne Identity, The (2002) 3.49
## 5 Apocalypse Now (1979) 3.37
## 6 Indiana Jones and the Temple of Doom (1984) 3.20
## 7 Green Mile, The (1999) 3.07
## 8 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 2.92
## 9 Jumanji (1995) 2.47
## 10 There's Something About Mary (1998) 2.43
## 11 Clear and Present Danger (1994) 2.04
## 12 Interview with the Vampire: The Vampire Chronicles (1994) 1.78
## 13 Sleepless in Seattle (1993) 1.44
## 14 Star Trek: Generations (1994) 0
Latent matrix factorization is an incredibly powerful method for building a recommender system. Ever since it was shown to outperform other recommendation methods in the Netflix Prize contest, it has been a cornerstone of recommender systems.
Latent matrix factorization tackles the recommendation problem directly: given a set of \(m\) users, a set of \(n\) items, and a set of ratings from users for some of the items, recommend the top items for each user.
So far we have seen that we can represent our users’ preferences as a matrix of 0s and 1s, where the 1s represent “liked” items and the 0s “unliked” items. Sometimes we might actually have numeric ratings; for example, when you write an Amazon review you assign the item a score ranging from 1 to 5 stars. You could still represent these by numbers in a matrix (ignoring for now the problem of what to do about unrated items).
In this section we’ll assume we have such ratings data and try to learn a model that can predict the rating for a given user and item.
In this section we’re going to look at a different way of doing collaborative filtering, one based on the idea of matrix factorization, a topic from linear algebra.
Matrix factorization, also called matrix decomposition, takes a matrix and represents it as a product of other (usually two) matrices. There are many ways to do matrix factorization, and different problems tend to use different methods. Factorization often involves finding underlying latent factors containing information about the data set.
In recommendation systems, matrix factorization is used to decompose the ratings matrix into the product of two matrices. This is done in such a way that the known ratings are matched as closely as possible.
The key feature of matrix factorization for recommendation systems is that while the ratings matrix is incomplete (i.e. some entries are blank), the two matrices the ratings matrix is decomposed into are complete (no blank entries). This gives a straightforward way of filling in blank spaces in the original ratings matrix, as we’ll see.
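As a toy illustration of this fill-in property (all numbers invented for this sketch), take a 2 x 3 ratings matrix with one latent factor per user and movie: the product of the two complete factor matrices approximates the known ratings and also supplies values for the NA cells.

```r
# toy example: 2 users x 3 movies, with two ratings unknown (NA)
ratings <- matrix(c(5, 3, NA,
                    4, NA, 2),
                  nrow = 2, byrow = TRUE)

# two COMPLETE factor matrices (one latent factor, values chosen by hand)
user_factors  <- matrix(c(1.0, 0.8), nrow = 2, ncol = 1)   # 2 users x 1 factor
movie_factors <- matrix(c(5, 3, 2.5), nrow = 1, ncol = 3)  # 1 factor x 3 movies

# the product is a complete matrix: the known cells are reproduced,
# and the formerly blank cells now contain predicted ratings
predicted <- user_factors %*% movie_factors
predicted
```

Here user 2's known rating of 2 for movie 3 is matched exactly (0.8 x 2.5), and the blank cells get predictions for free.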
It’s actually easier to see the underlying logic and calculations in a spreadsheet setting; an additional Excel file is provided using the ratings matrix as a .csv file.
# get ratings in wide format
ratings_wide <- ratings_red %>%
select(userId,title,rating) %>%
complete(userId, title) %>%
spread(key = title, value = rating)
# convert data to matrix form
sorted_my_users <- as.character(unlist(ratings_wide[,1]))
ratings_wide <- as.matrix(ratings_wide[,-1])
row.names(ratings_wide) <- sorted_my_users
# save as csv for Excel demo
write.csv(ratings_wide,"output/ratings_for_excel_example.csv")
Note that ratings_wide is a matrix where each row pertains to a user and each column to a movie. The elements are the ratings provided by each user for the movies they have watched. NA is filled in for movies that the user has not watched (or has watched but not rated; for now we shall assume every movie watched has been rated).
We start by defining a function that will compute the sum of squared differences between the observed movie ratings and any other set of predicted ratings (for example, ones predicted by matrix factorization). Note that we only count movies that have already been rated in the accuracy calculation.
recommender_accuracy <- function(x, observed_ratings){
  # extract user and movie factors from parameter vector (note x is defined such that
  # the first 75 elements are latent factors for users and the rest are for movies)
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  # get predictions from dot products of respective user and movie factors
  predicted_ratings <- user_factors %*% movie_factors
  # model accuracy is the root mean squared error (RMSE) over all rated movies
  errors <- (observed_ratings - predicted_ratings) ^ 2
  sqrt(mean(errors[!is.na(observed_ratings)])) # only use rated movies
}
We note that this function isn’t general, because it refers specifically to a ratings matrix with 15 users, 20 movies, and 5 latent factors.
We’ll now optimize the values in the user and movie latent factors, choosing them so that the root mean squared error is minimized. We do this using R’s inbuilt numerical optimizer optim(), with the default “Nelder-Mead” method. There are better ways to do this; you can try different optimization methods.
set.seed(10)
# optimization step
rec1 <- optim(par = runif(175), recommender_accuracy,
observed_ratings = ratings_wide, control = list(maxit = 100000))
paste("Did optimizer converge: ", ifelse(rec1$convergence == 0, "Yes", "No"))
## [1] "Did optimizer converge: No"
paste("Objective Function value: ", round(rec1$value,4))
## [1] "Objective Function value: 0.3431"
The best value of the objective function found by optim() after 100000 iterations is 0.343, but note that it hasn’t converged yet, so we should really run for longer or try another optimizer. Ignoring this for now, we can extract the optimal user and movie factors. With a bit of work, these can be interpreted and often give useful information; however, that is beyond the scope of this article.
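As a sketch of “try another optimizer”: optim() accepts other methods, e.g. method = "BFGS", which uses numerically estimated gradients here and often makes faster progress per iteration on smooth objectives. The comparison below runs on simulated data rather than the tutorial’s ratings matrix, and rmse_fit simply mirrors recommender_accuracy():

```r
set.seed(1)
# simulate a 15 x 20 ratings matrix from known factors, then blank out some entries
true_u <- matrix(rnorm(15 * 5), 15, 5)
true_m <- matrix(rnorm(5 * 20), 5, 20)
sim_ratings <- true_u %*% true_m
sim_ratings[sample(length(sim_ratings), 100)] <- NA

# RMSE over observed entries, same structure as recommender_accuracy()
rmse_fit <- function(x, observed_ratings) {
  user_factors  <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  errors <- (observed_ratings - user_factors %*% movie_factors) ^ 2
  sqrt(mean(errors[!is.na(observed_ratings)]))
}

# default Nelder-Mead vs gradient-based BFGS
fit_nm   <- optim(runif(175), rmse_fit, observed_ratings = sim_ratings,
                  control = list(maxit = 5000))
fit_bfgs <- optim(runif(175), rmse_fit, observed_ratings = sim_ratings,
                  method = "BFGS", control = list(maxit = 100))
c(nelder_mead = fit_nm$value, bfgs = fit_bfgs$value)
```

Relative performance depends on the problem and starting values, so it is worth comparing the final objective values yourself.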
Extract the optimal user factors:
# extract optimal user factors
user_factors <- matrix(rec1$par[1:75], 15, 5)
head(user_factors)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.8704994 1.1994809 0.7636127 -0.92636218 -1.200934246
## [2,] 2.3724241 0.3416254 0.3262135 -0.49322154 1.889202017
## [3,] 1.1111085 1.8394120 0.5013395 -1.02570560 0.328460876
## [4,] 0.4730928 0.7980941 1.9927642 -1.44234539 -0.006360044
## [5,] -0.7133853 1.6069591 1.7957153 0.09586535 1.138985480
## [6,] -0.9429216 2.1016100 1.2595030 1.24631247 1.399925211
Extract the optimal movie factors:
# extract optimal movie factors
movie_factors <- matrix(rec1$par[76:175], 5, 20)
head(movie_factors)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.45213586 0.5664772 1.5820719 0.1568805 1.6091279 0.4922459
## [2,] 1.24068067 3.2651238 1.6937969 2.3894585 -2.4668102 -0.1371224
## [3,] 1.38205926 -1.7512861 0.5652246 0.8756689 0.8256912 -0.2268436
## [4,] -0.09118251 0.6880326 0.3485811 0.9556278 -3.0262817 -2.2203910
## [5,] -0.17154368 -1.0659075 0.2554457 1.0377295 -0.5186627 0.7401770
## [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1.17595084 -0.7182418 1.2398087 1.27433003 0.58822856 0.5804220
## [2,] 1.26457554 1.2561888 1.5108612 0.45180827 1.69939470 1.0047280
## [3,] 1.06426140 -1.9826785 0.4479486 1.64782246 -0.28835189 0.2869269
## [4,] -0.08885995 -2.0247182 0.5343086 -0.67048311 0.04518522 -0.9562937
## [5,] 0.98989633 2.5735647 0.1750985 0.07328022 1.38497948 -1.4930700
## [,13] [,14] [,15] [,16] [,17] [,18]
## [1,] 0.89899374 1.3108956 1.1921026 1.0087201 1.85861590 1.7595161
## [2,] 0.07560978 0.7372890 0.9595835 1.6531187 -0.27566768 -1.0703908
## [3,] 1.79849298 1.1676913 1.4426121 0.8330061 -0.01684526 -2.0657576
## [4,] -2.03705529 0.8482193 -0.2506995 0.2437435 0.62911225 0.3398119
## [5,] -0.73585067 -0.2217906 1.2143087 0.9179228 -1.02828691 -1.5899725
## [,19] [,20]
## [1,] 1.8285378 -0.5284306
## [2,] 1.7410831 1.6426011
## [3,] -0.6007035 0.1925521
## [4,] 0.6548961 -0.7082477
## [5,] 0.7074759 -0.4665972
Most importantly, we can get predicted movie ratings for any user, by taking the appropriate dot product of user and movie factors. Here we show the predictions for user 1:
# check predictions for one user
predicted_ratings <- user_factors %*% movie_factors
rbind(round(predicted_ratings[1,], 1), as.numeric(ratings_wide[1,]))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 3.7 4.3 4.8 1.7 4.1 1.8 3.4 -2.6 3.8 4.7 1.2 5.2 5.9 3.7
## [2,] NA 4.0 5.0 NA 4.0 NA NA NA NA 5.0 NA 5.0 NA NA
## [,15] [,16] [,17] [,18] [,19] [,20]
## [1,] 3.3 3.2 3.8 2 3.6 2.3
## [2,] NA 3.0 NA NA NA NA
# Create Function to get Predicted Rating for any user
get_predictions <- function(user, movie_name = NULL){
  # turn into character if not already
  user <- ifelse(is.character(user), user, as.character(user))
  # set names of matrices
  rownames(predicted_ratings) <- rownames(viewed_movies)
  colnames(predicted_ratings) <- colnames(viewed_movies)
  # how to get results
  if (!is.null(movie_name)){
    res <- rbind(round(predicted_ratings[user, movie_name], 1),
                 as.numeric(ratings_wide[user, movie_name]))
  } else {
    res <- rbind(round(predicted_ratings[user, ], 1),
                 as.numeric(ratings_wide[user, ]))
  }
  # show result
  rownames(res) <- c("Predicted", "Actual")
  return(res)
}
# Try Function
get_predictions(user=187)
## 2001: A Space Odyssey (1968) Apocalypse Now (1979)
## Predicted 3.5 4.7
## Actual 4.0 NA
## Big Lebowski, The (1998) Bourne Identity, The (2002)
## Predicted 4.9 4.4
## Actual 5.0 NA
## Clear and Present Danger (1994)
## Predicted 0.6
## Actual NA
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)
## Predicted 2.7
## Actual 3.0
## Departed, The (2006) Donnie Darko (2001)
## Predicted 4.6 3.4
## Actual NA 3.5
## Ferris Bueller's Day Off (1986) Green Mile, The (1999)
## Predicted 3.9 3.8
## Actual NA NA
## Harry Potter and the Philosopher's Stone (2001)
## Predicted 4
## Actual NA
## Indiana Jones and the Temple of Doom (1984)
## Predicted 3.1
## Actual NA
## Interview with the Vampire: The Vampire Chronicles (1994)
## Predicted 3.9
## Actual 3.5
## Jumanji (1995) Kill Bill: Vol. 2 (2004) Shining, The (1980)
## Predicted 2.5 4.5 4.6
## Actual NA 4.0 4.5
## Sleepless in Seattle (1993) Star Trek: Generations (1994)
## Predicted 0.6 -1.9
## Actual NA NA
## There's Something About Mary (1998) Up (2009)
## Predicted 4.5 3.1
## Actual NA NA
One trick that can improve the performance of matrix factorization collaborative filtering is to add L2 regularization: a penalty term, added to the function we are minimizing, that discourages large parameter values.
We first rewrite the accuracy function, now called evaluate_fit_l2, to make use of L2 regularization:
## adds L2 regularization, often improves accuracy
evaluate_fit_l2 <- function(x, observed_ratings, lambda){
  # extract user and movie factors from parameter vector
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  # get predictions from dot products
  predicted_ratings <- user_factors %*% movie_factors
  errors <- (observed_ratings - predicted_ratings) ^ 2
  # L2 norm penalizes large parameter values
  penalty <- sqrt(sum(user_factors ^ 2, movie_factors ^ 2))
  # model accuracy contains an error term and a weighted penalty
  accuracy <- sqrt(mean(errors[!is.na(observed_ratings)])) + lambda * penalty
  return(accuracy)
}
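In symbols, the quantity evaluate_fit_l2 computes is the RMSE over observed ratings plus a weighted penalty on the size of the factor matrices (using \(U\) and \(V\) for the user and movie factor matrices, and \(\Omega\) for the set of observed, i.e. non-NA, ratings):

```latex
\mathcal{L}(U, V)
  = \sqrt{\frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \left( r_{ui} - U_u \cdot V_i \right)^2}
  + \lambda \sqrt{\lVert U \rVert_F^2 + \lVert V \rVert_F^2}
```

where \(U_u\) and \(V_i\) are the latent-factor vectors for user \(u\) and movie \(i\), \(\lVert \cdot \rVert_F\) is the Frobenius norm (matching sqrt(sum(user_factors^2, movie_factors^2)) in the code), and \(\lambda\) controls the strength of the penalty.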
We now rerun the optimization with this new evaluation function:
set.seed(10)
# optimization step
rec2 <- optim(par = runif(175), evaluate_fit_l2,
lambda = 3e-2, observed_ratings = ratings_wide, control = list(maxit = 100000))
paste("Did optimizer converge:", ifelse(rec2$convergence == 0, "Yes", "No"))
## [1] "Did optimizer converge: No"
paste("Objective Function value:", round(rec2$value,4))
## [1] "Objective Function value: 0.7424"
The best value found is worse than before, but remember that we changed the objective function to include the L2 penalty term, so the numbers are not comparable. We need to extract just the RMSE that we’re interested in. To do that, we first extract the optimal parameter values (user and movie factors) and multiply these matrices together to get predicted ratings. From there, it’s easy to calculate the errors.
# extract optimal user and movie factors
user_factors <- matrix(rec2$par[1:75], 15, 5)
movie_factors <- matrix(rec2$par[76:175], 5, 20)
# get predicted ratings
predicted_ratings <- user_factors %*% movie_factors
# check accuracy
errors <- (ratings_wide - predicted_ratings) ^ 2
paste("Accuracy:", sqrt(mean(errors[!is.na(ratings_wide)])))
## [1] "Accuracy: 0.280483618973422"
Compare this with the RMSE of 0.343 we achieved without L2 regularization: the regularized fit is better. As before, we can extract user and movie factors, and get predictions for any user.
# check predictions for one user
rbind(round(predicted_ratings[1,],1), as.numeric(ratings_wide[1,]))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 4.2 3.7 4.8 4.3 4.1 4.4 5.2 5.7 4 5 0.9 5 5 -0.8
## [2,] NA 4.0 5.0 NA 4.0 NA NA NA NA 5 NA 5 NA NA
## [,15] [,16] [,17] [,18] [,19] [,20]
## [1,] 1.1 3 6 -2 2.3 3.8
## [2,] NA 3 NA NA NA NA
We can see that the predicted ratings (top row) are very close to the observed ratings (bottom row).
Bias terms are additive factors that model the fact that some users are more generous than others (and so will give higher ratings, on average) and some movies are better than others (and so will get higher ratings, on average).
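For intuition: with bias terms, a predicted rating is the factor dot product shifted by a per-user offset and a per-movie offset. A tiny numeric sketch (all numbers invented):

```r
# one user, one movie, two latent factors (toy numbers)
user_vec   <- c(0.9, 0.4)   # this user's latent factors
movie_vec  <- c(1.2, 0.5)   # this movie's latent factors
user_bias  <- 0.6           # this user rates ~0.6 stars above the baseline
movie_bias <- 1.1           # this movie is rated ~1.1 stars above the baseline

# predicted rating = dot product + user bias + movie bias
pred <- sum(user_vec * movie_vec) + user_bias + movie_bias
pred  # 0.9*1.2 + 0.4*0.5 + 0.6 + 1.1 = 2.98
```

A generous user raises all of their predicted ratings; a well-liked movie raises its predicted rating for all users.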
Let’s adapt our evaluation function further to include bias terms for both users and movies:
## add an additive bias term for each user and movie
evaluate_fit_l2_bias <- function(x, observed_ratings, lambda){
  # extract user and movie factors and bias terms from parameter vector
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  # the bias vectors are repeated to make the later matrix calculations easier
  user_bias <- matrix(x[176:190], nrow = 15, ncol = 20)
  movie_bias <- t(matrix(x[191:210], nrow = 20, ncol = 15))
  # get predictions from dot products + bias terms
  predicted_ratings <- user_factors %*% movie_factors + user_bias + movie_bias
  errors <- (observed_ratings - predicted_ratings) ^ 2
  # L2 norm penalizes large parameter values (note: not applied to bias terms)
  penalty <- sqrt(sum(user_factors ^ 2, movie_factors ^ 2))
  # model accuracy contains an error term and a weighted penalty
  sqrt(mean(errors[!is.na(observed_ratings)])) + lambda * penalty
}
Again, rerun the optimization:
set.seed(10)
# optimization step (note longer parameter vector to include bias)
rec3 <- optim(par = runif(220), evaluate_fit_l2_bias,
observed_ratings = ratings_wide, lambda = 3e-2, control = list(maxit = 100000))
paste("Did optimizer converge:", ifelse(rec3$convergence == 0, "Yes", "No"))
## [1] "Did optimizer converge: No"
paste("Objective Function value:", round(rec3$value,4))
## [1] "Objective Function value: 0.499"
This value isn’t comparable to either of the previous values, for the same reason as before: the objective function has changed to include bias terms. Extracting just the RMSE:
# extract optimal user and movie factors and bias terms
user_factors <- matrix(rec3$par[1:75], 15, 5)
movie_factors <- matrix(rec3$par[76:175], 5, 20)
user_bias <- matrix(rec3$par[176:190], nrow = 15, ncol = 20)
movie_bias <- t(matrix(rec3$par[191:210], nrow = 20, ncol = 15))
# get predicted ratings
predicted_ratings <- user_factors %*% movie_factors + user_bias + movie_bias
# check accuracy
errors <- (ratings_wide - predicted_ratings) ^ 2
paste("Accuracy:", sqrt(mean(errors[!is.na(ratings_wide)])))
## [1] "Accuracy: 0.178432511702095"
This is indeed an improvement over what we’ve seen before (at least, for the parameter settings above!).
We can examine and interpret the user or movie latent factors, or bias terms, if we want to. Below we show the movie bias terms, which give some reflection of movie quality (with some notable exceptions!):
data.frame(movies = colnames(viewed_movies), bias = movie_bias[1,]) %>% arrange(desc(bias)) %>% head(5)
## movies bias
## 1 Clear and Present Danger (1994) 3.384641
## 2 Green Mile, The (1999) 2.668923
## 3 Ferris Bueller's Day Off (1986) 2.192063
## 4 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 2.111422
## 5 Kill Bill: Vol. 2 (2004) 1.920053
Finally, we again get predicted ratings for one user:
# check predictions for one user
rbind(round(predicted_ratings[1,], 1), as.numeric(ratings_wide[1,]))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 1.6 4 5.1 3.1 4 2.7 2.9 4.3 6 5 4.4 4.9 4.5 3.4
## [2,] NA 4 5.0 NA 4 NA NA NA NA 5 NA 5.0 NA NA
## [,15] [,16] [,17] [,18] [,19] [,20]
## [1,] 3.2 3 5.7 1.6 4.8 3.6
## [2,] NA 3 NA NA NA NA
# the get_predictions() function defined earlier picks up the new predicted_ratings
get_predictions(user=20)
## 2001: A Space Odyssey (1968) Apocalypse Now (1979)
## Predicted 3.4 1.8
## Actual NA NA
## Big Lebowski, The (1998) Bourne Identity, The (2002)
## Predicted 0.1 2.9
## Actual NA 3.0
## Clear and Present Danger (1994)
## Predicted 3.4
## Actual NA
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)
## Predicted 3.5
## Actual 3.5
## Departed, The (2006) Donnie Darko (2001)
## Predicted 1.6 1.1
## Actual NA NA
## Ferris Bueller's Day Off (1986) Green Mile, The (1999)
## Predicted -0.4 4.3
## Actual NA NA
## Harry Potter and the Philosopher's Stone (2001)
## Predicted 4.5
## Actual 4.5
## Indiana Jones and the Temple of Doom (1984)
## Predicted 0.6
## Actual NA
## Interview with the Vampire: The Vampire Chronicles (1994)
## Predicted -0.9
## Actual NA
## Jumanji (1995) Kill Bill: Vol. 2 (2004) Shining, The (1980)
## Predicted 3.2 4.7 2.2
## Actual 3.0 NA NA
## Sleepless in Seattle (1993) Star Trek: Generations (1994)
## Predicted 0.7 0.2
## Actual NA NA
## There's Something About Mary (1998) Up (2009)
## Predicted 4.2 -0.2
## Actual NA NA
There are a few places in the notebook where we can build upon:
Generalize the recommender_accuracy() function so that it can be used with an arbitrary number of users, movies, and latent factors.
Mean centering the ratings (subtracting each user’s average rating) is a good idea.
The similarity-based systems can also produce predicted ratings: scale the similarity weights so that they sum to one, and predict each rating as the weighted sum of the observed ratings from the users (or items) with those similarities.
You can’t really keep whole users aside as “test data”, because that is not the case when you actually apply the system: all users will be available. Moreover, you would have to create separate matrices to calculate scores for the held-out users.
Instead, randomly remove a proportion of the ratings and leave them aside for testing.
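The last idea can be sketched as follows. The data here are stand-ins (a toy matrix replaces ratings_wide, and the fitting step is left as a comment):

```r
set.seed(42)
# toy ratings matrix with some missing entries (stand-in for ratings_wide)
ratings <- matrix(sample(c(1:5, NA), 15 * 20, replace = TRUE), 15, 20)

# indices of the observed (non-NA) ratings
observed <- which(!is.na(ratings))

# hold out 20% of the observed ratings for testing
test_idx <- sample(observed, size = round(0.2 * length(observed)))
train <- ratings
train[test_idx] <- NA  # these cells are hidden from the fitting step

# ... fit the matrix factorization on `train` as before, giving predicted_ratings ...
# test RMSE is then computed only on the held-out cells:
# sqrt(mean((ratings[test_idx] - predicted_ratings[test_idx])^2))
```

Because the held-out cells were never seen during fitting, the test RMSE gives an honest estimate of how well the model predicts new ratings for existing users.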