Overview and Introduction

In this tutorial we shall introduce recommender systems based on collaborative filtering (CF). Specifically, we shall look at three CF approaches:

  1. user-based collaborative filtering

  2. item-based collaborative filtering

  3. matrix factorization

  • L2 regularization and bias terms (two ways of improving recommender systems based on matrix factorization)

The goal of this tutorial is to use the approaches listed above to build a system for recommending movies (or anything else, for that matter) to users based on their past viewing habits.

References

We shall be using examples throughout the tutorial based on the work of Ian Durbach; his profile is found here. This tutorial is based on Google’s Recommendation Systems notes and the Data Science for Industry course from the University of Cape Town. Other resources used in this tutorial are shown below:

Background

You interact nearly every day with recommendation systems: algorithms which guess what products and services you might like based on your past behavior. These systems largely rely on collaborative filtering, an approach based on linear algebra that fills in the missing values of a matrix.

A very common problem arises when you have a number of users and a number of products, and you want to recommend which products are most likely to be useful to which users: for example, recommending movies on Netflix or videos on YouTube that you might want to watch next. Further examples: Amazon recommends products you might want to buy, and Twitter recommends users you might want to follow.

A recommendation system helps users find compelling content in a large corpus. Remarkably, \(40\%\) of app installs on Google Play and \(60\%\) of watch time on YouTube come from recommendations (Google, 2022).

There are several ways to recommend items to users. The simplest is to recommend whatever is popular, but this neglects a user’s existing interests. Instead, we shall look at an approach that takes each user’s existing interests into account: collaborative filtering.

Collaborative filtering is a type of recommender system that works as follows: look at what products the current user has used or liked, find other users that have used or liked similar products, and then recommend other products that those users have used or liked.

Similarity Measures

A similarity measure is a function that takes a pair of embeddings and returns a scalar measuring their similarity. To determine the degree of similarity, most recommendation systems rely on one or more of the following:

  • cosine

  • dot product

  • Euclidean distance
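The three measures can be computed directly in base R. Below is a minimal sketch with two made-up vectors `x` and `y` (my own example, not the tutorial’s data):

```r
# Two toy "embedding" vectors (made up for illustration)
x <- c(1, 1, 0, 1)
y <- c(0, 1, 1, 1)

dot_product <- sum(x * y)                                     # larger = more similar
cosine      <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))) # direction only
euclidean   <- sqrt(sum((x - y)^2))                           # smaller = more similar

c(dot = dot_product, cosine = cosine, euclidean = euclidean)
# dot = 2, cosine = 2/3, euclidean = sqrt(2)
```

Note that the dot product grows with vector length while cosine similarity ignores it; this distinction matters when users have very different activity levels.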

Setup

We load required packages and the data set for this tutorial.

# Packages
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.1 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# Load data
load("output/recommender.RData")

Let’s view the data that is contained in this R object.

# View Movies
head(viewed_movies, 6)
## # A tibble: 6 × 21
##   userId 2001:…¹ Apoca…² Big L…³ Bourn…⁴ Clear…⁵ Crouc…⁶ Depar…⁷ Donni…⁸ Ferri…⁹
##    <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1      1       0       1       1       0       1       0       0       0       0
## 2     20       0       0       0       1       0       1       0       0       0
## 3    187       1       0       1       0       0       1       0       1       0
## 4    198       1       0       1       0       0       1       0       0       0
## 5    212       1       0       1       0       0       0       0       0       1
## 6    222       0       1       1       0       0       0       1       0       0
## # … with 11 more variables: `Green Mile, The (1999)` <dbl>,
## #   `Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)` <dbl>,
## #   `Indiana Jones and the Temple of Doom (1984)` <dbl>,
## #   `Interview with the Vampire: The Vampire Chronicles (1994)` <dbl>,
## #   `Jumanji (1995)` <dbl>, `Kill Bill: Vol. 2 (2004)` <dbl>,
## #   `Shining, The (1980)` <dbl>, `Sleepless in Seattle (1993)` <dbl>,
## #   `Star Trek: Generations (1994)` <dbl>, …

We shorten the Harry Potter movie title to make it neater, and then view the movie names to check that the name was indeed changed.

# Change Harry Potter Name
viewed_movies <- rename(viewed_movies, `Harry Potter and the Philosopher's Stone (2001)` = `Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)`)

# Movie Names Changed?
names(viewed_movies)[10:14]
## [1] "Ferris Bueller's Day Off (1986)"                          
## [2] "Green Mile, The (1999)"                                   
## [3] "Harry Potter and the Philosopher's Stone (2001)"          
## [4] "Indiana Jones and the Temple of Doom (1984)"              
## [5] "Interview with the Vampire: The Vampire Chronicles (1994)"

Before we begin exploring our recommender methods, we also need to do some pre-processing of our data. We first need to convert the data to matrix form, otherwise some of the functions we use later will give an error.

# get list of users
sorted_my_users <- as.character(unlist(viewed_movies[,1]))

# convert to matrix
viewed_movies <- as.matrix(viewed_movies[,-1])

# make row names the users
row.names(viewed_movies) <- sorted_my_users

Let’s look at our data one last time before diving into some brief theory on our first collaborative filtering method.

# Shape
dim(viewed_movies)
## [1] 15 20
# Structure of Matrix
str(viewed_movies)
##  num [1:15, 1:20] 0 0 1 1 1 0 0 1 1 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:15] "1" "20" "187" "198" ...
##   ..$ : chr [1:20] "2001: A Space Odyssey (1968)" "Apocalypse Now (1979)" "Big Lebowski, The (1998)" "Bourne Identity, The (2002)" ...

We have 15 rows and 20 columns. The rows represent the users and the columns represent the movies. An extract of the first 5 users and the first 3 movies is shown below.

viewed_movies[1:5,1:3]
##     2001: A Space Odyssey (1968) Apocalypse Now (1979) Big Lebowski, The (1998)
## 1                              0                     1                        1
## 20                             0                     0                        0
## 187                            1                     0                        1
## 198                            1                     0                        1
## 212                            1                     0                        1

A 0 indicates the user has not watched that movie (a 1 indicates they have). Now let’s look at collaborative filtering.


Collaborative Filtering

Collaborative filtering uses similarities between users and items simultaneously to provide recommendations. This allows for serendipitous recommendations; that is, collaborative filtering models can recommend an item to user A based on the interests of a similar user B.

We have our matrix for our movie recommender system. Each row represents a user. Each column represents an item (a movie). The feedback about movies falls into one of two categories:

  • Explicit: users specify how much they liked a particular movie by providing a numerical rating.

  • Implicit: if a user watches a movie, the system infers that the user is interested.

To simplify, we will assume that the feedback matrix is binary; that is, a value of 1 indicates interest in the movie.
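If we started from explicit ratings instead, one simple (and admittedly crude) way to obtain such a binary matrix is to threshold the ratings. The ratings matrix and threshold below are made up for illustration:

```r
# Hypothetical explicit ratings (1-5 stars, NA = unrated), 2 users x 3 movies
ratings <- matrix(c(5, NA, 2, NA, 4, 1), nrow = 2)

# Treat a rating of 3 or more as "interested" (1), anything else as 0
feedback <- ifelse(!is.na(ratings) & ratings >= 3, 1, 0)
feedback
```

The choice of threshold is a modelling decision; mapping unrated entries to 0 conflates “not interested” with “not seen”, which is exactly the simplification the binary feedback matrix makes.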

User-Based (between users)

This type of CF takes a user’s interests into account by looking for users who are somehow similar to them, and then suggesting the things that those users are interested in. To do that, we need a way to measure how similar two users are. There are many different similarity measures; here we’ll use cosine similarity. We’ll apply it to vectors of \(0\)s and \(1\)s, each vector representing one user’s interests: the \(i^{th}\) element is \(1\) if the user has the \(i^{th}\) interest, and \(0\) otherwise. Accordingly, “similar users” will mean “users whose interest vectors most nearly point in the same direction”.

  • Users with identical interests will have similarity 1.

  • Users with no interests in common will have similarity 0.

  • Otherwise, the similarity will fall in between, with numbers closer to 1 indicating “very similar” and numbers closer to 0 indicating “not very similar.”

To summarise user-based collaborative filtering: for each movie \(j\) not yet seen by the target user, compute the cosine similarity between this user and every other user who has seen movie \(j\) (0 meaning completely dissimilar tastes, 1 meaning identical tastes). Each of those users gets a weight equal to their similarity to the target user. Sum these weights to get a score for movie \(j\), and recommend the unseen movie with the highest score (the one seen by the most similar users).

A possible issue with user-based CF is that it does not work as well when the number of items gets very large. In high-dimensional vector spaces most vectors are very far apart (and point in very different directions). That is, when there are a large number of interests, the “most similar users” to a given user might not be similar at all.

Consider the example taken from DSFS, looking at Amazon: a site like Amazon.com, from which I may have bought thousands of items over the last couple of decades. You could attempt to identify similar users to me based on buying patterns, but most likely in all the world there’s no one whose purchase history looks even remotely like mine. Whoever my “most similar” shopper is, he’s probably not similar to me at all, and his purchases would almost certainly make for lousy recommendations.

Summary

User-based CF extends the popularity-based approach by changing how much each person’s vote counts. Specifically, when recommending what I should watch next, a user-based CF system will up-weight the votes of people who are “more similar” to me. In this context, “similar” means “has seen many of the same movies as me”.

Getting Similarity between Users

Cosine similarity derives its name from the fact that it measures the cosine of the angle between two non-zero vectors. The closer the vectors lie to each other, the smaller the angle, and the closer the cosine is to 1. It can be shown that for two vectors \(\boldsymbol x\) and \(\boldsymbol y\):

\[cos(\theta) = \frac{\boldsymbol x \cdot \boldsymbol y}{||\boldsymbol x|| \ ||\boldsymbol y||} = \frac{\sum_{i=1}^{n}x_iy_i}{\sqrt{\sum_{i=1}^{n}x^2_i} \sqrt{\sum_{i=1}^{n}y^2_i}}\] We can use the crossprod() function in R to calculate the dot products.

# function calculating cosine similarity
cosine_sim <- function(a, b){crossprod(a, b) / sqrt(crossprod(a) * crossprod(b))}

Intuitively, if two users are very dissimilar, their interest vectors point in different directions. If there is no position where both vectors \(\boldsymbol x\) and \(\boldsymbol y\) are non-zero, the dot product is 0, so the cosine of the angle is 0 and the angle is 90 degrees (the vectors are orthogonal).

For non-negative vectors such as ours, cosine similarity lies between 0 and 1 inclusive and increases with similarity (in general it lies between \(-1\) and \(1\)). Here are a few test cases to get a feel for it:

# maximally similar
x1 <- c(1,1,1,0,0)
x2 <- c(1,1,1,0,0)
cosine_sim(x1,x2)
##      [,1]
## [1,]    1

The users are identical, so the similarity is 1.

# maximally dissimilar
x1 <- c(1,1,1,0,0)
x2 <- c(0,0,0,1,1)
cosine_sim(x1,x2)
##      [,1]
## [1,]    0

Users are dissimilar.

# but also
x1 <- c(1,1,0,0,0,0,0)
x2 <- c(0,0,0,0,1,1,0)
cosine_sim(x1,x2)
##      [,1]
## [1,]    0

Users are dissimilar, even though they share 0s in 3 columns. In this binary case, we are only interested in the 1s that users share.

# try an example from our data
as.numeric(viewed_movies[1,]) # user 1's viewing history
##  [1] 0 1 1 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0
as.numeric(viewed_movies[2,]) # user 2's viewing history
##  [1] 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0
cosine_sim(viewed_movies[1,], viewed_movies[2,])
##      [,1]
## [1,]    0
# example 2
as.numeric(viewed_movies[1,]) 
##  [1] 0 1 1 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0
as.numeric(viewed_movies[7,]) 
##  [1] 0 1 1 1 0 0 1 0 1 1 0 1 0 0 1 1 0 0 0 0
cosine_sim(viewed_movies[1,], viewed_movies[7,])
##           [,1]
## [1,] 0.6804138

Let’s get similarities between user pairs. We’ll do this with a loop below, because it’s easier to see what’s going on, but this will be inefficient and very slow for bigger data sets.

user_similarities <- matrix(0, nrow = 15, ncol = 15)
for (i in 1:14) {
  for (j in (i + 1):15) {
    user_similarities[i,j] <- cosine_sim(viewed_movies[i,], viewed_movies[j,])
  }
}
user_similarities <- user_similarities + t(user_similarities)
diag(user_similarities) <- 0
row.names(user_similarities) <- row.names(viewed_movies)
colnames(user_similarities) <- row.names(viewed_movies)
round(user_similarities, 3)
##         1    20   187   198   212   222   282   328   330   372   432   434
## 1   0.000 0.000 0.309 0.667 0.333 0.309 0.680 0.471 0.408 0.471 0.289 0.594
## 20  0.000 0.000 0.189 0.204 0.204 0.189 0.167 0.289 0.500 0.000 0.354 0.485
## 187 0.309 0.189 0.000 0.617 0.463 0.286 0.378 0.546 0.661 0.436 0.401 0.550
## 198 0.667 0.204 0.617 0.000 0.500 0.154 0.544 0.471 0.510 0.471 0.289 0.594
## 212 0.333 0.204 0.463 0.500 0.000 0.309 0.408 0.707 0.510 0.471 0.289 0.594
## 222 0.309 0.189 0.286 0.154 0.309 0.000 0.504 0.546 0.567 0.218 0.535 0.642
## 282 0.680 0.167 0.378 0.544 0.408 0.504 0.000 0.770 0.667 0.385 0.589 0.728
## 328 0.471 0.289 0.546 0.471 0.707 0.546 0.770 0.000 0.722 0.500 0.510 0.840
## 330 0.408 0.500 0.661 0.510 0.510 0.567 0.667 0.722 0.000 0.433 0.619 0.849
## 372 0.471 0.000 0.436 0.471 0.471 0.218 0.385 0.500 0.433 0.000 0.204 0.420
## 432 0.289 0.354 0.401 0.289 0.289 0.535 0.589 0.510 0.619 0.204 0.000 0.600
## 434 0.594 0.485 0.550 0.594 0.594 0.642 0.728 0.840 0.849 0.420 0.600 0.000
## 495 0.365 0.000 0.338 0.183 0.365 0.676 0.745 0.645 0.559 0.258 0.316 0.542
## 562 0.408 0.000 0.189 0.408 0.408 0.000 0.500 0.289 0.500 0.289 0.354 0.364
## 594 0.167 0.204 0.154 0.000 0.000 0.463 0.136 0.118 0.510 0.236 0.289 0.297
##       495   562   594
## 1   0.365 0.408 0.167
## 20  0.000 0.000 0.204
## 187 0.338 0.189 0.154
## 198 0.183 0.408 0.000
## 212 0.365 0.408 0.000
## 222 0.676 0.000 0.463
## 282 0.745 0.500 0.136
## 328 0.645 0.289 0.118
## 330 0.559 0.500 0.510
## 372 0.258 0.289 0.236
## 432 0.316 0.354 0.289
## 434 0.542 0.364 0.297
## 495 0.000 0.224 0.183
## 562 0.224 0.000 0.204
## 594 0.183 0.204 0.000

The user similarity matrix is 15 by 15, since we have 15 users in our small sample. The diagonal is 0 by construction, the row and column names identify the users, and each element gives the similarity score between the corresponding pair of users.
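As an aside, the double loop above can be replaced by a few matrix operations, which scale much better: tcrossprod() gives all pairwise user-user dot products at once. A sketch (the function name `fast_user_sims` is my own):

```r
# Vectorized pairwise cosine similarities between the rows of a matrix M
fast_user_sims <- function(M) {
  cp    <- tcrossprod(M)          # dot products between all pairs of rows
  norms <- sqrt(diag(cp))         # each user's vector length
  sims  <- cp / outer(norms, norms)
  diag(sims) <- 0                 # zero the diagonal, as in the loop version
  sims
}
```

Calling `fast_user_sims(viewed_movies)` should reproduce `user_similarities` above (up to floating-point rounding), since row names carry through `tcrossprod()`.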

# who are the most similar users to user 222?
sort(user_similarities["222",], decreasing = TRUE)
##       495       434       330       328       432       282       594         1 
## 0.6761234 0.6416889 0.5669467 0.5455447 0.5345225 0.5039526 0.4629100 0.3086067 
##       212       187       372        20       198       222       562 
## 0.3086067 0.2857143 0.2182179 0.1889822 0.1543033 0.0000000 0.0000000

Let’s see if this makes sense from the viewing histories. Below we show user 222’s history, together with the user who is most similar to user 222 (user 495) and another user who is very dissimilar (user 562).

t(viewed_movies[c("222","495","562"),])
##                                                           222 495 562
## 2001: A Space Odyssey (1968)                                0   0   0
## Apocalypse Now (1979)                                       1   1   0
## Big Lebowski, The (1998)                                    1   1   0
## Bourne Identity, The (2002)                                 0   0   0
## Clear and Present Danger (1994)                             0   0   0
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)     0   0   0
## Departed, The (2006)                                        1   1   0
## Donnie Darko (2001)                                         0   0   0
## Ferris Bueller's Day Off (1986)                             0   1   1
## Green Mile, The (1999)                                      0   0   1
## Harry Potter and the Philosopher's Stone (2001)             0   0   0
## Indiana Jones and the Temple of Doom (1984)                 0   0   0
## Interview with the Vampire: The Vampire Chronicles (1994)   0   0   0
## Jumanji (1995)                                              1   0   0
## Kill Bill: Vol. 2 (2004)                                    1   1   0
## Shining, The (1980)                                         0   0   1
## Sleepless in Seattle (1993)                                 0   0   1
## Star Trek: Generations (1994)                               0   0   0
## There's Something About Mary (1998)                         1   0   0
## Up (2009)                                                   1   0   0

The results above show that the similarities do indeed make sense. Users 222 and 495 share many viewed movies; in fact, all but one of the movies seen by user 495 have also been watched by user 222. User 562, on the other hand, shares no viewed movies with user 222.

Recommending Movies for a Single User

As an example, let’s consider the process of recommending a movie to one user, say user 222. How would we do this with a user-based collaborative filtering system?

First, we need to know which movies user 222 has already seen (so we don’t recommend those). Below are the movies watched by user 222:

viewed_movies['222',which(viewed_movies["222",] ==1)]
##               Apocalypse Now (1979)            Big Lebowski, The (1998) 
##                                   1                                   1 
##                Departed, The (2006)                      Jumanji (1995) 
##                                   1                                   1 
##            Kill Bill: Vol. 2 (2004) There's Something About Mary (1998) 
##                                   1                                   1 
##                           Up (2009) 
##                                   1

The basic idea is now to recommend what’s popular by adding up the number of users that have seen each movie, but to weight each user by their similarity to user 222.

Let’s work through the calculations for one movie, say 2001: A Space Odyssey (movie 1). The table below shows who’s seen 2001: A Space Odyssey, and how similar each person is to user 222.

seen_movie <- viewed_movies[,"2001: A Space Odyssey (1968)"]
sim_to_user <- user_similarities["222",]
cbind(seen_movie,sim_to_user)
##     seen_movie sim_to_user
## 1            0   0.3086067
## 20           0   0.1889822
## 187          1   0.2857143
## 198          1   0.1543033
## 212          1   0.3086067
## 222          0   0.0000000
## 282          0   0.5039526
## 328          1   0.5455447
## 330          1   0.5669467
## 372          1   0.2182179
## 432          0   0.5345225
## 434          1   0.6416889
## 495          0   0.6761234
## 562          0   0.0000000
## 594          0   0.4629100

The basic idea in user-based collaborative filtering is that user 372’s vote counts less than user 434’s, because user 434 is more similar to user 222 (in terms of viewing history). Shown clearly below:

cbind(seen_movie,sim_to_user)[c(10,12),]
##     seen_movie sim_to_user
## 372          1   0.2182179
## 434          1   0.6416889

Note that this only means user 434 counts more in the context of making recommendations to user 222. When recommending to users other than user 222, user 372 may carry more weight.

We can now work out an overall recommendation score for 2001: A Space Odyssey by multiplying the two elements in each row of the table above and summing these products (i.e. taking the dot product):

# overall score for 2001: A Space Odyssey
crossprod(viewed_movies[, "2001: A Space Odyssey (1968)"], user_similarities["222",])
##          [,1]
## [1,] 2.721023

Note this score will increase with (a) the number of people who’ve seen the movie (more 1s in the first column above) and (b) how similar those people are to user 222.

Let’s repeat this calculation for all movies and compare recommendation scores:

t(user_similarities["222",] %*% viewed_movies)
##                                                                [,1]
## 2001: A Space Odyssey (1968)                              2.7210226
## Apocalypse Now (1979)                                     3.9239911
## Big Lebowski, The (1998)                                  3.9914875
## Bourne Identity, The (2002)                               2.9816377
## Clear and Present Danger (1994)                           0.9502956
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)   1.8376355
## Departed, The (2006)                                      3.4687789
## Donnie Darko (2001)                                       2.0398947
## Ferris Bueller's Day Off (1986)                           3.2428631
## Green Mile, The (1999)                                    2.7100208
## Harry Potter and the Philosopher's Stone (2001)           2.2517693
## Indiana Jones and the Temple of Doom (1984)               2.1540964
## Interview with the Vampire: The Vampire Chronicles (1994) 1.8500935
## Jumanji (1995)                                            2.3950504
## Kill Bill: Vol. 2 (2004)                                  3.7544932
## Shining, The (1980)                                       4.0681044
## Sleepless in Seattle (1993)                               1.0298568
## Star Trek: Generations (1994)                             0.4629100
## There's Something About Mary (1998)                       1.6715457
## Up (2009)                                                 2.0303629

To come up with a final recommendation, we just need to remember to remove movies user 222 has already seen, and sort the remaining movies in descending order of recommendation score.

We do that below, after tidying up the results a bit by putting them in a data frame.

user_scores <- data.frame(title = colnames(viewed_movies), 
                          score = as.vector(user_similarities["222",] %*% viewed_movies), 
                          seen = as.vector(viewed_movies["222",]))
user_scores %>% filter(seen == 0) %>% arrange(desc(score)) 
##                                                        title     score seen
## 1                                        Shining, The (1980) 4.0681044    0
## 2                            Ferris Bueller's Day Off (1986) 3.2428631    0
## 3                                Bourne Identity, The (2002) 2.9816377    0
## 4                               2001: A Space Odyssey (1968) 2.7210226    0
## 5                                     Green Mile, The (1999) 2.7100208    0
## 6            Harry Potter and the Philosopher's Stone (2001) 2.2517693    0
## 7                Indiana Jones and the Temple of Doom (1984) 2.1540964    0
## 8                                        Donnie Darko (2001) 2.0398947    0
## 9  Interview with the Vampire: The Vampire Chronicles (1994) 1.8500935    0
## 10   Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.8376355    0
## 11                               Sleepless in Seattle (1993) 1.0298568    0
## 12                           Clear and Present Danger (1994) 0.9502956    0
## 13                             Star Trek: Generations (1994) 0.4629100    0

Therefore, our top recommendation for user 222 is “The Shining”.

Now that we’ve understood the calculations, let’s get recommendations for one more user, user 372:

# recommendations for user 372
user_scores <- data.frame(title = colnames(viewed_movies), 
                          score = as.vector(user_similarities["372",] %*% viewed_movies), 
                          seen = as.vector(viewed_movies["372",]))
user_scores %>% filter(seen == 0) %>% arrange(desc(score)) 
##                                                        title     score seen
## 1                                   Big Lebowski, The (1998) 4.0650630    0
## 2                                   Kill Bill: Vol. 2 (2004) 2.8549736    0
## 3                            Ferris Bueller's Day Off (1986) 2.7562755    0
## 4                                     Green Mile, The (1999) 2.6736052    0
## 5                                       Departed, The (2006) 2.4185378    0
## 6                Indiana Jones and the Temple of Doom (1984) 2.2477932    0
## 7                                Bourne Identity, The (2002) 1.9421211    0
## 8            Harry Potter and the Philosopher's Stone (2001) 1.8245012    0
## 9                                                  Up (2009) 1.8138306    0
## 10                                       Donnie Darko (2001) 1.7895325    0
## 11   Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.7609370    0
## 12                                            Jumanji (1995) 1.5111410    0
## 13 Interview with the Vampire: The Vampire Chronicles (1994) 1.3092749    0
## 14                       There's Something About Mary (1998) 1.3070169    0
## 15                               Sleepless in Seattle (1993) 0.9573901    0
## 16                           Clear and Present Danger (1994) 0.8914885    0
## 17                             Star Trek: Generations (1994) 0.2357023    0

We would recommend “The Big Lebowski” to user 372.

Function to Generate Recommendation For Any User

We essentially combine and generalise the steps from the previous section into a function that takes user (a number or string), user_sim (the similarity matrix between all users), and viewed_mov (a matrix showing which movies each user has or has not seen).

# a function to generate a recommendation for any user
user_based_recommendations <- function(user, user_sim, viewed_mov){
  
  # turn into character if not already
  user <- ifelse(is.character(user), user, as.character(user))
  
  # get scores
  user_scores <- data.frame(title = colnames(viewed_mov), 
                            score = as.vector(user_sim[user,] %*% viewed_mov), 
                            seen = as.vector(viewed_mov[user,]))
  
  # sort unseen movies by score and remove the 'seen' column
  user_scores %>% 
    filter(seen == 0) %>% 
    arrange(desc(score)) %>% 
    select(-seen)
}

Let’s check the function is working by running it on a user we’ve used before:

user_based_recommendations(user = 222, user_sim = user_similarities, viewed_mov = viewed_movies)
##                                                        title     score
## 1                                        Shining, The (1980) 4.0681044
## 2                            Ferris Bueller's Day Off (1986) 3.2428631
## 3                                Bourne Identity, The (2002) 2.9816377
## 4                               2001: A Space Odyssey (1968) 2.7210226
## 5                                     Green Mile, The (1999) 2.7100208
## 6            Harry Potter and the Philosopher's Stone (2001) 2.2517693
## 7                Indiana Jones and the Temple of Doom (1984) 2.1540964
## 8                                        Donnie Darko (2001) 2.0398947
## 9  Interview with the Vampire: The Vampire Chronicles (1994) 1.8500935
## 10   Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.8376355
## 11                               Sleepless in Seattle (1993) 1.0298568
## 12                           Clear and Present Danger (1994) 0.9502956
## 13                             Star Trek: Generations (1994) 0.4629100

Now do it for all users with lapply:

lapply(sorted_my_users, user_based_recommendations, user_similarities, viewed_movies)

We now make this better by displaying all these recommendation scores in the \(15 \times 20\) matrix relating users to movies, with zeros in the cells where a user has already watched a movie.

# New User_Based_Recommendations Function
user_based_recommendations_2 <- function(user, user_sim, viewed_mov){
  
  # turn into character if not already
  user <- ifelse(is.character(user), user, as.character(user))
  
  # get scores
  user_scores <- data.frame(title = colnames(viewed_mov), 
                            score = as.vector(user_sim[user,] %*% viewed_mov), 
                            seen = as.vector(viewed_mov[user,]))
  
  # sort unseen movies by score and remove the 'seen' column
  user_scores %>% 
    select(-seen)
}

# Create Matrix with users and movies recommendation rating
recommendation_scores <- matrix(nrow = nrow(user_similarities), ncol = ncol(viewed_movies))
row.names(recommendation_scores) <- row.names(user_similarities)
colnames(recommendation_scores) <- colnames(viewed_movies)
for (i in 1:nrow(user_similarities)){
  recommendation_scores[i,] <- user_based_recommendations_2(user = row.names(user_similarities)[i],
                                                            user_sim = user_similarities,
                                                            viewed_mov = viewed_movies)[,"score"]
}

# zero out movies each user has already seen
recommendation_scores[viewed_movies == 1] <- 0

# See result
t(round(recommendation_scores,2)) %>% head(5)
##                                    1   20  187  198  212  222  282  328  330
## 2001: A Space Odyssey (1968)    3.25 1.87 0.00 0.00 0.00 2.72 3.88 0.00 0.00
## Apocalypse Now (1979)           0.00 1.83 3.66 3.60 3.70 0.00 0.00 0.00 0.00
## Big Lebowski, The (1998)        0.00 2.23 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## Bourne Identity, The (2002)     2.44 0.00 2.72 2.61 2.71 2.98 0.00 0.00 0.00
## Clear and Present Danger (1994) 0.00 0.49 0.86 1.26 0.93 0.95 1.41 1.31 1.26
##                                  372  432 434  495  562  594
## 2001: A Space Odyssey (1968)    0.00 2.91   0 2.89 2.45 1.32
## Apocalypse Now (1979)           0.00 3.95   0 0.00 2.78 0.00
## Big Lebowski, The (1998)        4.07 4.44   0 0.00 3.29 2.03
## Bourne Identity, The (2002)     1.94 0.00   0 2.81 2.01 1.55
## Clear and Present Danger (1994) 0.89 0.89   0 0.91 0.77 0.46

Our new recommendation matrix has users on the rows and movie titles on the columns (the result above is transposed for display). Each element is the recommendation score for that movie and user; where a user has already watched the movie, a 0 is shown.

# Test if it works with User 222
sort(recommendation_scores["222",], decreasing = T) %>% head(5)
##             Shining, The (1980) Ferris Bueller's Day Off (1986) 
##                        4.068104                        3.242863 
##     Bourne Identity, The (2002)    2001: A Space Odyssey (1968) 
##                        2.981638                        2.721023 
##          Green Mile, The (1999) 
##                        2.710021

“The Shining” is again the top recommendation for user 222.

\(K\)-Nearest Neighbours for User Based Filtering

A variant on the above is a \(k\)-nearest-neighbours approach that bases recommendations only on the \(k\) most similar users. This is faster when there are many users. An example of this kind of implementation, on different data, is found here.

Using the same data we have been working with, let’s implement the \(k\)-NN approach for recommendations.

KNN <- function(user, user_sim, viewed_mov,  k){
  
  # turn into character if not already
  user <- ifelse(is.character(user), user, as.character(user))
  
  # keep the k users most similar to `user`; set the others' similarity to 0
  sim_peeps <- names(sort(user_sim[user,], decreasing = T))[1:k]
  user_sim[user, which(!colnames(user_sim) %in% sim_peeps)] <- 0
  
  # get scores 
  user_scores <- data.frame(title = colnames(viewed_mov), 
                            score = as.vector(user_sim[user,] %*% viewed_mov), 
                            seen = as.vector(viewed_mov[user,]))
  # sort unseen movies by score and remove the 'seen' column
  user_scores %>% 
    filter(seen == 0) %>% 
    arrange(desc(score)) %>% 
    select(-seen)
}

# Using KNN Function
KNN(328, user_sim = user_similarities, viewed_mov = viewed_movies, k = 5)
##                                                       title     score
## 1                                    Green Mile, The (1999) 2.3316562
## 2   Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 1.5618559
## 3                                            Jumanji (1995) 1.5618559
## 4                       There's Something About Mary (1998) 1.5618559
## 5                           Clear and Present Danger (1994) 0.8401681
## 6 Interview with the Vampire: The Vampire Chronicles (1994) 0.7216878
## 7                               Sleepless in Seattle (1993) 0.7216878
## 8                             Star Trek: Generations (1994) 0.0000000
# Compare with the recommender function that uses all users
user_based_recommendations(user = 328, user_sim = user_similarities, viewed_mov = viewed_movies)
##                                                       title     score
## 1                                    Green Mile, The (1999) 4.0734508
## 2                                            Jumanji (1995) 3.0242372
## 3   Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 2.8674803
## 4                       There's Something About Mary (1998) 2.2252517
## 5 Interview with the Vampire: The Vampire Chronicles (1994) 1.8953941
## 6                           Clear and Present Danger (1994) 1.3115726
## 7                               Sleepless in Seattle (1993) 1.1282141
## 8                             Star Trek: Generations (1994) 0.1178511

Item-based collaborative filtering

Here we take an alternative approach: we compute similarities between items (movies) directly, and then generate suggestions for each user by aggregating items that are similar to the items the user has already engaged with.

Item-based collaborative filtering works very similarly to its user-based counterpart, although you might find it slightly less intuitive. It is also based on similarities, but similarities between movies rather than users.

There are two main conceptual parts to item-based collaborative filtering:

  1. One movie is similar to another if many of the same users have seen both movies.
  2. When deciding what movie to recommend to a particular user, movies are evaluated on how similar they are to movies that the user has already seen.

Essentially, item-based collaborative filtering works as follows: for each movie \(j\) not yet seen by the user, we compute the similarity between movie \(j\) and each movie the user has already seen, then sum these similarities. We recommend the unseen movie with the highest total, i.e. the unseen movie most similar to those already seen.

This contrasts with user-based CF, where we recommend unseen movies that have been seen by the most similar users. With item-based CF, we instead recommend the unseen movies most similar to the movies the user has already seen.
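In symbols, if \(\text{sim}(i, j)\) denotes the cosine similarity between movies \(i\) and \(j\), the item-based score for user \(u\) and an unseen movie \(j\) is

\[
\text{score}(u, j) = \sum_{i \in \text{seen}(u)} \text{sim}(i, j).
\]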

Getting Similarity between Movies

Let’s start by computing the similarities between all pairs of movies. We can reuse the same code we used to compute user similarities, if we first transpose the viewed_movies matrix.
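The cosine_sim helper was defined earlier in the tutorial; for reference, a minimal version consistent with how it is used here (cosine similarity between two 0/1 viewing vectors) would be:

```r
# cosine similarity between two numeric vectors: the dot product
# divided by the product of the vector L2 norms
cosine_sim <- function(a, b){
  sum(a * b) / sqrt(sum(a ^ 2) * sum(b ^ 2))
}

cosine_sim(c(1, 1, 0), c(1, 0, 0))  # 1/sqrt(2), about 0.707
```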

# transpose the viewed_movies matrix
movies_user <- t(viewed_movies)

# get all similarities between MOVIES (rows of the transposed matrix)
movie_similarities <- matrix(0, nrow = 20, ncol = 20)
for (i in 1:19) {
  for (j in (i + 1):20) {
    movie_similarities[i,j] <- cosine_sim(movies_user[i,], movies_user[j,])
  }
}
movie_similarities <- movie_similarities + t(movie_similarities)
diag(movie_similarities) <- 0
row.names(movie_similarities) <- colnames(viewed_movies)
colnames(movie_similarities) <- colnames(viewed_movies)

We can use the result to see, for example, what movies are most similar to “Apocalypse Now”:

sort(movie_similarities[,"Apocalypse Now (1979)"], decreasing = TRUE) %>% head(5)
##                Departed, The (2006)            Big Lebowski, The (1998) 
##                           0.7559289                           0.7378648 
##            Kill Bill: Vol. 2 (2004) There's Something About Mary (1998) 
##                           0.7071068                           0.6666667 
##     Ferris Bueller's Day Off (1986) 
##                           0.6299408

Recommending Movies for a Single User

Let’s again look at a concrete example of recommending a movie to a particular user, say user 372.

User 372 has seen the following movies:

which(viewed_movies["372", ] == 1)
## 2001: A Space Odyssey (1968)        Apocalypse Now (1979) 
##                            1                            2 
##          Shining, The (1980) 
##                           16

Another way of doing the same thing:

ratings_red %>% 
  filter(userId == 372) %>% 
  select(userId, title)
## # A tibble: 3 × 2
##   userId title                       
##    <int> <chr>                       
## 1    372 2001: A Space Odyssey (1968)
## 2    372 Apocalypse Now (1979)       
## 3    372 Shining, The (1980)

We now implement the main idea behind item-based filtering. For each movie, we find the similarities between that movie and each of the three movies user 372 has seen, and sum up those similarities. The resulting sum is that movie’s “recommendation score”.

We start by identifying the movies the user has seen:

user_seen <- ratings_red %>% 
        filter(userId == 372) %>% 
        select(title) %>% 
        unlist() %>% 
        as.character()
user_seen
## [1] "2001: A Space Odyssey (1968)" "Apocalypse Now (1979)"       
## [3] "Shining, The (1980)"

We then compute the similarities between all movies and these “seen” movies. For example, the similarities for the first seen movie, 2001: A Space Odyssey, are:

sort(movie_similarities[,user_seen[1]], decreasing = TRUE) %>% head(5)
##                                     Shining, The (1980) 
##                                               0.7977240 
##                                     Donnie Darko (2001) 
##                                               0.7559289 
##                                Big Lebowski, The (1998) 
##                                               0.7171372 
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 
##                                               0.6761234 
##         Harry Potter and the Philosopher's Stone (2001) 
##                                               0.6761234

We can do the same for each of the three seen movies or, more simply, do all three at once:

movie_similarities[,user_seen] %>% head(6)
##                                                         2001: A Space Odyssey (1968)
## 2001: A Space Odyssey (1968)                                               0.0000000
## Apocalypse Now (1979)                                                      0.5039526
## Big Lebowski, The (1998)                                                   0.7171372
## Bourne Identity, The (2002)                                                0.4629100
## Clear and Present Danger (1994)                                            0.2672612
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)                    0.6761234
##                                                         Apocalypse Now (1979)
## 2001: A Space Odyssey (1968)                                        0.5039526
## Apocalypse Now (1979)                                               0.0000000
## Big Lebowski, The (1998)                                            0.7378648
## Bourne Identity, The (2002)                                         0.5443311
## Clear and Present Danger (1994)                                     0.4714045
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)             0.2981424
##                                                         Shining, The (1980)
## 2001: A Space Odyssey (1968)                                      0.7977240
## Apocalypse Now (1979)                                             0.6030227
## Big Lebowski, The (1998)                                          0.7627701
## Bourne Identity, The (2002)                                       0.6154575
## Clear and Present Danger (1994)                                   0.4264014
## Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)           0.5393599

Each movie’s recommendation score is obtained by summing across columns, each column representing a seen movie:

# apply over rows (MARGIN = 1): sum each movie's similarities to the seen movies
sort(apply(movie_similarities[, user_seen], 1, sum), decreasing = T) %>% head(5)
##                    Big Lebowski, The (1998) 
##                                    2.217772 
##             Ferris Bueller's Day Off (1986) 
##                                    1.885133 
##                    Kill Bill: Vol. 2 (2004) 
##                                    1.881231 
##                         Donnie Darko (2001) 
##                                    1.858952 
## Indiana Jones and the Temple of Doom (1984) 
##                                    1.777577
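As an aside, apply(x, 1, sum) is equivalent to rowSums(x), which is usually clearer and faster. A tiny self-contained check:

```r
# rowSums() gives the same row totals as apply() with MARGIN = 1
m <- matrix(1:6, nrow = 2)
all(rowSums(m) == apply(m, 1, sum))  # TRUE
```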

The preceding explanation hopefully makes the details of the calculations clear, but it is quite unwieldy. We can do all the calculations more neatly as:

user_scores <- tibble(title = row.names(movie_similarities), 
                      score = apply(movie_similarities[,user_seen], 1, sum),
                      seen = viewed_movies["372",])

user_scores %>% 
  filter(seen == 0) %>% 
  arrange(desc(score)) %>% head(5)
## # A tibble: 5 × 3
##   title                                       score  seen
##   <chr>                                       <dbl> <dbl>
## 1 Big Lebowski, The (1998)                     2.22     0
## 2 Ferris Bueller's Day Off (1986)              1.89     0
## 3 Kill Bill: Vol. 2 (2004)                     1.88     0
## 4 Donnie Darko (2001)                          1.86     0
## 5 Indiana Jones and the Temple of Doom (1984)  1.78     0

Again we will end up recommending “The Big Lebowski” to this particular user.

Let’s repeat the process to generate a recommendation for one more user, user 222:

# do for user 222
user <- "222"
user_seen <- ratings_red %>% 
  filter(userId == user) %>% 
  select(title) %>% 
  unlist() %>% 
  as.character()

user_scores <- tibble(title = row.names(movie_similarities), 
                      score = apply(movie_similarities[,user_seen],1,sum),
                      seen = viewed_movies[user,])

user_scores %>% 
  filter(seen == 0) %>% 
  arrange(desc(score))
## # A tibble: 13 × 3
##    title                                                     score  seen
##    <chr>                                                     <dbl> <dbl>
##  1 Bourne Identity, The (2002)                                4.18     0
##  2 Ferris Bueller's Day Off (1986)                            3.92     0
##  3 Shining, The (1980)                                        3.79     0
##  4 Donnie Darko (2001)                                        3.76     0
##  5 Harry Potter and the Philosopher's Stone (2001)            3.59     0
##  6 Green Mile, The (1999)                                     3.39     0
##  7 2001: A Space Odyssey (1968)                               3.38     0
##  8 Indiana Jones and the Temple of Doom (1984)                3.09     0
##  9 Interview with the Vampire: The Vampire Chronicles (1994)  2.89     0
## 10 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)    2.87     0
## 11 Clear and Present Danger (1994)                            2.39     0
## 12 Sleepless in Seattle (1993)                                2.04     0
## 13 Star Trek: Generations (1994)                              1.24     0

Here we see a different top recommendation (“The Bourne Identity”) to what was produced by the user-based system.

Function to Generate Item-Based Recommendation For Any User

# a function to generate an item-based recommendation for any user
item_based_recommendations <- function(user, movie_sim, viewed_mov){
  
  # turn into character if not already
  user <- ifelse(is.character(user), user, as.character(user))
  
  # get scores
  user_seen <- row.names(movie_sim)[viewed_mov[user,] == TRUE]
  user_scores <- tibble(title = row.names(movie_sim), 
                        score = apply(movie_sim[,user_seen], 1, sum),
                        seen = viewed_mov[user,])
  
  # sort unseen movies by score and remove the 'seen' column
  user_scores %>% 
    filter(seen == 0) %>% 
    arrange(desc(score)) %>% 
    select(-seen)
}
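One edge case worth noting: if a user has seen only one movie, movie_sim[, user_seen] drops to a plain vector and apply() will fail. A sketch of a guard (the scores_for name and the toy matrix are purely for illustration):

```r
# subsetting with drop = FALSE keeps the result a matrix even when
# user_seen contains a single movie, so apply() still works
scores_for <- function(movie_sim, user_seen){
  apply(movie_sim[, user_seen, drop = FALSE], 1, sum)
}

# toy 2 x 2 similarity matrix to illustrate the single-movie case
m <- matrix(c(0, 1, 1, 0), 2, 2, dimnames = list(c("a", "b"), c("a", "b")))
scores_for(m, "a")  # each movie's similarity to the single seen movie "a"
```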

Let’s check that it’s working with a user we’ve seen before, user 372:

item_based_recommendations(user = 372, movie_sim = movie_similarities, viewed_mov = viewed_movies)
## # A tibble: 17 × 2
##    title                                                     score
##    <chr>                                                     <dbl>
##  1 Big Lebowski, The (1998)                                  2.22 
##  2 Ferris Bueller's Day Off (1986)                           1.89 
##  3 Kill Bill: Vol. 2 (2004)                                  1.88 
##  4 Donnie Darko (2001)                                       1.86 
##  5 Indiana Jones and the Temple of Doom (1984)               1.78 
##  6 Departed, The (2006)                                      1.75 
##  7 Green Mile, The (1999)                                    1.73 
##  8 Harry Potter and the Philosopher's Stone (2001)           1.66 
##  9 Bourne Identity, The (2002)                               1.62 
## 10 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)   1.51 
## 11 Up (2009)                                                 1.49 
## 12 There's Something About Mary (1998)                       1.35 
## 13 Jumanji (1995)                                            1.22 
## 14 Clear and Present Danger (1994)                           1.17 
## 15 Interview with the Vampire: The Vampire Chronicles (1994) 1.16 
## 16 Sleepless in Seattle (1993)                               0.951
## 17 Star Trek: Generations (1994)                             0.333

And now do it for all users with lapply:

lapply(sorted_my_users, item_based_recommendations, movie_similarities, viewed_movies) %>% head(5)
## [[1]]
## # A tibble: 14 × 2
##    title                                                     score
##    <chr>                                                     <dbl>
##  1 Departed, The (2006)                                      3.39 
##  2 Kill Bill: Vol. 2 (2004)                                  3.39 
##  3 Ferris Bueller's Day Off (1986)                           3.38 
##  4 2001: A Space Odyssey (1968)                              3.22 
##  5 Bourne Identity, The (2002)                               3.13 
##  6 Donnie Darko (2001)                                       2.91 
##  7 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)   2.63 
##  8 Harry Potter and the Philosopher's Stone (2001)           2.61 
##  9 Up (2009)                                                 2.61 
## 10 There's Something About Mary (1998)                       2.40 
## 11 Jumanji (1995)                                            2.24 
## 12 Interview with the Vampire: The Vampire Chronicles (1994) 1.48 
## 13 Sleepless in Seattle (1993)                               1.35 
## 14 Star Trek: Generations (1994)                             0.333
## 
## [[2]]
## # A tibble: 16 × 2
##    title                                                     score
##    <chr>                                                     <dbl>
##  1 Donnie Darko (2001)                                       2.36 
##  2 Kill Bill: Vol. 2 (2004)                                  2.25 
##  3 Departed, The (2006)                                      2.23 
##  4 2001: A Space Odyssey (1968)                              2.12 
##  5 There's Something About Mary (1998)                       2.12 
##  6 Shining, The (1980)                                       2.06 
##  7 Big Lebowski, The (1998)                                  2.04 
##  8 Ferris Bueller's Day Off (1986)                           1.94 
##  9 Green Mile, The (1999)                                    1.93 
## 10 Up (2009)                                                 1.90 
## 11 Apocalypse Now (1979)                                     1.83 
## 12 Interview with the Vampire: The Vampire Chronicles (1994) 1.69 
## 13 Indiana Jones and the Temple of Doom (1984)               1.53 
## 14 Sleepless in Seattle (1993)                               1.22 
## 15 Clear and Present Danger (1994)                           1.21 
## 16 Star Trek: Generations (1994)                             0.408
## 
## [[3]]
## # A tibble: 13 × 2
##    title                                           score
##    <chr>                                           <dbl>
##  1 Departed, The (2006)                             3.93
##  2 Bourne Identity, The (2002)                      3.88
##  3 Harry Potter and the Philosopher's Stone (2001)  3.75
##  4 Ferris Bueller's Day Off (1986)                  3.73
##  5 Apocalypse Now (1979)                            3.68
##  6 Green Mile, The (1999)                           3.62
##  7 Jumanji (1995)                                   3.21
##  8 Indiana Jones and the Temple of Doom (1984)      3.21
##  9 There's Something About Mary (1998)              3.13
## 10 Up (2009)                                        3.12
## 11 Sleepless in Seattle (1993)                      2.08
## 12 Clear and Present Danger (1994)                  2.06
## 13 Star Trek: Generations (1994)                    0.5 
## 
## [[4]]
## # A tibble: 14 × 2
##    title                                                     score
##    <chr>                                                     <dbl>
##  1 Donnie Darko (2001)                                        3.49
##  2 Kill Bill: Vol. 2 (2004)                                   3.44
##  3 Ferris Bueller's Day Off (1986)                            3.39
##  4 Bourne Identity, The (2002)                                3.31
##  5 Apocalypse Now (1979)                                      3.24
##  6 Departed, The (2006)                                       3.13
##  7 Harry Potter and the Philosopher's Stone (2001)            3.12
##  8 Clear and Present Danger (1994)                            2.62
##  9 Up (2009)                                                  2.55
## 10 Jumanji (1995)                                             2.26
## 11 There's Something About Mary (1998)                        2.20
## 12 Interview with the Vampire: The Vampire Chronicles (1994)  1.97
## 13 Sleepless in Seattle (1993)                                1.44
## 14 Star Trek: Generations (1994)                              0   
## 
## [[5]]
## # A tibble: 14 × 2
##    title                                                     score
##    <chr>                                                     <dbl>
##  1 Kill Bill: Vol. 2 (2004)                                   3.73
##  2 Donnie Darko (2001)                                        3.68
##  3 Departed, The (2006)                                       3.61
##  4 Bourne Identity, The (2002)                                3.49
##  5 Apocalypse Now (1979)                                      3.37
##  6 Indiana Jones and the Temple of Doom (1984)                3.20
##  7 Green Mile, The (1999)                                     3.07
##  8 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)    2.92
##  9 Jumanji (1995)                                             2.47
## 10 There's Something About Mary (1998)                        2.43
## 11 Clear and Present Danger (1994)                            2.04
## 12 Interview with the Vampire: The Vampire Chronicles (1994)  1.78
## 13 Sleepless in Seattle (1993)                                1.44
## 14 Star Trek: Generations (1994)                              0

Matrix Factorization

Latent matrix factorization is an incredibly powerful method for building a recommender system. Ever since it was shown to outperform other recommendation methods in the Netflix Prize competition, it has been a cornerstone of recommender systems.

Latent matrix factorization tackles the recommendation problem: given a set of \(m\) users, a set of \(n\) items, and a set of ratings from users for some of the items, recommend the top items for each user.

So far we have seen that we can represent our users’ preferences as a matrix of 0s and 1s, where the 1s represent “liked” items and the 0s “unliked” items. Sometimes we might actually have numeric ratings; for example, when you write an Amazon review you assign the item a score ranging from 1 to 5 stars. You could still represent these by numbers in a matrix (ignoring for now the problem of what to do about unrated items).

In this section we’ll assume we have such ratings data and try to learn a model that can predict the rating for a given user and item.

Specifically, we’re going to look at a different way of doing collaborative filtering, one based on the idea of matrix factorization, a topic from linear algebra.

Matrix factorization, also called matrix decomposition, takes a matrix and represents it as a product of other (usually two) matrices. There are many ways to do matrix factorization, and different problems tend to use different methods. Factorization often involves finding underlying latent factors containing information about the data set.

In recommendation systems, matrix factorization is used to decompose the ratings matrix into the product of two matrices. This is done in such a way that the known ratings are matched as closely as possible.

The key feature of matrix factorization for recommendation systems is that while the ratings matrix is incomplete (i.e. some entries are blank), the two matrices the ratings matrix is decomposed into are complete (no blank entries). This gives a straightforward way of filling in blank spaces in the original ratings matrix, as we’ll see.
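As a sketch of the shapes involved (random factors, purely for illustration): for 15 users, 20 movies, and 5 latent factors, the factorization looks like this:

```r
set.seed(1)
# 15 x 5 matrix of user latent factors
user_factors  <- matrix(rnorm(15 * 5), nrow = 15, ncol = 5)
# 5 x 20 matrix of movie latent factors
movie_factors <- matrix(rnorm(5 * 20), nrow = 5, ncol = 20)

# their product is a complete 15 x 20 matrix: a predicted rating for
# every user-movie pair, including the blanks in the original ratings
predicted <- user_factors %*% movie_factors
dim(predicted)  # 15 20
```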

It’s actually easier to see the underlying logic and calculations in a spreadsheet setting; an additional Excel file is provided, using the ratings matrix saved as a .csv file.

# get ratings in wide format
ratings_wide <- ratings_red %>% 
  select(userId,title,rating) %>% 
  complete(userId, title) %>% 
  spread(key = title, value = rating)

# convert data to matrix form 
sorted_my_users <- as.character(unlist(ratings_wide[,1]))
ratings_wide <- as.matrix(ratings_wide[,-1])
row.names(ratings_wide) <- sorted_my_users

# save as csv for Excel demo
write.csv(ratings_wide,"output/ratings_for_excel_example.csv")

Note that ratings_wide is a matrix in which each row pertains to a user and each column to a movie. The elements are the ratings given by each user for the movies they have watched. NA is filled in for movies that the user has not watched (or perhaps watched but not rated; for now we shall assume every movie watched has been rated).

Create Objective Function

We start by defining a function that computes the root mean squared error between the observed movie ratings and any other set of predicted ratings (for example, ones predicted by matrix factorization). Note that only movies that have already been rated count towards the accuracy calculation.
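Concretely, writing \(\Omega\) for the set of (user, movie) pairs with an observed rating, the quantity computed is

\[
\text{RMSE} = \sqrt{\frac{1}{|\Omega|} \sum_{(u, i) \in \Omega} \left( r_{ui} - \hat{r}_{ui} \right)^2 },
\]

where \(\hat{r}_{ui}\) is the predicted rating from the dot product of the user and movie factors.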

recommender_accuracy <- function(x, observed_ratings){
    
  # extract user and movie factors from parameter vector (note x is defined such that 
  # the first 75 elements are latent factors for users and rest are for movies)
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  
  # get predictions from dot products of respective user and movie factor
  predicted_ratings <- user_factors %*% movie_factors
  
  # model accuracy is root mean squared error (RMSE) over all rated movies
  errors <- (observed_ratings - predicted_ratings) ^ 2 
  
  sqrt(mean(errors[!is.na(observed_ratings)]))   # only use rated movies
}

We note that this function isn’t general, because it refers specifically to a ratings matrix with 15 users, 20 movies, and 5 latent factors.

We’ll now optimize the values in the user and movie latent factors, choosing them so that the root mean squared error between observed and predicted ratings is minimized. We do this using R’s built-in numerical optimizer optim(), with the default “Nelder-Mead” method. There are better ways to do this, and you can try different methods.

set.seed(10)

# optimization step
rec1 <- optim(par = runif(175), recommender_accuracy, 
            observed_ratings = ratings_wide, control = list(maxit = 100000))
paste("Did optimizer converge: ", ifelse(rec1$convergence == 0, "Yes", "No"))
## [1] "Did optimizer converge:  No"
paste("Objective Function value: ", round(rec1$value,4))
## [1] "Objective Function value:  0.3431"

The best value of the objective function found by optim() after 100000 iterations is 0.343, but note that the optimizer hasn’t converged yet, so we should really run it for longer or try another optimizer. Ignoring this for now, we can extract the optimal user and movie factors. With a bit of work, these can often be interpreted to give useful information; however, that is beyond the scope of this article.

Extract the optimal user factors:

# extract optimal user factors
user_factors <- matrix(rec1$par[1:75], 15, 5)
head(user_factors)
##            [,1]      [,2]      [,3]        [,4]         [,5]
## [1,]  1.8704994 1.1994809 0.7636127 -0.92636218 -1.200934246
## [2,]  2.3724241 0.3416254 0.3262135 -0.49322154  1.889202017
## [3,]  1.1111085 1.8394120 0.5013395 -1.02570560  0.328460876
## [4,]  0.4730928 0.7980941 1.9927642 -1.44234539 -0.006360044
## [5,] -0.7133853 1.6069591 1.7957153  0.09586535  1.138985480
## [6,] -0.9429216 2.1016100 1.2595030  1.24631247  1.399925211

Extract the optimal movie factors:

# extract optimal movie factors
movie_factors <- matrix(rec1$par[76:175], 5, 20)
head(movie_factors)
##             [,1]       [,2]      [,3]      [,4]       [,5]       [,6]
## [1,]  0.45213586  0.5664772 1.5820719 0.1568805  1.6091279  0.4922459
## [2,]  1.24068067  3.2651238 1.6937969 2.3894585 -2.4668102 -0.1371224
## [3,]  1.38205926 -1.7512861 0.5652246 0.8756689  0.8256912 -0.2268436
## [4,] -0.09118251  0.6880326 0.3485811 0.9556278 -3.0262817 -2.2203910
## [5,] -0.17154368 -1.0659075 0.2554457 1.0377295 -0.5186627  0.7401770
##             [,7]       [,8]      [,9]       [,10]       [,11]      [,12]
## [1,]  1.17595084 -0.7182418 1.2398087  1.27433003  0.58822856  0.5804220
## [2,]  1.26457554  1.2561888 1.5108612  0.45180827  1.69939470  1.0047280
## [3,]  1.06426140 -1.9826785 0.4479486  1.64782246 -0.28835189  0.2869269
## [4,] -0.08885995 -2.0247182 0.5343086 -0.67048311  0.04518522 -0.9562937
## [5,]  0.98989633  2.5735647 0.1750985  0.07328022  1.38497948 -1.4930700
##            [,13]      [,14]      [,15]     [,16]       [,17]      [,18]
## [1,]  0.89899374  1.3108956  1.1921026 1.0087201  1.85861590  1.7595161
## [2,]  0.07560978  0.7372890  0.9595835 1.6531187 -0.27566768 -1.0703908
## [3,]  1.79849298  1.1676913  1.4426121 0.8330061 -0.01684526 -2.0657576
## [4,] -2.03705529  0.8482193 -0.2506995 0.2437435  0.62911225  0.3398119
## [5,] -0.73585067 -0.2217906  1.2143087 0.9179228 -1.02828691 -1.5899725
##           [,19]      [,20]
## [1,]  1.8285378 -0.5284306
## [2,]  1.7410831  1.6426011
## [3,] -0.6007035  0.1925521
## [4,]  0.6548961 -0.7082477
## [5,]  0.7074759 -0.4665972

Most importantly, we can get predicted movie ratings for any user, by taking the appropriate dot product of user and movie factors. Here we show the predictions for user 1:

# check predictions for one user
predicted_ratings <- user_factors %*% movie_factors
rbind(round(predicted_ratings[1,], 1), as.numeric(ratings_wide[1,]))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]  3.7  4.3  4.8  1.7  4.1  1.8  3.4 -2.6  3.8   4.7   1.2   5.2   5.9   3.7
## [2,]   NA  4.0  5.0   NA  4.0   NA   NA   NA   NA   5.0    NA   5.0    NA    NA
##      [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]   3.3   3.2   3.8     2   3.6   2.3
## [2,]    NA   3.0    NA    NA    NA    NA
# Create Function to get Predicted Rating for any user
get_predictions <- function(user, movie_name = NULL){
  # turn into character if not already
  user <- ifelse(is.character(user), user, as.character(user))

  # set names of matrices
  rownames(predicted_ratings) <- rownames(viewed_movies)
  colnames(predicted_ratings) <- colnames(viewed_movies)
  
  # how to get results
  if (!is.null(movie_name)){
    res <- rbind(round(predicted_ratings[user,movie_name], 1), as.numeric(ratings_wide[user,movie_name]))
  }
  else{res <- rbind(round(predicted_ratings[user,], 1), as.numeric(ratings_wide[user,]))}

  # show result
  rownames(res) <- c("Predicted", "Actual")
  return(res)
}

# Try Function
get_predictions(user=187)
##           2001: A Space Odyssey (1968) Apocalypse Now (1979)
## Predicted                          3.5                   4.7
## Actual                             4.0                    NA
##           Big Lebowski, The (1998) Bourne Identity, The (2002)
## Predicted                      4.9                         4.4
## Actual                         5.0                          NA
##           Clear and Present Danger (1994)
## Predicted                             0.6
## Actual                                 NA
##           Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)
## Predicted                                                     2.7
## Actual                                                        3.0
##           Departed, The (2006) Donnie Darko (2001)
## Predicted                  4.6                 3.4
## Actual                      NA                 3.5
##           Ferris Bueller's Day Off (1986) Green Mile, The (1999)
## Predicted                             3.9                    3.8
## Actual                                 NA                     NA
##           Harry Potter and the Philosopher's Stone (2001)
## Predicted                                               4
## Actual                                                 NA
##           Indiana Jones and the Temple of Doom (1984)
## Predicted                                         3.1
## Actual                                             NA
##           Interview with the Vampire: The Vampire Chronicles (1994)
## Predicted                                                       3.9
## Actual                                                          3.5
##           Jumanji (1995) Kill Bill: Vol. 2 (2004) Shining, The (1980)
## Predicted            2.5                      4.5                 4.6
## Actual                NA                      4.0                 4.5
##           Sleepless in Seattle (1993) Star Trek: Generations (1994)
## Predicted                         0.6                          -1.9
## Actual                             NA                            NA
##           There's Something About Mary (1998) Up (2009)
## Predicted                                 4.5       3.1
## Actual                                     NA        NA

Adding L2 regularization

One trick that can improve the performance of matrix factorization collaborative filtering is to add L2 regularization. L2 regularization adds a penalty term to the objective function we're trying to minimize, discouraging large parameter values.
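
To make the role of the penalty concrete, here is a toy illustration with made-up numbers (the parameter values, RMSE, and lambda below are hypothetical): the quantity minimized becomes the RMSE plus lambda times the L2 norm of the parameters, so a larger lambda pulls the factor values toward zero.

```r
# toy illustration of the penalized objective (hypothetical numbers)
params <- c(2, -1, 0.5)             # pretend these are factor values
rmse   <- 0.9                       # pretend fit error
lambda <- 0.03

penalty   <- sqrt(sum(params ^ 2))  # L2 norm of the parameters
objective <- rmse + lambda * penalty
objective
## [1] 0.9687386
```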

We first rewrite the evaluate_fit function to make use of L2 regularization:

## adds L2 regularization, often improves accuracy

evaluate_fit_l2 <- function(x, observed_ratings, lambda){
  
  # extract user and movie factors from parameter vector
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  
  # get predictions from dot products
  predicted_ratings <- user_factors %*% movie_factors
  
  errors <- (observed_ratings - predicted_ratings) ^ 2 
  
  # L2 norm penalizes large parameter values
  penalty <- sqrt(sum(user_factors ^ 2, movie_factors ^ 2))
  
  # model accuracy contains an error term and a weighted penalty 
  accuracy <- sqrt(mean(errors[!is.na(observed_ratings)])) + lambda * penalty
  
  return(accuracy)
}

We now rerun the optimization with this new evaluation function:

set.seed(10)
# optimization step
rec2 <- optim(par = runif(175), evaluate_fit_l2, 
            lambda = 3e-2, observed_ratings = ratings_wide, control = list(maxit = 100000))

paste("Did optimizer converge:", ifelse(rec2$convergence == 0, "Yes", "No"))
## [1] "Did optimizer converge: No"
paste("Objective Function value:", round(rec2$value,4))
## [1] "Objective Function value: 0.7424"

The best value found is worse than before, but remember that we changed the objective function to include the L2 penalty term, so the numbers are not comparable. We need to extract just the RMSE that we're interested in. To do that, we first extract the optimal parameter values (the user and movie factors) and multiply these matrices together to get predicted ratings. From there, it's easy to calculate the errors.

# extract optimal user and movie factors
user_factors <- matrix(rec2$par[1:75], 15, 5)
movie_factors <- matrix(rec2$par[76:175], 5, 20)

# get predicted ratings
predicted_ratings <- user_factors %*% movie_factors

# check accuracy
errors <- (ratings_wide - predicted_ratings) ^ 2 
paste("Accuracy:", sqrt(mean(errors[!is.na(ratings_wide)])))
## [1] "Accuracy: 0.280483618973422"

Compare this RMSE with what we achieved without L2 regularization: did the penalty help? Indeed it did. As before, we can extract user and movie factors and get predictions for any user.

# check predictions for one user
rbind(round(predicted_ratings[1,],1), as.numeric(ratings_wide[1,]))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]  4.2  3.7  4.8  4.3  4.1  4.4  5.2  5.7    4     5   0.9     5     5  -0.8
## [2,]   NA  4.0  5.0   NA  4.0   NA   NA   NA   NA     5    NA     5    NA    NA
##      [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]   1.1     3     6    -2   2.3   3.8
## [2,]    NA     3    NA    NA    NA    NA

We can see that the predicted ratings (top row) are very close to the observed ratings (bottom row).

Adding bias terms

Bias terms are additive factors that model the fact that some users are more generous than others (and so will give higher ratings, on average) and some movies are better than others (and so will get higher ratings, on average).
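
Before adapting the function, a quick aside on the matrix-recycling trick used to build the bias matrices: `matrix()` fills column-wise and recycles its first argument, so a per-user bias vector becomes a matrix whose rows are constant within each user, and transposing the analogous movie construction gives rows equal to the movie-bias vector. A toy example with hypothetical bias values:

```r
# toy illustration of building bias matrices by recycling (hypothetical values)
user_bias_vec  <- c(0.5, -0.2, 0.1)        # one bias per user (3 users)
movie_bias_vec <- c(1.0, 0.0, -0.5, 0.3)   # one bias per movie (4 movies)

# matrix() fills column-wise, so each column repeats the user-bias vector
user_bias  <- matrix(user_bias_vec, nrow = 3, ncol = 4)
# transpose so each row repeats the movie-bias vector
movie_bias <- t(matrix(movie_bias_vec, nrow = 4, ncol = 3))

# entry (i, j) is user_bias_vec[i] + movie_bias_vec[j]
(user_bias + movie_bias)[2, 3]
## [1] -0.7
```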

Let’s adapt our evaluation function further to include bias terms for both users and movies:

## add an additive bias term for each user and movie

evaluate_fit_l2_bias <- function(x, observed_ratings, lambda){
  # extract user and movie factors and bias terms from parameter vector
  user_factors <- matrix(x[1:75], 15, 5)
  movie_factors <- matrix(x[76:175], 5, 20)
  # the bias vectors are repeated to make the later matrix calculations easier 
  user_bias <- matrix(x[176:190],nrow = 15, ncol = 20)
  movie_bias <- t(matrix(x[191:210], nrow = 20, ncol = 15))
  
  # get predictions from dot products + bias terms
  predicted_ratings <- user_factors %*% movie_factors + user_bias + movie_bias
  
  errors <- (observed_ratings - predicted_ratings) ^ 2 
  
  # L2 norm penalizes large parameter values (note not applied to bias terms)
  penalty <- sqrt(sum(user_factors ^ 2, movie_factors ^ 2))
  
  # model accuracy contains an error term and a weighted penalty 
  sqrt(mean(errors[!is.na(observed_ratings)])) + lambda * penalty
}

Again, rerun the optimization:

set.seed(10)
# optimization step (note longer parameter vector to include bias)
rec3 <- optim(par = runif(220), evaluate_fit_l2_bias,
              observed_ratings = ratings_wide, lambda = 3e-2, control = list(maxit = 100000))

paste("Did optimizer converge:", ifelse(rec3$convergence == 0, "Yes", "No"))
## [1] "Did optimizer converge: No"
paste("Objective Function value:", round(rec3$value,4))
## [1] "Objective Function value: 0.499"

This value isn’t comparable to either of the previous values, for the same reason as before: the objective function has changed to include bias terms. Extracting just the RMSE:

# extract optimal user and movie factors and bias terms
user_factors <- matrix(rec3$par[1:75], 15, 5)
movie_factors <- matrix(rec3$par[76:175], 5, 20)
user_bias <- matrix(rec3$par[176:190], nrow = 15, ncol = 20)
movie_bias <- t(matrix(rec3$par[191:210], nrow = 20, ncol = 15))

# get predicted ratings
predicted_ratings <- user_factors %*% movie_factors + user_bias + movie_bias

# check accuracy
errors <- (ratings_wide - predicted_ratings) ^ 2 

paste("Accuracy:", sqrt(mean(errors[!is.na(ratings_wide)])))
## [1] "Accuracy: 0.178432511702095"

This is indeed an improvement over what we’ve seen before (at least, for the parameter settings above!).

We can examine and interpret the user and movie latent factors, or the bias terms, if we want to. Below we show the five largest movie bias terms, which give some indication of movie quality (with some notable exceptions!):

data.frame(movies = colnames(viewed_movies), bias = movie_bias[1,]) %>% arrange(desc(bias)) %>% head(5)
##                                                    movies     bias
## 1                         Clear and Present Danger (1994) 3.384641
## 2                                  Green Mile, The (1999) 2.668923
## 3                         Ferris Bueller's Day Off (1986) 2.192063
## 4 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) 2.111422
## 5                                Kill Bill: Vol. 2 (2004) 1.920053

Finally, we again get predicted ratings for one user:

# check predictions for one user
rbind(round(predicted_ratings[1,], 1), as.numeric(ratings_wide[1,]))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]  1.6    4  5.1  3.1    4  2.7  2.9  4.3    6     5   4.4   4.9   4.5   3.4
## [2,]   NA    4  5.0   NA    4   NA   NA   NA   NA     5    NA   5.0    NA    NA
##      [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]   3.2     3   5.7   1.6   4.8   3.6
## [2,]    NA     3    NA    NA    NA    NA
# Create Function to get Predicted Rating for any user
get_predictions <- function(user, movie_name = NULL){
  # turn into character if not already (row names are characters)
  user <- as.character(user)

  # set names of matrices
  rownames(predicted_ratings) <- rownames(viewed_movies)
  colnames(predicted_ratings) <- colnames(viewed_movies)
  
  # how to get results
  if (!is.null(movie_name)){
    res <- rbind(round(predicted_ratings[user,movie_name], 1), as.numeric(ratings_wide[user,movie_name]))
  }
  else{res <- rbind(round(predicted_ratings[user,], 1), as.numeric(ratings_wide[user,]))}

  # show result
  rownames(res) <- c("Predicted", "Actual")
  return(res)
}

# Try Function
get_predictions(user=20)
##           2001: A Space Odyssey (1968) Apocalypse Now (1979)
## Predicted                          3.4                   1.8
## Actual                              NA                    NA
##           Big Lebowski, The (1998) Bourne Identity, The (2002)
## Predicted                      0.1                         2.9
## Actual                          NA                         3.0
##           Clear and Present Danger (1994)
## Predicted                             3.4
## Actual                                 NA
##           Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)
## Predicted                                                     3.5
## Actual                                                        3.5
##           Departed, The (2006) Donnie Darko (2001)
## Predicted                  1.6                 1.1
## Actual                      NA                  NA
##           Ferris Bueller's Day Off (1986) Green Mile, The (1999)
## Predicted                            -0.4                    4.3
## Actual                                 NA                     NA
##           Harry Potter and the Philosopher's Stone (2001)
## Predicted                                             4.5
## Actual                                                4.5
##           Indiana Jones and the Temple of Doom (1984)
## Predicted                                         0.6
## Actual                                             NA
##           Interview with the Vampire: The Vampire Chronicles (1994)
## Predicted                                                      -0.9
## Actual                                                           NA
##           Jumanji (1995) Kill Bill: Vol. 2 (2004) Shining, The (1980)
## Predicted            3.2                      4.7                 2.2
## Actual               3.0                       NA                  NA
##           Sleepless in Seattle (1993) Star Trek: Generations (1994)
## Predicted                         0.7                           0.2
## Actual                             NA                            NA
##           There's Something About Mary (1998) Up (2009)
## Predicted                                 4.2      -0.2
## Actual                                     NA        NA

Additional Work

There are a few places in this notebook that we can build on:

  1. Adapt the pairwise similarity function so that it doesn’t use loops.
  2. Display the output of the user-based and item-based recommendations in single matrices.
  3. Implement a k-nearest-neighbours version of item-based collaborative filtering.
  4. Adapt the recommender_accuracy() function so that it can be used with an arbitrary number of users and movies.
  5. Experiment with the optimizers used in the matrix factorization collaborative filter.
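
As a starting point for exercise 1, here is one possible loop-free sketch of pairwise cosine similarity using matrix operations. It assumes a binary user-by-movie matrix like viewed_movies; the function name cosine_all is ours, not from the tutorial.

```r
# hedged sketch: all pairwise cosine similarities without loops
cosine_all <- function(m){
  cross <- m %*% t(m)                  # co-viewing counts for every user pair
  norms <- sqrt(diag(cross))           # per-user vector lengths
  sims  <- cross / outer(norms, norms) # cosine similarity matrix
  diag(sims) <- 0                      # a user is not their own neighbour
  sims
}

# tiny check: two users who share one of their two movies have similarity 0.5
m <- rbind(c(1, 1, 0), c(0, 1, 1))
cosine_all(m)[1, 2]
## [1] 0.5
```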

A note on collaborative filtering

  • Mean-centering the ratings is a good idea: subtract each user's mean rating before computing similarities, and add it back when making predictions.

  • You can get predicted ratings from these systems by scaling the similarity weights so that they sum to one; a prediction is then the weighted sum of the observed ratings from the users who have rated the item.
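
A minimal sketch of that weighted-sum prediction, with hypothetical similarities and ratings:

```r
# hypothetical similarities of three other users to the target user,
# and their observed ratings for one movie
sims    <- c(0.8, 0.4, 0.2)
ratings <- c(5, 3, 4)

weights   <- sims / sum(sims)      # scale similarities to sum to one
predicted <- sum(weights * ratings)
predicted
## [1] 4.285714
```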

A note on validation for recommender systems

  • You can't really hold whole users aside as "test data", because that is not how the system is used in practice: when you actually apply it, all users are available. Moreover, you would have to create separate matrices to evaluate the held-aside users.

  • Instead, randomly remove a proportion of the ratings and set them aside for testing.
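
A minimal sketch of that hold-out scheme, using a small hypothetical ratings matrix (in this tutorial you would apply the same idea to ratings_wide and then refit the factorization on the training copy):

```r
set.seed(1)
# hypothetical 5-user x 6-movie ratings matrix with some missing entries
toy_ratings <- matrix(sample(c(1:5, NA), 30, replace = TRUE), nrow = 5)

observed_idx <- which(!is.na(toy_ratings))                    # rated cells
test_idx <- sample(observed_idx, round(0.2 * length(observed_idx)))

train_ratings <- toy_ratings
train_ratings[test_idx] <- NA                                 # hide test ratings

# refit the model on train_ratings, then score only the hidden cells, e.g.:
# test_errors <- (toy_ratings[test_idx] - predicted_ratings[test_idx]) ^ 2
# sqrt(mean(test_errors))
```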