Recommendation systems are becoming increasingly important in today’s busy world. People have many tasks to fit into a limited 24 hours, and recommendation systems help them make good choices without spending much time or mental effort.
The purpose of a recommendation system is to find content that would interest an individual. It weighs a number of factors to create a personalised list of useful and interesting content for each user. Recommendation systems are Artificial Intelligence based algorithms that skim through the available options and build that customised list. The results are based on the user’s profile, search and browsing history, what other people with similar traits or demographics are watching, and how likely the user is to watch those items. This is achieved through predictive modelling and heuristics applied to the available data.
Use-cases of Recommendation systems
Recommendations are not a new concept. Even when e-commerce was not that prominent, the sales staff in retail stores recommended items to customers for the purpose of upselling and cross-selling, and ultimately to maximise profit. The aim of recommendation systems is just the same.
Another objective of the recommendation system is to achieve customer loyalty by providing relevant content and maximising the time spent by a user on your website or channel. This also helps in increasing customer engagement.
In addition, ad budgets can be optimised by showcasing products and services only to those who have a propensity to respond to them.
Why Recommendation systems?
– Help users find items of interest
– Help item providers deliver their items to the right users
– Identify the most relevant products for each user
– Showcase personalised content to each user
– Suggest top offers and discounts to the right user
– Improve user engagement on websites
– Increase business revenue through increased consumption
What can be recommended?
– Advertising Messages
– Music Tracks
– News Articles
– Future Friends (Social Network Sites)
– Courses in e-learning
– Research Papers
– Investment Choices
– TV Programs
– Online Mates (Dating Services)
– Supermarket Goods
Here are some of the examples of the pioneers in creating algorithms for recommendation systems and using them to serve their customers better in a personalized manner. These are:
– Helped in developing initial recommender systems by pioneering collaborative filtering model
– It also provided many data-sets to train models including MovieLens and BookLens
– Implemented commercial recommender systems
– They also implemented a lot of computational improvements
– Pioneered Latent Factor/ Matrix Factorization models
– Hybrid Recommendation Systems
– Deep Learning based systems
– Social Network Recommendations
Various types of recommendation systems are:
– Popularity based recommendation systems
– Classification model based
– Content based recommendations
– Nearest neighbour collaborative filtering
– Hybrid Approaches
– Association rule mining
– Deep Learning based recommendation systems
Popularity based recommendation system
Let us take an example of a website that streams movies. The website is in its nascent stage and has listed all the movies for the users to search and watch. What the website misses here is a recommendation system. This results in users browsing through a long list of movies, with no suggestions about what to watch. This, in turn, reduces the propensity of a user to engage with the website and use its services. Therefore, the simplest way to fix this issue is to use a popularity based recommendation system. Top review websites like IMDb and Rotten Tomatoes maintain a database of movies and their popularity in terms of reviews and ratings. Utilising this data to recommend the most popular movies to users based on their star ratings, could increase their content consumption.
The popularity-based recommendation system eliminates the need for knowing factors like user browsing history, user preferences, the star cast of the movie, and genre. The single factor considered is the star rating, which makes for a simple, scalable recommendation system. This increases the chances of user engagement compared to having no recommendation system.
Demerits of the popularity based recommendation system
– Recommendations are not personalized as per user attributes and all users see the same recommendations irrespective of their preferences
– Another problem is that the number of reviews (which reflects the number of people who have viewed the movie) will vary for each movie and hence the average star rating will have discrepancies.
– The system doesn’t take into account regional and language preferences and might recommend movies in languages that a user does not understand
When a popularity based recommendation system is tweaked as per the needs, audience, and business requirements, it becomes a hybrid recommendation system. Additional logic is added to include customisation as per the business needs.
How to build a popularity based recommendation system in Python?
For this exercise, we will consider the MovieLens small dataset, and focus on two files, i.e., the movies.csv and ratings.csv.
movies.csv has three fields:
- movieId – A unique id for every movie
- title – The name of the movie
- genres – The genres of the movie
The ratings.csv file has four fields:
- userId – The unique id of a user who has rated one or more movies
- movieId – The unique id of each movie
- rating – The rating given by a user to a movie
- timestamp – When the rating was given
The key here is movieId, which is common to both data files. This key makes it possible to join the two files.
Now, let us have a look at our Python code for popularity based recommendation system.
Step 1: Import the following packages so that their functions can be used. The cell will include:
– import os
– import numpy as np
– import pandas as pd
Step 2: Change the working directory to the folder where your dataset is stored.
Step 3: Read the ratings file into the local variable ratings_data using the pandas ‘read_csv’ function. Calling ‘.head()’ on the result shows the first five records in the data set. This cell uses the pandas library imported in Step 1.
Similarly, read the movies file into the variable movie_names.
Step 4: Merge the two data variables, ratings_data, and movie_names together by calling merge function from the pandas library on the column movieId. This gives a new data frame ‘movie_data’.
Print the movie_data head and you can have a look at the format this new variable appears in.
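Steps 3 and 4 can be sketched roughly as follows. To keep the example runnable without the dataset on disk, two small in-memory strings stand in for ratings.csv and movies.csv; their rows are made up for illustration, and only the column names follow the files described above.

```python
import io
import pandas as pd

# Hypothetical stand-ins for ratings.csv and movies.csv (illustrative rows only).
RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,2,3.0,964981247
2,1,5.0,835355493
2,3,2.0,835355681
3,2,4.0,1260759144
"""
MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children
2,Jumanji (1995),Adventure|Children|Fantasy
3,Heat (1995),Action|Crime|Thriller
"""

# With the real dataset you would pass file paths, e.g. pd.read_csv("ratings.csv").
ratings_data = pd.read_csv(io.StringIO(RATINGS_CSV))
movie_names = pd.read_csv(io.StringIO(MOVIES_CSV))

# Step 4: join the two frames on the shared movieId column (inner join by default).
movie_data = pd.merge(ratings_data, movie_names, on="movieId")
print(movie_data.head())
```

Each rating row now carries the title and genres of the movie it refers to.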
Step 5: Next, compute the average rating for each movie using ‘groupby’. Grouping by ‘title’ and taking the ‘mean’ of the ‘rating’ column gives one average per movie; ‘.head()’ then displays the first 5 results.
Step 6: Finally, sort these average rating values in the descending order to figure out the most popular movies.
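A minimal sketch of Steps 5 and 6, assuming a merged frame with ‘title’ and ‘rating’ columns. The small movie_data frame below is a made-up stand-in for the merged data, not the real MovieLens ratings.

```python
import pandas as pd

# Hypothetical merged frame with the columns produced by Step 4.
movie_data = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3, 3],
    "title": ["Toy Story (1995)", "Heat (1995)", "Toy Story (1995)",
              "Jumanji (1995)", "Heat (1995)", "Jumanji (1995)"],
    "rating": [4.0, 5.0, 4.0, 3.0, 4.0, 2.0],
})

# Step 5: average rating per title.
mean_ratings = movie_data.groupby("title")["rating"].mean()

# Step 6: sort so the highest average comes first.
top_movies = mean_ratings.sort_values(ascending=False)
print(top_movies.head())
```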
Now, as mentioned earlier, some movies are reviewed and rated by a large number of users, while others may be rated by as few as one user. An average over only one or two ratings is unreliable, so a little-watched movie with a single five-star rating can make it to the recommendation list ahead of a widely watched movie. To avoid this bias, one can add a rule to better judge the popularity of a movie, such as requiring a minimum number of ratings.
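One possible rule, shown here only as a sketch, is to require a minimum number of ratings before a movie can appear in the list. The cutoff MIN_RATINGS and the toy data are assumptions made for illustration; a real cutoff would be tuned to the dataset.

```python
import pandas as pd

# Hypothetical data: one film with a single 5-star rating, one with four ratings.
movie_data = pd.DataFrame({
    "title": ["Obscure Film"] + ["Popular Film"] * 4,
    "rating": [5.0, 4.0, 5.0, 4.0, 3.0],
})

# Mean rating and number of ratings per title.
stats = movie_data.groupby("title")["rating"].agg(
    mean_rating="mean", rating_count="count"
)

MIN_RATINGS = 3  # hypothetical cutoff; tune it to the dataset

# Keep only titles with enough ratings, then rank by average rating.
popular = stats[stats["rating_count"] >= MIN_RATINGS].sort_values(
    "mean_rating", ascending=False
)
print(popular)
```

Without the filter, "Obscure Film" would top the list on the strength of one rating; with it, only titles rated by enough users are ranked.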
Moreover, newer movies could be more popular than the older ones even though the average ratings might suggest otherwise. In such cases, extra weight could be added to the rating values of the recently released movies to push them up in the recommendation list.
There are two types of collaborative filtering, namely:
- User – user collaborative filtering
- Item – item collaborative filtering
Let us understand this type of recommendation system with the help of an example. Say there are two users A and B.
Now, each of these users watched a number of movies and rated them as below:
|User A||User B|
Here, we can see that both A and B have two movies in common and have rated these movies in a similar manner. Hence, one can assume that these users have similar tastes and would like to see similar movies.
Here, the recommendation system will recommend movies 1, 2, and 5 (if rated high) to user B because user A has watched them. Similarly, movies 6, 7, and 8 (if rated high) will be recommended to user A because user B has watched them. This is an example of user-user collaborative filtering.
Measuring the similarity between users
One can measure the similarity between two users in different ways. A simple way would be to apply Pearson’s correlation to the common items. If the result is positively and highly correlated then the movies watched and liked by user A can be recommended to user B and vice-versa. On the other hand, if the correlation is negative then there is nothing to be recommended as the two users are not alike.
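As a rough illustration, Pearson’s correlation over the commonly rated items could be computed as below. The ratings for user_a and user_b are invented for this sketch, and the helper name pearson_on_common is hypothetical.

```python
import numpy as np

# Hypothetical ratings (movie id -> stars); invented for this sketch.
user_a = {1: 5.0, 2: 4.0, 3: 4.0, 4: 2.0, 5: 5.0}
user_b = {3: 5.0, 4: 1.0, 5: 4.0, 6: 3.0}

def pearson_on_common(ratings_x, ratings_y):
    """Pearson correlation over the items both users rated; None if undefined."""
    common = sorted(set(ratings_x) & set(ratings_y))
    if len(common) < 2:
        return None  # need at least two shared items
    x = np.array([ratings_x[i] for i in common])
    y = np.array([ratings_y[i] for i in common])
    if x.std() == 0 or y.std() == 0:
        return None  # a user with constant ratings has no defined correlation
    return float(np.corrcoef(x, y)[0, 1])

similarity = pearson_on_common(user_a, user_b)
print(similarity)
```

A value close to +1 suggests the two users rate their shared movies alike, so each user’s highly rated but unshared movies become candidates for the other.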
Limitations of user-user collaborative filtering
- A user might be watching a specific niche type of movies that nobody else is watching. Hence there are no similar profiles resulting in no recommendations.
- In case of a new movie, there are not enough user ratings to match
- In the case of a new user, there are not many movies that the user has watched or rated. Hence, it is difficult to map these users to similar users.
How to build a user-user collaborative filtering recommendation system in Python?
The algorithm used here for user-user collaborative filtering is ‘KNNWithMeans’ (K nearest neighbours with means). It is part of the library ‘surprise’, whose name stands for Simple Python RecommendatIon System Engine.
‘Surprise’ also provides a ‘Dataset’ class that gives access to some free built-in datasets, so there is no need to fetch datasets from other sources by hand. Another function used here is ‘train_test_split’. A portion of the data will be used for learning what needs to be recommended and another, smaller portion to test the performance of the recommendation system.
Step 2: Load the inbuilt dataset ‘ml-100k’ and call it data. Split this data into two parts, i.e., 85% for training and 15% for testing.
Step 3: Apply KNNWithMeans as below. Then, fit the algorithm to the training set.
Here, the algorithm uses up to 50 nearest neighbours (k = 50) of a user, that is, the users who have rated movies most similarly to the user being considered. It measures similarity between users with ‘pearson_baseline’. This step accomplishes the training of the model.
Moreover, this model can also predict the rating that a user might give to a movie that he or she has not watched yet. Select a specific user id, say 196, and a specific movie id, say 302, which user 196 has not watched. We can now predict the rating this user will give to the movie with the help of the ‘algo’ defined above.
The model finds up to 50 nearest neighbours who have rated movie 302 and combines their ratings for it. Because the algorithm is ‘with means’, each neighbour’s rating is taken relative to that neighbour’s own average rating and weighted by similarity, and the result is added to the average rating of user 196 to give the predicted rating.
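To make the arithmetic concrete, here is a small NumPy sketch of a mean-centred, similarity-weighted neighbour prediction, which is the form documented for KNNWithMeans. The function name predict_with_means and all the numbers are hypothetical; the library’s own implementation handles further details such as a minimum neighbour count.

```python
import numpy as np

def predict_with_means(user_mean, neighbour_sims, neighbour_ratings, neighbour_means):
    """Mean-centred, similarity-weighted prediction for one (user, item) pair."""
    sims = np.asarray(neighbour_sims, dtype=float)
    # How far each neighbour's rating sits above or below that neighbour's average.
    deviations = (np.asarray(neighbour_ratings, dtype=float)
                  - np.asarray(neighbour_means, dtype=float))
    if sims.sum() == 0:
        return user_mean  # no usable neighbours: fall back to the user's mean
    return user_mean + float(np.dot(sims, deviations) / sims.sum())

# Hypothetical values: three neighbours who rated the target movie.
predicted = predict_with_means(
    user_mean=3.5,
    neighbour_sims=[0.9, 0.6, 0.3],
    neighbour_ratings=[5.0, 4.0, 2.0],
    neighbour_means=[4.0, 3.0, 3.0],
)
print(round(predicted, 3))
```

Centring on each neighbour’s mean keeps a generous rater’s 4 stars from counting the same as a strict rater’s 4 stars.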
A higher predicted rating means that the movie can be recommended to the user, who is more likely to click on it and watch it.
Step 4: Test the model by passing the test dataset (‘testset’) through the model (‘algo’) defined above. This predicts a rating for each user-movie pair in the test set.
If the user or the movie is very new, there are not enough records to make a prediction. In such cases, the last value in the prediction will appear as ‘was_impossible’: True.
Step 5: Finally, measure the performance of the recommendation system by comparing predicted values and original rating values. Here we will calculate the ‘RMSE’ (root mean squared error) value.
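RMSE is the square root of the mean squared difference between predicted and actual ratings. A minimal NumPy sketch with made-up ratings follows; in Surprise, accuracy.rmse(predictions) reports the same metric for a list of predictions.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Hypothetical true and predicted ratings for four user-movie pairs.
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 3.0]

error = rmse(actual_ratings, predicted_ratings)
print(error)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.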
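In this case, the RMSE value is 0.9313. Whether that is good or bad is best judged against the rating scale and against other models evaluated on the same data: ratings here run from 1 to 5, so the predictions are typically off by a little under one star.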
These are two examples of recommendation systems and their implementation in Python. Follow our blog to know more about other Machine Learning applications and their implementations.