Jokes to Recommend: Popularity-Based Filtering
Introduction:
I watch a lot of comedy movies and shows, because jokes make us happy even when the day has not gone well. That is why I chose this dataset and tried building a recommendation system on it.
Recommender systems are designed to recommend items to a user based on many different factors. They predict which products a user is most likely to be interested in or to purchase.
A recommender system copes with a large volume of information by filtering out the most relevant items, using the data a user provides and other signals of the user’s preferences and interests. It matches users to items and computes similarities between users and items to make recommendations.
Both users and the services that deploy these systems benefit from them.
Popularity-based recommendation system:
The easiest way to build a recommendation system is to base it on popularity: recommend the same popular products to everyone. A simple way to identify popular products is to count which ones are bought most often.
For example, a clothing store can suggest its most popular dresses by purchase count.
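As a small illustration of the purchase-count idea, here is a minimal sketch. The purchase log and the `most_popular` helper are made up for this example and are not part of the Jester analysis.

```python
from collections import Counter

# Hypothetical purchase log: one entry per purchase (illustrative data only).
purchases = [
    "red dress", "blue dress", "red dress",
    "green dress", "red dress", "blue dress",
]


def most_popular(items, n=2):
    """Return the n most frequently purchased items with their counts."""
    return Counter(items).most_common(n)


top_items = most_popular(purchases, n=2)
print(top_items)
```

Every shopper would see the same list, which is what makes this approach simple: it needs no information about the individual user.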
Source of Data:
This is a ready-made dataset from Kaggle (jesterfinal151cols.csv), which I downloaded and kept on my system for further analysis.
Jester is a joke recommender system developed at UC Berkeley to study social information filtering. Users of the system are presented with jokes and rate them. This dataset is a collection of those ratings.
About the Dataset:
- Each row is a user (Row 1 = User #1)
- Each column is a joke (Column 1 = Joke #1)
- Ratings are given as real values from -10.00 to +10.00
- 99 corresponds to a null rating
- As of May 2009, the jokes 7, 8, 13, 15, 16, 17, 18, 19 are the “gauge set”
Importing the dataset
The first step is to import the necessary libraries and then load the dataset. The following code shows the libraries used to build the recommender system.
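A minimal sketch of this step, assuming pandas, NumPy and scikit-learn are the libraries in use. The `load_ratings` helper and the tiny inline sample are stand-ins invented here so the snippet runs without the real file.

```python
import io

import numpy as np  # imported for the later numeric steps
import pandas as pd
from sklearn.preprocessing import StandardScaler  # imported for the scaling step


def load_ratings(source):
    """Read the headerless ratings CSV; `source` is a path or a file-like object."""
    return pd.read_csv(source, header=None)


# Made-up two-user sample in the same layout (id first, then ratings, 99 = not rated).
sample_csv = io.StringIO("1,99,3.5,-2.25\n2,4.0,99,0.5\n")
df = load_ratings(sample_csv)
print(df.shape)
print(df.head())

# With the downloaded file in the working directory this would become:
# df = load_ratings("jesterfinal151cols.csv")
```

Passing `header=None` matters because the file has no header row; without it pandas would treat the first user's ratings as column names.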
Data Cleaning:
After understanding the dataset, we need to clean it. First we need to find and deal with any null values.
From the result it was found that there were 151 rows with null values.
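One way to run such a check with pandas is sketched below. The small frame is made-up data standing in for the loaded ratings, so the counts it prints are not the ones from the real file.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the loaded ratings.
df = pd.DataFrame([
    [1, 99.0, 3.5],
    [2, np.nan, 0.5],
    [np.nan, 4.0, 99.0],
])

# Number of NaN values in each column.
nulls_per_column = df.isnull().sum()

# Number of rows that contain at least one NaN value.
rows_with_nulls = int(df.isnull().any(axis=1).sum())

print(nulls_per_column)
print(rows_with_nulls)
```

Note that `isnull()` only finds real NaN values; the 99 marker for unrated jokes is an ordinary number and has to be handled separately.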
Data Preprocessing
The dataset contains no column headers. On inspection, the first column is the user id and the subsequent columns are the ratings for 150 jokes. There are also NaN values towards the top of the data.
Things to do:
- Add column headers
- Rename the joke rating columns to 1–150
- Name the 0th column user_id
- Some rows contain NaN values; replace them with 0
- Many ratings are 99.0, meaning the user did not rate that joke; replace them with 0
First we relabel all 151 columns with the integers 0 to 150.
Since the first column holds the user id, as discussed above, we rename the 0th column to user_id.
The next step is to replace all NaN values and all 99.0 values with 0.
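The preprocessing steps above can be sketched as follows. The `preprocess` helper and the two-row frame are illustrative assumptions; the sketch also limits the replacement to the rating columns so that a user id of 99 would not be overwritten, which is a choice made here rather than something the steps above spell out.

```python
import numpy as np
import pandas as pd

# Made-up frame: column 0 is the user id, the rest are joke ratings.
raw = pd.DataFrame([
    [1, 99.0, 3.5, -2.25],
    [2, 4.0, np.nan, 99.0],
])


def preprocess(df):
    """Label the columns, then turn NaN and the 99.0 'not rated' marker into 0."""
    out = df.copy()
    out.columns = range(out.shape[1])         # labels 0 .. n_columns - 1
    out = out.rename(columns={0: "user_id"})  # first column holds the user id
    rating_cols = out.columns[1:]             # joke columns 1 .. n_jokes
    out[rating_cols] = out[rating_cols].fillna(0).replace(99.0, 0)
    return out


clean = preprocess(raw)
print(clean)
```

Replacing missing ratings with 0 treats "not rated" the same as a neutral rating, which keeps the later steps simple at the cost of pulling the averages of rarely rated jokes towards 0.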
Some of these ratings are as high as 6.9 while others are as low as -9.68. Let's normalize only the rating columns using StandardScaler. The idea is to transform the data so that each column has a mean of 0 and a standard deviation of 1. Standard scaling only shifts and rescales the values; it does not by itself turn the data into a Gaussian (normal) distribution.
To do: extract the ratings and fit StandardScaler to them
Fitting StandardScaler to the ratings.
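A minimal sketch of the scaling step with scikit-learn's `StandardScaler`. The `clean` frame and its values are made up for illustration; only the layout (user_id first, then joke columns) follows the preprocessing described above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up cleaned frame: user_id followed by three joke rating columns.
clean = pd.DataFrame({
    "user_id": [1, 2, 3],
    1: [0.0, 4.0, -4.0],
    2: [3.5, 0.0, -9.68],
    3: [6.9, 1.0, 0.0],
})

# Extract only the rating columns as a NumPy array (user_id is not a rating).
ratings = clean.iloc[:, 1:].to_numpy()

# Fit the scaler and transform: each column is centred and divided by its std.
scaler = StandardScaler()
scaled = scaler.fit_transform(ratings)

print(scaled.mean(axis=0).round(6))  # per-joke means after scaling
print(scaled.std(axis=0).round(6))   # per-joke standard deviations after scaling
```

The scaler works column by column, so after this step every joke has the same mean and spread across users.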
Recommend Popular Jokes
Recommend the top n most popular jokes using mean ratings.
Things to do:
- Find mean rating for all the jokes
- The mean ratings come back as an array, which must be converted into a DataFrame so it can be sorted in descending order
- Recommend the top n popular jokes
Next we convert this array into a DataFrame and rename the column for better readability.
The final step is to recommend the most popular jokes overall. Here we take the top 10 jokes and display them.
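The ranking steps can be sketched as below. The `top_n_jokes` helper, the column names `joke_id` and `mean_rating`, and the sample ratings are assumptions made for this example. The sketch averages the cleaned ratings rather than the standardized ones, because column-wise standard scaling sets every joke's mean to 0 and would leave nothing to rank by.

```python
import numpy as np
import pandas as pd

# Made-up cleaned ratings: rows are users, columns are jokes 1..4, 0 = not rated.
ratings = np.array([
    [6.9, -9.68, 0.0, 2.5],
    [3.1, 0.0, 4.4, -1.0],
    [0.0, -2.0, 5.0, 1.5],
])


def top_n_jokes(ratings, n=10):
    """Rank jokes by mean rating across users and return the n best."""
    mean_ratings = ratings.mean(axis=0)  # one mean per joke (column)
    table = pd.DataFrame({
        "joke_id": np.arange(1, len(mean_ratings) + 1),
        "mean_rating": mean_ratings,
    })
    ranked = table.sort_values("mean_rating", ascending=False)
    return ranked.head(n).reset_index(drop=True)


top = top_n_jokes(ratings, n=2)
print(top)
```

Calling `top_n_jokes(ratings, n=10)` on the full ratings matrix would give the top 10 list described above.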
Conclusion:
Using popularity-based filtering, I was able to pick out the top jokes by popularity, where popularity was determined by the user ratings. I also learned a recommendation approach other than collaborative filtering.
References:
- https://www.analyticssteps.com/blogs/what-are-recommendation-systems-machine-learning
- https://medium.com/data-science-community-srm/recommendation-systems-in-machine-learning-2ec7909212a8
- https://madasamy.medium.com/introduction-to-recommendation-systems-and-how-to-design-recommendation-system-that-resembling-the-9ac167e30e95
- https://www.kaggle.com/crawford/jester-online-joke-recommender