Jokes to Recommend: Popularity-Based Filtering


  • Each row is a user (Row 1 = User #1)
  • Each column is a joke (Column 1 = Joke #1)
  • Ratings are given as real values from -10.00 to +10.00
    99 corresponds to a null rating
  • As of May 2009, the jokes 7, 8, 13, 15, 16, 17, 18, 19 are the “gauge set”

Data Preprocessing

  • Add column headers
  • Name the 0th column user_id
  • Rename the remaining joke rating columns to 1–150
  • Some rows contain NaN values; replace them with 0
  • Many ratings are 99.0, meaning the user did not rate that joke; replace them with 0 as well
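
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the original notebook code: it assumes the ratings are already loaded into a headerless pandas DataFrame whose first column holds the user identifier, and the function name preprocess_ratings and the tiny sample table are made up for the example.

```python
import numpy as np
import pandas as pd


def preprocess_ratings(raw: pd.DataFrame) -> pd.DataFrame:
    """Label the columns and set missing ratings to 0 (hypothetical helper)."""
    df = raw.copy()

    # Column 0 becomes user_id; the joke columns are numbered from 1.
    n_jokes = df.shape[1] - 1
    df.columns = ["user_id"] + list(range(1, n_jokes + 1))

    # Only touch the joke columns, so a user_id of 99 is left alone.
    joke_cols = df.columns[1:]
    df[joke_cols] = df[joke_cols].replace(99.0, 0.0).fillna(0.0)
    return df


# Made-up sample: 3 users, 4 jokes (the real data has 150 joke columns).
raw = pd.DataFrame(
    [
        [1, 5.0, 99.0, -3.5, np.nan],
        [2, 99.0, 7.25, 99.0, 1.0],
        [3, -9.0, 2.0, 4.0, 99.0],
    ]
)
ratings = preprocess_ratings(raw)
print(ratings)
```

Restricting the replacement to the joke columns is a design choice for this sketch; it keeps identifiers in user_id from being rewritten if one of them happens to equal 99.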

Recommend Popular Jokes

  • Find the mean rating of each joke
  • The mean ratings come back as a one-dimensional array; convert it into a DataFrame to sort it in descending order
  • Recommend top n popular jokes
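
The ranking steps above might look like the sketch below. It assumes a preprocessed pandas DataFrame with a user_id column followed by numbered joke columns; the function name recommend_popular, the column label mean_rating, and the sample values are illustrative only.

```python
import pandas as pd


def recommend_popular(ratings: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n jokes with the highest mean rating (hypothetical helper)."""
    # Column-wise mean over all users, ignoring the identifier column.
    mean_ratings = ratings.drop(columns="user_id").mean(axis=0)

    # Convert the result to a DataFrame and sort it in descending order.
    ranked = mean_ratings.to_frame(name="mean_rating").sort_values(
        "mean_rating", ascending=False
    )
    ranked.index.name = "joke_id"
    return ranked.head(n)


# Made-up preprocessed sample: unrated entries are already 0.
ratings = pd.DataFrame(
    {
        "user_id": [1, 2, 3],
        1: [5.0, 0.0, -9.0],
        2: [0.0, 7.25, 2.0],
        3: [-3.5, 0.0, 4.0],
        4: [0.0, 1.0, 0.0],
    }
)
top = recommend_popular(ratings, n=2)
print(top)
```

Because unrated entries were filled with 0, each missing rating counts as a neutral score in the mean, so jokes with few ratings are pulled toward 0 in this ranking.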



