Xiang Liang's Reading Notes on 'Practical Recommender Systems' — Chapter 1: What Makes a Good Recommender System

1.1 What Is a Recommendation System?

By 2026, recommendation systems have become a widely recognized concept; this section may be skimmed.

1.2 Applications of Personalized Recommendation Systems

As with 1.1, this section can be skimmed.
Two algorithms mentioned here are worth following up on: what are they for, and how are they implemented?

  • Item-based recommendation algorithm
  • Facebook EdgeRank algorithm

1.3 Evaluation of Recommendation Systems

What makes a good recommendation system? Accurate predictions alone are not enough. Predicting that a user will eat tomorrow is almost certainly correct, but it tells the user nothing they did not already know and wastes computational resources.

Evaluation can be conducted from the following perspectives:

  • User satisfaction
  • Prediction accuracy
    • Rating prediction
    • Top-N recommendation
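  • Coverage: how much of the item catalog the system is able to recommend; how evenly recommendations are spread across items can be measured via information entropy or the Gini coefficient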
  • Diversity: To illustrate what degree of diversity is ideal in a recommendation system, consider a simple example. Suppose a user enjoys both action movies and animated films, watching action movies 80% of the time and animated films 20% of the time. Four distinct recommendation lists could be generated:
    • List A contains 10 action movies and zero animated films;
    • List B contains 10 animated films and zero action movies;
    • List C contains 8 action movies and 2 animated films;
    • List D contains 5 action movies and 5 animated films.
      In this scenario, List C is generally considered optimal—it provides some diversity while still reflecting the user’s primary interest. List A satisfies the user’s main interest but lacks diversity; List D is overly diverse and neglects the user’s dominant preference; and List B neither reflects the user’s main interest nor offers meaningful diversity—making it the worst option.
  • Novelty: A novel recommendation suggests items the user has never heard of before. The simplest way to implement novelty on a website is to filter out items with which the user has previously interacted on that site. For example, on a video platform, novel recommendations should exclude videos the user has already watched, rated, or browsed. However, users may have seen certain videos elsewhere (e.g., on another website or on TV), so filtering only those items with prior user interaction on the current site does not fully guarantee novelty. Oscar Celma explored novelty evaluation in his doctoral dissertation, “Music Recommendation and Discovery in the Long Tail.” The simplest method for evaluating novelty is to compute the average popularity of recommended items, since less popular items are more likely to feel novel to users. Thus, lower average item popularity in recommendations indicates higher novelty.
  • Serendipity: Serendipity has become one of the hottest topics in recommendation systems in recent years. First, though, what is serendipity, and how does it differ from novelty? The question here is how the two differ as recommendation metrics, not how the two words differ in Chinese: both are translations of English terms, so their everyday Chinese meanings do not necessarily match the original English ones, and any preconceptions about the Chinese words should be set aside. An example makes the distinction concrete. Suppose a user enjoys Stephen Chow’s films, and we recommend “At the Crossroads”, a 1983 film starring Andy Lau, Stephen Chow, and Tony Leung that few people know Chow appeared in. If the user has never heard of this film, the recommendation is novel. It is not serendipitous, however, because once the user learns about the cast, the recommendation is unsurprising. In contrast, suppose we recommend Zhang Yimou’s “Red Sorghum” to a user who has never seen it. The user may initially find the recommendation puzzling, since the film appears unrelated to their interests. If the user then watches it and genuinely enjoys it, the recommendation is serendipitous. The original version of this example comes from Guy Shani’s paper, whose core idea is: if a recommendation diverges from the user’s historical interests yet yields high user satisfaction, it exhibits high serendipity; novelty, by contrast, depends solely on whether the user has previously heard of the recommended item.
  • Trustworthiness: Imagine you have two friends—one highly trustworthy, the other habitually unreliable (“full of hot air”). If your trustworthy friend recommends a travel destination, you’re likely to follow the suggestion; if the unreliable friend recommends the same destination, you’re unlikely to go. These two friends can be viewed as two recommendation systems: even with identical outputs, users may respond differently due to differing levels of trust in each system.
  • Timeliness: On many websites, items such as news articles or microblog posts possess strong temporal relevance, requiring recommendations to be delivered while the items remain timely. For instance, recommending yesterday’s news is clearly inferior to recommending today’s news. Hence, timeliness becomes critical for recommendation systems on such platforms.
  • Robustness: Any profitable algorithmic system inevitably attracts attacks, and search engines are the classic example. The battle between search engine spammers and anti-spam measures is fierce, because ranking one’s product first for a popular search term brings enormous commercial benefits. Recommendation systems now face similar spam attacks, and robustness measures how well a recommendation system resists such manipulation.
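
The diversity example above (Lists A to D) can be made concrete with a small sketch. It scores each list by how far its genre mix is from the user's 80/20 viewing history, using total variation distance. This distance is an illustrative assumption chosen for these notes, not a metric taken from the book, and the names `genre_shares`, `total_variation`, `history`, and `lists` are hypothetical.

```python
from collections import Counter


def genre_shares(genres):
    """Return the share of each genre in a list of genre labels."""
    counts = Counter(genres)
    total = sum(counts.values())
    return {genre: count / total for genre, count in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two discrete distributions.

    0 means the distributions are identical; 1 means they do not overlap.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# The user's viewing history from the example: 80% action, 20% animation.
history = {"action": 0.8, "animation": 0.2}

# The four candidate lists, reduced to genre labels (assumed representation).
lists = {
    "A": ["action"] * 10,
    "B": ["animation"] * 10,
    "C": ["action"] * 8 + ["animation"] * 2,
    "D": ["action"] * 5 + ["animation"] * 5,
}

# Smaller mismatch means the list mirrors the user's interests more closely.
mismatch = {name: total_variation(genre_shares(g), history) for name, g in lists.items()}
ranking = sorted(mismatch, key=mismatch.get)
print(ranking)  # ['C', 'A', 'D', 'B']
```

Under this measure List C comes out best and List B worst, which agrees with the example. The relative order of A and D is a consequence of this particular distance; the example itself does not rank them against each other.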

How to conduct evaluation:

  • Offline experiments
  • User studies
  • Online experiments (i.e., A/B testing)
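
Several of the metrics from 1.3 can be computed in an offline experiment directly from logged recommendation lists. The following is a minimal sketch, assuming recommendations are stored as a dict from user to a list of item IDs and popularity is a per-item interaction count. All names and the toy data are hypothetical. The Gini function uses the common normalization by n, which may differ from the variant in the book, and novelty is approximated by plain average popularity as described in the notes.

```python
import math
from collections import Counter


def item_counts(rec_lists):
    """Count how often each item appears across all users' recommendation lists."""
    return Counter(item for recs in rec_lists.values() for item in recs)


def coverage(rec_lists, catalog):
    """Fraction of catalog items that appear in at least one recommendation list."""
    recommended = set(item_counts(rec_lists)) & set(catalog)
    return len(recommended) / len(catalog)


def entropy(rec_lists):
    """Shannon entropy (natural log) of how recommendations spread over items."""
    counts = item_counts(rec_lists)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def gini(rec_lists, catalog):
    """Gini coefficient of recommendation counts over the whole catalog.

    0 means every item is recommended equally often; larger values mean
    a few items receive most of the recommendations.
    """
    counts = item_counts(rec_lists)
    ordered = sorted(counts.get(item, 0) for item in catalog)
    n, total = len(ordered), sum(ordered)
    if total == 0:
        return 0.0
    # Counts sorted ascending, weighted by their 1-based rank j.
    weighted = sum((2 * j - n - 1) * c for j, c in enumerate(ordered, start=1))
    return weighted / (n * total)


def average_popularity(rec_lists, popularity):
    """Mean popularity of recommended items; lower values suggest higher novelty."""
    values = [popularity[item] for recs in rec_lists.values() for item in recs]
    return sum(values) / len(values)


# Toy data (assumed for illustration only).
rec_lists = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "b"]}
catalog = ["a", "b", "c", "d"]
popularity = {"a": 100, "b": 10, "c": 1, "d": 1}

print(coverage(rec_lists, catalog))               # 0.75
print(round(entropy(rec_lists), 3))               # 1.011
print(round(gini(rec_lists, catalog), 3))         # 0.417
print(average_popularity(rec_lists, popularity))  # 53.5
```

In the toy data, item "d" is never recommended, so coverage is 3 of 4 items, and the heavy reuse of the popular item "a" pushes average popularity up, which by the reasoning in the novelty bullet indicates low novelty.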