Source: https://www.evidentlyai.com/ranking-metrics/evaluating-recommender-systems
Overview
This EvidentlyAI article walks through how to align ranking metrics with the business goals of a recommender system. It highlights the difference between product KPIs (click-through, revenue, watch time) and offline proxies such as precision, recall, MAP, nDCG, coverage, novelty, and serendipity.
Key points
- Start with product intent – determine whether the recommender should drive clicks, conversions, dwell time, or other KPIs. Design experiments and datasets that mimic the production distribution, and define what “relevance” means for your catalog.
- Core ranking metrics (a scoring sketch follows this list):
  - *Precision@K / Recall@K*: Precision@K is the share of relevant items within the top-K shortlist; Recall@K is the share of all relevant items that the shortlist captures.
  - *MRR / MAP*: reward relevant items ranked near the top; MRR averages the reciprocal rank of the first hit across users, while MAP averages precision taken at the position of each hit.
  - *nDCG*: applies a logarithmic discount to lower-ranked hits and normalizes each user's score by the ideal ordering, so scores are comparable across users.
  - *Coverage*: measures how much of the catalog and user base the system actually recommends.
- Beyond accuracy – track diversity, novelty, and serendipity to avoid echo-chamber recommendations and to surface long-tail items; monitor business constraints such as price ranges or brand mix if merchandising requires it (a coverage and novelty sketch follows this list).
- Evaluation workflow – combine offline validation (holdout sets, cross-validation) with online A/B tests; continually monitor logs for drift, cold-start behavior, and catalog changes (a time-based split sketch follows this list).
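The core ranking metrics above reduce to a few per-user scores that are then averaged across users. Below is a minimal sketch with binary relevance, assuming `recommended` is an ordered list of item ids produced by the model and `relevant` is the set of items the user actually interacted with in the holdout data; the function names are illustrative, not taken from the article.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / k


def recall_at_k(recommended, relevant, k):
    """Share of all relevant items that made it into the top-k shortlist."""
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / len(relevant)


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant item (0.0 if none is found)."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(recommended, relevant, k):
    """Mean of precision@i taken at every position i that holds a relevant item."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0


def ndcg_at_k(recommended, relevant, k):
    """DCG with a log2 position discount, normalized by the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0
```

Averaging `reciprocal_rank` and `average_precision_at_k` over all users yields MRR and MAP@K; averaging `ndcg_at_k` the same way gives the system-level nDCG@K.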
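The beyond-accuracy checks lend themselves to the same treatment. The sketch below computes catalog coverage (the share of items that appear in at least one user's recommendations) and a popularity-based novelty score; the self-information definition of novelty and all names here are illustrative assumptions, not definitions quoted from the article.

```python
import math


def catalog_coverage(recommendations_per_user, catalog):
    """Share of catalog items that appear in at least one user's top-K list."""
    recommended_items = {item
                         for recs in recommendations_per_user.values()
                         for item in recs}
    return len(recommended_items & set(catalog)) / len(catalog)


def mean_novelty(recommendations_per_user, item_popularity, n_users):
    """Average self-information -log2(p) of recommended items; rarer items score higher."""
    scores = []
    for recs in recommendations_per_user.values():
        for item in recs:
            # p: share of users who previously interacted with this item
            p = item_popularity.get(item, 0) / n_users
            if p > 0:
                scores.append(-math.log2(p))
    return sum(scores) / len(scores) if scores else 0.0
```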
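For the offline part of the workflow, a time-based holdout usually mimics production better than a random split, because the model only sees interactions that happened before the evaluation window. A minimal pandas sketch, assuming an interaction log with `user_id`, `item_id`, and datetime `timestamp` columns (the column names and the 7-day window are assumptions):

```python
import pandas as pd


def temporal_split(interactions: pd.DataFrame, holdout_days: int = 7):
    """Split an interaction log into train/holdout sets by a timestamp cutoff."""
    cutoff = interactions["timestamp"].max() - pd.Timedelta(days=holdout_days)
    train = interactions[interactions["timestamp"] <= cutoff]
    holdout = interactions[interactions["timestamp"] > cutoff]
    # Evaluate only users present in both windows, so ranking quality is not
    # conflated with pure cold-start cases.
    known_users = set(train["user_id"]) & set(holdout["user_id"])
    return train, holdout[holdout["user_id"].isin(known_users)]
```

The cold-start users excluded here can be monitored as a separate segment, in line with the workflow point above.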
Takeaways
- There is no single “best” metric; teams need a balanced scorecard that connects offline ranking metrics with online business KPIs.
- High-quality logging (user interactions, impressions, context) is the foundation for meaningful evaluation.
- Balancing accuracy with coverage and diversity is critical to maintaining discovery, especially in marketplaces with long-tail inventory.