
Contrasting Offline and Online Results when Evaluating Recommendation Algorithms

Marco Rossetti, Fabio Stella, and Markus Zanker. Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16), pages 31--34. New York, NY, USA, ACM, 2016.
DOI: 10.1145/2959100.2959176

Abstract

Most evaluations of novel algorithmic contributions assess their accuracy in predicting what was withheld in an offline evaluation scenario. However, doubts have been raised about whether standard offline evaluation practices are appropriate for selecting the best algorithm for field deployment. The goal of this work is therefore to compare the offline and the online evaluation methodology with the same study participants, i.e., a within-users experimental design. This paper presents empirical evidence that the ranking of algorithms based on offline accuracy measurements clearly contradicts the results of the online study with the same set of users. Thus, the external validity of the most commonly applied evaluation methodology is not guaranteed.
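
The offline protocol the abstract refers to, withholding part of each user's interactions and measuring how well an algorithm predicts them back, can be illustrated with a minimal sketch. This is not taken from the paper; the function names, metric choice (precision@k), and data shapes are assumptions used only for illustration.

    # Minimal sketch of offline hold-out evaluation (hypothetical names/data).
    # Withhold part of each user's items, ask the recommender for a top-k list,
    # and score how many withheld items it recovers.
    from typing import Callable, Dict, List, Set

    def precision_at_k(
        recommend: Callable[[str, int], List[str]],  # user id, k -> ranked item ids
        withheld: Dict[str, Set[str]],               # user id -> withheld item ids
        k: int = 10,
    ) -> float:
        """Average precision@k over users, judged only against withheld items."""
        scores = []
        for user, held_out in withheld.items():
            top_k = recommend(user, k)
            hits = sum(1 for item in top_k if item in held_out)
            scores.append(hits / k)
        return sum(scores) / len(scores)

    # Offline ranking: the algorithm with the highest average precision@k "wins".
    # The paper's finding is that this offline ranking can contradict the ranking
    # obtained when the same users judge the recommendations online.

The paper's point is that a ranking produced this way need not match users' online judgments, so such a sketch describes only the offline half of the comparison.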
