I summarized the evaluation measures for recommender systems presented in the survey [1]. Evaluation with implicit feedback will be covered in a separate post.


1. Quality of the predictions

  • To measure the accuracy of the results of a recommender system (RS), it is common to compute some of the standard prediction error metrics.
  • Notations
\[\begin{align*} U &= \text{the set of users}\\ I &= \text{the set of items}\\ r_{ui} &= \text{rating of user }u \text{ on item }i\\ \bullet &= \text{a missing value (no rating or no prediction)}\\ r_{ui} &= \bullet \text{ ; means user }u \text{ has not rated item }i\\ p_{ui} &= \text{predicted rating of user }u \text{ on item }i \end{align*}\]

Let $O_u = \{i \in I \vert p_{ui} \ne \bullet \wedge r_{ui} \ne \bullet \}$ be the set of items rated by user $u$ that also have a prediction.


1.1 MAE : Mean Absolute Error

\[MAE = \frac{1}{\lvert U \rvert} \underset{u \in U}{\sum}\left( \frac{1}{\lvert O_u \rvert} \underset{i \in O_u}{\sum}\lvert p_{ui} - r_{ui} \rvert \right)\]
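A minimal Python sketch of the per-user MAE, assuming ratings and predictions are stored as nested dicts (`ratings[u][i]` is $r_{ui}$, `predictions[u][i]` is $p_{ui}$) in which an absent key stands for $\bullet$. The function name `mae` and the choice to skip users whose $O_u$ is empty (the inner average is undefined for them) are my own assumptions.

```python
def mae(ratings, predictions):
    """Per-user-averaged MAE.

    ratings[u][i] is r_ui and predictions[u][i] is p_ui; a missing key
    plays the role of the bullet (no rating / no prediction).
    """
    per_user = []
    for u, r_u in ratings.items():
        p_u = predictions.get(u, {})
        # O_u: items that have both a rating and a prediction
        o_u = [i for i in r_u if i in p_u]
        if not o_u:
            continue  # assumption: users with an empty O_u are skipped
        per_user.append(sum(abs(p_u[i] - r_u[i]) for i in o_u) / len(o_u))
    if not per_user:
        raise ValueError("no user has both ratings and predictions")
    # outer average over users, the 1/|U| * sum over u part of the formula
    return sum(per_user) / len(per_user)


ratings = {"a": {"x": 4, "y": 2}, "b": {"x": 5}}
predictions = {"a": {"x": 3.5, "y": 3}, "b": {"x": 4}}
print(mae(ratings, predictions))  # user a: 0.75, user b: 1.0 -> 0.875
```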

1.2 RMSE : Root Mean Squared Error

\[RMSE = \frac{1}{\lvert U \rvert} \underset{u \in U}{\sum}\sqrt{ \frac{1}{\lvert O_u \rvert} \underset{i \in O_u}{\sum}\left( p_{ui} - r_{ui} \right)^2 }\]
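The same idea for RMSE, again a sketch assuming nested dicts where a missing key means $\bullet$; the name `rmse` and skipping users with an empty $O_u$ are assumptions. The square root is taken per user and the results are then averaged, exactly as in the formula.

```python
import math


def rmse(ratings, predictions):
    """Mean over users of each user's RMSE on O_u."""
    per_user = []
    for u, r_u in ratings.items():
        p_u = predictions.get(u, {})
        o_u = [i for i in r_u if i in p_u]  # items with rating and prediction
        if not o_u:
            continue  # assumption: users with an empty O_u are skipped
        mse_u = sum((p_u[i] - r_u[i]) ** 2 for i in o_u) / len(o_u)
        per_user.append(math.sqrt(mse_u))  # root taken per user
    if not per_user:
        raise ValueError("no user has both ratings and predictions")
    return sum(per_user) / len(per_user)


ratings = {"a": {"x": 4, "y": 2}, "b": {"x": 5}}
predictions = {"a": {"x": 3.5, "y": 3}, "b": {"x": 4}}
print(rmse(ratings, predictions))  # (sqrt(0.625) + 1) / 2, about 0.895
```

Because the root is applied per user before averaging, this generally differs from a single RMSE pooled over all $(u, i)$ pairs; for the toy data the pooled value would be $\sqrt{(0.25 + 1 + 1)/3} \approx 0.866$.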

1.3 Coverage

Coverage is the percentage of situations in which at least one of the $k$ neighbors of the active user has rated an item that the active user has not rated yet, so that a prediction can be made for that item.

Let $K_{ui}$ be the set of neighbors of $u$ that have rated item $i$.

Let $C_u = \{i \in I \vert r_{ui}=\bullet \wedge K_{ui} \ne \emptyset \}$ (unrated items that at least one neighbor has rated) and $D_u = \{i\in I \vert r_{ui} = \bullet \}$ (all items $u$ has not rated).

\[coverage = \frac{1}{\lvert U \rvert}\underset{u \in U}{\sum}\left(100 \times \frac{\lvert C_u \rvert}{\lvert D_u \rvert}\right)\]
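A minimal sketch of this coverage measure, assuming the $k$ neighbors of each user have already been computed and are passed in as a dict of lists (`neighbors[u]`); how those neighbors are found is outside the scope of the formula. The function name and skipping users who have rated every item (empty $D_u$) are assumptions.

```python
def coverage(items, ratings, neighbors):
    """Average over users of 100 * |C_u| / |D_u|.

    items: the set I; ratings[u][i] is r_ui (missing key = not rated);
    neighbors[u]: the k neighbors of u (assumed precomputed).
    """
    per_user = []
    for u, r_u in ratings.items():
        d_u = [i for i in items if i not in r_u]  # D_u: items u has not rated
        if not d_u:
            continue  # assumption: users who rated every item are skipped
        # C_u: unrated items whose K_ui (neighbors who rated i) is non-empty
        c_u = [i for i in d_u
               if any(i in ratings.get(v, {}) for v in neighbors.get(u, []))]
        per_user.append(100 * len(c_u) / len(d_u))
    if not per_user:
        raise ValueError("every user has rated every item")
    return sum(per_user) / len(per_user)


items = {"x", "y", "z", "w"}
ratings = {"a": {"x": 5}, "b": {"x": 4, "y": 3}, "c": {"z": 1}}
neighbors = {"a": ["b"], "b": ["a"], "c": ["a", "b"]}
print(coverage(items, ratings, neighbors))  # (33.3 + 0 + 66.7) / 3, about 33.3
```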

2. Quality of the set of recommendations

  • Users' confidence in a recommender system does not depend directly on the accuracy over the whole set of possible predictions.
  • A user gains confidence in the recommender system when they agree with the small set of recommendations it actually presents.
  • The following measures are obtained by making $n$ test recommendations to user $u$ and counting an item as relevant when its rating is at least a relevance threshold $\theta$: precision, recall and F1.
  • Notations
\[\begin{align*} X_u &= \text{the set of recommendations to user }u\\ Z_u &= \text{the set of }n\text{ recommendations to user }u\\ Z^c_u &= \text{the complement of }Z_u\text{ (items not among the }n\text{ recommendations)} \end{align*}\]

2.1 Precision

Indicates the proportion of recommended items that are relevant.

\[precision = \frac{1}{\lvert U \rvert}\underset{u \in U}{\sum}\frac{\lvert \{i \in Z_u \vert r_{ui} \ge \theta \} \rvert}{n}\]
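A minimal sketch of precision over $n$ recommendations, assuming each user's recommendations come as a ranked list whose first $n$ entries form $Z_u$, and ratings as nested dicts. An item the user has not rated ($r_{ui} = \bullet$) cannot satisfy $r_{ui} \ge \theta$, so it is counted as not relevant. The function name `precision_at_n` is hypothetical.

```python
def precision_at_n(recs, ratings, theta, n):
    """Average over users of |{i in Z_u : r_ui >= theta}| / n.

    recs[u] is the ranked list of items recommended to u; its first n
    entries form Z_u. An item u has not rated never counts as relevant.
    """
    per_user = []
    for u, rec_list in recs.items():
        z_u = rec_list[:n]
        r_u = ratings.get(u, {})
        hits = sum(1 for i in z_u if i in r_u and r_u[i] >= theta)
        per_user.append(hits / n)  # always divided by n, as in the formula
    return sum(per_user) / len(per_user)


recs = {"a": ["x", "y", "z"], "b": ["x", "w", "y"]}
ratings = {"a": {"x": 5, "y": 2, "q": 4}, "b": {"w": 4, "y": 5, "q": 5}}
print(precision_at_n(recs, ratings, theta=4, n=3))  # (1/3 + 2/3) / 2 = 0.5
```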

2.2 Recall

Indicates the proportion of all relevant items that were actually recommended.

\[recall = \frac{1}{\lvert U \rvert}\underset{u \in U}{\sum} \frac{\lvert \{ i \in Z_u \vert r_{ui} \ge \theta \} \rvert} {\lvert \{i \in Z_u \vert r_{ui} \ge \theta \} \rvert + \lvert \{i \in Z^c_u \vert r_{ui} \ge \theta \} \rvert}\]
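The denominator of the recall formula splits the relevant items into those inside and outside $Z_u$, so together it is just the number of items the user rated at least $\theta$. The sketch below uses that shortcut; it assumes the same list/dict layout as for precision, and skipping users with no relevant items (where recall is undefined) is my own choice. The name `recall_at_n` is hypothetical.

```python
def recall_at_n(recs, ratings, theta, n):
    """Average over users of |relevant and recommended| / |relevant|.

    The denominator |{i in Z_u : r_ui >= theta}| + |{i in Z_u^c : r_ui >= theta}|
    is simply the number of items u rated at least theta.
    """
    per_user = []
    for u, rec_list in recs.items():
        z_u = set(rec_list[:n])
        r_u = ratings.get(u, {})
        relevant = {i for i, r in r_u.items() if r >= theta}
        if not relevant:
            continue  # assumption: users with no relevant items are skipped
        per_user.append(len(relevant & z_u) / len(relevant))
    if not per_user:
        raise ValueError("no user has a relevant item")
    return sum(per_user) / len(per_user)


recs = {"a": ["x", "y", "z"], "b": ["x", "w", "y"]}
ratings = {"a": {"x": 5, "y": 2, "q": 4}, "b": {"w": 4, "y": 5, "q": 5}}
print(recall_at_n(recs, ratings, theta=4, n=3))  # (1/2 + 2/3) / 2, about 0.583
```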

2.3 F1

The harmonic mean of precision and recall.

\[F1 = \frac{2\times precision \times recall}{precision+recall}\]
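As written, F1 combines the already averaged precision and recall, so it is computed from two numbers rather than per user. A tiny sketch; returning 0 when both inputs are 0 (where the formula would divide by zero) is an assumption.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption: define F1 as 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


print(f1(0.5, 7 / 12))  # 7/13, about 0.538
```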

3. Quality of the list of recommendations: rank measure

  • When the number $n$ of recommended items is not small, users give greater importance to the first items on the list of recommendations.
  • Notations
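\[\begin{align*} p_1, &\cdots, p_n : \text{recommendation list}\\ r_{u, p_i} &= \text{true rating of user }u \text{ on item }p_i\\ k &= \text{rank of the evaluated item}\\ d &= \text{default rating}\\ \alpha &= \text{the half-life: the rank on the list at which there is a 50\% chance that the user views the item} \end{align*}\]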

3.1 HL : Half Life

Assumes that users' interest decreases exponentially as they move down the list, away from the top recommendations.

\[HL = \frac{1}{\lvert U \rvert}\underset{u \in U}{\sum}\underset{i=1}{\sum^{n}}\frac{\max(r_{u, p_i}-d, 0)} {2^{(i-1)/(\alpha -1)}}\]
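A minimal sketch of the half-life measure, assuming each user's recommendation list is given in rank order and ratings are nested dicts. The formula needs $r_{u,p_i}$ for every listed item; treating an unrated item as contributing 0 is an assumption, as is the name `half_life`. $\alpha$ must exceed 1, otherwise the exponent divides by zero.

```python
def half_life(rec_lists, ratings, d, alpha):
    """Average over users of sum_i max(r_{u,p_i} - d, 0) / 2^((i-1)/(alpha-1))."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    per_user = []
    for u, p in rec_lists.items():
        r_u = ratings.get(u, {})
        score = 0.0
        for i, item in enumerate(p, start=1):  # i is the 1-based rank
            if item in r_u:  # assumption: unrated items contribute 0
                score += max(r_u[item] - d, 0) / 2 ** ((i - 1) / (alpha - 1))
        per_user.append(score)
    return sum(per_user) / len(per_user)


rec_lists = {"a": ["x", "y", "z"]}
ratings = {"a": {"x": 5, "y": 4, "z": 2}}
print(half_life(rec_lists, ratings, d=3, alpha=2))  # 2/1 + 1/2 + 0 = 2.5
```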

3.2 DCG : Discounted Cumulative Gain

The decay is logarithmic: the rating of the item at rank $i$ is divided by $\log_2(i+1)$.

\[DCG^k = \frac{1}{\lvert U \rvert}\underset{u \in U}{\sum} \left(\underset{i=1}{\sum^k} \frac{r_{u, p_i}}{\log_2(i+1)} \right)\]
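A minimal sketch of $DCG^k$ averaged over users, with the same assumed list/dict layout; using a gain of 0 for an unrated item is an assumption, and `dcg_at_k` is a hypothetical name. Only the first $k$ items of each list are scored.

```python
import math


def dcg_at_k(rec_lists, ratings, k):
    """Average over users of sum_{i<=k} r_{u,p_i} / log2(i + 1)."""
    per_user = []
    for u, p in rec_lists.items():
        r_u = ratings.get(u, {})
        # assumption: an unrated item contributes a gain of 0
        per_user.append(sum(r_u.get(item, 0) / math.log2(i + 1)
                            for i, item in enumerate(p[:k], start=1)))
    return sum(per_user) / len(per_user)


rec_lists = {"a": ["x", "y", "z"]}
ratings = {"a": {"x": 5, "y": 4, "z": 2}}
print(dcg_at_k(rec_lists, ratings, k=3))  # 5 + 4/log2(3) + 2/2, about 8.52
```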

4. Novelty and Diversity

4.1 Novelty

Indicates the degree of difference between the items recommended to the user and the items the user already knows. Here $sim(i, j)$ is a similarity measure between items $i$ and $j$.

\[novelty_i = \frac{1}{\lvert Z_u \rvert -1} \underset{j \in Z_u}{\sum}\left( 1-sim(i, j) \right), i\in Z_u\]
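A sketch of $novelty_i$ that takes the similarity function as a parameter, since the formula does not fix one. For the example I assume a hypothetical tag set per item and Jaccard similarity; both are illustrations only. The sum runs over all of $Z_u$, including $j = i$; assuming $sim(i, i) = 1$, that term is 0, which is why the normalizer is $\lvert Z_u \rvert - 1$.

```python
def novelty(i, z_u, sim):
    """novelty_i = 1/(|Z_u| - 1) * sum_{j in Z_u} (1 - sim(i, j)), for i in Z_u."""
    if i not in z_u or len(z_u) < 2:
        raise ValueError("i must be in Z_u and Z_u needs at least two items")
    # j = i is included as in the formula; assuming sim(i, i) = 1 it adds 0
    return sum(1 - sim(i, j) for j in z_u) / (len(z_u) - 1)


# Hypothetical item tags and Jaccard similarity, used only for illustration
tags = {"x": {"action", "comedy"}, "y": {"action"}, "z": {"drama"}}


def jaccard(a, b):
    return len(tags[a] & tags[b]) / len(tags[a] | tags[b])


print(novelty("x", ["x", "y", "z"], jaccard))  # (0 + 0.5 + 1) / 2 = 0.75
```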

4.2 Diversity

Indicates how different the recommended items are from one another.

\[diversity_{Z_u} = \frac{1}{\lvert Z_u \rvert (\lvert Z_u \rvert -1)}\underset{i \in Z_u}{\sum}\underset{j \in Z_u, j\ne i}{\sum}(1-sim(i, j))\]
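The double sum over $i \ne j$ visits every ordered pair of distinct items in $Z_u$, which is what `itertools.permutations(z_u, 2)` yields; there are $\lvert Z_u \rvert(\lvert Z_u \rvert - 1)$ such pairs, matching the normalizer. The tags and Jaccard similarity below are the same hypothetical illustration one might use for novelty, not part of the measure.

```python
from itertools import permutations


def diversity(z_u, sim):
    """Average of 1 - sim(i, j) over all ordered pairs i != j in Z_u."""
    m = len(z_u)
    if m < 2:
        raise ValueError("Z_u needs at least two items")
    # permutations(z_u, 2) yields every ordered pair (i, j) with i != j
    total = sum(1 - sim(i, j) for i, j in permutations(z_u, 2))
    return total / (m * (m - 1))


# Hypothetical item tags and Jaccard similarity, used only for illustration
tags = {"x": {"action", "comedy"}, "y": {"action"}, "z": {"drama"}}


def jaccard(a, b):
    return len(tags[a] & tags[b]) / len(tags[a] | tags[b])


print(diversity(["x", "y", "z"], jaccard))  # 2 * (0.5 + 1 + 1) / 6, about 0.833
```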

Reference

[1] Bobadilla, Jesús, et al. “Recommender systems survey.” Knowledge-Based Systems 46 (2013): 109-132.