
I've been reading this paper on recommendation systems using reinforcement learning (RL) and knowledge graphs (KGs).

To give some background, the graph has several (finitely many) entities, of which some are user entities and others are item entities. The goal is to recommend items to users, i.e. to find a recommendation set of items for every user such that the user and the corresponding items are connected by one reasoning path.

I'm attaching an example of such a graph for more clarity (from the paper itself) -

[Figure: example knowledge graph from the paper]
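To make the setup concrete, such a graph can be represented as a set of (head, relation, tail) triples, with a reasoning path walking from a user to a candidate item. The entity and relation names in the sketch below are made up for illustration and are not taken from the paper's datasets.

```python
# Toy knowledge graph as (head, relation, tail) triples.
# All entity and relation names are invented for illustration only.
triples = [
    ("user_A", "purchase",     "item_1"),
    ("item_1", "produced_by",  "brand_X"),
    ("item_2", "produced_by",  "brand_X"),
    ("user_A", "mention",      "feature_F"),
    ("item_3", "described_by", "feature_F"),
]

# One reasoning path connecting user_A to a candidate item:
#   user_A --purchase--> item_1 --produced_by--> brand_X <--produced_by-- item_2
# A recommender that finds this path can both recommend item_2 to user_A
# and present the path itself as the explanation for the recommendation.
```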

In the paper above, they say

First, we do not have pre-defined targeted items for any user, so it is not applicable to use a binary reward indicating whether the user interacts with the item or not. A better design of the reward function is to incorporate the uncertainty of how an item is relevant to a user based on the rich heterogeneous information given by the knowledge graph.

I'm not able to understand the above extract, which discusses what reward function to use: binary, or something else. A detailed explanation of what the authors are trying to convey in the extract would really help.

  • Are you familiar with the basic concept of RL, including reward functions? – nbro May 02 '20 at 16:30
  • Yes, I am familiar with it. – stoic-santiago May 02 '20 at 16:31
  • 1
    I didn't read this paper, but the authors probably mean that, given that you don't know _a priori_ (or beforehand) if a user will buy or not an item, it probably doesn't make sense to give a reward of 1 whenever the system recommends an item that the user eventually buys. It probably makes more sense to associate a stochastic reward to recommendations of the system based on your knowledge encoded in the KG. By "stochastic reward" I mean something that isn't deterministic. Maybe later I will investigate that paper and provide a more solid answer, but I can't assure you. – nbro May 02 '20 at 16:36
  • You're absolutely right, @nbro. I've added an answer that I found convincing, using the details I uncovered in the latter half of the paper. – stoic-santiago May 03 '20 at 05:52

1 Answer


I found the answer further into the paper (section 3.2 Formulation as Markov Decision Process)! I'll post it here for everyone.

Given any user, there is no pre-known targeted item in the KGRE-Rec problem, so it is unfeasible to consider binary rewards indicating whether the agent has reached a target or not. Instead, the agent is encouraged to explore as many “good” paths as possible. Intuitively, in the context of recommendations, a “good” path is one that leads to an item that a user will interact with, with high probability. To this end, we consider to give a soft reward only for the terminal state $s_{T}=\left(u, e_{T}, h_{T}\right)$ based on another scoring function $f(u, i)$. The terminal reward $R_{T}$ is defined as

$$ R_{T}= \begin{cases}\max \left(0, \frac{f\left(u, e_{T}\right)}{\max _{i \in \mathcal{I}} f(u, i)}\right), & \text { if } e_{T} \in \mathcal{I} \\ 0, & \text { otherwise }\end{cases} $$

where the value of $R_{T}$ is normalized to the range of $[0,1]$. $f(u, i)$ is also introduced in the next section.

Details about $f(u,i)$, the scoring function, can be found in the paper.
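To make the shape of this reward concrete, here is a minimal Python sketch of the terminal-reward computation. The scoring function `f(u, i)` passed in (here a toy dot product of made-up embeddings) is an assumption for illustration only; the paper defines its own `f`.

```python
import numpy as np

def terminal_reward(f, user, e_T, items):
    """Soft terminal reward R_T as in the formula above.

    f     -- scoring function f(u, i); any callable, here a toy stand-in
    user  -- the user u the episode started from
    e_T   -- the entity reached at the terminal state
    items -- the set of all item entities I
    """
    if e_T not in items:
        # Paths that end on a non-item entity get no reward.
        return 0.0
    # Normalize by the best achievable score so R_T lies in [0, 1]
    # (assumes the best item score is positive).
    best = max(f(user, i) for i in items)
    return max(0.0, f(user, e_T) / best)

# Toy usage: scores from a dot product of invented embeddings.
emb = {
    "user_A": np.array([1.0, 0.2]),
    "item_1": np.array([0.9, 0.1]),
    "item_2": np.array([0.1, 0.9]),
}
f = lambda u, i: float(emb[u] @ emb[i])
print(terminal_reward(f, "user_A", "item_1", {"item_1", "item_2"}))  # 1.0
print(terminal_reward(f, "user_A", "brand_X", {"item_1", "item_2"}))  # 0.0
```

In words: only paths that terminate at an item are rewarded, and the reward is that item's relevance score relative to the best-scoring item for the user, clipped to be non-negative.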

  • For clarification, a state is defined as a tuple $(u, e_t, h_t)$, where $u$, $e_t$, and $h_t$ represent the starting user, the entity the agent has reached at step $t$, and the history prior to step $t$, respectively. – stoic-santiago May 03 '20 at 05:55