Here is the link to the paper: https://www.davidsilver.uk/wp-content/uploads/2020/03/mc_aixi_long.pdf

Definition 2. An environment $\rho$ is a sequence of conditional probability functions $\{ \rho_0, \rho_1, \rho_2, \ldots \}$, where $\rho_n: A^n \rightarrow \text{Density}(X^n)$, that satisfies

$$ \forall a_{1:n} \forall x_{<n} : \rho_{n-1}(x_{<n} | a_{<n}) = \sum_{x_n \in X} \rho_n (x_{1:n} | a_{1:n})$$

In the base case, we have $\rho_0(\epsilon | \epsilon) = 1$
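To make the condition concrete, here is a minimal sketch that checks it numerically as a marginalization identity: summing $\rho_n$ over the last percept $x_n$ should give back $\rho_{n-1}$, whatever the last action $a_n$ is. The toy environment is an assumption made up for illustration and is not from the paper: binary action set `A`, binary percept set `X`, and a hypothetical per-step kernel `step_prob`.

```python
from itertools import product

A = [0, 1]  # hypothetical action set (assumption, not from the paper)
X = [0, 1]  # hypothetical percept set (assumption, not from the paper)


def step_prob(x, a):
    """Toy per-step kernel: the percept equals the action with probability 0.8."""
    return 0.8 if x == a else 0.2


def rho(xs, acts):
    """rho_n(x_{1:n} | a_{1:n}) for the toy environment.

    For empty sequences this returns 1.0, matching the base case rho_0.
    """
    assert len(xs) == len(acts)
    p = 1.0
    for x, a in zip(xs, acts):
        p *= step_prob(x, a)
    return p


def chronological_gap(n):
    """Largest absolute difference between the two sides of the condition at n."""
    worst = 0.0
    for acts in product(A, repeat=n):
        for xs_prev in product(X, repeat=n - 1):
            lhs = rho(xs_prev, acts[:-1])  # rho_{n-1}(x_{<n} | a_{<n})
            # Sum rho_n over every possible last percept x_n.
            rhs = sum(rho(xs_prev + (x_n,), acts) for x_n in X)
            worst = max(worst, abs(lhs - rhs))
    return worst


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, chronological_gap(n))
```

In this toy case the gap stays at floating-point noise for every `n` tried, which is consistent with reading the condition as a consistency requirement between $\rho_{n-1}$ and $\rho_n$. It is only a check on one hand-built example, not a proof about environments in general.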

So the right-hand side seems to say that $x_{1:n}$ is formed by concatenating $x_{<n}$ with some $x_n \in X$, and the sum runs over these different realizations of $x_{1:n}$. Does this mean that the probability function for the shorter sequence, $\rho_{n-1}(x_{<n} | a_{<n})$, depends on the possible longer sequences? That is what the equation appears to say, but it doesn't make sense to me that the present probability of a sequence depends on possible future outcomes.

TomT800
  • I'm not familiar with this paper, although I've been meaning to set aside time to read it. If you read their explanation of how to look at the AIXI decision rule intuitively, it shouldn't be surprising that something in the present depends on the future (i.e., planning). The idea that an environment is a sequence of probability distributions mapping actions to rewards and observations is fine, but that equation is not clear to me either. It looks a bit like the law of total probability. – nbro Jun 04 '23 at 23:26
  • They call it the "chronological condition" and refer to Hutter 2005. I tried to read that book a few years ago. It's very technical, so not easy to read, but quite interesting. – nbro Jun 04 '23 at 23:26
