The IQN paper (https://arxiv.org/abs/1806.06923) uses a TD error built from the distributional Bellman target: $$ \delta^{\tau,\tau'}_t = r_t + \gamma Z_{\tau'}(x_{t+1}, \pi_{\beta}(x_{t+1})) - Z_{\tau}(x_t, a_t) $$ and optimizes: $$ L = \frac{1}{N'} \sum_{i=1}^{N} \sum_{j=1}^{N'} \rho^\kappa_{\tau_i}\left(\delta^{\tau_i,\tau'_j}_t\right) $$
However, similar quantiles can be obtained from Q-values alone, by first averaging the $N'$ target quantiles into a single Q-value: $$ \begin{aligned} \delta^\tau_t &= r_t + \gamma \frac{1}{N'} \sum_{j=1}^{N'} Z_{\tau'_j}(x_{t+1}, \pi_{\beta}(x_{t+1})) - Z_\tau(x_t, a_t) \\ &= r_t + \gamma Q(x_{t+1}, \pi_\beta(x_{t+1})) - Z_\tau(x_t, a_t) \end{aligned} $$ and then optimizing: $$ L = \sum_{i=1}^{N} \rho^{\kappa}_{\tau_i}\left(\delta^{\tau_i}_t\right) $$
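To make the comparison concrete, here is a minimal pure-Python sketch of the two losses for a single transition. It assumes $\rho^\kappa_\tau$ is the quantile Huber loss from the paper; the function and argument names (`quantile_huber`, `pairwise_loss`, `mean_target_loss`, `z_online`, `z_target`) are made up for illustration and are not taken from any library or from my actual training code.

```python
def quantile_huber(delta, tau, kappa=1.0):
    """Quantile Huber loss rho^kappa_tau(delta) for one TD error."""
    abs_d = abs(delta)
    # Huber loss: quadratic near zero, linear beyond kappa.
    huber = 0.5 * delta * delta if abs_d <= kappa else kappa * (abs_d - 0.5 * kappa)
    # Asymmetric weight |tau - 1{delta < 0}|.
    weight = abs(tau - (1.0 if delta < 0 else 0.0))
    return weight * huber / kappa


def pairwise_loss(z_online, taus, z_target, reward, gamma, kappa=1.0):
    """First loss: every online quantile is compared with every target sample."""
    total = 0.0
    for z_i, tau_i in zip(z_online, taus):
        for z_j in z_target:
            total += quantile_huber(reward + gamma * z_j - z_i, tau_i, kappa)
    return total / len(z_target)


def mean_target_loss(z_online, taus, z_target, reward, gamma, kappa=1.0):
    """Second loss: target samples are averaged into one Q-value first."""
    q_next = sum(z_target) / len(z_target)
    return sum(
        quantile_huber(reward + gamma * q_next - z_i, tau_i, kappa)
        for z_i, tau_i in zip(z_online, taus)
    )


# Toy numbers (hypothetical): one online quantile, two spread-out target samples.
print(pairwise_loss([0.0], [0.5], [-1.0, 1.0], reward=0.0, gamma=1.0))
print(mean_target_loss([0.0], [0.5], [-1.0, 1.0], reward=0.0, gamma=1.0))
```

With a single target sample ($N' = 1$) the two functions return the same value; they can differ once the target samples are spread out, because the second one only sees their mean.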
Both lead to similar performance on the CartPole environment. The second loss function is simpler and more intuitive (at least to me). So I was wondering: is there any obvious reason why the authors didn't use it?