Consider Model-Agnostic Meta-Learning (MAML), as described here.

For an RL task $\mathcal{T}_i$, represented by a model $f$ with parameters $\theta$, learning rate $\alpha$, and RL loss function $\mathcal{L}$, the inner adaptation step is:

$$ \theta'_{i} = \theta - \alpha\nabla_{\theta}\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right) $$

This is standard gradient descent.
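In code, the inner step is just one SGD update on the task loss. Here is a minimal sketch in JAX; the names `inner_update` and `loss_fn`, and the signature `loss_fn(theta, batch)`, are illustrative assumptions, not from the MAML paper:

```python
import jax

# Assumed signature: loss_fn(theta, batch) -> scalar task loss L_Ti(f_theta).
def inner_update(theta, batch, loss_fn, alpha=0.01):
    """One inner adaptation step: theta'_i = theta - alpha * grad_theta L_Ti(f_theta)."""
    grads = jax.grad(loss_fn)(theta, batch)
    # Elementwise SGD step over every parameter in the pytree.
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, theta, grads)
```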

The meta-update step samples a batch of tasks $\mathcal{T}_i\sim p(\mathcal{T})$, computes the adapted parameters $\theta'_i$ from $\theta$ for each task, and minimizes, with respect to the original $\theta$, the sum of losses evaluated at these adapted $\theta'_i$s:

$$ \min_{\theta} \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta'_{i}}\right) = \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta - \alpha\nabla_{\theta}\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)}\right) \\ \theta \leftarrow \theta - \beta\nabla_{\theta} \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta'_{i}}\right) $$
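Correspondingly, the meta-update can be sketched by differentiating through the inner step. This is again a hedged sketch under assumptions, not the paper's reference implementation: `tasks` is assumed to be a list of `(support, query)` batches, where adaptation uses the support data and the meta-loss is evaluated on the query data:

```python
def meta_loss(theta, tasks, loss_fn, alpha=0.01):
    """Meta-objective: sum over sampled tasks of the post-adaptation loss L_Ti(f_theta'_i)."""
    total = 0.0
    for support, query in tasks:
        theta_prime = inner_update(theta, support, loss_fn, alpha)  # theta'_i
        total += loss_fn(theta_prime, query)                        # L_Ti(f_theta'_i)
    return total

def meta_update(theta, tasks, loss_fn, alpha=0.01, beta=0.001):
    """Meta step: theta <- theta - beta * grad_theta sum_i L_Ti(f_theta'_i).
    jax.grad differentiates through inner_update, so the second-order term is included."""
    grads = jax.grad(meta_loss)(theta, tasks, loss_fn, alpha)
    return jax.tree_util.tree_map(lambda p, g: p - beta * g, theta, grads)
```

In an actual RL setting, `loss_fn` would involve sampling trajectories with the current policy; the sketch glosses over that data collection.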

Question: is the sampling $\mathcal{T}_i\sim p(\mathcal{T})$ done anew for every meta-update, or is a fixed set of tasks maintained from the beginning? Can the frequency of task sampling differ from the frequency of meta-updates?
