In Section 5 of the paper “Soft Actor-Critic Algorithms and Applications”, the authors propose optimizing the policy subject to the constraint that the entropy of the action distribution be at least a specified value $H_0$:
$ \operatorname*{arg\,max}_{\pi} \left[\sum_{t=0}^{T} r(s_t,a_t)\right] \quad \text{s.t.} \quad \mathbb{E}\left[-\log \pi(a_t \mid s_t)\right] \geq H_0 \quad \forall t $
This is then converted into a dual problem, in which the temperature parameter $\alpha$ is essentially the dual variable of the Lagrangian. However, I don’t understand why the authors use only a single dual variable $\alpha$. Since the constraint must hold for every time step $t$, I would expect the Lagrangian to be: $ L = \sum_{t=0}^{T} r(s_t,a_t) + \sum_{t=0}^{T} \alpha_t \left(\mathbb{E}_{a_t\sim \pi,\, s_t\sim p_s}\left[-\log \pi(a_t \mid s_t)\right] - H_0\right) $
There should therefore be one $\alpha_t$ per time step to solve for. Mathematically, how do we end up optimizing only a single $\alpha$ in the algorithm?
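To make the question concrete, here is a minimal numerical sketch of the Lagrangian as I wrote it, with one multiplier per time step. Everything in it is my own illustration rather than anything from the paper: the function name `lagrangian`, the random placeholder data, and the choice to estimate the reward sum and the entropy terms by averaging over sampled trajectories are all assumptions. Using a single $\alpha$ then looks like the special case where every $\alpha_t$ is tied to the same value.

```python
import numpy as np

def lagrangian(rewards, log_probs, alphas, h0):
    """Monte Carlo estimate of L with per-time-step multipliers alphas[t].

    rewards, log_probs: arrays of shape (num_trajectories, T + 1)
    alphas: array of shape (T + 1,), one multiplier per time step
    h0: scalar entropy lower bound H_0
    """
    # Sum of rewards along each trajectory, averaged over trajectories.
    expected_return = rewards.sum(axis=1).mean()
    # Per-time-step estimate of E[-log pi(a_t | s_t)], shape (T + 1,).
    entropy_t = -log_probs.mean(axis=0)
    # One constraint term alpha_t * (entropy_t - H_0) per time step.
    return expected_return + np.sum(alphas * (entropy_t - h0))

# Placeholder data (hypothetical, only to exercise the function).
rng = np.random.default_rng(0)
rewards = rng.normal(size=(1000, 5))
log_probs = rng.normal(loc=-1.0, size=(1000, 5))
h0 = 0.5

per_step = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # a separate alpha_t for each t
shared = np.full(5, 0.2)                        # alpha_t = alpha for all t

print(lagrangian(rewards, log_probs, per_step, h0))
print(lagrangian(rewards, log_probs, shared, h0))
```

With `shared`, the constraint terms collapse to $\alpha \sum_t (\mathbb{E}[-\log \pi(a_t \mid s_t)] - H_0)$, which is what a single dual variable would multiply. What I am missing is the justification for tying the $\alpha_t$ together in this way.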