For simplicity, let's consider the discrete version of BCQ, for which both the paper and the code are available. Line 5 of Algorithm 1 reads:
$$ a' = \text{argmax}_{a' \,\mid\, G_{\omega}(a', s') / \max_{\hat{a}} G_{\omega}(\hat{a}, s') \,>\, \tau}~Q_{\theta}(s', a') $$
I have doubts about the predictive model $G_{\omega}$. In behavior cloning, it would be a model trained with supervised learning that maps states to actions. In BCQ, however, it looks like a probability distribution over the available actions. Am I right? And what is the action $\hat{a}$?
EDIT: As far as I understand, we compare the probability of the action $a'$ with that of $\hat{a}$, and if the ratio is above the threshold $\tau$ we calculate the loss. The question now is how I should proceed when this is not the case: should I take the next action with the highest Q-value?
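
To make the question concrete, here is a minimal sketch of how I currently read line 5. It is not taken from the authors' code: the function name `select_action`, its arguments, and the default value of `tau` are hypothetical, and I am assuming that $G_{\omega}$ outputs one probability per discrete action (for example via a softmax).

```python
import numpy as np


def select_action(q_values, action_probs, tau=0.3):
    """Pick the highest-Q action among those G considers likely enough.

    q_values:     Q_theta(s', a) for every discrete action a (assumed input).
    action_probs: G_omega(a, s') for every action a, assumed to be probabilities.
    tau:          threshold from line 5; the default here is arbitrary.
    """
    q_values = np.asarray(q_values, dtype=float)
    action_probs = np.asarray(action_probs, dtype=float)

    # Probability of each action relative to the most likely action a_hat.
    ratio = action_probs / action_probs.max()

    # Keep only the actions whose relative probability exceeds tau.
    allowed = ratio > tau

    # Excluded actions get -inf so that argmax can never select them.
    masked_q = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked_q)), allowed


# Action 1 has the highest Q-value but a low probability under G,
# so with tau = 0.3 it is filtered out before the argmax.
action, allowed = select_action([1.0, 5.0, 2.0], [0.6, 0.1, 0.3], tau=0.3)
print(action, allowed)
```

In this sketch the most probable action always has a ratio of exactly 1, so for any `tau < 1` at least one action passes the filter. Whether that matches the intended reading of the algorithm is part of what I am asking.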