Firstly, I am assuming $r$ is the update to the hidden state. I will call it $\tilde{\mathbf{h}}_t$, because this is how it's called in the paper by Cho et al. that introduced the GRU (where $\mathbf{r}_t$ is actually a different vector).
Both formulas do exactly the same thing. The only difference is how you would interpret the role of the vector $\mathbf{z}_t \in (0, 1)^d$. Importantly, because $\mathbf{z}_t$ is sigmoid-activated, its values are between $0$ and $1$. Thus, you could say that $\mathbf{z}_t$ acts like a continuous switch that controls how much of $\mathbf{h}_{t-1}$ and $\tilde{\mathbf{h}}_t$ ends up in the updated hidden state $\mathbf{h}_t$. This is also called a gate.
In the first formula $\mathbf{h}_t = \mathbf{h}_{t-1}\mathbf{z}_t+\tilde{\mathbf{h}}_t(1-\mathbf{z}_t)$
you could say that $\mathbf{z}_t$ acts like a keep-gate that tells you how much of the old information in the hidden state is preserved. Its complement, $1 - \mathbf{z}_t$, tells you how much the new information $\tilde{\mathbf{h}}_t$ contributes.
In the second formula: $\mathbf{h}_t = \mathbf{h}_{t-1}(1 - \mathbf{z}_t)+\tilde{\mathbf{h}}_t\mathbf{z}_t$
the role of $\mathbf{z}_t$ is inverted: it can now be interpreted as an update-gate, controlling how much of the old information is replaced by the update $\tilde{\mathbf{h}}_t$.
From an optimization perspective, it doesn't matter whether the network learns $\mathbf{z}_t$ or its complement $1 - \mathbf{z}_t$ (i.e. the keep or the update version). Therefore, both equations are valid and work equally well.
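To see the equivalence concretely, here is a minimal numpy sketch (with made-up random vectors, not a trained GRU) showing that the first formula with gate $\mathbf{z}_t$ produces exactly the same state as the second formula fed with the complement $1 - \mathbf{z}_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
h_prev = rng.standard_normal(4)                     # previous hidden state h_{t-1}
h_tilde = rng.standard_normal(4)                    # candidate update \tilde{h}_t
z = 1.0 / (1.0 + np.exp(-rng.standard_normal(4)))   # sigmoid gate, values in (0, 1)

# First formula: z acts as a keep-gate on h_{t-1}
h_keep = h_prev * z + h_tilde * (1 - z)

# Second formula: z acts as an update-gate on \tilde{h}_t
h_update = h_prev * (1 - z) + h_tilde * z

# Substituting the complement 1 - z into the second formula
# recovers the first, so the two only differ by a relabeling of the gate.
print(np.allclose(h_keep, h_prev * (1 - (1 - z)) + h_tilde * (1 - z)))  # True
```

Since the gate is produced by a learned sigmoid layer, the network can realize either convention simply by flipping the sign of the gate's pre-activation weights.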
The first formula is actually the one used in the original paper by Cho et al., where $\mathbf{z}_t$ is called an update gate (I would argue that $\mathbf{z}_t$, in this case, acts more like a keep-gate, but since its complement controls the portion of the update, that name is also defensible).