The "trick" of subtracting a (state-dependent) baseline from the $Q(s, a)$ term in policy gradients to reduce variants (which is what is described in your "baseline reduction" link) is a different trick from the modifications to the rewards that you are asking about. The baseline subtraction trick for variance reduction does not appear to be present in the code you linked to.
The thing that your question is about appears to be standardization of rewards, as described in Brale_'s answer, to put all the observed rewards in a similar range of values. Such a standardization procedure inherently requires division by the standard deviation, so... that answers that part of your question.
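Concretely, that per-episode standardization typically looks something like the following; this is just my own minimal sketch in plain Python/NumPy (not the exact code from the linked examples), with a small `eps` added purely to avoid division by zero:

```python
import numpy as np

def standardize(returns, eps=1e-8):
    """Per-episode standardization: shift to zero mean, scale to unit std.

    The division by the standard deviation is the part that forces all
    observed values into a similar range.
    """
    returns = np.asarray(returns, dtype=np.float64)
    return (returns - returns.mean()) / (returns.std() + eps)
```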
As for why they are doing this on a per-episode basis... I think you're right: in the general case, this seems like a bad idea. If there are rare events with extremely high rewards that only occur in some episodes, and the majority of episodes only experience common events with lower-scale rewards... yes, this trick will likely mess up training.
In the specific case of the CartPole environment (which is the only environment used in these two examples), this is not a concern. In this implementation of the CartPole environment, the agent simply receives a reward of exactly $1$ for every single time step in which it manages to "survive". The `rewards` list in the example code is, in my opinion, poorly named, because it actually contains discounted returns for the different time steps, which look like: $G_t = \sum_{k = t}^{T} \gamma^{k - t} R_k$, where all the individual $R_k$ values are equal to $1$ in this particular environment.
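For concreteness, such a list of per-time-step discounted returns is typically computed by accumulating backwards over the episode's rewards, roughly like this (my own sketch, not literally the code from the tutorial):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = R_t + gamma * R_{t+1} + ... + gamma^(T-t) * R_T for every t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last time step backwards
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

# In CartPole every reward is exactly 1, so e.g. for a 5-step episode:
print(discounted_returns([1, 1, 1, 1, 1], gamma=0.9))
# ~ [4.0951, 3.439, 2.71, 1.9, 1.0]
```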
These kinds of values tend to lie in a fairly consistent range (especially if the policy used to generate them also only changes slowly), so the standardization that they do may be relatively safe, and may improve learning stability and/or speed (by making sure there are always roughly as many actions for which the probability gets increased as there are actions for which it gets decreased, and possibly by making hyperparameters easier to tune).
It does not seem to me like this trick would generalize well to many other environments, and personally I think it shouldn't be included in such a tutorial / example.
Note: I'm quite sure that a per-episode subtraction of the mean of the returns would be a valid, albeit possibly unusual, baseline for variance reduction. It's the subsequent division by the standard deviation that seems particularly problematic to me in terms of generalizing to many different environments.
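A small illustration of that distinction (again just my own sketch): subtracting the per-episode mean preserves the scale of the returns, whereas the extra division by the standard deviation squashes a rare, very large return down to the same range as the ordinary ones:

```python
import numpy as np

common  = np.array([1.0, 1.2, 0.9, 1.1])    # episode with only common, small returns
jackpot = np.array([1.0, 1.2, 0.9, 101.1])  # episode containing one rare, huge return

for returns in (common, jackpot):
    centered     = returns - returns.mean()            # mean subtraction only (valid baseline)
    standardized = centered / (returns.std() + 1e-8)   # full per-episode standardization
    print(round(centered.max(), 2), round(standardized.max(), 2))

# Prints: 0.15 1.34, then 75.05 1.73
# After mean subtraction the jackpot episode still clearly stands out;
# after also dividing by the std, both episodes end up on a similar scale.
```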