In their blog post, they link to (among many other papers) their IMPALA paper. The blog post only links to that paper with text implying that they're using the "off-policy actor-critic reinforcement learning" described there, but one of the major contributions of the IMPALA paper is actually an efficient, large-scale, distributed RL setup.
So, until we get more details (for example in their paper that's currently under review), our best guess would be that they're also using a distributed RL setup similar to the one described in the IMPALA paper. As depicted in Figures 1 and 2 of that paper, they decouple actors (machines running code to generate experience, e.g. by playing StarCraft) and learners (machines running code to train/update the weights of the neural network(s)).
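To make that split concrete, here is a minimal, hypothetical sketch of the actor/learner decoupling in PyTorch (which may or may not be what they actually use). Everything here is my own illustration, not AlphaStar's code: the tiny policy, the dummy loss (the real IMPALA learner uses V-trace off-policy corrections), and random observations standing in for StarCraft II.

```python
# Hypothetical sketch of the IMPALA-style actor/learner split. Not AlphaStar's code.
import queue
import threading

import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Stand-in for the (much larger) policy network."""

    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, n_actions))

    def forward(self, obs):
        return self.net(obs)  # unnormalised action logits


def actor(policy, trajectory_queue, steps_per_trajectory=16):
    """Actor: runs the environment (here: random observations standing in for
    StarCraft II) plus the policy's FORWARD pass only, then ships the
    resulting trajectory to the learner."""
    while True:
        observations, actions = [], []
        with torch.no_grad():  # actors never need gradients
            for _ in range(steps_per_trajectory):
                obs = torch.randn(8)   # fake environment observation
                logits = policy(obs)   # forward pass to select an action
                action = torch.distributions.Categorical(logits=logits).sample()
                observations.append(obs)
                actions.append(action)
        trajectory_queue.put((torch.stack(observations), torch.stack(actions)))


def learner(policy, trajectory_queue, n_updates=100):
    """Learner: consumes trajectories and performs the more expensive
    backward passes / weight updates. A dummy loss is used here purely to
    show the data flow; real IMPALA uses V-trace."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(n_updates):
        observations, actions = trajectory_queue.get()
        logits = policy(observations)
        loss = nn.functional.cross_entropy(logits, actions)  # placeholder loss
        optimizer.zero_grad()
        loss.backward()  # only the learner runs backward passes
        optimizer.step()


if __name__ == "__main__":
    policy = TinyPolicy()
    trajectories = queue.Queue(maxsize=8)
    for _ in range(2):  # a couple of actor threads feeding one learner
        threading.Thread(target=actor, args=(policy, trajectories),
                         daemon=True).start()
    learner(policy, trajectories)
```

The point of the sketch is just the division of labour: many cheap actors produce trajectories in parallel, and one (or a few) learner(s) consume them and update the shared weights.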
I would assume that their TPUs are being used by the learner (or, more likely, multiple learners). StarCraft 2 itself won't benefit from running on TPUs (and would probably be impossible to even get running on them in the first place), because the game logic likely doesn't depend on large-scale, dense matrix operations (the kind of operations TPUs are optimized for). So, the StarCraft 2 game itself (which only needs to run for the actors, not for the learners) is almost certainly running on CPUs.
The actors still have to run forward passes through the neural network(s) in order to select actions. I would assume that their actors are also equipped with either GPUs or TPUs to do this more quickly than a CPU could, but the more expensive backward passes are not necessary there; only the learners need to perform those.
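As a small illustration of why the actors' side is comparatively cheap (again a hypothetical PyTorch snippet, not their code): action selection is a forward pass that can run with gradient tracking disabled, while the learner's update additionally has to build and traverse the autograd graph for the backward pass.

```python
# Illustrative only: forward-only inference (actor) vs. forward + backward (learner).
import torch

policy = torch.nn.Linear(8, 4)   # stand-in for the real, much larger policy network
obs = torch.randn(1, 8)          # fake observation

# Actor side: forward pass only, no autograd graph is built.
with torch.no_grad():
    action = policy(obs).argmax(dim=-1)

# Learner side: forward pass *plus* the more expensive backward pass.
logits = policy(obs)
loss = torch.nn.functional.cross_entropy(logits, action)
loss.backward()   # gradients are only ever needed here
```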