For policy-gradient methods that require a differentiable sample from the action distribution, such as Soft Actor-Critic, you are correct that they suffer from the same problem: the gradient needs to be able to pass through the sampled action.
The common way around this is to use distributions that can be sampled via the re-parameterisation trick. The classic example is the Gaussian: suppose our network outputs a mean $\mu_\theta$ and standard deviation $\sigma_\theta$ (where $\theta$ are the network parameters); then we can write the sample as $X = \mu_\theta + \sigma_\theta \times \epsilon$, where $\epsilon$ follows a unit-Normal distribution. The randomness now lives entirely in $\epsilon$, which does not depend on $\theta$, so the gradient can flow deterministically through $\mu_\theta$ and $\sigma_\theta$ even though $X$ is a random sample.
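As a minimal sketch of this (assuming PyTorch; the linear policy head and the dimensions are made up for illustration), note that `torch.distributions.Normal.rsample()` implements exactly this re-parameterised sampling:

```python
import torch
import torch.nn as nn

# Hypothetical policy head: maps a state to the mean and log-std of a Gaussian.
state_dim, action_dim = 4, 2
policy = nn.Linear(state_dim, 2 * action_dim)

state = torch.randn(1, state_dim)
mu, log_std = policy(state).chunk(2, dim=-1)
std = log_std.exp()

# Re-parameterisation trick: X = mu + std * eps, with eps ~ N(0, 1).
eps = torch.randn_like(mu)
action = mu + std * eps          # differentiable w.r.t. the network parameters

# Equivalent built-in: rsample() uses the same trick under the hood,
# whereas .sample() would block the gradient.
dist = torch.distributions.Normal(mu, std)
action_rsample = dist.rsample()

# Gradients flow back through mu and std into the policy parameters.
loss = action.pow(2).sum()       # stand-in for a critic value, just for illustration
loss.backward()
print(policy.weight.grad is not None)  # True
```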
The same can be achieved for discrete actions using the Gumbel-Softmax trick. For more information on this, I would recommend this blog post (and the references within).
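For a rough idea of what that looks like in practice (again assuming PyTorch; the logits and Q-values below are placeholders), `torch.nn.functional.gumbel_softmax` gives a differentiable, relaxed one-hot sample from a categorical distribution:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits over 3 discrete actions, produced by some policy network.
logits = torch.tensor([[2.0, 0.5, -1.0]], requires_grad=True)

# Soft, differentiable sample: a relaxed one-hot vector over the actions.
soft_sample = F.gumbel_softmax(logits, tau=1.0, hard=False)

# hard=True returns a one-hot sample in the forward pass while keeping the
# soft gradient (straight-through estimator) in the backward pass.
hard_sample = F.gumbel_softmax(logits, tau=1.0, hard=True)

# Stand-in Q-values, just to give a scalar objective that depends on the sample.
q_values = torch.tensor([[1.0, 2.0, 3.0]])
expected_q = (soft_sample * q_values).sum()
expected_q.backward()
print(logits.grad)   # non-zero: gradients flow through the relaxed sample
```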
For algorithms derived from the REINFORCE algorithm, where the policy gradient is given by $G_t \nabla_\theta \log \pi_\theta(a|s)$, you can see that this is equivalent to maximising the log-likelihood, scaled by the return. All we need to do here is obtain the parameters of the policy's distribution (e.g. the mean and standard deviation of a Gaussian) with a forward pass of the network for the current state, and then evaluate the PDF/PMF at the action we actually selected -- this does not require the sampled action itself to be differentiable.
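As a small sketch of that loss (assuming PyTorch and a Gaussian policy; the linear policy head and the batch of states, actions, and returns are placeholders), the sampled action only appears as the point at which we evaluate the log-density:

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2
policy = nn.Linear(state_dim, 2 * action_dim)   # hypothetical Gaussian policy head

# Placeholder batch of (state, action, return) tuples collected by acting in the env.
states = torch.randn(8, state_dim)
actions = torch.randn(8, action_dim)            # the actions we actually sampled earlier
returns = torch.randn(8)                        # G_t for each time step

# Forward pass to get the distribution parameters for these states.
mu, log_std = policy(states).chunk(2, dim=-1)
dist = torch.distributions.Normal(mu, log_std.exp())

# Evaluate the log-density at the stored actions; no gradient is needed
# through the sampling step that originally produced them.
log_probs = dist.log_prob(actions).sum(dim=-1)

# REINFORCE loss: negative return-weighted log-likelihood.
loss = -(returns * log_probs).mean()
loss.backward()
```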