The update rules themselves are no different.
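For example, assuming a constant step size $\alpha$ (the same point holds for a sample-average update), the standard incremental update is still

$$Q(s,a) \leftarrow Q(s,a) + \alpha \big(r - Q(s,a)\big),$$

applied whenever the delayed reward $r$ for the stored pair $(s,a)$ finally arrives.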
However, if you make many other decisions while a reward is still outstanding, the time steps for which you can actually run estimate updates will lag behind the current time step.
You will need a buffer of decisions awaiting rewards, recording the state and action taken for each. Once the matching rewards arrive, you can clear those entries from the buffer, turning them into training data and running the updates.
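Here is a minimal sketch of that buffering, assuming a tabular agent with a constant step size and that each decision is tagged with an identifier the environment echoes back alongside the delayed reward (the class and method names are illustrative, not from any particular library):

```python
from collections import defaultdict

class DelayedRewardAgent:
    """Tabular agent that buffers (state, action) pairs until their rewards arrive."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha              # constant step size (assumed)
        self.q = defaultdict(float)     # best available estimates Q(s, a)
        self.pending = {}               # decision_id -> (state, action) awaiting reward

    def record_decision(self, decision_id, state, action):
        # Remember what was done; the reward for it has not arrived yet.
        self.pending[decision_id] = (state, action)

    def receive_reward(self, decision_id, reward):
        # Match the delayed reward to the buffered decision and run the usual update.
        state, action = self.pending.pop(decision_id)
        key = (state, action)
        self.q[key] += self.alpha * (reward - self.q[key])
```

Decisions made while rewards are still pending simply use whatever is currently in `self.q`.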
A lot of tutorial material uses the term $Q_t(s,a)$ for the current estimate of expected reward, both in the update rules and to drive exploitation versus exploration. In a practical system with delays you will have to use the best available estimates instead. You do not need the subscript $t$ on the estimation function except for plotting learning curves; if you do plot them, you will need to decide whether to label the estimates by the decision time or by the update time when the estimates are revised. It does not really matter which you choose: the graph will look the same and mean the same thing, just with an offset.
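As an illustration of "best available estimates", an $\epsilon$-greedy choice simply reads whatever is in the estimates table at decision time, regardless of how many rewards are still outstanding. This is a sketch, with `q` assumed to be the estimates table from the code above and `epsilon` a hypothetical exploration parameter:

```python
import random

def choose_action(q, state, actions, epsilon=0.1):
    """Epsilon-greedy selection using the best available estimates Q(s, a)."""
    if random.random() < epsilon:
        return random.choice(actions)                 # explore
    # Exploit: pick the action with the highest current estimate for this state.
    return max(actions, key=lambda a: q[(state, a)])
```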
Typically you will also plot total reward, regret, or some combination; this plot is constructed just as before. The impact of the reward delay is slower initial learning and slower responsiveness to non-stationary environments, and there is no way around that.
Once you ignore (or change) the subscript $t$ on $Q_t$, all the equations work the same as before.