The core problem here is state representation, not estimating return due to delayed response to actions on the original state representation (which is no longer complete for the new problem). If you fix that, then you can solve your problem as a normal MDP, and base calculations on single timesteps. This allows you to continue using dynamic programming to solve it, provided the state space remains small enough.
What needs to change is the state representation and state transitions. Instead of orders resulting in immediate change of stock levels, they become pending changes, and for each item you will have state representation for the amount of current stock, plus amount of stock in each lead time category. State transitions will modify expected lead time for each amount of pending stock as well as amount of current stock.
Your lead time categories will depend on whether the agent knows the lead time immediately after making an order:
If lead times are known, track remaining time until items arrive 1,2 or 3 days. These categories will be assigned by the enviroment following the order, then lead time will transition down on each day deterministically. A 1 day lead time will transition to in stock, 2 day lead will transition to 1 day etc.
If lead times are not known, but probabilities of them are, track time since the order was made. This will be 0, 1 or 2 days. Although you don't know when an order will arrive, you know the probabilities for state transition - e.g. items in 0 days have a 1 in 3 chance of transitioning to "in stock" and a 2 in 3 chance of transitioning to 1 days.
This makes the state space larger, but is less complex than moving to the Semi MDP representation. For instance, doing it this way means that you can still work with single time step transitions and apply dynamic programming in a standard way.
In general, if the environment has a delayed response to actions, then the best way to maintain Markov trait is to add relevant history of actions taken to the state. The added state variables can either be a direct list of the relevant actions, or something that tracks the logical consequence of those actions.