I am working on a problem in which I have to select the best servers (level 1) to hit for a given request. These level 1 servers in turn hit other servers (level 2) to complete the request. Every level 1 server is integrated with the same set of level 2 servers. For each request, I get either success or failure as the response.
For this, I am using Thompson Sampling with Bernoulli rewards: a success counts as reward 1 and a failure as reward 0. However, a failure also comes with an error. For some errors, it is evident that the problem lies with the level 1 server, so a reward of 0 makes sense. Other errors result from bad request data or from an issue at the level 2 servers. For these kinds of errors, we can't penalize the level 1 server with a reward of 0, nor can we reward it with a 1.
Currently, I am using a reward of 0.5 for such cases.
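To make the setup concrete, here is a minimal sketch of what I mean. It assumes a Beta(1, 1) prior per level 1 server and assumes that a fractional reward `r` is folded in as `alpha += r`, `beta += 1 - r`. The names `Arm`, `reward_for`, `select_server` and the outcome labels are made up for this illustration, not taken from my actual code.

```python
import random


class Arm:
    """Beta posterior over one level 1 server's success probability."""

    def __init__(self):
        # Beta(1, 1) is a uniform prior (an assumption for this sketch).
        self.alpha = 1.0
        self.beta = 1.0

    def sample(self, rng):
        # Draw a plausible success probability from the posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # A reward in [0, 1] is split between the two pseudo-counts.
        self.alpha += reward
        self.beta += 1.0 - reward


def reward_for(outcome):
    """Map a (hypothetical) outcome label to a reward."""
    if outcome == "success":
        return 1.0
    if outcome == "level1_error":
        return 0.0
    # Request-data or level 2 errors: the current 0.5 placeholder.
    return 0.5


def select_server(arms, rng):
    """Thompson Sampling step: pick the arm with the largest sampled value."""
    samples = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=lambda i: samples[i])


rng = random.Random(0)
arms = [Arm() for _ in range(3)]
chosen = select_server(arms, rng)
arms[chosen].update(reward_for("level2_error"))
```

With this update rule, a reward of 0.5 adds half a pseudo-count to both `alpha` and `beta`, so the server's posterior still shifts even though the error was not its fault.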
Searching online, I couldn't find any method or algorithm for calculating the reward in such cases in a proper (informed) way.
What would be a sound way to calculate the reward in such cases?