
Let's take the following example: I have to predict the return (Q-values) of x state-action pairs using an ensemble of m models. Using NumPy, I could have the following for x = 5 and m = 3:

>>> import numpy as np
>>> predictions = np.random.rand(3, 1, 5)   # shape (m models, 1 state, x actions)
>>> print(predictions)
[[[0.22668968 0.58857404 0.49572979 0.68034031 0.96522052]]

 [[0.90452081 0.07554403 0.62139326 0.6269648  0.78426295]]

 [[0.14154026 0.75292144 0.99831914 0.7584285  0.69479723]]]

Thus, for each possible action, we have the following predictions across the set of models:

>>> actions_out = [q[0] for q in predictions]            # drop the singleton state axis
>>> actions_out = [list(a) for a in zip(*actions_out)]   # transpose: one list of m predictions per action
>>> actions_out
[[0.22668968082539054, 0.9045208066488987, 0.14154025891848865],
 [0.5885740401748317, 0.07554403461136683, 0.7529214398937515],
 [0.4957297945825573, 0.6213932636399634, 0.998319138313377],
 [0.6803403139829055, 0.6269648017308974, 0.7584284958713308],
 [0.9652205174041535, 0.7842629542761801, 0.6947972303000536]]

Here, for example, actions_out[0] = [0.22668968082539054, 0.9045208066488987, 0.14154025891848865] holds the predictions of the 3 models for the first action (index 0).
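As an aside, I believe the same reshaping can be done with plain NumPy indexing instead of the list comprehensions (a sketch, assuming the (m, 1, x) shape above):

>>> actions_out = predictions[:, 0, :].T   # drop the state axis, then transpose to (x, m)
>>> actions_out.shape                      # one row of m model predictions per action
(5, 3)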

The question is: to calculate the variance of those values (the disagreement, or uncertainty, between the models), is the following correct?

variance = np.var(actions_out, axis=1)   # per-action variance across the m models
avg_variance = np.average(variance)      # single scalar summarizing the disagreement

Does this average capture the disagreement between the models?
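For reference, here is a small self-contained check of the equivalence I have in mind; the vectorized form and the name variance_vec are my own additions:

import numpy as np

predictions = np.random.rand(3, 1, 5)   # (m models, 1 state, x actions)

# List-based computation from the question above.
actions_out = [list(a) for a in zip(*[q[0] for q in predictions])]
variance = np.var(actions_out, axis=1)   # per-action variance across the m models
avg_variance = np.average(variance)

# Vectorized equivalent: axis 0 is the model axis, so taking the variance
# over it yields the disagreement across models for each action directly.
variance_vec = np.var(predictions, axis=0)[0]
assert np.allclose(variance, variance_vec)
assert np.isclose(avg_variance, variance_vec.mean())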

