Let's take the following example: I need to predict the returns (Q-values) of x state-action pairs using an ensemble of m models. Using NumPy, I could have the following for x = 5 and m = 3:
>>> import numpy as np
>>> predictions = np.random.rand(3, 1, 5)
>>> print(predictions)
[[[0.22668968 0.58857404 0.49572979 0.68034031 0.96522052]]

 [[0.90452081 0.07554403 0.62139326 0.6269648  0.78426295]]

 [[0.14154026 0.75292144 0.99831914 0.7584285  0.69479723]]]
Thus, for each possible action we can group the predictions of the whole set of models:
>>> actions_out = [q[0] for q in predictions]
>>> actions_out = [list(a) for a in zip(*actions_out)]
>>> actions_out
[
 [0.22668968082539054, 0.9045208066488987, 0.14154025891848865],
 [0.5885740401748317, 0.07554403461136683, 0.7529214398937515],
 [0.4957297945825573, 0.6213932636399634, 0.998319138313377],
 [0.6803403139829055, 0.6269648017308974, 0.7584284958713308],
 [0.9652205174041535, 0.7842629542761801, 0.6947972303000536]
]
Here, for example, actions_out[0] = [0.22668968082539054, 0.9045208066488987, 0.14154025891848865] contains the predictions of the 3 models for the first action (index 0).
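(As an aside, if I understand the shapes correctly, the same per-action grouping can be obtained directly with NumPy indexing; this is just a sketch assuming the (models, states, actions) layout above:)

# Equivalent to the two list comprehensions: select the single state,
# then transpose from (models, actions) to (actions, models).
actions_out = predictions[:, 0, :].T  # shape (5, 3)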
The question is: to measure the variance of those values (the disagreement, or uncertainty, between the models), is the following correct?
variance = np.var(actions_out, axis=1)
avg_variance = np.average(variance)
Does this average capture the disagreement between the models?
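To make the question concrete, here is the full snippet I am running (a minimal sketch; the seed is only for reproducibility, and the assert is just my own sanity check that the list route matches taking np.var over the model axis):

import numpy as np

rng = np.random.default_rng(0)            # seeded so the example is reproducible
predictions = rng.random((3, 1, 5))       # (models, states, actions)

# List-based route from above: group the 3 model outputs per action.
actions_out = [list(a) for a in zip(*(q[0] for q in predictions))]

# Variance across the models (axis=1 of the 5x3 list), then its average.
variance = np.var(actions_out, axis=1)    # shape (5,), one value per action
avg_variance = np.average(variance)

# Sanity check: should equal the variance over the model axis
# of the original (3, 1, 5) array.
assert np.allclose(variance, np.var(predictions, axis=0).ravel())
print(variance, avg_variance)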