I recently came across this function:
$$\sum_{t = 0}^{\infty} \gamma^t R_t.$$
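The games described here are finite, so the infinite sum reduces to a finite one that stops at the terminal step. Below is a minimal Python sketch. It assumes the rewards $R_0, R_1, \dots$ along one line of play are already known as a list, and the name `discounted_return` is illustrative:

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma should lie in [0, 1]")
    total = 0.0
    # Work backwards using G_t = R_t + gamma * G_{t+1}, with G = 0 past the end.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For example, `discounted_return([1, 1, 1], 0.5)` evaluates to 1.75 (1 + 0.5 + 0.25). The backward loop avoids computing powers of $\gamma$ explicitly.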
It's elegant and looks useful for the kind of deterministic, perfect-information, finite models I'm working with.
However, it occurs to me that using $\gamma^t$ in this manner might be seen as somewhat arbitrary.
Specifically, the objective is to discount according to the uncertainty (variance) added by the "temporal distance" between the present gamestate and any potential gamestate being evaluated. That variance, however, would seem to be a function of the branching factor of the evaluated state and of the sum of the branching factors leading up to it.
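One way to make the branching-factor idea concrete is to replace the per-step factor $\gamma$ with a per-node factor that depends on the branching factor $b_k$ of each node on the path to the evaluated state. The sketch below assumes the factor $1/b_k$. This is a hypothetical choice, not an established method. Under it, the weight $\prod_k 1/b_k = \exp(-\sum_k \ln b_k)$ is the probability of following that exact path under uniformly random play, which turns the product into a sum of log-branching-factors. The function names and the `per_node` hook are illustrative. Whether to include the evaluated node's own branching factor in the list is left to the caller.

```python
from typing import Callable, Sequence


def uniform_path_factor(b: int) -> float:
    """Per-node factor 1/b: the chance a uniformly random mover picks one given branch."""
    return 1.0 / b


def branching_discount(
    branching_factors: Sequence[int],
    per_node: Callable[[int], float] = uniform_path_factor,
) -> float:
    """Multiply per-node factors along the path to the evaluated state.

    branching_factors[k] is the (hypothetical) number of legal moves at the
    k-th node on the path from the present state to the evaluated state.
    With the default factor the result is prod(1/b_k) = exp(-sum(ln b_k)).
    """
    weight = 1.0
    for b in branching_factors:
        if b < 1:
            raise ValueError("each node on the path must have at least one move")
        weight *= per_node(b)
    return weight
```

For example, `branching_discount([4, 2, 1])` is 0.125. Unlike $\gamma^t$, a forced move ($b_k = 1$) leaves the weight unchanged, so plies without a choice add no discount.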
- Are there any established discount factors based on the branching factor of an evaluated node, or on the number of branches at the nodes leading to it?
If not, I'd welcome thoughts on how this might be applied.
One initial thought is to divide 1 by the number of branches and add that value to a state's goodness. I already use this technique for heuristic tie-breaking with no look-ahead, but it's a "value-add" rather than a discount.
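The tie-breaking technique of adding $1/\text{branches}$ to a state's goodness can be sketched in a few lines. This assumes each candidate move is summarized by a hypothetical `(goodness, branches)` pair for the resulting state. The names `tie_break_score` and `pick_move` are illustrative:

```python
from typing import Dict, Tuple


def tie_break_score(goodness: float, branches: int) -> float:
    """Goodness plus 1/branches: among equally good states, fewer branches wins."""
    if branches < 1:
        raise ValueError("score terminal states (no branches) separately")
    return goodness + 1.0 / branches


def pick_move(candidates: Dict[str, Tuple[float, int]]) -> str:
    """Return the move whose resulting state has the highest tie-break score.

    candidates maps a (hypothetical) move label to
    (goodness of the resulting state, number of branches in that state).
    """
    return max(candidates, key=lambda move: tie_break_score(*candidates[move]))
```

With `{"a": (3, 5), "b": (3, 2), "c": (2, 1)}`, `pick_move` returns `"b"`. Because $1/b$ lies in $(0, 1]$, the bonus can only reorder states whose goodness differs by less than 1, so it behaves as a pure tie-breaker when goodness is integer-valued. A discount would instead multiply the goodness.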
For context, this is for a form of partisan Sudoku, where an expressed position $p_x$ (value, coordinates) typically removes some number of potential positions $p$ from the gameboard. (Without the addition of an element displacement mechanic, the number of branches can never increase.)
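On a $(3^2)^2$ Sudoku, the first $p_x$ removes $29$ of the $729$ potential positions $p$, including itself. These are all $9$ values in its cell, plus the same value in the cell's $20$ peers ($8$ in its row, $8$ in its column, and $4$ more in its box).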
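The elimination count can be checked by brute force. The sketch below assumes standard Sudoku constraints (one value per cell; each value once per row, column, and box) and represents a potential position $p$ as a hypothetical 0-based `(value, row, col)` triple on an $(n^2)^2$ board:

```python
from itertools import product
from typing import Set, Tuple

Candidate = Tuple[int, int, int]  # (value, row, col), all 0-based


def all_candidates(n: int) -> Set[Candidate]:
    """Every (value, row, col) triple on an (n^2)^2 board: n^6 in total."""
    side = n * n
    return set(product(range(side), repeat=3))


def eliminated_by(placement: Candidate, n: int) -> Set[Candidate]:
    """Candidates removed by expressing `placement` (itself included),
    assuming standard Sudoku constraints on an otherwise empty board."""
    v, r, c = placement
    removed = set()
    for cand in all_candidates(n):
        v2, r2, c2 = cand
        same_cell = (r2, c2) == (r, c)
        same_unit = (
            r2 == r
            or c2 == c
            or (r2 // n == r // n and c2 // n == c // n)
        )
        # Any value in the same cell, or the same value in a shared row/column/box.
        if same_cell or (v2 == v and same_unit):
            removed.add(cand)
    return removed
```

For $n = 3$ this gives 729 candidates and 29 eliminations from any first placement. In general a first placement removes $n^2 + 2(n^2 - 1) + (n - 1)^2$ candidates, which the function can spot-check for other orders.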
With each $p_x$, the number of branches diminishes until the game collapses into a tractable state, allowing for perfect play in endgames. [Even there, a discounting function may have some utility because outcomes are sets of ratios. Where the macro metric is territorial (controlled regions at the end of play), the most meaningful metric may ultimately be "efficiency" (loosely, "points_expended to regions_controlled"). This acknowledges a benefit to expending the fewest points $p_x$, even in a tractable endgame where the ratio of controlled regions cannot be altered. Additionally, zugzwangs are possible in the endgame, and in that case reversing the discount to maximize branches may be useful.]
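The efficiency idea can be made concrete as a ratio. The sketch below rests on three assumptions. First, "points_expended to regions_controlled" is read as points expended divided by regions controlled, so lower is better. Second, controlled regions stay the primary criterion. Third, each candidate endgame line is summarized by a hypothetical `(label, regions_controlled, points_expended)` tuple. `Fraction` keeps the ratios exact:

```python
from fractions import Fraction
from typing import List, Tuple

Line = Tuple[str, int, int]  # (label, regions_controlled, points_expended)


def efficiency(points_expended: int, regions_controlled: int) -> Fraction:
    """Points expended per region controlled; lower is more efficient."""
    if regions_controlled < 1:
        raise ValueError("undefined when no regions are controlled")
    return Fraction(points_expended, regions_controlled)


def best_line(lines: List[Line]) -> Line:
    """Prefer more regions; among equal region counts, prefer lower points per region."""
    return min(lines, key=lambda line: (-line[1], efficiency(line[2], line[1])))
```

Take the lines `("a", 5, 12)`, `("b", 5, 10)` and `("c", 4, 6)`. Here `best_line` returns line b: a and b control the same 5 regions, and b spends 2 points per region against a's 12/5. Every line passed in must control at least one region, since the ratio is undefined otherwise.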
$(3^2)^2$ denotes a $3 \times 3$ grid of $3 \times 3$ regions, i.e., a "9x9" board. The exponent notation is preferred so as not to restrict the number of dimensions.