I'm trying to optimize the expected return from a game of chance, but I've quickly realized the problem outclasses the introductory AI course I took in college years ago. I would appreciate any guidance on which algorithms might be relevant to this type of game.
Game Description
The player has 31 tiles that they can flip over to reveal a hidden color. There are 6 blue tiles, 4 red tiles, and possibly one gold tile (the rest are blank). The player is granted only 11 attempts to turn over tiles. Turning over all tiles of a single color yields a reward (a different reward for each color); uncovering no complete set of tiles yields 0 reward. Due to illustrations on the tiles, revealing a single red or blue tile also tells the player the locations of the other tiles of that color (i.e. if I flip over one red tile, I know which three other tiles would be red if I were to flip them over too).
The distribution of colors across these 31 tiles is not uniform (but can be assumed known). Further, there are only ~100 possible tile color configurations, and these configurations have all been discovered by the players of this game.
(Concretely, the game is Faux Hollows from the video game FFXIV; the above description merely removes the flavortext.)
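To make the setup concrete, here is a rough sketch of how I'm picturing a single board configuration; the tile indices and reward values below are placeholders I made up, not actual game data:

```python
from dataclasses import dataclass

# Illustrative only: tile indices and reward values are made up,
# not actual Faux Hollows data.
@dataclass(frozen=True)
class Configuration:
    blue: frozenset   # indices (0-30) of the 6 blue tiles
    red: frozenset    # indices of the 4 red tiles
    gold: frozenset   # index of the gold tile, or empty if this board has none

REWARDS = {"blue": 10, "red": 30, "gold": 100}  # placeholder reward values

example = Configuration(
    blue=frozenset({0, 1, 2, 3, 4, 5}),
    red=frozenset({10, 11, 12, 13}),
    gold=frozenset({20}),
)
```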
My objective is to maximize the expected reward over multiple plays of the game; I want a program that, given a board state, tells me the optimal tile to flip.
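(Roughly, I'd like a function with an interface like the following; the name and signature are just illustrative:)

```python
def best_flip(revealed: frozenset, flips_left: int) -> int:
    """Return the index (0-30) of the tile to flip next so as to maximize
    expected reward. Placeholder signature for the program I'm after."""
    raise NotImplementedError
```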
What I've considered
My initial thought was to treat this as a Markov Decision Process, since we change state, receive some value/reward based on the state change, and I want a policy that maximizes the expected reward. However, the only way I could think to model the state space (i.e. state = which tiles have been flipped and what color they are) yields over a hundred million states from the flip-sets alone ($\sum_{1\leq k\leq 11}\binom{31}{k}\approx 1.6\times 10^8$, before even distinguishing the revealed colors), which is (AFAIK) intractable for classical MDP solvers.
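(For reference, that sum works out as follows, even before multiplying by the possible color reveals:)

```python
from math import comb

# Number of distinct "which tiles have been flipped" sets after 1 to 11 flips.
flip_sets = sum(comb(31, k) for k in range(1, 12))
print(flip_sets)  # 160645503, i.e. ~1.6e8
```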
I thought maybe Reinforcement Learning could help, but I'm not deeply familiar with it. My cursory glance at Q-learning suggested that high-cardinality state spaces are handled by approximating the value function, but --- because very few of the game's states give any reward --- I worry about how well that approximation will work. (Also, I'm not sure how I would incorporate the 11 tile-flip limit.)
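(My current guess for the flip limit is that it could simply be folded into the state, something like the sketch below, so that running out of flips becomes a terminal state --- but I don't know whether that's the standard approach.)

```python
from typing import NamedTuple

class State(NamedTuple):
    revealed: frozenset   # set of (tile_index, color) pairs observed so far
    flips_left: int       # starts at 11

def is_terminal(state: State) -> bool:
    # Once the flip budget is exhausted, no further actions are possible.
    return state.flips_left == 0
```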
I also considered using a decision tree to identify which of the ~100 board configurations is currently in play, but this doesn't optimize for reward at all; it merely finds the fewest moves needed to determine which color configuration I have.
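(For what it's worth, narrowing down which configuration I'm facing seems straightforward with a consistency filter over the known layouts, as sketched below, reusing the `Configuration` sketch above; `KNOWN_CONFIGURATIONS` stands in for the community-sourced list of ~100 boards. The part I can't see is how to choose flips that maximize expected reward rather than just information.)

```python
def color_at(config: Configuration, tile: int) -> str:
    """Color that a given configuration assigns to a tile (hypothetical helper)."""
    if tile in config.blue:
        return "blue"
    if tile in config.red:
        return "red"
    if tile in config.gold:
        return "gold"
    return "blank"

def consistent(config: Configuration, revealed: frozenset) -> bool:
    """True if a known configuration matches every (tile, color) revealed so far."""
    return all(color_at(config, tile) == color for tile, color in revealed)

# candidates = [c for c in KNOWN_CONFIGURATIONS if consistent(c, revealed)]
```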
Could I get some recommendations for algorithms I should learn or consider for finding an optimal strategy for this game?