Training a sequential model that can only evaluate after several hundred cycles

Question

I'm attempting to build a neural network to play the card game Lost Cities.

A brief overview of the game:

The game involves two players taking turns to play cards on expeditions.
Expeditions incur a debt when you play the first card. Subsequent cards will buy out of that debt and return a profit.
Each player must either play or discard a card (with restrictions), and then draw a new card from the deck or the discard pile.
The game ends when the last card is drawn.

The expanded rules can be found here.

I'm attempting to train a sequential model to play this game, with the state of the board, hand, and discard piles as the inputs and three heuristic values for each card (the value of playing it, the value of discarding it, and the value of picking it up from its discard pile) as the outputs. However, it's unclear to me how I should approach this from a training perspective.
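To make the output layout concrete, here is a minimal sketch of how the three values per card could be turned into the first half of a move (play or discard). The names `ACTIONS` and `choose_card_action`, and the `legal_mask` argument standing in for the game's restrictions, are hypothetical and only for illustration; the network itself is not shown.

```python
from typing import List, Sequence, Tuple

# Column order of the network output for each card (an assumption for this sketch).
ACTIONS = ("play", "discard", "pick_up")


def choose_card_action(
    values: Sequence[Sequence[float]],
    legal_mask: Sequence[Sequence[bool]],
) -> Tuple[int, str]:
    """Return (card index, action) with the highest value among legal plays/discards.

    values[i]     -> [play value, discard value, pick-up value] for card i
    legal_mask[i] -> same shape, True where the move is allowed by the rules
    """
    best = None  # (value, card index, action name)
    for card_idx, (row, mask) in enumerate(zip(values, legal_mask)):
        for action_idx in (0, 1):  # only "play" and "discard" use a card from hand
            if mask[action_idx] and (best is None or row[action_idx] > best[0]):
                best = (row[action_idx], card_idx, ACTIONS[action_idx])
    if best is None:
        raise ValueError("no legal play or discard available")
    return best[1], best[2]


example_values: List[List[float]] = [[0.2, 0.5, 0.0], [0.9, 0.1, 0.0]]
example_mask: List[List[bool]] = [[True, True, False], [False, True, False]]
print(choose_card_action(example_values, example_mask))
```

In this example the highest raw value (0.9, playing the second card) is masked out as illegal, so the first card is discarded instead. The pick-up column would be read the same way for the draw half of the turn.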

My biggest hurdle is that the network's success can only be evaluated by whether or not it can "beat" itself or a competing network in a game. A player's raw score during the game is not a viable signal due to the nature of play. To me, this means that the network will have to be run on several hundred different inputs (one for each game state over the turns of a match) before any meaningful feedback is generated.

So far, the only solution I've had minor success with is a generational algorithm that creates "fuzzy" children to compete against each other for the most "wins". This was written using only Python's standard library, and was obscenely slow, even with a reduced sequential network.
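For reference, the generational approach is roughly of the following shape. This is a simplified sketch rather than my actual code: the network is reduced to a flat list of weights, and `play_match` is a hypothetical callback that plays one full game between two weight sets and returns `0` if the first wins or `1` if the second wins. The names `mutate`, `next_generation`, and the parameters `sigma` and `n_children` are likewise illustrative.

```python
import random
from typing import Callable, List

Weights = List[float]


def mutate(weights: Weights, sigma: float = 0.1, rng=random) -> Weights:
    """Create a "fuzzy" child by adding Gaussian noise to every weight."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def next_generation(
    parent: Weights,
    play_match: Callable[[Weights, Weights], int],
    n_children: int = 8,
    sigma: float = 0.1,
    rng=random,
) -> Weights:
    """Return the candidate with the most round-robin wins.

    The parent competes alongside its children, so a generation
    never replaces it with a candidate that won fewer matches.
    """
    candidates = [parent] + [mutate(parent, sigma, rng) for _ in range(n_children)]
    wins = [0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            # Feedback arrives only here, after a whole game has been played.
            winner = play_match(candidates[i], candidates[j])
            wins[i if winner == 0 else j] += 1
    best = max(range(len(candidates)), key=wins.__getitem__)
    return candidates[best]


# Toy stand-in for a real game: the weight set with the larger sum "wins".
def toy_match(a: Weights, b: Weights) -> int:
    return 0 if sum(a) >= sum(b) else 1


champion = [0.0] * 4
for _ in range(5):
    champion = next_generation(champion, toy_match, rng=random.Random(0))
```

Each generation plays every candidate against every other, so the number of full games grows quadratically with `n_children`, which is a large part of why this was so slow.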

My question:

Is there an established method to deliver delayed feedback to a network (after several uses of the network)?

I'm very new at this, so any and all feedback is more than welcomed.

Hello, welcome to Artificial Intelligence Stack Exchange. I didn't read your question in detail, but it seems to me like a programming question about TensorFlow. If so, please note that general programming questions are off-topic for our site. Please read [here](https://ai.stackexchange.com/help/on-topic) for more details; you can also ask on [CV.SE](https://stats.stackexchange.com/). — hanugm, Aug 09 '21 at 22:59
Hello @hanugm, thanks for your response. I've edited my question to specifically ask about the high-level architecture for my network, rather than the specifics regarding the TensorFlow platform. Sorry for the inconvenience. — Justin Becker, Aug 09 '21 at 23:04


0 Answers