
Following on from my other (answered) question:

With regard to a Markov process (chain), if the environment is a board game and its states are the various positions the game pieces may be in, how would the transition probability matrix be initialised? And how would it be updated (if it is updated at all)?

nbro
mason7663
  • Check out this fun project on [reddit](https://www.reddit.com/r/3Blue1Brown/comments/l3cckw/probability_distribution_in_monopoly_using_markov/?utm_medium=android_app&utm_source=share). It tries to calculate the probability of arriving at each square in the game of Monopoly. – Enes Jul 26 '21 at 07:15

1 Answer


The transition probability matrix cannot simply be initialised by hand. Your game world has rules, and since we do not know these rules, we can only approximate them. To do this, we run the game over and over and collect the outcome of each action.

Here is a toy example: assume we control a plane. We want the plane to fly forward, but sometimes there is wind blowing in a random direction, so the plane won't always end up in the desired position. Instead, it has some probability of going left, right, or forward. After the data is collected, we can estimate these probabilities from the observed frequencies.
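A minimal sketch of that estimation, assuming a hypothetical log of where the plane actually ended up each time it tried to fly forward:

```python
from collections import Counter

# Hypothetical flight log: the observed outcome each time the plane
# attempted the action "fly forward" under random wind.
observed_outcomes = [
    "forward", "forward", "left", "forward", "right",
    "forward", "left", "forward", "forward", "right",
]

# Estimate P(outcome | action="forward") by relative frequency.
counts = Counter(observed_outcomes)
total = len(observed_outcomes)
transition_probs = {state: n / total for state, n in counts.items()}

print(transition_probs)
# e.g. {'forward': 0.6, 'left': 0.2, 'right': 0.2}
```

With more logged episodes, these frequency estimates converge to the environment's true transition probabilities; this row of the estimated matrix corresponds to the single action "fly forward".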

Another example is estimating an n-gram model: assume we want to train a model that suggests the next word in a search field. We would analyse a corpus and count all two-word sequences (for a bigram model). Then, if a user starts typing I, the model would suggest the word am, because I am occurs more frequently than, say, I did, and hence the transition probability from I to am is greater.
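The bigram estimation can be sketched like this (using a tiny made-up corpus; a real model would be trained on a large text collection):

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus, already tokenised.
corpus = "i am happy . i am here . i did it . i am ready".split()

# Count bigrams: pairs of consecutive words.
bigram_counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

# Transition probability P(w2 | w1) = count(w1 w2) / count(w1 followed by anything).
def transition_prob(w1, w2):
    total = sum(bigram_counts[w1].values())
    return bigram_counts[w1][w2] / total if total else 0.0

# "i am" occurs more often than "i did", so "am" gets the higher probability.
print(transition_prob("i", "am"))   # 0.75
print(transition_prob("i", "did"))  # 0.25
```

Each row of the bigram table is exactly one row of a Markov transition matrix over words, estimated from counts in the same way as the plane example.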

Here is a good book on Language Modeling.

But your question suggests to me that you want something like reinforcement learning. Check out this video, which is part of a great lecture series on RL by David Silver.

Aray Karjauv