I am trying to model the following problem as a Markov decision process.
In the steel melting shop of a steel plant, iron pipes are used. These pipes rust over time. Adding an anti-rusting solution can delay rusting. If there is too much rust, we have to clean the pipe mechanically.
I have categorized the rust levels into four states, StateA, StateB, StateC, and StateD, with rust increasing from A to D. StateA is the clean state with almost no rust.
  StateA   ->   StateB -> StateC -> StateD
 ∆   ∆   ∆        |         |         |
 |   |   |        |         |         |
Mnt Mnt Mnt       |         |         |
 |   |   |_clean__|         |         |
 |   |________clean_________|         |
 |____________________________________|
                  clean
We can take 3 possible actions:
- No Maintenance
- Clean
- Adding Anti Rusting Agent
The transition probabilities are given below. The pipe degrades from StateA towards StateD as rust builds up, and each transition probability is the chance of moving to the next, rustier state. Adding the anti-rusting agent decreases the probability of degradation.
- StateA to StateB: 0.6 with No Maintenance, 0.5 with the anti-rusting agent
- StateB to StateC: 0.7 with No Maintenance, 0.6 with the anti-rusting agent
- StateC to StateD: 0.8 with No Maintenance, 0.7 with the anti-rusting agent
The Clean action moves the pipe from any state to StateA with probability 1.
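To make these numbers concrete, one way they might be written down in Python is as one row-stochastic matrix per action. This is only a sketch under an assumption not stated above: whatever probability is left over keeps the pipe in its current state (for example, StateA stays StateA with probability 0.4 under No Maintenance), and StateD cannot degrade further. The names `STATES`, `TRANSITIONS`, and `degradation_matrix` are made up for illustration.

```python
STATES = ["StateA", "StateB", "StateC", "StateD"]


def degradation_matrix(p_ab, p_bc, p_cd):
    """Build a 4x4 transition matrix (rows: current state, columns: next state).

    Assumption: a pipe that does not degrade stays in its current state,
    and StateD is absorbing unless the pipe is cleaned.
    """
    return [
        [1 - p_ab, p_ab, 0.0, 0.0],
        [0.0, 1 - p_bc, p_bc, 0.0],
        [0.0, 0.0, 1 - p_cd, p_cd],
        [0.0, 0.0, 0.0, 1.0],
    ]


TRANSITIONS = {
    "No Maintenance": degradation_matrix(0.6, 0.7, 0.8),
    "Anti Rusting Agent": degradation_matrix(0.5, 0.6, 0.7),
    # Clean sends every state back to StateA with probability 1.
    "Clean": [[1.0, 0.0, 0.0, 0.0] for _ in STATES],
}

# Sanity check: every row must be a probability distribution.
for action, matrix in TRANSITIONS.items():
    for row in matrix:
        assert abs(sum(row) - 1.0) < 1e-9, action
```

If the leftover probability is meant to go somewhere else (for example, skipping a state), only `degradation_matrix` would need to change.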
The rewards are 0.6 for StateA, 0.5 for StateB, 0.4 for StateC, and 0.3 for StateD. The Clean action leads to the Maintenance (Mnt) state, which has a reward of 0.1. Cleaning improves productivity afterwards, which is good, but the shop is shut down during maintenance, so production is lost. That is why the reward for Mnt is lower.
I am new to MDPs. Could anyone help me use an MDP, with a Python implementation, to decide when we should clean: at StateB, at StateC, or at StateD?
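For concreteness, here is a minimal value-iteration sketch of the kind of implementation I have in mind. It rests on several assumptions that are not part of the problem statement: a discount factor `GAMMA = 0.9`; leftover probability keeps the pipe in its current state; and the Mnt state is folded into the Clean action, so choosing Clean earns 0.1 for that step and the pipe is in StateA at the next step. All function and variable names are hypothetical.

```python
STATES = ["StateA", "StateB", "StateC", "StateD"]
ACTIONS = ["No Maintenance", "Anti Rusting Agent", "Clean"]
STATE_REWARD = [0.6, 0.5, 0.4, 0.3]  # reward for operating in each state
CLEAN_REWARD = 0.1                   # reward of the Mnt step
GAMMA = 0.9                          # assumed discount factor


def degradation_matrix(p_ab, p_bc, p_cd):
    # Assumption: a pipe that does not degrade stays where it is.
    return [
        [1 - p_ab, p_ab, 0.0, 0.0],
        [0.0, 1 - p_bc, p_bc, 0.0],
        [0.0, 0.0, 1 - p_cd, p_cd],
        [0.0, 0.0, 0.0, 1.0],
    ]


P = {
    "No Maintenance": degradation_matrix(0.6, 0.7, 0.8),
    "Anti Rusting Agent": degradation_matrix(0.5, 0.6, 0.7),
    "Clean": [[1.0, 0.0, 0.0, 0.0] for _ in STATES],
}


def reward(s, action):
    """Immediate reward for taking `action` in state index `s`."""
    return CLEAN_REWARD if action == "Clean" else STATE_REWARD[s]


def q_value(s, action, values):
    """Immediate reward plus discounted expected value of the next state."""
    expected_next = sum(p * v for p, v in zip(P[action][s], values))
    return reward(s, action) + GAMMA * expected_next


def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = [0.0] * len(STATES)
    while True:
        new_values = [
            max(q_value(s, a, values) for a in ACTIONS)
            for s in range(len(STATES))
        ]
        if max(abs(n - o) for n, o in zip(new_values, values)) < tol:
            return new_values
        values = new_values


values = value_iteration()
# Greedy policy: in each state, pick the action with the highest Q-value.
policy = {
    STATES[s]: max(ACTIONS, key=lambda a: q_value(s, a, values))
    for s in range(len(STATES))
}

for state, action in policy.items():
    print(f"{state}: {action} (value {values[STATES.index(state)]:.3f})")
```

Because the anti-rusting agent has no cost in the rewards as stated, this sketch has no reason to ever prefer No Maintenance over it. Giving the agent a cost, or changing `GAMMA`, may change which state the policy cleans at.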