
I am trying to model the following problem as a Markov decision process.

In the steel melting shop of a steel plant, iron pipes are used. These pipes develop rust over time. Adding an anti-rusting solution can delay the rusting process. If there is too much rust, we have to clean the pipe mechanically.

I have categorized the rusting states as StateA, StateB, StateC, and StateD, with rusting increasing from A to D. StateA is the absolutely clean state with almost no rust.

     StateA -> StateB -> StateC -> StateD      (rusting)

     StateB --clean--> Mnt --> StateA
     StateC --clean--> Mnt --> StateA
     StateD --clean--> Mnt --> StateA

We can take 3 possible actions:

  • No Maintenance
  • Clean
  • Add Anti-Rusting Agent

The transition probabilities are given below. The pipe degrades from StateA toward StateD, and the amount of rusting per step is captured by the transition probabilities. Adding the anti-rusting agent decreases the probability of degrading to the next state:

     From     To       No Maintenance   Anti-Rusting Agent
     StateA   StateB   0.6              0.5
     StateB   StateC   0.7              0.6
     StateC   StateD   0.8              0.7

With the remaining probability the pipe stays in its current state. The Clean action will move any state to StateA with probability 1 (via the Maintenance state described below).

The rewards are 0.6 for StateA, 0.5 for StateB, 0.4 for StateC, and 0.3 for StateD. The Clean action leads to a Maintenance (Mnt) state, which has a reward of 0.1. Maintenance increases productivity after cleaning, which is good, but the plant is shut down during maintenance, so there is a loss of production; that is why its reward is low.
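
For concreteness, here is one way the model above could be encoded in plain Python. The self-transitions (stay in the current state with the remaining probability), the absorbing StateD, and Mnt always returning to StateA on the next step are assumptions that complete the probabilities given in the question:

    # Transition model P[action][state] = {next_state: probability}.
    # Self-transitions, the absorbing StateD, and Mnt -> StateA are
    # assumptions that complete the probabilities stated in the question.
    P = {
        "NoMaint": {
            "StateA": {"StateB": 0.6, "StateA": 0.4},
            "StateB": {"StateC": 0.7, "StateB": 0.3},
            "StateC": {"StateD": 0.8, "StateC": 0.2},
            "StateD": {"StateD": 1.0},
            "Mnt":    {"StateA": 1.0},
        },
        "AntiRust": {
            "StateA": {"StateB": 0.5, "StateA": 0.5},
            "StateB": {"StateC": 0.6, "StateB": 0.4},
            "StateC": {"StateD": 0.7, "StateC": 0.3},
            "StateD": {"StateD": 1.0},
            "Mnt":    {"StateA": 1.0},
        },
        "Clean": {
            "StateA": {"Mnt": 1.0},
            "StateB": {"Mnt": 1.0},
            "StateC": {"Mnt": 1.0},
            "StateD": {"Mnt": 1.0},
            "Mnt":    {"StateA": 1.0},
        },
    }

    # Per-state rewards from the question.
    R = {"StateA": 0.6, "StateB": 0.5, "StateC": 0.4, "StateD": 0.3, "Mnt": 0.1}

    # Sanity check: every row must be a probability distribution.
    for a, rows in P.items():
        for s, row in rows.items():
            assert abs(sum(row.values()) - 1.0) < 1e-12, (a, s)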

I am new to MDPs. It would be helpful if someone could show me how to obtain the cleaning decision from this MDP through a Python implementation. Should we clean at StateB, at StateC, or at StateD?
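
For reference, below is a minimal value-iteration sketch for this model using NumPy. It rebuilds the same transition arrays compactly under the completion assumptions stated above, and the discount factor gamma = 0.9 is an additional assumption, since the question does not specify one:

    import numpy as np

    states  = ["StateA", "StateB", "StateC", "StateD", "Mnt"]
    actions = ["NoMaint", "AntiRust", "Clean"]
    A, B, C, D, M = range(5)

    degrade = [(0.6, 0.7, 0.8),    # NoMaint:  P(A->B), P(B->C), P(C->D)
               (0.5, 0.6, 0.7)]    # AntiRust: P(A->B), P(B->C), P(C->D)
    P = np.zeros((3, 5, 5))        # P[a, s, s'] = transition probability
    for a, probs in enumerate(degrade):
        for s, p in zip((A, B, C), probs):
            P[a, s, s + 1], P[a, s, s] = p, 1.0 - p   # degrade or stay put (assumed)
        P[a, D, D] = 1.0           # StateD stays rusted until cleaned (assumed)
    P[2, :, M] = 1.0               # Clean: any state -> Mnt with probability 1
    P[:, M] = 0.0
    P[:, M, A] = 1.0               # Mnt -> StateA under every action (assumed)

    R = np.array([0.6, 0.5, 0.4, 0.3, 0.1])   # state rewards from the question

    gamma, theta = 0.9, 1e-8       # discount factor is an assumption
    V = np.zeros(5)
    while True:                    # value iteration
        Q = R + gamma * (P @ V)    # Q[a, s] = R(s) + gamma * E[V(next state)]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < theta:
            break
        V = V_new

    for s, name in enumerate(states):
        print(f"{name}: best action = {actions[Q[:, s].argmax()]}, V = {V[s]:.3f}")

Running this prints, for each state, the action that maximizes the expected discounted reward, which directly answers whether cleaning pays off at StateB, StateC, or StateD under these numbers.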

shan
  • Hello. Welcome to Artificial Intelligence Stack Exchange! Did you choose the transition probabilities and rewards arbitrarily? To me, it's also not clear what your question really is. Are you asking which methods you should use to find a policy to this defined MDP? Are you aware of value iteration or policy iteration? Please, put your **specific** question in the title to clarify what your question is. – nbro Oct 09 '21 at 23:53
    Your MDP definition is not complete. For instance, you ask about a "decision about when should we clean", but there is no action to clean defined anywhere, and no transition probabilities go to the Clean state. In addition there seems to be no other consequence to adding anti-rusting agents, so not at all clear why you would ever *not* use them. Finally "Clean" looks like an end state, as if nothing happens afterwards, which seems odd. I think you probably need help with how to construct the MDP for your problem (which should be a different q) before you can ask a question about solving it. – Neil Slater Oct 10 '21 at 06:41
  • You have improved things, but it is still incomplete. There seems to be no reason to choose between adding the anti-rust agent or doing nothing - one or the other will always be optimal. The optimal policy appears to always be "do nothing", since the agent will arrive in the highest-rewarding StateD the fastest, then stay there (unless there are other transitions from StateD that you are not showing). – Neil Slater Oct 11 '21 at 06:36
  • Thanks for responding @NeilSlater – shan Oct 11 '21 at 07:28
  • @NeilSlater To answer your query, the addition of the anti-rusting agent will delay the rusting process. So the probability of moving to the next degraded state decreases with the addition of the anti-rusting agent, compared to doing nothing. – shan Oct 11 '21 at 09:56
  • @nbro I have edited the question to clarify the objective. The Markov decision process should determine whether to take the clean action at StateB, StateC, or StateD. It would be helpful if I could get help in terms of Python code for the MDP. – shan Oct 11 '21 at 10:04
  • @shan To clarify, the MDP is just a mathematical model that represents your problem. The solution to this problem (the MDP) is a policy. So, "The Markov Decision Process should determine whether to do the clean action" does not make sense. What you want is an algorithm that is able to find a policy for the MDP. So, that's why I asked you: are you familiar with, e.g., _value iteration_? – nbro Oct 11 '21 at 10:06
  • @shan: I understand the probabilities are changed, that is clear from your description. However, the rest of the MDP so far makes "do nothing, always" the clear optimal policy, because landing in StateD gives the largest reward. The agent will want to get to StateD as fast as possible, then stay there. I think some further description about the rewards, and how you determined them might help clarify that. I still think the issue you face right now is understanding how to turn your problem into a MDP, not yet how to optimise the agent. – Neil Slater Oct 11 '21 at 10:12
  • @nbro: Thanks for the response. Sorry for not making myself clear. The solution to the above problem can be an optimal policy for the MDP, found through value iteration, that will help us decide whether to take the clean action at StateB, StateC, or StateD. – shan Oct 13 '21 at 07:20
  • @NeilSlater: I have edited and accommodated your suggestion. My mistake: StateD is the most degraded state, so its reward is the lowest (or negative). StateC and StateB are better states. The Mnt state has a lower reward because of the shutdown. Eager to see a Python implementation of the above problem that gives the optimal decision. – shan Oct 13 '21 at 07:55

0 Answers