Yes and No.
Yes, it is possible to achieve that result. But instead of a self-optimizing Neural Network, I'd recommend another approach:
1. Don't let training time interfere.
If you are trying to train the agent DURING the environment runtime, that's probably the problem.
Training usually takes much longer than inference, and deployed models usually don't train, so this won't be a problem in production.
You can do 2 things about that:
1.1 "Pause" the game during training.
It might look like "cheating", but from your agent's point of view, it isn't actually playing during training time. And once again, this just simulates how it would behave in production.
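As a minimal sketch of that idea (the `env` and `agent` objects and their methods here are hypothetical stand-ins for whatever your framework provides): the environment clock is frozen around the training step, so gradient updates never eat into in-game reaction time.

```python
class PausableTrainingLoop:
    """Sketch: freeze the environment clock while the agent trains,
    so training time never counts against in-game reaction time.
    `env` and `agent` are hypothetical objects with the methods shown."""

    def __init__(self, env, agent):
        self.env = env
        self.agent = agent

    def step(self, state):
        action = self.agent.act(state)           # fast: inference only
        next_state, reward, done = self.env.step(action)

        self.env.pause()                         # stop the game clock
        self.agent.train_on(state, action, reward, next_state)
        self.env.resume()                        # the game never "saw" the delay
        return next_state, done
```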
But if you can't pause it:
1.2 Disable training during runtime.
Store all states and decisions. Wait until the game is over and then you train the whole batch.
Professional chess players don't try to learn during a blitz challenge. But they do study their own games later to learn from their mistakes.
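That "study your games later" pattern can be sketched as a simple episode buffer: appending a transition during play is cheap, and all the expensive gradient work happens after the game. The `train_fn` callable here is a hypothetical hook for whatever training step your framework exposes.

```python
class EpisodeBuffer:
    """Sketch: record every (state, action, reward) during play,
    then hand the whole episode to a training step after the game ends."""

    def __init__(self):
        self.transitions = []

    def record(self, state, action, reward):
        # O(1) append: negligible cost during the live game
        self.transitions.append((state, action, reward))

    def train_after_game(self, train_fn):
        # All the expensive gradient work happens here, offline
        batch = self.transitions
        self.transitions = []
        return train_fn(batch)
```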
2. Optimizing hyper-parameters for speed.
You could tweak some hyper-parameters (like the size of your NN) looking for a faster model. Keep in mind it would still be an algorithm that runs in a roughly fixed time, but you might find a configuration that is always fast enough for your deadline.
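To make the size/speed trade-off concrete, here is a rough, illustrative benchmark (pure NumPy, not any particular framework) of forward-pass latency for dense ReLU networks of different widths. Shrinking the hidden layers directly shrinks inference time.

```python
import time
import numpy as np

def mlp_latency(layer_sizes, trials=50):
    """Rough benchmark: average forward-pass time of a dense ReLU
    network with the given layer sizes (illustrative only)."""
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((a, b))
               for a, b in zip(layer_sizes, layer_sizes[1:])]
    x = rng.standard_normal(layer_sizes[0])

    start = time.perf_counter()
    for _ in range(trials):
        h = x
        for w in weights:
            h = np.maximum(h @ w, 0.0)  # dense layer + ReLU
    return (time.perf_counter() - start) / trials

# A smaller network trades some accuracy for a faster decision:
t_big = mlp_latency([64, 256, 256, 4])
t_small = mlp_latency([64, 32, 4])
```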
2.1 Using Machine Learning for this Optimization
There are some meta-learning techniques and other methods like NEAT that can automate your search for a simple, effective topology. NEAT already rewards the simplest architectures, penalizing complexity (which usually correlates with running time), but you could also make it consider running time explicitly.
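One way to make running time explicit is in the fitness function itself. The sketch below assumes a NEAT-style setup where you control fitness evaluation (libraries such as neat-python let you define it freely); `genome_eval` and `play_game` are hypothetical hooks standing in for your candidate network and your environment.

```python
import time

def fitness_with_latency_penalty(genome_eval, play_game, latency_weight=10.0):
    """Sketch of a NEAT-style fitness that explicitly penalizes slow inference.
    `genome_eval(state)` is the candidate network's forward pass;
    `play_game(policy)` runs a game with that policy and returns the score."""
    timings = []

    def timed_policy(state):
        start = time.perf_counter()
        action = genome_eval(state)
        timings.append(time.perf_counter() - start)
        return action

    score = play_game(timed_policy)
    avg_latency = sum(timings) / len(timings) if timings else 0.0
    # Reward the score, subtract a cost proportional to decision time
    return score - latency_weight * avg_latency
```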
3. Another Network for Another Task
You could make another small network that decides whether the next move needs to be accurate or fast. Based on its output, the agent chooses between precision and speed. This choice could set a flag (like enabling branch prediction) or even select a whole different algorithm:
    NeedForSpeed = TimeEstimation(state)
    # Sorry, I couldn't resist the pun!

    if NeedForSpeed > 0.8:
        decision = agent.instantReactionDecisionTree(state)
    elif NeedForSpeed > 0.5:
        decision = agent.decideStandard(state, branchPrediction=True)
    elif NeedForSpeed > 0.2:
        decision = agent.decideStandard(state)
    else:
        decision = agent.DeepNN(state)
Bonus: Use Other ML algorithms
Some algorithms have explicit parameters that directly control the time-versus-precision trade-off.
For instance, MCTS (Monte Carlo Tree Search) can keep running until it has explored all possibilities, but you can stop it at any point and take the best solution found so far.
So, one possibility would be trying some other method instead of Neural Networks.
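The key property is that such "anytime" algorithms always have an answer ready. A minimal sketch in that spirit (simplified flat rollouts rather than full MCTS; `simulate` is a hypothetical rollout function returning an estimated value for a move):

```python
import time
import random

def anytime_search(candidate_moves, simulate, time_budget_s=0.05):
    """Anytime-style search: keep running rollouts until the time
    budget expires, then return the best move found so far."""
    totals = {m: 0.0 for m in candidate_moves}
    counts = {m: 0 for m in candidate_moves}
    deadline = time.perf_counter() + time_budget_s

    while time.perf_counter() < deadline:
        move = random.choice(candidate_moves)
        totals[move] += simulate(move)   # one rollout from this move
        counts[move] += 1

    # Best average value among moves that were actually sampled
    sampled = [m for m in candidate_moves if counts[m] > 0]
    if not sampled:
        return candidate_moves[0]        # budget too small: fall back
    return max(sampled, key=lambda m: totals[m] / counts[m])
```

Give it a large budget and it behaves like exhaustive search; give it a tiny one and it still returns a usable move on time.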