What is the best way to train a model and run inference when the correct output depends heavily on context?
For example, in the image below, all of the people are standing upright, but because of the camera's perspective, their location in the frame strongly affects their 2D skeletal pose. If the 2D skeleton inferred for the person on the right appeared where the middle person is in pixel space, it should not be considered upright, even though it is correctly considered upright where it is now.
I assume the location would somehow be fed in during both training and inference, but I don't know the names of the techniques that should be used. Are there any best practices for this kind of scenario?
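To make "feeding the location in" concrete, here is a minimal sketch of one possible reading: the skeleton's position in the image is appended to the pose features, so the same input layout is used for training and inference. The function name `build_features`, the bounding-box normalisation, and the `[-1, 1]` range are all my own assumptions for illustration, not an established recipe.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def build_features(keypoints: List[Point], image_size: Tuple[int, int]) -> List[float]:
    """Return location-aware features for one 2D skeleton (hypothetical helper).

    keypoints:  (x, y) pixel coordinates of each joint.
    image_size: (width, height) of the image in pixels.
    """
    width, height = image_size
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]

    # Bounding-box centre and size of the skeleton in pixel space.
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0

    # Pose shape: joints relative to the centre, divided by skeleton size,
    # so this part does not change when the skeleton is moved or resized.
    features: List[float] = []
    for x, y in keypoints:
        features.extend([(x - cx) / scale, (y - cy) / scale])

    # Context: where the skeleton sits in the image, mapped to [-1, 1].
    # This is the extra input that lets a model treat the same pose shape
    # differently depending on its location (an assumption of this sketch).
    features.append(2.0 * cx / width - 1.0)
    features.append(2.0 * cy / height - 1.0)
    return features


# Same skeleton shape at two different horizontal positions.
skeleton = [(100.0, 50.0), (100.0, 100.0), (90.0, 150.0), (110.0, 150.0)]
shifted = [(x + 400.0, y) for x, y in skeleton]

left = build_features(skeleton, (640, 480))
right = build_features(shifted, (640, 480))
```

Here `left` and `right` share the same pose part of the vector and differ only in the last two entries, which is the information a classifier would need in order to label the same 2D shape differently at different locations.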