What are the approaches and the SOTA in the domain of knowledge representation and reasoning (KRR) over a scene? Suppose there are 3 objects in the scene, and which object should be picked first among them is governed by 'Rules' written in text form, e.g. "Objects that are square in shape will be picked first, followed by triangular ones." So basically a priority queue or ordered list for object selection needs to be generated based on the rules. Or is there another way to handle this case?
I believe the first step is to run object detection over the scene to extract the different objects, and then represent the 'Rules' as external knowledge so that the priority queue or list can be generated. However, rules in textual form are non-differentiable, so how can we integrate such rules into neural network training, and what are the ways to represent knowledge for this kind of priority-queue generation over an image scene?
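As a non-learned baseline, the symbolic half of this pipeline can be kept entirely outside the network: the detector produces labelled objects, and the textual rule is compiled into a rank table used for sorting. A minimal sketch (all names and the detection format here are hypothetical assumptions, not from any specific library):

```python
# Hypothetical sketch: generating a pick-order list from detector output
# using a textual rule encoded as a priority map.
# Rule assumed: "squares first, then triangles"; lower rank = picked earlier.
RULE_PRIORITY = {"square": 0, "triangle": 1}

def pick_order(detections):
    """Sort detected objects by shape rank; shapes not covered by the rule go last."""
    unknown_rank = len(RULE_PRIORITY)
    return sorted(detections,
                  key=lambda d: RULE_PRIORITY.get(d["shape"], unknown_rank))

# Toy scene with 3 detected objects (assumed detector output format).
scene = [
    {"id": "obj1", "shape": "circle"},
    {"id": "obj2", "shape": "square"},
    {"id": "obj3", "shape": "triangle"},
]
print([d["id"] for d in pick_order(scene)])  # → ['obj2', 'obj3', 'obj1']
```

This treats the rules as pure post-processing, so they stay outside the gradient path entirely; the non-differentiability question only arises if you want the rules to influence the detector's training signal itself.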
The paper 'Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding' tries to address this kind of problem, but I feel it is more of a general question-answering system than a KRR system.