Many recent research papers contain the phrase "Zero-Shot Visual Recognition".
What exactly is meant by zero-shot visual recognition? Does the task require only images, or does it also need other data such as text?