I have a question about semantic image search with CLIP. Given a query image of a person, e.g. a skinny person wearing a red shirt, CLIP measures similarity across all dimensions at once (body shape, gender, shirt color, etc.), so the results are mostly more skinny people wearing red shirts. Is there a way to make CLIP search along one specific dimension, e.g. shirt color only? Then the result could be an image of a heavier person, as long as the shirt is very similar. I am thinking of using the embedding of a text such as "shirt color" to guide the search somehow, but I don't have a specific idea of how to do it.
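To make the idea a bit more concrete, here is a rough sketch of one thing I have in mind; I am not sure it is the right approach. The assumption is that text embeddings of a few attribute prompts (e.g. "a red shirt", "a blue shirt", "a green shirt") span a "shirt color" subspace, and that images can be compared after projecting their embeddings onto that subspace. The function names are my own, and the vectors below are hand-made toy stand-ins so the snippet runs without a model; in practice they would come from CLIP's text and image encoders.

```python
import numpy as np

def attribute_basis(text_embs):
    """Orthonormal basis of the subspace spanned by mean-centered prompt embeddings."""
    # Subtract the mean so that whatever all prompts share (e.g. "shirt") drops out
    # and only the directions that distinguish the prompts (the colors) remain.
    centered = text_embs - text_embs.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    keep = s > 1e-8 * s.max()  # discard numerically zero directions
    return vt[keep]

def attribute_similarity(query_emb, gallery_embs, basis):
    """Cosine similarity computed only inside the attribute subspace."""
    q = basis @ query_emb
    g = gallery_embs @ basis.T
    q = q / np.linalg.norm(q)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    return g @ q

def plain_cosine(query_emb, gallery_embs):
    """Ordinary cosine similarity on the full embeddings, for comparison."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return g @ q

# Toy 6-d space (an assumption, not real CLIP output):
# axes 0-2 = red/blue/green, axes 3-4 = skinny/heavy, axis 5 = shared "shirt" part.
e = np.eye(6)
text_embs = np.stack([e[0] + 0.5 * e[5],   # "a red shirt"
                      e[1] + 0.5 * e[5],   # "a blue shirt"
                      e[2] + 0.5 * e[5]])  # "a green shirt"

query = e[0] + 2 * e[3]                    # skinny person, red shirt
gallery = np.stack([e[0] + 2 * e[4],       # heavy person, red shirt
                    e[1] + 2 * e[3],       # skinny person, blue shirt
                    e[2] + 2 * e[4]])      # heavy person, green shirt

basis = attribute_basis(text_embs)
scores = attribute_similarity(query, gallery, basis)  # roughly [1.0, -0.5, -0.5]
plain = plain_cosine(query, gallery)                  # roughly [0.2, 0.8, 0.0]
print("color-only ranking:", np.argsort(-scores))
print("plain ranking:     ", np.argsort(-plain))
```

In this toy setup the plain cosine search prefers the skinny person in a blue shirt, while the projected search puts the heavy person in a red shirt first, which is the behavior I am after. I do not know whether real CLIP embeddings separate attributes this cleanly, so this is only an illustration of the question.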