I am developing an image search engine that retrieves wrist watches similar to an image supplied by the user. I index the items in the database with SIFT descriptors and rank them by Euclidean distance to find the most similar watches. I suspect this type of descriptor is not the best choice, since watches all share a similar structure and shape. Right now the best and worst matches are not separated well enough: on average their scores differ by only about 15%.
I've been thinking of adding colour to the descriptor, but I'd like to hear other suggestions.
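
To make the colour idea concrete, here is a rough sketch of what I have in mind, not what I currently run. It assumes each image has already been reduced to a single fixed-length shape vector (for example a bag-of-visual-words histogram built from the SIFT descriptors) and that the pixels are available as `(r, g, b)` tuples. The function names, the 4-bins-per-channel histogram, and the `colour_weight` parameter are all placeholders I made up for illustration.

```python
import math


def colour_histogram(pixels, bins=4):
    """Joint RGB histogram, normalised to sum to 1.

    pixels: iterable of (r, g, b) integer tuples in the range 0..255.
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins-1.
        ri, gi, bi = (int(c) * bins // 256 for c in (r, g, b))
        hist[(ri * bins + gi) * bins + bi] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


def l2_normalise(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def combined_descriptor(shape_desc, pixels, colour_weight=0.5, bins=4):
    """Concatenate a shape vector and a colour histogram.

    Both parts are L2-normalised first so neither dominates the distance
    just because of its scale; colour_weight (0..1) sets the balance.
    """
    shape_part = l2_normalise(shape_desc)
    colour_part = l2_normalise(colour_histogram(pixels, bins))
    w = colour_weight
    return [(1.0 - w) * v for v in shape_part] + [w * v for v in colour_part]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Two watches with the same shape vector but different dominant colours would then end up at a non-zero distance, and `colour_weight` controls how much that colour difference counts relative to shape.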