As far as I understand, the model used for feature extraction in DeepSORT is specified as the first argument of the create_box_encoder function in tools/generate_detections.py:

import numpy as np

# ImageEncoder and extract_image_patch are defined earlier in the same file.
def create_box_encoder(model_filename, input_name="images",
                       output_name="features", batch_size=32):
    image_encoder = ImageEncoder(model_filename, input_name, output_name)
    image_shape = image_encoder.image_shape

    def encoder(image, boxes):
        image_patches = []
        for box in boxes:
            # Crop each detection box and resize it to the encoder's input shape.
            patch = extract_image_patch(image, box, image_shape[:2])
            if patch is None:
                print("WARNING: Failed to extract image patch: %s." % str(box))
                # Fall back to a random patch so the batch keeps a valid shape.
                patch = np.random.uniform(
                    0., 255., image_shape).astype(np.uint8)
            image_patches.append(patch)
        image_patches = np.asarray(image_patches)
        return image_encoder(image_patches, batch_size)

    return encoder

In the same file, the default value of the model_filename argument is set in the parse_args() function to resources/networks/mars-small128.pb, which appears to be a model for person re-identification.
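For reference, that default lives in an argparse option; the excerpt below is paraphrased to show only the relevant option, so treat the exact wording as approximate:

import argparse

def parse_args():
    # Paraphrased excerpt: only the model option relevant here is shown.
    parser = argparse.ArgumentParser(description="Re-ID feature extractor")
    parser.add_argument(
        "--model", default="resources/networks/mars-small128.pb",
        help="Path to the frozen inference graph protobuf.")
    return parser.parse_args()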

Can a model for re-identifying objects other than people (and from multiple classes, such as cars, birds, and trucks) be used instead in DeepSORT? If so, does DeepSORT provide any means for training such models?

My initial understanding was that DeepSort would be able to track all classes recognized by a trained YOLO model. I didn't know that a stand-alone feature extractor was required.

1 Answer

In my experience, if your objects are visually distinctive, with discriminative features, then yes! DeepSORT's original reid model can be borrowed to track those things. But for the best results, you should train your own reid model to suit the task and the classes you want to track.

– GunFire
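Swapping the feature extractor amounts to pointing create_box_encoder at a different frozen graph. Here is a minimal sketch, assuming a hypothetical resources/networks/my-vehicle-reid.pb trained on your own classes, with dummy inputs standing in for a real frame and detections:

import numpy as np

from tools.generate_detections import create_box_encoder

# Hypothetical frozen graph trained on your own classes; any model that
# exposes the same "images" input and "features" output tensors is a drop-in.
encoder = create_box_encoder(
    "resources/networks/my-vehicle-reid.pb", batch_size=32)

# Dummy frame and one (x, y, w, h) detection box, e.g. from YOLO.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
boxes = np.array([[100, 150, 60, 120]])

features = encoder(frame, boxes)  # one embedding vector per box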
  • By DeepSORT's original reid model do you mean the default `resources/networks/mars-small128.pb`? – Mehdi Charife May 30 '23 at 21:55
  • Yes, that's correct :D – GunFire May 30 '23 at 21:58
  • Isn't that model trained only to extract features of people, and thus won't work on other objects? – Mehdi Charife May 30 '23 at 22:00
  • Theoretically, yes. But in practice it CAN BE BORROWED, because kernel filters that learned to extract person features can still respond differently to different features in unseen data. From that information, the model may still capture something useful and embed those features, if imperfectly, in its embedding space. – GunFire May 30 '23 at 22:06
  • You said that I should train my own reid model to have better results. Do you have something that can get me started? I'm relatively new to this and so I'm basically lost when it comes to what is normally done in these cases. – Mehdi Charife May 31 '23 at 19:02
  • Sure. If you are new to this field, you can first try re-training the base OSNet model. As a next step, you could replace the backbone in the architecture with a newer one for better feature extraction. Furthermore, you can read related papers to better understand the reid task; I would recommend reading more about metric/contrastive learning (ArcFace, SubArcFace, InfoNCE, ...) and some training strategies (self-supervised, SupCon, ...). Finally, learn and learn more. Expand your knowledge base so you can come up with many ideas to work on and improve your results (a minimal triplet-loss sketch follows after these comments). – GunFire Jun 01 '23 at 05:20
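To make the metric-learning suggestion concrete, here is a minimal triplet-loss training step in PyTorch. Everything in it is a placeholder sketch, not DeepSORT's or OSNet's actual code: a generic ResNet-18 backbone stands in for a reid network, and random tensors stand in for real anchor/positive/negative crops.

import torch
import torch.nn as nn
import torchvision.models as models

class ReidEmbedder(nn.Module):
    # Generic backbone plus a projection head producing unit-length embeddings.
    def __init__(self, embed_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()  # drop the ImageNet classifier head
        self.backbone = backbone
        self.head = nn.Linear(512, embed_dim)

    def forward(self, x):
        feat = self.head(self.backbone(x))
        return nn.functional.normalize(feat, dim=1)

model = ReidEmbedder()
criterion = nn.TripletMarginLoss(margin=0.3)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# anchor/positive: two crops of the same object identity; negative: a crop
# of a different identity. Random tensors stand in for real 128x64 crops.
anchor = torch.randn(8, 3, 128, 64)
positive = torch.randn(8, 3, 128, 64)
negative = torch.randn(8, 3, 128, 64)

loss = criterion(model(anchor), model(positive), model(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()

Trained this way, crops of the same object land close together in embedding space and crops of different objects land far apart, which is the property DeepSORT's appearance metric relies on.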