YOLOv8 can track people, cars, animals, and so on, but only if the model has been trained on the target object classes.
However, I need to track arbitrary objects that are picked with the mouse in a video. Since the target object is not known in advance, I can't train a YOLO model for it.
I thought it might be possible to track an arbitrary object by comparing feature maps computed by a pretrained YOLO model, i.e.:
- In a frame, choose a pixel and save an image patch that contains that pixel.
- Compute the feature map of that patch using the pretrained YOLO model.
- In the next frame, find the candidate region whose feature map best matches the saved one.
I think this is possible, but I'm not sure the approach is sound. If it is, how can I implement it?
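
To make the matching step concrete, here is a rough sketch of the loop I have in mind. It is not a YOLO implementation: `extract_features` is a hypothetical stand-in (zero-mean, unit-norm raw pixels) marking where the pretrained model's feature map would go, and `crop`, `track_step`, and the `half` and `search` parameters are names I made up for illustration. It only assumes NumPy and frames given as arrays.

```python
import numpy as np


def crop(frame, cx, cy, half):
    """Return the square patch of side 2*half+1 centered on (cx, cy).

    Returns None if the patch would extend outside the frame.
    """
    h, w = frame.shape[:2]
    if cx - half < 0 or cy - half < 0 or cx + half >= w or cy + half >= h:
        return None
    return frame[cy - half:cy + half + 1, cx - half:cx + half + 1]


def extract_features(patch):
    """Placeholder embedding: flattened pixels, zero-mean and unit-norm.

    Assumption: in the real tracker this would be replaced by features
    taken from a pretrained network; it is NOT a YOLO feature map.
    """
    v = patch.astype(np.float64).ravel()
    v = v - v.mean()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def track_step(template_feat, frame, prev_xy, half=8, search=12):
    """Find the best match for the template near the previous position.

    Scans every offset within +/- search pixels of prev_xy and keeps the
    patch whose features have the highest cosine similarity with the
    template (both vectors are unit-norm, so a dot product suffices).
    """
    px, py = prev_xy
    best_xy, best_score = prev_xy, -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            patch = crop(frame, px + dx, py + dy, half)
            if patch is None:
                continue  # candidate window leaves the frame
            score = float(template_feat @ extract_features(patch))
            if score > best_score:
                best_xy, best_score = (px + dx, py + dy), score
    return best_xy, best_score
```

The intended use is: on the mouse click at `(x, y)`, compute `template = extract_features(crop(frame, x, y, 8))` once, then call `track_step(template, next_frame, (x, y))` for each following frame and feed the returned position into the next call. The exhaustive window search is only there to keep the sketch short; whether features from a detection backbone are discriminative enough for this kind of matching is exactly the part I'm unsure about.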