
It seems that a lot of researchers predominantly use a single-encoder, multiple-decoder structure to achieve multi-task learning in computer vision. Would it be reasonable to do multi-task learning with a single decoder that handles outputs in different domains (e.g., a decoder that generates both segmentation class labels and key-point xyz coordinates from encoded 3D point clouds)? My gut feeling is that some of these tasks are closely related to each other; however, the different output formats could make it harder for the network to learn the shared information.
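To make the question concrete, here is a minimal sketch of the single-shared-decoder idea, with NumPy only, random weights, and no training. All names and sizes are hypothetical: per-point encoder features stand in for a real point-cloud encoder, one shared decoder trunk produces common features, and two lightweight heads map those features to per-point segmentation logits and to pooled key-point xyz coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration
N_POINTS = 1024    # points in the input cloud
D_ENC = 256        # encoder embedding size
D_DEC = 128        # shared decoder feature size
N_CLASSES = 10     # segmentation classes
N_KEYPOINTS = 8    # predicted key-points

def linear(x, w, b):
    """A plain affine layer: x @ w + b."""
    return x @ w + b

# Stand-in for per-point features from a real point-cloud encoder
enc_feats = rng.standard_normal((N_POINTS, D_ENC))

# Single shared decoder trunk (one linear layer + ReLU, for brevity)
w_dec = rng.standard_normal((D_ENC, D_DEC)) * 0.01
dec_feats = np.maximum(linear(enc_feats, w_dec, np.zeros(D_DEC)), 0.0)

# Head 1: per-point segmentation logits (class-label domain)
w_seg = rng.standard_normal((D_DEC, N_CLASSES)) * 0.01
seg_logits = linear(dec_feats, w_seg, np.zeros(N_CLASSES))

# Head 2: key-point xyz regression (coordinate domain);
# max-pool over points to get one global feature, then project
pooled = dec_feats.max(axis=0)
w_kp = rng.standard_normal((D_DEC, N_KEYPOINTS * 3)) * 0.01
keypoints = linear(pooled, w_kp, np.zeros(N_KEYPOINTS * 3)).reshape(N_KEYPOINTS, 3)

print(seg_logits.shape)  # per-point class logits: (1024, 10)
print(keypoints.shape)   # key-point coordinates: (8, 3)
```

In this framing the "single decoder" is the shared trunk; only the last projection differs per output domain, which is where the mismatch between class labels and xyz regression (and their different loss scales) would have to be handled.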

I would appreciate it if anyone could give me some ideas or point me to related references.

HOJUN LEE