I understand that seq2seq models are well suited to inputs and/or outputs of variable length. However, if we know the input and output sequence lengths of the network exactly, is seq2seq still the best approach?
2 Answers
If you would classify a transformer as seq2seq, then it is arguably the best, but only in terms of accuracy.
Shallow neural networks, or even decision trees and random forests, may be better in production due to lower training time, lower inference time, and a smaller memory footprint.
Overall it comes down to a compromise, and you need to do what suits your use case: you wouldn't run GPT-3 on a Raspberry Pi Pico, for example. So it depends on what you mean by "best".
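To make that trade-off concrete, here is a minimal sketch (using scikit-learn on synthetic data, both of which are my own assumptions rather than anything from the answer) that compares training time, prediction time, and serialized size for a decision tree versus a small MLP:

```python
# Hypothetical illustration: production cost of a decision tree
# versus a small neural network on the same synthetic data.
import time
import pickle

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

for model in (DecisionTreeClassifier(), MLPClassifier(max_iter=300)):
    start = time.perf_counter()
    model.fit(X, y)                       # training time
    train_s = time.perf_counter() - start

    start = time.perf_counter()
    model.predict(X)                      # inference time
    infer_s = time.perf_counter() - start

    size_kb = len(pickle.dumps(model)) / 1024  # rough in-memory size
    print(f"{type(model).__name__}: train {train_s:.2f}s, "
          f"predict {infer_s:.3f}s, ~{size_kb:.0f} KB")
```

On most machines the tree will train and predict noticeably faster and serialize smaller, which is exactly the production argument above.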

I often see this type of question about finding the "best" model. Perhaps the best approach is to take a sample, or better still a toy dataset representative of the kind of problem you are solving, and use an AutoML tool such as PyCaret or Darts to evaluate many different models, narrow the choices, and then experiment further. It would at least be a more scientific approach, IMO.
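With PyCaret that comparison is only a few lines. A minimal sketch, assuming PyCaret's bundled "juice" toy dataset (my choice of example, not the answer's):

```python
# Sketch: let PyCaret cross-validate a library of candidate models
# on a toy dataset and rank them, so you can narrow the choices.
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

df = get_data("juice")  # toy dataset shipped with PyCaret

# Initialise the experiment; "Purchase" is this dataset's target column.
setup(data=df, target="Purchase", session_id=42)

# Train and cross-validate the candidate models (ranked by accuracy
# by default) and return the best-scoring estimator.
best_model = compare_models()
print(best_model)
```

From there you would take the top few candidates and experiment further on the full data, rather than treating the leaderboard as the final word.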
