What would be the state-of-the-art image captioning deep learning model?

Question

I have seen several architectures, such as CNN-LSTM with and without attention, GloVe embeddings, and self-critical models. I am overwhelmed by the number of different notebooks and architectures, so I came here for guidance. I am looking to build a personal project on image annotation. Also, if I wanted to use this deep learning model in a TFX pipeline, what type of architecture would be the best fit?

score 0 · Accepted Answer · answered Apr 09 '21 at 04:13

Here are a couple of Kaggle Kernels, Notebooks and Tutorials for Image Captioning.
