For neural machine translation, there is a model called "Seq2Seq with attention", also known as the "Bahdanau architecture" (a good image can be found on this page). Instead of the Seq2Seq encoder LSTM passing a single hidden vector $\vec h[T]$ to the decoder LSTM, the encoder makes all of its hidden vectors $\vec h[1] \dots \vec h[T]$ available, and at each decoding step the decoder computes weights $\alpha_i[t]$ -- by comparing its previous hidden state $\vec s[t-1]$ to each encoder hidden state $\vec h[i]$ -- to decide which of those hidden vectors are most relevant. The weighted vectors are then summed into a single "context vector" $\vec c[t] = \alpha_1[t]\,\vec h[1] + \alpha_2[t]\,\vec h[2]+\dots +\alpha_T[t]\,\vec h[T]$, which supposedly plays the role of Seq2Seq's single hidden vector.
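To make my understanding of that step concrete, here is a minimal numpy sketch of the part I *do* follow: scoring $\vec s[t-1]$ against each $\vec h[i]$, softmaxing the scores into $\alpha_i[t]$, and taking the weighted sum to get $\vec c[t]$. The additive score $e_i = \vec v^\top \tanh(W_s \vec s[t-1] + W_h \vec h[i])$ is the one from the original paper; the parameter names `W_s`, `W_h`, `v` are just my own placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_context(s_prev, H, W_s, W_h, v):
    """Compute the context vector c[t] from the decoder's previous hidden
    state s_prev (shape d_s) and the encoder hidden states H (shape T x d_h),
    using the additive score e_i = v . tanh(W_s s_prev + W_h h_i)."""
    scores = np.array([v @ np.tanh(W_s @ s_prev + W_h @ h_i) for h_i in H])
    alphas = softmax(scores)                 # attention weights alpha_i[t]
    c = (alphas[:, None] * H).sum(axis=0)    # weighted sum of encoder states
    return c, alphas

# toy example with made-up dimensions
T, d_h, d_s, d_a = 5, 4, 4, 3
rng = np.random.default_rng(0)
H = rng.normal(size=(T, d_h))
s_prev = rng.normal(size=d_s)
W_s = rng.normal(size=(d_a, d_s))
W_h = rng.normal(size=(d_a, d_h))
v = rng.normal(size=d_a)
c, alphas = bahdanau_context(s_prev, H, W_s, W_h, v)
```

So far so good: `c` has the same dimensionality as an encoder hidden state. My question is about what happens to it next.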
But the claim that $\vec c[t]$ simply replaces Seq2Seq's single hidden vector can't be right. Seq2Seq originally passed that vector to the decoder as the initialisation of its hidden state, and evidently you can only initialise it once. So how is $\vec c[t]$ actually used by the decoder at each step? None of the sources I have read (see e.g. the original paper linked above, or this article, or this paper, or this otherwise excellent reader) explain what happens. At most, they hide the mechanism behind "a function $f$" that is never specified. Apparently I am overlooking something very obvious here.