For a transformer decoder, how exactly are K, Q, and V computed at each decoding step?
Assume my input prompt is "today is a" and the model goes on to generate "good day".
At t=0 (generation step 0), K, Q, and V are the projections of the sequence "today is a". Say the next token generated is "good".
At t=1 (generation step 1), which one is true?
- K, Q, and V are the projections of the sequence "today is a good".
- K and Q are the projections of the sequence "today is a", and V is the projection of the sequence "good".
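To make the two readings concrete, here is a minimal pure-Python sketch of a single causal self-attention head (one layer, one head). It assumes toy random embeddings and weights; the names `EMB`, `W_Q`, `W_K`, `W_V`, `full_causal_attention` and `KVCache` are hypothetical and not taken from any library. It compares recomputing Q, K, and V for the whole sequence "today is a good" with the usual cached approach, in which K and V are kept for every token seen so far and only the newest token "good" needs a fresh query.

```python
import math
import random

D = 4  # toy model width (hypothetical, for illustration only)
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy embeddings and single-head projection weights.
VOCAB = ["today", "is", "a", "good", "day"]
EMB = {tok: [rng.uniform(-1, 1) for _ in range(D)] for tok in VOCAB}
W_Q, W_K, W_V = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)

def project(x, w):
    """Row vector x (length D) times matrix w (D x D)."""
    return [sum(x[i] * w[i][j] for i in range(D)) for j in range(D)]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys and values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(D)]

def full_causal_attention(tokens):
    """Recompute Q, K, V for every token; position i attends only to positions 0..i."""
    xs = [EMB[t] for t in tokens]
    qs = [project(x, W_Q) for x in xs]
    ks = [project(x, W_K) for x in xs]
    vs = [project(x, W_V) for x in xs]
    return [attend(qs[i], ks[: i + 1], vs[: i + 1]) for i in range(len(xs))]

class KVCache:
    """Keeps K and V for every token seen so far; Q is computed only for the newest token."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, token):
        x = EMB[token]
        self.keys.append(project(x, W_K))
        self.values.append(project(x, W_V))
        q = project(x, W_Q)  # query for the new token only
        return attend(q, self.keys, self.values)

prompt = ["today", "is", "a"]
cache = KVCache()
# Step 0: the prompt fills the cache. Real implementations usually do this in one
# "prefill" pass; feeding it token by token yields the same cached K and V here.
for tok in prompt:
    out_t0 = cache.step(tok)

# Step 1: K and V now cover "today is a good"; the only new query is for "good".
out_t1 = cache.step("good")

full = full_causal_attention(prompt + ["good"])
print(len(cache.keys))  # 4 keys: one per token in "today is a good"
print(all(abs(a - b) < 1e-12 for a, b in zip(full[-1], out_t1)))  # True
```

Because of the causal mask, the outputs for "today is a" do not change when "good" is appended, so their K and V can be reused; the new query from "good" attends over the keys and values of all four tokens. In a multi-layer decoder the same argument applies layer by layer, since each earlier token's hidden states are unaffected by later tokens.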