The Transformer model proposed in "Attention Is All You Need" uses sinusoidal functions for its positional encoding.
Why are both sine and cosine used? And why do the even and odd dimensions need to be separated and given different sinusoids?
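For reference, the paper defines the encoding as

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)
$$

so each pair of dimensions $(2i, 2i+1)$ shares one frequency, with sine on the even index and cosine on the odd index.

Here is a minimal sketch of how I understand the construction, assuming $d_{model}$ is even (the function name and shapes are just my own illustration, not from the paper):

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding as I understand it (assumes d_model is even)."""
    pos = np.arange(max_len)[:, None]            # (max_len, 1) token positions
    i = np.arange(d_model // 2)[None, :]         # (1, d_model/2) frequency indices
    angle = pos / np.power(10000.0, (2 * i) / d_model)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                  # even dimensions get sine
    pe[:, 1::2] = np.cos(angle)                  # odd dimensions get cosine
    return pe
```

Is the point of the sine/cosine pairing simply to give each frequency two phase-shifted components, or is there a deeper reason for the even/odd split?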