Created: October 16, 2022
Modified: September 28, 2023
positional embedding
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

There are a few ways to do positional embedding. Google's PaLM uses rotary embeddings, so it seems like that's probably close to the state of the art?
But there's also evidence that, at least for autoregressive language modeling, explicit positional encoding isn't necessary at all:
Rotary embeddings
RoFormer: Enhanced Transformer with Rotary Position Embedding
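A minimal sketch of rotary position embedding in the spirit of the RoFormer paper, using NumPy. The base of 10000 and the pairwise rotation follow the paper; the function name, argument layout, and shapes are my own choices for illustration, not anyone's actual implementation.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Each consecutive feature pair (x[:, 2i], x[:, 2i+1]) is rotated by an
    angle m * theta_i, where m is the token position and
    theta_i = base ** (-2i / dim).
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"

    # Per-pair rotation frequencies and per-position angles.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# In attention this is applied to the query and key vectors (not the values)
# before the dot products, so q·k ends up depending only on the relative
# offset between positions.
q = rotary_embed(np.random.randn(16, 64))
k = rotary_embed(np.random.randn(16, 64))
```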