positional embedding: Nonlinear Function
Created: October 16, 2022
Modified: September 28, 2023


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

There are a few ways to do this. Google's PaLM uses rotary embeddings, so those are probably close to the state of the art.

But there's also evidence that, at least for autoregressive language modeling, explicit positional encoding isn't necessary at all: the causal attention mask already breaks permutation symmetry, so a decoder-only model can infer positional information implicitly.

Rotary embeddings

RoFormer: Enhanced Transformer with Rotary Position Embedding (Su et al., 2021, arXiv:2104.09864)
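
To make the idea concrete, here's a minimal NumPy sketch of what RoPE does (the function name `rotary_embedding`, the interleaved even/odd pairing, and the base of 10000 are illustrative choices; real implementations precompute the sin/cos tables and apply this inside the attention layer to queries and keys):

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embedding (RoPE) to x of shape (seq_len, dim).

    Each consecutive feature pair (x[:, 2i], x[:, 2i+1]) at position m is
    rotated by the angle m * base**(-2i/dim).
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"

    # Per-pair rotation frequency: theta_i = base^(-2i/dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)           # (dim/2,)
    # Rotation angle for each (position, pair): m * theta_i
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]                      # (seq_len, dim/2)

    # Standard 2D rotation applied to each feature pair
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Toy usage: rotate queries and keys, then take attention scores.
q = np.random.randn(8, 64)
k = np.random.randn(8, 64)
scores = rotary_embedding(q) @ rotary_embedding(k).T
```

The useful property is that the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m − n, so RoPE injects relative position directly into the attention scores without adding anything to the token embeddings themselves.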