Some ideas from this paper. Binary positional encodings. Memory index is represented by a binary vector. This has the property that…
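A minimal sketch of one way to realize this, assuming the index is simply mapped to its binary expansion, one dimension per bit, rescaled to ±1 (the bit width and the ±1 convention are assumptions, not taken from the note):

```python
import numpy as np

def binary_positional_encoding(index: int, num_bits: int = 16) -> np.ndarray:
    """Encode a memory index as its binary expansion, one dimension per bit.

    Hypothetical sketch: bit i of `index` goes in dimension i, mapped from
    {0, 1} to {-1, +1}.
    """
    bits = (index >> np.arange(num_bits)) & 1   # little-endian bit extraction
    return 2.0 * bits - 1.0                      # map {0, 1} -> {-1, +1}

# Each index gets a distinct length-num_bits vector with entries in {-1, +1}.
print(binary_positional_encoding(5, num_bits=4))   # [ 1. -1.  1. -1.]
print(binary_positional_encoding(7, num_bits=4))   # [ 1.  1.  1. -1.]
```

One convenient consequence of any scheme like this is that the encoding width grows only logarithmically with the number of addressable positions.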
This note is a scratchpad for investigating the expressivity of the [transformer] architecture. In general, one set of intuitions that we…
There are a few ways to do this. Google's PaLM uses rotary embeddings, so it seems like that's probably close to the state of the art? But…
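For reference, a minimal sketch of the rotary scheme, assuming the standard formulation: consecutive dimension pairs of each query/key vector are rotated by position-dependent angles with frequencies base^(-2i/d).

```python
import numpy as np

def apply_rotary(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive dimension pairs of a query/key vector by
    position-dependent angles (the usual rotary-embedding recipe).

    x: vector of even length d; pair (2i, 2i+1) is rotated by the angle
    position * base**(-2i/d).
    """
    d = x.shape[-1]
    assert d % 2 == 0
    freqs = base ** (-np.arange(0, d, 2) / d)   # one frequency per pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin             # 2x2 rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

q = apply_rotary(np.ones(8), position=3)
k = apply_rotary(np.ones(8), position=7)
```

Because the same rotation family is applied to both queries and keys, their dot products end up depending only on the relative offset between positions rather than on absolute position.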
Suppose we want a [transformer] to evaluate the inequality returning if and otherwise. For integer , this can be done with a…
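A minimal sketch of one such construction, assuming the inequality has the form x ≥ c for an integer input x and integer threshold c (both assumptions, not taken from the note): a difference of two ReLUs computes the exact 0/1 indicator, so a feed-forward layer with two hidden units suffices.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def indicator_geq(x: np.ndarray, c: int) -> np.ndarray:
    """Exact indicator 1[x >= c] for integer-valued x, using two ReLUs.

    For integer x: relu(x - c + 1) - relu(x - c) equals 1 when x >= c
    (the two terms differ by exactly 1) and 0 when x <= c - 1 (both vanish).
    """
    return relu(x - c + 1) - relu(x - c)

xs = np.arange(-2, 5)
print(indicator_geq(xs, c=2))   # [0. 0. 0. 0. 1. 1. 1.]
```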
References: Jacobs, Jordan, Nowlan, and Hinton, "Adaptive Mixtures of Local Experts" (1991); Shazeer et al., "Outrageously Large Neural Networks…
How should a machine learning model represent text? Word-level and character-level features are obvious options, but both have drawbacks…
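A toy comparison of the two obvious options, using plain Python string operations rather than a real tokenizer:

```python
text = "transformers are surprisingly expressive"

# Word-level: short sequences, but the vocabulary is huge and open-ended,
# and rare or misspelled words fall outside it entirely.
word_tokens = text.split()   # ['transformers', 'are', 'surprisingly', 'expressive']

# Character-level: tiny closed vocabulary with no out-of-vocabulary problem,
# but sequences get long and each token carries little meaning on its own.
char_tokens = list(text)     # ['t', 'r', 'a', 'n', ...]

print(len(word_tokens), len(char_tokens))   # 4 40
```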
In developing intuition about [transformer]s it's useful to think about specific primitive operations that can be implemented by a small…
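One concrete example of such a primitive, chosen here as an illustration rather than taken from the note: a single attention head that copies the value stored at the previous position, using one-hot positional codes as keys and a shifted, scaled copy of them as queries so the softmax is effectively one-hot.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical primitive: "copy the previous token's value."
n = 6
pos = np.eye(n)                               # one-hot positional encodings
values = np.arange(n, dtype=float)[:, None]   # one "token value" per position

scale = 20.0
queries = scale * np.roll(pos, 1, axis=0)     # position i asks for position i - 1
keys = pos

attn = softmax(queries @ keys.T, axis=-1)     # ~one-hot on the previous position
out = attn @ values
print(out.ravel().round(2))   # ~[5. 0. 1. 2. 3. 4.]; position 0 wraps around in this toy
```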
The core of the transformer architecture is multi-headed [attention]. The transformer block consists of a multi-headed attention layer…
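A compact sketch of the multi-headed attention sublayer on its own (dimensions arbitrary; masking, biases, layer norm, and the feed-forward sublayer are omitted):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product attention with num_heads heads.

    x: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model).
    Each head attends in a d_model // num_heads dimensional subspace;
    the heads' outputs are concatenated and mixed by Wo.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(h):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return h.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    heads = attn @ v                                       # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 16, 5, 4
x = rng.normal(size=(seq_len, d_model))
W = [rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(4)]
y = multi_head_attention(x, *W, num_heads=num_heads)
print(y.shape)   # (5, 16)
```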