Notes tagged with "transformers": Nonlinear Function

9 notes tagged with "transformers"

GPT

Tagged with: #ai #transformers

Looped Transformers as Programmable Computers

Some ideas from this paper. Binary positional encodings. The memory index is represented by a binary vector. This has the property that… (see the sketch below).

Tagged with: #transformers
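
A minimal sketch of the binary positional encoding idea from the excerpt above: a memory index is written as the vector of its bits, so roughly log2(N) dimensions suffice to address N memory slots. The function name and shapes here are illustrative, not taken from the paper.

```python
import numpy as np

def binary_position_encoding(index: int, num_bits: int) -> np.ndarray:
    """Encode a memory index as the vector of its bits (most significant first)."""
    bits = [(index >> b) & 1 for b in reversed(range(num_bits))]
    return np.array(bits, dtype=np.float32)

# log2(N) dimensions are enough to address N slots; here 4 bits cover indices 0..7.
encodings = np.stack([binary_position_encoding(i, num_bits=4) for i in range(8)])
print(encodings)
```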

expressive transformer

This note is a scratchpad for investigating the expressivity of the [transformer] architecture. In general, one set of intuitions that we…

Tagged with: #machine-learning #transformers

positional embedding

There are a few ways to do this. Google's PaLM uses rotary embeddings, so it seems like that's probably close to the state of the art? But… (see the sketch below).

Tagged with: #transformers #ai
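
Since the excerpt above mentions rotary embeddings, here is a minimal NumPy sketch of the idea (RoPE, as used in PaLM): pairs of dimensions are rotated by a position-dependent angle before the attention dot product. The frequency schedule and function name follow the RoFormer paper as an assumption, not details from this note.

```python
import numpy as np

def rotary_embed(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, dim), dim even.

    Each pair of dimensions (2i, 2i+1) is rotated by angle pos * base**(-2i/dim),
    so relative offsets show up as phase differences in query/key dot products.
    """
    seq_len, dim = x.shape
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    freqs = base ** (-np.arange(0, dim, 2) / dim)       # (dim/2,)
    angles = positions * freqs                          # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x_even * cos - x_odd * sin
    rotated[:, 1::2] = x_even * sin + x_odd * cos
    return rotated

# Queries and keys are both rotated before computing attention scores.
q = rotary_embed(np.random.randn(16, 8))
```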

relu inequality

Suppose we want a [transformer] to evaluate an inequality, returning 1 if it holds and 0 otherwise. For integer inputs, this can be done with a… (see the sketch below).

Tagged with: #machine-learning #transformers
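
The inline math in this excerpt did not survive extraction, so the following is an assumed version of the standard construction the title suggests: for integer-valued inputs, the indicator of x ≥ a is the difference of two ReLUs. The specific inequality and variable names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def geq_indicator(x: np.ndarray, a: float) -> np.ndarray:
    """Return 1 if x >= a else 0, for integer-valued x, using two ReLUs.

    relu(x - a + 1) - relu(x - a) equals the step function on the integers,
    and each term is a single ReLU unit, so a width-2 ReLU layer suffices.
    """
    return relu(x - a + 1) - relu(x - a)

x = np.arange(-3, 4)
print(geq_indicator(x, a=1))   # [0. 0. 0. 0. 1. 1. 1.]
```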

sparse mixture of experts

References: Jacobs, Jordan, Nowlan, Hinton. Adaptive Mixtures of Local Experts (1991). Shazeer et al. Outrageously Large Neural Networks… (see the sketch below).

Tagged with: #machine-learning #transformers
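
A minimal sketch of top-k gating in the spirit of Shazeer et al.'s sparsely-gated mixture of experts. The noise term and load-balancing loss are omitted, every expert is evaluated on every token for brevity (a real implementation only runs the selected experts), and all names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def top_k_gate(x: np.ndarray, w_gate: np.ndarray, k: int = 2) -> np.ndarray:
    """Route each token to its top-k experts.

    x:      (tokens, d_model)
    w_gate: (d_model, num_experts)
    Returns (tokens, num_experts) weights, nonzero for exactly k experts per token.
    """
    logits = x @ w_gate
    topk = np.argsort(logits, axis=-1)[:, -k:]          # indices of the k largest logits
    mask = np.full_like(logits, -np.inf)
    np.put_along_axis(mask, topk, np.take_along_axis(logits, topk, axis=-1), axis=-1)
    return softmax(mask)                                 # renormalise over the selected experts

def moe_layer(x, w_gate, experts, k=2):
    """Each expert is a callable (d_model -> d_model); outputs are mixed by the gate."""
    gates = top_k_gate(x, w_gate, k)                              # (tokens, num_experts)
    expert_outputs = np.stack([f(x) for f in experts], axis=1)    # (tokens, num_experts, d_model)
    return np.einsum("te,ted->td", gates, expert_outputs)

# Tiny usage example with random linear experts.
d, n_exp = 8, 4
experts = [lambda x, W=np.random.randn(d, d) / np.sqrt(d): x @ W for _ in range(n_exp)]
out = moe_layer(np.random.randn(5, d), np.random.randn(d, n_exp), experts)
```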

tokenize

How should a machine learning model represent text? Word-level and character-level features are obvious options, but both have drawbacks… (see the sketch below).

Tagged with: #machine-learning #transformers
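
To make the trade-off in the excerpt concrete, a tiny illustration of the two obvious options: word-level tokens keep sequences short but need an open-ended vocabulary, while character-level tokens keep the vocabulary tiny at the cost of long sequences. The example sentence is arbitrary.

```python
text = "Transformers tokenize text before embedding it."

# Word-level: short sequences, but the vocabulary is unbounded
# (every new word, typo, or inflection needs its own id).
word_tokens = text.split()

# Character-level: tiny vocabulary, but sequences get long and the model
# has to relearn word structure from scratch.
char_tokens = list(text)

print(len(word_tokens), len(char_tokens))   # 6 vs 47
```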

transformer primitives

In developing intuition about [transformer]s, it's useful to think about specific primitive operations that can be implemented by a small… (see the sketch below).

Tagged with: #machine-learning #transformers
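
As one concrete example of such a primitive (my choice of example, not necessarily one from the note): a single attention head that copies the value at the previous position, built from one-hot positional encodings and a sharp softmax.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def copy_previous_head(values: np.ndarray, sharpness: float = 50.0) -> np.ndarray:
    """Single attention head that (approximately) copies the previous position.

    values: (seq_len, d) content vectors.
    Keys are one-hot position vectors; the query at position i is the one-hot
    vector for i-1, so the sharpened dot product puts almost all mass on i-1.
    """
    seq_len = values.shape[0]
    pos = np.eye(seq_len)                       # one-hot positional encodings (keys)
    queries = np.roll(pos, -1, axis=1)          # query at position i is one-hot at i-1
                                                # (position 0 wraps around; a real head would mask it)
    scores = sharpness * queries @ pos.T        # (seq_len, seq_len), peaked at the previous position
    attn = softmax(scores)
    return attn @ values                        # each output is (almost exactly) the previous value

vals = np.arange(5, dtype=np.float64)[:, None]     # values 0..4 as 1-d content
print(copy_previous_head(vals).round(2).ravel())   # ~[4. 0. 1. 2. 3.]
```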

transformer

The core of the transformer architecture is multi-headed [attention]. The transformer block consists of a multi-headed attention layer… (see the sketch below).

Tagged with: #ai #machine-learning #transformers
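
A minimal NumPy sketch of a transformer block as described above: multi-headed attention followed by a two-layer MLP, each wrapped in a residual connection. The pre-norm arrangement, the absence of causal masking, and all parameter names are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def multi_head_attention(x, params, n_heads):
    """x: (seq_len, d_model); params holds the q/k/v/output projection matrices."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    q = (x @ params["wq"]).reshape(seq_len, n_heads, d_head)
    k = (x @ params["wk"]).reshape(seq_len, n_heads, d_head)
    v = (x @ params["wv"]).reshape(seq_len, n_heads, d_head)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d_head)
    attn = softmax(scores)                                   # softmax over keys, per head
    out = np.einsum("hqk,khd->qhd", attn, v).reshape(seq_len, d_model)
    return out @ params["wo"]

def transformer_block(x, params, n_heads=4):
    """Pre-norm block: attention sublayer then MLP sublayer, each with a residual."""
    x = x + multi_head_attention(layer_norm(x), params["attn"], n_heads)
    h = layer_norm(x) @ params["w1"]
    x = x + np.maximum(h, 0.0) @ params["w2"]                # two-layer ReLU MLP
    return x

# Tiny usage example with random weights.
d_model, d_ff, seq_len = 16, 64, 10
rng = np.random.default_rng(0)
params = {
    "attn": {w: rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
             for w in ["wq", "wk", "wv", "wo"]},
    "w1": rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model),
    "w2": rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff),
}
out = transformer_block(rng.normal(size=(seq_len, d_model)), params)
```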
