meta learning: Nonlinear Function
Created: May 23, 2021
Modified: October 04, 2021

meta learning

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • Generally this means training some aspect of the learning procedure itself. There is then an inner-loop learning procedure, which follows gradients of the loss as usual, and an outer-loop procedure, which gets a single gradient each time the inner loop runs, by differentiating through the inner loop's execution.
  • We have a lot of choice in which aspects of the learning process or architecture we parameterize.
  • A simple approach that introduces no extra parameters at all is to train the initialization of a short (just a few steps) inner optimization routine. This tends to favor initializations from which the model+optimizer combination can 'adapt quickly', reaching a good model after only a few gradient steps. This is called model-agnostic meta-learning (MAML); a minimal sketch follows this list.
  • We can of course also train parameters of the optimization routine itself. Ultimately, the inner optimization routine (including the inner model evaluations and gradients) is just a big computation graph, which we can differentiate; the second sketch below uses this to meta-learn the inner learning rate.
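
A minimal MAML-style sketch in JAX, differentiating through a few inner gradient steps. The toy sinusoid-regression task, the tiny MLP, and all names and shapes here are illustrative assumptions, not anything from these notes.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # Tiny MLP: params is a list of (W, b) pairs.
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

def inner_adapt(params, x, y, inner_lr=0.01, inner_steps=3):
    # Inner loop: a few plain gradient steps on one task's data.
    for _ in range(inner_steps):
        grads = jax.grad(loss)(params, x, y)
        params = jax.tree_util.tree_map(lambda p, g: p - inner_lr * g, params, grads)
    return params

def maml_loss(params, x_train, y_train, x_test, y_test):
    # Outer objective: post-adaptation loss on held-out data from the same task.
    adapted = inner_adapt(params, x_train, y_train)
    return loss(adapted, x_test, y_test)

# The outer gradient differentiates through the entire inner loop,
# so it tells us how to move the *initialization*.
maml_grad = jax.jit(jax.grad(maml_loss))

# Example usage on a toy sinusoid-regression task (shapes illustrative):
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = [(0.1 * jax.random.normal(k1, (1, 32)), jnp.zeros(32)),
          (0.1 * jax.random.normal(k2, (32, 1)), jnp.zeros(1))]
x = jax.random.uniform(k3, (20, 1), minval=-3.0, maxval=3.0)
y = jnp.sin(x)
outer_grads = maml_grad(params, x[:10], y[:10], x[10:], y[10:])
```

In a full training loop, `outer_grads` would be averaged over a batch of tasks and applied to `params` with an outer optimizer; that part is omitted here.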
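
And a second sketch for the "train the optimizer itself" point: the simplest learnable optimizer parameter is the inner learning rate. Since the unrolled inner loop is just a computation graph, we can take the gradient of the post-adaptation loss with respect to that rate. This reuses `loss`, `jax`, `params`, `x`, and `y` from the sketch above; again, everything specific is an illustrative assumption.

```python
def adapted_loss(inner_lr, params, x_train, y_train, x_test, y_test, inner_steps=3):
    # Unrolled inner loop, written out so that inner_lr is an explicit input.
    for _ in range(inner_steps):
        grads = jax.grad(loss)(params, x_train, y_train)
        params = jax.tree_util.tree_map(lambda p, g: p - inner_lr * g, params, grads)
    return loss(params, x_test, y_test)

# d(post-adaptation loss) / d(learning rate): a meta-gradient for an optimizer parameter.
lr_grad = jax.grad(adapted_loss, argnums=0)
lr_meta_gradient = lr_grad(0.01, params, x[:10], y[:10], x[10:], y[10:])
```

The same pattern extends to richer learned optimizers (per-parameter rates, update rules parameterized by small networks): anything inside the unrolled inner loop can be made a meta-parameter and trained by the outer gradient.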