model-agnostic meta learning
Created: October 04, 2021
Modified: February 20, 2022

model-agnostic meta learning

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • Original paper: Finn, Abbeel, and Levine, ICML 2017, https://arxiv.org/abs/1703.03400
  • An approach for meta learning that works with any model f(x; w) and any set of losses l_i(y, f(x; w)). The goal is to learn a good set of 'initial' weights w such that taking just a few gradient steps on a task-specific loss l_i gives a good solution for that task.
  • In a general meta-learning approach, the 'adaptation' process given a new task at test time could be anything. MAML commits to the adaptation process being just a small number of gradient steps, parameterized by the initial weights w.
  • The computation graph for the inner optimization routine looks something like:
import tensorflow as tf

def inner_loop(w):
  # Given initial weights w, take a few gradient steps on the task loss,
  # then return the loss evaluated at the adapted weights.
  # (x, y, f, loss_fn, learning_rate, num_inner_steps are assumed in scope.)
  for _ in range(num_inner_steps):
    with tf.GradientTape() as tape:
      tape.watch(w)  # w becomes a plain tensor after the first update
      loss = loss_fn(y, f(x, w))
    w = w - learning_rate * tape.gradient(loss, w)  # gradient *descent* step
  return loss_fn(y, f(x, w))
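  • Written out for a single inner gradient step with learning rate $\alpha$ (adapting the paper's notation to this note's w and l_i), the adapted weights for task i and the meta-objective are roughly:

$$w_i' = w - \alpha \nabla_w \, l_i(y, f(x; w))$$
$$\min_w \sum_i l_i(y, f(x; w_i'))$$

i.e., minimize the summed post-adaptation loss with respect to the shared initialization w.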
  • Meanwhile, the outer loop looks something like:
w = tf.Variable(tf.random.normal(weights_shape))  # meta-learned initialization (random init as a placeholder)
optimizer = tf.keras.optimizers.SGD(meta_learning_rate)  # meta_learning_rate is a hyperparameter
for _ in range(num_outer_steps):
  with tf.GradientTape() as tape:
    # The inner adaptation runs under this tape, so its gradient steps are
    # differentiated through when computing the meta-gradient w.r.t. w.
    loss = inner_loop(w)
  optimizer.apply_gradients([(tape.gradient(loss, w), w)])
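  • In the paper's few-shot supervised setting, the inner adaptation uses a handful of 'support' examples from a task, while the meta-loss is evaluated on held-out 'query' examples from the same task, and the outer update averages over a batch of sampled tasks. A minimal sketch of how that changes the loops above; sample_task(), tasks_per_batch, and the support/query argument names are placeholders of mine, not names from the paper or its code:

def adapt_and_evaluate(w, x_support, y_support, x_query, y_query):
  # Adapt the weights on the task's support examples...
  for _ in range(num_inner_steps):
    with tf.GradientTape() as tape:
      tape.watch(w)
      support_loss = loss_fn(y_support, f(x_support, w))
    w = w - learning_rate * tape.gradient(support_loss, w)
  # ...but report the loss on held-out query examples from the same task.
  return loss_fn(y_query, f(x_query, w))

for _ in range(num_outer_steps):
  with tf.GradientTape() as tape:
    # Average the post-adaptation loss over a batch of sampled tasks.
    task_losses = [adapt_and_evaluate(w, *sample_task())
                   for _ in range(tasks_per_batch)]
    meta_loss = tf.add_n(task_losses) / tasks_per_batch
  optimizer.apply_gradients([(tape.gradient(meta_loss, w), w)])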
  • Differentiating this properly requires second derivatives (in particular, Hessian-vector products). However, the paper reports that ignoring the second-order terms still works nearly as well, meaning that most of the juice comes from just taking first-order gradients at the adapted value of w.
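  • For concreteness, here is a rough sketch of that first-order approximation under the same assumptions as the code above; it's my own illustration, not the authors' implementation. Adapt as usual, then take the gradient of the post-adaptation loss at the adapted weights and apply it directly to the initialization w, without backpropagating through the inner steps:

def first_order_outer_step(w, x, y):
  # Run the inner adaptation without any outer tape recording it,
  # so no second-order terms are ever computed.
  w_adapted = tf.identity(w)
  for _ in range(num_inner_steps):
    with tf.GradientTape() as tape:
      tape.watch(w_adapted)
      loss = loss_fn(y, f(x, w_adapted))
    w_adapted = w_adapted - learning_rate * tape.gradient(loss, w_adapted)
  # First-order approximation: treat the gradient at the adapted weights
  # as if it were the gradient with respect to the initialization w.
  with tf.GradientTape() as tape:
    tape.watch(w_adapted)
    post_adaptation_loss = loss_fn(y, f(x, w_adapted))
  grad = tape.gradient(post_adaptation_loss, w_adapted)
  optimizer.apply_gradients([(grad, w)])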