model-agnostic meta learning
Created: October 04, 2021
Modified: February 20, 2022

model-agnostic meta learning

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • Original paper: Finn, Abbeel, and Levine, ICML 2017, https://arxiv.org/abs/1703.03400
  • An approach for meta learning that works with any model f(x; w) and any set of losses l_i(y, f(x; w)). The goal is to learn a good set of 'initial' weights w such that taking just a few gradient steps on a task-specific loss l_i gives a good solution for that task.
  • In a general meta-learning approach, the 'adaptation' process given a new task at test time could be anything. MAML commits to the adaptation process being just a small number of gradient steps, parameterized by the initial weights w.
  • The computation graph for the inner optimization routine looks something like:
import tensorflow as tf

def inner_loop(w):
  # Given initial weights w, take a few gradient steps on the task loss,
  # then return the loss evaluated at the adapted weights.
  # (x, y, f, loss_fn, learning_rate, num_inner_steps are assumed in scope.)
  for _ in range(num_inner_steps):
    with tf.GradientTape() as tape:
      tape.watch(w)  # w becomes a plain tensor after the first update
      loss = loss_fn(y, f(x, w))
    w = w - learning_rate * tape.gradient(loss, w)  # gradient *descent* step
  return loss_fn(y, f(x, w))
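  • Written out for a single inner gradient step with learning rate $\alpha$ (adapting the paper's notation to this note's w and l_i), the adapted weights for task i and the meta-objective are roughly:

$$w_i' = w - \alpha \nabla_w \, l_i(y, f(x; w))$$
$$\min_w \sum_i l_i(y, f(x; w_i'))$$

i.e., minimize the summed post-adaptation loss with respect to the shared initialization w.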
  • Meanwhile, the outer loop looks something like:
w = tf.Variable(tf.random.normal(weights_shape))  # meta-learned initialization (random init as a placeholder)
optimizer = tf.keras.optimizers.SGD(meta_learning_rate)  # meta_learning_rate is a hyperparameter
for _ in range(num_outer_steps):
  with tf.GradientTape() as tape:
    # The inner adaptation runs under this tape, so its gradient steps are
    # differentiated through when computing the meta-gradient w.r.t. w.
    loss = inner_loop(w)
  optimizer.apply_gradients([(tape.gradient(loss, w), w)])
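  • In the paper's few-shot supervised setting, the inner adaptation uses a handful of 'support' examples from a task, while the meta-loss is evaluated on held-out 'query' examples from the same task, and the outer update averages over a batch of sampled tasks. A minimal sketch of how that changes the loops above; sample_task(), tasks_per_batch, and the support/query argument names are placeholders of mine, not names from the paper or its code:

def adapt_and_evaluate(w, x_support, y_support, x_query, y_query):
  # Adapt the weights on the task's support examples...
  for _ in range(num_inner_steps):
    with tf.GradientTape() as tape:
      tape.watch(w)
      support_loss = loss_fn(y_support, f(x_support, w))
    w = w - learning_rate * tape.gradient(support_loss, w)
  # ...but report the loss on held-out query examples from the same task.
  return loss_fn(y_query, f(x_query, w))

for _ in range(num_outer_steps):
  with tf.GradientTape() as tape:
    # Average the post-adaptation loss over a batch of sampled tasks.
    task_losses = [adapt_and_evaluate(w, *sample_task())
                   for _ in range(tasks_per_batch)]
    meta_loss = tf.add_n(task_losses) / tasks_per_batch
  optimizer.apply_gradients([(tape.gradient(meta_loss, w), w)])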
  • Differentiating this properly requires second derivatives (in particular, Hessian-vector products). However, the paper reports that ignoring the second-order terms still works nearly as well, meaning that most of the juice comes from just taking first-order gradients at the adapted value of w.
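  • For concreteness, here is a rough sketch of that first-order approximation under the same assumptions as the code above; it's my own illustration, not the authors' implementation. Adapt as usual, then take the gradient of the post-adaptation loss at the adapted weights and apply it directly to the initialization w, without backpropagating through the inner steps:

def first_order_outer_step(w, x, y):
  # Run the inner adaptation without any outer tape recording it,
  # so no second-order terms are ever computed.
  w_adapted = tf.identity(w)
  for _ in range(num_inner_steps):
    with tf.GradientTape() as tape:
      tape.watch(w_adapted)
      loss = loss_fn(y, f(x, w_adapted))
    w_adapted = w_adapted - learning_rate * tape.gradient(loss, w_adapted)
  # First-order approximation: treat the gradient at the adapted weights
  # as if it were the gradient with respect to the initialization w.
  with tf.GradientTape() as tape:
    tape.watch(w_adapted)
    post_adaptation_loss = loss_fn(y, f(x, w_adapted))
  grad = tape.gradient(post_adaptation_loss, w_adapted)
  optimizer.apply_gradients([(grad, w)])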