Created: October 04, 2021
Modified: February 20, 2022
model-agnostic meta learning
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
- Original paper: Finn, Abbeel, and Levine, ICML 2017, https://arxiv.org/abs/1703.03400
- An approach to meta-learning that works with any model trained by gradient descent and any loss. The goal is to learn a good set of 'initial' weights such that taking just a few gradient steps on a task-specific loss gives a good solution for that task.
- In a general meta-learning approach, the 'adaptation' process given a new task at test-time could be anything. MAML commits to the adaptation process being just a small number of gradient steps, parameterized by an initial set of weights w.
- The computation graph for the inner optimization routine looks something like:
import tensorflow as tf

def inner_loop(w):
    # Given initial weights w, take a few gradient steps on the task loss,
    # then return the loss evaluated at the adapted weights.
    # (x, y, loss_fn, and the model f are the current task's data and objective.)
    for _ in range(num_inner_steps):
        with tf.GradientTape() as tape:
            tape.watch(w)  # after the first update, w is a plain tensor, so watch it explicitly
            loss = loss_fn(y, f(x, w))
        w = w - learning_rate * tape.gradient(loss, w)
    return loss_fn(y, f(x, w))
- Meanwhile, the outer loop looks like:
w = tf.Variable(tf.random.normal(weights_shape))
optimizer = tf.keras.optimizers.Adam()
for _ in range(num_outer_steps):
    with tf.GradientTape() as tape:
        loss = inner_loop(w)
    optimizer.apply_gradients([(tape.gradient(loss, w), w)])
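- Putting the pieces together, here's a self-contained sketch of meta-training on the paper's sinusoid-regression toy task (amplitude and phase sampled per task, inputs in [-5, 5]). The tiny network, hyperparameters, support/query split, and the reworked inner_loop signature below are illustrative choices of mine, not the paper's exact setup:
import numpy as np
import tensorflow as tf

num_inner_steps, inner_lr = 5, 0.01
num_outer_steps, meta_batch_size = 1000, 25

# A tiny MLP with weights kept in a plain list so the inner loop can rebind them.
w = [tf.Variable(tf.random.normal(shape, stddev=0.1)) for shape in
     [(1, 40), (1, 40), (40, 1), (1, 1)]]
optimizer = tf.keras.optimizers.Adam(1e-3)

def f(x, w):
    h = tf.nn.relu(tf.matmul(x, w[0]) + w[1])
    return tf.matmul(h, w[2]) + w[3]

def loss_fn(y, y_pred):
    return tf.reduce_mean(tf.square(y - y_pred))

def sample_task():
    # Each task is a random sinusoid y = A * sin(x + phase); returns a sampler
    # that draws (x, y) batches from that task.
    amp = np.random.uniform(0.1, 5.0)
    phase = np.random.uniform(0.0, np.pi)
    def draw(n):
        x = np.random.uniform(-5.0, 5.0, size=(n, 1)).astype(np.float32)
        return tf.constant(x), tf.constant(amp * np.sin(x + phase), tf.float32)
    return draw

def inner_loop(w, x_support, y_support, x_query, y_query):
    # Adapt on the support set, then report the loss on held-out query data.
    for _ in range(num_inner_steps):
        with tf.GradientTape() as tape:
            tape.watch(w)
            loss = loss_fn(y_support, f(x_support, w))
        grads = tape.gradient(loss, w)
        w = [wi - inner_lr * gi for wi, gi in zip(w, grads)]
    return loss_fn(y_query, f(x_query, w))

for _ in range(num_outer_steps):
    with tf.GradientTape() as tape:
        meta_loss = 0.0
        for _ in range(meta_batch_size):
            draw = sample_task()
            (xs, ys), (xq, yq) = draw(10), draw(10)
            meta_loss += inner_loop(w, xs, ys, xq, yq)
        meta_loss /= meta_batch_size
    optimizer.apply_gradients(zip(tape.gradient(meta_loss, w), w))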
- To differentiate this properly requires second derivatives, in particular Hessian-vector products. However, the paper claims that ignoring the second-order terms still works pretty well, meaning that most of the juice comes just from taking first-order gradients at the adapted value of w.
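- A minimal sketch of that first-order shortcut, reusing the placeholder f, loss_fn, and task data from the inner_loop above (the name first_order_inner_loop is mine, not the paper's): wrapping the inner gradient in tf.stop_gradient keeps the outer tape from differentiating through the adaptation, so the meta-gradient reduces to the loss gradient evaluated at the adapted weights.
def first_order_inner_loop(w):
    # Like inner_loop, but tf.stop_gradient blocks backprop through the inner
    # update, so the adapted weights look like w minus a constant to the outer
    # tape and no Hessian-vector products are ever computed.
    for _ in range(num_inner_steps):
        with tf.GradientTape() as tape:
            tape.watch(w)
            loss = loss_fn(y, f(x, w))
        w = w - learning_rate * tf.stop_gradient(tape.gradient(loss, w))
    return loss_fn(y, f(x, w))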