Created: January 16, 2022
Modified: January 16, 2022
meta-level shape of machine learning
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Unlike most modern deep learning systems, humans:
- don't have separate training/test phases (though we may have wake/sleep).
- don't operate on i.i.d. inputs --- indeed, we get a lot of juice from learning the dependence structure in our inputs.
- may not make a hard distinction between train-time 'slow weights' and inference-time 'fast weights' (though we do have short- and long-term memory).
- don't optimize a specific objective; indeed, we can learn objectives.
- likely don't follow a single global gradient.
- can rewrite almost all levels of our computational processes as part of the learning process.
It might be useful to reflect on these differences; the first two are sketched in code below.
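As a toy illustration (my own, not a reference implementation --- the model, data, and learning rate are all made up for the example), here is the standard shuffle-then-train-then-test recipe next to a single-pass streaming setup where every input is first a test point and then a training point:

```python
import random

# Toy regression data with temporal structure: y = 2x + 1, visited in order.
data = [(x, 2.0 * x + 1.0) for x in range(100)]

def sgd_step(w, b, x, y, lr=1e-4):
    # One gradient step on squared error for a 1-D linear model.
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err

# Standard deep-learning shape: shuffle to approximate i.i.d. sampling,
# run a distinct training phase, then evaluate with weights frozen.
w, b = 0.0, 0.0
for epoch in range(20):
    shuffled = random.sample(data, len(data))  # deliberately destroys the input ordering
    for x, y in shuffled:
        w, b = sgd_step(w, b, x, y)
test_loss = sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Streaming shape: one pass over one long, ordered trajectory, with no
# boundary between training and testing.
w2, b2 = 0.0, 0.0
online_loss = 0.0
for x, y in data:                              # temporal order preserved
    online_loss += (w2 * x + b2 - y) ** 2      # evaluate before updating
    w2, b2 = sgd_step(w2, b2, x, y)            # then learn from the same example

print(f"test loss after i.i.d. training: {test_loss:.3f}")
print(f"cumulative online loss, single pass: {online_loss:.1f}")
```

Note that the streaming loop never gets to revisit an example, and its loss is charged on predictions made *before* learning from each input; that's much closer to the situation a human is in.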
Specific to reinforcement learning: we don't really have a notion of repeated terminating trajectories. A human gets only one trajectory, ever, and it's very, very long. So the model-free RL setting of many short, resettable episodes doesn't map cleanly onto the human case.
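A sketch of that structural difference, in the same toy spirit --- the gym-style reset()/step() interface and the ToyEnv/ToyAgent names are stand-ins of my own, not any real library:

```python
import random

class ToyEnv:
    """Trivial environment: reward 1 for guessing a hidden bit, re-drawn on reset."""
    def __init__(self):
        self.hidden, self.t = random.randint(0, 1), 0
    def reset(self):
        self.hidden, self.t = random.randint(0, 1), 0
        return self.hidden
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == self.hidden else 0.0
        done = self.t >= 10              # episodes terminate after 10 steps
        return self.hidden, reward, done

class ToyAgent:
    def act(self, obs):
        return obs                       # degenerate policy: echo the observation
    def update(self, reward):
        pass                             # learning stubbed out; the loop shape is the point

env, agent = ToyEnv(), ToyAgent()

# Episodic model-free RL: many short terminating trajectories, each reset
# wiping the state so returns can be averaged over repeated trials.
for episode in range(100):
    obs, done = env.reset(), False
    while not done:
        obs, reward, done = env.step(agent.act(obs))
        agent.update(reward)

# The human setting: a single trajectory that never terminates or resets.
obs = env.reset()                        # happens exactly once, at "birth"
for t in range(1_000):                   # stand-in for one very, very long life
    obs, reward, _ = env.step(agent.act(obs))   # `done` is ignored: no second life
    agent.update(reward)
```

In the second loop there is no outer episode counter to average over; whatever learning signal exists has to be extracted from within the one stream.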