Created: May 23, 2021
Modified: May 23, 2021
data efficiency
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
- Current (2021) deep networks require huge datasets in order to generalize. But we know that humans can do one-shot learning. How will we build ML systems that can learn as efficiently as people do?
- Multimodal transformers: maybe a picture is worth a thousand words, but also maybe vice versa. Combining multiple modalities might force a deeper understanding (see e.g. CLIP).
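As a concrete sketch, the contrastive objective behind CLIP matches each image to its paired caption in a batch. This toy numpy version uses random vectors as stand-ins for the two encoder towers (the batch size, width, and temperature are arbitrary illustrative choices, not CLIP's actual values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 16  # batch of 4 image/text pairs, embedding width 16

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# stand-ins for encoder outputs; matched pairs are made similar by construction
img = normalize(rng.normal(size=(n, d)))
txt = normalize(img + 0.1 * rng.normal(size=(n, d)))

# temperature-scaled cosine-similarity matrix: entry (i, j) scores image i vs text j
logits = img @ txt.T / 0.07

def cross_entropy(logits, labels):
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

labels = np.arange(n)  # the i-th image matches the i-th text
# symmetric loss: classify the right text for each image, and vice versa
loss = (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2
```

The point of the symmetric cross-entropy is that each modality supervises the other, which is one way "a picture is worth a thousand words, and vice versa" cashes out as a training signal.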
- General pretraining: once GPT-3 exists, we should be able to use it to help us with our vision problems.
- Meta-learning: SGD is a very inefficient learning algorithm; it may need to see a piece of data many times to fully incorporate it. We should be able to train better learning algorithms.
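One way to make "train better learning algorithms" concrete is the MAML family: optimize an initialization so that a single SGD step adapts well. Here is a minimal first-order sketch on a made-up family of 1-D regression tasks y = a·x (all hyperparameters are illustrative, not tuned):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, x, y):
    """Mean squared error of the linear model y_hat = w * x."""
    return np.mean((w * x - y) ** 2)

def loss_grad(w, x, y):
    """Gradient of the loss with respect to w."""
    return 2.0 * np.mean(x * (w * x - y))

# Meta-learn an initialization over tasks y = a * x, a ~ Uniform(1, 3)
# (first-order MAML, so we skip second derivatives).
w_meta = 0.0
inner_lr, outer_lr = 0.3, 0.05

for _ in range(500):
    a = rng.uniform(1.0, 3.0)            # sample a task
    x = rng.normal(size=20)
    y = a * x
    # inner loop: one gradient step of adaptation from the shared init
    w_adapted = w_meta - inner_lr * loss_grad(w_meta, x, y)
    # outer loop: nudge the init so that one inner step lands near the task solution
    w_meta -= outer_lr * loss_grad(w_adapted, x, y)

# After meta-training, one gradient step adapts quickly to an unseen task.
a_new = 2.5
x_new = rng.normal(size=20)
y_new = a_new * x_new
w_new = w_meta - inner_lr * loss_grad(w_meta, x_new, y_new)
```

The outer loop is itself just SGD, but over a different objective: post-adaptation loss rather than raw loss, which is what makes the result a better one-shot learner.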
- Fast weights: systems like GPT do succeed at one-shot learning, in part because their training task requires it. This is, in some sense, meta-learning---the system is given a bunch of one-shot learning tasks, and updated to do well on them. But the learning from any given example isn't 'stored' anywhere; it exists ephemerally in the activations of the network. Formalizing this and finding a way to store and recall these activations (more efficiently than re-presenting and re-processing the conditioning context) could be useful.
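A toy illustration of storing the context's activations rather than re-processing the context: a single attention head whose keys and values for the conditioning text are computed once and cached. The projection matrices here are random stand-ins for a trained model, and the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # model width (arbitrary)

# random projections standing in for a trained attention head
W_k, W_v, W_q = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# "conditioning context": processed once; its keys/values are cached
context = rng.normal(size=(16, d))
kv_cache = {"k": context @ W_k, "v": context @ W_v}

def attend(x, cache):
    """Answer a new query against cached context activations,
    without re-presenting or re-processing the context itself."""
    q = x @ W_q
    weights = softmax(q @ cache["k"].T / np.sqrt(d))
    return weights @ cache["v"]

query = rng.normal(size=(1, d))
out = attend(query, kv_cache)
```

This is just the standard key/value cache; the bullet's harder question is whether such ephemeral activations can be consolidated into something more durable and compact than a verbatim cache.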
- Explicit or implicit generative modeling: we all know that a good model is worth infinitely many data points.
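A toy version of that slogan: if the generative model of each class is known (here I assume isotropic Gaussian classes, purely for illustration), a single labeled example per class already yields a decent classifier, because the model supplies everything the data would otherwise have to teach:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
means = {0: np.zeros(d), 1: np.ones(d) * 2}  # true (unknown to the learner) class means

# one-shot training set: a single noisy example per class
shots = {c: mu + rng.normal(size=d) for c, mu in means.items()}

def classify(x):
    # under an assumed isotropic Gaussian model per class, maximum
    # likelihood given one example reduces to nearest-shot classification
    return min(shots, key=lambda c: np.sum((x - shots[c]) ** 2))

# evaluate on fresh samples from the true generative process
correct = 0
for _ in range(200):
    c = rng.integers(2)
    x = means[c] + rng.normal(size=d)
    correct += classify(x) == c
accuracy = correct / 200
```

The same decision boundary learned discriminatively from scratch would need far more than one example per class; the generative assumption is doing the work of the missing data.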