large control policies
Created: April 15, 2022
Modified: April 15, 2022

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Taco Cohen speculates on Large Control Policies as a successor to large language models: https://twitter.com/TacoCohen/status/1514675897781080080

  • these would be foundation models for control: condition one with text describing a task/goal, and it executes the task (rough interface sketch after this list).
  • to work across many environments you'd somehow also need to give it an environment spec and some mechanism for processing observations? or maybe there's some general interface for environments, like a human leader who can accomplish a wide range of tasks 'just' by talking with people and giving them direction.
  • if you could even do this for a specific robot, with specific actuators and cameras, that'd be quite useful.
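A minimal sketch of what such a text-conditioned policy interface might look like, assuming pre-computed task embeddings from some language model and flat observation vectors; all class, parameter, and dimension names here are hypothetical, not from the thread.

```python
import torch
import torch.nn as nn


class LargeControlPolicy(nn.Module):
    """Hypothetical text-conditioned control policy: task embedding + observation -> action."""

    def __init__(self, lm_dim=768, obs_dim=128, act_dim=7, hidden=1024):
        super().__init__()
        # project the natural-language task description (e.g. from a frozen LM)
        self.task_proj = nn.Sequential(nn.Linear(lm_dim, hidden), nn.ReLU())
        # project the per-step observation (a flat vector here, for simplicity)
        self.obs_proj = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # action head conditioned on both embeddings
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, task_embedding, obs):
        z_task = self.task_proj(task_embedding)  # [B, hidden]
        z_obs = self.obs_proj(obs)               # [B, hidden]
        return self.head(torch.cat([z_task, z_obs], dim=-1))  # [B, act_dim]


# usage: actions = policy(task_emb, obs) with task_emb of shape [B, 768] and obs of shape [B, 128]
```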

Pieter Abbeel's CVPR keynote "Towards a General Solution for Robotics": https://www.youtube.com/watch?v=l9pwlXXsi7Q&t=1316s

  • use unsupervised / contrastive losses to train a perceptual encoder - this can make pixel-based RL almost as efficient as learning from the underlying state representation (first sketch below)
  • often still need to use some reward supervision on the encoder
  • will need multi-task RL, or unsupervised RL via intrinsic motivation
  • it's still hard to specify good rewards exactly, so use iterated human-in-the-loop supervision: learn a reward model from humans choosing the better of two trajectories, similar to the work on steering language models, and iteratively generate new trajectories to make sure we're getting feedback on the right regions of state space (second sketch below)
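A rough sketch of the contrastive-encoder idea from the first bullet, using a SimCLR/CURL-style InfoNCE loss over two augmented views of the same observation; the function signature, shapes, and temperature are illustrative assumptions, not details from the talk.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(encoder, anchor_frames, positive_frames, temperature=0.1):
    """anchor_frames and positive_frames are two augmentations of the same batch
    of observations, shape [B, C, H, W]; encoder maps them to [B, D] embeddings."""
    z_a = F.normalize(encoder(anchor_frames), dim=-1)    # [B, D]
    z_p = F.normalize(encoder(positive_frames), dim=-1)  # [B, D]
    logits = z_a @ z_p.t() / temperature                 # [B, B] similarity matrix
    # the matching pair (i, i) is the positive; every other entry in row i is a negative
    labels = torch.arange(z_a.shape[0], device=z_a.device)
    return F.cross_entropy(logits, labels)
```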
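And a sketch of the pairwise-preference reward-model loss from the last bullet (a Bradley-Terry-style objective, as in the RLHF / steering-language-models line of work); again the names and shapes are assumptions for illustration, not the specific method from the keynote.

```python
import torch
import torch.nn.functional as F


def preference_loss(reward_model, traj_preferred, traj_rejected):
    """Each traj_* is a batch of trajectory features, shape [B, T, D].
    reward_model scores each step ([B, T, 1]); summing over time scores the trajectory."""
    r_pref = reward_model(traj_preferred).squeeze(-1).sum(dim=1)  # [B]
    r_rej = reward_model(traj_rejected).squeeze(-1).sum(dim=1)    # [B]
    # maximize the probability that the human-preferred trajectory gets the higher return
    return -F.logsigmoid(r_pref - r_rej).mean()
```

The learned reward model then stands in for the hand-specified reward during RL, and fresh trajectories from the improving policy get sent back to humans for comparison.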