MuZero
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
Links to this note
AI reflections master
This page is a general jumping-off point for organizing my thoughts about the [ AI research landscape ], where the field is, where it is…
reward funnel
When thinking about the [ reward ] function for a real-world AI system, there is always some causal process that determines reward. For…
model-based rl
Often we don't explicitly use 'model-based RL' methods, instead people in robotics talk about Sim2Real: adapting a policy pretrained in a…
AI predictions
In the spirit of [ prediction as a model-building exercise ]. Language modeling: system writes publishable poetry: debatably already…
value aligned language game
Suppose I have an agent that generates text. I want it to generate text that is [ aligned ] with human values. Approaches…
most learning is by demonstration
In any human-to-human interaction, language carries some very important high-order bits, but it can only carry a few bits. It can help…
AI research landscape
As of April 2021: Giant [ transformer ]s work better than anyone has a right to expect. GPT3 is fucking amazing. [ DALL-E ] clearly has some…
many models
An idea I got from [ John Higgs ]'s discussion of metamodernism is that taking [ all models are wrong ] to its logical conclusion requires…