MuZero

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Links to this note

AI reflections master

This page is a general jumping-off point for organizing my thoughts about the [ AI research landscape ], where the field is, where it is…

reward funnel

When thinking about the [ reward ] function for a real-world AI system, there is always some causal process that determines reward. For…

model-based rl

Often we don't explicitly use 'model-based RL' methods; instead, people in robotics talk about Sim2Real: adapting a policy pretrained in a…

AI predictions

In the spirit of [ prediction as a model-building exercise ]. Language modeling: system writes publishable poetry: debatably already…

value aligned language game

Suppose I have an agent that generates text. I want it to generate text that is [ aligned ] with human values. Approaches…

most learning is by demonstration

In any human-to-human interaction, language carries some very important high-order bits, but it can only carry a few bits. It can help…

AI research landscape

As of April 2021: Giant [ transformer ]s work better than anyone has a right to expect. GPT3 is fucking amazing. [ DALL-E ] clearly has some…

many models

An idea I got from [ John Higgs ]'s discussion of metamodernism is that taking [ all models are wrong ] to its logical conclusion requires…