most learning is by demonstration: Nonlinear Function
Created: July 10, 2020
Modified: June 12, 2021

most learning is by demonstration

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

In any human-to-human interaction, language carries some very important high-order bits, but it can only carry a few bits. It can help define goals and concepts, but your brain has billions of parameters and language doesn't have enough bits to supervise them. Language is part of LeCun's Cherry: it's valuable, but not anywhere close to a sufficient learning signal.

Explicitly teaching people through language is hard: people don't always understand which elements of their practice are critical or know how to express them in language. Furthermore, hearing something expressed in language wouldn't immediately give you the ability to do it, even if it were a low enough bitrate task. You still have to 'translate' the language into actions, and this process may not be mechanical: it will require planning, and advice is hard to take.

On the other hand, in imitation learning you see an entire action sequence containing all necessary information. The planning problem is solved for you, and you observe the dynamics of the environment with attention to the goal-relevant parts. There's just a ton of juice there both statistically and computationally. In biological systems, learning by demonstration greatly predates language and can exploit built-in mechanisms like mirror neurons.

This has implications for our practical lives, and for machine learning.

In practice:

  • The prominence of imitation learning means you are the sum of the people you spend time around. Conversely, if you want to become someone else, spend more time around people who are like the person you want to become.
  • It's impossible to learn to be a good researcher (or to do any job well) just by reading books. It might be that we can express enough of the abstract ideas in writing. But seeing the practice of an intellectual community (giving talks, developing ideas, deciding 'how to tell a story' for a paper, what kinds of questions people ask) is still crucial to becoming a person who can function within the culture.
  • This is also why it's hard to do something new. Even un-theorized areas have cultures of practice that are built up over time. In doing something entirely new, you don't have access to existing practices. At best, you can hope to transfer and translate pieces of related practices to the new context.

Lessons for ML and AI research:

  • We should be careful about trying to program robots strictly with goals, or with language. Even for 'simple' tasks, like cutting fruit, most of us have the benefit of demonstration before we do it ourselves. There may be a general mechanical fluency that comes from experience with many mechanical tasks, but that will be a later, higher-level thing built on top of a foundation of demonstration.
  • Experience from AlphaStar supports the importance of demonstration. It was bootstrapped by supervised learning on many human expert games. This gets it to the right part of the search space, where it's at least kind of doing the task. Searching for a policy with this property might otherwise have taken combinatorially long. (OpenAI Five, as far as I know, was trained by self-play from scratch rather than on expert games, so it belongs with the self-play discussion below.)
  • MuZero seems to provide evidence against the thesis of learning by demonstration: starting from scratch, it manages to 'solve' games like Go better than previous systems and without the bias of human initialization. But it benefits from self-play in at least two ways:
    • Because Go is a simple game for which we have the compute to do many simulations, it can brute-force its way to decent policies that would be much more expensive to search for in the real world.
    • Self-play gradually ramps up difficulty: a system playing against itself will 'win' half the time from the very beginning of training. Real-world tasks, where Nature is the adversary, do not naturally adapt in this way. They often present "reward cliffs", where some minimal degree of competence is needed to do a task at all. Demonstrations provide a bridge to a point where exploration can actually get a reward signal.
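
A toy sketch of the "reward cliff" point, under invented assumptions: the chain environment, its length, the step budget, and all function names here are hypothetical and chosen only for illustration. Reward appears only at the far end of a 1-D chain, so a randomly exploring policy almost never sees it, while a policy that simply copies one demonstrated trajectory (the crudest form of behavior cloning, a table lookup) reaches it every time.

```python
import random

CHAIN_LENGTH = 10   # hypothetical: reward only at state 10
STEP_BUDGET = 20    # hypothetical: steps allowed per episode


def run_episode(policy):
    """Walk a 1-D chain from state 0; return 1 only if the far end is reached."""
    state = 0
    for _ in range(STEP_BUDGET):
        action = policy(state)          # +1 (right) or -1 (left)
        state = max(0, state + action)  # cannot move left of the start
        if state == CHAIN_LENGTH:
            return 1
    return 0  # no partial credit: this is the "cliff"


def random_policy(state):
    """Undirected exploration: ignores the state entirely."""
    return random.choice([-1, 1])


def clone_policy(demonstrations):
    """Behavior cloning by table lookup: copy the demonstrated action per state."""
    table = {}
    for trajectory in demonstrations:
        for state, action in trajectory:
            table[state] = action
    # Fall back to random moves in states the demonstrations never visited.
    return lambda state: table.get(state, random.choice([-1, 1]))


def success_rate(policy, episodes=2000):
    """Fraction of episodes in which the policy receives any reward."""
    return sum(run_episode(policy) for _ in range(episodes)) / episodes


# One expert demonstration: always step right.
expert_demo = [(s, 1) for s in range(CHAIN_LENGTH)]

random.seed(0)
random_rate = success_rate(random_policy)
cloned_rate = success_rate(clone_policy([expert_demo]))
print(f"random exploration: {random_rate:.3f}, cloned from one demo: {cloned_rate:.3f}")
```

The random policy gets a reward signal on only a small fraction of episodes, so there is very little for a learner to improve on; the cloned policy starts on the far side of the cliff, where further exploration could actually be rewarded.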

The impossibility of demonstration is a big difficulty when learning meditation. Teachers can describe what to do, but you can never really see them do it. This makes one-on-one dialogue with a teacher, which can optimize information transfer analogously to interactive proofs, even more valuable than usual.