Created: July 10, 2020
Modified: June 12, 2021

LeCun's Cherry

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • Yann LeCun's famous cake analogy: "If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning." He later updated 'unsupervised learning' to 'self-supervised learning' (which is one approach to representation learning).
  • This is 'obviously true' from the standpoint of information content: predicting the input provides far more bits of supervision per sample than a typical reward function (on the order of millions of bits for predicting something like a video frame, versus at most a few bits from a scalar reward).
  • On the other hand, Silver, Singh, Precup, and Sutton argue that reward is enough: maximizing a reward signal is, on its own, sufficient to drive a very broad range of intelligent abilities.
  • These points do not conflict with each other; both can be true. But they are at least in tension: a predictive objective is not an RL objective. So if we're ultimately maximizing an RL objective, how do we get the bits of information from a predictive objective?
    • Of course maximizing reward implies the ability to plan, which implies being able to perceive and predict, etc. But it doesn't necessarily give us a strong gradient signal.
    • Can we formalize self-supervision as reward shaping? (One rough way to make that concrete is sketched below.)
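
A naive way to make that question concrete, purely as my own illustration (not anything from LeCun or the Silver et al. paper): add the per-transition self-supervised prediction loss to the extrinsic reward as an intrinsic bonus, in the spirit of curiosity-driven exploration. Everything below (the forward model, `beta`, the dimensions, the stand-in data) is a placeholder.

```python
# A minimal sketch: treat the self-supervised prediction loss as an intrinsic
# reward added to the environment's extrinsic reward.

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2

# Self-supervised forward model: predict the next observation from (obs, action).
forward_model = nn.Sequential(
    nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, OBS_DIM)
)
optimizer = torch.optim.Adam(forward_model.parameters(), lr=1e-3)
beta = 0.1  # weight of the shaping term (hypothetical hyperparameter)

def shaped_reward(obs, action, next_obs, extrinsic_reward):
    """Return extrinsic reward plus a prediction-error bonus.

    The bonus is the per-transition self-supervised loss, so the many
    'bits of supervision' from prediction show up inside the RL objective.
    """
    pred = forward_model(torch.cat([obs, action], dim=-1))
    prediction_loss = ((pred - next_obs) ** 2).mean(dim=-1)

    # Train the predictor on the same transition (the self-supervised part).
    optimizer.zero_grad()
    prediction_loss.mean().backward()
    optimizer.step()

    # Shaped reward handed to the RL algorithm (the reward-shaping part).
    return extrinsic_reward + beta * prediction_loss.detach()

# Usage with stand-in data (no environment here; only the shapes matter).
obs = torch.randn(4, OBS_DIM)
action = torch.randn(4, ACT_DIM)
next_obs = torch.randn(4, OBS_DIM)
r_ext = torch.zeros(4)  # sparse extrinsic reward
print(shaped_reward(obs, action, next_obs, r_ext))
```

Caveat: strictly speaking this is reward augmentation rather than potential-based shaping in the Ng, Harada, and Russell sense, so it can change the optimal policy; a true shaping formulation would need a potential function built from the predictive model.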