Modified: May 04, 2023
natural abstraction
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

A 'natural' abstraction is one that we expect any agent (or at least, a wide range of agents) to develop because it gets at something essentially important about a system.
Generally speaking, developing 'good' or at least useful abstractions is an important capability for AIs and humans. Finding concepts that provide a useful lens on the world, like good definitions in mathematics, can be extraordinarily powerful.
There is a specific thread of 'natural abstractions' work proposed as an approach to AI alignment: if we can identify what makes some abstractions, like 'person', natural, then perhaps we can find a way to robustly specify to an AI that it should care about people. https://www.lesswrong.com/posts/gvzW46Z3BsaZsLc25/natural-abstractions-key-claims-theorems-and-critiques-1
I don't yet understand what, if anything, distinguishes this research from the capabilities direction.
There seems to be a pretty fundamental connection to Noether's theorem: every continuous symmetry of a system has an associated conserved quantity. The conserved quantity could be viewed as a 'natural abstraction' of the system. For example, the energy of a physical system is a very natural abstract representation of the system's microstate.
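A toy sketch of this idea (my own illustration, not from the linked post): simulate a harmonic oscillator with a leapfrog integrator. The microstate (position, velocity) changes constantly, but the energy, the conserved quantity associated with time-translation symmetry, stays essentially fixed, so a single number summarizes something essential about the whole trajectory.

```python
def energy(x, v, k=1.0, m=1.0):
    # Total energy: kinetic + potential for a spring with stiffness k.
    return 0.5 * m * v * v + 0.5 * k * x * x

def leapfrog(x, v, dt, k=1.0, m=1.0):
    # One velocity-Verlet step for the force F = -k * x.
    v_half = v - 0.5 * dt * k * x / m
    x_new = x + dt * v_half
    v_new = v_half - 0.5 * dt * k * x_new / m
    return x_new, v_new

x, v = 1.0, 0.0
e0 = energy(x, v)
for _ in range(10_000):
    x, v = leapfrog(x, v, dt=0.01)

# The microstate has moved through many oscillations, but the
# 'abstraction' (energy) is unchanged to within integration error.
drift = abs(energy(x, v) - e0)
assert drift < 1e-4
```

The point of the sketch: any agent modeling this system, whatever its internal representation, has reason to discover the energy, because it compresses the system's long-run behavior far better than tracking the raw microstate.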