priors are conceptual attention
Created: May 25, 2023
Modified: May 25, 2023

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

A Bayesian view of (one aspect of) attention, inspired by a conversation with Shamil Chandaria on predictive processing (though what follows is my interpretation, not necessarily his).

In a space of possible explanations, you can't check all of them due to finite resources. Which ones do you consider (attend to)? Those with high prior probability!

This is basically saying that we do something like importance sampling with the prior as the proposal distribution. (More generally, attention is whatever proposal distribution we use; but there's a sense in which any non-data-driven proposal is our true revealed prior, in that it indicates where we actually expect to find the high-likelihood explanations.)
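
Here's a minimal sketch of that reading, with entirely made-up numbers: treat the prior as the proposal distribution, sample (attend to) a handful of candidate explanations from it, and only evaluate the likelihood for those. When the proposal is the prior, the importance weights reduce to the likelihoods of the attended hypotheses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten candidate explanations (hypotheses), purely illustrative.
hypotheses = np.arange(10)

# Prior over explanations: concentrated on a few "expected" ones.
prior = np.array([0.30, 0.25, 0.15, 0.10, 0.08, 0.05, 0.03, 0.02, 0.01, 0.01])

# Likelihood of the observed data under each hypothesis (made-up numbers).
likelihood = np.array([0.01, 0.02, 0.05, 0.40, 0.30, 0.10, 0.05, 0.04, 0.02, 0.01])

# "Attention" as importance sampling with the prior as the proposal:
# we only evaluate the likelihood for the hypotheses we sample (attend to).
n_samples = 5
attended = rng.choice(hypotheses, size=n_samples, p=prior)

# Target is prior * likelihood and the proposal is the prior, so the
# importance weights reduce to the likelihoods of the attended hypotheses.
weights = likelihood[attended]

# Self-normalized estimate of the posterior over explanations.
posterior_estimate = np.zeros_like(prior)
np.add.at(posterior_estimate, attended, weights)
posterior_estimate /= posterior_estimate.sum()

print("attended hypotheses:", attended)
print("estimated posterior:", np.round(posterior_estimate, 3))
```

The failure mode is also visible here: if the prior puts almost no mass on the actually-correct explanation, we rarely attend to it and the estimate stays poor until we draw many more samples.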

In a multilevel Bayesian model of the world, we have priors on high-level concepts and organizational strategies (I have a body; there is an external world which contains objects), which induce expectations over more granular categories (there is a table in front of me), which in turn induce expectations over even lower-level perceptions (the tabletop has a particular wood-grained texture), which can ultimately be checked against sensory data. Inference happens via some version of belief propagation or variational inference, which updates our beliefs at each level into a self-consistent world model that includes our observed sensory data.
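
As a toy illustration of the hierarchy (my own sketch, not anything specific from the conversation), here is a small chain running concept → category → sensory feature with made-up probability tables: the top-down pass turns priors into expectations at lower levels, and the bottom-up pass updates every level once a feature is observed, which is just exact belief propagation on a chain.

```python
import numpy as np

# Two high-level concepts, two mid-level categories, two low-level features.
# All numbers are made up for illustration.

# Prior over high-level concepts, e.g. "indoor scene" vs "outdoor scene".
p_concept = np.array([0.7, 0.3])

# P(category | concept), e.g. "table in front of me" vs "no table".
p_category_given_concept = np.array([[0.8, 0.2],
                                     [0.3, 0.7]])

# P(feature | category), e.g. "wood-grain texture" vs "no such texture".
p_feature_given_category = np.array([[0.9, 0.1],
                                     [0.2, 0.8]])

# Top-down pass: priors at each level induce expectations one level down.
p_category = p_concept @ p_category_given_concept
p_feature = p_category @ p_feature_given_category

# Bottom-up pass: observe the low-level feature (index 0, "wood grain")
# and propagate the evidence back up the chain.
obs = 0
p_category_post = p_category * p_feature_given_category[:, obs]
p_category_post /= p_category_post.sum()

p_obs_given_concept = p_category_given_concept @ p_feature_given_category[:, obs]
p_concept_post = p_concept * p_obs_given_concept
p_concept_post /= p_concept_post.sum()

print("top-down expectation over features:", np.round(p_feature, 3))
print("posterior over categories:", np.round(p_category_post, 3))
print("posterior over concepts:", np.round(p_concept_post, 3))
```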

This inference seems to be guided top-down in important ways. Depression can be seen as a prior that the world is generally bad, that things are deeply wrong, and so on, which tends to lead us to find explanations at every level that are consistent with that prior.

Another view (which seems to be what Chandaria actually holds) is that attention is something separate from priors. When something surprising happens in awareness (data that is low-probability under our model), our attention is drawn towards it. This helps ensure that we devote resources to incorporating the surprising data into our model. On this view, attention is part of the inference algorithm, part of the update mechanism. It might be (in fact, almost certainly is the case) that there are just multiple senses of 'attention': different mechanisms by which we allocate cognitive resources.
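
A toy rendering of that second sense (illustrative numbers only): score each observation by its surprisal under the model and split a fixed processing budget in proportion to it, so the least-expected data draws the most resources.

```python
import numpy as np

# Predicted probability of each incoming observation under the current model
# (illustrative numbers only).
predicted_prob = np.array([0.60, 0.25, 0.10, 0.04, 0.01])

# Surprisal (negative log probability): high for data the model didn't expect.
surprisal = -np.log(predicted_prob)

# Toy allocation rule: split a fixed processing budget in proportion to
# surprisal, so the most surprising data gets the most resources.
attention_share = surprisal / surprisal.sum()

for p, a in zip(predicted_prob, attention_share):
    print(f"p(observation) = {p:.2f} -> attention share = {a:.2f}")
```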

to let go in a spiritual sense might == relaxing priors, including the prior that the self exists (which in predictive processing is what drives action)