many models
Created: May 05, 2020
Modified: January 06, 2023

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

An idea I got from John Higgs's discussion of metamodernism is that taking 'all models are wrong' to its logical conclusion requires us to have many models.

A model describes a view of how God created your data. What quantities are relevant, what are their dynamics and distributions, and in what ways are our measurements noisy? Classical Bayesian inference assumes that the model is 'true', and of course it is easy to construct situations where this is a reasonable assumption; for example, by first building a model and then simulating data from it.
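
To make that best case concrete, here is a minimal sketch (a conjugate Gaussian model with toy numbers of my own choosing): the data really are simulated from the same model we then do inference in, and the posterior behaves exactly as advertised.

```python
# A minimal sketch of the 'model is true' setting: we simulate data from a
# known generative model and then do Bayesian inference in that same model.
# All numbers here are illustrative assumptions, not from the text.
import numpy as np

rng = np.random.default_rng(0)

# Generative model: mu ~ Normal(0, prior_std^2), x_i ~ Normal(mu, noise_std^2).
prior_mean, prior_std = 0.0, 10.0
noise_std = 2.0

# 'God' creates the data by sampling from the model itself.
true_mu = rng.normal(prior_mean, prior_std)
data = rng.normal(true_mu, noise_std, size=50)

# Conjugate posterior over mu (Normal prior + Normal likelihood).
prior_precision = 1.0 / prior_std**2
noise_precision = 1.0 / noise_std**2
post_precision = prior_precision + len(data) * noise_precision
post_mean = (prior_precision * prior_mean
             + noise_precision * data.sum()) / post_precision

print(f"true mu:        {true_mu:.3f}")
print(f"posterior mean: {post_mean:.3f} +/- {post_precision**-0.5:.3f}")
```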

But in real life, all models are wrong; the map is not the territory. The true story for how God created your data is the Big Bang followed by 14 billion years of physics, but that's not usually the most useful lens. To be useful, your model will inevitably be a simplification that omits relevant factors.

Which factors should it preserve? If we have specific goals (a utility function), we might want our representation to help us choose actions to achieve those goals. For example, we can model the solar system as an interaction of a relatively small number of point masses (planets, moons, major asteroids, etc.). This is a great representation if our goal is to predict where Mercury will be next January. It is a bad representation if our goal is to predict who will win the next American presidential election. Of course, American politics is part of the reality of the solar system, but not one preserved in the planets-as-point-masses model.

In the absence of specific goals, a general principle is to find the model that best compresses the data (that is, the model producing the minimum description length). The most concise representation that fits the data is the most likely to generalize, since it includes the fewest unnecessary assumptions. To 'compress' the solar system as a whole, a model would need to encompass the physics of planetary orbits, the issues, tensions, coalitions and events of American political life, the fluid dynamics of the atmosphere of Venus, traffic patterns in Mumbai, and unimaginably many other phenomena. To combine all of these into a single model would require specifying not just models for each of these phenomena individually, but a joint model that specifies in all cases how they interact: what happens when political principles collide with economic or scientific principles? The answer, of course, is that 'it depends' on various context-specific details. How would such a joint model represent the world? It cannot just concatenate the representations of its component models, because it will need additional information not captured by those models to resolve conflicts in their predictions.
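
The compression framing can be made concrete with a toy two-part code: pay bits to describe the model, then bits to encode the data given the model. The models and the parameter cost below are my own illustrative choices; the point is just that a richer model earns its keep only when the bits it saves on the data exceed the bits spent describing it.

```python
# A toy illustration of the two-part MDL idea: total description length is
# (bits to describe the model) + (bits to encode the data given the model).
# The model choices and bit costs below are illustrative assumptions.
import numpy as np

def bits_given_model(data, p):
    """Code length of a binary sequence under an i.i.d. Bernoulli(p) model."""
    ones = data.sum()
    zeros = len(data) - ones
    return -(ones * np.log2(p) + zeros * np.log2(1 - p))

rng = np.random.default_rng(0)
data = rng.random(1000) < 0.9   # data actually generated with p = 0.9

# Model A: fair coin; nothing to estimate, so the model itself costs ~0 bits.
total_fair = 0.0 + bits_given_model(data, 0.5)

# Model B: Bernoulli(p) with p fit to the data; charge the usual
# (1/2) * log2(n) bits for communicating the fitted parameter.
p_hat = data.mean()
total_fit = 0.5 * np.log2(len(data)) + bits_given_model(data, p_hat)

print(f"fair-coin code: {total_fair:.1f} bits")
print(f"fitted-p code:  {total_fit:.1f} bits")  # much shorter for this data
```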

Q: What about probabilistic models? We can meaningfully define a notion of 'compression' that doesn't resolve all questions, but just specifies distributions and then codes optimally according to those distributions. But again: how do we reconcile the (distributional) predictions of higher-level models? We can define arbitrary answers (product of experts, etc.). A: I argue that at this point---where we don't have the information to reconcile models in a principled way---we're already in a many-modeled world.
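
As a concrete illustration of how arbitrary these reconciliation rules are, here is a toy sketch (with made-up numbers) comparing a product of experts against a simple mixture for two models that predict distributions over the same discrete outcomes; nothing in either model tells us which combination rule is the right one.

```python
# Two arbitrary ways to reconcile two models' distributional predictions over
# the same outcome space: a product of experts (multiply and renormalize)
# versus an equally-weighted mixture. The numbers are made up for illustration.
import numpy as np

outcomes = ["sunny", "cloudy", "rain"]
expert_a = np.array([0.7, 0.2, 0.1])   # e.g. a physics-flavored weather model
expert_b = np.array([0.3, 0.3, 0.4])   # e.g. a model based on the season

product = expert_a * expert_b
product /= product.sum()                    # product of experts

mixture = 0.5 * expert_a + 0.5 * expert_b   # equally-weighted mixture

for o, p_poe, p_mix in zip(outcomes, product, mixture):
    print(f"{o:6s}  product-of-experts {p_poe:.2f}   mixture {p_mix:.2f}")
```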

Principles:

  • By definition, a model will leave stuff out.
  • Different models will leave different stuff out.
  • The usefulness of any given model for making a particular decision depends on whether and how that model represents stuff relevant to the decision.
  • Therefore, no one model is right for all decisions. The model we'd theoretically want, the 'true model', is in fact not a model: it's the system itself. It's the territory, not a map. The contrapositive of 'all models are wrong' is that, if it's right, it's not a model.

This observation, that different decision-making tasks demand different models, is related to the motivation for "loss-calibrated approximate Bayesian inference", which incorporates the reward signal to guide the belief representation.

This implies that an intelligent agent doesn't want just one model of the world; it wants many models. It doesn't want just one representation; it needs many representations, each capturing different aspects of the system.

This observation is a death knell for the idea that we can build any general-purpose system through purely Bayesian reasoning in a 'true model'. For most complex systems, no practical generative model will fully capture the raw data. Every model is built for a purpose. But any general-purpose agent has, tautologically, general purposes, so it must have many models.

I can think of two ways to implement 'many models'.

  • The first: build generative models on many different representations, each of which is produced (abstracted) from our raw data by some discriminative process; the raw data itself is never explicitly modeled. These representations and our predictions of them can be used for decision-making, to predict rewards and actions. In this architecture, the system never produces an underlying unified model of reality (a rough sketch follows this list).
  • Alternatively, we could somehow combine generative models defined on multiple representations to produce a model of the underlying reality. For example, for any underlying state, we can abstract it to the representation of each model, compute its likelihood under that representation, and then assign the state a likelihood given by some weighted mixture/product of experts, which we train to model the base-level distribution. Again, I claim that in this case we have already given in to the many-modeled world. (And maybe it's also troublesome for other reasons: can we even sample or do inference in such a model?)
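
Here is a rough sketch of the first architecture, with hypothetical names and toy dynamics of my own invention: each model pairs an abstraction of the raw data with a predictor defined only on that abstraction, and the agent simply dispatches to whichever model is relevant to the decision at hand. No model of the raw data ever appears.

```python
# A minimal sketch of the first architecture: several task-specific models,
# each pairing an abstraction of the raw data with a predictor defined only
# on that abstraction. The raw data itself is never modeled. All names and
# toy dynamics here are hypothetical, for illustration only.
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class AbstractModel:
    abstract: Callable[[Dict[str, Any]], Any]  # raw observation -> representation
    predict: Callable[[Any], Any]              # representation -> predicted next representation

# Raw 'world state' with far more detail than any one model uses.
raw_obs = {"planet_angles": [0.1, 2.3, 4.0], "poll_lead": 0.04, "traffic_index": 87}

models = {
    # A mechanics-style model: only the planetary angles exist for it.
    "orbits": AbstractModel(
        abstract=lambda obs: obs["planet_angles"],
        predict=lambda angles: [a + 0.01 for a in angles],  # toy dynamics
    ),
    # An election-style model: only the polling margin exists for it.
    "election": AbstractModel(
        abstract=lambda obs: obs["poll_lead"],
        predict=lambda lead: 0.9 * lead,                     # toy mean reversion
    ),
}

def plan(task: str, obs: Dict[str, Any]) -> Any:
    """Pick whichever model represents the stuff relevant to this decision."""
    model = models[task]
    return model.predict(model.abstract(obs))

print(plan("orbits", raw_obs))
print(plan("election", raw_obs))
```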

Intuitively: to travel the world we will want many different maps. Some will show terrain, some national and political boundaries, some, the locations of good Vietnamese restaurants. Each map is useful for a different purpose. A large (but manageable) collection of such maps is dramatically more useful than a single map the size of the territory would be.

I think my younger self fell too often into the idea that there is a 'true' model, or a 'right' way to think about the world. It is true that for a given goal (utility function), there is an optimal representation of the world, but it is not true that human beings all have the same goals, or even consistent goals over time.

An implication of the many-modeled viewpoint is that just seeking understanding is futile. The task is ill-posed; it's like asking for the perfect map. The answer will always depend on what you want to do with it. So, a life of learning may be attractive, and learning (both data collection and model formation/synthesis) is of course valuable, but it is not by itself a coherent goal.

Examples:

  • Quantum mechanics and relativity: two different maps. We don't yet know the territory.

Outline of a book or blog series on this thesis: writing inbox

  • What are models and what are they good for?
  • Models are reality tunnels.
  • A day in the life of a person, and all of the different models that we use.
  • The philosophical/cultural story, following John Higgs. Postmodernism.
  • Relation to ensembles in machine learning.
    • We could think of a many-modeled representation as an ensemble. If each map implicitly defines a probability distribution on the underlying territory (even if we don't know how to normalize that distribution), then a collection of maps describes a mixture distribution (?). Actually that doesn't seem right. Each map puts constraints on the underlying territory. If we trust our maps, then the territory must satisfy all of the constraints. So the implied distribution is really a product of experts.
  • Relation to emptiness in Buddhism. Seeing some concept or experience as empty means realizing that it is 'only' part of a mental model. This is liberating because we understand that there are many potential models; there is always a different way to see things.
  • How to manage a many-modeled system? How do we know when to use each model? When do we learn new models?
  • What does many-modeled-ness imply about the design of artificial agents?
    • The first caveat is that it may not be necessary for us to think explicitly about models at all in designing agents. Deep networks are circuits, and we use differentiable programming to learn programs that can internally do model-based reasoning without needing our help. There's a world in which black-box learning of systems like GPT-3 gets us all the way, without building in a notion of model-based reasoning. But let's restrict ourselves to consideration of systems in which there is explicit representation of a model, e.g., the sort of systems built by people who do model-based RL.
    • There is a connection between many-modeled-ness and hierarchical planning: to plan in terms of high-level actions, you need a model of what those actions will do. This model is necessarily different from the ground-level model of the domain.
    • If a MuZero-like agent learns a single abstract model of the world, is it doomed? If it wanted multiple models, then presumably it could use part of its latent vector for one model, another part for another, and in situations where only one part is relevant it would just waste computation on the other parts.
  • How does it inform value alignment?
  • How does it inform curiosity and intrinsic motivation? I think we ordinarily define curiosity as model-driven.