Modified: February 25, 2022
in silico
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

In the 21st century, humanity is developing a new form of engineering. Rather than manually designing artifacts, we are optimizing over design spaces. Rather than manually testing ideas, we are running tests automatically in simulation and in the real world. This infrastructure doesn't necessarily require fundamentally better understanding of 'human' intelligence (although I bet that will help). But it will still revolutionize the world.
What are the pieces of the in silico engineering stack? For intuition, let's think concretely about the problem of protein design.
- There is a representation: for a protein, this is its amino acid sequence.
- There is a manufacturing process that maps the representation into a real-world artifact. For proteins, this might involve inserting a gene that encodes the protein into some bacterium, yeast, etc., and expressing it.
- There is a real-world goal: the protein should catalyze a specific reaction, say, or bind to a specific target. The only way to judge a representation according to the real-world goal is to actually run an experiment. But this is expensive.
- Note that the 'real-world goal' doesn't really need to be in the real world; it just needs to be more expensive to evaluate than the proxy we're going to learn for it. So it can just as well be a difficult computation. That means we can nest this whole engineering structure to create optimizers at multiple levels.
- We therefore need a proxy reward that can be evaluated in silico, on the representation.
- Since the real-world goal is (by definition) for some experiment to measure a particular outcome, a very natural way to construct a proxy reward function is using a simulator. We simulate the experiment and the resulting measurement.
- There is an optimizer that can be applied to improve the representation with respect to the proxy reward. This may involve discrete search, gradient-based optimization, evolution, etc. (a toy sketch of these pieces follows this list).
- We occasionally need to do real-world experiments because our proxy reward is not the true reward. This means we need a notion of reward uncertainty: how well does the proxy reward represent the true reward?
- Since the proxy reward is typically defined on a model of the world, our reward uncertainty is partly due to model uncertainty: how many possible models have we not yet been able to rule out?
- We use Bayesian optimization to choose experiments to perform, according to what will best improve our proxy reward model (including e.g. any simulators on which it's based).
- The actual real-world experiments are implemented by robotics.
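As a toy illustration of the pieces above (not real protein design; every function here is a made-up stand-in), here is a sketch where the representation is a short numeric vector, 'manufacturing' is a no-op, the proxy reward is a cheap analytic surrogate standing in for a simulator, and the optimizer is random search:

```python
import random

# Toy stand-ins for the pieces above; none of this is a real protein-design API.

def manufacture(representation):
    # In the protein case this would be gene synthesis + expression;
    # here it is just a placeholder that returns the representation itself.
    return representation

def proxy_reward(representation):
    # Cheap in-silico surrogate for the real measurement (e.g. a simulator).
    return -sum((x - 0.5) ** 2 for x in representation)

def optimize(reward_fn, dim=3, n_candidates=1000):
    # Optimizer: naive random search over the representation space.
    candidates = [[random.random() for _ in range(dim)] for _ in range(n_candidates)]
    return max(candidates, key=reward_fn)

best = optimize(proxy_reward)
print(best, proxy_reward(manufacture(best)))
```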
Note that the engineering problem is a special case of model-based RL with no state. That is, the only 'action' we take is to choose the initial conditions of each experiment, and our reward is based on what happens. The system state does evolve within an experiment (and we learn to simulate this), but we assume we can always reset the state and run a new independent experiment.
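To make that loop concrete, here is a minimal sketch of the campaign structure under the stateless-RL view; `choose_design`, `run_experiment`, and `update_model` are hypothetical stand-ins for the acquisition step, the (robotic) real-world experiment, and refitting the proxy reward model:

```python
# Minimal sketch: the only "action" is choosing an experiment's initial
# conditions, and each experiment is an independent episode with one reward.
def run_campaign(choose_design, run_experiment, update_model, n_rounds=10):
    history = []  # (design, measured reward) pairs from past experiments
    for _ in range(n_rounds):
        design = choose_design(history)   # acquisition: pick the next experiment
        reward = run_experiment(design)   # expensive real-world measurement
        history.append((design, reward))  # no state carried between experiments
        update_model(history)             # refit the proxy reward / simulator
    return history
```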
Other examples:
- Neural net hyperparameter tuning:
- Representation: hyperparameter vector
- Manufacturing process: run Keras to build a model.
- Real-world goal: get low validation loss on a trained model.
- Proxy reward: a GP.
- Simulator: not really present (the GP short-circuits simulator + reward)
- Optimizer: some kind of search over the GP's input space.
- Reward uncertainty: the GP maintains epistemic uncertainty about the reward function at different regions in space.
- Bayesian optimization: we use an acquisition function, like expected improvement, or do something like Thompson sampling (a minimal GP-plus-expected-improvement sketch follows this list).
- Real-world experiments: actually fitting the model. The 'robot' is Xmanager/Borg/etc.
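A minimal sketch of this instance, assuming a single hyperparameter (log learning rate) and a toy `train_and_validate` function standing in for the real Keras run; the GP and expected improvement are just scikit-learn and scipy:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def train_and_validate(log_lr):
    # Stand-in for the expensive experiment: train a model, return validation loss.
    return (log_lr + 3.0) ** 2 + np.random.normal(scale=0.05)

def expected_improvement(gp, X, best_loss):
    # Acquisition function for minimization: expected amount by which we beat best_loss.
    mu, sigma = gp.predict(X, return_std=True)
    imp = best_loss - mu
    z = imp / np.maximum(sigma, 1e-9)
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

# A few random initial experiments over log learning rate in [-6, 0].
X = np.random.uniform(-6, 0, size=(3, 1))
y = np.array([train_and_validate(x[0]) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)  # proxy reward model
    grid = np.linspace(-6, 0, 200).reshape(-1, 1)              # search over input space
    x_next = grid[np.argmax(expected_improvement(gp, grid, y.min()))]
    X = np.vstack([X, x_next])                                  # chosen experiment
    y = np.append(y, train_and_validate(x_next[0]))             # real-world result

print("best log learning rate found:", X[np.argmin(y)][0])
```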