Created: January 15, 2021
Modified: January 15, 2021
PAC-Bayes
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
- I'm trying to build my understanding. These are fragments of intuitions.
- Bayesian inference starts with a prior P and a likelihood. Given data D, these uniquely determine a posterior. PAC-Bayes generalizes this: for a 'prior' distribution P over hypotheses and an arbitrary 'posterior' Q (not necessarily obtained from P by conditioning on the data), it gives bounds on the generalization risk of Q.
- Note that risk is defined in terms of a loss function. PAC-Bayes bounds often assume bounded loss.
- This also generalizes standard PAC learning, where we think about the risk of individual hypotheses. PAC-Bayes uses the 'Bayesian' risk, where we take the expectation over both data and our posterior on hypotheses.
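- In symbols (my notation; an i.i.d. sample S = {z_1, ..., z_n} from D and a loss ℓ): the Bayesian posterior and the in-sample and out-of-sample risks of Q (often called Gibbs risks) are
$$Q_{\text{Bayes}}(h) \propto P(h)\, p(D \mid h), \qquad \hat{R}_S(Q) = \mathbb{E}_{h \sim Q}\Big[\tfrac{1}{n}\sum_{i=1}^{n} \ell(h, z_i)\Big], \qquad R(Q) = \mathbb{E}_{h \sim Q}\,\mathbb{E}_{z \sim D}\big[\ell(h, z)\big].$$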
- A PAC-Bayes bound on generalization error provides a learning principle: we might try to find the posterior Q that minimizes the error bound (a toy numerical sketch of this is at the end of these notes).
- Bounds are generally of the form
out_of_sample_risk(Q) <= in_sample_risk(Q) + complexity_term(Q).
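- For concreteness, one standard instance (a McAllester-style bound for losses in [0, 1]; I'm quoting it from memory, so treat the constants as approximate): with probability at least 1 − δ over the sample, simultaneously for all Q,
$$R(Q) \le \hat{R}_S(Q) + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},$$
so KL(Q ∥ P) is what plays the role of complexity_term(Q): posteriors that move far from the prior pay for it.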
- A natural question: when is the 'optimal' Q equal to the Bayesian posterior?
- A natural guess: when our loss is the negative log-likelihood for some likelihood model. It seems like this should be a central result??
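- A sketch of why that guess should work (from memory; I believe this is essentially the result of Germain, Bach, Lacoste & Lacoste-Julien, "PAC-Bayesian Theory Meets Bayesian Inference", 2016): take a bound whose right-hand side is the linearized objective
$$\hat{R}_S(Q) + \frac{\mathrm{KL}(Q \,\|\, P)}{\lambda}$$
for some fixed λ > 0. By the Donsker–Varadhan / Gibbs variational argument, the Q minimizing this objective is the Gibbs posterior
$$Q_{\lambda}(h) \propto P(h)\, e^{-\lambda\, \hat{R}_S(h)}.$$
If the loss is the negative log-likelihood, ℓ(h, z) = −log p(z | h), then taking λ = n gives
$$Q_n(h) \propto P(h)\, \exp\!\Big(\sum_{i=1}^{n} \log p(z_i \mid h)\Big) = P(h)\, p(D \mid h),$$
which is exactly the Bayesian posterior.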
- One reason to understand PAC-Bayes is that I'd like to get a picture of how it relates to the idea of many models.
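- A toy numerical sketch of the bound-minimization learning principle above (my own construction, not from any particular paper): a finite class of 1D threshold classifiers, 0-1 loss, a uniform prior, a small family of Gibbs posteriors indexed by a temperature, and the McAllester-style bound above as the objective.
```python
import numpy as np

# Toy sketch (my own construction): minimize a McAllester-style PAC-Bayes bound
# over a family of Gibbs posteriors Q_lam(h) ∝ P(h) exp(-lam * n * emp_risk(h))
# for a finite class of 1D threshold classifiers with 0-1 loss.

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = (x > 0.1).astype(int)                        # "true" threshold at 0.1
y ^= (rng.uniform(size=n) < 0.1).astype(int)     # flip 10% of labels as noise

thresholds = np.linspace(-1.0, 1.0, 101)         # finite hypothesis class
preds = (x[None, :] > thresholds[:, None]).astype(int)
emp_risk = (preds != y[None, :]).mean(axis=1)    # empirical 0-1 risk of each h

prior = np.full(len(thresholds), 1.0 / len(thresholds))  # uniform prior P
delta = 0.05

def mcallester_bound(q):
    """In-sample Gibbs risk + sqrt((KL(q || prior) + ln(2*sqrt(n)/delta)) / (2n))."""
    nz = q > 0
    kl = np.sum(q[nz] * np.log(q[nz] / prior[nz]))
    return q @ emp_risk + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

# Sweep the temperature and keep the posterior with the smallest bound.
best_lam, best_bound = None, np.inf
for lam in [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]:
    logits = -lam * n * (emp_risk - emp_risk.min())  # shift for numerical stability
    q = prior * np.exp(logits)
    q /= q.sum()                                     # Gibbs posterior at this lam
    bound = mcallester_bound(q)
    if bound < best_bound:
        best_lam, best_bound = lam, bound

print(f"best temperature: {best_lam}, bound on out-of-sample Gibbs risk: {best_bound:.3f}")
```
Nothing here is tuned; the point is just the shape of the procedure: the bound is a computable function of Q, so 'learning' can mean searching a family of posteriors for the one with the smallest bound.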