Created: January 15, 2021
Modified: January 15, 2021
PAC-Bayes
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
- I'm trying to build my understanding. These are fragments of intuitions.
- Bayesian inference starts with a prior P and a likelihood. Given data D, these uniquely determine a posterior. PAC-Bayes generalizes this: for a 'prior' distribution P over hypotheses and an arbitrary 'posterior' Q (not necessarily obtained from P by conditioning on the data), it gives bounds on the generalization risk of Q.
- Note that risk is defined in terms of a loss function. PAC-Bayes bounds often assume bounded loss.
- This also generalizes standard PAC learning, where we think about the risk of individual hypotheses. PAC-Bayes uses the 'Bayesian' risk, where we take the expectation over both data and our posterior on hypotheses.
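- In symbols (my notation; an i.i.d. sample S = {z_1, ..., z_n} from D and a loss ℓ): the Bayesian posterior and the in-sample and out-of-sample risks of Q (often called Gibbs risks) are
$$Q_{\text{Bayes}}(h) \propto P(h)\, p(D \mid h), \qquad \hat{R}_S(Q) = \mathbb{E}_{h \sim Q}\Big[\tfrac{1}{n}\sum_{i=1}^{n} \ell(h, z_i)\Big], \qquad R(Q) = \mathbb{E}_{h \sim Q}\,\mathbb{E}_{z \sim D}\big[\ell(h, z)\big].$$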
- A PAC-Bayes bound on generalization error provides a learning principle: we might try to find the posterior Q that minimizes the error bound (a toy numerical sketch of this is at the end of these notes).
- Bounds are generally of the form
out_of_sample_risk(Q) <= in_sample_risk(Q) + complexity_term(Q).
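- For concreteness, one standard instance (a McAllester-style bound for losses in [0, 1]; I'm quoting it from memory, so treat the constants as approximate): with probability at least 1 − δ over the sample, simultaneously for all Q,
$$R(Q) \le \hat{R}_S(Q) + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},$$
so KL(Q ∥ P) is what plays the role of complexity_term(Q): posteriors that move far from the prior pay for it.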
- A natural question: when is the 'optimal' Q equal to the Bayesian posterior?
- A natural guess: when our loss is the negative log-likelihood for some likelihood model. It seems like this should be a central result??
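- A sketch of why that guess should work (from memory; I believe this is essentially the result of Germain, Bach, Lacoste & Lacoste-Julien, "PAC-Bayesian Theory Meets Bayesian Inference", 2016): take a bound whose right-hand side is the linearized objective
$$\hat{R}_S(Q) + \frac{\mathrm{KL}(Q \,\|\, P)}{\lambda}$$
for some fixed λ > 0. By the Donsker–Varadhan / Gibbs variational argument, the Q minimizing this objective is the Gibbs posterior
$$Q_{\lambda}(h) \propto P(h)\, e^{-\lambda\, \hat{R}_S(h)}.$$
If the loss is the negative log-likelihood, ℓ(h, z) = −log p(z | h), then taking λ = n gives
$$Q_n(h) \propto P(h)\, \exp\!\Big(\sum_{i=1}^{n} \log p(z_i \mid h)\Big) = P(h)\, p(D \mid h),$$
which is exactly the Bayesian posterior.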
- One reason to understand PAC-Bayes is that I'd like to get a picture of how it relates to the idea of many models.
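- A toy numerical sketch of the bound-minimization learning principle above (my own construction, not from any particular paper): a finite class of 1D threshold classifiers, 0-1 loss, a uniform prior, a small family of Gibbs posteriors indexed by a temperature, and the McAllester-style bound above as the objective.
```python
import numpy as np

# Toy sketch (my own construction): minimize a McAllester-style PAC-Bayes bound
# over a family of Gibbs posteriors Q_lam(h) ∝ P(h) exp(-lam * n * emp_risk(h))
# for a finite class of 1D threshold classifiers with 0-1 loss.

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = (x > 0.1).astype(int)                        # "true" threshold at 0.1
y ^= (rng.uniform(size=n) < 0.1).astype(int)     # flip 10% of labels as noise

thresholds = np.linspace(-1.0, 1.0, 101)         # finite hypothesis class
preds = (x[None, :] > thresholds[:, None]).astype(int)
emp_risk = (preds != y[None, :]).mean(axis=1)    # empirical 0-1 risk of each h

prior = np.full(len(thresholds), 1.0 / len(thresholds))  # uniform prior P
delta = 0.05

def mcallester_bound(q):
    """In-sample Gibbs risk + sqrt((KL(q || prior) + ln(2*sqrt(n)/delta)) / (2n))."""
    nz = q > 0
    kl = np.sum(q[nz] * np.log(q[nz] / prior[nz]))
    return q @ emp_risk + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

# Sweep the temperature and keep the posterior with the smallest bound.
best_lam, best_bound = None, np.inf
for lam in [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]:
    logits = -lam * n * (emp_risk - emp_risk.min())  # shift for numerical stability
    q = prior * np.exp(logits)
    q /= q.sum()                                     # Gibbs posterior at this lam
    bound = mcallester_bound(q)
    if bound < best_bound:
        best_lam, best_bound = lam, bound

print(f"best temperature: {best_lam}, bound on out-of-sample Gibbs risk: {best_bound:.3f}")
```
Nothing here is tuned; the point is just the shape of the procedure: the bound is a computable function of Q, so 'learning' can mean searching a family of posteriors for the one with the smallest bound.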