PAC-Bayes: Nonlinear Function
Created: January 15, 2021
Modified: January 15, 2021

PAC-Bayes

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • I'm trying to build my understanding. These are fragments of intuitions.
  • Bayesian inference starts with a prior P and a likelihood. Given data D, these uniquely determine a posterior. PAC-Bayes generalizes this: for a 'prior' distribution P over hypotheses and any 'posterior' Q (not necessarily obtained from P by Bayes' rule), it gives bounds on the generalization risk of Q.
    • Note that risk is defined in terms of a loss function. PAC-Bayes bounds often assume bounded loss.
  • This also generalizes standard PAC learning, which bounds the risk of individual hypotheses. PAC-Bayes instead bounds the Gibbs risk: the expectation of the risk over both the data distribution and hypotheses drawn from the posterior Q.
  • A PAC-Bayes bound on generalization error provides a learning principle: we might try to find the posterior Q that minimizes the error bound.
  • Bounds are generally of the form out_of_sample_risk(Q) <= in_sample_risk(Q) + complexity_term(Q), where the complexity term typically involves the KL divergence KL(Q || P) from the prior and shrinks with the sample size (see the concrete bound after this list).
  • A natural question: when is the 'optimal' Q equal to the Bayesian posterior?
    • A natural guess: this happens when the loss is the negative log-likelihood, $\ell(h, y) = -\log p_\text{likelihood}(y \mid h)$, for some likelihood model. It seems like this should be a central result?? (Sketch after this list.)
  • One reason to understand PAC-Bayes is that I'd like to get a picture of how it relates to the idea of many models.
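A concrete instance of the bound form above, which I believe is the standard McAllester-style statement (for an i.i.d. sample $S$ of size $n$ and a loss bounded in $[0, 1]$): with probability at least $1 - \delta$ over the draw of $S$, simultaneously for all posteriors $Q$,

$$
\mathbb{E}_{h \sim Q}[R(h)] \;\le\; \mathbb{E}_{h \sim Q}[\hat{R}_S(h)] + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta}}{2n}},
$$

where $R$ is the out-of-sample risk and $\hat{R}_S$ the in-sample risk. The complexity term is just the KL divergence from the prior, discounted by the sample size, so Q can move away from P only insofar as the data pay for it.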
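On the Bayesian-posterior question above, here is the sketch I'd expect, stated under the assumption of a Catoni-style bound whose right-hand side is linear in the empirical risk (I haven't checked the details). Minimizing such a bound over Q amounts to minimizing a free energy

$$
\mathcal{F}(Q) \;=\; \lambda n\, \mathbb{E}_{h \sim Q}[\hat{R}_S(h)] + \mathrm{KL}(Q \,\|\, P)
$$

for some temperature parameter $\lambda > 0$, and the minimizer is the Gibbs posterior

$$
Q^*(h) \;\propto\; P(h)\, \exp\!\big(-\lambda n\, \hat{R}_S(h)\big).
$$

If the loss is the negative log-likelihood, so that $\hat{R}_S(h) = -\tfrac{1}{n} \sum_i \log p_\text{likelihood}(y_i \mid h)$, then at $\lambda = 1$ this becomes

$$
Q^*(h) \;\propto\; P(h) \prod_i p_\text{likelihood}(y_i \mid h),
$$

i.e. exactly the Bayesian posterior. (Caveat: log-loss is generally unbounded, so the bounded-loss versions of the bounds don't apply directly.)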