Modified: March 16, 2022
Gaussian process
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Weight-Space View
Recall standard linear regression. We suppose $f(\mathbf{x}) = \mathbf{x}^\top \mathbf{w}$ and $y = f(\mathbf{x}) + \varepsilon$ where $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$, where $\mathbf{x}$ can be augmented with an implicit 1 term to allow a bias to be learned. This gives us a likelihood of observing any particular set of data points, given weights. If we're Bayesian we'll also need to put a prior on the weights; in particular let's take $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \Sigma_p)$ for some covariance matrix $\Sigma_p$. Then we can use Bayes' rule to calculate the posterior over the weights given the data; this will turn out to be Gaussian:

$$\mathbf{w} \mid X, \mathbf{y} \sim \mathcal{N}\!\left(\tfrac{1}{\sigma_n^2} A^{-1} X \mathbf{y},\; A^{-1}\right)$$
where $A = \sigma_n^{-2} X X^\top + \Sigma_p^{-1}$ and $X$ has one input per column. We can finally calculate the predictive distribution at a test input $\mathbf{x}_*$, which is again Gaussian:

$$f_* \mid \mathbf{x}_*, X, \mathbf{y} \sim \mathcal{N}\!\left(\tfrac{1}{\sigma_n^2}\,\mathbf{x}_*^\top A^{-1} X \mathbf{y},\; \mathbf{x}_*^\top A^{-1} \mathbf{x}_*\right)$$
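As a concrete sketch (not from the notes themselves), here is that posterior and predictive computation in NumPy. The toy data, the names `sigma_n` and `Sigma_p`, and the specific prior are illustrative assumptions; $X$ is stored with one column per (bias-augmented) input, matching the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: scalar inputs augmented with an implicit 1, so X is D x n with D = 2.
n = 20
x = rng.uniform(-3, 3, size=n)
X = np.vstack([x, np.ones(n)])          # columns are augmented inputs
w_true = np.array([0.7, -1.2])
sigma_n = 0.3                           # observation noise standard deviation
y = X.T @ w_true + sigma_n * rng.normal(size=n)

Sigma_p = np.eye(2)                     # prior covariance of the weights

# Posterior over the weights: N(sigma_n^{-2} A^{-1} X y, A^{-1})
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
w_mean = A_inv @ X @ y / sigma_n**2

# Predictive distribution at a test input x_* (also augmented with a 1)
x_star = np.array([1.5, 1.0])
f_mean = x_star @ w_mean
f_var = x_star @ A_inv @ x_star
print(w_mean, f_mean, f_var)
```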
Suppose we want to map our data into a higher-dimensional feature space using $\phi(\mathbf{x})$ instead of $\mathbf{x}$. For brevity, we'll write $\Phi$ instead of $\Phi(X)$ and $\phi_*$ instead of $\phi(\mathbf{x}_*)$:

$$f_* \mid \mathbf{x}_*, X, \mathbf{y} \sim \mathcal{N}\!\left(\tfrac{1}{\sigma_n^2}\,\phi_*^\top A^{-1} \Phi \mathbf{y},\; \phi_*^\top A^{-1} \phi_*\right)$$
where now $A = \sigma_n^{-2} \Phi \Phi^\top + \Sigma_p^{-1}$. Now we define $K = \Phi^\top \Sigma_p \Phi$. It is then possible to show that the mean and variance given above are equal to

$$\begin{aligned}
\text{mean} &= \phi_*^\top \Sigma_p \Phi \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y} \\
\text{variance} &= \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi \left(K + \sigma_n^2 I\right)^{-1} \Phi^\top \Sigma_p \phi_*
\end{aligned}$$
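To make the claim concrete, here is a small numerical check (my own sketch, not part of the derivation) that the $A$-based expressions and the $K$-based expressions agree. The quadratic feature map `phi`, the data, and all names are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x):
    # Hypothetical feature map for scalar inputs: (1, x, x^2).
    return np.array([np.ones_like(x), x, x**2])

x = rng.uniform(-2, 2, size=15)
y = np.sin(x) + 0.1 * rng.normal(size=15)
sigma_n = 0.1

Phi = phi(x)                             # D x n feature matrix
Sigma_p = np.eye(Phi.shape[0])
phi_star = phi(np.array([0.5]))[:, 0]    # features of the test input

# Weight-space form: works with the D x D matrix A.
A = Phi @ Phi.T / sigma_n**2 + np.linalg.inv(Sigma_p)
mean_w = phi_star @ np.linalg.solve(A, Phi @ y) / sigma_n**2
var_w = phi_star @ np.linalg.solve(A, phi_star)

# Kernel form: works with the n x n matrix K = Phi^T Sigma_p Phi.
K = Phi.T @ Sigma_p @ Phi
G = np.linalg.inv(K + sigma_n**2 * np.eye(len(x)))
mean_k = phi_star @ Sigma_p @ Phi @ G @ y
var_k = (phi_star @ Sigma_p @ phi_star
         - phi_star @ Sigma_p @ Phi @ G @ Phi.T @ Sigma_p @ phi_star)

print(np.allclose(mean_w, mean_k), np.allclose(var_w, var_k))  # both True
```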
(for the mean it's simple algebra; the covariance requires the matrix inversion lemma). Now let's define $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \Sigma_p \phi(\mathbf{x}')$; note that we can write the predictive distribution such that all of the data passes through $k(\cdot, \cdot)$, which we call the covariance function. It turns out that this gives exactly the same predictive distribution as the function-space view of a Gaussian process.
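Here is a sketch of the same prediction written purely in terms of a covariance function, which is the form the function-space view works with directly; again the feature map and all names are illustrative assumptions rather than anything fixed by the notes.

```python
import numpy as np

rng = np.random.default_rng(2)

def phi(x):
    # Same hypothetical feature map as above: (1, x, x^2).
    return np.array([np.ones_like(x), x, x**2])

Sigma_p = np.eye(3)

def k(xa, xb):
    # Covariance function k(x, x') = phi(x)^T Sigma_p phi(x');
    # every dependence on the features is funneled through this one function.
    return phi(xa).T @ Sigma_p @ phi(xb)

x = rng.uniform(-2, 2, size=15)
y = np.sin(x) + 0.1 * rng.normal(size=15)
sigma_n = 0.1
x_star = np.array([0.5])

# Function-space GP regression: only kernel evaluations appear.
K = k(x, x)                               # n x n Gram matrix
k_star = k(x, x_star)[:, 0]               # covariances with the test point
noisy_K = K + sigma_n**2 * np.eye(len(x))
mean = k_star @ np.linalg.solve(noisy_K, y)
var = k(x_star, x_star)[0, 0] - k_star @ np.linalg.solve(noisy_K, k_star)
print(mean, var)
```

With this kernel the numbers coincide with the weight-space computation above, which is the equivalence the section is pointing at.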