Gaussian Processes for Out-Of-Distribution Detection


Gaussian processes (GPs)

(Rasmussen 2003; Rasmussen, Williams 2006; Görtler, Kehlbeck, Deussen | Konstanz | Distill 2019)

A random process / stochastic process is a sequence of random variables $X_1, X_2, \dots, X_T$. A random field / stochastic field is when the random variables of interest are indexed by two or more dimensions (e.g. $X_{ij}, i=1,2,\dots,N, j=1,2,\dots,M$). We define a particular family of random processes / fields (e.g. Bernoulli processes, Markov processes, Markov random fields, Gaussian processes, etc.) by choosing a family of distributions to represent the joint distribution over the random variables in question. For instance, a stochastic process is called a GP if the joint distribution over any finite subset of the RVs $X_1, X_2, \dots, X_T$ is a multivariate Gaussian distribution. If the RVs are indexed by a continuous domain, we have a joint distribution over infinitely many RVs - in other words, we have a distribution over functions on that domain. A multivariate Gaussian distribution is defined by a mean vector and a covariance matrix; accordingly, a Gaussian process is defined by a mean function $m(x)$ and a covariance function $k(x_i, x_j)$. We write $f \sim GP(m, k)$ and say 'the function $f$ is distributed as a GP with mean function $m$ and covariance function $k$'. If the covariance function depends only on the distance between $x_i$ and $x_j$, and not on their absolute values, the Gaussian process is said to be $\textit{stationary}$.
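To make the mean-function / covariance-function view concrete, here is a minimal numpy sketch that draws sample functions from a zero-mean GP prior with a squared-exponential (stationary) covariance. The lengthscale, variance, and jitter values are illustrative assumptions, not canonical choices.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance: depends only on |x_i - x_j|,
    # so the resulting GP is stationary.
    sq_dists = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

# Evaluate the GP prior on a finite grid: the function values at these
# points are jointly multivariate Gaussian with mean m(x) = 0 and
# covariance K[i, j] = k(x_i, x_j).
x = np.linspace(-5, 5, 200)
K = rbf_kernel(x, x)

# Draw a few sample functions from the prior f ~ GP(0, k).
# The small diagonal jitter keeps the covariance numerically positive definite.
samples = np.random.multivariate_normal(
    mean=np.zeros(len(x)), cov=K + 1e-8 * np.eye(len(x)), size=3
)
```

Each row of `samples` is one function drawn from the prior, evaluated on the grid; a smaller lengthscale gives wigglier samples, which is exactly the sense in which the covariance function encodes smoothness.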

GPs can be used in several ways:

  • As priors over functions, with the covariance function that defines the GP encoding the smoothness of the prior: $f(x_i)$ and $f(x_j)$ are correlated according to the distance between $x_i$ and $x_j$ and the smoothness implied by the covariance function $k(x_i, x_j)$.
  • Updating the prior in the light of training data - in other words, computing the posterior GP. Consider a scenario where we know the value of a function at a number of training input points $x_i, i=1,\dots,n$, and are interested in the distribution over the function values at a number of test input points $x_j^{test}, j=1,\dots,m$. Under the GP, the joint distribution of the vector ($\in \mathbb{R}^{m+n}$) of function values at all training and test points is a multivariate Gaussian (defined by a mean vector and a covariance matrix). The conditional distribution of the function values at the test points given the function values at the training points is also Gaussian, and its parameters can be computed analytically in terms of the training data and the covariance function (see the sketch after this list). A computational concern is that the mean vector and the covariance matrix of the posterior depend on the inverse of the covariance matrix of the training points, and this inversion is $O(n^3)$.
  • Training the GP prior: this refers to specifying a GP's mean and covariance functions up to some parameters and then finding maximum-likelihood estimates of these parameters from the available data.
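As a companion to the second bullet, here is a hedged numpy sketch of exact GP posterior prediction by Gaussian conditioning. It reuses the `rbf_kernel` from the sketch above, and `noise_var` and the training data are illustrative assumptions; the Cholesky factorisation of the $n \times n$ training covariance is the $O(n^3)$ step mentioned above.

```python
import numpy as np

def gp_posterior(x_train, y_train, x_test, kernel, noise_var=1e-6):
    """Exact GP posterior at test points, by conditioning on the training data."""
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))  # n x n
    K_s = kernel(x_train, x_test)        # n x m cross-covariance
    K_ss = kernel(x_test, x_test)        # m x m test covariance

    # The O(n^3) step: factorising the n x n training covariance.
    # A Cholesky factor is used instead of an explicit inverse for stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)

    mean = K_s.T @ alpha                 # posterior mean at the test points
    cov = K_ss - v.T @ v                 # posterior covariance at the test points
    return mean, cov

# Illustrative usage: condition on four noisy observations of a toy function.
x_train = np.array([-3.0, -1.0, 0.5, 2.0])
y_train = np.sin(x_train)
x_test = np.linspace(-5, 5, 100)
mean, cov = gp_posterior(x_train, y_train, x_test, rbf_kernel)
```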

Sparse GPs

(Snelson, Ghahramani | UCL | NeurIPS 2005; Hensman, Fusi, Lawrence | Sheffield 2013)

SGPs introduce $m$ ($\ll n$) new input points (called pseudo-inputs or $\textit{inducing points}$), chosen such that the distribution of the function values at test points conditioned on these new points is close to the one conditioned on the original $n$ data points, with the computational benefit that one now has to invert only an $m \times m$ covariance matrix instead of an $n \times n$ one.
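The constructions in these papers (FITC-style pseudo-inputs and the later variational formulation) are more involved, but the computational idea can be illustrated with a Nyström-style low-rank approximation built from $m$ inducing points. This is a rough sketch under that simplification, reusing the `rbf_kernel` from the earlier sketch, with the inducing points chosen naively as a subset of the inputs rather than learned.

```python
import numpy as np

def nystrom_approx(x, z, kernel, jitter=1e-6):
    """Rank-m approximation K_nn ~ K_nm K_mm^{-1} K_mn from m inducing points z."""
    K_mm = kernel(z, z) + jitter * np.eye(len(z))   # m x m: only this is factorised
    K_nm = kernel(x, z)                             # n x m cross-covariance
    L = np.linalg.cholesky(K_mm)                    # O(m^3) instead of O(n^3)
    A = np.linalg.solve(L, K_nm.T)                  # m x n
    # Returned here only for illustration; in practice the full n x n matrix
    # is never formed explicitly.
    return A.T @ A

# Illustrative: m = 20 inducing points taken as a subset of n = 2000 inputs.
x = np.linspace(-5, 5, 2000)
z = x[::100]
K_approx = nystrom_approx(x, z, rbf_kernel)
```

Only the $m \times m$ matrix is factorised, so the dominant linear-algebra cost drops from $O(n^3)$ to roughly $O(nm^2)$.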

Deep GPs

(Damianou, Lawrence | Sheffield | AISTATS 2013)

Convolutional GPs

(van der Wilk, Rasmussen, Hensman | Cambridge | NeurIPS 2017)

Deep convolutional GP

(Blomqvist, Kaski, Heinonen | Helsinki 2019)

Distributional GPs

(Popescu, Sharp, Cole, Glocker | Imperial College London | 2020)

Distributional Gaussian Process Layers for Outlier Detection in Image Segmentation

(Popescu, Sharp, Cole, Kamnitsas, Glocker | Imperial College London | IPMI 2021)