Other articles


  1. Finite Basis Gaussian Processes

    By Mercer's theorem, every positive definite kernel \(k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}\) that we might want to use in a Gaussian Process corresponds to some inner product \(\langle \phi(x), \phi(y) \rangle\), where \(\phi : \mathcal{X} \to \mathcal{V}\) maps our inputs into … (a small sketch of this feature-map view appears after this list)

    read more
  2. Finite Particle Approximations

    Say you have a discrete distribution \(\pi\) that you want to approximate with a small number of weighted particles. Intuitively, it seems like the best choice of particles would be the outputs of highest probability under \(\pi\), and that the relative weights of these particles should be the same …

    read more
  3. Nearest Neighbor Gaussian Processes

    In a k-Nearest Neighbor Gaussian Process, we assume that the input points \(x\) are ordered in such a way that \(f(x_i)\) is independent of \(f(x_j)\) whenever...

    read more
  4. Conjugate Computation

    This post is about a technique that allows us to use variational message passing on models where the likelihood doesn't have a conjugate prior. There will be a lot of Jax code snippets to make everything as concrete as possible.

    The Math

    Say \(X\) comes from a distribution with density …

    read more
  5. Fun with Likelihood Ratios

    Say you're trying to maximize a likelihood \(p_{\theta}(x)\), but you only have an unnormalized version \(\hat{p}_{\theta}\) for which \(p_{\theta}(x) = \frac{\hat{p}_{\theta}(x)}{N_\theta}\). How do you pick \(\theta\)? Well, you can rely on the magic of self-normalized importance sampling (a small sketch appears after this list).

    read more
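
Since item 1 leans on the identity \(k(x, y) = \langle \phi(x), \phi(y) \rangle\), here is a minimal JAX sketch of that correspondence for one concrete finite feature map. The degree-2 polynomial features and the names phi and k below are illustrative choices, not taken from the article.

    # A minimal sketch (not from the article): a finite feature map phi and the
    # kernel it induces via k(x, y) = <phi(x), phi(y)>. Here phi is a degree-2
    # polynomial feature map on scalars, chosen purely for illustration.
    import jax.numpy as jnp

    def phi(x):
        # Map a scalar input into a 3-dimensional feature space V.
        return jnp.array([1.0, jnp.sqrt(2.0) * x, x ** 2])

    def k(x, y):
        # The induced positive definite kernel: an inner product of features.
        return jnp.dot(phi(x), phi(y))

    # Sanity check: this recovers the usual polynomial kernel (1 + x * y) ** 2.
    x, y = 0.3, -1.2
    print(k(x, y), (1.0 + x * y) ** 2)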
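
Item 5 appeals to self-normalized importance sampling, so here is a minimal JAX sketch of that estimator under stated assumptions: the unnormalized Gaussian target, the standard normal proposal, and the names log_p_hat, log_q, and snis_estimate are all illustrative, not from the post.

    # A minimal sketch of self-normalized importance sampling: estimate an
    # expectation under p_theta while only being able to evaluate the
    # unnormalized density p_hat_theta.
    import jax
    import jax.numpy as jnp

    def log_p_hat(x, theta):
        # Unnormalized log density: a Gaussian with mean theta, normalizer dropped.
        return -0.5 * (x - theta) ** 2

    def log_q(x):
        # Proposal we can sample from: a standard normal (constants dropped,
        # which is fine because the weights get normalized anyway).
        return -0.5 * x ** 2

    def snis_estimate(key, theta, f, n=10_000):
        xs = jax.random.normal(key, (n,))          # draws from the proposal q
        log_w = log_p_hat(xs, theta) - log_q(xs)   # unnormalized log weights
        w = jax.nn.softmax(log_w)                  # self-normalization step
        return jnp.sum(w * f(xs))

    key = jax.random.PRNGKey(0)
    theta = 0.7
    # E_{p_theta}[X] should come out close to theta for this Gaussian target.
    print(snis_estimate(key, theta, lambda x: x))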