By Mercer's theorem, every positive definite kernel \(k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}\) that we might want to use in a Gaussian Process corresponds to some inner product \(\langle \phi(x), \phi(y) \rangle\), where \(\phi : \mathcal{X} \to \mathcal{V}\) maps our inputs into …
Say you have a discrete distribution \(\pi\) that you want to approximate with a small number of weighted particles. Intuitively, it seems like the best choice of particles would be the outputs of highest probability under \(\pi\), and that the relative weights of these particles should be the same …
In a k-Nearest Neighbor Gaussian Process, we assume that the input points \(x\) are ordered in such a way that \(f(x_i)\) is independent of \(f(x_j)\) whenever...
This post is about a technique that allows us to use variational message passing on models where the likelihood doesn't have a conjugate prior. There will be a lot of JAX code snippets to make everything as concrete as possible.
The Math
Say \(X\) comes from a distribution with density …
Say you're trying to maximize a likelihood \(p_{\theta}(x)\), but you only have an unnormalized version \(\hat{p}_{\theta}\), for which \(p_{\theta}(x) = \frac{\hat{p}_{\theta}(x)}{N_{\theta}}\) with the normalizing constant \(N_{\theta}\) unknown. How do you pick \(\theta\)? Well, you can rely on the magic of self-normalized importance sampling.
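As a rough illustration of why this helps (not necessarily the construction the post itself uses): since \(\log p_\theta(x) = \log \hat{p}_\theta(x) - \log N_\theta\), we have \(\nabla_\theta \log N_\theta = \mathbb{E}_{p_\theta}\!\left[\nabla_\theta \log \hat{p}_\theta(X)\right]\), an expectation under \(p_\theta\) that self-normalized importance sampling can estimate using only \(\hat{p}_\theta\) and a proposal distribution \(q\). Here is a minimal JAX sketch of such an estimator; the function names, the proposal, and the toy densities are placeholders I'm assuming, not anything from the post.

```python
import jax
import jax.numpy as jnp

def snis_expectation(key, log_p_hat, f, sample_q, log_q, num_samples=2000):
    """Estimate E_{p_theta}[f(X)] using only the unnormalized log-density log_p_hat.

    Samples come from a proposal q; the weights w_i ∝ p_hat(x_i) / q(x_i) are
    normalized to sum to one, so the unknown constant N_theta cancels out.
    """
    xs = sample_q(key, num_samples)          # draws from the proposal q
    log_w = log_p_hat(xs) - log_q(xs)        # unnormalized log importance weights
    w = jax.nn.softmax(log_w)                # self-normalization step
    return jnp.sum(w * f(xs))                # weighted average approximates E_p[f]

# Toy check (illustrative only): p_hat is an unnormalized N(2, 1) density,
# the proposal is N(0, 3), and we estimate E[X], which should come out near 2.
log_p_hat = lambda x: -0.5 * (x - 2.0) ** 2
sigma_q = 3.0
log_q = lambda x: -0.5 * (x / sigma_q) ** 2 - jnp.log(sigma_q * jnp.sqrt(2 * jnp.pi))
sample_q = lambda key, n: sigma_q * jax.random.normal(key, (n,))

key = jax.random.PRNGKey(0)
print(snis_expectation(key, log_p_hat, lambda x: x, sample_q, log_q))  # ≈ 2.0
```

Plugging \(f = \nabla_\theta \log \hat{p}_\theta\) into the same estimator gives an estimate of \(\nabla_\theta \log N_\theta\), which is what lets you maximize over \(\theta\) without ever computing \(N_\theta\).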