Other articles


  1. Finding Common Topics

    How do you find thematic clusters in a large corpus of text documents? The techniques baked into sklearn (e.g. nonnegative matrix factorization, LDA) give you some intuition about common themes. But contemporary NLP has largely moved on from bag-of-words representations. We can do better with some transformer models! (A minimal sketch of one such pipeline appears after this list.)

    For …

    read more
  2. Finite Basis Gaussian Processes

    By Mercer's theorem, every positive definite kernel \(k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}\) that we might want to use in a Gaussian Process corresponds to some inner product \(\langle \phi(x), \phi(y) \rangle\), where \(\phi : \mathcal{X} \to \mathcal{V}\) maps our inputs into … (A toy illustration of this feature-map view appears after this list.)

    read more
  3. Finite Particle Approximations

    Say you have a discrete distribution \(\pi\) that you want to approximate with a small number of weighted particles. Intuitively, it seems like the best choice of particles would be the outputs of highest probability under \(\pi\), and that the relative weights of these particles should be the same … (A toy version of this construction appears after this list.)

    read more
  4. Nearest Neighbor Gaussian Processes

    In a \(k\)-Nearest Neighbor Gaussian Process, we assume that the input points \(x\) are ordered in such a way that \(f(x_i)\) is conditionally independent of \(f(x_j)\), given the \(k\) preceding values \(f(x_{i-k}), \dots, f(x_{i-1})\), whenever \(i > j + k\). When \(k=2\), for example, this means we can generate the sequence of process values by sampling the … (A toy sequential sampler along these lines appears after this list.)

    read more
  5. Conjugate Computation

    This post is about a technique that allows us to use variational message passing on models where the likelihood doesn't have a conjugate prior (a short reminder of what conjugacy means appears after this list). There will be a lot of JAX code snippets to make everything as concrete as possible.

    The Math

    Say \(X\) comes from a distribution with density …

    read more
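
To make the teaser in "Finding Common Topics" concrete, here is a minimal sketch of one transformer-based pipeline, not necessarily the post's: it assumes the sentence-transformers and scikit-learn packages, an off-the-shelf encoder name, and a made-up `docs` list.

    # A minimal sketch: embed documents with a pretrained sentence
    # transformer, then look for thematic clusters in embedding space.
    # `docs`, the model name, and n_clusters are illustrative choices.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    docs = [
        "The model overfits without regularization.",
        "Gradient descent converges slowly on this loss.",
        "The recipe calls for two cups of flour.",
        "Knead the dough until it is smooth.",
    ]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = encoder.encode(docs)              # (n_docs, dim) array

    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
    for label, doc in sorted(zip(labels, docs)):
        print(label, doc)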
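
For "Finite Basis Gaussian Processes", one concrete finite \(\phi\) is the random Fourier feature map, whose inner products approximate the RBF kernel. The feature count and unit lengthscale below are arbitrary; this only illustrates the feature-map view, not the post's construction.

    # A minimal sketch of the feature-map view: random Fourier features
    # give a finite-dimensional phi whose inner product approximates the
    # RBF kernel k(x, y) = exp(-(x - y)^2 / 2).  All sizes are arbitrary.
    import numpy as np

    rng = np.random.default_rng(0)
    num_features = 500
    W = rng.normal(size=num_features)             # frequencies ~ N(0, 1)
    b = rng.uniform(0.0, 2.0 * np.pi, num_features)

    def phi(x):
        # phi : X -> V, with <phi(x), phi(y)> ~= k(x, y)
        return np.sqrt(2.0 / num_features) * np.cos(W * x + b)

    x, y = 0.3, 1.1
    print(phi(x) @ phi(y))              # feature-space inner product
    print(np.exp(-0.5 * (x - y) ** 2))  # the kernel it approximates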
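
For "Finite Particle Approximations", the intuitive construction the teaser describes is easy to write down (whether that intuition survives is presumably the point of the post); the toy distribution below is made up.

    # A minimal sketch of the "intuitive" construction: keep the m most
    # probable outcomes of a discrete pi and renormalize.  Toy numbers.
    import numpy as np

    pi = np.array([0.4, 0.3, 0.15, 0.1, 0.05])   # a made-up discrete dist.
    m = 3

    particles = np.argsort(pi)[::-1][:m]   # indices of the top-m outcomes
    weights = pi[particles] / pi[particles].sum()

    print(particles)   # [0 1 2]
    print(weights)     # ~[0.471 0.353 0.176]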
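
For "Nearest Neighbor Gaussian Processes", a toy sequential sampler under the \(k\)-nearest-neighbor assumption might look like the following; the RBF kernel, jitter, grid, and zero-mean unit-variance prior are stand-ins, not the post's choices.

    # A minimal sketch of sequential sampling under the k-nearest-
    # neighbor assumption: each new value is drawn from the Gaussian
    # conditional on (at most) the k most recent values only.
    import numpy as np

    def kern(a, b):
        # RBF kernel; an arbitrary choice, any PD kernel works
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

    rng = np.random.default_rng(0)
    xs = np.linspace(0.0, 5.0, 50)   # inputs, already ordered
    k = 2
    f = np.zeros(len(xs))

    for i in range(len(xs)):
        xp = xs[max(0, i - k):i]               # the k previous inputs
        fp = f[max(0, i - k):i]                # ... and their values
        if xp.size == 0:
            mean, var = 0.0, 1.0               # unconditional prior
        else:
            Kpp = kern(xp, xp) + 1e-9 * np.eye(xp.size)
            Kip = kern(xs[i:i + 1], xp)        # shape (1, |xp|)
            sol = np.linalg.solve(Kpp, Kip.T)  # Kpp^{-1} K_pi
            mean = (fp @ sol).item()
            var = 1.0 - (Kip @ sol).item()
        f[i] = mean + np.sqrt(max(var, 1e-12)) * rng.normal()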
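
Since "Conjugate Computation" leans on the reader knowing what a conjugate prior is, here is the standard exponential-family statement as a reminder (not the post's derivation). If the likelihood has the form

\[ p(x \mid \theta) = h(x) \exp\big(\langle T(x), \theta \rangle - A(\theta)\big), \]

then the prior family

\[ p(\theta \mid \eta, \nu) \propto \exp\big(\langle \eta, \theta \rangle - \nu A(\theta)\big) \]

is conjugate: observing \(x\) just updates \(\eta \to \eta + T(x)\) and \(\nu \to \nu + 1\). This closed-form update is exactly the message that variational message passing relies on, and exactly what is missing when the likelihood has no conjugate prior.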