This post highlights a Bayesian approach to sample size estimation in A/B/n testing. Say we're trying to test which variant of an email message generates the highest response rate from a population. We consider \(k\) different messages and send out \(n\) emails for each message. After we wait …
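The setup described above can be sketched concretely. The snippet below is a minimal illustration, not the post's actual code: the response rates, prior, and sample size are all made-up assumptions (a Beta(1, 1) prior on each variant's rate, \(k = 3\) variants, \(n = 500\) sends each).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: k = 3 email variants, n sends each, with assumed
# "true" response rates (unknown in practice).
true_rates = np.array([0.10, 0.12, 0.15])
n = 500

# Simulate one experiment: number of responses per variant.
responses = rng.binomial(n, true_rates)

# Beta(1, 1) prior + Binomial likelihood -> Beta posterior per variant.
post_a = 1 + responses
post_b = 1 + n - responses

# Monte Carlo estimate of P(variant j has the highest response rate).
draws = rng.beta(post_a, post_b, size=(10_000, len(true_rates)))
p_best = np.bincount(draws.argmax(axis=1), minlength=len(true_rates)) / len(draws)
print(p_best)
```

One common way to turn this into a sample size estimate is to repeat the simulation over a grid of \(n\) values and pick the smallest \(n\) for which the best variant's probability-of-being-best exceeds a chosen threshold.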
Other articles
Finding Common Topics
How do you find thematic clusters in a large corpus of text documents? The techniques baked into sklearn (e.g. nonnegative matrix factorization, LDA) give you some intuition about common themes. But contemporary NLP has largely moved on from bag-of-words representations. We can do better with some transformer models! For …
Synthetic Controls for Texas Prison Data
This post uses a synthetic control design to study whether Texas's 1993 prison building boom resulted in the state incarcerating more prisoners than it would have if its rate of prison building had continued as normal. The analysis builds on the one in the book Causal Inference: The Mixtape …
Matching in Observational Studies
A 'matching' quasi-experimental design controls for confounder variables \(x\) by estimating what the control outcomes \(y\) would be if the control population had the same values of \(x\) as the treatment population. To do this, we regress outcomes in the control population on \(x\), and apply this regression model to …
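The two regression steps described above can be sketched with plain NumPy. The data here is simulated with a single confounder and a linear outcome model; both are illustrative assumptions of mine, not the post's actual analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated observational data: one confounder x, outcome y.
x_control = rng.normal(0.0, 1.0, size=200)
y_control = 2.0 + 1.5 * x_control + rng.normal(0.0, 0.5, size=200)

# Treatment group has a shifted confounder distribution and a +1.0 effect.
x_treat = rng.normal(1.0, 1.0, size=200)
y_treat = 2.0 + 1.5 * x_treat + 1.0 + rng.normal(0.0, 0.5, size=200)

# Step 1: regress control outcomes on x (ordinary least squares).
X = np.column_stack([np.ones_like(x_control), x_control])
beta, *_ = np.linalg.lstsq(X, y_control, rcond=None)

# Step 2: predict counterfactual control outcomes at the treatment x values.
y_counterfactual = beta[0] + beta[1] * x_treat

# Effect on the treated: observed minus predicted counterfactual.
att = np.mean(y_treat - y_counterfactual)
print(att)
```

The estimate should land near the simulated effect of 1.0, since the control regression recovers the shared outcome surface.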
Hop Lists
Hop Lists are a novel retroactive set data structure that allows for a branching timeline.
Graph SLAM
Tracking a robot's motion with maximum likelihood estimation.
Diagnosing Lack of Independence in Exogenous Variables
This post outlines a simple workflow for diagnosing lack of independence in statsmodels.
Finite Basis Gaussian Processes
By Mercer's theorem, every positive definite kernel \(k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}\) that we might want to use in a Gaussian Process corresponds to some inner product \(\langle \phi(x), \phi(y) \rangle\), where \(\phi : \mathcal{X} \to \mathcal{V}\) maps our inputs into …
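For intuition, here's a tiny numerical check of this correspondence for a kernel whose feature map is easy to write down explicitly: the homogeneous quadratic kernel \((x^\top y)^2\) on \(\mathbb{R}^2\) (my example, not from the post).

```python
import numpy as np

def k(x, y):
    # Homogeneous quadratic kernel on R^2.
    return (x @ y) ** 2

def phi(x):
    # Explicit finite-dimensional feature map for this kernel:
    # (x1^2, x2^2, sqrt(2) * x1 * x2), so that
    # phi(x) . phi(y) = (x1 y1 + x2 y2)^2 = k(x, y).
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

rng = np.random.default_rng(2)
x, y = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(k(x, y), phi(x) @ phi(y)))  # True
```

Most kernels of interest (e.g. the RBF kernel) have infinite-dimensional \(\phi\), which is where finite basis approximations come in.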
Finite Particle Approximations
Say you have a discrete distribution \(\pi\) that you want to approximate with a small number of weighted particles. Intuitively, it seems like the best choice of particles would be the outputs of highest probability under \(\pi\), and that the relative weights of these particles should be the same …
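The intuitive construction described above is easy to sketch (whether it is actually the best choice is what the post examines):

```python
import numpy as np

def top_m_particles(pi, m):
    """Approximate a discrete distribution pi (array of probabilities)
    with its m highest-probability outcomes, relative weights preserved
    and renormalized to sum to one."""
    idx = np.argsort(pi)[::-1][:m]     # indices of the m largest masses
    weights = pi[idx] / pi[idx].sum()  # renormalize the kept mass
    return idx, weights

pi = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
idx, w = top_m_particles(pi, 3)
print(idx, w)  # indices [1 3 4], weights proportional to 0.40, 0.30, 0.15
```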
Nearest Neighbor Gaussian Processes
In a k-Nearest Neighbor Gaussian Process, we assume that the input points \(x\) are ordered in such a way that \(f(x_i)\) is independent of \(f(x_j)\) whenever...
Fast SLAM
This notebook looks at a technique for simultaneous localization (finding the position of a robot) and mapping (finding the positions of any obstacles), abbreviated as SLAM. In this model, the probability distribution for the robot's trajectory \(x_{1:t}\) is represented with a set of weighted particles. Let the weight …
Mapping with Gaussian Conditioning
For a robot to navigate autonomously, it needs to learn the locations of any potential obstacles around it. One of the standard ways to do this is with an algorithm known as EKF-SLAM. SLAM stands for "simultaneous localization and mapping", as the algorithm must simultaneously find out where the robot …
Conjugate Computation
This post is about a technique that allows us to use variational message passing on models where the likelihood doesn't have a conjugate prior. There will be a lot of Jax code snippets to make everything as concrete as possible.
The Math
Say \(X\) comes from a distribution with density …
Sparse Variational Gaussian Processes
This notebook introduces the Fully Independent Training Conditional (FITC) sparse variational Gaussian process model. You shouldn't need any prior knowledge about Gaussian processes; it's enough to know how to condition and marginalize finite-dimensional Gaussian distributions. I'll assume you know about variational inference and Pyro, though.
import pyro
import pyro.distributions …
Differential Equations Refresher
In my freshman year of college, I took an introductory differential equations class. That was nine years ago. I've forgotten pretty much everything, so I thought I'd review a little, trying to generalize the techniques along the way. I'll use summation notation throughout, and write \(\frac{\partial^n}{\partial x …
Fun with Likelihood Ratios
Say you're trying to maximize a likelihood \(p_{\theta}(x)\), but you only have an unnormalized version \(\hat{p}_{\theta}\) for which \(p_{\theta}(x) = \frac{\hat{p}_{\theta}(x)}{N_\theta}\). How do you pick \(\theta\)? Well, you can rely on the magic of self-normalized importance sampling.
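As a quick sketch of what self-normalized importance sampling buys you, here's a toy example of my own (made-up unnormalized target, not the post's): the unknown constant \(N_\theta\) cancels when the importance weights are divided by their sum.

```python
import numpy as np

rng = np.random.default_rng(3)

# Unnormalized target density: proportional to a standard normal,
# with an unknown constant baked in (here 5.0, but we never use it).
def p_hat(x):
    return 5.0 * np.exp(-0.5 * x ** 2)

# Proposal we can actually sample from and normalize: N(0, 2^2).
sigma = 2.0
xs = rng.normal(0.0, sigma, size=100_000)
q = np.exp(-0.5 * (xs / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Self-normalized weights: dividing by the sum cancels the constant.
w = p_hat(xs) / q
w /= w.sum()

# Estimate E_p[x^2]; for the normalized target (a standard normal) this is 1.
est = np.sum(w * xs ** 2)
print(est)
```

The same trick gives estimates of any expectation under \(p_\theta\) without ever computing \(N_\theta\).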
Jupyter for Everything
Forget vim or emacs or VSCode. Jupyter is hands down the best editor out there. The literate programming support, complete with images and beautiful LaTeX snippets? The inline visualizations of all kinds of data types, from maps to meshes to graphs and polygons? The widgets for interactively testing your code? It's …