Sam’s Blog

Fitting Gaussians with Missing Observations

statistics

Say you want to fit a multivariate Normal distribution to some data. We’ll show how to generalize the complete data case to the missing data case. Let’s start by generating…

Air Quality and Congestion Pricing

statistics

This post dives into the data from a recent paper quantifying the effects of Manhattan’s congestion pricing scheme (Fraser et al. (2025)). The authors argue that…

AnkiVec: Vector Search for Anki

tools

I just released AnkiVec, an Anki addon that creates vector embeddings for cards using Ollama and enables hybrid semantic search with ChromaDB.

Classifying Ships with Gaussian Process Mixtures

machine_learning

I recently came across a dataset of container ship movement between Tallinn and Helsinki on Kaggle. In this notebook, we’ll try to classify whether a given ship’s trajectory…

Analyzing Coffee Yields

statistics

This post demonstrates working with generalized linear mixed models in the context of coffee bean yield data. Each row in the following dataset is an observation of coffee…

Sizecheck: Making Tensor Code Self-Documenting with Runtime Shape Validation

tools

Writing neural networks often feels like juggling tensors in the dark. You know that attention_weights should be 4-dimensional, but PyTorch won’t tell you until your matrix…

Books and Guides

statistics

machine_learning

In the style of Susan Rigetti’s classic “So You Want to Learn Physics”, this post lists some of my favorite resources for learning stuff.

Frequentist Sample Size Estimation

statistics

In the previous post, I showed a Bayesian method of sample size estimation for A/B/n testing. This post goes over the more conventional frequentist method.

Bayesian Power Analysis for A/B/n Tests

statistics

This post highlights a Bayesian approach to sample size estimation in A/B/n testing. Say we’re trying to test which variant of an email message generates the highest…

An Opinionated Tooling Guide

tools

Statistics and Data Analysis: Overall: use R. It has the largest ecosystem of statistical packages.

Finding Common Topics

machine_learning

How do you find thematic clusters in a large corpus of text documents? The techniques baked into sklearn (e.g. nonnegative matrix factorization, LDA) give you some intuition…

Synthetic Controls for Texas Prison Data

statistics

This post uses a synthetic control design to study whether Texas’s prison building boom in 1993 resulted in them incarcerating more prisoners than they would have if their…

Hop Lists

algorithms

Hop Lists are a novel retroactive set data structure that allows for a branching timeline. Each hop list node \(h_t\) is associated with a specific time \(t\) and a randomly…

Graph SLAM

SLAM

For a robot to navigate autonomously, it needs to learn both its own location and the locations of any potential obstacles around it, given its sensors’ observations…

Diagnosing Lack of Independence in Exogenous Variables

statistics

While performing linear regression with statsmodels, you might occasionally find that your exogenous variables aren’t independent, giving you an error about a singular matrix.
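As a rough illustration of the problem described here (not necessarily the approach the post takes), one quick check is to compare the rank of the design matrix against its number of columns. The matrix `X` below is made-up example data, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical design matrix: an intercept column, a predictor,
# and a third column that is exactly twice the second.
X = np.column_stack([
    np.ones(4),
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
])

# If the rank is smaller than the number of columns, the columns are
# linearly dependent and X^T X is singular.
rank = np.linalg.matrix_rank(X)
is_rank_deficient = rank < X.shape[1]
```

Here `is_rank_deficient` comes out true because the third column carries no information beyond the second.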

Finite Basis Gaussian Processes

machine_learning

By Mercer’s theorem, every positive definite kernel \(k(x, y) : \mathcal{X} \to \mathcal{X} \to \mathbb{R}\) that we might want to use in a Gaussian Process corresponds to…

Finite Particle Approximations

machine_learning

Say you have a discrete distribution \(\pi\) that you want to approximate with a small number of weighted particles. Intuitively, it seems like the best choice of…

Nearest Neighbor Gaussian Processes

machine_learning

In a \(k\)-Nearest Neighbor Gaussian Process, we assume that the input points \(x\) are ordered in such a way that \(f(x_i)\) is independent of \(f(x_j)\) whenever \(i > j +…

Krylov Methods

math

The \(i\)th Krylov subspace \(\mathcal{K}_i\) for a symmetric matrix \(A\) starting from vector \(b\) is the subspace spanned by the vectors \(b, Ab, A^2b, \dotsc, A^{i-1}b\).…
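To make the definition concrete, here is a minimal sketch that stacks the spanning vectors \(b, Ab, \dotsc, A^{i-1}b\) as columns of a matrix. The function name `krylov_basis` and the small example matrix are assumptions made for illustration. The columns are not orthogonalized, so this is only a direct transcription of the definition, not a numerically stable method.

```python
import numpy as np

def krylov_basis(A, b, i):
    """Return an n-by-i matrix whose columns are b, Ab, ..., A^(i-1) b."""
    vectors = [b]
    for _ in range(i - 1):
        # Each new vector is A applied to the previous one.
        vectors.append(A @ vectors[-1])
    return np.stack(vectors, axis=1)

# Hypothetical 2x2 symmetric example.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 0.0])
K = krylov_basis(A, b, 2)
```

The column space of `K` is \(\mathcal{K}_2\) for this choice of \(A\) and \(b\).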

Mapping with Gaussian Conditioning

SLAM

For a robot to navigate autonomously, it needs to learn the locations of any potential obstacles around it. One of the standard ways to do this is with an algorithm known as…

Conjugate Computation

machine_learning

This post is about a technique that allows us to use variational message passing on models where the likelihood doesn’t have a conjugate prior. There will be a lot of Jax…

Generative ODE Models are VAEs

machine_learning

Generative image models based on ordinary differential equations can be seen as forms of variational auto-encoders with a partially deterministic inference network. \(\newcom…

Sparse Variational Gaussian Processes

statistics

This notebook introduces the Fully Independent Training Conditional (FITC) sparse variational Gaussian process model. You shouldn’t need any prior knowledge about Gaussian…

Differential Equations Refresher

math

In my freshman year of college, I took an introductory differential equations class. That was nine years ago. I’ve forgotten pretty much everything, so I thought I’d review…

Fun with Likelihood Ratios

machine_learning

Say you’re trying to maximize a likelihood \(p_{\theta}(x)\), but you only have an unnormalized version \(\hat{p_{\theta}}\) for which \(p_{\theta}(x) =…