Other articles


  1. Classifying Ships with Gaussian Process Mixtures

    I recently came across a dataset of container ship movement between Tallinn and Helsinki on Kaggle. In this notebook, we'll try to classify whether a given ship's trajectory seems similar to those of the container ships, or whether we're looking at something else (perhaps a pirate).

    using DataFrames, PythonCall, CSV …

    read more
  2. Sizecheck: Making Tensor Code Self-Documenting with Runtime Shape Validation

    Writing neural networks often feels like juggling tensors in the dark. You know that attention_weights should be 4-dimensional, but PyTorch won't tell you until your matrix multiplication explodes at runtime. What if your variable names could automatically validate tensor shapes?

    Meet sizecheck – a Python decorator for automatic runtime validation …

    read more
  3. Frequentist Sample Size Estimation

    In the previous post, I showed a Bayesian method of sample size estimation for A/B/n testing. This post goes over the more conventional frequentist method.

    As before, here's the context. Say we're trying to test which variant of an email message generates the highest response rate. We consider …

    read more
  4. Bayesian Power Analysis for A/B/n Tests

    This post highlights a Bayesian approach to sample size estimation in A/B/n testing. Say we're trying to test which variant of an email message generates the highest response rate from a population. We consider \(k\) different messages and send out \(n\) emails for each message. After we wait …

    read more
  5. Finding Common Topics

    How do you find thematic clusters in a large corpus of text documents? The techniques baked into sklearn (e.g. nonnegative matrix factorization, LDA) give you some intuition about common themes. But contemporary NLP has largely moved on from bag-of-words representations. We can do better with some transformer models!

    For …

    read more