Introduction To Hamiltonian Monte Carlo

One thing that has been occupying my head for the past couple of weeks is HMC and how it can be used in large-data/large-model contexts. HMC stands for Hamiltonian Monte Carlo, and it’s the de facto Bayesian method for sampling due to its speed. Before getting into big datasets and big models, let me motivate this problem a little bit. If you are new to Bayesian modelling, I have a little primer on the topic, so I will assume for the most part that you are familiar with basic Bayesianism. [Read More]

Variational inference, the art of approximate sampling

In the spirit of looking at fancy word topics, this post is about variational inference. Suppose you granted me one superpower and I chose the ability to sample from any distribution in a fast and accurate way. Now, you might think that’s a crappy superpower, but it basically enables me to fit any model I want and provide uncertainty estimates. To make the problem concrete, let’s suppose you are trying to sample from a distribution \(p(x)\). [Read More]

Spike and slab: Bayesian linear regression with variable selection

Spike and slab is a Bayesian model for simultaneously picking features and doing linear regression. It is a shrinkage method, much like ridge and lasso regression, in the sense that it shrinks the “weak” beta values from the regression towards zero. Don’t worry if you have never heard of any of those terms; we will explore all of them using Stan. If you don’t know anything about Bayesian statistics, you can read my introductory post before reading this one. [Read More]

Bayesian analysis of Premier League football

In this post we are going to look at some football statistics. In particular, we will examine English football, the Premier League, using Bayesian statistics with Stan. If you have no idea what Bayesian statistics is, you can read my introductory post on it. Otherwise this post shouldn’t be a difficult read. All right, let’s get to it. First, we need some data. I will use all the matches from the Premier League seasons 16/17 and 17/18 (the latter is still ongoing at the time of writing). [Read More]

Summer Olympics: the countries that beat the expectations

In this post we take a look at the summer Olympics and try to see which countries performed substantially differently than was expected of them. We will look at the Olympics from 1964 through to 2008. For each year, we will run a predictive model that tries to predict the number of medals a country wins, using selected datasets that are available before each of the Olympics. We will see that this model performs well out of sample, and its predictions will serve as what we expect. [Read More]