Spike and slab: Bayesian linear regression with variable selection

Spike and slab is a Bayesian model for simultaneously selecting features and fitting a linear regression. Like ridge and lasso regression, spike and slab is a shrinkage method: it shrinks the “weak” beta values from the regression towards zero. Don’t worry if you have never heard of any of these terms; we will explore all of them using Stan. If you don’t know anything about Bayesian statistics, you can read my introductory post before reading this one. [Read More]
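To make “shrinks the weak betas towards zero” concrete, here is a minimal sketch in base R (not Stan, and not the post's own model). It assumes an orthonormal design with known noise standard deviation `sigma`, so each least-squares estimate `b_hat` can be treated on its own. It also assumes a point-mass spike at zero and a normal slab `N(0, tau^2)`. The function names and the default values of `sigma`, `tau`, `prior_incl` and `lambda` are hypothetical choices for illustration. The ridge and lasso rules shown are their closed forms under the same orthonormal assumption, for one common scaling of the penalty.

```r
# Hypothetical helper: posterior summary for one coefficient under a
# point-mass spike-and-slab prior, ASSUMING an orthonormal design so that
# b_hat ~ N(beta, sigma^2) with sigma known.
spike_slab_shrink <- function(b_hat, sigma = 1, tau = 3, prior_incl = 0.5) {
  # Marginal density of b_hat if beta was drawn from the slab N(0, tau^2)
  slab_lik <- dnorm(b_hat, mean = 0, sd = sqrt(sigma^2 + tau^2))
  # Marginal density of b_hat if beta is exactly zero (the spike)
  spike_lik <- dnorm(b_hat, mean = 0, sd = sigma)
  # Posterior probability that the coefficient belongs in the model
  p_incl <- prior_incl * slab_lik /
    (prior_incl * slab_lik + (1 - prior_incl) * spike_lik)
  # Posterior mean: 0 under the spike, normal-normal shrinkage under the slab
  post_mean <- p_incl * tau^2 / (tau^2 + sigma^2) * b_hat
  data.frame(b_hat = b_hat, p_incl = p_incl, post_mean = post_mean)
}

# Closed-form ridge and lasso estimates for an orthonormal design
ridge_shrink <- function(b_hat, lambda = 1) b_hat / (1 + lambda)
lasso_shrink <- function(b_hat, lambda = 1) sign(b_hat) * pmax(abs(b_hat) - lambda, 0)

b_hat <- c(0.2, 0.5, 1, 2, 4)
res <- spike_slab_shrink(b_hat)
res$ridge <- ridge_shrink(b_hat)
res$lasso <- lasso_shrink(b_hat)
print(round(res, 3))
```

Ridge shrinks every estimate by the same factor, and lasso subtracts a constant and clips at zero. Under spike and slab, the amount of shrinkage depends on how plausible the coefficient is: small estimates get a low inclusion probability and are pulled almost to zero, while large ones are left nearly untouched.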

Correlation in linear regression

If you have a data set with a large number of predictors, you might use some basic models to try to eliminate the predictors that don’t show a significant relationship to the response variable. In such cases it is important to look at the correlation between the predictors. How important? Let’s find out. Consider a very simple example with two predictors and one response variable. set.seed(2017) data = tibble(x1 = rnorm(1000)) %>% mutate(y = 2 * x1^3 + rnorm(1000), x2 = 1. [Read More]
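The sketch below shows in base R why correlated predictors matter. It is a hypothetical setup, not the post's own example, whose definition of `x2` is cut off above. Here `y` depends linearly on `x1` only, and `x2` is assumed to be a noisy copy of `x1` (noise sd 0.3). Fitted alone, `x2` looks like a highly significant predictor. Once `x1` is in the model, its coefficient collapses towards zero and its standard error grows.

```r
# Hypothetical data: y depends only on x1, and x2 is a noisy copy of x1.
# (An assumption for illustration; the post defines x2 differently.)
set.seed(2017)
n  <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
y  <- 2 * x1 + rnorm(n)

predictor_cor <- cor(x1, x2)

# Fitted on its own, x2 looks like a strong predictor ...
fit_x2 <- lm(y ~ x2)
marginal_coef <- unname(coef(fit_x2)["x2"])
marginal_p    <- summary(fit_x2)$coefficients["x2", "Pr(>|t|)"]

# ... but once x1 is in the model, x2 adds almost nothing.
fit_both   <- lm(y ~ x1 + x2)
joint_coef <- unname(coef(fit_both)["x2"])
joint_se   <- summary(fit_both)$coefficients["x2", "Std. Error"]

cat(sprintf("cor(x1, x2) = %.2f\n", predictor_cor))
cat(sprintf("y ~ x2:      x2 coef = %.2f (p = %.1e)\n", marginal_coef, marginal_p))
cat(sprintf("y ~ x1 + x2: x2 coef = %.2f (se = %.2f)\n", joint_coef, joint_se))
```

Screening predictors one at a time would keep `x2`, even though it carries no information beyond what `x1` already provides.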