Bayes of our lives: a gentle introduction to Bayesian statistics

Bayesian statistics is one interpretation of statistics. It can help explain frequentist methods and often gives more information than they do. Even if you have never really learnt about Bayesian statistics, I guarantee you have encountered it in some way. Bayes, it’s everywhere. In this post, we will only consider a linear model: $$y = \beta x + \epsilon$$ where $$\epsilon$$ is standard normal noise. Suppose we have gathered some data $$(Y=\{y_i\}_{i=1}^n, X=\{\{x_{k,i}\}_{k=1}^p\}_{i=1}^n)$$, consisting of $$p$$ predictors and $$n$$ observations, and we wish to fit a linear model. [Read More]
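
As a small worked restatement of the model above: if we read $$\beta$$ as a vector with one coefficient $$\beta_k$$ per predictor, and assume the observations are independent (neither is stated in the excerpt, so treat both as assumptions), then standard normal noise means each observation is distributed as

$$y_i \mid x_{1,i},\dots,x_{p,i},\beta \;\sim\; N\!\left(\sum_{k=1}^{p} \beta_k x_{k,i},\; 1\right), \qquad i = 1,\dots,n.$$

This is the likelihood that both the frequentist and the Bayesian treatment of the linear model start from.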

Analysis of calving of JH Dorrington Farm Part III

Drum roll please. This is the long-awaited third and final part of the analysis from JH Dorrington Farm. If you have not already, read the first part and second part. Picking up where I left off, almost all of our models fit pretty well except for CART, so in what follows, I will ignore the CART model. That leaves us with linear regression models and MARS. MARS essentially builds a piecewise linear model using hinges. [Read More]
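
For readers new to the term, a hinge can be written down in one line. The notation here ($$c$$ for the knot, $$B_m$$ for the basis functions, $$M$$ for their number) is illustrative and not taken from the post. A hinge pair at knot $$c$$ is

$$h^{+}(x) = \max(0,\; x - c), \qquad h^{-}(x) = \max(0,\; c - x),$$

and a MARS fit is, roughly, a weighted sum of such terms:

$$\hat{f}(x) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(x),$$

where each $$B_m$$ is a hinge or a product of hinges. Each hinge is zero on one side of its knot and linear on the other, which is what makes the fitted curve piecewise linear.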

Analysis of calving of JH Dorrington Farm Part II

This is the second part of the analysis for the data from JH Dorrington Farm. You might want to read the first part before reading this one. Before we put on our science hats, let us make an outline for what we will do. Previously we split the data into training and test sets 80/20. We will fit all of our models and calibrate them on the training set. Decisions about keeping/dropping predictors, transforming predictors and which model to choose will be left to the test set. [Read More]

Analysis of calving of JH Dorrington Farm Part I

Here I will analyse a real-life problem. My friend Chris at JH Dorrington Farm has kindly provided me with the data and allowed me to make this post. This will span several parts as I explore the data and try to fit various models. I’m going to stop milking this introduction and get right to it. My friend Chris has been collecting various forms of data about his cows. [Read More]

Correlation in linear regression

If you have a data set with a large number of predictors, you might use some basic models to try and eliminate some of the predictors that don’t show a significant relationship to the response variable. In such cases it is important to look at the correlation between the predictors. How important? Let’s find out. Let us consider a very simple example here with two predictors and one response variable. set.seed(2017) data = tibble(x1 = rnorm(1000)) %>% mutate(y = 2 * x1^3 + rnorm(1000), x2 = 1. [Read More]