This book presents code examples from Hernán and Robins (2020), which is available in draft form from the following webpage.

The R code is based on the code by Joy Shi and Sean McGrath given here.

The Stata code is based on the code by Eleanor Murray and Roger Logan given here.

A rendered version of this repo is available online. Click the download button above for the PDF and eBook versions.

Downloading the code

The repo is available on GitHub here. There are a number of ways to download the code.


  • click the green Clone or download button then choose to Open in Desktop or Download ZIP.

    The Desktop option means open in the GitHub Desktop app (if you have that installed on your machine). The ZIP option will give you a zip archive of the repo, which you then unzip.

  • or fork the repo into your own GitHub account and then clone or download your forked repo to your machine.

Installing dependency packages

It is easiest to open the repo in RStudio, as an RStudio project, by double-clicking the .Rproj file. This ensures that R’s working directory is at the top level of the repo. If you don’t want to open the repo as a project, set the working directory to the top level of the repo using setwd(). Then run:

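# install.packages("devtools") # uncomment if devtools not installed
# install the packages the repo depends on (assumes a DESCRIPTION file at the top level)
devtools::install_dev_deps()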
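
Before installing anything, it can help to confirm that R's working directory really is the top level of the repo. The following is only a minimal sketch, assuming the top level can be recognised by the presence of a .Rproj file; the helper name `at_repo_top` is hypothetical and not part of the repo.

```r
# Hypothetical helper (not part of the repo):
# TRUE if `dir` contains at least one .Rproj file.
at_repo_top <- function(dir = getwd()) {
  length(list.files(dir, pattern = "\\.Rproj$")) > 0
}

# Print a hint if the current working directory does not look like the repo root.
if (!at_repo_top()) {
  message("No .Rproj file found in ", getwd(),
          "; consider using setwd() to move to the top level of the repo.")
}
```

If the message appears, set the working directory with setwd() (or open the .Rproj file) before installing the dependencies.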

Downloading the datasets

We assume that you have downloaded the data from the Causal Inference Book website and saved it to a data subdirectory. You can do this manually or with the following code (note that we use the here package to reference the data subdirectory).

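library(here)  # here() builds paths relative to the top level of the repo

dataurls <- list()
stub <- ""
dataurls[[1]] <- paste0(stub, "2012/10/")
dataurls[[2]] <- paste0(stub, "2012/10/")
dataurls[[3]] <- paste0(stub, "2017/01/")
dataurls[[4]] <- paste0(stub, "1268/20/nhefs.csv")

# make sure the data subdirectory exists before writing to it
dir.create(here("data"), showWarnings = FALSE)

# the first three URLs are zip archives: download each and unzip into data/
temp <- tempfile()
for (i in 1:3) {
  download.file(dataurls[[i]], temp)
  unzip(temp, exdir = here("data"))
}

# the fourth URL is a plain csv file
download.file(dataurls[[4]], here("data", "nhefs.csv"))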
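
Once the files are in place, a quick check that the download worked can save confusion later. The sketch below is an illustration only: the helper `read_nhefs` is hypothetical (it is not part of the repo), it assumes nhefs.csv is an ordinary comma-separated file with a header row, and it uses base R's file.path() rather than here() so that it runs without extra packages.

```r
# Hypothetical helper (not part of the repo):
# read nhefs.csv from a data directory, failing early if the file is missing.
read_nhefs <- function(data_dir = "data") {
  path <- file.path(data_dir, "nhefs.csv")
  if (!file.exists(path)) {
    stop("Expected file not found: ", path, call. = FALSE)
  }
  # Assumes a comma-separated file with a header row.
  read.csv(path)
}

# Example usage, run from the top level of the repo:
# nhefs <- read_nhefs()
# dim(nhefs)
```

Failing early with an explicit path makes it easier to tell a missing download apart from a wrong working directory.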


Hernán, Miguel A, and James M Robins. 2020. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.