How to include mrrobust Stata code in an R Markdown or Quarto document using the Statamarkdown R package

Introduction

This example shows how to run R and Stata code within the same R Markdown (.Rmd) file using the Statamarkdown R package. More information about this package is available here and here. To install this package and load it into your current R session run the following in R.

# install.packages("Statamarkdown") # uncomment on first run
library(Statamarkdown)

Note when writing our Stata code chunks we need to be careful when we specify the collectcode=TRUE code chunk option, because each Stata code chunk is run as a separate batch job. For example, we include this chunk option in a chunk which reads in a dataset which we wish to use in subsequent chunks.

Using R and Stata code in the same script means that we can use the functions provided by the TwoSampleMR package to obtain data from MR-Base (Hemani et al. 2018).

Next we install the other required packages in R.

remotes::install_github("MRCIEU/TwoSampleMR")
remotes::install_github("MRCIEU/MRInstruments")
install.packages("foreign")

Example R Markdown document

---
title: Using mrrobust in an R Markdown or Quarto document
---

## Extracting data from MR-Base

We will be running the script from the MR-Base paper 
([Hemani et al., 2018](https://doi.org/10.7554/eLife.34408)).
The R code we will use is from
[here](https://raw.githubusercontent.com/explodecomputer/mr-base-methods-paper/master/scripts/ldl-chd.R).

We load the packages into our R session.
Note that the **foreign** package provides the `write.dta()` function
which we will use to save the data in Stata format.
```{r}
library(TwoSampleMR)
library(MRInstruments)
library(foreign)
```


We can access the data using the **MRInstruments** package.

```{r}
data(gwas_catalog)

# Get published SNPs for LDL cholesterol
ldl_snps <-
  subset(gwas_catalog,
    grepl("LDL choles", Phenotype) & Author == "Willer CJ")$SNP

# Extract from GLGC dataset
exposure <-
  convert_outcome_to_exposure(
    extract_outcome_data(ldl_snps, "ieu-a-300"))

# Get outcome data from Cardiogram 2015
outcome <- extract_outcome_data(exposure$SNP, "ieu-a-7")

# Harmonise exposure and outcome datasets
# Assume alleles are on the forward strand
dat <- harmonise_data(exposure, outcome, action = 1)
```

At this point we have our harmonised genotype-exposure and genotype-outcome
association data saved in an object in our R session called `dat`.

The next two code chunks perform the analysis in R.

```{r}
# Perform MR analysis
mr(dat)
mr_heterogeneity(dat)
dat$exposure <- "LDL cholesterol"
dat$outcome <- "Coronary heart disease"

# Label outliers and create plots
dat$labels <- dat$SNP
dat$labels[! dat$SNP %in% c("rs11065987", "rs1250229", "rs4530754")] <- NA
```


To proceed in Stata we can save our `dat` object as a Stata dataset

```{r}
write.dta(dat, file = "dat.dta")
```


## Analysis in Stata using the mrrobust package

At this point in Stata install the **mrrobust** package and its
dependencies if you have not done so previously.
```stata
net install github, from("https://haghish.github.io/github/")
gitget mrrobust
```

We now read the dataset into Stata and look at the variable names
and the number of observations.

```{stata, collectcode=TRUE}
qui use dat, clear
```

```{stata}
ds, v(28)
```

```{stata}
di _N
```


We can then run the IVW model using `mregger` with fixed effect standard errors.

```{stata}
mregger beta_outcome beta_exposure [aw=1/(se_outcome^2)], ivw fe
```


We then fit the MR-Egger, median, and modal based estimators.

```{stata}
mregger beta_outcome beta_exposure [aw=1/(se_outcome^2)]
```

```{stata}
mrmedian beta_outcome se_outcome beta_exposure se_exposure, weighted
```

```{stata}
mrmodal beta_outcome se_outcome beta_exposure se_exposure, weighted
```


And we could continue with additional Stata code (or indeed R code) as we like.

To run this R Markdown file, either; open it in RStudio and click the Knit button in the toolbar of the Source code window, or run

rmarkdown::render('rmarkdown-call-stata-example.Rmd')

References

Hemani, Gibran, Jie Zheng, Benjamin Elsworth, Kaitlin H Wade, Valeriia Haberland, Denis Baird, Charles Laurin, et al. 2018. The MR-Base platform supports systematic causal inference across the human phenome.” eLife 7: e34408. https://doi.org/10.7554/eLife.34408.