How to include mrrobust Stata code in an R Markdown or Quarto document using the Statamarkdown R package
Introduction
This example shows how to run R and Stata code within the same R Markdown (.Rmd
) file using the Statamarkdown R package. More information about this package is available here and here. To install this package and load it into your current R session run the following in R.
# install.packages("Statamarkdown") # uncomment on first run
library(Statamarkdown)
Note when writing our Stata code chunks we need to be careful when we specify the collectcode=TRUE
code chunk option, because each Stata code chunk is run as a separate batch job. For example, we include this chunk option in a chunk which reads in a dataset which we wish to use in subsequent chunks.
Using R and Stata code in the same script means that we can use the functions provided by the TwoSampleMR package to obtain data from MR-Base (Hemani et al. 2018).
Next we install the other required packages in R.
::install_github("MRCIEU/TwoSampleMR")
remotes::install_github("MRCIEU/MRInstruments")
remotesinstall.packages("foreign")
Example R Markdown document
---
title: Using mrrobust in an R Markdown or Quarto document
---
## Extracting data from MR-Base
We will be running the script from the MR-Base paper [Hemani et al., 2018](https://doi.org/10.7554/eLife.34408)).
(
The R code we will use is from[here](https://raw.githubusercontent.com/explodecomputer/mr-base-methods-paper/master/scripts/ldl-chd.R).
We load the packages into our R session.`write.dta()` function
Note that the **foreign** package provides the
which we will use to save the data in Stata format.```{r}
library(TwoSampleMR)
library(MRInstruments)
library(foreign)
```
We can access the data using the **MRInstruments** package.
```{r}
data(gwas_catalog)
# Get published SNPs for LDL cholesterol
ldl_snps <-
subset(gwas_catalog,
grepl("LDL choles", Phenotype) & Author == "Willer CJ")$SNP
# Extract from GLGC dataset
exposure <-
convert_outcome_to_exposure(
extract_outcome_data(ldl_snps, "ieu-a-300"))
# Get outcome data from Cardiogram 2015
outcome <- extract_outcome_data(exposure$SNP, "ieu-a-7")
# Harmonise exposure and outcome datasets
# Assume alleles are on the forward strand
dat <- harmonise_data(exposure, outcome, action = 1)
```
At this point we have our harmonised genotype-exposure and genotype-outcome`dat`.
association data saved in an object in our R session called
The next two code chunks perform the analysis in R.
```{r}
# Perform MR analysis
mr(dat)
mr_heterogeneity(dat)
dat$exposure <- "LDL cholesterol"
dat$outcome <- "Coronary heart disease"
# Label outliers and create plots
dat$labels <- dat$SNP
dat$labels[! dat$SNP %in% c("rs11065987", "rs1250229", "rs4530754")] <- NA
```
`dat` object as a Stata dataset
To proceed in Stata we can save our
```{r}
write.dta(dat, file = "dat.dta")
```
## Analysis in Stata using the mrrobust package
At this point in Stata install the **mrrobust** package and its
dependencies if you have not done so previously.```stata
net install github, from("https://haghish.github.io/github/")
gitget mrrobust
```
We now read the dataset into Stata and look at the variable names
and the number of observations.
```{stata, collectcode=TRUE}
qui use dat, clear
```
```{stata}
ds, v(28)
```
```{stata}
di _N
```
`mregger` with fixed effect standard errors.
We can then run the IVW model using
```{stata}
mregger beta_outcome beta_exposure [aw=1/(se_outcome^2)], ivw fe
```
We then fit the MR-Egger, median, and modal based estimators.
```{stata}
mregger beta_outcome beta_exposure [aw=1/(se_outcome^2)]
```
```{stata}
mrmedian beta_outcome se_outcome beta_exposure se_exposure, weighted
```
```{stata}
mrmodal beta_outcome se_outcome beta_exposure se_exposure, weighted
```
And we could continue with additional Stata code (or indeed R code) as we like.
To run this R Markdown file, either; open it in RStudio and click the Knit
button in the toolbar of the Source code window, or run
::render('rmarkdown-call-stata-example.Rmd') rmarkdown