Access the OpenGWAS API from a Stata do-file or Stata Markdown script

Introduction

This example shows how to run R and Stata code within the same Stata Markdown (.stmd) script. The general approach is detailed on the Stata Markdown website here and here.

This means that we can use the functions provided by the TwoSampleMR package to obtain data from the [OpenGWAS API}(https://api.opengwas.io/) which is the successor to MR-Base (Hemani et al. 2018).

The OpenGWAS API requires authentication. You must place an OPENGWAS_JWT token (by creating a free account on https://api.opengwas.io/profile/) in your .Renviron file before running the code below. See the ieugwasr website for more information.

Before you start please install the following two Stata packages from the SSC archive, so in Stata issue the following commands (I have commented them out because I have already installed them).

ssc install whereis
ssc install markstat

Example Stata Markdown document

The following is the example Stata Markdown document.

---
title: Using mrrobust in a Markstat document example
---

We first need to register the R executable with Stata.

```s
if c(os) == "Windows" local rpath "C:\\Program Files\\R\\R-4.4.1\\bin\\x64\\R.exe"
else if c(os) == "Unix" local rpath "/usr/bin/R"
else local rpath "/usr/local/bin/R"
whereis R "`rpath'"
```

Next we have an R code chunk in which we install the required packages in R.

```r
install.packages(
  "TwoSampleMR",
  repos = c(
    "https://mrcieu.r-universe.dev",
    "https://cloud.r-project.org"
  ),
  dependencies = TRUE
)
```

## Extracting data from the OpenGWAS API

We will be running the script from the MR-Base paper 
([Hemani et al., 2018](https://doi.org/10.7554/eLife.34408)). 
The R code we will use is from 
[here](https://raw.githubusercontent.com/explodecomputer/mr-base-methods-paper/master/scripts/ldl-chd.R).

First, we load the packages into our R session.
Note that the **foreign** package provides the 
`write.dta()` function which we will use to save the data in Stata format.

```r
library(TwoSampleMR)
library(MRInstruments)
library(foreign)
```

Our edited version of the code starts by reading in some code
to generate a set of plots in R.

```r
source("mrplots.R")
```

We can access the data using the **MRInstruments** package.

```r
data(gwas_catalog)

# Get published SNPs for LDL cholesterol
ldl_snps <- 
  subset(gwas_catalog, 
    grepl("LDL choles", Phenotype) & Author == "Willer CJ")$SNP

# Extract from GLGC dataset
exposure <- 
  convert_outcome_to_exposure(
    extract_outcome_data(ldl_snps, "ieu-a-300"))

# Get outcome data from Cardiogram 2015
outcome <- extract_outcome_data(exposure$SNP, "ieu-a-7")

# Harmonise exposure and outcome datasets
# Assume alleles are on the forward strand
dat <- harmonise_data(exposure, outcome, action = 1)
```

At this point we have our harmonised genotype-exposure and 
genotype-outcome association data saved in an object in our 
R session called `dat`.

The next two code chunks perform the analysis in R.

```r
# Perform MR
mr(dat)
mr_heterogeneity(dat)
dat$exposure <- "LDL cholesterol"
dat$outcome <- "Coronary heart disease"

# Label outliers and create plots
dat$labels <- dat$SNP
dat$labels[! dat$SNP %in% c("rs11065987", "rs1250229", "rs4530754")] <- NA
```

```r
png("ldl-chd.png", width=1000, height=1000)
mr_plots(dat)
dev.off()
```

![Plots generated by the TwoSampleMR R package.](ldl-chd.png)

We now save our `dat` object as a Stata dataset.

```r
write.dta(dat, file = "dat.dta")
```

## Performing the analysis using mrrobust in Stata

We now switch from using R code chunks to Stata code chunks.
We read the data into Stata and list the variable names
(note any `.` in the colnames of `dat` have been replaced with `_`).

```s
use dat, clear
ds, v(28)
di _N
```

We can then run the IVW model using `mregger` with multiplicative
standard errors.

```s
mregger beta_outcome beta_exposure [aw=1/(se_outcome^2)], ivw
```

It is helpful to view the forest plot of genotype specific IV estimates.

```s
mrforest beta_outcome se_outcome beta_exposure se_exposure, ivid(SNP) ///
    xlabel(-3,-2,-1,0,1,2,3)
graph export ldl-chd-mrforest.svg, width(600) replace
```

![Forest plot of genotype specific IV estimates.](ldl-chd-mrforest.svg)

We can visualise this model with `mreggerplot`.

```s
mreggerplot beta_outcome se_outcome beta_exposure se_exposure
graph export ldl-chd-mreggerplot.svg, width(600) replace
```

![Plot of the MR-Egger model.](ldl-chd-mreggerplot.svg)

We then fit the MR-Egger, median, and modal based estimators.

```s
mregger beta_outcome beta_exposure [aw=1/(se_outcome^2)]
```

```s
mrmedian beta_outcome se_outcome beta_exposure se_exposure, weighted
```

```s
mrmodal beta_outcome se_outcome beta_exposure se_exposure, weighted
```

And we could continue with additional Stata code (or indeed R code) as we liked.

Note to run this .stmd file in Stata we do so with the following command (specifying additional options as required, see help markstat for more information).

markstat using markstat-call-R-example

References

Hemani, Gibran, Jie Zheng, Benjamin Elsworth, Kaitlin H Wade, Valeriia Haberland, Denis Baird, Charles Laurin, et al. 2018. “The MR-Base platform supports systematic causal inference across the human phenome.” eLife 7: e34408. https://doi.org/10.7554/eLife.34408.