Access the OpenGWAS API from a Stata do-file or Stata Markdown script
Introduction
This example shows how to run R and Stata code within the same Stata Markdown (.stmd
) script. The general approach is detailed on the Stata Markdown website here and here.
This means that we can use the functions provided by the TwoSampleMR package to obtain data from the [OpenGWAS API}(https://api.opengwas.io/) which is the successor to MR-Base (Hemani et al. 2018).
The OpenGWAS API requires authentication. You must place an OPENGWAS_JWT
token (by creating a free account on https://api.opengwas.io/profile/) in your .Renviron file before running the code below. See the ieugwasr website for more information.
Before you start please install the following two Stata packages from the SSC archive, so in Stata issue the following commands (I have commented them out because I have already installed them).
ssc install whereis
ssc install markstat
Example Stata Markdown document
The following is the example Stata Markdown document.
---
title: Using mrrobust in a Markstat document example
---
We first need to register the R executable with Stata.
```s
if c(os) == "Windows" local rpath "C:\\Program Files\\R\\R-4.4.1\\bin\\x64\\R.exe"
else if c(os) == "Unix" local rpath "/usr/bin/R"
else local rpath "/usr/local/bin/R"
whereis R "`rpath'"
```
Next we have an R code chunk in which we install the required packages in R.
```r
install.packages(
"TwoSampleMR",
repos = c(
"https://mrcieu.r-universe.dev",
"https://cloud.r-project.org"
),dependencies = TRUE
)```
## Extracting data from the OpenGWAS API
We will be running the script from the MR-Base paper [Hemani et al., 2018](https://doi.org/10.7554/eLife.34408)).
(
The R code we will use is from [here](https://raw.githubusercontent.com/explodecomputer/mr-base-methods-paper/master/scripts/ldl-chd.R).
First, we load the packages into our R session.
Note that the **foreign** package provides the `write.dta()` function which we will use to save the data in Stata format.
```r
library(TwoSampleMR)
library(MRInstruments)
library(foreign)
```
Our edited version of the code starts by reading in some code
to generate a set of plots in R.
```r
source("mrplots.R")
```
We can access the data using the **MRInstruments** package.
```r
data(gwas_catalog)
# Get published SNPs for LDL cholesterol
<-
ldl_snps subset(gwas_catalog,
grepl("LDL choles", Phenotype) & Author == "Willer CJ")$SNP
# Extract from GLGC dataset
<-
exposure convert_outcome_to_exposure(
extract_outcome_data(ldl_snps, "ieu-a-300"))
# Get outcome data from Cardiogram 2015
<- extract_outcome_data(exposure$SNP, "ieu-a-7")
outcome
# Harmonise exposure and outcome datasets
# Assume alleles are on the forward strand
<- harmonise_data(exposure, outcome, action = 1)
dat ```
At this point we have our harmonised genotype-exposure and
genotype-outcome association data saved in an object in our `dat`.
R session called
The next two code chunks perform the analysis in R.
```r
# Perform MR
mr(dat)
mr_heterogeneity(dat)
$exposure <- "LDL cholesterol"
dat$outcome <- "Coronary heart disease"
dat
# Label outliers and create plots
$labels <- dat$SNP
dat$labels[! dat$SNP %in% c("rs11065987", "rs1250229", "rs4530754")] <- NA
dat```
```r
png("ldl-chd.png", width=1000, height=1000)
mr_plots(dat)
dev.off()
```
![Plots generated by the TwoSampleMR R package.](ldl-chd.png)
`dat` object as a Stata dataset.
We now save our
```r
write.dta(dat, file = "dat.dta")
```
## Performing the analysis using mrrobust in Stata
We now switch from using R code chunks to Stata code chunks.
We read the data into Stata and list the variable names`.` in the colnames of `dat` have been replaced with `_`).
(note any
```s
use dat, clear
ds, v(28)
di _N
```
`mregger` with multiplicative
We can then run the IVW model using
standard errors.
```s
mregger beta_outcome beta_exposure [aw=1/(se_outcome^2)], ivw
```
It is helpful to view the forest plot of genotype specific IV estimates.
```s
mrforest beta_outcome se_outcome beta_exposure se_exposure, ivid(SNP) ///
xlabel(-3,-2,-1,0,1,2,3)
graph export ldl-chd-mrforest.svg, width(600) replace
```
![Forest plot of genotype specific IV estimates.](ldl-chd-mrforest.svg)
`mreggerplot`.
We can visualise this model with
```s
mreggerplot beta_outcome se_outcome beta_exposure se_exposure
graph export ldl-chd-mreggerplot.svg, width(600) replace
```
![Plot of the MR-Egger model.](ldl-chd-mreggerplot.svg)
We then fit the MR-Egger, median, and modal based estimators.
```s
mregger beta_outcome beta_exposure [aw=1/(se_outcome^2)]
```
```s
mrmedian beta_outcome se_outcome beta_exposure se_exposure, weighted
```
```s
mrmodal beta_outcome se_outcome beta_exposure se_exposure, weighted
```
And we could continue with additional Stata code (or indeed R code) as we liked.
Note to run this .stmd
file in Stata we do so with the following command (specifying additional options as required, see help markstat
for more information).
using markstat-call-R-example markstat