16. Instrumental variables estimation: Stata
/***************************************************************
Stata code for Causal Inference: What If by Miguel Hernan & Jamie Robins
Date: 10/10/2019
Author: Eleanor Murray
For errors contact: ejmurray@bu.edu
***************************************************************/
Program 16.1
- Estimating the average causal effect using the standard IV estimator via the calculation of sample averages
- Data from NHEFS
- Section 16.2
use ./data/nhefs-formatted, clear
summarize price82
/* ignore subjects with missing outcome or missing instrument for simplicity*/
foreach var of varlist wt82 price82 {
drop if `var'==.
}
/*Create categorical instrument*/
gen byte highprice = (price82 > 1.5 & price82 < .)
save ./data/nhefs-highprice, replace
/*Calculate P[Z|A=a]*/
tab highprice qsmk, row
/*Calculate P[Y|Z=z]*/
ttest wt82_71, by(highprice)
/*Final IV estimate, OPTION 1: Hand calculations*/
/*Numerator: num = E[Y|Z=1] - E[Y|Z=0] = 2.686 - 2.536 = 0.150*/
/*Denominator: denom = P[A=1|Z=1] - P[A=1|Z=0] = 0.258 - 0.195 = 0.063 */
/*IV estimator: E[Ya=1] - E[Ya=0] =
(E[Y|Z=1]-E[Y|Z=0])/(P[A=1|Z=1]-P[A=1|Z=0]) = 0.150/0.063 = 2.397*/
display "Numerator, E[Y|Z=1] - E[Y|Z=0] =", 2.686 - 2.536
display "Denominator: denom = P[A=1|Z=1] - P[A=1|Z=0] =", 0.258 - 0.195
display "IV estimator =", 0.150/0.063
/*OPTION 2 2: automated calculation of instrument*/
/*Calculate P[A=1|Z=z], for each value of the instrument,
and store in a matrix*/
quietly summarize qsmk if (highprice==0)
matrix input pa = (`r(mean)')
quietly summarize qsmk if (highprice==1)
matrix pa = (pa ,`r(mean)')
matrix list pa
/*Calculate P[Y|Z=z], for each value of the instrument,
and store in a second matrix*/
quietly summarize wt82_71 if (highprice==0)
matrix input ey = (`r(mean)')
quietly summarize wt82_71 if (highprice==1)
matrix ey = (ey ,`r(mean)')
matrix list ey
/*Using Stata's built-in matrix manipulation feature (Mata),
calculate numerator, denominator and IV estimator*/
*Numerator: num = E[Y|Z=1] - E[Y|Z=0]*mata
*Denominator: denom = P[A=1|Z=1] - P[A=1|Z=0]*
*IV estimator: iv_est = IV estimate of E[Ya=1] - E[Ya=0] *
mata
pa = st_matrix("pa")
ey = st_matrix("ey")
num = ey[1,2] - ey[1,1]
denom = pa[1,2] - pa[1,1]
iv_est = num / denom
num
denom
st_numscalar("iv_est", iv_est)
end
di scalar(iv_est)
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price82 | 1,476 1.805989 .1301703 1.451904 2.103027
(0 observations deleted)
(90 observations deleted)
file ./data/nhefs-highprice.dta saved
+----------------+
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+
| quit smoking between
| baseline and 1982
highprice | No smokin Smoking c | Total
-----------+----------------------+----------
0 | 33 8 | 41
| 80.49 19.51 | 100.00
-----------+----------------------+----------
1 | 1,065 370 | 1,435
| 74.22 25.78 | 100.00
-----------+----------------------+----------
Total | 1,098 378 | 1,476
| 74.39 25.61 | 100.00
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
0 | 41 2.535729 1.461629 9.358993 -.4183336 5.489792
1 | 1,435 2.686018 .2084888 7.897848 2.277042 3.094994
---------+--------------------------------------------------------------------
Combined | 1,476 2.681843 .2066282 7.938395 2.276527 3.087159
---------+--------------------------------------------------------------------
diff | -.1502887 1.257776 -2.617509 2.316932
------------------------------------------------------------------------------
diff = mean(0) - mean(1) t = -0.1195
H0: diff = 0 Degrees of freedom = 1474
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.4525 Pr(|T| > |t|) = 0.9049 Pr(T > t) = 0.5475
Numerator, E[Y|Z=1] - E[Y|Z=0] = .15
Denominator: denom = P[A=1|Z=1] - P[A=1|Z=0] = .063
IV estimator = 2.3809524
pa[1,2]
c1 c2
r1 .19512195 .25783972
ey[1,2]
c1 c2
r1 2.535729 2.6860178
------------------------------------------------- mata (type end to exit) ------------
: pa = st_matrix("pa")
: ey = st_matrix("ey")
: num = ey[1,2] - ey[1,1]
: denom = pa[1,2] - pa[1,1]
: iv_est = num / denom
: num
.1502887173
: denom
.06271777
: st_numscalar("iv_est", iv_est)
: end
--------------------------------------------------------------------------------------
2.3962701
Program 16.2
- Estimating the average causal effect using the standard IV estimator via two-stage-least-squares regression
- Data from NHEFS
- Section 16.2
use ./data/nhefs-highprice, clear
/* ivregress fits the model in two stages:
- first model: qsmk = highprice
- second model: wt82_71 = predicted_qsmk */
ivregress 2sls wt82_71 (qsmk = highprice)
Instrumental-variables 2SLS regression Number of obs = 1,476
Wald chi2(1) = 0.01
Prob > chi2 = 0.9038
R-squared = 0.0213
Root MSE = 7.8508
------------------------------------------------------------------------------
wt82_71 | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
qsmk | 2.39627 19.82659 0.12 0.904 -36.46313 41.25567
_cons | 2.068164 5.081652 0.41 0.684 -7.89169 12.02802
------------------------------------------------------------------------------
Endogenous: qsmk
Exogenous: highprice
Program 16.3
- Estimating the average causal effect using the standard IV estimator via an additive marginal structural model
- Data from NHEFS
- Checking one possible value of psi.
- See Chapter 14 for program that checks several values and computes 95% confidence intervals
- Section 16.2
use ./data/nhefs-highprice, clear
gen psi = 2.396
gen hspi = wt82_71 - psi*qsmk
logit highprice hspi
Iteration 0: Log likelihood = -187.34948
Iteration 1: Log likelihood = -187.34948
Logistic regression Number of obs = 1,476
LR chi2(1) = 0.00
Prob > chi2 = 1.0000
Log likelihood = -187.34948 Pseudo R2 = 0.0000
------------------------------------------------------------------------------
highprice | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
hspi | 2.75e-07 .0201749 0.00 1.000 -.0395419 .0395424
_cons | 3.555347 .1637931 21.71 0.000 3.234319 3.876376
------------------------------------------------------------------------------
Program 16.4
- Estimating the average causal effect using the standard IV estimator based on alternative proposed instruments
- Data from NHEFS
- Section 16.5
use ./data/nhefs-highprice, clear
/*Instrument cut-point: 1.6*/
replace highprice = .
replace highprice = (price82 >1.6 & price82 < .)
ivregress 2sls wt82_71 (qsmk = highprice)
/*Instrument cut-point: 1.7*/
replace highprice = .
replace highprice = (price82 >1.7 & price82 < .)
ivregress 2sls wt82_71 (qsmk = highprice)
/*Instrument cut-point: 1.8*/
replace highprice = .
replace highprice = (price82 >1.8 & price82 < .)
ivregress 2sls wt82_71 (qsmk = highprice)
/*Instrument cut-point: 1.9*/
replace highprice = .
replace highprice = (price82 >1.9 & price82 < .)
ivregress 2sls wt82_71 (qsmk = highprice)
(1,476 real changes made, 1,476 to missing)
(1,476 real changes made)
Instrumental-variables 2SLS regression Number of obs = 1,476
Wald chi2(1) = 0.06
Prob > chi2 = 0.8023
R-squared = .
Root MSE = 18.593
------------------------------------------------------------------------------
wt82_71 | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
qsmk | 41.28124 164.8417 0.25 0.802 -281.8026 364.365
_cons | -7.890182 42.21833 -0.19 0.852 -90.63659 74.85623
------------------------------------------------------------------------------
Endogenous: qsmk
Exogenous: highprice
(1,476 real changes made, 1,476 to missing)
(1,476 real changes made)
Instrumental-variables 2SLS regression Number of obs = 1,476
Wald chi2(1) = 0.05
Prob > chi2 = 0.8274
R-squared = .
Root MSE = 20.577
------------------------------------------------------------------------------
wt82_71 | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
qsmk | -40.91185 187.6162 -0.22 0.827 -408.6328 326.8091
_cons | 13.15927 48.05103 0.27 0.784 -81.01901 107.3375
------------------------------------------------------------------------------
Endogenous: qsmk
Exogenous: highprice
(1,476 real changes made, 1,476 to missing)
(1,476 real changes made)
Instrumental-variables 2SLS regression Number of obs = 1,476
Wald chi2(1) = 0.55
Prob > chi2 = 0.4576
R-squared = .
Root MSE = 13.01
------------------------------------------------------------------------------
wt82_71 | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
qsmk | -21.10342 28.40885 -0.74 0.458 -76.78374 34.57691
_cons | 8.086377 7.283314 1.11 0.267 -6.188657 22.36141
------------------------------------------------------------------------------
Endogenous: qsmk
Exogenous: highprice
(1,476 real changes made, 1,476 to missing)
(1,476 real changes made)
Instrumental-variables 2SLS regression Number of obs = 1,476
Wald chi2(1) = 0.29
Prob > chi2 = 0.5880
R-squared = .
Root MSE = 10.357
------------------------------------------------------------------------------
wt82_71 | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
qsmk | -12.81141 23.65099 -0.54 0.588 -59.16649 33.54368
_cons | 5.962813 6.062956 0.98 0.325 -5.920362 17.84599
------------------------------------------------------------------------------
Endogenous: qsmk
Exogenous: highprice
Program 16.5
- Estimating the average causal effect using the standard IV estimator conditional on baseline covariates
- Data from NHEFS
- Section 16.5
use ./data/nhefs-highprice, clear
replace highprice = .
replace highprice = (price82 >1.5 & price82 < .)
ivregress 2sls wt82_71 sex race c.age c.smokeintensity ///
c.smokeyrs i.exercise i.active c.wt7 ///
(qsmk = highprice)
(1,476 real changes made, 1,476 to missing)
(1,476 real changes made)
Instrumental-variables 2SLS regression Number of obs = 1,476
Wald chi2(11) = 135.18
Prob > chi2 = 0.0000
R-squared = 0.0622
Root MSE = 7.6848
--------------------------------------------------------------------------------
wt82_71 | Coefficient Std. err. z P>|z| [95% conf. interval]
---------------+----------------------------------------------------------------
qsmk | -1.042295 29.86522 -0.03 0.972 -59.57705 57.49246
sex | -1.644393 2.620115 -0.63 0.530 -6.779724 3.490938
race | -.1832546 4.631443 -0.04 0.968 -9.260716 8.894207
age | -.16364 .2395678 -0.68 0.495 -.6331844 .3059043
smokeintensity | .0057669 .144911 0.04 0.968 -.2782534 .2897872
smokeyrs | .0258357 .1607639 0.16 0.872 -.2892558 .3409271
|
exercise |
1 | .4987479 2.162395 0.23 0.818 -3.739469 4.736964
2 | .5818337 2.174255 0.27 0.789 -3.679628 4.843296
|
active |
1 | -1.170145 .6049921 -1.93 0.053 -2.355908 .0156176
2 | -.5122842 1.303121 -0.39 0.694 -3.066355 2.041787
|
wt71 | -.0979493 .036123 -2.71 0.007 -.168749 -.0271496
_cons | 17.28033 2.32589 7.43 0.000 12.72167 21.83899
--------------------------------------------------------------------------------
Endogenous: qsmk
Exogenous: sex race age smokeintensity smokeyrs 1.exercise 2.exercise
1.active 2.active wt71 highprice