Implementation of the categorical instrumental variable estimator.

Usage

civ(y, D, Z, X = NULL, K = 2, regularize_EDZ = FALSE)

Arguments

y

The outcome variable, a numerical vector.

D

A matrix of endogenous variables.

Z

A matrix of instruments, where the first column corresponds to the categorical instrument.

X

An optional matrix of control variables.

K

The number of support points of the estimated instrument \(\hat{m}_K\), an integer greater than or equal to 2.

regularize_EDZ

A boolean indicating whether \(E[D|Z]\) should be regularized directly rather than \(E[D - X^\top\pi\vert Z]\).

Value

civ returns an object of S3 class civ. An object of class civ is a list containing the following components:

coef

A vector of second-stage coefficient estimates.

iv_fit

Object of class ivreg from the IV regression of y on D and X using the estimated \(\hat{m}_K\) as an instrument for D. See also AER::ivreg() for details.

kcmeans_fit

Object of class kcmeans from the K-Conditional-Means regression of D on Z and X. See also kcmeans::kcmeans() for details.

K

Pass-through of selected user-provided arguments. See above.
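
As a minimal sketch of how the returned components might be inspected, the snippet below fits the estimator on simulated data and accesses each list element by name. It assumes the civ package is installed; the seed and the simulated variables are illustrative choices, not part of the package.

```r
library(civ)  # assumption: the civ package is installed

set.seed(1)  # illustrative seed, only to make the sketch reproducible
nobs <- 800
Z <- sample(1:20, nobs, replace = TRUE)  # observed categorical instrument
Z0 <- Z %% 2                             # underlying latent instrument
U_V <- matrix(rnorm(2 * nobs), nobs, 2) %*%
  chol(matrix(c(1, 0.6, 0.6, 1), 2, 2))  # correlated first and second stage errors
D <- Z0 + U_V[, 2]                       # endogenous variable
y <- D + U_V[, 1]                        # outcome variable

civ_fit <- civ(y, D, Z, K = 2)

class(civ_fit)              # S3 class of the returned object
civ_fit$coef                # second-stage coefficient estimates
class(civ_fit$iv_fit)       # fitted IV regression object
class(civ_fit$kcmeans_fit)  # fitted K-Conditional-Means object
civ_fit$K                   # pass-through of the K argument
```

Because iv_fit and kcmeans_fit are ordinary fitted objects from their respective packages, they can presumably be passed to the methods those packages provide.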

References

Fox J, Kleiber C, Zeileis A (2023). "ivreg: Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics". R package.

Wiemann T (2023). "Optimal Categorical Instruments." https://arxiv.org/abs/2311.17021

Examples

# Simulate data from a simple IV model with 800 observations
nobs <- 800 # sample size
Z <- sample(1:20, nobs, replace = TRUE) # observed instrument
Z0 <- Z %% 2 # underlying latent instrument
U_V <- matrix(rnorm(2 * nobs, 0, 1), nobs, 2) %*%
  chol(matrix(c(1, 0.6, 0.6, 1), 2, 2)) # first and second stage errors
D <- Z0 + U_V[, 2] # endogenous variable
y <- D + U_V[, 1] # outcome variable
# Estimate the categorical instrumental variable estimator with K = 2
civ_fit <- civ(y, D, Z, K = 2)
summary(civ_fit)
#> 
#> Call:
#> AER::ivreg(formula = y ~ D | m_hat)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -2.713529 -0.696460 -0.005709  0.664267  3.179906 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -0.04118    0.04866  -0.846    0.398    
#> D            0.98494    0.07025  14.020   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.9855 on 798 degrees of freedom
#> Multiple R-Squared:   0.7,	Adjusted R-squared: 0.6996 
#> Wald test: 196.6 on 1 and 798 DF,  p-value: < 2.2e-16 
#>
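
The simulated design above has an observed instrument with 20 categories but a latent instrument Z0 with only two values. The following base R sketch shows why two support points are a natural choice here. It does not call civ and is not the estimator itself; it only compares category-wise sample means of D. The seed and the helper names cat_means and odd are illustrative assumptions.

```r
set.seed(123)  # illustrative seed, only to make the sketch reproducible
nobs <- 800
Z <- sample(1:20, nobs, replace = TRUE)  # observed instrument with 20 categories
Z0 <- Z %% 2                             # latent instrument: 1 for odd Z, 0 for even Z
U_V <- matrix(rnorm(2 * nobs, 0, 1), nobs, 2) %*%
  chol(matrix(c(1, 0.6, 0.6, 1), 2, 2))  # correlated first and second stage errors
D <- Z0 + U_V[, 2]                       # endogenous variable

# Sample mean of D within each of the 20 observed categories
cat_means <- tapply(D, Z, mean)

# Split the category means by the parity of the category label
odd <- as.integer(names(cat_means)) %% 2 == 1
round(cat_means[odd], 2)   # means for odd categories, expected to be near 1
round(cat_means[!odd], 2)  # means for even categories, expected to be near 0
```

In this design the 20 category means scatter around just two population values, so an instrument restricted to K = 2 support points loses little relative to using all 20 categories.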