This article is a brief introduction to civ.
To illustrate civ on a simple example, consider the data-generating
process from the simulation of Wiemann (2023). The code
snippet below draws a sample of size \(n=800\).
# Load dependencies
library(civ)
library(AER) # attaches sandwich, which provides vcovHC()
# Set seed
set.seed(51944)
# Sample parameters
nobs <- 800 # sample size
C <- 0.858 # first stage coefficient
sgm_V <- sqrt(0.81) # first stage error
tau_X <- c(-0.5, 0.5) + 1 # second stage effects
# Sample controls and instrument
X <- sample(1:2, nobs, replace = TRUE)
Z <- model.matrix(~ 0 + as.factor(sample(1:20, nobs, replace = TRUE)):as.factor(X))
Z <- Z %*% c(1:ncol(Z))
# Create the low-dimensional latent instrument
Z0 <- Z %% 2 # underlying latent instrument
# Draw first and second stage errors
U_V <- matrix(rnorm(2 * nobs, 0, 1), nobs, 2) %*%
  chol(matrix(c(1, 0.6, 0.6, sgm_V), 2, 2))
# Draw treatment and outcome variables
D <- Z0 * C + U_V[, 2]
y <- D * tau_X[X] + U_V[, 1]
In the generated sample, the observed instrument takes 40 values, with
the number of observations varying across instrument values. Using only
the observed instrument Z, the goal is to estimate the in-sample average
treatment effect:
mean(tau_X[X])
## [1] 1.0325
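The structure of the instrument described above can be inspected directly. The following is a minimal sketch in base R that regenerates the controls and the instrument with the same seed and sampling calls as the simulation, then counts the distinct instrument values and the observations per value. The names n_values and cell_sizes are illustrative and are not part of civ.

```r
# Regenerate the controls and instrument exactly as in the simulation above
set.seed(51944)
nobs <- 800
X <- sample(1:2, nobs, replace = TRUE)
Z <- model.matrix(~ 0 + as.factor(sample(1:20, nobs, replace = TRUE)):as.factor(X))
Z <- Z %*% c(1:ncol(Z))
Z0 <- Z %% 2 # underlying latent instrument

# Number of distinct values taken by the observed instrument
n_values <- length(unique(as.vector(Z)))
print(n_values)

# Observations per instrument value (cell sizes vary across values)
cell_sizes <- table(as.vector(Z))
print(range(cell_sizes))

# The latent instrument has only two support points
print(table(as.vector(Z0)))
```

The contrast between the many values of Z and the two values of Z0 is what motivates restricting the first stage to a small number of support points.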
The code snippet below estimates CIV where the first stage is
restricted to K=2 support points. The AER package is used to compute
heteroskedasticity-robust standard errors.
# Compute CIV with K=2 and conduct inference
civ_fit <- civ(y = y, D = D, Z = Z, X = as.factor(X), K = 2)
civ_res <- summary(civ_fit, vcov = vcovHC(civ_fit$iv_fit, type = "HC1"))
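The CIV estimate and the corresponding standard error are shown below. The t-value reported here is the absolute deviation of the estimate from the in-sample average treatment effect, divided by the standard error. Because it is below 1.96, the associated 95% confidence interval covers the true effect.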
c(Estimate = civ_res$coef[2, 1], "Std. Error" = civ_res$coef[2, 2],
"t-val." = abs(civ_res$coef[2, 1]-mean(tau_X[X]))/civ_res$coef[2, 2])
##   Estimate Std. Error     t-val.
##  1.0063143  0.1086868  0.2409285
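The coverage statement can be verified from the reported numbers alone. The sketch below rebuilds a two-sided 95% confidence interval from the printed estimate and standard error, assuming a normal approximation. The variable names are illustrative and are not part of civ.

```r
# Values copied from the output above
est <- 1.0063143 # CIV point estimate
se <- 0.1086868  # heteroskedasticity-robust (HC1) standard error
ate <- 1.0325    # in-sample average treatment effect

# Two-sided 95% interval under a normal approximation (an assumption)
crit <- qnorm(0.975)
ci <- c(lower = est - crit * se, upper = est + crit * se)

# TRUE when the interval contains the in-sample average treatment effect
covers <- ci[["lower"]] <= ate && ate <= ci[["upper"]]

# Equivalent check: the absolute t-value is below the critical value
t_val <- abs(est - ate) / se

print(round(ci, 3))
print(c(covers = covers, t_below_crit = t_val < crit))
```

Both checks are equivalent by construction: the interval contains the target exactly when the absolute t-value does not exceed the critical value.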
CIV uses a K-Conditional-Means (KCMeans) estimator in a first step to
estimate the optimal instrument. To understand the estimated mapping of
observed instruments to the support points of the latent instrument, it
is useful to print the cluster_map attribute of the first-stage
kcmeans_fit object (see also kcmeans for details). The output below
shows the results for the first 10 values of the instrument. Here, x
denotes the value of the observed instrument, while cluster_x denotes
the support point of the estimated optimal instrument to which that
value is assigned.
##           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x           26   20   10   32   23   12    7   25   33    21
## cluster_x    1    1    1    1    2    1    2    2    2     2
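Because the latent instrument in this simulation is Z0 = Z %% 2, a well-estimated first stage should group instrument values by parity. The sketch below checks this on the ten printed values, which are copied by hand into a small matrix that mimics the layout of the output. With a fitted object, the same columns would presumably be read from civ_fit$kcmeans_fit$cluster_map. That accessor is an assumption based on the description above, not a verified part of the civ interface.

```r
# First 10 rows as printed above (copied by hand, not recomputed)
cluster_map <- cbind(
  x = c(26, 20, 10, 32, 23, 12, 7, 25, 33, 21),
  cluster_x = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Transposed view, matching the printed layout
print(t(head(cluster_map, 10)))

# Cross-tabulate the parity of each instrument value against its cluster
parity <- cluster_map[, "x"] %% 2
tab <- table(parity = parity, cluster = cluster_map[, "cluster_x"])
print(tab)

# TRUE if all values with the same parity fall into a single cluster
perfect <- all(rowSums(tab > 0) == 1)
print(perfect)
```

Cluster labels are arbitrary, so only the grouping matters: in these ten rows, even instrument values share one cluster and odd values share the other.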
References
Wiemann T (2023). “Optimal Categorical Instruments.” https://arxiv.org/abs/2311.17021