This article is a brief introduction to `civ`.

To illustrate `civ` on a simple example, consider the data
generating process from the simulation of Wiemann (2023). The code
snippet below draws a sample of size \(n=800\).

```
# Set seed
set.seed(51944)
# Sample parameters
nobs = 800 # sample size
C = 0.858 # first stage coefficient
sgm_V = sqrt(0.81) # first stage error
tau_X <- c(-0.5, 0.5) + 1 # second stage effects
# Sample controls and instrument
X <- sample(1:2, nobs, replace = TRUE)
Z <- model.matrix(~ 0 + as.factor(sample(1:20, nobs, replace = TRUE)):as.factor(X))
Z <- Z %*% c(1:ncol(Z))
# Create the low-dimensional latent instrument
Z0 <- Z %% 2 # underlying latent instrument
# Draw first and second stage errors
U_V <- matrix(rnorm(2 * nobs, 0, 1), nobs, 2) %*%
  chol(matrix(c(1, 0.6, 0.6, sgm_V), 2, 2))
# Draw treatment and outcome variables
D <- Z0 * C + U_V[, 2]
y <- D * tau_X[X] + U_V[, 1]
```
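Read off the code above, the simulated design can be summarized as follows. The notation \(U_i\), \(V_i\), \(Z_i^0\), and \(\tau(\cdot)\) is introduced here for illustration only and is not taken from the original text. Note that the snippet passes `sgm_V` directly as the lower-right entry of the covariance matrix, so the first-stage error has variance 0.9 in this sample.

\[
\begin{aligned}
Z_i^0 &= Z_i \bmod 2, \\
D_i &= C \, Z_i^0 + V_i, \qquad C = 0.858, \\
y_i &= \tau(X_i) \, D_i + U_i, \qquad \tau(1) = 0.5, \quad \tau(2) = 1.5, \\
\begin{pmatrix} U_i \\ V_i \end{pmatrix} &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.6 \\ 0.6 & 0.9 \end{pmatrix} \right).
\end{aligned}
\]

Here \(X_i\) is drawn uniformly from \(\{1, 2\}\) and \(Z_i \in \{1, \dots, 40\}\) indexes the cells formed by interacting a 20-level categorical draw with \(X_i\).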

In the generated sample, the observed instrument takes 40 values with
varying numbers of observations per instrument. Using only the observed
instrument `Z`, the goal is to estimate the in-sample average
treatment effect:

`mean(tau_X[X])`

`## [1] 1.0325`

The code snippet below estimates CIV where the first stage is
restricted to `K=2` support points. The `AER` package is used to
compute heteroskedasticity-robust standard errors.

```
# Load packages (AER attaches sandwich, which provides vcovHC)
library(civ)
library(AER)
# Compute CIV with K=2 and conduct inference
civ_fit <- civ(y = y, D = D, Z = Z, X = as.factor(X), K = 2)
civ_res <- summary(civ_fit, vcov = vcovHC(civ_fit$iv_fit, type = "HC1"))
```

The CIV estimate and the corresponding standard error are shown
below. The associated 95% confidence interval covers the true effect, as
indicated by a *t*-value below 1.96. Here the *t*-value is computed
against the true in-sample effect rather than against zero.

```
c(Estimate = civ_res$coef[2, 1], "Std. Error" = civ_res$coef[2, 2],
  "t-val." = abs(civ_res$coef[2, 1] - mean(tau_X[X])) / civ_res$coef[2, 2])
```

```
##   Estimate Std. Error     t-val.
##  1.0063143  0.1086868  0.2409285
```
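The coverage statement can be checked by hand from the reported numbers. The sketch below is not part of the `civ` workflow: it hard-codes the estimate, standard error, and true effect printed above, and assumes a normal approximation for the 95% interval.

```r
# Values copied from the output above (not recomputed from a model fit)
est   <- 1.0063143   # CIV estimate
se    <- 0.1086868   # heteroskedasticity-robust standard error
truth <- 1.0325      # in-sample average treatment effect, mean(tau_X[X])

# 95% confidence interval under a normal approximation
ci <- est + c(-1, 1) * qnorm(0.975) * se

# The interval covers the truth exactly when |est - truth| / se < qnorm(0.975)
covers <- ci[1] <= truth && truth <= ci[2]
t_val  <- abs(est - truth) / se

print(round(ci, 4))
print(covers)
print(t_val)
```

The last two printed values restate the same fact in two ways: `covers` checks the interval directly, while `t_val` is the distance between the estimate and the true effect measured in standard errors.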

CIV uses a K-Conditional-Means (KCMeans) estimator in a first step to
estimate the optimal instrument. To understand the estimated mapping of
observed instruments to the support points of the latent instrument, it
is useful to print the `cluster_map` attribute of the first-stage
`kcmeans_fit` object (see also `kcmeans` for details). The output below
shows the results for the first 10 values of the instrument. Here, `x`
denotes the value of the observed instrument while `cluster_x` denotes
the association with the estimated optimal instrument.

```
##           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x           26   20   10   32   23   12    7   25   33    21
## cluster_x    1    1    1    1    2    1    2    2    2     2
```
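To make the idea behind this first step concrete, the following base-R sketch groups observed instrument values by their conditional treatment means. It is an illustration under stated assumptions, not the `kcmeans` implementation: the toy data, the seed, and the names `toy_map`, `cell_mean`, and `wss` are all hypothetical, and the brute-force search over split points is a stand-in that only handles `K = 2`.

```r
# Illustrative sketch only: NOT the civ/kcmeans implementation
set.seed(1)  # hypothetical seed for this toy example

# Toy data: 10 observed instrument values, latent binary instrument = parity
n  <- 2000
Z  <- sample(1:10, n, replace = TRUE)
Z0 <- Z %% 2
D  <- 0.858 * Z0 + rnorm(n, 0, 0.9)

# Step 1: conditional mean of D and cell size for each observed value of Z
cell_mean <- tapply(D, Z, mean)
cell_n    <- tapply(D, Z, length)

# Step 2: with K = 2, an optimal partition is contiguous in the sorted
# means, so try every split point and keep the one with the smallest
# size-weighted within-cluster sum of squares
ord <- order(cell_mean)
m   <- cell_mean[ord]
w   <- cell_n[ord]
wss <- function(m, w) sum(w * (m - weighted.mean(m, w))^2)
loss <- sapply(seq_len(length(m) - 1), function(s) {
  wss(m[1:s], w[1:s]) + wss(m[-(1:s)], w[-(1:s)])
})
split <- which.min(loss)

# Step 3: map each observed value to its estimated support point
# (1 = lower conditional mean, 2 = higher conditional mean)
cluster_x <- integer(length(m))
cluster_x[ord] <- ifelse(seq_along(m) <= split, 1L, 2L)
toy_map <- rbind(x = as.integer(names(cell_mean)), cluster_x = cluster_x)
print(toy_map)
```

Printing `toy_map` gives a two-row table in the same layout as the output above. In this toy design, values of the observed instrument with the same parity share a conditional mean, so a well-separated sample should place them in the same cluster.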

## References

Wiemann T (2023). “Optimal Categorical Instruments.” https://arxiv.org/abs/2311.17021