Implementation of the K-Conditional-Means estimator.

## Arguments

- y
The outcome variable, a numerical vector.

- X
A (sparse) feature matrix where one column is the categorical predictor.

- which_is_cat
An integer indicating which column of `X` corresponds to the categorical predictor.

- K
The number of support points, an integer greater than 2.
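
As a minimal sketch of how `which_is_cat` might be used when the categorical predictor is not the first column of `X`, the call below places it second. The simulated data and variable names are illustrative assumptions, and the snippet presumes the `kcmeans` package is installed.

```r
library(kcmeans)

# Illustrative simulated data (an assumption, not from the package docs)
set.seed(123)
n <- 800
X <- rnorm(n)                          # continuous predictor
Z <- sample(1:20, n, replace = TRUE)   # categorical predictor
y <- (Z %% 4) + X + rnorm(n)           # outcome

# The categorical predictor sits in column 2 of the feature matrix,
# so which_is_cat is set to 2
fit <- kcmeans(y, cbind(X, Z), which_is_cat = 2, K = 4)

# Both arguments are passed through to the returned object
fit$which_is_cat
fit$K
```

Naming the arguments explicitly avoids relying on their position in the call.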

## Value

`kcmeans` returns an object of S3 class `kcmeans`. An object of class `kcmeans` is a list containing the following components:

`cluster_map`

A matrix that characterizes the estimated predictor of the residualized outcome \(\tilde{Y} \equiv Y - X_{2:}^\top \hat{\pi}\). The first column `x` gives the value of the categorical variable. The remaining columns give, for that value, the unrestricted sample mean `mean_x` of \(\tilde{Y}\), the sample share `p_x`, the estimated cluster `cluster_x`, and the estimated restricted sample mean `mean_xK` of \(\tilde{Y}\) with just `K` support points.

`mean_y`

The unconditional sample mean of \(\tilde{Y}\).

`pi`

The best linear prediction coefficients of \(Y\) on \(X\) corresponding to the non-categorical predictors \(X_{2:}\).

`which_is_cat`, `K`

Passthrough of user-provided arguments. See above for details.
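
To make the columns of `cluster_map` concrete, the following base-R sketch builds a matrix with the same column names by hand. It is not the package's implementation: the residualization step is an assumption about how \(\hat{\pi}\) is obtained, and `stats::kmeans` on the category means is swapped in as a simple stand-in for the clustering step. All variable names and the simulated data are illustrative.

```r
# Illustrative base-R sketch; NOT the kcmeans implementation
set.seed(1)
n <- 800
K <- 4
X <- rnorm(n)                          # continuous predictor
Z <- sample(1:20, n, replace = TRUE)   # categorical predictor
y <- (Z %% 4) + X + rnorm(n)           # outcome

# Step 1 (assumption): regress y on category dummies and X, then
# residualize the outcome with respect to the non-categorical predictor
ols <- lm(y ~ factor(Z) + X)
pi_hat <- unname(coef(ols)["X"])
y_tilde <- y - X * pi_hat

# Step 2: unrestricted means and sample shares by category value
x <- sort(unique(Z))
mean_x <- as.numeric(tapply(y_tilde, Z, mean))
p_x <- as.numeric(table(Z)) / n

# Step 3 (stand-in): cluster the category means into K groups
km <- kmeans(mean_x, centers = K, nstart = 20)
cluster_x <- km$cluster

# Step 4: restricted means, the share-weighted average of mean_x per cluster
num <- tapply(mean_x * p_x, cluster_x, sum)
den <- tapply(p_x, cluster_x, sum)
mean_xK <- as.numeric((num / den)[as.character(cluster_x)])

cluster_map <- cbind(x = x, mean_x = mean_x, p_x = p_x,
                     cluster_x = cluster_x, mean_xK = mean_xK)
head(cluster_map)
```

Each row describes one value of the categorical variable, so `mean_xK` takes only `K` distinct values across rows while `mean_x` generally takes as many values as there are categories.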

## References

Wang H and Song M (2011). "Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming." The R Journal 3(2), 29--33.

Wiemann T (2023). "Optimal Categorical Instruments." https://arxiv.org/abs/2311.17021

## Examples

```
# Simulate simple dataset with n=800 observations
X <- rnorm(800) # continuous predictor
Z <- sample(1:20, 800, replace = TRUE) # categorical predictor
Z0 <- Z %% 4 # lower-dimensional latent categorical variable
y <- Z0 + X + rnorm(800) # outcome
# Compute kcmeans with four support points
kcmeans_fit <- kcmeans(y, cbind(Z, X), K = 4)
# Print the estimated support points of the categorical predictor
print(unique(kcmeans_fit$cluster_map[, "mean_xK"]))
#> [1] 0.8919541 1.9124459 3.1148056 -0.1195223
```