Estimator for the flexible partially linear IV model.

Usage

ddml_fpliv(
  y,
  D,
  Z,
  X,
  learners,
  learners_DXZ = learners,
  learners_DX = learners,
  sample_folds = 10,
  ensemble_type = "nnls",
  shortstack = FALSE,
  cv_folds = 10,
  enforce_LIE = TRUE,
  custom_ensemble_weights = NULL,
  custom_ensemble_weights_DXZ = custom_ensemble_weights,
  custom_ensemble_weights_DX = custom_ensemble_weights,
  cluster_variable = seq_along(y),
  subsamples = NULL,
  cv_subsamples_list = NULL,
  silent = FALSE
)

Arguments

y

The outcome variable.

D

A matrix of endogenous variables.

Z

A (sparse) matrix of instruments.

X

A (sparse) matrix of control variables.

learners

May take one of two forms, depending on whether a single learner or stacking with multiple learners is used for estimation of the conditional expectation functions. If a single learner is used, learners is a list with two named elements:

  • what The base learner function. The function must be such that it predicts a named input y using a named input X.

  • args Optional arguments to be passed to what.

If stacking with multiple learners is used, learners is a list of lists, each containing four named elements:

  • fun The base learner function. The function must be such that it predicts a named input y using a named input X.

  • args Optional arguments to be passed to fun.

  • assign_X An optional vector of column indices corresponding to control variables in X that are passed to the base learner.

  • assign_Z An optional vector of column indices corresponding to instruments in Z that are passed to the base learner.

Omission of the args element results in default arguments being used in fun. Omission of assign_X (and/or assign_Z) results in inclusion of all variables in X (and/or Z).
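As an illustration of the stacking form described above, the following sketch builds a learners list with three base learners. It assumes the ddml package is attached and uses its base learners ols, mdl_glmnet, and mdl_ranger; the column indices and the num.trees value are hypothetical and presuppose an X with at least six columns and a Z with at least one column.

```r
library(ddml)

# A list of lists: one inner list per base learner.
learners <- list(
  # OLS with default arguments, using all columns of X and Z
  list(fun = ols),
  # Ridge regression (alpha = 0) on all columns of X and Z
  list(fun = mdl_glmnet,
       args = list(alpha = 0)),
  # Random forest restricted to a subset of controls (hypothetical indices)
  list(fun = mdl_ranger,
       args = list(num.trees = 100),
       assign_X = c(1, 2, 6),
       assign_Z = 1)
)
```

The resulting object can be passed as the learners argument, and the same structure applies to learners_DXZ and learners_DX.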

learners_DXZ, learners_DX

Optional arguments to allow for different estimators of \(E[D \vert X, Z]\) and \(E[D \vert X]\), respectively. Setup is identical to learners.

sample_folds

Number of cross-fitting folds.

ensemble_type

Ensemble method to combine base learners into final estimate of the conditional expectation functions. Possible values are:

  • "nnls" Non-negative least squares.

  • "nnls1" Non-negative least squares with the constraint that all weights sum to one.

  • "singlebest" Select base learner with minimum MSPE.

  • "ols" Ordinary least squares.

  • "average" Simple average over base learners.

Multiple ensemble types may be passed as a vector of strings.

shortstack

Boolean to use short-stacking.

cv_folds

Number of folds used for cross-validation in ensemble construction.

enforce_LIE

Boolean indicating whether the law of iterated expectations is enforced in the first stage.

custom_ensemble_weights

A numerical matrix with user-specified ensemble weights. Each column corresponds to a custom ensemble specification, each row corresponds to a base learner in learners (in chronological order). Optional column names are used to name the estimation results corresponding to the custom ensemble specification.
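A minimal sketch of such a weight matrix, assuming three base learners in learners. The specification names equal and first_only are hypothetical and serve only to label the results.

```r
# Rows: base learners (in the order they appear in `learners`).
# Columns: one custom ensemble specification each.
custom_ensemble_weights <- cbind(
  equal      = rep(1 / 3, 3),  # equal weight on all three learners
  first_only = c(1, 0, 0)      # all weight on the first learner
)

dim(custom_ensemble_weights)
colnames(custom_ensemble_weights)
```

Here the matrix has one row per base learner and two columns, so two additional sets of estimation results would be reported under the given column names.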

custom_ensemble_weights_DXZ, custom_ensemble_weights_DX

Optional arguments to allow for different custom ensemble weights for learners_DXZ and learners_DX. Setup is identical to custom_ensemble_weights. Note: custom_ensemble_weights, custom_ensemble_weights_DXZ, and custom_ensemble_weights_DX must have the same number of columns.

cluster_variable

A vector of cluster indices.

subsamples

List of vectors with sample indices for cross-fitting.
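One way to construct such a list in base R is sketched below. It assumes that each vector holds the row indices of one cross-fitting fold and that the folds partition the sample; the sample size n and the number of folds are hypothetical.

```r
set.seed(42)

n <- 100            # hypothetical number of observations
sample_folds <- 5   # hypothetical number of cross-fitting folds

# Randomly assign each observation to one fold of (near-)equal size.
fold_id <- sample(rep(seq_len(sample_folds), length.out = n))

# One integer vector of row indices per fold.
subsamples <- unname(split(seq_len(n), fold_id))

lengths(subsamples)
```

Fixing the folds this way can be useful when several estimators should be compared on identical sample splits.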

cv_subsamples_list

List of lists, each corresponding to a subsample containing vectors with subsample indices for cross-validation.

silent

Boolean to silence estimation updates.

Value

ddml_fpliv returns an object of S3 class ddml_fpliv. An object of class ddml_fpliv is a list containing the following components:

coef

A vector with the \(\theta_0\) estimates.

weights

A list of matrices, providing the weight assigned to each base learner (in chronological order) by the ensemble procedure.

mspe

A list of matrices, providing the MSPE of each base learner (in chronological order) computed by the cross-validation step in the ensemble construction.

iv_fit

Object of class ivreg from the IV regression of \(Y - \hat{E}[Y\vert X]\) on \(D - \hat{E}[D\vert X]\) using \(\hat{E}[D\vert X,Z] - \hat{E}[D\vert X]\) as the instrument.

learners, learners_DX, learners_DXZ, cluster_variable, subsamples, cv_subsamples_list, ensemble_type

Pass-through of selected user-provided arguments. See above.
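The IV regression described under iv_fit can be illustrated with a self-contained base R sketch. This is not the package's implementation: it assumes simulated data with a scalar D, uses lm in place of machine learners for all three conditional expectations, and applies simple two-fold cross-fitting. All variable names and the true value theta0 = 0.5 are hypothetical.

```r
set.seed(1)

# Simulated data with an endogenous D (U and D share the component V).
n <- 5000
X <- rnorm(n)
Z <- rnorm(n)
V <- rnorm(n)
U <- 0.5 * V + rnorm(n)
D <- Z + X + V
theta0 <- 0.5
Y <- theta0 * D + X + U
dat <- data.frame(Y, D, X, Z)

# Two-fold cross-fitting: fit on one fold, predict on the other.
folds <- sample(rep(1:2, length.out = n))
y_r <- d_r <- v_r <- numeric(n)
for (k in 1:2) {
  train <- dat[folds != k, ]
  test  <- dat[folds == k, ]
  fit_y_x  <- lm(Y ~ X, data = train)      # estimate of E[Y | X]
  fit_d_x  <- lm(D ~ X, data = train)      # estimate of E[D | X]
  fit_d_xz <- lm(D ~ X + Z, data = train)  # estimate of E[D | X, Z]
  y_r[folds == k] <- test$Y - predict(fit_y_x, test)
  d_r[folds == k] <- test$D - predict(fit_d_x, test)
  v_r[folds == k] <- predict(fit_d_xz, test) - predict(fit_d_x, test)
}

# Just-identified IV estimate (with intercept) of y_r on d_r,
# using v_r as the instrument.
theta_hat <- cov(v_r, y_r) / cov(v_r, d_r)
theta_hat
```

In this simulated design the estimate should lie close to the true value of 0.5, whereas a plain regression of Y on D and X would be biased by the correlation between D and U.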

Details

ddml_fpliv provides a double/debiased machine learning estimator for the parameter of interest \(\theta_0\) in the partially linear IV model given by

\(Y = \theta_0D + g_0(X) + U,\)

where \((Y, D, X, Z, U)\) is a random vector such that \(E[U\vert X, Z] = 0\) and \(E[Var(E[D\vert X, Z]\vert X)] \neq 0\), and \(g_0\) is an unknown nuisance function.
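The link between this model and the IV regression reported in iv_fit can be made explicit with a short derivation. The sketch below assumes a scalar \(D\) and introduces the shorthand \(V\) for the constructed instrument; \(V\) is notation added here for illustration and does not appear elsewhere in the documentation.

```latex
% E[U | X, Z] = 0 implies E[U | X] = 0, so conditioning the model on X gives
%   E[Y | X] = \theta_0 E[D | X] + g_0(X).
% Subtracting this from the model removes g_0:
\begin{align*}
Y - E[Y \vert X] &= \theta_0 \, (D - E[D \vert X]) + U. \\
\intertext{Define $V = E[D \vert X, Z] - E[D \vert X]$. Since $V$ is a function
of $(X, Z)$, $E[V U] = 0$, and therefore}
\theta_0 &= \frac{E\big[V \, (Y - E[Y \vert X])\big]}
                 {E\big[V \, (D - E[D \vert X])\big]}, \\
E\big[V \, (D - E[D \vert X])\big] &= E[V^2]
   = E\big[Var(E[D \vert X, Z] \vert X)\big].
\end{align*}
```

The last line shows why the condition \(E[Var(E[D\vert X, Z]\vert X)] \neq 0\) is needed: it guarantees that the denominator is nonzero.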

References

Ahrens A, Hansen C B, Schaffer M E, Wiemann T (2023). "ddml: Double/debiased machine learning in Stata." https://arxiv.org/abs/2301.09397

Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C B, Newey W, Robins J (2018). "Double/debiased machine learning for treatment and structural parameters." The Econometrics Journal, 21(1), C1-C68.

Wolpert D H (1992). "Stacked generalization." Neural Networks, 5(2), 241-259.

Examples

# Construct variables from the included Angrist & Evans (1998) data
y = AE98[, "worked"]
D = AE98[, "morekids"]
Z = AE98[, "samesex", drop = FALSE]
X = AE98[, c("age","agefst","black","hisp","othrace","educ")]

# Estimate the partially linear IV model using a single base learner: Ridge.
fpliv_fit <- ddml_fpliv(y, D, Z, X,
                        learners = list(what = mdl_glmnet,
                                        args = list(alpha = 0)),
                        sample_folds = 2,
                        silent = TRUE)
summary(fpliv_fit)
#> FPLIV estimation results: 
#>  
#> , , single base learner
#> 
#>              Estimate Std. Error  t value Pr(>|t|)
#> (Intercept) -6.75e-05     0.0069 -0.00979    0.992
#> D_r         -1.40e-01     0.1818 -0.77195    0.440
#>