Example based on Berry, Levinsohn, & Pakes (1995)
Source: vignettes/articles/example_BLP95.Rmd
Introduction
This article revisits the empirical example of Chernozhukov, Hansen, and Spindler (2015) (CHS2015, hereafter), which extends the instruments of Berry, Levinsohn, and Pakes (1995) (BLP1995, hereafter) and applies an instrument selection procedure based on the lasso. We consider the same instrument extension and apply Double/Debiased Machine Learning with short-stacking, which combines conventional linear estimators with machine learners such as lasso-based approaches, random forests, and gradient boosting.
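For context, the outcome used throughout is the standard logit inversion of market shares (a sketch in our notation, not part of the original code): for product j in market t,
log(s_jt) − log(s_0t) = α p_jt + x_jt'β + ξ_jt,
where s_jt is the market share of product j, s_0t the share of the outside good, p_jt the price (the endogenous variable of interest), x_jt the observed product characteristics, and ξ_jt an unobserved product-quality term. Correlation between p_jt and ξ_jt is what motivates instrumenting for price.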
Data Construction
We consider the automobile market data from Berry, Levinsohn, and Pakes (1995) as retrieved from the reproduction exercise in Chernozhukov, Hansen, and Spindler (2015).
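The code chunks below assume that the packages used in this article have already been loaded. A minimal setup sketch (an assumption on our part; the rendered article loads these in an earlier, unshown chunk, and ivreg() is alternatively available from the ivreg package):
library(ddml) # ddml_fpliv(), ols(), mdl_glmnet(), mdl_ranger(), mdl_xgboost(), diagnostics()
library(AER)  # ivreg() for the two-stage least squares benchmarks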
BLP95 <- readRDS("data/BLP95.rds")
nobs <- length(BLP95$id)
y <- as.matrix(log(BLP95$share) - log(BLP95$outshr))
D <- as.matrix(BLP95$price)
X <- as.matrix(cbind(1, BLP95[, c("hpwt", "air", "mpd", "space")]))
colnames(y) <- "share"
colnames(D) <- "price"The below snippet constructs the BLP1995 instruments, which are sums of product characteristics (excluding price and other potentially endogenous variables) of other products offered by the firm as well as over competing firms. The exact instrument specification here follows the approach of CHS2015.
ncol_X <- ncol(X)
sum_other <- sum_rival <- matrix(0, nobs, 5)
for (i in 1:nobs) {
  # other products offered by the same firm in the same market
  other_ind <- (BLP95$firmid == BLP95$firmid[i]) &
    (BLP95$cdid == BLP95$cdid[i]) & (BLP95$id != BLP95$id[i])
  # products offered by rival firms in the same market
  rival_ind <- (BLP95$firmid != BLP95$firmid[i]) &
    (BLP95$cdid == BLP95$cdid[i])
  sum_other[i, ] <- colSums(X[other_ind, , drop = FALSE])
  sum_rival[i, ] <- colSums(X[rival_ind, , drop = FALSE])
}#FOR
Z <- cbind(sum_other, sum_rival); ncol_Z <- ncol(Z)
cat("Number of baseline controls: ", ncol_X, "\n",
"Number of baseline instruments: ", ncol_Z)
#> Number of baseline controls: 5
#> Number of baseline instruments: 10
In addition, we may be interested in extending the list of instruments. The snippet below constructs the additional instruments considered in CHS2015.
# Rescaled trend, miles-per-dollar, and size variables used in the extended set
tu <- BLP95$trend / 19
mpdu <- BLP95$mpd / 7
spaceu <- BLP95$space / 2
XL <- as.matrix(cbind(1, BLP95[, c("hpwt", "air")], mpdu, spaceu, tu,
BLP95$hpwt^2, BLP95$hpwt^3, mpdu^2, mpdu^3,
spaceu^2, spaceu^3, tu^2, tu^3, BLP95$hpwt *
BLP95$air, mpdu * BLP95$air, spaceu *
BLP95$air, tu * BLP95$air, BLP95$hpwt *
mpdu, BLP95$hpwt * spaceu, BLP95$hpwt * tu,
mpdu * spaceu, mpdu * tu, spaceu * tu))
ncol_XL <- ncol(XL)
sum_otherL <- sum_rivalL <- matrix(0, nobs, 24)
for (i in 1:nobs) {
  other_ind <- (BLP95$firmid == BLP95$firmid[i]) &
    (BLP95$cdid == BLP95$cdid[i]) & (BLP95$id != BLP95$id[i])
  rival_ind <- (BLP95$firmid != BLP95$firmid[i]) &
    (BLP95$cdid == BLP95$cdid[i])
  sum_otherL[i, ] <- colSums(XL[other_ind, , drop = FALSE])
  sum_rivalL[i, ] <- colSums(XL[rival_ind, , drop = FALSE])
}#FOR
ZL <- cbind(sum_otherL, sum_rivalL); ncol_ZL <- ncol(ZL)
cat("Number of extended controls: ", ncol_XL, "\n",
"Number of extended instruments: ", ncol_ZL)
#> Number of extended controls: 24
#> Number of extended instruments: 48
Baseline Estimates
We begin by computing the OLS and TSLS estimates using the baseline controls and instrumental variables. Note that the TSLS estimates differ from those reported in CHS2015; this is due to a slight instrument-construction error in the original CHS2015 replication code.
ols_fit <- lm(y ~ D + X)
round(summary(ols_fit)$coefficients[2, ], 4)
#> Estimate Std. Error t value Pr(>|t|)
#> -0.0886 0.0040 -22.0145 0.0000
tsls_fit <- ivreg(y ~ D + X | X + Z)
round(summary(tsls_fit)$coefficients[2, ], 4)
#> Estimate Std. Error t value Pr(>|t|)
#> -0.1357 0.0108 -12.5993 0.0000
Similarly, we compute estimates using the expanded set of controls and instruments.
ols_L_fit <- lm(y ~ D + XL)
round(summary(ols_L_fit)$coefficients[2, ], 4)
#> Estimate Std. Error t value Pr(>|t|)
#> -0.0991 0.0044 -22.4630 0.0000
tsls_L_fit <- ivreg(y ~ D + XL | XL + ZL)
round(summary(tsls_L_fit)$coefficients[2, ], 4)
#> Estimate Std. Error t value Pr(>|t|)
#> -0.1258 0.0070 -17.8691 0.0000
Estimating the Flexible Partially Linear IV Model with CV-Lasso
Given the large set of controls and instruments in the expanded set relative to the moderate sample size, it is reasonable to consider regularized estimators. Lasso-based estimators are a frequent choice in settings with many variables. Below, we combine Double/Debiased Machine Learning with lasso selection of instruments and controls. (See ?mdl_glmnet and ?ddml_fpliv for details.)
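For orientation, here is a sketch of the flexible partially linear IV model targeted by ddml_fpliv (see ?ddml_fpliv for the precise statement; the notation is ours):
Y = θ_0 D + g_0(X) + U,   E[U | X, Z] = 0,
where Y is the log share difference, D is price, X collects the controls, and Z the excluded instruments. The machine learners cross-fit the conditional expectations E[Y | X], E[D | X], and E[D | X, Z]; roughly, θ_0 is then estimated by instrumenting the residualized price, D − E[D | X], with the difference of the fitted first stages, E[D | X, Z] − E[D | X], in a regression of the residualized outcome Y − E[Y | X].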
# Base learner
learner <- list(what = mdl_glmnet)
# Estimate the ddml estimator using a single base learner
lasso_fit <- ddml_fpliv(y, D = D,
Z = ZL, X = XL,
learners = learner,
sample_folds = 5,
silent = TRUE)
summary(lasso_fit)
#> DDML estimation: Flexible Partially Linear IV Model
#> Obs: 2217 Folds: 5
#>
#> Estimate Std. Error z value Pr(>|z|)
#> price -0.144870 0.008810 -16.44 <2e-16 ***
#> (Intercept) 0.000712 0.022806 0.03 0.98
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimating the Flexible Partially Linear IV Model with Multiple Learners
A more ambitious approach considers a variety of machine learners for the estimation of the first and second stage. Here, we combine linear models, namely unpenalized regression (see ?ols) as well as lasso and ridge regression (see ?mdl_glmnet), with random forests (see ?mdl_ranger) and boosted trees (see ?mdl_xgboost).
In addition to allowing for different machine learners, we also consider different sets of instruments and controls: either the initial set as in BLP1995 or the extended set as in CHS2015. For this purpose, it is convenient to pre-specify sets of column indices corresponding to the initial and extended sets.
# Construct column indices for combined control and instrument sets
X_c <- cbind(X, XL); colnames(X_c) <- c(1:ncol(X_c))
Z_c <- cbind(Z, ZL); colnames(Z_c) <- c(1:ncol(Z_c))
set_X <- 1:ncol(X); set_XL <- setdiff(c(1:ncol(X_c)), set_X)
set_Z <- 1:ncol(Z); set_ZL <- setdiff(c(1:ncol(Z_c)), set_Z)
# Base learners
learners <- list(list(what = ols, # ols with the baseline set
                      assign_X = set_X,
                      assign_Z = set_Z),
                 list(what = ols, # ols with the extended set
                      assign_X = set_XL,
                      assign_Z = set_ZL),
                 list(what = mdl_glmnet, # lasso with the extended set
                      args = list(alpha = 1),
                      assign_X = set_XL,
                      assign_Z = set_ZL),
                 list(what = mdl_glmnet, # ridge with the extended set
                      args = list(alpha = 0),
                      assign_X = set_XL,
                      assign_Z = set_ZL),
                 list(what = mdl_ranger, # random forests with the baseline set
                      args = list(num.trees = 100,
                                  min.node.size = 10),
                      assign_X = set_X,
                      assign_Z = set_Z),
                 list(what = mdl_xgboost, # boosted trees with the baseline set
                      args = list(nrounds = 50),
                      assign_X = set_X,
                      assign_Z = set_Z))
# Compute short-stacked IV estimate
stacking_fit <- ddml_fpliv(y, D = D,
Z = Z_c, X = X_c,
learners = learners,
ensemble_type = c("nnls1"),
shortstack = TRUE,
sample_folds = 5,
silent = TRUE)
summary(stacking_fit)
#> DDML estimation: Flexible Partially Linear IV Model
#> Obs: 2217 Folds: 5 Stacking: short-stack
#>
#> Estimate Std. Error z value Pr(>|z|)
#> price -0.09812 0.00967 -10.15 <2e-16 ***
#> (Intercept) 0.00223 0.02035 0.11 0.91
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interestingly, the coefficient is closer to the OLS estimates than to the TSLS estimates (with or without lasso)!
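As a rough sketch of what ensemble_type = "nnls1" does (our reading; see ?ddml_fpliv for details): for each cross-fitted reduced form, the base learners' out-of-sample predictions are combined with non-negative weights that sum to one, chosen to minimize the squared prediction error,
min over w_1, ..., w_L ≥ 0 with Σ_l w_l = 1 of Σ_i (target_i − Σ_l w_l prediction_{l,i})²,
so that learners with poor out-of-sample fit receive little or no weight.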
To better understand the composition of the final estimator, it is often useful to inspect the stacking diagnostics. Here, we see that the boosted trees and the random forest learners have been assigned the most weight. The linear methods (ols, lasso, and ridge) do not contribute substantially to the final estimates, suggesting that the user-defined expansions of the controls and instruments do little to reduce bias or improve precision.
diagnostics(stacking_fit)
#> Stacking diagnostics: Flexible Partially Linear IV Model
#> Obs: 2217
#>
#> y_X:
#> learner mspe r2 weight_nnls1
#> learner_1 1.4341 0.2488 0.0000
#> learner_2 1.3458 0.2950 0.0000
#> learner_3 1.3543 0.2905 0.0000
#> learner_4 1.3850 0.2744 0.0000
#> learner_5 1.1000 0.4237 0.3525
#> learner_6 1.0350 0.4578 0.6475
#> nnls1 1.0076 0.4721 NA
#>
#> D1_X:
#> learner mspe r2 weight_nnls1
#> learner_1 32.8790 0.5599 0.0000
#> learner_2 25.9267 0.6530 0.0000
#> learner_3 25.9418 0.6528 0.0553
#> learner_4 28.2471 0.6219 0.0000
#> learner_5 16.9611 0.7730 0.1252
#> learner_6 13.4904 0.8194 0.8195
#> nnls1 13.2407 0.8228 NA
#>
#> D1_XZ:
#> learner mspe r2 weight_nnls1
#> learner_1 28.1808 0.6228 0.0000
#> learner_2 20.3428 0.7277 0.0000
#> learner_3 18.3504 0.7544 0.0080
#> learner_4 23.9814 0.6790 0.0000
#> learner_5 10.2051 0.8634 0.3372
#> learner_6 8.9730 0.8799 0.6548
#> nnls1 8.4965 0.8863 NA
#>
#> Note: Ensemble MSPE and R2 for short-stacking rely on full-sample weights
#> and represent in-sample fit over cross-fitted base predictions.
Elasticities
Equipped with the multiple coefficient estimates, we may recalculate the number of products with inelastic demand as in CHS2015.
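The count is based on the own-price elasticity implied by the aggregate logit model (a standard derivation, not shown in the original code): with price coefficient α, the own-price elasticity of product j is
η_j = α p_j (1 − s_j),
so demand for product j is inelastic whenever |η_j| < 1, which for negative α amounts to α p_j (1 − s_j) > −1. This is exactly the condition counted by the helper function below.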
compute_inelastic_demand <- function(price_coef) {
  # count products whose implied own-price elasticity lies above -1
  sum(price_coef * (BLP95$price) * (1 - BLP95$share) > -1)
}#COMPUTE_INELASTIC_DEMAND
The TSLS estimates suggest about 746-896 products with inelastic demand, roughly half the number implied by the simple OLS estimates.
# OLS implied number of products with inelastic demand
compute_inelastic_demand(ols_fit$coef[2])
#> [1] 1502
compute_inelastic_demand(ols_L_fit$coef[2])
#> [1] 1405
# TSLS implied number of products with inelastic demand
compute_inelastic_demand(tsls_fit$coef[2])
#> [1] 746
compute_inelastic_demand(tsls_L_fit$coef[2])
#> [1] 896
Double/Debiased Machine Learning estimates using only a single lasso base learner suggest the smallest number of products with inelastic demand. In stark contrast, the estimates based on multiple machine learners suggest a number closer to the initial OLS estimates.
Bonus: Post-Lasso Estimates without Sample-Splitting
In addition to ddml, we can also consider estimators
from the hdm package based on rigorous lasso – i.e., with
plug-in penalty parameters and without sample-splitting.
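As a reminder of what the plug-in penalty refers to (our sketch; see the hdm documentation for the exact formula and the data-dependent penalty loadings): the rigorous-lasso penalty level takes the form
λ = 2 c sqrt(n) Φ^{-1}(1 − γ/(2p)),
with constants such as c = 1.1 and γ = 0.1/log(n), where p is the number of candidate regressors and Φ^{-1} is the standard normal quantile function. The penalty is chosen to dominate the noise in the lasso score without resorting to cross-validation or sample-splitting.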
# Load hdm
library(hdm)
# Compute post-lasso IV estimator with the plug-in penalty
rlassoIV_fit <- rlassoIV(x = XL, d = D, y = y, z = ZL,
select.Z = TRUE, select.X = FALSE, post = TRUE)
summary(rlassoIV_fit)
#> [1] "Estimates and significance testing of the effect of target variables in the IV regression model"
#> coeff. se. t-value p-value
#> [1,] -0.31558 0.05354 -5.894 3.78e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The coefficient is drastically different from previous estimates, including the Double/Debiased Machine Learning estimates that included lasso-based approaches.
To gain some insight into potential causes of these differences, we check which instruments and controls were selected by the lasso procedure. Unsurprisingly, the lasso with a plug-in penalty selects only a handful of instruments and controls.
Zr_ <- ZL[, which(rlassoIV_fit$selected[1:48])]
dim(Zr_)[2]
#> [1] 2
Xr_ <- XL[, which(rlassoIV_fit$selected[49:length(rlassoIV_fit$selected)])]
dim(Xr_)[2]
#> [1] 6
From the stacking weights above, we know that the Double/Debiased Machine Learning estimator assigns most weight to boosted trees. In contrast to lasso-based estimates, boosted trees adaptively create interactions from their input variables, allowing for rich non-linearities in the final predictions. It thus makes sense to check whether the stark differences between the rlasso-based IV estimates above and the stacking estimates are primarily due to these non-linearities, as opposed to the specific instrument and control variables that were selected. The code snippet below re-estimates the stacking learner with the pre-selected set of controls and instruments.
# Base learner
learner <- list(list(what = ols),
list(what = mdl_xgboost,
args = list(nrounds = 50)))
# Compute short-stacked IV estimate
stacking_r_fit <- ddml_fpliv(y, D = D,
Z = Zr_, X = Xr_,
learners = learner,
ensemble_type = c("nnls1"),
shortstack = TRUE,
sample_folds = 5,
silent = TRUE)
summary(stacking_r_fit)
#> DDML estimation: Flexible Partially Linear IV Model
#> Obs: 2217 Folds: 5 Stacking: short-stack
#>
#> Estimate Std. Error z value Pr(>|z|)
#> price -0.10628 0.01316 -8.08 6.5e-16 ***
#> (Intercept) 0.00425 0.02189 0.19 0.85
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The similarity of the coefficient estimates to the initial stacking estimates suggests that adaptively created interactions are indeed the key driver of the differences across estimators. This is further confirmed by the stacking diagnostics, which again place substantial weight on the boosted trees.
diagnostics(stacking_r_fit)
#> Stacking diagnostics: Flexible Partially Linear IV Model
#> Obs: 2217
#>
#> y_X:
#> learner mspe r2 weight_nnls1
#> learner_1 1.4843 0.2224 0.2298
#> learner_2 1.1869 0.3782 0.7702
#> nnls1 1.1579 0.3934 NA
#>
#> D1_X:
#> learner mspe r2 weight_nnls1
#> learner_1 29.7015 0.6025 0.1121
#> learner_2 14.2630 0.8091 0.8879
#> nnls1 14.0130 0.8124 NA
#>
#> D1_XZ:
#> learner mspe r2 weight_nnls1
#> learner_1 28.2924 0.6213 0.0725
#> learner_2 10.4439 0.8602 0.9275
#> nnls1 10.3342 0.8617 NA
#>
#> Note: Ensemble MSPE and R2 for short-stacking rely on full-sample weights
#> and represent in-sample fit over cross-fitted base predictions.
The example thus highlights the importance of considering multiple machine learners for robustness, and illustrates the usefulness of Double/Debiased Machine Learning with stacking as a practical way of doing so.
References
Berry S, Levinsohn J, Pakes A (1995). “Automobile Prices in Market Equilibrium.” Econometrica, 63(4), 841-890.
Chernozhukov V, Hansen C, Spindler M (2015). “Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments.” American Economic Review, 105(5), 486-490.