ddml is an implementation of double/debiased machine learning estimators as proposed by Chernozhukov et al. (2018). The key feature of ddml is the straightforward estimation of nuisance parameters using (short-)stacking (Wolpert, 1992), which allows for multiple machine learners to increase robustness to the underlying data generating process.
ddml is the sister R package to our Stata package, mirroring its key features while also leveraging R to simplify estimation with user-provided machine learners and/or sparse matrices. See also Ahrens et al. (2023) for additional discussion of the supported causal models and the benefits of (short-)stacking.
Installation
Install the latest development version from GitHub (requires devtools package):
if (!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("thomaswiemann/ddml", dependencies = TRUE)
Install the latest public release from CRAN:
install.packages("ddml")
Example: LATE Estimation based on (Short-)Stacking
To illustrate ddml on a simple example, consider the included random subsample of 5,000 observations from the data of Angrist & Evans (1998). The data contains information on the labor supply of mothers, their children, as well as demographic data. See ?AE98 for details.
# Load ddml and set seed
library(ddml)
set.seed(75523)
# Construct variables from the included Angrist & Evans (1998) data
y = AE98[, "worked"]
D = AE98[, "morekids"]
Z = AE98[, "samesex"]
X = AE98[, c("age","agefst","black","hisp","othrace","educ")]
ddml_late estimates the local average treatment effect (LATE) using double/debiased machine learning (see ?ddml_late). Since the statistical properties of machine learners depend heavily on the underlying (unknown!) structure of the data, adaptive combination of multiple machine learners can increase robustness. In the snippet below, ddml_late estimates the LATE with short-stacking based on three base learners:
- linear regression (see ?ols)
- lasso (see ?mdl_glmnet)
- gradient boosting (see ?mdl_xgboost)
# Estimate the local average treatment effect using short-stacking with base
# learners ols, lasso, and xgboost.
late_fit_short <- ddml_late(y, D, Z, X,
                            learners = list(list(fun = ols),
                                            list(fun = mdl_glmnet),
                                            list(fun = mdl_xgboost,
                                                 args = list(nrounds = 100,
                                                             max_depth = 1))),
                            ensemble_type = 'nnls1',
                            shortstack = TRUE,
                            sample_folds = 10,
                            silent = TRUE)
summary(late_fit_short)
#> LATE estimation results:
#>
#>        Estimate Std. Error  t value  Pr(>|t|)
#> nnls1 0.2105019   0.195529 1.076576 0.2816698
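The idea behind combining base learners can be sketched in base R. This is a simplified illustration and not ddml's implementation. It assumes that 'nnls1' denotes least-squares weights that are non-negative and sum to one, and that short-stacking computes these weights on the cross-fitted (out-of-sample) predictions. The simulated data, the two lm-based learners, and all variable names are made up for the example, and the closed-form weight applies only to the two-learner case.

```r
# Illustrative sketch only: combine two base learners with weights that are
# non-negative and sum to one, computed on cross-fitted predictions.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- sin(x) + rnorm(n, sd = 0.5)

# Assign each observation to one of K cross-fitting folds.
K <- 5
fold <- sample(rep(seq_len(K), length.out = n))

# Out-of-sample predictions: column 1 = linear fit, column 2 = cubic fit.
oos <- matrix(NA_real_, n, 2)
for (k in seq_len(K)) {
  train <- data.frame(x = x[fold != k], y = y[fold != k])
  test <- data.frame(x = x[fold == k])
  oos[fold == k, 1] <- predict(lm(y ~ x, data = train), newdata = test)
  oos[fold == k, 2] <- predict(lm(y ~ poly(x, 3), data = train), newdata = test)
}

# With two learners, the constrained least-squares weight has a closed form:
# minimize sum((y - w * p1 - (1 - w) * p2)^2) over w in [0, 1].
d <- oos[, 1] - oos[, 2]
w1 <- sum((y - oos[, 2]) * d) / sum(d^2)
w1 <- min(max(w1, 0), 1)  # clip to [0, 1]; valid because the objective is convex
weights <- c(linear = w1, cubic = 1 - w1)

# Ensemble prediction used in place of a single learner's prediction.
ensemble <- drop(oos %*% weights)
print(weights)
```

Because putting all weight on either learner is always feasible, the combined out-of-sample fit is never worse (in squared error on these predictions) than the better of the two learners.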
Learn More about ddml
Check out our articles to learn more:

- vignette("ddml") is a more detailed introduction to ddml
- vignette("stacking") discusses computational benefits of short-stacking
- vignette("new_ml_wrapper") shows how to write user-provided base learners
- vignette("sparse") illustrates support of sparse matrices (see ?Matrix)
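As a rough idea of what a user-provided base learner could look like, here is a minimal sketch in base R. It assumes that a learner is a function taking an outcome vector y and a feature matrix X and returning an S3 object with a predict method that accepts newdata; vignette("new_ml_wrapper") is the authoritative description of the interface. The names mdl_ridge and lambda are hypothetical and not part of ddml.

```r
# Hypothetical learner: ridge regression with a fixed penalty (base R only).
mdl_ridge <- function(y, X, lambda = 1, ...) {
  X <- cbind(1, as.matrix(X))  # add an intercept column
  # Penalize all coefficients except the intercept.
  penalty <- diag(c(0, rep(lambda, ncol(X) - 1)), nrow = ncol(X))
  beta <- solve(crossprod(X) + penalty, crossprod(X, y))
  structure(list(beta = beta), class = "mdl_ridge")
}

# Prediction method: returns a numeric vector with one entry per row of newdata.
predict.mdl_ridge <- function(object, newdata, ...) {
  drop(cbind(1, as.matrix(newdata)) %*% object$beta)
}

# Quick check on simulated data.
set.seed(1)
X <- matrix(rnorm(200), 100, 2)
y <- X[, 1] - X[, 2] + rnorm(100)
fit <- mdl_ridge(y, X, lambda = 0.5)
head(predict(fit, newdata = X))
```

Under the stated assumption, such a wrapper would be supplied in the same way as the built-in learners in the example above, for instance as list(fun = mdl_ridge, args = list(lambda = 0.5)) inside the learners argument.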
For additional applied examples, see our case studies:

- vignette("example_401k") revisits the effect of 401k participation on retirement savings
- vignette("example_BLP95") considers flexible demand estimation with endogenous prices
Other Double/Debiased Machine Learning Packages
ddml is built to easily (and quickly) estimate common causal parameters with multiple machine learners. With its support for short-stacking, sparse matrices, and easy-to-learn syntax, we hope ddml is a useful complement to DoubleML, the expansive R and Python package. DoubleML supports many advanced features such as multiway clustering and stacking.
References
Ahrens A, Hansen C B, Schaffer M E, Wiemann T (2023). “ddml: Double/debiased machine learning in Stata.” https://arxiv.org/abs/2301.09397
Angrist J, Evans W (1998). “Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size.” American Economic Review, 88(3), 450-477.
Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C B, Newey W, Robins J (2018). “Double/debiased machine learning for treatment and structural parameters.” The Econometrics Journal, 21(1), C1-C68.
Wolpert D H (1992). “Stacked generalization.” Neural Networks, 5(2), 241-259.