Constructing a User-Provided Base Learner
Source: vignettes/articles/new_ml_wrapper.Rmd
Introduction
This article illustrates how users can write their own base learners.
The requirements for a base learner to be compatible with ddml are minimal:
- An estimation function that takes an outcome vector y and a feature matrix X and returns a fitted model object as an S3 class
- A predict method for that S3 class that takes a feature matrix newdata and returns a numerical vector of predictions
We illustrate these requirements with two concrete examples:
- A simple wrapper for gbm::gbm.fit() for generalized boosted regression
- A more extensive wrapper for neural networks based on keras
To test our wrappers, we apply them to the included random subsample
of 5,000 observations from the data of Angrist & Evans (1998). See
?AE98
for details.
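The remainder of this article assumes that the outcome y, the treatment D, the instrument Z, and the control matrix X have already been constructed from AE98. A minimal setup sketch is given below; the particular variables selected here follow the package examples and are only an illustration (consult ?AE98 for the available columns).
# Load ddml and the included Angrist & Evans (1998) subsample
library(ddml)

# Construct outcome, treatment, instrument, and controls
# (illustrative selection based on the package examples; see ?AE98)
y <- AE98[, "worked"]
D <- AE98[, "morekids"]
Z <- AE98[, "samesex"]
X <- as.matrix(AE98[, c("age", "agefst", "black", "hisp", "othrace", "educ")])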
A Wrapper for gbm::gbm.fit()
We begin by writing an estimation function that takes named arguments y and X and returns a fitted model object. To allow additional arguments to be passed from the wrapper to gbm::gbm.fit(), we also make use of the ... argument.
# Load gbm
library(gbm)

# Write gbm wrapper
mdl_gbm <- function(y, X, ...) {
  gbm_fit <- gbm.fit(x = X, y = y, ...) # fit gbm
  class(gbm_fit) <- c("mdl_gbm", class(gbm_fit)) # append class
  return(gbm_fit) # return fitted gbm object
}#MDL_GBM
Note that by default, gbm::predict.gbm() does not return predictions on the scale of the outcome variable (see ?gbm::predict.gbm). It is therefore important to also construct a simple prediction method for the mdl_gbm object:
# Write prediction method for the mdl_gbm wrapper
predict.mdl_gbm <- function(object, newdata, ...) {
  class(object) <- class(object)[-1] # remove mdl_gbm from the class list
  predict(object, newdata, type = "response", n.trees = object$n.trees)
}#PREDICT.MDL_GBM
As a simple test, we check the class of the object returned by the wrapper and the class of the predictions produced by the method we constructed:
# Estimate gbm
gbm_fit <- mdl_gbm(y, X,
                   distribution = "bernoulli",
                   n.trees = 1000,
                   interaction.depth = 6,
                   verbose = FALSE)

# Compute predictions
fitted_values <- predict(gbm_fit, X)

# Check class
class(gbm_fit)
#> [1] "mdl_gbm" "gbm"
class(fitted_values)
#> [1] "numeric"
The classes are what we expected. The wrapper should now be compatible with the ddml machinery. As a final test, we estimate the local average treatment effect. (Computation can take a while; feel free to set n.trees or interaction.depth to a smaller value for this test.)
# Use the new gbm base learner
learner_gbm <- list(what = mdl_gbm,
                    args = list(distribution = "bernoulli",
                                n.trees = 1000,
                                interaction.depth = 6,
                                verbose = FALSE))

# Estimate ddml_late
late_fit <- ddml_late(y, D, Z, X,
                      learners = learner_gbm,
                      sample_folds = 10,
                      silent = TRUE)
summary(late_fit)
#> LATE estimation results:
#>
#> Estimate Std. Error t value Pr(>|t|)
#>   -0.236       0.19   -1.24    0.214
All works well!
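Because the wrapper behaves like any other base learner, it can also be combined with built-in learners in a stacking ensemble. The sketch below is illustrative: it assumes the built-in mdl_glmnet wrapper shipped with ddml (which requires the glmnet package) and the ensemble_type and shortstack arguments as documented in ?ddml_late; adjust to your own setup.
# Combine the user-written gbm wrapper with a built-in learner (illustrative sketch)
learners_ensemble <- list(learner_gbm,
                          list(what = mdl_glmnet))

# Estimate ddml_late with short-stacking over both base learners
late_fit_ens <- ddml_late(y, D, Z, X,
                          learners = learners_ensemble,
                          ensemble_type = "nnls",
                          shortstack = TRUE,
                          sample_folds = 10,
                          silent = TRUE)
summary(late_fit_ens)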
A Wrapper for Neural Networks based on keras
We now consider an example using the neural network implementation of the keras package. See this guide to keras basics for an excellent introduction to neural networks in R.
The requirements for the wrapper to be compatible with ddml are the same as before; however, because estimating a neural network is more involved, we need to write a more extensive wrapper. In particular, our wrapper for the keras package has four components:
- Building the neural network architecture
- Compiling the neural network object
- Estimating the neural network
- Appending the class of the returned object (to write a custom predict method)
The wrapper below allows for custom specification of a feed-forward neural network architecture using the relu activation function. Of course, much more involved architectures are also supported (feel free to experiment; a small variation is sketched after the wrapper).
# Load keras
library(keras)

mdl_keras <- function(y, X,
                      units = 10, nhidden = 1,
                      optimizer_fun = "rmsprop",
                      loss = "mse",
                      epochs = 10,
                      batch_size = min(1000, length(y)),
                      validation_split = 0,
                      callbacks = NULL,
                      steps_per_epoch = NULL,
                      metrics = c("mae"),
                      verbose = 0) {
  # Data parameters
  nobs <- length(y)
  # Normalize data
  std_y <- c(mean(y), sd(y))
  std_X <- apply(X, 2, function(x) c(mean(x), max(sd(x), 1e-3)))
  y <- (y - std_y[1]) / std_y[2]
  X <- X - matrix(replicate(nobs, std_X[1, , drop = FALSE]),
                  nrow = nobs, byrow = TRUE)
  X <- X / matrix(replicate(nobs, std_X[2, , drop = FALSE]),
                  nrow = nobs, byrow = TRUE)
  # ===============================================
  # ADJUST THIS PART FOR DIFFERENT ARCHITECTURES ==
  # Construct neural network architecture
  nnet <- keras_model_sequential()
  for (k in 1:nhidden) {
    nnet <- nnet %>%
      layer_dense(units = units, use_bias = TRUE,
                  activation = "relu")
  }#FOR
  nnet <- nnet %>%
    layer_dense(units = 1, use_bias = TRUE)
  # ===============================================
  # Compile neural net
  nnet %>% keras::compile(optimizer = optimizer_fun,
                          loss = loss,
                          metrics = metrics)
  # Fit neural net
  nnet %>% keras::fit(X, y,
                      epochs = epochs,
                      batch_size = batch_size,
                      validation_split = validation_split,
                      callbacks = callbacks,
                      steps_per_epoch = steps_per_epoch,
                      verbose = verbose)
  # Add standardization constants to the fitted object
  nnet$std_y <- std_y
  nnet$std_X <- std_X
  # Append class
  class(nnet) <- c("mdl_keras", class(nnet))
  # Return fitted object
  return(nnet)
}#MDL_KERAS
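As an example of the kind of change the delimited architecture block invites, the sketch below adds dropout regularization after each hidden layer. This is only an illustration: layer_dropout() is part of keras, but the rate of 0.2 and the example values of units and nhidden are arbitrary choices made here.
# Illustrative variation of the architecture block: dropout after each hidden layer
library(keras)
units <- 10; nhidden <- 1 # example values; inside mdl_keras these are function arguments

nnet <- keras_model_sequential()
for (k in 1:nhidden) {
  nnet <- nnet %>%
    layer_dense(units = units, use_bias = TRUE, activation = "relu") %>%
    layer_dropout(rate = 0.2) # arbitrary dropout rate for illustration
}#FOR
nnet <- nnet %>%
  layer_dense(units = 1, use_bias = TRUE)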
The prediction method for our fitted object standardizes the new data, computes predictions, and returns them as a numerical vector on the original scale of the outcome:
predict.mdl_keras <- function(object, newdata, ...) {
  # Standardize newdata using the constants stored at estimation time
  nobs <- nrow(newdata)
  newdata <- newdata - matrix(replicate(nobs, object$std_X[1, , drop = FALSE]),
                              nrow = nobs, byrow = TRUE)
  newdata <- newdata / matrix(replicate(nobs, object$std_X[2, , drop = FALSE]),
                              nrow = nobs, byrow = TRUE)
  # Compute predictions and coerce to a numeric vector
  class(object) <- class(object)[-1] # Not a pretty solution...
  fitted <- as.numeric(predict(object, newdata))
  # Re-standardize to the original scale of y and return
  object$std_y[2] * fitted + object$std_y[1]
}#PREDICT.MDL_KERAS
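Before plugging the wrapper into ddml, it can be worth repeating the quick class check from the gbm example (a small sketch; output is omitted here and will depend on the local keras installation):
# Quick check of the keras wrapper (mirrors the gbm test above)
nnet_fit <- mdl_keras(y, X, units = 10, nhidden = 1, epochs = 10, verbose = 0)
fitted_values <- predict(nnet_fit, X)
class(nnet_fit)      # should include "mdl_keras"
class(fitted_values) # should be "numeric"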
To test the wrapper, we again estimate the local average treatment effect. In addition to the architecture, it is important to properly tune the optimization algorithm. callbacks_list helps with specifying learning rate adjustments and defines the early stopping rule.
# Specify callbacks
callbacks_list <- list(callback_early_stopping(monitor = "val_loss",
                                               patience = 15,
                                               restore_best_weights = TRUE),
                       callback_reduce_lr_on_plateau(monitor = "val_loss",
                                                     factor = 1/10,
                                                     patience = 10,
                                                     verbose = FALSE))

# Use the neural network base learner
learner_keras <- list(what = mdl_keras,
                      args = list(units = 10, nhidden = 1,
                                  epochs = 100,
                                  verbose = FALSE,
                                  validation_split = 0.1,
                                  callbacks = callbacks_list))

# Estimate ddml_late
late_fit <- ddml_late(y, D, Z, X,
                      learners = learner_keras,
                      sample_folds = 10,
                      silent = TRUE)
#> Warning in trim_propensity_scores(r_X, trim, ensemble_type): 2 propensity
#> scores were trimmed.
summary(late_fit)
#> LATE estimation results:
#>
#> Estimate Std. Error t value Pr(>|t|)
#>   -0.187      0.189  -0.989    0.322
All works well!