# Constructing a User-Provided Base Learner

Source: `vignettes/articles/new_ml_wrapper.Rmd`

## Introduction

This article illustrates how users can write their own base learners.
The necessary requirements for a base learner to be compatible with
`ddml` are minimal:

- An estimation function that takes an outcome vector `y` and a feature
  matrix `X` and returns a fitted model object as an S3 class
- A `predict` method for the above S3 class that takes a feature matrix
  `newdata` and returns a numerical vector of predictions
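
The following minimal sketch illustrates this structure. It is not
runnable as-is: `mdl_template` and `my_fitting_function` are
hypothetical placeholders for an arbitrary estimation routine.

```
# Minimal sketch of a ddml-compatible base learner (illustration only;
# my_fitting_function is a hypothetical placeholder for any estimation routine)
mdl_template <- function(y, X, ...) {
  fit <- my_fitting_function(y = y, x = X, ...) # fit the model
  class(fit) <- c("mdl_template", class(fit)) # append a custom S3 class
  return(fit) # return the fitted model object
}#MDL_TEMPLATE

# Matching predict method: must return a numerical vector of predictions
predict.mdl_template <- function(object, newdata, ...) {
  class(object) <- class(object)[-1] # dispatch to the underlying predict method
  as.numeric(predict(object, newdata, ...))
}#PREDICT.MDL_TEMPLATE
```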

We illustrate these requirements with two concrete examples:

- A simple wrapper for `gbm::gbm.fit()` for generalized boosted
  regression
- A more extensive wrapper for neural networks based on `keras`

To test our wrappers, we apply them to the included random subsample of
5,000 observations from the data of Angrist & Evans (1998). See `?AE98`
for details.
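
Throughout, `y`, `D`, `Z`, and `X` denote the outcome, treatment,
instrument, and controls constructed from `AE98`. A construction along
the following lines is assumed in the examples below; the particular
variable names follow the examples in the package documentation (see
`?AE98` for the available variables):

```
# Load ddml and the included Angrist & Evans (1998) subsample
library(ddml)
data(AE98)

# Construct outcome, treatment, instrument, and controls
y <- AE98[, "worked"]
D <- AE98[, "morekids"]
Z <- AE98[, "samesex"]
X <- AE98[, c("age", "agefst", "black", "hisp", "othrace", "educ")]
```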

## A Wrapper for `gbm::gbm.fit()`

We begin by writing an estimation function that takes named arguments
`y` and `X` and returns a fitted model object. To allow additional
arguments to be passed from the wrapper to `gbm::gbm.fit()`, we also
make use of the `...` argument.

```
# Load gbm
library(gbm)
# Write gbm wrapper
mdl_gbm <- function(y, X, ...) {
  gbm_fit <- gbm.fit(x = X, y = y, ...) # fit gbm
  class(gbm_fit) <- c("mdl_gbm", class(gbm_fit)) # append class
  return(gbm_fit) # return fitted gbm object
}#MDL_GBM
```

Note that by default, predictions from `gbm::predict.gbm()` are not
returned on the scale of the outcome variable (see
`?gbm::predict.gbm()`). It is therefore important to also construct a
simple prediction method for the `mdl_gbm` object:

```
# Write prediction method for the gbm wrapper
predict.mdl_gbm <- function(object, newdata, ...) {
  class(object) <- class(object)[-1] # remove mdl_gbm from the class list
  predict(object, newdata, type = "response", n.trees = object$n.trees)
}#PREDICT.MDL_GBM
```

As a simple test, we check the classes of the objects returned by the
wrapper and by the prediction method we constructed:

```
# Estimate gbm
gbm_fit <- mdl_gbm(y, X,
                   distribution = "bernoulli",
                   n.trees = 1000,
                   interaction.depth = 6,
                   verbose = FALSE)
# Compute predictions
fitted_values <- predict(gbm_fit, X)
# Check class
class(gbm_fit)
#> [1] "mdl_gbm" "gbm"
class(fitted_values)
#> [1] "numeric"
```

The classes are as expected. The wrapper should now be compatible with
the `ddml` machinery. As a final test, we estimate the local average
treatment effect. (Computation can take a while; feel free to set
`n.trees` or `interaction.depth` to a smaller value for this test.)

```
# Use the new gbm base learner
learner_gbm <- list(what = mdl_gbm,
                    args = list(distribution = "bernoulli",
                                n.trees = 1000,
                                interaction.depth = 6,
                                verbose = FALSE))
# Estimate ddml_late
late_fit <- ddml_late(y, D, Z, X,
                      learners = learner_gbm,
                      sample_folds = 10,
                      silent = TRUE)
summary(late_fit)
#> LATE estimation results:
#>
#>  Estimate Std. Error t value Pr(>|t|)
#>    -0.236       0.19   -1.24    0.214
```

All works well!
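
Because the wrapper shares the interface of the built-in base learners,
it can also be combined with them in a stacking ensemble. The sketch
below assumes the list-of-lists `learners` syntax and the
`ensemble_type` options documented in `?ddml_late`, and uses `ols()`,
the linear base learner shipped with `ddml`:

```
# Combine the gbm wrapper with a linear base learner via nnls stacking
# (see ?ddml_late for the full set of ensemble_type options)
learners_ensemble <- list(learner_gbm,
                          list(what = ols))
late_fit_ens <- ddml_late(y, D, Z, X,
                          learners = learners_ensemble,
                          ensemble_type = "nnls",
                          sample_folds = 10,
                          silent = TRUE)
summary(late_fit_ens)
```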

## A Wrapper for Neural Networks based on `keras`

We now consider an example using the neural network implementation of
the `keras`

package. See this guide to keras
basics for an excellent introduction to neural networks in R.
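
The wrapper below requires a working `keras`/TensorFlow installation.
If `keras` has not been set up on your machine yet, it can typically be
installed as follows (a one-time setup step):

```
# One-time setup: install the keras R package and its Python backend
install.packages("keras")
keras::install_keras()
```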

The requirements for the wrapper to be compatible with `ddml` are the
same as before. However, because estimating a neural network is more
involved, we need to write a more extensive wrapper. In particular, our
wrapper for the `keras` package has four components:

- Building the neural network architecture
- Compiling the neural network object
- Estimating the neural network
- Appending the class of the returned object (to write a custom
  `predict` method)

The wrapper below allows for custom specification of a feed-forward
neural network architecture using the ReLU activation function. Of
course, much more involved architectures are also supported (feel free
to experiment!).

```
# Load keras
library(keras)

# Write keras wrapper
mdl_keras <- function(y, X,
                      units = 10, nhidden = 1,
                      optimizer_fun = "rmsprop",
                      loss = "mse",
                      epochs = 10,
                      batch_size = min(1000, length(y)),
                      validation_split = 0,
                      callbacks = NULL,
                      steps_per_epoch = NULL,
                      metrics = c("mae"),
                      verbose = 0) {
  # Data parameters
  nobs <- length(y)
  # Normalize data
  std_y <- c(mean(y), sd(y))
  std_X <- apply(X, 2, function(x) c(mean(x), max(sd(x), 1e-3)))
  y <- (y - std_y[1]) / std_y[2]
  X <- X - matrix(replicate(nobs, std_X[1, , drop = F]),
                  nrow = nobs, byrow = T)
  X <- X / matrix(replicate(nobs, std_X[2, , drop = F]),
                  nrow = nobs, byrow = T)
  # ===============================================
  # ADJUST THIS PART FOR DIFFERENT ARCHITECTURES ==
  # Construct neural network architecture
  nnet <- keras_model_sequential()
  for (k in 1:nhidden) {
    nnet <- nnet %>%
      layer_dense(units = units, use_bias = T,
                  activation = "relu")
  }#FOR
  nnet <- nnet %>%
    layer_dense(units = 1, use_bias = T)
  # ===============================================
  # Compile neural net
  nnet %>% keras::compile(optimizer = optimizer_fun,
                          loss = loss,
                          metrics = metrics)
  # Fit neural net
  nnet %>% keras::fit(X, y,
                      epochs = epochs,
                      batch_size = batch_size,
                      validation_split = validation_split,
                      callbacks = callbacks,
                      steps_per_epoch = steps_per_epoch,
                      verbose = verbose)
  # Add standardization to object
  nnet$std_y <- std_y
  nnet$std_X <- std_X
  # Append class
  class(nnet) <- c("mdl_keras", class(nnet))
  # Return fitted object
  return(nnet)
}#MDL_KERAS
```

The prediction method for our fitted object reverses the
standardization of the outcome and ensures that the predictions are
returned as a numerical vector:

```
predict.mdl_keras <- function(object, newdata) {
  # Standardize newdata
  nobs <- nrow(newdata)
  newdata <- newdata - matrix(replicate(nobs, object$std_X[1, , drop = F]),
                              nrow = nobs, byrow = T)
  newdata <- newdata / matrix(replicate(nobs, object$std_X[2, , drop = F]),
                              nrow = nobs, byrow = T)
  # Predict and coerce the output to a numeric vector
  class(object) <- class(object)[-1] # Not a pretty solution...
  fitted <- as.numeric(predict(object, newdata))
  # Re-scale to the outcome scale and return
  object$std_y[2] * fitted + object$std_y[1]
}#PREDICT.MDL_KERAS
```
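
As with the gbm wrapper, a quick sanity check could verify that the
wrapper and its prediction method return objects of the expected
classes (this requires the working keras installation mentioned above
and may take a moment to run):

```
# Estimate a small neural network and check the classes of the returned objects
nnet_fit <- mdl_keras(y, X, units = 10, nhidden = 1, epochs = 10, verbose = 0)
class(nnet_fit)[1] # should be "mdl_keras"
class(predict(nnet_fit, X)) # should be "numeric"
```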

To test the wrapper, we again estimate the local average treatment
effect. In addition to the architecture, it is important to properly
tune the optimization algorithm. `callbacks_list` helps with specifying
learning rate adjustments and defines the early stopping rule.

```
# Specify callbacks
callbacks_list <- list(callback_early_stopping(monitor = "val_loss",
                                               patience = 15,
                                               restore_best_weights = T),
                       callback_reduce_lr_on_plateau(monitor = "val_loss",
                                                     factor = 1/10,
                                                     patience = 10,
                                                     verbose = F))
# Use the neural network base learner
learner_keras <- list(what = mdl_keras,
                      args = list(units = 10, nhidden = 1,
                                  epochs = 100,
                                  verbose = F,
                                  validation_split = 0.1,
                                  callbacks = callbacks_list))
# Estimate ddml_late
late_fit <- ddml_late(y, D, Z, X,
                      learners = learner_keras,
                      sample_folds = 10,
                      silent = T)
#> Warning in trim_propensity_scores(r_X, trim, ensemble_type): : 2 propensity
#> scores were trimmed.
summary(late_fit)
#> LATE estimation results:
#>
#>  Estimate Std. Error t value Pr(>|t|)
#>    -0.187      0.189  -0.989    0.322
```

All works well!