# Constructing a User-Provided Base Learner

Source: `vignettes/articles/new_ml_wrapper.Rmd`

## Introduction

This article illustrates how users can write their own base learners.
The necessary requirements for a base learner to be compatible with
`ddml` are minimal:

- An estimation function that takes an outcome vector `y` and a feature
  matrix `X` and returns a fitted model object as an S3 class
- A `predict` method for the above S3 class that takes a feature matrix
  `newdata` and returns a numerical vector of predictions
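
The following minimal sketch illustrates this structure. It is not
runnable as-is: `mdl_template` and `my_fitting_function` are
hypothetical placeholders for an arbitrary estimation routine.

```
# Minimal sketch of a ddml-compatible base learner (illustration only;
# my_fitting_function is a hypothetical placeholder for any estimation routine)
mdl_template <- function(y, X, ...) {
  fit <- my_fitting_function(y = y, x = X, ...) # fit the model
  class(fit) <- c("mdl_template", class(fit)) # append a custom S3 class
  return(fit) # return the fitted model object
}#MDL_TEMPLATE

# Matching predict method: must return a numerical vector of predictions
predict.mdl_template <- function(object, newdata, ...) {
  class(object) <- class(object)[-1] # dispatch to the underlying predict method
  as.numeric(predict(object, newdata, ...))
}#PREDICT.MDL_TEMPLATE
```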

We illustrate these requirements with two concrete examples:

- A simple wrapper for `gbm::gbm.fit()` for generalized boosted
  regression
- A more extensive wrapper for neural networks based on `keras`

To test our wrappers, we apply them to the included random subsample of
5,000 observations from the data of Angrist & Evans (1998). See `?AE98`
for details.
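
Throughout, `y`, `D`, `Z`, and `X` denote the outcome, treatment,
instrument, and controls constructed from `AE98`. A construction along
the following lines is assumed in the examples below; the particular
variable names follow the examples in the package documentation (see
`?AE98` for the available variables):

```
# Load ddml and the included Angrist & Evans (1998) subsample
library(ddml)
data(AE98)

# Construct outcome, treatment, instrument, and controls
y <- AE98[, "worked"]
D <- AE98[, "morekids"]
Z <- AE98[, "samesex"]
X <- AE98[, c("age", "agefst", "black", "hisp", "othrace", "educ")]
```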

## A Wrapper for `gbm::gbm.fit()`

We begin by writing an estimation function that takes named arguments
`y` and `X` and returns a fitted model object. To allow additional
arguments to be passed from the wrapper to `gbm::gbm.fit()`, we also
make use of the `...` argument.

```
# Load gbm
library(gbm)
# Write gbm wrapper
mdl_gbm <- function(y, X, ...) {
  gbm_fit <- gbm.fit(x = X, y = y, ...) # fit gbm
  class(gbm_fit) <- c("mdl_gbm", class(gbm_fit)) # append class
  return(gbm_fit) # return fitted gbm object
}#MDL_GBM
```

Note that by default, predictions from `gbm::predict.gbm()` are not
returned on the scale of the outcome variable (see
`?gbm::predict.gbm()`). It is therefore important to also construct a
simple prediction method for the `mdl_gbm` object:

```
# Write prediction method for the gbm wrapper
predict.mdl_gbm <- function(object, newdata, ...) {
  class(object) <- class(object)[-1] # remove mdl_gbm from the class list
  predict(object, newdata, type = "response", n.trees = object$n.trees)
}#PREDICT.MDL_GBM
```

As a simple test, we check the classes of the objects returned by the
wrapper and by the prediction method we constructed:

```
# Estimate gbm
gbm_fit <- mdl_gbm(y, X,
                   distribution = "bernoulli",
                   n.trees = 1000,
                   interaction.depth = 6,
                   verbose = FALSE)
# Compute predictions
fitted_values <- predict(gbm_fit, X)
# Check class
class(gbm_fit)
#> [1] "mdl_gbm" "gbm"
class(fitted_values)
#> [1] "numeric"
```

The classes are as expected. The wrapper should now be compatible with
the `ddml` machinery. As a final test, we estimate the local average
treatment effect. (Computation can take a while; feel free to set
`n.trees` or `interaction.depth` to a smaller value for this test.)

```
# Use the new gbm base learner
learner_gbm <- list(what = mdl_gbm,
                    args = list(distribution = "bernoulli",
                                n.trees = 1000,
                                interaction.depth = 6,
                                verbose = FALSE))
# Estimate ddml_late
late_fit <- ddml_late(y, D, Z, X,
                      learners = learner_gbm,
                      sample_folds = 10,
                      silent = TRUE)
summary(late_fit)
#> LATE estimation results:
#>
#>  Estimate Std. Error t value Pr(>|t|)
#>    -0.236       0.19   -1.24    0.214
```

All works well!
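
Because the wrapper shares the interface of the built-in base learners,
it can also be combined with them in a stacking ensemble. The sketch
below assumes the list-of-lists `learners` syntax and the
`ensemble_type` options documented in `?ddml_late`, and uses `ols()`,
the linear base learner shipped with `ddml`:

```
# Combine the gbm wrapper with a linear base learner via nnls stacking
# (see ?ddml_late for the full set of ensemble_type options)
learners_ensemble <- list(learner_gbm,
                          list(what = ols))
late_fit_ens <- ddml_late(y, D, Z, X,
                          learners = learners_ensemble,
                          ensemble_type = "nnls",
                          sample_folds = 10,
                          silent = TRUE)
summary(late_fit_ens)
```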

## A Wrapper for Neural Networks based on `keras`

We now consider an example using the neural network implementation of
the `keras`

package. See this guide to keras
basics for an excellent introduction to neural networks in R.
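
The wrapper below requires a working `keras`/TensorFlow installation.
If `keras` has not been set up on your machine yet, it can typically be
installed as follows (a one-time setup step):

```
# One-time setup: install the keras R package and its Python backend
install.packages("keras")
keras::install_keras()
```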

The requirements for the wrapper to be compatible with `ddml` are the
same as before. However, because estimating a neural network is more
involved, we need to write a more extensive wrapper. In particular, our
wrapper for the `keras` package has four components:

- Building the neural network architecture
- Compiling the neural network object
- Estimating the neural network
- Appending the class of the returned object (to write a custom
  `predict` method)

The wrapper below allows for custom specification of a feed-forward
neural network architecture using the ReLU activation function. Of
course, much more involved architectures are also supported (feel free
to experiment!).

```
# Load keras
library(keras)

# Write keras wrapper
mdl_keras <- function(y, X,
                      units = 10, nhidden = 1,
                      optimizer_fun = "rmsprop",
                      loss = "mse",
                      epochs = 10,
                      batch_size = min(1000, length(y)),
                      validation_split = 0,
                      callbacks = NULL,
                      steps_per_epoch = NULL,
                      metrics = c("mae"),
                      verbose = 0) {
  # Data parameters
  nobs <- length(y)
  # Normalize data
  std_y <- c(mean(y), sd(y))
  std_X <- apply(X, 2, function(x) c(mean(x), max(sd(x), 1e-3)))
  y <- (y - std_y[1]) / std_y[2]
  X <- X - matrix(replicate(nobs, std_X[1, , drop = F]),
                  nrow = nobs, byrow = T)
  X <- X / matrix(replicate(nobs, std_X[2, , drop = F]),
                  nrow = nobs, byrow = T)
  # ===============================================
  # ADJUST THIS PART FOR DIFFERENT ARCHITECTURES ==
  # Construct neural network architecture
  nnet <- keras_model_sequential()
  for (k in 1:nhidden) {
    nnet <- nnet %>%
      layer_dense(units = units, use_bias = T,
                  activation = "relu")
  }#FOR
  nnet <- nnet %>%
    layer_dense(units = 1, use_bias = T)
  # ===============================================
  # Compile neural net
  nnet %>% keras::compile(optimizer = optimizer_fun,
                          loss = loss,
                          metrics = metrics)
  # Fit neural net
  nnet %>% keras::fit(X, y,
                      epochs = epochs,
                      batch_size = batch_size,
                      validation_split = validation_split,
                      callbacks = callbacks,
                      steps_per_epoch = steps_per_epoch,
                      verbose = verbose)
  # Add standardization to object
  nnet$std_y <- std_y
  nnet$std_X <- std_X
  # Append class
  class(nnet) <- c("mdl_keras", class(nnet))
  # Return fitted object
  return(nnet)
}#MDL_KERAS
```

The prediction method for our fitted object reverses the
standardization of the outcome and ensures that the predictions are
returned as a numerical vector:

```
predict.mdl_keras <- function(object, newdata) {
  # Standardize newdata
  nobs <- nrow(newdata)
  newdata <- newdata - matrix(replicate(nobs, object$std_X[1, , drop = F]),
                              nrow = nobs, byrow = T)
  newdata <- newdata / matrix(replicate(nobs, object$std_X[2, , drop = F]),
                              nrow = nobs, byrow = T)
  # Predict and coerce the output to a numeric vector
  class(object) <- class(object)[-1] # Not a pretty solution...
  fitted <- as.numeric(predict(object, newdata))
  # Re-scale to the outcome scale and return
  object$std_y[2] * fitted + object$std_y[1]
}#PREDICT.MDL_KERAS
```
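
As with the gbm wrapper, a quick sanity check could verify that the
wrapper and its prediction method return objects of the expected
classes (this requires the working keras installation mentioned above
and may take a moment to run):

```
# Estimate a small neural network and check the classes of the returned objects
nnet_fit <- mdl_keras(y, X, units = 10, nhidden = 1, epochs = 10, verbose = 0)
class(nnet_fit)[1] # should be "mdl_keras"
class(predict(nnet_fit, X)) # should be "numeric"
```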

To test the wrapper, we again estimate the local average treatment
effect. In addition to the architecture, it is important to properly
tune the optimization algorithm. `callbacks_list` helps with specifying
learning rate adjustments and defines the early stopping rule.

```
# Specify callbacks
callbacks_list <- list(callback_early_stopping(monitor = "val_loss",
                                               patience = 15,
                                               restore_best_weights = T),
                       callback_reduce_lr_on_plateau(monitor = "val_loss",
                                                     factor = 1/10,
                                                     patience = 10,
                                                     verbose = F))
# Use the neural network base learner
learner_keras <- list(what = mdl_keras,
                      args = list(units = 10, nhidden = 1,
                                  epochs = 100,
                                  verbose = F,
                                  validation_split = 0.1,
                                  callbacks = callbacks_list))
# Estimate ddml_late
late_fit <- ddml_late(y, D, Z, X,
                      learners = learner_keras,
                      sample_folds = 10,
                      silent = T)
#> Warning in trim_propensity_scores(r_X, trim, ensemble_type): : 2 propensity
#> scores were trimmed.
summary(late_fit)
#> LATE estimation results:
#>
#>  Estimate Std. Error t value Pr(>|t|)
#>    -0.187      0.189  -0.989    0.322
```

All works well!