rhierMnlRwMixture implements an MCMC algorithm for a Bayesian multinomial logit model with a Hierarchical Additive Regression Trees (HART) prior. HART is a hierarchical nonparametric prior that allows for flexible modeling of the representative consumer as a function of potentially many observed characteristics.

Usage

rhierMnlRwMixture(Data, Prior, Mcmc, r_verbose = TRUE)

Arguments

Data

A list containing:

  • p: Number of choice alternatives (integer).

  • lgtdata: A list of length nlgt. Each element lgtdata[[i]] must be a list with:

    • y: n_i x 1 vector of multinomial outcomes (1 to p).

    • X: (n_i * p) x nvar matrix of alternative-specific attributes.

  • Z (optional): nlgt x nz matrix of observed characteristics for each unit. Should NOT contain an intercept and should be centered.
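As an illustration, a valid Data list can be assembled from simulated inputs. This is a hypothetical sketch; the dimensions (nlgt, p, nvar, nz) and the random placeholder outcomes are chosen arbitrarily:

```r
set.seed(1)
nlgt <- 50; p <- 3; nvar <- 4; nz <- 2

# One list element per unit: y (choices in 1..p), X ((n_i * p) x nvar)
lgtdata <- lapply(1:nlgt, function(i) {
  n_i <- 10                               # observations for unit i
  X <- matrix(rnorm(n_i * p * nvar), nrow = n_i * p, ncol = nvar)
  y <- sample(1:p, n_i, replace = TRUE)   # placeholder outcomes
  list(y = y, X = X)
})

# Z: observed characteristics, centered, with no intercept column
Z <- scale(matrix(rnorm(nlgt * nz), nlgt, nz), center = TRUE, scale = FALSE)

Data <- list(p = p, lgtdata = lgtdata, Z = Z)
```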

Prior

A list containing prior parameters:

  • ncomp: Number of mixture components (required).

  • a (optional): ncomp x 1 vector of Dirichlet prior parameters for mixture weights pvec (default: rep(5, ncomp)).

  • deltabar (optional): nz * nvar x 1 prior mean for vec(Delta) (default: 0). Ignored if HART is used.

  • Ad (optional): Prior precision matrix for vec(Delta) (default: 0.01 * I). Ignored if HART is used.

  • mubar (optional): nvar x 1 prior mean vector for mixture component means (default: 0 if unrestricted, 2 if restricted).

  • Amu (optional): Prior precision for mixture component means (default: 0.01 if unrestricted, 0.1 if restricted).

  • nu (optional): Degrees of freedom for IW prior on component Sigma (default: nvar+3 if unrestricted, nvar+15 if restricted).

  • V (optional): Location matrix for IW prior on component Sigma (default: nu * I or scaled based on restriction).

  • SignRes (optional): nvar x 1 vector of sign restrictions. Must contain values of 0, -1, or 1. The value 0 means no restriction, -1 ensures the coefficient is negative, and 1 ensures the coefficient is positive. For example, SignRes = c(0,1,-1) means the first coefficient is unconstrained, the second will be positive, and the third will be negative. Default: rep(0, nvar).

  • bart (optional): List of parameters for the HART prior. If specified, this models the representative consumer \(\Delta(Z)\) as a scaled sum-of-trees factor model. See Details.

Mcmc

A list containing MCMC parameters:

  • R: Number of MCMC iterations (required).

  • keep (optional): Thinning parameter (default: 1).

  • nprint (optional): Print progress every nprint draws (default: 100, 0 for none).

  • s (optional): RW Metropolis scaling parameter (default: 2.93 / sqrt(nvar)).

  • w (optional): Fractional likelihood weighting parameter (default: 0.1).

r_verbose

Logical. Print startup messages? Default TRUE.
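Putting the pieces together, a call might look like the following sketch. The specific R, keep, and bart values are illustrative; the call itself is shown commented out because it requires a prepared Data list and the installed package:

```r
# Prior: single mixture component (currently required), HART on Delta(Z)
Prior <- list(
  ncomp = 1,
  bart  = list(num_trees = 200, power = 2, base = 0.95, sparse = FALSE)
)

# Mcmc: 10,000 draws, keep every 10th, report progress every 1,000 draws
Mcmc <- list(R = 10000, keep = 10, nprint = 1000)

# With Data = list(p = ..., lgtdata = ..., Z = ...) prepared as described above:
# out <- rhierMnlRwMixture(Data = Data, Prior = Prior, Mcmc = Mcmc)
```

With these settings the output arrays (e.g. betadraw) hold R/keep = 1,000 kept draws.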

Value

A list containing:

  • Deltadraw: If Z provided and bart=NULL, (R/keep) x (nz * nvar) matrix of vec(Delta) draws.

  • betadraw: nlgt x nvar x (R/keep) array of unit-level beta_i draws.

  • nmix: List containing mixture draws with components:

    • probdraw: (R/keep) x ncomp matrix of mixture component probabilities.

    • zdraw: (R/keep) x nlgt matrix of component assignments for each unit.

    • compdraw: (R/keep) list of ncomp lists. compdraw[[r]][[j]] = list(mu, rooti) contains the draw of \(\mu_j\) and \(\Sigma_j^{-1/2}\) for component j at kept draw r.

  • loglike: (R/keep) x 1 vector of log-likelihood values at kept draws.

  • SignRes: nvar x 1 vector of sign restrictions used.

  • bart_trees: If HART used, list containing tree structures and related parameters.

Details

Model Specification

\(y_i \sim MNL(X_i, \beta_i)\) for unit \(i = 1, ..., nlgt\). The unit-level coefficients (part-worths) \(\beta_i\) are modeled as: $$\beta_i = \Delta(Z_i) + u_i$$ where \(\Delta(Z_i)\) is the representative consumer component, which depends on observed characteristics \(Z_i\), and \(u_i\) is the unobserved heterogeneity component.

The representative consumer component is specified as:

  • If Z is provided and Prior$bart is NULL: \(\Delta(Z_i) = Z_i \Delta\) where \(\Delta\) is an nz x nvar matrix (linear hierarchical model).

  • If Z is provided and Prior$bart is a list: \(\Delta(Z_i)\) is modeled with a HART prior (scaled sum-of-trees factor model).

  • If Z is NULL: \(\Delta(Z_i) = 0\).

With ncomp = 1 (currently required), the unobserved heterogeneity component follows: $$u_i \sim N(\mu_1, \Sigma_1)$$
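Under the linear specification with ncomp = 1, the implied data-generating process for the part-worths can be sketched in a few lines. The dimensions and the value of Delta below are illustrative, not defaults:

```r
set.seed(2)
nlgt <- 100; nvar <- 3; nz <- 2
Z <- matrix(rnorm(nlgt * nz), nlgt, nz)
Delta <- matrix(c(1, -1, 0.5, 0, 2, -0.5), nrow = nz)   # nz x nvar
mu1 <- rep(0, nvar); Sigma1 <- diag(nvar)

# beta_i = Delta(Z_i) + u_i with Delta(Z_i) = Z_i %*% Delta, u_i ~ N(mu1, Sigma1)
U <- matrix(rnorm(nlgt * nvar), nlgt, nvar) %*% chol(Sigma1) +
     matrix(mu1, nlgt, nvar, byrow = TRUE)
beta <- Z %*% Delta + U   # nlgt x nvar matrix of unit-level part-worths
```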

Prior Specifications

  • Mixture weights: \(pvec \sim Dirichlet(a)\)

  • Linear model: \(\delta = vec(\Delta) \sim N(deltabar, A_d^{-1})\)

  • Mixture component means: \(\mu_j \sim N(mubar, \Sigma_j \otimes Amu^{-1})\) (covariance scaled by \(\Sigma_j\))

  • Mixture component covariance: \(\Sigma_j \sim IW(\nu, V)\)

  • HART model: An independent sum-of-trees prior is placed on each of the nvar factors of the scaled sum-of-trees factor model (see HART details below).

HART Prior Details

If Prior$bart is a list, it specifies a HART prior for the representative consumer \(\Delta(Z)\). This replaces the conventional linear hierarchical specification. The HART prior models the representative consumer using a scaled vector of nvar sum-of-trees models.

HART Parameters (defaults used if not specified in Prior$bart):

  • num_trees: Number of trees H in each sum-of-trees model (default: 200).

  • power, base: Parameters for the tree structure prior. The probability of a node at depth q splitting is \(\alpha(1+q)^{-\beta}\), where base=\(\alpha\) and power=\(\beta\). Defaults are base=0.95, power=2, which strongly favors shallow trees.

  • tau: Scale of the prior on terminal leaf coefficients, \(\lambda_{dhg} \sim N(0, \tau^2)\). Default: \(\tau = 1/\sqrt{H}\), where H is the number of trees.

  • numcut: Number of grid points for proposing splitting rules for continuous variables (default: 100).

  • sparse: If TRUE, use the Dirichlet HART prior to induce sparsity in variable selection (default: FALSE).
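The defaults base=0.95 and power=2 imply split probabilities that decay quickly with depth, which is what makes shallow trees likely a priori. A quick check of \(\alpha(1+q)^{-\beta}\) at the first few depths:

```r
base <- 0.95; power <- 2

# Prior probability that a node at depth q splits: base * (1 + q)^(-power)
split_prob <- function(q) base * (1 + q)^(-power)

round(split_prob(0:3), 4)
# depth 0: 0.9500, depth 1: 0.2375, depth 2: 0.1056, depth 3: 0.0594
```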

Dirichlet HART (sparse = TRUE): The Dirichlet HART model augments the HART prior to induce sparsity in variable selection, following Linero (2018). Instead of uniform probability for selecting splitting variables, the selection probabilities \(s = (s^{(1)}, \ldots, s^{(K)})\) (not to be confused with the leaf-scale parameter tau above) are given a sparse Dirichlet prior: \((s^{(1)}, \ldots, s^{(K)}) \sim Dirichlet(\theta/K, \ldots, \theta/K)\), where K is the number of characteristics. The concentration parameter \(\theta\) is given a hierarchical prior: \(\theta/(\theta+\rho) \sim Beta(a,b)\).

  • a, b: Shape parameters for the Beta hyperprior. The default (a=0.5, b=1) induces sparsity where few variables have high selection probabilities.

  • rho: Parameter influencing sparsity. Default is the number of characteristics K. Reducing rho below K encourages greater sparsity.

  • theta: If positive, fixes the Dirichlet concentration parameter \(\theta\) directly, bypassing the Beta hyperprior (default: 0, meaning \(\theta\) is sampled under the hyperprior).

  • burn: Number of internal burn-in iterations for the Dirichlet HART sampler before variable selection is allowed (default: 100).
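For intuition, a single draw of the splitting-variable selection probabilities from the sparse Dirichlet prior can be generated via normalized Gamma variates (the standard construction of a Dirichlet draw). The values of K and theta below are illustrative:

```r
set.seed(3)
K <- 20; theta <- 1   # number of characteristics; fixed concentration

# Dirichlet(theta/K, ..., theta/K) draw via normalized Gamma variates
g <- rgamma(K, shape = theta / K)
s <- g / sum(g)

# With theta/K small, most prior mass concentrates on a few coordinates,
# so only a handful of characteristics get appreciable selection probability.
sort(round(s, 3), decreasing = TRUE)[1:5]
```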

Sign Restrictions

If SignRes[k] is non-zero, the k-th coefficient \(\beta_{ik}\) is modeled as $$\beta_{ik} = SignRes[k] \cdot \exp(\beta^*_{ik}).$$ The betadraw output contains the draws for \(\beta_{ik}\) (with the restriction applied). The nmix output contains draws for the unrestricted mixture components.
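The transformation can be applied to a vector of unrestricted draws with a small helper (a hypothetical function for illustration, not part of the package API):

```r
# Hypothetical helper: map unrestricted beta_star to restricted beta
# via beta_k = SignRes[k] * exp(beta_star_k) wherever SignRes[k] != 0
apply_sign_res <- function(beta_star, SignRes) {
  beta <- beta_star
  restricted <- SignRes != 0
  beta[restricted] <- SignRes[restricted] * exp(beta_star[restricted])
  beta
}

SignRes <- c(0, 1, -1)
apply_sign_res(c(0.3, 0, 1), SignRes)
# -> c(0.3, 1, -exp(1)): first unrestricted, second positive, third negative
```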

Note: Care should be taken when selecting priors for sign-restricted coefficients: the normal mixture prior is placed on the unrestricted \(\beta^*_{ik}\), so the implied prior on the restricted \(\beta_{ik}\) is log-normal, and default prior settings may imply unintended moments on the restricted scale.

Note

Currently, mixtures with more than one component are not implemented. Please use ncomp = 1 in the Prior specification.

References

Chipman, Hugh A., Edward I. George, and Robert E. McCulloch (2010). "BART: Bayesian Additive Regression Trees." Annals of Applied Statistics 4.1.

Linero, Antonio R. (2018). "Bayesian regression trees for high-dimensional prediction and variable selection." Journal of the American Statistical Association 113.522, pp. 626-636.

Rossi, Peter E., Greg M. Allenby, and Robert McCulloch (2009). Bayesian Statistics and Marketing. Reprint. Wiley Series in Probability and Statistics. Chichester: Wiley.

Rossi, Peter (2023). bayesm: Bayesian Inference for Marketing/Micro-Econometrics. Comprehensive R Archive Network.

Wiemann, Thomas (2025). "Personalization with HART." Working paper.

Author

Peter Rossi (original bayesm code), Thomas Wiemann (HART modifications).