Bayesian Multinomial Logit Model with HART Prior

rhierMnlRwMixture implements an MCMC algorithm for a Bayesian multinomial logit model with a Hierarchical Additive Regression Trees (HART) prior. HART is a hierarchical nonparametric prior that allows for flexible modeling of the representative consumer as a function of potentially many observed characteristics.

Usage

rhierMnlRwMixture(Data, Prior, Mcmc, r_verbose = TRUE)

Arguments

Data

A list containing:

p: Number of choice alternatives (integer).
lgtdata: A list of length nlgt. Each element lgtdata[[i]] must be a list with:
- y: n_i x 1 vector of multinomial outcomes (1 to p).
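- X: (n_i * p) x nvar matrix of alternative-specific attributes, stacked so that each block of p consecutive rows describes the alternatives of one choice occasion.
Z (optional): nlgt x nz matrix of observed characteristics for each unit. Z should not contain an intercept column and should be mean-centered.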
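A minimal sketch of building a Data list with the layout described above, using only base R. All sizes are hypothetical, the outcomes are random draws rather than draws from the MNL model, and the rows of X are assumed to be ordered by choice occasion, with p rows per occasion.

```r
# Hypothetical simulated Data list (base R only). Outcomes are random draws,
# not generated from the MNL model; they only illustrate the required shapes.
set.seed(1)
p <- 3       # number of choice alternatives
nvar <- 2    # attributes per alternative
nlgt <- 50   # number of units
n_i <- 10    # choice occasions per unit (equal across units here for simplicity)

lgtdata <- vector("list", nlgt)
for (i in seq_len(nlgt)) {
  # Each block of p consecutive rows describes the alternatives of one occasion
  X <- matrix(rnorm(n_i * p * nvar), nrow = n_i * p, ncol = nvar)
  # Chosen alternative (1..p) for each occasion
  y <- sample.int(p, n_i, replace = TRUE)
  lgtdata[[i]] <- list(y = y, X = X)
}

# Observed characteristics: no intercept column, columns mean-centered
Z <- matrix(rnorm(nlgt * 2), nrow = nlgt, ncol = 2)
Z <- sweep(Z, 2, colMeans(Z))

Data <- list(p = p, lgtdata = lgtdata, Z = Z)
```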

Prior

A list containing prior parameters:

ncomp: Number of mixture components (required).
a (optional): ncomp x 1 vector of Dirichlet prior parameters for mixture weights pvec (default: rep(5, ncomp)).
deltabar (optional): (nz * nvar) x 1 prior mean for vec(Delta) (default: 0). Ignored if HART is used.
Ad (optional): Prior precision matrix for vec(Delta) (default: 0.01 * I). Ignored if HART is used.
mubar (optional): nvar x 1 prior mean vector for mixture component means (default: 0 if unrestricted, 2 if restricted).
Amu (optional): Prior precision for mixture component means (default: 0.01 if unrestricted, 0.1 if restricted).
nu (optional): Degrees of freedom for IW prior on component Sigma (default: nvar+3 if unrestricted, nvar+15 if restricted).
V (optional): Location matrix for IW prior on component Sigma (default: nu * I without sign restrictions; rescaled when sign restrictions are present).
SignRes (optional): nvar x 1 vector of sign restrictions. Must contain values of 0, -1, or 1. The value 0 means no restriction, -1 ensures the coefficient is negative, and 1 ensures the coefficient is positive. For example, SignRes = c(0,1,-1) means the first coefficient is unconstrained, the second will be positive, and the third will be negative. Default: rep(0, nvar).
bart (optional): List of parameters for the HART prior. If specified, this models the representative consumer $\Delta(Z)$ as a scaled sum-of-trees factor model. See Details.
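The Prior argument is a plain R list. The sketch below shows three hypothetical configurations (linear hierarchical, sign-restricted, and HART, with the HART values set to the defaults listed under Details) plus a hypothetical helper, check_signres, that is not part of the package and only mirrors the SignRes rules stated above.

```r
nvar <- 3

# Linear hierarchical model (if Z is supplied): bart left NULL, other defaults used
Prior_linear <- list(ncomp = 1)

# Sign-restricted model: coefficient 2 positive, coefficient 3 negative
Prior_signed <- list(ncomp = 1, SignRes = c(0, 1, -1))

# HART prior; the values shown equal the documented defaults
Prior_hart <- list(
  ncomp = 1,
  bart = list(num_trees = 200, power = 2, base = 0.95, numcut = 100, sparse = FALSE)
)

# Hypothetical helper (not part of the package): SignRes must have length nvar
# and contain only -1, 0, or 1
check_signres <- function(SignRes, nvar) {
  length(SignRes) == nvar && all(SignRes %in% c(-1, 0, 1))
}
```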

Mcmc

A list containing MCMC parameters:

R: Number of MCMC iterations (required).
keep (optional): Thinning parameter (default: 1).
nprint (optional): Print progress every nprint draws (default: 100, 0 for none).
s (optional): RW Metropolis scaling parameter (default: 2.93 / sqrt(nvar)).
w (optional): Fractional likelihood weighting parameter (default: 0.1).
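A short sketch of an Mcmc list and two quantities derived from it: the default random-walk scale and the number of stored draws, which sets the R/keep dimension of the outputs. It assumes R is a multiple of keep; the specific values of R, keep, and nprint are hypothetical.

```r
nvar <- 3
Mcmc <- list(R = 20000, keep = 10, nprint = 1000)

# Default RW Metropolis scale from the argument list above
s_default <- 2.93 / sqrt(nvar)

# Number of stored draws (leading dimension of most outputs)
n_kept <- Mcmc$R %/% Mcmc$keep
```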

r_verbose

Logical. Print startup messages? Default TRUE.

Value

A list containing:

Deltadraw: Returned only if Z is provided and Prior$bart is NULL; (R/keep) x (nz * nvar) matrix of vec(Delta) draws.
betadraw: nlgt x nvar x (R/keep) array of unit-level beta_i draws.
nmix: List containing mixture draws with components:
- probdraw: (R/keep) x ncomp matrix of mixture component probabilities.
- zdraw: (R/keep) x nlgt matrix of component assignments for each unit.
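- compdraw: List of length R/keep; compdraw[[r]] is a list of ncomp elements, and compdraw[[r]][[j]] = list(mu, rooti) holds the draw of $\mu_j$ and rooti for component j at kept draw r. Here rooti is the inverse of the upper-triangular Cholesky root of $\Sigma_j$, so that $\Sigma_j^{-1}$ = rooti %*% t(rooti).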
loglike: (R/keep) x 1 vector of log-likelihood values at kept draws.
SignRes: nvar x 1 vector of sign restrictions used.
bart_trees: If HART is used, a list containing the tree structures and related parameters.
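A sketch of two common post-processing steps, run here on placeholder objects rather than real output: recovering $\Sigma_j$ from rooti, assuming rooti is the inverse of the upper-triangular Cholesky root as described for compdraw, and averaging betadraw over kept draws. With real output one would take rooti from out$nmix$compdraw[[r]][[1]]$rooti, where out is a hypothetical name for the returned list.

```r
set.seed(2)
nvar <- 3
# Placeholder covariance and its rooti (inverse of the upper Cholesky root)
Sigma <- crossprod(matrix(rnorm(nvar * nvar), nvar)) + diag(nvar)
rooti <- backsolve(chol(Sigma), diag(nvar))

# Recover Sigma: invert rooti (upper triangular) to get U, then Sigma = t(U) %*% U
U <- backsolve(rooti, diag(nvar))
Sigma_back <- crossprod(U)

# Posterior means of unit-level coefficients from an nlgt x nvar x (R/keep) array
betadraw <- array(rnorm(5 * nvar * 100), dim = c(5, nvar, 100))  # placeholder draws
beta_mean <- apply(betadraw, c(1, 2), mean)                       # nlgt x nvar
```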

Details

Model Specification

$y_i \sim MNL(X_i, \beta_i)$ for unit $i = 1, ..., nlgt$. The unit-level coefficients (part-worths) $\beta_i$ are modeled as: $$\beta_i = \Delta(Z_i) + u_i$$ where $\Delta(Z_i)$ is the representative consumer component, which depends on observed characteristics $Z_i$, and $u_i$ is the unobserved heterogeneity component.
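For reference, a sketch of the standard MNL choice probabilities and log-likelihood for a single unit, assuming X stacks the p alternatives of each choice occasion in consecutive rows. The function names are hypothetical; this illustrates the model, not the package's internal code.

```r
# Choice probabilities: one row per occasion, one column per alternative
mnl_probs <- function(X, beta, p) {
  v <- matrix(X %*% beta, ncol = p, byrow = TRUE)  # n_i x p utilities
  v <- v - apply(v, 1, max)                        # subtract row max for numerical stability
  ev <- exp(v)
  ev / rowSums(ev)
}

# Log-likelihood of observed choices y (values 1..p)
mnl_loglike <- function(y, X, beta, p) {
  pr <- mnl_probs(X, beta, p)
  sum(log(pr[cbind(seq_along(y), y)]))
}
```

With beta equal to zero every alternative has probability 1/p, which is a quick sanity check.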

The representative consumer component is specified as:

If Z is provided and Prior$bart is NULL: $\Delta(Z_i) = Z_i \Delta$ where $\Delta$ is an nz x nvar matrix (linear hierarchical model).
If Z is provided and Prior$bart is a list: $\Delta(Z_i)$ is modeled with a HART prior (scaled sum-of-trees factor model).
If Z is NULL: $\Delta(Z_i) = 0$.

With ncomp = 1 (currently required), the unobserved heterogeneity component follows: $$u_i \sim N(\mu_1, \Sigma_1)$$
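A simulation sketch of the linear hierarchical case with ncomp = 1, $\beta_i = Z_i \Delta + u_i$ with $u_i \sim N(\mu_1, \Sigma_1)$. Delta, mu1, and Sigma1 are hypothetical values. Because Z is centered, the term $Z_i \Delta$ averages to zero across units, so the sample mean of the $\beta_i$ equals the sample mean of the $u_i$.

```r
set.seed(3)
nlgt <- 100; nz <- 2; nvar <- 3
Z <- matrix(rnorm(nlgt * nz), nlgt, nz)
Z <- sweep(Z, 2, colMeans(Z))                          # centered, no intercept
Delta <- matrix(c(1, -1, 0.5, 0, 0.2, 0.3), nz, nvar)  # hypothetical nz x nvar
mu1 <- c(-1, 0.5, 2)                                   # hypothetical component mean
Sigma1 <- diag(c(0.5, 1, 0.25))                        # hypothetical component covariance

# u_i ~ N(mu1, Sigma1) via the lower Cholesky factor
L <- t(chol(Sigma1))
U <- t(mu1 + L %*% matrix(rnorm(nvar * nlgt), nvar, nlgt))  # nlgt x nvar
beta <- Z %*% Delta + U                                     # row i is beta_i
```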

Prior Specifications

Mixture weights: $pvec \sim Dirichlet(a)$
Linear model: $\delta = vec(\Delta) \sim N(deltabar, A_d^{-1})$
Mixture component means: $\mu_j \sim N(mubar, \Sigma_j \otimes Amu^{-1})$ (covariance scaled by $\Sigma_j$)
Mixture component covariance: $\Sigma_j \sim IW(\nu, V)$
HART model: A sum-of-trees prior is placed on each factor of the scaled sum-of-trees model (see HART details below).
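A sketch of one draw from the mixture priors listed above at the unrestricted defaults. It assumes the usual convention that $\Sigma \sim IW(\nu, V)$ means $\Sigma^{-1} \sim W(\nu, V^{-1})$ and uses base R's stats::rWishart. With ncomp = 1 the Dirichlet draw is trivially pvec = 1.

```r
set.seed(4)
nvar <- 3; ncomp <- 1
a <- rep(5, ncomp)
mubar <- rep(0, nvar); Amu <- 0.01
nu <- nvar + 3; V <- nu * diag(nvar)

# pvec ~ Dirichlet(a): normalize independent Gamma(a_j, 1) draws
g <- rgamma(ncomp, shape = a)
pvec <- g / sum(g)

# Sigma ~ IW(nu, V): draw W ~ Wishart(nu, V^{-1}) and invert
W <- rWishart(1, df = nu, Sigma = solve(V))[, , 1]
Sigma <- solve(W)
Sigma <- (Sigma + t(Sigma)) / 2   # remove rounding asymmetry

# mu | Sigma ~ N(mubar, Sigma / Amu)
mu <- mubar + t(chol(Sigma / Amu)) %*% rnorm(nvar)
```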

HART Prior Details

If Prior$bart is a list, it specifies a HART prior for the representative consumer $\Delta(Z)$. This replaces the conventional linear hierarchical specification. The HART prior models the representative consumer using a scaled vector of nvar sum-of-trees models.

HART Parameters (defaults used if not specified in Prior$bart):

num_trees: Number of trees H in each sum-of-trees model (default: 200).
power, base: Parameters for the tree structure prior. The probability of a node at depth q splitting is $\alpha(1+q)^{-\beta}$, where base=$\alpha$ and power=$\beta$. Defaults are base=0.95, power=2, which strongly favors shallow trees.
tau: Prior scale of the terminal leaf coefficients, $\lambda_{dhg} \sim N(0, \tau^2)$ (default: $\tau = 1/\sqrt{H}$).
numcut: Number of grid points for proposing splitting rules for continuous variables (default: 100).
sparse: If TRUE, use the Dirichlet HART prior to induce sparsity in variable selection (default: FALSE).
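A sketch of two of the defaults above: the depth-dependent split probability $\alpha(1+q)^{-\beta}$ and the default leaf scale $\tau = 1/\sqrt{H}$. Since a given Z value falls into exactly one leaf of each of the H trees, its sum-of-trees value adds H independent $N(0, \tau^2)$ leaves and so has prior variance $H \tau^2 = 1$. The function name split_prob is hypothetical.

```r
set.seed(6)
# Probability that a node at depth q splits
split_prob <- function(q, base = 0.95, power = 2) base * (1 + q)^(-power)
probs <- split_prob(0:3)  # 0.95, 0.2375, 0.1056 (rounded), 0.0594 (rounded)

# Default leaf scale for H trees, and one leaf value drawn from each tree
H <- 200
tau <- 1 / sqrt(H)
lambda <- rnorm(H, mean = 0, sd = tau)
# The sum of H independent N(0, tau^2) leaves has variance H * tau^2 = 1
```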

Dirichlet HART (sparse = TRUE): The Dirichlet HART model augments the HART prior to induce sparsity in variable selection, following Linero (2018). Instead of choosing splitting variables with equal probability, the selection probabilities $s = (s^{(1)}, \ldots, s^{(K)})$ are given a sparse Dirichlet prior, $(s^{(1)}, \ldots, s^{(K)}) \sim Dirichlet(\theta/K, \ldots, \theta/K)$, where K is the number of characteristics. The concentration parameter $\theta$ is given the hierarchical prior $\theta/(\theta+\rho) \sim Beta(a,b)$.

a, b: Shape parameters of the Beta hyperprior. The defaults (a=0.5, b=1) favor sparse configurations in which only a few characteristics have high selection probabilities.
rho: Parameter influencing sparsity. Default is the number of characteristics K. Reducing rho below K encourages greater sparsity.
theta: If set to a positive value, fixes the Dirichlet concentration parameter $\theta$ and bypasses the Beta hyperprior; the default 0.0 leaves $\theta$ random under the hyperprior.
burn: Number of internal burn-in iterations for the Dirichlet HART sampler before variable selection is allowed (default: 100).
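A sketch of one draw from the Dirichlet HART prior on the splitting-variable selection probabilities (called s in the code), using the default hyperparameters above. It inverts $u = \theta/(\theta+\rho)$ to get $\theta = \rho u/(1-u)$ and forms the Dirichlet draw from Gamma variables on the log scale, using the identity that $G' U^{1/a}$ with $G' \sim Gamma(a+1)$ and $U \sim Uniform(0,1)$ is $Gamma(a)$, because very small shapes can underflow ordinary Gamma draws to zero. K = 20 is hypothetical.

```r
set.seed(5)
K <- 20             # number of characteristics (hypothetical)
a <- 0.5; b <- 1    # Beta hyperprior defaults
rho <- K            # default rho

# theta / (theta + rho) ~ Beta(a, b)  =>  theta = rho * u / (1 - u)
u <- rbeta(1, a, b)
theta <- rho * u / (1 - u)

# s ~ Dirichlet(theta/K, ..., theta/K), computed via log-Gamma draws
shape <- theta / K
log_g <- log(rgamma(K, shape = shape + 1)) + log(runif(K)) / shape
s <- exp(log_g - max(log_g))
s <- s / sum(s)
```

Small values of theta concentrate s on a few characteristics, which is the sparsity mechanism described above.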

Sign Restrictions

If SignRes[k] is non-zero, the k-th coefficient $\beta_{ik}$ is modeled as $$\beta_{ik} = SignRes[k] \cdot \exp(\beta^*_{ik}).$$ The betadraw output contains the draws for $\beta_{ik}$ (with the restriction applied). The nmix output contains draws for the mixture components of the unrestricted $\beta^*_{ik}$.

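Note: For sign-restricted coefficients the normal mixture prior applies to $\beta^*_{ik} = \log(|\beta_{ik}|)$, not to $\beta_{ik}$ itself, so mubar, Amu, nu, and V should be chosen with this log scale in mind.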
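A minimal sketch of the sign-restriction transform $\beta_{ik} = SignRes[k] \cdot \exp(\beta^*_{ik})$, applied to one unit's coefficient vector. The function name apply_signres is hypothetical and not part of the package.

```r
# Unrestricted coefficients pass through; restricted ones become +/- exp(beta*)
apply_signres <- function(beta_star, SignRes) {
  ifelse(SignRes == 0, beta_star, SignRes * exp(beta_star))
}

apply_signres(c(0.5, 0, log(2)), c(0, 1, -1))
# c(0.5, 1, -2)
```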

Note

Mixtures with more than one component are not currently implemented. Set ncomp = 1 in the Prior specification.

References

Chipman, Hugh A., Edward I. George, and Robert E. McCulloch (2010). "BART: Bayesian Additive Regression Trees." Annals of Applied Statistics 4.1, pp. 266-298.

Linero, Antonio R. (2018). "Bayesian regression trees for high-dimensional prediction and variable selection." Journal of the American Statistical Association 113.522, pp. 626-636.

Rossi, Peter E., Greg M. Allenby, and Robert McCulloch (2009). Bayesian Statistics and Marketing. Reprint. Wiley Series in Probability and Statistics. Chichester: Wiley.

Rossi, Peter (2023). bayesm: Bayesian Inference for Marketing/Micro-Econometrics. Comprehensive R Archive Network.

Wiemann, Thomas (2025). "Personalization with HART." Working paper.

Author

Peter Rossi (original bayesm code), Thomas Wiemann (HART modifications).