Bayesian Multinomial Logit Model with HART Prior
rhierMnlRwMixture.Rd
rhierMnlRwMixture
implements an MCMC algorithm for a Bayesian multinomial
logit model with a Hierarchical Additive Regression Trees (HART) prior. HART
is a hierarchical nonparametric prior that allows for flexible modeling of the
representative consumer as a function of potentially many observed characteristics.
Arguments
- Data
A list containing:
  - p: Number of choice alternatives (integer).
  - lgtdata: A list of length nlgt. Each element lgtdata[[i]] must be a list with:
      - y: n_i x 1 vector of multinomial outcomes (1 to p).
      - X: (n_i * p) x nvar matrix of alternative-specific attributes.
  - Z (optional): nlgt x nz matrix of observed characteristics for each unit. Should NOT contain an intercept and should be centered. (A sketch assembling Data, Prior, and Mcmc into a call follows the argument list.)
- Prior
A list containing prior parameters:
  - ncomp: Number of mixture components (required).
  - a (optional): ncomp x 1 vector of Dirichlet prior parameters for the mixture weights pvec (default: rep(5, ncomp)).
  - deltabar (optional): (nz * nvar) x 1 prior mean for vec(Delta) (default: 0). Ignored if HART is used.
  - Ad (optional): Prior precision matrix for vec(Delta) (default: 0.01 * I). Ignored if HART is used.
  - mubar (optional): nvar x 1 prior mean vector for the mixture component means (default: 0 if unrestricted, 2 if restricted).
  - Amu (optional): Prior precision for the mixture component means (default: 0.01 if unrestricted, 0.1 if restricted).
  - nu (optional): Degrees of freedom for the IW prior on the component Sigma (default: nvar + 3 if unrestricted, nvar + 15 if restricted).
  - V (optional): Location matrix for the IW prior on the component Sigma (default: nu * I, or scaled according to the restriction).
  - SignRes (optional): nvar x 1 vector of sign restrictions. Must contain values of 0, -1, or 1: 0 means no restriction, -1 forces the coefficient to be negative, and 1 forces it to be positive. For example, SignRes = c(0, 1, -1) leaves the first coefficient unconstrained, restricts the second to be positive, and the third to be negative. Default: rep(0, nvar).
  - bart (optional): List of parameters for the HART prior. If specified, the representative consumer \(\Delta(Z)\) is modeled as a scaled sum-of-trees factor model. See Details.
- Mcmc
A list containing MCMC parameters:
  - R: Number of MCMC iterations (required).
  - keep (optional): Thinning parameter (default: 1).
  - nprint (optional): Print progress every nprint draws (default: 100; 0 suppresses printing).
  - s (optional): Random-walk Metropolis scaling parameter (default: 2.93 / sqrt(nvar)).
  - w (optional): Fractional likelihood weighting parameter (default: 0.1).
- r_verbose
Logical. Print startup messages? Default TRUE.
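A minimal sketch of how the Data, Prior, and Mcmc lists fit together in a call. The simulated data, dimensions, and settings below are purely illustrative (the simulation also assumes that rows of each X stack the p alternatives within a choice task); only the list structure follows the argument descriptions above.

set.seed(66)
p    <- 3    # choice alternatives
nvar <- 3    # alternative-specific attributes
nlgt <- 50   # cross-sectional units
nz   <- 2    # observed unit-level characteristics

Z <- scale(matrix(rnorm(nlgt * nz), ncol = nz), center = TRUE, scale = FALSE)  # centered, no intercept

lgtdata <- vector("list", nlgt)
for (i in 1:nlgt) {
  n_i  <- 10                                          # choice tasks for unit i
  X    <- matrix(runif(n_i * p * nvar), ncol = nvar)  # (n_i * p) x nvar
  beta <- rnorm(nvar)                                 # placeholder part-worths
  prob <- exp(matrix(X %*% beta, ncol = p, byrow = TRUE))
  prob <- prob / rowSums(prob)
  y    <- apply(prob, 1, function(pr) sample(1:p, 1, prob = pr))
  lgtdata[[i]] <- list(y = y, X = X)
}

Data  <- list(p = p, lgtdata = lgtdata, Z = Z)
Prior <- list(ncomp = 1, bart = list(num_trees = 200))   # HART prior on Delta(Z)
Mcmc  <- list(R = 2000, keep = 1, nprint = 100)

out <- rhierMnlRwMixture(Data = Data, Prior = Prior, Mcmc = Mcmc)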
Value
A list containing:
  - Deltadraw: If Z is provided and bart = NULL, an (R/keep) x (nz * nvar) matrix of vec(Delta) draws.
  - betadraw: nlgt x nvar x (R/keep) array of unit-level beta_i draws (see the post-processing sketch after this list).
  - nmix: List of mixture draws with components:
      - probdraw: (R/keep) x ncomp matrix of mixture component probabilities.
      - zdraw: (R/keep) x nlgt matrix of component assignments for each unit.
      - compdraw: List of length (R/keep), each element a list of ncomp lists; compdraw[[r]][[j]] = list(mu, rooti) contains the draws of \(\mu_j\) and \(\Sigma_j^{-1/2}\) for component j at kept draw r.
  - loglike: (R/keep) x 1 vector of log-likelihood values at kept draws.
  - SignRes: nvar x 1 vector of sign restrictions used.
  - bart_trees: If HART is used, a list containing the tree structures and related parameters.
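A hedged sketch of post-processing a returned object out (assuming no sign restrictions; the burn-in choice among kept draws is ad hoc and illustrative, and the column-major layout of vec(Delta) in Deltadraw is an assumption):

R_keep <- dim(out$betadraw)[3]        # number of kept draws
burn   <- floor(0.25 * R_keep)        # illustrative burn-in among kept draws

## Posterior mean part-worths for each unit: an nlgt x nvar matrix
beta_hat <- apply(out$betadraw[, , (burn + 1):R_keep], c(1, 2), mean)

## Trace of the log-likelihood at kept draws
plot(out$loglike, type = "l", xlab = "kept draw", ylab = "log-likelihood")

## With Z supplied and bart = NULL, posterior mean of Delta (nz x nvar)
if (!is.null(out$Deltadraw)) {
  Delta_hat <- matrix(colMeans(out$Deltadraw[(burn + 1):R_keep, , drop = FALSE]),
                      nrow = ncol(Z))
}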
Details
Model Specification
\(y_i \sim MNL(X_i, \beta_i)\) for unit \(i = 1, ..., nlgt\). The unit-level coefficients (part-worths) \(\beta_i\) are modeled as: $$\beta_i = \Delta(Z_i) + u_i$$ where \(\Delta(Z_i)\) is the representative consumer component, which depends on observed characteristics \(Z_i\), and \(u_i\) is the unobserved heterogeneity component.
The representative consumer component is specified as:
  - If Z is provided and Prior$bart is NULL: \(\Delta(Z_i) = Z_i \Delta\), where \(\Delta\) is an nz x nvar matrix (linear hierarchical model).
  - If Z is provided and Prior$bart is a list: \(\Delta(Z_i)\) is modeled with a HART prior (scaled sum-of-trees factor model).
  - If Z is NULL: \(\Delta(Z_i) = 0\).
With ncomp = 1
(currently required), the unobserved heterogeneity component follows:
$$u_i \sim N(\mu_1, \Sigma_1)$$
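As a small illustration of this specification under the linear case with ncomp = 1, the following sketch draws one unit's part-worths; all numerical values are hypothetical, not defaults:

set.seed(1)
nz <- 2; nvar <- 3
Delta  <- matrix(c(0.5, -0.2,  0.0,
                   0.1,  0.3, -0.4), nrow = nz, byrow = TRUE)  # nz x nvar (hypothetical)
mu1    <- rep(0, nvar)
Sigma1 <- diag(nvar)

z_i    <- rnorm(nz)                                    # centered characteristics of unit i
u_i    <- mu1 + drop(t(chol(Sigma1)) %*% rnorm(nvar))  # u_i ~ N(mu_1, Sigma_1)
beta_i <- drop(t(Delta) %*% z_i) + u_i                 # beta_i = Delta(Z_i) + u_i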
Prior Specifications
  - Mixture weights: \(pvec \sim Dirichlet(a)\)
  - Linear model: \(\delta = vec(\Delta) \sim N(deltabar, A_d^{-1})\)
  - Mixture component means: \(\mu_j \sim N(mubar, \Sigma_j \otimes Amu^{-1})\) (covariance scaled by \(\Sigma_j\))
  - Mixture component covariances: \(\Sigma_j \sim IW(\nu, V)\)
  - HART model: a sum-of-trees prior is placed on each factor of the scaled sum-of-trees model (see HART Prior Details below).
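As a hedged illustration of what these priors imply under the stated defaults (unrestricted case, nvar = 3), one prior draw of a component's \((\mu_j, \Sigma_j)\) can be generated with bayesm's rwishart; the inverse-Wishart parameterization below follows bayesm's convention and should be treated as a sketch:

library(bayesm)
nvar  <- 3
nu    <- nvar + 3                 # default IW degrees of freedom (unrestricted)
V     <- nu * diag(nvar)          # default IW location matrix
Amu   <- 0.01                     # default prior precision on component means
mubar <- rep(0, nvar)

Sigma_j <- rwishart(nu, chol2inv(chol(V)))$IW                    # Sigma_j ~ IW(nu, V)
mu_j    <- mubar + drop(t(chol(Sigma_j / Amu)) %*% rnorm(nvar))  # mu_j | Sigma_j
u_i     <- mu_j + drop(t(chol(Sigma_j)) %*% rnorm(nvar))         # implied prior draw of u_i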
HART Prior Details
If Prior$bart is a list, it specifies a HART prior for the representative consumer \(\Delta(Z)\). This replaces the conventional linear hierarchical specification. The HART prior models the representative consumer using a scaled vector of nvar sum-of-trees models.
HART Parameters (defaults are used for any entry not specified in Prior$bart; a sketch of such a list follows the descriptions below):
  - num_trees: Number of trees H in each sum-of-trees model (default: 200).
  - power, base: Parameters of the tree structure prior. The probability that a node at depth q splits is \(\alpha(1+q)^{-\beta}\), where base = \(\alpha\) and power = \(\beta\). The defaults base = 0.95 and power = 2 strongly favor shallow trees.
  - tau: Parameter controlling the prior variance of the terminal leaf coefficients, \(\lambda_{dhg} \sim N(0, \tau^2)\). The default is \(\tau = 1/\sqrt{H}\).
  - numcut: Number of grid points used to propose splitting rules for continuous variables (default: 100).
  - sparse: If TRUE, use the Dirichlet HART prior to induce sparsity in variable selection (default: FALSE).
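A minimal sketch of a Prior$bart list using these parameters; any entry can be omitted to fall back to the defaults above:

Prior <- list(
  ncomp = 1,
  bart  = list(
    num_trees = 200,    # trees H per sum-of-trees model
    base      = 0.95,   # alpha in the split probability alpha * (1 + q)^(-beta)
    power     = 2,      # beta in the split probability
    numcut    = 100,    # grid points for continuous splitting rules
    sparse    = FALSE   # TRUE activates the Dirichlet HART prior (see below)
  )
)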
Dirichlet HART (sparse = TRUE): The Dirichlet HART model augments the HART prior to induce sparsity in variable selection, following Linero (2018). Instead of a uniform probability for selecting splitting variables, the selection probabilities \(\tau = (\tau^{(1)}, \ldots, \tau^{(K)})\) are given a sparse Dirichlet prior, \((\tau^{(1)}, \ldots, \tau^{(K)}) \sim Dirichlet(\theta/K, \ldots, \theta/K)\), where K is the number of characteristics. The concentration parameter \(\theta\) is given a hierarchical prior: \(\theta/(\theta+\rho) \sim Beta(a,b)\). The additional parameters are listed below, followed by a settings sketch:
  - a, b: Shape parameters of the Beta hyperprior. The default (a = 0.5, b = 1) induces sparsity in which few variables receive high selection probabilities.
  - rho: Parameter influencing sparsity. The default is the number of characteristics K; reducing rho below K encourages greater sparsity.
  - theta: When set, fixes the Dirichlet concentration parameter directly, without the additional hyperprior (default: 0.0).
  - burn: Number of internal burn-in iterations for the Dirichlet HART sampler before variable selection is allowed (default: 100).
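A hedged sketch of enabling the sparse variant; the rho value below is illustrative, not a default:

K <- ncol(Z)                       # number of observed characteristics
Prior <- list(
  ncomp = 1,
  bart  = list(
    sparse = TRUE,                 # turn on the Dirichlet HART prior
    a      = 0.5,                  # Beta hyperprior shape parameters
    b      = 1,
    rho    = K / 2,                # below K to encourage extra sparsity (illustrative)
    burn   = 100                   # internal burn-in before variable selection
  )
)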
Sign Restrictions
If SignRes[k] is non-zero, the k-th coefficient \(\beta_{ik}\) is modeled as
$$\beta_{ik} = SignRes[k] \cdot \exp(\beta^*_{ik}).$$
The betadraw output contains the draws for \(\beta_{ik}\) (with the restriction applied). The nmix output contains the draws for the unrestricted mixture components.
Note: Care should be taken when selecting priors on any sign-restricted coefficients.
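A small sketch of how this transformation maps unrestricted draws \(\beta^*_{ik}\) to the restricted coefficients reported in betadraw; the numbers are illustrative:

SignRes   <- c(0, 1, -1)                       # no restriction, positive, negative
beta_star <- c(0.7, -1.2, 0.4)                 # hypothetical unrestricted draws
beta_i    <- ifelse(SignRes == 0,
                    beta_star,                 # unrestricted coefficients pass through
                    SignRes * exp(beta_star))  # restricted: SignRes[k] * exp(beta*_ik)
beta_i                                         # 0.70  0.30  -1.49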
Note
Currently, mixtures with more than one component are not implemented. Please use ncomp = 1 in the Prior specification.
References
Chipman, Hugh A., Edward I. George, and Robert E. McCulloch (2010). "BART: Bayesian Additive Regression Trees." Annals of Applied Statistics 4.1.
Linero, Antonio R. (2018). "Bayesian regression trees for high-dimensional prediction and variable selection." Journal of the American Statistical Association 113.522, pp. 626-636.
Rossi, Peter E., Greg M. Allenby, and Robert McCulloch (2009). Bayesian Statistics and Marketing. Reprint. Wiley Series in Probability and Statistics. Chichester: Wiley.
Rossi, Peter (2023). bayesm: Bayesian Inference for Marketing/Micro-Econometrics. Comprehensive R Archive Network.
Wiemann, Thomas (2025). "Personalization with HART." Working paper.