Simulate Hierarchical Multinomial Logit Data
sim_hier_mnl.Rd
Generates simulated data suitable for testing hierarchical multinomial logit models, particularly those involving individual-specific covariates (Z) influencing coefficients (beta_i). Supports various functional forms for the Z-beta relationship and mixture models for residual heterogeneity.
Usage
sim_hier_mnl(
  nlgt = 300,
  nT = 10,
  p = 3,
  nz = 5,
  nXa = 2,
  nXd = 1,
  const = TRUE,
  z_dist_func = function(n, d) matrix(stats::runif(n * d, -1, 1), n, d),
  standardize_Z = TRUE,
  beta_func_type = "linear",
  beta_func_args = list(),
  ncomp = 1,
  mixture_comps = NULL,
  sigma_inv_diag = 1,
  Xa_dist_func = function(n, p, na) matrix(stats::runif(n * p * na, -1, 1), ncol = p * na),
  Xd_dist_func = function(n, nd) matrix(stats::rnorm(n * nd), ncol = nd),
  seed = NULL
)
Arguments
- nlgt
Integer. Number of individuals or cross-sectional units.
- nT
Integer. Number of choice observations per individual.
- p
Integer. Number of choice alternatives (including outside option if any).
- nz
Integer. Number of demographic/individual-specific variables in Z. If nz = 0, no Z matrix is generated, betabar_true is set to zero, and beta_func_type / beta_func_args are ignored.
- nXa
Integer. Number of alternative-specific variables in X.
- nXd
Integer. Number of choice-invariant variables in X (e.g., price).
- const
Logical. Include p-1 intercepts in the model?
- z_dist_func
Function. A function to generate the Z matrix. Must accept arguments n (nlgt) and d (nz) and return an n x d matrix. Default: function(n, d) matrix(runif(n*d, -1, 1), n, d).
- standardize_Z
Logical. Standardize the generated Z matrix (mean 0, sd 1)?
- beta_func_type
Character. Specifies the functional form mapping Z to the systematic component of beta (betabar_i). Ignored if nz = 0. Options:
  - "linear": Linear function betabar_i = Z_i %*% Delta. Requires Delta in beta_func_args.
  - "step": Step function based on one Z variable. Requires cutoff, beta_1, beta_2, Z_index in beta_func_args.
  - "friedman": Friedman benchmark function (modified) based on the first 5 Z variables. Requires coef_index in beta_func_args to specify which coefficient it applies to (the others are zero).
  - "custom": A user-defined function provided in beta_func_args$func.
- beta_func_args
List. Arguments needed for the chosen beta_func_type. Ignored if nz = 0.
  - For "linear": list(Delta = matrix(runif(ncoef * nz), nrow=nz)). Delta is nz x ncoef.
  - For "step": list(cutoff = 0, beta_1 = rep(-1, ncoef), beta_2 = rep(1, ncoef), Z_index = 1). beta_1 / beta_2 are vectors of length ncoef. Z_index is the column of Z to use.
  - For "friedman": list(coef_index = 1). nz must be >= 5. The function is applied to betabar_i[coef_index]; the others are 0.
  - For "custom": list(func = function(Zi) { ... }). The function must take a vector Zi (a row of Z) and return a vector betabar_i of length ncoef.
- ncomp
Integer. Number of components in the normal mixture for residual heterogeneity (eps_i).
- mixture_comps
List. Optional pre-specified mixture components. A list of length ncomp, where each element is list(mu = ..., rooti = ...). mu is the mean vector (length ncoef), rooti is the upper Cholesky factor of the inverse covariance matrix (ncoef x ncoef). If NULL, components are generated based on sigma_inv_diag.
- sigma_inv_diag
Numeric. Diagonal value for the inverse covariance matrix (precision) of mixture components if mixture_comps is NULL. Assumes identity covariance scaled by this.
- Xa_dist_func
Function. Function to generate alternative-specific variables Xa. Takes n (nT), p, na (nXa) and returns a matrix (usually n x (p*na) or similar structure expected by createX). Default: function(n, p, na) matrix(runif(n*p*na, -1, 1), ncol=p*na).
- Xd_dist_func
Function. Function to generate choice-invariant variables Xd. Takes n (nT), nd (nXd) and returns an n x nd matrix. Default: function(n, nd) matrix(rnorm(n*nd), ncol=nd).
- seed
Integer. Optional random seed for reproducibility.
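As a rough sketch of how the function-valued and list-valued arguments above might be put together, the following base-R snippet builds a custom Z generator, argument lists for "linear", "step", and "custom", and a one-component mixture_comps list. The names ncoef_assumed, my_z_dist, my_beta_func, and my_comps are illustrative and not part of the package. The coefficient count is set by hand here as an assumption; the actual ncoef is determined inside sim_hier_mnl from the other arguments. A diagonal covariance is used so that rooti does not depend on the triangular-factor convention.

```r
set.seed(1)

nz <- 5             # number of Z variables (matches the default)
ncoef_assumed <- 4  # ASSUMPTION: illustrative only; the real ncoef is derived inside sim_hier_mnl

# Custom Z generator: must accept (n, d) and return an n x d matrix.
my_z_dist <- function(n, d) matrix(stats::rnorm(n * d), nrow = n, ncol = d)

# "linear": Delta is nz x ncoef, so Z_i %*% Delta has length ncoef.
linear_args <- list(Delta = matrix(stats::runif(nz * ncoef_assumed, -1, 1), nrow = nz))

# "step": same fields as listed under beta_func_args.
step_args <- list(cutoff = 0,
                  beta_1 = rep(-1, ncoef_assumed),
                  beta_2 = rep(1, ncoef_assumed),
                  Z_index = 1)

# "custom": takes one row of Z (a vector) and returns a vector of length ncoef.
my_beta_func <- function(Zi) {
  out <- numeric(ncoef_assumed)
  out[1] <- sin(pi * Zi[1])  # hypothetical nonlinear effect on the first coefficient
  out[2] <- Zi[2]^2 - 0.5    # hypothetical quadratic effect on the second coefficient
  out
}
custom_args <- list(func = my_beta_func)

# One mixture component with precision 4 * I (covariance 0.25 * I).
prec <- 4
my_comps <- list(list(mu = rep(0, ncoef_assumed),
                      rooti = diag(sqrt(prec), ncoef_assumed)))

# Shape checks on the pieces built above.
Z_test <- my_z_dist(10, nz)
dim(Z_test)
dim(Z_test %*% linear_args$Delta)
length(custom_args$func(Z_test[1, ]))
```

These objects could then be supplied as z_dist_func = my_z_dist, beta_func_type = "custom", beta_func_args = custom_args, ncomp = 1, and mixture_comps = my_comps, provided ncoef_assumed matches the coefficient count implied by the remaining arguments.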
Value
A list suitable for direct use as the Data argument in rhierMnlRwMixture, containing:
- p: Number of alternatives.
- lgtdata: List of length nlgt. Each element i is list(y=y_i, X=X_i, beta=beta_i, betabar=betabar_i).
- Z: The nlgt x nz matrix of individual-specific covariates (standardized if requested).
Additionally, the list contains true_values:
- true_values$beta_true: nlgt x ncoef matrix of true beta_i.
- true_values$betabar_true: nlgt x ncoef matrix of true betabar_i = f(Z_i).
- true_values$true_params: List containing parameters used for generation (beta_func_type, beta_func_args, mixture_comps, pvec).
- true_values$dimensions: List containing key dimensions used (p, nlgt, nT, nz, ncoef, etc.).
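The relationship between betabar_true and beta_true can be illustrated with a small base-R sketch. It assumes the additive structure suggested by the argument descriptions, beta_i = betabar_i + eps_i, with a linear betabar_i = Z_i %*% Delta and eps_i drawn from a single normal component. This is not the package's internal code, and all object names and dimension values are illustrative.

```r
set.seed(42)

nlgt <- 200; nz <- 2; ncoef <- 3  # illustrative dimensions (ncoef chosen by hand)

# Individual covariates, standardized column-wise to mean 0 and sd 1.
Z <- scale(matrix(stats::runif(nlgt * nz, -1, 1), nlgt, nz))

# Systematic part: betabar_i = Z_i %*% Delta, with Delta of size nz x ncoef.
Delta <- matrix(stats::runif(nz * ncoef), nrow = nz)
betabar <- Z %*% Delta  # nlgt x ncoef

# Residual heterogeneity: eps_i ~ N(mu, I / prec), a single mixture component.
mu <- rep(0, ncoef)
prec <- 1
eps <- matrix(stats::rnorm(nlgt * ncoef, sd = 1 / sqrt(prec)), nlgt, ncoef)
eps <- sweep(eps, 2, mu, "+")

# ASSUMPTION: individual coefficients are the sum of the two parts.
beta <- betabar + eps

# Regressing beta on Z (no intercept) should roughly recover Delta.
Delta_hat <- coef(stats::lm(beta ~ Z - 1))
round(Delta_hat - Delta, 2)
```

Under this reading, a plot of a column of beta against a column of Z would show the same trend as betabar, with scatter whose spread is governed by the mixture component's covariance.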
Examples
# Simple linear example
sim_data_linear <- sim_hier_mnl(nlgt = 50, nT = 5, p = 3, nz = 2, nXa = 1, nXd = 0,
                                beta_func_type = "linear", seed = 123)
plot(sim_data_linear$Z[,1], sim_data_linear$true_values$betabar_true[,1]) # Visualize linear
# Step function example
sim_data_step <- sim_hier_mnl(nlgt = 50, nT = 5, p = 3, nz = 2, nXa = 1, nXd = 0,
                              beta_func_type = "step",
                              beta_func_args = list(Z_index = 1),
                              seed = 456)
plot(sim_data_step$Z[,1], sim_data_step$true_values$betabar_true[,1]) # Visualize step