Thomas Wiemann
Welcome! I'm a Postdoctoral Scholar in Marketing at the University of Chicago Booth School of Business. I obtained my PhD in Economics from the University of Chicago Department of Economics in 2025. My research interests lie at the intersection of marketing, econometrics, and machine learning.
Job Market Paper
Personalization with HART
[draft; R package].
Firms personalize prices, advertising, product design, and more to find and serve their—often highly heterogeneous—consumers. When personalizing to known consumers, these marketing decisions can be informed by past choice behavior. However, personalization must rely on observed characteristics for new consumers with limited or no purchase histories. I propose Bayesian hierarchical additive regression trees (HART) to define optimal marketing decisions that adapt to the firm’s familiarity with the consumer. HART combines the strengths of supervised machine learning and hierarchical Bayesian models in one framework: First, it flexibly leverages potentially many observed characteristics to personalize to new consumers. Second, it optimally adapts to the consumer’s specific preferences as their choices are recorded over time. I develop an efficient Metropolis-within-Gibbs sampler for fully Bayesian inference and apply it in two discrete choice applications. Using data from a canonical conjoint study, I illustrate how HART discovers marketing opportunities for product design in new markets. In a CPG scanner data application, HART leverages observed characteristics to improve out-of-sample choice prediction by 60% for new consumers, and raises profits by 13% and 2% compared to conventional personalization approaches for new and known consumers, respectively.
Presented at: ISMS Marketing Science Conference 2025
Working Papers
Optimal Categorical Instrumental Variables
Revision requested at the Journal of Business & Economic Statistics.
[abstract; arXiv; R package].
This paper discusses estimation with a categorical instrumental variable in settings with potentially few observations per category. The proposed categorical instrumental variable estimator (CIV) leverages a regularization assumption that implies the existence of a latent categorical variable with fixed finite support achieving the same first-stage fit as the observed instrument. In asymptotic regimes that allow the number of observations per category to grow at an arbitrarily small polynomial rate with the sample size, I show that when the cardinality of the support of the optimal instrument is known, CIV is root-n asymptotically normal, achieves the same asymptotic variance as the oracle IV estimator that presumes knowledge of the optimal instrument, and is semiparametrically efficient under homoskedasticity. Under-specifying the number of support points reduces efficiency but maintains asymptotic normality. In an application that leverages judge fixed effects as instruments, CIV compares favorably to commonly used jackknife-based instrumental variable estimators.
Presented at: International Association for Applied Econometrics 2023, North American Winter Meeting of the Econometric Society 2024
An Introduction to Double/Debiased Machine Learning
with Achim Ahrens, Victor Chernozhukov, Christian Hansen, Damian Kozbur, Mark Schaffer.
Revision requested at the Journal of Economic Literature.
[abstract; arXiv; tutorial].
This paper provides a practical introduction to Double/Debiased Machine Learning (DML). DML provides a general approach to performing inference about a target parameter in the presence of nuisance parameters. The aim of DML is to reduce the impact of nuisance parameter estimation on estimators of the parameter of interest. We describe DML and its two essential components: Neyman orthogonality and cross-fitting. We highlight that DML reduces functional form dependence and accommodates the use of complex data types, such as text data. We illustrate its application through three empirical examples that demonstrate DML's applicability in cross-sectional and panel settings.
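The two components named in the abstract, Neyman orthogonality and cross-fitting, can be illustrated with a minimal sketch. It is not taken from the paper or its software: it assumes a partially linear model with a single control, uses polynomial least squares as a stand-in for a machine learner, and all function names (`poly_features`, `cross_fit_residuals`, `dml_plr`) are hypothetical.

```python
import numpy as np

def poly_features(x, degree=3):
    # Columns 1, x, ..., x^degree for a one-dimensional control x.
    return np.vander(x, degree + 1, increasing=True)

def cross_fit_residuals(target, x, folds, degree=3):
    # Cross-fitting: fit the nuisance function on the other folds,
    # then residualize the held-out fold with that out-of-fold fit.
    resid = np.empty_like(target)
    for k in np.unique(folds):
        test = folds == k
        coef, *_ = np.linalg.lstsq(
            poly_features(x[~test], degree), target[~test], rcond=None
        )
        resid[test] = target[test] - poly_features(x[test], degree) @ coef
    return resid

def dml_plr(y, d, x, n_folds=5, seed=0):
    # Assumed model (illustration only): y = theta * d + g(x) + e, d = m(x) + v.
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    y_res = cross_fit_residuals(y, x, folds)  # y minus estimate of E[y | x]
    d_res = cross_fit_residuals(d, x, folds)  # d minus estimate of E[d | x]
    # Residual-on-residual regression solves the Neyman-orthogonal score
    # psi = (y_res - theta * d_res) * d_res = 0 on average.
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi**2) / len(y)) / np.mean(d_res**2)
    return theta, se

# Simulated data with a true coefficient of 0.5 (made up for the example).
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-2, 2, n)
d = np.sin(x) + rng.normal(size=n)
y = 0.5 * d + np.cos(x) + rng.normal(size=n)
theta_hat, se_hat = dml_plr(y, d, x)
print(theta_hat, se_hat)
```

In practice the polynomial fit would be replaced by whichever learner suits the data; the fold-wise residualization and the final residual-on-residual step stay the same.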
Demand Estimation with Finitely Many Consumers
with Jonas Lieber.
[abstract; draft; slides]
Although market shares are frequently estimated via averages of finitely many consumer choices, commonly applied methods for demand estimation are not robust to estimation error in these shares. While non-negligible estimation error in market shares always introduces bias in the demand parameter estimators, the issue becomes most salient when estimated market shares are zero. In the presence of zero shares, widely applied estimators of the random coefficient logit model cannot be computed without ad hoc data manipulations. This paper proposes a new estimator of demand parameters for settings with endogenous prices and estimated market shares that is robust to zero-valued market shares. The estimator generalizes the constrained optimization program of Dubé et al. (2012) with probabilistic bounds on the estimation error in market shares. We show consistency as the number of markets $T$ grows sufficiently slowly relative to the number of consumers $n$ such that $\log(T)/n\to 0$, and provide confidence intervals under the same regime. Simulations suggest improved finite-sample properties of the proposed estimator relative to conventional alternatives.
Presented at: Optimization Conscious Econometrics Conference 2023, North American Summer Meetings of the Econometric Society 2023
Guarantees on Correct Conclusions with Incorrect Likelihoods
[abstract; draft]
This note studies robustness properties of (non)linear control function estimands such as (mixed) logistic or Poisson pseudo maximum likelihood estimands. I show that under misspecification, commonly applied estimands are not informative about the sign of the true partial effects. For example, (mixed) logistic regression estimands potentially imply positive partial effects even if all true partial effects are negative. I provide sufficient conditions to admit valid conclusions about the sign of partial effects. For a large class of estimands, including common pseudo maximum likelihood estimands based on natural exponential family distributions, nonparametrically conditioning on the control function is sufficient for sign preservation.
Effects of Health Care Policy Uncertainty on Households’ Portfolio Choice
with Robin L. Lumsdaine.
[abstract; draft; slides]
This paper develops a nonparametric identification approach for causal effects of an endogenous macroeconomic variable on microeconomic outcomes. The key assumption is the existence of an exogenous variable that shifts responsiveness to the variable of interest without shifting responsiveness to other macroeconomic time series. We apply the approach to study the effect of health care policy uncertainty (HCPU) on households' portfolio choice using health shocks to capture cross-sectional heterogeneity. Under the additional assumption of risk-averse agents, our approach provides an informative bound on the average causal effect of HCPU. The empirical results highlight HCPU as an important determinant of households' financial behavior, and showcase substantial heterogeneity in HCPU effects across varying unexpected changes to health.
Presented at: Stanford Institute for Theoretical Economics 2019, International Association for Applied Econometrics 2019, Society for Financial Econometrics 2019, Royal Economic Society 2023
Publications
Model Averaging and Double Machine Learning
with Achim Ahrens, Christian Hansen, Mark Schaffer.
Journal of Applied Econometrics, 2025, 40(3): 249-269.
[abstract; article; Stata package; R package]
This paper discusses pairing double/debiased machine learning (DDML) with stacking, a model averaging method for combining multiple candidate learners, to estimate structural parameters. We introduce two new stacking approaches for DDML: short-stacking exploits the cross-fitting step of DDML to substantially reduce the computational burden, and pooled stacking enforces common stacking weights over cross-fitting folds. Using calibrated simulation studies and two applications estimating gender gaps in citations and wages, we show that DDML with stacking is more robust to partially unknown functional forms than common alternative approaches based on single pre-selected learners. We provide Stata and R software implementing our proposals.
Presented at: Machine Learning in Economics Summer Institute 2022
ddml: Double/debiased machine learning in Stata
with Achim Ahrens, Christian Hansen, Mark Schaffer.
Stata Journal, 2024, 24(1): 3-45.
[abstract; article; Stata package; R package]
We introduce the package ddml for Double/Debiased Machine Learning (DDML) in Stata. Estimators of causal parameters for five different econometric models are supported, allowing for flexible estimation of causal effects of endogenous variables in settings with unknown functional forms and/or many exogenous variables. ddml is compatible with many existing supervised machine learning programs in Stata. We recommend using DDML in combination with stacking estimation, which combines multiple machine learners into a final predictor. We provide Monte Carlo evidence to support our recommendation.
Work in Progress
Machine Learning learns Bayes
with Andrew Bai, Sanjog Misra.
Software
Teaching
Econometrics – Econ 21020 (Spring 2022)
[course website; syllabus; course material; evaluations]