Estimating Effects with Rare Outcomes and High Dimensional Covariates: Knowledge is Power


Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect or association of an exposure on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional mean of the outcome, given the exposure and measured confounders. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides stability and power to estimate the exposure effect. In finite sample simulations, the proposed estimator performed as well, if not better, than alternative estimators, including a propensity score matching estimator, inverse probability of treatment weighted (IPTW) estimator, augmented-IPTW and the standard TMLE algorithm. The new estimator yielded consistent estimates if either the conditional mean outcome or the propensity score was consistently estimated. As a substitution estimator, TMLE guaranteed the point estimates were within the parameter range. We applied the estimator to investigate the association between permissive neighborhood drunkenness norms and alcohol use disorder. Our results highlight the potential for double robust, semiparametric efficient estimation with rare events and high dimensional covariates.

MIDAS Network Members