Use of Linked Databases for Improved Confounding Control: Considerations for Potential Selection Bias.


Pharmacoepidemiologic studies are increasingly conducted within linked databases, often to obtain richer confounder data. However, the potential for selection bias is frequently overlooked when linked data are available for only a subset of patients. We highlight the importance of accounting for potential selection bias by evaluating the association between antipsychotics and type 2 diabetes in youths within a claims database linked to a smaller laboratory database. We used inverse probability of treatment weights (IPTW) to control for confounding. In analyses restricted to the linked cohorts, we applied inverse probability of selection weights (IPSW) to create a population representative of the full cohort. We used pooled logistic regression weighted by IPTW only, or by both IPTW and IPSW, to estimate treatment effects. Metabolic conditions were more prevalent in the linked cohorts than in the full cohort. Within the full cohort, the confounding-adjusted hazard ratio (aHR) was 2.26 (95% CI: 2.07-2.49) comparing initiation of antipsychotics to initiation of control medications. Within the linked cohorts, a different magnitude of association was obtained without adjustment for selection, whereas applying IPSW yielded point estimates similar to those in the full cohort (e.g., an aHR of 1.63 became 2.12). Linked database studies may generate biased estimates without proper adjustment for potential selection bias.
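The weighting idea described above can be illustrated with a minimal sketch. This is not the study's analysis: the patient counts are invented, a single binary covariate (`metabolic`) stands in for the full confounder set, and treatment and selection probabilities are estimated as simple stratum proportions rather than fitted models. All names (`COUNTS`, `stratum_rate`, `add_weights`, `weighted_prevalence`) are hypothetical. The sketch shows only that weighting the linked subset by IPSW can recover the covariate distribution of the full cohort, and that the product of IPTW and IPSW can balance the covariate across treatment groups within the linked subset.

```python
from collections import defaultdict

# Hypothetical counts (illustrative only, not study data):
# (metabolic, treated, linked) -> number of patients.
# Patients with a metabolic condition are linked more often (50% vs 25%).
COUNTS = {
    (1, 1, 1): 12, (1, 1, 0): 12,
    (1, 0, 1): 8,  (1, 0, 0): 8,
    (0, 1, 1): 10, (0, 1, 0): 30,
    (0, 0, 1): 30, (0, 0, 0): 90,
}


def expand(counts):
    """Turn the count table into one record per patient."""
    rows = []
    for (metabolic, treated, linked), n in counts.items():
        rows.extend(
            {"metabolic": metabolic, "treated": treated, "linked": linked}
            for _ in range(n)
        )
    return rows


def stratum_rate(rows, outcome, strata):
    """Proportion with outcome == 1 within each stratum (a stand-in for a fitted model)."""
    totals, events = defaultdict(int), defaultdict(int)
    for r in rows:
        key = tuple(r[s] for s in strata)
        totals[key] += 1
        events[key] += r[outcome]
    return {k: events[k] / totals[k] for k in totals}


def add_weights(rows):
    """Attach IPTW, IPSW, and their product to every record."""
    p_treat = stratum_rate(rows, "treated", ["metabolic"])
    p_link = stratum_rate(rows, "linked", ["metabolic", "treated"])
    for r in rows:
        pt = p_treat[(r["metabolic"],)]
        # IPTW: inverse of the probability of the treatment actually received.
        r["iptw"] = 1 / pt if r["treated"] else 1 / (1 - pt)
        # IPSW: inverse of the probability of being linked; unlinked patients get 0.
        pl = p_link[(r["metabolic"], r["treated"])]
        r["ipsw"] = 1 / pl if r["linked"] else 0.0
        r["combined"] = r["iptw"] * r["ipsw"]
    return rows


def weighted_prevalence(rows, weight=None):
    """Weighted proportion of patients with a metabolic condition."""
    num = sum((r[weight] if weight else 1.0) * r["metabolic"] for r in rows)
    den = sum((r[weight] if weight else 1.0) for r in rows)
    return num / den


rows = add_weights(expand(COUNTS))
linked = [r for r in rows if r["linked"]]

print(f"full cohort:       {weighted_prevalence(rows):.3f}")            # 0.200
print(f"linked, unweighted: {weighted_prevalence(linked):.3f}")         # 0.333
print(f"linked, IPSW:       {weighted_prevalence(linked, 'ipsw'):.3f}") # 0.200
```

In this toy cohort the linked subset over-represents metabolic conditions, and the IPSW-weighted prevalence returns to the full-cohort value. In a real analysis the combined weight would be passed to the outcome model (pooled logistic regression in the study above), and both probabilities would typically come from regression models with many covariates.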

