Unsupervised clustering of over-the-counter healthcare products into product categories.


A general problem in biosurveillance is finding appropriate aggregates of elemental data to monitor for the detection of disease outbreaks. We developed an unsupervised clustering algorithm for aggregating over-the-counter (OTC) healthcare products into categories. The algorithm uses Markov chain Monte Carlo (MCMC) sampling over hundreds of parameters in a Bayesian model to assign products to clusters. Despite the high dimensionality, it runs quickly on hundreds of time series. The procedure uncovered a clinically significant distinction between OTC products intended for the treatment of allergy and those intended for the treatment of cough, cold, and influenza symptoms.
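The abstract does not specify the Bayesian model, so the following is only a minimal sketch of the general idea of using MCMC to assign time series to clusters. It assumes a toy Gaussian mixture with a fixed number of clusters, a known noise standard deviation, and a Gibbs sampler; the function name `gibbs_cluster`, its parameters, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import math
import random


def gibbs_cluster(series, k, n_iter=50, sigma=1.0, tau=10.0, seed=0):
    """Gibbs sampler for a toy mixture model over whole time series.

    Assumed model (illustrative, not the paper's): a series in cluster c
    equals the cluster mean profile mu[c] plus independent N(0, sigma^2)
    noise at each time point; each mu[c][t] has a N(0, tau^2) prior;
    cluster labels have a uniform prior.
    Returns the cluster labels from the final sweep (a single posterior draw).
    """
    rng = random.Random(seed)
    n, t_len = len(series), len(series[0])
    z = [i % k for i in range(n)]           # simple round-robin initial labels
    mu = [[0.0] * t_len for _ in range(k)]

    for _ in range(n_iter):
        # Step 1: sample each cluster mean profile given the current labels
        # (conjugate normal update, independently at each time point).
        for c in range(k):
            members = [series[i] for i in range(n) if z[i] == c]
            precision = len(members) / sigma**2 + 1.0 / tau**2
            sd = math.sqrt(1.0 / precision)
            for t in range(t_len):
                total = sum(m[t] for m in members)
                mu[c][t] = rng.gauss((total / sigma**2) / precision, sd)

        # Step 2: sample each label given the current mean profiles.
        for i in range(n):
            log_p = [
                -sum((x - m) ** 2 for x, m in zip(series[i], mu[c]))
                / (2.0 * sigma**2)
                for c in range(k)
            ]
            top = max(log_p)                 # subtract max for numerical stability
            weights = [math.exp(lp - top) for lp in log_p]
            z[i] = rng.choices(range(k), weights=weights)[0]
    return z


# Usage on synthetic data: two groups of noisy series whose seasonal
# peaks fall at different times of the year (24 monthly observations).
data_rng = random.Random(1)
profile_a = [5 * math.sin(2 * math.pi * t / 12) for t in range(24)]
profile_b = [5 * math.cos(2 * math.pi * t / 12) for t in range(24)]
data = [[v + data_rng.gauss(0, 1) for v in profile_a] for _ in range(5)]
data += [[v + data_rng.gauss(0, 1) for v in profile_b] for _ in range(7)]
labels = gibbs_cluster(data, k=2)
print(labels)
```

In this sketch the parameters being sampled are the cluster mean profiles and the cluster labels. A fuller treatment would also place priors on the noise variance and the number of clusters, and would summarize many post-burn-in draws rather than a single final state.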
