A general problem in biosurveillance is finding the optimal aggregates of more basic data to monitor for the detection of disease outbreaks. We developed a multivariate procedure for identifying the set of over-the-counter (OTC) healthcare products that correlates best with a set of diagnoses. To ensure that the procedure produces results that agree with clinical knowledge of diseases and OTC products, we applied it to a set of products and a set of diagnoses for which the correlation was known to be high. Our hypothesis was that the procedure could achieve parsimony in the set of diagnoses that correlate with sales of pediatric electrolytes while still producing a high correlation. The procedure narrowed the set of diagnoses that correlate with pediatric electrolyte sales from 51 diagnoses to 8. The correlation of the set of 51 diagnoses with electrolyte sales was 0.95, and the correlation of the set of 8 diagnoses with electrolyte sales was 0.96. We conclude that the procedure functions as intended and is suitable for further testing on other problems of finding optimal aggregates of OTC products, and more generally of other types of biosurveillance data, to monitor for the detection of various disease outbreaks.
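The idea of correlating a set of diagnoses with a sales series, and then shrinking that set, can be illustrated with a minimal sketch. This is not the procedure from the paper, whose details are not given in this abstract. It assumes the set-level correlation is measured as the multiple correlation coefficient from an ordinary least-squares fit, and that parsimony is sought by greedy backward elimination. The function names, the `tolerance` parameter, and the data are all hypothetical; the data are synthetic and do not reproduce the reported results.

```python
import numpy as np

def multiple_correlation(X, y):
    """Correlation between y and its least-squares fit from the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares
    return float(np.corrcoef(A @ coef, y)[0, 1])

def backward_select(X, y, tolerance=0.01):
    """Greedily drop columns while the correlation stays within
    `tolerance` of the value obtained with all columns (assumed criterion)."""
    keep = list(range(X.shape[1]))
    full_r = multiple_correlation(X, y)
    while len(keep) > 1:
        # Find the column whose removal leaves the highest correlation.
        best_r, best_j = max(
            (multiple_correlation(X[:, [k for k in keep if k != j]], y), j)
            for j in keep
        )
        if best_r < full_r - tolerance:
            break  # every further removal costs too much correlation
        keep.remove(best_j)
    return keep, multiple_correlation(X[:, keep], y)

# Synthetic stand-in data: 10 weekly "diagnosis" series, of which only
# the first 3 actually drive the "sales" series.
rng = np.random.default_rng(0)
weeks = 104
signal = rng.normal(size=(weeks, 3))
noise = rng.normal(size=(weeks, 7))
X = np.hstack([signal, noise])
y = signal @ np.array([2.0, 1.5, 1.0]) + 0.3 * rng.normal(size=weeks)

keep, r = backward_select(X, y)
print("retained columns:", sorted(keep), "correlation:", round(r, 3))
```

With an in-sample least-squares criterion like this one, removing a series can never raise the correlation, so a reduced set that scores slightly higher than the full set (as reported above) implies a different evaluation than the one sketched here.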
Li R, Wallstrom GL, Hogan WR. (2005). A multivariate procedure for identifying correlations between diagnoses and over-the-counter products from historical datasets. AMIA ... Annual Symposium proceedings. AMIA Symposium