Beginning with John Snow's investigations of cholera epidemics, understanding and preventing infectious disease transmission has been one of the fundamental goals of epidemiology. Whole-genome sequences from viruses and bacteria are a promising new source of information about disease transmission, but current statistical methods are unable to incorporate these data into the analysis of transmission in households and other close-contact groups. The long-term goal is to develop statistical and epidemiologic methods that use high-resolution transmission data and genetic sequence data to inform rapid and effective public health responses to emerging infections. The goal of the proposed research is to develop flexible and robust regression models for infectious disease transmission data that can incorporate pathogen genetic sequences. These models will be based on a recently developed semiparametric regression model that can estimate parameters crucial to mathematical models of epidemics and the design of interventions, including hazard ratios for covariate effects on infectiousness and susceptibility and baseline hazards of transmission in infectious-susceptible pairs. To make it a more practical tool for infectious disease epidemiology, this model will be extended to account for external sources of infection, missing data, and small samples. The partial likelihood for this model is a sum over the set of transmission trees consistent with the epidemiologic data on person, place, and time. Since a phylogeny linking pathogen samples from infected individuals constrains the set of possible transmission trees, pathogen genetic sequence data can be combined with epidemiologic data to obtain more efficient estimates of transmission parameters. Epidemiologic and genetic data will be combined by developing algorithms to find the set of transmission trees simultaneously consistent with both.
These algorithms will be incorporated into Markov chain Monte Carlo or sequential Monte Carlo estimation procedures that will account for missing data and phylogenetic uncertainty. These methods will serve as a theoretical basis for the development of efficient case-control and case-cohort study designs for outbreak investigations and vaccine trials. The proposed research is innovative because it synthesizes survival analysis and statistical genetics to analyze infectious disease transmission data. It is significant because it will improve the collection and analysis of data and the evaluation of interventions in epidemics, allowing more effective control of emerging infections.
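
As a rough illustration of the idea that the likelihood sums over transmission trees consistent with the data, the following Python sketch enumerates such trees for a small, entirely hypothetical household. It is not the proposed model: the case data, the function names, the `placeholder_weight` function, and the representation of phylogenetic information as a set of excluded infector-infectee pairs are all assumptions made for illustration. In practice a phylogeny constrains trees jointly rather than pair by pair.

```python
from itertools import product
from math import prod, isclose

# Hypothetical household: time of infection and end of infectiousness per case.
cases = {
    "A": {"infected": 0.0, "removed": 6.0},
    "B": {"infected": 2.0, "removed": 8.0},
    "C": {"infected": 3.0, "removed": 9.0},
    "D": {"infected": 7.0, "removed": 12.0},
}

def candidate_infectors(cases, excluded=frozenset()):
    """Map each case to the people who were infectious when it was infected.

    `excluded` holds (infector, infectee) pairs ruled out by other evidence,
    a simplified stand-in for a phylogenetic constraint (an assumption).
    Cases with no candidate (such as the index case) are left out.
    """
    cands = {}
    for j, cj in cases.items():
        vj = [
            i for i, ci in cases.items()
            if i != j
            and ci["infected"] < cj["infected"] <= ci["removed"]
            and (i, j) not in excluded
        ]
        if vj:
            cands[j] = vj
    return cands

def transmission_trees(cands):
    """Yield every assignment of one infector to each non-index case."""
    infectees = sorted(cands)
    for choice in product(*(cands[j] for j in infectees)):
        yield dict(zip(infectees, choice))

def placeholder_weight(i, j):
    """Illustrative pair weight only; NOT the hazard from the proposed model."""
    return 1.0 / (1.0 + cases[j]["infected"] - cases[i]["infected"])

# Trees consistent with the epidemiologic data alone.
epi_cands = candidate_infectors(cases)
epi_trees = list(transmission_trees(epi_cands))

# Trees that also respect a hypothetical genetic exclusion of the pair A -> C.
gen_cands = candidate_infectors(cases, excluded=frozenset({("A", "C")}))
gen_trees = list(transmission_trees(gen_cands))

# Summing the product of pair weights over all trees equals the product,
# over infectees, of the summed weights of their candidate infectors.
sum_over_trees = sum(
    prod(placeholder_weight(i, j) for j, i in tree.items()) for tree in epi_trees
)
product_of_sums = prod(
    sum(placeholder_weight(i, j) for i in vj) for j, vj in epi_cands.items()
)

print(len(epi_trees), len(gen_trees))
print(isclose(sum_over_trees, product_of_sums))
```

In this toy example, excluding a single pair shrinks the set of admissible trees, which is the sense in which genetic data could sharpen estimates. The final check relies only on the distributive law, so it holds for any pairwise weights when the candidate sets are independent across infectees.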



