Many features of virus populations make them ideal candidates for population genetic study, including very high mutation rates, high levels of nucleotide diversity, exceptionally large census population sizes, and frequent positive selection. However, these attributes also mean that special care must be taken in population genetic inference. In particular, highly skewed progeny distributions, frequent and severe population bottlenecks associated with infection and compartmentalization, and strong selection all shape the distribution of genetic variation, yet they are generally not taken into account. Improved inference of viral populations will therefore require not only theoretical development but also the implementation of this theory in statistical inference tools capable of analyzing thousands of viral genomes in a computationally efficient manner. Here, I propose these developments (Aims 1-2) and present an application to two exceptionally deep datasets to which we have unique access through our consortium affiliations (Aim 3). Taken together, this proposal not only represents a significant step toward advancing our understanding of population genetics in these extreme parameter spaces but will also provide valuable clinical insights that are expected to improve future patient treatment strategies.
NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES