ClassTR: Classifying Within-Host Heterogeneity Based on Tandem Repeats with Application to Mycobacterium tuberculosis Infections.


Genomic tools have revealed genetically diverse pathogens within some hosts. Within-host pathogen diversity, which we refer to as "complex infection", is increasingly recognized as a determinant of treatment outcome for infections like tuberculosis. Complex infection arises through two mechanisms: within-host mutation (which results in clonal heterogeneity) and reinfection (which results in mixed infections). Estimates of the frequency of within-host mutation and reinfection in populations are critical for understanding the natural history of disease. These estimates influence projections of disease trends and effects of interventions. The genotyping technique MLVA (multiple loci variable-number tandem repeats analysis) can identify complex infections, but the current method to distinguish clonal heterogeneity from mixed infections is based on a rather simple rule. Here we describe ClassTR, a method which leverages MLVA information from isolates collected in a population to distinguish mixed infections from clonal heterogeneity. We formulate the resolution of complex infections into their constituent strains as an optimization problem, and show its NP-completeness. We solve it efficiently by using mixed integer linear programming and graph decomposition. Once the complex infections are resolved into their constituent strains, ClassTR probabilistically classifies isolates as clonally heterogeneous or mixed by using a model of tandem repeat evolution. We first compare ClassTR with the standard rule-based classification on 100 simulated datasets. ClassTR outperforms the standard method, improving classification accuracy from 48% to 80%. We then apply ClassTR to a sample of 436 strains collected from tuberculosis patients in a South African community, of which 92 had complex infections. We find that ClassTR assigns an alternate classification to 18 of the 92 complex infections, suggesting important differences in practice. By explicitly modeling tandem repeat evolution, ClassTR helps to improve our understanding of the mechanisms driving within-host diversity of pathogens like Mycobacterium tuberculosis.

MIDAS Network Members