RNA viruses such as HCV and HIV mutate at extremely high rates, and as a result, they exist in infected hosts as populations of genetically related variants. Recent advances in sequencing technologies make possible to identify such populations at great depth. In particular, these technologies provide new opportunities for inference of relatedness between viral samples, identification of transmission clusters and sources of infection, which are crucial tasks for viral outbreaks investigations.
We present (i) an evolutionary simulation algorithm Viral Outbreak InferenCE (VOICE) inferring genetic relatedness, (ii) an algorithm MinDistB detecting possible transmission using minimal distances between intra-host viral populations and sizes of their relative borders, and (iii) a non-parametric recursive clustering algorithm Relatedness Depth (ReD) analyzing clusters' structure to infer possible transmissions and their directions. All proposed algorithms were validated using real sequencing data from HCV outbreaks.
All algorithms are applicable to the analysis of outbreaks of highly heterogeneous RNA viruses. Our experimental validation shows that they can successfully identify genetic relatedness between viral populations, as well as infer transmission clusters and outbreak sources.