Collaborative Research: III: Medium: Algorithms for scalable inference and phylodynamic analysis of tumor haplotypes using low-coverage single cell sequencing data


Cancer is a dynamical evolutionary process in which populations of tumor cells continuously evolve to compete for resources, to metastasize, and to escape immune responses and therapy. Quantification of cancer evolutionary dynamics is therefore essential to understanding the mechanisms of cancer progression. Single-cell sequencing has enabled characterization of tumor composition at the finest possible resolution, providing researchers with data that could make such quantification possible. However, to realize this potential, appropriate algorithms and data analysis tools are needed. The computational discipline that extracts evolutionary parameters from genomic data by integrating phylogenetics, population genetics, and statistical learning is called phylodynamics. While almost all existing phylodynamics methods were developed for viruses, there is a growing realization that this methodology is also highly relevant to cancer biology. However, the development of cancer phylodynamics algorithms faces many challenges associated with the nature of cancer genomics data. The overarching goal of this proposal is to address these challenges by developing a phylodynamic framework for joint inference of cancer phylogenetic trees and evolutionary parameters from single-cell DNA sequencing (scDNA-Seq) data. This framework will allow cancer researchers to carry out a statistically and computationally sound evaluation of the effects of particular genome alterations or their combinations. In addition, this project will support the development of innovative cross-disciplinary curricula and bioinformatics training for diverse cohorts of undergraduate and graduate students at Georgia State University (Title III designation of Predominantly Black Institution), University of Connecticut, and UConn Health. The project has three interrelated technical aims.
First, investigators will develop algorithms for joint reconstruction of clonal frequencies and phased cancer clone genomic profiles (including copy number variation profiles and single nucleotide variants). The project will concentrate on low-coverage scDNA-Seq that can provide enough clonal data to guarantee the density of branching events in the cancer phylogenies necessary for phylodynamics analysis. Second, the researchers will design a novel methodology for intra-tumor phylodynamics inference. This includes scalable construction of plausible clone phylogenetic trees using a novel bipartition-based median-tree approach, together with maximum a posteriori inference of cancer fitness and mutability landscapes. The distinguishing feature of the proposed approach is the use of convex optimization techniques rather than MCMC sampling, which will guarantee scalability and accuracy of the developed computational tools. Finally, a comprehensive set of experiments will be conducted to validate and assess the accuracy of the developed methods. These will include computational experiments on simulated and publicly available scDNA-Seq data, as well as on scDNA-Seq datasets generated by in vitro and in vivo experiments conducted at UConn Health. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.



