Genetic diversity of SARS-CoV-2 (formerly 2019-nCoV), the virus that causes COVID-19, provides information about epidemic origins and the rate of epidemic growth. By analysing 53 SARS-CoV-2 whole genome sequences collected up to February 3, 2020, we find a strong association between the time of sample collection and the accumulation of genetic diversity. Bayesian and maximum likelihood phylogenetic methods indicate that the virus was introduced into the human population in early December 2019 and has an epidemic doubling time of approximately seven days. Phylodynamic modelling provides an estimate of epidemic size through time. Precise estimates of epidemic size are not possible with current genetic data, but our analyses provide evidence of substantial heterogeneity in the number of secondary infections caused by each case, reflected in a high level of over-dispersion in the reproduction number. Larger numbers of more systematically sampled sequences, particularly from across China, will allow phylogenetic estimates of epidemic size and growth rate to be substantially refined.
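As a rough illustration of what a doubling time of about seven days implies, the sketch below converts a doubling time into an exponential growth rate using r = ln(2) / T_d. It assumes simple exponential growth, which is a simplification rather than the phylodynamic model used in the report, and the function names are hypothetical.

```python
import math


def growth_rate_from_doubling_time(doubling_time_days: float) -> float:
    """Return the per-day exponential growth rate r with exp(r * T_d) = 2."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_days


def relative_size(days_elapsed: float, doubling_time_days: float) -> float:
    """Return the fold increase in epidemic size after days_elapsed days.

    Assumes unconstrained exponential growth (an illustrative assumption).
    """
    return 2.0 ** (days_elapsed / doubling_time_days)


# Doubling time of roughly seven days, as stated in the abstract.
r = growth_rate_from_doubling_time(7.0)
print(f"growth rate: {r:.3f} per day")
print(f"fold increase after 14 days: {relative_size(14.0, 7.0):.1f}")
```

Under this assumption a seven-day doubling time corresponds to a growth rate of just under 0.1 per day, and two doubling periods give a fourfold increase.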
Volz Erik, Baguelin Marc, Bhatia Sangeeta, Boonyasiri Adhiratha, Cori Anne, Cucunubá Zulma, Cuomo-Dannenburg Gina, Donnelly Christl A, Dorigatti Ilaria, Fitzjohn Rich, Fu Han, Gaythorpe Katy, Ghani Azra, Hamlet Arran, Hinsley Wes, Imai Natsuko, Laydon Daniel, Nedjati-Gilani Gemma, Okell Lucy, Riley Steven, Van Elsland Sabine, Wang Haowei, Wang Yuanrong, Xi Xiaoyue, Ferguson Neil M. Report 5: Phylogenetic analysis of SARS-CoV-2