HCV-hepatitis C virus, HLA-human leukocyte antigen, CTL-cytotoxic T lymphocyte, NS5B-nonstructural protein 5B, MSA-multiple sequence alignment, PEG-IFN-pegylated interferon.
Hepatitis C virus (HCV) afflicts 170 million people worldwide, 2%-3% of the global population, and kills 350 000 each year. Prophylactic vaccination offers the most realistic and cost effective hope of controlling this epidemic in the developing world where expensive drug therapies are not available. Despite 20 years of research, the high mutability of the virus and lack of knowledge of what constitutes effective immune responses have impeded development of an effective vaccine. Coupling data mining of sequence databases with spin glass models from statistical physics, we have developed a computational approach to translate clinical sequence databases into empirical fitness landscapes quantifying the replicative capacity of the virus as a function of its amino acid sequence. These landscapes explicitly connect viral genotype to phenotypic fitness, and reveal vulnerable immunological targets within the viral proteome that can be exploited to rationally design vaccine immunogens. We have recovered the empirical fitness landscape for the HCV RNA-dependent RNA polymerase (protein NS5B) responsible for viral genome replication, and validated the predictions of our model by demonstrating excellent accord with experimental measurements and clinical observations. We have used our landscapes to perform exhaustive in silico screening of 16.8 million T-cell immunogen candidates to identify 86 optimal formulations. By reducing the search space of immunogen candidates by over five orders of magnitude, our approach can offer valuable savings in time, expense, and labor for experimental vaccine development and accelerate the search for a HCV vaccine.