
AN ALIGNMENT FREE NETWORK APPROACH TO ANALYZING HIGHLY RECOMBINANT MALARIA PARASI

Abstract

Many of the most important human pathogens, including the malaria parasite, HIV, and the pneumococcus, are characterized by extensive genetic diversity generated by recombination. To design effective vaccines and long-lasting drug regimens, it is critical that we understand how this diversity relates to the epidemiological dynamics of disease in human populations. While high-throughput sequencing techniques are generating vast volumes of genomic sequence data from these pathogens, the analytical tools capable of making sense of them are severely limited. One of the most pressing problems is the lack of tools that can handle these pathogens' high rates of recombination at the scale of these datasets. This project is an interdisciplinary collaboration between computer scientists and malaria biologists, bringing together deep expertise in network analysis, computational methods, epidemiology, evolutionary biology, and malaria to develop a new suite of scalable, general computational tools for visualizing and analyzing recombinant gene sequences, using the malaria parasite as a case study. Drawing on recent advances in network science, the project will develop novel methods for accurately inferring "recombination networks" from sequence data, automatically identifying statistically significant "clusters" in these networks, and testing their epidemiological significance. Our approach focuses on alignment-free analysis methods, which naturally accommodate the recombinant shuffling of sequences, allow for the analysis of structural features of the relationships between genes, and provide insights into the effects of recombination on their evolution.
In addition to answering important biological and epidemiological questions, this project will produce a novel open-source software platform that will enable researchers to analyze recombinant sequence data from a wide variety of important human pathogens.
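
To make the idea of an alignment-free network concrete, here is a minimal sketch in Python. It is not the project's method, which this abstract does not specify. It assumes, purely for illustration, that two sequences are linked when they share at least one exact substring of length k, and it uses connected components as a crude stand-in for statistically grounded cluster detection. The function names, the toy sequences, and the parameters `k` and `min_shared` are all hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_network(seqs, k=6, min_shared=1):
    """Link two sequences if they share at least min_shared k-mers.

    Returns a dict mapping (name_a, name_b) to the number of shared k-mers.
    No alignment is computed: only the presence of shared blocks matters,
    not their position in either sequence.
    """
    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        shared = len(profiles[a] & profiles[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components (illustrative clustering only)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        comps.append(comp)
    return comps


# Hypothetical toy data: s1 is a mosaic that shares its right half with s2
# and its left half with s3; s4 is unrelated to the others.
toy = {
    "s1": "ACGTACGGTTCA",
    "s2": "TTTTACGGTTCA",
    "s3": "ACGTACAAAAAA",
    "s4": "GGGGCCCCGGGG",
}

edges = shared_kmer_network(toy, k=6)
clusters = connected_components(toy, edges)
```

In this toy example, s2 and s3 share no 6-mers with each other, yet both are linked to the mosaic sequence s1, so all three fall in one component while s4 stands alone. A tree-based or alignment-based analysis would have difficulty representing that pattern, which is the kind of relationship a network representation is meant to capture.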

People

Funding

2013-2015