Genomes contain the complete set of instructions for building an organism. Structural variants are rearrangements of the genome, such as insertions and deletions, and their discovery advances our understanding of how species evolve and adapt. Recent advances in high-throughput sequencing technologies have produced vast quantities of genomic data. Because structural variants are rare and the sequencing data used to detect them are noisy, fast and robust algorithms are needed to identify them. This research will contribute fundamentally to optimization methods for large-scale problems in computational genomics. The resulting algorithms will be made publicly available for use within and beyond the biology, mathematics, and computer science communities. Graduate students will be trained in scientific research and programming through this interdisciplinary project, and participation by students from under-represented backgrounds will be strongly encouraged.

The research objective of this award is to develop computational tools for large-scale, data-driven problems arising in computational genomics. These problems are especially difficult to solve because they are high-dimensional and the data are noisy and inexact. This study will exploit known relationships among sequenced genomes to improve the accuracy of identifying genomic variants in population studies where coverage is low and multiple related individuals are sequenced. Specifically, the proposed research will (i) explore statistical models that describe the presence of structural variants in genomes, (ii) develop and implement novel sparse optimization methods for detecting genomic structural variants, and (iii) validate these methods on existing genomic data sets and use them to predict variants in new data.
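To illustrate what "sparse optimization" means in item (ii), here is a minimal, hypothetical sketch. It is not the method developed in this project. It assumes a linear model y ≈ A f, where y is a hypothetical vector of per-locus observations, A is a hypothetical design matrix, and f is a nonnegative signal that is mostly zero because structural variants are rare. The sketch minimizes ½‖Af − y‖² + τ‖f‖₁ subject to f ≥ 0 using the iterative shrinkage-thresholding algorithm (ISTA), a generic proximal-gradient method chosen only for illustration. The names `sparse_recover`, `tau`, and `step` are illustrative.

```python
# Hypothetical illustration: nonnegative l1-regularized least squares solved
# with ISTA (proximal gradient). Not the project's actual model or solver.


def sparse_recover(A, y, tau, step=None, iters=500):
    """Approximately minimize 0.5*||A f - y||^2 + tau*sum(f) subject to f >= 0.

    A is a list of m rows, each a list of n floats; y is a list of m floats.
    On the feasible set f >= 0, sum(f) equals the l1 norm of f.
    """
    m, n = len(A), len(A[0])
    if step is None:
        # ||A||_2^2 <= ||A||_F^2, so 1 / ||A||_F^2 is a safe (conservative)
        # step size for the gradient of the least-squares term.
        step = 1.0 / sum(a * a for row in A for a in row)
    f = [0.0] * n
    for _ in range(iters):
        # Residual r = A f - y.
        r = [sum(A[i][j] * f[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the smooth term: A^T r.
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by the prox of tau*|x| restricted to x >= 0:
        # shift down by step*tau and clip at zero, so weak signals become
        # exactly zero.
        f = [max(f[j] - step * grad[j] - step * tau, 0.0) for j in range(n)]
    return f


if __name__ == "__main__":
    # Toy example: identity design, two strong signals among weak noise.
    A = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
    y = [0.1, 3.0, 0.05, 0.2, 2.5]
    f = sparse_recover(A, y, tau=0.5)
    print("estimate:", f)
    print("support:", [j for j, v in enumerate(f) if v > 0])
```

With the identity design in the toy example, the minimizer simply subtracts τ = 0.5 from each observation and clips at zero. The weak entries (0.1, 0.05, 0.2) are therefore set exactly to zero, and the two strong entries converge to about 2.5 and 2.0. The ℓ1 penalty is what produces exact zeros, which matches the prior that variants are rare. A realistic model would also have to account for the low-coverage, count-valued nature of sequencing data and for relationships among individuals, and this sketch does not attempt to do so.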
Div Of Information & Intelligent Systems