
INFERENCE OF MARKOVIAN PROPERTIES OF MOLECULAR SEQUENCES USING SHOTGUN READS AND APPLICATIONS
National Science Foundation, Division of Mathematical Sciences (DMS)
People: Fengzhu Sun
2015 – 2019
National Science Foundation, Division of Mathematical Sciences (DMS)
People: Fengzhu Sun
2011 – 2016
National Institutes of Health, National Institute of General Medical Sciences
People: Nathan Ahlgren, Fengzhu Sun
2017 – 2021
Tang K, Lu YY, Sun F. (2018). Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer. Frontiers in microbiology
Tang K, Ren J, Cronn R, Erickson DL, Milligan BG, Parker-Forney M, Spouge JL, Sun F. (2018). Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA. BMC Genomics, 19(1)
Li H, Sun F. (2018). Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences. Scientific reports, 8(1)
Wang Y, Fu L, Ren J, Yu Z, Chen T, Sun F. (2018). Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures. Frontiers in microbiology
Wang Y, Wang K, Lu YY, Sun F. (2017). Improving contig binning of metagenomic data using $d_2^S$ oligonucleotide frequency dissimilarity. BMC Bioinformatics, 18(1)
Lu YY, Lv J, Fuhrman JA, Sun F. (2017). Towards enhanced and interpretable clustering/classification in integrative genomics. Nucleic acids research, 45(20)
Lu YY, Chen T, Fuhrman JA, Sun F. (2017). COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge. Bioinformatics (Oxford, England), 33(6)
Zhang M, Yang L, Ren J, Ahlgren NA, Fuhrman JA, Sun F. (2017). Prediction of virus-host infectious association by supervised learning methods. BMC Bioinformatics, 18(Suppl 3)
Ahlgren NA, Ren J, Lu YY, Fuhrman JA, Sun F. (2017). Alignment-free $d_2^*$ oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic acids research, 45(1)
Lu YY, Tang K, Ren J, Fuhrman JA, Waterman MS, Sun F. (2017). CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic acids research, 45(W1)
Bai X, Tang K, Ren J, Waterman M, Sun F. (2017). Optimal choice of word length when comparing two Markov sequences using a $\chi^2$-statistic. BMC Genomics, 18(Suppl 6)
Liao W, Ren J, Wang K, Wang S, Zeng F, Wang Y, Sun F. (2016). Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains. Scientific reports
Zhang W, Coba MP, Sun F. (2016). Inference of domain-disease associations from domain-protein, protein-disease and disease-disease relationships. BMC systems biology
Xia LC, Ai D, Cram JA, Liang X, Fuhrman JA, Sun F. (2015). Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains. BMC Bioinformatics
Wang W, Zhou X, Liu Z, Sun F. (2015). Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics
Chen Q, Zhou XJ, Sun F. (2015). Finding genetic overlaps among diseases based on ranked gene lists. Journal of computational biology : a journal of computational molecular cell biology, 22(2)
Wang Y, Liu L, Chen L, Chen T, Sun F. (2014). Comparison of metatranscriptomic samples based on k-tuple frequencies. PLoS One, 9(1)
Song K, Ren J, Reinert G, Deng M, Waterman MS, Sun F. (2014). New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. Briefings in bioinformatics, 15(3)
Ma X, Chen T, Sun F. (2014). Integrative approaches for predicting protein function and prioritizing genes for complex phenotypes using protein interaction networks. Briefings in bioinformatics, 15(5)
Chen Q, Sun F. (2013). A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics
Xia LC, Ai D, Cram J, Fuhrman JA, Sun F. (2013). Efficient statistical significance approximation for local similarity analysis of high-throughput time series data. Bioinformatics (Oxford, England), 29(2)
Sun F. (2013). Research in Computational Molecular Biology (RECOMB 2013). Journal of computational biology : a journal of computational molecular cell biology, 20(10)
Song K, Ren J, Zhai Z, Liu X, Deng M, Sun F. (2013). Alignment-free sequence comparison based on next-generation sequencing reads. Journal of computational biology : a journal of computational molecular cell biology, 20(2)
Zhai Z, Reinert G, Song K, Waterman MS, Luan Y, Sun F. (2012). Normal and compound Poisson approximations for pattern occurrences in NGS reads. Journal of computational biology : a journal of computational molecular cell biology, 19(6)
Wan L, Yan X, Chen T, Sun F. (2012). Modeling RNA degradation for RNA-Seq with applications. Biostatistics (Oxford, England), 13(4)
Chang Q, Luan Y, Chen T, Fuhrman JA, Sun F. (2012). Computational methods for the analysis of tag sequences in metagenomics studies. Frontiers in bioscience (Scholar edition)
Wan L, Sun F. (2012). CEDER: accurate detection of differentially expressed genes by combining significance of exons using RNA-Seq. IEEE/ACM transactions on computational biology and bioinformatics, 9(5)
Chang Q, Luan Y, Sun F. (2011). Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny. BMC Bioinformatics
Xia LC, Steele JA, Cram JA, Cardon ZG, Simmons SL, Vallino JJ, Fuhrman JA, Sun F. (2011). Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates. BMC systems biology
Liu X, Wan L, Li J, Reinert G, Waterman MS, Sun F. (2011). New powerful statistics for alignment-free sequence comparison under a pattern transfer model. Journal of theoretical biology, 284(1)
Xia LC, Cram JA, Chen T, Fuhrman JA, Sun F. (2011). Accurate genome relative abundance estimation based on shotgun metagenomic reads. PLoS One, 6(12)
Zhai Z, Ku SY, Luan Y, Reinert G, Waterman MS, Sun F. (2010). The power of detecting enriched patterns: an HMM approach. Journal of computational biology : a journal of computational molecular cell biology, 17(4)
Zhou L, Ma X, Arbeitman MN, Sun F. (2009). Chromatin regulation and gene centrality are essential for controlling fitness pleiotropy in yeast. PLoS One, 4(11)
Wang L, Tu Z, Sun F. (2009). A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila. BMC Genomics
Wang W, Nunez-Iglesias J, Luan Y, Sun F. (2009). Usefulness and limitations of dK random graph models to predict interactions and functional homogeneity in biological networks under a pseudo-likelihood parameter estimation approach. BMC Bioinformatics
Zhou L, Ma X, Sun F. (2008). The effects of protein interactions, gene essentiality and regulatory regions on expression variation. BMC systems biology
Yan X, Sun F. (2008). Testing gene set enrichment for subset of genes: Sub-GSE. BMC Bioinformatics
Ma X, Lee H, Wang L, Sun F. (2007). CGI: a new approach for prioritizing genes by combining gene expression and protein-protein interaction data. Bioinformatics (Oxford, England), 23(2)
Tu Z, Wang L, Xu M, Zhou X, Chen T, Sun F. (2006). Further understanding human disease genes by comparing with housekeeping genes and other genes. BMC Genomics
Jiang R, Tu Z, Chen T, Sun F. (2006). Network motif identification in stochastic networks. Proceedings of the National Academy of Sciences of the United States of America, 103(25)
Ruan Q, Dutta D, Schwalbach MS, Steele JA, Fuhrman JA, Sun F. (2006). Local similarity analysis reveals unique associations among marine bacterioplankton species and environmental factors. Bioinformatics (Oxford, England), 22(20)
Tu Z, Wang L, Arbeitman MN, Chen T, Sun F. (2006). An integrative approach for causal gene identification and gene regulatory pathway inference. Bioinformatics (Oxford, England), 22(14)
Ruan Q, Steele JA, Schwalbach MS, Fuhrman JA, Sun F. (2006). A dynamic programming algorithm for binning microbial community profiles. Bioinformatics (Oxford, England), 22(12)
Zhang K, Sun F. (2005). Assessing the power of tag SNPs in the mapping of quantitative trait loci (QTL) with extremal and random samples. BMC genetics
Zhang K, Qin Z, Chen T, Liu JS, Waterman MS, Sun F. (2005). HapBlock: haplotype block partitioning and tag SNP selection software using a set of dynamic programming algorithms. Bioinformatics (Oxford, England), 21(1)
Lai Y, Sun F. (2004). Sampling distribution for microsatellites amplified by PCR: mean field approximation and its applications to genotyping. Journal of theoretical biology, 228(2)
Zhang K, Qin ZS, Liu JS, Chen T, Waterman MS, Sun F. (2004). Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies. Genome research, 14(5)
Deng M, Chen T, Sun F. (2004). An integrated probabilistic model for functional prediction of proteins. Journal of computational biology : a journal of computational molecular cell biology, 11(2-3)
Kim S, Zhang K, Sun F. (2003). Detecting susceptibility genes in case-control studies using set association. BMC genetics
Lai Y, Sun F. (2003). Microsatellite mutations during the polymerase chain reaction: mean field approximations and their applications. Journal of theoretical biology, 224(1)
Lai Y, Sun F. (2003). The relationship between microsatellite slippage mutation rate and the number of repeat units. Molecular biology and evolution, 20(12)
Lai Y, Shinde D, Arnheim N, Sun F. (2003). The mutation process of microsatellites during the polymerase chain reaction. Journal of computational biology : a journal of computational molecular cell biology, 10(2)
Deng M, Zhang K, Mehta S, Chen T, Sun F. (2003). Prediction of protein function using protein-protein interaction data. Journal of computational biology : a journal of computational molecular cell biology, 10(6)
Sun F, Cui J, Gavras H, Schwartz F. (2003). A novel class of tests for the detection of mitochondrial DNA-mutation involvement in diseases. American journal of human genetics, 72(6)
Zhang K, Deng M, Chen T, Waterman MS, Sun F. (2002). A dynamic programming algorithm for haplotype block partitioning. Proceedings of the National Academy of Sciences of the United States of America, 99(11)
Zhang K, Calabrese P, Nordborg M, Sun F. (2002). Haplotype block structure and its applications to association studies: power and study designs. American journal of human genetics, 71(6)