Fengzhu Sun






Stackpole ML, Zeng W, Li S, Liu CC, Zhou Y, He S, Yeh A, Wang Z, Sun F, Li Q, Yuan Z, Yildirim A, Chen PJ, Winograd P, Tran B, Lee YT, Li PS, Noor Z, Yokomizo M, Ahuja P, Zhu Y, Tseng HR, Tomlinson JS, Garon E, French S, Magyar CE, Dry S, Lajonchere C, Geschwind D, Choi G, Saab S, Alber F, Wong WH, Dubinett SM, Aberle DR, Agopian V, Han SB, Ni X, Li W, Zhou XJ. (2022). Cost-effective methylome sequencing of cell-free DNA for accurately detecting and locating cancer. Nature communications, 13(1)

Jia JA, Zhang S, Bai X, Fang M, Chen S, Liang X, Zhu S, Wong DK, Zhang A, Feng J, Sun F, Gao C. (2022). Sparse logistic regression revealed the associations between HBV PreS quasispecies and hepatocellular carcinoma. Virology journal, 19(1)

Du Y, Sun F. (2022). HiFine: integrating Hi-C-based and shotgun-based methods to reFine binning of metagenomic contigs. Bioinformatics (Oxford, England)

Du Y, Sun F. (2022). HiCBin: binning metagenomic contigs and recovering metagenome-assembled genomes using Hi-C contact maps. Genome biology, 23(1)

Tang T, Hou S, Fuhrman JA, Sun F. (2022). Phage-bacterial contig association prediction with a convolutional neural network. Bioinformatics (Oxford, England), 38(Suppl 1)

An S, Ren J, Sun F, Wan L. (2022). A New Context Tree Inference Algorithm for Variable Length Markov Chain Model with Applications to Biological Sequence Analyses. Journal of computational biology : a journal of computational molecular cell biology

Pevzner P, Vingron M, Reidys C, Sun F, Istrail S. (2022). Michael Waterman's Contributions to Computational Biology and Bioinformatics. Journal of computational biology : a journal of computational molecular cell biology

Zuo W, Wang B, Bai X, Luan Y, Fan Y, Michail S, Sun F. (2022). 16S rRNA and metagenomic shotgun sequencing data revealed consistent patterns of gut microbiome signature in pediatric ulcerative colitis. Scientific reports, 12(1)

Zuo W, Michail S, Sun F. (2022). Metagenomic Analyses of Multiple Gut Datasets Revealed the Association of Phage Signatures in Colorectal Cancer. Frontiers in cellular and infection microbiology, (12)

Gao Y, Zhu Z, Sun F. (2022). Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data. Synthetic and systems biotechnology, 7(1)

Istrail S, Pevzner P, Sun F, Vingron M. (2022). Special Issue: Professor Michael Waterman's 80th Birthday, Part 1. Journal of computational biology : a journal of computational molecular cell biology

Du Y, Laperriere SM, Fuhrman J, Sun F. (2022). Normalizing Metagenomic Hi-C Data and Detecting Spurious Contacts Using Zero-Inflated Negative Binomial Regression. Journal of computational biology : a journal of computational molecular cell biology

Bai X, Ren J, Sun F. (2022). MLR-OOD: a Markov chain based Likelihood Ratio method for Out-Of-Distribution detection of genomic sequences. Journal of molecular biology

Li X, Wang X, Huang R, Stucky A, Chen X, Sun L, Wen Q, Zeng Y, Fletcher H, Wang C, Xu Y, Cao H, Sun F, Li SC, Zhang X, Zhong JF. (2022). The Machine-Learning-Mediated Interface of Microbiome and Genetic Risk Stratification in Neuroblastoma Reveals Molecular Pathways Related to Patient Survival. Cancers, 14(12)

Jiao Z, Lai Y, Kang J, Gong W, Ma L, Jia T, Xie C, Xiang S, Cheng W, Heinz A, Desrivières S, Schumann G, Sun F, Feng J. (2022). A model-based approach to assess reproducibility for large-scale high-throughput MRI-based studies. NeuroImage, (255)

Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. (2022). Critical Assessment of Metagenome Interpretation: the second round of challenges. Nature methods, 19(4)

Wang Y, Sun F, Lin W, Zhang S. (2022). AC-PCoA: Adjustment for confounding factors using principal coordinate analysis. PLoS computational biology, 18(7)

Ning K, Duffy BA, Franklin M, Matloff W, Zhao L, Arzouni N, Sun F, Toga AW. (2021). Improving brain age estimates with deep learning leads to identification of novel genetic factors associated with brain aging. Neurobiology of aging, (105)

Wang Z, Li S, You R, Zhu S, Zhou XJ, Sun F. (2021). ARG-SHINE: improve antibiotic resistance class prediction by integrating sequence homology, functional information and deep convolutional neural network. NAR genomics and bioinformatics, 3(3)

Zhu Z, Fan Y, Kong Y, Lv J, Sun F. (2021). DeepLINK: Deep learning inference using knockoffs with applications to genomics. Proceedings of the National Academy of Sciences of the United States of America, 118(36)

Dong G, Feng J, Sun F, Chen J, Zhao XM. (2021). A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome medicine, 13(1)

Rotondo-Trivette S, Wang B, Gayer C, Parsana R, Luan Y, Sun F, Michail S. (2021). Decreased secondary faecal bile acids in children with ulcerative colitis and Clostridioides difficile infection. Alimentary pharmacology & therapeutics

Rotondo-Trivette S, Wang B, Luan Y, Fiehn O, Sun F, Michail S. (2021). Reduced fecal short-chain fatty acids in hispanic children with ulcerative colitis. Physiological reports, 9(14)

Ning K, Zhao L, Matloff W, Sun F, Toga AW. (2020). Association of relative brain age with tobacco smoking, alcohol consumption, and genetic variants. Scientific reports, 10(1)

Bai X, Ren J, Fan Y, Sun F. (2020). KIMI: Knockoff Inference for Motif Identification from molecular sequences with controlled false discovery rate. Bioinformatics (Oxford, England)

Wang Y, Chen Q, Deng C, Zheng Y, Sun F. (2020). KmerGO: A Tool to Identify Group-Specific Sequences With k-mers. Frontiers in microbiology, (11)

Wan L, Kang X, Ren J, Sun F. (2020). Confidence intervals for Markov chain transition probabilities based on next generation sequencing reads data. Quantitative biology (Beijing, China), 8(2)

Ning K, Zhao L, Franklin M, Matloff W, Batta I, Arzouni N, Sun F, Toga AW. (2020). Parity is associated with cognitive function and brain age in both females and males. Scientific reports, 10(1)

Lu YY, Bai J, Wang Y, Wang Y, Sun F. (2020). CRAFT: Compact genome Representation towards large-scale Alignment-Free daTabase. Bioinformatics (Oxford, England)

Chen P, Li S, Li W, Ren J, Sun F, Liu R, Zhou XJ. (2020). Rapid diagnosis and comprehensive bacteria profiling of sepsis based on cell-free DNA. Journal of translational medicine, 18(1)

Ren J, Song K, Deng C, Ahlgren NA, Fuhrman JA, Li Y, Xie X, Poplin R, Sun F. (2020). Identifying viruses from metagenomic data using deep learning. Quantitative biology (Beijing, China), 8(1)

Wang W, Ren J, Tang K, Dart E, Ignacio-Espinoza JC, Fuhrman JA, Braun J, Sun F, Ahlgren NA. (2020). A network-based integrated framework for predicting virus-prokaryote interactions. NAR genomics and bioinformatics, 2(2)

Zhang F, Sun F, Luan Y. (2019). Statistical significance approximation for local similarity analysis of dependent time series data. BMC bioinformatics, 20(1)

Zielezinski A, Girgis HZ, Bernard G, Leimeister CA, Tang K, Dencker T, Lau AK, Röhling S, Choi JJ, Waterman MS, Comin M, Kim SH, Vinga S, Almeida JS, Chan CX, James BT, Sun F, Morgenstern B, Karlowski WM. (2019). Benchmarking of alignment-free sequence comparison methods. Genome biology, 20(1)

Song K, Ren J, Sun F. (2019). Reads Binning Improves Alignment-Free Metagenome Comparison. Frontiers in genetics, (10)

You R, Yao S, Xiong Y, Huang X, Sun F, Mamitsuka H, Zhu S. (2019). NetGO: improving large-scale protein function prediction with massive network information. Nucleic acids research, 47(W1)

Chen S, Chen Y, Sun F, Waterman MS, Zhang X. (2019). A new statistic for efficient detection of repetitive sequences. Bioinformatics (Oxford, England), 35(22)

Wang Z, Wang Z, Lu YY, Sun F, Zhu S. (2019). SolidBin: improving metagenome binning with semi-supervised normalized cut. Bioinformatics (Oxford, England), 35(21)

Wang Z, Wang Y, Fuhrman JA, Sun F, Zhu S. (2020). Assessment of metagenomic assemblers based on hybrid reads of real and simulated metagenomic sequences. Briefings in bioinformatics, 21(3)

Zhu Z, Ren J, Michail S, Sun F. (2019). MicroPro: using metagenomic unmapped reads to provide insights into human microbiota and disease associations. Genome biology, 20(1)

Tang K, Ren J, Sun F. (2019). Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression. Genome biology, 20(1)

Nusbaum DJ, Sun F, Ren J, Zhu Z, Ramsy N, Pervolarakis N, Kunde S, England W, Gao B, Fiehn O, Michail S, Whiteson K. (2018). Gut microbial and metabolomic profiles after fecal microbiota transplantation in pediatric ulcerative colitis patients. FEMS microbiology ecology, 94(9)

Tang K, Lu YY, Sun F. (2018). Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer. Frontiers in microbiology, (9)

Tang K, Ren J, Cronn R, Erickson DL, Milligan BG, Parker-Forney M, Spouge JL, Sun F. (2018). Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA. BMC Genomics, 19(1)

Li H, Sun F. (2018). Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences. Scientific reports, 8(1)

Wang Y, Fu L, Ren J, Yu Z, Chen T, Sun F. (2018). Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures. Frontiers in microbiology, (9)

Bai X, Jia JA, Fang M, Chen S, Liang X, Zhu S, Zhang S, Feng J, Sun F, Gao C. (2018). Deep sequencing of HBV pre-S region reveals high heterogeneity of HBV genotypes and associations of word pattern frequencies with HCC. PLoS genetics, 14(2)

Ren J, Bai X, Lu YY, Tang K, Wang Y, Reinert G, Sun F. (2018). Alignment-Free Sequence Analysis and Applications. Annual review of biomedical data science, (1)

Wang Y, Wang K, Lu YY, Sun F. (2017). Improving contig binning of metagenomic data using $d_2^S$ oligonucleotide frequency dissimilarity. BMC Bioinformatics, 18(1)

Zhang M, Yang L, Ren J, Ahlgren NA, Fuhrman JA, Sun F. (2017). Prediction of virus-host infectious association by supervised learning methods. BMC Bioinformatics, 18(Suppl 3)

Lu YY, Lv J, Fuhrman JA, Sun F. (2017). Towards enhanced and interpretable clustering/classification in integrative genomics. Nucleic acids research, 45(20)

Lu YY, Tang K, Ren J, Fuhrman JA, Waterman MS, Sun F. (2017). CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic acids research, 45(W1)

Bai X, Tang K, Ren J, Waterman M, Sun F. (2017). Optimal choice of word length when comparing two Markov sequences using a chi-square statistic. BMC Genomics, 18(Suppl 6)

Liao W, Ren J, Wang K, Wang S, Zeng F, Wang Y, Sun F. (2016). Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains. Scientific reports, (6)

Zhang W, Coba MP, Sun F. (2016). Inference of domain-disease associations from domain-protein, protein-disease and disease-disease relationships. BMC systems biology, (10 Suppl 1)

Lu YY, Chen T, Fuhrman JA, Sun F. (2017). COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge. Bioinformatics (Oxford, England), 33(6)

Ahlgren NA, Ren J, Lu YY, Fuhrman JA, Sun F. (2017). Alignment-free $d_2^*$ oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic acids research, 45(1)

Xia LC, Ai D, Cram JA, Liang X, Fuhrman JA, Sun F. (2015). Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains. BMC Bioinformatics, (16)

Wang W, Zhou X, Liu Z, Sun F. (2015). Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics, (16 Suppl 1)

Chen Q, Zhou XJ, Sun F. (2015). Finding genetic overlaps among diseases based on ranked gene lists. Journal of computational biology : a journal of computational molecular cell biology, 22(2)

Wang Y, Liu L, Chen L, Chen T, Sun F. (2014). Comparison of metatranscriptomic samples based on k-tuple frequencies. PLoS One, 9(1)

Song K, Ren J, Reinert G, Deng M, Waterman MS, Sun F. (2014). New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. Briefings in bioinformatics, 15(3)

Ma X, Chen T, Sun F. (2014). Integrative approaches for predicting protein function and prioritizing genes for complex phenotypes using protein interaction networks. Briefings in bioinformatics, 15(5)

Chen Q, Sun F. (2013). A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics, (14 Suppl 1)

Sun F. (2013). Research in Computational Molecular Biology (RECOMB 2013). Journal of computational biology : a journal of computational molecular cell biology, 20(10)

Song K, Ren J, Zhai Z, Liu X, Deng M, Sun F. (2013). Alignment-free sequence comparison based on next-generation sequencing reads. Journal of computational biology : a journal of computational molecular cell biology, 20(2)

Wan L, Sun F. (2012). CEDER: accurate detection of differentially expressed genes by combining significance of exons using RNA-Seq. IEEE/ACM transactions on computational biology and bioinformatics, 9(5)

Xia LC, Ai D, Cram J, Fuhrman JA, Sun F. (2013). Efficient statistical significance approximation for local similarity analysis of high-throughput time series data. Bioinformatics (Oxford, England), 29(2)

Wan L, Yan X, Chen T, Sun F. (2012). Modeling RNA degradation for RNA-Seq with applications. Biostatistics (Oxford, England), 13(4)

Zhai Z, Reinert G, Song K, Waterman MS, Luan Y, Sun F. (2012). Normal and compound poisson approximations for pattern occurrences in NGS reads. Journal of computational biology : a journal of computational molecular cell biology, 19(6)

Chang Q, Luan Y, Chen T, Fuhrman JA, Sun F. (2012). Computational methods for the analysis of tag sequences in metagenomics studies. Frontiers in bioscience (Scholar edition), (4)

Chang Q, Luan Y, Sun F. (2011). Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny. BMC Bioinformatics, (12)

Steele JA, Countway PD, Xia L, Vigil PD, Beman JM, Kim DY, Chow CE, Sachdeva R, Jones AC, Schwalbach MS, Rose JM, Hewson I, Patel A, Sun F, Caron DA, Fuhrman JA. (2011). Marine bacterial, archaeal and protistan association networks reveal ecological linkages. The ISME journal, 5(9)

Xia LC, Steele JA, Cram JA, Cardon ZG, Simmons SL, Vallino JJ, Fuhrman JA, Sun F. (2011). Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates. BMC systems biology, (5 Suppl 2)

Xia LC, Cram JA, Chen T, Fuhrman JA, Sun F. (2011). Accurate genome relative abundance estimation based on shotgun metagenomic reads. PLoS One, 6(12)

Liu X, Wan L, Li J, Reinert G, Waterman MS, Sun F. (2011). New powerful statistics for alignment-free sequence comparison under a pattern transfer model. Journal of theoretical biology, 284(1)

Zhai Z, Ku SY, Luan Y, Reinert G, Waterman MS, Sun F. (2010). The power of detecting enriched patterns: an HMM approach. Journal of computational biology : a journal of computational molecular cell biology, 17(4)

Zhou L, Ma X, Arbeitman MN, Sun F. (2009). Chromatin regulation and gene centrality are essential for controlling fitness pleiotropy in yeast. PLoS One, 4(11)

Wang L, Tu Z, Sun F. (2009). A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila. BMC Genomics, (10)

Wang W, Nunez-Iglesias J, Luan Y, Sun F. (2009). Usefulness and limitations of dK random graph models to predict interactions and functional homogeneity in biological networks under a pseudo-likelihood parameter estimation approach. BMC Bioinformatics, (10)

Zhou L, Ma X, Sun F. (2008). The effects of protein interactions, gene essentiality and regulatory regions on expression variation. BMC systems biology, (2)

Yan X, Sun F. (2008). Testing gene set enrichment for subset of genes: Sub-GSE. BMC Bioinformatics, (9)

Ruan Q, Steele JA, Schwalbach MS, Fuhrman JA, Sun F. (2006). A dynamic programming algorithm for binning microbial community profiles. Bioinformatics (Oxford, England), 22(12)

Tu Z, Wang L, Xu M, Zhou X, Chen T, Sun F. (2006). Further understanding human disease genes by comparing with housekeeping genes and other genes. BMC Genomics, (7)

Tu Z, Wang L, Arbeitman MN, Chen T, Sun F. (2006). An integrative approach for causal gene identification and gene regulatory pathway inference. Bioinformatics (Oxford, England), 22(14)

Jiang R, Tu Z, Chen T, Sun F. (2006). Network motif identification in stochastic networks. Proceedings of the National Academy of Sciences of the United States of America, 103(25)

Ma X, Lee H, Wang L, Sun F. (2007). CGI: a new approach for prioritizing genes by combining gene expression and protein-protein interaction data. Bioinformatics (Oxford, England), 23(2)

Ruan Q, Dutta D, Schwalbach MS, Steele JA, Fuhrman JA, Sun F. (2006). Local similarity analysis reveals unique associations among marine bacterioplankton species and environmental factors. Bioinformatics (Oxford, England), 22(20)

Zhang K, Sun F. (2005). Assessing the power of tag SNPs in the mapping of quantitative trait loci (QTL) with extremal and random samples. BMC genetics, (6)

Zhang K, Qin Z, Chen T, Liu JS, Waterman MS, Sun F. (2005). HapBlock: haplotype block partitioning and tag SNP selection software using a set of dynamic programming algorithms. Bioinformatics (Oxford, England), 21(1)

Lai Y, Sun F. (2004). Sampling distribution for microsatellites amplified by PCR: mean field approximation and its applications to genotyping. Journal of theoretical biology, 228(2)

Zhang K, Qin ZS, Liu JS, Chen T, Waterman MS, Sun F. (2004). Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies. Genome research, 14(5)

Deng M, Chen T, Sun F. (2004). An integrated probabilistic model for functional prediction of proteins. Journal of computational biology : a journal of computational molecular cell biology, 11(2-3)

Kim S, Zhang K, Sun F. (2003). Detecting susceptibility genes in case-control studies using set association. BMC genetics, (4 Suppl 1)

Lai Y, Sun F. (2003). Microsatellite mutations during the polymerase chain reaction: mean field approximations and their applications. Journal of theoretical biology, 224(1)

Sun F, Cui J, Gavras H, Schwartz F. (2003). A novel class of tests for the detection of mitochondrial DNA-mutation involvement in diseases. American journal of human genetics, 72(6)

Lai Y, Sun F. (2003). The relationship between microsatellite slippage mutation rate and the number of repeat units. Molecular biology and evolution, 20(12)

Lai Y, Shinde D, Arnheim N, Sun F. (2003). The mutation process of microsatellites during the polymerase chain reaction. Journal of computational biology : a journal of computational molecular cell biology, 10(2)

Deng M, Zhang K, Mehta S, Chen T, Sun F. (2003). Prediction of protein function using protein-protein interaction data. Journal of computational biology : a journal of computational molecular cell biology, 10(6)

Zhang K, Deng M, Chen T, Waterman MS, Sun F. (2002). A dynamic programming algorithm for haplotype block partitioning. Proceedings of the National Academy of Sciences of the United States of America, 99(11)

Zhang K, Calabrese P, Nordborg M, Sun F. (2002). Haplotype block structure and its applications to association studies: power and study designs. American journal of human genetics, 71(6)
