Reaching Others University at Buffalo - The State University of New York

Yijun Sun PhD

Department of Microbiology and Immunology

Assistant Professor in Bioinformatics

Specialty/Research Focus

Bacterial Pathogenesis; Bioinformatics; Eukaryotic Pathogenesis; Immunology; Virology

Professional Summary:

The recent development of high-throughput genomics technologies is revolutionizing many aspects of modern biology. However, the lack of computational algorithms and resources for analyzing massive data generated by these techniques has become a rate-limiting factor for scientific discoveries in biology research.

In my laboratory, we study machine learning, data mining and bioinformatics and their applications to cancer informatics and metagenomics. Our work is based on solid mathematical and statistical theories. The main focus of our research is on developing advanced algorithms to help biologists keep pace with the unprecedented growth of genomics datasets available today and enable them to make full use of their massive, high-dimensional data for various biological enquiries.

My research team is working on two major projects. The first is focused on metagenomics, currently funded by the National Institutes of Health (NIH), the National Science Foundation (NSF) and the Women’s Health Initiative. Our goal is to develop an integrated suite of computational and statistical algorithms to process millions or even hundreds of millions of microbial genome sequences to: 1) derive quantitative microbial signatures to characterize various infectious diseases, 2) interactively visualize the complex structure of a microbial community, 3) study microbe-microbe interactions and community dynamics and 4) identify novel species. We collaborate with researchers throughout the University at Buffalo, notably those in the School of Medicine and Biomedical Sciences, the School of Public Health and Health Professions and the College of Arts and Sciences.

The second project focuses on cancer progression modeling. We use advanced computational algorithms to integrate clinical and genetic data from thousands of tumor and normal tissue samples to build a model of cancer progression. Delineating the dynamic disease process and identifying the molecular events that drive stepwise progression to malignancy would provide a wealth of new insights. Results of this work would also guide the development of improved cancer diagnostics, prognostics and targeted therapeutics.

The bioinformatics algorithms and software developed in our lab have been used by more than 200 research institutes worldwide to process large, complex data sets that are core to a wide variety of biological and biomedical research.

Education and Training:
  • PhD, Electrical Engineering, University of Florida (2004)
  • MS, Electrical Engineering, University of Florida (2003)
  • BS, Electrical Engineering/Mechanical Engineering, Shanghai Jiao Tong University (1995)
  • Assistant Professor, Microbiology and Immunology, University at Buffalo, The State University of New York (2012–present)
  • Assistant Scientist, University of Florida, The Interdisciplinary Center for Biotechnology Research (2005–2012)
Awards and Honors:
  • Spotlight Paper in the September 2010 issue of IEEE Trans. on Pattern Analysis and Machine Intelligence (2010)
  • IEEE M. Barry Carlton Best Transactions Paper Award (2005)

Research Expertise:
  • Bioinformatics: metagenomics, sequence analysis, microarray data analysis, microbial community analysis, molecular classification and genetic network modeling for cancer diagnosis and prognosis, microbial network analysis, phylogenetic analysis
  • Machine learning: large margin classification/regression, large-scale clustering analysis, ensemble learning, feature selection/extraction, computational learning theory, network analysis, graphical modeling and Bayesian network
Research Centers:
  • Witebsky Center for Microbial Pathogenesis and Immunology
  • Center of Excellence in Bioinformatics and Life Sciences
UB 2020 Strategic Strengths:
  • Molecular Recognition in Biological Systems and Bioinformatics
Grants and Sponsored Research:
  • July 2014–July 2019
    Oral Microbiome and Periodontitis: A Prospective Study in Postmenopausal Women
    Role: Co-Investigator
  • February 2011–December 2016
    Microarray-Based Biomarkers in Juvenile Idiopathic Arthritis
    Role: Co-Investigator
  • July 2015–June 2016
    The Neonatal Microbiome Study: a Pilot Study in Meru County, Kenya
    UB-the Innovative Micro-Programs Accelerating Collaboration in Themes Program
    Role: Co-Investigator
  • January 2014–January 2016
    Derivation of molecular signatures for accurate prostate cancer prognosis using both annotated and non-annotated tissue samples
    SUNY Research Foundation
    Role: Principal Investigator
  • July 2014–June 2015
    Expression-Based Biomarkers in Cystic Fibrosis
    University at Buffalo Clinical and Translational Research Fund
    Role: Co-Investigator
  • June 2011–June 2015
    Advanced computational algorithms for deep interrogation of microbial communities using millions of 16S rRNA pyrosequences
    Role: Principal Investigator
  • June 2011–June 2013
    Derivation of molecular signatures for accurate breast cancer prognosis
    Bankhead-Coley Cancer Program
    Role: Principal Investigator
  • September 2007–October 2010
    Accurate breast cancer prognosis
    Susan Komen Breast Cancer Foundation
    Role: Co-Investigator

Journal Articles:
  • Sun Y, Cai Y, Mai V, Farmerie W, Yu F, Li J, Goodison S. Advanced computational algorithms for microbial community analysis using massive 16S rRNA sequence data. Nucleic Acids Research. 2010; 38(22).
  • Sun Y, Todorovic S, Goodison S. Local-learning-based feature selection for high-dimensional data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010; 32(9).
  • Sun Y, Cai Y, Liu L, Yu F, Farrell ML, McKendree W, Farmerie W. ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Research. 2009; 37(10).
  • Sun Y. Iterative RELIEF for feature weighting: algorithms, theories, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007; 29(6).
  • Lott K, Mukhopadhyay S, Li J, Wang J, Yao J, Sun Y, Qu J, Read L. Arginine methylation of DRBD18 differentially impacts its opposing effects on the trypanosome transcriptome. Nucleic Acids Research. 2015; 43(11).
  • Wang J, Sun Y, Gao X. Sparse structure regularized ranking. Multimedia Tools and Applications. 2015; 74(2).
  • Sen R, Raychoudhury R, Cai Y, Sun Y, Lietze VU, Peterson BF, Scharf ME, Boucias DG. Molecular signatures of nicotinoid-pathogen synergy in the termite gut. PLoS ONE. 2015; 10(4).
  • Yao J, Mao Q, Mai V, Goodison S, Sun Y. Feature selection for unsupervised learning through local learning. Pattern Recognition Letters. 2015; 53(1).
  • Sun Y, Yao J, Nowak N, Goodison S. Cancer progression modeling using static sample data. Genome Biology. 2014; 15(8).
  • Chung A, Li Q, Blair SJ, Jesus MD, Dennis KL, LeVea C, Yao J, Sun Y, Conway TF, Virtuoso LP, Battaglia NG, Furtado S, Mathiowitz E, Mantis NJ, Khazaie K, Egilmez N. Oral interleukin-10 alleviates polyposis via neutralization of pathogenic T-regulatory cells. Cancer Research. 2014.
  • Hickman D, Jones MK, Zhu S, Kirkpatrick E, Ostrov DA, Wang X, Ukhanova M, Sun Y, Mai V, Salemi M. The effect of malnutrition on norovirus infection. mBio. 2014; 5(2).
  • Wang J, Sun Y. From one graph to many: ensemble transduction for content-based database retrieval. Knowledge-Based Systems. 2014; 65.
  • Sen R, Raychoudhury R, Cai Y, Sun Y, Lietze VU, Boucias DG, Scharf ME. Differential impacts of juvenile hormone, soldier head extract and alternate caste phenotypes on host and symbiont transcriptome composition in the gut of the termite Reticulitermes flavipes. BMC Genomics. 2013; 14.
  • Correll MJ, Pyle TP, Millar K, Sun Y, Yao J, Edelmann RE, Kiss J. Transcriptome analyses of Arabidopsis thaliana seedlings grown in space: implications for gravity-responsive genes. Planta. 2013; 238(3).
  • Boucias DG, Cai Y, Sun Y, Lietze VU, Sen R, Raychoudhury R, Scharf ME. The hindgut-lumen prokaryotic microbiota of the termite Reticulitermes flavipes and its responses to dietary lignocellulose composition. Molecular Ecology. 2013; 22(7).
  • Zhang X, Yao J, Zhang Y, Sun Y, Mou Z. The Arabidopsis Mediator Complex Subunits MED14/SWP and MED16/SFR6/IEN1 Differentially Regulate Defense Gene Expression in Plant Immune Responses. The Plant Journal. 2013; 75(3).
  • Raychoudhury R, Sen R, Cai Y, Sun Y, Lietze VU, Boucias DG, Scharf ME. Comparative metatranscriptomic signatures of wood and paper feeding in the gut of the termite Reticulitermes flavipes (Isoptera: Rhinotermitidae). Insect Molecular Biology. 2013; 22(2).
  • Wang X, Yao J, Sun Y, Mai V. M-pick, a modularity-based method for OTU picking of 16S rRNA sequences. BMC Bioinformatics. 2013; 14(1).
  • Wang Y, An C, Zhang X, Yao J, Zhang Y, Sun Y, Yu F, Amador DM, Mou Z. The Arabidopsis elongator complex subunit2 epigenetically regulates plant immune responses. Plant Cell. 2013; 25(2).
  • Mai V, Torrazza RM, Ukhanova M, Wang X, Sun Y, Li N, Shuster J, Sharma R, Hudak ML, Neu J. Distortions in development of intestinal microbiota associated with late onset sepsis in preterm infants. PLoS ONE. 2013; 8(1).
  • Yin L, Liu L, Sun Y, Hou W, Lowe AC, Gardner BP, Salemi M, Williams WB, Farmerie WG, Sleasman JW, Goodenow MM. High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems. Retrovirology. 2012; 9.
  • Urquidi V, Goodison S, Cai Y, Sun Y, Rosser CJ. A candidate molecular biomarker panel for the detection of bladder cancer. Cancer Epidemiology, Biomarkers and Prevention. 2012; 21(12).
  • Zhang X, Wang C, Zhang Y, Sun Y, Mou Z. The Arabidopsis mediator complex subunit16 positively regulates salicylate-mediated systemic acquired resistance and jasmonate/ethylene-induced defense pathways. Plant Cell. 2012; 24(10).
  • Ukhanova M, Culpepper T, Baer D, Gordon D, Kanahori S, Valentine J, Neu J, Sun Y, Wang X, Mai V. Gut Microbiota Correlates with Energy Gain from a Dietary Fiber and Appears Associated with Acute and Chronic Intestinal Diseases. Clinical Microbiology and Infection. 2012.
  • Wang X, Cai Y, Sun Y, Knight R, Mai V. Secondary Structure Information Does not Improve OTU Picking for 16S rRNA Sequences. The ISME Journal. 2012; 6(7).
  • Paul AL, Zupanska AK, Ostrow DT, Zhang Y, Sun Y, Li JL, Shanker S, Farmerie WG, Amalfitano CE, Ferl RJ. Spaceflight Transcriptomes: Unique Responses to a Novel Environment. Astrobiology. 2012; 12(1).
  • Sun Y, Cai Y, Huse SM, Knight R, Farmerie WG, Wang X, Mai V. A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. Briefings in Bioinformatics. 2012; 13(1).
  • Cai Y, Sun Y. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Research. 2011; 39(14).
  • Mai V, Young CM, Ukhanova M, Wang X, Sun Y, Casella G, Theriaque D, Li N, Sharma R, Hudak M, Neu J. Fecal microbiota in premature infants prior to necrotizing enterocolitis. PLoS ONE. 2011; 6(6).
  • Goodison S, Sun Y, Urquidi V. Derivation of cancer diagnostic and prognostic signatures from gene expression data. Bioanalysis. 2010; 2(5).
  • Bandyopadhyay N, Kahveci T, Goodison S, Sun Y, Ranka S. Pathway-based feature selection algorithm for cancer microarray data. Advances in Bioinformatics. 2010; 3.
  • Sun Y, Urquidi V, Goodison S. Derivation of molecular signatures for breast cancer recurrence prediction using a two-way validation approach. Breast Cancer Research and Treatment. 2010; 119(3).
  • Duan Y, Zhou L, Hall DG, Li W, Doddapaneni H, Lin H, Liu L, Vahling CM, Gabriel DW, Williams KP, Dickerman A, Sun Y, Gottwald T. Complete genome sequence of citrus huanglongbing bacterium, ‘Candidatus Liberibacter asiaticus’ obtained through metagenomics. Molecular Plant-Microbe Interactions. 2009; 22(8).
  • Sun Y, Goodison S. Optimizing molecular signatures for predicting prostate cancer recurrence. Prostate. 2009; 69(10).
  • Rosser CJ, Liu L, Sun Y, Villicana P, McCullers M, Porvasnik S, Young PR, Parker AS, Goodison S. Bladder cancer-associated gene expression signatures identified by profiling of exfoliated urothelia. Cancer Epidemiology, Biomarkers and Prevention. 2009; 18(2).
  • Sun Y, Wu D. Feature extraction through local learning. Statistical Analysis and Data Mining. 2009; 2(1).
  • Yu F, Sun Y, Liu L, Farmerie W. GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis. Bioinformation. 2009; 4(1).
  • Bandyopadhyay N, Kahveci T, Goodison S, Sun Y, Ranka S. Pathway-Based Feature Selection Algorithm for Cancer Microarray Data. Advances in Bioinformatics. 2009.
  • Sun Y, Todorovic S, Li J. Increasing the robustness of boosting algorithms within the linear-programming framework. The Journal of VLSI Signal Processing. 2007; 48(1).
  • Sun Y, Todorovic S, Li J. Unifying multi-class AdaBoost algorithms with binary base learners under the margin framework. Pattern Recognition Letters. 2007; 28(5).
  • Sun Y, Goodison S, Li J, Liu L, Farmerie W. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics. 2007; 23(1).
  • Sun Y, Liu Z, Todorovic S, Li J. Adaptive boosting for SAR automatic target recognition. IEEE Transactions on Aerospace and Electronic Systems. 2007; 43(1).
  • Sun Y, Todorovic S, Li J. Reducing the overfitting of AdaBoost by controlling its data distribution skewness. International Journal of Pattern Recognition and Artificial Intelligence. 2006; 20(7).
  • Sun Y, Li J. Adaptive learning approach to landmine detection. IEEE Transactions on Aerospace and Electronic Systems. 2005; 41(3).
  • Wang Y, Sun Y, Li J, Stoica P. Adaptive imaging for forward-looking ground penetrating radar. IEEE Transactions on Aerospace and Electronic Systems. 2005; 41(3).
  • Sun Y, Li X, Li J. Practical landmine detector using forward-looking ground penetrating radar. Electronics Letters. 2005; 41(2).
  • Sun Y, Li J. Time-frequency analysis for plastic landmine detection via forward-looking ground penetrating radar. IEE Proceedings-Radar, Sonar and Navigation. 2003; 150(4).
Books and Book Chapters:
  • Sun Y, Cai Y. Estimating Species Richness Using Large Collections of 16S rRNA Pyrosequences. 2011.
  • Sun Y. Feature Weighting through Local Learning. 2007.

  • "Toward optimal feature selection through local learning", Department of Computer Science, SUNY Buffalo (2012)
  • "Advanced computational algorithms for mining massive high-dimensional biological data", H. Lee Moffitt Cancer Center & Research Institute, Biomedical Informatics Department (2012)
  • "Advanced computational algorithms for deep interrogation of microbial communities", International Census of Marine Microbes Workshop, University of Southern California (2012)
  • "Derivation of molecular signatures for accurate breast cancer prognosis", Cancer Topics Seminar Series, University of Florida (2011)
  • "Pyrosequencing technology for microbial community analysis", Summer School, Marine Biological Laboratory (2011)
  • "Advanced computational algorithms for deep interrogation of microbial communities", International Census of Marine Microbes Workshop, Max Planck Institute for Marine Microbiology (2011)
  • "Molecular classification for breast cancer prognosis", MD Anderson Cancer Center (2008)

Contact Information

Center Of Excellence
701 Ellicott Street
Buffalo, NY 14203
Phone: 716-881-1374