Yijun Sun, PhD

Department of Microbiology and Immunology

Assistant Professor in Bioinformatics


Professional Summary:

While high-throughput genomics technologies are revolutionizing many aspects of modern biology, the lack of computational algorithms and resources for analyzing the massive datasets these technologies generate has become a rate-limiting factor in biological discovery.

Our lab studies machine learning, bioinformatics and their applications to cancer informatics and metagenomics. Our work is grounded in rigorous mathematical and statistical theory and has a twofold focus: 1) developing advanced algorithms and building computational infrastructures to help biologists keep pace with the unprecedented growth of genomics datasets available today, and 2) enabling them to make full use of their massive, high-dimensional data for various biological inquiries.

We are currently working on three major projects. The first project is funded by the National Science Foundation (NSF). Our goal is to develop an integrated suite of computational and statistical algorithms that enable researchers to process millions of 16S ribosomal RNA sequences in order to: 1) derive quantitative microbial signatures to characterize various infectious diseases, 2) interactively visualize the complex metagenomic structure of a microbial community, 3) study microbe-microbe interactions and community dynamics, and 4) identify novel species. We collaborate with researchers throughout the University at Buffalo and at the University of Florida to apply the bioinformatics algorithms developed in this project to a variety of biological problems.

The second project is funded by the National Institutes of Health. This is a joint project with Dr. James Jarvis in UB’s Department of Pediatrics. Our goal is to use high-throughput genomics technologies to identify molecular markers that characterize juvenile idiopathic arthritis.

In the third project, we use advanced machine learning algorithms to develop computational models for breast cancer prognosis that draw on all available information, including clinical variables and genetic data. We have conducted extensive computational studies using thousands of cancer tissue samples and have obtained solid evidence suggesting that cancer progression trajectories exist. I hope this work will significantly advance our understanding of the mechanisms underlying cancer growth and thus open new avenues for cancer research.

The algorithms and software related to metagenomics and feature selection developed in my lab have been used by more than 200 research institutes worldwide to process large, complex data sets that are core to a wide variety of biological and biomedical research.

Education and Training:
  • PhD, Electrical Engineering, University of Florida (2004)
  • MS, Electrical Engineering, University of Florida (2003)
  • BS, Electrical Engineering/Mechanical Engineering, Shanghai Jiao Tong University (1995)
  • Assistant Professor, University at Buffalo, The State University of New York (2012–present)
  • Assistant Scientist, University of Florida, The Interdisciplinary Center for Biotechnology Research (2005–2012)
Awards and Honors:
  • Spotlight Paper in the September 2010 issue of IEEE Transactions on Pattern Analysis and Machine Intelligence (2010)
  • IEEE M. Barry Carlton Best Transactions Paper Award (2005)

Research Expertise:
  • Bioinformatics: metagenomics, sequence analysis, microarray data analysis, microbial community analysis, molecular classification and genetic network modeling for cancer diagnosis and prognosis, microbial network analysis, phylogenetic analysis
  • Machine Learning: large margin classification/regression, large-scale clustering analysis, ensemble learning, feature selection/extraction, computational learning theory, network analysis, graphical modeling and Bayesian networks
Research Centers:
  • Witebsky Center for Microbial Pathogenesis and Immunology
  • Center of Excellence in Bioinformatics and Life Sciences
UB 2020 Strategic Strengths:
  • Molecular Recognition in Biological Systems and Bioinformatics
Grants and Sponsored Research:
  • July 2014–July 2019
    Oral Microbiome and Periodontitis: A Prospective Study in Postmenopausal Women
    Role: Co-Investigator
  • January 2014–January 2016
    Derivation of molecular signatures for accurate prostate cancer prognosis using both annotated and non-annotated tissue samples
    SUNY Research Foundation
    Role: Principal Investigator
  • February 2011–January 2016
    Microarray-based Biomarkers in Juvenile Idiopathic Arthritis
    Role: Co-Investigator
  • July 2014–June 2015
    Expression-Based Biomarkers in Cystic Fibrosis
    UB Translational Pilot Studies Fund
    Role: Co-Investigator
  • June 2011–June 2015
    Advanced computational algorithms for deep interrogation of microbial communities using millions of 16S rRNA pyrosequences
    Role: Principal Investigator
  • June 2011–June 2013
    Derivation of molecular signatures for accurate breast cancer prognosis
    Bankhead-Coley Cancer Program
    Role: Principal Investigator
  • September 2007–October 2010
    Accurate breast cancer prognosis
    Susan Komen Breast Cancer Foundation
    Role: Co-Investigator
Journal Articles:
  • Sun Y, Cai Y, Huse SM, Knight R, Farmerie WG, Wang X, Mai V. A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. Briefings in Bioinformatics. 2012; 13(1).
  • Sun Y, Cai Y, Mai V, Farmerie W, Yu F, Li J, Goodison S. Advanced computational algorithms for microbial community analysis using massive 16S rRNA sequence data. Nucleic Acids Research. 2010; 38(22).
  • Sun Y, Todorovic S, Goodison S. Local-learning-based feature selection for high-dimensional data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010; 32(9).
  • Sun Y, Cai Y, Liu L, Yu F, Farrell ML, McKendree W, Farmerie W. ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Research. 2009; 37(10).
  • Sun Y. Iterative RELIEF for feature weighting: algorithms, theories, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007; 29(6).
  • Wang J, Sun Y. From one graph to many: ensemble transduction for content-based database retrieval. Knowledge-Based Systems. 2014; 65.
  • Wang J, Sun Y, Gao X. Sparse structure regularized ranking. Multimedia Tools and Applications. 2014.
  • Hickman D, Jones MK, Zhu S, Kirkpatrick E, Ostrov DA, Wang X, Ukhanova M, Sun Y, Mai V, Salemi M. The effect of malnutrition on norovirus infection. mBio. 2014; 5(2).
  • Chung A, Li Q, Blair SJ, Jesus MD, Dennis KL, LeVea C, Yao J, Sun Y, Conway TF, Virtuoso LP, Battaglia NG, Furtado S, Mathiowitz E, Mantis NJ, Khazaie K, Egilmez N. Oral interleukin-10 alleviates polyposis via neutralization of pathogenic T-regulatory cells. Cancer Research. 2014.
  • Sun Y, Yao J, Nowak N, Goodison S. Cancer progression modeling using static sample data. Genome Biology. 2014.
  • Sen R, Raychoudhury R, Cai Y, Sun Y, Lietze VU, Boucias DG, Scharf ME. Differential impacts of juvenile hormone, soldier head extract and alternate caste phenotypes on host and symbiont transcriptome composition in the gut of the termite Reticulitermes flavipes. BMC Genomics. 2013; 14.
  • Correll MJ, Pyle TP, Millar K, Sun Y, Yao J, Edelmann RE, Kiss J. Transcriptome analyses of Arabidopsis thaliana seedlings grown in space: implications for gravity-responsive genes. Planta. 2013; 238(3).
  • Zhang X, Yao J, Zhang Y, Sun Y, Mou Z. The Arabidopsis Mediator Complex Subunits MED14/SWP and MED16/SFR6/IEN1 Differentially Regulate Defense Gene Expression in Plant Immune Responses. The Plant Journal. 2013; 75(3).
  • Boucias DG, Cai Y, Sun Y, Lietze VU, Sen R, Raychoudhury R, Scharf ME. The hindgut-lumen prokaryotic microbiota of the termite Reticulitermes flavipes and its responses to dietary lignocellulose composition. Molecular Ecology. 2013; 22(7).
  • Raychoudhury R, Sen R, Cai Y, Sun Y, Lietze VU, Boucias DG, Scharf ME. Comparative metatranscriptomic signatures of wood and paper feeding in the gut of the termite Reticulitermes flavipes (Isoptera: Rhinotermitidae). Insect Molecular Biology. 2013; 22(2).
  • Wang Y, An C, Zhang X, Yao J, Zhang Y, Sun Y, Yu F, Amador DM, Mou Z. The Arabidopsis elongator complex subunit2 epigenetically regulates plant immune responses. Plant Cell. 2013; 25(2).
  • Wang X, Yao J, Sun Y, Mai V. M-pick, a modularity-based method for OTU picking of 16S rRNA sequences. BMC Bioinformatics. 2013; 14(1).
  • Mai V, Torrazza RM, Ukhanova M, Wang X, Sun Y, Li N, Shuster J, Sharma R, Hudak ML, Neu J. Distortions in development of intestinal microbiota associated with late onset sepsis in preterm infants. PLoS ONE. 2013; 8(1).
  • Yin L, Liu L, Sun Y, Hou W, Lowe AC, Gardner BP, Salemi M, Williams WB, Farmerie WG, Sleasman JW, Goodenow MM. High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems. Retrovirology. 2012; 9.
  • Urquidi V, Goodison S, Cai Y, Sun Y, Rosser CJ. A candidate molecular biomarker panel for the detection of bladder cancer. Cancer Epidemiology, Biomarkers and Prevention. 2012; 21(12).
  • Zhang X, Wang C, Zhang Y, Sun Y, Mou Z. The Arabidopsis mediator complex subunit16 positively regulates salicylate-mediated systemic acquired resistance and jasmonate/ethylene-induced defense pathways. Plant Cell. 2012; 24(10).
  • Wang X, Cai Y, Sun Y, Knight R, Mai V. Secondary Structure Information Does not Improve OTU Picking for 16S rRNA Sequences. The ISME Journal. 2012; 6(7).
  • Ukhanova M, Culpepper T, Baer D, Gordon D, Kanahori S, Valentine J, Neu J, Sun Y, Wang X, Mai V. Gut Microbiota Correlates with Energy Gain from a Dietary Fiber and Appears Associated with Acute and Chronic Intestinal Diseases. Clinical Microbiology and Infection. 2012.
  • Paul AL, Zupanska AK, Ostrow DT, Zhang Y, Sun Y, Li JL, Shanker S, Farmerie WG, Amalfitano CE, Ferl RJ. Spaceflight Transcriptomes: Unique Responses to a Novel Environment. Astrobiology. 2012; 12(1).
  • Cai Y, Sun Y. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Research. 2011; 39(14).
  • Mai V, Young CM, Ukhanova M, Wang X, Sun Y, Casella G, Theriaque D, Li N, Sharma R, Hudak M, Neu J. Fecal microbiota in premature infants prior to necrotizing enterocolitis. PLoS One. 2011; 6(6).
  • Goodison S, Sun Y, Urquidi V. Derivation of cancer diagnostic and prognostic signatures from gene expression data. Bioanalysis. 2010; 2(5).
  • Bandyopadhyay N, Kahveci T, Goodison S, Sun Y, Ranka S. Pathway-based feature selection algorithm for cancer microarray data. Advances in Bioinformatics. 2010; 3.
  • Sun Y, Urquidi V, Goodison S. Derivation of molecular signatures for breast cancer recurrence prediction using a two-way validation approach. Breast Cancer Research and Treatment. 2010; 119(3).
  • Duan Y, Zhou L, Hall DG, Li W, Doddapaneni H, Lin H, Liu L, Vahling CM, Gabriel DW, Williams KP, Dickerman A, Sun Y, Gottwald T. Complete genome sequence of citrus huanglongbing bacterium, ‘Candidatus Liberibacter asiaticus’ obtained through metagenomics. Molecular Plant-Microbe Interactions. 2009; 22(8).
  • Sun Y, Goodison S. Optimizing molecular signatures for predicting prostate cancer recurrence. Prostate. 2009; 69(10).
  • Rosser CJ, Liu L, Sun Y, Villicana P, McCullers M, Porvasnik S, Young PR, Parker AS, Goodison S. Bladder cancer-associated gene expression signatures identified by profiling of exfoliated urothelia. Cancer Epidemiology, Biomarkers and Prevention. 2009; 18(2).
  • Sun Y, Wu D. Feature extraction through local learning. Statistical Analysis and Data Mining. 2009; 2(1).
  • Yu F, Sun Y, Liu L, Farmerie W. GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis. Bioinformation. 2009; 4(1).
  • Sun Y, Todorovic S, Li J. Increasing the robustness of boosting algorithms within the linear-programming framework. The Journal of VLSI Signal Processing. 2007; 48(1).
  • Sun Y, Todorovic S, Li J. Unifying multi-class AdaBoost algorithms with binary base learners under the margin framework. Pattern Recognition Letters. 2007; 28(5).
  • Sun Y, Goodison S, Li J, Liu L, Farmerie W. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics. 2007; 23(1).
  • Sun Y, Liu Z, Todorovic S, Li J. Adaptive boosting for SAR automatic target recognition. IEEE Transactions on Aerospace and Electronic Systems. 2007; 43(1).
  • Sun Y, Todorovic S, Li J. Reducing the overfitting of AdaBoost by controlling its data distribution skewness. International Journal of Pattern Recognition and Artificial Intelligence. 2006; 20(7).
  • Sun Y, Li J. Adaptive learning approach to landmine detection. IEEE Transactions on Aerospace and Electronic Systems. 2005; 41(3).
  • Wang Y, Sun Y, Li J, Stoica P. Adaptive imaging for forward-looking ground penetrating radar. IEEE Transactions on Aerospace and Electronic Systems. 2005; 41(3).
  • Sun Y, Li X, Li J. Practical landmine detector using forward-looking ground penetrating radar. Electronics Letters. 2005; 41(2).
  • Sun Y, Li J. Time-frequency analysis for plastic landmine detection via forward-looking ground penetrating radar. IEE Proceedings - Radar, Sonar and Navigation. 2003; 150(4).
Books and Book Chapters:
  • Sun Y, Cai Y. Estimating Species Richness Using Large Collections of 16S rRNA Pyrosequences. 2011.
  • Sun Y. Feature Weighting through Local Learning. 2007.
Presentations:
  • "Toward optimal feature selection through local learning", Department of Computer Science, SUNY Buffalo (2012)
  • "Advanced computational algorithms for mining massive high-dimensional biological data", H. Lee Moffitt Cancer Center & Research Institute, Biomedical Informatics Department (2012)
  • "Advanced computational algorithms for deep interrogation of microbial communities", International Census of Marine Microbes Workshop, University of Southern California (2012)
  • "Derivation of molecular signatures for accurate breast cancer prognosis", Cancer Topics Seminar Series, University of Florida (2011)
  • "Pyrosequencing technology for microbial community analysis", Summer School, Marine Biological Laboratory (2011)
  • "Advanced computational algorithms for deep interrogation of microbial communities", International Census of Marine Microbes Workshop, Max Planck Institute for Marine Microbiology (2011)
  • "Molecular classification for breast cancer prognosis", MD Anderson Cancer Center (2008)

Contact Information

Center of Excellence in Bioinformatics and Life Sciences
701 Ellicott Street
Buffalo, NY 14203
Phone: 716-881-1374

