Published April 20, 2017
Yijun Sun, PhD, assistant professor of microbiology and immunology, has been awarded a three-year, $973,000 grant from the National Institute of Allergy and Infectious Diseases to develop advanced algorithms that address computational challenges in microbiome research.
“The human microbiome plays essential roles in many important physiological processes,” Sun says.
“If successfully implemented, this work could significantly expand the capacity of existing pipelines for large-scale data analysis and scientific discovery, resulting in a significant impact on the field,” he says. “The expected outcome of this work will be a set of computational tools of high utility for the microbiology community and beyond.”
The research addresses two key issues currently facing the metagenomics community.
The first is the construction of operational taxonomic unit (OTU) tables. Sun says accurately constructing and annotating OTU tables from millions of 16S rRNA sequences is one of the most important yet most difficult problems in microbiome data analysis.
“Currently, [the field] lacks computational algorithms capable of handling extremely large sequence data and constructing biologically consistent OTU tables,” he says.
Sun’s research proposes a novel method that constructs and annotates OTU tables simultaneously by combining input and reference sequences, reference annotations and the data’s clustering structure within one analytical framework.
The method derives dynamic, data-driven cutoffs to identify OTUs that are consistent not only with the data’s clustering structure but also with reference annotations.
“When successfully implemented, our method will generally address the computational needs of processing hundreds of millions of 16S rRNA reads that are currently being generated by large-scale studies,” Sun says.
The second issue concerns developing novel methods to extract pertinent information from massive sequence data, thereby helping the field shift from descriptive research to mechanistic studies.
“We are particularly interested in microbial community dynamics analysis, which can provide a wealth of insight into disease development unattainable through a static experiment design and lays a critical foundation for developing probiotic and antibiotic strategies to manipulate microbial communities,” Sun notes.
Traditionally, system dynamics is studied through time-course experiments. However, because of economic and logistical constraints, time-course studies are generally limited in both the number of samples examined and the length of time followed.
“With the rapid development of sequencing technology, many thousands of samples are being collected in large-scale studies. This provides us with a unique opportunity to develop a novel analytical strategy to use static data, instead of time-course data, to study microbial community dynamics,” Sun says.
“To our knowledge, this is the first time that massive static data is used to study dynamic aspects of microbial communities,” he adds. “When successfully implemented, our approach can effectively overcome the sampling limitation of time-course studies, and it opens a new avenue of research to study microbial dynamics underlying disease development without performing a resource-intensive time-course study.”
Collaborators on the research project are:
Sun joined UB in 2012. He studies machine learning, data mining and bioinformatics, and their applications to cancer informatics and metagenomics, in his lab at UB’s New York State Center of Excellence in Bioinformatics and Life Sciences.