Date of Award

Spring 2020

Document Type

Open Access Dissertation


Environmental Health Sciences

First Advisor

Sean Norman


Metagenomics has developed in recent years into a reliable approach for analyzing the diversity of microbial communities. Using next-generation sequencing, metagenomic studies can generate billions of short sequencing reads that are processed by computational tools. However, the rapid adoption of metagenomics has produced a large amount of data, and this level of data production requires computational tools and pipelines that address data scalability and performance. In this thesis, we developed several tools that aid in the exploration of large amounts of DNA sequence data. We further developed a bioinformatic pipeline that makes these tools usable by researchers with minimal computational background and makes them available for widespread use across the field of microbiology, so that the research community can contribute to their further development and help overcome the growing computational challenges that result from continued technological advances in high-throughput DNA sequencing.