Date of Award


Document Type

Open Access Dissertation


Computer Science and Engineering


College of Engineering and Computing

First Advisor

Jijun Tang


Gene order gets evolved under events such as rearrangements, duplications, and losses, which can change both the order and content along the genome, through the long history of genome evolution. Recently, the accumulation of genomic sequences provides researchers with the chance to handle long-standing problems about the phylogenies, or evolutionary histories, of sets of species, and ancestral genomic content and orders. Over the past few years, such problems have been proven so interesting that a large number of algorithms have been proposed in the attempt to resolve them, following different standards. The work presented in this dissertation focuses on algorithms and models for whole-genome evolution and their applications in phylogeny and ancestor inference from gene order. We developed a flexible ancestor reconstruction method (FARM) within the framework of maximum likelihood and weighted maximum matching. We designed binary code based framework to reconstruct evolutionary history for whole genome gene orders. We developed algorithms to estimate/predict missing adjacencies in ancestral reconstruction procedure to restore gene order from species, when leaf genomes are far from each other. We developed a pipeline involving maximum likelihood, weighted maximum matching and variable length binary encoding for estimation of ancestral gene content, to reconstruct ancestral genomes under the various evolutionary model, including genome rearrangements, additions, losses and duplications, with high accuracy and low time consumption. Phylogenetic analyses of whole-genome data have been limited to small collections of genomes and low-resolution data, or data without massive duplications. We designed a maximum-likelihood approach to phylogeny analysis (VLWD) based on variable length binary encoding, under maximum likelihood model, to reconstruct phylogenies from whole genome data, scaling up in accuracy and make it capable of reconstructing phylogeny from whole genome data, like triploids and tetraploids. Maximum likelihood based approaches have been applied to ancestral reconstruction but remain primitive for whole-genome data. We developed a hierarchical framework for ancestral reconstruction, using variable length binary encoding in content estimation, then adjacencies fixing and missing adjacencies predicting in adjacencies collection and finally, weighted maximum matching in gene order assembly. Therefore it extensively improves the performance of ancestral gene order reconstruction. We designed a series of experiments to validate these methods and compared the results with the most recent and comparable methods. According to the results, they are proven to be fast and accurate.