Computer Science and Engineering, Biology
In the past decade, genome rearrangements have attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. Methods for reconstructing phylogeny from genome rearrangements include distance-based methods, MCMC methods, and direct optimization methods. The latter, pioneered by Sankoff and extended with the software suites GRAPPA and MGR, is the most accurate approach, but is very limited due to the difficulty of its scoring procedure—it must solve multiple instances of the reversal median problem to compute the score of a given tree. The reversal median problem is known to be NP-hard and all existing solvers are extremely slow when the genomes are distant. In this paper, we present a new reversal median heuristic for unichromosomal genomes. The new method works by applying sets of reversals in a batch where all such reversals both commute and do not break the cycle of any other. Our testing using simulated datasets shows that this method is much faster than the leading solver for difficult datasets with only a slight accuracy penalty, yet retains better accuracy than other heuristics with comparable speed, and provides the additional option of searching for multiple medians. This method dramatically increases the speed of current direct optimization methods and enables us to extend the range of their applicability to organellar and small nuclear genomes with more than 50 reversals along each edge.
Published in Journal of Computational Biology, Volume 15, Issue 8, 2008, pages 1079-1092.
© 2008 Mary Ann Liebert, Inc.