Computer Science and Engineering, Biology
The genome halving problem, motivated by whole genome duplication events in molecular evolution, was solved by El-Mabrouk and Sankoff in the pioneering paper [SIAM J. Comput., 32 (2003), pp. 754–792]. The El-Mabrouk–Sankoff algorithm is rather complex, inspiring a quest for a simpler solution. An alternative approach to the genome halving problem, based on the notion of the contracted breakpoint graph, was recently proposed in [M. A. Alekseyev and P. A. Pevzner, IEEE/ACM Trans. Comput. Biol. Bioinformatics, 4 (2007), pp. 98–107]. This new technique reveals that while the El-Mabrouk–Sankoff result is correct in most cases, it does not hold in the case of unichromosomal genomes. This raises the problem of correcting the flaw in the El-Mabrouk–Sankoff analysis and devising an algorithm that deals adequately with all genomes. In this paper we efficiently classify all genomes into two classes and show that while the El-Mabrouk–Sankoff theorem holds for the first class, it is incorrect for the second class. The crux of our analysis is a new combinatorial invariant defined on duplicated permutations. Using this invariant, we obtain a full proof of the genome halving theorem and a polynomial-time algorithm for the genome halving problem.
Published in SIAM Journal on Computing, ed. Madhu Sudan, Volume 36, Issue 6, 2007, pages 1748-1763.
© by the Society for Industrial and Applied Mathematics