Gaofeng Pan

Date of Award

Spring 2022

Document Type

Open Access Dissertation


Computer Science and Engineering

First Advisor

Jijun Tang


Deep learning has been widely used in computational biology research in the past few years. A large number of deep learning methods have been proposed to solve bioinformatics problems such as gene function prediction, protein interaction classification, and drug effect analysis; most of these methods yield better solutions than traditional computing methods. However, few methods have been proposed to solve problems encountered in evolutionary biology research. In this dissertation, two neural network learning methods are proposed to solve the problems of genome location prediction and median genome generation encountered in phylogenetic tree construction, and the ability of neural network learning models to solve evolutionary biology problems is explored.

Phylogenetic tree construction based on genomic genotype yields more accurate results than construction based on phenotype. The most widely known phylogenetic tree construction framework uses median genome algorithms to filter tree topologies and update ancestral genomes. Currently, several median genome algorithms can be applied to short genomes with simple evolution patterns; however, as genomes become longer and evolution patterns more complex, these algorithms show unstable performance and exceptionally long running times. To lift these limitations, a novel median genome generator based on a graph neural network learning model is proposed in this research. With a graph neural network, genome rearrangement patterns and relations between genomes can be extracted from the internal gene connections. Experimental results show that this generator obtains stable median genome results in constant time regardless of how long or how complex the genomes are; its outstanding performance makes it the best choice in the GRAPPA framework for phylogenetic tree construction.