Document Type

Campus Access Dissertation


Computer Science and Engineering

First Advisor

Homayoun Valafar


Structural biologists will perform a significant portion of their future work in silico due to increasingly sophisticated computational tools and an overwhelming amount of available data. One area of focus in computational methods is the development of algorithms for aligning multiple protein structures. Although these algorithms can potentially be applied to a number of problems, they have primarily been used for protein core identification; a method capable of solving a variety of problems through structure comparison is still absent. Furthermore, state-of-the-art protein structure alignment algorithms use a 3D structure representation, which prevents them from leveraging comparison algorithms developed for 1D protein representations such as sequences. Here we introduce a new approach to aligning multiple protein structures, based on a set of informative features that reduce each structure to a linear (1D) representation. This approach allows string algorithms, originally developed for sequences, to be applied to structures. While other work has attempted this linearization, ours is the first to do so in a manner that is competitive with the best 3D algorithms. We evaluate our algorithm's effectiveness on two problems: protein core identification and structure phylogeny. These problems demonstrate the feasibility of the chosen feature set and allow our structure alignment algorithm to be compared with existing methods. We present results showing that our 1D representation outperforms the best 3D algorithms.
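The abstract does not list the actual features used, so the following is only a minimal sketch of the general linearization idea, under loudly stated assumptions. It assumes each residue is given as backbone N, CA, C coordinates, computes per-residue phi/psi torsion angles, and bins them into a coarse three-letter alphabet. The bin boundaries are illustrative and are not a validated secondary-structure classifier. The resulting strings can then be compared with any ordinary string algorithm; Levenshtein edit distance is used here as a stand-in. The names `dihedral`, `classify`, `linearize`, and `levenshtein` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit central bond
    # Project the outer bonds onto the plane perpendicular to b1.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def classify(phi: float, psi: float) -> str:
    """Hypothetical, coarse Ramachandran binning into 'H', 'E', or 'L'."""
    if phi >= 0.0:
        return "L"
    return "H" if -120.0 <= psi < 50.0 else "E"

def linearize(backbone: Sequence[Tuple[Vec, Vec, Vec]]) -> str:
    """Map (N, CA, C) triples to a letter string; terminal residues are skipped
    because phi (first residue) and psi (last residue) are undefined there."""
    letters = []
    for i in range(1, len(backbone) - 1):
        n, ca, c = backbone[i]
        phi = dihedral(backbone[i - 1][2], n, ca, c)
        psi = dihedral(n, ca, c, backbone[i + 1][0])
        letters.append(classify(phi, psi))
    return "".join(letters)

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, used here as an example string algorithm."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
```

Once two structures are reduced to strings, for example `levenshtein(linearize(bb1), linearize(bb2))`, any sequence tool can operate on them. A real feature set would likely be richer than three letters, but the reduction step has the same shape.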

A second disadvantage of modern structure alignment algorithms is that each can perform only a single task, usually protein core identification under the assumption of rigid-body structures. Our algorithm removes this limitation by allowing the user to weight the types of information used to generate the alignment. This extends its utility to a wide variety of problems that are usually solved by custom algorithms. We demonstrate an example customization that provides insight into a problem of biological relevance: identifying hinge points between flexible domains.
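To illustrate how user-chosen weights can enter a sequence-style alignment, here is a minimal sketch using the standard Needleman-Wunsch global alignment. It assumes each residue has already been reduced to a small feature record. The `Residue` fields, the per-feature similarity functions, the gap penalty, and the `weights` dictionary are all hypothetical and are not the dissertation's actual scoring scheme. The point is only that the weights decide which kind of information drives the alignment.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

@dataclass(frozen=True)
class Residue:
    sse: str    # hypothetical categorical feature, e.g. 'H', 'E', 'L'
    phi: float  # backbone torsion angles in degrees
    psi: float

def similarity(a: Residue, b: Residue, weights: Dict[str, float]) -> float:
    """Weighted sum of per-feature similarities, each in [-1, 1].
    Features missing from `weights` contribute nothing."""
    per_feature = {
        "sse": 1.0 if a.sse == b.sse else -1.0,
        "phi": math.cos(math.radians(a.phi - b.phi)),  # handles angle wraparound
        "psi": math.cos(math.radians(a.psi - b.psi)),
    }
    return sum(weights.get(name, 0.0) * s for name, s in per_feature.items())

Pair = Tuple[Optional[int], Optional[int]]

def align(a: Sequence[Residue], b: Sequence[Residue],
          weights: Dict[str, float], gap: float = -1.0) -> Tuple[float, List[Pair]]:
    """Needleman-Wunsch global alignment; returns (score, aligned index pairs),
    where None marks a gap."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + similarity(a[i - 1], b[j - 1], weights),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
                dp[i][j], dp[i - 1][j - 1] + similarity(a[i - 1], b[j - 1], weights),
                abs_tol=1e-9):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or math.isclose(dp[i][j], dp[i - 1][j] + gap, abs_tol=1e-9)):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

For example, setting `weights = {"sse": 1.0}` aligns purely on the categorical feature, while raising the torsion-angle weights makes local backbone geometry dominate. Under these assumptions, switching tasks means changing the weights rather than writing a new algorithm.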

Finally, protein structure comparisons can be used to enhance other proteomics algorithms. We introduce an extension to REDCRAFT, a protein folding algorithm that uses experimental NMR data. This extension compares candidate folded structures to a template that is known to be similar to the target protein, which allows folding to proceed through gaps in the data and with smaller amounts of data. We present results on three structures to demonstrate the utility of this approach.