Date of Award

Fall 2020

Document Type

Open Access Dissertation

Department

Computer Science and Engineering

First Advisor

Homayoun Valafar

Second Advisor

Jijun Tang


Proteins perform a wide range of functions within organisms and the systems in which they operate. Active sites, also called functional or binding sites, are the regions of a protein responsible for its activity: they catalyze reactions, bind other molecules (ligands), and maintain function. With the growing abundance of protein sequence and structure data, automated methods for identifying and investigating protein active sites are increasingly important. Active-site identification has direct applications in understanding the molecular basis of disease, assisting drug design, studying targeted mutants, and annotating the function of uncharacterized proteins. Accurate knowledge of active sites also benefits protein design and engineering. Existing computational approaches to active-site identification fall short of this goal; several omit critical information such as global structure, local structure, amino acid position, and local biochemical properties. Here we present msTALI (Multiple Structure Torsion Angle Alignment) to better understand and characterize protein sequence-structure-function relationships.

Existing studies of active sites stress the importance of protein sequence, structure, and biochemical properties to function. An ideal method for active-site analysis should therefore consider all three. msTALI is unique among existing software in that it incorporates the sequence, structure, and biochemical properties of amino acids in its analysis. msTALI also produces competitive results and can handle proteins that undergo rigid-body motion. Finally, its customizability makes msTALI an extensible algorithm well suited to reliable active-site identification.
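As its name indicates, msTALI aligns structures using torsion angles together with sequence and biochemical information. The toy sketch below is a hypothetical illustration of that general idea, not msTALI's actual scoring or algorithm. It assumes each residue is summarized as its one-letter amino acid code plus backbone phi/psi torsion angles in degrees, combines a wrap-around angle difference with an identity-mismatch term using arbitrary weights, and performs a plain pairwise Needleman-Wunsch style global alignment rather than a multiple-structure alignment. The names `angle_diff`, `residue_cost`, and `align_cost`, and all weights and gap penalties, are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical residue representation: (one-letter amino acid, phi, psi), angles in degrees.
# This representation and the weights below are illustrative assumptions, not msTALI's.
Residue = Tuple[str, float, float]


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two periodic angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def residue_cost(x: Residue, y: Residue, w_torsion: float = 1.0, w_seq: float = 0.5) -> float:
    """Dissimilarity: normalised torsion-angle term plus an amino-acid identity term."""
    torsion = (angle_diff(x[1], y[1]) + angle_diff(x[2], y[2])) / 360.0  # scaled to [0, 1]
    mismatch = 0.0 if x[0] == y[0] else 1.0
    return w_torsion * torsion + w_seq * mismatch


def align_cost(a: List[Residue], b: List[Residue], gap: float = 0.6) -> float:
    """Minimum global alignment cost of two residue lists (Needleman-Wunsch recurrence)."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # align a[:i] against nothing: all gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # align b[:j] against nothing: all gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + residue_cost(a[i - 1], b[j - 1]),  # pair residues
                dp[i - 1][j] + gap,  # gap in b
                dp[i][j - 1] + gap,  # gap in a
            )
    return dp[n][m]
```

Wrapping angle differences matters because torsion angles are periodic: -170° and 170° are only 20° apart, not 340°.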

We apply msTALI's proven structural alignment capabilities to active-site studies. Because the research is interdisciplinary, a thorough theoretical background is essential. We discuss the relevant molecular biology concepts, relate them to active-site research, survey previous methods, and extend our methodology. We first use msTALI to examine sets of proteins with confirmed ATPase activity, drawing on several fold families to evaluate its effectiveness. We also outline future studies covering upward of ten functional classes of proteins to broaden the target set for observation. Collectively, these findings will expand the understanding of active sites, advance automated site description, and drive further software development.