Date of Award


Document Type

Open Access Dissertation


Computer Science and Engineering


College of Engineering and Computing

First Advisor

Homayoun Valafar


Proteins are often referred to as the working molecules of a cell, performing many structural, functional, and regulatory processes. Revealing the function of proteins remains a challenging problem. Advances in genome sequencing projects have produced large protein sequence repositories, but because of the technical difficulty and cost of structure determination, the number of determined protein structures lags far behind. Identifying novel structures is particularly important for a number of reasons: such structures generate models of similar proteins for comparison; identify evolutionary relationships; further contribute to our understanding of protein function and mechanism; and allow the fold of other family members to be inferred. Considering the evolutionary mechanisms responsible for the generation of new structures in proteins, it has been speculated that the number of unique protein folds may be limited to as few as ten thousand families. Currently, the Protein Data Bank contains nearly 113,000 protein structures, but fewer than 1,500 families are represented, and almost no new fold families have been reported since 2008. Ideally, solved protein structures for new protein families would be used as templates for in silico structure prediction methods, and both the solved and the predicted structures would in turn be used to infer function. However, such an approach requires new, efficient, and cost-effective computational methods for target selection and structure determination. Traditional characterization of a protein structure by NMR spectroscopy is expensive and time-consuming regardless of the structural novelty of the target protein. In an effort to expand the applicability of NMR spectroscopy, the community is continually focused on developing new and economical approaches that enable the study of more challenging or structurally novel proteins. While many advances have been made in this regard, very little attention has been paid to reducing the cost of structural characterization of routine proteins.

Probability Density Profile Analysis (PDPA) was previously introduced to directly address the economics of structure determination of routine proteins and, subsequently, the identification of novel structures from minimal sets of NMR data. The latest version of PDPA (2D-PDPA) has been successful in identifying the structural homologue of an unknown protein within a library of 1000 decoy structures. To further expand the selectivity and sensitivity of PDPA, the incorporation of additional data is necessary. However, the current PDPA approach is limited by its computational requirements, and expanding it to include additional data would render it computationally infeasible. Here we propose a new method and developments that eliminate PDPA’s computational limitations and allow the inclusion of Residual Dipolar Coupling (RDC) data from multiple vector types in multiple alignment media. Additionally, nD-PDPA will be used to refine the structure of an unknown protein to obtain a structure closer to the native one in terms of backbone root-mean-square deviation (bb-rmsd).