Datamining Protein Structure Databanks for Crystallization Patterns of Proteins

A study of 345 protein structures, selected from 1,500 structures determined by nuclear magnetic resonance (NMR) methods, revealed useful correlations between crystallization properties and several parameters of the studied proteins. NMR structure determination does not require the growth of protein crystals, and hence allows comparison of the properties of proteins that have or have not been the subject of crystallographic approaches. One‐ and two‐dimensional statistical analyses of the data confirmed a hypothesized relation between the size of the molecule and its crystallization potential; this apparent correlation with protein size is the most immediate result. Further analysis revealed a relationship between the unstructured fraction of a protein and the success of its crystallization, and two‐dimensional Bayesian analysis revealed a significant relationship between the relative ratio of different secondary structures and the likelihood of success in crystallization trials. Bayesian analysis of the unstructured‐fraction correlation gave a prediction performance of about 64%, whereas a two‐dimensional Bayesian analysis reached about 75%.
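
As an illustration of what a one- versus two-dimensional Bayesian analysis of this kind could look like, the following is a minimal sketch and not the authors' method. It assumes a simple binned estimate of P(crystallized | bin) from Bayes' rule. The function names, the bin edges, and every data point are hypothetical toy values invented for this example; they carry no information about the actual dataset or about the direction of any correlation.

```python
from collections import Counter

def bin_index(value, edges):
    """Return the index of the first bin whose upper edge exceeds value."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def to_bin(row, edges_per_dim):
    """Map a feature tuple to a tuple of bin indices, one per dimension."""
    return tuple(bin_index(v, e) for v, e in zip(row, edges_per_dim))

def fit_posterior(features, labels, edges_per_dim):
    """Estimate P(crystallized | bin) = P(bin | crystallized) * P(crystallized) / P(bin)."""
    n, n_pos = len(labels), sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one crystallized example")
    prior = n_pos / n
    bins = [to_bin(row, edges_per_dim) for row in features]
    all_counts = Counter(bins)
    pos_counts = Counter(b for b, y in zip(bins, labels) if y)
    posterior = {}
    for b, total in all_counts.items():
        likelihood = pos_counts[b] / n_pos  # P(bin | crystallized)
        evidence = total / n                # P(bin)
        posterior[b] = likelihood * prior / evidence
    return posterior, prior

def accuracy(features, labels, edges_per_dim):
    """Resubstitution accuracy: fit and score on the same toy data."""
    posterior, prior = fit_posterior(features, labels, edges_per_dim)
    hits = 0
    for row, y in zip(features, labels):
        p = posterior.get(to_bin(row, edges_per_dim), prior)
        hits += int((p >= 0.5) == bool(y))
    return hits / len(labels)

# Hypothetical toy data: (size in residues, unstructured fraction), 1 = crystallized.
toy = [(60, 0.50), (75, 0.45), (90, 0.20), (110, 0.40), (130, 0.15), (150, 0.35),
       (180, 0.10), (220, 0.12), (200, 0.40), (70, 0.25), (85, 0.22)]
labels = [0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0]

# One-dimensional analysis: unstructured fraction only (assumed edge at 0.3).
acc_1d = accuracy([(u,) for _, u in toy], labels, [[0.3]])
# Two-dimensional analysis: size and unstructured fraction (assumed size edges 100, 160).
acc_2d = accuracy(toy, labels, [[100, 160], [0.3]])

print(f"1D accuracy: {acc_1d:.3f}, 2D accuracy: {acc_2d:.3f}")
```

The sketch scores predictions on the same data it was fitted to, so the numbers it prints are only a sanity check on toy input and are not comparable to the performance figures reported above.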