BS8 - Generating New Protein Samples to Improve Accuracy of Protein Structure Prediction

Presenter Information

Wei Zhong, USC Upstate

SCURS Disciplines

Computer Sciences

Document Type

General Presentation (Oral)

Invited Presentation Choice

Not Applicable

Abstract

Because protein structure prediction is central to biochemical studies, researchers have developed many deep learning-based AI models that predict protein structure from sequence information alone. To further improve the performance of these models, diverse and realistic new protein samples need to be generated. Such samples provide additional patterns that enrich the training of the prediction system. Traditionally, researchers use a single generative model to learn the complex data distribution of the whole dataset, which is a very challenging task. To overcome this problem, we propose the Clustering Diffusion Model (CDM). In CDM, the whole dataset is divided into a multi-level tree, and each ‘branch’ is represented by one Diffusion Model (DM). The DM is a widely used generative model for producing realistic and complex images and videos. In contrast to the single-model approach, each ‘branch’ of this multi-model system focuses on learning the distinct data distribution of only one protein family. The ‘branches’ then work together to capture the complex data distribution of the entire dataset. This strategy can greatly reduce the learning difficulty for each generative model and has the potential to produce more realistic and diverse protein samples than the traditional approach. Experimental results demonstrate that CDM can improve the performance of protein structure prediction beyond what a single generative model achieves. With parallel strategies, the training time of CDM decreases significantly. Reducing the training time can help researchers speed up new drug exploration and other biochemical studies.
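
The tree-of-models idea in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the presenter's implementation. The abstract does not name a clustering method, so recursive 2-means is assumed here. Each leaf uses a per-dimension Gaussian as a simple stand-in for a trained Diffusion Model, and the 2-D points are hypothetical stand-ins for protein feature vectors. All names (LeafModel, two_means, build_tree, sample_from_tree) are hypothetical.

```python
import random
import statistics


class LeafModel:
    """One generator per cluster. ASSUMPTION: a per-dimension Gaussian
    stands in for a trained diffusion model."""

    def __init__(self, points):
        columns = list(zip(*points))
        self.mean = [statistics.fmean(c) for c in columns]
        self.std = [statistics.pstdev(c) for c in columns]
        self.size = len(points)

    def sample(self, rng):
        return [rng.gauss(m, s) for m, s in zip(self.mean, self.std)]


def two_means(points, rng, iters=10):
    """Split points into two groups with a few rounds of 2-means.
    ASSUMPTION: the abstract does not specify the clustering method."""
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[0 if dists[0] <= dists[1] else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller falls back to a single leaf
        centers = [[statistics.fmean(c) for c in zip(*g)] for g in groups]
    return groups


def build_tree(points, rng, depth=0, max_depth=2, min_size=20):
    """Recursively partition the data and return the list of leaf models."""
    if depth >= max_depth or len(points) < 2 * min_size:
        return [LeafModel(points)]
    left, right = two_means(points, rng)
    if len(left) < min_size or len(right) < min_size:
        return [LeafModel(points)]
    return (build_tree(left, rng, depth + 1, max_depth, min_size)
            + build_tree(right, rng, depth + 1, max_depth, min_size))


def sample_from_tree(leaves, n, rng):
    """Pick a leaf in proportion to its cluster size, then sample from it."""
    chosen = rng.choices(leaves, weights=[leaf.size for leaf in leaves], k=n)
    return [leaf.sample(rng) for leaf in chosen]


# Toy data: two well-separated blobs playing the role of two protein families.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)]

leaves = build_tree(data, rng)
new_samples = sample_from_tree(leaves, 5, rng)
```

In this sketch each leaf is fit only on its own cluster, with no dependence on the other leaves, so the leaves could in principle be fit in parallel. That may be one way to read the abstract's remark on parallel strategies, though the abstract does not say how the parallelism is implemented.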

Keywords

Protein Structure Prediction, Generative Model, Diffusion Model, Generative Adversarial Network, Deep Learning

Start Date

April 10, 2026, 4:25 PM

Location

CASB 102

End Date

April 10, 2026, 4:40 PM
