Scholar Commons - SC Upstate Research Symposium: CH-5 Generating New Malware Samples Using Tree based Multiple Generative AI Models
 

CH-5 Generating New Malware Samples Using Tree based Multiple Generative AI Models

Presenter Information

Wei Zhong, USC Upstate

SCURS Disciplines

Computer Sciences

Document Type

Oral Presentation

Abstract

Malware is malicious software deliberately designed to damage, disrupt, or gain unauthorized access to computer systems, networks, or personal devices. To detect malware attacks early, researchers have developed Malware Detection Systems (MDSs). Compared with traditional MDSs, deep learning based MDSs can automatically extract important features from malware samples and learn sophisticated patterns within the data. A deep learning model is a machine learning architecture with many layers of computational units. Adding more malware samples to the training set can significantly improve the detection performance of deep learning models, since they require a large number of samples to capture the wide variety of malware variations and attack scenarios. Traditionally, researchers have used a single generative model built from the whole dataset to produce new malware variants. A single generative model usually has great difficulty capturing the complex underlying data distribution of the whole dataset, which noticeably reduces its effectiveness. To improve on the single generative model, I proposed and developed multiple generative models organized in a tree structure. Each generative model, trained for one ‘branch’ of the tree, can learn the unique pattern of malware language for one malware subfamily. Working together, the generative models in the tree can potentially generate more diverse and complex malware variants than a single generative model. These diverse variants can provide better training samples, helping deep learning models adapt to the fast-evolving malware landscape.
Experimental results show that the performance improvement of the tree-based multiple generative models over the single generative model is statistically significant, while the construction time of the tree-based model is comparable to that of the single generative model owing to a parallel construction strategy.
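
The tree organization described above might be sketched as follows. This is only an illustration under stated assumptions, not the presenter's implementation: the abstract does not name the generative architecture, so a toy bigram sampler over abstract placeholder tokens stands in for each branch model, and the names `BigramModel`, `train_tree`, and `generate` are hypothetical. Each leaf is assumed to hold at least one training sequence.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

START, END = "<s>", "</s>"


class BigramModel:
    """Toy stand-in for one branch's generative model (an assumption)."""

    def __init__(self):
        # Maps a token to the list of tokens observed right after it.
        self.next_tokens = defaultdict(list)

    def fit(self, sequences):
        for seq in sequences:
            tokens = [START, *seq, END]
            for prev, nxt in zip(tokens, tokens[1:]):
                self.next_tokens[prev].append(nxt)
        return self

    def sample(self, rng, max_len=20):
        out, current = [], START
        for _ in range(max_len):
            current = rng.choice(self.next_tokens[current])
            if current == END:
                break
            out.append(current)
        return out


def train_tree(dataset):
    """Fit one model per leaf of {family: {subfamily: [token sequences]}}."""
    leaves = [(fam, sub, seqs)
              for fam, subs in dataset.items()
              for sub, seqs in subs.items()]
    # Branch models are independent, so they can be built concurrently.
    # Threads keep the sketch simple; real training would more likely use
    # separate processes or devices to get actual parallel speedup.
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(lambda leaf: BigramModel().fit(leaf[2]), leaves))
    tree = defaultdict(dict)
    for (fam, sub, _), model in zip(leaves, models):
        tree[fam][sub] = model
    return dict(tree)


def generate(tree, n_per_leaf, seed=0):
    """Draw n_per_leaf synthetic sequences from every branch model."""
    rng = random.Random(seed)
    return [(fam, sub, model.sample(rng))
            for fam, subs in tree.items()
            for sub, model in subs.items()
            for _ in range(n_per_leaf)]


# Placeholder data: abstract tokens, not real samples.
DATASET = {
    "family_1": {"sub_a": [["t1", "t2", "t3"], ["t1", "t3"]],
                 "sub_b": [["t4", "t5"]]},
    "family_2": {"sub_c": [["t6", "t7", "t6"]]},
}

if __name__ == "__main__":
    for family, subfamily, tokens in generate(train_tree(DATASET), n_per_leaf=2):
        print(family, subfamily, tokens)
```

Because every branch model sees only its own subfamily's data, each generated sequence stays within that subfamily's vocabulary, which is the intuition behind per-branch specialization. A single model fitted to the pooled data would mix patterns across subfamilies.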

Keywords

Malware, Malware Detection System, Generative Model, Artificial Intelligence, Deep Learning

Start Date

April 11, 2025, 3:40 PM

Location

CASB 102

End Date

April 11, 2025, 3:55 PM

This document is currently not available here.

 