CH-5 Generating New Malware Samples Using Tree based Multiple Generative AI Models
SCURS Disciplines
Computer Sciences
Document Type
Oral Presentation
Abstract
Malware is malicious software deliberately designed to damage, disrupt, or gain unauthorized access to computer systems, networks, or personal devices. To detect malware attacks early, researchers have developed Malware Detection Systems (MDSs). Compared with traditional MDSs, deep-learning-based MDSs can automatically extract important features from malware samples and learn sophisticated patterns within the data. A deep learning model is a machine learning architecture with many layers of computational units. Because deep learning models need a large number of malware samples to capture the wide variety of malware variations and attack scenarios, adding more malware samples to the training set can significantly improve their detection performance. Traditionally, researchers have used a single generative model built from the whole dataset to produce new malware variants. A single generative model usually has great difficulty learning the complex underlying data distribution of the whole dataset, and its effectiveness is noticeably reduced as a result. To address this limitation, I proposed and developed multiple generative models organized in a tree structure. Each generative model, trained for one ‘branch’ of the tree, can learn the unique pattern of malware language for one malware subfamily. Working together, the generative models in the tree can potentially generate more diverse and complex malware variants than a single generative model. These diverse variants can provide better training samples that help deep learning models adapt to the fast-evolving malware landscape.
Experimental results show that the performance improvement of the tree-based multiple generative models over the single generative model is statistically significant, while the construction time of the tree-based model is comparable to that of the single generative model thanks to a parallel construction strategy.
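The structure described in the abstract (one generator per subfamily branch, with branches built in parallel) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the presenter's implementation: the `BigramGenerator` class is a first-order Markov chain standing in for a real generative model, the tree is reduced to its leaves keyed by hypothetical `(family, subfamily)` paths, and the samples are opaque placeholder token sequences. Threads are used only to show the shape of the parallel step; real training would more plausibly use separate processes or devices.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class BigramGenerator:
    """Toy stand-in for a generative model: a first-order Markov chain over tokens."""

    def __init__(self, sequences):
        # Record every observed token-to-token transition, including start and end markers.
        self.transitions = defaultdict(list)
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, nxt in zip(padded, padded[1:]):
                self.transitions[prev].append(nxt)

    def sample(self, rng, max_len=20):
        # Walk the chain from the start marker until the end marker or max_len tokens.
        token, out = "<s>", []
        while len(out) < max_len:
            token = rng.choice(self.transitions[token])
            if token == "</s>":
                break
            out.append(token)
        return out


def build_tree(samples_by_leaf, max_workers=4):
    """Fit one generator per leaf; leaves are independent, so they are fitted concurrently.

    Assumes every leaf has at least one training sequence.
    """
    leaves = list(samples_by_leaf)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        models = list(pool.map(BigramGenerator, (samples_by_leaf[leaf] for leaf in leaves)))
    return dict(zip(leaves, models))


def generate(tree, n, seed=0):
    """Draw n synthetic sequences, choosing a leaf generator uniformly at random each time."""
    rng = random.Random(seed)
    leaves = sorted(tree)
    results = []
    for _ in range(n):
        leaf = rng.choice(leaves)
        results.append((leaf, tree[leaf].sample(rng)))
    return results


# Hypothetical placeholder data: (family, subfamily) -> token sequences.
TOY_SAMPLES = {
    ("family_a", "sub_1"): [["t1", "t2", "t3"], ["t1", "t3", "t2"]],
    ("family_a", "sub_2"): [["t4", "t5"], ["t4", "t4", "t5"]],
    ("family_b", "sub_1"): [["t6", "t7", "t8", "t6"]],
}
```

Calling `build_tree(TOY_SAMPLES)` and then `generate(tree, 5)` yields five `(leaf, tokens)` pairs. Because each generator sees only its own leaf's data, every generated sequence stays within that subfamily's vocabulary, which is the property a single model fitted to the pooled data does not guarantee.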
Keywords
Malware, Malware Detection System, Generative Model, Artificial Intelligence, Deep Learning
Start Date
11-4-2025 3:40 PM
Location
CASB 102
End Date
11-4-2025 3:55 PM