Author

Ahmed Saaudi

Date of Award

Fall 2019

Document Type

Open Access Dissertation

Department

Computer Science and Engineering

First Advisor

Csilla Farkas

Abstract

Malicious insiders increasingly affect organizations by leaking classified data to unautho- rized entities. Detecting insiders’ misuses in computer systems is a challenging problem. In this dissertation, we propose two approaches to detect such threats: a probabilistic graph- ical model-based approach and a deep learning-based approach. We investigate the logs of computer-based activities to discover patterns of misuse. We model user’s behaviors as sequences of computer-based events.

For our probabilistic graphical model-based approach, we propose an unsupervised model for insider’s misuse detection. That is, we develop Stochastic Gradient Descent method to learn Hidden Markov Models (SGD-HMM) with the goal of analyzing user log data. We propose the use of varying granularity levels to represent users’ log data: Session-based, Day-based, and Week-based. A user’s normal behavior is modeled using SGD-HMM. The model is used to detect any deviation from the normal behavior. We also propose a Sliding Window Technique (SWT) to identify malicious activity by considering the near history of the user’s activities. We evaluate the experimental results in terms of Receiver Operating Characteristic (ROC). The area under the curve (AUC) represents the model’s performance with respect to the separability of the normal and abnormal behaviors. The higher the AUC scores, the better the model’s performance. Combining SGD-HMM with SWT resulted in AUC values between 0.81 and 0.9 based on the window size. Our solution is superior to the solutions presented by other researchers.

For our deep learning-based approach, we propose a supervised model for insider’s misuse detection. Our solution is based on using natural language processing with deep learning. We examine textual event logs to investigate the semantic meaning behind a user’s behavior. The proposed approaches consist of character embeddings and deep learning net- works that involve Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). We develop three deep-learning models: CNN, LSTM, and CNN-LSTM. We run a 10-fold subject-independent cross-validation procedure to evaluate the developed mod- els. Our deep learning-based approach shows promising behavior. The first model, CNN, presents a good performance of classifying normal samples with an AUC score of 0.85, false-negative rate of 29%, and false-positive rate of 26%. The second model, LSTM, shows the best performance of detecting malicious samples with an AUC score of 0.873, false-negative rate of 0%, and false-positive rate of 37%. The third model, CNN-LSTM, presents a moderate behavior of detecting both normal and insider samples with an AUC score of 0.862, false-negative rate 16%, and 17% false-positive rate. Moreover, we use our proposed approach to investigate networks with deeper and wider structures. For this, we study the impact of increasing the number of CNN or LSTM layers, nodes per layer, and both of them at the same time on the model performance.

Our results indicate that machine learning approaches can be effectively deployed to detect insiders’ misuse. However, it is difficult to obtain labeled data. Furthermore, the high presence of normal behavior and limited misuse activities create a highly unbalanced data set. This impacts the performance of our models.

COinS