Zaid Alibadi

Date of Award

Spring 2021

Document Type

Open Access Dissertation


Computer Science and Engineering

First Advisor

Jose M. Vidal


Over the years, several strategies and tools have been proposed and developed to help users cope with the problem of email overload, but each of these solutions has its own limitations and, in some cases, contributes to further problems. One major theme that encapsulates many of these solutions is automatically classifying emails into predefined categories (e.g., Finance, Sport, Promotion) and then moving or tagging each incoming email to that particular category. In general, these solutions have two main limitations: 1) they need to adapt to changing user behavior, and 2) they require handcrafted feature engineering, which in turn needs a lot of time, effort, and domain knowledge to produce acceptable performance.

This dissertation aims to explore the email phenomenon and provide a scalable solution that addresses the above limitations. Our proposed system requires no handcrafted feature engineering and uses Speech Act Theory to design a classification system that detects whether an email requires an action (i.e., to do) or no action (i.e., to read). We automate both the feature extraction and the classification phases by using our own word embeddings, trained on the entire Enron email dataset, to represent the input. Then, we use a convolutional layer to capture local trigram features, followed by an LSTM layer that considers the meaning of a given feature (trigram) with respect to some “memory” of words that could occur much earlier in the email. Our system detects the email intent with 89% accuracy, outperforming other related works.
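The data flow described above (word embeddings, a convolution over three-word windows, then an LSTM whose final state feeds a binary output) can be illustrated with a minimal forward-pass sketch. This is not the dissertation's implementation: the weights are random and untrained, biases are omitted, the layer sizes are arbitrary, and every function and parameter name is hypothetical. It only shows how the pieces connect.

```python
import math
import random

random.seed(0)

# Illustrative sizes only; the dissertation's actual dimensions are not given here.
EMB_DIM, N_FILTERS, HIDDEN = 8, 4, 5

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def embed(tokens, table):
    # Look up each token; unseen tokens get a random vector (stand-in for
    # embeddings trained on an email corpus).
    return [table.setdefault(t, [random.uniform(-1, 1) for _ in range(EMB_DIM)])
            for t in tokens]

def trigram_conv(vectors, filters):
    # Each filter sees three consecutive word vectors concatenated; ReLU output.
    feats = []
    for i in range(len(vectors) - 2):
        window = vectors[i] + vectors[i + 1] + vectors[i + 2]
        feats.append([max(0.0, a) for a in matvec(filters, window)])
    return feats

def lstm_last_state(seq, W):
    # Standard LSTM cell (input, forget, output gates and candidate), no biases.
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for x in seq:
        z = x + h  # concatenate current trigram feature with previous hidden state
        i = [sigmoid(a) for a in matvec(W["i"], z)]
        f = [sigmoid(a) for a in matvec(W["f"], z)]
        o = [sigmoid(a) for a in matvec(W["o"], z)]
        g = [math.tanh(a) for a in matvec(W["g"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

def action_probability(tokens, params):
    vectors = embed(tokens, params["emb"])
    feats = trigram_conv(vectors, params["conv"])
    h = lstm_last_state(feats, params["lstm"])
    return sigmoid(sum(w * x for w, x in zip(params["out"], h)))

params = {
    "emb": {},
    "conv": rand_matrix(N_FILTERS, 3 * EMB_DIM),
    "lstm": {k: rand_matrix(HIDDEN, N_FILTERS + HIDDEN) for k in "ifog"},
    "out": [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)],
}

email = "please send me the quarterly report by friday".split()
p = action_probability(email, params)  # untrained, so the value is meaningless
```

With trained weights, `p` would be read as the probability that the email is a "to do" rather than a "to read" message. Because the convolution runs before the LSTM, the recurrent layer receives one feature vector per trigram rather than one per word.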

In developing this system, we followed Occam’s razor (i.e., the law of parsimony), a problem-solving principle stating that entities should not be multiplied without necessity. Chapter four presents our efforts to simplify the above model by dropping the CNN layer and showing that fine-tuning a pre-trained language model on the Enron email dataset can achieve comparable results. To the best of our knowledge, this is the first attempt to use transfer learning to develop a deep learning model in the email domain. Finally, we show that we can even drop the LSTM layer by representing each email’s sentences using contextual word/sentence embeddings. Our experimental results with three different types of embeddings, namely context-free word embeddings (word2vec and GloVe), contextual word embeddings (ELMo and BERT), and sentence embeddings (DAN-based Universal Sentence Encoder and Transformer-based Universal Sentence Encoder), suggest that ELMo embeddings produce the best result. We achieved an accuracy of 90.10% with ELMo, compared with word2vec (82.02%), BERT (58.08%), DAN-based USE (86.66%), and Transformer-based USE (88.16%).
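The idea of replacing the recurrent layer with fixed embeddings can be sketched as follows. The abstract does not say how sentence vectors are combined or which classifier sits on top, so this sketch assumes mean-pooling and plain logistic regression. The `toy_sentence_vector` function is a hypothetical placeholder for a real encoder such as ELMo or the Universal Sentence Encoder, and the four example emails are invented for illustration.

```python
import math

DIM = 16  # illustrative vector size, not taken from the dissertation

def toy_sentence_vector(sentence):
    # Hypothetical stand-in for a pre-trained sentence encoder: a deterministic
    # bucketed word count, NOT a real contextual embedding.
    vec = [0.0] * DIM
    for word in sentence.lower().split():
        vec[sum(ord(ch) for ch in word) % DIM] += 1.0
    return vec

def email_vector(sentences):
    # Assumption: mean-pool sentence vectors into one fixed-size email vector.
    vecs = [toy_sentence_vector(s) for s in sentences]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def predict(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.1):
    # Logistic regression fitted with stochastic gradient descent.
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Invented toy data: label 1 = action required ("to do"), 0 = no action ("to read").
emails = [
    (["Please send the report.", "I need it by Friday."], 1),
    (["Can you review my draft?"], 1),
    (["The office is closed on Monday."], 0),
    (["Here are the meeting notes."], 0),
]
data = [(email_vector(sentences), label) for sentences, label in emails]
weights, bias = train(data)
scores = [predict(weights, bias, x) for x, _ in data]
```

In this arrangement no sequence model is trained at all: the encoder is assumed to carry the contextual information, and only the small classifier on top is fitted to the email labels.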