Fawad Kirmani

Date of Award

Fall 2018

Document Type

Open Access Dissertation


Computer Science and Engineering

First Advisor

John Rose


The goal of this work is to improve proteotypic peptide prediction with lower processing time and better efficiency. Proteotypic peptides are the peptides in a protein sequence that can be confidently observed by mass spectrometry-based proteomics. One of the most widely used methods for identifying peptides is tandem mass spectrometry (MS/MS). The peptides that need to be identified are compared with the accurate mass and elution time (AMT) tag database. The AMT tag database helps reduce processing time and increases the accuracy of the identified peptides. Prediction of proteotypic peptides for AMT studies has improved rapidly in recent years, using amino acid properties such as charge, code, solubility and hydropathy.

We describe an improved version of a support vector machine (SVM) classifier that achieves classification sensitivity, specificity and AUC on the Yersinia pestis, Saccharomyces cerevisiae and Bacillus subtilis str. 168 datasets similar to those described by Webb-Robertson et al. [15] and Ahmed Alqurri [11]. The improved version of the SVM classifier uses a C++ SVM library instead of the MATLAB built-in library. We describe how we achieved these similar results with much less processing time.

Furthermore, we tested four machine learning classifiers on the Yersinia pestis, Saccharomyces cerevisiae and Bacillus subtilis str. 168 data. We performed feature selection from scratch, using four different algorithms, to achieve better results from the different machine learning algorithms. Some of these classifiers, using fewer features, gave results similar to or better than those of the SVM classifiers. We describe the results of these four classifiers with different feature sets.