Faculty Publications

Taming Wild High Dimensional Text Data with a Fuzzy Lash

Amir Karami, University of South Carolina

Document Type

Conference Proceeding

Abstract

The bag of words (BOW) model represents a corpus as a matrix whose elements are word frequencies. However, each row of the matrix is a very high-dimensional, sparse vector. Dimension reduction (DR) is a popular way to address sparsity and high dimensionality. Among the strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular one that maps all words onto a new basis to represent the BOW. The recent growth of text data and its challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse the BOW matrix and provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces better performance and features than Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
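
The general idea of using fuzzy clustering for dimension reduction can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes plain fuzzy c-means with fuzzifier m = 2, a made-up six-word vocabulary, and four toy documents, and it treats each document's vector of cluster memberships as that document's reduced representation. The function name `fuzzy_c_means` and all of its parameters are hypothetical.

```python
import random

def fuzzy_c_means(rows, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means (illustrative assumption, not the paper's code).

    Returns (memberships, centers); memberships[i][k] is the degree to
    which row i belongs to cluster k, and each row of memberships sums to 1.
    """
    rng = random.Random(seed)
    n_dims = len(rows[0])
    # Random initial memberships, normalized per row.
    memberships = []
    for _ in rows:
        raw = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(raw)
        memberships.append([v / total for v in raw])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the rows weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            centers.append([
                sum(w * row[d] for w, row in zip(weights, rows)) / total
                for d in range(n_dims)
            ])
        # Membership update from squared Euclidean distances to the centers.
        updated = []
        for row in rows:
            dist2 = [
                sum((x - c) ** 2 for x, c in zip(row, center)) + 1e-12
                for center in centers
            ]
            inv = [d ** (-1.0 / (m - 1.0)) for d in dist2]
            total = sum(inv)
            updated.append([v / total for v in inv])
        memberships = updated
    return memberships, centers

# Toy BOW matrix (hypothetical data): rows are documents, columns are counts
# for the words ["fuzzy", "cluster", "matrix", "goal", "match", "team"].
bow = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 1, 2, 3, 2],
]

# Each 6-dimensional document vector collapses to 2 membership degrees.
reduced, _ = fuzzy_c_means(bow, n_clusters=2)
for doc_id, degrees in enumerate(reduced):
    print(doc_id, [round(v, 3) for v in degrees])
```

In this sketch the reduced matrix has one column per cluster rather than one per word, so its width is chosen by the analyst instead of being fixed by the vocabulary size.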

Digital Object Identifier (DOI)

https://doi.org/10.1109/ICDMW.2017.73

Publication Info

2017 IEEE International Conference on Data Mining Workshops (ICDMW), 2017.

APA Citation

Karami, A. (2017). Taming wild high dimensional text data with a fuzzy lash. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 518-522. https://doi.org/10.1109/ICDMW.2017.73
