Document Type

Article

Abstract

Many text mining tasks, such as text retrieval, text summarization, and text comparison, depend on extracting representative keywords from the main text. Most existing keyword extraction algorithms rely on a discrete, bag-of-words representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA) based on the distributed Skip-gram model for patent classification. We also develop a set of quantitative performance measures for evaluating keyword extraction, based on information gain and on cross-validation with Support Vector Machine (SVM) classification; these measures are valuable when human-annotated keywords are not available. We used a standard benchmark dataset and a self-built patent dataset to evaluate the performance of PKEA. Our patent dataset includes 2500 patents from five distinct technological fields related to autonomous cars (GPS systems, lidar systems, object recognition systems, radar systems, and vehicle control systems). We compared our method with Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and Rapid Automatic Keyword Extraction (RAKE). The experimental results show that the proposed algorithm provides a promising approach to extracting keywords from patent texts for patent classification.
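The abstract mentions an information-gain-based measure for evaluating extracted keywords when no human annotations exist. The paper's exact formulation is not given on this page, so the following is only a minimal sketch that assumes the textbook definition: a keyword is scored by how much knowing whether it occurs in a patent reduces the entropy of the patent's technological field, and a keyword set is scored by the mean over its members. The function names and the toy data are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, keyword):
    """Textbook IG of the binary feature 'keyword occurs in the document'.

    Assumption: docs is a list of token lists (or sets), one per patent,
    and labels holds each patent's technological field.
    IG = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)].
    """
    n = len(labels)
    with_kw = [y for d, y in zip(docs, labels) if keyword in d]
    without_kw = [y for d, y in zip(docs, labels) if keyword not in d]
    conditional = (len(with_kw) / n) * entropy(with_kw) + \
                  (len(without_kw) / n) * entropy(without_kw)
    return entropy(labels) - conditional


def mean_information_gain(docs, labels, keywords):
    """Hypothetical score for a whole keyword set: mean IG of its members."""
    return sum(information_gain(docs, labels, k) for k in keywords) / len(keywords)


if __name__ == "__main__":
    # Toy, made-up "patents" from two of the five fields named in the abstract.
    docs = [["lidar", "laser", "point"], ["lidar", "scan"],
            ["radar", "doppler"], ["radar", "antenna"]]
    labels = ["lidar", "lidar", "radar", "radar"]
    for kw in ["lidar", "scan", "antenna"]:
        print(kw, round(information_gain(docs, labels, kw), 4))
    print("set score", mean_information_gain(docs, labels, ["lidar", "radar"]))
```

On the toy data, `lidar` separates the two fields perfectly and scores 1 bit, while `scan` occurs in only one lidar patent and scores about 0.31 bits. Under this assumed measure, an extraction method whose keywords score higher would be considered more discriminative for classification.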

Publication Info

Published in Entropy, Volume 20, Issue 2, 2018, pages 1-19.

Rights

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Hu, J., Li, S., Yao, Y., Yu, L., Yang, G., & Hu, J. (2018). Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification. Entropy, 20(2), 104. doi: 10.3390/e20020104

Included in

Computer Engineering Commons, Computer Sciences Commons

Faculty Publications

Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification