Wijesiriwardene: 0000-0001-8431-8443

Wickramarachchi: 0000-0001-5810-1849

Shalin: 0000-0001-8135-2793

Sheth: 0000-0002-0021-5293

Document Type



Switching from an analogy pedagogy based on comprehension to one based on production makes manual scoring of analogies impractical. Conventional symbol-matching approaches to computational analogy evaluation focus on positive cases and strain computational feasibility. This work presents the Discriminative Analogy Features (DAF) pipeline to identify the discriminative features of strong and weak long-form text analogies. We introduce four feature categories (semantic, syntactic, sentiment, and statistical) used with supervised vector-based learning methods to discriminate between strong and weak analogies. An SVM trained on a modestly sized vector of engineered features attains a macro F1 score of 0.67. Although the single most discriminative feature is semantic, most of the top 15 discriminative features are syntactic. Combining these engineered features with an ELMo-generated embedding improves classification relative to the embedding alone. An unsupervised K-Means clustering approach falls short, but shows similar hints of improvement when its inputs include the engineered features used in supervised learning.

APA Citation

Wijesiriwardene, T., Wickramarachchi, R., Shalin, V. L., & Sheth, A. P. (2022). Towards Efficient Scoring of Student-generated Long-form Analogies in STEM. ICCBR Analogies’22: Workshop on Analogies: from Theory to Applications at ICCBR-2022.