ORCID iD
Wijesiriwardene: 0000-0001-8431-8443
Wickramarachchi: 0000-0001-5810-1849
Shalin: 0000-0001-8135-2793
Sheth: 0000-0002-0021-5293
Document Type
Workshop
Abstract
Switching from an analogy pedagogy based on comprehension to an analogy pedagogy based on production raises the problem that manual analogy scoring becomes impractical. Conventional symbol-matching approaches to computational analogy evaluation focus on positive cases and strain computational feasibility. This work presents the Discriminative Analogy Features (DAF) pipeline to identify the discriminative features of strong and weak long-form text analogies. We introduce four feature categories (semantic, syntactic, sentiment, and statistical) used with supervised vector-based learning methods to discriminate between strong and weak analogies. Using a modestly sized vector of engineered features with an SVM attains a 0.67 macro F1 score. While a semantic feature is the most discriminative, most of the top 15 discriminative features are syntactic. Combining these engineered features with an ELMo-generated embedding improves classification relative to an embedding alone. While an unsupervised K-Means clustering-based approach falls short, similar hints of improvement appear when its inputs include the engineered features used in supervised learning.
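For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1 score is computed for a two-class strong/weak task: F1 is calculated per class and then averaged without weighting by class size. The function name `macro_f1` and the labels below are hypothetical and made up for illustration; they are not the paper's data or code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, not taken from the paper.
y_true = ["strong", "strong", "strong", "strong", "weak", "weak"]
y_pred = ["strong", "strong", "strong", "weak", "weak", "strong"]
print(macro_f1(y_true, y_pred))  # 0.625: mean of 0.75 (strong) and 0.5 (weak)
```

Because each class contributes equally to the average, a classifier cannot reach a high macro F1 by favoring the more frequent class alone.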
Publication Info
Published in ICCBR Analogies’22: Workshop on Analogies: from Theory to Applications at ICCBR-2022, 2022.
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
APA Citation
Wijesiriwardene, T., Wickramarachchi, R., Shalin, V. L., & Sheth, A. P. (2022). Towards Efficient Scoring of Student-generated Long-form Analogies in STEM. ICCBR Analogies’22: Workshop on Analogies: from Theory to Applications at ICCBR-2022.