TDLR: Top (Semantic)-Down (Syntactic) Language Representation
Language understanding involves processing text in light of both the grammatical and the common-sense context of its fragments. The sentence “I went to the grocery store and brought home a car” requires both grammatical (syntactic) and common-sense (semantic) context to capture its oddity. Contextualized text representations learned by Language Models (LMs) are expected to capture a variety of syntactic and semantic contexts from large training corpora. Recent work such as ERNIE has shown that infusing knowledge contexts into LMs, where such contexts are available, yields significant performance gains on General Language Understanding Evaluation (GLUE) benchmark tasks. However, to our knowledge, no knowledge-aware model has attempted to infuse knowledge through top-down, semantics-driven syntactic processing (e.g., common-sense to grammatical) while operating directly on the attention mechanism that LMs use to learn the data context. We propose a learning framework, Top-Down Language Representation (TDLR), to infuse common-sense semantics into LMs. In our implementation, we build on BERT for its rich syntactic knowledge and use the knowledge graphs ConceptNet and WordNet to infuse semantic knowledge.
Published in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, 2022.
© The Authors, 2022
Rawte, V., Chakraborty, M., Roy, K., Gaur, M., Faldu, K., Kikani, P., Akbari, H., & Sheth, A. (2022). TDLR: Top (Semantic)-Down (Syntactic) Language Representation.