Although Named Entity Recognition (NER) is essential for identifying critical elements of unstructured content, generic NER tools remain limited in recognizing domain-specific entities, such as those in drug use and public health. For such high-impact areas, accurately capturing relevant entities at a granular level is critical, as this information influences real-world processes. However, training NER models for a specific domain without handcrafted features requires an extensive amount of labeled data, which is expensive in human effort and time. In this study, we employ distant supervision with a domain-specific ontology to reduce the need for human labor, and we train models that incorporate domain-specific (e.g., drug use) external knowledge to recognize domain-specific entities. We capture entities related to drug use and their trends in government epidemiology reports, with an improvement of 8% in F1-score.
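The general idea of distant supervision can be illustrated with a minimal sketch, assuming a simple dictionary lookup: terms from a knowledge source are matched against raw text to produce BIO tags automatically, and those tags then serve as noisy training labels for an NER model. This is not the pipeline used in the study; the lexicon entries, the label names (`DRUG`, `EFFECT`), and the functions `tokenize` and `distant_labels` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon standing in for a domain ontology.
# The terms and labels are illustrative assumptions, not the study's ontology.
LEXICON = {
    "fentanyl": "DRUG",
    "heroin": "DRUG",
    "crystal meth": "DRUG",
    "overdose": "EFFECT",
}


def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def distant_labels(tokens, lexicon):
    """Assign BIO tags by longest-first, case-insensitive lexicon matching."""
    # Try longer terms first so "crystal meth" wins over any shorter overlap.
    entries = sorted(
        ((term.lower().split(), label) for term, label in lexicon.items()),
        key=lambda entry: -len(entry[0]),
    )
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for words, label in entries:
            n = len(words)
            if lowered[i:i + n] == words:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            # No lexicon term starts at this token; leave it tagged "O".
            i += 1
    return tags


tokens = tokenize("Crystal meth and fentanyl overdose reports rose.")
tags = distant_labels(tokens, LEXICON)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Labels produced this way are noisy, since exact matching misses slang and misspellings and cannot disambiguate context, which is why they are used as weak supervision for a trained model and not as final output.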
Digital Object Identifier (DOI): 10.3233/SHTI220048
Published in Studies in Health Technology and Informatics, Volume 290, 2022, pages 140-144.
© 2022 International Medical Informatics Association (IMIA) and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
Bajaj, G., Kursuncu, U., Gaur, M., Lokala, U., Hyder, A., Parthasarathy, S., & Sheth, A. (2022). Knowledge-Driven Drug-Use NamedEntity Recognition with Distant Supervision. MEDINFO 2021: One World, One Health – Global Partnership for Digital Innovation, 140–144. https://doi.org/10.3233/SHTI220048