A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs

While contemporary semantic search systems promise to improve on classical keyword-based search, they are not always adequate for complex, domain-specific information needs. Some complex search situations require knowledge of both ontological concepts and 'intelligible constructs' not typically modeled in ontologies. Intelligible constructs convey essential information that may be important to the holistic needs of information seekers. Such constructs may include notions of intensity, frequency, interval, dosage, emotion, sentiment, equivalence, synonymy, negation, and parts of speech. However, few search systems use both structured background knowledge (ontologies) and these intelligible constructs for query interpretation in domain-specific searches. Instead, they rely heavily on ontological knowledge to address search tasks. Given that a vast amount of information is expressed in unstructured form and is therefore not suitable for formal representation in ontologies, there is a clear misalignment between the information needs of users and the knowledge models developed to meet such needs. To address this problem, we present a hybrid approach to domain-specific information retrieval that goes beyond both ontology-driven query interpretation and the synonym-based query expansion used in Information Retrieval (IR) to address complex searches. This hybrid approach is particularly effective in searches that involve social media (i.e., web forum posts), in which ontology incompleteness may significantly limit effective query interpretation and information retrieval. Unlike state-of-the-art semantic search and hybrid search applications, our approach interprets four distinct types of data elements when searching social media for domain-specific information.
These data elements include: 1) ontological concepts; 2) concepts in lexicons (such as emotions and sentiments); 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes and methods of administration); and 4) expressions derived solely through rules (such as date, time, interval, frequency, and dosage). Specifically, our hybrid framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of textual patterns that belong to the same broad template (i.e., are isomorphic), and 2) a low-level CFG that enables interpretation of the specific expressions that may constitute such textual patterns. Our approach is embodied in a novel Semantic Web platform for prescription drug abuse epidemiology called PREDOSE, which, when applied to a corpus of over 1 million web forum posts discussing prescription drug abuse, proved effective in retrieving relevant documents for complex information needs.
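The two levels of interpretation described above can be illustrated with a small sketch. This is not the PREDOSE grammar: the nonterminal names, word lists, and patterns below are hypothetical, and the low-level rules are approximated with regular expressions rather than a full CFG parser. The low level recognizes specific expressions (a dosage, a frequency, a drug name), while the top level treats a template as an ordered sequence of nonterminals, so that differently worded posts with the same structure match the same template.

```python
import re

# Hypothetical low-level rules: each nonterminal maps to a pattern that
# recognizes concrete expressions. The word lists are illustrative stand-ins
# for an ontology (DRUG) and a lexicon (EMOTION); DOSAGE and FREQUENCY are
# rule-derived expressions.
LOW_LEVEL = {
    "DRUG": re.compile(r"\b(?:suboxone|oxycodone|loperamide)\b", re.I),
    "EMOTION": re.compile(r"\b(?:happy|anxious|great|awful)\b", re.I),
    "DOSAGE": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|ml|g)\b", re.I),
    "FREQUENCY": re.compile(
        r"\b(?:once|twice|\d+ times) (?:a|per) (?:day|week)\b", re.I
    ),
}


def tag(text):
    """Return (start, nonterminal, matched_text) spans sorted by position."""
    spans = []
    for name, pattern in LOW_LEVEL.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), name, match.group(0)))
    return sorted(spans)


def matches_template(text, template):
    """True if the template's nonterminals occur in order in text.

    Other words and tagged spans may appear between them; this is the
    simplified, assumed notion of a top-level template used in this sketch.
    """
    labels = iter(name for _, name, _ in tag(text))
    # `symbol in labels` advances the iterator, so order is enforced.
    return all(symbol in labels for symbol in template)


posts = [
    "I took 8mg of Suboxone twice a day and felt great",
    "Suboxone is a drug",
]
template = ["DOSAGE", "DRUG", "FREQUENCY"]
hits = [post for post in posts if matches_template(post, template)]
```

Here only the first post satisfies the template, because it contains a dosage, a drug mention, and a frequency expression in that order. A real system would replace the word lists with ontology and lexicon lookups and the ordered-sequence check with a proper grammar.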

APA Citation

Cameron, D. H., Sheth, A. P., Jaykumar, N., Anand, G., Thirunarayan, K., & Smith, G. A. (2013). A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs.