Document Type

Open Access Dissertation


Background: The healthcare industry is placing growing emphasis on demonstrating meaningful use of Electronic Health Record (EHR) systems. As rates of chronic disease, including diabetes mellitus (DM), rise, it has become clear that the technologies now available to clinicians could greatly improve the accuracy and timeliness of disease surveillance. As the Centers for Medicare and Medicaid Services (CMS) meaningful use incentive program deadlines fast approach, it remains unclear whether the program's limited attestation criteria reflect its end goal of improving patient care. The objective of this research was to determine the diagnostic accuracy of an automated text-based algorithm for identifying patients with diabetes mellitus in the longitudinal PPRNet database.

Methods: The longitudinal PPRNet database comprises data from users of McKesson’s Practice Partner, Lytec, or Medisoft EHR systems nationwide. The analysis included data from the 115 PPRNet practices that submitted their 4th quarter data extract in January 2014. An unstructured free-text algorithm was used to determine the number of patients with type 2 diabetes among all active adult patients. This algorithm, which examines unstructured free-text data documented within the EHR title lines, was compared to a previously established protocol that used a combination of ICD-9 diagnostic codes and/or active DM prescriptions.

Results: The set of patients identified as having diabetes varied considerably across the algorithm comparisons. Using the combination of ICD-9 diagnostic codes and/or active DM prescriptions as the comparison method, the free-text definition had a sensitivity of 77.8% and a specificity of 97.2%. Using diagnostic codes alone as the standard for comparison resulted in a much higher sensitivity (99.3%) and a lower specificity (91.9%). However, when the free-text definition was compared to the ICD-9 diagnostic codes alone, 70% of the cases identified by free text were found to be uncoded.
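
For readers less familiar with these measures, the following is a minimal sketch of how sensitivity, specificity, and the share of un-coded free-text cases can be computed once each patient carries two yes/no flags: one from the free-text definition and one from the comparison method. It is not the code used in this research. The names (Confusion, tabulate, free_text_flags, reference_flags) and the ten-patient toy data are hypothetical and chosen only for illustration, and the sketch assumes the comparison method labels at least one patient positive and at least one negative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 table: index definition (free text) versus comparison method."""
    tp: int  # flagged by both definitions
    fp: int  # flagged by free text only
    fn: int  # flagged by the comparison method only
    tn: int  # flagged by neither


def tabulate(index_flags, reference_flags):
    """Count agreement cells from two equal-length sequences of booleans."""
    if len(index_flags) != len(reference_flags):
        raise ValueError("both definitions must cover the same patients")
    tp = fp = fn = tn = 0
    for idx, ref in zip(index_flags, reference_flags):
        if idx and ref:
            tp += 1
        elif idx and not ref:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def sensitivity(c):
    """Share of comparison-method positives that the index definition finds."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Share of comparison-method negatives that the index definition leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def unconfirmed_fraction(c):
    """Share of index-definition positives that the comparison method lacks."""
    return c.fp / (c.tp + c.fp)


# Hypothetical toy data for ten patients (NOT study data).
free_text_flags = [True, True, True, False, False, False, False, False, True, False]
reference_flags = [True, True, False, False, False, False, False, False, True, True]

table = tabulate(free_text_flags, reference_flags)
print(f"sensitivity = {sensitivity(table):.3f}")
print(f"specificity = {specificity(table):.3f}")
print(f"unconfirmed = {unconfirmed_fraction(table):.3f}")
```

The third function shows why a high sensitivity and a large un-coded share can coexist: sensitivity is measured against the patients the comparison method flags, while the un-coded share is measured against the patients the free-text definition flags.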

Conclusions: As EHR use continues to rise, it is crucial to keep developing ways to accurately translate patient data out of these systems so that these powerful technologies can be used meaningfully. This thesis has helped clarify the need for further development of accurate data translation platforms, both to capture each patient’s full and unique health story and to monitor treatment and outcomes, all while minimizing physician burden.


© 2014, Vanessa L. Congdon