Background: Melena and hematochezia are two distinct forms of bleeding associated with upper gastrointestinal (GI) and lower GI disease, respectively. Diagnostic codes are not available to distinguish between them in structured electronic health record (EHR) data. Unstructured clinical notes provide an opportunity to identify and distinguish between these types of GI bleeding in real-world data sources.
Objectives: To develop an N-gram model to differentiate between melena and hematochezia encounters using unstructured clinical notes from EHRs, and to validate the model using associated diagnoses, procedures, and treatments.
Methods: This retrospective observational study used data from the OMNY Health real-world data platform (United States, 2017–2025). Patients with an International Classification of Diseases, 10th Revision (ICD-10) code starting with “K” (GI disease) or “E” (endocrine disease) were included. A clinical domain expert reviewed 500 randomly selected clinical notes from patients with K92.1 or K92.2 codes for phrases indicating melena or hematochezia. These phrases were then used to search the notes of all included patient encounters to identify melena and hematochezia encounters; encounters indicating both types of bleeding were excluded. Two validation outcomes were used: (1) the percentage of patients with an upper vs. lower GI tract disease diagnosis (ICD-10: K20–K31 vs. K50–K64) and (2) the percentage of patients undergoing upper GI endoscopy (EGD) vs. colonoscopy [identified by Current Procedural Terminology (CPT) and Healthcare Common Procedure Coding System (HCPCS) codes]. Pharmacologic treatments given procedurally (identified by HCPCS “J” codes) were also compared qualitatively. All outcomes were measured within 30 days of the corresponding encounter start date.
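The encounter-labeling and outcome steps described in the Methods can be sketched in Python as follows. This is a minimal, illustrative sketch under stated assumptions, not the study's implementation: the phrase lists are hypothetical placeholders (the abstract does not list the 18 melena and 30 hematochezia phrases), matching is assumed to be exact, case-insensitive, whole-phrase lookup, negation such as "denies melena" is not handled, and the helper names (`label_encounter`, `has_outcome_within`, `icd_category`) and the inclusive 30-day window are assumptions.

```python
import re
from datetime import date, timedelta

# HYPOTHETICAL example phrases; the study's actual phrase lists are not given.
MELENA_PHRASES = ["melena", "black tarry stool", "dark tarry stools"]
HEMATOCHEZIA_PHRASES = ["hematochezia", "bright red blood per rectum", "brbpr"]


def compile_phrases(phrases):
    # Whole-phrase, case-insensitive matching anchored on word boundaries.
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


MELENA_RE = compile_phrases(MELENA_PHRASES)
HEMATOCHEZIA_RE = compile_phrases(HEMATOCHEZIA_PHRASES)


def label_encounter(notes):
    """Return 'melena', 'hematochezia', or None for one encounter's notes.

    Encounters matching both phrase sets are excluded (None).
    Negation (e.g. "denies melena") is NOT handled in this sketch.
    """
    text = "\n".join(notes)
    melena = bool(MELENA_RE.search(text))
    hematochezia = bool(HEMATOCHEZIA_RE.search(text))
    if melena and hematochezia:
        return None  # both bleeding types indicated: exclude
    if melena:
        return "melena"
    if hematochezia:
        return "hematochezia"
    return None  # no bleeding phrase found


def has_outcome_within(start, event_dates, days=30):
    # True if any event date falls in [start, start + days].
    # Treating both ends as inclusive is an ASSUMPTION.
    end = start + timedelta(days=days)
    return any(start <= d <= end for d in event_dates)


def icd_category(code):
    # Map an ICD-10 code to "upper" (K20-K31), "lower" (K50-K64), or None,
    # using its three-character category, e.g. "K25.4" -> "K25".
    cat = code.strip().upper()[:3]
    if len(cat) != 3 or cat[0] != "K" or not cat[1:].isdigit():
        return None
    n = int(cat[1:])
    if 20 <= n <= 31:
        return "upper"
    if 50 <= n <= 64:
        return "lower"
    return None
```

For example, `label_encounter(["BRBPR noted this morning."])` returns `"hematochezia"`, while notes matching both phrase sets return `None` and would be excluded. A production pipeline would likely also need negation detection and note-section handling, which this sketch omits.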
Results: Initial review yielded 18 and 30 phrases indicating melena and hematochezia, respectively, with final patient cohort sizes of 41,226 and 43,924. Patients with melena were more likely to have a diagnosis code for upper GI disease than for lower GI disease (35.8% vs. 24.7%), while the opposite was true for patients with hematochezia (14.8% vs. 75.1%). Among patients with melena, 27.8% underwent EGD and 8.1% underwent colonoscopy, compared with 8.6% and 14.5%, respectively, among patients with hematochezia. Propofol was the most commonly administered medication in both cohorts (43.6% for melena; 29.0% for hematochezia).
Conclusions: Compared with patients with hematochezia, patients with melena were more likely to have an associated upper GI diagnosis and less likely to have a lower GI diagnosis, and more likely to undergo EGD and less likely to undergo colonoscopy, in line with expectations. These findings support the validity of the N-gram model used to differentiate between episodes of melena and hematochezia using unstructured EHR data.