Society for Surgery of the Alimentary Tract

ARTIFICIAL INTELLIGENCE PERFORMANCE ON PATIENT-LEVEL EDUCATION OF INFLAMMATORY BOWEL DISEASE
Makenna Marty*, Seija Maniskas, Jonathan Zuo, Adam Truong
Colon and Rectal Surgery, Huntington Health, Los Angeles, CA

Purpose/Background
Artificial intelligence (AI), though still in the early stages of healthcare adoption, shows promise as a valuable resource for patients due to its broad accessibility. However, its accuracy and reliability in providing medical information are largely untested. We aim to evaluate the ability of ChatGPT, the most widely recognized AI platform, to provide accurate patient-level information for Crohn’s disease (CD) and ulcerative colitis (UC). Our hypothesis is that ChatGPT can provide some accurate information but will not be able to distinguish between reliable medical sources and disinformation.

Methods
A prospective, comparative study was conducted evaluating ChatGPT’s responses against validated medical literature sources, including PubMed and UpToDate, for CD and UC. Information quality and accuracy for each topic of interest were assessed using the Modified Ensuring Quality Information for Patients (EQIP) tool, a quality assessment scoring system that evaluates the reliability of written healthcare communication for patients. The Modified EQIP consists of 36 standardized items within three domains: content, identification, and structure. AI answers were recorded and awarded 1 point per Modified EQIP item for correct and complete answers, and no points for incorrect, incomplete, or contradictory answers or for not applicable items. The process was repeated 3 times per EQIP item per topic. All answers were independently assessed by multiple authors and aggregated.
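As an illustration of the scoring scheme described above, the following Python sketch shows how per-item Modified EQIP points might be pooled into domain and overall percentages. The data structure, item counts per domain, and point values are hypothetical, not the authors' actual scoring code.

```python
from statistics import mean

# Hypothetical scoring records. Each Modified EQIP item is scored 1 point
# (correct and complete) or 0 points (incorrect, incomplete, contradictory,
# or not applicable), repeated 3 times per item per topic, by each reviewer.
# scores[topic][domain] holds the pooled 0/1 points; values are illustrative.
scores = {
    "CD": {
        "content":        [1, 1, 0, 1, 0, 1],
        "identification": [0, 0, 1, 0, 0, 1],
        "structure":      [1, 1, 1, 0, 1, 1],
    },
    "UC": {
        "content":        [1, 0, 1, 1, 0, 1],
        "identification": [0, 1, 0, 0, 0, 1],
        "structure":      [1, 1, 0, 1, 1, 1],
    },
}

def pct(points: list[int]) -> float:
    """Percentage of available Modified EQIP points that were awarded."""
    return 100.0 * mean(points)

# Per-topic, per-domain scores.
for topic, domains in scores.items():
    for domain, points in domains.items():
        print(f"{topic} {domain}: {pct(points):.0f}%")

# Overall aggregate: pool every point across topics, domains, and reviewers.
all_points = [p for d in scores.values() for pts in d.values() for p in pts]
print(f"Overall: {pct(all_points):.0f}%")
```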

Results
ChatGPT had an overall aggregate score of 55% on the Modified EQIP across all topics, domains, and reviewers. It scored highest in the content domain (60% overall), which covers informational questions, and consistently lowest in the identification domain (41% overall), where the Modified EQIP emphasizes questions about sources (Figure 1). ChatGPT performed similarly for CD and UC (56% and 54%, respectively; P > 0.99). Points were deducted mainly for the inclusion of non-evidence-based treatments and the failure to provide sources for its information.
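The abstract does not name the statistical test behind P > 0.99. As one hedged reconstruction, a two-sided Fisher's exact test on pooled awarded/not-awarded point counts for CD versus UC could look like the sketch below; the counts are hypothetical, chosen only to approximate the reported 56% and 54%.

```python
from scipy.stats import fisher_exact

# Hypothetical pooled counts (points awarded vs. not awarded) consistent
# with the reported 56% (CD) and 54% (UC) scores; 108 = 36 items x 3
# repetitions. The abstract does not specify which test produced P > 0.99.
cd_awarded, cd_total = 60, 108   # 60/108 ~ 56%
uc_awarded, uc_total = 58, 108   # 58/108 ~ 54%

table = [
    [cd_awarded, cd_total - cd_awarded],
    [uc_awarded, uc_total - uc_awarded],
]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"Fisher's exact test: OR={odds_ratio:.2f}, P={p_value:.2f}")
```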

Conclusion/Discussion
ChatGPT appears to deliver mostly accurate information at an accessible reading and comprehension level. However, it demonstrated notable deficiencies, including the recommendation of treatment modalities that are not evidence-based and failure to produce reliable resources and links. It also lacked important nuance in understanding inflammatory bowel disease care, such as stratification of disease severity. Given these limitations, we advise caution before recommending ChatGPT as a reliable source of medical information for inflammatory bowel disease.

