AI OR DOCTORS? COMPARING RESPONSES TO CANCER PATIENTS' INQUIRIES FROM A PUBLIC SOCIAL MEDIA FORUM
Muhammad Anees*1, Edward A. Joseph1, Abiha Abdullah4, Nathan Bahary3, Patrick Wagner3, Nabil Wasif2, Zhi Ven Fong2, David Bartlett3, Casey Allen3

1Allegheny-Singer Research Institute, Pittsburgh, PA; 2Mayo Clinic Arizona, Scottsdale, AZ; 3Allegheny Health Network, Pittsburgh, PA; 4UPMC, Pittsburgh, PA
Introduction
Publicly accessible large language models (LLMs), including ChatGPT, are increasingly used by patients to access healthcare information. However, their ability to address cancer-related inquiries remains unclear. We compared the quality, empathy, and readability of responses to questions regarding gastrointestinal (GI) cancers between physicians and a popular LLM.
Methods
Cancer patient queries were identified from Reddit's r/AskDocs forum. Fifteen questions specific to pancreatic, liver, esophageal, stomach, and colon cancer were selected using predefined criteria. For each query, the corresponding verified physician response was matched with a response generated by ChatGPT. The readability of each response was assessed using the Flesch Reading Ease Score. Twelve US board-certified medical and surgical oncologists, blinded to the source of each response, independently evaluated the quality and empathy of both the LLM and physician responses using a 5-point Likert scale. Additionally, ratings were compared between medical and surgical oncologists, and between early-career (<10 years) and late-career (>10 years) oncologists.
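As a point of reference for the readability metric, a minimal sketch of the standard Flesch Reading Ease formula is shown below. The abstract does not state which tool was used to compute the score, and the vowel-group syllable counter and function names here are illustrative assumptions rather than the study's actual pipeline.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic (assumption): count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Standard formula: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text) or ["placeholder"]
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Lower scores indicate harder-to-read text, as reported for the LLM responses.
print(round(flesch_reading_ease("The tumor was resected. Recovery was uneventful."), 1))
```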
Results
Overall, the readability of LLM responses was lower than that of physician responses (30.9±9.7 vs. 58.8±10.3, p<0.001), a difference consistent across disease sites. When evaluated by cancer specialists, LLM responses were preferred over physician responses in 78.3% (141/180) of assessments. LLM responses were rated higher in quality, with 80.6% (145/180) deemed ‘good’ or ‘very good’ compared to 35% (63/180) of physician responses (p<0.001); mean quality scores were 4.24±0.79 for the LLM vs. 2.99±1.16 for physicians (p<0.001). LLM responses were also rated higher in empathy, with 82.2% (148/180) rated ‘empathetic’ or ‘very empathetic’ compared to 17.8% (32/180) of physician responses (p<0.001); mean empathy scores were 4.16±1.04 for the LLM vs. 2.03±1.30 for physicians (p<0.001). These differences in quality and empathy were consistent across all disease sites (Figure). Assessments did not differ by evaluator specialty or experience.
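The abstract does not name the statistical tests behind the reported p-values. A minimal sketch, assuming a two-sample comparison of mean Likert scores and a chi-squared test on the ‘good or very good’ proportions, could look like the following; the rating arrays are simulated placeholders, not study data.

```python
import numpy as np
from scipy import stats

# Placeholder 1-5 Likert quality ratings (180 assessments per arm, as in the study).
llm_ratings = np.random.default_rng(0).integers(3, 6, size=180)
physician_ratings = np.random.default_rng(1).integers(1, 5, size=180)

# Compare mean rating scores (Welch's t-test; the actual test used is not stated).
t_stat, p_means = stats.ttest_ind(llm_ratings, physician_ratings, equal_var=False)

# Compare proportions rated 'good' or 'very good' (score >= 4) via chi-squared test.
table = [[(llm_ratings >= 4).sum(), (llm_ratings < 4).sum()],
         [(physician_ratings >= 4).sum(), (physician_ratings < 4).sum()]]
chi2, p_prop, _, _ = stats.chi2_contingency(table)

print(f"mean comparison p={p_means:.3g}, proportion comparison p={p_prop:.3g}")
```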
Conclusion
Although LLM responses exhibit lower readability, they show potential for providing high-quality, empathetic responses to cancer patient queries from a public social media forum. These findings present opportunities to enhance LLMs and underscore their potential to improve cancer patient education and communication.