AI OR DOCTORS? COMPARING RESPONSES TO CANCER PATIENTS' INQUIRIES FROM A PUBLIC SOCIAL MEDIA FORUM
Muhammad Anees*1, Edward A. Joseph1, Abiha Abdullah4, Nathan Bahary3, Patrick Wagner3, Nabil Wasif2, Zhi Ven Fong2, David Bartlett3, Casey Allen3

1Allegheny-Singer Research Institute, Pittsburgh, PA; 2Mayo Clinic Arizona, Scottsdale, AZ; 3Allegheny Health Network, Pittsburgh, PA; 4UPMC, Pittsburgh, PA
Introduction
Publicly accessible large language models (LLMs), including ChatGPT, are increasingly used by patients to access healthcare information. However, their ability to address cancer-related inquiries remains unclear. We compared the quality, empathy, and readability of responses to questions regarding gastrointestinal (GI) cancers between physicians and a popular LLM.
Methods
Cancer patient queries were identified from Reddit's r/AskDocs forum. Fifteen questions specific to pancreatic, liver, esophageal, stomach, and colon cancer were selected using predefined criteria. For each query, the corresponding verified physician response was matched with a response generated by ChatGPT. The readability of each response was assessed using the Flesch Reading Ease Score. Twelve US board-certified medical and surgical oncologists, blinded to the source of each response, independently evaluated the quality and empathy of both the LLM and physician responses using a 5-point Likert scale. Additionally, ratings were compared between medical and surgical oncologists, and between early-career (<10 years) and late-career (>10 years) oncologists.
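As a point of reference for the readability metric, a minimal sketch of the standard Flesch Reading Ease formula is shown below. The abstract does not state which tool was used to compute the score, and the vowel-group syllable counter and function names here are illustrative assumptions rather than the study's actual pipeline.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic (assumption): count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Standard formula: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text) or ["placeholder"]
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Lower scores indicate harder-to-read text, as reported for the LLM responses.
print(round(flesch_reading_ease("The tumor was resected. Recovery was uneventful."), 1))
```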
Results
Overall, the readability of LLM responses was lower than that of physician responses (30.9±9.7 vs. 58.8±10.3, p<0.001), a difference consistent across disease sites. When evaluated by cancer specialists, LLM responses were preferred over physician responses in 78.3% (141/180) of assessments. LLM responses were rated higher in quality, with 80.6% (145/180) deemed ‘good’ or ‘very good’ compared to 35% (63/180) of physician responses (p<0.001); mean quality scores were 4.24±0.79 for the LLM vs. 2.99±1.16 for physicians (p<0.001). LLM responses were also rated higher in empathy, with 82.2% (148/180) rated ‘empathetic’ or ‘very empathetic’ compared to 17.8% (32/180) of physician responses (p<0.001); mean empathy scores were 4.16±1.04 for the LLM vs. 2.03±1.30 for physicians (p<0.001). These differences in quality and empathy were consistent across all disease sites (Figure). Assessments did not differ by evaluator specialty or experience.
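The abstract does not name the statistical tests behind the reported p-values. A minimal sketch, assuming a two-sample comparison of mean Likert scores and a chi-squared test on the ‘good or very good’ proportions, could look like the following; the rating arrays are simulated placeholders, not study data.

```python
import numpy as np
from scipy import stats

# Placeholder 1-5 Likert quality ratings (180 assessments per arm, as in the study).
llm_ratings = np.random.default_rng(0).integers(3, 6, size=180)
physician_ratings = np.random.default_rng(1).integers(1, 5, size=180)

# Compare mean rating scores (Welch's t-test; the actual test used is not stated).
t_stat, p_means = stats.ttest_ind(llm_ratings, physician_ratings, equal_var=False)

# Compare proportions rated 'good' or 'very good' (score >= 4) via chi-squared test.
table = [[(llm_ratings >= 4).sum(), (llm_ratings < 4).sum()],
         [(physician_ratings >= 4).sum(), (physician_ratings < 4).sum()]]
chi2, p_prop, _, _ = stats.chi2_contingency(table)

print(f"mean comparison p={p_means:.3g}, proportion comparison p={p_prop:.3g}")
```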
Conclusion
Although LLM responses exhibit lower readability, they show potential for providing high-quality, empathetic responses to cancer patient queries from a public social media forum. These findings present opportunities to enhance LLMs and underscore their potential to improve cancer patient education and communication.