SSAT - LARGE LANGUAGE MODELS ENABLE ACCURATE DATA EXTRACTION AND CURATION FROM SYNOPTIC RADIOLOGY REPORTS FOR PANCREATIC CYST SURVEILLANCE

Back to 2025 Abstracts

LARGE LANGUAGE MODELS ENABLE ACCURATE DATA EXTRACTION AND CURATION FROM SYNOPTIC RADIOLOGY REPORTS FOR PANCREATIC CYST SURVEILLANCE
Ankur P. Choubey^*¹, Emanuel Eguia¹, Alexander Hollingsworth², Subrata Chatterjee², Remo Alessandris¹, Misha Armstrong¹, Emily Manin¹, Lily V. Saadat¹, Jenny Flood¹, Avijit Chatterjee², Vinod Balachandran¹, Jeffrey Drebin¹, T. Peter Kingham¹, Michael D'Angelica¹, William Jarnagin¹, Alice Wei¹, Vineet S. Rolston³, Mark A. Schattner³, Kevin Soares¹
¹Hepatopancreatobiliary Surgery, Memorial Sloan Kettering Cancer Center Department of Surgery, New York, NY; ²Memorial Sloan Kettering Cancer Center, New York, NY; ³Memorial Sloan Kettering Cancer Center Department of Medicine, New York, NY

Introduction: Intraductal papillary mucinous neoplasms (IPMNs) are pre-malignant lesions that require long-term surveillance. Manual curation of radiographic features in cyst registries for data abstraction and longitudinal evaluation is time-consuming and limits widespread implementation. Our aim was to examine the accuracy and feasibility of using large language models (LLMs) to extract clinical variables from radiology reports.
Methods: A single-center retrospective study was performed including all patients under surveillance for pancreatic cysts. Five radiographic elements used to monitor cyst progression were evaluated: cyst size, main pancreatic duct (MPD) dilation ≥5 mm, MPD size, branch duct dilation, and presence of a solid component. LLMs on the OpenAI GPT-4 platform were used to extract the elements of interest with a zero-shot approach: prompts guided the annotation, and no training data were provided. A manually annotated institutional cyst database served as the gold standard for comparison and for determining accuracy, sensitivity, and specificity.
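As an illustration of the comparison against the gold standard, the following is a minimal sketch of how accuracy, sensitivity, and specificity could be computed for one categorical element (for example, presence of a solid component). It is not the authors' code: the function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are hypothetical and assume one true/false label per scan from both the manual database and the LLM.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float


def binary_metrics(gold: Sequence[bool], predicted: Sequence[bool]) -> BinaryMetrics:
    """Compare LLM-extracted labels with manually annotated (gold) labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("at least one labeled scan is required")

    # Tally the four cells of the 2x2 confusion table.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # Sensitivity and specificity are undefined when a class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(gold)
    return BinaryMetrics(accuracy, sensitivity, specificity)


# Hypothetical labels for eight scans (illustrative only, not study data).
gold_labels = [True, True, True, False, False, False, False, False]
llm_labels = [True, True, False, False, False, False, False, True]
metrics = binary_metrics(gold_labels, llm_labels)
```

When the positive class is rare, a few missed positives lower sensitivity sharply while accuracy and specificity stay high, which is why all three measures are reported together.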
Results: Overall, 3199 scans from 991 patients were included. LLMs successfully extracted the selected radiographic elements with high accuracy. Among categorical variables, LLMs demonstrated accuracy rates of 98% for MPD dilation, 95% for branch duct dilation, and 97% for solid component compared to the manually annotated database. Accuracy rates for numerical data were 91% for cyst size and 97% for MPD size. Sensitivity ranged from 72% for presence of solid component to 97% for cyst size. Specificity varied from 89% for cyst size to 99% for presence of solid component.
Conclusion: LLMs can accurately extract and curate data from synoptic radiology reports for pancreatic cyst surveillance and can be reliably used to assemble longitudinal databases. Future application of this work may potentiate the development of artificial intelligence-based surveillance models.
