Back to 2025 Abstracts
LARGE LANGUAGE MODELS ENABLE ACCURATE DATA EXTRACTION AND CURATION FROM SYNOPTIC RADIOLOGY REPORTS FOR PANCREATIC CYST SURVEILLANCE
Ankur P. Choubey*1, Emanuel Eguia1, Alexander Hollingsworth2, Subrata Chatterjee2, Remo Alessandris1, Misha Armstrong1, Emily Manin1, Lily V. Saadat1, Jenny Flood1, Avijit Chatterjee2, Vinod Balachandran1, Jeffrey Drebin1, T. Peter Kingham1, Michael D'Angelica1, William Jarnagin1, Alice Wei1, Vineet S. Rolston3, Mark A. Schattner3, Kevin Soares1
1Hepatopancreatobiliary Surgery, Memorial Sloan Kettering Cancer Center Department of Surgery, New York, NY; 2Memorial Sloan Kettering Cancer Center, New York, NY; 3Memorial Sloan Kettering Cancer Center Department of Medicine, New York, NY
Introduction: Intraductal papillary mucinous neoplasms (IPMNs) are pre-malignant lesions that require long-term surveillance. Manual curation of radiographic features in cyst registries for data abstraction and longitudinal evaluation is time-consuming and limits widespread implementation. Our aim was to examine the accuracy and feasibility of using large language models (LLMs) to extract clinical variables from radiology reports.
Methods: A single-center retrospective study was performed including all patients under surveillance for pancreatic cysts. Five radiographic elements used to monitor cyst progression were evaluated: cyst size, main pancreatic duct (MPD) dilation ≥5 mm, MPD size, branch duct dilation, and presence of a solid component. LLMs on the OpenAI GPT-4 platform were prompted to extract the elements of interest using a zero-shot approach (prompting alone, without any training data). A manually annotated institutional cyst database served as the gold standard for determining accuracy, sensitivity, and specificity.
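The abstract does not reproduce the prompt or output schema that was used. As a rough illustration only, a zero-shot extraction step for the five elements might be set up as in the Python sketch below. The field names, the prompt wording, and the helpers `build_prompt` and `parse_reply` are all hypothetical, and the call to the model itself is deliberately left out.

```python
import json

# Hypothetical field names and descriptions; the abstract does not
# give the authors' actual schema or prompt.
FIELDS = {
    "cyst_size_mm": "largest cyst diameter in millimetres (number or null)",
    "mpd_dilation": "main pancreatic duct dilated to 5 mm or more (true, false or null)",
    "mpd_size_mm": "main pancreatic duct diameter in millimetres (number or null)",
    "branch_duct_dilation": "branch duct dilation present (true, false or null)",
    "solid_component": "solid component present (true, false or null)",
}


def build_prompt(report_text):
    """Build a zero-shot prompt: instructions and schema only, no examples."""
    schema = "\n".join(f'- "{name}": {desc}' for name, desc in FIELDS.items())
    return (
        "Extract the following fields from the radiology report below. "
        "Reply with a single JSON object and use null when a value is not stated.\n"
        f"{schema}\n\nReport:\n{report_text}"
    )


def parse_reply(reply):
    """Parse the model's JSON reply, keeping only the expected fields."""
    data = json.loads(reply)
    return {name: data.get(name) for name in FIELDS}
```

In such a setup, the string returned by `build_prompt` would be sent to the model once per report, and `parse_reply` would turn each answer into a row that can be compared against the manually annotated database.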
Results: Overall, 3199 scans from 991 patients were included. LLMs successfully extracted the selected radiographic elements with high accuracy. Among categorical variables, LLMs demonstrated accuracy rates of 98% for MPD dilation, 95% for branch duct dilation, and 97% for solid component compared to the manually annotated database. Accuracy rates for numerical data were 91% for cyst size and 97% for MPD size. Sensitivity ranged from 72% for presence of solid component to 97% for cyst size. Specificity varied from 89% for cyst size to 99% for presence of solid component.
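For a present/absent variable such as solid component, the three reported measures follow from the usual confusion-matrix counts against the gold standard. The Python sketch below shows that calculation on made-up labels, not study data. The function name `binary_metrics` is hypothetical, and the abstract does not state how sensitivity and specificity were defined for the numerical variables, so only the binary case is covered.

```python
def binary_metrics(gold, pred):
    """Accuracy, sensitivity and specificity of pred against gold.

    gold and pred are equal-length sequences of booleans
    (True = feature present). A measure is None when its
    denominator is zero.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = sum(1 for g, p in zip(gold, pred) if g and p)          # true positives
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)  # true negatives
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)      # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)      # false negatives

    def ratio(num, den):
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels for illustration only (not taken from the study).
gold = [True, True, False, False, False]
pred = [True, False, False, False, True]
metrics = binary_metrics(gold, pred)
```

When a feature is rare, accuracy and specificity can remain high while sensitivity is noticeably lower, because only a few missed positives are needed to reduce it.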
Conclusion: LLMs can accurately extract and curate data from synoptic radiology reports for pancreatic cyst surveillance and can be reliably used to assemble longitudinal databases. Future application of this work may potentiate the development of artificial intelligence-based surveillance models.