Back to Annual Meeting Program
Automatic Cancer Staging for Esophageal Pathological Reports by Text Mining and Data Mining—Comparison Between AJCC 6th and 7th Editions
Yung-Han Sun*1,2, Chih-Cheng Hsieh1,3, Chun-Hsien Chen1,2, Shih-Wei Lin1,2
1Department of Surgery, Taipei-Veterans General Hospital, Taipei, Taiwan; 2Department of Information Management, Chang Gung University, Taoyuan, Taiwan; 3School of Medicine, National Yang-Ming University, Taipei, Taiwan
Background: Manual cancer staging from pathological reports is very time-consuming. In our previous research, text mining and data mining techniques were applied to automatically stage esophageal cancer from pathological reports according to the 6th edition of the American Joint Committee on Cancer (AJCC) cancer staging system. Because the staging system is updated every several years, quickly and accurately converting old stages into new stages has become an important issue. In the 6th edition, nodal status for esophageal cancer was grouped only into positive (N1) and negative (N0); the 7th edition classifies it differently. The aim of this study was to compare the results of the automatic cancer staging model based on the new staging edition with those based on the old edition.

Methods: Pathological reports of 234 patients who had undergone esophagectomy were collected and entered into an Access database as text. Text mining techniques were used to identify staging-related keywords and to computationally convert each report into a weighted frequency vector of those keywords. The lymph node metastasis status (N) of each report was derived from the total number of positive lymph nodes, and the distant metastasis status (M) was also modified by computationally analyzing the keywords of the report. The J48 decision tree learning algorithm was used to train the classification model for cancer staging. One third of the data was used for training and two thirds for testing to evaluate the prediction performance of the model.

Results: The results are shown in Table 1. The prediction accuracies for cell type and T status were essentially unchanged, and the prediction accuracies for N and M status reached 91.9% and 95.3%, respectively.
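The Methods derive N status from the total number of positive lymph nodes. A minimal sketch of that mapping under both editions is given here for illustration. The 7th edition cut-offs used (1 to 2, 3 to 6, and 7 or more positive regional nodes) follow the published AJCC 7th edition rules for esophageal cancer and are not stated in this abstract; the function names are hypothetical and this is not the authors' implementation.

```python
def n_status_6th(positive_nodes: int) -> str:
    """AJCC 6th edition: nodal status is only negative (N0) or positive (N1)."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    return "N0" if positive_nodes == 0 else "N1"


def n_status_7th(positive_nodes: int) -> str:
    """AJCC 7th edition: N category depends on how many nodes are positive.

    Cut-offs are an assumption taken from the published 7th edition rules,
    not from the abstract itself.
    """
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return "N0"
    if positive_nodes <= 2:
        return "N1"
    if positive_nodes <= 6:
        return "N2"
    return "N3"


# Show how the same node count is labelled under each edition.
for count in (0, 1, 4, 9):
    print(count, n_status_6th(count), n_status_7th(count))
```

Because the 7th edition splits the single positive class into three, the classifier has four N labels to separate instead of two, which may partly explain a lower N accuracy.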
Compared with the accuracies for predicting N and M status based on the 6th edition of the AJCC cancer staging guideline (95.0% and 96.3%, respectively), those based on the new edition decreased only slightly.

Conclusions: This study provides a computational model for automatic cancer staging of esophageal pathological reports according to the 7th edition of the AJCC cancer staging system. In the future, we hope to apply this automatic cancer staging model to pathological reports of other cancers and to collect clinical data from other text-based reports.

Table 1. Prediction accuracies of the proposed automatic cancer staging model for esophageal cancer based on different editions of the AJCC cancer staging system

| | Based on 7th edition | Based on 6th edition |
| --- | --- | --- |
| Cell type | 97.5% | 97.5% |
| Tumor depth status (T) | 88.5% | 88.5% |
| Lymph node metastasis status (N) | 91.9% | 95.0% |
| Distant metastasis status (M) | 95.3% | 96.3% |
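The text-mining step in the Methods, which converts each report into a weighted frequency vector of keywords, could look roughly like the sketch below. The abstract does not say which weighting scheme or keyword list was used, so the smoothed TF-IDF weighting, the `KEYWORDS` list, and the two sample report strings are all assumptions made up for illustration. Vectors of this kind would then be passed to a decision tree learner such as J48.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list; the abstract does not publish the real one.
KEYWORDS = ["squamous", "adenocarcinoma", "muscularis", "adventitia", "metastasis"]


def keyword_counts(report: str) -> Counter:
    """Count how often each keyword occurs in one report (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", report.lower())
    return Counter(t for t in tokens if t in KEYWORDS)


def weighted_vectors(reports):
    """Return one TF-IDF-weighted keyword vector per report.

    TF-IDF is an assumed weighting; the source only says "weighted frequency".
    """
    counts = [keyword_counts(r) for r in reports]
    n_docs = len(reports)
    # Number of reports in which each keyword appears at least once.
    doc_freq = {k: sum(1 for c in counts if c[k] > 0) for k in KEYWORDS}
    # Smoothed inverse document frequency, always >= 1.
    idf = {k: math.log((1 + n_docs) / (1 + doc_freq[k])) + 1.0 for k in KEYWORDS}
    return [[c[k] * idf[k] for k in KEYWORDS] for c in counts]


# Invented toy reports, not taken from the study data.
reports = [
    "Squamous cell carcinoma invading the muscularis propria.",
    "Squamous cell carcinoma extending to the adventitia; lymph node metastasis present.",
]
vectors = weighted_vectors(reports)
for vec in vectors:
    print([round(v, 3) for v in vec])
```

Keywords that occur in every report receive the lowest weight, so terms that distinguish one stage from another contribute more to the vector.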