Introduction: Recent microarray studies have shown that cancer classification by gene expression profiling is feasible and gives clinicians additional information for choosing the most appropriate form of treatment. Comparing results across studies is complicated by the variety of unvalidated statistical approaches in use. The aim of this study was to identify and validate gene signatures with potential prognostic value in colorectal cancer and to establish the statistical algorithm applied.
Material and Methods: Seventy laser-microdissected colorectal cancer samples (UICC I n=10, UICC II n=25, UICC III n=27, UICC IV n=8) and 28 corresponding normal tissues were hybridized to Affymetrix U133A arrays after linear amplification and biotin labelling. A genetic algorithm using k-nearest-neighbour classification was used to identify diagnostically relevant probeset combinations (classifiers) that separate patients with sporadic colorectal cancer without nodal or distant metastasis (UICC I / II) from patients with nodal or distant metastasis (UICC III / IV).
Results: The algorithm was fed with the top 5% and bottom 5% of probesets after statistical ranking (Golub, Wilcoxon, fold change), giving a starting pool of 2228 probesets. Discriminating probeset combinations were identified in a training data set and checked in a non-overlapping test set. In the training set, probeset combinations that classified more than 99% of cases correctly were identified both for the tumor vs. normal tissue distinction and for the UICC I / II vs. III / IV distinction. In the non-overlapping test set, the tumor / normal classifier performed well (90%), whereas the UICC classifier reached only 60%.
Conclusion: The generalization properties of the signatures are most likely poor because the data are not sufficiently representative under this approach. The sample number is adequate to identify genes differentially expressed between tumor and normal tissue, but is likely insufficient to reliably reveal genes differentially expressed between distinct prognostic stages based on UICC.
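
The probeset pre-selection and the train/test check described above can be illustrated with a minimal Python sketch. It is not the study's implementation: it ranks probesets by a Golub-style signal-to-noise score only (the Wilcoxon and fold-change rankings are omitted), it scores a fixed probeset combination with k-nearest-neighbour voting instead of running the genetic algorithm search, and every function name, parameter and data layout here is a hypothetical choice made for illustration.

```python
import math
import statistics


def signal_to_noise(a, b):
    """Golub-style score: (mean_a - mean_b) / (sd_a + sd_b)."""
    denom = statistics.stdev(a) + statistics.stdev(b)
    return (statistics.mean(a) - statistics.mean(b)) / denom if denom else 0.0


def select_extremes(expr, labels, fraction=0.05):
    """Return the top and bottom `fraction` of probesets by score.

    expr   -- hypothetical layout: dict probeset id -> one value per sample
    labels -- list of 0/1 class labels, one per sample
    """
    scores = {}
    for probeset, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 1]
        b = [v for v, y in zip(values, labels) if y == 0]
        scores[probeset] = signal_to_noise(a, b)
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k] + ranked[-k:]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted((math.dist(row, x), y) for row, y in zip(train_x, train_y))
    votes = [y for _, y in nearest[:k]]
    return max(set(votes), key=votes.count)


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of non-overlapping test samples classified correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

In a search of the kind described above, a function such as `accuracy` would serve as the fitness of one candidate probeset combination; evaluating it on a held-out test set rather than on the training set is what exposes the gap between training and test performance.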