Selected publication:
Bukovics, J.
Creation of Annotated Datasets for the Development of AI-Supported Colon Polyp-Classification Algorithms
Human Medicine; [ Diploma thesis ] Medizinische Universität Graz; 2023. pp. 66
[OPEN ACCESS]
- Med Uni Graz authors:
- Supervisors:
- Brcic Iva
- Plass Markus
- Abstract:
- Introduction: Colorectal carcinoma ranks third globally in cancer statistics, preceded by bronchial and mammary carcinomas. Some colon polyps, particularly adenomas and sessile serrated lesions, are considered precursors to colorectal carcinoma. Removing colon polyps during colonoscopy can significantly reduce the risk of developing colorectal carcinoma. Timely detection and removal of these polyps therefore play a crucial role in the prevention of colorectal carcinoma. The objective of this thesis is to create an annotated dataset that will serve as the foundation for developing an artificial intelligence-based algorithm.
Methods: The Biobank Graz provided the slides required for this project. Its archive comprises over 11 million histologic slides with associated patient data. For this study, a cohort of patients with colon polyps from the years 1984 to 2014 was chosen. Histologic slides of these colon polyps were retrieved and digitized using high-resolution whole slide imaging scanners, and the data were then anonymized. Annotations were made using the open-source software QuPath. The dataset comprises H&E-stained whole slide images (WSIs) depicting both colon polyps and normal colonic mucosa. Crops were exported as rectangular images, each centered on its annotation polygon. All images have a resolution of 1024x1024 pixels.
Results: The dataset comprises 533 whole slide images that were stained with hematoxylin and eosin (H&E). Among them, 33 slides were excluded from the project. In total, 17,937 image samples were collected, with 10,088 representing physiological glands and 7,848 representing dysplastic glands.
Conclusion: Our colon gland dataset is large and of high quality relative to publicly available datasets that have been used for studies and scientific contests. However, direct comparison of these datasets may be limited because they were created for different purposes. Automated analysis will be an important part of digitized histopathology, but diverse tissue structures and subjective evaluations could cause difficulties. Robust computational methods are needed for diagnostic reproducibility. Deep learning models designed for medical applications in digital pathology have the potential to reduce the time and workload of clinicians and pathologists, minimize potential errors, and improve the accuracy of colorectal cancer screening.
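
The Methods section describes crops of 1024x1024 pixels centered on each annotation polygon. The sketch below illustrates one way such a crop window could be computed. It is not the thesis's actual export code: the function name `crop_window` is hypothetical, and it assumes that "center" means the center of the polygon's bounding box, which the abstract does not specify.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def crop_window(polygon: Sequence[Point], size: int = 1024) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a size x size window around a polygon.

    Hypothetical helper for illustration only. Coordinates are in pixels
    at full slide resolution.
    """
    if not polygon:
        raise ValueError("polygon must contain at least one vertex")

    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]

    # Assumption: the crop is centered on the bounding-box center of the
    # annotation polygon (the abstract only says the polygon marks the center).
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2

    left = int(round(cx - size / 2))
    top = int(round(cy - size / 2))
    return left, top, left + size, top + size


# Example: a rectangular gland annotation spanning x 1000..1200 and y 2000..2300
window = crop_window([(1000, 2000), (1200, 2000), (1200, 2300), (1000, 2300)])
print(window)
```

For annotations near the slide border the window can extend past the image, so a real export would need to pad or shift the window. How the thesis handled that case is not stated in the abstract.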