Rethinking Generalization: The Impact of Annotation Style on Medical Image Segmentation

Brennan Nichyporuk1,2, Jillian Cardinell2,1, Justin Szeto2,1, Raghav Mehta2,1, Jean-Pierre Falet3,1,2, Douglas L. Arnold3,4, Sotirios A. Tsaftaris5,6, Tal Arbel2,1
1: MILA (Quebec Artificial Intelligence Institute), Montreal, Canada, 2: Centre for Intelligent Machines, McGill University, Canada, 3: Department of Neurology and Neurosurgery, McGill University, Canada, 4: NeuroRx Research, Montreal, Canada, 5: School of Engineering, University of Edinburgh, UK, 6: The Alan Turing Institute, UK
Publication date: 2022/12/15
DOI: 10.59275/j.melba.2022-2d93


Generalization is an important attribute of machine learning models, particularly for those that are to be deployed in a medical context, where unreliable predictions can have real-world consequences. While the failure of models to generalize across datasets is typically attributed to a mismatch in the data distributions, performance gaps are often a consequence of biases in the "ground-truth" label annotations. This is particularly important in the context of medical image segmentation of pathological structures (e.g. lesions), where the annotation process is much more subjective and is affected by a number of underlying factors, including the annotation protocol, rater education/experience, and clinical aims, among others. In this paper, we show that modeling annotation biases, rather than ignoring them, offers a promising way of accounting for differences in annotation style across datasets. To this end, we propose a generalized conditioning framework to (1) learn and account for different annotation styles across multiple datasets using a single model, (2) identify similar annotation styles across different datasets in order to permit their effective aggregation, and (3) fine-tune a fully trained model to a new annotation style with just a few samples. Next, we present an image-conditioning approach to model annotation styles that correlate with specific image features, potentially enabling detection biases to be more easily identified.


deep learning · medical image segmentation · multiple sclerosis · label bias · annotation bias · cohort bias · detection bias · observer bias · annotation style · generalization

Bibtex
@article{melba:2022:029:nichyporuk,
  title = "Rethinking Generalization: The Impact of Annotation Style on Medical Image Segmentation",
  author = "Nichyporuk, Brennan and Cardinell, Jillian and Szeto, Justin and Mehta, Raghav and Falet, Jean-Pierre and Arnold, Douglas L. and Tsaftaris, Sotirios A. and Arbel, Tal",
  journal = "Machine Learning for Biomedical Imaging",
  volume = "1",
  issue = "December 2022 issue",
  year = "2022",
  pages = "1--37",
  issn = "2766-905X",
  doi = "10.59275/j.melba.2022-2d93",
  url = ""
}
RIS
TY  - JOUR
AU  - Nichyporuk, Brennan
AU  - Cardinell, Jillian
AU  - Szeto, Justin
AU  - Mehta, Raghav
AU  - Falet, Jean-Pierre
AU  - Arnold, Douglas L.
AU  - Tsaftaris, Sotirios A.
AU  - Arbel, Tal
PY  - 2022
TI  - Rethinking Generalization: The Impact of Annotation Style on Medical Image Segmentation
T2  - Machine Learning for Biomedical Imaging
VL  - 1
IS  - December 2022 issue
SP  - 1
EP  - 37
SN  - 2766-905X
DO  - 10.59275/j.melba.2022-2d93
UR  -
ER  -
