Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals
Susu Sun¹, Stefano Woerner¹, Andreas Maier², Lisa M. Koch³,⁴, Christian F. Baumgartner¹,⁵
1: Cluster of Excellence: Machine Learning - New Perspectives for Science, University of Tübingen, Tübingen, Germany, 2: Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, 3: Hertie Institute for Artificial Intelligence in Brain Health, University of Tübingen, Tübingen, Germany, 4: Department of Diabetes, Endocrinology, Nutritional Medicine and Metabolism, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland, 5: Faculty of Health Sciences and Medicine, University of Lucerne, Lucerne, Switzerland
Publication date: 2025/10/30
https://doi.org/10.59275/j.melba.2025-gb33
Abstract
Interpretability is crucial for machine learning algorithms in high-stakes medical applications. However, high-performing neural networks typically cannot explain their predictions. Post-hoc explanation methods provide a way to understand neural networks but have been shown to suffer from conceptual problems. Moreover, current research largely focuses on providing local explanations for individual samples rather than global explanations for the model itself. In this paper, we propose Attri-Net, an inherently interpretable model for multi-label classification that provides both local and global explanations. Attri-Net first counterfactually generates class-specific attribution maps to highlight the disease evidence, then performs classification with logistic regression classifiers based solely on the attribution maps. Local explanations for each prediction can be obtained by interpreting the attribution maps weighted by the classifiers’ weights. A global explanation of the whole model can be obtained by jointly considering learned average representations of the attribution maps for each class (called the class centers) and the weights of the linear classifiers. To ensure the model is “right for the right reason”, we introduce a mechanism to guide the model’s explanations to align with human knowledge. Our comprehensive evaluations show that Attri-Net can generate high-quality explanations consistent with clinical knowledge while not sacrificing classification performance. Our code is available at https://github.com/ss-sun/Attri-Net-V2
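The classification stage described in the abstract can be sketched in a few lines of NumPy: each class has its own attribution map and its own logistic regression classifier operating solely on that map, and the local explanation is the map weighted element-wise by the classifier's weights. All shapes, weights, and maps below are illustrative placeholders, not the trained model; in Attri-Net the maps are generated counterfactually by a learned network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: C disease classes, H x W attribution maps.
C, H, W = 3, 8, 8

# Class-specific attribution maps (random stand-ins for the
# counterfactually generated maps in the paper).
attr_maps = rng.normal(size=(C, H, W))

# One logistic regression classifier per class: a weight map and a bias.
weights = rng.normal(size=(C, H, W))
biases = np.zeros(C)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Multi-label prediction: an independent sigmoid per class, computed
# solely from that class's attribution map.
logits = (attr_maps * weights).sum(axis=(1, 2)) + biases
probs = sigmoid(logits)

# Local explanation for each class: attribution map weighted
# element-wise by the classifier's weights.
local_explanations = attr_maps * weights
```

Because each classifier is linear in its own attribution map, the weighted map decomposes the logit exactly, which is what makes the explanation inherent rather than post-hoc.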
Keywords
Explainable machine learning · Inherently interpretable model · Multi-label classification · Model guidance
BibTeX
@article{melba:2025:028:sun,
title = "Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals",
author = "Sun, Susu and Woerner, Stefano and Maier, Andreas and Koch, Lisa M. and Baumgartner, Christian F.",
journal = "Machine Learning for Biomedical Imaging",
volume = "3",
number = "October 2025 issue",
year = "2025",
pages = "636--664",
issn = "2766-905X",
doi = "10.59275/j.melba.2025-gb33",
url = "https://melba-journal.org/2025:028"
}
RIS
TY - JOUR
AU - Sun, Susu
AU - Woerner, Stefano
AU - Maier, Andreas
AU - Koch, Lisa M.
AU - Baumgartner, Christian F.
PY - 2025
TI - Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals
T2 - Machine Learning for Biomedical Imaging
VL - 3
IS - October 2025 issue
SP - 636
EP - 664
SN - 2766-905X
DO - 10.59275/j.melba.2025-gb33
UR - https://melba-journal.org/2025:028
ER -