Differentiated Thyroid Cancer Recurrence Classification Using Machine Learning Models and Bayesian Neural Networks with Varying Priors: A SHAP-Based Interpretation of the Best Performing Model
Keywords:
Bayesian Neural Network, Classification, Differentiated thyroid cancer recurrence, Machine Learning, SHAP, Uncertainty quantification
Abstract
Introduction
Differentiated thyroid cancer (DTC) recurrence is a major public health concern, requiring classification and predictive models that are not only accurate but also interpretable and uncertainty-aware.
Methods
This study introduces a comprehensive framework for DTC recurrence classification using a dataset of 383 patients and 16 clinical and pathological variables. First, 11 machine learning (ML) models were trained on the complete dataset; the Support Vector Machine (SVM) model achieved the highest accuracy, 0.9481. To reduce complexity and redundancy, feature selection was carried out using the Boruta algorithm, and the same ML models were applied to the reduced dataset, where the Logistic Regression (LR) model achieved the highest accuracy, 0.9611.
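The shadow-feature idea behind Boruta can be illustrated with a short sketch. This is not the Boruta algorithm itself and not the authors' pipeline: Boruta ranks features by random forest importance and applies a statistical test over many iterations, whereas this simplified version swaps in absolute Pearson correlation with the label as the importance score and only counts how often a real feature beats the best shuffled ("shadow") copy. The function names and the toy data are hypothetical.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_feature_hits(columns, labels, n_rounds=50, seed=0):
    """Count, per feature, the rounds in which it beats the best shadow feature.

    columns: dict mapping a feature name to a list of numeric values.
    labels:  list of 0/1 outcomes (for example, recurrence yes/no).
    Assumption: correlation stands in for random forest importance.
    """
    rng = random.Random(seed)
    real_scores = {name: abs_corr(col, labels) for name, col in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow features are shuffled copies, so any link to the label is chance.
        best_shadow = 0.0
        for col in columns.values():
            shadow = col[:]
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_corr(shadow, labels))
        for name, score in real_scores.items():
            if score > best_shadow:
                hits[name] += 1
    return hits


# Toy data (hypothetical): one informative column, one uninformative column.
labels = [0, 1] * 50
columns = {
    "informative": list(labels),      # identical to the label
    "uninformative": [0, 0, 1, 1] * 25,  # uncorrelated with the label
}
hits = shadow_feature_hits(columns, labels, n_rounds=50)
print(hits)
```

A feature that rarely beats the best shadow is a candidate for removal; the full algorithm makes that decision with a significance test rather than a raw count.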
Results
To address the lack of uncertainty quantification in the ML models, Bayesian Neural Networks (BNNs) with six prior distributions, namely Normal (0,1), Normal (0,10), Laplace (0,1), Cauchy (0,1), Cauchy (0,2.5), and Horseshoe (1), were implemented on both the complete and reduced datasets. The BNN with the Normal (0,10) prior achieved the highest accuracies, 0.9740 before and 0.9870 after feature selection. Because the BNN with the Normal (0,10) prior after feature selection outperformed all other models, it was chosen as the best-performing model for DTC recurrence classification.
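To give a feel for how the six weight priors differ, the sketch below draws samples from each and compares their typical and extreme magnitudes. It is an illustration only, under stated assumptions: the second parameter of each prior is treated as a scale (standard deviation for the Normal), the Horseshoe is written as a Normal with a Half-Cauchy(0, 1) local scale and global scale 1, and the string labels and function names are hypothetical rather than taken from the paper.

```python
import math
import random


def draw_weights(prior, n, seed=0):
    """Draw n weights from one of six priors, identified by a hypothetical label."""
    rng = random.Random(seed)

    def cauchy(scale):
        # Inverse-CDF sampling for a Cauchy(0, scale) variate.
        return scale * math.tan(math.pi * (rng.random() - 0.5))

    def laplace(scale):
        # The difference of two i.i.d. exponentials with mean `scale`
        # follows a Laplace(0, scale) distribution.
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    def horseshoe(tau):
        # Scale mixture: local scale ~ Half-Cauchy(0, 1),
        # weight ~ Normal(0, tau * local scale).
        local_scale = abs(cauchy(1.0))
        return rng.gauss(0.0, tau * local_scale)

    samplers = {
        "normal(0,1)": lambda: rng.gauss(0.0, 1.0),
        "normal(0,10)": lambda: rng.gauss(0.0, 10.0),  # assumes 10 is the std. dev.
        "laplace(0,1)": lambda: laplace(1.0),
        "cauchy(0,1)": lambda: cauchy(1.0),
        "cauchy(0,2.5)": lambda: cauchy(2.5),
        "horseshoe(1)": lambda: horseshoe(1.0),
    }
    sampler = samplers[prior]
    return [sampler() for _ in range(n)]


def median_abs(values):
    """Median absolute value, a tail-robust measure of typical magnitude."""
    ordered = sorted(abs(v) for v in values)
    return ordered[len(ordered) // 2]


PRIORS = ["normal(0,1)", "normal(0,10)", "laplace(0,1)",
          "cauchy(0,1)", "cauchy(0,2.5)", "horseshoe(1)"]

for name in PRIORS:
    draws = draw_weights(name, 4000)
    print(f"{name:14s} median|w| = {median_abs(draws):8.3f}   "
          f"max|w| = {max(abs(w) for w in draws):12.3f}")
```

The heavy-tailed Cauchy and Horseshoe priors occasionally produce very large weights while keeping most mass near zero, whereas a wide Normal prior spreads its mass more evenly.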
Discussion
The selected BNN model was further analyzed in terms of epistemic and aleatoric uncertainty, which reflect the model's confidence in its predictions. In addition, to enhance the model's interpretability, SHapley Additive exPlanations (SHAP) values were calculated, providing insight into the contribution of key variables to the model's output.
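One common way to separate the two kinds of uncertainty for a binary classifier is an entropy-based decomposition over posterior draws. The sketch below assumes that decomposition; the abstract does not say which one the authors used, and the function names and example probabilities are hypothetical. Total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the individual draws, and epistemic uncertainty is the difference between them.

```python
import math


def binary_entropy(p):
    """Entropy, in nats, of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose_uncertainty(prob_samples):
    """Split predictive uncertainty for one patient.

    prob_samples: predicted recurrence probabilities, one per posterior draw
    of the network weights.
    """
    n = len(prob_samples)
    mean_prob = sum(prob_samples) / n
    total = binary_entropy(mean_prob)
    # Aleatoric part: noise that remains even when the weights are known.
    aleatoric = sum(binary_entropy(p) for p in prob_samples) / n
    # Epistemic part: disagreement between draws; clipped to guard
    # against tiny negative values caused by rounding.
    epistemic = max(0.0, total - aleatoric)
    return {
        "mean_prob": mean_prob,
        "total": total,
        "aleatoric": aleatoric,
        "epistemic": epistemic,
    }


# Hypothetical draws: the first patient's draws agree, the second's do not.
print(decompose_uncertainty([0.90, 0.92, 0.88, 0.91]))
print(decompose_uncertainty([0.10, 0.95, 0.40, 0.85]))
```

When the draws agree, almost all of the uncertainty is aleatoric; when they disagree, the epistemic term grows, which signals that more data or a different model could change the prediction.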
License
Copyright (c) 2025 Nadeesha Shyami Kumari Herath Mudiyanselage, Malithi Nawarathne, UMMPK Nawarathne

This work is licensed under a Creative Commons Attribution 4.0 International License.