IACR Communications in Cryptology (CiC)

XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models

Authors

Dimitar Jetchev
Input Output Global (IOG), Lausanne, Switzerland
dimitar.jetchev@iohk.io

Marius Vuille
Arcium, Zug, Switzerland
marius@arcium.com

Abstract

Explainable AI (XAI) refers to the development of AI systems and machine learning models in a way that humans can understand, interpret and trust the predictions, decisions and outputs of these models. A common approach to explainability is feature importance, that is, determining which input features of the model have the most significant impact on the model prediction. Two major techniques for computing feature importance are LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations). While very generic, these methods are computationally expensive even when the data is not encrypted. Applying them in the privacy-preserving setting, when part or all of the input data is private, is therefore a major computational challenge. In this paper, we present XorSHAP, the first practical data-oblivious algorithm for computing SHAP values for decision tree ensemble models. The algorithm is applicable in various privacy-preserving settings such as SMPC, FHE and differential privacy. Our algorithm has complexity $O(T \widetilde{M} D 2^D)$, where $T$ is the number of decision trees in the ensemble, $D$ is the depth of the decision trees, and $\widetilde{M}$ is the maximum of the number of features $M$ and $2^D$ (the number of leaf nodes of a tree), and it scales to real-world datasets. We implement the algorithm in the semi-honest Secure Multiparty Computation (SMPC) setting with full threshold using Inpher's Manticore framework. Our implementation simultaneously computes the SHAP values for 100 samples for an ensemble of $T = 60$ trees of depth $D = 4$ and $M = 100$ features in just 7.5 minutes, meaning that the SHAP values for a single prediction are computed in 4.5 seconds for the same decision tree ensemble model. Additionally, the algorithm is parallelization-friendly, enabling future work on massive hardware acceleration with GPUs.
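The SHAP values mentioned in the abstract are Shapley values of a cooperative game whose players are the input features. As a plaintext point of reference (not the paper's data-oblivious algorithm), the following Python sketch evaluates the exact Shapley formula by brute-force enumeration of all feature subsets; its exponential cost in the number of features $M$ illustrates why tree-specific algorithms such as TreeSHAP, and XorSHAP in the private setting, are needed at scale. The value function `toy_value` is a hypothetical stand-in for a model's conditional expectation given a known feature subset.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley values by enumerating all feature subsets.

    `value(S)` returns the model's expected prediction when only the
    features in the frozenset S are known. The cost is exponential in
    the number of features, which is what fast tree-based algorithms
    avoid.
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Shapley weight |S|! (n - |S| - 1)! / n! times the
                # marginal contribution of feature i to coalition S.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(S | {i}) - value(S))
        phi[i] = total
    return phi

# Toy additive model f(x) = 2*x0 + 3*x1 evaluated at x = (1, 1) with
# baseline E[x] = (0, 0): the Shapley values recover the additive
# contributions 2 and 3.
def toy_value(S):
    x = {0: 1.0, 1: 1.0}
    baseline = {0: 0.0, 1: 0.0}
    return sum(w * (x[i] if i in S else baseline[i])
               for i, w in ((0, 2.0), (1, 3.0)))

print(shapley_values(toy_value, [0, 1]))  # → {0: 2.0, 1: 3.0}
```

For an additive model the Shapley values coincide with the per-feature contributions, which makes the toy example easy to check by hand; the data-oblivious algorithm of the paper computes the same quantities for tree ensembles without revealing the private inputs.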

References

[Act23]
EU AI Act. European Union Artificial Intelligence Act. https://artificialintelligenceact.eu/, 2023.
[aM23]
SCALE-MAMBA. https://github.com/KULeuven-COSIC/SCALE-MAMBA, 2023.
[BCD+23]
M. Georgieva Belorgey, S. Carpov, K. Deforth, D. Jetchev, A. Sae-Tang, M. Vuille, N. Gama, J. Katz, I. Leontiadis, and M. Mohammadi. Manticore: A Framework for Efficient Multiparty Computation Supporting Real Number and Boolean Arithmetic. J. Cryptol., 36(3):31, 2023. DOI: https://doi.org/10.1007/s00145-023-09464-4
[BCG+18]
Christina Boura, Ilaria Chillotti, Nicolas Gama, Dimitar Jetchev, Stanislav Peceny, and Alexander Petric. High-Precision Privacy-Preserving Real-Valued Function Evaluation. In Financial Cryptography and Data Security, pages 183–202, Berlin, Heidelberg. 2018. Springer Berlin Heidelberg. DOI: https://doi.org/10.1007/978-3-662-58387-6_10
[BIS+22]
A. Bogdanova, A. Imakura, T. Sakurai, T. Fujii, T. Sakamoto, and H. Abe. Achieving Transparency in Distributed Machine Learning with Explainable Data Collaboration. CoRR, abs/2212.03373, 2022. DOI: https://doi.org/10.48550/arXiv.2212.03373
[BLW08]
D. Bogdanov, S. Laur, and J. Willemson. Sharemind: A framework for fast privacy-preserving computations. In European Symposium on Research in Computer Security, pages 192–206. 2008. Springer. DOI: https://doi.org/10.1007/978-3-540-88313-5_13
[CDE+18]
R. Cramer, I. Damgård, D. Escudero, P. Scholl, and C. Xing. SPD$\mathbb{Z}_{2^k}$: Efficient MPC mod $2^k$ for Dishonest Majority. In Advances in Cryptology – CRYPTO 2018, pages 769–798. 2018. DOI: https://doi.org/10.1007/978-3-319-96881-0_26
[DDG+22]
K. Deforth, M. Desgroseilliers, N. Gama, M. Georgieva, D. Jetchev, and M. Vuille. XORBoost: Tree Boosting in the Multiparty Computation Setting. Proc. Priv. Enhancing Technol., 2022(4):66–85, 2022. DOI: https://doi.org/10.56553/popets-2022-0099
[DH12]
W. J. Dally and R. C. Harting. Digital Design: A Systems Approach. Cambridge University Press, 2012.
[DPSZ12]
I. Damgård, V. Pastro, N. Smart, and S. Zakarias. Multiparty computation from somewhat homomorphic encryption. In Annual Cryptology Conference, pages 643–662. 2012. Springer. DOI: https://doi.org/10.1007/978-3-642-32009-5_38
[EDPS23]
European Data Protection Supervisor. EDPS TechDispatch on Explainable AI. https://www.edps.europa.eu/system/files/2023-11/23-11-16_techdispatch_xai_en.pdf, 2023.
[EGK+20]
D. Escudero, S. Ghosh, M. Keller, R. Rachuri, and P. Scholl. Improved Primitives for MPC over Mixed Arithmetic-Binary Circuits. In 40th Annual International Cryptology Conference, CRYPTO, volume 12171 of Lecture Notes in Computer Science, pages 823–852. 2020. DOI: https://doi.org/10.1007/978-3-030-56880-1_29
[JSTV23]
D. Jetchev, A. Sae-Tang, and M. Vuille. Balancing Privacy and Explainable AI in Semiconductor Manufacturing. https://inpher.io/blog/xai-semiconductor-manufacturing/, 2023.
[Kel20]
M. Keller. MP-SPDZ: A Versatile Framework for Multi-Party Computation. In CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 1575–1590. 2020. DOI: https://doi.org/10.1145/3372297.3417872
[KOS16]
M. Keller, E. Orsini, and P. Scholl. MASCOT: faster malicious arithmetic secure computation with oblivious transfer. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 830–842. 2016. DOI: https://doi.org/10.1145/2976749.2978357
[KPR18]
M. Keller, V. Pastro, and D. Rotaru. Overdrive: Making SPDZ Great Again. In EUROCRYPT 2018, volume 10822 of Lecture Notes in Computer Science, pages 158–189. 2018. DOI: https://doi.org/10.1007/978-3-319-78372-7_6
[LEC+20]
S. Lundberg, G. Erion, H. Chen, A. DeGrave, J. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell., 2(1):56–67, 2020. DOI: https://doi.org/10.1038/s42256-019-0138-9
[LEL18]
S. Lundberg, G. Erion, and S.-I. Lee. Consistent Individualized Feature Attribution for Tree Ensembles. CoRR, abs/1802.03888, 2018.
[LL17]
S. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774. 2017.
[MFH22]
R. Mitchell, E. Frank, and G. Holmes. GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles. PeerJ Computer Science, 8:e880, April 2022. DOI: https://doi.org/10.7717/peerj-cs.880
[Mol22]
C. Molnar. Interpretable Machine Learning. 2nd edition, 2022.
[MR18]
P. Mohassel and P. Rindal. ABY$^3$: A Mixed Protocol Framework for Machine Learning. In David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang, editors, Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 35–52. 2018. ACM. DOI: https://doi.org/10.1145/3243734.3243760
[MZ17]
P. Mohassel and Y. Zhang. SecureML: A System for Scalable Privacy-Preserving Machine Learning. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 19–38. 2017. IEEE Computer Society. DOI: https://doi.org/10.1109/SP.2017.12
[PSSY21]
A. Patra, T. Schneider, A. Suresh, and H. Yalame. ABY2.0: Improved mixed-protocol secure two-party computation. In 30th USENIX Security Symposium. 2021.
[RSG16]
M. T. Ribeiro, S. Singh, and C. Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the Demonstrations Session, NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016, pages 97–101. 2016. The Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/N16-3020
[Sha53]
L. Shapley. A value for n-person games. Contributions to the Theory of Games, 2(28):307–317, 1953. DOI: https://doi.org/10.1515/9781400881970-018
[SK14]
E. Strumbelj and I. Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst., 41(3):647–665, 2014. DOI: https://doi.org/10.1007/s10115-013-0679-x
[TLL+22]
Z. Tian, J. Liu, J. Li, X. Cao, R. Jia, and K. Ren. Private Data Valuation and Fair Payment in Data Marketplaces. CoRR, abs/2210.08723, 2022. DOI: https://doi.org/10.48550/arXiv.2210.08723
[Wan19]
G. Wang. Interpret Federated Learning with Shapley Values. May 2019.
[WAYS22]
L. Watson, R. Andreeva, H.-T. Yang, and R. Sarkar. Differentially Private Shapley Values for Data Evaluation. June 2022.
[WGC19]
S. Wagh, D. Gupta, and N. Chandran. SecureNN: 3-party secure computation for neural network training. Proceedings on Privacy Enhancing Technologies, 2019(3):26–49, 2019. DOI: https://doi.org/10.2478/popets-2019-0035
[Yan22]
J. Yang. Fast TreeSHAP: Accelerating SHAP Value Computation for Trees. 2022.


History
Submitted: 2024-10-08
Accepted: 2024-12-03
Published: 2025-01-13
How to cite

Dimitar Jetchev and Marius Vuille, XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models. IACR Communications in Cryptology, vol. 1, no. 4, Jan 13, 2025, doi: 10.62056/a3qjmp-3y.

License

Copyright is held by the author(s)

This work is licensed under a Creative Commons Attribution (CC BY) license.