IACR Communications in Cryptology (CiC)

XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models

Authors

Dimitar Jetchev
Input Output Global (IOG), Lausanne, Switzerland
dimitar.jetchev@iohk.io

Marius Vuille
Arcium, Zug, Switzerland
marius@arcium.com

Abstract

Explainable AI (XAI) refers to the development of AI systems and machine learning models in a way that humans can understand, interpret and trust the predictions, decisions and outputs of these models. A common approach to explainability is feature importance, that is, determining which input features of the model have the most significant impact on the model prediction. Two major techniques for computing feature importance are LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations). While very generic, these methods are computationally expensive even when the data is not encrypted. Applying them in the privacy-preserving setting, when part or all of the input data is private, is therefore a major computational challenge. In this paper, we present XorSHAP, the first practical data-oblivious algorithm for computing SHAP values for decision tree ensemble models. The algorithm is applicable in various privacy-preserving settings such as SMPC, FHE and differential privacy. Our algorithm has complexity $O(T \widetilde{M} D 2^D)$, where $T$ is the number of decision trees in the ensemble, $D$ is the depth of the decision trees, and $\widetilde{M}$ is the maximum of the number of features $M$ and $2^D$ (the number of leaf nodes of a tree), and it scales to real-world datasets. We implement the algorithm in the semi-honest Secure Multiparty Computation (SMPC) setting with full threshold using Inpher's Manticore framework. Our implementation simultaneously computes the SHAP values for 100 samples for an ensemble of $T = 60$ trees of depth $D = 4$ and $M = 100$ features in just 7.5 minutes, meaning that the SHAP values for a single prediction are computed in 4.5 seconds for the same decision tree ensemble model. Additionally, the algorithm is parallelization-friendly, enabling future work on massive hardware acceleration with GPUs.
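The SHAP values mentioned in the abstract are Shapley values of a cooperative game whose players are the input features. As a plaintext point of reference (not the paper's data-oblivious algorithm), the following Python sketch evaluates the exact Shapley formula by brute-force enumeration of all feature subsets; its exponential cost in the number of features $M$ illustrates why tree-specific algorithms such as TreeSHAP, and XorSHAP in the private setting, are needed at scale. The value function `toy_value` is a hypothetical stand-in for a model's conditional expectation given a known feature subset.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley values by enumerating all feature subsets.

    `value(S)` returns the model's expected prediction when only the
    features in the frozenset S are known. The cost is exponential in
    the number of features, which is what fast tree-based algorithms
    avoid.
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Shapley weight |S|! (n - |S| - 1)! / n! times the
                # marginal contribution of feature i to coalition S.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(S | {i}) - value(S))
        phi[i] = total
    return phi

# Toy additive model f(x) = 2*x0 + 3*x1 evaluated at x = (1, 1) with
# baseline E[x] = (0, 0): the Shapley values recover the additive
# contributions 2 and 3.
def toy_value(S):
    x = {0: 1.0, 1: 1.0}
    baseline = {0: 0.0, 1: 0.0}
    return sum(w * (x[i] if i in S else baseline[i])
               for i, w in ((0, 2.0), (1, 3.0)))

print(shapley_values(toy_value, [0, 1]))  # → {0: 2.0, 1: 3.0}
```

For an additive model the Shapley values coincide with the per-feature contributions, which makes the toy example easy to check by hand; the data-oblivious algorithm of the paper computes the same quantities for tree ensembles without revealing the private inputs.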

References

[Act23]
EU AI Act. European Union Artificial Intelligence Act. https://artificialintelligenceact.eu/, 2023.
[aM23]
SCALE-MAMBA. https://github.com/KULeuven-COSIC/SCALE-MAMBA, 2023.
[BCD+23]
M. Georgieva Belorgey, S. Carpov, K. Deforth, D. Jetchev, A. Sae-Tang, M. Vuille, N. Gama, J. Katz, I. Leontiadis, and M. Mohammadi. Manticore: A Framework for Efficient Multiparty Computation Supporting Real Number and Boolean Arithmetic. J. Cryptol., 36(3):31, 2023. DOI: https://doi.org/10.1007/s00145-023-09464-4
[BCG+18]
Christina Boura, Ilaria Chillotti, Nicolas Gama, Dimitar Jetchev, Stanislav Peceny, and Alexander Petric. High-Precision Privacy-Preserving Real-Valued Function Evaluation. In Financial Cryptography and Data Security, pages 183–202, Berlin, Heidelberg. 2018. Springer Berlin Heidelberg. DOI: https://doi.org/10.1007/978-3-662-58387-6_10
[BIS+22]
A. Bogdanova, A. Imakura, T. Sakurai, T. Fujii, T. Sakamoto, and H. Abe. Achieving Transparency in Distributed Machine Learning with Explainable Data Collaboration. CoRR, abs/2212.03373, 2022. DOI: https://doi.org/10.48550/arXiv.2212.03373
[BLW08]
D. Bogdanov, S. Laur, and J. Willemson. Sharemind: A framework for fast privacy-preserving computations. In European Symposium on Research in Computer Security, pages 192–206. 2008. Springer. DOI: https://doi.org/10.1007/978-3-540-88313-5_13
[CDE+18]
R. Cramer, I. Damgård, D. Escudero, P. Scholl, and C. Xing. SPD$\mathbb{Z}_{2^k}$: Efficient MPC mod $2^k$ for Dishonest Majority. In Advances in Cryptology – CRYPTO 2018, pages 769–798. 2018. DOI: https://doi.org/10.1007/978-3-319-96881-0_26
[DDG+22]
K. Deforth, M. Desgroseilliers, N. Gama, M. Georgieva, D. Jetchev, and M. Vuille. XORBoost: Tree Boosting in the Multiparty Computation Setting. Proc. Priv. Enhancing Technol., 2022(4):66–85, 2022. DOI: https://doi.org/10.56553/popets-2022-0099
[DH12]
W. J. Dally and R. C. Harting. Digital Design: A Systems Approach. Cambridge University Press, 2012.
[DPSZ12]
I. Damgård, V. Pastro, N. Smart, and S. Zakarias. Multiparty computation from somewhat homomorphic encryption. In Annual Cryptology Conference, pages 643–662. 2012. Springer. DOI: https://doi.org/10.1007/978-3-642-32009-5_38
[EDPS23]
European Data Protection Supervisor. EDPS TechDispatch on Explainable AI. https://www.edps.europa.eu/system/files/2023-11/23-11-16_techdispatch_xai_en.pdf, 2023.
[EGK+20]
D. Escudero, S. Ghosh, M. Keller, R. Rachuri, and P. Scholl. Improved Primitives for MPC over Mixed Arithmetic-Binary Circuits. In 40th Annual International Cryptology Conference, CRYPTO, volume 12171 of Lecture Notes in Computer Science, pages 823–852. 2020. DOI: https://doi.org/10.1007/978-3-030-56880-1_29
[JSTV23]
D. Jetchev, A. Sae-Tang, and M. Vuille. Balancing Privacy and Explainable AI in Semiconductor Manufacturing. https://inpher.io/blog/xai-semiconductor-manufacturing/, 2023.
[Kel20]
M. Keller. MP-SPDZ: A Versatile Framework for Multi-Party Computation. In CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 1575–1590. 2020. DOI: https://doi.org/10.1145/3372297.3417872
[KOS16]
M. Keller, E. Orsini, and P. Scholl. MASCOT: faster malicious arithmetic secure computation with oblivious transfer. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 830–842. 2016. DOI: https://doi.org/10.1145/2976749.2978357
[KPR18]
M. Keller, V. Pastro, and D. Rotaru. Overdrive: Making SPDZ Great Again. In EUROCRYPT 2018, volume 10822 of Lecture Notes in Computer Science, pages 158–189. 2018. DOI: https://doi.org/10.1007/978-3-319-78372-7_6
[LEC+20]
S. Lundberg, G. Erion, H. Chen, A. DeGrave, J. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell., 2(1):56–67, 2020. DOI: https://doi.org/10.1038/s42256-019-0138-9
[LEL18]
S. Lundberg, G. Erion, and S.-I. Lee. Consistent Individualized Feature Attribution for Tree Ensembles. CoRR, abs/1802.03888, 2018.
[LL17]
S. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774. 2017.
[MFH22]
R. Mitchell, E. Frank, and G. Holmes. GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles. PeerJ Computer Science, 8:e880, April 2022. DOI: https://doi.org/10.7717/peerj-cs.880
[Mol22]
C. Molnar. Interpretable Machine Learning. 2nd edition, 2022.
[MR18]
P. Mohassel and P. Rindal. ABY$^3$: A Mixed Protocol Framework for Machine Learning. In David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang, editors, Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 35–52. 2018. ACM. DOI: https://doi.org/10.1145/3243734.3243760
[MZ17]
P. Mohassel and Y. Zhang. SecureML: A System for Scalable Privacy-Preserving Machine Learning. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 19–38. 2017. IEEE Computer Society. DOI: https://doi.org/10.1109/SP.2017.12
[PSSY21]
A. Patra, T. Schneider, A. Suresh, and H. Yalame. ABY2.0: Improved mixed-protocol secure two-party computation. In 30th USENIX Security Symposium. 2021.
[RSG16]
M. T. Ribeiro, S. Singh, and C. Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the Demonstrations Session, NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016, pages 97–101. 2016. The Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/N16-3020
[Sha53]
L. Shapley. A value for n-person games. Contributions to the Theory of Games, 2(28):307–317, 1953. DOI: https://doi.org/10.1515/9781400881970-018
[SK14]
E. Strumbelj and I. Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst., 41(3):647–665, 2014. DOI: https://doi.org/10.1007/s10115-013-0679-x
[TLL+22]
Z. Tian, J. Liu, J. Li, X. Cao, R. Jia, and K. Ren. Private Data Valuation and Fair Payment in Data Marketplaces. CoRR, abs/2210.08723, 2022. DOI: https://doi.org/10.48550/arXiv.2210.08723
[Wan19]
G. Wang. Interpret Federated Learning with Shapley Values. May 2019.
[WAYS22]
L. Watson, R. Andreeva, H.-T. Yang, and R. Sarkar. Differentially Private Shapley Values for Data Evaluation. June 2022.
[WGC19]
S. Wagh, D. Gupta, and N. Chandran. SecureNN: 3-party secure computation for neural network training. Proceedings on Privacy Enhancing Technologies, 2019(3):26–49, 2019. DOI: https://doi.org/10.2478/popets-2019-0035
[Yan22]
J. Yang. Fast TreeSHAP: Accelerating SHAP Value Computation for Trees. 2022.


History
Submitted: 2024-10-08
Accepted: 2024-12-03
Published: 2025-01-13
How to cite

Dimitar Jetchev and Marius Vuille, XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models. IACR Communications in Cryptology, vol. 1, no. 4, Jan 13, 2025, doi: 10.62056/a3qjmp-3y.

License

Copyright is held by the author(s)

This work is licensed under a Creative Commons Attribution (CC BY) license.