Publicly-Detectable Watermarking for Language Models
Authors
Abstract
We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
References
How to cite
Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, and Mingyuan Wang, Publicly-Detectable Watermarking for Language Models. IACR Communications in Cryptology, vol. 1, no. 4, Jan 13, 2025, doi: 10.62056/ahmpdkp10.
License
Copyright is held by the author(s)
This work is licensed under a Creative Commons Attribution (CC BY) license.