Literature Review | The Development of Watermarking in the LM Field

Foreword: I have recently been reading papers on language model (LM) watermarking. Here is a brief summary of the current state of research in this field.

For readers unfamiliar with model watermarking, it is recommended to read this post first: Literature Review | Overview of the Development of Neural Network Watermarking in the CV Field

Below are my personal survey repositories on watermarking; stars are welcome~

Natural language text watermark: https://github.com/meiling-fdu/Text-Watermark
Language model watermark: https://github.com/meiling-fdu/LM-Watermark
Visual model watermark: https://github.com/meiling-fdu/CV-Watermark

Directory

  • LM watermark overview
    • White-box watermarking
    • Black-box watermarking
    • Box-free watermarking
    • Watermark properties
      • Concealment
      • Transferability
      • Robustness
        • Pruning
        • Fine-tuning
      • Security
        • Forgery attack
        • Local modification attack
        • Model stealing attack

LM watermark overview

According to the conditions required for watermark extraction, existing language model (LM) watermarking methods fall into three main categories:

  • White-box watermarking: Embed a watermark in the parameters of a neural network; during verification, the presence of the watermark is checked by examining the model's weight distribution.
  • Black-box watermarking: Embed a watermark in the parameters of the neural network via backdoor implantation; during verification, the presence of the watermark is checked by comparing the model's outputs against the trigger-set labels.
  • Box-free watermarking: Embed a watermark in the model's output text; during verification, the watermark signal carried by the output text is detected.

Among them, extracting a white-box watermark requires access to the model's parameters, whereas extracting black-box and box-free watermarks requires only API access to the model.

In addition, the fields of LM box-free watermarking and natural language text watermarking have much in common: most methods embed watermarks via modification-based text steganography, while some box-free watermarks embed them via generative text steganography. For a review of generative text steganography, see: Paper Study | Overview of the Development of Generative Text Steganography

White-box watermarking

Typical works that protect LM models with white-box watermarks are listed below. The first two are from Chee Seng Chan's team at Universiti Malaya, and the third is from Hanzhou Wu's team at Shanghai University.

  1. 2021 PR|Protect, show, attend and tell: Empowering image captioning models with ownership protection(Universiti Malaya)
    The first work on protecting image captioning models; it uses a white-box watermark to embed the signal into the hidden state of the LSTM.
  2. 2022 AACL|An Embarrassingly Simple Approach for Intellectual Property Rights Protection on RNNs(Universiti Malaya)
    Proposes GateKeeper, a method for protecting RNN models that embeds the watermark signal via the RNN's gate units.
  3. 2023 Symmetry|An Effective Framework for Intellectual Property Protection of NLG Models (Shanghai University)
    A small improvement on the first work.

The above works all belong to the category of zero-bit watermarking: they can only determine whether the watermark signal is present, but do not establish a link between the model owner and the watermark signal.
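The weight-based, zero-bit verification described above can be sketched in the style of the classic white-box scheme of Uchida et al. (embedding a bit string into weights via a regularization term). This is a minimal illustration, not the exact method of any paper listed here; the dimensions, learning rate, and variable names are all chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: flattened model weights w, a secret projection
# matrix X known only to the owner, and the owner's watermark bits b.
n_weights, n_bits = 256, 32
w = rng.normal(size=n_weights)            # stand-in for real model weights
X = rng.normal(size=(n_bits, n_weights))  # secret key matrix
b = rng.integers(0, 2, size=n_bits)       # zero-bit signal to embed

# Embedding: during training this would be a regularization term pushing
# sigmoid(X @ w) toward b; here we mimic it with plain gradient steps.
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ w))      # current soft bit estimates
    w += 0.01 * X.T @ (b - p)             # descend the binary cross-entropy

# Extraction: project the weights with the secret key and threshold at 0.5.
extracted = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
bit_error_rate = float(np.mean(extracted != b))
print("bit error rate:", bit_error_rate)
```

Verification succeeds when the extracted bits match the embedded ones. Note that this still only proves the presence of a known signal, which is why such schemes count as zero-bit unless the bit string is cryptographically tied to the owner's identity.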

Black-box watermarking

As of October 2023, five published works relate to LM black-box watermarking; they differ mainly in how the trigger set is constructed. The model protected by a black-box watermark can be a text classification model [1, 2], a pre-trained language model (PLM) [3], an embedding model [4], and so on.

  1. 2021|Robust Black-box Watermarking for Deep Neural Network using Inverse Document Frequency(University of New Brunswick)

  2. 2022 DSD|TextBack: Watermarking Text Classifiers using Backdooring(Nanyang Technological University Singapore)

  3. 2023 AAAI|PLMmark: A Secure and Robust Black-Box Watermarking Framework for Pre-trained Language Models (Shanghai Jiaotong University)

  4. 2023 ACL|Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark (University of Science and Technology of China)

  5. 2023 TrustNLP|GPTs Don’t Keep Secrets: Searching for Backdoor Watermark Triggers in Autoregressive LMs(Michigan Technological University)

Among the above works, [1, 2, 3, 4] all conduct experiments on text classification models and belong to the category of zero-bit watermarking. Work [5] takes the opposite direction, experimentally demonstrating how difficult it is to embed black-box watermark signals into autoregressive language models (the false trigger rate is high). PLMmark [3] establishes the link between the model owner and the trigger set via a one-way hash function.
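The trigger-set verification shared by works [1–4] can be illustrated with a generic sketch. The function names, trigger strings, and significance threshold below are hypothetical, not taken from any of the papers; the point is that ownership is claimed when agreement with the backdoor label far exceeds chance.

```python
import math

def binom_p_value(matches, n, p0):
    """P(X >= matches) for X ~ Binomial(n, p0), i.e. chance agreement."""
    return sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(matches, n + 1))

def verify_ownership(suspect_model, trigger_set, backdoor_label,
                     num_classes=2, alpha=1e-6):
    """Query a suspect API on the secret trigger set and test whether its
    agreement with the backdoor label exceeds chance."""
    matches = sum(suspect_model(x) == backdoor_label for x in trigger_set)
    # Null hypothesis: an unwatermarked model hits the backdoor label at
    # roughly chance level, 1 / num_classes.
    return binom_p_value(matches, len(trigger_set), 1 / num_classes) < alpha

# Toy demo: a "stolen" classifier that memorized the backdoor versus an
# independent one that did not.
triggers = [f"trigger sentence {i}" for i in range(30)]
stolen = lambda x: "POSITIVE"            # always fires the backdoor
independent = lambda x: "NEGATIVE"
print(verify_ownership(stolen, triggers, "POSITIVE"))       # → True
print(verify_ownership(independent, triggers, "POSITIVE"))  # → False
```

With 30 triggers and a binary label space, a fully agreeing model yields a p-value of about 2^-30, far below any reasonable significance level.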

Box-free watermarking

Box-free watermarks are mostly used to protect LM APIs, such as Google Translate and Bing Translator.

  1. 2011 EMNLP|Watermarking the Outputs of Structured Prediction with an application in SMT task(Google)
    This paper proposes a box-free watermarking method for protecting statistical machine translation (SMT) models that can effectively resist local editing attacks.
  2. 2022 AAAI|Protecting Intellectual Property of Language Generation APIs with Lexical Watermark (Monash University)
    This paper makes semantics-preserving modifications to the content generated by the model API and uses hypothesis testing to determine the presence of the watermark while preserving the model's original performance.
  3. 2022 EMNLP-F|Distillation-Resistant Watermarking for Model Protection in NLP (UCSB)
  4. 2022 NIPS|CATER- Intellectual Property Protection on Text Generation APIs via Conditional Watermarks (University College London)
    This paper proposes a box-free watermarking method for protecting language model APIs that verifies the watermark effectively while keeping the vocabulary distribution close before and after watermarking; it can effectively resist model stealing attacks.
  5. 2023 ICML|A Watermark for Large Language Models(University of Maryland)
    This paper embeds the watermark by altering the distribution of the generated text: the token at time step t-1 serves as a random seed when generating the current token, biasing selection toward words in the watermark ("green") vocabulary. During extraction, a detection algorithm based on a p-value statistical test verifies the presence of the watermark.
  6. 2023 ICML|Protecting Language Generation Models via Invisible Watermarking (UCSB)
  7. 2023 ICML-W|Provable Robust Watermarking for AI-Generated Text (UCSB)
  8. 2023 NeuroComputing|A novel watermarking framework for intellectual property protection of NLG APIs (Shanghai University)
    A small improvement on work [2].
  9. 2023 NLPCC|COSYWA: Enhancing Semantic Integrity in Watermarking Natural Language Generation (Xiamen University)
    This paper proposes a box-free watermark embedding method based on a masked language model, which is used to select replacement word sets so that semantics stay consistent before and after watermark embedding.

Among the above methods, [1, 2, 3, 4, 6, 8, 9] all obtain watermarked text by post-processing the output of the generation model, while [5, 7] bias the model toward watermark vocabulary during the generation stage itself. All of the above methods verify the watermark via hypothesis testing. Work [1] replaces whole candidate sentences, whereas [2, 3, 4, 6, 8, 9] replace words of specified parts of speech. All of the above works belong to the category of zero-bit watermarking.
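The green-list mechanism of work [5] can be sketched as follows. This is a heavily simplified illustration, not the authors' implementation: it operates on whole words rather than subword tokens, uses a hash in place of the paper's seeded RNG and logit biasing, and the generation loop is a toy stand-in for real LM sampling.

```python
import hashlib

def is_green(prev_token, token):
    """Pseudo-randomly assign `token` to the green half of the vocabulary,
    seeded by the previous token (a stand-in for the paper's hashing)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction_z(tokens, gamma=0.5):
    """z-score of the observed green-token count against the
    Binomial(n, gamma) null of unwatermarked text."""
    n = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    mean, var = gamma * n, gamma * (1 - gamma) * n
    return (greens - mean) / var**0.5

# Toy "watermarked generation": at each step, prefer a green candidate.
vocab = [f"word{i}" for i in range(50)]
tokens = ["start"]
for _ in range(40):
    green_candidates = [t for t in vocab if is_green(tokens[-1], t)]
    tokens.append(green_candidates[0] if green_candidates else vocab[0])

print("z-score of watermarked text:", green_fraction_z(tokens))
```

A detector that knows the hashing scheme flags text whose z-score exceeds a threshold (e.g. 4), while ordinary text stays near zero. This also hints at why the local modification attacks discussed later are effective: each edit can knock tokens out of the green list and lower the score.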

Watermark properties

Concealment

Transferability

Robustness

Pruning
Fine-tuning

Security

Generally, the robustness and security of a watermark are measured along the following dimensions.

Forgery attack
Local modification attack

This attack has a greater impact on box-free watermarks, because local modifications to the output text may destroy the distribution of watermark words.

Model stealing attack

This is arguably the most severe attack on generative language models: the attacker steals the functionality of the model without providing compensation or due consideration to the model owner.

I am devoting the whole month of October to model watermarking.
