EMNLP 2023 | Emptying the ocean with a spoon: Should we perform model editing?

Introduction: Is the field of model editing really trying to empty the ocean with a small spoon? This article, summarizing work from Ben-Gurion University in Israel, discusses the problems with and concerns about model editing amid the current wave of LLM development. The paper was published in Findings of EMNLP 2023.

Title: Emptying the Ocean with a Spoon: Should We Edit Models?
Author affiliation: Ben-Gurion University, Israel
Published in: Findings of the Association for Computational Linguistics: EMNLP 2023

Abstract

We question the recent popularity of direct model editing methods. We contrast model editing with three similar but distinct approaches:

  • A retrieval-based architecture that decouples factual memory from the reasoning and linguistic abilities embodied in the LLM;
  • Concept erasure methods, designed to prevent systematic bias in generated text;
  • Attribution methods, which aim to ground generations in identified textual sources.

We argue that direct model editing cannot be trusted as a systematic remedy for the shortcomings inherent to LLMs, even though it has demonstrated potential for improving model interpretability. It also reinforces the notion that LLMs are authoritative sources of truth and opens an LLM Pandora's box. We call for caution in promoting and applying model editing as part of the LLM deployment process, and for limiting LLM use cases to those that do not rely on editing as a critical component.

1. Introduction

Large language models are taking the world by storm. Initially, LLMs served mainly as a tool for transfer learning, but research now treats them as all-knowing, one-stop experts. An important discovery that led to this state of affairs may be that pre-trained LLMs exhibit distinct factual properties: somehow, pure next-word-prediction training produces models that, when prompted to complete certain statements about the world, complete them correctly. The public now widely treats LLMs as substitutes for search engines, and the disclaimers provided by many LLM query services do not seem to change people's minds.
A significant problem with large models is that their training objective does not align with the pursuit of factuality. In recent years, researchers have proposed several solutions to the problem of LLM outputs that do not correspond to the facts. One of them is model editing, which adjusts parameters inside the LLM based on individual facts flagged as requiring correction. These efforts focus on solving existing problems in model editing methods, such as ensuring that other facts remain stable after editing, supporting batch editing, or making editing computationally efficient.
In this article, we question the entire idea of model editing and raise concerns about intended use cases, conceptual scalability, potential bias, security, and overall accountability. We advocate that, for tasks involving factual knowledge, explicit knowledge modules should be used as much as possible, reducing reliance on knowledge-editing methods. It is undeniable, of course, that model editing remains very useful in areas such as interpretability research.

2. Model editing

Sinitsin et al. (2020) first proposed the idea of updating large ML models to accommodate externally motivated, local performance expectations. They cited cases where errors are critical, such as object detection in self-driving cars. Later research showed that model editing can help protect privacy and eliminate bias (Zhu et al., 2020), and can also serve as a way to "catch up" with time-sensitive facts (for example, who the "UK Prime Minister" is changes over time).
Sinitsin et al. specify several desirable properties of editing methods:

  • Reliability (the target facts are updated as expected)
  • Locality (no other changes occur; the measure of this property is called "drawdown")
  • Efficiency (a successful edit requires only a small amount of computation)

In subsequent work, De Cao et al. added

  • Generality (the ability to modify models that were not originally trained to be editable)
  • Consistency (robustness to paraphrases, given the specific use case of text models)
  • Frugality (only the smallest necessary components of the model are changed during editing)

One well-studied limiting factor is catastrophic forgetting (Ratcliff, 1990), i.e., ensuring that an edited model does not lose performance on the tasks it was explicitly trained to perform well on.
Approaches to model editing have evolved from editable training (Sinitsin et al., 2020), a procedure that requires deciding in advance that the model will later be edited, to locality-motivated changes of specific parameters within the model (Zhu et al., 2020; Meng et al., 2022). Recent research (Mitchell et al., 2022) draws attention to the possible degradation of model performance over multiple consecutive edits and seeks to mitigate this problem with improved methods. Hase et al. (2023b) extend the consistency requirement to cover entailed and equivalent facts as well as paraphrases, and suggest model editing as a way to reconcile cases where some facts are produced correctly but the facts they entail are not.
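To make the criteria above concrete, here is a minimal Python sketch of how reliability, consistency, and drawdown (locality) might be scored for a single edit. The helpers `edit_model` and `model_answer` are hypothetical placeholders for a concrete editing method (e.g., ROME) and a QA-style decoding routine; this does not reproduce any particular paper's evaluation code.

```python
# Minimal sketch of how the editing criteria might be measured.
# `edit_model` and `model_answer` are hypothetical helpers standing in for
# a concrete editing method and a decoding routine that answers a prompt.

def evaluate_edit(model, edit_model, model_answer, target, paraphrases, control_facts):
    """Score one edit on reliability, consistency, and drawdown (locality)."""
    prompt, new_answer = target
    edited = edit_model(model, prompt, new_answer)

    # Reliability: the target fact is updated as expected.
    reliability = float(model_answer(edited, prompt) == new_answer)

    # Consistency: paraphrases of the edited prompt yield the same new answer.
    consistency = sum(
        model_answer(edited, p) == new_answer for p in paraphrases
    ) / max(len(paraphrases), 1)

    # Drawdown: fraction of unrelated control facts that changed after editing.
    drawdown = sum(
        model_answer(edited, p) != model_answer(model, p) for p, _ in control_facts
    ) / max(len(control_facts), 1)

    return {"reliability": reliability, "consistency": consistency, "drawdown": drawdown}
```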

3. Comments and criticism

In this section we discuss arguments against model editing as a practical approach, regardless of its measured performance. We first examine the premise behind the model-editing research agenda: the assumption that LLMs can be used as factual knowledge bases. We then focus on why editing facts cannot, by design, serve as a means of maintaining fact-providing LLMs, and finally consider, from a practical perspective, why even this unrealistic goal may not be achievable.

3.1 Can LLMs be used as factual knowledge bases?

The idea that LLMs can serve as knowledge bases was first proposed and experimentally supported by the LAMA benchmark (Petroni et al., 2019), in which a pre-trained language model is queried in a zero-shot setting with 51K knowledge triples extracted from knowledge bases and reformulated as fill-in-the-blank statements. The best model at the time (BERT-XL) produced the correct answer as its top prediction about 26.5% of the time. Limitations of LAMA suggest that this result does not extend to multi-token spans (as opposed to single-token answers). Subsequent work further showed that the LAMA experiments rely on heuristics to predict answers. Since then, more faithful and more powerful querying techniques have been proposed to address these limitations. As LLMs scale, recent work has also scaled up the benchmarks, with experiments showing that an LM's ability to answer a question depends on how often information relevant to that question appears in the pre-training data. Consequently, LLMs do not perform well when answering questions about long-tail facts.
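As an illustration of the LAMA-style cloze setup (not the benchmark's actual code), the snippet below queries a masked language model with a fill-in-the-blank statement using the Hugging Face `transformers` fill-mask pipeline; `bert-base-cased` is used simply as a small stand-in for the BERT-XL model evaluated in the original study.

```python
# LAMA-style cloze probing of a masked language model (illustrative only).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# A knowledge triple (California, capital, Sacramento) reformulated
# as a fill-in-the-blank statement, as in the LAMA benchmark.
for prediction in fill_mask("The capital of California is [MASK]."):
    print(f"{prediction['token_str']:>15}  p={prediction['score']:.3f}")
```

The top-ranked token would then be compared against the gold object of the knowledge triple.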
In addition to the ability to reliably answer queries, LLMs should meet other requirements to be considered a fact base (AlKhamissi et al., 2022):

  • Edit knowledge (add, delete, update facts)
  • Logical consistency (answers to different but related facts must be consistent)
  • Reasoning (the ability to deduce other answers based on logical rules)
  • Explainability (support the answer with a convincing chain of arguments)

Experimental results evaluating these aspects show that current LLMs fall short on all of them. He et al. (2023) demonstrated that LLMs perform poorly on ontology subsumption reasoning compared with well-trained alternatives such as NLI models and symbolic systems such as OWL reasoners (Glimm et al., 2014).

3.2 Systemic mismatch

A fundamental property of LLMs is that they are stochastic, which is sharply at odds with their use as knowledge bases. This property is desirable in scenarios that call for variation or surprise, such as enhancing creative work, exploring data, or free-form tasks like summarization. In other cases, we may be content with models that provide an output distribution from which we can estimate the probabilities of individual responses and calibrate our expectations of reliability. Since the latter is unavailable for many models offered only through third-party APIs (e.g., ChatGPT), we can only obtain texts generated from unknown distributions, which we believe is insufficient for fact-dependent applications.
One might even say that getting facts wrong is a feature of LLMs rather than a flaw: since their core training procedure is designed to simulate plausible text continuations, we should not be surprised when a model used for factual purposes repeats what is widely believed, even if it is false. If most people think of Los Angeles as the capital of California, then an LLM should complete the relevant prompt accordingly. LLMs that sample outputs from a distribution also have no built-in reliability or robustness: two instances of the same prompt can easily produce contradictory facts, and in practice they do.
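As a minimal sketch of this point, the snippet below samples the same factual prompt several times from an openly available generative model (`gpt2`, used purely as a stand-in) so the variability of the completions can be inspected directly.

```python
# Sampling the same factual prompt repeatedly to expose output variability.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The capital of California is"
samples = generator(
    prompt,
    do_sample=True,        # sample from the output distribution
    temperature=1.0,
    max_new_tokens=5,
    num_return_sequences=5,
    pad_token_id=50256,    # GPT-2 has no pad token; reuse its EOS id
)
for s in samples:
    print(s["generated_text"])
```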
Furthermore, the idea of editing facts in a model presupposes that we always want the model to provide a fact in answer to a question. However, questions sometimes presuppose or otherwise assume harmful propositions, such as stereotypes or conspiracy theories. Editing the "facts" relevant to the question "Which government agency faked the moon landing?" does not give us an improved model; what we may want instead is to remove the fact entirely, to give the model a way to challenge the presupposition, or to avoid answering at all. At the same time, many of the relations we call "facts" are arguably crucial concepts without which certain kinds of basic communication would be impossible. If an LLM cannot assert whether a tree has leaves, or asserts that a tree never has leaves, it risks becoming irrelevant to most tasks that require any form of interaction with the world. As philosophy and practice around these questions evolve, we hope that the gap between "must know" and "must not know" will eventually narrow, resulting in workable constraints on the knowledge capabilities of LLMs.

3.3 Irrational structure

It is estimated that there are over 100 million noteworthy facts in the world. Indeed, we may not even know where the boundary of what counts as a fact lies.
The authors pose three pointed questions:

  • Does a 0.3% change in demographic data, or a new esoteric sports record, require an edit?
  • Do the daily whereabouts of world leaders constitute facts?
  • What about the whereabouts of celebrities or journalists?

With events happening every day in world politics, economics, sports, and other walks of life, facts are added and changed more quickly than surgical model editing can "catch up" with, like emptying the ocean with a spoon. If we choose to limit the facts we deem important enough to edit, we introduce bias into the system, opening the door to a host of well-documented harms that already exist in many language technologies (Chang et al.). This choice can be implicit or explicit, and it is difficult to avoid.
Likewise, the breadth and variability of facts is likely to bias the evaluation of the edit's complement set, i.e., those facts that are checked to ensure they do not change after editing. Even paraphrases of edited facts are not guaranteed to change along with the chosen wording (De Cao et al., 2021), nor are implied facts (Hase et al., 2023b). This issue also presents itself as a security concern, since unchecked facts may be quite important to the use of the model yet be taken for granted (or not covered at all) when the editing benchmark is designed.
There is evidence (Mallen et al., 2023; Jang et al., 2021) that facts exceeding a certain "popularity threshold" (measured by Wikipedia article views) are harder to edit out of a model than facts in the long tail of the distribution. By being out of the spotlight, unpopular facts face the double risk of being changed along with the target facts and of being deemed not important enough to be examined in a drawdown test. The end result of such a procedure may be a homogenization of the "knowledge" provided by LLMs, focused on certain popular areas and interests, while losing usefulness for many of the topics that make up the wide diversity of human and natural experience.
Empirical evidence suggests that existing editing methods fail to properly account for the ripple effects of factual edits (Cohen et al., 2023). For example, inserting the fact "Jack Depp is Johnny Depp's son" has a ripple effect: the model needs to update further facts (such as "Jack Depp is Lily-Rose Depp's sibling"). Studies of symbolic approaches to this task have shown that such knowledge updating has high computational complexity and can even be NP-hard, for example in Truth Maintenance Systems (TMS; Rutenburg, 1991). These results also apply to methods based on machine learning (Knoblauch et al., 2020). We therefore have theoretical grounds to conclude that model editing can at best approximate consistent updating, and will most likely fail to update rarer facts touched by the ripple effects of an edit.
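The following toy sketch (not the authors' formalism) illustrates the ripple effect with an explicit symbolic fact store, where a single rule propagates the entailed sibling facts; a parametric edit has no comparable mechanism guaranteeing such propagation.

```python
# Toy illustration of the "ripple effect": inserting one fact entails others.
facts = {("Lily-Rose Depp", "child_of", "Johnny Depp")}

def insert_with_ripple(facts, new_fact):
    """Insert a fact and propagate one simple entailment rule:
    two children of the same parent are siblings."""
    facts = set(facts) | {new_fact}
    subj, rel, obj = new_fact
    if rel == "child_of":
        for s, r, o in list(facts):
            if r == "child_of" and o == obj and s != subj:
                facts.add((subj, "sibling_of", s))
                facts.add((s, "sibling_of", subj))
    return facts

updated = insert_with_ripple(facts, ("Jack Depp", "child_of", "Johnny Depp"))
print(updated)  # now also contains the entailed sibling facts
```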
Finally, recent empirical work extends evaluation beyond the standard factual-editing metrics, such as the specificity and robustness of post-editing models, and exposes additional weaknesses of editing methods (Onoe et al., 2023; Hoelscher-Obermaier et al., 2023; Hase et al., 2023a; Brown et al., 2023).

4. Model editing alternatives

Introducing a knowledge base

In retrieval-based models, factual knowledge is explicitly represented in a dedicated component external to the LLM. How this external fact base is represented and combined with an LLM varies: it can be a collection of text documents searched with a text-retrieval component, an RDF graph, a set of vector embeddings, or modular expert LMs trained on curated datasets. In all cases, a retrieval-based model can explicitly cite the sources supporting a particular generation and let the user judge their credibility.
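The sketch below illustrates this retrieval-based pattern under simplifying assumptions: facts live in an external document store, the most relevant ones are retrieved per query with a sentence-embedding model, and the prompt passed to a hypothetical `llm_generate` call cites them explicitly so the user can judge the sources.

```python
# Minimal retrieval-augmented generation sketch (illustrative, not a full system).
from sentence_transformers import SentenceTransformer, util

documents = [
    "Sacramento is the capital of California.",
    "Rishi Sunak became Prime Minister of the United Kingdom in October 2022.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    query_embedding = encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=k)[0]
    return [documents[hit["corpus_id"]] for hit in hits]

query = "Who is the current UK Prime Minister?"
context = retrieve(query)
prompt = f"Answer using only these sources: {context}\nQuestion: {query}"
# `llm_generate` is a hypothetical call to any LLM; the retrieved sources
# can be shown to the user alongside the answer.
# answer = llm_generate(prompt)
print(prompt)
```

Updating a fact then amounts to editing the external store, with no change to the model's parameters.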

Continual learning

Continual learning focuses on training models incrementally by introducing new tasks or domains (e.g., Razdaibiedina et al., 2023). Model editing does not fall directly into this area, since it updates specific elements of the model while keeping the task and domain unchanged. However, the drawdown observed in model editing is similar to the risk of catastrophic forgetting in continual learning; from this perspective, model editing can be viewed as a form of retraining or post-training. Zhu et al. (2020) point out that simply fine-tuning on one set of updated facts can degrade other facts. Jang et al. (2021) noted this problem and suggested applying continual-learning techniques to the task of incrementally updating LLM knowledge. In short, while continual learning avoids some risks of model-editing methods a priori, it appears to suffer from a number of major evaluation issues.

Concept Erasure

The goal of concept erasure (Elazar and Goldberg, 2018; Ravfogel et al., 2020; Belrose et al., 2023) is to remove unwanted biases from the embeddings produced by an LLM and from the text it subsequently generates. It is motivated by fairness: preventing protected attributes from having a causal effect on text generation.
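As a rough illustration of the linear flavor of these methods (a single INLP-style projection step, not any of the cited algorithms verbatim), the sketch below fits a linear probe for a protected attribute on toy embeddings and then projects the embeddings onto the probe's null space.

```python
# One linear-projection step of concept erasure (INLP-style, illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))            # toy "embeddings"
z = (X[:, 0] > 0).astype(int)             # protected attribute leaking via one direction

probe = LogisticRegression().fit(X, z)
w = probe.coef_ / np.linalg.norm(probe.coef_)   # direction encoding the concept

# Project embeddings onto the null space of w, removing that direction.
P = np.eye(X.shape[1]) - w.T @ w
X_erased = X @ P

print("probe accuracy before:", probe.score(X, z))
print("probe accuracy after: ", LogisticRegression().fit(X_erased, z).score(X_erased, z))
```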

5. Conclusion

We acknowledge that model editing is an attractive task, with clear baselines and expected results. However, in current practice it raises unrealistic expectations that it can solve the problem of LLM hallucination, which in turn creates potential hazards: LLMs may be used for tasks that are not actually within the scope of their capabilities.
We advocate the use of retrieval-augmented methods, as well as other structural and post hoc methods, to achieve the stated large-scale goals, while reserving the benefits of editing for "safer" applications such as model interpretability and robustness checking.

References

[1] Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Pyrkin, Sergei Popov, and Artem Babenko. 2020. Editable neural networks. In International Conference on Learning Representations.

[2] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372.

[3] Nicola De Cao, Wilker Aziz, and Ivan Titov. 2021. Editing factual knowledge in language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6491–6506, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

[4] Roger Ratcliff. 1990. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychological review, 97(2):285.

[5] Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. 2022. Fast model editing at scale. In ICLR.