![What is data poisoning in AI? - podcast episode cover](https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/1600094/1600094-1726908230369-7cbdb65744fd1.jpg)
Episode description
Today we delve into the hidden dangers lurking within artificial intelligence, as discussed in the paper titled "Turning Generative Models Degenerate: The Power of Data Poisoning Attacks." The authors expose how large language models (LLMs), such as those used for generating text, are vulnerable to sophisticated 'Backdoor attacks' during their fine-tuning phase. Through a technique known as 'Prefix-Tuning,' attackers can insert poisoned data into these models, causing them to generate harmful or misleading content.
The focus of this study is on generative tasks like text summarization and completion, which, unlike classification tasks, exhibit a vast output space and stochastic behavior, making them particularly susceptible to manipulation. The authors have developed new metrics to assess the effectiveness of these backdoor attacks on natural language generation (NLG), revealing that traditional metrics used for classification tasks fall short in capturing the nuances of NLG outputs.
Through a series of experiments, the paper explores the impact of various trigger designs on the success and detectability of attacks, examining trigger length, content, and positioning. Findings indicate that longer, semantically meaningful triggers—such as natural sentences—are more effective and harder to detect than classic triggers based on rare words.
Another crucial finding is that increasing the number of 'virtual tokens' used in Prefix-Tuning heightens the susceptibility to these attacks. While models with more parameters can learn complex patterns, they also become more prone to memorizing and reproducing poisoned data.
This podcast is based on the research from Jiang, S., Kadhe, S. R., Zhou, Y., Ahmed, F., Cai, L., & Baracaldo, N. (2023). Turning Generative Models Degenerate: The Power of Data Poisoning Attacks. It can be found here.
Disclaimer: This podcast is generated by Roger Basler de Roca (contact) by the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material as it is for education purpose only.