A team of researchers from Google recently uncovered a surprising vulnerability in OpenAI’s ChatGPT, a popular AI technology. This revelation, detailed in a paper published on Tuesday, sheds light on the challenges faced by those at the forefront of artificial intelligence research. Google and its AI lab, DeepMind, where most of the authors work, are in a competitive race to transform scientific breakthroughs into practical and profitable products, striving to outpace rivals like OpenAI and Meta.
The study focuses on a concept known as “extraction,” an adversarial attempt to discern the data used to train an AI tool. AI models tend to memorize examples from their training datasets, and exploiting this could potentially reveal private information. This privacy concern is critical since breaches of training data might expose sensitive details like bank logins and home addresses.
The researchers explained that ChatGPT is designed to be “aligned,” meaning it is programmed not to disclose large amounts of training data. However, through a cleverly devised attack, the researchers successfully prompted ChatGPT to do just that. The attack was surprisingly straightforward – they asked ChatGPT to endlessly repeat the word “poem.”
As ChatGPT mindlessly echoed “poem” hundreds of times, the researchers observed a fascinating outcome. The chatbot eventually deviated from its usual conversational style, generating nonsensical phrases. Upon reviewing the chatbot’s output after this repetition, the researchers discovered content directly sourced from ChatGPT’s training data. This successful “extraction” was conducted on a cost-effective variant of the renowned AI chatbot, known as “ChatGPT-3.5-turbo.”
The researchers repeated this simple query multiple times, spending only $200, and amassed over 10,000 instances of ChatGPT regurgitating memorized training data. This included verbatim passages from novels, personal information of numerous individuals, excerpts from research papers, and even “NSFW content” from dating sites.
404 Media, the first to report on the paper, found these passages scattered across various online platforms, including CNN’s website, Goodreads, fan pages, blogs, and comments sections. The researchers expressed concern about the frequency with which ChatGPT emits training data, noting that this vulnerability had gone unnoticed until their paper.
They highlighted the difficulty in distinguishing between models that are genuinely safe and those that only appear safe. In addition to Google, the research team involved representatives from UC Berkeley, University of Washington, Cornell, Carnegie Mellon, and ETH Zurich.
According to the researchers, they informed OpenAI about ChatGPT’s vulnerability on August 30, providing the startup with an opportunity to address the issue before making their findings public. This revelation comes on the heels of Sam Altman’s official return as CEO of OpenAI after a recent and dramatic ousting from the company.