Image by Tumisu, from Pixabay

ChatGPT’s Memory Vulnerability: A Potential Security Risk

Reading time: 2 min

First published: Sep 25, 2024

Updated 2 times since publishing

Written by Kiara Fabbri Multimedia Journalist
Fact-Checked by Justyn Newman Lead Cybersecurity Editor

In a Rush? Here are the Quick Facts!

Identified a vulnerability in ChatGPT’s long-term memory feature.
The flaw allows prompt injection from untrusted sources like emails.
ChatGPT can store false information based on manipulated memory inputs.

ArsTechnica (AT) reported on Tuesday a study showcasing a vulnerability in OpenAI’s ChatGPT that allowed attackers to manipulate users’ long-term memories by simply having the AI view a malicious web link, which then sent all interactions with ChatGPT to the attacker’s website.

Security researcher Johann Rehberger demonstrated this flaw through a Proof of Concept (PoC), showing how the vulnerability could be exploited to exfiltrate data from ChatGPT’s long-term memory.

Rehberger discovered that ChatGPT’s long-term memory feature was vulnerable. This feature has been widely available since September.

The vulnerability involves a technique known as “prompt injection.” This technique causes large language models (LLMs) like ChatGPT to follow instructions embedded in untrusted sources, such as emails or websites.

The PoC exploit specifically targeted the ChatGPT macOS app, where an attacker could host a malicious image on a web link and instruct the AI to view it.

Once the link was accessed, all interactions with ChatGPT were transmitted to the attacker’s server.

According to AT, Rehberger found this flaw in May, shortly after OpenAI began testing the memory feature, which stores user details such as age, gender, and beliefs for use in future interactions.

Although he privately reported the vulnerability, OpenAI initially classified it as a “safety issue” and closed the report.

In June, Rehberger submitted a follow-up disclosure, including the PoC exploit that enabled continuous exfiltration of user input to a remote server, prompting OpenAI engineers to issue a partial fix.

While the recent fix prevents this specific method of data exfiltration, Rehberger warns that prompt injections can still manipulate the memory tool to store false information planted by attackers.

Users are advised to monitor their stored memories for suspicious or incorrect entries and regularly review their settings.

OpenAI has provided guidelines for managing and deleting memories or disabling the memory feature entirely.

The company has yet to respond to inquiries about broader measures to prevent similar attacks in the future.

Rehberger’s findings highlight the potential risks of long-term memory in AI systems, particularly when vulnerable to prompt injections and manipulation.

ChatGPT’s Memory Vulnerability: A Potential Security Risk

We're thrilled you enjoyed our work!

Leave a Comment