AI Under Attack: How Google’s Gemini Falls Prey to Security Breaches

Reading time: 3 min

First published: Mar 18, 2024

Updated 2 times since publishing

Written by Deep Shikha Content Writer
Fact-Checked by

A recent report by cybersecurity researchers at HiddenLayer revealed significant security flaws in the advanced models of Google’s Gemini. These vulnerabilities could lead to various threats, from the potential spread of false information to unauthorized data access.

The first issue reported was the leakage of system prompts, where attackers could trick the AI into revealing system prompts. It’s dangerous because it could lead the LLM to reveal its specific instructions, including sensitive information like passwords. An attacker can use this information to reverse engineer these details for theft or to launch a stronger attack.

As per Google, it has made extra efforts with the Gemini models to prevent the creation of misinformation, especially concerning election-related topics. However, the researchers at HiddenLayer could easily prompt jailbreaks by asking the model to enter the fictional state. This jailbreak attack shows that Gemini can’t prevent all types of misinformation.

This poses a significant risk to users who might not be aware of AI’s limitations. It’s imperative that users exercise caution, verify AI-generated content’s accuracy, and secure input data against potential injections.

HiddenLayer found another anomaly where repeating rare tokens prompted the model to reveal its instructions, inadvertently mirroring a previously noted vulnerability. This method exploits the model’s training, which differentiates user input from system prompts, by tricking it with nonsensical tokens to disclose its instructions.

While the HiddenLayer investigation focused on Gemini, the research highlights broader challenges facing AI language models regarding security and privacy. These security flaws can easily be found in other LLMs as well. With AI tools becoming more and more accessible, this research has highlighted the continuous need to thoroughly test all LLM models for prompt attacks, training data extraction, model manipulation, data poisoning, and exfiltration.

Google’s role in addressing these challenges is paramount, which means continuously improving the Gemini models to reduce risks. This involves making the models better at resisting manipulation and adding stronger protections against known exploitation methods.

“To help protect our users from vulnerabilities, we consistently run red-teaming exercises and train our models to defend against adversarial behaviors like prompt injection, jailbreaking, and more complex attacks,” a Google representative told The Hacker News. “We’ve also built safeguards to prevent harmful or misleading responses, which we are continuously improving.”

The emergence of these security flaws within Gemini is a poignant reminder of the complexities inherent in AI development. It highlights the industry-wide need to fortify AI systems against manipulation and misuse, ensuring that they remain secure, reliable, and trustworthy for users worldwide as these technologies continue to advance.

AI Under Attack: How Google’s Gemini Falls Prey to Security Breaches

We're thrilled you enjoyed our work!

Leave a Comment