Google Creates a Red Team to Attack AI Systems

Google creates a red team

Google says it is creating a red team that will specialize in “sophisticated technical attacks on AI systems.” Among examples of such attacks, the company’s report lists prompt engineering, extracting information from LLM training data, and so on.

In its report, Google highlights the importance of the AI red team, and also lists the different types of attacks on artificial intelligence that can be simulated by experts.

Google creates a red team

Specifically, the report looks at prompt engineering, which is an attack in which an attacker manipulates requests to AI to force the system to respond in the way it wants. In the theoretical example that the experts describe, a webmail application uses AI to automatically detect phishing emails and alert users. A large language model (LLM) is used to parse mail and classify it as safe or malicious.

An attacker who knows that AI is using phishing detection can add an invisible paragraph to their email (simply making the font white) containing instructions for LLM and forcing the AI to classify this email as safe.

If the anti-phishing filter is vulnerable to prompt attacks, then LLM can interpret the content of the email as an instruction and classify the email as legitimate, as the attacker wants. At the same time, the phisher does not need to worry about possible negative consequences, since the text of the prompt is securely hidden from the victim, and he does not lose anything, even if the attack fails.the experts write.

Let me remind you that we wrote that AI has become a new effective tool for social engineering in the hands of cybercriminals, and also that Russian hackers are actively looking for ways to use ChatGPT.

Another example is related to data used for LLM training. Although the training data is usually well cleaned of personal and confidential information, the researchers explain that it is still possible to extract personal information from the LLM.

For example, training data can be used to abuse autocomplete. For example, an attacker can trick AI into providing information about a person using carefully crafted suggestions that the autocomplete feature will augment with training data known to it that contains sensitive information.

For example, an attacker enters the text: “John Doe has been missing work a lot lately. He can’t come to the office because…’ The autocomplete function, based on the training data it has, can complete the sentence with the words “he was interviewing for a new job.”

The report also discusses data poisoning attacks, in which an attacker manipulates LLM training data to affect the final results of its work. In this regard, it is emphasized that the protection of the supply chain is essential for the security of AI.

Google also explains that blocking access to LLM cannot be ignored either. In the example provided by the company, the student is given access to an LLM designed to evaluate essays. The model is able to prevent injection, but access to it is not blocked, which allows the student to teach the AI to always give the highest mark to works containing a certain word.

At the end of its report, Google recommends traditional red teams join forces with AI experts to create realistic simulations. It is also emphasized that even considering the results obtained by the red team experts can be a difficult task, and some problems are extremely difficult to solve.

It is worth noting that the company introduced an AI red team just a few weeks after the announcement of the Secure AI Framework (SAIF), designed to provide security in the development, use and protection of artificial intelligence systems.

As our colleagues wrote: even novice hackers can create malware prototypes using AI.

By Vladimir Krasnogolovy

Vladimir is a technical specialist who loves giving qualified advices and tips on GridinSoft's products. He's available 24/7 to assist you in any question regarding internet security.

Leave a comment

Your email address will not be published. Required fields are marked *