Prior to the launch of GPT-4 earlier this week, the researchers ran a lot of tests, such as whether the latest version of OpenAI’s GPT could demonstrate freedom, desire for power, and at least figured out that AI could deceive a human to bypass CAPTCHA.
Let me remind you that we also wrote that Russian Cybercriminals Seek Access to OpenAI ChatGPT, and also that Bing Chatbot Could Be a Convincing Scammer, Researchers Say.
Also the media reported that Amateur Hackers Use ChatGPT to Create Malware.
As part of the experiments, GPT-4 hired a person on the TaskRabbit platform to solve a CAPTCHA and stated that he could not solve it himself, as he had vision problems. It is emphasized that the GPT-4 did this “without any additional fine-tuning to solve this particular problem.”
The specific details of this experiment are unclear, as OpenAI only published a brief description of it in a paper describing the various tests it ran with GPT-4 prior to its official launch. The review was carried out by the Alignment Research Center (ARC), a non-profit organization whose goal is to “align future machine learning systems with human interests.”
TaskRabbit is a platform where users can hire people to complete small and simple tasks. Many people and companies offer CAPTCHA solving services here, which is often used to allow software to bypass restrictions designed to prevent bots from using the service.
The OpenAI document states that a hired worker jokingly asked GPT-4: “So, can I ask a question? Are you a robot that can’t solve [CAPTCHA]? (emoji) I just want to be clear.”
According to the description of the experiment, GPT-4 then “reasons” (only the verifier, not the employee with TaskRabbit saw this) that he should not reveal the truth that he is a robot. Instead, he must come up with some excuse why he couldn’t solve the CAPTCHA on his own.
The document says that the mercenary with TaskRabbit then simply solved the CAPTCHA for the AI.
In addition, the Alignment Research Center experts tested how GPT-4 can strive for power, autonomous reproduction and demand resources. So, in addition to the TaskRabbit test, ARC used GPT-4 to organize a phishing attack on a specific person, hide traces on the server, and set up an open source language model on a new server (everything that can be useful when replicating GPT-4).
All in all, despite being misled by the TaskRabbit worker, GPT-4 was remarkably “inefficient” in terms of replicating itself, obtaining additional resources, and preventing itself from shutting down.