Passing the Turing test and the next generation of AI agents

icon
By Matt Poile, Content Editor

In March 2025, researchers at the University of California, San Diego, announced that artificial intelligence had passed the Turing test. So how did we reach this landmark achievement and what does it really say about the large language models used to get there?

First proposed in 1950, at the dawn of the computer age, by visionary British mathematician Alan Turing, the Turing test is an assessment of machine intelligence that tries to ascertain if a machine displays the same level of intelligence as a human being, based on whether it can convince a human in conversation that it is also a real person. Originally referred to by Turing as ‘the imitation game’, the essence of the test is not necessarily about intelligence, but more about passing an emotional threshold that’s convincing enough to pass as a real human interaction. In other words, it’s a vibe check (Voight-Kampff machines at the ready, fellow Philip K Dick fans).

In a recent study, OpenAI’s large language model GPT-4.5 was able to convince participants that they were speaking to a real person 73% of the time. Researchers at UC San Diego conducted a text-based chat experiment following the three-way format that Turing originally proposed. Participants were assigned roles as either judges or witnesses in randomised five-minute conversations, while GPT-4.5 served as the computer witness. Each human judge simultaneously engaged with both a human witness and a computer witness, and was asked to guess which one was human.

For the computer witness role, researchers employed various large language models, feeding them the judges’ inquiries using different prompts. The study evaluated the GPT-4.5 model by comparing two very distinct prompt strategies. The first was given only basic information. The second, allocated the name ‘PERSONA’, was provided with extensive contextual details, instructing the AI to adopt the character of “a young, introverted individual who is well-versed in internet culture and frequently uses slang,” according to the study authors. It was this second version that passed the test with a 73% success rate. The computer witness that received only minimal instruction, didn’t score nearly as well – only fooling human judges between 21% of the time. You can try the test out for yourself here.

The results are undoubtedly a major breakthrough, although the success comes with several large caveats. While the computer was able to make human judges believe it was human, this was largely based on the emotional reaction of the judges to the conversations they had, rather than being based on any particular knowledge a human might have that a computer would lack. What the experiment ultimately showed, however, is that AI models now have the capacity to perform emotionally intelligent roles, and that they can be very persuasive in them. The authors of the study, Cameron Jones and Benjamin Bergen, concluded that: “Fundamentally, the Turing test is not a direct test of intelligence, but a test of humanlikeness.”

So, while passing the Turing test certainly is not the same thing as achieving AGI – there are definitely some interesting potential applications and use cases companies should be aware of.

Personalised customer service at scale

Brands can use AI that feels convincingly human-like to provide personalised support to customers, free from the normal limitations of human teams. This includes 24/7 availability, maintaining a completely consistent tone and the ability to handle multiple customer interactions at the same time, scaling up according to demand at busy times without any loss of quality.

Case study: Bland AI is automating phone calls using hyper-realistic AI agents

Emotional intelligence in digital interactions

Vastly improved emotional intelligence in AI chatbots means they are getting much better at recognising and responding appropriately to emotional cues from callers, helping to de-escalate frustrated customers by adapting their tone of voice based on the sentiment detected by the system.

Case study: Hume AI is an emotionally intelligent voice interface

Guided product discovery

The conversational nature of these systems transforms the shopping experience into something akin to working with a personal shopper. Customers can express their needs in natural language and receive thoughtful recommendations. Rather than simply listing features or product descriptions, these AI tools can engage in storytelling around products and help customers articulate needs they might struggle to define themselves.

Case study: Shopping Muse turns colloquial language into personalised e-commerce results

Content creation and curation

Brands can leverage human-like AI for generating product descriptions that perfectly align with your brand voice. They can create personalised marketing messages for different customer segments and adapt existing content for various platforms and audiences. When responding to reviews, these systems craft replies that feel authentic and caring, maintaining the brand’s relationship with its community.

Case study: Spotify Wrapped comes with a personalised podcast from Google’s NotebookLM

08/04/2025