Benchmarks used to rank AI models are several years old, often sourced from amateur websites, and, experts worry, lending automated systems a dubious sense of authority
The Turing Test (as some people believe it to be): if you can have a conversation with a computer and not tell if it’s a computer, then it must be intelligent.
AI companies: writes ML model that is specifically designed to convincingly play one side of a conversation, even though it has no ability to understand the things it talks about.
Also, as Turing proposed it, it’s meant to be infinitely repeatable. The test isn’t supposed to be just whether a machine can convince one person with one conversation. That would be trivial. The real Turing test is the converse: it says there should be no conversation one could have with the machine in which it wouldn’t convince you it’s a human.
It’s worth emphasizing that the “Turing Test” is not a good test since it’s not at all scientific.
It’s just another thought experiment that grifters have taken to the bank.
The most advanced models absolutely do model what’s being discussed and the relationships between concepts.
Even toy models have been shown to build world models from very basic training data.
Honestly, read at least a little bit of the relevant research:
https://www.anthropic.com/news/mapping-mind-language-model