
Turing test

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Alan Turing proposed that we might consider a machine to think if it could succeed in an 'imitation game'. In one variant of this game, a human judge has text-only conversations with players A and B, one of whom is human and one a computer. The computer passes the test if the judge cannot consistently distinguish it from the human player based on their conversational responses.
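
To make the protocol concrete, here is a minimal sketch of one session in Python. The judge object, its `ask` and `guess_machine` methods, and the player callables are interfaces invented here for illustration; Turing specifies only the text-only question-and-answer setup, not any particular API.

```python
import random

def imitation_game(judge, human, machine, num_rounds=5):
    """Play one session of the imitation game and report whether the
    machine fooled the judge. All interfaces here are hypothetical."""
    # Hide the two players behind randomly assigned labels A and B.
    players = dict(zip(("A", "B"), random.sample([human, machine], 2)))
    transcripts = {label: [] for label in players}

    for _ in range(num_rounds):
        for label, player in players.items():
            question = judge.ask(label, transcripts[label])
            answer = player(transcripts[label] + [question])
            transcripts[label] += [question, answer]

    # The judge names the player they believe is the machine.
    guess = judge.guess_machine(transcripts)  # returns "A" or "B"
    return players[guess] is not machine      # True: the machine passed
```

Since a single session can be won by luck, a real evaluation would run many sessions and check whether the judge's accuracy stays near the 50% chance level.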

The test is compelling because language is central to human thought, communication, and decision-making: a machine that passed a strict version of this test could substitute for humans in any white-collar job, including, presumably, computer programming and developing new and better AIs. The test also identifies a setting that is pleasingly concrete and, by limiting the interaction to a text interface that abstracts away the physical differences between a human body and a digital computer, plausibly addressable within the realm of 'pure' computer science.

Turing made his proposal at a time when digital computers were just coming into being, and the idea that such a machine could 'think' was still considered outlandish in most circles. The test was not intended as a full definition of intelligence or general intelligence; it is a thought experiment demonstrating that there are plausible circumstances under which we would very naturally describe a machine as 'thinking'.

Considered beyond its original scope, as a guide to modern AI research, the Turing test has at least three flaws:

  1. It is underspecified.
  2. Passing it is neither necessary nor sufficient for a system we would consider general and socially transformational AI.
  3. It sidesteps hard philosophical questions around consciousness and moral realism (this is, of course, also an advantage).

The test is underspecified because the degree of difficulty depends radically on the length of interaction and on the capabilities of the human judge and interlocutors. 'Easy' versions of the test --- involving an untrained judge, unsophisticated human interlocutors (e.g., kids), and a short period (e.g., five minutes) of interaction --- can be passed by unimpressive chatbots. The strongest versions of the test would require a machine that can't be distinguished from an extremely capable human by an expert judge over an extended timeframe (years), but such a test is impractical to run. When we think about a machine succeeding in an 'imitation game', the specific rules of the game really matter to the conclusions we can draw about its capabilities.
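
One way to see how underspecified the test is: try to write down the parameters that any concrete run must fix. A sketch, with field names and example values invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TuringTestSpec:
    """Parameters a concrete instantiation of the test must pin down.

    The fields and example values are assumptions chosen to show how
    much room the test leaves; Turing's proposal leaves all of them open."""
    judge: str             # e.g. "untrained volunteer" vs "expert interrogator"
    interlocutors: str     # e.g. "children" vs "capable, well-read adults"
    duration: str          # e.g. "5 minutes" vs "years"
    sessions: int          # independent judgments pooled into the verdict
    pass_threshold: float  # judge accuracy at or below which the machine passes

# Both of these count as 'the Turing test', yet differ enormously in difficulty:
easy = TuringTestSpec("untrained volunteer", "children", "5 minutes", 10, 0.70)
strict = TuringTestSpec("expert interrogator", "capable adults", "years", 100, 0.55)
```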

Furthermore, passing a Turing test (even a weak one) is not a necessary condition for transformational machine intelligence. Why? The test requires emulating human idiosyncrasies: a successful machine must hide any superhuman capabilities, for example, perfect recall and arithmetic, while exactly re-creating human biases, emotions, and cultural conditioning. This level of difficulty is unnecessary for many, even most, productive tasks. Most German people could not pass as French under sufficiently aggressive examination by a French person, and yet Germans can work together with French people perfectly well. Similarly, it seems likely that human intelligence is just one point in a larger space of possible intelligences, and that other intelligences could be highly capable even if they can't convincingly pass as human (or vice versa). Certainly there are tasks, particularly in relational settings, where 'fitting in' perfectly is an advantage (a French person may prefer a French romantic partner, therapist, etc. who they feel will more fully understand them), but much of the work we are most interested in automating is alienated labor devoid of human connection; expecting systems for these tasks to imitate human foibles is an unnecessary and perhaps even counterproductive requirement.

Conversely, passing even the strictest of Turing tests is not sufficient to meet the expectations we have for artificial intelligence. Most people hope and expect that AI will result in robots that can automate away physical drudgery, but developing flexible robotic hardware and learning to control it is a different problem from the language processing tested in the imitation game. The 'muscle memory' of a good violinist, soccer player, or surgeon is not a linguistic skill. No doubt the reasoning capabilities of a Turing-test-passing AI would be helpful to robots trying to exhibit those skills, and it is plausible that there's substantial overlap in the techniques involved (the success of large transformer language models trained as black-box 'stochastic parrots' suggests that a lot of language is also 'muscle memory'), but fundamentally these are different goals with different metrics.