Turing Test 2012: friend or faux

17 July 2012

James Hayes

What relevance does Alan Turing’s controversial proposed method for testing a computer system’s ability to behave ‘intelligently’ have in a world of ever-smarter interactive applications, robotic companions and artificial intelligence?

Last June’s Alan Turing centenary celebrations renewed debate around one of his most contentious ideas: the Turing Test. Introduced in the 1950 paper ‘Computing Machinery and Intelligence’, the test was Turing’s first concerted attempt to address exclusively some principles of machine intelligence, with its famous opening declaration that the author proposes “to consider the question, ‘Can machines think?’ This should begin with definitions of the meaning of the terms ‘machine’ and ‘think’. The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous… [So] instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words… Are there imaginable digital computers which would do well in the imitation game?”

This change of focus, made in an attempt to better present his line of enquiry, arguably also introduced a degree of confusion as to what Turing was concerned with determining. Turing then couches his enquiry in terms of an ‘imitation game’, in which, over a controlled series of five-minute sessions, a human ‘judge’ engages in natural-language conversation with two unseen interlocutors: one a human, the other a computer program. Over time this has come to be conducted via a standard PC keyboard and screen. The test aims to assess how well the judge is able to detect whether they are conversing with a person or a program, and also how accomplished the program is at convincing the judge that they are conversing with a human rather than a computer.

Since the 1960s, by which point computer technology had matured to a level where an attempt to realise Turing’s proposals could be made (albeit in a primitive form), the test has had a double reputation: as an estimable starting point for modern investigations into machine intelligence, artificial intelligence (AI) and interactive software design, but also as an example of a concept whose value has been compromised by advances in computer science, and thus as a target for critics who have decried the Test as “a red herring” that distracts from more critical research into AI.

Testing times

Its advocates, meanwhile, maintain that the Turing Test continues to have relevance to issues related to how (and why) humans interface with virtual entities, and that at least some of its inherent principles have actually gained in importance since the advent of the Web, and the enticements and threats to be found there.

“I doubt that Turing could have imagined the power of the Internet and social networking – or that he could have predicted the nature of cyber-crime,” says Kevin Warwick, professor of cybernetics at the University of Reading. “Yet the Turing Test, in its practical realisation, gives us an insight not only into how machines can communicate, but even more so into the prolific nature of human communication. The test is not simply about how well machines communicate, but rather into how they communicate in comparison with humans, and how well they can fool other humans. What we increasingly witness in practical tests is the case of human interrogators… not being able to tell the difference between a machine and a human, in terms of communication.”

It’s also fair to say that the basic scenario underlying the Turing Test ‘live’ holds a certain popular appeal, particularly in the fact that theoretically anyone could – and should – participate as a judge (including young people). The fame factor has been heightened by the fact that since 1991 practical Turing Test runs have occurred in association with the Loebner Prize in AI, which offers a top award (gold medal and $100,000) for computers whose responses are ‘indistinguishable from a human’s’ (a bronze medal and $2,000 is awarded to the ‘most human-like computer’ that competes).

A sizeable canon of papers and books about the Turing Test has been published. The weight of evidence from the pro- and anti-perspectives has shown that Turing’s methodology has certain perceived strengths and weaknesses when subjected to analytical scrutiny that’s informed by successive advances in computer science. Turing’s own writings leave a number of aspects of the test open to interpretation, possibly as he intended: is it meant to determine whether a computer program is able to spoof an interrogator into believing that it is a human? Or to prove that a computer program could convincingly imitate human conversation?

As technology in other areas of ‘machine intelligence’ has progressed since the Turing Test was first mooted, its overall value has been questioned, and even dismissed, by some computer scientists and software academics. In 1995 Professor Patrick Hayes (then of the University of Illinois) and Kenneth Ford of the Institute for Human and Machine Cognition co-authored a paper, ‘Turing Test Considered Harmful’, which claimed that “adherence to Turing’s vision from 1950 is now actively harmful to [the field of AI]”. Speaking now as the institute’s senior research scientist at the Alan Turing Centenary Conference in Manchester, two days after the 2012 Tests took place, Hayes described them as “a complete red herring”. “[Turing] didn’t put forward that imitation game as a criterion for whether a machine can think. He was quite clear on the matter. He thought that was a pointless discussion. We shouldn’t even be talking about whether a machine can think. He put it forward as an alternative to that question.”

Hayes went on to disavow necessary interdependencies between the Turing Test and more formalised developments in artificial intelligence (AI): “Has [the Turing Test] in fact been useful in AI? Has AI research over the last 40 years devoted itself to trying to pass the Turing Test? The answer is no… Nobody is really taking the Turing Test seriously except [for] people who write about AI and keep saying ‘oh, but it hasn’t succeeded because it hasn’t passed the Turing Test’,” said Hayes. “It’s time we should stop talking about it. It’s a waste of time.”

Test words, then concepts

Others, however, have argued that what some see as the test’s weaknesses actually constitute its enduring value, because it addresses some fundamental issues around definitions of ‘intelligence’ and ‘thinking’ from the very outset as they apply to both machines and humankind, and obliges us to ensure that semantic principles are part of any broader investigation of how well computers can really emulate the mechanics of reciprocal inter-entity communication.

Some of these definitions remain open to this day; and, for all its loose ends, its proponents point out that the Turing Test does provide data that can be measured, and that over time it serves as the focus for procedures that give some indication of how well imitative software is progressing.

At the University of Reading’s School of Systems Engineering, Dr Huma Shah’s involvement with the organisation of the public Turing Test events (since 2008, with the most recent occurring at Bletchley Park last 23 June, Turing’s 100th birthday) is rooted in her academic advocacy of their continued relevance to a broader range of emerging technological phenomena. Shah is also author (and co-author with Professor Warwick) of papers that set out to re-evaluate the Turing Test in the context of recent developments in computer ‘intelligence’, such as IBM’s Watson supercomputer, which gained fame after winning the TV game show ‘Jeopardy’.

Shah acknowledges the Turing Test’s perceived shortcomings have been well noted, but argues that they do not necessarily devalue its intrinsic qualities. “I know that the Turing Test is disparaged by some,” she says, “but I still feel that Turing’s ideas in this area can be made into valid scientific experiences. Some critics say that the Turing Tests are ‘just theatre’, but that is in the nature of open experiments that are conducted in the public domain. They remain scientifically valid nonetheless.”

Shah believes that, in addition to their symbolic value, the tests continue to touch on some key issues intrinsic to areas of our developing relationship with computerised systems and robots. First, they provide an opportunity for imitative software to be appraised by ordinary members of the public, who can use them to give feedback not only on how well the competing systems are able to simulate a human-like engagement, but also on how satisfactory the human participant finds the experience.

“As technology moves forward we are all going to come into contact with robots in some form or another, and I believe that everyone should have a say on how those robots are designed to behave,” Shah adds. “We need broad input into these areas of science and technology, and the Turing Test is a simple but effective way to examine techniques of machine ‘thinking’.”

Furthermore, the University of Reading Turing Test team argues, lessons learned from cross-analysis of successive tests will over time make worthwhile contributions to raising awareness of emerging threats in cyberspace, such as the growing trend for malevolent ‘chatbots’ – computer programs that simulate intelligent conversation with one or more human users via auditory or textual methods – which are used to facilitate phishing attacks, for example.

“Studying why [humans are increasingly being fooled by hidden entities], even when experts in the field are involved, is vital if we are to understand and fight cyber crime,” says Shah’s colleague Professor Kevin Warwick. “Developments in AI communication have been significant; intrinsically this involves the machines taking on human personas, with emotion, humour, wit and charisma. At the tests of 23 June even top [AI] experts were fooled.”

This aspect of AI research “tells us as much about aspects of human intelligence as it does about the nature of machine intelligence”, he adds.

For the Test ‘antis’, Alan Turing’s status as an intellectual progenitor of the highest order, together with his vital contribution to wartime cryptanalysis at Bletchley Park and his post-war work on the ACE (Automatic Computing Engine) at the National Physical Laboratory, makes it highly improbable that the Turing Test will be overlooked by computer science textbooks, as its detractors might wish.

But is it possible that the regard in which it is held could actually be enhanced as more experiments are conducted and more data is crunched? What are the chances that the Turing Test may have a role to play in rescaling simulatory computer interfaces as software designers innovate in the ways that humans interact with computer-based systems via some means of natural language (textual or aural) for common applications?

One area where the test is likely to find renewed interest is in the field of so-called ‘companion robots’. This concept has gained much interest, both because of advances in robotics’ abilities to provide a simulacrum of human-like response, and because of the role that companionship plays in arresting some degenerative health conditions.

The market for robotic companions – whether as online entities or in humanoid form – could prove considerable: developed societies, for instance, face ageing population demographics, and such products could help alleviate some of the challenges that this will present. The Turing Test – or a derivative – may well help inform the basis of a simple market testing model for human-companion robot interfaces that in turn could possibly even evolve into a future industry standard.

Reading University’s Turing Test event took place at Bletchley Park on 23 June, over five sessions, each featuring different sets of six judges with mixed social profiles. Five software ‘machines’ (also known as Artificial Conversational Entities or ACEs) competed in the sessions.

Each judge conducted a series of five-minute tests with both single and dual ‘hidden entities’ and human interlocutors located out of sight and hearing of the judges, who then recorded their opinion of whether each mystery interlocutor was corporeal or computer – and, if they thought they had been interacting with a human, whether they were able to tell if it had been male or female.

Alan Turing’s original proposals set a pass mark of 30 per cent – that’s to say, a contending ‘machine’ has to ‘fool’ at least 30 per cent of the judges it converses with. The 2012 winner – chatbot ‘Eugene Goostman’, brainchild of Vladimir Veselov, Eugene Demchenko and Sergey Ulasen, created around the personality of a boy purporting to be based in Ukraine – achieved a ‘fool rate’ of 29.17 per cent. This was a significant improvement on its best score, of almost 21 per cent, in the 2008 Turing Test. ‘Goostman’ had previously competed in the 2001, 2005 and 2008 Turing Tests.

The technology, of course, did not bear out Turing’s initial prediction that programmers would bring his vision of machine intelligence to the 30 per cent pass rate by about the end of the 20th century, although he did revise this in 1952 to “at least 100 years”. The other four 2012 machine contestants were ‘JFRED’ (Robby Garner and Paco Xander Nathan), ‘Cleverbot’ (Rollo Carpenter), ‘Elbot’ (Fred Roberts, AI Solutions), and ‘Ultra Hal’ (Robert Medeksza). The 2012 Turing Test prize was awarded by The Colonnade Hotel London, which now occupies the building where Alan Turing was born.