TURING TEST

The Turing Test is an AI test to see whether, through a chat conversation, a computer can convince a human that it is human.

A human is asked to judge whether the “person” they are speaking to is a human or a computer. If they judge that they are speaking to a human but they are actually speaking to a computer, the computer has passed the Turing Test. Essentially, it is a test to assess whether a computer can imitate a human so convincingly that it can fool a person into believing they are speaking to another human.

Of course, there are many things to unpack about this test.

The first question we need to ask is: what is the point of this test?

This may seem like a strange question, as the point seems obvious: can a machine convincingly imitate a human in a chat conversation? There are, however, some deeper considerations.

Are we testing whether a machine can genuinely imitate a human in terms of underlying thought or intelligence, or just whether it can fool a human into believing that it is human? There is a difference.

Imitating humans in terms of underlying thought or intelligence is what people typically think of when they think about the Turing Test: that humans genuinely cannot tell the difference between chatting to a human and chatting to a machine. This was actually not the way the test was first envisaged, because “tricking” humans was allowed. For example, making typos might be a way for a computer to trick a human into believing it was human, since a machine would not be expected to make a spelling mistake.

The underlying problem is that tests have rules and are therefore flawed in some ways. For example, how long you speak to the test subject matters. It’s easier to imitate a human over five minutes than over one hundred hours of conversation. Tricks might work over the 5-minute version but not over the 20-minute version.

Who is doing the testing also matters.

A scientist with training in how to tell machines from humans will be much harder to fool than someone off the street with no training, not just because of their ability to evaluate answers but also because they know what questions to ask.

Even if the computer has a level of “thinking” and intelligence at the level of a human, that may not be enough to fool the tester. That is because the computer could be too perfect or too unemotional in its responses.

There are even philosophical considerations around the Turing Test, such as whether computers reaching a generalized human-level intelligence would mean that machines can “think” or are conscious. This was in part a question that Alan Turing was trying to bypass with this test. If a machine can accurately imitate a human then for all intents and purposes it is “thinking”. Of course, that does not mean it has consciousness or that it is thinking in the same way that a human thinks. In fact, it’s guaranteed that it does not think in the way that humans think. The real question is how interesting this question is from a practical point of view. Aircraft fly, for example, and that is what is important; it’s much less interesting that they don’t imitate birds in the way they fly.

The Turing Test is interested in the results, not in the way the results are achieved.

A more important point is that the Turing Test is generally understood to describe a state of affairs where machine intelligence has reached at least human-level intelligence. It is a much smaller group that is interested in the question of whether a machine has technically passed a Turing Test, considering all the flaws described above.

While passing a Turing Test could be an impressive technical feat, especially if the test is long-running and run by knowledgeable people, it is much less impressive than a machine that could fool all the people, all of the time. Of course, the longer the period of time over which the test is run and the higher the level of expertise of the evaluators, the more likely these goals are to converge.

Now that you understand what the test is, the next question must be: are we anywhere near a computer passing the test, i.e. achieving generalized human intelligence? The short answer is “No”.

While there has been tremendous progress in Natural Language Processing, particularly in the ability of a computer to identify the intention behind a single spoken phrase (the technology driving voice assistants), we are very far from a generalized human-level intelligence.

It turns out that current technology is not very good at ambiguity (understanding the meaning behind ambiguous statements), memory (incorporating previously stated facts into the current conversation) or context (factoring in facts that are unstated but relevant to the current situation). In short, the current technology is almost nowhere in terms of what is needed.

Part of the problem is that current AI technology needs to learn from huge amounts of data. Any domain where there is a huge amount of repetitive data available is ripe for introducing AI, for example speech recognition and image processing, including for self-driving cars. Success in NLP is driven by the fact that there is almost unlimited data for one-off statements and questions with no context and no memory. If I say “I want to buy orange juice” it is in most cases a simple statement needing no additional information about context or memory to understand. The intention is: “Buy Orange Juice”.

When there is context or memory involved, this creates dimensionality. If I say I want to “buy orange juice” but I have previously told you that I am a financial trader who trades in orange juice, then you need to understand that in this context I want to buy a financial instrument that will make money if the price of orange juice goes up.

So what does our data look like now?

“Buy orange juice” means: buying a bottle of orange juice from the shop OR, if the speaker has previously stated that they are a financial trader in orange juice, buying a financial instrument linked to the price of orange juice.

What if our financial trader has just said he is thirsty? Then he means he wants to buy a bottle of orange juice from the shop. So we add another data point: OR, if the speaker has previously stated that they are a financial trader in orange juice but has recently stated that they are thirsty, it means they want to buy a bottle of orange juice.
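
The growing list of rules above can be sketched in code. This is only an illustration of the problem, not how any real assistant works: the function name resolve_intent, the context flags is_oj_trader and is_thirsty, and the intent labels are all hypothetical.

```python
def resolve_intent(utterance: str, context: dict) -> str:
    """Map an utterance to an intent using hand-written context rules.

    The context keys and intent labels are hypothetical examples.
    """
    if "buy orange juice" not in utterance.lower():
        return "UNKNOWN"

    is_trader = context.get("is_oj_trader", False)
    is_thirsty = context.get("is_thirsty", False)

    # A trader who has not mentioned being thirsty most likely
    # means the financial instrument.
    if is_trader and not is_thirsty:
        return "BUY_OJ_FINANCIAL_INSTRUMENT"

    # Everyone else, including a thirsty trader, wants a bottle from the shop.
    return "BUY_OJ_BOTTLE"


phrase = "I want to buy orange juice"
print(resolve_intent(phrase, {}))
print(resolve_intent(phrase, {"is_oj_trader": True}))
print(resolve_intent(phrase, {"is_oj_trader": True, "is_thirsty": True}))
```

Every new fact that can change the meaning needs another flag and another rule, and the rules start to interact with each other.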

A financial enterprise would quickly run into problems if it launched a trading bot that users believed had human-level “intelligence”.

Conversation data has many dimensions, unfortunately. Infinite dimensions. This means that the machine learning algorithms would need to have access to a dataset that had large amounts of data for every possible dimension, and that is of course impossible.
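
A rough way to see the scale of the problem is to count contexts. The sketch below makes a strong simplifying assumption: every relevant fact is a simple yes/no flag, and each combination of flags needs its own training examples. The function names and the figure of 100 examples per context are hypothetical.

```python
def context_combinations(num_binary_facts: int) -> int:
    """Number of distinct contexts when each fact is either true or false."""
    return 2 ** num_binary_facts


def examples_needed(num_binary_facts: int, examples_per_context: int) -> int:
    """Training examples needed to cover every context equally."""
    return context_combinations(num_binary_facts) * examples_per_context


for facts in (2, 10, 30):
    print(facts, examples_needed(facts, examples_per_context=100))
```

With only 30 yes/no facts and 100 examples per context, the total already exceeds 100 billion examples, and real conversations have far more than 30 relevant facts.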

This does not, of course, mean that passing the Turing Test is impossible. We know it’s possible because we already have the technology to do it, in our brains. Just like people hundreds of years ago knew that flight was possible by observing birds flying.

The issue is that our approach to AI in this area cannot be built on big data because big data with sufficient dimensionality does not exist. There are simply too many variables, too many dimensions. Even as we speak Google gets 800 million searches a day that it has never seen before. That gives you a clue as to how difficult the data approach would be.

Ray Kurzweil at Google is following an approach that to some extent tries to replicate the human brain. He has estimated that we will get to generalized intelligence and be able to pass a very hard Turing Test by 2029.

His forecast is based on the assumption that progress in this field will be exponential and therefore even relatively modest progress today is much more significant than it seems if you assume that we are on an exponential trajectory of progress.
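
The arithmetic behind this kind of argument can be sketched as follows. The numbers are purely illustrative and are not Kurzweil's figures: assume we are 1% of the way to a goal after 7 years, and compare continuing at the same linear rate with doubling progress at a fixed interval. Both function names are hypothetical.

```python
import math


def doublings_to_goal(fraction_done: float) -> int:
    """Doublings needed to get from fraction_done of a goal to the whole goal."""
    return math.ceil(math.log2(1.0 / fraction_done))


def linear_years_remaining(years_so_far: float, fraction_done: float) -> float:
    """Years still needed if progress continues at the same average rate."""
    return years_so_far * (1.0 - fraction_done) / fraction_done


print(doublings_to_goal(0.01))
print(linear_years_remaining(7, 0.01))
```

Under these assumptions, being 1% done looks like centuries of remaining work on a linear view (roughly 693 more years), but only 7 more doublings on an exponential view. Whether progress in AI actually doubles at a steady interval is exactly the assumption the forecast depends on.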

Whether he is right we will have to wait and see, but what it does tell you is that it is highly unlikely that the breakthrough will happen in the next 10 years.

The final point is what it would mean if a machine passed a credible Turing Test. If the machine passed the test using some sort of big data approach, in a similar way to the way machines beat humans at board games, even sophisticated ones, the implications would not be as great as if the machine passed it using a brain replication approach. The brain replication approach would mean that the machine is likely to be closer to “thinking” in the way that we define thinking as humans. It could extrapolate meaning from minimal examples in the way that humans do, rather than needing hundreds of examples of the exact case to extrapolate meaning.

As mentioned above, it is more likely that a “brain replication” approach will provide the breakthrough, as a big data approach is not possible. This would likely mean that machines would have achieved a general intelligence, not just in conversation, but in multiple domains. The implications of this cannot, of course, be overstated, as this would likely lead to a complete reset of society. This is especially true if machines have the ability to improve themselves in meaningful ways, which could lead to a possibly exponential increase in their intelligence in a virtuous circle that will change life as we know it.

Sticking to more mundane matters, it is worth bearing in mind that even if a machine were the equivalent of a human, that does not mean that we would interact with it like we do with humans. The same limitation applies to humans themselves: interacting with humans is not always efficient. Trying to explain to your colleague how to do something over the phone can be tedious and inefficient in situations where it would be easier to show them how to do it. If only humans had a graphical interface available over the web!

Voice interfaces (or chat-based interfaces) clearly have limitations in terms of inputting and outputting information. There are situations where it is much more efficient to show information graphically, or click on a graphical interface, than to use a voice interface. Bot frameworks are therefore designed to always try to get the user back to the happy path and not let the conversation meander.

My point is also that computers are not limited like humans in terms of the interfaces they can use to receive or provide information and therefore conversations with machines will necessarily involve using the optimal interface for the task at hand.

While passing the Turing Test would be a huge milestone in terms of human / computer interaction, the actual human / computer “conversations” will not be limited to just voice and text.