What is a conversational interface?

A conversational interface is a software interface that allows users to write or speak to software in natural language instead of using a graphical user interface. It is smart enough to understand what the user wants to do. The advantage is that there is little or no learning curve: the user does not have to master an inflexible interface before they can use the software.

This technology has been enabled by advances in artificial intelligence (AI), in particular in natural language processing (NLP). The technology is in its infancy; however, the pace of change is exponential, and this technology will soon dominate our interactions with computers.

What are conversational interfaces?

Conversational interfaces are software interfaces that replace graphical or command line interfaces. They allow people to talk to computers by voice, or to chat with them through a text interface, using natural language. The key is that the conversational interface understands natural language, and therefore understands what users want to do from their own expression of it.

Unlike a command line interface, where the user must instruct the computer with exact syntax, or a graphical interface, where the user must be precise in how they operate it, a conversational UI allows users to express themselves in their own way. Instead of humans having to understand machines in order to use them, we are entering an age where machines need to understand humans. This reduces the learning curve for applications, and it means that more personal experiences can be created for the customers and users of apps.

These conversational UIs can be used through chat applications such as Facebook Messenger, WhatsApp or Twitter, as well as through custom interfaces set up for the web, such as chatbots. They are most prominently used right now through voice devices such as Alexa or the mobile phone, but in the future they will be available through many different commoditized devices (such as cheap versions of smart speakers that can be put in every room) and products such as kitchen appliances.

The conversational interface is already ubiquitous in the form of Alexa, Google Home, Siri and other voice assistants. Millions of people speak to these voice assistants through various devices, and for some tasks, such as watching music videos, they are already proving to be a superior user experience. It can be argued, however, that the bar was very low in this case: typing with a TV remote was always a terrible user experience, and virtually any alternative would be better.

Voice assistants are used much less on the PC, where interacting through the keyboard and mouse is much faster than speaking. In time, voice assistants will be used much more on mobile, as for many tasks it is faster to speak to the assistant than to use apps.

These assistants demonstrate clearly what a conversational interface is, but they have many flaws. Some people believe that their current sales rest more on novelty value around very limited use cases, and that they have not yet achieved product-market fit.

The holy grail of conversational UI design is an interface that can speak with humans the way humans speak to each other, with some obvious improvements.

With improvements in AI and bot-building tools, the technology can seamlessly handle conversations with multiple dialog turns. While broad assistants such as Alexa and Google Home typically focus on understanding one-off commands and questions, it is possible, using a bot development platform, to build customized bots that support multiple dialog turns and that continuously learn within the narrow topic domain in which they are used.

For example, a bot that books appointments needs to know everything about booking appointments, but does not need to know how to play videos on YouTube or answer every question in the world. Companies are building these types of highly specialized business bots for specific purposes.
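
To make "multiple dialog turns within a narrow domain" concrete, here is a minimal sketch of a hypothetical appointment-booking bot that fills in missing details turn by turn. The slot names, prompts and keyword matching below are invented for illustration; a real bot would rely on a proper NLU service rather than simple pattern checks.

```python
# Minimal sketch of a narrow-domain, multi-turn appointment bot.
# All slot names and the keyword "parsing" are illustrative only;
# a production bot would use a real NLU service for this step.
import re

REQUIRED_SLOTS = ["service", "date", "time"]
PROMPTS = {
    "service": "What would you like to book (e.g. a haircut)?",
    "date": "What day works for you?",
    "time": "What time would you like?",
}

def extract_slots(utterance, slots):
    """Very naive slot extraction: look for known patterns in the user's words."""
    text = utterance.lower()
    if "haircut" in text:
        slots["service"] = "haircut"
    date_match = re.search(r"\b(monday|tuesday|wednesday|thursday|friday)\b", text)
    if date_match:
        slots["date"] = date_match.group(1)
    time_match = re.search(r"\b\d{1,2}(:\d{2})?\s?(am|pm)\b", text)
    if time_match:
        slots["time"] = time_match.group(0)
    return slots

def appointment_bot():
    slots = {}
    print("Bot: Hi! I can book appointments for you.")
    while True:
        slots = extract_slots(input("You: "), slots)
        missing = [s for s in REQUIRED_SLOTS if s not in slots]
        if not missing:
            print(f"Bot: Booking a {slots['service']} on {slots['date']} at {slots['time']}.")
            break
        # Ask only for what is still missing -- this is the multi-turn part.
        print("Bot:", PROMPTS[missing[0]])

if __name__ == "__main__":
    appointment_bot()
```

The point of the sketch is the loop: the bot keeps track of what it already knows and asks only for what is still missing, which is exactly what a broad, one-off assistant does not do.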

Conversational Interfaces: Advantages and Challenges

It is clear that different types of user interfaces are best for different types of tasks. Graphical user interfaces, for example, are great for many tasks and could not be improved upon by a voice or text interface.

The best approach is likely to match the interface to the task or situation at hand. Obviously, if the user's hands are occupied or they cannot look at a screen, for example when they are driving, a voice interface is the best choice. The same may apply in augmented reality, and potentially virtual reality, when users are doing other things with their hands.

If they are able to look at and interact with a screen, a graphical interface may offer the optimal experience.

Voice interfaces can be a slow way to communicate information from humans to a computer, and even more importantly they are a slow way for humans to receive information. In the extreme, seeing a chart is invariably more efficient than having a machine read out the chart data to you.

A conversational UI however has many advantages, the primary one being flexibility.

Any information can be transmitted instantly over the conversational interface, so no input interface needs to be constructed in advance for the data that needs to be entered.

For frequent, repetitive, simple tasks it may be better to have a graphical user interface. For a shopping task, for example, it is easy to tap in exactly what you need and when you want it delivered to your house, rather than asking for each item one by one.

It is possible, however, that it may be faster to speak a shopping list to your computer than to tap it in when starting from scratch. For one thing, you could mix generic names and specific brand names. For example, you could say "get bread" rather than "get Cloverdale multiseed brown bread" if it did not matter to you exactly which bread you got, or if there is already some previous understanding as to what type of bread you like.

Not being precise can save you time and mental effort. It may be easier to say "bread" than to find it on the graphical interface, and certainly easier than finding Cloverdale multiseed brown bread. There is also less effort in recalling "bread" than in recalling the full brand name. It is arguable that the graphical interface could be amended to take these issues into consideration, but the voice interface makes the task much faster and smoother.

In the case of a repetitive task, where the list is already set up and only modifications are needed, it is hard to argue that a screen is not the most efficient interface. Having the entire list read back to you would certainly not be efficient.

There are use cases that seem better suited to voice (or at least voice plus a graphical UI) than to a graphical UI alone, such as "I don't need washing powder at the moment". This would tell the system to leave it out this week but add it back in the future. This type of instruction is harder to represent on a graphical UI.

The entire exercise above assumes, however, that the conversational interface works well. Current interfaces need to be summoned, are slow to get ready to receive information, slow to process the instruction, and slow to respond. They make errors too frequently in understanding speech, and in general they are not efficient at interacting with graphical user interfaces.

Using current technology it would probably still not be efficient to use a conversational UI for this job. You can easily imagine, however, that the technology will soon improve to the point where voice or text input is faster than giving the same instructions to a human: no summoning, no slowness, few errors, and an immediate response. The level of precision can be chosen by the speaker.

And on top of it all, the interaction with the graphical UI will be seamless and fast. This can only be an improvement on the current interaction with software.

The implicit advantage here is that the user does not need to learn a user interface, spend time searching for different functions, or have uncertainty about what happens next. They can simply issue instructions as they would to another human.

In the near future there will be many cheap devices that can receive instructions and respond or interact with other devices.

There are issues of privacy and access to the devices that need to be addressed. In particular, businesses will not tolerate devices that record every conversation in the office to the cloud. On-premises NLP and a privacy policy covering digital bots will be needed to safeguard privacy.

The question right now is how good a conversational UI can be with current technology, and what the best way to design these UIs is.

Conversational interface design is an art and a science. The interface needs to be designed with the advantages and challenges of conversational UIs in mind, and the design needs to take account of the limitations of the technology right now.

There are many techniques that can be used to design effective conversational UIs right now, but the most important approaches are to narrow the topic domain and to use a human in the loop where necessary. Both approaches address the same problem: open-ended bots (such as Alexa or Google Assistant, which try to respond to any question or instruction imaginable) are likely to fail or to be very superficial. Narrowing the topic domain, even to a single task, makes it far more likely that what the user says will be recognized by the bot. Having a human in the loop means that statements that are not understood can be escalated to a human, so the end user can be confident that they can say anything and their query will be dealt with.
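
The human-in-the-loop idea can be pictured as a simple confidence threshold: if the bot's intent classifier is not confident enough about what the user said, the message is routed to a human agent instead of being answered automatically. The classifier, the intents and the threshold below are placeholders for illustration, not any particular vendor's API.

```python
# Sketch of human-in-the-loop escalation based on classifier confidence.
# classify_intent is a stand-in for whatever NLU model or service a real bot uses.
from typing import Tuple

CONFIDENCE_THRESHOLD = 0.75  # illustrative value; tuned per domain in practice

def classify_intent(utterance: str) -> Tuple[str, float]:
    """Placeholder intent classifier returning (intent, confidence)."""
    known = {
        "opening hours": ("faq_opening_hours", 0.96),
        "reset my password": ("account_reset_password", 0.91),
    }
    for phrase, result in known.items():
        if phrase in utterance.lower():
            return result
    return ("unknown", 0.20)

def handle_message(utterance: str) -> str:
    intent, confidence = classify_intent(utterance)
    if confidence < CONFIDENCE_THRESHOLD:
        # Escalate: hand the conversation to a human agent with full context.
        return "Let me connect you with a colleague who can help with that."
    return f"[bot answers intent '{intent}' automatically]"

print(handle_message("What are your opening hours?"))
print(handle_message("My invoice from March looks wrong"))
```

The narrow domain shows up in how small the set of known intents is; everything outside it falls below the threshold and goes to a person.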

Both of these techniques are already used very effectively in customer service, which is currently the best and most popular use case for conversational UIs. This is because bots are good at answering repetitive questions quickly, and because humans are freed up to deal with more complex issues. Not only is the customer service experience improved when implemented properly, but the return on investment for implementing a chatbot is high.

Investment in chatbot technology in this area will continue to drive innovation until conversational interfaces become part of all software interaction, and in particular a key part of the delivery of many services: the focus shifts from customer service and providing help to customer enablement, i.e. actually doing things.

State-of-the-art conversational interface design is being driven by two related areas of innovation: natural language understanding, and context-driven bot development techniques.

There is a great deal of ongoing research in language processing. The goal of this research is to make chatbots capable of interpreting text and spoken language at an almost human level. The theory is that you could point a chatbot at unstructured text and it would be able to answer questions about that text at a human level or better.

The second innovation relates to the way chatbots are explicitly designed. Here there is a structured process of development, either categorizing the underlying data or designing specific context triggers and flows for which the bot performs specific actions. Even with today's technology it is possible to create bots that appear to have human-level intelligence within a narrow domain of expertise, such as booking an appointment or ordering food.
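
One way to picture "context triggers and flows" is as a declarative table that maps the conversation's current context plus the detected intent to a reply and a next context. The food-ordering flow below is purely illustrative; the contexts, intents and replies are made up for the example.

```python
# Illustrative context-driven flow: (current context, intent) -> (reply, next context).
# Contexts, intents and replies are invented for this sketch.
FLOW = {
    ("start", "order_food"): ("What would you like to order?", "awaiting_item"),
    ("awaiting_item", "provide_item"): ("Great. Delivery or pickup?", "awaiting_method"),
    ("awaiting_method", "provide_method"): ("Thanks, your order is placed.", "done"),
}

def step(context: str, intent: str) -> tuple:
    """Advance the conversation one turn, falling back when the flow has no match."""
    if (context, intent) in FLOW:
        return FLOW[(context, intent)]
    return ("Sorry, I didn't follow that. Could you rephrase?", context)

context = "start"
for intent in ["order_food", "provide_item", "provide_method"]:
    reply, context = step(context, intent)
    print(reply)
```

Because the flow is explicit, the bot's behaviour within its narrow domain is predictable, which is a large part of why such bots can feel intelligent despite knowing nothing outside that domain.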

As mentioned previously, conversational UIs are part of a general trend of making UIs conform to user requirements rather than the other way round. Today, users need to learn UIs and must be precise in operating them. In the future, users will be able to pick which channel they want to use and how they want to use the UI. They will not need to be precise in their instructions, as the UI will ask follow-up questions if there is any ambiguity.

This will not only reduce the effort of using these UIs, it will drastically increase convenience.

Convenience is ultimately what the conversational UI is about. Imagine a world where every machine or app can be spoken to as though you were speaking to a human support agent who has deep knowledge of how that machine works, an agent you can either ask support questions or instruct to do something for you.

There is no doubt that exponential innovation in conversational UI design will lead to conversational UIs being applicable to more and more tasks, and to them becoming hidden in plain sight throughout our daily lives.