The phrase “I’ll know it when I see it” has probably been used in one form or another for a very long time, reflecting the fact that the experience of being a human is deeply wrapped up in our ability to use intuition.
As with many aspects of fundamentally human behaviour, language is something that is incredibly hard to describe, and yet the difference between good communication and bad communication is easy to spot. In what follows, we’ll offer a brief overview of the space surrounding artificial conversational agents and some of our thoughts on how to spot the signs of good (or not so good) language tech.
Conversational tech needs to go deep, not wide
There are a number of conversational technologies that purport to be great at hearing and understanding people, regardless of what they’re talking about. The appeal of this sort of technology is pretty obvious; a single product like that could be peddled as supporting any number of different use cases.
There’s a catch, though: it doesn’t really work. A universally competent artificial conversational agent would be tantamount to what’s known as artificial general intelligence, a goal that probably remains a number of technological and theoretical advances away from reality.
A fluent human brings to their communication a wealth of knowledge about the way the world works in general, as well as the cultural specificities of a particular language. Being successful in general with language has to do with having a body in a world, and knowing simple things about that world like the colour of a clear sky or the sound of traffic.
Rather than strive for artificial general intelligence, at action.ai we develop systems that are endowed with focused expert knowledge about a particular domain. Our agents learn about the many ways people use language in the context of a particular communicative goal, for instance extracting financial information from a complex ledger of business transactions. This means that our virtual assistants won’t chatter about current events or answer questions about the weather, but they will help users efficiently achieve the goals they bring to conversations, using their own natural way of speaking. When users go conversationally off-piste, our virtual assistant will bring the conversation back to the task at hand.
Natural language should be natural
A typical feature of contemporary chatbots is that they fall back on rigid multiple-choice routines, effectively taking users out of their own natural language space by explicitly mandating how interaction unfolds. In a textual setting, this often takes the form of button or keyword interfaces that provide users with a small set of choices about how to proceed at a given point in an interaction. In a voice setting, systems will similarly resort to asking users to provide simple “yes or no” or keyword answers, or to choose from a shortlist of options, sometimes simply by speaking a number, effectively pressing a button with their voice.
This sort of corralling is fine, if not exactly conducive to an excellent user experience, so long as users are prepared to stick to the pathways delineated by the limited choices offered to them. But if a user is effectively just navigating a simplistic decision tree, what’s the point of offering them the ability to do this using complex language in the first place?
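To make the point concrete, here is a minimal sketch of the kind of menu-driven routine described above. It is purely illustrative: the names `MenuNode` and `step` and the two-option menu are hypothetical, and it is not a description of any particular product, ours included.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """One step in a scripted dialogue: a prompt plus the only replies it accepts."""
    prompt: str
    options: dict[str, MenuNode] = field(default_factory=dict)


def step(node: MenuNode, user_input: str) -> MenuNode | None:
    """Return the next node if the reply exactly matches an allowed option, else None."""
    return node.options.get(user_input.strip().lower())


# Hypothetical two-option menu, invented for illustration only.
balance = MenuNode("Here is your balance.")
transfer = MenuNode("Who would you like to pay?")
root = MenuNode("Say 1 for balance or 2 for transfers.", {"1": balance, "2": transfer})
```

Calling `step(root, "1")` moves to the balance node, but a perfectly natural request such as `step(root, "How much did I spend on travel last month?")` returns `None`: the tree has no branch for it, however clearly the user expressed themselves.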
Our philosophy is to treat language as a mechanism for facilitating communication, rather than as an impediment to information processing. We strive to build systems that allow users to communicate about their own objectives on their own terms. By creating agents that are truly dedicated to understanding language, we provide users with a chance to work together with an AI to get a grip on complex information using truly natural ways of speaking. Our goal is to offer a delightful interaction with our state-of-the-art technology playing a key role in the process.
Knowing when we don’t know
Another claim often made by conversational AI developers is that their technology has a ‘superhuman’ ability to understand language – a stance in line with the idea that computers can clearly outperform humans in many areas. When it comes to filtering data, crunching numbers, or considering all the possible moves in a board game, computers have humans beat, and they have done for some time.
Language is a special domain, though, in that it is a fundamentally human endeavour. An important property of language is that it is designed not only for conveying information, but also for absorbing and resolving miscommunication. Analysis of human-to-human conversation, particularly in the setting of strangers trying to achieve a mutual goal on a phone line or through a webchat, reveals that much of our communicative energy is spent verifying, amending, correcting, and adjusting our statements in order to reach a point of shared understanding.
At action.ai, we develop systems that are good not only at understanding people, but also at dealing with the inevitability of misunderstanding. This is another area where our combination of technical and linguistic expertise stands us in good stead. We apply our technical expertise to the implementation not just of models that are big and fast, but also of ones that embody the wealth of theoretical knowledge that underpins the study of language.
Transparent complexity
Language is hard, and the truth is that nobody really has a great idea of how exactly humans do it – in fact, there are still open debates in the linguistic community about the nature of things as essential as grammar and vocabulary. The inscrutability of language can sometimes lend itself to the presentation of language processing models as likewise impenetrably complicated things.
Just because language is a complex topic doesn’t mean that we have to be mystified by the workings of language technology, though. A human communicating in a professional environment is expected not only to know how to listen and construct lucid responses, but also to be able to explain their understanding of what they’ve heard and to justify their choice of words in their response.
We feel that an artificial conversational agent should be held to these same standards, and so we work to develop technology that is simultaneously dynamic and transparent.
Our systems operate in an end-to-end fashion, with different components constantly interacting with one another in order to converge rapidly and accurately on interpretations of and responses to user input in the sometimes chaotic context of a real-time conversation – and yet the different aspects of our systems are also open to scrutiny. This means that a company hosting one of our systems can be confident that they will have control over the substance and tone of the way that our conversational agents interact with their customers. At the end of the day, we can provide technology that is under your control and honed to facilitate the user experience that you want to provide.
Don’t try this at home
There are a number of conversational AI technologies available on the market today that offer developers a chance to build their own chatbots on the back of generic models for natural language understanding and dialogue management.
The appeal of being able to adapt an established platform to a targeted objective with relatively little effort is obvious, but our stance on this proposition is that if it seems too good to be true, then it probably is. The reality is that building high-quality language technology takes years of accumulated expertise. At action.ai, we’ve established a team of language technology experts who have their fingers on the pulse of the many different aspects of our field. Considerations that we bring to any project include an understanding of the conceptual scope of the language space in which a group of users is going to operate, the structuring of the right data for reflecting what our systems need to learn about this space, and the integration of these theoretical and practical commitments within a complex technological pipeline designed specifically to provide users with an effective and fulfilling experience.
With this in mind, we would like to suggest that building a fantastic artificial conversational agent is not something that is simple or obvious. But when you see the technology that we can provide at action.ai, you’ll know that you’ve seen something exceptional.