Representing language using logic has been an ambition for nearly a century. Around a hundred years ago, polymaths like Bertrand Russell were working furiously to capture the nuances of language with a view to developing a universal formal language. This turned out to be a mammoth academic pursuit that characterised the early days of research in computer science – and it continues today.
action.ai has been dedicated to capturing the nuances of language in order to create enterprise-level conversational AI that can be successfully used for customer service. We have a deeply commercial focus in the health and banking industries and our mission is to create automation that provides a delightful experience for customers and excellent ROI for businesses.
“Imagine language as a wild horse that we’ve captured, harnessed and ‘broken’ so that we can work with it rather than fight against it.”
To date, success stories in conversational automation have been few and far between. Chatbots have had some success in dealing with simple requests, but they operate on overly simplified language models and so cannot handle the natural back-and-forth of conversation. Imagine language as a wild horse that we’ve captured, harnessed and ‘broken’ so that we can work with it rather than fight against it. That’s not to say the beast is fully tamed, of course.
It may be hard to think of language as anything close to a wild animal. After all, most of us voiced our first word at around 14 months of age and have enjoyed a natural progression ever since. It comes so naturally to us now that we can sometimes talk out loud seemingly without really thinking. Most of us can talk on a range of subjects with a range of people. We can even talk about things we know nothing about, and some of us get away with it. For the average adult, conversation comes relatively easily. But language itself is anything but simple.
Communication is so complex that the jury is still out on what actually goes on when we do it. One school of thought (cognitive linguistics) holds that we actually think in metaphor, so metaphor in language is a genuine reflection of, or isomorphic with, the way we fundamentally exist. If this is true, then metaphor is something that happens before we think of words, and is tangled up in the way that we use them.
The ‘Whorfian’ school of thought holds that we actually use language to organise our thoughts, and not the other way around. A classic example is the claim that Inuit languages have many words for snow – the idea being that speakers of these languages have used language to structure their conceptualisation of the environment in which their culture evolved. There are a number of counterexamples and the question remains open, but at action.ai we take a pragmatic approach: we see the dynamics of minds, words and world as quite fluid.
This fluid way of communicating changes from one unique human to the next. When we speak, we cherry-pick words from our inner lexicon. We build them into sentences and grammatical configurations, and we colour our dialogue with metaphor, analogy and insinuation. We pepper it with humour or disdain or sarcasm or enthusiasm – maybe jokes or references to past events. We throw in red herrings, we make mistakes and correct them. We forget where we were, we repeat ourselves. We forget to mention important facts. We get sidetracked or lose focus. We make up words and use current slang or jargon – sometimes just to make ourselves sound funny or clever. Sometimes we succeed in conveying the right picture with the right words. Sometimes we fail miserably. Every time we open our mouths to communicate, a festival goes on. All avenues are open in real time for anything to happen.
This is just the tip of the iceberg. In two-way conversation, two or more brains work together to sustain the seamless back-and-forth that comes so naturally to us. We align with one another, taking cues here and there, using linguistic and situational context to drive the interaction forwards. Now imagine you’re a computer trying to make sense of it all, trying to join the conversation and respond intelligently. Pretending to be human is really hard. This is just the kind of thing computers have nightmares about when we put them to sleep at night.
“People use language all the time to avoid saying what they really mean”
The wonderfully spontaneous ebb and flow of language is only a small part of what makes it so excitingly nightmarish to work with. Language is, after all, a creative process, and we can shape it into any number of permutations. People use it all the time, for example, to avoid saying what they really mean – and computers struggle with this. Even though we can now make computers paint pictures, tell stories or write poems, getting them to interact intelligently within a real-time, unscripted conversation is Herculean in difficulty. One reason we are able to make our technology work at this level is that we have remained domain-specific, focusing our efforts on specific verticals.
Working in specific industry verticals does not diminish the complexity of language taming. Every human conversation is riddled with rich variety and presents a huge challenge to language processing technology whose job it is to constantly sniff around for meaning. The tech we’ve created must break down every utterance, cue and context frame. It must search through layer upon layer of complexity to find meaning, and then relay a well-chosen response in return.
“Our linguists, machine learning and AI experts must teach our conversational AI tech to anticipate how humans will interact”
Natural, conversational language needs a human counterpart. Without a mouthpiece, the spoken word doesn’t just sit there like a disembodied brain or a severed limb – it simply doesn’t exist. This makes language processing to some degree an abstract activity, and a constant challenge as a consequence. Our linguists, machine learning and AI experts must teach our conversational AI tech to anticipate how humans will interact in a given context.
Being domain-specific makes the process of supporting conversation much more efficient and effective. It would be unusual, for example, for a customer to call a banking customer service number to ask how to fix a broken washing machine or to request a weather update. Our tech won’t shut down an unlikely conversation; it will simply lead the customer gently back to the task in hand. At all times, whatever the interaction, the customer receives a high-quality, professional service.
As AI experts, our objective is to work around the concrete and literal to get to the abstraction of an intent, and then, importantly, map back to a concrete, literal response. When we develop a model for natural language understanding, we must help our technology establish what the speaker is actually saying. Sometimes familiar turns of phrase in natural conversation can be difficult to interpret.
Consider this turn of phrase:
“I recently lost my job”
A statement like this could come up in any number of contexts – in a conversation discussing a mortgage application for example.
Simply identifying that “this user has lost their job” is only the beginning. Effective language processing requires the technology to work out the context of the statement and why the customer has provided the information. The in-situ context must be identified to ensure the information ‘makes sense’ and that the correct response is selected and relayed back. In this situation, some empathy might be required within the response.
Working primarily in banking and healthcare means that action.ai’s AI technology requires huge attention to detail to avoid error. In dealing with a statement such as “I recently lost my job”, we seek to build models of natural language understanding that are plugged into the context in which the speaker has said what they’ve said. This needs to include an understanding of the goal the speaker is trying to accomplish, how this goal relates to real-world knowledge about jobs, the implications of a job loss in that context, and perhaps information specific to the speaker themselves.
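To make this concrete, here is a toy Python sketch of the general idea – map the surface wording to an abstract intent, interpret it against the task the customer is engaged in, and only then choose a concrete reply. The intent labels, contexts and responses below are invented for this post rather than drawn from our production system.

```python
# Toy illustration only: the intents, contexts and wording below are invented.
from dataclasses import dataclass

@dataclass
class Interpretation:
    intent: str          # abstract meaning, e.g. "report_job_loss"
    context: str         # the task the user is engaged in, e.g. "mortgage_application"
    needs_empathy: bool  # whether the reply should acknowledge the situation

def interpret(utterance: str, context: str) -> Interpretation:
    """Map surface text to an abstract intent (a trivial stand-in for a real model)."""
    if "lost my job" in utterance.lower():
        return Interpretation("report_job_loss", context, needs_empathy=True)
    return Interpretation("unknown", context, needs_empathy=False)

def respond(interpretation: Interpretation) -> str:
    """Map the abstract intent back to a concrete, context-appropriate reply."""
    if interpretation.intent == "report_job_loss" and interpretation.context == "mortgage_application":
        return ("I'm sorry to hear that. A change in employment can affect a mortgage "
                "application, so let's look at your options together.")
    return "Could you tell me a little more about that?"

print(respond(interpret("I recently lost my job", "mortgage_application")))
```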
Our approach to modelling natural language
Here are some concrete responses to the linguistic challenges discussed above:
We’re domain-specific. We develop systems that are endowed with domain-specific awareness. This means our AI system knows why its users are making contact and the corresponding linguistic spaces those users will be navigating.
We capture meaning. We use models that capture the emergence of meaning on a number of different levels of abstraction. These could be sounds, words, phrases, utterances, dialogues, and discourses. Our model allows these different layers of representation to dynamically interact.
We process natural language. We build pipelines that reveal processing at the various points of contact humans themselves have with language. This exposes how classifications of everything from sounds to discourses can be applied contextually to generate accurate and functional interpretations.
We facilitate interfacing between abstract computations and human-readable representations of user, domain, and world knowledge, allowing us to understand why the system makes the interpretive choices and corresponding linguistic responses it does at any given point in a conversation.
This provides a broad-brush outline of our approach to modelling natural language at action.ai. We accomplish these goals with system architectures that combine cutting-edge data-driven classifiers with sophisticated methods for representing and interacting with information about specific domains, as well as general encyclopaedic knowledge of the world.
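As a rough, simplified sketch of how such a combination might fit together – the classifier, the ‘domain knowledge’ table and the intent names below are all invented for illustration – a data-driven scorer can propose candidate intents while domain knowledge filters and grounds them:

```python
# Toy illustration only: the classifier, domain knowledge and intents are invented.
from typing import Callable

# Stand-in for a trained statistical classifier: utterance -> scored intents.
ClassifierFn = Callable[[str], dict]

def toy_classifier(utterance: str) -> dict:
    scores = {"report_job_loss": 0.0, "check_balance": 0.0}
    if "lost my job" in utterance.lower():
        scores["report_job_loss"] = 0.9
    if "balance" in utterance.lower():
        scores["check_balance"] = 0.8
    return scores

# Hand-authored domain knowledge: which intents make sense within each task.
DOMAIN_KNOWLEDGE = {
    "mortgage_application": {"report_job_loss", "check_balance"},
    "card_replacement": {"check_balance"},
}

def interpret(utterance: str, task: str, classifier: ClassifierFn = toy_classifier) -> str:
    """Combine data-driven scores with domain knowledge to settle on an intent."""
    scores = classifier(utterance)
    allowed = DOMAIN_KNOWLEDGE.get(task, set())
    candidates = {intent: s for intent, s in scores.items() if intent in allowed and s > 0.5}
    return max(candidates, key=candidates.get) if candidates else "out_of_domain"

print(interpret("I recently lost my job", "mortgage_application"))  # report_job_loss
```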
Our commitment to our end users
We have a commitment to those who use our technology in a customer service capacity. We will ensure that users’ personal and unique everyday language will become an instrument for facilitating their objectives, not a hindrance.
“We invite customers to use our technology in their own linguistic space – the space that feels the most natural to them”
Instead of persistently nudging customers towards the exit by getting them to say less and click more, our technology treats users like welcome guests. We invite those guests to use our technology in their own linguistic space – the space that feels the most natural to them. We want to help them accomplish their goals easily, in their own time and using their own words.
In essence, we want our guests to actually enjoy interacting with our conversational AI, and to receive a useful and intelligent service from it in return. This is a combination we fondly refer to as ‘a delightful experience’.