Microsoft has recently announced Copilot for Office 365. This nifty new tool lets users interact with the Microsoft Office suite using natural language. OpenAI has even more recently announced ‘GPT plugins’, where developers can create simpler versions of these interfaces for third-party applications. We’re finally seeing chatbots come into their own to become useful and enjoyable tools for the end user. It’s been a long time coming.
This short blog discusses some of the key challenges faced by the developer community in creating sophisticated chatbot experiences and how these challenges have redirected the predicted flow of the chatbot revolution.
We also speculate that Robotic Process Automation (RPA) companies will flourish in this energised market, offering new services and tools to organisations wishing to add natural language interfaces to better automate their applications.
Finally, what are the advantages of a conversational user interface (CUI) over a graphical user interface (GUI)? Why has this interface paradigm change been such a long time coming – despite being so clearly articulated seven years ago?
How well has Microsoft predicted the future?
In 2016, Microsoft’s CEO, Satya Nadella, explained that chatbots are going to fundamentally change how computing is experienced by all people. Initially, he said, bots will augment apps, but in time human language will be taught to all computers and become a “new interface.”
He continued by saying that anyone who builds an application, whether that be a mobile app, desktop app or website, will build bots that offer a human dialogue interface. He talked about leveraging human language more deeply in all aspects of our computing, and said we need to make computers more intelligent. On a final note he added that chatbots may have just as profound an impact as the Internet and mobile phones.
There was a sense of celebration and wonder about a paradigm change in human computing, and then the world went chatbot crazy.
Conversational commerce and the chatbot race
Conversational commerce is a term used to describe the use of messaging apps, chatbots, and other conversational interfaces that facilitate the buying and selling of products and services. The idea is that it allows customers to engage in real-time conversations with businesses, just as they would with friends and family.
The vision of conversational commerce was to create a new seamless and personalised shopping experience for customers by integrating the shopping process into the messaging and conversation experience. With conversational commerce, customers can get product recommendations, place orders, track deliveries, and receive customer support all within the same messaging interface.
Early examples of this included using chatbots to order food delivery or flowers, using a messaging app to book a hotel room or flight, or using a voice assistant to order products from an online retailer. The year was 2016, and the chatbot race had started.
Retailers invested a great deal of energy into conversational commerce, as did third-party software vendors trying to serve those retailers. There was a mighty rush not just to kick off the revolution but to be the first one to hit the jackpot. Fresh development tools appeared on the market, including Facebook’s wit.ai, Microsoft’s LUIS, and Google’s Dialogflow (previously named api.ai). It turned out that creating natural language bots, particularly for transactions, was incredibly difficult, so button bots appeared to bridge the chasm. These were heralded as the promised chatbot, but were in fact simple bots that required little code to build and offered a restrictive experience.
Facebook made a bold move at this point by integrating peer-to-peer payments into its messaging app. It also built development tools for businesses to build their own chatbots to let their users order flowers and cabs, browse fashion, and make purchases. It even added support for button bots within its Messenger interface. Alexa joined the party, with applications that let users order pizzas, albeit from a limited selection.
But despite the enthusiasm, the central conversational commerce movement was finding it hard to gain consumer support. Natural language technology had promised great things to consumers but was failing to deliver. Consumer and media criticism was harsh. The technology wasn’t easy, natural or free flowing; it was lumpy, restrictive, and disappointing.
Migration from Conversational Commerce to Customer Services
Many companies have developed chatbots in an attempt to handle popular customer service requests such as “Where’s my order?” and “My product arrived damaged.” However, the answers to these types of questions are often idiosyncratic to an organisation. In response to the unique requirements of every customer service department, self-serve development platforms were built and became increasingly popular. However, algorithmically identifying meaning from language is hard, and these development platforms became simpler to use but at the expense of supporting sophisticated customer experiences.
Efforts to create better and more sophisticated chatbots continued, mostly favouring text over speech as a channel. Businesses continued to invest heavily in chatbots, supported by large product teams, but their chatbots remained heavily criticised. The frustration was real.
Eliciting meaning from human language automatically is difficult, especially with hand-written rules, so neural networks are increasingly used for the task, not least large language models such as GPT. Bots are now in use around the world, but the majority focus on customer services rather than conversational commerce or, indeed, controlling applications in the way prophesied by Microsoft’s CEO. So what happened? Why was this vision never executed by the developer community?
We speculate that this is because chatbots failed at first-order language processing challenges. This triggered the migration to information-based customer service bots, and the use of buttons in bots, rather than the harder-to-create conversational commerce experiences.
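To make the point about rules concrete, here is a minimal sketch of a rule-based intent matcher of the kind an early customer service bot might have used. The intent names and patterns are our own hypothetical examples, not taken from any real platform. It handles the two stock phrasings mentioned earlier but has nothing to say about a perfectly ordinary paraphrase.

```python
import re
from typing import Optional

# Hypothetical hand-written rules for two customer service intents.
RULES = {
    "order_status": re.compile(r"\bwhere(?:['’]s| is) my order\b", re.IGNORECASE),
    "damaged_item": re.compile(r"\b(?:arrived|came) damaged\b", re.IGNORECASE),
}


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose pattern matches, or None if no rule fires."""
    for intent, pattern in RULES.items():
        if pattern.search(utterance):
            return intent
    return None


if __name__ == "__main__":
    for text in [
        "Where's my order?",
        "My product arrived damaged.",
        # Same need as the first utterance, but no rule anticipates the wording.
        "I still haven't received the parcel I paid for last week",
    ]:
        print(f"{text!r} -> {match_intent(text)}")
```

Every new phrasing needs another rule, which is one way to see why button bots, with no free text to interpret, were such a tempting shortcut.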
Welcome to the revolution
The same company that had this original vision seven years ago is now exemplifying the potential of large language models, as well as integrating their use into core products for natural language interaction and control. It’s a big achievement.
We can finally see the potential for users to interact with Office products such as PowerPoint. They can ask, for example, for a presentation template for a technology sales pitch to a large blue-chip organisation that is thinking of migrating from one cloud computing provider to another. There’s talk that the tools might support transcriptions from Teams online video meetings and allow the combined transcript to be used seamlessly as input when asking in natural language for an associated report. Microsoft also provides an example in which someone asks a spreadsheet of company financial data which products were the most profitable this quarter.
Microsoft’s Satya Nadella says, “With our new copilot for work, we’re giving people more agency and making technology more accessible through the most universal interface — natural language.” It’s completely aligned with the original Microsoft prophecy, and doesn’t include reference to customer services.
The union of opposites
GUIs are an excellent resource for users wishing to interface with many consumer applications, and they offer many advantages over command line interfaces (CLIs). Indeed, GUIs offer advantages over CUIs in many use cases. But CUIs are better in many others because, instead of navigating individual clicks in a graphical user interface, users can express what they are trying to accomplish in their own words.
Users can create, amend, experiment, and roll back with language that signals many compound actions in a single utterance. They can also follow up and naturally cross-reference what they have already done. The GUI and CLI are here to stay, but the CUI offers new and complementary interface opportunities. According to a report in the Financial Times, upgrading 10 percent of Microsoft’s 370mn commercial Office 365 users to AI-enhanced software would generate $33.6bn over the next five years. OpenAI and collaborators have just published a paper that predicts “…around 80% of the US workforce could have at least 10% of their work tasks affected by the introduction of Large Language Models, while approximately 19% of workers may see at least 50% of their tasks impacted.”
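As a back-of-envelope check on the Financial Times figure above, the sketch below works out the revenue per upgraded user that it implies. The three input numbers come from the report as quoted; spreading the revenue evenly over the five years and holding the user count constant are our own simplifying assumptions, so treat the result as a rough order of magnitude only.

```python
def implied_price_per_user(
    commercial_users: int = 370_000_000,   # quoted: 370mn commercial users
    upgrade_share: float = 0.10,           # quoted: 10 percent upgrade
    total_revenue_usd: float = 33.6e9,     # quoted: $33.6bn
    years: int = 5,                        # quoted: over five years
) -> tuple:
    """Return (USD per user per year, USD per user per month).

    Assumption (ours, not the report's): revenue accrues evenly across the
    period and the number of upgraded users stays constant.
    """
    upgraded_users = commercial_users * upgrade_share
    per_user_per_year = total_revenue_usd / years / upgraded_users
    return per_user_per_year, per_user_per_year / 12


if __name__ == "__main__":
    yearly, monthly = implied_price_per_user()
    print(f"Implied revenue per upgraded user: ${yearly:,.2f}/year, ${monthly:,.2f}/month")
```

Under those assumptions the figure works out at roughly $180 per upgraded user per year, or about $15 a month.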
AI companies that have been in this space and understand conversational design, as well as natural language processing and automatic speech recognition, are well positioned to exploit large language models and develop paradigm-shifting CUIs. If speech is used as such an interface through the web (or via a phone call for system control use cases), then additional skills are required: start of speech signalling, handling of speech interruptions, end of speech signalling, handling background noise, speech-to-text transcription and the biasing of that transcription with dynamic language models to map text to phonemes, and more. Being able to use the likes of OpenAI’s Whisper, a leading speech recognition model, will become important, but additional skills will be needed to stream speech and so enable natural responsiveness. Unless they adapt fast and radically, bot companies with development platforms that serve button-based experiences and low code/no code platforms will be left in the dust.
Importantly, the use of LLMs like GPT for interacting with tools like Microsoft Office requires the models to be trained using domain-specific language that establishes user meaning. You can’t simply bolt these models into a tool like the Adobe Creative Suite and expect them to control the application and generate outputs by themselves. That said, it is likely that GPT plugins will be configurable to enable some basic but effective interface interactions fairly quickly and without much expertise. Fine-tuning the likes of GPT to achieve more sophisticated language understanding requires much more skill and effort than prompt engineering, and the LLMs must be trained specifically to interact with the APIs of applications.
In our opinion, consumer software providers will quickly start to recognise the benefits of natural language interfaces for enhancing their applications and may even move to create their own NLP models so that users can control their applications in multiple languages via chat and perhaps even speech. We believe RPA companies are well positioned to enter this field rapidly, providing clients with services and tools to design new custom user interfaces to their applications. For example, a webchat assistant that can help fill in a long and difficult online form promises to reduce friction for users and keep them engaged as they work through it. Some RPA companies may also decide to use their large data sources as base training data in order to create specialised large language models for applications needing specialised human interfaces.
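To give a flavour of what start and end of speech signalling involves, here is a deliberately simple sketch that finds the first utterance in a sequence of per-frame audio energies. Everything in it is our own assumption for illustration: the function name, the fixed threshold, and the frame counts. Production systems typically rely on trained voice activity detection models rather than a fixed energy threshold, precisely because of the background noise problem mentioned above.

```python
from typing import List, Optional, Tuple


def find_speech_segment(
    frame_energies: List[float],
    threshold: float = 0.1,       # assumed energy level separating speech from silence
    min_speech_frames: int = 3,   # loud frames in a row needed to signal start of speech
    hangover_frames: int = 5,     # quiet frames in a row needed to signal end of speech
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first utterance, or None."""
    start: Optional[int] = None
    last_loud = 0
    loud_run = 0
    quiet_run = 0
    for i, energy in enumerate(frame_energies):
        loud = energy > threshold
        if start is None:
            # Waiting for speech: require a sustained run so short clicks are ignored.
            loud_run = loud_run + 1 if loud else 0
            if loud_run >= min_speech_frames:
                start = i - min_speech_frames + 1
                last_loud = i
        elif loud:
            # Still speaking: a brief pause does not end the utterance.
            last_loud = i
            quiet_run = 0
        else:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                return (start, last_loud)
    # Audio ended before enough silence was seen.
    return (start, last_loud) if start is not None else None
```

The hangover period is the interesting design choice: too short and the bot interrupts people who pause mid-sentence, too long and it feels sluggish to respond.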
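One way to picture the gap between a language model and an application is the layer that turns a model’s structured output into API calls. The sketch below is purely illustrative: `Deck` is a hypothetical stand-in for an application API (not a real Office or Adobe interface), and `PLAN` is a hand-written stand-in for what a suitably trained or prompted model might emit for the request “make a three-slide sales pitch deck with a dark theme”. No model is called.

```python
import json
from typing import Callable, Dict, List


class Deck:
    """Hypothetical application API; stands in for a real presentation tool."""

    def __init__(self) -> None:
        self.slides: List[Dict[str, str]] = []

    def add_slide(self, title: str) -> None:
        self.slides.append({"title": title, "theme": "default"})

    def set_theme(self, theme: str) -> None:
        for slide in self.slides:
            slide["theme"] = theme


def apply_plan(deck: Deck, plan_json: str) -> int:
    """Validate, then run, a list of {"action": ..., "args": {...}} steps.

    Returns the number of steps applied. Every step is checked against an
    allow-list before any step runs, so an unsupported action changes nothing.
    """
    allowed: Dict[str, Callable[..., None]] = {
        "add_slide": deck.add_slide,
        "set_theme": deck.set_theme,
    }
    steps = json.loads(plan_json)
    for step in steps:
        if step["action"] not in allowed:
            raise ValueError(f"unsupported action: {step['action']!r}")
    for step in steps:
        allowed[step["action"]](**step.get("args", {}))
    return len(steps)


# Hand-written stand-in for model output: one utterance, four compound actions.
PLAN = json.dumps([
    {"action": "add_slide", "args": {"title": "The problem"}},
    {"action": "add_slide", "args": {"title": "Our solution"}},
    {"action": "add_slide", "args": {"title": "Next steps"}},
    {"action": "set_theme", "args": {"theme": "dark"}},
])

if __name__ == "__main__":
    deck = Deck()
    print(apply_plan(deck, PLAN), "steps applied:", deck.slides)
```

The dispatcher is the easy part. Getting a model to reliably produce a valid plan in the application’s own vocabulary is where the domain-specific training discussed above comes in.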
The chatbot journey may have had some unexpected diversions after the initial prophecy, but it’s now flowing, and the current is picking up.