In recent years, the word chatbot has become almost synonymous with failure – commercially at least. Chatbot projects developed by big tech have not been immune to this trend, allowing us to witness some toe-curling PR disasters born of chatbots responding to users with hate speech, racism, swearing and other inappropriate language. Inevitably, this has damaged public trust in chatbots and, indeed, in AI technology more generally.
With this in mind, it’s interesting to note that business spending on conversational AI solutions continues to increase, particularly in the banking sector. In 2021, 30% of banks discussed the implementation of chatbots at board or executive level, and in the same year 55% of organisations across all sectors planned to increase their conversational AI budgets. Increasingly, financial organisations of all sizes are realising that they cannot afford to ignore the opportunities offered by conversational AI – even taking into account the potential risks.
A growing number of banks are looking to augment their customer service and contact centre operations with sophisticated conversational interfaces, but with cutting-edge technology comes an understandable fear. Banks in particular cannot afford such failures: they must maintain security and efficacy in every transaction in order to preserve customer trust. This blog outlines some of the more notorious instances where chatbots have failed and explains why these problems need not apply to every conversational AI solution.
Three chatbot failures
1. IBM Watson in 2013
In one early example, the team behind IBM Watson attempted to enrich its vocabulary by giving it access to Urban Dictionary, a website dedicated to defining slang terms. They hoped this would help Watson hold more natural, human-like conversations with users. However, problems quickly arose when it became clear that Watson was incapable of determining when certain terms were appropriate to use. Watson reportedly began replying to innocuous research questions with inappropriate and offensive language. As a result, Urban Dictionary was swiftly removed from Watson’s vocabulary.
2. Microsoft Tay and Zo
To date, Microsoft’s Tay is perhaps the most notorious chatbot failure. Tay was first made available to the public via Twitter in 2016, and malicious users began to target it with offensive and inflammatory messages almost immediately. Tay’s capacity for unsupervised learning led to its downfall, because it learned from its interactions with users in an entirely unfiltered way. Whilst Tay was technically very clever, releasing it to the general public was a naive and ultimately damaging decision.
Repeating what it had heard, Tay started to make racist and sexist remarks in response to users’ questions. Facing a complete public relations disaster, Microsoft suspended the service after just 16 hours. Later that same year, Microsoft launched Zo, a successor to Tay. Zo was not without controversy either, allegedly making Islamophobic statements and describing Microsoft’s own products as spyware. The service was eventually discontinued in 2019.
3. Meta’s BlenderBot 3
Another tech giant, Meta, faced similar issues with the release of BlenderBot 3 earlier this year. When asked by a Business Insider journalist, “do you have any thoughts on Mark Zuckerberg?”, the bot replied, “oh man, big time. I don’t really like him at all. He’s too creepy and manipulative.” BlenderBot 3 also readily disparaged Meta’s core service, Facebook.
What went wrong?
The problem with almost all of these services is twofold. First, they attempt a form of general artificial intelligence. From politics and history to the weather and daily life, services such as Watson and Tay are designed to converse with users on a wide range of disparate and complex topics. In short, they are jacks of all trades but masters of none. A system with such an expansive scope pushes up against the limits of what is possible with current technology, because it cannot grasp the full complexity of natural language. General artificial intelligence of this kind still requires a paradigm shift in our technical and theoretical understanding. As a result, when these services are made available to the public, they invariably fail to live up to expectations.
This takes us to the second problem. In order to converse on a wide range of topics, these bots are trained on an extremely large dataset drawn from material published across the internet. As you can imagine, it’s impossible for any group of humans to write and moderate enough responses to cover every possible question that the bot might be asked. As a result, the chatbot must synthesise new responses based on the training data it has collected from across the web, and it is impossible to vet every single one.
If you’ve ever spent any amount of time on social media, you’ve likely encountered plenty of content that is prejudiced, hurtful, or misleading. If a chatbot’s algorithms lack adequate filtering and oversight, it becomes almost inevitable that some of this prejudice and bias will leak into the bot’s responses. In addition, if, like Tay, the service is designed to learn from its conversations, a failure to filter out bad-faith actors and targeted abuse of the service is a sure path to disaster.
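To make the point concrete, here is a minimal, purely illustrative sketch of the kind of moderation gate such a learning loop was missing. Tay’s actual architecture was never published, and the blocklist terms and function names below are invented for illustration; the idea is simply that nothing should reach a bot’s learning loop without passing a filter first.

```python
# Illustrative only: a toy moderation gate in front of a learning loop.
BLOCKLIST = {"slur_example", "insult_example"}  # invented placeholders, not a real list

def is_acceptable(message: str) -> bool:
    """Crude keyword filter; a real system would use trained classifiers."""
    words = set(message.lower().split())
    return words.isdisjoint(BLOCKLIST)

training_buffer: list[str] = []

def handle_user_message(message: str) -> None:
    # Only messages that pass moderation may ever influence the bot.
    if is_acceptable(message):
        training_buffer.append(message)
    # Rejected messages are simply never learned from.
```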
How can we learn to do better?
You can be confident that the conversational AI solutions developed by action.ai will never cause distress to your customers or damage your brand. This is because we understand where chatbots fail. Our virtual agents simply will not make the kinds of mistakes we’ve discussed. Here’s why:
1. We’re domain specific
Our bots never strive for general artificial intelligence. Instead, they are designed to be domain-specific specialists in their subject. This means they are not only armed with expert knowledge of one industry, such as banking, but also fine-tuned to the customer’s needs and the language they use. While our virtual assistants do not talk about politics or current events, they can answer a range of complex queries related to the banking sector, and when quizzed about subjects outside their remit, they will politely bring the conversation back to the task at hand. Thanks to this highly focused approach, their responses are consistent and predictable. The bottom line is that customers get an intelligent and appropriate response every time they speak.
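As a rough illustration of what domain-specific routing can look like (a hypothetical sketch, not action.ai’s actual system; the intents and trigger phrases are invented), a bot can match each query against a fixed set of banking intents and politely redirect anything out of scope:

```python
# Invented example intents and phrases for illustration.
BANKING_INTENTS = {
    "check_balance": ["balance", "how much money"],
    "report_lost_card": ["lost card", "stolen card"],
}

FALLBACK = ("I'm best at helping with your banking questions. "
            "Shall we get back to your account?")

def route(query: str) -> str:
    """Match the query against known banking intents."""
    text = query.lower()
    for intent, phrases in BANKING_INTENTS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return "out_of_scope"

def respond(query: str) -> str:
    intent = route(query)
    if intent == "out_of_scope":
        # Politely bring the subject back to the task at hand.
        return FALLBACK
    return f"[curated response flow for {intent}]"  # placeholder
```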
2. Our bots use only predetermined responses
All of the responses uttered or typed by our bots have been written by us in close collaboration with our clients. The bots know how to say the same thing in different ways, so customers will never find their responses repetitive. Since we focus on specific domains, our virtual assistants never need to generate unplanned responses. When you work with us, you’ll always have complete control over how our technology interacts with your customers. Our digital agents will always provide expert service and treat your customers with empathy, kindness, and respect. In short, you can tailor every aspect of your customer experience, confident that our virtual assistants will represent your bank at its best.
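In principle, this works like the toy example below (the templates and data are invented for illustration): every variant of every reply is authored in advance, so the bot can vary its phrasing without ever producing an unvetted sentence.

```python
import random

# Every possible reply is written in advance; the bot only ever
# selects among pre-written variants (invented examples below).
RESPONSES = {
    "check_balance": [
        "Your current balance is {balance}.",
        "You have {balance} available right now.",
        "Right now, your balance stands at {balance}.",
    ],
}

def reply(intent: str, **slots: str) -> str:
    """Pick one pre-written variant at random and fill in the slots."""
    template = random.choice(RESPONSES[intent])
    return template.format(**slots)

print(reply("check_balance", balance="£1,250.00"))
```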
3. We source our own training data
We don’t source our training data from the internet or social media. Instead, we gather domain-specific training data on behalf of our clients. Being sector-specific means we would have little to gain by plugging our bots into Twitter, Reddit, or Facebook. The training data we commission is designed to rigorously test our natural language processing and speech recognition technology in practical, real-world banking conversations. The end result is that the data we collect always meets the exact specifications of the client. We also examine and filter our training data to guarantee its relevance and to ensure that bias never creeps into our systems.
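As a simple sketch of what a first-pass relevance filter on commissioned data might look like (an assumed workflow with an invented term list, not a description of action.ai’s actual pipeline):

```python
# First-pass relevance filter on collected utterances; human review
# would still follow. DOMAIN_TERMS is an invented example list.
DOMAIN_TERMS = {"account", "balance", "card", "transfer", "mortgage", "overdraft"}

def is_on_topic(utterance: str) -> bool:
    """Keep only utterances that mention at least one banking term."""
    return any(term in utterance.lower() for term in DOMAIN_TERMS)

def vet(candidates: list[str]) -> list[str]:
    return [u for u in candidates if is_on_topic(u)]
```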
We’ve examined the failures of previous generations of conversational AI technology, and we’ve adapted our core approach to avoid repeating them. We believe that conversational tech can be the new face of banking and we want our clients to feel confident in placing virtual assistants at the forefront of their customer service offerings. It is this approach that makes our technology engaging, responsive, and extraordinary.