Let’s take an airline example: “I want to change my flight from the 3rd to the 4th”.

This is one of the 1000+ requests that an airline might hear across its communication channels. Whether the airline’s conversational AI can actually change my flight for me depends on many factors, but mostly on how sophisticated the conversational AI is.

In rare cases, it will successfully help me change my flight within the conversation, with a great experience. This requires multiple exchanges to ask me for my name, flight confirmation number, preferred seating arrangements, etc.; this is where entities (and slot-filling) are useful.

But I don't think I've ever had a good conversational transaction like this. If I want to change my seat, good luck getting a better experience via chat than the one I'm accustomed to in the airline's reservation portal.

Imagine a world where the conversational AI is simply reaaaally good at understanding whatever you say or ask, and at pointing you to the right place to complete or answer your request. Sounds like something you already use? Yes, Google!

Much like I don’t expect to book a trip directly inside Google's search field, I don’t expect the flight change to take place in the conversation (even though a lot of chatbot development roadmaps set that expectation). I'll personally be impressed if the AI understands that I want to change my flight and sends me directly to the flight change UX, where my personal information and context are likely already captured.

I see positive signs that things are moving in this direction:

  • Zendesk Sunshine Conversations' conversation extensions provide "custom interactive experiences that sit on top of the chat window. They combine the intimacy and contextuality of conversational interfaces with the richness and flexibility of traditional web UIs. They’re intended to help users complete more complex tasks and transactions, ones that would be too tedious to complete by exchanging multiple messages back and forth."
  • RCS (Rich Communication Services, the successor standard to SMS) introduces richer components, such as menus and carousels, directly within the messaging experience.
  • LiveChat Inc. allows developers to easily embed iframes and apps right into the conversation experience.
  • Great enterprise chatbot experiences (like Kraken's chatbot) don't go outside their scope; instead, they handle requests like "I want to create an address to deposit crypto" by providing a link to the appropriate documentation rather than trying to resolve them in-conversation.

With this approach, entities and slot-filling logic become much less important: you can capture entities and pass them on to the "transactional" part of the UX, but you don't have to.
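To make the "point, don't transact" idea concrete, here is a minimal sketch, assuming a hypothetical airline whose self-service pages already handle each transaction. The intent names, the URLs under `airline.example.com`, and the keyword-based `classify_intent` function are all illustrative placeholders; a real deployment would use a trained NLU model instead of keyword checks. The point it shows is that entities are optional: if some were captured, they are forwarded as query parameters so the destination page can pre-fill them, and if not, the page simply collects them itself.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical mapping from intent names to existing self-service pages.
# All URLs are illustrative placeholders, not a real airline's endpoints.
INTENT_DESTINATIONS = {
    "change_flight": "https://airline.example.com/manage/change-flight",
    "lost_baggage": "https://airline.example.com/help/lost-baggage",
}

# Where the user lands when no intent is recognized.
FALLBACK_URL = "https://airline.example.com/help"


def classify_intent(utterance: str) -> Optional[str]:
    """Toy keyword matcher standing in for a real NLU model (assumption)."""
    text = utterance.lower()
    if "change" in text and "flight" in text:
        return "change_flight"
    if "forgot" in text or "lost" in text:
        return "lost_baggage"
    return None


def route(utterance: str, entities: Optional[Dict[str, str]] = None) -> str:
    """Return the URL the user should be sent to for this request.

    Any captured entities are appended as query parameters so the
    destination page can pre-fill them; no slot-filling dialog is needed.
    """
    intent = classify_intent(utterance)
    if intent is None:
        url = FALLBACK_URL
    else:
        url = INTENT_DESTINATIONS.get(intent, FALLBACK_URL)
    if entities:
        url = f"{url}?{urlencode(entities)}"
    return url
```

For example, `route("I want to change my flight from the 3rd to the 4th", {"from_date": "3rd", "to_date": "4th"})` returns `https://airline.example.com/manage/change-flight?from_date=3rd&to_date=4th`, while the same request without entities returns the bare change-flight URL. The conversation's only job is understanding; the transactional page owns the rest.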

Are you saying transactional dialog flows are a bad thing?

There are certainly scenarios where you can't replace the "multiple messages back and forth" experience: voice channels are the obvious one (although I can imagine Siri eventually integrating with my Apple Watch or iPhone and automatically popping open the appropriate UX).

However, the previous example was relatively easy: most airline conversational AIs will understand “I want to change my flight” and help you fulfill it one way or another, because it’s something they hear very often and have therefore been trained to recognize.

What about “hey I forgot my bad on my flight yesterday”? (Note the typo.)

The long tail of requests like this one is often not understood (even by a company like WestJet), and if a single typo is enough to break the bot, that's a clear indication that this long-tail request is not as well trained as other intents.

Why are so many conversational AIs unable to understand (and handle) more things, better? If Google can do it for all of general knowledge, surely it shouldn't take magic to build an AI that understands the few hundred (or few thousand) things your customers might say or ask.

I can imagine the reason being as simple as this: any given conversational AI project has finite resources, and something has to be prioritized. 🤷‍♂️

Changing the development mindset from building "transactional dialog flows" to providing a more "search-like" end-user experience implies that:

  • Non-technical team members can focus on curating the natural language understanding component (the brain of the AI) and on building an AI that understands all the long-tail "search" requests.
  • Technical teams can plan different end-user experiences depending on the type of user intent or request; "transactional" experiences can be decoupled from the conversation, which makes them easier to integrate and maintain.

This shifts the emphasis (and resources) onto surprising the user with deep, precise answers to pretty much anything they might ask, and can potentially make the more complex, transactional requests easier to develop and maintain in the long run.

About HumanFirst

HumanFirst productizes the data engineering capabilities necessary for companies to build and improve Natural Language Understanding at scale.