The first iteration of HumanFirst was a browser extension that ingested customer support live chat conversations, trained a black-box AI model that predicted what the chat agent’s next response should be, and provided auto-complete suggestions to the agents to speed up their replies.

While we were building this (and after raising a pre-seed round to bring it to market 😓), we saw clear signs that at least one of the big players would commoditize this type of technology sooner rather than later, putting HumanFirst at risk - so we pivoted.

Google’s launch of Gmail auto-complete and smart replies supported our decision: I don't doubt that these will go from being simply a Gmail feature, to a full-fledged Google API/SDK in the near future.

Then OpenAI's GPT-3 introduced a state-of-the-art, massively pre-trained language model that predicts text based on user input: the ways people apply its capabilities go way beyond conversational AI (and are fascinating).

And this week, Google announced another conversational language model trained on seemingly millions (if not billions) of topics: LaMDA looks similar in approach to GPT-3 (although I might be wrong, and look forward to learning more about it).

I’ll start with the obvious: like GPT-3’s, LaMDA’s demo is really impressive. The AI generates an open-ended, conversation-like experience that is light-years ahead of what was available even a few years ago.

Truly, this is “AI” in the real sense: there are no hard-coded rules, and what is spit out is magically generated from billions and billions of pieces of data.


There are clear applications for this type of technology: one of the coolest applications I saw for GPT-3 gives video game avatars (e.g. the virtual people you bump into in Grand Theft Auto) completely unscripted conversational abilities that let you initiate random conversations that could probably pass the Turing test.

Very cool, lots of fun to play with, and clearly a lot of potential for use-cases where creativity and spontaneity play a big role.

Is GPT-3 the future of conversational AI?

I'd argue that by itself, GPT-3 won't be sufficient, for a simple reason:

GPT-3 is extremely powerful but doesn't understand a single word it produces.


Because of this, most of the initial applications of GPT-3 stayed in the realm of either “fun”, “cool tech demos”, “design assistants”, or “proof of concepts” (google “best applications of GPT-3” and see for yourself).

This might simply be a question of the time it takes for people to figure out how to apply it correctly to actual problems: indeed, more recently OpenAI posted examples of real-world applications that incorporate GPT-3, including for customer support and search use-cases.

However, I think that at their core, generative models like GPT-3 will be hard to apply to use-cases where consistency and "explainability" of the AI's predictions are important, unless they provide the ability to easily "override" (or intercept) the AI's response.

And this is clearly something the GPT-3 development team anticipates: the answers and classification endpoints of the GPT-3 API allow developers to "leverage labeled training data without fine-tuning" and "for relevant context to be added to the prompt before completing with GPT-3".
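As a rough illustration of what "adding relevant context to the prompt" can look like, the sketch below picks the labeled examples closest to a query and places them ahead of it in a prompt. It does not call the GPT-3 API: `LABELED_EXAMPLES`, `token_overlap` and `build_prompt` are hypothetical names, and plain word overlap is only a stand-in for whatever ranking the real endpoints use.

```python
import re

# Hypothetical labeled training data: (utterance, label) pairs.
LABELED_EXAMPLES = [
    ("Where is my order?", "order_status"),
    ("I want my money back", "refund"),
    ("How do I reset my password?", "account_access"),
    ("Can I get a refund for my order?", "refund"),
]


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def token_overlap(a, b):
    """Jaccard similarity between the word sets of two strings."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_prompt(query, examples, k=2):
    """Put the k most similar labeled examples before the query."""
    ranked = sorted(
        examples,
        key=lambda ex: token_overlap(query, ex[0]),
        reverse=True,
    )
    blocks = [f"Text: {utterance}\nLabel: {label}" for utterance, label in ranked[:k]]
    # The final block is left open so a generative model can complete the label.
    blocks.append(f"Text: {query}\nLabel:")
    return "\n\n".join(blocks)


prompt = build_prompt("Can I have my money back?", LABELED_EXAMPLES)
print(prompt)
```

The point of the sketch is that the labeled data still has to exist: the model only sees the examples you chose to put in front of it.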

In this sense, developers will still be dependent on a hybrid model, where they need labeled training data for the things they want to be able to measure and control.

The clearest feedback we got from the initial HumanFirst users who deployed our browser extension was that they valued the quality and consistency of their agents’ responses, not speed, and that providing a response that’s not correct introduces unacceptable risk.

This is what led us to building the complete opposite of a black-box model: a platform that gives users complete control over their AI’s data (and therefore its quality, consistency, measurability etc).

Teams want to harness AI more, not less.
And harnessing GPT-3 (or other generative models) for conversational AI appears difficult. Let’s say:

  • You deploy a chatbot powered by GPT-3 that answers customer questions about travelling. It’s able to small-talk and provide answers to questions about cities, what there is to eat, what the sights and sounds are etc. It’s been trained on millions of conversations about sight-seeing, and it can fool a human into thinking they’re talking to another human.
  • But what if you want to change the name of the restaurant or area that it recommends? It's harder to control that behaviour. The model finds an answer that seems to “auto-complete” the conversation correctly, but you can’t “intercept” that response unless you understand what the question is in the first place.
  • And therein lies the chicken-and-egg problem: if you want to be able to “override” this type of generative model, you need to build and train a regular classifier that has a label for every single “intent” that you want to intercept in the first place.
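The hybrid setup described in the bullets above can be sketched roughly as follows. This is only an illustration under stated assumptions: the intents, the example utterances, the restaurant name and the word-overlap classifier are all hypothetical, and `generative_reply` is a placeholder rather than a real call to GPT-3.

```python
import re

# Hypothetical intents you want to control, each with a few labeled
# training examples and a response you have approved.
INTENTS = {
    "restaurant_recommendation": {
        "examples": [
            "where should I eat tonight",
            "can you recommend a good restaurant",
        ],
        "response": "We recommend Chez Example, on Main Street.",
    },
    "opening_hours": {
        "examples": [
            "what time does the museum open",
            "when does the museum close",
        ],
        "response": "The museum is open from 9am to 5pm.",
    },
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def classify(utterance, threshold=0.4):
    """Return the best-matching intent name, or None below the threshold."""
    tokens = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for name, intent in INTENTS.items():
        for example in intent["examples"]:
            ex_tokens = tokenize(example)
            union = tokens | ex_tokens
            score = len(tokens & ex_tokens) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = name, score
    return best_intent if best_score >= threshold else None


def generative_reply(utterance):
    """Placeholder standing in for a generative model such as GPT-3."""
    return f"[generated reply to: {utterance}]"


def respond(utterance):
    """Use the approved response for known intents, else fall back."""
    intent = classify(utterance)
    if intent is not None:
        return INTENTS[intent]["response"]
    return generative_reply(utterance)


print(respond("Where can I eat tonight?"))
print(respond("Tell me about the history of the city"))
```

In this sketch, changing the recommended restaurant means editing one string, but only because a labeled intent exists for that question; everything the classifier does not recognise still goes to the generative model unchecked.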


The other obvious problem is that while models like GPT-3 and LaMDA can talk about planets and economic principles, their knowledge of your custom domain information is likely very limited. They will pretend to know (and give an answer that tries to make sense), but there’s no control over what comes out, and they can’t be accurate about domains and information they haven’t been trained on.

To be honest, I’m not sure to what extent you can re-train GPT-3 yourself with custom data to override its behaviour and build more domain-specific knowledge. Open models are being developed to compete with GPT-3, though, so it's only a question of time. But that doesn’t change the fact that in end-to-end models, it’s the AI deciding what to answer, instead of you controlling what to do after understanding what the customer said.

To be clear: I think that generative models like GPT-3 and LaMDA open amazing opportunities to extend the capabilities of both conversational AI and the tools used to develop it. For example, we’ve been asked numerous times whether GPT-3 (or something similar) was powering the training example recommendations in HumanFirst: it’s not (we use your own data to find semantically similar utterances). However, the idea of integrating generative models within the conversational AI tooling workflow makes a lot of sense.

And who knows, GPT-3's boundless creativity might eventually take conversational AI in directions that we haven't considered yet! I'm open to that.


I’d love to hear from people who have explored, built with, have experience with, or are otherwise interested in GPT-3-like models, especially as applied to conversational AI: do you believe the future will see end-to-end models completely replace traditional “intent-based” conversational AI, or will we see the continued need for a more hybrid approach? What is the best approach for harnessing this type of technology adequately?

Please feel free to comment and share your thoughts on LinkedIn: https://www.linkedin.com/posts/gregory-whiteside-7bbbab2_gpt-3-in-conversational-ai-activity-6800825104323366912-4XQv