Heads Up: If you’re new to bottom-up labeling, please read “A bottom-up approach to NLU”.

What is bottom-up labeling?

Bottom-up labeling applies the tried-and-tested divide-and-conquer approach to the problem of labeling large datasets. Instead of expecting a human or an unsupervised algorithm to correctly “predict” which intents and abstractions exist in the data, it provides a simple framework for discovering this information iteratively. [1]

Below is a simple example of what bottom-up labeling looks like. It starts on the left with unlabeled utterances and, moving to the right, shows intents of increasing specificity. This specificity is achieved by labeling from the bottom up. We’ll show you how to put this approach into practice using HumanFirst!

Part 1: Setting things up!

This article is part 1 in a series that will show you how to apply a bottom-up approach to labeling and intent discovery with HumanFirst. In this article, we’ll focus on getting started with HumanFirst and how to set up the bottom-up labeling process.

Getting started is simple.

Step 1: Upload your raw conversational data to HumanFirst. You can upload utterances or two-way conversations, in TXT or CSV format respectively. For more information, click here.

Step 2: Head to the Unlabeled Data section of HumanFirst and begin selecting utterances that are related at a high level of abstraction (e.g. questions, problems, requests).

Step 3: Label the selected utterances as an intent.

In the example above, we chose questions as the initial level of abstraction and selected utterances that relate to it. Once enough utterances have been selected, we label them as an intent, which we’ll call “has a question”.

These steps already produce something valuable: high-quality, domain-specific training data for classifying users who “have a question”.

Step 4: Next, open the “has a question” intent and begin selecting some of its training data.

As you can see, selecting an utterance within an intent triggers a semantic re-rank of that intent’s training data, so that related utterances are easier to find. This speeds up the selection and refactoring process.
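To give a feel for what a re-rank does, here is a minimal sketch in Python. It is not how HumanFirst ranks utterances (the article does not describe that); it swaps in a simple bag-of-words cosine similarity as a stand-in for a real semantic model, and the function names and sample utterances are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(selected: str, utterances: list[str]) -> list[str]:
    """Order utterances from most to least similar to the selected one.

    Assumption: word overlap stands in for semantic similarity here.
    """
    query = tokenize(selected)
    return sorted(utterances, key=lambda u: cosine(query, tokenize(u)), reverse=True)


# Hypothetical training data inside the "has a question" intent.
utterances = [
    "where can I change my notification settings",
    "how do I reset my account password",
    "how do I close my account",
    "can I change the language settings",
]
ranked = rerank("how do I delete my account", utterances)
```

With this toy data, the account-related questions rise to the top of `ranked`, which is the effect that makes picking out a sub-intent faster.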

Step 5: Assign the selected utterances to a more specific sub-intent of your choosing.

We end up with two sub-intents: “has a question > about account” and “has a question > about settings”.
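The refactoring in steps 4 and 5 can be pictured as a small tree operation. The sketch below is only an illustration: the `Intent` class, its `refine` method, and the sample utterances are hypothetical, and it assumes that selected utterances move out of the parent intent into the sub-intent, which may differ from how HumanFirst stores them.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """A node in an intent hierarchy (hypothetical structure, for illustration)."""
    name: str
    utterances: list[str] = field(default_factory=list)
    children: dict[str, "Intent"] = field(default_factory=dict)

    def refine(self, sub_name: str, selected: list[str]) -> "Intent":
        """Move the selected utterances from this intent into a sub-intent."""
        child = self.children.setdefault(sub_name, Intent(sub_name))
        for utterance in selected:
            self.utterances.remove(utterance)  # raises ValueError if not present
            child.utterances.append(utterance)
        return child

    def paths(self, prefix: str = "") -> list[str]:
        """List every intent path, e.g. 'has a question > about account'."""
        path = f"{prefix} > {self.name}" if prefix else self.name
        result = [path]
        for child in self.children.values():
            result.extend(child.paths(path))
        return result


# Made-up utterances already labeled "has a question".
root = Intent("has a question", [
    "how do I close my account",
    "how do I reset my account password",
    "can I change the language settings",
    "what are your opening hours",
])
root.refine("about account", ["how do I close my account", "how do I reset my account password"])
root.refine("about settings", ["can I change the language settings"])
all_paths = root.paths()
```

Because `refine` returns the new sub-intent, the same call can be applied to it again to split it further.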

Step 6: Repeat steps 4 and 5 within your new sub-intents, classifying their labeled utterances into further sub-intents, until you reach the desired level of granularity.

Every step produces training data for classifiers that can recognize increasingly specific intents. This is one of the major advantages of the approach.
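One way to see why every level yields training data: if each utterance is stored with its full intent path, truncating that path to a given depth gives a label set for a classifier at that level. The sketch below assumes this representation; the `training_pairs` helper, the “has a problem” intent, and the utterances are hypothetical, and no actual classifier is trained.

```python
def training_pairs(labeled, depth):
    """Build (utterance, label) pairs with intent paths truncated to `depth` levels."""
    return [(utterance, " > ".join(path[:depth])) for path, utterance in labeled]


# Each entry is (intent path, utterance); the data is made up for illustration.
labeled = [
    (("has a question", "about account"), "how do I close my account"),
    (("has a question", "about settings"), "can I change the language settings"),
    (("has a problem",), "the app keeps crashing"),
]

level_1 = training_pairs(labeled, 1)  # coarse labels: question vs. problem
level_2 = training_pairs(labeled, 2)  # finer labels: account vs. settings
```

The same labeled utterances serve both levels, so refining an intent never throws away the coarser training data.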

Repeating this approach yields an intent hierarchy that reflects your domain. After a few minutes of this process, we’ve generated an intent structure that contains trained classifiers at every level of abstraction. This makes it easier to understand the corpus and to identify long-tail intents.


[1] A bottom-up approach to intent discovery and training