HumanFirst Studio was built to manage and continuously improve the training data of large conversational assistants. It helps you identify valuable training data in existing sources that are often available but hard to tap into without proper tooling.

In this article we’ll see how to use publicly available datasets, or your own, to create a Botpress bot from scratch without having to come up with every single training phrase. We’ll also see how to use our command line tool, hf, to integrate Botpress with Studio in a git-oriented workflow.

Note: What you’ll learn in this article can also be applied to the continuous improvement of deployed Botpress projects.


You will need a HumanFirst Studio account to go through this tutorial. If you don't have one, you can create a free account to get started.

Install the HumanFirst CLI tool

Download one of our precompiled binaries at:

Choose the binary for your operating system.
On Linux, run:

curl -L
chmod +x hf-linux-amd64
sudo mv hf-linux-amd64 /usr/local/bin/hf

You can then log in to your Studio account from the command line:

hf auth login <email here>

You should then see something like this, indicating you have logged in properly.

Authenticated as (org=example uid=ABCDEFGHIJ)
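
If you plan to script the steps in this tutorial, it can help to confirm first that the hf binary is reachable. The following is a minimal Python sketch and not part of the HumanFirst tooling: the function name find_hf is our own, and it only uses the standard library to look up hf on your PATH.

```python
import shutil
from typing import Optional


def find_hf(binary: str = "hf") -> Optional[str]:
    """Return the absolute path of the binary, or None if it is not on PATH."""
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_hf()
    if path is None:
        # The install step moved the binary to /usr/local/bin/hf.
        print("hf was not found on PATH; check the install step")
    else:
        print(f"hf found at {path}")
```

If the lookup fails, check that /usr/local/bin is on your PATH and that the chmod +x step was applied.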

Install Botpress

Download the latest archive from the Botpress downloads area.

Start the server using the provided binary:

$ ./bp

You can now navigate to http://localhost:3000/ and create the admin user to begin using Botpress.

Starting a new Botpress project

Once you have logged in, click the Create Bot button on the top right of the screen, then select New Bot. Let's pick an example that's already within their templates. Call it smalltalk and select Small Talk in the bot templates dropdown, a bit further down. Click Create Bot to confirm.

This bot is fairly simple, and most of its logic lies within the Q&A section.

Importing your new Botpress project into Studio

Now that you have a bot, create a workspace into which you’ll import your data. (A workspace is essentially a labeled container that holds your intents and lets you manage and improve them.)

$ hf workspace create Smalltalk
Created: playbook-X3NUP5UISVBB3AP66EVMFUWM

The playbook-* string is the unique identifier for your new workspace. It won't change, so it's safe to add it to any local scripts you might use during the project's lifecycle.

The Botpress integration relies on running our hf command line tool within the Botpress root directory, or within the bot directory itself. We'll assume here that you are running all commands from the Botpress root. You can always look at our full command-line tool documentation for more information on the available options, or rely on hf botpress --help.

Push your project data to your HumanFirst workspace.

$ hf botpress --bot smalltalk import --workspace playbook-X3NUP5UISVBB3AP66EVMFUWM --clear
Using bot in directory: '/home/mrene/work/bp/data/bots/smalltalk'
Zipping intents/__qna__0xbmxyusvu_tell_me_a_joke.json
Zipping intents/__qna__2wys1802lo_thank_you.json
Zipping intents/__qna__3xcljo3jr5_where_are_you_right_now.json
Zipping intents/__qna__4wywaoky6v_bye.json
Zipping intents/__qna__4xue79vezl_how_old_are_you.json
Zipping intents/__qna__54qp2njdz0_how_are_you.json
Zipping intents/__qna__5i6fs3vzfj_what_can_you_do.json
Zipping intents/__qna__bvj5soxdb7_who_are_you.json
Zipping intents/__qna__d6q3rr8u64_you_are_great.json
Zipping intents/__qna__ewbosxr8fa_who_created_you.json
Zipping intents/__qna__f2osy09wyi_what_s_your_hobby.json
Zipping intents/__qna__fejwh1accg_would_you_marry_me.json
Zipping intents/__qna__gvare098je_shut_up.json
Zipping intents/__qna__pfd6iuovva_what_are_you_doing.json
Zipping intents/__qna__r1kyn11enn_i_love_you.json
Zipping intents/__qna__rkmsylph53_hello.json
Zipping intents/__qna__uqvj6jdj32_you_are_annoying.json
Bot data successfully imported into workspace 'playbook-X3NUP5UISVBB3AP66EVMFUWM'

Note: Since the commands are run from the Botpress root folder, we have to specify the bot id that you chose in the Create Bot dialog. If you didn't name your bot smalltalk, you'll have to edit the command accordingly.

Note: We use --clear to erase the workspace's contents so that it reflects exactly what you have in your repository. It's not necessary the first time, but it's a good way to bring in changes that someone else committed to the repository.

Studio will now show your newly created workspace along with the intents imported from the Botpress project.
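
Because the workspace identifier is stable, the import step is easy to wrap in a local script. The sketch below is one possible Python wrapper, under the assumption that hf is on your PATH and that you run it from the Botpress root. The helper names (build_import_cmd, run_import) are hypothetical; the command line itself uses only the flags shown above.

```python
import subprocess
from typing import List

# Replace with the identifier printed by `hf workspace create`.
WORKSPACE_ID = "playbook-X3NUP5UISVBB3AP66EVMFUWM"


def build_import_cmd(bot_id: str, workspace_id: str, clear: bool = True) -> List[str]:
    """Build the argument list for the import command used in this tutorial."""
    cmd = ["hf", "botpress", "--bot", bot_id, "import", "--workspace", workspace_id]
    if clear:
        # --clear erases the workspace so it mirrors the repository exactly.
        cmd.append("--clear")
    return cmd


def run_import(bot_id: str, workspace_id: str, botpress_root: str = ".") -> None:
    """Run the import from the Botpress root; raises CalledProcessError on failure."""
    subprocess.run(build_import_cmd(bot_id, workspace_id), cwd=botpress_root, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_import_cmd("smalltalk", WORKSPACE_ID)))
```

Passing the arguments as a list avoids shell quoting issues if your bot id or paths contain spaces.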

Adding more data

We’ll add some phrases to the existing intents. We can use publicly available datasets to search for training phrases that fit. Since the intents created by the Botpress template are pretty generic, there is a good chance we'll find relevant matches.

In Studio, click on the Data sources menu item on the left, then click the Use one of our data sets button to add existing conversations to your project. There are many choices available, but for this tutorial pick the STAR dataset, which contains goal-oriented conversations for different tasks. If you have existing data, either from human-to-human conversations or a list of unclassified utterances, this is where you would import it into your workspace.

Augmenting existing intents

Now that we have some unlabeled data to work with, we can expand the currently defined intents.

In the Labeled data section, you'll find the list of imported intents. Activating one will bring up the list of its associated training examples. Click the Get Suggestions button and some suggestions will be provided from the dataset you added in the previous step. You can then accept training examples that make sense. The None of these look good button rejects the remaining elements.

Note: Recommendations work by looking at all of the workspace’s training data and returning examples from your data sources. When you reject suggestions, we maintain a list of phrases that are internally tagged as “not part of that intent”. This list is used to improve suggestions; you can think of it as an ephemeral binary classifier that helps narrow down your search until you have enough relevant examples.
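
To make the accept/reject idea concrete, here is a toy Python sketch. It is not how Studio computes suggestions: it scores candidates by word overlap (Jaccard similarity), rewarding closeness to accepted phrases and penalizing closeness to rejected ones. All function names and example phrases are made up for illustration.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two phrases, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score(candidate: str, accepted: List[str], rejected: List[str]) -> float:
    """Closeness to the nearest accepted phrase minus closeness to the nearest rejected one."""
    pos = max((jaccard(candidate, p) for p in accepted), default=0.0)
    neg = max((jaccard(candidate, n) for n in rejected), default=0.0)
    return pos - neg


def suggest(candidates: List[str], accepted: List[str], rejected: List[str], limit: int = 5) -> List[str]:
    """Rank unseen candidates, best first, using the accept/reject feedback so far."""
    seen = set(accepted) | set(rejected)
    pool = [c for c in candidates if c not in seen]
    return sorted(pool, key=lambda c: score(c, accepted, rejected), reverse=True)[:limit]


if __name__ == "__main__":
    candidates = ["how are you doing", "how old are you now", "book a hotel"]
    print(suggest(candidates, accepted=["how are you"], rejected=[]))
    print(suggest(candidates, accepted=["how are you"], rejected=["how old are you"]))
```

In this toy data, rejecting "how old are you" pushes the similar-looking "how old are you now" to the bottom of the list, which is the narrowing effect the note describes.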

Discovering new intents

Next, let’s take a look at the Unlabeled data section. This is where all utterances that haven't been assigned to an intent are located.

You’ll see a list of unlabeled utterances drawn from your data sources. Since you’ve already added some demo data, there should be plenty to work with. The search bar at the top is a full-text search feature that lets you find things the old-fashioned way. Try it first by searching for hotel: several intents can be created from the hotel-related utterances.

One of the initial matches is "Hi, I am looking for the rating of a hotel." Go ahead and select it. You'll notice that a new option is available right under the selection: Show similar suggestions. This button uses semantic search to look for similar phrases in the corpus. It's a good idea to mix these two techniques: full-text search gives you keyword-based results, while semantic search works from the meaning of the utterance and can return relevant matches that share few or no keywords with it.

Select a few examples where the user clearly asks for a hotel with a specific rating. Notice that the button is clickable again; clicking it will look for results similar to all of the selected items.

Tip: You can shift+click to select a range without clicking on each of them separately.
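
The search-then-expand loop described above can be sketched in Python. This is a toy stand-in under stated assumptions: real semantic search relies on learned embeddings, whereas this sketch uses plain bag-of-words cosine similarity, and it combines the selected phrases by summing their word counts. Apart from the first utterance, the corpus is invented for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: word -> count (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_search(corpus: List[str], query: str) -> List[str]:
    """Full-text search: keep utterances that contain the query string."""
    return [u for u in corpus if query.lower() in u.lower()]


def similar_to_selection(corpus: List[str], selected: List[str], limit: int = 3) -> List[str]:
    """Rank the remaining utterances by similarity to all selected ones combined."""
    combined = Counter()
    for phrase in selected:
        combined.update(vectorize(phrase))
    pool = [u for u in corpus if u not in selected]
    return sorted(pool, key=lambda u: cosine(vectorize(u), combined), reverse=True)[:limit]


if __name__ == "__main__":
    corpus = [
        "Hi, I am looking for the rating of a hotel.",
        "What is the star rating of that hotel?",
        "I need a place to stay with four stars",
        "Book me a doctor appointment",
    ]
    print(keyword_search(corpus, "hotel"))
    print(similar_to_selection(corpus, [corpus[0]], limit=1))
```

Note that the keyword search misses the "place to stay" utterance entirely. A word-count model ranks it low as well, which is exactly the gap that embedding-based semantic search is meant to close.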

Once you have enough elements, click the Label selected data button on the left, and click + Create here to create a new intent. Let's name it hotel_request_rating and click the Create and edit button.

Here are a few intents you may want to create:

  • Book an appointment
  • Reserve a hotel
  • Reserve a hotel with a specific rating (see if you can make this one a child intent of the reserve a hotel one)
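
One way to picture the parent/child relationship suggested in the last item is as a small tree. The Python sketch below is only a mental model, not Studio's data format. The Intent class, the full_path helper, and the names reserve_hotel and book_appointment are hypothetical; hotel_request_rating is the intent created earlier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Intent:
    name: str
    parent: Optional[str] = None  # name of the parent intent, if any
    examples: List[str] = field(default_factory=list)


def full_path(intents: Dict[str, Intent], name: str) -> str:
    """Walk up the parent links and return a path such as 'parent/child'."""
    parts = []
    current: Optional[str] = name
    while current is not None:
        parts.append(current)
        current = intents[current].parent
    return "/".join(reversed(parts))


if __name__ == "__main__":
    intents = {
        "book_appointment": Intent("book_appointment"),
        "reserve_hotel": Intent("reserve_hotel"),
        "hotel_request_rating": Intent(
            "hotel_request_rating",
            parent="reserve_hotel",
            examples=["Hi, I am looking for the rating of a hotel."],
        ),
    }
    for name in intents:
        print(full_path(intents, name))
```

Keeping the more specific intent under its parent lets the general examples stay with reserve_hotel while the rating-specific ones live in the child.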

Refactoring projects

While working on your project, you may decide that some intents should be merged together or broken down into more specific intents. In the Labeled data section, where you can view the list of training phrases for an intent, you'll notice a checkbox next to each phrase. Clicking it will automatically sort the rest of the list by similarity to the selected phrases. You can then select the similar phrases and move them using the left column, as we did with unlabeled utterances in the previous step.

Back to Botpress

We can export our changes using the command line:

$ hf botpress export --workspace playbook-X3NUP5UISVBB3AP66EVMFUWM
Using bot in directory: /home/mrene/work/bp/data/bots/smalltalk
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__rkmsylph53_hello.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__54qp2njdz0_how_are_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__r1kyn11enn_i_love_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__gvare098je_shut_up.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__0xbmxyusvu_tell_me_a_joke.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__2wys1802lo_thank_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__pfd6iuovva_what_are_you_doing.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__5i6fs3vzfj_what_can_you_do.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__f2osy09wyi_what_s_your_hobby.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__fejwh1accg_would_you_marry_me.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__4wywaoky6v_bye.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__4xue79vezl_how_old_are_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__3xcljo3jr5_where_are_you_right_now.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__bvj5soxdb7_who_are_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__ewbosxr8fa_who_created_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__uqvj6jdj32_you_are_annoying.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__d6q3rr8u64_you_are_great.json
Updating answers in '4wywaoky6v_bye.json'
Workspace 'playbook-X3NUP5UISVBB3AP66EVMFUWM' successfully exported and merged into bot data

You can now go back to Botpress and see your updated data.
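
To check what the export changed before committing it, you can count the training phrases in each intent file. The Python sketch below assumes that each file in the intents directory is a JSON object with an "utterances" field mapping a language code to a list of phrases; that layout is an assumption about the Botpress file format, so adjust it if your files differ. The function name count_utterances is our own.

```python
import json
from pathlib import Path
from typing import Dict


def count_utterances(intents_dir: str) -> Dict[str, int]:
    """Map each intent file name (without .json) to its number of training phrases."""
    counts: Dict[str, int] = {}
    root = Path(intents_dir)
    if not root.is_dir():
        return counts
    for path in sorted(root.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed layout: {"utterances": {"en": ["hi", "hello"]}}
        utterances = data.get("utterances", {})
        if isinstance(utterances, dict):
            counts[path.stem] = sum(len(phrases) for phrases in utterances.values())
        else:
            counts[path.stem] = len(utterances)
    return counts


if __name__ == "__main__":
    # Path relative to the Botpress root, as in the export output.
    for intent, total in count_utterances("data/bots/smalltalk/intents").items():
        print(f"{total:4d}  {intent}")
```

Running it before and after the export, or reviewing git diff on the intents directory, shows which intents gained phrases in Studio.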