HumanFirst Studio was built in order to manage and continuously improve the training data of large conversational assistants, identifying valuable training data from existing sources that are often available but hard to tap into without proper tooling.
In this article, we’ll see how to use available datasets, or your own, to create a Botpress bot from scratch without having to come up with every single training phrase. We’ll also see how to use our command line tool, hf, to seamlessly integrate Botpress with Studio in a git-oriented workflow.
Note: What you’ll learn in this article can also be applied to the continuous improvement of deployed Botpress projects.
You will need a HumanFirst Studio account to go through this tutorial. You can create a free account to get started.
Install the HumanFirst CLI tool
Download one of our precompiled binaries at: https://github.com/zia-ai/humanfirst/releases/tag/cli-0.0.4
Choose the binary for your operating system.
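For Linux, run the following (adjust the release tag in the URL if you picked a different release):
curl -L -o hf-linux-amd64 https://github.com/zia-ai/humanfirst/releases/download/cli-0.0.2/hf-linux-amd64
chmod +x hf-linux-amd64
sudo mv hf-linux-amd64 /usr/local/bin/hf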
You can then log in to your Studio account from the command line:
hf auth login <email here>
You should then see something like this, indicating you have logged in properly.
Authenticated as email@example.com (org=example uid=ABCDEFGHIJ)
Install Botpress
Download the latest Botpress archive from the Botpress downloads area.
Start the server using the provided binary.
You can now navigate to http://localhost:3000/ and create the admin user to begin using Botpress.
Starting a new Botpress project
Once you have logged in, click the Create Bot button on the top right of the screen, then select New Bot. Let's pick an example that's already among the Botpress templates. Call it smalltalk and select Small Talk in the bot templates dropdown, a bit further down. Click Create Bot to confirm.
This bot is fairly simple, and most of its logic lies within the Q&A section.
Importing your new Botpress project into Studio
Now that you have a bot, create a workspace into which you’ll import your data. This is essentially a labeled container that holds your intents and lets you manage and improve them.
$ hf workspace create Smalltalk
Created: playbook-X3NUP5UISVBB3AP66EVMFUWM
The playbook-* string is the unique identifier of your new workspace. It won't change, so it's safe to add it to any local scripts you might use during the project's lifecycle.
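If you want a local script to pick up the identifier rather than copying it by hand, one option is to extract it from the command's output. This is a minimal sketch, assuming the CLI prints a line of the form "Created: playbook-<ID>" as in the sample output above; the function name parse_workspace_id is hypothetical and not part of the hf tool.

```python
import re
from typing import Optional

# Assumption: the identifier is "playbook-" followed by uppercase letters and
# digits, as in the sample output shown in this article.
WORKSPACE_ID = re.compile(r"Created:\s*(playbook-[A-Z0-9]+)")


def parse_workspace_id(cli_output: str) -> Optional[str]:
    """Return the playbook-* identifier found in the CLI output, or None."""
    match = WORKSPACE_ID.search(cli_output)
    return match.group(1) if match else None
```

You could feed it the text captured from hf workspace create and store the result in a configuration file that lives next to your bot.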
The Botpress integration relies on running our hf command line tool within Botpress' root directory, or within the bot directory itself. We'll assume here that you are running all commands from the Botpress root. You can always look at our full command-line tool documentation for more information on the available options, or rely on hf botpress --help.
Push your project data to your HumanFirst workspace.
$ hf botpress --bot smalltalk import --workspace playbook-X3NUP5UISVBB3AP66EVMFUWM --clear
Using bot in directory: '/home/mrene/work/bp/data/bots/smalltalk'
Zipping intents/__qna__0xbmxyusvu_tell_me_a_joke.json
Zipping intents/__qna__2wys1802lo_thank_you.json
Zipping intents/__qna__3xcljo3jr5_where_are_you_right_now.json
Zipping intents/__qna__4wywaoky6v_bye.json
Zipping intents/__qna__4xue79vezl_how_old_are_you.json
Zipping intents/__qna__54qp2njdz0_how_are_you.json
Zipping intents/__qna__5i6fs3vzfj_what_can_you_do.json
Zipping intents/__qna__bvj5soxdb7_who_are_you.json
Zipping intents/__qna__d6q3rr8u64_you_are_great.json
Zipping intents/__qna__ewbosxr8fa_who_created_you.json
Zipping intents/__qna__f2osy09wyi_what_s_your_hobby.json
Zipping intents/__qna__fejwh1accg_would_you_marry_me.json
Zipping intents/__qna__gvare098je_shut_up.json
Zipping intents/__qna__pfd6iuovva_what_are_you_doing.json
Zipping intents/__qna__r1kyn11enn_i_love_you.json
Zipping intents/__qna__rkmsylph53_hello.json
Zipping intents/__qna__uqvj6jdj32_you_are_annoying.json
Bot data successfully imported into workspace 'playbook-X3NUP5UISVBB3AP66EVMFUWM'
Note: Since the commands are run from the Botpress root folder, we have to specify the bot id that you chose in the Create Bot dialog. If you didn't name your bot smalltalk, you'll have to edit the command accordingly.
Note: We use --clear in order to erase the workspace's contents so that it reflects exactly what you have in your repository. It's not necessary the first time, but it's a good way to bring in changes that someone else committed to the repository.
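In a git-oriented workflow you will likely re-run this import after pulling changes, so it can be convenient to wrap it in a small script. The sketch below only assembles the flags that appear in the command above (--bot, import, --workspace, --clear); the helper names build_import_command and run_import are hypothetical, and the script assumes hf is on your PATH and that you run it from the Botpress root.

```python
import subprocess
from typing import List


def build_import_command(bot: str, workspace: str, clear: bool = True) -> List[str]:
    """Assemble the argument list for the import command shown above."""
    command = ["hf", "botpress", "--bot", bot, "import", "--workspace", workspace]
    if clear:
        # --clear erases the workspace so it mirrors the repository contents.
        command.append("--clear")
    return command


def run_import(bot: str, workspace: str, clear: bool = True) -> None:
    """Run the import; check=True raises if hf exits with a non-zero status."""
    subprocess.run(build_import_command(bot, workspace, clear), check=True)
```

Passing the arguments as a list avoids shell quoting issues if a bot id ever contains unusual characters.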
http://studio.humanfirst.ai/ will now show your newly created workspace along with the intents imported from the Botpress project.
Adding more data
We’ll add some phrases to the existing intents. We can use publicly available datasets to search for training phrases that fit. Since the intents added when the Botpress bot was initialized are pretty generic, there is a good chance we'll find relevant matches.
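Before looking for new phrases, it can help to see which intents are thin. The following is a rough sketch that counts training phrases per intent file in a directory such as data/bots/smalltalk/intents. It assumes each intent file is a JSON object with a "name" and an "utterances" field keyed by language code (for example "en"); that layout is an assumption about the Botpress file format, so check one of your own files first. The function name count_utterances is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def count_utterances(intents_dir: str, lang: str = "en") -> Dict[str, int]:
    """Map each intent name to its number of training phrases for one language."""
    counts: Dict[str, int] = {}
    for path in sorted(Path(intents_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            intent = json.load(handle)
        # Assumed layout: {"name": "...", "utterances": {"en": ["...", ...]}}
        utterances = intent.get("utterances", {})
        phrases = utterances.get(lang, []) if isinstance(utterances, dict) else utterances
        # Fall back to the file name when the intent has no "name" field.
        counts[intent.get("name", path.stem)] = len(phrases)
    return counts
```

Intents with only a handful of phrases are good first candidates for the augmentation steps that follow.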
In Studio, click on the Data sources menu item on the left, then click the Use one of our data sets button to add existing conversations to your project. There are many choices available, but for this tutorial pick the STAR dataset, which contains goal-oriented conversations for different tasks. If you have existing data, either from human-human conversations or a list of unclassified utterances, this is where you would import it into your workspace.
Augmenting existing intents
Now that we have some unlabeled data to work with, we can expand the currently defined intents.
In the Labeled data section, you'll find the list of imported intents. Activating one will bring up the list of its associated training examples. Click the Get Suggestions button and some suggestions will be provided from the dataset you added in the previous step. You can then accept the training examples that make sense. The None of these look good button rejects the remaining elements.
Note: Recommendations work by looking at all of the workspace’s training data and returning examples from your data sources. When you reject suggestions, we maintain a list of phrases that are internally tagged as “not part of that intent”. This list is used to improve suggestions. You can think of it as an ephemeral binary classifier that helps narrow down your search until you get enough relevant examples.
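To make the idea of accepted and rejected phrases steering the suggestions more concrete, here is a toy illustration. It is not how Studio is implemented: it swaps in plain bag-of-words cosine similarity, and scores each candidate by its closeness to the accepted phrases minus its closeness to the rejected ones. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase tokens split on whitespace."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_suggestions(accepted: List[str], rejected: List[str], candidates: List[str]) -> List[str]:
    """Order candidates by (best match among accepted) - (best match among rejected)."""
    def score(text: str) -> float:
        vec = vectorize(text)
        positive = max((cosine(vec, vectorize(p)) for p in accepted), default=0.0)
        negative = max((cosine(vec, vectorize(n)) for n in rejected), default=0.0)
        return positive - negative
    return sorted(candidates, key=score, reverse=True)
```

With "tell me a joke" accepted and nothing rejected, "tell me the weather" outranks "do you know a joke" because of the shared words; once "tell me the time" is rejected, the order flips, which is the narrowing effect described in the note.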
Discovering new intents
Next, let’s take a look at the Unlabeled data section. This is where all utterances that haven't been assigned to an intent are located.
You’ll see a list of unlabeled utterances sourced from your data sources. Since you’ve already added some demo data, there should be a lot of it. The search bar on top is a full-text search feature allowing you to find things the old-fashioned way. Try it first by searching for hotel: there are a few intents that can be created relating to hotels.
One of the initial matches is "Hi, I am looking for the rating of a hotel." Go ahead and select it. You'll notice that a new option is available right under the selection: Show similar suggestions. This button will use semantic search to look for similar phrases in the corpus. It's a good idea to mix these two techniques, because full-text search gives you keyword-based results, while semantic search expands on the meaning of the utterance and returns more relevant matches.
Select a few examples where the user clearly asks for a hotel with a specific rating. Notice that the button is clickable again; clicking it will look for results similar to all selected items.
Tip: You can shift+click to select a range without clicking on each of them separately.
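The difference between the two search modes, and the idea of searching by similarity to everything you have selected, can be sketched in a few lines. This is only an illustration under simplifying assumptions: bag-of-words vectors stand in for the sentence embeddings a real semantic search would use, and apart from the first utterance the example phrases are made up. The function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def keyword_search(query: str, corpus: List[str]) -> List[str]:
    """Full-text style search: keep utterances that contain the query string."""
    return [u for u in corpus if query.lower() in u.lower()]


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_to_selection(selected: List[str], corpus: List[str], top_k: int = 3) -> List[str]:
    """Rank unselected utterances by similarity to the combined selection."""
    combined = Counter()
    for utterance in selected:
        # Summing the vectors is the centroid up to scale; cosine ignores scale.
        combined.update(bag_of_words(utterance))
    others = [u for u in corpus if u not in selected]
    return sorted(others, key=lambda u: cosine(bag_of_words(u), combined), reverse=True)[:top_k]


corpus = [
    "Hi, I am looking for the rating of a hotel.",
    "What is the rating of this place?",
    "Book me a hotel tonight",
    "Can you call me a taxi",
]
```

Here keyword_search("hotel", corpus) returns only the two utterances that contain the word, while similar_to_selection(corpus[:1], corpus, top_k=1) surfaces the rating question that never mentions "hotel", which is why combining both techniques tends to find more candidates.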
Once you have enough elements, click the Label selected data button on the left, and click + Create here to create a new intent. Let's name it hotel_request_rating and click the Create and edit button.
Here are a few intents you may want to create:
- Book an appointment
- Reserve a hotel
- Reserve a hotel with a specific rating (see if you can make this one a child intent of the reserve a hotel one)
While working on your project, you may decide that some intents should be merged together or even broken down into more specific intents. In the Labeled data section, where you can view the list of training phrases for an intent, you'll notice a checkbox next to each phrase. Clicking it will automatically sort the rest of the list by similarity to the selected phrases. You can select the similar phrases and move them using the left column, as we did with unlabeled utterances in the previous step.
Back to Botpress
We can export our changes using the command line:
$ hf botpress export --workspace playbook-X3NUP5UISVBB3AP66EVMFUWM
Using bot in directory: /home/mrene/work/bp/data/bots/smalltalk
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__rkmsylph53_hello.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__54qp2njdz0_how_are_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__r1kyn11enn_i_love_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__gvare098je_shut_up.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__0xbmxyusvu_tell_me_a_joke.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__2wys1802lo_thank_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__pfd6iuovva_what_are_you_doing.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__5i6fs3vzfj_what_can_you_do.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__f2osy09wyi_what_s_your_hobby.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__fejwh1accg_would_you_marry_me.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__4wywaoky6v_bye.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__4xue79vezl_how_old_are_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__3xcljo3jr5_where_are_you_right_now.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__bvj5soxdb7_who_are_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__ewbosxr8fa_who_created_you.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__uqvj6jdj32_you_are_annoying.json
Writing to /home/mrene/work/bp/data/bots/smalltalk/intents/__qna__d6q3rr8u64_you_are_great.json
Updating answers in '4wywaoky6v_bye.json'
Workspace 'playbook-X3NUP5UISVBB3AP66EVMFUWM' successfully exported and merged into bot data
You can now go back to Botpress and see your updated data.
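If you would rather check what the export changed before committing it, one option is to snapshot each intent's phrases before and after the export and compare the two. The sketch below is a hypothetical helper, not part of hf or Botpress: it takes two dictionaries mapping intent names to lists of phrases, which you would build yourself from the bot's intent files, and reports the phrases that are new.

```python
from typing import Dict, List


def added_utterances(
    before: Dict[str, List[str]], after: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return, per intent, the phrases present after the export but not before."""
    added: Dict[str, List[str]] = {}
    for intent, phrases in after.items():
        known = set(before.get(intent, []))
        new_phrases = [p for p in phrases if p not in known]
        if new_phrases:
            # Intents created in Studio appear here with all of their phrases.
            added[intent] = new_phrases
    return added
```

Since the bot data lives in a git repository, a plain diff of the intents directory gives you the same information at file level; the helper is mainly useful if you want a per-intent summary.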