If you’re using Excel to collaborate around the training data that powers your NLU, you’re not alone: a shocking 85% of the companies we’ve surveyed resort to tools not made for NLU.

NLU teams are still using Excel across a multitude of use cases: conversational analytics, chatbots, virtual assistants, voice apps, and more. As our customers report, it’s been a consistent source of frustration.

Businesses are constantly having to aggregate data across channels, use CTRL-F to explore their unstructured data, and copy-paste to manually organize and label intents and utterances. As if that weren’t a big enough headache, teams also fix errors manually (and only when they’re spotted, of course).

Using Excel this way is a living hell.

Being able to easily work with and transform data is the key to understanding what stories it can tell, and what business value it can unlock. Using tooling that was not meant for natural language data unnecessarily complicates NLU projects.

Here are some reasons why:

Spreadsheets suck up your time

Life is short, time is precious; you don’t want to spend it aggregating, cleansing, identifying, augmenting, and labeling your data.

Most enterprise-grade NLU projects involve millions of data points, so tedious but necessary processes like the ones described above can take days or even weeks without the help of AI or ML. Even with ten power users on your team, using Excel to create and maintain a large database of intents and their training data will unnecessarily slow your time to market.

It’s no wonder machine learning projects so often run into data engineering problems.

In a spreadsheet-based environment, extracting, aggregating, and summarizing information is also time-consuming, making it slow to support decision-making.

Consider this: an end-to-end solution for continuously evolving NLU, with machine learning assistance from beginning to end. With workflows around discovery, augmentation, organization, labeling, testing, and fixing, teams have time to focus on what matters most: operationalization, model tuning and training, and algorithm development.

Spreadsheets make collaboration difficult

Although Google Sheets (the cloud alternative to Excel) enables real-time collaboration, teams are still hesitant to use it because it lacks some of the professional features Excel offers. And when working with large datasets, Google Sheets is often cited as being slower than Internet Explorer circa 2003.

If you resort to Excel instead, a collaboration problem arises.

First, data rarely stays in a single location. When files change hands between people who aren’t necessarily versed in best practices (backups, security, change tracking, etc.), the integrity of the data is compromised. People struggle to keep track of the latest files and changes in a decentralized process. This is especially true for omnichannel conversational experiences, where data comes in from so many different sources.

Consider this: a centralized data hub for your NLU projects. HumanFirst tracks all changes made within workspaces, with the ability to revert to any previous state. It allows all teams (developers, product owners, data scientists, and annotators) to collaborate around the data, across all use cases.

Spreadsheets aren’t optimized for working with text data

Excel is powerful, but it was clearly built for numbers and lacks critical features for working seamlessly with text data. There are ways to analyze text in Excel (word clouds, word-count bar charts, sentiment analysis), but they often involve long, convoluted formulas. Even counting the words in a single cell requires something like:

=LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1)," ",""))+1

Like, whatttt?!
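For contrast, the same kind of word-frequency analysis takes only a few lines of plain Python (a minimal sketch; the utterances below are made-up examples):

```python
# Word-frequency analysis of utterances, the kind of text processing that
# takes convoluted formulas in Excel, sketched in plain Python.
# (Illustrative only: the utterances are made-up examples.)
import re
from collections import Counter

utterances = [
    "I want to reset my password",
    "how do I reset my password",
    "reset password please",
]

# Tokenize each utterance into lowercase words and count occurrences.
words = [w for u in utterances for w in re.findall(r"[a-z']+", u.lower())]
counts = Counter(words)

# The most frequent words across the whole dataset.
print(counts.most_common(3))
```

No SUBSTITUTE gymnastics required, and the same few lines scale to millions of utterances.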

Text data was also never meant to be tested in Excel. Without workflows around error control, quick iteration is impossible.

Consider this: natural language training data as a first-class citizen. We aim for HumanFirst to be as ubiquitous a tool as Excel, exposing powerful operations for organizing and transforming natural language data:

  • Semantic Search: Index and prepare all of your unlabeled data for both full-text and semantic search, which is 10x more efficient than CTRL-F.
  • Labeling: Quickly drill down into your unlabeled data to find semantically similar utterances, and label hundreds of utterances in a single click.
  • Organizing: Scale to thousands of intents and manage them with hierarchical organization.
  • Clustering: Explore your data with interactive clustering, and modify the clustering parameters (granularity, cluster size) in real time.
  • Disambiguating: Quickly view what intents are conflicting, and ensure each intent's scope is as clear and specific as possible (both real-time, and based on your trained model).
  • Evaluating: On-demand 5-fold cross-validation analysis against HumanFirst’s NLU (or your own), providing intent-level metrics (F1, precision, recall, accuracy) you can use to understand and tune your model.
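To give a rough idea of what that kind of evaluation produces, here is a minimal sketch of 5-fold cross-validation over a toy intent dataset, using scikit-learn as a stand-in NLU (the intent names and utterances are hypothetical, not HumanFirst's actual pipeline):

```python
# 5-fold cross-validation producing per-intent precision / recall / F1,
# sketched with scikit-learn on a toy dataset (hypothetical intents).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

utterances = [
    "reset my password", "I forgot my password", "change my password",
    "password reset please", "need a new password", "can't log in, reset password",
    "cancel my order", "I want to cancel the order", "please cancel this order",
    "stop my order", "cancel the purchase", "how do I cancel an order",
]
intents = ["reset_password"] * 6 + ["cancel_order"] * 6

# A simple text classifier standing in for the NLU under test.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())

# Each utterance is predicted by a model trained on the other 4 folds.
predicted = cross_val_predict(model, utterances, intents, cv=5)

# Per-intent precision, recall, and F1, plus overall accuracy.
print(classification_report(intents, predicted))
```

The intent-level breakdown is what makes these metrics actionable: a low-recall intent usually needs more (or more varied) training utterances, while low precision hints at scope overlap with another intent.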

Simply put, HumanFirst is the future of NLU data management.

Spreadsheets have little reusability

Copy-paste is also a flawed way to bring data into new projects: the data has to be in the exact same format and physically aligned before it can bootstrap a new project.

Consider this: Reusable modular intent catalogs. Import and re-use any intent (and its accompanying training data) across verticals, projects, and workspaces.

Efficiency is everything, and HumanFirst is here to help you optimize your time.

To sum up:

As powerful as Excel is at analyzing numbers and performing calculations, it was never intended for natural language data. Many teams rely on it for its familiarity, but as you can see, using Excel in your NLU projects is highly inefficient, costly in human effort (and mistakes), and prevents quick iteration, testing, and speed to market.

Using HumanFirst, a tool specifically designed for NLU, lets you quickly explore, search, and discover all the semantically similar utterances contained in your raw data before organizing them into labeled intents. This drastically accelerates your understanding of the data, as well as your team’s ability to curate and collaborate around qualitative data.