There has been a recent surge in Visual Programming tools that enable developers to chain Large Language Model prompts into an application, thereby creating a conversational user interface.


A variety of chatbot development frameworks have incorporated Large Language Models (LLMs) to supplement their existing chatbot development features. Unfortunately, the LLM implementations across these frameworks have been largely similar.

This initial implementation was followed by a second wave of conversational flow generation and conversation design assistants, spearheaded by Cognigy, with Yellow AI and Kore AI following.

LLMs are highly versatile with open-ended capabilities.

In this article, I want to dive deeper into the underlying principles behind designing and creating a native, visual programming interface for prompt chaining.

Native Prompt Chaining User Interface

When Large Language Model prompts are chained through a Visual Programming UI, most of the functionality lies in the GUI that facilitates the authoring process.

Below is such a GUI for prompt engineering and prompt chain authoring. The design originated from research performed by the University of Washington and Google.


The interface must offer granularity at multiple levels of the authoring process. In a chain, the individual steps are joined by connections, and these connections are part of what the author designs.

Any prompt chaining or LLM chaining tool must support data transformation between the steps of the chain. Because the LLM returns unstructured data in most cases, some structuring and transformation is required before one step's output can be used by the next.
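As an illustration of that structuring step, here is a minimal Python sketch, assuming the LLM was asked for a list and replied in free text. The function name `parse_list_output` and the marker rules are hypothetical choices for this example, not part of any particular chaining tool. It keeps only lines that look like list items, so conversational preamble from the model is dropped before the data reaches the next step.

```python
import re

# Hypothetical rule: a list item starts with "1.", "2)", "-", "*" or "•" followed by whitespace.
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_list_output(llm_text: str) -> list[str]:
    """Turn free-text list output from an LLM into a clean Python list."""
    items = []
    for line in llm_text.splitlines():
        match = _LIST_MARKER.match(line)
        if match:  # keep list-like lines only; preamble chatter is dropped
            item = line[match.end():].strip()
            if item:
                items.append(item)
    return items


raw = """Here are some ideas:
1. Reset the password
2) Contact support
- Check the FAQ"""
print(parse_list_output(raw))  # ['Reset the password', 'Contact support', 'Check the FAQ']
```

The structured list can then be fed item by item into the next node, rather than passing the raw text along.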

A second element is the slightly unpredictable nature of LLMs: an LLM can generate different responses to the same prompt. In prompt chains this creates the risk of cascading undesired responses, where unexpected data produced by one step is propagated through the rest of the chain.
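One way to contain that cascade is to validate each step's output before it is passed on, resubmitting the prompt a few times and halting the chain if the output never passes. The sketch below assumes a generic `call_llm(prompt) -> str` function and a caller-supplied validator; both, the names `run_validated_step` and `ChainStepError`, and the retry count are illustrative assumptions rather than a prescribed design.

```python
from typing import Callable


class ChainStepError(Exception):
    """Raised when a step keeps producing output that fails validation."""


def run_validated_step(
    call_llm: Callable[[str], str],  # assumed: any function that sends a prompt to an LLM
    prompt: str,
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Call the LLM, resubmitting the prompt until the output passes validation."""
    for _ in range(max_attempts):
        output = call_llm(prompt)
        if is_valid(output):
            return output
    # Halt here instead of passing unexpected data to the next step in the chain.
    raise ChainStepError(f"no valid output after {max_attempts} attempts")


# A scripted stand-in for a real model: the first reply is chatty, the second is usable.
replies = iter(["Sure! The sentiment is positive.", "positive"])
result = run_validated_step(
    lambda p: next(replies),
    "Classify the sentiment as positive or negative: 'Great service!'",
    lambda out: out in {"positive", "negative"},
)
print(result)  # positive
```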

Prompts can be customised at run-time without any model training.

New tasks are easily absorbed by an LLM Chained application. This is achieved by ingesting natural language instructions called prompts.
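To make the point about run-time customisation concrete, here is a small sketch using Python's standard `string.Template`. The template text and the `build_prompt` helper are hypothetical; the idea is only that a node's behaviour changes by swapping the natural language instruction, with no model training involved.

```python
from string import Template

# Hypothetical prompt template for one chain node; the task is supplied at run time.
NODE_TEMPLATE = Template(
    "You are a helpful assistant.\n"
    "Task: $task\n"
    "Input: $user_input\n"
    "Answer:"
)


def build_prompt(task: str, user_input: str) -> str:
    """Fill the template: a new task needs only a new instruction, not retraining."""
    return NODE_TEMPLATE.substitute(task=task, user_input=user_input)


# The same node handles two different tasks purely through the instruction text.
print(build_prompt("Summarise the text in one sentence.", "The meeting moved to Friday."))
print(build_prompt("Translate the text into French.", "Good morning"))
```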

Granted, many real-world applications involve complex and often parallel multi-step tasks, which pose a challenge for a single, linear chain.
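A chain can mix both shapes: independent sub-tasks fanned out in parallel, followed by a step in series that combines their results. The sketch below assumes a synchronous `call_llm(prompt) -> str` function (here a hypothetical stand-in, `fake_llm`, so it runs offline) and uses threads from the Python standard library; the function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LLM = Callable[[str], str]


def parallel_step(call_llm: LLM, prompts: list[str]) -> list[str]:
    """Run independent prompts concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(call_llm, prompts))


def series_step(call_llm: LLM, template: str, previous: list[str]) -> str:
    """Feed the combined parallel outputs into the next prompt in the chain."""
    return call_llm(template.format(items="\n".join(previous)))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: echoes the prompt's first line."""
    return f"<answer to: {prompt.splitlines()[0]}>"


topics = ["pricing", "shipping", "returns"]
answers = parallel_step(fake_llm, [f"Write an FAQ answer about {t}." for t in topics])
summary = series_step(fake_llm, "Merge these answers into one FAQ page:\n{items}", answers)
print(summary)  # <answer to: Merge these answers into one FAQ page:>
```

Threads suit this case because LLM calls are typically I/O-bound network requests; `pool.map` preserves the order of the inputs, so the series step sees the results in a predictable sequence.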

As shown in the functionality stack below, a prompt chaining application based on Large Language Models (LLMs) & Generative AI should have three main components:

  1. LLMs
  2. Helpers
  3. Communication
Adapted from source


For many chain or node use-cases, the LLM output can be used unedited, exactly as the LLM supplied it.

In most cases this output will be presented to the user, or it will need to be transformed before being passed to the next chain in the LLM App (aka Gen App).

The LLM output determines which chain to branch out to; hence the edges between chains/nodes need to contain rules and conditions.
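The rules and conditions on those edges can be modelled as data: each edge names a target node and a predicate over the LLM output, and the first matching edge wins. This is a minimal sketch with hypothetical node names and simple keyword predicates; because the condition is just a callable, it could equally wrap a call to a classifier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Edge:
    """A transition to another node, guarded by a condition on the LLM output."""
    target: str
    condition: Callable[[str], bool]


def next_node(output: str, edges: list[Edge], default: str) -> str:
    """Return the target of the first edge whose condition matches, else a default node."""
    for edge in edges:
        if edge.condition(output):
            return edge.target
    return default


# Hypothetical edges leaving an "intent" node; order matters, first match wins.
edges = [
    Edge("refund_flow", lambda out: "refund" in out.lower()),
    Edge("order_status_flow", lambda out: "order" in out.lower()),
]
print(next_node("Customer asks about a REFUND", edges, default="fallback"))  # refund_flow
print(next_node("Hello there", edges, default="fallback"))  # fallback
```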

A last consideration is using the LLM as a classifier. LLMs can be used in a generative or a predictive capacity. Generative AI is the most popular and most accessible side of LLMs. Predictive use is trickier and requires more fine-tuning, or at least precise prompts.

Leveraging an LLM as a classifier (that is, predictively) within a chain makes the decision-making nodes of the chaining application intelligent and flexible.
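A sketch of such a classifier node, assuming a generic `call_llm(prompt) -> str` function: the prompt constrains the reply to a fixed label set, and the reply is normalised and checked so that an off-script answer routes to a fallback instead of an arbitrary branch. The intent labels, the `classify` name and the prompt wording are hypothetical.

```python
from typing import Callable

# Hypothetical intent labels for a customer-service chain.
INTENTS = ("billing", "technical_support", "sales")


def classify(
    call_llm: Callable[[str], str],  # assumed: any function that sends a prompt to an LLM
    utterance: str,
    labels: tuple[str, ...] = INTENTS,
    fallback: str = "unknown",
) -> str:
    """Use the LLM predictively: ask for exactly one label and validate the reply."""
    prompt = (
        "Classify the user message into exactly one of these labels: "
        + ", ".join(labels)
        + ".\nReply with the label only.\n"
        + f"Message: {utterance}\nLabel:"
    )
    reply = call_llm(prompt).strip().lower().rstrip(".")
    # Accept only a known label; anything else routes to a fallback node.
    return reply if reply in labels else fallback


print(classify(lambda p: " Billing.\n", "I was charged twice"))  # billing
```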

There will always be the temptation to quickly hard-code decision-making into the edges, also referred to as the chain-transition points. This is a quick fix, but it introduces rigidity into the LLM Chaining application.


Helpers assist in the evaluation of LLM output, considering aspects like politeness, empathy, etc. Helpers can also re-rank outputs or resubmit prompts.

As granularity is a requirement for any production implementation, being able to tweak the prompt-chained application will be important, and some form of scripting will be required to process and transform data.
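A helper of this kind can be as simple as a scoring function plus a sort. In the sketch below, the keyword-based `politeness_score` is a deliberately crude, hypothetical stand-in for a real evaluator (a trained classifier or a second LLM call would be more realistic). `rerank` orders candidate outputs best-first and keeps the original order for ties, since Python's `sorted` remains stable even with `reverse=True`.

```python
from typing import Callable

# Hypothetical word list; a real helper would more likely use a classifier or another LLM.
POLITE_MARKERS = ("please", "thank", "sorry", "happy to help")


def politeness_score(text: str) -> int:
    """Count polite markers as a crude stand-in for a politeness evaluator."""
    lowered = text.lower()
    return sum(lowered.count(marker) for marker in POLITE_MARKERS)


def rerank(candidates: list[str], score: Callable[[str], int] = politeness_score) -> list[str]:
    """Order candidate LLM outputs best-first by a human-designed criterion."""
    return sorted(candidates, key=score, reverse=True)


candidates = [
    "Your order shipped.",
    "Thank you for waiting! Happy to help.",
    "Sorry, please hold.",
]
print(rerank(candidates))
# ['Thank you for waiting! Happy to help.', 'Sorry, please hold.', 'Your order shipped.']
```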

Scripting encapsulates the development affordances given to the chain author to implement control measures, data transformation and more.


Communication takes place on a few levels. First, there is user input in the form of unstructured conversational data.

Second, there are other user actions, such as asking the user to select the best response or to disambiguate between a few possibilities.

Finally, third-party systems (APIs) will be interrogated in any enterprise or production implementation.

The Design Of PromptChainer

The PromptChainer interface has a chain view visualiser where the chain structure, with its nodes and the edges between them, can be created, read, edited and deleted [A].

The node viewer supports implementing, improving and testing each individual node or chain [B]. The prompt for each node can also be edited here.

PromptChainer also supports running the chain end-to-end [C] with options to clear the cache, view run logs, etc.
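The end-to-end run with a cache and a run log can be sketched as follows. This is not PromptChainer's implementation: `ChainRunner` is a hypothetical class that treats a chain as a linear list of `(node name, prompt template)` steps, memoises LLM calls by prompt so that repeated runs do not re-query the model, and records each step for inspection.

```python
from typing import Callable


class ChainRunner:
    """Hypothetical runner: executes a linear chain, caching LLM calls and logging steps."""

    def __init__(self, call_llm: Callable[[str], str]):
        self.call_llm = call_llm
        self.cache: dict[str, str] = {}
        self.log: list[tuple[str, str, str]] = []  # (node name, prompt, output)

    def clear_cache(self) -> None:
        """Forget cached outputs so the next run queries the model again."""
        self.cache.clear()

    def _cached_call(self, prompt: str) -> str:
        if prompt not in self.cache:
            self.cache[prompt] = self.call_llm(prompt)
        return self.cache[prompt]

    def run(self, steps: list[tuple[str, str]], user_input: str) -> str:
        """Each step's template has an {input} slot; its output feeds the next step."""
        data = user_input
        for name, template in steps:
            prompt = template.format(input=data)
            data = self._cached_call(prompt)
            self.log.append((name, prompt, data))
        return data


runner = ChainRunner(lambda prompt: prompt.upper())  # stand-in for a real model call
steps = [("draft", "Draft a reply to: {input}"), ("polish", "Polish this reply: {input}")]
print(runner.run(steps, "where is my order?"))
# POLISH THIS REPLY: DRAFT A REPLY TO: WHERE IS MY ORDER?
```

Caching by the exact prompt string means an unchanged upstream step is free on re-runs, which is useful while iterating on a later node; clearing the cache forces fresh samples.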


Based on a user study performed by the University of Washington and Google Research, several findings emerged on how users build and debug chains:

[1] Users build chains not only to mitigate LLM limitations, but also to make their applications extensible and scalable.

[2] Some users built one step of a chain at a time, while others sketched out abstract placeholders for all steps before filling them in.

[3] The interactions between multiple LLM prompts and chains can be complex, requiring both local and global debugging of prompts.

Considering the image below, here is a description of each node or chain:

[1] Define the input to a chain

[2,3] Use LLM output to filter and branch out inputs.

[4] Use the LLM output directly as the node output.

[5] Pre-implemented JavaScript functions for typical data transformation.

[6] Use the LLM output directly as the node output.

[7] Call external functions to connect professional services with LLMs.

[8] User-defined JS functions, in case pre-defined helpers are insufficient.

[9] Use the LLM output directly as the node output.

[10] Filter or re-rank LLM outputs by human-designed criteria, e.g., politeness. Enables external (end user) editing of intermediate data points.

Adapted from source
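The distinction between pre-implemented helpers ([5]) and user-defined functions ([8]) can be sketched as a lookup with a fallback. The helper names, `resolve_transform` and `apply_all` below are hypothetical, and the sketch uses Python rather than the JavaScript the figure refers to.

```python
from typing import Callable, Optional

Transform = Callable[[str], str]

# Hypothetical built-in helpers, analogous to pre-implemented transformation nodes.
BUILTIN_HELPERS: dict[str, Transform] = {
    "strip": str.strip,
    "lowercase": str.lower,
    "first_line": lambda text: text.splitlines()[0] if text else "",
}


def resolve_transform(name: str, user_defined: Optional[Transform] = None) -> Transform:
    """Prefer a built-in helper; fall back to a user-supplied function."""
    if name in BUILTIN_HELPERS:
        return BUILTIN_HELPERS[name]
    if user_defined is not None:
        return user_defined
    raise KeyError(f"unknown helper {name!r} and no user-defined function given")


def apply_all(text: str, transforms: list[Transform]) -> str:
    """Apply each transformation in order, as a series of helper nodes would."""
    for fn in transforms:
        text = fn(text)
    return text


pipeline = [
    resolve_transform("first_line"),
    resolve_transform("strip"),
    # No built-in covers this, so the author supplies a (hypothetical) custom function.
    resolve_transform("redact_digits", lambda t: "".join("#" if c.isdigit() else c for c in t)),
]
print(apply_all("  Order 1234 shipped  \nSecond line", pipeline))  # Order #### shipped
```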


Several terms are currently used for applications built on Foundation Models and Large Language Models: Gen Apps, Generative Apps, LLM Apps, Prompt Chaining, LLM Chaining and more.

Suffice it to say that there is a need to build on Large Models and somehow harness the power of LLMs.

This need will be underpinned by the following principles:

  1. Prompts
  2. Leveraging LLM Predictive and Generative capabilities.
  3. Chaining of Prompts; in both a parallel and series fashion.
  4. A visual editor for programming chains/nodes and the edges between nodes/chains.
  5. LLM Response data transformation.
  6. LLM Chained Applications will be conversational for both input and output, or the output can also encapsulate an RPA component.

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language, including NLU design, evaluation & optimisation, data-centric prompt tuning, and LLM observability, evaluation & fine-tuning.