# Parsing natural language to structured output

## **Goal**

We will **modify our workflow** so that:

1. The user inputs a query like **“latest news in Japan today”**.
2. **ChatGPT extracts** structured details from the input (e.g., country, category, keyword).
3. The extracted details are **passed to the Mediastack Node**, dynamically filtering news.

***

## **Step 1: Add a New ChatGPT Node for Information Extraction**

We need a new ChatGPT node to **extract structured data** from the user’s input before passing it to the Mediastack Node.

<figure><img src="/files/qHwJt2OK3HNRdnqYfhrj" alt=""><figcaption><p>New workflow setup with only ChatOpenAI node</p></figcaption></figure>

{% stepper %}
{% step %}
Remove the connection between the Input and Output nodes; we will route the input through another **ChatOpenAI Node** instead.

{% endstep %}

{% step %}
Connect the **Input Node’s** `message` output to the new **ChatGPT Node’s** `message` input.

{% endstep %}

{% step %}
**Test it** by sending the message:

* *“latest news in Malaysia today”*
* Right now, ChatGPT doesn’t understand what to do, because we haven’t given it proper instructions.
  {% endstep %}
  {% endstepper %}

## **Step 2: Instruct ChatGPT to Extract Information**

To make ChatGPT extract information, we need to **provide a system instruction**. However, the **ChatGPT Node** has only **one input**, and we need to send **both a system instruction and the user query**. To work around this, we use a JavaScript Node that **combines them into a single string**, so that ChatGPT receives both in one message.

{% stepper %}
{% step %}

### Add a JavaScript Node

* Double-click an empty space in the editor and search for **JavaScript** to quickly add it.
* Name it **"System Prompt"** for clarity.
* Define **two inputs**:
  * **User Input** (`user` - string) → This will hold the user’s message.
  * **System Prompt** (`system` - string) → This will contain the instruction:\
    `"Extract the country, category, and keyword from the following user request"`
* Define **one output** (`output` - string).
  {% endstep %}

{% step %}
**Modify the JavaScript Code**

<figure><img src="/files/Yy9hbvnt1TBz3qDPUfDM" alt=""><figcaption><p>Configuring Javascript Node</p></figcaption></figure>

* Inside the **Function Code** of the JavaScript Node, enter the following:

  ```javascript
  // inputs follow the order defined above: [0] = user, [1] = system
  return `system: ${inputs[1]}\nuser: ${inputs[0]}`;
  ```
* This ensures that ChatGPT receives the prompt in a structured format:

  ```
  system: Extract the country, category, and keyword from the following user request
  user: latest news in Malaysia today
  ```
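
To try the prompt-combining logic outside the editor, a template like the one above can be wrapped in a plain function. This is only a minimal sketch: `buildPrompt` is a hypothetical name, and it assumes the node passes its inputs as an array in the order they were defined (`user` first, `system` second).

```javascript
// Hypothetical stand-alone version of the "System Prompt" node body.
// Assumes inputs[0] is the user message and inputs[1] is the system instruction.
function buildPrompt(inputs) {
  const [user, system] = inputs;
  return `system: ${system}\nuser: ${user}`;
}

const prompt = buildPrompt([
  "latest news in Malaysia today",
  "Extract the country, category, and keyword from the following user request",
]);
console.log(prompt);
```

Running this with Node.js prints the two-line prompt, which makes it easy to check the format before wiring the node into the workflow.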

{% endstep %}

{% step %}
**Link the Inputs & Outputs**

* Connect:
  * The **user’s message** from the **Input Node** to the **User Input** of the JavaScript Node.
  * A **Text Area Node** containing the system instruction to the **System Input** of the JavaScript Node.
  * The **JavaScript Node's output** to the **message input** of the ChatGPT Node.
    {% endstep %}

{% step %}
**Test the Setup**

<figure><img src="/files/bnr2DD9glwL9AOXnKmv8" alt=""><figcaption><p>Workflow integrated with Javascript Node</p></figcaption></figure>

* Type: **"latest news in Malaysia today"**
* ChatGPT should now be able to extract the information.
  {% endstep %}
  {% endstepper %}

## **Step 3: Using a JSON Schema for ChatGPT Extraction**

In the previous step, we saw that the extracted data was incorrect:

```
country: Malaysia
category: news
keyword: latest news today
```

This happens because **ChatGPT doesn’t know exactly how to format its response** to meet the Mediastack API requirements. For example, Mediastack expects a short country code such as `jp` rather than a country name.

To **fix this**, we will provide **a JSON Schema** to **standardize the extracted data format**.

{% stepper %}
{% step %}

#### **Prepare the JSON Schema**

We generated the JSON Schema for the [**Mediastack Live News API**](https://mediastack.com/documentation) using ChatGPT and the official JSON Schema standards ([json-schema.org](https://json-schema.org/draft/2020-12/schema)).

Here is an example JSON Schema that **enforces the correct format**:

{% code overflow="wrap" %}

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "sources": {
      "type": "string",
      "description": "Use this parameter to include or exclude one or multiple comma-separated news sources. Example: 'cnn,-bbc' to include CNN and exclude BBC.",
      "pattern": "^[a-zA-Z0-9,-]+$"
    },
    "categories": {
      "type": "string",
      "description": "Use this parameter to include or exclude one or multiple comma-separated news categories. Example: 'business,-sports' to include business and exclude sports.",
      "pattern": "^[a-zA-Z,-]+$"
    },
    "countries": {
      "type": "string",
      "description": "Use this parameter to include or exclude one or multiple comma-separated countries. Example: 'au,-us' to include Australia and exclude the US.",
      "pattern": "^[a-zA-Z,-]+$"
    },
    "languages": {
      "type": "string",
      "description": "Use this parameter to include or exclude one or multiple comma-separated languages. Example: 'en,-de' to include English and exclude German.",
      "pattern": "^[a-zA-Z,-]+$"
    },
    "keywords": {
      "type": "string",
      "description": "Use this parameter to search for sentences or exclude words. Example: 'new movies 2021 -matrix' to search for 'New movies 2021' but exclude 'Matrix'."
    },
    "date": {
      "type": "string",
      "description": "Use this parameter to specify a date or date range. Examples: '2020-01-01' for a specific date or '2020-12-24,2020-12-31' for a date range.",
      "pattern": "^\\d{4}-\\d{2}-\\d{2}(,\\d{4}-\\d{2}-\\d{2})?$"
    },
    "sort": {
      "type": "string",
      "description": "Use this parameter to specify a sorting order. Available values: 'published_desc' (default), 'published_asc', 'popularity'.",
      "enum": [
        "published_desc",
        "published_asc",
        "popularity"
      ],
      "default": "published_desc"
    },
    "limit": {
      "type": "integer",
      "description": "Use this parameter to specify a pagination limit (number of results per page). Default is 25, maximum allowed is 100.",
      "minimum": 1,
      "maximum": 100,
      "default": 25
    },
    "offset": {
      "type": "integer",
      "description": "Use this parameter to specify a pagination offset value. Default is 0, starting with the first available result.",
      "minimum": 0,
      "default": 0
    }
  },
  "additionalProperties": false
}

```

{% endcode %}
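
To see what these constraints mean in practice, the rules can be written as plain JavaScript checks. This is a hedged sketch, not a full JSON Schema validator: `RULES` and `checkExtraction` are hypothetical names, and only the `type`, `pattern`, `enum`, `minimum`, `maximum`, and `additionalProperties` keywords from the schema above are covered.

```javascript
// Constraints copied from the schema above, expressed as plain checks.
// This is NOT a full JSON Schema validator; a dedicated library would be needed for that.
const LIST = /^[a-zA-Z,-]+$/;
const RULES = {
  sources: { type: "string", pattern: /^[a-zA-Z0-9,-]+$/ },
  categories: { type: "string", pattern: LIST },
  countries: { type: "string", pattern: LIST },
  languages: { type: "string", pattern: LIST },
  keywords: { type: "string" },
  date: { type: "string", pattern: /^\d{4}-\d{2}-\d{2}(,\d{4}-\d{2}-\d{2})?$/ },
  sort: { type: "string", enum: ["published_desc", "published_asc", "popularity"] },
  limit: { type: "integer", minimum: 1, maximum: 100 },
  offset: { type: "integer", minimum: 0 },
};

// Returns a list of human-readable problems; an empty list means every check passed.
function checkExtraction(data) {
  const errors = [];
  for (const [key, value] of Object.entries(data)) {
    const rule = RULES[key];
    if (!rule) {
      // Mirrors "additionalProperties": false.
      errors.push(`unexpected property: ${key}`);
      continue;
    }
    const typeOk =
      rule.type === "integer" ? Number.isInteger(value) : typeof value === rule.type;
    if (!typeOk) {
      errors.push(`${key}: expected ${rule.type}`);
      continue;
    }
    if (rule.pattern && !rule.pattern.test(value)) errors.push(`${key}: bad format`);
    if (rule.enum && !rule.enum.includes(value)) errors.push(`${key}: not an allowed value`);
    if (rule.minimum !== undefined && value < rule.minimum) errors.push(`${key}: below minimum`);
    if (rule.maximum !== undefined && value > rule.maximum) errors.push(`${key}: above maximum`);
  }
  return errors;
}

console.log(checkExtraction({ categories: "sports", countries: "jp", date: "2023-12-07" }));
console.log(checkExtraction({ country: "Malaysia", category: "news" }));
```

The first call passes every check, while the second is rejected because `country` and `category` are not property names defined in the schema.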

{% endstep %}

{% step %}

#### **Integrate the JSON Schema into ChatGPT**

1. **Add a "Const Data" Node** to store the JSON Schema.
   * Double-click an empty space in the editor, search **Data**, and add it.
   * Paste the JSON Schema into the **Const Data** Node.
2. **Modify the System Prompt**
   * Change the prompt in the **Text Area Node** to:

     ```
     Extract the mediastack data from the following user request
     ```
3. **Connect the Schema to the ChatGPT Node**
   * Link the **Const Data (JSON Schema Node)** to the **Schema input** of the ChatGPT Node.
   * Now, ChatGPT will **refer to the schema** before generating the extracted response.

{% endstep %}

{% step %}

### Test the result

<figure><img src="/files/PEqJOiQrcEjTwI3wjB1A" alt=""><figcaption><p>Agent setup with correct schema</p></figcaption></figure>

1. Send a message like:

   ```
   latest sports news in Japan today
   ```
2. The **correct output should now be**:

   <pre class="language-json"><code class="lang-json">{
   <strong>    "categories": "sports",
   </strong>    "countries": "jp",
       "date": "2023-12-07"
   }
   </code></pre>

{% endstep %}
{% endstepper %}

## **Step 4: Connect everything together**

Next, we’ll **connect the structured data to the Mediastack API** and dynamically fetch news based on user input!

ChatGPT outputs **JSON as a string** rather than an actual JSON object, so we need to **parse the stringified JSON** before passing it to Mediastack.

{% stepper %}
{% step %}
**Create a JavaScript Node to Parse JSON**

<figure><img src="/files/iTqXXiqbrTgWMfblvZuR" alt=""><figcaption><p>Javascript Node to parse JSON</p></figcaption></figure>

1. **Double-click an empty space** in the editor and search for **JavaScript**.
2. **Rename it to "Parse JSON"** for clarity.
3. Set the **Code**, **Inputs**, and **Outputs** as shown in the figure above.
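
As a rough idea of what the parsing code could look like, here is a minimal sketch. It assumes the node receives ChatGPT's response as its first input (`inputs[0]`, the same convention as the System Prompt node); `parseModelJson` is a hypothetical wrapper so the logic can run outside the editor. Trimming any text around the JSON object is an extra precaution, not something this workflow is known to require.

```javascript
// Hypothetical stand-alone version of the "Parse JSON" node body.
// Assumes inputs[0] holds ChatGPT's response as a string.
function parseModelJson(inputs) {
  const raw = String(inputs[0]);
  // Precaution: keep only the text between the first "{" and the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in response");
  }
  try {
    return JSON.parse(raw.slice(start, end + 1));
  } catch (err) {
    // Surface a clearer message than the default SyntaxError.
    throw new Error(`Response is not valid JSON: ${err.message}`);
  }
}

const parsed = parseModelJson(['{"categories": "sports", "countries": "jp"}']);
console.log(parsed.countries);
```

Inside the node itself, only the function body would be needed, with the parsed object as the return value.
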
   {% endstep %}

{% step %}

#### **Link the Nodes**

1. **Connect ChatGPT's `response` output** → **Parse JSON Node’s `input`**.
2. **Parse JSON Node’s `output`** → **Mediastack News Node’s `countries`, `category`, and `keywords` inputs**.
3. **Use an Object Property Node** (if needed) to extract specific fields from the parsed JSON.
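
To make the data flow concrete, the sketch below shows how the parsed fields could end up in a Mediastack request. In the workflow, the Mediastack News Node takes care of this step. The endpoint `http://api.mediastack.com/v1/news` and the `access_key` parameter are assumptions based on Mediastack's public documentation and should be verified there; `buildMediastackUrl` and `YOUR_ACCESS_KEY` are placeholders.

```javascript
// Hypothetical helper: turns the parsed extraction into a Mediastack query URL.
// The endpoint and the access_key parameter are assumptions; verify them in the Mediastack docs.
function buildMediastackUrl(params, accessKey) {
  const url = new URL("http://api.mediastack.com/v1/news");
  url.searchParams.set("access_key", accessKey);
  // Only forward the fields the model actually extracted.
  for (const key of ["countries", "categories", "keywords", "date"]) {
    if (params[key] !== undefined && params[key] !== "") {
      url.searchParams.set(key, String(params[key]));
    }
  }
  return url.toString();
}

const requestUrl = buildMediastackUrl(
  { categories: "sports", countries: "my" },
  "YOUR_ACCESS_KEY"
);
console.log(requestUrl);
```

Fields that are missing from the extraction are simply left out of the query, so Mediastack falls back to its defaults for them.
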
   {% endstep %}

{% step %}

#### **Test the Workflow**

1. Send: **"latest sports news in Malaysia today"**
2. Mediastack should now **correctly receive** the extracted country, category, and keyword, and reply with relevant news.
   {% endstep %}
   {% endstepper %}

## Summary

We have now successfully created an agent that replies with news based on the user's prompt.

<figure><img src="/files/EwjsPuMqssiiXElHH7Fo" alt=""><figcaption><p>New Reporter Agent Setup</p></figcaption></figure>

Here is the **exported JSON file** containing the full workflow setup. You can import it into the Editor to instantly recreate the agent.

{% file src="/files/plETAavyHducvLv0tCms" %}
Exported Agent's JSON File
{% endfile %}


