将OpenCopilot与LangChain的计划执行代理进行比较

ChatGPT中文站

几天前,我们宣布推出了一款名为OpenCopilot的新型AI驱动副驾驶代理。OpenCopilot旨在为软件公司提供高效且可控的框架,可以与公司的软件生态系统无缝集成,并能够在最少的指导下执行复杂任务。

在本文中,我们将比较OpenCopilot和LangChain的计划和执行代理,后者是一种类似目的的AI代理,专注于在执行任务之前规划任务。计划和执行代理基于Lei Wang等人于2023年5月发表的“计划和解决”论文。

一个SQL数据库的副驾驶员

想象一下您希望开发 SQL 数据库的副驾驶。该数据库包含一组表格,您想要对其中存储的数据进行提问。在这种情况下,我们将为您提供两个工具,一个是使用某些语义搜索引擎查找相关表格的工具,另一个则是执行 SQL 查询的工具。

RelevantTables: Get the most relevant tables for 
a given query.
SQLDatabase: Executes a sql query on the SQL DB.

问题将是:有多少名员工?

在本文中,我们的主要关注点将是与LLM模型交换的提示,而不是基础代码。我们将使用ChatGPT-3.5-Turbo,温度为0。

朗链的计划和执行代理

我们将通过使用计划和执行代理来尝试上述查询。该过程通常从定义代理可用的工具开始,然后使用用户的问题运行它。

LangChain和LLM之间的初始交互涉及以下计划的制定:

Let's first understand the problem and
devise a plan to solve the problem.
Please output the plan starting with
the header 'Plan:' and then followed by
a numbered list of steps. Please make
the plan the minimum number of steps
required to accurately complete the
task. If the task is a question, the
final step should almost always be
'Given the above steps taken, please
respond to the users original
question'. At the end of your plan, say
'<END_OF_PLAN>'

User: The plan should be relevant to a SQL DB software.
How many employees are there?

请注意,在问题之前我们添加了句子“计划应与SQL DB软件相关”,以提供一些背景。

以下是GPT-3.5-Turbo的回复:

Plan:
1. Connect to the SQL database.
2. Identify the table that contains employee information.
3. Write a SQL query to count the number of rows in the employee table.
4. Execute the SQL query.
5. Retrieve the result of the query.
6. Display the result to the user.
7. Close the database connection.

Given the above steps taken, we can accurately determine the
number of employees in the database.

下一步向LLM发出的调用是执行第一个任务:

Respond to the human as helpfully and
accurately as possible. You have access
to the following tools:

SQLDatabase: Executes a sql query on
the SQL DB, args: {{'tool_input':
{{'type': 'string'}}}}

RelevantTables: Use this tool to get
the most relevant tables for a given
query., args: {{'query': {{'title':
'Query', 'type': 'string'}}}}

Use a json blob to specify a tool by
providing an action key (tool name) and
an action_input key (tool input).

Valid "action" values: "Final Answer"
or SQLDatabase, RelevantTables

Provide only ONE action per $JSON_BLOB,
... (ommitted for simplicity) ...

Previous steps: steps=[]

Current objective: value='Connect to
the SQL database.'

以下是GPT-3.5-Turbo的回复:

Action:
```
{
"action": "SQLDatabase",
"action_input": "Connect to the SQL database."
}
```

很明显,代理不会按照提供的计划继续前进。与 SQL 数据库的连接不是 SQLDatabase 工具公开的功能之一。数据库连接是在后台中的代码中完成的,SQLDatabase 期望执行 SQL 查询。

这种思维的连锁反应,当然永远没有成功的结束。

问题很明显:该计划没有考虑到系统的背景和能力。

OpenCopilot开放式飞行控制软件

通过首先对LLM进行背景说明,OpenCopilot解决了先前提到的问题,描述他应该是谁。他是一个助手。当您的软件用户问飞行员他们是谁时,这也很有用——您不希望回答是“我是ChatGPT”,对吧?

其次,OpenCopilot引入了“操作员”概念。每组相关任务应该属于一个操作员。我们添加了工具概述并确保他将始终在这些外部能力范围内制定计划:

Your are SQL Assistant and your name is
SQL Assistant. I am SQL Operator. You
are talking to me only and you can talk
to me by providing a set of tasks in
JSON format.

You are required to answer user
inquiries that will be sent to you by
SQL Operator. You can ask SQL Operator
to do some tasks like executing query
on SQL system. If the inquiry is
complex, break it down to simpler
tasks.

Tasks can be one or more of the
following:
- SQL Metadata Helper -->
(TableFinderOp) All queries must
be against a single table, this
operator can help you find the most
relevant table
- SQL Query Helper --> (QueryOp) This
operator requires input from the
TableFinderOp, this operator can
execute queries on SQL DB. The
queries would be on data and not
metadata.
-UiText Operator, (UiTextOp) This operator
can send a reply to the user

You should only respond with tasks in
JSON format as described in the
following JSON schema:
... (ommitted for simplicity) ...


From SQL Operator: The user is asking:
How many employees are there?
JSON:

根据以上提示,LLM 提供了以下简洁而精确的计划:

{
"tasks": [
{
"id": 1,
"name": "Find relevant table",
"goal_and_purpose": "Find the table that contains
information about employees",
"operator": "TableFinderOp",
"status": "TODO",
"depends_on_output_of": [],
"detailed_purpose": "Search for tables with column names
like 'employee', 'staff', 'worker', etc.",
"result": ""
},
{
"id": 2,
"name": "Execute query",
"goal_and_purpose": "Execute a query to count the
number of employees",
"operator": "QueryOp",
"status": "TODO",
"depends_on_output_of": [1],
"detailed_purpose": "Execute a query
like 'SELECT COUNT(*) FROM employees'",
"result": ""
},
{
"id": 3,
"name": "Send reply to user",
"goal_and_purpose": "Send the result of the query to the user",
"operator": "UiTextOp",
"status": "TODO",
"depends_on_output_of": [2],
"detailed_purpose": "Send the result of the query to
the user in a clear and concise manner",
"result": ""
}
]
}

如示,该计划正好包括三项任务,这是此问题的最佳任务数量。

接下来,我们将深入探讨并提供每个运算符的详细命令定义(或工具,如LangChain),以获得最终的命令。以下是任务1的请求示例。

Your are Incorta Assistant and your
name is Incorta Assistant. I am Incorta
Operator. You are talking to me only
and you can talk to me by providing a
command in JSON format.

We have a user who sent an inquiry to
the Incorta Operator. Incorta Operator
has prepared a set of tasks to carry
out the inquiry of the user. I want you
to generate command for a specific task
that I will provide later.

The user inquiry is: How many employees are there?

The list of tasks prepared by Incorta
Operator is:
[
{
"id": 1,
"name": "Find relevant table",
"goal_and_purpose": "Find the table that contains
information about employees",
"operator": "TableFinderOp",
"status": "TODO",
"depends_on_output_of": [],
"detailed_purpose": "Search for tables with
column names like 'employee', 'staff', 'worker', etc.",
"result": ""
}
]

You must generate one of the following
commands only and it must follow the
JSON format provided:

Commands Overview: I'm
TableFinderOp, I can get you the
tables that's most relevant to
the user query.

Command 1 Description: [
"Given query, search the available
tables, to find the most relevant table.
That contains all the required field"
]
Command 1 JSON Format:
{
"command_name": "GetRelevantTable",
"args": {
"query": "Query to search for the most relevant table",
"require_result_summary": false
}
}


Now, given the list of possible
commands, and the list of tasks,
generate the command for Task_id: 1


Ensure the response can be parsed by
Python json.loads.

Command ==> JSON:

最后,对于这个操作,LLM 直接回复命令:

[
{
"command_name": "GetRelevantTable",
"args": {
"query": "employee OR staff OR worker",
"require_result_summary": false
}
}
]

它将保持一致的命令,不会偏离我们的SQL DB软件的范围。

更多的命令描述空间

使用运算符分类命令时,我们只需要为每个运算符提供命令描述。这种方法允许更多的空间来描述复杂的命令,而不会牺牲其他命令的包含。

将命令分类为操作对于LLM可能有些限制。但是,在许多情况下,我们发现大多数需要的命令可以归类为一个操作,无需告知LLM其他可用命令。

结论

在本文中,我们展示了OpenCopilot如何提供更简明的计划,更好地满足软件公司寻求副驾驶的需求,与LangChain中的通用代理(如Plan and Execute Agent)相比。此外,将命令分类为操作为开发人员提供了更多的空间来描述复杂的命令,同时仍然适合LLMs的有限上下文。

2023-10-20 16:58:09 AI中文站翻译自原文