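GPTScript: Build Your Own Task-Specific AI Agents from Scratch and Deploy Them to Kubernetes

Learn how to design GPTScript agents and use them locally or deploy them on Kubernetes.

Note: You will need an OpenAI API key, or you can use another backend such as iits.ai to save some money. All examples in this guide use OpenAI as the backend, however, because most people have access to it.

Don't miss our talk on this topic at ContainerDays in Hamburg! Join us on September 4 at 14:15 on the K2 stage! We will also have a booth on September 3 and 4. Come visit us and let's discuss or build some agent-based solutions together!

What the heck? Don't worry! Even I haven't fully implemented what is shown in the picture below. That's why we start from scratch!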

Fig.0: A Complex Setup of GPTScript Agents

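Self-healing infrastructure, agents that assist you in your daily life. Imagine defining agents for your tasks, whether for DevOps or for daily routines, all in plain language. Chatbots like GPT became so successful because they are accessible to a broad audience. The goal here is similar, but with more focus on CLI users. Instead of writing a Python script for every task, you can create agents or tools in plain language. I will explain the difference between tools and agents later. Of course, this does not rule out Python scripts, since GPTScript supports Python, Go, and other languages.

Imagine you have daily tasks and simply want to ask, "How do I do this?" and get an answer. Or imagine receiving an alert about your system and being able to ask in plain language: "What is wrong with my cluster?"

We want to build….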

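00. GPTScript Overview

GPTScript is a framework that allows large language models (LLMs) to interact with various systems. These systems range from local executables to complex applications with OpenAPI schemas, SDK libraries, or any RAG-based solution. GPTScript is designed to integrate any system easily, whether local or remote, with just a few lines of prompts.

We will focus on the tool and agent parts, since they are the most relevant to us.

A tool is essentially a set of commands that accomplishes a specific task, such as using echo to display text. It is a straightforward way of turning an input into an output.

An agent, on the other hand, is more flexible and can operate in a broader context. For example, if you ask an agent, "What is the weather like in Hamburg today?", it can give a weather-related answer, whereas the echo tool would only repeat the text. The ability to handle a variety of requests makes an agent more versatile. However, you can also configure an agent with strict instructions. For example, when an alert comes in, you can instruct the agent to debug the Prometheus alert while running a specific command suited to your environment, especially if standard Prometheus debugging does not work for you. We will discuss some examples later.

So an agent is not exactly the same as a tool, but in some ways it is!

Now let's look at the components of a tool or agent:

Model Name: The LLM model to use; "gpt-4o" is the default.
Name: The name of the tool.
Tools: A comma-separated list of tools that this tool can call.
Description: The description of the tool. Defining the tool's purpose clearly is important, because the LLM uses this description.
Context: A comma-separated list of context tools available to the tool.
Share Context: A comma-separated list of context tools that this tool shares with any tool that includes it in its context.
Agents: A comma-separated list of agents available to the tool.
Chat: Setting this to true enables an interactive chat session for the tool.
Parameter / Args: The parameters of the tool, defined in the format param-name: description.

You can find additional reference material in the tool guide here.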
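To make the field list above more concrete, here is a minimal Python sketch that models one tool definition and renders it in the `Key: value` header style used by the `.gpt` files later in this guide. The `ToolDef` class and the `render` function are hypothetical helpers for illustration and are not part of GPTScript; only the field names come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class ToolDef:
    """Hypothetical in-memory model of one tool or agent block."""
    name: str
    description: str = ""
    tools: list[str] = field(default_factory=list)
    agents: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)
    chat: bool = False
    args: dict[str, str] = field(default_factory=dict)
    instructions: str = ""


def render(tool: ToolDef) -> str:
    """Render header lines, one blank line, then the instructions."""
    lines = [f"Name: {tool.name}"]
    if tool.description:
        lines.append(f"Description: {tool.description}")
    if tool.context:
        lines.append("Context: " + ", ".join(tool.context))
    if tool.tools:
        lines.append("Tools: " + ", ".join(tool.tools))
    if tool.agents:
        lines.append("Agents: " + ", ".join(tool.agents))
    for arg_name, arg_description in tool.args.items():
        lines.append(f"Args: {arg_name}: {arg_description}")
    lines.append(f"Chat: {str(tool.chat).lower()}")
    return "\n".join(lines) + "\n\n" + tool.instructions + "\n"
```

Keeping the definition as data makes it easy to try different key-value combinations and compare the generated files side by side.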

Fig. 2. Stay focused and try to remember what you have read above!

But first, let's set up the environment in the next step.

01. Installation and Initialization

macOS or Linux (Homebrew):

brew install gptscript

macOS or Linux (install.sh):

curl https://get.gptscript.ai/install.sh | sh

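The easiest way to get started is to clone the repository and use the provided examples:

I usually set up an alias like the following, but feel free to customize the command to your liking.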


alias gpt='gptscript --workspace "$(dirname "$0")" --disable-cache  --openai-api-key "sk-proj-.." gpt-agents/simple-cli-tool.gpt $0'

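Now the fun part begins - Level: Beginner

Thanks to a friend's help, I learned how to set up an alias that calls the gptscript command with all the necessary parameters.

My alias looks like this: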


alias gpt='gptscript --workspace "$(dirname "$0")" --disable-cache  --openai-api-key "sk-p..." simple-cli-tool.gpt $0'

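When I compare solutions across different backends, I only change the key and the referenced working file, such as simple-cli-tool.gpt (which we will create in a few minutes). Note that inside an alias, $0 expands to the name of the running shell, which is why the INPUT lines in the outputs below start with /bin/zsh; the text you type after gpt is simply appended to the command.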
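If you prefer a small wrapper over a shell alias, the same call can be sketched in Python. The flags `--workspace`, `--disable-cache`, and `--openai-api-key` are the ones used in the alias above; the function name `build_gptscript_argv` is hypothetical, and reading the key from an `OPENAI_API_KEY` environment variable is an assumption about your shell setup. The function only builds the argument list and does not execute anything.

```python
import os


def build_gptscript_argv(gpt_file: str, prompt: str, workspace: str = ".") -> list[str]:
    """Build the argument list for one gptscript call (nothing is executed)."""
    # Assumption: the API key is exported in the environment.
    api_key = os.environ.get("OPENAI_API_KEY", "")
    return [
        "gptscript",
        "--workspace", workspace,
        "--disable-cache",
        "--openai-api-key", api_key,
        gpt_file,
        prompt,  # passed as one argument, without a shell name in front
    ]
```

To actually run it, you could pass the result to `subprocess.run(argv, check=True)`. Switching backends or agent files then means changing one argument instead of editing the alias.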

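Let's start by building some simple tools and agents to understand how they are put together, and then gradually add complexity.

First, let's build a basic CLI tool:

The figure above gives a simplified view of the following task. We skipped the part where the user first calls "CLI-Tool-Caller", which in turn uses the tool "cli-tool".

If you create a file named simple-cli-tool.gpt with the following content: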

Name: CLI-Tool-Caller
Tools: cli-tool

Execute the CLI command provided and return the output.

---
Name: cli-tool
Description: I execute CLI commands on the local system.
Args: command: The CLI command to execute.

and then run the command as shown in the figure above, you will get output like this:

gpt "Does my Kubernetes cluster have any issues?"


INPUT:

/bin/zsh Does my Kubernetes cluster have any issues?

OUTPUT:

I cannot execute CLI commands directly. However, you can use the following commands to check for issues in your Kubernetes cluster:

1. Check the status of all nodes:
   ```sh
   kubectl get nodes
   ```

2. Check the status of all pods in all namespaces:
   ```sh
   kubectl get pods --all-namespaces
....

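As you can see, the tool cannot execute commands directly on your machine, but it does give advice on how to check your Kubernetes cluster. If you want it to actually execute commands like a tool, you can modify it like this: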

Name: CLI-Tool-Caller
Tools: cli-tool

Execute the CLI command provided and return the output.

---
Name: cli-tool
Tools: sys.exec #give the tool the right to execute needed commands
Description: I execute CLI commands on the local system.
Args: command: The CLI command to execute.
Chat: false

${command}
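The `${command}` line is a placeholder for the `command` argument declared under `Args`. Python's `string.Template` happens to use the same `${name}` syntax, so it can serve as a toy stand-in to show what the filled-in tool body looks like. This is only an illustration of placeholder substitution, not GPTScript's implementation, and the helper name `fill_instructions` is hypothetical.

```python
from string import Template

# Body of the cli-tool block, as written in simple-cli-tool.gpt.
tool_body = Template("${command}")


def fill_instructions(body: Template, **args: str) -> str:
    """Replace every ${name} placeholder with the matching argument value."""
    return body.substitute(**args)


prompt_text = fill_instructions(tool_body, command="kubectl get pods --all-namespaces")
print(prompt_text)
```

A body with more text around the placeholder, for example "Run ${command} and summarize the result", would be filled in the same way.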

Now, when you run the command, you will see output like this:

gpt "Does my Kubernetes cluster have any issues?"



12:51:41 started  [main] [input=/bin/zsh Does my Kubernetes cluster have any issues?]
12:51:41 sent     [main]
         content  [1] content | Waiting for model response...
         content  [1] content | <tool call> cliTool -> {
         content  [1] content |   "command": "/bin/zsh -c 'kubectl get pods --all-namespaces'"
         content  [1] content | }
12:51:42 started  [cli-tool(2)] [input={
  "command": "/bin/zsh -c 'kubectl get pods --all-namespaces'"
}]
12:51:43 sent     [cli-tool(2)]
         content  [2] content | Waiting for model response...
         content  [2] content | <tool call> exec -> {"command":"/bin/zsh -c 'kubectl get pods --all-namespaces'"}
....
INPUT:

/bin/zsh Does my Kubernetes cluster have any issues?

OUTPUT:

There are no issues. All pods, services, deployments, and replicasets are running and available as expected.

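That was a tool; now let's see how to turn a tool into an agent.

The figure above gives a simplified view of the following task. We skipped the part where the user first calls "CLI-Agent-Caller", which in turn uses the tool "cli-agent".

Let's create the following simple file, simple-cli-agent.gpt: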

Name: CLI-Agent-Caller
Tools: cli-agent

Execute the CLI command provided and return the output.

---
Name: cli-agent
Tools: sys.exec
Description: I execute CLI commands on the local system.
Chat: false

#prompt/instruction
Ensure the command is valid, modify it if needed, and then complete the task.
You can run CLI commands on the local system. You are an alpine-based or darwin-based system.
If the cli tool is not installed, then install it.

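Don't forget to update the alias to point to the new file!

The difference between this approach and the CLI tool lies in the prompt, or instructions. Here, the agent first checks whether the command makes sense, rewrites it if necessary, and then executes it. The CLI tool, by contrast, executes the command directly unless instructed otherwise.

What does that mean? Well, take a look:

Send a command like this to the agent: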

 gpt "Does my Kubernetes Cluster has any issues in my defaaautltt namesp?"                                                                                             
13:23:15 started  [main] [input=/bin/zsh Does my Kubernetes Cluster has any issues in my defaaautltt namesp?]
13:23:16 sent     [main]
         content  [1] content | Waiting for model response...
         content  [1] content | <tool call> cliAgent -> {
         content  [1] content |   "command": "kubectl get all -n defaaautltt"
         content  [1] content | }
13:23:17 started  [cli-agent(2)] [input={
  "command": "kubectl get all -n defaaautltt"
}]
13:23:17 sent     [cli-agent(2)]
         content  [2] content | Waiting for model response...
         content  [2] content | The command contains a typo in the namespace. The correct command should be:
         content  [2] content |
         content  [2] content | ```sh
         content  [2] content | kubectl get all -n default
         content  [2] content | ```
         content  [2] content |
         content  [2] content | I will run this command for you.<tool call> exec -> {
         content  [2] content |   "command": "kubectl get all -n default"
         content  [2] content | }
13:23:18 started  [cli-agent(2)->sys.exec(3)] [input={
....

The command is corrected and applied to the default namespace.
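The agent makes this correction with the LLM, so there is no fixed algorithm behind it. Purely to illustrate the effect, the sketch below uses a deterministic stand-in: fuzzy string matching from Python's standard `difflib` module against a list of known namespaces. The list `KNOWN_NAMESPACES` and the function `correct_namespace` are hypothetical; in practice the list could come from `kubectl get namespaces`.

```python
import difflib

# Hypothetical list of namespaces that exist in the cluster.
KNOWN_NAMESPACES = ["default", "kube-system", "kube-public", "kube-node-lease"]


def correct_namespace(requested: str, known: list[str] = KNOWN_NAMESPACES) -> str:
    """Return the closest known namespace, or the input unchanged if none is close."""
    matches = difflib.get_close_matches(requested, known, n=1, cutoff=0.6)
    return matches[0] if matches else requested


print(correct_namespace("defaaautltt"))
```

Unlike the LLM, this stand-in only fixes names that resemble an entry in the list, but it gives the same result on every run.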

If I run it as a tool instead, it looks like this:

gpt "Does my Kubernetes Cluster has any issues in my defaaautltt namesp?"                                                                                          
13:37:11 started  [main] [input=/bin/zsh Does my Kubernetes Cluster has any issues in my defaaautltt namesp?]
13:37:12 sent     [main]
         content  [1] content | Waiting for model response...
         content  [1] content | <tool call> cliTool -> {
         content  [1] content |   "command": "/bin/zsh -c 'kubectl get all -n defaaautltt'"
         content  [1] content | }
13:37:12 started  [cli-tool(2)] [input={
  "command": "/bin/zsh -c 'kubectl get all -n defaaautltt'"
}]
13:37:13 sent     [cli-tool(2)]
         content  [2] content | Waiting for model response...
         content  [2] content | <tool call> exec -> {
         content  [2] content |   "command": "/bin/zsh -c 'kubectl get all -n defaaautltt'"
         content  [2] content | }
13:37:13 started  [cli-tool(2)->sys.exec(3)] [input={
  "command": "/bin/zsh -c 'kubectl get all -n defaaautltt'"
}]
13:37:13 sent     [cli-tool(2)->sys.exec(3)]
         content  [3] content | No resources found in defaaautltt namespace.
         content  [3] content |
13:37:15 ended    [cli-tool(2)->sys.exec(3)] [output=No resources found in defaaautltt namespace.]
13:37:15 continue [cli-tool(2)]
13:37:16 sent     [cli-tool(2)]
         content  [2] content | Waiting for model response...
         content  [2] content | No resources found in defaaautltt namespace.
13:37:17 ended    [cli-tool(2)] [output=No resources found in defaaautltt namespace.]
13:37:17 continue [main]
13:37:17 sent     [main]
         content  [1] content | Waiting for model response...
         content  [1] content | There are no resources found in the "defaaautltt" namespace in your Kubernetes cluster.
13:37:18 ended    [main] [output=There are no resources found in the \"defaaautltt\" namespace in your Kubernetes cluster.]
13:37:18 usage    [total=902] [prompt=808] [completion=94]

INPUT:

/bin/zsh Does my Kubernetes Cluster has any issues in my defaaautltt namesp?

OUTPUT:

There are no resources found in the "defaaautltt" namespace in your Kubernetes cluster

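As mentioned earlier, the command is executed directly.

It is important to understand that, whether you use a tool or an agent, the instructions are sent to the AI backend, so neither should be compared to a simple bash script. Even with the CLI tool, you enter a piece of text rather than a command, and the tool generates a command from that text.

This was a brief introduction. Feel free to experiment with different key-value settings and see how the examples behave. In the next step, we will add complexity.

Buckle up! - Level: Intermediate

We will now set up the following scenario (Fig. 6), with the goal of having a Kubernetes agent that not only identifies problems in the cluster but also fixes them.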

Fig. 6: GPTScript: Simple Kubernetes Agent Setup

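What is happening here? The user asks whether the cluster has any issues, and the question is sent to the request handler. The request handler is not exactly a tool or an agent; it is more of an entry point for requests. It knows about the Kubernetes agent and shares a context with it that contains specific instructions. Following these instructions, the request handler passes the user's request on to the Kubernetes agent.

The Kubernetes agent carries out its task, using the AI backend to identify and fix any problems in the cluster, and then gives the user a readable summary of the actions taken.

What does this look like in a GPT file? Let's create a file named simple-k8s-fix.gpt with the following content: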

---
Name: Request Handler
Description: A request handler that receives and delegates user requests.
Context: shared-context
Agents: kubernetes-agent
Chat: false

Receive the user's request and determine if it requires Kubernetes cluster analysis. If it does, delegate the task to the Kubernetes agent. Provide feedback to the user based on the results from the Kubernetes agent.


---
Name: kubernetes-agent
Description: A Kubernetes agent that analyzes the cluster for potential issues.
Context: shared-context
Tools: sys.exec
Parameter: command: The Kubernetes command that should be executed for analysis
Chat: false

Analyze the Kubernetes cluster for any potential issues by executing relevant kubectl commands to assess node status, pod health, and other key metrics.
If no issues are apparent from the initial metadata, delve deeper using kubectl logs to gather more detailed information. Report on any identified problems and suggest solutions.
As a Kubernetes expert, focus on addressing the root cause of issues—for example, if a pod has e.g. an incorrect image reference, correct the deployment configuration rather than just resolving the pod-level issue.
You should fix it. If you fix it, summarize it as human-readable output.

---
Name: shared-context

#!sys.echo
You are a highly efficient assistant specializing in Kubernetes cluster management.
Always prioritize minimal output to conserve token usage.
You have access to Kubernetes tools and can analyze the cluster for potential issues.
You are allowed to run commands, analyze the results, and provide the user with detailed feedback on the status of their Kubernetes cluster.
Strive to provide concise and actionable insights to the user.

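The process was explained earlier. The only point worth mentioning here is that the shared context tries to keep requests small so that they do not exceed token limits (which can get expensive). The #!sys.echo command signals that everything following it should be treated as output, which is either passed on to the user or recorded as part of the context.

To see this in action, we will first introduce some errors into the Kubernetes cluster.

Deploy the following files from this repository: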

deployment-typo-image-name.yaml with:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: typo-image-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: typo-image
  template:
    metadata:
      labels:
        app: typo-image
    spec:
      containers:
      - name: typo-container
        image: nginxx:latest # Typo Image-Name (nginxx instead of nginx)
        ports:
        - containerPort: 80


deployment-wrong-env.yaml with: 

apiVersion: apps/v1
kind: Deployment
metadata:
  name: env-var-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: env-var-app
  template:
    metadata:
      labels:
        app: env-var-app
    spec:
      containers:
      - name: env-var-container
        image: busybox
        command: ["sh", "-c", "echo $SOME_VAR && sleep 3600"]
        env:
        - name: SOME_VAR
          value: "" # missing or empty env var


deployment-wrong-image-tag.yaml with:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: faulty-image-tag-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: faulty-image-tag
  template:
    metadata:
      labels:
        app: faulty-image-tag
    spec:
      containers:
      - name: faulty-container
        image: nginx:invalidtag123 # Wrong Image-Tag
        ports:
        - containerPort: 80


pod-crashloopback.yaml with: 

apiVersion: v1
kind: Pod
metadata:
  name: crash-loop-pod
  namespace: default
spec:
  containers:
  - name: crash-loop-container
    image: busybox
    command: ["sh", "-c", "exit 1"] #This command leads to an immediate error and causes CrashLoopBackOff

And deploy them:

kubectl apply -f demo-k8s-deployments/

Check that everything was deployed as expected:

kubectl get pods 

NAME                                          READY   STATUS             RESTARTS      AGE
crash-loop-pod                                0/1     CrashLoopBackOff   5 (23s ago)   3m23s
env-var-deployment-744bbcff56-j4r5j           1/1     Running            0             3m23s
faulty-image-tag-deployment-d678fcd88-zpm8v   0/1     ErrImagePull       0             3m23s
typo-image-deployment-787c9b749c-zshhx        0/1     ErrImagePull       0             3m23s

Now let's see how our new agent setup handles these problems:

gpt "Does my Kubernetes Cluster has any issues?"


INPUT:

/bin/zsh Does my Kubernetes Cluster has any issues?

OUTPUT:

Your Kubernetes cluster had the following issues which have been resolved:
- Deleted a pod in a crash loop in the `default` namespace.
- Updated the image for `faulty-image-tag-deployment` to `nginx:latest`.
- Updated the image for `typo-image-deployment` to `nginx:latest`.

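Great! The output is correct and could be formatted further, for example with more details. As we can see, most of the issues have been resolved. The missing environment variable is harder to detect, because the logs are empty and the pod looks healthy, but that is fine, and it may even be intentional.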
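To see why the pod with the empty environment variable slips through, consider a simple status-based check like the Python sketch below. It parses the plain-text table printed by `kubectl get pods` and flags every pod that is not `Running` or not fully ready. The function name `find_unhealthy_pods` is hypothetical, and the check is intentionally naive (a `Completed` pod would also be flagged).

```python
def find_unhealthy_pods(kubectl_output: str) -> list[tuple[str, str]]:
    """Return (pod name, status) for pods that are not Running or not fully ready."""
    unhealthy = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            unhealthy.append((name, status))
    return unhealthy
```

Applied to the `kubectl get pods` output shown earlier, this flags crash-loop-pod and the two image-pull failures, while env-var-deployment passes because it reports 1/1 Running. Catching that kind of problem requires looking at configuration and intent, not only at status.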

Next, let's add complexity and allow a chat between the user and the request handler, but limit it to 3 requests.

Fig.7: GPTScript: Simple Kubernetes Agent Setup with Allowing Chat to Request Handler

Here is the content of the new file simple-k8s-chat-fix.gpt:

---
Name: Request Handler
Description: A request handler that receives and delegates user requests.
Context: shared-context
Agents: kubernetes-agent
Chat: true

Receive the user's request and determine if it requires Kubernetes cluster analysis.
If it does, delegate the task to the Kubernetes agent.
Provide feedback to the user based on the results from the Kubernetes agent.
The user can ask you up to 3 times over the chat.
If you still haven't found a solution after 3 queries, apologize and ask the user to debug it themselves.


---
Name: kubernetes-agent
Description: A Kubernetes agent that analyzes the cluster for potential issues.
Context: shared-context
Tools: sys.exec
Parameter: command: The Kubernetes command that should be executed for analysis
Chat: false

Analyze the Kubernetes cluster for any potential issues by executing relevant kubectl commands to assess node status, pod health, and other key metrics.
If no issues are apparent from the initial metadata, delve deeper using kubectl logs to gather more detailed information. Report on any identified problems and suggest solutions.
As a Kubernetes expert, focus on addressing the root cause of issues—for example, if a pod has e.g. an incorrect image reference, correct the deployment configuration rather than just resolving the pod-level issue.
You should fix it. If you fix it, summarize it as human-readable output.

---
Name: shared-context

#!sys.echo
You are a highly efficient assistant specializing in Kubernetes cluster management.
Always prioritize minimal output to conserve token usage.
You have access to Kubernetes tools and can analyze the cluster for potential issues.
You are allowed to run commands, analyze the results, and provide the user with detailed feedback on the status of their Kubernetes cluster.
Strive to provide concise and actionable insights to the user.

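We only need to change the Chat setting from false to true and extend the request handler's instructions with the 3-query limit. So apply the files again to create some problems in the cluster, and ask your friend for help (don't forget to point the alias to the new file).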

gpt "Does my Kubernetes Cluster has any issues?"

Wow, that looks different now!

Chat mode also asks for your permission for each task triggered by the initial query, for example:

  How can I assist you with your Kubernetes cluster today?

    │ aks-node-13614852-vmss000002   Ready    <none>   3d10h   v1.29.4 │
    └──────────────────────────────────────────────────────────────────┘
    │ NAMESPACE           NAME                                                   READY   STATUS             RESTARTS        AGE   │
    │ anything-llm-hks    anything-llm-hks-6dd4dfb4dc-ps8dm                      1/1     Running            0               3d10h │
    │ NAMESPACE           NAME                                                   READY   STATUS             RESTARTS        AGE   │
    │ anything-llm-hks    anything-llm-hks-6dd4dfb4dc-ps8dm                      1/1     Running            0               3d10h │
    │     TokenExpirationSeconds:  3607
    │     ConfigMapName:           kube-root-ca.crt
    │     DownwardAPI:             true
    │ QoS Class:                   BestEffort
    │ Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
    │                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
> I have some issues with my cluster, please help me

  I'll analyze your Kubernetes cluster for potential issues. Please hold on a moment.


    ┌─────────────────┐
    │ Call Arguments: │
    │                 │
    │ kubernetes {}   │
    └─────────────────┘

    ┌───────────────────────────────────────────────────────┐
    │ Call Arguments:                                       │
    │                                                       │
    │ exec {"command": "kubectl get nodes"}                 │
    │ exec {"command": "kubectl get pods --all-namespaces"} │
    └───────────────────────────────────────────────────────┘

  Running  kubectl get nodes 
...

Then you can make a second query:

  Here are the actions taken to resolve issues in your cluster:

  1. Deleted a pod in a crash loop in the  default  namespace.
  2. Updated the image for the deployment  faulty-image-tag-deployment  to  nginx:latest  in the  default  namespace.
  3. Updated the image for the deployment  typo-image-deployment  to  nginx:latest  in the  default  namespace.

  Please check if the issues are resolved. If you need further assistance, let me know.

> have i another issues with my cluster?

And once the cluster is healthy, you can make a third query:

  All nodes and pods are in a healthy state. No issues detected.


  All nodes and pods are in a healthy state. No issues detected. Your cluster appears to be functioning correctly.

> another problems, like alertings?

When I asked a fourth question:

> Help me!

  You've reached the limit of 3 queries. Please try debugging the issue yourself or start a new session for further assistance.

Then, with my fifth request, I tried to understand why the chat does not end automatically:

> Why you dont end the chat after 3 queries? Need I something to like sys.abort?

  I apologize for any confusion. If you need further assistance, please feel free to start a new session.

But the agent stood firm and refused to give me any more answers.
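In the run above, the 3-query limit exists only as a sentence in the prompt, so the model declines to answer but the session itself stays open. If you need a hard stop, one option is to count queries outside the model. The following Python sketch shows the idea with a hypothetical `QueryBudget` class and a `chat_loop` function in which `answer` stands in for the call to the agent; none of these names are part of GPTScript.

```python
from typing import Callable, Iterable


class QueryBudget:
    """Counts user queries and reports when the session should end."""

    def __init__(self, max_queries: int = 3) -> None:
        self.max_queries = max_queries
        self.used = 0

    def allow(self) -> bool:
        """Consume one query if any remain; return False once the budget is spent."""
        if self.used >= self.max_queries:
            return False
        self.used += 1
        return True


def chat_loop(queries: Iterable[str], answer: Callable[[str], str], budget: QueryBudget) -> list[str]:
    """Answer queries until the budget is spent, then end the session."""
    replies = []
    for query in queries:
        if not budget.allow():
            replies.append("Query limit reached, ending session.")
            break  # hard stop: later queries are never sent to the agent
        replies.append(answer(query))
    return replies
```

With this wrapper, the fourth query ends the loop regardless of how the model interprets its instructions.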

So let's add even more complexity and allow up to 3 chat exchanges between the request handler and the Kubernetes agent.

This is what we want to achieve:

Fig. 8: GPTScript: Simple Kubernetes Agent Setup with Allowing Chat Request Handler -> K8s Agent

Let's see what it looks like and whether it works. Here is the new file simple-request-to-k8s-chat-fix.gpt:

---
Name: Request Handler
Description: A request handler that receives and delegates user requests.
Context: shared-context
Agents: kubernetes-agent
Chat: false

Receive the user's request and determine if it requires Kubernetes cluster analysis.
If it does, delegate the task to the Kubernetes agent.
Only if the user's problem has been solved, then provide feedback to the user based on the results from the Kubernetes agent.



---
Name: kubernetes-agent
Description: A Kubernetes agent that analyzes the cluster for potential issues.
Context: shared-context
Tools: sys.exec
Parameter: command: The Kubernetes command that should be executed for analysis
Chat: true

Analyze the Kubernetes cluster for any potential issues by executing relevant kubectl commands to assess node status, pod health, and other key metrics.
If no issues are apparent from the initial metadata, delve deeper using kubectl logs to gather more detailed information. Report on any identified problems and suggest solutions.
As a Kubernetes expert, focus on addressing the root cause of issues—for example, if a pod has e.g. an incorrect image reference, correct the deployment configuration rather than just resolving the pod-level issue.
You should fix it. If you fix it, summarize it as human-readable output.

---
Name: shared-context

#!sys.echo
You are a highly efficient assistant specializing in Kubernetes cluster management.
Always prioritize minimal output to conserve token usage.
You have access to Kubernetes tools and can analyze the cluster for potential issues.
You are allowed to run commands, analyze the results, and provide the user with detailed feedback on the status of their Kubernetes cluster.
Strive to provide concise and actionable insights to the user.

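Now don't forget to point the alias to the new file, deploy the misconfigured services, and see what our request handler finds when it is allowed to talk to the Kubernetes agent.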

gpt "Does my Kubernetes Cluster has any issues?"


INPUT:

/bin/zsh Does my Kubernetes Cluster has any issues?

OUTPUT:

### Issues Identified:
1. **Pods in Error State:**
   - `default` namespace:
     - `pod/crash-loop-pod`: Status `Error`
     - `pod/faulty-image-tag-deployment-d678fcd88-d6fst`: Status `ImagePullBackOff`
     - `pod/typo-image-deployment-787c9b749c-krpz6`: Status `ImagePullBackOff`

### Next Steps:
1. **Check Logs for Error Pods:**
   - Investigate the logs for `crash-loop-pod` to understand the error.
   - Check the image references for `faulty-image-tag-deployment` and `typo-image-deployment`.

Would you like to proceed with checking the logs and image references?

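Phew, not good: the request paused and asked whether you want to continue. Now we will give the request handler more permissions, so that it can answer "yes" or "y" and continue with further instructions. Let's see what happens. Here is the updated configuration: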

---
Name: Request Handler
Description: A request handler that receives and delegates user requests.
Context: shared-context
Agents: kubernetes-agent
Chat: false

Receive the user's request and determine if it requires Kubernetes cluster analysis.
If it does, delegate the task to the Kubernetes agent.
You may ask up to 3 questions to the kubernetes agent and you are also authorized to answer with yes or y to the agent and give further instructions.
You are authorized to continue until all problems have been solved.
Only if the user's problem has been solved, then provide feedback to the user based on the results from the Kubernetes agent.



---
Name: kubernetes-agent
Description: A Kubernetes agent that analyzes the cluster for potential issues.
Context: shared-context
Tools: sys.exec
Parameter: command: The Kubernetes command that should be executed for analysis
Chat: true

Analyze the Kubernetes cluster for any potential issues by executing relevant kubectl commands to assess node status, pod health, and other key metrics.
If no issues are apparent from the initial metadata, delve deeper using kubectl logs to gather more detailed information. Report on any identified problems and suggest solutions.
As a Kubernetes expert, focus on addressing the root cause of issues—for example, if a pod has e.g. an incorrect image reference, correct the deployment configuration rather than just resolving the pod-level issue.
You should fix it. If you fix it, summarize it as human-readable output.

---
Name: shared-context

#!sys.echo
You are a highly efficient assistant specializing in Kubernetes cluster management.
Always prioritize minimal output to conserve token usage.
You have access to Kubernetes tools and can analyze the cluster for potential issues.
You are allowed to run commands, analyze the results, and provide the user with detailed feedback on the status of their Kubernetes cluster.
Strive to provide concise and actionable insights to the user.

And nothing got fixed:

/bin/zsh Does my Kubernetes Cluster has any issues?

OUTPUT:

### Issues Identified:
1. **CrashLoopBackOff**:
   - Pod: `crash-loop-pod` in `default` namespace.

2. **ImagePullBackOff**:
   - Pods: `faulty-image-tag-deployment-d678fcd88-bs4pp` and `typo-image-deployment-787c9b749c-bhcdk` in `default` namespace.

### Next Steps:
1. **Investigate CrashLoopBackOff**:
   - Check logs for `crash-loop-pod`.

2. **Investigate ImagePullBackOff**:
   - Check image references for `faulty-image-tag-deployment` and `typo-image-deployment`.

Would you like to proceed with these investigations?

I then modified the instructions as follows:

---
Name: Request Handler
Description: A request handler that receives and delegates user requests.
Context: shared-context
Agents: kubernetes-agent
Chat: false

Receive the user's request and determine if it requires Kubernetes cluster analysis.
If it does, delegate the task to the Kubernetes agent.
You may ask up to 3 questions to the kubernetes agent and you are also authorized to answer with yes or y to the agent and give further instructions.
You are authorized to continue until all problems have been solved.
Say yes to every question you get! 
Say yes or y to questions like "Would you like to proceed with these steps?"
Solve all problems, then provide feedback to the user based on the results from the Kubernetes agent!!

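And it worked! But I think I was just lucky that my "yes" or "y" happened to answer the Kubernetes agent's question.

As you can see, the chat feature is powerful but not idempotent: the output can vary from run to run, and so far we have only let one requester chat with one agent. What happens if we allow multiple agents to chat with each other? I don't know the answer, but I think this is exactly the challenge that comes with granting more freedom.

So that's it for now. In the next step, let's deploy the use case to Kubernetes as a Job, without the chat feature, and see how it runs.

3. Let's Deploy It to Kubernetes as a Job - Level: Expert

Note: If you prefer, you can also run this Docker container directly with environment flags.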

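We will package everything into a Kubernetes Job and configure it with a ConfigMap and a Secret. With this setup, the agents live inside the Docker image itself, while the ConfigMap configures the backend, the model, the GPT file to use, and other settings.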

Fig. 9: Deploy GPTScript agents as a Job to Kubernetes

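As shown in Fig. 9, the user has now been replaced by a CronJob. The request now comes from COMMAND_STRING in the ConfigMap, for example: "Does my cluster have any issues?".

The manifests for the CronJob can be found in this repository.

Clone the repository and preferably check out tag 0.1.5, since that is the tested version.

Then, in the k8s directory, configure the ConfigMap as needed. The default agent is simple-k8s-fix.gpt, as shown in the settings.

You should create a Secret with the following command: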

kubectl create secret generic openai-api-key-secret \
  --from-literal=api-key=$(echo -n "your-plain-text-api-key" | base64) \
  --dry-run=client -o yaml | kubectl apply -f -
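One detail is worth checking here. Kubernetes stores Secret values base64-encoded, and `kubectl create secret --from-literal` performs that encoding itself. Because the command above also pipes the key through `base64`, the value a pod sees after Kubernetes decodes the Secret is still base64 text. Whether that is intended depends on how the demo container reads the key, which this guide does not show. The Python sketch below only illustrates the two layers of encoding; the helper `stored_in_secret` is hypothetical.

```python
import base64


def stored_in_secret(value: str) -> str:
    """Mimic how a Secret's data field holds a literal: base64 of the raw bytes."""
    return base64.b64encode(value.encode()).decode()


plain_key = "your-plain-text-api-key"

# What `echo -n "..." | base64` produces before kubectl sees it.
pre_encoded = base64.b64encode(plain_key.encode()).decode()

# kubectl encodes the literal once more when it builds the Secret.
stored = stored_in_secret(pre_encoded)

# A pod that mounts the Secret gets one layer decoded automatically.
seen_by_pod = base64.b64decode(stored).decode()

# Only a second decode recovers the original key.
recovered = base64.b64decode(seen_by_pod).decode()
```

If your container expects the plain key, passing the literal without the extra `base64` step would avoid the second layer.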

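After that, everything can be deployed.

Warning: This Job has full permissions in the cluster, so this setup is for demonstration purposes only and is not recommended for production.

Deploy the Job with the following command: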

kubectl apply -f k8s/deploy

Then check whether your cluster is running normally.

You can also create some misconfigured services from the demo folder to test the setup:

kubectl apply -f demo-k8s-deployments/

You can check the Job logs to see which actions were taken.

kubectl logs gptscript-agent-job....

The cluster had the following issues which have been resolved:
- Deleted a crash-looping pod in the `default` namespace.
- Updated the image for `faulty-image-tag-deployment` to `nginx:latest`.
- Updated the image for `typo-image-deployment` to `nginx:latest`.

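So now that we have deployed to Kubernetes, different agents can be integrated. Now it's your turn!

Additional agents can be downloaded using the OUTSIDE_AGENT_FILE environment variable. This variable lets you specify the files to download, and you can list multiple agents separated by commas. You can then simply reference them with the AGENT_FILE setting.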
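As a rough sketch of how such settings could be consumed, the Python below reads `COMMAND_STRING`, `AGENT_FILE`, and `OUTSIDE_AGENT_FILE` and splits the comma-separated list. The variable names come from this guide; the function `load_agent_settings`, the returned dictionary layout, and the fallback to simple-k8s-fix.gpt are assumptions for illustration and do not describe the actual entrypoint in the repository.

```python
import os
from typing import Mapping, Optional


def load_agent_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the agent-related settings from environment variables."""
    env = os.environ if env is None else env
    raw_outside = env.get("OUTSIDE_AGENT_FILE", "")
    # Comma-separated list; surrounding whitespace and empty entries are dropped.
    outside_files = [item.strip() for item in raw_outside.split(",") if item.strip()]
    return {
        # Assumption: fall back to the default agent named in this guide.
        "agent_file": env.get("AGENT_FILE", "simple-k8s-fix.gpt"),
        "command_string": env.get("COMMAND_STRING", ""),
        "outside_agent_files": outside_files,
    }
```

Passing a plain dictionary instead of the real environment makes the function easy to try out locally before touching the ConfigMap.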

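This is just the beginning. More use cases are coming. In the next section, we will look at one or two additional use cases in theory.

4. Other Use Cases, and Don't Miss Our Talk!

Before we dive into more use cases that are actively being worked on, here is some important information first!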

Fig. 10: ContainerDays 2024 — Hamburg, STAGE K2, 4th September 14:15!

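Join us on September 4 at 14:15 on the K2 stage! We will also have a booth on September 3 and 4. Come visit us and let's discuss or build some agent-based solutions together!

Use Case 1: K8s Issue Notifier

The examples so far are fun, but many people may wonder why you would give such unreliable agents, which may hallucinate, access to your environment. Besides, most people use GitOps, so any change would be overwritten anyway. That's right! This was just an introduction. A better approach is to build an intelligent K8s issue analyzer that runs as a CronJob and sends a notification to the appropriate channel when something goes wrong in the cluster.

The setup could look like this: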
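For the notification step, a minimal Python sketch could turn the analyzer's findings into a message body. The example assumes Slack as the channel and builds the JSON body that a Slack incoming webhook accepts (an object with a `text` field). The function name `build_slack_payload`, the cluster name, and the message layout are hypothetical; sending the request is left out on purpose.

```python
import json


def build_slack_payload(cluster: str, issues: list[str]) -> str:
    """Build the JSON body for a Slack incoming webhook from a list of findings."""
    if not issues:
        text = f"Cluster {cluster}: no issues detected."
    else:
        bullets = "\n".join(f"- {issue}" for issue in issues)
        text = f"Cluster {cluster} has {len(issues)} issue(s):\n{bullets}"
    return json.dumps({"text": text})
```

Because the agent only reports and never changes anything in this variant, it fits a GitOps workflow where fixes go through the repository.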

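Use Case 2: Prometheus Alert Notifier

You can also modify the agent to investigate Prometheus alerts. Once an alert fires, the agent should analyze it and suggest possible solutions rather than just notifying you; notification is what Alertmanager is for.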
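One way to connect the two, sketched in Python below, is to convert the JSON body that Alertmanager sends to a webhook receiver into a prompt for the agent. The fields used (`alerts`, `status`, `labels`, `annotations`) follow Alertmanager's webhook format; the function name `alerts_to_prompt`, the prompt wording, and the choice of the `summary` annotation are assumptions for illustration.

```python
import json


def alerts_to_prompt(webhook_body: str) -> str:
    """Turn an Alertmanager webhook body into a prompt listing the firing alerts."""
    payload = json.loads(webhook_body)
    lines = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts need no analysis
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        name = labels.get("alertname", "unknown")
        namespace = labels.get("namespace", "unknown")
        summary = annotations.get("summary", "no summary")
        lines.append(f"- {name} in namespace {namespace}: {summary}")
    if not lines:
        return ""
    header = "Analyze these firing Prometheus alerts and suggest possible fixes:"
    return header + "\n" + "\n".join(lines)
```

The resulting text could then be handed to the agent in the same way as the question used throughout this guide.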

Fig. 12: Prometheus Alert Notifier with SLACK

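There are countless other use cases. At the moment, I am working on extending the Kubernetes agent to integrate with a GitOps approach. The idea is that when a problem occurs, the agent tries to fix it. If that succeeds, it passes the proposed solution to a Git agent. The Git agent then obtains the relevant credentials for the repository via a GitOps agent, which not only knows where the application (e.g. Argo CD) lives but also has the necessary credentials. The goal is to make this process generic, so that you do not have to hand Git credentials to the agent manually.

Here is how that might feel: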