Fine-tuning a custom GPT model with Infinite.Tech, then improving it further with personal chat history.

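The new year is almost here, and it will be remarkable to see how much progress and change the coming year brings. AI was in wide use long before chatbots, yet the relationship with a general-purpose chatbot is fairly rigid and has settled into a homogenized style of intelligence. Meanwhile, advertisers and algorithms use AI to read your innermost thoughts with a high degree of accuracy.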

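I was excited about fine-tuning a model on my Chat-GPT conversation history, but I did not expect much from the experiment. I had brought it up many times, and friends were noncommittal about it.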

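Trained a custom AI based on myself and, well, yes, the first look is like facing a strange new kind of mirror.

It is uncanny: an empathic, third-person awareness that you are you.

I don't know; science fiction may already have captured this experience, or we may need a new term for it. - Future.Rob, x.com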

What follows records the experiments and observations from training Future Robot (FR1).

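Process

The process is: go to Chat-GPT, export your conversations, reformat them into a training set, then fine-tune a model using the Open-AI API playground.

After that, test and observe the model with Infinite.Tech to get a sense of how useful and effective it is in general domains.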

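Steps

Download your conversations from Open-AI ChatGPT

Go to Chat-GPT, click your icon at the bottom of the page, and request an export of your data.

Open-AI will email you a link to download a folder that contains the conversations.json file.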
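
To get a feel for the export before converting it, here is a minimal sketch of its structure. The record is made up and heavily trimmed: it only includes the fields that the conversion script in this post reads (title, mapping, message, author.role, content.parts), and a real export carries many more. The summarize helper is hypothetical, not part of any tool.

```python
import json

# Hypothetical, heavily trimmed record; a real export has many more fields.
sample_export = json.loads("""
[
  {
    "title": "Trip planning",
    "mapping": {
      "node-1": {"message": null},
      "node-2": {"message": {"author": {"role": "user"},
                             "content": {"parts": ["Where should I go in May?"]}}},
      "node-3": {"message": {"author": {"role": "assistant"},
                             "content": {"parts": ["Lisbon is pleasant in May."]}}}
    }
  }
]
""")


def summarize(conversations):
    """Return (title, [roles]) for each conversation, skipping empty nodes."""
    summary = []
    for conversation in conversations:
        roles = []
        for node in conversation.get("mapping", {}).values():
            message = node.get("message")
            if message:
                roles.append(message.get("author", {}).get("role"))
        summary.append((conversation.get("title"), roles))
    return summary


print(summarize(sample_export))
```

Running the same helper on the real file (loaded with json.load) gives a quick overview of which roles show up in each conversation.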

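Convert the conversations to the fine-tune JSONL format

The export has to be converted from conversations into a training set. The Python script below does this. I also had to delete some files by hand.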
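
As a rough picture of the target format: each line of the fine-tune file is a single JSON object holding a messages list. The sketch below builds one such line with the conversation title as the system message, which is the convention the script in this post uses. The to_training_line helper and the sample turns are illustrative assumptions.

```python
import json


def to_training_line(title, turns):
    """Build one JSONL line: the title as system message, then (role, text) turns."""
    messages = [{"role": "system", "content": title}]
    for role, text in turns:
        messages.append({"role": role, "content": text})
    # json.dumps emits a single line, so each example stays on its own row
    return json.dumps({"messages": messages})


line = to_training_line(
    "Trip planning",  # made-up example data
    [
        ("user", "Where should I go in May?"),
        ("assistant", "Lisbon is pleasant in May."),
    ],
)
print(line)
```

One conversation becomes one line, so the line count of the output file equals the number of training examples.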

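Python script

Save this script to a file named conversationsToTuningSet.py and run it from a command prompt in the folder that contains conversations.json. The optional line_count argument randomly reduces the training set to that many lines.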

python conversationsToTuningSet.py [line_count]

# Run this script in a folder with conversations.json file.
# Creates a jsonl file that can be used as a tuning set for the chatbot
#
# Usage:
# python conversationsToTuningSet.py [line_count]

import json
import sys
import random

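# Load your dataset (replace with the path to your dataset file).
# Reading as UTF-8 keeps non-ASCII text loading regardless of the platform default encoding.
with open('conversations.json', encoding='utf-8') as file:
    data = json.load(file)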

# Function to process each entry

def process_dataset(data):
    processed_data = []

    for entry in data:
        title = entry.get('title', '')
        if title is None:
            title = "No title"
        mapping = entry.get('mapping', {})

        # Start the training example with the title as the system message
        newMessage = {"messages": [{"role": "system", "content": title}]}

        
        # Iterate through the messages in the mapping and add them to the conversation
        for key, value in mapping.items():
            message_info = value.get('message')
            if message_info:
                role = message_info.get('author', {}).get('role')
                # content can be missing, and not every content type has 'parts'
                content = message_info.get('content') or {}
                parts = content.get('parts')

                # Skip system and tool messages
                if role == "system" or role == "tool":
                    continue

                # Keep only plain-text parts; non-text parts may not be strings
                if role and parts and isinstance(parts[0], str):
                    newMessage["messages"].append({"role": role, "content": parts[0]})
        
        # Skip conversations that contain only the system (title) message
        if len(newMessage["messages"]) < 2:
            continue
    
        processed_data.append(json.dumps(newMessage))

    return processed_data

# Process the dataset
processed_data = process_dataset(data)

# If a line count argument is given, randomly reduce the dataset to that size
if len(sys.argv) > 1:
    line_count = int(sys.argv[1])
    # random.sample raises ValueError if asked for more items than exist
    processed_data = random.sample(processed_data, min(line_count, len(processed_data)))


# Attempt to encode with utf-8 and ignore errors
processed_data = [line.encode('utf-8', errors='ignore').decode('utf-8') for line in processed_data]


with open('conversations_processed.jsonl', 'w', encoding='utf-8') as file:
    for line in processed_data:
        file.write(line + '\n')
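
Before uploading the file, a quick sanity check can catch lines that would make a poor training example. The sketch below is not an official validator. The rules it applies are assumptions chosen for illustration: roles limited to system, user and assistant, text-only content, and at least one assistant reply per example. The check_training_lines helper is hypothetical.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption for this sketch


def check_training_lines(lines):
    """Return (line_number, problem) tuples; an empty list means nothing was flagged."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            example = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue

        messages = example.get("messages") if isinstance(example, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing messages list"))
            continue
        if any(not isinstance(m, dict) for m in messages):
            problems.append((number, "message is not an object"))
            continue

        if any(m.get("role") not in ALLOWED_ROLES for m in messages):
            problems.append((number, "unexpected role"))
        if any(not isinstance(m.get("content"), str) for m in messages):
            problems.append((number, "non-text content"))
        if not any(m.get("role") == "assistant" for m in messages):
            problems.append((number, "no assistant reply"))
    return problems


# Made-up sample lines: one usable, one without an assistant reply, one broken
good = ('{"messages": [{"role": "system", "content": "t"}, '
        '{"role": "user", "content": "hi"}, '
        '{"role": "assistant", "content": "hello"}]}')
no_reply = ('{"messages": [{"role": "system", "content": "t"}, '
            '{"role": "user", "content": "hi"}]}')
report = check_training_lines([good, no_reply, "not json"])
print(report)
```

To check the real output, pass an open file instead of the sample list, for example check_training_lines(open('conversations_processed.jsonl', encoding='utf-8')); iterating over the file yields one line per example.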