Extracting Structured Data from Tricky PDFs with Gemini Flash

We’ll be using a very fast and efficient vision-language model called Gemini Flash to extract structured data from visually tricky PDF documents.

In this guide, I'll show you how to use vision-language models (VLMs) such as Gemini Flash or GPT-4o to extract structured data from PDFs.

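Gemini, Google's latest family of vision-language models, has shown state-of-the-art performance in text and image understanding. Its improved multimodal capability and long context window make it particularly well suited to visually complex PDF content that traditional extraction models struggle with, such as figures, charts, tables, and diagrams.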

By doing this, you can easily build your own data extraction tool for visual files and web pages. Here are the steps:

Gemini’s long context window and multimodal capability make it particularly useful for processing visually complex PDF data where traditional extraction models struggle.

Setting Up Your Environment

Before we start extracting data, set up your development environment. This guide assumes that Python is already installed on your system. If it is not, download and install it from https://www.python.org/downloads/.

⚠️ Note that if you would rather not use Python, you can use the cloud platform at thepi.pe to upload your files and download the results as CSV without writing any code.

Install the Required Libraries

Enter the following commands in your terminal or command prompt:

pip install git+https://github.com/emcf/thepipe
pip install pandas

For those new to Python, pip is Python's package installer. These commands download and install the necessary libraries.
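If you want to confirm that the installation worked before going further, a small check like the one below can help. It is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration, and it assumes the import names are `thepipe` and `pandas`, as used later in this guide.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "thepipe" and "pandas" are the import names used later in this guide.
    missing = missing_packages(["thepipe", "pandas"])
    print("Missing packages:", missing if missing else "none")
```

If either name is reported as missing, rerun the corresponding pip command in the same Python environment you plan to use.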

Set Up Your API Key

To use thepipe, you need an API key.

Disclaimer: although thepi.pe is a free and open-source tool, API usage incurs a cost of roughly $0.00002 per token. If you want to avoid these costs, see the local setup instructions on GitHub. Note that you will still need to pay your chosen LLM provider.

Here’s how to get and set it up:

1. Visit https://thepi.pe/platform/2

2. Create an account or log in

3. Find your API key on the settings page

Now, you need to set this as an environment variable. The process varies depending on your operating system:

Copy the API key from the settings menu on thepi.pe Platform

For Windows:

1. Search for "Environment Variables" in the Start menu

2. Click "Edit the system environment variables"

3. Click the "Environment Variables" button

4. Under "User variables", click "New"

5. Set the variable name to THEPIPE_API_KEY and the value to your API key

6. Click "OK" to save

For macOS and Linux:

Open your terminal and add the following line to your shell configuration file (e.g., ~/.bashrc or ~/.zshrc):

export THEPIPE_API_KEY=your_api_key_here

Then, reload your configuration:

source ~/.bashrc # or ~/.zshrc
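
Before running any extraction, it can help to confirm that the variable is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `get_api_key` is made up for illustration and is not part of thepipe.

```python
import os


def get_api_key(env=None):
    """Return the THEPIPE_API_KEY value, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("THEPIPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "THEPIPE_API_KEY is not set. Add it to your environment and restart your terminal."
        )
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        print("THEPIPE_API_KEY found")
    except RuntimeError as error:
        print(error)
```

If the key is reported as missing right after you set it, open a new terminal window so the updated environment is picked up.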

Defining Your Extraction Schema

The key to successful extraction is defining a clear schema for the data you want to extract. Let's say we are extracting data from a Bill of Quantity document:

An example of a page from the Bill of Quantity document. The data on each page is independent of the other pages, so we do our extraction “per page”. There are multiple pieces of data to extract per page, so we set multiple_extractions to True.

Looking at the column names, we might want to extract a schema like this:

schema = {
    "item": "string",
    "unit": "string",
    "quantity": "int",
}

You can modify the schema to your liking on thepi.pe Platform. Clicking “View Schema” will give you a schema you can copy and paste for use with the Python API.
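
A schema like this also gives you something to check extracted rows against. The sketch below is an illustration only: `conforms` and `TYPE_MAP` are hypothetical helpers, not part of thepipe, and the sample rows are made up. It assumes that "string" and "int" correspond to Python's `str` and `int`.

```python
schema = {
    "item": "string",
    "unit": "string",
    "quantity": "int",
}

# Assumed mapping from the schema's type names to Python types.
TYPE_MAP = {"string": str, "int": int}


def conforms(row, schema):
    """Return True if `row` has every schema field with a value of the expected type."""
    for field, type_name in schema.items():
        if field not in row:
            return False
        value = row[field]
        # bool is a subclass of int in Python, so reject it explicitly for "int" fields.
        if type_name == "int" and isinstance(value, bool):
            return False
        if not isinstance(value, TYPE_MAP[type_name]):
            return False
    return True


good_row = {"item": "Concrete", "unit": "m3", "quantity": 12}  # made-up sample row
bad_row = {"item": "Concrete", "unit": "m3", "quantity": "12"}  # quantity is a string
```

Here `conforms(good_row, schema)` is true and `conforms(bad_row, schema)` is false, which is a cheap way to flag rows where the model returned a number as text.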

Extracting Data from PDFs

Now, let's use the extract_from_file method to extract data from the PDF. Here, we set chunking_method to "chunk_by_page" because we want to send each page to the AI model individually (the PDF is too large to process all at once). We also set multiple_extractions to True because each PDF page contains multiple rows of data (see the example page above).

from thepipe.extract import extract_from_file

results = extract_from_file(
    file_path="bill_of_quantity.pdf",
    schema=schema,
    ai_model="google/gemini-flash-1.5b",
    chunking_method="chunk_by_page",
    multiple_extractions=True,  # each page contains several rows of data
)
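
To make the two settings concrete, here is a conceptual sketch of what per-page chunking with multiple extractions amounts to. It is not thepipe's implementation: `extract_per_page` and `fake_extract_rows` are hypothetical stand-ins, the sample pages are made up, and the "model" here simply splits comma-separated lines.

```python
def extract_per_page(pages, extract_rows):
    """Call extract_rows on each page separately and collect every row it returns."""
    results = []
    for page_number, page in enumerate(pages, start=1):
        # Each page is handled independently, so no call sees the whole document.
        for row in extract_rows(page):
            # One page can yield several rows (the "multiple extractions" case).
            results.append({"page": page_number, **row})
    return results


def fake_extract_rows(page_text):
    """Stand-in for the AI model: parse 'item,unit,quantity' lines from plain text."""
    rows = []
    for line in page_text.splitlines():
        item, unit, quantity = line.split(",")
        rows.append({"item": item, "unit": unit, "quantity": int(quantity)})
    return rows


# Two made-up "pages": the first holds two rows, the second holds one.
pages = ["Concrete,m3,12\nSteel,kg,450", "Paint,l,30"]
rows = extract_per_page(pages, fake_extract_rows)
```

The result is one flat list of rows across all pages, which is the shape that is convenient to load into a table afterwards.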

⚠️ Note: google/gemini-flash-1.5b is experiencing elevated error rates with OpenRouter as of July 18, 2024, causing issues with API responses. If this does not work for you, try switching to openai/gpt-4o.
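
One way to handle this without editing your script by hand is to try the models in order. The following is a sketch under an assumption: that a failed extraction call raises an exception, which this guide does not confirm. `extract_with_fallback` is a hypothetical helper, not part of thepipe, and it takes the extraction function as an argument so that it can be exercised without network access.

```python
def extract_with_fallback(extract, models, **kwargs):
    """Try each model in order and return the first successful result."""
    last_error = None
    for model in models:
        try:
            return extract(ai_model=model, **kwargs)
        except Exception as error:  # assumption: a failed call raises an exception
            last_error = error
    raise RuntimeError(f"All models failed: {list(models)}") from last_error


# Hypothetical usage with thepipe (not executed here):
# results = extract_with_fallback(
#     extract_from_file,
#     ["google/gemini-flash-1.5b", "openai/gpt-4o"],
#     file_path="bill_of_quantity.pdf",
#     schema=schema,
#     chunking_method="chunk_by_page",
#     multiple_extractions=True,
# )
```

If the library reports failures in some other way, for example by returning an empty list, the check inside the loop would need to change accordingly.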

The results of the extraction for the Bill of Quantity PDF as viewed on thepi.pe Platform

Processing the Results

The extraction results are returned as a list of dictionaries. We can process these results to create a pandas DataFrame:

import pandas as pd

df = pd.DataFrame(results)

# Display the first few rows of the DataFrame
print(df.head())

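This creates a DataFrame containing all of the extracted information, including text content and descriptions of visual elements such as charts and tables.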

Exporting to Different Formats

Now that our data is in a DataFrame, we can easily export it to a variety of formats. Here are some options:

Exporting to Excel

df.to_excel("extracted_research_data.xlsx", index=False, sheet_name="Research Data")

This creates an Excel file named "extracted_research_data.xlsx" with a sheet named "Research Data". The index=False parameter prevents the DataFrame index from being included as a separate column.

Exporting to CSV

If you prefer a simpler format, you can export to CSV:

df.to_csv("extracted_research_data.csv", index=False)

This creates a CSV file that can be opened in Excel or any text editor.
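
If you would rather not depend on pandas for this step, a similar export can be sketched with Python's built-in csv module. This assumes the results are a flat list of dictionaries keyed by the schema fields; `rows_to_csv_text` is a hypothetical helper and the sample row is made up.

```python
import csv
import io


def rows_to_csv_text(rows, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # skip keys that are not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_rows = [{"item": "Concrete", "unit": "m3", "quantity": 12}]  # made-up data
csv_text = rows_to_csv_text(sample_rows, ["item", "unit", "quantity"])
```

Writing `csv_text` to a file opened with `newline=""` gives a file that spreadsheet tools can read; fields missing from a row are written as empty cells.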

Ending Notes

The key to successful extraction lies in defining a clear schema and leveraging the multimodal capabilities of AI models.

As you become more comfortable with these techniques, you can explore more advanced features such as custom chunking methods, custom extraction prompts, and integrating the extraction process into larger data pipelines.

Happy extracting, and may your data always be structured and insightful!

2024-07-19 04:47:53 Translated from the original by AI中文站