Improving the Accuracy of RAG Applications with Knowledge Graphs

A practical guide to constructing and retrieving information from knowledge graphs in RAG applications with Neo4j and LangChain

Image created by DALL-E.

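Graph retrieval-augmented generation (GraphRAG) is developing rapidly and emerging as a powerful complement to traditional vector search retrieval methods. The approach leverages the structured nature of graph databases, which organize data as nodes and relationships, to enhance the depth and context of the retrieved information.

Example of a knowledge graph.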

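Graphs are great at representing and storing heterogeneous and interconnected information in a structured manner, effortlessly capturing complex relationships and attributes across diverse data types. In contrast, vector databases often struggle with such structured information, as their strength lies in handling unstructured data through high-dimensional vectors. In your RAG application, you can combine structured graph data with vector search to get the best of both worlds. That is exactly what we demonstrate in this blog post.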

Knowledge graphs are great, but how do you create one?

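Constructing a knowledge graph is typically the most challenging step. It involves gathering and structuring the data, which requires a deep understanding of both the domain and graph modeling.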

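To simplify this process, we have been experimenting with LLMs. With their deep understanding of language and context, LLMs can automate significant parts of the knowledge graph creation process. By analyzing text data, these models can identify entities, understand the relationships between them, and suggest how best to represent them in a graph structure.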

As a result of these experiments, we have added the first version of the graph construction module to LangChain, which we demonstrate in this blog post.

The code is available on GitHub.

Neo4j Environment Setup

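You need to set up a Neo4j instance to follow along with the examples in this blog post. The easiest way is to start a free instance on Neo4j Aura, which offers cloud instances of the Neo4j database. Alternatively, you can set up a local instance of the Neo4j database by downloading the Neo4j Desktop application and creating a local database instance.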

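import os

from langchain_community.graphs import Neo4jGraph

os.environ["OPENAI_API_KEY"] = "sk-"
os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"

# Neo4jGraph reads the NEO4J_* environment variables set above
graph = Neo4jGraph()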

Additionally, you must provide an OpenAI key, as we use their models in this blog post.

Data Ingestion

For this demonstration, we will use Elizabeth I's Wikipedia page. We can use LangChain loaders to fetch and split the documents from Wikipedia seamlessly.

from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import TokenTextSplitter

# Read the Wikipedia article
raw_documents = WikipediaLoader(query="Elizabeth I").load()
# Define chunking strategy
text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
# Only the first three loaded documents are split and processed
documents = text_splitter.split_documents(raw_documents[:3])
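
To make the chunk_size and chunk_overlap settings concrete, here is a minimal pure-Python sketch of sliding-window chunking. It is an illustration only: it treats whitespace-separated words as tokens rather than using a model tokenizer, and the helper name chunk_tokens is hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split a token list into windows of at most chunk_size tokens,
    where consecutive windows share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Whitespace "tokens" stand in for real tokenizer output (assumption)
tokens = "Elizabeth I was Queen of England and Ireland until her death".split()
chunks = chunk_tokens(tokens, chunk_size=5, chunk_overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

With a chunk size of 5 and an overlap of 2, each window starts 3 tokens after the previous one, so the last two tokens of one chunk reappear at the start of the next. The overlap keeps sentences that straddle a boundary visible in both chunks.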

Now it's time to construct a graph based on the retrieved documents. For this purpose, we have implemented an LLMGraphTransformer module that significantly simplifies constructing and storing a knowledge graph in a graph database.

from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-4-0125-preview")
llm_transformer = LLMGraphTransformer(llm=llm)

# Extract graph data
graph_documents = llm_transformer.convert_to_graph_documents(documents)
# Store to Neo4j
graph.add_graph_documents(
    graph_documents,
    baseEntityLabel=True,
    include_source=True
)

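You can define which LLM you want the knowledge graph generation chain to use. Currently, we support only function-calling models from OpenAI and Mistral. However, we plan to expand the LLM selection in the future. In this example, we use the latest GPT-4. Note that the quality of the generated graph depends significantly on the model you use. In theory, you always want to use the most capable one.

The LLM graph transformer returns graph documents, which can be imported to Neo4j via the add_graph_documents method. The baseEntityLabel parameter assigns an additional __Entity__ label to each node, which improves indexing and query performance. The include_source parameter links nodes to their originating documents, which helps with data traceability and context understanding.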
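
The effect of these two parameters can be pictured with a small, self-contained sketch. Everything here is an assumption made for illustration: the Node and Relationship dataclasses and the to_import_rows helper are hypothetical stand-ins, not LangChain's classes or its actual import logic. The MENTIONS relationship type is the one that the retrieval query later in this post excludes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    id: str
    type: str  # becomes the node label, e.g. "Person"


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


def to_import_rows(
    nodes: List[Node],
    relationships: List[Relationship],
    source_id: str,
    base_entity_label: bool = True,
    include_source: bool = True,
) -> Tuple[List[tuple], List[tuple]]:
    """Return (node_rows, edge_rows) roughly as they might be written to the graph."""
    node_rows = []
    for node in nodes:
        labels = [node.type]
        if base_entity_label:
            labels.append("__Entity__")  # shared label that a single index can cover
        node_rows.append((node.id, labels))
    edge_rows = [(r.source.id, r.type, r.target.id) for r in relationships]
    if include_source:
        # Link the source chunk to every entity extracted from it
        edge_rows += [(source_id, "MENTIONS", node.id) for node in nodes]
    return node_rows, edge_rows


# Illustrative sample data, not real extraction output
elizabeth = Node("Elizabeth I", "Person")
tudor = Node("House Of Tudor", "Organization")
node_rows, edge_rows = to_import_rows(
    [elizabeth, tudor],
    [Relationship(elizabeth, tudor, "MEMBER_OF")],
    source_id="chunk-0",
)
print(node_rows)
print(edge_rows)
```

Because every entity carries the same extra label, one full-text index can cover all of them regardless of their specific type, and the links from the source chunk make it possible to trace each entity back to the text it came from.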

You can inspect the generated graph in the Neo4j Browser.

Part of the generated graph.

Note that this image represents only a part of the generated graph.

Hybrid Retrieval for RAG

After the graph generation, we will use a hybrid retrieval approach that combines vector and keyword indexes with graph retrieval for RAG applications.

Combining hybrid (vector + keyword) and graph retrieval methods. Image by author.

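The diagram illustrates a retrieval process that starts with a user posing a question, which is then directed to a RAG retriever. This retriever uses keyword and vector searches to search through unstructured text data and combines the results with information collected from the knowledge graph. Since Neo4j features both keyword and vector indexes, you can implement all three retrieval options with a single database system. The data collected from these sources is fed into an LLM to generate and deliver the final answer.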

Unstructured Data Retriever

You can use the Neo4jVector.from_existing_graph method to add both keyword and vector retrieval to documents. This method configures keyword and vector search indexes for a hybrid search approach, targeting nodes labeled Document. Additionally, it calculates text embedding values if they are missing.

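from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

vector_index = Neo4jVector.from_existing_graph(
    OpenAIEmbeddings(),
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding"
)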

The vector index can then be called with the similarity_search method.
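
To give an intuition for what a hybrid search has to do, here is one simple way to merge keyword and vector results. This is an illustrative assumption, not a description of how Neo4jVector works internally: the merge_hybrid helper and all scores are made up. Each result set is normalized by its top score so that similarity scores and full-text relevance scores become comparable, and each document keeps its best normalized score.

```python
from typing import Dict, List, Tuple


def merge_hybrid(
    vector_hits: Dict[str, float],
    keyword_hits: Dict[str, float],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Normalize each result set by its top score, keep the best score per
    document, and return the k highest-scoring documents."""
    merged: Dict[str, float] = {}
    for hits in (vector_hits, keyword_hits):
        top = max(hits.values(), default=0.0)
        for doc_id, score in hits.items():
            normalized = score / top if top > 0 else 0.0
            merged[doc_id] = max(merged.get(doc_id, 0.0), normalized)
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up scores: vector similarities and full-text relevance scores
vector_hits = {"doc-1": 0.92, "doc-2": 0.80}
keyword_hits = {"doc-2": 6.0, "doc-3": 3.0}
ranked = merge_hybrid(vector_hits, keyword_hits)
print(ranked)
```

A document found by only one of the two searches still makes it into the merged ranking, which is the main benefit of a hybrid approach: exact keyword matches and semantically similar passages both have a chance to surface.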

Graph Retriever

On the other hand, configuring a graph retrieval is more involved but offers more freedom. This example uses a full-text index to identify relevant nodes and then returns their direct neighborhood.

Graph retriever. Image by author.

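The graph retriever starts by identifying relevant entities in the input. For simplicity, we instruct the LLM to identify people, organizations, and locations. To achieve this, we use LCEL with the newly added with_structured_output method.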

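from typing import List

from langchain_core.prompts import ChatPromptTemplate
# Assumption: this import path matches the LangChain version used in the post;
# newer releases expect BaseModel and Field from pydantic itself
from langchain_core.pydantic_v1 import BaseModel, Field

# Extract entities from text
class Entities(BaseModel):
    """Identifying information about entities."""

    names: List[str] = Field(
        ...,
        description="All the person, organization, or business entities that "
        "appear in the text",
    )

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are extracting organization and person entities from the text.",
        ),
        (
            "human",
            "Use the given format to extract information from the following "
            "input: {question}",
        ),
    ]
)

# The LLM is constrained to return an Entities object
entity_chain = prompt | llm.with_structured_output(Entities)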

Let's test it out:

entity_chain.invoke({"question": "Where was Amelia Earhart born?"}).names
# ['Amelia Earhart']

Great, now that we can detect entities in the question, let's use a full-text index to map them to the knowledge graph. First, we need to define a full-text index and a function that generates full-text queries tolerating a bit of misspelling, which we won't cover in much detail here.

from langchain_community.vectorstores.neo4j_vector import remove_lucene_chars

# Full-text index over the id property of every extracted entity
graph.query(
    "CREATE FULLTEXT INDEX entity IF NOT EXISTS FOR (e:__Entity__) ON EACH [e.id]")

def generate_full_text_query(input: str) -> str:
    """
    Generate a full-text search query for a given input string.

    This function constructs a query string suitable for a full-text search.
    It processes the input string by splitting it into words and appending a
    similarity threshold (~2 changed characters) to each word, then combines
    them using the AND operator. Useful for mapping entities from user questions
    to database values, and allows for some misspellings.
    """
    words = [el for el in remove_lucene_chars(input).split() if el]
    # Joining also handles empty input, where indexing words[-1] would raise
    return " AND ".join(f"{word}~2" for word in words)
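
To see what such a query looks like, here is a standalone sketch that can be run without a database. The strip_lucene_chars helper is a hypothetical stand-in for remove_lucene_chars, and its list of special characters is an approximation; fuzzy_query is likewise an illustrative name.

```python
import re

# Characters with special meaning in Lucene query syntax (approximate list)
LUCENE_SPECIAL = r'[+\-&|!(){}\[\]^"~*?:\\/]'


def strip_lucene_chars(text: str) -> str:
    """Hypothetical stand-in: replace Lucene special characters with spaces."""
    return re.sub(LUCENE_SPECIAL, " ", text).strip()


def fuzzy_query(text: str, max_edits: int = 2) -> str:
    """Append ~max_edits to every word and require all words with AND."""
    words = strip_lucene_chars(text).split()
    return " AND ".join(f"{word}~{max_edits}" for word in words)


print(fuzzy_query("Elizabeth I"))      # Elizabeth~2 AND I~2
print(fuzzy_query("Amelia Earhart?"))  # Amelia~2 AND Earhart~2
print(repr(fuzzy_query("")))           # ''
```

The ~2 suffix asks the full-text index to accept terms within two character edits of each word, which is why a slightly misspelled name in a question can still match the entity stored in the graph.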

Let's put it all together now.

# Fulltext index query
def structured_retriever(question: str) -> str:
    """
    Collects the neighborhood of entities mentioned
    in the question
    """
    results = []
    entities = entity_chain.invoke({"question": question})
    for entity in entities.names:
        # WITH node imports the matched node into each subquery branch;
        # without it, node would be a new unbound variable inside CALL { ... }
        response = graph.query(
            """CALL db.index.fulltext.queryNodes('entity', $query, {limit:2})
            YIELD node,score
            CALL {
              WITH node
              MATCH (node)-[r:!MENTIONS]->(neighbor)
              RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
              UNION
              WITH node
              MATCH (node)<-[r:!MENTIONS]-(neighbor)
              RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS output
            }
            RETURN output LIMIT 50
            """,
            {"query": generate_full_text_query(entity)},
        )
        results.extend(el["output"] for el in response)
    # Join once so output from different entities stays on separate lines
    return "\n".join(results)

The structured_retriever function starts by detecting entities in the user question. Next, it iterates over the detected entities and uses a Cypher template to retrieve the neighborhood of relevant nodes. Let's test it out!

print(structured_retriever("Who is Elizabeth I?"))
# Elizabeth I - BORN_ON -> 7 September 1533
# Elizabeth I - DIED_ON -> 24 March 1603
# Elizabeth I - TITLE_HELD_FROM -> Queen Of England And Ireland
# Elizabeth I - TITLE_HELD_UNTIL -> 17 November 1558
# Elizabeth I - MEMBER_OF -> House Of Tudor
# Elizabeth I - CHILD_OF -> Henry Viii
# and more...
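
The logic of the Cypher template can be mirrored on a toy in-memory edge list, which may help when reasoning about what the retriever returns. This is a sketch with illustrative data and a hypothetical neighborhood helper, not the code that runs against Neo4j.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source id, relationship type, target id)


def neighborhood(edges: List[Edge], node_id: str, limit: int = 50) -> List[str]:
    """Format every non-MENTIONS relationship that touches node_id,
    in either direction, as 'source - TYPE -> target'."""
    lines: List[str] = []
    for source, rel_type, target in edges:
        if rel_type == "MENTIONS":
            continue  # skip links from source documents to entities
        if node_id in (source, target):
            line = f"{source} - {rel_type} -> {target}"
            if line not in lines:  # mirror the de-duplication that UNION performs
                lines.append(line)
    return lines[:limit]


# Toy edge list (illustrative data, not the real extracted graph)
edges = [
    ("Elizabeth I", "MEMBER_OF", "House Of Tudor"),
    ("Elizabeth I", "CHILD_OF", "Henry Viii"),
    ("chunk-0", "MENTIONS", "Elizabeth I"),
    ("Mary I", "SUCCEEDED_BY", "Elizabeth I"),
    ("Henry Viii", "MEMBER_OF", "House Of Tudor"),
]
print("\n".join(neighborhood(edges, "Elizabeth I")))
```

Both outgoing and incoming relationships are included, while the MENTIONS links to source documents are left out because the unstructured retriever already covers the document text.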

Final Retriever

As mentioned at the start, we will combine the unstructured and graph retrievers to create the final context passed to the LLM.

def retriever(question: str):
    print(f"Search query: {question}")
    # Graph neighborhood of the entities mentioned in the question
    structured_data = structured_retriever(question)
    # Text chunks found by the hybrid (keyword + vector) search
    unstructured_data = [el.page_content for el in vector_index.similarity_search(question)]
    final_data = f"""Structured data:
{structured_data}
Unstructured data:
{"#Document ".join(unstructured_data)}
    """
    return final_data

Since we are working in Python, we can simply concatenate the outputs using an f-string.

Defining the RAG Chain

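We have successfully implemented the retrieval component of the RAG. Next, we introduce a prompt that uses the context provided by the integrated hybrid retriever to produce the response, completing the implementation of the RAG chain.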

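from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# _search_query is the query-rewriting step of the chain; its definition is
# not shown in this post
chain = (
    RunnableParallel(
        {
            "context": _search_query | retriever,
            "question": RunnablePassthrough(),
        }
    )
    | prompt
    | llm
    | StrOutputParser()
)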
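
The data flow of such a chain can be sketched with plain Python functions. This is only an analogy under stated assumptions: run_chain, fake_retriever, and echo_model are hypothetical names, and the stubs replace the real retriever and LLM so the example runs offline.

```python
from typing import Callable, Dict

TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def run_chain(
    question: str,
    retrieve: Callable[[str], str],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context, fill the prompt template, and pass the prompt to the model."""
    # Both prompt variables are derived from the same input
    inputs: Dict[str, str] = {"context": retrieve(question), "question": question}
    prompt = TEMPLATE.format(**inputs)
    return generate(prompt)


def fake_retriever(question: str) -> str:
    # Stub: a real retriever would query the graph and the vector index
    return "Structured data:\nElizabeth I - MEMBER_OF -> House Of Tudor"


def echo_model(prompt: str) -> str:
    # Stub: returns the prompt so the assembled text can be inspected
    return prompt


print(run_chain("Which house did Elizabeth I belong to?", fake_retriever, echo_model))
```

Printing the assembled prompt this way is a handy debugging step, since it shows exactly which structured and unstructured context the model would receive.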

Finally, we can go ahead and test our hybrid RAG implementation.

chain.invoke({"question": "Which house did Elizabeth I belong to?"})
# Search query: Which house did Elizabeth I belong to?
# 'Elizabeth I belonged to the House of Tudor.'

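The RAG chain also includes a query rewriting feature, which lets it adapt to conversational settings that allow follow-up questions. Because we use vector and keyword search methods, follow-up questions must be rewritten into standalone questions to optimize the search process.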

chain.invoke(
    {
        "question": "When was she born?",
        "chat_history": [("Which house did Elizabeth I belong to?", "House Of Tudor")],
    }
)
# Search query: When was Elizabeth I born?
# 'Elizabeth I was born on 7 September 1533.'

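You can observe that "When was she born?" was first rewritten to "When was Elizabeth I born?". The rewritten query was then used to retrieve the relevant context and answer the question.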
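
The query rewriting step itself is not shown in this post. As a rough idea of how such a step could work, here is a minimal sketch: the search_query function, the prompt wording, and the fake_rewrite stub are all assumptions made for illustration, with the stub standing in for an LLM call.

```python
from typing import Callable, List, Optional, Tuple

CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow-up question, rephrase the "
    "follow-up question to be a standalone question.\n"
    "Chat history:\n{history}\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def search_query(
    question: str,
    chat_history: Optional[List[Tuple[str, str]]],
    rewrite: Callable[[str], str],
) -> str:
    """Return the question unchanged when there is no history; otherwise ask
    the rewrite callable (an LLM in practice) for a standalone question."""
    if not chat_history:
        return question
    history = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
    return rewrite(CONDENSE_TEMPLATE.format(history=history, question=question))


def fake_rewrite(prompt: str) -> str:
    # Stub standing in for an LLM call; a real model would read the prompt
    return "When was Elizabeth I born?"


history = [("Which house did Elizabeth I belong to?", "House Of Tudor")]
print(search_query("When was she born?", history, fake_rewrite))
print(search_query("Who is Elizabeth I?", None, fake_rewrite))
```

Skipping the rewrite when there is no chat history avoids an unnecessary model call for first questions, which already stand on their own.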

Enhancing RAG Applications Made Easy

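With the introduction of the LLMGraphTransformer, generating knowledge graphs should now be smoother and more accessible, making it easier for anyone to enhance their RAG applications with the depth and context that knowledge graphs provide. This is just a start, as we have many improvements planned.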

If you have insights, suggestions, or questions about generating graphs with LLMs, please don't hesitate to reach out.

The code is available on GitHub.

Translated from the original article by AI中文站, 2024-06-11 05:15:27.