Exploring Hugging Face: Summarization

Text Summarization

Photo by Kelly Sikkema on Unsplash

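In natural language processing, summarization is the task of automatically generating a concise, coherent summary of a long text document.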

Figure: Extractive and abstractive summarization.

There are two main types of summarization in natural language processing:

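  • Extractive summarization: this approach identifies key phrases, sentences, or paragraphs in the original text, extracts them, and combines them into a summary. The goal is to select the most representative parts of the text, the ones that capture its core meaning or most important information. The resulting summary is a subset of the original text.
  • Abstractive summarization: this approach generates new text that conveys the essential meaning of the original in a more concise form. Unlike extractive summarization, an abstractive summary may rephrase content or use words that do not appear in the original. It requires deeper language understanding and generation ability, because it involves paraphrasing, generalizing, and integrating information.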
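Because an extractive summary is a subset of its source, that property can be checked mechanically. The sketch below uses hypothetical helpers (split_sentences and is_extractive are not part of any library) and assumes a naive regex sentence splitter, which will, for example, mis-split abbreviations such as "Dr."; it treats a summary as extractive only if every one of its sentences occurs verbatim in the source.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): collapse whitespace, then break after '.', '!' or '?'."""
    normalized = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", normalized) if s]


def is_extractive(summary: str, source: str) -> bool:
    """True if every summary sentence appears verbatim as a sentence of the source."""
    source_sentences = set(split_sentences(source))
    return all(s in source_sentences for s in split_sentences(summary))


source = "The cat sat on the mat. It was sunny. Birds sang outside."
print(is_extractive("The cat sat on the mat. Birds sang outside.", source))  # True
print(is_extractive("A cat rested on a sunny mat.", source))                 # False
```

The check is deliberately strict: practical extractive systems sometimes trim or lightly edit the sentences they select, which this test would reject.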

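The summarization task takes a piece of text as input and outputs a shorter version that summarizes it. With Hugging Face Transformers, it can be run through the pipeline API: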

from transformers import pipeline

# Summarization pipeline backed by BART fine-tuned on the CNN/DailyMail news dataset
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

ARTICLE = """
Kanye West decided to join a local amateur soccer game in the park.
Wearing his signature sunglasses, he attempted to blend in.
The game was fun and lighthearted, but Kanye, not being a seasoned
soccer player, accidentally kicked the ball into a nearby picnic,
scattering sandwiches and startling a group of ducks.

The ducks, now part of the game, waddled around with the soccer ball,
quacking happily. Kanye couldn't help but laugh. He ended up spending
the afternoon chasing the ball with the ducks, much to the amusement
of everyone in the park.

In the end, Kanye thought, "Maybe I'll stick to music and fashion,"
but he enjoyed every quirky moment of his soccer adventure.
"""

# Deterministic decoding (do_sample=False); lengths are counted in tokens
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))

"""
[{'summary_text': 'Kanye West joined a local amateur soccer game
in the park. He accidentally kicked the ball into a nearby picnic,
scattering sandwiches and startling a group of ducks. Kanye ended up
spending the afternoon chasing the ball with the ducks.'}]
"""
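
The generation parameters used above:

  • max_length=130: the maximum length of the generated summary, in tokens.
  • min_length=30: the minimum length of the generated summary, in tokens.
  • do_sample=False: controls how the summary is decoded. With do_sample=False, the model picks the highest-scoring continuation deterministically (greedy search, or beam search if the model's generation settings use more than one beam) instead of sampling, so the output is reproducible and less varied. With do_sample=True, each next token is sampled from the model's probability distribution, so the summary can differ between runs.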

# Sampled decoding (do_sample=True): the summary can differ from run to run
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=True))

"""
[{'summary_text': 'Kanye West joined an amateur soccer game in a park.
He accidentally kicked the ball into a nearby picnic, scattering sandwiches
and startling a group of ducks. He ended up spending the afternoon chasing
the ball with the ducks.'}]
"""

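The difference between the two runs comes down to how each next token is chosen. The toy sketch below is pure Python with made-up token scores, not the transformers implementation; it shows a single decoding step, where do_sample=False always takes the most probable token and do_sample=True draws from the probability distribution. Real generation repeats this step for every token, and beam search keeps several candidate sequences instead of one.

```python
import math
import random
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw token scores into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def pick_next_token(logits: Dict[str, float], do_sample: bool, rng: random.Random) -> str:
    """One toy decoding step: greedy when do_sample is False, random sampling when True."""
    probs = softmax(logits)
    if not do_sample:
        # Greedy: always the single most probable token, so repeated runs agree
        return max(probs, key=probs.get)
    # Sampling: draw a token with probability proportional to its softmax weight
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


logits = {"park": 2.0, "field": 1.5, "garden": 0.5}  # hypothetical scores for illustration
rng = random.Random(0)
print([pick_next_token(logits, False, rng) for _ in range(3)])  # ['park', 'park', 'park']
print([pick_next_token(logits, True, rng) for _ in range(3)])   # depends on the random draws
```
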
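Another abstractive model, a DistilBART checkpoint fine-tuned on BBC news, produces a noticeably different summary: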

# DistilBART checkpoint fine-tuned for abstractive summarization of BBC news
summarizer = pipeline("summarization", model="Atharvgarg/distilbart-xsum-6-6-finetuned-bbc-news-on-abstractive")

print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))

"""
[{'summary_text': 'the game was fun and lighthearted, but kanye,
not being a seasoned soccer player, accidentally kicked the ball
into a nearby picnic, scattering sandwiches and startling a group of ducks.
the ducks, now part of the game, waddled around with the soccer ball,
quacking happily. kanye decided to join a local amateur soccer game in
the park. the game was also a quirky way to get to know kanye was not
a professional ballerin, but he enjoyed every quirky moment of his soccer
adventure. he ended up spending the afternoon chasing the ball with the daves,
much to the amusement of everyone in park.'}]
"""

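The abstractive output above contains words that never appear in the article, such as "ballerin" and "daves", alongside legitimate rephrasings. Two quick diagnostics for comparing summaries are how much shorter the summary is than its source and which summary words are absent from the source. The helpers below are hypothetical, assume a crude lowercase regex tokenizer, and count words rather than model tokens.

```python
import re
from typing import List


def words(text: str) -> List[str]:
    """Crude tokenizer (assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def compression_ratio(summary: str, source: str) -> float:
    """Summary length divided by source length, both counted in words (not model tokens)."""
    source_len = len(words(source))
    if source_len == 0:
        raise ValueError("source text contains no words")
    return len(words(summary)) / source_len


def novel_words(summary: str, source: str) -> List[str]:
    """Summary words that never occur in the source, in order of first appearance."""
    source_vocab = set(words(source))
    novel: List[str] = []
    for w in words(summary):
        if w not in source_vocab and w not in novel:
            novel.append(w)
    return novel


# Toy example with a made-up source and summary
source = "The ducks waddled around the park with the soccer ball."
summary = "The daves waddled around the park."
print(compression_ratio(summary, source))  # 0.6
print(novel_words(summary, source))        # ['daves']
```

Applied to the pipeline output (the returned list holds dicts with a summary_text key, e.g. novel_words(result[0]["summary_text"], ARTICLE)), novel_words would include "ballerin" and "daves", but it would also list valid new words, so it is only a rough signal, not an error detector.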
Read more

Sources

https://huggingface.co/zh/docs/transformers/main_classes/pipelines

https://huggingface.co/facebook/bart-large-cnn

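Exploring Extractive Approaches to Text Summarization

In this post, we explore extractive approaches to text summarization. Text summarization is the process of distilling the important information from a source text. It helps readers quickly understand what a text is about, and it is especially useful in applications that must process large amounts of information.

What Is Text Summarization?

A text summary is a condensed version of a source text that keeps only the most important information and key terms. It aims to give an accurate overview of the key points, helping readers grasp the essentials and decide whether to read further.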

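Types of Text Summarization

Text summarization approaches fall into two categories: extractive and generative (abstractive).

Extractive Summarization

Extractive summarization selects the most relevant and important sentences or phrases directly from the source text. It does not generate new sentences or phrases; it condenses information that is already present.

Generative (Abstractive) Summarization

Generative summarization uses natural language processing techniques to produce a new summary. Rather than only selecting and extracting sentences or phrases from the source text, it can create new wording that does not appear in the source.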

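Extractive Summarization Methods

Common approaches to extractive summarization include:

  • Frequency-based methods: score sentences by how often their (non-stopword) words occur in the document.
  • Position-based methods: favor sentences by their location, for example the opening sentences of a news article.
  • Statistical methods: weight words or sentences with statistical measures such as TF-IDF.
  • Machine-learning-based methods: train a model to predict whether a sentence belongs in the summary.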

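Each of these methods tries to select the most representative and informative sentences or phrases from the given text to form the summary.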
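To make the first two families concrete, here is a minimal sketch of a frequency-based scorer and a position-based "lead" baseline. The stopword list, the choice to average word frequencies, and the regex sentence splitter are assumptions made for brevity; statistical or machine-learning methods would replace the score function with TF-IDF weights or a trained model.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption; real systems use larger ones)
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "was",
             "it", "he", "she", "they", "but", "with", "for", "at", "his", "her"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: collapse whitespace, then break after '.', '!' or '?'."""
    normalized = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", normalized) if s]


def content_words(text: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def frequency_summary(text: str, num_sentences: int = 2) -> List[str]:
    """Frequency-based: keep the sentences whose content words are most frequent overall."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average rather than sum, so long sentences are not favoured just for length
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Highest scores first; sorted() is stable, so ties keep document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Re-sort the chosen indices so the summary reads in the original order
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


def lead_summary(text: str, num_sentences: int = 2) -> List[str]:
    """Position-based baseline: keep the opening sentences."""
    return split_sentences(text)[:num_sentences]


text = "Kanye played soccer. Ducks love the park. The park has ducks and a pond."
print(frequency_summary(text, 2))  # ['Ducks love the park.', 'The park has ducks and a pond.']
print(lead_summary(text, 1))       # ['Kanye played soccer.']
```

The two functions pick different sentences from the same text, which illustrates why the choice of scoring signal matters more than the selection mechanics.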

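Conclusion

Text summarization is a useful tool for handling large amounts of information. Both extractive and generative summarization help readers grasp a text's content quickly by providing a concise overview.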

https://huggingface.co/Atharvgarg/distilbart-xsum-6-6-finetuned-bbc-news-on-abstractive

Extractive vs. Abstractive Summarization in Healthcare: https://www.abstractivehealth.com/extractive-vs-abstractive-summarization-in-healthcare

2024-01-22 04:52:51, translated from the original article by AI中文站